Thanks... with this new knowledge and the nightly, it would be useful if you could identify the reproducible bug(s) that can get fixed.
I actually haven't been able to reproduce this with any consistency. I suspect I overworked the drives and hung it (Kubernetes, and somehow the ZFS controller) by creating a bunch of VMs and having their installers all extracting their squashfs images (really, writing out their filesystems) at the same time.
In two or three out of roughly five tries, Kubernetes will hang if I follow that same pattern (as rapidly as possible), which is:
- install
- boot
- wipe two drives
- create mirror pool
- set the pool to sync (rough CLI equivalent sketched below)
- create three Ubuntu VMs from the same ISO, uploading the ISO through the UI for the first one
- launch VNC for each one simultaneously
- quickly run through the interactive portion of the installations
- browse to Apps
- attempt to launch an app
If/when Kubernetes does hang, reboots won't fix it; the only fix I've found is to "unset" the pool, then "choose pool" again.
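For reference, this is roughly the CLI equivalent of the drive/pool steps above (a minimal sketch; I'm actually doing all of it through the SCALE UI, and the device names sdb/sdc and the pool name tank are just placeholders):

```sh
# Wipe the two drives (destroys any existing partition tables/signatures)
wipefs -a /dev/sdb
wipefs -a /dev/sdc

# Create a two-way mirror pool from the wiped drives
zpool create tank mirror /dev/sdb /dev/sdc

# Force synchronous writes pool-wide ("set the pool to sync")
zfs set sync=always tank
```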
Honestly, though, the Kubernetes implementation in Apps isn't going to work for us. We are going to begin directing our hardware towards testing "rolling our own" Kubernetes installations with worker VMs running on top of multiple instances of SCALE. That's much like what we've done in the past, except back then FreeNAS itself was a VM, a peer to the worker nodes, and handled the storage for the hypervisor.
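By "rolling our own" I mean something like a plain kubeadm-built cluster, with each worker being a VM on a SCALE host. A minimal sketch of the join flow, assuming kubeadm (the control-plane address, token, and hash are placeholders taken from the kubeadm init output):

```sh
# On the control-plane node (could itself be a VM)
kubeadm init --pod-network-cidr=10.244.0.0/16

# On each worker VM running on top of a SCALE host
kubeadm join 192.0.2.10:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
```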
We've done it on the iX Minis plenty as well, passing the SATA controllers through to the FreeNAS/TrueNAS instance using Intel IOMMU (VFIO), which works quite well, perhaps even better than the LSI HBAs.
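The passthrough itself is just standard VFIO with libvirt. A rough sketch, assuming the onboard controller sits at PCI address 0000:00:17.0, the guest domain is named truenas, and intel_iommu=on is already on the host kernel command line:

```sh
# Find the onboard SATA controller's PCI address
lspci -nn | grep -i sata

# Bind the controller to vfio-pci instead of ahci
driverctl set-override 0000:00:17.0 vfio-pci

# Hand the controller to the FreeNAS/TrueNAS guest
cat > sata-hostdev.xml <<'EOF'
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x00' slot='0x17' function='0x0'/>
  </source>
</hostdev>
EOF
virsh attach-device truenas sata-hostdev.xml --config
```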
Using SCALE saves us a single VM per node, and may even yield a small performance boost from less context switching. The only downside is that we didn't always deploy everything to a VM or to a cluster; typically we would also run a jail or a plugin or two for a few basic apps, just for convenience. Some stateful apps that need a lot of good, fast storage and don't require high availability are just far simpler to throw in a jail, and the Apps module in SCALE doesn't seem to be a suitable replacement for that. Not even close.
I suppose the other downside is that in our current stack we are accustomed to using libvirt and all the tools that work with it. On our larger deployments (mostly Dell PowerEdges) we may have to keep things the way they are for a while. I honestly hate having to deal with the HBA controllers, though. Sometimes they go bad, sometimes the slots go bad, sometimes there's a firmware bug, etc. None of that happens often, but it bites when it does.