Thanks... with this new knowledge and the nightly, it would be useful if you could identify the reproducible bug(s) that can get fixed.
I actually haven't been able to reproduce this with any consistency. I suspect I overworked the drives and hung it (Kubernetes, and somehow the ZFS controller) by creating a bunch of VMs and having their installers all extracting their squashfs images (really, writing out their filesystems) at the same time.
In two or three out of roughly five tries, Kubernetes will hang if I follow that same pattern (as rapidly as possible), which is:
- install
- boot
- wipe two drives
- create mirror pool
- set the pool to sync (rough CLI equivalent sketched below)
- create three Ubuntu VMs from the same ISO, uploading the ISO through the UI for the first one
- launch VNC for each one simultaneously
- quickly run through the interactive portion of the installations
- browse to Apps
- attempt to launch an app
If/when Kubernetes does hang, reboots won't fix it; the only fix I've found is to "unset" the pool, then "choose pool" again.
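For reference, this is roughly the CLI equivalent of the drive/pool steps above (a minimal sketch; I'm actually doing all of it through the SCALE UI, and the device names sdb/sdc and the pool name tank are just placeholders):

```sh
# Wipe the two drives (destroys any existing partition tables/signatures)
wipefs -a /dev/sdb
wipefs -a /dev/sdc

# Create a two-way mirror pool from the wiped drives
zpool create tank mirror /dev/sdb /dev/sdc

# Force synchronous writes pool-wide ("set the pool to sync")
zfs set sync=always tank
```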
Honestly, though, the Kubernetes implementation in Apps isn't going to work for us. We are going to begin directing our hardware towards testing "rolling our own" Kubernetes installations with worker VMs running on top of multiple instances of SCALE. That's much like what we've done in the past, except back then FreeNAS itself was a VM, a peer to the worker nodes, and handled the storage for the hypervisor.
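By "rolling our own" I mean something like a plain kubeadm-built cluster, with each worker being a VM on a SCALE host. A minimal sketch of the join flow, assuming kubeadm (the control-plane address, token, and hash are placeholders taken from the kubeadm init output):

```sh
# On the control-plane node (could itself be a VM)
kubeadm init --pod-network-cidr=10.244.0.0/16

# On each worker VM running on top of a SCALE host
kubeadm join 192.0.2.10:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
```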
We've done it on the iX Minis plenty as well, passing the SATA controllers through to the FreeNAS/TrueNAS instance using Intel IOMMU (VFIO), which works quite well, perhaps even better than the LSI HBAs.
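The passthrough itself is just standard VFIO with libvirt. A rough sketch, assuming the onboard controller sits at PCI address 0000:00:17.0, the guest domain is named truenas, and intel_iommu=on is already on the host kernel command line:

```sh
# Find the onboard SATA controller's PCI address
lspci -nn | grep -i sata

# Bind the controller to vfio-pci instead of ahci
driverctl set-override 0000:00:17.0 vfio-pci

# Hand the controller to the FreeNAS/TrueNAS guest
cat > sata-hostdev.xml <<'EOF'
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x00' slot='0x17' function='0x0'/>
  </source>
</hostdev>
EOF
virsh attach-device truenas sata-hostdev.xml --config
```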
Using SCALE saves us a single VM per node, and may even yield a small performance boost from less context switching. The only downside is that we didn't always deploy everything to a VM or to a cluster; typically we would also run a jail or a plugin or two for a few basic apps, just for convenience. Some stateful apps that need a lot of good, fast storage and don't require high availability are just far simpler to throw in a jail, and the Apps module in SCALE doesn't seem to be a suitable replacement for that. Not even close.
I suppose the other downside is that in our current stack we are accustomed to using libvirt and all the tools that work with it. On our larger deployments (mostly Dell PowerEdges) we may have to keep things the way they are for a while. I honestly hate having to deal with the HBA controllers, though. Sometimes they go bad, sometimes the slots go bad, sometimes there's a firmware bug, etc. None of that happens often, but it bites when it does.