How to delete files from pool when "out of swap" error crashes processes on boot

Billabong

Dabbler
I have a jail running Transmission and a pool that it writes to. I accidentally added too many torrents to Transmission (total file size exceeding the capacity of the pool), and that crashed my NAS. Now, when I boot the device, a series of "out of swap" errors appear just after the pool is loaded, and important processes (e.g. sh, python) get killed. So I can't use the web console or shell access as normal.

How could I go about remedying this? Is there a way I could still access the pool to delete files?
 

WN1X

Explorer
Have you tried SSH or console access?
 

Billabong

Dabbler
> So I can't use the web console or shell access as normal.

The only type of shell I know I have access to is when I boot in single-user mode. From there, I can import and mount my pool, but none of the files seem to be there... Any thoughts?
 

Arwen

MVP
Did you accidentally put the torrent files in the boot pool?
Or the temporary location for the torrent files in the boot pool?

In general, filling a data pool should not cause a crash. But, if the boot pool filled up, and the GUI tried to launch, I could see a problem occurring.
 

Billabong

Dabbler
Are jails installed in the boot pool by default? If not, I don't think it's possible. If so... I _have_ previously observed a behaviour where Transmission won't persist my default location setting, so when the jail is reset, that location is reverted to its "local" storage. But I didn't notice it occurring this time (could have missed it I suppose).
 

Billabong

Dabbler
Bumping with summary of questions so far:

1. Is it possible to delete files from a pool by whatever access can be gained outside of the normal processes (web console, normal shell access; both processes are killed by an out-of-swap error)? I have so far managed to mount the main pool in single-user mode, but I see no files at the mountpoint except the home directories of the various users.

2. What is the mechanism by which accidentally filling a pool could cause out-of-swap process killings that persist through reboot?
 

joeschmuck

Old Man
Moderator
I was under the impression that an out-of-swap error basically means the system has run out of RAM and is swapping RAM to the SWAP space on the pool, not that the pool got too full. I could see this causing the system to crash. Maybe I've just never heard of this kind of crash before, so I could learn something here. How much RAM do you have?

So to think outside the box here, you could remove your boot drive, boot from a clean install of TrueNAS, then import your pool; you will then be able to delete files as you desire from the CLI. Once all done, boot from your original TrueNAS boot device and hopefully all will be good in the world.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
If you've truly reached the point of a 100% full pool, then you've put yourself into a rather precarious position, as you've discovered that a 100% full pool will often refuse to mount or panic a system where it's mounted.
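To see how close a pool is to 100% before it gets there, one option is to filter the output of zpool list. This is a minimal sketch assuming a FreeBSD/TrueNAS shell with awk; full_pools is a hypothetical helper name, and the 90% default threshold is an arbitrary example, not a TrueNAS setting.

```shell
# Hypothetical helper: read `zpool list -H -o name,capacity` output on stdin
# and print the name of every pool at or above a capacity threshold (percent).
# -H gives header-less, tab-separated output that is easy to parse.
full_pools() {
    threshold="${1:-90}"   # arbitrary example default
    # $2 looks like "97%"; adding 0 keeps only the leading number
    awk -v t="$threshold" '($2 + 0) >= t { print $1 }'
}

# Usage on the NAS:
#   zpool list -H -o name,capacity | full_pools 90
```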

If it's mountable in single-user mode without crashing the system, that's a good start.

Once you've mounted the data pool, look to see if it's been mounted in an unusual location via zfs get mountpoint - we can then try going to that directory to see what you can see.
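If the mount point looks right but expected directories are missing, it may also be worth checking whether each dataset in the pool is actually mounted, not just what its mountpoint property says. A minimal sketch, assuming the datasets are listed with zfs list -H -o name,mountpoint,mounted -r; unmounted_datasets is a hypothetical helper name.

```shell
# Hypothetical helper: read `zfs list -H -o name,mountpoint,mounted -r POOL`
# on stdin (tab-separated) and report datasets whose "mounted" value is "no".
# Files in such datasets will not appear under the parent directory.
unmounted_datasets() {
    awk -F '\t' '$3 == "no" { print $1 " (mountpoint " $2 ")" }'
}

# Usage (poolname is a placeholder):
#   zfs list -H -o name,mountpoint,mounted -r poolname | unmounted_datasets
# "zfs mount -a" tries to mount all available ZFS file systems.
```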

If you can't delete files via rm, you can try truncating them:

echo > /mnt/path/to/file/to/delete

or

cat /dev/null > /mnt/path/to/file/to/delete

Check for things like snapshots existing as well: deleting a file only frees space if no snapshot still references it, so you'll need to take out a file that's outside a snapshot.
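Before truncating anything, it helps to know which files are worth emptying. A rough sketch, assuming a POSIX shell with find, du, sort and head; largest_files and empty_file are hypothetical helper names. empty_file uses ": >" to truncate to zero bytes, like the cat /dev/null form above (the echo form leaves a single newline behind).

```shell
# Hypothetical helper: list the N largest regular files under DIR,
# biggest first, with sizes in KiB as reported by du -k.
largest_files() {
    dir="$1"
    n="${2:-10}"
    find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$n"
}

# Hypothetical helper: empty each given file in place.
# ':' is a no-op builtin; the redirection alone truncates the file to 0 bytes.
empty_file() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            : > "$f"
        fi
    done
}

# Space still referenced by a snapshot is only freed once that snapshot is
# destroyed; to see which snapshots exist (poolname is a placeholder):
#   zfs list -t snapshot -r poolname
```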
 

joeschmuck

Old Man
Moderator
So after looking at my post above, I'm rethinking this problem. I still think you are running out of RAM: your torrents/VMs are using too much RAM, so you are using SWAP space, and if you run out of SWAP space, your system crashes.

But I also recommended that you boot from a different TrueNAS boot device and import the pool; do not restore your configuration file. This "should not" run any jails/VMs and thus just gives you access to your pool. You can then delete any files you desire IF your pool is in fact full.

I think the error I made was in stating that you should be able to reboot from your original TrueNAS boot device and all would be good. Well, if the problem really is a VM gone wild and using up all your RAM, I don't think this alone will fix it. You may need to manually remove or disable the VM. I could remove the VM, but I have no idea how to disable it before the GUI boots up. Maybe someone has an idea?

So hopefully the issue is only as simple as a full pool and not running out of RAM.
 

Billabong

Dabbler
Sorry, busy couple of days, so I'm slow to get to replying here.

Big development. I tried restarting the machine again, and this time it just... booted like normal. Aside from having to manually start the Transmission jail, nothing out of the ordinary... memory usage by the jail is 1.3 GB out of the 2 GB allocated to it (according to top). No data loss.

No idea why it started working this time, when it was consistently failing in the same way previously. I did nothing substantial while poking around in single-user mode... I think. Just setting the fs to not readonly so I could mount the pool.

Anyway, I'll respond to the individual points in case they help someone down the line. And then I guess I have to figure out wtf to do now that I've filled the 50 TB of storage in my NAS, which physically cannot support more HDDs :smile:
> How much RAM do you have?

8 GB

> So to think outside the box here, you could remove your boot drive, bootstrap to a clean install of TrueNAS, then import your pool, then you will be able to delete files as you desire from the CLI. Once all done, bootstrap from your original TrueNAS boot device and hopefully all will be good in the world.

Yeah, this is a good idea. I'll probably try this if I get this error again and can't resolve it as I did this time.

> Once you've mounted the data pool, look to see if it's been mounted in an unusual location via zfs get mountpoint - we can then try going to that directory to see what you can see.

The mount point is right in the root directory: /poolname. I was doing some fiddling previously, but I don't think I changed that.

> If you can't delete files via rm, you can try truncating them
>
> echo > /mnt/path/to/file/to/delete
>
> or
>
> cat /dev/null > /mnt/path/to/file/to/delete
>
> Check for things like snapshots existing as well - you'll need to take out a file that's outside a snapshot.

It's really just that the files don't appear where I mounted the pool. Like, in /poolname, all I see is some home directories for the users I created, with the default files that get put there. None of the actual files I downloaded with Transmission.
 

Arwen

MVP
Hmm, I may know why you are not having the problem:
> The mount point is right in the root directory: /poolname. I was doing some fiddling previously, but I don't think I changed that.

The normal location of pools is:
/mnt/poolname
This means that if the problematic task / jail / etc. was expecting /mnt/poolname and couldn't see it, it failed, and so never triggered the problematic behavior that leads to the crash.

Of course, I could be wrong...
 

joeschmuck

Old Man
Moderator
So, 8 GB of RAM is not much for TrueNAS when you are running jails or any VMs. Look in the GUI at SWAP space; hopefully you will be able to see some history of what it was doing. If it's not zero, then you are using SWAP space, which is not good. A few kilobytes is fine, but if you are seeing gigabytes, then you likely did run out of SWAP space and that caused the crash. And as I recall, SWAP space is not ZFS, so data corruption is possible on these partitions; however, if routine SMART long tests pass, I wouldn't suspect this to be the case. (Note: SWAP space is not part of your pool; it's a separate partition normally created across all your vdev drives, and the default, as I recall, is still 2 GB per drive.)

Also, when you start up any jails/VMs, look at the SWAP space periodically, and remember that a properly designed system will not use SWAP space. If you do find that you are using SWAP space, you will need to increase your RAM. If this issue only started after you upgraded to a newer TrueNAS version, you might need to revert to the older version if you cannot upgrade your RAM.
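To watch swap use from a shell rather than the GUI, FreeBSD-based TrueNAS CORE provides swapinfo. This sketch parses its -k output (sizes in 1K blocks); swap_used_kb and swap_check are hypothetical helpers, and the 1 GiB default limit is an arbitrary example value, not a recommendation from this thread.

```shell
# Hypothetical helper: sum the "Used" column of `swapinfo -k` output read on
# stdin and print the total swap in use, in KiB. The header row and any
# "Total" row (present when there are several swap devices) are skipped.
swap_used_kb() {
    awk 'NR > 1 && $1 != "Total" { used += $3 } END { print used + 0 }'
}

# Hypothetical helper: print OK or WARN depending on whether swap use
# exceeds a limit in KiB (default 1 GiB, an arbitrary example value).
swap_check() {
    limit="${1:-1048576}"
    used="$(swap_used_kb)"
    if [ "$used" -gt "$limit" ]; then
        echo "WARN: ${used} KiB of swap in use"
    else
        echo "OK: ${used} KiB of swap in use"
    fi
}

# Usage on TrueNAS CORE:
#   swapinfo -k | swap_check
```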

And of course this is all pure speculation on my part.

Keep us posted on what you find out.
 