Pool stuck on export

nemesis1782

Contributor
Joined
Mar 2, 2021
Messages
105
Hi,

I had an issue with a pool that was not in active use; nothing on it is important at the moment. One disk is showing a number of ZFS errors.

So I tried removing the offending disk from the pool, but got an error saying it is not possible to remove the disk in the current state.

So I decided to remove the pool and just rebuild it, since the dataset that was on it no longer showed up.

Now removing the pool is stuck, probably due to something I did. How do I figure out what the issue is and what to do to resolve it? The UI is EXTREMELY lacking in the information department, and there seem to be few useful logs as far as I can find.

Can anyone point me to where to look and/or how to resolve it? Although a solution to the issue would be nice, I would prefer to know how to figure out what the actual issue is so I can debug and resolve future issues myself.

Thank you in advance,

Davy Vaessen
 

PhilD13

Patron
Joined
Sep 18, 2020
Messages
203
You might already know this or have done all of it in your troubleshooting, but take this carefully, as you could really mess things up. I will give you my thoughts.

I believe at this point the errors on the pool are preventing you from exporting it or doing any other operations on it. You may have had snapshots and datasets still referencing the pool at the time of failure, so ZFS is not letting you do anything with it because of the issues. To ZFS it is unknown whether there is any data left. The export command attempts to unmount any mounted file systems within the pool before continuing. If drives are unavailable at the time of export, they cannot be identified as cleanly exported. If one of these drives is later attached to a system, it would appear as “potentially active.” If ZFS thinks volumes are active in the pool, the pool cannot be exported, even with the -f option.
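A minimal sketch of how one might check which datasets are still mounted before attempting an export, since the unmount step may be where an export stalls. It assumes the `zfs` CLI is on the PATH; the pool name `scratch` and the sample output are hypothetical and not taken from this thread.

```python
import subprocess


def parse_zfs_list(output: str) -> list[dict]:
    """Parse tab-separated `zfs list -H -o name,mounted,mountpoint` output."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        name, mounted, mountpoint = line.split("\t")
        rows.append(
            {"name": name, "mounted": mounted == "yes", "mountpoint": mountpoint}
        )
    return rows


def mounted_datasets(pool: str) -> list[dict]:
    """Ask ZFS which datasets under `pool` are currently mounted."""
    result = subprocess.run(
        ["zfs", "list", "-H", "-r", "-o", "name,mounted,mountpoint", pool],
        capture_output=True,
        text=True,
        check=True,
    )
    return [row for row in parse_zfs_list(result.stdout) if row["mounted"]]


# Hypothetical sample output for a pool named "scratch".
# Volumes (zvols) report "-" for both mounted and mountpoint.
SAMPLE = (
    "scratch\tyes\t/mnt/scratch\n"
    "scratch/tmp\tyes\t/mnt/scratch/tmp\n"
    "scratch/vol0\t-\t-\n"
)

if __name__ == "__main__":
    for row in parse_zfs_list(SAMPLE):
        print(row)
```

Anything this reports as mounted has to be unmounted before the export can finish, so a share, jail, or shell still sitting in one of those mountpoints is worth ruling out first.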

Maybe someone has a better idea but this is probably the way I would approach it.
First make sure your other data on the system is safe and backed up.

You could put the bad disk back in and if the pool in question shows up then try to offline the bad disk and then replace the failed disk with a good one and go from there to try to save the pool. If you get a good pool, then export it. If you get a pool where you can remove datasets, shares, snaps etc. do it, then try either fixing it or exporting it. This would give you an opportunity to try to figure out how it got this way and how to prevent it in the future.

Since the data is not anything important and the pool may not show up anymore at this point, you may find you need to go to the command line to force a destroy operation on the bad ZFS pool to get rid of it. If you do decide to do this, make very sure you have the correct pool to destroy and any other data on the system safely backed up. The force option (the don't care, do it anyway option) would be necessary because the pool cannot be opened, so it is unknown to ZFS if data is still there.
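The two routes above (offline and replace the disk, or destroy the pool) can be sketched as command plans that default to a dry run, so nothing is executed until you have read the commands. This is only an illustration: the pool and disk names are hypothetical, and the helper names `replace_plan`, `destroy_plan`, and `run_plan` are made up for this sketch. The `zpool` subcommands themselves (`status -v`, `offline`, `replace`, `destroy -f`) are standard.

```python
import shlex
import subprocess


def replace_plan(pool: str, bad_disk: str, new_disk: str) -> list[list[str]]:
    """Commands to inspect the pool, offline the bad disk, and replace it."""
    return [
        ["zpool", "status", "-v", pool],
        ["zpool", "offline", pool, bad_disk],
        ["zpool", "replace", pool, bad_disk, new_disk],
    ]


def destroy_plan(pool: str, force: bool = False) -> list[list[str]]:
    """Commands to inspect the pool one last time and then destroy it."""
    destroy = ["zpool", "destroy"]
    if force:
        # -f forces any active datasets in the pool to be unmounted.
        destroy.append("-f")
    destroy.append(pool)
    return [["zpool", "status", "-v", pool], destroy]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print each command; execute it only when dry_run is False."""
    rendered = []
    for cmd in plan:
        line = shlex.join(cmd)
        rendered.append(line)
        if dry_run:
            print("DRY RUN:", line)
        else:
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    # Hypothetical names; substitute the real pool and devices.
    run_plan(replace_plan("scratch", "sda", "sdb"))
    run_plan(destroy_plan("scratch", force=True))
```

Keeping the default at `dry_run=True` is deliberate: destroying the wrong pool is not recoverable in any convenient way, so the plan should be read before it is run.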
 

nemesis1782

You might already know this or have done all this in your troubleshooting but take this carefully as you could really mess things up.....
I'm aware. The pool is empty, though, so there is no real danger, and there is no data on the server yet that I cannot afford to lose.

I will give you my thoughts. I believe at this point the errors on the pool are preventing you from exporting it or doing any other operations on it. You may have had snapshots and datasets still referencing the pool at time of failure and so zfs is not letting you at this point do anything because of the issues. To zfs it is unknown if there is any data left. The export command will attempt to unmount any mounted file systems within the pool before continuing. If drives are unavailable at the time of export, then they cannot be identified as cleanly exported. If one of these drives is later attached to a system it would appear as “potentially active.” If ZFS thinks volumes are active in the pool, the pool cannot be exported, even with the -f option.
There was indeed a dataset when it was healthy. My main problem and disappointment is that the job just hangs; after a reboot the ZFS pool comes back without errors. The issue is in one SSD, which seems to have a broken controller. It has ZFS errors.

The UI provides no other information than that it's in an error state. There has to be a way to get the information on what is wrong. This is not giving me a lot of confidence in the product.

Maybe someone has a better idea but this is probably the way I would approach it.
First make sure your other data on the system is safe and backed up.
I solved it by restarting the server after the export. Weirdly enough, that also reset the ZFS error counters.

You could put the bad disk back in and if the pool in question shows up then try to offline the bad disk and then replace the failed disk with a good one and go from there to try to save the pool. If you get a good pool, then export it. If you get a pool where you can remove datasets, shares, snaps etc. do it, then try either fixing it or exporting it. This would give you an opportunity to try to figure out how it got this way and how to prevent it in the future.
That is a good idea. It does surprise me that the first and only advice for an enterprise-level product is to just experiment a bit. Don't get me wrong, I appreciate your help. I am just disappointed that this seems to be the answer to many of the questions asked.

Since the data is not anything important and the pool may not show up anymore at this point, you may find you need to go to the command line to force a destroy operation on the bad zfs pool to get rid of it. If you do decide to do this, make very sure you have the correct pool to destroy and any other data on the system safely backed up. The force option (the don't care, do it anyway option) would be necessary because the pool cannot be opened, so it is unknown to zfs if data is still there.
Agreed, this is indeed what I was thinking. The pool is a test pool with a few old striped SSDs, so no RAIDZ, since it is meant to be a storage location for transient data whose loss is trivial.

Given how difficult it is to get information out of the system, I am getting the feeling that if an actual error occurs on a "proper" pool, it'll be a hassle to resolve. That worries me; I highly doubt the robustness and maintainability of the system.

By robustness I do not mean the pool going bad. That is what it is, and it's something I caused myself by choosing to take this risk. The risk I chose to take, though, is with that specific pool. The risk I did not sign up for is the lack of information provided by the system, and the effect that a failed pool that contains NO data and is irrelevant to the operation of the system has on said system.

Anyway, thank you for the reply and your insights.
 

nemesis1782

An update for anyone interested.

I have been able to remove the pool. After a reboot the pool would always go back to healthy. The datasets were at that point available in the UI.

I removed the datasets and then removed the pool, which is fine for this situation and works.

It would still be nice to know where to find the information on how to do this properly...
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
It would still be nice to know where to find the information on how to do this properly...
Here? https://www.truenas.com/docs/scale/scaletutorials/storage/pools/managepoolsscale/

Exporting/Disconnecting or Deleting a Pool

The Export/Disconnect button allows you to disconnect a pool and transfer drives to a new system where you can import the pool. It also lets you completely delete the pool and any data stored on it.

Click on Export/Disconnect on the Storage Dashboard.

Figure 3: Export/Disconnect Pool Window
A dialog displays showing any system services affected by exporting the pool.

To delete the pool and erase all the data on the pool, select Destroy data on this pool. The pool name field displays at the bottom of the window. Type the pool name into this field. To export the pool, do not select this option.

Select Delete configuration of shares that used this pool? to delete shares connected to the pool.

Select Confirm Export/Disconnect

Click Export/Disconnect. A confirmation dialog displays when the export/disconnect completes.
 

nemesis1782

Not sure if you're trying to be helpful or sarcastic, so I'm not sure how to respond. I'll assume it was meant to be helpful.

The document you linked is an overview of UI functions, providing the same information that is already present in the UI itself, if I may add. It adds little for anyone who has a technical background.

What I would expect is documentation that describes the following:
- When you can safely use the UI and when you cannot. Apparently there are quite a few cases where you cannot.
--> So, prerequisites, as the system does not verify beforehand whether the action can be performed safely
- What to do in which error situations
--> Basically, if a pool is stuck on exporting, a reboot will fix it (at least that seems to be the case)
- How and where to find the information when an error occurs
--> For instance: logging can be found here, look for this, make sure these services are running, etc.
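To illustrate the last point, here is a rough sketch of pulling the per-device error counters out of `zpool status` output instead of relying on the UI. The sample text and the pool name `scratch` are hypothetical, the parser only handles the usual NAME/STATE/READ/WRITE/CKSUM table, and counters are kept as strings because `zpool status` may abbreviate large values.

```python
import subprocess


def devices_with_errors(status_text: str) -> dict[str, tuple[str, str, str]]:
    """Return {device: (read, write, cksum)} for rows with a nonzero counter."""
    flagged = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # header row of the device table
            continue
        if not in_config:
            continue
        if not fields:
            in_config = False  # a blank line ends the device table
            continue
        if len(fields) < 5:
            continue  # section labels such as "cache" or spare entries
        name, _state, read, write, cksum = fields[:5]
        if any(counter != "0" for counter in (read, write, cksum)):
            flagged[name] = (read, write, cksum)
    return flagged


def pool_status(pool: str) -> str:
    """Fetch live status text; assumes the `zpool` CLI is on the PATH."""
    result = subprocess.run(
        ["zpool", "status", "-v", pool],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical sample, not output from the system discussed in this thread.
SAMPLE = """\
  pool: scratch
 state: ONLINE
config:

\tNAME        STATE     READ WRITE CKSUM
\tscratch     ONLINE       0     0     0
\t  sda       ONLINE       0     0     0
\t  sdb       ONLINE       3     0    12

errors: No known data errors
"""

if __name__ == "__main__":
    print(devices_with_errors(SAMPLE))
```

On a live system, `zpool status -v` and `zpool events -v` are the usual starting points. On TrueNAS the middleware log is, as far as I know, at /var/log/middlewared.log, but verify that on your version. Since the counters were seen to reset after a reboot, it is worth capturing this output before restarting.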
 