After one disk got lost from the pool, not possible to add it back

vafk

Contributor
Joined
Jun 22, 2017
Messages
132
I had a pool consisting of 7 disks (2 TB each) in RAIDZ2 in a Dell PowerVault MD1000, da0 to da6. The PowerVault is used for additional backups and is only powered on when needed. Yesterday, after powering it on, the pool showed as DEGRADED with disk da4 listed as N/A. I rebooted TrueNAS and the MD1000, and the pool showed as healthy again. A short SMART test on da4 passed.

After the backup finished overnight, I found that da4 was N/A again; this time the pool showed as healthy, but with less capacity because da4 had been removed.

I wanted to add da4 back to the pool, but the only available option, "ADD VDEV", shows "This type of VDEV requires at least 4 disks".

I understand that da4 was somehow lost from the pool, the pool was degraded and somehow corrected itself, but now I cannot get the lost disk back into the array.

What can I do? Thank you!
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
with less capacity because da4 had been removed.
a zfs pool cannot lose capacity in the way you are describing. the capacity of the pool will be the same no matter how many disks are online. if too many disks are offline, the pool will simply become unavailable.
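you can check this from the shell. a minimal sketch, assuming your pool is named tank (the pool name is an assumption, substitute your own):
Code:
# SIZE stays the same even while the pool is DEGRADED;
# a missing disk costs you redundancy, not capacity
zpool list -v tank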

your disk is dead; that is what DEGRADED means: a disk is missing. you need to replace it. if a disk keeps disappearing, the smart results are irrelevant. immediately begin planning for its replacement, as that disk is no longer reliable. zfs will fail disks out of the pool if they return too much corrupted data, and a disk that is physically failing will disappear as it fails semi-randomly and the controller loses access to it.

adding to the pool is expanding it; with a raidz2, that means adding another raidz2 vdev to the pool.
this is NOT replacing a disk. a disk that drops from a pool will begin resilvering as soon as it reappears. the fact that it hasn't tells us that it's on its last legs.
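for reference, replacing is done with zpool replace, not ADD VDEV. a minimal sketch from the shell, assuming the pool is named tank and the replacement disk shows up as da7 (both names are assumptions, use your own):
Code:
# take the failing disk out of service (if it is still attached)
zpool offline tank da4
# resilver the old disk's data onto the replacement
zpool replace tank da4 da7
# watch the resilver progress
zpool status -v tank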
 

vafk

Contributor
Joined
Jun 22, 2017
Messages
132
@artlessknave

Thank you. To explain what happened: I believe the drive did not fail but was not mounted correctly because of the hardware setup. The MD1000 is connected via an interface card to an HP MicroServer. Whenever I need to run the backup, I switch the MD1000 on by remotely powering up the APC UPS. This went well for over a year, but yesterday what I wrote above happened. The HDD is still there and it is still good, but I cannot get it back into the existing array.

Was the pool resilvered automatically after it lost the HDD (no matter why, whether it failed or was mounted wrong)? If the pool resilvered and so "healed" itself, what told the pool to shrink? Whatever the reason an HDD fails, it should leave me the decision to replace it in order to keep the pool's size. But now I cannot replace the missing drive, neither with the old one nor with a new one. The only option I see is to delete the pool and create it again. Or am I missing something? Thank you!
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
as long as the old disk is reachable, zfs will automatically resilver it. a disk is not removed from the pool automatically, merely offlined if it produces too many errors, until an admin can fix the problem.
if this is not happening, then something has happened to prevent it. something significant. or the pool was not designed the way you are expecting.
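if the disk really did drop only because of a transient connection problem, bringing it back online should kick off the resilver. a minimal sketch, again assuming the pool is named tank:
Code:
# reattach the offlined disk; zfs should start a resilver if the disk is usable
zpool online tank da4
# check for "resilver in progress" and per-disk state
zpool status -v tank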

I am not at home right now, so I can't check against my install, but I think these commands will help figure out what is going on.
Code:
zpool status -v    # pool layout, per-disk state, and any read/write/checksum errors
gpart show         # partition tables; check whether da4 still has its ZFS partition
glabel status      # maps gptid labels back to device names like da4p2
 