How screwed am I ZFS degraded

jdabb

Dabbler
Joined
Aug 4, 2019
Messages
42
I just replaced one of my faulted drives, and the new drive now shows up as "replacing". I can also see some ghost drives listed as either unavailable or removed. I have more faulted drives as well, and my pool is currently running degraded. How screwed am I, and how can I fix the drive that is stuck on replacing?

1694561270926.png
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Your vDev is RaidZ2, so it can survive the loss of 2 drives.

The first thing to do is a complete backup of all your data while it is still available.

Once the backup is done, you will need to replace these faulted drives. Please post the output of zpool status -v as well as a full description of your hardware.
 

jdabb

~ > zpool status -v
  pool: Dexter
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Sep 12 19:00:08 2023
    2.87T scanned at 1.75G/s, 1.68T issued at 1.02G/s, 60.3T total
    8.97G resilvered, 2.78% done, 16:22:35 to go
config:

    NAME                                        STATE     READ WRITE CKSUM
    Dexter                                      DEGRADED     0     0     0
      raidz2-0                                  DEGRADED     0     0     0
        a748340d-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
        676cfe29-d36f-11e9-a78e-0cc47a694170    ONLINE       0     0     0
        b14af7d5-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
        b64d09c5-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
        replacing-4                             DEGRADED     0     0     0
          7596868719220382264                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/bb52065f-bb16-11e9-9bb0-0cc47a694170
          2f72e03e-2820-4415-bee7-85e456a93abf  REMOVED      0     0     0
          1c269990-88ef-467e-a68d-8d2350fa95da  REMOVED      0     0     0
          fb3bf593-68cc-4f77-b24f-a377d94636e3  ONLINE       0     0     0  (resilvering)
        c05bd936-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
      raidz2-1                                  DEGRADED     0     0     0
        63508770-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        648d22a5-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        65bb283c-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        66ddb31a-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        66297536752324112                       UNAVAIL      0     0     0  was /dev/disk/by-partuuid/6abdb6cd-9d3f-11ea-81d7-0cc47a694170
        6fc92053-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
      raidz2-2                                  DEGRADED     0     0     0
        97d795c2-13a2-4434-afd7-c28caefc4165    DEGRADED     0     0     0  too many errors
        c77dd2f1-3f2f-4949-b73a-d079255a0847    ONLINE       0     0     0
        c0f28785-1bef-4e1b-a655-69e421ca1343    ONLINE       0     0     0  (resilvering)
        5af512ea-f485-4ca1-ada5-ab1a7f46b51c    DEGRADED     0     0     0  too many errors
        78e48027-c7cb-46df-bdfb-6e8998b0f469    ONLINE       0     0     0
        80aab91e-66b9-48d6-82f1-c0009d764d37    ONLINE       0     0     0  (resilvering)
      raidz2-3                                  DEGRADED     0     0     0
        a3260249-2c26-42c0-aebc-f5804f47bbb6    ONLINE       0     0     0
        b6aba92c-08ae-4263-987c-47f56687c278    ONLINE       0     0     0
        d25945ea-09ae-4f21-a59c-2352411435b0    ONLINE       0     0     1  (resilvering)
        2c7e5a12-4122-4702-8da4-4fc0ec6d5492    ONLINE       0     0     0  (resilvering)
        ad359653-8699-4821-a89e-3c7c715175de    DEGRADED     0     0     0  too many errors
        eee2e748-20c7-48c6-b508-65e700ce95d3    ONLINE       0     0     0

errors: No known data errors
 

Heracles

Thanks for the output.

So you have 4x RaidZ2 vdevs. Each one can lose 2 drives without losing your data.

So vDev 0 :
One drive is down and is being actively replaced as of now (see the drive marked as re-silvering).

So this vDev is degraded but still safe (still 1 redundant drive present) and is about to come back perfect (once re-silvering is over).

vDev 1 :
One disk is down (missing).

So this vDev is degraded, but the situation is not dramatic. The missing drive must be re-inserted or replaced with a new one. Until then, the vDev is safe because there is still 1 redundant drive, but do not gamble on this: fix that vDev ASAP.

vDev 2:
That one is in bad shape. You have 3 drives with problems when RaidZ2 can survive the loss of only 2. Luckily, some drives are only partially affected. As long as their problems do not hit the very same data, ZFS will work around them. One of these drives is re-silvering, but even once that is done you will still have 2 drives with problems. Let the re-silvering finish and then work on replacing the other drives.

vDev 3 :
2 drives re-silvering and another in trouble. Again, this is shaky.
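
The per-vDev count above can be reproduced from the zpool status text. Below is a minimal Python sketch, not an official tool: the function name `problem_slots_per_vdev` is made up for illustration, and it assumes that a slot counts as a problem when its state is not ONLINE or when it is still marked (resilvering). That is a deliberately conservative reading, since a resilvering drive may already hold most of its data.

```python
import re

# Matches top-level raidz vdev names such as "raidz2-3"; the digit is the parity.
VDEV_RE = re.compile(r"^raidz(\d)-\d+$")

def problem_slots_per_vdev(status_text):
    """Count non-healthy slots per raidz vdev in `zpool status` text.

    Assumption (ours, not ZFS's): a slot is a problem when its state is
    not ONLINE or when the line is still marked "(resilvering)".
    """
    result = {}
    current = None       # name of the vdev currently being read
    slot_indent = None   # indentation of that vdev's direct children
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        indent = len(line) - len(line.lstrip())
        name, state = fields[0], fields[1]
        match = VDEV_RE.match(name)
        if match:
            current, slot_indent = name, None
            result[current] = {"parity": int(match.group(1)), "problems": 0}
            continue
        if current is None:
            continue
        if slot_indent is None:
            slot_indent = indent  # first line under the vdev sets the slot depth
        if indent != slot_indent:
            continue  # e.g. the disks nested under "replacing-4"
        if state != "ONLINE" or "(resilvering)" in line:
            result[current]["problems"] += 1
    return result

# Trimmed excerpt of the output posted above (some healthy disks omitted).
SAMPLE = """\
  raidz2-0                                  DEGRADED     0     0     0
    a748340d-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
    replacing-4                             DEGRADED     0     0     0
      7596868719220382264                   UNAVAIL      0     0     0
      fb3bf593-68cc-4f77-b24f-a377d94636e3  ONLINE       0     0     0  (resilvering)
    c05bd936-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
  raidz2-3                                  DEGRADED     0     0     0
    a3260249-2c26-42c0-aebc-f5804f47bbb6    ONLINE       0     0     0
    b6aba92c-08ae-4263-987c-47f56687c278    ONLINE       0     0     0
    d25945ea-09ae-4f21-a59c-2352411435b0    ONLINE       0     0     1  (resilvering)
    2c7e5a12-4122-4702-8da4-4fc0ec6d5492    ONLINE       0     0     0  (resilvering)
    ad359653-8699-4821-a89e-3c7c715175de    DEGRADED     0     0     0  too many errors
    eee2e748-20c7-48c6-b508-65e700ce95d3    ONLINE       0     0     0
"""

for vdev, info in problem_slots_per_vdev(SAMPLE).items():
    margin = info["parity"] - info["problems"]
    print(f"{vdev}: {info['problems']} problem slot(s), margin {margin}")
```

The whole replacing-4 group is counted as a single slot by reading its own state line and skipping the disks nested under it. A margin of zero or below means the vDev has no spare redundancy under this conservative count, which matches the "shaky" verdict for vDev 3.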


What surprises me here is how many problematic drives you have at once. Are these drives SMR? What exact model are they?
It can also be a sign of bad cabling, problematic ports, problematic RAM, and so on.

In any case, it looks like something is wrong with your hardware. Go easy on the server (stop any services like Plex, torrents, or whatever else you run) and let it re-silver everything.
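
To get a feel for how long the re-silver needs to be left alone, the "to go" figure in the scan line can be sanity-checked by hand. This is a small Python sketch that assumes the estimate is roughly the remaining data divided by the current issue rate, and that T and G in the output are powers of 1024. The helper name `resilver_hours_left` is made up for illustration.

```python
def resilver_hours_left(total_tib, issued_tib, rate_gib_per_s):
    """Estimate remaining hours as (total - issued) / issue rate."""
    remaining_gib = (total_tib - issued_tib) * 1024  # TiB -> GiB
    return remaining_gib / rate_gib_per_s / 3600     # seconds -> hours

# Figures taken from the scan line posted above:
# "1.68T issued at 1.02G/s, 60.3T total"
hours = resilver_hours_left(total_tib=60.3, issued_tib=1.68, rate_gib_per_s=1.02)
print(f"about {hours:.1f} hours left")
```

This lands close to the 16:22:35 reported by zpool status. The estimate moves with the issue rate, so any extra load on the pool pushes it out.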

Once re-silvering is done, try to get your vDevs back to a regular state by replacing the missing / problematic drives.

Once your vDevs are stabilized, do a complete backup of that pool.

Once you have fixed as many vDevs as you can and completed a full backup, you will have to identify which piece of hardware is problematic.
 

jdabb

Most of these drives were bought used from eBay (Seagate Desktop HDD ST4000DM000 4TB 5.9K 6gb/s 3.5" SATA Hard Drive _ Dell VF3T3).
I haven't been as proactive as I should have been, so it is partly my fault, along with the fact that the drives already had a lot of hours on them.
 

Heracles

OK... I hope you have a lot of spare parts, because you need to replace many of them, and know that most of your other drives will die the same way sooner rather than later...

How many more drives can you fit in that server? You may be better off buying a few much larger new drives that will last longer, instead of a ton of used ones that will die as soon as you plug them into your server...

You have a high level of local redundancy, but working with that many used drives will require continuous and strict monitoring of your situation, as well as quick replacement of anything at fault.
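
As one possible starting point for that monitoring, here is a minimal Python sketch that flags every device line in zpool status text whose state is not ONLINE or whose READ / WRITE / CKSUM counters are non-zero, so a drive that is still ONLINE but already throwing checksum errors does not go unnoticed. The function name `devices_needing_attention` is made up for illustration, and feeding it real output (for example from a scheduled job) and sending alerts are left out.

```python
import re

# A counter is a plain number, possibly abbreviated by zpool (e.g. "1.2K").
COUNTER_RE = re.compile(r"^\d+(\.\d+)?[KMGTP]?$")

def devices_needing_attention(status_text):
    """Return (name, state, read, write, cksum) for suspicious device lines."""
    flagged = []
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        name, state, counters = fields[0], fields[1], fields[2:5]
        if not all(COUNTER_RE.match(c) for c in counters):
            continue  # header or other non-device line
        if state != "ONLINE" or any(c != "0" for c in counters):
            flagged.append((name, state, *counters))
    return flagged

# Three device lines taken from the raidz2-3 section posted earlier.
SAMPLE_DEVICES = """\
    NAME                                        STATE     READ WRITE CKSUM
        b6aba92c-08ae-4263-987c-47f56687c278    ONLINE       0     0     0
        d25945ea-09ae-4f21-a59c-2352411435b0    ONLINE       0     0     1  (resilvering)
        ad359653-8699-4821-a89e-3c7c715175de    DEGRADED     0     0     0  too many errors
"""

for name, state, read, write, cksum in devices_needing_attention(SAMPLE_DEVICES):
    print(f"{name}: {state} read={read} write={write} cksum={cksum}")
```

Counters are kept as strings on purpose so that abbreviated values are still flagged without having to convert them. Pool and vDev summary lines are reported too when they are not ONLINE, which is usually what you want in an alert.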
 

jdabb

After resilvering, I get "insufficient replicas" for the drive I was replacing.

Code:
zpool status -v
  pool: Dexter
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
    repaired.
  scan: resilvered 48.3G in 02:18:41 with 0 errors on Wed Sep 13 15:36:27 2023
config:

    NAME                                        STATE     READ WRITE CKSUM
    Dexter                                      DEGRADED     0     0     0
      raidz2-0                                  DEGRADED     0     0     0
        a748340d-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
        676cfe29-d36f-11e9-a78e-0cc47a694170    ONLINE       0     0     0
        b14af7d5-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
        b64d09c5-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
        replacing-4                             UNAVAIL      0     0     0  insufficient replicas
          7596868719220382264                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/bb52065f-bb16-11e9-9bb0-0cc47a694170
          2f72e03e-2820-4415-bee7-85e456a93abf  REMOVED      0     0     0
          1c269990-88ef-467e-a68d-8d2350fa95da  REMOVED      0     0     0
          fb3bf593-68cc-4f77-b24f-a377d94636e3  FAULTED      0    13     0  too many errors
        c05bd936-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
      raidz2-1                                  DEGRADED     0     0     0
        63508770-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        648d22a5-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        65bb283c-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        66ddb31a-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        66297536752324112                       UNAVAIL      0     0     0  was /dev/disk/by-partuuid/6abdb6cd-9d3f-11ea-81d7-0cc47a694170
        6fc92053-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
      raidz2-2                                  DEGRADED     0     0     0
        97d795c2-13a2-4434-afd7-c28caefc4165    DEGRADED     0     0     0  too many errors
        c77dd2f1-3f2f-4949-b73a-d079255a0847    ONLINE       0     0     0
        c0f28785-1bef-4e1b-a655-69e421ca1343    FAULTED    160     0     0  too many errors
        5af512ea-f485-4ca1-ada5-ab1a7f46b51c    DEGRADED     0     0     0  too many errors
        78e48027-c7cb-46df-bdfb-6e8998b0f469    DEGRADED   145     0     0  too many errors
        80aab91e-66b9-48d6-82f1-c0009d764d37    ONLINE       0     0    52
      raidz2-3                                  DEGRADED     0     0     0
        a3260249-2c26-42c0-aebc-f5804f47bbb6    ONLINE       0     0     0
        b6aba92c-08ae-4263-987c-47f56687c278    ONLINE       0     0     0
        d25945ea-09ae-4f21-a59c-2352411435b0    ONLINE       0     0     1
        2c7e5a12-4122-4702-8da4-4fc0ec6d5492    FAULTED    112     0     0  too many errors
        ad359653-8699-4821-a89e-3c7c715175de    DEGRADED     0     0     0  too many errors
        eee2e748-20c7-48c6-b508-65e700ce95d3    ONLINE       0     0     0

 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Please use the [CODE][/CODE] tags instead... it's very difficult to understand the output from mobile.
 