How screwed am I ZFS degraded

jdabb · Sep 12, 2023

I just replaced one of my faulted drives, and now the drive shows up as replacing and I can see some ghost drives listed as either unavailable or removed. I also have more faulted drives, and my pool is running degraded right now. How screwed am I, and how can I fix the drive that is stuck on replacing?

Heracles · Sep 12, 2023

Your vDev is RaidZ2, so it can survive the loss of 2 drives.

The first thing to do is a complete backup of all your data while it is still available.

Once the backup is done, you will need to replace these faulted drives. Please post the output of zpool status -v as well as a complete description of your hardware.

jdabb · Sep 12, 2023


~ > zpool status -v
  pool: Dexter
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Sep 12 19:00:08 2023
    2.87T scanned at 1.75G/s, 1.68T issued at 1.02G/s, 60.3T total
    8.97G resilvered, 2.78% done, 16:22:35 to go
config:

    NAME                                        STATE     READ WRITE CKSUM
    Dexter                                      DEGRADED     0     0     0
      raidz2-0                                  DEGRADED     0     0     0
        a748340d-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
        676cfe29-d36f-11e9-a78e-0cc47a694170    ONLINE       0     0     0
        b14af7d5-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
        b64d09c5-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
        replacing-4                             DEGRADED     0     0     0
          7596868719220382264                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/bb52065f-bb16-11e9-9bb0-0cc47a694170
          2f72e03e-2820-4415-bee7-85e456a93abf  REMOVED      0     0     0
          1c269990-88ef-467e-a68d-8d2350fa95da  REMOVED      0     0     0
          fb3bf593-68cc-4f77-b24f-a377d94636e3  ONLINE       0     0     0  (resilvering)
        c05bd936-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
      raidz2-1                                  DEGRADED     0     0     0
        63508770-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        648d22a5-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        65bb283c-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        66ddb31a-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        66297536752324112                       UNAVAIL      0     0     0  was /dev/disk/by-partuuid/6abdb6cd-9d3f-11ea-81d7-0cc47a694170
        6fc92053-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
      raidz2-2                                  DEGRADED     0     0     0
        97d795c2-13a2-4434-afd7-c28caefc4165    DEGRADED     0     0     0  too many errors
        c77dd2f1-3f2f-4949-b73a-d079255a0847    ONLINE       0     0     0
        c0f28785-1bef-4e1b-a655-69e421ca1343    ONLINE       0     0     0  (resilvering)
        5af512ea-f485-4ca1-ada5-ab1a7f46b51c    DEGRADED     0     0     0  too many errors
        78e48027-c7cb-46df-bdfb-6e8998b0f469    ONLINE       0     0     0
        80aab91e-66b9-48d6-82f1-c0009d764d37    ONLINE       0     0     0  (resilvering)
      raidz2-3                                  DEGRADED     0     0     0
        a3260249-2c26-42c0-aebc-f5804f47bbb6    ONLINE       0     0     0
        b6aba92c-08ae-4263-987c-47f56687c278    ONLINE       0     0     0
        d25945ea-09ae-4f21-a59c-2352411435b0    ONLINE       0     0     1  (resilvering)
        2c7e5a12-4122-4702-8da4-4fc0ec6d5492    ONLINE       0     0     0  (resilvering)
        ad359653-8699-4821-a89e-3c7c715175de    DEGRADED     0     0     0  too many errors
        eee2e748-20c7-48c6-b508-65e700ce95d3    ONLINE       0     0     0

errors: No known data errors

Heracles · Sep 12, 2023

Thanks for the output.

So you have 4x RaidZ2 vDevs. Each one can lose 2 drives without losing your data.

So vDev 0 :
One drive is down and is being actively replaced as of now (see the drive marked as re-silvering).

So this vDev is degraded but still safe (still 1 redundant drive present) and is about to come back perfect (once re-silvering is over).

vDev 1 :
One disk is down (missing).

So this vDev is degraded but the situation is not dramatic. That missing drive must be re-inserted or a new drive must replace it. Until then, the vDev is safe because there is still 1 redundant drive, but do not gamble on this and fix that vDev ASAP.

vDev 2:
That one is in bad shape. You have 3 drives with problems when RaidZ2 can survive the loss of only 2. Luckily, some drives are only partially affected. As long as their problems do not hit the very same data, ZFS will manage around that. One of these drives is re-silvering, but even once that is done, you will still have 2 drives with problems. Let the re-silvering finish and then work on replacing the other drives.

vDev 3 :
2 drives re-silvering and another in trouble. Again, this is shaky.
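To make the per-vDev reasoning above a bit more concrete, here is a minimal Python sketch that counts, for each top-level vDev, how many of its direct members are not ONLINE and compares that with the parity level taken from the vDev name. It is only a rough illustration: the function names are my own, the sample is abridged from the zpool status output above, and it assumes the usual 2-space nesting of that output. It treats anything not ONLINE as lost, which is a pessimistic reading (a DEGRADED drive is still serving data), and it does not flag drives that are ONLINE but still resilvering.

```python
import re

# Abridged from the zpool status output earlier in this thread.
SAMPLE = """\
    NAME                                        STATE     READ WRITE CKSUM
    Dexter                                      DEGRADED     0     0     0
      raidz2-0                                  DEGRADED     0     0     0
        a748340d-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
        replacing-4                             DEGRADED     0     0     0
          7596868719220382264                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/bb52065f-bb16-11e9-9bb0-0cc47a694170
          fb3bf593-68cc-4f77-b24f-a377d94636e3  ONLINE       0     0     0  (resilvering)
      raidz2-2                                  DEGRADED     0     0     0
        97d795c2-13a2-4434-afd7-c28caefc4165    DEGRADED     0     0     0  too many errors
        c77dd2f1-3f2f-4949-b73a-d079255a0847    ONLINE       0     0     0
        5af512ea-f485-4ca1-ada5-ab1a7f46b51c    DEGRADED     0     0     0  too many errors
"""


def unhealthy_per_vdev(status_text):
    """Map each top-level vDev to the number of direct members not ONLINE."""
    rows = []
    in_config = False
    for raw in status_text.expandtabs(8).splitlines():
        fields = raw.split()
        if not in_config:
            # The device table starts right after the NAME/STATE header.
            in_config = fields[:2] == ["NAME", "STATE"]
            continue
        if len(fields) < 2 or fields[0] == "errors:":
            break  # blank line or trailer ends the table
        indent = len(raw) - len(raw.lstrip())
        rows.append((indent, fields[0], fields[1]))

    # Assumption: first indent level is the pool, second the vDevs,
    # third their direct members; deeper rows (inside replacing-N) are ignored.
    levels = sorted({indent for indent, _, _ in rows})
    if len(levels) < 3:
        return {}
    vdev_level, member_level = levels[1], levels[2]

    counts = {}
    current = None
    for indent, name, state in rows:
        if indent == vdev_level:
            current = name
            counts[current] = 0
        elif indent == member_level and current is not None and state != "ONLINE":
            counts[current] += 1
    return counts


def parity_of(vdev_name):
    """Parity drives implied by a raidzN vDev name; 0 for anything else."""
    match = re.match(r"raidz(\d)", vdev_name)
    return int(match.group(1)) if match else 0


for vdev, bad in unhealthy_per_vdev(SAMPLE).items():
    print(f"{vdev}: {bad} member(s) not ONLINE, worst-case margin {parity_of(vdev) - bad}")
```

A margin of 0 or less does not mean the data is gone, only that the vDev has no headroom left if every flagged drive were to drop out completely.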

What surprises me here is how many problematic drives you have at once. Are these drives SMR? What exact model are they?
That can also be a sign of bad cabling, problematic ports, problematic RAM, and so on.

In any case, it looks like something is wrong in your hardware. Try to go easy on the server (stop any service like Plex, Torrent or whatever you have) and let it re-silver everything.

Once re-silvering is done, try to get your vDevs back to a regular state by replacing the missing / problematic drives.

Once your vDevs are stabilized, do a complete backup of that pool.

Once you have fixed as many vDevs as you can and completed a full backup, you will have to identify which piece of hardware is problematic.
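As a starting point for that hardware hunt, here is a hedged Python sketch that reads the attribute table printed by smartctl -A and separates counters that usually point at the disk surface from the CRC counter that usually points at cabling or ports. The attribute names are the ones smartctl commonly reports, but they vary by drive model, so treat the WATCHED table and the function names as my own assumptions rather than a definitive list.

```python
import subprocess

# Assumption: these attribute names match what smartctl prints for your drives.
WATCHED = {
    "Reallocated_Sector_Ct": "media wear",
    "Current_Pending_Sector": "media wear",
    "Offline_Uncorrectable": "media wear",
    "UDMA_CRC_Error_Count": "cabling or port",
}


def parse_attributes(smart_text):
    """Extract the raw values of the watched attributes from smartctl -A text."""
    found = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have the raw value in column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit():
                found[fields[1]] = int(raw)
    return found


def likely_causes(attrs):
    """Return the suspected problem areas for attributes with a non-zero raw value."""
    return sorted({WATCHED[name] for name, raw in attrs.items() if raw > 0})


def read_attributes(device):
    """Run smartctl -A on one device (needs root) and parse the result."""
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return parse_attributes(result.stdout)
```

Calling likely_causes(read_attributes("/dev/sda")) for each disk (the device path is just an example) gives a first hint: CRC errors spread across many drives would suggest cables, backplane or controller, while reallocated or pending sectors on individual drives would suggest the disks themselves.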

jdabb · Sep 12, 2023

most of these drives are bought used from ebay (Seagate Desktop HDD ST4000DM000 4TB 5.9K 6gb/s 3.5" SATA Hard Drive _ Dell VF3T3)
I haven't been as proactive as I should have been, so it's partly my fault, on top of the fact that they had a lot of hours on them.

Heracles · Sep 12, 2023

Ok... I hope you have a lot of spare drives, because you need to replace many of them, and know that most of your other drives will die the same way sooner rather than later...

How many more drives can you fit in that server? You may be better off buying a few much larger new drives that will last longer, instead of buying a ton of used ones that will die as soon as you plug them into your server...

You have a high level of local redundancy, but working with that many used drives will require continuous and strict monitoring of your situation, as well as quick replacement of anything at fault.
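For the monitoring part, one rough way to do it yourself is a small script run from cron that asks ZFS for the health of every pool and reports anything that is not ONLINE. The sketch below assumes the tab-separated output of zpool list -H -o name,health; the function names are mine, and how you get notified (mail, chat, etc.) is left out.

```python
import subprocess


def unhealthy_pools(list_output):
    """Return (pool, health) pairs for every pool whose health is not ONLINE."""
    bad = []
    for line in list_output.splitlines():
        parts = line.split("\t")  # -H prints tab-separated fields without a header
        if len(parts) == 2 and parts[1].strip() != "ONLINE":
            bad.append((parts[0], parts[1].strip()))
    return bad


def check_once():
    """Query the live system once; raises if the zpool command itself fails."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health"],
        capture_output=True, text=True, check=True,
    )
    return unhealthy_pools(result.stdout)
```

An empty list from check_once() means every pool reports ONLINE; anything else is your cue to look at zpool status -v right away.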

jdabb · Sep 13, 2023

After resilvering, I get "insufficient replicas" for the drive I was replacing.

Code:

zpool status -v
  pool: Dexter
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
    repaired.
  scan: resilvered 48.3G in 02:18:41 with 0 errors on Wed Sep 13 15:36:27 2023
config:

    NAME                                        STATE     READ WRITE CKSUM
    Dexter                                      DEGRADED     0     0     0
      raidz2-0                                  DEGRADED     0     0     0
        a748340d-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
        676cfe29-d36f-11e9-a78e-0cc47a694170    ONLINE       0     0     0
        b14af7d5-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
        b64d09c5-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
        replacing-4                             UNAVAIL      0     0     0  insufficient replicas
          7596868719220382264                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/bb52065f-bb16-11e9-9bb0-0cc47a694170
          2f72e03e-2820-4415-bee7-85e456a93abf  REMOVED      0     0     0
          1c269990-88ef-467e-a68d-8d2350fa95da  REMOVED      0     0     0
          fb3bf593-68cc-4f77-b24f-a377d94636e3  FAULTED      0    13     0  too many errors
        c05bd936-bb16-11e9-9bb0-0cc47a694170    ONLINE       0     0     0
      raidz2-1                                  DEGRADED     0     0     0
        63508770-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        648d22a5-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        65bb283c-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        66ddb31a-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
        66297536752324112                       UNAVAIL      0     0     0  was /dev/disk/by-partuuid/6abdb6cd-9d3f-11ea-81d7-0cc47a694170
        6fc92053-9d3f-11ea-81d7-0cc47a694170    ONLINE       0     0     0
      raidz2-2                                  DEGRADED     0     0     0
        97d795c2-13a2-4434-afd7-c28caefc4165    DEGRADED     0     0     0  too many errors
        c77dd2f1-3f2f-4949-b73a-d079255a0847    ONLINE       0     0     0
        c0f28785-1bef-4e1b-a655-69e421ca1343    FAULTED    160     0     0  too many errors
        5af512ea-f485-4ca1-ada5-ab1a7f46b51c    DEGRADED     0     0     0  too many errors
        78e48027-c7cb-46df-bdfb-6e8998b0f469    DEGRADED   145     0     0  too many errors
        80aab91e-66b9-48d6-82f1-c0009d764d37    ONLINE       0     0    52
      raidz2-3                                  DEGRADED     0     0     0
        a3260249-2c26-42c0-aebc-f5804f47bbb6    ONLINE       0     0     0
        b6aba92c-08ae-4263-987c-47f56687c278    ONLINE       0     0     0
        d25945ea-09ae-4f21-a59c-2352411435b0    ONLINE       0     0     1
        2c7e5a12-4122-4702-8da4-4fc0ec6d5492    FAULTED    112     0     0  too many errors
        ad359653-8699-4821-a89e-3c7c715175de    DEGRADED     0     0     0  too many errors
        eee2e748-20c7-48c6-b508-65e700ce95d3    ONLINE       0     0     0

Davvo · Sep 13, 2023

Please use the [CODE][/CODE] tags instead... it's very difficult to understand the output from mobile.
