Checksum errors - hex files

naximov
Cadet · Joined Jun 11, 2022 · Messages: 7
The pool detected some checksum errors. I replaced the HBA card AND the SATA cables for all affected drives, and I'm running a scrub again after rebooting. It seems the errors are still there...

Smartctl is all clear...

Any idea how to fix these hex file errors?

Code:
root@truenas[~]# zpool status -v Raiden2                                                                                                                                          

  pool: Raiden2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Sat Nov 12 18:39:44 2022
        8.92T scanned at 977M/s, 3.73T issued at 409M/s, 8.92T total
        1M repaired, 41.88% done, 03:41:21 to go
config:

        NAME                                      STATE     READ WRITE CKSUM
        Raiden2                                   ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            eb91b763-32ae-4e5a-a591-ce74f8478fac  ONLINE       0     0     2
            26eb6cb2-aa56-4841-9cb8-242d9edbb180  ONLINE       0     0     2
          mirror-1                                ONLINE       0     0     0
            ad1493a1-f58c-444d-bccc-1994bb651715  ONLINE       0     0     0
            4609d5b1-ff07-488a-9d47-57a4709a86e0  ONLINE       0     0     0
          mirror-3                                ONLINE       0     0     0
            3e63f1da-e5ad-4afe-9dbc-af276c881c65  ONLINE       0     0     0
            a638e79c-3bd9-4c26-be1f-b216e4e5fcc1  ONLINE       0     0     0
          mirror-4                                ONLINE       0     0     0
            95ddfaf9-defa-421e-9250-e3dce9aa59bd  ONLINE       0     0     1  (repairing)
            2e9bbeed-1d96-40e8-8f6e-4e50983da1d9  ONLINE       0     0     0
        special
          mirror-5                                ONLINE       0     0     0
            f1c0dc58-54b8-4a7e-940c-74ecf1862b4e  ONLINE       0     0     0
            41ae18fd-230d-4d8c-83a4-14a113e4bece  ONLINE       0     0     0
        cache
          a032af66-2ced-4199-86a6-f4db5f4bb383    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
        <0x13665>:<0x3130e>
        <0x13665>:<0x30876>
        <0xa7c>:<0x3130e>
 

Samuel Tai
Never underestimate your own stupidity
Moderator · Joined Apr 24, 2020 · Messages: 5,399
Those aren't files; that's corrupted file system metadata, which corresponds to the checksum errors in your pool members. The one in mirror-4 is likely <0xa7c>:<0x3130e>, which may be repairable from the other member of the mirror VDEV. The other 2 appear to match the checksum errors in mirror-0, and aren't repairable, as both members of the VDEV are faulty.
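For reference, each entry is `<dataset objsetid>:<object number>`, both in hex. A rough sketch of mapping the first number back to a dataset name, assuming OpenZFS 2.0 or later (which exposes an `objsetid` property); the pool name is the one from this thread:

```shell
# The first hex value in <0x13665>:<0x3130e> is the dataset's objsetid,
# the second is the object number. Convert the objsetid to decimal:
dsid=$(printf '%d' 0x13665)
echo "$dsid"   # 79461

# Then match it against the objsetid property of each dataset
# (uncomment on the actual system; "Raiden2" is this thread's pool):
# zfs list -r -o name,objsetid Raiden2 | awk -v id="$dsid" '$2 == id'
```

If no dataset matches, the objsetid usually belonged to a dataset or snapshot that has since been destroyed, which is why `zpool status` can only print the raw hex.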

The only way to fix this, unfortunately, is to destroy the pool, recreate, and reload from backup.
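The reload itself is the usual replicate / destroy / recreate / receive cycle. A sketch only, assuming a second pool named `backup` with enough free space; pool, dataset, and disk names are placeholders, and the destructive commands are deliberately commented out:

```shell
# Pick a snapshot name for the migration (placeholder naming scheme):
snap="Raiden2@migrate-$(date +%Y%m%d)"
echo "$snap"

# 1. Replicate the whole pool, child datasets and snapshots included:
#    zfs snapshot -r "$snap"
#    zfs send -R "$snap" | zfs receive -Fu "backup/$snap"
# 2. Destroy and recreate the pool (choose the new layout here):
#    zpool destroy Raiden2
#    zpool create Raiden2 mirror da0 da1 mirror da2 da3
# 3. Restore:
#    zfs send -R "backup/$snap" | zfs receive -F Raiden2
```

Verify the backup pool actually imports and scrubs clean before running `zpool destroy`.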
 

naximov
Cadet · Joined Jun 11, 2022 · Messages: 7
Samuel Tai said:
> Those aren't files; that's corrupted file system metadata, which corresponds to the checksum errors in your pool members. The one in mirror-4 is likely <0xa7c>:<0x3130e>, which may be repairable from the other member of the mirror VDEV. The other 2 appear to match the checksum errors in mirror-0, and aren't repairable, as both members of the VDEV are faulty.
>
> The only way to fix this, unfortunately, is to destroy the pool, recreate, and reload from backup.
Thank you, that's a shame!

I've deleted all snapshots and the final results of the scrub were:

Code:
root@truenas[~]# zpool status -v Raiden2

  pool: Raiden2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Sun Nov 13 08:36:25 2022
        5.96T scanned at 6.15G/s, 464G issued at 479M/s, 5.96T total
        0B repaired, 7.61% done, 03:20:54 to go
config:

        NAME                                      STATE     READ WRITE CKSUM
        Raiden2                                   ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            eb91b763-32ae-4e5a-a591-ce74f8478fac  ONLINE       0     0     2
            26eb6cb2-aa56-4841-9cb8-242d9edbb180  ONLINE       0     0     2
          mirror-1                                ONLINE       0     0     0
            ad1493a1-f58c-444d-bccc-1994bb651715  ONLINE       0     0     2
            4609d5b1-ff07-488a-9d47-57a4709a86e0  ONLINE       0     0     2
          mirror-3                                ONLINE       0     0     0
            3e63f1da-e5ad-4afe-9dbc-af276c881c65  ONLINE       0     0     0
            a638e79c-3bd9-4c26-be1f-b216e4e5fcc1  ONLINE       0     0     0
          mirror-4                                ONLINE       0     0     0
            95ddfaf9-defa-421e-9250-e3dce9aa59bd  ONLINE       0     0     1
            2e9bbeed-1d96-40e8-8f6e-4e50983da1d9  ONLINE       0     0     0
        special
          mirror-5                                ONLINE       0     0     0
            f1c0dc58-54b8-4a7e-940c-74ecf1862b4e  ONLINE       0     0     0
            41ae18fd-230d-4d8c-83a4-14a113e4bece  ONLINE       0     0     0
        cache
          a032af66-2ced-4199-86a6-f4db5f4bb383    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0xa7c>:<0x3130e>
        <0xa7c>:<0x30876>

Is there a better pool layout that would prevent or minimize this from happening again?
 

Samuel Tai
Never underestimate your own stupidity
Moderator · Joined Apr 24, 2020 · Messages: 5,399
What model disks do you have? Usually, I only see this sort of corruption with SMR disks.
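A quick way to collect the model strings and check them against the manufacturers' published SMR lists (a sketch; device names are placeholders, and TrueNAS CORE typically exposes disks as /dev/daN):

```shell
# List model strings for every da* disk (uncomment on the actual system):
# for d in /dev/da[0-9]*; do smartctl -i "$d" | grep -E 'Device Model|Product'; done

# Extracting just the model from smartctl -i output looks like this
# (sample output inlined below; the WD80EFZZ string is only an example):
model=$(sed -n 's/^Device Model: *//p' <<'EOF'
Device Model:     WDC WD80EFZZ-68BTXN0
Serial Number:    XXXXXXXX
EOF
)
echo "$model"   # WDC WD80EFZZ-68BTXN0
```

smartctl generally won't say "SMR" outright, so the model number still has to be looked up against the vendor's CMR/SMR documentation.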
 

naximov
Cadet · Joined Jun 11, 2022 · Messages: 7
None of them are SMR, at least according to Google results for the model numbers.

I managed to fix it by deleting all snapshots, running the scrub twice, and then running the zpool clear command.
 