What does "Checksum" mean on Boot pool status page?

User555

Cadet
Joined
Apr 25, 2021
Messages
6
I assume "Read" and "Write" are the numbers of read and write operations. What does "Checksum" mean? Is it the number of some kind of checksum operations, or a number of errors? I just removed a failed drive and added a new one. So having errors already would be strange?
 

Attachments

  • checksum.png

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Read errors (the disk failed a read), write errors (the disk failed a write) and checksum errors (the disk read back supposedly good data, but it is corrupted and does not match the checksum).
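
As a rough illustration of the difference between these counters, here is a minimal Python sketch. It is not how ZFS is implemented: SHA-256 stands in for whatever checksum the pool actually uses, and the function names (store, classify_read) are made up for this example.

```python
import hashlib

def store(block):
    """Return the data together with the checksum recorded at write time."""
    return block, hashlib.sha256(block).hexdigest()

def classify_read(data, expected_checksum):
    """Classify one read attempt as OK, a READ error, or a CKSUM error."""
    if data is None:
        # The device could not return the block at all.
        return "READ"
    if hashlib.sha256(data).hexdigest() != expected_checksum:
        # The device returned data without complaint, but it is not
        # the data that was originally written.
        return "CKSUM"
    return "OK"

block, checksum = store(b"boot environment data")
corrupted = b"boot environment dat\x00"  # same length, last byte flipped

print(classify_read(block, checksum))      # OK
print(classify_read(corrupted, checksum))  # CKSUM
print(classify_read(None, checksum))       # READ
```

The point of the sketch is that a checksum error is only visible because the expected checksum is stored separately from the data; the device itself reports success.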

For us to provide useful advice, you'll have to tell us more about your system.
 

User555

Cadet
Joined
Apr 25, 2021
Messages
6
It is an Intel NUC i3 system, with a 2 TB SATA SSD as the "main pool" drive, a 1 TB USB SSD as a System drive and 2x Sandisk Ultra Fit 128 GB flash drives as boot drives. (I know USB drives are not really recommended, but this is a very compact system.)

The reason I am not also using the System SSD as a boot drive is that, as far as I remember, encrypted boot drives are not supported.

If the Checksum values are errors and they appear immediately on a fresh USB drive straight out of the packaging, that seems very weird.

When resilvering, I noticed I actually got a permanent error in a file:
Code:
  pool: freenas-boot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 6.90G in 00:53:44 with 2 errors on Sun Apr 25 17:59:55 2021
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da1p2     ONLINE       0     0     0
            da2p2     ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        freenas-boot/ROOT/12.0-U3@2019-07-24-16:17:44:/data/factory-v1.db


I think /data/factory-v1.db is some kind of factory default settings file, so it should be "fine". And maybe "@2019-07-24-16:17:44" means it is just an old snapshot? But I am not sure. I kind of wonder if this is where the errors are coming from. I just updated to U3 and rebooted, and so far there are 0 errors.
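
For what it is worth, entries in the "Permanent errors" list appear to follow the pattern dataset@snapshot:/path, which would support the old-snapshot reading. A small Python sketch that splits the entry above under that assumption (split_error_entry is a hypothetical helper, not part of any ZFS tool):

```python
def split_error_entry(entry):
    """Split 'dataset@snapshot:/path' into its parts.

    The snapshot name can contain colons itself (here a timestamp),
    so the split is done on the first ':/' rather than the first ':'.
    """
    target, sep, path = entry.strip().partition(":/")
    dataset, _, snapshot = target.partition("@")
    return {
        "dataset": dataset,
        "snapshot": snapshot or None,          # None if the file is in a live dataset
        "path": "/" + path if sep else None,   # None if no file path is present
    }

entry = "freenas-boot/ROOT/12.0-U3@2019-07-24-16:17:44:/data/factory-v1.db"
parts = split_error_entry(entry)
for key, value in parts.items():
    print(f"{key}: {value}")
```

Read this way, the damaged copy of factory-v1.db lives in a snapshot named 2019-07-24-16:17:44 of the 12.0-U3 boot environment, not necessarily in the file currently in use.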

I will have to figure out if I can "repair" this file somehow.
 

User555

Cadet
Joined
Apr 25, 2021
Messages
6
I deleted the affected snapshot but now the error looks strange:
Code:
        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da1p2     ONLINE       0     0     0
            da2p2     ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0x44>:<0x9>


I will see if running a Scrub and then maybe rebooting helps things.
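
As far as I understand it, an entry like this means ZFS still has the error on record but can no longer resolve it to a file name (here, presumably because the snapshot is gone), so it prints two numeric IDs in hex instead: the dataset ID, then the object ID. A minimal Python sketch to decode such an entry, with that interpretation as an assumption and parse_unresolved_entry as a made-up name:

```python
import re

def parse_unresolved_entry(entry):
    """Decode '<0xAA>:<0xBB>' into (dataset_id, object_id) as integers.

    Returns None if the entry does not have that shape.
    """
    match = re.fullmatch(r"<0x([0-9a-fA-F]+)>:<0x([0-9a-fA-F]+)>", entry.strip())
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)

print(parse_unresolved_entry("<0x44>:<0x9>"))  # (68, 9)
```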
 

User555

Cadet
Joined
Apr 25, 2021
Messages
6
Ok, after a couple of scrubs, the second, "old" drive that had not been replaced kept showing errors, but the new drive did not. So I assume the checksum errors on the new drive were caused by bad data copied from the old failing drive. I replaced the second USB drive as well. I hope everything works well again now.

Code:
  pool: freenas-boot
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Apr 25 19:13:06 2021
        7.77G scanned at 306M/s, 106M issued at 4.07M/s, 7.77G total
        107M resilvered, 1.33% done, no estimated completion time
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da2p2     ONLINE       0     0     0
            da1p2     ONLINE       0     0     0  (resilvering)

errors: No known data errors
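
The "1.33% done" figure seems to be simply the issued amount divided by the total. A short Python sketch that redoes the arithmetic from the scan lines above and adds a naive time estimate; it assumes the sizes are binary units (1G = 1024M) and that the issue rate stays constant, which on USB flash drives it probably will not.

```python
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def to_bytes(size):
    """Convert a zpool-style size such as '7.77G' or '106M' to bytes."""
    size = size.strip()
    if size[-1] in UNITS:
        return float(size[:-1]) * UNITS[size[-1]]
    return float(size)

def resilver_progress(issued, total, rate_per_second):
    """Return (percent done, estimated seconds left) from the scan figures."""
    issued_b = to_bytes(issued)
    total_b = to_bytes(total)
    percent = 100 * issued_b / total_b
    seconds_left = (total_b - issued_b) / to_bytes(rate_per_second)
    return percent, seconds_left

# Figures taken from the status output above.
percent, seconds_left = resilver_progress("106M", "7.77G", "4.07M")
print(f"{percent:.2f}% done")
print(f"about {seconds_left / 60:.0f} minutes left at the current rate")
```

The percentage comes out at 1.33%, matching the status output. The time estimate is only a back-of-the-envelope number; zpool itself declined to give one at this point.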
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
User555 said:
"I just removed a failed drive and added a new one. So having errors already would be strange?"
How did you do that, exactly? Traditional resilver would immediately correct this before writing anything. New "copy things straight over and then scrub" resilver should have fixed the problem with a scrub immediately afterwards.
 

User555

Cadet
Joined
Apr 25, 2021
Messages
6
Ericloewe said:
"How did you do that, exactly? Traditional resilver would immediately correct this before writing anything. New "copy things straight over and then scrub" resilver should have fixed the problem with a scrub immediately afterwards."
I plugged the drive into the system and attached it to the pool. I guess this is what you refer to as a "traditional resilver". That makes it a bit strange what would have caused the checksum errors.

I didn't get any checksum errors on the replaced drive after the first scrub, though. And now, with both drives replaced and the system running for a few hours, I haven't gotten any errors, even after running a scrub.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194