Fastest way to compare 2 datasets?

Phase

Explorer
Joined
Sep 30, 2020
Messages
63
I'm looking to do a file-contents comparison across 35 TiB of data. Both pools are on the same machine.

I tried 3 methods; they are all terribly slow for this much data:

diff
diff -qrN "set1" "set2"

md5
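(cd "set1" && find . -type f -print0 | xargs -0 md5 -r) | sort -k 2 > set1.md5
(cd "set2" && find . -type f -print0 | xargs -0 md5 -r) | sort -k 2 > set2.md5
then compare the md5 outputs: diff set1.md5 set2.md5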

rsync
rsync -n -arci --no-perms --no-owner --no-group --no-times "set1/" "set2/"

Thoughts?
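A minimal sketch of a possibly faster take on the md5 method, assuming GNU-style `find`, `xargs` (with `-P` and `-r`), `sort` and `sha256sum` are available; on TrueNAS CORE/FreeBSD, `HASH="md5 -r"` prints the same `hash path` layout. It reads both pools at the same time and runs several hash processes per tree, then diffs manifests keyed by relative path. Whether this helps depends on the bottleneck: if the disks are already saturated, extra hash processes won't speed anything up. The names `hash_tree`, `compare_trees`, `HASH`, `JOBS` (default 4) and the `.manifest` files are hypothetical.

```shell
#!/bin/sh
# Sketch: compare two directory trees by file contents with parallel hashing.
# HASH must print "<hash> <path>" lines (sha256sum, md5sum, or "md5 -r" on FreeBSD).
HASH=${HASH:-sha256sum}
JOBS=${JOBS:-4}   # hypothetical default: concurrent hash processes per tree

# hash_tree DIR OUT: write a path-sorted "hash  ./relative/path" manifest of DIR to OUT.
hash_tree() {
    # cd into the tree so both manifests use identical relative paths.
    # -n 1: one file per process, so each short output line reaches the pipe
    # in a single write and parallel jobs don't interleave mid-line.
    (cd "$1" && find . -type f -print0 | xargs -0 -r -n 1 -P "$JOBS" $HASH) \
        | LC_ALL=C sort -k 2 > "$2"
}

# compare_trees DIR1 DIR2: hash both trees concurrently, then diff the manifests.
compare_trees() {
    hash_tree "$1" set1.manifest &   # both pools are read at the same time
    hash_tree "$2" set2.manifest &
    wait
    # Any output is a file whose contents differ or that exists on only one side;
    # the exit status is 0 only when the manifests match.
    diff set1.manifest set2.manifest
}
```

Usage would be something like `compare_trees /mnt/pool1/set1 /mnt/pool2/set2` (placeholder paths), run from a directory outside both trees so the manifests aren't hashed themselves.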
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
How did both sets get created?

If the answer involves snapshots, you can use zfs diff.
 

Phase

zfs diff seems to be based on metadata. We would like to check the contents of the files themselves.

After a disk failed and was resilvered, we got bad checksums, an unrecoverable file, and a 1-bit difference in a file compared with its replicated copy. The issue seems to have been "resolved" after a TrueNAS reinstall and an additional scrub. However, without a contents diff we don't know whether any issues remain.

In that same pool there is another dataset (11 TiB) that was not replicated. If we compare the 35 TiB with the replicated copy and there are no differences, it would be a good indication that the other 11 TiB are good to go, and whatever voodoo happened ended well.
 

sretalla

If your interest is integrity checking, a ZFS scrub already does that.

Every block stored carries a checksum, which is checked against the actual stored data in the block during the scrub.
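To make the idea concrete, here is a toy model of per-block checksumming. It is not how ZFS actually stores checksums (ZFS keeps each block's checksum in its parent block pointer and uses fletcher4 by default); it only records a checksum for every fixed-size block of a file and later re-reads the blocks and reports any whose checksum no longer matches, which is roughly what a scrub does for every block in the pool. The 128 KiB block size mirrors the default ZFS `recordsize`; `BS`, `record_sums`, `verify_sums` and the `.sums` file are hypothetical names.

```shell
#!/bin/sh
# Toy model of per-block checksumming (NOT ZFS's real on-disk layout).
BS=${BS:-131072}   # 128 KiB, the default ZFS recordsize

# record_sums FILE: print "index checksum" for every BS-sized block of FILE.
record_sums() {
    size=$(wc -c < "$1")
    n=$(( (size + BS - 1) / BS ))   # number of blocks, rounding up
    i=0
    while [ "$i" -lt "$n" ]; do
        sum=$(dd if="$1" bs="$BS" skip="$i" count=1 2>/dev/null | sha256sum | cut -d ' ' -f 1)
        echo "$i $sum"
        i=$((i + 1))
    done
}

# verify_sums FILE SUMS: re-read FILE and print "bad block N" for every block
# whose checksum differs from the one recorded in SUMS.
verify_sums() {
    record_sums "$1" | diff "$2" - | sed -n 's/^> \([0-9]*\) .*/bad block \1/p'
}
```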

If you're encountering irreparable corruption of data that sits on ZFS, you need to consider if your pool design is right for your use case. (or look into potential bugs via the OpenZFS project)
 

Phase

sretalla said:
Every block stored carries a checksum, which is checked against the actual stored data in the block during the scrub.

If you're encountering irreparable corruption of data that sits on ZFS, you need to consider if your pool design is right for your use case. (or look into potential bugs via the OpenZFS project)

Yep, we encountered both: 2 unrecoverable errors affecting 1 file in 1 snapshot in a RAIDZ3 pool, and a 1-bit difference in a 34 GiB file compared with a previously replicated copy. Yeah, hard to explain, especially why the 1-bit difference went away after a new scrub.

Meaning:
  1. Disk goes bad
  2. Replace and resilver
  3. Oh! Errors and a difference in the contents of at least 1 file
  4. Let's try scrubbing again
  5. Oh, the 1-bit difference is gone. That scrub ran immediately after the resilver, yet it found 24K of data to fix (weird)
  6. <stuff>
  7. Maybe the drivers or some part of the OS are corrupted...
  8. Fresh OS/TrueNAS install
  9. Scrub comes out clean
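For a case like the 1-bit difference, a sketch of how one might pin down exactly where two copies of a file differ, assuming a POSIX `cmp`: `cmp -l` prints the 1-based offset and the octal values of every differing byte, and XOR-ing those two values shows how many bits flipped in each byte. `bit_diffs` is a hypothetical helper name and the file names in the test are placeholders.

```shell
#!/bin/sh
# bit_diffs A B: for every byte where the two files differ, print the offset,
# the XOR of the two byte values, and how many bits are flipped in that byte.
bit_diffs() {
    # cmp -l prints: <1-based offset> <octal byte in A> <octal byte in B>
    cmp -l "$1" "$2" | while read -r off a b; do
        x=$(( 0$a ^ 0$b ))   # leading 0 makes the shell parse the values as octal
        bits=0
        y=$x
        while [ "$y" -gt 0 ]; do   # count the set bits in the XOR
            bits=$(( bits + (y & 1) ))
            y=$(( y >> 1 ))
        done
        echo "offset $off xor $x bits $bits"
    done
}
```

A genuine single-bit flip shows up as exactly one line ending in `bits 1`; identical files produce no output.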

There is a thread on that here https://www.truenas.com/community/t...delete-restore-only-the-affected-file.101026/

Anyway, since a full comparison wasn't a viable way to check whether any other errors remained, I'm rebuilding the old dataset from the replicated copy.
 