zpool status -D, interpreting the table?

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
I've kind of guessed my way of interpreting the output of the above command but would like something more concrete as to what the columns are stating.

Example output of mine:

[Attached screenshot: DDT histogram output of zpool status -D on my pool]


Overall, it's a 1.08x deduplication ratio. By contrast, this same data on a Microsoft ReFS volume with Deduplication enabled hovers around 20-21% deduplication; however, on ReFS that data is not compressed the way it is with ZFS, so it's probably a bit of a mixed bag / trade-off. (1.08x compression times 1.08x deduplication comes out to about 1.17x combined, i.e. roughly 17% more logical data per byte of physical space, or about 14% less disk consumed.)

I more or less look at the bottom of this and compare DSIZE, with 'allocated' equating to what is actually written to disk and 'referenced' being what it would be if there were no deduplication. But what am I really looking at? Can someone break this down for me or give me a link? Everything I have found so far has just been recommendations against deduplicating data because of the performance hit, but my array is SATA SSD based, with a mirrored pair of NVMe drives used for the special VDEV, so the performance hit isn't that spectacular. Though, truth be told, the deduplication rate also isn't that spectacular either, so I regularly debate whether it's 'worth' it...lol
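In case the attached screenshot doesn't come through, here's roughly the shape of the table I'm asking about. The numbers below are made up purely for illustration and 'tank' is a placeholder pool name; only the layout matches what zpool status -D prints.

zpool status -D tank
  ...
 dedup: DDT entries 2369782, size 1.80K on disk, 648B in core

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    2.10M    260G    172G    172G    2.10M    260G    172G    172G
     2     150K   18.0G   12.0G   12.0G     320K   38.5G   25.6G   25.6G
     4      12K   1.50G   1.00G   1.00G      52K   6.50G   4.30G   4.30G
 Total    2.26M    280G    185G    185G    2.47M    305G    202G    202G

My read of it: each row is a reference-count bucket (blocks referenced once, twice, 4-7 times, and so on), LSIZE is logical size before compression, PSIZE is physical size after compression, and DSIZE is what actually gets allocated on disk. The 'allocated' half counts each unique block once (what's really stored); the 'referenced' half counts every reference to those blocks (what would be stored without dedup). On the Total row, referenced DSIZE divided by allocated DSIZE (202G / 185G, about 1.09x in this made-up example) is the dedup ratio.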
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
deduplication is discouraged because it gives relatively minimal results, except in very specific scenarios, while requiring a seriously beefy system, not just fast storage.
it needs RAM to hold the dedup tables, RAM that could be caching data instead.
it needs CPU to calculate boatloads of dedup hashes/checksums/parity/etc.
it needs storage speed to keep all of that fed.
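to put a rough number on the RAM part: the header line of the DDT histogram tells you how many entries there are and how big each one is in core, so a back-of-envelope estimate is just entries times in-core size. pool name 'tank' is a placeholder here, and these are the made-up numbers from the example table above, not a real system:

zpool status -D tank | grep 'DDT entries'
 dedup: DDT entries 2369782, size 1.80K on disk, 648B in core

2369782 entries x 648 bytes is roughly 1.4 GiB of RAM just to hold the dedup table, before it caches any actual data.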

the dedup performance issues typically aren't noticed until the system has been in use for a while, often when it's too late to easily back out of using dedup.

it also depends on what you are deduping. zfs will generally not dedup something that dedups horribly, while MS tends not to bother with such optimizations and tends to be less accurate in telling you what it's doing.

we would need a better idea of what hardware you are using (a forum requirement, by the way) and what you are trying to dedup, and even then, the chances are pretty much 99.99% that you should just use compression instead.
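one quick sanity check on an existing pool is to compare what each feature is actually buying you (the pool/dataset names here are placeholders):

zpool get dedupratio tank
zfs get compressratio tank/mydata

if compressratio is already in the same ballpark as dedupratio, compression alone is getting you most of the benefit without carrying the DDT around.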

if you are simply experimenting, it's a good idea to say so; we see many people do unrecommended things with real data, so we can get hung up on just saying not to do it.
 

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
Tx. Scenario is home lab; personal data.

Supermicro 5028D-TNT, 128GB ECC RAM
6 WD Red SATA 4TB disks, Z2
2 Samsung 970 Evo Plus 1TB NVMe on Supermicro 2-port AOC NVME PCIE card (bifurcation): Special VDEV, Mirror
1 Samsung 970 Evo Plus 1TB NVMe on board, Cache VDEV

I realize the compression vs dedupe debate has been going on forever, but since special vdevs became available in FreeNAS, I've been using them and have been happy with the performance of having deduplication on. Before special vdevs, it would make a system near unusable past some magic threshold.

Same with the debate over using USB boot these days; buy a slew of them from a reputable maker, mirror them out, and it's not that big of a deal if one pops. ;)

What I was really after was some clarification on the layout of the table. Apologies if this reads as unappreciative.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
there aren't all that many people using dedup. it's largely new territory and your mileage will vary.
it looks to me like you are interpreting the table correctly, and it's telling you part of why dedup isn't used much. most of the time it gives about the same as compression, and then three years down the road it becomes useless and slow: it's fast to start with, but as there is more and more data and the dedup tables grow, it gets more and more bogged down.
 