Disk Deduplication confusion

Status
Not open for further replies.

Nightowl805

Explorer
Joined
May 12, 2014
Messages
77
I have two MBPs that I will be backing up, along with two PCs. Many of them have the same pictures. I have been tempted to enable this feature, but I always see the warning about poor performance and the advice to use compression instead. I do have LOTS of memory, though, so I'm not sure which one to use.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I can tell you this: 99.27% of the people who enable this feature in a home environment wish they hadn't. I actually don't think you have enough memory to dedup this pool, to be honest. You've probably got around 40TB in your pool, give or take, and according to our own documentation and experience, that implies the amount of RAM you have is the very, very, very, very barest teensy minimum you would need to even think about it.

I strongly suggest you don't attempt it. If you do it, it's not so easy to turn it off. Your files will not magically "undedupe".
 

Nightowl805

Explorer
Joined
May 12, 2014
Messages
77
Wow. I completely believe you and will follow your advice but 96GB of RAM is not enough:( So does everyone just put up with duplicate files for family computers?

Thanks again
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Nightowl805 said:
Wow. I completely believe you and will follow your advice but 96GB of RAM is not enough:( So does everyone just put up with duplicate files for family computers?

Thanks again
Well, I am not the expert on deduplication. But let me tell you what I (think I) know:

Deduplication doesn't actually deduplicate FILES. That's a misnomer. What it deduplicates is data blocks. So, for example, if you had a file stored in 1000 data blocks, EACH OF THOSE BLOCKS has its own entry/pointer in the dedup table. If you change one byte in the file, the changed block gets rewritten, its entry in the table has to be updated, and the other 999 blocks will still be deduped, but you'll now have another pointer for the new block. It's really quite a bad bit of spaghetti. Anyway, that whole pointer table, for all the deduped blocks, is a big hunk of RAM, which must be live at all times, and it's quite a rat's nest.
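If you want a very rough feel for how big that table gets, here's a back-of-the-envelope sketch. The ~320 bytes per entry and the 64 KiB average block size are just numbers I'm plugging in for illustration, not anything measured on your pool:

# Rough sketch of why the dedup table (DDT) is "a big hunk of RAM".
# Every unique block in the deduped data gets its own DDT entry; the
# per-entry size and average block size below are illustrative guesses.

def ddt_ram_gib(used_tb, avg_block_kib=64, bytes_per_entry=320):
    """Estimate in-core DDT size (GiB) for `used_tb` terabytes of data."""
    used_bytes = used_tb * 1024**4
    blocks = used_bytes / (avg_block_kib * 1024)   # one DDT entry per block
    return blocks * bytes_per_entry / 1024**3

# Example: 10 TB of data stored in 64 KiB blocks -> on the order of 50 GiB of DDT.
print(f"~{ddt_ram_gib(10):.0f} GiB of dedup table for 10 TB")

Notice that this lands in the same ballpark as the per-TB rule of thumb from the documentation I mention below.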

So at the end of the day, it almost NEVER makes sense, unless you have an *INSANE* amount of RAM, and a ****LARGE PROPORTION**** of your file blocks are duplicated. Not just a few pictures here and there.

You can look in the documentation, which suggests that you not consider dedupe unless you have around 5GB of RAM for *EVERY* TB of storage in the affected dataset.
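Run that rule against the numbers in this thread (the ~40TB pool size is just my guess from above, so plug in your own):

# The documentation's rule of thumb: ~5 GB of RAM per TB of deduped storage.
# The 40 TB pool size is my estimate, not a measured figure.
pool_tb = 40
gb_per_tb = 5
print(f"Suggested RAM for dedupe: ~{pool_tb * gb_per_tb} GB")  # ~200 GB, against the 96GB in this box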

Dedupe and encryption are the two biggest sucker bets in the appliance. Dedupe is definitely the worse of the two for the home user.
 