diedrichg
Reference this spreadsheet I created to get your answer about optimizing the number of drives: https://docs.google.com/spreadsheets/d/1wi-4CN2opzw2yAiM9JHC5VAiT9FFIBxhVECBk73jH1Y/htmlview
Could have sworn I saw 4GB as the minimum back on the 9.1 branch, but looking at the old docs I see 6GB as far back as 8.0.1.
There actually can be a waste of space if you use ashift=12 (which most people with modern consumer disks should be using).
See this for the details: https://web.archive.org/web/2014040...s.org/ritk/zfs-4k-aligned-space-overhead.html
For example, with 12x4TB RAIDZ2 there is a waste of 2.91TiB due to alignment padding and allocation overhead.
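(If you're not sure what ashift your pool was created with, one way to check is below; "tank" is just a placeholder pool name.)
Code:
# Dump the cached pool configuration and look at the ashift of each vdev:
# ashift=12 means 4 KiB sectors, ashift=9 means 512-byte sectors.
# On FreeNAS you may need to point zdb at the cache file with -U.
zdb -C tank | grep ashift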
Though I should also note that there are ways to deal with this. For example, if you change the recordsize on your datasets from the default 128 KiB to 1 MiB and make sure compression is enabled, you can avoid pretty much all of this alignment padding and allocation overhead.
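For reference, that change looks roughly like this from the command line ("tank/media" is just an example dataset name; the new recordsize only applies to blocks written after the change):
Code:
# recordsize above 128K requires the large_blocks pool feature.
# Existing files keep their old block layout until they are rewritten.
zfs set recordsize=1M tank/media
zfs set compression=lz4 tank/media
zfs get recordsize,compression tank/media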
But if you stick with 128 KiB blocks and ashift=12, then you will always have some amount of wasted space unless you are using 6 or 18 disks in your RAIDZ2. With 6 and 18 disks the alignment padding and allocation overhead are zero, and the only overhead comes from the reserved space for metadata, which will always, always be there.
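If you want to play with the numbers yourself, here is a rough little calculator for the padding-plus-parity overhead of a 128 KiB record on RAIDZ2 at ashift=12, following the rounding rule from the article linked above (this is my own sketch, not anything built into ZFS):
Code:
# Overhead of a 128 KiB record on an N-disk RAIDZ2 with 4 KiB sectors.
# RAIDZ allocations are padded up to a multiple of parity+1 = 3 sectors.
raidz2_overhead() {
    awk -v n="$1" 'BEGIN {
        data   = 32                             # 128 KiB / 4 KiB sectors
        d      = n - 2                          # data disks
        parity = int((data + d - 1) / d) * 2    # 2 parity sectors per stripe row
        padded = int((data + parity + 2) / 3) * 3
        ideal  = data * n / d                   # parity cost alone, no padding
        printf "%2d disks: %.3f%% extra overhead\n", n, (padded - ideal) / ideal * 100
    }'
}
raidz2_overhead 6    # 0.000% extra
raidz2_overhead 12   # 9.375% extra
raidz2_overhead 18   # 0.000% extra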
I should also note that adding another disk to your vdev will always increase your usable space. You never lose space by adding a disk, but at some vdev widths you gain less from adding one more disk than you do at others.
Very interesting. So, if I change my vdev's record size to 1024K instead of Inherit (via Edit Options), there should be no lost space anymore? Or at least less?
My pool consists of 8x2TB hard drives formatted as a single RAIDZ2 vdev. Thanks.
Dataset, not vdev. Vdevs don't have settings or properties.
Here I set up a real example on my ZFS system. Note that my zpool is a 12-disk RAIDZ2.
According to a spreadsheet I made, a 12-disk RAIDZ2 configuration has a 9.375% overhead at 128K recordsize and a 0.586% overhead at 1M recordsize.
https://docs.google.com/spreadsheet...J-Dc4ZcwUdt6fkCjpnXxAEFlyA/edit#gid=804965548
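As a quick sanity check of those two figures (assuming ashift=12, i.e. 4 KiB sectors, and that RAIDZ2 allocations are padded to a multiple of 3 sectors):
Code:
# 128K record: 32 data + ceil(32/10)*2 = 8 parity = 40 sectors, padded to 42;
#              the parity-only ideal would be 32*12/10 = 38.4 sectors.
echo "scale=5; (42 - 38.4) / 38.4 * 100" | bc     # 9.375 %
# 1M record:   256 data + ceil(256/10)*2 = 52 parity = 308 sectors, padded to 309;
#              the parity-only ideal would be 256*12/10 = 307.2 sectors.
echo "scale=5; (309 - 307.2) / 307.2 * 100" | bc  # ~0.586 %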
Code:
root@nick-server:/# zfs get recordsize nickarrayold/test/128k
NAME                    PROPERTY    VALUE    SOURCE
nickarrayold/test/128k  recordsize  128K     local
root@nick-server:/# zfs get recordsize nickarrayold/test/1M
NAME                  PROPERTY    VALUE    SOURCE
nickarrayold/test/1M  recordsize  1M       local
root@nick-server:/# cd /nickarrayold/test/128k/
root@nick-server:/nickarrayold/test/128k# ls -la
total 1024483
drwxr-xr-x 2 root root          3 Mar 11 21:03 .
drwxr-xr-x 4 root root          5 Mar 11 21:02 ..
-rw-r--r-- 1 root root 1048576000 Mar 11 21:03 1GB.bin
root@nick-server:/nickarrayold/test/128k# du
1024483 .
root@nick-server:/nickarrayold/test/128k# du --apparent-size
1024001 .
root@nick-server:/nickarrayold/test/128k# cd ..
root@nick-server:/nickarrayold/test# cd 1M/
root@nick-server:/nickarrayold/test/1M# ls -la
total 941578
drwxr-xr-x 2 root root          3 Mar 11 21:03 .
drwxr-xr-x 4 root root          5 Mar 11 21:02 ..
-rw-r--r-- 1 root root 1048576000 Mar 11 21:03 1GB.bin
root@nick-server:/nickarrayold/test/1M# du
941577 .
root@nick-server:/nickarrayold/test/1M# du --apparent-size
1024001 .
root@nick-server:/nickarrayold/test/1M# zfs list nickarrayold/test/128k
NAME                     USED  AVAIL  REFER  MOUNTPOINT
nickarrayold/test/128k  1001M   124G  1001M  /nickarrayold/test/128k
root@nick-server:/nickarrayold/test/1M# zfs list nickarrayold/test/1M
NAME                   USED  AVAIL  REFER  MOUNTPOINT
nickarrayold/test/1M    920M   124G   920M  /nickarrayold/test/1M
As you can see, I wrote a 1GB file to the 128K recordsize dataset and it takes up USED=1001M, but the same file written to the 1M recordsize dataset takes up only USED=920M.
The file is still 1GB either way: du --apparent-size reports the same size on both datasets, while plain du shows the difference in allocated space.
Remember that ZFS assumes a 128K recordsize, so upon zpool creation it has already reduced the reported capacity of the pool on the assumption that you will be writing 128K records to it. This is why using the more efficient 1M recordsize makes files appear to take up less space than they really are. The end result here is that I can store over 8% more data on my pool when the files are stored with a 1M recordsize. If my pool has 32TB of capacity, that's an extra 2.56TB of space to use.
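A quick way to see that baked-in assumption ("tank" is just a placeholder pool name): the free space that zfs list reports is an estimate made with that fixed 128K deflation, so it doesn't move when you change a dataset's recordsize; only how much space your files actually consume changes.
Code:
zpool list tank             # SIZE is raw space, parity included
zfs list tank               # AVAIL is usable space, estimated for 128K records
zfs set recordsize=1M tank
zfs list tank               # AVAIL stays the same; 1M-record writes just cost less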
Meanwhile, in the GUI, should the space availability number change after changing a dataset's recordsize? In my case it did not:
View attachment 23466
P.S. My confusion about terminology comes from what I see in the FreeNAS GUI as well. While referring to @cyberjock's FreeNAS Guide 9.10, I was quite sure that by creating a pool/volume the GUI formats a vdev (in RAIDZ1/2/3 etc.) and that datasets go inside the vdev. Apparently not.
But if you stick with 128 KiB blocks and ashift=12, then you will always have some amount of wasted space unless you are using 6 or 18 disks in your RAIDZ2. With 6 and 18 disks the alignment padding and allocation overhead are zero, and the only overhead comes from the reserved space for metadata, which will always, always be there.
Also, a freshly formatted pool without any shares is already filled with 701 MiB of some kind of data, and I saw it gradually going up. Is that metadata?
Oh sorry, I didn't see that it wasn't my post in the quote.
So yeah, the reserved space for metadata is 1/64 of the total space, so about 1.6 %.
While searching for the metadata/CoW overhead numbers to understand this 1/32 vs 1/64 mess, I found this post: https://forums.freenas.org/index.ph...act-checksum-size-overhead.28187/#post-183802 and it was you who told me there's 1/64 of the space reserved for metadata, so now I'm lost.
Now I also wonder why you can't delete files if you fill your pool to 100%... because if there's space reserved for CoW, you should be able to, no?