Optimizing raidz for sequential append-only databases?

skyhawk

Cadet
Joined
Jan 1, 2024
Messages
5
I'm new to TrueNAS, but not to ZFS.

I'm intending to get a bitcoin daemon running on my new TrueNAS machine, and I'm pondering the optimal storage setup for its data.

Bitcoind (and presumably most similar software carbon-copied from bitcoind) needs to store three primary things:
a) General housekeeping information - this can live anywhere.
b) The chainstate - a highly active and fairly small database. Nothing other than an SSD makes any sense for this.
c) The blocks - these files represent a huge, and growing, storage requirement. Disregarding the initial block download, bitcoind will append ~2 MB to a block file every 10 minutes or so. Reads are usually highly sequential, to the best of my knowledge.

I was planning to give my bitcoind a dedicated disk to store its blocks on, given that I don't care at all about redundancy for this data. But I've got a large raidz2 that could easily accommodate the blocks, especially if ZFS can be configured to keep the stripes large to maximize the benefit of sequential reads.
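
Concretely, I was picturing something along these lines (pool and dataset names are placeholders, and the property values are just my first guess rather than anything I've tested):

# block files get their own dataset on the existing raidz2 pool
zfs create -o compression=lz4 -o atime=off tank/bitcoin-blocks
# chainstate stays on the SSD pool, tuned for small random I/O
zfs create -o recordsize=16K -o atime=off ssd/bitcoin-chainstate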

ZFS recordsize sets the maximum size of a single record - and therefore the widest stripe any one allocation can produce - as I understand it. Unlike most other filesystems, a large recordsize does not impose any "slop" or lost usable space on the filesystem, because recordsize is not a minimum: small files will still get small allocations.
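
As a quick illustration of that ceiling-not-floor behaviour (dataset name is made up, and the paths assume the default mountpoint):

# recordsize caps the record size; it doesn't pad small files up to it
zfs create -o recordsize=1M tank/rstest
dd if=/dev/urandom of=/tank/rstest/tiny bs=8K count=1     # single 8K file
dd if=/dev/urandom of=/tank/rstest/big bs=1M count=64     # 64M of sequential data
sync
du -h /tank/rstest/tiny /tank/rstest/big    # tiny stays tiny, big is ~64M plus raidz parity overhead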

Is there a way to force a minimum record size on a given dataset, so that (infrequent) appends get rewritten up to the maximum record size, and subsequent reads pull in full 16 MB records at a time?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
TL;DR your append-only workflow, assuming it continues to write to the "tail-end" offset of the file, should behave how you want by default, up to the limit of recordsize.

The initial record for the file will start small, growing up to the limit specified by recordsize as the file gets rewritten/appended to. Additional records comprising the same file will be identical in size to that first one, and compressed using whatever algorithm is set on the dataset. So if you want to set recordsize=16M when creating the dataset (crucially, before the initial block download - otherwise the files written during it will be stuck with the default 128K records), go for it.
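
Roughly speaking, something like this (pool and dataset names are just examples, and depending on your OpenZFS version you may need to raise the zfs_max_recordsize module parameter before a 16M value is accepted):

# confirm the pool supports large blocks and check the current module ceiling
zpool get feature@large_blocks tank
cat /sys/module/zfs/parameters/zfs_max_recordsize
# create the dataset with the large recordsize before the initial block download
zfs create -o recordsize=16M -o compression=lz4 tank/bitcoin-blocks
zfs get recordsize,compression tank/bitcoin-blocks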

I expect, though, that you'll see diminishing returns beyond 1M - but we might have some interesting opportunities to tune readahead.
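
If you do end up experimenting there, the prefetcher knobs are OpenZFS module parameters rather than dataset properties - something along these lines on Linux/SCALE (the parameter names are real, but the value shown is purely illustrative, not a recommendation):

# check the current prefetch settings
grep . /sys/module/zfs/parameters/zfetch_max_distance /sys/module/zfs/parameters/zfs_prefetch_disable
# e.g. let the prefetcher run further ahead of a sequential reader (needs root)
echo $((64*1024*1024)) > /sys/module/zfs/parameters/zfetch_max_distance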
 