SOLVED SMB share performance improves...but only during active scrub

Chimaera

Dabbler
Joined
Mar 26, 2019
Messages
11
If anyone has experienced anything vaguely resembling this madness, please help. I can't make heads or tails of it. Exactly as it says on the tin: my file transfer speeds hit the network bottleneck...but only if I initiate a scrub first. I feel like a dirty date.

My performance for non-cached files (not in memory, definite drive access, zpool iostat -v showing numbers bopping) is abysmal. I get erratic jumping in the 5 to 50 MB/s range (leaning towards the 5). I know it's likely not a networking issue because cached files saturate the connection (it's gigabit ethernet, so 90-110 MB/s, give or take).
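For anyone wanting to put numbers on the "erratic jumping" rather than eyeballing a copy dialog, here is a minimal sketch that reads a file from the mounted share and reports MB/s per window of chunks. The function name, chunk size, and window size are my own choices for illustration. It runs on the client and cannot tell whether the server answered from ARC or from disk, so pick a file you are confident is not cached.

```python
import time


def read_throughput(path, chunk_size=1024 * 1024, window=64):
    """Read `path` sequentially and return a list of MB/s figures,
    one per window of `window` chunks (the last window may be shorter)."""
    rates = []
    # buffering=0 avoids Python's own read buffer; OS and server caches still apply.
    with open(path, "rb", buffering=0) as f:
        while True:
            start = time.perf_counter()
            nbytes = 0
            for _ in range(window):
                data = f.read(chunk_size)
                if not data:
                    break
                nbytes += len(data)
            elapsed = time.perf_counter() - start
            if nbytes == 0:
                break  # end of file reached
            rates.append(nbytes / 1e6 / max(elapsed, 1e-9))
    return rates
```

Comparing `min(rates)` and `max(rates)` before and after starting a scrub would show whether the transfer really goes from jumpy to steady.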

Here's the kicker though: if I initiate a scrub first (zpool scrub X), any file transfer started afterwards, even of non-cached, obscure files deep in the file structure, saturates the connection. No more inconsistencies in transfer speed. Beautiful 100 MB/s bliss.

I genuinely feel like I'm going mad. You'd think a scrub would lower performance.

I'm running an ASRock C236 WSI with 16GB of ECC memory and a G3900. The pool is 6 drives in a single RAIDZ2 vdev running off the on-board SATA.

Troubleshooting ideas:

Is it the CPU? I love this G3900. Never runs hotter than 35C, passively cooled, and the highest utilization it registers during normal operation is around 20-40%. I wondered if it's powerd or something clocking it down too hastily, but it seems to run constantly at the rated 2800 MHz. I need someone more technically oriented to help me troubleshoot this. Not sure where to go with this idea.

Is it the onboard SATA? I have an LSI9211 I could try, but that hardly makes sense given that speeds improve during scrubs. Right?

Is it the hard drives? I played around with the power management and set them all to 192. This seems to help for the next few transfers, but the slowdown returns when I leave the machine idle for a while.

Any other ideas?
 

Chimaera

Dabbler
Joined
Mar 26, 2019
Messages
11
No takers?

I disabled the reporting database (to stop the odd writes which may be interrupting reads from the pool) and enabled autotune, but nothing here is catching my eye:

Code:
vfs.zfs.arc_max = 13298774528
vfs.zfs.l2arc_headroom = 2
vfs.zfs.l2arc_noprefetch = 0
vfs.zfs.l2arc_norw = 0
vfs.zfs.l2arc_write_boost = 40000000
vfs.zfs.l2arc_write_max = 10000000
vfs.zfs.metaslab.lba_weighting_enabled = 1
vfs.zfs.zfetch.max_distance = 33554432
vm.kmem_size = 20987500000

Performance has significantly improved, even after several cold boots.
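The byte counts in those tunables are hard to read at a glance. A quick sketch like the following converts the size-like ones to GiB so they can be compared against the 16GB of installed memory. The helper name is mine, and I am assuming these three values are plain byte counts.

```python
def to_gib(num_bytes):
    """Convert a byte count to GiB (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024**3


# Values copied from the autotune output above; assumed to be in bytes.
tunables = {
    "vfs.zfs.arc_max": 13298774528,
    "vfs.zfs.zfetch.max_distance": 33554432,
    "vm.kmem_size": 20987500000,
}

for name, value in tunables.items():
    print(f"{name}: {to_gib(value):.2f} GiB")
```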
 
Joined
Jan 4, 2014
Messages
1,644
So yes, turning off the reporting database fixed it. Knew it was probably staring me in the face. Thanks?
I'm curious and still not sure how you made the connection between SMB performance and the reporting database. Please explain. How do you turn off the reporting database? What's the downside of turning off the reporting db?
 

Chimaera

Dabbler
Joined
Mar 26, 2019
Messages
11

Well, what I did was run a large file transfer and watch "zpool iostat -v 1" in the shell while it ran, trying to figure out if one of the drives was being picky or something. What I observed was a burst of writes now and then, interrupting reads on the pool and coinciding with an immediate drop in transfer speed. I didn't make much of it at first, but then I read somewhere on here that the reporting database shouldn't be sent to a SLOG or the boot disks, as it causes intermittent writes which can slow down other writes or cause wear. In principle, that didn't make sense to me (why wouldn't ZFS manage that traffic? Why not just write to cache and figure it out later?), but I figured, what the hey, maybe.

I guess I didn't disable reporting, per se. I just stopped the reporting database from being written to the system dataset, which is on the pool in question. System > System Dataset > uncheck Reporting Database.
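If staring at scrolling "zpool iostat" output gets tiring, the same check can be roughed out in a script. This is only a sketch under assumptions: it expects the usual seven-column pool summary line (name, alloc, free, read ops, write ops, read bandwidth, write bandwidth), treats the K/M/G suffixes as powers of 1024, and the function names and the 1 MiB/s threshold are mine.

```python
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def parse_size(text):
    """Convert a human-readable value such as '1.5M' or '0' to a float."""
    text = text.strip()
    if text in ("", "-"):
        return 0.0
    if text[-1].upper() in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1].upper()]
    return float(text)


def parse_pool_line(line, pool):
    """Return (read_bw, write_bw) in bytes/s for the pool's summary line, else None."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != pool:
        return None
    try:
        return parse_size(fields[5]), parse_size(fields[6])
    except ValueError:
        return None  # e.g. a header line that happens to start with the pool name


def find_write_bursts(lines, pool, write_threshold=1024**2):
    """Yield (sample_index, read_bw, write_bw) for samples whose write
    bandwidth reaches the threshold, so they can be lined up with read dips."""
    index = 0
    for line in lines:
        parsed = parse_pool_line(line, pool)
        if parsed is None:
            continue
        read_bw, write_bw = parsed
        if write_bw >= write_threshold:
            yield index, read_bw, write_bw
        index += 1
```

Piping the command into a script that calls `find_write_bursts(sys.stdin, "tank")` (with `tank` standing in for the real pool name) would print only the one-second samples where writes spiked, along with what reads were doing at that moment.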
 
Joined
Jan 4, 2014
Messages
1,644
That's pretty clever and tenacious troubleshooting. Well done!
 
Joined
Jan 4, 2014
Messages
1,644
So, on reflection, why do you think the scrub and playing around with power management temporarily improved SMB performance?
 

Chimaera

Dabbler
Joined
Mar 26, 2019
Messages
11

The answer to that may need an understanding of how ZFS handles writes during scrubs. Or how reads are prioritized? Maybe they're suspended or sent to memory cache until the scrub is over, unless absolutely necessary to do a write? I haven't the faintest idea.

On power management, maybe just the act of changing that setting puts the drives into a high-power mode for a while, but then, left idle long enough, they start to ignore the OS and cycle down on their own?

Maybe I didn't solve it at all and I just have the impression I did? It's still working fine now. I could arrange for a slow write to the pool and do a simultaneous read from another system (but I know what would happen: the writes would just go to memory and the reads would take priority, as per any normal ZFS operation).
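A rough sketch of that experiment, if anyone wants to try it: one thread trickles small writes into a file on the pool while the main thread times a full read of another file. All names and the write size and pause are made up for illustration, and unlike the two-machine version it runs both sides from a single client, so treat it as an approximation. The writes are not forced to disk with fsync, which matches the "writes go to memory" expectation.

```python
import threading
import time


def slow_writer(path, stop_event, chunk=b"\0" * 65536, pause=0.05):
    """Append small chunks to `path`, pausing between them, until told to stop."""
    with open(path, "ab") as f:
        while not stop_event.is_set():
            f.write(chunk)
            f.flush()
            time.sleep(pause)


def timed_read(path, chunk_size=1024 * 1024):
    """Read `path` to the end and return (bytes_read, seconds_taken)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            total += len(data)
    return total, time.perf_counter() - start


def read_during_writes(read_path, write_path):
    """Time a full read of read_path while a background thread writes to write_path."""
    stop = threading.Event()
    writer = threading.Thread(target=slow_writer, args=(write_path, stop))
    writer.start()
    try:
        return timed_read(read_path)
    finally:
        stop.set()
        writer.join()
```

Running `timed_read` alone and then `read_during_writes` on comparable uncached files would show whether trickled writes are enough to drag the read speed down.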

I'm happy I "fixed" it, but I guess I'd be happier knowing where the real problem is. If you have any ideas on how to troubleshoot, I'd be willing to give it a go. I'll start by defaulting the power management and trying again. If all is well, I'll turn on reporting again, and try that.
 

Chimaera

Dabbler
Joined
Mar 26, 2019
Messages
11
Incidentally, I figured out the scrub thing when I came back from a long holiday where the system was off. It turned on and started doing a scrub, and suddenly my performance improved.

By the next day things had slowed down again. I did another scrub, and boom, speed up.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Disable autotune and remove the tunables it created. Autotune will only hurt you with your setup.
 