Massive drop in Write speed on one pool

chris.d

Cadet
Joined
Nov 11, 2022
Messages
2
In the last couple of weeks we have experienced a massive drop in write performance from our production pool. Specs:
1668171231083.png



I don't believe it is a network issue, as another pool with the same setup is currently running fine. The drop appeared to coincide with migrating TrueCore from USB to SSD.
Production pool is currently at 77% of 60TB's. Though it was running fine at 90% capacity before the move to SSD.
I am a bit of a newbie, so if any more screengrabs/info will help, please let me know.

Thanks in advance.

Production Pool
1668170930837.png
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
77% of 60TB's. Though it was running fine at 90% capacity

The pool could be experiencing high fragmentation, which could account for the slowness. ZFS is a copy-on-write filesystem, so when you replace a block, it allocates a new block in empty space, writes it, and then frees the old block wherever that was. This tends to leave free space scattered around the pool, which can be a performance killer because the system has to seek heavily to find the smallish blocks of free space. It is one of the reasons ZFS folks are skeptical anytime someone comes in saying "it was running fine at 90% capacity" -- it quite probably wasn't; more likely the damage just wasn't immediately obvious.
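One way to check this is to look at the fragmentation and capacity properties that zpool reports per pool. The following is only a rough sketch: the function name flag_pools and the thresholds (50% fragmentation, 80% capacity) are my own illustrative assumptions, not ZFS limits, and the filter expects tab-separated "name, fragmentation, capacity" lines such as those printed by "zpool list -H -o name,fragmentation,capacity".

```shell
#!/bin/sh
# Flag pools whose fragmentation or capacity exceeds a threshold.
# Input: tab-separated lines of "name  fragmentation  capacity".
# The default thresholds are illustrative assumptions, not ZFS limits.
flag_pools() {
    awk -F '\t' -v frag_max="${1:-50}" -v cap_max="${2:-80}" '
    {
        frag = $2; cap = $3
        sub(/%$/, "", frag)                    # strip a trailing "%"
        sub(/%$/, "", cap)
        fragn = (frag == "-") ? 0 : frag + 0   # "-" means not reported
        if (fragn > frag_max || cap + 0 > cap_max)
            printf "%s frag=%s cap=%s\n", $1, frag, cap
    }'
}

# On the live system (not executed here):
#   zpool list -H -o name,fragmentation,capacity | flag_pools 50 80
```

Any pool it prints is worth a closer look; a pool that is both fairly full and heavily fragmented is the combination described above.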

Check how busy the disks are while you're experiencing the problem. For example, run "gstat -f da8" and look at the %busy column. Here's an example of a relatively saturated disk:

Code:
dT: 1.012s  w: 1.000s  filter: da3
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    2    856      8     99   27.7    846  56210    1.8   86.8  da3
    0      0      0      0    0.0      0      0    0.0    0.0  da3p1
    2    856      8     99   27.7    846  56210    1.8   86.8  da3p2


It's still sustaining a high rate of write operations and kBps written despite being 87% busy, but as the proportion of seeks goes up, those metrics will fall.
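If you want to watch for this across several disks, the gstat output can be filtered on the %busy column. This is a minimal sketch, assuming the column layout shown above (%busy second to last, provider name last); the function name busy_disks and the 80% threshold are illustrative choices of mine.

```shell
#!/bin/sh
# Print providers whose %busy value exceeds a threshold (default 80).
# Expects gstat-style text on stdin: %busy second to last, name last.
busy_disks() {
    awk -v limit="${1:-80}" '
        $1 == "dT:" || $NF == "Name" { next }   # skip the two header lines
        NF >= 2 && $(NF-1) + 0 > limit { print $NF, $(NF-1) }
    '
}

# On the live system (not executed here), batch mode prints one sample and exits:
#   gstat -b -I 5s | busy_disks 80
```

Fed the sample output above, it would list da3 and da3p2 with 86.8 and skip the idle da3p1.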