TRIM takes ages to complete.

IvoT

Dabbler
Joined
Dec 19, 2023
Messages
15
The iostat output is far better in terms of detail, but here is a screenshot for two of the disks anyway; it is the same on all the others:
[screenshot of disk stats attached]
 

IvoT

Dabbler
Joined
Dec 19, 2023
Messages
15
Here is the output of iostat.
 

Attachments

  • iostat.txt
    51.2 KB · Views: 166

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996

IvoT said:
Here is the iostat output as well https://pastecode.io/s/wa6y4i6i (view in fullscreen for better readability)
Please do not use external links, many of our forum people will not click on the links. Please post directly in our forums. And this site has a lot of popups! How many ways can we infect a computer, let me count the ways. The attached text file is much better for me to view.

From the data provided, while it may not be a lot of data, it does appear that 'writing data' is virtually continuous. I have no idea whether that is contributing to your problem or not.

Another piece of information you have not provided is the version of SCALE you are running now, and what you were running when your TRIM times were good. I'm trying to take notice of your postings: you just joined the forum yesterday (thanks for joining), and you have been running TrueNAS for months, but on what version? Did you run CORE and upgrade to SCALE? Which version of SCALE? We need some more history. Did this problem start after an upgrade? Can you roll back to the previous version that worked, to verify that TRIM works there and that the current version of SCALE is causing your issue?
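
If you're not sure where to look, the version string is shown in the web UI, or (assuming you have a shell open on the box) you can read it directly:

Code:
cat /etc/version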

If you upgraded your pool's feature flags, you may not be able to roll back. This is why I never upgrade my pools: the new features are nothing I would use, and they prevent rolling back to a previous version. Just something worth noting.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hey @IvoT

The ZFS defaults limit the queue to a maximum of 10 TRIM commands per leaf vdev - so with your 8x SSDs in RAIDZ2, you're averaging just over one TRIM command in queue per physical disk, and it also aims to aggregate your TRIMs across transaction groups. This is designed to limit the TRIM speed in order not to impact pool I/O.
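
If you want to check what your build is actually running with, the OpenZFS module parameters can be read back from sysfs - a quick look, with the caveat that parameter names can shift between OpenZFS releases:

Code:
# print the per-vdev TRIM queue limits currently in effect
grep . /sys/module/zfs/parameters/zfs_vdev_trim_*_active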

SAS drives do better with discards than SATA, so you may be able to use autotrim in your situation and simply let the VMFS layer pass the UNMAP commands down the chain (VMware can also rate-limit the discard speed).
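
If you want to try autotrim, it's a single pool property - a minimal sketch, substituting your own pool name for "tank":

Code:
# let ZFS issue discards continuously as blocks are freed
zpool set autotrim=on tank
# confirm the property took
zpool get autotrim tank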

I need to make a longer effortpost on TRIM in general. Someone nag me if I haven't done it in a reasonable amount of time. :wink:
 

IvoT

Dabbler
Joined
Dec 19, 2023
Messages
15
joeschmuck said:
Please do not use external links... Another piece of information you have not provided is the version of SCALE you are running now, and what you were running when your TRIM times were good... Can you roll back to the previous version that worked, to verify that TRIM works there and that the current version of SCALE is causing your issue?

Note taken for the external links - I will attach logs directly from now on.
The 'writing data' is actually the trimming taking place. If you look at the iostat output you can clearly see that it comes from dMB/s (discarded MB/s).
I am running TrueNAS-23.10.0.1. I don't remember on which version TRIM was working fast, but this was not upgraded from CORE; it was a clean install.

HoneyBadger said:
The ZFS defaults limit the queue to a maximum of 10 TRIM commands per leaf vdev... This is designed to limit the TRIM speed in order not to impact pool I/O.

Is there a way to raise that limit? I don't have a lot of workload on the pool, so I would prefer faster TRIM times over pool I/O.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
@IvoT What does the output of zpool iostat -vq YourPoolName 5 show? The trimq_write columns should show how deep your current TRIM queues are.

Your iostat output also shows fairly small dareq-sz (discard average request size) values of only ~128K - by contrast, I manually TRIMmed a set of four SSDs here and got much larger chunks with higher throughput.

Code:
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
dm-0             0.00      0.00     0.00   0.00    0.08    21.75    0.00      0.00     0.00   0.00    0.00     2.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme0n1          0.20      0.00     0.00   0.00    0.23    21.23   30.60      0.10     0.00   0.00    0.04     3.45    0.00      0.00     0.00   0.00    0.00     0.00    0.42    1.45    0.00   0.36
sda              0.00      0.00     0.00   0.00    3.28    28.88    0.00      0.00     0.00   0.00    0.98    10.05    0.00      0.00     0.00   0.00    0.00     0.00    0.00    4.75    0.00   0.00
sdb              0.00      0.00     0.00   0.00    3.29    27.59    0.00      0.00     0.00   0.00    0.89    10.05    0.00      0.00     0.00   0.00    0.00     0.00    0.00    4.75    0.00   0.00
sdc              0.01      0.00     0.00   0.00    3.50    25.75    0.00      0.00     0.00   0.00    1.49    10.27    0.00      0.00     0.00   0.00    0.00     0.00    0.00   12.42    0.00   0.00
sdd              0.01      0.00     0.00   0.00    3.84    26.30    0.00      0.00     0.00   0.00    1.51    10.63    0.00      0.00     0.00   0.00    0.00     0.00    0.00   12.58    0.00   0.00
sde              0.06      0.00     0.00   0.00    1.38    20.33   17.97      0.12     0.04   0.25    0.06     6.98    0.27     22.90     0.00   0.20    1.11 85860.53    0.54    0.06    0.00   0.53
sdf              0.06      0.00     0.00   0.00    1.87    19.59   17.91      0.12     0.05   0.27    0.06     6.99    0.27     22.90     0.00   0.19    1.11 86084.19    0.54    0.06    0.00   0.53
sdg              0.06      0.00     0.00   0.00    0.25    19.19   17.97      0.12     0.05   0.25    0.06     6.98    0.27     22.90     0.00   0.19    1.12 86080.18    0.54    0.06    0.00   0.53
sdh              0.06      0.00     0.00   0.02    0.25    19.47   17.92      0.12     0.05   0.25    0.06     6.99    0.27     22.90     0.00   0.18    1.11 85844.60    0.54    0.06    0.00   0.53
sdi              0.00      0.00     0.00   0.00    3.35    26.74    0.00      0.00     0.00   0.00    1.47    10.14    0.00      0.00     0.00   0.00    0.00     0.00    0.00   11.58    0.00   0.00
sdj              0.00      0.00     0.00   0.00    3.81    27.19    0.00      0.00     0.00   0.00    1.45     9.99    0.00      0.00     0.00   0.00    0.00     0.00    0.00   12.50    0.00   0.00
sdk              0.00      0.00     0.00   0.00    3.83    27.53    0.00      0.00     0.00   0.60    1.62    10.40    0.00      0.00     0.00   0.00    0.00     0.00    0.00    5.00    0.00   0.00
sdl              0.01      0.00     0.00   0.00    3.11    26.27    0.00      0.00     0.00   0.59    1.07    10.11    0.00      0.00     0.00   0.00    0.00     0.00    0.00    7.33    0.00   0.00
zd0              0.00      0.00     0.00   0.00    0.00    21.75    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00


A virtualization workload does produce a lot of small records. I skimmed through the posts, but didn't see an answer - has there been a significant amount of write I/O to the system since the last manual TRIM?
 

IvoT

Dabbler
Joined
Dec 19, 2023
Messages
15
Sure there was; my homelab is running on it, and over the last few months it may have produced some 50-60TB of writes.
Here is the output of the command:
Code:
                                            capacity     operations     bandwidth    syncq_read    syncq_write   asyncq_read  asyncq_write   scrubq_read   trimq_write  rebuildq_write
pool                                      alloc   free   read  write   read  write   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ
----------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
SAS                                       12.7T  15.2T     42    699   249K  4.51M      0      0      0      0      0      0      0      0      0      0     63     16      0      0
  raidz2-0                                12.7T  15.2T     42    699   249K  4.51M      0      0      0      0      0      0      0      0      0      0     63     16      0      0
    547ba0f9-ba44-4f01-ad33-4bda61310bdd      -      -      5     81  30.4K   560K      0      0      0      0      0      0      0      0      0      0      8      2      0      0
    ae6f3e47-7d8e-45ff-b56c-0907c47b83bf      -      -      5     88  29.6K   577K      0      0      0      0      0      0      0      0      0      0      8      2      0      0
    555672f7-1f29-451d-8fad-820612d19d05      -      -      5     87  30.4K   573K      0      0      0      0      0      0      0      0      0      0      8      2      0      0
    fba71670-e645-4906-818b-20238c8f97fd      -      -      4     89  29.6K   584K      0      0      0      0      0      0      0      0      0      0      8      2      0      0
    5e41b921-b752-490d-bf6d-b79ecb9b1c22      -      -      5     90  34.4K   584K      0      0      0      0      0      0      0      0      0      0      8      2      0      0
    d1b7284c-ab83-498c-bad3-72ea74a7f6c8      -      -      5     89  31.2K   579K      0      0      0      0      0      0      0      0      0      0      8      2      0      0
    03efaaf3-eb40-492c-8f16-87c2b88f2aed      -      -      5     86  32.0K   586K      0      0      0      0      0      0      0      0      0      0      8      2      0      0
    b26c8881-71ed-40b7-95c4-e45df9b4d9e1      -      -      5     87  31.2K   577K      0      0      0      0      0      0      0      0      0      0      7      2      0      0
----------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Instructions below. If you're coming here from a search, future reader, bear in mind the impacts discussed as well rather than just applying a tunable blindly. :wink:

You can change the per-device TRIM limit with:

Code:
echo N > /sys/module/zfs/parameters/zfs_vdev_trim_max_active

The default value is 2, but you can increase N up to the value of zfs_vdev_max_active, with the understanding that increased TRIM/UNMAP activity will negatively impact pool I/O.

Because of your vdev drive count, you'll possibly also have to bump up the per-vdev limit:
Code:
echo N > /sys/module/zfs/parameters/zfs_trim_queue_limit

Again: more TRIM means less actual I/O. Increase gradually, and monitor iostat and zpool iostat as well as general application latencies.
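
Putting that together, a minimal way to test a new value and watch the effect - note these sysfs writes are runtime-only and do not persist across a reboot, and substitute your own pool name for "tank":

Code:
# bump the per-device TRIM limit (8 is just an example value)
echo 8 > /sys/module/zfs/parameters/zfs_vdev_trim_max_active
# then watch the trimq_write pend/activ columns, refreshed every 5 seconds
zpool iostat -vq tank 5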
 

IvoT

Dabbler
Joined
Dec 19, 2023
Messages
15
I've tried the tunables, but nothing major happened, besides the trimq_write pending dropping to 0 and maybe 2-3 dMB/s more. I've tried a few values, and it seems zfs_trim_queue_limit does nothing in practice: as long as zfs_vdev_trim_max_active is at least as large as zfs_trim_queue_limit, trimq_write pending goes to 0 and that's it. In my case it seems something is not sending discards fast enough. Something is not letting the per-vdev active count go above 4 after pending drops to 0.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
You mentioned you have one of these SSDs in another Linux machine and it trims significantly faster (400-500MB/s) - presumably using a different filesystem. Is there any chance you can collect some iostat -mx dumps from that one as well, to compare the discard column metrics?

Also, could you show an hdparm -I /dev/sdX output, specifically with regard to the reported logical/physical sector size (since it's a Samsung, I expect both to report as 512b) and the number of TRIM blocks supported (the "Data Set Management TRIM supported (limit 8 blocks)" line)?
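
For the comparison box, a few samples like the following while a TRIM/discard is running would be enough - d/s, dMB/s and dareq-sz are the discard columns in sysstat's iostat:

Code:
# extended stats in megabytes, 5-second intervals, 12 samples
iostat -mx 5 12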
 

IvoT

Dabbler
Joined
Dec 19, 2023
Messages
15
I will try with the other system at some point, because I don't have access to it right now. Here is the output of hdparm, but this is a SAS drive, so it may not be accurate:
Code:
/dev/sdb:

ATA device, with non-removable media
Standards:
    Likely used: 1
Configuration:
    Logical        max    current
    cylinders    0    0
    heads        0    0
    sectors/track    0    0
    --
    Logical/Physical Sector size:           512 bytes
    device size with M = 1024*1024:           0 MBytes
    device size with M = 1000*1000:           0 MBytes 
    cache/buffer size  = unknown
Capabilities:
    IORDY not likely
    Cannot perform double-word IO
    R/W multiple sector transfer: not supported
    DMA: not supported
    PIO: pio0 

Smartctl reports this:
Logical block size: 512 bytes
Physical block size: 4096 bytes

Where do I get the "Data Set Management TRIM supported (limit 8 blocks)" information?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Right, SAS vs SATA.

Try sdparm /dev/sdb -p bl | grep unmap

Code:
  Maximum unmap LBA count: -1 [unbounded]
  Maximum unmap block descriptor count: -1 [unbounded]
  Optimal unmap granularity: 8 blocks


Edit: While we're here, how about sdparm /dev/sdb --get WCE to see if write caching is enabled?
 

IvoT

Dabbler
Joined
Dec 19, 2023
Messages
15
Code:
  Maximum unmap LBA count: -1 [unbounded]
  Maximum unmap block descriptor count: -1 [unbounded]
  Optimal unmap granularity: 16 blocks

  WCE 1 [cha: y, def: 1, sav: 1]
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
This could be a result of a fragmented pool - if it has to unmap in tiny little blocks rather than at the optimal granularity (16 blocks on yours, double that of my drive in question), your drives may have to spend more time doing internal housekeeping to make sure they keep the valuable data while only zapping what is supposed to be blanked out.
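
If you want to put a number on it, the FRAG column in zpool list reports free-space fragmentation (the kind that matters here) - for example:

Code:
# FRAG is free-space fragmentation, not file fragmentation
zpool list -o name,size,alloc,frag,capacity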

How far has it progressed so far? And the real test will be: if you TRIM again after a day or two, will it then go much faster?
 

IvoT

Dabbler
Joined
Dec 19, 2023
Messages
15
Currently it is at 71% trimmed. I will retry after a day or two, but I don't think it will be faster. To me it seems like, for some reason, TrueNAS is not sending enough UNMAP requests to the pool.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
IvoT said:
Currently it is at 71% trimmed. I will retry after a day or two, but I don't think it will be faster. To me it seems like, for some reason, TrueNAS is not sending enough UNMAP requests to the pool.
It may be something specific to your drives (PM1643), as my system doesn't seem to have an issue pushing UNMAPs to its SSDs - not at the 400-500MB/s you mentioned, but mine are in use and not quite the same class as your Samsungs. Might be worth submitting a bug/Jira ticket for this, including a debug, to document the slow TRIM performance even when manually launched.
 

IvoT

Dabbler
Joined
Dec 19, 2023
Messages
15
It finished. I started another one, and it's the same thing - it goes as slowly as before.
 


NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
HoneyBadger said:
I need to make a longer effortpost on TRIM in general. Someone nag me if I haven't done it in a reasonable amount of time. :wink:
NAG

:smile:
 