After reboot, HBA keeps getting reset and pool becomes UNAVAIL

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
It may well be middleware, but it is wrapped in a systemd unit/service, or was yesterday when I was observing a boot. I think it is generally a good system, but it needs reining in at times.

I've looked at the md devices it constructs, and at the dmsetup tables for the encryption layer, and have even scripted something equivalent to add more swap on a TN system so a VM could have a larger virtual address space (a rough sketch of that follows the unit file below).

Code:
# /lib/systemd/system/ix-swap.service
[Unit]
Description=Configure swap filesystem on boot pool
DefaultDependencies=no

Before=network-pre.target

After=middlewared.service
Before=local-fs.target

[Service]
Type=oneshot
RemainAfterExit=yes
TimeoutStartSec=300
ExecStart=midclt call disk.swaps_configure
StandardOutput=null

[Install]
WantedBy=multi-user.target
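For anyone curious, here is a rough sketch of the kind of manual extra-swap setup I mean. This is illustrative only, not what disk.swaps_configure literally runs; the device names are placeholders, and the plain dm-crypt-with-a-random-key step is my assumption based on how the created dm devices look.
Code:
# Sketch only: mirror two spare partitions, wrap the mirror in throwaway
# encryption, and enable the result as swap. Device names are examples.
mdadm --build /dev/md100 --level=raid1 --raid-devices=2 /dev/sdx2 /dev/sdy2
cryptsetup open --type plain --cipher aes-xts-plain64 --key-size 256 \
    --key-file /dev/urandom /dev/md100 md100-swap
mkswap /dev/mapper/md100-swap
swapon /dev/mapper/md100-swap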
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I just wanted to point out that the code executed is midclt call disk.swaps_configure - so you need to look in the TrueNAS source rather than systemd documentation to find out how to prevent setting up swap.

Out of curiosity: why would one want to run a system without any swap at all?
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
Thanks, I realize it is not a native systemd unit, and I saw the midclt call; however, I had already reverse engineered the details from observing the created md and dm devices, and from a posting on this board a while ago discussing the process.

I always like to have at least some swap; it gives the kernel room to move things around.
If swap is created in such a way that it causes an IO/CPU spike on boot, then something needs to change.
I've also been told that sometimes swap devices are created as mirrors between flash and rust, which I think is also to be avoided.
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
I don't have enough information about your system to tell you what's different other than the swap partitions not being created (which you can put a stop to if you really want by setting that number from 2 to 0 using the API call... midclt call system.advanced.update '{"swapondrive": 0}')
I just wanted to point out that the code executed is midclt call disk.swaps_configure - so you need to look in the TrueNAS source rather than systemd documentation to find out how to prevent setting up swap.

Out of curiosity: why would one want to run a system without any swap at all?

Hello Sretalla and Patrick.

On a clean setup I disabled the swap creation with:
Code:
root@st02:~# midclt call system.advanced.update '{"swapondrive": 0}'

{"id": 1, "consolemenu": true, "serialconsole": false, "serialport": "ttyS0", "serialspeed": "9600", "powerdaemon": false, "swapondrive": 0, "overprovision": null, "traceback": true, "advancedmode": false, "autotune": false, "debugkernel": false, "uploadcrash": true, "anonstats": true, "anonstats_token": "", "motd": "Welcome to TrueNAS", "boot_scrub": 7, "fqdn_syslog": false, "sed_user": "USER", "sysloglevel": "F_INFO", "syslogserver": "", "syslog_transport": "UDP", "kdump_enabled": false, "isolated_gpu_pci_ids": [], "kernel_extra_options": "", "syslog_tls_certificate": null, "syslog_tls_certificate_authority": null, "consolemsg": false}


I created a raidz2 pool with only SSD drives (there were no swap partitions). I wrote 1 TB of data with fio and had no issue.
I rebooted the servers and did not see any problem.

I created one more raidz2 pool with only HDD drives (there were no swap partitions). I wrote 1 TB of data with fio and had no issue.

I rebooted the servers, and during the boot sequence an HBA reset occurred and the HDD pool became UNAVAIL; the SSD pool is fine.

Dmesg and zpool status output attached.

If I run zpool clear, this is the result (the unavailable drives are random; they change every reboot):

Code:
  pool: hddtest
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-JQ
  scan: resilvered 288K in 00:00:01 with 0 errors on Sun Jul  9 10:14:41 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        hddtest                                   UNAVAIL      0     0     0  insufficient replicas
          raidz2-0                                UNAVAIL      0     0     0  insufficient replicas
            632ee37b-abf1-4d5d-8bd0-2efd927cf109  REMOVED      0     0     0
            7c80113e-a5b5-4290-b303-f6f2b6f7ce7f  REMOVED      0     0     0
            36f31d95-9e20-4f0e-9717-986a784a6b65  REMOVED      0     0     0
            3dfd5d37-be5f-4d0f-92ae-74aa137e4cd4  REMOVED      0     0     0
            c6482789-120d-4a57-a732-fd1279e2ebc2  REMOVED      0     0     0
            d32bafb9-b076-4cce-8421-310429c5d21a  REMOVED      0     0     0
            d115d997-0c9b-4031-8df5-b0c80ae4593e  REMOVED      0     0     0
            91a2cc2b-bc27-47f6-b127-018d41645a67  REMOVED      0     0     0
            03453b32-5793-4113-9ebd-64723a51b017  REMOVED      0     0     0
            e34d0917-70a4-4b00-8a99-34ce29c8e3ac  REMOVED      0     0     0
            a0b87b9c-2e68-4f74-ae61-f0250f2444ec  REMOVED      0     0     0
            4422dcf4-3dc2-41d4-800e-aaa3b1b4c5c5  ONLINE       0     4     0
            64bb4e1d-23e7-47be-ae78-42ff30dddb47  REMOVED      0     0     0
            71e226df-18d5-4ccf-ac1b-64dd69c49c9b  REMOVED      0     0     0
          raidz2-1                                ONLINE       0     0     0
            574761ca-a585-4a9d-ac2f-a9e8051ba5dd  ONLINE       0     0     0
            60bee690-1b9c-41e2-9ad7-c94a72411212  ONLINE       0     0     0
            cd04703b-d694-47bc-88a4-add49c01907c  ONLINE       0     0     0
            404c0baa-9f40-45b4-984d-92c75e03b94d  ONLINE       0     0     0
            89aa2e6c-b1dc-4141-814b-fd4f0acf5d85  ONLINE       0     0     0
            bd369b70-df3f-478e-b51e-313a0f9d0e2c  ONLINE       0     0     0
            a20862c1-7f7e-4dc0-b261-8485793bb170  ONLINE       0     0     0
            a997b627-aa70-4c51-ae25-b9f8adf519f8  ONLINE       0     0     0
            7bd6210b-b670-4520-aadc-ea5923825993  ONLINE       0     0     0
            9c8f9597-9817-476d-b063-b8483646781b  ONLINE       0     0     0
            6fc6792d-f095-4768-865a-1a5be5e9c636  ONLINE       0     0     0
            5341cce4-6f16-461b-a952-9e5737430037  ONLINE       0     0     0
            f4a6e721-b119-4476-928f-24eaaa9d7af6  ONLINE       0     0     0
            2a2364c1-5273-46ad-9f85-e27ca227ee9a  ONLINE       0     0     0

errors: No known data errors


After a few zpool clear commands, ZFS found the drives and the pool became usable again.
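Concretely, "a few zpool clear commands" means repeating something like this until all vdevs report ONLINE again:
Code:
zpool clear hddtest && zpool status -v hddtest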
Remember: I have 2x UCS S3260 servers and both servers have the same issue. The HDD drives are brand new.

Any idea?

Note: I don't have this problem if I boot as an "initial install" and import the pool manually, so the problem is something else.
How can I disable ZFS pool auto-import on TrueNAS?
 

Attachments

  • dmesg-after-reboot.txt
    440.9 KB

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
@sretalla @Patrick M. Hausen @Arwen
I've read the dmesg carefully, so let me tell you the story.

At the start of the boot sequence, the first drive-related logs look like this (they don't contain any errors, but maybe you can spot something that should not be there, so let's check):

Part 1:
After this point, the sd_mod kernel module ("SCSI disk support") starts checking the drives and attaching them (no errors, but I didn't like it; something feels strange):

Part 2:

After 45 seconds, sd_mod ("SCSI disk support") starts doing something again; I don't know why, and I want to find out.
Whether because of this or not, we start to see some problems, and mpt3sas kicks in and handles the removal.
Who requested the removal? I don't know.
[Sun Jul 9 09:56:20 2023] mpt3sas_cm1: removing handle(0x0012), sas_addr(0x5000c500cac022ca)

Part 3: https://gist.github.com/ozkangoksu/e58c07eaf672c5e9cae336560d3a9367

At the end of part 3, the pool is finally suspended by ZFS because of the absence of the drives:
- [Sun Jul 9 09:56:24 2023] WARNING: Pool 'hddtest' has encountered an uncorrectable I/O failure and has been suspended.


After all of this, the SAS HDD pool is suspended, but the SAS SSD pool is working without any problem.
The SAS HDDs are "SEAGATE ST4000NM017A" and they are brand new. I checked all of the drives with smartctl and all passed the long self-test.
The problem only occurs during the boot sequence; I can manually import the pool and use it without any issue.
If you've read this reply, you now know the whole story, line by line.

I think something is fighting over the drives, or sd_mod tries to get some information, fails, and gets mad about it:
Code:
[Sun Jul  9 09:56:20 2023] sd 6:0:5:0: [sdh] tag#4609 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=0s
[Sun Jul  9 09:56:20 2023] sd 6:0:5:0: [sdh] tag#4609 CDB: Write(16) 8a 00 00 00 00 00 01 6d d7 08 00 00 00 08 00 00
[Sun Jul  9 09:56:20 2023] blk_update_request: I/O error, dev sdh, sector 23975688 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0

After that, mpt3sas kicks in, removes the drive's interface connection, and re-creates it to recover:

Code:
[Sun Jul  9 09:56:21 2023] mpt3sas_cm1: mpt3sas_transport_port_remove: removed: sas_addr(0x5000c500cabfc96a)
[Sun Jul  9 09:56:21 2023] mpt3sas_cm1: removing handle(0x0014), sas_addr(0x5000c500cabfc96a)
[Sun Jul  9 09:56:21 2023] mpt3sas_cm1: enclosure logical id(0x540f078d86014000), slot(3)
[Sun Jul  9 09:56:21 2023] mpt3sas_cm1: enclosure level(0x0000), connector name(     )


Maybe these mpt3sas logs can give an idea about the problem. I will check after I post this.
Code:
[Sun Jul  9 09:56:16 2023] sd 6:0:32:0: device_block, handle(0x0032)
[Sun Jul  9 09:56:16 2023] sd 6:0:33:0: device_block, handle(0x0033)
[Sun Jul  9 09:56:16 2023] ses 6:0:34:0: _scsih_block_io_device skip device_block for SES handle(0x0034)
[Sun Jul  9 09:56:16 2023] ses 6:0:34:0: _scsih_block_io_device skip device_block for SES handle(0x0034)
[Sun Jul  9 09:56:16 2023] ses 6:0:34:0: _scsih_block_io_device skip device_block for SES handle(0x0034)
[Sun Jul  9 09:56:16 2023] ses 6:0:34:0: _scsih_block_io_device skip device_block for SES handle(0x0034)
[Sun Jul  9 09:56:16 2023] mpt3sas_cm1: log_info(0x3112010c): originator(PL), code(0x12), sub_code(0x010c)
[Sun Jul  9 09:56:16 2023] mpt3sas_cm1: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
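For anyone following along in the attached dmesg, the relevant lines can be pulled out with something like this (the grep pattern is just an example):
Code:
dmesg -T | grep -E 'mpt3sas|device_block|DID_NO_CONNECT|blk_update_request'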



Any ideas, folks?
What is the problem?
 


sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
UCS S3260
Ultimately, I think it's going to boil down to this:

That card and the LSI chip on it don't have a lot of hours of testing on them in this environment and the firmware hasn't likely been proven either as a result.

I don't know what the likely cause is, but it may be the queue depth point mentioned already or something else.

Rather than spending the time (and hundreds of dollars it could be equated to) trying to work this out in detail, why wouldn't you just spend a few hundred on cards that are well tested and known to work?
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
Ultimately, I think it's going to boil down to this:

That card and the LSI chip on it don't have a lot of hours of testing on them in this environment and the firmware hasn't likely been proven either as a result.

I don't know what the likely cause is, but it may be the queue depth point mentioned already or something else.

Rather than spending the time (and hundreds of dollars it could be equated to) trying to work this out in detail, why wouldn't you just spend a few hundred on cards that are well tested and known to work?

I totally agree with you.
In the past 10 years I have never seen an issue like this. This HBA is an OEM part, so I don't know what they changed or tuned.
I tried to find official firmware for this card, but I couldn't.
I think I have to compare the mpt3sas kernel parameters with the defaults; maybe I can find the difference.

At this point I have spent 3 days on this issue, and one way or another I want to find the root cause so I can learn from it.
Even if I end up changing the HBA card, I cannot leave an issue unsolved as an engineer :)

My new route:
1- Find a new HBA card; it should be compatible with:
UCS S3260 Dual Pass Through Controller based on Broadcom 3316 ROC - UCS-S3260-DHBA
mpt3sas_cm0: LSISAS3316: FWVersion(13.00.08.00), ChipRevision(0x01), BiosVersion(15.00.06.00)
mpt3sas_cm1: LSISAS3316: FWVersion(13.00.08.00), ChipRevision(0x01), BiosVersion(15.00.06.00)

Looks like "LSI SAS 9300-16I 12GB/S HBA" is similar and good choice. What do you think?

As you know, our HBA cards are OEM; I don't know their form factor or connections, and I'm not able to touch the hardware.

After a quick search I found these pictures. It looks like the HBA card has a special form factor, and I don't know whether we can change it or not.
hba_card_pci_slot.png


ucs_s3260m5_scality_design_7.png


I only have one PCI Express slot and it is used by the NVMe cache drive. Maybe I will sacrifice the NVMe and use the PCI slot for a 9300-16i HBA card. I also don't know the internal cabling; maybe I have to change cables to use the PCI slot location.
Unfortunately, it is not an easy change for me.


2- I will compare and experiment with mpt3sas kernel parameters to try to solve this issue.

I recall seeing something about an IO overload and limiting the concurrent IO; I think this is the issue, and I wonder if it is related here?

https://forum.proxmox.com/threads/s...-issue-with-lsi-2008-controllers.93781/page-2

Or 10000 seems popular. My SAS2008 runs with max_queue_depth=-1 == default, but I don't have many disks.
2008/3xxx both use mpt3sas driver.

Thanks samarium. I will first check my "mpt3sas.max_queue_depth" value, then change it to "mpt3sas.max_queue_depth=10000" and give it a try.

Is there a default mpt3sas parameter list for TrueNAS? I want to compare.

These are my default values:
Code:
root@st02[/sys/module/mpt3sas/parameters]# for i in $(ls); do echo $i; cat $i; done
diag_buffer_enable = -1
disable_discovery = -1
enable_sdev_max_qd = N
hbas_to_enumerate = 0
host_tagset_enable = 1
irqpoll_weight = -1
logging_level = 0
max_lun = 16895
max_msix_vectors = -1
max_queue_depth = -1
max_sectors = 65535
max_sgl_entries = -1
missing_delay = -1,-1
mpt3sas_fwfault_debug = 0
msix_disable = -1
perf_mode = -1
poll_queues = 0
prot_mask = -1
smp_affinity_enable = 1
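For the record, this is roughly how I plan to apply the change on SCALE. The kernel_extra_options field in the advanced config (visible in the JSON output earlier in this thread) looks like the supported way to persist a module parameter, but treat this as a sketch rather than a verified procedure:
Code:
midclt call system.advanced.update '{"kernel_extra_options": "mpt3sas.max_queue_depth=10000"}'
# reboot, then confirm the value actually in effect:
cat /sys/module/mpt3sas/parameters/max_queue_depth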
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
Thanks samarium. I will first check my "mpt3sas.max_queue_depth" value, then change it to "mpt3sas.max_queue_depth=10000" and give it a try.

Is there a default mpt3sas parameter list for TrueNAS? I want to compare.

These are my default values:
[mpt3sas parameter list quoted in full in the post above; max_queue_depth = -1]
I'm a little lazier than you:
(cd /sys/module/mpt3sas/parameters && grep . *)
but I get the same default of -1 for max_queue_depth, though I'm on SAS2008 anyway. I thought enable_sdev_max_qd might be an interesting parameter too. https://patchwork.kernel.org/projec...-git-send-email-sreekanth.reddy@broadcom.com/

I think max_queue_depth = -1 means the driver default is used, and like most parameters it only gets changed if it needs changing. Conveniently, the driver default is defined close to the top of https://github.com/torvalds/linux/blob/master/drivers/scsi/mpt3sas/mpt3sas_base.c as 30000, so 10000 would be significantly less.
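If you just want to confirm which knobs the loaded driver exposes, something like this works on Linux; it lists the parameter descriptions, not the compiled-in defaults:
Code:
modinfo -p mpt3sas | grep -i queue_depth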
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
@sretalla @Patrick M. Hausen @Arwen

You're not going to believe me, but I found the culprit!!!

The root cause is "ix-zfs.service".

1- I disabled the service and rebooted: the pools were not imported and there was no problem.
2- I imported the pools manually with "zpool import $poolname": everything was fine, no problem.
3- I rebooted the server again: the pools were not imported and there was no problem. I ran "systemctl start ix-zfs.service" and everything started to crash again, just like before. I'm 100% sure the root cause is this service.
4- I rebooted the server again and imported the pools manually: no problem. I ran "systemctl start ix-zfs.service" and again no problem...

The problem only happens if I let ix-zfs.service import the pools... o_O

Now I'm going to learn everything about this service and find the exact function that causes this problem.
If you don't believe me, I created 3 dmesg outputs; check for yourself...
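For the record, the repro sequence boils down to this (hddtest is just my test pool):
Code:
systemctl disable ix-zfs.service   # stop the middleware auto-import on boot
reboot
zpool import hddtest               # manual import: no resets, no errors
# reboot again, leave the pools un-imported, then:
systemctl start ix-zfs.service     # HBA resets and the pool suspension come back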

This is the service file:

Code:
cat ix-zfs.service
[Unit]
Description=Import ZFS pools
DefaultDependencies=no
Before=network-pre.target
Before=local-fs.target
After=middlewared.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=-midclt call disk.sed_unlock_all
ExecStart=midclt call -job --job-print description pool.import_on_boot
StandardOutput=null
TimeoutStartSec=15min

[Install]
WantedBy=multi-user.target
 

Attachments

  • dmesg-auto-import-disabled.txt
    223.2 KB
  • dmesg-ix-zfs-disabled-pools-manual-imported.txt
    261.3 KB
  • dmesg-ix-zfs-started-manually.txt
    365.6 KB

neofusion

Contributor
Joined
Apr 2, 2022
Messages
159
@sretalla @Patrick M. Hausen @Arwen

You're not going to believe me, but I found the culprit!!!

The root cause is "ix-zfs.service".

1- I disabled the service and rebooted: the pools were not imported and there was no problem.
2- I imported the pools manually with "zpool import $poolname": everything was fine, no problem.
3- I rebooted the server again: the pools were not imported and there was no problem. I ran "systemctl start ix-zfs.service" and everything started to crash again, just like before. I'm 100% sure the root cause is this service.
4- I rebooted the server again and imported the pools manually: no problem. I ran "systemctl start ix-zfs.service" and again no problem... So the problem happens when this service triggers "midclt call disk.sed_unlock_all"; now I've started to play around with these system calls. If it's easy for you, can you send the code block for this call? I want to read it...

The problem only happens if I let ix-zfs.service import the pools... o_O

Now I'm going to learn everything about this service and find the exact function that causes this problem.
If you don't believe me, I created 3 dmesg outputs; check for yourself...

This is the service file:

Code:
cat ix-zfs.service
[Unit]
Description=Import ZFS pools
DefaultDependencies=no
Before=network-pre.target
Before=local-fs.target
After=middlewared.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=-midclt call disk.sed_unlock_all
ExecStart=midclt call -job --job-print description pool.import_on_boot
StandardOutput=null
TimeoutStartSec=15min

[Install]
WantedBy=multi-user.target
Okay, but isn't it more likely that the problem is that your controllers misbehave when placed under pressure? Something that was first illustrated by them failing the moment md touched the disks, and now during ZFS import?

The errors in your boot dmesg show a plethora of read and write errors resulting in the drives getting dropped. To me, that suggests time is best spent focusing on the RAID controllers and maybe the cabling/backplane. Some RAID cards can be fully reflashed to run in IT mode as if they were a typical HBA. I don't trust that your cards are actually capable of that; I think they're misrepresenting their status.
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
Okay, but isn't it more likely that the problem is that your controllers misbehave when placed under pressure? Something that was first illustrated by them failing the moment md touched the disks, and now during ZFS import?

The errors in your boot dmesg show a plethora of read and write errors resulting in the drives getting dropped. To me, that suggests time is best spent focusing on the RAID controllers and maybe the cabling/backplane. Some RAID cards can be fully reflashed to run in IT mode as if they were a typical HBA. I don't trust that your cards are actually capable of that; I think they're misrepresenting their status.
I understand, but I cannot touch the servers; the HBA card is OEM, and I cannot use the same slot for a new HBA.
The only free place is one PCI slot, and that means I have to lose the NVMe cache drive.
That's why I'm pushing so hard.

Also, I don't believe this problem occurs because of stress. I tried writing data with 20 parallel processes and the pool did not even flinch...
So the problem is that the import_on_boot function does something. I'm starting to read the code right now, and one way or another I want to learn the CAUSE.


I'm on fire baby, whoohooo :cool:
 

neofusion

Contributor
Joined
Apr 2, 2022
Messages
159
I understand, but I cannot touch the servers; the HBA card is OEM, and I cannot use the same slot for a new HBA.
The only free place is one PCI slot, and that means I have to lose the NVMe cache drive.
That's why I'm pushing so hard.

Also, I don't believe this problem occurs because of stress. I tried writing data with 20 parallel processes and the pool did not even flinch...
So the problem is that the import_on_boot function does something. I'm starting to read the code right now, and one way or another I want to learn the CAUSE.


I'm on fire baby, whoohooo :cool:
The moment you sidestep one problem, another stands in line further down the road.
You're unlikely to resolve this by treating the symptoms.

Maybe you have no other option than to work with the equipment at hand, but be realistic: there is a real risk of data loss here.
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
The moment you sidestep one problem, another stands in line further down the road.
You're unlikely to resolve this by treating the symptoms.

Maybe you have no other option than to work with the equipment at hand, but be realistic: there is a real risk of data loss here.

When I first met this hardware, the first thing I said was: sell it, we can get a better option at a lower price.
But that is not an option, so I accepted it and started changing PARTS.
I really want to get rid of this OEM old-firmware HBA solution, but it does not look like just an OEM card. As I understand from the pictures I sent earlier, the cards have a different form factor and use a special slot.

The questions are:
1- Can we fit two HBAs in their slots???
2- Is it possible to use the ONLY ONE PCIe x8 slot for an HBA card? That means we sacrifice the NVMe cache...

This is not a normal server. It has 2 compute units; it is like a blade server. There are too many tricks the vendor has done with this hardware. I have used ZFS for more than 10 years and never lost even 1 byte...
The benchmark results are good, and I did not see any problem except the reboot problem, man...
So I'm trying to solve this problem because it is the last piece of the puzzle.
When I solve it, this project will be over for me.

So be sure of this: while trying to debug the function, I am also trying to find a way to change the HBAs.

The real question is that, one way or another, the service generates weird problems. Why are you not curious about this?
This is a serious issue for everyone, and if we solve it, everyone will be a little bit safer.


COMMUNITY: I DON'T HAVE A READY DEBUG ENVIRONMENT TO TEST THIS ISSUE, AND I'M A "C PERSON".
TO CREATE A DEBUG ENVIRONMENT WITH PYTHON, I HAVE TO SPEND 1-2 HOURS.
IF YOU HAVE A DEBUG ENVIRONMENT AND EXPERIENCE WITH TRUENAS, PLEASE HELP ME FOR AN HOUR TO RUN THIS FUNCTION IN STEP-BY-STEP DEBUG MODE AND FIND THE ROOT CAUSE. THANK YOU.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@morphin, I would suggest not performing the pool import using zpool. Use the NAS's CLI or GUI instead. If you import from the Unix command line, the NAS software does not start the services that use the pool.

I don't know the TrueNAS SCALE CLI well, but it is my understanding that you can launch a session from a Unix shell, or even script your pool import.
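For example, something along these lines should go through the middleware rather than raw zpool; I am assuming the method names from the SCALE middleware API, and the guid value is only a placeholder:
Code:
midclt call -job pool.import_find                        # list importable pools and their guids
midclt call -job pool.import_pool '{"guid": "1234567890123456789"}'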
 

morphin

Dabbler
Joined
Jun 27, 2023
Messages
31
@morphin, I would suggest not performing the pool import using zpool. Use the NAS's CLI or GUI instead. If you import from the Unix command line, the NAS software does not start the services that use the pool.

I don't know the TrueNAS SCALE CLI well, but it is my understanding that you can launch a session from a Unix shell, or even script your pool import.

Hello Arwen. I understand that, and we talked about this. I used zpool directly only for testing purposes.
My goal is to FIX this issue and regain the auto-import feature.

At this moment I'm trying to understand which call in the "import_on_boot" function generates this problem. Until then I have disabled ix-zfs.service, so I have the freedom to import any way I want. Later I'm going to hand this system over to someone who can only click buttons, so please don't worry about this anymore.
Code:
    
    def import_on_boot(self, job):
        cachedir = os.path.dirname(ZPOOL_CACHE_FILE)
        if not os.path.exists(cachedir):
            os.mkdir(cachedir)

        if self.middleware.call_sync('failover.licensed'):
            return

        zpool_cache_saved = f'{ZPOOL_CACHE_FILE}.saved'
        if os.path.exists(ZPOOL_KILLCACHE):
            with contextlib.suppress(Exception):
                os.unlink(ZPOOL_CACHE_FILE)
            with contextlib.suppress(Exception):
                os.unlink(zpool_cache_saved)
        else:
            with open(ZPOOL_KILLCACHE, 'w') as f:
                os.fsync(f)

        try:
            stat = os.stat(ZPOOL_CACHE_FILE)
            if stat.st_size > 0:
                copy = False
                if not os.path.exists(zpool_cache_saved):
                    copy = True
                else:
                    statsaved = os.stat(zpool_cache_saved)
                    if stat.st_mtime > statsaved.st_mtime:
                        copy = True
                if copy:
                    shutil.copy(ZPOOL_CACHE_FILE, zpool_cache_saved)
        except FileNotFoundError:
            pass

        job.set_progress(0, 'Beginning pools import')
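For testing, the same middleware code path can be exercised without a reboot by running the exact call from the unit file (this is literally the ExecStart line of ix-zfs.service):
Code:
midclt call -job --job-print description pool.import_on_boot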
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Let's rewind the clock here a bit, shall we?
You've identified several symptoms of the problem, most of which surround the middleware.

ix-zfs.service?

All it does is import the pool on boot. It literally exists for that function. So disabling it on your system will of course result in the pool not importing on boot...

But what we have not done yet is isolate the root cause of the problem. Speculation about the SAS HBA's validity is probably part of the conversation here. "9361-16i" or "SAS3316" haven't even made it onto first-party hardware yet. I have a TrueNAS M50 I bought on eBay, which I believe is the current generation, and it uses a SAS3216.

The output below indicates that the OS sees all of the drives properly enumerated. I do notice that SCSI device 6:0:28:0 reports as "enclosu Cisco C3260", so a SAS expander seems to also be in play; given that we have 60 drives, that's no surprise.
Code:
admin@st01[~]$ lsscsi -s
[0:0:0:0]    enclosu Cisco    C3260            2     -               -
[0:0:1:0]    enclosu Cisco    C3260            2     -               -
[1:0:0:0]    disk    ATA      Samsung SSD 870  2B6Q  /dev/sda    500GB (OS  drive from different sata controller)
[2:0:0:0]    disk    ATA      Samsung SSD 870  2B6Q  /dev/sdb    500GB (OS  drive from different sata controller)
[5:0:0:0]    enclosu AHCI     SGPIO Enclosure  2.00  -               -
[6:0:0:0]    disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdc   4.00TB
[6:0:1:0]    disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdd   4.00TB
[6:0:2:0]    disk    TOSHIBA  MG04SCA40EN      5705  /dev/sde   4.00TB
[6:0:3:0]    disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdf   4.00TB
[6:0:4:0]    disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdg   4.00TB
[6:0:5:0]    disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdh   4.00TB
[6:0:6:0]    disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdi   4.00TB
[6:0:7:0]    disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdj   4.00TB
[6:0:8:0]    disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdk   4.00TB
[6:0:9:0]    disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdl   4.00TB
[6:0:10:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdm   4.00TB
[6:0:11:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdn   4.00TB
[6:0:12:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdo   4.00TB
[6:0:13:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdp   4.00TB
[6:0:14:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdq   4.00TB
[6:0:15:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdr   4.00TB
[6:0:16:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sds   4.00TB
[6:0:17:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdt   4.00TB
[6:0:18:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdu   4.00TB
[6:0:19:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdv   4.00TB
[6:0:20:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdw   4.00TB
[6:0:21:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdx   4.00TB
[6:0:22:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdy   4.00TB
[6:0:23:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdz   4.00TB
[6:0:24:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdaa  4.00TB
[6:0:25:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdab  4.00TB
[6:0:26:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdac  4.00TB
[6:0:27:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdad  4.00TB
[6:0:28:0]   enclosu Cisco    C3260            2     -               -
[6:0:29:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdae  4.00TB
[6:0:30:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdaf  4.00TB
[6:0:31:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdag  4.00TB
[6:0:32:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdah  4.00TB
[6:0:33:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdai  4.00TB
[6:0:34:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdaj  4.00TB
[6:0:35:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdak  4.00TB
[6:0:36:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdal  4.00TB
[6:0:37:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdam  4.00TB
[6:0:38:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdan  4.00TB
[6:0:39:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdao  4.00TB
[6:0:40:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdap  4.00TB
[6:0:41:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdaq  4.00TB
[6:0:42:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdar  4.00TB
[6:0:43:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdas  4.00TB
[6:0:44:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdat  4.00TB
[6:0:45:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdau  4.00TB
[6:0:46:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdav  4.00TB
[6:0:47:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdaw  4.00TB
[6:0:48:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdax  4.00TB
[6:0:49:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sday  4.00TB
[6:0:50:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdaz  4.00TB
[6:0:51:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdba  4.00TB
[6:0:52:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdbb  4.00TB
[6:0:53:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdbc  4.00TB
[6:0:54:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdbd  4.00TB
[6:0:55:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdbe  4.00TB
[6:0:56:0]   disk    TOSHIBA  MG04SCA40EN      5705  /dev/sdbf  4.00TB


Let's see if
Code:
camcontrol devlist

Gives us a bit more to work on.

I also noticed there were conversations about manually creating the pool using a non-standard partition layout. partuuid is used for a reason, especially in larger pools.

Before we continue down these rabbit holes: stop and rebase for a second here.
The particular card in question is an https://www.broadcom.com/products/storage/raid-on-chip/sas-3316
These controllers are supported by the mpr driver (that list is from FreeBSD 13.1, and even 4 newer generations are supported):
  • Broadcom Ltd./Avago Tech (LSI) SAS 3004 (4 Port SAS)
  • Broadcom Ltd./Avago Tech (LSI) SAS 3008 (8 Port SAS)
  • Broadcom Ltd./Avago Tech (LSI) SAS 3108 (8 Port SAS)
  • Broadcom Ltd./Avago Tech (LSI) SAS 3216 (16 Port SAS)
  • Broadcom Ltd./Avago Tech (LSI) SAS 3224 (24 Port SAS)
  • Broadcom Ltd./Avago Tech (LSI) SAS 3316 (16 Port SAS)
  • Broadcom Ltd./Avago Tech (LSI) SAS 3324 (24 Port SAS)
  • Broadcom Ltd./Avago Tech (LSI) SAS 3408 (8 Port SAS/PCIe)
  • Broadcom Ltd./Avago Tech (LSI) SAS 3416 (16 Port SAS/PCIe)
  • Broadcom Ltd./Avago Tech (LSI) SAS 3508 (8 Port SAS/PCIe)
  • Broadcom Ltd./Avago Tech (LSI) SAS 3516 (16 Port SAS/PCIe)
  • Broadcom Ltd./Avago Tech (LSI) SAS 3616 (16 Port SAS/PCIe)
  • Broadcom Ltd./Avago Tech (LSI) SAS 3708 (8 Port SAS/PCIe)
  • Broadcom Ltd./Avago Tech (LSI) SAS 3716 (16 Port SAS/PCIe)
Current for Broadcom is the SAS3816, I believe, not that you'll see it anywhere yet AFAIK.
Can you upload the contents of /var/log/messages and /var/log/middlewared.log as text files?
OP, please stop futzing with the middleware; you are ONLY going to make things worse.
 

neofusion

Contributor
Joined
Apr 2, 2022
Messages
159
Let's rewind the clock here a bit, shall we?
You've identified several symptoms of the problem, most of which surround the middleware.

ix-zfs.service?

All it does is import the pool on boot. It literally exists for that function. So disabling it on your system will of course result in the pool not importing on boot...

But what we have not done yet is isolate the root cause of the problem. Speculation about the SAS HBA's validity is probably part of the conversation here. "9361-16i" or "SAS3316" haven't even made it onto first-party hardware yet. I have a TrueNAS M50 I bought on eBay, which I believe is the current generation, and it uses a SAS3216.

The output below indicates that the OS sees all of the drives properly enumerated. I do notice that SCSI device 6:0:28:0 reports as "enclosu Cisco C3260", so a SAS expander seems to also be in play; given that we have 60 drives, that's no surprise.
[lsscsi -s output snipped; it is quoted in full in NickF's post above]


Let's see if
Code:
camcontrol devlist

Gives us a bit more to work on.

I also noticed there were conversations about manually creating the pool using a non-standard partition layout. partuuid is used for a reason, especially in larger pools.

Before we continue down these rabbit holes: stop and rebase for a second here.
The particular card in question is an https://www.broadcom.com/products/storage/raid-on-chip/sas-3316
That is from FreeBSD 13.1, and even 4 newer generations are supported.

Current for Broadcom is the SAS3816, I believe, not that you'll see it anywhere yet AFAIK.
Can you upload the contents of /var/log/messages and /var/log/middlewared.log as text files?
OP, please stop futzing with the middleware; you are ONLY going to make things worse.
It's SCALE, so camcontrol isn't available; perhaps the output of lsblk -S is an acceptable alternative?
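For example, something like this should give comparable transport/vendor/model information on Linux; the column list is just a suggestion:
Code:
lsblk -S -o NAME,HCTL,TRAN,VENDOR,MODEL,REV,STATE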
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
HA! My bad, I don't know why I had thought this was CORE. I think the general logic of my above comment still holds though.
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
If it is load-induced, then I think it is worthwhile testing a limit on the driver queue depth, which was the solution for other people with load issues; otherwise I would not have found that link.
 