TrueNAS Core hangs every other boot at SAS card after CPU upgrade

potatosword

Dabbler
Joined
Dec 30, 2018
Messages
44
I just upgraded my server from a G3220 to an i7-4790S (a substantial leap, I think you'll agree). The upgrade was uneventful until I plugged my TrueNAS boot media back into their USB ports. Now the boot hangs intermittently at the following line:

mps0: <Avago Technologies (LSI) SAS2008> at device 0.0 on pci1

At first I thought I had bumped a power cable or something, so I fully powered off the machine and checked everything. It was all fine. I booted up and it went all the way to TrueNAS launching! Something didn't feel right, so on a hunch I rebooted the machine from the TrueNAS menu, and it hung again at the line above.

Then I fully powered off the machine again to replicate the conditions of the successful boot, and it hung again. Then I hit the reset button and it booted up fine. I'm very perplexed here, but I suspect it's a boot media issue. On one successful boot I ran a scrub on the boot pool and that went fine.

My boot pool is a mirror of an M.2 SSD in a USB enclosure and an SSD on a USB-to-SATA adapter.

As additional info, I grabbed this from the top of the log after a successful boot. It may or may not be relevant.
Oct 8 08:14:25 freenas 1 2023-10-08T08:14:25.365970-05:00 freenas.local daemon 2172 - - 2023-10-08 08:14:25,365:wsdd WARNING(pid 2173): no interface given, using all interfaces
Oct 8 08:14:26 freenas kernel: igb3: link state changed to UP
umass0: SCSI over Bulk-Only; quirks = 0x0100
Oct 8 08:14:18 freenas umass0:10:0: Attached to scbus10
Oct 8 08:14:18 freenas (probe6:umass-sim0:0:0:0): REPORT LUNS. CDB: a0 00 00 00 00 00 00 00 00 10 00 00
Oct 8 08:14:18 freenas (probe6:umass-sim0:0:0:0): CAM status: SCSI Status Error
Oct 8 08:14:18 freenas (probe6:umass-sim0:0:0:0): SCSI status: Check Condition
Oct 8 08:14:18 freenas (probe6:umass-sim0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
Oct 8 08:14:18 freenas (probe6:umass-sim0:0:0:0): Error 22, Unretryable error
Oct 8 08:14:18 freenas Root mount waiting for: CAM
Oct 8 08:14:18 freenas Root[1731]: Last message 'mount waiting for: C' repeated 1 times, suppressed by syslog-ng on freenas.local

Any help would be appreciated. I don't like resetting my server multiple times to get it to boot up.
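
The CAM lines in the log excerpt above are easier to compare between a good boot and a hung one when they are grouped per device. A minimal sketch in Python, assuming the `(periph:sim:bus:target:lun): message` layout seen in the excerpt; the function name `group_cam_messages` and the `SAMPLE_LINES` list are made up for illustration:

```python
import re
from collections import defaultdict

# Matches the CAM part of a syslog line, e.g.
#   "... (probe6:umass-sim0:0:0:0): CAM status: SCSI Status Error"
CAM_LINE = re.compile(
    r"\((?P<periph>[a-z]+\d+):(?P<sim>[\w-]+):(?P<addr>[\d:]+)\):\s*(?P<msg>.+)$"
)

def group_cam_messages(lines):
    """Group CAM messages by (peripheral, SIM) so each device's errors read together."""
    grouped = defaultdict(list)
    for line in lines:
        m = CAM_LINE.search(line)
        if m:
            grouped[(m["periph"], m["sim"])].append(m["msg"].strip())
    return dict(grouped)

# Illustrative input: three lines copied from the excerpt in this post.
SAMPLE_LINES = [
    "Oct 8 08:14:18 freenas (probe6:umass-sim0:0:0:0): CAM status: SCSI Status Error",
    "Oct 8 08:14:18 freenas (probe6:umass-sim0:0:0:0): Error 22, Unretryable error",
    "Oct 8 08:14:18 freenas Root mount waiting for: CAM",
]

for (periph, sim), messages in group_cam_messages(SAMPLE_LINES).items():
    print(f"{periph} on {sim}:")
    for message in messages:
        print(f"  {message}")
```

Feeding it the lines of a full log file instead of the sample list would show whether the REPORT LUNS rejection appears only on the USB boot devices or on other controllers as well.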
 

potatosword

Dabbler
Joined
Dec 30, 2018
Messages
44
As an update: on a whim I pulled out the video card that I had installed for future VM passthrough. The system now boots fine from either boot disk individually, from any USB port on the system. The problem is that I would actually like to use that video card at some point, and it's part of the reason I got a new CPU. I don't think it's a power issue: there's a 700W power supply in there, and according to my Kill A Watt the system only draws about 220W on startup (without the video card, which is only a GTX 1050).

Would this be related to virtualization in some way? Why on earth would removing the video card allow the system to boot consistently?
 

potatosword

Dabbler
Joined
Dec 30, 2018
Messages
44
This appears to be PCI passthrough related. I reinstalled TrueNAS (I had messed up my mirror while testing the boot disks individually) and everything ran fine, with the video card installed, for about a week. Several reboots with no issue.

Then I decided to start working on my NVR project by following this guide for PCI passthrough. I set my tunables and rebooted the machine. Great! Booted up just fine and I watched my monitor jump over to the VM once it started.

Subsequent boots, however, failed, and I had to pull the video card to get back into TrueNAS. After I disabled the tunables and reinstalled the video card, it booted up fine.

Edit: OK, so I reset everything again, put my tunables back in, and got the passthrough devices attached to the VM. That works, but now every time I set this up, drives ada0 and ada1 both lose access to their SMART information (including test history). If I remove the passthrough device from the VM and disable my tunables, the SMART issue goes away. This is one wild ride.
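
One thing that could be ruled out for the SMART problem is whether the passthrough tunables end up claiming more than the video card. The sketch below is only that, a sketch: it assumes the usual `pciconf -l` line shape (`driverN@pciD:B:S:F:` followed by `class=0x...`) and that passed-through devices attach to the `ppt` driver. The sample text is hypothetical and not taken from this system, and the function names are made up for illustration.

```python
import re

# Assumed `pciconf -l` line shape:
#   "<driver><unit>@pci<domain>:<bus>:<slot>:<func>:  class=0x030000 ..."
PCICONF_LINE = re.compile(
    r"^(?P<driver>\w+?)(?P<unit>\d+)@pci(?P<domain>\d+):"
    r"(?P<bus>\d+):(?P<slot>\d+):(?P<func>\d+):\s+class=0x(?P<cls>[0-9a-fA-F]{6})"
)

def parse_pciconf(text):
    """Return one dict per device: driver name, bus/slot/function, PCI class code."""
    devices = []
    for line in text.splitlines():
        m = PCICONF_LINE.match(line.strip())
        if m:
            devices.append({
                "driver": m["driver"],
                "bsf": f'{int(m["bus"])}/{int(m["slot"])}/{int(m["func"])}',
                "class": m["cls"].lower(),
            })
    return devices

def passthrough_devices(devices):
    """Devices currently attached to the ppt (passthrough) driver."""
    return [d for d in devices if d["driver"] == "ppt"]

def storage_claimed_by_ppt(devices):
    """ppt devices whose class code starts with 01 (mass storage controllers)."""
    return [d for d in passthrough_devices(devices) if d["class"].startswith("01")]

# Hypothetical output, for illustration only.
SAMPLE = """\
ahci0@pci0:0:31:2:  class=0x010601 rev=0x00 hdr=0x00
mps0@pci0:1:0:0:    class=0x010700 rev=0x00 hdr=0x00
ppt0@pci0:2:0:0:    class=0x030000 rev=0x00 hdr=0x00
ppt1@pci0:2:0:1:    class=0x040300 rev=0x00 hdr=0x00
"""

devices = parse_pciconf(SAMPLE)
print("passed through:", [d["bsf"] for d in passthrough_devices(devices)])
print("storage controllers on ppt:", storage_claimed_by_ppt(devices))
```

If the second list were ever non-empty on the real machine, a disk controller would be reserved for the VM, which would be one possible explanation for the host losing SMART access. An empty list only rules that one cause out.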
 