NVMe issue on virtualized TrueNAS

Sprint

Explorer
Joined
Mar 30, 2019
Messages
72
Hi All

So I've been running a virtualized FreeNAS/TrueNAS box for nearly a year now, relatively issue-free, allowing me to combine my VM hypervisor and storage into one machine. (See my build thread here.) That thread has the full details, but the core specs are below:

X10DRL-i board
2x E5-2630 v4 10-core CPUs
256GB ECC 2133MHz RAM
8x 8TB Reds
8x 8TB Golds
4x 860 EVO SATA SSDs
3x 960 Pro SATA SSDs
1x 240GB Optane 900P
3x 1TB 970 EVO NVMe drives (the offending drives)
10Gb Intel X550
3x LSI 9211-8i's
2x cheap 120GB SSDs for the FreeNAS VM to live on
32GB USB stick for ESX to boot off
1000W PSU
TrueNAS 12.0
ESX 6.7U3


Since the original build (though I've had it for nearly 6 months now, I think) I installed an ASRock 4x NVMe (x16 slot) card, to which I connected a single Optane 900P and, for testing the board's 4x4x4x4 bifurcation, a 500GB Samsung EVO. It works superbly: the Optane is carved up into two small SLOGs for two pools and a large L2ARC for another, and I was able to use the 500GB NVMe independently without issue too. Since then the 500GB drive has been removed and installed into the wife's gaming PC (which is what I'd bought it for), and it's been plain sailing, with superb transfer rates and the Optane drive taking any abuse I could throw at it!

As of the last few months, the 3x 512GB 860 Pro SATA SSDs (in RAIDZ1) that I use as an iSCSI store for my VMs have reached 88% capacity, so with Amazon having a sale I was able to pick up 3x 1TB 970 EVO M.2 drives for a really good price. I installed these in the remaining 3 M.2 slots, passed them through to TrueNAS, and created a pool without problem...

This is where the issues started....

As a simple test, I created a RAIDZ1 pool, created a dataset, shared it over SMB, and tried to copy some large video files to it. After a few seconds at transfer rates of around 4Gbps, it stops dead, I start seeing sporadic errors on the console, and it never recovers.
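
For reference, what I did is roughly the following from the shell (I actually set it all up in the web UI, and the device names below are just placeholders for the three 970 EVOs):

Code:
# create a RAIDZ1 test pool across the three 970 EVOs (device names are placeholders)
zpool create nvmetest raidz nvd0 nvd1 nvd2
# create a dataset to copy the video files into
zfs create nvmetest/scratch
# the SMB share on nvmetest/scratch was then set up in the TrueNAS web UI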

At this point I'm lost. I did a lot of searching, but the only references I found were from people who, like me, were adding Optane to a FreeNAS VM and needed to make a tweak to the passthru.map file in ESX. I don't know if this is the same issue, and if it is, whether I need to add to the file and what the entry would be for a Samsung drive? I also saw a few posts from various sources stating this might be a FreeBSD issue. Either way, I'm looking for ideas. (Screenshots and pretty pictures attached.)
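
For context, the tweak those posts describe is adding a device override line to /etc/vmware/passthru.map on the ESX host. The entry I already have in there for the Optane looks like this (columns are vendor ID, device ID, reset method, and whether the device is shareable); what I don't know is what the equivalent line for a Samsung 970 EVO would need to be:

Code:
# Intel Optane 900P
8086 2700 d3d0 false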

Any and all thoughts and input would be greatly appreciated.

Thanks in advance
 

Attachments

  • FCF8CC71-43EC-4A51-BF2F-A901CFFC9455.JPG (367.2 KB)
  • Transfer.JPG (29.1 KB)
  • IMG_5568.JPG (336.7 KB)
  • IMG_5567.JPG (397.1 KB)
  • Errors.JPG (91.3 KB)
  • Errors 2.JPG (88.9 KB)

Sprint

Explorer
Joined
Mar 30, 2019
Messages
72
So I've had another play this morning and am noticing some interesting results. If I take just two of the drives, I still get the occasional error on the console, but throughput is pretty good. As soon as I introduce the third drive, either into a 3-drive stripe or a RAIDZ1, performance becomes sporadic. One moment I'll get good throughput, able to max out my 10Gb link with ease, then it'll tank to a near stop while producing the error "syslog-ng[2917]: I/O error occurred while writing; fd='xx', error='Connection refused (61)'", followed by "nvmeX: Missing interrupt", with the X being either nvme2 or nvme3.
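
In case anyone wants to see the raw messages, I'm just watching for them from a shell while the copy runs, roughly:

Code:
# follow the system log for the nvme / syslog-ng errors (kernel messages end up here on my box)
tail -f /var/log/messages | grep -iE 'nvme|syslog-ng'
# or check the kernel message buffer directly
dmesg | grep -i 'missing interrupt'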

Will continue testing and looking for patterns.
 

Sprint

Explorer
Joined
Mar 30, 2019
Messages
72
Also, I passed the NVMe drives through to a Windows VM and they work flawlessly, all 3 drives hitting the same transfer rates and IOPS, so it certainly seems to suggest it's something to do with TrueNAS?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
One of the problems with virtualizing FreeNAS/TrueNAS, and why we tend to discourage it here in the forums, is because you are stacking two incredibly complicated software products on top of each other, introducing further complexity in the form of the mainboard, and in your case, also adding additional challenges on the PCIe passthru front. I can virtually (har har) guarantee that no one here has your setup. It's possible that there's a problem with FreeBSD, but it could also equally be a problem with that specific version of ESXi, the mainboard, etc.

Since you are not that likely to get a solid answer on this, I would suggest you might want to search for NVMe and MSI/MSI-X problems with ESXi on the forums. My memory is saying that this sounds vaguely familiar but I do not have the time this morning to go off tilting at windmills.
 

Sprint

Explorer
Joined
Mar 30, 2019
Messages
72
One of the problems with virtualizing FreeNAS/TrueNAS, and why we tend to discourage it here in the forums, is because you are stacking two incredibly complicated software products on top of each other, introducing further complexity in the form of the mainboard, and in your case, also adding additional challenges on the PCIe passthru front. I can virtually (har har) guarantee that no one here has your setup. It's possible that there's a problem with FreeBSD, but it could also equally be a problem with that specific version of ESXi, the mainboard, etc.

Since you are not that likely to get a solid answer on this, I would suggest you might want to search for NVMe and MSI/MSI-X problems with ESXi on the forums. My memory is saying that this sounds vaguely familiar but I do not have the time this morning to go off tilting at windmills.

Yeah, I appreciate the bespoke nature of my build, and I have considered separating out my hypervisor and storage. It's an option that remains open, but it requires more hardware and rack space :( (including a 10Gb switch, so I can keep iSCSI on TrueNAS and still be able to connect TrueNAS, a separate hypervisor AND my editing desktop; at the moment I'm just using a p2p 10Gb link between TrueNAS and my editing rig).

In the meantime, I appreciate the steer. I'll have a little search around MSI/MSI-X (reading up on what that is first), go from there, and report back. Thanks again.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
We did experience similar problems without the hypervisor. The solution was twofold:
- update from FreeNAS 11.2 to 11.3 - Warner Losh did some improvements in the FreeBSD NVMe code - but you are on 12.0 already
- flash an updated firmware to all our Intel SSDs

That finally solved it. I'd check for updates for your SSDs.
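
If it helps, you can read the current firmware revision straight from the TrueNAS shell with nvmecontrol before and after flashing, something like this (nvme0 is just an example device name):

Code:
# list the NVMe controllers the OS can see
nvmecontrol devlist
# the controller identify data includes the firmware revision
nvmecontrol identify nvme0 | grep -i firmware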
 

Sprint

Explorer
Joined
Mar 30, 2019
Messages
72
We did experience similar problems without the hypervisor. The solution was twofold:
- update from FreeNAS 11.2 to 11.3 - Warner Losh did some improvements in the FreeBSD NVMe code - but you are on 12.0 already
- flash an updated firmware to all our Intel SSDs

That finally solved it. I'd check for updates for your SSDs.

The first thing I did was pass them through to a Windows VM so I could do exactly that: I installed the Samsung Magician software, checked that they all had the latest firmware (which they did), and ran benchmark tests on them all.

The only thing that isn't the latest and greatest is ESX 6.7U3... I believe 7.0 is out now... but my fear there is that I'm likely going to break something else and still not solve the issue :(
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
You could try installing TrueNAS to a USB medium, boot that, import your pool. Then check if the problem persists. I would expect as much, I doubt it's the hypervisor at the moment, but only because I have seen the "lost interrupt" problem without one, too.

Then you could open a bug report with FreeBSD ...

HTH,
Patrick
 

Sprint

Explorer
Joined
Mar 30, 2019
Messages
72
You could try installing TrueNAS to a USB medium, boot that, import your pool. Then check if the problem persists. I would expect as much, I doubt it's the hypervisor at the moment, but only because I have seen the "lost interrupt" problem without one, too.

Then you could open a bug report with FreeBSD ...

HTH,
Patrick

That's a cracking idea, take ESX out of the loop entirely! Once I shut it down again after testing, having imported the pools into that USB instance of TrueNAS won't cause a problem when I boot back into ESX/TrueNAS, will it? If not, I'll give that a whirl and see how I get on.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Won't pose a problem. Just export the pool before each change of installation.
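
A minimal sketch of the round trip from the shell, assuming a pool named nvmetest (the export/disconnect and import dialogs in the web UI do the same thing):

Code:
# on the installation you are leaving: cleanly export the pool
zpool export nvmetest
# on the installation you boot into next: import it again
zpool import nvmetest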
 

Sprint

Explorer
Joined
Mar 30, 2019
Messages
72
So: I exported my pools, restarted, installed a fresh copy of TrueNAS to USB, imported my main pool with its NVMe SLOG, and built a new pool using the 3 new NVMe drives. So far it works perfectly, so thanks for the suggestion, Patrick!

I'm moving data over the network and maxing out the 10Gb link with ease, but importantly there are no errors or missing-interrupt messages. So I guess this points the finger squarely at ESX?... But how come it works with the Optane in port 1 of the card?

The issue is, to separate out the box I'd need a new chassis, new PSU, new board (I already have heaps of RAM floating about and a spare 10-core Xeon), plus a 10Gb NIC and a 10Gb switch... so as you can imagine, far from my first choice :( Where should I start with troubleshooting ESX? Should I try a clean install of TrueNAS in VMware, could that help? Could installing it as UEFI help? (I think it's currently in BIOS mode.)

Open to ideas that save me spending hundreds of pounds :(
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Sorry, I don't have a clue what exactly goes wrong in the virtualised case. I only learned at EuroBSDCon that this interrupt multiplexing and handling stuff is complicated. So something about the interrupt routing is different, even with PCI passthrough.

What workloads are you running in ESXi? Could you move them to TrueNAS instead? You know it is not only storage, right?
I run 1 Windows 10 and 4 Linux VMs happily in my TrueNAS CORE 12.
 

Sprint

Explorer
Joined
Mar 30, 2019
Messages
72
Sorry, I don't have a clue what exactly goes wrong in the virtualised case. I only learned at EuroBSDCon that this interrupt multiplexing and handling stuff is complicated. So something about the interrupt routing is different, even with PCI passthrough.

What workloads are you running in ESXi? Could you move them to TrueNAS instead? You know it is not only storage, right?
I run 1 Windows 10 and 4 Linux VMs happily in my TrueNAS CORE 12.

I have played with the hypervisor in FreeNAS and really didn't get on with it. My VMs kept freezing, so sadly it's no use to me. I run about 7 to 8 VMs; some are very small, others are big, heavy media servers (I don't like using jails/plugins, I prefer to keep them standalone).
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
You could try TrueNAS SCALE ...
 

Sprint

Explorer
Joined
Mar 30, 2019
Messages
72
You could try TrueNAS SCALE ...

So I had "heard" of TrueNAS SCALE but hadn't really looked into what it was... having just gone off and taken a look... yes... VERY interesting... It might be a challenge to migrate my ESX VMs (most I can rebuild, but my Plex VM I really don't want to have to rebuild).
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I don't have a 970 EVO here to check, can I get an lspci -nn from your VMware host, showing the PCI ID for the 970 EVO? I believe it should be

Code:
# Samsung 970 EVO
144d a808 d3d0 false


though. Power the FreeNAS VM off, add that to the bottom of /etc/vmware/passthru.map and power it back on.

If that doesn't work, try disabling MSI interrupts. Make a backup of your FreeNAS .vmx file and then add the lines:

pciPassthru0.msiEnabled = "FALSE"

However, if you passed your Optane through first, that will be pciPassthru0 - change that to 1, 2, or 3 for the EVO cards. (And you'll have to do all three lines to make it take effect.)
 
Last edited:

Sprint

Explorer
Joined
Mar 30, 2019
Messages
72
I don't have a 970 EVO here to check, can I get an lspci from your VMware host, showing the PCI ID for the 970 EVO? I believe it should be

Code:
# Samsung 970 EVO
144d a808 d3d0 false


though. Power the FreeNAS VM off, add that to the bottom of /etc/vmware/passthru.map and power it back on.

If that doesn't work, try disabling MSI interrupts. Make a backup of your FreeNAS .vmx file and then add the lines:

pciPassthru0.msiEnabled = "FALSE"

However, if you passed your Optane through first, that will be pciPassthru0 - change that to 1, 2, or 3 for the EVO cards. (And you'll have to do all three lines to make it take effect.)

Output for that command is as below:
0000:04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981 [vmhba4]
0000:05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981 [vmhba5]
0000:06:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981 [vmhba6]

My current passthru.map looks like this:


# Intel 82579LM Gig NIC can be reset with d3d0
8086 1502 d3d0 default
# Intel 82598 10Gig cards can be reset with d3d0
8086 10b6 d3d0 default
8086 10c6 d3d0 default
8086 10c7 d3d0 default
8086 10c8 d3d0 default
8086 10dd d3d0 default
# Broadcom 57710/57711/57712 10Gig cards are not shareable
14e4 164e default false
14e4 164f default false
14e4 1650 default false
14e4 1662 link false
# Qlogic 8Gb FC card can not be shared
1077 2532 default false
# LSILogic 1068 based SAS controllers
1000 0056 d3d0 default
1000 0058 d3d0 default
# NVIDIA
10de ffff bridge false
# Intel Optane 900P
8086 2700 d3d0 false


So I've added the lines you suggested, and am booting the VM back up now, and will report back shortly. Thank you for taking the time to have a look at this :) Greatly appreciated
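
i.e. the bottom of my passthru.map now ends with:

Code:
# Samsung 970 EVO
144d a808 d3d0 false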
 

Sprint

Explorer
Joined
Mar 30, 2019
Messages
72
So the first trick didn't work, so I'm now attempting trick two... Below are the pciPassthru entries from my .vmx. I have 3 LSI cards passed through as well, so I think that's throwing the numbering off. Using a bit of logic from what you said above, I'm assuming the entries I'm interested in are the three with vendor ID 0x144d:

scsi0:1.fileName = "/vmfs/volumes/5e9b28e1-bbd8cf6e-86eb-ac1f6b781c6e/FreeNas/FreeNas_2-000001.vmdk"
sched.scsi0:1.shares = "normal"
sched.scsi0:1.throughputCap = "off"
scsi0:1.present = "TRUE"
pciPassthru0.id = "00000:001:00.0"
pciPassthru0.deviceId = "0x0087"
pciPassthru0.vendorId = "0x1000"
pciPassthru0.systemId = "5ca12a2a-6606-028e-a31f-ac1f6b781c6e"
pciPassthru0.present = "TRUE"
pciPassthru1.id = "00000:002:00.0"
pciPassthru1.deviceId = "0x0087"
pciPassthru1.vendorId = "0x1000"
pciPassthru1.systemId = "5ca12a2a-6606-028e-a31f-ac1f6b781c6e"
pciPassthru1.present = "TRUE"
pciPassthru2.id = "00000:003:00.0"
pciPassthru2.deviceId = "0x2700"
pciPassthru2.vendorId = "0x8086"
pciPassthru2.systemId = "5ca12a2a-6606-028e-a31f-ac1f6b781c6e"
pciPassthru2.present = "TRUE"
pciPassthru6.id = "00000:130:00.0"
pciPassthru6.deviceId = "0x0087"
pciPassthru6.vendorId = "0x1000"
pciPassthru6.systemId = "5ca12a2a-6606-028e-a31f-ac1f6b781c6e"
pciPassthru6.present = "TRUE"
pciPassthru3.id = "00000:004:00.0"
pciPassthru3.deviceId = "0xa808"
pciPassthru3.vendorId = "0x144d"
pciPassthru3.systemId = "5ca12a2a-6606-028e-a31f-ac1f6b781c6e"
pciPassthru3.present = "TRUE"
pciPassthru4.id = "00000:005:00.0"
pciPassthru4.deviceId = "0xa808"
pciPassthru4.vendorId = "0x144d"
pciPassthru4.systemId = "5ca12a2a-6606-028e-a31f-ac1f6b781c6e"
pciPassthru4.present = "TRUE"
pciPassthru5.id = "00000:006:00.0"
pciPassthru5.deviceId = "0xa808"
pciPassthru5.vendorId = "0x144d"
pciPassthru5.systemId = "5ca12a2a-6606-028e-a31f-ac1f6b781c6e"



I've set msiEnabled to "FALSE" for those three entries, and will re-test :)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Good to know I was on the money for the PCI vendor ID, sorry it didn't work. Let me know if the changes below help:

Code:
pciPassthru3.msiEnabled = "FALSE"
pciPassthru4.msiEnabled = "FALSE"
pciPassthru5.msiEnabled = "FALSE"


I forgot the -nn flag (numeric and name IDs) in lspci - sorry about that. Try changing the d3d0 to default instead in the passthru.map to see if the default reset method works.
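
That is, assuming the entry added earlier, the 970 EVO line would become:

Code:
# Samsung 970 EVO
144d a808 default false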
 

Sprint

Explorer
Joined
Mar 30, 2019
Messages
72
So with pciPassthruX.msiEnabled = "FALSE", we seem to get stuck in the boot process, so it looks like that's a no-go :(

Will try the above change in passthru.map next and report back shortly. :)
 

Attachments

  • Missing Interupts.JPG (94.7 KB)