Scale ixgbe external card not powering on

Scepterus

Dabbler
Joined
Nov 19, 2022
Messages
15
Hello, I've been using TrueNAS for years now, and recently moved to SCALE on my home server.
As part of the upgrade, I got a 10Gb card that I want to use as my main port to that server. However, during boot there's a link and everything is dandy, but once TrueNAS starts loading, the link goes dark.
I've looked into previous answers, but none mention the physical link not showing as up. I tried ifconfig <nameofif> up, but that did nothing.
There was a post regarding drivers, so I checked again: my card is a Silicom Intel, so I looked in the intel folder, and it has an ixgbe folder.

What exactly am I missing? And is this solvable? Or is this some sort of limitation?
Thanks and have a great day!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You're going to have to be a bit clearer. From your thread title, I have no idea what an "ixgbe external card" is or why it wouldn't be "powering on". Is this in some sort of external PCIe expander?

I've also never heard of an "ixgbe card", though there is a device driver by that name. It also isn't clear what you mean by "silicom Intel"; there is a company that goes by the name of Silicom that makes custom ethernet cards, some of which are based on Intel parts, but which quite possibly might not be recognized by an Intel-authored driver.

Also missing is a description of your switch, what technology is being used to attach it (SFP+, 10GBase-T, CX4, etc), or how it is configured, all of which play various bits and roles in the whole process.
 

Scepterus

Sorry, I took the name of the driver, I am a bit fuzzy as I am a bit under the weather.
It's a PCIe 10Gb card Silicom Intel 82599.
Silicom Intel 82599 Dual Port 10GB Fiber Channel PE210G2SPI9A-SR High /P M/B is the exact model.
It's the same card on both ends, the other end is a pfSense that works with the other port on it already.
They both have a converter to Ethernet, and a cat7 Ethernet cable run between them.

That's the basics, let me know if I missed something else.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Did the 82599 ever do Fiber Channel or is the seller just smoking something?

With that out of the way, and assuming we're talking about plain old Ethernet here (and keep in mind that Ethernet does not imply twisted pair), there are a few things to investigate:
  1. The SFP+ modules - you're using 10GBase-T converters, which is a rather odd thing to do when you're running to another SFP+ device. Literally any other option (DAC or one of the many fiber options) would have been better, unless you're really stuck with twisted pair, which doesn't sound like your situation. Thing is, 10GBase-T converters are not that great:
    • There are a few different chipsets out there, with different feature sets and very little traceability to ensure you're getting what you pay for.
    • Cable length is very limited
    • Power consumption is rough
    • They're a fairly recent thing and the Intel 82599 is a rather old controller (still fine, employed properly and with the right expectations)
    • Intel allows games to be played with SFP lock-in. Silicom might be better about it than OEM Intel, but I'll let someone with that specific experience elaborate.
  2. Some weird driver issue, possibly related to point 1 above. Try a different OS just to make sure. Not super likely.
  3. Plain broken hardware. Not a likely fake, since Silicom is a reputable manufacturer and major Intel networking OEM, but defective and/or damaged.
  4. Cooling. These things need proper cooling, with a decent amount of airflow over the heatsink. Old thermal interface material can also impede heat transfer from the controller IC to the heatsink, which is a concern on cards that may be older.
 

mervincm

Contributor
Joined
Mar 21, 2014
Messages
157
I have two HP 560SFP+ cards, based on the same Intel controller, with the same dual SFP+ connection, installed in a physical pfSense box and my physical TrueNAS SCALE box. While I have inbound packet loss issues from both systems, I have not seen any vendor lock-in issues with DAC, SR optical, or 10GBase-T. I didn't have to do anything (in either pfSense or TrueNAS SCALE) for driver support.
I do install a 40mm fan on each of these cards to ensure they remain cool. DAC is surely the easiest when your boxes are close.

I do seem to recall that during boot the NIC link lights do go dark and then light up again, but I am not sure how / why it works like that.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
@Scepterus, which version of SCALE are you running, and in which directory path did you find the ixgbe.ko driver? You might simply need to add an Init/Shutdown pre-init task under System->Advanced that runs modprobe ixgbe to load the kernel driver on boot, which would make your NIC appear as an available interface.
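
A quick way to tell whether the module is already loaded, before adding a pre-init task, is to look for it in /proc/modules. The snippet below is a minimal sketch; module_loaded is a hypothetical helper name, not something TrueNAS ships.

```shell
# Succeeds if the named kernel module appears in /proc/modules-style input (stdin).
# The first field of each line in /proc/modules is the module name.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

if [ -r /proc/modules ] && module_loaded ixgbe < /proc/modules; then
    echo "ixgbe is loaded"
else
    echo "ixgbe is not loaded; 'modprobe ixgbe' would load it"
fi
```

If this reports the module as loaded and the interface still shows no link, loading the driver at boot is unlikely to be the missing piece.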
 

jgreco

For a mainline Intel ethernet chipset driver? Seems unlikely. I could possibly believe that it wasn't being associated with the off-brand card or something like that. I'm guessing this card is Vendor ID 8086, Device ID 10fb so it would be interesting to get an "lspci -nn" confirmation of that.
 

Scepterus

Did the 82599 ever do Fiber Channel or is the seller just smoking something?
It came with fiber SFP+ modules. I just put in Ethernet ones because of the infrastructure I have right now. When I move to a rack, that may change.
1. As I mentioned, it's not the first port on that card that is in use on the router side. It's a 2-port card where 1 port has been working (albeit in a 1Gb capacity) for months now. So the card is fine and so are the connectors, since I got the same ones again.
2. As I mentioned, when the PC boots up, and is in the BIOS and pre-OS stages, the link is up. It's at a very specific point that the link goes down, and that's during TrueNAS's boot sequence.
3. Not probable, see 2.
4. I even have a small fan directly on the connectors, as I mentioned I have another one just like it already working for months.

which version of SCALE are you running
TrueNAS-SCALE-22.02.4
which directory path did you find the ixgbe.ko driver
/lib/modules/5.10.142+truenas/kernel/drivers/net/ethernet

which would make your NIC appear as an available interface.
It is showing up as an interface, both in the shell and the GUI, but as a down one, or a no link one.

it would be interesting to get an "lspci -nn" confirmation of that.
03:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
03:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)

Correct you are.
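
For reference, the [vendor:device] pair can be pulled out of lspci -nn output mechanically, and lspci -k shows which kernel driver claimed the device. A minimal sketch, where pci_ids is a hypothetical helper name:

```shell
# Extract unique [vendor:device] IDs from `lspci -nn` style lines on stdin.
# The class code (e.g. [0200]) has no colon, so it is not matched.
pci_ids() {
    grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tr -d '[]' | sort -u
}

# On a live system: list the IDs of Ethernet controllers, then show the bound driver.
if command -v lspci >/dev/null 2>&1; then
    lspci -nn | grep -i 'ethernet controller' | pci_ids
    lspci -nnk -d 8086:10fb   # look for a "Kernel driver in use:" line
fi
```

An 8086:10fb device with ixgbe listed as the driver in use would mean the driver has claimed the card, which points the search at the link itself rather than at driver loading.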
 

Ericloewe

As I mentioned, when the PC boots up, and is in the BIOS and pre-OS stages, the link is up.
How can you meaningfully evaluate that without an OS that can actually try to transmit and receive actual data? Lights being on are not the same as having a working link.
It's a 2 port card where 1 port has been working (albeit in a 1Gb capacity)
That can make a difference. It's quite possible (not saying it's the case here for sure) for something (anything, really) to operate well at 1Gb/s but fail at 10Gb/s.
3. Not probable, see 2.
Well, you need to start eliminating variables, otherwise this isn't going anywhere...

So, overall, I suspect something involving the SFP+ modules. Since the cards came with fiber SFP+ modules, definitely start by trying those out, with appropriate fiber.

It came with fiber SFP+ modules. I just put in Ethernet ones because of the infrastructure I have right now. When I move to a rack, that may change.
Important detail: I think you are mistaken about what Fiber Channel is. It's not a synonym for Ethernet over optical fiber, it's a competitor to Ethernet. The physical parts often look similar, which adds to the confusion...
 

Scepterus

well at 1Gb/s but fail at 10Gb/s.
Already tested that, still no link.
How can you meaningfully evaluate that without an OS that can actually try to transmit and receive actual data?
You can tell if it's dead or not. That's pretty basic and important to test.

with appropriate fiber.
Yeah, I don't have fibre cable, that's one of the reasons I went the Ethernet route.

Ethernet over optical fiber
Same difference, it's the same protocol on both cards, how they connect may vary, but it makes no difference to the actual function.
Same as USB, you can have a USB-C connector on a USB 2 device, and a USB-A connector on a USB 3 device.

I don't think this is the path. I think the path lies within TrueNAS and its drivers. The same connectors and card have been working with other network devices.
 

Ericloewe

Yeah, I don't have fibre cable, that's one of the reasons I went the Ethernet route.
Again, over fiber it's still Ethernet. This is not mere pedantry, getting terminology right is important.
You can tell if it's dead or not. That's pretty basic and important to test.
Sure, but it's far from conclusive. All you can really say, without other testing (e.g. on a different machine) is that it won't catch fire. Yeah, it's important, but it's not getting you much closer to an answer.

I don't think this is the path. I think the path lies within TrueNAS and its drivers. The same connectors and card have been working with other network devices.
Thousands, if not millions, have been using Intel 82599s for over a decade. A damn large chunk of those on Linux (TrueNAS Scale) and many on FreeBSD (TrueNAS Core). And usually when issues arise, it's because of SFP+ modules. The reasons could be many:
  • Physical damage to the card or SFP+ module (of course)
  • Lock-in (Intel does this on their Intel-branded stuff; OEMs using Intel sometimes do, sometimes don't) - typically this can be worked around by tweaking the driver configuration
  • Plain incompatible SFP+ modules
    • As I mentioned, even if they look externally identical, that's not a strong guarantee that they are internally identical. And this applies doubly to 10GBase-T adapters
You haven't made it clear at all exactly what you've tested where (testing "the same model" says little about the physical condition of the problem unit). If testing the card+SFP module in a different machine is not viable for whatever reason or the results are unchanged, my advice is to just buy a suitable pre-made fiber patch cable (they're cheap) and test with the other modules.

Same difference, it's the same protocol on both cards, how they connect may vary, but it makes no difference to the actual function.
Same as USB, you can have a USB-C connector on a USB 2 device, and a USB-A connector on a USB 3 device.
No, Fiber Channel is sort of like running a PCIe x1 connection over a USB 3.0 Type-A cable, as some riser cards do. Very different protocols.
 

jgreco

You can tell if it's dead or not. That's pretty basic and important to test.

You actually can't. I've seen lots of cards and switchports that don't bother to light at all unless the circuit is both physically up and administratively enabled (and configured!). And the opposite is true as well.

Plain incompatible SFP+ modules
  • As I mentioned, even if they look externally identical, that's not a strong guarantee that they are internally identical. And this applies doubly to 10GBase-T adapters

It's easy to forget that an SFP+ is not a passive component. It has active circuitry and supports stuff like DDMI (Digital Diagnostic Monitoring Interface) which provides statistics on signal strength, temperature, voltage, etc. Both the card firmware and the switch operating system have the option to turn their noses up at an incompatible SFP+; whether this is just a jerky proprietary encoding for vendor lock-in or true incompatibility (such as putting a 2.5/5/10GBase-T SFP+ into a switch that doesn't support it), there is a LOT of room for SFP+ incompatibility when you go off the reservation and just jam whatever you like into the port. This is one of the reasons I am a SUPER big fan of encouraging nonprofessionals to just buy cheap used branded optics that are designed for the use case. The optics are cheap, the fiber is cheap, and it's damn well going to work unless something is actually broken. Hoping that a substantially newer SFP+ technology is going to work in a 2009 era card is asking for trouble, no matter how much I adore the X520 cards.
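
On Linux, the module's identification EEPROM can usually be read with ethtool -m, which is one way to see exactly what is plugged in. The sketch below assumes the usual "Vendor name : ..." and "Vendor PN : ..." line layout of that output; sfp_vendor_info is a hypothetical helper name and enp3s0f0 a hypothetical interface name. Not every module exposes diagnostics, and some 10GBase-T modules report very little.

```shell
# Print the vendor name and part number from `ethtool -m <iface>` output (stdin).
sfp_vendor_info() {
    awk -F' *: *' '/^[ \t]*Vendor (name|PN)[ \t]*:/ { print $2 }'
}

IFACE="${IFACE:-enp3s0f0}"   # hypothetical interface name; substitute your own
if command -v ethtool >/dev/null 2>&1; then
    # Prints nothing if the driver cannot read the module or no module is present.
    ethtool -m "$IFACE" 2>/dev/null | sfp_vendor_info
fi
```

Comparing the reported part number on a working port and on the failing one is a cheap check that two "identical" modules really are the same part.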

I think the path lies within TrueNAS and its drivers.

Then complain to Intel, they wrote the driver.

Code:
IXGBE(4)               FreeBSD Kernel Interfaces Manual               IXGBE(4)

NAME
     ixgbe - Intel(R) 10Gb Ethernet driver for the FreeBSD operating system

SYNOPSIS
     To compile this driver into the kernel, place the following lines in your
     kernel configuration file:

           device iflib
           device ixgbe

     Alternatively, to load the driver as a module at boot time, place the
     following line in loader.conf(5):

           if_ixgbe_load="YES"

DESCRIPTION
     The ixgbe driver provides support for PCI 10Gb Ethernet adapters based on
     the Intel 82598EB Intel(R) Network Connections.  The driver supports
     Jumbo Frames, MSIX, TSO, and RSS.

[...]
HARDWARE
     The ixgbe driver supports the following cards:

     •   Intel(R) 10 Gigabit XF SR/AF Dual Port Server Adapter
     •   Intel(R) 10 Gigabit XF SR/LR Server Adapter
     •   Intel(R) 82598EB 10 Gigabit AF Network Connection
     •   Intel(R) 82598EB 10 Gigabit AT CX4 Network Connection
[...]
SUPPORT
     For general information and support, go to the Intel support website at:
     http://support.intel.com.

     If an issue is identified with the released source code on the supported
     kernel with a supported adapter, email the specific information related
     to the issue to <freebsd@intel.com>.

SEE ALSO
     altq(4), arp(4), netintro(4), ng_ether(4), polling(4), vlan(4),
     ifconfig(8)

HISTORY
     The ixgbe device driver first appeared in FreeBSD 7.0.

AUTHORS
     The ixgbe driver was written by Intel Corporation <freebsd@intel.com>.

FreeBSD 12.2-RELEASE-p12       January 30, 2019       FreeBSD 12.2-RELEASE-p12
 

Scepterus

Just so we could move on, I booted the same PC into WinPE, and lo and behold, the 10Gb card not only works but got an IP and everything.
So now that we know the hardware is ok, can we speculate on other software factors?
 

jgreco

can we speculate on other software factors?

Like what?

ix based cards are known to work in both FreeBSD and Linux, so it's not TrueNAS. TrueNAS has no idea that the card isn't a gen-u-whine Intel card, nor would it care. We already verified the PCI bus ID, which is the usual problem for that.

The ix driver was written by Intel, so it's expected to work. It could possibly not work, but that seems weird, because Intel has sold millions of these chips to OEMs. It could be that the card has firmware or settings that are making it fail.

The most likely things are the things @Ericloewe and I have been focusing on, which is that the unusual (and unsupported) configuration is probably the culprit. This would require you to pull your nonoptics and replace them with known compatible optics and a fiber patch between them.
 

Ericloewe

And yes, weird SFP behavior has been known to be driver-dependent (i.e. OS-dependent). Some versions have been tweaked differently.

Which is to say, you might be able to figure out a way of making the driver play nice, but it's frankly not worth the effort in most cases.
 

jgreco

The astute reader will take note of both this and also my very carefully selected words

replace them with known compatible optics

Let me tell you the story of upgrading a Dell PowerConnect 8132F. It was filled with mostly Dell 10G optics but also a few random Finisars to hook to 1G switchgear. We did the upgrade to push them from 6.3.whatever to 6.5.whatever, which would also make them identify as N4032F (basically the same product). Suddenly the Finisars stopped working. The new switch firmware refused to work with the non-Dell optics. So this is the weird thing. You have to remember that both the firmware AND the driver are involved with, and capable of enforcing, restrictions on the type of SFP+ modules that your gear will work with.

It is strongly advisable to start with the original compatible optics and then work outwards to your preferred potentially incompatible optics. I don't know for certain if Silicom has their own branded optics (in fact I think they don't), but you should probably try Intel FTLX8571D3BCV-IN which are one of the several optics that shipped with the X520-SR2. After that, you can step back down to Finisar FTLX8571D3BCV, which is the Finisar OEM part that Intel resells, and after that, you could try various generic 10G optics. This would let you know what sort of compatibility was possible.
 

Scepterus

Like what?

ix based cards are known to work in both FreeBSD and Linux, so it's not TrueNAS. TrueNAS has no idea that the card isn't a gen-u-whine Intel card, nor would it care. We already verified the PCI bus ID, which is the usual problem for that.

The ix driver was written by Intel, so it's expected to work. It could possibly not work, but that seems weird, because Intel has sold millions of these chips to OEMs. It could be that the card has firmware or settings that are making it fail.

The most likely things are the things @Ericloewe and I have been focusing on, which is that the unusual (and unsupported) configuration is probably the culprit. This would require you to pull your nonoptics and replace them with known compatible optics and a fiber patch between them.
Why? I stated that under Windows PE (not even full Windows) the same set-up works perfectly: it gets an IP, is pingable, and so on.
So it's not a firmware lock. Plus, the other side is FreeBSD, and that works flawlessly with the same hardware. I bought these specific cards because they were compatible with pfSense, and in general less restrictive.

you might be able to figure out a way of making the driver play nice
And how would I go around doing that? You can link me to some articles or posts, and I can run with those. Because clearly this post is going nowhere fast.
 

jgreco

So then the obvious question is does it work with pfSense on both sides?

I've personally recommended the X520 cards *hundreds* of times on these forums. Their main downsides are that they are power hungry and only PCIe 2.0. They're absolutely solid under FreeBSD, and they work fine under Proxmox and a few other Linuxes I've had occasion to use them under as well. I think we're just a bit stuck here because these *should* work fine.

Do they work fine on a plain vanilla Debian install? At the end of the day, TrueNAS is not doing anything special or magic like rewriting drivers for Intel cards, so the drivers come from the upstream operating systems (FreeBSD and Debian) and I would expect brokenness to show up there as well, if they're broken on TrueNAS for your particular hardware.
 

Ericloewe

And how would I go around doing that?
I would provide links if I had them, but the details don't come up very often. The keywords you're looking for include SFP+ and vendor lock.
 

jgreco

Are you talking about hacking on this with Intel's eeupdate tool? You definitely don't want to Google around for Intel proprietary internal tools to hack on card firmware and settings, specifically IXGBE_DEVICE_CAPS_ALLOW_ANY_SFP. It's probably advisable to see if the driver supports the allow_any_sfp or allow_unsupported_sfp flags (see https://sourceforge.net/p/e1000/mailman/message/28698959/ and BE SURE to read the whole thread, accessible from the link at the bottom of that page), which do not risk permanently bricking your card. But the other thing to note is, once again, that simply forcing the card to allow the SFP+ doesn't mean that the card will successfully be able to use an SFP+ that uses a technology that was designed after the card. I would expect a Silicom card to already be unlocked.
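
As a rough illustration of the driver-flag route on Linux: the in-kernel ixgbe driver has a module parameter named allow_unsupported_sfp, and modinfo -p ixgbe lists the parameters a given build actually accepts. The sketch below only generates the modprobe.d options line; ixgbe_options_line is a hypothetical helper name, and how best to persist such a setting on TrueNAS SCALE is not covered in this thread.

```shell
# Hypothetical helper: emit a modprobe.d options line for the ixgbe driver.
# Accepts only 0 (default behaviour) or 1 (allow unsupported SFP+ modules).
ixgbe_options_line() {
    case "$1" in
        0|1) printf 'options ixgbe allow_unsupported_sfp=%s\n' "$1" ;;
        *)   echo "usage: ixgbe_options_line 0|1" >&2; return 1 ;;
    esac
}

ixgbe_options_line 1

# To try the setting once without persisting it (this drops the link on every
# ixgbe port while the module reloads, so do it from the console):
#   modprobe -r ixgbe && modprobe ixgbe allow_unsupported_sfp=1
```

Some out-of-tree builds of the driver reportedly take one value per port instead of a single value, so it is worth checking the modinfo output first. As noted above, the flag only stops the driver from rejecting a module; it does not make an incompatible module work.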
 