DIY NAS keeps disconnecting from network after a while

gyorfitam

Dabbler
Joined
Jan 20, 2021
Messages
20
I have a DIY NAS that runs TrueNAS-13.0-U6.1. I put this system together a while ago and noticed that if I run the machine for a while (a few days, for example), the web GUI becomes unavailable and the device no longer shows up as connected on the router.
After reading a bit about this, the recommendation seems to be to use a different NIC rather than the (supposedly crappy) Realtek one on the motherboard, ideally an Intel one if I'm only going to use a gigabit connection. So I bought a new Intel EXPI9301CTBLK Gigabit PRO 1000CT card, which was one of the recommended ones in one of the articles I found. The issue still persists, although it doesn't seem to be as severe as before (I only checked after around 2 weeks of running, as after a few days it was still fine). I don't even know how to test this issue properly, so I'm not sure when the network went down, for example. I tried to reboot the machine by connecting a keyboard, but it didn't react for some reason, so in this scenario the only option I have AFAIK is to press the reset button on the case.
The recommendation is to run the NAS constantly, as it is better for the hard drives and all the automated checks (SMART, scrub) can run regularly. I get it. But if it keeps disconnecting, I don't think it's great for the system that I have to force-restart it quite regularly. What if I restart it while a scrub is running? Is there any way to check whether a scheduled task is running right now while the web GUI is not available? The command line is not really showing anything, especially if you can't interact with it (it doesn't react to the keyboard).
Any ideas what can cause this issue and how to solve it?
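On the "when did the network go down" part: one rough way to get a timestamp is to probe the NAS from another always-on machine on the same LAN and log every change in reachability. Below is a minimal sketch, assuming Python 3 is available on that other machine. The address 192.168.1.50, port 80 (the web GUI) and the log file name are placeholders, and `can_connect` and `log_transitions` are hypothetical helper names, not part of TrueNAS.

```python
import socket
import time
from datetime import datetime


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def log_transitions(probe, write_line, checks, interval=0.0):
    """Call probe() `checks` times; write a timestamped line whenever the result changes."""
    last_state = None
    for _ in range(checks):
        state = probe()
        if state != last_state:
            stamp = datetime.now().isoformat(timespec="seconds")
            write_line(f"{stamp} NAS {'reachable' if state else 'UNREACHABLE'}")
            last_state = state
        time.sleep(interval)


# Example use (placeholder address; one check per minute for about a day):
# with open("nas_link.log", "a") as log:
#     log_transitions(
#         lambda: can_connect("192.168.1.50", 80),
#         lambda line: (log.write(line + "\n"), log.flush()),
#         checks=1440,
#         interval=60,
#     )
```

Only state changes are logged, so the file stays short and the first UNREACHABLE line gives an approximate time of the drop, which can then be compared against the NAS logs after the reboot.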
Some info about the system:
- ASRock B450 Steel Legend motherboard
- AMD Ryzen 5 1400 CPU
- 16GB (2x 8GB) unbuffered ECC RAM
- Boot drive is a 128GB M.2 NVME drive (TS128GMTE110S)
- There are 14 hard drives in the system in total. There are 3 vdevs, each consisting of 4 drives (RAIDZ1), and 2 hard drives are spares. 8 hard drives are connected to an HBA, 4 to the motherboard, and the 2 spares to a simple dual-port PCIe x1 SATA card. They are a combination of WD Purple and Red drives (none of them are SMR drives).
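On checking whether a scrub is running without the web GUI: if a shell on the NAS is still reachable (SSH or a working console, which may not be the case during the hang described above), `zpool status` reports a running scrub or resilver on its `scan:` line. A minimal sketch that wraps this, assuming Python 3 on the NAS; `scans_in_progress` and `read_zpool_status` are hypothetical helper names.

```python
import subprocess


def scans_in_progress(status_text):
    """Return the 'scan:' lines of `zpool status` output that report a running scrub or resilver."""
    running = []
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("scan:") and "in progress" in line:
            running.append(line)
    return running


def read_zpool_status():
    """Run `zpool status` and return its text output (needs shell access on the NAS)."""
    result = subprocess.run(
        ["zpool", "status"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Example use on the NAS:
# for line in scans_in_progress(read_zpool_status()):
#     print(line)
```

An empty result means no pool reported a scrub or resilver in progress at that moment. It says nothing about other scheduled tasks such as SMART tests.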
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Have you turned off all the AMD power saving stuff that can cause issues?
Your hardware list leaves something to be desired, but at least you tried. We need specifics. What HBA, and is it in IT mode? What PCIe card (warning: most of these are utter junk, but as it's just spares, I suspect that at this stage it doesn't matter)? What PSU, what case, etc.?
 

LarsR

Guru
Joined
Oct 23, 2020
Messages
719
If those disconnects happen after around 2-3 days without any error in the logs, it's most likely the BIOS settings that @NugentS mentioned.
Those settings are ErP Ready, Global C-States, and, for older mainboards, AMD Cool'n'Quiet.
When I started my TrueNAS journey 3 years ago, I started with a Ryzen 5 1600X and had exactly the same problems. After some weeks of playing around with BIOS settings and reading on other sites, I found those fixes, applied them, and the system became stable.
 

gyorfitam

Dabbler
Joined
Jan 20, 2021
Messages
20
I will check the AMD power saving stuff, thanks for the tip.
HBA is IBM ServeRAID M5110 in IT Mode (= 9211-8i). I managed to add a small 40mm fan on the heatsink so it shouldn't run too hot. There's fresh air coming on from the side panel as well.
I am aware that most of these PCIe SATA cards are trash. What I've found on this forum (from other people's comments) is that if you just want to add 1 or 2 extra hard drives, then some of the 2 port ones can work fine - mine is a 'Lazmin 2 Port SATA PCI Express SATA Controller Card', it uses an ASM1061 chip.
PSU is a 'Be Quiet! Pure Power 11 FM 550W Modular Power Supply 80 Plus Gold'.
Case is an old Antec 900. It has 9x 5.25" bays, and I bought some generic 3x5.25" to 5x3.5" cages, so there's space for 15 hard drives in total, with a 12cm Arctic fan for each cage. There's also a huge (I think 200mm) fan on top that blows air out. So I believe the ventilation should be quite decent.
 

gyorfitam

Dabbler
Joined
Jan 20, 2021
Messages
20
I will definitely check these BIOS settings, thanks.
 
Joined
Jun 15, 2022
Messages
674
The Realtek card seems to have been a different problem with similar symptoms; the mentioned power saving states are likely another issue.

Also look for any "boost" settings and turn them off; running a system at the edge of stability is great for games though really hard on the mainboard when it's run 24/7.

DO NOT overclock memory. A solid NAS uses ECC RAM because errors are more frequent than most people imagine.
 

gyorfitam

Dabbler
Joined
Jan 20, 2021
Messages
20
I looked at the stuff you've mentioned: RAM and CPU are not overclocked, and I turned off all the power saving stuff. It has now been running for more than 2 weeks and it's still running great. So thanks for the help, it looks like this issue is solved.
 

FAHX

Cadet
Joined
Feb 5, 2024
Messages
1
I have a very similar issue. Mine can be fine for days, but if there is a large network load, it crashes the GUI. However, the unit itself doesn't see this: when I connect a screen I get all the options I would expect, and it shows no errors about the disconnection. For all intents and purposes, it would suggest there isn't an issue at all. I have to restart the server for it to work again.

Not overclocking, no power management, etc.
 
Joined
Jun 15, 2022
Messages
674
If there's no Virtual Machine and/or other funky stuff going on and you're running straight TrueNAS as Network Attached Storage, and heavy load is causing issues, I would *guess* from reading other members' experiences it's caused by:
  1. Gaming mainboard
  2. Clone/fake Host Bus Adapter
  3. Loose connector
  4. Realtek driver conflict, networking being the most common
  5. Old hardware (power line filter capacitors, dried thermal paste, bad cables)
  6. Old/bad hardware (mainboard, RAM, other)
  7. Insufficient cooling
  8. Something overlooked by the system builder (generally you, and I mean that kindly and in a helpful tone, an example being component burn-in for each component)
  9. HBA driver conflict
  10. HBA misconfiguration (deep in the card's internal settings)
I wouldn't etch that in stone, but hopefully it helps.
 