Read/Write/Checksum Errors Constantly

Pray4Tre

Dabbler
I've been logging all my errors lately, and I feel like they are increasing.
Just this morning, my weekly scrub turned up numerous checksum errors.
I've only been on TrueNAS for about a year.
None of the drive health checks I have set on regular schedules show any errors.
I don't know whether the issue is related to cabling, but I'm using a Supermicro 847 chassis with 24 hard drives.
Is this normal, or is something wrong?
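Since the errors are already being logged, one way to keep that log consistent is to capture the per-device READ/WRITE/CKSUM counters from `zpool status` after each scrub. The sketch below is a hypothetical helper, not a TrueNAS tool: it assumes the text was produced with `zpool status -p` (exact, unabbreviated counters) and that the device table ends at the first blank line, which matches typical OpenZFS output but is not guaranteed for every version.

```python
def parse_zpool_errors(status_text: str) -> dict[str, tuple[int, int, int]]:
    """Map each row of the `zpool status -p` device table to (READ, WRITE, CKSUM).

    Hypothetical helper: assumes the table starts at the
    'NAME STATE READ WRITE CKSUM' header and ends at the first blank line.
    """
    counters: dict[str, tuple[int, int, int]] = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields or fields[0] == "errors:":
            break  # end of the device table
        # Pool, vdev and disk rows carry three numeric counters;
        # section labels such as "logs" or "spares" do not and are skipped.
        if len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            counters[fields[0]] = (int(fields[2]), int(fields[3]), int(fields[4]))
    return counters


def devices_with_errors(
    counters: dict[str, tuple[int, int, int]],
) -> dict[str, tuple[int, int, int]]:
    """Keep only rows where any of the three counters is non-zero."""
    return {name: c for name, c in counters.items() if any(c)}
```

Saving the output of `zpool status -p` after each scrub and running it through `parse_zpool_errors` gives a per-drive record that can be compared over time; keep in mind the counters are cumulative until `zpool clear` resets them.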
 

Constantin

Vampire Pig
The sheer number of sudden errors suggests cabling that has gone bad or worked loose, or a bad HBA / backplane. Try reseating everything, perhaps apply some DeoxIT, then re-scrub. A spare HBA card kept on a contingency basis is also cheap insurance and helps a lot with troubleshooting.
 

Pray4Tre

Dabbler
Do you know if a Supermicro 847 has an HBA? Does it go from backplane to drives?
 

NugentS

MVP
You should know that.
If necessary, send us internal photos of the drive cabling.
Oh, and as per the forum rules, please post a full hardware spec.
 

Constantin

Vampire Pig
Quoting Pray4Tre: "Do you know if a Supermicro 847 has an HBA? Does it go from backplane to drives?"

AFAIK, backplanes never contain an HBA. The HBA is either on the motherboard or on attached PCIe card(s). Expanders may be found on backplanes, however, so it could also be the expander on the fritz. A full hardware spec would help.
 

Pray4Tre

Dabbler
Sorry for the late reply; it's a busy holiday season.
I have reseated all cables, the HBA and the drives, and air-dusted the whole thing.
After a reboot, I'm still getting errors every day or every couple of days.

My HBA card is brand new and less than a year old: an LSI Broadcom SAS 9300-8i.
I am using a Supermicro 847 36-bay chassis from eBay, so I'm not sure whether the backplanes are the issue or not.

Are these errors okay for now?
Data integrity fine and all?
 

probain

Patron
Quoting Pray4Tre: "Are these errors okay for now? Data integrity fine and all?"
Whilst I don't speak for anyone other than myself, I would be surprised if anyone on these forums considered any amount of errors to be "okay". Even intermittent errors are signs of something not being right, and as such, data integrity cannot be guaranteed.
 

NugentS

MVP
These errors are NOT OK
 

NugentS

MVP
That many disks, that many errors.
It's the backplane, cabling or HBA; I just dunno which.

Where did you get the HBA from?
 

sretalla

Powered by Neutrality
Moderator
Quoting NugentS: "That many disks, that many errors. It's the backplane, cabling or HBA; I just dunno which."
Or power... how big is your PSU?
 

NugentS

MVP
The disk shelf is SMC (Supermicro), so it hasn't (at least in theory) been put together by an amateur, and the PSU should be sized correctly.

Of course, you are correct: a faulty PSU should be added to the list of possibilities.
 

Pray4Tre

Dabbler
I got the HBA from Amazon:

The power supplies are 2x redundant 1280W PWS-1K28P-SQ, so I don't think power is the issue.
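As a rough sanity check on the PSU question, the worst case is every drive spinning up at once. The sketch below is back-of-the-envelope arithmetic under stated assumptions, not a measurement: with a redundant pair only one supply's rating is counted, and the per-drive spin-up draw of about 2 A on 12 V and 0.7 A on 5 V is an assumed ballpark for 3.5" drives. The real figures are in each drive's datasheet, and staggered spin-up, if the HBA/backplane uses it, would lower the peak.

```python
def spinup_watts(drives: int, amps_12v: float = 2.0, amps_5v: float = 0.7) -> float:
    """Peak draw if every drive spins up at the same moment (no staggering).

    The default currents are assumed ballpark values, not datasheet figures.
    """
    return drives * (amps_12v * 12.0 + amps_5v * 5.0)


def headroom_watts(single_psu_watts: float, drives: int, other_load_watts: float) -> float:
    """Capacity left on ONE supply of a redundant pair after the spin-up peak.

    other_load_watts is an assumed estimate for board, CPU, HBA, fans, etc.
    """
    return single_psu_watts - spinup_watts(drives) - other_load_watts
```

Under these assumptions `spinup_watts(24)` comes to 660 W, comfortably inside a single 1280 W supply. Positive headroom only shows the rating is not exceeded, though; it says nothing about a supply that is itself faulty.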
Backplanes are these: BPN-SAS3-846EL1, at about $250-275 apiece on eBay, and there are 2 inside my Supermicro 847 chassis. At that price, if they are the issue, I'd be better off buying a whole new chassis, as I only paid $500-600.

Is it possible that heavy read/write activity on the pool could be causing these errors?
There's lots of data here and lots of services running. All my Proxmox VMs run on M.2 drives, but multiple files are still being added, deleted, or changed daily on this pool.

I guess my only option is to buy a backup HBA card and cables, swap them in and see.
And if the errors still happen, that leads me to believe it's backplane-related, and it might just be time to move on to new hardware and get a 30-45 bay Storinator from 45Drives at a cheap price of $2,700-$3,333...

If anyone here knows of any cheaper chassis that hold 30-45 drives, please let me know; I've been searching for alternatives for months.
 

NugentS

MVP
Did you flash the correct IT firmware to that LSI card?

Are those disk errors on the rear backplane, the front backplane or both?

I also notice that, despite our asking, we still haven't got a full hardware spec.
 

sretalla

Powered by Neutrality
Moderator
Quoting Pray4Tre: "all my Proxmox VMs are run on M.2 drives"
How is Proxmox involved here? Is TrueNAS a VM?

Quoting Pray4Tre: "Is it possible that running high read/writes to the pool could be the cause of these errors?"
Not if it's properly configured, but if there's VM trickery going on here, maybe there's something to be said about that.
 

Pray4Tre

Dabbler
The card comes flashed with IT firmware; plenty of the Amazon reviews say so, and many reviewers said it was simple plug and play, which was the same for me.
My storage pool is made up of 12x 10TB Western Digital drives and 12x 14TB HGST drives, occupying all 24 bays in the front of the Supermicro 847 chassis. So the errors are coming from the front, as there are no drives on the rear 12-port backplane.

Proxmox isn't really involved directly. No, TrueNAS is not running in a VM; it runs on its own dedicated hardware inside that Supermicro chassis. My VMs are stored on and run from an M.2 drive inside that chassis, which is set up as a share in Proxmox, but there are no read/write errors there.
The VMs that run on that M.2 do touch the storage pool through network shares set up in TrueNAS, where plenty of data is read and written daily.
Nothing really crazy is going on, as far as I know.

Hardware Spec for TrueNAS Chassis:
Motherboard: Gigabyte W480 VISION W LGA 1200 - https://www.newegg.com/p/N82E16813145225?Item=9SIAYTVH1T5170
CPU: Intel Xeon W-1290 10-Core 3.2 GHz 20MB L3 Cache LGA 1200
RAM: NEMIX 128GB 4x32 GB DDR4-3200 PC4-25600 2Rx8 ECC Unbuffered Server Memory
OS Drive: 1TB Samsung 980 PRO NVMe Gen4 M.2
VM Drive: 2TB WD_BLACK SN770 NVMe M.2 Gen4 PCIE
HBA: LSI Broadcom SAS 9300-8i flashed to IT firmware
Chassis: Supermicro CSE 847
Back Plane 1: BPN-SAS3-846EL1
Back Plane 2: BPN-SAS3-826EL1
PSU: 2x 1280W PWS-1K28P-SQ
HD: 12x 10TB WD drives (WDC_WD100EMAZ-00WJTA0 & WDC_WD101EMAZ-11G7DA0)
    12x 14TB HGST drives (almost all WDC_WUH721414ALE604, plus 1x WDC_WD140EDGZ-11B1PA0)

All SMART checks on all drives have always shown SUCCESS and indicate no issues with the drives themselves.
The 14TB drives are new, barely a year old; the 10TB drives are slightly older and were purchased in different years.
 

NugentS

MVP
Can you move some of the drives with errors to the rear backplane, then copy a load of data and see if the error counts on those drives increase?
 

Pray4Tre

Dabbler
Every drive has had read/write/checksum errors at one point or another. The errors don't seem to favor any particular drive; which drive gets an error appears completely random. I logged all issues for a while, and no drive was a repeat offender more than any other.
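With a log of individual error events, the "no repeat offenders" observation can be checked a bit more formally. A single failing disk tends to concentrate errors on itself, while errors spread evenly across drives point toward something they share (HBA, cabling, backplane/expander or power), which is the reasoning earlier in this thread. The sketch below is a hypothetical helper: it assumes the log is a list of drive names, one entry per error event, and flags drives whose count exceeds an arbitrary multiple (`factor`, assumed to be 3 here) of the per-drive average.

```python
from collections import Counter


def repeat_offenders(events: list[str], all_drives: list[str], factor: float = 3.0) -> dict[str, int]:
    """Return drives whose error-event count exceeds `factor` times the average.

    `all_drives` is included so that drives with zero events still count
    toward the average; the threshold `factor` is an arbitrary assumption.
    """
    counts = Counter({drive: 0 for drive in all_drives})
    counts.update(events)  # one increment per logged event
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    return {drive: n for drive, n in counts.items() if n > factor * mean}
```

An empty result with errors spread across all 24 drives is consistent with the shared-path explanation, though it does not say which shared component is at fault.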

I can certainly give that a go when I get some time: move half the drives to the back of the chassis, onto the other backplane, and see whether errors happen on one or both backplanes after a week or two.
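Once the drives are split across the two backplanes, the comparison comes down to errors per drive on each side. The sketch below assumes per-drive (READ, WRITE, CKSUM) totals collected at the end of the test period (for example after resetting the counters with `zpool clear` at the start) and a hand-written mapping from drive name to backplane; the function name and the "front"/"rear" labels are illustrative, not anything TrueNAS provides.

```python
from collections import defaultdict


def errors_by_backplane(
    device_errors: dict[str, tuple[int, int, int]],
    device_to_backplane: dict[str, str],
) -> dict[str, tuple[int, float]]:
    """Sum READ+WRITE+CKSUM per backplane and divide by that backplane's drive count.

    Returns {backplane: (total_errors, errors_per_drive)}. Drives missing
    from `device_errors` are treated as having no errors.
    """
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [errors, drives]
    for device, backplane in device_to_backplane.items():
        read, write, cksum = device_errors.get(device, (0, 0, 0))
        totals[backplane][0] += read + write + cksum
        totals[backplane][1] += 1
    return {bp: (errs, errs / drives) for bp, (errs, drives) in totals.items()}
```

Normalizing per drive keeps the comparison fair if the split ends up uneven. If errors keep appearing at similar per-drive rates on both sides, a single backplane becomes less likely as the sole cause, and the shared HBA or power move up the list.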

I'm not sure what happened on Christmas or why there were so many errors, but now it's down to maybe once a day or once every couple of days. I haven't been able to pinpoint what process or thing is responsible for the errors, if it's truly even a single thing.
 

Constantin

Vampire Pig
In another post, someone tracked their similar issue down to out-of-date backplane firmware. I would pursue this by noting the firmware revision, model, and serial numbers of your backplanes and seeing if the OEM has something more recent for you to try.
 

Pray4Tre

Dabbler
Quoting Constantin: "In another post, someone tracked their similar issue down to out-of-date backplane firmware."
Great to know. I've sent a message to Supermicro support with this thread and my information, to figure out how to check my firmware and update it.
 

Constantin

Vampire Pig
Here is the thread I was referring to.
 