Should I be concerned - CAM Status: Uncorrectable parity/CRC error

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
At times I get loads of these, and then suddenly none.

My noob guess is that it is due to the HW problem I think I have, but I've now gone and scrubbed 3 drives that reported errors with the Seagate tools, through working ports and through ports where I thought there were errors, with the same cables, and all passed, not a single error, making me wonder whether I have bad SATA ports, cables or HDDs...

But then I get these.

G
 

Attachments

  • IMG_2354 copy.jpg
    105 KB · Views: 341

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
This is a HW issue, but tracking it down to specific cables, ports, or whatever will be the usual trip through Wonderland. You'll have to look at the disks, the cables and their runs, the cable terminations, the AHCI ports, and even your power supply.
  • Are all the disks securely fastened? Do any vibrate within their mounts?
  • Are all the disks getting power? Are you using splitters to provide power from the power supply to the disks?
  • Are all the SATA cables in good shape? Are any kinked, or show some kind of physical stress?
  • Are the cable runs routed cleanly? Do any lie next to known sources of electromagnetic interference, like inductors, magnets, etc.?
  • Are your connectors fully seated? Is there dust buildup in the connector sockets?
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
My prejudice would be bad cable, bad connection, possible bad port - all the stuff that Samuel has just written about.
 

georgelza

So I've swopped cables, HDDs and ports. I've got 2 ports where it seems I'm stable; on the other ports I get random errors, even after changing cables and HDDs.

Got an LSI card on its way, so I won't be using the onboard ports anymore. Got a refurbished MB on its way also.

Everything has been pulled apart and re-assembled.

I've done the normal try-and-find-the-problem routine. My question was mostly (as I don't know TrueNAS that well yet) whether this is something I should be worried about. I was assuming that's a yes, just wanted to confirm.

G
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
How about SMART tests... that would possibly eliminate a cable issue.

smartctl -a /dev/ada1

That assumes you already ran a short or long test first; otherwise, run smartctl -t short /dev/ada1 (or -t long) and then wait the indicated time before running the -a command.
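
A minimal sketch of that sequence as POSIX shell helpers, assuming the disk is /dev/ada1 as above. The function names and the SMARTCTL override variable are made up for illustration; only the smartctl options -t, -l selftest and -A are real.

```shell
#!/bin/sh
# Hypothetical helpers around smartctl; the names are illustrative only.
# SMARTCTL can be overridden (e.g. SMARTCTL=echo) for a dry run.
SMARTCTL="${SMARTCTL:-smartctl}"

# Start a self-test. $1 = device (e.g. /dev/ada1), $2 = short or long.
# smartctl prints an estimated completion time; wait that long before
# reading the results.
start_selftest() {
    case "$2" in
        short|long) ;;
        *) echo "usage: start_selftest DEVICE short|long" >&2; return 2 ;;
    esac
    $SMARTCTL -t "$2" "$1"
}

# Show the self-test log and the attribute table for a device.
show_results() {
    $SMARTCTL -l selftest "$1"
    $SMARTCTL -A "$1"
}
```

Typical use would be start_selftest /dev/ada1 short, waiting out the time smartctl reports, and then show_results /dev/ada1. Both need root on the NAS.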
 

georgelza

Here is the strange one: SMART passed, still have errors.
I ended up downloading the Seagate tools package, which you burn onto a USB stick as a bootable image.
It has a small, quick test that does no damage to your HDD, and then a long test that runs for 7 hours.
I ran the 7-hour test against 3 drives that reported problems, through ports that are not reporting problems and then through ports where I had problems, with cables I had problems with and cables with no errors yet. All passed.
G
 

sretalla

"smart passed, still have errors."
Can you share the SMART output... the result of PASS isn't necessarily an indication of no problem.
 

georgelza

Will do...
I'll have to rerun it, as I ran it as part of the Seagate diagnostic tool, which just prints to screen.

Will run the command as per above, although the NAS is back together and running.

I've sourced a refurbished replacement MB from a local distributor, cheap... as they only have "stock for warranties" on hand now, since this is a 7th gen Intel board which is not sold anymore.

G
 

georgelza

... See attached. This is a drive that is currently reporting errors, and thank F for ZFS RAIDZ.

G
 

Attachments

  • smart_output.txt
    6.2 KB · Views: 228

sretalla

OK; read and seek errors are 0, so you're fine there (looking at the first 16 bits of the 48-bit raw value in attributes 1 and 7).

Where you see the issue (199 UDMA_CRC_Error_Count 0x003e 200 196 000 Old_age Always - 576) is CRC, which can point to the SATA controller, cabling or some issue with the onboard controller on the disk (most likely cabling, but you seem to be saying you've eliminated that).

I do note that you didn't allow the long SMART test to complete, so you haven't finished a surface test to confirm all is good, but you would have recorded errors in the counters if you come across those as you read and write, so unlikely an issue here.
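
To pull those numbers out of a saved report rather than reading them by eye, something like the following could work. It is only a sketch: the function names are invented here, the split of the 48-bit raw value into a 16-bit error count and a 32-bit operation count is the commonly described Seagate convention rather than something guaranteed for every drive, and the value 123456789 is a made-up example, not taken from the attached output. The attribute 199 line is the one quoted above.

```shell
#!/bin/sh
# Print the RAW_VALUE column (10th field) of one SMART attribute.
# $1 = attribute ID; input is `smartctl -A` style text on stdin.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Split a 48-bit raw value into "upper-16-bits lower-32-bits".
# Assumption: on Seagate drives the upper 16 bits of attributes 1 and 7
# hold the error count and the lower 32 bits the operation count.
# Needs a shell with 64-bit arithmetic.
seagate_split() {
    echo "$(( $1 >> 32 )) $(( $1 & 4294967295 ))"
}

# The attribute line quoted in this thread:
line='199 UDMA_CRC_Error_Count 0x003e 200 196 000 Old_age Always - 576'
printf '%s\n' "$line" | smart_raw 199   # prints 576

# Hypothetical raw value below 2^32, i.e. an error count of 0:
seagate_split 123456789                 # prints 0 123456789
```

On a live system the input would come from smartctl -A /dev/ada1 piped into smart_raw. The count in attribute 199 is cumulative, so what matters after a cable or port change is whether it keeps climbing, not its absolute value.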
 

georgelza

...
I do note that you didn't allow the long SMART test to complete, so you haven't finished a surface test to confirm all is good, but you would have recorded errors in the counters if you come across those as you read and write, so unlikely an issue here
...
Re above, I did the Seagate Boot Tools testing, which included a 6 hour non-destructive test (short and long tests), which all passed.

As much as I say I've tested the SATA ports and cables, I'm still getting errors, so the only thing not swopped out yet is the MB (these are onboard SATA ports).

It's not that I haven't allowed the long test to run; I've just been trying to keep it alive while running these tests.
I'll try and run a long S.M.A.R.T. test over the weekend.

G
 

sretalla

"I did the Seagate Boot Tools testing, which included a 6 hour non destructive test, (short and long tests) which all passed."
Strange it wasn't recorded in the SMART data on the disk in that case.

"I'll try and run a long S.M.A.R.T over the weekend."
I think you can drop that for now and try to eliminate the other errors first.
 

georgelza

Strange it wasn't recorded in the SMART data on the disk in that case.
--> Keep in mind that to do this test I disconnect all the other drives, including boot, then boot off the Seagate Boot Tools USB and run the test. On the HDDs with data it was non-invasive: first through the ports "known" to be OK and then a 2nd time through the ports known to have problems, and it came back clear. I then redid this set of tests with 3 x 4TB drives that are empty and not part of any vdev at the moment. On this 2nd set I also did a complete read/write test (which includes zeroing) of every block, again all clear.

I think you can drop that for now and try to eliminate the other errors first.
--> OK
 

sretalla

"I boot of the SeagateBoot Tools USB and run test."
But SMART testing results are stored on the drive itself, in the area reserved for SMART data. That's what I meant... unless those Seagate Tools don't really use SMART as part of the diagnostics.
 

georgelza

The Seagate tools run off a small bootable USB, and they write nothing back to the USB.

G
 

georgelza

Just an update:
Got my MB swopped out, and also replaced the SATA cables.
All running perfectly again, no more CRC errors etc.
Still waiting for the LSI card. I will connect the 3 disconnected drives to that card and then decide whether to create a 2nd pool or add them to the current pool as a dedicated vdev.
G
 