Currently unreadable (pending) sectors

Kylev

Cadet
Joined
Aug 6, 2022
Messages
9
Hi all,

So, I installed my new NAS with TrueNAS SCALE a few weeks ago. Currently, I have four 2 TB Barracudas (Seagate Barracuda 2 TB Internal Hard Drive HDD, 6 Gb/s, 7200 rpm, 256 MB cache) with an AMD 3000G, 8 GB RAM, and a B450M DS3H. On August 8th, I got an alert saying that one of my disks was having issues: "Device: /dev/sdb [SAT], 8 Currently unreadable (pending) sectors" and "8 Offline uncorrectable sectors". I was kind of expecting something like this since it is an old disk, and I confirmed it was the old disk. But my other 3 disks are "brand new", and just now I got the same messages for one of the new disks.

I've been reading posts and trying to find something relevant to this but couldn't find anything.

After some research, I came across a claim that this is sometimes a bug from TrueNAS, but I'm not sure about that. Can this happen even if the disks are "brand new"? I'm aware some disks may arrive bad due to shipping or may simply be bad.

I ran the SMART tests and noticed that the Current_Pending_Sector line was "0" for the new disk that triggered the (pending) alert. The old disk, on the other hand, does have 8 pending sectors.
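
For anyone wanting to compare the alert with the SMART attribute table, here is a minimal Python sketch that pulls the relevant raw counters out of text laid out like the attribute table printed by `smartctl -A`. The sample table and its values are invented for illustration (they are not output from the drives in this thread), and the function name `read_sector_counters` is hypothetical.

```python
# Sample text in the column layout of the `smartctl -A` attribute table.
# The numbers are made up for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

# Attributes that matter for the "pending sectors" discussion.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def read_sector_counters(attribute_table: str) -> dict[str, int]:
    """Return the raw values of the watched attributes found in the table text."""
    counters = {}
    for line in attribute_table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            counters[fields[1]] = int(fields[9])
    return counters


print(read_sector_counters(SAMPLE))
```

If the alert reports pending sectors but the raw value of Current_Pending_Sector reads 0 for the same disk, that mismatch is worth mentioning when asking for help.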

Your help is greatly appreciated.

All the best,
Kyle

 

Pitfrr

Wizard
Joined
Feb 10, 2014
Messages
1,531
Hello,

this is sometimes a bug from TrueNAS
I haven't heard of that and never experienced it... But I'm still on an old FreeNAS version so hard to say. :smile:


But the first thing I would do is make sure the backups are right!
Then I would focus on the disks, trying to understand what is going on.
Start with the one with the 8 pending sectors: run a series of SMART long tests and badblocks (in destructive mode) and see what comes out of it.
If the disk is dying, then I would replace it and then move on to the next disk. Same game there: long SMART and badblocks, and see what comes out of it.


In any case, for new and second-hand drives, you should do a proper burn-in (here and here are some resources about this topic) before using them.
 

Kylev

Cadet
Joined
Aug 6, 2022
Messages
9
I haven't heard of that and never experienced it... But I'm still on an old FreeNAS version so hard to say. :smile:
Thank you. Like I said, I was not sure about this because literally one person said "it is because of a bug" and I continued searching but couldn't find another user claiming the same.

Start with the one with the 8 pending sectors: run a series of SMART long tests and badblocks (in destructive mode)
I'm currently running one long SMART test on the hard drive with the 8 pending sectors. Afterwards, I will see what comes out of it. Do you think I would have to do it a few times just to make sure or do you think by doing just one would be enough? Thank you for the links, by the way.
 

Pitfrr

Wizard
Joined
Feb 10, 2014
Messages
1,531
Do you think I would have to do it a few times just to make sure or do you think by doing just one would be enough?
I think you should run badblocks, because SMART only does a read test, which might not be enough to confirm a pending sector.

That's why the sequence: long SMART, badblocks (in write mode), long SMART, is recommended.
You're lucky to have 2TB drives only.... it doesn't take too long... :tongue: (but you're still good for a few days of testing!)

Note: be very careful with the badblocks test if you do it on the same machine, not to run it on the wrong drive!!! Recommendation here is to run it on a separate machine but that's not always possible... (then physically disconnect the other drives)
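
The sequence above (long SMART, badblocks in write mode, long SMART) can be sketched in Python as a dry-run planner that only builds the commands and refuses any device listed as in use. This is a sketch under assumptions, not a tested burn-in script: the function name `burn_in_plan`, the `-b 4096` block size and the in-use device names are illustrative choices. `smartctl -t long` and `badblocks -wsv` are the usual invocations, and `-w` destroys all data on the target.

```python
def burn_in_plan(device: str, devices_in_use: set[str]) -> list[list[str]]:
    """Build (but do not run) the commands for: long SMART, badblocks -w, long SMART."""
    if device in devices_in_use:
        # Guard against the "wrong drive" mistake: badblocks -w wipes the device.
        raise ValueError(f"{device} is listed as in use; refusing to plan a destructive test")

    # `smartctl -t long` only starts the self-test and returns immediately;
    # wait for the drive to finish it before moving to the next step.
    long_test = ["smartctl", "-t", "long", device]

    # -w: destructive write-mode test, -s: show progress, -v: verbose.
    # -b 4096 is an assumed block size, not a requirement.
    wipe_test = ["badblocks", "-b", "4096", "-wsv", device]

    return [long_test, wipe_test, list(long_test)]


# /dev/sdb is the device named in the alert; the in-use names are hypothetical.
for command in burn_in_plan("/dev/sdb", {"/dev/sda", "/dev/sdc", "/dev/sdd"}):
    print(" ".join(command))
```

Printing the commands first and checking the device against its serial number before running anything is a cheap way to avoid wiping a healthy pool member.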
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Other possible causes of issues:
These Barracuda being SMR.
AMD CPU with C6 state and sleep features enabled in BIOS (though I'm not sure if this applies to SCALE as it does to CORE).
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
I can only emphasize what @Pitfrr already mentioned: Your first (and until it is done your only) priority should be the backup.

For a pending unreadable sector to "surface" at the OS/file system level, usually not one but several things have gone wrong.

There is a chance that everything is well, but betting on it is risky.

Good luck!
 

Kylev

Cadet
Joined
Aug 6, 2022
Messages
9
That's why the sequence: long SMART, badblocks (in write mode), long SMART, is recommended.
Got it. Thank you for the recommendation.
Note: be very careful with the badblocks test if you do it on the same machine, not to run it on the wrong drive!!! Recommendation here is to run it on a separate machine but that's not always possible... (then physically disconnect the other drives)
I have another machine available but I'd like to do it on the same machine (of course, I will make sure of the backup). Once the test is over, and there are indeed bad sectors on the hard drive, what should I do? Buy another disk as a replacement?
 

Kylev

Cadet
Joined
Aug 6, 2022
Messages
9
I can only emphasize what @Pitfrr already mentioned: Your first (and until it is done your only) priority should be the backup.
Thank you. I'm doing it before anything else bad happens and before doing any testing.

There is a chance that everything is well, but betting on it is risky.
What surprises me is the fact that one of the "pending sectors" alerts is coming from one of the new disks (not even 3 weeks old). Like I said, disks are not perfect, and just because they're "new" doesn't mean they won't have problems.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
What surprises me is the fact that one of the "pending sectors" alerts is coming from one of the new disks (not even 3 weeks old). Like I said, disks are not perfect, and just because they're "new" doesn't mean they won't have problems.
Absolutely. New disks have a rather high probability of problems, certainly much higher than a disk that is 3 years old.
 

Pitfrr

Wizard
Joined
Feb 10, 2014
Messages
1,531
Once the test is over, and there are indeed bad sectors on the hard drive, what should I do? Buy another disk as a replacement?
Well that's up to the data you have on the disk... :smile:
Let's just assume you don't care about it (but otherwise I guess you wouldn't use TrueNAS), then don't do anything! :-D
But otherwise, yes the best course of action is to replace the drive.
You can then use this drive as a backup drive, but I would closely monitor the pending sector and reallocated sector attributes. If it is still under warranty, then the drive would be a candidate for RMA.
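
Monitoring those attributes boils down to comparing snapshots over time. A minimal Python sketch, assuming the raw counters have already been collected into dictionaries keyed by attribute name; the function name `growing_counters` and the sample numbers are hypothetical:

```python
def growing_counters(previous: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """Return how much each counter increased between two snapshots.

    Counters that stayed the same or went down are left out.
    """
    return {
        name: current[name] - previous.get(name, 0)
        for name in current
        if current[name] > previous.get(name, 0)
    }


# Invented snapshots, e.g. taken a week apart.
last_week = {"Current_Pending_Sector": 8, "Reallocated_Sector_Ct": 0}
today = {"Current_Pending_Sector": 8, "Reallocated_Sector_Ct": 16}

print(growing_counters(last_week, today))
```

A steadily growing reallocated or pending count is a stronger argument for an RMA than a single snapshot.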
 

Kylev

Cadet
Joined
Aug 6, 2022
Messages
9
You can then use this drive as a backup drive, but I would closely monitor the pending sector and reallocated sector attributes. If it is still under warranty, then the drive would be a candidate for RMA.
Great! I would use it as a backup. Actually, I took one of the bad drives out of the NAS and plugged it into my computer yesterday and ran a tool that “detects” bad sectors. It spotted one.

So far, I have run long SMART tests on two drives. Today, I will run badblocks as you recommended (long, badblocks, long). Hopefully it doesn’t take thaaaat long.

Thanks for all your help, I really appreciate it.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Are they SMR drives?
Can you post the exact model number please?
Badblocks takes a long time - from memory 5+ days on my 12TB drives. But it is a thorough test. Not sure what SMR would do to a badblocks test - but I would have thought it will take weeks+
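
A rough way to estimate the duration: in write mode, badblocks writes four patterns and reads each one back, so the whole disk is traversed eight times. The Python sketch below turns that into hours. The 150 MB/s average throughput is purely an assumption for a CMR drive (an SMR drive may sustain far less on writes), and the function name is hypothetical.

```python
def badblocks_write_mode_hours(capacity_tb: float, avg_mb_per_s: float, patterns: int = 4) -> float:
    """Estimate the hours needed for a full `badblocks -w` run.

    Each pattern is written across the whole disk and then read back,
    so the disk is traversed 2 * patterns times.
    """
    total_bytes = capacity_tb * 1e12            # decimal terabytes, as drives are sold
    seconds_per_pass = total_bytes / (avg_mb_per_s * 1e6)
    passes = 2 * patterns
    return passes * seconds_per_pass / 3600


# Assumed sustained throughput; real drives slow down towards the inner tracks.
ASSUMED_MB_PER_S = 150

for size_tb in (2, 12):
    hours = badblocks_write_mode_hours(size_tb, ASSUMED_MB_PER_S)
    print(f"{size_tb} TB: about {hours:.0f} hours ({hours / 24:.1f} days)")
```

Under that assumption a 2 TB drive needs a bit over a day and a 12 TB drive about a week, which is in the same range as the 5+ days mentioned above.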
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
There is one important thing about bad sectors being detected that sometimes gets overlooked: all disks shipped during the last 25 years or so have an internal mechanism to re-map bad sectors. They contain a certain amount of capacity that never shows up to the outside. The whole purpose of those sectors is to serve as spares, should a "regular" sector turn bad. In that case the disk will transparently use the contingency sector. This happens in a way that is completely invisible to the disk controller/HBA.

What this means in turn, is that, when the first bad sector shows up during a SMART test, it is actually not the first sector having turned bad. In fact several thousand sectors have gone bad already. But the reservoir of backup sectors has been depleted and the fact that something is badly wrong cannot be hidden any longer.
 

Kylev

Cadet
Joined
Aug 6, 2022
Messages
9
What this means in turn, is that, when the first bad sector shows up during a SMART test, it is actually not the first sector having turned bad. In fact several thousand sectors have gone bad already. But the reservoir of backup sectors has been depleted and the fact that something is badly wrong cannot be hidden any longer.
That makes sense. The mechanism already repaired some bad sectors before the NAS or any other software could even detect this.

Thanks for bringing that up!
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
@ChrisRJ
BUT pending sectors are sectors that may be bad. Normally SMART finds them, but nothing will be done with them until the system tries to write to the sector, at which point it can be mapped out or recovered (preferably mapped out).
At least that's how I understand it.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
@NugentS , I am not sure where the difference is between your and my understanding. Can you please elaborate?
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
@ChrisRJ A pending sector does not imply that the available mapping space is used up. It simply says that this may be a bad sector. It's when you try to use that sector that it may or may not get mapped out by the disk OS.

Don't get me wrong - it's not a good thing to see. But it's not terminal in itself.
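
The behaviour described in this post can be written down as a toy state model in Python. It is only an illustration of that understanding, not how any vendor's firmware actually works; the class `ToyDrive` and its method names are invented for the example.

```python
class ToyDrive:
    """Toy model: a failed read marks a sector pending; a later write resolves it."""

    def __init__(self, spare_sectors: int):
        self.spares_left = spare_sectors
        self.pending: set[int] = set()
        self.reallocated = 0

    def read_failed(self, lba: int) -> None:
        # The drive could not read the sector, so it is only *suspected* bad.
        self.pending.add(lba)

    def write(self, lba: int, media_ok: bool) -> str:
        if lba not in self.pending:
            return "ok"
        if media_ok:
            # The write succeeded in place: the sector was fine after all.
            self.pending.remove(lba)
            return "recovered"
        if self.spares_left > 0:
            # The sector really is bad: map it out to a spare.
            self.pending.remove(lba)
            self.spares_left -= 1
            self.reallocated += 1
            return "remapped"
        # No spares left: the sector stays pending and the write fails.
        return "write error"


drive = ToyDrive(spare_sectors=2)
drive.read_failed(100)
drive.read_failed(200)
print(len(drive.pending), "pending,", drive.spares_left, "spares left")
print(drive.write(100, media_ok=True))
print(drive.write(200, media_ok=False))
print(len(drive.pending), "pending,", drive.reallocated, "reallocated")
```

In this model a pending count above zero says nothing about how many spares are left, which is the point being made: pending sectors and an exhausted spare pool are separate conditions.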
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Bottom line: Irrespective of their health, all of these SMR drives have to be replaced in the pool by CMR drives.
And in view of their dubious/defective sectors, I'd be very wary of re-deploying them for non-ZFS applications.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
@NugentS , then we have a slightly different view. My understanding has always been, that once a bad sector shows up on the OS level, the disk itself has run out of options to deal with it internally. The latter implies to me that a lot has gone wrong before.

But I will certainly not claim that with absolute certainty my view is the correct one. It is what I have gathered from different bits and pieces over the years/decades.
 