1 Currently Unreadable (pending) sectors

mattlach · Dec 7, 2012

Hey all,

So I haven't encountered this issue before.

This is the repeated message I am getting:

Is my disk pretty much a goner in need of replacement, or is there some way to try to scan and repair my way out of this one?

Thanks,
Matt

BobCochran · Dec 7, 2012

Well, I will defer to more experienced forum members, but the smartd message does not look real good to me, even though I do not know what that "pending" means. Unless someone else corrects me, I would treat the message as a sign that the disk is failing. If this disk is part of a ZFS RAIDed volume, and the RAID array is sufficiently large, you should be able to pull the bad disk and replace it with a new one, and then start the resilvering process.

Suppose you don't have a RAID array. You could get a new replacement disk. Use ddrescue to clone the bad disk to the new disk and see if it can recover. If there is still a problem, consider using testdisk on the clone disk, never the original disk, to see if that helps. There are no guarantees here. Hopefully you have a backup of your data. I do quite a lot of data recovery with ddrescue. Sometimes, I get lucky. I like ddrescue so much that I turn to it right away if I suspect a crashed source drive.

Bob
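
A minimal sketch of the clone step described above, assuming GNU ddrescue and its usual `infile outfile mapfile` argument order. The helper name `ddrescue_plan`, the two-pass split, and the retry count of 3 are illustrative choices, not something the post specifies. It only prints the commands, so nothing is written by accident:

```shell
#!/bin/sh
# Print (do not run) a two-pass GNU ddrescue clone from SRC to DST.
# Usage: ddrescue_plan SRC DST MAPFILE
ddrescue_plan() {
    src=$1 dst=$2 map=$3
    # Pass 1: copy everything that reads cleanly; -n defers the slow
    # recovery of bad areas, -f allows overwriting a device.
    echo "ddrescue -f -n $src $dst $map"
    # Pass 2: return to the bad areas recorded in the mapfile, 3 retries.
    echo "ddrescue -f -r3 $src $dst $map"
}
```

For example, `ddrescue_plan /dev/ada1 /dev/ada2 rescue.map` prints both commands for review; the device names and mapfile path are placeholders.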

mattlach · Dec 7, 2012

Thank you for your suggestions

This is a RAIDz2 array, so I have two redundant drives.

I am aware I can replace the drive. I am, however, hoping I may be able to run some sort of repair scan and fix it rather than replacing it altogether.

The behavior of the array seems to be unaffected, as if nothing is wrong. The only indication is the repeated smartd error messages on the console.

bollar · Dec 7, 2012

I would say that the drive is statistically more likely to fail now. It may never fail, but it's now more likely. I wouldn't expect array performance to be affected at this time and if it generates errors, ZFS should be able to handle them.

If you're not interested in replacing the drive now, I'd keep an eye on it and be ready to replace it without a lot of notice. These errors may start to accumulate rapidly, and the disk might fail. Or it might not.

As for me, if I see more than two, I prepare to replace the drive. That way I can do it when the system isn't busy, I know I have good, current backups, and I don't have to do it in crisis mode should it fail.
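
The "more than two" rule of thumb can be checked mechanically. A minimal sketch, assuming the usual `smartctl -A` attribute table in which RAW_VALUE is the tenth column; the function names `pending_sectors` and `over_threshold` are hypothetical helpers, not part of smartmontools:

```shell
#!/bin/sh
# Print the raw Current_Pending_Sector count from `smartctl -A` output on stdin.
# Assumes the standard 10-column attribute table (RAW_VALUE is field 10).
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10; found = 1 }
         END { if (!found) exit 1 }'
}

# Succeed (exit 0) when the count read from stdin exceeds the threshold.
# The default of 2 mirrors the rule of thumb above; adjust to taste.
over_threshold() {
    threshold=${1:-2}
    count=$(pending_sectors) || return 2   # attribute not found
    [ "$count" -gt "$threshold" ]
}
```

Usage would look like `smartctl -A /dev/ada0 | over_threshold 2 && echo "plan a replacement"`, with the device name as a placeholder.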

warri · Dec 8, 2012

This question has already come up quite often. I'm not going to requote the answers; please do a forum search to find the relevant threads.

Joshua Parker Ruehlig · Dec 8, 2012

Yes, please search; I just wanted to get this off my chest, so you're lucky.
IMHO 1 unreadable sector is not a reason to replace an HDD. If you are running with double redundancy, I would just fix the problem without even affecting uptime. I fixed this problem without having to replace the HDD (though I did order a spare immediately because I was scared of the unknown), and the HDD runs just fine now.

Going to outline my steps here for future reference.

Run a long self-test on the disk ada#; it should fail somewhere.

Code:

smartctl -t long /dev/ada#

Check the SMART self-test log for the unreadable sector (the LBA_of_first_error column); let's call it 'X'.

Code:

smartctl -l selftest /dev/ada#

Change the sysctl so the disk can be written while it is in use, then try writing to the sector. Replace the 'X' below.

Code:

sysctl kern.geom.debugflags=16
dd if=/dev/zero of=/dev/ada# bs=4096 count=1 seek=X conv=noerror,sync

Check the SMART information to see if 'Current_Pending_Sector' went to 0. You may need to repeat the steps above multiple times if there are multiple unreadable sectors.

Code:

smartctl -A /dev/ada#

Now run another SMART test; hopefully it can complete without error.

Code:

smartctl -t long /dev/ada#
smartctl -A /dev/ada#

Now run a scrub (either from the GUI or with 'zpool scrub poolname').
Check the scrub's status; hopefully it fixes some errors.

Code:

zpool status -v poolname
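
One detail worth checking before the dd step: `seek=` is counted in output blocks of `bs` bytes, while smartctl reports the failing LBA in logical sectors. A minimal dry-run sketch of that arithmetic, assuming 512-byte logical sectors (typical, but an assumption here); `zero_lba_cmd` is a hypothetical helper that only prints the command:

```shell
#!/bin/sh
# Print (do not run) a dd command that zeroes the block containing a given LBA.
# Usage: zero_lba_cmd DEVICE LBA [BS] [LOGICAL_SECTOR_SIZE]
zero_lba_cmd() {
    dev=$1 lba=$2 bs=${3:-512} lss=${4:-512}
    # dd's seek= is in units of bs, so convert the LBA's byte offset
    # into a block index (integer division rounds down to the block start).
    seek=$(( lba * lss / bs ))
    echo "dd if=/dev/zero of=$dev bs=$bs count=1 seek=$seek conv=noerror,sync"
}
```

With bs=512 the seek value equals the LBA itself; with bs=4096 the LBA has to be divided by 8, otherwise the write lands on a different part of the disk. This may be why a later poster in this thread had to change the block size.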

mattlach · Dec 8, 2012

Thank you. I did a quick search and found some threads discussing the topic, but didn't come up with any real solutions.

I appreciate your guidance.

Should I remove the drive from the volume before taking these steps, or can I do this with the drive still part of the volume?

Thanks,
Matt

Stephens · Dec 8, 2012

Joshua Parker Ruehlig said:
IMHO 1 unreadable sector is not a reason to replace an HDD. If you are running with double redundancy, I would just fix the problem without even affecting uptime. I fixed this problem without having to replace the HDD (though I did order a spare immediately because I was scared of the unknown), and the HDD runs just fine now.

FAQ-worthy.

bollar · Dec 8, 2012

Stephens said:
FAQ-worthy.

I agree! I love this stuff.

It needs to be emphasized, though, that this doesn't change the probability that the disk will eventually have a catastrophic failure. You'll want that spare disk at hand, "just in case."

xic044 · Oct 24, 2013

Hello Guys,

I was slowly building my FreeNAS box. For a while I had the whole system set up, but with only 1 disk drive (ada0), so I could at least play with the system. Now I have bought 5 more drives and installed them in the system, and I am receiving plenty of these "currently unreadable (pending) sectors" messages referring to 3 of my six disks, including ada0, which never had any errors reported before.

At the moment I am running the test commands Joshua Parker Ruehlig suggested on the 3 disks.

I rather doubt that I have bad disks, because 2 of them are brand new (I know many are shipped bad or get damaged in transport); otherwise I am having really bad luck. :-(

A question came to mind: could this error be generated by the SATA PCI controller card that the 3 disks are attached to? I plugged 3 drives into the motherboard controller and the other 3 into the PCI card. I read some time ago a user saying that PCI cards are too slow for the purposes of a FreeNAS box. Is that true?

I will bring more info when I finish doing Joshua's procedure.

Any input from you guys is appreciated.

Thanks

Dusan · Oct 24, 2013

xic044 said:
Could this error be generated by the SATA PCI controller card that the 3 disks are attached to?

Nope, the "currently unreadable (pending) sectors" error can't be caused by the controller.

xic044 · Oct 24, 2013

Thanks for the info.

xic044 · Oct 24, 2013

Is the LBA_of_first_error value the sector number?

Why is the dd operation not supported?

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure        50%              428  654089760

[root@freenas] ~# sysctl kern.geom.debugflags=16
kern.geom.debugflags: 0 -> 16

[root@freenas] ~# dd if=/dev/zero of=/dev/ada# bs=4096 count=1 seek=654089760 conv=noerror,sync
dd: /dev/ada#: Operation not supported

Tried the standard block size here, just to see:

[root@freenas] ~# dd if=/dev/zero of=/dev/ada# bs=512 count=1 seek=654089760 conv=noerror,sync
dd: /dev/ada#: Operation not supported

**************************************
Sorry, I had forgotten to change the device name.
******************************************
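
On the first question: in the self-test log, the last column (LBA_of_first_error) is the address of the first sector the test could not read. A minimal sketch that pulls it out of `smartctl -l selftest` output, assuming the log layout shown above; `first_error_lba` is a hypothetical helper name:

```shell
#!/bin/sh
# Print the LBA_of_first_error of the first listed self-test that hit a
# read failure. Reads `smartctl -l selftest` output on stdin.
first_error_lba() {
    awk '/^# *[0-9]+/ && /read failure/ { print $NF; found = 1; exit }
         END { if (!found) exit 1 }'
}
```

Usage would look like `smartctl -l selftest /dev/ada1 | first_error_lba`, with the device name as a placeholder; the function exits non-zero when no failed test is listed.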

xic044 · Oct 24, 2013

I ran Joshua's procedure on one of my disks that had 3 unreadable sectors and it solved the problem! I will do the same on the other 2 disks (one has 9 sectors, the other 30 :-( ). Why did brand new disks show problems from the beginning? Should I low-level format new disks before adding them to the system?

*************************
By the way, I had to change the block size to 512k in the dd command.

*************************

Paul Martin · Feb 8, 2014

One of my WD20EARS has been giving me this error for a few weeks. The drive is otherwise fine (tested good) but is out of warranty soon. Is this sufficient grounds for an RMA?

BobCochran · Feb 8, 2014

I don't think you can "RMA" an out of warranty drive. I'd simply replace the drive entirely. I've done that once already for myself and while I did not enjoy spending the money, I have much-needed storage.

joeschmuck · Feb 8, 2014

Paul Martin said:
One of my WD20EARS has been giving me this error for a few weeks. The drive is otherwise fine (tested good) but is out of warranty soon. Is this sufficient grounds for an RMA?

If your drive is still under warranty, as you indicated, it doesn't matter if it expires next week, so long as you start the claim process before it expires; you can then ship it back to the manufacturer for replacement. I prefer the method where I give them my credit card info, they ship me a drive with a prepaid return label, and I reuse the box the new drive arrived in to return the old drive. Keep in mind that you will be getting a refurbished replacement drive; you rarely hear of someone getting a new drive from an RMA.

Paul Martin · Feb 8, 2014

The drive is not out of warranty yet. It will be in a month or two. I would do an Advance RMA and have WD ship me a replacement drive before I send this one back.

Paul Martin · Feb 8, 2014

@joeschmuck that would be the plan. Is this a good enough reason to file for one though?

joeschmuck · Feb 8, 2014

WD is pretty good about accepting drives. I've RMA'd a few in the past that I couldn't prove were causing me issues, but I had no problems with WD. You could also include a printout which states what is wrong with the drive: that you are intermittently getting error messages about unreadable pending sectors, or whatever your specific message is.
