1 Currently Unreadable (pending) sectors

Status
Not open for further replies.

mattlach

Patron
Joined
Oct 14, 2012
Messages
280
Hey all,

So I haven't encountered this issue before.

This is the repeated message I am getting:

8253954822_ffe4d1d18e_o.jpg


Is my disk pretty much a goner in need of replacement, or is there some way to try to scan and repair my way out of this one?

Thanks,
Matt
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
Well, I will defer to more experienced forum members, but the smartd message does not look real good to me, even though I do not know what that "pending" means. Unless someone else corrects me, I would treat the message as a sign that the disk is failing. If this disk is part of a ZFS RAIDed volume, and the RAID array is sufficiently large, you should be able to pull the bad disk and replace it with a new one, and then start the resilvering process.

Suppose you don't have a RAID array. You could get a new replacement disk. Use ddrescue to clone the bad disk to the new disk and see if it can recover. If there is still a problem, consider using testdisk on the clone disk, never the original disk, to see if that helps. There are no guarantees here. Hopefully you have a backup of your data. I do quite a lot of data recovery with ddrescue. Sometimes, I get lucky. I like ddrescue so much that I turn to it right away if I suspect a crashed source drive.

Bob
 

mattlach

Patron
Joined
Oct 14, 2012
Messages
280
Thank you for your suggestions

This is a RAIDz2 array, so I have two redundant drives.

I am aware I can replace the drive. I am - however - hoping I may be able to run some sort of repair scan and fix it rather than replacing it altogether.

The behavior of the array seems to be unaffected, as if nothing is wrong. The only indications are the repeated console smartd error messages.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
I would say that the drive is statistically more likely to fail now. It may never fail, but it's now more likely. I wouldn't expect array performance to be affected at this time and if it generates errors, ZFS should be able to handle them.

If you're not interested in replacing the drive now, I'd keep an eye on it and be ready to replace it without a lot of notice. These errors may start to accumulate rapidly, and the disk might fail. Or it might not.

As for me, if I see more than two, I prepare to replace the drive -- so I can do it when the system isn't busy, I know I have good, current backups and don't have to do it in a crisis mode, should it fail.
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
This question has already come up quite often. I'm not going to requote the answers; please do a forum search to find the relevant threads.
 

Joshua Parker Ruehlig

Hall of Famer
Joined
Dec 5, 2011
Messages
5,949
Yes, please search; I just wanted to get this off my chest, so you're lucky.
IMHO 1 unreadable sector is not a reason to replace an HDD. If you are running with double redundancy, I would just fix the problem without even affecting uptime. I fixed this problem without having to replace the HDD (though I did order a spare immediately because I was scared of the unknown), and the HDD runs just fine now.

Going to outline my steps here for future reference.

Run a long self-test on the disk ada#; it should fail somewhere.
Code:
smartctl -t long /dev/ada#


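Check the SMART self-test log for the unreadable sector (the LBA_of_first_error column); let's call it 'X'. Note that -A alone prints only the attribute table, so use -a to include the self-test log.
Code:
smartctl -a /dev/ada#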


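Change the sysctl so the disk can be written to while it is in use, then try writing to the sector. Change the 'X' below. dd's seek counts blocks of size bs, so bs has to match the sector size the LBA is expressed in (normally 512 bytes).
Code:
sysctl kern.geom.debugflags=16
dd if=/dev/zero of=/dev/ada# bs=512 count=1 seek=X conv=noerror,sync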


Check the SMART information to see if 'Current_Pending_Sector' went to 0. You may need to repeat the steps above multiple times if there are multiple unreadable sectors.
Code:
smartctl -A /dev/ada#


Now run another SMART test; hopefully it completes without error.
Code:
smartctl -t long /dev/ada#
smartctl -A /dev/ada#


Now run a scrub (either from the GUI or with 'zpool scrub poolname').
Check the scrub's status; hopefully it fixes some errors.
Code:
zpool status -v poolname
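
The Current_Pending_Sector check in the steps above can also be scripted. This is only a minimal sketch, assuming the usual ten-column attribute table printed by smartctl -A, with the raw value in the tenth column. The function name pending_sectors and the device ada0 are illustrative and not part of the original steps.

```shell
# Read `smartctl -A` output on stdin and print the raw value of the
# Current_Pending_Sector attribute (assumed to be the 10th column).
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10 }'
}

# Example (ada0 is a placeholder; substitute the real device):
#   smartctl -A /dev/ada0 | pending_sectors
```

A result of 0 after the dd write suggests the drive has either rewritten or reallocated the sector. Any other number means there are still sectors to work through.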
 

mattlach

Patron
Joined
Oct 14, 2012
Messages
280
Thank you. I did a quick search and found some threads discussing the topic, but didn't come up with any real solutions.

I appreciate your guidance.

Should I remove the drive from the volume before taking these steps, or can I do this with the drive still part of the volume?

Thanks,
Matt
 

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
IMHO 1 unreadable sector is not a reason to replace an HDD. If you are running with double redundancy, I would just fix the problem without even affecting uptime. I fixed this problem without having to replace the HDD (though I did order a spare immediately because I was scared of the unknown), and the HDD runs just fine now.

FAQ-worthy.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
FAQ-worthy.

I agree! I love this stuff.

It needs to be emphasized, though, that this doesn't change the probability that the disk will eventually have a catastrophic failure. You'll want that spare disk at hand, "just in case."
 

xic044

Dabbler
Joined
Oct 24, 2013
Messages
16
Hello Guys,

I was slowly building my FreeNAS box. For a while I had the whole system set up but with only 1 disk drive (ada0), so I could at least play with the system. Now I have bought 5 more drives and installed them in the system, and I am receiving plenty of these "currently unreadable (pending) sectors" messages referring to 3 of my six disks, including ada0, which never had any errors reported before.

At the moment I am running Joshua Parker Ruehlig's suggested test commands on the 3 disks.

I doubt that I have bad disks, because 2 of them are brand new (I know many are shipped bad or get damaged in transport); otherwise I am having really bad luck. :-(

A question came to mind: could this error be generated by the SATA PCI controller card that the 3 disks are attached to? I plugged 3 drives into the motherboard controller and the other 3 into the PCI card. Some time ago I read a user saying that PCI cards are too slow for the purposes of a FreeNAS box. Is that true?

I will bring more info when I finish doing Joshua's procedure.


Any input from you guys is appreciated.


Thanks
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
Could this error be generated by the SATA PCI controller card that the 3 disks are attached to?
Nope, the "currently unreadable (pending) sectors" error can't be caused by the controller.
 

xic044

Dabbler
Joined
Oct 24, 2013
Messages
16
Is the LBA_of_first_error value the sector number?

Why is the dd operation not supported?


=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       50%          428        654089760


[root@freenas] ~# sysctl kern.geom.debugflags=16
kern.geom.debugflags: 0 -> 16

[root@freenas] ~# dd if=/dev/zero of=/dev/ada# bs=4096 count=1 seek=654089760 conv=noerror,sync
dd: /dev/ada#: Operation not supported

I also tried the standard block size here, just to see:

[root@freenas] ~# dd if=/dev/zero of=/dev/ada# bs=512 count=1 seek=654089760 conv=noerror,sync
dd: /dev/ada#: Operation not supported

**************************************
Sorry, I had forgotten to change the device name.
******************************************
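
On the first question: LBA_of_first_error is the logical block address of the first sector the self-test could not read, so it is the value to use as X in the dd step. Here is a minimal sketch for pulling it out of the self-test log. It assumes the failing entry contains the words "read failure" and that the LBA is the last column, as in the log above. The helper name first_error_lba and the device ada0 are hypothetical.

```shell
# Read a SMART self-test log on stdin and print the last column (the LBA)
# of the first entry whose status mentions "read failure".
first_error_lba() {
    awk '/read failure/ { print $NF; exit }'
}

# Example (ada0 is a placeholder; use the real device name, not ada#):
#   smartctl -l selftest /dev/ada0 | first_error_lba
```

The log lists the most recent test first, so the value printed belongs to the latest failed test.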
 

xic044

Dabbler
Joined
Oct 24, 2013
Messages
16
I ran Joshua's procedure on one of my disks that had 3 unreadable sectors, and it solved the problem! I will do the same on the other 2 disks (one has 9 sectors, the other 30 :-( ). Why did brand new disks show problems from the beginning? Should I low-level format new disks before adding them to the system?


*************************
..By the way, I had to change block size to 512k on the dd command..

*************************
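
The block-size change matters because dd's seek= counts output blocks of size bs, not sectors. Below is a rough sketch of the arithmetic. It assumes the LBA from the self-test log is expressed in 512-byte logical sectors, and that "512k" above means bs=512 as in the earlier attempt. The helper name dd_seek_for_lba is hypothetical.

```shell
# Print the dd seek= value that lands on a given LBA.
#   $1 = LBA from the self-test log
#   $2 = bs passed to dd, in bytes
#   $3 = logical sector size in bytes (assumed 512 if omitted)
dd_seek_for_lba() {
    lba=$1
    bs=$2
    sector=${3:-512}
    # Byte offset of the LBA, divided by the dd block size.
    echo $(( lba * sector / bs ))
}

# With bs=512 the seek value is the LBA itself:
#   dd_seek_for_lba 654089760 512
# With bs=4096 it is the LBA divided by 8:
#   dd_seek_for_lba 654089760 4096
```

Passing the raw LBA as seek together with bs=4096 would write eight times further into the disk than intended.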
 

Paul Martin

Dabbler
Joined
Nov 13, 2013
Messages
10
One of my WD20EARS has been giving me this error for a few weeks. The drive is otherwise fine (tested good) but is out of warranty soon. Is this sufficient grounds for an RMA?
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
I don't think you can "RMA" an out of warranty drive. I'd simply replace the drive entirely. I've done that once already for myself and while I did not enjoy spending the money, I have much-needed storage.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
One of my WD20EARS has been giving me this error for a few weeks. The drive is otherwise fine (tested good) but is out of warranty soon. Is this sufficient grounds for an RMA?
If your drive is still under warranty, as you indicated, it doesn't matter if it expires next week: so long as you start the claim process before it expires, you can ship it back to the manufacturer for replacement. I prefer the method where I give them my credit card info, they ship me a drive with a paid return label included, and I reuse the box the new drive arrived in to return the old drive. Keep in mind that you will be getting a refurbished replacement drive; you rarely hear of someone getting a new drive from an RMA.
 

Paul Martin

Dabbler
Joined
Nov 13, 2013
Messages
10
The drive is not out of warranty yet. It will be in a month or two. I would do an Advance RMA and have WD ship me a replacement drive before I send this one back.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
WD is pretty good about accepting drives. I've RMA'd a few in the past that I couldn't prove were causing me issues, but I had no problems with WD. You could also include a printout which states what is wrong with the drive: that you are intermittently getting error messages about unreadable pending sectors, or whatever your specific message is.
 