SOLVED Unreadable (pending) sectors

Status
Not open for further replies.

Joseph Lennemann

Explorer
Joined
Aug 27, 2016
Messages
69
Hello,

I have an HP ProLiant G8 server as my FreeNAS box. It's running four 1.5 TB Western Digital hard drives in RAIDZ2 with no encryption. My issue is that I keep seeing the error listed below in the console. This error is filling up the log and has also made the status icon in the GUI blink red. I ran a long S.M.A.R.T. test on ada0 and it came back with no errors. The zpool shows that it's online with no issues. I'm not sure what to do from here. Can someone please assist?

System Information
Hostname freenas.local
Build FreeNAS-9.10.1 (d989edd)
Platform Intel(R) Celeron(R) CPU G1610T @ 2.30GHz
Memory 8000MB
System Time Sat Aug 27 17:07:24 EDT 2016
Uptime 5:07PM up 2 days, 18:08, 0 users
Load Average 0.31, 0.30, 0.31

Aug 27 09:31:47 freenas smartd[2818]: Device: /dev/ada0, 1 Currently unreadable (pending) sectors
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Can you please post the output of smartctl -a /dev/ada0 (In CODE tags)?

Also, please provide info as to the Motherboard and how the drives are attached (HBA, SATA, etc.).
 

Joseph Lennemann

Explorer
Joined
Aug 27, 2016
Messages
69
I'm not sure of the info for the motherboard. I will continue to look. The drives are connected via SATA.

Code:
[root@freenas] ~# smartctl -a /dev/ada0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD15EARS-00MVWB0
Serial Number:    WD-WCAZA4482466
LU WWN Device Id: 5 0014ee 2058c2e46
Firmware Version: 51.0AB51
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sat Aug 27 20:24:28 2016 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)    Offline data collection activity
                    was suspended by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (28080) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 272) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x3035)    SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       429
  3 Spin_Up_Time            0x0027   253   170   021    Pre-fail  Always       -       950
  4 Start_Stop_Count        0x0032   087   087   000    Old_age   Always       -       13007
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   041   041   000    Old_age   Always       -       43597
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       162
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       154
193 Load_Cycle_Count        0x0032   080   080   000    Old_age   Always       -       360641
194 Temperature_Celsius     0x0022   116   095   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       5

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     43552         -
# 2  Extended offline    Completed without error       00%     43482         -
# 3  Extended offline    Aborted by host               80%     43477         -
# 4  Extended offline    Completed without error       00%     43444         -
# 5  Short offline       Completed without error       00%     43385         -
# 6  Short offline       Completed without error       00%     43217         -
# 7  Extended offline    Completed without error       00%     43099         -
# 8  Extended offline    Completed without error       00%     43081         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
Well, two issues right off... First, those drives are *old*: nearly 5 years of continuous running. Second, they are WD Green drives that haven't been tweaked with wdidle3 to stop them from aggressively spinning down and parking their heads, which accounts for the high load cycle count.

Search for wdidle3 and you'll find plenty of threads talking about the necessary adjustments. You should make them on any WD Green drives you're running.
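
To put those two figures in perspective, here is a rough Python sketch using the raw values from the smartctl output posted above. The variable names are illustrative, and the hours-per-year constant is an averaging assumption.

```python
# Raw values copied from the smartctl -a output earlier in the thread.
power_on_hours = 43597      # attribute 9, Power_On_Hours
load_cycle_count = 360641   # attribute 193, Load_Cycle_Count

# Assumption: an average year of 365.25 days.
HOURS_PER_YEAR = 24 * 365.25

years_powered_on = power_on_hours / HOURS_PER_YEAR
load_cycles_per_hour = load_cycle_count / power_on_hours

print(f"Powered on for about {years_powered_on:.1f} years")
print(f"About {load_cycles_per_hour:.1f} head load cycles per power-on hour")
```

That works out to roughly five years of power-on time and about eight head load cycles for every hour the drive has been running.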
 

Joseph Lennemann

Explorer
Joined
Aug 27, 2016
Messages
69
After looking at some threads I see that my drive is very old as you stated, but I also see that a lot of the thresholds are over and the drive is in pre-fail mode. From what I understand, this drive is on the way out. Is this correct?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Well, the error message says you have one offline/pending sector, and that's correct--it's SMART attribute 197. That shouldn't be filling up your logs, as they rotate (daily, I think), and the system only checks SMART attributes every 30 minutes by default.
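
As a rough illustration of reading that attribute table programmatically, here is a minimal Python sketch. It assumes the ten-column layout shown in the smartctl output above and plain integer raw values; the function name `parse_smart_attributes` and the choice of sample rows are illustrative and not part of smartmontools. It also compares the normalized VALUE against THRESH, because "Pre-fail" in the TYPE column describes the kind of attribute, not the drive's current state.

```python
def parse_smart_attributes(text):
    """Return {name: {"value", "thresh", "raw"}} from smartctl attribute rows."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            rows[fields[1]] = {
                "value": int(fields[3]),   # normalized VALUE
                "thresh": int(fields[5]),  # THRESH
                "raw": int(fields[9]),     # RAW_VALUE (assumed to be an integer)
            }
        except ValueError:
            continue  # skip rows whose columns are not plain integers
    return rows

# Three rows copied from the output posted earlier in the thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

rows = parse_smart_attributes(SAMPLE)
# An attribute counts as failed when its normalized value is at or below the threshold.
below_threshold = [name for name, r in rows.items() if r["value"] <= r["thresh"]]
pending = rows["Current_Pending_Sector"]["raw"]

print("Attributes at or below threshold:", below_threshold)
print("Pending sectors:", pending)
```

For these rows no attribute is at or below its threshold, and the only nonzero raw count is the single pending sector that smartd is reporting.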

A single bad sector isn't cause for great concern, but combined with the drive's age and load cycle count, it would probably be a good time to consider a replacement. The manual has click-by-click instructions for drive replacement; follow them and you'll be fine. Do not, under any circumstances, use the Volume Manager.

I'd suggest replacing the drive with a WD Red or other drive designed for NAS use; then you won't have the load cycle count issue. If you do use a WD Green/Blue, find the WDIDLE3.EXE utility to adjust the park timer.

You should also set up regular SMART tests on all your drives. I run a short test daily and a long test weekly; both of those are probably a bit more frequent than is really needed. A bare minimum would be a short test weekly and a long test monthly.
 

Joseph Lennemann

Explorer
Joined
Aug 27, 2016
Messages
69
Thanks for the helpful information. I found the instructions in the manual. My issue now is that WD Red drives do not come in a 1.5 TB size. I see instructions on how to expand the ZFS pool. Should I follow those when replacing my 1.5 TB hard drive with a 2 TB WD Red, instead of the instructions for replacing a failed drive?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
No, you can replace a 1.5 TB drive with a 2 TB one without any problem ;)
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Replacing a disk is replacing a disk--it doesn't matter whether it's your intent to replace a fail(ing|ed) disk, expand the pool, or both. The same instructions apply in either case. In either case, if you have a place to put a spare disk temporarily, it's better to do the replacement without first taking the original disk offline.
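
A small Python sketch of why a single larger disk does not change the pool size on its own. It assumes the usual rule of thumb that each member of a raidz2 vdev contributes only as much space as the smallest disk, with two disks' worth going to parity; ZFS overhead is ignored and the function name is illustrative.

```python
def raidz2_usable_tb(disk_sizes_tb):
    """Approximate usable space of a single raidz2 vdev, in TB.

    Assumption: every member counts as the smallest disk, two disks'
    worth of space goes to parity, and ZFS overhead is ignored.
    """
    if len(disk_sizes_tb) < 3:
        raise ValueError("need more disks than parity devices")
    return (len(disk_sizes_tb) - 2) * min(disk_sizes_tb)

before = raidz2_usable_tb([1.5, 1.5, 1.5, 1.5])       # the pool as described
one_swapped = raidz2_usable_tb([2.0, 1.5, 1.5, 1.5])  # after this replacement
all_swapped = raidz2_usable_tb([2.0, 2.0, 2.0, 2.0])  # if every disk is replaced

print(before, one_swapped, all_swapped)
```

Under these assumptions the usable space stays the same after one 2 TB disk goes in, and only grows once all four members are 2 TB. Whether the pool picks up the extra space automatically at that point depends on pool settings not shown in this thread.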
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Your drive has lasted a long time and served you well. I'd say you got your money's worth. And it was a good thing that you had SMART testing and the email notifications set up. I wish everyone did that. Replacing a drive on the verge of failure is so much better than replacing two drives after the pool has failed.
 

Joseph Lennemann

Explorer
Joined
Aug 27, 2016
Messages
69
Thanks for all of the replies. I've ordered a WD Red 2TB hard drive from Amazon. It should be here Tuesday. I will keep everyone posted on the outcome.
 

Joseph Lennemann

Explorer
Joined
Aug 27, 2016
Messages
69
I followed the instructions for replacing the hard drive, including taking the failing disk offline. After I selected "replace", the GUI said "please wait". I've now lost the ability to use the GUI and SSH. I can, however, ping the box with no issues. The hard drive light shows no activity. Did I do something wrong?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Tell us exactly what steps you conducted and be as detailed as you can.
 

Joseph Lennemann

Explorer
Joined
Aug 27, 2016
Messages
69
I resolved the issue. I plugged in a monitor and used the shell to view the zpool status. The bad drive that I had already replaced showed as offline, even though I had selected "replace" in the GUI and been told "please wait". The other three drives showed as removed. I rebooted the box and used the shell to bring the replacement disk online. It's now 63% done resilvering. I think what happened is that after I took ada0 offline, I pulled the drive with the box still on. I probably should have shut it down. The drive bay clearly states the drives are NOT hot swappable.

I followed these instructions to a T.

https://www.youtube.com/watch?v=c8bvtj-LQ_A
 

Joseph Lennemann

Explorer
Joined
Aug 27, 2016
Messages
69
The drive has been rebuilt and everything looks good. One last question before I put this thread to rest. If you look below, ada0 does not have the same naming convention as the other three hard drives. Should I be worried about this?

Code:
          raidz2-0                                      ONLINE       0     0     0
            ada0                                        ONLINE       0     0     0
            gptid/139aaa87-5a95-11e6-b5bb-a01d48c7f6cc  ONLINE       0     0     0
            gptid/14d64dc0-5a95-11e6-b5bb-a01d48c7f6cc  ONLINE       0     0     0
            gptid/15e3d016-5a95-11e6-b5bb-a01d48c7f6cc  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     0
          gptid/f1bb2077-5a90-11e6-b012-a01d48c7f6cc  ONLINE       0     0     0

errors: No known data errors
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
I pulled the drive with the box still on. I probably should have shut it down. The drive bay clearly states the drives are NOT hot swappable.
Even if your system were capable of hot swapping, don't do it if you can shut the system down. It's much safer to turn things off and then change out drives. Hot swap should be reserved for systems that genuinely need to be up and running all the time.

Glad things worked out for you.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
The drive has been rebuilt and everything looks good. One last question before I put this thread to rest. If you look below ada0 does not have the same naming convention as the other three hard drives. Should I be worried about this?

Yes, actually... did you use the gui to replace the drive?

 

Joseph Lennemann

Explorer
Joined
Aug 27, 2016
Messages
69
I used shell. I explained what happened above to cause me to use shell. Long story short I made a mistake, but recovered from it. My main concern now is the naming convention was not followed for ada0. I'm not sure if that's going to cause any issues.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
So, if you offline the drive and wipe it, you should be able to re-add it using Replace in the GUI.

If you add it via the CLI, you need to add it by gptid. I believe it's possible for the device to get lost otherwise down the track.

Be *sure* you offline the right one ;)

I did this once too.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
My main concern now is the naming convention was not followed for ada0.
The reason for this is almost certainly that you didn't specify the device properly. You did zpool replace tank olddev ada0, not zpool replace tank olddev gptid/blah. And this not only means that you didn't follow the YouTube instructions "to a T", you weren't even in the same county--the YouTube video has you using the GUI to replace, not the CLI.

The odds of this actually causing a problem are pretty low. If the device name changes at some point (and becomes ada2, for example), ZFS will still pick it up and everything will be fine. The new disk doesn't have the swap partition that disks added to the pool properly have, but FreeNAS shouldn't be using swap anyway.
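
For anyone who wants to check their own pool for the same thing, here is a minimal Python sketch that picks out members listed by raw device name rather than by gptid. It assumes FreeBSD-style ada/da device names and the column layout of the zpool status output posted above; the function name and regular expression are illustrative.

```python
import re

# Assumption: raw disks show up as adaN or daN, optionally with a partition suffix.
RAW_DEVICE = re.compile(r"^(ada|da)\d+(p\d+)?$")

def raw_device_members(status_text):
    """Return pool members named by raw device node instead of gptid."""
    found = []
    for line in status_text.splitlines():
        fields = line.split()
        if fields and RAW_DEVICE.match(fields[0]):
            found.append(fields[0])
    return found

# Member rows as they appear in the output posted earlier in the thread.
SAMPLE = """\
raidz2-0                                      ONLINE       0     0     0
  ada0                                        ONLINE       0     0     0
  gptid/139aaa87-5a95-11e6-b5bb-a01d48c7f6cc  ONLINE       0     0     0
  gptid/14d64dc0-5a95-11e6-b5bb-a01d48c7f6cc  ONLINE       0     0     0
  gptid/15e3d016-5a95-11e6-b5bb-a01d48c7f6cc  ONLINE       0     0     0
"""

members = raw_device_members(SAMPLE)
print(members)
```

With the sample rows, the only member flagged is ada0, the disk that was replaced from the shell.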
 