Drives keep disconnecting

Status
Not open for further replies.

RChadwick

Dabbler
Joined
Jun 12, 2012
Messages
19
I installed FreeNAS 9.3 a while ago on a USB drive on my HP Microserver with 16GB. Recently, I installed 4 WD Green 4TB drives. After a short learning curve, everything seemed OK. I transferred a few TB of data onto it. Then the green indicator at the upper right of the screen started blinking red. I don't remember the exact wording, but it basically said that drive ADA1 had disconnected or failed. It also said (and still says):

CRITICAL: The volume MyPool (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.

When I reboot, the drive comes back. This happened 3 or 4 times on the same drive, then it happened on ADA3 once. Currently, it has run overnight without any new error. I unchecked the above message, and the green light is back. I'm unsure how to get rid of the message.

SMART status is OK on all of the drives. Before I plugged in the USB drive with FreeNAS, I ran Memtest over a long weekend, and then manually re-ran HD Tune repeatedly on each drive over another long weekend. I never saw the slightest hint of hardware problems. I also configured the drives to park every 300 seconds instead of every 8 seconds.

As I'm new to FreeNAS, I don't know where the log files are (I looked). Under 'View Volumes', it shows the status as Healthy.

I believe the pool is RAID-Z2.

Any ideas? Since these drives are not tuned for NAS use, maybe they are timing out? Any other ideas or solutions?

Thanks!
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Can you list your hardware please, especially the PSU?

Have you tried reseating the connectors (power and SATA, both ends)?
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
"As I'm new to FreeNAS, I don't know where the log files are"
Under System | Advanced, there's a button labelled "Save Debug". If you click it, it will generate an archive that you can attach to a forum post.
 

RChadwick

Dabbler
Joined
Jun 12, 2012
Messages
19
Thanks for the replies.
The hardware is an HP N36L Microserver. Since it has 4 hotswap trays, I'm assuming the PSU should handle four drives. However, I'll bring it to my office Monday and put it on a load tester to be sure. I unplugged and replugged the motherboard connector, and removed and reinstalled the drives a number of times. I'll look over the log file (it has been up less than a week, and already there is a LOT to go over) and report back with any more failures.
 

RChadwick

Dabbler
Joined
Jun 12, 2012
Messages
19
OK, I have some more news to report...
The power supply tested fine.
I noticed a lot of vibration coming from one of the disks. It could have been ADA1, although I'm not sure which drive is mapped where. I jammed some paper between the drive and the cage, and this seemed to completely stop the vibration, so I'm guessing it was some kind of resonance issue. Since I've done that, I haven't seen issues with any drive other than ADA1, and ADA1 no longer disconnects. However, ADA1 still shows an error, and I get these messages every half hour:

Nov 25 11:56:23 server smartd[2587]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Nov 25 11:56:23 server smartd[2587]: Device: /dev/ada1, 1 Offline uncorrectable sectors
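
For anyone who wants to track whether these counts change over time, here is a minimal Python sketch that pulls the device and sector count out of smartd lines like the two above. The function names and the regular expression are illustrative assumptions based only on these two messages; other smartd message types are ignored.

```python
import re

# Pattern built from the two smartd messages quoted above (an assumption,
# not an exhaustive list of what smartd can log).
SMARTD_LINE = re.compile(
    r"Device: (?P<device>/dev/\w+), (?P<count>\d+) "
    r"(?P<kind>Currently unreadable \(pending\)|Offline uncorrectable) sectors"
)


def parse_smartd_line(line):
    """Return (device, kind, count) for a matching smartd line, else None."""
    match = SMARTD_LINE.search(line)
    if match is None:
        return None
    return match.group("device"), match.group("kind"), int(match.group("count"))


def latest_counts(lines):
    """Map (device, kind) to the most recent count seen in the given lines."""
    counts = {}
    for line in lines:
        parsed = parse_smartd_line(line)
        if parsed is not None:
            device, kind, count = parsed
            counts[(device, kind)] = count
    return counts
```

Feeding the system log through `latest_counts` once a day and comparing the results would show whether the numbers stay at 1 or start to climb.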

I haven't taken the drive out yet to give it a thorough test, but on reboot the BIOS reports the SMART status as OK.

My guess at this point is that the vibration, combined with a possible bad connection, was disrupting communication with the drive. What do these errors mean? I'm guessing they are failing SMART somehow, or is this something else? If these are nothing to worry about, can I get rid of them?

Also, for future reference, can I swap drives? For instance, if ADA1 had issues, could I swap it with ADA0 to test the slot, but not break my volume?

Thanks!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No, that's not a communications issue, it is a bad disk sector. It may be repairable, but do keep an eye on the disk. Sometimes this is the sign of a drive about to fail.
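
One way to keep an eye on it is to read the relevant SMART attributes periodically. The following is a rough Python sketch, assuming smartmontools is installed and that `smartctl -A` prints the usual ten-column attribute table with the raw value last; the function names are made up for illustration.

```python
import subprocess

# Attribute names as smartctl typically prints them for reallocated,
# pending, and offline-uncorrectable sectors.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(text):
    """Extract raw values of the watched attributes from `smartctl -A` style text."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is assumed to be the tenth column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            try:
                values[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip rows whose raw value is not a plain integer
    return values


def read_attributes(device):
    """Run smartctl on a device such as /dev/ada1 (usually requires root)."""
    # smartctl uses its exit status as a bitmask, so a non-zero code is not
    # treated as a failure here.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=False
    )
    return parse_attributes(result.stdout)
```

If the pending or uncorrectable count keeps rising between checks, that points toward a failing drive rather than a one-off bad sector.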
 

RChadwick

Dabbler
Joined
Jun 12, 2012
Messages
19
Thanks for the response.

Perhaps, instead of a communication issue, it was a power issue?
If a drive is failing, I'll be the first to RMA it, but if the 'Bad' sectors were caused by power fluctuations, and may still be good, is there a way to get the drive to re-check them? If the drive is not failing, I'd hate to ignore this, and by extension ignore a real problem in the future.
 