CaedenV (Dabbler; joined Jul 25, 2017; 15 messages)
OK, so I am seeing some odd behavior on my box.
TLDR version:
When I run a scrub, 2 drives always report checksum errors: sometimes 2, sometimes hundreds of thousands (currently up near 800K). If there were a genuine issue I imagine it would be more consistent, but this wide range of checksum errors has me scratching my head. Also, no read or write errors are being reported.
But the really odd thing is that both drives are reporting exactly the same number of errors... which is really weird.
Long Version:
I recently replaced the motherboard and added an 8th drive to the system (the previous mobo had 7 SATA ports... which is dumb). So I backed everything up, destroyed the array, and rebuilt it as a RAIDz2 across 8 3TB HDDs. Before the upgrade I had 1 drive giving errors; I replaced it about a week before the mobo upgrade and had no major issues. But since the upgrade I am consistently getting errors on ADA0 and ADA2. Everything 'seems' fine. I'm not seeing massive amounts of corruption (which I would expect with hundreds of thousands of errors), and there are no major slow-downs or hiccups in use.
But because I am getting the same errors on 2 drives, I wonder whether it is a motherboard-level issue. I am just at a loss as to how to troubleshoot it.
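A rough way to read the result of a cable-swap test, sketched in Python. The idea is to record which disk serial sits on which SATA port before and after the swap, and to note which ports showed CKSUM errors each time. You can then check whether the errors moved with the disks or stayed with the ports. The function name `errors_follow` and the port labels are hypothetical, and nothing here queries the system. Because the error counts on this pool swing from 2 to millions between scrubs, a single clean scrub after the swap is weak evidence either way.

```python
def errors_follow(before_ports, before_errors, after_ports, after_errors):
    """Classify a cable-swap test as 'disk', 'port', or 'inconclusive'.

    before_ports / after_ports: dict of SATA port label -> disk serial number
    before_errors / after_errors: set of port labels whose disk showed CKSUM errors
    (Hypothetical helper; all inputs are recorded by hand.)
    """
    # Which physical disks (by serial) were flagged in each run
    bad_disks_before = {before_ports[p] for p in before_errors}
    bad_disks_after = {after_ports[p] for p in after_errors}

    follows_disk = bool(bad_disks_before) and bad_disks_before == bad_disks_after
    follows_port = bool(before_errors) and set(before_errors) == set(after_errors)

    if follows_disk and not follows_port:
        return "disk"
    if follows_port and not follows_disk:
        return "port"
    # Both true means the suspect disks never changed ports; neither true means
    # the errors moved somewhere unrelated or did not show up this time.
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical example: swap disks A<->B and C<->D between scrubs
    before = {"sata1": "A", "sata2": "B", "sata3": "C", "sata4": "D"}
    after = {"sata1": "B", "sata2": "A", "sata3": "D", "sata4": "C"}
    print(errors_follow(before, {"sata1", "sata3"}, after, {"sata2", "sata4"}))  # disk
    print(errors_follow(before, {"sata1", "sata3"}, after, {"sata1", "sata3"}))  # port
```

If the errors follow the ports (or a shared cable or controller), that points toward the motherboard side. If they follow the serials, the drives themselves are more suspect.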
Tonight I plan on moving the plugs around to see whether the errors stay with the same disks or the same ports (assuming ADA0-7 line up with SATA 1-8... which I understand can be folly depending on the underlying hardware). Which makes me ask: how on earth do you get the system to give you the serial number?!?! I can see it in the disk list... why not use it for everything?!?! It makes troubleshooting needlessly complicated.
Anywho, any thoughts, ideas, advice, etc. would be greatly appreciated.
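On FreeBSD-based systems like this one (the `gptid/...` and `adaN` names are FreeBSD conventions), `glabel status` lists each `gptid/...` label next to the partition it lives on, for example `ada0p2`. Separately, `smartctl -i /dev/ada0` prints the drive's serial on its `Serial Number:` line. The Python sketch below only parses pasted output from those two commands so you can build a gptid-to-device-to-serial map. The sample text, labels, and serials are made up for illustration, and the three-column `glabel status` row layout is an assumption.

```python
import re


def parse_glabel_status(text):
    """Map each gptid label to its whole-disk device, e.g. 'gptid/...' -> 'ada0'."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed data-row layout: <label> <status> <component>, e.g. '... N/A ada0p2'
        if len(fields) == 3 and fields[0].startswith("gptid/"):
            # Strip the partition suffix ('ada0p2' -> 'ada0')
            mapping[fields[0]] = re.sub(r"p\d+$", "", fields[2])
    return mapping


def parse_smartctl_serial(text):
    """Return the value of the 'Serial Number:' line from `smartctl -i`, or None."""
    match = re.search(r"^Serial Number:\s*(\S+)", text, re.MULTILINE)
    return match.group(1) if match else None


# Made-up sample output, NOT taken from the pool in this post
GLABEL_SAMPLE = """\
                                      Name  Status  Components
gptid/00000000-0000-0000-0000-000000000001     N/A  ada0p2
gptid/00000000-0000-0000-0000-000000000002     N/A  ada1p2
"""

SMARTCTL_SAMPLE = """\
=== START OF INFORMATION SECTION ===
Device Model:     EXAMPLE MODEL
Serial Number:    EXAMPLE123
"""

if __name__ == "__main__":
    print(parse_glabel_status(GLABEL_SAMPLE))
    print(parse_smartctl_serial(SMARTCTL_SAMPLE))
```

In practice you would run `glabel status` once, run `smartctl -i /dev/adaN` for each disk, and join the two results. Then each gptid in `zpool status` can be traced to a physical serial number before and after moving cables.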
System info:
CPU: AMD A10 5800K
RAM: DDR3 4x8GB
HDDs: 2 old Seagates, 3 two-year-old Seagates, and 3 refurbished HGST enterprise drives; all 3TB 7200rpm
-Note: one HGST and one old Seagate are having the issue
OS: v9.10
[root@Fayth ~]# zpool status
  pool: Spira
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub in progress since Thu Jul 27 19:05:11 2017
        1.35T scanned out of 9.53T at 319M/s, 7h27m to go
        64.6G repaired, 14.18% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        Spira                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/23419a6d-71ad-11e7-bd59-7085c240155c  ONLINE       0     0     0
            gptid/2498fb5c-71ad-11e7-bd59-7085c240155c  ONLINE       0     0     0
            gptid/25d5f0a1-71ad-11e7-bd59-7085c240155c  ONLINE       0     0     0
            gptid/269c5571-71ad-11e7-bd59-7085c240155c  ONLINE       0     0     0
            gptid/277f1627-71ad-11e7-bd59-7085c240155c  ONLINE       0     0     0
            gptid/28e6c346-71ad-11e7-bd59-7085c240155c  ONLINE       0     0     0
            gptid/2a27458c-71ad-11e7-bd59-7085c240155c  ONLINE       0     0 1.47M  (repairing)
            gptid/2afef4a8-71ad-11e7-bd59-7085c240155c  ONLINE       0     0 1.47M  (repairing)

errors: No known data errors
^^ Apparently millions of errors this time. Yesterday's check netted 2 errors, and the day before was ~350K.
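The config table in `zpool status` can also be scanned mechanically. This pulls out the devices with nonzero CKSUM counts and flags any that report exactly the same value, which is the pattern described above. A minimal Python sketch follows, assuming the usual NAME STATE READ WRITE CKSUM column order. It compares the counts as printed strings (e.g. `1.47M`) rather than converting the rounded suffixes back to integers. The function names are hypothetical.

```python
from collections import defaultdict

# Device states that can appear in the STATE column of a zpool status config row
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def cksum_errors(status_text):
    """Return {device_name: cksum_as_printed} for rows whose CKSUM is not '0'."""
    errors = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Config rows look like: NAME STATE READ WRITE CKSUM [notes...]
        if len(fields) >= 5 and fields[1] in STATES:
            name, cksum = fields[0], fields[4]
            if cksum != "0":
                errors[name] = cksum
    return errors


def identical_counts(errors):
    """Group devices that report exactly the same CKSUM value."""
    groups = defaultdict(list)
    for name, count in errors.items():
        groups[count].append(name)
    return {count: sorted(names) for count, names in groups.items() if len(names) > 1}


# Trimmed config section copied from the zpool status output in this post
SAMPLE = """\
NAME STATE READ WRITE CKSUM
Spira ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
gptid/28e6c346-71ad-11e7-bd59-7085c240155c ONLINE 0 0 0
gptid/2a27458c-71ad-11e7-bd59-7085c240155c ONLINE 0 0 1.47M (repairing)
gptid/2afef4a8-71ad-11e7-bd59-7085c240155c ONLINE 0 0 1.47M (repairing)
"""

if __name__ == "__main__":
    errs = cksum_errors(SAMPLE)
    print(errs)
    print(identical_counts(errs))
```

Running this after each scrub and saving the results gives a simple history. It shows whether the same pair of gptids always matches, and whether that pair changes once the cables are moved.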