Scrubbing takes a very long time... normal?

Status
Not open for further replies.

lord.anonymous

Dabbler
Joined
Apr 6, 2012
Messages
38
Hello,

I have a 4 x 2 TB RAIDZ1 array, 76% used (~4 TB).
Usually a scrub takes 12-20 hours.
But the latest scrub has been running for 17-18 days...
I restarted the system, but the scrub just restarts...

Can someone tell me whether this is normal, and what I can do?

Thank you!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Neither of the systems in your signature is running the 8.3.0 release or newer. I'd start by upgrading your FreeNAS version and see what happens.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
lord.anonymous said:
Hello,

I have a 4 x 2 TB RAIDZ1 array, 76% used (~4 TB).
Usually a scrub takes 12-20 hours.
But the latest scrub has been running for 17-18 days...

17-18 DAYS?

Unless you're running other stuff at the same time and putting extra load on the system, my guess is that you have a drive that is failing.

I'd abort the scrub: zpool scrub -s poolname

Then run some SMART long tests on your disks from the command line. If you don't know how to do that, search the forums; it's been posted a LOT.

After you abort the scrub, do a zpool status -v and see if there were any errors. If there were, you can use them to decide which disk to test first.
 

lord.anonymous

Dabbler
Joined
Apr 6, 2012
Messages
38
I did it.
No errors on the volume...
An extended SMART test is scheduled every month, and a short one every week, with no errors... I check the results every week.

I use this command line:
smartctl -a /dev/ada... on each HDD in the pool, no errors.

So at this point, I think there's nothing more to do...

Thank you for the response!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
If you can't complete a scrub, your data is at risk. Scrubs verify that your data is good, and if a disk fails and has to be replaced, you will have to resilver. If you can't finish a scrub, you shouldn't expect to finish a resilver either, which means any lost redundancy can never be restored. You do need to find the source of your problems and fix it.
 

lord.anonymous

Dabbler
Joined
Apr 6, 2012
Messages
38
I'll scrub the pool tonight to see whether the same problem occurs.
 

lord.anonymous

Dabbler
Joined
Apr 6, 2012
Messages
38
Hi,

I started another scrub... it has been running for 700 hours...

Code:
[root@freenas] ~# zpool status DISK0

  pool: DISK0
 state: ONLINE
 scrub: scrub in progress for 704h13m, 100.00% done, 0h0m to go
config:

NAME        STATE     READ WRITE CKSUM
DISK0       ONLINE       0     0     0
  raidz1    ONLINE       0     0     0
    ada0p2  ONLINE       0     0     0
    ada1p2  ONLINE       0     0     0
    ada2p2  ONLINE       0     0     0
    ada3p2  ONLINE       0     0     0

errors: No known data errors
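
For anyone who wants to catch this automatically, here is a minimal sketch that reads the "scrub:" line from the output above and flags a scrub that has run far longer than usual. The function names and the 20-hour threshold (taken from the "12-20 hours" mentioned earlier in the thread) are assumptions for illustration, and later ZFS versions word this line differently, so the pattern would need adjusting there.

```python
import re

# Matches the "scrub:" line exactly as it appears in the output above.
# Other ZFS versions print a differently worded line that this does not cover.
SCRUB_RE = re.compile(
    r"scrub in progress for (\d+)h(\d+)m, ([\d.]+)% done, (\d+)h(\d+)m to go"
)

def parse_scrub_line(line):
    """Return elapsed/remaining hours and percent done, or None if no match."""
    m = SCRUB_RE.search(line)
    if m is None:
        return None
    eh, em, pct, rh, rm = m.groups()
    return {
        "elapsed_hours": int(eh) + int(em) / 60,
        "percent_done": float(pct),
        "remaining_hours": int(rh) + int(rm) / 60,
    }

def looks_stuck(info, usual_max_hours=20.0):
    """Hypothetical rule of thumb: elapsed time exceeds the usual worst case."""
    return info["elapsed_hours"] > usual_max_hours

line = " scrub: scrub in progress for 704h13m, 100.00% done, 0h0m to go"
info = parse_scrub_line(line)
print(info["percent_done"], looks_stuck(info))  # 100.0 True
```

A scrub that reports 100% done with 0h0m to go but keeps counting elapsed hours is exactly the case this would flag.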


On ada1:
Code:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     11011         -
# 2  Short offline       Completed without error       00%     10844         -
# 3  Extended offline    Interrupted (host reset)      90%     10703         -
# 4  Short offline       Completed without error       00%     10670         -
# 5  Short offline       Completed without error       00%     10622         -
# 6  Short offline       Completed without error       00%     10457         -
# 7  Short offline       Completed without error       00%     10290         -
# 8  Short offline       Completed without error       00%     10118         -
# 9  Extended offline    Completed without error       00%      9980         -
#10  Short offline       Completed without error       00%      9950         -
#11  Short offline       Completed without error       00%      9879         -
#12  Short offline       Completed without error       00%      9711         -
#13  Short offline       Completed without error       00%      9543         -
#14  Short offline       Completed without error       00%      9375         -
#15  Extended offline    Completed without error       00%      9237         -
#16  Short offline       Completed without error       00%      9207         -
#17  Short offline       Completed without error       00%      9039         -
#18  Short offline       Completed without error       00%      8871         -
#19  Short offline       Completed without error       00%      8703         -
#20  Extended offline    Completed without error       00%      8565         -
#21  Short offline       Completed without error       00%      8535         -


On ada2 (before)
Code:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 90%      8622         -
# 2  Short offline       Completed without error       00%      8206         -
# 3  Short offline       Completed without error       00%      8158         -
# 4  Short offline       Completed without error       00%      7990         -
# 5  Short offline       Completed without error       00%      7824         -
# 6  Short offline       Completed without error       00%      7654         -
# 7  Extended offline    Completed without error       00%      7540         -
# 8  Short offline       Completed without error       00%      7486         -
# 9  Short offline       Completed without error       00%      7414         -
#10  Short offline       Completed without error       00%      7246         -
#11  Short offline       Completed without error       00%      7078         -
#12  Short offline       Completed without error       00%      6910         -
#13  Extended offline    Completed without error       00%      6796         -
#14  Short offline       Completed without error       00%      6742         -
#15  Short offline       Completed without error       00%      6575         -
#16  Short offline       Completed without error       00%      6407         -
#17  Short offline       Completed without error       00%      6239         -
#18  Extended offline    Completed without error       00%      6125         -
#19  Short offline       Completed without error       00%      6071         -
#20  Short offline       Completed without error       00%      6003         -
#21  Short offline       Completed without error       00%      5831         -


On ada2 (now)
Code:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 90%      8938         -
# 2  Short offline       Completed without error       00%      8206         -
# 3  Short offline       Completed without error       00%      8158         -
# 4  Short offline       Completed without error       00%      7990         -
# 5  Short offline       Completed without error       00%      7824         -
# 6  Short offline       Completed without error       00%      7654         -
# 7  Extended offline    Completed without error       00%      7540         -
# 8  Short offline       Completed without error       00%      7486         -
# 9  Short offline       Completed without error       00%      7414         -
#10  Short offline       Completed without error       00%      7246         -
#11  Short offline       Completed without error       00%      7078         -
#12  Short offline       Completed without error       00%      6910         -
#13  Extended offline    Completed without error       00%      6796         -
#14  Short offline       Completed without error       00%      6742         -
#15  Short offline       Completed without error       00%      6575         -
#16  Short offline       Completed without error       00%      6407         -
#17  Short offline       Completed without error       00%      6239         -
#18  Extended offline    Completed without error       00%      6125         -
#19  Short offline       Completed without error       00%      6071         -
#20  Short offline       Completed without error       00%      6003         -
#21  Short offline       Completed without error       00%      5831         -


Same problem on ada3. The extended offline test doesn't work correctly.

Can someone help me? What should I do while the scrub is running, and what can I test?
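
To put a number on how long the ada2 test has been stuck, the two snapshots above can be compared. This is only a sketch: the names are made up for illustration, the pattern covers just the row layout pasted in this post, and it assumes the LifeTime(hours) value of an in-progress entry follows the drive's current power-on hours, as the two ada2 logs suggest.

```python
import re
from typing import NamedTuple, Optional

class SelfTest(NamedTuple):
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int

# Row layout as pasted above: "# 1  Extended offline  <status>  90%  8938  -"
ROW_RE = re.compile(
    r"^#\s*(\d+)\s+(Short offline|Extended offline)\s+(.+?)\s+(\d+)%\s+(\d+)\s+\S+\s*$"
)

def parse_row(line: str) -> Optional[SelfTest]:
    """Parse one self-test log row; return None for headers and other lines."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    num, desc, status, pct, hours = m.groups()
    return SelfTest(int(num), desc, status, int(pct), int(hours))

def stalled_hours(before: SelfTest, now: SelfTest) -> Optional[int]:
    """Power-on hours between two snapshots in which a running test made no progress."""
    both_running = "in progress" in before.status and "in progress" in now.status
    if both_running and before.remaining_pct == now.remaining_pct:
        return now.lifetime_hours - before.lifetime_hours
    return None

before = parse_row("# 1  Extended offline    Self-test routine in progress 90%      8622         -")
now = parse_row("# 1  Extended offline    Self-test routine in progress 90%      8938         -")
print(stalled_hours(before, now))  # 316
```

So between the two snapshots the test sat at 90% remaining for 316 power-on hours.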
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You realize that the extended offline test may take many hours to run, yes? Some new Barra ST4000DM000's here took like 10 or 12 hours. Best done from single-user mode, maybe, or while the pool is not mounted.
 

lord.anonymous

Dabbler
Joined
Apr 6, 2012
Messages
38
I know this test can take a few hours. This time, the test has taken more than 300 hours... :confused:

Maybe this problem is linked to the very long scrub time?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
When you pull the drive, use the manufacturer's diagnostics tool on it. I would have thought it unlikely that the firmware SMART test would allow it to keep running if it was substantially outside the accepted behavioural envelope.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,525
Don't try to run a long test with a scrub running...
 

lord.anonymous

Dabbler
Joined
Apr 6, 2012
Messages
38
It's solved.

The problem was that the scrub was starting while a long SMART test was running.
Long SMART tests were scheduled on the 1st of the month for ada0, the 2nd for ada1, the 3rd for ada2, etc.
And the scrub was scheduled every 35 days, so at some point a scrub started while a long SMART test was running on an HDD.

I solved it by upgrading to version 8.3.1 and scheduling the scrub after the long SMART tests. It works perfectly now; a scrub takes 9 hours for ~5 TB.
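
For anyone with a similar setup, here is a rough sketch of how to check a schedule for this kind of overlap. It treats "every 35 days" as a fixed interval, which is a simplification, and it only looks at the start day, so a scrub that begins the day before a long test and runs overnight would not be caught. The start date is hypothetical, and days 1-4 assume one long test per disk (ada0 to ada3), following the "etc." above.

```python
from datetime import date, timedelta

def scrub_dates(first_scrub, interval_days, count):
    """Return `count` scrub start dates spaced `interval_days` apart."""
    return [first_scrub + timedelta(days=interval_days * i) for i in range(count)]

def collisions(first_scrub, interval_days, count, smart_days):
    """Return scrub dates that land on a day of the month reserved for a long SMART test."""
    return [
        d for d in scrub_dates(first_scrub, interval_days, count)
        if d.day in smart_days
    ]

# Hypothetical first scrub date; long tests assumed on days 1-4 of each month.
hits = collisions(date(2013, 1, 10), 35, 12, {1, 2, 3, 4})
print(hits)  # [datetime.date(2013, 7, 4)]
```

Because a 35-day interval drifts through the calendar, a scrub eventually lands on one of the test days no matter when it starts, which is why pinning the scrub to a fixed day after the last long test avoids the problem.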

Thanks for your help!
 