Upgrading my RAIDZ2 Pool

Ruff.Hi

Patron
Joined
Apr 21, 2015
Messages
271
I am upgrading my RAIDZ2 8 x 4 TB HDD pool to 8 x 12 TB HDDs. I just ordered 10 of these drives from Walmart for $200 each. I am very happy with that price, and even happier when Capital One Shopping gave me a 15% cash back credit (so each HDD cost $170).

I need to run them all through some burn-in testing. My 4 TB drives took about 2 to 3 days per disk ... should I expect the 12 TB drives to take 3x longer?

I will be tmux'ing them so they run concurrently ... but I only have 5 or 6 open slots so it will still take two batches.
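
Something like this is the plan (just a sketch; device names and block size are examples, and badblocks -w is destructive, so only on drives with no data):

Code:
# One detached tmux session per drive; -w = destructive write test,
# -s = show progress, -v = verbose, -b 4096 = 4 KiB blocks.
for d in da2 da3 da4 da5 da6 da7; do
  tmux new-session -d -s "burnin-$d" \
    "badblocks -b 4096 -wsv /dev/$d > /root/burnin-$d.log 2>&1"
done
tmux ls    # confirm all the sessions are running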

Then slowly replacing each 4 TB with a 12 TB ... and wait, wait, wait. Looking forward to seeing the pool bloom from 24 TB to 72 TB (ok, 22 TiB to 66 TiB once you account for TB vs TiB).
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
After you test the disks, you can use 2 of the extra slots for replace-in-place. Basically ZFS will "mirror" the disk being replaced (even on a RAID-Zx), which allows full redundancy to be maintained during the replacement process. Meaning if there is a problem, you still have both the disk being replaced and the rest of the pool for redundancy.

I like doing just 2 disks at a time. I don't know if that will mean it takes less time. There is also a ZFS option to delay the next disk replacement until the current one is done, though I don't know whether that is enabled on TrueNAS.

Last, ZFS disk replacement time is based on how much data is in use, unlike most other RAID schemes.
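
In command form it is just a sketch like this (pool and device names are placeholders; TrueNAS normally does this through the GUI):

Code:
# The old disk (da1) stays attached while ZFS resilvers onto the new
# one (da8), so full RAID-Z2 redundancy is kept the whole time.
zpool replace tank da1 da8
zpool status tank    # watch the resilver progress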


As for the disk tests, I don't know how long they will take.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
At least in my view, a proper burn-in is much more than simply running a single pass (or perhaps 3 of them) of some test program. Of course, doing the latter is still much better than doing nothing of that kind! :smile:

Reason: The purpose of a burn-in is to weed out those drives that are either more or less dead on arrival, or die of "infant mortality" within the first couple of weeks. As to the latter, there are plenty of statistics showing that HDDs have a much higher likelihood of dying during the first couple of weeks/months of operation. Once they survive this phase, your chances are pretty good for the next couple of years. Provided you treat the drives well, of course. This, in turn, means that you should treat this initial period differently. Or in other words: this is your burn-in period.

I concede that I am perhaps a bit extreme with this view. It comes from a commercial background where losing data is simply not an option. Of course there are still backups and other mechanisms. But those are additional layers of protection and no reason to be lax on another one.

If you think this is all too much for you, I am happy for you, because then you can have a much simpler life. I still wanted to point out those thoughts.

Hope this helps.
 

Ruff.Hi

Patron
Joined
Apr 21, 2015
Messages
271
Thanks for the replies.

I am not sure if I want to replace 2 disks at once. I have the slots open ... I might google that and see what I find. I will let you know how long the burn-in runs on a 12 TB HDD.

Agree about a disk surviving a few years ... and then just running and running. I have two that are over 7 years old now.

Good point re the initial deployment also being an extended burn-in ... I am not in any rush to completely move the pool to 12 TB per disk, so I can probably take my time. Data usage is 53%.
 

Ruff.Hi

Patron
Joined
Apr 21, 2015
Messages
271
10 x 12 TB HDDs arrived at lunchtime.

Here is what I purchased according to the Walmart page ...

Code:
5400RPM performance class 
Supports up to 180 TB/yr workload rate
NASware™ firmware for compatibility
Small or medium business NAS systems in a 24x7 environment
3-year limited warranty


Here is what I got ...
Code:
Host:                   NASDevl.local
OS:                     FreeBSD
Drive:                  /dev/da5
Disk Type:              7200_rpm
Drive Model:            WDC_WD120EFBX-68B0EN0


Should I worry about the 7200 vs 5400 RPM?
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
"5400RPM Performance CLASS " is 7200 in WD speak.
 

Ruff.Hi

Patron
Joined
Apr 21, 2015
Messages
271
HDD burn-in has started. 8 drives all up and running for just over 3 hrs. The bad block test is about 18% of the way in. I expect that to finish in about 5.5 days :|
 

Ruff.Hi

Patron
Joined
Apr 21, 2015
Messages
271
Eight HDDs being tested. Four are still running; four have exited the badblocks test (it exits after the first error) and are into the long SMART test that takes 22 hours. All four that failed kicked off their SMART tests at around 5:50 am this morning ... meaning they all failed their badblocks test at around the same time. That strikes me as suspicious.

They are in my Devl kit, which is similar to but smaller than my prod kit. I wonder if it is one of the Kingwin KF-4001-BK 3.5″ bays causing an issue. I will wait until the long SMART test has finished and see what is what.
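
For reference, checking on the long tests looks something like this (a sketch; device name is an example):

Code:
smartctl -t long /dev/da5      # start the ~22 hour long self-test
smartctl -l selftest /dev/da5  # check the self-test log when it completes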

They are also reporting ...
  • Smartctl open device: /dev/da7 failed: INQUIRY failed
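
For the INQUIRY failures I will check whether the devices are still attached to the bus (a sketch, assuming stock FreeBSD tools):

Code:
camcontrol devlist   # is da7 still enumerated?
dmesg | tail -n 20   # look for detach / reattach messages from the bay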

And before anyone asks, they are well within their warranty period :) I have already registered them with WD and got their bounce-back email ...
  • Warranty Date 04/06/2026
 

Ruff.Hi

Patron
Joined
Apr 21, 2015
Messages
271
All four that failed kicked off their SMART tests at around 5:50 am this morning ... meaning they all failed their badblocks test at around the same time. That strikes me as suspicious.

[attached image: j4qzA63.jpg]

No ... not suspicious at all </sarcasm>
 

Ruff.Hi

Patron
Joined
Apr 21, 2015
Messages
271
The 12 TB drives are going to take approximately 6.25 days to trawl through the badblocks testing.

A 4 TB drive took 2.58 days. Early indications are that it isn't 3 times as slow (6.25 / 2.58 ≈ 2.4x).

I'll have to wait and see how the 6.25-day estimate stacks up.
 

Ruff.Hi

Patron
Joined
Apr 21, 2015
Messages
271
The 2nd batch of badblocks disks all generated errors at about the same time / location. badblocks finished the first of 4 write / read passes and was about halfway through the second writing pass when the errors were encountered. The time gaps were 5:45, 5:50 and 2 x 5:51.

How finicky are these newfangled 12 TB drives?

I am now running long SMART tests on this batch.
 

Ruff.Hi

Patron
Joined
Apr 21, 2015
Messages
271
First of 8 has finally finished the badblocks test ...

Code:
+ Running badblocks test: Sat Mar 18 15:59:28 EDT 2023
+-----------------------------------------------------------------------------
Finished badblocks test
+-----------------------------------------------------------------------------
+ Running SMART long test: Sat Mar 25 15:17:42 EDT 2023
 

Ruff.Hi

Patron
Joined
Apr 21, 2015
Messages
271
8 x resilver completed. Pool status before ...

Code:
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
Bank          10.9T  7.23T  3.68T        -         -     0%    66%  1.00x    ONLINE  /mnt
DuffleBag       29T  15.6T  13.4T        -         -     5%    53%  1.00x    ONLINE  /mnt
freenas-boot   111G  25.8G  85.2G        -         -     1%    23%  1.00x    ONLINE  -

... and after ...

Code:
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
Bank          10.9T  7.23T  3.68T        -         -     0%    66%  1.00x    ONLINE  /mnt
DuffleBag     87.2T  15.6T  71.7T        -         -     1%    17%  1.00x    ONLINE  /mnt
freenas-boot   111G  25.8G  85.2G        -         -     1%    23%  1.00x    ONLINE  -
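
For anyone following along: the pool only grows once the last disk in the vdev has been replaced. A sketch of the knobs involved (pool / device names from my setup; the online -e step is only needed if the pool doesn't expand on its own):

Code:
zpool set autoexpand=on DuffleBag   # let the vdev grow to the new disk size
zpool online -e DuffleBag da0       # manually expand a device if needed
zpool list DuffleBag                # SIZE should now show ~87T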
 
Joined
Jun 15, 2022
Messages
674
To catch up on a few of your questions:
Given the transfer rate maxes out somewhere around 150 MB/sec (it depends on the drive), yes, 12 TB drives take much longer than 4 TB ... figure about a week (which is what you found). One full pass over 12 TB at 150 MB/sec is roughly 22 hours, and badblocks' four write-then-read patterns mean eight full passes over the disk, hence roughly a week.

Given it's Z2 I'd only replace 1 drive at a time. (Even if it were Z3 I'd only replace 1 at a time, to reduce the chance of something going very, very badly.)

If this is for home use you can perhaps "normally" do one run (badblocks -p 1) and reduce the burn-in time, although given your drives are throwing bad blocks, you're better off doing -p 5 at minimum, or, as many here would suggest, -p 10.
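
What that looks like in practice (a sketch; device name is an example, and -w wipes the drive):

Code:
# -p N keeps scanning until N consecutive passes find no new bad blocks.
# Destructive write test: only run on drives with no data on them.
badblocks -b 4096 -wsv -p 5 /dev/da5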


Eight HDDs being tested. Four are still running; four have exited the badblocks test (it exits after the first error) and are into the long SMART test that takes 22 hours. All four that failed kicked off their SMART tests at around 5:50 am this morning ... meaning they all failed their badblocks test at around the same time. That strikes me as suspicious.
Tell badblocks not to fail out when detecting an error; let it hammer the drive through all 4 write-read passes. If the drive starts throwing hundreds of errors, you'll know it's a warranty claim.
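
Running it directly, rather than through a wrapper script that aborts on the first error, does exactly that (device name is an example):

Code:
# -o logs every bad block to a file while the test keeps going.
badblocks -b 4096 -wsv -o /root/da7-badblocks.txt /dev/da7
wc -l /root/da7-badblocks.txt   # hundreds of lines = RMA it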

I have to wonder what your drive temps are. Consider running:

Code:
echo "$drive : $(smartctl --xall "$drive" | grep 'Current' | grep 'Temperature')"

OPINION: Discount big-box stores often sell "factory seconds" for "low prices" and still make money doing so. Chances are you bought "serviceable" drives that "pass" S.M.A.R.T. testing, so.... If you look at how platters are made, it could be that a batch has errors all at the same spot, "but block remapping makes up for it."
 