[13.0-U6.1] Replacing a working disk without degrading the pool

bal0an

Explorer
Joined
Mar 2, 2012
Messages
72
I've searched a bit for a good way to replace a still-working (albeit old) disk. Here I'd like to document my solution for discussion and raise two issues.
Issue 1: the manual doesn't describe how to replace a working disk without degrading the pool.
Issue 2: the UI replace disk option seems not to work.

See also:
Replacing Disk does not work in TrueNAS Core v13.0?
Drive Swap With Spare
Forcing a hot spare to become a permanent drive in pool

Item 1 below is my first attempt, which did not work.
Item 2 is my recipe for safely replacing a working disk.

This is my configuration:
Code:
root@nas1:~ # zpool status -v tank
  pool: tank
 state: ONLINE
  scan: resilvered 264K in 00:00:01 with 0 errors on Sat Feb 17 12:07:35 2024
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/cfbe1afc-9caf-11ec-81a1-18c04d8f0452  ONLINE       0     0     0
            gptid/fdff7e8b-2548-11ec-b7d6-d05099921104  ONLINE       0     0     0
            gptid/fe2055b4-2548-11ec-b7d6-d05099921104  ONLINE       0     0     0
            gptid/fe15fec1-2548-11ec-b7d6-d05099921104  ONLINE       0     0     0
            gptid/fe269fa1-2548-11ec-b7d6-d05099921104  ONLINE       0     0     0

errors: No known data errors

An additional disk has been connected but not added to any pool.
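For completeness, this is roughly how one could identify and prepare such a disk on the CLI on TrueNAS Core (FreeBSD). The device name ada6 is only an assumption for illustration; note that the TrueNAS UI normally does the partitioning itself (and also creates a swap partition) when it adds a disk to a pool:

```shell
# List attached disks to find the newly connected one
# (ada6 here is a hypothetical device name -- substitute your own):
camcontrol devlist

# Create a GPT scheme and a freebsd-zfs partition on the new disk:
gpart create -s gpt ada6
gpart add -t freebsd-zfs -a 1m ada6

# Look up the gptid label of the new partition, for use in zpool commands:
glabel status | grep ada6
```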

1. Ideally, since the pool is still fully operational, replacing a disk should work without degrading it. Lacking other official information I followed the manual at https://www.truenas.com/docs/core/coretutorials/storage/disks/diskreplace/#figure-2 . Using the UI I added the disk as a spare, and then tried to
a) OFFLINE, and
b) REPLACE it with the spare drive.
The REPLACE step didn't work because the replace UI window would not let me select non-member disks. This was the case for the newly installed disk, whether it was added as a spare to the pool or left standalone. Is this still the known error from when 13.0 was released?
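For reference, the CLI equivalent of these OFFLINE + REPLACE steps would look roughly like the following (using the gptids from this thread). Be aware that, unlike the approach in item 2 below, this leaves the pool degraded until the resilver completes:

```shell
# Offline the old member -- the RAIDZ1 vdev runs degraded from here on:
zpool offline tank gptid/fe269fa1-2548-11ec-b7d6-d05099921104

# Replace the offlined member with the new disk's gptid and resilver:
zpool replace tank gptid/fe269fa1-2548-11ec-b7d6-d05099921104 gptid/233b7e12-cd91-11ee-95ed-18c04d8f0452
```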

[Screenshot: replacing no member disk.png]


After taking the legacy disk back ONLINE I proceeded as follows:

2. The ZFS documentation describes this under "Activating and Deactivating Hot Spares in Your Storage Pool".
This approach looks much better, as it resilvers onto the spare disk while keeping all RAIDZ1 members online, reducing the risk of a disk failing under the high resilver load.

Code:
root@nas1:~ # zpool replace tank gptid/fe269fa1-2548-11ec-b7d6-d05099921104 gptid/233b7e12-cd91-11ee-95ed-18c04d8f0452

root@nas1:~ # zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Feb 17 13:37:07 2024
        1.90T scanned at 21.8G/s, 32.1G issued at 369M/s, 8.71T total
        6.30G resilvered, 0.36% done, 06:50:35 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        tank                                              ONLINE       0     0     0
          raidz1-0                                        ONLINE       0     0     0
            gptid/cfbe1afc-9caf-11ec-81a1-18c04d8f0452    ONLINE       0     0     0
            gptid/fdff7e8b-2548-11ec-b7d6-d05099921104    ONLINE       0     0     0
            gptid/fe2055b4-2548-11ec-b7d6-d05099921104    ONLINE       0     0     0
            gptid/fe15fec1-2548-11ec-b7d6-d05099921104    ONLINE       0     0     0
            spare-4                                       ONLINE       0     0     0
              gptid/fe269fa1-2548-11ec-b7d6-d05099921104  ONLINE       0     0     0
              gptid/233b7e12-cd91-11ee-95ed-18c04d8f0452  ONLINE       0     0     0  (resilvering)
        spares
          gptid/233b7e12-cd91-11ee-95ed-18c04d8f0452      INUSE     currently in use

errors: No known data errors


As a last step I will have to remove the legacy drive and make the spare drive a permanent member of the pool (to be updated).
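Presumably that last step amounts to detaching the legacy disk once the resilver has finished; ZFS then promotes the spare to a permanent pool member. A sketch using the gptids above:

```shell
# After the resilver completes, detach the legacy disk from the spare-4 vdev;
# the hot spare then becomes a permanent member of raidz1-0:
zpool detach tank gptid/fe269fa1-2548-11ec-b7d6-d05099921104

# Verify the resulting pool layout:
zpool status -v tank
```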

The subsequent display in the UI is misleading, IMHO. It may be technically true that "ada1 is UNAVAIL as a spare" while it is being resilvered, but that fact is not visible in the UI. The output of zpool status is easier to understand.

[Screenshot: pool active spare during resilver.png]


Arwen

MVP
Joined
May 17, 2014
Messages
3,611
ZFS was designed to show a hot spare in use exactly as you have shown: as an "INUSE" entry under the spares vDev, and as a temporary "spare" mirror inside the RAIDZ vdev. This is normal.

Your observation that the GUI does not show that the hot spare is now in use and still re-silvering is perhaps a bug. You can report it using the "Report a Bug" link at the top of any forum web page.

ZFS does not need a hot spare to do a replace in place. It is possible that the TrueNAS Core GUI (and/or the associated docs) doesn't support that. Again, perhaps that is a bug, or more appropriately a feature request (also reported via "Report a Bug").


ZFS can even do replace in place of a single disk pool. I've personally done it to move my active root pool, (on Linux, not TrueNAS), so I could re-partition my disk. For me today, that is not something I really need to think about or do, as I now use ZFS Mirrored root pools on all my home Linux computers, (desktop, miniature media server, new laptop, old laptop).
 

bal0an

Explorer
Joined
Mar 2, 2012
Messages
72
Update: after replacing the first disk using the command line interface, the UI REPLACE decided to become operational again, so it seems to have been a transient issue, probably just UI-related. UI REPLACE worked for both cases: a) with a spare drive, and b) with a drive unassigned to any pool.
 