Replace boot drives

stajo

Explorer
Joined
Jan 3, 2020
Messages
71
Hi. I have two NVMe drives as a mirrored boot pool. Both have now reported "...failed to read NVMe SMART/Health Information" at different times, so I thought it was time to replace them both.

Googling indicates the easiest way would be to replace them one at a time using the regular failed-drive replacement procedure, but via the System > Boot > Status screen.

My question is: Should I do a Detach on the drive that I am replacing, or just shut the system down, remove the drive, put a new one in its place, and then do a Replace (to the fresh one) when the old one shows up as missing?
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
An occasional failure to read SMART would not push me to replace the drives. The issue may have nothing to do with the drives themselves; it could be something else. Consistent failures, maybe. Look at what the SMART data says, and check the internet for known issues with your drive model and firmware version.
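To make "look at what the SMART data says" a bit more concrete, here is a minimal Python sketch. It assumes smartmontools 7.0 or later (for the `smartctl -j` JSON output) and that the report contains the `nvme_smart_health_information_log`, `model_name`, `serial_number` and `firmware_version` keys; check those against your own output. The thresholds (for example 90% wear) are arbitrary illustrations, not vendor guidance.

```python
import json
import subprocess


def read_nvme_smart(device):
    """Run smartctl with JSON output and return the parsed report as a dict."""
    # check=False: smartctl's exit status is a bitmask, so non-zero is not always fatal.
    result = subprocess.run(
        ["smartctl", "-j", "-a", device],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout)


def identity(report):
    """Return (model, serial, firmware) so you can search for known issues."""
    return (
        report.get("model_name", "unknown"),
        report.get("serial_number", "unknown"),
        report.get("firmware_version", "unknown"),
    )


def nvme_concerns(report):
    """Return a list of human-readable concerns found in a smartctl JSON report."""
    log = report.get("nvme_smart_health_information_log")
    if log is None:
        return ["no NVMe health log in report (the read itself may have failed)"]
    concerns = []
    if log.get("critical_warning", 0) != 0:
        concerns.append(f"critical_warning = {log['critical_warning']:#04x}")
    if log.get("media_errors", 0) > 0:
        concerns.append(f"media_errors = {log['media_errors']}")
    # Assumed cut-off: treat 90% or more of rated endurance used as worth a look.
    if log.get("percentage_used", 0) >= 90:
        concerns.append(f"percentage_used = {log['percentage_used']}%")
    spare = log.get("available_spare")
    threshold = log.get("available_spare_threshold")
    if spare is not None and threshold is not None and spare < threshold:
        concerns.append(f"available_spare {spare}% is below threshold {threshold}%")
    return concerns
```

Usage would be something like `nvme_concerns(read_nvme_smart("/dev/nvme0"))`, run as root; the device path is an example and will differ per system. An empty list only means none of these particular checks fired.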

First take a configuration backup, regardless. If worst comes to worst, you can reinstall and reload the configuration.

If you leave the old drive connected and bring up the new drive alongside it, you maintain maximum redundancy while replacing.

While it should not matter, I prefer the more controlled detach then replace.

Shutting down with both drives active means you can pull either one and the pool should be OK. But since you have to replace the other drive as well, you have to get it right at that point anyway, so you may as well detach and replace, and make sure of the serial numbers and the locations, before and after.
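One way to be sure of the serial numbers and locations before and after is to write them down as a small map and compare. A rough Python sketch follows; the slot labels and serial numbers are made up, and how you collect them (UI, `smartctl`, sticker on the drive) is up to you.

```python
def diff_serials(before, after):
    """Compare {location: serial} maps taken before and after a swap.

    Returns {location: (old_serial, new_serial)} for every location that changed.
    """
    changes = {}
    for location in sorted(set(before) | set(after)):
        old = before.get(location)
        new = after.get(location)
        if old != new:
            changes[location] = (old, new)
    return changes


def swap_is_expected(before, after, location):
    """True only if exactly the intended location changed and nothing else did."""
    return list(diff_serials(before, after)) == [location]


# Hypothetical example: replacing only the drive in slot 1.
before = {"M.2 slot 1": "SN-OLD-A", "M.2 slot 2": "SN-OLD-B"}
after = {"M.2 slot 1": "SN-NEW-A", "M.2 slot 2": "SN-OLD-B"}
print(diff_serials(before, after))
print(swap_is_expected(before, after, "M.2 slot 1"))
```

If `swap_is_expected` comes back false, stop and check which physical drive actually moved before touching the second one.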

You have to make sure you have the appropriate BIOS boot sequence, of course. I would generally have the BIOS set up to boot from either drive in sequence. If you haven't got that, then you are vulnerable anyway. You might have to redo it with the new NVMe drives too.
 

stajo

Explorer
Joined
Jan 3, 2020
Messages
71
Thanks samarium. Strange thing just happened. I rebooted the system, and after logging in the alert messages were still there. Then I went for a jog, and when I came home I logged in again and the alert log was empty. I did not receive any of the usual "Alert has been removed.." messages. No idea what happened. Neither of the boot drives reports any read, write, or checksum errors, so maybe I'll just let them sit for a while.
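For keeping an eye on those read, write and checksum counters while the drives sit, a small Python sketch that picks them out of `zpool status` text might look like this. The sample output is made up, and the pool name `boot-pool` and the device names are assumptions; feed it the real output of `zpool status` for your boot pool instead.

```python
# Made-up sample of `zpool status` output for a healthy mirrored boot pool.
SAMPLE_STATUS = """\
  pool: boot-pool
 state: ONLINE
config:

    NAME           STATE     READ WRITE CKSUM
    boot-pool      ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        nvme0n1p3  ONLINE       0     0     0
        nvme1n1p3  ONLINE       0     0     0

errors: No known data errors
"""


def parse_vdev_rows(status_text):
    """Extract the NAME/STATE/READ/WRITE/CKSUM rows from `zpool status` text."""
    rows = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields:
            break  # a blank line ends the config table
        if len(fields) >= 5:
            name, state, read, write, cksum = fields[:5]
            rows.append({"name": name, "state": state,
                         "read": read, "write": write, "cksum": cksum})
    return rows


def problem_rows(rows):
    """Rows that are not ONLINE or that show any non-zero error counter."""
    # Counters are compared as strings because large values may be abbreviated.
    return [
        r for r in rows
        if r["state"] != "ONLINE"
        or any(r[key] != "0" for key in ("read", "write", "cksum"))
    ]


print(problem_rows(parse_vdev_rows(SAMPLE_STATUS)))
```

An empty list here matches what was reported above: no read, write or checksum errors on either boot drive.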
 