SSD_LIFE_LEFT FAILING_NOW

Status
Not open for further replies.

Hostigal

Dabbler
Joined
Dec 21, 2016
Messages
22
The SMART output for my disk says SSD_Life_Left FAILING_NOW.
See the attached image — is this normal, or is the disk pre-fail?
 

Attachments

  • image2.png (42.6 KB)

Hostigal

Dabbler
Joined
Dec 21, 2016
Messages
22
I opened this as a separate topic because it isn't exactly the same issue, even if the fix turns out to be similar. Thank you.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
What is the SSD used for? Are there lots of writes? It doesn't seem old enough to be worn out unless the usage has been very heavy.

If there are other SSDs, do they show similar numbers?
 

Hostigal

Dabbler
Joined
Dec 21, 2016
Messages
22
It's used for virtualization.

Here is the SMART output from another of the disks, for example:

Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0025) SCT Status supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 095 095 050 Old_age Always - 0/222953239
5 Retired_Block_Count 0x0033 100 100 003 Pre-fail Always - 0
9 Power_On_Hours_and_Msec 0x0032 093 093 000 Old_age Always - 6558h+35m+46.490s
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 18
171 Program_Fail_Count 0x000a 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
174 Unexpect_Power_Loss_Ct 0x0030 000 000 000 Old_age Offline - 17
177 Wear_Range_Delta 0x0000 000 000 000 Old_age Offline - 1
181 Program_Fail_Count 0x000a 100 100 000 Old_age Always - 0
182 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
187 Reported_Uncorrect 0x0012 100 100 000 Old_age Always - 0
189 Airflow_Temperature_Cel 0x0000 025 033 000 Old_age Offline - 25 (Min/Max 21/33)
194 Temperature_Celsius 0x0022 025 033 000 Old_age Always - 25 (Min/Max 21/33)
195 ECC_Uncorr_Error_Count 0x001c 120 120 000 Old_age Offline - 0/222953239
196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always - 0
201 Unc_Soft_Read_Err_Rate 0x001c 120 120 000 Old_age Offline - 0/222953239
204 Soft_ECC_Correct_Rate 0x001c 120 120 000 Old_age Offline - 0/222953239
230 Life_Curve_Status 0x0013 100 100 000 Pre-fail Always - 100
231 SSD_Life_Left 0x0000 100 100 011 Old_age Offline - 4294967296
233 SandForce_Internal 0x0032 000 000 000 Old_age Always - 4753
234 SandForce_Internal 0x0032 000 000 000 Old_age Always - 3570
241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 3570
242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 7261
244 Unknown_Attribute 0x0000 099 099 010 Old_age Offline - 4849701

SMART Error Log not supported

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


So that indicates the disk could fail at any time? Is the normal practice to just wait for it to fail completely??

thanks!!
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
If you want to be very careful, you would replace it now.

If you aren't worried about slow access or timeouts when the SSD fails, then you might wait and replace it when it fails. It is difficult to know exactly what the SSD will do when it fails.

It is possible that you should make some configuration changes to lighten the load on the SSDs. For example, if you are using NFS with VMware, it will stress the SSDs greatly unless you have a separate ZIL SSD. Another alternative, if you can tolerate the risk, is to set sync=disabled on the dataset that stores the VMs. You should also check what recordsize is set on the VM dataset; the default of 128K is probably too big for VM files.
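Those two dataset changes can be made with `zfs set`. A sketch, assuming a hypothetical pool/dataset named `tank/vms` (substitute your own names):

```shell
# Allow asynchronous writes on the VM dataset. This removes ZIL write
# pressure, but risks losing the last few seconds of writes on a crash
# or power loss -- only do this if you can tolerate that.
zfs set sync=disabled tank/vms

# Use a smaller recordsize for VM disk images (16K is just an example;
# match it to the guest filesystem or hypervisor block size).
zfs set recordsize=16K tank/vms

# Verify the current settings.
zfs get sync,recordsize tank/vms
```

Note that recordsize only applies to files written after the change; existing VM images keep their old record size until rewritten.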

Another option is to raise the txg timeout from the default 5 to 10 or 15, which would cut the number of write cycles to the SSDs, assuming you also have a ZIL or set sync=disabled.
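On FreeBSD-based systems the txg timeout is a sysctl. A sketch of how that change might look:

```shell
# Raise the transaction group commit interval from the default
# 5 seconds to 10, roughly halving the number of write bursts.
sysctl vfs.zfs.txg.timeout=10

# To persist across reboots, add it as a sysctl tunable in the
# FreeNAS GUI, or append it to /etc/sysctl.conf.
echo 'vfs.zfs.txg.timeout=10' >> /etc/sysctl.conf
```

The trade-off is that more dirty data accumulates between commits, so a crash can lose a slightly larger window of async writes.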

If your SSDs are worn out this soon, they are probably being overworked.
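Also, the log you posted shows no self-tests have ever been run. It may be worth running one and keeping an eye on the attributes; for example (device name is a placeholder, substitute your SSD's device):

```shell
# Start a short self-test; it runs in the background on the drive
# (the conveyance/short polling time in your output is ~2 minutes).
smartctl -t short /dev/ada0

# After it finishes, review the self-test log and the attributes.
smartctl -l selftest /dev/ada0
smartctl -A /dev/ada0
```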
 