My server powered itself off during a scrub - the first time this has happened

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
i was using a seasonic g 550w for 6 years. it has a 5 year warranty. i think this is the problem.
i just bought a seasonic focus gx 1000 with a 10 year warranty. i don't really need a 1000w but it was cheaper than the 650, 750 and 850.

$169
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
What are the rest of your system specs (drive count specifically)? Scrubs drive a healthy amount of I/O and therefore power consumption, but so does startup and drive spin-up.

Look at thermals/heat as a secondary concern.
 

ethereal

i have 12 hard drives and 2 ssds. 6x3 tb and 6x8 tb.

i think i was pretty marginal with the wattage of my original psu and i think as it ages it outputs less power.
i have borrowed my son's psu until friday, when i get the new psu. it is running fine at the moment.

my cpu is 45 degrees
 

NASbox

Guru
Joined
May 8, 2012
Messages
650
What else is on the system? If you figure 10 W per drive, that's 140 W at most; add 140 W for the CPU and motherboard, and unless you have a high-power graphics card you are under 300 W. I guess it's possible you had a cap or two go. As long as the drives have staggered spin-up, that PSU is well within spec.
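
To make the arithmetic above concrete, here is a minimal sketch of the same back-of-the-envelope budget. The per-drive and CPU/motherboard wattages are the rough estimates from this post, not measurements, and the function name is made up for illustration. It only covers steady-state load and ignores spin-up surge.

```python
# Rough steady-state power budget using the estimates quoted in the thread.
# These are ballpark figures, not measured values.
DRIVE_COUNT = 14             # 12 hard drives + 2 SSDs
WATTS_PER_DRIVE = 10         # per-drive estimate from the post
CPU_MOTHERBOARD_WATTS = 140  # CPU + motherboard estimate from the post
PSU_RATED_WATTS = 550        # rating of the original Seasonic G 550W


def estimate_load_watts(drive_count, watts_per_drive, base_watts):
    """Return the estimated steady-state load in watts (hypothetical helper)."""
    return drive_count * watts_per_drive + base_watts


load = estimate_load_watts(DRIVE_COUNT, WATTS_PER_DRIVE, CPU_MOTHERBOARD_WATTS)
headroom = PSU_RATED_WATTS - load
print(f"estimated load: {load} W of {PSU_RATED_WATTS} W, headroom: {headroom} W")
```

With these numbers the estimate comes to 280 W, roughly half of the 550 W rating, which is why the original supply looks adequate on paper.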

Is it possible you had a short (half a second or so) power failure, caused by the power company doing maintenance? We get them once or twice a month on average, usually in the middle of the night, 3-5 AM. The lights just flicker, and if a device isn't on a UPS, it crashes. Your UPS is a bit on the light side, and if the batteries are old, it wouldn't protect your system.

Was there anything in the logs as to why the system went down?
 

ethereal

no other equipment had this problem, just the server. i have 7 case fans and an hba. i get a weekly email about the health of my ups. it says it is at 29% load and the batteries are at 100%.

where is the log?
 

NASbox

I usually use the shell: cd /var/log, then less messages, or zless messages.n.bz2 for the compressed logs. I don't know if there is a way to do it from the UI, but I can do it faster from the shell.
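
If paging through every file by hand gets tedious, the same check can be scripted. This is only a sketch: it assumes the logs live in /var/log as messages plus bzip2-compressed rotations, and the keyword list is a guess at what might be worth looking for, not a list of messages the system is known to emit. The function names are made up for illustration.

```python
import bz2
import re
from pathlib import Path

# Hypothetical keyword list; adjust it to whatever your system actually logs.
PATTERN = re.compile(r"panic|shutdown|reboot|power|fatal", re.IGNORECASE)


def open_log(path):
    """Open a plain-text or bzip2-compressed log file for reading as text."""
    path = Path(path)
    if path.suffix == ".bz2":
        return bz2.open(path, "rt", errors="replace")
    return open(path, "rt", errors="replace")


def find_suspect_lines(log_dir="/var/log", basename="messages"):
    """Return (file name, line) pairs for every line matching PATTERN."""
    hits = []
    for path in sorted(Path(log_dir).glob(basename + "*")):
        with open_log(path) as handle:
            for line in handle:
                if PATTERN.search(line):
                    hits.append((path.name, line.rstrip("\n")))
    return hits


# Example use (needs read access to the log directory):
#     for name, line in find_suspect_lines():
#         print(f"{name}: {line}")
```

A sudden power loss usually leaves no shutdown message at all, so a log that simply stops and then resumes with boot messages is itself a hint that the machine lost power rather than shut down cleanly.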

It might not be a bad idea to do a quick self-test on the UPS just to be sure. If the batteries are 2 or more years old, they may fold under load. The 100% may not mean anything other than that they are currently at full voltage. My UPS said the same thing, but a switching event by the utility killed everything. If you have another PC on the same UPS, then you are right, the capacitors in the supply may have gone soft.

FWIW, if you are handy with a soldering iron, about $20 or less in parts from DigiKey and the supply will likely be as good as new for another build. Have a look at the BadCaps website (Google is your friend) for info on selecting the right brand of cap.

Anyway, your new 1000W Seasonic should outlive the rest of the components in the server.
 

ethereal

my server ran for nearly 3 days with the new power supply but then crashed again.
tomorrow i'll have to check the logs to see if we can find the problem.
i have never troubleshot a server before so i may need some help.
 

NASbox

Hang on to the old power supply, it may be fine. Seasonic power supplies are really good, and your load wasn't excessive.

If the logs don't give you an obvious answer I suggest the following:

Have you run a memory test? When you get weird stuff happening, if you don't have anything in the log I'd run Memtest86 for a couple of days and see if it shakes anything out.

If it's not the memory, I'd suspect the boot device, especially if it is a USB stick. With the price of small SSDs, if you can spare the SATA ports I would replace the boot drive with an SSD. I gave up on USB sticks a couple of years ago as I found them too unreliable. About 2 years ago, I went to a mirrored boot pool. It's a bit overkill, but for the less than $50 it cost me, it makes it very unlikely that I will ever have a boot pool failure. The SSDs are also a good place to keep the databases.

I would take a backup of your config file and make a new boot drive. I'm pretty sure the procedure is well covered in the manual. If your current drive is ZFS, you should be able to replace it without having to do a fresh install.

Failing either of the above, how old is your motherboard? Maybe you have some bad caps (or other bad parts) on the MB; if so, you'll need a replacement.

Good luck.... If the memory test doesn't turn something up, and your boot pool isn't corrupted you may need some luck!
 

ethereal

The server kept shutting down and I couldn't start it up again. I disconnected everything apart from 1 stick of memory and it would not start. I tried each of the 4 sticks of memory, 1 at a time. No joy, it still shuts down instantly. It is either my motherboard or the CPU that is the problem. I have ordered a replacement X9SCL-F, but it won't be here until Monday. Hopefully this will fix the problem.
 

NASbox


Any chance that the heatsink has come loose or the thermal paste has dried out?

I think you are right, it is likely a bad motherboard.
 

ethereal

the cpu was 45 degrees
 

NASbox

That is well within spec, so I'd say you are right that it's the motherboard.
I'm assuming from your description that the system is powering off?
 

ethereal

yes. when i powered the system on, it shut down 3 seconds later.
i bought a new X9SCL-F and it arrived a few minutes ago, so i'll try that this afternoon.
 

NASbox

It is actually good that you have a hard failure: once the system has been operational for a day or two, you should have a reasonable degree of certainty that you have solved the problem.

You might find it interesting to do a quick visual inspection of the old circuit board to see whether you can spot any overheated components, obviously bad solder connections, or bulging capacitors.

Good luck.
 