Monkey_Demon

Explorer
Joined
Nov 11, 2016
Messages
85
Uncle Fester's FreeNAS 9.10 Configuration Guide recommends using Breakin for processor validation and MemTest86+ for memory validation. But the web site for Breakin says it does a "full processor and memory test" (my emphasis), while MemTest86 has four CPU modes under which to run the memory test.

So I wonder. Is it really necessary to run both Breakin and MemTest86+ as Uncle Fester recommends? Won't Breakin give both CPU and RAM a good workout and report any errors? Won't MemTest do the same if you configure it to use all cores on the CPU?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,079
So I wonder. Is it really necessary to run both Breakin and MemTest86+ as Uncle Fester recommends?
@danb35 , what do you think? Has the function of these programs changed substantially in the time since the guide was written?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Has the function of these programs changed substantially in the time since the guide was written?
Couldn't say, honestly; I'm not really familiar with Breakin at all. It's worth mentioning that @UncleFester is not me, though he sadly hasn't been active here for some time--I just thought his guide would be good as a wiki, so I'm hosting it. Breakin's web site does indeed indicate that it checks CPU and memory (as well as other things)--though it also sounds very old, based on what it states as compatible hardware.
 

Monkey_Demon

Explorer
Joined
Nov 11, 2016
Messages
85
Thanks for the quick responses.

As a newbie, I find UncleFester's guide has just about the right amount of hand-holding. It's too bad that UF seems to be on an extended vacation.

This site describes using MemTest86 and then the Mersenne Prime Test from the Ultimate Boot CD for the CPU. Do either of you have other suggestions (perhaps with more recent software) for testing, validating, and burning in a new system?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I generally run Memtest and don't really bother with CPU testing, which is probably not the best plan. I don't skimp on disk validation, though.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,079
This site describes using MemTest86 and then the Mersenne Prime Test from the Ultimate Boot CD for the CPU. Do either of you have other suggestions (perhaps with more recent software) for testing, validating, and burning in a new system?
I usually trust the CPU to be good. In all the time I have been working in the computer biz, I have only seen 2 or 3 bad CPUs. One of those was a chip burned out by overclocking. I only use Xeon processors and ECC memory, so I do a run of memory testing and call it good.


 

rvassar

Guru
Joined
May 2, 2018
Messages
972
I haven't seen a flaky x86 CPU that boots an OS and then fails since the Pentium III cartridge days. I have seen more modern CPUs fail entirely, but you don't get to waste your time testing those, since they generally don't POST. The only reason I might entertain a CPU only test is to evaluate the CPU cooling of a system. But I'm going to guess most of the people here who build their own started out building or modding gaming rigs and have developed the habit of going a little overboard on CPU cooling from the start, so I'm not sure what the benefit would be.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
The only reason I might entertain a CPU only test is to evaluate the CPU cooling of a system.
Well, that is a major part of it, as you try to heat-saturate your system. One other benefit is that you pull maximum power from your system and power supply, so you are also testing the power delivery as well as overall system stability; maybe the motherboard is on the edge of being stable. I believe running both a CPU stress test and MemTest86 is vital when you first build a system, but once it is built you shouldn't have to test it again unless you are experiencing stability failures. Also, if a part is going to fail, it's nice to figure this out up front, before you have your system fully built and the warranty on the part expires.
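
For illustration only, here is a minimal Python sketch of what a CPU stress test does in principle: keep every core busy with a deterministic calculation and compare each result against a known-good value, so that a miscalculation under heat or marginal power shows up as an error count. The function names and the workload are made up for this example, and it is nowhere near as demanding as Prime95, so treat it as a sketch of the idea rather than a burn-in tool.

```python
import math
import os
import time
from concurrent.futures import ProcessPoolExecutor


def workload(n=200_000):
    """One deterministic chunk of floating point work; returns a checksum."""
    acc = 0.0
    for i in range(1, n):
        acc += math.sqrt(i) * math.sin(i)
    return acc


def burn_worker(seconds, expected, n=200_000):
    """Repeat the workload until the time is up and count wrong results."""
    rounds = errors = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        if workload(n) != expected:
            errors += 1
        rounds += 1
    return rounds, errors


def burn_all_cores(seconds, workers=None, n=200_000):
    """Run one burn_worker per core; returns (total rounds, total errors)."""
    workers = workers or os.cpu_count() or 1
    expected = workload(n)  # reference value, computed once before the load starts
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(burn_worker, [seconds] * workers, [expected] * workers, [n] * workers)
        )
    return sum(r for r, _ in results), sum(e for _, e in results)
```

To try it, call burn_all_cores(3600) from under an `if __name__ == "__main__":` guard and watch temperatures and power draw while it runs. Any non-zero error count would be a reason to dig further with a real stress tool.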

I will agree that I too have had very good luck with parts and I think I've had only a few failures over the past 30+ years, so it's rare. But I will always do my testing as a home user.

Just my point of view.

As for "Breakin", I have not used it either but think I may download a copy to check it out. The website states it does all the tests we seem to care about.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
As for "Breakin", I have not used it either but think I may download a copy to check it out. The website states it does all the tests we seem to care about.
As a quick and dirty test I used this program on my main computer, not a server. It has an i7-950 CPU and 24GB Non-ECC RAM.

So let me state that you basically just start the test and leave it alone. I freaked myself out when I forgot I left a USB flash drive installed (on a hub) with my financial data on it and not knowing how this program worked, I terminated early for fear of data loss.

This program appears to have been made specifically for this company who builds servers and uses this as a testing tool.

What I found out was that the program reported my ECC RAM (of which I have none) as passing the tests fine. It also reported a hard drive failing SMART testing (which turned out to be my USB flash drive), and the CPU temp was not displayed, even though I can display it under Windoze. The program appears to support IPMI for CPU temps and fan speeds, but of course my main system does not have that.

Conclusion: I only have one true server and I'm unwilling at this time to take it offline to perform this testing properly. It appears that "badblocks" is also part of this testing. The website states that it runs non-destructively, but we would prefer the option to select a destructive test since this is burn-in testing. There are no options for selective testing, which is a shame. The web site claims that custom scripts are a feature, but there isn't a user manual online.

So will this work as a one-stop testing solution? I think someone needs to actually test this out and confirm that the CPU heat is really generated (my system did pull 135 watts more power when the test was running), that the RAM tests really do work, and that, if hard drives are installed, the SMART tests and badblocks tests work as expected. I'm curious how a RAM failure is indicated: whether it tells you where it failed or just notifies you that a failure occurred. If it's the latter, then I'd be using MemTest86 to troubleshoot. Also, I feel badblocks needs to be run in destructive mode if you are doing burn-in testing of the hard drives.
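
To make the destructive versus non-destructive distinction concrete, this is roughly how the two badblocks modes and a long SMART self-test would be invoked by hand. The Python helper names and the device path are hypothetical; the flags are the standard badblocks and smartctl ones (-w is the destructive write test, -n the non-destructive read-write test, and the two can't be combined). None of this is taken from Breakin itself.

```python
def badblocks_cmd(device, destructive=False, block_size=4096):
    """Build a badblocks command line for one drive.

    destructive=True uses -w, which overwrites every block on the device.
    destructive=False uses -n, which preserves existing data.
    """
    mode = "-w" if destructive else "-n"
    # -s shows progress, -v is verbose, -b sets the block size in bytes
    return ["badblocks", mode, "-s", "-v", "-b", str(block_size), device]


def smart_long_test_cmd(device):
    """Build the smartctl command that starts a long (extended) self-test."""
    return ["smartctl", "-t", "long", device]


# Hypothetical device name; check yours before running anything destructive.
print(" ".join(badblocks_cmd("/dev/ada0", destructive=True)))
print(" ".join(smart_long_test_cmd("/dev/ada0")))
```

The helpers only build the argument lists, so nothing touches a disk until you pass one of them to something like subprocess.run() yourself.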

Sorry that this was not much of a test but if we can figure out if this is a good one stop solution then I think it's worth recommending in the future. I'm going to shoot this company an email asking for additional information on this product but for now I would recommend people stick with the other dedicated CPU Stress and RAM tests until someone can provide good feedback on this product.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
This site describes using MemTest86 and then the Mersenne Prime Test from the Ultimate Boot CD for the CPU. Do either of you have other suggestions (perhaps with more recent software) for testing, validating, and burning in a new system?

Just saw this bit about the Mersenne Prime Test. I played around with this a great deal in the late 90's as the GIMPS project was destroying the old world order in supercomputing.

A Mersenne prime is a prime of the form (2^N)-1, and the size of the numbers involved grows rapidly. I believe the current largest known contains 23 million digits. It took something like three centuries for mathematicians to evaluate values of N up to 257 from Mersenne's original disclosure, and even today there are only 50 or so known Mersenne primes. A major advancement came from Lucas in the 1870s and Lehmer in the 1930s in the form of a primality test whose large multiplications can be implemented using a Fourier transform, and it has been a favorite of supercomputing types since the Manchester Mark 1. However, this brings me to a point I'd like to make with regard to NAS testing and burn-in.
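
As an aside, the Lucas-Lehmer test mentioned above fits in a few lines of Python. This sketch leans on Python's built-in big integers instead of an FFT, so it only illustrates the arithmetic and is not how Prime95 or any other production implementation does it. The function names are made up for the example, and the exponent passed in is assumed to be prime already.

```python
import math


def is_mersenne_prime(p):
    """Lucas-Lehmer test: for a prime exponent p, decide whether 2**p - 1 is prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs an odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # the squaring is the step real implementations do with an FFT
    return s == 0


def mersenne_digits(p):
    """Number of decimal digits in 2**p - 1, without building the number."""
    return int(p * math.log10(2)) + 1


# Prime exponents up to 31; only some of them give a Mersenne prime.
print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31) if is_mersenne_prime(p)])
```

Checking the prime exponents up to 31 this way picks out 3, 5, 7, 13, 17, 19 and 31. The digit count shows how quickly things grow: the exponent 77232917, found in December 2017, gives a number with a little over 23 million digits.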

The FFT used by various implementations of the Lucas-Lehmer test is primarily either a floating point exercise or implemented as integer math, depending on whose implementation you're using. These only test portions of the CPU and memory. In the case of a floating point implementation, it's likely a section of the CPU that will see very little use in a NAS-oriented build. So my suggestion is to use it to complement the memory and CPU testing, but move on to other things that exercise the relevant portions of the chipset and I/O paths. I'll suggest FIO:

https://github.com/axboe/fio

http://www.storagereview.com/fio_flexible_i_o_tester_synthetic_benchmark
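
A starting point might look something like the following. The helper name, the target directory, and the particular job settings are assumptions chosen for illustration, not a recommended burn-in profile; the options themselves are ordinary fio command-line options.

```python
def fio_randrw_cmd(directory, size="1G", jobs=4, runtime_s=600):
    """Build a fio command line for a mixed random read/write job."""
    return [
        "fio",
        "--name=burnin",
        f"--directory={directory}",  # fio creates its test files here
        "--rw=randrw",               # mixed random reads and writes
        "--rwmixread=50",            # 50% reads, 50% writes
        "--bs=4k",
        f"--size={size}",            # per-job file size
        f"--numjobs={jobs}",         # parallel workers
        f"--runtime={runtime_s}",
        "--time_based",              # keep looping until the runtime is up
        "--ioengine=psync",
        "--group_reporting",         # one combined summary instead of one per job
    ]


# Hypothetical dataset path on the pool under test.
print(" ".join(fio_randrw_cmd("/mnt/tank/fio-test")))
```

Raising numjobs and size until the pool is saturated is the general idea; point it at a scratch dataset, since fio will create and overwrite its own files there.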


Another trick I've used, to the chagrin of a couple of storage developers, is to implement the simple file copy program described in Stevens' "Advanced Programming in the UNIX Environment", chapter 12 (program 12.14), and modify it to use "O_SYNC" with mmap(). Then just copy files over and over from a shell script, launching multiple processes up to the limit of I/O capacity. The file content can be adjusted to exercise compression, dedup, etc. It's deceptively simple, and kind of brutal on the I/O path. :)
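
The original is a C program, but the shape of it can be sketched in Python. This is an approximation under stated assumptions, not the code from the book: the function name is made up, the whole file is copied in one slice rather than in chunks, and it assumes a Unix-like system where os.O_SYNC exists.

```python
import mmap
import os


def mmap_copy(src, dst):
    """Copy src to dst through two memory maps; returns the number of bytes copied."""
    size = os.path.getsize(src)
    fd_in = os.open(src, os.O_RDONLY)
    # O_SYNC asks the OS to make writes through this descriptor synchronous
    fd_out = os.open(dst, os.O_RDWR | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
    try:
        if size == 0:
            return 0  # an empty file cannot be mapped, and there is nothing to copy
        os.ftruncate(fd_out, size)  # the output file must be full size before mapping
        with mmap.mmap(fd_in, size, access=mmap.ACCESS_READ) as m_in, \
                mmap.mmap(fd_out, size, access=mmap.ACCESS_WRITE) as m_out:
            m_out[:] = m_in[:]  # the memcpy step of the original program
            m_out.flush()       # push the dirty pages out to the file
    finally:
        os.close(fd_in)
        os.close(fd_out)
    return size
```

Looping over this from several processes at once, with source files that are compressible or full of duplicate blocks as needed, gives the kind of load described above.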
 

JRD

Dabbler
Joined
Apr 21, 2018
Messages
42
Hi
I stress tested my system when I first put it together, all good.

Since then I have been trying to get a script running to control the fans with a lot of kind help from Kevin Horton and Stux. I have had some success.

I wanted to stress the system so I transferred a lot of files, and the CPU only got up to 42°C. After that I couldn't find anything within FreeNAS that I understood well enough to put it under pressure.

I went back to Breakin again and this time got errors: either cpu or memory. I replaced the processor with a brand new one, same spec. Still failed. SM were unwilling to accept the motherboard had developed a fault. I haven't sent it back yet.

I have run Memtest86 and Memtest86+; both passed. Then I ran the Mersenne Prime test from the Ultimate Boot CD: it passed.

What do you think? Should I ignore the Breakin test? Is there something I have missed or should I just RMA the board?

[Attachments: Test Result 7th December 2018.jpg, Test Results 22 November 2018.jpg, Test Results 6th December 2018.jpg, Mersenne Prime test summary 6th December 2018.jpg, Supermicro Customer Reporting 28th November 2018.PNG]
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
@JRD Do you still have the fan controls tweaked? Maybe a factory reset may help. Also DIMMA1 is significantly hotter than the other 3 sticks of RAM, maybe you have an airflow issue. I've never run that test that failed for you so I can't speak to its validity. You might want to run the CPU stress test for a few hours (some folks run it a lot of hours, more than I'm willing) to saturate the motherboard with heat. Also run Memtest86 for several days.

Also, let's look at one other thing... If Supermicro said they will not RMA the motherboard and you are running FreeNAS which typically does not run the system at high stress levels, do you really want to stress test the system to the point of failure and then not have a system at all? Now if this were a corporate environment I would answer that question with a big fat Yes, but if this is for home use and you would rather not replace your motherboard too soon, well I think you know my point of view. It's just a viewpoint you should at least consider.
 

JRD

Dabbler
Joined
Apr 21, 2018
Messages
42
@JRD Do you still have the fan controls tweaked? Maybe a factory reset may help. Also DIMMA1 is significantly hotter than the other 3 sticks of RAM, maybe you have an airflow issue. I've never run that test that failed for you so I can't speak to its validity. You might want to run the CPU stress test for a few hours (some folks run it a lot of hours, more than I'm willing) to saturate the motherboard with heat. Also run Memtest86 for several days.

Also, let's look at one other thing... If Supermicro said they will not RMA the motherboard and you are running FreeNAS which typically does not run the system at high stress levels, do you really want to stress test the system to the point of failure and then not have a system at all? Now if this were a corporate environment I would answer that question with a big fat Yes, but if this is for home use and you would rather not replace your motherboard too soon, well I think you know my point of view. It's just a viewpoint you should at least consider.

I had the script running as an Init process. Before these recent tests I deleted it and re-booted the system.

I have a Supermicro X11SSM-F-O which has 4 slots for memory. I have two sticks of RAM at A2 and B2. So not actually bang next to the cpu. That Breakin result with a temp reading of 34c at DIMMA1 is an empty slot.

All in a Fractal Node 804 case with 6 case fans: 3 Corsair ML120 on the mb side and 3 Fractal 120 fans on the hd side. I actually thought the fans were over the top. The airflow around the cpu and over the memory 'looks' pretty good to me.

After quite a while the screen goes black when running the Mersenne Prime test. I thought this meant there was a problem with the CPU or power, but I stopped the program and got the screen back: no errors. That's 11 minutes.

The Memtest86+ ran for 22 hours.

This system is for home use. Am I worrying too much about stress tests?
 

sokoloff

Dabbler
Joined
Sep 24, 2018
Messages
10
Is the screen blanking from inactivity? Hitting the shift key might bring the screen back without affecting the stress test.

When we used to run production servers on a literal island that was an airline flight away, we were sticklers for every kind of stress test we could run pre-deployment. For a home system, I'd be just as strict with the hard drives, but a lot more lax on the CPU and memory. A cursory CPU and memory test is good enough for me.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
This system is for home use. Am I worrying too much about stress tests?
I went back to Breakin again and this time got errors: either cpu or memory. I replaced the processor with a brand new one, same spec. Still failed. SM were unwilling to accept the motherboard had developed a fault. I haven't sent it back yet.
So I think that if you have a valid test failure then you should troubleshoot and fix it.
I've never run the "Breakin Hardware Diagnostics" test so I don't know how reliable it is. My advice is that if MemTest86 and any CPU stress test can pass without indicating a failure, then you are in good shape. If you are still concerned then you should find out more about the failing test and see if it is truly compatible with your hardware. If it were me, I'd track down the cause of the failure, and it could be the software at fault. Now I want to run the damn thing but I won't; I'll forget about it once my FedEx package arrives: bolts, washers, nuts, Helicoils, some LC fiber connections, oh joy!
 

JRD

Dabbler
Joined
Apr 21, 2018
Messages
42
So I think that if you have a valid test failure then you should troubleshoot and fix it.
I've never run the "Breakin Hardware Diagnostics" test so I don't know how reliable it is. My advice is that if MemTest86 and any CPU stress test can pass without indicating a failure, then you are in good shape. If you are still concerned then you should find out more about the failing test and see if it is truly compatible with your hardware. If it were me, I'd track down the cause of the failure, and it could be the software at fault. Now I want to run the damn thing but I won't; I'll forget about it once my FedEx package arrives: bolts, washers, nuts, Helicoils, some LC fiber connections, oh joy!

Well, it appears Breakin can find a fault, even if you get clear results for Memtest and Mersenne Prime.
I sent the board back to the seller. They found bent pins in the cpu socket. Presumably that is what the Breakin test found.

SM would have helped with RMA but needed the seller to make the referral. The seller would not cooperate. SM said they do not cover the warranty. That is down to the seller.

The defect must have been there from the outset or when I first installed the processor.

So, it's taken nearly a year to get to this point and I have just had to buy a new motherboard. I have learned a lot along the way but that is hard to take.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Yup, if you are not careful it can be easy to screw up a CPU socket. Glad you found the problem though.
 

JRD

Dabbler
Joined
Apr 21, 2018
Messages
42
@joeschmuck I bought a new motherboard, same spec: X11SSM-F-O.
I installed it in the case; all other hardware was unchanged. I reset the fan thresholds and ran Breakin.
It failed. (Again, please see above)

So I sent the report to Advanced Clustering Technologies, who created and provide Breakin, and discussed it with their rep via email. He was very helpful. Breakin needs at least 32GB of RAM to run. My system has 16GB. The report indicated that the program could not run properly; it was not stating that there was any hardware problem.

I was just about to throw away the first mb, so I reinstalled that and returned the second board to the seller for a refund.

Assuming the rep from ACT has that right, Breakin is not appropriate for a system with less than 32GB of RAM, unless you can work out what else the report might mean. It doesn't seem to verify ECC, for example.
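
Taking the rep's 32GB figure at face value, a quick pre-flight check from any installed OS could have saved some grief. This is only a sketch: the function names are made up, the sysconf names used here are available on Linux but not on every platform, and the round-up is a heuristic because the OS normally reports a little less memory than is physically installed.

```python
import math
import os

GIB = 1024 ** 3


def total_ram_bytes():
    """Physical RAM as seen by the OS (relies on Linux-style sysconf names)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def meets_ram_minimum(total_bytes, required_gib=32):
    """True if the machine appears to have at least required_gib of RAM installed."""
    # Round up to whole GiB so that 32GB installed, reported as ~31.3 GiB, still passes
    return math.ceil(total_bytes / GIB) >= required_gib


if not meets_ram_minimum(total_ram_bytes()):
    print("Less than 32GB of RAM: Breakin results may not mean what they appear to.")
```

On a 16GB system like this one the check fails, which matches what the rep said about the report.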
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,996
Sorry to hear you still have that problem and it may be related to the testing software you are running.

Instead of using "Breakin", why not use what the rest of us use: MemTest86 and Prime95 or similar, run from the Ultimate Boot CD (UBCD)? It's free and you just boot it up and select the tests you desire to run. I won't say that all the tests are state of the art, but you are trying to test for system stability and it's a great tool. Finding a memory tester that can verify ECC RAM and provide you a positive indication it's working is not easy, but last I looked there were products that could do it for a cost, and then only on certain motherboards.

Since you bought another motherboard, does this mean that you are going to build another server? If not, you might be able to either return it for a mostly full refund or sell it to someone after you have tested it completely to ensure it's good. You never want to deal with someone getting a DOA motherboard. I'd buy another motherboard but I'm hurting for money, still paying to get the daughter through higher education.
 

JRD

Dabbler
Joined
Apr 21, 2018
Messages
42
@joeschmuck Yup, it passed Memtest86 and 86+ and Mersenne Prime before.
I am so relieved the first mb is ok but it did leave me expecting issues.

I have returned the second board, just waiting for confirmation from the seller.
 