Exquisitely sensitive stability testing - the linux kernel!

graysky

TL;DR Summary
The linux kernel is a powerful tool to detect instabilities in your overclock settings with both greater accuracy and sensitivity than either Prime95 or IBT/LinX.

More Details
The linux kernel supplies users with a dead simple method for detecting hardware instabilities -- like those caused by an 'unstable' overclock. There is nothing special to install, as this functionality seems to be natively included in the kernel itself. To use it, simply run a standard stress test such as Prime95 or Linpack and watch the output from dmesg. If the system is unstable, for example due to insufficient voltage or excessive heat, the kernel will report:

Code:
[Hardware Error]: Machine check events logged

I have seen the kernel throw these errors during a prime95 run before prime95 itself reported an error in the math. Further, I have seen these errors appear when linpack did not detect that the settings were unstable, as evidenced by the residual number not changing during the run in which the error occurred.

How to Stress Test Under Linux
Probably the most newb-friendly flavor of Linux is Ubuntu. Users can run it live off a CD or a USB stick without installing it to their systems. Further, it is pre-configured to boot into a GUI with network and hardware autodetected. Download an image from the Ubuntu home page. I recommend the 64-bit version, as 32-bit Linux (without PAE) suffers from the same <4 GB memory limitation that 32-bit Windows does.

Note: don't feel like Ubuntu is your only option. There are many other Linux distributions out there from which to choose.

Download the iso, burn it to a disc or write it to a USB stick, and boot. Ubuntu prompts users to either "try ubuntu" or "install ubuntu." Just hit the "try ubuntu" button and you will be dumped into the live linux environment.

Here are a few suggestions for stress testing:
1) mprime ---> linux version of prime95. Help to download and run mprime.
2) linpack ---> back end to both LinX and IBT. Help to download and run linpack.
3) x264 video encoding.
4) Compiling something large like the linux kernel.
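
For stresses like #3 and #4, which are just commands that either finish cleanly or do not, a rough Python sketch of running the same workload several times and recording the failures might look like this. The name `repeat_workload` is made up for illustration, and the command you pass in (a build, an encode) is whatever you choose to run; nothing here is specific to any of the tools listed above.

```python
import subprocess
import sys


def repeat_workload(cmd, runs=5):
    """Run cmd `runs` times and return the non-zero exit codes that occurred.

    `cmd` is an argument list such as ["make", "-j8"] (a hypothetical
    example; substitute the build or encode you actually want to repeat).
    """
    failures = []
    for i in range(1, runs + 1):
        # check=False: a failing run is recorded instead of raising
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            print(f"run {i}: exit code {result.returncode}", file=sys.stderr)
            failures.append(result.returncode)
    return failures
```

An empty list back means every run exited cleanly. That alone does not prove stability, so keep an eye on the kernel log while it runs.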

My own machine has passed tests #1 and #2 yet could not get more than 10 min into an x264 encode, or compile something 4-5 times, without errors. It is important to test using several orthogonal stresses. While stressing, watch the output of the kernel ring buffer. You can do this in one of two ways:

1) Open a terminal and type dmesg to see a snapshot.
2) Perhaps more useful is to be informed when something happens rather than typing dmesg over and over again! You can do this with the following command:
Code:
sudo cat /proc/kmsg

It looks like nothing is happening, but actually, the command more or less opened a connection to the ring buffer; it will update when something happens. To test it, plug in a USB thumb drive.

Example on my box:
Code:
<5>[13393.025582] scsi 10:0:0:0: Direct-Access     Kingston DataTraveler 112 1.00 PQ: 0 ANSI: 2
<5>[13393.026103] sd 10:0:0:0: [sdc] 7831552 512-byte logical blocks: (4.00 GB/3.73 GiB)
<5>[13393.026449] sd 10:0:0:0: [sdc] Write Protect is off
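
Each line in that output has the same shape: a log level in angle brackets (5 is the kernel's "notice" level), the time since boot in seconds in square brackets, then the message. If you want to post-process the stream, a minimal Python sketch for splitting a line into those three parts could look like the following. The name `parse_kmsg_line` is hypothetical, and the pattern assumes timestamps are enabled as they are in the output above.

```python
import re

# "<level>[seconds.fraction] message", the shape of the lines shown above.
# Assumes kernel timestamps are enabled; lines without one will not match.
KMSG_LINE = re.compile(r"^<(\d+)>\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_kmsg_line(line):
    """Split one /proc/kmsg line into (level, seconds_since_boot, message).

    Returns None when the line does not have the expected shape.
    """
    match = KMSG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    level, stamp, message = match.groups()
    return int(level), float(stamp), message
```

Lines that come through garbled simply return None, so they can be skipped or printed raw.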

Anyway, you will want to watch for that message I posted above:
Code:
[Hardware Error]: Machine check events logged
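
If you would rather not stare at the whole stream, a small Python sketch can filter it down to just that message. This is one possible approach, not the only one: `hardware_error_lines` and `follow_kernel_log` are names made up here, and the second function assumes your `dmesg` (from util-linux) supports `--follow` and that you have permission to read the kernel log.

```python
import subprocess

# Substrings taken from the message quoted above, matched case-insensitively
MARKERS = ("hardware error", "machine check")


def hardware_error_lines(lines):
    """Yield only the lines that mention a hardware error or machine check."""
    for line in lines:
        lowered = line.lower()
        if any(marker in lowered for marker in MARKERS):
            yield line.rstrip("\n")


def follow_kernel_log():
    """Yield kernel log lines as they arrive, via `dmesg --follow`.

    Assumption: a util-linux dmesg with --follow; may need root.
    """
    proc = subprocess.Popen(
        ["dmesg", "--follow"], stdout=subprocess.PIPE, text=True
    )
    try:
        yield from proc.stdout
    finally:
        proc.terminate()
```

To use it live, loop over `hardware_error_lines(follow_kernel_log())` and print each hit while the stress test runs in another terminal. It also accepts any saved list of log lines, which is handy for checking a dmesg dump after the fact.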
 
Interesting, thanks for posting this. Going to give this a shot next time I am doing some stability testing, especially because I haven't even tried running Linux in years... might even install Ubuntu for shits and giggles.
 
I wonder if doing this in a virtual machine would work. But then again, booting off a USB stick isn't too bad. Thanks for the info.
 
Very cool. Thank you for bringing this up. Will try the next time I tweak my overclock.
 
Always thought about trying Linux as a stability test but never got around to it. Seems like a solid venture and might bring me back to using ArchLinux again.
 
Bumped to mention that I updated the main post to recommend two additional 'stresses': x264 encoding and compilation of something large like the kernel itself. On my current i7-3770K, I recently discovered that my long-time stable settings have become unstable, requiring an extra bump to the vcore. I could do #1 and #2 all day, but as soon as I started some x264 encodes, I noticed the errors in my logs.
 