Unstable Supermicro H8QGI+-F, Quad G34 mobo

Nilsch

n00b
Joined
Dec 5, 2012
Messages
50
I have almost given up on this one, but thought I would see if anybody can help me here first.

One of my Supermicro H8QGI+-F folding motherboards does not recognize more than 24 GB of my 32 GB of RAM in the BIOS. If I pull out 8 DIMMs, it reports 16 GB correctly, which is how I have been folding with it lately.

However, it is unstable even in that configuration. Sometimes it freezes several times a day, other times it can run for as much as a week before it freezes. The screen just goes blank and the machine is unresponsive, but the lights stay on. I need to cut the power to get it back up.

I have changed every other part of the equation: New RAM, CPUs, disk drive and PSU to no avail. Also tried with 1/2/3 CPUs, still unstable.

Is there anything you gurus can suggest, or should I ask the store for a new board? I hope I don't need to, as I will miss up to a month of folding with it if I do (if they accept the return at all).

Thanks,

Nils
 
What CPUs are you using? What nodes/DCTs is the RAM not being detected in? Have you cleaned the contact pads on the CPUs well with alcohol?

Start things off slowly. Rotate each CPU through socket 1. Does each one detect memory in all four slots? If a CPU fails to detect a DIMM, swap it out for another one. If it still fails (after cleaning the contact pads), it is possibly a bad CPU.

If all CPUs detect memory correctly in socket 1, add a second CPU in socket 2. Then move to three sockets, and then four. In my experience it sometimes takes a lot of shuffling around to find a good usable combo.

Let the CPU swap party begin.
 
I have cleaned the contact pads of the CPUs (6128) with alcohol, and also swapped in different CPUs (6172).

There are three RAM slots that don't detect the DIMMs: the two closest to CPU4 and the one closest to CPU2. I have checked the slots with a magnifying glass and can't find anything wrong.

The fan sensors on that side of the board also give false readings of critically low fan speed, so I had to plug those CPU fans straight to molex.
 
I think I did, but can't remember for certain anymore. Should have made a log... I will try, and report back. Thanks!
 
Have you tried a single CPU in socket 1?

Now I have tried a single CPU, then 2, 3 and 4 CPUs. Here's what I get. It didn't matter if I rotated the CPUs to different sockets, the results were the same:

1 CPU: 6144MB
2 CPUs: 12288MB
3 CPUs: 20480MB
4 CPUs: 26624MB

I have inspected both the CPU sockets and the RAM slots, and they look fine.
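As a rough sanity check on the totals above, the number of undetected DIMMs can be worked out per configuration. This is only a sketch: the 2048 MB DIMM size and the 4 DIMMs per CPU are assumptions inferred from the thread (16 DIMMs for 32 GB, four slots populated per CPU), not values read from the board.

```python
# Assumptions (inferred, not confirmed by the poster):
# each DIMM is 2048 MB and each CPU has 4 DIMMs installed.
DIMM_MB = 2048
DIMMS_PER_CPU = 4

# Totals reported by the BIOS, keyed by the number of CPUs installed.
reported_mb = {1: 6144, 2: 12288, 3: 20480, 4: 26624}


def missing_dimms(cpus, reported):
    """Return how many DIMMs are unaccounted for in a given configuration."""
    expected = cpus * DIMMS_PER_CPU * DIMM_MB
    return (expected - reported) // DIMM_MB


missing = {cpus: missing_dimms(cpus, mb) for cpus, mb in reported_mb.items()}
for cpus, count in missing.items():
    print(f"{cpus} CPU(s): {count} DIMM(s) not detected")
```

Under those assumptions the count goes up by one when sockets 1, 2 and 4 are populated and stays flat for socket 3, which would point at one dead slot on each of three sockets rather than at a particular CPU or DIMM.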
 
Run all 4 CPUs through socket 1 to make sure they all can detect the 4 DIMM sockets. If a DIMM is not detected, note down the node and DCT that fails. If all the CPUs fail the same node/DCT, maybe a bad board. Have you also blown some air through the DIMM sockets to make sure they are clean of crap?
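The decision rule in that procedure can be written down as a small helper, which also doubles as a test log. This is a minimal sketch: the CPU labels and the node/DCT strings are hypothetical placeholders, and the results still have to be read off the BIOS by hand.

```python
def diagnose(results):
    """Interpret socket 1 tests.

    `results` maps a CPU label to the set of node/DCT labels that failed
    to detect a DIMM while that CPU sat in socket 1.
    """
    failing_sets = list(results.values())
    # Node/DCTs that fail no matter which CPU is installed.
    common = set.intersection(*failing_sets) if failing_sets else set()
    if common:
        return f"every CPU misses {sorted(common)}: suspect the board"
    bad_cpus = sorted(cpu for cpu, failed in results.items() if failed)
    if bad_cpus:
        return f"only {bad_cpus} fail: suspect those CPUs"
    return "socket 1 detects all DIMMs with every CPU"


# Hypothetical log: all four CPUs lose the same node/DCT in socket 1.
log = {
    "cpu_a": {"node0/DCT1"},
    "cpu_b": {"node0/DCT1"},
    "cpu_c": {"node0/DCT1"},
    "cpu_d": {"node0/DCT1"},
}
print(diagnose(log))
```

A failure that follows the socket rather than the CPU is the case where a bad board is the likelier explanation; a failure that follows one CPU points back at that CPU or its contact pads.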
 
I tested all four CPUs in the first socket in my last test. Same result. I have now also tried with different types of RAM. Same result. The BIOS version is already 3.0, so I can't try that either.

Edit: I also blew air through the sockets.

:(
 
Was this system running the [H]ardForum database this afternoon also?
 
It would make it easier to help if you provide details of your observations.

"Same result" is hardly helpful.

What is "same result"? Can we see the output of TPC -dram? (Side note: we may need
to update tpc to show DIMM details on F15h, but more on that soon.) Or the POST screen?

Alternatively, you should be able to tell exactly which Node/DCT has a missing DIMM from
the BIOS (somewhere in the Northbridge section).

As 402 explained, if it's the same Node/DCT in socket 1 that's missing (no matter the CPU),
you should assume (at least for the time being) that the socket and/or DIMM slot has a problem
and rotate CPUs through socket 2 (keeping in mind to have _a_ CPU populated in socket 1)
while paying attention to CPU2's nodes/DCTs.
 
Sorry for the somewhat unorganized and undetailed description. Here's what I've done since the forum went down (English is not my first language, so I apologize if it is not written well):

I have found the following three slots to not recognize any memory: P1-DIMM4A, P2-DIMM4A and P4-DIMM4A.

I have tried two different sets of CPUs (4x6128 and 4x6172) and three sets of RAM chips known to work in other rigs with the same type of motherboard.

I have rotated the RAM and CPUs, and the three slots still don't recognize the RAM (in BIOS).

I have inspected the slots with a magnifying glass without seeing anything wrong.

I have tried with just CPU socket 1 filled. RAM in P1-DIMM4A is still unrecognized, even after trying different RAM chips and CPUs.

Additionally, something's wrong with the fan sensors for CPU 2 and 4. The IPMI log shows the speed as "critically low", although clearly it isn't when you look at the fans. I plugged them into molex to avoid these errors. I have tried with different fans. The same goes for all the chassis fan sensors on that side of the board: if I put any chassis fan in there, I get the "critically low" error.

The machine has also been unstable, as in it suddenly becomes unresponsive although the heartbeat is on and the fans are spinning. The power button doesn't work when this happens, so I have to cut the power. This can happen many times a day or after as much as a week.
 
Never mind, I just sent an email to the store with the same information as in my last post. They responded a few minutes later and will send me a new motherboard after they get this one returned to them. :)

Thanks for helping me with a proper testing procedure!
 