Supermicro H8QGi/6 and H8QGL Next Generation OC BIOS

Status
Not open for further replies.
Thanks... Ugh, we're running out of options.

Do you have any add-in cards?
Is your IPMI enabled?

Something makes your system special... trying to find what it is.
 
No add-in cards. I believe the IPMI is enabled and from what I can tell, causes a double boot sequence sometimes from power off/on cycle.
I see it going through a sequence of "dots" on the boot screen is what I am trying to describe. :)
 
Can you disable IPMI/BMC and see how that affects the issue? (make sure to make a "clean run"
-- pull the plug before turning the IPMI off)
 
Is that a BIOS setting?
Or a command from Linux?
I'll check through my manual while I wait to see if you know already :eek:
 
No effect.
I even reset the CMOS and loaded Optimal Defaults again to be sure the IPMI was not hard set somehow.
One other issue, which I brought up before but which did not seem like a big deal at the time, is that on almost every boot I cannot enter the CMOS setting screen.
Pressing F1 will not get me there. It shows the large red [H] screen, goes blank then after a few seconds the GRUB menu.
Just thought I would mention it again.
Thanks for all the effort!
 
Er, forgive me if I'm being captain obvious.
Try 'del' instead of F1.
 
Not obvious enough to be annoying! :D
I can try that this evening.
The reason I used/tried F1 is that's the splash message I see on the rare occasion when something other than just the red [H] comes on screen during boot up.
The "obvious" is often obscured by the "hope for complicated" in my line of work. :cool:
 
Ok. :eek:
Del works to get me into the CMOS menu during boot.
But I'm still stuck with losing the OC setting.
Would not appear to be a lot of other options for me at this point.
Is there any way to capture the output to the terminal during the [H] "tuning" portion of the boot sequence?

 
For how much trouble you have had Core32 with that board, have you thought about reflashing the BIOS back to stock and doing a RMA on it? Between you and me, I think we are the only ones that have had as many problems :D
 
I have two, both with the identical "issue". They were both bought as new barebones servers so I'm not ready to declare them as failures, just different than others.
This is not something that keeps me from folding. They are up and OC'd 99.99% of the time.
Just that I have to manually go through the process of setting the OC and reboot again if I need to shut down or lose power.
It's mostly just an annoyance.

 
Core32, what boards are these, exactly?

There's no evidence of [H] tuning in the released version (other than
a few hello-like statements on the serial console).

Dev versions include more output on the serial console but still,
they do not write to NVRAM... something else does...

Hmm... perhaps upgrading IPMI's firmware could change things
here? (I know the issue manifested w/IPMI disabled as well, but...)
 
Got it. One of my associates (:D) happens to have the same board, which also came from exactly
the same barebones -- I'll try to gather more info.
 
Cool!
Any information is greatly appreciated.
Hopefully this will help someone else in the future as well.
 
Not having much luck.
I have never seen that issue on either of my Gis, FWIW

Let me ask you one more question... have you only seen it w/261 16
or are target OC settings not relevant? (would "plain" 220 make it manifest
as well?)

EDIT: that is with NG2.A11 ROM, right?
 
No. Since day one it has been this way, no matter what the OC.
My other rig, same problem, will not run stable over a 243 OC and I have the same issue.
Yes, as well, early on I did not use the extra qualifier.

Sorry, but how can I verify the ROM version?
 
Code:
sudo dd if=/dev/mem bs=1M skip=$((0xffe)) count=2 | strings -a | grep 'H.*NG'
 
This is the response:
Code:
-H8QG6:~/fah$ sudo dd if=/dev/mem bs=1M skip=$((0xffe)) count=2 | strings -a | grep H.*NG
[H] G60NG2.A11
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.74077 s, 2.8 MB/s
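
The check above reads the BIOS image straight out of /dev/mem, which needs root. As a rough sketch, the same search can be run against a dump that has already been saved to a file. The function name `rom_version` is made up for illustration and is not part of any OCNG tool. It uses `tr` in place of `strings` so it only depends on basic utilities, and the pattern is quoted so the shell cannot expand it as a glob.

```shell
#!/bin/sh
# rom_version FILE: print every run of printable characters in FILE
# that matches 'H.*NG', the same pattern used with strings/grep above.
# Hypothetical helper for illustration only.
rom_version() {
    # Turn every non-printable byte into a newline, then search line by line.
    LC_ALL=C tr -c '[:print:]' '\n' < "$1" | grep 'H.*NG'
}
```

With a saved dump this would be called as `rom_version core32-ng2.rom` (the file name is only an example).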
 
(I made a mistake earlier, it *is* supposed to be NG2.A11)

Ok, so this looks good too.

I'm burned out...:(
 
Sorry for the stress. And thanks again for the effort put forth.
Could I have done something odd when flashing the BIOS?
Doesn't seem likely it would run at all if I had made a mistake there.

Just for grins, what do you see during the boot process, just the large [H] screen?
After I set optimal defaults (say after a CMOS clear), I never see the boot screen or memory "listing" display again no matter how I shut down or power off.

EDIT: How about the Watch Dog jumper function? Could that be sensing some delay on boot?
 
I have the same SM barebones system. I'm flashing the bios as I type this to see what my experience is.
 
I did not see any issues. Set the OC, power off, power on, OC is still there (checked via refclock.sh)
 
Dang!
Do you have the cover on or Intrusion Switch disabled?
Are you using the integrated video?
What sequence do you see on boot?
Thanks for testing.

EDIT: Did you cycle again? The sequence you mention works for me.
It's when I power off again without entering the OC command that the clock returns to stock.
 
Core32, to answer your earlier question. [H] splash is expected when optimal defaults are loaded.

You can turn it off at run time by pressing the Tab key, or at boot time by disabling Quiet Boot.
 
I may have found at least a work-around.
I've added the smocng.sh command, which sets the OC value, to my rc.local file.
Just like my vcore setting, this will run every time I start up.
After two trials it appears to "force" the setting to at least be active on power down.
Then when I power back on it's as if I just cycled to set the OC.
Now, to run a days folding and see if there is any negative effect and try a cycle to see if the force works.
I appreciate all the help.
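
For anyone wanting to try the same workaround, here is a minimal sketch of what such an rc.local fragment might look like. The install path and the way arguments are passed to smocng.sh are assumptions made for illustration; neither is given in this thread, so substitute whatever you normally type by hand.

```shell
#!/bin/sh
# Fragment for /etc/rc.local. The path and arguments below are
# placeholders (assumptions), not values taken from the OCNG package.
OC_SCRIPT="${OC_SCRIPT:-/usr/local/bin/smocng.sh}"  # hypothetical install path
OC_ARGS="${OC_ARGS:-}"                              # fill in your own OC values

# apply_oc: run the OC script if it is present and executable;
# otherwise print a warning to stderr and return non-zero.
apply_oc() {
    if [ -x "$OC_SCRIPT" ]; then
        # OC_ARGS is left unquoted on purpose so it splits into words.
        "$OC_SCRIPT" $OC_ARGS
    else
        echo "rc.local: OC script not found: $OC_SCRIPT" >&2
        return 1
    fi
}

# In rc.local, call it so a failure does not abort the rest of boot:
#   apply_oc || true
```

Keeping the values in one variable means only a single line has to change when the OC target changes.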
 
I do have the cover on. This system is currently a 3p (1x6176, 2x6174) using Dynatron A6 HSF's. I only have 3 of the 80mm system fans plugged in.

I flashed the bios. Powered off. Powered on. Set to optimal defaults. Changed the fan speed to minimal (ES setting in bios, I believe). Booted. Applied OC. Power off. Power on. Check OC.

I just did some further testing. Apply OC, Power off. Power On. Check OC. Power Off. Power On. Check OC (all good). Power off. Completely disconnected from power (pulled plugs from wall). Power on. Check OC (Still good). Reboot. Check OC. Reboot. Check OC.

I have yet to see any issues with the OC going away.
 
I think you've run the gamut. Thanks for spending the time.
The only difference is I am running with the top down.
Runs cooler (and louder) with the top off but leaving the plastic air guide in place.
I will try the sequence with the top on tomorrow to see if that has any effect.
 
Bought this barebones server.
Which has this motherboard

Umm I have 2 of those Core32 and 1 of them has a short somewhere. I have not really looked for it since it has not affected anything yet and I am aware of it so I use caution when I am around it. I discovered it accidentally when I was doing the initial OC I had a screwdriver in my hand and rested my hand on the edge of the case. The screwdriver came in contact with the edge of the case and the tip of the screwdriver was about a 32nd of an inch from the outside edge of the motherboard. Needless to say I heard a little snap saw a faint spark and said Ohhh shit as the computer shut down.

I then wiped the sweat from my brow, hit the start button and sat there chewing my fingernails as I waited for it to boot back up. Luckily it booted and came back to life, but it had lost its OC setting. Not saying yours has a short, but it is possible. I really do not recommend or advise the screwdriver test :rolleyes: I am sure you have a lot better equipment to test for that sort of thing than I do. Anyway, it might be a place to look.
 
Multimeter, ground to ground, any significant voltage and you may have a ground issue.
 
Still believe I have an unusual jumper setting or something. Or a boo-boo when I set them up.
These apparently came factory fresh. They looked unopened and still had the plastic shrink wrap on everything. Heatsinks had unblemished thermal compound.
The work around I mentioned is effective. I just need to remember to change the rc.local file when I want to change the OC value.
 
I would be tempted to pull the motherboards out of the cases to make sure they are not getting grounded.
 
I'm always a curious skeptic until I understand a problem, so I "almost" never say never.
But that being said:
My 30+ years experience designing electronics and pcbs, and then working with mechanical engineers on the housings for them, tells me it's very unlikely a frame ground fault
would manifest itself into this type of issue on two identical rigs with non-similar serial #s.
These are server chassis and they usually have to meet several industry standards to get certified and go through a rigorous set of design verification tests (DVT).
But I will agree there is something different about these two from others using them.
The most obvious difference is "ME" :eek:
And that's why I continue to believe it's some type of operator error!
 
Don't get me wrong, Core32.

Evidently (given your experience), NVRAM area that got allocated to OCNG
isn't as "safe" as I thought (even though NVRAM map shows no users).

We could think about allocating some other area but think we should first
get an understanding of what's going on....

Hmmm... I've got an idea.
Can you please do this:
Code:
sudo dd if=/dev/mem bs=1M of=core32-ng2.rom skip=$((0xffe)) count=2
and send me (mynick@braxis.org) the result.
I wonder whether extraction of the map from your "stored ROM" will give different results...
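
Before mailing a dump like that, it may be worth a quick sanity check. The sketch below assumes the dump should be exactly 2097152 bytes, the size dd reported for the same read earlier in the thread. The helper names `check_dump` and `same_rom` are invented for illustration, as is the idea of keeping a second dump around to compare against.

```shell
#!/bin/sh
# check_dump FILE: succeed only if FILE is exactly 2 MiB (2097152 bytes),
# matching the size dd printed for the ROM read earlier in the thread.
check_dump() {
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -eq 2097152 ]
}

# same_rom A B: succeed only if the two dump files are byte-for-byte identical.
same_rom() {
    cmp -s "$1" "$2"
}
```

For example, `check_dump core32-ng2.rom && echo "size ok"` confirms the dump is complete, and `same_rom first.rom second.rom` (hypothetical file names) tells you whether two dumps taken at different times differ at all.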
 