Supermicro H8QGi/6 and H8QGL Next Generation OC BIOS

tear · Apr 24, 2012

Thanks... Ugh, we're running out of options.

Do you have any add-in cards?
Is your IPMI enabled?

Something makes your system special... trying to find what it is.

Core32 · Apr 24, 2012

No add-in cards. I believe the IPMI is enabled and, from what I can tell, it sometimes causes a double boot sequence after a power off/on cycle.
What I am trying to describe is a sequence of "dots" on the boot screen.

tear · Apr 24, 2012

Can you disable IPMI/BMC and see how that affects the issue? (make sure to make a "clean run"
-- pull the plug before turning the IPMI off)

Core32 · Apr 24, 2012

Is that a BIOS setting?
Or a command from Linux?
I'll check through my manual while I wait to see if you know already

tear · Apr 24, 2012

It's a jumper on the motherboard.

Core32 · Apr 24, 2012

JPB1. I'll check it, disable and test.
Thanks.

Core32 · Apr 24, 2012

No effect.
I even reset the CMOS and loaded Optimal Defaults again to be sure the IPMI was not hard set somehow.
One other issue that I brought up before, but which did not seem like a big deal at the time, is that on most boots I cannot enter the CMOS setting screen.
Pressing F1 will not get me there. It shows the large red [H] screen, goes blank then after a few seconds the GRUB menu.
Just thought I would mention it again.
Thanks for all the effort!

dwdawg · Apr 25, 2012

Er, forgive me if I'm being captain obvious.
Try 'del' instead of F1.

Core32 · Apr 25, 2012

Not obvious enough to be annoying!

I can try that this evening.
The reason I used/tried F1 is that's the splash message I see on the rare occasion when something other than just the red [H] comes on screen during boot up.
The "obvious" is often obscured by the "hope for complicated" in my line of work.

Core32 · Apr 25, 2012

dwdawg said:
Er, forgive me if I'm being captain obvious.
Try 'del' instead of F1.

Ok.

Del works to get me into the CMOS menu during boot.
But I'm still stuck with losing the OC setting.
Would not appear to be a lot of other options for me at this point.
Is there any way to capture the output to the terminal during the [H] "tuning" portion of the boot sequence?

402blownstroker · Apr 26, 2012

Core32, for how much trouble you have had with that board, have you thought about reflashing the BIOS back to stock and doing an RMA on it? Between you and me, I think we are the only ones who have had this many problems.

Core32 · Apr 26, 2012

I have two, with identical "issues". They were both bought as new barebones servers, so I'm not ready to declare them failures, just different from the others.
This is not something that keeps me from folding. They are up and OC'd 99.99% of the time.
Just that I have to manually go through the process of setting the OC and reboot again if I need to shut down or lose power.
It's mostly just an annoyance.

Kendrak · Apr 26, 2012

Easy fix is to never turn them off

Core32 · Apr 26, 2012

If I could just get Walton EMC to abide by that rule.......

tear · Apr 26, 2012

Core32, what boards are these, exactly?

There's no evidence of [H] tuning in released version (other than
few hello-like statements on serial console).

Dev versions include more output on the serial console but still,
they do not write to NVRAM... something else does...

Hmm... perhaps upgrading IPMI's firmware could change things
here? (I know the issue manifested w/IPMI disabled as well, but...)

Core32 · Apr 26, 2012

Bought this barebones server.
Which has this motherboard

tear · Apr 26, 2012

Got it. One of my associates happens to have the same board, which also came from exactly
the same barebones -- I'll try to gather more info.

Core32 · Apr 26, 2012

Cool!
Any information is greatly appreciated.
Hopefully this will help someone else in the future as well.

tear · Apr 26, 2012

Not having much luck.

I have never seen that issue on either of my Gis, FWIW

Let me ask you one more question... have you only seen it w/261 16
or are target OC settings not relevant? (would "plain" 220 make it manifest
as well?)

EDIT: that is with NG2.A11 ROM, right?

Core32 · Apr 26, 2012

No. Since day one it has been this way, no matter what the OC.
My other rig, same problem, will not run stable over a 243 OC and I have the same issue.
Yes, as well, early on I did not use the extra qualifier.

Sorry, but how can I verify the ROM version?

tear · Apr 26, 2012

Code:

sudo dd if=/dev/mem bs=1M skip=$((0xffe)) count=2 | strings -a | grep 'H.*NG'

Core32 · Apr 26, 2012

This is the response:

Code:

-H8QG6:~/fah$ sudo dd if=/dev/mem bs=1M skip=$((0xffe)) count=2 | strings -a | grep H.*NG
[H] G60NG2.A11
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.74077 s, 2.8 MB/s
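
The same check can also be run against a saved dump instead of live /dev/mem. This is only a minimal sketch, assuming GNU grep and assuming the banner always has the shape of the `[H] G60NG2.A11` line above. The function name `rom_version` is made up for illustration.

```shell
#!/bin/sh
# rom_version FILE: print the first "[H] ...NG..." banner found in a ROM dump.
# Assumption: the banner looks like the one seen above, e.g. "[H] G60NG2.A11".
rom_version() {
    # -a treats the binary dump as text, -o prints only the matching part
    LC_ALL=C grep -a -o '\[H\] [A-Za-z0-9]*NG[0-9A-Za-z.]*' "$1" | head -n 1
}
```

Usage would be something like `rom_version core32-ng2.rom` on a file produced by the dd command with `of=` added.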

tear · Apr 26, 2012

(I made a mistake earlier, it *is* supposed to be NG2.A11)

Ok, so this looks good too.

I'm burned out...

Core32 · Apr 26, 2012

Sorry for the stress. And thanks again for the effort put forth.
Could I have done something odd when flashing the BIOS?
Doesn't seem likely it would run at all if I had made a mistake there.

Just for grins, what do you see during the boot process, just the large [H] screen?
After I set optimal defaults (say after a CMOS clear), I never see the boot screen or memory "listing" display again no matter how I shut down or power off.

EDIT: How about the Watch Dog jumper function? Could that be sensing some delay on boot?

firedfly · Apr 26, 2012

Core32 said:
Sorry for the stress. And thanks again for the effort put forth.
Could I have done something odd when flashing the BIOS?
Doesn't seem likely it would run at all if I had made a mistake there.

Just for grins, what do you see during the boot process, just the large [H] screen?
After I set optimal defaults (say after a CMOS clear), I never see the boot screen or memory "listing" display again no matter how I shut down or power off.

I have the same SM barebones system. I'm flashing the bios as I type this to see what my experience is.

jebo_4jc · Apr 26, 2012

tear said:
I'm burned out...

I wouldn't be surprised! You've been an absolute machine. We're thankful for your hard work.

firedfly · Apr 26, 2012

firedfly said:
I have the same SM barebones system. I'm flashing the bios as I type this to see what my experience is.

I did not see any issues. Set the OC, power off, power on, OC is still there (checked via refclock.sh)

Core32 · Apr 26, 2012

firedfly said:
I did not see any issues. Set the OC, power off, power on, OC is still there (checked via refclock.sh)

Dang!
Do you have the cover on or Intrusion Switch disabled?
Are you using the integrated video?
What sequence do you see on boot?
Thanks for testing.

EDIT: Did you cycle again? The sequence you mention works for me.
It's when I power off again without entering the OC command that the clock returns to stock.

tear · Apr 26, 2012

Core32, to answer your earlier question. [H] splash is expected when optimal defaults are loaded.

You can turn it off run-time by pressing Tab key or boot-time by disabling Quiet Boot.

Core32 · Apr 26, 2012

I may have found at least a work-around.
I've added the smocng.sh command to set the OC value, to my rc.local file.
Just like my vcore setting, this will run every time I start up.
After two trials it appears to "force" the setting to at least be active on power down.
Then when I power back on it's as if I just cycled to set the OC.
Now, to run a days folding and see if there is any negative effect and try a cycle to see if the force works.
I appreciate all the help.
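
For anyone wanting to try the same workaround, it might look roughly like this. It is a sketch only: the install path, the `OC_TOOL` and `OC_ARGS` variable names, and the idea that the script takes its target values as plain arguments are all assumptions, not something confirmed in this thread.

```shell
#!/bin/sh
# Re-apply the OC on every boot (sketch for /etc/rc.local).
# OC_TOOL and OC_ARGS are hypothetical names; the path and the argument
# form are assumptions. Use whatever you normally type by hand.
OC_TOOL="${OC_TOOL:-/usr/local/bin/smocng.sh}"
OC_ARGS="${OC_ARGS:-261 16}"

apply_oc() {
    if [ -x "$OC_TOOL" ]; then
        # OC_ARGS is unquoted on purpose so it splits into separate arguments
        "$OC_TOOL" $OC_ARGS
    else
        echo "apply_oc: $OC_TOOL not found or not executable" >&2
        return 1
    fi
}
```

In rc.local, a single `apply_oc` call before the final `exit 0` would do it. Keeping the values in one variable means there is only one line to change when the OC value changes.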

firedfly · Apr 26, 2012

Core32 said:
Dang!
Do you have the cover on or Intrusion Switch disabled?
Are you using the integrated video?
What sequence do you see on boot?
Thanks for testing.

EDIT: Did you cycle again? The sequence you mention works for me.
It's when I power off again without entering the OC command that the clock returns to stock.

I do have the cover on. This system is currently a 3p (1x6176, 2x6174) using Dynatron A6 HSF's. I only have 3 of the 80mm system fans plugged in.

I flashed the bios. Powered off. Powered on. Set to optimal defaults. Changed the fan speed to minimal (ES setting in bios, I believe). Booted. Applied OC. Power off. Power on. Check OC.

I just did some further testing. Apply OC, Power off. Power On. Check OC. Power Off. Power On. Check OC (all good). Power off. Completely disconnected from power (pulled plugs from wall). Power on. Check OC (Still good). Reboot. Check OC. Reboot. Check OC.

I have yet to see any issues with the OC going away.
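
A test sequence like the one above is easier to compare between machines if every check is written to a log. A minimal sketch, assuming nothing about what the checking script prints: it simply records whatever the command outputs. The function name `log_check` and the log path in the usage line are hypothetical.

```shell
#!/bin/sh
# log_check LOGFILE CMD [ARGS...]: append one timestamped line with CMD's output.
log_check() {
    logfile=$1
    shift
    # Capture stdout and stderr, folding multi-line output onto one line
    out=$("$@" 2>&1 | tr '\n' ' ')
    printf '%s  %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$out" >> "$logfile"
}
```

For example, `log_check /var/log/oc-check.log ./refclock.sh` after each power cycle would leave one line per boot, which makes it obvious on which cycle the clock went back to stock.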

Core32 · Apr 26, 2012

I think you've run the gamut. Thanks for spending the time.
The only difference is I am running with the top down.
Runs cooler (and louder) with the top off but leaving the plastic air guide in place.
I will try the sequence with the top on tomorrow to see if that has any effect.

Grandpa_01 · Apr 27, 2012

Core32 said:
Bought this barebones server.
Which has this motherboard

Umm, I have 2 of those, Core32, and 1 of them has a short somewhere. I have not really looked for it since it has not affected anything yet, and I am aware of it, so I use caution when I am around it. I discovered it accidentally when I was doing the initial OC: I had a screwdriver in my hand and rested my hand on the edge of the case. The screwdriver came in contact with the edge of the case, and the tip of the screwdriver was about a 32nd of an inch from the outside edge of the motherboard. Needless to say, I heard a little snap, saw a faint spark, and said Ohhh shit as the computer shut down.

I then wiped the sweat from my brow, hit the start button, and sat there chewing my fingernails as I waited for it to boot back up. Luckily it booted and came back to life, but it had lost its OC setting. I'm not saying yours has a short, but it is possible. I really do not recommend or advise the screwdriver test.

I am sure you have a lot better equipment to test for that sort of thing than I do. Anyway, it might be a place to look.

Untitledone · Apr 27, 2012

Multimeter, ground to ground, any significant voltage and you may have a ground issue.

Core32 · Apr 27, 2012

Still believe I have an unusual jumper setting or something. Or a boo-boo when I set them up.
These apparently came factory fresh. They looked unopened and still had the plastic shrink wrap on everything. Heatsinks had unblemished thermal compound.
The work around I mentioned is effective. I just need to remember to change the rc.local file when I want to change the OC value.

402blownstroker · Apr 27, 2012

I would be tempted to pull the motherboards out the cases to make sure they are not getting grounded.

Core32 · Apr 27, 2012

I'm always a curious skeptic until I understand a problem, so I "almost" never say never.
But that being said:
My 30+ years of experience designing electronics and PCBs, and then working with mechanical engineers on the housings for them, tells me it's very unlikely a frame ground fault would manifest itself as this type of issue on two identical rigs with non-similar serial #s.
These are server chassis and they usually have to meet several industry standards to get certified and go through a rigorous set of design verification tests (DVT).
But I will agree there is something different about these two from others using them.
The most obvious difference is "ME"

And that's why I continue to believe it's some type of operator error!

402blownstroker · Apr 27, 2012

I hate to ask, but you have gotten the proper certification to use these right? If not, you can go to www.certifymetouseh8qgi-f.com

Core32 · Apr 27, 2012

Of course! Took the test 5 times.... Does a 10 out of 50 count?

tear · Apr 27, 2012

Don't get me wrong, Core32.

Evidently (given your experience), NVRAM area that got allocated to OCNG
isn't as "safe" as I thought (even though NVRAM map shows no users).

We could think about allocating some other area but think we should first
get an understanding of what's going on....

Hmmm... I've got an idea.
Can you please do this:

Code:

sudo dd if=/dev/mem bs=1M of=core32-ng2.rom skip=$((0xffe)) count=2

and send me (mynick@braxis.org) the result.
I wonder whether extraction of the map from your "stored ROM" will give different results...
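
Before mailing a dump, it may be worth taking it twice (say, on two separate boots) and checking whether the two files agree. A minimal sketch using `cmp`; the function name `rom_diff_count` and the second file name in the usage line are hypothetical.

```shell
#!/bin/sh
# rom_diff_count A B: print how many byte positions differ between two
# dumps of the same size (0 means they are identical).
rom_diff_count() {
    # cmp -l prints one line per differing byte. It exits non-zero when the
    # files differ, which is expected here, so only the line count is used.
    cmp -l "$1" "$2" | wc -l | tr -d ' '
}
```

For example, `rom_diff_count core32-ng2.rom core32-ng2-second.rom` prints 0 when both dumps match byte for byte.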
