OCNG5: OC firmware for Supermicro AMD G34 platforms

gigatexal, well done! Tear, you continue to be a champ.

With respect to hot VRMs: If the board is positioned vertically, as inside a case or hanging on a wall, it will dissipate the heat better. If it's horizontal, a LOT of heat builds up underneath the VRMs. If you can, position the 120mm fans so that they blow across both the top side of the board and the underside.

Looking at your picture, it does appear that you have air passing under the board. OH, sorry, that pic is in Tear's post.

Would you say that the airflow generated over the board by being in a 1U rackmount case is sufficient? Not eager to blow up my board but that's the configuration it's in.
 
If your system is qualified for SE (140W) chips (and you're using standard power chips from
what I can tell?), you should have a decent bit of headroom (140W vs 115W -- that's ~21%).

Though note that the box may become (even more) noisy -- not sure how much of a concern
that is to you.
 
SM claims it'll run the 6100/6200/6300 CPUs....I assume it will take the SE CPUs, but I don't see the specific ones (aside from the 6300P) mentioned. And yes I have the regular Opty 6376 CPUs. Noise is at least temporarily not an issue--have it running in the basement where it can whine away all day.
 
Wasn't loud enough already, so I've OCed things:
IMG_20150918_233235.jpg

These are Opty 6376s....if it'll do 2.86GHz that's pretty awesome :)
 
only 6376s? mine do 3.2ghz, overclock that stuff some more!
 
Well, I dunno what cooling you have, but this is a 1U case w/ the 1U heatsinks and a load of 40mm fans. It's already running in the mid 50s C right now and is deafeningly loud. So until it cools off outside here's where it'll stay.
 
oh yeah, i am rocking an open case, noctua dual 92mm fans, two 140s on top blowing cool air down and a 180mm fan at the front blowing air across the mobo
 
Yeah that'd do it. Kiev (see sig) is running with a pair of Noctua 92mm and a pair of Noctua 120mm coolers and it runs a nice and chilly 38c under load. But with nothing but 40mm fans here it is LOUD (and mid-50s I think makes for a good upper bound to temps). I looked into getting some nice Noctua coolers for it but at $100+ each on Newegg I can't really justify close to $500 on new heatsinks. It'll do as-is.
 
ahh the good life of quad optys

and cheers to tear for getting me a huge boost out of my chips
 
Amazing indeed. Looking forward to seeing how it does for WCG now that it's OCed. Pretty awesome that we can OC these quad-Optys but the dual-CPU Xeons can't even be OCed. Really makes these seem like a particularly good deal.
 
Hey all, not totally related to the OP here...but my quad 6276 system even just with the stock BIOS under Win Server 2012 R2 is completely shutting down once it hits heavy load.

I thought it was maybe a bad connection with the RAM or something, so I pared it back to a single DIMM per CPU. Still no joy though. I fired up Cinebench, it ran successfully a couple times, and then the next time I ran it the system shut down.

Anyone here experienced this before? I don't have a ton of time these days to troubleshoot boxes so hopefully someone has a better idea of what I should be looking into.
 
Not sure if it helps, but do you have a wattage meter (e.g. Kill-a-watt) hooked up? i had this issue when i overclocked my dual Opteron 8439SE setup to 3.26 GHz and ran Intel Burn Test. i would hit a certain point in the test (meter showed close to 600w) and the 650w power supply would shut the system down due to overload. i wonder if you could be running borderline close to your power supply's limits perhaps?
 
Hey all, not totally related to the OP here...
I'd say new thread was totally warranted here...

but my quad 6276 system even just with the stock BIOS under Win Server 2012 R2 is completely shutting down once it hits heavy load.
Some CPUs have been identified as causing this (the board itself, per SM, doesn't
employ any overcurrent protection circuitry). The suspicion is that they're incorrectly*
asserting the THERMTRIP# signal (which causes the board to shut the system down).

*) or perhaps correctly, given a scenario with severely impaired thermals between IHS
and the dies

Are your CPUs production or ES units?

Monitoring temps (TurionPowerControl -mtemp) and recording last reading before shutdown
could shed add'l light on your problems.

Another possibility, per rvborough's post, is PSU hitting OCP threshold so it makes sense
to ensure this isn't the case.
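
One way to capture the last reading before a shutdown is to timestamp every line the monitor prints and force it to disk immediately. A minimal Python sketch, assuming `TurionPowerControl -mtemp` keeps printing readings to stdout until it is killed; the output is logged raw rather than parsed, and the log file name is made up:

```python
import os
import subprocess
import time

def log_lines(cmd, log_path):
    """Run cmd, copy each stdout line to log_path with a timestamp, and
    fsync after every line so the last reading survives a hard power-off."""
    count = 0
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for line in proc.stdout:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} {line.rstrip()}\n")
            log.flush()
            os.fsync(log.fileno())
            count += 1
        proc.wait()
    return count

# Hypothetical usage (assumes TurionPowerControl is on PATH):
# log_lines(["TurionPowerControl", "-mtemp"], "temps.log")
```

After an unexpected shutdown, the tail of the log shows what the CPUs reported in the last second or so before power was cut.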
 
Thanks for the replies guys, made this post as I was about to hit the bed after a night of frustration and realized I was way too sparse with the details...

The CPUs are indeed Eng Samples, and the PSU is a pretty beefy I think 1200W Gold rated Seasonic.

The thermals are totally fine at least according to HWMonitor, thing never gets past 30 degrees...and I can do something like a fresh boot into Win Server, launch CineBench, and get it to crash within 5-10 seconds, which I don't think would be hitting thermal limits no matter what.

Are there any hacks or anything like that which would deactivate this THERMTRIP# signal you're talking about?
 
The CPUs are indeed Eng Samples
Indeed, some ES CPUs seen in the wild behaved exactly the way you describe.

The thermals are totally fine at least according to HWMonitor, thing never gets past 30 degrees...
I strongly encourage you to check temps using TurionPowerControl -mtemp because
what I'm about to suggest may fry your stuff... (and I don't trust HWMonitor).

Are there any hacks or anything like that which would deactivate this THERMTRIP# signal you're talking about?
If the board is wired per AMD recommendations, making the SP5100 ignore
THERMTRIP# coming from any CPU should be possible (from what I'm reading).

DISCLAIMER: you're doing this completely at your own risk; it may cause damage to your hardware in case of REAL thermal event

1. Download and run RWEverything (portable is the way I roll); then click I/O index button.
  [A child window will appear].
2. Click a button to the left of Key button (it will contain index/data pair, such as 70/71 [hex])
3. Set Index port to CD6 [hex] and data port to CD7 [hex] (the latter should happen automatically)
4. Window contents should update and look almost identical to the ones in the pic
5. Navigate to cell at position 68 [hex], it should contain value of 8C [hex], double-click it
6. Change the value to 84 [hex], confirm with Done button
7. Close RWEverything
8. Check if loading the machine causes a shutdown

rw-thermtrip.png
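
For reference, the edit in steps 5 and 6 amounts to clearing a single bit. A small Python sketch of the arithmetic only; it does not touch hardware, and since the steps above do not say what that bit means to the SP5100, any interpretation of it is an assumption:

```python
OLD = 0x8C  # value expected at index 68h (step 5)
NEW = 0x84  # value written in step 6

changed = OLD ^ NEW  # bits that differ between the two values
print(f"{OLD:08b} -> {NEW:08b}, changed mask = 0x{changed:02X}")

def clear_bit3(value):
    """Clear bit 3 (mask 0x08) and leave every other bit alone."""
    return value & ~0x08 & 0xFF
```

If the cell holds something other than 8C, clearing only that bit would preserve the remaining bits, though the steps above only cover the 8C case.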
 
can this be used to undervolt/underclock? I want to get my current 95w CPU's down as far as possible.
 
Current OCNG release (5.2) does not support voltage or multiplier changes.

OCNG 5.3 (which will be released this week and will also support H8DG6/H8DGi) has
appropriate provisions but only positive adjustments can be done (e.g. Vcore +25 mV,
multiplier +2.5x).

If you want to dial things down, you can use TPC (https://code.google.com/p/turionpowercontrol/downloads/detail?name=tpc-0.44-rc2.zip) for that purpose.

If you are utilizing all P-states (not at the same time, naturally), you'll need to first determine
P-state configuration (i.e. TurionPowerControl -l) and then make appropriate adjustments;
for instance, frequency change could look like this:
Code:
TurionPowerControl -set ps 0 freq 3000 ps 1 freq 2500 ps 2 freq 2200 ...
and voltage change could look like this:
Code:
TurionPowerControl -set ps 0 vcore 1.2500 ps 1 vcore 1.1875 ps 2 vcore 1.0500 ...
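
If the P-state table is kept in a script, the command lines can be generated rather than typed by hand. A hedged Python sketch that only assembles strings in the syntax shown above (`tpc_set_command` is a made-up helper name, and nothing is executed):

```python
def tpc_set_command(field, values):
    """Build a 'TurionPowerControl -set ...' line for P-states 0..n-1.

    field is 'freq' (MHz) or 'vcore' (volts); only the syntax shown in
    the examples above is used.
    """
    if field not in ("freq", "vcore"):
        raise ValueError("field must be 'freq' or 'vcore'")
    parts = ["TurionPowerControl", "-set"]
    for ps, value in enumerate(values):
        text = f"{value:.4f}" if field == "vcore" else str(value)
        parts += ["ps", str(ps), field, text]
    return " ".join(parts)

print(tpc_set_command("freq", [3000, 2500, 2200]))
print(tpc_set_command("vcore", [1.25, 1.1875, 1.05]))
```

The list passed in should match the P-states that `TurionPowerControl -l` actually reports for the CPU in question.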

If your CPU or OS is set up for a single performance level, the change is a tad more complex
as one needs to make the CPU switch out of and then back into the adjusted P-state.

Also, be wary of changing frequency of first non-boosted P-state (aka SW P-state 0 --
the one that holds nominal frequency) as changing it immediately changes the TSC
(at least on recent AMD CPUs and APUs). TSC change then screws OS clocks/timing up
and may cause the OS to crash or hang unless HPET is functional and enabled on your
platform and the OS is set up to use HPET for timing (rather than TSC).
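
On Linux, one quick way to see whether the kernel is currently timing off the TSC or HPET is to read the clocksource files in sysfs. A small Python sketch, assuming a Linux host (nothing above says which OS is in use, and Windows exposes this differently):

```python
from pathlib import Path

CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")

def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names, or (None, []) if the
    sysfs files are missing (e.g. not running on Linux)."""
    try:
        current = (Path(base) / "current_clocksource").read_text().strip()
        available = (Path(base) / "available_clocksource").read_text().split()
    except OSError:
        return None, []
    return current, available

current, available = read_clocksource()
if current == "tsc":
    print("kernel is timing off the TSC; available:", available)
```

If `hpet` is not in the available list, changing the nominal P-state frequency is the risky case described above.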

HTH!
 
Hi Tear, i wonder if you might (for the sake of other folks on this forum attempting underclocking) also do me the favor of explaining here what might have happened to my H8QGi-F board when attempting underclocking with TPC and the PSI-L bit because i certainly can't. All i know is that i had the PSI-L bit enabled... when i transitioned from the lowest power state to the highest, and the board died as well as all cards plugged into the PCI-e slots. Needless to say - rather expensive to get things back up and running again. Thankfully my 4 uber rare ES processors, and memory were just fine. i'd rather this not happen to anyone else if possible.
 
Short version: do not use PSI_L for power savings on server platforms; it's typically
             used in mobile platforms and desktop/server boards are unlikely to
             properly support PSI_L


PSI_L is a control signal that the processor can send to VRM indicating low-load condition.
In reaction to PSI_L, VRM controller switches certain number of converter's phases off
(number of phases depends on board wiring).

PSI_L is sent by the processor when processor's voltage goes below certain value (== VID
goes above certain value -- VID scale is inverted). The idea is that lower CPU voltage
requirements do not necessitate use of all phases.

All looks fine in theory but the processor needs to be programmed with the PSI_L
threshold voltage so it knows which VIDs are low-power VIDs (PSI_L asserted) and which
VIDs are full-power VIDs (PSI_L deasserted) before enabling PSI_L.

Proper configuration of PSI_L threshold VID is responsibility of board's firmware as only
board vendor knows maximum power output of the VRM while in low-power state (PSI_L
asserted); in other words, only board vendor knows how many phases get turned off
when PSI_L is asserted.

None of the SM server boards I've seen configure the PSI_L threshold voltage (naturally,
they do NOT enable PSI_L either). The default threshold VID is zero. What it means is
that all VIDs get treated like low-power VIDs (!!) (which surely is incorrect and will
overload the reduced-phase-number VRM) the moment PSI_L gets enabled.

On these boards, enabling PSI_L without prior setting of PSI_L threshold VID will cause
the VRM to unconditionally enter low-power state and overload the VRM which is what I
_suspect_ happened in rvborgh's case.
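
The failure mode can be illustrated with a toy model. This Python sketch assumes two things that are not spelled out above: that PSI_L is asserted whenever the current VID is greater than or equal to the threshold VID, and that VIDs follow the serial-VID encoding of 1.5500 V at VID 0 minus 12.5 mV per step. The threshold of 36 is purely hypothetical:

```python
def vid_to_volts(vid):
    """Assumed SVI encoding: 1.5500 V at VID 0, minus 12.5 mV per step."""
    return 1.55 - 0.0125 * vid

def psi_l_asserted(vid, threshold_vid):
    """Assumed rule: low-power mode is signalled when VID >= threshold."""
    return vid >= threshold_vid

vids = [24, 29, 40]  # 1.2500 V, 1.1875 V, 1.0500 V under the assumed encoding
for threshold in (0, 36):  # 0 = the default described above; 36 is hypothetical
    states = [psi_l_asserted(v, threshold) for v in vids]
    print(threshold, states)
```

With a threshold of zero every VID compares as "low power", so the VRM would be told to drop phases even in the highest-voltage P-state.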

If one wanted to do the right thing and set proper PSI_L threshold VID, one would first need
to examine/reverse engineer the board enough to determine if PSI_L circuit is properly
designed and, if so, determine the number of low-power phases. Next step would be
estimating maximal power output of the VRM while in low-power state and using it
to determine which CPU's P-states (and, consequently, VIDs) could be handled by the
VRM in low-power state (for family 15h CPUs details of this estimation are covered by
AMD pub 42301, section 2.5.1.4.1). As you may have noticed, this estimation
is CPU-specific -- using different CPUs will almost certainly change PSI_L threshold VIDs.

Even if one went through all this effort and was successful, back-of-the-envelope
calculations suggest savings of a handful of watts, up to 10W, on a 4p system which, given
their power consumption, is like a drop in the ocean...
 
Hey guys! I'm back with more retarded questions! :D

Have finished my newest 4p opteron build, this time watercooled and using spicy chips! The idea was to bring home a few cups from hwbot and then set it to DC duties for the rest of its life. But I'm still having lots of trouble OCing it! Got the script working and thanks again for all your hard work tear! I'm still pulling my hair out with temps though. Is there any way to convert temps to something human or to check if there was indeed a thermal trip and if so which CPU did it? I can't even get 3.5ghz out of these at the moment :(. Using windows server 2008R2, 4x 62xx chips, 16x1gb DDR3 chops and a supermicro H8Qi+-f board. It runs a 240 rad and a 360 rad with High SP fans and a D5 pump on max, so I'm hoping that's enough cooling for it!

I saw the post above regarding taking off the thermal trip sensor, which I will definitely look into, but I obviously don't want to actually cook it! haha. I have a thermal probe sitting on the edge of the IHS on one CPU (don't want to interfere with the mounting of the block) but it hasn't been a reliable indicator of real temps.

This is the beast btw!



 
Is there any way to convert temps to something human
Which temps do you have in mind? Temps reported by.. ? IPMI? TPC? Something else?
For CPU-reported temps you can use TurionPowerControl -mtemp
or to check if there was indeed a thermal trip and if so which CPU did it
You can easily tell a thermtrip by the machine shutting off (though PSU's OCP could be responsible for this as well). Unfortunately, I have not devised a way to detect the CPU responsible for THERMTRIP.
What are your symptoms? Do they manifest with CPUs in stock configuration as well?
I can't even get 3.5ghz out of these at the moment :(. Using windows server 2008R2, 4x 62xx chips, 16x1gb DDR3 chops and a supermicro H8Qi+-f board. It runs a 240 rad and a 360 rad with High SP fans and a D5 pump on max, so I'm hoping that's enough cooling for it!
Make sure there's airflow both above *and* below the PCB to ease the pain on the VRM components.
I saw the post above regarding taking off the thermal trip sensor, which I will definitely look into, but I obviously don't want to actually cook it! haha. I have a thermal probe sitting on the edge of the IHS on one CPU (don't want to interfere with the mounting of the block) but it hasn't been a reliable indicator of real temps.
Disabling thermtrip is mostly theoretical as I haven't encountered a CPU that suffered from "premature thermtrip" issue (I think some folks here did, but, at the time, it didn't occur to me that THERMTRIP could possibly be switched off).

Re probe
Yeah, a probe would probably need to be placed between the block and the CPU (I've never done this, though).

It's also possible that thermal junction between the die(s) and the IHS is impaired == external probe wouldn't budge in such situation (though on-die stuff would most likely be going crazy).

This is the beast btw!



Looks good!
 
Hi tear, thanks again for your reply! The machine is basically just shutting off completely after a few seconds of load. I will try another psu to make sure that's not the case, though 1200w should be ok!

It doesn't do it at stock speeds or even up to ~3ghz, but once I push a bit harder or use 1.3v or higher it starts doing it. Idle is fine; it happens just as soon as I start running something like wprime. So it does definitely sound like a thermal trip.

I've got 4 corsair airflow units sitting on each of the ram banks to cool vrm etc. Have also sinked all the vrms and I'm hoping all the fans on the rads will also provide more airflow.
 
Try a killawatt on the wall socket - you may find you are pulling more power than you think
 
That's a good idea! Managed to get a hold of a fairly cheap reader today.

Wprime 1024m load @ stock = ~475w at the wall (not too bad!)
Wprime 1024m load @ 3000/1.25v = ~725w at the wall
Wprime 1024m load @ 3200/1.3v = ~815w at the wall

Tried for 3300/1.32v and it powered off a couple of seconds into the wprime, before the reader could give a number.

While that's a crapton of power and the CPUs seem to draw power almost exponentially when OCing, I'd still be expecting well over 1200w at the wall before the psu should go into OCP. Just to make sure I have another 1200w PSU (A silverstone 1200G) that I'll swap over into it and see if it makes a difference.
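
The jump in wall power is roughly what the usual first-order rule for CMOS dynamic power (proportional to frequency times voltage squared) would predict. A rough Python sketch comparing that rule with the two overclocked readings above; it ignores leakage, PSU efficiency and everything in the box that is not a CPU, so it is only a sanity check:

```python
def dynamic_power_ratio(f_old, v_old, f_new, v_new):
    """First-order estimate: dynamic power scales with f * V^2."""
    return (f_new / f_old) * (v_new / v_old) ** 2

predicted = dynamic_power_ratio(3000, 1.25, 3200, 1.30)
measured = 815 / 725  # wall readings quoted above
print(f"predicted x{predicted:.2f}, measured x{measured:.2f}")
```

By the same rule, going from 3200/1.3v to 3300/1.32v would land at roughly 865 W at the wall, which is still well short of what a 1200 W unit should tolerate.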
 
My dual ES 62xx on air will do 3.2GHz at 1.25V all day but I think 1.3V might be an issue.
I can't recall but I thought there was a voltage limit that you can find using TPC. You may be hitting an OVP trip.
Have you just tried 3.3GHz at 1.25V ?
Does it run longer if so?
 
I can't tell but did you heatsink the VRM's and are you blowing air across the actual motherboard instead of just your radiators?
 
Yep, VRMs are heatsinked and there's 4 sets of corsair airflow units blowing air across the whole mainboard. Will try my other 1200w psu shortly.
 
I always like coming here better than other forums because you guys know more about the obscure hardware :D.
The Antec 1200w PSU is shitting itself! Hitting OCP far too early by the looks of it.

3300 @ 1.3v runs through fine, instead of the machine shutting off almost immediately.
Did a few more runs, got 3.8 @ 1.33v with a sub 40 second 1024m wprime. Lots more in this thing I reckon :D. Maybe some hwbot records to still be had!

Thanks again guys!!

edit: I realised afterwards, it's not really the Antec PSU's fault. I should have realised when I put it in there. It's a split 12v rail psu. So it has 6x 12v lines. I'm using pretty much 1 or 2 of those rails only. Whereas the silverstone has a single 12v 100amp rail. So the antec was probably hitting its OCP exactly when it should have been.
 
Got a mostly stable 4.2ghz @ 1.4v.

I think I know what thermal shut down looks like now. It goes to a black screen and eventually reboots itself. Trying for 4.4 @ 1.4v it blue screens like I would expect with not enough volts, but pushing any more volts it just does the black screen thing. And the more volts you push the quicker it does it :p. Seems like my Water cooling can't cope with anymore. I noticed in the script I'm running there is a section for the high temp limit, it's at 75 but pushing it higher doesn't seem to make any difference, so I guess it's a hardware thing in the CPUs rather than the board.
 
Are you cooling the VRMs of the board? Sorry if you've answered this earlier.
 
Yep, all vrm's are sinked and there are 4x corsair airflow units, one on each bank of ram to keep them cool. As well as a pedestal fan blowing across the whole unit (side panel is off). There's enough air flow to keep the back rad fans spinning even when powered off :p.
 
Do you actually see temps exceeding 70 (or 75 deg) just before the shutdown?
 
Well, another interesting development I should have seen coming! haha, according to the temp readings she's getting up to high 40's max under load. Which seems low but I think I found the real culprit again. At 4200 1.4v the power meter I have is showing a 1430w draw at the wall. And taking into account around an 80% efficiency, that's 1150w. That would be pretty much it for the 1200w silverstone Gold I reckon, I think it's hitting OCP, just in a different way to the antec 1200. That would explain why the more volts the faster it does it I guess! New psu time by the looks of it! Surely a 1550 should do right?! :p


edit: Watching the power meter while doing cinebench runs. 1437w peak, 1447, 1451, 1457 and then on the last run I did I saw a 1461 and then the machine rebooted :p. Looking at the EVGA 1600 G2 power supply, gets a 9.9 on jonnyGURU!
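
The wall-to-DC conversion used above is easy to rerun for other readings. A minimal Python sketch; the 80% efficiency is the guess from the post rather than a measurement, and the helper names are made up:

```python
def dc_load_watts(wall_watts, efficiency):
    """Estimate the load on the PSU's output side from a wall reading."""
    return wall_watts * efficiency

def headroom_watts(wall_watts, efficiency, psu_rating):
    """Positive means the estimated load is still under the PSU's rating."""
    return psu_rating - dc_load_watts(wall_watts, efficiency)

for wall in (1430, 1461):
    load = dc_load_watts(wall, 0.80)  # 80% is the guess used above
    left = headroom_watts(wall, 0.80, 1200)
    print(f"{wall} W at the wall -> ~{load:.0f} W DC, {left:.0f} W left on a 1200 W unit")
```

If the unit actually meets its Gold rating (around 87% at full load), the 1430 W reading works out to roughly 1245 W on the DC side, i.e. already past the label.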
 
Spoke too soon by the looks of it haha. Got the 1.6kw evga psu and it does the same thing :(. Seems to be a temp thing or a limit on the current the board can push. Cranking the volts up at a lower clockspeed results in the same thing (eg 1.55v @ 3.5). Ah well, think I've reached the limit of this setup.
 
I do want to point out that many who get to the clock rate and amperage territory you are in
are not there long, whether because something lets the magic smoke out (usually VRMs or
PSU) or because the power bill/heat gets to be too much. Patriot had Harbringer
that was in the territory you are in and the board let the smoke out.

Please be careful. I really hope you are using an IR thermometer. I see you mention crazy
air flow, however DO DO DO DO monitor the sensitive bits.

I don't know what chips you have, you can try to drop multi raise bus/raise multi/drop bus to
get every ounce out. I would wonder if other components need voltage or if that's all she wrote.

Do you feel any heat on the PSU connectors on the board? How about the PSU wires?
 
hi Scobar,

On my quad 61xx ES setup... which i use for my home PC... i run 2 out of the 6 cores on each of the dies at 3.5 GHz, and the others at 1.7 GHz... that gives me 16 "fast" cores and 32 "slow". Voltage is 1.25, which ups the temperatures somewhat... but overall i am running 220 watts lower at full chat than with all cores running at 3 GHz. i then use process lasso to run my apps only on the "fast" cores... leaving the other cores for the rest.

Basically this gives me what AMD never made... a Phenom X16 :)
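
The 16/32 split follows directly from the layout described above (4 sockets, 2 dies each, 6 cores per die, 2 fast cores per die). A Python sketch that works out the counts and an affinity bitmask; it ASSUMES cores are numbered contiguously die by die, which should be checked against the real topology before pinning anything:

```python
SOCKETS, DIES_PER_SOCKET, CORES_PER_DIE = 4, 2, 6
FAST_PER_DIE = 2

def fast_cores():
    """Indices of the fast cores, ASSUMING cores are numbered contiguously
    die by die (check the real topology before pinning anything)."""
    cores = []
    for die in range(SOCKETS * DIES_PER_SOCKET):
        first = die * CORES_PER_DIE
        cores.extend(range(first, first + FAST_PER_DIE))
    return cores

fast = fast_cores()
total = SOCKETS * DIES_PER_SOCKET * CORES_PER_DIE
mask = sum(1 << c for c in fast)  # affinity bitmask covering the fast cores
print(len(fast), total - len(fast), hex(mask))
```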

Now for my question... i overclock with TPC (core voltage and multipliers), but does OCNG have the ability to set the CPU-NB multiplier? i have heard that CPU-NB overclocking makes a big difference with the K10 cores... anyone have any experience with this? i've not tried it with TPC yet... i'm assuming that i need to do a warm boot between settings?
 
There is a facility for NB freq, this would be a better tear question as I haven't actively touched OCNG or g34 in a good 2 years, it wasn't this refined back then :) I don't know about multi, ocng allows you to raise ref clock which would increase speed of the NB. Might need to review multi to maintain your current mixed mode operation at the same speeds.

Also, check out this if you haven't: GitHub - kszysiu/tpc: Turion Power Control, this should allow you to change NB multi if memory serves.
 
On my quad 61xx ES setup... which i use for my home PC... i run 2 out of the 6 cores on each of the dies at 3.5 GHz, and the others at 1.7 GHz... that gives me 16 "fast" cores and 32 "slow". Voltage is 1.25, which ups the temperatures somewhat... but overall i am running 220 watts lower at full chat than with all cores running at 3 GHz. i then use process lasso to run my apps only on the "fast" cores... leaving the other cores for the rest.
Why not split it by whole CPU ? Then you could do 36/12 or 24/24 split and you'd be able to drop the voltage on the slower CPUs.
Now for my question... i overclock with TPC (core voltage and multipliers), but does OCNG have the ability to set the CPU-NB multiplier? i have heard that CPU-NB overclocking makes a big difference with the K10 cores... anyone have any experience with this? i've not tried it with TPC yet... i'm assuming that i need to do a warm boot between settings?
You probably could do that but I can't really recall if there are any caveats. It was a long time ago and it's possible there's a reason this wasn't heavily advertised... Feel free to experiment; indeed a warm reset is needed to trigger frequency change.

There is a facility for NB freq, this would be a better tear question as I haven't actively touched OCNG or g34 in a good 2 years, it wasn't this refined back then :) I don't know about multi, ocng allows you to raise ref clock which would increase speed of the NB. Might need to review multi to maintain your current mixed mode operation at the same speeds.
And THAT is what I'd recommend. Bump the refclock to raise NB speed (it rises proportionally), then adjust multis/voltages.

Also, check out this if you haven't: GitHub - kszysiu/tpc: Turion Power Control, this should allow you to change NB multi if memory serves.
I think even 0.44-rc2 (latest one with binaries) can do that. Ofc, git version can do that too :)
 
I was able to apply your fw to my h8dgi-f and clocked my 6276 to 2.65Ghz.

I locked the turbo to max, and all cores in cpupower show 3.0Ghz. VM still only see 2.66 but my gaming (vfio passthrough) is working like a dream.

I'm a little concerned that once my 6282SE's come in, that if I try to OC that using a similar fsb bump (I think I'm running 230 fsb) that heat is going to become a factor. These 6282's are clocked at 2.6 by default and are rated at 140W, which my fans are rated for.

So an oc'd 6276 is no issue, since the TDP is 115W I believe. But if I take a 140W proc and OC it, I suspect I'll have issues with the fans as now I'm pushing into new heat territory beyond what my fans can handle.

idk though. I'm going to try. Right now my proc's are running at 58C under what I presume will be a high load, nowhere near their max heat profile. So maybe I'll be fine?
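
For a rough idea of where the same bump would put the new chips: with the multiplier left alone, the clock scales linearly with the reference clock. A Python sketch, assuming a 200 MHz stock reference clock and a 2300 MHz stock base clock for the 6276 (neither is stated above; the 2600 MHz figure for the 6282 SE is the one quoted in the post):

```python
BASE_REFCLK_MHZ = 200  # assumption: stock reference clock on this platform

def scaled_mhz(stock_mhz, refclk_mhz, base=BASE_REFCLK_MHZ):
    """Core (or NB) clock after a reference-clock bump; multiplier unchanged."""
    return stock_mhz * refclk_mhz / base

# Assumption: 2300 MHz is the 6276's stock base clock; 2600 MHz is the
# 6282 SE figure quoted above.
print(scaled_mhz(2300, 230))  # 6276 at a 230 MHz refclock
print(scaled_mhz(2600, 230))  # 6282 SE at the same refclock
```

Under those assumptions the 6276 figure lines up with the 2.65GHz reported above, and the 6282 SE would sit just under 3 GHz at the same refclock, before any turbo.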
 