Supermicro H8QGi/6 and H8QGL Next Generation OC BIOS

Status
Not open for further replies.
Speaking of which, how is the vmodding of the motherboard coming along, core32? Any progress?

Only that the second 4P, 6166HE, is folding and appears stable at 245.
I didn't want to bring down the 6128 machine until I was happy the second would keep me contributing.
About a week delay because the CPUs were shipped insured by USPS and no one could be home to sign.
I "expect" to start looking for a solid location on the MB to capture the VRM I2C signals this week, if SWMBO gives me the time ;)
 
My OC on a H8QGL-IF+ didn't go well.

I wonder if I can get a little help to get back to stock? I have my SM Bios installed with the optimal defaults.
My folding went fine for the first 4 or 5 frames and then ground to a halt, almost 2 hours a frame on a 6904.

Before I blame the stock Bios, I thought I should remove the only other change I made, the "setup utility".
Is there a way to remove it, or is that even necessary after going back to a stock Bios?

Otherwise I'll have to reinstall the OS. Then if that doesn't work I'll check the stock Bios.
Thanks in advance. :)

And then, assuming I get things back to normal, I'll try on the weekend to figure out what I did wrong with the OC Bios.

Does the setup utility work without "Thekraken" installed?
Does it matter where I run it from? I ran it from my home folder in a folder named OC.
Does it matter that I am on Ubuntu 11.10?
 
Yes, a few folks (ChelseaOilman, dwdawg) experienced HT issues with little or no prior warning. No idea as to the cause yet.


dfonda, if you've flashed stock BIOS, that's all that needs to be done.
Mere presence of setup utility doesn't affect the system.

More than that, even if you were to use smocng.sh w/stock ROM it would
have no effect (as stock ROM doesn't read OC configuration).

What I am confused about is whether these symptoms:
My folding went fine for the first 4 or 5 frames and then ground to a halt, almost 2 hours a frame on a 6904.
were w/OC or stock ROM and whether any OC was applied at that time.
 
Hi Tear.

I tested refclock=243, but it fails with "Client-core communications error: ERROR 0x8b"; see FAHlog.txt:

[14:28:38] Completed 175000 out of 250000 steps (70%)
[14:46:28] Completed 177500 out of 250000 steps (71%)
[15:04:11] Completed 180000 out of 250000 steps (72%)
[15:21:50] Completed 182500 out of 250000 steps (73%)
[15:39:34] Completed 185000 out of 250000 steps (74%)
[15:57:28] Completed 187500 out of 250000 steps (75%)
[16:15:11] Completed 190000 out of 250000 steps (76%)
[16:19:57] CoreStatus = 8B (139)
[16:19:57] Client-core communications error: ERROR 0x8b
[16:19:57] Deleting current work unit & continuing...
[16:20:21] - Preparing to get new work unit...
[16:20:21] Cleaning up work directory
[16:20:24] + Attempting to get work packet

So I decided to run a test with the default DDR3 timings (9-9-9-24 instead of 7-7-7-21).

For the moment, the PPD is the same. :eek:

I hope this helps stabilize the system. The test is in progress.

If it fails, I will go back to refclock = 240.
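
The FAHlog.txt excerpt above carries enough information to work out the time per frame (TPF) and to spot the core failure automatically. Here is a minimal sketch in Python, assuming the log keeps exactly the line format shown in the excerpt; the function names are made up for illustration and are not part of any Folding@home tool:

```python
import re
from datetime import datetime, timedelta

# Excerpt copied from the post above (progress lines plus the failure).
LOG = """\
[14:28:38] Completed 175000 out of 250000 steps (70%)
[14:46:28] Completed 177500 out of 250000 steps (71%)
[15:04:11] Completed 180000 out of 250000 steps (72%)
[15:21:50] Completed 182500 out of 250000 steps (73%)
[15:39:34] Completed 185000 out of 250000 steps (74%)
[15:57:28] Completed 187500 out of 250000 steps (75%)
[16:15:11] Completed 190000 out of 250000 steps (76%)
[16:19:57] CoreStatus = 8B (139)
[16:19:57] Client-core communications error: ERROR 0x8b
"""

STEP_RE = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\] Completed \d+ out of \d+ steps\s+\((\d+)%\)"
)
STATUS_RE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] CoreStatus = ([0-9A-Fa-f]+) \(\d+\)")


def parse_frames(log_text):
    """Return (timestamp, percent) for every progress line."""
    frames = []
    for line in log_text.splitlines():
        m = STEP_RE.search(line)
        if m:
            frames.append((datetime.strptime(m.group(1), "%H:%M:%S"), int(m.group(2))))
    return frames


def frame_times(frames):
    """Time per 1% frame between consecutive progress lines."""
    times = []
    for (t0, p0), (t1, p1) in zip(frames, frames[1:]):
        if p1 <= p0:
            continue  # a new work unit started; no usable interval
        delta = t1 - t0
        if delta < timedelta(0):
            delta += timedelta(days=1)  # log has no dates; assume it rolled past midnight
        times.append(delta / (p1 - p0))
    return times


def core_statuses(log_text):
    """Return (time, hex code) for every CoreStatus line."""
    return [m.groups() for m in STATUS_RE.finditer(log_text)]


times = frame_times(parse_frames(LOG))
average_tpf = sum(times, timedelta()) / len(times)
print("average TPF:", average_tpf)
print("core status lines:", core_statuses(LOG))
```

On this excerpt the frames average roughly 17:45 each before the 0x8B status appears.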
 
" dfonda" My folding went fine for the first 4 or 5 frames and then ground to a halt, almost 2 hours a frame on a 6904.
"tear" were w/OC or stock ROM and whether any OC was applied at that time.
Tear, this was stock, it ran fine in the OC Bios, I just couldn't get into the Bios...so I switched back to stock to figure out what I did wrong.

It ran fine in the stock Bios as well, for the first 4 or 5 frames I checked before I went to bed. Then when I woke up, I noticed it had a problem. I restarted it, went to work, and still had slow times when I got home.

I just went through all of my settings in Bios and OS to see if there was anything amiss. Everything looks correct as far as sleep modes and such. I have gone through a few cold reboots unplugging the PSU....I am checking now to see if that corrects it. My boots were clean and everything seems fine.

Thank you for the help. I'll report back. If the cold reboots are all it needed, it may be that I didn't do one when I went back to the stock Bios.

EDIT: So far so good; the first frame was 36:37....
 
Hi Tear.

I tested refclock=243, but it fails with "Client-core communications error: ERROR 0x8b"; see FAHlog.txt:
Typical symptom of crossing temperature threshold (for given Vcore).
Lower the temps or bump vcore (if you can).

Given that memory is now tuned/clocked with target frequency at boot
you can also run memtest86/IBT/whatnot to verify memory's health/tuning.
 
Since last night, I have cleaned the filters on the case fans, and my CPU temperatures are far better now: 47-52 °C.
As for the CPU voltage, I cannot bump it because it is already at the max.

So I suppose there are only two possibilities to explain the core error:

- not enough CPU voltage; impossible to solve.
- memory timings too tight; test in progress.

If I still find errors, I will go back to 240 and the overclocking will be over. :)
 
Now that you've lowered the temps you'll probably be fine (even
if you go back to XMP timings).

CPU stability (simplified) is a function of three variables:
- supply voltage
- frequency
- temperature

if (unstable) {
    LOWER TEH TEMPZ and/or
    RAISE TEH VOLTZ and/or
    LOWER TEH OC;
};

Also, if you're at the "edge of stability" minor supply voltage
drop or ambient temp rise is very likely to crash your unit.

My recommendation (for now) is testing OC at slightly elevated temps
to ensure that ambient temp rise (due to weather/ventilation) doesn't
give you headache.

"Slightly elevated" is typically realized by fan controller; removing fans
is too drastic of a method (temps will rise too sharply).
 
Sorry, a bit off topic - I am testing a quad 6176SE. For those who have these CPUs, what are your temps from tpc at 100% load? Mine are about 47C using the balanced setting in BIOS with Noctua U12DO A3. The problem is that the board shuts down unexpectedly after I run fah for 5 mins. Otherwise it's working fine.

Thanks.
 
The same thing happened to me twice when I was working on OC'ing my 6166HEs. The solution was to increase the vcore.
Both tear and I were surprised that insufficient vcore would result in such a hard crash, but that was my conclusion, because the fix worked both times.
I don't know if you can increase the vcore on 6176SEs though :confused:
 
Running 6176SE with the classic OC of 12.5% in place and the exact same HSF as you, my CPUs range from 38-51C. I've got the GL board with the awkward CPU positions, so not all the fans are installed. The board is however running naked and in a hosting facility with proper cooling.
 
Hmm... temps are similar. Not sure why the shutdown/crash. Suggestions welcome! Thanks.
 
Perhaps this is due to the VRMs overheating, because the Noctuas don't push fresh air directly onto them?

Indeed, the 6176SE consumes a lot of power: 140 watts per CPU.
 
Thanks for the suggestions, but I am using a Seasonic X-1250 and got a huge fan blowing directly at the VRM of the 4 CPUs. Northbridge(??) temp was around 50C.

I hope it's not CPU related :p
 
URGH, all that should really have gone into separate thread.... :|

What is the board resting on?
 
Latest news about my OC with default DDR3 timings and refclock = 243.

It seems to work fine so far. Completed one 6904; the second is in progress.

Average TPF = 17:30 minutes. :cool:
 
If your overclock is stable on 6903/4s, you might want to try an 8101...

It's beta, but it will be out eventually... pushes much harder.
 
Personal experience, it will take a perceived stable OC and make it unstable. :(
 
I would like to give my Mobo another try this weekend...is it confirmed that Ubuntu 11.10 works? Otherwise I'll install 10.10.
 
Both my 4P servers are running Ubuntu 11.10 but I used the ext3 file system, not the default ext4 file system.
 
I am on ext4 but I run an ssd on this rig, which gets around the communication delays with Stanford, I would think that should be OK.

Anything 11.10 users set up differently?
Thanks for the info folks...;)
 
The ssd will mask the problem with the ext4 file system. I would still strongly recommend using ext3. You could probably get away with shrinking your current partition, making a new ext3 partition, and mounting it at /home/<user>/fah. That would solve the problem without a reinstall.

11.10 should work fine for you. Several use it without issue.
 
Finally the OC BIOS is loaded and my quad 6176SE are humming along nicely at 210. With that I get around 5:54 with 6901. Thanks, tear, and everyone who made this OC dream a reality :)

Will push the limit higher when the folding rig gets a better home.
 
Just curious about the HT retries: I was wondering if the rest of you are seeing the same pattern I am. When I run the commands chmod +x ht-retries.sh and sudo ./ht-retries.sh while testing an OC, I have noticed that on all three of my rigs the retries are always on either node #3 or node #5.

Which sockets are #3 and #5, anyway? If it is indeed a pattern, would it be a board issue or a BIOS issue? Anyway, I was just curious as to what others might be seeing.
 
........I have noticed that on all three of my rigs the retries are always on either node #3 or on node #5. I was just wondering if the rest were seeing the same thing.

Which sockets are #3 and #5 anyway. If it is indeed a pattern would it be a board issue or a bios issue. Anyway I was just curious as to what others might be seeing.

On my first rig, the retries I observed occasionally were on Node 7 while using the 6128 CPUs, OCd above 250. None now that I am running the 6166HE CPUs.
On the second rig I see a few on Node 0, but the number is about 100 total after 5 days of folding. Also using 6166HEs.

If I read the previous thread information correctly here and you are using the SuperMicro MB, Node3 (assuming numbered, 0 to 7) would be the second core in CPU 2 (assuming numbered, 1 to 4) and Node 5 would be the second core in CPU 3 (assuming same numbering).
No idea about the cause. Still a learning noobie :) But I would suspect CPU, PSU or MB before BIOS, not necessarily in that order.
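
The node-to-socket reading above can be written down as a small lookup. This is only a sketch of the numbering the post itself assumes (nodes 0 to 7 in order, CPUs 1 to 4, two nodes per CPU); the real physical layout on a given board may differ, and `node_to_socket` is a made-up helper name:

```python
def node_to_socket(node, nodes_per_cpu=2, cpu_count=4):
    """Map a node number to (CPU number, position within that CPU).

    Assumes nodes are numbered sequentially across sockets, as the
    post above does; this is not verified against the board layout.
    """
    if not 0 <= node < nodes_per_cpu * cpu_count:
        raise ValueError(f"node {node} out of range")
    cpu = node // nodes_per_cpu + 1        # CPUs numbered 1..4
    position = node % nodes_per_cpu + 1    # 1 = first, 2 = second
    return cpu, position


for node in (3, 5):
    cpu, position = node_to_socket(node)
    print(f"node {node}: CPU {cpu}, position {position}")
```

Under that assumption, node 3 lands on the second half of CPU 2 and node 5 on the second half of CPU 3, matching the reading above.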


 
The only time I had ht retries was after I did a reboot instead of a power cycle. It didn't look like they had a pattern. Other than that I never had any ht retries during regular operation.
 
Seriously considering selling my Tyan board, but I'm not sure how much I'd lose going from Tyan to SM, and whether the OC would make up for the money spent.
 
Which chips do you have? Based on my 6174 and 6180 chips, I'd guess you'd get an extra 75-100k ppd. My stock 6174's were pulling about 510k ppd. OC'd 12.5%, they were around 590k, IIRC. Your OC will be limited by the worst CPU you have. If one can only OC 8%, that's where you will be limited. However, most g34 dodeca's seem to hit 12% or higher.
 
He has 6168s, which are interesting chips in that they are the low end of the mid-range power chips. We have absolutely no experience with them to date that I know of, so I can't really speculate. As firedfly pointed out, most dodeca G34s seem to be able to do 12%, which would put you in the stock 6172 range of 475K ppd. Is it worth it to take this chance? That is entirely up to you.
 
Seriously considering selling my Tyan board, but I'm not sure how much I'd lose going from Tyan to SM, and whether the OC would make up for the money spent.

It really depends on the chips you have, but with 6168's I'd expect at least 15% OC (230 refclock). If you're lucky, you can beat the 230 barrier and get close to 250. I was extremely lucky to have my 6166's hitting 260, so don't take it as "expected".

I'd say you can expect 230 (as in "expected value" in stats) and do your calculations accordingly, and see if you think it's worth your money. You can end up at lower or higher than 230 though (ok, duh!)...
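
For anyone doing those calculations, the refclock figures in this thread convert to OC percentages as below. A small sketch, assuming a stock refclock of 200 MHz (which is what "15% OC (230 refclock)" implies); the helper names are made up for illustration:

```python
STOCK_REFCLOCK = 200  # assumption: stock value implied by "15% OC (230 refclock)"


def oc_percent(refclock, stock=STOCK_REFCLOCK):
    """Overclock as a percentage over the stock refclock."""
    return (refclock - stock) * 100 / stock


def refclock_for(percent, stock=STOCK_REFCLOCK):
    """Refclock needed to reach a given OC percentage."""
    return stock * (100 + percent) / 100


# Refclock values mentioned in this thread.
for refclock in (210, 230, 240, 243, 250, 260):
    print(f"refclock {refclock}: {oc_percent(refclock):.1f}% OC")

print("12.5% OC needs refclock", refclock_for(12.5))
```

So the "expected" 230 is a 15% OC, the 250 mark is 25%, and 260 is 30%.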
 
Sorry for the newb question, but just so I know I am looking for the right board.

It's all boards that begin with H8QGL, H8QG6 & H8QGi, correct? Do I need to worry about what comes after that? For example, would a Supermicro H8QGi-F-O be considered an H8QGi board and be good to go despite the -F-O at the end?
 
Nope, you don't need to worry.
All 4p G34 SM boards are currently supported.
 
With release 2 of the H8QG6/i BIOS, I am now running at a 240 refclock. 6903 is getting about 14m20s TPF. Turns out I had to bump up the vcore voltage on the 6166HEs. For some reason I could not boot with a refclock above 220 with release 1 of the BIOS. Anyway, if 240 is stable I will push it a bit more. Definitely a happy camper now :D
 
I think one last question.

I can run one CPU at a time, correct? So I can get the motherboard, 1GB module of DDR3 1333 CL7 (or should each CPU have 2x1GB DDR3 1333 CL7?), 850W PSU (true 850W PSU from like Antec, Corsair, Seasonic) and I'll be good to go with one CPU and add RAM/CPU as I go?
 