
2P crashes

Spazturtle

Specs:
CPU1: ZS232245TGG44 matches 6276
CPU2: ZS232245TGG44 matches 6276
Mobo: Supermicro h8qgi-f
Ram: 8 sticks of crucial ballistix tactical 2gb @ 1600mhz cas 8

I have run memtest and it fully passed twice.
With the BMC enabled, the first boot is fine, but after a reboot I get 6 beeps and the power has to be cut before it will boot again.
With the BMC disabled it boots fine.

Without the kraken installed it crashes about 3 times a week.
With the kraken installed it crashes about twice a day.

I have tried multiple installs (Debian 6, Debian 7, Ubuntu) and all behave the same.

There is definitely no overheating going on with the CPUs, as they sit at around 30°C and tpc -htc shows no signs of past or present overheating.
The board should be well cooled with 2*230mm, 1*200mm and 1*140mm fans in the case, standoffs are used.
 
Kraken makes CPUs work harder so it's not surprising that it aggravates your symptoms.

Are you overclocked? Or are you using the psmax 1 trick? Or none of the above?

Please post output of:
Code:
sudo tpc -l
sudo tpc -CM
w.r.t. the latter -- run for 30 seconds, then press Ctrl+C to interrupt.

Next, what are the exact symptoms of a crash? Reboot? Halt? FahCore crash?

Also, run this (and paste the output):
Code:
sudo egrep -i 'mcelog|machine.check|hardware.error' $(ls -rt /var/log/{messages,syslog}*)
 
Describe the crashes. With 8 sticks run memtest a lot longer... 6+ hours. Usually having to unplug/turn off the PSU is a good sign of it dying.
 
result of tpc -l:
http://pastebin.com/0mkHShJn

result of tpc -CM:
http://spazturtle.co.uk/tpc-cm.txt

Usually it's a reboot, but on some occasions it has just locked up (a few of those times the screen said something about a CPU soft lockup, other times it was just a black screen).

The command you gave me gives me this:
Code:
spazturtle@speedstar:~$ sudo egrep -i 'mcelog|machine.check|hardware.error' $(ls -rt /var/log/{messages,syslog}*)
ls: cannot access /var/log/messages*: No such file or directory
spazturtle@speedstar:~$ sudo egrep -i 'mcelog|machine.check|hardware.error' $(ls -rt /var/log/syslog*)
spazturtle@speedstar:~$

It is a few-year-old 750 W PSU, but I would rather try to diagnose the issue than just randomly replace parts.
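
The first egrep run above stopped on the missing /var/log/messages* files. A minimal sketch of the same search that skips log files that do not exist is below. The function name scan_hw_errors is made up for illustration, and unlike the original command it does not sort the files by modification time.

```shell
# Search whichever of the given log files exist for machine-check messages.
# The pattern is the one from the egrep command in this thread.
# scan_hw_errors is a hypothetical helper name, not part of any tool.
scan_hw_errors() {
    n=$#
    # Rebuild the argument list, keeping only readable regular files.
    while [ "$n" -gt 0 ]; do
        f=$1
        shift
        n=$((n - 1))
        if [ -f "$f" ] && [ -r "$f" ]; then
            set -- "$@" "$f"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no readable log files given" >&2
        return 2
    fi
    # -H prefixes every hit with its file name, even for a single file.
    grep -E -i -H 'mcelog|machine.check|hardware.error' "$@"
}
```

Run as root, it could be called as scan_hw_errors /var/log/messages* /var/log/syslog*. A glob that matches nothing is passed through literally and then skipped. An exit status of 1 with no output means grep found no matching lines.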
 
Don't overclock the 6200's manually. There is more than just clockspeed and volts going on.

Go turbo mode, then sudo tpc -psmax 1

Then dink with the pstate 1 mode to boost further.

I'm running pstate 1 (turbo all core), which for me is 3000 and 1.200v. Then I bumped the clock to 3200. Steady as a rock so far.

But when I changed just the top non-turbo pstate, I couldn't get 2.9 GHz stable even at 1.350 V.

Just those simple changes went from 21:13 TPF down to 18:03 TPF.

But you must watch the temps. All Core Turbo lights it up.
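
For reference, the TPF change quoted above works out as follows. This is only arithmetic on the two figures in this post; the helper names tpf_seconds and tpf_reduction are made up for illustration.

```shell
# Convert a TPF given as MM:SS into seconds.
tpf_seconds() {
    m=${1%%:*}
    s=${1##*:}
    # Drop one leading zero so "03" or "08" is not read as octal.
    m=${m#0}; m=${m:-0}
    s=${s#0}; s=${s:-0}
    echo $((m * 60 + s))
}

# Print the reduction in frame time from $1 to $2, in percent with
# one decimal place (truncated, integer arithmetic only).
tpf_reduction() {
    before=$(tpf_seconds "$1")
    after=$(tpf_seconds "$2")
    tenths=$(( (before - after) * 1000 / before ))
    printf '%d.%d\n' $((tenths / 10)) $((tenths % 10))
}
```

tpf_reduction 21:13 18:03 gives 14.9, so each frame takes roughly 15% less time (1273 s down to 1083 s).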
 
Ok, so 3.0 and 1.175 V. To take vcore out of the equation I'd recommend going to 1.2000 V.

Next, your P-states look weird, one CPU in 2, the other in 6.
Given you're overclocking I'd recommend turning PowerNow off (in the BIOS) so Linux doesn't make CPUs change P-states.

Once you do that, -CM should consistently show ps 2 [also in the stats printed after 30s].
 
Don't overclock the 6200's manually. There is more than just bclk and volts going on.

Go turbo mode, then sudo tpc -psmax 1

Then dink with the pstate 1 mode to boost further.
That actually will not work.
Once you lock CPUs in P-state 1, no adjustments of P-state 1 will _ever_ take effect as no P-state transitions will occur any more.
 
It's working on the ASUS board. That's all I did. Clockspeed utility and TPF both agreed as I went from 2600, to 3000 to 3100 to 3200.

What is puzzling is I thought you had to change pstates. It instantly overclocked. And I "thought" I had turbo off. Next time I reboot I'll take a look. I must be mistaken.
 
It would depend on the definition of "dink" I suppose - if "dinking" involved switching the machine in and out of pstate 1, it would work.
 
What is puzzling is I thought you had to change pstates. It instantly overclocked. And I "thought" I had turbo off. Next time I reboot I'll take a look. I must be mistaken.
That is worth verifying.

What TPFs are you getting at 3200?
 
Ok, so 3.0 and 1.175 V. To take vcore out of the equation I'd recommend going to 1.2000 V.

Next, your P-states look weird, one CPU in 2, the other in 6.
Given you're overclocking I'd recommend turning PowerNow off (in the BIOS) so Linux doesn't make CPUs change P-states.

Once you do that, -CM should consistently show ps 2 [also in the stats printed after 30s].

Oops, PowerNow got turned on today when I reset the BIOS, so only 1 crash has happened while PowerNow was enabled; all the others happened while it was disabled.

I have set it to 1.2 V and it crashes/reboots after a few minutes.
 
Can you show me outputs [w/o fah running]:

Code:
sudo tpc -CM
Let it run for 30s, then press Ctrl+C. Do _not_ send complete output to a file, just paste it once it dumps the stats.

Code:
sudo tpc -l

Code:
sudo clockspeed
 
Just ramblin' -

When I got back in bigadv, I bought the ASUS mobo and a couple of 6128's.
FAIL.
So I got some 12-core ES chips. FAIL. xxx43's aren't stable while folding.
FAIL.
So I got a 4P 8356 machine.
FAIL.
Then I got 8384? chips for it.
FAIL.

I fold by FAIL method. First Attempt Is Lame.

If I would learn to follow instructions, my life would be much simpler, but not as challenging... :D
 
So just this part:
Code:
Ts:113151 - 
Node 0	c0:ps2 - c1:ps2 - c2:ps2 - c3:ps2 - c4:ps2 - c5:ps2 - c6:ps2 - c7:ps2 - Tctl: 27
Node 1	c0:ps2 - c1:ps2 - c2:ps2 - c3:ps2 - c4:ps2 - c5:ps2 - c6:ps2 - c7:ps2 - Tctl: 27
Node 2	c0:ps2 - c1:ps2 - c2:ps2 - c3:ps2 - c4:ps2 - c5:ps2 - c6:ps2 - c7:ps2 - Tctl: 27
Node 3	c0:ps2 - c1:ps2 - c2:ps2 - c3:ps2 - c4:ps2 - c5:ps2 - c6:ps2 - c7:ps2 - Tctl: 26
Node0
 C0:     0     0   582     0     0     0     0       C1:     0     0   582     0     0     0     0
 C2:     0     0   582     0     0     0     0       C3:     0     0   582     0     0     0     0
 C4:     0     0   582     0     0     0     0       C5:     0     0   582     0     0     0     0
 C6:     0     0   582     0     0     0     0       C7:     0     0   582     0     0     0     0
Node1
 C0:     0     0   582     0     0     0     0       C1:     0     0   582     0     0     0     0
 C2:     0     0   582     0     0     0     0       C3:     0     0   582     0     0     0     0
 C4:     0     0   582     0     0     0     0       C5:     0     0   582     0     0     0     0
 C6:     0     0   582     0     0     0     0       C7:     0     0   582     0     0     0     0
Node2
 C0:     0     0   582     0     0     0     0       C1:     0     0   582     0     0     0     0
 C2:     0     0   582     0     0     0     0       C3:     0     0   582     0     0     0     0
 C4:     0     0   582     0     0     0     0       C5:     0     0   582     0     0     0     0
 C6:     0     0   582     0     0     0     0       C7:     0     0   582     0     0     0     0
Node3
 C0:     0     0   582     0     0     0     0       C1:     0     0   582     0     0     0     0
 C2:     0     0   582     0     0     0     0       C3:     0     0   582     0     0     0     0
 C4:     0     0   582     0     0     0     0       C5:     0     0   582     0     0     0     0
 C6:     0     0   582     0     0     0     0       C7:     0     0   582     0     0     0     0
MinTctl:24	 MaxTctl:28

tpc -l:
http://pastebin.com/UsFcQC4G

sudo: clockspeed: command not found
and apt-get was unable to locate it.
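
A quick way to confirm that output like the above has every core in the same P-state is to list the distinct psN values it contains. This is a rough sketch that assumes the "cN:psN" layout shown in this post; pstates_seen is a made-up name and the input is read from stdin.

```shell
# List the distinct P-states found in tpc -CM style lines on stdin,
# e.g. "Node 0  c0:ps2 - c1:ps2 - ...". The stats lines ("C0: 0 0 582")
# contain no "psN" token and are ignored.
pstates_seen() {
    grep -o 'c[0-9][0-9]*:ps[0-9]' | sed 's/.*://' | sort -u | tr '\n' ' ' | sed 's/ $//'
}
```

With the output pasted into a file, pstates_seen < file prints a single value such as ps2 when all cores agree, and several values (for example ps2 ps6) when they do not.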
 
Code:
root@speedstar:/home/spazturtle/ocng-utils-4.2# clockspeed
Clockspeed (OCNG4.2)
Family 15h
Turbo is supported. 2 boost state(s).
Running, please wait...
Refclock: 200.000 MHz
Clockspeed: 3000.000 MHz

So this is what you want?
 
Yup. Ok. everything seems fine... ugh.

I'd lean towards the PSU (as it lasts even shorter at 1.2000 V) but, to be thorough, I'd exercise Qinsp's method, too.

You'd need to enable PowerNow/Turbo/CPB in the BIOS, then:

Code:
sudo tpc -psmax 1
sudo tpc -set ps 1 freq 3000 vcore 1.1750

Then check with clockspeed whether it got insta-applied -- should see 3000 MHz and... try folding afterwards (w/Kraken, ofc).
 
Nope, after doing that it shows ~2600 MHz.
 
Yeah, I suspected that...

What if... what if you reboot the machine (with powernow enabled and all, just like before) and then
you run the two tpc commands in reverse.

That should yield better results.
 
Ok, did them the other way around and clockspeed now says 3000, let's see how this folds.
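
For anyone repeating this, the order that worked here (set P-state 1 first, then lock to it) could be wrapped up roughly as below. The tpc options are the ones used in this thread; the run and apply_ps1_overclock helpers and the DRY_RUN switch are made up for illustration. It assumes a fresh boot with PowerNow/Turbo/CPB enabled in the BIOS.

```shell
# With DRY_RUN=1 (the default here) commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Program P-state 1 first, then cap the CPUs at P-state 1.
# In the opposite order clockspeed still reported ~2600 MHz in this thread.
apply_ps1_overclock() {
    freq=$1
    vcore=$2
    run sudo tpc -set ps 1 freq "$freq" vcore "$vcore"
    run sudo tpc -psmax 1
}
```

apply_ps1_overclock 3000 1.1750 prints the two commands in order; with DRY_RUN=0 it runs them. Check the result with clockspeed afterwards.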
 
I'm a retard. While I'm not at work where the boxen are, I think I had to:

sudo tpc -psmax 2
sudo tpc -psmax 1

the last time I dinked with it. BUT, I don't think I did the first couple of times. I'll know in the AM. I'll recreate what I did and check the BIOS.

AFAIK - Here's the rub: AMD configured and tested All Core Turbo. They made all the adjustments required to have all mobo configs work with those settings. There might be settings we are not even aware of. AND they would put in a safety factor to allow for HW variation.

So a 2.6ghz retail chip has a pre-configured SOLID overclock setup built into it, with more room left to play. ES's aren't necessary really. As the multiplier will go a LOT higher under turbo than the advertised clock rate.

Mine apparently has a multiplier to go 3400. Currently my CPU temps probably won't handle it.
 
Tear:

Should I get the heat under control, to use the top Turbo clock (2 core) do I go:

sudo tpc -psmax 0 ?

Seems that is iffy. A lot of code uses 0 as default, or no changes.

If not, how do I force top turbo?
 
Forcing ps 0 has been researched without success.
 
Hmmm...

Is there a Vector Table for the addresses of the pstate configs?

In olden days, we used to edit BIOS shadow addresses to redirect vectors for BIOS subroutines.

Cliff Notes: If the BIOS data is mirrored, you can change the pointer.
 
OR ... Is there a way to read all the settings of the pstate 0?

Problem there, is you might not be able to decode the info without info from AMD.
 
P-state data are in the CPU, not in the memory.

Start with reading AMD Family 15h BKDG (models 00-0Fh), AMD pub 42301.
 
Gotcha. Thanks!

But if we knew what the pstate 0 settings were, we could emulate it to pstate 1?
 
Determining P-state 0 settings isn't difficult (see MSR C001_0064).

The issue is that P-state 1 does not accept those settings (at least it wasn't accepting them last time I checked).
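
A rough sketch of reading that register's fields follows. It assumes the Family 15h (models 00-0Fh) layout as I read it in the BKDG: CpuFid in bits 5:0, CpuDid in bits 8:6, CpuVid in bits 15:9, and core frequency = 100 MHz * (CpuFid + 16) / 2^CpuDid. Verify this against pub 42301 before relying on it. decode_pstate is a made-up helper name.

```shell
# Decode the low 16 bits of a P-state MSR value given as hex
# (with or without a leading 0x). Field layout is an assumption
# taken from the Family 15h BKDG; check it against the document.
decode_pstate() {
    hex=${1#0x}
    # Keep only the last four hex digits so a full 64-bit value
    # cannot overflow shell arithmetic.
    low=$(printf '%s' "$hex" | sed 's/.*\(....\)$/\1/')
    val=$((0x$low))
    fid=$((val & 0x3F))
    did=$(( (val >> 6) & 0x7 ))
    vid=$(( (val >> 9) & 0x7F ))
    mhz=$(( (100 * (fid + 16)) >> did ))
    echo "fid=$fid did=$did vid=$vid mhz=$mhz"
}
```

With msr-tools installed and the msr kernel module loaded, usage could look like decode_pstate "$(sudo rdmsr -p 0 0xc0010064)". The VID is printed raw; converting it to volts is left to the BKDG tables.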
 
Still crashes, the crashes have also got more common, gonna try taking a multimeater to the 12v rail.
 
DOH!! Sorry.

What happens if you run just 1 stick of RAM per CPU? (try to take mem out of the equation).

Only partially related, the GG43 ES's I tried to run would fold ~10 frames and crash. The machine ran fine except when folding.

You might try to swap CPU1 and CPU2? CPU1 is the "brains" of the outfit.

Just trying to throw out possibilities, regardless of how dumb they are.:D
 
DOH!! Sorry.

What happens if you run just 1 stick of RAM per CPU? (try to take mem out of the equation).
He already memtested the memory. Issues occur at load, why would it be memory?

Only partially related, the GG43 ES's I tried to run would fold ~10 frames and crash. The machine ran fine except when folding.
But.. he's running B0 chips ?

You might try to swap CPU1 and CPU2? CPU1 is the "brains" of the outfit.
Sorry, but this I can't let go.

The brains? Please explain, I'd like to know more how CPU2 is not the "brains".
 
Wait. Is a MultiMeater something I saw on an infomercial that makes Hero sandwiches?
 
He already memtested the memory. Issues occur at load, why would it be memory?


But.. he's running B0 chips ?


Sorry, but this I can't let go.

The brains? Please explain, I'd like to know more how CPU2 is not the "brains".

Weak stick or socket? Does memtest heat the board 100%?

Yeah, that's why I said GG43. Not related.

Somewhere I read that in a multiCPU board, it boots BIOS from CPU1. Hence might affect what is written in BIOS?

Don't remember where I read it, or how long ago.

Dunno. When I had a "dead" 4P 8400, swapping around CPU's fixed it. But they were not ES.
 
Still crashes, the crashes have also got more common, gonna try taking a multimeater to the 12v rail.
Do you have IPMI on that board? Could use it to check CPU supply voltage, too.
We do have a Linux tool to check voltages on SM boards but ASUS locked things out :(

Also, I would consider testing CPUs individually (at your desired 3.0 and 1.175V).

That is, to test CPU1:
1. Set up your OC as usual
2. Make sure cores are wrapped w/Kraken
3. Run with -smp 16
4. Confirm that CPU1 temp has gone up

To test CPU2:
1. Set up your OC as usual
2. Unwrap cores (thekraken -u)
3. Wrap cores + set startcpu to 16 (thekraken -i -c startcpu=16)
4. Run with -smp 16
5. Confirm that CPU2 temp has gone up

To restore initial Kraken config, unwrap and wrap the cores back w/o any config options, that is:
1. thekraken -u
2. thekraken -i
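
The wrap/unwrap steps above could be scripted roughly like this. The thekraken options are the ones listed in this post; the run and wrap_for_cpu helpers and the DRY_RUN switch are made up for illustration. Starting the client with -smp 16 and watching the CPU temperature are still done by hand.

```shell
# With DRY_RUN=1 (the default here) commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Re-wrap the cores so a 16-thread run lands on CPU1 or CPU2.
# "1" is also the default config, i.e. the restore step.
wrap_for_cpu() {
    case $1 in
        1)
            run thekraken -u
            run thekraken -i
            ;;
        2)
            run thekraken -u
            run thekraken -i -c startcpu=16
            ;;
        *)
            echo "usage: wrap_for_cpu 1|2" >&2
            return 2
            ;;
    esac
}
```

wrap_for_cpu 2 prints (or, with DRY_RUN=0, runs) the unwrap and the startcpu=16 wrap; wrap_for_cpu 1 puts the default wrapping back.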
 
The brains? Please explain, I'd like to know more how CPU2 is not the "brains".

I've heard people say that before; somebody said that the configuration data the BIOS reads is in CPU1 and that swapping the chips over can help in some cases.


EDIT:
Yeah, I'd have to turn the BMC back on to use IPMI, right? That shouldn't be a problem with Gi boards I think.

Also thanks for all the help so far, I was starting to get really down and your help has cheered me up, so thanks.
And some of the info you posted here is really useful so maybe putting it in a guide would help others.
 