how much difference does hyperthreading make?

jebo_4jc · Jan 7, 2013

On Sandy Bridge/Ivy Bridge, what kind of PPD difference does HT make? Has anybody done any clock-for-clock comparisons recently?

Microcenter only has mobo combo deals with the 3570K right now, not the 3770K, so the 3570K ends up being drastically cheaper.

Toconator · Jan 7, 2013

Hey jebo, I fold with both of those CPUs. Although they are 300 MHz apart right now (I had the 3570K @ 4.5 GHz, but it would randomly crash when my multitasking teen was playing League of Legends and Evony on it with Skype and Facebook open), the slight drop in MHz didn't affect the PPD drastically. The 3770K also does GPU folding 24/7, whereas the 3570K is a part-time GPU folder. Some recent results (3570K / 3770K):

P6945 : 20.4K / 32.4K ppd
P7200 : 18.2K / 26.3K ppd
P6985 : 21.5K / 33.4K ppd
P6980 : 20.7K / 35.3K ppd
P6997 : 21.6K / 34.3K ppd

The 3770K rig also has slightly faster RAM, but I'm not sure it has the same TPF effect as on the bigadv rigs. Gaming is virtually identical between the two; my older hex-core actually took less of a hit while gaming and folding at once (couldn't afford a 3930K rig, though).
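For a rough sense of the gap, the per-project PPD ratio of the two rigs can be computed from the figures above. This is only a sketch: the rigs differ in clock (about 300 MHz), RAM speed and GPU load, so the ratio mixes all of those effects with HT and should not be read as the HT gain by itself.

```python
# PPD in thousands from the results above: (3570K, 3770K).
# Caveat: clocks, RAM and GPU load differ between the rigs,
# so these ratios are not a clean HT-on vs HT-off comparison.
results = {
    "P6945": (20.4, 32.4),
    "P7200": (18.2, 26.3),
    "P6985": (21.5, 33.4),
    "P6980": (20.7, 35.3),
    "P6997": (21.6, 34.3),
}


def ppd_ratios(data):
    """Return {project: 3770K PPD / 3570K PPD}."""
    return {proj: i7 / i5 for proj, (i5, i7) in data.items()}


ratios = ppd_ratios(results)
for proj, r in ratios.items():
    print(f"{proj}: 3770K gets {r:.2f}x the PPD of the 3570K")

mean_ratio = sum(ratios.values()) / len(ratios)
print(f"mean ratio: {mean_ratio:.2f}x")
```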

HoosierDad · Jan 8, 2013

I have read that it increases Folding performance by 20-30%.

extide · Jan 8, 2013

Hyperthreading helps quite a bit on Folding@home, enough to be noticeable. I would swing for the i7 if possible. The i5s do fold well, but yeah, there is definitely a pretty good boost with HT on the i7s.

Skripka · Jan 8, 2013

extide said:
Hyperthreading helps quite a bit on Folding@home, enough to be noticeable. I would swing for the i7 if possible. The i5s do fold well, but yeah, there is definitely a pretty good boost with HT on the i7s.

And SMP performance on the same hardware only declines over time, so if you like those numbers, get the HT part now... those numbers will only go south as time goes on. My 1055T x6 is already getting 25-minute TPF SMP units that output jack shit for PPD. Hell, for the money, buy GPUs; when QRB GPU units come, they'll render SMP utterly pointless in the points race... at least far more irrelevant than SMP is now.

Toconator · Jan 9, 2013

Skripka said:
And SMP performance on the same hardware only declines over time, so if you like those numbers, get the HT part now... those numbers will only go south as time goes on. My 1055T x6 is already getting 25-minute TPF SMP units that output jack shit for PPD. Hell, for the money, buy GPUs; when QRB GPU units come, they'll render SMP utterly pointless in the points race... at least far more irrelevant than SMP is now.

Yep, SMP does decline over time. My 1090T used to get 20K ppd, now it gets slightly less than the 3570K on the same project. As far as irrelevance, SMP doesn't impact other tasks as much as GPU folding. If it's not a dedicated folding box, SMP is still worth running, GPU QRB or not.

Kendrak · Jan 9, 2013

HoosierDad said:
I have read that it increases Folding performance by 20-30%.

That is a fairly good measure.

musky · Jan 9, 2013

I always thought it was closer to 50% - a HT logical core is roughly half a physical core, so enabling HT dropped frame times by roughly 33%. I don't have any numbers to back that up, though.
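The arithmetic behind that estimate can be written out under an idealized model (an assumption: the work is perfectly parallel and HT simply adds a fixed fraction of a physical core's throughput to every core, with no cache, memory or lock bottlenecks). If a logical core is worth half a physical core, throughput rises 1.5x and time per frame falls to 1/1.5 of the original, i.e. about a 33% drop.

```python
def tpf_drop_from_ht(extra_per_core: float) -> float:
    """Fractional drop in time per frame (TPF) when HT adds
    `extra_per_core` of a physical core's throughput to every core.

    Idealized model (assumption): perfectly parallel work and no
    cache, memory or lock bottlenecks.
    """
    throughput = 1.0 + extra_per_core   # e.g. 1.5x for "half a core"
    new_tpf = 1.0 / throughput          # relative to an old TPF of 1.0
    return 1.0 - new_tpf


print(f"HT = half a core:    TPF drops {tpf_drop_from_ht(0.5):.1%}")
print(f"HT = quarter a core: TPF drops {tpf_drop_from_ht(0.25):.1%}")
```

By the same model, a 20-25% throughput gain shortens TPF by only about 17-20%.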

Kendrak · Jan 9, 2013

musky said:
I always thought it was closer to 50% - a HT logical core is roughly half a physical core, so enabling HT dropped frame times by roughly 33%. I don't have any numbers to back that up, though.

I do remember seeing some data with those marks at lower clocks.

I do think, however (I could be horribly wrong), that as the memory bus gets saturated, HT made less and less impact, which is why we have crazily overclocked 2500Ks matching merely highly clocked 2600Ks in TPF, like a 5 GHz i5 vs. a 4.5 GHz i7.

I would love to see some real data if anyone has it......

extide · Jan 9, 2013

Kendrak said:
I do remember seeing some data with those marks at lower clocks.

I do think, however (I could be horribly wrong), that as the memory bus gets saturated, HT made less and less impact, which is why we have crazily overclocked 2500Ks matching merely highly clocked 2600Ks in TPF, like a 5 GHz i5 vs. a 4.5 GHz i7.

I would love to see some real data if anyone has it......

It's really more the L1/L2 and partly L3 cache contention that limits you. Remember there is only 256 KB of L2 per core on modern Intel CPUs, so split that in half. I am not sure about the execution side, but on the front end, the really fast decoders and micro-op cache in later-generation processors help quite a bit in this instance as well.

jebo_4jc · Jan 9, 2013

Well, a very helpful Microcenter rep ended up giving me the bundle price on the i7, so I got the 3770K plus a good Asus mobo for $360.

Kendrak · Jan 9, 2013

jebo_4jc said:
Well, a very helpful Microcenter rep ended up giving me the bundle price on the i7, so I got the 3770K plus a good Asus mobo for $360.

I think you owed him a hug.

Skripka · Jan 9, 2013

Toconator said:
Yep, SMP does decline over time. My 1090T used to get 20K ppd, now it gets slightly less than the 3570K on the same project. As far as irrelevance, SMP doesn't impact other tasks as much as GPU folding. If it's not a dedicated folding box, SMP is still worth running, GPU QRB or not.

My point was that when GPU QRB comes back, the few PPD the SMP will get is barely worth the electricity compared to the GPUs....Assuming the GPU QRB returns as what the beta units were, which considering this is Stanford we're talking about is a very reasonable assertion.

cyclone3d · Jan 9, 2013

musky said:
I always thought it was closer to 50% - a HT logical core is roughly half a physical core, so enabling HT dropped frame times by roughly 33%. I don't have any numbers to back that up, though.

No way you will get anywhere close to 33% increase with HT. An HT core is nothing like 50% of a real core.

The max possible, on a program written specifically with multithreading in mind and with absolutely no locks in the code, is a 20-25% speed increase, and that's with the ability to adjust the data sets so they fit in the cache.

If the data sets are too big, the increase drops quite a bit.

If there are locks in the code, which is most likely required for folding, you won't even see that high of an increase.

jebo_4jc · Jan 9, 2013

cyclone3d said:
No way you will get anywhere close to 33% increase with HT. An HT core is nothing like 50% of a real core.

The max possible, on a program written specifically with multithreading in mind and with absolutely no locks in the code, is a 20-25% speed increase, and that's with the ability to adjust the data sets so they fit in the cache.

If the data sets are too big, the increase drops quite a bit.

If there are locks in the code, which is most likely required for folding, you won't even see that high of an increase.

If PPD increased linearly as TPF decreases, you might be right. However, PPD increases much more quickly, hence the claims of 30-50% more PPD.
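A minimal sketch of why PPD outpaces TPF, assuming the quick-return bonus (QRB) formula as I understand it from the Folding@home points FAQ: credit = base × max(1, sqrt(k × deadline / elapsed)). With elapsed time proportional to TPF, PPD then scales roughly as TPF^-1.5 once the bonus applies. The base points, k and deadline below are hypothetical values for illustration, not the parameters of any real project, and upload/download time is ignored.

```python
import math


def qrb_ppd(tpf_s: float, base: float, k: float, deadline_days: float,
            frames: int = 100) -> float:
    """Estimated points per day for one work unit under the QRB.

    Assumes credit = base * max(1, sqrt(k * deadline / elapsed)),
    with elapsed = frames * TPF (upload/download time ignored).
    """
    elapsed_days = frames * tpf_s / 86400.0
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base * bonus / elapsed_days


# Hypothetical unit parameters (illustration only).
BASE, K, DEADLINE = 500.0, 26.4, 6.0

slow = qrb_ppd(240, BASE, K, DEADLINE)   # 4:00 TPF
fast = qrb_ppd(200, BASE, K, DEADLINE)   # 3:20 TPF
print(f"TPF drop: {1 - 200 / 240:.1%}")    # 16.7%
print(f"PPD gain: {fast / slow - 1:.1%}")  # 31.5%, i.e. (240/200)**1.5 - 1
```

Setting k = 0 turns the bonus off, and the same function then gives a PPD gain equal to the throughput gain (1.2x here), which is the linear case.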

musky · Jan 9, 2013

cyclone3d said:
No way you will get anywhere close to 33% increase with HT. An HT core is nothing like 50% of a real core.

The max possible, on a program written specifically with multithreading in mind and with absolutely no locks in the code, is a 20-25% speed increase, and that's with the ability to adjust the data sets so they fit in the cache.

If the data sets are too big, the increase drops quite a bit.

If there are locks in the code, which is most likely required for folding, you won't even see that high of an increase.

Prove it, with the F@H client and an HT-capable processor. I would test it if I owned one.

Haitch · Jan 9, 2013

musky said:
Prove it, with the F@H client and an HT-capable processor. I would test it if I owned one.

Testing it on an i7-980 now. Results in a couple of hours.

H.

musky · Jan 9, 2013

Haitch said:
Testing it on an i7-980 now. Results in a couple of hours.

H.

Just need frame time with HT and without HT, same unit. I am curious now....

Haitch · Jan 9, 2013

Ran test on an i7-980 @ 3.46GHz
Memory: 18GB @ 1600MHz

OS: Ubuntu (12.04?)
Kernel: 3.2.0.29-generic X64

Using The-Kraken (0.7-pre15)

WU - 7215 (R=0,C=40,G=220)

Measured Steps 1 - 6, DLB engaged before Step 1

No HT - 6 Cores, 6 Threads
TPF: 3:21
PPD: 26,446
Watts: 216
PPD/Watt: 122

HT - 6 Cores, 12 Threads
TPF: 2:55
PPD: 32,554
Watts: 241
PPD/Watt: 135

TPF Improvement: 13%
PPD Improvement: 23%
Watts Increase: 11.6%
PPD/Watt Improvement: 10.6%

Not as much as I anticipated.
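The percentages above can be recomputed from the raw TPF, PPD and wattage figures. TPF, PPD and wattage match the post after rounding; PPD/Watt computed from the unrounded values comes out at about 10.3% rather than 10.6%, since the posted figure was derived from the rounded 122 and 135 PPD/Watt values.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' time-per-frame string to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Measurements from the post above (i7-980 @ 3.46 GHz, WU 7215).
no_ht = {"tpf": to_seconds("3:21"), "ppd": 26446, "watts": 216}
ht = {"tpf": to_seconds("2:55"), "ppd": 32554, "watts": 241}

# Lower TPF is better, so report the reduction as a positive number.
tpf_improvement = -pct_change(no_ht["tpf"], ht["tpf"])
ppd_improvement = pct_change(no_ht["ppd"], ht["ppd"])
watts_increase = pct_change(no_ht["watts"], ht["watts"])
ppd_per_watt_improvement = pct_change(no_ht["ppd"] / no_ht["watts"],
                                      ht["ppd"] / ht["watts"])

print(f"TPF improvement:      {tpf_improvement:.1f}%")
print(f"PPD improvement:      {ppd_improvement:.1f}%")
print(f"Watts increase:       {watts_increase:.1f}%")
print(f"PPD/Watt improvement: {ppd_per_watt_improvement:.1f}%")
```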

musky · Jan 9, 2013

Well, color me wrong....thanks for the data Haitch!

tjmagneto · Jan 10, 2013

musky said:
Well, color me wrong....thanks for the data Haitch!

I've checked my big box of crayons and still haven't found the color "wrong" yet.

Quisarious · Jan 10, 2013

Maybe not so wrong.

As pointed out above, cache performance is a big factor in how much HT helps (or hurts). One of the most significant changes in SB was increased L1 bandwidth.

So, looking at SB+ generations, the picture may be different.

Here's one data point I just ran on a 3770K @ 4.3 GHz using the latest OS X client...

Project: 7515 (Run 0, Clone 166, Gen 81)

Same WU, average of 10 steps each, though there were only 2 seconds of variation in each case.

HT on:
3:08 TPF, PPD=33679

HT off:
3:58 TPF, PPD=23644

PPD improvement=42%

I don't have time atm to test on a 2P SB-E system, but I suspect the results would be similar (I'll try it tomorrow).

EDIT: forgot watt measures... Load power minus idle power was 95 W with HT on and 79 W with HT off.

EDIT: Second edit, forgot to subtract monitor power from load power...doh
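A short sketch recomputing the figures above from the posted TPF, PPD and load-minus-idle power. It also compares the measured PPD gain with an assumed quick-return-bonus scaling, PPD proportional to TPF^-1.5 (an assumption derived from the QRB formula, not something measured in this thread). The predicted and measured gains agree to within rounding, which fits the idea that PPD grows much faster than TPF shrinks.

```python
def seconds(mmss: str) -> int:
    """Convert an 'M:SS' time-per-frame string to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


# Figures from the post above (3770K @ 4.3 GHz, Project 7515).
# delta_w is load power minus idle power, as posted.
ht_on = {"tpf": seconds("3:08"), "ppd": 33679, "delta_w": 95}
ht_off = {"tpf": seconds("3:58"), "ppd": 23644, "delta_w": 79}

ppd_gain = ht_on["ppd"] / ht_off["ppd"] - 1
tpf_drop = 1 - ht_on["tpf"] / ht_off["tpf"]
eff_gain = ((ht_on["ppd"] / ht_on["delta_w"])
            / (ht_off["ppd"] / ht_off["delta_w"]) - 1)

# Assumption: with the QRB active, PPD ~ TPF**-1.5, so the PPD ratio
# should be about (TPF_off / TPF_on) ** 1.5.
predicted_gain = (ht_off["tpf"] / ht_on["tpf"]) ** 1.5 - 1

print(f"TPF drop:             {tpf_drop:.1%}")
print(f"PPD gain (measured):  {ppd_gain:.1%}")
print(f"PPD gain (QRB model): {predicted_gain:.1%}")
print(f"PPD per watt (load minus idle): {eff_gain:+.1%}")
```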

W.Feather · Jan 10, 2013

HT made a larger difference when BigAdv was allowed on 8-threaded boxen.

As such, even though its true performance was not 50% extra per core, it could run projects worth more points in the end, so its perceived performance was better.

lundrog · Jan 10, 2013

So what's better, a 3770K at 4.4-4.5 or a 3570K at 4.7-4.8?

I'm running my 3570K @ 4.8 right now; folding hits about 70C with my Antec 620 water kit.

I am waiting for better push/pull fans and case fans to see if I can drop the temp a bit to go for 4.9...

I think I could do 5.0 with a full rad. It seems semi-stable at 4.5 volt but at 80C... ha! Too hot to know if it would be stable with better cooling, and too high a voltage to run, but that was at level 1 vs. the level 2 I'm running now.
