Any real advantage to having a cpu that supports hyper-threading?

Metformin

I have a choice between an X3430 (4 cores, 4 threads) and an X3440 (4 cores, 8 threads). Just wondering if the extra $10.00 is worth it for the X3440.
 
Yes, it helps a lot of things, even single-threaded apps. That is why I always buy an i7 with HT. There was a study comparing HT on and off, and the effect ranged anywhere from -3% to +15%, IIRC. I saw this years ago.

For additional information: comparing a 4.9GHz 6600K with a 4.8GHz 6700K, the 4.8GHz 6700K will be faster in nearly everything due to HT, even with 100MHz less. In some select cases the 6600K may actually be a bit faster, but those are select applications.
 
While it depends on the use case, I'd say for $10 it's an easy yes. But I'm a recent convert to the usefulness of HT. Arma III is the first game I've noticed that actually made use of logical cores to great effect. When I initially set the advanced parameters I put my cpu core count at 6, assuming they only counted physical cores. I realized my mistake and changed it to 12 the other day — my frame rate went from averaging around 34 to an average of 60 with all the other settings left alone. (currently running it at 4k, everything set to ultra or as high as it will go, overall view distance 4000, object distance 3000.)
 
Yeah, the extra logical cores are what make the Core i3 surprisingly smooth for a dual-core, and the Core i7 so dominant in heavily-threaded tasks.

For ten bucks it's a no-brainer. Intel usually charges a 50% markup for it on new processors.
 
If you run more than 4 threads with heavy CPU load, then yes. Depending on code inefficiency, it can give you a boost of around 10-25% at 8 threads.

If you run 4 or fewer heavy CPU threads, then HT will degrade your performance by around 5-10%.

A quick and dirty benchmark:

Cinebench R15, 4 threads (i7 920, 4C/8T)
With HT enabled: 305-307 cb
With HT disabled: 335-337 cb


7-zip benchmark, 4 threads, 32 MB (i7 3770, 4C/8T)
With HT enabled: 177727 KB/s / 195486 KB/s
With HT disabled: 18270 KB/s / 201053 KB/s


Whenever your CPU can accept more threads than it has physical cores, you risk thread conflicts when you don't utilize the CPU fully.
It is the same with AMD's CMT design, since it has only 1 FPU per two cores (and thereby sends 2 threads to 1 FPU).

If you do go for a CPU with HT and game a lot (games typically use a lower number of threads), you can use my Project Mercury to disable HT for your main application and gain a slight CPU performance boost.


Also, the more inefficiently your software is coded, the more gain you get from Hyper-Threading and the smaller the penalty. That is the reason some highly optimized math libraries disable or circumvent HT (Linpack among others).
 
It won't matter for some highly optimized codes that already make the full use of a single core (like ones that do scientific calculations). We usually disable it. But for desktops it certainly helps out a lot.
 
Have you any real proof for your wrong statements? What are those numbers based on? You are spreading a lot of misinformation there without any real basis, with fabricated numbers. Last time I did the same test, last year, I got completely different results from what you state. So you are saying HT degrades performance? What a joke. It's a funny fact that even with HT on, running the benchmark with 4 threads consistently gives a better result, because the other 4 threads can do the additional jobs while only 4 threads are working at 100%.

4C HT OFF
4c4t.PNG


4C HT ON
4c4tHT.PNG


4C/8T
4c8t.PNG
 
See that Cinebench score? You should build a dual-CPU X5650 system. It would beat the crap out of even the latest Skylake on anything heavily threaded. Even better, build a dual-CPU E5-2670 v1; it would more than beat the crap out of any non-Xeon CPU. An X5650 is around $50-60 and an E5-2670 is around $65 nowadays.

x5650 is 6 core/12 threads. while e5-2670 is 8 core/16 threads each cpu :)
 
*sigh* Well, what a way to approach a debate.
You saw the numbers and the CPU they were tested on. Thread conflicts are very real; that you don't know about them doesn't make them false. It's also very obvious if you know what HT actually does.
Anyway, since you don't want to take my word for it, try this instead:
http://www.agner.org/optimize/optimizing_assembly.pdf

"Hyperthreading is Intel's term for running multiple threads in the same processor core. Two
threads running in the same core will always compete for the same resources, such as
cache, instruction decoder and execution units. If any of the shared resources are limiting
factors for the performance then there is no advantage to using hyperthreading. On the
contrary, each thread may run at less than half speed because of cache evictions and other
resource conflicts."

You also reworded what I said, a common technique people use to try to "win" an argument rather than debate the topic.
Your line "so you are saying HT degrade performance?" implies something more absolute than what I said. You imply that I said HT will always degrade performance, which was very much NOT what I said.

I said IF you are running 4 or fewer CPU-heavy threads,
which is also why your next argument is pretty much what I said:
"because the other 4 threads can do the additional job while only 4 threads are working to 100%.."
I assume you mean logical cores or thread execution units. Threads exist in software only; don't mix them up.
But in this case you are stating that you are not running 4 or fewer CPU-heavy threads. You are in fact running more: 4 for Cinebench plus some undefined extra work. So it is not the case I was talking about.


So in short: you rewording what I said and/or failing to understand the subject and test methods doesn't imply I'm wrong or spreading false information.
Now, to figure out why we get different results in a co-operative way: did you actually set Cinebench to use only 4 threads, or did you run it with default settings?
If you ran it with default settings, that would explain your results.
 
Araxie,
since it seems like you prefer pictures over plain text scores, here is a fresh run for you.

The important thing in this test is this: setting it to 4 threads to begin with, to simulate a 4-thread situation.
CBsettings4_T.png



I7 930 4C HT enabled
CB4_T_4_CHT.png


I7 920 4C HT disabled
CB4_T_4_C.png


What I see here is a nice 12% boost from avoiding thread conflicts (2 threads going to the same physical core instead of getting a physical core each).


This has been tested on multiple machines. It's "basic" knowledge for assembler/thread optimizing, and several people can confirm these results.
Besides, it's bloody obvious that getting a physical core for each thread instead of sharing one is better for raw performance.

This is true for any CPU (SMT or CMT) that can accept more threads than it has physical cores but is not given more threads than it has physical cores. It is simply due to the way Windows distributes threads among logical cores.


Let me try to explain it to you:
You have threads 1 to 4, each able to utilize a physical core 80%. The missing 20% is due to data fetches and cache/prediction misses.

Thread 1 (T1) gets assigned to logical core 1 (LC1), which points to physical core 1 (PC1).
T2 gets assigned to one of the 7 remaining LCs, so it has a 100/7 ≈ 14.3% risk of being assigned to LC2, which points to PC1 as well.

Now, since PC1 is 80% busy handling T1, there is only 20% left to give to T2. You get a total core utilization of 100% over 2 threads;
that's only 50% on average per thread.

So 14.3% of the time you risk dropping from 80% CPU utilization to only 50%. Say a 30% drop 14.3% of the time ≈ 4.3% drop in CPU utilization, a direct hit to performance.
If T2 does not go to LC2, and thereby not to PC1, it goes to a free core and gets its nice full performance.

But lo and behold, here comes T3. It has 6 remaining logical cores, 2 of which will conflict with T1 and T2, so a 33% chance of dropped performance just like above.
Say this thread is lucky as well and gets onto, let's say, LC5, which goes to PC3.

The 4th thread now has 5 logical cores to get assigned to, but 3 of these conflict with threads 1, 2 and 3.
That's a 60% chance of taking the performance hit shown above.
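
For anyone who wants to check the placement odds in the walkthrough above, here is a minimal Python sketch. It assumes the same simplified model as the post: 2 logical cores per physical core, each new thread dropped uniformly at random onto a free logical core, and every earlier thread having landed on its own physical core. Real OS schedulers are not uniformly random, so treat this as an illustration of the arithmetic only. The function names are made up for this example.

```python
from fractions import Fraction


def conflict_probabilities(physical_cores, threads):
    """Chance that each new thread lands next to an already busy thread.

    Assumed model (not a real scheduler): 2 logical cores per physical
    core, uniform random placement on a free logical core, and all
    earlier threads sitting on separate physical cores.
    """
    logical = 2 * physical_cores
    probs = []
    for placed in range(1, threads):
        free_logical = logical - placed  # logical cores still unoccupied
        busy_siblings = placed           # one free sibling per busy physical core
        probs.append(Fraction(busy_siblings, free_logical))
    return probs


def no_conflict_probability(physical_cores, threads):
    """Chance that every thread ends up on its own physical core."""
    result = Fraction(1)
    for p in conflict_probabilities(physical_cores, threads):
        result *= 1 - p
    return result


if __name__ == "__main__":
    for number, p in enumerate(conflict_probabilities(4, 4), start=2):
        print(f"T{number}: {p} = {float(p):.1%}")
    print("all four on separate cores:", no_conflict_probability(4, 4))
```

For 4 cores and 4 threads this gives 1/7, 1/3 and 3/5 for T2, T3 and T4, matching the roughly 14%, 33% and 60% in the post. Under the same assumptions, the chance that all four threads get a physical core each is only 8/35.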



You have to remember that HT/logical cores do not give any extra resources; they just let you better utilize what is left unused by the other thread in the physical core.
Luckily, it's pretty easy to handle and work around with proper affinity handling.
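
As a rough illustration of what "proper affinity handling" could look like, here is a small Python sketch that builds an affinity bitmask selecting one logical core per physical core. It assumes the two siblings of each physical core are numbered consecutively (0 and 1 share core 0, 2 and 3 share core 1, and so on); numbering differs between operating systems and platforms, so check the real topology first. The helper name is hypothetical and this is not how Project Mercury is implemented.

```python
def physical_only_mask(physical_cores, threads_per_core=2):
    """Bitmask with only the first logical core of each physical core set.

    Assumption: sibling logical cores are numbered consecutively.
    """
    mask = 0
    for core in range(physical_cores):
        mask |= 1 << (core * threads_per_core)
    return mask


def mask_to_cpus(mask):
    """List the logical core numbers that are set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]


if __name__ == "__main__":
    mask = physical_only_mask(4)
    print(hex(mask), mask_to_cpus(mask))
```

For a 4C/8T CPU this yields the mask 0x55, i.e. logical cores 0, 2, 4 and 6. A hex mask like that is the kind of value the Windows `start /affinity` command takes; on Linux the equivalent would be passing the core list to `os.sched_setaffinity`.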

Sorry for all the typos, but I was typing from work.





-- edit ---

I just realized my math in the above explanation is wrong. You are not dropping 30% 14.3% of the time at thread 2; you are dropping 2x 30% 14.3% of the time, since both T1 and T2 drop in performance.
Or, said otherwise, the total core utilization drops from 2x80% to 2x50%, a drop of 60% of a core.

So it should be 60% * 14.3% ≈ 8.6% loss at T2.

Of course, the numbers 80% and 20% are made up and depend on the threads, and some other liberties have been taken in this percentage calculation. But the underlying argument is valid.
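
The corrected arithmetic from the edit can be written out as a short Python sketch. The 80% solo utilization and the 100% shared total are the made-up numbers from the post, and 1/7 is the chance of T2 landing on T1's sibling under the uniform random placement assumed there. The function name is only for this example.

```python
def expected_loss(solo_util, shared_total, conflict_prob):
    """Expected loss, in fractions of one core, when two threads may collide.

    solo_util:     utilization a thread reaches alone on a physical core
    shared_total:  combined utilization when two threads share one core
    conflict_prob: chance that the second thread lands on the busy core
    """
    lost_when_sharing = 2 * solo_util - shared_total  # e.g. 2 x 80% - 100% = 60%
    return lost_when_sharing * conflict_prob


if __name__ == "__main__":
    loss = expected_loss(solo_util=0.80, shared_total=1.00, conflict_prob=1 / 7)
    print(f"expected loss at T2: {loss:.2%} of a core")
```

With those inputs the expected loss at T2 comes out at roughly 8.6% of a core, in line with the corrected figure in the edit. Changing the made-up utilization numbers changes the result, which is the point of the caveat in the post.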
 
The guy above can't be fucking serious comparing actual cores with mix of real and logical cores (hello,OS scheduler!).

I mean, for fuck's sake, he just compared fake i5 with fake i3 and concluded that fake i3 is, surprise, slower, pretty much.
 
But he's right. HT can sometimes take a performance hit when it's under-utilized. This is why Battlefield 3 and 4 took a small performance hit with HT enabled on a Core i7 (Core i3 saw massive speedup), because the game has four major threads.

But it doesn't happen in every application. And most of the time you won't notice it.

Also, newer CPUs have had better-optimized HT scheduling to counter these negative effects, along with faster cache to mask any performance impact. So of course SvenBent is going to show us the worst-case performance with a first-generation Core with Hyperthreading.

It would be nice if someone could run the same benchmark with a 6700k to show what I'm talking about: there should be a lower performance hit. Anyone got five minutes and a 6700k?

In the meantime, there is this set of gaming benchmarks:

Gaming benchmarks: Core i7 6700K hyperthreading test

Of course, the rest of these benchmarks in this thread are bullshit because we're restricting threads on massively multithreaded benches. Also, the impact is less than 10 percent in most cases. This is just to demonstrate what can happen.
 
Wait, what? Me or Araxie? I'm not sure I follow your statements.



Anyway, for Araxie again, here is a bit more "evidence":


The important setting in WPrime
WPsettings4_T.png



WPrime I7 930 HT enabled
WP4_T_4_CHT.png



WPrime I7 930 HT disabled
WP4_T_4_C.png




7zip 930 HT enabled
7z4_T_4_CHT.png



7zip 930 HT disabled
7z4_T_4_C.png


All of them get a boost when only 4 threads are used and HT is disabled, so there are only the 4 physical cores for the threads to go to.
Now of course, had I used 8 threads, enabling HT would be beneficial.
 

So of course SvenBent is going to show us the worst-case performance with a first-generation Core with Hyperthreading.
It would be nice if someone could run the same benchmark with a 6700k to show what 'm talking about. Anyone got five minutes and a 6700k?

It's not for worst-case reasons. I did show a 3770 in my first post as well (however, only in text).
The reason is that the 930 is my work computer, which I'm sitting at as of this moment.
The 3770 is my home computer.

Please don't directly or indirectly imply a personal bias, accidental or not. It's already hard to explain the issues correctly to people who tend to put fingers in their ears and go LALALLAA IM RIGHT YOU ARE NOT...

I wholeheartedly agree that it depends very much on the situation and software (which was my starting argument) and that a test with a newer architecture would be interesting.
 
Yes, and if you are impacted so much THAT YOU NOTICE IT, it's just a software switch away from turning it off.

That's why I never discourage people from buying hyper-threading processors if it's within their budget: it just buys you greater longevity, even if you turned it off years ago when you bought it. You can turn it on NOW, now that we have many more games with 8 threads.

Also, your 3770k is barely one generation of improvement over your 920. Haswell added more execution units to improve hyper-threading performance, and Skylake added faster cache AND one more execution unit. Get back to me with 6700k results if you want to be taken seriously in your analysis of HT impact in the modern world.
 
I agree it's easy to handle, and you can even do it with just affinity if your software does not provide a switch for it.
My point was not to discourage anyone from buying one (note I never said don't buy it), but to inform about the issues there can be, which you need to take care of if you want maximum performance. I believe that is valid information when the OP is asking about performance.
 
I wanna see these negative returns on a modern processor with HT that this guy is talking about.
 
Haswell added more execution units to improve hyper-threading performance, and Skylake added faster cache AND one more execution unit. Get back to me with 6700k results if you want to be taken seriously in your analysis of HT impact in the modern world.

Again, this topic is about 45nm Xeons with Hyper-Threading. I believe the closest i7 architecture to that is indeed the 900 series.
Correct me if I'm wrong. I'm simply going by what looks most similar on Intel ARK.

I do, however, agree that for the overall picture a wider/newer CPU range would better show the benefits/disadvantages; however, that is out of my scope.


Hmm, if the OP can wait and is interested, I think I might have some Xeon of that generation for my quad-core setup. I just need to find a proper PSU for it.
 
I've got a 6600k @ 4.5. I'll run it tonight. Someone here has a 6700k @ 4.5 as well. I'm not holding my breath to see the 6700k come in slower, because it's not gonna happen, not never.
 
The point is not to compare those two CPUs, but to compare the i7 with and without Hyper-Threading when you are running only 4 heavy CPU threads.
So a benchmark from the i5, in a different system, is not needed; it will just give you wrong results (cache size, memory timings, etc. will pollute the results).
 
Yes, you, because you are comparing i5 with i3 and finding out that i3 is slower. That is the only thing your screenshots show.
And yes, i see your point, but you could at least show actual real world low-thread count workloads instead of making it look plain stupid with embarrassingly parallel synthetics.
 
Again in this topic its about 45nm Xeons with hyper threading i believe the closes I7 architecture to that is indeed the 900 series.
correct me if I'm wrong. I'm simply going but what looks more identical on the Intel ARK

True, forgot that :D
 
I'm sorry, I don't understand at all what you are trying to say with i3 and i5. I'm running this on an old i7, only enabling and disabling HT. Where do you get this i3 and i5 from?

If you have any suggestions for software that is better to use, I'm all ears, but 7-zip is very real-world to me. And in many situations I can't utilize 8 threads with 7-zip due to memory requirements.
It takes a bit more than 16GB of RAM per 2 threads. So 7-zip is often running with 2 or 4 threads only. Currently, using affinity to avoid thread conflicts from HT gives me a boost that can shave a day or 2 off the work.
I simply chose software that I knew I could set the thread count on, and that was not bottlenecked by other system parts, to show the effect.

I'm very open to debate on this, but I did tons of tests and benchmarks on this across both Intel and AMD systems (SMT and CMT) when I implemented the features to avoid this performance drop in my Project Mercury software.


This is an old benchmark from August 2015 on an AMD CMT design with Handbrake at 2 threads (more than 2 threads reduced video quality microminimally):

affinity all = 30-40 fps
affinity 0&1 = 26-30 fps
affinity 0&3 = 38-39 fps

In this case I used affinity to control and avoid the CMT drawbacks,

and the FPS went from 30-40 to 38-39.

Putting the 2 threads on the same CMT unit, and thereby forcing thread conflicts 100% of the time, dropped performance to 26-30 fps.
This is of course very artificial, but it shows the big difference between sharing a physical core and having one each.



But again, if you know of some software that uses only 2-4 CPU-heavy threads that you find more applicable, just drop a link and I will look into it.
 
Your old i7 doesn't count for a relevant discussion though. Most especially when you place a limit in the software. Handbrake, for instance, will use just about every core you can give it. I call BS on your decreases video quality, as well. That makes zero sense.
As far as I know, there isn't a single piece of modern software (where speed matters) that isn't multi-threaded. Your OS, web browsers, games, and productivity software are all multithreaded now.
 
1: Relevance
My CPU has the same architecture as the OP's; I don't know why you don't think it's relevant.

2: Limit in the software
Yes indeed, because we are talking about a specific situation that I'm trying to simulate. It should really not be a surprise that I reduce it to 4 threads to simulate how 4 threads act.
I think you are missing the point of the benchmarks.

3: Video quality
Well, you can call BS all you want, but it doesn't change the fact that the threading model in x264 reduces the encoder's ability to work with coherent data.
Dark Shikari, who is one of the x264 developers, has talked a lot about the two multithreading models. Sadly I can't find it right now, but I will try to.
Generally it has two: sliced (no quality drop but poor performance scaling) and frame-based (better performance but less quality, hardly measurable).
The last one is the one in typical use; some versions of x264 used the sliced model.
The difference is very small and measured in SSIM, which arguably might not reflect human perception optimally.
Now, you can call it BS all you want, but it's really not an important aspect of whether HT can hurt your performance or not.

4: Multithreading
Multithreading is not enough; we are talking about threads with heavy CPU use. It doesn't matter that you have 100 threads that do nothing, really. But once you have a few that can take all your cores can give, that's when it happens. Your viewpoint on multithreading is simply too, well, simple. In high-performance computing you want to look into resource usage. That's the core of the problem here:
with a low number of threads that can eat all you can give them, having HT enabled simply denies the threads access to some of the resources.
 
You jump around so much. Are we talking about HPC or standard desktop computing?

I would like to see an example of your claims about reduced video quality. Encode a 30 second video with the same quality settings. One in the way you say degrades video quality, and one in the way that you believe is the superior method.

Also, as you describe, that sounds like a downfall of the codec used, rather than a problem with using more threads. You describe an issue that points to poor programming.
 
I'm jumping around a lot because there are tons of people to respond to.

1: It doesn't matter; the thread-conflict issue is the same.

2: I'm not sure it is visible to your eyes, and it is beside the point really.

3: It's a drawback of having to handle data. You can't just wave a magical multithreading wand over all software. E.g., dictionary compression is highly serialized and very hard to multithread without just doubling the workload.
7-zip LZMA does it by having one thread for the modeling and one for the encoding. Sadly, the encoding is much faster than the modeling, so you don't get a perfect 100% speedup from 1 to 2 threads.
For anything above 2 threads, 7-zip simply splits the data into chunks and treats them in parallel, which doubles the amount of memory needed for the dictionary and reduces the compression ratio slightly, since the entropy from one chunk is not useful for the entropy of the other. It is kind of the same as compressing two files with and without solid mode.
Again, this is really nothing new about multithreading and/or compression.
Simply said, when your current calculations are based on the results of the previous ones, you can't simply multithread them. You can then try to split up the data, but you lose data coherency in the process.
Other compression methods exist that are highly multithreadable because they run multiple models and select the optimal one, but that is a different kind of compression with different purposes, benefits and tradeoffs.
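
The lost-coherency point about chunked compression can be shown with a small Python sketch. It uses Python's standard `lzma` module rather than 7-zip itself, and deliberately extreme test data (a 64 KB random block repeated four times) so the effect is obvious; the size of the effect on real data will be much smaller, as the post says. The helper name is made up for this example.

```python
import lzma
import random


def compressed_sizes(data, chunks):
    """Return (size compressed as one stream, total size compressed per chunk)."""
    whole = len(lzma.compress(data))
    step = -(-len(data) // chunks)  # ceiling division
    parts = [data[i:i + step] for i in range(0, len(data), step)]
    chunked = sum(len(lzma.compress(part)) for part in parts)
    return whole, chunked


# Test data with long-range redundancy: one random 64 KB block, repeated 4 times.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
data = block * 4

whole, chunked = compressed_sizes(data, 4)

if __name__ == "__main__":
    print("one stream:", whole, "bytes")
    print("4 independent chunks:", chunked, "bytes")
```

Compressed as one stream, the compressor can refer back to the first copy of the block. Split into four independent chunks, each chunk only sees random data and nothing is saved, which is the same reason a chunked multithreaded run gives up a little compression ratio.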
 
I would like to see an example of your claims about reduced video quality. Encode a 30 second video with the same quality settings. One in the way you say degrades video quality, and one in the way that you believe is the superior method.

I found the original test here:

https://birds-are-nice.me/publications/extremex264_5.shtml

graph_FiM_threads.png


As I said from the beginning, it's a microminimal difference and hardly worth it for most normal users.
But how I handle it is that I have multiple Handbrake instances running with 2 threads each, thereby still getting the multithreaded boost but without taking the penalty.
However, they don't always sync up perfectly, and assigning affinity to avoid CMT issues will speed up the slower encode slightly.
E.g., on a 4-core/2-FPU CMT design, handbrake1 will run on LC1 and 3, while handbrake2 runs on LC2 and 4.

So when one stops, the remaining one never shares physical resources between its threads.

You can scale it according to the number of encodes/cores etc. as you please.
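
The LC1/LC3 and LC2/LC4 split described above can be generalized with a few lines of Python. This is only a sketch of the layout arithmetic: it assumes logical cores are numbered from 1 as in the post and that the cores of one CMT module (or HT core) are numbered consecutively. The function name is hypothetical, and actually applying the affinity is left to whatever tool you use.

```python
def split_across_modules(modules, cores_per_module=2):
    """Give job j the j-th logical core of every module.

    Each job then has one core per module, so its own threads never
    share a module (and its FPU) with each other.
    Logical cores are numbered from 1, siblings consecutively (assumed).
    """
    return [
        [module * cores_per_module + job + 1 for module in range(modules)]
        for job in range(cores_per_module)
    ]


if __name__ == "__main__":
    for job, cores in enumerate(split_across_modules(2), start=1):
        print(f"handbrake{job}: logical cores {cores}")
```

For a 2-module design this reproduces the split from the post: the first job on logical cores 1 and 3, the second on 2 and 4. With 4 modules the same rule gives 1, 3, 5, 7 and 2, 4, 6, 8.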
 
Update with a newer CPU

7-zip, 6 threads, 5820K (6C/12T)
Normal:
552% 4543 / 25072 MIPS

HT "disabled" with affinity:
559% 5035 / 28126 MIPS

11-12% boost in performance from avoiding Hyper-Threading when utilizing a low number of threads.
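
The 11-12% figure can be checked against the two MIPS numbers quoted for each run with a short Python sketch. The numbers are taken straight from the post; the function name is just for this example.

```python
def percent_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100


if __name__ == "__main__":
    # (normal run, run with HT "disabled" via affinity), both in MIPS
    for normal, pinned in [(4543, 5035), (25072, 28126)]:
        print(f"{normal} -> {pinned}: +{percent_gain(normal, pinned):.1f}%")
```

The first pair works out to about 10.8% and the second to about 12.2%, so "11-12%" is a fair summary of the quoted scores.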

Still waiting for a skylake to roll in.
 
Keep in mind that threaded performance of many apps is also affected by your underlying storage (Depending on workload) 3D compositing makes heavy use of threads, but you can quickly saturate a spinning drive or even a single SSD with metadata generation with that workload.

Using multiple SSDs is necessary at that level to support full performance for every thread.
 
I agree.
It could be interesting to set up an I/O-limited case to see how it affects performance under Hyper-Threading. I would guesstimate that it would reduce the negative effect of Hyper-Threading. Or, put in other words, you get I/O-limited instead of CPU-resource-limited. But I'm really kind of guessing here.
 
And then if you really want to get abstract and crazy, 2 CPU systems are faster than 4-CPU systems (depending on application).

Eg: an enterprise database will run faster on a 2 CPU system with flash storage than it will on a 4 CPU system with the same flash storage.

The NUMA overhead with the 4-socket system is quite the bottleneck and requires a ton of process/thread pinning to make it efficient.
 
But he's right. HT can sometimes take a performance hit when it's under-utilized. This is why Battlefield 3 and 4 took a small performance hit with HT enabled on a Core i7 (Core i3 saw massive speedup), because the game has four major threads.

But it doesn't happen in every application. And most of the time you won't notice it.

Also, newer CPUs have had better-optimized HT scheduling to counter these negative effects, along with faster cache to mask any performance impact. So of course SvenBent is going to show us the worst-case performance with a first-generation Core with Hyperthreading.

It would be nice if someone could run the same benchmark with a 6700k to show what I'm talking about: there should be a lower performance hit. Anyone got five minutes and a 6700k?

In the meantime, there is this set of gaming benchmarks:

Gaming benchmarks: Core i7 6700K hyperthreading test

OF course, the rest of these benchmarks in this thread are bullshit because we're restricting threads on massive-multithreaded benches. And also, the impact is less than 10 percent in most cases. This is just to demonstrate what can happen.
That's what I recalled, but I wasn't sure. I thought HT had some major improvements that removed the penalty the 920 had, so that HT really doesn't have a penalty in those workloads anymore.
 
Yes, that's an issue we had with some of the multi-CPU systems we were running. We had really good scalability when adding more cores to the system, but going to a multi-CPU system we saw "only" a bit above 80% of the scalability from doubling the CPUs vs. doubling the cores.



Anyway, people keep thinking this is not an issue for Skylake.
So here is the Skylake test with Cinebench (4 threads only):

6700K HT enabled
4_threads_HT_enabled.png



6700K HT disabled
4_threads_HT_disabled.png



8-9% boost in performance from disabling HT on low-threaded software,
which is a smaller hit than on the previous generations.



So to recap my starting statement:
If you can utilize more than 4 heavy CPU threads, HT will help.
If you can't, HT can hurt performance, and you might want to look into disabling it or using affinity to avoid thread collisions.

If it's mainly gaming you are doing, you will not notice it much, and you can use my Project Mercury to automatically "disable" Hyper-Threading for your main application while still leaving HT running for background threads.
 