AMD possibly going to 4 threads per core

It's clear that Windows 10 1803 does NOT take SMT into account ....

It does, just not optimally for that particular CPU design.

In short: the Windows 10 1803 CPU scheduler does not seem to be SMT aware or SMT optimal

Windows has been SMT aware since Windows XP. The introduction of chiplets needed to be accounted for in the scheduler logic, and it is once you are on 1903 for the Ryzen CPUs.
 
It does, just not optimally for that particular CPU design.



Windows has been SMT aware since Windows XP. The introduction of chiplets needed to be accounted for in the scheduler logic, and it is once you are on 1903 for the Ryzen CPUs.
I think he was testing an Intel CPU on his previous post... so.....
 
I was talking about SMT in general; it just happens that the only CPU I have running Windows 10 to test the 1903 version is an Intel one.
I just wanted to emphasize, before the fanboys go crazy, that the SMT issue exists for both Intel and AMD.

Anyway, I did a quick test in Windows 10 1803 on my laptop and it clearly shows the SMT penalty.


> Cinebench r15 <

4 threads / normal
357
360
363
371
366
= 363.4

4 threads / affinity 0 2 4 6
383
383
379
383
381
= 381.8


It's clear that Windows 10 1803 does NOT take SMT into account:

going from 363.4 to 381.8 is slightly above a 5% boost, purely from handling SMT correctly by hand.

Side note: the two non-383 results in Cinebench were due to me not having Task Manager ready to set affinity, so it had to load Task Manager and set the affinity first, which naturally reduces the benchmark result.
So avoiding SMT thread conflicts also gives way more stable results with Cinebench.
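If you'd rather script the affinity trick than race Task Manager, here is a minimal Python sketch using the third-party psutil package. The 2n/2n+1 sibling pairing is an assumption about the logical-CPU enumeration (common on Windows), and `pin_process` is a hypothetical helper name, not anything from the benchmark itself:

```python
# Hypothetical helper: restrict a process to one logical CPU per physical
# core (0, 2, 4, 6 on a 4C/8T part), mimicking "affinity 0 2 4 6" above.
# Assumes logical CPUs 2n and 2n+1 are SMT siblings on physical core n.

def one_logical_per_physical(physical_cores):
    """Return a logical-CPU list with one entry per physical core."""
    return [2 * n for n in range(physical_cores)]

def pin_process(pid, physical_cores=4):
    import psutil  # third-party: pip install psutil
    psutil.Process(pid).cpu_affinity(one_logical_per_physical(physical_cores))
```

With Cinebench you would need to call `pin_process` after hitting Run, since (as noted later in the thread) Cinebench resets its own affinity when a run starts.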


on your first point
If you are using Task Manager or similar to see anything about thread load distribution, you are doing it wrong to begin with. This is not meant as an insult, but to clear up a very common misunderstanding on the CPU load scheduling topic.
Even Kyle did this when testing thread scalability in some games, so don't feel bad about it.
This is simply not how you observe thread distribution/load, because what you see is an average over time. So even if you see, say, 8% load across all cores in Task Manager,
it does not mean that more than 1 core was in use at any given moment.

You need to know the difference between "instant" and "over time" situations.


Some people might say 5% is not a big deal, but seeing how people went crazy about 100 MHz of turbo boost on Ryzen, it's kind of funny to see them losing 5% performance from not being SMT aware, and up to 20% from not being CCX aware.



In short: the Windows 10 1803 CPU scheduler does not seem to be SMT aware or SMT optimal
(and you need to be using Project Mercury to get maximum performance).

Will test 1903 as soon as I can

Ah, so the penalty you're seeing is academic and simply the result of the scheduler and not anything inherent to SMT, then. I was thinking you were saying that enabling SMT would result in a decrease in performance in 7-zip, which I've found to be untrue at least for more recent builds of 7-zip and modern architectures.

As for using task manager to visualize workload distribution, it actually works just fine for this simple task and even averaging workload over a few seconds doesn't obfuscate anything. If I start a heavy, single-threaded workload on my 1700X, windows seems to be smart enough to pin it to a single core and task manager shows a single core loaded to 100% (or 7-8% overall utilization due to the single thread and background processes.) As soon as I toss another thread into the mix, I observe thread-hopping and no single logical core sees more than about 20% usage at any given time (due to averaging over the refresh time.) No, I'm not seeing combinations of two logical CPUs bouncing to 100% as the workload moves around, but I also don't need to in order to validate (or invalidate, as the case may be) the behavior I was talking about.
 
It does, just not optimally for that particular CPU design.

Windows has been SMT aware since Windows XP. Introduction of chiplets needed to be accounted for in the scheduler logic, and it is once you are on 1903 for the Ryzen CPU's.

Can you provide any testing that shows there is no penalty from SMT in lightly multithreaded situations?

I believe you are confusing working with SMT (due to being SMP capable) with being actually SMT aware.
SMT aware = don't do stupid distributions of threads onto logical cores on the same physical core when another physical core is available
SMP aware = I can distribute load to multiple (logical) cores

I believe you are talking about the latter. I'm talking about the former.


---edit---
And btw, Windows has been SMP aware since way before Windows XP;
I ran a dual-CPU setup on Windows 2000.
 
SvenBent: I'm dumb and you're right; this doesn't actually address whether or not the scheduler is being smart enough to not toss a second thread onto a single core. The problem is that forcing processor affinity isn't really a valid test either since in that case you're also preventing thread hopping between physical cores. A more proper test would be to turn off SMT at the BIOS level and then perform the test again.
 
The penalty you're seeing is a result of the scheduler and not anything inherent to SMT, then. I was thinking you were saying that enabling SMT would result in a decrease in performance in 7-zip, which I've found to be untrue at least for more recent builds of 7-zip and modern architectures.

As for using task manager to visualize workload distribution, it actually works just fine for this simple task and even averaging workload over a few seconds doesn't obfuscate anything. If I start a heavy, single-threaded workload on my 1700X, windows seems to be smart enough to pin it to a single core and task manager shows a single core loaded to 100% (or 7-8% overall utilization due to the single thread and background processes.) As soon as I toss another thread into the mix, I observe thread-hopping and no single logical core sees more than about 20% usage at any given time (due to averaging over the refresh time.) No, I'm not seeing combinations of two logical CPUs bouncing to 100% as the workload moves around, but I also don't need to in order to validate (or invalidate, as the case may be) the behavior I was talking about.

The issue is that the scheduler is not SMT aware, and the issue tested here is ONLY present on SMT CPUs.
It IS a native issue of SMT, due to how SMT works.

And no, Task Manager does not work for this. If you don't get it, it would take me way too long to explain it all over again;
search the forum for the multiple previous explanations of how thread scheduling works and why your perception of this method for this issue is flawed.

I will try to keep it short.

Single-threaded software will show load distributed across all cores in Task Manager.
By your method, seeing that multiple cores carry some load can lead to the incorrect conclusion that the software is multithreaded.
That is due to a mix-up: a thread can only run on one core at a time (aka "instant"), but you are measuring over time.
A thread can jump across all the cores within the measurement interval and put load on all of them, without ever being on more than one core at a time.

Using Task Manager you cannot see the difference between 1 thread running at 100% of a core and 4 threads each running at 25%,
as they will both average out to the same per-core load over your measurement interval.

So yes, by design, using Task Manager to evaluate thread load distribution is flawed at mid to low levels of load.
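To make the "instant vs over time" point concrete, here is a toy Python model I'm using purely for illustration (nothing from Task Manager itself): an averaging monitor cannot distinguish one hopping thread from four lightly loaded ones.

```python
def per_core_load(busy_sets, cores=4):
    """Fraction of ticks each core was busy, i.e. what an averaging
    monitor like Task Manager would report for the interval."""
    ticks = len(busy_sets)
    return [sum(c in s for s in busy_sets) / ticks for c in range(cores)]

# Scenario A: ONE thread at 100%, hopping to the next core every tick.
hopping = [{t % 4} for t in range(100)]

# Scenario B: FOUR threads, each on its own core, each busy 25% of the
# interval (thread c runs during its own quarter of the 100 ticks).
four_light = [{t // 25} for t in range(100)]

print(per_core_load(hopping))     # [0.25, 0.25, 0.25, 0.25]
print(per_core_load(four_light))  # [0.25, 0.25, 0.25, 0.25] -> identical
```

Both scenarios report 25% on every core, even though scenario A never used more than one core at any instant.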


On the SMT issue:
The performance issue comes from 2 threads getting assigned to 2 logical cores that sit on the same physical core and thereby have to share resources, rather than getting assigned to 2 logical cores that map to 2 separate physical cores.
In the latter case the threads are not sharing a physical core and can run at full speed.
This is basic knowledge of how SMT works by design.

The issue is that the Windows scheduler treats all logical cores as equal, even if there is load on a core's paired logical core.
So when you manually tell the Windows scheduler to do it right via affinity, you get the gain of not having to share physical cores.

So the issue IS rooted in how SMT works, but it only takes effect because the Windows scheduler is not aware of how SMT actually works.
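The two placement policies can be sketched in a few lines of Python (my own illustration; the 2n/2n+1 sibling pairing is an assumption, and real code should query the topology from the OS, e.g. via GetLogicalProcessorInformationEx on Windows):

```python
def smt_blind(n_threads, logical_cpus=8):
    """Treat all logical CPUs as equal: fill 0, 1, 2, ... in order.
    Threads 0 and 1 end up sharing physical core 0."""
    return [t % logical_cpus for t in range(n_threads)]

def smt_aware(n_threads, logical_cpus=8):
    """Use one logical CPU per physical core first; only place a thread
    on an SMT sibling once every physical core already has one."""
    primaries = [2 * n for n in range(logical_cpus // 2)]
    siblings = [2 * n + 1 for n in range(logical_cpus // 2)]
    order = primaries + siblings
    return [order[t % logical_cpus] for t in range(n_threads)]

print(smt_blind(4))  # [0, 1, 2, 3] -> two physical cores do all the work
print(smt_aware(4))  # [0, 2, 4, 6] -> one thread per physical core
```

On a 4C/8T part with 4 busy threads, the blind policy loads only two physical cores while the aware policy loads all four, which is exactly the gap the affinity trick recovers.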


The same issue is present with the Ryzen CCX design.

When 2 threads go to 2 separate CCX units and have to share data, inter-thread communication is slowed down to Infinity Fabric speed rather than L3 cache speed.
If Windows were CCX aware in its scheduler, it would try to keep threads from the same process on cores in the same CCX, to avoid this communication slowdown.
This is why we see a 20% FPS boost in games when using Project Mercury to place threads correctly according to the CCX design.
This is also the fix I believe is in Windows 10 1903.

The penalty from CCX is way worse than the one from SMT, which is probably why Microsoft fixed (or tried to fix) that first.
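As an illustration of what a CCX-aware placement would do (my own sketch, assuming a Zen 1 style layout of 4 cores per CCX; `pick_cores_same_ccx` is a made-up helper, not anything from Windows or Project Mercury):

```python
CORES_PER_CCX = 4  # Zen/Zen+ CCX size; an assumption for this sketch

def pick_cores_same_ccx(n_threads, free_cores):
    """Prefer a set of free cores that all sit in one CCX, so the threads
    share L3; fall back to spilling across CCXs (and paying Infinity
    Fabric latency) only if no single CCX has enough free cores."""
    by_ccx = {}
    for core in free_cores:
        by_ccx.setdefault(core // CORES_PER_CCX, []).append(core)
    for cores in by_ccx.values():
        if len(cores) >= n_threads:
            return cores[:n_threads]
    return free_cores[:n_threads]  # unavoidable cross-CCX placement

# CCX0 has two free cores, CCX1 has four: a 3-thread group goes to CCX1.
print(pick_cores_same_ccx(3, [0, 1, 4, 5, 6, 7]))  # [4, 5, 6]
```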



This should really not be new information; the testing is years old.
The CCX thing is the reason why AMD has the gaming mode in their software to disable one CCX.
 
SvenBent: I'm dumb and you're right; this doesn't actually address whether or not the scheduler is being smart enough to not toss a second thread onto a single core. The problem is that forcing processor affinity isn't really a valid test either since in that case you're also preventing thread hopping between physical cores. A more proper test would be to turn off SMT at the BIOS level and then perform the test again.

Affinity does not lock a thread to a given core;
it locks the process to a set of cores. So the 4 threads are still jumping back and forth on the 4 cores the process is assigned to.

You can follow this thread load/distribution with Process Monitor from Microsoft; I can highly recommend it, as it's the only software I'm aware of that actually shows thread-level information.


This does not, however, invalidate your argument that a real test would be turning SMT off.
You are right that it would be a more proper test.
However, my laptop does not have a fancy enough BIOS for this.

BUT
I can tell you, from the years of testing I have done on this for Project Mercury, that the difference between SMT off and using affinity comes out within the error margins.

I'll see if some of my earlier testing on this is still available.


Let me point out that I'm not saying SMT is a bad thing. It's great for heavily threaded situations; running Cinebench with the full number of threads gives a clear benefit to SMT CPUs.
My point is just that SMT has a drawback in some situations, and:
a: Windows does not handle it well
b: Project Mercury absolutely rocks at this (self-ad here)
c: With SMT4, both the performance gains AND the penalty will grow, so SMT4 might be bad for gamers

This is not a black/white situation. It just happens that most benchmark sites treat software as either single-threaded or fully threaded when looking at SMT, while there is a ton of software in between, especially modern games.


--- edit ---
I really should upload my testing to my website instead of dumping it in random forums, lol.
 
Affinity does not lock a thread to a given core; it locks the process to a set of cores, so the 4 threads are still jumping back and forth on the 4 cores the process is assigned to.

Ah, right, because you're setting the affinity for the whole application to a number of CPU cores. Duh.

If I have time tonight, I can do some SMT on/off testing on my 1700X limited to eight cores. Might be enlightening. (We are VERY off topic at this point!)
 
Ah, right, because you're setting the affinity for the whole application to a number of CPU cores. Duh.

If I have time tonight, I can do some SMT on/off testing on my 1700X limited to eight cores. Might be enlightening. (We are VERY off topic at this point!)

I have a 1700X myself; that's what I did the CCX testing on.
But I'm running Windows 7.

If you want to test the penalty from CCX, try testing with 4 threads and then assigning the 4 threads to CPUs 1 3 5 7. That would:
- Remove SMT
- Stay on only one CCX


But if you are using Cinebench like I did, please note that it adjusts its own affinity to all cores whenever you hit Run.
You have to reset the affinity after pressing Run; that confused the hades out of me for the longest time.



Just to recap my point:
SMT4 can carry an even bigger SMT penalty for games than SMT currently does,
and people need to take into account which benchmarks they read and how those load the cores before applying any gains to their own usage scenario.
A 15% boost in Cinebench could be a 5% loss in games, because the thread/core load is extremely different.
 
It's been a while since it mattered, but 'pipeline stages' are what I think you're getting at. Pipelines might be... number of threads? In any case, Netburst had an absurd number of pipeline stages, as it was designed for high-bandwidth media processing, something for which it actually was very good at. Where it failed was when the branch predictor missed, and the whole pipeline had to be restarted. This negated Netbursts clockspeed advantage for the most important work a CPU does: chewing through branching code. Thus while Intel wasn't uncompetitive, and they certainly had a better platform around their CPUs at first, once AMD and their partners stabilized the platform and then moved the memory controller onboard, it was game over for Netburst. Except for Bluray players. The first ones ran Pentium IVs, because they were good at that!

With respect to 'more pipelines', if you mean more execution units per core, then that can increase IPC when properly balanced. However, that balance is important, because increasing execution units only works if they can be fed by the rest of the CPU and the system. Typically, more cache is needed, better resource allocators are needed, and even more threads given to the OS as in SMT4.

And that brings up a central question: is AMD going to outfit their SMT4-enabled cores for a variety of workloads? If they are, could that be useful to the consumer / outside of the enterprise? If not, are they going to keep the 'leaner' cores they have now and continue to improve them for the consumer?
Yea, I mean pipeline stages. That's why I think Ryzen is very Netburst like, because it can make really good use of memory bandwidth. AMD found a way to make it work, and not worry about clock speed as much. I doubt AMD would make each SMT4 core specialized. AMD might have found a way to lengthen and shorten pipelines dynamically, so it makes it easy to divide it up to 4 threads.
 
Ah, so the penalty you're seeing is academic

Not sure what you mean by academic here.
The resulting penalty is present in most games, depending on how many CPU-heavy threads the software has versus how many physical cores are in the system.
I'm just using an easier program to show the penalty.

To explain further, let's say your software has 4 CPU-heavy threads:
- On a 2-physical-core CPU, SMT will help: you get the benefits of SMT and none of the drawbacks (all physical cores are utilized).
- On a 4-physical-core CPU, SMT will hurt: without SMT you could utilize all physical cores, but with SMT you run into the thread conflicts described above, where 2 threads go to the same physical core instead of 2 different ones.

So it's very much a hit-and-miss issue, and you need the correct understanding to see it.
 
Not sure what you mean by academic here. The resulting penalty is present in most games, depending on how many CPU-heavy threads the software has versus how many physical cores are in the system.

I was under the impression that you would not be subject to thread hopping by setting the CPU affinity. You already addressed that above.
 
Luckily results with SMT on/off even with multi-CCD SKUs like the 3900X see differences that are more or less margin of error in gaming benchmarks: https://www.techpowerup.com/review/amd-ryzen-9-3900x-smt-off-vs-intel-9900k/

The issue with this kind of testing, which is not focused on the issue at hand, is that you run into factors that mask it.
Also, I don't see any reference to measured error margins, which again IMHO is just poor testing methodology (I might have missed it).


1: A CPU performance drop will not be visible in a test that is GPU bottlenecked.
So looking at 4K gaming you are not seeing the extra performance, because you can't use it.
Even at lower resolutions there will be moments where you might be GPU bottlenecked, so the measured difference on the CPU will be smaller.
This is the very reason proper CPU comparisons are done at low resolution: to isolate the difference between the CPUs themselves.

2: Ryzen also has the CCX issue to handle, which is a bigger penalty than the SMT issue, so the performance boost from disabling SMT can be masked: the lower number of logical cores makes more threads jump back and forth between CCXs, giving a bigger CCX penalty.
Or said differently: the benefit of disabling SMT might be countered by the CCX issue it aggravates.

3: The 3900X has a large number of physical cores. This increases the chance for a thread to land on a spare physical core rather than on an already-busy one. So, needless to say, the issue is naturally smaller on this CPU than on CPUs with fewer cores.

4: This benchmark has no information about thread load distribution, so we don't know if we are even measuring the situation in question.


This is important to take into account when we read the benchmarks.
Let's look at the 720p results for the 3900X only, across all games:
SMT on = 97.8%
SMT off = 100%

So we see a little over 2% performance increase across the board.
That is very small, yes. Is it within the error margins? I don't know; the article is very weak on data samples.

But given the 5% difference we saw in Cinebench, which is solely CPU dependent, would it be weird to see 2-3% in situations that are not only CPU dependent?
Not at all. So while this is not rock-solid confirmation, in my eyes it is still more evidence for the difference than against it.

Also, this result is an average over situations that are not even relevant to the question: some of the games might scale perfectly and thereby show no SMT penalty.
I don't know; the article lacks technical information and I don't know the games by heart.



7 of the benchmarks showed gains from disabling SMT; 3 did not.
Again, small gains, so not a huge confirmation, but still pointing more towards "there IS a penalty with SMT sometimes" than "there is NEVER a penalty".


Here is where it gets interesting:
When SMT helps, it never goes further than slightly above the non-SMT result; no other CPUs are "skipped".
When SMT is shown to hurt, the difference is in some cases enough to skip past several CPUs.
Look at Metro Exodus and Rage 2.


Let's remove anything with less than a 3.3% difference, due to error margin.
What we have left is:
Far Cry 5 = 3.6% boost from disabling SMT
Metro Exodus = 9.6% performance boost from disabling SMT
Rage 2 = 4.5% performance boost from disabling SMT
Shadow of the Tomb Raider = 3.35% boost from disabling SMT
Wolfenstein II = 5% performance boost from disabling SMT

Whenever there is more than a 3.3% difference, it is ALWAYS in favor of disabling SMT in the test you provided.
Taking the 4 points above into account, I see this test as good evidence of the SMT penalty rather than a refutation.
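Just to make the cutoff explicit, here is the same filtering step as a few lines of Python (my own illustration; the game names and percentages are the ones quoted above from the linked review, and the last entry is invented purely to show the filter discarding something):

```python
# Keep only per-game differences larger than the assumed 3.3% noise floor.
gains_pct = {
    "Far Cry 5": 3.6,
    "Metro Exodus": 9.6,
    "Rage 2": 4.5,
    "Shadow of the Tomb Raider": 3.35,
    "Wolfenstein II": 5.0,
    "Some GPU-bound game (hypothetical)": 1.2,  # made-up, below the floor
}

NOISE_FLOOR = 3.3  # percent; the error margin assumed in this post

def significant(gains, floor=NOISE_FLOOR):
    """Keep only differences larger than the assumed error margin."""
    return {game: g for game, g in gains.items() if abs(g) > floor}

print(len(significant(gains_pct)))  # 5: only the hypothetical entry is cut
```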



Again, just because we test in situations where the penalty is not in effect does not mean it's not present.
This is the reason we never try to "prove a negative".

I mean, an almost 10% FPS boost in a shooter game for free does kinda have a nice ring to it.
 
also I don't see any reference to measured error margins, which again IMHO is just poor testing methodology (I might have missed it)

let's remove anything with less than a 3.3% difference due to error margin

"Margin of error" being a colloquialism meaning "within a few percent" or "likely below notice." To bring out the word academic again, the issues you mention here are largely academic as well, meaning that whatever the cause, the differences are mostly small and the end user is likely unconcerned about their origin. It's well known the SMT is usually no good for games (hence one of the reasons why people prefer the 9700K for gaming systems), but the advantages in other workloads generally outweigh the costs.
 
"Margin of error" being a colloquialism meaning "within a few percent" or "likely below notice." To bring out the word academic again, the issues you mention here are largely academic as well, meaning that whatever the cause, the differences are mostly small and the end user is likely unconcerned about their origin. It's well known the SMT is usually no good for games (hence one of the reasons why people prefer the 9700K for gaming systems), but the advantages in other workloads generally outweigh the costs.

"but the advantages in other workloads generally outweigh the costs."
I totally agree. My point was never to imply that SMT is inherently BAD, just that there are situations where having SMT hurts performance.
So when we look at getting SMT4, we need to realize that while SMT4 might give more gains due to more threads to process,
it also increases the risk of the SMT penalty rearing its ugly head.

I'm a big fan of SMT; it's helpful in most of my workloads. But that should not blind me to its drawbacks if I want maximum performance.

"Margin of error" being a colloquialism meaning "within a few percent" or "likely below notice"
Margin of error in my world is not typically vague terminology; it's a measured and defined limit, typically set before analyzing the data to remove potential bias.
Either we calculate the p-value, or we use the outliers.

Take Cinebench, for example:

If the average of A is above B, but the lowest of A is not above the highest of B, that would be considered within the margin of error,
as the spread within the measurements is larger than the gap between the means of the A and B results.

A:
340
360
340
337
350

B:
500
500
320
320
700

This is of course an extreme example,
but it would be judged as "no difference" due to margin of error, even though the averages of the samples are very different.
The sample ranges overlap, and we therefore cannot dismiss the possibility that the difference is just inaccuracy in the data recording/gathering.

This is also why I tend to use 5 samples rather than just 3:
I do a 5:3 delta elimination before averaging the remaining 3.
The result is way more "stable" against outside factors impacting the measurements.

But I opted to just do plain averages today because... I'm lazy.
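Both rules, range overlap as the margin-of-error test and the 5:3 delta elimination, can be sketched like this (my reading of the description; "drop the two samples furthest from the median" is my interpretation of the trimming rule, the author's exact rule may differ):

```python
def within_margin_of_error(a, b):
    """Sample ranges overlap -> the difference in means cannot be trusted."""
    return min(a) <= max(b) and min(b) <= max(a)

def trim_5_to_3(samples):
    """One reading of the '5:3 delta elimination': drop the two samples
    furthest from the median, then average the remaining three."""
    med = sorted(samples)[len(samples) // 2]
    kept = sorted(samples, key=lambda s: abs(s - med))[:3]
    return sum(kept) / len(kept)

A = [340, 360, 340, 337, 350]   # the example runs above
B = [500, 500, 320, 320, 700]

print(within_margin_of_error(A, B))  # True: ranges overlap, call it a tie
print(trim_5_to_3(A))                # 339.0, unaffected by the 360 outlier
```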



--- Edit ---

I do apologize for the typos; I'm on company time.
 
"Margin of error" being a colloquialism meaning "within a few percent" or "likely below notice." To bring out the word academic again, the issues you mention here are largely academic as well, meaning that whatever the cause, the differences are mostly small and the end user is likely unconcerned about their origin. It's well known the SMT is usually no good for games (hence one of the reasons why people prefer the 9700K for gaming systems), but the advantages in other workloads generally outweigh the costs.


Just to clarify my other point, with another shameless plug:

Having to disable and enable SMT with a reboot is a horrible way to gain maximum performance.
My Project Mercury tool does it live by assigning affinity, avoiding that drawback.

So with Project Mercury you get the best of both worlds:
a: All your logical cores and massive performance when your software scales well
b: Avoiding threads getting starved by unneeded core sharing when the number of threads is low

And as a bonus, by using PM instead of actually disabling SMT, you still keep the benefit of the extra logical cores to offload background tasks while your game gets the full performance of not having to share physical cores.
And hey, it's free.
 
Can you provide any testing that shows there is no penalty from SMT in lightly multithreaded situations?

I believe you are confusing working with SMT (due to being SMP capable) with being actually SMT aware.

SMT aware = don't do stupid distributions of threads onto logical cores on the same physical core when another physical core is available
SMP aware = I can distribute load to multiple (logical) cores

I believe you are talking about the latter. I'm talking about the former.

The scheduler has to be aware of both SMT/HT and SMP to efficiently schedule threads.
I was talking about SMT. Win2k was not SMT aware, and it treated it as if it was an SMP scenario. That was fixed in Windows XP... And for new chiplet/CCX cpu designs, accounted for in build 1903.

And no, Task Manager does not work for this.

Single-threaded software will show load distributed across all cores in Task Manager.

Setting affinity to a particular core will keep it on that core. Unless this is some new bug, or how the Ryzen scheduler bug presents itself on pre-1903 windows..

On the SMT issue:
The performance issue comes from 2 threads getting assigned to 2 logical cores that are on the same physical core and thereby have to share resources

This is the whole point of SMT, multiple threads sharing a core. Granted it is obviously better to put a thread on another physical core if there is one available.

The issue is that the Windows scheduler treats all logical cores as equal, even if there is load on a core's paired logical core.

I don't believe this to be the case, at least not something observed on my Intel's logical cores. If you are referring to the pre-1903 bug for Ryzen, not sure why we are hashing it out, as it has been fixed (reportedly) by Windows 10 build 1903...

So the issue IS rooted in how SMT works, but it only takes effect because the Windows scheduler is not aware of how SMT actually works

I really do not believe this is the case.

Just to recap my point:
SMT4 can carry an even bigger SMT penalty for games than SMT currently does,
and people need to take into account which benchmarks they read and how those load the cores before applying any gains to their own usage scenario.
A 15% boost in Cinebench could be a 5% loss in games, because the thread/core load is extremely different.

Agree completely. Perhaps something is lost in translation on my interpretation of the points you were trying to make in the other posts.
 
So with Project Mercury you get the best of both worlds:
a: All your logical cores and massive performance when your software scales well
b: Avoiding threads getting starved by unneeded core sharing when the number of threads is low
The best of both worlds!
Source: I used to use this tool pre 1903 on my Ryzen
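The affinity trick being described can be sketched in a few lines. This is a hypothetical illustration, not Project Mercury's actual code, and it assumes the common Windows enumeration where logical CPUs 2n and 2n+1 are SMT siblings on physical core n (this ordering is not guaranteed on every system):

```python
# Hypothetical sketch: build an affinity mask selecting one logical CPU
# per physical core, assuming logical CPUs 2n and 2n+1 are SMT siblings.
def smt_avoiding_mask(physical_cores: int) -> int:
    mask = 0
    for core in range(physical_cores):
        mask |= 1 << (2 * core)   # pick only the even-numbered sibling
    return mask

# 4 physical cores -> logical CPUs 0, 2, 4, 6
print(bin(smt_avoiding_mask(4)))  # 0b1010101
```

On Windows, a mask like this could then be handed to `SetProcessAffinityMask` (e.g. via ctypes), which is effectively what setting affinity to CPUs 0/2/4/6 in Task Manager does in the Cinebench runs quoted earlier.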
 
The scheduler has to be aware of both SMT/HT and SMP to efficiently schedule threads.
I was talking about SMT. Win2k was not SMT aware, and it treated it as if it was an SMP scenario. That was fixed in Windows XP... And for new chiplet/CCX cpu designs, accounted for in build 1903.



I was talking about SMT. Win2k was not SMT aware, and it treated it as if it was an SMP scenario. That was fixed in Windows XP.
I'm curious what exactly was fixed, because it's definitely not placing threads correctly to avoid the drawback of SMT (aka thread conflicts).


Setting affinity to a particular core will keep it on that core. Unless this is some new bug, or how the Ryzen scheduler bug presents itself on pre-1903 windows..
This has nothing to do with what I said. I mentioned that Task Manager is bad at showing thread load distribution, nothing about affinity. I think you mixed my messages together here.


This is the whole point of SMT, multiple threads sharing a core. Granted it is obviously better to put a thread on another physical core if there is one available.
Agreed, this is where SMT gets its benefit from. I'm simply shining a light on the double-edged sword it is.
SMT gives and SMT takes. It's not just "SMT gives", as many people tend to think.


I don't believe this to be the case, at least not something observed on my Intel's logical cores. If you are referring to the pre-1903 bug for Ryzen, not sure why we are hashing it out, as it has been fixed (reportedly) by Windows 10 build 1903...

We are hashing it out because I believe 1903 fixed the CCX/chiplet issue, not the SMT issue. And with SMT4 the SMT issue will grow if Windows does not fix it.
Again, how did you test for this issue? The way most people test it simply does not illustrate it, due to a lack of understanding of the issue.


I really do not believe this is the case.
I'm not sure how to convince you otherwise. This issue is ONLY present with SMT, as that is the only situation where 2 threads can be assigned to the same physical core.
As soon as SMT is out of the picture, performance increases (either via affinity or by actually disabling it).
Again, if the debate is whether it's due to the scheduler vs SMT, it's a matter of semantics, as both are culprits.
The issue is not there without SMT. The issue is not present if the scheduler handles threads "correctly".

This issue has been documented since the Core i7 9xx series and is really not new.
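For anyone who wants to verify the sibling pairing on their own machine: on Linux the kernel exposes SMT siblings via sysfs (`/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`). A small sketch of parsing that file's format follows; the sysfs path is standard on Linux, but the helper name here is just illustrative:

```python
# Parse the "thread_siblings_list" format, e.g. "0,4" or "0-1", into a
# list of logical CPU ids that share one physical core.
def parse_siblings(text: str) -> list:
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:                        # ranges appear as "0-1"
            lo, hi = map(int, part.split("-"))
            cpus.extend(range(lo, hi + 1))
        else:
            cpus.append(int(part))
    return cpus

print(parse_siblings("0,4"))   # [0, 4] -> logical CPUs 0 and 4 share a core
print(parse_siblings("0-1"))   # [0, 1]
```

Two threads landing on logical CPUs from the same sibling list is exactly the conflict being discussed; without SMT every list has length 1 and the conflict cannot occur.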
 
Yeah, I believe that 4way SMT in a CPU would not ultimately be a good idea for consumers, gamers, or even workstation workloads.

And it's just a rumor at this point.

There are 4 way and 8 way SMT processors, but those are relegated to Xeon Phi and other server-market processors. I think if this was something beneficial on the desktop, we would have already seen it.

The post with the theoretical 60% perf boost on AMD's SMT vs a 30% boost on Intel's HT, I really don't know where the guy got his figures from... and I read that post. I know in real-world workloads, SMT is not all that beneficial (the point I think you are making).

Fun to theorize about, but I don't think this is something we will have to worry about in desktop CPU's.

And for someone with a 12 or 16 core Ryzen, they should be disabling SMT to get the best "typical use cases" performance. I know I would.
 
I never realized that early Bluray players used P4's. I wonder how power hungry they were.... I figured they were fixed function hardware instead.

Loud as hell, but they worked -- fixed-function hardware took longer due to the jump in processing capability needed to decode the new codecs.
 
Fun to theorize about, but I don't think this is something we will have to worry about in desktop CPU's.

And for someone with a 12 or 16 core Ryzen, they should be disabling SMT to get the best "typical use cases" performance. I know I would.

I'll agree with your main point.

None of this is for desktop, nor is a 12-16 core part in the first place. They're working to improve some efficiencies in fully saturated manycore scenarios where the chip is running through tasks as fast as it can (and will have an unending supply).

For any normal home use, this is all massive overkill and in many cases, working counter to metrics which might matter more than the cinebench or encoding benchmarks. If you truly do that all day, sure - select those products. If not, look at what your load really is.
 
I made myself a promise that my next upgrade will be a 12 core. When everyone was buying 4 core I bought the 3930K 6c/6t and have not regretted once. I use VMware, but also even an old ass engine like the source engine can create 7 decent threads. Quake 2016 will load down all 12 evenly. Although disabling HT would prob be slight performance gain just gaming - I multitask the hell out of it. So average Joe - overkill for websurfing, sure. There are still a lot of advanced users that can take advantage of 12+ cores I think.
 
Yeah, I believe that 4way SMT in a CPU would not ultimately be a good idea for consumers, gamers, or even workstation workloads.

And it's just a rumor at this point.

There are 4 way and 8 way SMT processors, but those are relegated to Xeon Phi and other server-market processors. I think if this was something beneficial on the desktop, we would have already seen it.

The post with the theoretical 60% perf boost on AMD's SMT vs a 30% boost on Intel's HT, I really don't know where the guy got his figures from... and I read that post. I know in real-world workloads, SMT is not all that beneficial (the point I think you are making).

Fun to theorize about, but I don't think this is something we will have to worry about in desktop CPU's.

And for someone with a 12 or 16 core Ryzen, they should be disabling SMT to get the best "typical use cases" performance. I know I would.

It's hard to say.... I have posted before that I think this rumor is a 50/50 chance of being real, at best. Still... the main reason SMT4 hasn't been more widely adopted is the price of cache, more than anything, imo.

IBM's POWER9 with SMT4 actually performs very well... but IBM has dedicated a lot of cache to those chips. The Talos POWER9 22-core / 88-thread parts do beat up on Threadripper and Intel i9s in things like compression work. In Phoronix benchmarks of the Talos II, their dual 22-core machine with 176 threads was 20-30% faster than AMD's 2990WX in many workloads. Now granted, that is POWER vs x86 under Linux, and of course performance is all over the place due to software optimizations (or lack thereof... POWER9 loses a lot of tests). Still, if done right, with the proper amount of cache dedicated to threads, SMT4 is a very viable technology.

If AMD does offer SMT4... I imagine it will only be possible if 7nm+ allows them to once again double the amount of L3 cache on their chiplets, with a proper amount of cache and another big jump in their prediction engine. Zen 2's prediction engine is much better than Zen 1's... but it's still not at the latest high-end research level either; they have room to improve there. AMD with Zen 2 moved to a TAGE branch predictor, which is much like the one IBM uses in POWER9 and was first developed by researchers in 2006... it's fast (much faster than AMD's old predictor) but it's not perfect and requires a form of statistical correction. One newer solution researchers have been working on is a BATAGE prediction engine... which provides the speed of TAGE while removing the need for statistical correction, making it much simpler to implement (less silicon). So it's possible that if AMD has worked a BATAGE engine into Zen 3, they may well have freed up the die space to add SMT4 bits, along with 7nm+ allowing for even more cache.

If Zen 3's early marketing talks about BATAGE and SMT4... Zen 3 could end up being the same sort of leap Zen 2 was over Zen. Of course it's also possible Zen 3 is less ambitious. It will be fun to find out. Part of me is really hoping both things are true... and that AMD is able to wait for Intel to announce their 7nm parts and then drop Zen 3 the day after. Fun times we live in.

If you have some time to burn reading about prediction engines is sort of fun. ;)
https://hal.inria.fr/hal-01799442/document
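As a toy illustration of what a prediction engine does (vastly simpler than the TAGE/BATAGE designs in the paper linked above, but the scheme they ultimately descend from), here is the classic 2-bit saturating-counter predictor:

```python
# Toy illustration only: a 2-bit saturating-counter branch predictor.
# Real TAGE/BATAGE designs add tagged, geometrically-sized history tables
# on top of a base predictor like this one.
class TwoBitPredictor:
    def __init__(self, entries=16):
        self.entries = entries
        self.table = [1] * entries   # counters 0..3; start weakly not-taken

    def predict(self, pc: int) -> bool:
        return self.table[pc % self.entries] >= 2   # >= 2 means "taken"

    def update(self, pc: int, taken: bool) -> None:
        i = pc % self.entries
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

p = TwoBitPredictor()
for _ in range(3):            # train on a branch that is always taken
    p.update(0x40, True)
print(p.predict(0x40))        # True
```

The hysteresis (needing two mispredictions to flip a prediction) is the whole point of using 2 bits instead of 1; everything since then has been about exploiting longer branch histories, which is where TAGE's tagged geometric tables come in.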
 
Yeah, I believe that 4way SMT in a CPU would not ultimately be a good idea for consumers, gamers, or even workstation workloads.

And it's just a rumor at this point.

There are 4 way and 8 way SMT processors, but those are relegated to Xeon Phi and other server-market processors. I think if this was something beneficial on the desktop, we would have already seen it.

The post with the theoretical 60% perf boost on AMD's SMT vs a 30% boost on Intel's HT, I really don't know where the guy got his figures from... and I read that post. I know in real-world workloads, SMT is not all that beneficial (the point I think you are making).

Fun to theorize about, but I don't think this is something we will have to worry about in desktop CPU's.

And for someone with a 12 or 16 core Ryzen, they should be disabling SMT to get the best "typical use cases" performance. I know I would.


Again, no need to disable SMT when you can solve the issue and get the best of both worlds with free software.

I both game and run week-long, 100% core-scalability heavy CPU loads.
Having to enable/disable SMT for those two scenarios is just not optimal.

Clicking one checkbox in Project Mercury to give my games max speed is a lot easier/faster.
Best of all, I can do this while running my game during the heavy CPU load, and still get the no-SMT benefits for the game and the SMT benefits for the background heavy CPU load.

Once you reach this scenario, the benefits from Project Mercury run into the >100% fps improvements in games. This is, however, due to more issues than just SMT and CCX getting fixed.
 
The post with the theoretical 60% perf boost on AMD's SMT vs a 30% boost on Intel's HT, I really don't know where the guy got his figures from... and I read that post. I know in real-world workloads, SMT is not all that beneficial (the point I think you are making).

I'm assuming you're talking about the SMT yield figures I posted from TheStilt? If so, it's 38% vs 30% in the selected workloads (not 60% vs 30%), the workloads are all specified, and many of them ARE real world workloads (e.g. three separate video encoding tasks, 3D rendering in Blender, file compression/decompression with 7-Zip and WinRAR, code compilation with GCC, etc.)
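For clarity on what a "yield" figure like 38% means here: it's just the throughput gained with SMT on versus SMT off. A quick sketch with made-up illustrative scores (not TheStilt's actual data):

```python
def smt_yield(score_smt_on: float, score_smt_off: float) -> float:
    """Percentage throughput gained by enabling SMT."""
    return (score_smt_on / score_smt_off - 1.0) * 100.0

# Illustrative numbers only: 1380 vs 1000 points -> 38% SMT yield
print(round(smt_yield(1380.0, 1000.0)))  # 38
```

A negative result from this formula would be exactly the "SMT takes" case discussed earlier in the thread, where sibling-thread contention costs more than the extra thread gains.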
 
Again, no need to disable SMT when you can solve the issue and get the best of both worlds with free software.

I both game and run week-long, 100% core-scalability heavy CPU loads.
Having to enable/disable SMT for those two scenarios is just not optimal.

Clicking one checkbox in Project Mercury to give my games max speed is a lot easier/faster.
Best of all, I can do this while running my game during the heavy CPU load, and still get the no-SMT benefits for the game and the SMT benefits for the background heavy CPU load.

Once you reach this scenario, the benefits from Project Mercury run into the >100% fps improvements in games. This is, however, due to more issues than just SMT and CCX getting fixed.


Ok, after seeing all of this conversation about your tool... I want it. Where is a link? ;) And does it work on server OSes, or do they not have this issue? And do you license it for enterprise use if it does work?
 
I'm assuming you're talking about the SMT yield figures I posted from TheStilt? If so, it's 38% vs 30% in the selected workloads (not 60% vs 30%), the workloads are all specified, and many of them ARE real world workloads (e.g. three separate video encoding tasks, 3D rendering in Blender, file compression/decompression with 7-Zip and WinRAR, code compilation with GCC, etc.)

Yeah, it was one of theStilt's graphs pretty sure. SMT scaling or something.

Those particular workloads don't interest me though; I never do those tasks. Did he have other examples, like a few games? (I don't recall seeing any.) Were there any examples that showed SMT being a net negative? (If not, it's an incomplete picture.)
 
Yeah, it was one of theStilt's graphs pretty sure. SMT scaling or something.

Those particular workloads don't interest me though; I never do those tasks. Did he have other examples, like a few games? (I don't recall seeing any.) Were there any examples that showed SMT being a net negative? (If not, it's an incomplete picture.)

I already addressed this when Snowdog said the same thing. The post is an architectural analysis and that particular part of the post was investigating SMT implementation and yield. It would be really stupid to include a bunch of workloads that don't scale with SMT if you're investigating SMT yield differences between architectures. There are plenty of other outlets that have done simple SMT on/off comparisons across a variety of workloads (one I linked in this very thread) that you can find with a very simple google search.
 
Yeah, it was one of theStilt's graphs pretty sure. SMT scaling or something.

Those particular workloads don't interest me though; I never do those tasks. Did he have other examples, like a few games? (I don't recall seeing any.) Were there any examples that showed SMT being a net negative? (If not, it's an incomplete picture.)

Then you are the wrong market/target audience for SMT
 
SMT4 is an enterprise solution NOT INTENDED FOR DESKTOPS. I'm sure, being AMD, they will enable it for desktops.... maybe. I really kind of hope that if they do, it's only on Threadripper CPUs.
 
Not if you don't want it because it decreases performance for you..

We can see SMT4 implementation routes that both include and preclude usefulness for consumers. We don't know which way, or even which ways, AMD will take Zen.

So far the people that are in this target audience is 1 person: GoodBoy

You're making it personal. This is a forum with individuals that are authenticated only by handle and password, and thus is nearly anonymous. It's counterproductive.

SMT4 is an enterprise solution NOT INTENDED FOR DESKTOPS.

I don't see it. Even Intel has a habit of designing cores for each generation and then using that same core from low-power to top-end consumer to HEDT to many-core server CPUs. AMD does this even more; every Zen die short of their APUs and their upcoming console processors is the same part on the wafer.

Thus, it would make sense for SMT4 to be something that every Zen 3 die is capable of, with the scenario where it doesn't reach consumers due to the extra threads being disabled and / or the necessary cache being disabled. Which is admittedly a possibility, but at the same time, if their implementation shows performance gains across consumer workloads as well -- and given that AMD can add cache fairly easily and cheaply -- we might see AMD push SMT4 across their product lineup.


Speculation: AMD could produce a low-power APU with high single-core burst speeds with four cores and sixteen threads. With updated graphics tuned for mobile, they'd have a pretty compelling part from a performance perspective.
 
Speculation: AMD could produce a low-power APU with high single-core burst speeds with four cores and sixteen threads. With updated graphics tuned for mobile, they'd have a pretty compelling part from a performance perspective.

Now that would be pretty cool!

As for singling out GoodBoy, I did that because he posted in a thread about SMT4 while we were discussing the benefits of SMT and their different implementations and said that he/she never runs those workloads that would benefit from SMT and isn't interested in looking at them. So why post in this thread?
Most everyone here knows that most games benefit very little from SMT so why show benchies for them...
 
... It would be really stupid to include a bunch of workloads that don't scale with SMT if you're investigating SMT yield differences between architectures. There are plenty of other outlets that have done simple SMT on/off comparisons across a variety of workloads (one I linked in this very thread) that you can find with a very simple google search.

It's not stupid to include those workloads if AMD is trying to sell me on SMT4 in a desktop processor.... and someone writing an article on SMT should be including those cases where SMT is NOT beneficial, or not as beneficial, or a detriment, otherwise it is an incomplete picture as I stated in my previous post...
 
It's not stupid to include those workloads if AMD is trying to sell me on SMT4 in a desktop processor.... and someone writing an article on SMT should be including those cases where SMT is NOT beneficial, or not as beneficial, or a detriment, otherwise it is an incomplete picture as I stated in my previous post...

I'm pretty sure the SMT benefiting games has been beat to death already. You should be looking at real cores. No sense beating a dead horse.......
 