GoodBoy
If I use a screwdriver as a hammer, it's not a bad hammer. I'm just an idiot.
Liked just for this =)
It's clear that Windows 10 1803 does NOT take SMT into account...
In short: the Windows 10 1803 CPU scheduler does not seem to be SMT aware or SMT optimal.
I think he was testing an Intel CPU in his previous post, so...
It does, just not optimally for that particular CPU design.
Windows has been SMT aware since Windows XP. The introduction of chiplets needed to be accounted for in the scheduler logic, and it is once you are on 1903 for the Ryzen CPUs.
I was talking about SMT in general; it just happens that the only CPU I have running Windows 10 to test the 1903 version was an Intel one.
I just wanted to emphasize, before the fanboys go crazy, that the SMT issue exists for both Intel and AMD.
Anyway, I did a quick test in Windows 10 1803 on my laptop, and it clearly shows the SMT penalty.
> Cinebench R15 <
4 threads / normal: 357, 360, 363, 371, 366 = 363.4 average
4 threads / affinity 0 2 4 6: 383, 383, 379, 383, 381 = 381.8 average
It's clear that Windows 10 1803 does NOT take SMT into account: going from 363.4 to 381.8 is slightly above a 5% boost just from handling SMT correctly by hand.
Side note: the two non-383 results in Cinebench were due to me not having Task Manager ready to set affinity, so it had to load Task Manager and set affinity first, which naturally reduces benchmark results.
So avoiding SMT thread conflicts also gives far more stable Cinebench results.
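For anyone who wants to reproduce the "affinity 0 2 4 6" run without racing Task Manager, here is a minimal sketch using the third-party psutil package. It assumes the common Windows enumeration where logical CPUs 0/1 share the first physical core, 2/3 the second, and so on; verify that numbering on your own machine before trusting the results.

```python
# Minimal sketch: pin a process to one logical CPU per physical core.
# ASSUMPTION: logical CPUs 0/1 share the first physical core, 2/3 the
# second, etc. (the typical Windows enumeration). Requires psutil.
import psutil

def pin_to_physical_cores(pid=None):
    proc = psutil.Process(pid)            # None = the current process
    n_logical = psutil.cpu_count(logical=True)
    mask = list(range(0, n_logical, 2))   # keep 0, 2, 4, 6, ...
    proc.cpu_affinity(mask)               # same effect as Task Manager's affinity dialog
    return mask

if __name__ == "__main__":
    print("Affinity set to logical CPUs:", pin_to_physical_cores())
```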
On your first point:
If you are using Task Manager or similar to see anything about thread load distribution, you are doing it wrong to begin with. This is not meant as an insult but to clear up a very common misunderstanding about the CPU load scheduling topic.
This was even done by Kyle when testing thread scalability in some games, so don't feel bad about it.
This is simply not how you see thread distribution/load, because what you see is an average over time. So even though you see, say, 8% load across all cores in Task Manager,
it does not mean that more than 1 core was in use at any given moment.
You need to know the difference between "instant" and "over time" measurements.
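A quick way to see the "instant" vs "over time" difference yourself (again assuming the third-party psutil package): start one busy process and sample per-CPU load over a short window. With a short interval you will usually catch one logical CPU near 100%; average over a longer window and the same load smears out across cores as the thread hops.

```python
# Minimal sketch of "instant" vs "over time": run one busy process and
# sample per-CPU load in short windows. Roughly one logical CPU should
# be near 100% in each sample, even though a long-window average (like
# Task Manager's) makes the load look spread across all cores.
import multiprocessing
import psutil

def busy():
    while True:
        pass

if __name__ == "__main__":
    worker = multiprocessing.Process(target=busy, daemon=True)
    worker.start()
    for _ in range(10):
        loads = psutil.cpu_percent(interval=0.1, percpu=True)  # 0.1 s window
        print(" ".join(f"{x:5.1f}" for x in loads))
    worker.terminate()
```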
Some people might say 5% is no big deal, but seeing how people went crazy over 100 MHz of turbo boost on Ryzen, it's kind of funny to watch them lose 5% performance from the scheduler not being SMT aware, and up to 20% from it not being CCX aware.
In short: the Windows 10 1803 CPU scheduler does not seem to be SMT aware or SMT optimal.
(and you need to be using Project Mercury to get maximum performance)
Will test 1903 as soon as I can
The penalty you're seeing is a result of the scheduler and not anything inherent to SMT, then. I was thinking you were saying that enabling SMT would result in a decrease in performance in 7-zip, which I've found to be untrue at least for more recent builds of 7-zip and modern architectures.
As for using Task Manager to visualize workload distribution, it actually works just fine for this simple task, and even averaging workload over a few seconds doesn't obfuscate anything. If I start a heavy, single-threaded workload on my 1700X, Windows seems to be smart enough to pin it to a single core, and Task Manager shows a single core loaded to 100% (or 7-8% overall utilization due to the single thread and background processes). As soon as I toss another thread into the mix, I observe thread-hopping and no single logical core sees more than about 20% usage at any given time (due to averaging over the refresh time). No, I'm not seeing combinations of two logical CPUs bouncing to 100% as the workload moves around, but I also don't need to in order to validate (or invalidate, as the case may be) the behavior I was talking about.
SvenBent: I'm dumb and you're right; this doesn't actually address whether or not the scheduler is being smart enough to not toss a second thread onto a single core. The problem is that forcing processor affinity isn't really a valid test either since in that case you're also preventing thread hopping between physical cores. A more proper test would be to turn off SMT at the BIOS level and then perform the test again.
Affinity does not lock a thread to a given core;
it locks the process to a set of cores, so the 4 threads are still jumping back and forth across the 4 cores the process is assigned to.
You can follow this thread load/distribution with Process Monitor from Microsoft, and I can highly recommend it, as it's the only software I'm aware of that actually shows thread-level information.
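If you want something scriptable next to a GUI tool, psutil can at least enumerate per-thread CPU times. A minimal sketch (note it shows how busy each thread has been, not which core it last ran on):

```python
# Minimal sketch: list per-thread CPU times for a process with psutil.
# This shows how busy each thread has been, not which core it ran on.
import os
import psutil

def thread_times(pid):
    for t in psutil.Process(pid).threads():   # pthread(id, user_time, system_time)
        print(f"thread {t.id}: user={t.user_time:.2f}s sys={t.system_time:.2f}s")

if __name__ == "__main__":
    thread_times(os.getpid())
```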
This does not, however, invalidate your argument that a real test would be turning SMT off.
You are right that that would be more proper testing.
However, my laptop does not have a fancy enough BIOS for this.
BUT
I can tell you that in the years of testing I have done on this for Project Mercury,
the difference between SMT off and affinity comes out within error margins.
I'll see if some of my earlier testing on this is still available.
Let me point out that I'm not saying SMT is not a good thing. It's great for heavily threaded situations: running Cinebench with the full number of threads gives a clear benefit to SMT CPUs.
My point is just that SMT has a drawback in some situations, and:
a: Windows does not handle it well
b: Project Mercury absolutely rocks at this (self-ad here)
c: with SMT4, both the performance gains AND the penalty will grow, so SMT4 might be bad for gamers
This is not a black/white situation. It just happens that most benchmark sites treat software as either single threaded or fully threaded when looking at SMT, and there is a ton of software in between, especially modern games.
--- edit ---
I really should upload my testing to my website instead of dumping it in random forums, lol.
Ah, right, because you're setting the affinity for the whole application to a number of CPU cores. Duh.
If I have time tonight, I can do some SMT on/off testing on my 1700X limited to eight cores. Might be enlightening. (We are VERY off topic at this point!)
Yea, I mean pipeline stages. That's why I think Ryzen is very Netburst-like, because it can make really good use of memory bandwidth. AMD found a way to make it work, and not worry about clock speed as much. I doubt AMD would make each SMT4 core specialized. AMD might have found a way to lengthen and shorten pipelines dynamically, so it makes it easy to divide a core up into 4 threads.
It's been a while since it mattered, but "pipeline stages" are what I think you're getting at. Pipelines might be... number of threads? In any case, Netburst had an absurd number of pipeline stages, as it was designed for high-bandwidth media processing, something it actually was very good at. Where it failed was when the branch predictor missed and the whole pipeline had to be restarted. This negated Netburst's clock speed advantage for the most important work a CPU does: chewing through branching code. So while Intel wasn't uncompetitive, and they certainly had a better platform around their CPUs at first, once AMD and their partners stabilized the platform and then moved the memory controller onboard, it was game over for Netburst. Except for Blu-ray players: the first ones ran Pentium 4s, because they were good at that!
With respect to 'more pipelines', if you mean more execution units per core, then that can increase IPC when properly balanced. However, that balance is important, because increasing execution units only works if they can be fed by the rest of the CPU and the system. Typically, more cache is needed, better resource allocators are needed, and even more threads given to the OS as in SMT4.
And that brings up a central question: is AMD going to outfit their SMT4-enabled cores for a variety of workloads? If they are, could that be useful to the consumer / outside of the enterprise? If not, are they going to keep the 'leaner' cores they have now and continue to improve them for the consumer?
Ah, so the penalty you're seeing is academic
Not sure what you mean by academic here.
The resulting penalty is present in most games, depending on how many CPU-heavy threads the software has versus how many physical cores the system has.
I'm just using an easier program to show the penalty.
To explain further, let's say your software has 4 CPU-heavy threads:
- You have a 2-physical-core CPU: SMT will help, as you get the benefits of SMT and none of the drawbacks (all physical cores are utilized either way).
- You have a 4-physical-core CPU: SMT will hurt, because without SMT you would utilize all 4 physical cores, but with SMT you run into the thread conflicts described above, where 2 threads land on the same physical core instead of 2 different ones.
So it's very much a hit-and-miss issue, and you need the correct understanding to see it.
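If you want to measure the conflict directly, here is a minimal sketch: it times the same fixed workload on two workers pinned either to sibling logical CPUs or to separate physical cores. It assumes the usual Windows-style numbering where logical CPUs 0 and 1 share the first physical core while 0 and 2 do not; check your own topology first. Requires the third-party psutil package.

```python
# Minimal sketch: time a fixed workload on two workers pinned either
# to sibling logical CPUs (same physical core) or to separate physical
# cores. ASSUMPTION: logical CPUs 0/1 share a physical core and 0/2 do
# not; adjust for your topology. Requires psutil.
import multiprocessing
import time
import psutil

def worker(cpu, n=20_000_000):
    psutil.Process().cpu_affinity([cpu])  # pin this worker to one logical CPU
    x = 0
    for i in range(n):                    # fixed CPU-bound workload
        x += i * i
    return x

def run(cpus):
    start = time.perf_counter()
    procs = [multiprocessing.Process(target=worker, args=(c,)) for c in cpus]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"same physical core (0,1):       {run([0, 1]):.2f}s")
    print(f"different physical cores (0,2): {run([0, 2]):.2f}s")
```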
Luckily results with SMT on/off even with multi-CCD SKUs like the 3900X see differences that are more or less margin of error in gaming benchmarks: https://www.techpowerup.com/review/amd-ryzen-9-3900x-smt-off-vs-intel-9900k/
The issue with this kind of testing, which does not focus on the problem, is that you run into factors that mask it.
Also, I don't see any reported error margins, which again, IMHO, is just poor testing methodology (I might have missed it).
1: A CPU performance drop will not be visible in a test that is GPU bottlenecked.
So looking at 4K gaming, you are not seeing the extra performance because you can't use it.
Even at lower resolutions there will be times where you might be GPU bottlenecked, so the measured difference on the CPU side will be smaller.
This is the very reason that when we do proper CPU comparisons, we do them at low resolution: to learn the difference between the CPUs themselves.
2: Ryzen also has the CCX issue to handle, which is a bigger penalty than the SMT issue, so the performance boost from disabling SMT can be masked: the lower number of logical cores makes threads jump back and forth between CCXs more, giving a bigger CCX penalty.
Or said differently: the benefit from disabling SMT might be countered by the CCX issues that disabling SMT aggravates.
3: The 3900X has an enormous number of physical cores. This increases the chance of a thread hitting a spare physical core rather than an already-busy one, so needless to say, the issue is naturally smaller on this CPU than on CPUs with lower core counts.
4: This benchmark has no information about thread load distribution, so we don't know if we are even measuring the situation in question.
This is important to take into account when we read the benchmarks
Let's look at the 720p results for the 3900X only.
All games:
SMT on = 97.8%
SMT off = 100%
So we see a little over a 2% performance increase across the board.
This is very small, yes. Is it within error margins? I don't know; the article is very weak on data samples.
But given the 5% difference we saw in Cinebench, which is solely CPU dependent, would it be weird to see 2-3% in a situation that is not only CPU dependent?
Not at all. So while this is not a super confirmation, in my eyes it is still more evidence for the difference than against it.
Also, this result is an average over situations that are not even all relevant to the issue; some of the games might scale perfectly and thereby not have an SMT penalty at all.
I don't know; the article lacks technical information and I don't know the games by heart.
7 of the benchmarks showed gains from disabling SMT; 3 did not.
Again, small gains, so not a huge confirmation, but it still points more towards "there IS sometimes a penalty with SMT" than "there is NEVER a penalty".
Here is where it gets interesting:
When SMT helps, the result never goes further than slightly above the non-SMT score; no other CPUs are "skipped".
When SMT is shown to hurt, the difference is in some cases enough to skip past several CPUs.
Look at Metro Exodus and Rage 2.
Let's remove anything within a 3.3% error margin. What we have left is:
Far Cry 5 = 3.6% boost from disabling SMT
Metro Exodus = 9.6% boost from disabling SMT
Rage 2 = 4.5% boost from disabling SMT
Shadow of the Tomb Raider = 3.35% boost from disabling SMT
Wolfenstein II = 5% boost from disabling SMT
Whenever there is more than a 3.3% difference, it is ALWAYS in favor of disabling SMT in the test you provided.
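For what it's worth, the filtering above is easy to script. A minimal sketch: the FPS pairs are normalized to an SMT-on baseline of 100 so the deltas match the percentages quoted in this post; they are NOT the raw review numbers.

```python
# Minimal sketch of the margin filtering above. Illustrative values
# normalized to an SMT-on baseline of 100, matching the percentages
# quoted in this post rather than the raw review data.
MARGIN = 3.3  # percent

results = {  # game: (FPS with SMT on, FPS with SMT off)
    "Far Cry 5":                 (100.0, 103.6),
    "Metro Exodus":              (100.0, 109.6),
    "Rage 2":                    (100.0, 104.5),
    "Shadow of the Tomb Raider": (100.0, 103.35),
    "Wolfenstein II":            (100.0, 105.0),
}

for game, (smt_on, smt_off) in results.items():
    delta = (smt_off - smt_on) / smt_on * 100
    if abs(delta) > MARGIN:
        print(f"{game}: {delta:+.2f}% from disabling SMT")
```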
"Margin of error" being a colloquialism meaning "within a few percent" or "likely below notice." To bring out the word academic again, the issues you mention here are largely academic as well, meaning that whatever the cause, the differences are mostly small and the end user is likely unconcerned about their origin. It's well known the SMT is usually no good for games (hence one of the reasons why people prefer the 9700K for gaming systems), but the advantages in other workloads generally outweigh the costs.
"Margin of error" being a colloquialism meaning "within a few percent" or "likely below notice." To bring out the word academic again, the issues you mention here are largely academic as well, meaning that whatever the cause, the differences are mostly small and the end user is likely unconcerned about their origin. It's well known the SMT is usually no good for games (hence one of the reasons why people prefer the 9700K for gaming systems), but the advantages in other workloads generally outweigh the costs.
Can you provide any testing that shows there is no penalty from SMT in low-multithreaded situations?
I believe you are confusing "works with SMT because it is SMP capable" with being actually SMT aware.
SMT aware = don't do stupid distributions of threads onto logical cores of the same physical core when another physical core is available.
SMP aware = I can distribute load to multiple (logical) cores.
I believe you are talking about the latter. I'm talking about the former.
And no, Task Manager does not work for this:
single-threaded software will still show load distributed across all cores in Task Manager.
On the SMT issue:
The performance issue comes from 2 threads getting assigned to 2 logical cores that sit on the same physical core and thereby have to share resources.
The problem is that the Windows scheduler treats all logical cores as equal, even when there is already load on a core's paired logical sibling.
So the issue IS rooted in how SMT works, but it only takes effect because the Windows scheduler is not aware of how SMT actually works.
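To see which logical cores actually share a physical core, you can read the topology yourself. A minimal sketch, Linux-only since it reads sysfs (on Windows the same information comes from the GetLogicalProcessorInformationEx API):

```python
# Minimal sketch: map logical CPUs to their SMT siblings, i.e. the
# groups that share one physical core. Linux-only (reads sysfs); on
# Windows the same data comes from GetLogicalProcessorInformationEx.
import glob

def smt_sibling_groups():
    groups = set()
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"):
        with open(path) as f:
            text = f.read().strip()      # e.g. "0,8" or "0-1"
        if "-" in text:
            lo, hi = text.split("-")
            groups.add(tuple(range(int(lo), int(hi) + 1)))
        else:
            groups.add(tuple(int(x) for x in text.split(",")))
    return sorted(groups)

if __name__ == "__main__":
    for group in smt_sibling_groups():
        print("logical CPUs sharing one physical core:", group)
```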
Just to recap my point:
SMT4 can carry an even bigger SMT penalty for games than SMT currently does,
and people need to take into account which benchmark they read and how it loads cores before applying any gains to their own usage scenario.
A 15% boost in Cinebench could be a 5% loss in games, because the thread/core load is extremely different.
The best of both worlds!
Just to clarify my other point, with another shameless plug:
Having to disable and enable SMT is a horrible process, since you have to reboot to regain maximum performance.
My Project Mercury tool does this live by assigning affinity, avoiding that drawback.
So with Project Mercury you get the best of both worlds:
a: all your logical cores and massive performance when your software scales highly
b: threads don't get starved by unneeded core sharing when the number of threads is low
And as a bonus, using PM instead of really disabling SMT, you still keep the benefits of the extra logical cores to offload background tasks while your game gets the full performance of not having to share physical cores.
And hey, it's free.
The scheduler has to be aware of both SMT/HT and SMP to efficiently schedule threads.
I was talking about SMT. Win2k was not SMT aware, and it treated it as if it were an SMP scenario. That was fixed in Windows XP... And for new chiplet/CCX CPU designs, it was accounted for in build 1903.
Setting affinity to a particular core will keep it on that core. Unless this is some new bug, or how the Ryzen scheduler bug presents itself on pre-1903 Windows...
This is the whole point of SMT, multiple threads sharing a core. Granted it is obviously better to put a thread on another physical core if there is one available.
I don't believe this to be the case, at least not something observed on my Intel's logical cores. If you are referring to the pre-1903 bug for Ryzen, not sure why we are hashing it out, as it has been fixed (reportedly) by Windows 10 build 1903...
I really do not believe this is the case.
Agree completely. Perhaps something is lost in translation on my interpretation of the points you were trying to make in the other posts.
I never realized that early Blu-ray players used P4s. I wonder how power hungry they were... I figured they were fixed-function hardware instead.
Fun to theorize about, but I don't think this is something we will have to worry about in desktop CPUs.
And for someone with a 12 or 16 core Ryzen, they should be disabling SMT to get the best "typical use case" performance. I know I would.
Yeah, I believe that 4-way SMT in a CPU would not ultimately be a good idea for consumers, gamers, or even workstation workloads.
And it's just a rumor at this point.
There are 4-way and 8-way SMT processors, but those are relegated to Xeon Phi and other server-market processors. I think if this was something beneficial on the desktop, we would have already seen it.
The post with the theoretical 60% perf boost on AMD's SMT vs a 30% boost on Intel's HT: I really don't know where the guy got his figures from... and I read that post. I know that in real-world workloads, SMT is not all that beneficial (the point I think you are making).
Again, no need to disable SMT when you can solve the issue and get the best of both worlds with free software.
I both game and run week-long, 100%-core-scalability heavy CPU loads.
Having to enable/disable SMT for those two scenarios is just not optimal.
Clicking one checkbox in Project Mercury to give my games max speed is a much easier/faster solution.
Best of all, I can do this while my game runs during the heavy CPU load, and still get the no-SMT benefits for the game and the SMT benefits for the heavy background CPU load.
Once you reach this scenario, the benefits from Project Mercury run into the >100% FPS improvements in games; this is, however, due to more issues than just SMT and CCX getting fixed.
I'm assuming you're talking about the SMT yield figures I posted from TheStilt? If so, it's 38% vs 30% in the selected workloads (not 60% vs 30%), the workloads are all specified, and many of them ARE real world workloads (e.g. three separate video encoding tasks, 3D rendering in Blender, file compression/decompression with 7-Zip and WinRAR, code compilation with GCC, etc.)
Yeah, it was one of TheStilt's graphs, pretty sure. SMT scaling or something.
Those particular workloads don't interest me though; I never do those tasks. Did he have other examples, like a few games? (I don't recall seeing any.) Were there any examples that showed SMT being a negative benefit? (If not, it's an incomplete picture.)
Then you are the wrong market/target audience for SMT
Everyone is the target market for SMT.
Not if you don't want it because it decreases performance for you.
So far, this target audience consists of exactly one person: GoodBoy.
SMT4 is an enterprise solution NOT INTENDED FOR DESKTOPS.
Speculation: AMD could produce a low-power APU with high single-core burst speeds with four cores and sixteen threads. With updated graphics tuned for mobile, they'd have a pretty compelling part from a performance perspective.
... It would be really stupid to include a bunch of workloads that don't scale with SMT if you're investigating SMT yield differences between architectures. There are plenty of other outlets that have done simple SMT on/off comparisons across a variety of workloads (one of which I linked in this very thread) that you can find with a very simple Google search.
It's not stupid to include those workloads if AMD is trying to sell me on SMT4 in a desktop processor... And someone writing an article on SMT should include the cases where SMT is NOT beneficial, or not as beneficial, or a detriment; otherwise it's an incomplete picture, as I stated in my previous post...