Techgage Tests 2990WX Performance Scaling With Coreprio

AlphaAtlas

Some 2990WX users have been complaining about performance regressions in Windows, and we noticed some strangeness in our own Threadripper benchmarks as well. Fortunately, Level1Techs seemingly nailed down the issue in a series of articles from a few weeks ago, and pointed to Coreprio as a good solution to Windows' scheduling issues. Techgage just put that solution to the test in a wide range of benchmarks, and the results are interesting, to say the least. Adobe Premiere Pro renders, which didn't scale with the 2990WX's cores in our testing, saw a massive improvement with Coreprio, while other programs like Blender didn't seem to benefit at all. They also compared Windows to Linux performance in Geekbench, and for whatever reason, saw a night-and-day improvement when switching to the open source OS.

Does all of this mean that Linux is the best OS for a chip like the 2990WX? It's really hard to believe otherwise. To base that off of GeekBench alone would be nonsense, but we have other testing experience to back up those opinions. Blender almost always performs better in Linux than in Windows, so the fact that a many-core chip works better in the penguin OS isn't a huge surprise... Fortunately, using either DLM or Coreprio won't hurt your performance in other areas too much, but it's important to note that it can in fact negatively impact them. On the flipside, if you bought a 2990WX (or 2970WX) and are running against a regression, you shouldn't hesitate in giving the tool a test. Don't like the result, or don't need it active all of the time? All you need to do is simply stop the service from within the applet, and you'll be back to normal.
 
From what I see, there is more performance regression from using Coreprio than not over that suite of benchmarks,
and it has the same unstable performance profile as it did in Kyle's test.
 
I saw from your earlier thread on testing that they were using

Windows 10 Pro v1803

Does using Windows Server make any difference?
 
Since I found Windows' scheduler was treating my R7 1700 like all of the 16 threads were identical (making software threads share cores despite there being plenty of uncontended hardware threads to go around), nothing like this surprises me.
 
The question is if AMD is pressuring Microsoft to fix this issue. It still baffles me how they did not do anything about it prior to launch.
 
Since I found Windows' scheduler was treating my R7 1700 like all of the 16 threads were identical (making software threads share cores despite there being plenty of uncontended hardware threads to go around), nothing like this surprises me.

That's why you use Project Mercury :D
 
The question is if AMD is pressuring Microsoft to fix this issue. It still baffles me how they did not do anything about it prior to launch.

The most likely explanation for that is that the problem is so integral to Windows that fixing it would involve death by a thousand cuts. No surprise whatsoever that Linux doesn't suffer from the issue, considering that Linux is designed to run NUMA-based supercomputer clusters.

It'll be fixed, when Microsoft adopt their own Linux distro and dump that horrible kernel, scheduler and file system they're using. ;)
 
Wonder if this will be a problem for the next generation Ryzen CPUs.
 
From what I see, there is more performance regression from using Coreprio than not over that suite of benchmarks,
and it has the same unstable performance profile as it did in Kyle's test.
Have you contacted Kyle to suggest he tries a few benchmarks with PM?
 
Have you contacted Kyle to suggest he tries a few benchmarks with PM?

Not in regard to the AMD 2990WX (PM won't fix this yet).
I did earlier, but he was uninterested, even though PM has shown huge performance increases on both AMD and Intel systems with SMT/CMT or CCX.

I have a few ideas based on Wendell's findings, but I do not have a 2990WX to test on.
I did reach out to AMD earlier for test samples of Ryzen systems for the CCX non-switch testing, but again, nothing.
 
That's why you use Project Mercury :D
That doesn't look like it would do anything for my problem. I usually do have more HW threads than I know what to do with, and therefore priorities usually don't have much effect. Coreprio is closer, but there's still some stuff it can't handle very well, like games that want all 16T active at one point in each frame and want 2-4T to not interfere with each other at other points.
 
That doesn't look like it would do anything for my problem. I usually do have more HW threads than I know what to do with, and therefore priorities usually don't have much effect. Coreprio is closer, but there's still some stuff it can't handle very well, like games that want all 16T active at one point in each frame and want 2-4T to not interfere with each other at other points.

I honestly don't know what you mean by HW threads.
Threads are software.

But if you have an abundance of logical cores with fewer physical cores backing them up, and the issue is putting your (software) threads onto the right logical cores to maximize usage of the physical cores, then yes, Project Mercury fixes this.
That was what it sounded to me like you were describing.


To quote you:
"Since I found Windows' scheduler was treating my R7 1700 like all of the 16 threads were identical"
OK, I now see you use the word threads for logical cores.

But yes, if the problem is what you mentioned here, all logical cores being seen as equal, then Project Mercury fixes this by assigning (software) threads to the right logical cores (what you call threads) to maximize usage of the physical cores

or to avoid cross-CCX communication.

/www.reddit.com/r/BattleRite/comments/97vv24/serious_performance_boost_on_ryzen_with_project/
23% FPS speedup on a Ryzen 1700 CPU, due to Project Mercury utilizing your CPU better than the Microsoft scheduler.

Also, in CS:GO I found a 9% improvement in FPS and a 25% reduction in FPS variance on a Ryzen 1700 system.

Properly assigning threads on SMT/CMT systems and avoiding CCX cross talk is a huge benefit for optimizing performance.
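
For anyone following along, here is a rough sketch of what "the right logical cores" could look like as affinity bitmasks on an 8C16T, two-CCX chip like the R7 1700. It is not taken from Project Mercury or Coreprio. It assumes that logical CPUs 2k and 2k+1 are SMT siblings on physical core k and that cores 0-3 sit on CCX 0 and cores 4-7 on CCX 1; the real layout should be queried from the OS. The function names are made up for illustration.

```python
def one_thread_per_core_mask(physical_cores: int, smt_width: int = 2) -> int:
    """Bitmask selecting only the first logical CPU of every physical core.

    Assumes logical CPUs are numbered so that SMT siblings are adjacent.
    """
    mask = 0
    for core in range(physical_cores):
        mask |= 1 << (core * smt_width)
    return mask


def ccx_mask(ccx_index: int, cores_per_ccx: int = 4, smt_width: int = 2,
             include_siblings: bool = True) -> int:
    """Bitmask covering the logical CPUs of a single CCX (assumed layout)."""
    mask = 0
    first_core = ccx_index * cores_per_ccx
    for core in range(first_core, first_core + cores_per_ccx):
        base = core * smt_width
        width = smt_width if include_siblings else 1
        for sibling in range(width):
            mask |= 1 << (base + sibling)
    return mask


# One logical CPU per physical core on an 8C16T chip.
print(hex(one_thread_per_core_mask(8)))            # 0x5555
# Everything on the second CCX, so threads never talk across CCXs.
print(hex(ccx_mask(1)))                            # 0xff00
# Second CCX, one logical CPU per physical core.
print(hex(ccx_mask(1, include_siblings=False)))    # 0x5500
```

A mask like this is what would be handed to an affinity call such as `SetProcessAffinityMask` on Windows.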
 
like games that want all 16T active at one point in each frame and want 2-4T to not interfere with each other at other points.

Project Mercury also handles this if you enable Automatic mode.
If your game is using less than 50% of CPU time, it will be spread out to avoid landing on the same physical core.
If your game can eat more than 50% of the CPU time, it will get access to all logical cores.
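
The 50% rule described above can be sketched roughly as follows. This is only an illustration of the rule as stated in this post, not Project Mercury's actual code. The function name is hypothetical, the adjacent-sibling numbering of logical CPUs is an assumption, and the behavior at exactly 50% is a guess because the post does not specify it.

```python
def automatic_mode_mask(cpu_share: float, physical_cores: int = 8,
                        smt_width: int = 2, threshold: float = 0.5) -> int:
    """Pick an affinity mask from a process's share of total CPU time.

    cpu_share is a fraction between 0.0 and 1.0 of all logical CPUs.
    Assumes logical CPUs 2k and 2k+1 share physical core k.
    """
    logical_cpus = physical_cores * smt_width
    if cpu_share > threshold:
        # Heavy load: allow every logical CPU, SMT siblings included.
        return (1 << logical_cpus) - 1
    # Light load (or exactly at the threshold, by assumption):
    # one logical CPU per physical core, so threads do not share a core.
    return sum(1 << (core * smt_width) for core in range(physical_cores))


print(hex(automatic_mode_mask(0.375)))  # 0x5555
print(hex(automatic_mode_mask(0.80)))   # 0xffff
```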
 
If Mercury does that, you ought to update its page 1 description. Apparently you aren't advertising half your features. ;)

Project Mercury also handles this if you enable Automatic mode.
If your game is using less than 50% of CPU time, it will be spread out to avoid landing on the same physical core.
If your game can eat more than 50% of the CPU time, it will get access to all logical cores.

That doesn't actually solve the problem unless you can measure and tweak it at least every 5 milliseconds. 10 ms would be weak but probably still a bit useful, and at 20 ms it'll probably do more harm than good in the games I'm thinking of. I'll give it a shot later anyway.
 
Quick, someone sue AMD because it doesn't scale linearly up to 32 threads. :rolleyes:
 
If Mercury does that, you ought to update its page 1 description. Apparently you aren't advertising half your features. ;)

That doesn't actually solve the problem unless you can measure and tweak it at least every 5 milliseconds. 10 ms would be weak but probably still a bit useful, and at 20 ms it'll probably do more harm than good in the games I'm thinking of. I'll give it a shot later anyway.

Point taken :D

I'm not sure why you think you need to tweak it every 5 ms (it's possible to do that); most games/programs do not change their thread behavior that often.
Currently I think the auto mode does it every 500 ms, but I'm honestly not sure off the top of my head.

Doing it more often = loss of CPU time to managing core affinity, aka overhead cost.
One of the things I pride myself on with PM is that it uses way fewer resources than any of its known competition (Process Lasso, Process Tamer, wintopprio) because the priority adjustment does not rely on a timed loop check that eats up CPU time.
And the benchmarks tell the story fine of how much improvement there is from correctly handling the threads, even though it happens only once.


Again, I'm not sure how increasing performance in the case you mentioned does not "solve the issue".
Did you actually try it, or are these just made-up claims because that's what you think things are?


--- edit ---
I realize you might be talking about solving it as in 100% solving it,
whereas I'm talking about solving it as in 90% solving it, as in getting performance gains back.
 
For median games your way is good for 90% and I am talking about 98+%, yeah.

Games do change their threading behavior that often in practice, because some parts of a frame are wider than others. All the threads are always spawned (spawning and killing threads really is slow), but most aren't running at any given moment. Using Planetside 2 as an example just because I know my way around its tech well, average utilization on an 8C16T CPU doesn't go beyond ~6T, so PM will always have it in non-SMT mode, but the game still thinks that it's on a 16T CPU and will briefly try to have 16 threads active at least once per frame. When it does, some of those workers will needlessly have to wait and it'll be slower than not having the extra workers in the first place.

If Windows' scheduler is as broken for other people as it is for me, games that behave like Planetside will benefit from PM on many-core CPUs, but will benefit even more from disabling SMT entirely.

Worlds Adrift is a game for which PM may be the second-best solution (right behind MS getting their act together and fixing the damn scheduler). It spends most of its time at 2~3T utilization (with a particular thread clearly being the limiting factor), but has regular bouts of stutter which pass by noticeably faster on 8C16T than 8C8T. 999 ms isn't very responsive for that purpose, but it'll help.

My concern around 20 ms sampling was about oscillation. In that range, it could easily end up popping back and forth at an unfortunate rate and adding a lot of frametime variance.
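
To make the oscillation worry concrete, here is a small sketch that counts how often a plain 50% threshold rule, like the Automatic mode rule described earlier in the thread, would flip modes for a given series of utilization samples. The sample values are invented for illustration and are not measurements from any game.

```python
def count_mode_switches(samples, threshold=0.5):
    """Count how often a threshold rule flips between its two modes."""
    switches = 0
    previous_mode = None
    for share in samples:
        # True = all logical cores allowed, False = spread over physical cores.
        mode = share > threshold
        if previous_mode is not None and mode != previous_mode:
            switches += 1
        previous_mode = mode
    return switches


# Samples that hover around the threshold flip the mode on every sample.
hovering = [0.45, 0.55, 0.48, 0.52, 0.47]
# Samples that stay well below it never flip.
steady = [0.30, 0.32, 0.35, 0.31, 0.33]

print(count_mode_switches(hovering))  # 4
print(count_mode_switches(steady))    # 0
```

Each flip means another affinity change, which is where the extra frametime variance would come from if the sampling window happens to straddle the wide and narrow parts of a frame.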
 
PM in Planetside feels like having SMT disabled as far as averages, but there's an occasional frametime variance pattern I haven't felt from it before, and there was one big hitch that's also new.

PM in Worlds Adrift behaves about as I was expecting, but doesn't respond fast enough in practice to help out the long stuttery segments by very much, and again there are a couple of new stutters in there I haven't felt before.

I'd test with more, but those are actually the only two compute-heavy games I've got installed over on Windows right now (Fallout 2 won't illuminate much of anything).
 
Does it? All of the problems here are problems with Windows being clueless about the differences between logical cores, and none of it seems inherently tied to core count.
 