Techgage Tests 2990WX Performance Scaling With Coreprio

AlphaAtlas · Feb 1, 2019

Some 2990WX users have been complaining about performance regressions in Windows, and we noticed some strangeness in our own Threadripper benchmarks as well. Fortunately, Level1Techs seemingly nailed down the issue in a series of articles from a few weeks ago, and pointed to Coreprio as a good solution to Windows' scheduling issues. Techgage just put that solution to the test in a wide range of benchmarks, and the results are interesting, to say the least. Adobe Premiere Pro renders, which didn't scale with the 2990WX's cores in our testing, saw a massive improvement with Coreprio, while other programs like Blender didn't seem to benefit at all. They also compared Windows to Linux performance in Geekbench and, for whatever reason, saw a night-and-day improvement when switching to the open-source OS.

Does all of this mean that Linux is the best OS for a chip like the 2990WX? It's really hard to believe otherwise. To base that off of Geekbench alone would be nonsense, but we have other testing experience to back up those opinions. Blender almost always performs better in Linux than in Windows, so the fact that a many-core chip works better in the penguin OS isn't a huge surprise... Fortunately, using either DLM or Coreprio won't hurt your performance in other areas too much, but it's important to note that it can in fact negatively impact them. On the flip side, if you bought a 2990WX (or 2970WX) and are running into a regression, you shouldn't hesitate to give the tool a test. Don't like the result, or don't need it active all of the time? All you need to do is stop the service from within the applet, and you'll be back to normal.

GHRTW · Feb 1, 2019

So, Microsoft is the one that holds up the progress in CPU tech and Linux helps the advance. Who would believe it?

lazz · Feb 1, 2019

GHRTW said:
So, Microsoft is the one that holds up the progress in CPU tech and Linux helps the advance. Who would believe it?

I'm shocked. Shocked, I tell you.

SvenBent · Feb 1, 2019

From what I see, there is more performance regression from using Coreprio than from not using it over that suite of benchmarks,
and it has the same unstable performance profile it had in Kyle's test.

GoodBoy · Feb 1, 2019

So when is Microsoft releasing a patch for the issue?

katanaD · Feb 1, 2019

I saw from your earlier thread on testing that they were using

Windows 10 Pro v1803

Does using Windows Server make any difference?

ZyzzyxSilver · Feb 1, 2019

Since I found Windows' scheduler was treating my R7 1700 like all of the 16 threads were identical (making software threads share cores despite there being plenty of uncontended hardware threads to go around), nothing like this surprises me.

Hielo_loco · Feb 1, 2019

The question is whether AMD is pressuring Microsoft to fix this issue. It still baffles me how they did not do anything about it prior to launch.

SvenBent · Feb 1, 2019

ZyzzyxSilver said:
Since I found Windows' scheduler was treating my R7 1700 like all of the 16 threads were identical (making software threads share cores despite there being plenty of uncontended hardware threads to go around), nothing like this surprises me.

That's why you use Project Mercury.

Mazzspeed · Feb 1, 2019

Hielo_loco said:
The question is whether AMD is pressuring Microsoft to fix this issue. It still baffles me how they did not do anything about it prior to launch.

The most likely explanation for that is that the problem is so integral to Windows that fixing it would involve death by a thousand cuts. No surprise whatsoever that Linux doesn't suffer the issue, considering that Linux is designed to run NUMA-based supercomputer clusters.

It'll be fixed when Microsoft adopts their own Linux distro and dumps that horrible kernel, scheduler and file system they're using.

Yock · Feb 2, 2019

Wonder if this will be a problem for the next generation Ryzen CPUs.

Meeho · Feb 2, 2019

SvenBent said:
From what I see, there is more performance regression from using Coreprio than from not using it over that suite of benchmarks,
and it has the same unstable performance profile it had in Kyle's test.

Have you contacted Kyle to suggest he tries a few benchmarks with PM?

SvenBent · Feb 2, 2019

Meeho said:
Have you contacted Kyle to suggest he tries a few benchmarks with PM?

Not in regards to the AMD 2990WX (PM won't fix this yet).
I did earlier, but he was uninterested, even though PM has shown huge performance increases on both AMD and Intel systems with SMT/CMT or CCXs.

I have some small ideas based on Wendell's discoveries, but I do not have a 2990WX to test on.
I did reach out to AMD earlier for Ryzen test samples for the CCX non-switch testing, but again, nothing.

ZyzzyxSilver · Feb 2, 2019

SvenBent said:
That's why you use Project Mercury.

That doesn't look like it would do anything for my problem. I usually do have more HW threads than I know what to do with, and therefore priorities usually don't have much effect. Coreprio is closer, but there's still some stuff it can't handle very well, like games that want all 16T active at one point in each frame and want 2-4T to not interfere with each other at other points.

SvenBent · Feb 2, 2019

ZyzzyxSilver said:
That doesn't look like it would do anything for my problem. I usually do have more HW threads than I know what to do with, and therefore priorities usually don't have much effect. Coreprio is closer, but there's still some stuff it can't handle very well, like games that want all 16T active at one point in each frame and want 2-4T to not interfere with each other at other points.

I honestly don't know what you mean by HW threads.
Threads are software.

But if you have an abundance of logical cores with fewer physical cores backing them, and the issue is putting your (software) threads onto the right logical cores to maximize usage of the physical cores, then yes, Project Mercury fixes this.
That is what it sounded like you were describing.

To quote you:
"Since I found Windows' scheduler was treating my R7 1700 like all of the 16 threads were identical"
OK, I now see you use the word "threads" for logical cores.

But yes, for what you mention here (the scheduler seeing all logical cores as equal), Project Mercury fixes this by assigning (software) threads to the right logical cores (what you call threads) so as to maximize usage of the physical cores,

or to avoid cross-CCX communication.

/www.reddit.com/r/BattleRite/comments/97vv24/serious_performance_boost_on_ryzen_with_project/
A 23% FPS speedup on a Ryzen 1700 CPU, because Project Mercury utilizes your CPU better than the Microsoft scheduler does.

Also, in CS:GO I found a 9% improvement in FPS and a 25% reduction in FPS variance on a Ryzen 1700 system.

Properly assigning threads on SMT/CMT systems and avoiding CCX cross-talk is a huge benefit for performance.
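To make the logical-core versus physical-core distinction concrete, here is a small sketch of how affinity masks for the two goals above (one thread per physical core, and staying inside one CCX) could be computed for an R7 1700-style layout. It is not Project Mercury's code. The numbering (SMT siblings as adjacent logical CPUs, CCX 0 holding cores 0-3) is an assumption that should be checked on the actual machine. On Windows, a mask like this is what `SetProcessAffinityMask` takes, and psutil's `Process.cpu_affinity()` accepts the equivalent list of CPU indices.

```python
# Hypothetical sketch: affinity bitmasks for a Ryzen 7 1700-style CPU
# (2 CCXs x 4 cores x 2 SMT threads = 16 logical CPUs).
# ASSUMPTION: logical CPUs are numbered core by core with SMT siblings adjacent
# (logical 2k and 2k+1 share physical core k), and CCX 0 holds cores 0-3.

CORES_PER_CCX = 4
THREADS_PER_CORE = 2


def logical_cpus(ccx: int, one_per_core: bool = False) -> list[int]:
    """Logical CPU indices belonging to one CCX."""
    first_core = ccx * CORES_PER_CCX
    cpus = []
    for core in range(first_core, first_core + CORES_PER_CCX):
        base = core * THREADS_PER_CORE
        # Take only the first SMT sibling, or every sibling of the core.
        siblings = 1 if one_per_core else THREADS_PER_CORE
        cpus.extend(base + t for t in range(siblings))
    return cpus


def to_mask(cpus: list[int]) -> int:
    """Pack logical CPU indices into a bitmask (bit n = logical CPU n)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    print(hex(to_mask(logical_cpus(0))))                     # all of CCX 0
    print(hex(to_mask(logical_cpus(0, one_per_core=True))))  # CCX 0, no SMT sharing
```

Under these assumptions, the whole of CCX 0 is `0xff` and CCX 0 with one thread per physical core is `0x55`.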

Vader1975 · Feb 2, 2019

Thanks for looking out for the community as this gets worked on.

SvenBent · Feb 3, 2019

ZyzzyxSilver said:
like games that want all 16T active at one point in each frame and want 2-4T to not interfere with each other at other points.

Project Mercury also handles this if you enable Automatic mode.
If your game is using less than 50% of CPU time, its threads will be spread out to avoid landing on the same physical core.
If your game can eat more than 50% of the CPU time, it will get access to all logical cores.
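The 50% rule described above can be sketched as a simple mask selection. This is a hypothetical illustration of the policy as stated in this post, not Project Mercury's implementation; it assumes SMT siblings are adjacent logical CPUs and that `cpu_share` (an illustrative name) is the fraction of total CPU time the process used in the last sample.

```python
# Hypothetical sketch of the "under 50% -> one thread per core" rule.
# ASSUMPTION: SMT siblings are adjacent logical CPUs (2k, 2k+1 on core k).


def one_per_core_mask(physical_cores: int) -> int:
    """Bitmask with only the first logical CPU of every physical core."""
    mask = 0
    for core in range(physical_cores):
        mask |= 1 << (2 * core)
    return mask


def all_logical_mask(physical_cores: int) -> int:
    """Bitmask with every logical CPU (two per physical core)."""
    return (1 << (2 * physical_cores)) - 1


def choose_mask(cpu_share: float, physical_cores: int, threshold: float = 0.5) -> int:
    """Pick an affinity mask from the process's share of total CPU time."""
    if cpu_share < threshold:
        # Light load: spread threads so no two share a physical core.
        return one_per_core_mask(physical_cores)
    # Heavy load: let the process use every logical core.
    return all_logical_mask(physical_cores)


if __name__ == "__main__":
    # 8C16T with roughly 6 of 16 logical CPUs busy on average.
    print(hex(choose_mask(6 / 16, 8)))
```

On an 8C16T part, an average of about 6 busy logical CPUs (37.5%) stays in one-thread-per-core mode (`0x5555`), which matches the Planetside 2 behavior described later in the thread; the policy only sees the average, not the brief per-frame bursts.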

ZyzzyxSilver · Feb 3, 2019

If Mercury does that, you ought to update its page 1 description. Apparently you aren't advertising half your features.

SvenBent said:
Project Mercury also handles this if you enable Automatic mode.
If your game is using less than 50% of CPU time, its threads will be spread out to avoid landing on the same physical core.
If your game can eat more than 50% of the CPU time, it will get access to all logical cores.

That doesn't actually solve the problem unless you can measure and tweak it at least every 5 milliseconds. 10 ms would be weak but probably still a bit useful, and at 20 ms it'll probably do more harm than good in the games I'm thinking of. I'll give it a shot later anyway.

Tsumi · Feb 3, 2019

Quick, someone sue AMD because it doesn't scale linearly up to 32 threads.

SvenBent · Feb 3, 2019

ZyzzyxSilver said:
If Mercury does that, you ought to update its page 1 description. Apparently you aren't advertising half your features.

That doesn't actually solve the problem unless you can measure and tweak it at least every 5 milliseconds. 10 ms would be weak but probably still a bit useful, and at 20 ms it'll probably do more harm than good in the games I'm thinking of. I'll give it a shot later anyway.

Point taken

I'm not sure why you think you need to tweak it every 5 ms (it's possible to do that); most games/programs do not change their thread behavior that often.
Currently I think automatic mode does it every 500 ms, but I'm honestly not sure off the top of my head.

Doing it more often means losing compute time to managing core affinity, i.e., overhead.
One of the things I pride myself on with PM is that it uses far fewer resources than any of its known competition (Process Lasso, Process Tamer, wintopprio), because the priority adjustment does not rely on a timed loop check that eats up CPU time.
And the benchmarks show well how much improvement comes from handling the threads correctly, even though it happens only once.

Again, I'm not sure how increasing performance in the case you mentioned does not "solve the issue."
Did you actually try it, or are these just claims based on what you think happens?

--- edit ---
I realize you might mean solving as in 100% solving it,
whereas I mean solving it 90%, as in getting the performance gains back.

SvenBent · Feb 3, 2019

I was incorrect: it does it every 999 ms in automatic mode.

ZyzzyxSilver · Feb 3, 2019

For median games, your way gets you 90% of the way there, and I am talking about 98+%, yeah.

Games do change their threading behavior that often in practice, because some parts of a frame are wider than others. All the threads are always spawned (spawning and killing threads really is slow), but most aren't running at any given moment. Using Planetside 2 as an example just because I know my way around its tech well, average utilization on an 8C16T CPU doesn't go beyond ~6T, so PM will always have it in non-SMT mode, but the game still thinks that it's on a 16T CPU and will briefly try to have 16 threads active at least once per frame. When it does, some of those workers will needlessly have to wait and it'll be slower than not having the extra workers in the first place.

If Windows' scheduler is as broken for other people as it is for me, games that behave like Planetside will benefit from PM on many-core CPUs, but will benefit even more from disabling SMT entirely.

Worlds Adrift is a game for which PM may be the second-best solution (right behind MS getting their act together and fixing the damn scheduler). It spends most of its time at 2~3T utilization (with a particular thread clearly being the limiting factor), but has regular bouts of stutter which pass by noticeably faster on 8C16T than 8C8T. 999 ms isn't very responsive for that purpose, but it'll help.

My concern around 20 ms sampling was about oscillation. In that range, it could easily end up popping back and forth at an unfortunate rate and adding a lot of frametime variance.
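One common way to damp that kind of back-and-forth, independent of the sampling interval, is hysteresis: use separate thresholds for switching SMT mode on and off. The sketch below is hypothetical; the 45%/55% thresholds are arbitrary illustrative values, and nothing in this thread says Project Mercury works this way.

```python
# Hypothetical hysteresis switch for SMT mode; thresholds are illustrative only.


class SmtModeSwitch:
    def __init__(self, enter_smt: float = 0.55, exit_smt: float = 0.45):
        if not exit_smt < enter_smt:
            raise ValueError("exit threshold must be below enter threshold")
        self.enter_smt = enter_smt
        self.exit_smt = exit_smt
        self.smt_enabled = False  # start in one-thread-per-core mode

    def update(self, cpu_share: float) -> bool:
        """Feed one CPU-share sample; return whether SMT mode is on."""
        if self.smt_enabled and cpu_share < self.exit_smt:
            self.smt_enabled = False
        elif not self.smt_enabled and cpu_share > self.enter_smt:
            self.smt_enabled = True
        return self.smt_enabled


def count_flips(states: list[bool]) -> int:
    """Number of mode changes in a sequence of states."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


if __name__ == "__main__":
    samples = [0.48, 0.52, 0.49, 0.51, 0.60, 0.50, 0.46, 0.40]
    naive = [s >= 0.5 for s in samples]  # single 50% threshold
    switch = SmtModeSwitch()
    damped = [switch.update(s) for s in samples]
    print(count_flips(naive), count_flips(damped))  # prints: 4 2
```

With load hovering around 50%, the single threshold flips mode four times over these eight samples, while the two-threshold version switches only when the load clearly moves. The trade-off is a slower reaction to genuine changes, which matters for the per-frame bursts described above.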

ZyzzyxSilver · Feb 3, 2019

PM in Planetside feels like having SMT disabled as far as averages, but there's an occasional frametime variance pattern I haven't felt from it before, and there was one big hitch that's also new.

PM in Worlds Adrift behaves about as I was expecting, but doesn't respond fast enough in practice to help out the long stuttery segments by very much, and again there are a couple of new stutters in there I haven't felt before.

I'd test with more, but those are actually the only two compute-heavy games I've got installed over on Windows right now (Fallout 2 won't illuminate much of anything).

Sarreq Teryx · Feb 4, 2019

This certainly presents one reason why limiting the number of cores by SKU is an idiotic problem.

ZyzzyxSilver · Feb 4, 2019

Does it? All of the problems here are problems with Windows being clueless about the differences between logical cores, and none of it seems inherently tied to core count.
