Techgage Tests 2990WX Performance Scaling With Coreprio

Discussion in 'HardForum Tech News' started by AlphaAtlas, Feb 1, 2019.

  1. AlphaAtlas

    AlphaAtlas [H]ard|Gawd Staff Member

    Messages:
    1,713
    Joined:
    Mar 3, 2018
    Some 2990WX users have been complaining about performance regressions in Windows, and we noticed some strangeness in our own Threadripper benchmarks as well. Fortunately, Level1Techs seemingly nailed down the issue in a series of articles from a few weeks ago, and pointed to Coreprio as a good solution to Windows' scheduling issues. Techgage just put that solution to the test in a wide range of benchmarks, and the results are interesting, to say the least. Adobe Premiere Pro renders, which didn't scale with the 2990WX's cores in our testing, saw a massive improvement with Coreprio, while other programs like Blender didn't seem to benefit at all. They also compared Windows to Linux performance in Geekbench, and for whatever reason, saw a night-and-day improvement when switching to the open source OS.

    Does all of this mean that Linux is the best OS for a chip like the 2990WX? It's really hard to believe otherwise. To base that off of GeekBench alone would be nonsense, but we have other testing experience to back up those opinions. Blender almost always performs better in Linux than in Windows, so the fact that a many-core chip works better in the penguin OS isn't a huge surprise... Fortunately, using either DLM or Coreprio won't hurt your performance in other areas too much, but it's important to note that it can in fact negatively impact them. On the flipside, if you bought a 2990WX (or 2970WX) and are running against a regression, you shouldn't hesitate in giving the tool a test. Don't like the result, or don't need it active all of the time? All you need to do is simply stop the service from within the applet, and you'll be back to normal.
     
  2. GHRTW

    GHRTW n00b

    Messages:
    37
    Joined:
    Aug 29, 2018
    So, Microsoft is the one that holds up the progress in CPU tech and Linux helps the advance. Who would believe it?
     
    Red Falcon, odditory, Frobozz and 6 others like this.
  3. lazz

    lazz Limp Gawd

    Messages:
    323
    Joined:
    Apr 15, 2007
    I'm shocked. Shocked, I tell you. :meh:
     
    dvsman, travanx and dgz like this.
  4. SvenBent

    SvenBent 2[H]4U

    Messages:
    3,045
    Joined:
    Sep 13, 2008
    From what I see, there are more performance regressions from using Coreprio than not over that suite of benchmarks,
    and it has the same unstable performance profile it did in Kyle's test.
     
    Sulphademus likes this.
  5. GoodBoy

    GoodBoy [H]ard|Gawd

    Messages:
    1,454
    Joined:
    Nov 29, 2004
    So when is Microsoft releasing a patch for the issue?
     
  6. katanaD

    katanaD [H]ard|Gawd

    Messages:
    1,987
    Joined:
    Nov 15, 2016
    I saw from your earlier thread on testing that they were using

    Windows 10 Pro v1803

    Does using Windows Server make any difference?
     
  7. ZyzzyxSilver

    ZyzzyxSilver n00b

    Messages:
    39
    Joined:
    Jul 26, 2018
    Since I found Windows' scheduler was treating my R7 1700 like all of the 16 threads were identical (making software threads share cores despite there being plenty of uncontended hardware threads to go around), nothing like this surprises me.
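A minimal sketch of the distinction described here, assuming a hypothetical 8C16T layout in which logical CPUs 2k and 2k+1 are the SMT siblings of physical core k (the real numbering depends on the OS and firmware). It orders the logical CPUs so that every physical core receives one software thread before any core is asked to run two. The function names are illustrative only.

```python
def smt_siblings(logical_cpus, threads_per_core=2):
    """Group logical CPU indices into physical cores.

    Assumption: SMT siblings are numbered adjacently (0/1, 2/3, ...).
    """
    return [logical_cpus[i:i + threads_per_core]
            for i in range(0, len(logical_cpus), threads_per_core)]


def placement_order(logical_cpus, threads_per_core=2):
    """Order logical CPUs so each physical core is used once before any is shared."""
    cores = smt_siblings(logical_cpus, threads_per_core)
    order = []
    for slot in range(threads_per_core):
        for core in cores:
            if slot < len(core):
                order.append(core[slot])
    return order


order = placement_order(list(range(16)))
print(order)
```

With eight or fewer runnable software threads, taking CPUs from the front of this list keeps them on separate physical cores; treating all 16 entries as interchangeable is what allows two busy threads to land on one core while others sit idle.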
     
  8. Hielo_loco

    Hielo_loco [H]Lite

    Messages:
    103
    Joined:
    Jan 27, 2015
    The question is if AMD is pressuring Microsoft to fix this issue. It still baffles me how they did not do anything about it prior to launch.
     
  9. SvenBent

    SvenBent 2[H]4U

    Messages:
    3,045
    Joined:
    Sep 13, 2008
    That's why you use Project Mercury :D
     
    Rockenrooster likes this.
  10. Mazzspeed

    Mazzspeed [H]ard|Gawd

    Messages:
    1,726
    Joined:
    Dec 27, 2017
    The most likely explanation for that is that the problem is so integral to Windows that fixing it would involve death by a thousand cuts. No surprise whatsoever that Linux doesn't suffer from the issue, considering that Linux is designed to run NUMA-based supercomputer clusters.

    It'll be fixed, when Microsoft adopt their own Linux distro and dump that horrible kernel, scheduler and file system they're using. ;)
     
  11. Yock

    Yock n00b

    Messages:
    11
    Joined:
    Oct 1, 2008
    Wonder if this will be a problem for the next generation Ryzen CPUs.
     
  12. Meeho

    Meeho [H]ardness Supreme

    Messages:
    4,470
    Joined:
    Aug 16, 2010
    Have you contacted Kyle to suggest he tries a few benchmarks with PM?
     
  13. SvenBent

    SvenBent 2[H]4U

    Messages:
    3,045
    Joined:
    Sep 13, 2008
    Not in regard to the AMD 2990WX (PM won't fix this yet).
    I did earlier, but he was uninterested, even though PM has shown huge performance increases on both AMD and Intel systems with SMT/CMT or CCX.

    I have some small ideas based on Wendell's discoveries, but I do not have a 2990WX to test on.
    I did reach out to AMD earlier for test samples of Ryzen systems for the CCX non-switch testing, but again, nothing.
     
    Meeho likes this.
  14. ZyzzyxSilver

    ZyzzyxSilver n00b

    Messages:
    39
    Joined:
    Jul 26, 2018
    That doesn't look like it would do anything for my problem. I usually do have more HW threads than I know what to do with, and therefore priorities usually don't have much effect. Coreprio is closer, but there's still some stuff it can't handle very well, like games that want all 16T active at one point in each frame and want 2-4T to not interfere with each other at other points.
     
  15. SvenBent

    SvenBent 2[H]4U

    Messages:
    3,045
    Joined:
    Sep 13, 2008
    I honestly don't know what you mean by HW threads.
    Threads are software.

    But if you have an abundance of logical cores with fewer physical cores backing them up, and the issue is putting your thread (software) onto the right logical core to maximize the usage of the physical cores, then yes, Project Mercury fixes this.
    That was what it sounded like you were mentioning.


    To quote you
    "Since I found Windows' scheduler was treating my R7 1700 like all of the 16 threads were identical"
    OK, I now see you use the word threads for logical cores.

    But yes, what you mentioned here about all logical cores being seen as equal: Project Mercury fixes this by assigning threads (software) to the right logical cores (what you call threads) to maximize the usage of the physical cores

    or to avoid cross-CCX communication.

    /www.reddit.com/r/BattleRite/comments/97vv24/serious_performance_boost_on_ryzen_with_project/
    23% fps speedup on a Ryzen 1700 CPU due to Project Mercury utilizing your CPU better than the Microsoft scheduler

    Also, in CS:GO I found a 9% improvement in fps and a 25% reduction in fps variance on a Ryzen 1700 system



    Properly assigning threads in SMT/CMT systems and avoiding CCX cross talk is a huge benefit for optimizing performance
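To make the idea of keeping a process inside one CCX concrete, here is a rough sketch that builds the kind of affinity bitmask an OS affinity call takes (for example, Windows' SetProcessAffinityMask accepts a mask with one bit per logical CPU). It is not Project Mercury's code. The layout is an assumption: a Ryzen 7 1700 modelled as two CCXs of four cores each, with logical CPUs numbered core by core, SMT siblings adjacent, and CCX 0 first. `ccx_mask` is a hypothetical helper name.

```python
def ccx_mask(ccx_index, cores_per_ccx=4, threads_per_core=2, use_smt=True):
    """Return an affinity bitmask covering the logical CPUs of one CCX.

    Assumed layout (hypothetical): logical CPUs are numbered core by core,
    SMT siblings adjacent, CCX 0 first.
    """
    logical_per_ccx = cores_per_ccx * threads_per_core
    first = ccx_index * logical_per_ccx
    # With SMT in use, take every logical CPU; otherwise take one per core.
    step = 1 if use_smt else threads_per_core
    mask = 0
    for cpu in range(first, first + logical_per_ccx, step):
        mask |= 1 << cpu
    return mask


print(hex(ccx_mask(0)))                 # every logical CPU in CCX 0
print(hex(ccx_mask(1)))                 # every logical CPU in CCX 1
print(hex(ccx_mask(0, use_smt=False)))  # one logical CPU per core in CCX 0
```

A process pinned with the first or second mask never has threads talking across the CCX boundary; the third mask additionally keeps its threads off each other's SMT siblings.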
     
    Last edited: Feb 2, 2019
  16. Vader1975

    Vader1975 Gawd

    Messages:
    822
    Joined:
    May 11, 2016
    Thanks for looking out for the community as this gets worked on.
     
  17. SvenBent

    SvenBent 2[H]4U

    Messages:
    3,045
    Joined:
    Sep 13, 2008
    Project Mercury also handles this if you enable Automatic mode.
    If your game is using less than 50% of CPU time, it will be spread out to avoid going to the same physical core.
    If your game can eat above 50% of the CPU time, it will get access to all logical cores.
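The rule described above can be sketched roughly as follows. This is only an illustration of the stated 50% threshold, not the tool's actual implementation; the function name, the adjacent-sibling CPU numbering, and the handling of exactly 50% (treated as the non-SMT case here) are all assumptions.

```python
def auto_mode_cpus(cpu_share, logical_cpus=16, threads_per_core=2):
    """Pick the logical CPUs a process may use from its share of total CPU time.

    cpu_share is a fraction between 0.0 and 1.0. Assumes SMT siblings are
    numbered adjacently, so every threads_per_core-th index is a distinct core.
    """
    if cpu_share > 0.5:
        # Heavy process: give it every logical CPU, SMT siblings included.
        return list(range(logical_cpus))
    # Light process: one logical CPU per physical core, so threads never share a core.
    return list(range(0, logical_cpus, threads_per_core))


print(auto_mode_cpus(0.30))
print(auto_mode_cpus(0.80))
```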
     
  18. ZyzzyxSilver

    ZyzzyxSilver n00b

    Messages:
    39
    Joined:
    Jul 26, 2018
    If Mercury does that, you ought to update its page 1 description. Apparently you aren't advertising half your features. ;)

    That doesn't actually solve the problem unless you can measure and tweak it at least every 5 milliseconds. 10 ms would be weak but probably still a bit useful, and at 20 ms it'll probably do more harm than good in the games I'm thinking of. I'll give it a shot later anyway.
     
  19. Tsumi

    Tsumi [H]ardForum Junkie

    Messages:
    13,022
    Joined:
    Mar 18, 2010
    Quick, someone sue AMD because it doesn't scale linearly up to 32 threads. :rolleyes:
     
  20. SvenBent

    SvenBent 2[H]4U

    Messages:
    3,045
    Joined:
    Sep 13, 2008
    Point taken :D

    I'm not sure why you think you need to tweak it every 5 ms (it's possible to do that); most games/programs do not change their thread behavior that often.
    Currently I think the automode does it every 500 ms, but I'm honestly not sure off the top of my head.

    Doing it more often = loss of CPU time to managing core affinity, aka overhead cost.
    One of the things I pride myself on with PM is that it uses way fewer resources than any of its known competition (Process Lasso, Process Tamer, wintopprio) because the priority adjustment does not rely on a timed loop check that eats up CPU time.
    And the benchmarks tell the story fine of how much improvement there is from correctly handling the threads even though it happens only once.


    Again, I'm not sure how increasing performance in the case you mentioned does not "solve the issue".
    Did you actually try it, or are these just made-up claims because that's what you think things are?


    --- edit ---
    I realize you might be talking about solving as in 100% solving it,
    where I'm talking about solving it as in 90% solving it, as in getting performance gains back.
     
    Last edited: Feb 3, 2019
  21. SvenBent

    SvenBent 2[H]4U

    Messages:
    3,045
    Joined:
    Sep 13, 2008
    I was incorrect; it does it every 999 ms in automode.
     
  22. ZyzzyxSilver

    ZyzzyxSilver n00b

    Messages:
    39
    Joined:
    Jul 26, 2018
    For median games your way is good for 90% and I am talking about 98+%, yeah.

    Games do change their threading behavior that often in practice, because some parts of a frame are wider than others. All the threads are always spawned (spawning and killing threads really is slow), but most aren't running at any given moment. Using Planetside 2 as an example just because I know my way around its tech well, average utilization on an 8C16T CPU doesn't go beyond ~6T, so PM will always have it in non-SMT mode, but the game still thinks that it's on a 16T CPU and will briefly try to have 16 threads active at least once per frame. When it does, some of those workers will needlessly have to wait and it'll be slower than not having the extra workers in the first place.

    If Windows' scheduler is as broken for other people as it is for me, games that behave like Planetside will benefit from PM on many-core CPUs, but will benefit even more from disabling SMT entirely.

    Worlds Adrift is a game for which PM may be the second-best solution (right behind MS getting their act together and fixing the damn scheduler). It spends most of its time at 2~3T utilization (with a particular thread clearly being the limiting factor), but has regular bouts of stutter which pass by noticeably faster on 8C16T than 8C8T. 999 ms isn't very responsive for that purpose, but it'll help.

    My concern around 20 ms sampling was about oscillation. In that range, it could easily end up popping back and forth at an unfortunate rate and adding a lot of frametime variance.
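The oscillation concern can be illustrated with a toy model. Every number in it is made up for illustration rather than measured: a 16 ms frame, a 4 ms "wide" phase at 100% utilization, and 30% utilization for the rest of the frame. The sketch averages utilization over fixed sampling windows, compares each average against a 50% threshold, and counts how often the resulting mode would flip.

```python
def utilization_trace(total_ms, frame_ms=16, wide_ms=4, wide=1.0, narrow=0.3):
    """Per-millisecond CPU utilization for a game with a short wide phase each frame.

    All default values are hypothetical, chosen only to illustrate the effect.
    """
    return [wide if (t % frame_ms) < wide_ms else narrow for t in range(total_ms)]


def mode_switches(trace, window_ms, threshold=0.5):
    """Count flips between SMT and non-SMT mode when sampling every window_ms."""
    switches = 0
    previous = None
    for start in range(0, len(trace) - window_ms + 1, window_ms):
        window = trace[start:start + window_ms]
        smt_mode = sum(window) / window_ms > threshold
        if previous is not None and smt_mode != previous:
            switches += 1
        previous = smt_mode
    return switches


trace = utilization_trace(4000)
print("20 ms windows: ", mode_switches(trace, 20))
print("999 ms windows:", mode_switches(trace, 999))
```

In this model the 20 ms window does not line up with the frame length, so some windows catch two wide phases and cross the threshold while others catch one and do not, and the mode keeps flipping. The 999 ms window averages over dozens of frames and settles on one mode, at the cost of reacting slowly.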
     
  23. ZyzzyxSilver

    ZyzzyxSilver n00b

    Messages:
    39
    Joined:
    Jul 26, 2018
    PM in Planetside feels like having SMT disabled as far as averages, but there's an occasional frametime variance pattern I haven't felt from it before, and there was one big hitch that's also new.

    PM in Worlds Adrift behaves about as I was expecting, but doesn't respond fast enough in practice to help out the long stuttery segments by very much, and again there are a couple of new stutters in there I haven't felt before.

    I'd test with more, but those are actually the only two compute-heavy games I've got installed over on Windows right now (Fallout 2 won't illuminate much of anything).
     
  24. Sarreq Teryx

    Sarreq Teryx n00b

    Messages:
    4
    Joined:
    Apr 5, 2014
    this certainly presents one reason why limiting the number of cores by SKU is an idiotic problem.
     
  25. ZyzzyxSilver

    ZyzzyxSilver n00b

    Messages:
    39
    Joined:
    Jul 26, 2018
    Does it? All of the problems here are problems with Windows being clueless about the differences between logical cores, and none of it seems inherently tied to core count.