Avoid Threadripper Reboots for Game/Creator mode

tangoseal ([H]F Junkie, joined Dec 18, 2010, 9,743 messages)
Disclaimer: I am not a professional YouTube uploader. Flame away or take something positive from this.

I made this video to share something I found interesting. I've been using a program called Process Lasso for quite some time, and it doesn't cost a whole lot.

I briefly show how this wonderful program helps prevent crashes in games that freak out over Threadripper's high thread count.

This video was recorded at 3440x1440.

 
Nice to know. I have been using Process Lasso for about 5 years now, and it's also really great when you want to keep a fully loaded system running smoothly. :) Can't build a TR rig till I can buy a made-for-TR4 AIO. :( Hopefully soon.
 
I don't see the point of how AMD implemented 'Game Mode', but this could be interesting...
 
I don’t know how I feel about this. It effectively makes it an 1800X with a lot of PCIe lanes.
That said, it brings down the latencies involved with multi-die for gaming, and there isn't a game made that uses more than 8 cores, so what is the issue?
 
Isn't the 1900X supposed to be an 1800X with 64 PCIe lanes? Also, does anyone know if it's built with 3 dummy dies, or still with CCX modules disabled on the two dies? That latency is up there.
Dunno exactly how that will work.
 
Hey tangoseal,
Does your method solve the near memory / far memory latency difference?
I just finished reading AnandTech's updated article about Creator vs. Game mode, and I was wondering whether near vs. far memory access latency is addressed at all by the way Process Lasso locks down the processors available to programs.

I'm also thinking maybe AMD should enter into an agreement with Process Lasso to enable user-configurable program profiles (similar to the early days of SLI / Crossfire) within the AMD-provided software utilities.
 
Hey tangoseal,
Does your method solve the near memory / far memory latency difference?
I just finished reading AnandTech's updated article about Creator vs. Game mode, and I was wondering whether near vs. far memory access latency is addressed at all by the way Process Lasso locks down the processors available to programs.

I'm also thinking maybe AMD should enter into an agreement with Process Lasso to enable user-configurable program profiles (similar to the early days of SLI / Crossfire) within the AMD-provided software utilities.

As a user of the software, I can't answer that without quite a bit of research. It would be better to contact Bitsum Technologies and ask them, since they are the ones who would be able to implement such code if it were missing. Other than in synthetics, I am not sure the latency issues really affect real-life, day-to-day operation to any discernible level. However, I am not as up to speed on it as many, like yourself.
 
My understanding of Game Mode is that it is more complicated than turning the 1950X into an 1800X. Because of the number of PCIe lanes coupled with quad-channel memory, the chip actually turns off half the cores in each die to get the 8C/16T setup. This means that latency is not improved at all, because the mesh between the two dies is still needed. That is my understanding of the situation, but I could most certainly be wrong...
 
Disclaimer: I am not a professional YouTube uploader. Flame away or take something positive from this.

I made this video to share something I found interesting. I've been using a program called Process Lasso for quite some time, and it doesn't cost a whole lot.

I briefly show how this wonderful program helps prevent crashes in games that freak out over Threadripper's high thread count.

This video was recorded at 3440x1440.


What are all of these "lot of game titles"? Could you be more specific? I have yet to see this documented.
 
I'm only using a 1700, but I'm giving it a try; it has a slick interface. Thanks for sharing, tangoseal.
 
Great program, thank you! I wonder if this will work on my older X99 system too. It has Intel Turbo Boost Max running on a 6-core processor.
 
Nice to know. I have been using Process Lasso for about 5 years now, and it's also really great when you want to keep a fully loaded system running smoothly. :) Can't build a TR rig till I can buy a made-for-TR4 AIO. :( Hopefully soon.

There are 2 things I don't like about Process Lasso:

1: It uses CPU resources (it's very bloated IMHO).
2: It fails under heavy multiprocessing because it can't find a single process that uses too much CPU power, e.g. running 16 processes on a quad core: every process only gets 25% CPU usage (a quarter of a core each), so Process Lasso never finds one that is "bad", due to its bad metrics.

My program does not use CPU resources and works even under heavy multiprocessing load, because it does not need to find a bad program; it just observes what the user is focused on, i.e. the user's experience.


Again, shameless plug, but it's all true.
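A toy model makes point 2 concrete. It assumes N equally CPU-hungry, single-threaded processes sharing C cores evenly and a hypothetical rule "restrain any process above X% of total CPU"; the 10% threshold is made up for illustration, and this is not ProBalance's actual algorithm. The "25%" in the post is a quarter of one core; as a share of the whole quad-core machine it is 6.25%.

```python
def per_core_share(num_procs: int, num_cores: int) -> float:
    """Fraction of ONE core each process gets (a single thread can't use more than one core)."""
    return min(1.0, num_cores / num_procs)


def total_cpu_share(num_procs: int, num_cores: int) -> float:
    """The same share expressed as a fraction of the whole machine."""
    return per_core_share(num_procs, num_cores) / num_cores


def would_restrain(num_procs: int, num_cores: int, threshold: float) -> bool:
    """Hypothetical rule: restrain a process whose total-CPU share exceeds threshold."""
    return total_cpu_share(num_procs, num_cores) > threshold


if __name__ == "__main__":
    THRESHOLD = 0.10  # hypothetical: restrain anything above 10% of total CPU
    for n in (1, 4, 16):
        share = total_cpu_share(n, 4)
        print(f"{n:2d} procs on 4 cores: {share:.2%} each, "
              f"restrained={would_restrain(n, 4, THRESHOLD)}")
```

With 16 processes each one sits at 6.25% of total CPU, so a per-process threshold never fires even though the machine is saturated.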
 
Your comments are so biased it's not funny.

To address some of them: on bloat, you can set PL to run only the governor background process, not the GUI.
It's fully adjustable: refresh rates for the governor and GUI, and all CPU core adjustments (per-process and total CPU% are fully adjustable, but the defaults it installs with are pretty good for your hardware).

PL has been out for years, since the early 2000s, so there's a track record.
I have used and beta tested it for 10+ years and never seen high resource usage, even with all features on at once.
It is extremely flexible in use and settings. I don't have a Threadripper, so I can't comment on that part of the discussion.

For anyone who wants to know more, I suggest reading up on ProBalance and how it works; this should answer most questions.
https://bitsum.com/how-probalance-works/
 
What I don't understand is: if these programs work so well and bring benefits, does that mean OS developers are dumb or lazy? Or are these utilities not that beneficial?
 
Your comments are so biased it's not funny.

To address some of them: on bloat, you can set PL to run only the governor background process, not the GUI.
It's fully adjustable: refresh rates for the governor and GUI, and all CPU core adjustments (per-process and total CPU% are fully adjustable, but the defaults it installs with are pretty good for your hardware).

PL has been out for years, since the early 2000s, so there's a track record.
I have used and beta tested it for 10+ years and never seen high resource usage, even with all features on at once.
It is extremely flexible in use and settings. I don't have a Threadripper, so I can't comment on that part of the discussion.

For anyone who wants to know more, I suggest reading up on ProBalance and how it works; this should answer most questions.
https://bitsum.com/how-probalance-works/

Whose comments? Or did they delete them? That's why I like quoting.
 
Your comments are so biased it's not funny.

To address some of them: on bloat, you can set PL to run only the governor background process, not the GUI.
It's fully adjustable: refresh rates for the governor and GUI, and all CPU core adjustments (per-process and total CPU% are fully adjustable, but the defaults it installs with are pretty good for your hardware).

PL has been out for years, since the early 2000s, so there's a track record.
I have used and beta tested it for 10+ years and never seen high resource usage, even with all features on at once.
It is extremely flexible in use and settings. I don't have a Threadripper, so I can't comment on that part of the discussion.

For anyone who wants to know more, I suggest reading up on ProBalance and how it works; this should answer most questions.
https://bitsum.com/how-probalance-works/


Funny you bring up bias but have absolutely NO empirical data to back anything up...
Since you are so NOT biased, I assume you have some empirical data to share and tried comparing the software? NO?... Well, at least you tested my claims before accusing me of things? NO?... OK
Really?... And you are saying I AM biased.


Oh, and BTW, your link about ProBalance is a long, wordy explanation of exactly what I said: "it finds a process with high CPU usage and lowers its CPU priority." Which, again, fails if you have enough processes, since no single one ever gets a high enough CPU usage. Something you would easily know if you had tested things before making claims about it and falling for marketing BS.
Bonus information: Process Tamer uses total CPU usage, which fails on multicore systems. Process Lasso adjusts for the number of cores, so it "works" on multicore.
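The total-vs-per-core distinction is plain arithmetic. The sketch assumes usage is measured as core-seconds consumed over a sampling interval; it does not reproduce how Process Tamer or Process Lasso actually sample, and the function names are illustrative.

```python
def total_cpu_percent(core_seconds: float, interval_s: float, num_cores: int) -> float:
    """Usage as a share of the whole machine (100% = every core busy)."""
    return 100.0 * core_seconds / (interval_s * num_cores)


def per_core_percent(core_seconds: float, interval_s: float) -> float:
    """Usage relative to a single core (100% = one core fully busy)."""
    return 100.0 * core_seconds / interval_s


if __name__ == "__main__":
    # A single-threaded hog that keeps one core busy for a whole 1 s sample.
    for cores in (1, 2, 4, 8, 16):
        print(cores, total_cpu_percent(1.0, 1.0, cores), per_core_percent(1.0, 1.0))
```

On 8 cores the same hog reads 12.5% of total CPU but 100% of a core, so a fixed total-CPU threshold tuned for few cores stops firing as core counts grow; scaling the threshold (or the measurement) by core count avoids that.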

ProBalance works with a constant timed loop, which is constantly using a bit of CPU power. That is NOT a good thing for a program meant to optimize CPU performance.
Mine moved away from such a bad design in version 1.2 and instead uses the internal Windows message system (faster response too).

Basically, ProBalance goes out and checks its mailbox every 5 minutes.
My software made an agreement with the mailman to knock on the door when there is mail, so I don't have to waste my time constantly checking for it.
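The mailbox analogy is the classic polling vs. event-driven trade-off. This is a portable Python illustration, not Windows code: a queue stands in for the Windows message system, the foreground-app names are made up, and it only counts how often each watcher wakes up.

```python
import queue
import time
from typing import Callable, Iterable, Optional


def polling_watcher(samples: Iterable[str], on_change: Callable[[str], None],
                    interval_s: float = 0.0) -> int:
    """Timed loop: wakes up once per tick and checks, even when nothing changed."""
    wakeups = 0
    last: Optional[str] = None
    for current in samples:      # each item = the foreground app seen on one tick
        wakeups += 1
        if current != last:
            on_change(current)
            last = current
        time.sleep(interval_s)   # a real poller sleeps here; reaction lag is up to interval_s
    return wakeups


def event_watcher(events: "queue.Queue[Optional[str]]",
                  on_change: Callable[[str], None]) -> int:
    """Event-driven: blocks in get() until a change is posted, so no idle wakeups."""
    wakeups = 0
    while True:
        current = events.get()   # sleeps until the "mailman knocks"
        if current is None:      # sentinel: stop watching
            return wakeups
        wakeups += 1
        on_change(current)


if __name__ == "__main__":
    seen = []
    ticks = ["explorer"] * 40 + ["game"] * 50 + ["explorer"] * 10
    print("polling wakeups:", polling_watcher(ticks, seen.append))  # 100 for 3 changes

    q = queue.Queue()
    for app in ("explorer", "game", "explorer"):
        q.put(app)
    q.put(None)
    print("event wakeups:", event_watcher(q, seen.append))          # 3 for 3 changes
```

Both watchers see the same three changes; the poller pays for every tick in between, while the event-driven one only runs when there is something to handle.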


And in numbers, here is the resource usage of some popular CPU priority optimizers:

Code:
Program            Working set    Commit size     CPU usage (monitoring)
Mercury 1.2.0.1    588-616k       3336-3336k      0%
TopWinPrio 1       8276-11176k    29328-32228k    0.018%
TopWinPrio B3      6880-9792k     47776-50268k    0.024%
ProcessTamer       2400-2600k     3136-3416k      0.05%
Process Lasso      2108k          11748k          0.156%
PS: Process Lasso was measured with just the Process Governor running, not the GUI.
Also, CPU usage was measured down to the 15th/16th decimal, so please pay close attention to the CPU usage here...



TLDR:
To sum it up: you accuse me of bias... I have empirical data to prove my statements.
Your link to ProBalance to "better understand it" is basically marketing BS for what I said.
You didn't test any of the claims before you passed judgment, yet you accuse others of BIAS?



Anyway, at least I had fun :D
 
What I don't understand is if these programs work so good and bring benefits, does that mean that OS developers are dumb or lazy? Or are these utilities not that beneficial?
Neither, or the last one, depending on where you are coming from. (And remember, I make one of those.)

No program is going to give you some magical super boost, but here is what it boils down to... priorities.
MS can't know what you are using their software for (well, think pre-telemetry), so they have to balance things out for every possibility and stay neutral, i.e. everything gets an even share of the CPU.
However, even that is not quite true, because the MS CPU scheduler does actually boost foreground applications by giving them a bigger slice of CPU time each time they get the CPU. Their priority is still the same, though.
So think of it as everyone getting a box each, but the foreground application just gets a bigger box.

If you are just running your game and nothing else, nothing can really change, since it's the only thing running: you are getting all the boxes.
But if you multitask a lot with heavy programs (like I do, with a compression optimizer that runs for weeks at a time), then when you start your game, you don't get all the boxes.
Process Lasso monitors whether any other process gets more than a set amount of boxes and reduces its priority. That way your game gets a bigger share of boxes, so it runs faster.
My program just looks at what you are currently doing and increases its priority. The end result is the same: your game gets more boxes because it gets priority over your background tasks.

Again, if there are no boxes you can take from others, there is really no point in this. That's the honest truth. That's why, instead of promising a huge FPS boost, I only promise "Recovered FPS", i.e. a recovery of the FPS you lost due to heavy background CPU usage (it's around 60% typically).
I try to stay away from marketing BS.
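The "boxes" picture can be turned into a toy model. It is a simplification, not the real Windows scheduler: it assumes strict priority (higher levels claim cores first), an even split among threads at the same level, and at most one core per thread, and it ignores quantum lengths, the foreground boost and anti-starvation boosts. The numbers follow the Windows base priorities for the Normal (8), Above normal (10) and Below normal (6) classes; the thread names and counts are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def core_shares(threads: Dict[str, int], num_cores: int) -> Dict[str, Fraction]:
    """Toy strict-priority model: return the share of one core each thread gets."""
    shares: Dict[str, Fraction] = {}
    free = Fraction(num_cores)
    for level in sorted(set(threads.values()), reverse=True):
        group = [name for name, prio in threads.items() if prio == level]
        each = min(Fraction(1), free / len(group))  # a thread can use at most one core
        for name in group:
            shares[name] = each
        free -= each * len(group)
    return shares


def scenario(game_prio: int, background_prio: int) -> Dict[str, Fraction]:
    """1 game thread competing with 8 busy background threads on a quad core."""
    threads = {"game": game_prio}
    threads.update({f"bg{i}": background_prio for i in range(8)})
    return core_shares(threads, num_cores=4)


if __name__ == "__main__":
    NORMAL, ABOVE_NORMAL, BELOW_NORMAL = 8, 10, 6
    print("everything Normal:   ", scenario(NORMAL, NORMAL)["game"])
    print("raise the game:      ", scenario(ABOVE_NORMAL, NORMAL)["game"])
    print("lower the background:", scenario(NORMAL, BELOW_NORMAL)["game"])
```

In this model raising the foreground thread and lowering the background threads both take the game from 4/9 of a core to a full core, which is the "end result is the same" point; with no competing threads, all three cases would be identical.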


There is one small exception to this: core parking.
Core parking has some really bad drawbacks with regard to threads moving around cores quickly, and it's even worse if you mess with affinity.
Core parking should be disabled for optimal performance (and yes, I also have the empirical data for this).
There are also some things you can tune in the CPU scheduler (I don't recommend it) to reduce CPU cycles wasted on context switching (i.e. when a core switches from one thread to another).
You can improve your CPU performance very slightly at the cost of some minuscule input lag. I don't recommend it, and it's a very small gain anyway.
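One common way to turn core parking off is to raise the active power plan's "core parking min cores" setting to 100% with powercfg. This sketch assumes CPMINCORES is powercfg's alias for that setting (check with `powercfg /aliases` on your machine); it is not the poster's method, it only prints the commands unless run on Windows with `--apply`, and applying may need an elevated prompt.

```python
import subprocess
import sys
from typing import List

# ASSUMPTION: CPMINCORES is the powercfg alias for "Processor performance core
# parking min cores"; 100 means 100% of cores must stay unparked.
SETTING = ["SCHEME_CURRENT", "SUB_PROCESSOR", "CPMINCORES"]


def unpark_commands(min_cores_percent: int = 100) -> List[List[str]]:
    """Build powercfg calls for AC and battery, then re-apply the current scheme."""
    if not 0 <= min_cores_percent <= 100:
        raise ValueError("min_cores_percent must be between 0 and 100")
    value = str(min_cores_percent)
    return [
        ["powercfg", "/setacvalueindex", *SETTING, value],  # plugged in
        ["powercfg", "/setdcvalueindex", *SETTING, value],  # on battery
        ["powercfg", "/setactive", "SCHEME_CURRENT"],       # make the change take effect
    ]


if __name__ == "__main__":
    apply = sys.platform == "win32" and "--apply" in sys.argv
    for cmd in unpark_commands():
        print(" ".join(cmd))
        if apply:
            subprocess.run(cmd, check=True)
```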


TLDR:
Basically, it boils down to choices and preferences: sacrificing something to gain something more important, and MS can't make that choice for you ahead of time.
And if you are not a heavy multitasker... meh, no real benefits (except for core parking).
 
Nice video, tangoseal. But you don't have to install software; you can do it manually as well:

You can create a shortcut to assign affinity.

X:\Windows\System32\cmd.exe /C start /affinity Y game.exe

Y is hexadecimal and is a bit mask:

0x1 - 0001 - Core0
0x2 - 0010 - Core1
0x3 - 0011 - Core1 & Core0
0x4 - 0100 - Core2
0x5 - 0101 - Core2 & Core0
0x6 - 0110 - Core2 & Core1
0x7 - 0111 - Core2 & Core1 & Core0
0x8 - 1000 - Core3
0x9 - 1001 - Core3 & Core0
0xA - 1010 - Core3 & Core1
0xB - 1011 - Core3 & Core1 & Core0
0xC - 1100 - Core3 & Core2
0xD - 1101 - Core3 & Core2 & Core0
0xE - 1110 - Core3 & Core2 & Core1
0xF - 1111 - Core3 & Core2 & Core1 & Core0

Got it from https://superuser.com/questions/908848/how-do-i-permanently-set-the-affinity-of-a-process
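The table is just one bit per logical processor, with Core0 as the lowest bit. A small helper can build the mask for any core list and produce the shortcut command from the post; the function names are illustrative, and the 16-processor example assumes the first 16 logical processors are the ones you want (for example one die), which you should check against your own topology.

```python
from typing import Iterable, List


def cores_to_mask(cores: Iterable[int]) -> int:
    """Set bit n for each logical processor n (Core0 = lowest bit)."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core numbers start at 0")
        mask |= 1 << core
    return mask


def mask_to_cores(mask: int) -> List[int]:
    """Inverse: list the logical processors whose bits are set."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def start_command(exe: str, cores: Iterable[int]) -> str:
    """Same form as the shortcut above: start /affinity takes the mask in hex, no 0x."""
    return f"cmd.exe /C start /affinity {cores_to_mask(cores):X} {exe}"


if __name__ == "__main__":
    print(hex(cores_to_mask([0, 2, 3])))          # the 0xD row of the table
    print(mask_to_cores(0xA))                     # Core1 and Core3
    print(start_command("game.exe", range(16)))   # first 16 logical processors
```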
 
There is one small exception to this: core parking.
Core parking has some really bad drawbacks with regard to threads moving around cores quickly, and it's even worse if you mess with affinity.
Core parking should be disabled for optimal performance (and yes, I also have the empirical data for this).
Have you tested the impact of C-states?
 
Have you tested the impact of C-states?

Yup. Well, I've tested High Performance vs. Balanced on 11 different PCs (AMD and Intel Core 2 Duo, i3, i5 and i7s; laptops and desktops) and found nothing interesting here.


Just to give a little snippet of the testing (I do a lot of testing 'cause it's fun):
Code:
Windows 7 Professional 64-bit SP1
Intel Core 2 Quad Q6600
8.00GB DDR2
1024MB NVIDIA GeForce GT 240 (NVIDIA)
Hitachi HDS721075CLA332 SATA2/300 32MB 7200RPM 3.5"

Geekbench 3.4.1 Tryout for Windows x86 (32-bit)
Single-Core    Multi-Core

Balanced
1289 pts    4482 pts
1294 pts    4518 pts
1290 pts    4479 pts

Performance
1286 pts     4498 pts
1296 pts     4499 pts
1284 pts     4450 pts


--- NovaBench
Balanced
709
705
704

Performance
709
706
706

--- 7-zip Built-in Benchmark Standard 32mb
Compression    Decompression

Balanced
7074 KB/s    106286 KB/s
7146 KB/s    105484 KB/s
7145 KB/s    105230 KB/s

Performance
7015 KB/s    106286 KB/s
7134 KB/s    105756 KB/s
7158 KB/s    106286 KB/s



--- 7-zip Built-in Benchmark Disk thrashing 384MB
Compression    Decompression

Balanced
1949 KB/s    71217 KB/s
1234 KB/s    72914 KB/s
1311 KB/s    89239 KB/s
1586 KB/s    75196 KB/s
1405 KB/s    67906 KB/s

Performance
1430 KB/s    76861 KB/s
1401 KB/s    74974 KB/s
1320 KB/s    78904 KB/s
1907 KB/s    74974 KB/s
1716 KB/s    78718 KB/s


--- WinRar
Balanced
2625
2607
2618

Performance
2625
2645
2639


--- wPrime
Balanced
584.393
582.941
582.957

Performance
583.226
582.178
582.822


--- Hyper Pi
Balanced
25.522
25.319
25.553

Performance
24.992
25.334
25.209


--- FFT-z
CPU    Thread

Balanced
324    81.0
325    81.3
326    81.5

Performance
324    81.0
326    81.5
324    81.0

--- DnetC
OGR-NR    RC5-72

Balanced
50.246    9.522
50.123    9.460
50.225    9.506

Performance
50.603    9.477
50.477    9.487
52.180    9.503


--- 3Dmark2001
Balanced
27673
27618
27592

Performance
27567
27467
27417


--- 3Dmark03
Mark    CPU

Balanced
23594    1645
23666    1674
23653    1701

Performance
23666    1646
23561    1684
23633    1647


--- 3Dmark05
Mark    CPU

Balanced
14284    15096
14287    15140
14262    15081

Performance
14273    14954
14367    15642
14312    15336



--- 3Dmark06
Mark    SM2.0    SM3.0    CPU

Balanced
8879    3627    3492    3505
8858    3621    3496    3461
8878    3640    3495    3462

Performance
8857    3630    3495    3435
8843    3610    3496    3449
8879    3619    3497    3514


--- PCmark04
PCmark    CPU    Mem    GFX    HDD

Balanced
N/A    N/A    8165    13898    6421
N/A    N/A    8208    13969    6418
N/A    N/A    8173    14150    6402

Performance
N/A    N/A    8334    12516    6586
N/A    N/A    8201    13941    7054
N/A    N/A    8177    13641    7050


--- PCmark05
PCmark    CPU    Mem    GFX    HDD

Balanced
8391    7618    3754    9592    7106
8322    7606    3770    9444    7065
8369    7587    3751    9499    6940

Performance
8848    7674    3752    9497    7588
8880    7648    3736    9488    7545
8935    7629    3725    9504    7655


---Passmark Performance Test 8.0
Mark    CPU    2D    3D    Mem    Disk

Balanced  
1375.4    3393    360.1    691    850    772  
1360.8    3366    363.0    693    829    749
1365.3    3366    363.7    693    829    759

Performance
1366.2    3384    360.6    693    852    746
1370.8    3398    363.1    693    840    759
1357.05    3397    362.8    693    839    728


--- Sisoft Sandra
Overall     Financial    Scientific

Balanced
2.26 kPT    4.74 kOPT/s    5.23 GFLOPS
2.25 kPT    4.74 kOPT/s    5.21 GFLOPS
2.27 kPT    4.74 kOPT/s    5.24 GFLOPS


Performance
2.20 kPT    4.76 kOPT/s    5.21 GFLOPS  
2.17 kPT    4.76 kOPT/s    5.15 GFLOPS
2.16 kPT    4.65 kOPT/s    5.21 GFLOPS

--- POV-Ray
Balanced
1626.20 PPS
1631.03 PPS
1640.01 PPS

Performance
1640.15 PPS
1645.10 PPS
1643.15 PPS

--- Cinebench R15
OpenGL    CPU    CPU single

Balanced
31.58    240 CB    63 CB
31.91    240 CB    63 CB
31.68    251 CB    63 CB

Performance
31.91    240 CB    63 CB
31.59    242 CB    63 CB
31.66    241 CB    63 CB


--- X2
Balanced
146.758
151.875
151.120

Performance
146.403
148.693
150.552


--- UT2004
HQ    Performance

Balanced
72.038    86.797
70.763    86.863
70.026    86.495

Performance
70.593    86.669
71.036    86.377
71.547    86.121


--- Quake 3
Demo1    Demo2

Balanced
536.7    534.2
533.1    528.7
533.1    521.2

Performance
536.9    535.6
544.9    523.0
539.9    531.5


--- StreetFighter 4
Balanced
59.66
59.70
59.70

Performance
59.66
59.66
59.66


--- CatZilla
Balanced
1587
1607
1602

Performance
1583
1604
1582


--- Unigine Sanctuary
Score    MinFPS

Balanced
1942    25.8
1925    25.8
1925    25.8

Performance
1919    25.8
1925    25.8
1926    25.8

--- Unigine Tropics
Score    MinFPS

Balanced
724    19.7
725    19.5
724    19.7

Performance
737    19.8
737    19.8
737    20.0


--- Unigine Heaven
Score    MinFPS

Balanced
462    12.5
463    13.2
463    13.2

Performance
459    6.6
463    13.2
463    13.3


--- Unigine Valley
Score    MinFPS

Balanced
596    9.2
597    9.4
595    9.4

Performance
595    9.4
595    9.2
592    8.4


--- Devil May Cry 4
Balanced
103.3925
105.9725
104.9425

Performance
103.2125
106.0325
103.9425


--- FFXIV Heavensward
Balanced
3732
3735
3740

Performance
3736
3738
3735


--- Call of Juarez
AvgFPS    MinFPS

Balanced
19.8    11.2
20.6    11.3
20.7    11.3

Performance
20.0    10.9
22.0    11.3
21.0    11.3


--- Crysis
Balanced
45.12
45.145
45.14

Performance
45.12
45.085
45.075



--- Stalker
AvgFPS    MinFPS

Balanced
43.4    26.775
43.4    30.1
43.425    27.625

Performance
43.425    27.85
43.475    28.675
43.45    27.5

Now, I haven't compiled all the results yet, but during testing nothing stood out as a reason not to have C-states running, performance-wise.

I've tested all of this to see if I wanted to add an automatic "switch to performance mode when in games" feature to Project Mercury, just like Process Lasso has, but found it would be a waste and would just bloat my software. I only integrate things that have shown their merit.
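A quick way to sanity-check "nothing stood out" is to compare the Balanced vs. Performance difference with the run-to-run spread. The sketch uses a few rows copied from the tables above; treating the max-min spread as the noise floor is a rough heuristic with only three runs per setting, not a statistical test.

```python
from statistics import mean
from typing import List, Tuple

# (Balanced runs, Performance runs, higher_is_better), copied from the results above.
RESULTS = {
    "Geekbench multi": ([4482, 4518, 4479], [4498, 4499, 4450], True),
    "Hyper Pi (s)": ([25.522, 25.319, 25.553], [24.992, 25.334, 25.209], False),
    "POV-Ray (PPS)": ([1626.20, 1631.03, 1640.01], [1640.15, 1645.10, 1643.15], True),
    "PCMark05": ([8391, 8322, 8369], [8848, 8880, 8935], True),
}


def spread(runs: List[float]) -> float:
    """Run-to-run noise: (max - min) as a fraction of the mean."""
    return (max(runs) - min(runs)) / mean(runs)


def compare(balanced: List[float], performance: List[float],
            higher_is_better: bool) -> Tuple[float, float]:
    """Return (gain of Performance over Balanced, larger of the two spreads)."""
    if higher_is_better:
        gain = mean(performance) / mean(balanced) - 1
    else:  # for times, a smaller number is the improvement
        gain = mean(balanced) / mean(performance) - 1
    return gain, max(spread(balanced), spread(performance))


if __name__ == "__main__":
    for name, (bal, perf, higher) in RESULTS.items():
        gain, noise = compare(bal, perf, higher)
        verdict = "beyond noise" if abs(gain) > noise else "within noise"
        print(f"{name:16s} {gain:+7.2%}  spread {noise:5.2%}  {verdict}")
```

With these rows only PCMark05 (about +6%) clears the spread; the others differ by roughly 1% or less, inside their own run-to-run variation.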
 
PCMark seems to like Performance. I would guess there would be more benefit with varying loads and multiple start-stops than with constant demanding loads, so it's a hard thing to benchmark.
 
PCMark seems to like Performance. I would guess there would be more benefit with varying loads and multiple start-stops than with constant demanding loads, so it's a hard thing to benchmark.

Yeah, that's the female dog of testing this. But PCMark did have some very random behavior (PCMark04 had a lot of it, and PCMark05 only with the HDD), and it was not consistent across PCs.

Also, none of HD Tune, CrystalDiskMark, ATTO or ASbench (not shown) could replicate any of the HDD findings on the same PC :(

However, HD Tune did produce higher numbers on some machines, and that could also be replicated by using Balanced and running a CPU-heavy process in the background (keeping the CPU in its highest power state instead of deeper C-states).

Anyway, after I'm done with all my RAM testing, I really should compile the C-states testing results into nice graphs etc... maybe Kyle could use it for some article :D




Also, compare the "gains" from C-states vs. the clear gains from disabling core parking:

Code:
8thread    HTon    parking    (Default)
797
797
796

8thread    HTon    no parking
799
798
799



4thread    HTon    parking    (Default)
473
477
474

4thread    HTon    No parking
541
542
536

4thread    HTon    No parking    No HT conflicts
619
617
616



4thread    HTon    Parking     No HT conflicts
316
315
312


Going from ~475 to ~540 by disabling core parking is a nice boost,
and going even further, from ~540 to ~618, by using "no HT conflicts" in my tool is also nice.

That's roughly a 30% boost overall (~475 to ~618) from disabling core parking and adjusting affinity to avoid SMT conflicts...
This is an extreme case, though, because it's purely CPU-dependent and very nicely threaded software.
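The percentages can be checked directly from the 4-thread runs above. This sketch averages each configuration and expresses it relative to the default (parking on, no affinity changes).

```python
from statistics import mean

# 4-thread scores copied from the table above (higher is better).
RUNS = {
    "parking (default)": [473, 477, 474],
    "no parking": [541, 542, 536],
    "no parking + no HT conflicts": [619, 617, 616],
    "parking + no HT conflicts": [316, 315, 312],
}


def relative_to_default(runs: dict) -> dict:
    """Mean score of each configuration divided by the default's mean, minus 1."""
    baseline = mean(runs["parking (default)"])
    return {name: mean(scores) / baseline - 1 for name, scores in runs.items()}


if __name__ == "__main__":
    for name, change in relative_to_default(RUNS).items():
        print(f"{name:30s} {change:+.1%}")
```

That works out to roughly +14% from unparking alone and roughly +30% with the SMT-aware affinity on top, while pinning threads with parking still enabled drops to about -34%, which fits the warning that parking gets worse when you mess with affinity.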
 
SvenBent, I think my comments were misunderstood. I have not looked at or compared your app to PL; my comment was aimed at the claim of a huge memory/resource footprint.

I was trying to explain, without getting OT, that PL is very flexible in its settings: you don't have to run all its features (they're not even turned on by default).
You don't need logging, or even the GUI running, to use all features, and if you only wanted permanent priority, affinity and I/O settings, you could also disable ProBalance so it wouldn't be doing much.

Regarding your tests: I see very little CPU% on the governor, and that's with defaults on, something in the 0.02-0.03% range, with 6 MB of memory. That is measured with Windows Resource Monitor and Process Explorer on an i5 3570K.
 
SvenBent, I think my comments were misunderstood. I have not looked at or compared your app to PL; my comment was aimed at the claim of a huge memory/resource footprint.

I was trying to explain, without getting OT, that PL is very flexible in its settings: you don't have to run all its features (they're not even turned on by default).
You don't need logging, or even the GUI running, to use all features, and if you only wanted permanent priority, affinity and I/O settings, you could also disable ProBalance so it wouldn't be doing much.

Regarding your tests: I see very little CPU% on the governor, and that's with defaults on, something in the 0.02-0.03% range, with 6 MB of memory. That is measured with Windows Resource Monitor and Process Explorer on an i5 3570K.


QUOOOOOOTES :D

OK, I thought it was a response to my dissing of Process Lasso. If possible, please use quotes in the future; it will help me not misunderstand what's going on, and it's also easier to see when you respond.
Anyway, my statements about Process Lasso still stand: it's very bloated, since the same features can be done with a lot less,
i.e. less memory usage and especially less CPU usage. That 0.02-0.03% range you are talking about is still infinitely bigger than 0%, and the program still fails to do its task under heavy process load (unless you enable certain other features).

Pretty much, there is a solution that does a better job, takes fewer resources, and is 99.99% free (donation nag).

Don't get me wrong, though: Process Lasso is very comprehensive, and it contains a lot of bells and whistles (you might even find some nice features in there). I just prefer lean programs.
 
Sven, we all appreciate the work you have done from the beginning on thread affinity and the cache swap issue. I would agree about your low-overhead model. Any chance you could expand your work to monetize the platform? Shareware options are open, but you could sell it if it's patented.
 
AnandTech made a mistake about Game Mode in their initial TR review, but it led to an interesting finding: disabling SMT is more beneficial than Threadripper's Game Mode:
http://www.anandtech.com/show/11726...me-mode-halving-cores-for-more-performance/16

Triple shameless plug:
this can be done easily with Project Mercury as well (the "disable SMT conflicts" feature).

OK, I'd better stop before Kyle bans me.
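One way to express "one thread per physical core" as an affinity mask is to keep only the first logical processor of each core. This is a hypothetical helper, not Project Mercury's code, and it assumes SMT siblings are numbered adjacently (0/1, 2/3, ...), which is the usual Windows layout but worth confirming with a tool such as Sysinternals Coreinfo.

```python
def one_thread_per_core_mask(logical_count: int, threads_per_core: int = 2) -> int:
    """Affinity mask with only the first SMT sibling of each physical core set.

    ASSUMPTION: siblings are adjacent, i.e. logical processors 0 and 1 share
    core 0, 2 and 3 share core 1, and so on.
    """
    if logical_count <= 0 or logical_count % threads_per_core:
        raise ValueError("logical_count must be a positive multiple of threads_per_core")
    mask = 0
    for lp in range(0, logical_count, threads_per_core):
        mask |= 1 << lp
    return mask


if __name__ == "__main__":
    print(f"4C/8T   -> {one_thread_per_core_mask(8):X}")    # e.g. a quad core with HT
    print(f"16C/32T -> {one_thread_per_core_mask(32):X}")   # e.g. a 1950X with SMT on
```

The resulting hex value can be used with the `start /affinity` shortcut shown earlier in the thread.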


Sven, we all appreciate the work you have done from the beginning on thread affinity and the cache swap issue. I would agree about your low-overhead model. Any chance you could expand your work to monetize the platform? Shareware options are open, but you could sell it if it's patented.
I don't think there is really anything to patent; it's just about knowing a tiny bit about Windows' inner workings, i.e. the internal message system. Tons of programs use a similar method for other stuff. It just happens that a timed loop is easier to make, which is also why I started with that.
So it's not because I'm in any way a better programmer than the people at Bitsum (I'm not even close), just that they might have focused a bit more on bells and whistles rather than on optimizing how they do things.
And their method of finding a bad process and lowering it is, in a way, fundamentally the reverse of my method of focusing on the user experience and increasing the priority of that.
 
tangoseal, does it matter which cores you assign to processes? I am referring to physical cores as opposed to virtual (SMT) threads. Let's say I want to assign 8 cores to 1 process and another 8 to another process: 8 physical cores vs. 8 threads.
 
tangoseal, does it matter which cores you assign to processes? I am referring to physical cores as opposed to virtual (SMT) threads. Let's say I want to assign 8 cores to 1 process and another 8 to another process: 8 physical cores vs. 8 threads.

Cores are just faster threads, and software doesn't care as long as it has something to crunch its numbers on.
 