AMD Ryzen 7 1700: 4+0 cores destroy 8 cores/16 threads in Doom/Deus Ex

AMD needs a core toggle switch in Ryzen Master?


This clearly means this CPU has potential but Windows is clearly doing something wrong with the CPU. Microsoft stated they were patching Windows 10 even though AMD says there's no issues. It seems that AMD marketing are being armchair engineers, IMO. Lack of communication between departments.
 
Or maybe some sort of game profile so it's automatically detected?
 
No Windows update will fix this issue. This is the first video I have seen that shows a huge performance boost from disabling cores, and I find it weird, because some of these games were tested by Hardware.Fr and they didn't see that kind of improvement. We will need to see more results to figure out whether the numbers are real.

Or maybe some sort of game profile so it's automatically detected?

Parking (disabling) cores is done through the BIOS, so I don't know whether they already have that kind of control from within Windows; that is something they would need to update for. They could possibly do it by setting affinity, of course...

But now you effectively have a 4-core CPU, so will it really be better for the future? All the arguments for Ryzen as a good alternative fall apart.
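
To make the affinity idea concrete, here is a minimal sketch that builds the bitmask for all logical CPUs on one CCX. It assumes the enumeration other posters describe in this thread (SMT siblings adjacent, first four physical cores on CCX 0); the function name and defaults are made up for illustration and are not part of any AMD or Microsoft tool.

```python
def ccx_affinity_mask(ccx_index, cores_per_ccx=4, threads_per_core=2):
    """Return a bitmask selecting every logical CPU on one CCX.

    Assumption (not confirmed by AMD in this thread): logical CPUs are
    numbered so that SMT siblings are adjacent (0/1 on core 0, 2/3 on
    core 1, ...) and CCX 0 holds the first `cores_per_ccx` cores.
    """
    logical_per_ccx = cores_per_ccx * threads_per_core
    first = ccx_index * logical_per_ccx
    mask = 0
    for cpu in range(first, first + logical_per_ccx):
        mask |= 1 << cpu  # set the bit for this logical CPU
    return mask


if __name__ == "__main__":
    print(hex(ccx_affinity_mask(0)))  # logical CPUs 0-7
    print(hex(ccx_affinity_mask(1)))  # logical CPUs 8-15
```

A mask like this could be passed to the Windows `start /affinity <hexmask>` command or set by hand in Task Manager. Note that affinity only restricts where one process may run; it does not disable cores the way the BIOS option does, so results may differ from the 4+0 configuration in the video.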
 
Not all games are affected by this, and patching Windows 10 will take time. Don't expect it to happen any time soon, probably not until the end of this year when Redstone 4 is out.
 
Check this video from @pcper. What they describe is certainly related to this problem:
https://hardforum.com/threads/amd-ryzen-and-the-windows-10-scheduler-no-silver-bullet.1927134/
 
that BF1 demo is weird.

look at the core usage.

all odd numbered ones except for 1 are getting hammered but the even numbered except 2 are just sitting there.

why isn't 1 being utilized and 2 is?
 
I have a 1700 and a 1700X at home. I will personally do this testing myself. The 1080 Ti is coming tomorrow, but I have a business trip on Thursday. It's either going to be quick and dirty tomorrow night or over the weekend.
 
Update: According to AMD, the Windows 10 scheduler is not the problem; games are simply more optimized for Intel CPUs.
 
I have a 1700 and a 1700X at home. I will personally do this testing myself. The 1080 Ti is coming tomorrow, but I have a business trip on Thursday. It's either going to be quick and dirty tomorrow night or over the weekend.


Cool thx up front!
 
I wonder if maybe because of the disabled cores the temps and boost are allowing higher clocks as well which would make these numbers a bit inflated.

Hoping more tests with controlled clocks come out but I don't believe the numbers in that bench. Let's hope others can confirm these numbers.
 
It's really interesting how some games (Deus Ex, Doom) work better with 4 cores (don't have to move data across to the other CCX) but others (BF1, CS:GO, ROTR) seem to be better if all cores are enabled. The games that run faster with all cores enabled are probably just optimized better to require less sharing of data between threads. This would allow each thread to run more independently no matter which CCX it resides on and not have to share data across the CCX as often.
 
that BF1 demo is weird.

look at the core usage.

all odd numbered ones except for 1 are getting hammered but the even numbered except 2 are just sitting there.

why isn't 1 being utilized and 2 is?
Well you are seeing all logical cores in the video. BF1 is probably optimized to use basically the same number of threads as there are physical cores in the system. This would allow those threads to run at their maximum performance without having to share another SMT thread on the same core.
 
Well you are seeing all logical cores in the video. BF1 is probably optimized to use basically the same number of threads as there are physical cores in the system. This would allow those threads to run at their maximum performance without having to share another SMT thread on the same core.

so is 2 a logical core or are all odd number cores logical?
 
so is 2 a logical core or are all odd number cores logical?
I believe with Ryzen the logical cores are odd/even on the same physical core. For example, on physical core 0 you have logical cores 0 and 1. On physical core 1 you have logical cores 2 and 3, etc.
 
I believe with Ryzen the logical cores are odd/even on the same physical core. For example, on physical core 0 you have logical cores 0 and 1. On physical core 1 you have logical cores 2 and 3, etc.

that still doesn't explain what is going on.

2,3,5,7,9,11,13,15 are working

1,4,6,8,10,12,14,16 are sitting there.
 
The 1st part of the video, ok, AVG and MAX about ~10% faster. The minimum though? Teens vs 40's? That's eye opening.
 
Well if the logical cores are enumerated 1 through 16 that would still be a single thread per physical core. I guess the tool they are using to show CPU loading enumerates them 1 through 16. I think normally in the architecture or OS they would be enumerated 0 through 15.
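
A quick way to check this reading is to convert the 1-based labels from the video to 0-based logical CPUs and map each one to a physical core. This is only a sketch: the busy/idle lists are the ones posted above, and the adjacent-pair mapping (two logical CPUs per physical core) is an assumption, not something the monitoring tool confirms.

```python
# Labels as reported in this thread, numbered 1-16 by the monitoring tool.
BUSY_LABELS = [2, 3, 5, 7, 9, 11, 13, 15]
IDLE_LABELS = [1, 4, 6, 8, 10, 12, 14, 16]


def physical_core(label, threads_per_core=2):
    """Map a 1-based tool label to a 0-based physical core index.

    Assumes SMT siblings are enumerated next to each other, so logical
    CPUs 0 and 1 share core 0, 2 and 3 share core 1, and so on.
    """
    logical = label - 1  # convert to the 0-based numbering the OS uses
    return logical // threads_per_core


busy_cores = sorted(physical_core(n) for n in BUSY_LABELS)
idle_cores = sorted(physical_core(n) for n in IDLE_LABELS)

print(busy_cores)  # physical cores that have a busy logical CPU
print(idle_cores)  # physical cores that have an idle logical CPU
```

Under that assumption both lists come out as cores 0 through 7, meaning every physical core has exactly one busy and one idle logical CPU, which fits the "one thread per physical core" explanation.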
 
I'm so confused now... (reply to cores, physical vs SMT)
Has anyone officially come out and detailed how Ryzen is configured in this department? I only speculated in one of the threads that maybe Ryzen is set up in an every-other arrangement, while Intel's is real first, SMT last, meaning 1-4 = physical, 5-8 = logical. If either AMD or Windows is able to differentiate this on the fly, then I have to commend whoever sorted this out.

However, I personally would really like to know what the official word is on this. Not so I can speculate on crap, just because I really need to know for my own sake and peace of mind...


AMD needs a core toggle switch in Ryzen Master?.
From screenshots I had seen (leaked, a few days before launch), I could've sworn you could enable/disable cores through the program. I had thought to myself (see: hoped) that you could create profiles for games and dictate which cores would be available to each one.

I have Ryzen Master downloaded, but without a motherboard, no way to utilize it yet :cry:
 
AMD needs a core toggle switch in Ryzen Master?


This clearly means this CPU has potential but Windows is clearly doing something wrong with the CPU. Microsoft stated they were patching Windows 10 even though AMD says there's no issues. It seems that AMD marketing are being armchair engineers, IMO. Lack of communication between departments.

I already uploaded a tool to do this for games.
https://hardforum.com/threads/amd-ryzen-game-performance-fix.1926435/#post-1042858032


This is supposed to be part of my Project Mercury program.


https://s9.postimg.org/ptjui9wlr/Image1.png

 
I'm so confused now... (reply to cores, physical vs SMT)
Has anyone officially come out and detailed how Ryzen is configured in this department? I only speculated in one of the threads that maybe Ryzen is set up in an every-other arrangement, while Intel's is real first, SMT last, meaning 1-4 = physical, 5-8 = logical. If either AMD or Windows is able to differentiate this on the fly, then I have to commend whoever sorted this out.

You are confused about how physical and logical cores are related in SMT/HT. All of the cores displayed in Windows are logical cores. Each physical core is represented by two logical cores in the OS. These logical cores are equivalent and have full access to the resources of the core. There is no "real" core and weak SMT core distinction. If threads are running on both logical cores, they must share physical resources (with the hope that more total resources are utilized). So if you want to pin a thread to a physical core, it doesn't matter which of the corresponding logical cores you use.
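
To illustrate the last point, here is a small sketch that builds an affinity mask with one logical core per physical core. It assumes the adjacent-pair numbering (logical cores 2n and 2n+1 on physical core n); the function and parameter names are invented for the example.

```python
def one_thread_per_core_mask(physical_cores=8, threads_per_core=2, sibling=0):
    """Bitmask with exactly one logical core selected per physical core.

    `sibling` picks which of the equivalent logical cores to use on each
    physical core. Assumes siblings are enumerated next to each other.
    """
    if not 0 <= sibling < threads_per_core:
        raise ValueError("sibling must be smaller than threads_per_core")
    mask = 0
    for core in range(physical_cores):
        mask |= 1 << (core * threads_per_core + sibling)
    return mask


if __name__ == "__main__":
    print(hex(one_thread_per_core_mask(sibling=0)))  # even logical cores
    print(hex(one_thread_per_core_mask(sibling=1)))  # odd logical cores
```

Either mask gives a process one hardware thread on each of the eight cores. Since the two logical cores of a pair are equivalent, neither choice should be faster than the other as long as nothing else is running on the sibling.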
 
AMD needs a core toggle switch in Ryzen Master?


This clearly means this CPU has potential but Windows is clearly doing something wrong with the CPU. Microsoft stated they were patching Windows 10 even though AMD says there's no issues. It seems that AMD marketing are being armchair engineers, IMO. Lack of communication between departments.

This raises general questions such as:
How would the Windows scheduler know the structure of the CCX, as it is not really comparable to NUMA or SMP?
Compounding this, a CCX is not fixed in core count or cache, meaning Microsoft cannot even assume 4 cores per CCX, because the soon-to-be-released 6-core Ryzen has not only a different core count per CCX but also different cache.
How is the scheduler to know where to start threads that have inter-dependencies if the game engine (draw calls, simulation/particles, collision detection) scales to the 16 logical cores seen here, or to 12 logical cores on the soon-to-be-launched models?

The solution is not simple from a scheduler perspective. The AMD design is nice in some ways and shows great latency within the same CCX, but the coherent mesh concept falls down with inter-CCX thread and data dependencies, and it is much harder to design a good multi-threaded solution for games than for, say, enterprise/office apps.
That is why I would say AMD is approaching devs whose games scale to a lot of cores and currently suffer performance degradation due to data migration and inter-CCX thread dependency.

Shame they could not make it 8 cores per CCX, but that adds complexity to the coherent cache/mesh design of the CPU, and tbh the 4-core CCX approach is much more cost effective, which makes it more accessible to consumers/businesses.

Consider this: the scheduler does not overcome these exact issues for developers on PS4 and Xbox One.
Cheers
 
Spot on.
 
You are confused about how physical and logical cores are related in SMT/HT. All of the cores displayed in Windows are logical cores. Each physical core is represented by two logical cores in the OS. These logical cores are equivalent and have full access to the resources of the core. There is no "real" core and weak SMT core distinction. If threads are running on both logical cores, they must share physical resources (with the hope that more total resources are utilized). So if you want to pin a thread to a physical core, it doesn't matter which of the corresponding logical cores you use.
I appreciate the time taken to explain, and I'm sure that'll be helpful to others who come along. I might've worded what I was confused about less clearly than I could have. My confusion is not over the distinction/definition of physical vs logical cores, nor over what SMT is, so I apologize if that's what was taken from my message. heh

In an attempt to put it more directly: while running Windows 10, in Task Manager, on the Details tab... Let's say we have Chrome running. Right-click it and select "Affinity"; in the window that opens on a Ryzen 7 with 8C/16T it should say:
<All Processors>
CPU 0
CPU 1
CPU 2
CPU 3
CPU 4
CPU 5
CPU 6
CPU 7
CPU 8
CPU 9
CPU 10
CPU 11
CPU 12
CPU 13
CPU 14
CPU 15
Thus, my question was: of those 16 Ryzen Threads, which ones are Physical?
[/end clarification]
~~~~~~~~~~~~~~

EDIT: Something that just came to mind, due to the whole CCX nature of the CPU and having (in the 8C modules) two 4C/8T modules....
Could it possibly be laid out and recognized in Windows as this?
Code:
 Threads 0-3    = Physical
 Threads 4-7    = SMT
 Threads 8-11   = Physical
 Threads 12-15  = SMT
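
One way to settle which layout is in use, at least on Linux, is to ask the kernel which logical CPUs share a core. The sketch below reads the standard sysfs file `thread_siblings_list`; it says nothing definitive about how Windows numbers the same CPU, and the helper names are just for illustration.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-1' or '0,8' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def smt_siblings(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return the logical CPUs sharing a physical core with `cpu`.

    Returns None when the sysfs file is missing (for example on Windows).
    """
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    if not path.exists():
        return None
    return parse_cpu_list(path.read_text())


if __name__ == "__main__":
    print(smt_siblings(0))
```

If CPU 0 reports siblings 0 and 1, the pairs are adjacent; if it reports something like 0 and 8, the system uses a "physical first, SMT last" style numbering instead.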
 
Definitely not enough to make up the difference, but the GPU tended to be clocked higher in the faster run. I suppose it could be because the CPU allowed it to use more of its potential. Just thought I'd mention it.
 
I don't know about Ryzen because I don't have one to test, but on every other SMT CPU I've seen under Windows the logical cores were always enumerated in pairs,

i.e. (PC = physical core, LC = logical core):
PC0 = LC0 and LC1
PC1 = LC2 and LC3
PC2 = LC4 and LC5
PC3 = LC6 and LC7

You don't have a designated physical "thread" and then an SMT "thread". They are both SMT "threads" that go to the same physical core.
Also, threads are something software has, not a CPU. The more proper term would be logical core; it causes less confusion when you are actually talking about software threads vs logical cores vs physical cores.
Consider logical cores as simply gateways to the physical core. Two gateways to the same core mean the core doesn't have to stand and wait for traffic delays and holdups at one of the gates; it can open the other gate and take in traffic there.
It's basically just an advancement in OoO execution.
 
Svent....... lol

Did you start celebrating St. Patrick's Day a bit early?


It was coherent enough that I followed ya though lol I know that "Logical" refers to all threads, because in these instances neither core is supposed to be looked at as anything else, since a CPU can run 2 threads per core.

As far as I had understood things with Intel, they were always the last threads available. (Proper or "improper" way to call them, calling them all 'logical' just won't cut it in this context) So in a 4C/8T chip it'd be threads 0-3 which relate to the Physical cores (as in what you'd have if you disabled HT), and threads 4-7 relate to the Virtual cores (SMT).

If I was misinformed then, I have some mental correcting to do! :p
 
This causes marginal fps differences in some games. I think this is why AMD wants game developers to determine where the threads go, instead of spending a lot of money trying to write an AMD scheduler/core parking method for Windows.
 
Exactly the thing I've been trying to get people to test since I built the functionality into my CPU performance program, Project Mercury.
I knew this weeks ago, but still nobody on HardOCP seems to want to spend the 10-15 minutes to run some benchies to test it so we can get a fix for it *sigh*



SMT issues alone have shown up to a 10% difference, which is more than you would get from a 200 MHz OC.
You can see the benchmarks in my thread here:
https://hardforum.com/threads/amd-ryzen-game-performance-fix.1926435/#post-1042887880
 
It's really interesting how some games (Deus Ex, Doom) work better with 4 cores (don't have to move data across to the other CCX) but others (BF1, CS:GO, ROTR) seem to be better if all cores are enabled. The games that run faster with all cores enabled are probably just optimized better to require less sharing of data between threads. This would allow each thread to run more independently no matter which CCX it resides on and not have to share data across the CCX as often.

From my understanding of games like BF1, the main game thread is actually fairly small. The rest of the cores are being used for things like physics calculations, things that are not strictly tied to the main game thread. In this sense, each core is processing threads that are largely independent of each other, hence the lack of a performance penalty and the scaling with more cores. The other games probably have a heavier main game thread and fewer sub-threads, with the sub-threads more closely tied to the main game thread. That would result in better performance if data is not crossing the CCX.

You need to think of SMT in this way:

A processor core has several parts to it: the front end (branch predictor and decoder), the core itself, and the back end. What happens is that occasionally the core is able to process information faster than the front end can send it information, especially if the information is difficult to decode. To combat this and enable better utilization of the core, the SMT method doubles the front end to have two front ends feed the core. This is why SMT scaling is highly variable; information that is easy to decode doesn't scale because one front end can keep the core fully occupied, while information that is more difficult will require two front ends to keep the core fully utilized. This is also where performance penalties can occur, as the two front ends can end up fighting each other over utilization of the core. SMT aware programs avoid creating such scenarios.

The two front ends are identical. Each of the front ends are presented to the OS as a logical processor. Therefore, it doesn't matter if you're using front end A or front end B of the processor, you're still using the exact same thing.
 
Ran this through on Witcher 3 the other night. 4C/4T, 4C/8T, 8C/8T and 8C/16T. 720p, lowest settings. Witcher 3 likes more and more threads. Went from 92 fps over the same 3 min bench with 4C/4T (all 4 pegged at 95%+ almost the entire run), to 113 fps for the 8 thread runs (numbers nearly identical), to 128 fps for the 16 thread run.
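
For anyone who wants the scaling in percentages, the arithmetic on those three averages can be sketched like this. The fps figures are the ones quoted above; the labels and the helper function are only for illustration.

```python
# Average fps from the Witcher 3 runs quoted above (720p, lowest settings).
results = {
    "4C/4T": 92,
    "8 threads (4C/8T or 8C/8T)": 113,
    "8C/16T": 128,
}


def percent_gain(base, new):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100


baseline = results["4C/4T"]
for config, fps in results.items():
    print(f"{config}: {fps} fps, {percent_gain(baseline, fps):.1f}% over 4C/4T")

step = percent_gain(results["8 threads (4C/8T or 8C/8T)"], results["8C/16T"])
print(f"8 threads -> 16 threads: {step:.1f}%")
```

That works out to roughly 23% going from 4 to 8 threads, about 13% more going from 8 to 16, and about 39% overall, so the returns diminish but are still positive in this engine.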
 
Engine dependent. It's all in the coding. Those who are knocking Zen for gaming apparently don't understand this.
 
You need to think of SMT in this way:

A processor core has several parts to it: the front end (branch predictor and decoder), the core itself, and the back end. What happens is that occasionally the core is able to process information faster than the front end can send it information, especially if the information is difficult to decode. To combat this and enable better utilization of the core, the SMT method doubles the front end to have two front ends feed the core. This is why SMT scaling is highly variable; information that is easy to decode doesn't scale because one front end can keep the core fully occupied, while information that is more difficult will require two front ends to keep the core fully utilized. This is also where performance penalties can occur, as the two front ends can end up fighting each other over utilization of the core. SMT aware programs avoid creating such scenarios.

The two front ends are identical. Each of the front ends are presented to the OS as a logical processor. Therefore, it doesn't matter if you're using front end A or front end B of the processor, you're still using the exact same thing.
That laid it out very well for me, thank you. :)
 