Discussion in 'AMD Processors' started by BrianB, Mar 14, 2017.
So what does that tell us about Ryzen's potential?
AMD needs a core toggle switch in Ryzen Master?
This clearly means this CPU has potential, but Windows is doing something wrong with it. Microsoft stated they were patching Windows 10 even though AMD says there are no issues. It seems that AMD marketing are being armchair engineers, IMO. Lack of communication between departments.
Or maybe some sort of game profile so it's automatically detected?
No Windows update will fix this issue. This is the first video I have seen that shows a huge performance boost from disabling cores. I find it weird, because some of these games were tested by Hardware.Fr and they didn't see that kind of improvement, so we will need to see more before we can tell whether the numbers are real.
Parking cores is done through the BIOS, so I don't know whether they already have that kind of control in Windows; that is something they would need an update for. They could possibly do it by setting affinity, of course...
But now you effectively have a 4-core CPU, so will it really be better for the future? All the arguments for Ryzen as a good alternative fall apart.
Not all games are affected by this, and patching Windows 10 will take time. Don't expect it to happen any time soon, probably not until the end of this year when Redstone 4 is out.
Check this video from @pcper. What they describe is certainly related to this problem.
Yep, saw the video. I am surprised to see that much improvement. Interesting, but still, 10% seems to be the hit from the problem, so I guess they are within limits.
that BF1 demo is weird.
look at the core usage.
all odd numbered ones except for 1 are getting hammered but the even numbered except 2 are just sitting there.
why isn't 1 being utilized and 2 is?
I have a 1700 and a 1700X at home. I will personally do this testing myself. The 1080 Ti is coming tomorrow, but I have a business trip on Thursday. It's either going to be quick and dirty tomorrow night or over the weekend.
Update: According to AMD, the Windows 10 scheduler is not the problem; games are simply more optimized for Intel CPUs.
Cool thx up front!
I wonder if maybe because of the disabled cores the temps and boost are allowing higher clocks as well which would make these numbers a bit inflated.
Hoping more tests with controlled clocks come out but I don't believe the numbers in that bench. Let's hope others can confirm these numbers.
It's really interesting how some games (Deus Ex, Doom) work better with 4 cores (don't have to move data across to the other CCX) but others (BF1, CS:GO, ROTR) seem to be better if all cores are enabled. The games that run faster with all cores enabled are probably just optimized better to require less sharing of data between threads. This would allow each thread to run more independently no matter which CCX it resides on and not have to share data across the CCX as often.
Well you are seeing all logical cores in the video. BF1 is probably optimized to use basically the same number of threads as there are physical cores in the system. This would allow those threads to run at their maximum performance without having to share another SMT thread on the same core.
so is 2 a logical core or are all odd number cores logical?
I believe with Ryzen the logical cores are odd/even on the same physical core. For example, on physical core 0 you have logical cores 0 and 1. On physical core 1 you have logical cores 2 and 3, etc.
that still doesn't explain what is going on.
2,3,5,7,9,11,13,15 are working
1,4,6,8,10,12,14,16 are sitting there.
The 1st part of the video, ok, AVG and MAX about ~10% faster. The minimum though? Teens vs 40's? That's eye opening.
Well if the logical cores are enumerated 1 through 16 that would still be a single thread per physical core. I guess the tool they are using to show CPU loading enumerates them 1 through 16. I think normally in the architecture or OS they would be enumerated 0 through 15.
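The enumeration argument above can be checked with a quick sketch. It assumes the paired layout suggested earlier in the thread (physical core k exposes logical cores 2k and 2k+1, counted from 0) and that the monitoring tool in the video labels cores 1 through 16. Neither assumption is confirmed here, and the function name is made up for illustration.

```python
# Assumptions (not confirmed by AMD in this thread):
#  - logical cores are paired per physical core: 0/1 -> core 0, 2/3 -> core 1, ...
#  - the tool in the video labels logical cores 1..16 instead of 0..15

def physical_core_from_tool_label(label: int) -> int:
    """Map a 1-based tool label to a 0-based physical core index."""
    logical = label - 1      # convert to a 0-based logical core number
    return logical // 2      # two logical cores per physical core

# Labels reported as loaded in the BF1 run
busy_labels = [2, 3, 5, 7, 9, 11, 13, 15]
busy_physical = [physical_core_from_tool_label(n) for n in busy_labels]

print(busy_physical)
# True when no two busy labels share a physical core
print(len(set(busy_physical)) == len(busy_labels))
```

Under those assumptions each busy label lands on a different physical core, which fits the one-thread-per-physical-core reading.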
I'm so confused now... (reply to cores, physical vs SMT)
Has anyone officially come out and detailed how Ryzen is configured in this department? I only speculated in one of the threads that maybe Ryzen is set up in an Every-Other arrangement, while Intel's is Real First - SMT Last, meaning 1-4 = Physical, 5-8 = Logical. If either AMD or Windows is able to differentiate this on the fly, then I have to commend whoever sorted this out.
However, I personally would really like to know what the official word is on this. Not so I can speculate on crap, just because I really need to know for my own sake and peace of mind...
From screenshots I had seen (leaked, a few days before launch), I could've sworn you could enable/disable cores through the program. I had thought to myself (see: hoped) that you could create profiles for games and dictate which cores would be available to them.
I have Ryzen Master downloaded, but without a motherboard, no way to utilize it yet
Robert Hallock addresses it to a degree at the end of this article with testing performed on the F1 game.
I already uploaded a tool to do this for games.
and this is supposed to be part of my project mercury program
You are confused about how physical and logical cores are related in SMT/HT. All of the cores displayed in Windows are logical cores. Each physical core is represented by two logical cores in the OS. These logical cores are equivalent and have full access to the resources of the core. There is no "real" core and weak SMT core distinction. If threads are running on both logical cores, they must share physical resources (with the hope that more total resources are utilized). So if you want to pin a thread to a physical core, it doesn't matter which of the corresponding logical cores you use.
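As a rough illustration of pinning by physical core, here is a minimal sketch that builds affinity bitmasks. It assumes an 8C/16T part with paired enumeration (logical cores 2k and 2k+1 on physical core k) and the usual convention that bit i of the mask stands for logical core i. The helper name is hypothetical, and choosing the first logical core of each pair is arbitrary, since both siblings are equivalent.

```python
# Assumptions: 2 logical cores per physical core, paired enumeration
# (logical 2k and 2k+1 belong to physical core k); bit i = logical core i.

LOGICAL_PER_CORE = 2

def mask_for_physical_cores(cores, use_both_siblings=False):
    """Build an affinity bitmask covering the given physical cores."""
    mask = 0
    for core in cores:
        first = core * LOGICAL_PER_CORE
        mask |= 1 << first                # first logical core of the pair
        if use_both_siblings:
            mask |= 1 << (first + 1)      # second logical core of the pair
    return mask

# One logical core on each of the 8 physical cores (no SMT sharing)
one_per_core = mask_for_physical_cores(range(8))
# Both logical cores on physical cores 0-3 only
first_four_smt = mask_for_physical_cores(range(4), use_both_siblings=True)

print(hex(one_per_core))
print(hex(first_four_smt))
```

Swapping `first` for `first + 1` in the single-sibling case would pin to the other logical core of each pair and, per the explanation above, should behave the same.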
This raises general questions such as:
How would the windows scheduler know the structure of the CCX as it is not really comparable to NUMA nor SMP?
Compounding this, a CCX is not fixed in core count or cache, meaning Microsoft cannot even assume 4 cores per CCX, because the soon-to-be-released 6-core Ryzen has not only a different core count per CCX but also a different cache.
How is the scheduler to know where to start threads that have inter-dependencies if the game engine/draw calls/simulations-particles/collision detection scales to the seen 16 logical cores or 12 logical cores (for the soon to be launched models)?
The solution is not simple from a scheduler perspective. The AMD design is nice in some ways and shows some great latency performance within the same CCX, but the coherent mesh concept falls down with inter-CCX thread and data dependency, and it is much harder to design a good multi-threaded solution for games than for, say, enterprise/office apps.
Hence why I would say AMD is approaching devs that are scaling to a lot of cores that currently have performance degradation due to data migration/inter CCX thread dependency.
Shame they could not make it 8 cores per CCX, but that adds complexity to the coherent cache/mesh design of their CPU, and tbh the 4-core CCX approach is much more cost effective, which in turn makes it more accessible to consumers/businesses.
Consider this, the scheduler does not overcome these exact issues for developers on PS4 and XBOX1.
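To make the inter-CCX dependency concern concrete, here is a toy sketch that counts how many data-sharing thread pairs end up split across CCXs for a given placement. It assumes logical cores 0-7 sit on one CCX and 8-15 on the other, which is not confirmed in this thread. The thread names and dependency list are invented purely for illustration.

```python
# Assumption: on an 8C/16T part, logical cores 0-7 are on CCX 0 and 8-15 on CCX 1.
LOGICAL_PER_CCX = 8

def ccx_of(logical_core: int) -> int:
    """Return the CCX index for a 0-based logical core."""
    return logical_core // LOGICAL_PER_CCX

def cross_ccx_pairs(placement, dependencies):
    """Count dependent thread pairs whose threads sit on different CCXs.

    placement:    dict of thread name -> logical core
    dependencies: iterable of (thread_a, thread_b) pairs that share data
    """
    return sum(
        1 for a, b in dependencies
        if ccx_of(placement[a]) != ccx_of(placement[b])
    )

# Hypothetical game threads and the pairs that exchange data
deps = [("main", "render"), ("main", "physics"), ("render", "audio")]
spread = {"main": 0, "render": 8, "physics": 2, "audio": 10}   # split over both CCXs
packed = {"main": 0, "render": 2, "physics": 4, "audio": 6}    # all on CCX 0

print(cross_ccx_pairs(spread, deps))
print(cross_ccx_pairs(packed, deps))
```

A scheduler or game engine that knew the dependency list could prefer placements with fewer split pairs, but the OS normally has no such list, which is the difficulty described above.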
I appreciate the time taken to explain, and I'm sure that'll indeed be helpful to others who come. I might've worded what I was confused over less than as clear as it could've been. My confusion is not over the distinction/definition of what are Physical/Logical cores, nor what SMT is, so I apologize if that's what was taken from my message. heh
In an attempt to put it more directly: While running Windows 10, in Task Manager, on the Details tab... Let's say we have Chrome running. Right-clicking and selecting "Affinity", in that window that opens up on a Ryzen 7 with 8C/16T it should say:
Thus, my question was: of those 16 Ryzen Threads, which ones are Physical?
EDIT: Something that just came to mind, due to the whole CCX nature of the CPU and having (in the 8C modules) two 4C/8T modules....
Could it possibly be laid out and recognized in Windows as this?
Threads 0-3 = Physical
Threads 4-7 = SMT
Threads 8-11 = Physical
Threads 12-15 = SMT
Definitely not enough to make up the difference, but the GPU tended to be clocked higher in the faster run. I suppose it could be because the CPU allowed it to use more of its potential. Just thought I'd mention it.
I don't know about Ryzen because I don't have one to test, but on every other SMT CPU I've seen on Windows, the logical cores always came in pairs,
i.e. (PC = physical core, LC = logical core):
PC0 = LC0 and LC1
PC1 = LC2 and LC3
PC2 = LC4 and LC5
PC3 = LC6 and LC7
You don't have a specified physical "thread" and then an SMT "thread". They are both SMT "threads" that go to the same physical core.
Also, threads are something software has, not a CPU. The more proper term would be logical core. It causes less confusion when you are actually talking about software threads vs logical cores vs physical cores.
Consider logical cores as simply gateways to the physical cores. Two gateways to the same core mean the core doesn't have to stand and wait for traffic delays and holdups at one of the gates; it can open the other gate and take in traffic there.
It's basically just an advancement in OoO execution.
Did you start celebrating St. Patrick's Day a bit early?
It was coherent enough that I followed ya though, lol. I know that "Logical" refers to all threads, because in these instances neither core is supposed to be looked at as anything else, since a CPU can run 2 threads per core.
As far as I had understood things with Intel, they were always the last threads available. (Proper or "improper" way to call them, calling them all 'logical' just won't cut it in this context) So in a 4C/8T chip it'd be threads 0-3 which relate to the Physical cores (as in what you'd have if you disabled HT), and threads 4-7 relate to the Virtual cores (SMT).
If I was misinformed then, I have some mental correcting to do!
What happens in Deus Ex when you do this with an i7 with HT?
This causes marginal fps differences in some games. I think this is why AMD wants game developers to determine where the threads go, instead of spending a lot of money trying to write an AMD scheduler/core parking method for Windows.
You disabled SMT and one CCX.
So we can't tell if this is an SMT issue, or a CCX issue.
You need to add the 4C + SMT case.
Exactly the thing I've been trying to get people to test since I built the functionality into my CPU performance program, Project Mercury.
I knew this weeks ago, but still nobody on HardOCP seems to want to spend the 10-15 minutes to run some benchies to test it so we can get a fix for it *sigh*
SMT issues alone have shown up to a 10% difference, which is even better than a 200 MHz OC.
You can see the benchmarks in my thread here
From my understanding of games like BF1, the main game thread is actually fairly small. The rest of the cores are being used for things like physics calculations, things that are not strictly tied to the main game thread. In this sense, each core is processing threads that are largely independent of each other, hence the lack of a performance penalty and the scaling with more cores. The other games probably have a heavier main game thread and fewer sub-threads, with the sub-threads more closely tied to the main game thread. That would result in better performance if data is not crossing the CCX.
You need to think of SMT in this way:
A processor core has several parts to it: the front end (branch predictor and decoder), the core itself, and the back end. What happens is that occasionally the core is able to process information faster than the front end can send it information, especially if the information is difficult to decode. To combat this and enable better utilization of the core, the SMT method doubles the front end to have two front ends feed the core. This is why SMT scaling is highly variable; information that is easy to decode doesn't scale because one front end can keep the core fully occupied, while information that is more difficult will require two front ends to keep the core fully utilized. This is also where performance penalties can occur, as the two front ends can end up fighting each other over utilization of the core. SMT aware programs avoid creating such scenarios.
The two front ends are identical. Each of the front ends is presented to the OS as a logical processor. Therefore, it doesn't matter if you're using front end A or front end B of the processor, you're still using the exact same thing.
Ran this through on Witcher 3 the other night. 4C/4T, 4C/8T, 8C/8T and 8C/16T. 720p, lowest settings. Witcher 3 likes more and more threads. Went from 92 fps over the same 3 min bench with 4C/4T (all 4 pegged at 95%+ almost the entire run), to 113 fps for the 8 thread runs (numbers nearly identical), to 128 fps for the 16 thread run.
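For reference, the relative gains in those Witcher 3 numbers work out as follows. This is just arithmetic on the averages quoted above. The two 8-thread runs were described as nearly identical, so both are entered as 113 fps here.

```python
# Average fps from the Witcher 3 runs quoted above.
# 4C/8T and 8C/8T were reported as nearly identical, so both use 113.
fps = {"4C/4T": 92, "4C/8T": 113, "8C/8T": 113, "8C/16T": 128}

baseline = fps["4C/4T"]
gains = {}
for config, value in fps.items():
    # Percentage gain relative to the 4C/4T run
    gains[config] = (value / baseline - 1) * 100
    print(f"{config}: {value} fps, {gains[config]:+.1f}% vs 4C/4T")
```

That is roughly a 23% gain going from 4 to 8 threads and roughly 39% going from 4 to 16, at 720p and lowest settings.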
Engine dependent. All in the coding. Those who are knocking Zen for gaming apparently don't understand this.
That laid it out very well for me, thank you.