Ryzen and the Windows Scheduler - PCPer

If the Infinity Fabric is linked to RAM speed, and it is the speed at which the L3 runs, and based on the graphs that peg the CCX issue to the latency, then yes, I would like to see if RAM speed will make any discernible difference. I am not making any claims here, just interested in testing using these variables.
OMFG.
AGAIN: games tested by a few review sites going from 2133MHz to 3000MHz show the SAME relative gains on both Intel and Ryzen.
What does that tell you?
Are you suggesting Intel has the Ryzen fabric/CCX and L3 cache?
Because that is the only way you can make your point.

You are the one making assumptions while games have been tested on both Intel and Ryzen at both memory frequencies.
 
OMFG. I can do that too. You aren't understanding at all, and you aren't even trying to.

Do you have a single link to anything showing L3 cache latency/speeds at different RAM speeds? If not, then no need to respond.

You and razor keep acting like I am looking to prove game fixes or something, when all I asked for was some tests with the L3 and RAM speeds. I haven't asserted there would be any change in the + or -; I'm just curious how those test results would compare. Simple.
 


Look, why do you think the latency is there to begin with?

Do you think latency has anything to do with frequency?

Just look at RAM: why is there something called CAS latency, and what does it do? How is it different from the frequency?

Come on, man, we are not making these things up; it's well known by anyone who overclocks, or has ever tried to overclock, that latency has nothing to do with frequency.

You can do the tests for yourself if you want; it will come out exactly as I stated: it will not change the L3 cache latency. It is the way it is. If you need proof of that, do it yourself, because we already know what the hell it's going to be.
 
OMFG. I can do that too. You aren't understanding at all, and you aren't even trying to.

Do you have a single link to anything showing L3 cache latency/speeds at different RAM speeds? If not, then no need to respond.

You and razor keep acting like I am looking to prove game fixes or something, when all I asked for was some tests with the L3 and RAM speeds. I haven't asserted there would be any change in the + or -; I'm just curious how those test results would compare. Simple.
I bolded the section where you say "I would like to see if RAM speed will make any discernible difference" in the previous post I responded to.
You have real results involving games, which are the ones currently impacted by the CCX/L3 structure and latency.
But you want to ignore those results/conclusions from the likes of Digital Foundry and just mull over theory, based on a lack of information from AMD on how we could do that accurately.
 


Worse yet, he doesn't know what latency is and how it differs from frequency. Latency is sometimes even necessary, depending on what it's doing.

Trying to figure out an easy way of saying this: latency can be beneficial when doing many small tasks, as it gives time to fill up the cache and fully saturate the ALUs. But in games' case most tasks are interdependent, so that will not happen.
 
And additionally, there are R7 reviews out there that managed to use around 3000MHz DDR4 memory, and yet we still see the same gap between the various CPUs and Intel, even putting aside the further testing done by Digital Foundry and a couple of others looking at 2133MHz to 3000MHz.
And it is games that suffer most from the latency, due to their thread- and data-dependency designs, rather than office/benchmark applications, which are well-designed multithreaded applications.
Cheers
 
I'm just eating my popcorn here.

I wonder, though, if games were more aware of the CCX distinction, and kept some interdependent threads on the same CCX... would that make a difference?

Has anyone tested a 4+0 situation against a 2+2 situation to see which is better? That could tell us how much the data fabric latency affects gaming performance, and whether or not it would be worthwhile for devs to give a sh*t about this when coding.
 
It affects some games more than others (makes sense). I remember seeing a few that scale threads showing no impact from the inter-CCX hop, but I cannot remember which now. The same can be said about SMT, where there are bad examples with it and some good examples with it.
Cheers
 
Worse yet, he doesn't know what latency is and how it differs from frequency. Latency is sometimes even necessary, depending on what it's doing.

Trying to figure out an easy way of saying this: latency can be beneficial when doing many small tasks, as it gives time to fill up the cache and fully saturate the ALUs. But in games' case most tasks are interdependent, so that will not happen.
Stop being condescending. You are talking clock cycles: CAS 9 is in clocks, as in 9 clock cycles per access. But the end result, the SPEED, has a value in ns (nanoseconds). Just using two different RAM speeds for reference, 1600 and 2400: 9 x 1/(1600/2 MHz) ≠ 9 x 1/(2400/2 MHz), i.e. 11.25ns vs 7.5ns. The graph I saw gave end-result speeds in ns, not clocks; the clock count doesn't change in cache, but the speed will depend on CPU clocks (before) and, in the case of Ryzen, RAM speeds.
 
It affects some games more than others (makes sense). I remember seeing a few that scale threads showing no impact from the inter-CCX hop, but I cannot remember which now. The same can be said about SMT, where there are bad examples with it and some good examples with it.
Cheers

IIRC Mafia III did very well on Ryzen, and it's well-threaded.
 
The CCX latency is just not a big deal; yeah, it's there, and yes, it hurts performance. AMD has already said they know where to improve Zen+, so this may be one of those areas.
 
Stop being condescending. You are talking clock cycles: CAS 9 is in clocks, as in 9 clock cycles per access. But the end result, the SPEED, has a value in ns (nanoseconds). Just using two different RAM speeds for reference, 1600 and 2400: 9 x 1/(1600/2 MHz) ≠ 9 x 1/(2400/2 MHz), i.e. 11.25ns vs 7.5ns. The graph I saw gave end-result speeds in ns, not clocks; the clock count doesn't change in cache, but the speed will depend on CPU clocks (before) and, in the case of Ryzen, RAM speeds.


YOU just said it, so how does the latency change when the RAM isn't being accessed? That is the big thing here: the program that was looking at latency has nothing to do with the RAM access speed! Now if you try to split the two apart, you won't get anything but the actual results that were there before! Sheesh.

You have TWO different problems: latency of the cache is NOT congruent with frequency of the RAM. Now if you try to measure both, you can't, because you can't isolate a program to give definite results unless you can split the variables and measure each variable separately. That is not easy to do; you need to write a separate program for it, which is not even worth it, because they are not connected to each other. Data in L3 cache doesn't need to access RAM, but data in RAM has to be sent to L3 cache. The latency doesn't change between the CCXs (which, btw, should be the fastest path possible, as there is nothing in the middle outside of the fabric); but if it needs to access RAM, then with slower RAM that is what you are noticing at the different RAM speeds, not a decrease or increase of L3 cache latency.

What you are thinking is that the RAM speeds will change the Infinity Fabric speeds; no, it doesn't do that. It only changes the ability to send information from the RAM to the fabric at higher speeds, because the RAM is faster.

That is why the Infinity Fabric has a certain amount of bandwidth based on the configuration of the platform; it has nothing to do with the RAM frequency.

The way the cores are set up with Ryzen is that the L3 is not global cache per se. It's local cache for each CCX, so global per CCX only, but it can be shared globally through the fabric for the other CCX. Intel's chips aren't like this; they have global L3 cache (and L4, depending on the gen) for all their cores.

That is why both Intel chips and Ryzen show improvements with increased memory speeds when a program accesses RAM. But Ryzen shows performance degradation along with that, because of the L3 cache of one CCX sending information to the L3 cache of the other CCX.
 
IIRC Mafia III did very well on Ryzen, and it's well-threaded.
Yeah, well remembered; just checked a review, and that one is working fine, scaling beyond 4 cores even on Ryzen with the inter-CCX hop, and SMT is slightly positive as well, just over 7%, so a good example of a game working fully with Ryzen.

For Honor can run faster on the 1800X than the 6900K at 1080p with both at stock settings, about a 3.5% difference, but it shows things working fine for the 1800X even if the 6900K would creep in front with both OC'd.

There is one game much faster on the 1800X compared to either the 6900K or 7700K, and they retested it several times and are seeing if they can get clarification from AMD/the dev; unfortunately I cannot remember the review or game, doh.

Cheers
 
3. See: Original Athlon, Athlon 64/X2. I mean, they basically made x86-64 and cheap dual cores a thing. I'm hoping Zen makes cheap 8 cores a thing in the same way.

My Athlon 64 X2 4200+ Toledo was $530, and I got it cheap. Can't remember if the Manchesters were more or cheaper; I was on a San Diego at the time.
The 3800+, which was one model down, cost me $320. And let's not forget the 4800+ that was over $900 for a 200MHz and 2MB L2 increase. And the FX-60 > 62. All $1000+.
In comparison, Intel was selling their shitty Pentium D Smithfield 820s for under $300. Yeah, they had higher models too, but who was stupid enough to buy them?

So I wouldn't call them cheap. Maybe later AM2s, but not 939.
 

Oh hell no, I didn't pay those prices. IIRC, in those days the top model AMD was pretty pricey, but one or two rungs down, and they were cheap-cheap. It's going way back, so I can't remember exactly what I paid for it, but I grabbed an X2 4200+, and I know I didn't pay more than $250 for it, because I was broke-ass at the time.

But I also didn't buy it on the day it came out, either. Back then sh*t was moving so fast, 6 months of waiting got you the former flagships for like half the $$.

For the Athlon 64 3000+ before that, I picked up a socket 754 board and CPU *after* the 939 sh*t came out. Cheap-cheap, and not a big performance difference at the time.
 

Yeah release prices were fucking crazy back then, but if you wanted the best...
Never messed around with the Clawhammers or the Venice or the Newcastle; too many buddies killing their boards.
Did have an Opty 175 Venus after I got rid of my 3700+ San Diego. God damn, those were the days!
Shit going OT.
 
My Athlon 64 X2 4200+ Toledo was $530, and I got it cheap. Can't remember if the Manchesters were more or cheaper; I was on a San Diego at the time.
The 3800+, which was one model down, cost me $320. And let's not forget the 4800+ that was over $900 for a 200MHz and 2MB L2 increase. And the FX-60 > 62. All $1000+.
In comparison, Intel was selling their shitty Pentium D Smithfield 820s for under $300. Yeah, they had higher models too, but who was stupid enough to buy them?

So I wouldn't call them cheap. Maybe later AM2s, but not 939.

Got my 4200+ Manchester back then for $358; mind you, this was a year after its release. I can't remember what they retailed for at release, to be honest.
 
Look, why do you think the latency is there to begin with?

Do you think latency has anything to do with frequency?

Just look at RAM: why is there something called CAS latency, and what does it do? How is it different from the frequency?

Come on, man, we are not making these things up; it's well known by anyone who overclocks, or has ever tried to overclock, that latency has nothing to do with frequency.

You can do the tests for yourself if you want; it will come out exactly as I stated: it will not change the L3 cache latency. It is the way it is. If you need proof of that, do it yourself, because we already know what the hell it's going to be.

Frequency can decrease the latency in ns if the latency is tied to clock cycles - like the case you quoted of CAS latency, which is latency in clock cycles on synchronous RAM (i.e. what we use; CAS 15 is 15 clock cycles). If the latency is a certain number of clock cycles, increasing the clock will decrease the latency in ns. Pretty simple. Increasing from 2133MHz to 3000MHz makes the clock cycles occur about 41% faster, so the latency in ns drops by about 29%. Yes, it's still the same number of clock cycles, but since each cycle takes less time, the latency in ns decreases accordingly.

The actual latency in ns for 2133 and 3000 CAS15 RAM is 14ns and 10ns (first word), respectively. So yes, frequency can definitely affect latency.
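
For reference, here is that arithmetic as a quick Python check: first-word latency is just the CAS cycles divided by the actual I/O clock, which is half the DDR transfer rate.

```python
# First-word latency in ns from DDR data rate (MT/s) and CAS in clock cycles.
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    clock_mhz = data_rate_mts / 2          # DDR transfers twice per I/O clock
    return cas_cycles / clock_mhz * 1000   # cycles / MHz = microseconds, x1000 -> ns

for rate in (2133, 3000):
    print(f"DDR4-{rate} CL15: {cas_latency_ns(rate, 15):.1f} ns")
# DDR4-2133 CL15: 14.1 ns
# DDR4-3000 CL15: 10.0 ns  -> about 29% lower at the same CAS setting
```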
 


Yes, but we are talking about two different things, and that is why I pointed it out: he is thinking it's like RAM latency, and this is not that. The frequency of the fabric doesn't change because the frequency of the RAM changes.
 


If that is the case, the only way to reach max potential is what I stated earlier: RAM frequency has to be 250%-300% more than what it is now to avoid the latency problem. Is that going to happen? NO, it's not realistic.

We saw a 3-fold increase in latency when not even accessing the RAM, just cross-talk between the CCXs, in the PCPer review, which used 2400MHz RAM; that means they would need to get that RAM frequency up to 7200MHz. Is that even realistic?
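
For anyone who wants to poke at this themselves, below is a rough Python sketch of the kind of core-to-core ping-pong measurement behind those cross-CCX numbers (not PCPer's actual tool). It assumes the psutil package for pinning and the usual mapping of logical CPUs 0-7 to CCX0 and 8-15 to CCX1; Python's interpreter overhead inflates the absolute figures badly, so only the gap between the same-CCX and cross-CCX pairs is meaningful.

```python
import multiprocessing as mp
import time
import psutil

def ponger(flag, cpu):
    psutil.Process().cpu_affinity([cpu])  # pin this process to one logical CPU
    while True:
        while flag.value != 1:            # spin until we see the ping
            pass
        flag.value = 2                    # pong back

def ping(flag, cpu, rounds=100_000):
    psutil.Process().cpu_affinity([cpu])
    start = time.perf_counter()
    for _ in range(rounds):
        flag.value = 1
        while flag.value != 2:            # spin until the pong arrives
            pass
    return (time.perf_counter() - start) / rounds * 1e9  # ns per round trip

if __name__ == "__main__":
    # (0, 2): two physical cores on CCX0; (0, 8): one core on each CCX (assumed layout).
    for a, b in [(0, 2), (0, 8)]:
        flag = mp.Value("i", 0, lock=False)
        worker = mp.Process(target=ponger, args=(flag, b), daemon=True)
        worker.start()
        print(f"CPU {a} <-> CPU {b}: {ping(flag, a):.0f} ns per round trip")
        worker.terminate()
```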
 
Actually, it is realistic, because supposedly there is a presently disabled option to run fabric at 2x multiplier compared to memory bus clock. Would kind of resolve the issue, don't you think?
 


If that is possible... I think it isn't possible, because the CPU is also connected to the fabric, right? Data can't just be shoved into it. This is why Intel created L4 cache with their newer chips; you will get an overflow at some point.

If they could do it, I think they would have done it too. That would erase the problem.
Added to this, Intel's L4 cache has a latency of 60ns max, and that is still lower than the latency increases we saw with the CCX cross-talk!

Forgot: the Infinity Fabric is also what is used for all of the power management for the entire system and the CPU frequency scaling vs the other components, XFR, all that stuff. Forget power management at that point if you are going to double the multiplier, if it's even possible.
 
Actually, it is realistic, because supposedly there is a presently disabled option to run fabric at 2x multiplier compared to memory bus clock. Would kind of resolve the issue, don't you think?

Where did you hear that? Would be interesting to read about that.
 


Yeah, it's not going to drop that latency by anything meaningful. There is only a certain number of bytes per cycle the data fabric can handle, and the latency increase we see with almost no information being sent back and forth between the CCXs is what shows this. It's really not going to make a difference.

Intel does 48 bytes per cycle (32 bytes in flight and 16 bytes stored), and this is why the problem can't be corrected: software was made for Intel's 48 bytes per cycle.

So hypothetically, even if they were able to run the multiplier for the Infinity Fabric at 2x and then put faster RAM on top of that, there is still a 50% overflow problem because of the application.
 
If that is the case, the only way to reach max potential is what I stated earlier: RAM frequency has to be 250%-300% more than what it is now to avoid the latency problem. Is that going to happen? NO, it's not realistic.

We saw a 3-fold increase in latency when not even accessing the RAM, just cross-talk between the CCXs, in the PCPer review, which used 2400MHz RAM; that means they would need to get that RAM frequency up to 7200MHz. Is that even realistic?

I don't think it needs to be that high for much improved performance - 3000 to 3200 should make a pretty big difference. Maybe not ideal, but if you both increase bandwidth and decrease CCX/cache latency by 25-40%, it's a double win.
 
It is all well and good talking about the theory and how the CCX issue should improve with memory clock speed, but here are some actual results from Eurogamer, as FPS gains:
Ryzen 1800X with SMT 2133MHz to 3200MHz (yep theirs actually works at that speed)
Assassin's Creed Unity: 6.4%
Crysis 3: 6.9%
The Division: under 0.5%
Far Cry Primal: 15%
The Witcher 3: 19.3%

Core i7 7700K with SMT 2133MHz to 3000MHz - so RAM increase is lower at 40% over 2133 while above with Ryzen was 50% over 2133.
Assassin's Creed Unity: 5.5%
Crysis 3: 4%
The Division: under 0.5%
Far Cry Primal: 13.7%
The Witcher 3: 19.5%

Interesting how The Division has no sensitivity at all to RAM frequency, and we can see Intel and Ryzen are comparable in average fps gains as memory frequency increases, so it applies to both and not just Ryzen.
Cheers
 
Actually, it is realistic, because supposedly there is a presently disabled option to run fabric at 2x multiplier compared to memory bus clock. Would kind of resolve the issue, don't you think?

Not entirely convinced myself, because every system has a limit to how far it can be pushed. As an example, RAM has to have its latency timings increased with higher clock speeds or it will fail, and PCIe has its own issues if you do the same. I can bet such a radical change to the Infinity Fabric could also cause a fail state if this setting is a broad one, as it talks to quite a few sections with wait and sync dependencies (the L3 cache will have its own clock and timing tolerances), let alone the inter-CCX communication or mesh timings/cycles.
TBH there is not enough information from AMD to suggest this is going to resolve the problem, and I am not entirely convinced, in this context of inter-CCX and thread-data dependency, that it will help even if that parameter can be enabled.
Cheers
 
I somehow missed this bit of news when it was released by AMD regarding the scheduler working fine; they will be releasing an update to their power policy parameters to work better with Balanced Mode:
Sorry if posted before but fits with OP.
In the near term, we recommend that games and other high-performance applications are complemented by the High Performance plan. By the first week of April, AMD intends to provide an update for AMD Ryzen processors that optimizes the power policy parameters of the Balanced plan to favor performance more consistent with the typical usage models of a desktop PC.

Cheers
 
That won't happen; he will get erratic results. The CCX issue is complicated and can't be solved outside of the actual application. Having a 3rd-party solution just makes the problem worse in most cases, or all cases for that matter.

This isn't like shader replacement, which is kinda what SvenBent is talking about, where partially compiled code is fully compiled at run time and can be replaced when certain instructions are noticed. This is compiled code that needs to be read first, then translated and replaced. A big difference.

Locking down the CCXs to specific threads will cause havoc in most multithreaded code in games, because games rely on main threads that branch out for the needs of the graphics portion. And that does not work well if the threads are locked to specific cores, since if one thread needs info from a portion of that same thread, what's going to happen? A stall of the core, right? Then what happens to the other cores that might need that information too?

You are stuck up shit's creek at that point.

GPUs are highly parallel, but they need the CPU to keep feeding them; if the CPU is artificially bound, then the GPU will get bound too, even though the CPU sent the info the GPU needs, and when the GPU needs to send info back.

This is also why we see some multithreaded games not having as many problems as others: their main threads are set up differently, and the performance of the engine is bound to the main thread.

PS: the rest of what he stated about SMT is pretty much what I was getting at; SMT performance can be improved, but it is application-dependent too. And that has nothing to do with the CCX issues.


Thank you for the attention. Please let me try to settle some of the misunderstandings that might be around the "solution" I am working on.

It doesn't have to be all or nothing. Most people, if not all, react this way when I bring this up. By doing it software-side and using affinity to "disable" SMT or CCX switching, we can do it live.
Now, my current software has been doing this for years with SMT, but in a dumb way. By dumb I mean it doesn't detect anything; it just applies the restrictions or not according to what the user set it for.
This still only affects the active process, though (aka the game); the background tasks are still able to utilize the extra logical cores.

What I am currently working on is a smarter method that will detect when the active process is running a low number of threads, and thereby does not benefit from the massive number of cores, and "disable" one CCX FOR THAT PROCESS ONLY.
It's important to note that the other CCX is still available for background software to utilize, offloading the CCX the main process is running on.

In case the process has enough heavy CPU threads to benefit from all the cores, the restriction is removed and the main process is allowed to benefit from the massive parallelism of the CPU.

So let's say you are running a game that has 4 heavy CPU threads, and you are streaming at the same time. My program will set the game to be on just one CCX to avoid CCX switching, while the streaming process is fully allowed to be on the other CCX and utilize the extra cores.
Hopefully, if I get this right, there should be little to no drawbacks and a little to decent speed-up. You are NOT losing the benefits of 8 cores, because my program should only adjust things when you can't utilize them anyway.

Now, the problem for me right now is that there are two factors to take into account, both SMT and CCX, and I don't know if it's more beneficial for, let's say, a 7-threaded game to be restricted out of SMT or out of CCX switching.
This is what I need the benchmark for, so I can know which is the optimal choice to take.

TLDR: We are not dropping the benefits of SMT or the extra cores of a CCX. We are just working around bad situations where they hurt performance.
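
For illustration only, here is a minimal Python sketch of that idea using psutil; this is not SvenBent's tool, and the CCX mapping and the busy-thread threshold are assumptions.

```python
import time
import psutil

CCX0 = list(range(0, 8))  # assumed: logical CPUs 0-7 = CCX0 (4 cores + SMT)

def busy_threads(proc: psutil.Process, window=1.0, min_util=0.10) -> int:
    """Count threads that burned more than min_util of a core over the window."""
    before = {t.id: t.user_time + t.system_time for t in proc.threads()}
    time.sleep(window)
    after = {t.id: t.user_time + t.system_time for t in proc.threads()}
    return sum(1 for tid, cpu in after.items()
               if cpu - before.get(tid, 0.0) > min_util * window)

def apply_policy(game_pid: int, cores_per_ccx: int = 4) -> None:
    game = psutil.Process(game_pid)
    if busy_threads(game) <= cores_per_ccx:
        game.cpu_affinity(CCX0)  # restrict THIS PROCESS ONLY to one CCX
    else:
        game.cpu_affinity(list(range(psutil.cpu_count())))  # let it span both
```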
 
Also, I'm trying to start a small database (started today) on games and how many CPU-heavy threads they really have, to get a good idea of what number of cores is optimal to buy for certain games:

BF1 :5
Lords Of The Fallen: 3
Path of Exile: 2
Skyrim SE: 1+


If I get enough people on board to test this (it's really simple), then I'm planning on a pro version where you can set a list of games and choose what features you want when each game runs. E.g. on Skyrim you could have SMT and CCX "disabled", and have BF1 just "disable" one CCX.
But this is not going to happen until after the smart/automatic way is done.
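
A rough sketch of what such a per-game profile table could look like, again in Python with psutil; the exe names, policies, and the even-ID SMT-sibling numbering are all made-up assumptions.

```python
import psutil

# Hypothetical per-game policies, keyed by executable name.
PROFILES = {
    "skyrimse.exe": {"one_ccx": True, "no_smt": True},
    "bf1.exe":      {"one_ccx": True, "no_smt": False},
}

def mask_for(policy: dict, cores_per_ccx: int = 4, smt: int = 2) -> list:
    cpus = (range(cores_per_ccx * smt) if policy["one_ccx"]
            else range(psutil.cpu_count()))
    if policy["no_smt"]:
        # assumed: even logical IDs are the first SMT sibling of each core
        return [c for c in cpus if c % 2 == 0]
    return list(cpus)

for proc in psutil.process_iter(["name"]):
    policy = PROFILES.get((proc.info["name"] or "").lower())
    if policy:
        try:
            proc.cpu_affinity(mask_for(policy))
        except psutil.AccessDenied:
            pass  # some processes need admin rights to retarget
```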
 
Another take on Ryzen:



I'm not qualified to say much about the issue, but I do find it interesting.

I Really Want AMD to do well. It's just good for all of us for them to do well.
 
If that is the case, the only way to reach max potential is what I stated earlier: RAM frequency has to be 250%-300% more than what it is now to avoid the latency problem. Is that going to happen? NO, it's not realistic.

We saw a 3-fold increase in latency when not even accessing the RAM, just cross-talk between the CCXs, in the PCPer review, which used 2400MHz RAM; that means they would need to get that RAM frequency up to 7200MHz. Is that even realistic?
So you were arguing with me when you didn't even know this? Seriously! I even mentioned the Infinity Fabric being tied to RAM speed, and that didn't set off bells?
 


Well, you should have linked something, because I didn't know; when you don't know something, I link information about it, don't I?

And it wasn't known till 6 days ago.

Not to mention, a 30% increase in bandwidth is NOT GOING TO DO SHIT TO THE LATENCY ANYWAY! That is why I didn't know about it. Going from 2400MHz to 3200MHz, you won't see shit!

So trying to make a mountain out of a molehill is your specialty, just like you tried with async. You can take that 30% increase in bandwidth and figure out it's only going to give you about a 5% increase in performance! That is what we see in games, and that is going to give us nothing as an end result!
 
So you were arguing with me when you didn't even know this? Seriously! I even mentioned the Infinity Fabric being tied to RAM speed, and that didn't set off bells?
His point is still relevant: no matter what you buy from a higher-frequency RAM perspective, it will not change the underlying issue, and he is right, as clearly seen in those gaming results from Eurogamer.
Games are the best ones to look at, as they are the scenario with the greatest issue with thread and data dependencies across both CCXs.
Actual latency is more than just the clock cycles of, say, CAS behaviour; it also has separate absolute ns timings and delays, whether due to physical properties, retrieval/protocol/communication-transmission properties, or a combination of both.
And we still do not know that much about the Data Fabric that is at the heart of this, such as whether the on-chip switch's 38GB/s (say, if using 2400MHz) is per connection or an aggregate shared between the CCXs/IO hub/system memory, priorities/reservations, etc.

Some are assuming it is shared and that is the total sum, but then Naples has just over 120GB/s between two sockets that would also be connected to the Infinity Fabric, so there is a lot we do not understand about the Infinity/Data Fabric apart from it being a 'box' in a slide between the core-cache and system memory/IO hub.
Cheers
 
I'm just eating my popcorn here.

I wonder, though, if games were more aware of the CCX distinction, and kept some interdependent threads on the same CCX... would that make a difference?

Has anyone tested a 4+0 situation against a 2+2 situation to see which is better? That could tell us how much the data fabric latency affects gaming performance, and whether or not it would be worthwhile for devs to give a sh*t about this when coding.

Computerbase.de, hardware.fr and others tested it. A pair of titles like BF1 got noticeable gains (double-digit percent) when running on 4+0 RyZen compared to 2+2 RyZen. The average gain over all games was 3% or less.
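
A sketch of how such a 4+0 vs 2+2 run can be scripted, assuming psutil, the usual logical-CPU layout (0-7 = CCX0, 8-15 = CCX1, SMT siblings adjacent), and a placeholder benchmark executable.

```python
import subprocess
import psutil

LAYOUTS = {
    "4+0": [0, 2, 4, 6],   # four physical cores, all on one CCX
    "2+2": [0, 2, 8, 10],  # two physical cores on each CCX
}

for name, cpus in LAYOUTS.items():
    run = subprocess.Popen(["benchmark.exe"])    # hypothetical benchmark
    psutil.Process(run.pid).cpu_affinity(cpus)   # pin right after launch
    run.wait()
    print(f"{name} run done; compare the benchmark's own fps output")
```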
 
Something I did not see mentioned is that the L3 cache speed follows, and is the same as, the fastest-running core in the CCX.

The higher the core frequencies, the faster the L3 will be. If you are testing for L3 latency but the thread is running on a slower core, it will show a higher latency than what the L3 can do, due to the mismatch in clock speeds.

AIDA64 Beta now supposedly does accurate latency tests for Ryzen. So when I have time I can test different clock speeds, RAM speeds, etc.
 
Computerbase.de, hardware.fr and others tested it. A pair of titles like BF1 got noticeable gains (double-digit percent) when running on 4+0 RyZen compared to 2+2 RyZen. The average gain over all games was 3% or less.

Still encouraging. There are/were a few issues with Ryzen gaming performance: the power profile in Windows 10 (fix coming in April; the temp fix is setting it to High Performance mode); then this issue, with cross-CCX communication and the latency involved in it; then the SMT penalties, which may or may not be related in every circumstance. And there are probably more. It's fair to say there's extra performance still left on the table.

Like you, I don't reasonably expect solving all the issues and optimization requirements will result in Ryzen beating out a 7700k. Not going to happen. Single threaded performance on Kaby Lake is king, and there is no magic pill for Ryzen there. But in the first set of benchmarks that came out, we were looking at a 20%+ deficiency, which is a bit bitter. Cut that deficiency average in half, and Ryzen looks a lot better -- especially since the 6 core parts are benching at damn near the same speed as the 8 core parts. Budget mixed-use folks will love the 6 core/12 thread Ryzens next to the 7600k 4c/4t. Gaming performance spread will be more narrow with those two than 1800X vs 7700k, and content creation spread will be wider.

Ryzen may thus wind up being a better mid-range buy than a high-end buy (which would be limited to users more like myself).
 
His point is still relevant: no matter what you buy from a higher-frequency RAM perspective, it will not change the underlying issue, and he is right, as clearly seen in those gaming results from Eurogamer.
Games are the best ones to look at, as they are the scenario with the greatest issue with thread and data dependencies across both CCXs.
Actual latency is more than just the clock cycles of, say, CAS behaviour; it also has separate absolute ns timings and delays, whether due to physical properties, retrieval/protocol/communication-transmission properties, or a combination of both.
And we still do not know that much about the Data Fabric that is at the heart of this, such as whether the on-chip switch's 38GB/s (say, if using 2400MHz) is per connection or an aggregate shared between the CCXs/IO hub/system memory, priorities/reservations, etc.

Some are assuming it is shared and that is the total sum, but then Naples has just over 120GB/s between two sockets that would also be connected to the Infinity Fabric, so there is a lot we do not understand about the Infinity/Data Fabric apart from it being a 'box' in a slide between the core-cache and system memory/IO hub.
Cheers
Increase RAM speed and the data fabric will be able to transfer faster. Since most benchmarks were done with DDR4 running at 2933, going to DDR4-3200 will probably not show that much gain.

For a 100MHz base clock, DDR4-3200 is the maximum speed. Upping the base clock, RAM speeds of over 3600 have been achieved, but then you are compromising PCIe devices with out-of-spec frequencies.

Biggest gains will be for optimized code keeping thread communication as much as possible within a CCX.
 