AMD Ryzen 9 3000 is a 16-core Socket AM4 Beast

Under 8 threads means within a CCX. The latency issues I mentioned are for CCX-to-CCX communication within the same die, or for die-to-die communication. If you check the TR numbers, communicating between two dies has similar latency to one Xeon communicating with another Xeon on a dual-socket system!
Yes, but you make it out like it's a massive problem when it's not for most users here - the latency is as low as or lower than Intel's where it counts, under 8 threads. This is the same discussion as the 16-core Zen thread, but reversed.
It has more threads than most software can make use of, let alone beyond 8 threads for some software cases, Amdahl's law, etc. But here, in this thread, somehow latency matters more than anything beyond 8 threads all of a sudden?

...did you look at the screenshot you posted (absent a link, might I add)? Now AMD is putting the memory controller off-die...?
We can't say that AMD won't be able to address the potential issues here, or how they will if they do, but it's very clear that core-to-core latency is a weakness of their technology (which your post supports), and it's something that will require scrutiny on Ryzen 3 parts with multiple CPU dies.
I did look at the chart. And it clearly showed AMD has the lowest latency for a majority of desktop workloads under 8 threads, even beating the 7700K - dark blue and grey at the bottom of the latency pile. I would share your thoughts on latency regarding Zen 2: I would expect slightly higher latency, with vastly lower chiplet-to-chiplet latency than in existing Epyc/TR arrangements (i.e. the other half of the chart), closer to current intra-CCX latency, which tops the chart. Even with a doubling they're still around a 6950X, and much faster still than the ring-bus 7900X latency, which no one bitches about or notices. I would also expect they have a trick to minimise this issue. The IO controller is off-die; don't forget that moving all that stuff off the chiplet made more room for cache - so expect more there, 32MB of L3 per chiplet if the SiSoft leak isn't fudged. So it will pick up some steam in other areas than just clock speed.
The screenshot, as much as I am not a fan of them, is from PCPer.

Edit to add: Zen+ latency improved over Zen, and memory speeds can also impact this, so an improved memory controller and an efficiency bump might negate most of the latency and make it a wash...
 
But here, in this thread, somehow latency matters more than anything beyond 8 threads all of a sudden?

It matters or it doesn't; as games get more thread-aware, which is happening but slowly, it can tank IPC. Benchmarks on Threadripper at >16 cores show this vividly, and it's not just gaming.

The challenge is that when games 'break apart' processes to take advantage of more hardware thread resources, they still need to maintain concurrency for much of that work; thus, if the OS puts a thread on a core with significantly worse latency, the whole thing can slow down. With Threadripper, AMD's provided solution was literally to turn cores off in the UEFI.
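The DIY equivalent of that UEFI trick, as a minimal Linux-only sketch: pin a latency-sensitive process to a single CCX so its threads never pay cross-CCX or cross-die latency. The core IDs below are an assumption about the topology; check yours with `lscpu -e` or hwloc's `lstopo` first.

```python
# Sketch: keep all threads of this process inside one CCX (Linux only).
# Cores 0-3 as one CCX is an assumption; verify against your topology.
import os

ccx0 = {0, 1, 2, 3}            # assumed core IDs of the first CCX
os.sched_setaffinity(0, ccx0)  # pid 0 = the calling process
print("now restricted to cores:", sorted(os.sched_getaffinity(0)))
```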

I did look at the chart. And it clearly showed AMD has the lowest latency for a majority of desktop workloads under 8 threads

It shows the AMD parts either on par or worse; as core counts rise, versus modern Intel parts, AMD's inter-core latency gets significantly worse. Again, we're both hoping (and more or less expecting) AMD to address this issue, in part at least by:


Edit to add: Zen+ latency improved over Zen, and memory speeds can also impact this, so an improved memory controller and an efficiency bump might negate most of the latency and make it a wash...

But it will be cache that helps, and probably a lot of it. Memory latency is going to get worse; not as bad as the orphan dies on Threadripper, hopefully, but it's also going to affect every core.


The biggest issue is that while total performance is undoubtedly going to go up, AMD has made architectural decisions that will make catching up with Intel's raw per-core performance even more difficult, and that's measuring against the aging and repeatedly 'refreshed' Skylake architecture. Intel has another architecture that's been sitting around for three or four years and has likely seen some updates, and that's what AMD's Zen 2 / Ryzen 3 is really going to have to compete with.
 
Yes, but you make it out like it's a massive problem when it's not for most users here - the latency is as low as or lower than Intel's where it counts, under 8 threads. This is the same discussion as the 16-core Zen thread, but reversed.
It has more threads than most software can make use of, let alone beyond 8 threads for some software cases, Amdahl's law, etc. But here, in this thread, somehow latency matters more than anything beyond 8 threads all of a sudden?

Yes. Latency is not a problem. That is the reason why AMD released latency-improved AGESAs, a new chipset with improved memory support to reduce latency, some 2000-series chips with reduced L2 latency, and a special BIOS mode for dual-die TR (that disables one die to eliminate die-to-die cross latency); it is also the reason why reviews test Zen with the highest possible OC memory to reduce latencies, and why users are in forums asking how to obtain the fastest stable RAM to reduce latency in their builds. :rolleyes:
 
It matters or it doesn't; as games get more thread-aware, which is happening but slowly, it can tank IPC. Benchmarks on Threadripper at >16 cores show this vividly, and it's not just gaming.

Windows has problems (scheduler???) with more than 16 cores / 32 threads...
Look at Linux benches vs. Windows and then come back... (not gaming benches)
 
Yes. Latency is not a problem. That is the reason why AMD released latency-improved AGESAs, a new chipset with improved memory support to reduce latency, some 2000-series chips with reduced L2 latency, and a special BIOS mode for dual-die TR (that disables one die to eliminate die-to-die cross latency); it is also the reason why reviews test Zen with the highest possible OC memory to reduce latencies, and why users are in forums asking how to obtain the fastest stable RAM to reduce latency in their builds. :rolleyes:

It's an enthusiast forum, so yeah, we're always trying to tweak the machine to get the best out of it; otherwise you would just buy a Dell or something. It's not an issue, and it makes a small difference in benchmarks if you tweak the memory speed and latencies. It's not a problem except for a few unique cases on Threadripper, and everyone knows about those if you read the reviews covering the largest Threadripper. Also, I'd rather have a feature to disable a CCX I didn't need if it hurt a specific task I needed to do. All you get from Intel these days is free exploits and reduced performance. A chiplet design was always going to have pros and cons, but at least they can continue to innovate on it and move to a smaller process more easily. If Intel sticks with that monolithic design they will fall further and further behind, as the node shrinks will destroy their yields.
 
Windows has problems (scheduler???) with more than 16 cores / 32 threads...
Look at Linux benches vs. Windows and then come back... (not gaming benches)

I can look at Linux benchmarks all day and not exceed a hard zero for, say, Adobe Premiere.

Yes, Microsoft (and software vendors) need to address the issue, and yes, AMD is still responsible for releasing a product without full software support.


Beyond that, even when the software support is there, yes latency is still going to be a factor. That's what we're really highlighting using Threadripper, though it should be reiterated that the issues seen with Threadripper represent an absolute worst case and are significantly worse than what should be expected from multi-chiplet Ryzen 3 releases.
 
From WCCF tech:

[WCCFtech chart: leaked Cinebench benchmark results]


If this is to be believed... then they did something right to get some more IPC (+25% compared to the 16-core Threadripper).
16-core Intel @ 4.8 = AMD 16-core @ 4.2... Holy crap!
But then, it's WCCFtech...
I guess we'll see...
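If you want to sanity-check that claim with rough numbers, here's a sketch; the ~4250 score and both clocks are just the rumored figures from this thread, nothing confirmed. Equal scores at unequal clocks imply a per-clock advantage.

```python
# Back-of-the-envelope per-clock comparison for an all-core benchmark.
# The 4250 score and both clocks are rumored figures, not confirmed data.
def per_clock(score, cores, ghz):
    return score / (cores * ghz)          # score per core per GHz

intel = per_clock(4250, 16, 4.8)          # 16-core Intel at 4.8 GHz
amd   = per_clock(4250, 16, 4.2)          # 16-core AMD at 4.2 GHz
print(f"implied per-clock advantage: {amd / intel - 1:.1%}")  # ~14.3%
```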
 
From WCCF tech:


If this is to be believed... then they did something right to get some more IPC (+25% compared to the 16-core Threadripper).
16-core Intel @ 4.8 = AMD 16-core @ 4.2... Holy crap!
But then, it's WCCFtech...
I guess we'll see...

Another source claims a score of ~4250. That is a "~12.5% IPC gain" over Zen+. And this is Cinebench, which favors Zen µarchs.

I guess we'll see...

This was discussed before. There is no space in the IO die for a proper L4. Moreover, software doesn't detect any L4.
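For what it's worth, here's a quick Linux-only sketch of what "software doesn't detect any L4" means in practice: list the cache levels the OS actually reports (these sysfs paths are standard on Linux; this is generic, not tied to any leak).

```python
# List the cache levels the OS reports for core 0 (Linux sysfs).
# If an L4 existed and were exposed, it would show up here.
from pathlib import Path

for idx in sorted(Path("/sys/devices/system/cpu/cpu0/cache").glob("index*")):
    level = (idx / "level").read_text().strip()
    ctype = (idx / "type").read_text().strip()
    size = (idx / "size").read_text().strip()
    print(f"L{level} {ctype}: {size}")
```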
 
get some more IPC

...more IPC in Cinebench, which really means more HPC float throughput.

Not to belittle that achievement, but it seriously doesn't mean much when you throw in stuff that is latency and concurrency dependent. If it pans out, it will be nice for video editing though!
 
Another source claims a score of ~4250. That is a "~12.5% IPC gain" over Zen+. And this is Cinebench, which favors Zen µarchs.


Is this going to turn into the GPU-type argument that Nvidia fans always use when AMD does well, where anything that favors AMD is invalid because it didn't favor Nvidia like the majority of games do, except in this case it is a CPU, not a GPU? The difference here is that it is Intel instead of Nvidia, and the majority of software still favors Intel's architecture. The sad part is, you are ignoring the fact that the i9 7960X is overclocked to 4.8GHz (a 14% higher clock speed), as well as having 2 extra cores... so your "it favors AMD" argument is invalid no matter how you spin it, if these numbers are accurate.

I for one hope these are legit and accurate results. It will be fun times.
 
as well as having 6 extra cores

...they're both 16-core CPUs...

I for one hope these are legit and accurate results.

They may be, for this specific spin of Cinebench, but directly applying these scaling results to any other application would be wholly incorrect.

It's entirely possible for Ryzen 3 to be slower than Ryzen 2 with the same number of cores given the changes to the architecture. One step forward, two steps back, and all that.
 
Even still, Cinebench is a good indicator of some baseline performance. I knew my performance was bad on my Ryzen 2700X (and before that my 1700X) because of Cinebench. I couldn't really tell in games, per se, except that benchmarks were always slower than on other comparable systems. Turns out it was my memory. I swapped to Ryzen-compatible memory and everything worked perfectly.

I'm super excited about these rumors. I really hope they end up being true. We're close enough now, and I've seen enough smoke to believe it. Cost will be an interesting factor as well. Anything in the $200 range that performs as well as or better than current-gen CPUs at a lower TDP will be a winner for sure, especially for those who haven't switched to AMD.

I'm personally looking forward to the 12c/24t or 16c/32t CPUs.
 
Really getting curious about pricing. If a TR4 refresh comes some months after the Zen 2 release, will Zen 2 be in the same price bracket as TR4?
 
Really getting curious about pricing. If a TR4 refresh comes some months after the Zen 2 release, will Zen 2 be in the same price bracket as TR4?
Maybe current-gen TR pricing for matching core counts, more or less depending on how much TR is left in inventory and on relative performance. Expect Zen 2 TR to be more expensive across the board.
 
Is this going to turn into the GPU-type argument that Nvidia fans always use when AMD does well, where anything that favors AMD is invalid because it didn't favor Nvidia like the majority of games do, except in this case it is a CPU, not a GPU? The difference here is that it is Intel instead of Nvidia, and the majority of software still favors Intel's architecture. The sad part is, you are ignoring the fact that the i9 7960X is overclocked to 4.8GHz (a 14% higher clock speed), as well as having 2 extra cores... so your "it favors AMD" argument is invalid no matter how you spin it, if these numbers are accurate.

I for one hope these are legit and accurate results. It will be fun times.

What do Nvidia fans have to do with what is being stated here about Cinebench? What is being remarked is that Cinebench doesn't represent the average performance of Zen chips, nor does it represent those tasks where AMD is much worse. Cinebench is an "outlier" for AMD Zen. Why Cinebench is a best case for Zen doesn't have anything to do with optimizing for "Intel architecture". It has to do with Cinebench being a rendering benchmark (that is, a throughput load) and Zen being optimized for throughput rather than for latency, apart from Cinebench having anomalous SMT yields (which further favors an architecture like Zen).

The Stilt said:
Cinebench R15 is some sort of a best case benchmark for AMD, that's why it's an outlier.
The IPC difference is abnormally low (5.6% vs. 14.4% average) and the SMT yield is abnormally high (41.6% vs. 28.7% average).

And the 7960X has 16 cores, not 18.
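For anyone unfamiliar with the "SMT yield" figure in The Stilt's numbers: it is the extra throughput gained by enabling SMT, i.e. the SMT-on score divided by the SMT-off score, minus one. A tiny sketch with illustrative, made-up baseline scores:

```python
# SMT yield = score_smt_on / score_smt_off - 1.
# The 1000-point baselines are invented; only the resulting percentages
# mirror The Stilt's quoted figures.
def smt_yield(score_on, score_off):
    return score_on / score_off - 1

print(f"CB15-like best case: {smt_yield(1416, 1000):.1%}")  # 41.6%
print(f"typical workload:    {smt_yield(1287, 1000):.1%}")  # 28.7%
```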
 
What do Nvidia fans have to do with what is being stated here about Cinebench? What is being remarked is that Cinebench doesn't represent the average performance of Zen chips, nor does it represent those tasks where AMD is much worse. Cinebench is an "outlier" for AMD Zen. Why Cinebench is a best case for Zen doesn't have anything to do with optimizing for "Intel architecture". It has to do with Cinebench being a rendering benchmark (that is, a throughput load) and Zen being optimized for throughput rather than for latency, apart from Cinebench having anomalous SMT yields (which further favors an architecture like Zen).



And the 7960X has 16 cores, not 18.
You completely missed my point: it is hard to demonstrate true performance when the majority of software is coded and optimized for Intel, not AMD. The Nvidia comment was made in hopes that you would not take a similar path in this comparison, but it didn't work, as your response indicates; it is just an excuse to discount the results, as expected.

It is true that you can't determine true performance from just one piece of software, but a person still has to give credit where credit is due rather than make statements trying to invalidate the achievement and/or the results, which is exactly what your statement is trying to do. Basically, all you are saying is that AMD is optimized for this particular application and Intel is not, so the results don't mean anything because they show Intel's weakness and not AMD's. It's the exact same argument Nvidia fans use when AMD does well in a particular game.

As for my core count mistake, may I suggest you read the rest of the comments, as it was already covered.
 
You completely missed my point: it is hard to demonstrate true performance when the majority of software is coded and optimized for Intel, not AMD.

Not true.

It is true that you can't determine true performance from just one piece of software, but a person still has to give credit where credit is due rather than make statements trying to invalidate the achievement and/or the results, which is exactly what your statement is trying to do. Basically, all you are saying is that AMD is optimized for this particular application and Intel is not, so the results don't mean anything because they show Intel's weakness and not AMD's. It's the exact same argument Nvidia fans use when AMD does well in a particular game.

As for my core count mistake, may I suggest you read the rest of the comments, as it was already covered.

No. I am not saying that "AMD is optimized for this particular application". My remark was something else, and you ignored my point again: my point is that CB15 doesn't in any way represent the general behavior of applications; it is an outlier.
 
Not true.



No. I am not saying that "AMD is optimized for this particular application". My remark was something else, and you ignored my point again: my point is that CB15 doesn't in any way represent the general behavior of applications; it is an outlier.

You are talking in circles, contradicting your own words. Didn't you just state above that C15 is a rendering benchmark (that is, a throughput load) and that Zen is optimized for throughput, not latency? Doesn't that mean Zen is optimized for that particular application, since that is what C15 is really testing?

I never said C15 could represent its behavior in other applications, especially since each application is different and the results will change with each application. In other words, an application can only show the behavior for that particular application, or for applications in that particular category. C15 can only represent performance in the rendering category. But it seems you are trying to invalidate those results, as if rendering performance means nothing since it isn't able to show the behavior in other applications/categories outside of rendering.

If C15 is just an "outlier", why did Intel go to such great lengths with their 28-core "chiller" fiasco last year to try to look competitive with AMD using C15?
 
From WCCF tech:

....

Even WCCF tech is saying to take this one with "a grain of salt".

If true, this is a very big performance jump.

But one thing from this latest WCCF "leak" stands out as a red flag: they have the boost clock on the 12-core at 5GHz and only 4.3 on the 16-core. That makes no sense at all if you understand how boost clock works.

Since boost clock usually only affects 1 or 2 cores before it starts dropping, it should be about the same on both the 12- and 16-core parts, and generally AMD sets it higher on higher core count chips, not the other way around.

It looks more like someone got sloppy making up their "leak".
 

Based on what? Why wouldn't there be cache on the IO die?

Without a cache you are reduced to doing a crossbar from the memory controller to each chiplet (locking out the rest) for each memory access. This would be quite inefficient.

With a nice fat cache, the memory controller can work on keeping the cache filled, while the chiplets could have simultaneous access to the cache portions set up for them.
Also look at the size of the IO die, and consider that it does relatively little. I would bet most of the die is cache.
This will be extremely important for Epyc with 8 chiplets, but this kind of design should also be on Ryzen 3000 with two chiplets.

I'd be shocked if there is no cache in the IO die.
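A toy weighted-latency model of that argument (every number below is invented for illustration): an L4-style cache on the IO die turns some chiplet requests into cheap hits instead of trips through the single shared memory controller.

```python
# Toy model only: assumed hit/miss costs, no queueing or contention.
MEM_NS, CACHE_NS = 80, 20  # invented miss and hit latencies

def avg_ns(hit_rate):
    return hit_rate * CACHE_NS + (1 - hit_rate) * MEM_NS

for hr in (0.0, 0.5, 0.9):
    print(f"hit rate {hr:.0%}: {avg_ns(hr):.0f} ns average per request")
```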
 
Based on what? Why wouldn't there be cache on the IO die?

Without a cache you are reduced to doing a crossbar from the memory controller to each chiplet (locking out the rest) for each memory access. This would be quite inefficient.

With a nice fat cache, the memory controller can work on keeping the cache filled, while the chiplets could have simultaneous access to the cache portions set up for them.
Also look at the size of the IO die, and consider that it does relatively little. I would bet most of the die is cache.
This will be extremely important for Epyc with 8 chiplets, but this kind of design should also be on Ryzen 3000 with two chiplets.

I'd be shocked if there is no cache in the IO die.
relevant article on wikichip
 

While that answer is slightly better than "no", it's an Infinity Fabric article that says nothing about the new I/O controller chip in Ryzen 3000.

The only thing I could find on wikichip that mentioned either of the new I/O dies was this:
https://en.wikichip.org/wiki/amd/microarchitectures/zen_2
Rome is codename for AMD's server chip based on the Zen 2 core. Like prior generation (Naples), Rome utilizes a chiplet multi-chip package design. Each chip comprises of nine dies - one centralized I/O die and eight compute dies. The compute dies are fabricated on TSMC's 7 nm process in order to take advantage of the lower power and higher density. On the other hand, the I/O makes use of GlobalFoundries mature 14 nm process.

The centralized I/O die incorporates eight Infinity Fabric links, 128 PCIe Gen 4 lanes, and eight DDR4 memory channels. The full capabilities of the I/O have not been disclosed yet. Attached to the I/O die are eight compute dies - each with eight Zen 2 core - for a total of 64 cores and 128 threads per chip.

The truth is that no one knows what is in the I/O die, but given its large size versus the work to be done, it seems reasonable that it would include a significant cache.
 
While that answer is slightly better than "no", it's an Infinity Fabric article that says nothing about the new I/O controller chip in Ryzen 3000.

The only thing I could find on wikichip that mentioned either of the new I/O dies was this:
https://en.wikichip.org/wiki/amd/microarchitectures/zen_2


The truth is that no one knows what is in the I/O die, but given its large size versus the work to be done, it seems reasonable that it would include a significant cache.
Wasn't really meant as a "no", just for informational purposes. I understand that, being a wiki, it is made up of bits of information pieced together by individuals with varying levels of understanding of the subject, sometimes without good sources.
 
The truth is that no one knows what is in the I/O die, but given its large size versus the work to be done, it seems reasonable that it would include a significant cache.
^This
I seriously doubt AMD has spent all this time and money just to push out a turd.
 
Considering the price difference between the Intel part and this one's leaked, rumored price, it would have to be really terrible not to be a big win. Can't wait to find out for sure :)
 
Based on what? Why wouldn't there be cache on the IO die?

Without a cache you are reduced to doing a crossbar from the memory controller to each chiplet (locking out the rest) for each memory access. This would be quite inefficient.

With a nice fat cache, the memory controller can work on keeping the cache filled, while the chiplets could have simultaneous access to the cache portions set up for them.
Also look at the size of the IO die, and consider that it does relatively little. I would bet most of the die is cache.
This will be extremely important for Epyc with 8 chiplets, but this kind of design should also be on Ryzen 3000 with two chiplets.

I'd be shocked if there is no cache in the IO die.

Space. There is not enough space in the IO die.
 
You are talking in circles, contradicting your own words. Didn't you just state above that C15 is a rendering benchmark (that is, a throughput load) and that Zen is optimized for throughput, not latency? Doesn't that mean Zen is optimized for that particular application, since that is what C15 is really testing?

I never said C15 could represent its behavior in other applications, especially since each application is different and the results will change with each application. In other words, an application can only show the behavior for that particular application, or for applications in that particular category. C15 can only represent performance in the rendering category. But it seems you are trying to invalidate those results, as if rendering performance means nothing since it isn't able to show the behavior in other applications/categories outside of rendering.

If C15 is just an "outlier", why did Intel go to such great lengths with their 28-core "chiller" fiasco last year to try to look competitive with AMD using C15?

The Zen µarch is optimized for throughput workloads, which isn't the same as saying that "AMD is optimized for this particular application". I already explained why CB15 is an outlier: it has special characteristics such as abnormally large SMT yields. The Zen architecture has not been optimized for executing CB15.

You continue ignoring the point, and this is the third time you have. The problem here isn't that CB15 doesn't represent non-rendering applications. Blender doesn't represent 7-zip, 7-zip doesn't represent SPECint, and SPECint doesn't represent GROMACS... The problem is that CB15 is an outlier (it doesn't even represent rendering, because Blender, Corona, etc. behave differently), and being an outlier, CB15 must be taken out of the sample of representative applications.

It doesn't matter what marketing teams do. CB15 is an outlier, as shown in a former post.
 
The Zen µarch is optimized for throughput workloads, which isn't the same as saying that "AMD is optimized for this particular application". I already explained why CB15 is an outlier: it has special characteristics such as abnormally large SMT yields. The Zen architecture has not been optimized for executing CB15.

You continue ignoring the point, and this is the third time you have. The problem here isn't that CB15 doesn't represent non-rendering applications. Blender doesn't represent 7-zip, 7-zip doesn't represent SPECint, and SPECint doesn't represent GROMACS... The problem is that CB15 is an outlier (it doesn't even represent rendering, because Blender, Corona, etc. behave differently), and being an outlier, CB15 must be taken out of the sample of representative applications.

It doesn't matter what marketing teams do. CB15 is an outlier, as shown in a former post.

Why do you keep pigeonholing Zen as throughput-optimized?

If you take a fixed set of data and measure how long it takes, it is a response metric.

If you take a fixed time period and measure how much data is processed, it is a throughput workload.

Since CB15 takes a fixed set of data and times how long it takes, it is thus the former.

By the above logic and your own pigeonholing, Zen is optimized for response, not throughput.

No processor is optimized for a certain frame of reference.

It's ironic that you pull out your GROMACS whistle while complaining about an outlier.
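A sketch of the fixed-work vs. fixed-time distinction drawn above, with a dummy workload (the numbers mean nothing; only the two measurement styles matter):

```python
# Response metric: fixed work, measure elapsed time.
# Throughput metric: fixed time budget, count completed work units.
import time

def work_unit():
    sum(range(10_000))  # stand-in for a real unit of work

t0 = time.perf_counter()
for _ in range(1_000):
    work_unit()
response_s = time.perf_counter() - t0

t0, done = time.perf_counter(), 0
while time.perf_counter() - t0 < 1.0:
    work_unit()
    done += 1

print(f"response: {response_s:.3f} s for 1000 units; throughput: {done} units/s")
```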
 
The Zen µarch is optimized for throughput workloads, which isn't the same as saying that "AMD is optimized for this particular application". I already explained why CB15 is an outlier: it has special characteristics such as abnormally large SMT yields. The Zen architecture has not been optimized for executing CB15.

You continue ignoring the point, and this is the third time you have. The problem here isn't that CB15 doesn't represent non-rendering applications. Blender doesn't represent 7-zip, 7-zip doesn't represent SPECint, and SPECint doesn't represent GROMACS... The problem is that CB15 is an outlier (it doesn't even represent rendering, because Blender, Corona, etc. behave differently), and being an outlier, CB15 must be taken out of the sample of representative applications.

It doesn't matter what marketing teams do. CB15 is an outlier, as shown in a former post.

CB15 doesn't represent non-rendering applications? I swear I saw a statement that said basically the same thing. Now where did I see that... hmm, oh wait, silly me! I said it in the very response you replied to (did you fully read what I said?). I think you just confirmed the point I was trying to make. You even went so far as to give examples, and basically, using your logic, you invalidated every benchmark/application used to judge performance, because no single benchmark/application is capable of demonstrating relative performance in every situation for every workload category, be it rendering, compression algorithms, gaming, etc. So basically, by your logic, 7-Zip benchmarks are invalid because they don't represent rendering performance. Do you see how silly your argument is now?

BTW, how is a rendering benchmark not a representation of rendering? I get that a benchmark is going to behave differently from an actual rendering application; that is a given, just as gaming benchmarks behave differently from actual gameplay. But they are still tools that give us indicators of how a piece of hardware will perform doing a particular workload, and a way to judge performance between different manufacturers/architectures, etc.
 
When AMD dominates: it's because of a freak accident.

If we take average performance over a range of benchmarks that, say, represent >95% of applicable workloads, and there's this one benchmark that really stands out one way or another, we can call it an outlier. CB15 more or less is that.
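In that loose statistical sense, an outlier is just a benchmark whose result sits far from the pack. A sketch, where every per-benchmark figure except the two echoing The Stilt's post is invented:

```python
# Flag any benchmark whose Intel-vs-Zen IPC delta sits more than 1.5
# standard deviations from the mean. Only the 5.6% CB15 figure and the
# ~14.4% average echo The Stilt's numbers; the rest are made up.
import statistics as st

deltas = {"Blender": 0.14, "7-zip": 0.13, "SPECint": 0.15,
          "GROMACS": 0.16, "CB15": 0.056}
mean, sd = st.mean(deltas.values()), st.pstdev(deltas.values())
for name, d in deltas.items():
    flag = "  <- outlier" if abs(d - mean) > 1.5 * sd else ""
    print(f"{name}: {d:.1%}{flag}")
```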

If we could take a CB15 score and apply that scaling across the board, that would be awesome, but we really, really can't, particularly with AMD changing the layout with Ryzen 3. Ryzen to Ryzen 2 would be more reliable, and yet we'd still be a bit skeptical.

To state my concern: I see the possibility that Ryzen 3 might be faster than expected in some workloads and slower than expected in others; that is, we very likely won't see a linear shift in performance from Ryzen 2. These quite nice CB15 results point to pure float throughput being up per clock; however, AMD's architectural changes point to lower IPC for anything that requires memory access or thread coherency across dies, simply because they've split processing between two dies and put the memory controller on a third.

And to be clear, just like the references used in this thread, we're speculating. For me, it isn't anti-AMD; I'd love to see Skylake-or-better performance across sixteen cores in something accessible to consumers! I just see some very real speed bumps potentially impeding Ryzen 3 from getting there.
 