AMD Claims Zen 2 Has 29% Higher IPC than Zen 1 in Certain Workloads

"Certain Workloads" nuff said. Cherry picking isn't a good sign.

Intel does the same shit; welcome to marketing. Going from the 6700K to the 7700K they were claiming a 15% IPC improvement; in reality it was all over the place, from 2% to 15% depending on the workload, and on average it ended up somewhere in the middle.
 
This is easy to check.

The best benchmark for calculation throughput for any processor I've found is pretty obscure: Euler3d.

It's a simulation of an aircraft wing:

"The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, a taper ratio of 0.66, and a 45 degree quarter-chord sweep angle. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes. Figure 1 shows the CFD predicted Mach contours for a freestream Mach number of 0.960."

http://www.caselab.okstate.edu/research/euler3dbenchmark.html

The answer is a frequency, higher is better.
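In case it helps anyone interpreting that score, here's a minimal sketch of how a frequency-style benchmark score works, in Python (illustrative only; this is not the actual Euler3d harness or its scoring formula):

```python
import time

def benchmark_hz(step, n_iters=200):
    """Run a fixed number of solver iterations and report a
    frequency-style score in Hz: iterations per second, higher is
    better. Illustrative only; not the real Euler3d scoring code."""
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    return n_iters / (time.perf_counter() - start)

# Stand-in workload for a CFD solver step (hypothetical):
score = benchmark_hz(lambda: sum(x * x for x in range(10_000)))
print(f"score: {score:.1f} Hz")
```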

This 6-core/12-thread system, as overclocked, gives a 9.4; the 24-core setup at work, at about half the clock speed, wasn't much faster.

I'd love to see some numbers, if someone would run them. :)

We were doing reverse projections of tomographic data, and this gave us a good handle for evaluating new computers.

We used HP, then Dell; then they let me build some, and we never went back to the vendors, lol.
 
I laughed HARD when I saw this on Guru3d...

Worth a repost!

(((Meanwhile at Intel)))

 
I've been telling people that the time for a major upgrade will be in 2019, when Ryzen 2 is out. People thought I was just being some kind of fanboy. Basically, in 2019 you will be able to buy a high-end version of a Sony PS5 that is all AMD.
 
I've been telling people that the time for a major upgrade will be in 2019, when Ryzen 2 is out. People thought I was just being some kind of fanboy. Basically, in 2019 you will be able to buy a high-end version of a Sony PS5 that is all AMD. I predict 16-core Ryzen 2s @ 5GHz and Navi-based GPUs using the chiplet approach, with up to 4x 7nm Vega chiplets onboard. That would be like having 4x GTX 1080s. That is the kind of power you will need to push a 4K image @ 120Hz, which will be the new gold standard by 2020.

AMD already came out and said that Navi will be monolithic and that whatever GPU comes after will utilize Infinity Fabric.
 
AMD already came out and said that Navi will be monolithic and that whatever GPU comes after will utilize Infinity Fabric.

Do you have a link for Infinity Fabric following Navi? Just asking, because the only thing I could find was Raja hinting at that (and he's gone), while David Wang (an AMD SVP) said it was not viable for the near future.
 
According to The Stilt's extremely detailed analysis of Zen+ on the Anand forums, Zen+ has ~9% lower IPC than Skylake/Kaby/CFL outside of niche 256-bit workloads. It sounds like 256-bit workloads will likely reach near-parity with this release, and from the rumors (they are rumors, though, don't take them seriously!) we will probably see parity, or maybe even a pubic hair above Skylake IPC, with Zen 2. Clock speed parity or near-parity is also possible, given TSMC's process.
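Quick sanity check on the parity arithmetic (a sketch using the ~9% figure from The Stilt's analysis):

```python
# If Zen+ sits ~9% below Skylake IPC, the uplift Zen 2 needs for
# parity is about 10%. Numbers from the post above, not a benchmark.
zen_plus_vs_skylake = 0.91
uplift_for_parity = 1 / zen_plus_vs_skylake - 1
print(f"uplift needed for IPC parity: {uplift_for_parity:.1%}")  # ~9.9%
```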

The question, if all that is true, is how the AM4 chiplet situation will go down, or whether AM4 will even use chiplets like Rome does. If I were AMD, I'd probably create two IO dies but use the same core chiplets for all products, just binned accordingly: one IO die for server and TR use, with a max of 8 IF links (or whatever you want to call them), and a smaller mainstream IO die with 2 IF links. The mainstream version could then be available in two basic configurations:

core chiplet -> IO die <- second core chiplet (max 16 cores, with 12-core, and maybe 8-core, configurations available by salvaging dies with a couple of bad cores)

core chiplet -> IO die <- 7nm Vega-derived GPU (max 8 cores, with 4- and 6-core configurations available by salvaging dies with bad cores)

But I have absolutely zero idea if that's what they'll really do.
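Purely to illustrate the salvage math in those two configurations (a sketch of hypothetical SKUs, not announced products):

```python
# Hypothetical SKU spread from the two mainstream configs above
# (speculation only): salvaged chiplets keep 8, 6, or 4 good cores.
good_core_options = (8, 6, 4)

dual_chiplet_skus = [2 * g for g in good_core_options]  # [16, 12, 8]
apu_skus = list(good_core_options)                      # [8, 6, 4]

print("2x core chiplet + IO die:", dual_chiplet_skus)
print("core chiplet + IO die + GPU:", apu_skus)
```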
 
Well, IPC is instructions per cycle, so changing boost speeds wouldn't improve IPC at all. AMD is being specific about IPC improvements, and at least with Ryzen it was shown that they were truthful (the claimed 40% improvement in fact turned into 52%). And given that it was buried in the footnotes of their announcement, it is difficult to argue they are hyping it in any measurable way, either.
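A short sketch of the distinction (the figures here are examples, not AMD's numbers):

```python
# Performance = IPC x clock: raising boost clocks raises throughput
# but leaves IPC itself untouched.
def instr_per_second(ipc, clock_ghz):
    return ipc * clock_ghz * 1e9

base = instr_per_second(ipc=1.0, clock_ghz=4.0)
boosted = instr_per_second(ipc=1.0, clock_ghz=4.4)  # +10% clock, same IPC
print(f"throughput gain from clocks alone: {boosted / base - 1:.0%}")  # 10%
```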

AMD was also truthful about the ~3% IPC improvement of Zen+ over Zen. They've been truthful about their launch schedules, too. They are still bullshitting a lot, I suspect, with some of their GPU claims. But on the CPU side, they've improved a lot.
 
Considering Zen+ has around 7% higher IPC than Zen 1, if the claims of 25% more IPC for Zen 2 are true, it should be around 18% better on IPC than Zen+ (if true!).
I think the main thing for AMD to achieve is higher peak clocks to compete with Intel.
 
Considering Zen+ has around 7% higher IPC than Zen 1, if the claims of 25% more IPC for Zen 2 are true, it should be around 18% better on IPC than Zen+ (if true!).
I think the main thing for AMD to achieve is higher peak clocks to compete with Intel.

Zen+ is ~3%, not 7%, better on IPC.
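For what it's worth, here's what a "25% over Zen 1" claim works out to against Zen+ under either figure (a quick sketch with the numbers from these posts):

```python
# Zen 2 at +25% over Zen 1, expressed relative to Zen+, using the
# 7% figure from the post above and the ~3% correction.
for zen_plus_gain in (0.07, 0.03):
    vs_zen_plus = 1.25 / (1 + zen_plus_gain) - 1
    print(f"Zen+ at +{zen_plus_gain:.0%}: Zen 2 would be ~+{vs_zen_plus:.0%} over Zen+")
# +7% -> ~+17%; +3% -> ~+21%
```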
 
Considering Zen+ has around 7% higher IPC than Zen 1, if the claims of 25% more IPC for Zen 2 are true, it should be around 18% better on IPC than Zen+ (if true!).
I think the main thing for AMD to achieve is higher peak clocks to compete with Intel.
Higher peak clocks and reduced IF latency. If they can even achieve 5% in each you're looking at one hell of a chip.
 
"Certain Workloads" nuff said. Cherry picking isn't a good sign.


Cherry-picking may not be a good sign, but still, 29% is huge, considering Intel's generation-over-generation IPC improvements have tended to be in the 3-5% range.

So even if AMD only hits half that, it's still a very respectable generation-to-generation improvement.
 

Interesting, but as the article on the front page yesterday pointed out, while chiplets will do wonders for yields (it is much easier to get good yields on smaller chips than on large ones), they won't do much to combat Moore's law. I presume this is largely due to the latency associated with Infinity Fabric. You'll be able to get better yields, bin more efficiently, and make more efficient use of manufacturing resources, using 7nm where it matters most (cores, FPU) and keeping larger-process chips in other places where it matters less. And sure, you can spread out the heat dissipation somewhat to minimize being limited by thermal envelopes, but that will probably be only a small improvement.

I see chiplets and Infinity Fabric being great for manufacturing and design flexibility and cost, but not doing much, if anything at all, to drive overall performance forward.
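The yield half of that can be made concrete with the standard Poisson defect-yield model; the defect density below is an assumed, illustrative figure, since real 7nm numbers aren't public:

```python
import math

# Poisson defect-yield model: fraction of good dies = exp(-D * A).
D = 0.2            # defects per cm^2 (assumed, illustrative)
big_die = 4.0      # one monolithic ~400 mm^2 die
small_die = 0.8    # one ~80 mm^2 chiplet

# A single defect scraps the whole monolithic die, but only one small
# chiplet, so far more of the wafer area ends up usable.
print(f"monolithic die yield: {math.exp(-D * big_die):.0%}")   # ~45%
print(f"chiplet yield:        {math.exp(-D * small_die):.0%}") # ~85%
```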
 
Interesting, but as the article on the front page yesterday pointed out, while chiplets will do wonders for yields (it is much easier to get good yields on smaller chips than on large ones), they won't do much to combat Moore's law. I presume this is largely due to the latency associated with Infinity Fabric. You'll be able to get better yields, bin more efficiently, and make more efficient use of manufacturing resources, using 7nm where it matters most (cores, FPU) and keeping larger-process chips in other places where it matters less. And sure, you can spread out the heat dissipation somewhat to minimize being limited by thermal envelopes, but that will probably be only a small improvement.

I see chiplets and Infinity Fabric being great for manufacturing and design flexibility and cost, but not doing much, if anything at all, to drive overall performance forward.

Yup, the engineer quoted in that link I posted mentions that it's not so much that they can't do it as that they know it won't do much good for now, on the gaming side at least. As long as developers aren't putting any sort of multi-GPU support into their titles, the "chiplet" approach (is that a new term, or did I just miss it?) won't provide any benefit, as the software will only utilize one of the chips instead of viewing them as one big whole.

That stuff goes way over my head, but I think I understand it to be saying that it's an issue on the software side and that using chiplets would do more harm than good.
 
Yup, the engineer quoted in that link I posted mentions that it's not so much that they can't do it as that they know it won't do much good for now, on the gaming side at least. As long as developers aren't putting any sort of multi-GPU support into their titles, the "chiplet" approach (is that a new term, or did I just miss it?) won't provide any benefit, as the software will only utilize one of the chips instead of viewing them as one big whole.

That stuff goes way over my head, but I think I understand it to be saying that it's an issue on the software side and that using chiplets would do more harm than good.
It's a new term to me, but I dunno if AMD coined it or picked it up from somewhere else. I assume it means "a chip which doesn't do much on its own" or "requires other chiplets to function." Whereas most motherboard chips can function by themselves, these cannot.
 
I see chiplets and Infinity Fabric being great for manufacturing and design flexibility and cost, but not doing much, if anything at all, to drive overall performance forward.

We already have a fairly good analog to this: Threadripper.

Chiplets are going to limit latency-bound applications without fail. Those championing the tech seem to do so as if this has never been thought of before; instead, it's more that the concept is just now becoming viable for a broad enough set of applications to warrant basing mass-produced parts on it.

Threadripper's 'WX' parts are a good example of why: you now have eight or sixteen cores that have been divorced from a memory controller, and as shown in reviews, if that massive latency penalty is not accounted for, many workloads may tank severely. A fleet is only as fast as its slowest ship, and all that.

While I expect AMD's chiplet-based solutions to be a boon overall (better yields or more efficient binning, better power and heat distribution leading to better efficiency, etc.), there are still going to be situations where the tech will be a bit of a disadvantage.
 
I've got to say: when you see big numbers, in general it's marketing. But then again, when it was Intel, it was to put some more oomph into their campaign for you to spend your money on their product.

What can happen is that AMD got their 7nm design right (lucky or not), and that could drive certain features better than others. And that certain workloads benefit more than others is a given, since that is where they improved most (FP).

You can be sceptical about a lot of things but AMD is not Intel.

The chiplets are a sign that current processor technology is at its physical limit. When this happens, you cannot force things and keep producing large dies without suffering badly (AMD simply cannot afford this). Unless the process changes, we are bound to see more chiplet solutions rather than fewer.
 
Unless the process changes, we are bound to see more chiplet solutions rather than fewer.

I'd like to follow this train of thought: I expect chiplets to be a boon for enterprises. Consider one or more 'hub' chiplets surrounded by multiple 'processing' chiplets. There is a latency penalty for moving the memory controller and I/O away from the processing cores, but since the processing cores now lack that extra signalling circuitry, they can likely compensate by adding more cache, be it traditional SRAM or perhaps even denser DRAM, and probably a combination of the two. Current enterprise-focused processors already do this, of course, as they tend to run slower buffered ECC DRAM that provides tremendous bandwidth and capacity at the cost of significant additional latency compared to contemporary consumer parts.

I also wanted to posit what a chiplet-based processor might look like: imagine a Zen 2 dual-CCX die, so eight cores, with its memory controller stripped out and some extra cache added. Now imagine eight of these, so sixty-four cores, all connected to a central I/O chip that itself has eight memory channels and, say, sixty-four PCIe 4.0 lanes.

We already know that four CCXs in a TR socket can maintain ~4.0GHz, and that's with them located fairly close together; give them some thermal 'breathing room' and cut down on the signalling circuitry, and perhaps you may find 4.5GHz working at the top end. But what's even better is that this solution can scale down very well. You could write several articles speculating on the possibilities here!
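Back-of-the-envelope totals for that hypothetical layout; the DDR4-3200 speed and the ~2GB/s per PCIe 4.0 lane per direction are my assumptions, not anything AMD has confirmed:

```python
# Totals for the speculative layout above: eight 8-core chiplets
# around one central I/O die with 8 memory channels and 64 lanes.
chiplets, cores_per_chiplet = 8, 8
mem_channels, pcie4_lanes = 8, 64

dram_gbs_per_channel = 3.2 * 8   # DDR4-3200: 3200 MT/s x 8 bytes (assumed)
print(f"cores: {chiplets * cores_per_chiplet}")
print(f"aggregate DRAM bandwidth: {mem_channels * dram_gbs_per_channel:.0f} GB/s")
print(f"PCIe 4.0 bandwidth: {pcie4_lanes * 2:.0f} GB/s each way")  # ~2 GB/s/lane
```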

And if there's one thing that I'd love to see them add: put a basic Vega GPU block on every CCX!
 
We already have a fairly good analog to this: Threadripper.

Chiplets are going to limit latency-bound applications without fail. Those championing the tech seem to do so as if this has never been thought of before- instead, it's more that the concept is just now becoming viable for a broad enough set of applications to warrant basing mass-produced parts on top of.

Threadripper's 'W' parts are a good example of why. You now have eight or sixteen cores that have been divorced from a memory controller and as shown in reviews, if that massive latency penalty is not accounted for, many workloads may tank severely. Fleet as fast as the slowest ship and all that.

While I expect AMD's chiplet-based solutions to be a boon overall- better yields (or more efficient binning), better power and heat distribution leading to better efficiency, etc., there are still going to be situations where the tech will be a bit of a disadvantage.
I would say yes, but it's a bit different. There is the possibility of an L4 cache being present on the central chip, which could make a big difference in latency in some cases. Threadripper didn't need an L4 cache because the dies were directly interconnected, but Rome may have one. Did AMD release the full specifications already?
 
If only there were a Thunderbolt 3 AIC that could work on an AMD board without hacks... :(
 
I would say yes, but it's a bit different. There is the possibility of an L4 cache being present on the central chip, which could make a big difference in latency in some cases. Threadripper didn't need an L4 cache because the dies were directly interconnected, but Rome may have one. Did AMD release the full specifications already?

Nope, I'm just excited enough about the idea to speculate ;).

But I would also point out that while a centralized L4 cache may happen (it'd still be significantly closer to the processing cores than main memory, after all), the need for cache on each processing die wouldn't be entirely alleviated by it.
 
Who uses Thunderbolt anyway?

In my 27 years in this hobby, I've never used either Thunderbolt or FireWire.

'Used' might be a stretch, but I did have one on my XPS 13, and it was capable of 4K60 output; USB-C on my current ZenBook is not. Our work computers (Latitudes) also come with it, and it is used as a docking solution.
 
give them some thermal 'breathing room' and cut down on the signalling circuitry, and perhaps you may find 4.5GHz working at the top end.

I'm not sure if you're talking about the architecture or the die layout on the substrate, but based on the pictures it looks like AMD is currently keeping an arrangement similar to how current EPYC/TR is laid out. I took that as being done because it keeps the "hot spots" in the same areas as the current 4-die chips, meaning the same cooling solutions would work well with them.

It took me a moment to realize that, too. Initially, when I saw the picture of how these Rome chiplets were laid out, I couldn't figure out why they would have 4 of the 8 dies further away from the IO chip; to me, that meant just that much more latency being introduced. However, I think AMD was trying not to piss anyone off by going this route, so that these would be drop-ins for any massive server installations that have lots of R&D $$ put into cooling. Of course, this is purely speculation on my part, but it's the only thing that seems logical for why they'd have arranged them how they did.
 
I'm not sure if you're talking about the architecture or the die layout on the substrate, but based on the pictures it looks like AMD is currently keeping an arrangement similar to how current EPYC/TR is laid out. I took that as being done because it keeps the "hot spots" in the same areas as the current 4-die chips, meaning the same cooling solutions would work well with them.

It took me a moment to realize that, too. Initially, when I saw the picture of how these Rome chiplets were laid out, I couldn't figure out why they would have 4 of the 8 dies further away from the IO chip; to me, that meant just that much more latency being introduced. However, I think AMD was trying not to piss anyone off by going this route, so that these would be drop-ins for any massive server installations that have lots of R&D $$ put into cooling. Of course, this is purely speculation on my part, but it's the only thing that seems logical for why they'd have arranged them how they did.
Latency shouldn't be different between the dies; there are multiple ways they could achieve the same latency with that layout, and I suspect that rather than accept suboptimal latency on the farther dies in exchange for slightly improved latency on the center dies, they did just that: equalized it.

As for heat, from what I understand the central chip should be the hottest of the nine, so the four chiplets nearest it may run slightly hotter than the four outside. That would be even more reason not to load the central chiplets (which may have slightly better latency) more than the outside ones, as the added heat would make hitting the highest clocks more difficult.
 
I would say yes, but it's a bit different. There is the possibility of an L4 cache being present on the central chip, which could make a big difference in latency in some cases. Threadripper didn't need an L4 cache because the dies were directly interconnected, but Rome may have one. Did AMD release the full specifications already?

Not yet; they've declined to talk about anything architecture-related until they finalize the clock speeds/optimizations they want to use for Rome.
 
Now I wonder how this translates to StoreMI. If Zen 2 gets close to the claimed 29% IPC uplift, that would be really great, given how good that tech is at very noticeably speeding up slower, cheaper mechanical storage. Even a ~20% improvement from the IPC gain would be a big impact and help a fair bit.
 
Who uses Thunderbolt anyway?

In my 27 years in this hobby, I've never used either Thunderbolt or FireWire.

I am running an LG UltraFine 5K monitor, which is Thunderbolt 3 only.

I'm using it to swap between laptop and desktop through a dock: one cable charges the laptop, connects the display, and hooks up all the keyboard/mouse accessories.
When I need to use the desktop, it's just one cable swap and everything is connected again.

It's quite useful overall for those who use laptops. There's a large community of Mac users and owners of Thunderbolt 3-enabled laptops who use it.
 
Higher peak clocks and reduced IF latency. If they can even achieve 5% in each you're looking at one hell of a chip.
Don't forget there is a good chance they packed a shitload of cache onto those chiplets, plus the IMC tweaks... I think ~10% IPC on average is possible. Combined with a 4.5GHz minimum, they will be beating the 9900K, stock or OC, while using less power.
 
You guys are confusing physical separation with latency.

It doesn't work like that.

A signal on a PCB trace is 160ps/inch on an outer layer, 180ps/inch on an inner layer.

A 4.5GHz clock is 222ps between clocks.

That means ONE CLOCK is 1.23" long, worst case. Lower clocks are longer.

There's no latency involved with that.

How many CPU clocks per memory access are you seeing in CPU-Z?

My memory-to-core ratio is 1:16; with 4 memory channels, that's 1:4 overall.

Four channels at 50GB/s is 200GB/s combined, interleaved. That's why I went with a 2011 socket...

That's relatively low; I've had systems that were 1:2, but that was before multi-GHz clocks.

There's your latency.
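Redoing that arithmetic next to where the latency actually comes from (the 70ns DRAM figure is an assumed, typical-ish number for illustration):

```python
# Trace-delay arithmetic from the post above, next to where latency
# really lives: the core clocks burned on a DRAM access.
clock_ghz = 4.5
period_ps = 1e3 / clock_ghz                # ~222 ps between clock edges
inner_ps_per_inch = 180                    # inner-layer trace, worst case
print(f"one clock period of trace: {period_ps / inner_ps_per_inch:.2f} in")  # ~1.23"

dram_ns = 70                               # assumed full-miss latency
print(f"core clocks per DRAM access: {dram_ns * clock_ghz:.0f}")  # ~315
```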
 
Don't forget there is a good chance they packed a shitload of cache onto those chiplets, plus the IMC tweaks... I think ~10% IPC on average is possible. Combined with a 4.5GHz minimum, they will be beating the 9900K, stock or OC, while using less power.

Especially since the 2700X is already right near the 9900K when it's limited to the advertised 95W TDP ;).

But seriously, I'd definitely buy the 3700X, or whatever it's going to be called, if I get a 10% improvement over my 2700X. And the best part is, I can pass the other CPUs down to other family members on AM4, because the motherboard is compatible.
 
Especially since the 2700X is already right near the 9900K when it's limited to the advertised 95W TDP ;).

But seriously, I'd definitely buy the 3700X, or whatever it's going to be called, if I get a 10% improvement over my 2700X. And the best part is, I can pass the other CPUs down to other family members on AM4, because the motherboard is compatible.
That's my plan as well. My wife's machine is due for an upgrade (she's still using an older 4690K), so I'll give her my 1700X and upgrade to Zen 2 when it's released (I held off on the 2700X, but the itch is still there).
 