AMD: Zen will offer 40% faster performance per clock than Carrizo

You can increase IPC by adding more transistors as the silicon shrinks, but that isn't what the 40% improvement figure is based on. The 40% number is solely design improvements, independent of the process. If Zen were a 28nm CPU like Carrizo, it would still show a 40% IPC improvement.

Excavator alone is getting a 4-15% IPC improvement over Steamroller without any node reduction.
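To put rough numbers on what "process-independent" would mean, here is a minimal sketch; every figure in it is a made-up placeholder, not a real Excavator or Zen number. The point is only that a design-side IPC multiplier composes with whatever clock a given node allows:

```python
# Toy model: performance ~ IPC * clock. All numbers are illustrative
# placeholders, not real Excavator/Zen figures.

def performance(ipc, clock_ghz):
    return ipc * clock_ghz  # billions of instructions per second

excavator_ipc = 1.0              # normalized baseline
zen_ipc = excavator_ipc * 1.40   # the claimed design-only uplift

baseline = performance(excavator_ipc, 3.5)  # assumed 28nm clock, GHz

# A process-independent IPC gain multiplies whatever clock the node gives:
for node, clock in [("28nm, same clock", 3.5),
                    ("14nm, same clock", 3.5),
                    ("14nm, +10% clock", 3.85)]:
    gain = performance(zen_ipc, clock) / baseline - 1
    print(f"{node}: {gain:+.0%} vs. 28nm Excavator")
# -> +40%, +40%, +54%
```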


Well, that's the whole thing: without the process you won't get the 40% IPC. As chips get more complex to design, and as node shrinks give less benefit in power consumption and frequency because of leakage, design has more influence than process on getting where you want to be. But process is still part of that, because the process imposes restrictions on the design.

When you have a per-clock improvement of 40%, what will the clock speed be for a 14nm Zen? It might not be much more than current CPUs', which would keep the overall increase at that same 40%. Hopefully that's not the case and they can get more frequency through better design, but that is all up in the air. Intel hasn't really been able to increase the frequency of its CPUs much, even though it has far more control over the process and libraries it uses.
 
It's a band-aid for the actual problem. It's like cancer: do you want to treat the symptom, or do you want to cure it? It's a lack of understanding, or blind faith, when you don't want AMD to compete with Intel on a per-processor level but instead say "hell, use this API, it will help your noncompetitive CPU." Guess what, though: the other side of the coin is that it helps Intel's CPUs too! So it's good for everyone, not just AMD's CPUs, OK?

WTF are you talking about? I think you are confusing this with Intel's ambitions for low-power devices. Their CPUs are getting there, but creating ultra-low-power chips is a different design philosophy, and Intel's mindset wasn't there until recently. For ARM, that was the mindset from the beginning, so yeah, ARM has that advantage, but Intel is catching up.

It is not a band-aid. You cannot get the performance any other way; look at the Star Swarm demo, it proves that cores > IPC. You fail to understand the basic rule: the CPU does not drive game development, the GPU drives game development. Hence teraflop GPU, still megaflop CPU...

Well, so much for Intel beating ARM. Maybe they can buy out every OEM that worked the x86 market. Ooh, crap, Intel can't compete even though they've had the lower-nm process for so long. Or did they have to focus all of their energy just to beat AMD? When ARM goes to a lower nm, Intel will still be buying design wins for a long, long time. Forget competition.
 
Mark Papermaster, AMD's Chief Technology Officer, revealed that Zen will have a huge improvement in IPC (instructions per clock) vs. Excavator, AMD's latest and last, yet unreleased, Bulldozer-family CPU core. A 40% increase in IPC would represent the largest jump in IPC ever for the company. We're not particularly surprised, but still very excited about this huge improvement.

Mark Papermaster also made it a point to highlight that this 40% performance improvement figure is independent of the manufacturing process. So it's a permanent architectural performance improvement that will always be present regardless of the process node chosen to make a Zen-based product.

http://wccftech.com/amd-officially-reveals-2016-cpu-roadmap-zen-k12/
 


It's never independent. They might have that as a goal, but process is tied to it.

No engineer can make that claim; no CPU is manufactured without looking at these things:

Performance goal: X
How to get to X through:
- process
- architectural changes
- emerging technologies
Simulate the CPU virtually
Fab the CPU
Then check whether the goal was reached

They can't put Zen on a previous node, let's say 32nm, and get the same 40% IPC improvement, because they wouldn't have the room to do it.
 
It's never independent. They might have that as a goal, but process is tied to it.

No engineer can make that claim; no CPU is manufactured without looking at these things:

Performance goal: X
How to get to X through:
- process
- architectural changes
- emerging technologies
Simulate the CPU virtually
Fab the CPU
Then check whether the goal was reached

They can't put Zen on a previous node, let's say 32nm, and get the same 40% IPC improvement, because they wouldn't have the room to do it.

Yet clearly AMD's Chief Technology Officer disagrees with you.

It might have a baseline, and in this case it would be Excavator, which is 28nm. So when they say it's independent, they mean that a Zen manufactured at 28nm would still be a 40% IPC improvement.

Rumors already state that AMD has both a 14nm and a 16nm version of Zen, and I wouldn't be surprised if they had a 28nm and a 22nm version as well, just in case the 14nm and 16nm processes weren't ready.
 
Yet clearly AMD's Chief Technology Officer disagrees with you.

It might have a baseline, and in this case it would be Excavator, which is 28nm. So when they say it's independent, they mean that a Zen manufactured at 28nm would still be a 40% IPC improvement.

Rumors already state that AMD has both a 14nm and a 16nm version of Zen, and I wouldn't be surprised if they had a 28nm and a 22nm version as well, just in case the 14nm and 16nm processes weren't ready.

I'm sure that Razor1 can email Mark and explain to him he is wrong ;)
 
It is not a band-aid. You cannot get the performance any other way; look at the Star Swarm demo, it proves that cores > IPC. You fail to understand the basic rule: the CPU does not drive game development, the GPU drives game development. Hence teraflop GPU, still megaflop CPU...

Well, so much for Intel beating ARM. Maybe they can buy out every OEM that worked the x86 market. Ooh, crap, Intel can't compete even though they've had the lower-nm process for so long. Or did they have to focus all of their energy just to beat AMD? When ARM goes to a lower nm, Intel will still be buying design wins for a long, long time. Forget competition.


Cores are more important than IPC when they are used, and used properly; I never stated anything to the contrary, so your point is invalid. What I am saying is that your claim that AMD recommended next-gen APIs as some sort of balm to help their lower-performing parts compete with Intel's parts is a FALLACY, as those APIs help Intel parts too. You can't count it for one and not the other, unless you only want to show the good side of one of them.

How is ARM doing on the server front compared to Intel, BTW? Yeah, that's right, they are pretty much nonexistent in that market. Again, different design philosophies for different markets. So what is your babble about? BS?
 
Yet clearly AMD's Chief Technology Officer disagrees with you.

It might have a baseline, and in this case it would be Excavator, which is 28nm. So when they say it's independent, they mean that a Zen manufactured at 28nm would still be a 40% IPC improvement.

Rumors already state that AMD has both a 14nm and a 16nm version of Zen, and I wouldn't be surprised if they had a 28nm and a 22nm version as well, just in case the 14nm and 16nm processes weren't ready.


What would the size of the chip be, and what is the ritcular limit of 28nm? Answer those questions and you will see it won't fit.
 
I'm sure that Razor1 can email Mark and explain to him he is wrong ;)

He is wrong. It's impossible to state a performance increase based on design alone if you are talking about the end result on a chip. Theoretically you can state it, but design is restricted by the process, simple as that.

http://www.intel.com/content/www/us/en/history/museum-making-silicon.html

Design specifications that include chip size, number of transistors, testing, and production factors are used to create schematics—symbolic representations of the transistors and interconnections that control the flow of electricity through a chip.
What does this mean? AMD or Intel already know roughly how many transistors they will need and how big the chip will be, and whether the process can handle that many transistors and whether that size is viable for mass production; the process dictates the design specifications. What did I say before? You can't say Zen on a different process will attain the same IPC if the process can't handle the size or number of transistors needed to create that increase in IPC. In theory, yes, you can say a design can do x, y, and z, but there are restrictions based on the process.
 
You can't say Zen on a different process will attain the same IPC if the process can't handle the size or number of transistors needed to create that increase in IPC.

If you are changing the transistor count, you are no longer on the same design.

IPC with the same design should be a constant.

A process can either manufacture that design or not.

Yield of the process will determine how high it can be clocked, and how much power it will use when doing so, but IPC will remain constant.
 
Zarathustra[H];1041917651 said:
If you are changing the transistor count, you are no longer on the same design.

IPC with the same design should be a constant.

A process can either manufacture that design or not.

Yield of the process will determine how high it can be clocked, and how much power it will use when doing so, but IPC will remain constant.


Exactly, for the first part.

Yield doesn't determine how high things can be clocked; clocking has to do with design. Yield comes down to the design and any manufacturing issues.
 
What would the size of the chip be, and what is the ritcular limit of 28nm? Answer those questions and you will see it won't fit.

What in the world is a reticular limit? I was curious, so I did a search on it, and the only results that came up were two other posts on two forums, both made by you.

As for the size of the CPU, how am I supposed to know? I don't work for AMD, and I am not privy to the design documents for Zen. It is said to have a smaller core than Skylake, which allows it to have twice the L2 cache.
 
Cores are more important that IPC when they are used and used properly, I never stated anything to the contrary to that, so your point is invalid. What I am saying is your statement that AMD recommended to use next gen API's as some sort of balm to help their lower performing parts to compete with Intel's parts, that is a FALLACY, as those API's also help Intel parts too. You can't use it for one and not the other, unless you want to only show the good side for one and not the other.

How is ARM doing on the server front compared to Intel BTW, yeah thats right they are pretty much none existent in that market again different design philosophies for different markets. So what is your babble about? BS?

AMD is not recommending anything; I'm stating that the Bulldozer CPUs are not as bad as everyone makes them out to be. They even seem to be good performers on the new APIs (which is the only thing that matters at this moment in time). The Bulldozer design can't catch up in IPC, so that ship has sailed. A higher core count would help Intel too, but not as much as it helps AMD.

I have no idea about ARM server performance; all I know is that mobile devices tend to have ARM and not Intel. It's not my babble, it's Intel's babble: being so good they beat AMD all day long, having the superior-nm process, and still needing to buy design wins in the mobile area where they want to compete but can't, because they can only beat AMD...
 
What in the world is a reticular limit? I was curious, so I did a search on it, and the only results that came up were two other posts on two forums, both made by you.

As for the size of the CPU, how am I supposed to know? I don't work for AMD, and I am not privy to the design documents for Zen. It is said to have a smaller core than Skylake, which allows it to have twice the L2 cache.


The reticular limit is the max size a chip can be made before errors in the silicon make it unmanufacturable.

For most processes, depending on the process and its maturity, you should be able to get around 600 mm².
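As a back-of-the-envelope illustration of the argument (to be loud about it: neither the transistor budget nor the densities below are real figures for any of these chips, they are invented for the sketch):

```python
# Rough die-area check against a ~600 mm^2 reticle-class ceiling.
# Transistor budget and densities are hypothetical, for illustration only.

transistors = 8e9         # assumed budget for a big Zen die
density = {"28nm": 10e6,  # assumed transistors per mm^2
           "14nm": 30e6}
limit_mm2 = 600

for node, d in density.items():
    area = transistors / d
    verdict = "fits" if area <= limit_mm2 else "does NOT fit"
    print(f"{node}: ~{area:.0f} mm^2 -> {verdict}")
# -> 28nm: ~800 mm^2, does NOT fit; 14nm: ~267 mm^2, fits
```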
 
The reticular limit is the max size a chip can be made before errors in the silicon make it unmanufacturable.

For most processes, depending on the process and its maturity, you should be able to get around 600 mm².

I am sure you have a link explaining this?
 
He is wrong

Yeah, at AMD they just make up numbers that sound good. Maybe they had an internal poll somewhere on which number to vote for...

You know, the kind where you then have to explain your feelings about the number you picked, and add some philosophy behind the strength of the number. Maybe even write an essay about it...
 
Zarathustra[H];1041917651 said:
If you are changing the transistor count, you are no longer on the same design.

IPC with the same design should be a constant.

A process can either manufacture that design or not.

Yield of the process will determine how high it can be clocked, and how much power it will use when doing so, but IPC will remain constant.

That would depend on what you mean by design. The design of the x86 architecture? The design of the x86-64 extension? Design can mean a lot of things based on context. Mark Papermaster specifically says the 40% IPC improvement comes from the architectural improvements Zen has over Excavator, and he specifically says they aren't counting the benefits they will be getting from the process they will be using.
 
That would depend on what you mean by design. The design of the x86 architecture? The design of the x86-64 extension? Design can mean a lot of things based on context. Mark Papermaster specifically says the 40% IPC improvement comes from the architectural improvements Zen has over Excavator, and he specifically says they aren't counting the benefits they will be getting from the process they will be using.


That estimate is valid for any process at or below the node Zen is being designed for. It doesn't mean, however, that the process they are using for Zen right now will give more than a 40% performance increase, although it can mean that smaller nodes will give the same 40% IPC right off the bat without any changes to the design.
 
It's mostly stuff I picked up while in college. It's been a while, but here:


http://pubs.acs.org/doi/abs/10.1021/cg4017118

I would need to purchase an account to read it. Anyway, regardless, I can't answer that question even if I wanted to, because I don't have access to any of AMD's internal designs for Zen.

As for the size of Zen, we seem to know from what was said and from leaks that the Zen core is smaller than Skylake's, more in line with the size of a Jaguar core. That allows them to have twice the L2 cache of Skylake.
 
I would need to purchase an account to read it. Anyway, regardless, I can't answer that question even if I wanted to, because I don't have access to any of AMD's internal designs for Zen.

As for the size of Zen, we seem to know from what was said and from leaks that the Zen core is smaller than Skylake's, more in line with the size of a Jaguar core. That allows them to have twice the L2 cache of Skylake.


Ah, sorry, it's hard to find articles on these things because they are fairly specific. I'll keep my eye out, though.

Zen seems to be good from everything we know about it right now. Just hope it's not smoke up our arse, as AMD has been doing as of late.

Well, with the increased IPC, they will need that much more cache to fully utilize it. And this is why I'm thinking it will be closer to a 60% overall improvement when it comes out.

Oh, I was spelling it wrong, sorry.

Here you go:

Reticle

http://www.pcper.com/reviews/Editor...ss-Migration-20-nm-and-Beyond/20-nm-and-Below
 
Ah, sorry, it's hard to find articles on these things because they are fairly specific. I'll keep my eye out, though.

Zen seems to be good from everything we know about it right now. Just hope it's not smoke up our arse, as AMD has been doing as of late.

Well, with the increased IPC, they will need that much more cache to fully utilize it. And this is why I'm thinking it will be closer to a 60% overall improvement when it comes out.

Oh, I was spelling it wrong, sorry.

Here you go:

Reticle

http://www.pcper.com/reviews/Editor...ss-Migration-20-nm-and-Beyond/20-nm-and-Below

If that amount of IPC needs more cache to fully utilize it, does that mean that Skylake is bottlenecked at the L2 cache?
 
If that amount of IPC needs more cache to fully utilize it, does that mean that Skylake is bottlenecked at the L2 cache?


Not at all; the Skylake design might not need the extra cache. See, if we look at Bulldozer and its variants and at where the bottlenecks are, IPC is a major one. The L2 cache in Bulldozer is shared, so more cores need more cache. Skylake doesn't need that. I'm expecting Zen to have more cores than its Intel counterparts, just because AMD has been doing that all along. The only reason Intel hasn't been putting more cores into its consumer CPUs is that there was no need.

Here is a good read on why Bulldozer failed:

http://www.extremetech.com/computing/100583-analyzing-bulldozers-scaling-single-thread-performance

Bulldozer’s cache latencies are significantly higher than Thuban or Sandy Bridge’s, and the caches themselves are proportioned differently. Previous AMD processors had 64K instruction and 64K data caches for a total of 128K of L1 per core. Bulldozer, in contrast, has just 16K of L1 data cache per core and shares a 64K instruction cache per module. In theory, 16K of L1 is enough — Sandy Bridge has a 16K L1 data cache — but then, Sandy Bridge’s L2 and L3 caches are much faster than their AMD counterparts.
 
Not at all; the Skylake design might not need the extra cache. See, if we look at Bulldozer and its variants and at where the bottlenecks are, IPC is a major one. The L2 cache in Bulldozer is shared, so more cores need more cache. Skylake doesn't need that. I'm expecting Zen to have more cores than its Intel counterparts, just because AMD has been doing that all along. The only reason Intel hasn't been putting more cores into its consumer CPUs is that there was no need.

Here is a good read on why Bulldozer failed:

http://www.extremetech.com/computing/100583-analyzing-bulldozers-scaling-single-thread-performance

That would mean more cache would benefit Bulldozer. However, Zen doesn't even remotely resemble Bulldozer.

I expect more cores as well, but L2 cache is per core. Skylake is 256 KiB per core and Zen is 512 KiB per core.
 
That would mean more cache would benefit Bulldozer. However, Zen doesn't even remotely resemble Bulldozer.

I expect more cores as well, but L2 cache is per core. Skylake is 256 KiB per core and Zen is 512 KiB per core.


Well, we don't know exactly how they are getting the increased IPC. I'm thinking more ALUs/FPUs; if that's where the increased IPC comes from, then the increased cache will be necessary.
 
One thing to keep in mind is that not all caches are equal. Bulldozer has very slow cache, which hurts it. More slow cache does not solve this problem.
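A quick way to see why: the usual first-order model is average memory access time (AMAT) = hit time + miss rate * miss penalty. The cycle counts below are invented for illustration, but they show that a bigger, slower cache can still lose to a smaller, faster one:

```python
# AMAT = hit_time + miss_rate * miss_penalty (latencies in cycles;
# all values invented for illustration, not real Bulldozer numbers).

def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

fast_small = amat(hit_time=4,  miss_rate=0.06, miss_penalty=40)  # 6.4
slow_big   = amat(hit_time=20, miss_rate=0.04, miss_penalty=40)  # 21.6

print(f"fast, smaller cache: {fast_small:.1f} cycles/access")
print(f"slow, bigger cache:  {slow_big:.1f} cycles/access")
```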
 
Yield doesn't determine how high things can be clocked; clocking has to do with design. Yield comes down to the design and any manufacturing issues.

Partially true. It's a combination of factors.

Design impacts how high a part can clock, given perfect yield.

The yield of the process can then result in parts that won't clock as high, or that require more voltage to reach the same clocks.

This is why we get the silicon lottery, why some CPUs of the same model overclock higher than others.

If yield didn't impact clocks and voltage, there wouldn't be any reason for manufacturers to bin CPUs, and we know they bin CPUs.
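For what it's worth, the voltage side of binning is easy to sketch: dynamic power scales roughly as C * V^2 * f, so a die that needs extra voltage to hold the same clock pays a quadratic penalty. The voltages below are made up:

```python
# Dynamic power ~ C * V^2 * f. Voltages are illustrative; the point is
# the quadratic penalty a leakier die pays to hold the same clock.

def dynamic_power(c, v, f_ghz):
    return c * v**2 * f_ghz

good_die  = dynamic_power(1.0, 1.20, 4.0)  # hits 4 GHz at 1.20 V
leaky_die = dynamic_power(1.0, 1.35, 4.0)  # needs 1.35 V for 4 GHz

print(f"extra power at the same clock: {leaky_die / good_die - 1:+.0%}")
# -> about +27%: same design, same frequency, worse silicon, lower bin
```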
 
Well, we don't know exactly how they are getting the increased IPC. I'm thinking more ALUs/FPUs; if that's where the increased IPC comes from, then the increased cache will be necessary.

You are correct in your thinking, if what we've found out is accurate. A single Zen core will have as many FPUs and ALUs as a complete Bulldozer module did.

One thing to keep in mind is that not all caches are equal. Bulldozer has very slow cache, which hurts it. More slow cache does not solve this problem.

One of the features of the Zen is a "high-bandwidth, low-latency cache system."
 
Zarathustra[H];1041917819 said:
Partially true. It's a combination of factors.

Design impacts how high a part can clock, given perfect yield.

The yield of the process can then result in parts that won't clock as high, or that require more voltage to reach the same clocks.

This is why we get the silicon lottery, why some CPUs of the same model overclock higher than others.

If yield didn't impact clocks and voltage, there wouldn't be any reason for manufacturers to bin CPUs, and we know they bin CPUs.


Binning is different; binning happens after the fact. When silicon is designed, manufacturers make certain estimates and build redundancy into the design to increase the chance of getting fully functional chips at the intended clock rate and with working components, thus increasing yields. After all is said and done, they bin the chips, hopefully getting what they expected when they designed it.

Chip design has changed quite a bit from, let's say, 15 years ago, when they didn't look at the final clock rate; they just hoped to reach a certain clock. Now they actually target a certain clock rate when designing the chip.
 
I'm with razor1 on this whole architecture vs node size thing. They're extremely difficult to untangle from each other. Moving one node up/down isn't a big deal, sure (as evidenced by Tick/Tock), but a 10nm P4 chip would still suck, as would a Skylake on a 65nm process.

Now, it could be that Zen's architecture doesn't actually add huge amounts of transistors per core, in which case a node-for-node comparison against their 28nm stuff is valid.
 
I'm with razor1 on this whole architecture vs node size thing. They're extremely difficult to untangle from each other. Moving one node up/down isn't a big deal, sure (as evidenced by Tick/Tock), but a 10nm P4 chip would still suck, as would a Skylake on a 65nm process.

Now, it could be that Zen's architecture doesn't actually add huge amounts of transistors per core, in which case a node-for-node comparison against their 28nm stuff is valid.

Pentium 4 was lousy back then, and it would be lousy now. That's no surprise. A 65nm Skylake would basically be a Core 2. Skylake, Haswell, Sandy Bridge, all the way down, are derivatives of the old Pentium III CPU architecture.
 
Well, look at it this way: even though they are based on similar architectures, Intel still needs to rewrite compiler code and microcode for each generation. I was at a presentation once where they stated that, if they were lucky, they kept only around 20% of the previous compiler code going from one generation to the next. Each node has different metal layers, different transistor layouts, different masks. Shifting from one node to another is complex, and even if the overall general design schema doesn't change, there still need to be changes to the architecture.
 
Well, look at it this way: even though they are based on similar architectures, Intel still needs to rewrite compiler code and microcode for each generation. I was at a presentation once where they stated that, if they were lucky, they kept only around 20% of the previous compiler code going from one generation to the next. Each node has different metal layers, different transistor layouts, different masks. Shifting from one node to another is complex, and even if the overall general design schema doesn't change, there still need to be changes to the architecture.

Yes, and there are major extensions added. The issue, though, isn't really the die shrinking; it's the advancements or changes in technology. Yet if you made similar changes and added all the extensions to the Pentium 4 on a 14nm process, the thing would still be very bad, and its IPC would be nowhere near Skylake's. Steamroller to Zen is a shift in architecture design similar to Pentium 4 to Core. That alone would increase the IPC.
 
Yes, and there are major extensions added. The issue, though, isn't really the die shrinking; it's the advancements or changes in technology. Yet if you made similar changes and added all the extensions to the Pentium 4 on a 14nm process, the thing would still be very bad, and its IPC would be nowhere near Skylake's. Steamroller to Zen is a shift in architecture design similar to Pentium 4 to Core. That alone would increase the IPC.


True. Well, Pentium 4 was just a bad design compared to the A64, no way around that, just as Phenom and Bulldozer were compared to their competition.

That's pretty much what AMD did going from the A64 to Phenom and Bulldozer: a 180-degree change in design philosophy, gambling that more cores were going to be the way to go. The gamble didn't work out, because software wasn't ready for more cores from an efficiency standpoint.
 
Hah, my point was missed -- let's say a 65nm Core processor instead, then. A lot of small tweaks, modifications, etc. have been done since then that were enabled by transistor density and other fab improvements. Remember, we haven't seen huge jumps in clock speeds (well, 1.6x since the 65nm Cores), or at least nowhere near the increase in areal density -- that space is being allocated differently, for a different optimization.

And a 65nm Skylake would be hotter than the surface of the sun. Again, architecture and node size are tied together. Not that someone couldn't make a better architecture on 65nm now than they did then.

*As for massive parallelization -- it was a bad idea for a very large swath of the server/laptop/desktop space. Where is all that code parallelization going to come from?
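Amdahl's law is the standard way to put numbers on that: if only a fraction p of the work parallelizes, n cores can never deliver more than 1/(1 - p) speedup, no matter how many you add. A minimal sketch with arbitrarily chosen p values:

```python
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), where p is the
# parallelizable fraction of the workload.

def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.50, 0.90):
    line = ", ".join(f"{n} cores: {speedup(p, n):.2f}x" for n in (2, 4, 8))
    print(f"p = {p:.0%} -> {line}")
# p = 50% -> 2 cores: 1.33x, 4 cores: 1.60x, 8 cores: 1.78x (caps at 2x)
# p = 90% -> 2 cores: 1.82x, 4 cores: 3.08x, 8 cores: 4.71x
```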
 

Bulldozer’s improved SSE performance (above) and AVX support (below) may help the chip in some corner cases, but at least some AVX-enabled benchmarks, like the Kribi 3D tests available at inartis.com are actually slower on Bulldozer when AVX is used than they are otherwise. It’s not clear if this is because Bulldozer’s AVX implementation is narrower than Intel’s, or because the chip’s SSE capabilities make that instruction set a better fit. Similarly, Bulldozer includes support for multiple new CPU instructions, but AMD’s ability to convince developers to adopt them and recompile code for optimum performance is limited.

Apparently, running benchmarks that did not get a fresh compile with Bulldozer-specific flags set left these guys puzzled. As if running old software would magically make the new instructions work on Bulldozer. So much for a good read...
 
My concern is this: while IPC is probably going to be 20-30% higher for the typical use case, there has been very little mention of the other half of performance: clock speed. I have a feeling we might see a clock speed reduction compared to BD.

Point being, if the IPC improvements get you to Haswell-level performance but you then reduce the clock, you fall back to SB/IB level, which is basically non-competitive.
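To put hypothetical numbers on that worry (nothing official is known about Zen's clocks; the 4.0 GHz baseline below is just an assumption):

```python
# Net performance = (1 + IPC gain) * new_clock / old_clock.
# The baseline and Zen clocks are hypothetical, for illustration only.

ipc_gain = 0.40
bd_clock = 4.0  # a Bulldozer-class clock, in GHz (assumed)

for zen_clock in (4.0, 3.5, 3.0):
    net = (1 + ipc_gain) * zen_clock / bd_clock - 1
    print(f"{zen_clock:.1f} GHz: {net:+.0%} net vs. baseline")
# -> +40%, ~+22%, +5%: a clock regression eats most of the IPC win
```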
 
My concern is this: while IPC is probably going to be 20-30% higher for the typical use case, there has been very little mention of the other half of performance: clock speed. I have a feeling we might see a clock speed reduction compared to BD.

Point being, if the IPC improvements get you to Haswell-level performance but you then reduce the clock, you fall back to SB/IB level, which is basically non-competitive.

You think a drop in die size is going to decrease the clock speed?
 
Apparently, running benchmarks that did not get a fresh compile with Bulldozer-specific flags set left these guys puzzled. As if running old software would magically make the new instructions work on Bulldozer. So much for a good read...


Do you know when those extensions are used?

Tell me, and then we can talk about what's relevant and what's not. Also, if the software doesn't take advantage of them, it's irrelevant, as it won't take advantage of them in Intel's case either.

This is probably the 20th time: go read instead of posting stuff you don't know about.
 