AMD Zen Performance Preview

Correct me if I am wrong, but doesn't the CPU you used in your test have 12 cores, 24 threads? I am asking because in your analysis you used Ryzen's number of threads as 8, while it should have probably been 16.

Also the fact that Ryzen 16 thread CPU matches a similarly clocked 6900k which also has 16 threads would mean that at least in this benchmark they have similar IPC per core.

Yes, each CPU in my computer has 12 cores and 24 threads, and it's got 2 CPUs. Also, from what I've been told, the Core i7 6900K pretty much has all cores locked at 3.5GHz during the benchmark, indicating that its IPC is just a smidge behind Ryzen.


Not bad for a company with a tiny fraction of Intel's money.
 
Ahh ok, that makes sense, I thought your computer only had one CPU.. sorry I missed that part. Thanks!
 
Nice, but I can't take anyone seriously who still has the Cortana search box enabled. :dead::dead:

Wow, pretty petty, eh? :yuck: I have the Cortana search active, but I just disable the box so it gives me more space for icons on my taskbar.
 
Both Blender and Handbrake have traditionally been really awful for AMD's FX CPUs. They are quite FPU heavy and Faildozer sucks at that.

What the test shows is that Zen's SMT design is functioning as it should: really strong FPU throughput, without two "cores" sharing one FPU.

If there are no surprises then Zen is it. AMD has managed to go from being a decade behind to catching up with Intel in one generation. Jim Keller, god-tier engineer!

If they go good on prices, it's going to be a smash hit. $1100 or even $1750 USD for a CPU is nuts.
 
Dannotech, your program does not show which instructions are being used in the processor for the MIPS calculations when a program like that demo is running. Is it using SSE1, 2, 3, or 4; X87, MMX, AVX1 or 2? Maybe also add support for integer MIPS calculations instead of just floating point. It's very hard to compare MIPS values across modern processors because each has strong and weak points that can depend on which instruction sets are being used. Maybe Intel has stronger AVX than AMD, or maybe AMD has stronger X87 than Intel. We have no way of knowing with the current program you have.
 
hypgauge2.gif

X2 !!


Fail !!! Call out AMD for spellcheck, when [H] itself had numerous spelling and grammar issues. The 1st being the opening "tag line":
"AMD's New Horizon live streaming event this week gave us some insights to its new Zen based desktop CPU that will BE marketed under the name of "Ryzen." --fixed, see: http://www.grammar.cl/Notes/Future_Will.htm
 
Lol, Kyle did write this in the article though:

Yes, I am surely a bit of hypocrite on this last point, but HardOCP is far and away from a multi-million dollar tech company with its resources.
 
There's one rather massive and important question that is unanswered: Performance when application makes use of AVX/AVX2 instructions. It should be noticeably slower than on Intel (according to The Stilt).
Of course it'll be slower than Broadwell, Skylake, or Kaby Lake on pure AVX2 (that is, 256-bit only) workloads. AMD has already said it can't do one 256-bit FPU op per clock. Anandtech did a nice write-up on the subject and talked about the core in detail: http://www.anandtech.com/show/10591...t-2-extracting-instructionlevel-parallelism/7.

Realistically this doesn't matter that much now or even in the near future. AVX2 and AVX512 mostly benefit HPC and scientific stuff at the moment, it seems, especially since Intel has to reduce clock speeds to run AVX512 anyway because heat starts to become a problem. In the long run it'll matter, of course, but in the long run you'll be buying a new chip anyway.

Given that they slapped in dedicated AES hardware for cryptography, which would probably be the biggest single use case for AVX2/512 for the average person, and given the slow pace of change in software, Zen will probably age fairly gracefully.
 
The X265 encoder benefits from AVX2 quite clearly. There's still quite a bit of work to do for libx264, so at the moment it doesn't benefit as much. Still, for the average user and gamer this doesn't really matter. Assuming that AMD tries to pit the 8-core Zen against the i7-7700K, the extra core count would compensate.

https://bbs.io-tech.fi/threads/amd-ryzen-prosessori-summit-ridge.8165/page-5#post-238621
The Stilt said:
Handbrake uses libx264, which supports AVX/AVX2 instructions. The only problem is that, because functions have only just begun to be written in AVX/AVX2 assembly, there is currently no benefit to be had from them :think:
X265, on the other hand, has all of its performance-critical functions written in AVX/AVX2 assembly as well, so the effect of AVX/AVX2 instructions on its performance is very significant.

I tested this a couple of days ago with the latest library versions:

3RA

X264

8.733fps / 100.00% - (default, all available instructions used)
8.597fps / 98.44% - (AVX2 & BMI2 disabled)
8.440fps / 96.64% - (AVX, AVX2, FMA3 & BMI2 disabled)

X265

11.953fps / 100.00% - (default, all available instructions used)
9.836fps / 82.28% - (AVX2 & BMI2 disabled)
9.726fps / 81.36% - (AVX, AVX2, FMA3 & BMI2 disabled)

1080P, "veryslow" -preset, crf 17.0 for X264
1080P, "slow" -preset, crf 17.0 for X265

This is on Haswell-EP.

It is probably no coincidence that AMD chose workloads for the demo that make hardly any use of 256-bit instructions :think:
Of course, it is perfectly normal for a manufacturer to present its new product in the best possible light.
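
As a rough check on the figures in The Stilt's post, the percentage column can be recomputed from the fps column, and the same numbers give the speedup from leaving AVX2/BMI2 enabled. This is only a sketch: the function names and dictionary keys are made up for illustration, and because the posted fps values are already rounded, the recomputed percentages can differ from the posted ones in the last digit.

```python
# fps figures copied from The Stilt's post (Haswell-EP, 1080p).
# The dictionary keys are shorthand labels chosen for this sketch.
X264_FPS = {"default": 8.733, "no_avx2": 8.597, "no_avx": 8.440}
X265_FPS = {"default": 11.953, "no_avx2": 9.836, "no_avx": 9.726}


def relative_pct(fps, baseline_fps):
    """Return fps as a percentage of the baseline run."""
    return 100.0 * fps / baseline_fps


def avx2_gain_pct(results):
    """Percentage speedup of the default run over the AVX2/BMI2-disabled run."""
    return 100.0 * (results["default"] / results["no_avx2"] - 1.0)


if __name__ == "__main__":
    for name, results in (("x264", X264_FPS), ("x265", X265_FPS)):
        for label, fps in results.items():
            pct = relative_pct(fps, results["default"])
            print(f"{name} {label}: {fps:.3f} fps, {pct:.2f}%")
        print(f"{name} gain from AVX2/BMI2: {avx2_gain_pct(results):.1f}%")
```

Read the other way around, the posted numbers work out to a gain of well under 2% for x264 and a bit over 21% for x265 from having AVX2/BMI2 available.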
 
X265 encoder benefits from AVX2 quite clearly.
Sure, but that is just one piece of software out of how many? SSE2 was a huge step up vs x87/SSE, but look how long it took to become commonplace in most software. Same thing goes for 64-bit software. Or 32-bit software, if you're willing to accept examples from the 90's. Look at how long it's taking for DX12/Vulkan to reach anything resembling widespread usage. Software tends to be slow to change even when the changes bring obvious and large benefits AND even if they aren't too hard to do (i.e. a compiler re-run), so it's pretty reasonable to say 256-bit AVX won't matter much for a long time.

Assuming that AMD tries to pit 8 core Zen against i7-7700K then the extra core count would compensate.
Even if core counts were the same it won't matter much for a long time. Programmers in general aren't going to target CPU features that only a tiny portion of the market supports, whether as a mainline feature or even an optional one. They'll target a feature when it starts to become commonplace, which is when the effort becomes worthwhile. Yes, you'll see smaller and more dedicated developers, or ones that have a community behind them like Gentoo or Handbrake, use bleeding-edge stuff, but they're pretty niche to say the least. It's when stuff like a common web browser doesn't run well without a given feature that they'd run into problems.
 
It may matter depending on what you do. It already matters in cutting edge video encoding. It actually makes a difference in Blender too if you use the latest stuff and compile yourself. It will not matter for games anytime soon if ever because those have totally different kind of workloads. It probably will matter for professionals.
 
If it never matters in games then we are screwed. Increased IPC is slowly hitting a wall with existing technologies so something major, like quantum computing or something, would need to occur in order for things to change. Games are slowly but surely taking more and more advantage of multiple cores and do remember, games alone are not the only thing running on today's modern computers.
 
Hopefully Ryzen pans out in the real world, would be nice to go back to AMD, the last AMD CPU I had was a 1055T. I just hope it competes in gaming, for the most part there isn't a point to upgrading for games, but it would be nice to support the underdog and have something new to play around with.
 
It may matter depending on what you do.
You're basically ignoring my points about the slowness of change in software and developers targeting the most common platform, and giving niche cherry-picked examples. A whole 2 of them! Even after I gave you multiple real-world examples of software being slow to change. If those weren't enough, here is another one: look how long it took for decent multi-threaded support in games to become common. SMT P4's popped up in 2002 and dual-core A64's in 2005, but you didn't really need more than a decently clocked dual-core chip for nearly all games until just a few years ago. It's only recently that we've seen games use more than 4 threads effectively, too.

Also if niche corner case stuff like you're bringing up mattered all that much almost every enthusiast or pro would be running Gentoo or some hand tuned version of QNX RTOS or BSD but all that stuff is hardly used even among people who are pros and really need every last bit of performance they can scrape together.

If it never matters in games then we are screwed.
It'll matter for games eventually, it'll just take a long time. It can already be used for physics if you really want, and it should eventually matter quite a bit for that sort of thing. I think MS improved compiler support for 256-bit AVX in DirectXMath late last year. I don't think anyone is really doing much of anything with it, though. It's had AVX support of some sort since 2013 or 2012, I believe, but even that doesn't seem to see much if any use.

Change will come, guys, but it's going to be slooooooooooooooooow. Real slow. I don't really like that it's like that, I'm just telling you the way it is.
 
What's the IPC if you disable hyperthreading? My thought is Blender is ideally suited to hyper threading/SMT so could see a large boost in that. Ergo, Ryzen could be really good in multithreaded environments and still be lackluster in single thread even though they technically look to be even with 8C/16T vs 8C/16T Intel. If the Intel HT doesn't give a true 2x boost in Blender but AMD can get close to that, it'd look even in something like this bench but still be a fair bit behind in single threaded apps.

I want AMD to compete fully (see sig) as that's what I want to upgrade to, but the nature of this benchmark makes me a little skeptical that they can in all conditions. Granted, single threaded performance isn't as important as it was, but a lot of games are still not very multithreaded (and certain functions can't be) so it could hurt it in certain circumstances if it's not also close in single threaded conditions.
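
To make the argument above concrete, here is a toy model in which multi-threaded throughput is simply cores times per-thread performance times (1 + SMT gain). This is a deliberately crude sketch and not how either chip actually behaves; the function names and every number in it are hypothetical, chosen only to show how two 8C/16T chips could tie in a Blender-style run while differing in single-threaded performance.

```python
def mt_throughput(cores, st_perf, smt_gain):
    """Toy model: total throughput = cores * per-thread perf * (1 + SMT gain)."""
    return cores * st_perf * (1.0 + smt_gain)


def st_perf_needed(target_mt, cores, smt_gain):
    """Single-thread performance needed to reach target_mt under the toy model."""
    return target_mt / (cores * (1.0 + smt_gain))


if __name__ == "__main__":
    # All numbers below are HYPOTHETICAL, chosen only to illustrate the argument.
    chip_a_mt = mt_throughput(cores=8, st_perf=1.00, smt_gain=0.25)
    chip_b_st = st_perf_needed(chip_a_mt, cores=8, smt_gain=0.50)
    print(f"chip A multi-threaded score: {chip_a_mt:.2f}")
    print(f"chip B single-thread perf needed to tie: {chip_b_st:.3f}")
```

With these made-up inputs, the chip with the larger SMT gain ties the multi-threaded score while being roughly a sixth slower per thread, which is the scenario the post is worried about.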


For what it's worth, my expectation-- let's call it 'gut feel'-- is that Ryzen will be just as strong at single-threaded tasks as Skylake.

Anyway, here is a screenshot of the same test but with Hyper-Threading disabled, for a total of 24 cores and 24 threads. If you recall from the video I posted on the prior page, these Ivy Bridge Xeons averaged 1.59 IPC per core with Hyper-Threading enabled. As you can see in this screenshot, they are at ~1.15 IPC without Hyper-Threading, a reduction of 28%.

Does that jibe with your thesis?
 

Attachment: Blender-IPC-without-HT.jpg
Hmm, gut feelings for AMD don't jibe well.
Direct proof is needed.
 
And here is a screenshot of Skylake IPC in the same test, also without Hyper-threading since I don't have access to a Skylake Core i7.. /sadface

The average IPC is about 1.24, an 8% increase over Ivy Bridge.
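
The percentages quoted for these IPC figures are easy to re-derive. A small sketch (the helper name `pct_change` is just for illustration; the IPC values are the ones posted in this thread):

```python
def pct_change(old, new):
    """Percentage change going from old to new."""
    return 100.0 * (new - old) / old


IVY_HT_ON = 1.59      # Ivy Bridge Xeon, Hyper-Threading enabled (from the thread)
IVY_HT_OFF = 1.15     # Ivy Bridge Xeon, Hyper-Threading disabled (from the thread)
SKYLAKE_HT_OFF = 1.24  # Skylake, Hyper-Threading disabled (from the thread)

if __name__ == "__main__":
    print(f"HT off vs on (Ivy Bridge): {pct_change(IVY_HT_ON, IVY_HT_OFF):.1f}%")
    print(f"HT on vs off (Ivy Bridge): {pct_change(IVY_HT_OFF, IVY_HT_ON):.1f}%")
    print(f"Skylake vs Ivy Bridge, HT off: {pct_change(IVY_HT_OFF, SKYLAKE_HT_OFF):.1f}%")
```

Note that the 28% reduction from turning Hyper-Threading off is the same thing as a gain of about 38% from turning it on; which number you quote depends on which configuration you treat as the baseline.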
 

Attachment: Skylake-Blender-IPC-without HT.jpg
It's fair to wait for direct proof from an independent reviewer, but it'd be pretty unlikely for AMD to have a better SMT implementation than Intel while being weaker on single-threaded workloads. Intel has been doing SMT for quite a while now and Zen is AMD's first SMT processor. They've avoided doing it for a long time for very good reason: it's supposed to be very hard to design and test properly.

The advantage of course is that it'll give you more threads without using as much die space as another core...but only if the real core has enough hardware resources to put towards the virtual one.

The number I remember Intel giving is that SMT cost them about 5-10% extra die space, but that was quite a while ago (P4 days I think?) so it's likely things have changed a lot.
 
You won't get any proof until the processor is out, but keep in mind that the micro-architecture details AMD released at Hot Chips a few months back paint a very good picture. I've been programming x86 in C++ and assembly for 20 years and I can tell you there is a lot to be excited about here.
 
I'm in no doubt it will be an exceptional chip, but exactly what for isn't clear yet.
Proof of specifics will be needed.
There must be a reason why you aren't able to show us; we are reminded of previous ball games before launch.
 
With my signature rig set at stock 3.3GHz, render time was in the 34.55s-34.75s range over 5 runs. Guess I won't be upgrading anytime soon.
 
It does, at least on the Intel side. I'm hoping, as postulated above, that AMD's SMT gains are actually lower than Intel's, as that would mean their single-thread performance is actually better than Intel's, but I have a feeling they learned a lot about sharing resources within a core from Bulldozer and it's the opposite. We shall see, but if their single-thread performance is better than Intel's it would be a great thing for gaming.
 
Interesting. From your tests and the Blender tests so far, I'm expecting its IPC to be right around Broadwell or Haswell. Any reason why you think it will be at Skylake's level?
 
I will run my Skylake at a fixed 3.4GHz (no turbo) and see how it fares. With half the cores/threads, will its time be less than double Zen's? Not that any processor scales perfectly when the core/thread count doubles, but it could be a good test in this case.
 
A Skylake i7 6700K with half the number of cores and threads at 3.4GHz (no turbo) gave 1 min 7 sec. So if Skylake scaled perfectly going from 4 cores/8 threads to 8 cores/16 threads at 3.4GHz, the render time would be around 33.5 seconds. Except normally you don't have perfect scaling, so for this application it looks like Zen is doing very well compared to Skylake.
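
The extrapolation above can be written down explicitly. The sketch below computes the perfect-scaling estimate from the 67-second result and, as an added assumption that is not part of the post, also shows what an Amdahl's-law style estimate looks like when some fraction of the work does not parallelize. The function names and the 0.95 parallel fraction are hypothetical illustrations, not measurements.

```python
def ideal_time(t_seconds, cores_from, cores_to):
    """Render time after a core-count change, assuming perfect scaling."""
    return t_seconds * cores_from / cores_to


def amdahl_time(t_seconds, cores_from, cores_to, parallel_fraction):
    """Same estimate under Amdahl's law, where only parallel_fraction scales."""
    serial = 1.0 - parallel_fraction
    before = serial + parallel_fraction / cores_from
    after = serial + parallel_fraction / cores_to
    return t_seconds * after / before


if __name__ == "__main__":
    measured_4c = 67.0  # 1 min 7 sec on the 4C/8T i7 6700K at 3.4GHz (from the post)
    print(f"perfect scaling to 8C: {ideal_time(measured_4c, 4, 8):.1f} s")
    # 0.95 is a made-up parallel fraction, used only to show the direction of the effect.
    print(f"Amdahl, 95% parallel:  {amdahl_time(measured_4c, 4, 8, 0.95):.1f} s")
```

Any parallel fraction below 1.0 pushes the 8-core estimate above 33.5 seconds, which is why an 8-core result close to that figure reads as a strong showing.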

 
can't really extrapolate like that, cache amounts make a pretty big difference in render times depending on the scene of course.
 
Have we seen past Intel cores double the performance going from 4 cores to 8 cores, within the same generation, for an almost purely CPU-bound task like this Blender render? Would a Haswell i7 4770 and an i7 5960X with double the cores/threads, both at 3.4GHz, show doubled performance in this benchmark?

Since the task is very CPU intensive and, yes, rendered almost entirely out of cache, it could be double. In any case I don't think this benchmark would exceed the caches by much.
 
if we can standardize it with same cores (gens), and test that way we may be able to get a picture.

Just have one test for Ivy bridge 4 vs 6, and see what the difference is,

haswell 4 vs 8

like that, the more gens the better (with locked frequencies)

even same gen 2 vs 4 core will work too

all that would be needed is to see the factor change, and whether it's a stable change within the same gen and then across the different gens. I would expect it to be a stable change within the same gen, but if it varies across different gens, then we can't extrapolate that easily.
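
The "factor change" test proposed above could be computed roughly like this. It is only a sketch of one way to do it: the function names, the 0.05 tolerance, and all of the render times in the example are hypothetical placeholders, since nobody in the thread has posted these measurements.

```python
def scaling_efficiency(t_small, cores_small, t_large, cores_large):
    """Measured speedup divided by ideal speedup (1.0 means perfect scaling)."""
    speedup = t_small / t_large
    ideal = cores_large / cores_small
    return speedup / ideal


def is_stable(efficiencies, tolerance=0.05):
    """True if all efficiencies lie within `tolerance` of each other."""
    return max(efficiencies) - min(efficiencies) <= tolerance


if __name__ == "__main__":
    # HYPOTHETICAL render times in seconds at a locked frequency; not measurements.
    runs = {
        "Ivy Bridge 4C vs 6C": (80.0, 4, 56.0, 6),
        "Haswell 4C vs 8C": (70.0, 4, 38.0, 8),
    }
    effs = []
    for label, (t_small, c_small, t_large, c_large) in runs.items():
        eff = scaling_efficiency(t_small, c_small, t_large, c_large)
        effs.append(eff)
        print(f"{label}: efficiency {eff:.2f}")
    print("stable across gens:", is_stable(effs))
```

If the efficiencies cluster tightly across generations, extrapolating from a 4-core result to an 8-core one is more defensible; if they scatter, it is not.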
 
That does sound reasonable and could give us a somewhat better picture.

AMD's new advanced prediction may be skewing the results for a very limited scenario (which would be absolutely ideal for me): an almost purely CPU-bound task done mostly in the caches. More types of tasks would be needed. That said, video encoding is very memory intensive and Zen did well there too, so this is getting more interesting as time goes on.
 
I've been on the board since 2000 but can't login because of lost password and different e-mail address. Oh well.

Kyle (Steve?) I dunno, some 'clever' person was kind enough to delete all the accounts which only posted infrequently some time in late 90's or early 00's. This is my second account, I actually registered several years earlier but it was randomly nuked at some particular time.
 
I had an account here registered in June 2003 but I can't seem to find it either. I thought I just didn't remember my username right, but maybe that's what happened... not that it matters anymore.
 
IPC load is 1.14, easy branch prediction.

Now here is something more interesting. Blender with 256bit AVX2 on. Stock 6700K 27.17 seconds with 100 samples.

blend256.png
 
All of the passwords were reset, and if you had an old email attached to your account, you had to email Kyle to get a new one. I had the same problem but luckily remembered my user name, so I got a new password. Shoot him an email with your old user name and what you suspect the email address attached to it is.
 