AMD Ryzen 16 Core “Whitehaven” Enthusiast CPUs Leaked – 3.6GHz Clock Speed, Boatloads of Cache & Qua

Aw, poor fella, forgot to mention that Intel parts cost twice as much and use more power than AMD parts. You have to resort to comparing core versus core, which is not only comparing a product to something 2x as expensive, but actually shows a WASH between the two when you look at ALL benchmarks, not just your favourites.

It must be tiring to be an Intel loyalist right now...

$599 is twice $499?

[attached graph: 87126.png]


AMD compared 8-core RyZen to 8-core Broadwell in demos and slides. But now comparing 8-core vs 8-core gets accusations of "tiring to be an Intel loyalist".

Cinebench, Blender, and Handbrake were tests used by AMD in Zen demos and slides. Now that AMD loses on all those benches, they are no longer valid benches? The same happens with AotS, I guess.

And yes, SKL-X consumes more power, because... it is faster.
 
$599 is twice $499?

[attached graph: 87126.png]


AMD compared 8-core RyZen to 8-core Broadwell in demos and slides; I don't recall you accusing them of "tiring to be an Intel loyalist".

Cinebench, Blender, and Handbrake were tests used by AMD in Zen demos and slides for RyZen. Now that AMD loses on all those benches, they are no longer valid benches. The same happens with AotS, I guess.

And yes, SKL-X consumes more power, because... it is faster.
Dodge, duck, dodge... Hey buddy, you missed the point again, and I have no clue what price you are using. It looked like an Intel 6-core against an AMD 8-core, for shame. By the way, the consensus is that Skylake-X is a major ClusterF***. Against TR it is severely neutered, so much so that a good number of people are considering TR solely based on the platform, CPU benches be damned.
 
If you buy an 1800X then it's your own fault for getting "bad value". Buy a 1700 or 1700X at $300 and you have your half price. At this point I don't even know what I am; I get called an Nvidia fanboy and here I am defending AMD. I think I'm just sick of the fucking cherry picking from fanboys to prove non-existent points.
 
Dodge, duck, dodge... Hey buddy, you missed the point again, and I have no clue what price you are using. It looked like an Intel 6-core against an AMD 8-core, for shame. By the way, the consensus is that Skylake-X is a major ClusterF***. Against TR it is severely neutered, so much so that a good number of people are considering TR solely based on the platform, CPU benches be damned.

I got the points very well, and I agree with der8auer that SKL-X is an excellent chip and doesn't deserve the bad criticism from certain biased media.

TR is going to have an even harder fight than RyZen. Mobos are expensive, and the multi-die approach is hurting both power consumption and performance. RyZen already has a throughput-optimized memory controller, so quad-channel will provide little benefit for the 8-core TR model. The only tangible benefit will be on the 12-core and 16-core models, because moar cores require moar bandwidth.

Note that the best scenario AMD could find for the ThreadRipper slides was a cherry-picked benchmark where AMD needs 60% more cores to provide 24% more performance than the existing SKL-X. This implies

SKL core ~ 1.30x faster than Zen core

Therefore TR will barely win in a couple of cherry-picked, highly parallel scenarios and will lose on everything else, especially on latency-sensitive workloads where TR will suffer from die-to-die latencies.
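
Spelling out the arithmetic behind that ~1.30x figure, here is a quick Python sketch using only the two numbers quoted above (16 vs 10 cores, +24% total performance); the variable names are mine:

```python
# Implied per-core ratio: a 16-core TR claimed 24% faster than a 10-core SKL-X
tr_cores, i9_cores = 16, 10
tr_total, i9_total = 1.24, 1.00   # total throughput, i9-7900X normalized to 1.0

tr_per_core = tr_total / tr_cores   # ~0.0775
i9_per_core = i9_total / i9_cores   # 0.1000

print(f"SKL core / Zen core = {i9_per_core / tr_per_core:.2f}x")
# -> SKL core / Zen core = 1.29x, i.e. the ~1.30x quoted above
```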
 
I got the points very well, and I agree with der8auer that SKL-X is an excellent chip and doesn't deserve the bad criticism from certain biased media.

TR is going to have an even harder fight than RyZen. Mobos are expensive, and the multi-die approach is hurting both power consumption and performance. RyZen already has a throughput-optimized memory controller, so quad-channel will provide little benefit for the 8-core TR model. The only tangible benefit will be on the 12-core and 16-core models, because moar cores require moar bandwidth.

Note that the best scenario AMD could find for the ThreadRipper slides was a cherry-picked benchmark where AMD needs 60% more cores to provide 24% more performance than the existing SKL-X. This implies

SKL core ~ 1.30x faster than Zen core

Therefore TR will barely win in a couple of cherry-picked, highly parallel scenarios and will lose on everything else, especially on latency-sensitive workloads where TR will suffer from die-to-die latencies.



Yet Skylake-X can't even beat its predecessor, the Kaby Lake chips... That must be rough to choke down for ya; we refer to that as regression around here. I expect quad-channel memory to be as useful as it is to Intel: looks good on charts. Oh, I know, the motherboards just need a BIOS update and everything will be better for Skylake-X... oh wait, that's right, you don't think that is possible for AMD, so I guess Intel can't do it either. Or are you going to admit you were wrong on that? Well, you're the king of cherry-picked benchmarks; I am sure you will come up with something to try to salvage this joke of an Intel launch and make Threadripper look bad. Hey look, it's Intel's mesh, it has latency too; hmm, might be why performance is down on that Skylake-X...
 
Nope. Tom's Hardware's measurements in that graph are incorrect; they disagree with the rest of the reviews, and even with another article they published.

[attached graph: 87080.png]

In your graph, all Ryzens are at or below the official "TDP", and all SKL-Xs are above the official "TDP".
 
I got the points very well, and I agree with der8auer that SKL-X is an excellent chip and doesn't deserve the bad criticism from certain biased media.

TR is going to have an even harder fight than RyZen. Mobos are expensive, and the multi-die approach is hurting both power consumption and performance. RyZen already has a throughput-optimized memory controller, so quad-channel will provide little benefit for the 8-core TR model. The only tangible benefit will be on the 12-core and 16-core models, because moar cores require moar bandwidth.

Note that the best scenario AMD could find for the ThreadRipper slides was a cherry-picked benchmark where AMD needs 60% more cores to provide 24% more performance than the existing SKL-X. This implies

SKL core ~ 1.30x faster than Zen core

Therefore TR will barely win in a couple of cherry-picked, highly parallel scenarios and will lose on everything else, especially on latency-sensitive workloads where TR will suffer from die-to-die latencies.
See, this is a prime example of LYING/HALF-TRUTHS.

I don't see you mention once the LACK of PCIe lanes on Skylake. Nor do I see you mention the LACK of solder on Intel, which in turn is the cause of their extreme heat and power issues, which are REAL. Nor the price differential, which HEAVILY favors AMD. The fact is, AMD's 16-core will EASILY outclock Intel's 16/18-core, and possibly the 14-core as well, from the looks of the tangible info, not that special place you seem to get info from with the fairies and candy-cane dreams. And that clock advantage will likely negate any IPC advantage, more so when using SMT, which already annihilates Intel's HT.

And you keep mentioning latency issues with absolutely no proof of the issue. Yes, we know it has an impact, and some of the results show it, but the degree to which it does is far smaller than the 20% you and your buds like to parrot along.
 
Yet Skylake-X can't even beat its predecessor, the Kaby Lake chips... That must be rough to choke down for ya; we refer to that as regression around here. I expect quad-channel memory to be as useful as it is to Intel: looks good on charts. Oh, I know, the motherboards just need a BIOS update and everything will be better for Skylake-X... oh wait, that's right, you don't think that is possible for AMD, so I guess Intel can't do it either. Or are you going to admit you were wrong on that? Well, you're the king of cherry-picked benchmarks; I am sure you will come up with something to try to salvage this joke of an Intel launch and make Threadripper look bad. Hey look, it's Intel's mesh, it has latency too; hmm, might be why performance is down on that Skylake-X...

(i)
Skylake-X is not a replacement for Kaby Lake. The replacement for Kaby Lake is Coffee Lake.

(ii)
When RyZen 1800X was 20--25% behind Broadwell-E or Kaby Lake in games, the excuse was that it was "mostly a workstation CPU that also can play games". Now that SKL-X beats RyZen on workstation applications and is only 8% behind Kaby Lake in games, the same people claim that SKL-X is a "regression". None of you now claims that SKL-X is a superb workstation CPU that also plays games better than RyZen.

(iii)
Quad-channel is useful on bandwidth-sensitive workloads. Just on the previous page you can find a 7-Zip bench showing the difference between dual and quad-channel. The difference between RyZen/TR and KBL/SKL-X is that RyZen already has a bandwidth-optimized memory controller, so quad-channel will bring smaller improvements to the 1900X TR model.

(iv)
SKL-X already got BIOS updates. Some sites even retested with the new BIOS. The difference with the RyZen case is that the latency problems on RyZen are structural, due to the CCX-CCX microarchitecture and the rest of the design not being optimized for reducing latency. No BIOS/AGESA update can change the AMD hardware. That is what I said then, when certain people were selling hype/BS about a future magic BIOS/AGESA that would fix gaming and close the gap with Intel CPUs. Time has proven me right: even with the latest BIOS/AGESA, RyZen is still a good 25% behind Intel in latency-sensitive workloads such as games.

Things are different for SKL-X: the problem was in the buggy launch BIOS, which didn't let the hardware work correctly. The problem wasn't in the hardware but in the software, with the launch BIOS not enabling turbo or P-states, or affecting power consumption. Once the final BIOS was released and basic stuff such as turbo started to work, SKL-X did start to shine. I have given several examples. I repeat one of them, where SKL-X with the launch BIOS was slower than the former Broadwell chip, but this regression was caused by turbo and other features not working with the launch BIOS. The final BIOS improves performance, and SKL-X is faster than BDW-E and close to Kaby Lake. Note how RyZen, even with the latest BIOS/AGESA, remains far behind. The 10-core SKL-X is up to 58% faster than the 8-core RyZen.

[attached graph: 8225_38_intel-core-i9-7900x-series-skylake-cpu-review.png]


(v)
Regarding point (iv) above, I want to make clear now that the new latency issue that ThreadRipper will introduce with the dual-die approach is also structural. No future BIOS/AGESA update will reduce the huge distance between dies, nor will it change the die-die interconnect.
 
In your graph, all Ryzens are at or below the official "TDP", and all SKL-Xs are above the official "TDP".

I only brought up that graph to kill the nonsense that SKL-X pulls 250W or more on floating-point workloads and only falls within the official 140W on integer workloads. I have explained a handful of times that Prime95 cannot be used to compare RyZen power to the rest of the chips; more about this below.

6--9W above the official TDP is well within the margin of error of the measurements (including losses from circuitry efficiency not being 100%). Therefore the SKL-X chips agree with the official wattage.

Prime95 works as a power virus for Intel chips. It doesn't work as a power virus for RyZen. Check the power consumed by RyZen on Luxmark, Cinebench, Excel or x264. I have given Cinebench and Excel figures before. Now I will give x264 measurements.

[attached graph: getgraphimg.php]


The SKL-X chip does 150W, measured on the 12V rail. Subtract losses from non-perfect efficiency and the measurement agrees with the official 140W. On the other hand, the R7 1800X pulls 129W, which is 36% above the official 95W rating and cannot be explained by measurement uncertainties or electrical losses. As the review mentions:

It should be noted, however, that the 7900X consumes a lot of power. The use of the AVX units in x264 plays a role here. In spite of everything, the 7900X stays within its announced TDP, which is appreciable (as a reminder, the 1800X unnecessarily mistreats the notion of TDP by not respecting it in practice according to our criteria).
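
The percentages in this post are plain ratios of measured power to official TDP; a minimal Python sketch, using the two (watts, TDP) pairs quoted above:

```python
# Measured package power vs. official TDP, using the figures quoted above
chips = {
    "i9-7900X": (150, 140),   # measured W (12V rail), official TDP W
    "R7 1800X": (129, 95),
}
for name, (watts, tdp) in chips.items():
    print(f"{name}: {watts} W vs {tdp} W TDP -> {(watts / tdp - 1) * 100:+.0f}%")
# i9-7900X: +7%  (arguably within rail-measurement/efficiency margins)
# R7 1800X: +36% (too large to attribute to measurement error)
```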
 
I don't see you mention once the LACK of PCIe lanes on Skylake. Nor do I see you mention the LACK of solder on Intel, which in turn is the cause of their extreme heat and power issues, which are REAL. And nor the price differential, which HEAVILY favors AMD. The fact is, AMD's 16-core will EASILY outclock Intel's 16/18-core, and possibly the 14-core as well, from the looks of the tangible info, not that special place you seem to get info from with the fairies and candy-cane dreams. And that clock advantage will likely negate any IPC advantage, more so when using SMT, which already annihilates Intel's HT.

The PCIe point was answered before, both seriously and humorously (inventing the "moar lanes!" motto).

The solder myth was answered as well. der8auer demonstrated that delidding the i9 and replacing the TIM with liquid metal only brings 4% extra overclocking potential. He also had some words about people being unfairly negative about SKL-X.

The price difference was answered as well, with an explanation that performance is not a linear function of cost (increasing IPC by 10% doesn't cost 10% more, but much more, due to nonlinearities in the underlying physical laws). It was also mentioned that AMD is further reducing costs with a multi-die approach, but this multi-die approach comes with power and performance penalties. It was also mentioned that those penalties are the reason why 99% of engineers in the industry reject multi-die: the CPUs from IBM, Intel, APM, Cavium, Broadcom, Sun/Oracle, Fujitsu... are monolithic dies because that is the best technological solution. AMD is the only one with a multi-die approach, and AMD engineers are doing it to reduce costs.

Regarding costs, I have made additional remarks. AMD slides are comparing the $999 ThreadRipper chip to the $999 SKL-X chip on a single cherry-picked benchmark only. On that custom benchmark AMD needs 60% moar cores to provide 24% more performance. This means that only on a handful of heavily parallel workloads that scale up to 32 threads will the 16-core TR beat the 10-core SKL-X; on everything else, including games, the AMD chip will lose by huge amounts.

Now keep repeating that I don't say what I have already said a dozen times.
 
I only brought up that graph to kill the nonsense that SKL-X pulls 250W or more on floating-point workloads and only falls within the official 140W on integer workloads. I have explained a handful of times that Prime95 cannot be used to compare RyZen power to the rest of the chips; more about this below.

6--9W above the official TDP is well within the margin of error of the measurements (including losses from circuitry efficiency not being 100%). Therefore the SKL-X chips agree with the official wattage.

Prime95 works as a power virus for Intel chips. It doesn't work as a power virus for RyZen. Check the power consumed by RyZen on Luxmark, Cinebench, Excel or x264. I have given Cinebench and Excel figures before. Now I will give x264 measurements.

[attached graph: getgraphimg.php]


The SKL-X chip does 150W, measured on the 12V rail. Subtract losses from non-perfect efficiency and the measurement agrees with the official 140W. On the other hand, the R7 1800X pulls 129W, which is 36% above the official 95W rating and cannot be explained by measurement uncertainties or electrical losses. As the review mentions:

It's your graph. Just pointing out the inconsistency.
And you introduced this graph to counter a point about SKL-X exceeding the TDP on certain workloads. In doing so, you granted the point that Ryzen (even the 1800X, which seems to be the worst case) is perfectly within the TDP.

Also, regarding this new graph, you're now arguing that the i7-7700K or i9-7900X is as bad (or as good) as the R7 1700(X), TDP-wise. :)

I have absolutely no idea, but based only on your graphs, the only Ryzen a little above your expected values is the R7 1800X. If you give the 1800X the same discounts, it's like 10% out?
 
The PCIe point was answered before, both seriously and humorously (inventing the "moar lanes!" motto).

The solder myth was answered as well. der8auer demonstrated that delidding the i9 and replacing the TIM with liquid metal only brings 4% extra overclocking potential. He also had some words about people being unfairly negative about SKL-X.

The price difference was answered as well, with an explanation that performance is not a linear function of cost (increasing IPC by 10% doesn't cost 10% more, but much more, due to nonlinearities in the underlying physical laws). It was also mentioned that AMD is further reducing costs with a multi-die approach, but this multi-die approach comes with power and performance penalties. It was also mentioned that those penalties are the reason why 99% of engineers in the industry reject multi-die: the CPUs from IBM, Intel, APM, Cavium, Broadcom, Sun/Oracle, Fujitsu... are monolithic dies because that is the best technological solution. AMD is the only one with a multi-die approach, and AMD engineers are doing it to reduce costs.

Regarding costs, I have made additional remarks. AMD slides are comparing the $999 ThreadRipper chip to the $999 SKL-X chip on a single cherry-picked benchmark only. On that custom benchmark AMD needs 60% moar cores to provide 24% more performance. This means that only on a handful of heavily parallel workloads that scale up to 32 threads will the 16-core TR beat the 10-core SKL-X; on everything else, including games, the AMD chip will lose by huge amounts.

Now keep repeating that I don't say what I have already said a dozen times.
Seriously, there has to be some incentive behind your inane posts, because no rational individual with no horse in the race lies as much as you.

No, you haven't spoken of the PCIe lanes, unless dodging it altogether was your point. AMD has a huge I/O advantage, and it DOES matter for those with a large number of storage drives. And AMD does not require the purchase of dongles to get it. You can try joking the problem away as much as you want (moar cores) but it does not magically give Intel more lanes or make the dongles free.

Solder myth? What myth is this, exactly? Intel does not solder, AMD does: FACT. And if what he found made little difference, then Skylake is SERIOUSLY SCREWED when it comes to OC potential. What do you think will happen with the 14-core and up? You don't mention those; you just mention the 10-core, because it already shows serious issues with heat and power.

And my god, how you danced around the price issue. Again, you have to look at the cost of the whole platform. Your minimizing of the picture is a half-truth and an absolute lie. Comparing 2 CPUs in a void is disingenuous. Someone looking at the HEDT platform will be considering all of it, not just CPU differences. Most of them thus far have mentioned the huge I/O and PCIe lane count on AMD being a HUGE FACTOR in their purchasing decision. Why pay $500 more for less? That is the price debate, not some skewed metrics or half-assed statements on Twitter.

And maybe if you actually answered the questions as asked, without the attempts at subterfuge and misdirection, we wouldn't have to keep asking for proof or factual statements.
 
It's your graph. Just pointing out the inconsistency.
And you introduced this graph to counter a point about SKL-X exceeding the TDP on certain workloads. In doing so, you granted the point that Ryzen (even the 1800X, which seems to be the worst case) is perfectly within the TDP.

Also, regarding this new graph, you're now arguing that the i7-7700K or i9-7900X is as bad (or as good) as the R7 1700(X), TDP-wise. :)

I have absolutely no idea, but based only on your graphs, the only Ryzen a little above your expected values is the R7 1800X. If you give the 1800X the same discounts, it's like 10% out?

As I have just explained to you, there is no inconsistency once measurement uncertainties are considered.

I introduced the graphs for different reasons. I introduced the Prime95 graph to show that Tom's Hardware's measurements on Prime95 are plain wrong. I introduced the x264 graph to show how Prime95 doesn't stress RyZen.

The R7 1700 and 1700X are also violating the official TDPs. One is 16% above the official TDP; the other is 18% above. Those values are far beyond the uncertainties associated with measurements and reflect AMD cheating on TDPs, as reviews have noticed. On the other hand, the i9 gap is less than half that.

The KBL i7-7700K is an interesting case. The measured value is 14% above the official wattage, and this alone would point to the official TDP being broken, but check the SKL i7-6700K and the KBL-X i7-7740X: both are well below the official TDP, which suggests that the i7-7700K is a statistical outlier. On the other hand, all the RyZen chips systematically violate their official TDPs, and the same will happen with ThreadRipper when it is launched and reviewed. In fact, the current 16-core TR sample is already violating the official 180W value by more than 11%.

The RyZen 1800X is a huge 36% above the official value, and giving it an invented "discount" to pretend it is only a "little above" doesn't work. Even assuming that the AM4 platform suffers the same inefficiencies/uncertainties as the SKL-X platform, this would give (129 ± 9) W, and the lower end of the uncertainty interval is still a huge 26% above the official value. Therefore the conclusion is the same.
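
To make the uncertainty argument concrete, a small sketch that grants the 1800X the same ±9 W margin used for the SKL-X platform above:

```python
# Best case for the 1800X: subtract the full assumed 9 W margin
measured, margin, tdp = 129, 9, 95        # watts
lower_bound = measured - margin           # 120 W
print(f"{(lower_bound / tdp - 1) * 100:.0f}% above the 95 W TDP")  # -> 26%
```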
 
TDP ≠ actual power usage. But I have a feeling that it won't matter in this particular discussion.
 
Lol, Justreason, do yourself a favor and block him; you'll feel much better. Either way, we all know the TDP ratings from both companies fall under the same bogus crap as 80+ certification, using a cherry-picked CPU under absolutely perfect conditions. Let's just move on and discuss something that actually matters.
 
Considering Juanrga is banned from other places on the internet, including the SemiAccurate forums, for blatant trolling, I wouldn't bother with him. He's made so many false and inaccurate statements and predictions that it's become a joke at this point. I've added him to the ignore list too, and I recommend others do the same.

What you say about me is not true. Moreover, I find it deliciously ironic that you mention the SA forums here, because the smart guys at SA are again posting fake information and BS about ThreadRipper. Those smart guys claim that ThreadRipper has four fully functional dies and that you could purchase a 16-core today and 'hack' it to get another 16 cores for free:

A lot of people will now try to hack their way into a fully functional 32-core Threadripper. Some will work, some won't, a lot will try, anyway. And I wouldn't be surprised if AMD has orchestrated it all along.

Of course, all those messages are pure BS, and even Charlie has beautiful words for those smart guys: "As I keep saying, you can't cure stupid."

I will say it again: you cannot purchase a 16-core ThreadRipper and hack it to obtain 32 cores, because there are not four dies in the package. There are only two dies.
 
What you say about me is not true. Moreover, I find it deliciously ironic that you mention the SA forums here, because the smart guys at SA are again posting fake information and BS about ThreadRipper. Those smart guys claim that ThreadRipper has four fully functional dies and that you could purchase a 16-core today and 'hack' it to get another 16 cores for free:

Oh good grief.



Only you can take a question posed on a forum/thread dedicated to both fact and rumor and then run around the Internet like you are preventing the next apocalypse. Especially considering you kind of bought into it initially yourself and probably gave the initial rumor breath in the first place. Like this post...
By: juanrga ([email protected]), July 28, 2017 3:41 am
Not really a surprise since the instant we knew the size of the CPU (same size as EPYC), and that it was going to use an SP3r2 socket (a derivative of the SP3 socket for EPYC).

What makes one wonder is where are the SP4 socket products and the dual-die CPUs?

But now only you can save us all? Really? Um, OK.

 
Lol... Intel causes a Stockholm Syndrome pandemic.

Latest news headlines in reference to anyone who wasted cash on Skysnake-X

However, juanrga is right. You absolutely can't activate extra cores on Threadripper. AMD confirmed with Linus Tech that the working dies are diagonal to each other and the other two dies are blanks and spacers, also diagonal from each other. There are literally, and I mean literally, zero transistors in the blanks. There are also ZERO interposer connections to the blank dies.

Func........non func
Non func.....func
 
Only you can take a question posed on a forum/thread dedicated to both fact and rumor and then run around the Internet like you are preventing the next apocalypse. Especially considering you kind of bought into it initially yourself and probably gave the initial rumor breath in the first place. Like this post...


But now only you can save us all? Really? Um, OK.


LOL. There are serious misunderstandings of my words here.

AMD's original plan involved three sockets:

AM4 --> single die --> RyZen

SP4 --> MCM2 --> dual die --> ?

SP3 --> MCM4 --> quad-die --> EPYC.


When you quote me asking about "the SP4 socket products and the dual-die CPUs", I was asking about the MCM2 products.

That original plan was canceled, and AMD developed a new socket, SP3r2, which is evidently a derivative of the SP3 socket. This new socket takes an MCM4 package, and it and the accompanying CPUs have the same size as SP3 and EPYC CPUs, but there are not four functional dies in the SP3r2 products.

AM4 --> single die --> RyZen

SP4 --> MCM2 --> dual die --> ?

SP3 --> MCM4 --> quad-die --> EPYC.

SP3r2 --> MCM4 --> dual-die + dual spacer --> Threadripper.


Originally we believed that SP3r2 products would have two functional dies plus two dummy dies (two non-functional dies), because it is an MCM4 package (it was evident from the size), and that is why you can quote me saying "Not really a surprise since the instant we knew the size of the CPU (same size as EPYC)". I even posted a funny comment elsewhere asking how AMD could have fully non-functional dies if yields are as good as some claim.

Later, AMD confirmed that there are only two dies in SP3r2, and the other two pieces in the MCM4 package are spacers that help maintain the structural integrity of the whole package.

Do you get it now?

And regarding your last remark, it isn't a question of saving the world or anything like that, but there is no reason why one cannot call BS on the BS. The idea that one can purchase a 16-core Threadripper today and hack it somehow to get a 32-core tomorrow is BS.
 
Ok....... Cliffs notes on this thread: is this real? =====> AMD Ryzen 16 Core “Whitehaven” Enthusiast CPUs Leaked – 3.6GHz Clock Speed, Boatloads of Cache & Qua… and when is it coming out?
 
Ok....... Cliffs notes on this thread: is this real? =====> AMD Ryzen 16 Core “Whitehaven” Enthusiast CPUs Leaked – 3.6GHz Clock Speed, Boatloads of Cache & Qua… and when is it coming out?

 
The TDP/core latency shilling is desperate and laughable.
Run Ryzen on 3200MHz RAM and you get 110ns vs 94ns for the 7900X. That core gap is such a massive difference; literally, it's game over for AMD now, guys. Pack up, let's shut it down.

One thing I find weird about the 'spacer' comment from AMD is that they still put TIM on them, but I guess it's for tolerances. I do wonder if it'd be cheaper to just make Epyc and bin them accordingly.
Either way, I don't care if they are or not, because it's not going to happen; AMD would never let that compete with EPYC sales. I'm more interested in the industrial/decision-making side.
 
Run Ryzen on 3200MHz RAM and you get 110ns vs 94ns for the 7900X. That core gap is such a massive difference; literally, it's game over for AMD now, guys. Pack up, let's shut it down.

110ns is a good 17% higher than 94ns. And are you talking about intra-die latency or inter-die latency?
 
110ns is a good 17% higher than 94ns. And are you talking about intra-die latency or inter-die latency?
We can do basic math just fine; the point is that it's such a small difference, and a worst case. Intra-CCX is even lower, which is a far more common usage case.
If inter-die latency is high enough to make a large impact, the workload likely wouldn't scale so well as a heavily threaded application in the first place, so the point is practically moot, and most data we have about TR/Epyc seems to fit this.
 
We can do basic math just fine; the point is that it's such a small difference, and a worst case. Intra-CCX is even lower, which is a far more common usage case.

I didn't mention anything about intra-CCX latencies, because the problem is in the intra-die and inter-die latencies. Why do you believe every forum is full of recommendations to run RyZen with OC RAM? Because it also overclocks the IF and reduces latencies a lot, which improves performance in common situations.
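
For context on why OC RAM also overclocks the IF: on first-generation RyZen, the Infinity Fabric clock tracks the memory clock, i.e. half the DDR4 transfer rate. A minimal sketch; the DDR4 speeds below are just illustrative examples:

```python
# Zen 1: Infinity Fabric runs at MEMCLK, i.e. half the DDR4 transfer rate,
# so overclocking the RAM directly raises the fabric (CCX-CCX link) clock.
def fabric_clock_mhz(ddr4_transfer_rate: int) -> int:
    return ddr4_transfer_rate // 2

for ddr in (2133, 2933, 3200):
    print(f"DDR4-{ddr} -> fabric ~{fabric_clock_mhz(ddr)} MHz")
# DDR4-2133 -> 1066 MHz; DDR4-3200 -> 1600 MHz (~50% faster fabric,
# hence the lower CCX-CCX latency with OC RAM)
```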

If inter-die latency is high enough to make a large impact, the workload likely wouldn't scale so well as a heavily threaded application in the first place, so the point is practically moot, and most data we have about TR/Epyc seems to fit this.

Most data? Leaks and official info and demos from AMD avoid giving us latency-sensitive benches. And the AT review crippled the performance of the Intel Xeons by 40% or more in order to do a comparison. The only site that tested latencies directly was STH, and they found terrible inter-die latencies, even comparing EPYC to a 4P BDW system.

I am anxious to see the TR 1900X compared to the R7 1800X, because one of them will be affected by inter-die latencies.
 
I didn't mention anything about intra-CCX latencies, because the problem is in the intra-die and inter-die latencies. Why do you believe every forum is full of recommendations to run RyZen with OC RAM? Because it also overclocks the IF and reduces latencies a lot, which improves performance in common situations.



Most data? Leaks and official info and demos from AMD avoid giving us latency-sensitive benches. And the AT review crippled the performance of the Intel Xeons by 40% or more in order to do a comparison. The only site that tested latencies directly was STH, and they found terrible inter-die latencies, even comparing EPYC to a 4P BDW system.

I am anxious to see the TR 1900X compared to the R7 1800X, because one of them will be affected by inter-die latencies.
BS again. I have been waiting for this. I knew, when you mentioned there was bias, what you were alluding to, but you refused to state what it was. But finally you slipped and gave it away.

That 40% crippling claim is BS, in that it concerned a 6-month-old build not likely EVER used by any server setup worth money anyway; it was SKL-specific enhancements. The build used for testing was the build supported until 2020, and likely the broadest representation there is. Only a few Intel fanbois cried foul, with the ones with any real knowledge setting them straight: to be fair and REALISTIC, the build they used was the CORRECT one and the BEST choice. They didn't use an EPYC-specific build either, so it is likely the fairest it could be.

I was never really sure why you guys cried so much. I read the review and felt Intel came out on top for the most part. The FP tests definitely went to AMD, and by a large margin. The power usage was mostly indeterminate, because some of the results seemed off, and they STATED it would require further testing and that the amount of time they had was not enough. Hell, they spent 2 weeks with SKL and one with EPYC.
 
LOL. There are serious misunderstandings of my words here.

AMD's original plan involved three sockets:

AM4 --> single die --> RyZen

SP4 --> MCM2 --> dual die --> ?

SP3 --> MCM4 --> quad-die --> EPYC.


When you quote me asking about "the SP4 socket products and the dual-die CPUs", I was asking about the MCM2 products.

That original plan was canceled, and AMD developed a new socket, SP3r2, which is evidently a derivative of the SP3 socket. This new socket takes an MCM4 package, and it and the accompanying CPUs have the same size as SP3 and EPYC CPUs, but there are not four functional dies in the SP3r2 products.

AM4 --> single die --> RyZen

SP4 --> MCM2 --> dual die --> ?

SP3 --> MCM4 --> quad-die --> EPYC.

SP3r2 --> MCM4 --> dual-die + dual spacer --> Threadripper.


Originally we believed that SP3r2 products would have two functional dies plus two dummy dies (two non-functional dies), because it is an MCM4 package (it was evident from the size), and that is why you can quote me saying "Not really a surprise since the instant we knew the size of the CPU (same size as EPYC)". I even posted a funny comment elsewhere asking how AMD could have fully non-functional dies if yields are as good as some claim.

Later, AMD confirmed that there are only two dies in SP3r2, and the other two pieces in the MCM4 package are spacers that help maintain the structural integrity of the whole package.

Do you get it now?

And regarding your last remark, it isn't a question of saving the world or anything like that, but there is no reason why one cannot call BS on the BS. The idea that one can purchase a 16-core Threadripper today and hack it somehow to get a 32-core tomorrow is BS.

You wrote all of that and still didn't refute anything I said. Sad.
 
I didn't mention anything about intra-CCX latencies, because the problem is in the intra-die and inter-die latencies. Why do you believe every forum is full of recommendations to run RyZen with OC RAM? Because it also overclocks the IF and reduces latencies a lot, which improves performance in common situations.



Most data? Leaks and official info and demos from AMD avoid giving us latency-sensitive benches. And the AT review crippled the performance of the Intel Xeons by 40% or more in order to do a comparison. The only site that tested latencies directly was STH, and they found terrible inter-die latencies, even comparing EPYC to a 4P BDW system.

I am anxious to see the TR 1900X compared to the R7 1800X, because one of them will be affected by inter-die latencies.

You are looking for problems here. If the CCX latency were such a problem, then why would they have implemented it? In reality, latency matters little in a design that scales from a single die to multiple dies. AMD keeps costs minimal for users because of this and can deliver a 16C/32T CPU that does not go above $1000, while for lower latencies you pay double or more for the CPU alone. Let's see how good the other 16C/32T solution is, and then test those latencies again.

In the end it is about AMD's design, and that has _real_ benefits for consumers price-, performance- and feature-wise; you have to pay for the low-latency feature you hold so dear to your heart...
 
You wrote all of that and still didn't refute anything I said. Sad.

I refuted your claims by explaining that they are based on a misunderstanding. You confuse my remarks about SP4 with my remarks about TR4.
 
You are looking for problems here. If the CCX latency were such a problem, then why would they have implemented it?

It is not implemented because it is the best technological solution from a performance point of view. They implemented it because it introduces a modular approach that helps AMD reduce design and verification costs. And it is a central part of their semicustom strategy.


In reality, latency matters little in a design that scales from a single die to multiple dies.

Even the HSA specification makes clear that CPUs are a special kind of LCU (Latency Compute Unit), in contrast to GPUs, which are TCUs (Throughput Compute Units).

Latency matters a lot, and this is why AMD released an improved 1.0.0.4 AGESA with 6% reduced latency, why AMD released the improved 1.0.0.6 AGESA with support for faster memory, and why people in forums and reviews recommend overclocking RAM to reduce the CCX-CCX latency.
 
If that is true, then the multiscore at stock settings would be ~51270, which is 23% higher than the score of the i9-7900X and agrees with the 24% gap that AMD reports in slides for a Blender bench. Thus Intel maintains a ~30% performance gap per core.
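
The ~30% per-core figure comes from dividing the core-count ratio by the total-score ratio, the same normalization used for the Blender slide; a minimal sketch using only the ratios quoted above:

```python
# Per-core gap implied by the quoted stock multi-core scores
total_ratio = 1.23            # TR 1950X multiscore / i9-7900X multiscore
core_ratio = 16 / 10          # TR cores / i9 cores
per_core_gap = core_ratio / total_ratio - 1
print(f"Intel per-core advantage: ~{per_core_gap * 100:.0f}%")   # ~30%
```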
 
Repeating the same operation with the CB score gives a ~20% performance gap per core.

 
If that is true, then the multiscore at stock settings would be ~51270, which is 23% higher than the score of the i9-7900X and agrees with the 24% gap that AMD reports in slides for a Blender bench. Thus Intel maintains a ~30% performance gap per core.


Is this your way of saying Intel is still winning while losing?

i9-7900X, 10 cores at 3.3GHz base = ~$1000 MSRP
Threadripper 1950X, 16 cores at 3.4GHz base = $1000 MSRP

In multi-threaded workloads the stock 1950X wins by 24% over the stock 7900X, but pay no attention to multi-threaded performance on a processor made specifically for multi-threaded workloads. Here, instead, focus on Intel's single-threaded performance. After all, that would generally be the lesser use case / purpose for buying a 10- or 16-core processor, but let's focus here on how the 7900X in single-threaded performance is speculatively 30% better than the AMD offerings.

Of course, in the real world, Threadripper is going to be better suited to running multiple concurrent multi-threaded workloads (like transcoding a video library and streaming it to the TV room, while playing a multi-threaded game, while streaming said game to Twitch as well). Just by having more cores available to be loaded with workload tasks while operating at a reasonably high clock speed and IPC, Threadripper will end up offering a smoother experience to the user in a multi-varied, multi-threaded, multi-application environment.

Even the 10-core Intel, which I expect will do well in its own right, will run out of concurrent threads it can load with work earlier than Threadripper, so if we are really comparing at an MSRP level, I would judge that with the current topology of Intel vs. AMD offerings at the $1000 price point, AMD has put out the better product.

What is more, Intel's higher core counts end up with substantially lower clock speeds (i9-7960X, 16 cores at 2.8GHz base clock = ~$1700 MSRP) when the workload environment is loading all (or most) cores (which is the most likely use case for a user who wants to buy 10/12/14/16/18-core processors). So ultimately Intel's monolithic CPU design is leading to a certain clock-speed penalty on base clocks as the Intel processors add additional cores. AMD has proven with both Threadripper and Epyc that the modular 8-core, two-CCX design they have built does not carry a significant clock-speed penalty when adding additional two-CCX modules to the 1P socket implementation.

I know you will likely come back again saying 'Yeah, but what about the latency introduced by the 2-CCX + 2-module design...'. Truth is, though, you don't really know anything about the real-world performance effects of the latencies on a Threadripper or Epyc. You speculate a lot and try to discuss the topic like you have all the facts, but you really don't know how it performs in real-world environments.

I'm sure I won't have to wait long for the next change of tactic, where we discuss TDP yet again, or that sub-group of users who have some issue that may be errata-related, and dang if AMD hasn't fixed it within 2 months of the problem surfacing (ignoring Intel taking a year to fix similar identified errata in their processor line, only resolved within the last year).
 
It is not implemented because it is the best technological solution from a performance point of view. They implemented it because it introduces a modular approach that helps AMD reduce design and verification costs. And it is a central part of their semicustom strategy.

Even the HSA specification makes clear that CPUs are a special kind of LCU (Latency Compute Unit), in contrast to GPUs, which are TCUs (Throughput Compute Units).

Latency matters a lot, and this is why AMD released an improved 1.0.0.4 AGESA with 6% reduced latency, why AMD released the improved 1.0.0.6 AGESA with support for faster memory, and why people in forums and reviews recommend overclocking RAM to reduce the CCX-CCX latency.

Like I said before, you will get higher latencies in the non-AMD 16C/32T as well.

You describe it as a problem, while I reminded you it is a design choice. There are always drawbacks with whichever design you pick, but performance-, price- and feature-wise AMD came out on top.
The way to limit CCX cross-talk is to better optimize core-to-core communication in the source code rather than with a hardware solution. CCX is a hardware design; it can see some improvements, but if you are claiming that it is not working right, you are wrong...
 
Like I said before, you will get higher latencies in the non-AMD 16C/32T as well.

You describe it as a problem, while I reminded you it is a design choice. There are always drawbacks with whichever design you pick, but performance-, price- and feature-wise AMD came out on top.
The way to limit CCX cross-talk is to better optimize core-to-core communication in the source code rather than with a hardware solution. CCX is a hardware design; it can see some improvements, but if you are claiming that it is not working right, you are wrong...

I didn't say it is not working right. Quite the contrary!!!

The approach is working as expected for a high-latency clustered design with a non-unified L3. That is why, before launch, I said that Zen was going to have performance problems in latency-sensitive workloads such as games... and reviews proved it.

I also advised that the MCM approach would bring both performance and power penalties. And the first direct measurements on EPYC show huge inter-die latencies.

I am now claiming that ThreadRipper will have huge inter-die latencies, and this will hurt performance on latency-sensitive workloads every time a core in one die has to communicate with a core in the other die. The ThreadRipper 1900X will lose to the 1800X in those cases. Now I am awaiting reviews to show this.

I have mentioned a couple of times that AMD's approach is motivated by cost reduction, not because it provides the best performance or efficiency.
 
Is this your way of saying Intel is still winning while losing?

i9-7900X, 10 cores at 3.3GHz base = ~$1000 MSRP
Threadripper 1950X, 16 cores at 3.4GHz base = $1000 MSRP

In multi-threaded workloads the stock 1950X wins by 24% over the stock 7900X, but pay no attention to multi-threaded performance on a processor made specifically for multi-threaded workloads. Here, instead, focus on Intel's single-threaded performance. After all, that would generally be the lesser use case / purpose for buying a 10- or 16-core processor, but let's focus here on how the 7900X in single-threaded performance is speculatively 30% better than the AMD offerings.

No one here is discussing single-thread performance. Multithread performance is given by the product of the performance of a single core and the number of cores. Intel having a faster core is the reason why AMD needs 60% moar cores to get 30% higher performance on workloads that scale to all 16 cores.

On workloads that don't scale up to 32 threads, the TR chip will be slower, because each core is slower. For instance, with 24 threads loaded on both chips, Intel would be faster. And all this is comparing both chips at stock settings; the i9 has higher overclocking headroom. Thus, when both chips are overclocked, the gap will increase from ~30% to ~45%, and TR will be faster (~15%) only when all cores are fully loaded.
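
A deliberately crude break-even sketch of that last claim, assuming the ~1.3x per-core gap argued above and ignoring SMT, memory, and clock scaling (an illustration of the reasoning, not a measurement):

```python
# Crude break-even: how many busy Zen cores does the 1950X need before its
# total throughput passes a fully loaded i9-7900X?
skl_per_core, zen_per_core = 1.3, 1.0          # Zen core normalized to 1.0
i9_total = 10 * skl_per_core                   # 13.0 Zen-core equivalents
print(f"stock: TR needs > {i9_total / zen_per_core:.0f} busy cores")   # > 13
print(f"overclocked (~1.45x gap): > {10 * 1.45:.1f} busy cores")       # > 14.5
# On this model TR only pulls ahead when nearly all 16 cores are loaded.
```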


I know you will likely come back again saying 'Yeah, but what about the latency introduced by the 2-CCX + 2-module design...'. Truth is, though, you don't really know anything about the real-world performance effects of the latencies on a Threadripper or Epyc. You speculate a lot and try to discuss the topic like you have all the facts, but you really don't know how it performs in real-world environments.

I'm sure I won't have to wait long for the next change of tactic, where we discuss TDP yet again, or that sub-group of users who have some issue that may be errata-related, and dang if AMD hasn't fixed it within 2 months of the problem surfacing (ignoring Intel taking a year to fix similar identified errata in their processor line, only resolved within the last year).

What I know is that everything I have been saying about TDP and latencies has later been confirmed by reviews. I am not speculating. I know this will happen for TR as well, because the laws of physics tell me it will: the inter-die latency is about one order of magnitude higher than the intra-die (CCX-CCX) latency measured in RyZen.

Also, people weren't complaining because AMD didn't fix the segmentation fault within two months; people were complaining because AMD ignored them for five months and only acknowledged the issue when affected users organized to make it go viral and get attention outside the AMD forums.
 