source?
Take the per-core performance gap measured by AMD and by the leaks reported on the previous page. Then take the equation that gives multithreaded performance and solve it; the crossing point comes out at about 26 threads, more or less.
So, speculation stated as fact.
Got it.
Repeating the same operation with the CB scores gives a ~20% performance gap per core.
Yep, around 20% difference when comparing a stock 7900X @ 4.0GHz all-core turbo against a stock TR 1950X @ 3.5GHz all-core turbo. But when we compare against the 16c 7960X, the performance gap drops to ~9%. And as we know, there is no such thing as a 16c / 32t Core i9 with a 4.0GHz stock all-core turbo.
Here's some really rough math where the core scaling penalty for the TR 1950X is already accounted for (doh...) and scaling is perfect for the i9 7960X. I guess I should use 1.55-1.58 as the multiplier instead to get a more accurate guess, but nah, let's give Intel linear scaling. CB scores come from the last page.
Code:
Stock scores: all core turbo
i9 7900X Score: 2200p | 4.0GHz (10c / 20t)
i9 7960X Score: 3168p = 2200p * 1.6 * (3.6/4.0) | 3.6GHz (16c / 32t)
TR 1950X Score: 2900p = 3400p * (3.5/4.1) | 3.5GHz (16c / 32t)
3168p / 2900p =~ 1.09, so the 7960X is ~9% faster than the TR 1950X
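For anyone who wants to rerun the numbers, the estimate above can be reproduced in a few lines. This is only a sketch of the same rough model (scores scale linearly with core count and all-core clock); the helper name `scale_score` is made up for illustration, and the 2200p and 3400p baselines are simply the CB figures quoted in this thread.

```python
def scale_score(base_score, core_ratio=1.0, clock_ratio=1.0):
    """Scale a CB score linearly by core-count and clock ratios (rough model only)."""
    return base_score * core_ratio * clock_ratio

# i9 7960X: extrapolated from the 10-core 7900X (2200p at 4.0 GHz) to 16 cores at 3.6 GHz.
i9_7960x = scale_score(2200, core_ratio=16 / 10, clock_ratio=3.6 / 4.0)

# TR 1950X: the 3400p result at 4.1 GHz scaled down to an assumed 3.5 GHz all-core turbo.
tr_1950x = scale_score(3400, clock_ratio=3.5 / 4.1)

gap = i9_7960x / tr_1950x - 1
print(f"7960X ~{i9_7960x:.0f}p, 1950X ~{tr_1950x:.0f}p, gap ~{gap:.0%}")
```

Swapping the 1.6 core ratio for something like 1.55-1.58 shows how much of the gap depends on granting Intel perfect scaling.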
That is quite a bold claim. The 7900X doesn't have the capacity to fully utilize 24 threads. It chokes after 20 threads if they are executed at full load at the same time. You really should know that adding more threads only hampers performance if the CPU doesn't have enough hardware threads to execute them concurrently; they just start to stall more because of thread swapping.
Using more rough math and CB as example:
Threadripper
16 cores (16t, no SMT) at 3.5GHz: 2900 * 0.8 = 2320p
1 extra SMT thread: (2900 - 2320) / 16 =~ 36p
16c + 8 SMT = 2320 + 36 * 8 = 2608p
24t load: 7900X 2200p < TR 1950X ~2600p
In reality that 2200 score would be a bit lower because 7900X can't magically shit out moar cores to increase performance.
Here's my bold claim: 16c / 16t TR is going to be faster than 10c / 20t 7900X on many workloads.
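The 24-thread estimate can be written as a small function. This is a sketch under the same assumptions as the rough math above: a full 32-thread score of 2900p, SMT contributing about 20% of that score, and threads filling the 16 physical cores before any SMT sibling is used. The names (`tr_score`, `SMT_SHARE`) are hypothetical.

```python
TR_FULL_SCORE = 2900   # estimated 1950X score with all 32 threads (from the post above)
SMT_SHARE = 0.2        # assumption from the post: SMT accounts for ~20% of the full score
CORES = 16

def tr_score(threads):
    """Estimated 1950X CB score for a given thread count under the rough model."""
    core_part = TR_FULL_SCORE * (1 - SMT_SHARE)      # 2320p from 16 physical cores
    per_core = core_part / CORES                     # 145p per physical core
    per_smt = (TR_FULL_SCORE - core_part) / CORES    # ~36p per extra SMT thread
    physical = min(threads, CORES)
    smt = max(0, min(threads, 2 * CORES) - CORES)
    return physical * per_core + smt * per_smt

I9_7900X_SCORE = 2200  # 10c / 20t; gains nothing from threads beyond 20

print(round(tr_score(24)), ">", I9_7900X_SCORE)   # 24-thread load
print(round(tr_score(16)), ">", I9_7900X_SCORE)   # 16c / 16t, SMT unused
```

The function returns about 2610p at 24 threads rather than 2608p because it keeps the per-SMT-thread value at 36.25p instead of rounding it to 36p.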
Am I reading it right that you're saying that the 16C 32T i9 will only theoretically get a 2200 CB15 multithread score? I personally find that funny, as my overclocked 6950X gets more than that.
As with all things - time will tell.
A 10% gap has been pretty commonly accepted in general compute for some time, even by AMD users.
Given how noncompetitive AMD was with the Bulldozer uarch, this is quite impressive, but of course some will tell you otherwise.
His early projections were more like 30-40% gaps, so getting within 10% of Chipzilla's performance is a good showing.
Here's my bold claim: 16c / 16t TR is going to be faster than 10c / 20t 7900X on many workloads.
The 16C TR will be about 30% faster (using your own math) than the 10C SKL only on throughput-optimized workloads with large SMT yields that scale up to 32 threads, such as CB [*]. The 16C TR will lose on everything else: throughput-optimized workloads that cannot use the extra 60% cores and have low SMT yields, latency-optimized workloads, and AVX 256/512 workloads.
[*] It doesn't look fortuitous to me that CB scores are being leaked, whereas other metrics aren't leaked. Just as it was not fortuitous that CB and CPU-Z scores were leaked for RyZen, but other metrics weren't, because reviews of RyZen demonstrated that CB and CPU-Z were optimal points.
You should look at the review of EPYC. It doesn't lose at everything, actually it does well enough to top the charts in nearly all benches even using that 9mth build that was optimized for SKL you have been crying about for the other test.
At what point was "everything else" transcribed as "everything"?
The STH EPYC review pretty much summarizes my point about TR. The 64-core EPYC server matches a 36-core SKL Xeon on a compile workload and then beats the Xeon on a rendering workload thanks to having 78% moar cores.
It has a very low base and turbo, so yes, it is possible. Comparing it with an overclocked 6950X, likely at around 4.5GHz, is what kind of makes this funny.
So in the workloads it is catered towards it does well...gotcha, never thought that was a bad thing.
I mean, they don't need special compilers and AVX to restrict use to certain features and monopolize a segment. In domains where Intel and AMD are treated evenly, AMD pulls ahead far too many times for the boys in blue to like.
It does well on throughput-like tasks such as rendering and encoding, but we don't need reviews for that. Some of us predicted, before launch, that it was going to shine on such tasks. Too bad those throughput-like tasks often run better on alternative hardware, including GPUs.
Hum, customers seem very happy about using AVX. Many of them have shared with us their experiences of how AVX improves performance and efficiency on their real-life workloads (in some cases by huge amounts, such as 2X). And of course no one is prohibiting AMD from making a CPU that shines on AVX workloads, which makes all your talk about monopolies even less relevant.
No, you are reading it wrong.
Stock clock scores with my math:
i9 7900X Score: 2200p
i9 7960X Score: 3168p (math to get this score: 2200p * 1.6 * (3.6/4.0))
TR 1950X Score: 2900p (math to get this score: 3400p * (3.5/4.1))
The second math part was for the 7900X running against the TR 1950X while both are utilizing 24 threads.
Oh well, my math was a bit wrong. TR is actually getting a little over 3000p in CB; I thought the all-core turbo would be 3.5GHz and calculated that it would only get 2900p with that.
The real all-core turbo for TR is 3.7GHz.
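As a quick check of that correction, the same linear clock scaling can be rerun with the 3.7GHz figure. This is a sketch only: it assumes the score scales in direct proportion to the all-core clock, and `estimate_score` is a hypothetical helper.

```python
MEASURED_SCORE = 3400   # CB score at 4.1 GHz, as quoted earlier in the thread
MEASURED_CLOCK = 4.1    # GHz

def estimate_score(all_core_turbo_ghz):
    """Scale the measured score linearly to a different all-core clock (rough model)."""
    return MEASURED_SCORE * all_core_turbo_ghz / MEASURED_CLOCK

for clock in (3.5, 3.7):
    print(f"{clock} GHz -> ~{estimate_score(clock):.0f}p")
```

At 3.7GHz the estimate lands a little above 3000p, in line with the corrected figure.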
EDIT: pic of the day (from one techsite):
Look how big the difference is
Kyle needs to make his graphs look like this.
Did Juanrga make that?
In general computing AVX is a bag of _____, so it is immaterial at this point.
No one here is discussing single-thread performance. Multithread performance is given by the product of the performance of a single core and the number of cores. Intel having a faster core is the reason why AMD needs 60% moar cores to get 30% higher performance on workloads that scale to all 16 cores.
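That relation can be turned around to see what per-core gap the "60% more cores for 30% more performance" figures imply. A minimal sketch, assuming perfect scaling across cores and ignoring SMT and clock differences; the function and variable names are made up for illustration.

```python
def multithread_perf(per_core_perf, cores):
    """Idealized multithread performance: per-core performance times core count."""
    return per_core_perf * cores

core_ratio = 16 / 10       # TR 1950X cores vs 7900X cores (60% more)
throughput_ratio = 1.30    # TR ~30% faster on workloads that scale to all cores

# Per-core advantage the Intel part needs for both ratios to hold at once.
intel_per_core_advantage = core_ratio / throughput_ratio
print(f"Implied Intel per-core advantage: ~{intel_per_core_advantage - 1:.0%}")

# Sanity check with AMD per-core performance normalized to 1.0.
amd = multithread_perf(1.0, 16)
intel = multithread_perf(intel_per_core_advantage, 10)
print(f"AMD / Intel throughput: {amd / intel:.2f}")
```

The implied per-core advantage comes out at roughly 23%, in the same range as the ~20% per-core gap quoted earlier in the thread.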
On workloads that don't scale up to 32 threads, the TR chip will be slower because each core is slower. For instance, with 24 threads loaded on both chips, Intel would be faster. And all this is about comparing both chips at stock settings. The i9 has higher overclocking headroom. Thus with both chips overclocked the gap will increase from ~30% to ~45%, and TR will be faster (~15%) only when all cores are fully loaded.
It seems my prediction that the Intel 10C chip would be faster than TR even under 24-thread workloads was rather good.
Also we see that both TR models are slower than the 1800X in games, as expected. I anxiously await the 1900X vs 1800X comparison.
No, it seems you didn't read the hardware.fr review. It actually proves my prediction and makes yours look silly (and completely wrong). Yes, the 7900X is a tiny bit faster than the 1920X when comparing across multiple programs, where some of the programs can't utilize all those threads properly.
Every time the programs can fully utilize 24 threads, the 1920X is either neck and neck with the 7900X or ahead.
TR is very impressive for a MCM/NUMA part. The fact that it is in the same league as Intel's monolithic mesh networked Xeon parts is quite remarkable.
They have used lots of programs that scale up above 24 threads, and the average reflects that.
That is not true. There are workloads that scale above 24 threads in which the 12C 1920X is slower than the 10C 7900X, because the SKL CPU has much higher performance per core.
Woohoo, you bring a couple of cases where you show that the 7900X can win against the 12C 1920X. I never even argued that it couldn't. My line clearly says that the 7900X is a bit faster (3%) than the 1920X when using averages (according to the review you chose), but I guess you didn't bother to read that. But why are you even talking about the 1920X? Your original claim was that the 1950X will lose against the 7900X when using <24 threads. My claim is that it will not on most workloads until the thread count drops to 14-15 threads, and I've done the math.
I do like the Hardware.fr testing methodology, but the RAM (2400MHz) they decided to pair with TR gives it a big perf. penalty.
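One way to land on a figure in the 14-15 thread range is to reuse the rough CB model from earlier in the thread. This is only a sketch, not necessarily the math the poster actually did: it assumes 145p per TR physical core (2900p * 0.8 / 16) and holds the 7900X at its full 2200p regardless of thread count. All names here are hypothetical.

```python
import math

TR_PER_CORE = 2900 * 0.8 / 16   # ~145p per physical TR core, from the earlier rough math
I9_7900X_FULL = 2200            # 7900X score with all 20 threads loaded

def break_even_threads(target_score, per_core=TR_PER_CORE):
    """Number of TR cores (one thread each) at which the estimate equals target_score."""
    return target_score / per_core

break_even = break_even_threads(I9_7900X_FULL)
print(f"Break-even at ~{break_even:.1f} threads")
print(f"TR is ahead from {math.ceil(break_even)} threads up, behind at {math.floor(break_even)} and below")
```

Under these assumptions the 1950X stays ahead down to 16 threads and falls behind at around 15, which is the neighbourhood of the 14-15 thread claim.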
The disadvantages of the MCM approach versus a monolithic die are two: higher latency and higher power consumption.
Highly-parallel workloads such as rendering and encoding are throughput-like, and it matters little if you run those in a monolithic die or in a four-socket system, because those workloads aren't sensitive to latencies. Blender, V-RAY, Handbrake,... run fine on MCM.
The performance disadvantage of the MCM approach comes in latency sensitive workloads, when one core in one die has to interchange information with a core in another die.
You are correct, but my point is that they minimized those disadvantages most impressively, and in such a way as to allow Chimpzilla to viably compete with the 800lb Gorilla's 800lb HEDT platform.
Low-speed memory like that will harm the AMD system more than Intel due to the Fabric speed. He knows that, and that's why he's picking that review site.
It's OK. No need to point out the 16C power usage against the 7900X or state the fact that the CPU was not fully utilized on TR. Yeah, no need to point out that with just a little optimization TR will only get better, and based on previous Ryzen optimizations the gains can be quite huge. And no need to point out the platform advantage TR has with higher IO and thus far better operation as far as power delivery and temperature.
With workloads that fully load all cores, the 16C TR is drawing more power than the 7900X, as expected, but the interesting part here is that AMD is using a trick on TR to keep power under control: cores are downclocked below base frequency when the CPU is too heavily loaded: "Moving under the base frequency is however something annoying, even if it is not the first time that we see this behavior at AMD." Reviewers also noted that some watts go missing on the way from the wall to the socket. Their current hypothesis for this discrepancy is that the CPUs are drawing the missing watts outside the ATX12 channel: "Which makes us wonder if these processors would not draw a portion of their power from the 24-pin ATX connector."
The TR does some strange stuff when it starts getting thermally loaded, and I would suggest that NO ONE had really good cooling for it for reviews, except us.