The 32-core Threadripper is coming out.

I could use 32 cores... I don't NEED them... but I could use them :)

So damn true. I think AMD is just kicking ass. I mean, seriously, I paid $1000 for my TR 16/32 1950X, and for 50% more I can double the core/thread count. I really shouldn't, but damn do I really want to, and its price is at a point where I can say it's definitely not out of the question. :D
 
Exactly this - the price of high-core-count workstations is insane. Heck, look at the iMac Pro.

Standing up an equivalent-spec machine in AWS EC2 for a man-year (7.5 hours × 230 days) is $5,888 per year (no point in reserved instances when you can get the developers to actually shut the f'ing things down).
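For anyone checking the math, a quick sketch; the hourly rate here is an assumption back-solved from the $5,888 figure, not a quoted AWS price:

Code:
#include <stdio.h>

int main(void) {
    /* Developer-workday usage pattern from the post above. */
    double hours_per_day = 7.5;
    double days_per_year = 230.0;
    /* Assumed on-demand rate, back-solved from the $5,888/year figure. */
    double rate_per_hour = 3.41;

    double hours = hours_per_day * days_per_year;   /* 1725 hours/year */
    printf("%.0f h/year x $%.2f/h = $%.0f/year\n",
           hours, rate_per_hour, hours * rate_per_hour);
    return 0;
}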

My concern is how the memory controller is accessed by the second pair of dies (through Infinity Fabric): if the running threads are sensitive to memory bandwidth, the gains from a 32-core Threadripper may not be as high as we hope.

For threads that don't care about memory bandwidth, of course, this thing's price/performance may kick Intel's nuts into their chest :)

I wonder if the 32-core Threadripper will be able to assign processes to cores based on memory bandwidth. The cores with memory controllers get the high-bandwidth processes, and the cores accessing memory through Infinity Fabric get the lower-bandwidth ones? Or could that be put in the compiler, to swap processes around? This is not an "every core is equal" situation - it kinda reminds me of the Sony Cell processor used in the PS3 (which worries me, as we all know what happened to it).
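For what it's worth, that placement decision normally lives in the OS scheduler rather than the compiler - on Linux you can already do it by hand with CPU affinity. A minimal sketch, assuming (purely hypothetically) that cores 0-15 sit on the dies with local memory controllers; tools like numactl do the same thing from the shell:

Code:
/* Pin a bandwidth-hungry process to the memory-attached dies (Linux). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = 0; cpu < 16; cpu++)   /* hypothetical core numbering */
        CPU_SET(cpu, &set);

    /* 0 = the calling process; the kernel will only schedule it on these cores. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    /* ... memory-bandwidth-sensitive work runs here ... */
    return 0;
}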
 

Two dies will access memory through the controllers on the other two dies, which will generate both latency and bandwidth bottlenecks in memory-bound workloads.

I wonder if AMD will include three operating modes:

Full. All cores active, for compute-bound workloads and toy benchmarks like Cinebench.

Half. Only the dies with memory controllers active, for memory-bound workloads.

Minimal. Only one die active, for latency-bound workloads.

Even if AMD doesn't implement those modes in the BIOS, such configurations should be the most favorable for specific kinds of applications.
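Nothing confirms AMD will ship such modes, but you can approximate them per-process from user space today. A sketch of the "Minimal" case with libnuma (the node numbers are assumptions):

Code:
/* Approximating the hypothetical "Minimal" mode per-process from user
   space instead of the BIOS. Requires libnuma (link with -lnuma). */
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }
    numa_run_on_node(0);    /* run only on node 0's cores          */
    numa_set_preferred(0);  /* allocate from node 0's local memory */

    /* ... latency-bound workload here. A "Half" mode equivalent would
       bind to the set of memory-attached nodes instead of one node. */
    return 0;
}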
 

It's a highly unusual configuration. I eagerly await the benchmarks to tell us what the impacts will be.
 
Yup, 32 cores is batshit insane for home users and probably like 95% of companies worldwide.

I've got a fake 8-core CPU (4 real cores plus SMT), and even then I'll be lucky if the logical cores perform anything like the real four - the gain can be anywhere from nothing to a lot.

Soooo, judging by that, 32 cores = a 98-inch epeen, but the epeen changes hands every other week.

Only because his arm gets tired.

For what it's worth, I work in enterprise software support and I could easily use 48 cores for a home lab running an instance of our software. It seems like microservices are pushing the need for more cores, and I think more businesses are moving towards that model these days.
 

Cool
 

So just to be clear.

Latency matters when AMD SYSTEM memory is involved and we're dealing with a delta of 30-70 ns.

But when Optane memory operates at orders of magnitude greater latency than RAM (aka 15 µs, aka 15,000 ns, aka 500x the delta above), that's apparently fine; just add them together.
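Spelling out the arithmetic with the figures quoted in this thread (ballpark numbers, not measurements):

Code:
#include <stdio.h>

int main(void) {
    /* Figures as quoted above. */
    double numa_delta_ns = 30.0;     /* low end of the 30-70 ns cross-die delta */
    double optane_ns     = 15000.0;  /* ~15 us Optane access = 15,000 ns        */

    printf("Optane latency / NUMA delta = %.0fx\n", optane_ns / numa_delta_ns);
    return 0;   /* prints 500x */
}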

Shenanigans

Edit: fixed my ms/µs mix-up.
 
I have a feeling it will be minimal. It will be interesting to know for sure, though.

Is AMD releasing a Zen+ 16-core Threadripper? That way you could have a CPU without the potential memory bottlenecks of the 32-core part, yet still take advantage of the Zen+ improvements to be faster than the original Threadripper.

As a bonus, it would not only be cheaper but would also probably work in any current Threadripper motherboard without worrying about the VRMs catching fire.

Come to think of it, it would be easy for AMD to do this, as they could disable dies on failed 32-core TR+ CPUs to make the 16-core ones.
 
Why not just get a 1950X, since they're getting pretty cheap? You can always upgrade later to any other processor for the platform.
 

Mostly to take advantage of the improvements - if TR+ scales the same way from TR as the 2700X did from the 1800X, then it would have some appeal, especially if someone is running a first-gen TR motherboard that doesn't have the power delivery to run a 32-core TR+ properly.

Of course the keyword is *IF*...
 
Given that the 32-core is being called the 2990X, I'm going to bet there will be a 16-core. The 2950 or 2970 will probably be the 24-core, the 2920 or 2950 the 16-core, and maybe a 2900 or 2920 will be the 12- or 8-core. Due to the layout of the dies and how Ryzen works, there are tons of possibilities for core counts between 8 and 32.

I also agree: if I were going to go Threadripper, I'd rather go with the Zen+ architecture over the first gen, even if the first gen would save me 50-60% up front. If PB2 works the same way on Zen+ TR as it does on AM4 Ryzen, it's worth the cost if you need the cores/threads.
 
Code:
CPU      air average    LN2 record
----------------------------------
1800X    4033MHz        5803MHz
1900X    3909MHz        4797MHz

Who runs the 1900X on air? And LN2 records mean nothing for the average consumer; e.g., the 8700K has a higher LN2 record than the 8086K, last time I checked.

Here are the HWbot averages for water:
2700X - 4285 MHz
2700  - 4155 MHz
1900X - 4163 MHz
1800X - 4097 MHz
1700X - 4027 MHz
1700  - 3994 MHz

The sample size for the 1900X is pretty small, but it is clearly the best of the first-gen 8-core Ryzens and even edges out the vanilla 2700.

HWbot definitely has its flaws for determining which chip will perform best. It is entirely possible that the 1900X shows nearly a 200 MHz advantage over a 1700 because 1900X owners are a group more willing to push their chips to the limit. It is hard to say. But again, from the few reviews I have seen, the 1900X is a boss for overclocking by first-gen Ryzen standards.
 

It'll be interesting to see what the difference is versus Epyc; you could certainly end up starved, and it's all the more likely with the higher clock speeds.

Gut feeling is that it will be good for some workloads and bad for others. If you're running VMware with large VMs, I suspect it would be shit, because the cores aren't equal and the scheduler will get fucked up waiting.

If it's something that can be parallelized, and is asynchronous with it, then we're all golden. For me it'll mostly be that sort of workload - processing point clouds and running photogrammetry; the latter can be a bit of a bitch for coherency, but we'll see how we go.

It'd be sweet if they released a 4 GHz Epyc at the same time. :nailbiting:

Dual-CPU 4 GHz Epyc with a motherboard like the ASUS Sage - that's what we really want :D
 
If it retails for $1500, I will definitely wait a year before purchasing one. It'll come down in price later on...
 

I had a 1950X. It had a lot of cores - a little impressive. It was decent at rendering and video compression. Gaming was so-so.

Skylake-X is much better at gaming and does quite well at compression and rendering, etc.

However, Zen+ should deliver quite an improvement thanks to the cache latency and Infinity Fabric updates. I may toy with a 32-core again. What I'm really waiting on is Zen 2 and the next iteration of HEDT.
 

Those HWbot averages track with my own experience.
 

You seem to be confusing the use case of Optane DIMMs. They are for large working datasets. For instance, if you have a 4 TB working dataset (per node), and most 2S nodes only do ~1 TB at a reasonable cost, you are paging to your storage (SSD) a lot. With Optane DIMMs, you can have 512 GB/1 TB modules (6-12 TB of memory across 12 DIMMs), so that your entire working set is in memory. Depending on the workload, the idea is that a constant 10x latency compared to regular DRAM would be less of a penalty than constantly paging to storage (i.e. 1x DRAM latency plus paging at ~50-100x storage latency).
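A toy model of that tradeoff; every number below is an assumption chosen to match the rough ratios in the post, not a measurement:

Code:
#include <stdio.h>

int main(void) {
    /* Assumed latencies, roughly matching the ratios in the post above. */
    double dram_ns   = 100.0;     /* regular DRAM access            */
    double optane_ns = 1000.0;    /* Optane DIMM at a constant ~10x */
    double ssd_ns    = 100000.0;  /* NVMe page-in, ~1000x DRAM      */

    /* 4 TB working set on a 1 TB DRAM node: pessimistically assume
       ~75% of accesses miss DRAM and page in from storage. */
    double miss = 0.75;
    double dram_plus_paging = (1.0 - miss) * dram_ns + miss * ssd_ns;

    printf("DRAM + paging: ~%.0f ns average access\n", dram_plus_paging);
    printf("all-Optane:    ~%.0f ns flat\n", optane_ns);
    return 0;   /* ~75,025 ns vs 1,000 ns: the flat 10x penalty wins easily */
}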
 

I'm not confusing anything.

I'm pointing out the hypocrisy that some apply to certain metrics out of some unusual brand loyalty. Here's a post from the related discussion on it.

No! Optane is not memory, it is storage. The more I think about it, the absolute distinction is timing. If you have variable timing you are storage, if you have a fixed timing you are memory.

Now that I reflect on it, I cannot imagine that the DIMMs intel is loading into its servers meet the requirements of the interface. A CPU cannot change timings on the fly, so whatever they expose to the system must be contiguous and of a fixed size. A requirement optane will never be able to meet.
https://hardforum.com/threads/pc-oe...cache-drives-and-claiming-its-memory.1962607/
I have yet to see a DIMM in the field other than a picture of it.

The point still stands. You can't pretend to care about deltas within the same order of magnitude while waving away deltas of whole orders of magnitude.
 

Apache Pass already launched. I have seen many in the field.

https://www.anandtech.com/show/12828/intel-launches-optane-dimms-up-to-512gb-apache-pass-is-here

And I'm totally not understanding you about orders of magnitude. If I'm reading the block diagram of TR2 correctly, they're essentially having to disable the memory controllers on 2 of the dies (though presumably a motherboard manufacturer could put the traces in to enable them). Epyc gets its high core count from the scaling of Infinity Fabric. However, in bandwidth-bound workloads I can definitely see a relatively high performance drop if you're always fetching cross-NUMA (Epyc basically acts as 4 NUMA nodes in 1S). Skylake-X is a large monolithic die and doesn't have this problem; hence it can gluelessly scale to 8S (8 NUMA), whereas Epyc caps out at 2S (8 NUMA as well). It's just that large monolithic dies can't scale well within a socket (hence losing the core-count-per-socket war).

I mean, a hypothetical test to emulate TR2 would be to get an old 2S Opteron/Xeon (Nehalem or later) system and only populate memory on one socket, then run various benchmarks where you force affinity cross-NUMA. Personally, I've seen even modestly bandwidth-demanding apps take relatively large performance penalties (compared to when memory is populated on both sockets).
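A rough sketch of that experiment on Linux with libnuma (the node numbers and buffer size are arbitrary; link with -lnuma):

Code:
#include <numa.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE (512UL * 1024 * 1024)   /* 512 MB test buffer */

int main(void) {
    if (numa_available() < 0) return 1;

    numa_run_on_node(1);                         /* execute on node 1...      */
    char *buf = numa_alloc_onnode(BUF_SIZE, 0);  /* ...memory lives on node 0 */
    if (!buf) return 1;
    memset(buf, 1, BUF_SIZE);                    /* fault the pages in */

    struct timespec t0, t1;
    volatile long sum = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < BUF_SIZE; i += 64)    /* one read per cache line */
        sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("remote-node pass: %.3f s (sum=%ld)\n", sec, sum);
    /* Rerun with both node arguments the same for the local baseline. */
    numa_free(buf, BUF_SIZE);
    return 0;
}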
 
https://www.anandtech.com/show/1295...ylake-servers-already-have-8-dimms-per-socket

There is newer info out there. Notice the special slots provided on those motherboards. I look forward to getting further details on where it sits in the system.

I really have no opposition to it or to either technology. As a product, Optane is at the top of its field, though Intel may be overreaching on some of their marketing claims. They may also have been surprised by how quickly competitors' drives closed the gap with existing technologies.

AMD is certainly blending certain technology tiers with this move. As spastic as AMD marketing can be, they don't seem to be over-hyping this, for whatever reason. Maybe it will speak for itself; maybe it will only get marginal gains from the imbalanced chip and they don't want to over-hype it Bulldozer-style.

As for orders of magnitude: RAM sits in one order of magnitude regardless of how many channels are provided, and storage sits in another. No one should blend the two.
 

Wiring the DIMMs to the socket isn't enough if the CPU itself cannot make use of them, and enabling the memory controllers on all four dies would cause problems on mobos lacking half the wiring. This is similar to what happened with Carrizo vs Carrizo-Lite: in the end, motherboard makers chose the less problematic option, single channel for both. In this case it is AMD making the decision, to maintain backward compatibility with current mobos.
 
We will not see 8 memory channels on TR4. AMD would have released a new chipset if that were the case.

New chipset, yes. But I wonder if the socket could do it.

What I mean is: is it possible that a future chipset could be released with a new crop of motherboards, enabling full 8-channel memory on the TR4 socket? It would probably break backwards compatibility (i.e. you couldn't put an 8-channel chip in a 4-channel board, though possibly the reverse would work). But at some point, if the performance difference is large, it may be worth doing.

I dunno. Just spitballing here.
 
DDR5, which should be here in 2020-2021 (in whatever comes after Zen 3), may eliminate this need. I assume Zen 3 will not have DDR5, since I believe it would be incompatible.
 
Available interconnects will not allow it. I'm thinking there will be less inter-die latency on TR2 than on Epyc.

Theoretical latency will be smaller, because TR 2000 will have higher clocks for the IF. But all Epyc dies can access memory directly, whereas 50% of the dies on TR 2000 have to access memory through the memory controllers of the other two dies, which will generate bottlenecks from moving data back and forth.
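Back-of-envelope on why that matters for bandwidth: the socket's total bandwidth stays fixed while the core count doubles. The channel figures below are assumptions (quad-channel DDR4-2933, ~23.5 GB/s per channel):

Code:
#include <stdio.h>

int main(void) {
    /* Assumed: quad-channel DDR4-2933, ~23.5 GB/s per channel. */
    double per_channel_gbs = 23.5;
    double channels        = 4.0;
    double total_gbs       = per_channel_gbs * channels;   /* ~94 GB/s */

    printf("16-core TR:  %.1f GB/s per core\n", total_gbs / 16.0);
    printf("32-core TR2: %.1f GB/s per core\n", total_gbs / 32.0);
    return 0;   /* same socket bandwidth, half the per-core budget */
}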
 

Intel's X299 can support both dual-channel and quad-channel memory configurations, so it's technically possible to do a similar thing with quad versus octo, but it would require a whole revamp of the platform.

Also, when X58 was first released there were dual-channel-only boards, and populating a quad-channel board with only two DIMMs makes the system run in dual-channel mode, so there's nothing to say you couldn't put an octo-channel chip in a quad-channel board.


It's just not going to happen any time soon.
 
I mean, it's great that AMD made TR2 backwards compatible with TR1, but I feel they should've just gone the full (8) memory channel route. Having 2 NUMA nodes without memory will make for inconsistent scaling, IMO. I mean, it's entirely possible to fit an 8-DIMM Epyc board in standard ATX (see below).

https://www.servethehome.com/asrock-rack-epycd8-single-socket-atx-server-motherboard/

It would potentially eat into their enterprise-market profits, which is a no-go for them. Companies would just forgo the warranty guarantees that enterprise chips come with to save money by getting TR2 instead.
 

Not really. AMD could find ways to make the sockets pin-incompatible and force market segmentation via platform specs.

In fact, the TR4 socket is an SP3r2 socket, derived from, yet incompatible with, SP3.
 
That's basically what it is, Juan. They do it their way, without 8-channel support, and that is the market segmentation: you want 8 channels, buy Epyc. AMD's customers get a lot for low cost as it is, and it leaves room for future platform upgrade options.
 
TR2 is looking like it might land on my desk, but we'll have to see how it fares in gaming since it will be my all-in-one rig. I'd use it 75% for gaming and 25% for productivity, and when I do productivity I push the living shit out of my box.
 

Existing TR is fine for gaming, so I can't see the new TRs being any worse.
 