AMD Ryzen Threadripper 2990WX & 2950X CPU Review @ [H]

Windows 10 Pro 64-bit supports up to 256 cores per CPU. I hope we can put this to rest now.

That being said, the scheduler might be absolute garbage in Windows 10 Pro and only improves as you move up to Enterprise (I'm speculating here).
I don't expect that using Windows Enterprise or Server will help. Windows may scale up to 256 cores on server workloads, but possibly not on HEDT/workstation workloads.

Linus is at least surprised at how badly Windows performs in some of the benchmarks. Also, not all benchmarks where the 2990WX came out ahead are perfectly tuned for parallelism.
http://openbenchmarking.org/result/1808130-RA-CPUUSAGED10

It seems to me that the 2990WX is performing better in the Phoronix review because the suite uses many microbenchmarks and toy-like workloads that fit into cache and avoid the latency/bandwidth penalties on the compute dies.
Certainly there are some benchmarks that fit your description, but far from all of them.

In particular, 7-zip compression and a Linux kernel compile don't fit into cache. CFD depends heavily on memory bandwidth, and from looking at some of the Windows results one might think that the TR 2990WX has hit a wall here (source: https://techreport.com/review/33977/amd-ryzen-threadripper-2990wx-cpu-reviewed/7):
[Chart: CFD results under Windows, from the TechReport review (WPCcfd.png)]

When in fact it could just be a peculiarity of Windows and/or the benchmark (source: https://www.phoronix.com/scan.php?page=article&item=amd-linux-2990wx&num=4).
[Chart: Rodinia results for the 2990WX under Linux, from Phoronix (rodinia-2990wx.png)]
 
The only thing they had going for them was compiler fuckery and SSE2 video applications, a bit like how they're trying with AVX512 today, except that's hardly used at all, so they can't pull the same trick. Whoops..

But we use the crap out of SSE2 today, for everything.

These instruction additions are meant to address x86's (and mostly x87's) inherent deficiencies and are doing a damn fine job of it. AVX512 continues down that road, and for workstation tasks, would be something highly desired. Hard to imagine building a compute-heavy application today without considering its use, especially given the latency and bandwidth benefits of having it done locally versus pushing it across the PCIe bus to GPU(s).
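
To make it concrete, here's a minimal sketch of my own (nothing from the review), assuming GCC or Clang with -mavx512f on AVX512-capable hardware. It just sums an array 16 floats at a time versus plain scalar code:

[CODE]
/* Toy example (mine, not from the review): summing floats with AVX-512
 * intrinsics vs. scalar code. Build with: gcc -O2 -mavx512f avx512_sum.c */
#include <immintrin.h>
#include <stdio.h>

static float sum_scalar(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

static float sum_avx512(const float *x, size_t n) {
    __m512 acc = _mm512_setzero_ps();               /* 16 floats per register */
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        acc = _mm512_add_ps(acc, _mm512_loadu_ps(x + i));

    float lanes[16];
    _mm512_storeu_ps(lanes, acc);                   /* spill and sum the 16 lanes */
    float s = 0.0f;
    for (int k = 0; k < 16; k++)
        s += lanes[k];
    for (; i < n; i++)                              /* scalar tail */
        s += x[i];
    return s;
}

int main(void) {
    float data[1000];
    for (int i = 0; i < 1000; i++)
        data[i] = (float)i;
    printf("scalar: %.1f  avx512: %.1f\n",
           sum_scalar(data, 1000), sum_avx512(data, 1000));
    return 0;
}
[/CODE]

The point being that the vector path is not exotic; a compiler will often auto-vectorize loops like this on its own if you let it.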

[Edit] And I'll add: I'd love to see the 28-core Intel compared to the 32-core AMD with an AVX512-optimized workload. Hopefully compiled separately for each, with optimizations for each, such that the difference between the two can be reasonably sussed.
 
Also, not all benchmarks where the 2990WX came out ahead are perfectly tuned for parallelism.

Parallelism isn't the only thing to consider here, though; it's how resources are distributed on the 29x0WX, specifically with respect to memory bandwidth. Having cores that don't have local memory access at all is definitely going to throw a wrench into things. Just being well-threaded might actually be a detriment in some cases if the process isn't also aware of how memory is distributed.
 
But they didn't say NOT to try more RAM, though ;)

If you've got some spare DIMMs, it might be worth the time to test it out.

A thought on this: the issue could be alleviated by more RAM, but more likely, performance in highly-threaded workloads is going to be constrained by having half of those threads on cores that are not directly connected to a memory controller.

But hell, try it if it can be tried.
 
It seems that part of the reason the 2990WX does poorly in games is the NVIDIA driver. The NVIDIA GeForce driver seems to have problems scaling to 64 threads.

https://www.golem.de/news/32-kern-cpu-threadripper-2990wx-laeuft-mit-radeons-besser-1808-136016.html (in German)

Having cores that don't have local memory access at all is definitely going to throw a wrench into things.
Do you have an example of a workload where this is reasonably the case? From what I have seen so far, either the operating system or the benchmark application itself appeared to not scale properly.
 
Beast Mode: ON

That 2990WX is fully unchained! I'm really, really looking forward to the coming couple/few years as professional applications and game engines are optimized to take advantage of that number of cores/threads.

My only minor nitpick: I wish AMD had enabled dual socket SMP for TR2. 64C/128T running at 3.0+ GHz for a fraction of the price of the same core/thread EPYC setup? YES PLEASE!
 
Do you have an example of a workload where this is reasonably the case? From what I have seen so far, either the operating system or the benchmark application itself appeared to not scale properly.

Uh, I'm getting at the 'not scale properly' problem; if an application attempts to use cores with remote memory access the same way it uses cores with local memory access, there are going to be performance issues. Whether that's the Windows scheduler not holding the application's hand or the application just not accounting for the difference is certainly a point of concern, but the overriding issue is that a processor with cores that are not directly connected to main memory is being used in a 'consumer' environment.
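
To put something concrete behind that: here's a rough Linux-only sketch of my own (not anything from this thread or the review) showing how an application, or whoever launches it, can at least see which cores have local memory before deciding where the memory-hungry threads land. On a 2990WX, two of the four NUMA nodes typically report no local memory at all:

[CODE]
/* Rough sketch (mine): list each NUMA node's CPUs and local memory on Linux.
 * On a 2990WX, two of the four nodes usually show 0 kB of local memory. */
#include <stdio.h>
#include <string.h>

int main(void) {
    char path[128], line[256];
    for (int node = 0; node < 16; node++) {
        /* Which CPUs belong to this node? */
        snprintf(path, sizeof path, "/sys/devices/system/node/node%d/cpulist", node);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                               /* no more nodes */
        if (fgets(line, sizeof line, f))
            printf("node%d cpus: %s", node, line);
        fclose(f);

        /* How much memory is attached to this node? */
        snprintf(path, sizeof path, "/sys/devices/system/node/node%d/meminfo", node);
        if ((f = fopen(path, "r")) != NULL) {
            while (fgets(line, sizeof line, f))
                if (strstr(line, "MemTotal"))
                    printf("  %s", line);
            fclose(f);
        }
    }
    return 0;
}
[/CODE]

Whether it's then the scheduler's job or the application's job to act on that information is exactly the argument above.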
 
Beast Mode: ON

That 2990WX is fully unchained! I'm really, really looking forward to the coming couple/few years as professional applications and game engines are optimized to take advantage of that number of cores/threads.

My only minor nitpick: I wish AMD had enabled dual socket SMP for TR2. 64C/128T running at 3.0+ GHz for a fraction of the price of the same core/thread EPYC setup? YES PLEASE!

AMD does not want to give away the market that buys EPYCs to one that can buy TRs for a lot less. Thus I very much doubt you will ever see multi-socket TRs. AMD is making some massive headway in the server market with EPYC, which is pretty much trouncing Intel. They won't jeopardize the huge server market for a fringe group that wants to run multi-socket TRs.

That said, there is really nothing stopping you from running a less-than-enterprise-class server on a 32-core/64-thread TR. It would handle server duties just fine as long as you are not expecting to need ECC memory or memory-access-heavy server apps.
 
Would it be fair to just swap 'Infinity Fabric (IF)' for 'uncore'? I get that the IF is just one part of uncore, albeit probably the most prominent relative to power draw, but what we're really talking about is how non-compute power scales considerably when the number of cores scales up, right?

While related, context matters. When routing a few connections on a less complex chip, you could blend the IF and uncore. Topologically, it's hard to lump all this into one comparable unit. The longer the lines, the more power they use. Why? Combinations. If you're using a point-to-point model where each connection is distinct and routeable:

nCr (specifically, the number of pathways between n objects is nC2 = n(n-1)/2):

2 = 1
3 = 3
4 = 6
5 = 10
6 = 15
7 = 21
8 = 28

Pathways can be directional and shared; this cuts down on the complexity of things, but now they must be arbitrated. Intel's ring and mesh are a good representation of certain tradeoffs between complexity and distance.

Threadripper1 is 2 chips linked together. You have one pathway. Threadripper2 is 4 chips with 6 pathways. Thus 6x.

This can apply to almost any topology.
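
If anyone wants to sanity-check the pathway math, here's a throwaway snippet of mine that prints the same table from the nC2 formula:

[CODE]
/* Toy snippet (mine): the number of distinct point-to-point links between
 * n dies is nC2 = n*(n-1)/2. */
#include <stdio.h>

static unsigned links(unsigned n) {
    return n * (n - 1) / 2;
}

int main(void) {
    for (unsigned n = 2; n <= 8; n++)
        printf("%u dies -> %2u links\n", n, links(n));
    /* Threadripper 1: 2 dies -> 1 link.  Threadripper 2 (WX): 4 dies -> 6 links. */
    return 0;
}
[/CODE]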

Now to the crux of all this. Is "uncore" getting bigger? Relative to other things as they get smaller? Yes.

Is it fair to say Anandtech's numbers are wrong or off when they explained in great detail and context what they were representing? No.

AMD is not going to win an uncore war against a ring or mesh bus. They are just different beasts. It's a tradeoff.
 
But we use the crap out of SSE2 today, for everything.

These instruction additions are meant to address x86's (and mostly x87's) inherent deficiencies and are doing a damn fine job of it. AVX512 continues down that road, and for workstation tasks, would be something highly desired. Hard to imagine building a compute-heavy application today without considering its use, especially given the latency and bandwidth benefits of having it done locally versus pushing it across the PCIe bus to GPU(s).

[Edit] And I'll add: I'd love to see the 28-core Intel compared to the 32-core AMD with an AVX512-optimized workload. Hopefully compiled separately for each, with optimizations for each, such that the difference between the two can be reasonably sussed.

Please name three useful applications that use AVX 512 instructions.

AMD could have implemented AVX 512; however, it takes up a lot of silicon real estate. Most tasks that AVX 512 instructions can be used for can be offloaded and handled way better and faster by a GPU. Intel does not make GPUs (their integrated graphics crap doesn't count), so of course, Intel has a vested interest in offering similar functionality in their CPUs, albeit way slower and way less efficient. AMD, on the other hand, makes the Radeon Instinct line, for example. So take a wild guess why for AMD it makes no sense to implement AVX 512. If you're still in doubt, remove any AVX multiplier limit in your BIOS (in case you have a Skylake-X CPU), fire up Prime 95, and watch your CPU temperature and power consumption go through the roof.

If management at Intel has any common sense, they will let that 28-core Xeon be. That presentation at Computex was reactionary, and it was meant to upstage AMD's own 32-core Cinebench scores, which are lower than those of Intel's overcooked Xeon chip. Nothing more. Intel can either hold out and hang around the mid-range until they can manufacture at 10nm, or they can follow AMD's lead and implement MCM in their designs. What do you think would be more cost-effective and profitable for Intel?
 
Please name three useful applications that use AVX 512 instructions.

Ouch!

AMD could have implemented AVX 512

And not only should they have, they will.

Intel does not make GPUs

They don't make discrete GPUs

Intel does not make GPUs (their integrated graphics crap doesn't count)

It absolutely does. Not only can you game on them (I do, and I have a 1080Ti in my main rig!), but they're fully featured and the drivers work very well.

so of course, Intel has a vested interest in offering similar functionality in their CPUs, albeit way slower and way less efficient.

You're entirely skipping over what happened to their second try at discrete GPUs: it became the very effective Xeon Phi accelerators, which also make heavy use of AVX. And unlike GPUs, both Intel Xeon CPUs and Xeon Phi accelerators support branching code running natively on each compute core.

So take a wild guess why for AMD it makes no sense to implement AVX 512.

Honestly, and in the interest of fair evaluation, they likely just hadn't gotten around to it. I get why they likely focused on refining other parts of Zen 1, especially given just how rough the memory controller was at release.

If you're still in doubt, remove any AVX multiplier limit in your BIOS (in case you have a Skylake-X CPU), fire up Prime 95, and watch your CPU temperature and power consumption go through the roof.

I have, many times before!

And it tells me that AVX512 is rather easy to put to use, and that it's capable of doing some real work.


Now, all of this is just to say that it's something that would have been nice for AMD to have included, and it's something that they lack in the workstation space. Again, if I were building apps that needed to do compute on the CPU, I'd want the latest AVX support in my code. If I were buying a workstation, I'd have to balance AVX512 support against a few more cores (at lower clockspeeds and lower IPC).

For myself, I'd probably still choose TR2, but that's because I have no personal need for these parts. Work is another matter entirely.
 
I went almost 10 years. Sorry to take this off topic again.
Just looked, pretty easy with only 65 ;) That looked like a worthwhile post though; saved it to call out BS.

Back on topic: wish I had the need/money for one of these, maybe just to play with.
 
Sorry if I am missing it, but was there any followup on the MSI MEG X399 Creation motherboard issue (not sure if you're allowed to share what the issue was or not)? I want to pull the trigger on this thing, but that somewhat cryptic note about it breaking has me concerned.
I had corrupted the UEFI on the board. Not sure how I did it, but I did. I used the UEFI Flashback feature with a USB stick and I was back up and running. Doing 2950X OCing now.
 
I haven't used Hyper-V in years. I was planning on giving that a shot. I know VMware fairly well, so that's where my comfort zone lies.

Really like Hyper-V now due to the simplicity and features, plus better networking and less downtime. Nothing against VMware, as it served us well, except their pricing strategy. We switched from VMware to Hyper-V a couple of years ago at our business and have no regrets. I wouldn't have done that 4 years ago.
On another note, I would love to see some type of multi-application benchmark. I've never seen someone produce what I would call a real-life scenario where you could benefit from all these cores.
For home use, I like running multiple VMs while still using Windows 10 as the host. I don't think I'm out of the norm for this type of configuration either. Some encoding going on in the background while working in Photoshop, etc.
 
My overclocked i7-920 at 4.0 GHz with a 5970 needed every bit of 1000 W almost 10 years ago... not sure what all the fuss is about systems using that much power.
 
I've pretty much only used Blender on Linux until very recently. Since I'm most familiar with Linux, I'm much more comfortable with it as a platform. All of the benchmarks I've seen with Blender on the 2990WX show Linux to be the best-performing OS with that processor. Given that I'm most comfortable with various Linux distros compared to Windows, and that the 2990WX benchmarks better on Linux, I'll just stick with Linux when I build a 2990WX system. As always, and as specifically indicated in my previous post, YMMV.

https://www.phoronix.com/scan.php?page=article&item=2990wx-linux-windows&num=1
There you go. Give us some scope. Phoronix, which does GREAT Linux content, showed a 14% decrease in render time in Blender. (I will type it out here to add value to the thread with data.) I would suggest that Windows is having some big scheduler issues with this CPU, and we will likely see that gap close considerably. I hope they show us the same tests with the 2950X.
 
I realize that time is money, but all that drama over 14% on a newly-released part benchmarked on a consumer OS?

Wild.
 
Sure, but are you replacing your production machine with just-released hardware?
Businesses constantly replace machines every year; it's not like every machine is on the same timeline. And with Intel not keeping sockets around long, these companies plan on buying fully new systems because of the ecosystem that Intel has created. You can buy four 2990WX boxes for the upgrade/replacement cost of a new Xeon workstation.
 
After reading and watching people trying to overclock this 2990WX chip, as well as the theoretical bits on power draw and heat output, it feels like the 2990WX is exceeding what the current "extreme" parts have to offer.

I watched buildzoid's MSI mobo breakdown
Theoretically, the 2990WX can exceed 2x EPS 8pin power connectors... which do 480W each...

So, over 960W of power draw on the CPU alone... that's insane
The MSI board has 1160W? of power delivery, and that's potentially not enough either?... that's insane
And when you have 960W+ of heat from the CPU to dissipate, even LN2 will run into thermal issues... that's insane :eek:

So... like... On water, I'm guessing you would need a peltier and water chiller or two to keep the 2990WX from thermal throttling when overclocked

TLDR: Mind = Blown
 
After reading and watching people trying to overclock this 2990WX chip, as well as the theoretical bits on power draw and heat output, it feels like the 2990WX is exceeding what the current "extreme" parts have to offer.

I watched buildzoid's MSI mobo breakdown
Theoretically, the 2990WX can exceed 2x EPS 8pin power connectors... which do 480W each...

So, over 960W of power draw on the CPU alone... that's insane
The MSI board has 1160W? of power delivery, and that's potentially not enough either?... that's insane
And when you have 960W+ of heat from the CPU to dissipate, even LN2 will run into thermal issues... that's insane :eek:

So... like... On water, I'm guessing you would need a peltier and water chiller or two to keep the 2990WX from thermal throttling when overclocked

TLDR: Mind = Blown
Well, that's 960 watts per the spec. They can go past that; they just might melt if you aren't careful.

Though yes, crazy amounts of power if pushed, in all the best ways of course.
 
After reading and watching people trying to overclock this 2990WX chip, as well as the theoretical bits on power draw and heat output, it feels like the 2990WX is exceeding what the current "extreme" parts have to offer.

I watched buildzoid's MSI mobo breakdown
Theoretically, the 2990WX can exceed 2x EPS 8pin power connectors... which do 480W each...

So, over 960W of power draw on the CPU alone... that's insane
The MSI board has 1160W? of power delivery, and that's potentially not enough either?... that's insane
And when you have 960W+ of heat from the CPU to dissipate, even LN2 will run into thermal issues... that's insane :eek:

So... like... On water, I'm guessing you would need a peltier and water chiller or two to keep the 2990WX from thermal throttling when overclocked

TLDR: Mind = Blown

I am no expert, but I think Buildzoid is wrong on that 480W number for an 8-pin connector.

I have seen multiple places that say 336W:
https://forums.servethehome.com/ind...r-supply-with-dual-8-pin-eps-connectors.8371/
http://www.jonnyguru.com/forums/showthread.php?p=97924

4 of the pins are ground (black), and the other 4 carry power (yellow).
Multiple people are saying 7 amps is the safe level for each pin.
4 pins x 7 amps x 12 V = 336 W, so the math makes sense.
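
And for anyone who wants to check the arithmetic both ways, a trivial snippet of mine; the 480W figure only works out if you assume roughly 10 amps per pin instead of 7:

[CODE]
/* Back-of-the-envelope EPS connector math (my snippet, using the numbers
 * quoted above): wattage = live pins * amps per pin * 12 V. */
#include <stdio.h>

static double eps_watts(int live_pins, double amps_per_pin) {
    return live_pins * amps_per_pin * 12.0;
}

int main(void) {
    printf("One 8-pin at 7 A/pin:   %.0f W\n", eps_watts(4, 7.0));      /* 336 W */
    printf("One 8-pin at 10 A/pin:  %.0f W\n", eps_watts(4, 10.0));     /* 480 W */
    printf("Two 8-pins at 7 A/pin:  %.0f W\n", 2 * eps_watts(4, 7.0));  /* 672 W */
    return 0;
}
[/CODE]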
 
Multiple people are saying 7 amps is the safe level for each pin.

This is going to be relative; I would probably agree (if I knew the spec) that 7A is probably a good 'safe' level of current, but every step along the way can be overbuilt.

To that end, with a motherboard and a PSU that are both at least marketed for overclocking, 'safe' might be a good bit higher.

Of course, the only way to know is the [H] way: hook it up and see how many amps it takes to pop or ignite something :D
 
I think in an art or production department rendering scenes all day, I'm not gonna cheap out on my rendering equipment; I'd go with either (multi-socket) EPYCs or Xeons. Time is money ;)

So brand loyalty renders faster than actual price/performance?
 
I think in an art or production department rendering scenes all day, I'm not gonna cheap out on my rendering equipment; I'd go with either (multi-socket) EPYCs or Xeons. Time is money ;)
Well, I have actually talked to some guys that do this for a living lately, and you are talking about a HUGE delta in price. Using your logic sounds fine until you actually consider having to pay for it and how much those platforms are costing you per minute.
 