Threadripper 1950X or 3950X for Linux workstation + nested virtualization.

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
30,408
So, switched up how my home lab works. Now have my 3960X (main workstation) and a 1950X (HTPC/Plex Server) running virtualized ESX hosts on NVMe drives, which means even as nested, performance is more than adequate on the guests - and I was able to consolidate down the number of running machines quite a bit too. Plus, both can game/encode/etc while running the nested hosts without even blinking...

Debating whether to pick up one more 1950X combo for the final piece, or grab a modern X570 board and a 3950X. I'd feed it 64G of RAM for now, an NVMe drive or two, plus a 10G card if needed, and drop Linux on it - giving 8c/32G to the nested host and leaving half(ish) for other workloads. That would give me 24C/96G of nested capacity across the three boxes, which is adequate for my current needs :)
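The 24C/96G figure is just the per-host slice multiplied across three physical boxes. A minimal Python sketch of that arithmetic, assuming (as the post implies) that every host hands the same 8c/32G slice to its nested ESX VM; the host labels and the `nested_capacity`/`leftover` helpers are illustrative only:

```python
from __future__ import annotations

# Slice each physical box hands to its nested ESX VM (8 cores / 32 GB, per the post).
# Assumption: all three hosts get the same slice.
HOSTS = ["3960X", "1950X", "new box (3950X or 1950X)"]
SLICE_CORES = 8
SLICE_RAM_GB = 32


def nested_capacity(host_count: int, cores: int, ram_gb: int) -> tuple[int, int]:
    """Total cores and RAM available to nested ESX across all hosts."""
    return host_count * cores, host_count * ram_gb


def leftover(total_cores: int, total_ram_gb: int, cores: int, ram_gb: int) -> tuple[int, int]:
    """Cores and RAM one box keeps for the host OS and front-line VMs after the nested slice."""
    return total_cores - cores, total_ram_gb - ram_gb


cores, ram = nested_capacity(len(HOSTS), SLICE_CORES, SLICE_RAM_GB)
print(f"Nested capacity: {cores}C/{ram}G")
# The planned 16-core / 64 GB box keeps roughly half for itself.
print("Left on the new box (cores, GB):", leftover(16, 64, SLICE_CORES, SLICE_RAM_GB))
```

The 3960X has more cores than the slice assumes, so its leftover is larger; the sketch only checks the new box.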

The advantage of the X570 is that it's Zen 2/Zen 3, which is more powerful and tends to be more RAM-tolerant (better IMC) - and it doesn't have the crappy X399 socket mess that is a PITA to deal with. But it's more expensive - there are a couple of X399 combos for sale for ~500/600, which would only cover the 3950X CPU by itself... And it tops out at 16C and dual-channel RAM, plus it won't really like 128G if I ever jump there.

The X399, however, has quad-channel RAM (matters a ~bit~ for this workload), is cheaper for the board/CPU, and can go up to 32C later on if needed (2990WX), plus potentially take more NVMe drives on the PCIe bus... but it's more finicky, is older Zen 1/1+, costs more for everything else, and pulls a lot more power...

It's a bit of a weird workload, for sure - and I'm trying to keep cost down as much as possible, so the TR is tempting... but while the board/cpu for TR are cheaper, everything ~else~ is more expensive...
 

zandor

2[H]4U
Joined
Dec 14, 2002
Messages
3,326
My general philosophy on this is that you have two Threadrippers on different sockets, so it would be more fun to pick up something different. I'd either get the 3950X (assuming you're not hiding one from us) or maybe go Intel if you can find something appropriate at the right price. If you prefer more of the same, get the Threadripper.
 

Ready4Dis

2[H]4U
Joined
Nov 4, 2015
Messages
2,426
Check out the latest comparison of the 5950X, not sure about availability, but apparently it's able to outpace a 10980XE in both single and multithread (in at least one leaked benchmark, so add salt). As you noted though, it's not always cheaper just because it's "consumer", and finding anything that officially supports proper ECC (or even unofficially but properly supports it) is um... yeah, so if that's important there's not much choice. Honestly, if you need the bandwidth and think you may need 128GB of RAM at some point, it may be worth just using the TR. You CAN run 128GB on an X570, just don't expect to be able to run super high clocks with super tight timings.
Seems you are in a slightly odd spot, where the "consumer" hardware may actually run better than the TR for some time, but there wouldn't be any room to throw more cores at it if you ever needed to, and you'd be stuck with dual channel if that ever becomes a bottleneck. I guess it could be much worse: you could have nothing (reasonable) available instead of having too many choices ;).
Have you priced them out fully yet? Like a cost breakdown and then compared what the performance difference may be (try to find some sort of benchmarks that line up with your use case)? And keep in mind if a 3950x is faster, the 5950x would be like 10-20% faster than that. Also, not sure if you care about cooling and power, but those are some things to keep in mind as well.
 

Iratus

[H]ard|Gawd
Joined
Jan 16, 2003
Messages
1,517
If you need PCIe lanes, get Threadripper; otherwise the 3950X is fine. Use an M.2 add-in card if you need some lane consolidation.

The first-gen TRs are a bit shit though; we junked them all for the 2000s on day 1. It was a pretty massive jump, but an expensive day.

The approach you describe for the X570 works fine though. I use it at work and will soon at home too. It’s nice with GPU passthrough, as you can switch between games, ML and Docker guests super easily, without gungy multi-purpose VMs. Bit of a pain to set up, but it’s paid for itself 50 times over for me.

As for other workloads I was doing 120:1 consolidation on VMware using boxes with 16 cores and 128gb of ram 12 years ago. You sound like you know it of course but just manage your queues and don’t deploy big vms unnecessarily as it fucks the efficiency.
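The warning about big VMs mostly comes down to two ratios: total vCPUs per hardware thread, and how much of the host the widest VM wants at once. A rough Python sketch with a made-up VM mix (all names and sizes are hypothetical); real hypervisors such as ESXi use relaxed co-scheduling, so the "widest VM" figure is a simplification, not how the scheduler actually works:

```python
from __future__ import annotations

# Hypothetical VM mix (vCPUs per VM) on a 16-core/32-thread host; sizes are illustrative.
VM_VCPUS = {"mail": 2, "mssql-1": 4, "mssql-2": 4, "web": 2, "docker-1": 4, "nested-esx": 8}
HOST_THREADS = 32


def overcommit_ratio(vcpus: dict[str, int], threads: int) -> float:
    """Total vCPUs handed out per hardware thread (above 1.0 means overcommitted)."""
    return sum(vcpus.values()) / threads


def widest_vm_share(vcpus: dict[str, int], threads: int) -> float:
    """Share of the host's threads the largest VM wants busy at roughly the same time."""
    return max(vcpus.values()) / threads


print(f"vCPU:thread ratio {overcommit_ratio(VM_VCPUS, HOST_THREADS):.2f}")
print(f"widest VM needs {widest_vm_share(VM_VCPUS, HOST_THREADS):.0%} of the host")
```

Adding more small VMs raises the first number gently; doubling the nested ESX VM to 16 vCPUs would push the second to half the host, which is the kind of oversizing that hurts consolidation efficiency.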
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
30,408
Can’t be Intel; it has to be the same “family”, at least for compatibility purposes (I thought hard about getting X299, but you just can’t migrate VMs between them).
 

/dev/null

[H]F Junkie
Joined
Mar 31, 2001
Messages
15,069
I have a similar workload (although I don't do nested virtualization), and I've decided to move most of my Plex transcoding onto Quadro K2200 video cards (no H.265 though). I have an old(er) Cisco UCS server with 64G (and a fully identical server, powered off, for testing/backup/spare parts) which runs a Plex (software transcoding only) VM, an i5-3570S with a Quadro that runs as a Plex server, and an E3-1240v3 as my other Plex server. Works OK. The E3-1230 and i5-3570S together pull maybe 130W from the wall total. Both have H.264 acceleration. I use my VM server for experimenting, a mail server, software-based routers, etc. The Plex servers are on 1/10G Ethernet (either DAC or MMF).
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
30,408
This is similar. I ran mine in a VM on the 3960X for a bit - I realized the possibilities when my wife was logged into a training lab I built for her, the plex server was transcoding a video for someone, I had a movie playing and an encode going on handbrake, AND fired up subnautica - and didn't even notice the other workloads.

So I moved Plex to the 1950X, fed it a 970 for the moment for NVENC, and built another VM host on there so I had some failover. You can use either and never even notice. :p

Now I want a third, just because.
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
30,408
You nailed it. I KNOW that 3950/5950 would do just fine ~right now~, but this is a 3 year refresh most likely... will it need more in the future? It'll be faster than a 1950X for certain... but I could take a used 2970 or 2990, feed it 128G of ram, and have even more... and cores matter more than clocks in many cases for this... But that would also be very expensive, and if I don't need it, the 3950 would be cheaper overall... especially buying used gear. Haven't priced them out fully yet - guess I'll do that this evening. And cooling/power does matter somewhat... so the 3950 wins there.

The other part - I know both TR platforms don't blink at running 5-6 VMs plus nested AND a user workload at the same time. That's what they're built for. I haven't tried that on consumer gear in 5 years, back before 8c+ CPUs... so I don't know if the experience is the same, or if the extra cache/memory bandwidth/PCIe bandwidth on HEDT makes that much of a difference. I guess we might have to find out...
 

Ready4Dis

2[H]4U
Joined
Nov 4, 2015
Messages
2,426
Honestly, I haven't done a ton with VMs, but I do use Docker containers without issue on even a 4-core. Also, if you're looking at the 3950X and have a little while before you pull the trigger, I would really check out the 5950X: 16c/32t, 105W TDP (in AMD terms, about 142W actual at the default PPT limit), and it should be 15-20% faster in both single- and multi-threaded loads.

I don't think running VMs makes much difference to the platform choice as long as you have the cores and RAM to support them (obviously things like memory bandwidth can make a difference). PCIe lanes only matter when you run out, which is less likely on a TR, but if you only need a 10GbE NIC (some boards have it onboard) and a GPU or so, you're not pushing limits. If you have a bunch of cards to install (capture cards, GPU, multiple NICs, RAID or SAS cards, etc.), that leans towards TR (also consider how many drives, SATA, NVMe, whatever, you plan to run). I'd definitely go X570 if you decide to stay "consumer", which I think you already know, but hey :).

Crazy thing is, the 3950X keeps up very well with (and sometimes beats) the 2990WX, and when it loses it's not by much. You said some of your stuff likes cores, so the 2990WX may be slightly ahead there, but not by much for the price difference, and a 5950X would probably be ahead in most if not all tasks even with its core deficit. Obviously the TR hardware is geared more towards what you're doing, but I honestly think either one would work just fine for you.
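For the power numbers: AM4 chips ship with a default Package Power Tracking (PPT) limit of 1.35x the rated TDP, which is where the "actual" figure for a 105W part comes from. A quick Python sketch of that conversion (the 1.35 factor is AMD's stock setting; boards with PBO or auto-overclocking enabled can raise the limit):

```python
PPT_FACTOR = 1.35  # AMD's stock PPT/TDP ratio on AM4


def default_ppt_watts(tdp_watts: float) -> int:
    """Stock socket power limit (PPT) for a given AM4 TDP rating, rounded to whole watts."""
    return round(tdp_watts * PPT_FACTOR)


for tdp in (65, 105):
    print(f"{tdp} W TDP -> {default_ppt_watts(tdp)} W PPT")
```

For a 105W part like the 3950X or 5950X this works out to 142W at stock settings.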

A few Linux benchmarks. I don't think it's really your workload type of stuff, but you can get an idea of how close the 2990WX and 3950X are in performance in a few workloads.
https://techgage.com/article/a-linux-performance-look-at-amds-16-core-ryzen-9-3950x/
 

Dan_D

Extremely [H]
Joined
Feb 9, 2002
Messages
57,205
The 3950X will vastly outperform a first-generation Threadripper. It's not even close. I did the testing back when the Ryzen 3000 series launched. The Ryzen 9 3900X was pitted directly against the Threadripper 2920X, and outside of the one After Effects test, the 3900X absolutely destroyed the 2920X. The only reason the After Effects test favored the 2920X was that its test platform had 32GB of RAM instead of 16GB.

Here is an example of what I'm talking about:
 

[Attached image: 1603402806763.png]

jrobdog

Limp Gawd
Joined
Dec 4, 2006
Messages
300
It comes down to PCIE lanes. The 3950x is awesome, but you may find yourself limited by the expansion slots.
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
30,408
Oh I know - on a core-for-core basis, the 3XXX will run rings around the TR 2XXX/1XXX all day long. But is per-core performance as important as memory performance (which we don't have a good benchmark for between the two, quad vs dual, for this workload) and core counts (which are the same to start, but even the 19XX goes higher than the 16c that consumer Zen tops out at)? That's the part I'm debating - and I don't think we entirely know, so I'm going to have to guess and find out.

That's the hangup I have - wondering if I need the future expansion, or if the limited memory and PCIe bandwidth might be a bottleneck for this use case, given that we're talking about running a normal OS, a nested ESX host, 2-3 nested VMs, and 2-3 front-line VMs. I know that TR3 chews it up and spits it out - and TR1 does pretty much the same... but is that because anything modernish would (having not run stuff like this in years), or is it that the extra horsepower that HEDT brings makes the difference? The only consumer platform I have to compare against is a 10700K, which isn't even in the same ballpark - especially because it's built 100% as a gaming box.
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
30,408
Containers end up being way lighter than a full VM (no OS/etc.), and much lighter than nested virtualization - especially since I have containers IN the nested VMs :p So I'm going Hardware->Host OS->Nested ESX->Guest OS->Docker :D It's a weird workload - normally folks would do this (if ever) on massive servers, where it runs like shit, in fact - but a workstation gives you some unique options as a RTOS that make it more usable... and still keep the box sane on its own :)

5950 - I'm planning on buying used, so I figure we'll have 3950Xs hitting the streets used soon for a good price. The more money I save, the more drives/NVMe/RAM I can pick up for all the other parts right now.

Real requirements are 16C to start, 64G of RAM, at least 2 NVMe drives, 10G networking, wireless (for the host OS) is a plus, and then I'd like some SATA ports for later on. The TR systems are getting all the expansion cards for high-end storage right now - which is fine, that's by design. But will X570 be limited, or slow the host OS down, when you're running all that extra stuff? That's the part where there doesn't seem to be a clear answer, because most people doing this don't care about the host OS anymore - or care only about the host OS and don't care about the guests. I want both, now that I know I can HAVE both.

Honestly, I'm leaning towards an X570 with an add-in 10G card, since the boards with onboard 10G are... stupid pricey for what I need. I won't use the extra features. So MEG Unify, MEG Ace, or Strix Gaming E, I think.
 

Dan_D

Extremely [H]
Joined
Feb 9, 2002
Messages
57,205
lopoetve said: "Oh I know - on a core for core basis, the 3XXX will run rings around the TR 2XXX/1XXX all day long - but is per core performance as important as memory performance [...] and core counts [...]? That's the part I'm debating."

There are plenty of memory benchmarks showcasing the difference between Threadripper and something like the Ryzen 3000 series. However, the additional memory bandwidth really only benefits a small subset of workloads and applications. MSSQL or database type transactions are one of them. The 1st generation Threadripper's top model is the 1950X. It's a 16c/32t CPU just like the Ryzen 9 3950X. However, there is no comparison as the latter is significantly faster.

lopoetve said: "That's the hangup I have - wondering if I need the future expansion, or if the limited memory and PCIE bandwidth might be a bottleneck for this use case [...]"

If you are talking about the Threadripper 1000 or 2000 series, it's limited to PCI-Express 3.0. X570 and the Ryzen 3000 series have PCIe 4.0 support. Therefore, per lane, the X570 has more PCIe bandwidth than TR, not less. If you compare Threadripper to the regular Ryzens, the former has more PCIe lanes, not more bandwidth per lane; that is the same when comparing identical generations. The memory bandwidth of Threadripper is the only real advantage you gain if we are limiting this to Threadripper 1000 and 2000. Even so, as I said, there are plenty of benchmarks out there showing little difference, if any, in most workloads. Even for what you are planning to do, I'm not sure you need that.
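The "lanes vs. bandwidth" distinction is easy to put in numbers. A Python sketch using the PCIe 3.0/4.0 signalling rates and 128b/130b encoding; the usable CPU lane counts (60 for a 1950X on X399, 20 for a Ryzen 3000 CPU on X570, chipset links excluded) are the commonly quoted figures, not something measured in this thread:

```python
# Per-lane signalling rate in GT/s; PCIe 3.0 and 4.0 both use 128b/130b encoding.
GT_PER_S = {3: 8.0, 4: 16.0}
ENCODING = 128 / 130


def lane_gbs(gen: int) -> float:
    """Usable GB/s per lane, per direction, after line-encoding overhead."""
    return GT_PER_S[gen] * ENCODING / 8  # 8 bits per byte


def total_gbs(gen: int, lanes: int) -> float:
    """Aggregate GB/s per direction across a number of lanes."""
    return lane_gbs(gen) * lanes


PLATFORMS = {  # name: (PCIe generation, CPU lanes usable for slots/M.2)
    "Threadripper 1950X / X399": (3, 60),
    "Ryzen 9 3950X / X570": (4, 20),
}

for name, (gen, lanes) in PLATFORMS.items():
    print(f"{name}: {lane_gbs(gen):.2f} GB/s per lane, "
          f"{total_gbs(gen, lanes):.1f} GB/s over {lanes} lanes")
```

Per lane, the X570 platform is twice as fast; in aggregate, the 1950X still has roughly 50% more CPU-attached bandwidth, which only matters if you actually populate those lanes.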

lopoetve said: "I know that TR3 chews it up and spits it out - and TR1 does pretty much the same... but is that because anything modernish would [...], or is it that the extra horsepower that HEDT brings makes the difference?"

HEDT does not bring extra "horsepower" to the table. HEDT has two things over the mainstream segment: additional memory bandwidth and increased core density. The platform has additional flexibility due to having more PCIe lanes, but those don't improve performance unless you need the expansion capability. In this case, the Threadripper 1950X is the top model for the 1000 series. It has no core density advantage over the Ryzen 9 3950X. The Threadripper 2990WX is the top model for the 2000 series, and while it's 32c/64t, it has much lower clocks than the Ryzen 9 3950X. Core density is there, but memory bandwidth isn't what you'd think it is. While it does support quad-channel RAM, that CPU's NUMA architecture essentially means half the CPU has no direct memory access, slowing it down even further. Many people who used that CPU in a professional capacity found out that having half the memory bandwidth of Epyc is a big problem for it.
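Peak DRAM bandwidth is simple arithmetic (transfer rate x 8-byte channel width x channel count), and dividing by core count shows why quad-channel doesn't buy the 2990WX as much as it sounds. A Python sketch assuming the same hypothetical DDR4-3200 kit on every platform (officially supported speeds actually differ per CPU) and ignoring NUMA, which makes the 2990WX's real picture worse, since only 2 of its 4 dies have DIMMs attached:

```python
def peak_gbs(mt_per_s: int, channels: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: MT/s x 64-bit channel x channels."""
    return mt_per_s * bytes_per_transfer * channels / 1000


SPEED = 3200  # hypothetical: DDR4-3200 on every platform

CPUS = {  # name: (cores, memory channels)
    "Ryzen 9 3950X": (16, 2),
    "Threadripper 1950X": (16, 4),
    "Threadripper 2990WX": (32, 4),
}

for name, (cores, channels) in CPUS.items():
    total = peak_gbs(SPEED, channels)
    print(f"{name}: {total:.1f} GB/s peak, {total / cores:.1f} GB/s per core")
```

Per core, the 2990WX lands at the same theoretical peak as the 3950X, before any NUMA penalty; only the 16-core 1950X actually doubles its per-core share.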

Threadripper 1000 and 2000 have a different layout architecturally. Without Zen 2's central I/O die, the cores pay massive latency penalties when crossing CCX and die boundaries. 1st and 2nd generation Ryzens and Threadrippers have high internal latencies, and that's what the architectural changes in Zen 2 specifically improve upon. Routing every chiplet through the I/O die, so all cores get equal access to memory, is one way this is done. The second is the massive increase in L3 cache, which helps mask those latencies. Having said all of that, the Threadripper 2970WX or 2990WX may be faster for your needs, but it really depends on what you actually need out of these VMs.
 

somebrains

[H]ard|Gawd
Joined
Nov 10, 2013
Messages
1,401
With respects to running ESXi you map your desired features to white box component matrix.

Your main concern sounds like commodity resources.

I'd want HEDT for NVMe drives if I had specific workloads like "I wanna run K8s to learn how to synthetically populate a data lake." That would be an IO-driven endeavor; compute is really an afterthought. Networking would be a challenge, as you'd want dedicated, tiered IOPS, as well as clearly defined internal and external networking at a given throughput.

Say you want to add on filesystem, message queue, monitoring at a given depth. You'd want more resources outside of your commodity VMs. #1 is to keep your transactions in flight cached btw an evac event will be a learning event. This used to be nitty gritty HA/FT architecture we didn't touch as often, but is highly relevant as a walk in skillset now. I have recently run different versions of Redis as a key value store to see what performance gains in 6.0 are reality vs PR claims.

Maybe you want to explore synthetic user testing, and want to try different frameworks vs. DB performance. That's a worthy uptime experiment that wouldn't require HEDT. Your focus would be SQL DB replication. Spinning rust or basic safe SSDs could mirror those read replicas for you. But USB device passthrough would be a major concern.

You could be using GPU acceleration for Jupyter notebooks; how many GPUs you need would determine whether you hit the limits of the hardware matrix.

64gb of ram isn't much, my last lab box was x299 with 128gb of ram.
I didn't have to do much workload orchestration as I was focused on container deployments.
More traditional, pre-2015 virtualization with static VMs running 24/7 OS images should be OK, assuming you aren't intentionally running scripts that peg your compute.

As always, it depends on what you are doing, and the scale you want to take it.
 

Iratus

[H]ard|Gawd
Joined
Jan 16, 2003
Messages
1,517
I’m enjoying this thread, very aligned with my interests.

Feel bad that I was on autopilot when I responded and didn’t realise it was lopoetve asking, as I didn’t look at the name. My response was patronising given his level of ML knowledge etc., as he’s one of the few on here who actually shows he knows what he’s talking about 😂
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
30,408
Dan_D said: "There are plenty of memory benchmarks showcasing the difference between Threadripper and something like the Ryzen 3000 series. [...] The 1st generation Threadripper's top model is the 1950X. It's a 16c/32t CPU just like the Ryzen 9 3950X."
True, but few of them seem to touch on virtualization, and no one benchmarks nested virt - it's too dependent on a lot of whack stuff that you can't really eyeball. VMs in question right now are a mail server, MSSQLx2, handful of other random things like web servers, two docker container hosts... and soon a whole virtualized network setup, if I can get NSX in there.

Valid point on the 1XXX - forgot that I had the top of the line for the TR1, I was really thinking of the 2000 series for the future.

Dan_D said: "If you are talking about the Threadripper 1000 or 2000 series, it's limited to PCI-Express 3.0. [...] If you compare Threadripper to the regular Ryzens, the former has more PCIe lanes, not more bandwidth."
Thinking mostly of the number of lanes here, correct - e.g., if things are forced to share PCIe bandwidth through the chipset... almost every one of these I build has both multiple NVMe drives and multiple SATA drives as well. Some of the boards have the SATA ports and/or NVMe slots hanging off the chipset, and looking at the load that the VSA generates (it's constantly shuffling things between the NVMe cache, SSD performance, and SATA capacity tiers), I'm curious if there will be an impact. Not saying there will be, just curious - especially since the last time I DID try this on consumer kit, there most definitely was - but that was also older gear. I did find my notes on doing it on an i7-7700K, which... had the word SUCK written across the top of the page.
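Whether chipset-attached drives become a bottleneck comes down to how much traffic can pile onto the single CPU-chipset link at once. A back-of-the-envelope Python sketch, assuming X570's PCIe 4.0 x4 uplink and a hypothetical device mix (one chipset NVMe drive, four SATA SSDs, a 10GbE NIC in a chipset slot); the device list and the ~0.6 GB/s SATA figure are illustrative, not taken from any particular board:

```python
def pcie_gbs(gen: int, lanes: int) -> float:
    """Usable GB/s per direction for a PCIe 3.0/4.0 link (128b/130b encoding)."""
    gt_per_s = {3: 8.0, 4: 16.0}[gen]
    return gt_per_s * (128 / 130) / 8 * lanes


UPLINK = pcie_gbs(4, 4)  # X570 chipset <-> CPU link

# Hypothetical devices hanging off the chipset, at their interface maximums.
CHIPSET_DEVICES = {
    "NVMe #2 (PCIe 4.0 x4)": pcie_gbs(4, 4),
    "4x SATA SSD (~0.6 GB/s each)": 4 * 0.6,
    "10GbE NIC": 10 / 8,
}

demand = sum(CHIPSET_DEVICES.values())
print(f"Uplink {UPLINK:.2f} GB/s vs. worst-case demand {demand:.2f} GB/s "
      f"({demand / UPLINK:.2f}x oversubscribed)")
```

Oversubscription only hurts when those devices are busy at the same time, which is exactly the pattern a tiering VSA shuffling data between NVMe, SSD and SATA tiers can produce.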

Dan_D said: "HEDT does not bring extra 'horsepower' to the table. HEDT has two things over the mainstream segment. Additional memory bandwidth and increased core density. [...] While it does support quad-channel RAM, that CPU's NUMA architecture essentially means half the CPU has no direct memory access slowing it down even further. [...] 1st and 2nd generation Ryzens and Threadrippers have high internal latencies and that's what the architectural changes in Zen2 specifically improve upon. [...] The second is the massive increase in L3 cache."
And significantly more cache - the 3950X has 64MB of L3, while the 3960X has 128MB for 1.5x the cores, and for the context switching that nested virtualization does to simulate Ring 0, that matters a LOT.

But... as you point out, the 2990 only has 64MB as well, for double the actual cores, which means it's probably in the same boat as consumer Ryzen would be. So, if the theory that cache is important holds, then if I actually started pushing the 3950X or the 1950X like I do my 3960X, both would hit a wall much sooner (even ignoring cores) just from the lack of L3 cache... but that's only a theory at this point, and I'm not likely to ever actually hit them that hard, since the 3960X is my main workstation and those are secondary systems for other tasks.
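The cache argument is easier to see per core. A Python sketch using AMD's published total L3 sizes (the 1950X figure is not from the thread); L3 is split per CCX on all of these chips, so a core only has fast access to its own CCX's slice, and a per-core total is a rough proxy at best:

```python
# name: (cores, total L3 in MB), from AMD's spec sheets
CHIPS = {
    "Threadripper 3960X": (24, 128),
    "Ryzen 9 3950X": (16, 64),
    "Threadripper 1950X": (16, 32),
    "Threadripper 2990WX": (32, 64),
}


def l3_per_core(name: str) -> float:
    """Total L3 divided evenly across cores, in MB."""
    cores, l3_mb = CHIPS[name]
    return l3_mb / cores


for name in sorted(CHIPS, key=l3_per_core, reverse=True):
    print(f"{name}: {l3_per_core(name):.2f} MB L3 per core")
```

The 2990WX ends up with the same 2 MB per core as the 1950X, half of the 3950X, which fits the theory that it would hit a cache wall before either Zen 2 part.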

Valid point on NUMA - both virtualization products I'm using are NUMA-aware, and I can either shuffle things to the back side of the CPU (as it were) or not as needed (or let the scheduler handle it), but much like the previous point, that would imply a wall I'd hit sooner for certain on the older-generation processors, and an advantage that either of the 3000-series chips would have. The improved IF on Zen 2/Zen 3 is a significant boost in speed, and memory access does matter - no direct memory access for any of the cores on the "back side" of the CPU would be problematic for any workload I shuffled back there.

I think the best thing to do here is try: pick up the board and chip and see how it does, knowing that if it's not enough, I won't lose much and can switch back over to X399 (or just pick up another TRX4 box) afterwards. And if it is enough, then hey - money saved.
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
30,408
With respect to running ESXi, you map your desired features to the white-box component matrix.

Your main concern sounds like commodity resources.

I'd want HEDT for NVMe drives if I had specific workloads like "I wanna run K8s to learn how to synthetically populate a data lake." That would be an IO-driven endeavor; compute is really an afterthought. Networking would be a challenge, as you'd want dedicated, tiered IOPS, as well as clearly defined internal and external networking at a given throughput.

Say you want to add a filesystem, message queue and monitoring at a given depth. You'd want more resources outside of your commodity VMs. Priority #1 is to keep your in-flight transactions cached; btw, an evac event will be a learning experience. This used to be nitty-gritty HA/FT architecture we didn't touch as often, but it's highly relevant as a walk-in skillset now. I have recently run different versions of Redis as a key-value store to see which of the performance gains in 6.0 are reality vs PR claims.

Maybe you want to explore synthetic user testing and try different frameworks vs DB performance. That's a worthy uptime experiment that wouldn't require HEDT; there your focus is SQL DB replication, and spinning rust or basic safe SSDs could mirror those read replicas for you. But passthrough of a USB device would be a major concern.

You could be using GPU acceleration for Jupyter notebooks; how many GPUs you'd want is what would send you back to the hardware matrix.

64GB of RAM isn't much; my last lab box was X299 with 128GB of RAM.
I didn't have to do much workload orchestration, as I was focused on container deployments.
More traditional, pre-2015-style virtualization with static VMs running OS images 24/7 should be OK, assuming you aren't intentionally running scripts that peg your compute.

As always, it depends on what you are doing, and the scale you want to take it.
Except I'm running nested. Totally agree on white box - this is collapsing down 9 different physical ESXi boxes (a mix of Dell, Supermicro and ASRock Rack) because... well, they were all getting older, dropping 10k on updating all of it wasn't something I was looking forward to, and they're stupidly power hungry too (and even in a storage room, the squirrel fans on the 10G switches I had were getting annoying as @#$%).

It's simulating normal datacenter workloads for training, teaching, giving me something to fiddle with, WTF not, and "hey, that would be handy to have" as well. A mix of storage-centric, IO-centric, CPU-centric, etc. workloads - a bit of everything. And whatever my friends want to build too :) RAM right now isn't as much of a constraint, as I'm running the core stuff (vCenter/DCs/etc.) as tier-1 VMs on the hosts themselves, and the tier-2 workloads are the fun bits. The big fun part will be figuring out how to shoehorn in NSX for some of the things I want to fiddle with there. Just finished a whole write-up on load balancing a set of apps that we were missing docs on.

I’m enjoying this thread, very aligned with my interests.

I feel bad that I was on autopilot when I responded and didn’t realise it was lopoetve asking, as I didn’t look at the name. Now my response feels patronising given his level of ML knowledge etc., as he’s one of the few on here that actually shows he knows what he’s talking about 😂
:D No worries. I'm the one that runs WEIRD shit inside of things... I expect half my threads to result in consternation from people reading them ("you want to do what now?"), but I also have access to weird software that no one else does that I get to play with, so I'm always trying to find creative ways to do things. I was startled at how well this runs right now (seriously, nested VMs generating 4k+ IOPS and 800MB/s of bandwidth? That's average hardware-based home lab performance, and I'm encoding a video at the same time!) and that got me thinking.
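IOPS and bandwidth figures like these are tied together by the average I/O size (bandwidth = IOPS × I/O size). A quick Python sketch of that relationship, assuming decimal units and, purely for illustration, that both figures came from the same run, which the post doesn't actually say. The helper names are hypothetical.

```python
# Bandwidth and IOPS are linked by the average I/O size:
#   bandwidth = IOPS * io_size
# Assumption (illustration only): both figures come from the same run.
# Units are decimal: 1 MB = 1000 KB.


def implied_io_size_kb(bandwidth_mb_s: float, iops: float) -> float:
    """Average I/O size in KB implied by a bandwidth (MB/s) and an IOPS rate."""
    return bandwidth_mb_s * 1000 / iops


def bandwidth_mb_s(iops: float, io_size_kb: float) -> float:
    """Bandwidth in MB/s for a given IOPS rate and I/O size in KB."""
    return iops * io_size_kb / 1000


if __name__ == "__main__":
    print(f"{implied_io_size_kb(800, 4000):.0f} KB average I/O")  # prints "200 KB average I/O"
    print(f"{bandwidth_mb_s(4000, 4):.0f} MB/s at 4 KB blocks")   # prints "16 MB/s at 4 KB blocks"
```

Under that assumption, 800 MB/s at 4,000 IOPS works out to roughly 200 KB per I/O, while 4,000 IOPS of small 4 KB blocks would be only about 16 MB/s, so large-block and small-block tests will report very different pairs of numbers.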
 

Iratus

[H]ard|Gawd
Joined
Jan 16, 2003
Messages
1,517
It’s a game changer. I think, generally speaking, everything will just become containerised applications of course, but being able to do it now with less hassle, and have the clean separation you get in a server context in a home one, is amazing. Yay for HashiCorp Packer.

Used the setup at work for a while, as it enabled much cleaner separation and cranked up utilisation without having to keep stuff empty so a 36-hour job could process. Made me happy, as I remember building a centralised build farm for 1000 developers 15 years ago and having it span 3 racks.

Now I’m bringing it to my home setup. My 5.1GHz 7700K and 1080 Ti, which were an awesome PC when I built it, now get replaced with something with 6 times more processing power and double the speed on CUDA. That’s just crazy to me, and it makes me sad that it’s taken 6 months, as I’ve been so busy.

Of course it’s great when we’re all working at home too: not having to use cloud or centralised GPU compute, and enabling someone to turn off work and play a game at the end of the day. Such an easy justification.

Totally understand the weird shit. I’m thinking I’m just gonna push it to the ridiculous just for giggles: run a billion-node graph database, compile some code, run PyTorch against ImageNet and play Crysis at the same time.

The only clue that it’s going on will be the increasing fan speed, and me thinking about how I really do need an HVAC upgrade as my PC starts trying to broil me.
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
30,408
Heh. I got a 3090 for my friend so he could do ML work in his spare time (he does graph/semantic web for work) and have a distraction. Took a heck of a lot of work to get there, but... fun times.
 