AMD possibly going to 4 threads per core

Will be interesting as their SMT scales much better than Intel right now

Do you have a few comprehensive resources for this point? I've seen it repeated and I'm a bit curious.

I wonder whether it'll be useful or just diminishing returns. Time will tell, but it makes some sense as cores get more and more features that aren't being utilized.

Specific SMT setups where the OS and hardware are developed closely in concert show linear scaling, but that's not what we're talking about here; it's just the 'best case' scenario. IBM with their own operating systems (UNIX-based) with their Power CPU line is a strong example here, and a weaker example would be Apple developing iOS, their flavor of ARM CPUs, and enforcing strict application guidelines. Not really SMT related to my knowledge, but that's the level of hardware - operating system - software coupling that's needed.

With respect to x86 as the base instruction set and Windows and Linux as the predominant operating systems, the decision point between 'wider' cores with more resources and SMT versus just adding more cores is more difficult.

If AMD wants to pile more threads on to each core, cool, but the big problem with SMT happens when a core has multiple threads fighting over exhausted resources causing overall execution speed to decline. That's where we don't want to end up.
 
The Intel Phi CPUs I run have four-way Hyper-Threading, and it appears to work fairly well (more so under Linux). However, I pretty much exclusively use them for compute work.
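For anyone who wants to check how many hardware threads share a core on their own Linux box, here is a minimal sketch. It assumes the usual sysfs topology file `thread_siblings_list` is present and uses the plain "0-3" or "0,64,128,192" list format; the helper names `parse_cpu_list` and `threads_per_core` are made up for this example.

```python
from pathlib import Path
from typing import List, Optional


def parse_cpu_list(text: str) -> List[int]:
    """Parse a kernel CPU list such as "0-3" or "0,64,128,192" into ints."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # A range like "0-3" is inclusive on both ends.
            lo, hi = part.split("-", 1)
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def threads_per_core(cpu: int = 0, root: str = "/sys/devices/system/cpu") -> Optional[int]:
    """Return how many hardware threads share a core with `cpu`, or None if unknown."""
    path = Path(root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    try:
        return len(parse_cpu_list(path.read_text()))
    except (OSError, ValueError):
        # Not Linux, no sysfs, or a list format this sketch does not handle.
        return None


print(threads_per_core())  # e.g. 2 with SMT2, 4 with SMT4, None where sysfs is missing
```

This only reports what the kernel exposes; it says nothing about how well the extra threads scale for a given workload.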
 
Do you have a few comprehensive resources for this point? I've seen it repeated and I'm a bit curious.



Specific SMT setups where the OS and hardware are developed closely in concert show linear scaling, but that's not what we're talking about here; it's just the 'best case' scenario. IBM with their own operating systems (UNIX-based) with their Power CPU line is a strong example here, and a weaker example would be Apple developing iOS, their flavor of ARM CPUs, and enforcing strict application guidelines. Not really SMT related to my knowledge, but that's the level of hardware - operating system - software coupling that's needed.

With respect to x86 as the base instruction set and Windows and Linux as the predominant operating systems, the decision point between 'wider' cores with more resources and SMT versus just adding more cores is more difficult.

If AMD wants to pile more threads on to each core, cool, but the big problem with SMT happens when a core has multiple threads fighting over exhausted resources causing overall execution speed to decline. That's where we don't want to end up.
Well, if you turn on Intel Hyper-Threading you're pwned, so....
 
I've messed around with some POWER9s at work. For what I was using them for, SMT4 seemed to work well, though, ditto the above, it was mostly compute work for simulations.
 
I'm less enthusiastic about threads exposed to the user, since I'm not rocking a data center, than I would be about some sort of advancement in the area of CPUs leveraging their huge L3 cache to opportunistically optimize code blocks for auto-parallelization and vectorization at the assembly level. Then compilers and interpreters could set flags on functions to enable or disable it. It would be cool to have a CPU where you didn't need to hope that whoever compiled a given binary compiled it to specifically target your CPU in order to get the most out of it, or have to lug around copies of the same functions compiled for every single CPU target so that the particular one you need can be used at runtime.
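The "lug around copies of the same function for every CPU target" approach is roughly what compilers call function multi-versioning: ship several builds of one routine and pick one at runtime from the CPU's feature flags. A rough sketch of just the selection step, assuming x86 Linux (where `/proc/cpuinfo` has a `flags` line). The `dot_*` variants and `select_kernel` are hypothetical stand-ins, not any real library's API.

```python
from typing import Callable, Iterable, List, Sequence, Set, Tuple

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_generic(a: Sequence[float], b: Sequence[float]) -> float:
    """Portable fallback; stands in for a baseline build."""
    return float(sum(x * y for x, y in zip(a, b)))


# Hypothetical variants. In a real binary each would be the same function
# compiled for a different instruction set; here they all reuse the fallback.
dot_sse2 = dot_avx2 = dot_avx512 = dot_generic

# Most capable first. Flag names follow the x86 "flags" line in /proc/cpuinfo.
VARIANTS: List[Tuple[str, Kernel]] = [
    ("avx512f", dot_avx512),
    ("avx2", dot_avx2),
    ("sse2", dot_sse2),
]


def read_cpu_flags(path: str = "/proc/cpuinfo") -> Set[str]:
    """Return the CPU feature flags on x86 Linux, or an empty set elsewhere."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags") and ":" in line:
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


def select_kernel(flags: Iterable[str]) -> Tuple[str, Kernel]:
    """Pick the first variant whose required flag the CPU reports."""
    available = set(flags)
    for flag, kernel in VARIANTS:
        if flag in available:
            return flag, kernel
    return "generic", dot_generic


name, dot = select_kernel(read_cpu_flags())
print(name, dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The cost is exactly the one complained about above: every variant ships in the binary whether or not the machine can ever use it.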
 
It would be cool to have a cpu that you didn't need to hope that whoever compiled a given binary compiled it to specifically target your cpu in order to get the most out of it. - or have to lug around copies of the same functions compiled for every single cpu target so that the particular one you need can be used at runtime.

What we're seeing today is code that is written in whatever language being compiled into a form of byte code a la Java, and then being either precompiled at some distribution stage, including on the client machine (Android does this), or being run with just-in-time compilation, in all cases with client hardware being targeted more specifically over time.

Hardware-specific optimizations are going to occur in those compiler stages, and what we're really getting away from is pre-compiled generic binary executables.

Well, if you turn on Intel Hyper-Threading you're pwned, so....

It's an attack vector that does need to be considered, obviously, among many. It's not an 'instant own' however.
 
don't make me use "let me google that for you" lol!

You don't need to. I'm quite aware, as most are, that Cinebench is a decent representation of a latency-insensitive benchmark that is floating-point focused but not SIMD focused.

But as such, it does not at all represent 'SMT scaling much better', especially as SMT for desktop applications generally targets the utilization of different types of execution units within a CPU core.

So, if you're going to add more SMT capacity per core while also adding more of the same types of execution units to each core, why not just add more cores?
 
You don't need to. I'm quite aware, as most are, that Cinebench is a decent representation of a latency-insensitive benchmark that is floating-point focused but not SIMD focused.

But as such, it does not at all represent 'SMT scaling much better', especially as SMT for desktop applications generally targets the utilization of different types of execution units within a CPU core.

So, if you're going to add more SMT capacity per core while also adding more of the same types of execution units to each core, why not just add more cores?

Isn't the reason because IPC?
Also, one of the Reddit links has some really good info about SMT:

Also, in basically any benchmark, from games to server apps, that compares Intel and AMD at 4/4 and 4/8 (cores/threads), the AMD one scales better 90% of the time, thus the saying that "AMD's SMT scales better than Intel's HT".
Take your blindfolds off...
 
Isn't the reason because IPC?

If that's the case, since Zen is slower than Skylake, AMD's scaling in SMT should also be worse. Since the assertion is being made that AMD's SMT scaling is better, IPC doesn't seem to be a contributor on those grounds, nor does it logically seem directly linked.

Also one of the reddit links has some really good info about SMT:

There's nothing in that thread that shows why AMD's SMT scaling is 'better'. There are points made toward it being different, but that's going to be application dependent, and on average, Zen is slower than Skylake per core.
 
If AMD wants to pile more threads on to each core, cool, but the big problem with SMT happens when a core has multiple threads fighting over exhausted resources causing overall execution speed to decline. That's where we don't want to end up.

Pretty sure they know what they are doing.
 
If that's the case, since Zen is slower than Skylake, AMD's scaling in SMT should also be worse. Since the assertion is being made that AMD's SMT scaling is better, IPC doesn't seem to be a contributor on those grounds, nor does it logically seem directly linked.



There's nothing in that thread that shows why AMD's SMT scaling is 'better'. There are points made toward it being different, but that's going to be application dependent, and on average, Zen is slower than Skylake per core.

Well I thought we were talking about the difference between the benefits that SMT gives vs what HT gives.
If CPU A is 1/10 the speed of CPU B, and SMT adds 50% perf to CPU A, but only adds 20% perf to CPU B, then SMT for CPU A scales better hands down.
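To put numbers on that, here is a quick sketch of the arithmetic. The scores are the made-up ones from the example above (CPU A at 1/10 the speed of CPU B), not measurements of any real chip, and `smt_uplift` is just an illustrative helper name.

```python
def smt_uplift(score_smt_off: float, score_smt_on: float) -> float:
    """Relative gain from enabling SMT, e.g. 0.5 means +50%."""
    if score_smt_off <= 0:
        raise ValueError("baseline score must be positive")
    return (score_smt_on - score_smt_off) / score_smt_off


# Hypothetical scores: CPU A is 1/10 the speed of CPU B with SMT off.
cpu_a_off, cpu_a_on = 10.0, 15.0    # SMT adds 50% on CPU A
cpu_b_off, cpu_b_on = 100.0, 120.0  # SMT adds 20% on CPU B

print(f"CPU A uplift: {smt_uplift(cpu_a_off, cpu_a_on):.0%}")  # 50%
print(f"CPU B uplift: {smt_uplift(cpu_b_off, cpu_b_on):.0%}")  # 20%
print("A scales better:", smt_uplift(cpu_a_off, cpu_a_on) > smt_uplift(cpu_b_off, cpu_b_on))
print("B still faster overall:", cpu_b_on > cpu_a_on)
```

So "scales better" is a statement about the ratio with SMT on versus off, and it is independent of which chip is faster in absolute terms.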

Also there are some cases where Zen+ has EQUAL IPC to Skylake, it just won't ever be games...
 
Well I thought we were talking about the difference between the benefits that SMT gives vs what HT gives.

HT is Intel's brand of SMT.

If CPU A is 1/10 the speed of CPU B, and SMT adds 50% perf to CPU A, but only adds 20% perf to CPU B, then SMT for CPU A scales better hands down.

Agreed on the basic point of comparison. What I'm asking for is a comprehensive proof that shows that AMD's implementation scales 'better' than Intel's. Or vice versa.

Also there are some cases where Zen+ has EQUAL IPC to Skylake, it just won't ever be games...

It really, really depends on the specific workload - not just the specific application, but specifically how it's used. Games could absolutely be faster on Zen if they were so tuned, but most engines were designed before Zen was known to exist, and when IPC and clockspeed were assumed to continue to advance.
 
That'd be a nope.

See for yourself:
https://www.thefpsreview.com/2019/07/07/amd-ryzen-9-3900x-cpu-review/9/
Written by Dan D himself, whom I would trust more than you.

What is seen here is that the 3900X has higher IPC than the 9900K in most of the tests that he runs (9 out of 12, I think).
The only ones it loses are games which seems to be Ryzen 3000's weakness.
This would lead one to believe that Zen 2 has higher IPC than the 9900k.
There is more to determining IPC than just games....
 
HT is Intel's brand of SMT.
I thought that was a given...

It really, really depends on the specific workload - not just the specific application, but specifically how it's used. Games could absolutely be faster on Zen if they were so tuned, but most engines were designed before Zen was known to exist, and when IPC and clockspeed were assumed to continue to advance.
Hence the "some cases" in my post.
 
See for yourself:
https://www.thefpsreview.com/2019/07/07/amd-ryzen-9-3900x-cpu-review/9/
Written by Dan D himself

What is seen here is the 3900x has higher IPC than the 9900k in most of the tests (9 out of 12 I think)

Second paragraph:

"This has the unfortunate result of leaving our simulated eight core Ryzen with more L3 cache than it would ordinarily have. Therefore, the topology of the simulated eight core Ryzen 3000 series CPU, isn’t quite right."

The only ones it loses are games which seems to be Ryzen 3000's weakness.

Games are representative of complex, highly-serial workloads. They also represent the most 'intense' work that consumers typically do, and typically the most intense work that's actually time sensitive.

I thought that was a given...

Calling one brand's SMT 'SMT' and other's SMT 'HT' would lead one to believe otherwise.
 
For virtualizing I just need core counts and RAM; speed is completely irrelevant 99% of the time. So a 32/128 would be a wet dream for me, or God forbid a 64/256...... Sploosh
 
Second paragraph:

"This has the unfortunate result of leaving our simulated eight core Ryzen with more L3 cache than it would ordinarily have. Therefore, the topology of the simulated eight core Ryzen 3000 series CPU, isn’t quite right."

How does this even relate??? you want to compare a $300 CPU to a $500 CPU????
Calling one brand's SMT 'SMT' and other's SMT 'HT' would lead one to believe otherwise.

Dude. I call it what Intel and AMD each call it.
Intel calls it Hyperthreading (HT)
AMD calls it SMT
We all know it's SMT in the end.
not sure what the point of this part was?????????

seems you very much like nitpicking and pretend you/I don't understand some things
 
How does this even relate??? you want to compare a $300 CPU to a $500 CPU????

No, I want to compare IPC.

Dude. I call it what Intel and AMD each call it.
Intel calls it Hyperthreading (HT)
AMD calls it SMT
We all know it's SMT in the end.
not sure what the point of this part was?????????

It wasn't clear that you knew, as your wording called that into question.
 
A 300 cpu and 500 cpu?

A 70,000 dollar corvette can lay waste to a 230,000 Ferrari so what's your point?

I can bag cat shit at 1 dollar and dog shit at 5000 dollars. In the end they are both bags of shit.

Price doesn't reflect performance, as AMD has clearly obliterated Intel's farce of pricing lackluster, vulnerable shit CPUs at God-awful high prices.
 
The benefits of this outside of pure compute code would be hard to fathom. I've also read that it's harder to get efficient decode scaling out of x86, which means more cores is probably the way forward to more SMP performance.

Easier to sell when you're running Power, or you're Sun and you've lost all other options and are willing to sell your soul getting into a niche for the rest of your existence (UltraSPARC T1).
 
Shh don't tell that to the Intel fanboys. They will tell you security doesn't matter.

I wonder if you don't give a crap about AV, ya probably are too "pro" to care about mitigations either. I wonder if those two are mutually exclusive?
 
No, I want to compare IPC.

Ok then...
I would call it advantage AMD then...
Having more cache than a 3700x doesn't make it any more unfair. Compare AMD's best to Intel's best (In mainstream that is)
You don't need to artificially gimp AMD's CPU to have less cache than normal to have a "fair" comparison to Intel's 9900k when you're only measuring IPC.
 
A 300 cpu and 500 cpu?

A 70,000 dollar corvette can lay waste to a 230,000 Ferrari so what's your point?

The difference here is you can say "I have a Ferrari!"

That sounds waaay cooler than saying "I have an Intel CPU!"
People be like "wOw ThAts sO cOoL!" lol
 
The benefits of this outside of pure compute code would be hard to fathom. I've also read that it's harder to get efficient decode scaling out of x86, which means more cores is probably the way forward to more SMP performance.

And the case for compute acceleration on CPUs going forward is dwindling. CPUs excel at chewing through branching code, but putting extra 'compute' on them isn't going to work out any better than it did for AMD's APUs. A certain amount of compute capacity is needed for stuff that cannot be easily run on other compute-focused hardware or for stuff that needs a small amount of compute done at lower latencies, but heavy compute needs to be focused on GPUs.

The idea of expanding every core to handle more compute is frankly a bit silly. Compute on CPUs has seen a massive performance boost through the use of SIMD units like SSE and AVX (and 3DNow!), and these represent a tiny fraction of what a decent GPU can accomplish in terms of compute throughput.
 
SMT4 will have some real advantage in the server space, not so much for desktops. Servers being able to dedicate a couple of physical cores to a VM while maintaining a high thread count will be a big advantage, whereas desktop users are already looking at pretty minor uplifts from SMT2.

Still, when Intel does drop a 7nm 3D chiplet part next year, AMD having SMT4 parts so they can market a Ryzen 4700X 8-core, 32-thread part for $329 vs. a $500+ Intel 8/16 part will be quite hilarious. I hope the rumor is true and AMD has it ready to drop the day Intel announces 7nm desktop parts.
 
AMD having SMT 4 parts so they can market a Ryzen 4700x 8 core 32 thread part for $329 vs a $500+ Intel 8/16 part will be quite hilarious.

About as hilarious as the lawsuit they just lost for marketing Bulldozer quad-cores as eight-core CPUs, when they were slower than Intel's quad-core CPUs?

I thought that was hilarious the entire time, after I got over AMD throwing away their good architecture and consigning themselves to irrelevance in the CPU space for a decade ;)
 