AMD possibly going to 4 threads per core

Ready4Dis · Sep 26, 2019

https://wccftech.com/rumor-amd-zen-...t-up-to-4-threads-per-core-with-smt4-feature/

Will be interesting as their SMT scales much better than Intel right now, wonder if it's diminishing returns or useful. Time will tell, but it makes some sense as cores get more and more features that aren't being utilized.

IdiotInCharge · Sep 26, 2019

Ready4Dis said:
Will be interesting as their SMT scales much better than Intel right now

Do you have a few comprehensive resources for this point? I've seen it repeated and I'm a bit curious.

Ready4Dis said:
wonder if it's diminishing returns or useful. Time will tell, but it makes some sense as cores get more and more features that aren't being utilized.

Specific SMT setups where the OS and hardware are developed closely in concert show linear scaling, but that's not what we're talking about here- it's just the 'best case' scenario. IBM with their own operating systems (UNIX-based) with their Power CPU line is a strong example here, and a weaker example would be Apple developing iOS, their flavor of ARM CPUs, and enforcing strict application guidelines. Not really SMT related to my knowledge, but that's the level of hardware - operating system - software coupling that's needed.

With respect to x86 as the base instruction set and Windows and Linux as the predominant operating systems, the decision point between 'wider' cores with more resources and SMT versus just adding more cores is more difficult.

If AMD wants to pile more threads on to each core, cool, but the big problem with SMT happens when a core has multiple threads fighting over exhausted resources causing overall execution speed to decline. That's where we don't want to end up.

cdabc123 · Sep 26, 2019

the intel phi cpus I run have quad way hyperthreading and it appears to work fairly well (more so under linux). however I pretty much exclusively use them for compute work.

Master_shake_ · Sep 26, 2019

IdiotInCharge said:
Do you have a few comprehensive resources for this point? I've seen it repeated and I'm a bit curious.

Specific SMT setups where the OS and hardware are developed closely in concert show linear scaling, but that's not what we're talking about here- it's just the 'best case' scenario. IBM with their own operating systems (UNIX-based) with their Power CPU line is a strong example here, and a weaker example would be Apple developing iOS, their flavor of ARM CPUs, and enforcing strict application guidelines. Not really SMT related to my knowledge, but that's the level of hardware - operating system - software coupling that's needed.

With respect to x86 as the base instruction set and Windows and Linux as the predominant operating systems, the decision point between 'wider' cores with more resources and SMT versus just adding more cores is more difficult.

If AMD wants to pile more threads on to each core, cool, but the big problem with SMT happens when a core has multiple threads fighting over exhausted resources causing overall execution speed to decline. That's where we don't want to end up.

Well if you turn on Intel hyper threading you're pwned so....

Balkroth · Sep 26, 2019

I've messed around with some Power9's at work, for what I was using them for SMT4 seemed to work well, though ditto above as was mostly for compute work for simulations.

Darth Ender · Sep 26, 2019

I'm less enthusiastic about threads exposed to the user, since I'm not rocking a data center, than I would be about some sort of advancement in the area of CPUs leveraging their huge L3 cache to opportunistically optimize code blocks for autoparallelization and vectorization at the assembly level. Then compilers and interpreters could set flags on functions to enable/disable. It would be cool to have a cpu that you didn't need to hope that whoever compiled a given binary compiled it to specifically target your cpu in order to get the most out of it. - or have to lug around copies of the same functions compiled for every single cpu target so that the particular one you need can be used at runtime.

IdiotInCharge · Sep 26, 2019

Darth Ender said:
It would be cool to have a cpu that you didn't need to hope that whoever compiled a given binary compiled it to specifically target your cpu in order to get the most out of it. - or have to lug around copies of the same functions compiled for every single cpu target so that the particular one you need can be used at runtime.

What we're seeing today is code that is written in whatever language being compiled into a form of byte code a la Java, and then being either precompiled at some distribution stage, including on the client machine (Android does this), or being run with just-in-time compilation, in all cases with client hardware being targeted more specifically over time.

Hardware-specific optimizations are going to occur in those compiler stages, and what we're really getting away from is pre-compiled generic binary executables.

Master_shake_ said:
Well if you turn on Intel hyper threading you're pwned so....

It's an attack vector that does need to be considered, obviously, among many. It's not an 'instant own' however.

Rockenrooster · Sep 26, 2019

IdiotInCharge said:
Do you have a few comprehensive resources for this point? I've seen it repeated and I'm a bit curious.

Cinebench highlights this pretty good. Also very easy to find...

IdiotInCharge · Sep 26, 2019

Rockenrooster said:
Cinebench highlights this pretty good. Also very easy to find...

That would be the opposite of comprehensive, thanks.

Rockenrooster · Sep 26, 2019

IdiotInCharge said:
That would be the opposite of comprehensive, thanks.

don't make me use "let me google that for you" lol!

Rockenrooster · Sep 26, 2019

Sorry couldn't resist:
http://letmegooglethat.com/?q=AMD+SMT+scaling+vs+Intel+HT

IdiotInCharge · Sep 26, 2019

Rockenrooster said:
don't make me use "let me google that for you" lol!

You don't need to- I'm quite aware, as most are, that Cinebench is a decent representation of a latency-insensitive, float-focused (but not SIMD-focused) benchmark.

But as such, it does not at all represent 'SMT scaling much better', especially as SMT for desktop applications generally targets the utilization of different types of execution units within a CPU core.

So, if you're going to add more SMT capacity per core while also adding more of the same types of execution units to each core, why not just add more cores?

Rockenrooster · Sep 26, 2019

IdiotInCharge said:
You don't need to- I'm quite aware, as most are, that Cinebench is a decent representation of a latency-insensitive, float-focused (but not SIMD-focused) benchmark.

But as such, it does not at all represent 'SMT scaling much better', especially as SMT for desktop applications generally targets the utilization of different types of execution units within a CPU core.

So, if you're going to add more SMT capacity per core while also adding more of the same types of execution units to each core, why not just add more cores?

Isn't the reason because IPC?
Also one of the reddit links has some really good info about SMT:

Also, in basically any benchmark, from games to server apps, that compares Intel and AMD at 4/4 and 4/8 (cores/threads), the AMD one scales better 90% of the time, hence the saying that "AMD's SMT scales better than Intel's HT"
Take your blindfolds off...

drescherjm · Sep 26, 2019

Isn't the reason because IPC?

I don't think so. Ryzen does not have a large edge in IPC.

And it certainly was behind in IPC 2 years ago when that thread was posted.

IdiotInCharge · Sep 26, 2019

Rockenrooster said:
Isn't the reason because IPC?

If that's the case, since Zen is slower than Skylake, AMD's scaling in SMT should also be worse. Since the assertion is being made that AMD's SMT scaling is better, IPC doesn't seem to be a contributor on those grounds, nor does it logically seem directly linked.

Rockenrooster said:
Also one of the reddit links has some really good info about SMT:

There's nothing in that thread that shows why AMD's SMT scaling is 'better'. There are points made toward it being different, but that's going to be application dependent, and on average, Zen is slower than Skylake per core.

Rockenrooster · Sep 26, 2019

found this after 2 more seconds of googling:
https://linustechtips.com/main/topic/985591-skylake-vs-zen-vs-zen-htsmt/
old article but highlights the benefits/better scaling of SMT in a few different benchmarks.

thesmokingman · Sep 26, 2019

IdiotInCharge said:
If AMD wants to pile more threads on to each core, cool, but the big problem with SMT happens when a core has multiple threads fighting over exhausted resources causing overall execution speed to decline. That's where we don't want to end up.

Pretty sure they know what they are doing.

Rockenrooster · Sep 26, 2019

drescherjm said:
I don't think so. Ryzen does not have a large edge in IPC.

And it certainly was behind in IPC 2 years ago when that thread was posted.

Currently Ryzen 3000 has better IPC than anything Intel has. The percentage difference depends on the type of application. Games have a smaller difference, others have a much larger difference.

IdiotInCharge · Sep 26, 2019

thesmokingman said:
Pretty sure they know what they are doing.

For the server market, they quite likely do.

For desktops with far less predictable workloads? I see more segmentation coming.

IdiotInCharge · Sep 26, 2019

Rockenrooster said:
Currently Ryzen 3000 has better IPC than anything Intel has. The percentage difference depends on the type of application. Games have a smaller difference, others have a much larger difference.

That'd be a nope.

Rockenrooster · Sep 26, 2019

IdiotInCharge said:
If that's the case, since Zen is slower than Skylake, AMD's scaling in SMT should also be worse. Since the assertion is being made that AMD's SMT scaling is better, IPC doesn't seem to be a contributor on those grounds, nor does it logically seem directly linked.

There's nothing in that thread that shows why AMD's SMT scaling is 'better'. There are points made toward it being different, but that's going to be application dependent, and on average, Zen is slower than Skylake per core.

Well I thought we were talking about the difference between the benefits that SMT gives vs what HT gives.
If CPU A is 1/10 the speed of CPU B, and SMT adds 50% perf to CPU A, but only adds 20% perf to CPU B, then SMT for CPU A scales better hands down.
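To put numbers on that, here's a minimal sketch of the comparison. The scores are made up purely to match the 1/10, +50% and +20% figures above, and `smt_uplift` is just a helper name for illustration, not something from any benchmark tool:

```python
def smt_uplift(score_smt_off, score_smt_on):
    """Relative gain from enabling SMT, as a fraction (0.5 means +50%)."""
    return score_smt_on / score_smt_off - 1.0


# Hypothetical scores in arbitrary units: CPU A is 1/10 the speed of CPU B.
cpu_a_off, cpu_a_on = 100.0, 150.0     # SMT adds 50% to CPU A
cpu_b_off, cpu_b_on = 1000.0, 1200.0   # SMT adds 20% to CPU B

uplift_a = smt_uplift(cpu_a_off, cpu_a_on)
uplift_b = smt_uplift(cpu_b_off, cpu_b_on)

# "Scales better" compares the relative uplift, not the absolute score.
print(f"CPU A uplift: {uplift_a:.0%}, CPU B uplift: {uplift_b:.0%}")
print("A scales better:", uplift_a > uplift_b)
print("B is still faster overall:", cpu_b_on > cpu_a_on)
```

So scaling and absolute speed are two separate questions: A wins on uplift while B stays far ahead on raw throughput.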

Also there are some cases where Zen+ has EQUAL IPC to Skylake, it just won't ever be games...

IdiotInCharge · Sep 26, 2019

Rockenrooster said:
Well I thought we were talking about the difference between the benefits that SMT gives vs what HT gives.

HT is Intel's brand of SMT.

Rockenrooster said:
If CPU A is 1/10 the speed of CPU B, and SMT adds 50% perf to CPU A, but only adds 20% perf to CPU B, then SMT for CPU A scales better hands down.

Agreed on the basic point of comparison- what I'm asking for is a comprehensive proof that shows that AMD's implementation scales 'better' than Intel's. Or vice versa.

Rockenrooster said:
Also there are some cases where Zen+ has EQUAL IPC to Skylake, it just won't ever be games...

It really, really depends on the specific workload - not just the specific application, but specifically how it's used. Games could absolutely be faster on Zen if they were so tuned, but most engines were designed before Zen was known to exist, and when IPC and clockspeed were assumed to continue to advance.

Rockenrooster · Sep 26, 2019

IdiotInCharge said:
That'd be a nope.

See for yourself:
https://www.thefpsreview.com/2019/07/07/amd-ryzen-9-3900x-cpu-review/9/
Written by Dan D himself, whom I would trust more than you.

What is seen here is the 3900x has higher IPC than the 9900k in most of the tests that he runs (9 out of 12 I think)
The only ones it loses are games which seems to be Ryzen 3000's weakness.
This would lead one to believe that Zen 2 has higher IPC than the 9900k.
There is more to determining IPC than just games....

Rockenrooster · Sep 26, 2019

IdiotInCharge said:
HT is Intel's brand of SMT.

I thought that was a given...

IdiotInCharge said:
It really, really depends on the specific workload - not just the specific application, but specifically how it's used. Games could absolutely be faster on Zen if they were so tuned, but most engines were designed before Zen was known to exist, and when IPC and clockspeed were assumed to continue to advance.

Hence the "some cases" in my post.

IdiotInCharge · Sep 26, 2019

Rockenrooster said:
See for yourself:
https://www.thefpsreview.com/2019/07/07/amd-ryzen-9-3900x-cpu-review/9/
Written by Dan D himself

What is seen here is the 3900x has higher IPC than the 9900k in most of the tests (9 out of 12 I think)

Second paragraph:

"This has the unfortunate result of leaving our simulated eight core Ryzen with more L3 cache than it would ordinarily have. Therefore, the topology of the simulated eight core Ryzen 3000 series CPU, isn’t quite right."

Rockenrooster said:
The only ones it loses are games which seems to be Ryzen 3000's weakness.

Games are representative of complex, highly-serial workloads. They also represent the most 'intense' work that consumers typically do, and typically the most intense work that's actually time sensitive.

Rockenrooster said:
I thought that was a given...

Calling one brand's SMT 'SMT' and other's SMT 'HT' would lead one to believe otherwise.

Lakados · Sep 26, 2019

For virtualizing I just need core counts and RAM, speed is completely irrelevant 99% of the time. So a 32/128 would be a wet dream for me, or god forbid a 64/256...... Sploosh

Rockenrooster · Sep 26, 2019

IdiotInCharge said:
Second paragraph:

"This has the unfortunate result of leaving our simulated eight core Ryzen with more L3 cache than it would ordinarily have. Therefore, the topology of the simulated eight core Ryzen 3000 series CPU, isn’t quite right."

How does this even relate??? you want to compare a $300 CPU to a $500 CPU????

IdiotInCharge said:
Calling one brand's SMT 'SMT' and other's SMT 'HT' would lead one to believe otherwise.

Dude. I call it what Intel and AMD each call it.
Intel calls it Hyperthreading (HT)
AMD calls it SMT
We all know it's SMT in the end.
not sure what the point of this part was?????????

seems you very much like nitpicking and pretend you/I don't understand some things

IdiotInCharge · Sep 26, 2019

Rockenrooster said:
How does this even relate??? you want to compare a $300 CPU to a $500 CPU????

No, I want to compare IPC.

Rockenrooster said:
Dude. I call it what Intel and AMD each call it.
Intel calls it Hyperthreading (HT)
AMD calls it SMT
We all know it's SMT in the end.
not sure what the point of this part was?????????

It wasn't clear that you knew, as your wording called that into question.

tangoseal · Sep 26, 2019

Master_shake_ said:
Well if you turn on Intel hyper threading you're pwned so....

Yeah by like 47 different pwnage security holes lol

Maybe 5 more will be found next week

tangoseal · Sep 26, 2019

A 300 cpu and 500 cpu?

A 70,000 dollar corvette can lay waste to a 230,000 Ferrari so what's your point?

I can bag cat shit at 1 dollar and dog shit at 5000 dollars. In the end they are both bags of shit.

Price doesn't reflect performance, as AMD has clearly obliterated Intel's farce of pricing lackluster, vulnerable shit CPUs at god-awful high prices.

defaultluser · Sep 26, 2019

The benefits of this outside of pure compute code would be hard to fathom. I've also read that it's harder to get efficient decode scaling out of x86, which means more cores is probably the way forward to more SMP performance.

Easier to sell when you're running Power, or you're Sun and you've lost all other options, and are willing to sell your soul getting into a niche for the rest of your existence (UltraSparc T1)

Rvenger · Sep 26, 2019

tangoseal said:
Yeah by like 47 different pwnage security holes lol

Maybe 5 more will be found next week

Shh don't tell that to the Intel fanboys. They will tell you security doesn't matter.

thesmokingman · Sep 26, 2019

Rvenger said:
Shh don't tell that to the Intel fanboys. They will tell you security doesn't matter.

I wonder if you don't give a crap about AV, ya probably are too "pro" to care about mitigations either. I wonder if those two are mutually exclusive?

Rvenger · Sep 26, 2019

thesmokingman said:
I wonder if you don't give a crap about AV, ya probably are too "pro" to care about mitigations either. I wonder if those two are mutually exclusive?

Only takes one time.

Rockenrooster · Sep 26, 2019

IdiotInCharge said:
No, I want to compare IPC.

Ok then...
I would call it advantage AMD then...
Having more cache than a 3700x doesn't make it any more unfair. Compare AMD's best to Intel's best (In mainstream that is)
You don't need to artificially gimp AMD's CPU to have less cache than normal to have a "fair" comparison to Intel's 9900k when you're only measuring IPC.

Rockenrooster · Sep 26, 2019

tangoseal said:
A 300 cpu and 500 cpu?

A 70,000 dollar corvette can lay waste to a 230,000 Ferrari so what's your point?

The difference here is you can say "I have a Ferrari!"

That sounds waaay cooler than saying "I have an Intel CPU!"
People be like "wOw ThAts sO cOoL!" lol

IdiotInCharge · Sep 26, 2019

Rockenrooster said:
You don't need to artificially gimp AMD's CPU to have less cache than normal to have a "fair" comparison to Intel's 9900k when you're only measuring IPC.

Cache affects IPC measurements; IPC cannot be measured irrespective of cache.
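A rough way to see this is the textbook stall model, where cache misses add cycles on top of a core's ideal cycles per instruction. The sketch below is a simplification (it ignores the miss overlap that out-of-order cores get), and every number in it is a made-up assumption, not a measurement of any Zen or Skylake part:

```python
def effective_ipc(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_cycles):
    """Instructions per cycle once memory stalls are added to the ideal CPI.

    Simple model: CPI = base CPI + (memory refs per instruction
                                    * cache miss rate * miss penalty in cycles)
    """
    stall_cpi = mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)


# Same hypothetical core in both cases; only the miss rate changes,
# standing in for a smaller vs. larger last-level cache.
small_cache_ipc = effective_ipc(0.5, 0.3, 0.05, 200)
large_cache_ipc = effective_ipc(0.5, 0.3, 0.02, 200)

print(f"IPC with higher miss rate: {small_cache_ipc:.2f}")
print(f"IPC with lower miss rate:  {large_cache_ipc:.2f}")
```

Under these assumptions the identical core posts a noticeably higher measured IPC just from missing less often, which is why extra L3 on the simulated part muddies the comparison.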

IdiotInCharge · Sep 26, 2019

defaultluser said:
The benefits of this outside of pure compute code would be hard to fathom. I've also read that it's harder to get efficient decode scaling out of x86, which means more cores is probably the way forward to more SMP performance.

And the case for compute acceleration on CPUs going forward is dwindling. CPUs excel at chewing through branching code, but putting extra 'compute' on them isn't going to work out any better than it did for AMD's APUs. A certain amount of compute capacity is needed for stuff that cannot be easily run on other compute-focused hardware or for stuff that needs a small amount of compute done at lower latencies, but heavy compute needs to be focused on GPUs.

The idea of expanding every core to handle more compute is frankly a bit silly. Compute on CPUs has seen a massive performance boost through the use of SIMD units like SSE and AVX (and 3DNow!), and these represent a tiny fraction of what a decent GPU can accomplish in terms of compute throughput.

ChadD · Sep 26, 2019

SMT4 will have some real advantage in the server space.... Not so much for desktops. Servers being able to dedicate a couple physical cores to a VM while maintaining high thread count will be a big advantage. Whereas desktop users are already looking at pretty minor uplifts from SMT2.

Still, when Intel does drop a 7nm 3D chiplet part next year, AMD having SMT 4 parts so they can market a Ryzen 4700x 8 core 32 thread part for $329 vs a $500+ Intel 8/16 part will be quite hilarious. I hope the rumor is true and AMD has it ready to drop the day Intel announces 7nm desktop parts.

IdiotInCharge · Sep 26, 2019

ChadD said:
AMD having SMT 4 parts so they can market a Ryzen 4700x 8 core 32 thread part for $329 vs a $500+ Intel 8/16 part will be quite hilarious.

About as hilarious as the lawsuit they just lost for marketing Bulldozer quad-cores as eight-core CPUs, when they were slower than Intel's quad-core CPUs?

I thought that was hilarious the entire time, after I got over AMD throwing away their good architecture and consigning themselves to irrelevance in the CPU space for a decade.
