A couple of questions on CPU threads and clock rate

Coolio

Hi guys,

I did some research, but several points are still not clear. Could you please advise on them?

  1. Threads are good for background tasks (file downloads, running antivirus, etc.) - this is clear. However, if I'm not mistaken, software which is not specifically optimized for threads works more efficiently when it can utilize a standalone CPU core, which raises the dilemma: should I choose cores or threads? Which is better for the following scenario: office/gaming = 50/50, no CPU upgrade in the next 5 years?
  2. How widespread (if at all) is the trend of adapting apps to CPU multi-thread technology and which software (office/games) is more likely to follow this trend in the next 3-5 years?
  3. The higher the CPU clock rate, the more operations per second are done - clear. I've heard this logic applies much more to apps which are not optimized for multi-threaded work (i.e. "single-thread apps"). If this is true, am I right in assuming that office apps won't work much faster on a 4 GHz CPU vs. a 3.5 GHz one, while games (the majority of which are not yet optimized for multiple threads) will show noticeably better results (with an appropriate GPU/RAM on board, of course)?

Thank you!
 
You're putting more thought into this than is necessary.

Firstly, I assume what you're mostly confused about is SMT or hyperthreading (Intel's name for SMT) threads. In the absence of HT or SMT, each CPU core on a system can process one thread. SMT is enabled by a little bit of extra silicon on the CPU, and allows each core to process two threads concurrently. With that said, depending on what instructions are contained within the threads, the ability to actually execute the two threads *concurrently* is not guaranteed - sometimes the CPU can, sometimes it cannot.

As a very rough example, CPUs contain separate logic for handling integer operations (2+2) versus floating point operations (2.756*6.521), so if a core were presented with two threads, one doing integer ops and the other doing something floating point, it might actually be able to execute those two threads simultaneously because they aren't competing for the same on-chip resources. If both threads contained integer operations, then they would be vying for the same limited resources (the integer logic) and would likely be executed in series rather than in parallel.

For this reason, while enabling SMT adds +100% to the potential thread count on a CPU, the actual positive performance impact is far more limited at between 0% and +15%.
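
If you want to see what your own machine exposes, here is a minimal sketch in standard C++ (nothing platform-specific, and it only shows the logical thread count the OS reports - it says nothing about how well two threads will actually share one core):

```cpp
// Minimal sketch: ask the OS how many hardware threads it exposes.
// On an SMT/Hyper-Threading CPU this is typically 2x the physical core count
// (e.g. a 6-core/12-thread part reports 12). It says nothing about how well
// two threads sharing one core will actually run.
#include <iostream>
#include <thread>

int main() {
    unsigned int logical = std::thread::hardware_concurrency(); // may return 0 if unknown
    std::cout << "Logical hardware threads: " << logical << '\n';
}
```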

Threads are good for background tasks (file downloads, running antivirus, etc.) - this is clear. However, if I'm not mistaken, software which is not specifically optimized for threads works more efficiently when it can utilize a standalone CPU core, which raises the dilemma: should I choose cores or threads? Which is better for the following scenario: office/gaming = 50/50, no CPU upgrade in the next 5 years?
Real physical cores are better than SMT threads. If, as an example, you had a choice between a 6-core CPU with SMT (so 12 Threads) and an 8-core CPU without SMT (so 8 threads), then most of the time you would be better off with the 8-core CPU. The overall thread count would be less, but with all the threads being backed by a real physical core there are no contention issues to worry about.

With that said, it's *almost* not a question anymore. Almost all of AMD's CPUs are fully SMT enabled, and all of the higher-end Intel ones are as well. If you buy a new CPU today, you'll probably get one with SMT, so you likely won't be comparing a 6C/12T CPU vs an 8C/8T one; both vendors will offer the larger thread counts.

How widespread (if at all) is the trend of adapting apps to CPU multi-thread technology and which software (office/games) is more likely to follow this trend in the next 3-5 years?
Both gaming and non-gaming workloads are rapidly adapting to multi-threading. Ever since the PS4 and XB1 generation of consoles, games have been developed with multiple x86 CPU cores in mind, and the Jaguar cores in the PS4 and XB1 were so weak that games were forced to use multiple cores on those devices to get anywhere near acceptable performance. The new consoles are based on 8-core Zen 2 (Ryzen 3000) generation CPUs, so expect future games to start stressing PC CPUs much more heavily.


The higher the CPU clock rate, the more operations per second are done - clear. I've heard this logic applies much more to apps which are not optimized for multi-threaded work (i.e. "single-thread apps"). If this is true, am I right in assuming that office apps won't work much faster on a 4 GHz CPU vs. a 3.5 GHz one, while games (the majority of which are not yet optimized for multiple threads) will show noticeably better results (with an appropriate GPU/RAM on board, of course)?
For *most* users, single-threaded performance is really no longer an issue. In addition, it's not just clockspeed - the Ryzen 3000 series ran at very similar clockspeeds to the Ryzen 2000 series, and yet the 3000 series is markedly faster. This is because the 3000 series gets 'more done' per clock than the 2000 series - this is called IPC or instructions per clock. The new 5000 series chips manage to clock slightly faster than the 3000 series before them, but the vast bulk of their performance improvement over the 3000 comes from improving their IPC. If you ran a Ryzen 2000, 3000, and 5000 side-by-side with equal core counts and locked to the same 3 GHz speed, there would be clear performance differences between the three CPUs.

Intel CPUs have had much smaller IPC improvements for the last several generations, to the point where chips like the i7 6700K and i7 7700K were essentially indistinguishable from each other from a performance perspective. The new Rocket Lake CPUs they just announced are supposed to have a big IPC improvement, but they aren't actually out yet, so time will tell.
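
A rough back-of-the-envelope way to see the IPC point: treat single-thread throughput as roughly IPC x clock. The IPC numbers below are made-up placeholders to illustrate the arithmetic, not measured figures for any real chip.

```cpp
// Toy model: relative single-thread throughput ~ IPC x clock.
// The IPC values are illustrative assumptions, not benchmark results.
#include <cstdio>

int main() {
    struct Chip { const char* name; double ipc; double ghz; };
    const Chip chips[] = {
        {"older core @ 4.0 GHz", 1.00, 4.0},
        {"older core @ 3.5 GHz", 1.00, 3.5},
        {"newer core @ 3.5 GHz", 1.19, 3.5}, // ~19% IPC uplift, assumed for illustration
    };
    const double baseline = chips[1].ipc * chips[1].ghz; // older core @ 3.5 GHz
    for (const Chip& c : chips)
        std::printf("%-22s -> relative throughput %.2f\n", c.name, (c.ipc * c.ghz) / baseline);
}
```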

Lastly, games *are* starting to use lots of cores. Older games are still largely gated by single-threaded performance, but that's not a particularly relevant factor in making a purchasing decision, since the single-threaded performance of the current top-end CPUs is fairly similar, and *all* of them are plenty to play older games.
 
What is it you want to achieve? Apart from not upgrading for 5 years?

I built the below to last me 10 years with the odd upgrade here and there ;) faster CPU, more cores, faster GPU, more RAM, bigger and faster hard drive(s), but other than that I'll keep it as is.
 
Hmm, I don't agree with the earlier post re 8 real cores vs 6 cores/12 threads. In most productivity tasks the latter is faster (see 8700K vs 9700K benchmarks at the same clock speed, https://www.gamersnexus.net/hwreviews/3421-intel-i7-9700k-review-benchmark-vs-8700k-and-more ); note that most benchmarks there are games, but if you look up other reviews, the same thing plays out.

more tests here: https://www.aeco.space/en/blog/no-hyper-threading-vs-hyper-threading

8c/16t is usually in the order of +20-25% faster than 8c/8t in most software. AMD cores are more sensitive to SMT than intel too.

As for software being able to use multiple threads, most software is multithreaded to some extent. Games are generally reasonably well threaded (4-6 threads minimum usually, sometimes up to 12/16). Office software/content creation? It depends but most is.
 
You are asking a question that has far more layers and complexity than defining what a thread is. I am simplifying as well as I can.

Hardware
Technically speaking, in the beginning you had a bit of code that was "In Order" and you tried to make the most of it with "Scalar" processing. This was processing one bit of data at a time: Single Instruction, Single Data. The hardware of the processor started to increase in complexity with buffers, queues, and caches. Other instructions and memory operations allowed the CPU to increase its "scaling" and become "Superscalar." Another technique to gain more "Parallelism" is to process in "Pipelines." Not only did this allow the processor to increase in clock speed, it allowed more execution parallelism. A very win-win situation because the two go hand-in-hand. Around this time, that "In Order" processing was turned into a hybrid way of executing instructions: the front end of processors is "In Order" and the back end is "Out of Order." Later, SMP or multi-core processing was just a cherry on top. Multi-socket systems, then multi-core, then multi-socketed/multi-core systems, and finally multi-socketed/multi-core arrays of systems.

To note also, we had multiple processing systems, such as another processing unit on the PCB, soldered or socketed. For example, an FPU was something that could be added, or not, on some systems. The same goes for many other systems, such as consoles, that had execution ability located elsewhere on the PCB or socketed like a cartridge into a peripheral slot. These extensions started to become integrated - still handling the same function(s) as before, but all within one package. This includes the Memory Management Unit, bus interfaces, and even the storage of instructions and data (the cache), which moved closer and was then integrated. Note, I am not being x86-centric and I am not going to get into architectures such as von Neumann and Harvard. Any of you can do that.....

Software
You have to "manage" the execution and make it all work - a glue between the hardware and the functional units of the CPU within the system as a whole. The "Operating System" has bits of code to do all this. In the beginning, as execution was so "In Order," they tried to make it seem as if multiple executions could happen anyway. There were techniques such as "Time Slicing" that the OS used to make code execute proficiently on a single core with a single bit of data executing at a time. This was "Multi-Tasking." You can, sort of, think of this as an SMT-like technique implemented by the operating system rather than the hardware (yet). What is really funny to me is how few get that the processing is not at all in real time. It is made to seem real time! As clocks increase and execution of data goes from single to multiple, you might start to see the complexity of it all, and just how amazing it is that all of this was thought about!

Both hardware and software
In the hardware, queues, buffers, and other techniques allowed data "Dependencies" to be processed and managed within the execution of the processor. So, for instance, you can have the in-order front end, but once the back-end instructions start to fill an out-of-order Re-Order Buffer and are sent to units/ports to execute, something may depend on some data that is NOT already in the pipeline. This creates stalls within the pipeline while data is moved to satisfy that dependency. Now think about what happens if that dependency is on another socket, or within another core! Not only does data need to be handled outside the processor (software), it needs to be managed inside (hardware). This is also why prefetching and caches are important. This is also where code compiling and kernel execution start to really, really come into play. The philosophies of code execution, where to place the emphasis... this all starts to seem very Tron'ish. Where does it go, and how many cycles, and.........we have the "Kernel" to manage. Or perhaps the Master Control Program - whatever confusing universe you are in now due to reading my shit.
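
A small sketch of what that kind of dependency looks like from the software side (plain C++, my own illustration rather than anything from the post above): the first loop funnels every add through one accumulator, so each add must wait for the previous one; the second splits the sum across independent accumulators, which is exactly the kind of parallelism the out-of-order back end can overlap.

```cpp
// Sketch: a serial dependency chain vs. independent chains the back end can overlap.
#include <cstddef>
#include <vector>

double sum_dependent(const std::vector<double>& a) {
    double s = 0.0;
    for (double x : a) s += x;                 // every add depends on the previous value of 's'
    return s;
}

double sum_independent(const std::vector<double>& a) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;     // four independent dependency chains
    std::size_t i = 0;
    for (; i + 3 < a.size(); i += 4) {
        s0 += a[i]; s1 += a[i + 1]; s2 += a[i + 2]; s3 += a[i + 3];
    }
    for (; i < a.size(); ++i) s0 += a[i];      // leftover elements
    return (s0 + s1) + (s2 + s3);
}
```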

This hardware threading, or SMT, is just filling the execution of instructions in the caches/buffers/queues of the CPU to prevent stalls or better utilize under-used execution units - stalls which can happen due to prediction, cache misses, dependencies, and movements of executing instructions within the buffer. If there is a stall, it can go with this other instruction that is ready. The software and the hardware work together to create an even more real-ish-in-time execution and a parallelism that still DOESN'T actually exist. All of this is very similar to what in Budo is called randori. This is a practice in which a defender takes on multiple opponents. It is translated as "taking chaos." When you have multiple opponents, how does one defend oneself? The scene in The Outlaw Josey Wales (sidewalk scene josey wales - YouTube) in which he took on multiple opponents and was later asked how he managed it is the same. How about an engine block, a horse cart, a sports team? We are very much SIMD - Single Instruction, Multiple Data - creatures trying to extract parallelism and do Multiple Instructions, Multiple Data, MIMD. I can tell you, everything in the universe is the same as processing. If that does not get you high and stimulate the shit out of you then......



Oh, and BTW when you ask about clocks did you know that there is actually a "heartbeat" to a processor? Look it up.....

Blow up your mind....kid. You started the rabbit hole adventure.




TL;DR: shit gets faster because they make the shit that way. Its just faster....where my beer?
 
Blow up your mind....kid. You started the rabbit hole adventure.
First of all - thank you all for your detailed explanations, which - I'm sure - will guide my decision in the right direction. Yep, the hole seems to be deeper than I expected, but I'm right behind Alice and the view is not bad. :)

What is it you want to achieve? Apart from not upgrading for 5 years?
Actually I haven't decided yet whether I want to leave the system as it is for the next 5 years or upgrade it here & there occasionally, while still being able to sell my parts at a more or less good price. Either of those two options supposes buying something above the mid-range, so I'm here to learn how to find a good balance of price/quality.

AMD cores are more sensitive to SMT than intel too.
With a high probability Ryzen 5000 [Zen 3] will be my choice, so it will be great to know what you mean by "sensitivity" here - that AMD tends to "assign" more threads to their cores than Intel does?

If you buy a new CPU today, you'll probably get one with SMT, so you likely won't be comparing a 6C/12T CPU vs an 8C/8T one; both vendors will offer the larger thread counts.
Well, yes - 6/6 or 8/8 etc. is not even an option these days, so going for a higher core count at first glance seems to be a good rule of thumb (the more cores, the more threads anyway), but the difference in price between 6/12 and 8/16 may be a couple of hundred (the cost of a cool PC case). The excellent explanation of SW/HW interaction given by Shikami is probably too technical for my decision-making, but I'm not looking for easy ways either.

For this reason, while enabling SMT adds +100% to the potential thread count on a CPU, the actual positive performance impact is far more limited at between 0% and +15%.
That's a good point (and something new for me!), but I take it as less "bang for the buck", which sucks by itself, but the overall core/thread logic [that I'm trying to understand] stays the same.

If, as an example, you had a choice between a 6-core CPU with SMT (so 12 Threads) and an 8-core CPU without SMT (so 8 threads), then most of the time you would be better off with the 8-core CPU.

I don’t agree with the earlier post

8c/16t is usually in the order of +20-25% faster than 8c/8t in most software.
Okay, this is exactly my headache these days. We have two opposite opinions. Being a novice I'm not going to argue here, but could you guys kindly share links to posts or YouTube videos which help to better understand this 8/8 vs. 8/16 case? There are tons of channels and reviews, but being more experienced than me you probably know which ones are trustworthy.

the 3000 series gets 'more done' per clock than the 2000 series - this is called IPC or instructions per clock.
Is IPC equal across all CPUs of the same architecture (e.g. Zen 3)?

expect future games to start stressing PC CPUs much more heavily.

Lastly, games *are* starting to use lots of cores.

Games are generally reasonably well threaded (4-6 threads minimum usually, sometimes up to 12/16).
I'm not pushing my opinion here (I have none so far :) ), but what I've read is kinda "games usually utilize no more than 4 cores, so a higher clock rate and 6 cores will increase productivity more than the same clock rate and 8 cores. The role of threads in modern games increases every year, but clock rate is still more important". Following this logic and the reality of CPU architecture (more cores = more threads), it makes sense to define the range of cores that 80% of games (hey, Pareto!) use. As threads are "hardwired" to cores (not much of a choice here), go for the highest possible clock rate within that maximum core count you need. Does this logic make sense?
 
My point is that you don't need to worry about thread/core count vs frequency, because with both AMD and Intel the single-threaded performance goes up in tandem with core count. AMD's fastest single-threaded CPU is the 5950X, though only by a small margin.

All you need to decide is how much money you want to spend and how many cores you want. If you plan to game, I recommend a 6-core or better for any current games, or if you want longevity I recommend an 8-core to keep parity with the new generation of consoles.
 
I did share links.

re AMD vs intel.
AMD’s architecture allows for 5 (conditionally 6) instructions per clock cycle to execute per processor core.

Intel’s, in contrast, allows for 3 and conditionally 4 instructions per clock cycle per core.

What this means is that with standard compilers you typically end up with anywhere between 2-8 instructions per clock cycle being issued, and therefore when this number is low, as is typical, you have a large proportion of the core being unused (white blocks below). Symmetric Multi-Processor means multiple cores, not SMT.

[Diagram: core execution slots without SMT - unused units shown as white blocks]


SMT provides the ability to use these unused circuits and get more performance, provided the application (and/or compiler) is thread-aware. It is somewhat trivial to tell the compiler to use multiple threads (see the sketch at the end of this post), but not all software or software paths need to be multithreaded, and sometimes they cannot be (e.g., sometimes you need to work out the result of something before moving on to the next thing).
[Diagram: the same execution slots with SMT - a second thread fills many of the previously idle units]


(images blatantly ripped from here)

Visually a direct comparison is below.
[Diagram: side-by-side comparison of execution with and without SMT]


While this is an old article, it is very relevant to the discussion and explains these concepts in further detail although the images don't work:
https://arstechnica.com/features/2002/10/hyperthreading/
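
To put the "tell the compiler to use multiple threads" point above into something concrete, here is a minimal sketch using an OpenMP directive (an assumption on my part that your toolchain has OpenMP; with g++ you would build with something like -fopenmp, and without that flag the pragma is simply ignored and the loop runs on one thread):

```cpp
// Sketch: asking the compiler/runtime to spread a loop across hardware threads.
// Build with something like: g++ -O2 -fopenmp example.cpp
#include <cstdio>
#include <vector>

int main() {
    std::vector<float> data(1 << 20, 1.0f);
    double total = 0.0;

    // The pragma asks OpenMP to split the iterations across the available
    // hardware threads (physical cores and SMT siblings alike).
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < static_cast<long>(data.size()); ++i)
        total += data[i];

    std::printf("total = %.0f\n", total);      // expected: 1048576
}
```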
 

I'm not pushing my opinion here (I have none so far :) ), but what I've read is kinda "games usually utilize no more than 4 cores, so a higher clock rate and 6 cores will increase productivity more than the same clock rate and 8 cores. The role of threads in modern games increases every year, but clock rate is still more important".
Saying a game doesn't use more than 4 cores is pretty irrelevant. It's all about the threads it uses. CP2077 will use all 16 threads of an 8 core processor happily, but won't run them all at 100% because of thread/process dependencies and frame rates. AC:O(rigins/odyssey) both use 12 threads quite heavily, but won't go much beyond that. I haven't tried valhalla yet. DOOM eternal will use 12 threads at about 60% and sometimes shoot up to 16 when it needs to. Based on my own testing.

In contrast, Starcraft II will typically only use 4-6 threads, and not work them particularly hard. Ori and the will of the wisps is similar (2-4 threads).

Note I am talking processor threads, not actual program/process threads.

If you're doing anything productivity wise (including excel) it will use as many threads as possible.

Naturally if you have fewer threads, then a processor with fewer faster cores will be faster. But in a lot of these applications the reality is that you don't need the extra speed. Ori and the will of the wisps isn't going to play better on faster cores (beyond a certain point)

So in summary: MOST games will use up to 12 threads, and a few top end games will use more if they're there, but you're not going to see a huge jump in performance going beyond 6 cores with SMT for 80% or more of games - so more isn't necessary (generally) unless you're using productivity software at this point in time.

What CP2077 has done, however, is give us a glimpse of the future of computer games in the medium term: AMD 6-core chips are fine provided SMT is available; without it, they suffer a fair bit of a performance deficit.

None of this takes into account AVX(2), which allows one instruction to shoot out multiple data points. This can be compared to multiple threads but on a much bigger scale (1 instruction can shoot out 8 data points, similar to 8 threads), and each core can do 2 of these at once. Intel's AVX units are slightly better than AMD's at this point in time.
 
It is interesting to see threads, x86 decoding, and whatever bits (no pun) of information being tossed around. However, not many know the kernel and the work that it does with the processor and the system. What about the devices which manage much of the data themselves and just report with an interrupt? Ah, the interrupt....this never seems to be mentioned in discussions of systems. There is much more to what happens within, and then to how the system interacts with it.

So for example, you will have an argument that an online game will only use "X" cores. I find these interesting because they never cover overall system usage. That keyboard, mouse, network interface, sound, GPU, and whatever else has to report to the processor (core or other socketed core). The devices can be very independent these days, managing much themselves, but they still have to say "Hey!" These interrupts (primitive) used to take much of a processor's time, with many cycles wasted. That was reduced and improved with changes made over the years, and even interrupts have received a much-needed upgrade with MSI and MSI-X (Message Signaled Interrupts). These are much faster and very capable interrupts that can increase overall system efficiency.

You move the mouse: interrupts. That key input: interrupts. That packet, and there are many coming constantly - sometimes 20K of them: interrupts. When multi-core finally came around, there was a noted difference in system interaction. My favorite was when Intel's HEDT came around. They were far too expensive and never seemed to be worth the cost, since they didn't replace the gaming edge of the lower platform. Fucking Intel bullshit scheme, IMO, to wrench out money. For instance, the X58 and i7-900 series was the last good Intel architecture, platform, and cost... I digress.

With the HEDT systems people noted smoothness, especially in multi-player games. Oh, I wonder why, even though the games were not "multi-core" aware - only using 2 cores or so? This was the time that RSS (Receive Side Scaling) and multi-queue NICs started to come into being (User Space Network Drivers (tum.de)). This is when hardware met hardware and synced. The packets from the game server and host are "bound" to cores. Whether interrupt processing is managed or not (interrupt moderation), buffers will fill (ring and descriptors/RAM) and then allocate to the cores that are bound for interaction and processing, which can be static or dynamic. This alone was a tremendous increase in playability, because old networking was bound to one core only - and still kind of is. Soon, network I/O will be vectorized with FD.io (FDio - The Universal Dataplane). Technically speaking, most of the higher-bandwidth networking cards now offload much of the networking stack because of how much it can really take away from the processor. And! It also shows how much bus utilization they need, given slot installation and bus speed. Many or all of the off-the-shelf routers contain logic that offloads networking and NAT from the RISC processors because they really cannot handle the I/O; even the wireless is offloaded now. The funny thing is many noticed a speed drop when they enabled certain features, due to the offloading engine not being able to offload because of compatibility - for example, QoS or monitoring features of a SOHO router.

Are you starting to get my point? Cores and such, sure...there is more to a system than that. The processor, the system it is connected to, and the base speed of the RAM all interact with everything else. Learning this so you can make a decision is very much recommended, if only so you do not have to listen to MoAr! CahOrs! arguments. There is a difference in system interaction with any architecture and its processing. We are in such an interesting state of processing, and I regret to say that we seem to be regressing in our utilization of it. I didn't need much processing power back in the day to enjoy a good game. I just needed a good game.

Anyways, "utilization" is being spread out, and trying to spread it is very important so as to not bottleneck execution. Everything used to be bound to core 0 and not much beyond, but this is changing, and it can be felt in system interaction.

The Ryzen 3rd gen is an excellent piece of engineering. The Asus Dark Hero, although expensive, is an excellent motherboard. AMD assisted Asus with the design of it, and it is also passively cooled. Just get some dual-rank RAM and don't go above 3733, because that is when you lose your 1:1 ratio.

I'm done, stick a fork in me....need more coffee
 
with both AMD and Intel the single-threaded performance goes up in tandem with core count. AMD's fastest single-threaded CPU is the 5950X
Uhmm... 5950X has 16 cores and 32 threads - why do you call it single-threaded? I'm sure this is just my misunderstanding, but I've always thought "single-threaded" is 6c/6t or 8c/8t, while 5950X is 16c/32t...

Symmetric Multi-Processor means multiple cores, not SMT.
Those pictures you've shared visualize the benefits of SMT persuasively, right. Just for my understanding - SMP (multiple cores) means a CPU with (obviously) many cores, but only 1 thread per core, right? Like the 6/6 or 8/8 I've mentioned above.

CP2077 will use all 16 threads of an 8 core processor
AC:O(rigins/odyssey) both use 12 threads quite heavily
DOOM eternal will use 12 threads at about 60%
In contrast, Starcraft II will typically only use 4-6 threads
MOST games will use up to 12 threads, and a few top end games will use more if they're there, but you're not going to see a huge jump in performance going beyond 6 cores with SMT for 80% or more game
So to summarize - threads are the higher priority for gaming compared to cores and 8 cores will be enough even for a few years ahead - got it.

Naturally if you have fewer threads, then a processor with fewer faster cores will be faster.
Which finally means (and please correct me if I'm mistaken): if there's a choice between 6c/12t or 6c/16t - the latter is better. However, if 6c/16t is too expensive, buy 6c/12t 4GHz rather than 6c/12t 3.5GHz. "Compensate with clock rate when the next jump to more threads is out of your budget" - is this the idea?


Shikami dude, I really respect your thoughtful approach and do want to read that info attentively tomorrow - its almost night here now. :-( Hope your coffee was good! :)
 
Those pictures you've shared visualize the benefits of SMT persuasively, right. Just for my understanding - SMP (multiple cores) means a CPU with (obviously) many cores, but only 1 thread per core, right? Like the 6/6 or 8/8 I've mentioned above.
Correct. SMP means many cores, but only one thread per core in the above diagram. Technically speaking a 6c/12t processor is still SMP, just it's got SMT implemented also.
So to summarize - threads are the higher priority for gaming compared to cores and 8 cores will be enough even for a few years ahead - got it.
Correct.
Which finally means (and please correct me if I'm mistaken): if there's a choice between 6c/12t or 6c/16t - the latter is better. However, if 6c/16t is too expensive, buy 6c/12t 4GHz rather than 6c/12t 3.5GHz. "Compensate with clock rate when the next jump to more threads is out of your budget" - is this the idea?
I think you mean 6c/12t and 8c/16t not 6c/16t.

Generally speaking with that modification, what you said is correct, that said, once you get to 8c/16t, going above is not going to benefit you unless you have software that is going to take advantage of it in your workflow and that is a bottleneck to your work.

I'd add:
  • 6c/12t at 4 GHz will generally be faster than 8c/16t at 3.5 GHz in day-to-day software, as most of the time most users don't use more than 12 threads. This makes it a more compelling option for a lot of users.
  • 8c/16t at 4.8-5 GHz is not uncommon at the moment anyhow.
  • 8c/16t is the sweet spot at the moment for both software and price for the majority of people (the 3700X, 9900K, etc. are economical relatively speaking and very competitive with the higher-end options).
  • The 5600X is an interesting and very good value chip (6c/12t); with its improvements it can be as fast as the previous generation's 8-core chips.
 
Uhmm... 5950X has 16 cores and 32 threads - why do you call it single-threaded?
Because in order to fit 16 cores onto a single CPU package and fit within the heat/power budget, AMD is forced to put the *best* silicon onto their 16 core CPUs. As a result, if you happen to run a single-threaded workload on that 5950X, it will tend to run a little better than running that same workload on a 5900X, 5800X, or 5600X, because the actual chiplets on the 5950X are the best-of-the-best. Running a single-threaded workload will just mean that a single core gets the entire power budget and can boost to its heart's content, and since the chiplets are the best AMD produces you'll get very good boost clocks.
 
I remember when SMP was talked about, and then the boards came out. I was like Yaguchi in High Score Girl talking about the cabinets, consoles, and games. Man, we all wanted Pentium Pro SMP workstations so bad. But it sucked with Windows 95 hybrid code, and you can't really play games with NT, LOL! What is interesting, again, is how the language is not understood. Symmetrical means made of similar parts with equal facing. Back in the day, this was/is a Uniform Memory Architecture. Even though there were multiple sockets with single-core processors, the memory, I/O, and other resources are shared. The opposite is NUMA, Non-Uniform Memory Access: access to the other processor's memory controller(s) and caches is very far from local. Think of it like managing an 8-child Irish family. Technically speaking, UMA and NUMA are like that.

Each has its own particular issues of sharing resources, and not sharing. One good example is how the very first Threadripper suffered from its very own multi-chip-module NUMA. The Ryzen architecture still has particular latencies, but they're much better now thanks to the I/O die, which creates a very cool multi-chip module for us lowly consumers. Still an excellent piece of engineering, and I love what they keep engineering with the Ryzen architecture. I am such a sponge for any chip knowledge. I do miss David Kanter and his chip breakdowns back in the day (Intel's Haswell CPU Microarchitecture (realworldtech.com)). WikiChip is a good place too, but I worry since I have not seen them update in a while (WikiChip).

What is fascinating is how resources are starting to be sharable without stalls and delays in processing for dependencies and such. IIRC, they are trying to implement Remote DMA within the system, which is networking-only at the moment. That is some interesting quantum "spooky" Einstein shit there.

Looks like I digressed a bit again....I'm old

My point is that with SMP you should think of it as equal parts, access, and sharing. Intel's monolithic cores are very SMP, and that is why they are/were faster most of the time. The Ryzen 1st gen was kind of like a NUMA: two CCXs that had equal access but different core module domains. Note how much better 3rd gen becomes when the CCX is 8 cores; it is not just the other improvements made. I guess not many recall the kernel issues, and how the flow of the kernel's management of execution wasn't really that great? How some even changed core affinity to get things to process better? Ryzen 1st gen suffered somewhat due to the Infinity Fabric and the buffers, queues, and caches within, as it needed to transfer dependent data from one core to another core that was in another CCX. Interesting how the architectures are both SMP, but one kind of isn't, no? How one suffers more from data dependencies and kernel management than the other (Linux was much better at it in the beginning).

So, one thing I'll leave you with is something that kind of irks me when I see it. When many are showing Task Manager, Performance tab/CPU, they never seem to have "show kernel times" toggled, and at a high update speed. There is processing, and then there is the kernel moving things around and managing system resources. Note how they can be similar, but they can also each be bottlenecked and maxed out. If you have a file being copied, you will see the kernel line run flatter alongside the processing, since most devices are capable of managing themselves - but there are still interrupts, I/O, moving memory, and shit. However, say you were to erase that file; you will then see processing a bit higher than kernel due to entropy calculation, RNG, SIMD, and other processing necessary to complete the task.

I'll leave you with this example (attached below). It is me playing Cyberpunk 2077. That peak is the loading of a save from an NVMe drive. Note how symmetrical the processing is between physical and logical cores. Note what the CPU processing of the game engine looks like. Then compare the kernel and its management of the I/O, data, input, interrupts, and such. Different architectures and different systems will have different execution and flow........very Tron'ish. Especially when you think of 4.7 billion cycles of moving bits in that 1 second of time.......fuck...my.. mind..*pop*
 

[Attachment: Kernel times.png - Task Manager screenshot with "show kernel times" enabled]
6c/12t at 4 GHz will generally be faster than 8c/16t at 3.5 GHz in day-to-day software, as most of the time most users don't use more than 12 threads.
An important point! So 12 threads today are more than enough for an average user like me, and thus 12t @ 4 GHz is a better choice vs. 16t @ 3.5 GHz, simply because in all likelihood the extra 4 threads won't be utilized at all, regardless of what their clock rate is. Sounds more than persuasive. But here come 2 questions:
  • the current (yr 2021) "rule of thumb of 12t" is surely a temporary thing, and in 2-3 years apps will become more thread-hungry, so e.g. 20t will be "the new 12t" in, say, 2024. Is there a universal approach which may also work in the future, like "buy the CPU with the # of threads at ~70% of the max # available on the market and you'll be just fine"? Or am I simplifying this way too much? 😊
  • what MIN difference in GHz between 2 CPUs is worth considering - 0.5?

Because in order to fit 16 cores onto a single CPU package and fit within the heat/power budget, AMD is forced to put the *best* silicon onto their 16 core CPUs. As a result, if you happen to run a single-threaded workload on that 5950X, it will tend to run a little better than running that same workload on a 5900X, 5800X, or 5600X, because the actual chiplets on the 5950X are the best-of-the-best.
Thank you for this explanation! I've heard of "X" CPUs as being more powerful in terms of clock rate, but it looks like there may be more nuances across the lineup. Are there any details (similar to those you've mentioned about the 5950X) I'd better know about when the AMD Ryzen 5000 [Zen 3] family is considered?
 
I've heard of "X" CPUs as being more powerful in terms of clock rate
That's not actually part of the 'X' nomenclature. As far as I know, the 'X' doesn't signify any specific thing except that it's a higher performance part. Of course, as it currently stands on the desktop-class 5000 series, there aren't yet any non-X parts.

Going back in time to the 3000 series, and comparing say the 3600 to the 3600X, the 3600X came with a better cooler and ever-so-slightly faster boost and base clocks, but under normal circumstances their performance tended to be close enough to each other to be within the margin of error of most benchmarks.

Right now, for desktop Zen3 (5000 series) chips, there are only four models you can buy - the 5600X, 5800X, 5900X, and 5950X. And you make a decision based on budget and core count; nothing else really factors in. Oh and the 5600X comes with a serviceable fan. I'm hoping they eventually release a 5700X that includes the Wraith Prism cooler, mimicking the 3700X/3800X relationship from the Zen2 days, at a price midpoint between the 5600X and 5800X. Despite owning one, I think the 5800X is the least compelling of the 5000 series from a price-per-core standpoint, and I would have purchased a hypothetical 5700X, or even a 5900X had one been available.
 
Are you starting to get my point? Cores and such, sure...there is more to a system than that.
Oh, dude, though there were really complicated technical details involved 😇 I surely grasped your idea that it makes sense to evaluate the overall system as a set of processes, rather than only components. You sound like someone with a solid engineering background, hence your approach, but while it can save a ton of bucks in commercial projects, it contradicts the very idea of consumer lineups (be those CPUs or Gillette razors): buy the best you can afford today, and the sooner you replace it with the "better" next gen, the more profit we get.

I didn't need much processing power back in the day to enjoy a good game. I just needed a good game.
That's true. I'm possibly not much younger than you - I remember playing Prince of Persia 1. Yeah, prince.exe -megahit. Those sweet God modes of the past... 🤣

Just get some dual-rank RAM and don't go above 3733, because that is when you lose your 1:1 ratio.
What do you mean by 1:1 ratio here? Just that ASUS Dark Hero doesn't support RAM with higher clock rates?
 
What do you mean by 1:1 ratio here? Just that ASUS Dark Hero doesn't support RAM with higher clock rates?
On all AMD systems, something called Infinity Fabric is used as a link between the CCXs (core complexes), chiplets, and the I/O die. The IF runs at a clockspeed that is directly tied to memory speed; AMD calls this 'coupled mode' when the memory clock and IF clock are sync'd. But AMD chips will become unstable if the IF is run too high. Typically they top out somewhere between 1800MHz and 1900MHz. DDR4 memory actually runs at half the quoted speed - so DDR4 3600 is actually running at 1800MHz, so with DDR4 3600 on a Ryzen system your IF should be running at 1800MHz, which is pretty good! 3733 MHz memory would put the IF at 1866, which many can handle.

Technically you can decouple the IF and memory clocks and run them independently, but there is an unavoidable latency penalty incurred by the system to do so, and that latency penalty tends to negate any performance benefit from using faster memory. So if you run the memory and IF decoupled, you might be able to set the IF at 1800 and the memory at 4400 or somethin, but the latency penalty will mean your system will run no better (or even worse) than had you just run at 3733 in coupled mode.

Speaking of which, this is the reason that AMD Ryzen systems are known to be 'sensitive' to memory speeds. When Zen 1 came out, it was super important to get your memory running at 3000 MHz, and for Zen2 the target was 3600 MHz. It's never actually been about the speed of the memory, though; in reality, almost all of the performance improvements came from running the IF at a higher clock.
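
A quick back-of-the-envelope sketch of the 1:1 rule described above (the ~1900 MHz fabric ceiling is just the rough figure quoted in this thread, not a spec):

```cpp
// Sketch: DDR4 "3600" is a transfer rate; the real memory clock is half of it,
// and in coupled mode the Infinity Fabric runs at that same clock (1:1).
#include <cstdio>

int main() {
    const int ddr_ratings[] = {3200, 3600, 3733, 4000, 4400};
    const int fabric_ceiling_mhz = 1900;   // rough ceiling quoted above, assumed
    for (int rating : ddr_ratings) {
        int memclk = rating / 2;           // e.g. DDR4-3600 -> 1800 MHz memory clock
        std::printf("DDR4-%d -> memclk %d MHz -> %s\n", rating, memclk,
                    memclk <= fabric_ceiling_mhz ? "1:1 coupled is realistic"
                                                 : "would need decoupling (latency penalty)");
    }
}
```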
 
The rule of thumb of 12t for now is fine.

AVX is the next big thing, or specifically AVX512. We are only just starting to see it come to consumer processors (specifically Rocket Lake and some mobile SKUs at the moment).

AVX allows one instruction to be performed on multiple data elements. With AVX2, which is implemented on most desktop and laptop chips now, this means up to 256 bits' worth of elements, so (for instance) 32 eight-bit numbers can be compared or added or multiplied or whatever in one instruction, and in some cases one clock cycle, rather than 32.

Theoretically, in the above example you could have a 32x speed-up. In practice this doesn't play out quite like that, because you're usually not only doing the one thing; however, the shift to AVX512 will double the register size, allowing even more data to be processed in parallel.
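
As a concrete (and hedged) illustration of the "32 eight-bit numbers in one instruction" point, here is a small AVX2 intrinsics sketch; it assumes an AVX2-capable CPU and a compiler flag such as -mavx2:

```cpp
// Sketch: one AVX2 instruction performs 32 byte-wise additions at once.
// Build with something like: g++ -O2 -mavx2 example.cpp
#include <cstdint>
#include <cstdio>
#include <immintrin.h>

int main() {
    alignas(32) std::uint8_t a[32], b[32], out[32];
    for (int i = 0; i < 32; ++i) { a[i] = static_cast<std::uint8_t>(i); b[i] = 1; }

    __m256i va   = _mm256_load_si256(reinterpret_cast<const __m256i*>(a));
    __m256i vb   = _mm256_load_si256(reinterpret_cast<const __m256i*>(b));
    __m256i vsum = _mm256_add_epi8(va, vb);   // 32 additions in a single instruction
    _mm256_store_si256(reinterpret_cast<__m256i*>(out), vsum);

    std::printf("out[0]=%d out[31]=%d\n", static_cast<int>(out[0]), static_cast<int>(out[31]));
}
```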

What all this could mean is a 10c/20t cpu with AVX512 could eclipse a 20c/40t cpu without it.

Of course this depends on software implementations, but so does thread parallelism.

What I am getting at is that any general rule you come up with for 2021 may be swiftly negated in 2024 if software keeps up.

That said, peak technology games work ok on fast 4c/8t chips today, so there won’t be a huge rush to get the latest and greatest.
 
AVX is the next big thing, or specifically AVX512

What all this could mean is a 10c/20t cpu with AVX512 could eclipse a 20c/40t cpu without it.
This is very optimistic.

First off, AVX512 isn't a single thing. Unlike previous iterations of the AVX extensions, AVX512 is a set of a bunch of processor extensions that can (and are) implemented independently. I'm not even remotely versed enough in how CPUs work to speak with authority on how processor instructions 'work', or how to determine which extensions are important and which are not, but other than the base AVX-512F and CD instructions the rest of the AVX512 family of instructions has been very unevenly implemented across the Intel CPUs. You can look at it here if you like. In addition, in current Intel CPUs use of AVX-512 causes frequency throttling on the CPUs, and a large power and thermal load. Love him or hate him, Linus Torvalds (creator and chief maintainer of Linux) has a pretty serious bone to pick with AVX-512 as a result.

My point isn't that AVX512 is bad, my point is the idea that it is "the next big thing" - especially within the context of common office and gaming applications - is questionable at best, and possibly just flat wrong.
 
This is very optimistic.

First off, AVX512 isn't a single thing. Unlike previous iterations of the AVX extensions, AVX512 is a set of a bunch of processor extensions that can (and are) implemented independently. I'm not even remotely versed enough in how CPUs work to speak with authority on how processor instructions 'work', or how to determine which extensions are important and which are not, but other than the base AVX-512F and CD instructions the rest of the AVX512 family of instructions has been very unevenly implemented across the Intel CPUs. You can look at it here if you like. In addition, in current Intel CPUs use of AVX-512 causes frequency throttling on the CPUs, and a large power and thermal load. Love him or hate him, Linus Torvalds (creator and chief maintainer of Linux) has a pretty serious bone to pick with AVX-512 as a result.

My point isn't that AVX512 is bad, my point is the idea that it is "the next big thing" - especially within the context of common office and gaming applications - is questionable at best, and possibly just flat wrong.

While I appreciate your point of view... that is the perspective of someone who hasn't played with AVX. 512-bit registers are a big deal; they're not implemented independently, they exist in parallel to the integer instructions.

Regarding implementation, per your link we are talking about Rocket Lake and Ice Lake, which support the most instructions to date.

In addition to that, while Intel chips do throttle, the current net % difference in speed is 3% or less, and it seems to mainly be the case when you're talking about a lot of cores, which makes sense considering the AVX units are huge compared to the rest of the chip. Usually you're working at 32/64 bits; when you bump that up 4x or more, you've got more hardware doing processing.

This outlines what throttling will (most likely) look like on consumer processors in future - as in, not much.

This outlines the key benefits in the short term from an intensely software-development-focused perspective, and from a consumer/HPC perspective:

Here’s what we’ve learned.

  • The Ice Lake i5-1035 CPU exhibits only 100 MHz of licence-based downclock with 1 active core when running 512-bit instructions.
  • There is no downclock in any other scenario.
  • The all-core 512-bit turbo frequency of 3.3 GHz is 89% of the maximum single-core scalar frequency of 3.7 GHz, so within power and thermal limits this chip has a very “flat” frequency profile.
  • Unlike SKX, this Ice Lake chip does not distinguish between “light” and “heavy” instructions for frequency scaling purposes: FMA operations behave the same as lighter operations.
So on ICL client, you don’t have to fear the downclock. Only time will tell if this applies also to ICL server.

The vast majority of software (including games and office software like excel, and powerpoint, and word even) uses AVX to some extent; extending this to 512 can be as simple as a recompile if it has been coded right.
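
A sketch of what "just a recompile" can mean in practice, with the caveat that the flag names are GCC's and that whether the compiler actually emits 512-bit code for a given loop depends on its own cost model for the target CPU:

```cpp
// Sketch: a simple loop a vectorizing compiler can auto-vectorize.
// The same source, recompiled with different flags, changes the vector width used:
//   g++ -O3 -mavx2    example.cpp   -> 256-bit (AVX2) instructions
//   g++ -O3 -mavx512f example.cpp   -> may use 512-bit (AVX-512) instructions,
//                                      subject to the compiler's heuristics.
#include <cstddef>

void scale_add(float* dst, const float* src, float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];   // independent iterations, straightforward to vectorize
}
```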

Torvalds can go suck eggs - his issue is that his integer code won't run as fast when AVX is used. The thing is, that was a problem with v1 of AVX-512 on Skylake, and it has been addressed for the coming Intel chips, as outlined above.
 
The vast majority of software (including games and office software like excel, and powerpoint, and word even) uses AVX to some extent
Yes, but not AVX512. They use the older iterations that have widespread support among client CPUs. Implementation of instruction set extensions like the various iterations of SSE and AVX tends to lag significantly behind their introduction, until they are widely available in the market - and, as we both pointed out, only the most recent (and unreleased) Intel CPUs have even begun to support the bulk of the AVX-512 instruction set.

And, I can reiterate - my point is not that AVX-512 is bad; it isn't. My point is that it's not going to be the next big thing. Perhaps it *should* be the next big thing, but it's *not* going to be. AMD doesn't yet support it, and like him or not Torvalds is pretty damned influential in terms of linux software development. If he's dragging his feet, it's not exactly going to speed up adoption.
 
Yes, but not AVX512. They use the older iterations that have widespread support among client CPUs. Implementation of instruction set extensions like the various iterations of SSE and AVX tends to lag significantly behind their introduction, until they are widely available in the market - and, as we both pointed out, only the most recent (and unreleased) Intel CPUs have even begun to support the bulk of the AVX-512 instruction set.

And, I can reiterate - my point is not that AVX-512 is bad; it isn't. My point is that it's not going to be the next big thing. Perhaps it *should* be the next big thing, but it's *not* going to be. AMD doesn't yet support it, and like him or not Torvalds is pretty damned influential in terms of linux software development. If he's dragging his feet, it's not exactly going to speed up adoption.

You do know, that in most cases AVX (and 512) are literally a case of using a compiler switch to enable.. right?

And you know that Zen 2 is set up architecturally for it...

And you know that most people on consumer hardware don't use linux
 
You do know, that in most cases AVX (and 512) are literally a case of using a compiler switch to enable.. right?
Yep.

And you know that Zen 2 is set up architecturally for it...
And yet, AVX-512 remains unsupported even on Zen3, let alone Zen2. To my knowledge they don't even have instruction level compatibility with AVX-512, let alone dedicated silicon on the chip for the task.

You seem to be arguing from the premise that I think AVX-512 is a bad idea. I do not. It sounds great. It's just not top of the list for 'next big thing'. Thanks to Zen 3, the 'next big thing' has become IPC improvements on general purpose x86/x86-64 code, which is a big win for all involved now that Intel is following suit as well.
 
That's not actually part of the 'X' nomenclature. As far as I know, the 'X' doesn't signify any specific thing
Well, I was referring to the 3000 series (Zen2), where "X" meant higher stock clocks. Looks like there's no such grading in the 5000 series (Zen3), at least so far.

Going back in time to the 3000 series, and comparing say the 3600 to the 3600X, the 3600X came with a better cooler and ever-so-slightly faster boost and base clocks

under normal circumstances their performance tended to be close enough to each other to be within the margin of error of most benchmarks.
Sounds like the reason why they've ended this clock grading in 5000s. :D

Oh and the 5600X comes with a serviceable fan. I'm hoping they eventually release a 5700X that includes the Wraith Prism cooler
Though questions on cooling were a bit further along in my "roadmap", may I ask you for a couple of words about the benefits of a serviceable fan (something more than easier cleaning?) and the Wraith Prism cooler?

But AMD chips will become unstable if the IF is run too high. Typically they top out somewhere between 1800MHz and 1900MHz

this is the reason that AMD Ryzen systems are known to be 'sensitive' to memory speeds
Useful info, thank you! How do I know the IF parameter for the Zen3 architecture? Is it the same for all CPUs of the series, regardless of their own clocks?
 
What I am getting at is that any general rule you come up with for 2021 may be swiftly negated in 2024 if software keeps up.
Yeah, this is true - makes no sense to plan so much ahead.

I'm not techie enough to join your vivid conversation with sinisterDei on AVX(512) technology, but being from marketing myself I surely know this: regardless of how brilliant the technology is, its chances of becoming widespread are directly dependent on the potential market share to be gained. Otherwise it stays niche, albeit brilliant. Just my 5c to this talk.
 
serviceable fan
By serviceable, I don't mean maintainable. I mean serviceable in the sense that it acceptably performs the service of cooling the CPU, and if your usage isn't heavy duty you may not even need to replace the heatsink/fan with an aftermarket unit. The Wraith Prism is the higher-end heatsink/fan included with the 3700X (and higher) and is a very competent cooler.

How do I know the IF parameter for the Zen3 architecture?
Zen3's IF seems only slightly more tolerant of high frequencies. 1900MHz (DDR4 3800) is more common on Zen3 than Zen2. But it's not a huge difference, and performance is perfectly fine anywhere between 1800 and 1900 MHz.
 
Yeah, this is true - makes no sense to plan so much ahead.

I'm not techie enough to join your vivid conversation with sinisterDei on AVX(512) technology, but being from marketing myself I surely know this: regardless of how brilliant the technology is, its chances of becoming widespread are directly dependent on the potential market share to be gained. Otherwise it stays niche, albeit brilliant. Just my 5c to this talk.
Fair, but Intel has very high deployment vs AMD... I see things moving in that direction.
 
What do you mean by 1:1 ratio here? Just that ASUS Dark Hero doesn't support RAM with higher clock rates?
Others here did most of the work, but this has pictures!! And stuff!!:





AVX of any bit width is a big deal; and 512-bit registers, HeHZues! I must agree: when Linus speaks, I listen. People think he is a dick, but I get those that are like him. However, with Intel's clout those extensions are going to be adopted to some degree. The only issue I become irate about is that it seems most of the time when AVX is utilized consumer-wise, parts (or more) of it are taken out. Recently it was Cyberpunk, and before that Assassin's Creed (q.v. Hotfix 1.05 - Cyberpunk 2077 — from the creators of The Witcher 3: Wild Hunt).
  • Removed the use of AVX instruction set thus fixing crashes occurring at the end of the Prologue on processors not supporting AVX.
I remember an article; can't recall if it was the early 2000s or late 90s. But it had to do with the elegance of the code that AMD often does in comparison to Intel's. I do believe this was for 3DNow. It mentioned, similar to Linus, how badly things are implemented when comparing the architectures, hardware and software (e.g. SSE vs 3DNow). AMD has a tendency to do things with a bit more grace, such as x64. HOWEVER, one thing AMD cannot do is market well and create monikers well. Cool'n'Quiet, 3DNow!... gag........just a little vomit, sorry.

I have to admit I prefer the terms MMX, SSE, AVX - but also, I feel, because they are similar to the ISA naming. I was very happy with the Ryzen moniker, and also the enso emblem (Ensō - Wikipedia). Also, having terms such as AFR, SenseMI, et al. was much more palatable.
 
Re AVX and Cyberpunk, I think there was something lost in translation, as CP2077 still uses AVX...
 
OP, let me give you an even easier explanation if I may.

IT DOESN'T MATTER!!!!

Depending on the resolution, you won't benefit much, if at all, from going with, say, a 6-core vs a 16-core CPU (or higher). Don't expect anywhere near even a 60% performance increase.

And unless you have a specific application that benefits from more cores/threads, like rendering or image editing, you won't really notice much of a difference.
That said, if you have software that can take advantage of many cores, then by all means get the most cores you can afford.
 
Thank you all guys for spending your time and being so helpful - I highly appreciate it! Being absolutely honest here, I'd say this is the most friendly forum, at least as far as its General sub-forum is concerned. 👍

what MIN difference in GHz between 2 CPUs is worth considering - 0.5?
Trying to understand here how much (in +% relative to the price of the CPU with the lower clock) it makes sense to pay for higher clocks. E.g. "+0.2 GHz shouldn't cost more than +10% of the price" (given the rest of the parameters, like cores/threads, are similar).

And unless you have a specific application that benefits from more cores/threads, like rendering or image editing, you won't really notice much of a difference.
That said, if you have software that can take advantage of many cores, then by all means get the most cores you can afford.
Thank you for your input! The problem is that while in broad strokes everything is clear (i.e. the more you can afford, the better), I'm not a supporter of a straightforward approach, especially where complicated tech stuff is concerned. There are so many dependencies between components that one can be a sure winner only by buying the very top of the range, but that will drive the price up to like $3000-4000, I guess. So while there definitely is an easier way, trying to get more efficiency for reasonable money here is kind of a challenge - a sport, if you will. Finally, "well, boys, uh?".
 