Ghost of Cyrix
n00b
- Joined
- Jul 5, 2017
- Messages
- 12
Hi there, I'm new to posting here but have been a longtime lurker, and I want to discuss something that both fascinates me from a tech-nerd perspective and bothers me as a chronically CPU-bound PC gamer. I realize this is a really long post and apologize in advance, but it's a topic I've wanted to discuss for a good long while. You can probably get the gist of the question from my topic title and by skipping to the last paragraph (under "Conclusion"). If anyone knows a forum that tends to be more knowledgeable about or interested in this sort of thing, I'd appreciate being pointed in that direction too.
Can Intel or any CPU-designing company for that matter actually and substantially progress consumer-level/gaming desktop CPU performance beyond where we are now with a typical overclocked Coffee Lake i7? There are three interrelated avenues to consider on the matter: (1) clock speeds, (2) core count, and (3) IPC.
Clock Speeds
The history of my view on this topic begins with the death of Dennard scaling around 2005. Before the current concern about Moore's Law coming to an end, I view Dennard scaling's demise as the bigger foretelling of stagnation in PC CPU performance. Dennard scaling is, roughly, the observation that as transistors shrank, power density stayed constant, so voltage and frequency could scale alongside Moore's Law; it's what gave Moore's Law teeth for high-performance markets. When Dennard scaling died, the prospective future of clock speed increases died with it.
We've since seen the results of this. Viable consumer-level clock speeds as a whole have barely increased from the Pentium 4's heights and more-or-less peaked with Sandy Bridge's upper 4.x - lower 5.x GHz limits. Successive generations after Sandy Bridge even had trouble reaching those same frequencies and we've only more-or-less returned to or slightly exceeded Sandy Bridge's limits with Kaby Lake and Coffee Lake after Intel deliberately stuck to and refined one of their existing process nodes for multiple generations.
Concurrently, AMD is having trouble even breaking 4.0 GHz with Ryzen (having previously peaked around 5 GHz overclockable frequencies with Vishera), and even non-x86 CPUs seem to top out at ~5 GHz (see: SPARC M8). It appears that clock speeds are going nowhere from here (or at least not up), so that leaves IPC and more cores as potential avenues of performance gain. Perhaps a theoretical switch to semiconductors other than silicon might get frequency on the rise again, but those other materials are not as abundant and cheap as silicon (a necessity for economic viability), and all of the work done to get silicon to its current transistor sizes might be useless for another material, making any switch highly difficult at the architectural level.
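To put the post-Dennard problem in concrete terms, here's a minimal sketch of the classic dynamic CMOS power relation P ≈ C·V²·f. All numbers are illustrative assumptions, not real chip values. While Dennard scaling held, each node shrink let voltage drop as frequency rose, so power stayed roughly flat; once voltage hit its practical floor, every further clock bump started costing power super-linearly (higher clocks usually demand extra voltage, and V is squared):

```python
# Dynamic CMOS power: P ~ C * V^2 * f
# (effective switched capacitance, supply voltage, clock frequency).
def dynamic_power(c_farads, v_volts, f_hz):
    return c_farads * v_volts**2 * f_hz

C = 1e-9  # illustrative effective switched capacitance, farads

# Dennard era: doubling the clock while dropping voltage -> power roughly flat.
p_old = dynamic_power(C, 1.4, 2.0e9)   # ~3.9 W
p_new = dynamic_power(C, 1.0, 4.0e9)   # ~4.0 W at 2x the clock

# Post-Dennard: voltage is stuck near its floor, so another clock doubling
# (which in practice needs *more* voltage) blows the power budget.
p_stuck = dynamic_power(C, 1.1, 8.0e9)  # ~9.7 W

print(p_old, p_new, p_stuck)
```

This is exactly why the Pentium 4's projected 10 GHz future never materialized: the knob that used to make higher clocks free stopped turning.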
Core Counts
Now, the death of Dennard scaling pushed the industry toward multi-core CPUs as a non-ideal band-aid. It is important to note that the workloads CPUs handle, particularly in the consumer space, are generally inherently serial code. The first few steps toward additional cores were relatively low-hanging fruit and could be used for some multitasking and multithreaded gains in the few aspects of consumer software/games that can actually leverage multiple cores to a useful extent. But that avenue is limited. Again, CPUs are specifically designed to handle largely serial code; embarrassingly parallel work is largely relegated to GPUs, which are much more specialized for such workloads. In games particularly, CPUs tend to handle game logic, AI, and physics, all with lots of dependencies, and as a result we see increasingly limited gains from multiple cores. See Amdahl's Law for an algorithmic demonstration of the problem.
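For anyone who wants to play with the numbers, Amdahl's Law is just speedup = 1 / ((1 − p) + p/n), where p is the parallelizable fraction of the work and n is the core count. A quick sketch:

```python
def amdahl_speedup(p, n):
    """Speedup on n cores for a workload whose parallelizable fraction is p."""
    return 1.0 / ((1.0 - p) + p / n)

# Even a 90%-parallel workload is hard-capped at 10x, no matter the core count:
print(amdahl_speedup(0.90, 6))          # 4.0x on 6 cores
print(amdahl_speedup(0.90, 1_000_000))  # ~10x ceiling with effectively infinite cores
```

The serial 10% dominates almost immediately: going from 6 cores to a million only buys another ~2.5x. That's the hard cap dependencies impose.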
Personally, I play many CPU-heavy games (the nature of PC exclusives such as simulation, strategy, and MMO games). I have a 5.0 GHz 8700K with DDR4-3600… and I'm CPU-limited in many, many games. Sometimes CPU utilization is spread heavily across all 12 threads, as in Watch Dogs 2, AC: Origins, Dying Light, and Civ VI. Other times (more annoyingly), CPU utilization is somewhat more disproportionate (by necessity, mind you; claiming "lazy programming" or "poor optimization" casually and insultingly dismisses the real limitations of logic and physics by trying to unfairly shift blame to developers and programmers), and/or adding cores in these CPU-bound scenarios yields extremely diminishing returns, if any at all.
From my personal testing, Civ VI (best-case scenario I have) uses many threads and benefits by about 30% FPS gain from 50% more cores going from 4 to 6 (presumably due to effective spreading of draw calls on a draw call-heavy, late-game map). On the other hand, games like Planet Coaster and Cities: Skylines barely gain 10-15% FPS with 50% more cores from 4 to 6 (despite being heavily multithreaded and remaining CPU-bound on my 1080 Ti).
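As a sanity check on those measurements, you can invert Amdahl's Law to estimate what parallel fraction p a measured 4-to-6-core speedup implies. This is a rough model: it assumes perfect scheduling and ignores memory contention, turbo residency, and everything else that muddies real game benchmarks.

```python
def implied_parallel_fraction(speedup, n1, n2):
    """Parallel fraction p implied by a measured speedup going from n1 to n2
    cores, under the simple Amdahl model T(n) = (1 - p) + p / n."""
    return (speedup - 1.0) / (speedup * (1.0 - 1.0 / n2) - (1.0 - 1.0 / n1))

# ~30% FPS gain from 4 -> 6 cores (my Civ VI case) implies ~90% parallel work:
print(implied_parallel_fraction(1.30, 4, 6))   # ~0.90
# ~12.5% gain (the Planet Coaster / Cities: Skylines ballpark) implies ~67%:
print(implied_parallel_fraction(1.125, 4, 6))  # ~0.67
```

Two things fall out of this toy model: even the "barely scaling" games may be mostly parallel under the hood and still scale poorly, and even the best case (p ≈ 0.9) would gain at most another ~2.5x from infinitely many cores, with each core doubling yielding less than the last.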
Consequently, it seems fairly clear that additional cores have their limits, especially from this point forward where we've already made the multicore shift and have 6- to 8-core SMT/hyperthreaded consumer CPUs; speaking personally, per-core CPU performance is much more of a pressing concern in the most CPU-intensive of games. There are simply limits, imposed by dependencies and the nature of game logic, that put a hard cap on multi-core scaling in many gaming scenarios. Therefore, scaling onward to more cores in the future, while it may help some games in aspects that benefit from additional threads and reduced context switching, isn't really a viable cure for many demanding consumer tasks such as CPU-heavy games.
IPC
So finally, and most murkily, we have potential IPC (instructions per cycle, or instructions per clock) gains. IPC is not a universal, constant unit; it emerges from a combination of ILP (instruction-level parallelism, exploited through factors such as branch prediction and speculative execution), memory subsystem performance (caches + RAM), other architectural details, and the specific demands of any particular piece of software.
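One way to see why IPC isn't a fixed property of a chip: a textbook-style CPI model where memory stall cycles are added on top of an ideal pipeline CPI. Every number below is an illustrative assumption, not a measurement of any real core:

```python
def effective_ipc(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_cycles):
    """IPC under a simple 'ideal pipeline + memory stalls' CPI model:
    CPI = base_CPI + (memory refs/instr) * (miss rate) * (miss penalty)."""
    cpi = base_cpi + mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return 1.0 / cpi

# A hypothetical 4-wide core (ideal CPI 0.25) on cache-friendly code
# (0.3 memory refs per instruction, 0.1% miss rate, 200-cycle DRAM penalty):
print(effective_ipc(0.25, 0.3, 0.001, 200))  # ~3.2 IPC
# The exact same core on pointer-chasing code with a 2% miss rate:
print(effective_ipc(0.25, 0.3, 0.02, 200))   # ~0.69 IPC
```

Same core, same clock, wildly different "IPC." This is why memory-heavy games separate architectures far more than ALU-bound benchmarks do, and why "IPC" comparisons are only meaningful per workload.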
Now, for several years (despite the lackluster results), Intel has actually been beefing up its microarchitectures to try to improve IPC. Haswell and Skylake attempted substantive improvements in ILP (scheduling and out-of-order windows) and memory performance (through cache bandwidth improvements on Haswell, and DDR4 plus more minor tweaks on Skylake), while Haswell also significantly expanded the design's execution engine. The end result, infamously, has not been particularly groundbreaking IPC gains (even ignoring the clock speed regressions that hampered end performance from Ivy Bridge to Skylake). So what effect might further attempts to improve IPC even have, and how can IPC be improved without an offsetting adverse effect on clock speeds from increased pipeline complexity? All I know in particular is that the eDRAM Intel put on Broadwell-C (5775C/5675C) was actually quite helpful in some games. Is more cache a path forward Intel can/would take?
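On the eDRAM question, extending the same stall model gives a feel for why a big L4 victim cache helped: if it intercepts some fraction of last-level-cache misses at much lower latency than DRAM, the average miss penalty drops. Again, every number here is an assumption for illustration; real hit rates and latencies vary wildly by workload:

```python
def ipc_with_l4(base_cpi, mem_refs, llc_miss_rate, l4_hit_frac,
                l4_latency, dram_latency):
    """IPC when a fraction of LLC misses are serviced by an L4 cache
    instead of paying the full DRAM penalty."""
    avg_penalty = l4_hit_frac * l4_latency + (1.0 - l4_hit_frac) * dram_latency
    return 1.0 / (base_cpi + mem_refs * llc_miss_rate * avg_penalty)

# Without an L4 (hit fraction 0), every LLC miss pays 200 cycles of DRAM:
no_l4 = ipc_with_l4(0.25, 0.3, 0.02, 0.0, 50, 200)
# With a hypothetical L4 catching 60% of those misses at 50 cycles:
with_l4 = ipc_with_l4(0.25, 0.3, 0.02, 0.6, 50, 200)
print(no_l4, with_l4, with_l4 / no_l4)  # ~0.69, ~1.10, ~1.6x in this toy case
```

A real game is nowhere near this memory-bound end to end, so the toy model exaggerates the effect, but it shows the mechanism: cutting average miss latency raises IPC without touching pipeline complexity at all, which is exactly the appeal of the more-cache route.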
Lack of competition as the problem?
Since Sandy Bridge, we haven’t really gotten too much performance improvement given how long it’s been and given the design improvements Intel actually attempted. And I’m not convinced it was just because Intel were holding back from a lack of competition or anything (different story for the lack of solder and core counts/price though, imo; I’m only speaking about my views on per-core performance with regards to competitive pressure).
Consider, for instance, that despite Intel’s slow improvement, AMD’s return to the game with a wholly new Ryzen design still sees them substantially trailing Intel’s per-core performance, max OC to max OC. If Intel was really sitting on some treasure trove of improvements, I would think the Ryzen team would have exploited some and at least reached parity with Intel or capitalized on their relative stagnation to fully catch up (or project doing so rather than merely keeping expectations low by citing annual average improvement rates in the industry of 7-8%), but that’s not the case.
Ryzen does have a substantially similar amount of "IPC" in tasks that don't stress its memory subsystem, but tasks that do (such as a good number of games) see the gap widen, and clock speeds on Ryzen are also quite low compared to Coffee Lake's capabilities (in fact, we have to go back to Nehalem to see the same frequency limitations on Intel's designs). While it may be tempting to blame those frequency limitations entirely on GF's current process node, one can't simply assume that, given that pipeline complexity can have (and has had) a substantial impact on achievable clock speeds. Look no further than Maxwell vs. GCN on the same 28nm process node for an example.
Given how difficult it seems to be to match Intel's current per-core capabilities, I think for the purposes of discussion we should presently entertain the possibility that Intel is choosing a near-ideal IPC-vs-frequency trade-off, and may even be pushing the limits of what can physically be done, until we see either AMD or Intel substantially leap ahead of a 5 GHz Coffee Lake per-core design on current materials. And now, even some of the speculative execution techniques used to reach current performance levels are threatened by security vulnerabilities like Spectre.
Conclusion
At the end of the day, I’m wondering how much Intel could even improve their architectures (ignoring core count right now simply because of its limitations for consumer purposes). More frequency seems off the table and previous generation improvements in ILP and memory subsystems haven’t exactly yielded impressive performance gains and have their own diminishing returns. I’m skeptical as to how much more “IPC” Intel or AMD could even extract from these designs and also need to point out that the increased pipeline complexity that comes with improved “IPC” may also further limit frequencies as a tradeoff. Furthermore, this all has me wondering… have we more-or-less simply reached the limit of what’s possible? Will we even be able to look back a decade from now and notice even a 40-50% aggregate gain?