Ghost of Cyrix
n00b
- Joined
- Jul 5, 2017
- Messages
- 12
Hi there, I'm new to posting here but have been a longtime lurker, and I want to discuss something that both fascinates me from a tech-nerd perspective and bothers me as a chronically CPU-bound PC gamer. I realize this is a really long post and apologize in advance, but it's a topic I've wanted to discuss for a good long while. You can probably get the gist of the question from my topic title and by skipping to the last paragraph (under "Conclusion"). If anyone knows a forum that tends to be more knowledgeable about or interested in this sort of thing, I'd appreciate being pointed in that direction too.
Can Intel or any CPU-designing company for that matter actually and substantially progress consumer-level/gaming desktop CPU performance beyond where we are now with a typical overclocked Coffee Lake i7? There are three interrelated avenues to consider on the matter: (1) clock speeds, (2) core count, and (3) IPC.
Clock Speeds
The history of my view on this topic begins with the death of Dennard scaling around 2005. Before the current concern about Moore's Law coming to an end, I view Dennard scaling's demise as the bigger foretelling of stagnation in PC CPU performance. Dennard scaling is, roughly, the observation that as transistors shrank, power density stayed constant, so voltage and frequency could scale alongside Moore's Law; it's what gave Moore's Law teeth for high-performance markets. When Dennard scaling died, the prospective future of clock speed increases died with it.
We've since seen the results of this. Viable consumer-level clock speeds as a whole have barely increased from the Pentium 4's heights and more-or-less peaked with Sandy Bridge's upper 4.x - lower 5.x GHz limits. Successive generations after Sandy Bridge even had trouble reaching those same frequencies and we've only more-or-less returned to or slightly exceeded Sandy Bridge's limits with Kaby Lake and Coffee Lake after Intel deliberately stuck to and refined one of their existing process nodes for multiple generations.
Concurrently, AMD is having trouble even breaking 4.0 GHz with Ryzen (having previously peaked around 5 GHz overclockable frequencies with Vishera), and even non-x86 CPUs seem to top out at ~5 GHz (see: SPARC M8). It appears that clock speeds are going nowhere from here (or at least not up), so that leaves IPC and more cores as potential avenues of performance gain. Perhaps a theoretical switch to semiconductors other than silicon might get frequency on the rise again, but those other materials are not as abundant and cheap as silicon (a necessity for economic viability), and all of the work done to get silicon to its current transistor sizes might be useless for another material, making any switch highly difficult at the architectural level.
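To put the post-Dennard problem in concrete terms, here's a minimal sketch of the classic dynamic CMOS power relation P ≈ C·V²·f. All numbers are illustrative assumptions, not real chip values. While Dennard scaling held, each node shrink let voltage drop as frequency rose, so power stayed roughly flat; once voltage hit its practical floor, every further clock bump started costing power super-linearly (higher clocks usually demand extra voltage, and V is squared):

```python
# Dynamic CMOS power: P ~ C * V^2 * f
# (effective switched capacitance, supply voltage, clock frequency).
def dynamic_power(c_farads, v_volts, f_hz):
    return c_farads * v_volts**2 * f_hz

C = 1e-9  # illustrative effective switched capacitance, farads

# Dennard era: doubling the clock while dropping voltage -> power roughly flat.
p_old = dynamic_power(C, 1.4, 2.0e9)   # ~3.9 W
p_new = dynamic_power(C, 1.0, 4.0e9)   # ~4.0 W at 2x the clock

# Post-Dennard: voltage is stuck near its floor, so another clock doubling
# (which in practice needs *more* voltage) blows the power budget.
p_stuck = dynamic_power(C, 1.1, 8.0e9)  # ~9.7 W

print(p_old, p_new, p_stuck)
```

This is exactly why the Pentium 4's projected 10 GHz future never materialized: the knob that used to make higher clocks free stopped turning.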
Core Counts
Now, the death of Dennard scaling pushed the industry toward multi-core CPUs as a non-ideal band-aid. It is important to note that the workloads CPUs handle, particularly in the consumer space, are generally inherently serial code. The first few steps toward additional cores were relatively low-hanging fruit and could be used for some multitasking and multithreaded gains in the few aspects of consumer software/games that can actually leverage multiple cores to a useful extent. But that avenue is limited. Again, CPUs are specifically designed to handle largely serial code; embarrassingly parallel work is largely relegated to GPUs, which are much more specialized for such workloads. In games particularly, CPUs tend to handle game logic, AI, and physics, all with lots of dependencies, and as a result we see increasingly limited gains from multiple cores. See Amdahl's Law for an algorithmic demonstration of the problem.
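For anyone who wants to play with the numbers, Amdahl's Law is just speedup = 1 / ((1 − p) + p/n), where p is the parallelizable fraction of the work and n is the core count. A quick sketch:

```python
def amdahl_speedup(p, n):
    """Speedup on n cores for a workload whose parallelizable fraction is p."""
    return 1.0 / ((1.0 - p) + p / n)

# Even a 90%-parallel workload is hard-capped at 10x, no matter the core count:
print(amdahl_speedup(0.90, 6))          # 4.0x on 6 cores
print(amdahl_speedup(0.90, 1_000_000))  # ~10x ceiling with effectively infinite cores
```

The serial 10% dominates almost immediately: going from 6 cores to a million only buys another ~2.5x. That's the hard cap dependencies impose.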
Personally, I play many CPU-heavy games (the nature of PC exclusives such as simulation, strategy, and MMO games). I have a 5.0 GHz 8700K with DDR4-3600… and I'm CPU-limited in many, many games. Sometimes CPU utilization is spread heavily across all 12 threads, as in Watch Dogs 2, AC: Origins, Dying Light, and Civ VI. Other times (more annoyingly), CPU utilization is somewhat more disproportionate (by necessity, mind you; claiming "lazy programming" or "poor optimization" casually and insultingly dismisses the real limitations of logic and physics by trying to unfairly shift blame to developers and programmers), and/or adding cores in these CPU-bound scenarios yields extremely diminishing returns, if any at all.
From my personal testing, Civ VI (best-case scenario I have) uses many threads and benefits by about 30% FPS gain from 50% more cores going from 4 to 6 (presumably due to effective spreading of draw calls on a draw call-heavy, late-game map). On the other hand, games like Planet Coaster and Cities: Skylines barely gain 10-15% FPS with 50% more cores from 4 to 6 (despite being heavily multithreaded and remaining CPU-bound on my 1080 Ti).
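As a sanity check on those measurements, you can invert Amdahl's Law to estimate what parallel fraction p a measured 4-to-6-core speedup implies. This is a rough model: it assumes perfect scheduling and ignores memory contention, turbo residency, and everything else that muddies real game benchmarks.

```python
def implied_parallel_fraction(speedup, n1, n2):
    """Parallel fraction p implied by a measured speedup going from n1 to n2
    cores, under the simple Amdahl model T(n) = (1 - p) + p / n."""
    return (speedup - 1.0) / (speedup * (1.0 - 1.0 / n2) - (1.0 - 1.0 / n1))

# ~30% FPS gain from 4 -> 6 cores (my Civ VI case) implies ~90% parallel work:
print(implied_parallel_fraction(1.30, 4, 6))   # ~0.90
# ~12.5% gain (the Planet Coaster / Cities: Skylines ballpark) implies ~67%:
print(implied_parallel_fraction(1.125, 4, 6))  # ~0.67
```

Two things fall out of this toy model: even the "barely scaling" games may be mostly parallel under the hood and still scale poorly, and even the best case (p ≈ 0.9) would gain at most another ~2.5x from infinitely many cores, with each core doubling yielding less than the last.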
Consequently, it seems fairly clear that additional cores have their limits, especially from this point forward where we've already made the multicore shift and have 6- to 8-core SMT/hyperthreaded consumer CPUs; speaking personally, per-core CPU performance is much more of a pressing concern in the most CPU-intensive of games. There are simply limits, imposed by dependencies and the nature of game logic, that put a hard cap on multi-core scaling in many gaming scenarios. Therefore, scaling onward to more cores in the future, while it may help some games in aspects that benefit from additional threads and reduced context switching, isn't really a viable cure for many demanding consumer tasks such as CPU-heavy games.
IPC
So finally, and most murkily, we have potential IPC (instructions per cycle, or instructions per clock) gains. IPC is not a universal, constant unit; it emerges from a combination of ILP (instruction-level parallelism, exploited through factors such as branch prediction and speculative execution), memory subsystem performance (caches + RAM), other architectural details, and the specific demands of any particular piece of software.
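One way to see why IPC isn't a fixed property of a chip: a textbook-style CPI model where memory stall cycles are added on top of an ideal pipeline CPI. Every number below is an illustrative assumption, not a measurement of any real core:

```python
def effective_ipc(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_cycles):
    """IPC under a simple 'ideal pipeline + memory stalls' CPI model:
    CPI = base_CPI + (memory refs/instr) * (miss rate) * (miss penalty)."""
    cpi = base_cpi + mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return 1.0 / cpi

# A hypothetical 4-wide core (ideal CPI 0.25) on cache-friendly code
# (0.3 memory refs per instruction, 0.1% miss rate, 200-cycle DRAM penalty):
print(effective_ipc(0.25, 0.3, 0.001, 200))  # ~3.2 IPC
# The exact same core on pointer-chasing code with a 2% miss rate:
print(effective_ipc(0.25, 0.3, 0.02, 200))   # ~0.69 IPC
```

Same core, same clock, wildly different "IPC." This is why memory-heavy games separate architectures far more than ALU-bound benchmarks do, and why "IPC" comparisons are only meaningful per workload.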
Now, for several years (despite the lackluster results), Intel has actually been beefing up its microarchitectures to try to improve IPC. Haswell and Skylake attempted substantive improvements in ILP (scheduling and out-of-order windows) and memory performance (through cache bandwidth improvements on Haswell, and DDR4 plus more minor tweaks on Skylake), while Haswell also significantly expanded the design's execution engine. The end result, infamously, has not been particularly groundbreaking IPC gains (even ignoring the clock speed regressions that hampered end performance from Ivy Bridge to Skylake). So what effect might further attempts to improve IPC even have, and how can IPC be improved without an offsetting adverse effect on clock speeds from increased pipeline complexity? All I know in particular is that the eDRAM Intel put on Broadwell-C (5775C/5675C) was actually quite helpful in some games. Is more cache a path forward Intel can/would take?
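On the eDRAM question, extending the same stall model gives a feel for why a big L4 victim cache helped: if it intercepts some fraction of last-level-cache misses at much lower latency than DRAM, the average miss penalty drops. Again, every number here is an assumption for illustration; real hit rates and latencies vary wildly by workload:

```python
def ipc_with_l4(base_cpi, mem_refs, llc_miss_rate, l4_hit_frac,
                l4_latency, dram_latency):
    """IPC when a fraction of LLC misses are serviced by an L4 cache
    instead of paying the full DRAM penalty."""
    avg_penalty = l4_hit_frac * l4_latency + (1.0 - l4_hit_frac) * dram_latency
    return 1.0 / (base_cpi + mem_refs * llc_miss_rate * avg_penalty)

# Without an L4 (hit fraction 0), every LLC miss pays 200 cycles of DRAM:
no_l4 = ipc_with_l4(0.25, 0.3, 0.02, 0.0, 50, 200)
# With a hypothetical L4 catching 60% of those misses at 50 cycles:
with_l4 = ipc_with_l4(0.25, 0.3, 0.02, 0.6, 50, 200)
print(no_l4, with_l4, with_l4 / no_l4)  # ~0.69, ~1.10, ~1.6x in this toy case
```

A real game is nowhere near this memory-bound end to end, so the toy model exaggerates the effect, but it shows the mechanism: cutting average miss latency raises IPC without touching pipeline complexity at all, which is exactly the appeal of the more-cache route.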
Lack of competition as the problem?
Since Sandy Bridge, we haven’t really gotten too much performance improvement given how long it’s been and given the design improvements Intel actually attempted. And I’m not convinced it was just because Intel were holding back from a lack of competition or anything (different story for the lack of solder and core counts/price though, imo; I’m only speaking about my views on per-core performance with regards to competitive pressure).
Consider, for instance, that despite Intel’s slow improvement, AMD’s return to the game with a wholly new Ryzen design still sees them substantially trailing Intel’s per-core performance, max OC to max OC. If Intel was really sitting on some treasure trove of improvements, I would think the Ryzen team would have exploited some and at least reached parity with Intel or capitalized on their relative stagnation to fully catch up (or project doing so rather than merely keeping expectations low by citing annual average improvement rates in the industry of 7-8%), but that’s not the case.
Ryzen does have a substantially similar amount of "IPC" in tasks that don't stress its memory subsystem, but tasks that do (such as a good number of games) see the gap widen, and clock speeds on Ryzen are also quite low compared to Coffee Lake's capabilities (in fact, we have to go back to Nehalem to see the same frequency limitations on Intel's designs). While it may be tempting to blame those frequency limitations entirely on GF's current process node, one can't simply assume that, given that pipeline complexity can have (and has had) a substantial impact on achievable clock speeds. Look no further than Maxwell vs. GCN on the same 28nm process node for an example.
Given how difficult it seems to be to match Intel's current per-core capabilities, I think for the purposes of discussion we should presently entertain the possibility that Intel is choosing a near-ideal IPC-vs-frequency trade-off, and may even be pushing the limits of what can physically be done, until we see either AMD or Intel substantially leap ahead of a 5 GHz Coffee Lake per-core design on current materials. And now, even some of the speculative execution techniques used to reach current performance levels are threatened by security vulnerabilities like Spectre.
Conclusion
At the end of the day, I’m wondering how much Intel could even improve their architectures (ignoring core count right now simply because of its limitations for consumer purposes). More frequency seems off the table and previous generation improvements in ILP and memory subsystems haven’t exactly yielded impressive performance gains and have their own diminishing returns. I’m skeptical as to how much more “IPC” Intel or AMD could even extract from these designs and also need to point out that the increased pipeline complexity that comes with improved “IPC” may also further limit frequencies as a tradeoff. Furthermore, this all has me wondering… have we more-or-less simply reached the limit of what’s possible? Will we even be able to look back a decade from now and notice even a 40-50% aggregate gain?