AMD CTO Mark Papermaster: More Cores Coming in the 'Era of a Slowed Moore's Law'

Hardware always leads software, and you cannot predict the future.
 
LOL! :D ;) Look, things are going to change, that is just a fact, and computing and programming will change as well. However, if hardware does not lead the way, why bother? Which is why hardware must lead the way.

No doubt about it, it will change. Everything changes. Just not in the way you suggest. Computer programming is logic, and what you are suggesting would require a logical fallacy to be true, which is impossible.
 
No doubt about it, it will change. Everything changes. Just not in the way you suggest. Computer programming is logic, and what you are suggesting would require a logical fallacy to be true, which is impossible.

Logic isn't 'short-sighted'.

And yet, this is not based upon logic, at least in the way you want to be locked into it. That is OK; things will change, and programming will change with it, not today but definitely tomorrow.
 
We have plenty of software that will take all the hardware we can give it.
I was speaking "developmental," per the discussion. Sure, go ahead and run a billion iterations of Minesweeper or whatever to saturate anything; it is irrelevant to the discussion at hand.
 
I was speaking "developmental," per the discussion. Sure, go ahead and run a billion iterations of Minesweeper or whatever to saturate anything; it is irrelevant to the discussion at hand.

Or a large database? Or, for unparallelizable code, many games even.
 
Y'all can argue all you want. Benches don't lie. Core-heavy CPUs from AMD are mopping the floor with Intel on everything other than gaming. And even there, they are only behind by about 5% in certain titles.
 
If C needs the output of B to run, and B needs the output of A to run, there is no way you can ever run these three at the same time.

All modern CPUs include multiple ALUs because there's usually some amount of low-level parallelism possible even within a single thread. Now, admittedly, that's not quite the same thing, but nearly everything has some parallelism these days.
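
A tiny C++ sketch of the distinction (the arithmetic is made up purely for illustration): the first three statements form a dependent chain like the A/B/C example above, so no amount of cores or ALUs can run them at once, while the last three are independent and can be overlapped by a superscalar core's multiple ALUs.

```cpp
#include <cstdio>

int main() {
    // Serially dependent chain: b needs a's output, c needs b's output.
    // These steps can only ever run one after another.
    int a = 2;
    int b = a * a + 1;   // must wait for a
    int c = b * b + 1;   // must wait for b

    // Independent operations: no result feeds another, so a superscalar
    // core can issue these to different ALUs in the same cycle(s), and a
    // compiler or threading runtime could also split them across cores.
    int x = 3 * 7;
    int y = 5 * 11;
    int z = 9 * 13;

    std::printf("%d %d %d %d %d %d\n", a, b, c, x, y, z);
    return 0;
}
```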
 
LOL! :D ;) Look, things are going to change, that is just a fact, and computing and programming will change as well. However, if hardware does not lead the way, why bother? Which is why hardware must lead the way.

"Things" will change, but not in any meaningful manner that will suddenly turn games into embarrassingly parallel problems.

If you look at the tasks where a 12-core beats a faster-clocked 8-core, they are almost entirely composed of obviously "embarrassingly parallel" problems.

These are problems where you can easily break up the task into smaller pieces, and dependencies between the pieces of the problem don't exist. Usually it's just chopping up your data set into small pieces for each thread to work on.

Video encoding (x265): screen pixels are your data set; you break them into small chunks and every thread can work on its chunk independently. 3D rendering (Cinebench, AKA the AMD bench of choice): again, break the screen into chunks and work on them independently. Image filtering: same...
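
For what it's worth, a minimal C++ sketch of that divide-and-conquer pattern, with a made-up brightness "filter" over a flat pixel buffer standing in for real work: each thread gets its own slice of the data and never needs another thread's result.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Toy "image filter": every pixel is independent, so the buffer can be
// split into per-thread chunks with no dependencies between the chunks.
static void brighten(std::vector<std::uint8_t>& pixels,
                     std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        pixels[i] = static_cast<std::uint8_t>(std::min(255, pixels[i] + 40));
}

int main() {
    std::vector<std::uint8_t> pixels(1920 * 1080, 100);  // stand-in for a frame
    unsigned n = std::max(1u, std::thread::hardware_concurrency());

    std::vector<std::thread> workers;
    std::size_t chunk = pixels.size() / n;
    for (unsigned t = 0; t < n; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end   = (t == n - 1) ? pixels.size() : begin + chunk;
        workers.emplace_back(brighten, std::ref(pixels), begin, end);
    }
    for (auto& w : workers) w.join();
}
```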

Games are completely the opposite of this. It's a simulation based on player agency, so it doesn't have a data set to parcel out. It's always reacting in a dependent cascade.

Sure you can squeeze out some parallelism, which developers have done, but it isn't going to radically improve from here, even with more cores becoming the norm.

When you have a game with a decent amount of parallelism, you still get nailed by Amdahl's Law. The more cores you throw at it, the less each one matters, the less the parallel portions matter, and the more the single threads of control rise to dominate run (frame) time.
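
To put rough numbers on it, here's a small sketch of Amdahl's Law, assuming (purely for illustration) that 70% of a frame's CPU work can be parallelized:

```cpp
#include <cstdio>

// Amdahl's Law: speedup(n) = 1 / ((1 - p) + p / n),
// where p is the fraction of the frame that can run in parallel.
int main() {
    const double p = 0.70;                     // assumed parallel fraction
    const int cores[] = {1, 2, 4, 8, 16, 64};
    for (int n : cores) {
        double speedup = 1.0 / ((1.0 - p) + p / n);
        std::printf("%2d cores -> %.2fx\n", n, speedup);
    }
    // Even with infinite cores the limit is 1 / (1 - p) = 3.33x:
    // the serial 30% of the frame ends up dominating frame time.
    return 0;
}
```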
 
Y'all can argue all you want. Benches don't lie. Core heavy CPUs from AMD are mopping the floor with Intel on everything other than gaming. And even that they are only behind about 5% in certain titles.

We're talking about serial workloads. The thing that AMD was good at before Bulldozer? That.
 
Until we start seeing a shift in GPU design toward a solid MCM design, there are only so many threads you can break a game into before you start seeing a decrease in performance; that will always remain true. What will change, though, is that with more, faster cores, developers can start adding more features: better sound, physics, larger levels, destructible environments, more players, AI-controlled opponents and teammates; the list goes on and on. More features can be broken into more threads, but designers will build for what they have available, and right now CPUs are not our gaming bottlenecks.

GPUs are hitting the Moore's Law wall now, and it is their architecture that is going to have to radically change; changing it in a way that doesn't cripple gaming performance for older titles is going to be the hardest part.
 
"Things" will change, but not in any meaningful manner that will suddenly turn games into embarrassingly parallel problems.

If you look at the tasks where a 12-core beats a faster-clocked 8-core, they are almost entirely composed of obviously "embarrassingly parallel" problems.

These are problems where you can easily break up the task into smaller pieces, and dependencies between the pieces of the problem don't exist. Usually it's just chopping up your data set into small pieces for each thread to work on.

Video encoding (x265): screen pixels are your data set; you break them into small chunks and every thread can work on its chunk independently. 3D rendering (Cinebench, AKA the AMD bench of choice): again, break the screen into chunks and work on them independently. Image filtering: same...

Games are completely the opposite of this. It's a simulation based on player agency, so it doesn't have a data set to parcel out. It's always reacting in a dependent cascade.

Sure you can squeeze out some parallelism, which developers have done, but it isn't going to radically improve from here, even with more cores becoming the norm.

When you have a game with a decent amount of parallelism, you still get nailed by Amdahl's Law. The more cores you throw at it, the less each one matters, the less the parallel portions matter, and the more the single threads of control rise to dominate run (frame) time.

Today, but not tomorrow. Things will change, our understanding will change, and how we do things will change.
 
Not at this basic level, no. It's clear you don't understand the problem space enough to offer meaningful input.

It's clear I understand better than you think; I am just more open-minded about it. You just want me to agree and be a yes-man, and that is not going to happen.
 
It's clear I understand better than you think; I am just more open-minded about it. You just want me to agree and be a yes-man, and that is not going to happen.
Honestly, from a gaming perspective, I think the largest change is going to be a larger emphasis on instruction sets and less so on thread count. As the number of cores goes up, eventually you will reach a point where an instruction thread can't be broken down any further and thread count ceases to matter. That number will change with the application, but you will always reach that same state. The next advances are going to be specialized cores and instruction sets within the CPUs. We can shit all over Nvidia's RTX cores, but when they are used they do show noticeable improvements. Workloads are becoming increasingly complex, and specialized processors and/or cores are going to become the new norm.
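
As a rough illustration of getting throughput from the instruction set rather than from more threads, here is a minimal sketch using AVX intrinsics (it assumes an x86 CPU with AVX and a compiler flag like -mavx; the function and data are made up): a single core processes eight floats per instruction.

```cpp
#include <cstddef>
#include <immintrin.h>   // AVX intrinsics

// Add two float arrays. With AVX, each _mm256_* instruction handles eight
// floats at once on a single core: extra throughput comes from the
// instruction set itself, not from additional threads.
void add_avx(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i)           // scalar tail for leftover elements
        out[i] = a[i] + b[i];
}
```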
 
Honestly, from a gaming perspective, I think the largest change is going to be a larger emphasis on instruction sets and less so on thread performance. As the number of cores goes up, eventually you will reach a point where an instruction thread can't be broken down any further and thread count ceases to matter. That number will change with the application, but you will always reach that same state. The next advances are going to be specialized cores and instruction sets within the CPUs. We can shit all over Nvidia's RTX cores, but when they are used they do show noticeable improvements. Workloads are becoming increasingly complex, and specialized processors and/or cores are going to become the new norm.


I'm thinking that above 8 cores, I'd use the die space from die shrinks on just giving it MASSIVE amounts of L2 and L3 cache instead. Games seem to love that shit. The less often you have to go back to comparatively slow RAM...
 
Workloads are becoming increasingly complex, and specialized processors and/or cores are going to become the new norm.

This is where most "AI" is going. A good predictor is the mobile market; with hard battery-life limits, mobile SoCs include quite a few application-specific processor modules.
 
I'm thinking that above 8 cores, I'd use the die space from die shrinks on just giving it MASSIVE amounts of L2 and L3 cache instead. Games seem to love that shit. The less often you have to go back to comparatively slow RAM...
8 cores may be the magic number now, but some of the new AI, destructible-environment, and map technologies I have been seeing in development need 4+ cores on their own. So we would need to start seeing 16+ core consumer CPUs become the norm before they become viable. That is why a lot of developers are sorta excited about the various cloud gaming platforms that are launching. Stadia may be the first, and currently a smidge "underwhelming," but there are still a lot of really cool things that are only viable with that sort of backend, and with the amount of money major developers and publishers are throwing at it, I expect it to be around for a while.
 
I have no problem with seeing core counts increase more. Part of the reason is that once you get the lowest-common-denominator hardware bumped up quite a bit, it can open up new options and innovations. I think a couple of people have already pointed out new possibilities in games that would need quite a bit more CPU hardware to become a reality. That's simply not possible now with average hardware, but it may very well be possible in the future once average hardware has increased in power quite a bit. Who knows what other innovations just for gaming could be out there if only the average hardware were good enough to support them? I don't know what they are, and I doubt many others do either, simply because it's not yet possible.
 
It's clear I understand better than you think; I am just more open-minded about it. You just want me to agree and be a yes-man, and that is not going to happen.

Being open-minded is NOT assuming an outcome you desire will happen, regardless of how improbable (or impossible) it is. That is being prone to wishful thinking.
 
I'm thinking that above 8 cores, I'd use the die space from die shrinks on just giving it MASSIVE amounts of L2 and L3 cache instead. Games seem to love that shit. The less often you have to go back to comparatively slow RAM...

Yeah, I figured this is what Intel should have done with the 9900K. Left out the IGP and used that space for more cache. Who buys a top end, expensive 8 core CPU and uses the IGP? It's just wasted die space.

I wonder if they will still have an IGP on the 10-core Comet Lake. It strikes me that, as deep a process hole as Intel is in, they still don't seem to be developing a strong competitive mindset.
 
Just think about it logically. If one calculation depends on the output of the calculation that came just before it, you can never spread those two calculations over two separate cores and run them at the same time. This is a limit of logic. No innovation in code or otherwise can solve this dilemma, unless someone invents a time machine.

I'm no programmer, but I am curious: what about decreasing the latency to each core via new process tech (e.g. stacking), processing each calculation on its own core, and passing only the result to a central core? The central core handles the clocking/timing (I think you call it scheduling) and plays the 'traditional single thread' role for the software, while the other cores do the calculating and just send back their output. Wouldn't that save time IF the single-core calculation time is longer than the parallel calculation time plus the latency of sending the commands to the other cores? Or do games typically rely on many simultaneously executed, very simple calculations?
So really, is core-to-core latency what needs improvement for that to be possible, since it currently isn't?

The reason I ask, as a hardware-oriented type and not a programming type of geek, is that frame times are in the millisecond region. CPU core-to-core latency ping time is under 50ns for the best current designs, so a 100ns minimum round trip is 0.0001ms plus calculation time.

p.s. sorry if I don't get your prior explanations, but the beer isn't helping ;D
Thanks in advance.
 
I'm no programmer, but I am curious: what about decreasing the latency to each core via new process tech (e.g. stacking), processing each calculation on its own core, and passing only the result to a central core? The central core handles the clocking/timing (I think you call it scheduling) and plays the 'traditional single thread' role for the software, while the other cores do the calculating and just send back their output. Wouldn't that save time IF the single-core calculation time is longer than the parallel calculation time plus the latency of sending the commands to the other cores? Or do games typically rely on many simultaneously executed, very simple calculations?
So really, is core-to-core latency what needs improvement for that to be possible, since it currently isn't?

The reason I ask, as a hardware-oriented type and not a programming type of geek, is that frame times are in the millisecond region. CPU core-to-core latency ping time is under 50ns for the best current designs, so a 100ns minimum round trip is 0.0001ms plus calculation time.

p.s. sorry if I don't get your prior explanations, but the beer isn't helping ;D
Thanks in advance.

Maybe? I don't know.

You might be able to make some efficiency gains by doing something like that, I don't know enough about that subject.

You are still going to have the problem that a vast majority of calculations in a game engine depend on the outputs of other calculations so they need to be sequential, and can't be run at the same time.

In some applications you may be able to do some sort of multicore overkill branch prediction, where you predict the likely outcomes of the next step and pre-calculate them using your many cores, but I think in most gaming situations there are too many potential outcomes for this to have much of an impact.
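
A minimal sketch of that "precalculate the likely outcomes" idea, assuming (unrealistically) only two possible outcomes of the player's next action and a made-up simulate() function standing in for real game logic: both outcomes are computed ahead of time on spare cores, and the one that didn't happen is simply discarded.

```cpp
#include <future>

// Hypothetical stand-in for one step of real game logic.
int simulate(int world_state, bool player_jumps) {
    return player_jumps ? world_state + 10 : world_state - 3;
}

int next_frame(int world_state, bool player_jumped) {
    // Precompute both possible outcomes on spare cores...
    auto jumped = std::async(std::launch::async, simulate, world_state, true);
    auto stayed = std::async(std::launch::async, simulate, world_state, false);

    // ...then keep only the branch that actually happened; the other
    // result is thrown away. With many possible outcomes, most of this
    // speculative work is wasted, which is the point being made above.
    return player_jumped ? jumped.get() : stayed.get();
}
```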
 
Maybe? I don't know.

You might be able to make some efficiency gains by doing something like that, I don't know enough about that subject.

You are still going to have the problem that a vast majority of calculations in a game engine depend on the outputs of other calculations so they need to be sequential, and can't be run at the same time.

In some applications you may be able to do some sort of multicore overkill branch prediction, where you predict the likely outcomes of the next step and pre-calculate them using your many cores, but I think in most gaming situations there are too many potential outcomes for this to have much of an impact.

I hope someone who can chip in on the first part does; I'm very curious about this. Dropping individual calculation time and making use of multiple processing cores or sub-cores when the load allows is the only way I can see around what you mention as sequential calculations: focus on each calculation as a data unit assigned to a core. But I have a feeling they would've already done that, as it sounds too simple to be easy for games. Unless they need something crazy like 5ns round trips or something to be viable.. very curious.

What you mention about branch prediction is also very interesting. Maybe that's something ML/NN ("A.I." in a radio voice) can help out with when we have enough spare cores and dedicated hardware. I'm particularly interested in that aspect of some of the latest 'lakes Intel is dropping; I believe it has been in the Xeons for a while as a 'custom solution' only enabled for certain customers.
 
I'm no programmer, but I am curious: what about decreasing the latency to each core via new process tech (e.g. stacking), processing each calculation on its own core, and passing only the result to a central core? The central core handles the clocking/timing (I think you call it scheduling) and plays the 'traditional single thread' role for the software, while the other cores do the calculating and just send back their output. Wouldn't that save time IF the single-core calculation time is longer than the parallel calculation time plus the latency of sending the commands to the other cores? Or do games typically rely on many simultaneously executed, very simple calculations?
So really, is core-to-core latency what needs improvement for that to be possible, since it currently isn't?

The reason I ask, as a hardware-oriented type and not a programming type of geek, is that frame times are in the millisecond region. CPU core-to-core latency ping time is under 50ns for the best current designs, so a 100ns minimum round trip is 0.0001ms plus calculation time.

p.s. sorry if I don't get your prior explanations, but the beer isn't helping ;D
Thanks in advance.

It has to be reasonably substantive work. There is overhead in spawning a new thread (and in off-core communication), and there is overhead in getting the results back and collating them, so you aren't going to spawn a thread to perform a couple of simple calculations. You also introduce a large amount of complexity in your main control thread if it has to collate a bunch of what would, in this case, be different types of responses. It's especially pointless if the code is sequentially dependent.

Like I said earlier in the thread, your big gains are on obvious divide-and-conquer problems.

You can still do multi-threading throughout your code where you can spawn multi-threaded loop constructs in place of conventional single-threaded loop constructs.

As an example, if you were doing something like an RTS game, you would have a move loop where each unit has to look around for obstacles and enemies in its time slice and decide on its route and whether to attack or avoid the enemy; it's a fair bit of work. With 1000 units, it's a painful amount of work.

In the old days this would have been one big sequential loop. Today it would be launched with a threaded loop structure that batches, say, 125 units to each of 8 threads (the actual division depending on a run-time check of the CPU core/thread count), and each thread could work through its queue of units, drastically speeding up unit movement.
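
A minimal sketch of that kind of threaded loop, with a made-up Unit type and update function standing in for the real per-unit work: the units are batched across however many hardware threads a run-time check reports, along the lines described above.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Unit { float x = 0.0f, y = 0.0f; };

// Stand-in for the per-unit work: pathing, target selection, and so on.
void update_unit(Unit& u) { u.x += 1.0f; u.y += 0.5f; }

void update_all(std::vector<Unit>& units) {
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::size_t batch = (units.size() + n - 1) / n;   // e.g. ~125 units/thread

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n; ++t) {
        std::size_t begin = t * batch;
        std::size_t end   = std::min(units.size(), begin + batch);
        if (begin >= end) break;
        workers.emplace_back([&units, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                update_unit(units[i]);        // each thread works its own batch
        });
    }
    for (auto& w : workers) w.join();
}
```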

But most games are not RTS games with 1000+ units to move.

Code profilers are run on code precisely to find where optimization can benefit. If the time is spent running through unique code paths, there's not much opportunity for optimization. It's code that ends up being called repeatedly that is more open to multi-threading.

If the profiler runs reveal little, and you can't imagine how any other work could get parceled up, it's very unlikely you can benefit significantly from more multi-threading.

These days, I expect most of this work is already done.

Future games may utilize more cores fully, but that will depend more on creating new types of work, likely in a few different types of games, than on better multi-threading of current [types of] games.
 
Where AI (really machine learning) can help is not only in finding code that may be broken up, but also how. I suspect that there's quite a bit of work being done in more complex games that's just not yet worth the effort to optimize for multithreading.

And we haven't really seen an example of a complex game where this is done well -- my best example of an optimized game engine is the id Tech engine series, where the feel of the gameplay is the focus.

Battlefield, a game series that could really benefit from optimization, is the opposite. It's a laggy mess.


Note that UI responsiveness is certainly an ongoing topic. In the last year or so it was brought up that terminals tend to be much more responsive than GUIs still today, and there's really no reason for that except that it hasn't been an area of focus on desktop operating systems.
 
Where AI (really machine learning) can help is not only in finding code that may be broken up, but also how. I suspect that there's quite a bit of work being done in more complex games that's just not yet worth the effort to optimize for multithreading.

And we haven't really seen an example of a complex game where this is done well -- my best example of an optimized game engine is the id Tech engine series, where the feel of the gameplay is the focus.

Battlefield, a game series that could really benefit from optimization, is the opposite. It's a laggy mess.


Note that UI responsiveness is certainly an ongoing topic. In the last year or so it was brought up that terminals tend to be much more responsive than GUIs still today, and there's really no reason for that except that it hasn't been an area of focus on desktop operating systems.

Not sure what Battlefield you are playing; maybe turn off the ray-tracing garbage.
 
Future games may utilize more cores fully, but that will depend more on creating new types of work, likely in a few different types of games, than on better multi-threading of current [types of] games.

Exactly. MMO and open-world games are sure to gain a lot from many cores. Deus Ex tends to run well enough on small maps but fails miserably on huge maps, even though you can't see all of the huge city/characters they are simulating. They seem to simulate everything on a single core. I am sure there are technical reasons, but I am also sure it could be improved significantly if more effort were put into dealing with the problem.
 
Logic isn't 'short-sighted'.

[embedded image: Spock "logic" GIF]
 
On the 1080Ti in my sig, that's what's wrong!

Or perhaps DICE doesn't make game engines that are as responsive as what comes out of id software.

I'm torn on ID software's stuff.

When I played Doom and the Wolfenstein series I was very impressed with how the Tech engine was able to be smooth and pump out the framerates even at high settings.

I'm playing Wolfenstein Youngbloods right now after finishing Far Cry New Dawn, and I am starting to think it's just that they use lower polygon models or something. The visuals just aren't as impressive.

Thinking back to all of the recent Wolfenstein and Doom games, that pretty much appears to be par for the course. So I'm thinking they just make games with less intensive visuals, and then take credit for their engine being good.
 
I'm torn on ID software's stuff.

When I played Doom and the Wolfenstein series I was very impressed with how the Tech engine was able to be smooth and pump out the framerates even at high settings.

I'm playing Wolfenstein Youngbloods right now after finishing Far Cry New Dawn, and I am starting to think it's just that they use lower polygon models or something. The visuals just aren't as impressive.

Thinking back to all of the recent Wolfenstein and Doom games, that pretty much appears to be par for the course. So I'm thinking they just make games with less intensive visuals, and then take credit for their engine being good.

So you're basically saying Carmack doesn't know how to code game engines?
 
I'm torn on ID software's stuff.

When I played Doom and the Wolfenstein series I was very impressed with how the Tech engine was able to be smooth and pump out the framerates even at high settings.

I'm playing Wolfenstein Youngbloods right now after finishing Far Cry New Dawn, and I am starting to think it's just that they use lower polygon models or something. The visuals just aren't as impressive.

Thinking back to all of the recent Wolfenstein and Doom games, that pretty much appears to be par for the course. So I'm thinking they just make games with less intensive visuals, and then take credit for their engine being good.

I really can't speak to the technical abilities of the engines -- just that they manage to make them feel responsive, and I consider them the benchmark.

It's probably a combination of careful resource management and fine tuning of the engines themselves.
 
So you're basically saying Carmack doesn't know how to code game engines?

Nope.

It's a fine engine; I'm just suggesting that part of the reason is that the games running on it don't necessarily stress the hardware as hard as others do.

All that said, John Carmack hasn't always been right about things. For instance, he was - for a long time - an opponent of using hardware 3D acceleration in games (like with Glide or Direct3D), claiming that software rendering was the only way to do it right :p

He also was lead on the engine development for Rage, which was widely panned for its texture issues due to trying to render with dynamic quality in order to keep the framerate constant.

He has been hugely influential in getting us to where games are today, but no one is perfect :p
 
I really can't speak to the technical abilities of the engines -- just that they manage to make them feel responsive, and I consider them the benchmark.

It's probably a combination of careful resource management and fine tuning of the engines themselves.
I don't think you know what responsive is. You are talking about Battlefield being a mess in multiplayer, but considering BF5 has a higher tickrate and lower latency, I really can't see how you can call id a benchmark of any sort, other than for a good corridor shooter/single-player game. id Software didn't even develop the multiplayer for Doom; they had to take it over quite a while after release.
Considering Battlefield has huge maps and 64 players and still manages to keep latency low while being visually stunning, I can't see how you can say the id engine is superior.
 