after 32 cores?

Most strategy games are turn-based, moving one unit at a time... aka very serial.
You could optimize it, though. Run a pre-pass to see which units are unaware of what's going on in certain areas, and aren't close enough that another unit would possibly move into their view, then batch all those together on separate threads. Or you could compute all the possible moves of the units that are near other units, and then toss out those computations if something changes in the area of that unit (for example, another unit appears or a unit in the area moves).
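Something like this, as a very rough sketch (the unit class, the view radius, and the AI call are all made up just to show the pre-pass idea, not how any real engine does it):

Code:
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

VIEW_RADIUS = 6  # made-up awareness range, in tiles

@dataclass(frozen=True)
class Unit:
    x: int
    y: int

    def distance_to(self, other):
        return abs(self.x - other.x) + abs(self.y - other.y)

    def ai_choose_move(self):
        # stand-in for the real (expensive) per-unit AI evaluation
        return (self.x + 1, self.y)

def is_isolated(unit, all_units):
    # Nobody can see this unit, and nobody could wander into its view this turn.
    return all(unit.distance_to(other) > 2 * VIEW_RADIUS
               for other in all_units if other is not unit)

def plan_turn(units):
    isolated, contested = [], []
    for u in units:
        (isolated if is_isolated(u, units) else contested).append(u)

    # Pre-pass payoff: isolated units can't interact, so batch their AI onto worker threads.
    with ThreadPoolExecutor() as pool:
        moves = dict(zip(isolated, pool.map(Unit.ai_choose_move, isolated)))

    # Units near other units are still resolved one at a time, in order, and any
    # of those results could be thrown out if something changes in their area.
    for u in contested:
        moves[u] = u.ai_choose_move()
    return moves

print(plan_turn([Unit(0, 0), Unit(1, 1), Unit(40, 40)]))

In a real engine the heavy AI work would have to run in processes or native code rather than Python threads, but the batching shape is the same.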
 
Most strategy games are turn-based, moving one unit at a time... aka very serial.
I remember Civ 5 had an option that allowed the AI to calculate its next moves while you took your turn.

Not sure how well that was implemented, but that is my thinking behind some of the approaches that could be taken.

My thinking is concurrent turns, followed by some kind of error-correction pass at the end to resolve units in the same tile, war declarations, diplomacy, etc. (basically what Nobu said).

I don't mean to stray too far off topic. I feel that the thread push we have seen with Threadripper could lead to a really interesting multicore strategy gaming future.

I would love to see a proof-of-concept indie strategy game with super basic graphics but some really bonkers, scalable multithreaded AI. Makes me wish I had gone down the comp-sci route.
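To make the error-correction pass idea a bit more concrete, here is a toy sketch (the factions, the planning stub, and the "first claim wins" rule are all invented; a real game would need far smarter rules for combat, diplomacy, and so on):

Code:
import random
from concurrent.futures import ThreadPoolExecutor

def plan_move(unit_id):
    # stand-in for a real AI evaluation; each unit just picks a destination tile
    return (random.randint(0, 9), random.randint(0, 9))

def plan_faction(faction_units):
    # Each AI faction plans its whole turn independently; this part can run
    # concurrently with the player's turn and with the other AIs.
    return {unit_id: plan_move(unit_id) for unit_id in faction_units}

def correction_pass(all_plans):
    # Single serial "last pass": if two units want the same tile,
    # the first claim wins and the others hold position (made-up rule).
    claimed, final = set(), {}
    for plans in all_plans:
        for unit_id, dest in plans.items():
            if dest in claimed:
                final[unit_id] = None  # move rejected this turn
            else:
                claimed.add(dest)
                final[unit_id] = dest
    return final

factions = [[f"ai{i}_unit{j}" for j in range(3)] for i in range(4)]
with ThreadPoolExecutor() as pool:
    plans = list(pool.map(plan_faction, factions))
print(correction_pass(plans))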
 
Most strategy games are turn-based, moving one unit at a time... aka very serial.
Did you just take the "all games can use all the threads" argument and put it through the meat grinder? Civ 6 on a Ryzen is actually quite good. There will always be a limit to how many threads can realistically be used in anything. Even the most CPU-intensive games are going to struggle to find ways to manage 64 threads, since at some point the threads managing the threads will be the problem. Anything that CAN be threaded very well belongs on the GPU.

Again I repeat... I want 8 real cores. I just want all 8 to be faster, and forget 32 cores. If I want 32 cores I'll build a server to process those tasks. Things like video processing, like my DVR/comskip stuff, go on my server. There is a huge difference between what I want on a server and what I want on my desktop/laptop. I've had 12 cores in my server for years and it's nearly time to upgrade it. Threadripper may very well be what I build to replace my current server.
 
You could optimize it, though. Run a pre-pass to see which units are unaware of what's going on in certain areas, and aren't close enough that another unit would possibly move into their view, then batch all those together on separate threads. Or you could compute all the possible moves of the units that are near other units, and then toss out those computations if something changes in the area of that unit (for example, another unit appears or a unit in the area moves).

They still need to move one unit at a time, aka serial.
 
I remember Civ 5 had an option that allowed the AI to calculate its next moves while you took your turn.

Not sure how well that was implemented, but that is my thinking behind some of the approaches that could be taken.

My thinking is concurrent turns, followed by some kind of error-correction pass at the end to resolve units in the same tile, war declarations, diplomacy, etc. (basically what Nobu said).

I don't mean to stray too far off topic. I feel that the thread push we have seen with Threadripper could lead to a really interesting multicore strategy gaming future.

I would love to see a proof-of-concept indie strategy game with super basic graphics but some really bonkers, scalable multithreaded AI. Makes me wish I had gone down the comp-sci route.

Even if you multithread the A.I., those "worker threads" are still going to wait for units to move one at a time.
 
Even if you multithread the A.I., those "worker threads" are still going to wait for units to move one at a time.
Well, we are talking about what could hypothetically be done, not what is currently practiced.

Goodness knows there are smarter minds than ours wrapping themselves around such problems. As we understand things now there would be a one-core bottleneck, but perhaps there are ways to mitigate its load.

All it takes is an innovative approach. An attitude of "let's try to make this work" instead of "no, let's not even try".

Just another uninformed idea of mine:

In a strategy gaming context, it seems the main issue is that one core would need to keep track of where each AI unit is positioned. Let's call this core "Time and Space".

The question here would be how the Time and Space core's duties can be divided without sacrificing its integrity and accuracy.

I'm not sure how this would be done, but I think it can be. We may never overcome the need for one fast core, but anything would be better than what we currently have to put up with.

Edit: on a technical note, I do wonder how CPU architecture plays a role in all of this.

Yes, mainstream Intel CPUs rely on their ring-bus architecture, something that is serial in nature. How could Skylake-X and Skylake-EP's mesh architecture be utilized here? Infinity Fabric?

Maybe quantum computing is the real winner here? It seems strategy games are all about multiple simultaneous possibilities that may or may not happen at a given time.
 
Well, we are talking about what could hypothetically be done, not what is currently practiced.

Goodness knows there are smarter minds than ours wrapping themselves around such problems. As we understand things now there would be a one-core bottleneck, but perhaps there are ways to mitigate its load.

All it takes is an innovative approach. An attitude of "let's try to make this work" instead of "no, let's not even try".

Just another uninformed idea of mine:

In a strategy gaming context, it seems the main issue is that one core would need to keep track of where each AI unit is positioned. Let's call this core "Time and Space".

The question here would be how the Time and Space core's duties can be divided without sacrificing its integrity and accuracy.

I'm not sure how this would be done, but I think it can be. We may never overcome the need for one fast core, but anything would be better than what we currently have to put up with.

Edit: on a technical note, I do wonder how CPU architecture plays a role in all of this.

Yes, mainstream Intel CPUs rely on their ring-bus architecture, something that is serial in nature. How could Skylake-X and Skylake-EP's mesh architecture be utilized here? Infinity Fabric?

Maybe quantum computing is the real winner here? It seems strategy games are all about multiple simultaneous possibilities that may or may not happen at a given time.

You are just throwing *bleep* at the wall and hoping something will stick now... strategy games are still dependent on moving one unit at a time.
(I am not talking about RTS games, as they are badly mislabeled and should be called "Real Time Tactical" games.)
 
You are just throwing *bleep* at the wall and hoping something will stick now... strategy games are still dependent on moving one unit at a time.
(I am not talking about RTS games, as they are badly mislabeled and should be called "Real Time Tactical" games.)
I was including RTS games in my definition of strategy games.

Just brainstorming. The thing about ideas is that even the bad ones can inspire good ones. If you have any ideas, I would like to hear them too.

I may start a brainstorming thread on this because I realize this is not the place.
 
There's brainstorming and there's reinventing the wheel. Parallel computing is not new; it was taught in my CS classes in college almost three decades ago. The filtration down to desktop technology is more recent, within the last 15 or so years. One of my very productive video rendering workstations is 8 years old, has 12 cores / 24 threads, and was a fairly mainstream workstation for its time.

Since multi-core machines are more common at the lower end of the market now, it may make sense for developers to concentrate more effort on parallel workloads, but this has always been the conundrum for developers: do you program for the niche customer who has the configuration you want, or do you assume the Least Common Denominator (LCD) so what you're developing has more appeal to the masses?
 
They still need to move one unit at a time, aka serial.
They don't need to; that's just an arbitrary restriction created by the gameplay mechanics of that particular game. The main thread could just as easily send all the updated unit positions to the GPU and tell it to wait to move until "x" condition is true (unit on screen, or unit centered on screen, or whatever). Being limited by the game mechanics also makes your high-speed, "low" core count (geez, "only" 8 cores) CPU irrelevant, so it's not a valid argument imho.
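Roughly what I mean, as a sketch (the move logic and the "on screen" test are placeholders; the point is just that the simulation resolves everything up front and the renderer drains a queue on its own schedule):

Code:
from collections import deque

def unit_ai_move(unit, positions):
    x, y = positions[unit]
    return (x + 1, y)  # stand-in for the real AI

def simulate_ai_turn(units, positions):
    # The game state is fully resolved here, as fast as the CPU allows.
    animation_queue = deque()
    for unit in units:
        dest = unit_ai_move(unit, positions)
        positions[unit] = dest                 # simulation updates immediately
        animation_queue.append((unit, dest))   # presentation is deferred
    return animation_queue

def play_animations(queue, is_on_screen):
    # The renderer/GPU side only bothers animating what the player can see.
    while queue:
        unit, dest = queue.popleft()
        if is_on_screen(dest):
            print(f"animate {unit} -> {dest}")

positions = {"scout": (0, 0), "tank": (12, 0)}
queue = simulate_ai_turn(list(positions), positions)
play_animations(queue, lambda pos: pos[0] < 10)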
 
They don't need to; that's just an arbitrary restriction created by the gameplay mechanics of that particular game. The main thread could just as easily send all the updated unit positions to the GPU and tell it to wait to move until "x" condition is true (unit on screen, or unit centered on screen, or whatever). Being limited by the game mechanics also makes your high-speed, "low" core count (geez, "only" 8 cores) CPU irrelevant, so it's not a valid argument imho.

So the "game mechanics" are an "arbitrary restriction"...not sure if serious...
 
So the "game mechanics" are an "arbitrary restriction"...not sure if serious...
Yes, they could have made the game play differently, but they chose not to. It's not a technical limitation, but an arbitrary one put in place for the kind of experience they want players to have. Technically, it's possible to speed it up or even skip the unit turn simulation entirely, but it would make for a poor gameplay experience.

Fwiw, they probably already use some of the optimizations I mentioned; there may be other technical limitations I'm not aware of. There's no technical reason why unit "a" has to go before unit "b" when they are nowhere near each other and have no knowledge of each other.
 
The earlier rumor was 48 cores for 7nm and 64 cores for 7nm+. But the latest rumor is that 64 cores are already coming to 7nm.

7nm+ will be like '12nm' is today, I'd bet, in typical AMD flavour.
So I would not expect any shrinkage, and thus either double the CCX count or an 8-core CCX, with Infinity Fabric links via the interposer, as AMD has already suggested for GPU use. So 64 cores on 7nm is almost a given. An interposer (e.g. 65nm) handling IF will reduce die space quite a bit, so the actual density with this approach can be increased significantly.
 
Yes, they could have made the game play differently, but they chose not to. It's not a technical limitation, but an arbitrary one put in place for the kind of experience they want players to have. Technically, it's possible to speed it up or even skip the unit turn simulation entirely, but it would make for a poor gameplay experience.

Fwiw, they probably already use some of the optimizations I mentioned; there may be other technical limitations I'm not aware of. There's no technical reason why unit "a" has to go before unit "b" when they are nowhere near each other and have no knowledge of each other.


These are interesting points but I'd like to address some of what's been brought up in this thread:

1) AI in computer games is no such thing. It is simply a calculation of possible moves and countermoves that is done either a. on the fly (RTS games) or b. when you press the Next Turn button (turn-based strategy games). Most 'AI' can be tuned to calculate only one turn in advance or to calculate several moves in advance. Furthermore, higher difficulty levels usually involve the 'AI' cheating, either by giving itself extra resources or by stacking the odds against you so that you face more challenges during the game than the computer-controlled 'character' does.

2) As a side effect of point 1, the computer can't simply make all its moves whilst you are taking your turn (TBS), as it needs to 'read' the state before calculating all of its moves. And, contrary to popular belief, calculating moves is computationally expensive. For reference, a chess board has 32 pieces (16 per side) and 64 squares to place them on. There are approximately 35 legal moves available in a typical position, and competitive games are capped in length (previously soft-capped at 50 moves, now with a mandatory cap at 75). Even with the rather simple rules, small board size, and relatively sparse number of pieces, there are approximately 10^123 possible game variations. Now, take a game like Civilization, which has more 'pieces' and a larger 'board', throw in variables such as politics, religion, cities, resources, terrain, etc., and things become very complex very quickly. Furthermore, the calculations are interdependent, so, sure, you could spin out each calculation to a different child process (thread), but at some point they all need to be merged back to the main decision tree and weighed against each other (see the first sketch at the end of this post).

3) Not all workloads can be parallelized, as illustrated by the faulty logic of saying: if it takes one woman 9 months to give birth, it would take 9 women only 1 month. We know and acknowledge that there are tasks in the real world that are simply not parallelizable, but we seem to think that all computing tasks can be made so just by 'being more clever' or 'less lazy'. There are plenty of workloads that don't thread well, and there are other workloads where it doesn't make sense to thread them too far. For instance, when I send a video clip to ffmpeg for transcoding, it calculates how long the clip is, it looks at the groupings and all the other settings I send to the task, and then it divvies up the clip into multiple threads (by default, though I can also tell it not to thread at all or to limit it to a certain thread count). We can apply some common-sense logic to the thread splitting, though: my machine has 12 hyperthreaded cores, which means I have 24 threads available. But hyperthreading is simply a technique that Intel uses to utilize portions of the CPU that are not being used by the main task to run another, unlike, task 'at the same time'. So, since video encoding is doing the same task throughout, it doesn't make much sense for me to force ffmpeg to use all 24 threads. It makes more sense to ask for 12 threads. But if I sent a 20-second clip to be transcoded, ffmpeg would spend more time dividing the clip and reassembling it if I forced it to use 12 threads than if I just left it alone and let it transcode. Main point being that just because I can throw 12 threads at a task doesn't mean I should (see the second sketch at the end of this post).

4) A developer for a program doesn't always know what hardware a person has at their disposal to run it, the exception being a game developer who is programming for a console which has a specific hardware profile to target and doesn't have any plans for it being ported to the chaotic PC ecosystem. So if I were to assume that I had a powerful GPU at my disposal as a resource, and I asked a task to utilize the GPU of a person who has, let's say, an Intel GPU (which, to be honest, is struggling just to display the screen graphics), how well do you think that task is going to perform? Sure, I could reason that the user just selected Next Turn and there's 'nothing to display' until the task is complete, but that is an assumption that can have usability consequences. Main point being that you can't take for granted what hardware you have to work with unless you're exclusively a console developer.
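On point 2, here is a toy illustration of that "spin out, then merge back" shape: each candidate root move gets its own worker, the tree below it is still searched serially, and everything has to funnel back through a single merge step at the end. The "game" here is invented and trivially small; the real cost in something like Civilization is the branching factor and the evaluation itself.

Code:
from concurrent.futures import ProcessPoolExecutor

def legal_moves(state):
    return [state + d for d in (-2, -1, 1, 2)]  # toy "game": the state is just an int

def evaluate(state):
    return -abs(state)                           # toy scoring: closer to 0 is better

def negamax(state, depth):
    # Plain serial search below the root.
    if depth == 0:
        return evaluate(state)
    return max(-negamax(m, depth - 1) for m in legal_moves(state))

def score_root_move(args):
    move, depth = args
    return move, -negamax(move, depth)           # the opponent replies next, so negate

def pick_move(state, depth=6):
    moves = legal_moves(state)
    with ProcessPoolExecutor() as pool:          # one child process per root move
        scored = list(pool.map(score_root_move, [(m, depth) for m in moves]))
    # Merge step: every branch has to come back to a single decision.
    return max(scored, key=lambda pair: pair[1])[0]

if __name__ == "__main__":
    print(pick_move(5))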
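And on point 3, the thread count really is just a knob on the job, and picking it is a judgment call rather than "always use everything". A sketch (the file names, the 60-second cutoff, and the core count are made up; -threads and -c:v are real ffmpeg options):

Code:
import subprocess

def transcode(src, dst, clip_seconds, physical_cores=12):
    # A short clip isn't worth splitting: the divide/reassemble overhead
    # outweighs the gain, so just let it run on a single thread.
    threads = 1 if clip_seconds < 60 else physical_cores
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-c:v", "libx264",
         "-threads", str(threads),  # cap at physical cores, not logical threads
         dst],
        check=True,
    )

transcode("clip.mkv", "clip.mp4", clip_seconds=20)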
 
Most strategy games are turn-based, moving one unit at a time... aka very serial.

The thing with strategy games is you could have each AI be its own thread. This is especially helpful in turn-based strategy games, and even more so in simultaneous-turn strategy games, where each player plans their turn and clicks "ready", and once all players are ready, every player's moves execute at once. This would let the AIs work out their moves and ready up while the player sets up their own moves. You could easily have these AI threads running in the background, spreading each AI onto its own thread, which can go to a different core. Even in RTS-type games, having each AI on its own thread processing its own information would be a great way to let each AI act as its own entity, where information isn't "shared" between the other AIs; they all act independently instead of being processed together and acting like a single large entity controlling different groups of units, as in most RTS games. This would let you fight actual AI-type opponents instead of a single entity ganging up on you with the various groups it controls and doing cheating things to change difficulty.

And no, "most" strategy games do not move one unit at a time. Some do; the majority do not. Moving one unit at a time on each player's turn is simply one style of strategy game and one way gameplay was implemented. It is not how strategy games as a genre work.
 
Now that we have quad- and 6-core laptops readily available for the masses to buy, it's only a matter of time until developers start taking advantage of all the cores. This being [H], I would assume the average [H] user has 4-8 times the computing power of your average power user....
 
Even if you multithread the A.I., those "worker threads" are still going to wait for units to move one at a time.

Civ in multiplayer moves all units at once and resolves all unit collisions. Just because you see one unit move on the screen does not mean that's how the game engine handled it. Granted, not many play it in multiplayer. Also, racing games don't move in serial order either, so you can have a thread per car just for the AI.
 
Xeon Phi is about 215W, up to 260W depending on the exact model; Vega 64 is AT LEAST 295W. So even if someone undervolts and underclocks to keep power under control, Xeon Phi would still win by a large margin in every aspect - performance, power, and efficiency - in that scenario (mining Monero). And 1000 H/s means about 30 extra dollars monthly, before even factoring in the efficiency difference of 215W vs. 295W running 24/7 for the entire month. Of course, everything can flip depending on how much the Phi and the Vega 64 cost; the Phi tends to be more expensive, at about $1200-1500 for the 7120 model, while you can buy a Vega 64 right now for less than $600.

You have no idea what you are talking about... My 6x Vega 56s draw 130W doing 2 kH/s, and my single 64 does 2.15 kH/s drawing 140W. My cards run @ 1360MHz/1100 (1360/1025 for the 64) @ 0.9V.
 
1) AI in computer games is no such thing. It is simply a calculation of possible moves and countermoves that is done either a. on the fly (RTS games) or b. when you press the Next Turn button (turn-based strategy games). Most 'AI' can be tuned to calculate only one turn in advance or to calculate several moves in advance. Furthermore, higher difficulty levels usually involve the 'AI' cheating, either by giving itself extra resources or by stacking the odds against you so that you face more challenges during the game than the computer-controlled 'character' does.
You can have a look at the open source Spring RTS engine and the various AIs that plug into it. Certainly more happens than what you write, and there are challenging AIs that don't cheat (and the cheating ones mostly have an unfair knowledge advantage rather than more resources).
You can even follow the AI decisions in real-time, see its build queue, the threshold when it thinks that a group of units is large enough to overpower defenses, etc. if you want. For example some AI will notice if your base is weak on air defenses, and send air troops to destroy them.
 
You can have a look at the open source Spring RTS engine and the various AIs that plug into it. Certainly more happens than what you write, and there are challenging AIs that don't cheat (and the cheating ones mostly have an unfair knowledge advantage rather than more resources).
You can even follow the AI decisions in real-time, see its build queue, the threshold when it thinks that a group of units is large enough to overpower defenses, etc. if you want. For example some AI will notice if your base is weak on air defenses, and send air troops to destroy them.

Which is why I used words such as most and usually, to indicate that it wasn't universally true. And nothing you stated denies that these AIs are simply calculations of probable game outcomes, which was my other point in that paragraph. There's usually no NN behind the scenes.

And of course more happens than what I wrote. I already spent way more time posting something that should have been common knowledge to people who should know better.
 
So beyond 32 cores? Looks like the 16+3 VRM will do the trick on MSI's X399 Creation motherboard (hopefully a doubled 8-phase VRM, as I don't believe 16-phase VRM controllers exist).

OC3D's Preview

64 cores? I don't see why it couldn't. Sure to be at least $500 though.
 
As a previous owner/user of an X399 Aorus, I would say it had a few problems - mine died a horrible and sudden death (and after some research I found that quite a few others had the same issue). MSI's board is an evolution of the original X399 platform to handle the power needs of the TR 2990X and onward.
 
Isn't 64 cores the latest rumor for Rome?
The latest and most reliable rumors are that Zen 2/Rome will be 48 cores on 7nm and then 64 cores on 7nm+. That also makes sense when you consider yields; jumping straight to 16-core dies on a new process is risky.

That should mean 48 cores in 2019 and 64 cores in 2020. However, it remains to be seen what the clocks on those parts will be and how quickly that many cores will be made available on Threadripper.
 
The latest and most reliable rumors are that Zen 2/Rome will be 48 cores on 7nm and then 64 cores on 7nm+. That also makes sense when you consider yields; jumping straight to 16-core dies on a new process is risky.

That should mean 48 cores in 2019 and 64 cores in 2020. However, it remains to be seen what the clocks on those parts will be and how quickly that many cores will be made available on Threadripper.

That is the older rumor. The newest is 64 cores for Rome.
 
12 years from now ....

AMD just announced Threadripper 12 -- 256 cores / 512 threads, still on our wonderful X399 chipset .... $1799.99 - still no f'ing cooling solution included, bitches

meanwhile ... Intel just rumored their upcoming 32-core HEDT for $2999.99 - This is the latest iteration of Intel Core technology for the year 2030 and beyond - "we're still looking for a new CEO"
 
The part I underlined and bolded.

Relax, dude. If you were familiar with Zen 3, you would know what I am trying to say.
Introducing more than 2 threads per core doesn't make a lot of sense. Core design is always supposed to reduce stalls as much as possible, and the extra threads take up processing slack when one of the threads is stalled. Having 3-4 or even more threads would just introduce far too much contention for resources and reduce the performance gained per thread added. Putting more threads per core would just emphasize multithreading far more, with a drop in single-thread performance, because the core would have to switch between so many threads. If you'll remember, AMD already did something like this; it was called Bulldozer. And simply adding a ton of threads per core on big cores would end up performing worse than simply making a 512-core processor with cores the size of Jaguar's.
 
Introducing more than 2 threads per core doesn't make a lot of sense. Core design is always supposed to reduce stalls as much as possible, and the extra threads take up processing slack when one of the threads is stalled. Having 3-4 or even more threads would just introduce far too much contention for resources and reduce the performance gained per thread added. Putting more threads per core would just emphasize multithreading far more, with a drop in single-thread performance, because the core would have to switch between so many threads. If you'll remember, AMD already did something like this; it was called Bulldozer. And simply adding a ton of threads per core on big cores would end up performing worse than simply making a 512-core processor with cores the size of Jaguar's.

SMT is used for increasing throughput in wide cores and also for hiding memory latency in narrow cores with simpler OOO or no OOO at all. Zen is a wide core.

512 Jaguar cores would produce about the same throughput as 256 Zen cores, but the Jaguar-based chip would perform worse in latency-sensitive workloads.

In any case, anyone familiar with Zen3 and server roadmaps knows what I am saying.
 
That is the older rumor. The newest is 64 cores for Rome.
Yes, it depends on which sources you're willing to put some faith in. There are rumors both ways from a variety of sources. It's always up to personal evaluation as to which rumor/source you treat as possibly informed vs. just wishful thinking. That's really the fun part of making educated guesses as to what will happen.

My personal evaluation of all those rumors and AMD's roadmaps is: we won't see a 64-core Threadripper until at least 7nm+ in 2020, if not much later; it remains to be seen how far the high-end workstation market will keep wanting more cores and at what price. Perhaps it will align with a change in socket and DDR5, as a quad-channel DDR4 bus is going to be oversubscribed by typical workstation workloads that could even use that many cores. It's possible we'll see such a beast on EPYC in 2019 on 7nm (it already has 2x the memory bandwidth of TR on DDR4), but I suspect that won't happen unless yields are surprisingly good or Intel decides to put some pressure on AMD. Time will tell.
 
It's possible we'll see such a beast on EPYC in 2019 on 7nm (it already has 2x the memory bandwidth of TR on DDR4), but I suspect that won't happen unless yields are surprisingly good or Intel decides to put some pressure on AMD. Time will tell.

Yields are the explanation given for the roadmap changes.
 
Yields are the explanation given for the roadmap changes.
The 64-core-on-7nm rumor is being pushed by WCCFTech and SemiAccurate, which, while interesting, I put very little faith in. Here's a source I do trust (at least a lot more than those two!): https://www.servethehome.com/amd-epyc-rome-details-trickle-out-64-cores-128-threads-per-socket/

They had to clarify because there was some initial confusion in which it was thought that 64 cores were also slated for first-gen Zen 2 on 7nm instead of 7nm+ the following year. Yes, it's older information and things might have changed, but I have not seen a reliable source say 64 cores on 7nm in 2019 yet. Perhaps you'd like to share?

Edit: older information means early June 2018. That's older than the early July rumors from WCCFTech and SemiAccurate, but it's backed up by multiple, far more trustworthy sources. I put those two mostly into the "interesting, but likely wishful thinking" category. Make your own decisions.

I should also note that this is what I think will happen based on what I consider reasonably reliable sources, not what I'd like to happen. If we're wishful thinking, I'd love for Kyle to be right and for the TR2 32c to be $999. I'd love to see a TR3 64c in 2019 at $999 as well! And if we're going for broke, how about a TR4 128c in 2020 and a TR5 256c in 2021! SWEET!!! I'm in, but I wouldn't believe that roadmap for an instant. When looking at the bigger picture of die sizes, yields, power/clocks, etc., it just doesn't make sense.

I would love to be wrong though!
 
7nm is going to be really new for AMD's Zen 2 release. I suspect yields at that size may keep AMD away from an 8-core CCX. I do see a 6-core CCX, for a chip total of 48 cores, as possible though, with some cherry-picking of dies. It will take some time for 7nm to mature and yields to stabilize to the point where an 8-core CCX is feasible. So I see Zen 2 topping out at 48 cores on their "WX" chips, with Zen 3 hitting 64 cores. It may be possible that AMD will follow Intel's early lead of dropping hyperthreading by the Zen 3 timeframe, though. But then if you have 64 cores, hyperthreading ends up being far less important.
 