AMD Mantle capabilities

I saw

https://twitter.com/Thracks/status/561708827662245888



Where Robert Hallock states that AMD's Mantle API allows summing up multiple video cards' RAM, so that two 4GB cards can be seen as having 8GB of total RAM to dedicate.

Seems like this would be a major improvement.

Any thoughts on the benefits, or the difficulties of implementing it?
 
Definitely see it as possible to address individual VRAM on individual cards with Mantle. The flexibility Mantle brings to developers' fingertips has barely been harnessed. Developing the expertise is what AMD has been working hard on, and that does take time.

Civ: BE is probably the best example of the technological implementation that can be done with Mantle: using practically no CPU resources for graphics, plus the new holy grail of multi-card implementation and Crossfire.

Thief is also a great example of what can be done even on an old graphics engine like UE3. Mantle essentially supercharges the engine beyond what anyone thought it could do in terms of game smoothness, with up to 3x the minimum FPS!

Mantle appears extremely flexible; the difficulty is coding like this for the first time in a very long time.
 
Probably fairly difficult to implement. You'd have to explicitly decide which parts of the scene get rendered on each GPU. That's doable, of course, but it means the developers need to make the engine do it. So either the engine has to have some way of analyzing the scene and deciding on the split, which could end up being a significant performance issue, or the devs have to hard-code it beforehand.
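To make that concrete, here's a minimal C++ sketch of what such a hard-coded split might look like. DrawCall, GpuContext and submit() are invented names for illustration only, not Mantle API calls.

```cpp
// Hypothetical sketch of an explicit, hard-coded split: the engine decides up
// front which draw calls (and therefore which textures) belong to which GPU.
// DrawCall, GpuContext and submit() are invented names, not Mantle API calls.
#include <vector>

struct DrawCall { int meshId; int materialId; };

struct GpuContext {
    std::vector<DrawCall> queue;
    void submit(const DrawCall& dc) { queue.push_back(dc); }
};

void buildFrame(const std::vector<DrawCall>& scene,
                GpuContext& gpu0, GpuContext& gpu1)
{
    for (const DrawCall& dc : scene) {
        if (dc.materialId < 100)     // arbitrary, developer-chosen partition rule
            gpu0.submit(dc);
        else
            gpu1.submit(dc);
    }
    // Each GPU only needs the assets referenced by its own queue, which is
    // what would let the two memory pools hold different data.
}
```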

I don't see it as being something you are likely to see much of, particularly since the multi GPU market is a pretty small one. Basically you have to have a situation where your game has more assets than can fit in the memory of a single GPU and you are willing to spend the time and resources to optimize it to deal with that on multiple AMD GPUs only.

I'm just not seeing it. Games cost too much to make already, which is why developers want to use turn-key solutions wherever they can. If someone could make this work automatically, there might be some interest; however, if it takes optimization on their part, then forget it.

Particularly since you would only see benefits if you also had larger art assets to use it. I mean, if a game only needs 3GB of VRAM, then having 8GB available to it does nothing. So you have to design enough detail into the assets (which costs more money) to fill that. However, you still need lower-res assets for all the people who don't have dual GPUs, which is most people.
 
Wouldn't doing that effectively double (or worse) the VRAM latency, since each separate GPU would have independent access to the other GPU's VRAM?

Well, at least for current-gen cards that have those hardware limitations... that could change with future-gen GPUs that provide full access to non-native banks of VRAM (a GPU and its VRAM on one PCB being able to fully access the VRAM on a different GPU's PCB, treating it as native) with future revisions of AMD's implementation of HSA.

Would be absolutely awesome to have summed VRAM instead of pseudo-mirrored. :cool:
 
Not sure it even matters; they tried something similar with the AGP interface - the ability to use system memory as video memory - and the result was horrendously slow.

So even though you could access the memory of a second GPU, the speed across the bus to that memory will be significantly slower than to onboard memory.

Even texture access from system memory would probably be on par speed-wise.
 
^ AMD may have to go back to some kind of bridge to overcome any interconnect bus speed problems. Doing so may eliminate that variable.
 
Hmm... let's see... I don't see a need for a bridge or a slowdown if you assign specific textures and objects to specific GPUs. This would really allow you to render an 8GB or greater screen using multiple GPUs with an asymmetrical VRAM setup at full speed. You could potentially build up that screen from multiple parts.
 
I don't know; then you'd wind up with something like a NUMA-like addressing architecture. For GPUs, that would mean coherency across the PCIe bus. That wouldn't be a good thing.
 
Let's be careful here...

The tweet refers to a dual-GPU card, so no PCIe bus is involved. But even if there is access over the bus, as Crossfire already does, assigning tasks to individual GPUs with a proper setup is potentially feasible. Complicated, though.
 
Hmm... let's see... I don't see a need for a bridge or a slowdown if you assign specific textures and objects to specific GPUs. This would really allow you to render an 8GB or greater screen using multiple GPUs with an asymmetrical VRAM setup at full speed. You could potentially build up that screen from multiple parts.


This was already tried, if anyone remembers the "Scissor" method Nvidia had in their early multi-GPU setups: each GPU rendered a different part of the scene and the results were then "stitched" together.

The scissor method was so much less efficient than AFR (Alternate Frame Rendering) that it eventually went away completely as an option.

But then again, AMD's Mantle may have come up with a more efficient scissor method. Being able to access memory across GPUs doesn't seem, IMO, as beneficial as simply not needing to duplicate a texture if a GPU doesn't need to render it. That is what gets you to the 8GB scene, not cross-GPU memory access.

Even v4 of the PCIe bus only gets you something like 30GB/s; compare that to the 320GB/s at which the GPU can access its local memory (on the 290X, for example).

It's just an immense speed hit to have GPU1 access a texture out of GPU2's memory.
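As a rough sanity check on those numbers, here's a back-of-the-envelope comparison assuming roughly 30GB/s best case for a PCIe 4.0 x16 link and the 290X's 320GB/s local bandwidth, ignoring latency and contention entirely:

```cpp
// Rough transfer-time comparison for a 256MB texture, using the approximate
// bandwidth figures quoted above (best case, ignoring latency and contention).
#include <cstdio>

int main() {
    const double textureMB  = 256.0;
    const double localGBps  = 320.0;  // R9 290X local VRAM bandwidth
    const double pcie4GBps  = 30.0;   // ~PCIe 4.0 x16, theoretical

    std::printf("local VRAM: %.2f ms\n", textureMB / (localGBps * 1024.0) * 1000.0);
    std::printf("over PCIe : %.2f ms\n", textureMB / (pcie4GBps * 1024.0) * 1000.0);
    // Roughly 0.8 ms locally vs 8.3 ms across the bus: about an order of
    // magnitude slower before any real-world overhead is counted.
    return 0;
}
```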


does anyone know what the bandwidth was when they were using a Xfire bridge?

I'm not saying doing a pool of memory is a bad thing, I'm just not convinced we'll really get that much benefit from it as gamers.
 
does anyone know what the bandwidth was when they were using a Xfire bridge?

That may be irrelevant, since AMD is at full liberty to implement whatever kind of bridge they want, if they have to.
 
Worst-case scenario is that Intel and AMD REALLY revise the PCIe bus. ;) But if something like this gets baked into Windows 10, it would be awesome! I ran SFR years ago and it was a lot more enjoyable than AFR. AFR has always been a stuttering mess regardless of Nvidia's or AMD's implementation. I haven't tried R9 bridgeless yet. AFR's claim to fame is that it produces bigger FPS numbers, and that sells more cards. I would like to see SFR return on more powerful hardware, because it was always a lot smoother than AFR and less headache-inducing.

Bring back the Voodoo days, when innovations came every product cycle and you truly wanted to see what the engineers were going to show at trade shows. I just can't get excited about 5% more power savings, other than my overclock going up by 5%. And I concur that this should happen in software automatically, with minimal user input.

Thanks for the link. ;)
 
This was already tried, if anyone remembers the "Scissor" method Nvidia had in their early multi-GPU setups: each GPU rendered a different part of the scene and the results were then "stitched" together.

ATI had Supertiling, Scissor, AFR
 
Wouldn't doing that effectively double (or worse) the VRAM latency, since each separate GPU would have independent access to the other GPU's VRAM?

Well, at least for current-gen cards that have those hardware limitations... that could change with future-gen GPUs that provide full access to non-native banks of VRAM (a GPU and its VRAM on one PCB being able to fully access the VRAM on a different GPU's PCB, treating it as native) with future revisions of AMD's implementation of HSA.

Would be absolutely awesome to have summed VRAM instead of pseudo-mirrored. :cool:
If both cards are still drawing the same things, then yes.

Alternatively, if one card rendered the background out to a certain distance and the 2nd card rendered the foreground and overlaid it, that could work.
There will be some overlap in data and in what is drawn at the boundary between foreground and background, but memory use could be vastly reduced and dual-card efficiency should be pretty high.

The CPU would likely be put to more use coordinating them, but Mantle handles multiple cores better.
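As an illustration of that background/foreground idea, here's a minimal sketch of partitioning visible objects by distance with a small overlap band. Object, the split distance and the 5% band are all hypothetical, not Mantle or any real engine's code.

```cpp
// Hypothetical depth split: GPU 0 takes everything beyond the split distance,
// GPU 1 takes the foreground, and the two results are composited afterwards.
// Object and the 5% overlap band are invented for illustration.
#include <vector>

struct Object { float distance; int id; };

void partitionByDepth(const std::vector<Object>& visible, float splitDist,
                      std::vector<Object>& farSet,    // rendered by GPU 0
                      std::vector<Object>& nearSet)   // rendered by GPU 1
{
    const float overlap = splitDist * 0.05f;  // small band drawn by both cards
    for (const Object& o : visible) {
        if (o.distance >= splitDist - overlap) farSet.push_back(o);
        if (o.distance <  splitDist + overlap) nearSet.push_back(o);
    }
    // Only the overlap band is duplicated in both memory pools; everything
    // else lives on exactly one card, which is where the VRAM saving comes from.
}
```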
 

How would you assure that the load was evenly split between the 2 GPUs?
 
How would you assure that the load was evenly split between the 2 GPUs?

HSA provides a set of rules to handle just that. The Sniper Elite developers, Rebellion, were mulling various scenarios like this last summer for future updates to Sniper Elite 3; according to them, Mantle allows them to do this seamlessly, without a fuss (towards the end of their blog post found here). The GPUs don't even have to be remotely close in speed. You could combine an APU's GPU with a 295X2, for example: the APU can do simple tasks and help the CPU schedule work while the big GPU handles the heavy lifting. Gone are the days of feeding tasks to your system one spoonful at a time.
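This is not how Mantle or HSA actually express it, but one simple way an engine could keep unequal GPUs busy is to hand each a share of the next frame proportional to how fast it finished the last one. A purely hypothetical sketch:

```cpp
// Hypothetical sketch, not how Mantle or HSA actually expose this: give each
// GPU a share of the next frame proportional to how fast it finished the last
// one, so a slow APU and a fast discrete card both stay busy.
struct GpuStats { double lastFrameMs = 16.0; };   // measured per frame

// Fraction of the workload to hand to GPU 0 next frame (the rest goes to GPU 1).
double rebalance(const GpuStats& gpu0, const GpuStats& gpu1)
{
    const double speed0 = 1.0 / gpu0.lastFrameMs;
    const double speed1 = 1.0 / gpu1.lastFrameMs;
    return speed0 / (speed0 + speed1);   // faster GPU gets the bigger share
}
```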
 
I hope to see more Mantle support in upcoming games, like GTA 5.
 
I hope to see more Mantle support in upcoming games, like GTA 5.

If that was the case it would have leaked already :). Not sure if I missed that the engine they are using has already been ported to Mantle, but if not, there's only a slim chance it will be.

Anyway, Mantle can use the GPU as a compute device, so being able to address 8GB of memory is not that weird. The only people who are baffled by this tend to be stuck in AFR :)
 
The only people who are baffled by this tend to be stuck in AFR :)

Also, the SFR work AMD did in Civilization: Beyond Earth can actually reduce latency instead of increasing it, making everything extremely smooth and responsive. I can see a lot of possibilities going forward.
 
If that was the case it would have leaked already :)

Kinda leaked:

Report: GTA V to Have AMD Mantle Support

AMD Mantle support coming to GTA V and CoD: AW says report

GTA 5 Mantle support pending
 
Unfortunately AMD said that they have no knowledge of the game having Mantle support. I asked Robert Hallock during a Twitch livestream.
 
I still don't understand why dual-GPU cards aren't redesigned to have the memory located towards the center of the card, where both GPUs can access the central cluster of memory. Then they really would have access to 8GB of memory, but maybe there is some technical constraint that prevents that.
 
Because they are two actual full cards on one PC board with an internal PCIe bus.

There are no shared resources.
 
He knows, which is why he asks the question.
The word 'redesigned' is key.

@tybert
It would be great for consumers, but it would require another branch of hardware and driver design plus continued support, i.e. it must not be abandoned because it becomes inconvenient.
I have no doubt it would promote higher sales of dual-GPU cards, but the number sold would be relatively very small; tough to justify.
Maybe it's on the cards (hah) when the world is in less of a recession.
 
Links from half a year ago ;)

Here is a newer one for you...

http://www.dsogaming.com/news/valve-to-be-on-stage-at-gdc-to-introduce-glnext/

They stated that AMD shared their Mantle code with them, and a lot of it was used in the glNext API.

Looks like AMD has already been doing what a lot of people on this site bet they would never do... share the code.
 
I still don't understand why dual-GPU cards aren't redesigned to have the memory located towards the center of the card, where both GPUs can access the central cluster of memory. Then they really would have access to 8GB of memory, but maybe there is some technical constraint that prevents that.

That is physically impossible with the current memory architecture. The memory controller would need to be shared, and currently each GPU has its own controller for its own pool of RAM. We might see some exotic solutions like this in the future once HBM and silicon-interposer tech proliferate. But until then, clever load balancing will be needed to split assets between GPUs.
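As a toy example of what that load balancing could look like on the memory side, here's a hypothetical greedy split of assets across two per-GPU budgets. Asset, the budgets and the greedy rule are all made up for illustration, not anything AMD or Mantle specify.

```cpp
// Illustrative only: greedily place each texture asset into whichever GPU's
// memory pool has the most headroom left, so neither pool has to mirror the
// full asset set. Asset and the budgets are made-up names, not a real API.
#include <cstdint>
#include <vector>

struct Asset { std::int64_t bytes; int id; };

void splitAssets(const std::vector<Asset>& assets,
                 std::int64_t budget0, std::int64_t budget1,
                 std::vector<int>& onGpu0, std::vector<int>& onGpu1)
{
    std::int64_t free0 = budget0, free1 = budget1;
    for (const Asset& a : assets) {
        if (free0 >= free1) { onGpu0.push_back(a.id); free0 -= a.bytes; }
        else                { onGpu1.push_back(a.id); free1 -= a.bytes; }
    }
    // Each GPU only ever needs to hold the textures assigned to it, which is
    // where the "summed" 8GB figure comes from in this line of thinking.
}
```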
 
Robert Hallock states that AMD's Mantle API allows summing up multiple video cards' RAM, so that two 4GB cards can be seen as having 8GB of total RAM to dedicate.

Pretty cool post. The only question is: how time-consuming would it be for developers to go this route? Because you'd have to write different code paths for multi-GPU and single-GPU. Not saying this wouldn't be a good thing - it totally would! I'm just wondering if it's feasible for developers to get this fine-grained with their products and still release games in a timely manner...
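One way to soften the two-code-paths problem, sketched here purely as an assumption on my part (Device and renderPartition are placeholders, not a real API): write the renderer against a list of devices, so the single-GPU path is just the one-device case.

```cpp
// Hypothetical sketch: one renderer loop that works for 1..N GPUs.
// Device and renderPartition() are placeholders, not a real API.
#include <cstddef>
#include <vector>

struct Device { /* would wrap one GPU's queues and memory pool */ };

// Records the commands for this device's 1/count slice of the frame (stub).
void renderPartition(Device& dev, std::size_t index, std::size_t count)
{
    (void)dev; (void)index; (void)count;
}

void renderFrame(std::vector<Device>& devices)
{
    for (std::size_t i = 0; i < devices.size(); ++i)
        renderPartition(devices[i], i, devices.size());
    // With a single device this degenerates to a normal single-GPU frame,
    // so the same code path covers both configurations.
}
```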
 
I still like Firaxis' idea of splitting the geometry and shader workloads between the two cards then compositing.

That would be cool, and Crossfire would no longer need matched cards.
 
They used split-frame rendering in BE, not split workloads.

Split workloads was a theoretical spitballing idea one of the developers proposed in one of the dev blogs back in November...
 