AMD Mantle capabilities

I saw

https://twitter.com/Thracks/status/561708827662245888



Where Robert Hallock states that AMD's Mantle API allows summing up multiple video cards' RAM, so that two 4GB cards can be seen as having 8GB of total RAM to dedicate.

Seems like this would be a major improvement.

Any thoughts on the benefits, or the difficulties of implementing it?
 
Definitely see it as possible to address individual VRAM on individual cards with Mantle. The flexibility Mantle brings to developers' fingertips has barely been harnessed. Developing the expertise is what AMD has been working hard on, and that does take time.

Civ: BE is probably the best example of the technological implementation that can be done with Mantle: using practically no CPU resources for graphics, plus the new holy grail of multi-card implementation and Crossfire.

Thief is also a great example of what can be done even on an old graphics engine like UE3. Mantle essentially supercharges the engine beyond what anyone thought it could do in terms of game smoothness, with up to 3x the minimum FPS!

Mantle appears extremely flexible; the difficulty is coding like this for the first time in a very long time.
 
Probably fairly difficult to implement. You'd have to explicitly decide which parts of the scene get rendered on each GPU. That's doable, of course, but it means the developers need to make the engine do it. So either the engine has to have some way of analyzing the scene and deciding on the split, which could end up being a significant performance issue, or the devs have to hard-code it beforehand.
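To make that concrete, here's a minimal C++ sketch of what such a hard-coded split might look like. DrawCall, GpuContext and submit() are invented names for illustration only, not Mantle API calls.

```cpp
// Hypothetical sketch of an explicit, hard-coded split: the engine decides up
// front which draw calls (and therefore which textures) belong to which GPU.
// DrawCall, GpuContext and submit() are invented names, not Mantle API calls.
#include <vector>

struct DrawCall { int meshId; int materialId; };

struct GpuContext {
    std::vector<DrawCall> queue;
    void submit(const DrawCall& dc) { queue.push_back(dc); }
};

void buildFrame(const std::vector<DrawCall>& scene,
                GpuContext& gpu0, GpuContext& gpu1)
{
    for (const DrawCall& dc : scene) {
        if (dc.materialId < 100)     // arbitrary, developer-chosen partition rule
            gpu0.submit(dc);
        else
            gpu1.submit(dc);
    }
    // Each GPU only needs the assets referenced by its own queue, which is
    // what would let the two memory pools hold different data.
}
```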

I don't see it as being something you are likely to see much of, particularly since the multi GPU market is a pretty small one. Basically you have to have a situation where your game has more assets than can fit in the memory of a single GPU and you are willing to spend the time and resources to optimize it to deal with that on multiple AMD GPUs only.

I'm just not seeing it. Games cost too much to make already, which is why developers want to use turn-key solutions wherever they can. If someone could make this work automatically, there might be some interest; however, if it takes optimization on their part, then forget it.

Particularly since you would only see benefits if you also had larger art assets to use it. I mean, if a game only needs 3GB of VRAM, then having 8GB available to it does nothing. So you have to design enough detail into the assets (which costs more money) to fill that. However, you still need lower-res assets for all the people who don't have dual GPUs, which is most people.
 
Wouldn't doing that effectively double (or worse) the VRAM latency, since each separate GPU would have independent access to the other GPU's VRAM?

Well, at least for current-gen cards that have those hardware limitations... that could change with future-gen GPUs that provide full access to non-native banks of VRAM (a GPU and its VRAM on one PCB being able to fully access the VRAM on a different GPU's PCB, treating it as native) with future revisions of AMD's implementation of HSA.

Would be absolutely awesome to have summed VRAM instead of pseudo-mirrored. :cool:
 
Not sure it even matters; they tried something similar with the AGP interface - the ability to use system memory as video memory - and the result was horrendously slow.

So even though you could access the memory of a second GPU, the speed across the bus to that memory will be significantly slower than to onboard memory.

Even texture access from system memory would probably be on par speed-wise.
 
^ AMD may have to go back to some kind of bridge to overcome any interconnect bus speed problems. Doing so may eliminate that variable.
 
Hmm... let's see... I don't see a need for a bridge or a slowdown if you assign specific textures and objects to specific GPUs. This would really allow you to render an 8GB or greater screen using multiple GPUs with an asymmetrical VRAM setup at full speed. You could potentially build up that screen from multiple parts.
 
I don't know; then you'd wind up with something like a NUMA-like addressing architecture. For GPUs, that would mean coherency across the PCIe bus. That wouldn't be a good thing.
 
Let's be careful here...

The tweet refers to a dual-GPU card, so no PCIe bus is involved. But even if there is access over the bus, as Crossfire already does, assigning tasks to individual GPUs with a proper setup is potentially feasible. Complicated, though.
 
Hmm... let's see... I don't see a need for a bridge or a slowdown if you assign specific textures and objects to specific GPUs. This would really allow you to render an 8GB or greater screen using multiple GPUs with an asymmetrical VRAM setup at full speed. You could potentially build up that screen from multiple parts.


This was already tried, if anyone remembers the "Scissor" method Nvidia had in their early multi-GPU setups: each GPU rendered a different part of the scene and the results were then "stitched" together.

The scissor method was so much less efficient than AFR (Alternate Frame Rendering) that it eventually went away completely as an option.

But then again, AMD's Mantle may have come up with a more efficient scissor method. Being able to access memory across GPUs doesn't seem, IMO, as beneficial as simply not needing to duplicate a texture if a GPU doesn't need to render it. That is what gets you to the 8GB scene, not cross-GPU memory access.

Even v4 of the PCIe bus only gets you something like 30GB/s; compare that to the 320GB/s at which the GPU can access its local memory (on the 290X, for example).

It's just an immense speed hit to have GPU1 access a texture out of GPU2's memory.
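As a rough sanity check on those numbers, here's a back-of-the-envelope comparison assuming roughly 30GB/s best case for a PCIe 4.0 x16 link and the 290X's 320GB/s local bandwidth, ignoring latency and contention entirely:

```cpp
// Rough transfer-time comparison for a 256MB texture, using the approximate
// bandwidth figures quoted above (best case, ignoring latency and contention).
#include <cstdio>

int main() {
    const double textureMB  = 256.0;
    const double localGBps  = 320.0;  // R9 290X local VRAM bandwidth
    const double pcie4GBps  = 30.0;   // ~PCIe 4.0 x16, theoretical

    std::printf("local VRAM: %.2f ms\n", textureMB / (localGBps * 1024.0) * 1000.0);
    std::printf("over PCIe : %.2f ms\n", textureMB / (pcie4GBps * 1024.0) * 1000.0);
    // Roughly 0.8 ms locally vs 8.3 ms across the bus: about an order of
    // magnitude slower before any real-world overhead is counted.
    return 0;
}
```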


does anyone know what the bandwidth was when they were using a Xfire bridge?

I'm not saying doing a pool of memory is a bad thing, I'm just not convinced we'll really get that much benefit from it as gamers.
 
does anyone know what the bandwidth was when they were using a Xfire bridge?

That may be irrelevant, since AMD is at full liberty to implement whatever kind of bridge they want, if they have to.
 
Worst-case scenario is that Intel and AMD REALLY revise the PCIe bus. ;) But if something like this gets baked into Windows 10, it would be awesome! I ran SFR years ago and it was a lot more enjoyable than AFR. AFR has always been a stuttering mess regardless of Nvidia's or AMD's implementation. I haven't tried R9 bridgeless yet. AFR's claim to fame is that it produces bigger FPS numbers, and that sells more cards. I would like to see SFR return on more powerful hardware, because it was always a lot smoother than AFR and less headache-inducing.

Bring back the Voodoo days, when innovations came every product cycle and you truly wanted to see what the engineers were going to show at trade shows. I just can't get excited about 5% more power savings, other than my overclock going up by 5%. And I concur that this should happen in software automatically, with minimal user input.

Thanks for the link. ;)
 
This was already tried, if anyone remembers the "Scissor" method Nvidia had in their early multi-GPU setups: each GPU rendered a different part of the scene and the results were then "stitched" together.

ATI had Supertiling, Scissor, AFR
 
Wouldn't doing that effectively double (or worse) the VRAM latency, since each separate GPU would have independent access to the other GPU's VRAM?

Well, at least for current-gen cards that have those hardware limitations... that could change with future-gen GPUs that provide full access to non-native banks of VRAM (a GPU and its VRAM on one PCB being able to fully access the VRAM on a different GPU's PCB, treating it as native) with future revisions of AMD's implementation of HSA.

Would be absolutely awesome to have summed VRAM instead of pseudo-mirrored. :cool:
If both cards are still drawing the same things, then yes.

Alternatively, if one card rendered the background out to a certain distance and the 2nd card rendered the foreground and overlaid it, that could work.
There will be some overlap in data and in what is drawn at the boundary between foreground and background, but memory use could be vastly reduced and dual-card efficiency should be pretty high.

The CPU would likely be put to more use coordinating them, but Mantle handles multiple cores better.
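As an illustration of that background/foreground idea, here's a minimal sketch of partitioning visible objects by distance with a small overlap band. Object, the split distance and the 5% band are all hypothetical, not Mantle or any real engine's code.

```cpp
// Hypothetical depth split: GPU 0 takes everything beyond the split distance,
// GPU 1 takes the foreground, and the two results are composited afterwards.
// Object and the 5% overlap band are invented for illustration.
#include <vector>

struct Object { float distance; int id; };

void partitionByDepth(const std::vector<Object>& visible, float splitDist,
                      std::vector<Object>& farSet,    // rendered by GPU 0
                      std::vector<Object>& nearSet)   // rendered by GPU 1
{
    const float overlap = splitDist * 0.05f;  // small band drawn by both cards
    for (const Object& o : visible) {
        if (o.distance >= splitDist - overlap) farSet.push_back(o);
        if (o.distance <  splitDist + overlap) nearSet.push_back(o);
    }
    // Only the overlap band is duplicated in both memory pools; everything
    // else lives on exactly one card, which is where the VRAM saving comes from.
}
```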
 

How would you assure that the load was evenly split between the 2 GPUs?
 
How would you assure that the load was evenly split between the 2 GPUs?

HSA provides a set of rules to handle just that. The Sniper Elite developers, Rebellion, were mulling various scenarios like this last summer for future updates to Sniper Elite 3; according to them, Mantle allows them to do this seamlessly, without a fuss (towards the end of their blog post found here). The GPUs don't even have to be remotely close in speed. You could combine an APU's GPU with a 295X2, for example: the APU can do simple tasks and help the CPU schedule work while the big GPU handles the heavy lifting. Gone are the days of feeding tasks to your system one spoonful at a time.
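This is not how Mantle or HSA actually express it, but one simple way an engine could keep unequal GPUs busy is to hand each a share of the next frame proportional to how fast it finished the last one. A purely hypothetical sketch:

```cpp
// Hypothetical sketch, not how Mantle or HSA actually expose this: give each
// GPU a share of the next frame proportional to how fast it finished the last
// one, so a slow APU and a fast discrete card both stay busy.
struct GpuStats { double lastFrameMs = 16.0; };   // measured per frame

// Fraction of the workload to hand to GPU 0 next frame (the rest goes to GPU 1).
double rebalance(const GpuStats& gpu0, const GpuStats& gpu1)
{
    const double speed0 = 1.0 / gpu0.lastFrameMs;
    const double speed1 = 1.0 / gpu1.lastFrameMs;
    return speed0 / (speed0 + speed1);   // faster GPU gets the bigger share
}
```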
 
I hope to see more Mantle support in upcoming games, like GTA 5.
 
I hope to see more Mantle support in upcoming games, like GTA 5.

If that was the case it would have leaked already :). Not sure if I missed that the engine they are using has already been ported to Mantle, but if not, there's only a slim chance it will be.

Anyway, Mantle can use the GPU as a compute device, so being able to address 8GB of memory is not that weird. The only people who are baffled by this tend to be stuck in AFR :)
 
The only people who are baffled by this tend to be stuck in AFR :)

Also, the SFR work AMD did in Civilization: Beyond Earth can actually reduce latency instead of increasing it, making everything extremely smooth and responsive. I can see a lot of possibilities going forward.
 
If that was the case it would have leaked already :)

Kinda leaked:

Report: GTA V to Have AMD Mantle Support

AMD Mantle support coming to GTA V and CoD: AW says report

GTA 5 Mantle support pending
 
Unfortunately AMD said that they have no knowledge of the game having Mantle support. I asked Robert Hallock during a Twitch livestream.
 
I still don't understand why dual-GPU cards aren't redesigned to have the memory located towards the center of the card, where both GPUs can access the central cluster of memory. Then they really would have access to 8GB of memory, but maybe there is some technical constraint that prevents that.
 
Because they are two actual full cards on one PC board with an internal PCIe bus.

There are no shared resources.
 
He knows, which is why he asks the question.
The word 'redesigned' is key.

@tybert
It would be great for consumers, but it would require another branch of hardware and driver design plus continued support, i.e. it must not be abandoned because it becomes inconvenient.
I have no doubt it would promote higher sales of dual-GPU cards, but the number sold would be relatively very small; tough to justify.
Maybe it's on the cards (hah) when the world is in less of a recession.
 
Links from half a year ago ;)

Here is a newer one for you...

http://www.dsogaming.com/news/valve-to-be-on-stage-at-gdc-to-introduce-glnext/

They stated that AMD shared their Mantle code with them, and a lot of it was used in the glNext API.

Looks like AMD has already been doing what a lot of people on this site bet they would never do... share the code.
 
I still don't understand why dual-GPU cards aren't redesigned to have the memory located towards the center of the card, where both GPUs can access the central cluster of memory. Then they really would have access to 8GB of memory, but maybe there is some technical constraint that prevents that.

That is physically impossible with the current memory architecture. The memory controller would need to be shared, and currently each GPU has its own controller for its own pool of RAM. We might see some exotic solutions like this in the future once HBM and silicon-interposer tech proliferate. But until then, clever load balancing will be needed to split assets between GPUs.
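As a toy example of what that load balancing could look like on the memory side, here's a hypothetical greedy split of assets across two per-GPU budgets. Asset, the budgets and the greedy rule are all made up for illustration, not anything AMD or Mantle specify.

```cpp
// Illustrative only: greedily place each texture asset into whichever GPU's
// memory pool has the most headroom left, so neither pool has to mirror the
// full asset set. Asset and the budgets are made-up names, not a real API.
#include <cstdint>
#include <vector>

struct Asset { std::int64_t bytes; int id; };

void splitAssets(const std::vector<Asset>& assets,
                 std::int64_t budget0, std::int64_t budget1,
                 std::vector<int>& onGpu0, std::vector<int>& onGpu1)
{
    std::int64_t free0 = budget0, free1 = budget1;
    for (const Asset& a : assets) {
        if (free0 >= free1) { onGpu0.push_back(a.id); free0 -= a.bytes; }
        else                { onGpu1.push_back(a.id); free1 -= a.bytes; }
    }
    // Each GPU only ever needs to hold the textures assigned to it, which is
    // where the "summed" 8GB figure comes from in this line of thinking.
}
```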
 
Robert Hallock states that AMD's Mantle API allows summing up multiple video cards' RAM, so that two 4GB cards can be seen as having 8GB of total RAM to dedicate.

Pretty cool post. The only question is: how time-consuming would it be for developers to go this route? Because you'd have to write different code paths for multi-GPU and single-GPU. Not saying this wouldn't be a good thing - it totally would! I'm just wondering if it's feasible for developers to get this fine-grained with their products and still release games in a timely manner...
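One way to soften the two-code-paths problem, sketched here purely as an assumption on my part (Device and renderPartition are placeholders, not a real API): write the renderer against a list of devices, so the single-GPU path is just the one-device case.

```cpp
// Hypothetical sketch: one renderer loop that works for 1..N GPUs.
// Device and renderPartition() are placeholders, not a real API.
#include <cstddef>
#include <vector>

struct Device { /* would wrap one GPU's queues and memory pool */ };

// Records the commands for this device's 1/count slice of the frame (stub).
void renderPartition(Device& dev, std::size_t index, std::size_t count)
{
    (void)dev; (void)index; (void)count;
}

void renderFrame(std::vector<Device>& devices)
{
    for (std::size_t i = 0; i < devices.size(); ++i)
        renderPartition(devices[i], i, devices.size());
    // With a single device this degenerates to a normal single-GPU frame,
    // so the same code path covers both configurations.
}
```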
 
I still like Firaxis' idea of splitting the geometry and shader workloads between the two cards then compositing.

That would be cool, and Crossfire would no longer need matched cards.
 
They used split-frame rendering in BE, not split workloads.

Split workloads was a theoretical spitballing idea one of the developers proposed in one of the dev blogs back in November...
 