AMD Sheds More Light on Explicit Multiadapter in DirectX 12 in New Slides

cageymaru

http://wccftech.com/amd-sheds-more-light-on-explicit-multiadapter-in-directx-12-in-new-slides/

The article has some really nice quotes and explanations from developers. It's worth a read!

[Five slide images from the article]
 
Boom head shot

Wait, does this mean what I think it does, and we will no longer be tied to drivers for multi-GPU?
 
Boom head shot

Wait, does this mean what I think it does, and we will no longer be tied to drivers for multi-GPU?

More to come, but developers will have more control over what the hardware does.
We're still waiting for Win 10 :)
 
Does this mean that, from the developer side, enabling an aggregate GPU memory pool is coding SFR instead of (or in addition to) AFR for multi-GPU?
Sounds a little simpler than the previous assumption of direct, GPU specific, memory management.
 
It will be up to the developer how they implement it.

It just means each GPU can be targeted independently, so they can shuffle rendering tasks around however they like.

It also means they don't have to be the same silicon.

So when you upgrade, you can leave your old one in there for a speed boost if you want.
 
With SFR, does the system wait to display the frame until both cards have completed their job? Or will it be possible to have poor implementations where we see a different kind of screen tearing?
 
Oh, I'm sure it will be possible.

lol

According to Firaxis, yes, one card is picked as the master for SFR composition.
 
Oh, I'm sure it will be possible.

lol

According to Firaxis, yes, one card is picked as the master for SFR composition.

I hope this doesn't add too much overhead. I think this is where FreeSync could use some dedicated hardware within the monitor itself to handle it.

While a GPU or even CPU will still be needed to keep the GPUs in sync, the "master" could track zeros and ones instead of entire frames. I mean literally... [1, 0, 1, 0] means two of the four GPUs have sent their frames to the monitor. But only on [1, 1, 1, 1] does it set the array back to [0, 0, 0, 0] and send a signal to all of the GPUs to start rendering the next frame, then repeat. All the while, the monitor handles the portions of the frame and updates once all have been received.

I'd imagine something like this would be faster, as passing integers back and forth to keep the GPUs in sync should be a lot more efficient than having to send entire frames over and having one GPU put them all together before sending to the monitor. You'd need some kind of special input for this to work though, since all GPUs would need their own direct line of communication to the monitor...

Maybe the final frames themselves, which should only be a few MB worth of data, are small enough for this to not really matter.
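
Something like this is the bookkeeping I have in mind; just a rough hypothetical sketch to illustrate the idea, not any real API:

Code:
#include <array>
#include <atomic>
#include <cstdio>

// Hypothetical sketch of the flag-array idea: the "master" tracks one
// completion bit per GPU instead of touching the frame data itself.
constexpr int kGpuCount = 4;
std::array<std::atomic<bool>, kGpuCount> frameDone{};

// Conceptually called when a GPU has delivered its portion of the frame.
void onPortionDelivered(int gpu) {
    frameDone[gpu].store(true, std::memory_order_release);
}

bool allPortionsDelivered() {
    for (const auto& flag : frameDone)
        if (!flag.load(std::memory_order_acquire)) return false;
    return true;
}

// Once every flag is set, clear them all and (conceptually) signal every GPU
// to start rendering the next frame.
void resetForNextFrame() {
    for (auto& flag : frameDone) flag.store(false, std::memory_order_relaxed);
}

int main() {
    for (int gpu = 0; gpu < kGpuCount; ++gpu) onPortionDelivered(gpu);
    if (allPortionsDelivered()) {
        std::printf("all portions in, present the frame, then reset\n");
        resetForNextFrame();
    }
}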
 
I think you've missed the point here.

The master composition is necessary because you can only have one output.

That's the card that has to composite the output.

Once the frame has been rendered, composition is trivial; the overhead is a non-issue.
 
AMD and their slides... same as any company's marketing slides, only worse. Trust those numbers and promises and you'll end up walking through an Alaskan winter in flip-flops.
 
AMD and their slides... same as any company's marketing slides, only worse. Trust those numbers and promises and you'll end up walking through an Alaskan winter in flip-flops.

It already works this way in Mantle, and Civilization: Beyond Earth implemented SFR. Mantle games are liquid smooth without SFR. SFR cuts the frame times in half! I hope that Nvidia's DX12 drivers are up to snuff so that you can experience it too when Win 10 ships. It will make all of your DX11 games seem like they were coded by blundering idiots. I literally have to make myself play DX11 titles nowadays.

First world problems. :)
 
Yeah, every time I get a new game I think to myself, "I wish this was Mantle."

lol

The uplift on single cards is impressive.

I get an extra 20 FPS in every Mantle game, and that's with an i7.
 
Yeah, every time I get a new game I think to myself, "I wish this was Mantle."

lol

DAI is a perfect example; it's so smooth even at 30 fps it's frightening.

This x1000!!!

Also, say what you will about Thief, but it is not the same game when running under Mantle. Soooo damn smooth and immersive I wouldn't have believed it had I not tried it. And yes, that's with 2x Crossfire.
 
I agree, it's literally not the same game, even with a single card.
 
Yeah, every time I get a new game I think to myself, "I wish this was Mantle."

lol

The uplift on single cards is impressive.

I get an extra 20 FPS in every Mantle game, and that's with an i7.

Ironically, I'm most excited about what this can do for Intel's i7s... Using more threads and also the iGP means that the i7s could become beastly setups if the Intel HD iGP is any good at handling post-processing tasks...

That said, I expect the AMD FX and APU lineups to see similar increases, but the i7 will just stay that much further ahead... All around, I think everything is going to see major performance gains.

On the GPU front, I am more excited about what it does for AMD. I think AMD's cards have been over-engineered for 'future tech' for a while, and it is going to be very interesting to see them utilize what they have been talking about for a long time... They have engineered for the future while Nvidia has always done a better job at fine-tuning what is currently being used most. This should make for an interesting shift of old tech and even new tech.
 
Boom head shot

Wait, does this mean what I think it does, and we will no longer be tied to drivers for multi-GPU?

Pretty much. That means each game's multi-GPU performance is basically entirely dependent on the developers instead of AMD/nVidia.

Does this mean that, from the developer side, enabling an aggregate GPU memory pool is coding SFR instead of (or in addition to) AFR for multi-GPU?
Sounds a little simpler than the previous assumption of direct, GPU specific, memory management.

SFR does not mean VRAM pooling. Even if you have SFR, if you divide the screen evenly, each card will still need the same assets, as you never know where the camera will turn next. Since each card is rendering half (or one third, or one fourth) of the pixels, VRAM usage will be less in that regard. But textures, etc., still all need to be loaded onto each card, so you end up with the same VRAM mirroring as AFR.

To do VRAM pooling properly, you need to assign each GPU to a particular element of the scene rather than a portion of the screen. For example, you assign one to the trees, background, and lighting, while you assign the other to the characters. This is not the same thing as SFR. You also cannot dynamically adjust the load, for example, have the trees-and-background GPU help the characters GPU inside a tavern full of people. That introduces latency due to having to copy assets over. I highly doubt we will see VRAM pooling in a significant number of games, and I doubt we will see it in any game that isn't trying to be a tech demo.
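
To make that concrete, here's a rough sketch of what a static per-element assignment could look like on the engine side (all names made up; this is an illustration of the idea, not real D3D12 code):

Code:
#include <cstdio>

// Hypothetical sketch: assign whole scene elements (render passes) to specific
// GPUs, so each GPU only ever needs the assets for its own passes.
enum class Gpu { Primary, Secondary };

struct RenderPass {
    const char* name;
    Gpu owner;  // the GPU that holds this pass's assets and executes it
};

// Static assignment decided up front. Rebalancing at runtime would mean
// copying a pass's textures and meshes to the other GPU, which is the latency
// problem described above.
const RenderPass kPasses[] = {
    { "terrain + background", Gpu::Primary   },
    { "lighting",             Gpu::Primary   },
    { "characters",           Gpu::Secondary },
    { "particles",            Gpu::Secondary },
};

int main() {
    for (const RenderPass& pass : kPasses)
        std::printf("%s -> GPU %d\n", pass.name, static_cast<int>(pass.owner));
}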

With SFR, does the system wait to display the frame until both cards have completed their job? Or will it be possible to have poor implementations where we see a different kind of screen tearing?

The GPUs would have to wait for the slowest one to complete its frame, at least in a properly implemented setup.
 
Can't wait for the brilliance that is Ubisoft to be responsible for the bulk of your video cards' performance.....
 
Why is DX11 AFR and DX12 SFR? What does SFR have to do with DX12? It's been around for a decade and has never been all that great an option.

And the memory pooling shit: exactly how often do alternating frames use completely different sets of textures? Or different halves of the same frame? It doesn't matter; the answer is never.

And BTW, all this shit is "Look what game developers can do with low-level access to hardware!" ... yeah ... you mean those same developers that are pushing out half-finished games? On top of that, now they're also going to focus on that small number of multi-GPU users to make sure their game is super efficient with multiple GPUs? Or wait, are they going to spend time offloading stuff to APUs?

SFR is stupid, memory pooling is stupid, and game developers don't have time for this shit. When Unity and the new Source engine start taking advantage of this stuff, then we'll see it show up in games.
 
Why is DX11 AFR and DX12 SFR? What does SFR have to do with DX12? It's been around for a decade and has never been all that great an option.
It's the difference between requesting something and actually doing it. It removes any assumptions.

And the memory pooling shit: exactly how often do alternating frames use completely different sets of textures? Or different halves of the same frame? It doesn't matter; the answer is never.
Different halves might actually be fairly common. Any sort of UI/HUD should be easy to render separately. Geometry vs. pixels could be an easy split, in addition to geometry being used for other parts of an engine (sound, AI/pathfinding, etc.). The biggest division would likely be LOD, as lower-detail textures would be used for distant objects. The logic of which GPU to use for different tasks with their independent memory pools should be trivial. Even if the split isn't perfect, it's still faster than a single GPU.
 
Different halves might actually be fairly common. Any sort of UI/HUD should be easy to render separately. Geometry vs. pixels could be an easy split, in addition to geometry being used for other parts of an engine (sound, AI/pathfinding, etc.). The biggest division would likely be LOD, as lower-detail textures would be used for distant objects. The logic of which GPU to use for different tasks with their independent memory pools should be trivial. Even if the split isn't perfect, it's still faster than a single GPU.


But I thought the reason AFR was preferred over SFR was that you could utilize all of the second GPU's power, whereas in SFR you were limited to a single GPU computing geometry?

Does that still hold for the DX12 version of SFR, and if so, why are we suddenly pushing this tech again? The more esoteric and specialized your load for each GPU gets, the harder it gets to effectively split the load evenly. And that also limits your scaling.
 
Pretty much. That means each game's multi-GPU performance is basically entirely dependent on the developers instead of AMD/nVidia.

Welp, all the more reason NOT to go with multi-GPU setups in the future.
 
I'm not sure how useful using the onboard GPU will be, since you'll be adding some latency (probably a frame on average) into the mix. It probably depends on the game, but if, say, the extra input lag from Vsync bugs you, this might do the same.

Same with trying to pool memory. Yes, you can do it, but are you really going to be so horrifyingly memory-starved that it's going to be worth the sacrifices elsewhere? How you load-balance the system is a can of worms on its own.

These may end up as cool things that you _can_ do in theory that just aren't super useful in actuality.
 
But I thought the reason AFR was preferred over SFR was that you could utilize all of the second GPU's power, whereas in SFR you were limited to a single GPU computing geometry?

Does that still hold for the DX12 version of SFR, and if so, why are we suddenly pushing this tech again? The more esoteric and specialized your load for each GPU gets, the harder it gets to effectively split the load evenly. And that also limits your scaling.

Dual, triple, or quad GPU always brings technical issues.
DX12 won't remove those, even if it might make such solutions work better than DX11 ever did. I stay away from CrossFire/SLI, as I'd rather lower my settings slightly than put up with the stuff that happens with such setups. Unless DX12 changes that.
 
So with DX12, could I mix my spare 760 (which I never sold because its value dropped so much it's not worth selling) with my 970 for combined processing power?
 
But I thought the reason AFR was preferred over SFR was that you could utilize all of the second GPU's power, whereas in SFR you were limited to a single GPU computing geometry?

Does that still hold for the DX12 version of SFR, and if so, why are we suddenly pushing this tech again? The more esoteric and specialized your load for each GPU gets, the harder it gets to effectively split the load evenly. And that also limits your scaling.
You could utilize all the power, but you would still have to deal with the technical limitations of the card. Your available memory pool would be that of the smallest card, as all the resources would be duplicated. Then anything that wouldn't fit would need to be fed to the card in advance of any rendering. SFR would allow pooling of the resources, as cards could have discrete tasks that didn't overlap. AFR also has no effect on the time to render a specific frame. From what I've read, this delay is crucial when dealing with VR, as it contributes to motion sickness.

The SFR method of DX12 would make more sense when dealing with substantial differences in card capabilities and approaching technical limits. An integrated APU for example contains a respectable amount of processing power, but it likely won't compare to a high end GPU for many tasks. The new method should allow developers to offload some of the burden from GPUs while still taking advantage of spare processing power that would otherwise be idle.

It's been a while since I've really messed with 3D programming, but the logic seems sound. Understanding asynchronous rendering seems to be key with the new APIs. The old methods still work, but more options are never a bad thing, IMHO.
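
For reference, the explicit multiadapter starting point in D3D12 looks roughly like this: enumerate every adapter (the iGPU included) and create an independent device and command queue on each. This is only a bare sketch of the public API, not a full renderer:

Code:
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
#include <vector>

using Microsoft::WRL::ComPtr;

// Bare sketch: one D3D12 device and command queue per adapter, so the engine
// can submit work to each GPU (discrete or integrated) independently.
std::vector<ComPtr<ID3D12Device>> CreateDevicesOnAllAdapters()
{
    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    std::vector<ComPtr<ID3D12Device>> devices;
    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&device)))) {
            D3D12_COMMAND_QUEUE_DESC queueDesc = {};
            queueDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
            ComPtr<ID3D12CommandQueue> queue;
            device->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&queue));
            // Work recorded against this device runs only on this adapter.
            devices.push_back(device);
        }
    }
    return devices;
}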
 
It's the difference between requesting something and actually doing it. It removes any assumptions.

Different halves might actually be fairly common. Any sort of UI/HUD should be easy to render separately. Geometry vs. pixels could be an easy split, in addition to geometry being used for other parts of an engine (sound, AI/pathfinding, etc.). The biggest division would likely be LOD, as lower-detail textures would be used for distant objects. The logic of which GPU to use for different tasks with their independent memory pools should be trivial. Even if the split isn't perfect, it's still faster than a single GPU.

Agreed with the first statement, but disagree with the second.

If you have one card rendering the left half and the other rendering the right half, how do you know that the right half won't become what the left half was showing once the player turns around? Then all of a sudden the right-half card needs all of the textures of the left-half card, and you introduce massive latency. As I stated in a previous post, the only viable way of doing VRAM pooling is if each card is rendering different objects on the screen. And that opens up whole new worlds of complexity.

But I thought the reason AFR was preferred over SFR was that you could utilize all of the second GPU's power, whereas in SFR you were limited to a single GPU computing geometry?

Does that still hold for the DX12 version of SFR, and if so, why are we suddenly pushing this tech again? The more esoteric and specialized your load for each GPU gets, the harder it gets to effectively split the load evenly. And that also limits your scaling.

AFR is preferred (I remember reading an article where a dev at DICE mentioned this) because it gives higher average and maximum FPS. In the days before people started caring about frame time variance and frame pacing, average FPS was what it was all about. Now people do know about frame time variance and latency, hence the renewed interest in SFR, which gives better minimum FPS, lower frame time variance, and lower latency. It also eliminates the frame pacing issues that AMD was having.
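
To put rough, idealized numbers on that (my own illustration): if each GPU needs 20 ms to render a full frame, AFR delivers a new frame about every 10 ms, but each individual frame still took 20 ms from submission to display, so latency doesn't improve. With a perfect SFR split, the frame itself finishes in about 10 ms, so both the frame time and the latency drop.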
 
From what I have read, it seems that DX12 allows for far more efficient use of the hardware. DX11's abstraction layer contributes quite a bit to latency and inefficient use of hardware, but allows all hardware to work. Now single frames can have parts split between different hardware resources to reduce the time spent on each frame. So the only question then is: what part contributes the most to latency/frame time, and is it worse than what we had with DX11? Mantle showed huge gains in frame pacing; not so much in max frames, but the minimum increases were huge.
 
From what I have read, it seems that DX12 allows for far more efficient use of the hardware. DX11's abstraction layer contributes quite a bit to latency and inefficient use of hardware, but allows all hardware to work. Now single frames can have parts split between different hardware resources to reduce the time spent on each frame. So the only question then is: what part contributes the most to latency/frame time, and is it worse than what we had with DX11? Mantle showed huge gains in frame pacing; not so much in max frames, but the minimum increases were huge.


Essentially we get more free FPS for no change in hardware.
Can't complain about that.
 
AFR is preferred (I remember reading an article where a dev at DICE mentioned this) because it gives higher average and maximum FPS. In the days before people started caring about frame time variance and frame pacing, average FPS was what it was all about. Now people do know about frame time variance and latency, hence the renewed interest in SFR, which gives better minimum FPS, lower frame time variance, and lower latency. It also eliminates the frame pacing issues that AMD was having.

Ok, I'll buy that on the latency thing. I will also buy that post-processing effects will scale 100%, although those are typically simple.

If you specifically use a tile renderer (splitting the frame into tiny chunks), I'll agree frame times will be more consistent as well.

We'll have to see how much of each frame is taken up by geometry, but if it's a minority of the frame we'll see good scaling. But the more cards you add, the less the scaling will be, because there are portions of the render time that can't be split.

Looking forward to THE RETURN OF THE SFR. Much better movie than SLI WARS and THE NVIDIA STRIKES BACK :D
 
Civilization: Beyond Earth uses SFR in Mantle. Compared to DX11 Crossfire (AFR), minimums are higher, average and maximum are lower, frame time variance is lower, and latency is lower. That's working proof that SFR is generally superior for a smoother gaming experience.
 
AFR is preferred (I remember reading an article where a dev at DICE mentioned this) because it gives higher average and maximum FPS. In the days before people started caring about frame time variance and frame pacing, average FPS was what it was all about. Now people do know about frame time variance and latency, hence the renewed interest in SFR, which gives better minimum FPS, lower frame time variance, and lower latency. It also eliminates the frame pacing issues that AMD was having.

The main problem is SFR can be fairly tough to load-balance properly.
Moreover, it can cause some ugly frame-stitching issues, which is another reason AFR is preferred.
 
Essentially we get more free FPS for no change in hardware.
Can't complain about that.

No. You get it if the game developers utilize it. Otherwise you don't.

And if done wrong/sloppy, you get ugly screen stitching.
 
No. You get it if the game developers utilize it. Otherwise you don't.

And if done wrong/sloppy, you get ugly screen stitching.

Only with vsync off. I'm pretty sure the selling point of this tech reintroduction is SFR combined with adaptive/free/g/sync :D
 
If memory pooling of multiple video cards works smoothly in DX12, a 390X with 4GB suddenly makes absolutely perfect sense.

A single 390X with 4GB of VRAM would be fine for 1440P and lower. Throw a second 390X in the mix and now you have 8GB of VRAM and two GPUs which should be fine for 4K resolutions. Add a third 390X and you end up with just as much VRAM as a Titan X and WAAAAY more processing power. A fourth 390X would be an option only for masochists and hardware snobs. Processing power would increase with available VRAM which is what you need for higher resolutions/settings.

On the other hand, a pair of Titan X in SLI would have 24GB of VRAM. Most of which would be utterly wasted. Even a 980Ti which will supposedly come with 6GB would probably have more VRAM than necessary in SLI at 12GB.

I guess we'll have to see how this feature works in DX12. And if it can operate with current cards like Maxwell and Hawaii, it could DEFINITELY extend their useful lifespans. A pair of 290X with 8GB of total VRAM for around $500? Sign me up!
 
Screen stitching artifacting has nothing to do with vsync.

Yes, it does. With vsync on, if you don't have the buffer ready, you use the previous buffer. If you have vsync off, you see whatever you have in progress when the monitor is ready for the next frame.

It's the same as tearing you get with vsync off from partially-completed frames on a single GPU. If you have sync on, you wait for the next frame to complete. It will be the same with AFR or SFR, except the SFR artifacts look much worse.

And Creig: please stop posting like you know what you are talking about when you don't. We're discussing exactly why it doesn't work that way right here in this thread, if you bothered to read.

With SFR, you do free up some space, but not much. You only get a smaller frame buffer on the second/third/etc. card (the first card has the full frame buffer), and that's a fraction of the total graphics memory (around 600MB for 4K). You still have to replicate assets on each card.

We're not talking linear scaling here where 4GB + 4GB = 8GB, so quit lying to yourself, or you're going to be mightily disappointed when real DX12 games come out.
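
Back-of-the-envelope math on why (my own rough numbers, assuming plain 32-bit render targets):

Code:
#include <cstdio>

int main() {
    // One 3840x2160 render target at 4 bytes per pixel.
    const double targetMB = 3840.0 * 2160.0 * 4.0 / (1024.0 * 1024.0);  // ~32 MB
    // Even the ~600 MB cited above for the full set of 4K render targets is
    // well under a sixth of a 4 GB card; the rest is textures, geometry, etc.,
    // which SFR still mirrors on every GPU.
    std::printf("one 4K target: about %.0f MB; ~600 MB of targets vs. 4096 MB total\n",
                targetMB);
}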

Cliffnotes for the ones who can't be bothered:

SFR = Split Frame Rendering, using either lots of tiles or a single split-frame. Does not require as much frame buffer memory in cards that are not the master, but requires that all other assets (textures) are replicated.

AFR = Alternate Frame Rendering. Has each GPU render an entire frame, so it requires a full frame buffer for all GPUs, plus the assets that must be replicated.

These have both been supported since CFX and SLI launched, but AFR won because it provided better benchmark scores. SFR is only now making a comeback because people care about frame times more than average frame rates. It's not hard to believe this will work, since previous attempts have already been made to mix cards with SFR, but I'm not a believer as to HOW WELL it will work.

THIS IS NOT NEW TECHNOLOGY! SFR was ditched for AFR because it has many limitations, including poor estimation of render times, and of course the fact that the geometry rendering must be duplicated.

See here for more info:

http://techreport.com/review/8826/ati-crossfire-dual-graphics-solution/3
 