AMD’s Navi Will Be a Traditional Monolithic GPU

David Wang, the new SVP of engineering for AMD’s Radeon Technologies Group (RTG), has clarified that Navi GPUs will remain a traditional, monolithic design, as the gaming world lacks the software to make a multi-chip module (MCM) design, made possible by Infinity Fabric, worthwhile. Wang likens it to doing “Crossfire in a single package,” but independent software vendors and dwindling multi-GPU support are blocking the way forward.

That infrastructure doesn’t exist with graphics cards outside of CrossFire and Nvidia’s SLI. And even that kind of multi-GPU support is dwindling to the point where it’s practically dead. Game developers don’t want to spend the necessary resources to code their games specifically to work with a multi-GPU array with a minuscule install base, and that would be the same with an MCM design.
 
I never understood why they can't do this without requiring developers to code for it (i.e. both GPUs appearing as a single GPU to the system).

Or wait... Didn't they do this already? I don't recall SLI (the original - scan line interleave) requiring extra coding. Or did it?

Why can't they do "frame interleave" (each GPU rendering every other frame)?
 
Yeah, for gaming this was expected, but in the AI and pro sectors I think they will overtake Nvidia performance-wise.
 
I never understood why they can't do this without requiring developers to code for it (i.e. both GPUs appearing as a single GPU to the system).

Or wait... Didn't they do this already? I don't recall SLI (the original - scan line interleave) requiring extra coding. Or did it?

Why can't they do "frame interleave" (each GPU rendering every other frame)?

Nvidia currently has tech to make multiple GPUs appear as a single package and pool the VRAM in their professional and AI things.
 
Maybe the reason is also that the performance of Navi in an MCM configuration just isn't worth it.
 
I never understood why they can't do this without requiring developers to code for it (i.e. both GPUs appearing as a single GPU to the system).

Or wait... Didn't they do this already? I don't recall SLI (the original - scan line interleave) requiring extra coding. Or did it?

Why can't they do "frame interleave" (each GPU rendering every other frame)?

There are two ways to handle it, and both have major downsides:

1: Split-Frame Rendering. This is the way 3dfx handled multi-GPU. It works by having both GPUs tag-team each frame. The primary downside is that the latency involved in communicating between the two GPUs often meant one GPU would sit around waiting for data from the other, as only one GPU can render to the display.

2: Alternate-Frame Rendering. This is the way ATI (AMD)/NVIDIA typically handle multi-GPU. Each GPU is assigned a specific frame to render, so the GPUs alternate frames. The downside to this technique is that if any INDIVIDUAL frame takes too long to be created, the other GPU stalls out. In addition, like SFR, only one GPU can render to the display, so the "primary" GPU sometimes has to sit around waiting for the other one to send its data across.

In either case, you can easily end up in a situation where you have FPS measured in the hundreds, but only have 30-40 frames actually reach the display due to frames missing the display refresh window. This is why there are literally THOUSANDS of threads of people who used SLI/CF asking "why is my game stuttering even though my FPS is over 100?"

The fact is, as long as we have fixed-rate refresh displays, multi-GPU is doomed by latency. Maybe when HDMI 2.1 makes VRR mainstream multi-GPU can be reexamined, but for now it's basically dead.
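For what it's worth, here is a minimal toy model of that pacing problem (my own sketch, not anything from the article): the 6 ms/22 ms frame times, the 60 Hz display, and the mailbox-style "show the newest finished frame at each vsync" presentation are all assumptions chosen just to make the effect visible.

```cpp
// afr_pacing.cpp - toy model of alternate-frame rendering against a fixed 60 Hz display.
// All timings and the presentation model are illustrative assumptions, not measurements.
#include <cstdio>
#include <vector>

int main() {
    const double refresh   = 1.0 / 60.0;   // fixed refresh interval (seconds)
    const int    numFrames = 600;

    // Per-frame GPU render times: mostly 6 ms, with an occasional 22 ms spike.
    std::vector<double> renderTime(numFrames, 0.006);
    for (int i = 0; i < numFrames; i += 15) renderTime[i] = 0.022;

    // AFR: even frames go to GPU 0, odd frames to GPU 1.
    std::vector<double> done(numFrames);
    double gpuFree[2] = {0.0, 0.0};
    for (int n = 0; n < numFrames; ++n) {
        int g      = n % 2;
        done[n]    = gpuFree[g] + renderTime[n];
        gpuFree[g] = done[n];
    }

    // Mailbox-style presentation: at each vsync, show the newest frame that is
    // finished *and* whose predecessors are finished; frames overtaken before a
    // vsync never appear on screen at all.
    int shown = 0, repeats = 0, next = 0, lastShown = -1;
    for (double t = refresh; t <= done.back() + refresh; t += refresh) {
        while (next < numFrames && done[next] <= t) ++next;
        int newest = next - 1;
        if (newest >= 0 && newest != lastShown) { ++shown; lastShown = newest; }
        else                                    { ++repeats; }
    }

    double seconds = done.back();
    std::printf("raw render rate            : %.0f fps\n", numFrames / seconds);
    std::printf("frames reaching the display: %.0f per second\n", shown / seconds);
    std::printf("refreshes repeating a frame: %d\n", repeats);
    return 0;
}
```

With these made-up numbers the raw render rate lands in the hundreds of fps, while far fewer distinct frames per second actually reach the display and many refreshes repeat the previous frame: the "high FPS but stuttering" complaint in miniature.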
 
Yeah, for gaming this was expected, but in the AI and pro sectors I think they will overtake Nvidia performance-wise.

You need to lay off the Kool-Aid, or whatever other substance you might be abusing.

People need to keep expectations in check; they'll be extremely lucky if they even match or come within 10-15% of NV at the high end.
 
I want 4,000 microprocessors in an array, each driving a single pixel in a 4K display...
 
You need to lay off the Kool-Aid, or whatever other substance you might be abusing.

People need to keep expectations in check; they'll be extremely lucky if they even match or come within 10-15% of NV at the high end.

I don't think you realize the Radeon Pro cards are already ahead of NV cards, depending on the use case and the software being used.

AI isn't about the cards as much as the software driving it... Nvidia, to their credit, have really almost created that market. That is a combination of hardware and software tools, though. I'll give it to Nvidia: they recognized that early and are leading, no question.

In terms of pro workstation cards, though, AMD is more than holding their own... and in some fields, such as 8K video editing (every movie and pro commercial for the last few years has been shot in 8K), AMD is the only real option.

I guess I'm saying that when it comes to pro stuff they both really know their markets... and both AMD and NV have more demand than they can meet. AMD can't make SSG cards fast enough; trying to buy more than a couple at a time is next to impossible.
 
I'm not a game or graphics developer, but I wonder if compilers could handle some of the work of using multiple cores? Modern compilers already help code use threads (to an extent). For sure, it's a lot easier than it was 15 years ago!
I remember going through old code and optimizing it to use threads. Our marketing department read an article and told all of our customers that the next version used threads and was going to run faster and do more. Too bad a lot of the code base we had was old as fuck and a total pain in the ass to make thread-safe. Not to mention, a lot of the tasks it did needed to be sequential. Threads don't magically make everything work better. That said, we were able to add threads in a few places that could speed up certain tasks by 400% - marketing came all over the table when they had information like this in their hands.
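To make the "compilers already help" point concrete, here is a minimal sketch using OpenMP (my example, not the codebase discussed above): the compiler and runtime split the loop across cores because every iteration is independent, which is exactly the property the genuinely sequential parts of that old code lacked.

```cpp
// omp_sketch.cpp - minimal sketch of compiler/runtime-assisted threading with OpenMP.
// Build with: g++ -O2 -fopenmp omp_sketch.cpp  (without -fopenmp the pragma is ignored
// and the loop simply runs on one core).
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 24;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

    // The runtime splits this loop across cores because every iteration is
    // independent; the genuinely sequential parts of a program get no such help.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        c[i] = a[i] * b[i] + a[i];

    std::printf("c[0] = %.1f, c[n-1] = %.1f\n", c[0], c[n - 1]);
    return 0;
}
```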
 
DX12 supports multi-GPU natively, as does the latest Vulkan API. They both pool the GPUs into one GPU and pool the available video memory.

It's developer-centric and up to them to enable and support the feature. It no longer falls to the GPU driver to support it with "profiles" like back in the day with SLI and CrossFire.
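To illustrate how developer-centric this is, here is a minimal sketch from the Vulkan side (my example, not from the post): Vulkan 1.1 will report linked GPUs as a device group, but everything after enumeration (splitting the frame, balancing the load, moving memory between dies) is the application's job.

```cpp
// vk_device_groups.cpp - list Vulkan "device groups" (linked GPUs that one logical
// device can span). Build against the Vulkan SDK, e.g.: g++ vk_device_groups.cpp -lvulkan
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    VkApplicationInfo app{};
    app.sType      = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app.apiVersion = VK_API_VERSION_1_1;               // device groups are core in 1.1

    VkInstanceCreateInfo ci{};
    ci.sType            = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    ci.pApplicationInfo = &app;

    VkInstance instance;
    if (vkCreateInstance(&ci, nullptr, &instance) != VK_SUCCESS) {
        std::fprintf(stderr, "no Vulkan 1.1 instance available\n");
        return 1;
    }

    uint32_t count = 0;
    vkEnumeratePhysicalDeviceGroups(instance, &count, nullptr);

    VkPhysicalDeviceGroupProperties blank{};
    blank.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_GROUP_PROPERTIES;
    std::vector<VkPhysicalDeviceGroupProperties> groups(count, blank);
    vkEnumeratePhysicalDeviceGroups(instance, &count, groups.data());

    for (uint32_t i = 0; i < count; ++i) {
        std::printf("group %u: %u device(s)\n", i, groups[i].physicalDeviceCount);
        for (uint32_t j = 0; j < groups[i].physicalDeviceCount; ++j) {
            VkPhysicalDeviceProperties props;
            vkGetPhysicalDeviceProperties(groups[i].physicalDevices[j], &props);
            std::printf("  %s\n", props.deviceName);
        }
    }
    // A logical device created over a multi-GPU group exposes per-device masks and
    // peer-memory queries; deciding what runs where is still entirely up to the app.
    vkDestroyInstance(instance, nullptr);
    return 0;
}
```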
 
I don't think you realize the Radeon Pro cards are already ahead of NV cards, depending on the use case and the software being used.

AI isn't about the cards as much as the software driving it... Nvidia, to their credit, have really almost created that market. That is a combination of hardware and software tools, though. I'll give it to Nvidia: they recognized that early and are leading, no question.

In terms of pro workstation cards, though, AMD is more than holding their own... and in some fields, such as 8K video editing (every movie and pro commercial for the last few years has been shot in 8K), AMD is the only real option.

I guess I'm saying that when it comes to pro stuff they both really know their markets... and both AMD and NV have more demand than they can meet. AMD can't make SSG cards fast enough; trying to buy more than a couple at a time is next to impossible.

And I don't think you realize that the SSG pro cards are a niche product. So no matter how much AMD fills that niche, overall it's a drop in the bucket to the industry at large.

NV will probably let them have their niche. They probably feel they won't make enough profit from it, which so far has been the case.
 
And now you know why Nvidia bought the remains of 3dfx and did nothing with it. Or others.
 
DX12 supports multi-GPU natively, as does the latest Vulkan API. They both pool the GPUs into one GPU and pool the available video memory.

It's developer-centric and up to them to enable and support the feature. It no longer falls to the GPU driver to support it with "profiles" like back in the day with SLI and CrossFire.

But here's the problem: Those implementations have to work for EVERY possible combination of GPUs. Intel iGPU and NVIDIA 1080 Ti? Gotta support it. AMD and NVIDIA? Same thing.

Developers quickly found out this type of support is impossible to load balance. AMD and NVIDIA don't want the headache either. So, developers made the decision to simply stop supporting it entirely.

I'll note I predicted this when both APIs were first announced: if you are going to try and support anything more than two of the exact same model of GPU, the headache of load balancing performance becomes more trouble than it's worth.
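As a sketch of what that load balancing would even look like (invented numbers, not any real engine's code), the application has to measure each adapter itself and re-split every frame; with a badly mismatched pair, the split converges to giving the slow GPU a sliver of work that barely justifies copying its portion back.

```cpp
// split_ratio.cpp - sketch of the per-frame bookkeeping an explicit multi-adapter
// renderer has to do itself. The GPU costs are invented to mimic an iGPU paired
// with a fast discrete card; the point is the bookkeeping, not the numbers.
#include <cstdio>

int main() {
    double fullFrameMs[2] = {4.0, 45.0};   // assumed cost of a *whole* frame per GPU
    double avgMs[2]       = {8.0, 8.0};    // running estimate, starts neutral
    double share[2]       = {0.5, 0.5};    // fraction of the frame given to each GPU

    for (int frame = 0; frame < 8; ++frame) {
        // Time actually measured this frame is proportional to the share given.
        double measured[2] = {fullFrameMs[0] * share[0], fullFrameMs[1] * share[1]};
        double frameMs = measured[0] > measured[1] ? measured[0] : measured[1];
        std::printf("frame %d: split %2.0f%%/%2.0f%% -> frame time %.1f ms\n",
                    frame, 100 * share[0], 100 * share[1], frameMs);

        // Update each GPU's estimated full-frame cost (exponential moving average).
        for (int g = 0; g < 2; ++g)
            avgMs[g] = 0.7 * avgMs[g] + 0.3 * (measured[g] / share[g]);

        // Give each GPU work inversely proportional to its estimated cost.
        double inv0 = 1.0 / avgMs[0], inv1 = 1.0 / avgMs[1];
        share[0] = inv0 / (inv0 + inv1);
        share[1] = 1.0 - share[0];
    }
    // Even in this idealized model the split settles around 92%/8%; in practice the
    // slow GPU's sliver barely pays for copying its part of the frame back.
    return 0;
}
```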
 
I never understood why they can't do this without requiring developers to code for it (i.e. both GPUs appearing as a single GPU to the system).

I always wondered this too. Isn't a single GPU already like 1,200 SIMD cores or something?
 
And I don't think you realize that the SSG pro cards are a niche product. So no matter how much AMD fills that niche, overall it's a drop in the bucket to the industry at large.

NV will probably let them have their niche. They probably feel they won't make enough profit from it, which so far has been the case.

They both fill niche markets; that's my point. Neither can produce pro cards fast enough to fill demand....

There is only so much chip manufacturing capacity. Both companies can barely meet the demand for their cards in the markets they seem to be going after. You're right, NV seems to be allowing AMD to have their way in a handful of niches... and AMD likewise hasn't gone after the NV niches all that hard either. Makes me think of the anti-competition NV video Kyle just posted... AMD and NV have been accused of colluding before. lol
 
I'm not a game or graphics developer, but I wonder if compilers could handle some of the work of using multiple cores? Modern compilers already help code use threads (to an extent). For sure, it's a lot easier than it was 15 years ago!
I remember going through old code and optimizing it to use threads. Our marketing department read an article and told all of our customers that the next version used threads and was going to run faster and do more. Too bad a lot of the code base we had was old as fuck and a total pain in the ass to make thread-safe. Not to mention, a lot of the tasks it did needed to be sequential. Threads don't magically make everything work better. That said, we were able to add threads in a few places that could speed up certain tasks by 400% - marketing came all over the table when they had information like this in their hands.

In gaming, latency is a huge deal when splitting threads. If you end up with one thread waiting for another, you have stuttering even though overall it will be faster. Essentially the same problems as the alternate frame rendering method mentioned above.

DX12 supports multi-GPU natively, as does the latest Vulkan API. They both pool the GPUs into one GPU and pool the available video memory.

It's developer-centric and up to them to enable and support the feature. It no longer falls to the GPU driver to support it with "profiles" like back in the day with SLI and CrossFire.

DX12 and Vulkan provide little more than direct access to the GPUs, without an abstraction layer in between. It is up to the developers to implement that layer, which has to be done for each GPU architecture, each GPU configuration, etc. The amount of development work quickly becomes enormous for improvements aimed at a very small market.

I always wondered this too. Isn't a single GPU already like 1,200 SIMD cores or something?

Only one front end/decoder per GPU, memory accessed directly, etc. There is a significant difference between distributing load and instructions at the hardware level and doing it at the software level, which is what would have to happen for multiple GPUs. Unless, somehow, a controller is developed that acts as the receiver for GPU commands, instantly processes and analyzes each request, and distributes the work as necessary. Needless to say, stuffing more cores into a single die is very different from multiple dies communicating with each other over a relatively slow interface.
 
I always wondered this too. Isn't a single GPU already like 1,200 SIMD cores or something?
It comes down to a task-scheduling problem: what happens if frame X is assigned to vGPU0 and frame X+1 to vGPU1, but X+1 renders and is output before X? One way it could work is assigning, say, frame rendering to one vGPU, physics to another, and maybe audio to a third, and so on, but this is inefficient as it underutilizes those other vGPUs. Graphics very much need to be processed in order.
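A minimal sketch of the reordering that implies (hypothetical vGPU names as in the post above, not real driver code): frames that finish early have to be buffered until their turn, and that buffering is exactly where one chip ends up sitting idle.

```cpp
// present_in_order.cpp - sketch of the reordering step: frames may finish out of
// order on different dies/queues, but presentation must follow frame index.
#include <cstdio>
#include <map>

class PresentQueue {
    std::map<int, const char*> ready_;  // finished frames waiting for their turn
    int next_ = 0;                      // index the display must show next
public:
    // Called when some die reports that frame `index` has finished rendering.
    void frameFinished(int index, const char* producer) {
        ready_[index] = producer;
        // Flush everything that is now in order; anything else keeps waiting,
        // which is exactly where a finished-but-blocked chip sits idle.
        while (!ready_.empty() && ready_.begin()->first == next_) {
            std::printf("present frame %d (rendered by %s)\n", next_, ready_.begin()->second);
            ready_.erase(ready_.begin());
            ++next_;
        }
    }
};

int main() {
    PresentQueue q;
    q.frameFinished(1, "vGPU1");  // finished early: buffered, nothing shown yet
    q.frameFinished(0, "vGPU0");  // unblocks frame 0, then frame 1
    q.frameFinished(2, "vGPU0");
    return 0;
}
```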
 
It comes down to a task-scheduling problem: what happens if frame X is assigned to vGPU0 and frame X+1 to vGPU1, but X+1 renders and is output before X? One way it could work is assigning, say, frame rendering to one vGPU, physics to another, and maybe audio to a third, and so on, but this is inefficient as it underutilizes those other vGPUs. Graphics very much need to be processed in order.

I guess the crux of my confusion is that... say a Titan Xp is what, 3,840 Pascal cores. A 1050 is 640 Pascal cores.

Now, I will admit that a Titan Xp is a bit different than just gluing 6 1050 dies together. But it doesn't seem like it's that much more. If they can make 3,840 cores schedule just fine, it doesn't sound to me, as a totally ignorant layman, like getting 6x640 would be a whole lot different.

Maybe it's like saying a Lambo will go 230 MPH, and a Focus will go 77 MPH downhill with the wind, so if I buy 3 Ford Focii I should be able to go 230 MPH as well... I don't know. Just seems like there are already a crazy number of SIMD cores all scheduling and talking now. I guess if you go outside the die it gets much more complicated.
 
I guess the crux of my confusion is that... say a Titan Xp is what, 3,840 Pascal cores. A 1050 is 640 Pascal cores.

Now, I will admit that a Titan Xp is a bit different than just gluing 6 1050 dies together. But it doesn't seem like it's that much more. If they can make 3,840 cores schedule just fine, it doesn't sound to me, as a totally ignorant layman, like getting 6x640 would be a whole lot different.

Maybe it's like saying a Lambo will go 230 MPH, and a Focus will go 77 MPH downhill with the wind, so if I buy 3 Ford Focii I should be able to go 230 MPH as well... I don't know. Just seems like there are already a crazy number of SIMD cores all scheduling and talking now. I guess if you go outside the die it gets much more complicated.

Pretty close example. The 3,840 cores on a Titan Xp use the same caches and RAM and have only one scheduler, all in close proximity with very low latency. Six 1050 dies would have six schedulers, none of which knows what the others are doing without a software layer, which means going through the slow PCIe bus to the CPU and having the CPU coordinate what is happening. Not to mention the schedulers cannot access each other's caches or RAM, at least not without significant latency. The thing to remember is that gaming is a unique type of load, where not only how fast a frame is rendered but when it is rendered matters.
 
So (IMO) the problem is you can only really ever render one frame at a time, especially in gaming scenarios where the scene is very dynamic. I've owned and used SLI and Crossfire and neither offered a genuinely smooth experience for the really demanding games. It seems to make a whole lot of sense that a graphics card just needs to be able to render frame by frame very quickly. Worrying about anything but the current frame doesn't seem to make much sense. Of course you can optimize how a card handles instructions and memory for a single frame, but anything beyond the current frame, I don't see how you really optimize for that. To me it seems logical that displays with flexible frame rates are what we all need across the board in addition to cards that aren't gimped on memory bandwidth and processing power for all the geometry, lighting/shaders, physics, etc -- whatever it takes to render a frame at the target resolution and level of detail needs to be on one card/GPU chip IMO.
 
There are two ways to handle it, and both have major downsides:

1: Split-Frame Rendering. This is the way 3dfx handled multi-GPU. It works by having both GPUs tag-team each frame. The primary downside is that the latency involved in communicating between the two GPUs often meant one GPU would sit around waiting for data from the other, as only one GPU can render to the display.

2: Alternate-Frame Rendering. This is the way ATI (AMD)/NVIDIA typically handle multi-GPU. Each GPU is assigned a specific frame to render, so the GPUs alternate frames. The downside to this technique is that if any INDIVIDUAL frame takes too long to be created, the other GPU stalls out. In addition, like SFR, only one GPU can render to the display, so the "primary" GPU sometimes has to sit around waiting for the other one to send its data across.

In either case, you can easily end up in a situation where you have FPS measured in the hundreds, but only have 30-40 frames actually reach the display due to frames missing the display refresh window. This is why there are literally THOUSANDS of threads of people who used SLI/CF asking "why is my game stuttering even though my FPS is over 100?"

The fact is, as long as we have fixed-rate refresh displays, multi-GPU is doomed by latency. Maybe when HDMI 2.1 makes VRR mainstream multi-GPU can be reexamined, but for now it's basically dead.

1) SFR (really sub-frame rendering) doesn't have much downside. You just need enough bandwidth that the two GPUs can communicate effectively, which isn't much of an issue (if doing an MCM, it is reasonable to expect 64-128 GB/s as a minimum between dies). In general, it provides actual, reliable speedups but can be inefficient in utilizing GPU resources (i.e., a 25-50% speedup instead of 75-100%). Internally, high-end GPUs already do SFR, as it is generally a much more efficient use of the clustered resources on chip.

2) AFR: absolutely useless snake oil with no real benefit. You can easily simulate AFR; just have the GPU display every frame twice. AFR is for idiots.

The reality is that what we've seen as SLI is just poor, shitty execution: using the equivalent of 1-2 PCIe lanes to link 2 GPUs, your only really viable option is crappy AFR. However, if you are designing for an MCM, you can utilize much higher-bandwidth links, which makes SFR very viable and should be pretty cost effective, especially at the higher performance tiers.
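Some back-of-the-envelope numbers behind that bandwidth argument (my assumptions, not vendor figures): a 4K RGBA8 color buffer is about 33 MB, which takes roughly a whole 60 Hz frame to move over a bridge-class link but well under a millisecond over a 64-128 GB/s MCM link.

```cpp
// link_budget.cpp - back-of-the-envelope numbers for moving a frame between dies.
// Resolutions and link speeds are assumptions, not vendor figures.
#include <cstdio>

int main() {
    const double frameBytes = 3840.0 * 2160.0 * 4.0;  // one 4K RGBA8 color buffer

    struct Link { const char* name; double gbPerSec; };
    const Link links[] = {
        {"~2 PCIe 3.0 lanes (bridge-style link)", 2.0},
        {"64 GB/s MCM link (assumed)",            64.0},
        {"128 GB/s MCM link (assumed)",           128.0},
    };

    std::printf("one %.1f MB color buffer; frame budget is 16.7 ms at 60 Hz, 6.9 ms at 144 Hz\n",
                frameBytes / 1e6);
    for (const Link& l : links) {
        double ms = frameBytes / (l.gbPerSec * 1e9) * 1000.0;
        std::printf("  %-40s %6.2f ms per transfer\n", l.name, ms);
    }
    return 0;
}
```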
 
As long as AFR remains the preferred method for multi-GPU rendering, this is probably for the better.

AFR ought to just die.
 
And I don't think you realize that the SSG pro cards are a niche product. So no matter how much AMD fills that niche, overall it's a drop in the bucket to the industry at large.

NV will probably let them have their niche. They probably feel they won't make enough profit from it, which so far has been the case.
The thing about niches is that you can charge a lot more for products that fill them when you are the only one making them, and your customers aren't poor by any stretch of the imagination.
 
AMD needs to find a way to have their drivers present their Infinity Fabric GPU designs to the operating system as a single GPU, and then split loads across the multiple chips using some form of SFR algorithm.

This could be quite fantastic.
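A conceptual sketch of what that driver-level split might look like (made-up tile size and die count; no shipping driver is known to work this way): carve the render target into tiles and deal them out across dies, which is the easy half; the hard half is keeping every die fed with the same geometry and texture data over the fabric.

```cpp
// sfr_split.cpp - conceptual sketch of a driver-side SFR split: carve the render
// target into tiles and deal them out across dies. Tile size and die count are
// made up; no shipping driver is known to work exactly this way.
#include <cstdio>

struct Tile { int x, y, w, h, die; };

int main() {
    const int width = 3840, height = 2160;
    const int tile  = 256;   // assumed tile size in pixels
    const int dies  = 4;     // assumed number of GPU chiplets

    int index = 0;
    int perDie[8] = {};
    for (int y = 0; y < height; y += tile) {
        for (int x = 0; x < width; x += tile, ++index) {
            int w = (x + tile <= width)  ? tile : width - x;
            int h = (y + tile <= height) ? tile : height - y;
            Tile t{x, y, w, h, index % dies};   // round-robin assignment across dies
            ++perDie[t.die];
            // A real scheduler would weight tiles by actual cost and keep geometry
            // and texture data coherent across dies over the fabric.
        }
    }
    for (int d = 0; d < dies; ++d)
        std::printf("die %d gets %d of %d tiles\n", d, perDie[d], index);
    return 0;
}
```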
 
I guess the crux of my confusion is that... say a Titan Xp is what, 3,840 Pascal cores. A 1050 is 640 Pascal cores.

Now, I will admit that a Titan Xp is a bit different than just gluing 6 1050 dies together. But it doesn't seem like it's that much more. If they can make 3,840 cores schedule just fine, it doesn't sound to me, as a totally ignorant layman, like getting 6x640 would be a whole lot different.

Maybe it's like saying a Lambo will go 230 MPH, and a Focus will go 77 MPH downhill with the wind, so if I buy 3 Ford Focii I should be able to go 230 MPH as well... I don't know. Just seems like there are already a crazy number of SIMD cores all scheduling and talking now. I guess if you go outside the die it gets much more complicated.

Um, those aren't actual cores. Those are marketing cores.

Depending on how you really want to count, GP102 has either 6 or 60 "cores," with 6 being closer to what we consider CPU cores and 60 being closer to what is normally considered CPU execution units.

When AMD/Nvidia count cores, they are really counting SP FP ALU lanes. By that counting, each current gen Xeon core has ~40-50 "cores".
 
The thing about niches is that you can charge a lot more for products that fill them when you are the only one making them, and your customers aren't poor by any stretch of the imagination.

In AMD's case, they need to charge what the market considers fair. They are in absolutely no position to price even niche products out of the hands of people that populate that niche.

AMD isn't the only rooster in the coop, nor are they the biggest and baddest in the coop either.
 
You need to lay off the Kool-Aid, or whatever other substance you might be abusing.

People need to keep expectations in check; they'll be extremely lucky if they even match or come within 10-15% of NV at the high end.
Very high-proof Kool-Aid floating around here lately.
 
In AMD's case, they need to charge what the market considers fair. They are in absolutely no position to price even niche products out of the hands of people that populate that niche.

AMD isn't the only rooster in the coop, nor are they the biggest and baddest in the coop either.
It wouldn't surprise me if there were others, but I'm not aware of them myself. Who else provides an SSG-like solution, and what is it called, if I may ask?
 
Monolithic... okay. A question out of curiosity: can you have a monolithic architecture and still spread it across multiple chips? Maybe not all the same chip, I mean.
 
It wouldn't surprise me if there were others, but I'm not aware of them myself. Who else provides an SSG-like solution, and what is it called, if I may ask?

No one offers something similar. The problem is that they aren't beneficial for the majority of the industry that they were designed for. The only people that they would really benefit might be big name YouTubers. And let's be honest here, those are a niche of a niche.
 