AMD’s Navi Will Be a Traditional Monolithic GPU

The word monolithic means one (mono), so by definition, no.
Well, that is why I'm thinking architecture, and thinking multiple chips: maybe GPU chips are getting so large that it might be wise to break them up, something like that.
 
Well, that is why I'm thinking architecture, and thinking multiple chips: maybe GPU chips are getting so large that it might be wise to break them up, something like that.

That is what mGPU was developed for, and as we've seen, it wasn't the be-all, end-all solution.
 
AMD needs to find a way to have their drivers present their Infinity Fabric GPU designs to the operating system as a single GPU, and then have it split loads across the multiple chips using some form of SFR (split-frame rendering) algorithm.
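To make that concrete, here's a toy sketch of what an SFR-style split inside a single logical GPU might look like: each frame is carved into horizontal bands, one per chiplet, and the band sizes get nudged every frame based on how long each chiplet took last time. Purely illustrative; the chiplet count, step size, and timings are invented, and no real driver works exactly like this.

```python
# Toy SFR load balancer: one logical GPU, each frame split into horizontal
# bands (one per chiplet), band heights adjusted from last frame's timings.
SCREEN_H = 2160
NUM_CHIPLETS = 4

def initial_bands():
    step = SCREEN_H // NUM_CHIPLETS
    return [step] * NUM_CHIPLETS            # band height per chiplet, in scanlines

def rebalance(bands, frame_times, lines_per_step=32):
    """Move a few scanlines from the slowest chiplet's band to the fastest's."""
    slow = max(range(NUM_CHIPLETS), key=lambda i: frame_times[i])
    fast = min(range(NUM_CHIPLETS), key=lambda i: frame_times[i])
    if slow != fast and bands[slow] > lines_per_step:
        bands[slow] -= lines_per_step
        bands[fast] += lines_per_step
    return bands

bands = initial_bands()
for _ in range(10):
    # Pretend chiplet 0 keeps landing on a geometry-heavy part of the screen.
    simulated_times = [h * (1.5 if i == 0 else 1.0) for i, h in enumerate(bands)]
    bands = rebalance(bands, simulated_times)
print(bands)    # chiplet 0 ends up with a smaller slice of the frame
```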

This could be quite fantastic.

Vega already introduced an "Intelligent Workgroup Distributor", the idea being that the card now has a larger L2 cache backed by HBM, allowing the IWD to split work up early in the pipeline.

One of the main issues with splitting work up is that something has to decide how and when to split things. What do you send where? What if one thing relies on another? This is where processors in general end up with ideas like predictive hardware. Not sure we want our GPUs to have Meltdown/Spectre issues (ok, I'm joking)... but that is what happens as GPUs get more complicated. The only way to split the work efficiently is either to have software optimizations done by the software developer... or to include more complicated processors to do the work.
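As a tiny illustration of that dependency problem (the pass names and dependencies below are made up), whatever does the splitting can only spread independent passes across chips; everything downstream has to wait for its inputs:

```python
from collections import defaultdict, deque

# Hypothetical render passes and the passes each one depends on.
deps = {
    "shadow_map": [],
    "gbuffer":    [],
    "lighting":   ["shadow_map", "gbuffer"],
    "post_fx":    ["lighting"],
}

def schedule_waves(deps):
    """Group passes into waves; passes inside one wave are independent."""
    indegree = {p: len(d) for p, d in deps.items()}
    users = defaultdict(list)
    for p, d in deps.items():
        for dep in d:
            users[dep].append(p)
    wave = deque(p for p, n in indegree.items() if n == 0)
    waves = []
    while wave:
        waves.append(sorted(wave))
        nxt = deque()
        for p in wave:
            for u in users[p]:
                indegree[u] -= 1
                if indegree[u] == 0:
                    nxt.append(u)
        wave = nxt
    return waves

print(schedule_waves(deps))
# [['gbuffer', 'shadow_map'], ['lighting'], ['post_fx']]
# Only the first wave can be spread across chips without anyone waiting.
```

Something, whether it's the developer, the driver, or an on-chip processor, has to build and walk a graph like that every frame.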

As much as it pains me to admit, Nvidia may have the right idea going forward. Nvidia has for years included their own Falcon microprocessor (FAst Logic CONtroller) in their GPUs. Falcon is a class of general-purpose microprocessor units, used in multiple instances on Nvidia GPUs starting from the G98. Nvidia has always been a bit cagey about the complete operation of Falcon... the open-source Nvidia driver folks call it "fuc" instead of Falcon. However, the idea is solid... a functioning RISC chip inside the GPU to take care of such cases (one of which can easily be splitting work early), which is what AMD claims to have achieved with Vega once they could address enough L2. Anyway, my point is... Nvidia is killing Falcon off as it has some serious limitations, such as being a 32-bit design. Nvidia is switching to RISC-V... which is actually very interesting.

Nvidia switching to RISC-V could well give them the ability to more easily split workloads. Now make no mistake, NV isn't bettering their Falcon design for gamers... this is clearly aimed at AI and compute work. Still, the potential is high.

Anyway, it seems AMD and Nvidia are both on the same line of thinking... in that they need faster and more intelligent on-board logic to efficiently handle internal multi-threading. I think from a technical standpoint Nvidia may be on the right track including a RISC-V logic processor (and I find it a bit subversive, perhaps, that the first major open-source CPU design, RISC-V, may find its way into average users' machines through Nvidia, one of the least open-friendly companies around). Having said that... NV will likely use the chip to power its internal logic but do as it has with Falcon and obscure its programmable operation from everyone but their driver team.

PS: Western Digital is also talking about building RISC-V chips into every drive they manufacture in the next few years, to power on-board storage logic using AI. GPUs, storage... it's all going to get a lot smarter in the next few years. CPU and GPU cores may not get much faster over the next several years, but the engines feeding them data may... which is hopefully just as good for performance.
 
That is what mGPU was developed for, and as we've seen, it wasn't the be-all, end-all solution.
Yes, but mGPU implies a whole complete GPU core connecting to another complete GPU core. I'm thinking, perhaps moronically... could you have pieces of silicon composed of just those stream processors or whatever they are called (the ones that number in the thousands), then another chunk of silicon with the processor that glues them together, and so on? The advantage, I'm thinking, is that you put the chip together from mini chunks of silicon: you can cool it better, probably get better yields instead of giant chips... plus, you can create different model cards by adding or removing chunks of silicon, as opposed to disabling parts of the same giant chip. The problem, of course, is interconnections.
 
It wouldn't surprise me if there were others, but I'm not aware of them myself. Who else provides an SSG-like solution, and what is it called, if I may ask?

There is nothing like SSG from anyone else. Nvidia is not currently capable of creating an SSG-like version of any of their cards.

What makes SSG possible is Vega's HBCC (high bandwidth cache controller). It allows remote memory to behave like a last-level cache. The Vega HBCC can address up to 512 terabytes of memory. The HBCC also allows the Vega-based non-SSG parts to address system RAM as video RAM. This isn't as big a deal, as the SSG tends to get used for things like 8K video and loading massive data sets... if you can afford to put enough RAM in a workstation for that to matter, you may as well just buy an SSG instead of the 9100.
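To picture what "remote memory behaving like a last-level cache" means, here is a rough sketch of the concept: local HBM acting as a small page cache in front of a much larger backing pool (system RAM, or the NVMe on the SSG). The page size, capacities, and LRU eviction below are my own assumptions for illustration, not AMD's actual implementation.

```python
from collections import OrderedDict

PAGE = 64 * 1024        # assumed page granularity
LOCAL_PAGES = 4         # pretend HBM only holds 4 pages, to force evictions

class HBCCSketch:
    def __init__(self):
        self.local = OrderedDict()   # page -> data, kept in LRU order (the "HBM")
        self.remote = {}             # the big backing pool (system RAM / NVMe)

    def read(self, addr):
        page = addr // PAGE
        if page in self.local:                   # hit in local HBM
            self.local.move_to_end(page)
        else:                                    # miss: page it in from the pool
            if len(self.local) >= LOCAL_PAGES:
                victim, data = self.local.popitem(last=False)
                self.remote[victim] = data       # write back the LRU page
            self.local[page] = self.remote.get(page, bytearray(PAGE))
        return self.local[page][addr % PAGE]

hbcc = HBCCSketch()
for addr in (0, PAGE, 2 * PAGE, 3 * PAGE, 4 * PAGE, 0):
    hbcc.read(addr)      # the final read of address 0 misses again after eviction
```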
 
I guess the crux of my confusion is that... say a Titan Xp is what, 3,840 Pascal cores. A 1050 is 640 Pascal cores.

Now, I will admit that a Titan Xp is a bit different than just gluing 6 1050 dies together. But it doesn't seem like it's that much more. If they can make 3,840 cores schedule just fine, it doesn't sound to me, as a totally ignorant layman, that getting 6x640 would be a whole lot different.

Maybe it's like saying a Lambo will go 230 MPH, and a Focus will go 77 MPH downhill with the wind, so if I buy 3 Ford Focii I should be able to go 230 MPH as well... I don't know. Just seems like there are already a crazy number of SIMD cores all scheduling and talking now. I guess if you go outside the die it gets much more complicated.

Each of those cores is not like a CPU core. A single CPU core can do a lot, whereas it would be better to think of each of those GPU cores as a single graphics pipeline which only works on one pixel at a time. For a better description of the difference, look up "The Graphics Pipeline". There are a number of sites that go into detail about this model, and they should answer your questions pretty well.
 
There are ways AMD could get creative, depending on how standardized the pipeline is with new APIs.


At the very least, I always thought they could split things into 2 GPUs: a "render" GPU and a "display" GPU.

So the render GPU does all the 3D work, then writes the raw output frames to the display GPU's memory. The display GPU then applies post-processing shaders (which don't necessarily need much data from the render GPU's memory, other than the raw image and depth buffer), while also holding the silicon for encoding/decoding video (meaning this GPU would handle game recording, streaming, watching a video, browser acceleration, and other background GPU tasks).

AMD could even make a small, standardized display GPU to use across their whole lineup, so they don't have to tape out 2 chips for every single card.
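If it helps, here's a minimal sketch of that split as a two-stage pipeline: the render GPU produces raw frames and pushes them over a link, and the display GPU runs post-processing before scanout. The queue depth, frame contents, and stage names are invented purely to show where the handoff would sit.

```python
import queue, threading

link = queue.Queue(maxsize=2)       # stand-in for the die-to-die link

def render_gpu(num_frames):
    for n in range(num_frames):
        frame = {"id": n, "color": f"raw_{n}", "depth": f"z_{n}"}
        link.put(frame)             # blocks if the display GPU falls behind
    link.put(None)                  # sentinel: no more frames

def display_gpu():
    while (frame := link.get()) is not None:
        frame["color"] = f"tonemapped_{frame['id']}"    # post-processing pass
        print("scanout", frame["id"])

producer = threading.Thread(target=render_gpu, args=(5,))
producer.start()
display_gpu()
producer.join()
```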
 
The issue you have is effects that are temporal in nature.

For the original scan-line interleave, times were simpler; now you have effects that rely on data from neighbouring pixels, which makes this impractical.

After this you have effects that rely on data from the previous frame (the temporal stuff as above), which basically breaks SLI/CrossFire.
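A back-of-the-envelope sketch of that problem: with something like temporal AA, frame N blends against frame N-1's output, so when two GPUs alternate frames the history buffer always lives on the other GPU and has to be copied over first. The millisecond figures below are invented just to show how the copies eat into the scaling.

```python
COPY_MS = 3.0      # assumed cost to move a history buffer between GPUs
RENDER_MS = 8.0    # assumed cost to render one frame

def frame_cost(frame_index, num_gpus):
    # With 1 GPU the previous frame's output is already local; with 2 GPUs
    # it was rendered by the other GPU, so it must be copied across first.
    needs_copy = num_gpus > 1 and frame_index > 0
    return RENDER_MS + (COPY_MS if needs_copy else 0.0)

for gpus in (1, 2):
    total = sum(frame_cost(i, gpus) for i in range(60))
    # Two GPUs overlap frames, so wall-clock time is roughly total / gpus.
    print(f"{gpus} GPU(s): ~{total / gpus:.0f} ms for 60 frames")
# The 2-GPU case lands well short of half the 1-GPU time: ~1.5x, not 2x.
```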
 
There are ways AMD could get creative, depending on how standardized the pipeline is with new APIs.


At the very least, I always thought they could split things into 2 GPUs: a "render" GPU and a "display" GPU.

So the render GPU does all the 3D work, then writes the raw output frames to the display GPU's memory. The display GPU then applies post-processing shaders (which don't necessarily need much data from the render GPU's memory, other than the raw image and depth buffer), while also holding the silicon for encoding/decoding video (meaning this GPU would handle game recording, streaming, watching a video, browser acceleration, and other background GPU tasks).

AMD could even make a small, standardized display GPU to use across their whole lineup, so they don't have to tape out 2 chips for every single card.

Congratulations; you just cut output FPS in half.

In your design, you lose a ton of performance at the interconnect. One GPU is going to be waiting for the other to send/receive data. And the one that is waiting will be sitting there doing next to nothing while the other card is overburdened with work.

Never mind that both GPUs will need large amounts of on-board RAM, which is the most expensive part of a GPU, making your design uneconomical to implement.
 
Sooo is multi-GPU really in its death throes then? If a GPU manufacturer honcho comes out with these things, it's not a good sign.
 
I know nothing about chip design, but would it be possible to create a scheduler "core" GPU, which through Infinity Fabric connections could have multiple "3D rendering" GPUs doing the heavy lifting? Or would the latency/bandwidth be too small to make it feasible, or is it not even close to what is possible? :)
 
Um, those aren't actual cores. Those are marketing cores.

Depending on how you really want to count, GP102 has either 6 or 60 "cores", with 6 being closer to what we consider CPU cores and 60 being closer to what are normally considered CPU execution units.

When AMD/Nvidia count cores, they are really counting SP FP ALU lanes. By that counting, each current gen Xeon core has ~40-50 "cores".

Each of those cores is not like a CPU core. A single CPU core can do a lot, whereas it would be better to think of each of those GPU cores as a single graphics pipeline which only works on one pixel at a time. For a better description of the difference, look up "The Graphics Pipeline". There are a number of sites that go into detail about this model, and they should answer your questions pretty well.

All well and good - appreciate the info.

Does not affect my point in the slightest.
 
I know nothing about chip design, but would it be possible to create a scheduler "core" GPU, which through Infinity Fabric connections could have multiple "3D rendering" GPUs doing the heavy lifting? Or would the latency/bandwidth be too small to make it feasible, or is it not even close to what is possible? :)

Possible? Sure. A good idea? Nope. If one is going to do a segmented die/MCM design, you are better off just doing the scheduling local to each GPU die. AKA, you feed all the chiplets the full command stream and they filter it to the region they own. Filtering is super cheap, and GPUs generally have that functionality built in. The main bottleneck is access to the memory of the other GPUs for things like generated texture/data maps as part of the scene. With reasonable cross-chiplet bandwidth it all works well; without it, it all falls down. If you were going to do a chiplet design, you would likely want in the range of 100 GB/s between each chiplet, with on the order of 200-400 GB/s of local memory bandwidth per chiplet.
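For what it's worth, here's a toy version of that broadcast-and-filter idea: every chiplet sees every draw, but only keeps the ones whose bounding box overlaps its own screen tile. The tile layout and draw bounding boxes are made up, and real hardware does this with scissor/viewport tests rather than anything like Python, but the cheapness of the filter is the point.

```python
CHIPLET_TILES = {                 # (x0, y0, x1, y1) screen region each chiplet owns
    0: (0,    0, 1920, 1080),
    1: (1920, 0, 3840, 1080),
}

draws = [                          # hypothetical command stream, broadcast to all
    {"name": "terrain", "bbox": (0,   0, 3840, 1080)},   # spans both tiles
    {"name": "hud",     "bbox": (100, 50, 600,   300)},  # left tile only
]

def overlaps(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

for chiplet, tile in CHIPLET_TILES.items():
    local_work = [d["name"] for d in draws if overlaps(d["bbox"], tile)]
    print(f"chiplet {chiplet} rasterizes: {local_work}")
# chiplet 0 rasterizes: ['terrain', 'hud']
# chiplet 1 rasterizes: ['terrain']
```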
 
I never understood why they can't do this without requiring developers to code for it (i.e. both GPUs appearing as a single GPU to the system).

Or wait... Didn't they do this already? I don't recall SLI (the original - scan line interleave) requiring extra coding. Or did it?

Why can't they do "frame interleave" (each GPU rendering every other frame)?

If they had to go and create a custom chip that makes the little chips totally transparent to games/users, it defeats the money-saving purpose of using several smaller chips.
 
All well and good - appreciate the info.

Does not affect my point in the slightest.
OK, to explain it a smidge further: the input to the graphics pipeline is a scene (1 frame) containing some lighting sources, a camera position and field of view, then a bunch of objects to render and some textures for good measure. That raw 3D data is sent to the GPU; the GPU then takes that data and puts each of those cores to work determining each pixel's colour, starting in the top left and working to the bottom right, then spits out that one rendered 2D frame. Then comes the next scene (1 frame) and the process repeats; each of those GPU cores essentially processes 1 pixel of that scene and nothing more. If they split it as described, they would have the ability to render 2 frames at once, but what if that second frame renders faster than the first? It sits there waiting until the first frame renders, then it would just spit out the 2 frames in succession; the result is some very choppy and jarring visual effects, as your frame rate, while technically constant, feels jittery and laggy.

So regardless of all those thousands of cores, the GPU is never working on more than 1 frame at a time and always in sequential order, as anything else requires a lot of developer back-end work on task scheduling and resource allocation, and will almost always be a buggier product due to the exponential increase in complexity. However, being able to render multiple scenes at once for animation or drafting work is a huge plus, because it doesn't matter what order they are done in as long as they are all completed in the end; this is why this setup was traditionally done on high-end cards like the Teslas and the FirePros.
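The pacing problem in that first paragraph is easy to see with some invented numbers: if pairs of frames render in parallel and one of each pair finishes early, the presents bunch up, so the average frame rate looks fine while the frame-to-frame intervals, which are what you actually feel, turn spiky.

```python
render_ms = [20, 8, 20, 8, 20, 8]      # two GPUs, alternating slow/fast frames

presents, done = [], 0.0
for i, t in enumerate(render_ms):
    start = (i // 2) * 20.0            # each pair of frames starts together
    done = max(done, start + t)        # frames must still be presented in order
    presents.append(done)

intervals = [b - a for a, b in zip(presents, presents[1:])]
print(intervals)    # [0.0, 20.0, 0.0, 20.0, 0.0] -- jittery despite a "constant" average
```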
 
PS: Western Digital is also talking about building RISC-V chips into every drive they manufacture in the next few years, to power on-board storage logic using AI. GPUs, storage... it's all going to get a lot smarter in the next few years. CPU and GPU cores may not get much faster over the next several years, but the engines feeding them data may... which is hopefully just as good for performance.

Yeah, a lot of this stuff is positive, but it does make you wonder: with all of the various components now having their own little CPUs doing their own little things, what else might they be doing on my system? First we had Intel's Management Engine. Now hard drive controller boards and GPUs. How many computers do I need inside my computer, and what are the risks if I don't fully understand what all of them do?

I think the reason why the hard drives are using more processing power is because of the trend of real-time hardware encryption (primarily in order to speed up secure erase), but who knows what else they could be doing?

And are these devices with firmware that rarely gets updated going to be vulnerable to attack? Could I get an infected hard drive or GPU with a controller that is mining my data?
 
Well, that is why I'm thinking architecture, and thinking multiple chips: maybe GPU chips are getting so large that it might be wise to break them up, something like that.
You are correct that manufacturers are looking for ways to break up big monolithic chips, but at the same time having completely separate chips communicate through PCB traces is slow and time-consuming (PCB traces are enormous, which both limits the number of actual traces you can put in and introduces a lot of capacitance, which means increased latency/power). This is where AMD's research into silicon interposers and Intel's research into EMIB comes in. They are still separate chips, though, so there is still a latency spike, so trying to synchronize a single task across multiple chips may be tricky.
 
AdoredTV is going to be pissed. He'll have to do a new video to explain this huge roadmap change.
 
Yeah, a lot of this stuff is positive, but it does make you wonder: with all of the various components now having their own little CPUs doing their own little things, what else might they be doing on my system? First we had Intel's Management Engine. Now hard drive controller boards and GPUs. How many computers do I need inside my computer, and what are the risks if I don't fully understand what all of them do?

I think the reason why the hard drives are using more processing power is because of the trend of real-time hardware encryption (primarily in order to speed up secure erase), but who knows what else they could be doing?

And are these devices with firmware that rarely gets updated going to be vulnerable to attack? Could I get an infected hard drive or GPU with a controller that is mining my data?

Legit concerns for sure. HDDs have microcontrollers now, of course... but yeah, making them smarter could no doubt open up new avenues of attack. In the case of GPUs, the Nvidia Falcon does, I believe, run DRM-type stuff. It will be interesting to see how well it all goes... it seems the way things are heading is more fully featured microprocessors. GPUs and HDDs have them now, they are just a lot less intelligent... there isn't much to attack right now by exploiting an HDD microcontroller, as it doesn't store a ton of data. But to enable AI, it's going to have to. Hopefully such things are locked down really well.

It's been known for a while that some very good (state-level) hackers are capable of hacking some HDD firmware.
https://www.wired.com/2015/02/nsa-firmware-hacking/
Currently what that really means is simply writing encrypted data elsewhere on the physical disk... stealing, say, a laptop that has been hacked in this way would allow you to read the encrypted data your fake firmware copied, unencrypted, somewhere else on that disk. It also allows your hack to persist even if someone completely reformats and fresh installs. It's nothing average folks like us need to worry about... right now. However, I could envision a possible HDD product with firmware that communicates more directly with the OS, making it possible that multiple hacks of the drive and OS could allow those types of high-end hackers to leech the data remotely.

So it's an issue... but I guess I'm saying it sounds like it would still take multiple hacks to remotely compromise.
 
But aren't consoles built this way? And aren't all games these days made for consoles as a priority anyway? And then, when greed kicks in even more, hastily ported in the most terrible fashion to PC?
 
But aren't consoles built this way? And aren't all games these days made for consoles as a priority anyway? And then, when greed kicks in even more, hastily ported in the most terrible fashion to PC?
All modern consoles have monolithic GPUs (like, in the last 18 years). The GPU resources are not spread across multiple dies. I'm not familiar with older consoles.
 
All modern consoles have monolithic GPUs

Hell, the last few generations have had monolithic APUs.

And aren't all games these days made for consoles as a priority anyway? And then, when greed kicks in even more, hastily ported in the most terrible fashion to PC?

This isn't as cut and dried as it used to be, especially now that there are different performance targets for the consoles as well. Many/most games that are cross-platform from inception run well on each platform that they are released for.
 
Sooo is multi-GPU really in its death throes then? If a GPU manufacturer honcho comes out with these things, it's not a good sign.
Current solutions might not be as elegant for developers, but once they hit barriers and/or break them, things might switch around.
Some of the pain of mGPU is gone when you have a game engine which natively supports it. It has worked on titles such as Ashes of the Singularity (Nitrous engine).

The same way people treated 8-core CPU optimizations, and now it is part of the ecosystem on the PS4 and Xbox One.
Developers are notoriously slow to adapt in the PC market.
 
Trying to google-fu to see if there are other leaks with Navi, but nothing... AMD is running a tight ship there. Anybody got something else besides 'multi-GPU is extra, extra dead'?
 
Current solutions might not be as elegant for developers, but once they hit barriers and/or break them, things might switch around.
Some of the pain of mGPU is gone when you have a game engine which natively supports it. It has worked on titles such as Ashes of the Singularity (Nitrous engine).

The same way people treated 8-core CPU optimizations, and now it is part of the ecosystem on the PS4 and Xbox One.
Developers are notoriously slow to adapt in the PC market.
I agree, it most likely will happen in some years; it's just that single GPUs have been keeping up just fine.
 
No software?

What about all those hollow promises DX12 brought?

Mixing GPUs, VRAM stacking, and the like?
 
Sooo is multi-GPU really in its death throes then? If a GPU manufacturer honcho comes out with these things, it's not a good sign.
MCM'd Navi wasn't traditional multi-GPU. It was something else. It's not going to fly, at least with the current version of IF, for different reasons than why CrossFire/SLI are dying.

What about all those hollow promises DX12 brought?
DX12 allows all of that; the developers just aren't using it.
 
No software?

What about all those hollow promises DX12 brought?

Mixing GPUs, VRAM stacking, and the like?

I'm not too sure how you understood DX12 would function. With certain engines you can do some of those things.
DX12 is a really big upgrade because it allows developers to go balls to the wall in the key areas they want to; that not many do is not something you can attribute to DX12.
 
I'm not too sure how you understood DX12 would function. With certain engines you can do some of those things.
DX12 is a really big upgrade because it allows developers to go balls to the wall in the key areas they want to; that not many do is not something you can attribute to DX12.
Have there been any games that support any of DX12's features?

Ashes is all I can think of.

This is from Tom's Hardware,

and that is only async compute.

So far, mixed GPU is a non-starter and VRAM stacking is nowhere in sight.
 
Have there been any games that support any of DX12's features?
Ashes is all I can think of.
This is from Tom's Hardware,
and that is only async compute.
So far, mixed GPU is a non-starter and VRAM stacking is nowhere in sight.

It is not something magical; it has to be able to give performance or a better way of doing a graphics/gaming engine than what was previously possible. Having a Swiss Army knife doesn't mean you have to use all of the features to get the job done.
https://www.anandtech.com/show/9740/directx-12-geforce-plus-radeon-mgpu-preview/4

Some of the feature set can be used easily but that does not mean that the exotic ones define what DX12 is about.
 
I'm not too sure how you understood DX12 would function. With certain engines you can do some of those things.
DX12 is a really big upgrade because it allows developers to go balls to the wall in the key areas they want to; that not many do is not something you can attribute to DX12.

Going "balls to the wall" costs the developer more money...hence why eg. Multi-GPU is dying andDX12 is aiding in it's demise. (the devs have to implement and code for MGPU now, unlike before when the drivers did most of thework.
But I guess all gaming companies now ar "evil, anti-consumer and anti-tech"...right? ^^
 
Going "balls to the wall" costs the developer more money...hence why eg. Multi-GPU is dying andDX12 is aiding in it's demise. (the devs have to implement and code for MGPU now, unlike before when the drivers did most of thework.
But I guess all gaming companies now ar "evil, anti-consumer and anti-tech"...right? ^^
Not really; your opinion is your own. It has nothing to do with the fact that you only have to do it once in game engines. For people running a serious effort like DICE and Oxide it is trivial. And more companies run Oxide's Nitrous engine.
The way it ran before still needed an engine that was suited for AFR, and Nvidia and/or AMD had to work on each game separately; bugs in code one way or another could mess this up and would require a two-step process to get it fixed if code from AMD or Nvidia was the culprit.
Adoption is always small at the start, and the fact that most games do not need to be that optimized to run on a PC tells you enough about the industry. I'm not advocating that every game needs an engine where they would have to use DX12, but running your game in DX9 to this date is a shame.
To say it is a lot more work is only down to moving away from CPU-bound gaming, and there have not been engines/games that use the CPU so heavily that you would have real trouble if you did not use an 8-core CPU.

Then again, people who pretend that DX12 is already here and need instant gratification, ticking all the boxes of the feature set, are missing some important information.
 
Not really; your opinion is your own. It has nothing to do with the fact that you only have to do it once in game engines. For people running a serious effort like DICE and Oxide it is trivial.

"Trivial" is not the word I would use....but that for for confirming my hunch that you don't code.
 
"Trivial" is not the word I would use....but that for for confirming my hunch that you don't code.
Yes, because with a development cycle of about 5 years for a game, mGPU will take up way more than that. I have confirmation that you lack a fully functional brain...
 
Yes, because with a development cycle of about 5 years for a game, mGPU will take up way more than that. I have confirmation that you lack a fully functional brain...

You're kinda sad... but I don't care.
Watch mGPU get more and more niche and then die off.
Because developers are not interested in an expense they cannot make a profit from, given that tiny, little mGPU segment.

This was predicted by a lot of people (incl. myself) when the DX12 specs were revealed... but the fanboys... oh golly... they thought mGPU heaven was right around the corner... do you really have such a shoddy memory that I need to remind you of this?
 
First, can we cut the personal stuff and discuss the topic?

Watch mGPU get more and more niche and then die off.
Because developers are not interested in an expense they cannot make a profit from, given that tiny, little mGPU segment.

This was predicted by a lot of people (incl. myself) when the DX12 specs were revealed... but the fanboys... oh golly... they thought mGPU heaven was right around the corner

For every revision of DirectX (and for every other API, such as Vulkan) each engine needs to add support for Multi-GPU.

DirectX 12 and Vulkan pushed the coding requirements up to eleven, essentially delaying multi-GPU support; this had a knock-on effect, where new games were shipping without multi-GPU support, which made the commercial case for multi-GPU less tenable. Thus, AMD and Nvidia backed off on promoting it.

Now, support for multi-GPU with DX12 and/or Vulkan is in all of the major engines, meaning that we should see an increase in games released with support. Further, DX12 and Vulkan provide a means to fine-tune multi-GPU at the engine level, which may improve the experience versus what was possible with DX11 and OpenGL.
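To be clear about what "at the engine level" means here: under the explicit model, the engine, not the driver, enumerates the physical GPUs and decides which one records and submits each frame (and when to copy shared resources across). The classes and method names in this sketch are hypothetical stand-ins for illustration, not actual DX12 or Vulkan calls.

```python
class Adapter:
    """Stand-in for a physical GPU that an explicit API exposes to the engine."""
    def __init__(self, name):
        self.name = name
    def record_and_submit(self, frame):
        print(f"{self.name} renders frame {frame}")

def enumerate_adapters():
    # Under explicit multi-adapter, the engine sees every GPU individually.
    return [Adapter("gpu0"), Adapter("gpu1")]

def run_afr(frames):
    adapters = enumerate_adapters()
    for frame in range(frames):
        gpu = adapters[frame % len(adapters)]   # engine-chosen AFR policy
        gpu.record_and_submit(frame)            # plus explicit cross-GPU copies
                                                # for any shared resources

run_afr(4)
```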
 