Raja explains in interview Vega High Bandwidth Cache gaming devwork etc

CSI_PC

2[H]4U
Joined
Apr 3, 2016
Messages
2,193
I think this is a very important video to watch, as it explains what is required to make good use of the new High Bandwidth Cache, along with the various 'memory' solutions including the SSD.
Worth noting: in the gaming arena a fair amount of work will be needed in future, and this solution is designed to push developers to think harder about how they allocate the various memory for virtual textures/megatextures/the memory pool.
It starts off explaining how games that allocate, say, 6GB of VRAM actually utilise only about 50% of what they load. This solution, with its fine granularity, will only load what is used (so that example drops to 3GB). Its primary benefit is that this fine granularity will help with efficiency that developers may have difficulty achieving for now (but they ignore modern game development using DX12/Vulkan and a modern OS).
Longer term, developers will need to change their approach, which will enable much better use of the High Bandwidth Cache, if they cannot find alternatives for doing this within the OS/API.
For gaming I tend to think they will try the latter, but this will at the very least focus the gaming industry better, sort of like Mantle did.
Anyway definitely worth watching.
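
To put rough numbers on the 6GB example from the interview, here is a minimal Python sketch. The 4 KiB page size and the flat 50% "touched" fraction are my own illustrative assumptions, and the function name is made up; nothing here reflects how the HBCC actually tracks pages.

```python
import math

GIB = 1024 ** 3


def resident_bytes(allocated_bytes: int, touched_fraction: float,
                   page_bytes: int = 4096) -> int:
    """Bytes kept resident if only the touched pages are loaded.

    page_bytes is an assumed granularity, not a documented Vega value.
    """
    total_pages = math.ceil(allocated_bytes / page_bytes)
    touched_pages = math.ceil(total_pages * touched_fraction)
    return touched_pages * page_bytes


allocated = 6 * GIB                       # what the game asks for
resident = resident_bytes(allocated, 0.5) # what it actually touches
print(f"allocated: {allocated / GIB:.1f} GiB, resident: {resident / GIB:.1f} GiB")
```

With those assumptions the resident set comes out at half the allocation, which is the 6GB to 3GB case described in the video.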



Cheers
 
Already in the defence over 8GB :/

[ 10:00 ] Why does Raja think there’s not much of a difference between ultra and normal settings?
[ games are developed for default settings, changing settings to ultra only makes artwork up-scaled ]

[ 12:00 ] What will it take to make developers develop textures for ultra settings?
[ it’s very hard to notice the difference in textures at higher resolutions, says Christmas 2017 will be an interesting time, Raja’s goal for 2017 is to bring 4k fluidity to games ]

Also an interesting thing about FreeSync 2.

[ 32:35 ] If a game supports HDR but lacks a FreeSync2 implementation, will we still see any benefits from FreeSync2?
[ Yes, we will see benefits in some cases ]

[ 36:00 ] FreeSync2 monitors will be more expensive, so probably not as popular as FreeSync. What kind of share does AMD expect to gain with FreeSync2 over FreeSync?
[ There will be a few key monitor announcements in Q1 2017, but it should help panel manufacturers by giving them some marketing value ]
 
Huh!? Nature of texture mapping suggests otherwise.

Yep, it's very discouraging to hear from him. And the Xbox Scorpio with 12GB will only increase VRAM usage.

Vega is several months out and already being told to lower settings.
 
It's looking like Vega will have 8GB of HBM2 and an additional memory pool of GDDR and/or flash storage. I'm thinking the drivers will probably represent the VRAM capacity as 2x the HBM capacity and use standard memory management techniques to keep the most relevant data closest to the GPU.

Heck, the way things are going we might see GPUs with simple ARM or x86 cores that can load a minimal kernel and participate in compute clusters without needing the 'rest' of a traditional server. They could be a new type of blade, for instance, just using the enclosure for power, network I/O, and cooling.
 
Nothing points to that. And only certain Pro cards will have SSD storage. The roadmap and slides for future products have also leaked, and nothing like that is shown.
 
Nothing? Have you seen the slides on this page?
http://www.anandtech.com/show/11002/the-amd-vega-gpu-architecture-teaser/3

The one I attached is the last slide on that page and is very eye-opening. It tells a lot that they didn't spell out but that is pretty clear to me.
You'll notice one of them has 5 grey boxes that never got labeled on the original but are exactly where GDDR5 was in the second-to-last slide. And they are connected to both the HBCC and the L2, so they almost have to be storage of some sort. There is a separate box for NVRAM (flash). It remains to be seen if only the Pro models will get SSD capability. Perhaps consumer models will get a small stack of NAND (say 16-32GB) and Pro will get the much larger flash drives, probably continuing to employ expandable M.2 slots like the current Radeon Pro SSG.

My second paragraph was purely speculative (2020's kinda stuff)
 

Attachment: Vega%20Final%20Presentation-16_575px.png

Already in the defence over 8GB :/

[ 10:00 ] Why does Raja think there’s not much of a difference between ultra and normal settings?
[ games are developed for default settings, changing settings to ultra only makes artwork up-scaled ]

[ 12:00 ] What will it take to make developers develop textures for ultra settings?
[ it’s very hard to notice the difference in textures at higher resolutions, says Christmas 2017 will be an interesting time, Raja’s goal for 2017 is to bring 4k fluidity to games ]
I'm not sure you understood what he was saying about this. What I got out of it is that games aren't currently developed for 4k, and that will change as GPUs and monitors become more common and affordable; then devs will start actually developing with proper 4k textures etc. He's not defending an 8GB VRAM/cache, whatever he wants to call it now.
 
Nothing? Have you seen the slides on this page?
http://www.anandtech.com/show/11002/the-amd-vega-gpu-architecture-teaser/3

The one I attached is the last slide on that page and is very eye-opening. It tells a lot that they didn't spell out but that is pretty clear to me.
You'll notice one of them has 5 grey boxes that never got labeled on the original but are exactly where GDDR5 was in the second-to-last slide. And they are connected to both the HBCC and the L2, so they almost have to be storage of some sort. There is a separate box for NVRAM (flash). It remains to be seen if only the Pro models will get SSD capability. Perhaps consumer models will get a small stack of NAND (say 16-32GB) and Pro will get the much larger flash drives, probably continuing to employ expandable M.2 slots like the current Radeon Pro SSG.

My second paragraph was purely speculative (2020's kinda stuff)

So you are guessing :)

They got labels. CPU, MM, Display, XDMA and PCIe.
AMD-VEGA-VIDEOCARDZ-35.jpg
 
I'm not sure you understood what he was saying about this. What I got out of it is that games aren't currently developed for 4k, and that will change as GPUs and monitors become more common and affordable; then devs will start actually developing with proper 4k textures etc. He's not defending an 8GB VRAM/cache, whatever he wants to call it now.

I think it was quite clear. Don't use ultra textures, use default.
 
He said most devs design games with default 1080p in mind, and that upscaling to 4k isn't really 4k; it doesn't look as good as it could. When games are actually built in 4k they will look even better. That is what I understood. He didn't say use default settings.

Edit: rewatch the 10-14 min mark and listen closer.
 
So you are guessing :)

They got labels. CPU, MM, Display, XDMA and PCIe.
AMD-VEGA-VIDEOCARDZ-35.jpg
The HB Cache controller (looks like DMA) can access data from all storage devices, up to 512TB, and stores it in the High-Bandwidth Cache, as indicated by the connections. Also note that it sits in parallel with PCIe and the CPU, which to me makes this look almost like an APU block diagram.

The cache controller feeds the High-Bandwidth Cache - but what is the High-Bandwidth Cache? Capacity? Bandwidth?

Typical vagueness AMD spews out, but I guess if it gets folks talking, that can help bring more attention to their launch. It is not like AMD is a totally unsuccessful company. In reality they are very successful at a number of things (just making money has been difficult for them lately).
 
The cache is just the HBM2 VRAM. It's no different than GP100 or Xeon Phi.

Vega suffers the same issue as GP100 tho, and despite the PR, it's not good at handling large data sets.

Vega 20 may improve this with GMI. But like NVLink it's still slower. And by then Knights Landing will have moved to Knights Mill and then again to Knights Hill (those silly names). Not to mention Lake Crest.

Intel owns the large data sets big time.
 
That actually makes sense, just AMD renaming stuff: it is not an 8GB VRAM card but a 1TB card due to the heterogeneous memory pool :LOL:. It does allow more direct paths to the GPU as well, though, so that could be very interesting if ever implemented on the gaming cards.

I do believe an SSD on the card, or other faster memory pools, would be faster than an SSD on the SATA controller sending game data like textures etc. to system memory, then from system memory to CPU, CPU to PCIe and then to the GPU. There is no indication, other than a possibility, that Vega would use something like this for a gaming card; more likely Pro cards and future APUs.
 
It would be obsolete faster than you can blink. NVMe drives keep getting more and more popular, not to mention Optane, PCIe 4.0 etc.

And then there is the issue that you are still fighting main memory + PCIe speeds. Even 2GB/sec is going to be very slow compared to the 16GB/sec you already have. And the main memory can be a cache buffer as well.

And how would you fill it up? If it becomes too big, you end up with silly loading times in games. (Wait 5 minutes while the SSD cache is being prepared.)

Just sell people a bit more main memory, cheaper and better :p

Without stepping on anyone's toes, I think it's safe to say their current SSG cards are a complete joke with their regular M.2 slots. You get a peak of 2GB/sec.
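
A back-of-envelope for the fill-time and bandwidth point, as a minimal Python sketch. The 2GB/sec and 16GB/sec rates are the figures quoted in this post; the 512 GB cache size is purely a hypothetical example, and the calculation ignores all real-world overhead.

```python
def fill_seconds(size_gb: float, rate_gb_per_s: float) -> float:
    """Ideal time to move size_gb at a sustained rate, ignoring overhead."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_s


CACHE_GB = 512.0  # hypothetical on-card SSD cache size, not a real spec

for label, rate in (("on-card M.2 SSD at 2 GB/s", 2.0),
                    ("PCIe 3.0 x16 at 16 GB/s", 16.0)):
    print(f"{label}: {fill_seconds(CACHE_GB, rate):.0f} s to move {CACHE_GB:.0f} GB")
```

For that assumed size, the 2GB/sec path lands in the range of minutes rather than seconds, which is the loading-time worry raised here.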
 
Why do you keep stating 16GB/s? The assets are not automatically cached in system RAM and then instantly streamed to the GPU. There is other overhead, latency, and the number of PCIe lanes available to the graphics card; it could be 8GB/s max for 2 cards in SLI/CFX. Data comes mostly from the system SSDs, going through a number of steps, but now it automatically converts to a 16GB/s speed? That is incorrect. Now, AMD did do this via drivers for a few games on Fiji.

The other aspect is interrupts to the GPU for the memory transfer, during which the GPU will not have access to the memory. This is just not the max theoretical bandwidth of the 16x PCIe 3.0 bus; you rarely get that, as proven by the HardOCP test showing 8x makes virtually no difference, because the holdups are system storage speeds, CPU etc.

Now, NVMe drives etc. can feed AMD's new Vega arch directly - yes, that could get very popular and powerful indeed.

If AMD can reduce the unneeded writes to the VRAM, that will free up the memory for GPU use. If you reduce the constant writing to VRAM by 1/2 you increase the effective bandwidth of the GPU, since VRAM will be tied up less with assets that will never be used and have to be erased.
 
The SSD on the SSG sits on PCIe. And with something like 100us accesses I doubt latency is an issue. Even with U.2 SSDs no added latency as such can be detected. Even with Optane you are at 10us.

8x vs 16x 3.0 makes little difference because it's not utilized, not because it's not capable.

And again, you need to fill the SSD in the first place.
 
What I get from your reply is that you don't think it would help much for games. It has already helped tremendously with larger datasets in GPU compute type workloads.

That doesn't matter too much for most of us here; the bread and butter here is how it performs in games, mostly, and VR type workloads. We just don't have an accurate idea yet of how well Vega will perform or whether the price/performance is good.

I would not underestimate the unique memory architecture that Vega appears to have - it is patent protected - mostly having direct access to different storage type devices right on the GPU. This would most likely be more applicable to HPC type loads. Since an X2 version is coming out in the 2nd half of this year, this may be the first time the two pools of memory can be combined (that would be a very huge step), since it would virtually double the performance of a card transparently for HPC type loads. In other words, in performance per space AMD will be way ahead of Nvidia.
 
Could you show me this? And with full specs of the systems compared.

Accessing another card's memory pool isn't something new. The speed of doing so is the issue. Vega 10 doesn't do anything in that regard. You have to wait for Vega 20 with GMI links for that.

So no, AMD is way behind Nvidia on this, by at least 2 years. The X2 is just another regular dual card.


AMD-VEGA-20-specifications-1000x377.jpg
 
Read here: (Good rundown and tests)
http://www.tomshardware.com/news/amd-radeon-pro-ssg,32365.html

Don't confuse peer-to-peer GPU communication with both GPUs sharing memory as one pool. Peer to peer will be more akin to two or more dies acting as one big die or GPU.
 
Sharing 1 memory pool requires a really fast interconnect. And Vega 10 doesn't have this.

I am sorry, where are the tests in your link?
Did you read it? These cards are scheduled to be released in 2017; there are clear advantages here.

As for the interconnect, that is mitigated, or should be, by the new High Bandwidth Cache Controller.
 
I didn't ask for a PR piece, I asked for tests.

And at this point I think you should prove that the interconnect is there. If not, I think we can conclude it isn't there and won't be before Vega 20.
 
Did you read it? These cards are scheduled to be released 2017, there are clear advantages here.

As for the interconnect that is mitigated or should be by the new High Bandwidth Cache Controller.
Apart from when they try to do the same as the DGX-1 as a full node.
Strengths and weaknesses from both manufacturers, and for different focuses.
I would like to know how the page faulting is going to work with multi-node unified memory design when they use the integral SSD model, along with offload mode for accelerators.
Cheers
 
So you are guessing :)

They got labels. CPU, MM, Display, XDMA and PCIe.

Source for that slide? Hadn't seen it before

Also, I think I read somewhere that while the current SSG uses a PLX for the SSD, the new ones would have a direct interface. Should be good stuff. May or may not help games, but I'm looking forward to finding out more.
 
It's just part of the slide deck:
http://videocardz.com/65406/exclusive-amd-vega-presentation
http://www.anandtech.com/show/11002/the-amd-vega-gpu-architecture-teaser/4

PLX or no PLX isn't so important. Adding 150ns of latency to something that takes 100us isn't making any difference. There is a reason why U.2 has no downsides.
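
The latency point can be checked with simple arithmetic. Below is a minimal Python sketch using the numbers from this thread (150ns added by the switch, roughly 100us for a NAND SSD access, roughly 10us for Optane); the function name is just illustrative and the figures are the posters' ballpark values, not measurements.

```python
def added_latency_percent(base_us: float, extra_ns: float) -> float:
    """Extra latency expressed as a percentage of the base access latency."""
    return (extra_ns / 1000.0) / base_us * 100.0


SWITCH_NS = 150.0  # added latency quoted for the PLX switch

for name, base_us in (("NAND SSD (~100us)", 100.0), ("Optane (~10us)", 10.0)):
    pct = added_latency_percent(base_us, SWITCH_NS)
    print(f"{name}: +{pct:.2f}% per access")
```

Even against the faster 10us figure the switch adds only a percent or so per access, which is why it is unlikely to be the bottleneck.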
 
I didn't ask for a PR piece, I asked for tests.

And at this point I think you should prove that the interconnect is there. If not, I think we can conclude it isn't there and won't be before Vega 20.
Well, you can go buy one then and test away if you are really curious ;). Not sure if $10,000 is still the going price. I would be more interested in folks having them and using any unique capability.

Then again, the Vega version, if ever produced, would be even more enticing.
 
The cache is just the HBM2 VRAM. It's no different than GP100 or Xeon Phi.

Vega suffers the same issue as GP100 tho, and despite the PR, it's not good at handling large data sets.

Vega 20 may improve this with GMI. But like NVLink it's still slower. And by then Knights Landing will have moved to Knights Mill and then again to Knights Hill (those silly names). Not to mention Lake Crest.

Intel owns the large data sets big time.
Source?
 
Vega APUs, in what documents are out there, all show full bandwidth to system memory. So 512GB/s HBM plus ~100GB/s to main memory.
 
What APUs are you talking about? There is currently only 1 APU and it's called Raven Ridge. The HPC lineup from RTG shows no such thing either.

Hopefully you don't refer to some vague slides that haven't had any news in the last 2 years.
 
P for performance? What happened to the "P" standing for "Premium?"

Not to be pessimistic, but I won't believe or get excited about anything that comes out of Raja's, Su's, or Huddy's mouths until benchmarks prove it.

Saying AMD is back without proof is ridiculous. In order for them to be back, their new CPUs need to meet/beat Intel performance-wise and their GPUs need to beat NV. Hopefully Zen/Vega does all this. I for one want a single 4K GPU and hopefully Vega will do it, but I am 100% gun-shy of AMD's statements.

If Vega comes out and just matches NV's year-old GPUs then it is just more of the same AMD. Day late, dollar short.
 
What APUs are you talking about? There is currently only 1 APU and it's called Raven Ridge. The HPC lineup from RTG shows no such thing either.

Hopefully you don't refer to some vague slides that haven't had any news in the last 2 years.
The larger Naples-based one, which is a bit further off. That would be the bigger enterprise one. In theory that was a 16-core Zen plus Vega 10, although that's looking to be a really huge package atm. Even the smaller APUs that lack HBM would have full access to system memory, as that's all they use.
 
That is NOT the "real" Gordon Mah Ung. The "real" Gordon Mah Ung only drinks Romulan ale and doesn't get drunk so badly that he can barely stand and needs to prop himself up on computer cases. Where is the "Apple rage" and "Star Wars" movie references he always throws in regardless of the topic?? Where is the uncontrollable swearing Gordon is known for??

The "real" Gordon Mah Ung doesn't do "fluff" pieces and interviews that were about "vaporware, sometime in the far flung future hardware announcements".. where kind of important stuff like price, availability and clock speeds are not discussed.

Kyle, this isn't your "long time buddy". That guy disappeared when he left MaximumPC and suddenly became, according to PCWorld.. "One of founding fathers of hardcore tech reporting, Gordon has been covering PCs and components since 1998." Geez Kyle.. you have been covering PC and components since 1998.. why aren't you also considered a "founding fathers"?? All tech reporting before 1998 must have been "softcore"!!

That video doesn't contain the "real" Gordon Mah Ung.. he can be found here instead, at least in digital form. Compare the two.. and you tell me that the "real" Gordon Mah Ung isn't dead, just like George Lucas died in the middle of "Return of the Jedi" and was replaced with "fake" George Lucas. How else do you explain the Ewoks??

Below IS a pic of the "real" Gordon Mah Ninja Ung.. that other "Gordon" is a Stormtrooper clone who ended up washing dishes

18mnzhc8vicfwjpg.jpg
 
Huh!? Nature of texture mapping suggests otherwise.

The nature of games seems to suggest otherwise to my eyes.
YES, there are games that do look better; the dude just said that devs are crappy at developing for good quality. Textures on walls and floors improve but models just don't, nor weapons, and so on...
Very often, I am sad to say!

GTX970 owner here, and I can tell ya I tune the shit out of games and some settings make no difference. Crappy Nvidia 3.5GB instead of 4GB.. strange how those 500MB matter versus a similar AMD card with 4GB :)

However, it's worrying that he takes that line too, but it might be to tell people with dual Titans: hey, you're overdoing it, there's no point other than the point of having 'em. I've been there and I'm done with that; I just want good value and good performance. 1080 performance at a good price!

50% optimistic, 50% skeptic :)

Edit: fixed correct quote :)
 
The larger Naples based one a bit off. That would be the bigger enterprise one. In theory that was 16 core Zen plus Vega 10, although that's looking to be a really huge package atm. Even for the smaller APUs that lack HBM, they'd have full access to system memory as that's all they use.
You also need to consider the situation of a minimum of 6-8 dGPUs in a single-node implementation, and also multi-node implementations. Normal practice now seems to be 6-8 dGPUs per 2S processor; as an example, the standard for Volta with IBM seems to be 6 Volta dGPUs per 2 IBM Power 9, with Bluelink giving 300GB/s just for the Volta GPUs-CPUs, and the Summit supercomputer will have a total of 6PB of system memory (512GB per node).
This impacts what you can do with the various technologies, including the criteria for offload mode.
Cheers
 
That is NOT the "real" Gordon Mah Ung. The "real" Gordon Mah Ung only drinks Romulan ale and doesn't get drunk so badly that he can barely stand and needs to prop himself up on computer cases. Where is the "Apple rage" and "Star Wars" movie references he always throws in regardless of the topic?? Where is the uncontrollable swearing Gordon is known for??

The "real" Gordon Mah Ung doesn't do "fluff" pieces and interviews that were about "vaporware, sometime in the far flung future hardware announcements".. where kind of important stuff like price, availability and clock speeds are not discussed.

[snip]

That video doesn't contain the "real" Gordon Mah Ung.. he can be found here instead, at least in digital form. Compare the two.. and you tell me that the "real" Gordon Mah Ung isn't dead, just like George Lucas died in the middle of "Return of the Jedi" and was replaced with "fake" George Lucas. How else do you explain the Ewoks??

Below IS a pic of the "real" Gordon Mah Ninja Ung.. that other "Gordon" is a Stormtrooper clone who ended up washing dishes.

Obviously a masterfully executed and perfectly planted Imperial clone... and my, that is the largest cue-card-reading Ewok beside him that I have ever laid eyes on. The Empire's advances in the growth acceleration serum used within their new captive Ewok breeding program have obviously produced some rather stunning results!

I for one will not fall for this veiled propaganda until Commander Kyle and his rebel forces have either debunked their claims or have demonstrated that this hardware is as powerful as claimed, in which case I may need to look into capturing some...
 
Congrats on your upcoming new video hardware. Will there be regular driver updates now?
 
Haven't been paying attention for a while have you?

Edit: That wasn't the most polite answer I could give. AMD has really stepped up their driver game over the past year(?). As far as I can tell, they release drivers at about the same time and frequency as Nvidia.

Edit2: They also added more features to match up with Nvidia, like getting rid of Raptr, and making their own version of Shadowplay, Nvidia's streaming and capture software feature.
 