Vega: AMD’s New Graphics Architecture for Virtually Unlimited Workloads

HardOCP News · Jan 5, 2017

AMD unveiled preliminary details of its forthcoming GPU architecture, Vega. Conceived and executed over five years, the Vega architecture enables new possibilities in PC gaming, professional design and machine intelligence that traditional GPU architectures have not been able to address effectively. Data-intensive workloads are becoming the new normal, and the parallel nature of the GPU lends itself ideally to tackling them. However, processing these huge new datasets requires fast access to massive amounts of memory. The Vega architecture's revolutionary memory subsystem enables GPUs to address very large datasets spread across a mix of memory types. The high-bandwidth cache controller in Vega-based GPUs can access on-package cache and off-package memories in a flexible, programmable fashion using fine-grained data movement.

rezerekted · Jan 5, 2017

Will this stop the little pauses I still get in some games even on a GTX1070 with 8GB vram? High frame rates are great but not when some games still exhibit object loading pauses.

cyclone3d · Jan 5, 2017

rezerekted said:
Will this stop the little pauses I still get in some games even on a GTX1070 with 8GB vram? High frame rates are great but not when some games still exhibit object loading pauses.

I am guessing that this will definitely help in those situations.

JustReason · Jan 5, 2017

rezerekted said:
Will this stop the little pauses I still get in some games even on a GTX1070 with 8GB vram? High frame rates are great but not when some games still exhibit object loading pauses.

Yes and no. First, games would need to be coded to have textures loaded to the GPU/SSD, or at least to recognize they are there if AMD/Nvidia make it happen in drivers. Second, or rather primarily, this is a professional feature for huge data sets when calculating or performing critical analysis. So it's not likely a general consumer add-on, at least in the near future. But I am sure we all agree with your sentiments.

Peppercorn · Jan 5, 2017

Does AMD's Radeon Pro SSG have any competing products? This is something AMD is waaaay ahead on. The potential is huge; imagine how a Vega would fare in such a configuration. There is a lot of excitement around this GPU. I think the glee and gloating from the anti-AMD crowd that Vega is a bust is going to look funny in the near future.
I'm with Flopper, this is going to be a killer GPU.

cyclone3d · Jan 6, 2017

JustReason said:
Yes and no. First, games would need to be coded to have textures loaded to the GPU/SSD, or at least to recognize they are there if AMD/Nvidia make it happen in drivers. Second, or rather primarily, this is a professional feature for huge data sets when calculating or performing critical analysis. So it's not likely a general consumer add-on, at least in the near future. But I am sure we all agree with your sentiments.

Why would the games need to directly support it?

Regular programs don't have to have any special code to load stuff into the CPU or GPU cache. The CPU or GPU itself just takes care of it. If it asks for data, it looks in the caches first before going to main memory.

The same concept should be easily doable with another layer of cache for the GPU.

It shouldn't even require any special driver code. The GPU should handle it all by itself.
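A rough sketch of the "look in the caches first" idea described above. The tier names, the dictionaries standing in for caches, and the `lookup` helper are all made up for illustration; this is not how any real GPU or driver is implemented, and it ignores eviction entirely.

```python
# Hypothetical cache tiers, fastest first. Plain dicts stand in for hardware caches.
def lookup(address, tiers, backing_store):
    """Return (value, where_found).

    Search each tier in order. On a hit, copy the value into the faster
    tiers that missed. On a full miss, read the backing store and fill
    every tier, the way a transparent cache would.
    """
    for i, (name, cache) in enumerate(tiers):
        if address in cache:
            value = cache[address]
            for _, faster in tiers[:i]:
                faster[address] = value  # promote into the tiers that missed
            return value, name
    value = backing_store[address]
    for _, cache in tiers:
        cache[address] = value
    return value, "backing store"


tiers = [("L1", {}), ("L2", {}), ("HBC", {})]
backing = {0x10: "texture tile"}

first = lookup(0x10, tiers, backing)   # cold: has to go to the backing store
second = lookup(0x10, tiers, backing)  # warm: served from the fastest tier
print(first, second)
```

The caller never says which tier to use, which is the point being argued: the lookup order is handled below the program.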

travisty · Jan 6, 2017

cyclone3d said:
Why would the games need to directly support it?

Regular programs don't have to have any special code to load stuff into the CPU or GPU cache. The CPU or GPU itself just takes care of it. If it asks for data, it looks in the caches first before going to main memory.

The same concept should be easily doable with another layer of cache for the GPU.

It shouldn't even require any special driver code. The GPU should handle it all by itself.

It's true programs don't specify data for the CPU cache. Programs do specifically load data into the cache/memory of GPUs. Every texture, mesh, shader, etc. is loaded explicitly by the program.

Games are constantly loading and dumping GPU data as they run. Open world games will determine what area you're in and load/dump data for surrounding areas dynamically. Another example is Half Life 2. Each time a new area is reached the game pauses, dumps data, and then loads the new data - this was to allow lesser cards to still perform well.

The paradigm switch games would have to make is that the constant loading/dumping of data is no longer as important. That will free up both GPU and CPU cycles as, hopefully, most if not all data can be loaded once and from then on the state doesn't have to be managed.

Older games will not be able to take advantage of this. Hopefully in a few years time, as all GPUs have massive amounts of memory, all games will load it and forget it.
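To make the open-world load/dump pattern above concrete, here is a minimal sketch assuming the world is split into grid cells and the game keeps the cells around the player resident. The grid, the `radius` parameter, and the function names are assumptions for illustration, not taken from any particular engine.

```python
def neighbours(cell, radius=1):
    """All grid cells within `radius` of `cell`, including the cell itself."""
    x, y = cell
    return {
        (x + dx, y + dy)
        for dx in range(-radius, radius + 1)
        for dy in range(-radius, radius + 1)
    }


def stream_step(resident, player_cell, radius=1):
    """Work out what must be loaded and dumped after the player moves.

    Returns (new_resident, to_load, to_dump), all as sets of cells.
    """
    wanted = neighbours(player_cell, radius)
    to_load = wanted - resident   # newly needed areas
    to_dump = resident - wanted   # areas we walked away from
    return wanted, to_load, to_dump


resident = neighbours((0, 0))
resident, to_load, to_dump = stream_step(resident, (1, 0))
print(sorted(to_load), sorted(to_dump))
```

With enough memory to hold every cell, `stream_step` would never need to be called, which is the "load it and forget it" case.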

Edgar · Jan 6, 2017

So will this compete with or beat a GTX 1080?

Shintai · Jan 6, 2017

rezerekted said:
Will this stop the little pauses I still get in some games even on a GTX1070 with 8GB vram? High frame rates are great but not when some games still exhibit object loading pauses.

Not at all. Your limit is still the speed of the PCIe bus in that case.

Nvidia made NVLink for a reason, though it is only an interconnect between GPUs for now. Intel also put 6-channel DDR4 (100GB/sec) on Xeon Phi because of this.

Gigus Fire · Jan 6, 2017

this is all marketing drivel

Nukester · Jan 6, 2017

rezerekted said:
Will this stop the little pauses I still get in some games even on a GTX1070 with 8GB vram? High frame rates are great but not when some games still exhibit object loading pauses.

The question we all want answered. Thanks.

Quartz-1 · Jan 6, 2017

Show me the numbers!

luminousone · Jan 6, 2017

Saw some slides claiming that they would be integrating xGMI links into the chips. These would be the same cache coherency links that are on the Ryzen CPUs. It's very possible that memory will work completely differently on these cards in multi-GPU configs from anything that existed before. Memory in multi-GPU configs might be additive rather than each card holding copies of the other GPU's memory.

Anarchist4000 · Jan 6, 2017

luminousone said:
Saw some slides claiming that they would be integrating xGMI links into the chips. These would be the same cache coherency links that are on the Ryzen CPUs. It's very possible that memory will work completely differently on these cards in multi-GPU configs from anything that existed before. Memory in multi-GPU configs might be additive rather than each card holding copies of the other GPU's memory.

Adding an SSD to one apparently adds storage space. Using a second adapter for additional storage shouldn't be much different. The link wouldn't have to be limited to the PCIe slot bandwidth either if on the same board.

Spidey329 · Jan 6, 2017

Hypergreatthing said:
this is all marketing drivel

What gave it away? The revolutionary part or the fine-grained part? They really should have worked a "disruptive" or two in there.

euskalzabe · Jan 6, 2017

I don't really know what I'm talking about here, but wouldn't it be possible to add a SATA3 (for example) connector to the GPU board, so one could directly connect a SATA3 SSD? Imagine reusing an older 128GB SSD as direct to GPU memory: in SATA3's case up to 600MB/s speed should be enough to keep most textures loaded on SSD and just transfer as needed. Am I just missing something painfully obvious? Speed concerns maybe?

thesmokingman · Jan 6, 2017

Thinking outside the box again...

CSI_PC · Jan 6, 2017

euskalzabe said:
I don't really know what I'm talking about here, but wouldn't it be possible to add a SATA3 (for example) connector to the GPU board, so one could directly connect a SATA3 SSD? Imagine reusing an older 128GB SSD as direct to GPU memory: in SATA3's case up to 600MB/s speed should be enough to keep most textures loaded on SSD and just transfer as needed. Am I just missing something painfully obvious? Speed concerns maybe?

I would say speed concerns, because remember there is also the DDR4 system RAM on modern systems, which can hold a fair chunk of textures when combined with the GPU. It is also worth remembering that modern games are getting better and more efficient at stream-loading textures.
The direct SSD will be much slower than DDR4 RAM, and it also requires special controller mechanisms.
Any system looking to use a GPU with a direct SSD will more than likely have at least 16GB of RAM that performs well and a GPU that is likely to have 12GB of VRAM (context being new enthusiast GPUs).
The SSD solution will fit more with HPC-scientific-data modelling-etc rather than prosumer.
What is the maximum RAM supported on graphic workstations using Xeon CPUs?
It is pretty massive so I am not sure it has much use even in CAD or related 3D modelling.

From a game perspective it would be interesting to compare this approach to the Fiji VRAM-DRAM one, maybe the SSD one is better overall even in games, but then game developers are not even making decent use of 12GB VRAM available on some cards even now.
Cheers

noko · Jan 6, 2017

CSI_PC said:
I would say speed concerns, because remember there is also the DDR4 system RAM on modern systems, which can hold a fair chunk of textures when combined with the GPU. It is also worth remembering that modern games are getting better and more efficient at stream-loading textures.
The direct SSD will be much slower than DDR4 RAM, and it also requires special controller mechanisms.
Any system looking to use a GPU with a direct SSD will more than likely have at least 16GB of RAM that performs well and a GPU that is likely to have 12GB of VRAM (context being new enthusiast GPUs).
The SSD solution will fit more with HPC-scientific-data modelling-etc rather than prosumer.
What is the maximum RAM supported on graphic workstations using Xeon CPUs?
It is pretty massive so I am not sure it has much use even in CAD or related 3D modelling.

From a game perspective it would be interesting to compare this approach to the Fiji VRAM-DRAM one, maybe the SSD one is better overall even in games, but then game developers are not even making decent use of 12GB VRAM available on some cards even now.
Cheers

What games will load up assets in RAM, which would be very variable from machine to machine as well as with the other processes going on? AMD is the only one I know of that will cache in RAM via drivers, and only on selected games for Fiji cards. Streaming straight from onboard storage would cut down a tremendous amount of latency and free up resources, CPU cycles, PCIe bandwidth, etc. Games nowadays do have relatively big data sets, plus you could do some pre-compilation work on the drive itself, like texture compression. I still think having a big onboard storage space for the GPU's data would definitely speed things up for games and take the most time-consuming and bandwidth-heavy operations off the PCIe bus.

Shintai · Jan 7, 2017

noko said:
What games will load up assets in RAM, which would be very variable from machine to machine as well as with the other processes going on? AMD is the only one I know of that will cache in RAM via drivers, and only on selected games for Fiji cards. Streaming straight from onboard storage would cut down a tremendous amount of latency and free up resources, CPU cycles, PCIe bandwidth, etc. Games nowadays do have relatively big data sets, plus you could do some pre-compilation work on the drive itself, like texture compression. I still think having a big onboard storage space for the GPU's data would definitely speed things up for games and take the most time-consuming and bandwidth-heavy operations off the PCIe bus.

SSD vs memory is way too slow. Also, the textures are not lying around in ready-to-use formats either. So what would the implementation be, a several-minute startup for a game while 20-30GB of assets are being moved?

And even then, 2GB/sec vs 16GB/sec?

The SSD on the pro cards is a hotfix solution for the main issue: they don't have enough VRAM. You also notice how well Xeon Phi does in sales due to the ability to work with large data sets, datasets P100 can't even get close to. 100GB/sec is a lot more than 2GB/sec or 16GB/sec.

Vega will change nothing in this area, neither will Volta for that matter.
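For a sense of scale, the bandwidth figures quoted in this thread (600MB/s for SATA3, 2GB/sec, 16GB/sec, 100GB/sec) can be turned into raw transfer times. This is plain size-over-rate arithmetic; the 30GB asset size is the upper end of the range mentioned above, and the sketch deliberately ignores latency, decompression, and any format conversion, so real load times would be longer.

```python
def transfer_seconds(size_gb, rate_gb_per_s):
    """Ideal time to move `size_gb` gigabytes at a sustained rate."""
    return size_gb / rate_gb_per_s


# Rates as quoted in the thread, labelled only by their speed.
rates = {
    "0.6 GB/s": 0.6,
    "2 GB/s": 2.0,
    "16 GB/s": 16.0,
    "100 GB/s": 100.0,
}

ASSET_SIZE_GB = 30  # assumed: upper end of the 20-30GB mentioned above

times = {label: transfer_seconds(ASSET_SIZE_GB, rate) for label, rate in rates.items()}
for label, seconds in times.items():
    print(f"{label:>9}: {seconds:6.2f} s")
```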

luminousone · Jan 7, 2017

Anarchist4000 said:
Adding an SSD to one apparently adds storage space. Using a second adapter for additional storage shouldn't be much different. The link wouldn't have to be limited to the PCIe slot bandwidth either if on the same board.

You misunderstand, this isn't about SSDs. GMI on the CPU is for cache coherency, and xGMI on the GPU may be in effect the same thing. This is about VRAM, and the ability to access another GPU's memory as though it were part of the same memory pool as the memory directly connected to it. Think of this more like a dual-CPU system, memory-wise: two Xeon/Opteron CPUs that each have 16GB of RAM create a total system memory of 32GB. In existing multi-GPU configs, by contrast, RAM is effectively the size of the smallest card's, because all textures, geometry, and so on must be copied to all GPUs, as each GPU cannot see, or effectively access/share, the memory of its neighbors.
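The difference between the two models above comes down to `min` versus `sum`. A tiny sketch, with function names invented for illustration and capacities given in GB:

```python
def mirrored_capacity(vram_gb):
    """Every GPU holds a full copy, so the smallest card sets the limit."""
    return min(vram_gb)


def pooled_capacity(vram_gb):
    """Each GPU can address its neighbours' memory, so capacities add up."""
    return sum(vram_gb)


cards = [16, 16]
print(mirrored_capacity(cards), pooled_capacity(cards))
```

This only counts capacity. It says nothing about how fast remote memory would be compared with local memory.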

Shintai · Jan 7, 2017

luminousone said:
You misunderstand, this isn't about SSDs. GMI on the CPU is for cache coherency, and xGMI on the GPU may be in effect the same thing. This is about VRAM, and the ability to access another GPU's memory as though it were part of the same memory pool as the memory directly connected to it. Think of this more like a dual-CPU system, memory-wise: two Xeon/Opteron CPUs that each have 16GB of RAM create a total system memory of 32GB. In existing multi-GPU configs, by contrast, RAM is effectively the size of the smallest card's, because all textures, geometry, and so on must be copied to all GPUs, as each GPU cannot see, or effectively access/share, the memory of its neighbors.

You could simplify and say this is the entire purpose of NVlink as well.

CSI_PC · Jan 7, 2017

noko said:
What games will load up assets in RAM, which would be very variable from machine to machine as well as with the other processes going on? AMD is the only one I know of that will cache in RAM via drivers, and only on selected games for Fiji cards. Streaming straight from onboard storage would cut down a tremendous amount of latency and free up resources, CPU cycles, PCIe bandwidth, etc. Games nowadays do have relatively big data sets, plus you could do some pre-compilation work on the drive itself, like texture compression. I still think having a big onboard storage space for the GPU's data would definitely speed things up for games and take the most time-consuming and bandwidth-heavy operations off the PCIe bus.

Yeah I agree it depends and comes down to developers/engine and only recently have games (rarely) started to become more efficient with their virtual textures/'megatextures'/memory pool.
Anyway, I see this solution more for prosumer-workstation/HPC than enthusiast, and it is still a little while away even for that. From a gaming perspective, its purpose is to provide fine granularity in controlling what frame buffer/VRAM/textures are loaded. Even Raja mentions that most games only use 50% of the VRAM efficiently (so if an engine allocated and loaded 6GB, it may only actually be using 3GB), and their solution would manage it better by only loading into VRAM what is needed. Longer term, this is meant to nudge developers to consider expanding the way they use VRAM, at which point they would really see the benefit of such a solution (or it will make developers consider using VRAM, DRAM and local storage better, but that depends on how they can do this with the API/OS they work with).
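The "allocated versus actually used" gap can be illustrated by counting which pages of a resource are ever touched. This is only a sketch of the idea: the 4096-byte page size, the `(resource, offset)` access log, and the function name are assumptions, not details AMD has given.

```python
PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


def touched_bytes(accesses, page_size=PAGE_SIZE):
    """Bytes that would need to be resident if only touched pages were loaded.

    `accesses` is an iterable of (resource_name, byte_offset) pairs.
    """
    pages = {(resource, offset // page_size) for resource, offset in accesses}
    return len(pages) * page_size


allocated = 4 * PAGE_SIZE  # the engine allocated four pages for this texture
log = [("tex", 0), ("tex", 100), ("tex", 5000)]  # but only touched two of them
used = touched_bytes(log)
print(used, allocated, used / allocated)
```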

Cheers

CSI_PC · Jan 7, 2017

Shintai said:
You could simplify and say this is the entire purpose of NVlink as well.

Yeah, along with the massive 49-bit coherent unified memory addressing Pascal has for HPC/modelling implementations, and no need now for offload mode (a really important feature in the HPC space for some).
Cheers

Anarchist4000 · Jan 7, 2017

luminousone said:
You misunderstand, this isn't about SSDs. GMI on the CPU is for cache coherency, and xGMI on the GPU may be in effect the same thing. This is about VRAM, and the ability to access another GPU's memory as though it were part of the same memory pool as the memory directly connected to it. Think of this more like a dual-CPU system, memory-wise: two Xeon/Opteron CPUs that each have 16GB of RAM create a total system memory of 32GB. In existing multi-GPU configs, by contrast, RAM is effectively the size of the smallest card's, because all textures, geometry, and so on must be copied to all GPUs, as each GPU cannot see, or effectively access/share, the memory of its neighbors.

I understand what you're saying, but the cache controller they implemented is already capable of venturing into other memory pools for the purpose of sharing data. Whether a resource resides on an SSD, host memory, or another adapter is largely transparent to the application. In the case of the SSD, the way it's set up, the GPU sees it as an extension of its memory pool. Granted, the performance is different from that of RAM, but the implementation is similar.

N4CR · Jan 7, 2017

Edgar said:
So will this compete with or beat a GTX 1080?

From the only two benchmarks which are confirmed, AOTS and Doom, yes, it will beat a stock 1080 and in most cases even an OC'd one. It seems it will mostly sit between Titan and 1080, maybe under on certain titles and scenarios.

This SSD tech is dismissed by some and praised by others. A few reasons make it a good step forward: detailed texture environments, and certain applications within games could enable things we can barely imagine. We could have creations with millions of changing or unique textures, shared with a global pool or part of some online system. We can exchange data to and from the GPU with ease, leaving the PCIe bus for graphics communication and keeping the textures and other objects off it as much as possible. Maybe a game like No Man's Sky, but far more detailed, could be built faster using GPUs and SSD tech to store information for further processing or later reference to reduce load time.

By shifting data off the bus, we reduce latency and load. I think people underestimate the usefulness this could have in mGPU systems if integrated with further interconnecting technologies.

For now it seems dGPU Vega won't have that on board SSG tech.

lolfail9001 · Jan 7, 2017

N4CR said:
detailed texture environments

The last thing SSD tech enables is memory-speed-bound activities.

luminousone · Jan 7, 2017

lolfail9001 said:
The last thing SSD tech enables is memory-speed-bound activities.

Mega-textures! id Software has done some neat things with them: a place to cache recently used tiles, or to store read-ahead tiles so that they can be swiftly swapped in and out of memory when needed. The original sources for the game Rage's texture assets, before compression and resolution reduction to fit on a Blu-ray, were over 1TB in size.
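A cache of recently used tiles is commonly built as a least-recently-used (LRU) cache. The sketch below is a generic LRU, not id Software's actual implementation; the `TileCache` class, its `fetch` callback, and the tile ids are all hypothetical.

```python
from collections import OrderedDict


class TileCache:
    """Keep the most recently used tiles, evicting the least recently used."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch          # called on a miss, e.g. a read from disk
        self.tiles = OrderedDict()  # oldest entry first, newest last
        self.misses = 0

    def get(self, tile_id):
        if tile_id in self.tiles:
            self.tiles.move_to_end(tile_id)  # mark as most recently used
            return self.tiles[tile_id]
        self.misses += 1
        data = self.fetch(tile_id)
        self.tiles[tile_id] = data
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)   # drop the least recently used
        return data


cache = TileCache(capacity=2, fetch=lambda tile_id: f"tile-{tile_id}")
for tile_id in (1, 2, 1, 3):
    cache.get(tile_id)
print(list(cache.tiles), cache.misses)
```

A bigger or faster backing tier mainly changes how expensive each miss is; the bookkeeping stays the same.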

Peppercorn · Jan 8, 2017

So considering the size of the Vega die, I wonder how much space is taken up by the new high bandwidth cache.

Shintai · Jan 8, 2017

Peppercorn said:
So considering the size of the Vega die, I wonder how much space is taken up by the new high bandwidth cache.

Less than on Fiji. The "cache" is the HBM2 VRAM/Controller and a 2048bit controller takes up less space than a 4096bit.

Peppercorn · Jan 8, 2017

Shintai said:
Less than on Fiji. The "cache" is the HBM2 VRAM/Controller and a 2048bit controller takes up less space than a 4096bit.

Hmm interesting. So if the high bandwidth cache is the high bandwidth cache controller, then the high bandwidth cache controller is the high bandwidth cache??

From what I've been reading, the HBCC streams in data to the high bandwidth cache. Obviously the cache won't be big enough to store the 512TB virtual address space that is available, but it does seem to suggest that the high bandwidth cache would be larger than Fiji's. Did Fiji even have a high bandwidth cache? I thought that is new with Vega.

Peppercorn · Jan 8, 2017

The HBM2 arrangement in the Cube looks quite different than the Vega package shown at CES. Hard to really see though.

razor1 · Jan 8, 2017

Peppercorn said:
The HBM2 arrangement in the Cube looks quite different than the Vega package shown at CES. Hard to really see though.

Looks the same... it's just that the chip/unit is turned 90 degrees counterclockwise.

The high bandwidth cache is used as an intermediate cache, just like L1 and L2 cache in the past, but since it's much larger than what is on die, there are benefits. But from what I understand, for older games this won't be the case, at least for in-game performance; to get a performance benefit in game, it seems it will need to be done in code. Things like going from one program to another, yeah, that will be faster, but in-game performance must be done in code (extensions outside of the current APIs, DX12 and Vulkan).

Peppercorn · Jan 8, 2017

razor1 said:
Looks the same... it's just that the chip/unit is turned 90 degrees counterclockwise.

The high bandwidth cache is used as an intermediate cache, just like L1 and L2 cache in the past, but since it's much larger than what is on die, there are benefits. But from what I understand, for older games this won't be the case, at least for in-game performance; to get a performance benefit in game, it seems it will need to be done in code. Things like going from one program to another, yeah, that will be faster, but in-game performance must be done in code (extensions outside of the current APIs, DX12 and Vulkan).

That looks different to me. The Vega package shown had the interposer filled, but this appears to have gaps and extra components. Unless my eyes are deceiving me as it is hard to see.

At 2:00 in the video he says that the resource management is done automatically.

CSI_PC · Jan 8, 2017

Peppercorn said:
..... Obviously the cache won't be big enough to store the 512TB virtual address space that is available, but it does seem to suggest that the high bandwidth cache would be larger than Fiji's. Did Fiji even have a high bandwidth cache? I thought that is new with Vega.

That should not take up much room; Nvidia's P100 also has a similar controller for dealing with 49-bit coherent unified memory (that is at least 512TB of address space as well).
While that die is 610mm², it also has a 2:1 ratio of FP32 to FP64 cores.
But the AMD solution will need more functionality, due to also integrating various storage options, latency control, and more active management of the memory pool, virtual textures, etc. with regard to the VRAM.
Cheers

Edit:
Just to add, Nvidia's solution is pure HPC-scientific-data modelling-Grid VCA (also used by those doing modelling-rendering), while AMD's also includes aspects relating to prosumer.
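The 49-bit and 512TB figures above are the same number seen two ways: a 49-bit byte address covers 2^49 bytes, which is 512 × 2^40 bytes. A quick check, with the helper name made up for illustration:

```python
def addressable_tib(address_bits):
    """Size of a byte-addressed space of `address_bits` bits, in TiB (2**40 bytes)."""
    return 2 ** address_bits // 2 ** 40


print(addressable_tib(49))
```

Strictly this is 512 TiB (binary terabytes), which is how the "512TB" figure is being used here.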

razor1 · Jan 8, 2017

Peppercorn said:
That looks different to me. The Vega package shown had the interposer filled, but this appears to have gaps and extra components. Unless my eyes are deceiving me as it is hard to see.

At 2:00 in the video he says that the resource management is done automatically.

The resource management is done automatically for in-game situations if it's coded for it.

It's part of the new primitive shaders.

The tile renderer Vega has must be used through primitive shaders too.

Pretty much, from what they saw with nV's tech, they wanted to incorporate everything they had (from a rasterizer standpoint) and do things automatically, but couldn't with the current architecture. They do have tech that can be used in certain ways to do it through their primitive shaders. It could have been planned that way too, as it gives more flexibility to programmers...
