I thought nvidia was already prototyping this as well?
http://devblogs.nvidia.com/parallelforall/nvlink-pascal-stacked-memory-feeding-appetite-big-data/
The best GDDR5 we have is on par with the "worst" entry-level HBM ... and HBM will only get better, whereas GDDR5 is near its ceiling.
Hear, hear! We shall be seeing lots of cool shit with graphics cards soon. This should effectively eliminate any memory bandwidth caps for... Quite a long time, I would think.
Zarathustra[H];1041611649 said: HBM is undoubtedly the future memory tech, at the very least for video cards.
However, I question how much practical performance benefit it will actually have in this generation of video cards.
Sure, it may MASSIVELY increase the available video RAM bandwidth, but we already know from overclocking that current gen GPUs only marginally benefit from upping the memory bandwidth, and there is no reason to believe that this upcoming generation of GPUs from AMD will be any different.
Power savings have also been listed as a huge benefit of HBM. And sure, the 3x better performance per watt is impressive. Considering, however, how small a portion of the overall video card power use comes from the RAM, this will have a rather marginal impact on high-performance products (it will be huge on low-power mobile devices, though).
Let's say you have a 300W video card with GDDR5 RAM, which uses 275W for the GPU and 25W for the RAM. Cut the RAM power use by a factor of 3 and you are now using 8.3W for the RAM, giving you an additional 16.7W to use for the GPU while still staying within the 300W power envelope.
So that's moving from 275W to 291.7W, an increase of ~6%. Not bad. Every little bit counts, but it's not enough to be a game changer.
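If anyone wants to poke at those assumptions, here's the same math as a quick Python sketch. The 275W/25W split and the 3x perf/watt figure are just the guesses from above, not measured numbers.

```python
# Back-of-the-envelope: how much GPU headroom does HBM's efficiency buy
# inside a fixed board power limit? All figures are the assumptions from
# the post above (275 W GPU / 25 W RAM split, "3x perf/W"), not measurements.

GPU_W = 275.0            # assumed GPU share of a 300 W card with GDDR5
GDDR5_W = 25.0           # assumed memory share with GDDR5
HBM_PERF_PER_WATT = 3.0  # the "3x better performance per watt" claim

hbm_w = GDDR5_W / HBM_PERF_PER_WATT   # ~8.3 W for the RAM
headroom_w = GDDR5_W - hbm_w          # ~16.7 W freed up for the GPU
new_gpu_w = GPU_W + headroom_w        # ~291.7 W
gain = new_gpu_w / GPU_W - 1          # ~6 %

print(f"HBM RAM power:    {hbm_w:.1f} W")
print(f"Extra GPU budget: {headroom_w:.1f} W")
print(f"GPU budget: {GPU_W:.0f} W -> {new_gpu_w:.1f} W (+{gain:.1%})")
```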
If anything, the biggest impact of going HBM this gen will be the limited supply from early HBM production, reducing the availability of these cards.
I hate to say it, but Nvidia's "wait and see" approach, going with HBM in its second generation instead, was likely the smarter approach, though I will be happy to be proven wrong on launch day.
Over time HBM will be of huge importance, but its importance will only grow slowly, bit by bit each generation. Those expecting an overnight change because "OMG HBM" are going to be hugely disappointed.
More bandwidth is always good. However, I'm more concerned about what Ars Technica says about this. Their article (linked) says that a 4GB max has been confirmed? That would mean the 390X is limited to competing with the 980 and isn't any competition for the Titan X. I'm a new owner of a Titan X, but I was still hoping for AMD to show up with some competition, as it's good for the market. A 4GB max would be very bad IMO.
http://arstechnica.com/information-...nfirms-4gb-limit-for-first-hbm-graphics-card/
I can see the use in a GPU.
I am wondering what it would do if properly set up on my motherboard (well a future board).
Is it not true that one of the big bottlenecks on a motherboard is waiting for DRAM? If I remember right, L2/L3 caching takes up a fair bit of space on the CPU die, and if that could be moved off the CPU, wouldn't it seriously decrease heat and allow for higher clocks?
Yeah, basically the biggest "con" would seem to be heat. It's all packed into a much smaller, tighter area.
Hear, hear! We shall be seeing lots of cool shit with graphics cards soon. This should effectively eliminate any memory bandwidth caps for... Quite a long time, I would think.
That means the next generation of GPUs with HBM could not only be much smaller physically, everything other than the GPU+memory package can be much simpler and cheaper. Since you more or less only need to route PCIe in and video signals out, everything gets smaller, cheaper, and simpler. This is a win/win/win for almost everyone, and once volume comes up, it won’t cost any more than the ‘old way’.
How exactly do you pull heat from the layers underneath? I'm guessing this is why there's a voltage/clock nerf? It shows 4 stacks on the slide, which would be 400+ GB/s versus the traditional example they gave at 448 GB/s. I'm wondering what kind of room we have to move from here. Just more stacks on the interposer? What about overclocking?
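For what it's worth, the per-stack numbers work out roughly like this. A quick sketch assuming the published HBM1 figures (1024-bit stack interface at ~1 Gbps per pin, i.e. the "100+ GB/s per stack" from the slides), and guessing that the 448 GB/s comparison is a 512-bit GDDR5 bus at 7 Gbps:

```python
# Rough bandwidth math for HBM stacks vs. a wide GDDR5 bus.
# bandwidth (GB/s) = bus width (bits) * per-pin data rate (Gbps) / 8
# Stack figures are the HBM1 spec numbers; the GDDR5 case is a guess
# at what the 448 GB/s example on the slide corresponds to.

def bandwidth_gb_per_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Aggregate bandwidth in GB/s for a memory interface."""
    return bus_width_bits * pin_rate_gbps / 8

hbm_per_stack = bandwidth_gb_per_s(1024, 1.0)  # 128 GB/s per HBM1 stack
gddr5_512bit = bandwidth_gb_per_s(512, 7.0)    # 448 GB/s

for stacks in (2, 4, 8):
    print(f"{stacks} HBM stacks: {stacks * hbm_per_stack:.0f} GB/s")
print(f"512-bit GDDR5 @ 7 Gbps: {gddr5_512bit:.0f} GB/s")
```

So on paper the headroom really is "add more stacks on the interposer", or clock the stacks faster in a later HBM generation.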
Zarathustra[H];1041611649 said: HBM is undoubtedly the future memory tech, at the very least for video cards.
However, I question how much practical performance benefit it will actually have in this generation of video cards.
Sure, it may MASSIVELY increase the available video RAM bandwidth, but we already know from overclocking that current gen GPUs only marginally benefit from upping the memory bandwidth, and there is no reason to believe that this upcoming generation of GPUs from AMD will be any different.
...
I hate to say it, but Nvidia's "wait and see" approach, going with HBM in its second generation instead, was likely the smarter approach, though I will be happy to be proven wrong on launch day .....
I think you've nailed it. Nvidia in particular has been talking about stacked DRAM since Fermi came out. So it seems to me, given Nvidia's larger R&D budget, the only reason AMD will bring HBM to market first is because Nvidia let them. As you noted, the real-world performance benefit probably won't be significant enough, at least on 28nm.
Kinda like how the 512-bit bus on their current cards isn't enough to give them an advantage over the competition.
Also, notice how those slides focus on efficiency: space saving, power saving etc. There's little mention/promotion of benefits to the high end market.
Surely this will help the memory-bandwidth-starved APUs if you can put a half stack or a full stack for 512MB or 1GB of RAM right there for it, yes? Really, how many of these could you fit onto an existing APU? Can you fit even one of them?
And for AMD's next CPU, can this be used as well? Either as a replacement for L3 cache, or as an L4 cache before turning to the DDR3/4?
We will eventually see HBM or something similar stacked on top of a host die, once the thermal roadblock is figured out.
Zarathustra[H];1041613747 said: I'd imagine there is plenty of space on the APU package for an additional HBM chip, but from the little I know about the tech, I think the processes are different enough that you won't see HBM on die.
Power to AMD, but I want to see this in action. AMD has said a number of times that they have something great, and it turns out to be meh.
Zarathustra[H];1041613758 said: AMD has, in the last 15 years, had a history of being ahead of the curve, launching tech that would become crucial and mainstream 5-6 years down the road, paving the way for their competitors.
AMD spends the money and time to develop it for a market that is not ready, and gets little market advantage from it. Then their competitors come along a few years later and profit off of the tech more than AMD does.
On-die memory controllers, 64-bit x86 CPUs, multi-core/many-core CPUs, heterogeneous unified memory access, HBM, you name it.
The market has benefited tremendously from their innovations. Too bad they haven't.
AMD's problem is marketing. They have no idea at all. They want to put out a halo card with 4GB and sell it as 4K ready. 4GB vs 12GB. There is a real disconnect there. They have already lost the marketing battle if they have to go into the details of why 4GB of HBM is better than 12GB of GDDR5: more bandwidth, blah, blah, blah. Nobody cares about that. Can it hold 4K super-high-res textures with all post-processing enabled? Does it offer higher frame rates than GDDR5? They are offering technology that is not mature (hence only 4GB right now) against mature technology that doesn't have that limitation. Their GCN cores are also inefficient (4096 vs 3072); it takes ~33% more cores and 25% more power to equal a Titan X. You save $150 but lose 8GB of frame buffer. It'll be a hard sell to consumers that somehow Fiji is a better deal. Then there's the 980 Ti to undercut it in both memory and price. It's going to be ugly for AMD, again.
Joe Macri said:If you actually look at frame buffers and how efficient they are and how efficient the drivers are at managing capacities across the resolutions, you’ll find that there’s a lot that can be done. We do not see 4GB as a limitation that would cause performance bottlenecks. We just need to do a better job managing the capacities. We were getting free capacity, because with [GDDR5] in order to get more bandwidth we needed to make the memory system wider, so the capacities were increasing. As engineers, we always focus on where the bottleneck is. If you’re getting capacity, you don’t put as much effort into better utilising that capacity. 4GB is more than sufficient. We’ve had to go do a little bit of investment in order to better utilise the frame buffer, but we’re not really seeing a frame buffer capacity [problem]. You’ll be blown away by how much [capacity] is wasted.
From article said: According to Macri, GDDR5-fed GPUs actually have too much unused memory today: to increase GPU memory bandwidth, wider memory interfaces are used, and because wider memory interfaces require a larger number of GDDR5 memory chips, GPUs ended up with more memory capacity than is actually needed.
Macri also stated that AMD invested a lot into improving utilization of the frame buffer. This could include on-die memory compression techniques integrated into the GPU hardware itself, or more clever algorithms at the driver level.
AMD Addresses Potential Fiji 4GB HBM Capacity Concern - Investing In More Efficient Memory Utilization
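To make the "on-die memory compression" idea a bit more concrete, here's a toy lossless delta-encoding sketch in Python. To be clear, this is not AMD's (or Nvidia's) actual hardware scheme, just an illustration of why framebuffer tiles with slowly varying colors take far fewer bits to store and move:

```python
# Toy illustration of lossless delta compression on a framebuffer tile.
# Real GPU color compression is fixed-function hardware working on tiles;
# this sketch only shows why smooth regions need far fewer bits.

def compressed_bits(pixels, raw_bits_per_pixel=8):
    """First pixel stored raw, the rest as signed deltas sized to the largest delta."""
    deltas = [b - a for a, b in zip(pixels, pixels[1:])]
    delta_width = max(abs(d) for d in deltas).bit_length() + 1  # +1 sign bit
    return raw_bits_per_pixel + max(delta_width, 1) * len(deltas)

# A smooth 8-bit gradient tile (64 pixels), like a patch of sky.
tile = [100 + i // 4 for i in range(64)]

raw = 8 * len(tile)
packed = compressed_bits(tile)
print(f"raw:   {raw} bits")
print(f"delta: {packed} bits ({packed / raw:.0%} of raw)")
```

The savings depend entirely on how compressible the data happens to be, which is why this kind of scheme is a bandwidth win rather than a guaranteed capacity win.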
I think it's interesting that people were criticizing NVIDIA for going with compression on the 256-bit Maxwell parts, and now AMD is doing the same. Some people are already parroting the line that "4GB is enough" because AMD said so. I wonder what people with 4K Eyefinity would say about that.
A lot of tough talk. We'll see how that plays out in the real world.
Fair enough, I guess we will know with the H's reviews that are sure to come.
One thing: although I am struggling a bit to understand the tech, it would seem that it also makes certain parts of the GPU itself simpler. Might that help further with power/heat savings and squeeze out more performance?
Zarathustra[H];1041613833 said: What I don't get about texture compression is the following.
Is this just making up for poor game developer optimization?
If textures can be compressed without much (any?) quality loss, wouldn't it be better to do so one time up front, rather than have the GPU waste cycles doing it as the game is running?
Compression is a band-aid solution for limitations: Nvidia's answer to limited bandwidth and AMD's answer to insufficient RAM. Any time you throw in compression, you have to create ASICs for that task (I doubt it uses the CPU for that) as well as introduce latency. In the end, I doubt they can compress 8GB into 4GB, no matter how efficient their compression is.
Surely this will help the memory-bandwidth-starved APUs if you can put a half stack or a full stack for 512MB or 1GB of RAM right there for it, yes? Really, how many of these could you fit onto an existing APU? Can you fit even one of them?
And for AMD's next CPU, can this be used as well? Either as a replacement for L3 cache, or as an L4 cache before turning to the DDR3/4?
It won't be much use as a cache, as it's still DRAM, so it won't have the low latency that made external SRAM caches work. I would expect to see it as video memory and/or main memory for their APUs. I can see this enabling APUs to match the lower-end cards that used to beat them just because of their VRAM bandwidth advantage.
The lower power may be key for mobile (tablet and laptop) applications.
Zarathustra[H];1041613833 said: If textures can be compressed without much (any?) quality loss, wouldn't it be better to do so one time up front, rather than have the GPU waste cycles doing it as the game is running?
Can't everyone else also see the PS4.1 and XBone.1 coming in a few years with HBM?