AMD HBM High Bandwidth Memory Technology Unveiled @ [H]

Can't everyone else also see the PS4.1 and XBone.1 coming in a few years with HBM?

I was going to post that same thing earlier. I could definitely see the Xbox One, PS4, and Nintendo using HBM when they decide to do a hardware refresh sometime after HBM gen 2 is released. I also see a really good market in high-end performance/enterprise laptops and tablets, since those GPUs won't need the PCB real estate that current GPUs need for dedicated graphics memory.


Also, for a great number of people in this world, 4GB of system RAM is enough. Wouldn't an APU with 4GB of on-package RAM (shared with the GPU) be the perfect setup for a super-small-form-factor unit? Add an onboard NIC, sound, and SATA and you're done.

Also, didn't AMD say that with the newest version of Mantle, CrossFire cards get to access the VRAM as a single pool? Wouldn't that mean a hypothetical X395 with 8GB of HBM... would actually be a true 8GB? You could see a lot more dual-GPU solutions down the line, no? 385, 375, etc.

if that ends up happening, that would be awesome.
 
Just get Pied Piper on that.

I think the increased bandwidth and lower power will be great for mobile phones.

AMD should learn a lot from this first generation of cards.

I'm concerned that a GPU sitting right next to these in the same package will fry them.

As far as the slides go, it seemed they were showing a performance comparison of two 290Xs, one with HBM and one with GDDR5... where are the rest of the slides?

Thermal dissipation is going to be an issue. I would personally like to see them ship with watercooling from the factory. That could go a long way toward solving it.
 
Thermal dissipation is going to be an issue. I would personally like to see them ship with watercooling from the factory. That could go a long way toward solving it.

I don't know about that. The HBM modules run at 500MHz and at a lower voltage than GDDR5, which points to producing less heat, and since it's in the package it's not like it isn't being cooled. I think heat off the core warming up the memory is going to be the bigger issue. We also have no idea how well this memory handles clock speed; 500MHz might already be tight to tolerance and it might not overclock well at all. Or it might do very well but just see no real benefit, given how much more bandwidth it already has.
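A rough way to see why lower clock and voltage point to less heat: dynamic switching power scales roughly with C·V²·f. This is only a back-of-the-envelope sketch with an arbitrary capacitance and the commonly cited voltage/clock figures (around 1.3V for HBM vs ~1.5V for GDDR5); it's not actual power data for either memory type.

```python
# Back-of-the-envelope dynamic power comparison: P ~ C * V^2 * f
# The capacitance value is arbitrary; only the *ratio* between the two
# cases means anything here. Voltages and clocks are commonly cited
# figures, not measured values.

def relative_dynamic_power(voltage, clock_hz, capacitance=1.0):
    """Dynamic switching power, up to a constant factor."""
    return capacitance * voltage**2 * clock_hz

gddr5 = relative_dynamic_power(voltage=1.5, clock_hz=1500e6)  # ~1500MHz GDDR5
hbm   = relative_dynamic_power(voltage=1.3, clock_hz=500e6)   # 500MHz HBM

print(f"HBM per-pin dynamic power vs GDDR5: {hbm / gddr5:.2f}x")
# HBM's bus is far wider (1024 bits per stack vs 32 per GDDR5 chip), so
# total power also depends on width; this only shows the per-pin trend.
```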
 
I worked with Joe Macri back in the DEC days in the early '90s, and if he says this shit will work, it certainly will.
 
Thermal dissipation is going to be an issue. I would personally like to see them ship with watercooling from the factory. That could go a long way toward solving it.

Well, you may have it concentrated in a smaller area, but HBM also produces about a third of the heat for the same capacity and performance (power use directly correlates to heat production).

So I'm not too concerned here. At least not yet. Maybe down the road, when we have 32GB of VRAM at much higher clock speeds, this will become an issue, but not yet.
 
Zarathustra[H];1041611649 said:
HBM is undoubtedly the future memory tech, at the very least for Video cards.

However, I question how much of a practical performance benefit it will actually have in this generation of video cards.

You are probably right for this generation of gaming GPUs.
But where this is going to shine is in HPC and machine-learning applications.
 
Catch-22:
Current GPUs are not bottlenecked by memory bandwidth (much); the GPU itself is the weak link. HOWEVER, once you have a GPU with enough horsepower to use the greater bandwidth, you'll need more than 4GB of VRAM.

From what I've seen in HPC applications, memory bandwidth and data bus speed (PCI-E) are the two massive bottlenecks.
NVLink is eventually going to replace PCI-E, and hopefully HBM will become the new memory standard.
 
NVLink is eventually going to replace PCI-E, and hopefully HBM will become the new memory standard.

I certainly hope not.

The LAST thing we need is a proprietary expansion slot format...

Hopefully - outside of specialized supercomputing applications - we will continue to rely on new revisions of PCIe instead. All the interconnectivity standards need to be open!
 
NVLink is eventually going to replace PCI-E, and hopefully HBM will become the new memory standard.

NVLink is gonna what? You think Intel and AMD are going to let NVIDIA dictate standards?

HBM surely will become a new standard. Everything that Joe Macri and AMD have pushed out in terms of GPU memory over the years has been very successful.
 
Once the MC became the IMC (integrated memory controller), NVIDIA's influence over interconnect standards went dormant, IMO.
 
Zarathustra[H];1041616590 said:
Yep, and the distinction here is that while AMD is a co-developer of HBM, it is not a proprietary AMD standard.

NVLink on the other hand...


Also, luckily history has proven this.

IBM tried to push their proprietary Micro Channel bus in the late '80s and early '90s, but it failed in favor of open standards like EISA and VESA Local Bus.

Ah this is true, I never thought of it that way, good points!
 
PCI-E 4.0 is supposedly quadrupling maximum bandwidth to 64 GB/s over 16 lanes. We still only know that NVLink will be "between 80 and 200 GB/s." As far as interconnectivity goes I want a universal standard in place. Besides, I don't think Intel will go along with adopting NVLink into their chipsets.
 
PCI-E 4.0 is supposedly quadrupling maximum bandwidth to 64 GB/s over 16 lanes. We still only know that NVLink will be "between 80 and 200 GB/s." As far as interconnectivity goes I want a universal standard in place. Besides, I don't think Intel will go along with adopting NVLink into their chipsets.

Right now it's only promised at 32GB/s over 16 lanes.

Don't get your hopes up. A 16Gbps signaling rate over copper is absolutely insane - DisplayPort 1.3 (and of course Thunderbolt 3) won't even top 10 Gbps per lane. The DSP required to double that again to 32Gbps per lane would be astronomical, and that's not going to work for a consumer standard.

I wonder what modulation method they're going to use to pack that many bits onto the same piece of copper? I'm assuming PCIe, much like Ethernet, used up any available base bandwidth long ago :D
 
Right now it's only promised at 32GB/s over 16 lanes.

Don't get your hopes up. A 16Gbps signaling rate over copper is absolutely insane - DisplayPort 1.3 (and of course Thunderbolt 3) won't even top 10 Gbps per lane. The DSP required to double that again to 32Gbps per lane would be astronomical, and that's not going to work for a consumer standard.

I wonder what signalling standard they're going to use to pack that many bits onto the same piece of copper? I'm assuming PCIe, much like Ethernet, used up any available base-bandwidth long ago :D

I've just been conditioned to expect an approximate doubling of the bandwidth per lane every generation, so approximately 32GB/s for an x16 4th-gen link sounds about right :p
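For reference, here's a quick sketch of that per-generation doubling, using the published per-lane transfer rates (2.5, 5, 8, 16 GT/s) and encodings (8b/10b for gen 1/2, 128b/130b for gen 3/4) to get per-direction x16 throughput. These are the usual theoretical maxima, not measured figures.

```python
# Theoretical per-direction bandwidth of a PCIe x16 link, per generation.
# Gen 1/2 use 8b/10b encoding (80% efficiency); gen 3/4 use 128b/130b.
generations = {
    # gen: (transfer rate in GT/s per lane, encoding efficiency)
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}

LANES = 16
for gen, (gt_per_s, efficiency) in generations.items():
    # GT/s * efficiency gives usable Gbit/s per lane; divide by 8 for GB/s.
    gb_per_s = gt_per_s * efficiency * LANES / 8
    print(f"PCIe {gen}.0 x16: ~{gb_per_s:.1f} GB/s per direction")
# PCIe 4.0 x16 works out to ~31.5 GB/s each way, i.e. the "32GB/s" figure above.
```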
 
This doesn't go on the mobo. It and the *PU go on an interposer together, eliminating the complexity and space needed on the board. Also, the connections can be smaller and much more precise.

Right -

I'm thinking DIMMs built with this technology.

Maybe even an extra-wide bus between the CPU and memory. Of course, the speeds and density would need to be significantly higher to make the redesign worth it.
 
Here's an interesting paper published in 2012 that discusses the power savings and bandwidth improvements with HBM for the technically inclined.
 
Right -

I'm thinking DIMMs built with this technology.

Maybe even an extra-wide bus between the CPU and memory. Of course, the speeds and density would need to be significantly higher to make the redesign worth it.

From what I've read, the longer the interconnect to the HBM, the more sharply performance drops off. So memory modules are out, but perhaps APUs could benefit greatly: give them 2-4GB of HBM on the package. You could have an A10 whose onboard GPU works as well as or better than low-end discrete cards.

My question is whether HBM and regular memory can work together, or if it's one or the other. I think it should be able to work; HSA is pretty much about this, managing a pool of dissimilar memory so that it works fluidly between the GPU and CPU.

I would love to see the Zen version of the A10 built this way.
 
Right -

I'm thinking DIMMs built with this technology.

Maybe even an extra-wide bus between the CPU and memory. Of course, the speeds and density would need to be significantly higher to make the redesign worth it.

You're not going to be able to put this on a DIMM since there is no way to pack the really wide bus onto a connector.

You could do memory boards that have the controller chip plus the HBM modules on an interposer and then a link back to the *PU. This would be close to the old fully buffered DIMM idea, or the memory fan-out chips on some servers now. The drawbacks would be added latency, and you would need a much different form factor than DIMMs have now, likely taller and spaced farther apart.
 
Yup, the reason this tech works at all is that the wide interface is on-package. They can save power by going wide and slow versus a normal BGA chip. But since a wide bus takes up massive amounts of board space and power, you have to keep the chips very close to keep costs and power down.

FBDIMM does the same thing, but since it requires a buffer controller, it's much more expensive (and the controller adds power).
 
From what I've read, the longer the interconnect to the HBM, the more sharply performance drops off. So memory modules are out, but perhaps APUs could benefit greatly: give them 2-4GB of HBM on the package. You could have an A10 whose onboard GPU works as well as or better than low-end discrete cards.

My question is whether HBM and regular memory can work together, or if it's one or the other. I think it should be able to work; HSA is pretty much about this, managing a pool of dissimilar memory so that it works fluidly between the GPU and CPU.

I would love to see the Zen version of the A10 built this way.

There was talk of using HBM on-package as L3 cache, so yes, theoretically you could have both HBM and regular memory on board.
 
Remember when ATI would just make a graph that started above zero to emphasize a tiny advantage, even though they sucked in all other measures?
Well, look no more: AMD has improved the technology to "just making shit up®".
 
There was talk of using HBM on-package as L3 cache, so yes, theoretically you could have both HBM and regular memory on board.

I was referring to using it as the video/regular memory space for an APU, where DIMMs are either optional or left out completely for ultrabook-style laptops with a strong APU. Heck, hypothetically, if everything were small enough, a tablet or phone SoC...
 
I'd love to see 1GB or more of on-die L3 HBM!

You people want everything for free.

You know how much it costs to buy a 1GB graphics card, right? Prepare to pay at least that much plus the CPU cost for an APU with 1GB of L3 cache. Every report I have seen says HBM is currently more expensive than GDDR5 + PCB.

AMD has never cut anyone a "deal" on their high-end APUs, unless they don't sell and they have to clear out inventory.

The premium Llano part launched at $135, a $40 premium over GPU-less Athlon II X4 processors with the same specs.

The premium Trinity part launched at $122, a $40 premium over the Athlon X4 750K, which was similar in performance (and just as overclockable).

Kaveri had the highest premium yet, launching at $169. And since the CPU side gained very little, it was once again comparable to the new speed-bumped $85 Athlon X4 860K, which is Steamroller at the exact same clocks as the 7850K!

Could you imagine the price they would have to charge with 1GB on-chip? They already couldn't sell Kabini at the price they needed to in order to make money, so what makes you think people will pay more for this?
 
It's the new Rambus: memory with uber bandwidth :p. Except this memory is open for others to use without endless lawsuits.

I do not see why a 4GB HBM APU cannot also have a pool of cheaper DDR3 RAM; the OS could keep active programs and GPU data in the faster HBM and page out to the slower DDR3 as needed (a rough sketch of the idea is below). Obviously there would need to be another controller between the two standards. Would the overhead be worth it, given the extra cost? Software/drivers? I do not know. It sounds like it could price itself out of the market with the added complexity.

A game console with an HBM APU sounds awesome though. Nintendo?
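To make the two-pool idea above concrete, here's a minimal sketch of what that kind of tiered management could look like, assuming a simple access-count policy that promotes hot pages into a fixed-size HBM pool and demotes cold ones back to DDR3. The class name, pool size, and threshold are made up for illustration; real HSA/driver-level memory management would be far more involved.

```python
# Hypothetical sketch of hot/cold page placement across two memory tiers.
# HBM_PAGES and the promotion threshold are arbitrary illustrative values.
from collections import Counter

HBM_PAGES = 4        # pretend the fast tier only holds 4 pages
PROMOTE_AFTER = 3    # accesses before a page is considered "hot"

class TieredMemory:
    def __init__(self):
        self.hbm = set()          # pages currently resident in fast HBM
        self.hits = Counter()     # access counts per page

    def access(self, page):
        self.hits[page] += 1
        if page in self.hbm:
            return "HBM"
        if self.hits[page] >= PROMOTE_AFTER:
            self._promote(page)
            return "HBM (just promoted)"
        return "DDR3"

    def _promote(self, page):
        if len(self.hbm) >= HBM_PAGES:
            # Demote the coldest resident page back to DDR3.
            coldest = min(self.hbm, key=lambda p: self.hits[p])
            self.hbm.discard(coldest)
        self.hbm.add(page)

mem = TieredMemory()
for page in [1, 2, 1, 1, 3, 1, 2, 2]:
    print(f"page {page} served from {mem.access(page)}")
```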
 
You know how much it costs to buy a 1GB graphics card, right? Prepare to pay at least that much plus the CPU cost for an APU with 1GB of L3 cache. Every report I have seen says HBM is currently more expensive than GDDR5 + PCB.

I would like to see that report.
 
I would like to see that report.

That report will also change; as the manufacturing process matures, it will get cheaper. Putting it on an APU is a very good idea, since the main thing that has held APUs back is memory bandwidth. As was seen with Kaveri, the A10-7850K would gain 40%-50% going from 1333MHz DDR3 to 2400MHz DDR3. I imagine it would plateau somewhere, but HBM would help immensely.
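As a rough illustration of why that DDR3 speed bump matters so much to an integrated GPU, here are the theoretical dual-channel bandwidths at each speed (standard DDR3 figures, not benchmark results), next to a single HBM1 stack at the commonly cited 128GB/s.

```python
# Theoretical peak bandwidth of a dual-channel, 64-bit-per-channel DDR3 setup.
# "effective_mts" is the usual DDR3-1333 / DDR3-2400 naming.
def ddr3_bandwidth_gbs(effective_mts, channels=2, bus_bits=64):
    return effective_mts * 1e6 * channels * (bus_bits / 8) / 1e9

for speed in (1333, 2400):
    print(f"DDR3-{speed} dual channel: ~{ddr3_bandwidth_gbs(speed):.1f} GB/s")

# One HBM1 stack: 1024-bit bus at 500MHz DDR -> ~1Gbps per pin.
hbm_stack = 1024 * 1 / 8  # GB/s
print(f"Single HBM1 stack: ~{hbm_stack:.0f} GB/s")
```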
 
I would like to see that report.

Charlie claims it's more expensive here:

http://semiaccurate.com/2015/05/19/amd-finally-talks-hbm-memory/

In a nutshell, if you count the costs of the interposer, HBM is probably a bit more expensive than GDDR5 for approximately the same bandwidth, 448GBps vs 512GBps. That isn’t good. It also consumes about a third the power of GDDR5, that is good, very good but not a game changer in the world of 200-300W graphics cards. If this is true, why go with HBM now?

It makes sense, since GDDR5 was a whole lot more expensive than GDDR3 on introduction. The only thing that saved it from the demise that befell GDDR4 was the massive bandwidth increase, allowing for half the bus width / half the memory devices Nvidia was forced to use on the GTX 280.

It will get cheaper as it gets produced in larger numbers and process tech matures, just like GDDR5. Anyway, I just wanted to point out that just because it's on-package, don't expect the cost of those dies to magically become so cheap that you can add them to anything for nothing. Including 1GB of fast RAM on-interposer is just as expensive as putting 1GB of fast RAM on a discrete PCB, so stop pretending you'll ever see capacities as high as 1GB in the next 5 years. Intel has a hard enough time making their single extra 128MB L4 chip on-package affordable.

We have obviously hit the point where HBM's size and power benefits outweigh the extra cost and capacity limits for AMD to implement it, since Charlie claims it's been viable since 2012 (not sure I believe that, but given the downsides of HBM it's not impossible they sat on it).
 
HBM is just DRAM with a wide interface. The latency is about the same as main memory, but depends on your access pattern. The organization (banks, rows, etc.) is about the same as SDRAM. So you get a lot of bandwidth if your access pattern works with that memory organization, but not the reduced latency you would get from an SRAM cache. If you want a chip-private high-bandwidth memory, it's the right kind of solution.
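A toy sketch of the access-pattern point: with DRAM, a row-buffer hit is served much faster than an access that has to open a new row, so effective bandwidth depends heavily on how well your accesses match the bank/row organization. The cycle costs here are made-up illustrative values, not real HBM timing parameters.

```python
# Toy model: effective DRAM throughput vs. row-buffer hit rate.
# Costs are in arbitrary "cycles per access"; illustrative only.
ROW_HIT_COST = 1     # access lands in the already-open row
ROW_MISS_COST = 8    # must precharge and activate a new row first

def relative_throughput(hit_rate):
    avg_cost = hit_rate * ROW_HIT_COST + (1 - hit_rate) * ROW_MISS_COST
    return ROW_HIT_COST / avg_cost  # 1.0 = ideal streaming throughput

for hit_rate in (1.0, 0.9, 0.5, 0.1):
    print(f"row-buffer hit rate {hit_rate:.0%}: "
          f"{relative_throughput(hit_rate):.0%} of peak bandwidth")
```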
 

Well, of course it is more expensive, but not significantly so...
I guess the other guy was talking about using it as true on-die L3 cache, which won't happen.
It will be used on APUs in the next ~2 years. We might see HBM, or something similar, truly 3D-stacked on GPU/CPU dies in ~5 years.
I still expect TOP-PIM to be the next step after Fiji, which we might see in 1.5-3 years. They definitely need at least 16/14nm for TOP-PIM, but it may be pushed back until 10nm.
 
It will get cheaper as it gets produced in larger numbers and process tech matures, just like GDDR5. Anyway, I just wanted to point out that just because it's on-package, don't expect the cost of those dies to magically become so cheap that you can add them to anything for nothing. Including 1GB of fast RAM on-interposer is just as expensive as putting 1GB of fast RAM on a discrete PCB, so stop pretending you'll ever see capacities as high as 1GB in the next 5 years. Intel has a hard enough time making their single extra 128MB L4 chip on-package affordable.

If AMD is able to slap 4GB on an interposer combined with a massive GPU, how can one say that you won't see 1GB of HBM in 5 years? Since a single HBM1 stack is 1GB, AMD can take what they learned in creating Fiji and apply it to the graphics core of an APU.

I can see it happening within the next two years.
 
If AMD is able to slap 4GB on an interposer combined with a massive GPU, how can one say that you won't see 1GB of HBM in 5 years? Since a single HBM1 stack is 1GB, AMD can take what they learned in creating Fiji and apply it to the graphics core of an APU.

I can see it happening within the next two years.

Well, the stacks are going to be 5x7mm, which is really small, and it looked like they were BGA-attached to the interposer like the die. Also, the memory can be made on the older 65nm process, which means it should get cheaper much quicker.
 
Well, the stacks are going to be 5x7mm, which is really small, and it looked like they were BGA-attached to the interposer like the die. Also, the memory can be made on the older 65nm process, which means it should get cheaper much quicker.

The silicon interposer is manufactured on the 65nm process from UMC.
The HBM ICs are on an unknown process from Hynix. I'm pretty sure it is 29/28nm. 22/20nm is what HBM2 will be on, which is why we will be seeing much higher density. It definitely isn't smaller than 20nm.

Edit- Hynix fell behind other DRAM/NAND manufacturers due to that fire at their fab. They lost at least 6 months, maybe close to a year.
 
It has a 1024-bit bus vs the standard 32-bit bus, at 1/3 the clock speed, with much less heat/power usage and a 90%+ smaller footprint. The speed loss is misleading; don't measure by MHz. This is a faster new tech with a smaller footprint that produces less heat and requires less power. What's not to love? 500MHz vs 1500MHz GDDR5 sounds bad, but 32 x 32 = 1024, and the throughput provided by the new bus architecture is huge. Can't wait; I hope for a 370X with the same tech.
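Putting rough numbers on that width-vs-clock tradeoff: a single HBM1 stack is 1024 bits wide at about 1Gbps per pin, while one GDDR5 device is 32 bits wide at roughly 5Gbps per pin. The sketch below just multiplies those commonly cited figures out; it's a theoretical-peak comparison, not a benchmark.

```python
# Theoretical peak bandwidth: one HBM1 stack vs. one GDDR5 chip,
# plus a 4-stack HBM card vs. a 512-bit GDDR5 card (290X-class).
def bandwidth_gbs(bus_bits, gbps_per_pin):
    return bus_bits * gbps_per_pin / 8

hbm_stack  = bandwidth_gbs(1024, 1.0)   # 500MHz DDR -> ~1Gbps per pin
gddr5_chip = bandwidth_gbs(32, 5.0)     # GDDR5 at 5Gbps effective

print(f"One HBM1 stack:    ~{hbm_stack:.0f} GB/s")
print(f"One GDDR5 chip:    ~{gddr5_chip:.0f} GB/s")
print(f"4 HBM stacks:      ~{4 * hbm_stack:.0f} GB/s")
print(f"512-bit GDDR5 bus: ~{bandwidth_gbs(512, 5.0):.0f} GB/s")
```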
 
IBM has taken a similar approach to HBM's memory stacking, integrating separate memory dies on their CPU packages since POWER5.
https://en.wikipedia.org/wiki/POWER5

I hope AMD has taken a page from IBM's playbook and is very careful in their heatsink design for these new HBM chips. Granted, these are not socketable chips, but still.

IBM POWER6 chip example:
http://www.cpu-world.com/forum/files/power6_single_die-2_191.jpg
Look at the machined border on that chip face, providing an even mounting surface and adequate stress relief, and thus no die cracking from a wobbly heatsink mounting interface.

If AMD packages it up like their R9 290, with a dedicated metal-frame design, or uses an IHS-like design (like the NVIDIA 8800 GTX), this will be golden. Otherwise we'll get a lot of broken memory dies on GPU packages and a massive recall.
Don't mess this up, AMD. Thanks.
 
IBM has taken a similar approach to HBM's memory stacking, integrating separate memory dies on their CPU packages since POWER5.
https://en.wikipedia.org/wiki/POWER5

I hope AMD has taken a page from IBM's playbook and is very careful in their heatsink design for these new HBM chips. Granted, these are not socketable chips, but still.

IBM POWER6 chip example:
http://www.cpu-world.com/forum/files/power6_single_die-2_191.jpg
Look at the machined border on that chip face, providing an even mounting surface and adequate stress relief, and thus no die cracking from a wobbly heatsink mounting interface.

If AMD packages it up like their R9 290, with a dedicated metal-frame design, or uses an IHS-like design (like the NVIDIA 8800 GTX), this will be golden. Otherwise we'll get a lot of broken memory dies on GPU packages and a massive recall.
Don't mess this up, AMD. Thanks.
It won't look exactly like this test vehicle, but Fiji will definitely have an IHS.

 
IBM has taken a similar approach to HBM's memory stacking, integrating separate memory dies on their CPU packages since POWER5.
https://en.wikipedia.org/wiki/POWER5

I hope AMD has taken a page from IBM's playbook and is very careful in their heatsink design for these new HBM chips. Granted, these are not socketable chips, but still.

IBM POWER6 chip example:
http://www.cpu-world.com/forum/files/power6_single_die-2_191.jpg
Look at the machined border on that chip face, providing an even mounting surface and adequate stress relief, and thus no die cracking from a wobbly heatsink mounting interface.

If AMD packages it up like their R9 290, with a dedicated metal-frame design, or uses an IHS-like design (like the NVIDIA 8800 GTX), this will be golden. Otherwise we'll get a lot of broken memory dies on GPU packages and a massive recall.
Don't mess this up, AMD. Thanks.

Oh my gosh! I hope not. Have you contacted AMD directly? If not please do so immediately! Thanx in advance from an AMD fanboy.
 