AMD Looking to Chiplets

FrgMstr

Silicon interposers are nothing new to the geeks around these parts, as we have seen AMD use them for a while now with Vega GPUs and HBM2. AMD is, however, outlining plans to use these interposers to connect entire networks of CPUs and GPUs on a single piece of silicon, an approach being referred to as "chiplets." It was highly interesting that one example showed multiple GPU chiplets in use. Thanks cageymaru.


A future system might contain a CPU chiplet and several GPUs all attached to the same piece of network-enabled silicon.

Amazingly, if you follow those rules you can pretend everything else on the interposer—all the other logic chiplets, memory, the interposer’s own network, everything—is just one node on the network. Knowing that, separate teams of engineers can design chiplets without having to worry about how the networks on other chiplets work or even how the network on the active interposer works.
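To make that abstraction concrete: if every chiplet only ever talks to the interposer through one common send/receive interface, then from any one team's point of view the rest of the package really is just a single node. Here is a minimal sketch of that idea in Python; the class and method names (ChipletNode, InterposerNetwork, route, and so on) are entirely hypothetical and do not reflect AMD's actual interconnect.

```python
# Minimal sketch of the "everything else is one node" abstraction described above.
# All names here are hypothetical illustrations, not AMD's real interfaces.

class ChipletNode:
    """A chiplet exposes only send/receive; its internal network stays private."""
    def __init__(self, name):
        self.name = name
        self.network = None

    def attach(self, network):
        self.network = network

    def send(self, dst, payload):
        # The chiplet never needs to know how routing works on the interposer.
        self.network.route(self.name, dst, payload)

    def receive(self, src, payload):
        print(f"{self.name} received {payload!r} from {src}")


class InterposerNetwork:
    """The active interposer routes traffic between attached chiplets."""
    def __init__(self):
        self.nodes = {}

    def attach(self, node):
        self.nodes[node.name] = node
        node.attach(self)

    def route(self, src, dst, payload):
        self.nodes[dst].receive(src, payload)


net = InterposerNetwork()
for name in ("cpu0", "gpu0", "gpu1"):
    net.attach(ChipletNode(name))

net.nodes["cpu0"].send("gpu1", "kernel-launch")  # CPU chiplet to GPU chiplet via the interposer
```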
 
This isn't entirely surprising. We've seen NVIDIA, AMD, Xilinx, and ex-Altera all have to move to interposer solutions. Intel/Altera have also moved to their EMIB tech to accomplish the same thing. It's only natural that we'd eventually see the bottom silicon start to have transistors in it.

Now, what this says about future node health (e.g. needing to make small die and stitch them together) and the thermal challenges of cooling stacked active silicon is another interesting story.
 
I'm wondering how long before we get chips interconnected via optics instead of electrical paths.
 
There's an old warehouse with tons of PC Jr. peripherals that got very excited for a bit, until they realized the word was "chiplets".
 
Anyone remember when AMD bought SeaMicro back in 2012? They've been dreaming this up, albeit using different tech, for a while now. Remember, the future is fusion.
 
If you look at the Vega 10 block diagram, there are what AMD calls NCUs. They are mostly independent and would work well with these designs.
 
NCUs would likely be too small to be practical. Shader engines might be more representative.

Shaders/ROPs are part of the NCU enhancements under Vega:

"Finally, in Vega the render back-ends also known as Render Output Units or ROPs for short are now clients of the L2 cache rather than the memory pool. This implementation is especially advantageous in boosting performance of games that use deferred shading."

As data is separated onto a cache for the ROPs, AMD has already begun segmenting the data to be independent, as I predicted. From what I have read, the GCN architecture limits the number of ROPs to 64, broken into 4 blocks of 16 each. Now imagine eliminating that limitation and making 8 blocks of 16. You're no longer limited by the die size. Where NVIDIA wins on efficiency, AMD could win on brute force.
 
I wouldn't even call that brute force. Sure, they might have to throw more square millimeters of Si at it, but it could still be more cost-effective in the end, due to the geometric scaling of defects with die area for monolithic designs.
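To put a rough number on that defect-scaling argument, here is a quick sketch using a simple Poisson yield model (yield ≈ e^(-defect density × die area)). The model choice and the defect density are my own illustrative assumptions, not figures from any foundry.

```python
import math

# Rough illustration of the defect-scaling point above, using a simple Poisson
# yield model: yield_per_die = exp(-defect_density * die_area).
# The numbers are illustrative assumptions, not any fab's real data.

defect_density = 0.2    # defects per cm^2 (assumed)
monolithic_area = 5.0   # one ~500 mm^2 monolithic die, in cm^2
chiplet_area = 1.25     # each of four ~125 mm^2 chiplets covering the same logic

monolithic_yield = math.exp(-defect_density * monolithic_area)
chiplet_yield = math.exp(-defect_density * chiplet_area)

print(f"monolithic die yield:     {monolithic_yield:.1%}")  # ~36.8%
print(f"individual chiplet yield: {chiplet_yield:.1%}")     # ~77.9%
# One defect scraps the whole monolithic die but only one small chiplet,
# so the chiplet approach wastes far less silicon per wafer.
```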
 
As data is separated onto a cache for the ROPs, AMD has already begun segmenting the data to be independent, as I predicted. From what I have read, the GCN architecture limits the number of ROPs to 64, broken into 4 blocks of 16 each. Now imagine eliminating that limitation and making 8 blocks of 16. You're no longer limited by the die size. Where NVIDIA wins on efficiency, AMD could win on brute force.
The 4 blocks of 16 are the shader engines and likely a more ideal chiplet size for manufacturing. A Vega 10-comparable chiplet would be <10 mm² if broken into NCUs. Certainly possible, but as I said, it might not be all that practical to manufacture. An SE would be <200 mm², which may be more reasonable to scale (rough numbers sketched below). GCN doesn't limit ROPs; it's just that the scaling hasn't been worthwhile to implement. 64 is a nice square from a tiling perspective, and chiplets essentially work around that limit already.

Brute force aside, more silicon at lower clocks will be more efficient and quite possibly cheaper. The only catch is the serialized front end that currently requires higher clocks. There is already talk of designing APIs around that bottleneck, and a dedicated chiplet for just the front end with high clocks wouldn't be unreasonable either.
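For reference, the NCU and shader-engine areas mentioned above follow from simple division of the published Vega 10 die size; splitting the die evenly between blocks is my own simplification, since uncore, memory PHYs, and so on are shared, so these are ballpark figures only.

```python
# Back-of-the-envelope check of the chiplet sizes mentioned above.
# Vega 10's ~486 mm^2 die, 64 NCUs, and 4 shader engines are public figures;
# dividing the area evenly between blocks is a simplification, so these are
# ballpark numbers only.

vega10_area_mm2 = 486
ncus = 64
shader_engines = 4

print(f"per-NCU slice:           ~{vega10_area_mm2 / ncus:.1f} mm^2")            # ~7.6 mm^2, i.e. <10 mm^2
print(f"per-shader-engine slice: ~{vega10_area_mm2 / shader_engines:.0f} mm^2")  # ~122 mm^2, i.e. <200 mm^2
```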
 
I'm wondering how long before we get chips interconnected via optics instead of electrical paths.

I'm going to make a pseudo-educated guess and say probably never. Whenever you go from an electrical signal to an optical one, you need a transducer, and transducers by their very nature add latency. These interconnects would need to be EXTREMELY low latency. I'd imagine this will eventually allow for stringing together multiple CPUs or GPUs, but they would probably have to be pretty close together on a board.
 
 
I'm wondering how long before we get chips interconnected via optics instead of electrical paths.


At the scale of a piece of silicon, light and electrical signals travel at virtually the same speed. It wouldn't make much of a difference.
 
I'm going to make a pseudo-educated guess and say probably never. Whenever you go from an electrical signal to an optical one, you need a transducer, and transducers by their very nature add latency. These interconnects would need to be EXTREMELY low latency. I'd imagine this will eventually allow for stringing together multiple CPUs or GPUs, but they would probably have to be pretty close together on a board.

I agree that light will probably never be used to communicate internally within a chip, although it might get there some day.

I think Intel's been working on this type of tech for quite some time.
I think it's mainly for chip interconnects, but with light I suppose you could have it travel quite some distance, more than an electrical cable would allow in that power envelope, at higher bandwidth.
Imagine trying to connect thousands of CPUs together, with maybe 8-32 (or more) of these per chip. Sort of a replacement for InfiniBand, closer to the CPU itself; rather than board to board, more CPU to CPU directly (or SoC to SoC).

Thunderbolt was supposed to have an integrated optical cable, IIRC (Light Peak), but it was left out because it was either not ready for consumer use or not needed yet.

Isn't optical also impervious to EM fields? (I am not an electrical/electronic engineer.)
 
At the scale of a piece of silicon, light and electrical signals travel at virtually the same speed. It wouldn't make much of a difference.

About 80% of the speed of light through silicon, if I remember my EE classes correctly. That's provided there is an adequate conductive path (wide enough for the current) and no material changes or physical imperfections in the material.
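For a sense of scale, here is a quick propagation-time estimate using that ~80% figure, taken at face value from the post above; the distances are illustrative, not measurements of any real package.

```python
# Quick propagation-time estimate for the ~0.8c figure mentioned above.
# The 0.8c number is taken from the post at face value; the distances are
# illustrative, not measurements of any real interposer or die.

C = 299_792_458          # speed of light in vacuum, m/s
signal_speed = 0.8 * C   # assumed on-package signal velocity

for label, distance_m in [("across a 30 mm die", 0.030),
                          ("across a 60 mm interposer", 0.060)]:
    delay_ps = distance_m / signal_speed * 1e12
    print(f"{label}: ~{delay_ps:.0f} ps")
# Both come out to a few hundred picoseconds at most, which is why swapping
# electrons for photons over these distances buys almost no raw travel time.
```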
 
I agree that light will probably never be used to communicate internally within a chip, although it might get there some day.

I think Intel's been working on this type of tech for quite some time.
I think it's mainly for chip interconnects, but with light I suppose you could have it travel quite some distance, more than an electrical cable would allow in that power envelope, at higher bandwidth.
Imagine trying to connect thousands of CPUs together, with maybe 8-32 (or more) of these per chip. Sort of a replacement for InfiniBand, closer to the CPU itself; rather than board to board, more CPU to CPU directly (or SoC to SoC).

Thunderbolt was supposed to have an integrated optical cable, IIRC (Light Peak), but it was left out because it was either not ready for consumer use or not needed yet.

Isn't optical also impervious to EM fields? (I am not an electrical/electronic engineer.)

Yes and no. Light is just the visible part of the EM band. However, photons as we see them do not react to magnetic fields. Ionizing radiation can be affected by magnetic fields once it interacts with matter; photonic energy is absorbed by matter, and matter is subject to magnetic fields. Light photons do, however, exhibit characteristics of mass (being both a particle and a wave) and are subject to other influences like gravity, or being slowed down as they pass through materials.

Light-based interconnects have to be fixed in place, which is likely why the optical link was dropped from the original Thunderbolt. The glass used in cables is fragile and breaks easily when bent too tightly; in consumer hands it would be a disaster. And whoever said that a transducer is necessary, which introduces latency, is partially right. However, there are molecular-level transducers that make such delays negligible, though they are subject to a great amount of noise given their size.

You also have to weigh latency against bandwidth. Latency is the delay between your TX/RX request and when you actually get something useful back; bandwidth is how fast the data flows once it does. Now, if you get 400% more bandwidth for 5% more latency, that works great for huge datasets. For really small pieces of data, it doesn't help you so much.

That's where a lot of the speed improvements in memory have come from: they ramp up latency to get more bandwidth. It's a design trade-off. DDR3->DDR4 did this, and at a similar clock DDR3 was always faster, but DDR4 eventually won because it could scale higher in speed (bandwidth).

That is your worthless trivia of the day.
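To put rough numbers on the latency-versus-bandwidth trade-off described above, here is a quick sketch treating transfer time as latency + size/bandwidth. The baseline latency and bandwidth figures are made-up illustrations, not measurements of any real memory or link.

```python
# Rough numbers for the latency-vs-bandwidth trade-off described above.
# transfer_time = latency + size / bandwidth; baseline figures are made up.

def transfer_time(size_bytes, latency_s, bandwidth_bps):
    return latency_s + size_bytes / bandwidth_bps

base_latency = 100e-9                 # 100 ns (assumed)
base_bandwidth = 25e9                 # 25 GB/s (assumed)
fast_latency = base_latency * 1.05    # 5% more latency...
fast_bandwidth = base_bandwidth * 5   # ...for 400% more bandwidth

for label, size in [("64 B cache line", 64), ("1 GB dataset", 1 << 30)]:
    slow = transfer_time(size, base_latency, base_bandwidth)
    fast = transfer_time(size, fast_latency, fast_bandwidth)
    print(f"{label}: {slow * 1e6:,.3f} us -> {fast * 1e6:,.3f} us ({slow / fast:.2f}x)")
# The tiny transfer is latency-bound and actually gets slightly slower;
# the big one is bandwidth-bound and speeds up nearly 5x.
```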
 
Yes and no. Light is just the visible part of the EM band. However, photons as we see them do not react to magnetic fields. Ionizing radiation can be affected by magnetic fields once it interacts with matter; photonic energy is absorbed by matter, and matter is subject to magnetic fields. Light photons do, however, exhibit characteristics of mass (being both a particle and a wave) and are subject to other influences like gravity, or being slowed down as they pass through materials.

Light-based interconnects have to be fixed in place, which is likely why the optical link was dropped from the original Thunderbolt. The glass used in cables is fragile and breaks easily when bent too tightly; in consumer hands it would be a disaster. And whoever said that a transducer is necessary, which introduces latency, is partially right. However, there are molecular-level transducers that make such delays negligible, though they are subject to a great amount of noise given their size.

You also have to weigh latency against bandwidth. Latency is the delay between your TX/RX request and when you actually get something useful back; bandwidth is how fast the data flows once it does. Now, if you get 400% more bandwidth for 5% more latency, that works great for huge datasets. For really small pieces of data, it doesn't help you so much.

That's where a lot of the speed improvements in memory have come from: they ramp up latency to get more bandwidth. It's a design trade-off. DDR3->DDR4 did this, and at a similar clock DDR3 was always faster, but DDR4 eventually won because it could scale higher in speed (bandwidth).

That is your worthless trivia of the day.

Cheers, thanks for the info. :)
 
I agree that light will probably never be used to communicate internally within a chip, although it might get there some day.

I think Intel's been working on this type of tech for quite some time.
I think it's mainly for chip interconnects, but with light I suppose you could have it travel quite some distance, more than an electrical cable would allow in that power envelope, at higher bandwidth.
Imagine trying to connect thousands of CPUs together, with maybe 8-32 (or more) of these per chip. Sort of a replacement for InfiniBand, closer to the CPU itself; rather than board to board, more CPU to CPU directly (or SoC to SoC).

Thunderbolt was supposed to have an integrated optical cable, IIRC (Light Peak), but it was left out because it was either not ready for consumer use or not needed yet.

Isn't optical also impervious to EM fields? (I am not an electrical/electronic engineer.)

I believe you are referring to Intel's QPI link. SGI, now owned by HP, created the NUMAlink technology that used fiber optics between systems to link up to 32 processors and their memory.
 
I'm going to make a pseudo-educated guess and say probably never. Whenever you go from an electrical signal to an optical one, you need a transducer, and transducers by their very nature add latency. These interconnects would need to be EXTREMELY low latency. I'd imagine this will eventually allow for stringing together multiple CPUs or GPUs, but they would probably have to be pretty close together on a board.
Lasers can do it. We just need to find a way to make atom-sized lasers.
Some larger diode lasers have rise times on the order of 10 ns (at power levels you'd never use in a CPU); this could likely be improved for low-power usage.
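To gauge what a 10 ns rise time implies, the textbook single-pole approximation (bandwidth ≈ 0.35 / rise time) puts such a laser at only about 35 MHz of modulation bandwidth, so rise times would need to shrink by a few orders of magnitude before this works as a chip-to-chip link. A quick sketch, with illustrative rise times:

```python
# Rough modulation-bandwidth estimate from rise time, using the textbook
# single-pole approximation BW ≈ 0.35 / t_rise. Rise times are illustrative.

def bandwidth_hz(rise_time_s):
    return 0.35 / rise_time_s

for label, t_rise in [("large diode laser, 10 ns rise", 10e-9),
                      ("hypothetical on-package laser, 10 ps rise", 10e-12)]:
    print(f"{label}: ~{bandwidth_hz(t_rise) / 1e9:.3f} GHz")
# 10 ns -> ~0.035 GHz (35 MHz); 10 ps -> ~35 GHz, closer to what a serial
# chip-to-chip link would actually need.
```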
 
Lasers can do it. We just need to find a way to make atom-sized lasers.
Some larger diode lasers have rise times on the order of 10 ns (at power levels you'd never use in a CPU); this could likely be improved for low-power usage.

Well, regardless of the size, you still need to transform energy from one form to another, and that's inherently going to involve latency.
 
The silicon photonics prototypes for IC interconnects work a bit differently than the fiber/vertical-cavity lasers used for long-distance communications. They're mostly based on WDM waveguides and tuned silicon that only "receives" one frequency. I'm not sure where these are on the TRL scale at the moment... I just haven't kept up with it lately, but the concepts have been around and tested for years (as with essentially every technology that makes it to mass production; TRLs are valid for very good reasons).
 