AMD Looking to Chiplets

Discussion in 'HardForum Tech News' started by FrgMstr, Jun 25, 2018.

  1. FrgMstr

    FrgMstr Just Plain Mean Staff Member

    Messages:
    48,110
    Joined:
    May 18, 1997
    Silicon interposers are nothing new to the geeks around these parts, as we have seen AMD use them for a while now with Vega GPUs and HBM2. AMD is, however, outlining some plans on how to use these interposers to connect full networks of CPUs and GPUs on one piece of silicon. This is being referred to as "chiplets." And it was highly interesting that the one example showed multiple GPU chiplets in use. Thanks cageymaru.


    A future system might contain a CPU chiplet and several GPUs all attached to the same piece of network-enabled silicon.

    Amazingly, if you follow those rules you can pretend everything else on the interposer—all the other logic chiplets, memory, the interposer’s own network, everything—is just one node on the network. Knowing that, separate teams of engineers can design chiplets without having to worry about how the networks on other chiplets work or even how the network on the active interposer works.
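    The "everything is just one node on the network" idea from the quote can be sketched in a few lines. This is purely illustrative — the class names and message format are made up, not anything from AMD's design:

```python
# Hypothetical sketch of the abstraction described above: each chiplet only
# speaks a common network interface, so its designers never need to see the
# internals of any other chiplet or of the active interposer itself.

class Chiplet:
    """A node on the interposer network; its internals stay private."""
    def __init__(self, name):
        self.name = name
        self.inbox = []

    def receive(self, message):
        self.inbox.append(message)

class InterposerNetwork:
    """The active interposer: routes messages between registered nodes."""
    def __init__(self):
        self.nodes = {}

    def attach(self, chiplet):
        self.nodes[chiplet.name] = chiplet

    def send(self, src, dst, payload):
        # From src's point of view, dst is just "one node on the network".
        self.nodes[dst].receive((src, payload))

net = InterposerNetwork()
for name in ["cpu0", "gpu0", "gpu1", "hbm0"]:
    net.attach(Chiplet(name))

net.send("cpu0", "gpu0", "kernel_launch")
net.send("gpu0", "hbm0", "read_request")
print(net.nodes["gpu0"].inbox)  # [('cpu0', 'kernel_launch')]
```

    The point is the interface boundary: a CPU chiplet team only codes against `send`/`receive`, never against another team's internals.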
     
  2. stadisticado

    stadisticado n00b

    Messages:
    1
    Joined:
    Nov 16, 2010
    This isn't entirely surprising. We've seen nVidia, AMD, Xilinx and ex-Altera all have to move to interposer solutions. Intel/Altera have also moved to their EMIB tech to accomplish the same thing. It's only natural that we'd eventually see the bottom silicon start to have transistors in it.

    Now, what this says about future node health (e.g. needing to make small die and stitch them together) and the thermal challenges of cooling stacked active silicon is another interesting story.
     
    Sulphademus likes this.
  3. katanaD

    katanaD [H]ard|Gawd

    Messages:
    1,987
    Joined:
    Nov 15, 2016
    I'm wondering how long it will be before we get chips interconnected via optics instead of electrical paths.
     
  4. Yep mGPU will never happen with sub processors....*coughs* /s
     
    {NG}Fidel and N4CR like this.
  5. Gweenz

    Gweenz [H]ard|Gawd

    Messages:
    1,216
    Joined:
    Dec 18, 2003
    Great now I'm hungry.
     
    gtrguy and FrgMstr like this.
  6. horskh

    horskh Limp Gawd

    Messages:
    135
    Joined:
    Jan 19, 2018
    Keeps your breath fresh while you game.
     
    Uvaman2 and Darth Kyrie like this.
  7. cjcox

    cjcox [H]ard|Gawd

    Messages:
    1,097
    Joined:
    Jun 7, 2004
    There's an old warehouse with tons of PC Jr. peripherals that got very excited for a bit, until they realized the word was "chiplets".
     
  8. JosiahBradley

    JosiahBradley [H]ard|Gawd

    Messages:
    1,721
    Joined:
    Mar 19, 2006
    Anyone remember when AMD bought SeaMicro back in 2012? They've been dreaming this up, albeit using different tech, for a while now. Remember, the future is fusion.
     
    N4CR likes this.
  9. pcgeekesq

    pcgeekesq [H]ard|Gawd

    Messages:
    1,403
    Joined:
    Apr 23, 2012
    I knew people working on a version of Multi-Chip Module (MCM) tech at GE's Corporate R&D center back in ... let me see ... 1989 I believe.
     
    viscountalpha and GoldenTiger like this.
    If you look at the Vega 10 block diagram, there are what AMD calls NCUs. They are mostly independent and would work well with these designs.
     
  11. Anarchist4000

    Anarchist4000 [H]ard|Gawd

    Messages:
    1,659
    Joined:
    Jun 10, 2001
    NCUs would likely be too small to be practical. Shader engines might be more representative.
     
  12. Shaders/ROPs are part of the NCU enhancements under VEGA

    "Finally, in Vega the render back-ends also known as Render Output Units or ROPs for short are now clients of the L2 cache rather than the memory pool. This implementation is especially advantageous in boosting performance of games that use deferred shading."

    As data is separated onto a cache for the ROPs, AMD has already begun segmenting the data to be independent, as I predicted. From what I have read, the GCN architecture limits the number of ROPs to 64. This is broken into 4 blocks of 16 each. Now imagine eliminating that limitation and making 8 blocks of 16 each. You're no longer limited by the die size. Where NVIDIA wins on efficiency, AMD could win on brute force.
     
    N4CR likes this.
  13. NismoTigerWVU

    NismoTigerWVU n00b

    Messages:
    1
    Joined:
    Oct 4, 2012
    I wouldn't even call that brute force. Sure, they might have to throw more square millimeters of Si at it, but it could still be more cost effective in the end due to the geometric scaling of defects with die area for monolithic designs.
     
    N4CR likes this.
  14. Anarchist4000

    Anarchist4000 [H]ard|Gawd

    Messages:
    1,659
    Joined:
    Jun 10, 2001
    The 4 blocks of 16 are the shader engines and likely a more ideal chiplet size for manufacturing. A Vega 10-comparable chiplet would be <10mm2 if broken into NCUs. Certainly possible, but as I said, it might not be all that practical to manufacture. A SE would be <200mm2, which may be more reasonable to scale. GCN doesn't limit ROPs; it's just that the scaling hasn't been worthwhile to implement. 64 is a nice square from a tiling perspective, and chiplets essentially work around that limit already.

    Brute force aside, more silicon at lower clocks will be more efficient and quite possibly cheaper. The only catch is the serialized front end, which currently requires higher clocks. There is already talk of designing APIs around that bottleneck, and a dedicated high-clocked chiplet for just the front end wouldn't be unreasonable either.
     
    N4CR likes this.
  15. Zarathustra[H]

    Zarathustra[H] Official Forum Curmudgeon

    Messages:
    27,727
    Joined:
    Oct 29, 2000
    I'm going to make a pseudo-educated guess and say probably never. Whenever you go from an electrical signal to an optical one, you need a transducer, and transducers by their very nature add latency. These interconnects would need to be EXTREMELY low latency. I'd imagine this will eventually allow for stringing together multiple CPUs or GPUs, but they would probably have to be pretty close together on a board.
     
  16. Azphira

    Azphira [H]ard|Gawd

    Messages:
    1,821
    Joined:
    Aug 18, 2003
  17. KazeoHin

    KazeoHin [H]ardness Supreme

    Messages:
    7,783
    Joined:
    Sep 7, 2011

    At the scale of a piece of silicon, light and electrical signals travel at virtually the same speed. It wouldn't make much of a difference.
     
  18. Evildead666

    Evildead666 n00b

    Messages:
    48
    Joined:
    Jul 7, 2011
    I agree that light will probably never be used to communicate internally within a chip, although it might get there some day.

    I think Intel's been working on this type of tech for quite some time.
    I think it's mainly for chip interconnects, but with light, I suppose you could have it travel quite some distance — more than an electrical cable would allow in that power envelope — at higher bandwidth.
    Imagine trying to connect thousands of CPUs together, with maybe 8-32 (or more) of these per chip. Sort of a replacement for InfiniBand, closer to the CPU itself: rather than board to board, more CPU to CPU directly (or SoC to SoC).

    Thunderbolt was supposed to ship with an integrated optical cable, IIRC (Light Peak), but it was left out because it was either not ready for consumer use or not needed yet.

    Isn't optical also impervious to EM fields? (I am not an electrical/electronic engineer.)
     
    About 80% of the speed of light through silicon, if I remember my EE classes correctly. This is provided there is an adequate conductive path (wide enough for the current) and no material changes or physical imperfections in the material.
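    Taking that ~80%-of-c figure at face value, the one-way propagation delay across an interposer is tiny. The 30 mm span below is a made-up illustrative distance, not a real package dimension:

```python
# Back-of-the-envelope propagation delay at ~0.8c (the figure quoted above).

C = 299_792_458             # speed of light in vacuum, m/s
signal_speed = 0.8 * C      # ~80% of c, per the post above (assumed)
distance = 0.030            # 30 mm across a large interposer (hypothetical)

delay_s = distance / signal_speed
print(f"one-way delay: {delay_s * 1e12:.0f} ps")  # ~125 ps
```

    Even at that speed, crossing the interposer costs on the order of a hundred picoseconds — small, but not free at multi-GHz clocks.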
     
    KazeoHin likes this.
    Yes and no. Light is just the visible part of the EM band. However, photons as we see them do not react to magnetic fields. Ionizing radiation can be affected by magnetic fields once it interacts with matter; photonic energy is absorbed by matter, and matter is subject to magnetic fields. Light photons do, however, exhibit characteristics of mass (both a particle and a wave) and are subject to other influences like gravity, or being slowed down as they pass through materials.

    Light-based interconnects have to be fixed in place, which is likely why the idea was dropped from the original Thunderbolt. The glass used in cables is fragile and breaks easily when bent too tightly; in consumer hands it would be a disaster. And whoever said that a transducer is necessary is partially right — it introduces latency. However, there are molecular-level transducers that make such delays negligible, though they are subject to a great amount of noise given their size.

    You also have to weigh latency against bandwidth. Latency is the delay between your TX/RX request and actually getting something useful back. Bandwidth is how fast the data moves once it's flowing. Now, if you get 400% higher bandwidth for 5% more latency, that works great for huge datasets. For really small pieces of data, it doesn't help you so much.
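    That tradeoff is easy to put in numbers: total transfer time is roughly fixed latency plus size divided by bandwidth. The link parameters below are made up purely to match the 400%-bandwidth-for-5%-latency example above:

```python
# Sketch of the latency-vs-bandwidth tradeoff: time = latency + size/bandwidth.

def transfer_time(size_bytes, latency_s, bandwidth_bps):
    # fixed link latency plus time to serialize the payload onto the wire
    return latency_s + size_bytes * 8 / bandwidth_bps

# Hypothetical links: optical gets 4x the bandwidth for 5% more latency.
ELEC_LAT, ELEC_BW = 10e-9, 100e9     # 10 ns, 100 Gb/s (assumed)
OPT_LAT, OPT_BW = 10.5e-9, 400e9     # 10.5 ns, 400 Gb/s (assumed)

for size in (8, 1_000_000):          # a tiny message vs. a 1 MB chunk
    e = transfer_time(size, ELEC_LAT, ELEC_BW)
    o = transfer_time(size, OPT_LAT, OPT_BW)
    print(f"{size:>9} B: electrical {e*1e9:10.2f} ns, optical {o*1e9:10.2f} ns")
```

    With these numbers the big transfer finishes roughly 4x faster on the optical link, while for the 8-byte message the latency term dominates and the extra bandwidth buys essentially nothing.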

    That's where a lot of speed improvements for memory have come from. They ramp up latency to get more bandwidth — it's a design trade-off. DDR3 -> DDR4 did this, and at similar clocks DDR3 was always faster. But DDR4 eventually won because it could scale to higher speeds (bandwidth).

    That is your worthless trivia of the day.
     
    Last edited by a moderator: Jun 26, 2018
    N4CR likes this.
  21. Evildead666

    Evildead666 n00b

    Messages:
    48
    Joined:
    Jul 7, 2011
    Cheers, thanks for the info. :)
     
  22. NoOther

    NoOther [H]ardness Supreme

    Messages:
    6,431
    Joined:
    May 14, 2008
    I believe you are referring to Intel's QPI link. SGI, now owned by HP, created NUMAlink technology that used fiber optics between systems to link up to 32 processors and memory.
     
  23. N4CR

    N4CR 2[H]4U

    Messages:
    3,623
    Joined:
    Oct 17, 2011
    Lasers can do it. We just need to find a way to make atom-sized lasers.
    Some larger diode lasers have rise times on the order of 10 ns (at power levels you'd never use in a CPU); this could likely be improved for low-power usage.
     
  24. Zarathustra[H]

    Zarathustra[H] Official Forum Curmudgeon

    Messages:
    27,727
    Joined:
    Oct 29, 2000
    Well, regardless of the size, you still need to transform energy from one form to another, and that's inherently going to involve latency.
     
    Chimpee likes this.
  25. LuxTerra

    LuxTerra Limp Gawd

    Messages:
    211
    Joined:
    Jul 3, 2017
    The silicon photonics prototypes for IC interconnects work a bit differently than the fiber / vertical-cavity lasers used for long-distance communications. They're mostly based on WDM waveguides and tuned silicon that only "receives" one frequency. I'm not sure where in the TRL stack these are at the moment — I just haven't kept up with it lately — but the concept's been around and tested for years (as with essentially every technology that makes it to mass production; TRLs exist for very good reasons).
     