Navi 'nexgen' memory is SSG? Roadmap hints..

N4CR

Roadmap-640x360.jpg


With the SSG release today, is this what Navi will bring to dGPU?

http://semiaccurate.com/2016/07/25/amd-puts-massive-ssds-gpus-calls-ssg/

To me, this seems to be the most likely as we have heard nothing else about a next gen memory from AMD side.

SSG brings huge scalability and 'next gen' type jump in memory capabilities. It could potentially even make vega and similar traditional cards obsolete in some scenarios.
 
I don't think this will be next gen memory as it's too slow, it seems less for consumers and more for people producing content like video or rendering scenes that are quite data intensive or some other type of ultra large data sets a gpu can work on.


I still have no idea what amd means by next gen memory, but I suspect it could be something related to the hp machine, where they sought to decouple the processing from the memory and other components.




To me scalability might mean amd producing smaller die gpus and combining them into a larger effective gpu, the memory limitation and distance requirements could be alleviated with photonic connectors.


Now in that case they were also talking about memristors and universal memory... but that might be too radical for current software. If that was live, the ssg style graphics could have much more speed and open up how much of the graphics workloads are operated on. All the copying data from one place to another would evaporate, I am sure there are many ways this could improve performance for gaming, but I know almost nothing about it so I'll shut up now.

 
SSG isn't for the common user, it is still incredibly slow & inefficient for regular use vs the cost.
 
SSG isn't for the common user, it is still incredibly slow & inefficient for regular use vs the cost.

Agreed. It's for Hollywood VFX rendering, where 64GB of data per frame means you've not only overflowed the GPU's RAM, but your multi-card render box has overflowed system RAM as well; moving the SSDs onto the cards themselves increases the amount of SSD bandwidth available to each card.
 
Roadmap-640x360.jpg


With the SSG release today, is this what Navi will bring to dGPU?

AMD puts massive SSDs on GPUs and calls it SSG - SemiAccurate

To me, this seems to be the most likely as we have heard nothing else about a next gen memory from AMD side.

SSG brings huge scalability and 'next gen' type jump in memory capabilities. It could potentially even make vega and similar traditional cards obsolete in some scenarios.
Point me to anything from AMD that actually says "next gen." The use of "Nexgen" in this slide has to be purposeful. I think with the unveiling of SSG that we're closer to confirming that they're actually working with the company NexGen, who has been making PCI-E flash memory for a couple years now.

NexGen :: Architecture

Putting it in a high margin professional card now makes sense, as they can amortize and prove the technology in that space and move it down to consumers in a couple years.
 
Roadmap-640x360.jpg


With the SSG release today, is this what Navi will bring to dGPU?

AMD puts massive SSDs on GPUs and calls it SSG - SemiAccurate

To me, this seems to be the most likely as we have heard nothing else about a next gen memory from AMD side.

SSG brings huge scalability and 'next gen' type jump in memory capabilities. It could potentially even make vega and similar traditional cards obsolete in some scenarios.
No, not at all.
Why would we even think it's SSD-based? That SSG GPU was a product for a niche segment of the market: big data, oil and gas companies, and video editing.
 
Isn't Nexgen actually a memory manufacturer, rather than a typo for "next-gen"?

Yes lol nexgen is the manufacturer. Tiredposting FTL... already read about them a while back and completely forgot it.

Either way, Nexgen is a vendor of large capacity SSD style storage and AMD is supposed to use it for Navi according to roadmaps... I bet they are making some consumer version that is cheaper. I'd assume it's still using HBM/GDDR etc in some flavour for local processing and intermediate data storage.

The persistent worlds, huge texture storage, and lack of texture streaming, plus the reduction in PCIe latency/utilisation, are things we should not overlook.

Just like when GCN came out, no one could see the point for consumer use.. we know how that worked out now. AMD is playing the long term game. It's the quiet guy in the corner who's plotting to steal your missus!
 
The SSD storage makes more sense from the standpoint of using Intel's 3D XPoint. The Nexgen stuff would be interesting as a method to bypass PCIE limitations. Put 32/64GB on the card in addition to a limited amount of video memory acting more like a cache than traditional video memory.
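To illustrate the "video memory acting more like a cache" idea, here is a toy Python sketch. Everything in it is hypothetical: the class name, the byte-string "assets", and the choice of least-recently-used eviction are assumptions for illustration only, not anything AMD has described.

```python
from collections import OrderedDict


class VramCache:
    """Toy model: a small, fast pool (VRAM) in front of a large, slow pool (on-card flash)."""

    def __init__(self, capacity_bytes, flash):
        self.capacity = capacity_bytes
        self.flash = flash              # dict: asset name -> bytes, stands in for on-card flash
        self.resident = OrderedDict()   # assets currently held in "VRAM", oldest first
        self.used = 0
        self.misses = 0

    def fetch(self, name):
        # Hit: the asset is already resident, so just mark it as recently used.
        if name in self.resident:
            self.resident.move_to_end(name)
            return self.resident[name]

        # Miss: read from flash, evicting least-recently-used assets to make room.
        # Assumes every single asset fits within the capacity.
        self.misses += 1
        data = self.flash[name]
        while self.resident and self.used + len(data) > self.capacity:
            _, evicted = self.resident.popitem(last=False)
            self.used -= len(evicted)
        self.resident[name] = data
        self.used += len(data)
        return data


# Hypothetical usage: three 4-byte "textures" behind an 8-byte "VRAM".
flash = {"a": b"aaaa", "b": b"bbbb", "c": b"cccc"}
cache = VramCache(8, flash)
for name in ["a", "b", "a", "c"]:
    cache.fetch(name)
print(list(cache.resident), cache.misses)
```

The point of the sketch is only that the working set lives in the fast pool while the full data set stays on the card, so a miss costs a flash read rather than a trip over PCIe.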
 
Just like when GCN came out, no one could see the point for consumer use.. we know how that worked out now. AMD is playing the long term game. It's the quiet guy in the corner who's plotting to steal your missus!

As the quiet guy in the corner, I can assure you that never works. AMD's financials over the last few years support my claim too.
 
I don't think this will be next gen memory as it's too slow, it seems less for consumers and more for people producing content like video or rendering scenes that are quite data intensive or some other type of ultra large data sets a gpu can work on.


I still have no idea what amd means by next gen memory, but I suspect it could be something related to the hp machine, where they sought to decouple the processing from the memory and other components.




To me scalability might mean amd producing smaller die gpus and combining them into a larger effective gpu, the memory limitation and distance requirements could be alleviated with photonic connectors.


Now in that case they were also talking about memristors and universal memory... but that might be too radical for current software. If that was live, the ssg style graphics could have much more speed and open up how much of the graphics workloads are operated on. All the copying data from one place to another would evaporate, I am sure there are many ways this could improve performance for gaming, but I know almost nothing about it so I'll shut up now.



The technology in HP's "The Machine" does not fit the glove... it's about getting a fiber backend and dynamically scalable servers. I take it you have had no briefs on The Machine/Synergy?

(I manage 5,000+ physical HP servers in two datacenters.)
 
Yes lol nexgen is the manufacturer. Tiredposting FTL... already read about them a while back and completely forgot it.

Either way, Nexgen is a vendor of large capacity SSD style storage and AMD is supposed to use it for Navi according to roadmaps... I bet they are making some consumer version that is cheaper. I'd assume it's still using HBM/GDDR etc in some flavour for local processing and intermediate data storage.

The persistent worlds, huge texture storage, and lack of texture streaming, plus the reduction in PCIe latency/utilisation, are things we should not overlook.

Just like when GCN came out, no one could see the point for consumer use.. we know how that worked out now. AMD is playing the long term game. It's the quiet guy in the corner who's plotting to steal your missus!
AMD isn't the quiet guy. AMD develops a technology and then tries to force everyone to adopt their approach by playing the nice guy with open source software that is highly biased to their own hardware. The only reason GCN is still relevant today is because of how influential they were in the development of Vulkan and DX12. I can't blame them, though. I'm sure they want to avoid another Bulldozer incident.
 
The technology in HP's "The Machine" does not fit the glove... it's about getting a fiber backend and dynamically scalable servers. I take it you have had no briefs on The Machine/Synergy?

(I manage 5,000+ physical HP servers in two datacenters.)

I figured it was a long shot, but... HP seems to be focusing on the most boring aspect of the technology: lower-powered servers. What about games and real-time rendering?

Right now our GPUs have their own separate pools of memory in dual-GPU setups. Why can't two GPUs have access to the same pool of memory? Couldn't these optical connections allow that without the distance and latency penalties? Wouldn't that be a boon to multi-GPU? And what about inter-GPU communication? I don't know if this stuff would allow multiple GPUs to be "connected" in such a way that they function as a single mega GPU, but if it could, there would be fewer constraints on multiple smaller dies acting as one. You could have 4x 250 mm² dies all connected to the same pool of 8-12 GB of video memory.

And later on, why even bother with that? Why can't we have a combination of NAND flash and DRAM-type storage/memory where entire games can be stored without the need to load anything into system memory? Think of the latencies we could free up there; think of the boosts in performance. I don't know if that memristor stuff is up to the task, but that should be one of the end goals.
 
I knew I remembered NexGen from somewhere. I actually owned one of these..

NexGenNX586PF100-EEL.jpg


They were bought by AMD in 1996, IIRC. Interesting how it all comes back around.
 
I figured it was a long shot, but... HP seems to be focusing on the most boring aspect of the technology: lower-powered servers. What about games and real-time rendering?

Right now our GPUs have their own separate pools of memory in dual-GPU setups. Why can't two GPUs have access to the same pool of memory? Couldn't these optical connections allow that without the distance and latency penalties? Wouldn't that be a boon to multi-GPU? And what about inter-GPU communication? I don't know if this stuff would allow multiple GPUs to be "connected" in such a way that they function as a single mega GPU, but if it could, there would be fewer constraints on multiple smaller dies acting as one. You could have 4x 250 mm² dies all connected to the same pool of 8-12 GB of video memory.

And later on, why even bother with that? Why can't we have a combination of NAND flash and DRAM-type storage/memory where entire games can be stored without the need to load anything into system memory? Think of the latencies we could free up there; think of the boosts in performance. I don't know if that memristor stuff is up to the task, but that should be one of the end goals.

Memristors end up somewhere between NAND and DDR4 in terms of latency, and from what I can see, they (HPE) want to "merge" memory and storage and just have one "storage" type.

Makes a lot of sense for a datacenter.

They have never talked about GPUs or any "special sauce" in that regard, and it is a rather large change from Gen9 (current HP servers) to Gen10 (new gen: fiber backplane, dynamically scalable, OneView replaced by Synergy)... big things. I doubt they care much for AMD GPUs, to be honest.
Neither HP nor Cisco has any AMD chips in the hardware we use. (You cannot order a server with an AMD CPU for our datacenters. We run a business, not a charity, and an AMD CPU is always the lesser choice for our workloads... AMD is a non-player in the server business in our eyes.)
 
I don't think this will be next gen memory as it's too slow, it seems less for consumers and more for people producing content like video or rendering scenes that are quite data intensive or some other type of ultra large data sets a gpu can work on.


I still have no idea what amd means by next gen memory, but I suspect it could be something related to the hp machine, where they sought to decouple the processing from the memory and other components.




To me scalability might mean amd producing smaller die gpus and combining them into a larger effective gpu, the memory limitation and distance requirements could be alleviated with photonic connectors.


Now in that case they were also talking about memristors and universal memory... but that might be too radical for current software. If that was live, the ssg style graphics could have much more speed and open up how much of the graphics workloads are operated on. All the copying data from one place to another would evaporate, I am sure there are many ways this could improve performance for gaming, but I know almost nothing about it so I'll shut up now.




What you're describing sounds a lot like what IBM does on POWER8 systems. There's local memory and cache, but the L4 cache and main RAM are detached and controlled by a device known as a Centaur. The Centaur ties multiple nodes together, allowing them all equal access to NUMA memory. I wonder if we will see something like this coming to GPUs, with clusters of compute units being tied together in a similar way.
 
To my understanding it's still going over pci-e, so the bandwidth is exactly the same as dma access from the gpu. Correct me if I'm wrong.
 
To my understanding it's still going over pci-e, so the bandwidth is exactly the same as dma access from the gpu. Correct me if I'm wrong.

I don't know if we are on the same page.

But here is a little info from AnandTech.

The performance differential was actually more than I expected; reading a file from the SSG SSD array was over 4GB/sec, while reading that same file from the system SSD was only averaging under 900MB/sec, which is lower than what we know 950 Pro can do in sequential reads. After putting some thought into it, I think AMD has hit upon the fact that most M.2 slots on motherboards are routed through the system chipset rather than being directly attached to the CPU. This not only adds another hop of latency, but it means crossing the relatively narrow DMI 3.0 (~PCIe 3.0 x4) link that is shared with everything else attached to the chipset.
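For a rough sense of the numbers in that quote, here is a back-of-the-envelope sketch in Python. It assumes the standard PCIe 3.0 figures (8 GT/s per lane, 128b/130b encoding) and takes the quote's "DMI 3.0 is roughly PCIe 3.0 x4" at face value. The function name is made up for illustration, and real-world throughput will be lower because of protocol overhead.

```python
def pcie3_bandwidth_gb_per_s(lanes):
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s (decimal gigabytes)."""
    transfers_per_second = 8e9       # 8 GT/s per lane
    encoding_efficiency = 128 / 130  # 128b/130b line encoding
    bits_per_second = transfers_per_second * encoding_efficiency * lanes
    return bits_per_second / 8 / 1e9


dmi3_like_link = pcie3_bandwidth_gb_per_s(4)   # chipset uplink, assumed equal to a x4 link
full_x16_slot = pcie3_bandwidth_gb_per_s(16)   # a typical GPU slot

print(f"x4 link:  {dmi3_like_link:.2f} GB/s")
print(f"x16 slot: {full_x16_slot:.2f} GB/s")
```

Under these assumptions, a chipset-attached M.2 drive sits behind a shared link that tops out just under 4 GB/s, while a x16 slot has about four times that to work with.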
 
I don't know if we are on the same page.

But here is a little info from AnandTech.

The performance differential was actually more than I expected; reading a file from the SSG SSD array was over 4GB/sec, while reading that same file from the system SSD was only averaging under 900MB/sec, which is lower than what we know 950 Pro can do in sequential reads. After putting some thought into it, I think AMD has hit upon the fact that most M.2 slots on motherboards are routed through the system chipset rather than being directly attached to the CPU. This not only adds another hop of latency, but it means crossing the relatively narrow DMI 3.0 (~PCIe 3.0 x4) link that is shared with everything else attached to the chipset.

Yes I understand it's fast, but can the same exact thing not be done using an equally fast storage device using another pcie slot?
 
Yes I understand it's fast, but can the same exact thing not be done using an equally fast storage device using another pcie slot?

I am sure that if it could, SSD makers would be all over it for the media creation companies. Looks like this is designed especially for people big on data. Seems that way, since it's a controlled launch.
 
I would think SSD, HBM2 and multiple GPUs on a single interposer for some serious performance. Maybe HPC initially, but Nvidia may have yet another competitor springing up in the HPC market.
 
I would think SSD, HBM2 and multiple GPUs on a single interposer for some serious performance. Maybe HPC initially, but Nvidia may have yet another competitor springing up in the HPC market.

I think Navi might bring that, because they talk about scalability. I hope we soon see a day where they can finally kill the multi-GPU issue by stacking multiple GPUs on a single interposer so that it acts like one in hardware. That would be truly revolutionary for once, and something that I think we have waited so long for. Still wishing and dreaming lol
 
So for the RX 490 we could be seeing little Vega with two 4GB stacks = 8GB, with either 400 GB/s or 500 GB/s depending upon the HBM2 speed selected. Which raises the question: is there only going to be one SKU for the RX 490? No X? 495? In other words, the 490 would compete against the 1070 and the X or 95 against the 1080? Since yields on little Vega are not going to be perfect, I expect at least two SKUs and probably more as time goes on.
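The 400 vs 500 GB/s figures line up with simple HBM2 arithmetic. A minimal sketch, assuming the published HBM2 interface width of 1024 bits per stack and per-pin rates of 1.6 and 2.0 Gbps; the function name is hypothetical, and which speed AMD would actually pick is unknown.

```python
HBM2_BUS_WIDTH_BITS = 1024  # interface width per HBM2 stack


def hbm2_bandwidth_gb_per_s(stacks, pin_rate_gbps):
    """Peak bandwidth in GB/s for a given stack count and per-pin data rate (Gbps)."""
    return stacks * HBM2_BUS_WIDTH_BITS * pin_rate_gbps / 8


# Two stacks ("little Vega" in the post) at the two assumed pin rates.
for rate in (1.6, 2.0):
    print(f"2 stacks @ {rate} Gbps: {hbm2_bandwidth_gb_per_s(2, rate):.1f} GB/s")

# Four stacks ("large Vega" in the post) at the faster rate.
print(f"4 stacks @ 2.0 Gbps: {hbm2_bandwidth_gb_per_s(4, 2.0):.1f} GB/s")
```

So "400 or 500" is shorthand for roughly 410 and 512 GB/s with two stacks, and a four-stack part would double that.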

Large Vega with four stacks, 16GB+: HPC? High-end FirePros and a high-end Fury? Since AMD established a pro-consumer brand with the Radeon Pro, will that continue next generation? I would think so, and it will probably be very pricey, in the range of the Titan; hopefully cheaper with better performance to establish itself.

Now what I would probably buy rather quickly is a Nano size card with 8GB plus and little Vega unless big Vega is also available - that would be uber :smuggrin:

For Nvidia users, that will probably force the 1080 Ti out the door as well, which would also be cool, except that without HBM2 memory it would be yet another boring card for me.
 
AMD isn't the quiet guy. AMD develops a technology and then tries to force everyone to adopt their approach by playing the nice guy with open source software that is highly biased to their own hardware. The only reason GCN is still relevant today is because of how influential they were in the development of Vulkan and DX12. I can't blame them, though. I'm sure they want to avoid another Bulldozer incident.

I know this is an old post, but you say that like it's a bad thing. I would call that shrewd, smart business.
 
Now what I would probably buy rather quickly is a Nano size card with 8GB plus and little Vega unless big Vega is also available - that would be uber :smuggrin:

Big Vega with 8GB HBM2 in an ITX-friendly Nano 'format' would be awesome...! But I will definitely accept a little Vega / 8GB HBM2 Nano as well...!
 
Big Vega with 8GB HBM2 in an ITX-friendly Nano 'format' would be awesome...! But I will definitely accept a little Vega / 8GB HBM2 Nano as well...!
Yes, except Vega 11 is nowhere to be seen, at least not yet, and the Vega 10 versions are still unknown. I had to laugh a little reading what I put down. Anyway, I am still hoping for a Nano replacement, but now the i7 6700K looks a little anemic in the scheme of things. Once Threadripper comes out, and if it's priced well, the i7 6700K is going to look like a low-end i3 CPU :LOL:. I guess I could always sell the Crosshair VI Hero and buy a mITX AM4 board, plus an X399 board with a 16-core CPU . . . Very fun times here!
 