IBM and NVidia partner, and BAM!!!

Lakados

Supreme [H]ardness
Joined
Feb 3, 2014
Messages
5,780
https://amp.hothardware.com/news/bam-nvidia-ibm-connect-gpus-directly-to-ssds

So AMD tried their hand at this with SSG, followed by Microsoft with Direct Storage. Both still use the CPU as an intermediary to pass data between storage and the GPU; this lets the GPU fetch the data directly.

BAM is a software solution that cuts the CPU out of the equation for fetching data from RAM or storage. Not a big deal for desktops, but for workhorse servers with lots of GPUs it's a big time saver.
 

Eymar

Limp Gawd
Joined
Sep 15, 2005
Messages
302
[Attached image: 1647559604369.png]


Hmm, Mvidia? Intel? Article says IBM and Nvidia :) BAM is correct though (y)
 

cdabc123

2[H]4U
Joined
Jun 21, 2016
Messages
3,919
Makes me think of a product heading to the supercomputing front.

How much benefit do cpus really have on the hpc front?
 

xx0xx

Gawd
Joined
Oct 20, 2005
Messages
738
cdabc123 said: "Makes me think of a product heading to the supercomputing front. How much benefit do cpus really have on the hpc front?"

Depends on the application and workload, really. Certain HPC environments use CPUs for computation and GPUs only for certain types of jobs.
 

xx0xx

Gawd
Joined
Oct 20, 2005
Messages
738
Also, I thought MSFT's Direct Storage also avoided routing things through the CPU? Maybe I'm mistaken
 

DanNeely

Supreme [H]ardness
Joined
Aug 26, 2005
Messages
4,253
cdabc123 said: "Makes me think of a product heading to the supercomputing front. How much benefit do cpus really have on the hpc front?"

It totally depends on the algorithm being used. GPUs are great as accelerators for problems that only need relatively small amounts of data and can fit the entire working set into the caches/etc. on the GPU die itself. They're also great for problems where your memory access pattern is just sequentially streaming data in and out. But if you've got a highly random memory access pattern, GDDR, which is optimized for streaming large blocks of data in/out, will fall flat on its face, and a CPU will end up working about as well or even better because the zillions of GPU cores are paralyzed waiting for data.

Edit: I don't know enough about HBM to know if it behaves more like DDR or GDDR in terms of handling random access without completely face-planting. (Sequential access is much better than random with DDR too, but its minimum chunk sizes are small because a lot of real-world code is random access. In contrast, GDDR was optimized to stream textures/etc. to the GPU, where you've got large blocks of sequential data being handled as one.)
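
To put rough numbers on the chunk-size point above, here is a toy Python model (not a benchmark). It assumes memory hands back fixed-size chunks and that a new fetch happens whenever an access leaves the chunk that was just fetched. The 64-byte and 256-byte chunk sizes and the helper names are made up for illustration; they are not real DDR or GDDR figures.

```python
import random

ELEMENT_BYTES = 4  # assumed size of each value the program reads


def count_fetches(addresses, chunk_bytes):
    """Count chunk fetches, assuming only the most recent chunk is kept around."""
    fetches = 0
    last_chunk = None
    for addr in addresses:
        chunk = addr // chunk_bytes
        if chunk != last_chunk:
            fetches += 1
            last_chunk = chunk
    return fetches


def efficiency(addresses, chunk_bytes):
    """Fraction of the bytes moved that the program actually asked for."""
    useful = len(addresses) * ELEMENT_BYTES
    moved = count_fetches(addresses, chunk_bytes) * chunk_bytes
    return useful / moved


# 1024 reads walking straight through memory.
sequential = [i * ELEMENT_BYTES for i in range(1024)]

# 1024 reads scattered over a 4 MiB region (fixed seed so runs are repeatable).
rng = random.Random(0)
scattered = [i * ELEMENT_BYTES for i in rng.sample(range(1 << 20), 1024)]

for chunk_bytes in (64, 256):  # hypothetical small and large chunk sizes
    print(
        f"chunk={chunk_bytes:>3}B  "
        f"sequential={efficiency(sequential, chunk_bytes):.3f}  "
        f"scattered={efficiency(scattered, chunk_bytes):.3f}"
    )
```

In this model the sequential walk uses every byte of every chunk at either size, while the scattered 4-byte reads throw most of each chunk away, and more of it as the chunk gets bigger.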
 

Lakados

Supreme [H]ardness
Joined
Feb 3, 2014
Messages
5,780
xx0xx said: "Also, I thought MSFT's Direct Storage also avoided routing things through the CPU? Maybe I'm mistaken"

Direct Storage still uses the CPU for moving and authorizing access to the data. What it does is remove the need for the CPU to decompress the textures first, and replace the standard storage APIs with ones built explicitly for NVMe (NVM Express) controllers, allowing for much faster data transport.

So you're moving less raw data at a faster pace, then decompressing it much faster. But the CPU is still involved in all the access permissions and data transport.
 

LukeTbk

2[H]4U
Joined
Sep 10, 2020
Messages
2,349
xx0xx said: "Also, I thought MSFT's Direct Storage also avoided routing things through the CPU? Maybe I'm mistaken"

Regardless of how much the two differ, the large datasets running on servers like these (image: getty.servers.jpg) are probably not running Windows 11.

From my very limited understanding, Direct Storage has more of a video game scenario in mind: you read a small number (ideally just one "zip") of very large files from the drive to the GPU, and you limit the CPU workload by making the decompression of the compressed file happen somewhere else; it is GPU-accelerated asset decompression. It does not have to worry about the CPU handling the memory addresses and so on, because that would be very little work.

BAM sets up a memory space (RAM plus drives) that the GPU speaks to directly, sometimes accessing a giant number of small objects, without creating a workload for the CPU to handle the memory addresses. I imagine that on a regular PC that would make little sense, but if you have 8 A100 GPUs doing work on 2TB of RAM or a RAID of high-speed drives (40gb/s), it starts to make a difference.
 

OutOfPhase

Supreme [H]ardness
Joined
May 11, 2005
Messages
5,217
DanNeely said: "It totally depends on the algorithm being used. GPUs are great as accelerators for problems that only need relatively small amounts of data and can fit the entire working set into the caches/etc. on the GPU die itself. They're also great for problems where your memory access pattern is just sequentially streaming data in and out. But if you've got a highly random memory access pattern, GDDR, which is optimized for streaming large blocks of data in/out, will fall flat on its face, and a CPU will end up working about as well or even better because the zillions of GPU cores are paralyzed waiting for data."

Very correct.

We're in an interesting and complex time where we have massive computation at the ready - if we can just ask the question in a way that allows massively parallel hardware to answer it. Translating a user story to ideal computation is non-trivial in most cases.

Also - the job of translating the simple statement to the complex solution is super fun. :)
 

toast0

[H]ard|Gawd
Joined
Jan 26, 2010
Messages
1,811
xx0xx said: "Also, I thought MSFT's Direct Storage also avoided routing things through the CPU? Maybe I'm mistaken"

I read some of the paper last night and woke up with a head cold, but it seems like with Direct Storage the host CPU issues NVMe commands, with results DMA'd to the GPU. With BAM, the host CPU sets up NVMe queues in GPU memory, and the GPU program fills the queues and pokes the NVMe device... So post-setup, the GPU and NVMe device communicate without host software interaction. It would seem like you'd need to set up a special partition or something for this though; it would be difficult to share a filesystem with a GPU program.
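
The flow described above (host sets up queues once, then the GPU fills them and pokes the device) can be sketched as a toy Python simulation. Everything here is hypothetical: the class and function names, the block contents, and the "doorbell" method are stand-ins for illustration and do not mirror the real NVMe data structures or anything from the paper.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyQueuePair:
    """Submission and completion queues; in the scheme above these sit in GPU memory."""
    submission: deque = field(default_factory=deque)
    completion: deque = field(default_factory=deque)


class ToyNvmeDevice:
    """Stand-in for the drive: maps block numbers to their contents."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.queues = None

    def attach(self, queues):
        self.queues = queues

    def ring_doorbell(self):
        """Drain every submitted read command and post its result."""
        while self.queues.submission:
            block_no = self.queues.submission.popleft()
            self.queues.completion.append((block_no, self.blocks[block_no]))


def host_setup(device):
    """The only host-side step in this model: create the queues and hand them over."""
    queues = ToyQueuePair()
    device.attach(queues)
    return queues


def gpu_read(device, queues, block_numbers):
    """GPU-side step: fill the submission queue, poke the device, collect completions."""
    queues.submission.extend(block_numbers)
    device.ring_doorbell()
    results = {}
    while queues.completion:
        block_no, data = queues.completion.popleft()
        results[block_no] = data
    return results


device = ToyNvmeDevice({0: b"weights", 1: b"indices", 2: b"features"})
queues = host_setup(device)             # host involvement ends here
data = gpu_read(device, queues, [2, 0])  # no host code runs in this call
print(data)
```

Note that the toy device addresses raw block numbers, not files, which lines up with the point about it being hard to share a filesystem with a GPU program.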
 