IBM and NVidia partner, and BAM!!!

Lakados · Mar 17, 2022

https://amp.hothardware.com/news/bam-nvidia-ibm-connect-gpus-directly-to-ssds

So AMD tried their hand at this with SSG, followed by Microsoft with Direct Storage. Both still use the CPU as an intermediary to pass data between storage and the GPU; this lets the GPU fetch the data directly.

BAM is a software solution that cuts the CPU out of the equation for fetching data from RAM or storage. Not a big deal for desktops, but for workhorse servers with lots of GPUs it's a big time saver.

Eymar · Mar 17, 2022

Hmm, Mvidia? Intel? Article says IBM and Nvidia

BAM is correct though (y)

Lakados · Mar 18, 2022

Eymar said:
Hmm, Mvidia? Intel? Article says IBM and Nvidia BAM is correct though

Yeah I’ll update that, gonna blame those typos on the drink.

cdabc123 · Mar 18, 2022

Makes me think of a product heading to the supercomputing front.

How much benefit do cpus really have on the hpc front?

xx0xx · Mar 18, 2022

cdabc123 said:
Makes me think of a product heading to the supercomputing front.

How much benefit do cpus really have on the hpc front?

Depends on the application and workload, really. Certain HPC environments use CPUs for computation and GPUs only for certain types of jobs.

xx0xx · Mar 18, 2022

Also, I thought MSFT's Direct Storage also avoided routing things through the CPU? Maybe I'm mistaken

DanNeely · Mar 18, 2022

cdabc123 said:
Makes me think of a product heading to the supercomputing front.

How much benefit do cpus really have on the hpc front?

It totally depends on the algorithm being used. GPUs are great as accelerators for problems that only need relatively small amounts of data and can fit the entire working set into the caches/etc. on the GPU die itself. They're also great for problems where your memory access pattern is just sequentially streaming data in and out. But if you've got a highly random memory access pattern, GDDR, which is optimized for streaming large blocks of data in/out, will fall flat on its face, and a CPU will end up working about as well or even better because the zillions of GPU cores are paralyzed waiting for data.

Edit: I don't know enough about HBM to know if it behaves more like DDR or GDDR in terms of handling random access without completely face-planting. (Sequential access is much better than random with DDR too, but its minimum chunk sizes are small because a lot of real-world code is random access. In contrast, GDDR was optimized to stream textures/etc. to the GPU, where you've got large blocks of sequential data being handled as one.)
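
To put a rough number on the streaming-vs-random point, here is a toy model, not a measurement of any real memory. It assumes every access pulls in a whole fixed-size burst and that back-to-back accesses to the same burst share one fetch. The 4-byte element and 64-byte burst are made-up illustration values, and `fetch_efficiency` is a hypothetical helper name.

```python
import random


def fetch_efficiency(indices, element_bytes, burst_bytes):
    """Fraction of fetched bytes that were actually requested.

    Toy model: each access fetches the whole burst holding the element,
    and consecutive accesses to the same burst are served by one fetch.
    """
    fetched_bursts = 0
    last_burst = None
    for i in indices:
        burst = (i * element_bytes) // burst_bytes
        if burst != last_burst:  # a different burst has to be fetched
            fetched_bursts += 1
            last_burst = burst
    useful_bytes = len(indices) * element_bytes
    return useful_bytes / (fetched_bursts * burst_bytes)


n = 1 << 16
sequential = list(range(n))          # streaming access pattern
scattered = sequential[:]
random.Random(0).shuffle(scattered)  # same elements, random order

# Assumed sizes for illustration only: 4-byte elements, 64-byte bursts.
seq_eff = fetch_efficiency(sequential, element_bytes=4, burst_bytes=64)
rnd_eff = fetch_efficiency(scattered, element_bytes=4, burst_bytes=64)
print(f"sequential: {seq_eff:.3f}, scattered: {rnd_eff:.3f}")
```

In this model the sequential walk uses every byte it fetches, while the shuffled walk uses only a little more than 4 of every 64 bytes. The bigger the minimum chunk, the worse the random case gets, which is the gist of the DDR vs GDDR comparison above.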

Lakados · Mar 18, 2022

xx0xx said:
Also, I thought MSFT's Direct Storage also avoided routing things through the CPU? Maybe I'm mistaken

Direct Storage still uses the CPU for moving and authorizing access to the data. What it does is remove the need for the CPU to decompress the textures first and replace the standard storage APIs with ones built explicitly for the NVMe controller, allowing for much faster data transport.

So you're moving less raw data at a faster pace, then decompressing it much faster. But the CPU is still involved in all the access permissions and data transport.

cybrnook · Mar 18, 2022

Randall Stephens · Mar 18, 2022

Lakados said:
Yeah I’ll update that, gonna blame those typos on the drink.

Breast milk with a splash of Bacardi?

Lakados · Mar 18, 2022

Randall Stephens said:
Breast milk with a splash of Bacardi?

If only. Guinness and bacon grease.

OutOfPhase · Mar 18, 2022

Randall Stephens said:
Breast milk with a splash of Bacardi?

Marry an alcoholic, save a step.

LukeTbk · Mar 18, 2022

xx0xx said:
Also, I thought MSFT's Direct Storage also avoided routing things through the CPU? Maybe I'm mistaken

Regardless of how much difference there is between the two, the systems running those large datasets are probably not running Windows 11.

From my very limited understanding, Direct Storage has more of a video game scenario in mind: you read a small number (ideally just one "zip") of very large files from the drive to the GPU, limiting the CPU workload by making the decompression of the compressed file happen somewhere else. It is GPU-accelerated asset decompression. It does not have to worry about the CPU handling the memory addresses and so on, because there would be very little of that.

BAM sets up a memory space (RAM/hard drive) that the GPU will speak to directly, sometimes accessing giant amounts of small objects, without creating a workload for the CPU to handle the memory addresses. I imagine that on a regular PC that would make little sense, but if you have 8 A100 GPUs doing work on 2TB of RAM or a RAID of high-speed drives at 40gb/s, it starts to make a difference.

OutOfPhase · Mar 18, 2022

DanNeely said:
It totally depends on the algorithm being used. GPUs are great as accelerators for problems that only need relatively small amounts of data and can fit the entire working set into the caches/etc. on the GPU die itself. They're also great for problems where your memory access pattern is just sequentially streaming data in and out. But if you've got a highly random memory access pattern, GDDR, which is optimized for streaming large blocks of data in/out, will fall flat on its face, and a CPU will end up working about as well or even better because the zillions of GPU cores are paralyzed waiting for data.

Very correct.

We're in an interesting and complex time where we have massive computation at the ready, if we can just ask the question in a way that allows massively parallel hardware to answer it. Translating a user story to ideal computation is non-trivial in most cases.

Also - the job of translating the simple statement to the complex solution is super fun.

toast0 · Mar 18, 2022

xx0xx said:
Also, I thought MSFT's Direct Storage also avoided routing things through the CPU? Maybe I'm mistaken

I read some of the paper last night and woke up with a head cold, but it seems like with Direct Storage the host CPU issues NVMe commands, with the results DMA'd to the GPU. With this, the host CPU sets up NVMe queues in GPU memory, and the GPU program fills the queues and pokes the NVMe device... So post-setup, the GPU and NVMe device communicate without host software interaction. It would seem like you'd need to set up a special partition or something for this though; it would be difficult to share a filesystem with a GPU program.
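
A minimal sketch of the "fill the queue, then poke the device" flow described above, simulated in plain Python. This is not the BAM code or a real NVMe driver: `ReadCommand`, `SubmissionQueue`, and their fields are hypothetical names, and the only detail taken from NVMe is the general shape, a circular submission queue whose producer advances a tail and then writes that tail to a doorbell the device watches.

```python
from dataclasses import dataclass, field


@dataclass
class ReadCommand:
    lba: int         # starting logical block address (illustrative)
    num_blocks: int  # number of blocks to read


@dataclass
class SubmissionQueue:
    depth: int
    entries: list = field(default_factory=list)
    head: int = 0      # next entry the device will consume
    tail: int = 0      # next free slot for the producer
    doorbell: int = 0  # last tail value the device was told about

    def __post_init__(self):
        # "Host setup" step: allocate the ring once, up front.
        self.entries = [None] * self.depth

    def is_full(self):
        # One slot stays empty so full and empty are distinguishable.
        return (self.tail + 1) % self.depth == self.head

    def submit(self, cmd):
        """Producer side (the GPU program in the post above)."""
        if self.is_full():
            return False
        self.entries[self.tail] = cmd
        self.tail = (self.tail + 1) % self.depth
        return True

    def ring_doorbell(self):
        """The 'poke': publish the new tail to the device."""
        self.doorbell = self.tail

    def device_poll(self):
        """Device side: consume everything up to the doorbell value."""
        done = []
        while self.head != self.doorbell:
            done.append(self.entries[self.head])
            self.head = (self.head + 1) % self.depth
        return done


sq = SubmissionQueue(depth=4)
accepted = [sq.submit(ReadCommand(lba=i * 8, num_blocks=8)) for i in range(4)]
before = sq.device_poll()  # doorbell not rung yet, device sees nothing
sq.ring_doorbell()
after = sq.device_poll()   # now the queued reads are picked up
print(accepted, len(before), [c.lba for c in after])
```

Once the ring exists, nothing in the submit/doorbell/poll loop needs the host, which is the "post-setup" point. The real thing also needs completion queues and DMA-able buffers, which this sketch leaves out.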

Ziran · Mar 19, 2022

Does it have blast processing ?
