Intel AVX512 up to 15x faster than NVidia for training AIs

why are these mutually exclusive? why not both? can an NN training setup be configured so that both AVX512 and GPGPU compute are used?
I want to see the host device catch fire!
Realistically, the systems would still need both. Training is just one aspect; they still have to do whatever it was they were trained to do, which is also pretty hardware-specific. But currently, the bulk of the A100s have been sold as part of their HGX systems, which bundle eight of them with a pair of Epycs and start at around $200K a pop. Ideally, then, you would want a pair of Xeons running those 8 GPUs for some ungodly hell furnace of AI badassery.
 
Only PS3.

Every other system designed since only gains a little performance from going that wide, or from the improved mask/gather instructions.

Jaguar and the Tegra X1 both feature bog-standard 128-bit vector units.
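To make the width gap concrete, here's a minimal sketch of the same scale-a-buffer loop written 4-wide (the SSE/NEON-class units in Jaguar and the Tegra X1) and 16-wide with an AVX-512 mask handling the tail. The function names are made up for illustration; it just shows what "going that wide" plus the new masking buys you.

```cpp
// Hypothetical illustration: the same loop at 128-bit and 512-bit widths.
#include <immintrin.h>
#include <cstddef>

// 128-bit path: what a Jaguar/Tegra X1-class vector unit works with (4 floats per op).
void scale_sse(float* dst, const float* src, float k, std::size_t n) {
    __m128 vk = _mm_set1_ps(k);
    for (std::size_t i = 0; i + 4 <= n; i += 4)
        _mm_storeu_ps(dst + i, _mm_mul_ps(_mm_loadu_ps(src + i), vk));
    // (a scalar tail loop would still be needed here)
}

// 512-bit path: 16 floats per op, and the mask means no scalar tail loop at all.
void scale_avx512(float* dst, const float* src, float k, std::size_t n) {
    __m512 vk = _mm512_set1_ps(k);
    for (std::size_t i = 0; i < n; i += 16) {
        __mmask16 m = (n - i >= 16) ? (__mmask16)0xFFFF
                                    : (__mmask16)((1u << (n - i)) - 1u);
        __m512 v = _mm512_maskz_loadu_ps(m, src + i);
        _mm512_mask_storeu_ps(dst + i, m, _mm512_mul_ps(v, vk));
    }
}
```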
According to that Reddit post, the Yuzu (Nintendo Switch) Emulator is already using AVX512 for some stuff. Testing with AVX512 on and off would be interesting.
 
I think we can form some consensus that AVX512 is turning out to be way more fun than many of us were led to believe it would be, which I am glad for. x86-64 is so 1999; it's about time it got something new and shiny.
 

And, if it is anything like the grand release of AVX2 for H.265 encoding, the total amount of the critical code path that can be optimized is around ten percent.

The PS3 Cell is essentially a broken GPU (and thus requires a massive amount of vector-compute optimization), so it is ripe pickings for this new system.

EDIT: see here where the additional performance improvement (AVX512 vs AVX2) is under 10%, except in a single corner case. This is closer to how most emulator code looks (emulated code is expected to be somewhere close to 50/50 vector and scalar, and perhaps 1/3 of the vector operations can be optimized).

[Attached chart: hevc avx512.png]


The whitepaper from Intel:

https://networkbuilders.intel.com/d...with-intel-advanced-vector-extensions-512.pdf

Even Intel's engineers couldn't work around the scalar limitations of non-3D-rendering code! What makes you think most game devs are going to hand-optimize game-engine or AI vector code in their next multi-platform console game? We all know how shitty auto-vectorizing compilers are!
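For what it's worth, that under-10% result is about what Amdahl's law predicts from the numbers above. Quick back-of-the-envelope check; the 2x per-op gain for the widened portion is my own assumption, not a measurement:

```cpp
// Amdahl's-law sanity check of the "under 10% overall" claim.
#include <cstdio>

int main() {
    double vector_share   = 0.50;      // assumed: ~half the emulated work is vector code
    double optimizable    = 1.0 / 3.0; // assumed: ~1/3 of that vector code benefits from AVX512
    double per_op_speedup = 2.0;       // assumed: the widened ops run twice as fast

    double accelerated = vector_share * optimizable;  // ~17% of total runtime
    double overall = 1.0 / ((1.0 - accelerated) + accelerated / per_op_speedup);
    std::printf("overall speedup: %.2fx\n", overall);  // prints ~1.09x, i.e. under 10%
    return 0;
}
```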
 
Well, I don't pretend to be an expert on programming emulators. But it seems like emulators are all about little wins, and something like AVX512 could mean more consistent frame times, etc. That Reddit page actually cites some of the benefits AVX512 has had for the Yuzu backend that translates ARM.
Anyway, I will be able to test Yuzu with AVX512 fairly soon; I have an 11600K arriving this week.
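If anyone else wants to sanity-check whether their CPU even exposes it before flipping the toggle, a runtime feature test is all it takes. Rough sketch using the GCC/Clang builtin (this is just the detection step, not how Yuzu actually dispatches):

```cpp
// Minimal check for AVX-512F, the foundation subset the other AVX512 extensions build on.
#include <cstdio>

int main() {
    __builtin_cpu_init();  // recommended before __builtin_cpu_supports
    bool wide = __builtin_cpu_supports("avx512f");
    std::printf("AVX-512F %s: a dispatcher would pick the %s path\n",
                wide ? "present" : "absent",
                wide ? "512-bit" : "128/256-bit fallback");
    return 0;
}
```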
 
Emulators are all about the small wins: a few % here, a few more there, and suddenly what was unplayable before becomes downright decent. I'm going to keep an eye on it. Based on some white papers I have seen, there are some really cool things that could be offloaded from the GPU to the CPU for a lot of titles, which would improve things overall, especially for titles that are currently GPU-bound. But it's going to be a long while before we see those trickle into games, if at all. Once DX12 (or whatever version they are on at that point) supports it, we will see it hitting the mainstream, but until then...
 

When current CPUs and software optimizations are already able to hit 60fps on Switch titles, the need for more optimization isn't as clear. Without a real performance improvement, it's unlikely that these AVX512 tweaks will get merged into an official release anytime soon.

Red Dead Redemption often drops below 30fps on a Zen 3 (so there is a lot of optimization required to get it to playable, something AVX512 will make MUCH easier).
 
If I remember correctly, the PS2 had a standalone VPU. In the hardware GSdx plugin, I believe those VPU instructions are being translated on the GPU. PCSX2 has AVX2 available for the software renderer. I wonder if additional improvements in performance and/or accuracy could be gained by better utilizing AVX and implementing AVX-512. The abstract emulation of the PS2 hardware still bugs the crap out of me.
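On the VPU point: each of the PS2's vector registers is 128 bits (4 packed floats), so one SSE op already covers one guest op one-to-one, and wider host vectors only pay off if the recompiler batches several guest registers per host instruction. A loose sketch of that idea; the layout and names are my own assumption, not how PCSX2 actually stores VU state:

```cpp
// Illustrative only: mapping PS2 VU-style 4-float ops onto host SIMD.
#include <immintrin.h>

// One guest VADD.xyzw: all four lanes of a single VU register at once.
static inline __m128 vu_vadd(__m128 a, __m128 b) {
    return _mm_add_ps(a, b);
}

// Hypothetical batched form: four independent VU adds folded into one 512-bit op,
// which only works if the emulator can gather four guest registers side by side.
static inline __m512 vu_vadd_x4(__m512 a4, __m512 b4) {
    return _mm512_add_ps(a4, b4);
}
```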
 