Apple's neural engine is exactly what they say it is. Go look at the CoreML API and all their other developer info; it's not hard to get a good picture of what the hardware does. No, it's not complicated: they are not full GPU cores... they are actually fairly stripped down, which = speed.

Apple is entirely to blame for forcing obsolescence out of spite and creating artificial market segmentation.
They don't *need* to optimize anything, they just need to stop being slimy and two-faced, but that'll never happen. When they introduced the M1 processor, they also introduced Rosetta 2 and put on a whole dog and pony show about how easy it is to cross-compile applications and have the same functionality on both architectures. Then they do an about-face with all of these new features, claiming that their buzzword "neural" engine is required.
Their "Neural" engine is probably just some GPGPU or customized ARM core with specific extensions to enhance whatever code they want to run. They pulled this same stunt back in the G3 to G4 transition with their "velocity engine", which just turned out to be vector extensions similar to SSE and had little performance implications outside highly specific workloads, which is not a thing here. We won't really know for a few years what it exactly is because they like keeping things a secret.
All these features are software bits that use the CoreML API.... Apple has started including CoreML-based features to show off what it can do, and third-party developers will start using it more as well. There really isn't a good way to translate those API calls to a CPU... could they code them to work on a GPU? Perhaps, but it would probably be drastically slower... and then Apple would run into the other obvious issue of being accused of purposely making things run like ass on Intel. Which they likely would.
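For what it's worth, here's a rough sketch of what "using CoreML bits" looks like from the app side. The model file name, the "image" input, and the tensor shape are all made up for illustration; the point is just that the app calls the same Core ML API no matter what silicon ends up doing the work.

```swift
import CoreML
import Foundation

// Minimal sketch, not a real app: the model file, the "image" input name,
// and the tensor shape are placeholders.
func runModel() throws {
    let config = MLModelConfiguration()
    // .all lets Core ML schedule work on the Neural Engine when the chip
    // has one; on hardware without it, the same call falls back to GPU/CPU.
    config.computeUnits = .all

    let modelURL = URL(fileURLWithPath: "StyleTransfer.mlmodelc")
    let model = try MLModel(contentsOf: modelURL, configuration: config)

    // Dummy input tensor; the shape depends entirely on the model.
    let image = try MLMultiArray(shape: [1, 3, 512, 512], dataType: .float32)
    let input = try MLDictionaryFeatureProvider(dictionary: ["image": image])

    let output = try model.prediction(from: input)
    print(output.featureNames)
}
```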
To put it in terms gamers can more easily understand (even though, I know, I know, the M1 is just a crap phone SoC, nothing Intel can't beat, ahem), the situation here would be like this.... Nvidia has DLSS and RT hardware on their GPUs. Some games use only a very light bit of RT, and nobody says they should code that to run on the regular shader cores (which they could), and DLSS-quality upscaling should be able to run on the shader cores as well. And it could, for sure, if Nvidia were to code those features to use a few GPU cores instead of the dedicated hardware. I mean, performance would blow.... but it can still be run on a GPU without that hardware. They also have very little financial incentive to make their new features work on old cards.
In this situation, yeah, it may seem silly that Apple doesn't have something like the 3D globe feature run on everything. As I see it, they are just writing CoreML modules to leverage their new hardware bits. Not to keep going back to Nvidia, but another example would be CUDA. If you wrote something that used CUDA.... yes, that same thing could be rewritten to use OpenCL and run on any GPU.... but it would run slower, and it would be a fair amount of work to rewrite said module to also work with OpenCL.
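Core ML actually abstracts that backend choice better than the CUDA/OpenCL case: the same model can simply be pinned to the CPU or GPU instead of the Neural Engine. A hedged sketch of what that comparison could look like (model name and input are placeholders again, and the timing is only there to show the idea, not to benchmark anything):

```swift
import CoreML
import Foundation

// Same model, different silicon: nothing stops it from running on the CPU
// or GPU, it just takes longer there. "StyleTransfer.mlmodelc" and the
// "image" input are placeholders.
func timePrediction(on units: MLComputeUnits) throws -> TimeInterval {
    let config = MLModelConfiguration()
    config.computeUnits = units   // .cpuOnly, .cpuAndGPU, or .all

    let model = try MLModel(
        contentsOf: URL(fileURLWithPath: "StyleTransfer.mlmodelc"),
        configuration: config
    )
    let image = try MLMultiArray(shape: [1, 3, 512, 512], dataType: .float32)
    let input = try MLDictionaryFeatureProvider(dictionary: ["image": image])

    let start = Date()
    _ = try model.prediction(from: input)
    return Date().timeIntervalSince(start)
}

// Roughly: .cpuOnly is the Intel-Mac case, .all is the M1-with-ANE case.
// let cpuTime = try timePrediction(on: .cpuOnly)
// let aneTime = try timePrediction(on: .all)
```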
I don't think this one really falls under planned obsolescence; the M1 actually has some hardware Intel's chips don't have... in the same way Nvidia GPUs have tensor cores and AMD's newest Radeons have RT bits.