The fact that they made a big deal about QPI direct memory access and then sat on it is likely part of the issue. The interconnect between the GPU and its memory is a lot faster than Intel's path to memory for anything past the level 1 cache. Big companies do this all the time: they act like they are on top of the world and no one can design a better widget, and then when someone does, they all scramble to replace their widget without any lead time. Intel processors are much faster at branching logic but run into issues when the data is not there.
Having an onboard integrated GPU that can draw what is on the screen means Intel's chips stay relevant at the tasks they used to be used for most. Right now most of the rendering of scene files gets kicked over to a GPU with a bunch of logic processors that do one task really fast and don't have to wait in line for the branching logic. There is only so fast you can spin the electrons through gates before they just end up as waste heat, but if you can figure out how to build a better mousetrap for solving problems...