AMD working on next-gen GPU after Navi for 2020-2021

Look at the Raven Ridge Graphic...

[Image: Raven Ridge graphics block diagram]

See that? Instruction decode -> early setup -> instruction dispatch (distributor) -> geometry pipeline/DSBR -> compute units -> ROPs... all independent blocks in the pipe. And the CPU is connected to the GPU via Infinity Fabric (one of the things I said would be used).

It has incredible performance for a simple SoC. Now imagine optimizing a Ryzen core just for graphics early setup/geometry/dispatch. That by itself would be a sub-30 W part, I'm betting (based on rough numbers). GCN's strength comes from its ability to dynamically divide resources during async compute, so the CUs work mostly independently of each other.
 


Did you see that Infinity Fabric actually increases the power consumption on this part? As the memory is overclocked, system power consumption rises by more than the RAM overclock alone would account for. And you still need to worry about the latency increase such a system will cause. There is absolutely no way to hide that much latency either, because it far exceeds the amount that can be hidden.
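
To put rough shape on the latency-hiding argument, here is a back-of-the-envelope sketch. It assumes a SIMD hides memory latency only by switching to other resident wavefronts, each of which has some ALU work to do between memory requests. The function names and every number below are illustrative assumptions, not measurements of Raven Ridge or Infinity Fabric; the only hardware-derived value is the limit of 10 resident wavefronts per GCN SIMD.

```python
def hideable_latency_cycles(resident_waves: int, alu_cycles_per_wave: int) -> int:
    """Cycles of memory latency one SIMD can cover by running other waves.

    While one wave waits on memory, each of the other resident waves can
    contribute its ALU work before it, too, has to wait.
    """
    return max(resident_waves - 1, 0) * alu_cycles_per_wave


def stall_cycles(latency_cycles: int, resident_waves: int, alu_cycles_per_wave: int) -> int:
    """Cycles the SIMD sits idle once the other waves run out of work."""
    covered = hideable_latency_cycles(resident_waves, alu_cycles_per_wave)
    return max(latency_cycles - covered, 0)


# Hypothetical numbers: 10 resident waves, 40 ALU cycles per wave between
# memory requests, and two made-up memory latencies.
print(hideable_latency_cycles(10, 40))   # latency that can be covered
print(stall_cycles(300, 10, 40))         # latency below the covered amount
print(stall_cycles(800, 10, 40))         # latency well above the covered amount
```

Under this model, any latency beyond what the other waves can cover turns directly into idle cycles, which is the situation described above. Whether a real cross-chip link lands above or below that threshold depends on figures this thread does not have.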

Async compute the way GCN does it is not that good; it's too ASIC-specific. Volta seems to be quite a bit better at it, but the hardware is considerably more complex, with the added benefit that a programmer no longer has to worry about thread dependencies.

PS: even if the blocks are independent, it doesn't matter. Data flows only in certain directions through the fixed-function parts, and until that is solved nothing will change. There is no way to map data differently because the APIs don't allow it, and right now the hardware doesn't allow it either. Primitive shaders looked like they would get around this, but something about them must have caused the issues that made AMD backtrack on its word that they would work in the background. I wouldn't be surprised if they carry a huge penalty on current hardware. There don't seem to be enough transistors invested to make a geometry pipeline programmable and performant at the same time. To me it looks like they invested almost no extra transistors in making the geometry pipeline programmable, but I'm guessing based on the transistor counts of Vega vs. Polaris.
 

Async on GCN seems to kick Pascal's butt because Pascal can't dynamically reallocate resources, since its resource tracking is nonexistent. NVIDIA even said so. But I agree there were way too many transistors to make Polaris/Vega efficient. The only option is to split the power up across separate chips, as efficiency gains are limited. That's still a hell of a tall order for GCN.
 


Pascal can, Maxwell can't ;). Where Pascal falls short of GCN is in how much it can do: GCN has much finer granularity than Pascal, 50% better or so.
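
The static vs. dynamic distinction being argued here can be put into a toy model. This is only a sketch under idealized assumptions: work is measured in abstract unit-cycles, execution units are interchangeable, and reassigning a unit costs nothing. The function names and workload numbers are hypothetical and do not describe any specific GPU; granularity, which is the point of the last post, is deliberately left out.

```python
def static_partition_time(gfx_work: float, compute_work: float,
                          total_units: int, gfx_units: int) -> float:
    """Finish time when units are split once, up front, and never moved.

    Whichever queue finishes first leaves its units idle until the other
    queue is done.
    """
    if not 0 < gfx_units < total_units:
        raise ValueError("both queues need at least one unit")
    compute_units = total_units - gfx_units
    return max(gfx_work / gfx_units, compute_work / compute_units)


def dynamic_partition_time(gfx_work: float, compute_work: float,
                           total_units: int) -> float:
    """Finish time when idle units are handed to the other queue at no cost."""
    return (gfx_work + compute_work) / total_units


# Hypothetical workload: graphics is three times heavier than compute,
# but the static split gives each queue half of the 10 units.
print(static_partition_time(600, 200, total_units=10, gfx_units=5))
print(dynamic_partition_time(600, 200, total_units=10))
```

In this model the static split only matches the dynamic case when the up-front guess happens to fit the workload exactly. Real hardware sits somewhere in between, depending on how finely and how cheaply it can move work.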
 