RX490 Speculation - Vega 10, not Polaris.

They didn't go wrong (let's assume that for a moment): if Pascal hadn't come out as soon as it did, and the 14nm node's leakage issues hadn't hit Polaris, they would have been in a much better position. They just didn't expect Pascal so soon, or nV's ability to push clocks even higher on the new node.

We always have to look at things comparatively, because otherwise we can't draw conclusions from one piece of a full data set, right?

nV and AMD have been saying for a very long time that the node can only do so much now; with clock speeds and power draw, everything has to come from the design.

AMD's lack of resources might be what is forcing their hand, but I think it's more about the way the designers are thinking and their inability to forecast what the competition is capable of.

PS: I was talking to a friend at work about this. He'd watched one of Adored's videos and was laughing while telling me what Adored claimed: "nV, once they had Maxwell and saw how good the architecture was for power consumption, had to do Pascal with a similar architecture and push Volta out some." I'm paraphrasing, since I didn't watch the video, but we both started laughing about it because chip design doesn't work that way lol. nV and AMD are fully aware of what the outcomes "should" be if everything goes right from a node and design perspective. Now, they won't know what the competition is doing, but after seeing what Maxwell 2 was capable of, AMD should have been able to understand that Pascal would do those things even better. That just takes more time and effort at the very early stages of the design.
 

I understand your argument, but let's leave Pascal out of this entirely for a minute. They (AMD) even specifically mentioned "advanced power gating" on their Polaris landing page before launch. Power efficiency was clearly targeted with this release, and we don't have (to my knowledge) any solid evidence to lay the blame either way (GloFo or AMD) for the leakage characteristics, so let's leave that out as well.

AMD uses vector units, which is cheaper in transistor cost than implementing an equal number of lanes as scalar units, and uses less area as well. The downside is overhead in control/scheduling logic. On top of this they claim to have implemented per-lane power gating (at least that's what the paper they published suggested), plus the process change, and they're still only barely matching Maxwell power efficiency, and haven't made up the geometry performance deficit... I mean, the ALU:rasterizer ratio is more or less the same comparing Pascal and Polaris (9 CUs per SE with four 16-wide ALUs per CU vs 5 SMs per GPC with 128 ALUs per SM), so it's really a question of it being a lower-performing part.
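Quick back-of-envelope check on that ALU:front-end ratio, with the lane counts from public block diagrams (treat them as approximate):

```python
# ALU lanes behind each front-end block, per the counts quoted above.
# Polaris 10: 9 CUs per shader engine, 4x SIMD16 vector units per CU.
polaris_alus_per_se = 9 * 4 * 16   # 576 lanes per rasterizer
# GP104-style Pascal: 5 SMs per GPC, 128 FP32 lanes per SM.
pascal_alus_per_gpc = 5 * 128      # 640 lanes per raster engine

print(polaris_alus_per_se, pascal_alus_per_gpc)  # 576 640
```

So the two designs hang a similar number of lanes off each rasterizer, which is the point: the gap isn't the ratio, it's the front-end throughput itself.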

Hawaii with halved power consumption (~170W, lol) + GM204 geometry performance would have been a winner, even with a 50W overhead compared to GP106. God dammit.

It totally boggles my mind that they spend who-knows-how-many transistors implementing the whole async shader logic on die, which brings with it a ~10% performance increase at most, yet the front end is left almost unchanged and continues to be an enormous bottleneck, especially as these damned GCN chips keep scaling up the TFLOP count. P10 is almost 6 TFLOPs, which is ~20% more than a (stock) 980, yet it has the geometry performance of a 960. Why?
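Sanity-checking that TFLOP comparison with the usual shaders x 2 (FMA) x clock formula; the boost clocks are typical published figures, so the result is approximate:

```python
# Peak FP32 throughput: shader count * 2 ops per FMA * clock.
def tflops(shaders, clock_mhz):
    return shaders * 2 * clock_mhz * 1e6 / 1e12

p10 = tflops(2304, 1266)      # RX 480 at its rated boost clock
gtx980 = tflops(2048, 1216)   # stock GTX 980 at its rated boost clock

print(round(p10, 2), round(gtx980, 2))        # 5.83 4.98
print(round(p10 / gtx980 - 1, 2))             # 0.17 -> roughly 20% more
```

Which is exactly the mismatch being complained about: ~17-20% more shader throughput paired with a far weaker front end.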

If AMD designed aftermarket CPU coolers:

They'd design an incredible ultra-quiet fan that can push up to 500 CFM, highest airflow in the world, etc., but the thermal conductivity of the base plate would be so bad that overall performance ends up matching a bog-standard 70 CFM fan.
 
OK, I see what you're getting at; yeah, it seems AMD failed to hit their intended targets. But does AMD's power gating really work when the GPU is pushed without frame rate locks? Basically, when the application/driver is telling it to go as fast as it can?
I think the power gating only works when the workload isn't as heavy...
 

I mean, it depends where exactly the gates fall. If it's per lane in the SIMD VUs, then it's hard to gauge when/where it will be active, but if we're talking about gating entire VUs off, then yeah.

As usual, my first thought is the rasterizer bottleneck; that's a good candidate for gating things off (in the absence of async).
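To make the per-lane point concrete, here's a toy model (purely my own assumption about how lane gating could behave, not AMD's documented scheme):

```python
# Toy model of per-lane power gating: savings scale with lanes masked off
# by divergence or partially filled wavefronts, so a fully packed workload
# leaves nothing to gate. Wavefront width of 64 matches GCN.
def gated_fraction(active_lanes, wavefront_width=64):
    """Fraction of a wavefront's lanes that could be gated off."""
    return 1 - active_lanes / wavefront_width

print(gated_fraction(64))  # 0.0 -> GPU pushed flat out, nothing to gate
print(gated_fraction(40))  # 0.375 -> divergent work leaves lanes idle
```

Which would line up with the earlier suspicion: under an uncapped, fully saturating load, per-lane gating has little to bite on.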
 
Yes, the rasterizer could be the bottleneck. One of the major power savings in Maxwell 2 is its rasterizer, but for AMD to adopt something like that, their entire front end has to change, not just get tweaks.

nV's cards are doing much less work than AMD's.......
 

I thought a big reason for that was the lack of discard logic in hardware on the GCN side
 
That is part of it too; the tiled rasterizer introduced in Maxwell really helps cut down the amount of work needed.
 

Does it actually cut down the work being done? If so, how? Inbox me :p
 
PMed. Yeah, it's a drastic difference from traditional GPU rendering. It's one of the reasons nV's mobile experience helped them out...
 
No secrets lol, just getting OT :).

Pretty much tile-based rendering: you get savings on many fronts: bandwidth, memory usage, register space, primitive discard, more limited pixel overdraw, and so on. And this is why nV has been able to do much more with less.....
 

Well, if people are interested, we may as well briefly discuss it here. It's intuitive to me that you get advantages in memory bandwidth if the tile size scales to fit the data in cache and registers, and in texture lookups as well; that all makes sense.

As for overdraw being limited and actually reducing the work done, how does that work? Let's assume you have N overlapping triangles and ignore alpha, so fully opaque; wouldn't the tile rasterizer still go through the triangles layer by layer?

If it doesn't, and gets around it by looking at a depth buffer, wouldn't a full-screen rasterizer (whatever the alternative to tiling is called) be able to do the exact same thing, just with the limitation that different parts of the screen have different overlapping triangles?

Basically, I don't understand what mechanism reduces the load, but I understand why tiling is more efficient and faster.
 
Yes, it would go through the triangles layer by layer, but at a much earlier stage; that's what I'm thinking. A basic optimization we use in programming is an octree or another spatial method to do a z-sort and limit pixel overdraw, but that happens at a much later stage, when rendering is pretty much halfway done and the vertex side of things is already complete. What tile-based rendering does is make the z-sort even more efficient, as it does with primitive discard.
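A toy, single-pixel illustration of why order matters for overdraw (made-up depths, smaller = nearer; this is the generic early depth test idea, not any vendor's specific hardware):

```python
# Early depth test on one pixel: a fragment is only shaded if it is nearer
# than everything already drawn, so draw order decides how many shades run.
def shades_executed(depths):
    """Count shader invocations for one pixel given opaque layers
    arriving in the listed order (smaller depth = nearer)."""
    nearest, shaded = float("inf"), 0
    for d in depths:
        if d < nearest:   # fragment survives the depth test
            shaded += 1
            nearest = d
    return shaded

layers = [5, 4, 3, 2, 1]
print(shades_executed(layers))          # 5 -> back-to-front, worst case
print(shades_executed(sorted(layers)))  # 1 -> front-to-back, one shade
```

Sorting within a small tile is cheap because the tile only holds its binned triangles, which is how tiling turns the z-sort from a late, expensive pass into an early, local one.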
 
Back on topic though: AMD did have tile-based rasterization patents with the mobile division they sold off. If they realized that nV was using this rasterization method with Maxwell, and how much it saved on power consumption and other things, they might be working on reinventing the wheel, so to speak. But for Vega to have this would, I think, be a bit too fast; Navi seems a fair bet to have it though. (AMD did state, I think two quarters ago, their refocus on mobile technologies.)
 
I'm hoping Vega will turn things around a bit. But the real tell will be if AMD only shows benchmarks known to heavily favor AMD (i.e. AoTS). If they pull that again, I'll know they're in trouble.

I would not consider Ashes of the Singularity an AMD-only benchmark; it does well because the Nitrous engine has been in development for a long time. Oxide's Nitrous engine is easily the most complex one out there, making use of DX12 in a way no other "game" does.
 

That's not true though. The main advantage of using DX12 in AotS is CPU bottleneck relief; for AMD in particular, the new command submission model is a huge improvement, for reasons I can't be bothered getting into for the millionth time.

Tomb Raider in DX12 also benefits from these things, but those are just improvements in the base performance of the API. You then have to consider the new features added in the 12_1 feature level, and I imagine there will be a 12_2 with SM6. The point is that there are huge misconceptions about DX12, which is inevitable once a technical specification becomes the center of a marketing campaign. The truth is there's no simple answer to the question "will a DX12 version of game X run better than DX11?". It depends on a huge number of things.

I tried thinking of a car analogy but I'm too ignorant about cars to come up with one.

Imagine we establish a set of bottlenecks (command submission, pixel fillrate, etc.), one for each operation that is independent and has its own upper limit, such that together they cover all the work being run on the system while the game is running.

Then you can roughly represent the "state" of the game at every point of the rendering pipeline, in every moment, as some load of each type I mentioned earlier. So if in this instant command submission is the limiting factor, then everything depending on it will be limited by that one thing.

DX12 brings revamped native multithreading and a new command submission model, both of which are great for AMD for different reasons, but this won't change the performance landscape dramatically in every scenario, because command submission won't always be the limiting factor. It ain't that simple
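The "slowest stage wins" idea above can be sketched in a few lines (the stage names and millisecond figures are invented purely for illustration):

```python
# Toy frame-time model (my simplification, not a real profiler): each
# bottleneck class runs independently, so the slowest one sets frame time.
def frame_time(stage_ms):
    return max(stage_ms.values())

dx11 = {"command_submission": 9.0, "geometry": 6.0, "pixel_fill": 7.0}
dx12 = dict(dx11, command_submission=3.0)  # API only relieves submission

print(frame_time(dx11), frame_time(dx12))  # 9.0 7.0
# The gain stops once the next limiter (pixel fill here) takes over.
```

So a game that was submission-bound sees a real win, and a game bound elsewhere sees almost nothing, which is exactly why "DX12 version of game X" has no one-size-fits-all answer.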
 
That's not true
It ain't that simple
Good job on making your point
The bottleneck relief you point out was already there in Star Swarm. DX12 is not about features in the API; that's a thing they needed to get away from. The key feature is that developers can program the GPU with as little overhead as possible.

If you ever want to discuss the engine detail differences between Tomb Raider and Ashes of the Singularity, don't bother; you can't...
 

Details, no, but I can tell you that geometry throughput plays a much bigger role there than in AotS.

Of course DX12 is also about new features. Well... they're also in 11.3, but still, there are new features to consider that are part of the spec. There is more to DX12 than the base multithreading, command lists, and multiple queues; those are just core features of the API.
 