Vega Rumors

It also needs extensive developer involvement, judging by his comments. Here are the quotes:


https://forum.beyond3d.com/threads/amd-vega-hardware-reviews.60246/page-59#post-1997709


https://forum.beyond3d.com/threads/amd-vega-hardware-reviews.60246/page-59#post-1997699

It's clear the automatic feature will cause performance degradation. If the manual control is causing trouble with the driver and is hard to code for, imagine what the automatic control will do! It will quite possibly wreak havoc on performance!
This strikes me as one of those features that, if pulled off, can be great, but Radeon doesn't exactly have a track record of pulling off this type of big-level optimization.
 


That confirms exactly what I stated earlier: there has to be something negative for features (possibly performance-enhancing features) to be turned off in drivers! (Many others stated it too, so props to them as well.) It's just the way things are. Either ya have them or you don't, not "you have 'em but shut them off because they will come out at a later date."
 
This strikes me as one of those features that, if pulled off, can be great, but Radeon doesn't exactly have a track record of pulling off this type of big-level optimization.


Never seen it done by anyone in any silicon market ;), so if it happens it will be a miracle.
 
Considering you can be banned for trolling, this should have been locked if [H] wants to follow their own logic.

The casual reader could learn a lot from some of the posts in this thread...I quite enjoy the technical level in some of the posts...even if they are only there to combat PR-FUD-Troll posts....they contain valuable technical information ;)
 
Too piecemeal for me to put the ramblings together. I would think most folks would just look at the bottom line - how does it perform for what I am going to use it for (gaming, mining, specific applications) - look at price, and then decide. Future performance or hopefulness may be very disappointing in the end.
 
Primitive shaders are very hard to develop and code for; one of AMD's guys told AnandTech it's like writing assembly code: you always have to outsmart the driver, and the driver has to be taking an inefficient path for you to gain anything.
It also needs extensive developer involvement, judging by his comments. Here are the quotes:
This is a mischaracterization of the posts made by Ryan Smith about the subject on the Beyond 3D Forum. I'm going to quote all three relevant posts in full. The first is here by Ryan Smith:
Quick note on primitive shaders from my end: I had a chat with AMD PR a bit ago to clear up the earlier confusion. Primitive shaders are definitely, absolutely, 100% not enabled in any current public drivers.

The manual developer API is not ready, and the automatic feature to have the driver invoke them on its own is not enabled.
Please note "The manual developer API is not ready, and the automatic feature to have the driver invoke them on its own is not enabled." This clearly states that there will be an automatic mode for primitive shaders in addition to creating an API to allow developers to override the default primitive shader implementation themselves.

Another B3D Forum user then asks Ryan to clarify whether he is referring to an automatic primitive shader implementation and a manual developer override of that automatic primitive shader implementation:
As far as I understood, there will be a "automatic mode" in the drivers that will specifically work for increasing the primitive discard rate and thus speeding up the geometry processing. I can only guess that is taking a lot of time to implement it because of possible compatibility issues with existing software. Then, there is the possibility to expose completely the primitive shaders to the developers, allowing other options (that means: new feature/possibilities in game engines). But Rys has confirmed that that is not planned yet in a tweet some time ago.
Let me note again for emphasis that leonazzurro specifically says "there will be a "automatic mode" in the drivers that will specifically work for increasing the primitive discard rate." and that in addition "Then, there is the possibility to expose completely the primitive shaders to the developers, allowing other options (that means: new feature/possibilities in game engines)." It is crystal clear here that leonazzurro is talking about an automatic default implementation of primitive shaders on the one hand, and a manual developer override of that default implementation on the other.

Finally, Ryan's second post is made directly in response to leonazzurro's post above and quotes it, where Ryan says:
AMD is still trying to figure out how to expose the feature to developers in a sensible way. Even more so than DX12, I get the impression that it's very guru-y. One of AMD's engineers compared it to doing inline assembly. You have to be able to outsmart the driver (and the driver needs to be taking a less than highly efficient path) to gain anything from manual control.
Ryan did not correct leonazzurro's description of there being a default automatic mode for primitive shaders that could eventually be overridden by developers once the API is implemented, but instead adds to leonazzurro's post by explaining that it will be very difficult for developers to outperform the automatic driver implementation of primitive shaders with manual control. That is what Ryan is referring to when he says "You have to be able to outsmart the driver (and the driver needs to be taking a less than highly efficient path) to gain anything from manual control."

Ryan's posts explicitly support that there will be a default implementation of primitive shaders in drivers by AMD that will not require any developer input to use, and that AMD is working on implementing this default implementation first, only later exposing the programmable geometry pipeline to manual developer control. It's also worth noting that Ryan's comments are entirely consistent with Rys Sommefeldt's comments on Twitter that the default version of primitive shaders will work automatically, and that there are no plans yet to enable manual developer override of the default primitive shader implementation, for the reasons Ryan Smith noted:
[Image: screenshot of Rys Sommefeldt's tweet (bQa07s7.png)]
 
Yes it is!!

No it isn't!!

Yes it is!!

No it isn't!!!


This thread has become a joke. I can't believe you guys behave like this.
 


LOL Rasterizer reads into marketing spiel like it's gold.

At no point does anyone say it's going to be better at primitive discard than what they have now, not in practice anyways. This is all "in theory it should be better," and to what degree? We know what AMD stated; will they get that? I highly doubt it. AMD flip-flopped all over the place with primitive shaders: first they stated developer control, now they say they will have an automatic path and then dev control possibly later on. Things were planned for dev control before, so why did it change? Something went wrong, something unexpected happened, and now AMD needs to do the work in drivers. That means the access to do these things is probably a bit too hard for devs right now. (PS: creating 6 or 10 extensions for primitive shaders and an SDK or addition to an API shouldn't take long if it was planned for.) They could have done it with cap bits or Vulkan and OpenGL extensions.

They are expecting it might be better, from what Rys stated, and Rasterizer was part of that conversation at B3D yet he disregarded everything that was stated there! Why does Rasterizer not mention what Ryan stated about current and future games? The stuff he is arguing about with me is what I already stated about current and future games, in much more detail than what Ryan stated, but it pretty much means the same thing.
 
Ryan's posts explicitly support that there will be a default implementation of primitive shaders in drivers by AMD that will not require any developer input to use, and that AMD is working on implementing this default implementation first, only later exposing the programmable geometry pipeline to manual developer control.

That reads like the drivers are already doing it. Unless I am reading that wrong?
 
now AMD needs to do the work in drivers
So you agree that an automatic mode for primitive shaders can be implemented in a subsequent driver update to RX Vega by AMD?

That reads like the drivers are already doing it. Unless I am reading that wrong?
Ryan Smith makes it crystal clear that the automatic mode for primitive shaders is not yet enabled in any public driver:
Primitive shaders are definitely, absolutely, 100% not enabled in any current public drivers.
 
So you agree that an automatic mode for primitive shaders can be implemented in a subsequent driver update to RX Vega by AMD?


Ryan Smith makes it crystal clear that the automatic mode for primitive shaders is not yet enabled in any public driver:

Never stated it won't be; I stated it won't show anything more than what we see already, and it's going to be hard to do with current games. Ryan stated the exact same thing. I just gave you a lot more details on how they would do it and why there are performance drawbacks when doing it that way with current games. As per this quote:


Now do you understand why they didn't tell you everything? This is not as simple as a slide or two, or a white paper that doesn't go into details about how they are doing it via drivers. Shit, if they said drivers take care of world hunger, would you believe them? Software needs to be driving something, right? I have worked with tessellators (in software, since the late 90's) and did a lot of work with TruForm and tried to emulate it on nV hardware, but that didn't work too well, since DX or OGL didn't expose it, or rather the hardware most likely didn't have it like ATi's did. Did it through the CPU though and it worked well, since at the time skinning and animation were all CPU-side anyways; it increased the bandwidth needs across the AGP port though, which in some cases (outliers) caused slowdowns.

Worse yet, you will not get predictable performance increases while emulating; on the contrary, there are always performance drawbacks when emulating, and the chances of performance pitfalls are high when the emulation is stressed enough. This is what happened with the emulation of the tessellator with displacement for the 9700 Pro. The question is: is Vega's shader array enough to manage current workloads and emulate the tessellator and other shader functionality while delivering the same performance as before? Let's not even talk about higher performance.

Maybe it's possible in older games; in current games or games just about to come out, that is questionable. In current games the chances are OK that you might get some benefits, but not much, because we can see its shader array is not scaling well in many tests. Future games or games coming out shortly, forget about it, cause yeah they will push Pascal's array more, and if that happens Vega can't even keep up with Pascal, not unit for unit. And then you have the other bottleneck to worry about: bandwidth. Virtually the same shader array in Fiji is getting hampered, but Vega is not going to have problems doing things like this? Yeah, gotta say not going to happen.

This is all if there are no drawbacks to emulating the tessellator and tessellation is used. If tessellation isn't used, then yeah, they will get some benefits if the application is suitable to be changed via driver to use primitive shaders. I can at this point say I don't know any (AAA) engine that will be suitable for this type of replacement via drivers.

This all goes back to what I stated A LONG time ago (one year ago) about Vega's new features: they look like add-ons after seeing what Maxwell had. They were too far into the process of Vega to do anything other than "create something new from something old".

Simple: just ask yourself, is Vega (inclusive of clocks) capable enough to give the same performance as Fiji while emulating parts of the pipeline, everything from the VS to the GS, and then create data that the PS can use? Let's not even get into more than before. Equal, yeah I can see that; more than, nope, not going to happen. The chances of that are limited, and even more limited in upcoming games.

Emulation must also be done at a lower level than what was programmed before, because now the programmer must look at the 1-to-1 unit-to-functionality differences and map them accordingly, which explains why AMD didn't do the SDK or add in cap bits or extensions to Vulkan. It is not an easy job to do, nor will it map perfectly in almost all instances.

So whoever stated the inline asm comparison is 100% correct; that is the way it has to be done for now. Until that is done in drivers, an SDK or extensions will not be available to developers. Now they could have made it available, but I think from their perspective, there has to be something seriously going haywire for them not to.
 
razor1 have you seen this yet?

https://hardforum.com/threads/amd-r...-in-ethereum-mining-efficiency-by-2x.1943165/

43MH for 130W! I believe this now contends with the 1070 for efficiency.


Yes I did see that. Hmm, it's close to the efficiency of the RX 580; the 1070 is still a bit higher. But yeah, it's damn close, I would like to see the Vega 56 on this! At the moment this is only on ETH; for other alt coins it's much lower. I think this has to do with software more than anything else, but until that software is optimized for Vega, it's not a good option right now. Why would anyone keep mining ETH at this point while buying rigs for it? There is only about 4 to 5 months left to mine it.
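For anyone who wants to sanity-check that efficiency claim, here is a quick back-of-envelope script. Only the 43.5 MH/s at 130 W figure comes from the linked thread; comparison numbers for a 580 or 1070 are deliberately left for the reader to plug in, since reported results for those cards vary a lot with tuning.

```python
def efficiency(mh_per_s: float, watts: float) -> float:
    """Hashrate per watt, in MH/s per W."""
    return mh_per_s / watts

# Only the Vega 64 figure below comes from the linked thread (43.5 MH/s at 130 W).
vega64 = efficiency(43.5, 130.0)
print(f"RX Vega 64 (rumoured): {vega64:.3f} MH/s per W")   # ~0.335

# Plug in your own measured numbers for a 1070 or RX 580 to compare, e.g.:
# print(f"My card: {efficiency(my_mhs, my_watts):.3f} MH/s per W")
```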
 

You mean before the diff gets too high? Or is there some ETH asic being released?
 
Eth is going to POS in Feb of next year, no more mining ;), at least that is what the Eth founders are saying, I'm hoping it will be longer than that, but don't have my hopes up cause it was supposed to go to POS this year.
 
Eth is going to POS in Feb of next year, no more mining ;), at least that is what the Eth founders are saying, I'm hoping it will be longer than that, but don't have my hopes up cause it was supposed to go to POS this year.

POS means?
 
Well, the way it sounds, it's going to be proof of stake only, no capability to mine anymore.

I might have to read up on this. I thought mining was how transactions occurred and were verified...
 
Yeah, it is right now; they are changing the algorithm entirely, that is why I think mining will be gone.
 

to both:

These RX Vega 64 43.5MH/s at 130W rumours are extremely misleading
 
Never stated it won't be; I stated it won't show anything more than what we see already, and it's going to be hard to do with current games.
Hold on a second. You've already said in three separate posts that you do believe that primitive shaders (NOT, I repeat NOT RPM) can achieve "x2 the triangle throughput":
While using RPM and all these things too, that is the only way it can achieve x2 the triangle throughput lol. Otherwise it's around Fiji per clock......
Doesn't matter if it uses FP16 or FP32; if using the fixed-function pipeline, the triangle amounts will stall GCN's pipeline. The only way around that is to use its primitive shaders. FP16 calculations are done in the shader array, but the GUs have to do all the work after (this is where the problem is). This is where AMD's primitive shaders come in: they communicate with the programmable GUs of Vega.

So whether using FP16 or FP32, if the bottleneck is the fixed-function GUs, would it matter if that demo used FP16 or FP32? Cause the GUs can handle only so many triangles before they stall the pipeline.
RPM for vertex processing gets no benefit unless you use primitive shaders on GCN! As it is right now, RPM isn't even available in any API, and RPM alone will not give any benefit in geometry processing. I should have been more clear about this, that is my fault, but that doesn't change the fact that the demo needed all of Vega's new pipeline to show its max throughput, which is coincidentally x2 the geometry throughput that is in AMD's marketing material!
Again, forget about RPM. I am not talking about RPM and I don't care about RPM. Aren't you saying in these posts that primitive shaders can actually achieve 2x the polygon throughput? As I understand it, RPM should contribute zero to polygon throughput even if it was working, so it has nothing to do with the polygon throughput. The polygon throughput gains you are talking about here should ALL be from primitive shaders, right?
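Side note for readers following the RPM tangent: a tiny sketch of the arithmetic behind Rapid Packed Math may help keep it separate from geometry throughput. The shader count and clock below are the commonly published Vega 64 ballpark figures (an assumption for illustration, not something taken from this thread); the point is simply that packing FP16 operations doubles peak ALU throughput while leaving the front end's primitives-per-clock limit untouched.

```python
# Rough illustration of what Rapid Packed Math (RPM) does and does not change.
# Assumes Vega 64's 4096 stream processors and a ~1.55 GHz boost clock
# (reference-card ballpark figures, used here only for illustration).
STREAM_PROCESSORS = 4096
BOOST_CLOCK_GHZ = 1.55          # approximate reference boost clock (assumption)
FMA_OPS_PER_SP = 2              # one fused multiply-add counts as 2 ops

fp32_tflops = STREAM_PROCESSORS * FMA_OPS_PER_SP * BOOST_CLOCK_GHZ / 1000
fp16_tflops = fp32_tflops * 2   # RPM packs two FP16 ops into one FP32 lane

print(f"Peak FP32: ~{fp32_tflops:.1f} TFLOPS")
print(f"Peak FP16 with RPM: ~{fp16_tflops:.1f} TFLOPS")
print("Primitives per clock through the fixed-function front end: unchanged")
```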
 
The way he speaks and makes claims sometimes reminds me of a certain anarchist that joined the heavy forces of the AMD rebellion on the forum. lol..
I'll only address this point once. Please don't confuse me with someone that made outlandish, rambling claims with zero citations or evidence. I don't care if you agree with me or don't, but at least acknowledge that I make an effort to actually use sources, to cite those external sources I am relying on, and to articulate my arguments coherently.
 
I actually probably understand about 10% of what you guys are arguing about, but I think I agree with you.
 

point granted.. =)
 
Hold on a second. You've already said in three separate posts that you do believe that primitive shaders (NOT, I repeat NOT RPM) can achieve "x2 the triangle throughput":



Again, forget about RPM. I am not talking about RPM and I don't care about RPM. Aren't you saying in these posts that primitive shaders can actually achieve 2x the polygon throughput? As I understand it, RPM should contribute zero to polygon throughput even if it was working, so it has nothing to do with the polygon throughput. The polygon throughput gains you are talking about here should ALL be from primitive shaders, right?


Without RPM I don't think primitive shaders can in a full game; in demos, yeah, pretty sure they can show it if they are only showing that and nothing else (this is where that chart you linked from the white paper fits right in: it can do it, but only in the most extremely favorable situations where there are no other needs). The reason why: cause primitive shaders will need to use the shader array, and there are only so many resources the shader array has. So if it's used for primitive shaders, you have already locked yourself down on the pixel and compute shader side of things, which are also contending for those same resources.

"You can't have your cake and eat it too" comes to mind. Either you do primitive shaders and bypass the fixed-function GU, or you use the fixed-function GU and have more resources for the rest of the stuff. That's why that RPM demo showed x2 the throughput: it showed that without RPM it's doing 500k triangles per frame, or something like that, and with RPM it went up to 1 million.

These are max outputs for primitive shaders and discard

without it

4 tris per clock can be discarded (which is pretty much the 4 geometry units)

with it

11 tris per clock can be discarded (bypassing the geometry units and using the shader array to do the discard)

The problem is that while doing 11 tris per clock of discard, you are now using 3 times the shader resources to do the discard. Let's go conservative here: the GU calculation amount for discard is around 1 full block (primitive discard requires quite a few steps and also depends on how many triangles there are to begin with, so more triangles means more resources needed). When using primitive discard, that will be distributed across the array; 3 times that is 3 blocks. How many blocks of CUs does Vega have?

Without RPM, Vega won't have the shader power, even raw shader power, to do much after that. What about all the other pixel shaders we are using right now: ambient occlusion, all the different types of bump mapping, displacement mapping, and many others?

Now I'm still not sure when the primitive shader is doing the discard either: is it doing it after tessellation or prior? Prior is how the fixed-function set does it, and that makes sense since you don't need to tessellate discarded triangles. If it's doing it after, which seems like the way it's set up (this is how DICE's presentation was set up, via compute shaders, and later on put into Polaris before the tessellation stage), there will be a bunch more problems. Problems that can be solved, but problems nonetheless, and this will have a direct impact on shader throughput. Now if it's emulating the tessellator, then it can do the discard before tessellation happens, so yeah, it could be doing it before. This is why I came to the conclusion they have to be emulating tessellation as well: still hardware driven, but using the shader units.

And this all comes back to: does it make sense to have a programmable geometry pipeline at this time, with current nodes and transistor budget limits.....

Doing a unified shader system, like the G80 and R600 did for vertex, compute, and pixel shaders, made sense, because vertex shader needs were not increasing as fast as before, more pixel shader capacity was needed, and later on compute shader needs grew while pixel shader needs dropped. But with tessellation plus the higher polygon counts games are pushing now, pixel and compute shader needs increase along with those polygon counts. The ratios of all three were changing, with more going towards compute. When using primitive shaders to do what vertex shaders and tessellation do, AMD puts the burden entirely on the shader array, which is going to be quite high, and this is if everything is copacetic.

Now with RPM, the resource contention is reduced because, well, using that gives double the possible raw compute performance in certain operations like vertex setup. That 3x gets cut down to 1.5x. Then it has plenty of resources left for the other shaders. Still not as much as before, but a hell of a lot better, right?
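To make the accounting above easier to follow, here is a minimal sketch that just plugs in razor1's own figures (the 4 and 11 tris/clock, the 3x shader-resource estimate, and the RPM halving). The "block" unit and the 3x / 1.5x factors are his rough assumptions, not measured or published numbers.

```python
# A sketch of the back-of-envelope accounting in the post above, using only the
# figures given there; none of these values are measured or published data.
FF_DISCARD_PER_CLK = 4        # tris/clock discarded by the 4 geometry units
PS_DISCARD_PER_CLK = 11       # tris/clock discarded via primitive shaders (his figure)
SHADER_COST_FACTOR = 3.0      # his estimate: 3x the resources on the CU array
RPM_FACTOR = 0.5              # RPM roughly halves that cost in his estimate

throughput_gain = PS_DISCARD_PER_CLK / FF_DISCARD_PER_CLK
cost_without_rpm = SHADER_COST_FACTOR
cost_with_rpm = SHADER_COST_FACTOR * RPM_FACTOR

print(f"Discard throughput gain: {throughput_gain:.2f}x")
print(f"Shader-array cost factor without RPM: {cost_without_rpm:.1f}x")
print(f"Shader-array cost factor with RPM:    {cost_with_rpm:.1f}x")
```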
 
primitive shaders will need to use the shader array
I think I finally understand the root of our disagreement. If I understand you correctly, you believe that primitive shaders will bypass the existing geometry engine entirely and do the geometry processing directly on the compute engine, thereby taking resources away from the rest of the graphics pipeline, yes?

That's not my understanding at all. As far as I understand it, it is Vega's geometry engine itself which has replaced the formerly fixed-function shaders of the geometry engine with programmable generalized non-compute shaders, and right now those programmable non-compute shaders are operating in a legacy mode where they mimic the traditional GCN geometry engine. It's my understanding that it is those generalized non-compute shaders located inside the geometry engine that can be reprogrammed into primitive shaders, meaning there should be zero impact on shader array usage whatsoever from whether or not primitive shaders are enabled.

Just to clarify for anyone else reading, I'll post the Vega10 block diagram for reference:

[Image: Vega 10 block diagram (FN5zjTW.png)]
 
I think I finally understand the root of our disagreement. If I understand you correctly, you believe that primitive shaders will bypass the existing geometry engine entirely and do the geometry processing directly on the compute engine, thereby taking resources away from the rest of the graphics pipeline, yes?

Yes, that is exactly what I'm saying.

That's not my understanding at all. As far as I understand it, it is Vega's geometry engine itself which has replaced the formerly fixed-function shaders of the geometry engine with programmable generalized non-compute shaders, and right now those programmable non-compute shaders are operating in a legacy mode where they mimic the traditional GCN geometry engine. It's my understanding that it is those generalized non-compute shaders located inside the geometry engine that can be reprogrammed into primitive shaders, meaning there should be zero impact on shader array usage whatsoever from whether or not primitive shaders are enabled.


I don't think it's happening that way right now, because if Vega were doing it that way there would be no need for the fixed parts at all. This will create some issues with older games, cause DX12 is not backwards compatible, so a DX12 card, if emulating, will have to run DX11 or 10 or whatnot by emulating an entire API. That would mean a buttload of work on AMD's part (right now, all DX functionality is MS's burden), which isn't worth the money and would get a lot of bad press. Just see how complex it is to emulate Xbox One or PS4; it would be that much more per API.

We will see the performance increases right off the bat if the games actually work. Ask yourself: if it emulated the old pipeline, how would it be getting the same performance (and it really is getting the same performance per clock)? Emulation always has overhead one way or another. Why would this new (emulated) pipeline have the same restrictions too? It shouldn't, right? Cause the bottleneck for polygon throughput is the number of geometry engines, nothing else. If those GEs are emulated already, then the bottleneck is removed; they can just use more shader units for GE operations and be done with it. That is not what they are doing right now. That is what will be done with primitive shaders, though.

There is a reason why the API is there: it's to govern the way programming is done. The underlying hardware can do anything it wants, but if it can't do those steps without having problems, or does them differently (aka primitive shaders), it's not going to get certified. And that is where AMD would get stuck if they are already emulating the pipeline, cause as I stated before, all the older games with older DX versions will be on AMD's head.

Actually if they are able to emulate it that well, then there would be no need for primitive shaders ;)

BTW, from early presentations of Vega I too thought that was what they were doing, but now looking at it, it doesn't make sense with what we saw at launch.
 
Ask yourself: if it emulated the old pipeline, how would it be getting the same performance (and it really is getting the same performance per clock)? Emulation always has overhead one way or another. Why would this new (emulated) pipeline have the same restrictions too? It shouldn't, right?
From my understanding, the new geometry engines are not emulating the old pipeline. I think it would be more accurate to say that Vega's geometry engines are presently configured like Fiji's were, or programmed like Fiji's were. Basically, the shaders inside of Vega's four geometry engines are no longer fixed function; instead they can be programmatically told what kind of shader behaviour to have (e.g. domain, hull, vertex, geometry), and once they have been programmatically configured to act as a certain type of shader, there should be minimal overhead for them continuing to do so (on another forum I saw a few people compare it to a shader-based version of Larrabee). The automatic form of primitive shaders being developed by RTG is going to be a driver-defined geometry engine configuration that reconfigures the shaders within Vega's four next-generation geometry engines to function as primitive shaders as RTG defines them in the Vega whitepaper.

The wording used by the Vega whitepaper itself makes it pretty clear that the next-generation geometry engines are the new geometry engines in Vega, not uses of the shader array for geometry processing (apologies for quoting this again, but I think it's really relevant here):
Next-generation geometry engine

To meet the needs of both professional graphics and gaming applications, the geometry engines in “Vega” have been tuned for higher polygon throughput by adding new fast paths through the hardware and by avoiding unnecessary processing. This next-generation geometry (NGG) path is much more flexible and programmable than before.
The “Vega” 10 GPU includes four geometry engines which would normally be limited to a maximum throughput of four primitives per clock, but this limit increases to more than 17 primitives per clock when primitive shaders are employed.⁷
I think it's pretty clear that they are talking about the four geometry engines functioning as primitive shaders, rather than bypassing the geometry engines and doing the geometry processing on the shader array.
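To make the whitepaper's per-clock figures a little more concrete, here is a trivial conversion to per-second peaks. The 4 and 17 primitives-per-clock values come from the whitepaper passage quoted above; the 1.5 GHz clock is an assumed round number in Vega's operating range, and these are theoretical peaks, not measured throughput.

```python
# Converting the whitepaper's per-clock figures into per-second peaks.
# The 4 and 17 prims/clock come from the quoted whitepaper passage; the clock
# is an assumed round number for illustration, not a measured value.
CLOCK_GHZ = 1.5

for label, prims_per_clock in [("standard path", 4), ("with primitive shaders (claimed)", 17)]:
    billions_per_sec = prims_per_clock * CLOCK_GHZ
    print(f"{label:32s} {prims_per_clock:>2d} prims/clk  ->  ~{billions_per_sec:.1f}B prims/s peak")
```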

Cause the bottleneck for polygon throughput is the number of geometry engines, nothing else. If those GEs are emulated already, then the bottleneck is removed; they can just use more shader units for GE operations and be done with it. That is not what they are doing right now. That is what will be done with primitive shaders, though.

There is a reason why the API is there: it's to govern the way programming is done. The underlying hardware can do anything it wants, but if it can't do those steps without having problems, it's not going to get certified.

Actually if they are able to emulate it that well, then there would be no need for primitive shaders ;)
The bottleneck is still there right now because Vega's four geometry engines are currently programmed by the drivers to act the same as the previous fixed function geometry engines of Fiji. I would assume this was done to not hold up development of the card while RTG worked on developing the new automatic primitive shader configuration for the four next generation geometry engines.
 


You are reading it wrong; it's pretty much saying primitive shaders are separate from the geometry units.

You can't have it both ways: either they are CUs or they are fixed-function geometry units. If you are not emulating, then they are fixed function. If they are emulating them, they are CUs. Pick one.

As I stated, if they have removed the fixed-function units (when you say "programmed to work like a fixed-function unit", I say it's emulated, because the fixed-function part has its own set of inputs, outputs, and extensions), you can't have a CU do that unless it's emulating it.

When using compute shaders or in this case primitive shaders, you lose certain things, like all the bandwidth compression techniques.

This is not a simple thing to just replace the fixed function portion of the pipeline.

And just a question: how is the white paper saying it's more programmable when it's not accessible yet ;)

Don't read into it too much when it's feeding us a line of BS......

Now if they are truly programmable units, like CUs as you stated, the bottleneck would not be there; that is what you don't understand. If they are CUs, it would be easy to shift more CU resources to them if they start getting clogged up. It's really easy; it would be just like the way the shader array functions currently: if you need more pixel shader resources, the shader array is set up for that; if you need compute, it can shift over to that; if it needs more vertex work, it will shift over to that. No restrictions.
 
What exactly are you both arguing about? Who gives a shit if the white paper is full of promises. The feature either doesn't work as outlined or isn't worth implementing. The proof is in the performance, not documentation. Just because the card might be able to do something doesn't mean it actually matters.
 
What exactly are you both arguing about? Who gives a shit if the white paper is full of promises. The feature either doesn't work as outlined or isn't worth implementing. The proof is in the performance, not documentation.


He is trying to say Vega's polygon throughput is going to be fixed via drivers and it will return to glorious performance levels, and his reasoning is based on things AMD has said. But it's never going to happen, cause it's just not; pretty much what you said, there is nothing AMD has shown that would give any confidence that it will. Nor do I see any technical difference between CUs doing the fixed-function part of the pipeline vs. primitive shaders.
 

You are both arguing beliefs and trying to back them up with interpretations of possible facts. Y'all need to go hug it out.
 
You are both arguing beliefs and trying to back them up with facts and interpretations.


Simple fact: if you want programmability (flexibility) in silicon, you need to use transistors. Just compare the transistor amounts of Fiji's GUs and Vega's GUs; I just did, and they are about the same, factoring in node size differences. AMD already told us where all those extra transistors in Vega went: pretty much to increase its clock speed and address its front-end problems, like they did with Polaris.

No one can dispute that taking Polaris and extrapolating it to 4k CUs ends up near Vega's die size (bus changes and all). So what we have here are features that don't have supporting hardware, or hardware that is not able to fully accelerate those features along with current features.

AMD just can't be expected to make everything under the sun the way people think it is going to be made; there are cost and silicon limitations, not to mention API restrictions. And I pretty much listed all of those out. It's not theoretical at all; those limitations are there and can't be changed by AMD no matter what.

This is what I don't get: why, or better yet HOW, can a company that has crap R&D compared to the competition be held up to such high expectations? They put all their eggs into Zen, and they couldn't outdo Intel's old architectures lol, but people expect RTG to just be able to flip a switch and make Vega a real competitor to Pascal on all fronts? No, anyone that has any inkling of understanding of the silicon markets knows that is far-fetched.

Stuff from the white paper Rasterizer didn't read or is ignoring:

In a typical scene, around half of the geometry will be discarded through various techniques such as frustum culling, back-face culling, and small-primitive culling. The faster these primitives are discarded, the faster the GPU can start rendering the visible geometry. Furthermore, traditional geometry pipelines discard primitives after vertex processing is completed, which can waste computing resources and create bottlenecks when storing a large batch of unnecessary attributes. Primitive shaders enable early culling to save those resources.

One factor that can cause geometry engines to idle is context switching. Context switches occur whenever the engine changes from one render state to another, such as when changing from a draw call for one object to that of a different object with different material properties. The amount of data associated with render states can be quite large, and GPU processing can stall if it runs out of available context storage. The IWD seeks to avoid this performance overhead by avoiding context switches whenever possible.

These problems will not occur if these operations are done on a CU, hence why I stated it's not the GU that is programmable: GU != primitive shaders (CU).
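For readers skimming, here is a minimal, purely illustrative sketch of what two of the culling tests named in the whitepaper passage above (back-face and small-primitive culling) boil down to. This is not AMD's hardware or driver logic; the function names and the one-pixel threshold are made up for illustration.

```python
# Illustrative back-face and small-primitive culling tests on a triangle's
# screen-space vertices. Names and thresholds are invented for this sketch.

def signed_area(v0, v1, v2):
    """Twice the signed area of the 2D triangle (the sign gives the winding)."""
    return (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v2[0] - v0[0]) * (v1[1] - v0[1])

def should_cull(v0, v1, v2, min_area_px=1.0):
    area2 = signed_area(v0, v1, v2)
    if area2 <= 0.0:                 # back-facing (or degenerate) with CCW front faces
        return True
    if 0.5 * area2 < min_area_px:    # smaller than about a pixel: small-primitive cull
        return True
    return False

# Example: a clockwise (back-facing) triangle is culled, a large CCW one is kept.
print(should_cull((0.0, 0.0), (0.0, 5.0), (5.0, 0.0)))      # True
print(should_cull((0.0, 0.0), (100.0, 0.0), (0.0, 100.0)))  # False
```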


Also, they even say in the white paper:

Primitive shaders will coexist with the standard hardware geometry pipeline rather than replacing it.

Come on Rasterizer, it was right there. Page 7.
 