Vega Rumors

Factum · Sep 4, 2017

I can see that razor1 is doing such a good job...that this thread is attracting lot of new pro-AMD posters...same story as always:
Wait for...

Hameeeedo · Sep 4, 2017

razor1 said:
Its like when nV stated Async compute was disabled in Maxwell drivers, cause yeah after the first instance of doing it the next instance it broke the driver lol, it couldn't reallocate the SM without flushing it. If they were capable of doing that they would have not disabled it in the first place.

Words of gold right there.

noko · Sep 4, 2017

Some notes:
Battlegrounds, an Unreal 4 game engine performance went up (claimed that is from AMD) 18% from Vega driver 17.8.1 to 17.8.2. That is a sizable increase in performance just that the notes don't tell you what allowed that increase. If some of Vega features are driver implemented per game, then we should be seeing something like above over and over again per game if that is the case.

http://support.amd.com/en-us/kb-art...mson-ReLive-Edition-17.8.2-Release-Notes.aspx

Proof will be in in the pudding. I think I will collect data as time passes on Drivers and Vega performance.

razor1 · Sep 4, 2017

noko said:
Some notes:
Battlegrounds, an Unreal 4 game engine performance went up (claimed that is from AMD) 18% from Vega driver 17.8.1 to 17.8.2. That is a sizable increase in performance just that the notes don't tell you what allowed that increase. If some of Vega features are driver implemented per game, then we should be seeing something like above over and over again per game if that is the case.

http://support.amd.com/en-us/kb-art...mson-ReLive-Edition-17.8.2-Release-Notes.aspx

Proof will be in in the pudding. I think I will collect data as time passes on Drivers and Vega performance.

battlegrounds they were getting killed to begin with most likely something that was going wrong to begin with was fixed. Things like that are to be expected.

noko · Sep 4, 2017

razor1 said:
battlegrounds they were getting killed to begin with most likely something that was going wrong to begin with was fixed. Things like that are to be expected.

Agreed, we just don't know what was fixed or optimized. If we start seeing more examples like that, then per game optimizations may exposed more of Vega potential or hardware ability. Could be all BS or AMD just did not have the resources like Nvidia to implement everything or to verify it will work on a broad base of PC systems and software. In other words drivers showed the hardware working but for specific conditions that in general for consumer systems would break - hence per game optimizations and a more robust general implementation if possible. We just do not know one way or another. I would not rule out potential big gains as time goes on. RTG is not Nvidia in the drivers department - not even close.

cybereality · Sep 4, 2017

Battlegrounds is also the worst optimized game of all time, I can barely get 60fps with a Titan X Pascal on 1080p.

Araxie · Sep 4, 2017

cybereality said:
Battlegrounds is also the worst optimized game of all time, I can barely get 60fps with a Titan X Pascal on 1080p.

that game it's a heavy CPU burden, remember that first of all it's an early access game. so really hard to even try to optimize, I think for the kind of game it is at that stage, it perform good, nobody can't expect a game that's beta to be optimized, Idk why are you barely reaching 60 FPS with the Titan X Pascal but I've just tested with a 1080 and the FPS are anywhere from 65 to 95.. but that's with an OC'd 7700K, it's also weird that my 390x go anywhere from 50 to 70fps.

optimizations for that game will come after the first official release of the game, specially given it's popularity.

Edit: Found this from a couple of months ago with a 1080Ti, so it match what I should be achieving with the GTX 1080.

_mockingbird · Sep 4, 2017

noko said:
Some notes:
Battlegrounds, an Unreal 4 game engine performance went up (claimed that is from AMD) 18% from Vega driver 17.8.1 to 17.8.2. That is a sizable increase in performance just that the notes don't tell you what allowed that increase. If some of Vega features are driver implemented per game, then we should be seeing something like above over and over again per game if that is the case.

http://support.amd.com/en-us/kb-art...mson-ReLive-Edition-17.8.2-Release-Notes.aspx

Proof will be in in the pudding. I think I will collect data as time passes on Drivers and Vega performance.

Steve from Hardware Unboxed said that driver 17.8.2 didn't increase performance in PlayerUnknown's Battlegrounds

Dayaks · Sep 4, 2017

_mockingbird said:
Steve from Hardware Unboxed said that driver 17.8.2 didn't increase performance in PlayerUnknown's Battlegrounds

That happens a lot with both nVidia and AMD. A lot of times the increases are for a very specific set of circumstances. If I see 20% in patch notes and get 5% I am happy.

razor1 · Sep 4, 2017

Dayaks said:
That happens a lot with both nVidia and AMD. A lot of times the increases are for a very specific set of circumstances. If I see 20% in patch notes and get 5% I am happy.

yeah probably specific levels or parts of levels.

Rasterizer · Sep 4, 2017

razor1 said:
Just search on youtube, you will find 2 videos both from nV for how driver developement and chip emulation is done.

I watched both videos. That Nvidia uses emulation of GPUs while they are in development to allow simultaneous development of the drivers and testing prior to tape out is very interesting, however, I don't see how it follows that because the drivers are developed concurrently with emulation of the GPU prior to tapeout, that the drivers have to be totally finished prior to tape out. Nevermind that evidence that Nvidia does it this way is not evidence that RTG does it the same way.

Much more importantly, even if it was the general rule that the drivers had to be feature complete prior to tapeout, the very nature of Vega's next generation geometry engine is that it is made up of generalized non-compute shaders which can, by definition, by given novel configurations purely through drivers. Your own prior comments in this discussion show that you yourself recognize that novel configurations of the NGG fast path are possible to develop after Vega was taped out, since otherwise your claims that game developers would have to implement primitive shaders on a per-game basis would be nonsensical, since that would be impossible if it was impossible to implement novel configurations of the generalized non-compute shaders in Vega's new geometry engines after the card had been taped out.

Video about RPM, primitive discard, told you where to find that, youtube search, Raja, RPM, Vega, discard, you should be able to find it. I just did.

I watched that entire presentation and there was literally not a single mention of primitive shaders anywhere in the video. You still haven't given any sort of logical explanation of why a new feature in the geometry engine would depend in any way whatsoever on a new feature for the compute units.

What I'm I talking about, you don't remember DICE's presentation on how to get better primitive discard with Frostbyte 3 engine? Yeah they had a huge presentation about it, and this is why that engine doesn't hurt AMD with polygon through put...... Look it up, easy to find. This is what AMD used as an example for the changes they did in Polaris so it helps their front end. Lots of good that did it! Still has polygon through put problems comparing it to current gen nV cards. They did finally catch up to last gen nV cards though of the same bracket.

First of all, I don't see why a discussion of how to implement better primitive culling into a game engine would preclude RTG from also wanting to, and attempting to, implement primitive culling functionality directly into their geometry engines since geometry engine polygon throughput is a known structural bottleneck for high shader count GCN GPUs.

Second of all, in saying "This is what AMD used as an example for the changes they did in Polaris so it helps their front end. Lots of good that did it! Still has polygon through put problems comparing it to current gen nV cards" you are literally agreeing entirely with the initial post I made where I presented evidence that Vega is severely polygon throughput bottlenecked using its legacy geometry engines at the same 4 triangles per clock that front end bottlenecked Fiji.
[qupte]I know Rys is lying well not saying everything that isn't lying, just not telling the whole story, cause I know him, and have know him for years, and the way he sees and posts thing have changed since he started working at RTG![/quote]
This amounts to making a claim on the basis of claiming your uncle works at Nintendo. "Because I said so!" is not evidence.

zcash

Sure, cryptocurrencies are neither my area of interest nor expertise, so I'll say you are entirely right about Zcash. Even so, whether or not Vega is particularly great at Zcash does not mean its seriously impaired in compute perfomance, especially in light of Vega performing quite well in an array of GPU compute benchmarks I linked to you, and which you summarily ignored.

Can you at least articulate a reasoning for why you believe Vega's performance in Zcash (but not Vega's performance in a bunch of actual compute benchmarks) mean there is a problem with Vega's compute performance, and in turn how you believe that problem with compute performance would in any way translate to poor gaming performance on a GPU you yourself have already admitted is polygon throughput bottlenecked?

OutOfPhase · Sep 4, 2017

When there's a number like 20% or higher, assume there was something just basically broken before and now it isn't.

Back when I was a contractor helping a company write some GL apps (scientific, not gaming), you could very clearly tell which calls were used by iD's engines. Those all ran fast. Everything else had a very good chance of not working at all or falling back to CPU rendering. Sometimes we could pressure the companies into fixing this, and I was always amused when the patch notes would come out citing "optimization" and some huge number. Yeah, because now the GPU was actually doing something, heh.

I'm sure it's different now, I've been out of that field for a long time, but I suspect similar effects are at play.

Rasterizer · Sep 4, 2017

Dayaks said:
It's hilarious that you want more substance from other people but your posts have like a 1% ratio of useful data to fluff.

Considering AMD isn't touting the "fine wine" kind of language of your original post, which they love to exaggerate things, makes me extremely skeptical of any large future improvements. I enjoy trending history and correlating it to the present and future from a high level. I have never seen AMD pull off something like this and they've given themselves enough setups for it.

If by some chance they did update drivers six months from now Volta will pummel it anyways. If I was them I would call Vega a wash, get what they can before mining crashes, and focus everything on Navi.

Please feel free to engage with the substantive discussion if you would like to contribute to it. I would strongly prefer that to this kind of pointless trolling.

Honestly I'd like to see proof from you that AMD is actually planning on increasing the throughput in an automatic way as you are claiming.

As I linked to earlier in the thread, both Rys Sommefeldt and Ryan Smith said so on the record, and the Vega whitepaper does not indicate anywhere that primitive shaders require developer input. Unless you are going to join Razor in claiming that one Ryan is a bald faced liar and the other a sucker, I think that has to be regarded as decent evidence that primitive shaders are intended to work automatically.

Dayaks · Sep 4, 2017

Rasterizer said:
Please feel free to engage with the substantive discussion if you would like to contribute to it. I would strongly prefer that to this kind of pointless trolling.

As I linked to earlier in the thread, both Rys Sommefeldt and Ryan Smith said so on the record, and the Vega whitepaper does not indicate anywhere that primitive shaders require developer input. Unless you are going to join Razor in claiming that one Ryan is a bald faced liar and the other a sucker, I think that has to be regarded as decent evidence that primitive shaders are intended to work automatically.

I am sorry that things like high level realites is considered trolling to you.

You completely ignored this part and did a personal attack, "Honestly I'd like to see proof from you that AMD is actually planning on increasing the throughput in an automatic way as you are claiming."

Rasterizer · Sep 4, 2017

Presbytier said:
That is not how you launch a product especially a performance based product. You are upfront that your cards greateset performance is locked behind a feature not yet implement, so you can at least drive sales. What you don't do is go quiet about a feature and launch the card praying for the best. You only get one launch. Very few outlets are going to bother to retest the card after this supposed driver comes out not to mention all the consumers who decided to just go GTX 1080/1080ti based on Vegas current performance are not going to suddenly buy another card 3-6 months later. How does releasing a gimped product help shareholders?

I agree with you completely. As I said, I would not have even considered launching Vega without the drivers being in a much better state than they were on launch (especially for FE!) and remain. I certainly have no interest in trying to defend RTG's business decisions. My only interest in Vega is in trying to understand why it is delivering the performance it currently is, and what the implications might be of the drivers finally at least becoming feature complete at some point in the near future.

If you actually want to know why I got interested in this and started following Vega's development, the answer was trying to understand an apparent contradiction between Vega's performance in professional 3D rendering applications and Vega's performance in games. I still don't feel like I have an adequate answer to that apparent contradiction.

razor1 · Sep 4, 2017

Rasterizer said:
watched both videos. That Nvidia uses emulation of GPUs while they are in development to allow simultaneous development of the drivers and testing prior to tape out is very interesting, however, I don't see how it follows that because the drivers are developed concurrently with emulation of the GPU prior to tapeout, that the drivers have to be totally finished prior to tape out. Nevermind that evidence that Nvidia does it this way is not evidence that RTG does it the same way.

The whole industry does it the same way! man you don't understand where money comes from and goes when developing complex projects. The more money you can save by ensuring manufacturing goes without a hitch the better, cause once ya get into manufacturing now you will burn money if something goes wrong at an accelerated rate. On top of what I explained already, there are many contractural obligations not with just the fab, but all manufactures of all components of the card!

Much more importantly, even if it was the general rule that the drivers had to be feature complete prior to tapeout, the very nature of Vega's next generation geometry engine is that it is made up of generalized non-compute shaders which can, by definition, by given novel configurations purely through drivers. Your own prior comments in this discussion show that you yourself recognize that novel configurations of the NGG fast path are possible to develop after Vega was taped out, since otherwise your claims that game developers would have to implement primitive shaders on a per-game basis would be nonsensical, since that would be impossible if it was impossible to implement novel configurations of the generalized non-compute shaders in Vega's new geometry engines after the card had been taped out.

Err? Any compute shader can do what AMD's shaders do with geometry, even nV's.......

Do you understand its the same shader units doing geometry shading right? Its a unified system.....

I watched that entire presentation and there was literally not a single mention of primitive shaders anywhere in the video. You still haven't given any sort of logical explanation of why a new feature in the geometry engine would depend in any way whatsoever on a new feature for the compute units.

Sure you listened to the whole thing 8 mins, 15 mins or so into the video they mention programmable geometry engine, how do you suppose they get that, yeah, primitive shaders, and then he says he will show the RPM demo later on. Told ya have to listen to the whole thing.

First of all, I don't see why a discussion of how to implement better primitive culling into a game engine would preclude RTG from also wanting to, and attempting to, implement primitive culling functionality directly into their geometry engines since geometry engine polygon throughput is a known structural bottleneck for high shader count GCN GPUs.

Second of all, in saying "This is what AMD used as an example for the changes they did in Polaris so it helps their front end. Lots of good that did it! Still has polygon through put problems comparing it to current gen nV cards" you are literally agreeing entirely with the initial post I made where I presented evidence that Vega is severely polygon throughput bottlenecked using its legacy geometry engines at the same 4 triangles per clock that front end bottlenecked Fiji.

This amounts to making a claim on the basis of claiming your uncle works at Nintendo. "Because I said so!" is not evidence.

I know him period have know him for over 15 years now since 2000 And I don't like him, (only recently), he is the the worst type of AMD employee, and he should relinquish his B3D position or leave RTG one of the other, its a conflict of interest and is bad for the press.

Sure, cryptocurrencies are neither my area of interest nor expertise, so I'll say you are entirely right about Zcash. Even so, whether or not Vega is particularly great at Zcash does not mean its seriously impaired in compute perfomance, especially in light of Vega performing quite well in an array of GPU compute benchmarks I linked to you, and which you summarily ignored.

There are numerous compute benchmarks that show how erratic Vega's shader performance is, just in one suite of synthetic tests! The B3D one.

Can you at least articulate a reasoning for why you believe Vega's performance in Zcash (but not Vega's performance in a bunch of actual compute benchmarks) mean there is a problem with Vega's compute performance, and in turn how you believe that problem with compute performance would in any way translate to poor gaming performance on a GPU you yourself have already admitted is polygon throughput bottlenecked?

IT has more THAN ONE bottleneck, its bandwidth limited, lt's polygon throughput limited and its got shader throughput issues not associated with polygon through put!

THIS IS NOT A FORWARD LOOKING architecture, its a 4 generation old architecture which was on its last legs with Fiji, but AMD had no choice but to keep using it.

Rasterizer · Sep 4, 2017

Dayaks said:
I am sorry that things like high level realites is considered trolling to you.

Look, if we are going to stick to abstract high level realities and not engage with the substance and the evidence currently available, then we might as well say that in the long run we are all dead and there is no point to discussing GPUs online. The only reason I bothered to post here about this at all was that it looked like some people were starting to get interested in the question of what was bottlenecking Vega in gaming so severely that Vega 56 was delivering 100% of Vega 64 performance at the same clocks. I've actually been trying to understand what is bottlenecking Vega at present and I couldn't care less about the crap flinging that goes on back and forth between people with bizarre loyalties to GPU makers.

razor1 · Sep 4, 2017

Rasterizer said:
Look, if we are going to stick to abstract high level realities and not engage with the substance and the evidence currently available, then we might as well say that in the long run we are all dead and there is no point to discussing GPUs online. The only reason I bothered to post here about this at all was that it looked like some people were starting to get interested in the question of what was bottlenecking Vega in gaming so severely that Vega 56 was delivering 100% of Vega 64 performance at the same clocks. I've actually been trying to understand what is bottlenecking Vega at present and I couldn't care less about the crap flinging that goes on back and forth between people with bizarre loyalties to GPU makers.

What do you want me to link the sythenthics GN, at fiji clocks.......

Everyone can see it

Here you go

http://www.gamersnexus.net/guides/2977-vega-fe-vs-fury-x-at-same-clocks-ipc

Clock for clock its not different than Fiji and with its increased clocks it doesn't scale as well as it should (even in geometry processing, its worse in metro last light) Where is the x2 peak per clock geometry increases? Yeah BS that is with primitive shaders and RPM that is the only way they can achieve that.

http://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-64-and-56-review/18

B3D test suite

Vega is all over the place

Texture fillrate is bad, pixel fillrate is bad, it should be higher in pixel fillrate, much higher than the 1080ti.

So bandwidth limited in texture fillrate test, shader limited in the pixel fillrate test.

Lets go to compute performance now its hit or miss

http://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-64-and-56-review/17

Folding @ home which is traditionally a strong showing for AMD, its gets clobbered. Same with Fiji, both are near the rx580 which should not happen. Shader limited in a compute test

Geekbench another one they should do well in, it gets clobbered. Same results, shader limited in a compute test.

Why do you need us to show you these things, that is already know to us but you don't want to do the work to find it? I mentioned them a quick good search you can find them.

The only compute synthetic they actually represent the full capacity of Vega's array is blender all other compute tests are lower than what the theoretical flops ratio differences are to Pascal.

And these tests don't push anything other than compute, so low and behold my original conclusion along with Zcash, its shader limited

Do you think we sit here and just amuse ourselves by running you around in circles? Or have time to spoon feed everything that has already been talked about in this thread? I don't come down to conclusions unless I have a good idea of what is going on. And there is way too much empirical evidence that shows it having multiple problems across all types of synthetics affecting multiple parts of Vega

Rasterizer · Sep 4, 2017

razor1 said:
The whole industry does it the same way!

"Because I said so" is not evidence. It's of secondary importance, in any case, since even if I assume you are correct about that, it wouldn't apply in this case because a programmable geometry pipeline is by definition programmable.

Err? Any compute shader can do what AMD's shaders do with geometry, even nV's.......

Do you understand its the same shader units doing geometry shading right? Its a unified system.....

Here, I'll just quote an explanation from someone much more conversant with the subject than myself:

As I described earlier, a non-compute shader looks to the GPU as a set of metadata defining the inputs, the shader itself and metadata for the output. The metadata specifies data sources and launch setup and the output metadata describes what to do with the shader result. For conventional shaders, the patterns found in the metadata are well defined: there's a well-defined set of vertex buffers usage models, or the fragment shader has options such as whether it is able to write depth. These patterns are so well defined, they're baked into the hardware as simple options that are off or on, and each set is geared towards one or more shader types.

These new higher-level shaders sound as if they are improvisational. It could be that AMD has generalised the hardware support for non-compute shaders. In a sense it would appear as though non-compute shaders now have fully exposed interfaces for input, launch and output metadata. If true, then this would mean that there isn't explicit hardware support for primitive shader, or surface shader. etc. Each of these new types of shader is constructed using the low-level hardware parameters.

In effect the entire pipeline configuration has, itself, become fully programmable:

Want a buffer? Where? What does it connect? What's the load-balancing algorithm? How elastic do you want it?

Want a shader? Which inputs types and input buffers do you want? Load-balancing algorithm? Which outputs do you want to use?

These concepts are familiar to the graphics API user, as there are many options when defining pipeline state. But this would seem to be about taking the graphics API out of the graphics pipeline! Now it's a pipeline API expressed directly in hardware. Graphics is just one use-case for this pipeline API.

Hence the talk of "low level assembly" to make it work. That idea reminds me of the hoop-jumping required to use GDS in your own algorithms and to make LDS persist across kernel invocations. I've personally not done this stuff, but this has been part of the programming model for a long long time and so a full featured pipeline API in the hardware would be the next step and, well, not that surprising.

Of course Larrabee is still a useful reference point in all this talk of a software defined GPU - you could write your own pipeline instead of using what came out of the box courtesy of the driver.

To generalise a pipeline like this is going to require lots of on-die memory, in a hierarchy of capacities, latencies and connectivities. Sounds like a cache hierarchy? Maybe ... or maybe something more focussed on producer-consumer workloads, which isn't a use-case that caches support directly (cache-line locking is one thing needed to make caches support producer-consumer). GDS, itself, was created in order to globally manage buffer usage across the GPU, providing atomics to all CUs so that a single kernel invocation can organise itself across workgroups.

So, in this model the driver doesn't know about primitive and surface shaders. It just exposes hardware capabilities. The driver team has to code the definitions of these pipelines and then produce a set of metrics that define how the driver would choose which kind of pipeline to setup. So if the driver detects that the graphics programmer is writing to a shadow buffer (lots of little clues in the graphics pipeline state definition!) it would deploy primitive shader type 5B. The driver doesn't know it's a primitive shader, the hardware doesn't either, it is merely a kind of pipeline that produces the effect that AMD is calling "primitive shader"

By definition a geometry engine made up of these type of generalized non-compute shaders would be able to accept novel configurations even after release.

Sure you listened to the whole thing 8 mins, 15 mins or so into the video they mention programmable geometry engine, how do you suppose they get that, yeah, primitive shaders, and then he says he will show the RPM demo later on. Told ya have to listen to the whole thing.

I did listen to the whole thing. There was no demo at the end except for Raja revealing that the Vega would be called RX Vega and showing the marketing video reveal of that. Are you sure you didn't link the wrong video?

I know him period have know him for over 15 years now since 2000 And I don't like him, (only recently), he is the the worst type of AMD employee, and he should relinquish his B3D position or leave RTG one of the other, its a conflict of interest and is bad for the press.

Again, your claim amounts to "because I said so" without any external evidence.

There are numerous compute benchmarks that show how erratic Vega's shader performance is, just in one suite of synthetic tests! The B3D one.

It would be really appreciated if you could actually provide links to the items you refer to in the same post you are referring to them.

IT has more THAN ONE bottleneck, its bandwidth limited, lt's polygon throughput limited and its got shader throughput issues not associated with polygon through put!

THIS IS NOT A FORWARD LOOKING architecture, its a 4 generation old architecture which was on its last legs with Fiji, but AMD had no choice but to keep using it.

Can you please stop shouting? I'm trying to have a productive conversation here, I see no reason for your belligerence.

Right now, I don't see how you could even empirically test whether Vega was bandwidth limited or shader throughput limited in games when there is a polygon throughput bottleneck. The polygon throughput bottleneck is at an earlier stage in the pipeline, and so it would substantially obscure the effects of any subsequent bandwidth or shader throughput bottlenecks, would it not?

Dayaks · Sep 4, 2017

Rasterizer said:
Look, if we are going to stick to abstract high level realities and not engage with the substance and the evidence currently available, then we might as well say that in the long run we are all dead and there is no point to discussing GPUs online. The only reason I bothered to post here about this at all was that it looked like some people were starting to get interested in the question of what was bottlenecking Vega in gaming so severely that Vega 56 was delivering 100% of Vega 64 performance at the same clocks. I've actually been trying to understand what is bottlenecking Vega at present and I couldn't care less about the crap flinging that goes on back and forth between people with bizarre loyalties to GPU makers.

There you go with the personal attacks again. I have no loyalty nor did I mention nVidia. I had 3x 7970s and 1x Fury X. I have one more card to buy so if there is a valid boost coming AMD is a contender. At this point, a month after launch, I was asking if there was an actual claim from AMD any of this might happen. You're quick to hop on everyone else for solid facts so I thought it was a fair request.

razor1 · Sep 4, 2017

Rasterizer said:
"Because I said so" is not evidence. It's of secondary importance, in any case, since even if I assume you are correct about that, it wouldn't apply in this case because a programmable geometry pipeline is by definition programmable.

it is what it is, your inability to understand how things are done in the industry doesn't change facts. Ask PhaseNoise, you agreed with his post he worked at one of the 3 CPU and GPU makers so you will get a straight answer from him. And if you want more info, you can look up in the polaris thread, one of them, there is a guy that worked at AMD, and he said the same thing about Polaris, as I'm saying about Vega, nothing is going to change via drivers (nothing spectacular)

Here, I'll just quote an explanation from someone much more conversant with the subject than myself:

By definition a geometry engine made up of these type of generalized non-compute shaders would be able to accept novel configurations even after release.

I did listen to the whole thing. There was no demo at the end except for Raja revealing that the Vega would be called RX Vega and showing the marketing video reveal of that. Are you sure you didn't link the wrong video?

Geometry units are not the same thing as Geometry shaders........ Programmable geometry units have to have access via SDK or API. GS (geometry shaders) are part of the unified pipeline. Now primitive shaders, the word primitive means what? In graphics programming it means triangle. Yes. If you want info on this look up OpenGL programming guides, its the vertices that compose the triangle. Take all the marketing BS aside and if you understood why AMD called them what they called them, you will know their function.

Do you see what primitive shaders replace, they replace the geometry and vertex shaders......... Oh what did I say, programmable geometry shaders, are primitive shaders. This is exactly why drivers won't do anything for Vega, cause unless they replace the fixed function pipeline in the program itself, the relevant data that the pixel shader is anticipating will not be there.

Again, your claim amounts to "because I said so" without any external evidence.

I'm not going to get into personal stuff on this forum, I have told you enough, I know more about him but won't get into it.

It would be really appreciated if you could actually provide links to the items you refer to in the same post you are referring to them.

Can you please stop shouting? I'm trying to have a productive conversation here, I see no reason for your belligerence.

I told you to go look them up or read this entire thread, you did neither, not only did you not do those things, you can't even comprehend what people are saying to you and go on name calling.

Don't ask me not to shout when you name call. And I'm not shouting if you want me to bold or highlight in color I can will do that instead.

Right now, I don't see how you could even empirically test whether Vega was bandwidth limited or shader throughput limited in games when there is a polygon throughput bottleneck. The polygon throughput bottleneck is at an earlier stage in the pipeline, and so it would substantially obscure the effects of any subsequent bandwidth or shader throughput bottlenecks, would it not?

What, the synthetic tests only push certain parts of the GPU, that is why synthetics are so important so you can break down the limitations, and those synthetics clearly break down Vega. Compute synthetics have nothing to do with polygon through put at all. OK can you understand that? highlighted in red and blooded instead of capitalized just for you.

Rasterizer · Sep 4, 2017

razor1 said:
What do you want me to link the sythenthics GN, at fiji clocks.......

Everyone can see it

Here you go

http://www.gamersnexus.net/guides/2977-vega-fe-vs-fury-x-at-same-clocks-ipc

Clock for clock its not different than Fiji and with its increased clocks it doesn't scale as well as it should (even in geometry processing, its worse in metro last light) Where is the x2 peak per clock geometry increases? Yeah BS that is with primitive shaders and RPM that is the only way they can achieve that.

Yes, thank you for agreeing with me that Vega is presently front-end geometry bottlenecked and that primitive shaders are not currently enabled in the public Vega drivers.

B3D test suite

Vega is all over the place

Texture fillrate is bad, pixel fillrate is bad, it should be higher in pixel fillrate, much higher than the 1080ti.

So bandwidth limited in texture fillrate test, shader limited in the pixel fillrate test.

It's my understanding that Vega's pixel fillrate should still be more than enough not to be a bottleneck, and that in general pixel fillrates on modern GPUs are virtually never the bottleneck in achieved performance. As for texture fillrates, I think it would be pretty obvious that the charts included in the Anandtech article you linked show significantly better texture fillrates compared to Pascal than polygon throughput:

I think it stands to reason that if Vega were to be memory bandwidth bottlenecked in games if the existing polygon throughput bottleneck it suffers from were to be removed, it would be memory bandwidth bottlenecked at a higher level of relative performance to Pascal that it presently has.

Lets go to compute performance now its hit or miss

http://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-64-and-56-review/17

Folding @ home which is traditionally a strong showing for AMD, its gets clobbered. Same with Fiji, both are near the rx580 which should not happen. Shader limited in a compute test

Geekbench another one they should do well in, it gets clobbered. Same results, shader limited in a compute test.

Why do you need us to show you these things, that is already know to us but you don't want to do the work to find it? I mentioned them a quick good search you can find them.

The only compute synthetic they actually represent the full capacity of Vega's array is blender all other compute tests are lower than what the theoretical flops ratio differences are to Pascal.

And these tests don't push anything other than compute, so low and behold my original conclusion along with Zcash, its shader limited

I don't think the picture is nearly so clear as you are painting it:

AMD-Radeon-RX-Vega-64-Sandra-2017-Cryptography-Higher.png

Between the positive compute performance results from TechGage and the positive results from Anandtech, I think there are more relative wins for losses than Vega in compute performance. In any case, unless you plan to argue that Vega's gaming performance is or in any realistic scenario could become pixel fill rate limited, I don't understand why you think Vega's compute performance would ever end up bottlenecking its gaming performance?

Rasterizer · Sep 4, 2017

Dayaks said:
There you go with the personal attacks again. I have no loyalty nor did I mention nVidia.

Perhaps I misinterpreted this:

Dayaks said:
If by some chance they did update drivers six months from now Volta will pummel it anyways.

razor1 · Sep 4, 2017

Rasterizer said:
Yes, thank you for agreeing with me that Vega is presently front-end geometry bottlenecked and that primitive shaders are not currently enabled in the public Vega drivers.

It's my understanding that Vega's pixel fillrate should still be more than enough not to be a bottleneck, and that in general pixel fillrates on modern GPUs are virtually never the bottleneck in achieved performance. As for texture fillrates, I think it would be pretty obvious that the charts included in the Anandtech article you linked show significantly better texture fillrates compared to Pascal than polygon throughput:

I think it stands to reason that if Vega were to be memory bandwidth bottlenecked in games if the existing polygon throughput bottleneck it suffers from were to be removed, it would be memory bandwidth bottlenecked at a higher level of relative performance to Pascal that it presently has.

I don't think the picture is nearly so clear as you are painting it:

Between the positive compute performance results from TechGage and the positive results from Anandtech, I think there are more relative wins for losses than Vega in compute performance. In any case, unless you plan to argue that Vega's gaming performance is or in any realistic scenario could become pixel fill rate limited, I don't understand why you think Vega's compute performance would ever end up bottlenecking its gaming performance?

Are you serious, you aren't even correlating some of those "wins" with specifics in Vega hardware. Do you want me to go through this?

Ok

First one

INT8 texture fillrate.

Consumer Pascal doesn't have this feature like Vega does, Vega can do DOT product with INT8 at the same clock as it does other things. Yes it should win that test.

Encryption, Vega has specific hardware for encryption acceleration. It should win in those tests.

FP 32 texture fillrate, it should hit there, FP 32 texture fillrate hits the ROPS hard, and it lands where it should land.

Financial analysis and Scientific analysis, it does well in FP64 and only FP64, what happened to FP32? Why not there, You want to know why? nV cuts down its DP units more than AMD does.... AMD has more DP units. (also nV artificially holds back FP64 performance at a lower rate too via drivers if the DP units are there for certain lines).

You have to understand what certain things are before you make a conclusion.

Just look up this FP64 rate Vega and then FP64 rate Pascal. Now if they had Tesla P100 in there, it would crush Vega, as it has full FP64 rate. All nV chips outside of Tesla P100 have limited DP units or are rate limited and thus like 1/32 the DP rate. I think Vega has like 1/16 the rate, yeah it has 1/16 the rate.

When you don't know things like this and try to read charts or data, you aren't going to know what you are looking at. Ironically that article does mention that only in gaming cards does Vega do better in FP64...... right at the bottom of that chart......

Now if you want people to respond to your posts more constructively, do this post like you did and ask why it is happening, and summarize what you think is happening. Then you will get a contructive response, you will not get a nice response if you say this

Rasterizer said:
On the wild off-chance any of you are actually interested in discussing why RX Vega is so heavily bottlenecked and what the implications might be if that bottleneck could be alleviated through driver updates, rather than just trolling and garbageposting, I think there is pretty strong evidence to suggest what is holding Vega back in gaming performance right now:

Dayaks · Sep 4, 2017

Rasterizer said:
Perhaps I misinterpreted this:

Is it wrong?

You also cropped out the part I said about focusing on Navi.

Just because reality isn't pro AMD doesn't mean I am biased towards a company. I look at both equally negatively.

Now I am going to go read through your posts with all the charts because I am interested and I love data...

Rasterizer · Sep 4, 2017

razor1 said:
Are you seroius, you aren't even corrleating some of those "wins" with specifics in Vega hardware. Do you want me to go through this?

Not particularly, unless you plan to argue that there is any realistic scenario under which Vega compute performance would result in a bottleneck in gaming performance. Not only do I not see how Vega's compute performance is likely to be relevant to explaining present issues with its gaming performance, I would also note that most compute bencharks on Anandtech, TechGage and even SiSoftware have Vega's compute performance relative to Pascal well ahead of its present gaming performance relative to Pascal, so even in the miracle scenario where RX Vega might end up both not polygon throughput bottlenecked and not memory bandwidth bottlenecked, if it were to end up shader throughput bottlenecked that would be at a much higher level of performance relative to Pascal than Vega presently displays in games. Since Vega would almost certainly be memory bandwidth bottlenecked in the absence of its present polygon throughput bottleneck, I just don't see how Vega's compute performance is relevant here.

razor1 said:
Do you see what primitive shaders replace, they replace the geometry and vertex shaders......... Oh what did I say, programmable geometry shaders, are primitive shaders. This is exactly why drivers won't do anything for Vega, cause unless they replace the fixed function pipeline in the program itself, the relevant data that the pixel shader is anticipating will not be there.

Are we having some sort of miscommunication here? The text I linked from the B3D forums explains that Vega's geometry engines are composed of arbitrarily programmable shaders, not fixed function shaders. Literally the whole point of Vega's next generation geometry engines is to allow the geometry engines to be reconfigured by software.

razor1 · Sep 4, 2017

Rasterizer said:
Not particularly, unless you plan to argue that there is any realistic scenario under which Vega compute performance would result in a bottleneck in gaming performance. Not only do I not see how Vega's compute performance is likely to be relevant to explaining present issues with its gaming performance, I would also note that most compute bencharks on Anandtech, TechGage and even SiSoftware have Vega's compute performance relative to Pascal well ahead of its present gaming performance relative to Pascal, so even in the miracle scenario where RX Vega might end up both not polygon throughput bottlenecked and not memory bandwidth bottlenecked, if it were to end up shader throughput bottlenecked that would be at a much higher level of performance relative to Pascal than Vega presently displays in games. Since Vega would almost certainly be memory bandwidth bottlenecked in the absence of its present polygon throughput bottleneck, I just don't see how Vega's compute performance is relevant here.

And I just explained why the difference are there, if you can't understand the architectural differences between consumer Pascal, Tesla Pascal, Vega, then you won't understand why those results are expected. Every single one that you posted don't show anything that is unexpected. Not only that, if Tesla P100 was in those results, it would totally dominant Vega half of them.

Are we having some sort of miscommunication here? The text I linked from the B3D forums explains that Vega's geometry engines are composed of arbitrarily programmable shaders, not fixed function shaders. Literally the whole point of Vega's next generation geometry engines is to allow the geometry engines to be reconfigured by software.

If you think that is miscommunication, you better reread what I posted. Geometry engines are not primitive shaders, I was making sure you know that and even clarified it for you,

Geometry units are not the same thing as Geometry shaders........ Programmable geometry units have to have access via SDK or API. GS (geometry shaders) are part of the unified pipeline. Now primitive shaders, the word primitive means what? In graphics programming it means triangle. Yes. If you want info on this look up OpenGL programming guides, its the vertices that compose the triangle. Take all the marketing BS aside and if you understood why AMD called them what they called them, you will know their function.

From your prior posts, you didn't understand that. Nor did you understand what the B3D post was about, highlighted in red and in bold for your liking. What did I say, via SDK or API, guess what its not in any API yet, and there is no SDK. This is why AMD is doing via drivers. But I just told you this.

same post as the above quote

Do you see what primitive shaders replace, they replace the geometry and vertex shaders......... Oh what did I say, programmable geometry shaders, are primitive shaders. This is exactly why drivers won't do anything for Vega, cause unless they replace the fixed function pipeline in the program itself, the relevant data that the pixel shader is anticipating will not be there.

You are not paying attention to what is being posted, not only that you are arguing something that I have already stated and thinking I'm agreeing with what you posted, its the other way around, you are coming around to what I have posted and not understanding what has been posted here or at B3D.

Now to go into this deeper, its possible to have vertex and geometry creation when using compute shaders, no need for fixed function units on any DX12 hardware. This has been the case since Fermi from nV and first iteration of GCN from AMD. AMD now just created primitive shaders that are easier to program for, is the assumption I am going to make cause we just don't know yet....... Why have two different "shaders" not really different they are the same unit but have two different names for them...... Most likely Primitive shaders have some benefits from a programming side from a straight compute shader. Features being the big one.

If you want an example of any DX12 hardware doing this. Look at the frostbyte presentation, they are doing it in there. They are using the shader array to do the primitive culling and primitive creation.

Rasterizer · Sep 4, 2017

razor1 said:
Geometry engines are not primitive shaders, I was making sure you know that and even clarified it for you.

I'm just going to quote the Vega whitepaper:

Next-generation geometry engine

To meet the needs of both professional graphics and gaming applications, the geometry engines in “Vega” have been tuned for higher polygon throughput by adding new fast paths through the hardware and by avoiding unnecessary processing. This next-generation geometry (NGG) path is much more flexible and programmable than before. To highlight one of the innovations in the new geometry engine, primitive shaders are a key element in its ability to achieve much higher polygon throughput per transistor.

razor1 · Sep 4, 2017

Rasterizer said:
I'm just going to quote the Vega whitepaper:

err fix your quotes.

also yeah pretty much what I stated right, needs both to do higher polygon throughput lol. And that is what the Vega hair demo had lol, it has to have both lol.

While using RPM and all these things too, that is the only way it can achieve x2 the triangle through put lol. otherwise its around Fiji per clock......

Just look at the amount of Geometry Units (not GS's), Pascal has over Vega, that is why they can achieve so much more in geometry throughput. But nV gets away with it because it fits into a smaller die, AMD couldn't increase their fixed function geometry units, just didn't' have the die size to do it, so they are doing it another way, and if developers don't have access, that problem will not go away.

Rasterizer · Sep 4, 2017

razor1 said:
err fix your quotes.

also yeah pretty much what I stated right, needs both to do higher polygon throughput lol. And that is what the Vega hair demo had lol, it has to have both lol.

While using RPM and all these things too, that is the only way it can achieve x2 the triangle through put lol. otherwise its around Fiji per clock......

RPM was not mentioned in the portion of the Vega whitepaper I quoted, nor is it mentioned at all in the Vega whitepaper section covering the next generation geometry engine. The whitepaper does not say that RPM must be enabled for primitive shaders to work. RPM is a totally separate feature of Vega's NCUs and there is no interdependence between RPM and next generation geometry engine features.

sabrewolf732 · Sep 4, 2017

damn if there ever was a horse that was dead. This thread needs to be locked.

razor1 · Sep 4, 2017

Rasterizer said:
RPM was not mentioned in the portion of the Vega whitepaper I quoted, nor is it mentioned at all in the Vega whitepaper section covering the next generation geometry engine. The whitepaper does not say that RPM must be enabled for primitive shaders to work. RPM is a totally separate feature of Vega's NCUs and there is no interdependence between RPM and next generation geometry engine features.

Yeah the paper doesn't but without them they can't be cause SM 6.0 is not available yet, and that is the only shader model that has access to FP 16 functions outside of primitive shaders.

You don't understand that demo nor where the through put problem is. Doesn't matter if it uses FP 16 or FP 32, if using the fixed function pipeline, the triangle amounts will stall GCN's pipeline. The only way around that is to use its primitive shaders. FP 16 calculations are done in the shader array. But the GU's have to do all the work after *this is where the problem is* This is where AMD's primitive shaders come in, they communicate with the programmable GU's of Vega.

So if using FP 16 or FP 32, if the bottleneck is the fixed function GU's, would it matter if that demo used FP 16 or FP 32? Cause the GU's can handle only so many triangles before they stall the pipeline.

Now you see why I stated that demo has not just RPM in it, its everything new about Vega, not just RPM. Did Rys ever tell you that? No way will he tell you that cause reality is doesn't matter what Vega features have, if they aren't exposed to developers, those features can't be used in applications, not the way AMD has stated so far, no where near the projected upper limits, it won't even be much better than what we have now.

If you need clarification on this here is DX12's pipeline

Hull shader, Tesselator, Domain shader that is the GU (there is some programability in the GU of all DX12 GPU's but only inputs, once the intial input is done it is what it is, that is why its a fixed function unit. Those have to be done before it does the GS.That is where GCN gets bottlenecked by too much geometry.

The vertex shader is where the FP 16 or FP 32 calculations are done which comes prior to the bottleneck part. So back to that RPM demo, if they didn't use primitive shaders they would have the same damn problem if they used FP32 to do the vertex calculations in vertex shader.......

Its sad that you want to talk about all these high level stuff start name calling and saying others are trolling, but don't understand the low level stuff, because without that, all the high level stuff is just above your head man. Not only that it confuses the shit out of anyone reading your posts that don't have the low level information, cause there is no basis in reality for your lines of thought. Picking up pieces here and there from what AMD stated to people and AMD reps says, don't mean jake if you can't understand why they are saying what they are saying. At that point its just plain old marketing for sheep. And Rys is a good shepard. He know what to say and when to say, and what not to say, to get the message across that would be beneficial for AMD.

I'm hoping you understand that flow chart cause I'm going to show you another, its the compute shader, which is what I'm expecting the primitive shader to look like

It will entirly bypass the fixed function pipeline, slot the CS in where you see those three boxes for Hull Shader, Tesselator, Domain shaders.

Ocellaris · Sep 4, 2017

sabrewolf732 said:
damn if there ever was a horse that was dead. This thread needs to be locked.

No way. This thread is a fantastic exercise in how much effort people are willing to put into petty arguments online.

Rasterizer · Sep 4, 2017

razor1 said:
Hull shader, Tesselator, Domain shader that is the GU. Those have to be done before it does the GS.That is where GCN gets bottlenecked by too much geometry.

The vertex shader is where the FP 16 or FP 32 calculations are done which comes prior to the bottleneck part. So back to that RPM demo, if they didn't use primitive shaders they would have the same damn problem if they used FP32 to do the vertex calculations in vertex shader.......

The diagram in the Vega whitepaper clearly shows that primitive shaders replace everything between the tesselator and fixed function culling:

Furthermore, you appear to be claiming that RPM does not work unless primitive shaders are working FIRST, but even if that were true it would not mean that you had to use RPM in order to enable primitive shaders.

razor1 · Sep 4, 2017

Rasterizer said:
The diagram in the Vega whitepaper clearly shows that primitive shaders replace everything between the tesselator and fixed function culling:

Furthermore, you appear to be claiming that RPM does not work unless primitive shaders are working FIRST, but even if that were true it would not mean that you had to use RPM in order to enable primitive shaders.

Man that is exactly what I just showed you lol..... Don't need AMD white paper for that. Hull shader too, with out the tesselator the hull shader becomes usesless, hull shader is the one that tells how and what to tesselate in adaptive tessellation. Again you are agreeing with what I stated while not understanding what I stated and going by what AMD showed and arguing you are showing something different, They are the same thing. The reason why I showed you the DX pipeline was because there will be no confusion of what is being replaced, but its not working cause again, basics man.

RPM for vertex processing gets not benefit unless you use primitive shaders for GCN! As it is right RPM isn't even available in any API and RPM alone will not give any benefit in geometry processing I should have been more clear about this, that is my fault but that doesn't change the fact that demo needed all of Vega's new pipeline to show its max through put which is coincidentally x2 the geometry through put that is in AMD's marketing material!

PS why did I go through all this, cause once my demo is released I don't want people complaining about stupid shit like this when they don't understand why current AMD cards bend over and get rapped. My game is using over 20 million polys per scene and its going to hurt current AMD cards sorry no way around it unless they give us developers control over the pipeline. at least nV we don't need to worry about that crap.

Just for your clarification, DX 11 pipeline

https://msdn.microsoft.com/en-us/library/windows/desktop/ff476340(v=vs.85).aspx

more clarification. Clear enough for you?

Yeah those three boxes out of those three boxes there is only one fixed function unit that is the Tesselator, so when they say programmable, guess what that tesselator is no longer there or changed conciderably, not just from tessellation onward. All three boxes are needed for tessellation. Now you see where that whitepaper stands? When they said tessellation onward, its exactly what I stated, I just clarified it more precisely in the context of what it is used for. Why use primitive shaders when not using Tesselation? Not needed really at least so far from what they have said or shown so far.

Now the data that is coming out of the fixed function unit (tesselator specifically), is then read by the GS, if that is not there, which it isn't in the primitive shader pipeline, how will replacing via drivers work? Its not its done there is no way to correlate the data between the two units anymore because its a different data set. There needs to be something in the middle to translate the data over to the pixel shader.

Rasterizer · Sep 4, 2017

razor1 said:
RPM for vertex processing gets not benefit unless you use primitive shaders for GCN!

I'll try this one last time: Primitive shaders do not require RPM, and wouldn't even if RPM required primitive shaders. The horse can still walk without the cart even if the cart cannot move without the horse.

x2 the geometry through put that is in AMD's marketing material!

This does not show x2.

Yeah those three boxes are the fixed function, not just from tessellation onward. All three boxes are needed for tessellation. Now you see where that whitepaper stands? When they said tessellation onward, its exactly what I stated, I just clarified it more precisely.

Here, I'll just quote the whitepaper again:

Primitive shaders can operate on a variety of different geometric primitives, including individual vertices, polygons, and patch surfaces. When tessellation is enabled, a surface shader is generated to process patches and control points before the surface is tessellated, and the resulting polygons are sent to the primitive shader. In this case, the surface shader combines the vertex shading and hull shading stages of the Direct3D graphics pipeline, while the primitive shader replaces the domain shading and geometry shading stages

razor1 · Sep 4, 2017

Rasterizer said:
I'll try this one last time: Primitive shaders do not require RPM, and wouldn't even if RPM required primitive shaders. The horse can still walk without the cart even if the cart cannot move without the horse.

Primitive shaders don't need RPM, never stated they did, I stated the RPM demo needs primitive shaders though. yeah a horse can be led to water, doesn't mean it going to drink it..........

This does not show x2.

Where did they get that out of there ass? cause Vega can't do that, don't care what they show in table, if its not reproducible in real world testing, its worthless. Not only that the moment they are talking about per second. You really think anyone one in the GPU world cares about Per second? really man that graph is absolutely useless outside of marketing. How many frames per second does Vega do? Any application? and if they are using per second to show "how" much they can do for something that should be in the milli seconds or clock cycles, they are just exploding their numbers to show something that is acutally quite bad. lets talk about it per clock, much easier to figure out what is going on. 4 tris per clock that are discarded, up to 11 tris per clock with primitive shaders, RPM, ok those are the exact figures per clock and max values. Simple right?

Do you want me to put up all the Vega slides, its in there, its in their word cloud too, its in all their marketing material.

Here, I'll just quote the whitepaper again:

yeah I just stated that hull shader, tesselator, and domian shader are all replaced

. and that is exactly what you just quoted. The reason why the tesselator is replaced, its not using the fixed function unit for the tesslator anymore, it can't The primitive shader is the CU, the same CU's that are used in the shader array. the tesselator doesn't have bi directional communication, its a one way street.

This is your qoute from the white paper

Primitive shaders can operate on a variety of different geometric primitives, including individual vertices, polygons, and patch surfaces. When tessellation is enabled, a surface shader is generated to process patches and control points before the surface is tessellated, and the resulting polygons are sent to the primitive shader. In this case, the surface shader combines the vertex shading and hull shading stages of the Direct3D graphics pipeline, while the primitive shader replaces the domain shading and geometry shading stages

Surface shader there is no such thing in DX or traditional pipelines, that is part of the primitive shader. or something else there that is needed. Simple no more vertex shader or hull shader its something different that wasn't explained nor is it explained in that white paper just blah its there.

The Dx pipeline doesn't tell what the hardware is underneath, it just says these are the stages that must be followed. As long as the hardware follows that its fine. So.... next para.....

They can't be using the tesselator unit any more either, not as is, because as I said its a one way hole, unless they are doing the discard after the tessellation is done, which just defeats the purpose cause that will also cause additional problems. This is why the tesselator now has to be programmable in GCN. hence the programmable geometry pipeline. If they can't use that unit as is with Primitive shaders and they had to modify it in someway, it still needs to function like the old unit too. How are they doing this? This is via drivers, they are emulating the fixed function portion! Now you see why they said everything is done through drivers? And nV and AMD have emulated all DX fixed function units in prior gens of DX's like this too!

PS this can be done fairly easily the emulation portion, ATI did it with Trufrom with the 9700 pro, they emulated the 8500 fixed function tesselator in drivers by using the vertex shaders to do the actual work. The 9700 hardware was powerful enough to ensure trufrom worked fine at a proper frame rate on those games, but if you remember they also added in displacement too with the 9700 I don't remember any game coming out with that feature at that time, I tried to get it work well on it, just didn't work well too many draw backs.

Now do you understand why they didn't tell you everything? This is not as simple as a slide or two or a white paper that doesn't go in to details how they are doing it via drivers *shit if they said drivers take are off world hunger would you believe them? Software needs to be driving something right? I have worked with tesselators (software since the late 90's) and did a lot of work with truform and tried to emulate it on nV hardware but that didn't work too well, since DX or Ogl didn't expose or rather hardware didn't have it most likely as ATi's did. Did it through CPU though and worked well, since at the time skinning and animation were all CPU side anyways, it increased the bandwidth needs across the AGP port though which in some cases (outliers) caused slow downs.

Worse yet you will not get predictable performance increases while emulating, and to the contrary of that there are always performance drawbacks when emulating and chances are high of performance pitfalls when emulation is stressed enough. This is what happened with the emulation of the tesselator with displacement for the 9700 pro. The question is is Vega's shader array enough to manage current workloads and emulate the tessealtor and other shader functionality while delivering the same performance as before, lets not even talk about higher performance?

Maybe its possible in older games, in current games or games just about to come out that is questionable. Current games the chances are ok that you might get some benefits, not much because we can see its shader array is not scaling well in many tests. Future games or games coming out shortly forget about it cause yeah they will push Pascal's array more, and if that happens Vega can't even keep up with Pascal not unit for unit. And then you have the other bottleneck to worry about bandwidth. Virtually the same shader array in Fiji is getting hampered down but Vega is not going to have problems with doing things like this? Yeah gotta say not going to happen.

This is all if there is no draw backs for emulation of the tesselator and tessellation is used. If Tessellation isn't used then yeah they will get some benefits if the application is suitable to changed via driver to use primitive shaders. I can at this point say, I don't know any engine (AAA) that will be suitable for this type of replacement via drivers.

This all goes back to what I stated ALONG time ago one year ago about Vega's new features, they look like add ons after seeing what Maxwell had they were too far into the process of Vega that they couldn't do anything then "create something new from something old".

So end resultant, after all these mental gymnastics, what I stated to you right off the bat in upper case letters, its not going to matter shit.

Hameeeedo · Sep 4, 2017

Primitive Shaders are very hard to develop and code for, one of AMD guys told Anandtech it's like writing an assembly code, you have to always outsmart the driver, and use some inefficient driver paths well. Some AMD fans forget this is a software solution that is going to eat away some CUs to do it's work, CUs that would otherwise be busy doing something else. In otherwords while it may improve geometry performance, it may also negatively affect performance of other areas of the chip, like compute or pixel shaders. That's why they shipped Vega with it disabled, they've yet to figure out a way to make it work without negatively affecting performance.

razor1 · Sep 4, 2017

Hameeeedo said:
Primitive Shaders are very hard to develop and code for, one of AMD guys told Anandtech it's like writing an assembly code, you have to always outsmart the driver, and use some inefficient driver paths well. Some AMD fans forget this is a software solution that is going to eat away some CUs to do it's work, CUs that would otherwise be busy doing something else. In otherwords while it may improve geometry performance, it mas also negatively affect performance of other areas of the chip, like compute or pixel shaders. That's why they shipped Vega with it disabled, they've yet to figure out a way to make it work without negatively affecting performance.

If that is the case that would be an absolute nightmare to use them. and it goes back to what I stated about all these new features of Vega, they were after thoughts after they saw Maxwell, they needed to do something just were too far into Vega to do anything else. Well RPM wasn't an afterthought that was there, all the other NCU, primitive shaders, programmable geometry pipeline, all that stuff too many correlations with the CU functionality.

Hameeeedo · Sep 5, 2017

it also needs extensive developer involvement, judging by his comments. Here are the quotes:

AMD is still trying to figure out how to expose the feature to developers in a sensible way. Even more so than DX12, I get the impression that it's very guru-y. One of AMD's engineers compared it to doing inline assembly. You have to be able to outsmart the driver (and the driver needs to be taking a less than highly efficient path) to gain anything from manual control.

https://forum.beyond3d.com/threads/amd-vega-hardware-reviews.60246/page-59#post-1997709

The manual developer API is not ready, and the automatic feature to have the driver invoke them on its own is not enabled.

https://forum.beyond3d.com/threads/amd-vega-hardware-reviews.60246/page-59#post-1997699

It's clear the automatic feature will cause performance degradation, If the manual control is causing troubles with the driver, and is hard to code for. Imagine what the automatic control will do! It will quite possibly wreak havok on performance!

Vega Rumors

2[H]4U

Limp Gawd

Supreme [H]ardness

[H]F Junkie

Supreme [H]ardness

[H]F Junkie

Supreme [H]ardness

Gawd

[H]F Junkie

[H]F Junkie

n00b

Supreme [H]ardness

n00b

[H]F Junkie

n00b

[H]F Junkie

n00b

[H]F Junkie

n00b

[H]F Junkie

[H]F Junkie

n00b

n00b

[H]F Junkie

[H]F Junkie

n00b

[H]F Junkie

n00b

[H]F Junkie

n00b

Supreme [H]ardness

[H]F Junkie

Fully [H]

n00b

[H]F Junkie

n00b

[H]F Junkie

Limp Gawd

[H]F Junkie

Limp Gawd