Vega Rumors

I'm angry about it. I want heads to roll. I'm not happy about Vega being an absolute turd. I have steam I want to vent. We need competition.

Fortunately for Raja, AMD can't seem to find anyone to take his job.

He would be toast otherwise.
 
It's not the core count, it's the TFLOPs. Pascal achieves higher clocks, so it has more shader performance; Vega does too, but it has serious throughput issues, worse than Fiji's. If you want a comparison, look at the 390 vs Fiji, then compare that to the 390 vs Vega. And just for good measure, Polaris didn't do too well either, but its scaling is still better than Vega's lol.
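
To put rough numbers on the TFLOPs point, here's a quick back-of-envelope (shader counts and reference boost clocks below are approximate, from memory, so treat them as assumptions):

[CODE]
# Rough FP32 throughput: 2 FLOPs per shader per clock (one fused multiply-add).
# Shader counts and boost clocks below are approximate reference figures, not measurements.
def tflops(shaders, clock_ghz):
    return 2 * shaders * clock_ghz / 1000.0

cards = {
    "Fury X (Fiji)": (4096, 1.050),
    "RX Vega 64":    (4096, 1.546),
    "GTX 1080":      (2560, 1.733),
}

for name, (shaders, clock) in cards.items():
    print(f"{name:14s} ~{tflops(shaders, clock):5.2f} TFLOPs")

# Vega 64 has far more paper TFLOPs than a GTX 1080, which is why its game results
# point to a throughput/utilization problem rather than a lack of ALUs.
[/CODE]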


perfrel_3840_2160.png




Granted, the game selection might differ, but overall (and this should favor Vega, since it should be able to spread its wings as newer games hurt the older hardware more): 390 to Fury X is +20%, 390 to Vega is +41%.

A ~21% jump over the Fury X for Vega's major increase in clock speeds? A ~50% clock increase is not giving anywhere near what it should. Yeah, you can't expect perfectly linear scaling, but it should at least be somewhat respectable, right? That is just downright ugly lol.
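
Spelling that arithmetic out, using the relative numbers from the chart above and approximate reference boost clocks (so treat the exact percentages loosely):

[CODE]
# Relative 4K numbers from the chart above: 390 -> Fury X ~+20%, 390 -> Vega 64 ~+41%.
fury_x_vs_390 = 1.20
vega64_vs_390 = 1.41

# Fury X and Vega 64 both have 4096 shaders, so the Fury X -> Vega gain should roughly
# track the clock increase if shader throughput scaled cleanly. Clocks are approximate.
vega_vs_fury = vega64_vs_390 / fury_x_vs_390   # ~1.18, i.e. ~18% faster (or ~21 points on the chart)
clock_gain   = 1.546 / 1.050                   # ~1.47, i.e. ~47-50% higher clocks

print(f"Actual gain over Fury X: {vega_vs_fury - 1:.0%}")
print(f"Clock gain             : {clock_gain - 1:.0%}")
print(f"Scaling efficiency     : {(vega_vs_fury - 1) / (clock_gain - 1):.0%} of the clock increase shows up")
[/CODE]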

Something is severely holding it back. So, which parts of the GPU does increasing resolution stress? ROP throughput, shader throughput, and bandwidth. ROP throughput shouldn't be a problem, as Vega's higher clocks give it more ROP operations even if we equalize on a per-clock basis, right? What's left: shaders and bandwidth. Vega is bandwidth limited, but not to the degree where it should actually be holding it back vs Fiji, because Fiji has around the same bandwidth. What's left: shader throughput.

And we haven't even gotten to counting transistors yet. Given the number of transistors used in Vega, and knowing that Polaris's performance per transistor was already a crapshoot, Vega's performance per transistor is abysmal.

This is obvious. The proof is a Vega 56 flashed with the Vega 64 BIOS: the extra shaders scale very poorly. Shader throughput is just not there. This is probably why we see much worse perf/watt than AMD had planned; it looks like they had to boost clocks well past Vega 56's, whereas the extra shaders with slightly faster clocks should otherwise have made up the difference. It's also why Vega 56 performs just like the 64 once it's given higher clocks.

Shader throughput seems to be worse than Fiji's here, believe it or not. This is why they focused on the pro market: GCN is just done for gaming. If Navi is no different, then oh boy, RTG is done and AMD will probably just sell it off. Maybe that is Raja's secret plan to get sold to Intel lol.
 
On the wild off-chance any of you are actually interested in discussing why RX Vega is so heavily bottlenecked and what the implications might be if that bottleneck could be alleviated through driver updates, rather than just trolling and garbageposting, I think there is pretty strong evidence to suggest what is holding Vega back in gaming performance right now:

j4n8ctB.png


People have been talking about Fiji being severely front-end bottlenecked by its 4 triangle per clock polygon throughput limitation for a couple of years now. Based on the new features that were being discussed for Vega, many assumed that this front-end issue would be addressed by the implementation of primitive shaders, which appear to have been designed specifically to overcome GCN's 4 triangle per clock limitation.

As it turns out, RTG somehow decided to hard-launch Vega with primitive shaders not yet enabled in drivers, and at present Vega is limited by the exact same 4 triangle per clock front-end bottleneck that Fiji was. If you take the list culled polygon throughput numbers from above and do the math, Vega is showing the exact same 3.75 triangles per clock that Fiji does in this DX11 polygon throughput test. If you look at the Vega whitepaper, the totals shown above for list culled polygon throughput are identical to those shown for the Fiji and Vega native geometry pipelines where primitive shaders are discussed:

k7J4NGu.jpg
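
For anyone who wants to redo the math, triangles per clock is just the measured culled-polygon rate divided by the clock the card sustains during the test. A quick sketch (the throughput figure in it is illustrative; read the real one off the chart above):

[CODE]
# Triangles per clock = measured culled-polygon throughput / sustained clock.
# The throughput value here is illustrative; read the real one off the chart above.
def tris_per_clock(polys_per_second, clock_hz):
    return polys_per_second / clock_hz

# Example: a Vega 64 sustaining ~1.5 GHz while measuring ~5.6 Gpolys/s culled
# works out to roughly the same ~3.75 tris/clock that Fiji shows at its own clocks.
print(round(tris_per_clock(5.6e9, 1.5e9), 2))   # ~3.73

# The front-end ceiling is simply tris/clock * clock, so at the same frequency
# going from 4 to 8 tris/clock would double the peak polygon rate.
print(4 * 1.5e9, 8 * 1.5e9)   # ~6.0e9 vs ~1.2e10 polygons/s
[/CODE]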


Even this wasn't enough to convince some people, and ultimately it took Ryan Smith of Anandtech confirming on the record that primitive shaders were not enabled on any public Vega driver to get people to stop claiming they were already enabled despite all evidence to the contrary.

Vega being severely front end bottlenecked by its polygon throughput also explains why Vega 56 is 100% as fast as Vega 64 at the same clocks right now, why Vega's performance in games using large amounts of tessellation is still terrible, and may even explain Vega's terrible MSAA performance (I'm led to believe that MSAA performance is rasterization rate limited, since MSAA operates by conducting coverage and occlusion testing at higher resolutions than screen resolution). Vega's awful performance in Unigine Heaven, which features heavy use of both tessellation and MSAA, may demonstrate just how severe the issue is at present:

GXBpZ5O.png
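
To illustrate why that matters if rasterization rate is the limit, here is the sample-count arithmetic at 4K (a simple sketch following the reasoning above):

[CODE]
# MSAA does coverage/occlusion testing per sample, so the rasterizer effectively
# works at (sample count) x (screen resolution) for those tests.
width, height = 3840, 2160

for samples in (1, 4, 8):
    tests = width * height * samples
    print(f"{samples}x MSAA: ~{tests / 1e6:.1f} M coverage samples per frame")

# ~8.3M -> ~33.2M -> ~66.4M samples: if rasterization is already the bottleneck,
# enabling 4x/8x MSAA multiplies the load on that same bottleneck.
[/CODE]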


Now, if you accept the argument that Vega is presently severely front-end bottlenecked by its polygon throughput, that presents an interesting question to think about: what happens if RTG actually manages to implement primitive shaders at even half of the effectiveness claimed in the whitepaper and Vega's polygon throughput suddenly doubles? What would the performance implications be of Vega suddenly being able to do 8 triangles per clock?
 
I think the problem here is twofold. First, we're assuming they can release primitive shader support in a driver at all; frankly, an important feature like that should have been there at launch, and I question their ability to implement it. Second, how much would it actually fix things? If they could pull it off, it would be the first recorded instance of the magical driver update and would be worth the history books. Still, if such a feature really improved performance that significantly, Vega would have launched with it, because you only get one first impression.
 
People have been talking about Fiji being severely front-end bottlenecked by its 4 triangle per clock polygon throughput limitation for a couple of years now. Based on the new features that were being discussed for Vega, many assumed that this front-end issue would be addressed by the implementation of primitive shaders, which appear to have been designed specifically to overcome GCN's 4 triangle per clock limitation.

As it turns out, RTG somehow decided to hard-launch Vega with primitive shaders not yet enabled in drivers, and at present Vega is limited by the exact same 4 triangle per clock front-end bottleneck that Fiji was. If you take the list culled polygon throughput numbers from above and do the math, Vega is showing the exact same 3.75 triangles per clock that Fiji does in this DX11 polygon throughput test. If you look at the Vega whitepaper, the totals shown above for list culled polygon throughput are identical to those shown for the Fiji and Vega native geometry pipelines where primitive shaders are discussed:

Even this wasn't enough to convince some people, and ultimately it took Ryan Smith of Anandtech confirming on the record that primitive shaders were not enabled on any public Vega driver to get people to stop claiming they were already enabled despite all evidence to the contrary

Primitive shader is something that has to be implemented by developers.

It's not as if primitive shaders get enabled in the drivers and the Radeon RX Vega 64 suddenly performs like a GeForce GTX 1080 Ti.
 
Primitive shader is something that has to be implemented by developers.

It's not as if primitive shaders get enabled in the drivers and the Radeon RX Vega 64 suddenly performs like a GeForce GTX 1080 Ti.
According to Ryan Smith, there will be a form of driver-level automatic primitive shaders:

Quick note on primitive shaders from my end: I had a chat with AMD PR a bit ago to clear up the earlier confusion. Primitive shaders are definitely, absolutely, 100% not enabled in any current public drivers.

The manual developer API is not ready, and the automatic feature to have the driver invoke them on its own is not enabled.
 
If this driver update does what it seems it might, it's really odd AMD couldn't have had it ready for launch. Even with a slight boost it could be beating the GTX 1080 and reviews would have been very different. Something must have been really wrong for AMD to leave that performance on the table.
 
If this driver update does what it seems it might, it's really odd AMD couldn't have had it ready for launch. Even with a slight boost it could be beating the GTX 1080 and reviews would have been very different. Something must have been really wrong for AMD to leave that performance on the table.
Exactly, which leads me to believe that they can't actually do it or the improvement isn't significant at all.
 
Figured I'd post this here. Hash rate up on Vega. WCCF link below
http://wccftech.com/amd-rx-vega-64-...ereum-eclipsing-polaris-efficiency-factor-2x/


So many screw-ups in that article it's not even funny. 160 watts for the RX 580/480 is with unmodded VRAM and no undervolting. I get 140 watts while dual mining at 30 MH/s for Eth plus a second coin ;) Just Eth uses 90-100 watts. So this guy's RX Vega is not getting close to the RX 580; it's closer than before, but still nowhere near.

Not sure what the 130 watts means, because if he's using a Vega 64 with a -25% power target that thing is going to use 180 watts. The way he wrote it is a bit confusing, because he also talks about the Vega 56, which he says can do the same thing, so it sounds like he was extrapolating from a Vega 56... Anyway, if it's at 130 W it's still under the RX 580's mining efficiency, but it really does make it a viable alternative.
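
For what it's worth, the comparison boils down to hashes per watt. A quick sketch with the numbers being thrown around (the Vega figures are the article's/extrapolated ones, so treat them as assumptions):

[CODE]
# Mining efficiency = hashrate / wall power. These are the numbers being quoted in
# this discussion (tuned RX 580 vs the article's Vega figures), not measurements.
cards = {
    "RX 580 (modded, Eth only)":  (30.0,  95.0),   # ~30 MH/s at ~90-100 W
    "RX Vega 64 (article, 130W)": (43.0, 130.0),   # if the 130 W figure is real
    "RX Vega 64 (-25% power)":    (43.0, 180.0),   # more realistic wall power
}

for name, (mhs, watts) in cards.items():
    print(f"{name:28s} {mhs / watts:.2f} MH/s per watt")

# Roughly 0.30-0.33 MH/s/W for the tuned 580 vs ~0.33 (optimistic) or ~0.24 (realistic)
# for Vega: at best it only just matches a tuned RX 580, which is the point above.
[/CODE]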

And lo and behold, it's Khalid Moammer with his usual BS articles.
 
On the wild off-chance any of you are actually interested in discussing why RX Vega is so heavily bottlenecked and what the implications might be if that bottleneck could be alleviated through driver updates, rather than just trolling and garbageposting, I think there is pretty strong evidence to suggest what is holding Vega back in gaming performance right now:

j4n8ctB.png


People have been talking about Fiji being severely front-end bottlenecked by its 4 triangle per clock polygon throughput limitation for a couple of years now. Based on the new features that were being discussed for Vega, many assumed that this front-end issue would be addressed by the implementation of primitive shaders, which appear to have been designed specifically to overcome GCN's 4 triangle per clock limitation.

As it turns out, RTG somehow decided to hard-launch Vega with primitive shaders not yet enabled in drivers, and at present Vega is limited by the exact same 4 triangle per clock front-end bottleneck that Fiji was. If you take the list culled polygon throughput numbers from above and do the math, Vega is showing the exact same 3.75 triangles per clock that Fiji does in this DX11 polygon throughput test. If you look at the Vega whitepaper, the totals shown above for list culled polygon throughput are identical to those shown for the Fiji and Vega native geometry pipelines where primitive shaders are discussed:

k7J4NGu.jpg


Even this wasn't enough to convince some people, and ultimately it took Ryan Smith of Anandtech confirming on the record that primitive shaders were not enabled on any public Vega driver to get people to stop claiming they were already enabled despite all evidence to the contrary.

Vega being severely front end bottlenecked by its polygon throughput also explains why Vega 56 is 100% as fast as Vega 64 at the same clocks right now, why Vega's performance in games using large amounts of tessellation is still terrible, and may even explain Vega's terrible MSAA performance (I'm led to believe that MSAA performance is rasterization rate limited, since MSAA operates by conducting coverage and occlusion testing at higher resolutions than screen resolution). Vega's awful performance in Unigine Heaven, which features heavy use of both tessellation and MSAA, may demonstrate just how severe the issue is at present:

GXBpZ5O.png


Now, if you accept the argument that Vega is presently severely front-end bottlenecked by its polygon throughput, that presents an interesting question to think about: what happens if RTG actually manages to implement primitive shaders at even half of the effectiveness claimed in the whitepaper and Vega's polygon throughput suddenly doubles? What would the performance implications be of Vega suddenly being able to do 8 triangles per clock?


Yes, we know about all this, and Vega's polygon throughput won't "suddenly" double. AMD told TechReport that DSBR IS NOT GOING TO BE ACTIVE all the time, only in limited cases based on the application...... it was discussed here numerous times too. Also, DSBR is not what is going to help its triangles per clock; that is RPM via primitive shaders, and those have to be specially programmed for, and they are not accessible by devs right now. Maybe in the future, but still, that isn't where the problems are coming from (not all of them, as explained below).

So come again about people not knowing?

Now, when looking at PURE COMPUTE synthetic tests, DSBR (or any part of the traditional pipeline stages) will not affect compute performance, and this is where you see Vega also having problems. So it's not just the lack of DSBR that is holding it back. Geez, you sound like that guy who just got banned a month ago for shilling; damn, can't remember his user name.
 
Primitive shader is something that has to be implemented by developers.

That is apparently not correct. According to Rys Sommefeldt, primitive shaders are an automatic implementation of the NGG fast path and do not require any developer implementation. The ability for developers to customise the behaviour of the NGG fast path is a separate usage of the NGG fast path that would supersede the use of primitive shaders at the discretion of developers, once geometry pipeline customisation is implemented in drivers.

Exactly, which leads me to believe that they can't actually do it or the improvement isn't significant at all.

That is tantamount to suggesting that RTG simply lied outright in the Vega whitepaper. The whitepaper claims the feature is implemented in the internal beta 17.320 branch of the drivers. From what I've read, the current public drivers are based on 17.300.
 
Yes, we know about all this, and Vega's polygon throughput won't "suddenly" double. AMD told TechReport that DSBR IS NOT GOING TO BE ACTIVE all the time, only in limited cases based on the application...... it was discussed here numerous times too.

So come again about people not knowing?
Primitive shaders are not DSBR. Primitive shaders are an automatic implementation of Vega's new programmable geometry pipeline apparently designed to implement a conservative version of primitive culling directly into the geometry pipeline to alleviate GCN's historical bottleneck in this regard which occurs in the pipeline prior to any fixed function culling (which is why adding Polaris' PDA to Vega did not solve the front-end geometry bottleneck issues). To whatever extent DSBR would have performance benefits for Vega, they would be almost totally obscured at the present moment as Vega is bottlenecked at an earlier stage in the rendering pipeline.
 
Primitive shaders are not DSBR. Primitive shaders are an automatic implementation of Vega's new programmable geometry pipeline apparently designed to implement a conservative version of primitive culling directly into the geometry pipeline to alleviate GCN's historical bottleneck in this regard which occurs in the pipeline prior to any fixed function culling (which is why adding Polaris' PDA to Vega did not solve the front-end geometry bottleneck issues). To whatever extent DSBR would have performance benefits for Vega, they would be almost totally obscured at the present moment as Vega is bottlenecked at an earlier stage in the rendering pipeline.


I know that, was editing my post.

Primitive shaders are not where Vega's shader throughput problem is, lol; it's kinda ridiculous to even say that, because in pure compute tests it still has problems against Pascal in many tests lol. And a traditional geometry pipeline setup is not used in pure compute tests lol.

How do you explain that? Is it because we don't know when AMD is feeding us a line of BS? Or are you just trying to make excuses for Vega's poor compute performance?

Just look at mining, that is pure compute: why the hell is Vega topping out at 43 MH/s? I would like to see it on something other than Eth, just to rule out a bandwidth bottleneck, something like Zcash, where it only gets 430 sols (after modding), which is lower than a GTX 1070's 450 sols (after modding)! There ya go, its compute throughput issues rear their ugly head again. We also see this problem in synthetic compute tests, so it's not a one-time thing.
 
So many screw-ups in that article it's not even funny. 160 watts for the RX 580/480 is with unmodded VRAM and no undervolting. I get 140 watts while dual mining at 30 MH/s for Eth plus a second coin ;) Just Eth uses 90-100 watts. So this guy's RX Vega is not getting close to the RX 580; it's closer than before, but still nowhere near.

Not sure what the 130 watts means, because if he's using a Vega 64 with a -25% power target that thing is going to use 180 watts. The way he wrote it is a bit confusing, because he also talks about the Vega 56, which he says can do the same thing, so it sounds like he was extrapolating from a Vega 56... Anyway, if it's at 130 W it's still under the RX 580's mining efficiency, but it really does make it a viable alternative.

And lo and behold, it's Khalid Moammer with his usual BS articles.
Looks like really low core volts, 0.9 V-0.98 V at 1000-1100 MHz, get the lowest-power/highest-efficiency hash rate.
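
A rough way to see why those settings help so much: dynamic power scales roughly with frequency times voltage squared. A quick sketch (the stock ~1546 MHz / ~1.2 V figures are assumptions for illustration):

[CODE]
# Dynamic power scales roughly as P ~ f * V^2 (ignoring static/leakage power),
# so dropping clocks and voltage together compounds quickly.
# The stock ~1546 MHz / ~1.2 V figures are assumptions for illustration only.
def relative_power(f_mhz, volts, f_stock=1546.0, v_stock=1.20):
    return (f_mhz / f_stock) * (volts / v_stock) ** 2

for mhz, volts in [(1100, 0.98), (1050, 0.95), (1000, 0.90)]:
    print(f"{mhz} MHz @ {volts:.2f} V -> ~{relative_power(mhz, volts):.0%} of stock dynamic power")

# Ethash barely cares about core clock (it's memory-bound), so the hashrate mostly
# survives while dynamic power drops to roughly 35-50% of stock.
[/CODE]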
 
Looks like really low core volts, 0.9 V-0.98 V at 1000-1100 MHz, get the lowest-power/highest-efficiency hash rate.


Yeah, but it still gets edged out by the RX 580s, and it's much lower than the 1070s. If they get the Vega 56 to that level, yeah, it will be a nice mining card, but the 64 is still not as good.
 
That is apparently not correct. According to Rys Sommefeldt, primitive shaders are an automatic implementation of the NGG fast path and do not require any developer implementation. The ability for developers to customise the behaviour of the NGG fast path is a separate usage of the NGG fast path that would supersede the use of primitive shaders at the discretion of developers, once geometry pipeline customisation is implemented in drivers.



That is tantamount to suggesting that RTG simply lied outright in the Vega whitepaper. The whitepaper claims the feature is implemented in the internal beta 17.320 branch of the drivers. From what I've read, the current public drivers are based on 17.300.


No, he didn't explain it all. AMD needs to do the work to get it working, and that's only if the program is set up in a certain way too. It's not automatic. Currently, triangle setup in almost all games uses the traditional GS pipeline, and that cannot be changed via drivers to primitive shaders; there is no way around that. This is because the way the pixel shaders look up certain information is different from using the CUs for rendering triangles. Now, if you look at earlier AMD videos/interviews about this, the picture becomes clear: without dev intervention, primitive shaders will not help triangle-discard performance at all. On top of that, they must use RPM to achieve anything above 4 tris/clock. Those pipeline changes can't be done via driver; those are low-level triangle setup stages that affect the way an engine functions.
 
That is apparently not correct. According to Rys Sommefeldt, primitive shaders are an automatic implementation of the NGG fast path and do not require any developer implementation. The ability for developers to customise the behaviour of the NGG fast path is a separate usage of the NGG fast path that would supersede the use of primitive shaders at the discretion of developers, once geometry pipeline customisation is implemented in drivers.



That is tantamount to suggesting that RTG simply lied outright in the Vega whitepaper. The whitepaper claims the feature is implemented in the internal beta 17.320 branch of the drivers. From what I've read, the current public drivers are based on 17.300.
Read what I quoted from Anandtech: primitive shaders are currently not implemented at all in public drivers.
 
Read what I quoted from Anandtech: primitive shaders are currently not implemented at all in public drivers.
I'm aware of this, and in fact linked to Ryan Smith's comments on the subject in my original post. I'm not seeing how that contradicts anything I said in my post though?
 
I'm aware of this, and in fact linked to Ryan Smith's comments on the subject in my original post. I'm not seeing how that contradicts anything I said in my post though?
It's not turned on; you claimed that amounts to AMD lying, but the difference is internal betas vs. what's publicly available. Maybe I didn't understand what it was you were trying to say.
 
It's hard to blame them. Radeon RX Vega 64 is a catastrophe in every single possible way.

There is no way anyone can sugarcoat this.

Compared to the GeForce GTX 1080,

Radeon RX Vega 64 is...

1. Cheaper? ✗

2. Faster? ✗

3. More power efficient? ✗

______________________________________________________________________

I agree Mockingbird. I've owned a number of Radeon cards over the years and was so excited about the Ryzen release that I bought a 1800x the first weekend and wanted to be sure I would have an all AMD high end rig to match my 5960x/GTX1080. I also have a 6700k/GTX980TI rig. Previously had 2 R9-290s in CF in my 5960x rig but went from them to the 980TI instead of a FuryX since I custom water cool. Then I went to a GTX1080 when it came out and sold the 2 R9-290s. In fact I bought a RX480 and then a second all of last year to "hold on" for an all AMD rig to stay up with my single gpu GTX1080 and GTX980TI rigs.

Sadly, and I mean sadly, when Vega Frontier was released I had a premonition that my "hope" of a High end AMD rig to beat my 5960x/GTX1080 seemed to be slipping away. Delay after delay after delay.

I finally bit the bullet and moved my GTX1080 with EK waterblock to my 1800X rig (all 3 of my rigs are both CPU and GPU custom watercooled). Gigabyte had recently launched a GTX1080TI Aorus Extreme with an attached full waterblock. It was salty ($850), but guess what? Unless I could snag an RX Vega 64 for $599 and then add a waterblock at ~$120 for a total of ~$719, I am WAY ahead with the GTX1080TI. In fact the only RX Vega 64 at Newegg is $699, so adding the waterblock only puts it about $30 cheaper than the GTX1080TI.

I just moved the GTX1080 over to the 1800X rig, where it more or less equals an RX Vega 64, and upgraded to a GTX1080TI.

So yes, the bitterness is probably from a lot of frustrated AMD fans like me.

Seriously, I wish those who bought RX Vegas well. Just so sad to see such a poor overall launch by the RTG division when the cpu division was Stellar with Ryzen and now Threadripper.

It would be pretty hard to justify buying a Radeon RX Vega 64 over a Geforce GTX 1080.
 
Yes, we know about all this, and Vega's polygon throughput won't "suddenly" double. AMD told TechReport that DSBR IS NOT GOING TO BE ACTIVE all the time, only in limited cases based on the application...... it was discussed here numerous times too. Also, DSBR is not what is going to help its triangles per clock
Nowhere in my original post did I mention DSBR, so I have no idea why you are referring to it here?
that is RPM via primitive shaders
Rapid packed math has nothing whatsoever to do with primitive shaders. RPM is a feature of Vega's compute units, whereas primitive shaders are a specific configuration for Vega's next-generation geometry engines. I have no idea why you would mention RPM in this context.
Now, when looking at PURE COMPUTE synthetic tests, DSBR (or any part of the traditional pipeline stages) will not affect compute performance, and this is where you see Vega also having problems.
Would you care to actually present any evidence for the assertion that Vega has problems in pure compute tests? And at least some semblance of logic for how the problems you claim Vega has with compute performance would impact Vega's performance in games?
So it's not just the lack of DSBR that is holding it back.
Literally no one mentioned DSBR but you, can you please at least try and engage with the claims that were actually made rather than those you invent?
Primitive shaders are not where Vega's shader throughput problem is, lol; it's kinda ridiculous to even say that, because in pure compute tests it still has problems against Pascal in many tests lol. And a traditional geometry pipeline setup is not used in pure compute tests lol.
Please feel free to actually present some evidence for your claims, and actually engage the evidence I presented for a front-end geometry bottleneck in Vega and refute it.
How do you explain that? Is it because we don't know when AMD is feeding us a line of BS? Or are you just trying to make excuses for Vega's poor compute performance?
How can I explain something you've neither bothered to define nor present any evidence of?
Just look at mining, that is pure compute: why the hell is Vega topping out at 43 MH/s? I would like to see it on something other than Eth, just to rule out a bandwidth bottleneck, something like Zcash, where it only gets 430 sols (after modding), which is lower than a GTX 1070's 450 sols (after modding)! There ya go, its compute throughput issues rear their ugly head again. We also see this problem in synthetic compute tests, so it's not a one-time thing.
Ethereum hashing rate is well known to be predominantly memory bandwidth and memory latency bound. I have no idea why you would mention it as evidence of sub-par compute performance. Zcash is also apparently a memory-bandwidth-bound workload, so I don't see how it would constitute evidence of Vega's compute performance? According to Anandtech's compute performance testing and TechGage's compute performance testing, I see nothing to suggest RX Vega has issues with compute performance?
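
For context on the bandwidth-bound point, here's the back-of-envelope for Ethash (the bytes-per-hash figure comes from how Ethash walks the DAG; the bandwidth numbers are approximate reference specs):

[CODE]
# Ethash does 64 random DAG reads of a 128-byte page per hash -> ~8 KiB of DRAM
# traffic per hash, so the hashrate ceiling is set by memory bandwidth, not ALUs.
bytes_per_hash = 64 * 128   # 8192 bytes

def ethash_ceiling_mhs(bandwidth_gb_s):
    return bandwidth_gb_s * 1e9 / bytes_per_hash / 1e6

for name, bw in [("RX Vega 64 (HBM2, ~484 GB/s)", 484),
                 ("GTX 1070 (GDDR5, ~256 GB/s)", 256)]:
    print(f"{name}: theoretical ceiling ~{ethash_ceiling_mhs(bw):.0f} MH/s")

# Vega's ~43 MH/s is ~70% of its ~59 MH/s bandwidth ceiling (random accesses never
# hit peak bandwidth), which is what you expect from a memory-bound workload.
[/CODE]
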
No, he didn't explain it all. AMD needs to do the work to get it working, and that's only if the program is set up in a certain way too. It's not automatic. Currently, triangle setup in almost all games uses the traditional GS pipeline, and that cannot be changed via drivers to primitive shaders; there is no way around that. This is because the way the pixel shaders look up certain information is different from using the CUs for rendering triangles. Now, if you look at earlier AMD videos/interviews about this, the picture becomes clear: without dev intervention, primitive shaders will not help triangle-discard performance at all.
So both Ryan Smith and Rys Sommefeldt are lying? Both claim that primitive shaders will work as a pre-defined configuration for Vega's next-generation geometry engine and will not require any specific implementation by developers to use. Do you have any actual evidence to contradict this?
On top of that, they must use RPM to achieve anything above 4 tris/clock. Those pipeline changes can't be done via driver; those are low-level triangle setup stages that affect the way an engine functions.
Once again, RPM has nothing to do with primitive shaders. In addition, the Vega whitepaper explains exactly why, in theory, it should be entirely possible for RTG to implement primitive shaders via drivers: The NGG fast path is made up of generalized non-compute shaders that can be configured almost arbitrarily by drivers. Primitive shaders are one pre-defined configuration for the NGG fast path being developed by RTG to operate automatically, once implemented. It is only in the case where developers wish to override this configuration of the NGG fast path that they would then be required to define their own custom geometry pipeline instead.
 
It's not turned on; you claimed that amounts to AMD lying, but the difference is internal betas vs. what's publicly available. Maybe I didn't understand what it was you were trying to say.
I meant that suggesting primitive shaders won't eventually be implemented in the public driver, or that once they are implemented they won't significantly increase polygon throughput, would be equivalent to suggesting RTG was lying in the Vega whitepaper. Apologies if I misunderstood you, but I thought you were claiming that RTG might not succeed in implementing primitive shaders and/or that they wouldn't do much once implemented.
 
Nowhere in my original post did I mention DSBR, so I have no idea why you are referring to it here?

You didn't; I did.

Rapid packed math has nothing whatsoever to do with primitive shaders. RPM is a feature of Vega's compute units, whereas primitive shaders are a specific configuration for Vega's next-generation geometry engines. I have no idea why you would mention RPM in this context.

SM 6.0 hasn't been released yet, and the demo AMD showed off to show its "improved" primitive discard used RPM and primitive shaders. So they must need primitive shaders to utilize RPM and primitive discard together; otherwise they wouldn't have shown that demo the way they did. It was a demo showing off hair.
Would you care to actually present any evidence for the assertion that Vega has problems in pure compute tests? And at least some semblance of logic for how the problems you claim Vega has with compute performance would impact Vega's performance in games?

I just gave you 3, actually; if we count synthetic compute tests, many more examples than that. If you want to look them up, I suggest you do, because I can list them out.
Literally no one mentioned DSBR but you, can you please at least try and engage with the claims that were actually made rather than those you invent?

Nothing AMD has stated, Rys included, explains how these are done automagically, because they can't be done automagically. Nor are they merely deactivated in drivers because the drivers aren't ready; creating GPU drivers doesn't work that way. I have posted numerous YouTube videos on how drivers are created well before tape-out, because that HAS TO BE DONE THAT WAY TO ENSURE THE CHIP IS WORKING PROPERLY when the chip comes back from manufacturing, which greatly reduces cost.

Please feel free to actually present some evidence for your claims, and actually engage the evidence I presented for a front-end geometry bottleneck in Vega and refute it.

I have already done this numerous times. FEEL FREE TO SHOW EVIDENCE OF ANY PREVIOUS-GENERATION CHIPS HAVING THE SAME DAMN EXCUSES when things don't go right. I CAN DO THAT TOO; I don't need to look too far.

How can I explain something you've neither bothered to define nor present any evidence of?

You don't need to. What I stated about Vega over a year ago has all come true: all the features and whatnot are BS and won't help them do anything in this market, because they can't. And that is because I understand how the graphics and compute pipelines work.

Ethereum hashing rate is well known to be predominantly memory bandwidth and memory latency bound. I have no idea why you would mention it as evidence of sub-par compute performance. Zcash is also apparently a memory-bandwidth-bound workload, so I don't see how it would constitute evidence of Vega's compute performance? According to Anandtech's compute performance testing and TechGage's compute performance testing, I see nothing to suggest RX Vega has issues with compute performance?


Zcash is not memory-bandwidth bound; how do you explain an RX 580 getting only 300 sols? It has similar bandwidth to the 1070 after modding. Oops, didn't know that? Also, the RX 580 is back down to GTX 1060 levels in Zcash performance, and the GTX 1060 has much less memory bandwidth after modding. Sorry, your understanding seems to be limited when it comes to where the bottlenecks are with mining.....

So both Ryan Smith and Rys Sommefeldt are lying? Both claim that primitive shaders will work as a pre-defined configuration for Vega's next-generation geometry engine and will not require any specific implementation by developers to use. Do you have any actual evidence to contradict this?

Rys, yes he is; he knows more and can't talk about it, I'm 100% positive of that. Rys is an AMD employee and has to hold his tongue when it comes to certain things.

Ryan, no; he just didn't get all the info.

Once again, RPM has nothing to do with primitive shaders. In addition, the Vega whitepaper explains exactly why, in theory, it should be entirely possible for RTG to implement primitive shaders via drivers: The NGG fast path is made up of generalized non-compute shaders that can be configured almost arbitrarily by drivers. Primitive shaders are one pre-defined configuration for the NGG fast path being developed by RTG to operate automatically, once implemented. It is only in the case where developers wish to override this configuration of the NGG fast path that they would then be required to define their own custom geometry pipeline instead.

NONE OF IT IS AUTOMATIC, I can guarantee you that. I have been programming graphics for many years now; the compute and graphics pipelines' triangle preparation are way too different.

They can't be interchanged on the fly! IT IS IMPOSSIBLE! If it were easy, or even possible, AMD would not have worked with DICE to rework primitive discard in software in their engine. They could have done it easily in drivers if that were the case. BUT THEY DIDN'T, nor DID THEY DO IT WITH ANY OTHER GAME.

So don't sit here and ask for evidence when you spout Rys's BS as your evidence; he is an AMD mouthpiece just like anyone else who works at any company. They represent that company in the best light possible, otherwise they get in trouble. Why would AMD keep Rys around if he stated all those features are useless? And currently, without exposing them to developers, they are absolutely useless. Since you haven't bothered to check out AMD's demos about those things, it's kinda pointless to talk with you, right? Told ya where to look. That's why you haven't seen any advantages in Vega and NEVER WILL. Of course, being the way you are, you won't even look for anything even after being told it was shown by AMD themselves. If you want more specifics, search YouTube for Raja, RPM, primitive shaders, Vega.

Also, your posting habits mimic that guy who got banned to a T, even the sentence structure and the illusion that AMD can deliver performance down the road and no one here wants to talk about it.

There has never been a GPU released or made, since the 9700 from ATI and the Riva from nV, without the drivers being done before tape-out! ALL FUNCTIONALITY WAS DONE and working before tape-out.

From those times on, ATi (starting with the 9700) used emulators to verify functionality and debug silicon and drivers prior to tape-out. (nV did this two generations earlier with the Riva, because they were forced to; they had run out of money to pay for multiple tape-outs.)
 
I meant that suggesting primitive shaders won't eventually be implemented in the public driver, or that once they are implemented they won't significantly increase polygon throughput, would be equivalent to suggesting RTG was lying in the Vega whitepaper. Apologies if I misunderstood you, but I thought you were claiming that RTG might not succeed in implementing primitive shaders and/or that they wouldn't do much once implemented.
I don't think it will be the silver bullet; if it were, they would have implemented it at launch. You only get one launch, and people disappointed with Vega will go NVIDIA; you won't get those people back in 3-6 months with a magical driver update. If it actually increased performance enough, they would have just kept delaying Vega 64.
 
I don't think it will be the silver bullet; if it were, they would have implemented it at launch. You only get one launch, and people disappointed with Vega will go NVIDIA; you won't get those people back in 3-6 months with a magical driver update. If it actually increased performance enough, they would have just kept delaying Vega 64.
I have no idea why they chose to launch RX Vega with the drivers in this state, and I certainly wouldn't have in their shoes, but I don't think you can infer from them choosing to launch the card with the drivers literally half-finished that primitive shaders won't achieve some proportion of what they claim it will in the whitepaper once implemented. They clearly launched Vega FE on what amounted to alpha drivers to keep their 1H 2017 promise to shareholders, and no one would assume that Vega FE presented the upper ceiling of how WX Vega will perform in pro-applications when it launches in September based on Vega FE's launch day pro application performance, right?

Basically, I think RTG was trying to appease their shareholders at the cost of horrible PR on initial launch, which is idiotic in my eyes, but I don't think that they would have changed their minds on when to launch RX Vega based on what primitive shaders might or might not deliver in a future driver update.
 
Has RTG themselves claimed any sort of massive improvements inbound? Last I heard they compared Vega 64 to a 1080 and didn't mention massive future improvements?

Don't know why the chart got in there as I posted the website link. Messed up.

My point was going to be if they put a 1080ti on the chart Vega doesn't look so great....
 
Has RTG themselves claimed any sort of massive improvements inbound? Last I heard they compared Vega 64 to a 1080 and didn't mention massive future improvements?


No, they didn't, and not only that, they avoid the topic entirely. Prior to launch, when Scott Wasson was talking about primitive discard and DSBR and the Witcher 3 came up, he avoided the question, because it's a per-application optimization done by developers; it can't be done via drivers.

If they could, they would do it and have review sites re-bench all titles that use HairWorks and GameWorks god rays. Why not? GameWorks is just two libraries, and if they did that, Fallout 4 and Witcher 3 should be equalized on Vega when using x64 tessellation amounts. That would be an awesome promotional victory for AMD, but they won't do it because it can't be done. Shit, it's not like these companies don't use shader replacement; it shouldn't be hard for them to do, but this is at a much lower level than just shader replacement. It's actually data changes; the engine has to be modified to take advantage of the features.
 
I have no idea why they chose to launch RX Vega with the drivers in this state, and I certainly wouldn't have in their shoes, but I don't think you can infer from them choosing to launch the card with the drivers literally half-finished that primitive shaders won't achieve some proportion of what they claim it will in the whitepaper once implemented. They clearly launched Vega FE on what amounted to alpha drivers to keep their 1H 2017 promise to shareholders, and no one would assume that Vega FE presented the upper ceiling of how WX Vega will perform in pro-applications when it launches in September based on Vega FE's launch day pro application performance, right?

Basically, I think RTG was trying to appease their shareholders at the cost of horrible PR on initial launch, which is idiotic in my eyes, but I don't think that they would have changed their minds on when to launch RX Vega based on what primitive shaders might or might not deliver in a future driver update.


They chose to do it because they couldn't do any more, lol, period. They had over 3 years to work on these drivers, yes, 3 years! As iterations of the design are done and emulated, driver work is being done too. If they can't get things working before tape-out, they will never get them working. These things go hand in hand, because they have to; otherwise AMD or nV would have to re-tape-out chips multiple times, and a tape-out is not a cheap process, it's 20 million bucks or so each time. Does AMD have that kind of money to waste? nV has it, and even they don't do it!

The reasoning is simple: they need the drivers to ensure the emulation of the chip is working properly before tape-out, so at that point they know, from a design perspective, that the chip is going to function the way they designed it. Then, if it comes back from manufacturing and parts are not working, it won't be a design issue, it's a manufacturing issue, and they can just respin chips or whatnot to get things working the way they want.

For a good example of this, take the R600: there was a problem with one of the libraries they were using, and it gave the R600 clocking issues. Emulation will not pick that up, because it's only looking at the design of the chip, not the manufacturing process. So while they were respinning the chip, they had a team verifying everything, and finally someone found the problem. It took 6 months to find. This is another reason why the design and drivers must be finalized prior to tape-out: if they aren't, they wouldn't know where to look, and that 20 million per tape-out (including getting the first few batches back from the fabs) will explode into the hundreds of millions. For the R600, they had something like 7 or more respins with metal and mask changes.
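
Just to spell out the cost math being described (the figures are the rough ones quoted above, not audited numbers):

[CODE]
# Rough cost of iterating in silicon instead of catching issues in emulation.
cost_per_tapeout = 20e6   # the ~$20M-per-tape-out figure quoted above
respins = 7               # roughly what the R600 is said to have gone through

print(f"~${respins * cost_per_tapeout / 1e6:.0f}M burned on respins alone")
# ~$140M before counting the ~6 months of schedule slip, which is why the driver
# and emulation work is supposed to be finished before the first tape-out.
[/CODE]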
 
SM 6.0 hasn't been released yet, and the demo AMD showed off to show its "improved" primitive discard used RPM and primitive shaders. So they must need primitive shaders to utilize RPM and primitive discard together; otherwise they wouldn't have shown that demo the way they did. It was a demo showing off hair.
What demo are you referring to? Also, what was or was not in a demo is not an argument as to why the use of primitive shaders would require RPM or vice versa.
I just gave you 3, actually; if we count synthetic compute tests, many more examples than that. If you want to look them up, I suggest you do, because I can list them out.
Yes, please provide a list of links to them as I would actually like to see them.
Nothing AMD has stated, Rys included, explains how these are done automagically, because they can't be done automagically. Nor are they merely deactivated in drivers because the drivers aren't ready; creating GPU drivers doesn't work that way. I have posted numerous YouTube videos on how drivers are created well before tape-out, because that HAS TO BE DONE THAT WAY TO ENSURE THE CHIP IS WORKING PROPERLY when the chip comes back from manufacturing, which greatly reduces cost.
What on earth are you talking about? Automatic geometry culling is already commonly done by fixed function culling like Polaris' and Vega's primitive discard accelerator. Vega's whitepaper describes that primitive shaders implement similar geometry culling techniques to those used by fixed function culling, but at an earlier stage in the rendering pipeline:
In a typical scene, around half of the geometry will be discarded through various techniques such as frustum culling, back-face culling, and small-primitive culling. The faster these primitives are discarded, the faster the GPU can start rendering the visible geometry. Furthermore, traditional geometry pipelines discard primitives after vertex processing is completed, which can waste computing resources and create bottlenecks when storing a large batch of unnecessary attributes. Primitive shaders enable early culling to save those resources.

The “Vega” 10 GPU includes four geometry engines which would normally be limited to a maximum throughput of four primitives per clock, but this limit increases to more than 17 primitives per clock when primitive shaders are employed.⁷ Primitive shaders can operate on a variety of different geometric primitives, including individual vertices, polygons, and patch surfaces. When tessellation is enabled, a surface shader is generated to process patches and control points before the surface is tessellated, and the resulting polygons are sent to the primitive shader. In this case, the surface shader combines the vertex shading and hull shading stages of the Direct3D graphics pipeline, while the primitive shader replaces the domain shading and geometry shading stages.
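
To make the stage substitution in that passage concrete, here's a rough sketch of the mapping as I read it (my own paraphrase of the whitepaper, not anything from AMD's drivers):

[CODE]
# Direct3D stage -> where the whitepaper text above says it lands in the NGG fast
# path when tessellation is enabled. This is a reading aid, not driver behaviour.
D3D_TO_NGG = {
    "Vertex Shader":   "Surface Shader (merged VS + HS)",
    "Hull Shader":     "Surface Shader (merged VS + HS)",
    "Tessellator":     "fixed-function tessellation (unchanged)",
    "Domain Shader":   "Primitive Shader (merged DS + GS, early culling)",
    "Geometry Shader": "Primitive Shader (merged DS + GS, early culling)",
}

for d3d_stage, ngg_stage in D3D_TO_NGG.items():
    print(f"{d3d_stage:16s} -> {ngg_stage}")

# The claimed win: frustum/back-face/small-primitive culling happens inside the merged
# primitive shader, before the fixed-function front end sees the triangles, which is
# how the 4-per-clock limit is supposed to rise toward the quoted 17 per clock.
[/CODE]
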
Please explain which aspects of what is described here cannot be implemented automatically in drivers and why you believe they cannot be.
Zcash is not memory-bandwidth bound; how do you explain an RX 580 getting only 300 sols? It has similar bandwidth to the 1070 after modding. Oops, didn't know that? Also, the RX 580 is back down to GTX 1060 levels in Zcash performance, and the GTX 1060 has much less memory bandwidth after modding. Sorry, your understanding seems to be limited when it comes to where the bottlenecks are with mining.....
Zcash is described as memory hard directly on their own webpage for heaven's sake.
Rys, yes he is; he knows more and can't talk about it, I'm 100% positive of that. Rys is an AMD employee and has to hold his tongue when it comes to certain things.
So you are just going to claim that he is lying without any evidence whatsoever?
Also, your posting habits mimic that guy who got banned to a T, even the sentence structure and the illusion that AMD can deliver performance down the road and no one here wants to talk about it.
Don't waste your time on ad hominem attacks, I'll simply ignore them. I prefer to keep any discussion to the substance, thanks.
There has never been a GPU released or made, since the 9700 from ATI and the Riva from nV, without the drivers being done before tape-out! ALL FUNCTIONALITY WAS DONE and working before tape-out.

From those times on, ATi (starting with the 9700) used emulators to verify functionality and debug silicon and drivers prior to tape-out. (nV did this two generations earlier with the Riva, because they were forced to; they had run out of money to pay for multiple tape-outs.)
Can you please link to some evidence to support your claim that drivers have to be completed prior to tape out?
 
Has RTG themselves claimed any sort of massive improvements inbound? Last I heard they compared Vega 64 to a 1080 and didn't mention massive future improvements?

Of course not, that's what AMD has its fanboys do.

Eventually, those supposedly massive improvements don't materialize.

AMD then says it's free of blame because it never made such a claim.
 
What demo are you referring to? Could you at least link something like that if you are going to make claims based on it? Also, what was or was not in a demo is not an argument as to why the use of primitive shaders would require RPM or vice versa.

Yes, please provide a list of links to them as I would actually like to see them. I am not psychic and cannot divine what you are basing your assertions on.

What on earth are you talking about? Automatic geometry culling is already commonly done by fixed function culling like Polaris' and Vega's primitive discard accelerator. Vega's whitepaper describes that primitive shaders implement similar geometry culling techniques to those used by fixed function culling, but at an earlier stage in the rendering pipeline:

Please explain which aspects of what is described here cannot be implemented automatically in drivers and why you believe they cannot be.

This isn't even an argument pertaining to the current evidence, it's just incoherent shouting.

This is also not an even an argument relating to the current discussion.

Zcash is described as memory hard directly on their own webpage for heaven's sake.

So you are just going to claim that he is lying without any evidence whatsoever?

This is an incoherent mess of assertions made without any evidence whatsoever, and I won't be responding to it. Please take the time to make a legible post and actually articulate explanations for your claims if you would like me to respond to them.

Don't waste your time on ad hominem attacks, I'll simply ignore them. I prefer to keep any discussion to the substance, thanks.

Can you please link to some evidence to support your claim that drivers have to be completed prior to tape out?

Just search on YouTube; you will find 2 videos, both from nV, on how driver development and chip emulation are done.

The video about RPM and primitive discard: I told you where to find that. YouTube search: Raja, RPM, Vega, discard. You should be able to find it; I just did.

Those are not ad hominem attacks. Ad hominem attacks are attacks on you personally, at a personal level, that debase you as a person, get it? I have not attacked you; I have shown that you don't know what you speak of by giving examples where your logic fails. That is not a personal attack; that is an attack on what you state, not on you.......

What am I talking about? You don't remember DICE's presentation on how to get better primitive discard with the Frostbite 3 engine? Yeah, they had a huge presentation about it, and this is why that engine doesn't hurt AMD on polygon throughput...... Look it up, it's easy to find. This is what AMD used as an example for the changes they made in Polaris to help their front end. A lot of good that did! It still has polygon throughput problems compared to current-gen nV cards. They did finally catch up to last-gen nV cards of the same bracket, though.

I know Rys is lying; well, not saying everything isn't exactly lying, it's just not telling the whole story. But I know him, have known him for years, and the way he sees and posts things has changed since he started working at RTG!

About "memory hard": you didn't read what the Equihash algo actually requires, did you?

The memory requirements for testnet mining are currently set very low, because the implementation is not optimised. So currently, anyone who was mining before can still mine.

They aren't talking about bandwidth, they are talking about memory amounts! Your assumption that it was bandwidth is a fallacy. I didn't even need to look it up, because I have been mining different Equihash coins on multiple rigs and know what is going on.

Again, you didn't know this; it's within your own link.
Once we have optimised the implementation (#857), we will bump up the memory requirements to around 1 GB of RAM (#856),


https://z.cash/blog/why-equihash.html

Nowhere do they talk about bandwidth lol, yeah, thought so, nowhere! Because it's not bandwidth-limited at the moment and won't be for at least a year, not even on 4 GB cards, let alone 8 GB; that will be 2 years down the road.

So, another one to add to your list of being wrong? You are assuming things without understanding the fundamentals; yeah, AMD marketing will get ya every time if you do that, because you assume they are telling you the truth, or in this case evading it by telling half-truths. Don't sit here and tell me that I don't know what I'm talking about and that it's incoherent when you can't even read what's inside your own link (well, linked to within your link).

PS: Everything I have been talking about, and that you are asking me to point out, I have pointed out in this very thread a number of times, so you can find it if you read the thread. So if you feel inclined to demand proof that is already in the very thread, you aren't doing your due diligence. You didn't even do your due diligence with Zcash, in your own link. I really should just re-link and post about it all in one post, but that's a lot of work and would take pages; not in the mood to do that.
 
What demo are you referring to? Could you at least link something like that if you are going to make claims based on it? Also, what was or was not in a demo is not an argument as to why the use of primitive shaders would require RPM or vice versa.

Yes, please provide a list of links to them as I would actually like to see them. I am not psychic and cannot divine what you are basing your assertions on.

What on earth are you talking about? Automatic geometry culling is already commonly done by fixed function culling like Polaris' and Vega's primitive discard accelerator. Vega's whitepaper describes that primitive shaders implement similar geometry culling techniques to those used by fixed function culling, but at an earlier stage in the rendering pipeline:

Please explain which aspects of what is described here cannot be implemented automatically in drivers and why you believe they cannot be.

This isn't even an argument pertaining to the current evidence, it's just incoherent shouting.

This is also not an even an argument relating to the current discussion.

Zcash is described as memory hard directly on their own webpage for heaven's sake.

So you are just going to claim that he is lying without any evidence whatsoever?

This is an incoherent mess of assertions made without any evidence whatsoever, and I won't be responding to it. Please take the time to make a legible post and actually articulate explanations for your claims if you would like me to respond to them.

Don't waste your time on ad hominem attacks, I'll simply ignore them. I prefer to keep any discussion to the substance, thanks.

Can you please link to some evidence to support your claim that drivers have to be completed prior to tape out?

It's hilarious that you want more substance from other people but your posts have like a 1% ratio of useful data to fluff.

Considering AMD isn't touting the "fine wine" kind of language of your original post, and they love to exaggerate things, I'm extremely skeptical of any large future improvements. I enjoy tracking history and correlating it to the present and future from a high level. I have never seen AMD pull off something like this, and they've given themselves enough setups for it.

If by some chance they do update the drivers six months from now, Volta will pummel it anyway. If I were them I would call Vega a wash, get what they can before mining crashes, and focus everything on Navi.

Honestly I'd like to see proof from you that AMD is actually planning on increasing the throughput in an automatic way as you are claiming.
 
I have no idea why they chose to launch RX Vega with the drivers in this state, and I certainly wouldn't have in their shoes, but I don't think you can infer from them choosing to launch the card with the drivers literally half-finished that primitive shaders won't achieve some proportion of what they claim it will in the whitepaper once implemented. They clearly launched Vega FE on what amounted to alpha drivers to keep their 1H 2017 promise to shareholders, and no one would assume that Vega FE presented the upper ceiling of how WX Vega will perform in pro-applications when it launches in September based on Vega FE's launch day pro application performance, right?

Basically, I think RTG was trying to appease their shareholders at the cost of horrible PR on initial launch, which is idiotic in my eyes, but I don't think that they would have changed their minds on when to launch RX Vega based on what primitive shaders might or might not deliver in a future driver update.
That is not how you launch a product, especially a performance-based product. You are upfront that your card's greatest performance is locked behind a feature not yet implemented, so you can at least drive sales. What you don't do is go quiet about a feature and launch the card praying for the best. You only get one launch. Very few outlets are going to bother to retest the card after this supposed driver comes out, not to mention all the consumers who decided to just go GTX 1080/1080 Ti based on Vega's current performance are not going to suddenly buy another card 3-6 months later. How does releasing a gimped product help shareholders?
 
Again, chip emulation and drivers





Both of these links go through this (for the second link, the first half of the video): nV has been doing emulation and driver-development finalization prior to tape-out starting with the Riva 128. ATi started doing this with the 9700 Pro after all the problems it had with the 8500's drivers.



The RPM / primitive discard / primitive shaders demo: you need to listen to the whole video, because he first mentions the demo at the beginning of the presentation by saying he will show it later, so the information is kind of split across the video. The pertinent part is at about 22 minutes.


Frostbite presentation on culling



Pretty sure this was it; there was more than one presentation on this. But nV's architecture does this automatically, AMD's not so well, and this is why DICE incorporated it into their engine for GCN.

Just from understanding how these guys emulate chips and create drivers prior to tape-out, you will know there is no magic that is going to happen with features via drivers. It's like when nV stated async compute was disabled in Maxwell drivers: yeah, because after the first instance of doing it, the next instance broke the driver lol; it couldn't reallocate the SM without flushing it. If they were capable of doing that, they would not have disabled it in the first place.
 