Ashes of the Singularity Day 1 Benchmark Preview @ [H]

Ieldra · Apr 4, 2016

jamesgalb said:
Still interested in r7 360 vs r7 370 and gtx 950, to find out how much the ASync closes the gap///

Also r9 380x vs GTX 970...

Zotac GeForce GTX 970 AMP! Extreme Core review

razor1 · Apr 4, 2016

Zion Halcyon said:
The way I looked at it was from the information available. And I formed an opinion based on what was available, even with the caveat of "we need to wait and see", but everything I put forth was based on logic from what has been given.

You put forth a narrative that disregards a bunch of stuff that was out there "just to prove a point", and it was poorly executed. And it was an attempt to change the argument from NVidia potentially missing the boat on Async for a span of a few years to more of a commentary on my commentary, which yes, would be a strawman, as it is an attempt to take something that, based on what we know, you cannot refute, and change it into an argument you think you can win, while trying to be persuasive about it being the same argument, when it is in fact not. In other words, a CLASSIC strawman.

Stop the Bullcrap.

Stop the bullcrap Ok, tell me where I stated nV doesn't have issues with async right now? They never missed the boat on async, because its only after DX12 (vulkan) has been introduced has it became a problem for end users, so the last 6 months. Yeah Maxwell doesn't seem to be well suited for async. That has by no means tell us anything on Pascal on async.

Does the fact AMD downplayed CR's have anything to do with them not having CR's in Polaris?

Does the fact AMD downplayed tessellation this generation, they are still going to have problems with Tessellation with Polaris? OH wait its been 5 generations (not even 5 years, 5 Freakin generations!) since AMD has had poor tessellation performance (geometry through put)

Stupidity has no bounds when making assertions based on marketing. Marketing is the tool to fool idiots to buy certain products regardless of their features or uses.

Come on, you want to say your story and narrative is correct, but there is absolutely NO WAY you know that (nor can I say it won't), unless you are in bed doing the horizontal mambo with one of the team members that is making the damn thing.

Lets put your time line theory to good use on CR's and AMD, when was DX12.1 introduced,? It was 8 months ago. Ok, you say IHV's need 2 to 3 years to make the changes to their architecture to put this feature in. When was DX12.1 had its feature set set? I would think some time after Maxwell 2 was put into play right? When would AMD know this only after Maxwell 2 was launched. So the gtx 980 and 970 was launched in......

Sept 2014, then Dx12.1 came after, so that is less than 2 years? AMD doesn't have any hardware that can do CR's unlike nV which as the hardware but doesn't seem to be functioning to do async when doing it under direct compute...... Yeah AMD will not be able to get CR's into Polaris they are screwed.

This is your same theory of timelines..... WTF does this all mean, what if Polaris comes out and has CR's... errr yeah guess what something was wrong, either they can get things done quicker than the 2 to 3 years you stated, or when the IHV's knew about what was upcoming in an API was know well before hand. Or both are incorrect.....

Ieldra · Apr 4, 2016

razor1 said:
Stop the bullcrap Ok, tell me where I stated nV doesn't have issues with async right now? They never missed the boat on async, because its only after DX12 (vulkan) has been introduced has it became a problem for end users, so the last 6 months. Yeah Maxwell doesn't seem to be well suited for async. That has by no means tell us anything on Pascal on async.

Does the fact AMD downplayed CR's have anything to do with them not having CR's in Polaris?

Does the fact AMD downplayed tessellation this generation, they are still going to have problems with Tessellation with Polaris? OH wait its been 5 generations (not even 5 years, 5 Freakin generations!) since AMD has had poor tessellation performance (geometry through put)

Stupidity has no bounds when making assertions based on marketing. Marketing is the tool to fool idiots to buy certain products regardless of their features or uses.

Come on, you want to say your story and narrative is correct, but there is absolutely NO WAY you know that (nor can I say it won't), unless you are in bed doing the horizontal mambo with one of the team members that is making the damn thing.

Lets put your time line theory to good use on CR's and AMD, when was DX12.1 introduced,? It was 8 months ago. Ok, you say IHV's need 2 to 3 years to make the changes to their architecture to put this feature in. When was DX12.1 had its feature set set? I would think some time after Maxwell 2 was put into play right? When would AMD know this only after Maxwell 2 was launched. So the gtx 980 and 970 was launched in......

Sept 2014, then Dx12.1 came after, so that is less than 2 years? AMD doesn't have any hardware that can do CR's unlike nV which as the hardware but doesn't seem to be functioning to do async when doing it under direct compute...... Yeah AMD will not be able to get CR's into Polaris they are screwed.

'AMD will use their usual bruteforce methods' xD

I also wanted to mention something in case some of you don't know, the only effective difference between work run on the graphics queue and work run on the compute queues is that the latter won't have access to the geometry units

Zion Halcyon · Apr 4, 2016

razor1 said:
Stop the bullcrap Ok, tell me where I stated nV doesn't have issues with async right now? They never missed the boat on async, because its only after DX12 (vulkan) has been introduced has it became a problem for end users, so the last 6 months. Yeah Maxwell doesn't seem to be well suited for async. That has by no means tell us anything on Pascal on async.

Does the fact AMD downplayed CR's have anything to do with them not having CR's in Polaris?

Does the fact AMD downplayed tessellation this generation, they are still going to have problems with Tessellation with Polaris? OH wait its been 5 generations (not even 5 years, 5 Freakin generations!) since AMD has had poor tessellation performance (geometry through put)

Stupidity has no bounds when making assertions based on marketing. Marketing is the tool to fool idiots to buy certain products regardless of their features or uses.

Come on, you want to say your story and narrative is correct, but there is absolutely NO WAY you know that (nor can I say it won't), unless you are in bed doing the horizontal mambo with one of the team members that is making the damn thing.

Lets put your time line theory to good use on CR's and AMD, when was DX12.1 introduced,? It was 8 months ago. Ok, you say IHV's need 2 to 3 years to make the changes to their architecture to put this feature in. When was DX12.1 had its feature set set? I would think some time after Maxwell 2 was put into play right? When would AMD know this only after Maxwell 2 was launched. So the gtx 980 and 970 was launched in......

Sept 2014, then Dx12.1 came after, so that is less than 2 years? AMD doesn't have any hardware that can do CR's unlike nV which as the hardware but doesn't seem to be functioning to do async when doing it under direct compute...... Yeah AMD will not be able to get CR's into Polaris they are screwed.

This is your same theory of timelines..... WTF does this all mean, what if Polaris comes out and has CR's... errr yeah guess what something was wrong, either they can get things done quicker than the 2 to 3 years you stated, or when the IHV's knew about what was upcoming in an API was know well before hand. Or both are incorrect.....

See? I touched a nerve and the truth came out. Sniffed team green there the whole time.

I dealt with what was given and what made sense given the timeline of how everything went down.

You tried to turn it into an argument about arguments.

What you still aren't grasping is that async started the clock. Before Async, everyone was approaching how things were as "business as usual". There was no reason to do anything different prior. Any of your arguments that do not recognize how async changed the argument, changed the conversation, especially with a number of developers actively COMMENTING on how async is benefiting them, on top of that this whole thing was kicked off in Fall of last year, are basically you just trying to spin.

And for what? Again, who are you trying to defend? To Protect? NVidia? They're big boys - they can do it themselves. And far from me being painted in a corner of saying I am Team Red - this whole time I have been saying that we need to wait and see what this whole async business actually means, and how it actually translates. If nothing else, as someone who's owned both AMD and NVidia and been happy with ALL my purchases, I am excited at the prospect of a new round of LEGIT video card wars, with prices that benefit us.

So really, just let it drop. Your act here is tired. You are arguing over arguing, and its poor form. The most likely scenario is the one I put forth, that NVidia wasn't doing anything different than they had, nor would or should they have, until this async business from this past Fall. Pascal may end up being a better card than AMD's offerings, but it still will not have async compute hardware most likely, and likely will not until after Volta given how long it takes something to go from design to production. How many times to I have to lay that out for you before you concede instead of concocting arbitrary storylines to win a pissing contest? I at least am going off the information and rumors and such which has been building from last Fall until now, whereas you are pulling a Michael-Moore-esque attempt at wrapping enough truths around a faulty premise to make yourself sound convincing.

And before you come back with me trying to sound definitive, as I repeated, I, along with others, are in a "Wait and see" to see if Async is a game changer that can ignite a new round of card wars, which is hardly either a "Team Red" or "Team Green" position; merely a smart position.

razor1 · Apr 4, 2016

What you were the one that stated this line of thought? Where you the one that stated this first?

Zion Halcyon said:
You aren't getting it. This has nothing to do with the quality or quantity of NVidia's engineers. This has to do with BUSINESS. If they have pascal all designed and components of that are already in production, do you have any idea as to the BUSINESS COST of having scrap those pieces already in production, go back and re-engineer the pascal plans, make sure it works, and then start the whole process over again? It would set them back MILLIONS, both in time cost, the people they paid, what they already did, and the sales they still will have gotten from Pascal even without async. It would be business SUICIDE.

Adding async isn't like sticking a stick or ram into a motherboard - there are complicated engineering feats that need to be handled, tested, verified, etc. The integration isn't plug and play dude.

Even if they added it to pascal, that would mean delaying pascal for 2-3 years, and likely scrapping Volta. And that just can't happen. No, they got caught with their pants down. And to correct it, from a business standpoint, they are better off selling Pascal (and likely Volta) as is, and given the R&D time, working on a hardware async solution for the next card past Volta, so that they can play catch up.

What does that mean, you think by adding this feature in it would have delayed pascal for to 2 to 3 years. Then you say this

What you still aren't grasping is that async started the clock. Before Async, everyone was approaching how things were as "business as usual". There was no reason to do anything different prior. Any of your arguments that do not recognize how async changed the argument, changed the conversation, especially with a number of developers actively COMMENTING on how async is benefiting them, on top of that this whole thing was kicked off in Fall of last year, are basically you just trying to spin.

Who trying to convince themselves what here?

Does your posts have any resemblence of a straight forward thought process, when you start jumping from one point to another without confirming or even anecdotal proof?

So really, just let it drop. Your act here is tired. You are arguing over arguing, and its poor form. The most likely scenario is the one I put forth, that NVidia wasn't doing anything different than they had, nor would or should they have, until this async business from this past Fall. Pascal may end up being a better card than AMD's offerings, but it still will not have async compute hardware most likely, and likely will not until after Volta given how long it takes something to go from design to production. How many times to I have to lay that out for you before you concede instead of concocting arbitrary storylines to win a pissing contest? I at least am going off the information and rumors and such which has been building from last Fall until now, whereas you are pulling a Michael-Moore-esque attempt at wrapping enough truths around a faulty premise to make yourself sound convincing.

I don't know if Pascal will be a better card, you are assuming that I stated it will be, I never did, what I have stated was this is going to be an interesting generation as both companies are going to be launching their next gens relatively at the same time on new nodes which is something we have never seen both of these happen at the same time before......

razor1 said:
Both AMD and nV have same (similar) limitation when it comes to node size and the only thing that will every make one better than the other in a generation is architectural design. Its been a long time since AMD (ATi at the time) has been able to out right out design nV, for what ever reasons (r300), since then nV has been able to keep up with AMD/ATI easily. Since the g80, nV has been able to keep up with AMD and usually best them with performance, performance per watt, etc, even when later on their cards don't age as well.

This is the first time, we might actually see both companies successfully launch new architectures on a new node at similar times. We have never seen this before. What we have seen is one or the other company screw up lol, AKA, r600, FX, rv6x0 and so on.

Keep this in mind, when everyone was thinking the G80 was going to look like a doubled up 7800 gpu with seperate pixel shaders and vertex shaders, nV came out and blew away our expectations of what they could do. A small amount of foresight from an experience like this, should tell us even though they are down playing something, that has nothing to do with what is potentially coming out within 1 year of what they were trying to avoid programmers to program for. The downplaying has everything to do with what they are try to sell right now, not unannounced products.

But yet people put A+B together as if its PBJ and try to make a point. Its folly when A and B have nothing to do with each other.

Zion Halcyon · Apr 4, 2016

If you want to keep spinning your wheels and jousting with windmills Don Quixote, go right ahead. I stand by what I have said so far, and thus far you've said nothing that actually makes sense in terms of adjusting those expectations, only such a large pile of strawmen that you could make a Stay-Puft Marshmellow Man-sized scarecrow out of it.

When you actually have something sensical and relevant, I'll be happy to reply at that point. If you continue to just try to win pissing contests, I have no patience for that brand of nonsense.

Ieldra · Apr 4, 2016

razor1 said:
What you were the one that stated this line of thought? Where you the one that stated this first?

What does that mean, you think by adding this feature in it would have delayed pascal for to 2 to 3 years. Then you say this

Who trying to convince themselves what here?

Does your posts have any resemblence of a straight forward thought process, when you start jumping from one point to another without confirming or even anecdotal proof?

I don't know if Pascal will be a better card, you are assuming that I stated it will be, I never did, what I have stated was this is going to be an interesting generation as both companies are going to be launching their next gens relatively at the same time on new nodes which is something we have never seen both of these happen at the same time before......

Off-topic but I remember the g80 well, at the time graphics card were always late in the country I was in, and stupidly expensive. I was in the market for a 7800gt, I went to a trade show and found an 8800gtx on display, aib called albatron, bought it for the price of a 7800gt! What a surprise that was, best card ever, died twice! Baked it twice! It lived for six years!

razor1 · Apr 4, 2016

Zion Halcyon said:
If you want to keep spinning your wheels and jousting with windmills Don Quixote, go right ahead. I stand by what I have said so far, and thus far you've said nothing that actually makes sense in terms of adjusting those expectations, only such a large pile of strawmen that you could make a Stay-Puft Marshmellow Man-sized scarecrow out of it.

When you actually have something sensical and relevant, I'll be happy to reply at that point. If you continue to just try to win pissing contests, I have no patience for that brand of nonsense.

Then don't post nonsense you won't get nonsense to respond to lol.

You can't even post your ideas with proper trains of thought, so go figure all you can fall back on is misanthropic posts to down play your own fallacies.

razor1 · Apr 4, 2016

Ieldra said:
Off-topic but I remember the g80 well, at the time graphics card were always late in the country I was in, and stupidly expensive. I was in the market for a 7800gt, I went to a trade show and found an 8800gtx on display, aib called albatron, bought it for the price of a 7800gt! What a surprise that was, best card ever, died twice! Baked it twice! It lived for six years!

Yeah I would have to say the gtx 8800 was up there with the 9700 pro from ATi.

HybridHB · Apr 4, 2016

This review totally misses the mark for me. I haven't read through all of the responses in here but the main point of low level api's is to reduce driver overhead on the cpu. This review only tests a single high end cpu. They sort of talked about the performance bottle neck shifting towards the cpu in the conclusion but didnt show any data for it then came to the conclusion "Is DX12 the game changer it has been hyped to be? We don't yet truly know." They dont know because they didnt test low and mid range cpu's. This is relevant in the upcoming year because gpu performance is going to increase by a lot when AMD and Nvidia release their high end 14/16 nm gpu's. At the same time theres no indication that we will see major performance increases for upcoming cpu's so I assume most users with mid/high range cpu's right now will just upgrade their gpus in the next year.

Ieldra · Apr 4, 2016

HybridHB said:
This review totally misses the mark for me. I haven't read through all of the responses in here but the main point of low level api's is to reduce driver overhead on the cpu. This review only tests a single high end cpu. They sort of talked about the performance bottle neck shifting towards the cpu in the conclusion but didnt show any data for it then came to the conclusion "Is DX12 the game changer it has been hyped to be? We don't yet truly know." They dont know because they didnt test low and mid range cpu's. This is relevant in the upcoming year because gpu performance is going to increase by a lot when AMD and Nvidia release their high end 14/16 nm gpu's. At the same time theres no indication that we will see major performance increases for upcoming cpu's so I assume most users with mid/high range cpu's right now will just upgrade their gpus in the next year.

This is true, but at the same time you have to recognize the jump from kepler to maxwell pretty much doubled performance per watt on the same node, node maturity and architectural advantages are equally important; and both pascal and polaris will be built on an expensive process that is yet to mature, I don't think it will even be feasible with the transistor cost disparity you see jumping from 28nm fd soi and 14/16nm finfet .

To put this into perspective, IF amd were to produce a 14nm finfet fiji, it would be roughly 60% smaller die, same 8.9m transistors. It would cost more than 28nm! Not only that, but if they were to run those 14nm finfets at a higher frequency, say 1500mhz, heat density would likely be 4~5x times 28nm fiji.

they can reduce chip density, but that will also be a cost increase, 14/16nm will provide a lot more than double current performance, but it will take a while before they can match 28nm costs

Zion Halcyon · Apr 4, 2016

razor1 said:
Then don't post nonsense you won't get nonsense to respond to lol.

You can't even post your ideas with proper trains of thought, so go figure all you can fall back on is misanthropic posts to down play your own fallacies.

Physician, heal thyself.

ME: (Fall 2015) Async Compute news breaks with AOTS. Nvidia's response (which changed weekly, if not daily), first asked for a retraction, claimed they had it, then admitted they didn't, then claimed they could add it via software, then said they didn't need it. Based on that lack of clear response from them, I made the logical assumption (which is all ANY of us have, and stated as such from the beginning) that NVidia got caught with their pants down. Soon followed rumors that Pascal did not have Async. IF NVidia was caught by surprise, this makes sense. That would mean based on the design process, it would take them 2-3 years to get a new product out with Async, given R&D time from design to implementation, and the current graphics card cadence.

That, is a proper train of thought given the information I have read up until this time.

You have been using pretzel logic from go to try to argue anything BUT the series of events as they played out from the summer, to win a pissing contest.

HybridHB · Apr 4, 2016

Ieldra said:
This is true, but at the same time you have to recognize the jump from kepler to maxwell pretty much doubled performance per watt on the same node, node maturity and architectural advantages are equally important; and both pascal and polaris will be built on an expensive process that is yet to mature, I don't think it will even be feasible with the transistor cost disparity you see jumping from 28nm fd soi and 14/16nm finfet .

To put this into perspective, IF amd were to produce a 14nm finfet fiji, it would be roughly 60% smaller die, same 8.9m transistors. It would cost more than 28nm! Not only that, but if they were to run those 14nm finfets at a higher frequency, say 1500mhz, heat density would likely be 4~5x times 28nm fiji.

Sure, but the main point of my post was that in the next year peoples systems will have more of an increase in gpu power than cpu power. How much gpu power? We dont know, but its safe to assume itll be an increase and not a decrease.

razor1 · Apr 4, 2016

Zion Halcyon said:
Physician, heal thyself.

ME: (Fall 2015) Async Compute news breaks with AOTS. Nvidia's response (which changed weekly, if not daily), first asked for a retraction, claimed they had it, then admitted they didn't, then claimed they could add it via software, then said they didn't need it. Based on that lack of clear response from them, I made the logical assumption (which is all ANY of us have, and stated as such from the beginning) that NVidia got caught with their pants down. Soon followed rumors that Pascal did not have Async. IF NVidia was caught by surprise, this makes sense. That would mean based on the design process, it would take them 2-3 years to get a new product out with Async, given R&D time from design to implementation, and the current graphics card cadence.

That, is a proper train of thought given the information I have read up until this time.

You have been using pretzel logic from go to try to argue anything BUT the series of events as they played out from the summer, to win a pissing contest.

You have made an assumption on ever single one of those parts, which you even stated, you made a logical assumption, what is the correlation of one assumption to another assumption and another. What is the % of variation an each assumption based on another assumption. Do you know that logical % drops per assumption?

And yes there is math behind this. This is exactly how AI neuronets work. So your "theory" is as crack pot as a computer trying to link assumptions together to create possibilities to measure from. That is not a very good way of coming up with a theory. Theories are based on facts not assumptions.

The Most Common Logical Fallacies

This is what you are doing.

Affirming the Consequent

Ieldra · Apr 4, 2016

HybridHB said:
Sure, but the main point of my post was that in the next year peoples systems will have more of an increase in gpu power than cpu power. How much gpu power? We dont know, but its safe to assume itll be an increase and not a decrease.

Yeah, but it was only AMD that until recently had an inefficient driver stack, nvidia users will feel a smaller difference, nvidia users with higher end gpus on AMD cpus will feel a difference, i can tell you that on a 5820k tomb raider raised my minimums by quite a bit under dx12, but thats about it, in ashes performance is identical , haven't checked minimums though

Zion Halcyon · Apr 4, 2016

razor1 said:
You have made an assumption on ever single one of those parts, which you even stated, you made a logical assumption, what is the correlation of one assumption to another assumption and another. What is the % of variation an each assumption based on another assumption. Do you know that logical % drops per assumption?

And yes there is math behind this. This is exactly how AI neuronets work. So your "theory" is as crack pot as a computer trying to link assumptions together to create possibilities to measure from. That is not a very good way of coming up with a theory. Theories are based on facts not assumptions.

The Most Common Logical Fallacies

This is what you are doing.

Affirming the Consequent

You are making the argument about the argument, not about what was given. And then you shift what the argument is about. That's strawman to a T. Dress it up all you want, but YOU are doing that. What I have said has been consistent, and not this moving target you keep trotting out to win pissing contests and argue about arguing, while I simply am talking tech.

Ieldra · Apr 4, 2016

Zion Halcyon said:
You are making the argument about the argument, not about what was given. And then you shift what the argument is about. That's strawman to a T. Dress it up all you want, but YOU are doing that. What I have said has been consistent, and not this moving target you keep trotting out to win pissing contests and argue about arguing, while I simply am talking tech.

Do you not realize that in the best case scenario for you (razor is guilty of everything you say) you are guilty of the exact same things ?

He said it's wrong to assume they can't support async concurrency in Pascal, and I agree, you base your argument on the ASSUMPTION that nvidia only heard of asynchronous multiengine concurrency when there was a pr storm about it, if you actually believe that you're delusional, not just wrong

He made an analogous argument to show you how wrong it is

Given that Pascal will bring back full fp64 and will replace gk210 (bet you never heard of this one) they have already hinted that they might bring back the hw scheduler , especially considering fp16 support and mixed precision

Oh, and some friends of mine that work at a meteorological research institute are swearing to me they are scheduled to receive new nvidia hardware in early may, and they currently have gk210 equipped teslas

Oh and something else i should add, EVEN IF pascal sucks for gaming, and Polaris outperforms it by 20% across the board, nvidia can more than afford to drop prices because pascal will sell like hotcakes for fp16, hotcakes i tell you.

my friends were renting a dual hexacore cpu amazon server to train a neural network, the training routine took 8 hours.

i let them run the same thing on my gpu using CUDA and cuDNN libs avaiable for FREE from nv dev program, took them one day to adapt their code, ONE DAY.

The experiment took 50 minutes on a 980ti. 50 minutes vs 8 hours, at ~ 50% gpu utilization

On pascal this will take 12.5 minutes, at same % util

razor1 · Apr 4, 2016

Exactly what I was going at, I even stated that me making the counter argument is that most likely I would be wrong by making the same assumptions lol.

Zion Halcyon · Apr 4, 2016

Ieldra said:
Do you not realize that in the best case scenario for you (razor is guilty of everything you say) you are guilty of the exact same things ?

He said it's wrong to assume they can't support async concurrency in Pascal, and I agree, you base your argument on the ASSUMPTION that nvidia only heard of asynchronous multiengine concurrency when there was a pr storm about it, if you actually believe that you're delusional, not just wrong

He made an analogous argument to show you how wrong it is

Given that Pascal will bring back full fp64 and will replace gk210 (bet you never heard of this one) they have already hinted that they might bring back the hw scheduler , especially considering fp16 support and mixed precision

Look. there's what's possible, and what's probable.

I am not saying that there isn't some NVidia engineer somewhere who doesn't know what Async is. What I am saying is, based on NVidia's own reaction, it would be reasonable to assume that they did not have that tech in Pascal. If they did, they would have said so to head off the storm. Their PR department isn't bad, you know. They know how to do their jobs. In absence of that, it stands to reason that Pascal will likely not have async.

Now, for all we know, it may not matter - it's entirely possible that NVidia has some of its own unique hardware that allows for gains in other areas. But we don't have numbers, or tech, or anything on that. All we have to talk about right now in this given moment is async, therefore its not unproper to speculate along that avenue.

And further, no one is saying anything CONCRETE. Everything is a "wait and see".

But no, I am not doing the same thing as your team green buddy (I see the liked posts

). I took what information is available and simply took the most straightforward route connecting the dots, and put that out as what seemed likely based on the events. I then couched it at the end that the entire process is a wait and see.

Razor decided to make it about how I argue and drag this on for pages about that, all while saying that Pascal COULD have async that wasn't reported, which in his bent IS pure speculation, whereas at least I am going off of the information already out there for public consumption. Therein lies the major difference, and whenever I deign to point out that difference, he changes tactics back to arguing about how I argue again, which indicates he knows he is on a weak argument, is fighting just to fight, and it just flinging poo at a wall, hoping for something to stick.

Imhotep · Apr 4, 2016

Relayer said:
If you didn't watch the link in the post I was responding to I can understand why you seem to have misinterpreted everything I said.

Great video. It is cool to see what this tech is capable of doing.

Ieldra said:
Do you not realize that in the best case scenario for you (razor is guilty of everything you say) you are guilty of the exact same things ?

He said it's wrong to assume they can't support async concurrency in Pascal, and I agree, you base your argument on the ASSUMPTION that nvidia only heard of asynchronous multiengine concurrency when there was a pr storm about it, if you actually believe that you're delusional, not just wrong

He made an analogous argument to show you how wrong it is

Given that Pascal will bring back full fp64 and will replace gk210 (bet you never heard of this one) they have already hinted that they might bring back the hw scheduler , especially considering fp16 support and mixed precision

Oh, and some friends of mine that work at a meteorological research institute are swearing to me they are scheduled to receive new nvidia hardware in early may, and they currently have gk210 equipped teslas

Oh and something else i should add, EVEN IF pascal sucks for gaming, and Polaris outperforms it by 20% across the board, nvidia can more than afford to drop prices because pascal will sell like hotcakes for fp16, hotcakes i tell you.

my friends were renting a dual hexacore cpu amazon server to train a neural network, the training routine took 8 hours.

i let them run the same thing on my gpu using CUDA and cuDNN libs avaiable for FREE from nv dev program, took them one day to adapt their code, ONE DAY.

The experiment took 50 minutes on a 980ti. 50 minutes vs 8 hours, at ~ 50% gpu utilization

On pascal this will take 12.5 minutes, at same % util

lol, im dieing ....heeelp......

razor1 · Apr 4, 2016

Zion Halcyon said:
Look. there's what's possible, and what's probable.

I am not saying that there isn't some NVidia engineer somewhere who doesn't know what Async is. What I am saying is, based on NVidia's own reaction, it would be reasonable to assume that they did not have that tech in Pascal. If they did, they would have said so to head off the storm. Their PR department isn't bad, you know. They know how to do their jobs. In absence of that, it stands to reason that Pascal will likely not have async.

Now, for all we know, it may not matter - it's entirely possible that NVidia has some of its own unique hardware that allows for gains in other areas. But we don't have numbers, or tech, or anything on that. All we have to talk about right now in this given moment is async, therefore its not unproper to speculate along that avenue.

And further, no one is saying anything CONCRETE. Everything is a "wait and see".

But no, I am not doing the same thing as your team green buddy (I see the liked posts ). I took what information is available and simply took the most straightforward route connecting the dots, and put that out as what seemed likely based on the events. I then couched it at the end that the entire process is a wait and see.

Razor decided to make it about how I argue and drag this on for pages about that, all while saying that Pascal COULD have async that wasn't reported, which in his bent IS pure speculation, whereas at least I am going off of the information already out there for public consumption. Therein lies the major difference, and whenever I deign to point out that difference, he changes tactics back to arguing about how I argue again, which indicates he knows he is on a weak argument, is fighting just to fight, and it just flinging poo at a wall, hoping for something to stick.

Then explain this Zion, why in games that use DX11 with cuda is there concurrent kernal execution and async compute happening just like it should be happening. Its not even DX12 here we are talking about. nV has the ability to do it with DX11!

Did you know that it works with CUDA and DX11? or this a fact you seemed not to look at to make your assumption. If you want to see it work, start up a game with CUDA and DX11 and fire up a profiler and see the magic.

Ieldra · Apr 4, 2016

Zion Halcyon said:
Look. there's what's possible, and what's probable.

I am not saying that there isn't some NVidia engineer somewhere who doesn't know what Async is. What I am saying is, based on NVidia's own reaction, it would be reasonable to assume that they did not have that tech in Pascal. If they did, they would have said so to head off the storm. Their PR department isn't bad, you know. They know how to do their jobs. In absence of that, it stands to reason that Pascal will likely not have async.

Now, for all we know, it may not matter - it's entirely possible that NVidia has some of its own unique hardware that allows for gains in other areas. But we don't have numbers, or tech, or anything on that. All we have to talk about right now in this given moment is async, therefore its not unproper to speculate along that avenue.

And further, no one is saying anything CONCRETE. Everything is a "wait and see".

But no, I am not doing the same thing as your team green buddy (I see the liked posts ). I took what information is available and simply took the most straightforward route connecting the dots, and put that out as what seemed likely based on the events. I then couched it at the end that the entire process is a wait and see.

Razor decided to make it about how I argue and drag this on for pages about that, all while saying that Pascal COULD have async that wasn't reported, which in his bent IS pure speculation, whereas at least I am going off of the information already out there for public consumption. Therein lies the major difference, and whenever I deign to point out that difference, he changes tactics back to arguing about how I argue again, which indicates he knows he is on a weak argument, is fighting just to fight, and it just flinging poo at a wall, hoping for something to stick.

Have you watched The Hateful Eight ?

There's this amazing scene where Samuel Jackson implies someone is lying, and the guy asks him "are you calling me a liar?" and he says "no, it seems like it, but i haven't done it yet", then much later he goes " NOW i am calling you a liar senor bob"

This is very much like that:

You aren't getting it. This has nothing to do with the quality or quantity of NVidia's engineers. This has to do with BUSINESS. If they have pascal all designed and components of that are already in production, do you have any idea as to the BUSINESS COST of having scrap those pieces already in production, go back and re-engineer the pascal plans, make sure it works, and then start the whole process over again? It would set them back MILLIONS, both in time cost, the people they paid, what they already did, and the sales they still will have gotten from Pascal even without async. It would be business SUICIDE.

Adding async isn't like sticking a stick or ram into a motherboard - there are complicated engineering feats that need to be handled, tested, verified, etc. The integration isn't plug and play dude.

Even if they added it to pascal, that would mean delaying pascal for 2-3 years, and likely scrapping Volta. And that just can't happen. No, they got caught with their pants down. And to correct it, from a business standpoint, they are better off selling Pascal (and likely Volta) as is, and given the R&D time, working on a hardware async solution for the next card past Volta, so that they can play catch up.

You are most definitely assuming nobody knew about 'async' and that, had they been smart enough to know, they'd have scrapped four or five years of R&D. Yeah. All for 'async'

Silly nvidia, shoulda just bought some async and made some space for it on the die

Cause async is a thing you add, and not a programming paradigm

razor1 said:
Then explain this Zion, why in games that use DX11 with cuda is there concurrent kernal execution and async compute happening just like it should be happening. Its not even DX12 here we are talking about. nV has the ability to do it with DX11!

Did you know that it works with CUDA and DX11? or this a fact you seemed not to look at to make your assumption. If you want to see it work, start up a game with CUDA and DX11 and fire up a profiler and see the magic.

and I can doubly, triply, even quadruply confirm asynchronous execution, concurrent execution, and even asynchronous concurrent execution work in CUDA

best case scenario nvidia can bring gmu back to life, but i don't think it'll make a 10% difference, unless the code is designed for GCN and stalls the nvidia pipeline anyway, but that would be a solution to a problem caused by the solution to the problem wouldnt it ?

Mahigan · Apr 4, 2016

razor1 said:
Then explain this Zion, why in games that use DX11 with cuda is there concurrent kernal execution and async compute happening just like it should be happening. Its not even DX12 here we are talking about. nV has the ability to do it with DX11!

Did you know that it works with CUDA and DX11? or this a fact you seemed not to look at to make your assumption. If you want to see it work, start up a game with CUDA and DX11 and fire up a profiler and see the magic.

Are they doing it with DX11? If so then how can it be that Kepler can run PhysX when it does not support mixing Graphics and Compute together in parallel? Is Kepler executing concurrently instead? Most probable. Maxwell appears to execute in parallel, however, from the graphs I've seen from Batman Arkham Assylum.

Now this begs other questions. The degree of PhysX, in these titles, is minimal and the FPS hit is still quite noticeable. Why such a large hit to the frames per second with PhysX enabled? And why such a large hit from minimal effects?

And this brings us back to several events which occurred back in 2015 while Oxide were working on AotS.

1. At first, NVIDIAs driver fully exposed Asynchronous Compute + graphics. When Oxide went to use the feature, with their work loads, the result was an unmitigated disaster. Oxide opted by a vendor ID specific path for its shaders when running on NV hardware. NVIDIA removed the Async feature from their drivers.

2. NVIDIA re-enabled Asynchronous Compute + Graphics in their drivers but now the feature simply re-routes such tasks into sequential execution. Oxide mentioned that NV enabled the feature in their drivers and Sean Pelletier said that this was not true (when it is) and that in order to have Asynchronous compute + Graphics support you need app + driver support.

3. The scientific paper I linked you too shows that Maxwell SMs run out of local cache at 16 concurrent Warps and begin to spill over into the L2 cache. The L2 cache is already under stress as-is (hence the increase in size from GM204 to GM200 from 2 to 3 MB).

4. The Beyond3D tests showed that Maxwell can only handle small Asynchronous compute loads before the driver crashes. This is the EXACT driver crash that Dan Baker witnessed (see unmitigated disaster quote and explanation).

What can we discern from this? Well given the performance hit associated with running PhysX and given the Beyond3D test results and comments from Dan Baker of Oxide we can conclude that even if Maxwell enabled Asynchronous compute + graphics the net result would be an unmitigated disaster even in mild usage (AotS) scenarios.

Here's hoping that Pascal rectifies Maxwell's caching hierarchy and redundancy shortcomings.

Wow, there are lots of posts here, so I'll only respond to the last one. The interest in this subject is higher then we thought. The primary evolution of the benchmark is for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole.

Certainly I could see how one might see that we are working closer with one hardware vendor then the other, but the numbers don't really bare that out. Since we've started, I think we've had about 3 site visits from NVidia, 3 from AMD, and 2 from Intel ( and 0 from Microsoft, but they never come visit anyone ;(). Nvidia was actually a far more active collaborator over the summer then AMD was, If you judged from email traffic and code-checkins, you'd draw the conclusion we were working closer with Nvidia rather than AMD

As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) for Ashes with AMD. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with NVidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles as they have also lined up a few other D3D12 games.

If you use this metric, however, given Nvidia's promotions with Unreal (and integration with Gameworks) you'd have to say that every Unreal game is biased, not to mention virtually every game that's commonly used as a benchmark since most of them have a promotion agreement with someone. Certainly, one might argue that Unreal being an engine with many titles should give it particular weight, and I wouldn't disagree. However, Ashes is not the only game being developed with Nitrous. It is also being used in several additional titles right now, the only announced one being the Star Control reboot. (Which I am super excited about! But that's a completely other topic

).

Personally, I think one could just as easily make the claim that we were biased toward Nvidia as the only 'vendor' specific code is for Nvidia where we had to shutdown async compute. By vendor specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature was functional but attempting to use it was an unmitigated disaster in terms of performance and conformance so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute so I don't know why their driver was trying to expose that. The only other thing that is different between them is that Nvidia does fall into Tier 2 class binding hardware instead of Tier 3 like AMD which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor specific path, as it's responding to capabilities the driver reports.

From our perspective, one of the surprising things about the results is just how good Nvidia's DX11 perf is. But that's a very recent development, with huge CPU perf improvements over the last month. Still, DX12 CPU overhead is still far far better on Nvidia, and we haven't even tuned it as much as DX11. The other surprise is that of the min frame times having the 290X beat out the 980 Ti (as reported on Ars Techinica). Unlike DX11, minimum frame times are mostly an application controlled feature so I was expecting it to be close to identical. This would appear to be GPU side variance, rather then software variance. We'll have to dig into this one.

I suspect that one thing that is helping AMD on GPU performance is D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic where we just took a few compute tasks we were already doing and made them asynchronous, Ashes really isn't a poster-child for advanced GCN features.

Our use of Async Compute, however, pales with comparisons to some of the things which the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% GPU performance by using Async Compute. Too early to tell, of course, but it could end being pretty disruptive in a year or so as these GCN built and optimized engines start coming to the PC. I don't think Unreal titles will show this very much though, so likely we'll have to wait to see. Has anyone profiled Ark yet?

In the end, I think everyone has to give AMD alot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their right to do so. (Complain, anyway, we would have still done it,

)

--
P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because Nvidia PR was putting pressure on us to disable certain settings in the benchmark, when we refused, I think they took it a little too personally.

- Dan Baker

razor1 · Apr 4, 2016

Mahigan said:
Are they doing it with DX11? If so then how can it be that Kepler can run PhysX when it does not support mixing Graphics and Compute together in parallel? Is Kepler executing concurrently instead? Most probable. Maxwell appears to execute in parallel, however, from the graphs I've seen from Batman Arkham Assylum.

Now this begs other questions. The degree of PhysX, in these titles, is minimal and the FPS hit is still quite noticeable. Why such a large hit to the frames per second with PhysX enabled? And why such a large hit from minimal effects?

Those minimal effects are quite large effects specially the interactive smoke. This affect wasn't even possible prior to this game and the top end graphics cards. Do you remember UE3 engine demoed this effect on Fermi, and just the effect alone a gun shooting through the smoke and the level which was pretty much a box with textures, brought the frame rates down to teens (might have had interactive water in that demo to)?

And this brings us back to several events which occurred back in 2015 while Oxide were working on AotS.

1. At first, NVIDIAs driver fully exposed Asynchronous Compute + graphics. When Oxide went to use the feature, with their work loads, the result was an unmitigated disaster. Oxide opted by a vendor ID specific path for its shaders when running on NV hardware. NVIDIA removed the Async feature from their drivers.

This is not what happened first, the Oxide dev stated it did not work even though nV's drivers were saying it had the functionality Oxide couldn't expose it.

2. NVIDIA re-enabled Asynchronous Compute + Graphics in their drivers but now the feature simply re-routes such tasks into sequential execution. Oxide mentioned that NV enabled the feature in their drivers and Sean Pelletier said that this was not true (when it is) and that in order to have Asynchronous compute + Graphics support you need app + driver support.

Then nV stated they disabled it in drivers

3. The scientific paper I linked you too shows that Maxwell SMs run out of local cache at 16 concurrent Warps and begin to spill over into the L2 cache. The L2 cache is already under stress as-is (hence the increase in size from GM204 to GM200 from 2 to 3 MB).

the work load would be split up by the driver based on the programmer telling what they need, so making a shader that doesn't go along with the architecture is just bad news to begin with.

4. The Beyond3D tests showed that Maxwell can only handle small Asynchronous compute loads before the driver crashes. This is the EXACT driver crash that Dan Baker witnessed (see unmitigated disaster quote and explanation).

This was different, the work load that was simulated is no where near real world results, as its way above anything any game would use, it went on an exponential increase of work load, not application in real world terms do that. And this is why the results can be used to show it sometimes works, but why its failing is can't be discerned in real world terms with AOTS, as the workload is way different and not realistic.

What can we discern from this? Well given the performance hit associated with running PhysX and given the Beyond3D test results and comments from Dan Baker of Oxide we can conclude that even if Maxwell enabled Asynchronous compute + graphics the net result would be an unmitigated disaster even in mild usage (AotS) scenarios.

Can't really put those together, although we can say there is a fall off point for Maxwell's architecture.

Mahigan · Apr 4, 2016

razor1 said:
Those minimal effects are quite large effects specially the interactive smoke. This affect wasn't even possible prior to this game and the top end graphics cards. Do you remember UE3 engine demoed this effect on Fermi, and just the effect alone a gun shooting through the smoke and the level which was pretty much a box with textures, brought the frame rates down to teens (might have had interactive water in that demo to)?

This is not what happened first, the Oxide dev stated it did not work even though nV's drivers were saying it had the functionality Oxide couldn't expose it.

Then nV stated they disabled it in drivers

the work load would be split up by the driver based on the programmer telling what they need, so making a shader that doesn't go along with the architecture is just bad news to begin with.

This was different, the work load that was simulated is no where near real world results, as its way above anything any game would use, it went on an exponential increase of work load, not application in real world terms do that. And this is why the results can be used to show it sometimes works, but why its failing is can't be discerned in real world terms with AOTS, as the workload is way different and not realistic.

Can't really put those together, although we can say there is a fall off point for Maxwell's architecture.

I have a class right now but in order to start answering you I'll start with Dan Baker's response:

Wow, there are lots of posts here, so I'll only respond to the last one. The interest in this subject is higher then we thought. The primary evolution of the benchmark is for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole.

Certainly I could see how one might see that we are working closer with one hardware vendor then the other, but the numbers don't really bare that out. Since we've started, I think we've had about 3 site visits from NVidia, 3 from AMD, and 2 from Intel ( and 0 from Microsoft, but they never come visit anyone ;(). Nvidia was actually a far more active collaborator over the summer then AMD was, If you judged from email traffic and code-checkins, you'd draw the conclusion we were working closer with Nvidia rather than AMD

As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) for Ashes with AMD. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with NVidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles as they have also lined up a few other D3D12 games.

If you use this metric, however, given Nvidia's promotions with Unreal (and integration with Gameworks) you'd have to say that every Unreal game is biased, not to mention virtually every game that's commonly used as a benchmark since most of them have a promotion agreement with someone. Certainly, one might argue that Unreal being an engine with many titles should give it particular weight, and I wouldn't disagree. However, Ashes is not the only game being developed with Nitrous. It is also being used in several additional titles right now, the only announced one being the Star Control reboot. (Which I am super excited about! But that's a completely other topic

).

Personally, I think one could just as easily make the claim that we were biased toward Nvidia as the only 'vendor' specific code is for Nvidia where we had to shutdown async compute. By vendor specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature was functional but attempting to use it was an unmitigated disaster in terms of performance and conformance so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute so I don't know why their driver was trying to expose that. The only other thing that is different between them is that Nvidia does fall into Tier 2 class binding hardware instead of Tier 3 like AMD which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor specific path, as it's responding to capabilities the driver reports.

From our perspective, one of the surprising things about the results is just how good Nvidia's DX11 perf is. But that's a very recent development, with huge CPU perf improvements over the last month. Still, DX12 CPU overhead is still far far better on Nvidia, and we haven't even tuned it as much as DX11. The other surprise is that of the min frame times having the 290X beat out the 980 Ti (as reported on Ars Techinica). Unlike DX11, minimum frame times are mostly an application controlled feature so I was expecting it to be close to identical. This would appear to be GPU side variance, rather then software variance. We'll have to dig into this one.

I suspect that one thing that is helping AMD on GPU performance is D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic where we just took a few compute tasks we were already doing and made them asynchronous, Ashes really isn't a poster-child for advanced GCN features.

Our use of Async Compute, however, pales with comparisons to some of the things which the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% GPU performance by using Async Compute. Too early to tell, of course, but it could end being pretty disruptive in a year or so as these GCN built and optimized engines start coming to the PC. I don't think Unreal titles will show this very much though, so likely we'll have to wait to see. Has anyone profiled Ark yet?

In the end, I think everyone has to give AMD alot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their right to do so. (Complain, anyway, we would have still done it,

)

--
P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because Nvidia PR was putting pressure on us to disable certain settings in the benchmark, when we refused, I think they took it a little too personally.

It did work but conformance and performance was dreadful.

Ieldra · Apr 4, 2016

razor1 said:
Then explain this Zion, why in games that use DX11 with cuda is there concurrent kernal execution and async compute happening just like it should be happening. Its not even DX12 here we are talking about. nV has the ability to do it with DX11!

Did you know that it works with CUDA and DX11? or this a fact you seemed not to look at to make your assumption. If you want to see it work, start up a game with CUDA and DX11 and fire up a profiler and see the magic.

Mahigan said:
Are they doing it with DX11? If so then how can it be that Kepler can run PhysX when it does not support mixing Graphics and Compute together in parallel? Is Kepler executing concurrently instead? Most probable. Maxwell appears to execute in parallel, however, from the graphs I've seen from Batman Arkham Assylum.

Now this begs other questions. The degree of PhysX, in these titles, is minimal and the FPS hit is still quite noticeable. Why such a large hit to the frames per second with PhysX enabled? And why such a large hit from minimal effects?

And this brings us back to several events which occurred back in 2015 while Oxide were working on AotS.

1. At first, NVIDIAs driver fully exposed Asynchronous Compute + graphics. When Oxide went to use the feature, with their work loads, the result was an unmitigated disaster. Oxide opted by a vendor ID specific path for its shaders when running on NV hardware. NVIDIA removed the Async feature from their drivers.

2. NVIDIA re-enabled Asynchronous Compute + Graphics in their drivers but now the feature simply re-routes such tasks into sequential execution. Oxide mentioned that NV enabled the feature in their drivers and Sean Pelletier said that this was not true (when it is) and that in order to have Asynchronous compute + Graphics support you need app + driver support.

3. The scientific paper I linked you too shows that Maxwell SMs run out of local cache at 16 concurrent Warps and begin to spill over into the L2 cache. The L2 cache is already under stress as-is (hence the increase in size from GM204 to GM200 from 2 to 3 MB).

4. The Beyond3D tests showed that Maxwell can only handle small Asynchronous compute loads before the driver crashes. This is the EXACT driver crash that Dan Baker witnessed (see unmitigated disaster quote and explanation).

Interesting questions regarding kepler, but maxwell enabling kernel concurrency could explain the rift in performance to some extent

thanks for the link to that paper btw, i've been looking for it for weeks

edit: from what I can tell kepler also allows for compute kernel concurrency w/ graphics via the gmu, however only 1 execution slot is reserved for compute, 31 for graphics, this is the same for maxwell, however nvidia claims gm20x can lift the reservation dynamically

DX12 Multi engine capabilties of recent AMD and Nvidia hardware

razor1 · Apr 4, 2016

Mahigan said:
I have a class right now but in order to start answering you I'll start with Dan Baker's response:

It did work but conformance and performance was dreadful.

Yeah and that aligned with what I stated right lol

Regarding Async compute, a couple of points on this. FIrst, though we are the first D3D12 title, I wouldn't hold us up as the prime example of this feature. There are probably better demonstrations of it. This is a pretty complex topic and to fully understand it will require significant understanding of the particular GPU in question that only an IHV can provide. I certainly wouldn't hold Ashes up as the premier example of this feature.

We actually just chatted with Nvidia about Async Compute, indeed the driver hasn't fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We'll keep everyone posted as we learn more.

They didn't know it wasn't fully functional in drivers, than later on nV just disabled it altogether.

Guys i don't think kepler is capable of different queues concurrently, but then again I never tested or seen tests for this.......

Ieldra · Apr 4, 2016

razor1 said:
Yeah and that aligned with what I stated right lol

They didn't know it wasn't fully functional in drivers, than later on nV just disabled it altogether.

Guys i don't think kepler is capable of different queues concurrently, but then again I never tested or seen tests for this.......

I just read it again, kepler supports kernel concurrency across the board from what I've read, the difference comes when you go and consider specifically concurrent graphics/compute

Kepler GK110 introduced a new architectural feature called Dynamic Parallelism, which allows the GPU to create additional work for itself. A programming model enhancement leveraging this feature was introduced in CUDA 5.0 to enable threads running on GK110 to launch additional kernels onto the same GPU.

SMM brings Dynamic Parallelism into the mainstream by supporting it across the product line, even in lower-power chips such as GM107. This will benefit developers, because it means that applications will no longer need special-case algorithm implementations for high-end GPUs that differ from those usable in more power constrained environments.

This is the feature Ext3h references in his blog:

but I can't actually find any nvidia statements regarding lifting the reservation on execution slots dynamically in maxwell and gk110

razor1 · Apr 4, 2016

Ok I haven't really looked at everything Ext3h stated in his article, but knowing him I would believe what he stated, he definitely looked into many of the aspects and has quite a bit of knowledge of the topic.

CSI_PC · Apr 4, 2016

Ieldra said:
......

I completely disagree regarding your argument of 'usage'. As I've elucidated earlier, Ashes seems to be a very compute intensive game, and as a result the ingame performance seems to be a function of fp32 throughput.

In my case, on a 980ti, matching the 8.6Tflops on Fury X results in almost identical performance (1fps difference at 47fps, 8.45 vs 8.6tflops) with fury X having async enabled.

You could just as easily argue that at fp32 parity the 980ti is faster, and fury X needs async to catch up.

Async shaders are an advantage that translate directly into a 10% performance increase at best, an advantage for amd is not a disadvantage for nvidia.......

Although this is made more complicated by the fact Brent's recent Ashes benchmark shows that AMD FuryX is faster than 980ti even with DX11 and also with 390x compared to 980.
Considering how poor AMD's DX11 driver implementation is compared to NVIDIA, it raises some questions what exactly is going on; whether it relates to issue with NVIDIA driver development,Oxide developers (IMO they have more of an interest with AMD), or a mix of both.
Another consideration is just how fast NVIDIA cards were with the earlier version of the Oxide Nitrous engine used for the Star Swarm benchmark, so something is up IMO with the Ashes development in some way.
And yeah I appreciate Ashes would have a different post processing/rendering/shader solution but one would not expect NVIDIA to really lose out on DX11 if a game/benchmark is equally well optimised for both card manufacturers.

So I think this may limit somewhat any overall conclusions, whether the fault lies with NVIDIA/Oxide/or both.
Cheers

razor1 · Apr 4, 2016

That's a good point CSI_PC, there is a large disparity from the the public beta's and Star swarm and the release of the game and the unexpected performance disparity of DX11. I personally think nV didn't give a crap about this game, at least to the degree people thought they would, the steam numbers kinds speaks to that.

What I am surprised is, why isn't AMD bundling this game with their cards? Damn it is a very good marketing product from their point of view. Maybe they felt the same way too that this game won't sell that much?

Ieldra · Apr 4, 2016

razor1 said:
That's a good point CSI_PC, there is a large disparity from the the public beta's and Star swarm and the release of the game and the unexpected performance disparity of DX11. I personally think nV didn't give a crap about this game, at least to the degree people thought they would, the steam numbers kinds speaks to that.

What I am surprised is, why isn't AMD bundling this game with their cards? Damn it is a very good marketing product from their point of view. Maybe they felt the same way too that this game won't sell that much?

They are bundling it with their cards!

pendragon1 · Apr 4, 2016

Ashes of the Singularity

Free with a 380/x

razor1 · Apr 4, 2016

ah cool didn't know that, that is a smart move.

CSI_PC · Apr 4, 2016

razor1 said:
That's a good point CSI_PC, there is a large disparity from the the public beta's and Star swarm and the release of the game and the unexpected performance disparity of DX11. I personally think nV didn't give a crap about this game, at least to the degree people thought they would, the steam numbers kinds speaks to that.

What I am surprised is, why isn't AMD bundling this game with their cards? Damn it is a very good marketing product from their point of view. Maybe they felt the same way too that this game won't sell that much?

Yeah I agree..

Good point on the bundling, I see others mention it is bundled but I wonder if that pertains to regions and also product model, because looking here in UK I notice that policy does not seem universal.
Like you this would be a superb game to highlight "DX12 is great on AMD" marketing narrative, although I wonder if they feel a strategy game would not be mainstream enough (would be strange to think that considering how well Starcraft games do).

Cheers

primetime · Apr 4, 2016

For those of us that had artifacts in DX12 with are GCN1 cards (7970/280's) i can report the new driver that came out today fixes the issues Crimson 16.4.1 AMD Radeon Software Crimson Edition 16.4.1. lol i reported the issue last night and today we have a working driver

noko · Apr 4, 2016

Dan Barker indicated that AoTs uses a minor amount of Async Compute capability or usage for AMD hardware and they saw good performance increases. Then indicated that the console developers are seeing up to 30% performance increases on upcoming games which I take will make it to the PC. 30% is a rather big difference for a render path and would be a game changer between AMD and Nvidia if those numbers carry over.

AMD demo had a 46% increase using Async Compute with their hardware, of course a synthesize restrictive benchmark. The potential is there for certain scenarios for big gains.

As for level 12.1, conservative rasterization; conservative rasterization has been around for awhile and was done using the geometry shader back in the 6800 days. I am sure AMD can do conservative rasterization where needed (collision detection, shadows using ray tracing techniques, depth ques for like fog) with prior methods if need be. I just don't have a clear feel for how that would affect performance since you would not be able to use the compute units in AMD but the graphic command processor instead with four available geometry shaders. Unless a direct compute method could be used, I do not know. The Ego 4 engine uses CR and is DX12 capable so maybe this will be seen later this year the importance of level 12.1 and what benefits it can provide.

razor1 · Apr 4, 2016

the question is how under utilized are AMD cards, if its 30% or more, that is pretty bad coding don't you think? Even on consoles, its very hard to believe that the GPU's are that underutilized by bottlenecks or poor code. Specially when you look at the crap GPU they have in there..... (not saying its crap as in bad, just crap as in compared to what we got on PC's).

Yeah CR's can be done on older hardware with other techniques, but there are penalties, quality and performance, quality hard to see but its there.

pendragon1 · Apr 4, 2016

primetime said:
For those of us that had artifacts in DX12 with are GCN1 cards (7970/280's) i can report the new driver that came out today fixes the issues Crimson 16.4.1 AMD Radeon Software Crimson Edition 16.4.1. lol i reported the issue last night and today we have a working driver

sweet! I told we just had to wait a couple days!
I just reran tests at 1440p(high with AA, usually have it off) this aft while fiddling with my system. just aboot to install these and see if there is any other improvement beside fixing the glitches. it was previously mentioned that AOTS did like GPU OCs and I thought the glitches we were seeing were from my OC so that's why I backed it off. gonna put it back to "normal" and see what it does...

edit: something I just noticed that people keep doing in this thread and other is incorrectly noting the feature level designators. this could be part of all the confusion and arguing over the last couple days.
people keep typing "level 12.1 or 11.0" but the correct was is FL 11_0 and FL 12_0. the Direct3D runtime numbers are designated by the "." ie: D3D12.0, D3D11.0 or D3D11.3 see here: Feature levels in Direct3D - Wikipedia, the free encyclopedia

edit 2: installed fine, just over top like I normally do. AND my long ass boot delay is now gone!

17seconds · Apr 4, 2016

So little mention that Ashes of the Singularity is straight out of the AMD development loop. Prior to this implementation, the engine was used by Oxide for the Star Swarm demo as the very first showcase for AMDs Mantle technology. Predictably, Ashes of the Singularity indeed does favor AMD cards and promotes the one and only aspect of DirectX 12 where AMD has an advantage. AMD optimizations have been heavily baked into this game engine from the beginning, so the results need to be filtered through that lens. This is a best case scenario for AMD.

To place so much emphasis and to prognosticate over the future based on performance in these one or two AMD Gaming Evolved titles is taking it a bit too far. In fact, the mere mention of Mantle begins to remind me of how this whole discussion could end up seeming like deja vu. If only a few AMD titles end up adopting Async Compute, the results will likely be the same as Mantle.

Ashes of the Singularity Day 1 Benchmark Preview @ [H]

I Promise to RTFM

[H]F Junkie

I Promise to RTFM

2[H]4U

[H]F Junkie

2[H]4U

I Promise to RTFM

[H]F Junkie

[H]F Junkie

[H]ard|Gawd

I Promise to RTFM

2[H]4U

[H]ard|Gawd

[H]F Junkie

I Promise to RTFM

2[H]4U

I Promise to RTFM

[H]F Junkie

2[H]4U

Gawd

[H]F Junkie

I Promise to RTFM

Limp Gawd

[H]F Junkie

Limp Gawd

I Promise to RTFM

[H]F Junkie

I Promise to RTFM

[H]F Junkie

2[H]4U

[H]F Junkie

I Promise to RTFM

Extremely [H]

[H]F Junkie

2[H]4U

Supreme [H]ardness

Supreme [H]ardness

[H]F Junkie

Extremely [H]

n00b