AotS officially launched today, where is Nvidia's async driver?

TaintedSquirrel

[H]F Junkie
Massive scale real-time strategy game, Ashes of the Singularity released today
Today, Stardock released its highly anticipated real-time strategy game, Ashes of the Singularity. Set in a future where humanity has expanded into the stars and is now in conflict for control of key planets in the galaxy, players wage planetary warfare on a massive scale to conquer opponents and control worlds. In Ashes of the Singularity players command vast forces as they fight for control over the galaxy’s most prized resource, Turinium.

Sean Pelletier on Twitter (Feb 26)
Fun FACT of the day: Async Compute is NOT enabled on the driver-side with public Game Ready Drivers. You need app-side + driver-side!
 
Why does NV PR think it's funny ("FUN FACT"?!) to lie about DX12 support?

What's worse, on their official page from 2014 when Maxwell debuted, they list Async Compute as a key feature Maxwell supports.

They even told AnandTech's Ryan Smith that Maxwell's architecture was different from Kepler's because it supports graphics + compute in parallel. Pure lies.

Lying to consumers is funny?
 
Why does NV PR think it's funny ("FUN FACT"?!) to lie about DX12 support?

What's worse, on their official page from 2014 when Maxwell debuted, they list Async Compute as a key feature Maxwell supports.

They even told AnandTech's Ryan Smith that Maxwell's architecture was different from Kepler's because it supports graphics + compute in parallel. Pure lies.

Lying to consumers is funny?
Add it to the list.
 
It isn't done by the drivers, but it needs to be activated by the drivers. It seems to work fine with multiple compute queues under CUDA alongside a graphics queue in DirectX; it's only with DirectCompute and DirectX that it doesn't, which is what the confusion is about. It's like the GMU is simply turned off when doing DirectCompute.

This is why I was saying it doesn't matter how many ACEs there are and whatnot; that wasn't the point. The point was that, in certain circumstances (DirectCompute), nV's GMU is simply not functioning and everything falls back to running serially.

Is this because of some hardware limitation that DirectCompute can't avoid? Nobody outside of nV really knows, because in all other instances it seems to work.
 
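As an aside on what "app-side + driver-side" means in practice: below is a minimal, hypothetical D3D12 sketch (assuming an already-created ID3D12Device; not Oxide's or NVIDIA's actual code) of the app-side half, where a title creates a dedicated compute queue next to its graphics queue. Whether work on the two queues actually overlaps is up to the driver and hardware, which is the half this thread is arguing about.

```cpp
// Hypothetical sketch of the "app-side" half: a DX12 title opts into async
// compute simply by creating a dedicated compute queue next to its graphics
// (direct) queue. Whether the two queues actually execute concurrently is the
// "driver-side" half. Error handling omitted; assumes an existing ID3D12Device.
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfxQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;       // graphics + compute + copy
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute + copy only
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));
}
```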
It isn't done by the drivers, but it needs to be activated by the drivers. It seems to work fine with multiple compute queues under CUDA alongside a graphics queue in DirectX; it's only with DirectCompute and DirectX that it doesn't, which is what the confusion is about. It's like the GMU is simply turned off when doing DirectCompute.

This is why I was saying it doesn't matter how many ACEs there are and whatnot; that wasn't the point. The point was that, in certain circumstances (DirectCompute), nV's GMU is simply not functioning and everything falls back to running serially.

Is this because of some hardware limitation that DirectCompute can't avoid? Nobody outside of nV really knows, because in all other instances it seems to work.

I'm not sure it would work well even if activated, given that Maxwell SMs run out of local cache resources somewhere between 16 and 32 concurrent warps.
[Chart from the study: shared-memory bandwidth utilization (% of maximum) vs. number of concurrent warps for Fermi, Kepler and Maxwell]


Maxwell thus begins to spill into L2 cache more quickly than Kepler and even Fermi.
 
Where is that picture from?
A university-led study published a short while ago. Let me find the link for ya.

It also highlights some other interesting aspects of the Maxwell architecture, like how it appears to have shrunk its memory controller logic compared to Kepler, resulting in global memory efficiency of only 69% of theoretical bandwidth.

This is why I am of the view that GP100 (or GP102, if NV withholds GP100 from the consumer market) should prove to be a monster performance-wise. Maxwell is memory-bandwidth starved (particularly the ROPs).

Let me find the study for you.
 
Yeah, that is probably not it: it shows that Maxwell achieves a higher percentage of maximum bandwidth when using shared memory than the other two cards. Even though it takes a hit going from 16 to 32 warps, it's still higher. It's actually much higher.
 
Yeah, that is probably not it: it shows that Maxwell achieves a higher percentage of maximum bandwidth when using shared memory than the other two cards. Even though it takes a hit going from 16 to 32 warps, it's still higher. It's actually much higher.

Yes, it is much higher than Kepler and Fermi, but the question is... how does it perform compared to GCN? If the Beyond3D tests, which you commented on, are any indication, Maxwell's performance falls off rather quickly compared to GCN as the number of concurrently executing kernels grows.

It seems that a Maxwell SMM hits its peak at 512 concurrent threads even though it can technically handle up to 2,048. With 22 SMMs in a GTX 980 Ti, that means a performance peak at 11,264 threads (352 warps) and a maximum of 45,056 threads (1,408 warps).
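For reference, a quick sketch of the arithmetic above. The 512-thread "sweet spot" is the poster's claim from the linked study; the 22 SMMs, 32 threads per warp and 2,048-thread ceiling are the published GTX 980 Ti / Maxwell figures.

```cpp
#include <cstdio>

int main() {
    // Claimed practical peak (from the study) vs. architectural maximum.
    const int smm_count    = 22;    // SMMs in a GTX 980 Ti
    const int warp_size    = 32;    // threads per warp
    const int peak_per_smm = 512;   // claimed practical sweet spot
    const int max_per_smm  = 2048;  // architectural maximum

    const int peak_threads = smm_count * peak_per_smm;  // 11,264
    const int max_threads  = smm_count * max_per_smm;   // 45,056

    std::printf("peak: %d threads (%d warps)\n", peak_threads, peak_threads / warp_size);  // 352 warps
    std::printf("max:  %d threads (%d warps)\n", max_threads,  max_threads  / warp_size);  // 1,408 warps
    return 0;
}
```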

AotS appears to highlight this as well: by hammering concurrent compute executions, Maxwell is brought to its knees compared to GCN.

To me, this illustrates GCN's more robust caching system and redundancy. That isn't surprising when you consider GCN's relative power usage.

Anyway, none of this means anything right now, but it probably hints at the direction NVIDIA is headed with Pascal and Volta in terms of architectural improvements.
 
Hmm, I don't think that's the correct train of thought; you can't really draw a direct parallel between two different data sets like that unless there is a direct relationship between those data sets.

I don't think the caching structure under different workloads has much to do with the scheduling in AotS.
 
Hmm, I don't think that's the correct train of thought; you can't really draw a direct parallel between two different data sets like that unless there is a direct relationship between those data sets.

I don't think the caching structure under different workloads has much to do with the scheduling in AotS.

True,

But what we do know is that originally AotS attempted to run asynchronous compute + graphics on Maxwell using the exposed driver feature. The result was an unmitigated disaster: Maxwell's performance dropped into single-digit land. This prompted Oxide to code a separate shader path for NV hardware. This was all leaked to us over at overclock.net by Dan Baker. It also prompted his infamous quote: "AFAIK, Maxwell doesn't support Asynchronous compute + graphics so I don't know why the driver was trying to expose that feature".

The Beyond3D tests mirrored those results.

We were told to wait, by NVIDIA, and that was over 8 months ago. Now the game has launched and we still don't have Asynchronous Compute + Graphics driver support from NVIDIA.

Are we going to wait until Volta before we come to any sort of conclusions on Maxwell?

Not me. I have concluded that Maxwell is incapable of said feature. I'm not sure whether Maxwell outright cannot support Asynchronous Compute + Graphics under DX12, or whether it comes down to the enormous performance drop Oxide initially identified once the feature is turned on.

The latter is a possibility when we look at the caching capabilities within a Maxwell SM, as highlighted in that study I linked. More than 16 concurrent warps overflows the SM's local cache and spills into the L2 cache. The L2 cache is also used for other parts of Maxwell's pipeline, so this extra strain on it, while Maxwell is busy with graphics rendering tasks, would lead to an "unmitigated disaster".
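Purely as an illustration of why high occupancy strains on-chip resources, here is a back-of-the-envelope calculation using the GM204/GM200 SMM limits from NVIDIA's public Maxwell material as assumptions (96 KB of shared memory and 64K 32-bit registers per SMM). It is not a reproduction of the study's measurements, just the general point that per-thread resources halve when going from 16 to 32 resident warps.

```cpp
#include <cstdio>

int main() {
    // Assumed per-SMM limits for GM204/GM200 (NVIDIA's published Maxwell figures).
    const int shared_bytes_per_smm = 96 * 1024;  // 96 KB shared memory
    const int registers_per_smm    = 64 * 1024;  // 64K 32-bit registers
    const int warp_size            = 32;

    const int warp_counts[] = {16, 32, 64};
    for (int warps : warp_counts) {
        const int threads = warps * warp_size;
        std::printf("%2d warps: %4d threads, %3d B shared/thread, %3d regs/thread\n",
                    warps, threads,
                    shared_bytes_per_smm / threads,
                    registers_per_smm / threads);
    }
    return 0;
}
```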

I think that, rather than admitting to this architectural limitation, NVIDIA are simply buying time. I do not think we will ever see a driver fix for this issue. At this point, I believe NVIDIA will leverage Pascal, followed by Volta, to tackle it. I think a more robust caching hierarchy is what we will see in NVIDIA's upcoming GPU architectures. At that point, we may see the feature enabled for Pascal or, more likely, Volta GPUs.

Does Asynchronous compute + graphics matter? Well let me ask this question... Does SMT (Hyperthreading) matter for CPUs? I'd answer that it does and will for all future GPU architectures. I am of the opinion that this feature is here to stay.
 
Well, that is the issue: concurrency. But it definitely works with CUDA and any version of DirectX, 11 or 12. Just profile any CUDA-based game and you will see the different queues running concurrently. So will this problem still be there in Pascal? I think it can be mitigated pretty easily, since it is already working in some games using CUDA. There seems to be something holding Maxwell 2 back from doing the same with DirectCompute. I don't know what it is, and I don't think there is enough information out there to infer where the problem lies, at least not yet. So maybe nV is still trying to get the drivers to make it work and it just isn't a top priority, though at this point I have a strong feeling they probably can't get it to work, given how long it has been since they first said so, and given that it really should be at the top of their priority list.

I agree the feature is nice to have, but I would rather have better utilization without being hampered by having to write extra code paths, lol. That would be the best way to go, but if that is unavoidable, then yes, async shading is better than nothing at all.
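Following on from the queue-creation sketch earlier in the thread, here is an equally hypothetical illustration of how such work is typically submitted so that overlap is even possible: compute goes to its own queue, and the graphics queue performs a GPU-side fence wait only where it consumes the results. Again, this is a generic D3D12 pattern, not any particular engine's code.

```cpp
// Hypothetical D3D12 submission pattern (not any particular engine's code):
// compute work goes to its own queue, and the graphics queue performs a
// GPU-side fence wait only where it consumes the results. Anything the
// graphics queue was given before the Wait() is free to overlap with the
// compute work, provided the driver and hardware actually run the queues
// concurrently. Command lists and the fence are assumed to be pre-recorded
// and pre-created; error handling omitted.
#include <d3d12.h>

void SubmitFrame(ID3D12CommandQueue* computeQueue,
                 ID3D12CommandQueue* gfxQueue,
                 ID3D12CommandList*  computeWork,
                 ID3D12CommandList*  gfxWork,
                 ID3D12Fence*        fence,
                 UINT64              fenceValue)
{
    computeQueue->ExecuteCommandLists(1, &computeWork);
    computeQueue->Signal(fence, fenceValue);     // mark the compute work's completion point

    gfxQueue->Wait(fence, fenceValue);           // GPU-side wait, does not stall the CPU
    gfxQueue->ExecuteCommandLists(1, &gfxWork);  // consumes the compute results
}
```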
 
I would have thought by now nvidia are working very closely with the AOTS devs to ensure the game runs well on nvidia hardware, async or not.

Here's what's gunna happen:

The AMD camp will carry on banging on and on about the supposed importance of async compute. Then the next gen of cards will come along, nvidia will win across the board as always (yes always, bigger R&D budget, period), and all the AMD fanboys will then finally shut up when they realise support of a specific API feature means fuck all compared to overall raw graphical performance.

It's not like it's an option like anti-aliasing where you can toggle it and see the effect. It won't be exposed to the user. Most people won't even be aware of it. But ALL people will be aware of the framerate, which is ultimately all that counts.


Can't wait to watch this play out...
 
Here's what's gunna happen:

Nvidia will release the GTX 1080. The Nvidia camp will all brag about buying their $800 video cards. A few months later Nvidia will release their latest version of the Titan for $1,000+ and Nvidia fanboys will all dump their GTX 1080s in favor of Titans. A few months after that Nvidia will release the GTX 1080 Ti for $800 and the Nvidia camp will once again dump their cards in favor of Nvidia's latest and greatest. In the meantime, those who bought high-end AMD cards will still be enjoying good performance from their $550 cards and will all be quietly snickering at the Nvidia fanboys getting trolled by their beloved company.

Can't wait to watch this play out...
 
Here's what's gunna happen:

Nvidia will release the GTX 1080. The Nvidia camp will all brag about buying their $800 video cards. A few months later Nvidia will release their latest version of the Titan for $1,000+ and Nvidia fanboys will all dump their GTX 1080s in favor of Titans. A few months after that Nvidia will release the GTX 1080 Ti for $800 and the Nvidia camp will once again dump their cards in favor of Nvidia's latest and greatest. In the meantime, those who bought high-end AMD cards will still be enjoying good performance from their $550 cards and will all be quietly snickering at the Nvidia fanboys getting trolled by their beloved company.

Can't wait to watch this play out...
AMD's failure to compete with flagships is one of the HUGE problems they've had over the years and has led to their current situation.
Trickle-down: when Nvidia releases three flagships over the same span of time in which AMD releases zero, it's going to cost AMD a lot of customers across the board.

How many 970s has Nvidia sold thanks to the 980 Ti's popularity? Or 960s thanks to the 970's? If all the money is in the "best of the best" tier and AMD is only competing in the "good value" tier, they will continue to suffer. The market has spoken.
 
AMD's failure to compete with flagships is one of the HUGE problems they've had over the years and has led to their current situation.
Trickle-down: when Nvidia releases three flagships over the same span of time in which AMD releases zero, it's going to cost AMD a lot of customers across the board.

How many 970s has Nvidia sold thanks to the 980 Ti's popularity? Or 960s thanks to the 970's? If all the money is in the "best of the best" tier and AMD is only competing in the "good value" tier, they will continue to suffer. The market has spoken.
I agree, to a point. Flagship halo effect does sell cards. But other factors need to be taken into consideration as well. AMD would probably be doing much better today if they had initially released the R9 290X with a more effective, quieter cooler. The 290X should have been the top choice for most gamers if it hadn't gotten that black eye right out of the gate. They were their own worst enemy there.

In addition, Nvidia has done a number of underhanded things over the years that have hurt AMD. With the 970 that you mentioned, Nvidia misstated the specs for four and a half months after launch, and they didn't disclose the 3.5GB + 0.5GB gimped memory architecture. By the time the truth came to light, the reviews had already been published and people had already purchased them. Charging $1,000 for the Titan X and then coming out with the GTX 980 Ti months later, after having milked the early adopters. Blocking PhysX from working on AMD cards. Batmangate. Gameworks. Bumpgate. It's quite the list.

It's my opinion that there are more than a few Nvidia customers out there who are getting fed up with Nvidia's antics. Certainly GTX 780, original Titan and GTX 780 Ti owners are looking at their purchases and wondering why they forked out more money for their Nvidia cards while the much cheaper R9 290X is faster than all three of them in today's games. So that's one AMD flagship spanning three Nvidia flagships, yet still winning in the end. The 290X has certainly proven to be one of the best bargains in video gaming. Can the same be said of the 780/Titan/780 Ti?

Nvidia announced eight months ago that their hardware was capable of Asynchronous Compute and that they were working on releasing an async-capable driver. Eight months. So where is it? If Nvidia is such a market leader, surely they should have been able to follow through by now. Right? Or is this a repeat of Nvidia's PureVideo debacle, where they announced hardware video acceleration on the NV40/NV45 but couldn't deliver? It took Nvidia eight months there as well to finally admit that their hardware was flawed and couldn't perform all of the functions they claimed it would.

Yes, Nvidia is a market leader. But they seem to be stumbling as of late. It's up to AMD now to capitalize on that. But they'll have to pull it off flawlessly this time if they want to regain market share. Drivers, power consumption, performance, noise levels, features, availability and pricing will all need to be perfectly balanced. They really need to get this one right on the first try.
 
Of course Nvidia would string folks along about a fix when most likely none was possible. They didn't want to hurt sales. Eight months sold a lot of cards. Over and over and over again. Now, with that kind of track record (and success with it), how likely is Nvidia to pull that again rather than be forthright about their hardware? Frankly, I'd like a bit more up-front honesty from a company.
 
Of course Nvidia would string folks along about a fix when most likely none was possible. They didn't want to hurt sales. Eight months sold a lot of cards. Over and over and over again. Now, with that kind of track record (and success with it), how likely is Nvidia to pull that again rather than be forthright about their hardware? Frankly, I'd like a bit more up-front honesty from a company.
It's not like AMD is any better in regards to how Fiji was or FreeSync. Even though I would like Nvidia to address the performance regarding Ashes, it is by no means a major hit to them. It's nice that the player count has increased since the beta days, but the diversity is still rather low for a game that gained so much press.

As it stands now, Red Orchestra 2 has more players than Ashes, which is sort of sad to me considering people view RO2 as an outdated game with "bad graphics" compared to the likes of Battlefield / CoD.
 
It's not like AMD is any better in regards to how Fiji was or FreeSync. Even though I would like Nvidia to address the performance regarding Ashes, it is by no means a major hit to them. It's nice that the player count has increased since the beta days, but the diversity is still rather low for a game that gained so much press.

As it stands now, Red Orchestra 2 has more players than Ashes, which is sort of sad to me considering people view RO2 as an outdated game with "bad graphics" compared to the likes of Battlefield / CoD.
And that is where you're missing what the debate is about. Most see this not as a must-have game but as a game that shows a possible trend, and that is what worries some. Granted, Nvidia still does well enough, but AMD has quite a commanding performance advantage. If, and I mean IF, this continues as a trend, that is what concerns most people, be they AMD or Nvidia users. So I wouldn't be too quick to write off this game as an outlier, especially as Hitman is showing the same results across nearly all tiers of cards (390X equaling the 980 Ti in DX12, probably with async on).
 
It's not like AMD is any better in regards to how Fiji was or FreeSync.
How so? The only thing I can think of about Fiji was Joe Macri's comment that it would be an "overclocker's dream", which it wasn't. That can hardly compare to Nvidia themselves publishing better specs for the GTX 970 than it actually had and then lying about it for four and a half months. And also not disclosing its gimped 3.5GB + 0.5GB memory configuration.

As for Adaptive-Sync (FreeSync), it was delivered almost exactly when AMD said it would be, and it is now an industry standard. Any manufacturer who wants to can implement Adaptive-Sync for free in their hardware, including Nvidia.
 
And also not disclosing its gimped 3.5GB + 0.5GB memory configuration.

I don't get the complaint over this. Did the card suddenly lose performance when the internet found out about this configuration? People were salivating over its performance, especially at its price.
 
I don't get the complaint over this. Did the card suddenly lose performance when the internet found out about this configuration? People were salivating over its performance, especially at its price.

I'm guessing it is Nvidia's approach to this whole charade. When it was first reported, the Nvidia forums had solutions that, according to Nvidia, were going to be driver-based; soon after, it seems they heard that it was not possible because of the hardware.

The last 500 megabytes are not able to keep up, so that RAM cannot be used in the same way without slowdowns. Some have filed a class-action lawsuit against Nvidia over the card's memory "feature" not being disclosed.
 
I'm guessing it is Nvidia's approach to this whole charade. When it was first reported, the Nvidia forums had solutions that, according to Nvidia, were going to be driver-based; soon after, it seems they heard that it was not possible because of the hardware.

The last 500 megabytes are not able to keep up, so that RAM cannot be used in the same way without slowdowns. Some have filed a class-action lawsuit against Nvidia over the card's memory "feature" not being disclosed.
I was much more bothered by the cache size being misrepresented.

Still a really weird incident; I find it hard to believe they did this intentionally. How the fuck were people not going to find out?

Many people who read the GM204 whitepaper would have expected quirks in memory partitioning due to disabled SMs.
 
Many people who read the GM204 whitepaper would have expected quirks in memory partitioning due to disabled SMs.

...And yet it took months to catch, even though tons of forum experts from both sides of the fence proclaim their deep understanding of all things GPU-performance-related every day. Now those same people are trying to be "right" again by speaking on future AMD and Nvidia products/driver releases based on limited information and their newfound understanding of the future.
 
I don't get the complaint over this. Did the card suddenly lose performance when the internet found out about this configuration? People were salivating over its performance, especially at its price.
The issue was never entirely the performance; that was just a small part most won't be affected by. It was the misrepresentation of the RAM. Given the choice against the then-current AMD 290/290X, it would have lost out if it had been sold as 3.5GB against 4GB, so that was basically the issue that left a bad taste in most people's mouths.
 
...And yet it took months to catch, even though tons of forum experts from both sides of the fence proclaim their deep understanding of all things GPU-performance-related every day. Now those same people are trying to be "right" again by speaking on future AMD and Nvidia products/driver releases based on limited information and their newfound understanding of the future.

Wasn't the only reported 3.5GB RAM issue in Shadow of Mordor, and wasn't it fixed? I don't remember exactly what triggered the whole VRAM investigation, but I'm pretty sure it was fixed soon after.

People will always speculate, nothing new there.
 
Wasn't the only reported 3.5GB RAM issue in Shadow of Mordor, and wasn't it fixed? I don't remember exactly what triggered the whole VRAM investigation, but I'm pretty sure it was fixed soon after.

People will always speculate, nothing new there.

It was also shown in Dragon Age and other games, especially games at 4K. That is why I sold one of my 970s: SLI 970s couldn't cut it at 4K because of the memory issue.
 
Wasn't the only reported 3.5GB RAM issue in Shadow of Mordor, and wasn't it fixed? I don't remember exactly what triggered the whole VRAM investigation, but I'm pretty sure it was fixed soon after.

People will always speculate, nothing new there.
At the forefront of the debacle was Goldentiger, a diehard Nvidia fanboy, who was posting his findings. At first he thought it was his monitor, so he got another, and the issue persisted. Then he either discovered the VRAM wall at 3.5GB himself or saw the early reports and tested. He was very unhappy, and after giving his opinion and expressing his disdain for Nvidia and his SLI 970s, he disappeared.
 