Maxwell DOES support Asynchronous Shaders

Stoly · Sep 3, 2015

According to WCCFTech

http://wccftech.com/nvidia-amd-directx-12-graphic-card-list-features-explained/

Tgrove · Sep 3, 2015

Could have just posted this right in the other thread

Gideon · Sep 3, 2015

Supporting a feature and actually having it work right seem to be two different things. Maxwell may technically support it, but it runs like garbage on it.

Quix · Sep 3, 2015

I think it's funny that anyone is even talking about this, there aren't even any games that support DirectX 12 at this point. Isn't real world performance the only thing that really matters here?

Tgrove · Sep 3, 2015

Quix said:
I think it's funny that anyone is even talking about this, there aren't even any games that support DirectX 12 at this point. Isn't real world performance the only thing that really matters here?

Yes, there is. ARK: Survival Evolved is the first DX12 game and I think it's already out. They tout a 20% performance boost, and it's one of the most popular games Steam has ever seen.

Airbrushkid · Sep 3, 2015

I don't know. Metal Gear Solid V: The Phantom Pain, a non-DX12 game, is kicking its ass: over 70,000 players are playing it right now versus 40,000 playing Ark.

Tgrove said:
Yes, there is. ARK: Survival Evolved is the first DX12 game and I think it's already out. They tout a 20% performance boost, and it's one of the most popular games Steam has ever seen.

TaintedSquirrel · Sep 3, 2015

It's software emulated, apparently.

https://www.reddit.com/r/pcgaming/comments/3jfgs9/maxwell_does_support_async_compute_but_with_a/

The performance gains are incredibly small. And it can potentially cause performance loss if overhead becomes a problem. That would explain the tiny gains in the B3D benchmark and the stability problems the Oxide dev mentioned in AotS. Dunno how Nvidia could possibly respond to this situation. I can't think of anything they could say that would appease people aside from a product recall.

Should really be rephrased to "Nvidia only supports software-level async".

polonyc2 · Sep 3, 2015

Stoly said:
Maxwell DOES support Asynchronous Shaders

I don't believe Nvidia...I think they're in spin mode trying to make the best of a bad situation (much like the 3.5GB VRAM issue)

Cool Vibrations · Sep 3, 2015

polonyc2 said:
I don't believe Nvidia...I think they're in spin mode trying to make the best of a bad situation (much like the 3.5GB VRAM issue)

Nvidia's new modus operandi:

1. Release broken hardware
2. Make a software work-around
3. Refer to the software workaround that doesn't actually fix the problem as a new feature, optimization, or other meaningless buzzword

Nvidia's most recent drivers are emulating the missing function (refer to 2). The gains are small and do not fix the missing function (refer to 3). There is a big difference between software emulation and actual hardware support.

So, in essence Maxwell still doesn't support it.

polonyc2 · Sep 3, 2015

Cool Vibrations said:
Refer to the software workaround that doesn't actually fix the problem as a new feature, optimization, or other meaningless buzzword

lol...yes they made the missing VRAM issue into a new feature...3.5GB plus an extra 0.5GB of new segmented memory based on heuristic patterns...this amazing new feature is only available for GTX 970 owners!

RobertR1 · Sep 3, 2015

As an Nvidia buyer since the 8800 GTX days, I'm leaning towards them doing async as a checkbox item.

Similar to how tessellation was a checkbox item done in software with severe penalties before later generations had dedicated hardware reserved for it. How much async gets used is the real question.

YeuEmMaiMai · Sep 3, 2015

wow OP

how much were you paid to post that?

just curious as you seemed not to have read it.

MaZa · Sep 3, 2015

RobertR1 said:
As an Nvidia buyer since the 8800 GTX days, I'm leaning towards them doing async as a checkbox item.

Similar to how tessellation was a checkbox item done in software with severe penalties before later generations had dedicated hardware reserved for it. How much async gets used is the real question.

Considering consoles are going to use it (or are already using it) because of the performance boost it gives, the safe bet is that it's going to be common once DX12 gets popular. Unless PC simply gets singled out on purpose.

DPI · Sep 3, 2015

Wake me up when "ASYNC shaders" is something more meaningful and demonstrable than forum masturbation.

Ticker305 · Sep 3, 2015

and my car can run on moonshine

razor1 · Sep 3, 2015

Ticker305 said:
and my car can run on moonshine

Funny thing is it probably can. Have you seen how combustible pure moonshine is? Tell ya something, it's smooth going down but it's more volatile than Bacardi 151. It would probably destroy your engine though.

StormClaw · Sep 3, 2015

Stoly said:
According to WCCFTech

http://wccftech.com/nvidia-amd-directx-12-graphic-card-list-features-explained/

Nvidia GPUs actually have 0 hardware support for DX12, they just try to fake it through their drivers

Deleted member 83233 · Sep 3, 2015

DPI said:
Wake me up when "ASYNC shaders" is something more meaningful and demonstrable than forum masturbation.

Me too. I'm gonna lay down back, 'cause our concussion had me sleepy...

Hmph... forumsturbation... phooey

razor1 · Sep 3, 2015

StormClaw said:
Nvidia GPUs actually have 0 hardware support for DX12, they just try to fake it through their drivers

Not possible. Something has to be creating the queue. To create the queue, the processor has to analyze the shaders, break them down into individual instructions, and then pass them through the scheduler, which is a very complex task for a CPU to do. If it isn't the GPU doing it, it's the CPU, and that would be easily noticeable in the latency numbers, which would rise along a much steeper curve. The GPU-CPU communication alone would show it; we don't even need to look at the time it would take the CPU to do the other steps.

Sycraft · Sep 3, 2015

DPI said:
Wake me up when "ASYNC shaders" is something more meaningful and demonstrable than forum masturbation.

Hey, that's the most important thing: Bragging to a bunch of strangers about how your hardware is supposedly better than their hardware. Not enjoying playing games, I mean who buys gaming hardware for that?

It has always been funny to me the GPU pissing matches that go on. Just buy a GPU that works for you and be happy.

The arguing about future stuff is even funnier. Why would I care about how any of the current cards are going to perform on games in the future? I'll get a new card then. That's half the fun of PC gaming: new toys all the time.

Deleted member 83233 · Sep 3, 2015

Sycraft said:
Hey, that's the most important thing: Bragging to a bunch of strangers about how your hardware is supposedly better than their hardware. Not enjoying playing games, I mean who buys gaming hardware for that?

It has always been funny to me the GPU pissing matches that go on. Just buy a GPU that works for you and be happy.

The arguing about future stuff is even funnier. Why would I care about how any of the current cards are going to perform on games in the future? I'll get a new card then. That's half the fun of PC gaming: new toys all the time.

Yep.

StormClaw · Sep 3, 2015

razor1 said:
Not possible. Something has to be creating the queue. To create the queue, the processor has to analyze the shaders, break them down into individual instructions, and then pass them through the scheduler, which is a very complex task for a CPU to do. If it isn't the GPU doing it, it's the CPU, and that would be easily noticeable in the latency numbers, which would rise along a much steeper curve. The GPU-CPU communication alone would show it; we don't even need to look at the time it would take the CPU to do the other steps.

you're wrong

http://vocaroo.com/i/s0moE1u9QsCA

Deleted member 83233 · Sep 3, 2015

StormClaw said:
you're wrong

http://vocaroo.com/i/s0moE1u9QsCA

I see you keep linking obscure forum posts, sites, and such. That doesn't really prove anything. I have a Maxwell card, and couldn't care too much less about the DX12 features it does or doesn't support because I will buy a new card by the time I'm fully using DX12. So I don't even really care about any of this other than passing curiosity. (and just to disclose, I'm all for AMD doing something right, and if their async shader setup works better right now, good for them) I still haven't seen anything that I'd consider solid proof yet on how NVs hardware does or doesn't do this. We see hints of it, we hear hints of it, but I don't see any proof or evidence that I'd consider compelling yet.

StormClaw · Sep 3, 2015

J3RK said:
I see you keep linking obscure forum posts, sites, and such. That doesn't really prove anything.

As opposed to you posting factious Nvidia press release on the matter that clears everything up.

Oh wait...

Deleted member 83233 · Sep 3, 2015

StormClaw said:
As opposed to you posting fatuous Nvidia press release on the matter that clears everything up.

Oh wait...

I'm not posting anything, because quite frankly there isn't much to post at the moment. Also, I didn't say Nvidia would be the one to clear everything up. Also, even if they do post something that's clearly bullshit, then that would tend to clear things up in a different way. But... That hasn't happened, nor has anyone else so far. Based on the idle(ish) curiosity I mentioned above, I would actually like to know what's up. Not that it will change my opinion at all about the hardware I'm running at THIS moment in time, but I still like to know things. I was just saying that what you just posted doesn't really provide that any better. So far there is speculation, pure garbage spewing, educated guessing, and I'm sure other stuff in there too. I don't see what the point is in further pressing the issue until someone comes up with something a bit more concrete.

StormClaw · Sep 3, 2015

J3RK said:
I'm not posting anything, because quite frankly there isn't much to post at the moment.

There isn't much to post because Nvidia knows they're fucked. That's why they've been dead silent, while being rekt on tech sites all over the web.

TaintedSquirrel · Sep 3, 2015

StormClaw said:
you're wrong

http://vocaroo.com/i/s0moE1u9QsCA

Starts off great but he really goes off the deep end about halfway through.
It looks like he's reading footnotes from an AMD fanboy's wet dream.

I'd like to know who that is so I can cross them off my list of people to listen to, eh.

Deleted member 83233 · Sep 3, 2015

TaintedSquirrel said:
I'd like to know who that is so I can cross them off my list of people to listen to, eh.

You could probably safely do that already.

tybert7 · Sep 3, 2015

StormClaw said:
you're wrong

http://vocaroo.com/i/s0moE1u9QsCA

Thank you for providing that subdued, measured, and neutral analysis. I enjoyed it.

razor1 · Sep 3, 2015

StormClaw said:
you're wrong

http://vocaroo.com/i/s0moE1u9QsCA

who ever that guy is, man what is his level of education, redneck lol.

DPI · Sep 3, 2015

razor1 said:
who ever that guy is, man what is his level of education, redneck lol.

thats actually stormclaw

TaintedSquirrel · Sep 3, 2015

DPI said:
thats actually stormclaw

Why would he record an audio clip of his ramblings rather than just ramble in text?

razor1 · Sep 3, 2015

if it is he should be banned for doing that, poor excuse for member of this forum.

jwcalla · Sep 3, 2015

StormClaw said:
you're wrong

http://vocaroo.com/i/s0moE1u9QsCA

I lasted about a minute and heard about 50 f-bombs and the dude wasn't even angry. A sure sign of a low IQ.

iczerjones · Sep 3, 2015

I mostly just lurk on these forums (have for many, many years - love this place), but I had to post on this one. So, I write some cuda/ocl code which is (drumroll) - compute.

I'm a little confused by this whole debacle. Async memory/kernel transfer and even async concurrent compute isn't anything new. Hell, you could run concurrent async operations back on the Tesla C1060 with cuda 2.x!

Now, it is quite true that GCN's scheduler is quite a bit more robust, but to claim that Nvidia doesn't support async operations at a hardware level shows a distinct misunderstanding of what a GPU even is. That said, Nvidia could definitely benefit from putting some extra effort behind beefing up the scheduler pipeline to catch up a bit and really let things shine.

I don't really have much of a dog in the race. I'll buy whatever is working best at the time I need a new card, so I just wanted to express my confusion / amusement at all of this. By all means, if I'm fundamentally misunderstanding the issue myself, let me know, I'd appreciate being less confused. =)

I hate to link to something with any bias at all, but this OP did a great job of collecting and analyzing all of the recent news / data / etc., and they even have a great analysis of a simple DX12 synthetic bench someone threw together to show scaling from 1 to 128 concurrent threads. You can see, quite clearly, that Maxwell (and every prior Nvidia card all the way back to Tesla-core) supports async compute. It isn't new.

https://www.reddit.com/r/nvidia/comments/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/

Again, sorry for reddit as a source, but the material is pretty comprehensive.
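To make it clearer what a scaling bench like that is looking for, here is a toy model in Python. It is only a sketch and not the actual benchmark: the function name, the 10 ms kernel time, and the 32-slot width are all made-up values for illustration, and it says nothing about how many kernels any real card can run at once. The idea is simply that a device able to run several identical kernels concurrently finishes them in waves, so total time climbs in steps, while a device that serializes everything climbs linearly.

```python
import math


def total_latency_ms(n_kernels: int, slots: int, kernel_ms: float) -> float:
    """Toy model: identical kernels run in waves of at most `slots` at a time.

    `slots` and `kernel_ms` are hypothetical parameters, not measured values.
    """
    if n_kernels < 0 or slots < 1:
        raise ValueError("n_kernels must be >= 0 and slots must be >= 1")
    waves = math.ceil(n_kernels / slots)  # number of back-to-back waves needed
    return waves * kernel_ms


# Compare a fully serial device (1 slot) with a hypothetical 32-slot device.
for n in (1, 32, 33, 64, 128):
    serial = total_latency_ms(n, slots=1, kernel_ms=10.0)
    concurrent = total_latency_ms(n, slots=32, kernel_ms=10.0)
    print(f"{n:3d} kernels: serial {serial:7.1f} ms, concurrent {concurrent:5.1f} ms")
```

If a real card's numbers look like the stepped curve rather than the straight line, something on the device is running kernels side by side. The model ignores scheduling overhead entirely, so it cannot tell you how well the overlap performs.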

** Edit: Ok, after a bit more reading I see what the beef is. Folks are upset that they aren't seeing pure 100%, integrated asymmetrical overlap time savings when using DX12's async calls. I could see where that might be distressing at first blush and frankly, the forced synchronize overlap is horrendous on the Maxwell cards. However, given all the kerfuffle pointed at Nvidia, outside of the awesome gains of the AMD 2xx/3xx boards, Fiji seems to be in much the same boat. Nvidia has a much larger gap to make up, but the fact that there are gains at all (that are statistically relevant in many cases) should point to this being a driver/API implementation issue, not hardware. Regardless, this topic once again has my interest and I don't find it embarrassing to read anymore. =D

TaintedSquirrel · Sep 3, 2015

As an aside: Is compute performance relative to the GPU it's running on?
I have to wonder if all of these % performance numbers we see being thrown around (Oxide cited 30%) are normalized to console performance.

For example, taking the HD7850(ish) in the PS4 and increasing its performance by 30% would make it jump from a GTX 660 to about a 660 Ti. In other words, a very huge performance boost on the PS4 is about half a video card tier on PC. The numbers are even smaller for Xbox One.

So I guess the question becomes: How does the performance benefit of async scale from consoles to PC? I assume the size and type of load would be the same on both platforms. Compute operations don't scale with resolution; sort of like CPU performance, I assume. Some of the lighting details (being handled asynchronously) might scale more at higher resolutions.

This might be why Nvidia decided to emulate it on the software side. If the compute operations are simplistic enough, modern CPUs will have no problem handling them on their own. AotS being a PC exclusive and explicitly designed to showcase async performance, would present a clear problem. Past Mantle benchmarks showed the most benefit to low-end CPUs, and virtually no benefit to higher-end i5's and i7's. Knowing how weak the consoles' CPUs are, it could be a similar case as what we saw with Mantle... It exists to alleviate some CPU bottlenecking issues. If that's the case, Nvidia assumed its implementation would be 'good enough' to handle console-based async features with no trouble.

Someone else linked an article about volumetric lighting in the new Tomb Raider game and it got me thinking. Although Oxide did mention other games will be using way more async computing, and their own implementation was "moderate". I have to wonder which games exactly they are referring to. It seems AotS is capable of crippling every card on the market to about 40fps or lower, both AMD and Nvidia.

I'm just having a hard time understanding how async capabilities on the 7770 and 7850 in consoles would be capable of wrecking Nvidia cards three times more powerful (serially).

Brent_Justice · Sep 3, 2015

I like this quote

I think I would like to quote Robert Hallock from AMD here:
“I think gamers are learning an important lesson: there’s no such thing as ‘full support’ for DX12 on the market today. There have been many attempts to distract people from this truth through campaigns that deliberately conflate feature levels, individual untiered features and the definition of ‘support.’”

I don't think it is any surprise either. By the time there are DX12 games, we will be on the next gen of GPUs.

It will be a battle between the two next gens.

Also, lowest common denominator, DX11 aint going away for a long time.

Expecting DX12 support on older-generation hardware is not logical. You need new hardware and software for support. It has always been this way.

razor1 · Sep 3, 2015

TaintedSquirrel said:
As an aside: Is compute performance relative to the GPU it's running on?
I have to wonder if all of these % performance numbers we see being thrown around (Oxide cited 30%) are normalized to console performance.

For example, taking the HD7850(ish) in the PS4 and increasing its performance by 30% would make it jump from a GTX 660 to about a 660 Ti. In other words, a very huge performance boost on the PS4 is about half a video card tier on PC. The numbers are even smaller for Xbox One.

So I guess the question becomes: How does the performance benefit of async scale from consoles to PC? I assume the size and type of load would be the same on both platforms. Compute operations don't scale with resolution; sort of like CPU performance, I assume. Some of the lighting details (being handled asynchronously) might scale more at higher resolutions.

This might be why Nvidia decided to emulate it on the software side. If the compute operations are simplistic enough, modern CPUs will have no problem handling them on their own. AotS being a PC exclusive and explicitly designed to showcase async performance, would present a clear problem. Past Mantle benchmarks showed the most benefit to low-end CPUs, and virtually no benefit to higher-end i5's and i7's. Knowing how weak the consoles' CPUs are, it could be a similar case as what we saw with Mantle... It exists to alleviate some CPU bottlenecking issues. If that's the case, Nvidia assumed its implementation would be 'good enough' to handle console-based async features with no trouble.

Someone else linked an article about volumetric lighting in the new Tomb Raider game and it got me thinking. Although Oxide did mention other games will be using way more async computing, and their own implementation was "moderate". I have to wonder which games exactly they are referring to. It seems AotS is capable of crippling every card on the market to about 40fps or lower, both AMD and Nvidia.

I'm just having a hard time understanding how async capabilities on the 7770 and 7850 in consoles would be capable of wrecking Nvidia cards three times more powerful (serially).

Compute performance definitely depends on the GPU it's running on, and this is why the code paths have to be different for different GPUs.

That 30% number on consoles is valid and can be translated over to PCs, but some things won't port over well, because the nature of the IGP in the CPU by itself hides the latency and system bandwidth issues that PCs have.

Lighting algorithms and their async needs will scale with resolution.

nV isn't emulating it in software. People have to get that out of their heads. It's not possible; the latency figures would be much higher if they were.

Async doesn't have anything to do with CPUs either.

Let's go with an example where components A and B are being done at the same time and both need C, which is a derivative of D.

The scheduler predicts this, but predicts it wrong: it thinks D isn't needed until later (because the code isn't written well; it might be picked up by one IHV's GPU but not by a different IHV's GPU). What happens? The whole thing breaks down on the GPU that doesn't pick it up. By the time error correction kicks in to see what is going on, you have lost considerable time. It will be fixed by the error correction, but by then the time has already been spent. These are front-end issues that have to be fixed in the compiler. So if nV hasn't spent as much dev time as needed on their DX12 drivers, problems like this might occur.

Here is another situation.

https://blogs.oracle.com/swdeveloper/entry/a_brief_explanation_of_race

This is just an example. There are many other steps that can be taken or have to be taken, but it gives something simplified enough to understand.
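For anyone who doesn't want to click through, the race condition idea can be shown in a few lines of Python. This is a minimal sketch, not GPU code: the two "workers" are replayed deterministically from an explicit schedule instead of using real threads, and `run_schedule` plus the worker names are hypothetical, made up for this illustration. Each worker does a read followed by a write of `shared + 1`; if both read before either writes, one update is lost.

```python
def run_schedule(schedule):
    """Replay the read/write steps of workers that each do `shared += 1`.

    `schedule` is a sequence of (worker, step) pairs, where step is
    "read" (load the shared value into the worker's local copy) or
    "write" (store the local copy plus one back to the shared value).
    """
    shared = 0
    local = {}
    for worker, step in schedule:
        if step == "read":
            local[worker] = shared
        elif step == "write":
            shared = local[worker] + 1
        else:
            raise ValueError(f"unknown step: {step!r}")
    return shared


# Each worker finishes its read-modify-write before the other starts.
ordered = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

# Both workers read the old value before either one writes.
racy = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

print(run_schedule(ordered))  # 2
print(run_schedule(racy))     # 1
```

The second schedule ends at 1 instead of 2 because B's write is based on a stale read. Avoiding that needs some form of synchronization, which is the kind of cost being argued about in this thread.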

OK, on older, slower hardware beating newer hardware that is 3 times faster: it's possible. Let's say async compute has totally broken down on the 980 Ti because the program is running 128 threads and tons of compute. The scheduler is so overtasked that the GPU now does everything in serial. Meanwhile the workload might be tuned to fill the 128 threads with all of the compute needs in one shot, and you just got a 50% increase in performance. I'm not saying it will happen this simply or even in this manner, but theoretically it's possible.
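The arithmetic behind a figure like that 50% can be sketched with a toy frame-time model in Python. Everything here is an assumption for illustration: the 10 ms graphics and 5 ms compute numbers are invented so that the result lands on 50%, and `frame_time_ms` is a hypothetical helper, not something from any driver or benchmark. The `overlap` parameter is the fraction of the shorter workload that gets hidden behind the longer one.

```python
def frame_time_ms(graphics_ms: float, compute_ms: float, overlap: float) -> float:
    """Toy model of one frame with graphics and compute work.

    overlap = 0.0 means fully serial (compute runs after graphics);
    overlap = 1.0 means the shorter workload is completely hidden
    behind the longer one.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    hidden = overlap * min(graphics_ms, compute_ms)
    return graphics_ms + compute_ms - hidden


# Invented numbers: 10 ms of graphics and 5 ms of compute per frame.
serial = frame_time_ms(10.0, 5.0, overlap=0.0)      # 15.0 ms
overlapped = frame_time_ms(10.0, 5.0, overlap=1.0)  # 10.0 ms
speedup = serial / overlapped                       # 1.5, i.e. 50% faster

print(f"serial {serial} ms, overlapped {overlapped} ms, speedup {speedup}x")
```

With these numbers, a card that overlaps perfectly is 50% faster than the same card running serially. Partial overlap gives proportionally less, for example `frame_time_ms(10.0, 5.0, overlap=0.5)` comes out at 12.5 ms.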

Tomb Raider was in development well before the Xbox One DX12 console version was ready, and its PC version is due in Q1 2016. That's far away, possibly far enough for Pascal, but in any case they would have added the async code late in the development cycle, like in the last few months.

jwcalla · Sep 3, 2015

TaintedSquirrel said:
It seems AotS is capable of crippling every card on the market to about 40fps or lower, both AMD and Nvidia.

razor1 said:
OK, on older, slower hardware beating newer hardware that is 3 times faster: it's possible. Let's say async compute has totally broken down on the 980 Ti because the program is running 128 threads and tons of compute. The scheduler is so overtasked that the GPU now does everything in serial. Meanwhile the workload might be tuned to fill the 128 threads with all of the compute needs in one shot, and you just got a 50% increase in performance. I'm not saying it will happen this simply or even in this manner, but theoretically it's possible.

The more "compute" operations are loaded onto the video card, the fewer resources will be available for graphics rendering (obviously). Looking at the graphics in AotS it's hard to believe the framerates can be so low (no offense to the game devs, just saying). Whether or not pounding the graphics card with non-graphics operations is a good strategy for game development, I'm not sure.

It would make sense for low-end CPUs though (read: consoles).

The question of whether or not nvidia cards are good at async compute is only pertinent if this is a technology that is truly valuable for a broad range of next-gen games. I think that's a question that doesn't have an obvious answer.

durquavian · Sep 4, 2015

razor1 said:
Compute performance definitely depends on the GPU it's running on, and this is why the code paths have to be different for different GPUs.

That 30% number on consoles is valid and can be translated over to PCs, but some things won't port over well, because the nature of the IGP in the CPU by itself hides the latency and system bandwidth issues that PCs have.

Lighting algorithms and their async needs will scale with resolution.

nV isn't emulating it in software. People have to get that out of their heads. It's not possible; the latency figures would be much higher if they were.

Async doesn't have anything to do with CPUs either.

Let's go with an example where components A and B are being done at the same time and both need C, which is a derivative of D.

The scheduler predicts this, but predicts it wrong: it thinks D isn't needed until later (because the code isn't written well; it might be picked up by one IHV's GPU but not by a different IHV's GPU). What happens? The whole thing breaks down on the GPU that doesn't pick it up. By the time error correction kicks in to see what is going on, you have lost considerable time. It will be fixed by the error correction, but by then the time has already been spent. These are front-end issues that have to be fixed in the compiler. So if nV hasn't spent as much dev time as needed on their DX12 drivers, problems like this might occur.

Here is another situation.

https://blogs.oracle.com/swdeveloper/entry/a_brief_explanation_of_race

This is just an example. There are many other steps that can be taken or have to be taken, but it gives something simplified enough to understand.

OK, on older, slower hardware beating newer hardware that is 3 times faster: it's possible. Let's say async compute has totally broken down on the 980 Ti because the program is running 128 threads and tons of compute. The scheduler is so overtasked that the GPU now does everything in serial. Meanwhile the workload might be tuned to fill the 128 threads with all of the compute needs in one shot, and you just got a 50% increase in performance. I'm not saying it will happen this simply or even in this manner, but theoretically it's possible.

Tomb Raider was in development well before the Xbox One DX12 console version was ready, and its PC version is due in Q1 2016. That's far away, possibly far enough for Pascal, but in any case they would have added the async code late in the development cycle, like in the last few months.

As I mentioned in the other thread: will Pascal fix the async issue? It was taped out last year, right? And engineering would have started years before that. So it is possible they haven't accounted for this scenario unless they were looking at AMD's architecture, which I doubt, seeing as it made little sense in the market at the time.
