
Maxwell DOES support Asynchronous Shaders

Supporting a feature and actually having it work right seem to be two different things. Maxwell may technically support it, but it runs like garbage on it.
 
I think it's funny that anyone is even talking about this, there aren't even any games that support DirectX 12 at this point. Isn't real world performance the only thing that really matters here?
 

Yes there is. ARK: Survival Evolved is the first DX12 game, and I think it's already out. They tout a 20% performance boost, and it's one of the most popular games Steam has ever seen.
 
I don't know. Metal Gear Solid V: The Phantom Pain, a non-DX12 game, is kicking its ass: over 70,000 players are playing it right now, versus 40,000 for Ark.


 
It's software emulated, apparently.

https://www.reddit.com/r/pcgaming/comments/3jfgs9/maxwell_does_support_async_compute_but_with_a/

The performance gains are incredibly small. And it can potentially cause performance loss if overhead becomes a problem. That would explain the tiny gains in the B3D benchmark and the stability problems the Oxide dev mentioned in AotS. Dunno how Nvidia could possibly respond to this situation. I can't think of anything they could say that would appease people aside from a product recall.

Should really be rephrased to "Nvidia only supports software-level async".
 
I don't believe Nvidia...I think they're in spin mode trying to make the best of a bad situation (much like the 3.5GB VRAM issue)

Nvidia's new modus operandi:
  1. Release broken hardware
  2. Make a software work-around
  3. Refer to the software workaround that doesn't actually fix the problem as a new feature, optimization, or other meaningless buzzword

Nvidia's most recent drivers are emulating the missing function. (Refer to 2) The gains are small and do not fix this missing function (Refer to 3). There is a big difference between software emulation and actual hardware supporting it.

So, in essence Maxwell still doesn't support it.
 
Refer to the software workaround that doesn't actually fix the problem as a new feature, optimization, or other meaningless buzzword

lol...yes they made the missing VRAM issue into a new feature...3.5GB plus an extra 0.5GB of new segmented memory based on heuristic patterns...this amazing new feature is only available for GTX 970 owners!
 
As an Nvidia buyer since the 8800 GTX days, I'm leaning towards them doing async as a checkbox item.

Similar to how tessellation was a checkbox item done in software with severe penalties before later generations had dedicated hardware reserved for it. How much async gets used is the real question.
 
wow OP

how much were you paid to post that?


just curious as you seemed not to have read it.
 

Considering consoles are going to use it (or are already using it) because of the performance boost it gives, the safe bet is that it's going to be common once DX12 gets popular. Unless PC simply gets singled out on purpose.
 
Wake me up when "ASYNC shaders" is something more meaningful and demonstrable than forum masturbation.

 
and my car can run on moonshine


Funny thing is it probably can. Have you seen how combustible pure moonshine is? Tell ya something, it's smooth going down, but it's more volatile than Bacardi 151. It would probably destroy your engine, though.
 

Me too. I'm gonna lay down back, 'cause our concussion had me sleepy...

Hmph... forumsturbation... phooey

:D
 
Nvidia GPUs actually have 0 hardware support for DX12, they just try to fake it through their drivers


Not possible. Something has to be creating the queue. To create the queue, the processor has to analyze the shaders, break them down into individual instructions, and then pass them through the scheduler, which is a very complex task for the CPU to do. If the GPU isn't doing it, the CPU is, and that would be easily noticeable in the latency numbers, which would rise along a much steeper curve. The communication between the GPU and CPU alone would show it; we don't even need to look at the time it would take the CPU to do the other steps.
 

Hey, that's the most important thing: Bragging to a bunch of strangers about how your hardware is supposedly better than their hardware. Not enjoying playing games, I mean who buys gaming hardware for that?

It has always been funny to me the GPU pissing matches that go on. Just buy a GPU that works for you and be happy.

The arguing about future stuff is even funnier. Why would I care about how any of the current cards are going to perform on games in the future? I'll get a new card then. That's half the fun of PC gaming: new toys all the time :D
 

Yep. :cool:
 

you're wrong

http://vocaroo.com/i/s0moE1u9QsCA
 

I see you keep linking obscure forum posts, sites, and such. That doesn't really prove anything. I have a Maxwell card, and couldn't care too much less about the DX12 features it does or doesn't support, because I will buy a new card by the time I'm fully using DX12. So I don't even really care about any of this other than as a passing curiosity. (And just to disclose, I'm all for AMD doing something right, and if their async shader setup works better right now, good for them.) I still haven't seen anything that I'd consider solid proof yet of how NV's hardware does or doesn't do this. We see hints of it, we hear hints of it, but I don't see any proof or evidence that I'd consider compelling yet.
 
I see you keep linking obscure forum posts, sites, and such. That doesn't really prove anything.
As opposed to you posting a factious Nvidia press release on the matter that clears everything up.

Oh wait...

:eek:
 

I'm not posting anything, because quite frankly there isn't much to post at the moment. Also, I didn't say Nvidia would be the one to clear everything up. Also, even if they do post something that's clearly bullshit, then that would tend to clear things up in a different way. But... That hasn't happened, nor has anyone else so far. Based on the idle(ish) curiosity I mentioned above, I would actually like to know what's up. Not that it will change my opinion at all about the hardware I'm running at THIS moment in time, but I still like to know things. I was just saying that what you just posted doesn't really provide that any better. So far there is speculation, pure garbage spewing, educated guessing, and I'm sure other stuff in there too. I don't see what the point is in further pressing the issue until someone comes up with something a bit more concrete.
 
I'm not posting anything, because quite frankly there isn't much to post at the moment.

There isn't much to post because Nvidia knows they're fucked. That's why they've been dead silent, while being rekt on tech sites all over the web.
 
if it is he should be banned for doing that, poor excuse for member of this forum.
 
I mostly just lurk on these forums (have for many, many years - love this place), but I had to post on this one. So, I write some CUDA/OpenCL code, which is (drumroll) compute.

I'm a little confused by this whole debacle. Async memory/kernel transfers and even async concurrent compute aren't anything new. Hell, you could run concurrent async operations back on the Tesla C1060 with CUDA 2.x!

Now, it is quite true that GCN's scheduler is quite a bit more robust, but to claim that Nvidia doesn't support async operations at a hardware level shows a distinct misunderstanding of what a GPU even is. That said, Nvidia could definitely benefit from putting some extra effort behind beefing up the scheduler pipeline to catch up a bit and really let things shine.

I don't really have much of a dog in the race. I'll buy whatever is working best at the time I need a new card, so I just wanted to express my confusion / amusement at all of this. By all means, if I'm fundamentally misunderstanding the issue myself, let me know, I'd appreciate being less confused. =)


I hate to link to something with any bias at all, but this OP did a great job of collecting and analyzing all of the recent news / data / etc., and they even have a great analysis of a simple DX12 synthetic bench someone threw together to show scaling from 1 to 128 concurrent threads. You can see, quite clearly, that Maxwell (and every prior Nvidia card all the way back to Tesla-core) supported async compute. It isn't new.

https://www.reddit.com/r/nvidia/comments/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/


Again, sorry for reddit as a source, but the material is pretty comprehensive.


** Edit: Ok, after a bit more reading I see what the beef is. Folks are upset that they aren't seeing pure 100%, integrated asymmetrical overlap time savings when using DX12's async calls. I could see where that might be distressing at first blush and frankly, the forced synchronize overlap is horrendous on the Maxwell cards. However, given all the kerfuffle pointed at Nvidia, outside of the awesome gains of the AMD 2xx/3xx boards, Fiji seems to be in much the same boat. Nvidia has a much larger gap to make up, but the fact that there are gains at all (that are statistically relevant in many cases) should point to this being a driver/API implementation issue, not hardware. Regardless, this topic once again has my interest and I don't find it embarrassing to read anymore. =D
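
To make the "overlap time savings" idea concrete, here is a rough sketch of how one could score a benchmark result. The function name and the timings are made up for illustration; this is not the B3D benchmark's method or anyone's measured data. It assumes you have three timings: graphics alone, compute alone, and both submitted together.

```python
def overlap_fraction(graphics_ms, compute_ms, combined_ms):
    """Fraction of the shorter workload that was hidden by overlap.

    0.0 means the two workloads ran back to back (serialized);
    1.0 means the shorter one was fully hidden behind the longer one.
    """
    serial_ms = graphics_ms + compute_ms
    hideable_ms = min(graphics_ms, compute_ms)
    if hideable_ms <= 0:
        raise ValueError("both workloads must take a positive amount of time")
    return (serial_ms - combined_ms) / hideable_ms


# Hypothetical timings, for illustration only (not measurements from any card).
print(overlap_fraction(10.0, 4.0, 14.0))  # combined == sum: serialized
print(overlap_fraction(10.0, 4.0, 12.0))  # half of the compute time hidden
print(overlap_fraction(10.0, 4.0, 10.0))  # combined == max: fully overlapped
```

A combined time close to the sum points at serialized execution, and one close to the larger of the two points at real overlap. It says nothing about whether the cause is hardware, driver, or API.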
 
As an aside: Is compute performance relative to the GPU it's running on?
I have to wonder if all of these % performance numbers we see being thrown around (Oxide cited 30%) are normalized to console performance.

For example, taking the HD7850(ish) in the PS4 and increasing its performance by 30% would make it jump from a GTX 660 to about a 660 Ti. In other words, a very huge performance boost on the PS4 is about half a video card tier on PC. The numbers are even smaller for Xbox One.

So I guess the question becomes: How does the performance benefit of async scale from consoles to PC? I assume the size and type of load would be the same on both platforms. Compute operations don't scale with resolution; sort of like CPU performance, I assume. Some of the lighting details (being handled asynchronously) might scale more at higher resolutions.

This might be why Nvidia decided to emulate it on the software side. If the compute operations are simplistic enough, modern CPUs will have no problem handling them on their own. AotS being a PC exclusive and explicitly designed to showcase async performance, would present a clear problem. Past Mantle benchmarks showed the most benefit to low-end CPUs, and virtually no benefit to higher-end i5's and i7's. Knowing how weak the consoles' CPUs are, it could be a similar case as what we saw with Mantle... It exists to alleviate some CPU bottlenecking issues. If that's the case, Nvidia assumed its implementation would be 'good enough' to handle console-based async features with no trouble.

Someone else linked an article about volumetric lighting in the new Tomb Raider game and it got me thinking. Although Oxide did mention other games will be using way more async computing, and their own implementation was "moderate". I have to wonder which games exactly they are referring to. It seems AotS is capable of crippling every card on the market to about 40fps or lower, both AMD and Nvidia.

I'm just having a hard time understanding how async capabilities on the 7770 and 7850 in consoles would be capable of wrecking Nvidia cards three times more powerful (serially).
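
For what it's worth, the arithmetic behind a flat percentage gain can be sketched like this. The 30% is the figure cited above; the baseline frame rates are assumed purely for illustration and are not benchmarks of any console or card.

```python
def frame_time_saved_ms(base_fps, gain):
    """Milliseconds per frame saved when base_fps improves by gain (0.30 = 30%)."""
    if base_fps <= 0 or gain < 0:
        raise ValueError("base_fps must be positive and gain non-negative")
    boosted_fps = base_fps * (1.0 + gain)
    return 1000.0 / base_fps - 1000.0 / boosted_fps


# Hypothetical baselines: the same 30% gain at two different frame rates.
for fps in (30, 60):
    saved = frame_time_saved_ms(fps, 0.30)
    print(f"{fps} fps -> {fps * 1.30:.0f} fps, {saved:.2f} ms saved per frame")
```

The same percentage frees up fewer milliseconds per frame on the faster baseline, which is one way to frame the scaling question.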
 
I like this quote

I think I would like to quote Robert Hallock from AMD here:
“I think gamers are learning an important lesson: there’s no such thing as “full support” for DX12 on the market today. There have been many attempts to distract people from this truth through campaigns that deliberately conflate feature levels, individual untiered features and the definition of “support.””

I don't think it is any surprise either. By the time there are DX12 games, we will be on the next gen of GPUs.

It will be a battle between the two next gens.

Also, as the lowest common denominator, DX11 ain't going away for a long time.

Expecting DX12 support on older-generation hardware is not logical. You need new hardware and software for support. It has always been this way.
 


Compute performance is definitely relative to the GPU it's running on, and this is why code paths have to be different for different GPUs.

That 30% number on consoles is valid and can be translated over to PCs, but some things won't port over well, because on consoles the GPU is integrated with the CPU, which by itself hides latency and system-bandwidth costs that PCs have.

Lighting algorithms and their async needs will scale with resolution.

nV isn't emulating it in software. People have to get that out of their heads. It's not possible; the latency figures would be much higher if they were.

Async doesn't have anything to do with CPUs either.

Let's go with an example where components A and B are being done at the same time and both need C, which is derived from D.

The scheduler tries to predict this but gets it wrong: it thinks D isn't needed until later. (This can happen when the code isn't written well; one IHV's GPU might pick it up while another's doesn't.) What happens? The whole thing breaks down on the GPU that doesn't pick it up. By the time error correction kicks in to see what is going on, you have lost considerable time. It will be fixed by the error correction, but by then the time has already been spent. These are front-end issues that have to be fixed in the compiler. So if nV hasn't spent as much dev time as needed on their DX12 drivers, problems like this might occur.
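
The A/B/C/D ordering above can be sketched as a plain dependency graph. This is only a generic illustration using Python's standard library; it is an assumption-laden toy and not how any GPU scheduler, driver, or shader compiler actually works.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key depends on the tasks in its set: A and B need C, and C needs D.
dependencies = {"A": {"C"}, "B": {"C"}, "C": {"D"}}


def concurrent_batches(deps):
    """Group tasks into batches; tasks within one batch could run at the same time."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = concurrent_batches(dependencies)
print(batches)
```

D has to finish first, then C, and only then can A and B run side by side. A scheduler that assumes D can wait ends up stalling A and B until the ordering is corrected.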

here is another situation.

https://blogs.oracle.com/swdeveloper/entry/a_brief_explanation_of_race

This is just an example. There are many other steps that can or have to be taken, but it's simplified enough to understand.
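
A race condition of the kind that link describes can be shown deterministically by replaying a fixed interleaving of two workers. The schedule format and function name here are hypothetical, made up for this sketch; real races depend on timing and are not reproducible this neatly.

```python
def run_schedule(schedule):
    """Replay (worker, step) pairs against a shared counter and return its final value.

    Each worker does a read followed by a write of (value it read) + 1.
    """
    shared = {"counter": 0}
    local = {}  # the value each worker last read
    for worker, step in schedule:
        if step == "read":
            local[worker] = shared["counter"]
        elif step == "write":
            shared["counter"] = local[worker] + 1
        else:
            raise ValueError(f"unknown step: {step}")
    return shared["counter"]


# Workers run one after the other: both increments land.
serialized = [("t1", "read"), ("t1", "write"), ("t2", "read"), ("t2", "write")]
# Both workers read before either writes: one increment is lost.
interleaved = [("t1", "read"), ("t2", "read"), ("t1", "write"), ("t2", "write")]

print(run_schedule(serialized))
print(run_schedule(interleaved))
```

Same work, different ordering, different result. That is why concurrent queues need explicit synchronization wherever one piece of work consumes another's output.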

Ok, on older, slower hardware beating newer hardware that is three times faster: it's possible. Let's say async compute has totally broken down on a 980 Ti because the program is running 128 threads and tons of compute, and the scheduler is so overtasked that the GPU now does everything in serial. The workload might be tuned to fill the 128 threads with all of the compute needs in one shot, and you just got a 50% increase in performance. I'm not saying it will happen this simply or even in this manner, but theoretically it's possible.

Tomb Raider was in development well before the Xbox One DX12 console version was ready, and its PC version is due in Q1 2016. That's far away, possibly far enough for Pascal. In any case, they would have added async code later in the development cycle, probably within the last few months.
 
It seems AotS is capable of crippling every card on the market to about 40fps or lower, both AMD and Nvidia.


The more "compute" operations are loaded onto the video card, the fewer resources will be available for graphics rendering (obviously). Looking at the graphics in AotS, it's hard to believe the framerates can be so low (no offense to the game devs, just saying). Whether or not pounding the graphics card with non-graphics operations is a good strategy for game development, I'm not sure.

It would make sense for low-end CPUs though (read: consoles).

The question of whether or not nvidia cards are good at async compute is only pertinent if this is a technology that is truly valuable for a broad range of next-gen games. I think that's a question that doesn't have an obvious answer.
 

As I mentioned in the other thread: will Pascal fix the async issue? It was taped out last year, right? And engineering would have started years before that. So it is possible they haven't accounted for this scenario unless they were looking at AMD's architecture, which I doubt, seeing as it made little sense in the market at the time.
 