Async compute gets 30% increase in performance. Maxwell doesn't support async.

razor1 · Sep 2, 2015

TaintedSquirrel said:
They smash a GTX 680 in "The Fixer" video series.

http://www.youtube.com/watch?v=eH6XayaLTw8&t=1m20s

LOL the leaf blower was funny!

Yakk · Sep 2, 2015

razor1 said:
yeah that is a possibility. To many variables to really see what is going on and the only way to cut down the variables is to get more data from different sources.

The performance wall Maxwell is hitting is just too well defined and if that slight decrease in ms is consistent, it would indicate the possibility of a small cache and shuffling being done. This workaround would make Maxwell "Async Compatible", but not very good at it. That is the direction I would pursue testing slowly eliminating these variables by testing. Haven't looked into DX12 code, but this is making me curious.

razor1 said:
They should rephrase that only older GCN architecture can do it if that is their thinking. Fiji seems to have the same issues as Maxwell if that is their thinking.

Actually I'd speculate the test is not coded properly for Fiji. Whatever is happening does not seem to not be stressing the Fiji GPU long enough to keep it from powering down between instructions from what I can see. It could be a driver level issue, but I doubt it at this point unless proven otherwise. Will need to do some research, but Fiji is based on a HUGE 4096-bit Wide Bus with 500 GB/s of total bandwidth. Combining this with Async Compute is as close to a perfect balance for a massively parallel GPU. The amount of data which can be accumulated out of order can be Huge making for much closer use to 100% use of the GPU possible. Could be as simple as not enough data being fed or that data not getting flagged to go out of order for some reason. Or more complex as needing more independent "threads" to load up more data. In any case Fiji appears to be a "level above" what we are used to seeing and looks like it would need another approach to saturate it properly.

razor1 · Sep 2, 2015

err there are 512 threads going that's a lot of threads man if that can't load up Fiji what can lol. What program can we expect to be doing 512 threads in time soon, I can't think of anything for regular consumers.

And this test is latency test, if its a "little" test or a test that doesn't push the GPU, the latency shouldn't be affected by the test because it won't stall the async pipeline. And with it mirroring Maxwell 2 GPU's, that seems like its having the same problems.

Fiji, according to new AMD slides only as 4 ACE's and 2 HWS which function like 4 ACE's, so the architectural changes are making the program to run like Maxwell 2? Where is this going? More confusion.

Yakk · Sep 2, 2015

razor1 said:
err there are 512 threads going that's a lot of threads man if that can't load up Fiji what can lol. What program can we expect to be doing 512 threads in time soon, I can't think of anything for regular consumers.

And this test is latency test, if its a "little" test or a test that doesn't push the GPU, the latency shouldn't be affected by the test because it won't stall the async pipeline.

Fiji, according to new AMD slides only as 4 ACE's and 2 HWS which function like 4 ACE's, so the architectural changes are making the program to run like Maxwell 2? Where is this going? More confusion.

We can see Maxwell is hitting a wall. Why FIJI is not rushing those instructions though I don't know, but it's certainly not hitting a performance wall. Which is why I'm thinking something is different is required for it. And with DX12 being low level, well, you probably need to have a good idea of what you are even looking for. A-LA Assembler programming back in the day...

With Fiji being aimed at VR, and AMD heavily pushing their Liquid VR with Async Shaders something tells me there is something FIJI can do AMD is either not telling us (yet) for some reason; is still in development software wise, or is under NDA, maybe until certain select developers can show it off. I'm just speculating here...

Rizen · Sep 3, 2015

Yakk said:
We can see Maxwell is hitting a wall. Why FIJI is not rushing those instructions though I don't know, but it's certainly not hitting a performance wall. Which is why I'm thinking something is different is required for it. And with DX12 being low level, well, you probably need to have a good idea of what you are even looking for. A-LA Assembler programming back in the day...

With Fiji being aimed at VR, and AMD heavily pushing their Liquid VR with Async Shaders something tells me there is something FIJI can do AMD is either not telling us (yet) for some reason; is still in development software wise, or is under NDA, maybe until certain select developers can show it off. I'm just speculating here...

Definitely not. If the card had some magical selling point they would be hyping it to the moon. Look at how hard companies have hyped features in the past that meant nothing initially - GSYNC for example, NVIDIA announced it well before monitors for it were available. AMD wouldn't be sandbagging, they want sales.

Michaelius · Sep 3, 2015

TaintedSquirrel said:
They've been boasting "Full DX12 Support" including Async for a while.
They will attempt to use lawyer-speak to escape this situation. Probably explains why they remain silent. I can't think of anything they could possibly say that will help them in this situation unless they post some magical benchmarks that prove Oxide and everyone else wrong...

Supporting feature is different thing from getting huge benefits from it.

And people getting mad over this shitstorm are losing perspective on what is most important here - real life results that show it actually matters in games.

Until someone shows benchmarks where Fury X get's 20%+ more frames than 980 ti I'm going to consider it another "wait for windows 8 it will make FX cpus great" situation

Optik · Sep 3, 2015

Michaelius said:
Supporting feature is different thing from getting huge benefits from it.

And people getting mad over this shitstorm are losing perspective on what is most important here - real life results that show it actually matters in games.

Until someone shows benchmarks where Fury X get's 20%+ more frames than 980 ti I'm going to consider it another "wait for windows 8 it will make FX cpus great" situation

you're missing the part where there already are benchmarks that show the r9 290x on par with the 980ti

Araxie · Sep 3, 2015

Optik said:
you're missing the part where there already are benchmarks that show the r9 290x on par with the 980ti

and you're missing the part where those same benchmark put the Fury X even with the 980TI? so we can assume R9 290X = Fury X? something its really wrong.. people believe what want to believe.

KickAssCop · Sep 3, 2015

Every one is missing the part where this isn't even a beta of a completely crap game.

dcds1 · Sep 3, 2015

KickAssCop said:
Every one is missing the part where this isn't even a beta of a completely crap game.

Is is a completely crap game because you don't like RTS games. You can purchase Ashes of Singularity right now and it is a pretty fun game. It would be like me saying any test from a FPS game doesn't count because FPS sucks. It still is a game and there is still people who play that type of game, so it does matter.

StormClaw · Sep 3, 2015

KickAssCop said:
Every one is missing the part where this isn't even a beta of a completely crap game.

It's a kickass looking game. I loved Total Annihilation, and this looks x10 better.

Revdarian · Sep 3, 2015

It is indeed Pre Beta but the game isn't crap at all.

Edit:
Heck the fact that you call it crap, and other statements from you show how green do you bleed... what in the hell do people win when cheerleading for a company so hard? do you think that Nvidia Senpai will look at your cheerleading and finally stop being baka and go dere dere your way? (yeah i intentionally used all those terms because that is how irrational i find those ppl).

Yakk · Sep 3, 2015

Rizen said:
Definitely not. If the card had some magical selling point they would be hyping it to the moon. Look at how hard companies have hyped features in the past that meant nothing initially - GSYNC for example, NVIDIA announced it well before monitors for it were available. AMD wouldn't be sandbagging, they want sales.

And... How long did AMD wait to push Async Shaders? Why couldn't they do it again?

Also, AMD still can't make enough Fiji cards to stock the channel properly. They keep selling out and retailer price gouging is still happening. Per my post earlier, looks like AMD is as surprised as anyone about all this fanfare going on since it was announced back at GDC, but nobody picked up on it until now.

KickAssCop · Sep 3, 2015

Revdarian said:
It is indeed Pre Beta but the game isn't crap at all.

Edit:
Heck the fact that you call it crap, and other statements from you show how green do you bleed... what in the hell do people win when cheerleading for a company so hard? do you think that Nvidia Senpai will look at your cheerleading and finally stop being baka and go dere dere your way? (yeah i intentionally used all those terms because that is how irrational i find those ppl).

It is crap from what I have seen of it. Not interested. I can have an opinion no?

trick0502 · Sep 3, 2015

KickAssCop said:
It is crap from what I have seen of it. Not interested. I can have an opinion no?

yes. rts is crap to me too, but being the first dx12 game benchmark makes it interesting. now we need a dx12 fps engine benchmark. come on nv, where is ur4 gw dx12 benchmarks?

yourgrandma · Sep 3, 2015

I don't get the hate.

I remember when total a annihilation was essentially the first game to take advantage of dual cores. I think it's awesome that a identical game pushing the latest tech again even when the rts genre is on life support.

Yakk · Sep 3, 2015

yourgrandma said:
I don't get the hate.

I remember when total a annihilation was essentially the first game to take advantage of dual cores. I think it's awesome that a identical game pushing the latest tech again even when the rts genre is on life support.

Somebody or other will always hate something or other... No biggie... Seems the internet is based on hate oftentimes. LoL...

Games types, like fashion, change. RTS, MMO, MP, RPG, FPS...etc... Hang around long enough and you see it go full circle with some slight tweaks. RTS games typically put very heavy demands on all system resources so it's no surprise to see them amongst the first to try out new tech.

trudude · Sep 3, 2015

I predict that once more DX12 games start coming out nVidia is going to totally shit-stomp AMD like they always have and it won't even matter that AMD supports async compute and nVidia doesn't.

Quix · Sep 3, 2015

trudude said:
I predict that once more DX12 games start coming out nVidia is going to totally shit-stomp AMD like they always have and it won't even matter that AMD supports async compute and nVidia doesn't.

I predict that in 6-10 months both Nvidia and AMD will drop new cards that eviscerate their current cards because they'll be built with Direct X 10 in mind on a 16nm process.

Venomous · Sep 3, 2015

Quix said:
I predict that in 6-10 months both Nvidia and AMD will drop new cards that eviscerate their current cards because they'll be built with Direct X 10 in mind on a 16nm process.

DX10? Wha?

rinaldo00 · Sep 3, 2015

razor1 said:
simple put

AsyncCompute should be used with caution as it can cause more unpredictable performance and requires more coding effort for synchronization.

Click to expand...

https://docs.unrealengine.com/lates...ing/ShaderDevelopment/AsyncCompute/index.html

Why didn't you quote the entire paragraph where it talks about ease of debugging and porting?

Thanks and Future

This feature was implemented by Lionhead Studios. We integrated it and indend to make use of it as a tool to optimize the XboxOne rendering.

As more more APIs expose the hardware feature we would like make the system more cross platform. Features that make use use AsyncCompute you always be able to run without (console variable / define) to run on other platforms and easier debugging and profiling. AsyncCompute should be used with caution as it can cause more unpredicatble performance and requires more coding effort for synchromization.

razor1 · Sep 3, 2015

easier debugging with the new tools, over the existing tools, that has nothing to do with async shaders

. Why don't you download the engine and see what they are talking about before you start inferring something that has nothing to do with the topic at hand?

tybert7 · Sep 3, 2015

razor1 said:
easier debugging with the new tools, over the existing tools, that has nothing to do with async shaders. Why don't you download the engine and see what they are talking about before you start inferring something that has nothing to do with the topic at hand?

The unpredictable performance is especially true for nvidia hardware.

razor1 · Sep 3, 2015

tybert7 said:
The unpredictable performance is especially true for nvidia hardware.

Its for all hardware types, even Fiji.

Its truly baffling that people that don't have any idea of what is going on, come up with conclusions based off of articles which their point of view is to get as many hits as possible and what others have said, that have something of interest to the topic from a monitory point of view.

TaintedSquirrel · Sep 3, 2015

Developers of Nvidia-sponsored Unreal Engine 4 downplaying async compute... Yes, this is all starting to make sense.

razor1 · Sep 3, 2015

TaintedSquirrel said:
Developers of Nvidia-sponsored Unreal Engine 4 downplaying async compute... Yes, this is all starting to make sense.

Its the nature of async programming, I can show you white papers about the same thing if you like

http://www.doc.ic.ac.uk/~pd1113/lib/papers/psharp_pldi15.pdf

first sentence

Programming efficient asynchronous systems is challenging because it can often be hard to express the design declaratively, or to defend against data races and interleaving-dependent assertion violations. Previous work has only addressed these challenges in isolation, by either designing a new declarative language, a new data race detection tool or a new testing technique.

Experienced programmers are better adept for Dx12 and async shaders, beginner and mid level programmers shouldn't worry too much about them because coding standards have to be more strict, and the learning curve is higher for Dx12 and async shaders, guess what the Unreal engine 4's main target is? Unity users and Indi developers, that warning is for them.

rinaldo00 · Sep 3, 2015

TaintedSquirrel said:
They smash a GTX 680 in "The Fixer" video series.

http://www.youtube.com/watch?v=eH6XayaLTw8&t=1m20s

wimp - "you better settle down"

cool d00d - "I never settle"

referencing AMD's 'never settle' bundles.

kinda clever

StormClaw · Sep 3, 2015

TaintedSquirrel said:
Developers of Nvidia-sponsored Unreal Engine 4 downplaying async compute... Yes, this is all starting to make sense.

anal devastation incoming

http://vocaroo.com/i/s0moE1u9QsCA

TaintedSquirrel · Sep 3, 2015

Who is that?

Would have been more credible if he avoided the ranting. He starts citing DX12 games coming soon (which is true) but there's nothing known about async shading in those games, especially with regards to UE4 which he includes on his list. He's also wrong about VR: https://www.reddit.com/r/oculus/comments/3gwnsm/nvidia_gameworks_vr_sli_test/

I find it hard to believe AMD saw this coming 5 years ago, as he claims, and Nvidia can't even manage to figure it out yet with Pascal. I'm gonna go ahead and toss this link into a backup bookmarks folder that way I can revisit it in 6-12 months and see where he landed.

razor1 · Sep 3, 2015

StormClaw said:
anal devastation incoming

http://vocaroo.com/i/s0moE1u9QsCA

lol yeah he doesn't know what he is talking about, narrow pipeline?

Parallel processing is not the same as async, yes async needs parallel to function,

Wow this guys needs more than a 4 letter vocabulary

they have these engines..... render tasks no compute tasks are not rendering tasks lol.

Yeah the best he can do is read more lol.

Rendering effects is not part of compute!

Fury X can't even use the extra bandwidth it has right now!

he is lying by thinking he knows what he is talking about, ignorance is bliss.

Wow he calls crytek C*nts? This guy probably can't even write hello world and he calls professional developers that make games that he plays that?

There is no such thing as future proof which generations have we ever say that, only 2, the 9700 from ATi and the 8800 gtx and this is because the lowest common denominator is what developers program for.

He is wrong about VR he doesn't know how that works either.

He can't even talk right, emergence how hard is that to say?

Stop stuttering bubba lol maybe he needs some shrimp.

Grapevine, is not what he heard is right, Pascal is a completely different AISC, not to mention to do different precision at the same time, async compute is a must and it has to perform well.

LOL he doesn't understand how node process shrinks work, they don't automagically drop power usage, there is some advantage but as for the past few node drops the advantage of the node drops have been dropping, much more effort in base architecture design has improved power usage and frequency much more than just the node drop.

The rest of the crap, he can shove it personal desires don't play with the discussion.

tybert7 · Sep 3, 2015

razor1 said:
Its for all hardware types, even Fiji.

Its truly baffling that people that don't have any idea of what is going on, come up with conclusions based off of articles which their point of view is to get as many hits as possible and what others have said, that have something of interest to the topic from a monitory point of view.

Fiji seemed a bit more erratic in those test runs for async compute but they still showed clear overlap of graphics + compute. None of the nvidia gpus have so far, something is going on with them where that entire pathway for mixed graphics/compute work with async is just trashed. Maybe they don't need it to get the desired results, but it does not look good for a supposedly supported feature. Supporting async compute... unless you need to perform a lot of it while also doing graphics, is a much different capability than the more robust mixed support amd cards seem to have. It's actually worse than the 970 issue where nvidia was technically correct in stating the cards had 4GB of ram (ignoring the added detail that not all of it had the same access speed or could be accessed at the same time due to the design) because at least in that case, expected performance in most cases was roughly the same unless you did something weird above 3.5GB.

For heavy async + graphics workloads, where that truly is the most beneficial performance pathway, nvidia cards sound like they will be gimped.

dcds1 · Sep 3, 2015

How does Async truely work. Does it take a batch of problems and then compares the results or does each problem it calculates changes the final out come. I understand that AMD has like a 64 lane bus system, like a big 64 lane highway, but how does it take all the information at once and render a scene.

razor1 · Sep 3, 2015

tybert7 said:
Fiji seemed a bit more erratic in those test runs for async compute but they still showed clear overlap of graphics + compute. None of the nvidia gpus have so far, something is going on with them where that entire pathway for mixed graphics/compute work with async is just trashed. Maybe they don't need it to get the desired results, but it does not look good for a supposedly supported feature. Supporting async compute... unless you need to perform a lot of it while also doing graphics, is a much different capability than the more robust mixed support amd cards seem to have. It's actually worse than the 970 issue where nvidia was technically correct in stating the cards had 4GB of ram (ignoring the added detail that not all of it had the same access speed or could be accessed at the same time due to the design) because at least in that case, expected performance in most cases was roughly the same unless you did something weird above 3.5GB.

For heavy async + graphics workloads, where that truly is the most beneficial performance pathway, nvidia cards sound like they will be gimped.

Well taking the 970 out of the discussion, cause that doesn't have anything to do with the conversation.

Its possible there is a front end or driver related problem with Maxwell 2, to how much is it fixable, have no clue, as with Fiji as well but to a lesser degree.

The only thing is, is this feature going to be a necessity for the next 6 months to 9 months (assuming Pascal is coming out in this time frame and games with heavy async usage is going to be a minority or not) ? If the answer is yes, then Maxwell 2 is no good without its async performance being fixed, if the answer is no, then its fine. What Dx12 games are coming out in 6 months to 9 months will answer this.

razor1 · Sep 3, 2015

dcds1 said:
How does Async truely work. Does it take a batch of problems and then compares the results or does each problem it calculates changes the final out come. I understand that AMD has like a 64 lane bus system, like a big 64 lane highway, but how does it take all the information at once and render a scene.

ok when a compute shader is read by the compiler, it is broken up into its individual instructions, and this point it is sent to the scheduler queue, the scheduler looks at what is going on with the graphics and then plug in compute instructions where it has time by predicting. Its not a smart system where it knows what is going to happen its all about predicting what might happen.

And this is why different code behaves different on different architectures and drivers. There is a lot of if's and a lot of prediction going on.

Deleted member 83233 · Sep 3, 2015

I'm curious, (and this is a genuine question as I don't know the intricate details of how this works currently...) Does using multiple GPUs (on either side AMD or NV) help alleviate this problem, or is it doing identical tasks on both GPUs simultaneously to where it would essentially behave exactly the same. I'm not asking which is better here, only whether more GPUs help either side.

griff30 · Sep 3, 2015

Will this be fixed with the next Nvidia GPU or is the near future bleak for them on DX12?

razor1 · Sep 3, 2015

J3RK said:
I'm curious, (and this is a genuine question as I don't know the intricate details of how this works currently...) Does using multiple GPUs (on either side AMD or NV) help alleviate this problem, or is it doing identical tasks on both GPUs simultaneously to where it would essentially behave exactly the same. I'm not asking which is better here, only whether more GPUs help either side.

It can the driver can tell one card to do the compute and one card do the graphics, or combine the two on both cards.

Deleted member 83233 · Sep 3, 2015

razor1 said:
It can the driver can tell one card to do the compute and one card do the graphics, or combine the two on both cards.

Ok, that makes sense. Does it need to be set up, or can the scheduler(s) move these around freely between the two cards? (based on the load on a particular GPU)

razor1 · Sep 3, 2015

well the driver will tell the hardware, it should be fairly easy to do on nV's end I think, again, that's not my expertise so, I might be completely wrong on the difficulty part but its definitely doable.

Chimpee · Sep 3, 2015

I am curious, what kind of games will benefit most with async compute?

Async compute gets 30% increase in performance. Maxwell doesn't support async.

[H]F Junkie

Supreme [H]ardness

[H]F Junkie

Supreme [H]ardness

[H]F Junkie

Supreme [H]ardness

Limp Gawd

Supreme [H]ardness

[H]F Junkie

Gawd

Gawd

2[H]4U

Supreme [H]ardness

[H]F Junkie

Supreme [H]ardness

[H]ard|Gawd

Supreme [H]ardness

[H]ard|Gawd

2[H]4U

Gawd

2[H]4U

[H]F Junkie

2[H]4U

[H]F Junkie

[H]F Junkie

[H]F Junkie

2[H]4U

Gawd

[H]F Junkie

[H]F Junkie

2[H]4U

Gawd

[H]F Junkie

[H]F Junkie

Deleted member 83233

Guest

Supreme [H]ardness

[H]F Junkie

Deleted member 83233

Guest

[H]F Junkie

[H]ard|Gawd