Async Compute!

Status
Not open for further replies.
Man I don't understand why the focus has been on using async compute for rendering. Rendering ties you to frames. If you're tied to frames then you need synchronization with other related rendering tasks. If your work is synchronized it ain't fucking asynchronous, at least not by the dictionary definition of the word lol.

Interesting uses of async compute should be things that are not tied to rendering, like AI, or in some cases physics.

Like, engine renders a frame, frame is ready X ms before vsync, spend X ms doing async work before moving onto next frame
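
A rough sketch of that scheduling idea, assuming a 60 Hz display. The names `VSYNC_INTERVAL_MS`, `spare_time_ms` and `run_frame` are made up for illustration and do not come from any engine or API; the job costs are invented numbers.

```python
from collections import deque

VSYNC_INTERVAL_MS = 1000.0 / 60.0  # assumption: 60 Hz display


def spare_time_ms(render_ms, interval_ms=VSYNC_INTERVAL_MS):
    """Time left before the next vsync once the frame has been rendered."""
    return max(0.0, interval_ms - render_ms)


def run_frame(render_ms, background_jobs):
    """Run queued non-rendering jobs that fit into the spare time.

    background_jobs is a deque of (name, cost_ms) pairs. Jobs that do not
    fit into what is left of the budget stay queued for a later frame.
    """
    budget = spare_time_ms(render_ms)
    done = []
    while background_jobs and background_jobs[0][1] <= budget:
        name, cost = background_jobs.popleft()
        budget -= cost
        done.append(name)
    return done


# Hypothetical workload: the frame takes 10 ms, leaving about 6.7 ms spare.
jobs = deque([("ai", 3.0), ("physics", 2.0), ("pathfinding", 5.0)])
finished = run_frame(10.0, jobs)
```

Here the AI and physics jobs fit into the leftover time and the pathfinding job waits for the next frame. Nothing in this work has to be synchronized with the frame that was just rendered, which is the point being made.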


Yep, like Flex is going to be interesting ;)
 
Oh the hilarity.

Anyway, on async. Those saying it is a feature of DX12 are correct. The difference between AMD and NVidia is AMD has hardware dedicated to handling it, and NVidia does not yet have this, even in Pascal, so they try to compensate with software emulation and brute force.

2 different approaches to the same technology. Not even sure why its up for debate, other than fanbois in denial.
 
The developers themselves are learning to exploit it in ways that benefits their software/games. Its the architecture of the hardware that makes it work or not. The more complex the hardware the more benefit of async.
The DX12 drivers story seems to point to the fact that Win 10 drivers are not as mature/stable in the green camp. All of this is still new. Everyone is still learning.
Going back to my DOOM post and how it runs in comparison to Guru3d's benchmarks.
Is it drivers alone that make my scores in mid 90's. Both of ID guys were talking about Async Compute and Vulkan during AMD's Computex Rx480 reveal. It all looks promising and is evolving.
I will take those 20 frames. Its free and does not matter if its Devil's or God's work....lol
 
Oh the hilarity.

Anyway, on async. Those saying it is a feature of DX12 are correct. The difference between AMD and NVidia is AMD has hardware dedicated to handling it, and NVidia does not yet have this, even in Pascal, so they try to compensate with software emulation and brute force.

2 different approaches to the same technology. Not even sure why its up for debate, other than fanbois in denial.

This bright young mind is one of the people who has me on ignore. It's his loss, because my posts in the last two pages probably contain more information on the subject than he has ever seen.

The hardware dedicated to handling it he is referring to are the ACEs.

The ACEs enabled asynchronous SHADERS. Which is async compute implementation used on GCN. He doesn't even know what the difference between the implementation and the actual concept is

When you give him evidence for his being wrong, he calls you names, or acts offended, or pretends to be taking a moral high ground by not arguing with you.

This bright young mind has me on ignore because I got VERY angry (I admit it) when he said stuff like "percentages are meaningless because 5% could be any number"

He is also one of the proponents of the "pascal is gcn-like because gcn is the superior architecture" view that was popularized by the Tesla P100 announcement; 64 ALUs per SM? MUST BE A GCN-CLONE

Then when GP104 was announced with 128 ALUs per SM, those people shut up and moved onto some other superficial analysis that fits their obtuse views.

God, I laughed so hard thinking about them when I saw 128 ALU/SM on Gp104. Rofl.

Then when you contest the 'pascal is gcn-like' view by pointing out something HUGE and plainly obvious like

"but CUDA architectures don't even have vector units, GCN has 4 16-wide VUs per CU, CUDA has scalar units..." they don't reply.

there's nothing wrong with being ignorant, everyone is ignorant about something . Most people don't know or care about the difference between vector units and scalar units, and that's fine, so long as they don't talk crap about it
 
Jim, John, Joe, Joel and Jimothy all live in the same apartment.

Jim goes to the bathroom, picks up his toothbrush and applies thermal paste. Instead of John having to wait for him to finish brushing his teeth, John just waits for Jim to put the toothpaste back (5 seconds after picking up his toothbrush), then he starts his brushing teeth sequence. Then Joe, Joel and Jimothy all do the same.


TLDR: Brush your teeth, you dirty fuckers!

I don't always brush my teeth. but when I do, I use MX-5.
 
I don't always brush my teeth. but when I do, I use MX-5.

Yeah MX-5 is great for teeth. I always use the pea method then spread it with a spatula, makes for a great contact surface when you need to mount your teeth heatsinks... What do they call them again? Braces ?
 
The developers themselves are learning to exploit it in ways that benefits their software/games. Its the architecture of the hardware that makes it work or not. The more complex the hardware the more benefit of async.
The DX12 drivers story seems to point to the fact that Win 10 drivers are not as mature/stable in the green camp. All of this is still new. Everyone is still learning.
Going back to my DOOM post and how it runs in comparison to Guru3d's benchmarks.
Is it drivers alone that make my scores in mid 90's. Both of ID guys were talking about Async Compute and Vulkan during AMD's Computex Rx480 reveal. It all looks promising and is evolving.
I will take that 20 frames. Its free and does not matter if its Devi's or Gods work....lol

It's not a driver issue for team green - its an architectural one. AMD for the longest time had hardware async support baked into their cards, but no one used it because in DX11, the way to quickly make games was already established, and developers didn't want to learn something new.

However, partially thanks to AMDs partnership with MS in developing DX12, since developers are learning new things anyway, async is now a part of that.

Therefore, AMD had a solution waiting for a problem, while NVidia essentially got pantsed on this (not that it matters much with the pure brute force of the new cards). Depending on where it is in the development cycle, NVidia's next gen card, Volta, could possibly be the first NVidia card with real async support.
 
It's not a driver issue for team green - its an architectural one. AMD for the longest time had hardware async support baked into their cards, but no one used it because in DX11, the way to quickly make games was already established, and developers didn't want to learn something new.

However, partially thanks to AMDs partnership with MS in developing DX12, since developers are learning new things anyway, async is now a part of that.

Therefore, AMD had a solution waiting for a problem, while NVidia essentially got pantsed on this (not that it matters much with the pure brute force of the new cards). Depending on where it is in the development cycle, NVidia's next gen card, Volta, could possibly be the first NVidia card with real async support.

Imhotep ask him to explain further, please :D There's nothing I would enjoy more than a GPU architecture 101 course from Zion Halcyon :D

Like see he talks about pure brute force, yet the Fury X (8600gflops) is competing with a 980Ti (6700gflops stock), so who is using brute force exactly ?

When you match the 8600 on the FX, the 980Ti pulls ahead. What does this mean? It means the 980Ti makes more efficient use of its shader array, without using async compute and without the additional die complexity (and therefore cost and power consumption) that the async shader thing requires.
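
For anyone who wants to check the gflops figures, here is a small sketch of the usual theoretical-peak arithmetic (2 FP32 operations per ALU per clock, counting a fused multiply-add as two). The shader counts of 4096 for the Fury X and 2816 for the 980 Ti are the published specs and are assumptions not stated in this thread; the function names are made up for illustration.

```python
FURY_X_ALUS = 4096     # assumption: published spec, not from this thread
GTX_980TI_ALUS = 2816  # assumption: published spec, not from this thread


def peak_gflops(alus, clock_mhz):
    """Theoretical FP32 peak: 2 ops (one FMA) per ALU per clock."""
    return 2 * alus * clock_mhz / 1000.0


def clock_for_gflops(alus, gflops):
    """Clock in MHz needed to reach a given theoretical peak."""
    return gflops * 1000.0 / (2 * alus)


# Fury X at its 1050 MHz reference clock.
fury_x_peak = peak_gflops(FURY_X_ALUS, 1050)

# Clock implied by the 6700 gflops quoted for the 980 Ti.
implied_980ti_clock = clock_for_gflops(GTX_980TI_ALUS, 6700)

# Clock the 980 Ti would need to match 8600 gflops on paper.
matching_980ti_clock = clock_for_gflops(GTX_980TI_ALUS, 8600)
```

Under those assumptions the Fury X figure works out to about 8600 gflops, the 6700 figure corresponds to a 980 Ti running at roughly 1190 MHz, and matching 8600 on paper takes roughly 1530 MHz. These are theoretical peaks only and say nothing about real utilization.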
 
It's not a driver issue for team green - its an architectural one. AMD for the longest time had hardware async support baked into their cards, but no one used it because in DX11, the way to quickly make games was already established, and developers didn't want to learn something new.

However, partially thanks to AMDs partnership with MS in developing DX12, since developers are learning new things anyway, async is now a part of that.

Therefore, AMD had a solution waiting for a problem, while NVidia essentially got pantsed on this (not that it matters much with the pure brute force of the new cards). Depending on where it is in the development cycle, NVidia's next gen card, Volta, could possibly be the first NVidia card with real async support.


It is an architectural thing, but not because there is a lack of ability to do so. nV's Maxwell cards did static scheduling of their SMXs, so if they don't predict the workload right with async compute, things become a major problem: instead of increasing utilization you do the inverse and increase underutilization. And this is a code thing. You can write code that breaks the prediction so it doesn't work well on Maxwell, and vice versa, code that works well on Maxwell will not work the best on GCN.

We are talking about the per-SMX/CU level here, not the ALU level. I think that is where the confusion was coming from in the past, as no one really knew this on either GCN or Maxwell because it wasn't explained that well. AMD kept harping on their ACEs being the cause of this. They are not: the ACEs have nothing to do with how different ALUs in each CU can be running different shader types at the same time. ACEs feed the CUs their instructions, but that is about all they do.
 
Why do you talk crap about async. He is not wrong. Everyone has the same opinion about it. Its plain and simple just as he described it.
 
It is an architectural thing, but not because there is a lack of ability to do so. nV's Maxwell cards did static scheduling of their SMXs, so if they don't predict the workload right with async compute, things become a major problem: instead of increasing utilization you do the inverse and increase underutilization. And this is a code thing. You can write code that breaks the prediction so it doesn't work well on Maxwell, and vice versa, code that works well on Maxwell will not work the best on GCN.

fable legends did it, statically. Must have been hell to code lol.

AMD: graphics + compute executed concurrently WITHIN ONE CU, to boost utilization.

NVIDIA: graphics + compute executed concurrently WITHIN ONE GPC, to boost utilization. Static partitioning of SMs at drawcall boundaries; with Pascal this limitation is lifted and partitioning is dynamic.
 
Why do you talk crap about async. He is not wrong. Everyone has the same opinion about it. Its plain and simple just as he described it.
We're not "talking crap about async". If anything, he is.

He conflates async shaders with async compute, that should be enough to show you how little he understands it.

If you ask him to explain further I guarantee you he's gonna start talking about context switching. That's because he's simply incapable of understanding that there's NO NEED for it to be on a single SM or CU level.

Context switching is needed only to execute compute on a SM that was previously doing graphics.

There's no need to do this at all, just assign graphics and compute to different SMs, and repartition the SMs as necessary
 
Guys, context switching is not part of async compute at ALL! Context switching should only be used in time-sensitive or latency-sensitive situations where the program tells the driver: we need this done first before proceeding to anything else. What we saw in Maxwell was that once the static partitioning broke down, it started using context switching because the driver didn't know what to do anymore as the GPU was stalling. This is a big NO NO; something in the program has caused the driver to break down.
 
We're not "talking crap about async". If anything, he is.

He conflates async shaders with async compute, that should be enough to show you how little he understands it.

If you ask him to explain further I guarantee you he's gonna start talking about context switching. That's because he's simply incapable of understanding that there's NO NEED for it to be on a single SM or CU level.

Context switching is needed only to execute compute on a SM that was previously doing graphics.

There's no need to do this at all, just assign graphics and compute to different SMs, and repartition the SMs as necessary
Maybe he just does not care how it works. We know it works.
 
Guys, context switching is not part of async compute at ALL! Context switching should only be used in time-sensitive or latency-sensitive situations where the program tells the driver: we need this done first before proceeding to anything else. What we saw in Maxwell was that once the static partitioning broke down, it started using context switching because the driver didn't know what to do anymore as the GPU was stalling. This is a big NO NO; something in the program has caused the driver to break down.

The whole point of async compute is not stalling other operations that are in flight, context switching kind of defies the point!

Not replying to you razor haha, just adding stuff for the general thread
 
exactly!

Let's try to get rid of the confusion here guys: async shaders are the same thing as Pascal's ability to do dynamic load balancing. It's really simple. Yeah, AMD still has a small advantage here, they have finer granularity than Pascal.
 
exactly!

Let's try to get rid of the confusion here guys: async shaders are the same thing as Pascal's ability to do dynamic load balancing. It's really simple. Yeah, AMD still has a small advantage here, they have finer granularity than Pascal.

Or rather, the end result is the same, but they're different approaches to the same problem
 
Why do you talk crap about async. He is not wrong. Everyone has the same opinion about it. Its plain and simple just as he described it.

Exactly. Flat out, NVidia has no hardware async solution, and because they have to emulate it with software, it comes off terrible and negates any advantage.

Given the raw horsepower of the new NVidia cards, I don't think its a big deal that they don't have it, so until it becomes a deciding factor, I don't understand people getting their panties in a twist over it, or trying to rationalize a way to say Nvidia has a good Async solution when they don't.
 
Yes increasing clock speed is a different approach. Not a solution .:)


See now I start getting annoyed and here is why

A few posts ago I explained why overclocking is relevant to the discussion, and you appeared to understand the validity of the argument. Maybe that was ngfidel actually

Clocking isn't the solution, dynamic load balancing is.

If you don't understand that, then you really have no place on a serious discussion about async; in much the same way as an airline passenger has nothing to add to a discussion about avionics
 
Yes increasing clock speed is a different approach. Not a solution .:)
Exactly. Flat out, NVidia has no hardware async solution, and because they have to emulate it with software, it comes off terrible and negates any advantage.

Given the raw horsepower of the new NVidia cards, I don't think its a big deal that they don't have it, so until it becomes a deciding factor, I don't understand people getting their panties in a twist over it, or trying to rationalize a way to say Nvidia has a good Async solution when they don't.



what are you guys talking about? This is the problem you guys aren't understanding anything about the architecture yet you want to talk about as if you do.

Simple you don't have async compute, you don't get DX 12 certification.

You don't need async shaders or dynamic load balancing to have async compute.

How hard is that to understand?

Put it in math terms having A as a requirement doesn't mean B is also a requirement. But having B will encompass A.
 
what are you guys talking about? This is the problem you guys aren't understanding anything about the architecture yet you want to talk about as if you do.

Simple you don't have async compute, you don't get DX 12 certification.

You don't need async shaders or dynamic load balancing to have async compute.

How hard is that to understand?

Put it in math terms: having A as a requirement doesn't mean B is also a requirement. But having B will encompass A.

I dunno about Imhotep, but maths is unlikely to help Zion understand it
 
Well this is a simple logic concept.

To get to B you need A, but only A is a requirement to begin with; you don't need B.
 
what are you guys talking about? This is the problem you guys aren't understanding anything about the architecture yet you want to talk about as if you do.

Simple you don't have async compute, you don't get DX 12 certification.

You don't need async shaders or dynamic load balancing to have async compute.

How hard is that to understand?

Put it in math terms having A as a requirement doesn't mean B is also a requirement. But having B will encompass A.


You tell us. Everything we've heard up until now on async compute is in line with what I said.

Then there is you who gives us a story completely different than what we've heard and read from different sources, leaks, and reports, and we're all just supposed to believe you because you said it?

In light of that, you tell me why it's NOT hard to take what you are saying seriously when it contradicts what's already published out there...
 
Simple you don't have async compute, you don't get DX 12 certification.

I think what the sticking point is hardware vs software to deal with it. Their argument is "if it ain't hardware, it ain't REAL async compute!" The result doesn't matter, how you get there is what counts.
 
You tell us. Everything we've heard up until now on async compute is in line with what I said.

Then there is you who gives us a story completely different than what we've heard and read from different sources, leaks, and reports, and we're all just supposed to believe you because you said it?

In light of that, you tell me why it's NOT hard to take what you are saying seriously when it contradicts what's already published out there...

Yeah because the stories you are listening to are marketing people? Wow they didn't even know shit about the underlying architecture of their own GPU's let alone their competitors......

That was obvious when you had programmers and GDC tell it how it is.

The entire confusion started with Hallock (AMD marketing person) stating he thought nV's architecture was using context switching and that the ACEs of GCN helped it do async compute properly. Yet the ACEs have nothing to do with how CUs execute kernels at an ALU level.

Months later at GDC, AMD and nV programmers stated that different architectures need different paths for async to work properly on each vendor's hardware. It had nothing to do with ACEs. It had to do with scheduling of the tasks and understanding the timing of compute items vs. the timing of graphics items.
 
I think what the sticking point is hardware vs software to deal with it. Their argument is "if it ain't hardware, it ain't REAL async compute!" The result doesn't matter, how you get there is what counts.

The argument is that a software solution will almost always be far less efficient than a hardware one. AMD cards do Async better than NVidia. Period. And it will remain that way until NVidia puts hardware capable of doing Async in their cards. That isn't a debate point - that is FACT.

What is also fact is that given how good the new lineup of NVidia cards is, whether Nvidia can properly do Async is right now completely and utterly irrelevant. NVidia has the faster card by a longshot, even without async. Hence why I don't see the point in Nvidia fans getting all pissy and up in arms unless its over some silly "bragging rights".
 
Yeah because the stories you are listening to are marketing people? Wow they didn't even know shit about the underlying architecture of their own GPU's let alone their competitors......

That was obvious when you had programmers and GDC tell it how it is.

He literally presents no evidence at all. I haven't seen so much as a link from him in the past four or five months
 
Yeah because the stories you are listening to are marketing people? Wow they didn't even know shit about the underlying architecture of their own GPU's let alone their competitors......

That was obvious when you had programmers and GDC tell it how it is.

Stories tend to include more than just marketers. Like programmer interviews, dev interviews, from multiple sources and multiple sites, since async became a thing last fall.

Seriously, if you had to fall back on that sad sack argument, then you deep down already know we're right.
 
WOW...

I knew my posting these tweets would be a catalyst for bickering, and I debated if H was the right place for this, but the amount of disinformation being posted here is staggering! (This includes reading posts I would normally ignore)

First of all, I'm most disappointed with Brent's posts.

Second, Thief under Mantle was the first PC game using Async. Thief under Mantle, and in Crossfire/mGPU to boot, is amazingly fluid and responsive. Gameplay can be debated as good or bad, but the technical achievement of using Async on top of the old UE is unmatched still today by any other game trying to bolt on DX12. This gives me great hope the Square Enix coding team knows what they are doing very well, and Deus Ex should be amazing from a technical standpoint.

Third, DX12 enables the use of multiple CPU cores much better than DX11. Putting requests through the Async queue from multiple threads, and from multiple CPU cores, RELIEVES the game developers from a debugging nightmare of thread request sequences and combinations from the CPU to the GPU, a HUGE obstacle to using multiple CPU cores. Yes there is more work upfront for the engine coders, but once the engine work is done, which per the tweets a lot of devs are getting there, there will be time saved on a per game basis, not increased! If development time was increased any developer worth his salt would be going... WTF?! Not praising it! Consoles have more development tools than PCs, that's no secret, so yes you may see developments there first. But the code is making its way to PCs too.
 
Stories tend to include more than just marketers. Like programmer interviews, dev interviews, from multiple sources and multiple sites, since async became a thing last fall.

Seriously, if you had to fall back on that sad sack argument, then you deep down already know we're right.


Only one guy from Oxide, and we know he is full of it. Too many flip-flops. And then, to talk about something where he didn't fully understand what was going on, he goes to a hardware forum where people knew even less? Why didn't he go to B3D? Because he would have gotten shut down.

Why didn't he go to nV and ask them directly? Well, I guess their line of communication wasn't there?

It was obvious the fallout gave them huge exposure.
 
The argument is that a software solution will almost always be far less efficient than a hardware one. AMD cards do Async better than NVidia. Period. And it will remain that way until NVidia puts hardware capable of doing Async in their cards. That isn't a debate point - that is FACT.

What is also fact is that given how good the new lineup of NVidia cards is, whether Nvidia can properly do Async is right now completely and utterly irrelevant. NVidia has the faster card by a longshot, even without async. Hence why I don't see the point in Nvidia fans getting all pissy and up in arms unless its over some silly "bragging rights".

My argument is that someone who is clueless about hardware will always be less reliable than someone with half a clue. The one-eyed man in the kingdom of the blind is king.

Doing a half-assed job of anything makes you a one-eyed man in the kingdom of the blind.

You are not even doing a half assed job
 
I dunno about Imhotep, but maths is unlikely to help Zion understand it
Dynamic load balancing is not async compute. Software emulation is apparently good enough for DX12 certification. High clocks make up for the resources needed to emulate. Now, how do we not understand it?
 
WOW...

I knew my posting these tweets would be a catalyst for bickering, and I debated if H was the right place for this, but the amount of disinformation being posted here is staggering! (This includes reading posts I would normally ignore)

First of all, I'm most disappointed with Brent's posts.

Second, Thief under Mantle was the first PC game using Async. Thief under Mantle, and in Crossfire/mGPU to boot, is amazingly fluid and responsive. Gameplay can be debated as good or bad, but the technical achievement of using Async on top of the old UE is unmatched still today by any other game trying to bolt on DX12. This gives me great hope the Square Enix coding team knows what they are doing very well, and Deus Ex should be amazing from a technical standpoint.

Third, DX12 enables the use of multiple CPU cores much better than DX11. Putting requests through the Async queue from multiple threads, and from multiple CPU cores, RELIEVES the game developers from a debugging nightmare of thread request sequences and combinations from the CPU to the GPU, a HUGE obstacle to using multiple CPU cores. Yes there is more work upfront for the engine coders, but once the engine work is done, which per the tweets a lot of devs are getting there, there will be time saved on a per game basis, not increased! If development time was increased any developer worth his salt would be going... WTF?! Not praising it! Consoles have more development tools than PCs, that's no secret, so yes you may see developments there first. But the code is making its way to PCs too.


its not just the engine programmers, when a game is made and they want to create a custom shader, that is going to change things in the pathways as well.
 
Dynamic load balancing is not async compute. Software emulation is apparently good enough for DX12 certification. High clocks make up for the resources needed to emulate. Now, how do we not understand it?


dynamic load balancing is not a software solution. static load balancing is.

Think of it this way: Pascal can now pick and choose on the fly which ALUs in specific SMXs do graphics or compute kernel execution. This wasn't possible on Maxwell 2: once partitioned, that partition must stay the same until the SMX has done all the work assigned to it, and only then can it be repartitioned.

Remember the talks about fences and draw call boundaries? Guess where the repartitioning takes place on Maxwell for the code to be effective?
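
A toy model of why the prediction matters under static partitioning. This is a deliberately idealized sketch, not a description of how either vendor's hardware actually schedules work: work is measured in arbitrary units, every SM processes one unit per time step, and the function names and numbers are invented for illustration.

```python
def static_partition_time(graphics_work, compute_work, sms, graphics_sms):
    """Time to finish both workloads when the split is fixed up front.

    The partition cannot change until all assigned work is done, so the
    slower side determines the total time and the other side sits idle.
    """
    compute_sms = sms - graphics_sms
    return max(graphics_work / graphics_sms, compute_work / compute_sms)


def dynamic_partition_time(graphics_work, compute_work, sms):
    """Idealized dynamic balancing: idle SMs pick up whatever work is left."""
    return (graphics_work + compute_work) / sms


def utilization(graphics_work, compute_work, sms, elapsed):
    """Fraction of available SM time that was spent doing work."""
    return (graphics_work + compute_work) / (sms * elapsed)


# Hypothetical numbers: 20 SMs, 120 units of graphics, 40 units of compute.
good_guess = static_partition_time(120, 40, sms=20, graphics_sms=15)
bad_guess = static_partition_time(120, 40, sms=20, graphics_sms=10)
dynamic = dynamic_partition_time(120, 40, sms=20)
```

With a 15/5 split the static case finishes in 8 time units, same as the idealized dynamic case. With a 10/10 split the compute side finishes early and idles, the total goes to 12 time units, and utilization drops to about 67%. That is the "increasing under utilization" case when the prediction is wrong.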
 
This is why discussion with some users never goes anywhere, every one step forward we take two steps back to explain tangential concepts, lol it's the Texas two step. Kyle should be happy
 
God, its so much fun reading this forum. Especially Leidra ,and razort. LOVE u guys, u know quite a lot, and you 2 are the biggest nvida fanboys i ever seen, well Leidra is, razort, have lots of good posts.

Why do i comment in this tread, ITs hilarious how this to fanbois know more abaut actually programmers, i meen peoples that program games.

Man you are good.

And no, im old, not new to Hardocp at all.

And i also love Brents comment, he is good, there are no games that will use it.

D : I can't tell if this is criticism or a compliment lol. It's i e l d r a
 
God, its so much fun reading this forum. Especially Leldra ,and razort. LOVE u guys, u know quite a lot, and you 2 are the biggest nvida fanboys i ever seen, well Leldra is, razort, have lots of good posts.

Why do i comment in this tread, ITs hilarious how this to fanbois know more abaut this, than actually programmers, i meen peoples that program games.

Man you are good.

And no, im old, not new to Hardocp at all.

And i also love Brents comment, he is good, there are no games that will use it.


That's just it - really what it comes down to is a type of fan who cannot handle that his company of choice doesn't do EVERYTHING perfect. Its the definition of fandom - turn a blind eye to the bad and love the good. Yet fanbois take it to a new level, as they leverage what technical understanding they have to prove their card absolutely cannot lose ever to the competition.

Which is downright silly.
 