3DMark Time Spy DX12 test

It's not at the GPC level; nowhere in the whitepaper does it mention it being at the GPC level.

Typical Mahigan, calling it a clever hack.
Then do you have a better explanation? So far he is the only one I have seen, and his history (his work history, not his posts) makes his assumptions rather plausible, at least more so than some others'.
 
Then do you have a better explanation? So far he is the only one I have seen, and his history (his work history, not his posts) makes his assumptions rather plausible, at least more so than some others'.

He gets it out of nowhere, man. He claims it's all in the whitepaper and even quotes it, but it says nothing about partitioning at the GPC level. If you want proof that he's really just blowing steam, consider this.

He says Pascal divides its GPCs between graphics and compute; each of those is a block of 5 SMs.

He says that's a hack. He says that's neither parallel nor concurrent execution.

It's obviously parallel if different GPCs are assigned to each.

In his haste to point the finger, he forgot to make sense.

For the record, in the past he claimed GCN's ability to perform a context switch quickly (this isn't preemption! A context switch is the time taken to save your current context and retrieve the new one you want to work on) allows a CU to do graphics and compute in parallel.

How's it parallel if there's a context switch... Duuude.

Then at some point he decided Maxwell's problem was register overflow and an inability to keep many threads in flight. He crowed about how right he was when GP100 was announced with a doubled register file.

GP104 doesn't have it, still runs fine. Lol.
 
Then do you have a better explanation? So far he is the only one I have seen, and his history (his work history, not his posts) makes his assumptions rather plausible, at least more so than some others'.
Yes.
My explanation is the following.

First of all

1. SM partitioning of the L1 cache is different in graphics and compute contexts on Nvidia hardware.
2. Maxwell achieved async compute by statically partitioning SMs between compute and graphics, and that partition could only change at drawcall boundaries.
3. Pascal can now repartition dynamically (see the toy model below).
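To make the difference between points 2 and 3 concrete, here's a toy calculation. The SM counts and workload sizes are completely made up; it's only meant to show why a static split wastes cycles whenever one side finishes early while an ideal dynamic rebalance doesn't:

```cpp
// Toy model only: invented SM counts and workloads, not real hardware behaviour.
#include <algorithm>
#include <cstdio>

int main() {
    const double total_sms     = 20.0;   // hypothetical SM count
    const double compute_sms   = 5.0;    // SMs statically reserved for compute
    const double graphics_work = 1000.0; // graphics work this drawcall, in SM-cycles
    const double compute_work  = 200.0;  // async compute work, in SM-cycles

    // Maxwell-style static split: the reservation holds until the next drawcall
    // boundary, so whichever side finishes first leaves its SMs idle.
    const double static_time = std::max(graphics_work / (total_sms - compute_sms),
                                        compute_work  / compute_sms);

    // Pascal-style dynamic balancing (idealised): SMs that run out of one kind
    // of work immediately pick up the other kind, so nothing sits idle.
    const double dynamic_time = (graphics_work + compute_work) / total_sms;

    std::printf("static split: %.1f cycles, dynamic: %.1f cycles\n",
                static_time, dynamic_time); // ~66.7 vs 60.0
    return 0;
}
```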

Worth noting that fast preemption still won't allow the 'async shaders' concept to work on Nvidia hardware, leaving aside that it's not needed and of little benefit there.

Fast preemption is for real-time workloads, as explained by Jonah Alben, but the context is then saved to VRAM, which takes several thousand cycles. GCN does it in a single cycle through a dedicated cache for storing the context.
 
He gets it out of nowhere, man. He claims it's all in the whitepaper and even quotes it, but it says nothing about partitioning at the GPC level. If you want proof that he's really just blowing steam, consider this.

He says Pascal divides its GPCs between graphics and compute; each of those is a block of 5 SMs.

He says that's a hack. He says that's neither parallel nor concurrent execution.

It's obviously parallel if different GPCs are assigned to each.

In his haste to point the finger, he forgot to make sense.

For the record, in the past he claimed GCN's ability to perform a context switch quickly (this isn't preemption! A context switch is the time taken to save your current context and retrieve the new one you want to work on) allows a CU to do graphics and compute in parallel.

How's it parallel if there's a context switch... Duuude.

Then at some point he decided Maxwell's problem was register overflow and an inability to keep many threads in flight. He crowed about how right he was when GP100 was announced with a doubled register file.

GP104 doesn't have it, still runs fine. Lol.
First, he isn't pretending to know it all, and like anyone he can be wrong. However, reading that whitepaper, he is correct. Nvidia is quite sly: they don't directly comment on concurrent graphics and compute, but they do mention the use of pre-emption for context switching (not verbatim, but easily inferred). Based on the whitepaper it is fair to conclude he is correct that compute tasks cannot be issued at the same time as graphics tasks but can run concurrently when resources are available. That's probably why we see increases with async on Pascal, but nowhere near as huge as on AMD.
 
First, he isn't pretending to know it all, and like anyone he can be wrong. However, reading that whitepaper, he is correct. Nvidia is quite sly: they don't directly comment on concurrent graphics and compute, but they do mention the use of pre-emption for context switching (not verbatim, but easily inferred). Based on the whitepaper it is fair to conclude he is correct that compute tasks cannot be issued at the same time as graphics tasks but can run concurrently when resources are available. That's probably why we see increases with async on Pascal, but nowhere near as huge as on AMD.
No they don't! They talk about dynamic load balancing for async!

Preemption discussion is for asynchronous time warp in VR.

The context switch is still as slow as on Maxwell; the only difference is that you can now preempt at the pixel level.

The graphics + compute parallelism stems from work being done on different SMs entirely. On GCN it's concurrent within CUs as well as parallel across them.
 
That's probably why we see increases with async on Pascal, but nowhere near as huge as on AMD.

We really have to stop with this "massive bump" in AMD performance nonsense. The bump would not be nearly that big if the AMD cards were performing well to begin with. The reason we see the increase, and such a large one, is due to the disproportionately bad performance they were getting before.
 
We really have to stop with this "massive bump" in AMD performance nonsense. The bump would not be nearly that big if the AMD cards were performing well to begin with. The reason we see the increase, and such a large one, is due to the disproportionately bad performance they were getting before.
You have to stop this "chicken or the egg" argument. You keep doing it. Look at it like this: you have the latest Intel CPU and you are forced to use SSE2 although you have access to AVX2. Now, is it Intel's fault that it can't run SSE2 as fast as AVX2? I know you are thinking competition, but that has no bearing on the point. The fact is that, like in the example, AMD's GCN needs DX12 to show its full potential; that just isn't going to happen with DX11, no matter how many times you complain about it.
 
No they don't! They talk about dynamic load balancing for async!

Preemption discussion is for asynchronous time warp in VR.

The context switch is still as slow as on Maxwell; the only difference is that you can now preempt at the pixel level.

The graphics + compute parallelism stems from work being done on different SMs entirely. On GCN it's concurrent within CUs as well as parallel across them.
The part in yellow was the part I was talking about, not the VR section. Besides, read between the lines. I always look at what they don't say, and it is glaring how absent any discussion of CONCURRENT execution is, even though they start with the loaded definition of the async term.
 
You have to stop this "chicken or the egg" argument. You keep doing it. Look at it like this: you have the latest Intel CPU and you are forced to use SSE2 although you have access to AVX2. Now, is it Intel's fault that it can't run SSE2 as fast as AVX2? I know you are thinking competition, but that has no bearing on the point. The fact is that, like in the example, AMD's GCN needs DX12 to show its full potential; that just isn't going to happen with DX11, no matter how many times you complain about it.
This is not chicken and egg. The argument is that several years ago AMD locked themselves into an architecture that gave their customers no advantage, and they are only now seeing it finally pay off; the question now is whether it will even matter. They lost a lot of market share betting on future tech that took forever to see the light of day. Nvidia is not going to just roll over for AMD now that DX12 and Vulkan are showing up. Also, GCN still sucks at tessellation.
 
The part in yellow was the part I was talking about, not the VR section. Besides, read between the lines. I always look at what they don't say, and it is glaring how absent any discussion of CONCURRENT execution is, even though they start with the loaded definition of the async term.

Parallelism is a subset of concurrency.

The part in yellow is about VR. You can preempt at the pixel level and therefore do your async timewarp as late as possible. Understand?

Preemption has nothing to do with async compute; the whole point of async compute is making full use of your shaders when there is idle time.

Strictly concurrent graphics and compute on one SM (the equivalent of async shaders) is never going to happen on Nvidia, and I don't see why you expect it should. It would be solving a problem that does not exist.
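For what it's worth, on the API side "async compute" in DX12 is nothing more than submitting work on a second, compute-type queue; whether anything actually overlaps is the hardware's and driver's business. Here's a minimal sketch of that idea. The device and the two recorded command lists are assumed to already exist, and the function and variable names are mine, not anything taken from 3DMark or a real engine:

```cpp
// Sketch: two D3D12 queues, one DIRECT (graphics) and one COMPUTE.
// In real code the queues would be created once at startup; they are created
// here only to keep the sketch self-contained.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void SubmitWithAsyncCompute(ID3D12Device* device,
                            ID3D12CommandList* graphicsList,
                            ID3D12CommandList* computeList)
{
    ComPtr<ID3D12CommandQueue> gfxQueue, computeQueue;

    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics-capable queue
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute-only queue
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    // The API only promises that these two submissions are independent.
    // Whether the GPU runs them at the same time (separate SMs, dynamic
    // balancing, or purely serially) is entirely up to hardware and driver.
    gfxQueue->ExecuteCommandLists(1, &graphicsList);
    computeQueue->ExecuteCommandLists(1, &computeList);

    // Cross-queue ordering, where it is needed, is expressed with fences,
    // not with preemption: here the graphics queue waits on the GPU timeline
    // until the compute work has signalled completion.
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    computeQueue->Signal(fence.Get(), 1);
    gfxQueue->Wait(fence.Get(), 1);
}
```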
 
He's very obviously wrong, as he has been in the past in his haste to point out some fatal flaw in Nvidia's engineering :p

Dedicating an entire GPC to compute would be a huge waste of geometry resources; it makes no sense at all. Preemption has nothing to do with async compute, and I don't know why people keep mentioning it. Preemption is almost antithetical to async compute, because you don't want to preempt a task unless there's a stall.


But can't the preemption go both ways? If there was a graphics workload being crunched through but some compute task needed to be completed first, I thought the point of Pascal's better preemption was to be able to stop the current work, switch to the compute task, then go back to the graphics work (not at the same time, but being better able to switch between the different types of workloads). You say in other posts that preemption has nothing to do with anything but VR, but I'm not sure that's true.


Anyway, here is a later post from Mahigan.

[Various] Futuremark Releases 3DMark Time Spy DirectX 12 Benchmark - Page 14


He thinks Nvidia is getting around its inability to work on graphics + compute tasks at the same time with faster switching, which works as long as the GPCs do not become overloaded. I have no idea if that's how it works, but it would explain why the 1080 is able to stay ahead: being the non-cut-down chip, I'd assume it has more of those resources on the chip that can be tapped before it becomes saturated... or not, I'm not sure that's how it works. But there would be a way to test it: someone could create a workload simulation that ramps up the use of those resources and see how Pascal handles different levels of complexity versus GCN. Perhaps the breaking point is so high it becomes a moot point, or perhaps this has nothing to do with anything, but Pascal is doing something different from Maxwell to be less crippled by DX12 tasks. Is it just narrower ALUs, as someone else suggested?

Someone more neutral, who is not ledra, needs to chime in.
 
Honestly, both AMD and Nvidia have a perfect setup, as neither is competing with the other, so sales are as good as they can get. Now, that doesn't really help price on the high end (1080/1070), but it does quell any price hikes on the low end (480).

Which is kind of my point. AMD is able to push lower-end hardware better, but still nothing that tops the charts. I don't think some understand just how important that crown is. I'd bet all the tea in China that AMD would trade the 480 for a 499 or whatever would top the performance benchmarks across the board and sell at $600+.

We've been down this road TOO many times with AMD. "Look! DX12/Vulkan games will put AMD on top!" And while those APIs look to serve AMD well, they don't have anything on top yet.
 
We should all welcome these DX12/Vulkan gains, especially for AMD. Even if you game on Nvidia, a more competitive AMD product will help keep Nvidia's high prices in check.
 
We should all welcome these DX12/Vulkan gains, especially for AMD. Even if you game on Nvidia, a more competitive AMD product will help keep Nvidia's high prices in check.

I don't think anyone is debating this. Not all criticism is of the "that sucks" nature. I want to see an AMD GPU or CPU lead the pack, not in price/performance ratios but winning outright. If one wants to see prices kept in check, this is what needs to happen. A $200-to-$240 card that draws a lot of power and is still slower than Nvidia's offerings isn't going to keep prices in check, not at the high end, where when you win you can charge much more.

Just think of the shock and awe that would have occurred if AMD had put out something in the price range of a 1070 that was beating out a 1080 in Doom with Vulkan. The conversation would be MUCH different.
 
We should all welcome these DX12/Vulkan gains, especially for AMD. Even if you game on Nvidia, a more competitive AMD product will help keep Nvidia's high prices in check.

I don't think anyone is debating this. Not all criticism is of the "that sucks" nature. I want to see an AMD GPU or CPU lead the pack, not in price/performance ratios but winning outright. If one wants to see prices kept in check, this is what needs to happen. A $200-to-$240 card that draws a lot of power and is still slower than Nvidia's offerings isn't going to keep prices in check, not at the high end, where when you win you can charge much more.

Just think of the shock and awe that would have occurred if AMD had put out something in the price range of a 1070 that was beating out a 1080 in Doom with Vulkan. The conversation would be MUCH different.

I want to second this sentiment. Despite how I may come across, I am not an Nvidia fanboy and would love to see some nice offerings from AMD. I do not think the RX 480 is a good card; for the number of transistors it has, it should be performing much better. What annoys me most about AMD is that their engineering on both their CPUs and GPUs has fallen really far behind. I hope they prove me wrong with Zen and Vega.
 
I want to second this sentiment. Despite how I may come across, I am not an Nvidia fanboy and would love to see some nice offerings from AMD. I do not think the RX 480 is a good card; for the number of transistors it has, it should be performing much better. What annoys me most about AMD is that their engineering on both their CPUs and GPUs has fallen really far behind. I hope they prove me wrong with Zen and Vega.

It's this simple: if AMD made the fastest CPUs and GPUs, they'd be on top and people would pay top dollar for them. Sure, there are fanboys, but people have gone nuts over that nonsense. AMD right now is charging $200 for its best consumer CPU and $240 for its best consumer GPU. A decade ago I paid a lot more money for ATI and AMD GPUs. And those prices didn't come down because of unicorn magic. They came down because that is the BEST AMD has right now. Not low end, not mid-tier: the BEST. And we're now again waiting on the BEST from AMD. And if the best ends up at the low end of the price range, it's not too hard to figure out what happened.

All AMD needs to do is make the best. That's it. Nothing more. Nothing less. And then everything will change. That'll be the unicorn magic.
 
I scored 6 282 in Time Spy

CPU: Intel Core i7-3930k 6-Core @ 4.6GHz
Motherboard: ASUS Sabertooth X79
GPU: Gigabyte NVIDIA GeForce 980Ti G1 Gaming @ 1535 Core / 8000 Mem
RAM: G.Skill 16GB DDR3 2400MHz
SSD: Samsung 850 PRO 512GB
Case: Corsair Air 540
 
It's this simple: if AMD made the fastest CPUs and GPUs, they'd be on top and people would pay top dollar for them. Sure, there are fanboys, but people have gone nuts over that nonsense. AMD right now is charging $200 for its best consumer CPU and $240 for its best consumer GPU. A decade ago I paid a lot more money for ATI and AMD GPUs. And those prices didn't come down because of unicorn magic. They came down because that is the BEST AMD has right now. Not low end, not mid-tier: the BEST. And we're now again waiting on the BEST from AMD. And if the best ends up at the low end of the price range, it's not too hard to figure out what happened.

All AMD needs to do is make the best. That's it. Nothing more. Nothing less. And then everything will change. That'll be the unicorn magic.
One of the best CPUs I had was an Athlon 64 X2. I rocked that till 2013.
 
A $200-to-$240 card that draws a lot of power.....

Yet another person from the peanut gallery pushing the false narrative that a 480 draws a lot of power. The 1060 is at 120 W while the 480 is at 150-160 W. The light bulb in my refrigerator bridges that difference.
 
He thinks Nvidia is getting around its inability to work on graphics + compute tasks at the same time with faster switching, which works as long as the GPCs do not become overloaded.

Futuremark Releases 3DMark Time Spy DirectX 12 Benchmark

Anandtech said:
In the case of async compute, Futuremark is using it to overlap rendering passes, though they do note that "the asynchronous compute workload per frame varies between 10-20%."

You only need to context switch if there's a high-priority task that needs to be executed immediately, e.g. VR timewarp.
Otherwise you can just wait for an SM to become available.

3DMark is using async to overlap processing of the current frame with processing of the next frame. This may or may not require pre-emption on Nvidia hardware. You obviously would not pre-empt the current frame to process the next one.

People really need to understand that async isn't a feature like tessellation or texture filtering. Its usefulness is completely dependent on how much time the hardware spends idling and its ability to fill those idle cycles with useful work.
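A back-of-the-envelope way to see that dependence; all three inputs here are invented, only the shape of the calculation matters:

```cpp
// Invented numbers purely to illustrate the idle-time argument.
#include <algorithm>
#include <cstdio>

int main() {
    const double frame_ms      = 16.0; // graphics frame time without async compute
    const double idle_fraction = 0.15; // fraction of that frame the shaders sit idle
    const double compute_ms    = 2.0;  // compute work that would otherwise run serially

    // The work you can hide is capped by the idle bubbles available to absorb it.
    const double hidden_ms = std::min(compute_ms, idle_fraction * frame_ms);
    const double serial_ms = frame_ms + compute_ms;             // 18.0 ms
    const double async_ms  = frame_ms + compute_ms - hidden_ms; // 16.0 ms

    std::printf("async gain: %.1f%%\n", 100.0 * (serial_ms / async_ms - 1.0)); // 12.5%
    return 0;
}
```

More idle time or less compute to hide moves that number around, which is the whole point: the gain measures the idle bubbles, not some "async support" checkbox.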
 
Here are my results with the 6700K using the GTX 1080 G1 Gaming.

I uploaded the video here for those who are interested in seeing how the card performs.

 
Since this test can't use any commercial code that runs in real game engines, it is useless.
The premise of a benchmark is that its code is an approximation of what is out there in the real world and would give you a similar result. That might have "worked" under DX11, since you would use driver calls that everyone needed to use.

But under DX12 your code is the only source; the driver has little to no impact. Commercial engines such as Frostbite, Nitrous, and Unreal (or others, for that matter) do not share code with this, and therefore you could never draw any real-world significance from this test. Even if this test had code from one of those engines, it still would not reflect on the others.
 
Since this test can't use any commercial code that runs in real game engines, it is useless.
The premise of a benchmark is that its code is an approximation of what is out there in the real world and would give you a similar result. That might have "worked" under DX11, since you would use driver calls that everyone needed to use.

But under DX12 your code is the only source; the driver has little to no impact. Commercial engines such as Frostbite, Nitrous, and Unreal (or others, for that matter) do not share code with this, and therefore you could never draw any real-world significance from this test. Even if this test had code from one of those engines, it still would not reflect on the others.

Haha, you used the word "code" five times. Are you a programmer?
 
This test made it easy to see which developers get sponsored and which don't. Now the async myth is put in its grave. As expected, the best cases give a 5-10% performance uplift, not 20%. And Pascal supports async.
 
This test made it easy to see which developers get sponsored and which don't. Now the async myth is put in its grave. As expected, the best cases give a 5-10% performance uplift, not 20%. And Pascal supports async.

Wait, what?
 
I'm sort of surprised it has taken this long for compute to really get going. What I mean is that the PS4 has the same compute section as a 7970. Devs could have been doing this for a while now, and it could have translated over to PC quite easily. It will happen eventually. And I wonder if AMD will do a big driver update for Pitcairn and Tahiti, or will they just keep pretending those don't exist and only focus on the new stuff? I mean, come on AMD. You made a big deal about the relevancy of the PS4's compute setup, and I have a GPU that shares that architecture (7870).
 
This test made it easy to see which developers get sponsored and which don't. Now the async myth is put in its grave. As expected, the best cases give a 5-10% performance uplift, not 20%. And Pascal supports async.
Wrong on both counts, or at least misleading on one. First, the benchmark isn't heavy on its use of async, so it isn't a "best case" scenario. Second is this use of the word "support". The Nvidia supporters are trying overly hard to bridge true support and adoption whilst obfuscating the terminology. Async, as most will associate it, means concurrent execution of tasks. Pascal seems to get closer to this but is still not capable of doing it. If some of the discussions are correct, then it can process concurrently/in parallel but still must assign the tasks serially, which is why it isn't true support of async. It is also why 5-10% is modest and proof positive that this benchmark is not making heavy use of async.
 
But can't the preemption go both ways? If there was a graphics workload being crunched through but some compute task needed to be completed first, I thought the point of Pascal's better preemption was to be able to stop the current work, switch to the compute task, then go back to the graphics work (not at the same time, but being better able to switch between the different types of workloads). You say in other posts that preemption has nothing to do with anything but VR, but I'm not sure that's true.


Anyway, here is a later post from Mahigan.

[Various] Futuremark Releases 3DMark Time Spy DirectX 12 Benchmark - Page 14


He thinks Nvidia is getting around its inability to work on graphics + compute tasks at the same time with faster switching, which works as long as the GPCs do not become overloaded. I have no idea if that's how it works, but it would explain why the 1080 is able to stay ahead: being the non-cut-down chip, I'd assume it has more of those resources on the chip that can be tapped before it becomes saturated... or not, I'm not sure that's how it works. But there would be a way to test it: someone could create a workload simulation that ramps up the use of those resources and see how Pascal handles different levels of complexity versus GCN. Perhaps the breaking point is so high it becomes a moot point, or perhaps this has nothing to do with anything, but Pascal is doing something different from Maxwell to be less crippled by DX12 tasks. Is it just narrower ALUs, as someone else suggested?

Someone more neutral, who is not ledra, needs to chime in.


I'm as neutral as can be, thanks very much, and I resent the implication. This is technology we're talking about; there's no room for bias.

Even if it is how Mahigan says it is (supremely unlikely; GPC-level partitioning), that would still be perfectly valid async compute, so I don't know why he keeps calling it a hack, saying there's no concurrency and that NV gets around its 'inability' with faster switching.

Switching is precisely what you don't want to do.

I'll try to explain what I mean with an example; from here on, G is a graphics task and C is compute.

Now let's say that on GCN, G is currently being executed on some unit. What GCN does is preempt a stalled task (determined by ACE programming) and transfer its entire context to a dedicated cache. This allows contexts to be switched very quickly (one or two clocks afaik) and therefore allows multiple tasks to be overlapped within some time frame, but they never run at the exact same time; there's a context switch.

With Pascal's fine-grained preemption, you wait less for the task you're preempting to actually stop running, so you can dispatch new work sooner, which is great for time-critical operations like async timewarp.

The actual context is then stored off-die, in VRAM. Now, GCN doesn't have pixel-level preemption afaik, so Pascal's preemption is faster, but an actual context *switch* is much, much slower.

Async shaders will never work on Pascal, but async compute works just fine. Async shaders are beneficial to GCN and that's great, but they're simply not needed (not worth the cost) for efficient ALU utilization.
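To put some numbers on why the cost of the switch is the whole story here (these figures are invented for illustration, not measurements of either GPU): interleaving two tasks by ping-ponging contexts only looks free when each switch costs a couple of clocks; pay a trip to VRAM every time and the same schedule drowns in overhead.

```cpp
// Invented figures: compare cheap vs expensive context switches.
#include <cstdio>

int main() {
    const double work_cycles      = 1000000.0; // combined G + C work on one unit
    const double switches         = 1000.0;    // times we ping-pong between them
    const double cheap_switch     = 2.0;       // "dedicated context cache" style cost
    const double expensive_switch = 5000.0;    // "save everything to VRAM" style cost

    const double overhead_cheap     = switches * cheap_switch;     //     2,000 cycles
    const double overhead_expensive = switches * expensive_switch; // 5,000,000 cycles

    std::printf("cheap switches add %.2f%% overhead\n",
                100.0 * overhead_cheap / work_cycles);     // 0.20%
    std::printf("expensive switches add %.2f%% overhead\n",
                100.0 * overhead_expensive / work_cycles); // 500.00%
    return 0;
}
```

Which is why Pascal overlaps work across different SMs instead of switching within one, and why GCN can afford to interleave within a CU.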
 
Wrong on both counts, or at least misleading on one. First, the benchmark isn't heavy on its use of async, so it isn't a "best case" scenario. Second is this use of the word "support". The Nvidia supporters are trying overly hard to bridge true support and adoption whilst obfuscating the terminology. Async, as most will associate it, means concurrent execution of tasks. Pascal seems to get closer to this but is still not capable of doing it. If some of the discussions are correct, then it can process concurrently/in parallel but still must assign the tasks serially, which is why it isn't true support of async. It is also why 5-10% is modest and proof positive that this benchmark is not making heavy use of async.

1. This benchmark doesn't make heavy use of async compute

Evidence? Reasoning?

2. Pascal gets closer, but is incapable of doing concurrent g+c

Evidence? Reasoning?

What do you mean by assign the tasks serially?

~10% gains from async are perfectly in line with statements made previously by developers, on PC I mean. A 10% async gain isn't proof of a light async load; if anything it's proof that async isn't a magic bullet, and a reminder to some people to come back down from their cloud and touch ground. It's time for you and your hot air balloon to descend.

It's very, very simple: the more the ALUs are utilized, the more performance you're extracting from the machine.
 
I'm as neutral as can be, thanks very much, and I resent the implication. This is technology we're talking about; there's no room for bias.

Even if it is how Mahigan says it is (supremely unlikely), that would still be perfectly valid async compute, so I don't know why he keeps calling it a hack, saying there's no concurrency and that NV gets around its 'inability' with faster switching.

Switching is precisely what you don't want to do.

I'll try to explain what I mean with an example; from here on, G is a graphics task and C is compute.

Now let's say that on GCN, G is currently being executed on some unit. What GCN does is preempt a stalled task (determined by ACE programming) and transfer its entire context to a dedicated cache. This allows contexts to be switched very quickly (one or two clocks afaik) and therefore allows multiple tasks to be overlapped within some time frame, but they never run at the exact same time; there's a context switch.

With Pascal's fine-grained preemption, you wait less for the task you're preempting to actually stop running, so you can dispatch new work sooner, which is great for time-critical operations like async timewarp.

The actual context is then stored off-die, in VRAM. Now, GCN doesn't have pixel-level preemption afaik, so Pascal's preemption is faster, but an actual context *switch* is much, much slower.

Async shaders will never work on Pascal, but async compute works just fine. Async shaders are beneficial to GCN and that's great, but they're simply not needed (not worth the cost) for efficient ALU utilization.
From what I gathered, AMD's ACEs act somewhat independently, so each can accept tasks while the others are working on other ones, regardless of G or C, whereas Nvidia must context switch, at least with Maxwell. It looked to me like Pascal was moving toward a more ACE-like setup; it could be that their die-plot chart was missing some parts (is the GigaThread engine gone?). Preemption is only necessary when a task must be moved aside for a more critical one, yes, but it looks like for now that is how Nvidia is circumventing it - the concurrent execution of tasks.
 
From what I gathered, AMD's ACEs act somewhat independently, so each can accept tasks while the others are working on other ones, regardless of G or C, whereas Nvidia must context switch, at least with Maxwell. It looked to me like Pascal was moving toward a more ACE-like setup; it could be that their die-plot chart was missing some parts (is the GigaThread engine gone?). Preemption is only necessary when a task must be moved aside for a more critical one, yes, but it looks like for now that is how Nvidia is circumventing it - the concurrent execution of tasks.

No! They're not using preemption for async compute! Preemption just means stopping a task. The context is stored in VRAM, man; that is very slow. Stop saying this stuff! It's just untrue, and it makes no sense, none at all.

I've been saying this to you for several months now: there's *no requirement* for async to be done within a single unit like on GCN. None at all. You are talking about async ***SHADERS***; that's an AMD term for the system described with the fast context switching on the CUs.
 
1. This benchmark doesn't make heavy use of async compute

Evidence? Reasoning?

2. Pascal gets closer, but is incapable of doing concurrent g+c

Evidence? Reasoning?

What do you mean by assign the tasks serially?

~10% gains from async are perfectly in line with statements made previously by developers, on PC I mean. A 10% async gain isn't proof of a light async load; if anything it's proof that async isn't a magic bullet, and a reminder to some people to come back down from their cloud and touch ground. It's time for you and your hot air balloon to descend.

It's very, very simple: the more the ALUs are utilized, the more performance you're extracting from the machine.
The yellow part is why I don't trust you; that, and you have given little evidence against any theory anyone else can give. I have spent the last year or more explaining why AMD's performance in DX11 was an architecture issue, not a driver issue. Turns out I was right (I posted lots of documented proof in other threads). Here again, using Nvidia's own whitepapers one can easily discern a reasonable theory of how async works on Pascal, and based on their own words one can see that concurrent execution of tasks is apparently not supported. Now, as I mentioned, it looked as though they could execute concurrently, but they kind of hinted at still needing a context switch; I felt that was per GPC (? I think that was the right terminology), and that each might be able to be independent of the others, which would allow for parallel execution, but for some reason not in the assigning of these tasks, hence the serial part. You kind of see this in the graph/picture where the G work finished and the C work was then able to spread from one GPC into it to finish faster, so it is safe to assume that they context switch per GPC, not the chip as a whole as Maxwell seemed to do.
 
using Nvidia's own whitepapers one can easily discern a reasonable theory of how async works on Pascal, and based on their own words one can see that concurrent execution of tasks is apparently not supported

Okay. Concurrency not supported.

as I mentioned, it looked as though they could execute concurrently, but they kind of hinted at still needing a context switch


Oh. So concurrency is supported?

Ah right. Only fake concurrency needs a context switch :S

able to be independent of the others, which would allow for parallel execution, but for some reason not in the assigning of these tasks, hence the serial part.

So now it can execute in parallel, but with serial assignment? Serial assignment of what, man?

The yellow part is why I don't trust you; that, and you have given little evidence against any theory anyone else can give

Mahigan's argument is totally inconsistent, and there's no evidence because nobody is forced to discuss the inner workings of the architecture. Async compute != async shaders.

Your argument is inconsistent as well; it looks like a muddled mix of Mahigan's post you linked earlier and your insistence on talking about preemption, when it has nothing to do with what we're talking about.

Here's proof of that :)

WsEF4N3.png


"These asynchronous workloads create two scenarios to consider"

The first is what we're talking about, and they mention load balancing in hardware.

The second is the time-critical stuff, and there they mention preemption.

aCJZeEU.png
wCDif7f.png
S5tG897.png

 
I have spent the last year or more explaining why AMD's performance in DX11 was an architecture issue, not a driver issue. Turns out I was right (I posted lots of documented proof in other threads).

AMD's inability to fully utilize their hardware is quite obviously a software/driver issue. The (hardware) architecture is fine.

Here again, using Nvidia's own whitepapers one can easily discern a reasonable theory of how async works on Pascal, and based on their own words one can see that concurrent execution of tasks is apparently not supported.

Concurrent execution is supported and proven via directed tests. However, the way in which it's supported is less elegant than AMD's ACEs.

Not sure what you mean by running concurrently via context switching and assigning tasks serially. Sorry but that's just mumbo jumbo that doesn't make any sense.
 
AMD's inability to fully utilize their hardware is quite obviously a software/driver issue. The (hardware) architecture is fine.



Concurrent execution is supported and proven via directed tests. However, the way in which it's supported is less elegant than AMD's ACEs.

Not sure what you mean by running concurrently via context switching and assigning tasks serially. Sorry but that's just mumbo jumbo that doesn't make any sense.

Mumbo jumbo indeed, "it can execute them concurrently, but still appears to need a context switch"... What does that even mean?

1ZRiYV1.png


To be fair though, MangoSeed, it's not just a software limitation; there are hardware issues that contribute.
 
Wrong on both counts, or at least misleading on one. First, the benchmark isn't heavy on its use of async, so it isn't a "best case" scenario. Second is this use of the word "support". The Nvidia supporters are trying overly hard to bridge true support and adoption whilst obfuscating the terminology. Async, as most will associate it, means concurrent execution of tasks. Pascal seems to get closer to this but is still not capable of doing it. If some of the discussions are correct, then it can process concurrently/in parallel but still must assign the tasks serially, which is why it isn't true support of async. It is also why 5-10% is modest and proof positive that this benchmark is not making heavy use of async.

Seems like you are making up your own definition of async.

How many times has async been credited with the performance gains when, for example in DOOM, it's really due to the GCN shader extensions instead?
 