Ashes of the Singularity Day 1 Benchmark Preview @ [H]

When you do the full review, please get an AMD FX chip. People are dying to see if DX12 is really the AMD CPUs' saving grace or if they will still get trounced.

I'm thinking it's gonna help a LOT with mid-range systems. See below...

If y'all want an FX platform to try out on, I have an 8320 that does 4.5 GHz and a Sabertooth 990FX R2.0 that I would be willing to donate to the cause. A pre-paid label would be nice :D

My 8120 is @ 4560 MHz on a GB 990XA-UD3 w/ an MSI 280X @ 1125/1500 (dropped it for testing), so it might give you a bit of an idea about the performance increases.

I'm going to venture a guess that this game is extremely hard on overclocks (they're ALREADY getting red screens without overclocking). I guess we'll see :)

Ummmm yeah, it definitely does not like my GPU OC'd! In DX12 I HAD to back off my OC, particularly the memory. Dropped the core from 1135 to 1125, and the memory had to go down to stock 1500; anything over that gave glitches! DX11 seems fine at my normal 1135/1700.

I'll only post a couple of segments because they are huge, but you'll get the idea. I also only ran at the standard setting with no AA, as my system is getting old... :( Nonetheless, I am totally impressed with the improvements, especially in the medium and heavy batch sections! Also of note: CPU/GPU usage went from 75-90% GPU and 30-40% CPU in DX11 to 90-100% GPU and ~80% CPU in DX12!

Edit: RAM usage in DX11 = 3 GB, DX12 = 2.8 GB

==========================================================================
Quality Preset: Custom
==========================================================================

Resolution: 1920x1080
Fullscreen: True
Bloom Quality: High
PointLight Quality: High
Glare Quality: Low
Shading Samples: 4 million
Terrain Shading Samples: 8 million
Shadow Quality: Low
Temporal AA Duration: 0
Temporal AA Time Slice: 0
Multisample Anti-Aliasing: 1x
Texture Rank : 1

== Total Avg Results DX11 =================================================
Total Time: 60.000820 ms per frame
Avg Framerate: 27.173849 FPS (36.800087 ms)
Weighted Framerate: 26.569708 FPS (37.636845 ms)
Average Batches per frame: 11289.113281 Batches
==========================================================================


== Results ===============================================================
BenchMark 0
TestType: Full System Test
== Sub Mark Normal Batch =================================================
Total Time: 70.932716 ms per frame
Avg Framerate: 44.069931 FPS (22.691210 ms)
Weighted Framerate: 42.706245 FPS (23.415779 ms)
Average Batches per frame: 4088.271240 Batches
== Sub Mark Medium Batch =================================================
Total Time: 56.017582 ms per frame
Avg Framerate: 31.097382 FPS (32.157047 ms)
Weighted Framerate: 30.400705 FPS (32.893974 ms)
Average Batches per frame: 8638.016602 Batches
== Sub Mark Heavy Batch =================================================
Total Time: 53.052155 ms per frame
Avg Framerate: 18.001154 FPS (55.551994 ms)
Weighted Framerate: 17.667601 FPS (56.600784 ms)
Average Batches per frame: 21141.052734 Batches


== Total Avg Results DX12 =================================================
Total Time: 60.001919 ms per frame
Avg Framerate: 45.331207 FPS (22.059858 ms)
Weighted Framerate: 44.524979 FPS (22.459305 ms)
CPU frame rate (estimated if not GPU bound): 56.855946 FPS (17.588310 ms)
Percent GPU Bound: 77.092522 %
Driver throughput (Batches per ms): 4230.979004 Batches
Average Batches per frame: 11946.156250 Batches
==========================================================================


== Results ===============================================================
BenchMark 0
TestType: Full System Test
== Sub Mark Normal Batch =================================================
Total Time: 70.978905 ms per frame
Avg Framerate: 51.930923 FPS (19.256350 ms)
Weighted Framerate: 50.733509 FPS (19.710838 ms)
CPU frame rate (estimated if not GPU bound): 61.388744 FPS (16.289631 ms)
Percent GPU Bound: 52.069389 %
Driver throughput (Batches per ms): 3290.672607 Batches
Average Batches per frame: 4513.521973 Batches
== Sub Mark Medium Batch =================================================
Total Time: 55.982483 ms per frame
Avg Framerate: 46.210888 FPS (21.639923 ms)
Weighted Framerate: 45.536053 FPS (21.960621 ms)
CPU frame rate (estimated if not GPU bound): 59.649719 FPS (16.764538 ms)
Percent GPU Bound: 87.486526 %
Driver throughput (Batches per ms): 4045.292725 Batches
Average Batches per frame: 8851.755859 Batches
== Sub Mark Heavy Batch =================================================
Total Time: 53.044365 ms per frame
Avg Framerate: 39.551796 FPS (25.283300 ms)
Weighted Framerate: 38.900742 FPS (25.706451 ms)
CPU frame rate (estimated if not GPU bound): 50.733711 FPS (19.710760 ms)
Percent GPU Bound: 91.721634 %
Driver throughput (Batches per ms): 4825.051758 Batches
Average Batches per frame: 22473.193359 Batches
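
If you want to sanity-check the uplift yourself, here is a minimal C++ sketch (a made-up little program, using only the average framerates printed above, nothing else assumed):

// dx12_uplift.cpp -- quick sanity check on the averages posted above.
// Build: g++ -std=c++11 dx12_uplift.cpp -o dx12_uplift
#include <cstdio>

int main() {
    struct Run { const char* name; double dx11_fps; double dx12_fps; };
    const Run runs[] = {
        {"Total Avg",    27.173849, 45.331207},
        {"Normal Batch", 44.069931, 51.930923},
        {"Medium Batch", 31.097382, 46.210888},
        {"Heavy Batch",  18.001154, 39.551796},
    };
    for (const Run& r : runs) {
        double uplift = (r.dx12_fps / r.dx11_fps - 1.0) * 100.0;
        std::printf("%-12s  DX11 %6.2f FPS -> DX12 %6.2f FPS  (+%.1f%%)\n",
                    r.name, r.dx11_fps, r.dx12_fps, uplift);
    }
    return 0;
}
// Output: roughly +66.8% overall, +17.8% normal, +48.6% medium, +119.7% heavy,
// which lines up with the "especially medium and heavy batch" observation.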
 
I seriously doubt they're in deep shit. They control a much larger portion of the PC gaming market. Think about that for a second. They could easily strong-arm developers into not utilizing the full benefits of Async Compute.

The problem is, AMD has Microsoft's "protection", and AMD has made themselves a wise partner. And on the flip side, Nvidia charges game companies to use the GameWorks code for optimization. AMD is giving all the tools away for free AND is working with developers on the best ways to use them. So that genie CAN'T be put back in the bottle. It's already out, and you have people like those making DOOM commenting on how AMD cards are doing much better, and DICE showing off Battlefront with Radeon rigs in full, advertised view.

It all carries weight, and you get the feeling it all snuck up on NVidia. So yeah, while AMD isn't going to be putting them out of business anytime soon, deep shit is not an overstatement.
 
AFAIK, Nvidia drivers have AC disabled, so you're not even getting that working to test.

DX12 brings low-level access to the hardware, so company A may have feature A for their chips to help performance, and company B might have feature B for theirs. That company A can't run company B's feature is meaningless, as it wasn't meant to help them.
You're going to see a lot of big changes in DX12 performance in the beginning, IMO, until this gets sorted out to a degree.
If AC is disabled, it's because the hardware is faulty, probably due to a bug. The problem with this is that you can usually get around it by using the CPU, but that would defeat the point of having Async Compute.

The Ashes of the Singularity developers have been dealing with Nvidia for a while on fixing this issue, and have even debated disabling Async Compute. They aren't ever going to be able to close the performance gap. Not without GameWorks, and even then it might not be enough.

The only reason Hitman is "built properly" (async) is because AMD implemented it themselves.
It should be telling when big studios like Microsoft and Square Enix both don't care enough to do it "properly" without AMD's direct involvement.

A feature or set of features that developers don't care to implement without being prompted by a GPU manufacturer, which runs better on one brand of hardware over another -- sounds a lot like GameWorks to me.

Although it seems to me you're using the phrase "built properly" in place of "designed for AMD GPUs".
If DX12 is a performance loss for both AMD and Nvidia, then yes it wasn't done properly. The people behind the Talos Principle said it themselves with Vulkan. If the game wasn't specifically built to utilize the API, then it won't see any benefits.

As for AMD's involvement, I'm sure Nvidia can tweak the drivers, just like how AMD had to tweak theirs after a GameWorks game was released. Except that I don't think it's possible for Nvidia to do that with DX12.

Can't happen with Pascal. The issue here isn't one of software but of hardware. This is an engineering issue. So in other words, not only can Pascal not do this, but if they were already past the Engineering phase of Volta when the Async phenomenon hit, they won't have it for that either.

NVidia could be in deep shit for a while.
I think Nvidia will get Async Compute working for Pascal. They have the engineers to do it.
 
If AC is disabled, it's because the hardware is faulty, probably due to a bug. The problem with this is that you can usually get around it by using the CPU, but that would defeat the point of having Async Compute.

The Ashes of the Singularity developers have been dealing with Nvidia for a while on fixing this issue, and have even debated disabling Async Compute. They aren't ever going to be able to close the performance gap. Not without GameWorks, and even then it might not be enough.


If DX12 is a performance loss for both AMD and Nvidia, then yes it wasn't done properly. The people behind the Talos Principle said it themselves with Vulkan. If the game wasn't specifically built to utilize the API, then it won't see any benefits.

As for AMD's involvement, I'm sure Nvidia can tweak the drivers, just like how AMD had to tweak theirs after a GameWorks game was released. Except that I don't think it's possible for Nvidia to do that with DX12.


I think Nvidia will get Async Compute working for Pascal. They have the engineers to do it.


I hear this so often: why do you think Nvidia needs async compute?

Let's assume Nvidia hardware is incapable of it. Their performance is still very competitive, and since their GPUs are pretty much fully utilized without it, there's no need for it.

In other words, AMD hardware would have nothing to gain from 'async' were it not underutilized.
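
To put a rough number on that, here's a toy upper-bound model (my own simplification, assuming every idle cycle could be perfectly filled with work that was otherwise serialized in the frame; real gains will be lower):

// toy_async_bound.cpp -- rough ceiling on what latency hiding can buy.
// Assumption: all idle GPU time can be filled with useful, otherwise-serial work.
#include <cstdio>

int main() {
    const double utilization[] = {0.70, 0.80, 0.90, 0.99};
    for (double u : utilization)
        std::printf("baseline utilization %.0f%% -> at most ~%.0f%% faster\n",
                    u * 100.0, (1.0 / u - 1.0) * 100.0);
    return 0;
}
// 70% -> ~43%, 80% -> ~25%, 90% -> ~11%, 99% -> ~1%. The closer a GPU already
// runs to full utilization, the less an 'async' overlap can possibly add.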

This is all hypothetical, by the way; Nvidia hardware is demonstrably capable of asynchronous execution, certainly in CUDA. However, even if we assume it's not capable of it, it's really no big deal at all. Just a lot of noise being made by AMD's marketing department.
 
This is hypothetical, but what the above poster is saying makes a lot of logical sense.

If an AMD GPU's level of performance without Async is here _____ and with Async the performance goes up to here ---------, OK, there you are.

Now, what if Pascal comes out, and is at this level of performance --------- without Async. It would be matching AMD with Async, and thus the gameplay experience would be equal despite the use of Async.

This is why it's important to keep the focus on the gameplay experience delivered, because it doesn't matter what pathways the GPU takes to get there; what matters is the end result and the performance and gameplay experience delivered at the end of the pipeline.

That is why I do not feel that any feature like Async Compute makes or breaks a GPU for me, for gaming. What matters to me is the end result in gameplay performance. Again, Async is going to depend completely on game developer support and implementation anyway, so that's what it comes down to.

AMD may be able to achieve higher performance with less work, and it may take NVIDIA a lot more work to achieve that same performance. But, in the end, if they achieve the same goal (and the power draw and heat is similar) then what's the difference?

Now I could be wrong, maybe Async Compute will be the thing that developers embrace and it takes off and rules the world. I really do not know, that is why we need a lot more DX12 games under the belt before we can determine what the trend will end up being. It could be anything at this point, all we can do is evaluate on a game by game basis and see what provides the best experience for the money.

So far, AMD is 1, NVIDIA 0 for the DX12 gameplay experience in a game. In Tomb Raider, neither performs better. Now let's see what happens in Hitman, which is up next.
 
Test the game with Async Compute off vs on. It's in the settings file for the game.

Other tech sites tested it a while back, and they found that with Async off, both AMD and NV gain in DX12 in all situations, but AMD's gains were smaller.

With Async Compute on, AMD's gains were huge. NV tanked in performance because apparently they "haven't enabled" it in their drivers yet... lol
 
Hitman DX12 also has AMD GPUs gaining in performance. The press release for it lists Async Compute as being used. The developers, IO Interactive, presented on their engine at GDC 2016, saying that Async Compute boosts performance by an additional 10%.

Sadly NV GPUs run DX12 slower than DX11 in Hitman as well.

In Ashes, to turn on/off Async Compute:

Documents\My Games\Ashes of the Singularity. Look for the settings.ini file.

"AsyncComputeOff=0"

Change the 0 to 1 to disable AC. Watch as even NV GPUs run Ashes faster in DX12 vs DX11.
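
If you'd rather flip it from a script than edit the file by hand, here's a minimal sketch (the settings.ini file name and the AsyncComputeOff key come from the post above; the program name is just something I made up, and it assumes the key appears exactly as AsyncComputeOff=0 or AsyncComputeOff=1):

// toggle_async.cpp -- flips AsyncComputeOff in Ashes' settings.ini.
// Pass the full path to settings.ini as the first argument.
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "usage: toggle_async <path to settings.ini>\n";
        return 1;
    }
    std::ifstream in(argv[1]);
    if (!in) { std::cerr << "could not open " << argv[1] << "\n"; return 1; }

    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("AsyncComputeOff=", 0) == 0)   // line starts with the key
            line = (line == "AsyncComputeOff=0")
                       ? "AsyncComputeOff=1"           // 1 = async compute disabled
                       : "AsyncComputeOff=0";          // 0 = async compute enabled
        lines.push_back(line);
    }
    in.close();

    std::ofstream out(argv[1], std::ios::trunc);
    for (const std::string& l : lines) out << l << "\n";
    return 0;
}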
 
Hitman DX12 also has AMD GPUs gaining in performance. The press release for it lists Async Compute as being used. The developers, IO Interactive, presented on their engine at GDC 2016, saying that Async Compute boosts performance by an additional 10%.

Sadly NV GPUs run DX12 slower than DX11 in Hitman as well.

In Ashes, to turn on/off Async Compute:

Documents\My Games\Ashes of the Singularity. Look for the settings.ini file.

"AsyncComputeOff=0"

Change the 0 to 1 to disable AC. Watch as even NV GPUs run Ashes faster in DX12 vs DX11.
I'm going to test this tomorrow, but as you can see in the benchmark results I posted above, DX11 doesn't seem to perform much better than DX12.

Hitman is a terrible example; AMD runs it better in DX11, and the gains from DX12 are minimal. IO Interactive confessed that async is hard to tune and isn't worth the effort; check the GDC coverage for this.

Speaking generally though, it doesn't matter whether async is possible or not. If it runs better with it off, it should be off; it's meant to boost performance.

Not to fan the flames or anything, but I'd much rather have a lack of 'async' support than a lack of conservative rasterization, for example; HFTS likely cannot be done in real time without it. This is a huge load of marketing hype IMO, and both Hitman and AotS are very AMD-leaning games. Time will tell.
 
What are you saying? AMD solved the Witcher 3 issues by reducing the tessellation factor in their drivers; you can hardly call that NVIDIA adding something special.
Oxide developed the engine for Ashes specifically to showcase Mantle; it can't get any more AMD-biased. I can't see how this is different from GameWorks features that run better on NV hardware.

Additionally, with DX12, developers need IHV-specific paths; it's on them to tune for different hardware. If you don't want AMD/NVIDIA-specific optimization to be your responsibility, stick to DX11. Everyone has the wrong expectations.
 
Now, what if Pascal comes out, and is at this level of performance --------- without Async. It would be matching AMD with Async, and thus the gameplay experience would be equal despite the use of Async.

And when AMD's next gen hits and reclaims the lead......because of Async..........you're back where you started.

These DX12 reviews look like a hell of a lot of work by the way. Great job.
 
And when AMD's next gen hits and reclaims the lead......because of Async..........you're back where you started.

These DX12 reviews look like a hell of a lot of work by the way. Great job.

Again, async only helps when you're not fully utilizing the hardware. Are you counting on AMD being unable to maximize utilization without additional developer work to implement and tune 'async'? This is not a bonus from the dev perspective; it's extra work that is unnecessary for the other IHV.
 
Oxide developed the engine for Ashes specifically to showcase Mantle; it can't get any more AMD-biased. I can't see how this is different from GameWorks features that run better on NV hardware.

No. Oxide developed an engine that, at the time, could only be done on a low-level API due to the restrictions DX11 placed on the hardware. At that time, Mantle was it. Surprisingly, its development transitioned smoothly to DX12 (weird), and we now have the completed project.

Nvidia has produced fast, low-power, high-performing DX11 gaming parts and has had great acclaim for doing so since Fermi. AMD has produced cards with a bit higher power draw because they weren't stripped down for just gaming. Unfortunately, this added GPU power wasn't realized until now, and they were bashed for it (except by crypto miners and such). With its next gen, AMD will bring efficiency back into focus while maintaining the features we are beginning to see the fruits of. Except for brand loyalty, I don't see where Nvidia will have any advantage outside the diminishing DX11 market.
 
No. Oxide developed an engine that, at the time, could only be done on a low-level API due to the restrictions DX11 placed on the hardware. At that time, Mantle was it. Surprisingly, its development transitioned smoothly to DX12 (weird), and we now have the completed project.

Nvidia has produced fast, low-power, high-performing DX11 gaming parts and has had great acclaim for doing so since Fermi. AMD has produced cards with a bit higher power draw because they weren't stripped down for just gaming. Unfortunately, this added GPU power wasn't realized until now, and they were bashed for it (except by crypto miners and such). With its next gen, AMD will bring efficiency back into focus while maintaining the features we are beginning to see the fruits of. Except for brand loyalty, I don't see where Nvidia will have any advantage outside the diminishing DX11 market.

You have one data point to support this view, and it's Ashes of the Singularity: a game in which I can make up the difference in performance by overclocking Maxwell. This is far too simplistic a view of things for it to be true. The critical thing to understand about the performance boost from async is that it comes from latency hiding; GCN takes a relatively long, and unpredictable, amount of time to execute any one task. This is central to the issue, and a major deviation from Nvidia's design.

Async only increases hardware utilization; why you consider it to be such an important feature is beyond me. Conservative rasterization enables the first implementation of real-time ray tracing in a game and nobody bats an eye, but Ashes implements technology that was tuned specifically for AMD hardware (you need IHV-specific code, see the GDC slides) and that is potentially unnecessary on NV hardware, and it's suddenly the death of NVIDIA. Please.

IHV-specific code is a must.

As for Maxwell being stripped down for gaming... they stripped it of FP64, which isn't used outside of high-precision scientific work. Maxwell is a compute powerhouse; some friends and I have been ripping through neural net training models with minimal effort using CUDA and CUDA-derived libraries.

Async Compute Only Boosted HITMAN's Performance By 5-10% on AMD cards; Devs Say It's "Super Hard" to Tune
 
You have one data point to support this view, and it's Ashes of the Singularity: a game in which I can make up the difference in performance by overclocking Maxwell. This is far too simplistic a view of things for it to be true. The critical thing to understand about the performance boost from async is that it comes from latency hiding; GCN takes a relatively long, and unpredictable, amount of time to execute any one task. This is central to the issue, and a major deviation from Nvidia's design.

IHV-specific code is a must.

As for Maxwell being stripped down for gaming... they stripped it of FP64, which isn't used outside of high-precision scientific work. Maxwell is a compute powerhouse; some friends and I have been ripping through neural net training models with minimal effort using CUDA and CUDA-derived libraries.

I never mentioned Async. We know as a fact that Nvidia chose lighter, game-oriented hardware. It isn't only the things they stripped down; it was the forward-thinking elements they chose NOT to add. You can't software your way out of a lack of hardware. But you can for a while if you're stuck on a hardware node with a DX11 platform. It worked like a hot damn for the first 5 years. We'll see about the next 5.
 
I never mentioned Async. We know as a fact that Nvidia chose lighter, game-oriented hardware. It isn't only the things they stripped down; it was the forward-thinking elements they chose NOT to add. You can't software your way out of a lack of hardware. But you can for a while if you're stuck on a hardware node with a DX11 platform. It worked like a hot damn for the first 5 years. We'll see about the next 5.
Lighter, game-oriented hardware? The forward-thinking elements they chose not to add? You literally couldn't be more vague, because there's no substance to this, at all.

I meant they do not have special software whose code Nvidia can't access. It's all on the table with AMD; whatever could make the game favor AMD would be open.

Ashes was not developed just to showcase Mantle. Even if it was, DX12 is again something both can take advantage of. There's no magic here; if Nvidia could, they would. It's been said over and over by Oxide that they worked with both, and possibly even more with Nvidia, so I don't see any reason to call bias towards AMD here. There is no locked-off code when AMD is involved. If things are moving forward in such a way that Nvidia's hardware cannot keep up, such is life. I am fine as long as a game is running well on both vendors. It's when it starts running like crap that I start wondering.
I said the engine was developed to showcase Mantle, and it was.

AMD claims Hitman has the best async implementation, yet the gains are way smaller than in AotS and a 390X matches a Fury X. Great stuff.

There doesn't need to be locked-off code; all you need is a DX12 developer unwilling to implement IHV-specific code, and Oxide have boasted about this in the past. DX11 for lazy devs, DX12 for devs with resources.
 
It was not.

AMD said it was the best Async yet, IIRC. What are the gains with async in Ashes? You don't know, and I haven't seen anyone test with/without. All we have so far is DX11 vs DX12; you can't assume that's all due to async.

Oxide already has IHV-specific code from Nvidia in there. They were working with Nvidia for months, and Nvidia has all their code. This is really a perfect example of working with both IHVs. If Nvidia could have done anything, it would have been done.
I've seen numerous tests with async on vs off. Anandtech have done them, for one.

Personally, I think one could just as easily make the claim that we were biased toward Nvidia as the only 'vendor' specific code is for Nvidia, where we had to shut down async compute. By vendor specific, I mean a case where we look at the Vendor ID and make changes to our rendering path.

Read more: Oxide Games Dev Replies On Ashes of the Singularity Controversy

This is exactly the problem; there should be vendor-specific code in DX12. Asynchronous execution of tasks from multiple queues should be handled differently for AMD/NVIDIA; this has been made abundantly clear by now, just look at the damn GDC presentations.

It's not on NVIDIA to rewrite their code to better suit their hardware; it's on the dev.

The FACT that the Nitrous engine was built for Mantle is a clear indication it is tuned for AMD hardware.
 
==========================================================================

== Hardware Configuration ================================================
GPU 0: AMD Radeon R9 200 Series
CPU: GenuineIntel
Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
Physical Cores: 6
Logical Cores: 12
Physical Memory: 12279 MB
Allocatable Memory: 134217727 MB
==========================================================================


== Configuration =========================================================
API: DirectX 12
==========================================================================
Quality Preset: Crazy
==========================================================================

Resolution: 2560x1440
Fullscreen: True
Bloom Quality: High
PointLight Quality: High
Glare Quality: High
Shading Samples: 16 million
Terrain Shading Samples: 16 million
Shadow Quality: High
Temporal AA Duration: 0
Temporal AA Time Slice: 0
Multisample Anti-Aliasing: 4x
Texture Rank : 1


== Total Avg Results =================================================
Total Time: 60.006039 ms per frame
Avg Framerate: 23.609848 FPS (42.355206 ms)
Weighted Framerate: 23.197466 FPS (43.108158 ms)
CPU frame rate (estimated if not GPU bound): 79.978493 FPS (12.503362 ms)
Percent GPU Bound: 99.935188 %
Driver throughput (Batches per ms): 5514.889648 Batches
Average Batches per frame: 17046.714844 Batches
==========================================================================


It uses every last drop of VRAM, 2988 MB, and for the first time ever it used 2400 MB of dynamic VRAM... Must be a DX12 thing, I guess. I would have to lower some settings to get higher FPS, but it's not bad for the age of this GPU.
 
The only reason Hitman is "built properly" (async) is because AMD implemented it themselves.
It should be telling when big studios like Microsoft and Square Enix both don't care enough to do it "properly" without AMD's direct involvement.

A feature or set of features that developers don't care to implement without being prompted by a GPU manufacturer, which runs better on one brand of hardware over another -- sounds a lot like GameWorks to me.

Although it seems to me you're using the phrase "built properly" in place of "designed for AMD GPUs".
You don't think it could have anything to do with AMD's experience? Who else are they going to work with? nVidia? ;)
 
This is exactly the problem; there should be vendor-specific code in DX12. Asynchronous execution of tasks from multiple queues should be handled differently for AMD/NVIDIA; this has been made abundantly clear by now, just look at the damn GDC presentations.

It's not on NVIDIA to rewrite their code to better suit their hardware; it's on the dev.

The FACT that the Nitrous engine was built for Mantle is a clear indication it is tuned for AMD hardware.

I honestly wish you luck in trying to bring some semblance to the discussion (Kyle and Brent are being particularly diplomatic here, haha), but I fear you've got quite the uphill battle. The mob has lost their minds entirely (insert Joker meme here) rather than going, "wait a second, does this actually make sense?" I'm certainly not doubting the results of this comparison, but whenever I get such a results flip flop from prior results, my first question is to ask if I screwed up the test, the second is to ask what *exactly* changed?

As I wrote, we've already seen parity between the Fury X and the 980 Ti (again, big win for the Fury X to achieve that!) in prior betas. What happened to the code since then that there's such a disparity now? Doesn't that make anyone else go, "hmmmmmmmmmm"? As you say, something is up with the Nvidia path (much like what has plagued AMD through most/all of DX11).
 
I honestly wish you luck in trying to bring some semblance to the discussion (Kyle and Brent are being particularly diplomatic here, haha), but I fear you've got quite the uphill battle. The mob has lost their minds entirely (insert Joker meme here) rather than going, "wait a second, does this actually make sense?" I'm certainly not doubting the results of this comparison, but whenever I get such a results flip flop from prior results, my first question is to ask if I screwed up the test, the second is to ask what *exactly* changed?

As I wrote, we've already seen parity between the Fury X and the 980 Ti (again, big win for the Fury X to achieve that!) in prior betas. What happened to the code since then that there's such a disparity now? Doesn't that make anyone else go, "hmmmmmmmmmm"? As you say, something is up with the Nvidia path (much like what has plagued AMD through most/all of DX11).
IDK WTF you're going on about... all the Nvidia scores posted today were very good; hell, the one right above earlier was even better.
 
If you ask me, all of the chips are on the table. Nvidia and their customers lost.

Why? The Nvidia cards are the best at DX11 games and are sold as such, and there are very few DX12 games. If Async Compute turns out to be a game-changer and Pascal does not perform well, then Nvidia will just accelerate the release of its next generation of cards.
 
Why? The Nvidia cards are the best at DX11 games and are sold as such, and there are very few DX12 games. If Async Compute turns out to be a game-changer and Pascal does not perform well, then Nvidia will just accelerate the release of its next generation of cards.

Which NV cards are best at DX11? Maybe only the 980 Ti, and only at 1440p or below, or as a single GPU. At 4K with multi-GPU, even the Fury X is faster.

NV releasing new-gen hardware because the current stuff is gimped doesn't actually help gamers on the current stuff. It only helps NV sell more GPUs, because people have to upgrade more often.

Are you a gamer or an NV investor?
 
The performance gains, or lack thereof, really depend on the game in question.
As Brent stated above, ROTR doesn't see better performance in DX12 for any GPU.
According to several reports, the performance is actually worse in DX12, compared to DX11.

I just saw some summaries of the lessons learned sessions at GDC.
They clearly said it's a lot harder to get the GPUs to perform well in DX12 than in DX11.
However, they also said it's rather easy to get the CPU to perform much better than in DX11.

Do you remember the only DX11 game so far which could multithread across up to 12 CPU cores/threads?
It was Crysis 3 - no other game before or after that was able to actually make good use of more than 6 threads.
Most stopped at 4 or even 2 threads.
I'm not sure if any English mag / site did check this kind of scaling.
I would have to point to PC Games Hardware again.

There are 2 stories on this.
The original benchmark and a short update with a later patch, which brought better performance for Intel CPUs, but just talks about a few CPUs:

Original story:
Crysis 3 CPU test: AMD's FX processors dominate our benchmarks

Update with later patch:
Crysis 3 CPU retest: Intel catches up, AMD still strong
 
Again, async only helps when you're not fully utilizing the hardware. Are you counting on AMD being unable to maximize utilization without additional developer work to implement and tune 'async'? This is not a bonus from the dev perspective; it's extra work that is unnecessary for the other IHV.
AMD has fixed-function hardware for async compute, the ACEs. They don't utilize unused CUs for it. What it does for AMD it would do for Nvidia too, if they could utilize it. It allows functions to be performed out of order instead of serially. It is a bonus for devs because it allows improved performance. What's not a bonus is that Nvidia can't take advantage.
 
AMD has fixed-function hardware for async compute, the ACEs. They don't utilize unused CUs for it. What it does for AMD it would do for Nvidia too, if they could utilize it. It allows functions to be performed out of order instead of serially. It is a bonus for devs because it allows improved performance. What's not a bonus is that Nvidia can't take advantage.
I used to work for a place where we were constantly exposed to new hardware. It was great, because sometimes some of these things had no official name. Chipzilla reps would show up with Extreme Edition CPUs to gift everyone, making sure that they gave out twice as much as AMD did. Those are some of my experiences with strong-arming and bribing. This whole thing comes down to one thing for me: it's [H] and it's new tech, and it's open to everyone. I can't seem to wrap my head around the fact that things like that are being downplayed. Yes, there are always going to be two camps. We are lucky that there are, because they both drive the tech faster to our PCs, ensuring affordable prices.
I rest my case :D !
 
For the millionth time, asynchronous execution has been supported since Kepler; Maxwell can perform operations out of order. I can't stomach having this debate again; I always end up being attacked and asked to 'prove' Nvidia won't 'gimp' Maxwell.

The actual performance gains for AMD come from latency hiding through concurrent, and sometimes parallel, execution of tasks from different queues, enabled through DX12's multi-engine capability.



The effectiveness of such an approach is entirely dependent on the implementation, and as I've stated previously, it is hard to tune and requires a lot of work on the developer end.
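
For anyone wondering what 'multi-engine' actually looks like at the API level, here is a bare-bones sketch (generic D3D12 boilerplate, nothing taken from Nitrous or any shipping engine): it just creates a direct queue plus a separate compute queue. The API only exposes the queues; whether work submitted to the two actually overlaps is up to the hardware and driver, which is exactly where the IHV differences live.

// multi_engine_sketch.cpp -- minimal DX12 multi-engine setup (MSVC, link d3d12.lib).
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d12.lib")
using Microsoft::WRL::ComPtr;

int main() {
    ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return 1;

    // Direct (graphics) queue: accepts graphics, compute, and copy commands.
    D3D12_COMMAND_QUEUE_DESC directDesc = {};
    directDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> directQueue;
    device->CreateCommandQueue(&directDesc, IID_PPV_ARGS(&directQueue));

    // Separate compute queue: command lists submitted here *may* execute
    // concurrently with the direct queue; the API does not guarantee overlap.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // Cross-queue synchronization is done with ID3D12Fence objects
    // (Signal on one queue, Wait on the other) -- omitted here.
    return 0;
}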

Can't get quoting to work, but to the user who said it's all about the ACEs and not the CUs: do you think the performance gain comes out of thin air? Fiji was being fully utilized and async magic gives it another 10%? No.

Everything regarding 'async' in Ashes has been misinterpreted; it's not about asynchronous execution, it's about concurrent execution of commands from multiple queues. Having no IHV-specific code is not a bonus, and finally, even if you want to consider these results the ultimate, most valid performance metric, as you can see from my results a 980 Ti matches a Fury X when you overclock.

If you account for the OC on the Fury X, it ends up with a 10% lead, despite all the moaning about async.

Meanwhile we have real-time ray tracing in The Division; AMD hardware can't do it, but AMD has sold the 'async' pipe dream so well that it's gotten an unholy segment of the 'gamer' demographic to talk smack about Nvidia and async, whilst not really knowing what it is they're talking about.

Come on.

This is low effort.

By the way, when I say this is an AMD-biased game, I'm not saying it's intentional; just that the code path was designed to run well on GCN, possibly thanks to Nitrous' legacy as a Mantle demo. GCN and Kepler/Maxwell benefit from 'async' in different use cases; you don't want long-running shaders in the compute queue on Nvidia hardware, whereas on AMD hardware you do.

Certainly the presence of the ACEs simplifies developers' work, as less care must be taken when dividing commands across queues. However, the fact remains that it is a question of utilization, and the poster who claimed async doesn't enable the use of otherwise inactive CUs is uninformed. That is exactly what it does; the execution time of any one task on GCN is high, and this allows multiple commands from multiple queues to be dispatched asynchronously and executed concurrently, sometimes even in parallel.

The downside is that it's quite random, hence developers saying it is hard to tune; the commands dispatched from the compute queue will temporarily 'steal' an execution unit from a running shader, which can result in its execution being delayed.

This discussion is many things, but simple isn't one of them.
 
For the millionth time, asynchronous execution has been supported since Kepler; Maxwell can perform operations out of order. I can't stomach having this debate again; I always end up being attacked and asked to 'prove' Nvidia won't 'gimp' Maxwell.

The actual performance gains for AMD come from latency hiding through concurrent, and sometimes parallel, execution of tasks from different queues, enabled through DX12's multi-engine capability.



The effectiveness of such an approach is entirely dependent on the implementation, and as I've stated previously, it is hard to tune and requires a lot of work on the developer end.

Can't get quoting to work, but to the user who said it's all about the ACEs and not the CUs: do you think the performance gain comes out of thin air? Fiji was being fully utilized and async magic gives it another 10%? No.

Everything regarding 'async' in Ashes has been misinterpreted; it's not about asynchronous execution, it's about concurrent execution of commands from multiple queues. Having no IHV-specific code is not a bonus, and finally, even if you want to consider these results the ultimate, most valid performance metric, as you can see from my results a 980 Ti matches a Fury X when you overclock.

If you account for the OC on the Fury X, it ends up with a 10% lead, despite all the moaning about async.

Meanwhile we have real-time ray tracing in The Division; AMD hardware can't do it, but AMD has sold the 'async' pipe dream so well that it's gotten an unholy segment of the 'gamer' demographic to talk smack about Nvidia and async, whilst not really knowing what it is they're talking about.

Come on.

This is low effort.
PROVE IT!!! Saying it exists doesn't make it so. CUDA coding isn't generally used in gaming. The facts at hand are indisputable and no amount of screaming and sidestepping will change it.
 
Async compute is different than async shaders. Async compute is definitely supported; async shaders are the ones that have issues with scheduling.

What did I say about using the proper terms? How many times? Then you wouldn't have these stupid arguments.

When people can't even use the right terms, do you think they can even have a conversation about the topic?
 
It's hardware supported; they can do concurrent queues, AKA multi-engine (this is part of the DX12 spec and there is no way around it). The thing is, will it be as beneficial to them as to AMD, and by what means? That is a different topic altogether.
 
PROVE IT!!! Saying it exists doesn't make it so. CUDA coding isn't generally used in gaming. The facts at hand are indisputable and no amount of screaming and sidestepping will change it.

Interesting that you say screaming won't change things; if I compare my post to yours at face value, it seems like mine is more thought out, and certainly like more of an effort was made to write it.

CUDA is used extensively in games, what are you on about?

As for multi-engine concurrency, there is a hardware dispatcher on Maxwell that runs uncoupled from both the graphics and compute command processors, but as far as I know that's a dead end because DX12 requires barrier support.

Why do I need to prove it? My entire argument has been: I know 'async' works, but even if we assume it doesn't, Maxwell does a pretty good job of fully occupying its resources on its own, so there's much less to gain than on GCN; there's less latency to hide.
 
This is just a difference of definition. When others say async, they are talking about concurrent execution of commands from multiple queues. You are just choosing to interpret it as something else so you can say Nvidia supports it. What IHV-specific code should be in there that isn't? Somehow they could have done async on Nvidia with it?




AMD can't run it because it's locked to Nvidia. Not that anybody should care, considering the performance impact. It's an example of the difference between AMD-leaning and Nvidia-leaning software. HFTS will sell their Pascal GPUs for them over Maxwell; they are likely designed to take advantage of the new fancy GimpMaxwellWorks features.

AMD can't run it because there's no conservative rasterization on their hardware.

The IHV-specific code I'm referring to is a concerted effort to optimize for different hardware architectures, an effort that is now required from developers if they choose to develop on DX12. I've repeated myself many times; why do you keep asking me this?

Are you under the impression that you just start writing code using D3D12 and it just works well on all hardware?

If you want to know exactly how Nvidia/AMD hardware should be treated differently, go read the GDC feature on DX12, or take a programming course, or commit yourself to an extended period of self-study.
 
Interesting that you say screaming won't change things; if I compare my post to yours at face value, it seems like mine is more thought out, and certainly like more of an effort was made to write it.

CUDA is used extensively in games, what are you on about?

As for multi-engine concurrency, there is a hardware dispatcher on Maxwell that runs uncoupled from both the graphics and compute command processors, but as far as I know that's a dead end because DX12 requires barrier support.

Why do I need to prove it? My entire argument has been: I know 'async' works, but even if we assume it doesn't, Maxwell does a pretty good job of fully occupying its resources on its own, so there's much less to gain than on GCN; there's less latency to hide.
I am en route home from vacation, and reading your drivel and obvious infatuation with Nvidia and your facts-be-damned approach is really annoying. With or without Async, AMD wins this round, and no amount of banter and obfuscation from you will change it.
 
I am in route home from vacation and reading your drival an
In my experience people who say 'drival' and talk about their ride home from vacation instead of the topic at hand have run out of unfounded claims to make and need to figure out their next move.

Bravado is good for an audience, and you don't have one.
 
In my experience people who say 'drival' and talk about their ride home from vacation instead of the topic at hand have run out of unfounded claims to make and need to figure out their next move.

Bravado is good for an audience, and you don't have one.
I made no claims; I asked you to prove yours.
 
He can prove it, and it's easy to prove. JustReason, you can go down that road with him, but it will just smack you in the face; his last three posts show he knows more about the topic than you do.
 
With or without Async, AMD wins this round, and no amount of banter and obfuscation from you will change it.

Unless someone has gone and changed the very definition of the word 'claim', I'll wager this is one.

Assuming you mean the entire DX12 generation when you say 'this round', you've decided that this one game determines the outlook for the next two years.

What exactly is it you want me to prove? Exactly.
 
He can prove it, and it's easy to prove. JustReason, you can go down that road with him, but it will just smack you in the face; his last three posts show he knows more about the topic than you do.
No, he hasn't. I read the same B3D thread on the subject that you posted in and saw the issues when asynchronous compute and graphics queues were attempted. I have also read more than enough material about both architectures and their support to know when posters, including yourself, try to sidestep and muddy the discussion to cover up the original issue and avoid acknowledging it, especially when it's Nvidia's behind over the coals.
 
OK, let's do this: start up another thread about async, on top of all the other million out there, and let's do it. I want to see your reasoning; put your mouth where your call sign is, instead of spouting out crap that doesn't make any sense.

This is a specific topic about async compute and async shader (multi-engine) concurrency. I want to see that. I want to see your definitions, your explanations based on benchmarks, based on sample data from B3D's concurrent execution demo, and how those graphs correlate between the two. All of this in parallel with how the architectural differences can influence the need for async, or better worded, the effect of async on pipeline execution.

Then we can go from there.
 
For the millionth time, asynchronous execution has been supported since Kepler; Maxwell can perform operations out of order. I can't stomach having this debate again; I always end up being attacked and asked to 'prove' Nvidia won't 'gimp' Maxwell.

The actual performance gains for AMD come from latency hiding through concurrent, and sometimes parallel, execution of tasks from different queues, enabled through DX12's multi-engine capability.



The effectiveness of such an approach is entirely dependent on the implementation, and as I've stated previously, it is hard to tune and requires a lot of work on the developer end.

Can't get quoting to work, but to the user who said it's all about the ACEs and not the CUs: do you think the performance gain comes out of thin air? Fiji was being fully utilized and async magic gives it another 10%? No.

Everything regarding 'async' in Ashes has been misinterpreted; it's not about asynchronous execution, it's about concurrent execution of commands from multiple queues. Having no IHV-specific code is not a bonus, and finally, even if you want to consider these results the ultimate, most valid performance metric, as you can see from my results a 980 Ti matches a Fury X when you overclock.

If you account for the OC on the Fury X, it ends up with a 10% lead, despite all the moaning about async.

Meanwhile we have real-time ray tracing in The Division; AMD hardware can't do it, but AMD has sold the 'async' pipe dream so well that it's gotten an unholy segment of the 'gamer' demographic to talk smack about Nvidia and async, whilst not really knowing what it is they're talking about.

Come on.

This is low effort.

By the way, when I say this is an AMD-biased game, I'm not saying it's intentional; just that the code path was designed to run well on GCN, possibly thanks to Nitrous' legacy as a Mantle demo. GCN and Kepler/Maxwell benefit from 'async' in different use cases; you don't want long-running shaders in the compute queue on Nvidia hardware, whereas on AMD hardware you do.

Certainly the presence of the ACEs simplifies developers' work, as less care must be taken when dividing commands across queues. However, the fact remains that it is a question of utilization, and the poster who claimed async doesn't enable the use of otherwise inactive CUs is uninformed. That is exactly what it does; the execution time of any one task on GCN is high, and this allows multiple commands from multiple queues to be dispatched asynchronously and executed concurrently, sometimes even in parallel.

The downside is that it's quite random, hence developers saying it is hard to tune; the commands dispatched from the compute queue will temporarily 'steal' an execution unit from a running shader, which can result in its execution being delayed.

This discussion is many things, but simple isn't one of them.
It actually is, if we stay focused. The topic started as DX11 vs DX12. My question was whether the performance benefits are due to async. Brent's observation is that it is not all due to async. If this is the case, then is Nvidia not doing DX12 as well as AMD? Someone mentioned that you can compare by disabling the feature in the game. Kyle was kind enough to test with slower CPUs to examine the impact of CPU power in DX12. The Hitman benchmarks are the best examples of DX12 and async at play to date. AMD is marketing this async thing heavily, but they also prove that it works well if implemented correctly. How do you want to defend the fact that Nvidia is not so hot at async? No amount of tech jargon will make a difference to the reality, and that is the reality and core question of this topic :)
 