DX11 vs DX12 Intel 4770K vs 5960X Framerate Scaling @ [H]

FrgMstr

Just Plain Mean
Staff member
Joined: May 18, 1997
Messages: 55,596
DX11 vs DX12 Intel 4770K vs 5960X Framerate Scaling - This is our third installment looking at the new DX12 API and how it works better with a game such as Ashes of the Singularity. We have looked at how DX12 is better than DX11 at distributing workloads across multiple CPU cores in AotS. This time we compare Haswell and Haswell-E processors' AotS performance.
 
Seems as though if you have a low-spec'd machine across the board, DX12 might give you a little bit of a gain. Otherwise, with a high-powered machine, it makes little to no difference.

Perhaps the faster procs are fast enough to keep up with what the GPU is feeding them and handle it on a single core..? lol
 
Okay. One day AOTS is just a tech demo that no one plays. The next day a lot of ones and zeros are committed to AOTS as the foundation of an article about DX11/12 scaling. Seems validity is a matter of suiting one's needs.
 
Okay. One day AOTS is just a tech demo that no one plays. The next day a lot of ones and zeros are committed to AOTS as the foundation of an article about DX11/12 scaling. Seems validity is a matter of suiting one's needs.

Every benchmark is a tool, and every tool may have specific uses.

In this case, AOTS is the only current DX11/12 benchmark that really supports a high amount of threads, and thus was the proper tool for this article.
 
Could've thrown some FX CPUs in the mix, too bad.
I know why: your phone was busy when Kyle called you to run those FX benchies, too bad.
No?
How about: you missed the thread where Kyle asked everyone here what additional tests needed to be run beyond the 300 or so he already did, too bad.
 
I think the higher latency and lower bandwidth of the DDR4 memory makes a difference (and probably the cache latency too).
In the case of DDR4-2666 vs DDR3-2133, the DDR3 bandwidth can be 5-10% higher, depending on the timings.
Also, it seems that the DDR4 controller is not fully used with 2 sticks... using 4 sticks reveals the full power of the Haswell-E memory controller, especially in highly threaded applications such as 7-Zip.
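
For rough context, theoretical peak bandwidth is just transfer rate x 8 bytes per transfer x populated channels; measured numbers then move around with timings, rank count and access pattern, which is where swings like that 5-10% come from. A quick back-of-the-envelope comparison:

```latex
% Theoretical peak bandwidth = transfer rate x 8 bytes/transfer x populated channels
\begin{align*}
\text{DDR3-2133, dual channel:}            \quad & 2133 \times 8 \times 2 \approx 34.1\ \text{GB/s} \\
\text{DDR4-2666, 2 sticks (dual channel):} \quad & 2666 \times 8 \times 2 \approx 42.7\ \text{GB/s} \\
\text{DDR4-2666, 4 sticks (quad channel):} \quad & 2666 \times 8 \times 4 \approx 85.3\ \text{GB/s}
\end{align*}
```

Which is also why populating all four channels matters so much on Haswell-E.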

I can't post links because I had to recreate my account: you can check at hardware [dot] fr, they are quite good at testing and dissecting the performance differences between products... you'll have to use Google Translate, but I'm sure you guys can read tables :p
 
GPU-limited across the board; at least it means we don't all need to run out and buy new CPUs for DirectX 12. Although honestly, DirectX 12 is going to make games more GPU-bound unless developers decide to up the physics or AI to take advantage of the CPU time that's freed up from not having to handle the complex DirectX 11 API.

Just because it turned out the way you'd expect doesn't mean it wasn't worth doing, of course.
 
1. I missed the FX test, thanks for the link.
2. I meant on the same graphs with the same setup, to get a feel for Intel vs. FX.
 
I would recommend degrading or slowing down the i7-4770K's memory to see if the data better lines up with the 5960X, and see how changes in memory bandwidth affect the outcome. The more cores you have active, the harder it hits memory and caches, plus the game may not use more than a certain thread count. I would like to have seen CPU usage compared between the two; that may have shown the i7-4770K's CPU % usage higher.

I am impressed by, and more interested in, why AMD sees such a great improvement with DX12 while NVIDIA does not. Too bad we don't have more games to sample that use DX12 correctly.
 
I would recommend degrading or slowing down the i7-4770K's memory to see if the data better lines up with the 5960X, and see how changes in memory bandwidth affect the outcome. The more cores you have active, the harder it hits memory and caches, plus the game may not use more than a certain thread count. I would like to have seen CPU usage compared between the two; that may have shown the i7-4770K's CPU % usage higher.

I am impressed by, and more interested in, why AMD sees such a great improvement with DX12 while NVIDIA does not. Too bad we don't have more games to sample that use DX12 correctly.

If I remember correctly, the difference between DX11 and DX12 is mostly related to driver efficiency, so the multithreaded support shows much larger gains. The Mantle API showed similar gains as well.
 
My first thought would be DDR3 vs DDR4 timing related as well, as brickwall mentioned.

Second thought would be that maybe your i7-5960X isn't actually fully stable at the voltages you are giving it. There is a threshold on Intel CPUs where they can be weird and pretend they are 100% stable when OC'ed, yet exhibit lower than expected performance (constant error correction?) since they aren't actually getting enough voltage. I usually use Linpack to check for this issue. Try running LinX (AVX) on your i7-5960X (HT disabled) and see if the GFLOPS numbers improve at all as you increase your CPU/MB-related voltages.
 
A very interesting article.

I'm wondering if Ashes is not the best test bench - or even a good test bench - for DX12, because it uses the additional cores not only for DX12 but also for AI?
 
Now, what we have come to believe from this, our third article focusing on Ashes of the Singularity (first here, second here), is that the more cores we had at the game's disposal, the higher the framerates we would see. That does not ring true above, in what is probably the closest to a "real world" comparison for a benchmark. In three scenarios, the 4770K provided a slightly higher framerate than did our 5960X processor. The delta in FPS is very small, but the fact is that the data is repeatable, and quite frankly we don't know exactly why.

This is expected. Just because you use more cores doesn't mean you'll get a higher FPS. If no individual CPU core is bottlenecked, then CPU performance is driven by IPC/Clock.

Basically, two cores each doing 20% work results in the same performance as a single core doing 40% work. Both do the same amount of processing over the same period of time, so you wouldn't expect much of a difference. From a software perspective, the fewer-core chip would be expected to be ever so slightly faster, due to minute savings within the OS scheduler, which might be why the 4770K beats the 5960X at the same clocks when GPU limited.
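
A trivial back-of-the-envelope model of that argument (the numbers are made up, purely to illustrate the "slowest stage wins" relationship):

```cpp
#include <algorithm>
#include <cstdio>
#include <initializer_list>

int main() {
    const double gpu_ms      = 16.0;  // hypothetical GPU cost per frame (~60 FPS cap)
    const double cpu_work_ms = 6.0;   // hypothetical total CPU work per frame

    for (int cores : {1, 2, 4, 8}) {
        double cpu_ms   = cpu_work_ms / cores;       // CPU work split evenly, no other bottleneck
        double frame_ms = std::max(gpu_ms, cpu_ms);  // the slowest stage sets the frame time
        std::printf("%d core(s): CPU %.2f ms/frame -> frame %.2f ms (%.1f FPS)\n",
                    cores, cpu_ms, frame_ms, 1000.0 / frame_ms);
    }
    // Once cpu_ms < gpu_ms, adding cores changes nothing: the GPU is the limit.
    return 0;
}
```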

To me, the numbers are showing EXACTLY what I predicted DX12 would do: lower-class processors see performance gains due to better core loading from not having a massive driver thread, while other CPUs show almost no improvement as they weren't bottlenecked to start with. AMD GPUs are seeing an improvement compared to NVIDIA due to having an actual async compute engine and the fact their DX11 code path isn't optimal to begin with.
 
So....when are all the new DX12 games going to appear? Or am I going to have to wait till 2018 and play them on my DX13-capable card instead?
 
I would recommend degrading or slowing down the i7-4770K's memory to see if the data better lines up with the 5960X, and see how changes in memory bandwidth affect the outcome. The more cores you have active, the harder it hits memory and caches, plus the game may not use more than a certain thread count. I would like to have seen CPU usage compared between the two; that may have shown the i7-4770K's CPU % usage higher.

I am impressed by, and more interested in, why AMD sees such a great improvement with DX12 while NVIDIA does not. Too bad we don't have more games to sample that use DX12 correctly.
AotS takes advantage of AMD's async shaders, which are built into GCN and exposed by their drivers. NVIDIA currently has no similar framework in place. As far as the benchmark goes, the framerate numbers between DX11 and DX12 on Pascal are statistically the same, and still higher than what current AMD cards are putting out.
So....when are all the new DX12 games going to appear? Or am I going to have to wait till 2018 and play them on my DX13-capable card instead?
More DX12 games are coming in the near future, at least from Microsoft, as they start bringing most, if not all, first-party Xbox One games to Windows 10.
 
So....when are all the new DX12 games going to appear? Or am I going to have to wait till 2018 and play them on my DX13-capable card instead?

Looks as if we're only getting DX12 patches for current titles. However, Doom is supposed to have a Vulkan patch; from what I have read, it's a good increase over OpenGL.
Devs might have been waiting for the Xbone to get the Win10 update, which I think has DX12. I'm sure they have been tinkering with it for a while.
 
Kyle- Just for the heck of it, try turning Hyper-Threading off on the Haswell-E chip and run the benchmark again. I have the same processor and sometimes run into goofy issues in games when HT is turned on. Supreme Commander suffers from this; wondering if AotS may be affected by this as well.
I do not have that exact system here anymore to use; like I said, it has been a while since I pulled that data. However, I do have a Broadwell-E system on the bench right now and will run some non-HT numbers and see if I get any performance deltas.
 
I think you are going to need to use Intel's PresentMon just to validate that the "FRAPS"-type performance is behaving; they do not necessarily show the same context of data.
Those that have used PresentMon do not get the same type of behaviour as analysing with the internal benchmark, primarily because one looks at preparing the render/data from an engine perspective, while the other sits outside of that, from the OS to the driver.

I am not saying the data and benchmark from the AotS internal benchmark is wrong, just that it really needs another context such as that provided by PresentMon.
Cheers
 
Kyle- Just for the heck of it, try turning Hyper-Threading off on the Haswell-E chip and run the benchmark again. I have the same processor and sometimes run into goofy issues in games when HT is turned on. Supreme Commander suffers from this; wondering if AotS may be affected by this as well.
Nope, not it at all. Just four rounds of tests with a Titan, but perf was down across the board with HT off.
 
I think you are going to need to use Intel's PresentMon just to validate that the "FRAPS"-type performance is behaving; they do not necessarily show the same context of data.
Those that have used PresentMon do not get the same type of behaviour as analysing with the internal benchmark, primarily because one looks at preparing the render/data from an engine perspective, while the other sits outside of that, from the OS to the driver.

I am not saying the data and benchmark from the AotS internal benchmark is wrong, just that it really needs another context such as that provided by PresentMon.
Cheers
So PresentMon is a mess to use, and the verdict is still out on whether or not it is accurate. I am very aware of it. I am trying to find someone at Intel now to look into its Graphics Analyzer, but have not found the right people yet. We are talking to NVIDIA on this as well, and they are telling me that they may have a solution, but it would be NV-only.
 
It is worth noting here that RAM footprint (8GB vs 16GB) can make up to a 10% difference in AotS benchmark scores. Just a note if you are comparing scores yourself.
 
This is good information.

One thing I can't help but wonder, though: wasn't multithreaded rendering supposed to be introduced in DX11?

I vaguely remember it being part of the DX11 spec, but at first neither AMD nor Nvidia had enabled it in their drivers, and when they did, many games got a nice bump.

Am I misremembering things, or does DX12 just improve this even more?
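
For what it's worth, the DX11 mechanism being remembered here is deferred contexts: worker threads record command lists that the immediate context later replays. A bare-bones sketch of the API shape (error handling mostly omitted); how much it helped depended on whether the driver supported command lists natively, which is presumably the driver enablement being remembered:

```cpp
#include <windows.h>
#include <d3d11.h>
#pragma comment(lib, "d3d11.lib")

int main() {
    ID3D11Device* device = nullptr;
    ID3D11DeviceContext* immediate = nullptr;
    D3D_FEATURE_LEVEL fl;
    if (FAILED(D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                                 nullptr, 0, D3D11_SDK_VERSION,
                                 &device, &fl, &immediate)))
        return 1;

    // A worker thread would own its own deferred context...
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // ...record state changes and draw calls into it, then close it into a command list...
    ID3D11CommandList* cmdList = nullptr;
    deferred->FinishCommandList(FALSE, &cmdList);

    // ...which the render thread submits on the immediate context.
    immediate->ExecuteCommandList(cmdList, TRUE);

    cmdList->Release(); deferred->Release();
    immediate->Release(); device->Release();
    return 0;
}
```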
 
Yeah, it has a lot of data and intervals, which can make it overly sensitive.
Did you ever speak to HardwareCanucks? They found a way to use it themselves that they felt made sense; I appreciate this is only their own view and may not fit with what you want.
PresentMon will be accurate, as it is based upon ETW, and Oxide recommended this type of solution over FCAT, using ETW to show the limitations of FCAT for AotS with AMD hardware (to do with DWM compositing).
Probably the most helpful person, if he has time, is Andrew Lauritzen at Intel, as he was the one who highlighted PresentMon for use.

Cheers

Yes, we are familiar with it. This is what raw PresentMon data looks like. We are trying to figure out how to clean up some of the noise. But it looks like we will be going this way. It would be nice to have a better tool that we were 100% confident in.

Application ProcessID SwapChainAddress Runtime SyncInterval AllowsTearing PresentFlags PresentMode Dropped TimeInSeconds MsBetweenPresents MsBetweenDisplayChange MsInPresentAPI MsUntilRenderComplete MsUntilDisplayed
ROTTR.exe 6980 0x00000000091FF220 DXGI 0 0 0 Composed: Flip 1 30.682723 1502.308 0 1.673 4.073 0
ROTTR.exe 6980 0x00000000091FF220 DXGI 0 1 0 Hardware: Independent Flip 0 30.689967 7.244 1489.204 0.281 0.261 0.261
ROTTR.exe 6980 0x00000000091FF220 DXGI 0 1 0 Hardware: Independent Flip 0 30.708284 18.317 31.6 13.58 13.544 13.544
ROTTR.exe 6980 0x00000000091FF220 DXGI 0 1 0 Hardware: Independent Flip 0 30.722375 14.092 1.126 0.189 0.578 0.578
ROTTR.exe 6980 0x00000000091FF220 DXGI 0 1 0 Hardware: Independent Flip 0 30.722864 0.489 0.564 0.159 0.653 0.653
ROTTR.exe 6980 0x00000000091FF220 DXGI 0 1 0 Hardware: Independent Flip 0 30.723624 0.76 0.622 0.147 0.515 0.515
ROTTR.exe 6980 0x00000000091FF220 DXGI 0 1 0 Hardware: Independent Flip 0 30.724042 0.418 0.671 0.414 0.768 0.768
ROTTR.exe 6980 0x00000000091FF220 DXGI 0 1 0 Hardware: Independent Flip 0 30.724711 0.669 0.581 0.159 0.68 0.68
ROTTR.exe 6980 0x00000000091FF220 DXGI 0 1 0 Hardware: Independent Flip 0 30.725117 0.406 0.617 0.43 0.891 0.891

Your "MsBetweenPresents" is your framerate. Again though, we are not 100% confident in what we are being shown, and that is the reason we have not used it publicly yet. That said, it looks like it is going to be the go-to most likely out of what tools we have evaluated.
 
Any information on the framerate distribution (variability, minimums, that sort of thing)?
 
Makes sense now why NVIDIA did software async instead of hardware async; DX12 isn't proving to be the game changer it was touted to be.
 
Makes sense now why NVIDIA did software async instead of hardware async; DX12 isn't proving to be the game changer it was touted to be.
I don't think it matters that much. Async compute in DX12 allows multiple CPU cores to submit work independently instead of forcing a synchronous, fixed relationship. How the GPU handles that workload, serially or in parallel, is up to the design.

NVIDIA's faster clock speed lets it switch between graphics routines and compute routines faster. NVIDIA can do multiple compute operations at once (asynchronously) and then switch back to graphics. AMD can do both at the same time, but the more compute operations you run in parallel, the fewer stream processors are left for graphics, fewer cache hits, etc.

Who will do it better? I think it will depend on what is being processed. In other words, async compute is part of DX12 to allow the use of multiple CPU cores, but the GPU can handle it serially or in parallel as needed. With NVIDIA you have all the SPs dedicated either to graphics or to multiple compute operations, which can be really efficient with the cache but involves some context switching (I believe NVIDIA's design is really quick at this), while AMD runs graphics and compute in parallel, which can have its own contentions with cache hits, memory access waits, etc.
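
To make the API side of that concrete, here is roughly what async compute looks like from the application's point of view in D3D12. This is a bare-bones sketch, not production code: the app just creates a second, compute-only queue next to the normal direct (graphics) queue and expresses ordering with fences; whether the two queues actually overlap on the GPU is the hardware/driver's call, which is exactly the point above.

```cpp
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d12.lib")
using Microsoft::WRL::ComPtr;

int main() {
    ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device))))
        return 1;

    // Direct queue: accepts graphics, compute and copy command lists.
    D3D12_COMMAND_QUEUE_DESC directDesc = {};
    directDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> directQueue;
    device->CreateCommandQueue(&directDesc, IID_PPV_ARGS(&directQueue));

    // Compute queue: compute/copy only; work submitted here is a candidate
    // for overlapping with graphics, if the GPU chooses to run it that way.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // Cross-queue ordering is expressed explicitly with fences.
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    computeQueue->Signal(fence.Get(), 1);   // marks the compute work as done
    directQueue->Wait(fence.Get(), 1);      // graphics waits only where it must
    return 0;
}
```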

So who will win? One advantage AMD really has is that developers target consoles for development, which of course use AMD designs. NVIDIA has great developer relations, which can help on the PC side. Looking at Rise of the Tomb Raider, NVIDIA outdid AMD in this game even with all of the advantages AMD had with the developer.
 
So who will win? One advantage AMD really has is that developers target consoles for development, which of course use AMD designs. NVIDIA has great developer relations, which can help on the PC side. Looking at Rise of the Tomb Raider, NVIDIA outdid AMD in this game even with all of the advantages AMD had with the developer.

You are right. DX12 can be programmed close to the metal, but close to the metal based on what hardware, AMD or NVIDIA? That depends on who supports (pays) the developer. I think in the future we will see GameWorks titles perform relatively better on NVIDIA hardware, and vice versa.

In my opinion, low-level APIs like DX12 are really bad for the PC gaming industry, because developers have to optimize their game for either NVIDIA's or AMD's latest architecture. For example, if a game is optimized for the Fiji architecture, then all NVIDIA GPUs, AMD GPUs from older architectures, and even future AMD architectures that are fundamentally different from Fiji can't get good performance under that game engine. You can see in this article that AMD's older HD 7970 can't gain performance under DX12 compared to DX11 in GPU-bound scenarios and is almost on par with the GTX 670 in DX12; considering the GTX 670 is usually much slower than the 7970, this is not a good result for the 7970 at all.
 
I keep seeing these AotS benchmarks for many months now, but did anyone actually play it? Is it any good, or just some tech demo?
 
You are right. DX12 can be programmed close to the metal, but close to the metal based on what hardware, AMD or NVIDIA? That depends on who supports (pays) the developer. I think in the future we will see GameWorks titles perform relatively better on NVIDIA hardware, and vice versa.

In my opinion, low-level APIs like DX12 are really bad for the PC gaming industry, because developers have to optimize their game for either NVIDIA's or AMD's latest architecture. For example, if a game is optimized for the Fiji architecture, then all NVIDIA GPUs, AMD GPUs from older architectures, and even future AMD architectures that are fundamentally different from Fiji can't get good performance under that game engine. You can see in this article that AMD's older HD 7970 can't gain performance under DX12 compared to DX11 in GPU-bound scenarios and is almost on par with the GTX 670 in DX12; considering the GTX 670 is usually much slower than the 7970, this is not a good result for the 7970 at all.
This is a rather confused post.

DX12 is another programming option for compliant hardware.
If a developer doesn't want or need to extract more performance/granularity, they can use DX11.
The reason why older hardware will struggle to make the best use of some features is the way it's always been.
Newer hardware is faster and more featured. Things need to move forward, otherwise the games industry will stagnate.

DX12 is the opposite of bad; it allows devs to get the most from the hardware, be it NVIDIA or AMD.
It doesn't have to be an either/or; it can be for both AMD and NVIDIA. There will be situations where one gets better treatment, but nothing has changed in that regard, there will always be some sponsored titles. That's nothing to do with DX12.
It's not definitively down to whoever pays them, it's down to what the devs want to do and the standards they want to achieve.

Once a programming house has developed DX12 code, they can use it again.
Game engines will incorporate DX12 in their routines, covering a lot of the groundwork.
It won't always be a hard slog.

The downside to DX12 is that it is restricted to one OS.
Vulkan doesn't have this restriction and has similar programming methods.
 
This is a rather confused post.

DX12 is another programming option for compliant hardware.
If a developer doesn't want or need to extract more performance/granularity, they can use DX11.
The reason why older hardware will struggle to make the best use of some features is the way it's always been.
Newer hardware is faster and more featured. Things need to move forward, otherwise the games industry will stagnate.

DX12 is the opposite of bad; it allows devs to get the most from the hardware, be it NVIDIA or AMD.
It doesn't have to be an either/or; it can be for both AMD and NVIDIA. There will be situations where one gets better treatment, but nothing has changed in that regard, there will always be some sponsored titles. That's nothing to do with DX12.
It's not definitively down to whoever pays them, it's down to what the devs want to do and the standards they want to achieve.

Once a programming house has developed DX12 code, they can use it again.
Game engines will incorporate DX12 in their routines, covering a lot of the groundwork.
It won't always be a hard slog.

The downside to DX12 is that it is restricted to one OS.
Vulkan doesn't have this restriction and has similar programming methods.

The problem is that even if it were possible to optimize for both NVIDIA and AMD, developers won't put the time and money into optimizing for all available architectures. In addition, unlike DX11, optimization in DX12 is mostly on the developers; once developers have optimized their game for a specific architecture, there will be nothing AMD or NVIDIA can do to optimize their drivers for that game, so older and even newer architectures can't get performance optimizations. I believe low-level APIs are a step in the wrong direction, and time will show that.
 
Every benchmark is a tool, and every tool may have specific uses.

In this case, AOTS is the only current DX11/12 benchmark that really supports a high amount of threads, and thus was the proper tool for this article.


I get that. In fact, I'm one of the "no ones" that plays the game. My point was a little more abstract ;)
 
The problem is that even if it were possible to optimize for both NVIDIA and AMD, developers won't put the time and money into optimizing for all available architectures. In addition, unlike DX11, optimization in DX12 is mostly on the developers; once developers have optimized their game for a specific architecture, there will be nothing AMD or NVIDIA can do to optimize their drivers for that game, so older and even newer architectures can't get performance optimizations. I believe low-level APIs are a step in the wrong direction, and time will show that.

It's not a matter of "even if it would be possible"; it is possible and it will happen.

No different than normal when a new DX is launched.
It takes time to develop libraries and for better optimisation paths to be found.
There's nothing stopping developers from using DX11.

If a game dev completely gimps a game on one brand of card, that game isn't going to sell very well.
So it's not going to be that extreme unless due to a lack of competence.
i.e. the difference in performance running GameWorks isn't very big at all, and you can disable the toughest features if you have a slow card.

This isn't the panic you were looking for.
 
The problem is that even if it were possible to optimize for both NVIDIA and AMD, developers won't put the time and money into optimizing for all available architectures. In addition, unlike DX11, optimization in DX12 is mostly on the developers; once developers have optimized their game for a specific architecture, there will be nothing AMD or NVIDIA can do to optimize their drivers for that game, so older and even newer architectures can't get performance optimizations. I believe low-level APIs are a step in the wrong direction, and time will show that.

We already had low-level APIs years ago, which resulted in API wars and Microsoft coming in with DX as the saviour that killed them.
 
Interesting benches.
It seems that lower-end hardware will benefit a lot from DX12.
No wonder a lot of my friends who run on older hardware like Win 10 better than Win 8 (although most people still prefer Win 7, lol).
 