DX11 vs DX12 Intel CPU Scaling and Gaming Framerate @ [H]

FrgMstr

Just Plain Mean
Staff member
Joined
May 18, 1997
Messages
55,532
DX11 vs DX12 Intel CPU Scaling and Gaming Framerate - Last week we set our sights on seeing how well the new DX12 API is able to distribute workloads across multiple CPU cores on a new AMD processor. Today we are going to continue, except this time we will be using an Intel Haswell-E processor that has a lot of cores available for DX12 usage. A couple of new GPUs are in the mix as well.
 
Isn't NVIDIA "fundamentally" suffering from the same issue that Intel had with NetBurst at this point?
 
So if you're running a crap CPU with a decent GPU, DX12 will give you a performance boost over DX11. Otherwise there isn't a big difference at this point.
 
Thank you! Cool comparison. I have done some tests myself, and it looks like DX12 likes high memory speeds and tight timings. The same holds true for Mantle in BF4. It is not really just the highest frame rate; your lowest frames also make a difference to the overall scores.
 
You could rephrase it as: the CPU drives the video card and is no longer the performance bottleneck it is under DX11. This leaves a lot more room for developers trying to push graphics...
 
You could rephrase it as: the CPU drives the video card and is no longer the performance bottleneck it is under DX11. This leaves a lot more room for developers trying to push graphics...
Yeah, makes you wish they would one day utilize 75% of each core :)
 
Pretty awesome article, and it's interesting how the NVIDIA card responded much better to DX12 in your system. Should be fun figuring out and learning why :)
 
Interesting article, and nice to see the AMD/NVIDIA numbers shown within the same graph. Other sites make it look like there's some massive performance gain for one vs. the other, but in reality they perform almost identically in DX12 and are far apart in DX11. Thanks for the work. I agree it might be interesting to see how much of an effect RAM performance has on the caching system AMD has in place with Fury, since it seems more and more game developers are using it.
 
Interesting results - I'm really excited to see what DX12 brings and how it changes both games and hardware requirements moving forward.
 
Looking forward to seeing some testing on non-AMD Gaming Evolved titles that feature DirectX 12 so that we can rule out some of those vendor specific optimizations. Too bad Rise of the Tomb Raider had such poor DirectX 12 implementation, because that may have offered an alternate viewpoint. Without that balance, it's hard to draw too many valid conclusions from all the pro-AMD titles we have available at this point.
 
Looking forward to seeing some testing on non-AMD Gaming Evolved titles that feature DirectX 12 so that we can rule out some of those vendor specific optimizations. Too bad Rise of the Tomb Raider had such poor DirectX 12 implementation, because that may have offered an alternate viewpoint. Without that balance, it's hard to draw too many valid conclusions from all the pro-AMD titles we have available at this point.
It's going to be a lot more of the same as DX11 games: red wins some, green wins some, and anyone thinking otherwise is dreaming. We can say AMD should, as a whole, improve with DX12, helping to equal things out versus what we had before. We should see the biggest benefits on laptops, which typically have weaker CPUs, for both camps. I personally don't think it was ever meant to help the super high-end systems much.
 
I have to admit that's some great scaling; both AMD and Intel have to be happy about those low-MHz scaling graphs. Bring those temps down!

Great article. I appreciate your disclaimer about the different testing environments and such; it lends a lot of credibility to the test. Plus it prevents people from coming in and starting another BS asynchronous-compute war.
 
It's going to be a lot more of the same as DX11 games: red wins some, green wins some, and anyone thinking otherwise is dreaming. We can say AMD should, as a whole, improve with DX12, helping to equal things out versus what we had before. We should see the biggest benefits on laptops, which typically have weaker CPUs, for both camps. I personally don't think it was ever meant to help the super high-end systems much.

To be honest, the only loser here is Intel when it comes to the super high end. If the numbers are consistent, it'll extend the lifespan of CPUs and thus reduce the need for gamers to constantly upgrade them. Obviously there will be other demands for higher-end CPUs, but that's still a pretty big hit to sales.

Only time will tell, though, so it'll be an interesting next couple of years to see how far developers push the limits of DX12.
 
To be honest, the only loser here is Intel when it comes to the super high end. If the numbers are consistent, it'll extend the lifespan of CPUs and thus reduce the need for gamers to constantly upgrade them. Obviously there will be other demands for higher-end CPUs, but that's still a pretty big hit to sales.

I can see that happening, and I can also see Intel opening up another market for ultra-small PCs with smaller (lower-TDP) CPUs for gaming.

While those are already a big market (ITX), this could make Intel a bunch more money if they can sell their cheaper, higher-margin CPUs.

Personally I love the big-die CPUs; I'll always stay with the big CPUs.
 
 
I found myself disappointed in the article for two reasons: you didn't test at 4K, and you chose to test the CPU limiting by only increasing the speed and not increasing the number of cores. It would have been useful to see Ashes tested on an 18 core, 36 thread Xeon.
 
I found myself disappointed in the article for two reasons: you didn't test at 4K, and you chose to test the CPU limiting by only increasing the speed and not increasing the number of cores. It would have been useful to see Ashes tested on an 18 core, 36 thread Xeon.
It wasn't about 4K; it was about GPU-limited vs. CPU-limited, and 1440p did that, so there was no need to do 4K. (And I think he mentioned not having a 4K monitor, but I can't be sure I don't have that mixed up.)
 
Tin foil hat time. I'm sure M$ would focus on a performance boost for hardware similar to an Xbox One (weaker CPU than the average PC).
 
It also points to the fact that
I found myself disappointed in the article for two reasons: you didn't test at 4K, and you chose to test the CPU limiting by only increasing the speed and not increasing the number of cores. It would have been useful to see Ashes tested on an 18 core, 36 thread Xeon.
Quarrz, I have tested this. A 4-core 6700K needs to run at 4.5 GHz to match a 5930K at its default speed.
My old 990X on an X58 mobo still kicks ass in Crysis 3 or anything that is well optimized for more than 4 cores.
4K would have been nice, especially on two AMD Fury X cards.
 
I'm excited to see more testing and some actual games! I feel we are on the cusp of some big changes in our games, and I'm so glad to be along for the ride.
Thanks for the article; keep 'em coming.
 
As AMD seems to benefit the most from CPU clock increases, this suggests the drivers do a considerable amount of prearranging of data structures before they are sent to the GPU. Under DX11 this was a single-core task. Under DX12 it is spread across many cores, lowering processing overhead and allowing the GPU core to handle the commands efficiently.

NVIDIA seems to have taken the opposite approach, where a good amount of the work stays on the GPU itself. As other developers have pointed out, the NVIDIA core is very fast and efficient even with a long pipeline; however, it dies a painful death when asked to do things like context switching (async ops, etc.) and pipeline rollback. It would make sense that NVIDIA does all this data structure arrangement in silicon instead of on the CPU. I could see how "Arrange X, Y, Z for a 3 dot color product and... HOLD ON, we just got a compute command... kill all the data structures, kill the pipeline, pop values onto a stack, and reset the data registers to handle the async structures" could cause issues if the pipeline is that complex. Putting these features into software to be handled by the CPU makes more sense because you can dynamically restructure how commands are processed preemptively. HOWEVER, you are then limited by CPU resources and how those things are handled. You can fine-tune adjustments on the CPU side; with GPU firmware, things become harder because you have a much more fixed-function pipeline.

So I take NVIDIA to be akin to a GT500 Mustang: great in a drag race, but you wouldn't want it on a race course where you need a stiff frame, great balance, and independent rear suspension. And so far most games have been a drag race rather than a race course.
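For readers who want a concrete picture of the multi-threaded submission described above, here is a minimal C++/Direct3D 12 sketch (not the article's code, and not a complete renderer): each worker thread records its own command list through its own allocator, and the main thread submits everything in one ExecuteCommandLists call. The RecordScenePortion helper is a hypothetical placeholder; device/queue creation, fence synchronization, and the actual draw recording are omitted.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>

using Microsoft::WRL::ComPtr;

// Hypothetical placeholder: real code would set the PSO/root signature and
// record the draw calls for one slice of the scene here.
static void RecordScenePortion(ID3D12GraphicsCommandList* /*cl*/, unsigned /*slice*/)
{
}

void SubmitFrameMultithreaded(ID3D12Device* device,
                              ID3D12CommandQueue* queue,
                              unsigned workerCount)
{
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(workerCount);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(workerCount);

    // One allocator and one command list per worker: a command list may only be
    // recorded from one thread at a time, so each thread gets its own.
    for (unsigned i = 0; i < workerCount; ++i) {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), nullptr,
                                  IID_PPV_ARGS(&lists[i]));
    }

    // Record in parallel: this is the per-draw CPU work a DX11 driver had to
    // funnel through a single thread.
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < workerCount; ++i) {
        workers.emplace_back([&, i] {
            RecordScenePortion(lists[i].Get(), i);
            lists[i]->Close();
        });
    }
    for (std::thread& t : workers) t.join();

    // Single submission of all recorded lists from the main thread.
    std::vector<ID3D12CommandList*> raw;
    for (auto& l : lists) raw.push_back(l.Get());
    queue->ExecuteCommandLists(static_cast<UINT>(raw.size()), raw.data());
    // Fence signal/wait before reusing the allocators is omitted for brevity.
}
```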
 
I found myself disappointed in the article for two reasons: you didn't test at 4K, and you chose to test the CPU limiting by only increasing the speed and not increasing the number of cores. It would have been useful to see Ashes tested on an 18 core, 36 thread Xeon.
The game doesn't scale well past 8C/16T, so the max would be a 5960X, and 4K would show little gain.

As AMD seems to benefit the most from CPU clock increases, this suggests the drivers do a considerable amount of prearranging of data structures before they are sent to the GPU. Under DX11 this was a single-core task. Under DX12 it is spread across many cores, lowering processing overhead and allowing the GPU core to handle the commands efficiently.

NVIDIA seems to have taken the opposite approach, where a good amount of the work stays on the GPU itself. As other developers have pointed out, the NVIDIA core is very fast and efficient even with a long pipeline; however, it dies a painful death when asked to do things like context switching (async ops, etc.) and pipeline rollback. It would make sense that NVIDIA does all this data structure arrangement in silicon instead of on the CPU. I could see how "Arrange X, Y, Z for a 3 dot color product and... HOLD ON, we just got a compute command... kill all the data structures, kill the pipeline, pop values onto a stack, and reset the data registers to handle the async structures" could cause issues if the pipeline is that complex. Putting these features into software to be handled by the CPU makes more sense because you can dynamically restructure how commands are processed preemptively. HOWEVER, you are then limited by CPU resources and how those things are handled. You can fine-tune adjustments on the CPU side; with GPU firmware, things become harder because you have a much more fixed-function pipeline.

DX12 already isn't CPU-limited the way DirectX used to be (except for obviously CPU-bound titles like RTS games), so something else is limiting their performance, like you said, in how their architecture handles the game pipeline. But can it be solved without meeting with the developer to improve their DX12 path, instead of making them turn off the features that don't work as NVIDIA intended?
 
Awesome article; some really interesting insight gained from the low core speed. I wonder what the first game to show linear gains from DX12 will be, where we can clearly extract meaning from hardware changes. When is BF5 supposed to come along?
 
I actually would like to see memory speed looked at again. Every time before, it didn't seem to be a huge factor, but with Fury X caching and DX12/Win10 WDDM 2.0 it might have more of an impact than we've seen before.
My Core i7-990X with two 7970s does not allocate as much memory in Mantle BF4. The 5930K with a Fury X allocates about a gig more, yet the card allocates less VRAM than the two 7970s.
I will take a wild guess that the same thing happens in DX12. This makes a lot of sense and explains why both benefit from faster memory settings. The difference is huge between running 2800 at 15-15-15-35 and 2133 at 13-13-13-32.
I also think that anything past 2666 does not net any gains, simply because one ends up running looser timings.
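As a rough back-of-the-envelope check on the frequency-versus-timings tradeoff mentioned above (my own arithmetic, not from the article): first-word CAS latency is roughly the CAS cycle count divided by half the transfer rate, so 2800 CL15 actually ends up with slightly lower latency than 2133 CL13 while also offering more bandwidth.

```cpp
// Rough first-word latency comparison for the two memory configurations above.
// This is back-of-the-envelope math, not a measurement from the article.
#include <cstdio>

int main() {
    struct Config { const char* name; double mtps; double cas; };
    const Config configs[] = {
        { "2800 MT/s CL15", 2800.0, 15.0 },
        { "2133 MT/s CL13", 2133.0, 13.0 },
    };
    for (const Config& c : configs) {
        double clockMHz      = c.mtps / 2.0;            // DDR: two transfers per clock
        double latencyNs     = c.cas / clockMHz * 1000.0;
        double bandwidthGBps = c.mtps * 8.0 / 1000.0;   // 64-bit channel = 8 bytes/transfer
        std::printf("%s: ~%.1f ns CAS latency, ~%.1f GB/s per channel\n",
                    c.name, latencyNs, bandwidthGBps);
    }
    return 0;   // prints ~10.7 ns / 22.4 GB/s vs ~12.2 ns / 17.1 GB/s
}
```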
 
Why does the article say you don't have a way of collecting real-world gameplay framerate data in DX12 yet?

Let me introduce you to PresentMon
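For anyone wanting to turn a PresentMon capture into the usual average and 1%-low numbers, here is a small C++ sketch. It assumes the CSV header contains a "MsBetweenPresents" column, as in the PresentMon builds I have seen; check your own file's header, and note that this ignores dropped-present handling and other columns.

```cpp
// Quick sketch: summarize a PresentMon CSV into average and approximate 1%-low FPS.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: summarize <presentmon.csv>\n"; return 1; }
    std::ifstream in(argv[1]);
    std::string line;

    // Locate the frame-time column from the header row.
    std::getline(in, line);
    std::vector<std::string> headers;
    std::stringstream hs(line);
    for (std::string col; std::getline(hs, col, ',');) headers.push_back(col);
    auto it = std::find(headers.begin(), headers.end(), "MsBetweenPresents");
    if (it == headers.end()) { std::cerr << "MsBetweenPresents column not found\n"; return 1; }
    size_t colIdx = static_cast<size_t>(it - headers.begin());

    // Collect per-frame times in milliseconds.
    std::vector<double> frameMs;
    while (std::getline(in, line)) {
        std::stringstream ls(line);
        std::string field;
        for (size_t i = 0; std::getline(ls, field, ','); ++i)
            if (i == colIdx) { frameMs.push_back(std::stod(field)); break; }
    }
    if (frameMs.empty()) { std::cerr << "no frames found\n"; return 1; }

    double total = 0.0;
    for (double ms : frameMs) total += ms;
    std::sort(frameMs.begin(), frameMs.end());                 // ascending frame times
    double slow1pct = frameMs[frameMs.size() * 99 / 100];      // ~99th-percentile frame time

    std::cout << "Frames: " << frameMs.size() << "\n"
              << "Avg FPS: " << 1000.0 * frameMs.size() / total << "\n"
              << "1% low FPS (approx): " << 1000.0 / slow1pct << "\n";
    return 0;
}
```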
 
More consumers with a PC or laptop capable of playing games creates a larger market for MS to sell software to through their Windows 10 Store.

And sadly you would have to email Microsoft about this ;).
It would also allow external video cards to become a thing; once you have that, you could do serious gaming on a very light laptop.
 
And sadly you would have to email Microsoft about this ;).
It would also allow external video cards to become a thing; once you have that, you could do serious gaming on a very light laptop.

There's more to it than just games. 4K hardware-accelerated video playback requires a nice video card. Those 16K VR games that AMD was talking about? Try playing this 8K video on your current video card. Enable Stats for Nerds in YouTube and see how many frames you drop. Could you imagine the popularity of 16K native pron? Whoever does the first 16K VR presentation would instantly win the contract they were competing for.

If Intel and AMD could sell 8K and 16K hardware-accelerated video playback on their new CPUs, then there would be a market for an upgrade, especially in a laptop. Create a market by sending Nature, National Geographic, and a few other shows free cameras with the stipulation that they upload the videos to YouTube in 8K or 16K. I bet we'd need new video editing software like Sony Vegas, bigger hard drives to archive the footage, bigger SSDs, more and faster physical RAM for buffering, a new motherboard with VR concerns addressed by the addition of PCIe 4.0 to move video data faster with lower latency, a file server upgrade in the living room, better wireless and wired networks to stream the video to our TVs, etc. 16K 360° video and audio would be a BIG seller for home theater nerds.

These companies are big, slow-moving slugs that can't see the forest for the trees.
 
The odd thing is that they're already finding, when comparing a 4K TV with a 1080p TV displaying the same content (scaled down from 4K), that viewers can hardly see the difference.
That video stutters like crazy at 8K; 4K plays fine. I have to say that YouTube uses the VP9 codec, and I'm not sure how that performs compared to HEVC.
 
Very interesting article. I've been looking for something like this ever since this benchmark came out, and I signed up just to comment here to encourage ongoing articles like this one for those who are waiting for the GPU die shrinks before upgrading old computers still running old CPUs. I did a little testing of my own in CPU-limited scenarios in DX11 using an i5-750 + HD 5850, just to see how it fared compared to the 1.2 GHz results. It seems it can manage, struggling, to chug out 30 fps at 3.4 GHz and 26 fps at stock speeds in DX11 with low settings and a 16:9 resolution of 660x371 (the lowest resolution I could manage without messing up the FOV while still being able to work the UI, to minimize GPU usage and make the bench as CPU-limited as possible).

It gave me an idea of how much of a speedup there will be in DX12 games. Not enough to top 60 fps, but close enough that an incremental upgrade to, say, an overclocked used i7-870 or X3470 might be worth it (anyone reading this who is selling old LGA 1156-compatible CPUs better than an i5-750, I'm open to suggestions, particularly an 875K/880 or X3480, unless you have a bargain i5-2xxx or i7-2xxx + mobo you want to get rid of) to greatly extend the lifetime of this already seven-year-old rig well into the current console cycle.

So yes, Intel, as well as motherboard makers, will definitely suffer from the shift to DX12 in the short term; no doubt about that.

I'll be upgrading the GPU probably in the next round, especially if they combine HBM2 + a die shrink + great DX12 scaling, so any coverage of both AMD's and NVIDIA's upcoming offerings (especially NVIDIA's midrange, although AMD's midrange will also be interesting) combined with a slowed-down CPU, or simply old CPUs proper, would be a godsend to guide upgrade choices for the best results.

Many thanks for the precious info, HardOCP.

You might find this interesting; it tested both 4-core and 8-core AMD CPUs:
DX11 vs DX12 AMD CPU Scaling and Gaming Framerate @ [H]
 
Thanks for linking that!
I did a teardown/rebuild/spring cleaning on ol' faithful a couple of days ago and grabbed some stock numbers on my FX-8120.
stock:
== Hardware Configuration ================================================
GPU 0: AMD Radeon R9 200 Series
CPU: AuthenticAMD
AMD FX(tm)-8120 Eight-Core Processor
Physical Cores: 4
Logical Cores: 8
Physical Memory: 16316 MB
Allocatable Memory: 134217727 MB
==========================================================================


== Configuration =========================================================
API: DirectX 12
==========================================================================
Quality Preset: Custom
==========================================================================

Resolution: 2560x1440
Fullscreen: True
Bloom Quality: High
PointLight Quality: High
Glare Quality: Low
Shading Samples: 8 million
Terrain Shading Samples: 8 million
Shadow Quality: Mid
Temporal AA Duration: 0
Temporal AA Time Slice: 0
Multisample Anti-Aliasing: 1x (2x enabled in crimson settings)
Texture Rank : 1


== Total Avg Results =================================================
Total Time: 60.004814 ms per frame
Avg Framerate: 36.481472 FPS (27.411177 ms)
Weighted Framerate: 35.940876 FPS (27.823475 ms)
CPU frame rate (estimated if not GPU bound): 43.406765 FPS (23.037884 ms)
Percent GPU Bound: 71.736931 %
Driver throughput (Batches per ms): 3151.029297 Batches
Average Batches per frame: 13591.148438 Batches
==========================================================================


== Results ===============================================================
BenchMark 0
TestType: Full System Test
== Sub Mark Normal Batch =================================================
Total Time: 70.949295 ms per frame
Avg Framerate: 41.691746 FPS (23.985563 ms)
Weighted Framerate: 41.078793 FPS (24.343460 ms)
CPU frame rate (estimated if not GPU bound): 47.803230 FPS (20.919088 ms)
Percent GPU Bound: 49.732849 %
Driver throughput (Batches per ms): 2534.318848 Batches
Average Batches per frame: 4801.196777 Batches
== Sub Mark Medium Batch =================================================
Total Time: 56.036999 ms per frame
Avg Framerate: 37.582314 FPS (26.608261 ms)
Weighted Framerate: 36.908218 FPS (27.094236 ms)
CPU frame rate (estimated if not GPU bound): 46.499294 FPS (21.505703 ms)
Percent GPU Bound: 78.311310 %
Driver throughput (Batches per ms): 3093.643555 Batches
Average Batches per frame: 9820.259766 Batches
== Sub Mark Heavy Batch =================================================
Total Time: 53.028141 ms per frame
Avg Framerate: 31.605860 FPS (31.639702 ms)
Weighted Framerate: 31.218082 FPS (32.032719 ms)
CPU frame rate (estimated if not GPU bound): 37.468819 FPS (26.688858 ms)
Percent GPU Bound: 87.166634 %
Driver throughput (Batches per ms): 3453.562012 Batches
Average Batches per frame: 26151.988281 Batches
=========================================================================

OC'd:
== Hardware Configuration ================================================
GPU 0: AMD Radeon R9 200 Series 280x @ 1175/1750
CPU: AuthenticAMD
AMD FX(tm)-8120 Eight-Core Processor @ 4560
Physical Cores: 4
Logical Cores: 8
Physical Memory: 16300 MB
Allocatable Memory: 134217727 MB
==========================================================================


== Configuration =========================================================
API: DirectX 12
==========================================================================
Quality Preset: Custom
==========================================================================

Resolution: 2560x1440
Fullscreen: True
Bloom Quality: High
PointLight Quality: High
Glare Quality: Low
Shading Samples: 8 million
Terrain Shading Samples: 8 million
Shadow Quality: Mid
Temporal AA Duration: 0
Temporal AA Time Slice: 0
Multisample Anti-Aliasing: 1x (2x enabled in crimson settings)
Texture Rank : 1


== Total Avg Results =================================================
Total Time: 60.007462 ms per frame
Avg Framerate: 41.143223 FPS (24.305340 ms)
Weighted Framerate: 40.456188 FPS (24.718096 ms)
CPU frame rate (estimated if not GPU bound): 55.626465 FPS (17.977055 ms)
Percent GPU Bound: 88.015099 %
Driver throughput (Batches per ms): 4276.850586 Batches
Average Batches per frame: 13625.298828 Batches
==========================================================================


== Results ===============================================================
BenchMark 0
TestType: Full System Test
== Sub Mark Normal Batch =================================================
Total Time: 70.998451 ms per frame
Avg Framerate: 48.747543 FPS (20.513855 ms)
Weighted Framerate: 47.691422 FPS (20.968132 ms)
CPU frame rate (estimated if not GPU bound): 62.852825 FPS (15.910184 ms)
Percent GPU Bound: 75.486008 %
Driver throughput (Batches per ms): 3377.241455 Batches
Average Batches per frame: 4823.318359 Batches
== Sub Mark Medium Batch =================================================
Total Time: 55.999081 ms per frame
Avg Framerate: 42.714985 FPS (23.410988 ms)
Weighted Framerate: 41.889503 FPS (23.872330 ms)
CPU frame rate (estimated if not GPU bound): 59.831779 FPS (16.713526 ms)
Percent GPU Bound: 90.587875 %
Driver throughput (Batches per ms): 4247.091797 Batches
Average Batches per frame: 9874.906250 Batches
== Sub Mark Heavy Batch =================================================
Total Time: 53.024853 ms per frame
Avg Framerate: 34.493259 FPS (28.991171 ms)
Weighted Framerate: 34.113594 FPS (29.313828 ms)
CPU frame rate (estimated if not GPU bound): 46.931927 FPS (21.307457 ms)
Percent GPU Bound: 97.971405 %
Driver throughput (Batches per ms): 4732.761719 Batches
Average Batches per frame: 26177.669922 Batches
=========================================================================

 
I specifically signed up to comment (well, aside from HardOCP being a good read anyway).

I like this test as a look at what DX12 can do, but
I would also like to see some very specific tests of core scalability on Intel parts,

more specifically scaling from 4 to 6 cores, with and without HT.

Throw in an i7 with eDRAM,
and whether there's actually a difference between, say, 2666 and 4000 MHz memory frequency.

To make it simple,
I'd like to know if the price difference between an i5 and an i7 is worth the HT under DX12, especially since sometimes HT decreases fps a bit (I would guess the cores run out of cache and have to fetch data from RAM; how would 4000 MHz memory compare in such a scenario? Hmm).

And whether a 6-core can scale enough to warrant the price (I only do gaming).

Though if I could get a 6700K with a good chunk of eDRAM I'd happily put down the cash for it.
Well, maybe with Kaby Lake.
 
That is what is so interesting: the unknown. We are so used to testing certain metrics with expected results, and now there is this vast unknown as to what DX12 changes entirely. Memory speed in DX11 seemed to matter little to none at all; in DX12 it seems to have some impact. And that is a very interesting thought on i5 vs. i7 and whether this will have a big impact on sales of each.
 
With more cores active, memory speed has a bigger impact. Just testing AIDA64, which uses all available cores, the results change dramatically with changes in RAM, be it speed or timings. DX11 was mostly single-core, so RAM speed, I would say, was not as critical as it is when feeding 8 cores.
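To illustrate that point, here is a minimal multi-threaded copy test in C++ (not AIDA64's methodology, just a rough sketch): as the thread count grows, total throughput flattens once the memory bus saturates, which is why RAM speed matters more when many cores are kept busy.

```cpp
// Rough illustration: streaming copy across 1..N threads. Throughput stops
// scaling once the memory bus is saturated, so faster RAM lifts the ceiling.
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const size_t elems = 64 * 1024 * 1024;               // 512 MB per buffer (doubles)
    std::vector<double> src(elems, 1.0), dst(elems, 0.0);

    unsigned maxThreads = std::thread::hardware_concurrency();
    if (maxThreads == 0) maxThreads = 8;                  // fallback if unknown

    for (unsigned threads = 1; threads <= maxThreads; threads *= 2) {
        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> pool;
        size_t chunk = elems / threads;
        for (unsigned t = 0; t < threads; ++t) {
            pool.emplace_back([&, t] {
                size_t begin = t * chunk;
                size_t end   = (t == threads - 1) ? elems : begin + chunk;
                for (size_t i = begin; i < end; ++i) dst[i] = src[i];  // streaming copy
            });
        }
        for (auto& th : pool) th.join();
        double sec = std::chrono::duration<double>(
                         std::chrono::steady_clock::now() - start).count();
        double gb  = 2.0 * elems * sizeof(double) / 1e9;  // read + write traffic
        std::printf("%2u threads: %.1f GB/s\n", threads, gb / sec);
    }
    return 0;
}
```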
 