3DMark Time Spy DX12 test

The async compute option in Time Spy made zero difference for my AMD 7870 GPU. I mean, it won't even do 30 fps at 720p in Time Spy, so I guess the 7870 is already too saturated to worry about async compute...

Anyway, here is my score.
Skylake i3-6100
16 GB DDR-3000 at 15-15-15-35
HD 7870 at 1140 MHz GPU core and 1325 MHz VRAM

[score screenshot]
 



Mahigan was right: discard any relevance of Time Spy as a gaming benchmark. The entire point, in both DX12 and Vulkan, is for game devs to optimize to the best of their ability for different hardware and do the best they can with the capabilities of each card.

Use all of Nvidia's hardware-specific tricks, and all of AMD's. This is SUPPOSED to be an optimization contest on the dev side to extract the most performance you can from whatever card the game is running on. If you fail to do that because of time constraints and budget, it's more understandable, but still not an ideal comparison of the true strengths of the hardware.
 
Mahigan was right: discard any relevance of Time Spy as a gaming benchmark. The entire point, in both DX12 and Vulkan, is for game devs to optimize to the best of their ability for different hardware and do the best they can with the capabilities of each card.

Use all of Nvidia's hardware-specific tricks, and all of AMD's. This is SUPPOSED to be an optimization contest on the dev side to extract the most performance you can from whatever card the game is running on. If you fail to do that because of time constraints and budget, it's more understandable, but still not an ideal comparison of the true strengths of the hardware.
After reading a bit more on this... It is being blown a bit out of proportion, but I do see what is getting up some posters' ire.

But that gave me a thought. Since 3DMark is a benchmark for system analysis, mainly of GPUs, why don't they offer a few more settings, like showing how different amounts of async affect each card, or ambient occlusion and its different types, things like this? Sure, it is a lot of work, but it is a benchmark suite, and it would go a long way toward quelling such arguments.
 



Mahigan was right: discard any relevance of Time Spy as a gaming benchmark. The entire point, in both DX12 and Vulkan, is for game devs to optimize to the best of their ability for different hardware and do the best they can with the capabilities of each card.

Use all of Nvidia's hardware-specific tricks, and all of AMD's. This is SUPPOSED to be an optimization contest on the dev side to extract the most performance you can from whatever card the game is running on. If you fail to do that because of time constraints and budget, it's more understandable, but still not an ideal comparison of the true strengths of the hardware.
That's exactly what Oxide said about AotS. They said they had no vendor-specific code. Lol.
 
After reading a bit more on this... It is being blown a bit out of proportion, but I do see what is getting up some posters' ire.

But that gave me a thought. Since 3DMark is a benchmark for system analysis, mainly of GPUs, why don't they offer a few more settings, like showing how different amounts of async affect each card, or ambient occlusion and its different types, things like this? Sure, it is a lot of work, but it is a benchmark suite, and it would go a long way toward quelling such arguments.

Yes, a stress test would be the perfect way to test the limits of different cards' capabilities, where you could ratchet up the quality and number of different types of effects that may work better or worse depending on the GPU used.
 
Well, they do offer a few settings to mess with: texture filtering, anti-aliasing, resolution, tessellation, etc. And if your GPU and display support it, DSR or VSR resolution downsampling is also something which can be done. So if you have, say, an RX 480 with a 1080p monitor, you could use VSR and try a higher resolution.
 
Isn't a vendor-specific code path good for everyone? The whole point is that each arch would use whatever benefits it the most; presumably performance would go up for all GPUs (probably less so for Nvidia).

It's interesting, though: AMD wins in Ashes and they make sure Oxide pushes their "non-specific paths" neutral-victory narrative; now we flip the script with 3DMark and suddenly it's developer negligence. Are we supposed to believe non-specific paths are fine as long as AMD wins? Where was the outrage about Oxide not optimizing for Nvidia GPUs then? Haven't people been saying there aren't enough DX12 games to form an opinion yet?

Just further proof that some people will never be satisfied either way. We now live in a world where Nvidia is not allowed to win, or even compete, in benchmarks anymore. Although I suppose it's been that way for a while thanks to the blowback from GameWorks. God forbid Vega doesn't beat the GTX 1080; what a shitstorm that'll be.
 
I don't get it. It makes sense that AMD would generally win an asynchronous compute test.

AMD has been running hard on R&D, and subsequent marketing, for the asynchronous abilities of their GCN architecture.

I don't recall Nvidia doing that, and indeed, they do not have a hardware architecture with a focus on that.
 
Time Spy is basically just doing that new feature that Pascal has: it preempts some 3D work, quickly switches context to the compute work, then switches back to the 3D.

I'm sure Microsoft would be happy to hear about this DX12 pre-emption API that Futuremark is using.

So it seems to me that Time Spy has a very minimal amount of async compute work compared to Doom and AotS, and the manner in which it does its async is friendly to Pascal hardware. I don't think it's necessarily "optimized" for Nvidia, as GCN seems to have no issue with context switching either. It's just not being allowed to take full advantage of GCN hardware.

So basically people are just upset that Time Spy wasn't hand tuned specifically for GCN? Why should it be? Most games aren't.
 
I don't get it. It makes sense that AMD would generally win an asynchronous compute test.

AMD has been running hard on R&D, and subsequent marketing, for the asynchronous abilities of their GCN architecture.

I don't recall Nvidia doing that, and indeed, they do not have a hardware architecture with a focus on that.
Pascal supports async, contrary to popular opinion. You can check the whitepaper for further information on it.
 
Mahigan was right, discard any relevance to timespy as a gaming benchmark. The entire point is for game devs in both dx12 and vulkan to optimize to the best of their ability for different hardware and do the best they can with the capabilities of each card.

Use all of nvidia specific hardware tricks, and all amd tricks. This is SUPPOSED to be about an optimization contest on the dev side to extract the most performance you can on whatever card the game is running on, if you fail to do that for time constraints and budget it's more understandable but still not an ideal comparison of the true strengths of the hardware.
I don't see what the problem is. They use DX12 feature level 11_0, which works on every card. You still get the low-level API and threading benefits, and the results are comparable between cards (i.e. the same amount of work).
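
To make the feature-level point concrete, here's a minimal sketch of what "targeting feature level 11_0" means in code. This is my own illustration (the function name is just for the example, not anything from Futuremark): a D3D12 device created with 11_0 as the minimum level, so the exact same workload runs on any DX12-capable card.

```cpp
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Create a D3D12 device on the default adapter, requiring only feature level 11_0.
// Creation fails on hardware that can't meet that minimum, so every card that
// passes runs the same code path.
bool CreateFeatureLevel11Device(ComPtr<ID3D12Device>& device)
{
    HRESULT hr = D3D12CreateDevice(nullptr,                 // default adapter
                                   D3D_FEATURE_LEVEL_11_0,  // minimum feature level
                                   IID_PPV_ARGS(&device));
    return SUCCEEDED(hr);
}
```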
 
Pascal supports async, contrary to popular opinion. You can check the whitepaper for further information on it.

I'm not saying that they don't support it. I'm saying that AMD has rather publicly been investing a lot of effort into GPU compute and asynchronous performance, whereas maybe Nvidia has not. And so far, in actual results, it seems to be the case that Nvidia's hardware is not as focused on this aspect as AMD's.
 
I'm sure Microsoft would be happy to hear about this DX12 pre-emption API that Futuremark is using.



So basically people are just upset that Time Spy wasn't hand tuned specifically for GCN? Why should it be? Most games aren't.
Isn't a vendor-specific code path good for everyone? The whole point is that each arch would use whatever benefits it the most; presumably performance would go up for all GPUs (probably less so for Nvidia).

It's interesting, though: AMD wins in Ashes and they make sure Oxide pushes their "non-specific paths" neutral-victory narrative; now we flip the script with 3DMark and suddenly it's developer negligence. Are we supposed to believe non-specific paths are fine as long as AMD wins? Where was the outrage about Oxide not optimizing for Nvidia GPUs then? Haven't people been saying there aren't enough DX12 games to form an opinion yet?

Just further proof that some people will never be satisfied either way. We now live in a world where Nvidia is not allowed to win, or even compete, in benchmarks anymore. Although I suppose it's been that way for a while thanks to the blowback from GameWorks. God forbid Vega doesn't beat the GTX 1080; what a shitstorm that'll be.

This makes sense for a benchmark, frankly, because it avoids the ambiguity of drawing the line between vendor-specific paths/optimizations and favoring one vendor over the other. You would end up with essentially totally different things running on each: the NV path would be going crazy with geometry, and GCN would be running compute shaders stacked from here to the moon.

I don't see what the problem is. They use DX12 feature level 11_0, which works on every card. You still get the low-level API and threading benefits, and the results are comparable between cards (i.e. the same amount of work).

There is no problem :D
 
Isn't a vendor-specific code path good for everyone? The whole point is that each arch would use whatever benefits it the most; presumably performance would go up for all GPUs (probably less so for Nvidia).

It's interesting, though: AMD wins in Ashes and they make sure Oxide pushes their "non-specific paths" neutral-victory narrative; now we flip the script with 3DMark and suddenly it's developer negligence. Are we supposed to believe non-specific paths are fine as long as AMD wins? Where was the outrage about Oxide not optimizing for Nvidia GPUs then? Haven't people been saying there aren't enough DX12 games to form an opinion yet?

Just further proof that some people will never be satisfied either way. We now live in a world where Nvidia is not allowed to win, or even compete, in benchmarks anymore. Although I suppose it's been that way for a while thanks to the blowback from GameWorks. God forbid Vega doesn't beat the GTX 1080; what a shitstorm that'll be.


This benchmark was supposed to push graphics card capabilities under DX12...

Just think about a car race... my GTR vs. your Toyota Corolla. I won't be able to use boost or turn off traction control because your car lacks those features and isn't designed with those capabilities? Hell no, we are supposed to push our cards to the limit no matter what.
 
Pascal supports async, contrary to popular opinion. You can check the whitepaper for further information on it.

Pascal does not have any hardware devoted to async shaders, a component of async compute, and uses a less efficient software hack job to emulate it. So you stating this doesn't change what is already known: that Nvidia cannot fully and properly do async compute efficiently, because it has no hardware that can handle the shader component.
 
Pascal does not have any hardware devoted to async shaders, a component of async compute, and uses a less efficient software hack job to emulate it. So you stating this doesn't change what is already known: that Nvidia cannot fully and properly do async compute efficiently, because it has no hardware that can handle the shader component.

Oh my god this is beyond ridiculous. What do you mean it can't handle the shader component? What does that even mean? Pascal doesn't support shaders?

Pascal doesn't attempt to emulate async shaders; the architecture simply doesn't lend itself to that approach of context-switching CUs/SMs, and it doesn't really need to.

You really don't understand async compute. I thought you were conflating async shaders with async compute; it turns out you don't understand either.
 
I think someone there made a salient point, which I've copied below:


formfactor:

Remember when everyone was accusing AMD of cheating the Ashes benches, but it turned out to be Nvidia cheating, and the whole internet was like, oh ok, well that makes sense then.

Like how the fuck are people ok with Nvidia's cheating being business as usual?


I'll respond here how I responded there:


Having a bigger market share means having a bigger base of loyal brand fans. I liken it to the Bulls of the late '90s and Dennis Rodman. Bulls fans hated Rodman from his Detroit days - until he was on the Bulls, and then all his antics were ok, because they won titles. Likewise, Nvidia fans are ok with Nvidia cheating on benches, because it's "their" brand doing it - but it would not be ok if another brand did the same thing.

It's why AMD realized market share is so important - if they want parity, they first need to win over their own loyal fanbase to rival Nvidia's.


I simply brought this over because I felt it could contribute to the conversation here.
 
This benchmark was supposed to push graphics card capabilities under DX12...

Just think about a car race... my GTR vs. your Toyota Corolla. I won't be able to use boost or turn off traction control because your car lacks those features and isn't designed with those capabilities? Hell no, we are supposed to push our cards to the limit no matter what.
I see people elsewhere complaining that this benchmark was designed to run better with Nvidia's pre-emption than with AMD's ACEs, which is what we've already seen plenty of. AMD got directly involved in Hitman's async implementation; where were the pitchforks over that?

I think people are confusing "pushing the capability of DX12" with "optimizing for AMD". Is it really possible to create an *objective* benchmark in these new APIs? If they were to go ahead and push async hard, then they'd piss off a different set of users who would complain about AMD bias.
 
I see people elsewhere complaining that this benchmark was designed to run better with Nvidia's pre-emption than with AMD's ACEs, which is what we've already seen plenty of.
I think people are confusing "pushing the capability of DX12" with "optimizing for AMD".

Man, preemption has nothing to do with async compute:
Preemption has nothing to do with async compute; in a certain sense it is antithetical to async compute, since the entire point is to make better use of the shader units by doing more things concurrently.

Nvidia doesn't "do" async compute with preemption; for the record, "doing" async compute is simply executing commands from multiple queues (possibly) concurrently.
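
To show what that looks like at the API level, here's a minimal D3D12 sketch (my own illustration, with made-up function names, not anything from Time Spy): a second command queue of type COMPUTE created alongside the graphics (DIRECT) queue, with work submitted to each independently. Whether the GPU actually overlaps the two is entirely up to the hardware and driver.

```cpp
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Create one graphics queue and one compute queue on the same device.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};

    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // accepts graphics, compute, and copy work
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&graphicsQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // accepts compute and copy work only
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}

// Recorded command lists are then submitted to each queue independently, e.g.:
//   ID3D12CommandList* gfxLists[]     = { graphicsList };
//   ID3D12CommandList* computeLists[] = { computeList };
//   graphicsQueue->ExecuteCommandLists(1, gfxLists);
//   computeQueue->ExecuteCommandLists(1, computeLists);
// On GCN the compute queue can be fed to the ACEs; on Pascal the SM partition
// between the two kinds of work can be adjusted dynamically, as described below.
```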

What enables this to provide performance benefits on Pascal is the ability to partition the pool of SMs between dispatches from the graphics pipe and compute pipes.

Just a simple example: a task is running on all of the available SMs (20). It runs for 10 units of time. For 2 of those 10 units, the shaders were idle because of a dependency on data being processed by the rasterizer(s).

What Pascal and Maxwell can both do is choose to partition the SMs; instead of dedicating all 20 to that shader, 16 SMs run that shader and 4 SMs can run compute shaders.

With Maxwell, the partitioning is static; it is done by the driver, at drawcall boundaries only. On Pascal the partitioning is dynamic and it is in hardware logic; that means the driver no longer needs to accurately assess the execution time of each task prior to scheduling.

The Fable Legends developers had actually gotten gains from async compute on Maxwell by implementing it statically.

Neither Maxwell nor Pascal implements a feature similar to AMD's async shaders; context switching on SMs still carries a very heavy latency penalty due to VRAM access times.

What's this now about Nvidia cheating in AotS? There was some terrain shader bug, and it was tested, and it had no effect at all on performance, lol. That said, the damned game looked better with the shader bug; NV should have played it like MS and called it a feature.
 
I see people elsewhere complaining that this benchmark was designed to run better with Nvidia's pre-emption than with AMD's ACEs, which is what we've already seen plenty of. AMD got directly involved in Hitman's async implementation; where were the pitchforks over that?

I think people are confusing "pushing the capability of DX12" with "optimizing for AMD". Is it really possible to create an *objective* benchmark in these new APIs? If they were to go ahead and push async hard, then they'd piss off a different set of users who would complain about AMD bias.


I think the issue is forward vs. backward.

Nvidia's pre-emption feels like a hack job, since they could not get async working in hardware in time for Pascal.


So a better analogy would be that the test is supposed to measure heat but actually measures smoke output, where Nvidia is using an older fossil-fuel burner, and AMD is using an electric burner that can get hotter but generates no smoke.

If you run a benchmark designed to test DX12 speed, then yes, tedious as it is, you owe it to people to make sure you test all functions of both major cards, favoring neither. Otherwise it really isn't a good, trustworthy benchmark. I'd have no problem with it using pre-emption if it also fairly tested async.

Since it does not, and also doesn't test multi-threading but instead single-threading, people who say the test is "rigged" would be correct.

Ideally, the test should detect what architecture is being used and optimize itself for that architecture.

If you feel that is "too much work", then you aren't interested in a fair benchmark - plain and simple.
 
I think the issue is forward vs. backward.

Nvidia's pre-emption feels like a hack job, since they could not get async working in hardware in time for Pascal.


So a better analogy would be that the test is supposed to measure heat but actually measures smoke output, where Nvidia is using an older fossil-fuel burner, and AMD is using an electric burner that can get hotter but generates no smoke.

If you run a benchmark designed to test DX12 speed, then yes, tedious as it is, you owe it to people to make sure you test all functions of both major cards, favoring neither. Otherwise it really isn't a good, trustworthy benchmark. I'd have no problem with it using pre-emption if it also fairly tested async.

Since it does not, and also doesn't test multi-threading but instead single-threading, people who say the test is "rigged" would be correct.

Ideally, the test should detect what architecture is being used and optimize itself for that architecture.

If you feel that is "too much work", then you aren't interested in a fair benchmark - plain and simple.

This post feels like a hack job

"So a better analogy would be that if the test were to measure heat, it actually is measuring smoke output, where Nvidia is using an older fossil fuel burner, and AMD is using an electric burner that can get hotter but generates no smoke."

A better analogy for this post would be to imagine a blindfolded man untrained in archery attempting to hit a moving target while riding a unicycle, during an earthquake.
 
Countdown until someone knows better than the developer who actually wrote the code.
 
I dunno why we are even trying to justify programming for ONE OF THE THREE players in PC gaming graphics.
 
That depends on whether you isolate it to Windows 10 + DX12 GPUs.
Steam Hardware & Software Survey

If we just count GPUs above 0.5%:
Nvidia 14.82%
AMD 3.13%
Intel 1.31%

Oh, puh-lease, everyone knows that AMD cards get spread thinly across those numbers due to the way their different rebadges and BIOSes report to the OS what is being used.


edit:

Also, to everyone trying to claim that no vendor optimization is the ideal: do you realize that OpenGL, for example, was mostly all about vendor extensions? Are you telling us that OpenGL benchmarks through the years have been invalid since "it was not the exact same test for both of them"? In that case the lauded Nvidia OpenGL wins don't count at all!
 
I think the issue most people are having with it is that it isn't utilizing DX12 as intended. I haven't had a lot of time to look into it, but from what I have read the issue seems to be fences and barriers that generally shouldn't be used in DX12. Those fences and barriers make the bench run more like DX11 than DX12. The gains going from async off to on are good, but that has more to do with the ability to run the two queues at the same time than with the added benefit of the implementation itself. These queues are apparently being executed the same as they would be in DX11, but the new Pascal, and GCN (as it always has), allow for overlap, whereas Maxwell before would not.
 
I think the issue most people are having with it is that it isn't utilizing DX12 as intended. I haven't had a lot of time to look into it, but from what I have read the issue seems to be fences and barriers that generally shouldn't be used in DX12. Those fences and barriers make the bench run more like DX11 than DX12. The gains going from async off to on are good, but that has more to do with the ability to run the two queues at the same time than with the added benefit of the implementation itself. These queues are apparently being executed the same as they would be in DX11, but the new Pascal, and GCN (as it always has), allow for overlap, whereas Maxwell before would not.

Fences and barriers are used to 'synchronize' tasks across queues; since the compute queues are used for rendering, you need them.
 
I think the issue most people are having with it is that it isn't utilizing DX12 as intended. I haven't had a lot of time to look into it, but from what I have read the issue seems to be fences and barriers that generally shouldn't be used in DX12. Those fences and barriers make the bench run more like DX11 than DX12. The gains going from async off to on are good, but that has more to do with the ability to run the two queues at the same time than with the added benefit of the implementation itself. These queues are apparently being executed the same as they would be in DX11, but the new Pascal, and GCN (as it always has), allow for overlap, whereas Maxwell before would not.

You literally need them for a working program; they're paramount to maintaining sanity.

One example would be uploading resources to the GPU: you'd use a fence to signal that a given operation has completed and to wait on it. You obviously can't use the data until it's actually transferred. I mean, I guess you could try... maybe it'd finish in time, maybe you'd get corruption, or a crash...
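
As a concrete illustration of that upload pattern, here's a hedged sketch of my own (not code from any particular engine or from Time Spy): a fence is signaled on the queue after the copy, and the CPU blocks until the GPU has reached it before touching the resource.

```cpp
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Block the CPU until the GPU has finished all work submitted to `queue`
// before the Signal() call (e.g. an upload to a GPU resource).
void WaitForUpload(ID3D12Device* device, ID3D12CommandQueue* queue)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    const UINT64 uploadDone = 1;
    queue->Signal(fence.Get(), uploadDone);      // GPU sets the value when it reaches this point

    if (fence->GetCompletedValue() < uploadDone)
    {
        HANDLE evt = CreateEvent(nullptr, FALSE, FALSE, nullptr);
        fence->SetEventOnCompletion(uploadDone, evt);
        WaitForSingleObject(evt, INFINITE);      // don't touch the data before this returns
        CloseHandle(evt);
    }
}

// Queue-to-queue synchronization works the same way, but with
// ID3D12CommandQueue::Wait() on the consuming queue instead of blocking the CPU.
```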
 
You literally need them for a working program; they're paramount to maintaining sanity.

One example would be uploading resources to the GPU: you'd use a fence to signal that a given operation has completed and to wait on it. You obviously can't use the data until it's actually transferred. I mean, I guess you could try... maybe it'd finish in time, maybe you'd get corruption, or a crash...
Forgot to type it, but it was more that there were too many - double fences and barriers. It seemed they thought that was making it serial in nature, hence the FL11 part of the discussion.
 
Forgot to type it, but it was more that there were too many - double fences and barriers. It seemed they thought that was making it serial in nature, hence the FL11 part of the discussion.
It's not the fences that make it 'serial in nature'; it's the data dependencies. The fences are there to make sure you don't end up with a catastrophic failure.
 
It's not the fences that make it 'serial in nature'; it's the data dependencies. The fences are there to make sure you don't end up with a catastrophic failure.

Hooray for people who actually know what they're talking about. So rare nowadays.
 
Time Spy seems to be a case of false advertising. It's always been biased. It's not a surprise, and it probably shouldn't be to anyone.
 