What's the latest word on Mantle?

This is getting a bit outside of my technical understanding, but can you explain to me how an ARM coprocessor is going to remedy this problem? My understanding of the current problem is that DX has a lot of processor overhead because it's an abstraction layer.
Alright, first off, the fact that it's an abstraction layer does not automatically mean there will be a worthwhile amount of overhead.

Mantle is an abstraction layer too, after all.

Isn't this overhead running on the primary CPU (in this case an x86 CPU running the game code)? How does the ARM processor resolve this? They're not using the same instruction sets and you can't run the same code on them?
Instruction set has VERY little to do with it in this case, for two reasons.

1. ARM doesn't support anywhere near the total number of instructions found in the x86 instruction set, but it DOES already support all the instructions needed to handle the overhead associated with 3D graphics. And that's all that matters in this case. There are direct 1:1 equivalents between ARM and x86 for this particular workload.

2. The performance hit comes from context switches and less-than-ideally filled queues, not a general lack of processing power. Look up what a context switch does to a processor and you'll understand why this is an expensive thing to do.

How does NVIDIA plan to use this exactly?
Well, first you need to understand how high-level API's like DirectX and OpenGL work...

This is a simplified workflow:
1. Game engine makes standardized DirectX or OpenGL API calls.
2. The graphics API translates these calls into the ones actually supported by the hardware.
3. The graphics driver begins queuing up all of those native commands in a buffer.
4. The buffer is flushed and sent to the GPU for actual processing.
5. The GPU processes while the CPU re-fills its buffer.

What usually happens is the GPU finishes working on the data sent to it before the CPU is ready with another batch. The GPU has to sit on its hands and wait for another buffer flush before it can get back to rendering. This situation is called being "CPU limited."

You might think this could be solved by simply flushing the buffer more often, but that's where things really start to get bad. What causes the CPU to take so long in the first place? Overhead from context switches is a big part of it, and two context switches are required nearly every time the queue is flushed to the graphics card (from user mode to kernel mode, then back again).

The higher you want your framerate, the more often your GPU needs new data. This means flushing that buffer more often, which means more CPU overhead the faster the game runs (until eventually the CPU simply can't keep up and is stuck waiting for context switches to complete over-and-over).
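Just to make the scaling concrete, here's a tiny toy model in C++ (the per-command and per-flush costs are made-up numbers for illustration, not measurements of any real driver): the fixed cost of each flush eats a bigger and bigger slice of the frame budget the more often you flush.

```cpp
// Toy model only: pretend each queued native command costs a fixed amount of CPU
// time, and each flush of the command buffer pays a fixed "context switch" cost.
#include <cstdio>

int main() {
    const double frame_budget_us    = 16667.0; // ~60 FPS frame time
    const double cost_per_cmd_us    = 0.5;     // hypothetical cost to queue one native command
    const double cost_per_flush_us  = 30.0;    // hypothetical user->kernel->user round trip
    const int    commands_per_frame = 10000;

    for (int flushes = 1; flushes <= 64; flushes *= 2) {
        double cpu_time_us = commands_per_frame * cost_per_cmd_us
                           + flushes * cost_per_flush_us;
        std::printf("%2d flushes/frame -> %7.0f us of CPU time (%.1f%% of a 60 FPS frame)\n",
                    flushes, cpu_time_us, 100.0 * cpu_time_us / frame_budget_us);
    }
    return 0;
}
```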

Soooo, how does Nvidia fix this with an ARM core on the graphics card? Look back at the workflow I outlined above again, because it's important. Note that the graphics driver is what handles the buffer of queued native commands, not DirectX / OpenGL. Nvidia has full control over how that buffer is handled, which includes placing it on a dedicated ARM processor that does NOT have to change contexts, rather than chewing through it on the CPU. In this situation the ARM processor can flush as often as it wants, with no penalty to the CPU.
 
A 'batch' is a curious term (considering it necessarily implies batching) to describe a submission, or in OGL and D3D parlance, a draw call. Direct3D can do 100k draw calls. There's probably some sort of soft limit as to how many submissions can be done per frame, but I'm unaware of such a number.

To suggest that traditional APIs cannot do what Mantle does — at least in this respect — is not quite correct. They can reach that high, but at a steeper penalty versus Mantle. Importantly, however, both D3D and OGL have built-in means of reducing that number, and those exist as a function of rendering optimization. That's one thing that seems to be missing from the conversation when people compare Mantle to traditional APIs: they don't quite compare the same way when talking about draw call overhead. There's a lot of fixation on these numbers without much emphasis on how they relate to the way things have been done before.

It is correct, because the end result would not be something that you could sell as a game; it would be called a slideshow, and no one would pay money for it.

And to put it in perspective, Mantle also allows certain functions to work without needing to be addressed in the driver itself, something that both DX and OpenGL suffer from.
 
Good lord, he asked what the latest word on Mantle is... not 700 opinions on whose opinion is wrong.
 
It is correct, because the end result would not be something that you could sell as a game; it would be called a slideshow, and no one would pay money for it.
You missed his point about how DirectX and OpenGL don't actually NEED to use 100k draw calls to represent what Mantle is representing with 100k draw calls...

You want hundreds-of-thousands of spaceships on-screen? Are they all the same ship?
- That's 1 draw call (yes, really, DirectX 10+ allows this)

Oh, they're unique ships?
- That's 1 call per unique model. Let's say there are 200 different ships; that's 200 calls, no matter how many duplicates are buzzing around in the swarm.

Some of them are so far away and so tiny that they're only represented by a handful of pixels?
- Drop to a generic model for those ships, with draw calls saved for any unique content on them.


This also allows you to do things like render entire FIELDS of grass, with individually-rendered blades (where most blades can be identical without anyone being able to notice), with only a handful of actual draw calls taking place. When BF3 came out, DICE actually had a slideshow where they showcased how upgrading from DX9 to DX10+ had impacted draw calls. Scenes that would have taken 80k draw calls were suddenly possible in 8k.
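For anyone wondering what "1 call per unique model" looks like in practice, here's a rough Direct3D 11 sketch. It assumes the device context, shaders, input layout (with a per-instance data stream), and the buffers/strides below were all created elsewhere; the struct, strides, and function names are hypothetical.

```cpp
// Sketch only: buffers, input layout, and shaders are assumed to be set up elsewhere.
#include <d3d11.h>

struct ShipModel {
    ID3D11Buffer* vertexBuffer;
    ID3D11Buffer* indexBuffer;
    ID3D11Buffer* instanceBuffer;  // per-instance transforms for this model
    UINT indexCount;
    UINT instanceCount;            // how many copies of this ship are in the swarm
};

// One DrawIndexedInstanced call per unique model, no matter how many duplicates
// of that model are on screen.
void DrawShipSwarm(ID3D11DeviceContext* ctx, const ShipModel* models, UINT uniqueModelCount)
{
    for (UINT i = 0; i < uniqueModelCount; ++i) {
        ID3D11Buffer* buffers[2] = { models[i].vertexBuffer, models[i].instanceBuffer };
        UINT strides[2] = { sizeof(float) * 8, sizeof(float) * 16 }; // assumed vertex / instance layouts
        UINT offsets[2] = { 0, 0 };

        ctx->IASetVertexBuffers(0, 2, buffers, strides, offsets);
        ctx->IASetIndexBuffer(models[i].indexBuffer, DXGI_FORMAT_R32_UINT, 0);
        ctx->DrawIndexedInstanced(models[i].indexCount,    // indices per ship
                                  models[i].instanceCount, // every duplicate in one call
                                  0, 0, 0);
    }
    // 200 unique models -> 200 draw calls, whether 200 or 200,000 ships are visible.
}
```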

It's unclear if Mantle has such provisions, or if it actually NEEDS a draw call per-object no matter what.
 
Right, there are no 100% concrete sources for said information, so I stated ahead of time that it was rumor.

Are you making a point of some kind, or just repeating the obvious? :confused:


Right, and if the rumor holds true, a dedicated processor would then be handling the overhead from said abstraction. Overall performance becomes unbound from CPU performance, exactly the same end-result as Mantle.

AMD's solution involves using a totally new API, and there's no co-processor available for any events that DO happen to still produce CPU overhead.
A co-processor "fixes" the issue with existing API's. Developers keep on doing what they do now, nothing new to learn, no extra API to program for in addition to DX/OGL. The card handles the heavy lifting associated with using a high-level API.

Put another way, AMD would have a super-efficient solution that only works when specifically programmed for. Nvidia would have a less-efficient solution (basically employing brute force) that fixes the overhead issue everywhere.

Remains to be seen if Maxwell will actually do this, but it would be exceedingly handy.

You'd need an enormous amount of brute force to fix the abstraction issues with DX and OGL because of all the baggage that they both contain; you are nowhere near programming to the metal at all.

They are also rubbish at multithreading, so co-processing is going to run out of steam in a hurry - if chucking more cores or compute units at the problem was the solution, then why haven't multi-core CPUs knocked the problem on the head?

It's because DX and OGL are bloated and locked in single-thread land for the most part; yes, there is some multithreading in there, but it's lip service at best. I think the people saying that Mantle will really only benefit low-end hardware will be shocked at the improvements they see when it's fully utilised. Hopefully DICE haven't managed to fuck this up as hardcore as they did the game code.
 
You'd need an enormous amount of brute force to fix the abstraction issues with DX and OGL because of all the baggage that they both contain; you are nowhere near programming to the metal at all.
Define "baggage"?

And also define "brute force," because solving the problem with brute-force by increasing CPU speed is an ENTIRELY different animal from solving the problem with brute-force by offloading to a processor that never has to go through a context switch.

Offloading immediately kills off the single most-expensive part of draw calls. Suddenly you don't need NEARLY as much raw speed to do the job.

They are also rubbish at multithreading, so co-processing is going to run out of steam in a hurry - if chucking more cores or compute units at the problem was the solution, then why haven't multi-core CPUs knocked the problem on the head?
Uh... you realize the majority of the overhead is driver-level, not API-level, right? I covered this a couple posts ago, might want to go back and take a look...

Flushing the buffer of native commands from the CPU to the GPU is the biggest bottleneck. That's nothing to do with the API, the API already did its work of translating generalized calls into hardware-specific calls. The video driver manages that buffer.

The multi-threading issues are all on AMD / Nvidia; they're the ones who're splitting up the queue among cores and trying to get it to the GPU as efficiently as possible. Again, there's nothing directly wrong with the API here... because we're well past the point at which the API is doing anything to the data being moved around.

It's because DX and OGL are bloated and locked in single-thread land for the most part; yes, there is some multithreading in there, but it's lip service at best.
Multi-threading on a CPU won't help nearly as much as a dedicated processor that doesn't have to jump through hoops handling things.

Also, DirectX 11 has very well-implemented multitasking. The API itself covers this amazingly well, but it calls upon the video driver to support it... and as DICE found out, neither Nvidia nor AMD has a full Microsoft-spec implementation in their drivers yet. There's room to improve things, and it's in the video drivers, not the API...
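For reference, this is roughly the DX11 multithreading path I mean: worker threads record commands on deferred contexts, and only the immediate context actually submits to the driver. This is just a sketch (RecordSceneChunk is a hypothetical placeholder and error handling is omitted); how well those command lists execute across cores is entirely down to the driver, which is the whole point.

```cpp
// Sketch of DX11 deferred contexts / command lists; not production code.
#include <d3d11.h>
#include <thread>
#include <vector>

void RecordSceneChunk(ID3D11DeviceContext* deferredCtx, int chunkIndex); // hypothetical helper

void RenderFrameMultithreaded(ID3D11Device* device, ID3D11DeviceContext* immediateCtx, int workerCount)
{
    std::vector<ID3D11DeviceContext*> deferred(workerCount, nullptr);
    std::vector<ID3D11CommandList*>   lists(workerCount, nullptr);
    std::vector<std::thread>          workers;

    for (int i = 0; i < workerCount; ++i)
        device->CreateDeferredContext(0, &deferred[i]);

    // Each worker records its share of the frame independently, in parallel.
    for (int i = 0; i < workerCount; ++i) {
        workers.emplace_back([&, i] {
            RecordSceneChunk(deferred[i], i);
            deferred[i]->FinishCommandList(FALSE, &lists[i]);
        });
    }
    for (auto& t : workers) t.join();

    // Only the immediate context talks to the driver; whether this actually
    // spreads the submission work across cores is up to the driver.
    for (int i = 0; i < workerCount; ++i) {
        immediateCtx->ExecuteCommandList(lists[i], FALSE);
        lists[i]->Release();
        deferred[i]->Release();
    }
}
```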
 
Alright, first off, the fact that it's an abstraction layer does not automatically mean there will be a worthwhile amount of overhead.

Mantle is an abstraction layer too, after all.


Instruction set has VERY little to do with it in this case, for two reasons.

1. ARM doesn't support anywhere near the total number of instructions found in the x86 instruction set, but it DOES already support all the instructions needed to handle the overhead associated with 3D graphics. And that's all that matters in this case. There are direct 1:1 equivalents between ARM and x86 for this particular workload.

2. The performance hit comes from context switches and less-than-ideally filled queues, not a general lack of processing power. Look up what a context switch does to a processor and you'll understand why this is an expensive thing to do.


Well, first you need to understand how high-level API's like DirectX and OpenGL work...

This is a simplified workflow:
1. Game engine makes standardized DirectX or OpenGL API calls.
2. The graphics API translates these calls into the ones actually supported by the hardware.
3. The graphics driver begins queuing up all of those native commands in a buffer.
4. The buffer is flushed and sent to the GPU for actual processing.
5. The GPU processes while the CPU re-fills its buffer.

What usually happens is the GPU finishes working on the data sent to it before the CPU is ready with another batch. The GPU has to sit on its hands and wait for another buffer flush before it can get back to rendering. This situation is called being "CPU limited."

You might think this could be solved by simply flushing the buffer more often, but that's where things really start to get bad. What causes the CPU to take so long in the first place? Overhead from context switches is a big part of it, and two context switches are required nearly every time the queue is flushed to the graphics card (from user mode to kernel mode, then back again).

The higher you want your framerate, the more often your GPU needs new data. This means flushing that buffer more often, which means more CPU overhead the faster the game runs (until eventually the CPU simply can't keep up and is stuck waiting for context switches to complete over-and-over).

Soooo, how does Nvidia fix this with an ARM core on the graphics card? Look back at the workflow I outlined above again, because it's important. Note that the graphics driver is what handles the buffer of queued native commands, not DirectX / OpenGL. Nvidia has full control over how that buffer is handled, which includes placing it on a dedicated ARM processor that does NOT have to change contexts, rather than chewing through it on the CPU. In this situation the ARM processor can flush as often as it wants, with no penalty to the CPU.
Thanks, this was helpful!
 
[Chart: GeForce GTX 700M / GTX 780M performance]


...And that's before overclocking.

Maybe when you're done educating yourself on the power of modern laptops you can start attending grammar school.

You are changing subjects.

You said "my laptop" is in the same price range as a PS4.

That laptop is easily over $2k, isn't it?

I never said that laptops in general can't run games at 1080p/60fps.
 
All I wanted to know was whether anyone knew when we might see Mantle released into the wild. There always seems to be a difference between what the marketing hype says and what a product actually delivers. This is generally true of any company and any new product. I've been waiting to see before and after comparisons by independent reviewers.
 
It is correct, because the end result would not be something that you could sell as a game; it would be called a slideshow, and no one would pay money for it.
And it wouldn't be done to begin with. Not by any competent studio, anyway.
 
All I wanted to know was whether anyone knew when we might see Mantle released into the wild. There always seems to be a difference between what the marketing hype says and what a product actually delivers. This is generally true of any company and any new product. I've been waiting to see before and after comparisons by independent reviewers.

Beta support sometime in the 1st quarter of this year.

 
I think Plants vs Zombies and Thief will have Mantle support out of the gate when they're released in February.

Hopefully BF4 gets there sooner...quite eager to see Mantle reviews like everyone else.
 
Soooo, how does Nvidia fix this with an ARM core on the graphics card? Look back at the workflow I outlined above again, because it's important. Note that the graphics driver is what handles the buffer of queued native commands, not DirectX / OpenGL. Nvidia has full control over how that buffer is handled, which includes placing it on a dedicated ARM processor that does NOT have to change contexts, rather than chewing through it on the CPU. In this situation the ARM processor can flush as often as it wants, with no penalty to the CPU.

Where to start... First off, if Nvidia could have pulled off reducing API and driver overhead by slapping on some ARM cores, that would have been pretty cool. Mantle isn't interesting because it's from AMD, but because of what it might do for us consumers as an API itself, or to push current APIs like DX/OGL forward. :)

The overhead with the driver under DX isn't that a 6-core Intel Core i7-4930K or a 2-core Intel Core i3-3245 is too slow to run the driver and needs help from a low-powered ARM core. It's not the CPU that is the problem, but the driver and the API itself. On Mantle, a lot of the driver tasks are handled by the game engine itself, like synchronization, which is normally done through the GPU driver.

The context switching is done by the GPU already today via the scheduler. An ARM core won't do much good here I think. The buffer is in memory, so you cannot place it on an ARM core (did you mean something else?).

What information is it that you wish to use an ARM core to flush? Pipeline flush due to state changes? If so, why and how would you want to use an ARM core for that? Or is it the command buffer which is already handled by the scheduler on the GPU?

Draw calls, resource creation, state settings etc. are submitted by the game engine to the API, so there is no need to flush anything before something new is ready to be processed. If the GPU is ready for new instructions, it would have to wait regardless of what you flush on the GPU side, so even if (and I doubt it) an ARM core should make the GPU flush faster, it wouldn't help if it was CPU bound anyway.

I see an ARM core being useful in CUDA-specific scenarios, where it might be able to do some asynchronous work on the side, like PhysX for gaming purposes. Other than that, I think Nvidia intends to use the ARM core to make the GPU more independent of the CPU in the GPGPU/supercomputer part of their business, or for other professional use under CUDA, or perhaps also with OGL extensions that make use of that extra CPU. It might even be disabled for consumer products.

I would have loved to have seen the ARM core give some benefit for gaming besides that, but I cannot see how. :)
 
You want hundreds-of-thousands of spaceships on-screen? Are they all the same ship?
- That's 1 draw call (yes, really, DirectX 10+ allows this)

Oh, they're unique ships?
- That's 1 call per unique model. Let's say there are 200 different ships; that's 200 calls, no matter how many duplicates are buzzing around in the swarm.

My experience is limited to hobby OpenGL programming, but aren't you simplifying quite a bit here? Many game objects are made up of several graphics models and thus require multiple draw calls unless otherwise worked around. You're also ignoring the massive amount of special effects being done these days; even if you did have objects that could be drawn with a single draw call, you would, practically speaking, be drawing it several times into different buffers to create your shadowmaps and whatnot.

The result being that even simple-ish objects can end up taking several draw calls every frame.

You also seem to be ignoring some of the other huge benefits of Mantle, like being able to take control over the GPU's memory and state, and context-less global resources.

If you look at the actual slides (http://www.slideshare.net/DICEStudio/mantle-for-developers), you'll see the improvements to draw calls are barely even mentioned, but there are lots of other goodies.
 
My experience is limited to hobby OpenGL programming, but aren't you simplifying quite a bit here? Many game objects are made up of several graphics models and thus require multiple draw calls unless otherwise worked around. You're also ignoring the massive amount of special effects being done these days; even if you did have objects that could be drawn with a single draw call, you would, practically speaking, be drawing it several times into different buffers to create your shadowmaps and whatnot.

The result being that even simple-ish objects can end up taking several draw calls every frame.

You also seem to be ignoring some of the other huge benefits of Mantle, like being able to take control over the GPU's memory and state, and context-less global resources.

If you look at the actual slides (http://www.slideshare.net/DICEStudio/mantle-for-developers), you'll see the improvements to draw calls are barely even mentioned, but there are lots of other goodies.

He is trying to say Nvidia is better......so yeah, he is ignoring the benefits of Mantle.
 
The overhead with the driver under DX isn't that a 6-core Intel Core i7-4930K or a 2-core Intel Core i3-3245 is too slow to run the driver and needs help from a low-powered ARM core.
Are you being sarcastic or agreeing? Did you even read my post?

I said pretty clearly that the issue isn't due to a lack of raw processing power on current CPU's, it's the fact that context switches are expensive and flushing the command queue from the CPU's memory space to the GPU multiple times per second holds up the show.

If you can eliminate the need for the constant switching and flushing, you eliminate the need for such a fast processor. Offloading onto a dedicated chip does just that.

I would have loved to have seen the ARM core give some benefit for gaming besides that, but I cannot see how. :)
I explained pretty simply how it'd help...

I honestly couldn't make sense of most of your reply here. Could you please word it more clearly?

My experience is limited to hobby OpenGL programming, but aren't you simplifying quite a bit here? Many game objects are made up of several graphics models and thus require multiple draw calls unless otherwise worked around. You're also ignoring the massive amount of special effects being done these days; even if you did have objects that could be drawn with a single draw call, you would, practically speaking, be drawing it several times into different buffers to create your shadowmaps and whatnot.
The example was based on the (very few) Mantle demos we've seen so far, which have all been fairly simple graphically (aside from the MASSIVE number of objects being presented at one time).

DirectX and OpenGL also have no trouble under these workloads when some simple optimizations are done.

That was the only way to stay apples-to-apples. If you want to talk actual gaming workloads, we have no proper Mantle-powered examples as of yet.

You also seem to be ignoring some of the other huge benefits of Mantle, like being able to take control over the GPU's memory and state, and context-less global resources.
Not ignoring it, Nvidia cards just already have that capability to some extent (exposed through CUDA), and Maxwell expands upon it with fully shared virtual memory between the CPU and GPU.

He is trying to say Nvidia is better......so yeah, he is ignoring the benefits of Mantle.
Where the heck did I say that? I said nothing of the sort.

I said that, if Nvidia can offload the overhead associated with DirectX and OpenGL onto the ARM processor on Maxwell, it would potentially be superior to a solution that requires explicit support (like Mantle).

This is all hypothetical, can't claim Nvidia is better on something that doesn't yet exist... we're reading the same thread, right?

Mantle is interesting, but I find it far less interesting than an implementation that removes overhead on current widely-deployed graphics APIs. If Nvidia can pull off the latter, they'll have a serious advantage.
 
So you prefer a hardware solution to poor software design, rather than good software design that is efficient?
 
I love this thread. 2 groups of people arguing tooth and nail about things that don't even exist.
 
So you prefer a hardware solution to poor software design, rather than good software design that is efficient?
You're putting words in my mouth again, I did not say that.

I simply said that a solution that works with all existing DirectX and OpenGL titles is going to be a boon to both gamers and developers.

- Gamers get a performance boost in all games (new and old), no explicit support by the developer required.
- Developers don't have to do anything new or support anything extra.
- More of a developer's time and money can go to actually making a good game.
- The extra GPU hardware used to accomplish this is general-purpose and can be used for other tasks besides gaming.

Compare that to Mantle, which only helps a subset of new games that explicitly support the API... which one sounds better to an end user or developer?
 
You missed his point about how DirectX and OpenGL don't actually NEED to use 100k draw calls to represent what Mantle is representing with 100k draw calls...

You want hundreds-of-thousands of spaceships on-screen? Are they all the same ship?
- That's 1 draw call (yes, really, DirectX 10+ allows this)

Oh, they're unique ships?
- That's 1 call per unique model. Let's say there are 200 different ships; that's 200 calls, no matter how many duplicates are buzzing around in the swarm.

Some of them are so far away and so tiny that they're only represented by a handful of pixels?
- Drop to a generic model for those ships, with draw calls saved for any unique content on them.


This also allows you to do things like render entire FIELDS of grass, with individually-rendered blades (where most blades can be identical without anyone being able to notice), with only a handful of actual draw calls taking place. When BF3 came out, DICE actually had a slideshow where they showcased how upgrading from DX9 to DX10+ had impacted draw calls. Scenes that would have taken 80k draw calls were suddenly possible in 8k.

It's unclear if Mantle has such provisions, or if it actually NEEDS a draw call per-object no matter what.

DICE and AMD just wasted their time on Mantle because you said so :) .

And the point that I was making is that the driver model is something that is limiting developers, and if you cared to check the YouTube video I linked through the BSN page, you could see actual developers saying the same thing.

I know some software developers in real life, and if they don't have to do the work for it, they won't.

When it comes down to it, Mantle is such a waste of time that EA does Frostbite in it and Oxide have their engine in it, when they could have gotten away with 0 extra hours of work by just implementing the DX10 "trick" you mentioned.
 
There's no trick: it's just part of the engineering effort.

Mantle's real benefits are going to be in terms of exposing the architectural strengths of GCN and having ways to exploit that through a lower-level means of access, not its reduced draw call overhead. That's how AMD relates it to traditional APIs, but that's not what's really interesting about it.
 
DICE and AMD just wasted their time on Mantle because you said so :) .
Where did I say they wasted their time? Where?

I know some software developers in real life and if they don't have to do the work for it they won't
Exactly, which is why a solution that gets around the overhead associated with DX and OGL would be more palatable to developers than a solution that requires them to do something new.

When it comes down to it, Mantle is such a waste of time that EA does Frostbite in it and Oxide have their engine in it, when they could have gotten away with 0 extra hours of work by just implementing the DX10 "trick" you mentioned.
DICE already implemented it. Instancing is used heavily in Battlefield 3, and they noted as much in their tech slides from when BF3 was released.

On average, they were able to render the same scenes with roughly 1/10th the number of draw calls of previous versions of DirectX. That's a huge savings, and means the game performs a LOT better under DX11 than it does under DX9.
 
I guess it also kinda assumes that the hardware solution will work with all current games, and won't need a patch to take advantage of it, or new games written specifically to target the hardware solution.
 
Are the pro-Nvidia guys still under the belief that Mantle won't bring shit to the table, but the next GPU, aka Maxwell, will, because it has an ARM processor built in? Lol, the same Nvidia ARM processor that's been getting its ass handed to it by a Snapdragon?
 
Are the pro-Nvidia guys still under the belief that Mantle won't bring shit to the table, but the next GPU, aka Maxwell, will, because it has an ARM processor built in? Lol, the same Nvidia ARM processor that's been getting its ass handed to it by a Snapdragon?
Idk, AMD is apparently keeping their benchmarks locked away at Fort Knox. They love flaunting those APUs, though.
Doesn't inspire much confidence.
 
You're putting words in my mouth again, I did not say that.

I simply said that a solution that works with all existing DirectX and OpenGL titles is going to be a boon to both gamers and developers.

- Gamers get a performance boost in all games (new and old), no explicit support by the developer required.
- Developers don't have to do anything new or support anything extra.
- More of a developer's time and money can go to actually making a good game.
- The extra GPU hardware used to accomplish this is general-purpose and can be used for other tasks besides gaming.

Compare that to Mantle, which only helps a subset of new games that explicitly support the API... which one sounds better to an end user or developer?

Are you a paid employee of Nvidia, hence privy to this insider information that Nvidia has not announced, or is it all from your imagination? You seem amazingly technical in your details, providing a solution that is better than Mantle, but Nvidia has disclosed none of this. So which is it?
 
Are you a paid employee of Nvidia, hence privy to this insider information that Nvidia has not announced, or is it all from your imagination? You seem amazingly technical in your details, providing a solution that is better than Mantle, but Nvidia has disclosed none of this. So which is it?

Wouldn't be hard to figure out if Kyle looked at his logs in vbulletin. AT has the same crap going on at their forum and now these employees of companies have to disclose it and it's placed on their signature.

He sure seems to know an awful lot though, doesn't he?
 
You missed his point about how DirectX and OpenGL don't actually NEED to use 100k draw calls to represent what Mantle is representing with 100k draw calls...

You want hundreds-of-thousands of spaceships on-screen? Are they all the same ship?
- That's 1 draw call (yes, really, DirectX 10+ allows this)

Oh, they're unique ships?
- That's 1 call per unique model. Let's say there are 200 different ships; that's 200 calls, no matter how many duplicates are buzzing around in the swarm.

Some of them are so far away and so tiny that they're only represented by a handful of pixels?
- Drop to a generic model for those ships, with draw calls saved for any unique content on them.


This also allows you to do things like render entire FIELDS of grass, with individually-rendered blades (where most blades can be identical without anyone being able to notice), with only a handful of actual draw calls taking place. When BF3 came out, DICE actually had a slideshow where they showcased how upgrading from DX9 to DX10+ had impacted draw calls. Scenes that would have taken 80k draw calls were suddenly possible in 8k.

It's unclear if Mantle has such provisions, or if it actually NEEDS a draw call per-object no matter what.

Where did I say they wasted their time? Where?


Exactly, which is why a solution that gets around the overhead associated with DX and OGL would be more palatable to developers than a solution that requires them to do something new.


DICE already implemented it. Instancing is used heavily in Battlefield 3, and they noted as much in their tech slides from when BF3 was released.

On average, they were able to render the same scenes with roughly 1/10th the number of draw calls of previous versions of DirectX. That's a huge savings, and means the game performs a LOT better under DX11 than it does under DX9.

Where are you even getting these numbers from?

The slides on instancing in BF3 say 1500 - 2000 average down from "3000 - 7000 in heavy cases."
 
Are you a paid employee of Nvidia, hence privy to this insider information that Nvidia has not announced, or is it all from your imagination? You seem amazingly technical in your details, providing a solution that is better than Mantle, but Nvidia has disclosed none of this. So which is it?
Read. The. Thread.

Already stated very. very. very clearly that Maxwell having API-offload capability is strictly a rumor. Already been said, in multiple posts, by me. Again, READ the THREAD that you're responding to...

I'm merely commenting that, if Nvidia manages to pull it off, I find such a solution preferable to Mantle. It's applicable in a wider range of games with less work on the dev's part; sounds good to me.
I was questioned how such a solution could possibly work and I offered a possible explanation. Not difficult to understand.

Wouldn't be hard to figure out if Kyle looked at his logs in vbulletin. AT has the same crap going on at their forum and now these employees of companies have to disclose it and it's placed on their signature.
*sigh* I am in no way affiliated with Nvidia. I have had no contact with Nvidia. I have never accepted any payment or gifts from Nvidia.

I have, however, worked directly with AMD. I have been sent AMD engineering sample graphics cards in the past as well. AMD even gifted me an HD 6970 after working through driver issues with them for months on end. Check my signature, I still have the card.

Nvidia shill, my ass...

He sure seems to know an awful lot though, doesn't he?
Did you miss the two posts where I said and/or clarified it was a rumor, and the post where I said it was hypothetical?

People really just can't seem to read today...
 
Are the pro-Nvidia guys still under the belief that Mantle won't bring shit to the table, but the next GPU, aka Maxwell, will, because it has an ARM processor built in? Lol, the same Nvidia ARM processor that's been getting its ass handed to it by a Snapdragon?

Comparing apples and rocks?
 
Define "baggage"?

All the accumulated compatibility code that's no longer required or desirable but never gets pared back, because it's a job nobody wants to do. It also applies to the single-threaded nature of the APIs, since the fundamentals were written when nobody had multi-core CPUs on the radar at all. Baggage.

And also define "brute force," because solving the problem with brute-force by increasing CPU speed is an ENTIRELY different animal from solving the problem with brute-force by offloading to a processor that never has to go through a context switch.

YOU define it, you coined the term in your earlier post and I responded in the context in which it was used.

Offloading immediately kills off the single most-expensive part of draw calls. Suddenly you don't need NEARLY as much raw speed to do the job.

Seriously? So a state-of-the-art CPU with 8 cores can't accomplish that, but a 2nd-rate ARM core will just solve the problem? You could throw a render farm at the issue and it would still be shite, because it's just shite code. You need a driver and an API that can handle more than one thing at once with any degree of efficiency first.

Uh... you realize the majority of the overhead is driver-level, not API-level, right? I covered this a couple posts ago, might want to go back and take a look...

Why should I? You don't really know what you are talking about IMO. Otherwise you'd understand that you can never code an efficient driver to an inefficient API - especially when you consider that for DX at least, nobody really knows what they are coding for because it's a virtual black box. They just keep picking around it and breaking stuff as they fix other things. If the game devs were that content with their lot to date then Mantle would not have eventuated. It's not come about because people can't write drivers.


SNIP, more obfuscation


Multi-threading on a CPU won't help nearly as much as a dedicated processor that doesn't have to jump through hoops handling things.

What a load of poop, the software still has to accommodate it DOING SOMETHING, and the software is the root of the problem

Also, DirectX 11 has very well-implemented multitasking. The API itself covers this amazingly well, but it calls upon the video driver to support it... and as DICE found out, neither Nvidia nor AMD has a full Microsoft-spec implementation in their drivers yet. There's room to improve things, and it's in the video drivers, not the API...

DirectX 11 is a pile of shit. If it's so bloody good at multitasking (was that multithreading? Dunno, but I'll forge on), then why is it that Intel CPUs that are good at single-threaded apps and not so hot at multithreading do so well in Windows gaming vs AMD, with a juxtaposition in effect?

Mate, all I've seen you do in this thread is spout off semi-technical nonsense that flies in the face of common sense, while making broad arguments that are taken by others to a logical conclusion, only to defend yourself by saying "Where did I say that?"

I'm done with the argument - TBH it's pretty pointless until the pudding arrives with the proof in it. Personally I'm looking forward to all the egg needing to be cleaned off of faces.
 
Where did I say they wasted their time? Where?


Exactly, which is why a solution that gets around the overhead associated with DX and OGL would be more palatable to developers than a solution that requires them to do something new.


DICE already implemented it. Instancing is used heavily in Battlefield 3, and they noted as much in their tech slides from when BF3 was released.

On average, they were able to render the same scenes with roughly 1/10th the number of draw calls of previous versions of DirectX. That's a huge savings, and means the game performs a LOT better under DX11 than it does under DX9.

You proved my point right, you do realize this? Battlefield 3 didn't use Frostbite 3, which has a lot more going for it than most people realize.
 
All the accumulated compatibility code that's no longer required or desirable but never gets pared back, because it's a job nobody wants to do. It also applies to the single-threaded nature of the APIs, since the fundamentals were written when nobody had multi-core CPUs on the radar at all. Baggage.
DirectX 10+ and OpenGL 4.0+ were fairly clean breaks from all previous versions, no baggage or compatibility code there... Windows has to include DirectX 9.0L in addition to DirectX 11 in order to support legacy games, because DirectX 11 doesn't run them on its own.

And I'm not sure where you got the idea that DirectX or OpenGL are single-threaded, but DX10+ and OpenGL 4.0 were both designed with multi-core in mind.


Seriously? So a state-of-the-art CPU with 8 cores can't accomplish that, but a 2nd-rate ARM core will just solve the problem? You could throw a render farm at the issue and it would still be shite, because it's just shite code. You need a driver and an API that can handle more than one thing at once with any degree of efficiency first.
Yes. An 8-core x86 CPU that has to halt almost all operations hundreds of times to service context switches and flush a buffer is going to be seriously handicapped compared to a dedicated ARM processor accomplishing the same task without ever halting.

This problem is compounded when said CPU has to actually do other work besides processing draw calls. Again, almost the entire chip halts, so any game code being run also has to wait for the context switch to finish, which slows things down still further.

What part of this do you not understand, exactly? A processor that has to stop constantly and is handling multiple tasks simultaneously is going to have issues keeping up with a slower processor, doing one task only, that can sail right on through without ever halting.

You don't really know what you are talking about IMO. Otherwise you'd understand that you can never code an efficient driver to an inefficient API - especially when you consider that for DX at least, nobody really knows what they are coding for because it's a virtual black box.
You're kidding, right? AMD, Nvidia, and Intel have all the access they need to DirectX. Their cards have to support all the features of the API, after all. They couldn't design GPU's if they didn't know what DirectX would be requesting of their GPU's...

What a load of poop, the software still has to accommodate it DOING SOMETHING, and the software is the root of the problem
Not sure what you're getting at here. Maxwell supports shared virtual memory, the driver can simply dump the command queue directly into address space shared by the CPU and GPU, where the ARM processor can order it and get the workload processing on-the-fly.

Cuts out all the system-CPU overhead.
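To be clear about what I mean (and I stress this is purely my own conceptual analogy, not anything Nvidia has documented): think of a producer/consumer queue sitting in shared memory, where one side keeps appending commands and never has to stop for a flush, while the other side drains at its own pace. Here's a CPU-only toy version with two threads standing in for the two processors:

```cpp
// Toy single-producer / single-consumer ring buffer; an analogy only.
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <thread>

struct CommandRing {
    static constexpr std::size_t N = 1024;       // capacity (power of two)
    std::array<uint32_t, N> slots{};
    std::atomic<std::size_t> head{0};            // advanced by the producer
    std::atomic<std::size_t> tail{0};            // advanced by the consumer

    bool push(uint32_t cmd) {                    // "CPU side": just append, never flush
        std::size_t h = head.load(std::memory_order_relaxed);
        if (h - tail.load(std::memory_order_acquire) == N) return false; // full
        slots[h % N] = cmd;
        head.store(h + 1, std::memory_order_release);
        return true;
    }
    bool pop(uint32_t& cmd) {                    // "co-processor side": drain at its own pace
        std::size_t t = tail.load(std::memory_order_relaxed);
        if (t == head.load(std::memory_order_acquire)) return false;     // empty
        cmd = slots[t % N];
        tail.store(t + 1, std::memory_order_release);
        return true;
    }
};

int main() {
    CommandRing ring;
    std::atomic<bool> done{false};

    std::thread consumer([&] {
        uint32_t cmd = 0;
        unsigned processed = 0;
        for (;;) {
            if (ring.pop(cmd)) { ++processed; continue; }
            if (done.load(std::memory_order_acquire)) {
                while (ring.pop(cmd)) ++processed;   // drain whatever is left
                break;
            }
        }
        std::printf("consumer drained %u commands\n", processed);
    });

    for (uint32_t i = 0; i < 100000; ++i)
        while (!ring.push(i)) { /* buffer full: wait for the consumer */ }

    done.store(true);
    consumer.join();
    return 0;
}
```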

DirectX 11 is a pile of shit. If it's so bloody good at multitasking (was that multithreading? Dunno, but I'll forge on), then why is it that Intel CPUs that are good at single-threaded apps and not so hot at multithreading do so well in Windows gaming vs AMD, with a juxtaposition in effect?
Because, once again, you failed to read the thread. This has already been covered.

DirectX includes provisions for multithreading, and they're very easy for developers to use. DICE implemented DX11 and the associated multithreading optimizations starting with Battlefield 3. They released all the technical details ages ago.

But, as was said previously, the video drivers from AMD and Nvidia are causing issues. DX11 is splitting up the workload in a way that's easy to multithread, but the drivers are largely running the workload on 2 or 3 cores anyway. This is an internal hangup in the video driver, not an internal hangup in the API.

Mate, all I've seen you do in this thread is spout off semi-technical nonsense that flies in the face of common sense, while making broad arguments that are taken by others to a logical conclusion, only to defend yourself by saying "Where did I say that?"
Right, people are drawing their own conclusions and then claiming that's what I was saying...

The response "where did I say that?" is totally appropriate in that situation.


You proved my point right, you do realize this? Battlefield 3 didn't use Frostbite 3, which has a lot more going for it than most people realize.
Not sure what you're on about, you were agreeing with me. Devs would prefer a solution that requires minimal effort on their part.

Killing-off the overhead associated with existing API's would tick that box nicely.
 
DirectX 11 is a pile of shit
For what it is, Direct3D is a perfectly reasonable API. The notion that it's "shit" has been parroted by both die-hard OpenGL proponents (who believe anything and everything Microsoft produces is terrible) and by Mantle proponents who've never actually worked with the API.

Mantle exists to fulfill a different goal: it's not a Direct3D replacement.
 
For what it is, Direct3D is a perfectly reasonable API. The notion that it's "shit" has been parroted by both die-hard OpenGL proponents (who believe anything and everything Microsoft produces is terrible) and by Mantle proponents who've never actually worked with the API.

Mantle exists to fulfill a different goal: it's not a Direct3D replacement.

To be fair, he did also state that OpenGL wasn't much better, so it's not a case of MS hate.

Unknown-One, you keep talking about buffer flushing halting the entire cpu, care to explain where you get this info from, and explain it on a more technical level?
 
Unknown-One, you keep talking about buffer flushing halting the entire cpu, care to explain where you get this info from, and explain it on a more technical level?
Already covered this... current video drivers queue up hardware-native commands in a buffer. This is controlled by the CPU.

To flush that buffer out to the GPU, (up to) two context switches are required. DX9 (and previous) required a context switch to-and-from kernel-mode EVERY time, DX10 (and later) don't always require a context switch.

For most intents and purposes DX11 already has optimizations to fix most of the draw-call related overhead that DX9 suffered from. Instancing lets you get an insane number of objects on-screen without adding to the draw-call count, and multithreaded display lists help increase the total number of draw calls by spreading things out across multiple CPU cores (if the video driver is willing).
 
For what it is, Direct3D is a perfectly reasonable API. The notion that it's "shit" has been parroted by both die-hard OpenGL proponents (who believe anything and everything Microsoft produces is terrible) and by Mantle proponents who've never actually worked with the API.

Mantle exists to fulfill a different goal: it's not a Direct3D replacement.

Considering where we are now, and how long we have had multi-core CPUs and SLI/CX, there's really no reason why the DX API shouldn't support multi-core and everything else by now. They are lazy and don't care. DX has no competition in today's Windows world... If Apple ever released OS X for the PC, you would finally see an OpenGL vs DX battle happen, but that won't happen.

Mantle is the step needed to bring gaming to the level it should have been headed toward years ago. DX is stagnant and has had very little advancement. Add the fact that new versions of DX require an OS upgrade that many don't want or appreciate, and that in fact makes DX a shit API in my book.
 
I'm very skeptical about Mantle. I'm not sold on the idea and I don't know if it will deliver.

But I'm even more skeptical on Maxwell, a rumored ARM core that will magically do what a multicore CPU can't... Not buying it.
 