Hawaii matching Maxwell 2?? What's happening under Ashes of the Singularity?

Mahigan

Limp Gawd
Joined
Aug 25, 2015
Messages
142
What's happening under Ashes of the Singularity?

If your GPU architecture is both compute heavy and massively parallel, and you have a game which is compute heavy (Post Processing Effects, Lights and Physics), and you use a massively parallel API which makes efficient use of the compute capabilities of your architecture, then evidently the in-game results will translate into an increase in Frames per Second.

Of course this is assuming that the game engine itself is not bottlenecked on other fronts (Fill Rate, Memory Bandwidth, Texture Mapping, Geometry etc).

This is what we see with Ashes of the Singularity.

The easiest way to derive a comparison of the theoretical compute capabilities of various architectures (theoretical assumes efficient use) is clock × shader count × 2 FLOPs per clock:

GeForce GTX 980 Ti
CUDA Cores: 2816
Boost Clock (MHz): 1075
1075 MHz × (2816 × 2) = 6,054,400 MFLOPS (~6.05 TFLOPS)

Radeon R9 290X
Stream Cores: 2816
Clock (MHz): 1000
1000 MHz × (2816 × 2) = 5,632,000 MFLOPS (~5.63 TFLOPS)
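These peak numbers come from clock × shader count × 2, since one fused multiply-add (FMA) counts as two floating-point operations per core per cycle. A quick sketch of the arithmetic (the helper name is mine, not anyone's API):

```python
def peak_gflops(cores: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak single-precision throughput in GFLOPS.

    flops_per_clock = 2 assumes one fused multiply-add (FMA) per
    core per cycle, counted as two floating-point operations.
    """
    return cores * clock_mhz * flops_per_clock / 1000.0

# GTX 980 Ti: 2816 CUDA cores at a 1075 MHz boost clock
print(peak_gflops(2816, 1075))   # ~6054 GFLOPS (~6.05 TFLOPS)

# R9 290X: 2816 stream processors at 1000 MHz
print(peak_gflops(2816, 1000))   # ~5632 GFLOPS (~5.63 TFLOPS)
```

Remember this is a theoretical ceiling; it says nothing about how efficiently an architecture keeps those cores fed.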

If we look at the relative compute performance between the two architectures, we can conclude that, theoretically, the GeForce GTX 980 Ti should have an edge (and a noticeable one at that). My findings on the GCN 1.1 and Maxwell 2 architectures explain why we do not see this result in the various AotS benchmarks published.

Under GCN 1.1, 8 Asynchronous Compute Engines work independently of one another, or "out of order" (with the ability to correct errors), each holding 8 queued tasks for 64 queued tasks in total, while an independent Graphics Command Processor handles graphics tasks. Under Maxwell 2, a single Grid Management Unit works with a single Work Distributor (32 queues total, one of which is used for graphics commands) feeding 32 Asynchronous Warp Schedulers.

I've explained why, architecturally, nVIDIA's HyperQ implementation is not as parallel as AMD's Asynchronous Shading implementation: 32 queued tasks (one of them graphics) operating "in order" versus 64 queued compute tasks plus 1 graphics task operating "out of order". This lack of parallelism is one aspect of HyperQ's limitations. The other is the additional hierarchical stages present in HyperQ. Under GCN, the ACEs communicate directly with the various Compute Units (CUs). Under Maxwell 2, the Grid Management Unit (capable of holding thousands of pending grids) communicates through a Work Distributor, which then communicates with the various AWSs in the SMMs.

Maxwell 2 is a less parallel compute architecture than Hawaii. How? HyperQ works "in order" and through several additional hierarchical stages which are not present in GCN's asynchronous compute solution. HyperQ can thus be described as working "in order" in several segments of its pipeline and is prone to pipeline stalls due to dependencies, whereas Asynchronous Compute can be described as working "out of order" (and capable of working in order as well, by syncing several ACEs, if the need arises).

How can we verify this?
Like Fermi and Kepler, GM204 is composed of an array of Graphics Processing Clusters (GPCs), Streaming
Multiprocessors (SMs), and memory controllers. GM204 consists of four GPCs, 16 Maxwell SMs (SMM),
and four memory controllers. GeForce GTX 980 uses the full complement of these architectural
components (if you are not well versed in these structures, we suggest you first read the Kepler and
Fermi whitepapers).

nVIDIA recommends we look at the Kepler White Papers, so let's do that...
[Image: KDEjIFp.jpg]

[Image: LksgnXO.png]



But where did I get the idea that nVIDIA's HyperQ solution works hierarchically (in several additional hierarchical stages relative to Asynchronous Compute)? Well... right from nVIDIA's White Paper:
[Image: wKcVvUP.jpg]



Of course Maxwell 2 improved upon Kepler and first-generation Maxwell in this respect, as seen in the graph below (the AMD GCN info in it is wrong):
[Image: BcTBJgK.jpg]



The AMD GCN 1.1 290 series and GCN 1.2 work in a different way, using 8 Asynchronous Compute Engines.

WHAT IS THE ROLE OF AN ACE (ASYNCHRONOUS COMPUTE ENGINE)?
The ACEs are responsible for all compute shader scheduling and resource allocation. Products may have multiple ACEs, which operate independently, to scale up or down in terms of performance. Each ACE fetches commands from cache or memory and forms task queues, which are the starting point for scheduling.

Each task has a priority level for scheduling, ranging from background to real-time. The ACE will check the hardware requirements of the highest priority task and launch that task into the GCN shader array when sufficient resources are available.

Many tasks can be in-flight simultaneously; the limit is more or less dictated by the hardware resources. Tasks complete out-of-order, which releases resources earlier, but they must be tracked in the ACE for correctness. When a task is dispatched to the GCN shader array, it is broken down into a number of workgroups that are dispatched to individual compute units for execution. Every cycle, an ACE can create a workgroup and dispatch one wavefront from the workgroup to the compute units.

While ACEs ordinarily operate in an independent fashion, they can synchronize and communicate using cache, memory or the 64KB Global Data Share. This means that an ACE can actually form a task graph, where individual tasks have dependencies on one another. So in practice, a task in one ACE could depend on tasks on another ACE or part of the graphics pipeline. The ACEs can switch between task queues by stopping a task and selecting the next task from a different queue. For instance, if the currently running task graph is waiting for input from the graphics pipeline due to a dependency, the ACE could switch to a different task queue that is ready to be scheduled. The ACE will flush any workgroups associated with the old task, and then issue workgroups from the new task to the shader array.
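The queue-selection behaviour described above can be sketched as a toy scheduler. This is purely illustrative (the `ToyACE` class and its method names are invented, not AMD hardware or API): each "ACE" holds several task queues, picks the highest-priority ready task, and skips over a queue whose front task is blocked on a dependency:

```python
from collections import deque

# Priority range mentioned above: background .. real-time.
BACKGROUND, NORMAL, HIGH, REALTIME = 0, 1, 2, 3

class ToyACE:
    """Illustrative model of an ACE's queue selection; not real hardware."""
    def __init__(self):
        self.queues = []  # list of (priority, task deque) pairs

    def add_queue(self, priority):
        q = deque()
        self.queues.append((priority, q))
        return q

    def next_task(self, blocked=lambda task: False):
        # Scan queues from highest to lowest priority and skip any whose
        # front task is still waiting on a dependency.
        for priority, q in sorted(self.queues, key=lambda pq: -pq[0]):
            if q and not blocked(q[0]):
                return q.popleft()
        return None

ace = ToyACE()
bg = ace.add_queue(BACKGROUND)
rt = ace.add_queue(REALTIME)
bg.append("shadow-bake")
rt.append("vr-timewarp")
first, second = ace.next_task(), ace.next_task()
print(first, second)  # vr-timewarp shadow-bake
```

The `blocked` predicate stands in for the dependency check the whitepaper describes: a queue waiting on the graphics pipeline is simply skipped in favour of one that is ready.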

[Image: BAbY4ZB.jpg]

[Image: EFTb72e.png]



ASYNCHRONOUS COMPUTING
For many tasks in the graphics rendering pipeline, the GPU needs to know about ordering; that is, it
requires information about which tasks must be executed in sequence (synchronous tasks), and
which can be executed in any order (asynchronous tasks). This requires a graphics application
programming interface (API) that allows developers to provide this information. This is a key
capability of the new generation of graphics APIs, including Mantle, DirectX® 12, and Vulkan™.
In DirectX 12, this is handled by allowing applications to submit work to multiple queues. The API
defines three types of queues:

  1. Graphics queues for primary rendering tasks
  2. Compute queues for supporting GPU tasks (physics, lighting, post-processing, etc.)
  3. Copy queues for simple data transfers

Command lists within a given queue must execute synchronously, while those in different queues
can execute asynchronously (i.e. concurrently and in parallel). Overlapping tasks in multiple queues
maximize the potential for performance improvement.
Developers of games for the major console systems are already familiar with this idea of multiple
queues and understand how to take advantage of it. This is an important reason why those game
consoles have typically been able to achieve higher levels of graphics performance and image quality
than PCs equipped with a similar level of GPU processing power. However the availability of new
graphics APIs is finally bringing similar capabilities to the PC platform.
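In D3D12 terms, the three queue types above map to direct (graphics), compute, and copy command queues. A toy timing model of the ordering rule just quoted (command lists inside one queue run back-to-back; independent queues may overlap; all durations invented):

```python
# Toy timing model, not an API: each queue is a list of
# (command-list name, duration in ms) pairs with made-up numbers.
graphics = [("g-buffer", 6), ("shading", 5)]
compute  = [("light-cull", 2), ("blur", 3)]
copy     = [("upload-tex", 4)]

def serial_time(*queues):
    # One command stream: every command list executes in sequence.
    return sum(ms for q in queues for _, ms in q)

def async_time(*queues):
    # Ideal async: each queue runs in order, but queues overlap fully,
    # so the frame is bounded by the longest single queue.
    return max(sum(ms for _, ms in q) for q in queues)

print(serial_time(graphics, compute, copy))  # 20 ms
print(async_time(graphics, compute, copy))   # 11 ms (bounded by graphics)
```

Real gains are smaller than this ideal case, since the queues contend for the same shader resources, but the direction of the effect is the same.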

[Image: 7JHO9cA.jpg]




SCHEDULING
A basic requirement for asynchronous shading is the ability of the GPU to schedule work from
multiple queues of different types across the available processing resources. For most of their
history, GPUs were only able to process one command stream at a time, using an integrated
command processor. Dealing with multiple queues adds significant complexity. For example, when
two tasks want to execute at the same time but need to share the same processing resources, which
one gets to use them first?

Consider the example below, where two streams of traffic (representing task queues) are
attempting to merge onto a freeway (representing GPU processing resources). A simple way of
handling this is with traffic signals, which allow one traffic stream to enter the freeway while the
other waits in a queue. Periodically the light switches, allowing some traffic from both streams onto
the freeway.

Representation of a simple task switching mechanism:
[Image: VapZbxp.jpg]


To get the GPU to switch from working on one task to another, a number of steps are required:
  1. Stop submitting new work associated with the current task
  2. Allow all calculations in flight to complete
  3. Replace all context data from the current task with that for the new task
  4. Begin submitting work associated with the new task
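The four steps amount to a drain-and-swap procedure. A toy model that just records the required ordering (the `ToyGPU` class and its method names are invented for illustration; no vendor exposes this interface):

```python
class ToyGPU:
    """Records the order of operations; purely illustrative."""
    def __init__(self):
        self.log = []
    def stop_submission(self, task):  self.log.append(("stop", task))
    def wait_idle(self):              self.log.append(("drain", None))
    def save_context(self, task):     self.log.append(("save", task))
    def load_context(self, task):     self.log.append(("load", task))
    def begin_submission(self, task): self.log.append(("start", task))

def switch_task(gpu, current, new):
    gpu.stop_submission(current)   # 1. stop submitting work for the current task
    gpu.wait_idle()                # 2. let all calculations in flight complete
    gpu.save_context(current)      # 3. replace the current task's context data...
    gpu.load_context(new)          #    ...with that of the new task
    gpu.begin_submission(new)      # 4. begin submitting the new task's work

gpu = ToyGPU()
switch_task(gpu, "graphics", "physics")
print(gpu.log)
```

The cost of step 3 is exactly the context-size problem discussed below: the more state a task carries, the longer the swap takes.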

Context (also known as “state”) is a term for the working set of data associated with a particular task
while it is being processed. It can include things like constant values, pointers to memory locations,
and intermediate buffers on which calculations are being performed. This context data needs to be
readily accessible to the processing units, so it is typically stored in very fast on-chip memories.
Managing context for multiple tasks is central to the scheduling problem.

An alternative way to handle scheduling is by assigning priorities to each queue, and allowing tasks
in higher priority queues to pre-empt those in lower priority queues. Pre-emption means that a
lower priority task can be temporarily suspended while a higher priority task completes. Continuing
with the traffic analogy, high priority tasks are treated like emergency vehicles – that is, they have
right-of-way at intersections even when the traffic light is red, and other vehicles on the road must
pull to the side to let them pass.

Pre-emption mechanism for handling high priority tasks:
[Image: MGlOnRl.jpg]


This approach can reduce processing latency for tasks that need it most, however it doesn’t
necessarily improve efficiency since it is not allowing simultaneous execution. In fact, it can actually
reduce efficiency in some cases due to context switching overhead. Graphics tasks can often have a lot of context data associated with them, making context switches time consuming and sapping
performance.

A better approach would be to allow new tasks to begin executing without having to suspend tasks
already in flight. This requires the ability to perform fine-grained scheduling and interleaving of
tasks from multiple queues. The mechanism would operate like on-ramps merging on to a freeway,
where there are no traffic signals and vehicles merge directly without forcing anyone to stop and
wait.

Asynchronous compute with fine-grained scheduling:
[Image: D4yS28N.jpg]


The best case for this kind of mechanism is when lightweight compute/copy queues (requiring
relatively few processing resources) can be overlapped with heavyweight graphics queues. This
allows the smaller tasks to be executed during stalls or gaps in the execution of larger tasks, thereby
improving utilization of processing resources and allowing more work to be completed in the same
span of time.
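The freeway-merge idea can be shown with a toy utilization model: a heavyweight graphics timeline with stall gaps (`-`) absorbs a lightweight compute workload (`c`) instead of running it afterwards (all numbers invented):

```python
# 'G' = graphics work, '-' = stall/gap, 'c' = lightweight compute work.
graphics_timeline = list("GGG--GG-GGG--G")   # gaps from stalls in the heavy task
compute_work = 5                             # slots of compute work needed

merged = []
remaining = compute_work
for slot in graphics_timeline:
    if slot == "-" and remaining > 0:
        merged.append("c")       # interleave compute into the idle slot
        remaining -= 1
    else:
        merged.append(slot)
merged += ["c"] * remaining      # any leftover compute runs at the end

print("".join(merged))
# Serial execution would need len(timeline) + compute_work slots;
# interleaving finishes in fewer total slots.
print(len(graphics_timeline) + compute_work, len(merged))
```

Here the compute work fits entirely into the graphics stalls, so the combined workload finishes in 14 slots instead of 19: the same "more work in the same span of time" effect described above.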



HARDWARE DESIGN
The next consideration is designing a GPU architecture that can take full advantage of asynchronous
shading. Ideally we want graphics processing to be handled as a simultaneous multi-threaded (SMT)
operation, where tasks can be assigned to multiple threads that share available processing
resources. The goal is to improve utilization of those resources, while retaining the performance
benefits of pipelining and a high level of parallelism.

AMD’s Graphics Core Next (GCN) architecture was designed to efficiently process multiple command
streams in parallel. This capability is enabled by integrating multiple Asynchronous Compute
Engines (ACEs). Each ACE can parse incoming commands and dispatch work to the GPU’s processing
units. GCN supports up to 8 ACEs per GPU, and each ACE can manage up to 8 independent queues.
The ACEs can operate in parallel with the graphics command processor and two DMA engines. The
graphics command processor handles graphics queues, the ACEs handle compute queues, and the
DMA engines handle copy queues. Each queue can dispatch work items without waiting for other
tasks to complete, allowing independent command streams to be interleaved on the GPU’s Shader
Engines and execute simultaneously.

This architecture is designed to increase utilization and performance by filling gaps in the pipeline,
where the GPU would otherwise be forced to wait for certain tasks to complete before working on
the next one in sequence. It still supports prioritization and pre-emption when required, but this
will often not be necessary if a high priority task is also a relatively lightweight one. The ACEs are
designed to facilitate context switching, reducing the associated performance overhead.



USING ASYNCHRONOUS SHADERS
The ability to perform shading operations asynchronously has the potential to benefit a broad range
of graphics applications. Practically all modern game rendering engines today make use of compute
shaders that could be scheduled asynchronously with other graphics tasks, and there is a trend
toward making increasing use of compute shaders as the engines get more sophisticated. Many
leading developers believe that rendering engines will continue to move away from traditional
pipeline-oriented models and toward task-based multi-threaded models, which increases the
opportunities for performance improvements. The following are examples of some particular cases
where asynchronous shading can benefit existing applications.

Post-Processing Effects
Today’s games implement a wide range of visual effects as post-processing passes. These are
applied after the main graphics rendering pipeline has finished rendering a frame, and are often
implemented using compute shaders. Examples include blur filters, anti-aliasing, depth-of-field,
light blooms, tone mapping, and color correction. These kinds of effects are ideal candidates for
acceleration using asynchronous shading.

Example of a post-process blur effect accelerated with asynchronous shaders:
[Image: KdTJtVK.png]

Measured in AMD Internal Application – Asynchronous Compute. Test System Specifications: AMD FX 8350 CPU,
16GB DDR3 1600 MHz memory, 990 FX motherboard, AMD R9 290X 4GB GPU, Windows 7 Enterprise 64-bit


Lighting
Another common technique in modern games is deferred lighting. This involves performing a prepass
over the scene with a compute shader before it is rendered, in order to determine which light
sources affect each pixel. This technique makes it possible to efficiently render scenes with a large
number of light sources.
The following example, which uses DirectX 12 and deferred lighting to render a scene with many
light sources, shows how using asynchronous shaders for the lighting pre-pass improves
performance by 10%.
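The lighting pre-pass described here is typically tiled light culling: a compute shader bins lights per screen tile so the shading pass only loops over the lights that actually touch each tile. A simplified CPU-side 2D sketch (function name and scene data are mine, invented for illustration):

```python
import math

def cull_lights(lights, tile_size, screen_w, screen_h):
    """For each screen tile, list indices of lights overlapping it (2D toy)."""
    tiles = {}
    for tx in range(0, screen_w, tile_size):
        for ty in range(0, screen_h, tile_size):
            hits = []
            for i, (lx, ly, radius) in enumerate(lights):
                # Clamp the light centre to the tile to find the nearest point.
                nx = min(max(lx, tx), tx + tile_size)
                ny = min(max(ly, ty), ty + tile_size)
                if math.hypot(lx - nx, ly - ny) <= radius:
                    hits.append(i)
            tiles[(tx, ty)] = hits
    return tiles

lights = [(10, 10, 15), (100, 100, 20)]   # (x, y, radius), made-up values
tiles = cull_lights(lights, 32, 128, 128)
print(tiles[(0, 0)])     # only light 0 touches the top-left tile
print(tiles[(96, 96)])   # only light 1 touches the bottom-right tile
```

On the GPU this pre-pass is a compute dispatch, which is why it is a natural candidate for running asynchronously alongside other rendering work.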

Demonstration of deferred lighting using DirectX 12 and asynchronous shaders:
[Image: 83b9z8p.jpg]

Measured in AMD internal application – D3D12_AsyncCompute. Test System Specifications: Intel i7 4960X, 16GB
DDR3 1866 MHz, X79 motherboard, AMD Radeon R9 Fury X 4GB, Windows 10 v10130



SUMMARY
Hardware, software and API support are all now available to deliver on the promise of asynchronous
computing for GPUs. The GCN architecture is perfectly suited to asynchronous computing, having
been designed from the beginning with this operating model in mind. This will allow developers to
unlock the full performance potential of today’s PC GPUs, enabling higher frame rates and better
image quality.

Sources:
http://amd-dev.wpengine.netdna-cdn....10/Asynchronous-Shaders-White-Paper-FINAL.pdf
http://www.microway.com/download/whitepaper/NVIDIA_Kepler_GK110_GK210_Architecture_Whitepaper.pdf

Why would a lack of "out of order" compute, coupled with the lack of error correction, matter? One answer to this... LATENCY.

Therefore you have latency (the inability to process "out of order" with error correction, leading to pipeline stalls) to thank for Maxwell 2's lack of compute efficiency relative to Hawaii when async shading is in use. If compute performance becomes the bottleneck in a DirectX 12 title, the GCN 1.1 290 series and GCN 1.2 will match (or, in the case of Fiji, beat) Maxwell 2 in frames per second.

Hope that answers all of your questions relative to what we're seeing in the Ashes of the Singularity benchmarking results.

Good day :)
 
I hear ya :) I'm looking forward to Fable Legends the most. I also can't wait for Multi-Adapter support.
 
I saw your exact same post on the OCN forum. I am sure there were already plenty of discussions regarding your theory, so why bother posting the same stuff here unless you have some kind of agenda :rolleyes:
 
Dangit man, I just want to look at nice visual graphs and see longer bars and bigger numbers for team Green!

Ok, got through the reading and your OCN post, which is slightly different. Thanks for the explanation. I can see why AMD pushed Mantle after it solidified its hold on the console market. With so many games being console-ported or mixed, the majority of developers are going to end up using DX12 and newer engines, which are a lot friendlier to AMD's parallel-heavy GPU architecture.
 
I saw your exact same post on the OCN forum. I am sure there were already plenty of discussions regarding your theory, so why bother posting the same stuff here unless you have some kind of agenda :rolleyes:

Of course he has an agenda.. he's also on Anandtech and Hexus and everywhere else.. xD he's spreading posts more than a Jehovah's Witness going door to door.. on every forum he has an extremely low post count.. I think the highest is on Anandtech with only 15 posts.. of course he's a paid guy..
 
Of course he has an agenda.. he's also on Anandtech and Hexus and everywhere else.. xD he's spreading posts more than a Jehovah's Witness going door to door.. on every forum he has an extremely low post count.. I think the highest is on Anandtech with only 15 posts.. of course he's a paid guy..

Of course.

Anyone who doesn't agree with you MUST have an agenda

:rolleyes:.
 
Dangit man, I just want to look at nice visual graphs and see longer bars and bigger numbers for team Green!

Ok, got through the reading and your OCN post, which is slightly different. Thanks for the explanation. I can see why AMD pushed Mantle after it solidified its hold on the console market. With so many games being console-ported or mixed, the majority of developers are going to end up using DX12 and newer engines, which are a lot friendlier to AMD's parallel-heavy GPU architecture.

I'm concerned with tackling the widespread ignorance I see around the tech forums. Is there a problem with providing clear, concise and factual information? My agenda is the truth.
 
No offense,

I see very little actual information tackling what we're seeing in the benchmark, and a heck of a lot of partisan drivel in those threads.

Yet you started the thread with a sensationalist title, inviting all the zealots lol
 
Of course.

Anyone who doesn't agree with you MUST have an agenda

:rolleyes:.

So, a recently made user account with the exact same name on every tech forum out there, with a max post count of 15, spreading the exact same information, with almost exactly the same lines in every reply and the exact same "sources" — that's typical behavior? Oh come on, don't be obtuse, Mac. I never said I disagree with the information; I just said the guy has an agenda and is a paid user creating this kind of post to defend something specific. Again, I never said I agree or disagree with this post; in fact, I would have made the same statement if the post went the other direction, toward the green camp.
 
The info seems perfectly valid to me.

If it were factually incorrect, I might agree with you.

Lots of people crosspost on different forums; I guess I'm just not as sensitive to it.
 
The info seems perfectly valid to me.

If it were factually incorrect, I might agree with you.

Again, I never said I agree or disagree with the info provided. You want my point? Yes, I agree with 80% or so of the original post; it's mostly valid and true. Happy? But that doesn't erase the fact that the guy has an agenda, spreading the exact same information everywhere. Nobody does that without a reason, especially with such recently made accounts on every forum out there.
 
So, a recently made user account with the exact same name on every tech forum out there, with a max post count of 15, spreading the exact same information, with almost exactly the same lines in every reply and the exact same "sources" — that's typical behavior? Oh come on, don't be obtuse, Mac. I never said I disagree with the information; I just said the guy has an agenda and is a paid user creating this kind of post to defend something specific. Again, I never said I agree or disagree with this post; in fact, I would have made the same statement if the post went the other direction, toward the green camp.

So maybe he is wanting discussion, or people to fact check him? Instead of attacking him for an agenda, why not discuss what he is talking about?
 
Of course he has an agenda.. he's also on Anandtech and Hexus and everywhere else.. xD he's spreading posts more than a Jehovah's Witness going door to door.. on every forum he has an extremely low post count.. I think the highest is on Anandtech with only 15 posts.. of course he's a paid guy..

Why don't you fire off an email to both AMD and nVIDIA and ask them to take down their white papers if they're erroneous? But of course they're not erroneous. Therefore one can only conclude that your attitude towards the information I have presented is tied to an emotional attachment you have for a particular Corporation. If anyone is attempting to do word-of-mouth marketing, for an organization, then clearly that would be yourself. I mean if you were interested in the truth then a post which is based on the architectural differences between Kepler, Maxwell/2, GCN 1.1 and GCN 1.2 would be of great interest to you. No?

Rather than resort to conspiracy theories and personal attacks, why don't you explain the results we are seeing?

If you're not interested in such a pursuit, then I can only conclude that you have no interest in truth, facts and objectivity. What concerns you is PR damage control, for your preferred Corporation, at any cost. Even if that means belittling others in your quest to "defend" your preferred brand.

You may not be paid, but you're definitely acting like a partisan hack.

I do have an agenda. My agenda is simple. Ending partisanship by spreading the truth. Ending the disgrace that is now the PC Gaming community by showing it where it has lost its way. I'm also interested in moving people away from benchmarks and game reviews, for a bit, and instead explaining the various GPU architectures in detail and how that translates to purchasing decisions. The whole benchmarking model has led to Tech websites being beholden to hardware makers for exclusivity upon launches. That means the "information" you're being fed is the marketing info alone, without any idea as to how this translates into present and future gaming expectations. Back in my day... we had intelligent, if at times heated, discussions. We were informed. Now it appears that partisanship is what drives most PC Gaming community members. You will be seeing more of me. I have a website in the works. As each new DirectX 12 title is released, I will be explaining what is happening behind the scenes. I will force the partisanship out of this community. Don't believe me? Just watch me.
 
Why don't you fire off an email to nVIDIA and ask them to take down their white papers if they're erroneous? But of course they're not erroneous. Therefore one can only conclude that your attitude towards the information I have presented is tied to an emotional attachment you have for a particular Corporation. If anyone is attempting to do word-of-mouth marketing, for an organization, then clearly that would be yourself. I mean if you were interested in the truth then a post which is based on the architectural differences between Kepler, Maxwell/2, GCN 1.1 and GCN 1.2 would be of great interest to you. No?

Rather than resort to conspiracy theories and personal attacks, why don't you explain the results we are seeing?

If you're not interested in such a pursuit, then I can only conclude that you have no interest in truth, facts and objectivity. What concerns you is PR damage control, for your preferred Corporation, at any cost. Even if that means belittling others in your quest to "defend" your preferred brand.

You may not be paid, but you're definitely acting like a partisan hack.

I do have an agenda. My agenda is simple. Ending partisanship by spreading the truth. Ending the disgrace that is now the PC Gaming community by showing it where it has lost its way. I'm also interested in moving people away from benchmarks and game reviews, for a bit, and instead explaining the various GPU architectures in detail and how that translates to purchasing decisions. The whole benchmarking model has led to Tech websites being beholden to hardware makers for exclusivity upon launches. That means the "information" you're being fed is the marketing info alone, without any idea as to how this translates into present and future gaming expectations. Back in my day... we had intelligent, if at times heated, discussions. We were informed. Now it appears that partisanship is what drives most PC Gaming community members. You will be seeing more of me. I have a website in the works. As each new DirectX 12 title is released, I will be explaining what is happening behind the scenes. I will force the partisanship out of this community. Don't believe me? Just watch me.

If you are interested in doing that, then make your own blog. Making threads on every single tech site forum posting the same stuff does not make you look credible.
 
Why, because you aren't interested in discussion?

He seems perfectly credible to me.
 
If you are interested in doing that, then make your own blog. Making threads on every single tech site forum posting the same stuff does not make you look credible.

So then each forum links to the blog and... same thing happens? What is wrong with having a discussion in multiple places? Many people only visit one if any of the forums.
 
Go to B3D and this post will get debunked fast; actually, it already has. This guy doesn't know how these things work, and it's pretty simple to see that.
 
Go to B3D and this post will get debunked fast; actually, it already has. This guy doesn't know how these things work, and it's pretty simple to see that.

This benchmark has already been debunked several times. It keeps getting reposted as a desperate attempt to spread FUD.

When actual games get released that use DX12, I bet you will see a different tune being sung.
 
Why don't you fire off an email to both AMD and nVIDIA and ask them to take down their white papers if they're erroneous? But of course they're not erroneous. Therefore one can only conclude that your attitude towards the information I have presented is tied to an emotional attachment you have for a particular Corporation. If anyone is attempting to do word-of-mouth marketing, for an organization, then clearly that would be yourself. I mean if you were interested in the truth then a post which is based on the architectural differences between Kepler, Maxwell/2, GCN 1.1 and GCN 1.2 would be of great interest to you. No?

Rather than resort to conspiracy theories and personal attacks, why don't you explain the results we are seeing?

If you're not interested in such a pursuit, then I can only conclude that you have no interest in truth, facts and objectivity. What concerns you is PR damage control, for your preferred Corporation, at any cost. Even if that means belittling others in your quest to "defend" your preferred brand.

You may not be paid, but you're definitely acting like a partisan hack.

I do have an agenda. My agenda is simple. Ending partisanship by spreading the truth. Ending the disgrace that is now the PC Gaming community by showing it where it has lost its way. I'm also interested in moving people away from benchmarks and game reviews, for a bit, and instead explaining the various GPU architectures in detail and how that translates to purchasing decisions. The whole benchmarking model has led to Tech websites being beholden to hardware makers for exclusivity upon launches. That means the "information" you're being fed is the marketing info alone, without any idea as to how this translates into present and future gaming expectations. Back in my day... we had intelligent, if at times heated, discussions. We were informed. Now it appears that partisanship is what drives most PC Gaming community members. You will be seeing more of me. I have a website in the works. As each new DirectX 12 title is released, I will be explaining what is happening behind the scenes. I will force the partisanship out of this community. Don't believe me? Just watch me.

LoL hahaha, you are seriously butthurt, right? Just because people know you have an agenda and you've already been debunked =) Oh nice.. yes, I hope to see more of you soon..
 
This benchmark has already been debunked several times. It keeps getting reposted as a desperate attempt to spread FUD.

When actual games get released that use DX12, I bet you will see a different tune being sung.
The benchmark for AotS is probably valid, but async shader performance is sensitive to how the shaders are written for the architecture (much more so than with serially written shaders); there are many variables involved.

He is incorrect in many of his assumptions about how the GCN and Maxwell 2 architectures work, from the ALUs, ACEs, AWSs, ROPs and hull shader units to many more.

GCN and Maxwell 2 ALU structures are very different: Maxwell 2 does dual issue, while GCN can do 4- or 5-way co-issue. Maxwell 2 has 32 async warp schedulers, which he didn't account for; in this regard Maxwell 2 should be more capable than GCN with its 8 ACEs (if we follow the OP's logic), but again, as above, it depends on how the shaders are written. He has lumped GCN ACE performance in with API overhead due to draw calls (from another post on another forum; I have only glanced over this current post of his): one mistake. ACEs and AWSs don't interact with the CPU; they were made to reduce latency by doing out-of-order instructions within the GPU. If they had to interact with the CPU, that would defeat the purpose of latency reduction.

He has linked to Kepler's white paper as an example of Maxwell 2's HyperQ workflow, but Maxwell 2's HyperQ workflow is very different, because the individual units are different: another mistake. Maxwell 2's HyperQ does work on grids, with child grids for compute if the parent grid needs one, and does so in serial; this is the SAME for GCN. Within the grids is where ACEs and AWSs are "asynchronous"; if they weren't, you would have rendering issues and other problems. Think of it as a critical-path problem: to go from X to Y you have four operations, A, B, C and D (C is a child set for A, and D is a different grid that must be used for B to complete). Each of A, B, C and D can be done separately, but C must finish for A to complete, and D must finish first for B to complete. Within each set, work can be done in any order, so some parts of A can run concurrently with C, just as parts of B can run while D is being done. Once C and D are done, A and B complete, and then X can become Y using the results of A and B.
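The A/B/C/D example is just a small dependency graph (C before A, D before B, both before the final result Y). A sketch using Python's standard-library `graphlib` to show which tasks may run concurrently in each "wave":

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Dependencies from the example: A needs C, B needs D, Y needs A and B.
deps = {"A": {"C"}, "B": {"D"}, "Y": {"A", "B"}, "C": set(), "D": set()}

ts = TopologicalSorter(deps)
ts.prepare()
waves = []
while ts.is_active():
    ready = list(ts.get_ready())   # everything in 'ready' may run concurrently
    waves.append(sorted(ready))
    ts.done(*ready)

print(waves)   # [['C', 'D'], ['A', 'B'], ['Y']]
```

C and D can overlap, then A and B can overlap, but Y cannot start until both are done: serial across the dependency chain, asynchronous within each wave.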

There is another mistake regarding the execution of compute and graphics computations vs. execution in queue; those are two different things. The first is actually done in real time; the second is storage of work for computation as ALUs get freed up. Maxwell 2's execution in queue depends on whether register space and cache are available to store work. I'm not sure whether GCN has a variable amount due to the same limitation, but AMD stated a set amount of 64. Anandtech's table is correct in what it stated, not incorrect as the OP claimed, so any conclusions he drew from that are incorrect.

ROPs don't control tessellation performance; the hull shaders do, and how often they are fed when they are bottlenecked by the amount of procedural geometry being created. That's another mistake, which he posted on another forum; it seems he is learning not to lump all his conclusions into one.

Microsoft's specifications were carefully written to reduce pressure on the tessellation pipeline going from DX10 to DX11, because in DX10 the geometry shaders were bottlenecked by the shader array: the newly created geometry had to be sent back into the shader array. This is from another of his posts about async shaders on another forum.

These are just a few of the mistakes; there are many more. I would suggest that the best way to understand these things is to go through the CUDA and OpenCL handbooks from each respective IHV to get a better understanding of async shaders and how they can work optimally on each architecture.

I just posted a paraphrased version of how and why async shaders work on B3D, so let's see where it goes. The rest of what I posted is easily googleable as white papers or found in the handbooks mentioned above.
 
LoL hahaha, you are seriously butthurt, right? Just because people know you have an agenda and you've already been debunked =) oh nice.. yes, I hope to see more of you soon..

Debunked where?

The only one with an agenda here are people like you who are attacking the OP instead of trying to have an actual discussion here.
 
Wow, the thread crap was a success, it seems. I really don't care what Nvidia cards can do, as I will not give them a dime... getting burned once by a laptop with their defective junk was enough for me.
 
The benchmarks speak for themselves; I don't get the squabbling over it.

The real question for AMD isn't whether Hawaii matches or defeats Maxwell; it's whether Arctic Islands will hold the performance crown next year, and whether Pascal will have enough improvements to compete in DX12.
 
What I'm still not seeing here is why we should care about obscure results from a pre-beta (meaning it's so far from anything resembling a stable game that it's not even in beta yet). Even then, if you actually read about the goals of the game, it's very obvious that this is not a typical game, so I don't see how it could be directly representative of general DX12 performance.
 
Debunked where?

The only one with an agenda here are people like you who are attacking the OP instead of trying to have an actual discussion here.

Read razor1's post... and you will start to draw better conclusions, plus a couple of other things razor1 didn't even mention.

And btw, I do not waste my time discussing with people like Flopper, Fixedmy7970 and wantapple, who are blind AMD fanboys, or the recently added OP. Even you are biased toward AMD, though I think more reasonable.. a lot of people who were AMD defenders bought Nvidia cards and suddenly stopped being Nvidia haters and blindly defending AMD.. and again, I will not waste my time discussing with people with an agenda.. ;)

I have cards from both camps; in fact, I just bought a 390X that I'm waiting to have delivered. I don't have a reason to buy that card, and I also have no reason to own both AMD FX chips, but I'm one of those who likes to support AMD because I don't want a market without competition, or a full monopoly.
 
the benchmarks speak for themselves, I dont get the squabbling over it.

This. I couldn't give two shits if there's a tiny monkey inside my video card banging rocks together that magically makes frames appear on my screen. If that tiny blessed magic monkey can make those frames appear faster than the other guy's card, that's what I'm buyin'!
 
Well, I was on vacation, so I didn't respond for the past week and a half, but if you want I can do that later. I was using it to detect GPU cores for bindless textures in a program that my company is using.
 
Green forums trolls want to make sure this discussion doesn't get any further than "you are a paid shill" and "haha AMD on damage control".
I won't say that razor1 is right or wrong, but why not at least be like him and discuss the topic at hand?
Acting so childishly makes your intentions shine through as clear as daylight.
 
Green forums trolls want to make sure this discussion doesn't get any further than "you are a paid shill" and "haha AMD on damage control".
I won't say that razor1 is right or wrong, but why not at least be like him and discuss the topic at hand?
Acting so childishly makes your intentions shine through as clear as daylight.

If the only thing AMD has going for them is a benchmark for an unreleased game from a company they have paid in the past, it's no surprise their market share is fast approaching zero.

/thread
 