FXAA Performance - not as good as MSAA on lower-end GPUs?

Medion

[H]ard|Gawd
Forgive me for not posting this in the FXAA quality comparison thread, but this is more for performance comparisons.

On my desktop, I noticed a huge difference in performance between 4xMSAA and FXAA. This was further corroborated by posts here, elsewhere, and [H]ardOCP's in-depth performance comparison. However, I enabled FXAA on my laptop for World of Warcraft (C2D 2.4GHz, GeForce GT 130M 1GB) and the performance seemed a little off, so I ran some comparisons. Without going into too much detail, across all tests the ONLY differences were the AA settings. However, I'm not a professional, so take this with a grain of salt. Results are given as min/max - avg frame rates.
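If you want to crunch your own numbers the same way, here's a quick Python sketch that pulls min/max/avg from a per-frame time dump. The file name, format, and fps_stats function are just placeholders, not the exact tool I used; adjust for whatever your capture utility actually writes out.

```python
# Rough sketch: pull min/max/avg FPS out of a per-frame time dump.
# Assumes a plain text file with one frame time in milliseconds per line;
# "frametimes.txt" is just a placeholder name.
def fps_stats(path):
    with open(path) as f:
        frame_ms = [float(line) for line in f if line.strip()]
    fps = [1000.0 / ms for ms in frame_ms]        # instantaneous FPS per frame
    avg = len(frame_ms) * 1000.0 / sum(frame_ms)  # average over the whole run
    return min(fps), max(fps), avg

lo, hi, avg = fps_stats("frametimes.txt")
print(f"{lo:.0f}/{hi:.0f} - {avg:.3f}")           # same min/max - avg format used below
```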

No AA (baseline)
79/133 - 110.133
Framerates seem pretty high, but I had most of the settings dialed down lower than I'd play it (and vsync off of course). For this comparison I wanted to remove some bottlenecks.

4x MSAA (in-game setting)
60/87 - 71.95
Game is still more than playable despite the high performance hit.

FXAA (NV CP forced)
47/79 - 56.95
While still very playable, performance takes a brutal hit, not only over baseline, but in comparison to 4xMSAA as well. Due to this, I decided to try one more run for comparative purposes...

4xMSAA (in-game) + TrSSAA (NV CP forced)
61/91 - 72.933

You've got to be shitting me!? No performance hit over 4xMSAA? Well, WoW isn't a VRAM-intensive game, and SSAA uses VRAM more than memory bandwidth, so this makes sense. Bottom line is that in this case, you can get better visual quality AND performance by NOT using FXAA. So, why is this? Well, since FXAA is shader-based while MSAA is memory bandwidth intensive, obviously memory bandwidth wasn't the bottleneck here. There are a lot of cheap GPUs out there (like mine) that come with larger amounts of RAM, sometimes even GDDR5, just to make them more appealing to a casual gamer. In these cases, the cards will be bottlenecked far before they ever need that amount of memory or the associated bandwidth. So, that GT 640 that just came out might suffer with either FXAA or MSAA. But if there is a GDDR5 version released, we can pretty much expect MSAA to be fine and FXAA to hurt performance more.

I'm going to run these tests tomorrow on my desktop when I get it back up and running (waiting on my monitor, which should arrive then). In the meantime, it seems FXAA wasn't the performance silver bullet we all thought it was. It still takes resources that may or may not be more limited than memory bandwidth. If you have a newer, mid-range or higher GPU, FXAA will still likely consume far fewer resources than MSAA. For budget cards, though, FXAA may be a waste.
 
Makes sense that a shader effect like FXAA would be crippled on a GPU with a very low shader count. The GT640 with its 384 shaders will probably show much different results.
 

Probably. While Kepler's shader units are supposedly less capable than Fermi's (and prior generations), my GT 130M only has 32 shader units, or CUDA cores. The 384 on the GT640 should offer significant improvements. Also, it seems the 1GB of RAM on my GT 130M is actually only 500MHz DDR2!
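For the curious, here's the napkin math on what that memory is actually good for. The 500MHz DDR2 is from the spec readout; the 128-bit bus width is an assumption on my part, I haven't confirmed the GT 130M's actual bus.

```python
# Napkin math: effective memory bandwidth = clock x transfers-per-clock x bus width.
# 500 MHz DDR2 comes from the spec readout above; the 128-bit bus is an assumption.
memory_clock_hz = 500e6   # 500 MHz DDR2
ddr_factor = 2            # two transfers per clock
bus_width_bits = 128      # assumed, not confirmed for the GT 130M
bandwidth_gb_s = memory_clock_hz * ddr_factor * bus_width_bits / 8 / 1e9
print(f"~{bandwidth_gb_s:.0f} GB/s")  # ~16 GB/s with these numbers
```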

Kind of wish I still had the 9600GT to test this out. Wonder how it would look with 64 shader units and GDDR3 memory in this comparison. Since it only had 512MB of RAM, the TrSSAA always did have a noticeable performance hit.
 
Finally got around to testing it on my main system: Core 2 Quad Q6600 @ 2.4GHz, EVGA GeForce GTX 560 OC 1GB, 4GB RAM. Basically, not a very impressive system but more than adequate for my needs. Did the same test in World of Warcraft, then threw in the Batman: AA benchmark for something more consistent. Results below (again, min/max - avg):

World of Warcraft - all ultra/maxed out settings, 1080p
No AA - 39/85 - 58.867
FXAA (NV CP forced) - 37/86 - 58.067
4xMSAA (in-game) - 42/69 - 55.5
4xMSAA + TrSSAA - 38/75 - 54.733
16xCSAA + TrSSAA - 37/72 - 53.417

In the above "benchmark" I took what is a somewhat demanding flight path. Min/max framerates were inconsistent due to things outside of my control, but the average framerates told an easy-to-understand story, which is that neither shading nor memory bandwidth is really taxed in WoW. As I'm primarily CPU limited, performance differences were minor in all cases.

Batman: Arkham Asylum - 1080p, all settings maxed, including PhysX
No AA - 19/95 - 59
FXAA (NV CP forced) - 25/86 - 56
4xMSAA - 28/90 - 56
4xMSAA + TrSSAA - 26/86 - 55
16xCSAA + TrSSAA - 18/86 - 56

Likely another case of being CPU limited, as the results were similar throughout. So, what do my benchmarks prove, aside from me needing a new CPU? I feel that [H]ardOCP's results show what can happen when you're running higher-end games on higher-end hardware at ultra-high resolutions. In those cases, you become limited by memory and memory bandwidth, which results in MSAA causing a massive performance hit, and since you're not starved for shading power, FXAA will yield better performance. However, dial it back a notch to a system where memory bandwidth and available VRAM aren't issues, and MSAA usually has a lower or comparable performance hit to FXAA.

FXAA looks better than no AA, but it's a huge tradeoff. I've found 4xMSAA (or 16xCSAA if it works) + TrSSAA to offer SIGNIFICANTLY better visual quality while still being solid for performance. It's just not a good combo for those who want to drive 120Hz monitors on one GPU :)
 
One thing to understand as well is that the injection approach to FXAA (which is essentially what NVIDIA does) doesn't yield particularly good results. Developers who are able to implement FXAA directly into their renderer can do so at the stage where other full-frame post-processes are being done and spare the considerable performance hit. In some cases, depending on the extent of the post-processing and the GPU, FXAA can be free or very nearly free. When no full-frame post-process effects are being done, FXAA performance can be worse than 4xMSAA performance even on higher-end-ish GPUs.
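To put a rough number on "nearly free": the piece an injected pass always pays for is its own extra full-frame read and write (plus the pass overhead), on top of the FXAA shader math that happens in either setup. A toy calculation, assuming a 1080p RGBA8 color target; the numbers are purely illustrative:

```python
# Toy illustration: extra frame traffic a standalone, injected FXAA pass adds,
# assuming a 1920x1080 RGBA8 color target (illustrative numbers only).
width, height, bytes_per_pixel = 1920, 1080, 4
frame_bytes = width * height * bytes_per_pixel
extra_traffic = 2 * frame_bytes          # one extra full-frame read plus one write
print(f"~{extra_traffic / 2**20:.1f} MB extra per frame")       # ~15.8 MB
print(f"~{extra_traffic * 60 / 1e9:.2f} GB/s extra at 60 fps")  # on top of the FXAA shader work itself
```

Fold FXAA into a pass that's already reading and writing the whole frame and that extra traffic and pass overhead largely go away, which is the point of doing it at the same stage as the rest of the post-processing.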

If you like FXAA and want the best performance from it, reach out to developers and ask them to implement it as Lottes suggests.
 