Why do NVIDIA cards fold so much better than ATI?

NVIDIA and ATI cards both have what we refer to as shaders; however, NV cards do their computing differently than ATI cards do, hence the much larger shader counts on ATI cards versus NVIDIA cards.

The client for the NV cards is just more efficient and better written. At the end of the day, despite all the differences between the two architectures, it comes down to the NV client being better written than the ATI one.

We can only hope this gets fixed with the rumored release of the next GPU core in the coming months, but until then NV will stay on top.

That's the layman's version; I'm sure someone could go more in depth.
 
Much better than your flawed guess ;) :
http://forum.beyond3d.com/showthread.php?t=50539
 
It's not a flawed guess at all. He is right in saying that the difference is due to the ATI client being poorly optimized. It was designed for the HD 2000 and HD 3000 series and hasn't been updated to take advantage of the architectural improvements in the 4000 and 5000 series, which could lead to significant increases in performance. They are also still using Brook+ and haven't transitioned to OpenCL yet; that should give another boost in performance once it's fully supported on ATI cards, simply because Brook+ seems to be much more difficult to work with.

So, to make a long story short, it's basically because the resources and manpower needed to properly optimize the ATI client haven't been dedicated to the task, either because the project doesn't have them or because they are needed elsewhere.
 
Here's why:
ATi has several big, traditional brute-force processing units (for lack of a better word) and lots of tiny MADD (math) units. All of them are counted as shaders.

nVidia also has many brute-force processing units, but its MADD units just aren't counted as shader cores.
That is why you always have to divide the ATi SP count by 5 to get a better reading of the actual usable power (the way nVidia counts it).
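As a back-of-the-envelope illustration (the cards and counts below are just examples I picked, an HD 4870 against a GTX 280, and the divide-by-5 is the rule of thumb above, not an official spec):

    /* Rough comparison of "shader" counts as described above.  Example
       figures: HD 4870 (800 SPs) vs GTX 280 (240 shaders). */
    #include <stdio.h>

    int main(void)
    {
        int ati_sp_count    = 800;  /* ATI counts every little ALU as an SP   */
        int nv_shader_count = 240;  /* NVIDIA counts its scalar shaders only  */

        /* ATI groups its SPs five wide, so divide by 5 to get a number
           that is closer to how NVIDIA counts shader cores. */
        int ati_comparable = ati_sp_count / 5;  /* 800 / 5 = 160 */

        printf("ATI: %d comparable units vs NVIDIA: %d shaders\n",
               ati_comparable, nv_shader_count);
        return 0;
    }

With that adjustment the raw numbers end up in the same ballpark, which is why the headline SP figures alone don't tell you much.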

Simply put, the nVidia architecture is just faster here, and the ATi cards (dunno about the current generation) are slower.

How DX11/DirectCompute may change this, we don't know. Right now the new ATi architecture is actually limited by the current GPU client, and DX11 adds two new shader stages (on top of Geometry, Pixel, and Vertex), so that may change things - however, I'm not betting on it. Based on the limited performance gain over what's already out there, and ATi's stated concentration on graphics, I still believe any current nVidia GPU will probably outdo even an optimised ATi client, unless F@H can be made to run like a video game (resource-wise) rather than as a parallel-processing application.

Dunno if that made sense.
 
Also, didn't ATi shift off of the Brook+ a while back? I thought it died with the R600 generation?
 
The "poorly optimized client" argument is just a lame excuse for an architectural problem:
http://theovalich.wordpress.com/2008/11/04/amd-folding-explained-future-reveale/
http://foldingforum.org/viewtopic.php?f=51&t=10442&start=0#p103025

Like I have said before...NVIDIA is generations ahead of AMD on GPGPU.
 
You didn't say that; you linked to a flame war on B3D (Charlie, that ahole, is believed to get most of his info from there).

Besides, I think my earlier post describes it clearly (no love lost ;)).
 
The links can be found in my first link...but I guess people are getting more and more lazy...
 
I think it's funny that you would even imagine that you have read more about this subject than I have, and that you assume there are any links you can post that I haven't already seen and read long ago.

The fact is that nothing you have posted contradicts my statements in any way. You are misinterpreting the currently flawed implementation of the F@H GPU client on ATI video cards as an issue with the architecture, when in reality it is merely a matter of the client not being designed to take advantage of the strengths of ATI's different GPU design.

As evidenced by the fact that ATI GPUs take less of a performance hit when working on larger proteins, ATI's architecture actually has more brute-force power than nVidia's, since most of the calculations in question are simple multiply-add (MADD) operations rather than the transcendental operations that only 20% of ATI's SPs can handle. The problem is that the ATI cores are still stuck in "R600 mode", as some call it, and the client doesn't make use of the LDS (Local Data Share) added in RV770, which would mitigate many of the "calculate twice" issues that currently plague it.
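To illustrate the MADD point, here is a toy sketch of an MD-style pairwise force loop. It's my own CUDA example with a simplified force law and a made-up softening constant, not the actual F@H core, but the shape of the work is the same: almost everything in the inner loop maps to multiply-adds, with only one special-function call (the reciprocal square root) per interaction.

    // Toy MD-style inner loop (NOT the F@H core): each pairwise interaction
    // is a pile of multiplies and adds that the GPU issues as MADDs, plus a
    // single special-function op (rsqrtf).  pos[i].w holds a per-particle
    // coefficient; the 1e-6f softening term just avoids dividing by zero.
    __global__ void pairwise_forces(const float4 *pos, float4 *force, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float4 pi  = pos[i];
        float3 acc = make_float3(0.0f, 0.0f, 0.0f);

        for (int j = 0; j < n; ++j) {
            float4 pj = pos[j];
            float dx = pj.x - pi.x;
            float dy = pj.y - pi.y;
            float dz = pj.z - pi.z;
            float r2  = dx * dx + dy * dy + dz * dz + 1e-6f;  /* MADDs      */
            float inv = rsqrtf(r2);                           /* special op */
            float s   = pj.w * inv * inv * inv;
            acc.x += dx * s;                                  /* more MADDs */
            acc.y += dy * s;
            acc.z += dz * s;
        }
        force[i] = make_float4(acc.x, acc.y, acc.z, 0.0f);
    }

That mix (almost all MADDs, very few transcendentals) is exactly what ATI's 4+1 shader groups are good at on paper, which is why I say this is a software problem rather than a hardware one.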

Another issue is that the system used to benchmark ATI workunits uses an RV670 GPU, so the points allocation is also geared towards people with those cards rather than the newer RV770 and RV870 GPUs, which are much improved when it comes to GPGPU applications.
 
Oops!

I really need to get data on the new ATi GPUs. Guess I'll hold off on buying that GTS 250 for now.
 
No amount of "optimizing" drivers/software can create the cache that AMD GPUs lack and NVIDIA GPUs have... which means that AMD GPUs have to do more work (because they can't store the data) compared to NVIDIA GPUs, like it or not.
 
The problem isn't that ATI GPUs can't store "enough" data; it's that they aren't storing "any" data at all right now, since F@H doesn't use the LDS. A single step of a single GPU workunit doesn't require a particularly large amount of storage, especially not with the small proteins currently used for most of the workunits in the wild. Each SIMD engine (a group of the 5-wide shader units) has a 16 KB LDS on RV770 and 32 KB on RV870, which is more than enough to give a significant boost to overall work production speed.
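To make the "not storing any data" point concrete, here is what the missing tiling looks like. It's a tiled version of the toy force loop from my earlier post, written in CUDA with __shared__ memory standing in for ATI's LDS; the kernel name, tile size, and simplified math are all my own illustration (host-side setup omitted), not taken from the F@H client.

    // Toy tiled version of the pairwise loop (again, my own sketch, not F@H).
    // __shared__ memory plays the role of ATI's LDS: each position is read
    // from off-chip memory once per tile, then reused by every thread in the
    // block.  Launch with blockDim.x == TILE, or rely on the bounds checks.
    #define TILE 128

    __global__ void pairwise_forces_tiled(const float4 *pos, float4 *force, int n)
    {
        __shared__ float4 tile[TILE];

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float4 pi  = (i < n) ? pos[i] : make_float4(0.f, 0.f, 0.f, 0.f);
        float3 acc = make_float3(0.f, 0.f, 0.f);

        for (int base = 0; base < n; base += TILE) {
            // one coalesced off-chip load per thread fills the tile
            int j = base + threadIdx.x;
            tile[threadIdx.x] = (j < n) ? pos[j] : make_float4(0.f, 0.f, 0.f, 0.f);
            __syncthreads();

            // every thread reuses the whole tile from fast on-chip memory
            for (int k = 0; k < TILE; ++k) {
                float4 pj = tile[k];
                float dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
                float r2  = dx * dx + dy * dy + dz * dz + 1e-6f;
                float inv = rsqrtf(r2);
                float s   = pj.w * inv * inv * inv;
                acc.x += dx * s; acc.y += dy * s; acc.z += dz * s;
            }
            __syncthreads();
        }
        if (i < n) force[i] = make_float4(acc.x, acc.y, acc.z, 0.f);
    }

The same trick maps onto RV770's per-SIMD LDS; the on-chip storage is there, the client just has to stage data in it instead of going back to memory (or recalculating) every time.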
 
Thank you for defending me, Zero. I couldn't think of anyone I'd rather have on my side :)

/hiding now
 