GF100 (Fermi) explained

I am going to disagree with you there. DX11 does actually bring something to the table that justifies it (I'll agree with you on 10; 10.1 could have if everyone had gone with it). I agree that a lot of companies are going to focus on the larger game markets, but trends come and go and console platforms do not stay stagnant. Tessellation alone (I think some already use it?) makes it a candidate for the next Xbox. Also DirectCompute, OpenCL and the still viable PC game market are going to be pushing this. Probably not this coming year, but with Larrabee and other factors I think it is coming. OpenCL is awesome, but if it ends up like OpenGL, MS will be setting the standard I think.

Also I think right now we are looking at a case where PC gaming is going to distinguish itself from console gaming. Hell, look at Eyefinity; there is no console that can do that. And Nvidia's upcoming part may well be able to do all kinds of special effects (I am hoping for a working AO driver override). To say that PC gaming is dead is short-sighted, I think.

JM2C


I don't disagree that DirectX 11 will be important for the next generation of consoles. But until then it won't have any impact. No PC game is going to make full use of it. Nvidia loses nothing in actuality by not having mainstream DirectX 11 parts until they get into a console, sad but true.

The biggest challenge for Nvidia is not Intel, it's not AMD, it's Microsoft, who has single-handedly taken apart the PC gaming space. For the first time I can remember, PC game hardware is so far ahead of PC games it's insane. Except for the e-penis there is no reason to upgrade if you already have an 8800 GTX-level or higher card.

AMD is not wrong either; making low-cost, efficient cards is also one way to go. Your business won't grow, but it's stable, and given AMD's position, stable is the way to go.

Nvidia, on the other hand, is very, very wealthy with no debt, so they should be looking for growth.

So I think both companies acted in their best long-term interest as best they could, based on imperfect information.
 
Thanks for the link ;) BTW GTX295=223.8GB/s
111.9GB/s to be exact. (Both GPUs have their own memory banks, but the data in those banks is always identical... also the data sent through their own memory channels is therefore identical.)
--->
111.9GB/s is the effective memory bandwidth for the GTX 295
 
To all the people who don't want to wait: nobody is asking you to. Go buy what you need now. So what's the point of the same 2 or 3 people polluting every Fermi thread with their noise and BS? Is it out of fear? Do you fear Fermi? :p
 
To all the people who don't want to wait: nobody is asking you to. Go buy what you need now. So what's the point of the same 2 or 3 people polluting every Fermi thread with their noise and BS? Is it out of fear? Do you fear Fermi? :p

3 billion transistors are not something that should be ignored ;)
 
I think the numbers NVIDIA's provided so far are intriguing. It's impossible to gauge whether GF100 will make a great gaming chip (or at least as good a gaming chip as RV870, which would be fine with me), but the supposed increase in compute performance is pretty exciting, to say the least.

My only concerns at this point center on the more practical aspects: power consumption, heat output, cooler efficiency and cost. If NVIDIA does a reasonably good job covering those bases, I'm sold.
 
Not sure what to make of your first post Ron; pretty much everyone on this site knows specs alone do not make an excellent product. And to me it looks like the Nvidia fans are worried, because they seem to be referring less and less to games and focusing on other areas of the market. And if I was them I would be worried: within a year or two Intel will be up and running with their video cards. And NV has no x86 license to speak of.
 
Not sure what to make of your first post Ron; pretty much everyone on this site knows specs alone do not make an excellent product. And to me it looks like the Nvidia fans are worried, because they seem to be referring less and less to games and focusing on other areas of the market. And if I was them I would be worried: within a year or two Intel will be up and running with their video cards. And NV has no x86 license to speak of.
If NVIDIA fans should be worried, then we should all be worried.
 
At this point the best thing Nvidia can do is to spend all their R&D resources on developing a time machine to go back to 1978 and sabotage Intel just before they introduced the 8086. That way they wouldn't have to deal with all this bullshit now.
 
Not sure what to make of your first post Ron; pretty much everyone on this site knows specs alone do not make an excellent product. And to me it looks like the Nvidia fans are worried, because they seem to be referring less and less to games and focusing on other areas of the market. And if I was them I would be worried: within a year or two Intel will be up and running with their video cards. And NV has no x86 license to speak of.

You're right, specs alone don't make an excellent product, but these specs do say a lot.

I'm sure you're right and some Nvidia fans are worried. Me? I know I have no control over what billion-dollar companies decide to do, so I just buy what I think is the best for my buck.

As far as "should" be worried, I guess time will tell. Overall, I think Nvidia has made better hardware for video cards than Ati, and typically the performance backs that statement up. It looks like they are shifting some R and D to other aspects, so logic would dictate they they may not stay on top with kick ass hardware (gaming wise), but I don't think I'm worried that they are going to be producing inferior gaming chips.

The license issue looks to be a thorn in their side for sure, but who knows? They could sink, they could become mediocre, things could get better because there will be three companies to choose from instead of two.

I just don't see a company that's done this well so far, in regards to hardware development and financial management, rolling over and dying.

But who knows....

cheers
 
Hmm, all the whining about x86 really makes no difference. Nvidia wants to replace the CPU, not become the new CPU. x86 is an architecture that is solely owned and licensed out by Intel, so in the end Intel wins a lot of money by allowing AMD to use it. Now the interesting part is that Intel owns x86 and AMD owns x86-64 (or AMD64, as it ends up being called), so the two companies are in part somewhat linked. So let's not dwell on how Nvidia could screw themselves and dwell more on how awesome this card could be.
 
I think the numbers NVIDIA's provided so far are intriguing. It's impossible to gauge whether GF100 will make a great gaming chip (or at least as good a gaming chip as RV870, which would be fine with me), but the supposed increase in compute performance is pretty exciting, to say the least.

What the GF100 can do in gaming hasn't been revealed by Nvidia yet. I'm sure the results will be good. :)

What I want to know is what the GF100 can bring us (consumers) with its new cGPU/GPGPU design. What can it do that cannot be achieved on the CPUs and GPUs we already have (for example via OpenCL and DX11)?

Intel (Larrabee) and AMD (Bulldozer) have been working on this for years. Nvidia has succeeded and will be the first one out. What should be on everyone's lips is: "What does it give us!?" How will it perform vs. having a separate GPU and CPU? :)
 
Wild thought

Maybe NV should make a cGPU for general purpose and a GPU for gaming, perhaps much like their current Quadro/GeForce separation, or they could go even more low-level in hardware with the separation.
 
What I want to know is what the GF100 can bring us (consumers) with its new cGPU/GPGPU design. What can it do that cannot be achieved on the CPUs and GPUs we already have (for example via OpenCL and DX11)?
Well, I have a feeling something like FLACuda might run up to five times faster on GF100 than it does on an i5 or i7. If that's the case, then there may be the capability to speed up more FP-heavy tasks (like video encoding) by a factor of maybe four or five times over the current generation. Developers really aren't scratching the surface of what CUDA can provide because the hardware we have today isn't usually faster than high-end CPUs, so there's not much incentive to bother with it. If that changes, and we see a two, three or four-fold increase in computational performance out of GF100, then the whole computing environment itself is likely to change. Not immediately, but probably pretty quickly.

We'll do the same stuff we're doing on CPUs, only we'll be doing it quicker.
 
Well, I have a feeling something like FLACuda might run up to five times faster on GF100 than it does on an i5 or i7. If that's the case, then there may be the capability to speed up more FP-heavy tasks (like video encoding) by a factor of maybe four or five times over the current generation. Developers really aren't scratching the surface of what CUDA can provide because the hardware we have today isn't usually faster than high-end CPUs, so there's not much incentive to bother with it. If that changes, and we see a two, three or four-fold increase in computational performance out of GF100, then the whole computing environment itself is likely to change. Not immediately, but probably pretty quickly.

We'll do the same stuff we're doing on CPUs, only we'll be doing it quicker.

Thanks. :) But OpenCL is capable of doing the same on, let's say, an i7 with a GTX 285. I'm sure this could be the case with CUDA on an i7 with a GTX 285 as well. What makes the GF100 better? (than Intel's top CPU coupled with a top standalone GPU like the GTX 285)

And won't any major program with GPGPU be run on more hardware-agnostic APIs like DX11 (DirectCompute) and OpenCL anyway?
 
(Attached image: 20091001fermi6.jpg)
 
111.9GB/s to be exact. (Both GPUs have their own memory banks, but the data in those banks is always identical... also the data sent through their own memory channels is therefore identical.)
--->
111.9GB/s is the effective memory bandwidth for the GTX 295

Sigh, no. Bandwidth scales. It's capacity that doesn't.
 
Probably, but not necessarily linearly. Didn't Nvidia reuse the same heatspreader for GT200 and GT200b?
 
Sigh, no. Bandwidth scales. It's capacity that doesn't.
If those memory channels carry that identical data, then how does that scaling happen? Those channels can transfer 223.8GB/s of data, but only 111.9GB/s of different data.
 
True but it's got to scale. You're telling me bigger, hotter chip =/= larger physical heatspreader?

Not necessarily; there are (as I understand it) both pros and cons to doing so. But I was also expecting bigger. They are claiming, though, that it will not be outside the power envelope of the current GTX 200 series; if true, then they really would not need a bigger one. That would be neat in itself if true.
 
If those memory channels carry that identical data, then how does that scaling happen? Those channels can transfer 223.8GB/s of data, but only 111.9GB/s of different data.

They do not carry identical data. Each GPU is rendering a different frame, with different geometry. And even if they were transferring identical data it would not affect the bandwidth calculation. The definition of bandwidth has nothing to do with whether the data is different or not. In any case that's irrelevant as different GPUs are doing different tasks in parallel, hence bandwidth is doubled.

The easiest way to understand this is to ask yourself - how much data would one GPU have to transfer to render two frames. That's the same amount of data that two GPUs transfer in parallel in half the time.
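For anyone who wants to sanity-check the arithmetic, here's a minimal sketch. The 448-bit bus per GPU and ~1998MT/s effective GDDR3 data rate are the commonly listed GTX 295 specs, assumed here rather than taken from this thread:

```python
# Back-of-the-envelope GTX 295 bandwidth math (specs assumed from the stock spec sheet).
bus_width_bits = 448        # memory bus width per GPU
data_rate_mtps = 1998       # effective GDDR3 transfer rate in MT/s (999 MHz DDR)

per_gpu_gb_s = (bus_width_bits / 8) * data_rate_mtps / 1000  # bytes per transfer * transfers per second
total_gb_s = per_gpu_gb_s * 2                                # both GPUs transfer in parallel

print(round(per_gpu_gb_s, 1))  # ~111.9 GB/s seen by each GPU
print(round(total_gb_s, 1))    # ~223.8 GB/s moved across the whole card per second
```

Each GPU still only sees its own 111.9GB/s pool, but because the two GPUs are working on different frames at the same time, the card as a whole really is moving ~223.8GB/s of data.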
 
GPU1 has 896MB with a 448-bit bus
GPU2 has 896MB with a 448-bit bus also

The data is replicated; they are rendering the same textures and shit, so everything gets loaded into both memories... but GPU1 only has access to its own memory pool, and the same goes for GPU2.
And that memory pool can only communicate at 111.9GB/s with the GPU it's associated with.

(this is as I understand it)
 
GPU1 has 896MB with a 448-bit bus
GPU2 has 896MB with a 448-bit bus also

The data is replicated; they are rendering the same textures and shit, so everything gets loaded into both memories... but GPU1 only has access to its own memory pool, and the same goes for GPU2.
And that memory pool can only communicate at 111.9GB/s with the GPU it's associated with.

(this is as I understand it)

That's incorrect. Trinibwoy is correct.

With multi-GPU, the data is different in each GPU's memory for SLI or Crossfire. It's as if you had the frames back to back on the same single GPU (different memory contents), but instead you have 2 GPUs and they are actually processed simultaneously.

Basically what you're saying is like saying dedicated caches for each arbitrary type of processor core are always kept identical to one another, which would defeat the purpose of having a dedicated cache for each core in the first place...
 
You mean the ATI fanboys are starting to sweat a little? The 5870 isn't even the fastest card out, and this card looks like it's going to eat it for breakfast. Maybe not, but with those specs... wow....

Then again, I might just get one of each card... Eyefinity is really, really amazing.

Do you understand GPU architectures? Because if you did you wouldn't hold this opinion.

GF100 has a peak theoretical computational performance figure of ~1.25TFLOPs for Single Precision and 624GFLOPs for Double Precision. (How did I get to this number, you ask? GT200 had a rate of 78GFLOPs for DP and nVIDIA's CEO was quoted saying GF100 has 8x that peak figure, for a total of 624GFLOPs. If you understand that GF100 does DP at half of the SP rate, then you understand that multiplying the 624GFLOP DP figure by two gives you the SP figure. You can catch that here: http://www.pcper.com/article.php?aid=789 at the bottom). This means the clock speed is likely to be ~610MHz with a Shader Clock of ~1220MHz.

Most folks do not know this, but GT200 (GTX 280) actually has a peak theoretical computational performance figure of 622GFLOPs (not 933GFLOPs, which was derived by counting the infamous "missing MUL" that has now been pulled out of the SFU entirely; see here: http://www.anandtech.com/video/showdoc.aspx?i=3651&p=3 Quote: "In addition to the cores, each SM has a Special Function Unit (SFU) used for transcendental math and interpolation. In GT200 this SFU had two pipelines, in Fermi it has four. While NVIDIA increased general math horsepower by 4x per SM, SFU resources only doubled. The infamous missing MUL has been pulled out of the SFU, we shouldn’t have to quote peak single and dual-issue arithmetic rates any longer for NVIDIA GPUs.").

RV870 has a peak computational performance rate of 2.72TFLOPs for Single Precision and 544GFLOPs for Double Precision.
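If it helps, here's the same back-of-the-envelope math written out as a sketch. The 512-core count for GF100, the 1600 stream processors at 850MHz for RV870, and the 1/5 DP rate on RV870 are the commonly reported figures, assumed here rather than confirmed anywhere in this thread:

```python
# Back-of-the-envelope FLOPS math for GF100 and RV870 (assumed figures noted above).

# GF100: work backwards from the quoted "8x GT200" double-precision figure.
gt200_dp_gflops = 78
gf100_dp_gflops = gt200_dp_gflops * 8             # 624 GFLOPs DP
gf100_sp_gflops = gf100_dp_gflops * 2             # DP runs at half the SP rate -> ~1248 GFLOPs (~1.25 TFLOPs)

# Implied shader clock, assuming 512 cores each doing one FMA (2 FLOPs) per clock.
cores = 512
shader_clock_ghz = gf100_sp_gflops / (cores * 2)  # ~1.22 GHz shader clock, i.e. ~610 MHz core clock at 2:1

# RV870 for comparison: 1600 stream processors, 2 FLOPs (one MAD) per clock, 0.85 GHz.
rv870_sp_gflops = 1600 * 2 * 0.850                # = 2720 GFLOPs (~2.72 TFLOPs)
rv870_dp_gflops = rv870_sp_gflops / 5             # = 544 GFLOPs DP

print(gf100_sp_gflops, gf100_dp_gflops, round(shader_clock_ghz, 2))
print(rv870_sp_gflops, rv870_dp_gflops)
```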

Therefore for Double Precision workloads GF100 has the upper hand. Now when it comes to games you're relying on SP loads as well as RBE, Memory Bandwidth and TMU performance mainly.

The 384-bit GDDR5 bus of GF100 is wider than the 256-bit GDDR5 bus of the RV870. This is an area where GF100 has a clear advantage.

All that is left is TMU and RBE designs for us to get a clear picture of how things will pan out. Games are moving towards more compute heavy loads (DirectX11, Direct Compute 11 and OpenCL). It's fair to say that things will likely end up quite close if there aren't any other large architectural design changes in the Texture Mapping Unit and Render Back End units. If things are close price/performance will likely be the deciding factor. GF100 is an enormous design (My memory isn't the best but I think I read something to the tune of 1 Billion more transistors than RV870). This means that just as before, AMD will be able to compete quite easily with pricing while nVIDIA will struggle in that area.
 
Do you understand GPU architectures? Because if you did you wouldn't hold this opinion.

GF100 has a peak theoretical computational performance figure of ~1.25TFLOPs for Single Precision and 624GFLOPs for Double Precision. (How did I get to this number, you ask? GT200 had a rate of 78GFLOPs for DP and nVIDIA's CEO was quoted saying GF100 has 8x that peak figure, for a total of 624GFLOPs. If you understand that GF100 does DP at half of the SP rate, then you understand that multiplying the 624GFLOP DP figure by two gives you the SP figure. You can catch that here: http://www.pcper.com/article.php?aid=789 at the bottom). This means the clock speed is likely to be ~610MHz with a Shader Clock of ~1220MHz.

Most folks do not know this, but GT200 (GTX 280) actually has a peak theoretical computational performance figure of 622GFLOPs (not 933GFLOPs, which was derived by counting the infamous "missing MUL" that has now been pulled out of the SFU entirely; see here: http://www.anandtech.com/video/showdoc.aspx?i=3651&p=3 Quote: "In addition to the cores, each SM has a Special Function Unit (SFU) used for transcendental math and interpolation. In GT200 this SFU had two pipelines, in Fermi it has four. While NVIDIA increased general math horsepower by 4x per SM, SFU resources only doubled. The infamous missing MUL has been pulled out of the SFU, we shouldn’t have to quote peak single and dual-issue arithmetic rates any longer for NVIDIA GPUs.").

RV870 has a peak computational performance rate of 2.72TFLOPs for Single Precision and 544GFLOPs for Double Precision.

Therefore for Double Precision workloads GF100 has the upper hand. Now when it comes to games you're relying on SP loads as well as RBE, Memory Bandwidth and TMU performance mainly.

The 384-bit GDDR5 bus of GF100 is wider than the 256-bit GDDR5 bus of the RV870. This is an area where GF100 has a clear advantage.

All that is left is TMU and RBE designs for us to get a clear picture of how things will pan out. Games are moving towards more compute heavy loads (DirectX11, Direct Compute 11 and OpenCL). It's fair to say that things will likely end up quite close if there aren't any other large architectural design changes in the Texture Mapping Unit and Render Back End units. If things are close price/performance will likely be the deciding factor. GF100 is an enormous design (My memory isn't the best but I think I read something to the tune of 1 Billion more transistors than RV870). This means that just as before, AMD will be able to compete quite easily with pricing while nVIDIA will struggle in that area.


Look at Folding@home.
There, 2 AMD FLOPS = 1 NVIDIA FLOP in effective performance.

The real question is then:
Why does AMD need 2x the FLOPS to match NVIDIA's performance?
http://forum.beyond3d.com/showthread.php?t=50539

Do we need to educate you that pure FLOP numbers are a joke, like you need education in not making deceitful YouTube videos claiming CPU = GPU in performance?
Try running the in-game benchmark with "your" CPU hack and post the numbers...
 
Look at Folding@home.
There, 2 AMD FLOPS = 1 NVIDIA FLOP in effective performance.

The real question is then:
Why does AMD need 2x the FLOPS to match NVIDIA's performance?
http://forum.beyond3d.com/showthread.php?t=50539

Do we need to educate you that pure FLOP numbers are a joke, like you need education in not making deceitful YouTube videos claiming CPU = GPU in performance?
Try running the in-game benchmark with "your" CPU hack and post the numbers...

No.

I've explained that to you before but you have a thick head.

The performance difference in Folding@Home had to do with GT200's ability to access protected memory as a temporary software cache. This is something RV770 lacked, and it just so happens to be something that helped in that particular application/instance (F@H).

In fact you posted the links which proved my assertion right here: http://foldingforum.org/viewtopic.php?f=51&t=10442

and here: http://www3.interscience.wiley.com/cgi-bin/fulltext/121677402/HTMLSTART

I'll explain. GT200 has the ability to access protected memory in software, which is used as a temporary cache. When an error occurs, GT200 can revert back to the previous calculations and continue from there. RV770 did not have this ability, so when an error occurred, RV770 had to simply start all over again. This explains why RV770 is doing more FLOPS than GT200 despite having a lower output.
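To illustrate why the raw FLOPS counters diverge from useful output, here's a toy sketch. This is not F@H code; the error rate and work-unit size are made up purely for the example, it only models the restart-on-error behaviour described above:

```python
import random

def steps_executed(can_checkpoint, work_units=200, error_rate=0.05):
    """Toy model: raw compute steps burned to finish 'work_units' steps of useful work.
    With checkpointing (GT200-style), an error only costs the failed step;
    without it (the old RV770 F@H client), the whole work unit starts over."""
    done = 0
    executed = 0
    while done < work_units:
        executed += 1
        if random.random() < error_rate:
            if not can_checkpoint:
                done = 0        # lose all progress made so far
            continue            # with a checkpoint, just retry the failed step
        done += 1
    return executed

print(steps_executed(can_checkpoint=True))   # roughly 210 raw steps for 200 useful ones
print(steps_executed(can_checkpoint=False))  # vastly more raw steps for the same useful output
```

The chip that has to restart burns far more raw FLOPS to deliver the same finished work, which is the pattern the F@H numbers show.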

Here is a quote:

According to the developer, the peak computing power of the RV870 is as high as 2.7 teraflops in single-precision mode (FP32) and 544 gigaflops in double precision mode (FP64) which is used for most serious computing tasks. A special mention must be made of the ability to execute threads in protected memory sections which makes it easier to transfer code originally developed for the classic CPU to the GPGPU platform. All these innovations in the RV870’s computing section make it a perfect choice for GPGPU, especially in comparison with Nvidia’s solutions whose double-precision performance is far from ideal.
Taken from: http://www.xbitlabs.com/articles/video/display/radeon-hd5870_3.html

This is a software option therefore a new AMD Folding@Home GPU client would need to be written to take advantage of the changes with RV870.

Are you still in denial?
 
No.

I've explained that to you before but you have a thick head.

The performance difference in Folding@Home had to do with GT200's ability to access protected memory as a temporary software cache. This is something RV770 lacked, and it just so happens to be something that helped in that particular application/instance (F@H).

In fact you posted the links which proved my assertion right here: http://foldingforum.org/viewtopic.php?f=51&t=10442

and here: http://www3.interscience.wiley.com/cgi-bin/fulltext/121677402/HTMLSTART

I'll explain. GT200 has the ability to access protected memory in software, which is used as a temporary cache. When an error occurs, GT200 can revert back to the previous calculations and continue from there. RV770 did not have this ability, so when an error occurred, RV770 had to simply start all over again. This explains why RV770 is doing more FLOPS than GT200 despite having a lower output.

Here is a quote:

Taken from: http://www.xbitlabs.com/articles/video/display/radeon-hd5870_3.html

This is a software option therefore a new AMD Folding@Home GPU client would need to be written to take advantage of the changes with RV870.

Are you still in denial?

Yea, AMD's GPGPU is generations behind NVIDIA's... and again, are you still trying to use raw FLOPS numbers? :rolleyes:
What's next... the consoles are "supercomputers" because they have l33t "FLOPS"? :rolleyes:

And you "forgot" to post your in-game benchmark numbers?
 
Yea, AMD's GPGPU is generations behind NVIDIA's... and again, are you still trying to use raw FLOPS numbers? :rolleyes:
What's next... the consoles are "supercomputers" because they have l33t "FLOPS"? :rolleyes:

And you "forgot" to post your in-game benchmark numbers?

What are you talking about? You lose an argument (because you clearly have no clue what you're talking about) and now you want to discuss the PhysX hack? Why are you attempting to change the topic?

GPGPU has yet to take off. The deciding factors are now in play (Direct Compute 11 and OpenCL).

Do you even know what a FLOP is? FLoating point Operations Per Second.."The FLOPS is a measure of a computer's performance, especially in fields of scientific calculations that make heavy use of floating point calculations, similar to the older, simpler, instructions per second.". It is THE theoretical performance measurement figure for hardware. We're not talking MHz here. Damn I'll link you to Wikipedia to make it easy on you: http://en.wikipedia.org/wiki/FLOPS. The peak amount of work that can be done each second. RV870 displays an astounding 2.72TFLOPs in Single Precision.

You have no argument here. RV870 is a computational monster. You could argue that AMD hasn't placed many resources in the development of their GPGPU tool-set (and you would be correct), but to mock higher Floating Point Operations Per Second is ridiculous and shows that you really haven't got a clue.

Are there other limiting factors? Yes there are. Cache is one of them, and GF100 comes with a large shared L2 Cache (768KB I believe). There is also how the software is written and how it utilizes the computational performance (Folding@Home being the prime example of that). RV870 is a SuperScalar (VLIW) design; that's a fancy way of saying each of its shader units tries to execute several operations at once. With anything like that you need software that is coded to take advantage of it. Any performance limitations you may be insinuating are primarily caused by this. nVIDIA, on the other hand, chose a Scalar design. Scalar designs are simpler but not as powerful. When running several simple calculations they can be superior to SuperScalar designs, which rely on more complex calculations.
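A trivial illustration of that last point (the utilization numbers below are made up purely for the example; the real figures depend entirely on the workload and how well it maps to each architecture):

```python
# Peak FLOPS only turn into results if the software keeps the units busy.
def effective_tflops(peak_tflops, utilization):
    return peak_tflops * utilization

# Hypothetical: a workload that doesn't pack a SuperScalar/VLIW design well,
# but maps cleanly onto scalar cores.
print(effective_tflops(2.72, 0.35))   # RV870-class peak, poor occupancy  -> ~0.95 TFLOPs delivered
print(effective_tflops(1.25, 0.80))   # GF100-class peak, good occupancy  -> ~1.00 TFLOPs delivered
```

Which is why a raw peak figure on its own settles nothing; what matters is the peak times how much of it your code can actually use.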

AMD have fixed several of their GPGPU shortcomings with the RV870 (one which I highlighted in my post above). They also seem to be slowly retiring Brook+ in exchange for OpenCL and Direct Compute 11 (a wise move IMHO).

Before you post, make sure you know what you're talking about.
 
What are you talking about? You lose an argument (because you clearly have no clue what you're talking about) and now you want to discuss the PhysX hack? Why are you attempting to change the topic?

No, I simply compared your erroneous claim about the meaning of raw FLOPS with your erroneous claim about CPU PhysX.

GPGPU has yet to take off. The deciding factors are now in play (Direct Compute 11 and OpenCL).

Yeah, let's forget all about CUDA...
Or about the dud formerly known as Brook... Like I said:
AMD is generations behind NVIDIA here.

Do you even know what a FLOP is? FLoating point Operations Per Second.."The FLOPS is a measure of a computer's performance, especially in fields of scientific calculations that make heavy use of floating point calculations, similar to the older, simpler, instructions per second.". It is THE theoretical performance measurement figure for hardware. We're not talking MHz here. Damn I'll link you to Wikipedia to make it easy on you: http://en.wikipedia.org/wiki/FLOPS. The peak amount of work that can be done each second. RV870 displays an astounding 2.72TFLOPs in Single Precision.

Show me any real-world lead in an application due to the 2x FLOPS over NVIDIA.
It's as simple as that to debunk your notion that raw FLOPS means anything across different architectures.

You have no argument here. RV870 is a computational monster. You could argue that AMD hasn't placed many resources in the development of their GPGPU tool-set (and you would be correct), but to mock higher Floating Point Operations Per Second is ridiculous and shows that you really haven't got a clue.

Oh really?
Like how the R700 has more FLOPS than the GT200... but still loses out when it comes to the practical application?

Are there other limiting factors? Yes there are. Cache is one of them, and GF100 comes with a large shared L2 Cache (768KB I believe). There is also how the software is written and how it utilizes the computational performance (Folding@Home being the prime example of that). RV870 is a SuperScalar (VLIW) design; that's a fancy way of saying each of its shader units tries to execute several operations at once. With anything like that you need software that is coded to take advantage of it. Any performance limitations you may be insinuating are primarily caused by this. nVIDIA, on the other hand, chose a Scalar design. Scalar designs are simpler but not as powerful. When running several simple calculations they can be superior to SuperScalar designs, which rely on more complex calculations.

AMD have fixed several of their GPGPU shortcomings with the RV870 (one which I highlighted in my post above). They also seem to be slowly retiring Brook+ in exchange for OpenCL and Direct Compute 11 (a wise move IMHO).

The R800 is basically a refresh of the R700 series; if you want to talk about a GPGPU architecture, you should really look closer at "Fermi"...
Like I said before:
All those FLOPS mean nothing if you need 2x the FLOPS to do the same workload.

Before you post, make sure you know what you're talking about.

Talking to yourself isn't good argumentation.

Remember the PR-FUD about consoles and FLOPS?
http://groups.google.com/group/alt.games.video.sony-playstation2/msg/62ff83d96ea78ea9?hl=en

Do I need to highlight the section talking about FLOPS...or can you digest that on your own?
 