The AnandTech review has more regarding GPGPU, although no CUDA tests.
Has anyone found a good review comparing CUDA performance to the GTX580/590? Once I get rolling on CUDA development I'd like to upgrade to something beefier than my 8800.
Some kind of a fan-boi, much?

Those rumors are based on one crappy OpenCL benchmark done by someone known to be an idiot. OpenCL is a totally different animal to CUDA: one is cared about, the other not so much. Would like to see some proper benchmarks too! It's very, very unlikely they've stripped out all the GPGPU parts, because otherwise it would have no PhysX/CUDA.
Would also like to see some Folding@home benchmarks whenever possible, though I don't think the Fermi core would work on it yet.
Kepler is great............ at games!
And that's all most of the people buying these cards care about.
Rumors are that it will perform worse (perhaps much worse) on GPGPU stuff because it was originally intended to be mid-range, and NVIDIA stripped out the GPGPU hardware to give it better gaming performance.
I haven't seen comprehensive data to back this up yet though. If you're looking for CUDA and only CUDA I would wait until the 580 prices drop substantially and then get one of those.
Thank you for sharing, I was looking for this information.

Memory Copy
Host Pinned to Device: 6231.78 MB/s
Host Pageable to Device: 4091.38 MB/s
Device to Host Pinned: 6257.27 MB/s
Device to Host Pageable: 3997.35 MB/s
Device to Device: 72118.9 MB/s
GPU Core Performance
Single-precision Float: 2.01397e+06 Mflop/s
Double-precision Float: 143681 Mflop/s
32-bit Integer: 573612 Miop/s
24-bit Integer: 572383 Miop/s
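A quick sanity check on the figures posted above, just re-deriving the ratios in Python (the numbers are copied from the benchmark output; the interpretation in the comments is my own):

```python
# Figures from the benchmark output above.
host_to_dev_pinned = 6231.78    # MB/s
host_to_dev_pageable = 4091.38  # MB/s
sp = 2.01397e6                  # Mflop/s, single precision
dp = 143681.0                   # Mflop/s, double precision

# Pinned (page-locked) host memory skips the driver's staging copy,
# which is why it transfers noticeably faster than pageable memory.
pinned_speedup = host_to_dev_pinned / host_to_dev_pageable
print(f"pinned vs pageable: {pinned_speedup:.2f}x")  # 1.52x

# Measured DP throughput as a fraction of SP on this card.
print(f"measured DP/SP ratio: 1/{sp / dp:.1f}")      # 1/14.0
```

So on this particular card, double precision lands at roughly 1/14 of single precision as measured, whatever the theoretical rate works out to.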
It's not an artificial cap either. It's the way the hardware was designed for better performance per watt.
Then why do Tesla GPUs have a higher double-precision when they are the exact same GPU? (speaking from the GTX580 era that is)
The reason I say this is that the Tesla GPU's double-precision is exactly 1/2 of the single-precision, not having an artificial cap.
GeForce GPU's double-precision is exactly 1/8 of the single-precision, hence why it is called an artificial cap.
Not saying you're wrong about the power design, I'd just like to learn more.
Read through the article, but it didn't explain much on that.
Double precision is 1/2 of single precision for Tesla 20-series, whereas double precision is 1/8th of single precision for GeForce GTX 470/480
Full double precision performance
Artificial how?

Because Tesla and GeForce GPUs are exactly the same, though it's not really "artificial" as much as it is just disabled.
I think we are on the same page for the most part

I think so, we are just saying two different things, but we mean the same thing.
For GF104 (read: GF114), NVIDIA removed FP64 from only 2 of the 3 blocks of CUDA cores. As a result 1 block of 16 CUDA cores is FP64 capable, while the other 2 are not. This gives NVIDIA the advantage of being able to employ smaller CUDA cores for 32 of the 48 CUDA cores in each SM while not removing FP64 entirely. Because only 1 block of CUDA cores has FP64 capabilities and in turn executes FP64 instructions at 1/4 FP32 performance (handicapped from a native 1/2), GF104 will not be a FP64 monster. But the effective execution rate of 1/12th FP32 performance will be enough to effectively program in FP64 and debug as necessary.
The CUDA FP64 block contains 8 special CUDA cores that are not part of the general CUDA core count and are not in any of NVIDIA’s diagrams. These CUDA cores can only do and are only used for FP64 math. What's more, the CUDA FP64 block has a very special execution rate: 1/1 FP32. With only 8 CUDA cores in this block it takes NVIDIA 4 cycles to execute a whole warp, but each quarter of the warp is done at full speed as opposed to ½, ¼, or any other fractional speed that previous architectures have operated at. Altogether GK104’s FP64 performance is very low at only 1/24 FP32 (1/6 * ¼).... it’s the very first time we’ve seen 1/1 FP32 execution speed.
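The effective FP64 rates quoted in those two paragraphs fall out of simple unit counts. A sketch (the 16-of-48 and 8-core figures are from the quotes above; 192 CUDA cores per GK104 SMX is an assumption on my part, not stated in the quote):

```python
from fractions import Fraction

# GF104/GF114: 1 of the 3 blocks of 16 CUDA cores is FP64-capable,
# and that block runs FP64 at 1/4 of its FP32 rate.
gf104 = Fraction(16, 48) * Fraction(1, 4)
print(gf104)  # 1/12 of FP32 throughput

# GK104: 8 dedicated FP64 units per SMX (assumed 192 CUDA cores),
# each running at full 1/1 FP32 speed.
gk104 = Fraction(8, 192) * Fraction(1, 1)
print(gk104)  # 1/24 of FP32 throughput

# Matches the quoted decomposition: 1/6 * 1/4 = 1/24.
print(Fraction(1, 6) * Fraction(1, 4) == gk104)  # True
```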
So with 1/4 the DP, you would need to run 4 times as long, and the smaller memory would restrict the problem size. Doesn't sound too bad to me...
A lot of this thread could be misconstrued as the script of a new porn movie... or is that just me?

If you want hot NVIDIA pr0n, go with Tesla, it can handle any steamy DP you throw at it.
I guess my point is that crippled DP is not the end of the story for GPGPU programming.
That's why I'd like to see some benchmarks before drawing conclusions. In some heavy CUDA usage scenarios you'll max out the memory (the transfer speed, not the total RAM) way before you get to 50% on the core (this happens to me a lot). So DP/SP performance isn't really conclusive. The memory transfer speeds are higher than the cards I'm using now; not sure about the 500 series.
I would just wait for the GK110 or Radeon 89xx.
What CUDA/HPC apps are you running that max out your VRAM transfer rates?
Realtime video editing stuff seems to do this the most. Most of the actual processing tasks are light (AA/saturation) and a few use a bit more grunt (mattes), but as more and more layers of processes pile up, the VRAM transfer speed/throughput (or amount) seems to become the limiting factor, while the GPU only gets to about 25%.
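A toy model of that kind of transfer-bound pipeline (the millisecond figures here are hypothetical, just chosen to reproduce the ~25% GPU load described; they are not measurements):

```python
# Hypothetical per-frame costs for a realtime video pipeline where
# layer uploads dominate and the kernels themselves are cheap.
transfer_ms = 12.0  # time moving layers over the bus, per frame (assumed)
compute_ms = 4.0    # time the GPU actually spends in kernels, per frame (assumed)

# With copies and kernels fully serialized, the GPU idles while data moves.
utilization = compute_ms / (transfer_ms + compute_ms)
print(f"GPU busy: {utilization:.0%}")  # 25% - transfer-bound, not compute-bound

# Overlapping copies with kernels (e.g. CUDA streams plus pinned buffers)
# hides the compute behind the transfers; frame time then becomes
# the larger of the two instead of their sum.
overlapped_frame_ms = max(transfer_ms, compute_ms)
print(f"frame time with overlap: {overlapped_frame_ms:.0f} ms")  # 12 ms
```

Which is why a GPU sitting at 25% load can still be the fastest part of the system: the bottleneck is the bus, not the cores.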