GTX Titan vs. Tesla K20/K20x

Zarathustra[H]

Hey,

I've been googling my brains out trying to find a good comparison between these cards.

I know the Titan is essentially a limited K20, but I can't find details on what each can do in a way I can compare them.

Has anyone found this anywhere?

Thanks,
M
 
If you're doing CUDA stuff and can afford it, you might as well go Tesla, because they haven't been castrated like the mainstream, gamer-targeted Titan.
 
Why would you need this info?

Why do we need any information? I'm curious. :p

If you're doing CUDA stuff and can afford it, you might as well go Tesla, because they haven't been castrated like the mainstream, gamer-targeted Titan.

Oh I know that the Titan has been limited (presumably in firmware). I'm curious by how much, though.

A friend of mine is a postdoc and also runs a biomedical simulation startup on the side. He does some sort of simulations and was very curious to play with my new Titan when I told him about it.

I'm going to set up a linux boot for him so he can SSH in and run some test programs. I'm just curious how what he will experience on my Titan will compare to what he would experience with a Tesla K20...
 
You can't really compare them; there is a lot more to the K20 than just the card. Different drivers/software, etc.

As for the various applications, well, try them and find out.
 
The trick will be whether the compiler "sees" the Titan as a Tesla. If it doesn't, it will be crippled.
 
I found a ZDnet article that suggests the raw compute power has been left untouched on the Titan both in single and double precision, but that ECC and HyperQ (whatever the hell that is) have been disabled.

Not being familiar enough with compute, I don't know what the significance of this is.
 
I thought I remembered reading somewhere about a separate GPGPU mode you could change in the NVIDIA control panel that would unleash its GPGPU capability in exchange for 3D performance. I can't seem to find info on it now, though. I'm also curious what difference this makes, although personally I wouldn't use it for anything beyond F@H.
 
I thought I remembered reading somewhere about a separate GPGPU mode you could change in the NVIDIA control panel that would unleash its GPGPU capability in exchange for 3D performance. I can't seem to find info on it now, though. I'm also curious what difference this makes, although personally I wouldn't use it for anything beyond F@H.

Yeah, I remember reading the same thing; I'll try to find it and update this post. Either way, on price-to-performance, if you're looking at a card for GPGPU/folding/mining, the 7970 is a more attractive option.
 
Yeah, I remember reading the same thing; I'll try to find it and update this post. Either way, on price-to-performance, if you're looking at a card for GPGPU/folding/mining, the 7970 is a more attractive option.

Yeah, the 7970 provides A LOT of compute performance for the money.

This is true unless you need double precision, in which case the Titan blows everything this side of a Tesla board out of the water.
 
If only we could put gaming drivers on a Tesla (ewwwww), like you could back in the old days with the 6800 series. I'm sure it could be done, but I don't know how to do it.
 
Zarathustra[H];1039662572 said:
I found a ZDnet article that suggests the raw compute power has been left untouched on the Titan both in single and double precision, but that ECC and HyperQ (whatever the hell that is) have been disabled.

Not being familiar enough with compute, I don't know what the significance of this is.

Anandtech's review has a fairly extensive portion dedicated to Titan's Compute Performance:

http://www.anandtech.com/show/6774/nvidias-geforce-gtx-titan-part-2-titans-performance-unveiled/3

From what I've read of the Titan, the disabled portions relate to using the GPU in a distributed GPU environment. If you're using it in a workstation they shouldn't affect you.

In terms of price and performance, the Titan, or the eventual Quadro version of the Titan, is looking really good for a simple workstation compute card (that supports multiple monitors).
 
The main thing is the RAM is different. Teslas have ECC (error-correcting code) RAM, which is needed if you don't want any errors in your calculations when using CUDA.

A normal gamer graphics card won't bother about errors and might draw something or calculate something wrong for a second.
 
The main thing is the RAM is different. Teslas have ECC (error-correcting code) RAM, which is needed if you don't want any errors in your calculations when using CUDA.

A normal gamer graphics card won't bother about errors and might draw something or calculate something wrong for a second.

Just rambling about stuff I have no direct experience with: :D

Not sure about non-ECC causing an issue with single card users.

It's only 6 GB of RAM, and usually inside a metal case. Don't buy a computer case that isn't grounded and doesn't weigh 20 lb or more if you are running important stuff. Bit flips are caused by cosmic radiation. Two millimeters of lead does better than any normal ECC RAM.

Server arrays need ECC. It's important due to the amount of RAM there. Your risk at 6 GB of RAM isn't close to your risk with 1,000 GB of RAM.

And Tesla cards are normally run in arrays for serious crunching. A 4U holds what, 8 Tesla cards? A supercomputer holds hundreds.
 
Zarathustra[H];1039662572 said:
I found a ZDnet article that suggests the raw compute power has been left untouched on the Titan both in single and double precision, but that ECC and HyperQ (whatever the hell that is) have been disabled.

As someone else pointed out earlier with the Anand review, HyperQ isn't disabled. Neither is Dynamic Parallelism, as I've seen in other reviews.

Only certain features that are a subset of those two have been cut out.

If you don't believe Anand, Nvidia corroborates at least some of it.

https://developer.nvidia.com/ultimate-cuda-development-gpu


If you're doing CUDA stuff and can afford it, you might as well go Tesla, because they haven't been castrated like the mainstream, gamer-targeted Titan.

Titan is $1k. The minimum buy-in for a Tesla is $3.3k. Some of the features in the Tesla simply aren't needed for the scope of the workload.
 
If only we could put gaming drivers on a Tesla (ewwwww), like you could back in the old days with the 6800 series. I'm sure it could be done, but I don't know how to do it.

No, it can't be done. Let me show you why. Do you see what's missing?

[attached image: a Tesla board, with no display outputs]
 
Zarathustra[H];1039662572 said:
I found a ZDnet article that suggests the raw compute power has been left untouched on the Titan both in single and double precision, but that ECC and HyperQ (whatever the hell that is) have been disabled.

Not being familiar enough with compute, I don't know what the significance of this is.

Hyper-Q:

Hyper-Q enables multiple CPU cores to launch work on a single GPU simultaneously, thereby dramatically increasing GPU utilization and significantly reducing CPU idle times. Hyper-Q increases the total number of connections (work queues) between the host and the GK110 GPU by allowing 32 simultaneous, hardware-managed connections (compared to the single connection available with Fermi). Hyper-Q is a flexible solution that allows separate connections from multiple CUDA streams, from multiple Message Passing Interface (MPI) processes, or even from multiple threads within a process. Applications that previously encountered false serialization across tasks, thereby limiting achieved GPU utilization, can see a dramatic performance increase without changing any existing code.

taken from: http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf
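
In code terms, the "false serialization" it's talking about is what happens when several independent chunks of work all get funneled into one hardware queue. A rough sketch of the pattern Hyper-Q helps with (standard CUDA runtime API; busy_kernel is just a made-up dummy so there's something to schedule):

Code:
#include <cstdio>
#include <cuda_runtime.h>

// Dummy kernel: spins long enough that overlap between streams is
// visible in a profiler. Purely illustrative.
__global__ void busy_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 10000; ++k)
            v = v * 1.0000001f + 0.0000001f;
        data[i] = v;
    }
}

int main()
{
    const int kStreams = 8;   // Titan reportedly exposes 8 HW queues, K20/K20X up to 32
    const int n = 1 << 16;

    float *d_buf[kStreams];
    cudaStream_t streams[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        cudaMalloc(&d_buf[s], n * sizeof(float));
        cudaStreamCreate(&streams[s]);
    }

    // Independent work in independent streams: with Hyper-Q these launches
    // can be scheduled concurrently instead of being serialized on one queue.
    for (int s = 0; s < kStreams; ++s)
        busy_kernel<<<(n + 255) / 256, 256, 0, streams[s]>>>(d_buf[s], n);

    cudaDeviceSynchronize();

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamDestroy(streams[s]);
        cudaFree(d_buf[s]);
    }
    printf("launched work in %d streams\n", kStreams);
    return 0;
}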
 
Since I just tested this on a GTX Titan, I thought I'd post. The simpleHyperQ sample in the CUDA 5.0 SDK can do up to 8 streams at a time on the Titan, versus 32 streams on the K20/K20x. Tested this on Ubuntu 12.10 x64, MSI X79A-GD45 (8D) motherboard, NVIDIA drivers 313.18, which detect the Titan as D15U-50. Similar to Windows, nvidia-settings has a CUDA Double Precision checkbox under the PowerMizer settings of the Titan GPU to enable/disable full DP speeds.

Some other differences:
I do not believe there is any overclock support on Linux currently. However, utilities like NVIDIA Inspector and EVGA Precision X are able to overclock the card in Windows.
nvidia-smi settings of application clocks/TCC/ECC are not supported by Titan (they are supported on K20/K20x)
Tesla K20 and K20x run at PCI-E 2.0 speeds... Titan runs at PCI-E 3.0 speeds... take a look at my bandwidthTest results with the system configuration listed above: ;)

Code:
root@Tesla:/usr/local/cuda/samples/1_Utilities/bandwidthTest# ./bandwidthTest --device=0
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: D15U-50
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			11190.6

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			11802.5

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			221383.4
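
If anyone wants to see how the driver reports their own card, here's a quick deviceQuery-style sketch using standard CUDA runtime calls (nothing Titan-specific, just cudaGetDeviceProperties):

Code:
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        // Name: the Titan above shows up as "D15U-50" with driver 313.18.
        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability : %d.%d\n", prop.major, prop.minor);
        printf("  Multiprocessors    : %d\n", prop.multiProcessorCount);
        printf("  Global memory      : %zu MB\n", prop.totalGlobalMem >> 20);
        printf("  ECC enabled        : %s\n", prop.ECCEnabled ? "yes" : "no");
        printf("  Memory clock       : %d kHz\n", prop.memoryClockRate);
        printf("  Memory bus width   : %d bits\n", prop.memoryBusWidth);
    }
    return 0;
}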
 
So you can unlock the Titan on Windows to get full DP speeds, just without ECC, as that is hardware-locked/not available?
 
So you can unlock the Titan on Windows to get full DP speeds, just without ECC, as that is hardware-locked/not available?

That is correct, DP can be unlocked in Linux or Windows. The support for ECC/TCC is driver-dependent... basically NVIDIA locks you out of those features, given you're buying a consumer card, to protect their market. Some in the past were able to unlock a GTX 480 into a C2050... see:

https://devtalk.nvidia.com/default/...-c2050-hack-or-unlocking-tcc-mode-on-geforce/

It might or might not be possible to do the same on Titan cards, but it's not an easy process and it's prone to bricking a card... so basically, if you need those advanced features, buy the real deal (K20).
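
If you want to sanity-check whether the DP toggle actually took effect, a rough sketch like this (timing an FP64 FMA loop against an FP32 one with CUDA events) should show FP64 at roughly 1/3 the FP32 rate with the checkbox on, versus around 1/24 with it off on GK110. The kernel sizes below are just numbers picked for illustration:

Code:
#include <cstdio>
#include <cuda_runtime.h>

// Dependent FMA loops in float and double; the ratio of their runtimes
// gives a rough idea of the DP:SP throughput ratio on the card.
__global__ void fma_f32(float *out, int iters)
{
    float a = 1.0f + threadIdx.x * 1e-7f, b = 1.000001f, c = 1e-7f;
    for (int i = 0; i < iters; ++i) a = a * b + c;
    out[blockIdx.x * blockDim.x + threadIdx.x] = a;
}

__global__ void fma_f64(double *out, int iters)
{
    double a = 1.0 + threadIdx.x * 1e-7, b = 1.000001, c = 1e-7;
    for (int i = 0; i < iters; ++i) a = a * b + c;
    out[blockIdx.x * blockDim.x + threadIdx.x] = a;
}

int main()
{
    const int blocks = 256, threads = 256, iters = 1 << 16;
    float  *d_f; double *d_d;
    cudaMalloc(&d_f, blocks * threads * sizeof(float));
    cudaMalloc(&d_d, blocks * threads * sizeof(double));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);

    // Warm-up launches so timing isn't dominated by one-time overhead.
    fma_f32<<<blocks, threads>>>(d_f, iters);
    fma_f64<<<blocks, threads>>>(d_d, iters);
    cudaDeviceSynchronize();

    float ms_f32 = 0.0f, ms_f64 = 0.0f;

    cudaEventRecord(t0);
    fma_f32<<<blocks, threads>>>(d_f, iters);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms_f32, t0, t1);

    cudaEventRecord(t0);
    fma_f64<<<blocks, threads>>>(d_d, iters);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms_f64, t0, t1);

    printf("FP32: %.2f ms, FP64: %.2f ms, FP64/FP32 time ratio: %.1fx\n",
           ms_f32, ms_f64, ms_f64 / ms_f32);

    cudaEventDestroy(t0); cudaEventDestroy(t1);
    cudaFree(d_f); cudaFree(d_d);
    return 0;
}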
 
No, it can't be done. Let me show you why. Do you see what's missing?

[attached image: a Tesla board, with no display outputs]

We had someone on here a couple weeks ago saying he played BF3 on his K20X. Guess how I knew he was full of crap.
 
Since I just tested this on a GTX Titan, I thought I'd post. The simpleHyperQ sample in the CUDA 5.0 SDK can do up to 8 streams at a time on the Titan, versus 32 streams on the K20/K20x. Tested this on Ubuntu 12.10 x64, MSI X79A-GD45 (8D) motherboard, NVIDIA drivers 313.18, which detect the Titan as D15U-50. Similar to Windows, nvidia-settings has a CUDA Double Precision checkbox under the PowerMizer settings of the Titan GPU to enable/disable full DP speeds.

Some other differences:
I do not believe there is any overclock support on Linux currently. However, utilities like NVIDIA Inspector and EVGA Precision X are able to overclock the card in Windows.
nvidia-smi settings of application clocks/TCC/ECC are not supported by Titan (they are supported on K20/K20x)
Tesla K20 and K20x run at PCI-E 2.0 speeds... Titan runs at PCI-E 3.0 speeds... take a look at my bandwidthTest results with the system configuration listed above: ;)

Code:
root@Tesla:/usr/local/cuda/samples/1_Utilities/bandwidthTest# ./bandwidthTest --device=0
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: D15U-50
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			11190.6

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			11802.5

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			221383.4

Interesting! Thank you for this.

Do you know if it is possible to enable/disable full DP speeds in a headless system without X installed?
 
Some Teslas come with monitor ports for workstations; the supercomputer ones do not. The Tesla at our studio, on the big Maya workstation, has DVI ports.

You can output the ones without DVI ports to a monitor, it's just really complicated and needs two cards or a remote video interface over the network.

We can view the render output from a large render farm that has Teslas (and some ATI FirePros) on a huge monitor in the meeting room in our studio.

I may benchmark a Titan on rendering, but I still can't buy a Titan. I can tell you NVIDIA nerfs the other GeForce cards relative to the Quadro line; even slow Quadro cards kill the GeForce 680 at rendering.
 
I don't need studio grade rendering just yet. Still in school for animation. So a Titan is a great balance for work, school and play.
 
I am using my Titan for Iray and other CUDA and OpenCL things at home.
 
I get the feeling that Titans are built on broken Tesla GK110 GPUs, the ones that couldn't cut it for the higher-end cards that cost thousands of dollars more.

Seems like a good way to make some money on bad GK110s.

Is this fathomable?
 
It's called binning, and that's how Intel and AMD manufacture their chips.
 
I get the feeling that Titans are built on broken Tesla GK110 GPUs, the ones that couldn't cut it for the higher-end cards that cost thousands of dollars more.

Seems like a good way to make some money on bad GK110s.

Is this fathomable?

I think that is the EXACT reason that Titans exist. They may only be able to do 95-99% quality, and the K series needs 100%, so that 1% would otherwise force that GK110 die to be disposed of. Now they have a use.
 
I get the feeling that Titans are built on broken Tesla GK110 GPUs, the ones that couldn't cut it for the higher-end cards that cost thousands of dollars more.

Seems like a good way to make some money on bad GK110s.

Is this fathomable?

Not sure about that.
Titan doesn't seem to have anything disabled in hardware. It has the same number of SMXs and CUDA FP64 cores.
 
I think that is the EXACT reason that Titans exist. They may only be able to do 95-99% quality, and the K series needs 100%, so that 1% would otherwise force that GK110 die to be disposed of. Now they have a use.

Not sure about that.
Titan doesn't seem to have anything disabled in hardware. It has the same number of SMXs and CUDA FP64 cores.

Yeah, that's why it doesn't make sense to say "95% quality." For a device full of transistors that perform logical operations, there is no 95%. It is digital. It either works 100% to its specification or it is broken.

Most likely the OP is correct that Nvidia disabled certain compute operations to gimp the cards for certain types of supercomputer applications but still deliver a solid baseline for people just getting into GPU programming.
 
If you're talking a startup doing medical simulations, I would stick with the Tesla.

ECC memory is not a requirement for 99% (or more) of usage. Bit flips are indeed rare, and even without ECC memory, errors will generally be detected. The difference is that ECC offers single-bit error immunity, and it ensures that such errors will be corrected.

The current culture of "a server or workstation requires ECC memory and nothing else" may be unfounded, registered high-density memory aside. But for some workflows, ensuring zero tolerance towards single-bit errors is definitely the best policy. Anything to do with financials is the obvious example, but many or even most medical, engineering, disaster, etc. simulations certainly apply as well.

Many times a small, undetected error (no matter how rare) can be greatly compounded over the length of a long-running simulation. Other times, the risk of having to start over after an error is detected would be unacceptable or costly.

Anyway, your friend will be able to best determine how error tolerant their workflow(s) need to be.
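
As a toy illustration of that compounding point (plain host-side code, made-up numbers, obviously not anyone's real simulation): flip one mantissa bit in the starting value of a simple chaotic recurrence and compare against the clean run.

Code:
#include <cstdio>
#include <cstring>
#include <cstdint>

// Flip one bit of a double's bit pattern (bits 0..51 are the mantissa).
static double flip_bit(double x, int bit)
{
    uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    u ^= (uint64_t(1) << bit);
    std::memcpy(&x, &u, sizeof u);
    return x;
}

int main()
{
    // Toy chaotic recurrence standing in for "a long-running simulation":
    // x <- 3.9 * x * (1 - x). Any tiny perturbation grows exponentially.
    double clean   = 0.25;
    double flipped = flip_bit(0.25, 20);   // single flipped mantissa bit

    printf("initial difference: %.3e\n", flipped - clean);
    for (int step = 1; step <= 60; ++step) {
        clean   = 3.9 * clean   * (1.0 - clean);
        flipped = 3.9 * flipped * (1.0 - flipped);
        if (step % 15 == 0)
            printf("step %2d: clean=%.6f flipped=%.6f diff=%.3e\n",
                   step, clean, flipped, flipped - clean);
    }
    return 0;
}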
 
If you're talking a startup doing medical simulations, I would stick with the Tesla.

ECC memory is not a requirement for 99% (or more) of usage. Bit flips are indeed rare, and even without ECC memory, errors will generally be detected. The difference is that ECC offers single-bit error immunity, and it ensures that such errors will be corrected.

The current culture of "a server or workstation requires ECC memory and nothing else" may be unfounded, registered high-density memory aside. But for some workflows, ensuring zero tolerance towards single-bit errors is definitely the best policy. Anything to do with financials is the obvious example, but many or even most medical, engineering, disaster, etc. simulations certainly apply as well.

Many times a small, undetected error (no matter how rare) can be greatly compounded over the length of a long-running simulation. Other times, the risk of having to start over after an error is detected would be unacceptable or costly.

Anyway, your friend will be able to best determine how error tolerant their workflow(s) need to be.

We actually talked about that.

For his particular application, yeah, most simulations do such a large amount of averaging that a bit error here and there is likely to be less common than a statistical outlier and probably won't even be noticed.

ECC would definitely be better, in most cases, but in this specific case, possibly not.
 
Can someone give me a clear answer on this?

I see mixed responses.

I am considering buying 4 Tesla GPUs, but I'm not sure if they will produce more computing results (Folding@home, Stanford) than consumer-grade equipment, or if it's just about the same.

That said, should I go with 4 K20s or 4 Titans? I love saving money.


I am doing this for free to donate some computing power.
It's for tax reasons. I can classify the power used and the hardware as expenses/donations and save a lot. Love my accountant. :)
 
I am considering buying 4 Tesla GPUs, but I'm not sure if they will produce more computing results (Folding@home, Stanford) than consumer-grade equipment.

You might consider posting this in the distributed computing sub-forum.


I am doing this for free to donate some computing power.
It's for tax reasons. I can classify the power used and the hardware as expenses/donations and save a lot. Love my accountant. :)

You might want to consider getting a second opinion on the issue of deductibility.
 