Titan V - 110 TFlops

BOINC should be compatible. The question is whether any projects will need to re-compile or require some monkeying with app_info/app_config files.
 
So that is how he has been so successful at recruiting.... lol
 
Hmm, only 15 TFlops single precision, and 7 double... the 110 TFlops is for tensor calcs... (AI)

So not very cost-effective for folding/crunching yet...
 
The 110 TFlops is HALF PRECISION only - not applicable to Folding, which uses a mix of single-precision and double-precision work.
7 TFlops DP is bloody A LOT - nothing "consumer" even comes close; the old R9 280X (and other Tahiti-based variants) were the closest in a CONSUMER card, and they're only ballpark 1 or so.
The PREVIOUS Titan X(p) was somewhere UNDER 1 Tflop on double precision, as I recall.

At $3000, this was never intended as a consumer card - it's the first time the Titan has been based on the FULL TESLA GPU as opposed to a "high end consumer" GPU.
The intent appears to be to create an "entry level TESLA" card for researchers.
 
Hi Everybody & Happy New Year!

Haven't been here for some years; thought I'd come back sometime - like now :)

The last few years have been quite intense and interesting business-wise, leaving almost no room to play around with GPUs, multi/many-core and Distributed Computing. I had been deep into GPUs back in 2013 when the original Titan became available, and collected some cards then. Since then, I have only tracked architectural changes like Maxwell and Pascal with one card each. Given the significant architectural changes in Volta and the limited cuts in functionality NVidia made, the Titan V was too hard to resist. :)

It is still early days looking into this, but the low-level changes in Volta (like the native instruction set) are significantly more numerous and deeper than in any previous architecture generation. I would not be surprised if it takes a few generations of software updates in drivers and applications to leverage the potential of the architecture.

I've updated my native instruction set "cheat sheet" (instructions "below" the PTX ISA) to cover Volta as well; it now spans Fermi, Kepler, Maxwell, Pascal and Volta.

Wrt the Tensor cores and the famous 110 TFlops:
Correct, they are mixed precision with FP16 and FP32. The interesting attribute is that the "Warp-level Matrix Multiply and Accumulate" instruction (Nvidia speak for the basic/only Tensor core operation) is a full warp-wide MMA instruction.

Titan V has 80 SM processors, each with 64 FP32 units, 32 FP64 units and 8 Tensor cores. Given the optimization for Matrix Multiply and Accumulate operations, the 8 Tensor cores are able to execute 1024 mixed-precision Flop per processor cycle per SM. To compare, the FP32 and FP64 units are able to deliver 2 Flop/cycle each (FMA operation). Per SM, this amounts to 128 FP32 Flop/cycle, or 64 FP64 Flop/cycle. Currently, it is not possible to address the 8 Tensor cores separately in a program (this might change, depending on the evolution of CUDA).
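To see how those per-cycle numbers line up with the headline figures, here is a minimal back-of-the-envelope sketch (host-only code, buildable with nvcc); the ~1.455 GHz boost clock is my assumption, so the results only roughly match the quoted 15 / 7.5 / 110 TFlops.

```
// Rough peak-throughput estimate for Titan V from per-SM flops/cycle.
// Assumption: ~1.455 GHz boost clock; NVidia's marketed Tensor figure
// (110 TFlops) appears to use a slightly more conservative clock.
#include <cstdio>

int main() {
    const double sm_count  = 80;     // SMs enabled on the Titan V
    const double clock_ghz = 1.455;  // assumed boost clock

    // Per-SM flops per cycle, as discussed above.
    const double fp32_per_sm   = 64 * 2;      // 64 FMA units x 2 flops
    const double fp64_per_sm   = 32 * 2;      // 32 FMA units x 2 flops
    const double tensor_per_sm = 8 * 64 * 2;  // 8 cores x 64 FMA x 2 flops = 1024

    // flops/cycle * GHz = GFlops; divide by 1000 for TFlops.
    printf("FP32  : %.1f TFlops\n", sm_count * fp32_per_sm   * clock_ghz / 1000.0);
    printf("FP64  : %.1f TFlops\n", sm_count * fp64_per_sm   * clock_ghz / 1000.0);
    printf("Tensor: %.0f TFlops\n", sm_count * tensor_per_sm * clock_ghz / 1000.0);
    return 0;
}
```

With that assumed clock this lands at roughly 14.9 / 7.4 / 119 TFlops, i.e. in the same ballpark as the marketed numbers.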

Depending on your workload, Tensor cores are currently quite limited with regard to the data types they can work with. If you are in AI, the units are a great step in the right direction. For other apps, they might just be irrelevant. Not sure how relevant they might become for game developers - if they do become relevant, this might be fun :)

Matrix dimensions as operands can be (see the sketch after this list):
1) 16x16x16
2) 32x8x16
3) 8x32x16
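To illustrate how these warp-level MMA operations are currently exposed, here is a minimal CUDA 9 sketch using the wmma API from <mma.h> with the 16x16x16 shape; the kernel name and launch configuration are placeholders for illustration only.

```
// One warp computes a 16x16 tile D = A*B + C on the Tensor cores via the
// CUDA 9 wmma API (shape 16x16x16, FP16 inputs, FP32 accumulator).
// Build with: nvcc -arch=sm_70
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

__global__ void wmma_16x16x16(const half *a, const half *b, float *d) {
    // Fragments live in registers and are owned collectively by the warp.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);      // C = 0
    wmma::load_matrix_sync(a_frag, a, 16);  // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);

    // The actual Tensor core work: a warp-wide matrix multiply-accumulate.
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);

    wmma::store_matrix_sync(d, c_frag, 16, wmma::mem_row_major);
}

// Launch with exactly one warp per tile, e.g.: wmma_16x16x16<<<1, 32>>>(d_a, d_b, d_d);
```

Note that the granularity is the whole warp - which is also why the 8 Tensor cores in an SM cannot be addressed individually from a program today.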

On double precision performance:
One of the compelling arguments for the original Titan was its high DP performance: FP64 ran at 1/3 of the FP32 rate. Maxwell didn't have any implementation with high DP performance, but Pascal did in the professional (= expensive) Tesla cards. No "consumer" Pascal model had high DP performance; even the Titan Xp missed that feature.
With the Titan V, high DP performance is back in the Titan series - delightfully at a 1:2 ratio, i.e. FP64 at half the FP32 rate.

cheers,
Andy
 