Titan V - 110 TFlops

BOINC should be compatible. The question is whether any projects will need to re-compile or require some monkeying with app_info/app_config files.
 
So that is how he has been so successful at recruiting.... lol
 
Hmm, only 15 TFlops single precision, and 7 double... the 110 TFlops is for tensor calcs... (AI)

So not very cost-effective for folding/crunching yet...
 
The 110 TFlops is HALF PRECISION only - not applicable to Folding, which uses a mix of single-precision and double-precision work.
7 TFlops DP is bloody A LOT - nothing "consumer" even comes close; the old R9 280X (and other Tahiti-based variants) were the closest in a CONSUMER card, and they're only ballpark 1 or so.
The PREVIOUS Titan X(p) was somewhere UNDER 1 Tflop on double precision, as I recall.

At $3000, this was never intended as a consumer card - it's the first time the Titan has been based on the FULL TESLA GPU as opposed to a "high end consumer" GPU.
The intent appears to be to create an "entry level TESLA" card for researchers.
 
Hi Everybody & Happy New Year!

Haven't been here for some years; thought I'd come back sometime - like now :)

The last few years have been quite intense and interesting business-wise, leaving almost no room to play around with GPUs, multi/many-core and Distributed Computing. I had been deep into GPUs back in 2013 when the original Titan became available, and collected some cards then. Since then, I have only tracked architectural changes like Maxwell and Pascal with one card each. Given the significant architectural changes in Volta and the limited cuts in functionality NVidia made, the Titan V was too hard to resist. :)

It is still early days looking into this, but the low-level changes in Volta (like the native instruction set) are significantly more numerous and deeper than in any previous architecture generation. I would not be surprised if it takes a few generations of software updates in drivers and applications to leverage the potential of the architecture.

I've updated my native instruction set "cheat sheet" (instructions "below" the PTX ISA) to cover Volta as well; it now spans Fermi, Kepler, Maxwell, Pascal and Volta.

Wrt the Tensor cores and the famous 110 TFlops:
Correct, they are mixed precision with FP16 and FP32. The interesting attribute is that the "Warp-level Matrix Multiply and Accumulate" instruction (Nvidia speak for the basic/only Tensor core operation) is a full warp-wide MMA instruction.

Titan V has 80 SM processors, each with 64 FP32 units, 32 FP64 units and 8 Tensor cores. Given the optimization for Matrix Multiply and Accumulate operations, the 8 Tensor cores are able to execute 1024 mixed-precision Flop per processor cycle per SM. To compare, the FP32 and FP64 units are able to deliver 2 Flop/cycle each (FMA operation). Per SM, this amounts to 128 FP32 Flop/cycle, or 64 FP64 Flop/cycle. Currently, it is not possible to address the 8 Tensor cores separately in a program (this might change, depending on the evolution of CUDA).
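To see how those per-cycle numbers line up with the headline figures, here is a minimal back-of-the-envelope sketch (host-only code, buildable with nvcc); the ~1.455 GHz boost clock is my assumption, so the results only roughly match the quoted 15 / 7.5 / 110 TFlops.

```
// Rough peak-throughput estimate for Titan V from per-SM flops/cycle.
// Assumption: ~1.455 GHz boost clock; NVidia's marketed Tensor figure
// (110 TFlops) appears to use a slightly more conservative clock.
#include <cstdio>

int main() {
    const double sm_count  = 80;     // SMs enabled on the Titan V
    const double clock_ghz = 1.455;  // assumed boost clock

    // Per-SM flops per cycle, as discussed above.
    const double fp32_per_sm   = 64 * 2;      // 64 FMA units x 2 flops
    const double fp64_per_sm   = 32 * 2;      // 32 FMA units x 2 flops
    const double tensor_per_sm = 8 * 64 * 2;  // 8 cores x 64 FMA x 2 flops = 1024

    // flops/cycle * GHz = GFlops; divide by 1000 for TFlops.
    printf("FP32  : %.1f TFlops\n", sm_count * fp32_per_sm   * clock_ghz / 1000.0);
    printf("FP64  : %.1f TFlops\n", sm_count * fp64_per_sm   * clock_ghz / 1000.0);
    printf("Tensor: %.0f TFlops\n", sm_count * tensor_per_sm * clock_ghz / 1000.0);
    return 0;
}
```

With that assumed clock this lands at roughly 14.9 / 7.4 / 119 TFlops, i.e. in the same ballpark as the marketed numbers.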

Depending on your workload, Tensor cores are currently quite limited with regard to the data types they can work with. If you are in AI, the units are a great step in the right direction. For other apps, they might just be irrelevant. Not sure how relevant they might become for game developers - if they do become relevant, this might be fun :)

Matrix dimensions as operands can be (see the sketch after this list):
1) 16x16x16
2) 32x8x16
3) 8x32x16
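To illustrate how these warp-level MMA operations are currently exposed, here is a minimal CUDA 9 sketch using the wmma API from <mma.h> with the 16x16x16 shape; the kernel name and launch configuration are placeholders for illustration only.

```
// One warp computes a 16x16 tile D = A*B + C on the Tensor cores via the
// CUDA 9 wmma API (shape 16x16x16, FP16 inputs, FP32 accumulator).
// Build with: nvcc -arch=sm_70
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

__global__ void wmma_16x16x16(const half *a, const half *b, float *d) {
    // Fragments live in registers and are owned collectively by the warp.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);      // C = 0
    wmma::load_matrix_sync(a_frag, a, 16);  // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);

    // The actual Tensor core work: a warp-wide matrix multiply-accumulate.
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);

    wmma::store_matrix_sync(d, c_frag, 16, wmma::mem_row_major);
}

// Launch with exactly one warp per tile, e.g.: wmma_16x16x16<<<1, 32>>>(d_a, d_b, d_d);
```

Note that the granularity is the whole warp - which is also why the 8 Tensor cores in an SM cannot be addressed individually from a program today.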

On double precision performance:
One of the compelling arguments for the original Titan was its high DP performance: FP64 ran at 1/3 of the FP32 rate. Maxwell didn't have any implementation with high DP performance, but Pascal did in the professional (= expensive) Tesla cards. No "consumer" Pascal model had high DP performance; even the Titan Xp missed that feature.
With the Titan V, high DP performance is back in the Titan series - delightfully at a 1:2 ratio, i.e. FP64 at half the FP32 rate.

cheers,
Andy
 