erek ([H]F Junkie) · Joined Dec 19, 2005 · Messages 10,921
The previous thread was about GA102; this thread is about Big Ampere, the A100.
"Nvidia invented a new number format for AI, Tensor Float 32 (TF32), which its third generation Tensor Cores support. For AI acceleration, working with the smallest number of bits is desirable, since that’s more efficient for computation and data movement, but this is traded off with the accuracy of the final result. TF32 aims to strike this balance using the 10-bit mantissa (which determines precision) from half-precision numbers (FP16), and the 8-bit exponent (which determines the range of numbers that can be expressed) from single-precision format (FP32) (read more about AI number formats here).
“With this new precision, A100 offers 20 times more compute for single-precision AI, and because developers can continue to use the inputs as single-precision and get outputs back as single-precision, they do not need to do anything differently. They benefit from this acceleration automatically out of the box,” Kharya said.
The Tensor Cores now also natively support double-precision (FP64) numbers, which more than doubles performance for HPC applications."
Source: EETimes @ Nvidia Reinvents GPU, Blows Previous Generation Out of the Water: A100 Big Ampere
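To make the format concrete: TF32 keeps FP32's sign bit and 8-bit exponent but only the top 10 mantissa bits, so you can simulate TF32 rounding by clearing the low 13 of FP32's 23 mantissa bits. The sketch below is illustrative only (round-half-up via adding half a ULP, not the exact rounding mode the hardware uses, and no special handling for inf/NaN), but it shows the precision trade-off the article describes.

```python
import struct

def to_tf32(x: float) -> float:
    """Approximate TF32 rounding of an FP32 value (illustrative sketch).

    TF32 = FP32's sign + 8-bit exponent, but only a 10-bit mantissa.
    We simulate it by rounding, then zeroing the low 13 of the
    23 FP32 mantissa bits. Not a hardware-exact model.
    """
    # Reinterpret the float's IEEE-754 single-precision bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest (half-up) by adding half of the dropped range...
    bits += 1 << 12
    # ...then truncate the low 13 mantissa bits.
    bits &= ~((1 << 13) - 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_tf32(1.0))   # exactly representable, unchanged
print(to_tf32(0.1))   # slightly off: only 10 mantissa bits survive
```

This is why developers "do not need to do anything differently": inputs and outputs stay FP32, and only the internal multiply precision is reduced.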
"Nvidia invented a new number format for AI, Tensor Float 32 (TF32), which its third generation Tensor Cores support. For AI acceleration, working with the smallest number of bits is desirable, since that’s more efficient for computation and data movement, but this is traded off with the accuracy of the final result. TF32 aims to strike this balance using the 10-bit mantissa (which determines precision) from half-precision numbers (FP16), and the 8-bit exponent (which determines the range of numbers that can be expressed) from single-precision format (FP32) (read more about AI number formats here).
“With this new precision, A100 offers 20 times more compute for single-precision AI, and because developers can continue to use the inputs as single-precision and get outputs back as single-precision, they do not need to do anything differently. They benefit from this acceleration automatically out of the box,” Kharya said.
The Tensor Cores now also natively support double-precision (FP64) numbers, which more than doubles performance for HPC applications."
Source: EETimes @ Nvidia Reinvents GPU, Blows Previous Generation Out of the Water: A100 Big Ampere