# floating point operations per cycle

#### NeghVar

##### 2[H]4U
How do I determine how many floating-point operations per cycle my CPU can do? I have the rest of the formula for FLOPS:
FLOPS = sockets * (cores per socket) * (clock cycles per second) * (floating-point operations per cycle)
I am charting FLOPS from past supercomputers to see roughly when consumer-level computers (PCs and smartphones) surpassed their performance.
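For reference, the formula above can be sketched as a small function. This is only an illustration of the arithmetic: the function name `peak_flops` and the example machine (1 socket, 4 cores, 3.0 GHz, 16 FLOPs per cycle) are made up for the example, not taken from any real CPU.

```python
def peak_flops(sockets, cores_per_socket, clock_hz, flops_per_cycle):
    """Theoretical peak FLOPS: the product of the four factors in the formula."""
    return sockets * cores_per_socket * clock_hz * flops_per_cycle


# Hypothetical machine (assumed values, for illustration only):
# 1 socket, 4 cores, 3.0 GHz, 16 FLOPs per cycle per core.
example = peak_flops(sockets=1, cores_per_socket=4, clock_hz=3.0e9, flops_per_cycle=16)
print(f"{example / 1e9:.0f} GFLOPS")  # 192 GFLOPS
```

This gives a theoretical ceiling, not a measured result.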

Linpack

#### whateverer

##### [H]ard|Gawd
AVX is what you need to dig into. You'll have to look at your processor architecture to determine how many vector units can issue an operation in a single cycle, and how wide each one is.

It also depends on what you are doing. If you're doing multiply-accumulate, you get twice as many operations per clock (it's a trick most GPUs use).

But if you're not doing that, then just assume a single operation per clock, per data slot in the AVX units. How many slots each unit has depends on the size of the data type.

Single-precision FP is 32-bits.

So for a processor with two 256-bit AVX units, you get 256 + 256 = 512 bits of total vector width. Then divide that by 32 to get the number of 32-bit slots, which is the peak operations per clock.

512 / 32 = 16 slots available = 16 SP FLOPs/cycle.
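The slot arithmetic above can be written as a short sketch. The function name `flops_per_cycle` and its parameters are hypothetical. It assumes one operation per slot per clock (no multiply-accumulate) and that every vector unit can issue each cycle, which you would need to confirm for your specific architecture. The double-precision line is an added case using 64-bit elements.

```python
def flops_per_cycle(vector_units, vector_width_bits, element_bits):
    """Peak operations per clock: total vector width divided by element size.

    Assumes one operation per slot per clock (no multiply-accumulate).
    """
    total_width_bits = vector_units * vector_width_bits
    return total_width_bits // element_bits


# Two 256-bit AVX units, as in the example above.
sp = flops_per_cycle(vector_units=2, vector_width_bits=256, element_bits=32)
dp = flops_per_cycle(vector_units=2, vector_width_bits=256, element_bits=64)
print(sp, dp)  # 16 8
```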


#### bwang

##### Gawd
> So for a processor with two 256-bit AVX units, you get 256 + 256 = 512 bits of total vector width. Then divide that by 32 to get the number of 32-bit slots, which is the peak operations per clock.
>
> 512 / 32 = 16 slots available = 16 SP FLOPs/cycle.

Don't forget FMA! Every AVX op counts as two FLOPS if the processor supports FMA.
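As a rough sketch of that adjustment: a fused multiply-add does a multiply and an add on each slot, so it is conventionally counted as two FLOPs. The helper name `flops_per_cycle_with_fma` is hypothetical. It also assumes every vector unit can issue an FMA each cycle, which is not true of every design.

```python
def flops_per_cycle_with_fma(slots_per_cycle, has_fma):
    """Count each slot as 2 FLOPs when FMA is available, otherwise 1.

    Assumes every vector unit can issue an FMA every cycle.
    """
    return slots_per_cycle * (2 if has_fma else 1)


# 16 single-precision slots per cycle (two 256-bit units, 32-bit elements).
print(flops_per_cycle_with_fma(16, has_fma=True))   # 32
print(flops_per_cycle_with_fma(16, has_fma=False))  # 16
```

The doubled figure only reflects workloads that are mostly multiply-adds.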

#### whateverer

##### [H]ard|Gawd
> Don't forget FMA! Every AVX op counts as two FLOPS if the processor supports FMA.

It's an operation mostly used by GPUs, though you can find some simulations that use it as well. I wouldn't automatically assume it's necessary for a server.

Hence why I said a single flop/cycle per AVX slot.

#### bwang

##### Gawd

> It's an operation mostly used by GPUs, though you can find some simulations that use it as well. I wouldn't automatically assume it's necessary for a server.
>
> Hence why I said a single flop/cycle per AVX slot.

Oops, didn't see that, sorry! I think single-cycle MACs typically factor into supercomputer performance measurements, since dense matrix math makes great use of them.
I think the OP is mostly looking for FP performance data for historical purposes. For practical purposes, peak vector FLOPS are a somewhat tenuous metric, since a lot of applications have computation patterns that aren't readily vectorized or are completely bandwidth-bound.