AdoredTV has an informative video on the subject:
Okay, I was just looking at a hardwaretimes article on INT16, now that you mentioned the half-execution.
The article says: "As a result of this new partitioning, each Ampere SM partition can execute either 32 FP32 instructions per clock or 16 FP32 and 16 INT32 instructions per cycle. You’re essentially trading integer performance for twice the floating-point capability. Fortunately, as the majority of graphics workloads are FP32, this should work towards NVIDIA’s advantage." But I must still be misunderstanding something, because if the majority of workloads are FP32-focused, then the sheer number of cores should be able to complete the workload much faster...
Games still use the integer data path for about 1/4 of their operations:
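To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is only a toy model, not how the hardware actually schedules work: it assumes perfect scheduling, ignores memory, latency and occupancy, and takes the older layout to be 16 dedicated FP32 lanes plus 16 dedicated INT32 lanes per partition (my assumption, inferred from the "trading integer performance" wording in the quote). The function names and the 3200-instruction workload are made up for illustration.

```python
# Toy throughput model for one SM partition. All names and numbers here are
# illustrative assumptions, not vendor figures beyond the quoted lane counts.

def cycles_dedicated(n_instr, int_fraction, lanes=16):
    """Assumed older layout: 16 FP32-only lanes + 16 INT32-only lanes.
    The two instruction streams run side by side; the longer one decides."""
    fp_instr = n_instr * (1 - int_fraction)
    int_instr = n_instr * int_fraction
    return max(fp_instr, int_instr) / lanes

def cycles_shared(n_instr, int_fraction, lanes=16):
    """Quoted Ampere layout: 16 FP32-only lanes + 16 lanes that issue
    either FP32 or INT32. All INT32 work must use the shared lanes."""
    int_instr = n_instr * int_fraction
    # Best case: all 32 lanes stay busy, unless INT32 work alone
    # keeps the shared lanes occupied for longer than that.
    return max(n_instr / (2 * lanes), int_instr / lanes)

def speedup(int_fraction, n_instr=3200):
    """Ratio of cycles needed: dedicated layout vs shared layout."""
    return cycles_dedicated(n_instr, int_fraction) / cycles_shared(n_instr, int_fraction)

if __name__ == "__main__":
    for frac in (0.0, 0.25, 0.5):
        print(f"INT32 share {frac:.2f}: ideal per-partition speedup {speedup(frac):.2f}x")
```

With an INT32 share of 0.25, this model gives an ideal per-partition speedup of 1.5x rather than 2x, because the shared lanes spend part of their time on integer work. Real games will land somewhere below even that, since the model leaves out everything except instruction issue.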