Specifying FP32 precision again lets Titan RTX show off the advantage of its 24GB of memory. Titan V and Titan Xp completed runs at a batch size of 32, but returned out-of-memory errors once we pushed batches any larger.
Enabling FP16 precision in the dataset’s training parameter script allows the 12GB cards to accommodate larger batches. However, they’re both overwhelmed by Titan RTX as we push 384-image batches through its TU102.
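Our runs flip that switch through the parameters exposed by the training scripts themselves, but for readers reproducing something similar in plain TensorFlow, the sketch below shows roughly what enabling mixed FP16 training looks like with the Keras mixed-precision API. It is illustrative only: the model, batch size, and synthetic data here are stand-ins, not the exact configuration we benchmarked.

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Compute in FP16 while keeping variables in FP32 ("mixed" precision).
# Halving activation storage is what lets 12GB cards fit larger batches.
mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.applications.ResNet50(weights=None, classes=1000)

# Under the mixed_float16 policy, compile() automatically wraps the optimizer
# with loss scaling to guard against FP16 gradient underflow.
model.compile(optimizer=tf.keras.optimizers.SGD(momentum=0.9),
              loss="sparse_categorical_crossentropy")

# Synthetic data and a batch size of 256 purely for illustration -- the point
# is that FP16 activations leave headroom for batches that overflow in FP32.
images = tf.random.uniform((256, 224, 224, 3))
labels = tf.random.uniform((256,), maxval=1000, dtype=tf.int32)
model.fit(images, labels, batch_size=256, epochs=1)
```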
Inferencing Performance
Before isolating the performance of Turing's new INT8 mode, we took a pass at inferencing a trained ResNet-50 model in TensorFlow.
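As a rough illustration of what such a pass entails, here is a minimal timing sketch using the stock Keras ResNet-50 weights and synthetic input rather than our trained checkpoint; the batch size and run count are arbitrary.

```python
import time
import numpy as np
import tensorflow as tf

# Stock ImageNet weights stand in for a trained checkpoint.
model = tf.keras.applications.ResNet50(weights="imagenet")

batch = np.random.rand(64, 224, 224, 3).astype("float32")

# Warm up once so graph tracing and kernel autotuning don't skew the timing.
model.predict(batch, verbose=0)

runs = 20
start = time.time()
for _ in range(runs):
    model.predict(batch, verbose=0)
elapsed = time.time() - start

print(f"{runs * batch.shape[0] / elapsed:.1f} images/sec")
```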
[Chart: Inferencing performance, ResNet-50 in TensorFlow]
Across the board, Titan RTX outperforms Titan V (as does GeForce RTX 2080 Ti).
Nvidia recommends inferencing through TensorRT, though, which supports Turing GPUs, CUDA 10, and the Ubuntu 18.04 environment we're testing under. Our first set of results comes from inferencing a GoogleNet model pre-trained in Caffe.
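For readers who want to try something comparable, the sketch below shows roughly how a Caffe-trained GoogleNet can be turned into a TensorRT engine using the legacy Caffe parser in TensorRT's Python API of that era. The file names are placeholders, and our exact workflow may differ.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Parse the Caffe prototxt/caffemodel pair into a TensorRT network.
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()
parser = trt.CaffeParser()

# "googlenet.prototxt" / "googlenet.caffemodel" are placeholder file names.
blob_to_tensor = parser.parse(deploy="googlenet.prototxt",
                              model="googlenet.caffemodel",
                              network=network,
                              dtype=trt.float32)
network.mark_output(blob_to_tensor.find("prob"))

# Ask TensorRT to use reduced precision where the GPU supports it.
builder.max_batch_size = 128
builder.fp16_mode = True      # FP16 kernels on Volta/Turing
# builder.int8_mode = True    # INT8 additionally requires a calibrator

engine = builder.build_cuda_engine(network)
```

Actually running inference then means creating an execution context from the engine and binding input/output buffers on the GPU; INT8 also needs a calibration dataset, which is why that path takes more setup than FP16.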
[Charts: Inferencing performance, GoogleNet in TensorRT]
With the FP32 numbers serving as a baseline, Titan RTX’s speed-ups thanks to FP16 and INT8 modes are significant. GeForce RTX 2080 Ti benefits similarly.
Titan V’s INT8 rate is ½ of its peak FP16 throughput, so although the card does see improvements versus FP16 mode, they’re not as pronounced.
Titan Xp’s 48.8 TOPS of INT8 performance prove useful in inferencing workloads. With an FP16 rate that’s just 1/64 of its FP32 throughput, we’re not surprised to see FP16 precision only barely faster than the FP32 result; what little gain there is likely comes from the smaller data footprint rather than extra compute.