Interested? Seems cool. I've been appreciating HPC and ML computing a lot more recently, thanks to ParlAI.ai / Blender.
"TFRT Architecture - source: https://github.com/tensorflow/runtime
TFRT is one of many recent improvements made to the TensorFlow framework to increase inference performance, including TensorFlow Lite and the Model Optimization Toolkit. TensorFlow Lite also converts models to a form targeted for specific hardware but focuses on resource-constrained processors such as mobile and edge devices. TFRT, by contrast, aims to improve model inference across all platforms, including the cloud or datacenter, and includes targets such as GPUs and high-end CPUs. To measure the improvements to inference latency, Google integrated TFRT with TensorFlow Serving, a production-grade serving environment for model inference. For their experiment, they chose a ResNet-50 model and executed it on TFRT and the previous runtime. TFRT's average inference time improved 28%."
https://www.infoq.com/news/2020/05/google-tensorflow-runtime/
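For anyone wondering how a figure like "improved 28%" is usually derived, here is a rough sketch of the arithmetic. The function name and the latency values are my own hypothetical placeholders, not Google's actual measurements, and I'm assuming the percentage is the relative reduction in average latency against the previous runtime.

```python
def latency_improvement_percent(baseline_ms: float, new_ms: float) -> float:
    """Relative reduction in average latency, as a percentage of the baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return 100.0 * (baseline_ms - new_ms) / baseline_ms


# Hypothetical averages, chosen only to illustrate the calculation.
old_runtime_ms = 100.0  # assumed average latency on the previous runtime
tfrt_ms = 72.0          # assumed average latency on TFRT

print(f"{latency_improvement_percent(old_runtime_ms, tfrt_ms):.1f}%")
```

With those made-up numbers the result comes out to 28%, i.e. the new average is 72% of the old one.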
