Though a few decades ago it seemed like pure science fiction, our reliance on artificial intelligence grows by the week, to the point where we use it in our daily lives without even noticing. Interactive speech applications on phones, searches by visual cues and even basic video recommendations are all AI-based services. With each improvement, AI systems become increasingly complex, and they are doing so faster than CPUs can keep up. This has created demand for new products to handle AI-related workloads, and the NVIDIA Tesla P4 inferencing accelerator is one of the more interesting things to come out of it.
What Exactly is the NVIDIA Tesla P4?
To put it as simply as possible, it’s a graphics processing unit dedicated to boosting the efficiency of servers running deep learning inference workloads. It speeds up the processing of data in artificial intelligence workflows, with the aim of offering users a responsive experience with as little delay as possible.
The NVIDIA Tesla P4 is a low-profile, extremely efficient card, consuming only 50 to 75 watts. This makes it well suited to smaller-scale servers.
As for the more technical details:
- The GPU Architecture used is NVIDIA Pascal
- It performs 8-bit integer (INT8) operations at 22 tera-operations per second
- Single-precision performance is 5.5 TeraFLOPS
- Contains 8 GB of on-board memory with a bandwidth of 192 GB/s
- It has ECC protection, enhanced programmability with a page migration engine, and is server-optimized specifically for deployment in data centres.
- A hardware-accelerated video decoding engine that can transcode over 35 high-definition video streams at the same time.
- It comes packaged with TensorRT, a library specifically designed to help you optimize various deep learning models for production.
- The DeepStream SDK is also included, allowing the simultaneous decoding and analysis of multiple video streams.
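The headline 22 TOPS figure refers to 8-bit integer math, which inference accelerators like the P4 exploit by quantizing a trained model's weights and activations from 32-bit floats down to INT8. As a minimal, illustrative sketch of how symmetric INT8 quantization works (this is the general technique, not the P4's or TensorRT's actual internal pipeline):

```python
def quantize_int8(values):
    """Symmetric linear quantization: map floats onto signed 8-bit integers."""
    scale = max(abs(v) for v in values) / 127.0  # largest magnitude maps to +/-127
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the INT8 representation."""
    return [x * scale for x in q]

# Hypothetical weights for illustration
weights = [0.82, -1.27, 0.05, 0.4]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q)       # INT8 codes, e.g. [82, -127, 5, 40]
print(approx)  # close to the original weights
```

The accuracy loss from this rounding is usually small for inference, while integer arithmetic lets the hardware process roughly four times as many operations per second as FP32.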
The NVIDIA Tesla P4 has been compared to the NVIDIA Tesla M4 and an Intel Xeon E5-2699v4 in three categories: application latency, inference efficiency, and video transcode plus inference on H.264 streams.
To compare application latency, a deep learning HPC (high-performance computing) application was run on all three processors and the response times were recorded. The Intel Xeon E5-2699v4 took over 160 ms to respond, the NVIDIA Tesla M4 took 82 ms, while the NVIDIA Tesla P4 took only 11 ms.
To test inference efficiency, they measured the number of images per second per watt each candidate could process. The Xeon managed 1.4 images, the M4 12 images, and the Tesla P4 91 images.
Finally, they measured the number of high-definition streams each could handle simultaneously. The Xeon E5-2699v4 topped out at 2, the NVIDIA Tesla M4 at 14, and the NVIDIA Tesla P4 at 35.
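Putting the three benchmarks side by side makes the gap easier to see. A quick back-of-the-envelope comparison using only the figures quoted above (the device names and metric keys here are just labels for this sketch):

```python
# Benchmark figures quoted in the comparison above
results = {
    "Intel Xeon E5-2699v4": {"latency_ms": 160, "img_per_s_per_w": 1.4, "hd_streams": 2},
    "NVIDIA Tesla M4":      {"latency_ms": 82,  "img_per_s_per_w": 12,  "hd_streams": 14},
    "NVIDIA Tesla P4":      {"latency_ms": 11,  "img_per_s_per_w": 91,  "hd_streams": 35},
}

def advantage(baseline, device, metric, lower_is_better=False):
    """How many times better `device` is than `baseline` on `metric`."""
    a, b = results[baseline][metric], results[device][metric]
    return a / b if lower_is_better else b / a

cpu, p4 = "Intel Xeon E5-2699v4", "NVIDIA Tesla P4"
print(f"{advantage(cpu, p4, 'latency_ms', lower_is_better=True):.1f}x lower latency")
print(f"{advantage(cpu, p4, 'img_per_s_per_w'):.0f}x higher efficiency")
print(f"{advantage(cpu, p4, 'hd_streams'):.1f}x more streams")
```

Running this on the quoted numbers gives roughly a 14.5x latency advantage, 65x efficiency advantage and 17.5x stream advantage for the P4 over the CPU.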
Artificial intelligence and deep learning technologies are not only here to stay, but will likely pave the way for great future discoveries. The NVIDIA Tesla P4 was designed to help in a time when our progress is starting to outrun our resources, and can only be a good sign of things to come in the world of AI.