POWER NEW LEVELS OF USER ENGAGEMENT
Boost throughput and deliver responsive experiences in deep learning inference workloads.

ACCELERATE DEEP LEARNING INFERENCE

In the new era of artificial intelligence (AI), deep learning is enabling superhuman accuracy in complex tasks to enhance our everyday experiences. Interactive speech, computer vision, and predictive analytics are a few of the areas where deep learning models trained on GPUs have demonstrated incredible results that were previously thought impossible.

When modern neural networks are deployed on CPUs for inference, AI-based services can't deliver the responsiveness needed for user engagement. NVIDIA® Tesla® P40 and P4 GPU accelerators are the perfect solution, built to deliver the highest throughput and most responsive experiences for deep learning inference workloads. Powered by the NVIDIA Pascal™ architecture, they provide over 60X faster inference performance than CPUs for real-time responsiveness in even the most complex deep learning models.


NVIDIA TESLA INFERENCE ACCELERATORS

[Charts: Deep Learning Inference Latency and Deep Learning Inference Throughput]

NVIDIA Tesla P40

MAXIMUM DEEP LEARNING INFERENCE THROUGHPUT

The Tesla P40 is purpose-built to deliver maximum throughput for deep learning inference. With 47 TOPS (Tera-Operations Per Second) of inference performance per GPU, a single server with eight Tesla P40s can replace over 100 CPU servers.

Tesla P40 Datasheet (PDF – 166KB)

NVIDIA Tesla P4

ULTRA-EFFICIENT DEEP LEARNING IN SCALE-OUT SERVERS

The Tesla P4 accelerates any scale-out server, offering an incredible 40X higher energy efficiency compared to CPUs.

Tesla P4 Datasheet (PDF – 164KB)

DEEP LEARNING ACCELERATOR FEATURES AND BENEFITS

These GPUs power faster predictions that enable amazing user experiences for AI applications.

100X Higher Throughput to Keep Up with Expanding Data

The volume of data generated every day in the form of sensor logs, images, videos, and records makes processing on CPUs economically impractical. Pascal-powered GPUs give data centers a dramatic boost in throughput for deep learning deployment workloads, letting them extract intelligence from this tsunami of data. A server with eight Tesla P40s can replace over 100 CPU-only servers for deep learning workloads, delivering higher throughput at lower acquisition cost.

A Dedicated Decode Engine for New AI-based Video Services

Tesla P4 and P40 GPUs can analyze up to 39 HD video streams in real time, powered by a dedicated hardware-accelerated decode engine that works in parallel with the NVIDIA CUDA® cores performing inference. By integrating deep learning into the video pipeline, customers can offer new levels of smart, innovative video services to users.
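
The decode engine and the CUDA cores are separate units, so one frame can be decoded while inference runs on an earlier one. The sketch below illustrates only the pipelining idea using plain CUDA streams; decodeFrame and runInference are hypothetical stand-in kernels, not NVIDIA SDK calls, and on Tesla P4/P40 the real decode runs on the dedicated hardware engine rather than a CUDA kernel.

```cpp
// Conceptual sketch: two CUDA streams pipeline per-frame work so that work on
// consecutive frames overlaps. decodeFrame and runInference are hypothetical
// placeholder kernels standing in for the decode and inference stages.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void decodeFrame(unsigned char* frame, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) frame[i] ^= 0xFF;              // stand-in for decode work
}

__global__ void runInference(const unsigned char* frame, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = frame[i] / 255.0f;    // stand-in for a neural network
}

int main() {
    const int n = 1920 * 1080;                // one HD luma plane
    const int blocks = (n + 255) / 256;
    unsigned char* frames[2];
    float* results[2];
    cudaStream_t streams[2];

    for (int s = 0; s < 2; ++s) {
        cudaMalloc(&frames[s], n);
        cudaMalloc(&results[s], n * sizeof(float));
        cudaStreamCreate(&streams[s]);
    }

    // Within one stream, decode -> inference order is preserved per frame;
    // across the two streams, consecutive frames are processed concurrently.
    for (int k = 0; k < 8; ++k) {
        int s = k % 2;
        decodeFrame<<<blocks, 256, 0, streams[s]>>>(frames[s], n);
        runInference<<<blocks, 256, 0, streams[s]>>>(frames[s], results[s], n);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < 2; ++s) {
        cudaFree(frames[s]);
        cudaFree(results[s]);
        cudaStreamDestroy(streams[s]);
    }
    std::printf("processed 8 frames\n");
    return 0;
}
```

In production, the DeepStream SDK manages this kind of pipelining, keeping decoded frames in GPU memory for the inference step.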

Unprecedented Efficiency for Low-Power Scale-out Servers

The ultra-efficient Tesla P4 GPU accelerates density-optimized scale-out servers with a small form factor and a 50 W/75 W power footprint. It delivers an incredible 40X better energy efficiency than CPUs for deep learning inference workloads, letting hyperscale customers scale within their existing infrastructure and meet the exponential growth in demand for AI-based applications.

Faster Deployment With NVIDIA TensorRT™ and DeepStream SDK

NVIDIA TensorRT is a high-performance neural network inference engine for production deployment of deep learning applications. It includes a library that optimizes deep learning models for production deployment, taking trained neural networks, usually in 32-bit or 16-bit floating-point precision, and optimizing them for reduced-precision INT8 operations. The NVIDIA DeepStream SDK taps into the power of Pascal GPUs to simultaneously decode and analyze video streams.
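
As a rough illustration of that optimization step, here is a minimal sketch using the TensorRT C++ API of this era, assuming a hypothetical Caffe model (deploy.prototxt and model.caffemodel, with an output blob named prob). A real INT8 build also requires a calibrator that feeds TensorRT sample inputs so it can compute the INT8 scale factors; it is stubbed out with nullptr here to keep the sketch short.

```cpp
// Minimal TensorRT build-flow sketch: parse a trained FP32 Caffe model and
// build an engine optimized for reduced-precision INT8 execution.
#include "NvInfer.h"
#include "NvCaffeParser.h"
#include <cstdio>

using namespace nvinfer1;
using namespace nvcaffeparser1;

// TensorRT reports build messages through a user-supplied logger.
class Logger : public ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity != Severity::kINFO) std::printf("[TRT] %s\n", msg);
    }
} gLogger;

int main() {
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();

    // Import the trained FP32 network (file and blob names are placeholders).
    ICaffeParser* parser = createCaffeParser();
    const IBlobNameToTensor* blobs =
        parser->parse("deploy.prototxt", "model.caffemodel",
                      *network, DataType::kFLOAT);
    network->markOutput(*blobs->find("prob"));

    builder->setMaxBatchSize(8);            // largest batch served at runtime
    builder->setMaxWorkspaceSize(1 << 30);  // scratch space for optimization

    // Reduce the trained FP32 weights to INT8 where the hardware supports it
    // (Tesla P4/P40 expose fast INT8 arithmetic).
    if (builder->platformHasFastInt8()) {
        builder->setInt8Mode(true);
        builder->setInt8Calibrator(nullptr);  // supply a real calibrator here
    }

    // Build the optimized engine that the runtime executes.
    ICudaEngine* engine = builder->buildCudaEngine(*network);

    network->destroy();
    parser->destroy();
    builder->destroy();
    if (engine) engine->destroy();
    return 0;
}
```

The INT8 path is what the table below reports as 22 and 47 TOPS, roughly 4X the corresponding FP32 rates of 5.5 and 12 TeraFLOPS.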


PERFORMANCE SPECIFICATION FOR NVIDIA TESLA P40 AND P4 ACCELERATORS

                                    Tesla P4                                      Tesla P40
                                    (for Ultra-Efficient Scale-Out Servers)      (for Maximum-Inference Throughput Servers)
Single-Precision Performance        5.5 TeraFLOPS                                 12 TeraFLOPS
Integer Operations (INT8)           22 TOPS*                                      47 TOPS*
GPU Memory                          8 GB                                          24 GB
Memory Bandwidth                    192 GB/s                                      346 GB/s
System Interface                    Low-Profile PCI Express Form Factor          Dual-Slot, Full-Height PCI Express Form Factor
Power                               50 W/75 W                                     250 W
Hardware-Accelerated Video Engine   1x Decode Engine, 2x Encode Engines          1x Decode Engine, 2x Encode Engines

*Tera-Operations per Second with Boost Clock Enabled


Get the NVIDIA Tesla P40 and P4 Today

The Tesla P40 and P4 are available now for deep learning inference.

WHERE TO BUY