The NVIDIA® Tesla™ C870 GPU computing processor is a massively multi-threaded processor architecture that is ideal for high performance computing (HPC) applications used by scientists, engineers, and other technical professionals.

The Tesla C870 GPU computing processor transforms a standard workstation into a personal supercomputer. With 128 streaming processor cores, the CUDA C-language development environment and developer tools, and a range of applications already ported, the Tesla C870 enables professionals to develop applications faster and solve problems that traditionally required access to a shared server cluster.

Image courtesy of Evolved Machines.
Massively Multi-threaded Processor Architecture
Solve compute problems at your workstation that previously required a large cluster

128 Floating Point Processor Cores
Achieve up to 350 GFLOPS of performance (512 GFLOPS peak) with one C870 GPU
 
Image courtesy of University of Illinois at Urbana-Champaign
Multi-GPU Computing
Solve large-scale problems by dividing them across multiple GPUs

Shared Data Memory
Groups of processor cores can collaborate using shared data

High Speed, PCI-Express Data Transfer
Fast and high-bandwidth communication between CPU and GPU
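
As a rough illustration of dividing a problem across multiple GPUs, the sketch below splits a one-dimensional index range into contiguous, near-equal chunks, one per GPU. The helper name `split_range` is hypothetical and is not part of CUDA; a real CUDA program would then select each device in turn, copy its chunk across PCI-Express, and launch a kernel on it.

```python
def split_range(n_items: int, n_gpus: int) -> list[tuple[int, int]]:
    """Split indices 0..n_items-1 into n_gpus contiguous, near-equal chunks.

    Hypothetical helper for illustration only; it is not a CUDA API.
    Returns a list of half-open (start, stop) index pairs, one per GPU.
    """
    if n_gpus < 1:
        raise ValueError("need at least one GPU")
    base, extra = divmod(n_items, n_gpus)
    chunks = []
    start = 0
    for gpu in range(n_gpus):
        # The first `extra` GPUs take one additional item each.
        size = base + (1 if gpu < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    # Example: one million elements divided across two GPUs.
    for gpu, (start, stop) in enumerate(split_range(1_000_000, 2)):
        print(f"GPU {gpu}: elements [{start}, {stop})")
```

Contiguous chunks keep each GPU's data in a single block, so each chunk can be sent to its GPU in one transfer.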
 

| Features | Benefits |
| --- | --- |
| Massively parallel many-core architecture | 128 processor cores that can execute thousands of concurrent threads to deliver unprecedented application performance |
| Widely accepted, easy-to-learn CUDA C parallel programming environment | Easily express application parallelism to take advantage of the GPU’s many-core architecture |
| Deskside supercomputer with two GPUs and 256 processor cores | Power-efficient and quiet deskside unit that delivers the performance of two Tesla C870 GPUs |
| Low-power host interface card to connect the deskside unit to the host desktop | Connect the Tesla D870 deskside unit to an inexpensive desktop PC to get a supercomputer at your desktop |
| Scale to multiple GPUs and harness the performance of thousands of processor cores | Get very high application performance by mapping the application to multiple GPUs |
| 128 IEEE single-precision floating point units per GPU | Achieve up to 350 GFLOPs of sustained performance (512 GFLOPs peak) per GPU |
| 384-bit memory interface from GPU to on-board memory | Fast GDDR3, 384-bit memory interface delivers 76.8 GB/sec memory bandwidth for blistering data transfer |
| 1.5 GB of on-board memory with each GPU | Transfer data less often and perform larger computations on a larger working data set |
| High-speed PCI-Express data transfer | Computing applications benefit from the high data transfer rate possible through the standard PCI-Express architecture |
| Parallel shared memory | Low-latency memory shared by groups of processors to enable efficient collaboration among them |
| Tesla GPUs available in flexible form factors | Tesla add-in card, deskside, and 1U system enable deployment in a wide range of environments |


| Specification | Value |
| --- | --- |
| Form Factor | ATX, 4.38" x 12.28", dual slot height |
| # of Tesla GPUs | 1 |
| # of Streaming Processor Cores | 128 |
| Floating Point Precision | IEEE 754 single-precision floating point |
| Floating Point Performance | 430 GFLOPs achievable with a C (CUDA) program (512 GFLOPs peak) |
| Total Dedicated Memory | 1.5 GB at 800 MHz |
| Memory Interface | 384-bit GDDR3 |
| Memory Bandwidth | 76.8 GB/sec peak |
| Max Power Consumption | 170W peak, 120W typical |
| Programming environment | CUDA |
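
The quoted peak memory bandwidth can be checked against the other memory figures in the specifications. The sketch below assumes that the 800 MHz figure is the memory clock, that GDDR3 moves two data words per clock (double data rate), and that 1 GB/sec means 10^9 bytes per second; the function name is hypothetical.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, memory_clock_mhz: int,
                            transfers_per_clock: int = 2) -> float:
    """Theoretical peak memory bandwidth in GB/sec (1 GB = 10**9 bytes).

    transfers_per_clock=2 is an assumption reflecting double-data-rate GDDR3.
    """
    bytes_per_transfer = bus_width_bits // 8  # 384 bits -> 48 bytes
    transfers_per_second = memory_clock_mhz * 1_000_000 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


if __name__ == "__main__":
    # 384-bit interface at 800 MHz, two transfers per clock.
    print(f"{peak_bandwidth_gb_per_s(384, 800):.1f} GB/sec")  # prints "76.8 GB/sec"
```

Under these assumptions, 48 bytes per transfer at 1.6 billion transfers per second gives the 76.8 GB/sec listed as the peak memory bandwidth.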