Product Info
The NVIDIA® Tesla™ S1070 Computing System (Dual PCI Express 2.0 cable connections) is a four-teraflop 1U system powered by the world’s first one-teraflop processor.

With the world’s first teraflop many-core processor, the NVIDIA® Tesla™ S1070 computing system speeds the transition to energy-efficient parallel computing. With 240 processor cores and a standard C compiler that simplifies application development, Tesla S1070 scales to solve the world’s most important computing challenges—more quickly and accurately.

Feeding the HPC Industry’s Relentless Demand for Performance.
Keeps pace with the increasing demands of the toughest computing challenges, including drug research, oil and gas exploration, and computational finance.

Many-core Architecture Delivers Optimum Scaling across HPC Applications.
Each processor’s 240 cores execute thousands of computing threads concurrently, and the scalable architecture meets the computational demands of applications whose complexity has outstripped the CPU’s ability to solve them.

High Efficiency Computing Platform for Energy-conscious Organizations.
Higher-performance, higher-density computing solves complex problems with fewer resources.

NVIDIA CUDA™ Technology Unlocks the Power of Tesla Many-core Computing Products.
The only C language environment that unlocks the many-core processing power of GPUs to solve the world’s most computationally intensive challenges.

Features and Benefits

Four 1-Teraflop Processors in a High Density 1U System
Delivers up to 4 teraflops of performance in a 1U rack-mount system, for unmatched performance in high-density racks.

Massively-Parallel Many-Core Architecture
240 computing cores per processor that can execute thousands of concurrent threads.

Scales to Multi-GPU Computing
Scale to thousands of processor cores to solve large-scale problems by splitting the problem across multiple GPUs.

Program in NVIDIA CUDA™: C for GPU
Programmable using CUDA, the world’s leading application development platform for many-core solutions.

IEEE 754 Floating Point
Ensures your results meet industry-standard precision, including optional features to ensure accuracy.

Double-Precision Floating Point Support
Meets the precision requirements of your most demanding applications with IEEE 754 64-bit precision.

Asynchronous Data Transfer
Turbocharges system performance because data transfers can execute while the computing cores are busy.

16 GB Ultra-fast Memory
Enables larger datasets to be stored locally, with 4 GB dedicated to each processor to maximize performance and minimize data movement around the system.

4x 512-bit Memory Interface
Delivers 408 GB/s peak aggregate memory bandwidth, with a 512-bit interface dedicated to each processor.

High-Speed PCI Express 2.0 Data Transfer
Low latency and high bandwidth give computing applications the highest data transfer rate possible through the standard PCI Express architecture.

Single-screw Rail Mounting
The single-screw rail design installs as quickly as a tool-less design, with the extra security and rigidity of a single screw securing the rail to the rack.

System Monitoring Features
Easy post-installation management and monitoring help your IT staff manage systems with minimal effort. Remote capabilities and status lights on the front and rear of the unit let your staff see the status whether they are on the other side of the rack or the other side of the world.

Dual PCI Express 2.0 Cable Connections
Maximizes bandwidth between the host processor and the Tesla processors with transfer rates of up to 12.8 GB/s (up to 6.4 GB/s per PCI Express connection).

Small-form-factor (SFF) Host Adapter Card
The low-power host adapter card enables Tesla systems to work with virtually any PCIe-compliant host system with an open PCI Express slot (x8 or x16).
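
To make the multi-GPU, memory, and PCI Express figures above concrete, here is a minimal Python sketch of how a workload might be split across the four processors. It is an illustration, not NVIDIA tooling: the `split_evenly` and `plan` helpers are hypothetical, gigabytes are assumed to be decimal (10^9 bytes), and the transfer time is a lower bound that assumes both cable connections sustain their peak rate.

```python
# Figures taken from the feature list.
NUM_PROCESSORS = 4
MEMORY_PER_PROCESSOR_GB = 4.0
TOTAL_MEMORY_BANDWIDTH_GBS = 408.0
PCIE_PEAK_TOTAL_GBS = 12.8  # two cable connections at up to 6.4 GB/s each


def split_evenly(num_items, num_parts):
    """Return (start, stop) index ranges covering num_items as evenly as possible."""
    base, extra = divmod(num_items, num_parts)
    ranges = []
    start = 0
    for part in range(num_parts):
        # The first `extra` parts take one additional item each.
        size = base + (1 if part < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def plan(num_items, bytes_per_item):
    """Hypothetical planner: one slice per processor plus a peak-rate transfer estimate."""
    slices = split_evenly(num_items, NUM_PROCESSORS)
    largest_gb = max(stop - start for start, stop in slices) * bytes_per_item / 1e9
    total_gb = num_items * bytes_per_item / 1e9
    return {
        "slices": slices,
        "largest_slice_gb": largest_gb,
        "fits_in_local_memory": largest_gb <= MEMORY_PER_PROCESSOR_GB,
        # Lower bound: assumes the full 12.8 GB/s is sustained for the whole copy.
        "min_transfer_seconds": total_gb / PCIE_PEAK_TOTAL_GBS,
    }


# Share of the 408 GB/s aggregate bandwidth available to each processor.
per_processor_bandwidth = TOTAL_MEMORY_BANDWIDTH_GBS / NUM_PROCESSORS

# Example: 3 billion 32-bit values, 12 GB in total.
result = plan(num_items=3_000_000_000, bytes_per_item=4)
print(per_processor_bandwidth, result)
```

Under these assumptions each processor receives a 3 GB slice, which fits in its 4 GB of local memory, and moving the full 12 GB from the host takes a little under one second at best. Real transfers will be slower than the peak rate.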

Specifications
Number of Tesla processors: 4
Number of computing cores: 960 (240 per processor)
Frequency of processor cores: 1.296 to 1.44 GHz
Single-precision floating point performance (peak): 3.73 to 4.14 TFLOPS
Double-precision floating point performance (peak): 311 to 345 GFLOPS
Floating point precision: IEEE 754 single and double
Total dedicated memory: 16 GB (4 GB per processor)
Memory interface: 512-bit per processor
Memory bandwidth: 408 GB/s
Max power consumption: 800 W (typical)
System interface: PCIe 2.0 x16 or x8
Programming environment: CUDA
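
The peak floating point figures follow from the core count and clock range. The Python sketch below reproduces them. The per-clock throughput values are assumptions that do not appear in the specifications: 3 single-precision operations per core per clock (a multiply-add issued together with a multiply), and 30 double-precision units per processor each completing a fused multiply-add (2 operations) per clock.

```python
# Values from the specifications.
PROCESSORS = 4
CORES_PER_PROCESSOR = 240
CLOCKS_GHZ = (1.296, 1.44)

# Assumptions, not stated in the specifications:
SP_FLOPS_PER_CORE_PER_CLOCK = 3  # multiply-add plus multiply issued together
DP_UNITS_PER_PROCESSOR = 30      # one double-precision unit per 8 cores
DP_FLOPS_PER_UNIT_PER_CLOCK = 2  # fused multiply-add


def peak_single_tflops(clock_ghz):
    """Peak single-precision rate of the whole system, in TFLOPS."""
    gflops = PROCESSORS * CORES_PER_PROCESSOR * SP_FLOPS_PER_CORE_PER_CLOCK * clock_ghz
    return gflops / 1000.0


def peak_double_gflops(clock_ghz):
    """Peak double-precision rate of the whole system, in GFLOPS."""
    return PROCESSORS * DP_UNITS_PER_PROCESSOR * DP_FLOPS_PER_UNIT_PER_CLOCK * clock_ghz


for clock in CLOCKS_GHZ:
    print(clock, peak_single_tflops(clock), peak_double_gflops(clock))
```

With these assumptions the lower clock gives roughly 3.73 TFLOPS single and 311 GFLOPS double, and the upper clock gives roughly 4.147 TFLOPS and 345.6 GFLOPS. That suggests the published upper bounds (4.14 and 345) are truncated rather than rounded.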