Scheduled to be released in Q4 2025
The Optimal Solution for Both Inference and Training
Building Blocks
Chip
Wormhole Processor
Tenstorrent’s Wormhole processor is a lightweight and efficient building block for AI clusters. It delivers 164 TFLOPS of BLOCKFP8 (8-bit block floating-point) performance and 12 GB of memory. An individual chip is modest on its own, but the true power of Wormhole emerges when many chips come together to form a single cluster.
Networking
Switchless Chip-to-Chip Torus Network
Every Wormhole processor is equipped with 3.2 Tbps Ethernet interfaces (16 x 200 Gbps), allowing direct connections with adjacent chips in the north, south, east, and west directions. All Wormhole processors in a cluster form a torus network, without requiring a complex switch network. This efficiently handles the communication patterns of typical AI workloads.
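The wraparound connectivity described above can be sketched in a few lines. This is a conceptual illustration only, not Tenstorrent firmware: it computes the four neighbors of a chip in a 2D torus, where chips on an edge connect to the opposite side. Grid dimensions and coordinates are illustrative.

```python
def torus_neighbors(x: int, y: int, width: int, height: int) -> dict[str, tuple[int, int]]:
    """Return the (x, y) coordinates of the four torus neighbors of chip (x, y).

    Modular arithmetic gives the wraparound: the west neighbor of column 0
    is the last column, and the north neighbor of row 0 is the last row.
    """
    return {
        "east": ((x + 1) % width, y),
        "west": ((x - 1) % width, y),
        "north": (x, (y - 1) % height),
        "south": (x, (y + 1) % height),
    }

# On a 4x4 torus, chip (0, 0) wraps west to (3, 0) and north to (0, 3),
# so every chip has exactly four links and no switch is needed.
print(torus_neighbors(0, 0, 4, 4))
```

Because every chip has the same four links, collective patterns such as ring all-reduce map directly onto the hardware topology.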
Inference Software
Moreh vLLM and MoAI Inference Framework
Moreh vLLM supports a wide range of models and delivers consistently optimized performance across diverse usage patterns. Building on it, the MoAI Inference Framework adds distributed-inference optimizations, including disaggregation, torus-aware scheduling and routing, and auto scaling.
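To make the disaggregation idea concrete, here is a minimal sketch of how an inference framework can route the compute-heavy prefill phase and the latency-sensitive decode phase to separate chip pools. The `Request` structure, pool names, and `route` function are hypothetical illustrations, not Moreh APIs.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str
    phase: str  # "prefill" (process the whole prompt) or "decode" (generate tokens)


# Hypothetical chip pools: prefill and decode run on disjoint sets of chips.
PREFILL_POOL = ["chip-0", "chip-1"]
DECODE_POOL = ["chip-2", "chip-3"]


def route(req: Request, counter: dict) -> str:
    """Round-robin each request to the pool serving its phase."""
    pool = PREFILL_POOL if req.phase == "prefill" else DECODE_POOL
    i = counter[req.phase] = counter.get(req.phase, -1) + 1
    return pool[i % len(pool)]
```

Separating the two phases lets each pool be sized and scheduled for its own bottleneck (compute for prefill, memory bandwidth for decode); a torus-aware scheduler would additionally place cooperating chips on adjacent torus coordinates.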
Training Software
MoAI Training Framework
MoAI Training Framework transforms an entire cluster of Tenstorrent’s lightweight and scalable chips into a single PyTorch virtual device. It enables efficient LLM training on the Tenstorrent architecture.
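The "single virtual device" idea can be illustrated with a small sketch: a wrapper that presents many chips as one device by sharding data across chips and gathering the results back. The class and method names here are hypothetical and stand in for the real framework; plain Python lists stand in for tensors.

```python
class VirtualDevice:
    """Toy model of a cluster exposed as one device (not the MoAI API)."""

    def __init__(self, num_chips: int):
        self.num_chips = num_chips

    def shard(self, data: list) -> list:
        """Split data into one contiguous shard per chip, as even as possible."""
        size, rem = divmod(len(data), self.num_chips)
        shards, start = [], 0
        for i in range(self.num_chips):
            end = start + size + (1 if i < rem else 0)  # first `rem` chips get one extra element
            shards.append(data[start:end])
            start = end
        return shards

    def all_gather(self, shards: list) -> list:
        """Reassemble per-chip shards into the full result."""
        return [x for shard in shards for x in shard]
```

The training script sees only the one virtual device; the framework decides how each tensor and operation is partitioned across the cluster underneath.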