With Moreh’s software and Tenstorrent’s network-integrated architecture, we deliver best-in-class scalability and the lowest total cost of ownership (TCO) for AI infrastructure.

Scheduled for release in Q4 2025

The Optimal Solution for Both Inference and Training

Inference

The ultra-low-latency collective communication of Tenstorrent accelerators delivers scalable performance for LLMs that span multiple chips. Model disaggregation techniques, such as separating the prefill and decode phases, can be implemented efficiently on the chip-to-chip torus network.
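
As an illustration of the kind of disaggregation that maps well onto the torus, the sketch below partitions a grid of chips into prefill and decode groups, with each prefill chip handing its KV cache to a decode chip one hop away. The names, the 8 × 4 layout, and the handoff rule are hypothetical, chosen only to show the idea; they are not Moreh or Tenstorrent APIs.

```python
# Hypothetical sketch of prefill/decode disaggregation on a grid of chips.
# None of these names are Moreh or Tenstorrent APIs; this only illustrates the idea.
WIDTH, HEIGHT, PREFILL_COLS = 8, 4, 2

def chip_role(x: int, y: int) -> str:
    """Dedicate the first columns of the torus to prefill, the rest to decode."""
    return "prefill" if x < PREFILL_COLS else "decode"

prefill = [(x, y) for y in range(HEIGHT) for x in range(WIDTH) if chip_role(x, y) == "prefill"]
decode = [(x, y) for y in range(HEIGHT) for x in range(WIDTH) if chip_role(x, y) == "decode"]

# A request is prefilled on one group of chips, then its KV cache is streamed
# over the chip-to-chip links to a decode chip in the same row (one hop away here).
for (px, py) in prefill:
    handoff_target = (PREFILL_COLS, py)  # nearest decode column in the same row
    print(f"prefill chip {(px, py)} -> decode chip {handoff_target}")
```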

Training

BLOCKFP formats, large on-chip SRAM, and data tile multicast across multiple FPUs enable efficient execution of training workloads. The torus network connects thousands of chips without heavy switch fabrics, making it well suited to massive-scale training.
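
Block floating-point formats such as BLOCKFP8 keep one shared exponent per small block of values, which keeps the arithmetic cheap while preserving enough dynamic range for training. The NumPy sketch below shows that general idea; the block size, mantissa width, and rounding are our assumptions for illustration, not Tenstorrent’s actual BLOCKFP8 encoding.

```python
import numpy as np

def blockfp_quantize(tile: np.ndarray, mantissa_bits: int = 7, block: int = 16) -> np.ndarray:
    """Illustrative block floating point: one shared exponent per block of values.
    Generic sketch of the technique, not Tenstorrent's exact BLOCKFP8 bit layout."""
    flat = tile.astype(np.float32).ravel()
    flat = np.pad(flat, (0, (-len(flat)) % block))
    blocks = flat.reshape(-1, block)

    # Shared exponent per block, chosen so every value in the block fits in [-1, 1].
    max_mag = np.maximum(np.max(np.abs(blocks), axis=1, keepdims=True), 1e-30)
    shared_exp = np.floor(np.log2(max_mag)) + 1

    # Each value keeps only a small signed mantissa relative to the shared exponent.
    levels = 2 ** (mantissa_bits - 1)
    mantissa = np.clip(np.round(blocks / 2.0 ** shared_exp * levels), -levels, levels - 1)

    # Dequantize to inspect the rounding error introduced by the format.
    deq = mantissa / levels * 2.0 ** shared_exp
    return deq.ravel()[: tile.size].reshape(tile.shape)

x = np.random.randn(8, 8).astype(np.float32)
print("max quantization error:", np.abs(x - blockfp_quantize(x)).max())
```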

Building Blocks

Chip

Wormhole Processor

Tenstorrent’s Wormhole processor is a lightweight and efficient building block for AI clusters. It delivers 164 TFLOPS of BLOCKFP8 (BF16) compute and carries 12 GB of memory. An individual chip is modest on its own; the true power of Wormhole emerges when many chips come together to form a single cluster.

Server

Galaxy Server

Tenstorrent’s Galaxy server is equipped with 32 Wormhole processors, delivering performance equivalent to a typical 8-GPU server.

Networking

Switchless Chip-to-Chip Torus Network

Every Wormhole processor is equipped with 3.2 Tbps of Ethernet interfaces (16 × 200 Gbps), allowing direct connections with adjacent chips to the north, south, east, and west. All Wormhole processors in a cluster form a torus network without requiring a complex switch fabric. This topology efficiently handles the communication patterns of typical AI workloads.
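
As a minimal sketch of the topology (illustrative coordinates only, not an actual cluster-management API): each chip at position (x, y) links directly to four wrap-around neighbors, which is exactly what removes the need for a switch.

```python
# Illustrative only: neighbor wiring in a 2D torus of chips (no switch involved).
def torus_neighbors(x: int, y: int, width: int, height: int):
    """Return the (x, y) coordinates of the north/south/east/west neighbors,
    with wrap-around at the edges, which is what makes the topology a torus."""
    return {
        "north": (x, (y - 1) % height),
        "south": (x, (y + 1) % height),
        "west": ((x - 1) % width, y),
        "east": ((x + 1) % width, y),
    }

# In an 8 x 4 arrangement, the chip at (0, 0) wraps around to (7, 0) and (0, 3).
print(torus_neighbors(0, 0, width=8, height=4))
```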

Inference Software

Moreh vLLM and MoAI Inference Framework

Moreh vLLM supports a wide range of models and delivers consistently optimized performance across diverse usage patterns. On top of it, the MoAI Inference Framework provides optimized distributed inference, including disaggregation, torus-aware scheduling and routing, and auto-scaling.
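
For reference, the snippet below uses upstream vLLM’s offline inference entry point. To the extent that Moreh vLLM mirrors that interface, this is how a model would be served; the example model name and the assumption of an unchanged interface are ours, not documented Moreh behavior.

```python
# Standard upstream-vLLM offline inference; the model name is just an example.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # assumed example model
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain a torus network in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```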

Training Software

MoAI Training Framework

The MoAI Training Framework presents an entire cluster of Tenstorrent’s lightweight, scalable chips as a single PyTorch virtual device, enabling efficient LLM training on the Tenstorrent architecture.
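
Because the cluster appears as one PyTorch device, an ordinary training loop needs no explicit distributed-training code. The sketch below is plain PyTorch; the device selection is a placeholder assumption for illustration, not MoAI’s documented device name.

```python
# Plain PyTorch training loop; the single virtual device hides the cluster.
# The device string below is a placeholder assumption, not MoAI's documented name.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for step in range(10):
    x = torch.randn(32, 1024, device=device)
    target = torch.randn(32, 1024, device=device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), target)
    loss.backward()           # gradients flow as if this were one device
    optimizer.step()
    print(f"step {step}: loss {loss.item():.4f}")
```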

Through our partnership with Tenstorrent and three years of joint development, we deliver a fully integrated solution, from hardware to software, ready to run a wide range of AI workloads efficiently.