Scale PyTorch, TensorFlow,
and Triton Inference Server to thousands of GPUs/NPUs

Train and deploy any multi-billion or multi-trillion parameter model.
Program anything you need for pretraining, fine-tuning, compression, and serving.
Scale to thousands of GPUs/NPUs through automatic parallelization and optimization.
Virtualize all GPUs/NPUs in a cluster for higher utilization and failover.
Decouple AI infrastructure from any specific hardware vendor.
PyTorch, TensorFlow, and Triton Inference Server are all you need.

Software stack for
the hyperscale AI era

The MoAI Platform transforms how AI applications are executed while preserving the semantics of standard deep learning frameworks, including PyTorch and TensorFlow. It is powered by an on-the-fly IR constructor, a graph-level compiler, and a distributed runtime system.

Programmable AI
infrastructure at scale

Users can treat thousands of accelerators as a single (very large and powerful) virtual device in PyTorch and TensorFlow. Large AI models and algorithms can be implemented easily, without worrying about complex system architectures or parallelization techniques.
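As a rough illustration of this programming model, ordinary single-device PyTorch code needs no explicit parallelization logic: the same script that targets one device could, under such an abstraction, target the whole cluster. This is a minimal hedged sketch, not MoAI-specific API; it falls back to CPU so it runs anywhere.

```python
# Hypothetical sketch: with a single-virtual-device abstraction,
# standard PyTorch training code stays unchanged. On the MoAI
# Platform the cluster would appear as one device; here we use
# plain CUDA-or-CPU selection so the example is self-contained.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# An ordinary model, moved to "the" device exactly once.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 10),
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One training step written as if for a single accelerator;
# no device meshes, process groups, or sharding specs appear.
x = torch.randn(8, 1024, device=device)
loss = model(x).sum()
loss.backward()
optimizer.step()
```

The point of the abstraction is that nothing in this loop mentions world size, ranks, or tensor sharding; the platform, not the user, decides how the work is distributed.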

MoAI Platform primarily supports AMD ROCm

Other accelerators will be added soon!