Performance
Benchmark Catalog
Transparent, reproducible benchmarks across single-node, cluster, and heterogeneous GPU configurations.
Single Node
Single-Node Inference — Moreh vLLM
Per-server inference performance powered by Moreh vLLM.
InferenceMAX DeepSeek R1 0528
8× AMD MI300X
End-to-end latency, geometric mean (lower is better)
Cluster
Cluster Inference — MoAI Inference Framework
Prefill/decode (PD) disaggregation, intelligent request routing, and other optimizations at cluster scale.
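As an illustrative sketch only (hypothetical names, not the MoAI Inference Framework API): PD disaggregation places the compute-heavy prefill phase and the memory-bound decode phase on separate worker pools, with a router directing each request phase to the matching pool.

```python
from itertools import cycle

class PDRouter:
    """Toy round-robin router that keeps prefill and decode on separate pools.

    Illustrative only: real cluster routers also weigh KV-cache placement,
    queue depth, and load, which this sketch omits.
    """

    def __init__(self, prefill_pool, decode_pool):
        # One round-robin iterator per phase, so prefill traffic never
        # lands on a decode worker and vice versa.
        self._pools = {
            "prefill": cycle(prefill_pool),
            "decode": cycle(decode_pool),
        }

    def route(self, phase):
        # phase is "prefill" or "decode"; returns the next worker in that pool.
        return next(self._pools[phase])

router = PDRouter(
    prefill_pool=["node0", "node1"],
    decode_pool=["node2", "node3", "node4"],
)
print(router.route("prefill"))  # node0
print(router.route("decode"))   # node2
```

Separating the two phases lets each pool be sized and batched for its own bottleneck, which is the motivation behind disaggregated serving.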
DeepSeek R1 671B
5× AMD MI300X nodes
End-to-end latency (lower is better)
DeepSeek R1 671B
2× vs 5× AMD MI300X nodes
Throughput
DeepSeek R1 671B
2× vs 5× AMD MI300X nodes
Time to first token (TTFT, lower is better)
Heterogeneous
Heterogeneous GPU Integration
Higher throughput and lower latency by orchestrating GPUs across vendors and generations.
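A minimal sketch of the idea behind heterogeneous orchestration, under the assumption (not from the source) that work is split across devices in proportion to measured per-device throughput, so faster GPUs receive proportionally more requests:

```python
# Illustrative only: function and device names are hypothetical,
# not Moreh's implementation.

def split_batch(batch_size, throughputs):
    """Split a request batch across heterogeneous GPUs.

    throughputs: {device_name: measured tokens/sec}
    returns:     {device_name: number of requests assigned}
    """
    total = sum(throughputs.values())
    # Proportional share, rounded down per device.
    shares = {d: int(batch_size * t / total) for d, t in throughputs.items()}
    # Hand any rounding remainder to the fastest device.
    remainder = batch_size - sum(shares.values())
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += remainder
    return shares

print(split_batch(100, {"H100": 600.0, "MI300X": 500.0, "MI250": 250.0}))
# {'H100': 45, 'MI300X': 37, 'MI250': 18}
```

Proportional assignment keeps no device idle waiting on the slowest one, which is how mixing vendors and generations can raise aggregate throughput rather than lower it.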
GPT-OSS 120B
NVIDIA H100 + AMD MI300X
Throughput
GPT-OSS 120B
NVIDIA H100 + AMD MI250
Throughput
GPT-OSS 120B
4× AMD MI250 nodes
TTFT at 100K context (lower is better)