Performance
Benchmark Catalog
Transparent, reproducible benchmarks covering single-node, cluster, and heterogeneous GPU configurations.
Single Node
Single-Node Inference: Moreh vLLM
Single-server inference performance with Moreh vLLM.
InferenceMAX DeepSeek R1 0528
8× AMD MI300X
E2E Latency geomean (lower is better)
Moreh vLLM
0.68×
SGLang
1.0×
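The page does not say how the latency geomean is derived. As a minimal sketch, assuming it is the geometric mean of per-scenario latency ratios against the baseline (so the baseline scores 1.0×), the calculation could look like the following. The function names and the sample latencies are hypothetical and are not measured results.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers, computed in log space."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized_geomean(candidate, baseline):
    """Geomean of per-scenario candidate/baseline ratios (assumed definition)."""
    if len(candidate) != len(baseline):
        raise ValueError("candidate and baseline need the same scenarios")
    return geomean([c / b for c, b in zip(candidate, baseline)])


# Hypothetical per-scenario end-to-end latencies in seconds, for illustration only.
baseline_latency = [10.0, 20.0, 40.0]
candidate_latency = [5.0, 20.0, 20.0]

score = normalized_geomean(candidate_latency, baseline_latency)
print(f"{score:.2f}x of baseline latency")
```

A geometric mean is the usual choice for averaging ratios because it treats a 2× slowdown and a 2× speedup symmetrically, and no single scenario with large absolute latency dominates the score.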
Cluster
Cluster Inference: MoAI Inference Framework
Cluster-scale optimizations such as prefill-decode (PD) disaggregation and intelligent routing.
DeepSeek R1 671B
5× AMD MI300X nodes
End-to-end latency (lower is better)
PD disaggregation
0.74×
Non-disaggregated
1.0×
DeepSeek R1 671B
2× vs 5× AMD MI300X nodes
Throughput (higher is better)
Cache-aware (2 nodes)
2.2×
Naive routing (5 nodes)
1.0×
DeepSeek R1 671B
2× vs 5× AMD MI300X nodes
TTFT (lower is better)
Cache-aware (2 nodes)
0.03–0.05×
Naive routing (5 nodes)
1.0×
Heterogeneous
Heterogeneous GPU Integration
Orchestrating GPUs from different vendors and generations to achieve higher throughput and lower latency.
GPT-OSS 120B
H100 + AMD MI300X
Throughput (higher is better)
Cross-vendor PD disagg
1.7×
Same-vendor PD disagg
1.0×
GPT-OSS 120B
H100 + AMD MI250
Throughput (higher is better)
Speculative decoding
1.17×
All-inference baseline
1.0×
GPT-OSS 120B
4× AMD MI250 nodes
TTFT at 100K context (lower is better)
Multi-node prefill engine
<2s
Single-node baseline
~9s