Performance

Benchmark Catalog

Transparent, reproducible benchmarks covering single-node, cluster, and heterogeneous GPU configurations.

Single Node

Single-Node Inference: Moreh vLLM

Single-server inference performance with Moreh vLLM.

InferenceMAX DeepSeek R1 0528

8× AMD MI300X

Throughput geomean

Moreh vLLM: 1.47×
SGLang: 1.0×
Blog Post
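
The "geomean" figures on this page are geometric means. As a minimal sketch of how such an aggregate is typically computed, assuming each benchmark scenario yields one speedup ratio against the baseline (the ratios below are hypothetical and not taken from the InferenceMAX results):

```python
import math


def geomean(ratios):
    """Geometric mean of positive ratios, computed in log space for stability."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-scenario speedups (candidate throughput / baseline throughput).
scenario_speedups = [1.2, 1.5, 1.8]
print(f"geomean speedup: {geomean(scenario_speedups):.2f}x")
```

A geometric mean is a common choice for averaging ratios because swapping the candidate and the baseline simply inverts the result.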

InferenceMAX DeepSeek R1 0528

8× AMD MI300X

E2E Latency geomean (lower is better)

Moreh vLLM: 0.68×
SGLang: 1.0×
Blog Post
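
For "lower is better" metrics, a ratio below 1.0× can be read as a percentage reduction relative to the baseline. A small sketch of that conversion, applied to the 0.68× latency figure above (the function name is illustrative):

```python
def percent_reduction(ratio: float) -> float:
    """Convert a lower-is-better ratio (value / baseline) into a percentage reduction."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1.0 - ratio) * 100.0


# 0.68x of the baseline latency corresponds to roughly a 32% reduction.
print(f"{percent_reduction(0.68):.0f}% lower E2E latency than the baseline")
```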

DeepSeek R1 671B

8× AMD MI300X

Output TPS geomean

Moreh vLLM: 1.68×
ROCm vLLM: 1.0×
Technical Report

DeepSeek R1 671B

8× AMD MI300X

TTFT (lower is better)

Moreh vLLM: 0.57×
ROCm vLLM: 1.0×
Technical Report

Llama 3.3 70B

2× AMD MI300X

Output TPS geomean

Moreh vLLM: 1.74×
ROCm vLLM: 1.0×
Technical Report

Llama 3.3 70B

2× AMD MI300X

TTFT (lower is better)

Moreh vLLM: 0.50×
ROCm vLLM: 1.0×
Technical Report

Step3 321B

8× AMD MI308X

Decode TPS

Moreh vLLM: 4,082
NVIDIA H20 baseline: 3,147
Customer Case

Telco 7.8B LLM

1× AMD MI300X

Output TPS

Moreh vLLM (MI300X): 186.75
vLLM (H100): 143.39
Customer Case
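
The two customer cases above report absolute tokens per second rather than a normalized ratio. To compare them with the other cards, the figures can be converted to the same "baseline = 1.0×" form. A minimal sketch using only the numbers listed above (the dictionary layout and function name are illustrative):

```python
def speedup(candidate: float, baseline: float) -> float:
    """Ratio of candidate to baseline for a higher-is-better metric."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate / baseline


# (candidate, baseline) pairs copied from the cards above.
cases = {
    "Step3 321B decode TPS (MI308X vs. H20)": (4082.0, 3147.0),
    "Telco 7.8B output TPS (MI300X vs. H100)": (186.75, 143.39),
}

for name, (candidate, baseline) in cases.items():
    print(f"{name}: {speedup(candidate, baseline):.2f}x")
```

Both cases work out to roughly 1.30× the baseline.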

Telco 7.8B LLM

1× AMD MI300X

SLO-compliant max concurrency

Moreh vLLM (MI300X): 880
vLLM (H100): 636
Customer Case

Cluster

Cluster Inference: MoAI Inference Framework

Cluster-scale optimizations such as prefill-decode (PD) disaggregation and intelligent routing.

DeepSeek R1 671B

5× AMD MI300X nodes

Output tok/s per decode node

PD disagg + EP: 22,000+
Docs

DeepSeek R1 671B

5× AMD MI300X nodes

End-to-end latency (lower is better)

PD disaggregation: 0.74×
Non-disaggregated: 1.0×
Technical Report

DeepSeek R1 671B

2× vs 5× AMD MI300X nodes

Throughput

Cache-aware (2 nodes): 2.2×
Naive routing (5 nodes): 1.0×
Docs
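
The comparison above uses different node counts on each side (2 nodes versus 5). If the throughput figure is aggregate cluster throughput, which is an assumption here since the card does not say, the per-node advantage is larger than the headline ratio. A minimal sketch of that normalization (the function name is illustrative):

```python
def per_node_ratio(total_ratio: float, nodes: int, baseline_nodes: int) -> float:
    """Rescale an aggregate-throughput ratio to a per-node ratio.

    Assumes total_ratio compares whole-cluster throughput, not per-node throughput.
    """
    if nodes <= 0 or baseline_nodes <= 0:
        raise ValueError("node counts must be positive")
    return total_ratio * baseline_nodes / nodes


# 2.2x aggregate throughput on 2 nodes versus a 5-node baseline.
print(f"per-node throughput ratio: {per_node_ratio(2.2, 2, 5):.1f}x")
```

Under that assumption, each cache-aware node would deliver about 5.5× the throughput of a naively routed node.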

DeepSeek R1 671B

2× vs 5× AMD MI300X nodes

TTFT (lower is better)

Cache-aware (2 nodes): 0.03–0.05×
Naive routing (5 nodes): 1.0×
Docs

Heterogeneous

Heterogeneous GPU Integration

Higher throughput and lower latency by orchestrating GPUs from different vendors and generations.

GPT-OSS 120B

H100 + AMD MI300X

Throughput

Cross-vendor PD disagg: 1.7×
Same-vendor PD disagg: 1.0×
Technical Report

GPT-OSS 120B

H100 + AMD MI300X

End-to-end latency (lower is better)

Cross-vendor PD disagg: 0.57×
Same-vendor PD disagg: 1.0×
Technical Report

DeepSeek R1 671B

AMD MI300X + MI308X

Throughput

PD disaggregation: 1.53×
Load-balanced: 1.0×
Blog Post

GPT-OSS 120B

H100 + AMD MI250

Throughput

Speculative decoding: 1.17×
All-inference baseline: 1.0×
Technical Report

GPT-OSS 120B

4× AMD MI250 nodes

TTFT at 100K context (lower is better)

Multi-node prefill engine: <2s
Single-node baseline: ~9s
Blog Post