Performance

Benchmark Catalog

Transparent, reproducible benchmarks across single-node, cluster, and heterogeneous GPU configurations.

Single Node

Single-Node Inference: Moreh vLLM

Per-server inference performance with Moreh vLLM.

InferenceMAX DeepSeek R1 0528

8× AMD MI300X

Throughput geomean

Moreh vLLM

1.47×

SGLang

1.0×
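Several cards in this catalog report a "geomean" relative to a 1.0× baseline. As a rough illustration of how such a figure is typically derived, the sketch below takes the per-scenario ratio of candidate to baseline and then the geometric mean of those ratios. The function names and the sample numbers are hypothetical and are not taken from the benchmarks above; the exact scenario set and aggregation used for the published figures is not stated here.

```python
import math


def geomean(values):
    """Geometric mean of strictly positive numbers, computed in log space."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized_geomean(candidate, baseline):
    """Geometric mean of per-scenario candidate/baseline ratios."""
    if len(candidate) != len(baseline):
        raise ValueError("candidate and baseline must cover the same scenarios")
    ratios = [c / b for c, b in zip(candidate, baseline)]
    return geomean(ratios)


# Hypothetical throughput measurements (tokens/s) for three scenarios.
candidate_tps = [1200.0, 900.0, 400.0]
baseline_tps = [600.0, 900.0, 800.0]

# Per-scenario ratios are 2.0, 1.0 and 0.5.
speedup = normalized_geomean(candidate_tps, baseline_tps)
print(f"{speedup:.2f}x")
```

With these made-up inputs the arithmetic mean of the ratios would be about 1.17, while the geometric mean is 1.0: a 2× gain and a 2× loss cancel out, which is why geometric means are commonly preferred for summarizing ratios. For "lower is better" metrics such as latency or TTFT, the same calculation yields values below 1.0× when the candidate is faster.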

InferenceMAX DeepSeek R1 0528

8× AMD MI300X

E2E Latency geomean (lower is better)

Moreh vLLM

0.68×

SGLang

1.0×

DeepSeek R1 671B

8× AMD MI300X

Output TPS geomean

Moreh vLLM

1.68×

ROCm vLLM

1.0×

Technical Report ›

DeepSeek R1 671B

8× AMD MI300X

TTFT (lower is better)

Moreh vLLM

0.57×

ROCm vLLM

1.0×

Technical Report ›

Llama 3.3 70B

2× AMD MI300X

Output TPS geomean

Moreh vLLM

1.74×

ROCm vLLM

1.0×

Technical Report ›

Llama 3.3 70B

2× AMD MI300X

TTFT (lower is better)

Moreh vLLM

0.50×

ROCm vLLM

1.0×

Technical Report ›

Step3 321B

8× AMD MI308X

Decode TPS

Moreh vLLM

4,082

NVIDIA H20 baseline

3,147

Customer Case ›

Telco 7.8B LLM

1× AMD MI300X

Output TPS

Moreh vLLM (MI300X)

186.75

vLLM (H100)

143.39

Customer Case ›

Telco 7.8B LLM

1× AMD MI300X

SLO-compliant max concurrency

Moreh vLLM (MI300X)

880

vLLM (H100)

636

Customer Case ›

Cluster

Cluster Inference: MoAI Inference Framework

Cluster-scale optimizations such as prefill-decode (PD) disaggregation and intelligent routing.

DeepSeek R1 671B

5× AMD MI300X nodes

Output tok/s per decode node

PD disagg + EP

22,000+

DeepSeek R1 671B

5× AMD MI300X nodes

End-to-end latency (lower is better)

PD disaggregation

0.74×

Non-disaggregated

1.0×

Technical Report ›

DeepSeek R1 671B

2× vs 5× AMD MI300X nodes

Throughput

Cache-aware (2 nodes)

2.2×

Naive routing (5 nodes)

1.0×

DeepSeek R1 671B

2× vs 5× AMD MI300X nodes

TTFT (lower is better)

Cache-aware (2 nodes)

0.03–0.05×

Naive routing (5 nodes)

1.0×

Heterogeneous

Heterogeneous GPU Integration

GPU orchestration across vendors and hardware generations delivers higher throughput and lower latency.

GPT-OSS 120B

H100 + AMD MI300X

Throughput

Cross-vendor PD disagg

1.7×

Same-vendor PD disagg

1.0×

Technical Report ›

GPT-OSS 120B

H100 + AMD MI300X

End-to-end latency (lower is better)

Cross-vendor PD disagg

0.57×

Same-vendor PD disagg

1.0×

Technical Report ›

DeepSeek R1 671B

AMD MI300X + MI308X

Throughput

PD disaggregation

1.53×

Load-balanced

1.0×

GPT-OSS 120B

H100 + AMD MI250

Throughput

Speculative decoding

1.17×

All-inference baseline

1.0×

Technical Report ›

GPT-OSS 120B

4× AMD MI250 nodes

TTFT at 100K context (lower is better)

Multi-node prefill engine

<2s

Single-node baseline

~9s