1.68×
vs ROCm vLLM
DeepSeek R1 on a single server
20,000+
tok/s per node
DeepSeek R1 on an MI300X cluster
1.7×
Cross-vendor GPU collaboration
NVIDIA + AMD prefill/decode (PD) disaggregation
2.2×
Throughput gain with 40% fewer servers
Prefix-cache-aware routing
Full-stack inference software
From kernels to clusters
Moreh covers the entire inference stack on heterogeneous accelerators, from chip-level kernels to distributed serving.
MoAI Inference Framework
Routing & scheduling · Autoscaling · SLO-driven optimization · KV cache
Moreh vLLM
SOTA model optimization · Quantization · Graph execution
Native vLLM
Moreh Libraries
Custom kernels · GEMM/Attention/MoE · Communication
AMD Instinct GPU
Tenstorrent processors
NVIDIA GPU
Why Moreh
Three ways Moreh inference software creates value for your AI infrastructure.
Blog Posts
View all ›
Moreh Unlocks AMD MI300X Potential: 1.5× Faster DeepSeek R1 Inference vs. SGLang (InferenceMax)
March 16, 2026
Moreh’s optimized inference engine achieves a 1.47× improvement in end-to-end latency and throughput per GPU for DeepSeek R1 on AMD MI300X, compared to the InferenceMAX baseline.

TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference
February 5, 2026
TIDE continuously improves inference speed by training a lightweight draft model in the background, using idle GPUs in the cluster — no extra data preparation or downtime required.

Step3 Inference Optimization on AMD Instinct MI308X: 1.30× Higher Decode Throughput vs. NVIDIA H20
December 29, 2025
Moreh optimized StepFun’s Step3 321B MoE model for AMD Instinct MI308X GPUs, achieving 1.30× higher decode throughput and 23% lower decode latency than NVIDIA H20.
Ecosystem & Open Source
We contribute to the open-source ecosystem and partner with leading chip vendors.