Moreh vLLM
The fastest way to serve LLMs on AMD GPUs
Drop-in replacement for vLLM with up to 2× higher throughput on AMD Instinct GPUs. Same API, same model formats — just faster. Deploy in minutes with a single Docker image.
Benchmarks
Proven Performance Across Models
DeepSeek R1 671B · 8× AMD Instinct MI300X
Output tokens/s normalized to ROCm vLLM, across input lengths, output lengths, and concurrency levels.
Measured using vLLM’s benchmark_serving tool.
More evaluation reports
Getting Started
Preset-Based Deployment
Moreh vLLM ships with optimized presets for popular models and hardware configurations. Pick a preset, point to your model, and serve — parallelism, memory, and kernel settings are handled automatically.
Example Deployments
$ docker run --device /dev/kfd --device /dev/dri \
--network host -v /models:/models \
moreh/moreh-vllm:latest \
serve.sh /models/DeepSeek-R1 \
presets/deepseek-ai-deepseek-r1-amd-mi300x-dp8-moe-ep8.yaml
Under the Hood
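Once the container is serving, clients talk to it through vLLM's standard OpenAI-compatible HTTP API (the "same API" promised above). A minimal sketch of building a chat-completion request, assuming the default vLLM port 8000 and the model path from the example; both are deployment-specific:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-compatible chat completion payload for the server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Assumed endpoint (default vLLM port; adjust to your deployment):
#   POST http://localhost:8000/v1/chat/completions
payload = build_chat_request(
    "/models/DeepSeek-R1",
    "Explain expert parallelism in one sentence.",
)
print(json.dumps(payload, indent=2))
```

Because the API is unchanged, existing OpenAI-client code works against a Moreh vLLM endpoint by pointing its base URL at the server.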
Why It’s Faster
Moreh vLLM replaces the compute backend with engines purpose-built for AMD GPU architecture.
Custom Libraries for AMD GPUs
Compute libraries — including GEMM, attention, MoE, and fused operations — built specifically for AMD GPU architecture.
Model Optimization
Techniques such as operation fusion, graph-level execution, and quantization to run each model as efficiently as possible.
Multi-GPU Scaling
Communication/compute overlap, EP load balancing, and other optimizations to scale across GPUs within a server.
Supported Models
Optimized for popular open-source LLMs, including DeepSeek R1.
Supported Hardware
Running a proprietary model?
Moreh provides on-demand vLLM optimization for your private and fine-tuned models on AMD GPUs. We build a custom Moreh vLLM tailored to your model architecture, so you get the same performance gains without any extra work on your side.
We’ve done this for customers including StepFun (Step3 321B on MI308X, 1.30× higher decode throughput vs. NVIDIA H20) and a major Korean telco (7.8B affiliate model on MI300X, 1.38× higher serving capacity vs. NVIDIA H100).
Contact us ›