Solution
Turnkey Inference-Optimized AMD GPU Clusters
Moreh delivers AMD Instinct GPU clusters with full-stack software optimization built in — from custom kernels to cluster-level orchestration — so your infrastructure is production-ready from day one.
What We Deliver
GPU
AMD Instinct Accelerators
Across every generation, AMD Instinct GPUs match or exceed their NVIDIA counterparts in compute, memory capacity, and memory bandwidth. Intra-node GPU communication is handled by Infinity Fabric, AMD's equivalent to NVIDIA's NVLink.
Head-to-head comparisons (FP16 performance, memory capacity, memory bandwidth):
- MI300X vs H100
- MI325X vs H200
- MI355X vs B200
Networking
RoCE Cluster Networking
Moreh designs RoCE (RDMA over Converged Ethernet) network topologies optimized for your workload and cluster size, with software-level communication optimizations.
- RoCE network topology designed per cluster size and workload
- Software-level optimizations to minimize communication overhead
Platform
Kubernetes-Based Cluster Platform
Every cluster ships with a production-ready Kubernetes platform built from open-source components — so your team can focus on models, not infrastructure.
- Kubernetes orchestration with GPU-aware scheduling
- Ceph distributed storage for model weights and checkpoints
- Monitoring and logging (Prometheus, Grafana, Loki)
- Authentication and access control (LDAP, Keycloak)
- AI job management and scheduling (SkyPilot)
Software
Moreh Inference Software
Every cluster comes with Moreh's full-stack inference software, purpose-built for AMD GPUs and production-ready from day one.
Moreh vLLM
Single-Node Inference Engine
- Drop-in replacement for vLLM with OpenAI-compatible API
- Best-in-class throughput and latency on AMD GPUs
- Delivered as container images, regularly updated
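Because the API is OpenAI-compatible, any OpenAI-style client can talk to a Moreh vLLM endpoint unchanged. A minimal sketch using only the Python standard library; the base URL, port, and model name are placeholders, not Moreh defaults:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /v1/chat/completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical endpoint and model name for illustration only.
req = build_chat_request("http://localhost:8000", "my-model", "Hello")
# Sending it requires a running server:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

Existing tooling built against the OpenAI SDK works the same way: point the client's base URL at the cluster endpoint and keep the rest of the code as-is.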
MoAI Inference Framework
Cluster-Scale Orchestration
- Scale from a single node to full cluster deployment
- Prefill-decode disaggregation, smart routing, auto-scaling, and SLO-driven optimization
- OpenAI-compatible API endpoint for the entire cluster
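To make prefill-decode disaggregation concrete, here is an illustrative sketch of the general idea, not MoAI's actual implementation: compute-bound prefill and memory-bound decode run on separate worker pools, and a router sends each request phase to the least-loaded pool:

```python
from dataclasses import dataclass, field

@dataclass
class Pool:
    """A group of workers dedicated to one phase (prefill or decode)."""
    name: str
    queue: list = field(default_factory=list)

    def load(self) -> int:
        return len(self.queue)

# Two illustrative pools per phase; a real cluster would size these
# independently, since prefill and decode stress GPUs differently.
prefill_pools = [Pool("prefill-0"), Pool("prefill-1")]
decode_pools = [Pool("decode-0"), Pool("decode-1")]

def route(request_id: str, phase: str) -> Pool:
    """Send the request to the least-loaded pool for its phase."""
    pools = prefill_pools if phase == "prefill" else decode_pools
    target = min(pools, key=Pool.load)
    target.queue.append(request_id)
    return target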
Heterogeneous GPU
Already have NVIDIA GPUs?
You don't have to replace your existing infrastructure. Add AMD GPU nodes to your NVIDIA cluster and run them as a single, unified inference endpoint. MoAI Inference Framework handles cross-vendor orchestration — routing each request to the right accelerator automatically.
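Conceptually, a unified cross-vendor endpoint reduces to a dispatch table: each model is hosted on some backend, and the caller never needs to know which vendor's GPU serves it. A hedged sketch (all names and URLs are illustrative, not MoAI's actual API):

```python
# model name -> (gpu vendor, backend URL); all values are hypothetical
backends = {
    "llama-70b": ("amd", "http://amd-node-0:8000"),
    "mixtral-8x7b": ("nvidia", "http://nv-node-0:8000"),
}

def dispatch(model: str) -> str:
    """Return the backend URL serving this model, regardless of GPU vendor."""
    vendor, url = backends[model]
    return url
```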
Learn about heterogeneous GPU clusters ›
Why AMD GPU Clusters
AMD Instinct GPUs offer a compelling alternative to NVIDIA — and Moreh's software ensures you capture every bit of that advantage.
More Memory, Bigger Models
MI325X offers 256 GB per GPU, 1.8× the H200's 141 GB. Serve larger models per node, or fit the same model on fewer GPUs.
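A back-of-envelope check of that claim, counting model weights only (KV cache and activations would need additional headroom); the 405B FP16 model is an illustrative size, not a benchmark result:

```python
import math

def min_gpus_for_weights(params_billions: float, bytes_per_param: int, gpu_mem_gb: float) -> int:
    """Minimum GPUs whose combined memory holds the model weights alone."""
    weights_gb = params_billions * bytes_per_param  # 1B params * 2 B (FP16) = 2 GB
    return math.ceil(weights_gb / gpu_mem_gb)

# 405B params at FP16 = 810 GB of weights:
print(min_gpus_for_weights(405, 2, 256))  # 256 GB per GPU -> 4
print(min_gpus_for_weights(405, 2, 141))  # 141 GB per GPU -> 6
```

Fewer GPUs per model means fewer nodes per deployment, which feeds directly into the TCO comparison below.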
Competitive TCO
AMD Instinct GPUs deliver comparable or better compute per dollar. Combined with Moreh's software optimization, you get more throughput from the same investment.
No Performance Compromise
Moreh's full-stack software closes the ecosystem gap. Custom kernels, cluster-level orchestration, and production-grade tooling ensure AMD GPUs perform at their full potential.
End-to-End Support
Hardware Supply
We source and supply AMD Instinct GPUs and servers, handling procurement so you don't have to.
Cluster Construction
We design and build the cluster — from rack layout and power planning to networking topology.
Software Deployment
Moreh vLLM and MoAI Inference Framework are deployed and optimized for your specific workloads.
Technical Support
Ongoing support for AMD GPU-specific issues, performance tuning, and software updates.
Ready to deploy AMD GPU inference at scale?
From a single node to a full cluster — we handle the hardware, software, and everything in between.
Contact Sales