Solution

Turnkey Inference-Optimized AMD GPU Clusters

Moreh delivers AMD Instinct GPU clusters with full-stack software optimization built in — from custom kernels to cluster-level orchestration — so your infrastructure is production-ready from day one.

2,000+ AMD GPUs deployed in customers' AI data centers

What We Deliver

GPU

AMD Instinct Accelerators

Across every generation, AMD Instinct GPUs match or exceed their NVIDIA counterparts in peak compute, memory capacity, and memory bandwidth. Intra-node GPU-to-GPU communication is handled by Infinity Fabric, AMD's counterpart to NVIDIA's NVLink.

MI300X vs H100

  • FP16 Performance: 1,307 TFLOPS (MI300X) vs 989 TFLOPS (H100)
  • Memory: 192 GB vs 80 GB
  • Memory Bandwidth: 5.3 TB/s vs 3.35 TB/s

MI325X vs H200

  • FP16 Performance: 1,307 TFLOPS (MI325X) vs 989 TFLOPS (H200)
  • Memory: 256 GB vs 141 GB
  • Memory Bandwidth: 6 TB/s vs 4.8 TB/s

MI355X vs B200

  • FP16 Performance: 2,500 TFLOPS (MI355X) vs 2,250 TFLOPS (B200)
  • Memory: 288 GB vs 180 GB
  • Memory Bandwidth: 8 TB/s vs 7.7 TB/s

Networking

RoCE Cluster Networking

Moreh designs RoCE (RDMA over Converged Ethernet) network topologies optimized for your workload and cluster size, with software-level communication optimizations.

  • RoCE network topology designed per cluster size and workload
  • Software-level optimizations to minimize communication overhead (see the sketch below)
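
To give a rough sense of what software-level communication tuning looks like in practice, here is a minimal sketch, assuming a PyTorch workload launched with torchrun on ROCm. RCCL, AMD's NCCL-compatible collectives library, honors NCCL-style environment variables, so the job can be steered onto the RoCE NICs before any collective runs. The interface and HCA names are placeholders, not Moreh's actual tuning.

    # Hedged sketch: steer RCCL (NCCL-compatible) traffic onto the RoCE NICs
    # before starting a multi-GPU job. Interface and HCA names are placeholders.
    # Launch with torchrun so RANK, WORLD_SIZE, and LOCAL_RANK are populated.
    import os
    import torch
    import torch.distributed as dist

    # RCCL reads NCCL_* environment variables; these select the RDMA-capable
    # interfaces and the RoCE v2 GID index used for GPU-to-GPU traffic.
    os.environ.setdefault("NCCL_SOCKET_IFNAME", "ens1")    # placeholder NIC name
    os.environ.setdefault("NCCL_IB_HCA", "mlx5_0,mlx5_1")  # placeholder RoCE HCAs
    os.environ.setdefault("NCCL_IB_GID_INDEX", "3")        # typical RoCE v2 GID

    # On ROCm builds of PyTorch, the "nccl" backend is provided by RCCL.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", "0")))

    # A collective like this now runs over the RoCE fabric.
    tensor = torch.ones(1, device="cuda")
    dist.all_reduce(tensor)
    print(f"rank {dist.get_rank()}: all-reduce result {tensor.item()}")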

Platform

Kubernetes-Based Cluster Platform

Every cluster ships with a production-ready Kubernetes platform built from open-source components — so your team can focus on models, not infrastructure.

  • Kubernetes orchestration with GPU-aware scheduling (see the sketch below)
  • Ceph distributed storage for model weights and checkpoints
  • Monitoring and logging (Prometheus, Grafana, Loki)
  • Authentication and access control (LDAP, Keycloak)
  • AI job management and scheduling (SkyPilot)
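
As an illustration of GPU-aware scheduling on such a platform, the sketch below uses the standard Kubernetes Python client to request one AMD GPU. It assumes the AMD GPU device plugin is installed, which advertises GPUs under the amd.com/gpu resource name; the pod name, image, and namespace are placeholders.

    # Minimal sketch: ask the Kubernetes scheduler for one AMD GPU.
    # Assumes the AMD GPU device plugin is installed (resource name "amd.com/gpu").
    # Pod name, image, and namespace are placeholders.
    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() inside the cluster

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="rocm-smoke-test"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="rocm-smoke-test",
                    image="rocm/pytorch:latest",  # placeholder image
                    command=["python3", "-c",
                             "import torch; print(torch.cuda.is_available())"],
                    resources=client.V1ResourceRequirements(
                        limits={"amd.com/gpu": "1"},  # GPU-aware scheduling happens here
                    ),
                )
            ],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)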

Software

Moreh Inference Software

Every cluster comes with Moreh's full-stack inference software, purpose-built for AMD GPUs and production-ready from day one.

Moreh vLLM

Single-Node Inference Engine

  • Drop-in replacement for vLLM with OpenAI-compatible API (example below)
  • Best-in-class throughput and latency on AMD GPUs
  • Delivered as container images, regularly updated
Learn more
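
Because the engine exposes an OpenAI-compatible API, existing client code only needs to point at the endpoint. A minimal sketch, assuming a server reachable at http://moreh-vllm:8000/v1 and a served model name of your choosing (both placeholders):

    # Minimal sketch: call a Moreh vLLM endpoint through the standard OpenAI client.
    # The base_url, api_key, and model name are placeholders for your deployment.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://moreh-vllm:8000/v1",  # placeholder endpoint
        api_key="EMPTY",                       # vLLM-style servers typically ignore the key
    )

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder served model name
        messages=[{"role": "user", "content": "Summarize RoCE in one sentence."}],
        max_tokens=128,
    )
    print(response.choices[0].message.content)

The same client code works unchanged against a MoAI Inference Framework cluster endpoint; only the base_url changes, which is what keeps the single-node-to-cluster path and the heterogeneous setup below at zero application changes.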

MoAI Inference Framework

Cluster-Scale Orchestration

  • Scale from a single node to full cluster deployment
  • Prefill-Decode (PD) disaggregation, smart routing, autoscaling, and SLO-driven optimization
  • OpenAI-compatible API endpoint for the entire cluster
Learn more

Heterogeneous GPU

Already have NVIDIA GPUs?

You don't have to replace your existing infrastructure. Add AMD GPU nodes to your NVIDIA cluster and run them as a single, unified inference endpoint. MoAI Inference Framework handles cross-vendor orchestration — routing each request to the right accelerator automatically.

Learn about heterogeneous GPU clusters
1.7× throughput with cross-vendor PD disaggregation
1 unified API endpoint across all GPU vendors
0 application changes required

Why AMD

Why AMD GPU Clusters

AMD Instinct GPUs offer a compelling alternative to NVIDIA — and Moreh's software ensures you capture every bit of that advantage.

More Memory, Bigger Models

MI325X offers 256 GB per GPU, 1.8× the capacity of H200's 141 GB. Serve larger models per node, or fit the same model on fewer GPUs.

Competitive TCO

AMD Instinct GPUs deliver comparable or better compute per dollar. Combined with Moreh's software optimization, you get more throughput from the same investment.

No Performance Compromise

Moreh's full-stack software closes the ecosystem gap. Custom kernels, cluster-level orchestration, and production-grade tooling ensure AMD GPUs perform at their full potential.

End-to-End Support

01

Hardware Supply

We source and supply AMD Instinct GPUs and servers, handling procurement so you don't have to.

02

Cluster Construction

We design and build the cluster — from rack layout and power planning to networking topology.

03

Software Deployment

Moreh vLLM and MoAI Inference Framework are deployed and optimized for your specific workloads.

04

Technical Support

Ongoing support for AMD GPU-specific issues, performance tuning, and software updates.

Ready to deploy AMD GPU inference at scale?

From a single node to a full cluster — we handle the hardware, software, and everything in between.

Contact Sales