November 13, 2025
Moreh demonstrated DeepSeek-R1 inference at a decoding throughput of more than 21,000 tokens/sec by implementing expert parallelism (EP) on the ROCm software stack.
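EP here refers to expert parallelism: the experts of DeepSeek-R1's mixture-of-experts layers are sharded across GPUs, and each token is dispatched to the devices that own its selected experts. Below is a minimal single-process sketch of that routing step, intended purely as an illustration; the shapes, names, and simulated rank ownership are assumptions, not Moreh's ROCm implementation.

```python
# Illustrative sketch of expert-parallel (EP) routing in a MoE layer.
# Single process, toy sizes; the "ranks" that own experts are only simulated.
# This is not Moreh's ROCm implementation.
import torch

NUM_EXPERTS, TOP_K, HIDDEN = 8, 2, 16
EXPERTS_PER_RANK = 2  # in real EP, experts 0-1 live on rank 0, 2-3 on rank 1, ...

# One tiny feed-forward "expert" per expert id.
experts = [torch.nn.Linear(HIDDEN, HIDDEN) for _ in range(NUM_EXPERTS)]
router = torch.nn.Linear(HIDDEN, NUM_EXPERTS)

def moe_forward(tokens: torch.Tensor) -> torch.Tensor:
    # 1) The router scores every token against every expert and keeps the top-k.
    weights, expert_ids = router(tokens).softmax(-1).topk(TOP_K, dim=-1)

    # 2) Tokens are grouped by destination expert; in a real EP deployment this
    #    grouping feeds an all-to-all exchange to the rank that owns the expert.
    output = torch.zeros_like(tokens)
    for e in range(NUM_EXPERTS):
        hit = (expert_ids == e)                        # [tokens, TOP_K] mask
        rows = hit.any(dim=-1).nonzero(as_tuple=True)[0]
        if rows.numel() == 0:
            continue
        # 3) The owning rank (EXPERTS_PER_RANK experts per rank) runs its expert
        #    and the result is scattered back, weighted by the router score.
        w = weights[rows][hit[rows]].unsqueeze(-1)
        output[rows] += w * experts[e](tokens[rows])
    return output

print(moe_forward(torch.randn(32, HIDDEN)).shape)  # torch.Size([32, 16])
```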
November 10, 2025
TIDE optimizes inference on newer GPUs by offloading runtime draft-model training to older or idle GPUs, improving overall cost-performance at the system level.
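TIDE builds on draft-model (speculative) decoding: a small draft model cheaply proposes several tokens, the large target model verifies them in one pass, and TIDE keeps the draft model accurate by training it at runtime on the older or idle GPUs. The sketch below shows only the generic draft-and-verify loop with toy stand-in models; everything in it is an assumed illustration, not TIDE's actual interface.

```python
# Toy draft-and-verify (speculative decoding) loop.
# The "models" are stand-ins that map a prefix to a next-token id;
# this illustrates the idea only, not TIDE's implementation.
import random

VOCAB = 50

def draft_model(prefix):   # cheap, slightly noisy proposer
    return (sum(prefix) + random.randint(0, 1)) % VOCAB

def target_model(prefix):  # expensive "ground truth" model
    return sum(prefix) % VOCAB

def speculative_step(prefix, k=4):
    # 1) The draft model proposes k tokens autoregressively (cheap).
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_model(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2) The target model verifies the proposals; in a real system this is a
    #    single batched forward pass over all k positions.
    accepted, ctx = [], list(prefix)
    for t in proposal:
        expected = target_model(ctx)
        if t == expected:          # draft agreed with the target: keep the token
            accepted.append(t)
            ctx.append(t)
        else:                      # first mismatch: take the target's token and stop
            accepted.append(expected)
            break
    else:
        accepted.append(target_model(ctx))  # bonus token when all k are accepted
    return accepted

tokens = [1, 2, 3]
for _ in range(5):
    tokens += speculative_step(tokens)
print(tokens)
```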
September 23, 2025
MoAI Inference Framework supports automatic and efficient distributed inference on heterogeneous accelerators such as AMD MI300X + MI308X and NVIDIA Rubin CPX + GPU.
August 30, 2025
Moreh vLLM achieves 1.68x higher output TPS, 2.02x lower TTFT, and 1.59x lower TPOT compared to the original vLLM for Meta's Llama 3.3 70B model.
August 29, 2025
Moreh vLLM achieves 1.68x higher output TPS, 1.75x lower TTFT, and 1.70x lower TPOT compared to the original vLLM for the DeepSeek V3/R1 671B model.
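For context, the metrics in the two entries above are usually measured per request: TTFT is the time to the first streamed token, TPOT is the average time per output token after the first, and output TPS is output tokens per second. A minimal measurement sketch over a streaming generator is shown below; the dummy generator stands in for a real inference endpoint.

```python
# Minimal sketch of how TTFT, TPOT, and output TPS are commonly computed
# from a streaming response; the generator below is a dummy stand-in.
import time

def dummy_stream(n_tokens=64, first_delay=0.20, step_delay=0.01):
    time.sleep(first_delay)           # prefill latency -> time to first token
    yield "tok"
    for _ in range(n_tokens - 1):
        time.sleep(step_delay)        # per-token decode latency
        yield "tok"

def measure(stream):
    start = time.perf_counter()
    ttft, count = None, 0
    for _ in stream:
        now = time.perf_counter()
        if ttft is None:
            ttft = now - start                      # time to first token
        count += 1
    total = time.perf_counter() - start
    tpot = (total - ttft) / max(count - 1, 1)       # avg time per token after the first
    tps = count / total                             # output tokens per second
    return ttft, tpot, tps

ttft, tpot, tps = measure(dummy_stream())
print(f"TTFT={ttft*1000:.1f} ms  TPOT={tpot*1000:.1f} ms  output TPS={tps:.1f}")
```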
February 20, 2025
MoAI provides a PyTorch-compatible environment that makes LLM fine-tuning on hundreds of AMD GPUs straightforward, including the DeepSeek 671B MoE model.
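PyTorch compatibility means an unmodified training loop like the sketch below is expected to run as-is, with the platform distributing the work across GPUs underneath the standard PyTorch API; the toy model and random batches are stand-ins for a real LLM and dataset, not an actual DeepSeek 671B setup.

```python
# A standard PyTorch fine-tuning loop; on a PyTorch-compatible platform it
# should run unchanged, with multi-GPU distribution handled underneath.
# Toy model and random data stand in for a real LLM and dataset.
import torch

model = torch.nn.Sequential(torch.nn.Embedding(1000, 64),
                            torch.nn.Flatten(),
                            torch.nn.Linear(64 * 16, 1000))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(10):
    inputs = torch.randint(0, 1000, (8, 16))   # batch of token ids
    labels = torch.randint(0, 1000, (8,))      # toy next-token targets
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.3f}")
```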
December 2, 2024
Moreh announces Motif, a high-performance 102B-parameter Korean large language model (LLM), which will be released as an open-source model.
September 3, 2024
There are no barriers to fine-tuning Llama 3.1 405B on the MoAI platform; the Moreh team has demonstrated fine-tuning the model on 192 AMD GPUs.
August 19, 2024
The MoAI platform provides comprehensive GPU virtualization, including fine-grained resource allocation, multi-GPU scaling, and heterogeneous GPU support.
August 14, 2023
Moreh trained the largest-ever Korean LLM, with 221B parameters, on the MoAI platform and a 1,200-GPU AMD MI250 cluster system.