Resources

Blog

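Technical Report | March 18, 2026

Cross-Vendor Disaggregated Inference: Running GPT-OSS-120B on NVIDIA H100 and AMD MI300X GPUs

MoAI Inference Framework enables cross-vendor disaggregation, using H100 GPUs for prefill and MI300X GPUs for decode. Compared with single-vendor clusters, it reduces latency by up to 43% and increases throughput by up to 67%.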

Technical Report | March 17, 2026

Multi-Node Disaggregated Inference: Running DeepSeek R1 671B on AMD Instinct MI300X GPUs

Benchmarks of prefill-decode disaggregation for DeepSeek R1 671B with MoAI Inference Framework on a 5-node AMD Instinct MI300X cluster show up to 1.84x better end-to-end latency and 23.85x lower P99 inter-token latency.
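P99 inter-token latency (ITL) is the 99th percentile of the gaps between consecutive output tokens. The minimal Python sketch below shows one common way to compute it. It assumes each request is a list of token arrival timestamps, and it pools the gaps from all requests before taking a nearest-rank percentile. The function names are hypothetical, and the report's own measurement method may differ.

```python
import math

def inter_token_latencies(token_times):
    """Gaps (s) between consecutive output-token arrival times of one request."""
    return [later - earlier for earlier, later in zip(token_times, token_times[1:])]

def percentile_nearest_rank(values, p):
    """Smallest sample such that at least p percent of samples are <= it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def p99_itl(requests):
    """Pool the inter-token gaps of all requests, then take the 99th percentile."""
    gaps = []
    for token_times in requests:
        gaps.extend(inter_token_latencies(token_times))
    return percentile_nearest_rank(gaps, 99)
```

With `p99_itl([[0.0, 0.02, 0.05], [0.0, 0.30]])`, the pooled gaps are roughly 0.02, 0.03, and 0.30 s, so the single 0.30 s stall determines P99. This tail sensitivity is one reason tail ITL can improve much more than average latency when long prefills no longer interrupt decode steps.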

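Blog | March 16, 2026

Moreh Unlocks the Potential of AMD MI300X: DeepSeek R1 Inference 1.5x Faster than SGLang (InferenceMAX)

We ran the InferenceMAX benchmark with our in-house optimized inference engine and achieved a 1.47x improvement (geometric mean) in end-to-end latency and per-GPU throughput on the same AMD MI300X hardware. The result shows that software optimization is key to unlocking the full potential of AMD GPUs.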
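A geometric mean summarizes speedups across benchmark configurations, such as different input/output lengths or concurrency levels, without letting one large ratio dominate. It also treats a 2x speedup and a 2x slowdown symmetrically. Here is a minimal Python sketch. The ratios are made up and are not the InferenceMAX results.

```python
import math

def geometric_mean(ratios):
    """Geometric mean of positive speedup ratios, computed in log space for stability."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

`geometric_mean([2.0, 0.5])` returns 1.0 (up to floating-point rounding), whereas the arithmetic mean of the same two ratios is 1.25. Python 3.8 and later also provide `statistics.geometric_mean` in the standard library.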

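Technical Report | February 5, 2026

TIDE: Temporal Incremental Draft Engine for Self-Improving LLM Inference

TIDE continuously improves inference speed by training lightweight draft models in the background on idle GPUs in the cluster. It requires no additional data preparation and no downtime.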

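Technical Report | January 30, 2026

HetCCL: Accelerating LLM Training with Heterogeneous GPUs

Introducing HetCCL, the first cross-vendor collective communication library. It enables RDMA-based collective communication between NVIDIA and AMD GPUs without driver modifications.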
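Collective communication libraries such as NCCL and RCCL commonly implement all-reduce with a ring algorithm. It has two phases, a reduce-scatter followed by an all-gather, and each phase takes n-1 steps over n ranks. The single-process Python simulation below illustrates that generic algorithm, with each chunk reduced to a single number. It is not HetCCL's implementation, and it does not model RDMA transport or vendor differences.

```python
def ring_allreduce(buffers):
    """Simulate a sum all-reduce over n ranks arranged in a ring.

    buffers[r] is rank r's data, split into n chunks (one number per chunk here).
    Returns the per-rank buffers after the all-reduce; every rank ends up holding
    the element-wise sum over all ranks.
    """
    n = len(buffers)
    if any(len(b) != n for b in buffers):
        raise ValueError("each rank's buffer must have exactly n chunks")
    data = [list(b) for b in buffers]  # copy so the caller's lists are untouched

    # Phase 1, reduce-scatter: at step s, rank r sends chunk (r - s) % n to rank r + 1,
    # which adds it. Afterwards rank r holds the full sum of chunk (r + 1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n) for r in range(n)]
        payloads = [data[r][c] for r, c in sends]  # snapshot before any receive
        for (r, c), value in zip(sends, payloads):
            data[(r + 1) % n][c] += value

    # Phase 2, all-gather: at step s, rank r forwards chunk (r + 1 - s) % n, which it
    # already holds fully reduced, and the receiver overwrites its copy.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [data[r][c] for r, c in sends]
        for (r, c), value in zip(sends, payloads):
            data[(r + 1) % n][c] = value
    return data
```

In total, each rank sends about 2(n-1)/n of its buffer, which is why the ring algorithm stays bandwidth-efficient as the number of ranks grows.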

Customer Case | December 29, 2025

Step3 Inference Optimization: 1.30x Higher Decode Throughput on AMD Instinct MI308X than on NVIDIA H20

Moreh vLLM was optimized for StepFun Step3 (321B MoE) on AMD Instinct MI308X using custom HIP attention kernels, CUDA graphs, and mixed-precision quantization. It reaches a decode throughput of 4,082 tok/s, 1.30x higher than on NVIDIA H20.

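Blog | December 26, 2025

Optimizing Long-Context Prefill on Multiple Older-Generation GPU Nodes

SLOPE is a dedicated prefill engine that applies context parallelism (Ulysses + Ring Attention) across multi-node GPU clusters to optimize long-context inputs against SLOs.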
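In Ring Attention, the sequence is split across devices. Each device keeps its query block while key/value (KV) blocks are passed around a ring, and each device merges partial attention results using online-softmax rescaling. The Python sketch below simulates that merge in one process for a single query vector. It does not model communication, the Ulysses all-to-all, or causal masking, and it is not SLOPE's implementation.

```python
import math

def attention_full(q, keys, values):
    """Reference softmax attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]

def attention_ring(q, kv_blocks):
    """Blockwise attention: visit one (keys, values) block per ring step and merge
    partial results with a running max and running denominator (online softmax)."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(kv_blocks[0][1][0])
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * dim  # un-normalized weighted sum of values
    for keys, values in kv_blocks:  # in ring attention, each block arrives from a neighbor
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        new_max = max(running_max, max(scores))
        correction = math.exp(running_max - new_max)  # rescale earlier partial sums
        denom *= correction
        acc = [x * correction for x in acc]
        for s, v in zip(scores, values):
            w = math.exp(s - new_max)
            denom += w
            acc = [x + w * vd for x, vd in zip(acc, v)]
        running_max = new_max
    return [x / denom for x in acc]
```

Because the merge needs only the running max, the denominator, and the accumulator, a device can process KV blocks in whatever order they arrive around the ring and still match full attention.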

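Customer Case | November 25, 2025

LLM Inference Optimization for a Telecom Operator: 1.38x Higher Serving Capacity on AMD MI300X

Moreh optimized a 7.8B LLM, developed by an affiliate of a Korean telecom operator, for AMD MI300X, achieving 1.38x the SLO-compliant serving capacity of NVIDIA H100.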

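Technical Report | November 18, 2025

System Architecture of the Moreh-Tenstorrent AI Data Center Solution

Moreh combines Tenstorrent's lightweight, scalable hardware with our proprietary software stack to provide an efficient, flexible solution for large-scale AI data centers.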

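Technical Report | November 13, 2025

21K Output Tokens per Second for DeepSeek Inference with Expert Parallelism on AMD Instinct MI300X GPUs

Moreh, an AMD software partner, implemented expert parallelism on the ROCm software stack and achieved a DeepSeek-R1 decode throughput of more than 21,000 tokens/sec on a server with 8x AMD Instinct MI300X GPUs.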
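In expert parallelism, the experts of a mixture-of-experts (MoE) layer are spread across GPUs. Each token is sent, typically through an all-to-all exchange, to the GPUs that host the experts its router selects. The Python sketch below covers only that routing bookkeeping, under several simplifying assumptions. Experts are placed contiguously and evenly across ranks, and `dispatch_tokens` is a hypothetical name. Gating weights, load balancing, shared experts, and the actual communication are omitted. It is not Moreh's ROCm implementation.

```python
from collections import defaultdict

def dispatch_tokens(router_logits, top_k, num_ranks):
    """Group (token, expert) assignments by the GPU rank that hosts each expert.

    router_logits[t][e] is the router score of token t for expert e. Experts are
    assumed (hypothetically) to be split evenly and contiguously across ranks.
    """
    num_experts = len(router_logits[0])
    if num_experts % num_ranks:
        raise ValueError("num_experts must be divisible by num_ranks")
    experts_per_rank = num_experts // num_ranks
    per_rank = defaultdict(list)
    for t, scores in enumerate(router_logits):
        # Pick this token's top-k experts by router score.
        top = sorted(range(num_experts), key=lambda e: scores[e], reverse=True)[:top_k]
        for e in top:
            per_rank[e // experts_per_rank].append((t, e))
    return dict(per_rank)
```

In a real deployment, the per-rank lists determine how many hidden states each GPU sends to every other GPU in the dispatch all-to-all. A matching combine step then returns the expert outputs to the tokens' home GPUs.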

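Blog | November 10, 2025

Runtime Draft Model Training: Adapting Speculative Decoding to Real-World Workloads

TIDE automatically improves speculative decoding performance through runtime draft model training. On Korean conversational workloads, it achieves 1.14x to 1.35x higher output token throughput than a static pretrained draft model.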
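A simple model helps show why adapting the draft model to the live workload can pay off. Assume each of γ draft tokens is accepted independently with probability α. Under that assumption, the analysis in Leviathan et al.'s speculative decoding paper gives an expected (1 - α^(γ+1)) / (1 - α) tokens per target-model forward pass. The minimal Python sketch below evaluates that formula for hypothetical α values.

```python
def expected_tokens_per_step(alpha, gamma):
    """Expected tokens produced per target-model forward pass in speculative decoding,
    assuming each of `gamma` draft tokens is accepted independently with prob. `alpha`."""
    if not 0.0 <= alpha <= 1.0 or gamma < 0:
        raise ValueError("need 0 <= alpha <= 1 and gamma >= 0")
    if alpha == 1.0:
        return gamma + 1.0  # every draft token accepted, plus one token from the target
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)
```

For γ = 4, raising α from 0.6 to 0.8 lifts the expected tokens per step from about 2.31 to about 3.36. Actual throughput gains also depend on the cost of drafting and verification, so they need not track this ratio.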

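Blog | September 23, 2025

Distributed Inference on Heterogeneous Accelerators Including GPUs, Rubin CPX, and AI Accelerators

Learn why distributed inference has become a core challenge for AI data centers, and how MoAI Inference Framework automates distributed inference across heterogeneous accelerators, including GPUs, Rubin CPX, and AI accelerators.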

Technical Report | August 30, 2025

Moreh vLLM Performance Evaluation: Llama 3.3 70B on AMD Instinct MI300X GPUs

For Meta's Llama 3.3 70B model, Moreh vLLM delivers 1.68x higher output TPS, 2.02x lower TTFT, and 1.59x lower TPOT than vanilla vLLM.
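TTFT (time to first token), TPOT (time per output token), and output TPS (output tokens per second) are standard LLM serving metrics. Lower TTFT and TPOT are better, and higher output TPS is better. The Python sketch below uses common definitions and assumes per-request timestamps in seconds. `RequestTiming` and the helper names are hypothetical, and the report's exact measurement method may differ.

```python
from dataclasses import dataclass

@dataclass
class RequestTiming:
    start: float        # request submitted (s)
    first_token: float  # first output token received (s)
    end: float          # last output token received (s)
    output_tokens: int

def ttft(r):
    """Time to first token: from submission to the first output token."""
    return r.first_token - r.start

def tpot(r):
    """Average time per output token after the first one."""
    if r.output_tokens < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (r.end - r.first_token) / (r.output_tokens - 1)

def output_tps(requests):
    """Total output tokens divided by the wall-clock span of the whole run."""
    span = max(r.end for r in requests) - min(r.start for r in requests)
    return sum(r.output_tokens for r in requests) / span
```

Excluding the first token from TPOT keeps prefill time, which TTFT already captures, out of the per-token decode figure.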

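Technical Report | August 29, 2025

Moreh vLLM Performance Evaluation on AMD Instinct MI300X GPUs: DeepSeek V3/R1 671B

Moreh vLLM optimizes inference for the DeepSeek V3/R1 671B model on AMD MI300X GPUs. Compared with vanilla vLLM, it delivers 1.68x higher average throughput and up to 1.75x lower latency.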

Blog | February 20, 2025

DeepSeek V3 and R1 on MoAI: 1. Fine-Tuning on AMD GPU Clusters

MoAI provides a PyTorch-compatible environment that makes it straightforward to fine-tune LLMs on hundreds of AMD GPUs, including the DeepSeek 671B MoE model.

Blog | December 2, 2024

Introducing Motif: A High-Performance Open-Source Korean LLM by Moreh

Moreh announces the release of Motif, a high-performance 102B-parameter Korean large language model (LLM) that will be made available as open source.

Blog | September 3, 2024

Fine-tuning Llama 3.1 405B on AMD GPUs

There are no barriers to fine-tuning Llama 3.1 405B on the MoAI platform. The Moreh team has demonstrated fine-tuning the model on 192 AMD GPUs.

Blog | August 19, 2024

GPU Virtualization in the MoAI Platform

The MoAI platform provides comprehensive GPU virtualization including fine-grained resource allocation, multi-GPU scaling, and heterogeneous GPU support.

Blog | August 14, 2023

Training a 221B-Parameter Korean LLM on a 1,200-GPU AMD MI250 Cluster

Moreh trained the largest-ever Korean LLM, with 221B parameters, on the MoAI platform and a 1,200-GPU AMD MI250 cluster.

Blog | November 11, 2022

KT’s Success Stories in AI Cloud Service and Large AI Model Training on AMD Instinct MI250 and the Moreh AI Platform

KT has collaborated with Moreh and AMD to overcome the challenges in public cloud services and in-house AI model development.