
Moreh vLLM Performance Evaluation

  • Moreh vLLM Performance Evaluation: Llama 3.3 70B on AMD Instinct MI300X GPUs

    August 30, 2025

    Moreh vLLM achieves 1.68x higher output TPS, 2.02x lower TTFT, and 1.59x lower TPOT compared to the original vLLM for Meta's Llama 3.3 70B model.

  • Moreh vLLM Performance Evaluation: DeepSeek V3/R1 671B on AMD Instinct MI300X GPUs

    August 29, 2025

    Moreh vLLM achieves 1.68x higher output TPS, 1.75x lower TTFT, and 1.70x lower TPOT compared to the original vLLM for the DeepSeek V3/R1 671B model.
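Both posts report the same three serving metrics: output TPS (throughput), TTFT (time to first token), and TPOT (time per output token). As a rough sketch of how these are commonly derived from per-request token timestamps in LLM serving benchmarks — the names and structure below are illustrative, not Moreh's or vLLM's actual benchmark code:

```python
# Hypothetical sketch of standard LLM serving metrics (TTFT, TPOT, output TPS),
# computed from per-request output-token arrival timestamps.
from dataclasses import dataclass


@dataclass
class RequestTrace:
    start: float              # time the request was submitted (seconds)
    token_times: list[float]  # arrival time of each output token (seconds)


def ttft(trace: RequestTrace) -> float:
    """Time to first token: delay until the first output token arrives."""
    return trace.token_times[0] - trace.start


def tpot(trace: RequestTrace) -> float:
    """Time per output token: mean gap between consecutive output tokens."""
    gaps = [b - a for a, b in zip(trace.token_times, trace.token_times[1:])]
    return sum(gaps) / len(gaps)


def output_tps(traces: list[RequestTrace], wall_time: float) -> float:
    """Aggregate output throughput: total generated tokens per wall-clock second."""
    total_tokens = sum(len(t.token_times) for t in traces)
    return total_tokens / wall_time
```

Under these definitions, "1.68x higher output TPS" means total generated tokens per second across all concurrent requests, while "lower TTFT" and "lower TPOT" translate directly into lower perceived latency for interactive use.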


© 2026 Moreh, Inc. All rights reserved.
