CoreWeave Delivers Leading Inference Performance in MLPerf® Benchmark

Key Terms

MLPerf (technical)
A standardized set of performance tests for machine learning systems that measures how fast and efficiently hardware and software can train and run AI models. Like a car mileage test for computers, MLPerf lets investors compare different vendors on speed, energy use and cost for AI workloads, helping predict which technologies will be more competitive, scalable and profitable as demand for AI grows.
Inference (technical)
Inference is the stage where a trained AI model is put to work, producing answers or predictions from new inputs rather than being trained. For investors it matters because inference runs every time a customer actually uses an AI product, so its speed and cost largely determine the economics, scalability and competitiveness of AI services as they move into production.
Mixture-of-experts (technical)
A mixture-of-experts is a computer model design that uses a group of smaller specialist models, with a controller that picks which specialist(s) handle each task—like sending a question to the right expert on a team. For investors, this matters because it can deliver faster, cheaper, or more accurate AI features without building one giant model, affecting a company’s product performance, costs, competitive edge, and regulatory or safety risks tied to how the system is managed.
Tokens per second (technical)
Tokens per second measures how many discrete pieces of data — such as words or data units used by an algorithm, or requests handled by a system — are processed or produced each second. For investors it signals the speed and capacity of a technology: higher rates mean faster analysis, lower delays in trading or real‑time services, and better scalability, much like more cars moving per minute on a highway improves traffic flow.
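
For readers who want to see the routing idea above in concrete terms, the short Python sketch below is a minimal, hypothetical illustration of a mixture-of-experts controller picking specialists for each request. It is not DeepSeek-R1's or CoreWeave's implementation, and the expert scores are random placeholders.

    # Toy mixture-of-experts routing: score every expert for a request and
    # send the work only to the top-scoring specialists. Real systems use a
    # learned gating network and many experts per layer; the scores here are
    # random placeholders for illustration.
    import random

    def route(request, num_experts=8, top_k=2):
        scores = [random.random() for _ in range(num_experts)]
        return sorted(range(num_experts), key=lambda e: scores[e], reverse=True)[:top_k]

    def answer(request):
        # Only the selected experts run, so compute cost stays close to that of
        # a much smaller model even though total capacity is large.
        chosen = route(request)
        return " | ".join(f"expert {e} handles: {request!r}" for e in chosen)

    print(answer("summarize this earnings report"))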

Latest submissions featuring NVIDIA Grace Blackwell architectures demonstrate how CoreWeave’s purpose-built AI infrastructure translates raw compute into industry-leading inference performance

LIVINGSTON, N.J.--(BUSINESS WIRE)-- CoreWeave, Inc. (Nasdaq: CRWV), The Essential Cloud for AI™, today announced landmark results in the MLPerf® Inference v6.0 benchmark suite. Participating in the Datacenter Closed division, CoreWeave leveraged NVIDIA’s newest AI infrastructure, the NVIDIA GB200 NVL72 and NVIDIA GB300 NVL72.

CoreWeave leads MLPerf v6.0, doubling performance and delivering top results.

The AI industry is undergoing a fundamental shift, with inference as its new focal point. As enterprises move AI from experimentation into production and agentic workloads become the new standard, inference has emerged as the critical measure of performance. At the same time, demand for inference is growing faster than the underlying hardware can be deployed, and the gap between theoretical system performance and real-world output has become a defining constraint on how quickly AI companies can grow. CoreWeave's MLPerf v6.0 results reflect the company's continued investment in full-stack optimization, consistently turning cutting-edge hardware into real-world inference performance.

"Inference is the defining layer in AI. It's where models are actually put to work and where performance in production shows up. Benchmarks like MLPerf help measure how theoretical performance translates into real-world output," said Peter Salanki, co-founder and chief technology officer of CoreWeave. "These latest results reflect our ability to deliver exceptional performance for the most demanding frontier reasoning models at scale through full-stack optimization. That's why customers rely on CoreWeave to launch, scale, and operate AI workloads in production, where real-world value is created and where it matters most."

CoreWeave’s v6.0 submissions reflected NVIDIA’s reference configurations as a verified, production-ready baseline across two of the most demanding reasoning models available: DeepSeek-R1 and GPT-OSS-120B. Key results include:

  • Continued NVIDIA GB200 NVL72 Leadership: Led performance for DeepSeek-R1 in server and offline modes in tokens per second per GPU [1] (see the illustrative calculation after this list). The GB200 NVL72 configuration demonstrated standout throughput on DeepSeek-R1’s sparse Mixture-of-Experts architecture, where efficient serving requires dynamic expert routing and high-bandwidth internode communication.
  • NVIDIA GB300 NVL72 Portfolio Leadership: Delivered high server throughput in tokens per second per GPU and leading per-GPU efficiency in the portfolio on DeepSeek-R1, doubling CoreWeave’s own MLPerf® v5.1 results on the same hardware footprint [2].
  • Innovation at Speed: Today, eight of the 10 leading model providers rely on CoreWeave Cloud, enabling customers to innovate at speed.
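
As an illustration of how the tokens-per-second-per-GPU normalization referenced above works, dividing a system's total throughput by its GPU count puts submissions built on different numbers of GPUs on a comparable per-accelerator basis. The Python sketch below uses hypothetical figures, not MLPerf data.

    # Hypothetical figures for illustration only; official results are published
    # at mlcommons.org/benchmarks/inference.
    def tokens_per_second_per_gpu(total_tokens_per_second, gpu_count):
        # Normalize system-level throughput to a per-GPU figure so submissions
        # with different GPU counts can be compared.
        return total_tokens_per_second / gpu_count

    print(tokens_per_second_per_gpu(72_000, 72))  # 1000.0 tokens/s per GPU
    print(tokens_per_second_per_gpu(16_000, 8))   # 2000.0 tokens/s per GPU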

"The gap between benchmark performance and production reality has been one of the most persistent challenges in AI,” said Nick Patience, vice president & practice lead, AI platforms at Futurum Research. “CoreWeave's MLPerf v6.0 results, particularly on DeepSeek-R1, demonstrate the company is closing that gap through disciplined, full-stack optimization, which is exactly what enterprises and AI labs need as inference workloads move from experimental to mission-critical."

CoreWeave’s MLPerf v6.0 results provide additional validation for the only AI cloud to earn the top Platinum ranking in both SemiAnalysis ClusterMAX™ 1.0 and 2.0, which evaluate AI cloud performance, efficiency and reliability. These benchmark results reflect CoreWeave’s platform strategy: delivering infrastructure purpose-built for the demands of production AI, from high-performance compute through the software layer that builders depend on to develop, test, and deploy at scale.

About CoreWeave
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to move at the pace of innovation, building and scaling AI with confidence. Established in 2017, CoreWeave completed its public listing on Nasdaq (CRWV) in March 2025. Learn more at www.coreweave.com.

[1] CoreWeave MLPerf 6.0-0022, server and offline modes. TPS/GPU is not an official MLPerf metric; it is used in this article to normalize submissions that use different numbers of GPUs.

[2] Verified MLPerf score of v5.1 Inference Closed DeepSeek R1 server. Retrieved from https://mlcommons.org/benchmarks/inference, 2 April 2025, entry 5.1-0097. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

press@coreweave.com

Source: CoreWeave, Inc.