STOCK TITAN

Industry's First-to-Market Supermicro NVIDIA HGX™ B200 Systems Demonstrate AI Performance Leadership on MLPerf® Inference v5.0 Results


Super Micro Computer (SMCI) has achieved industry-leading performance in MLPerf Inference v5.0 benchmarks using NVIDIA HGX B200 8-GPU systems. The company's 4U liquid-cooled and 10U air-cooled systems delivered more than 3x the token generation per second of H200 8-GPU systems on the Llama2-70B and Llama3.1-405B benchmarks.

Key performance highlights include:

  • 129,000 tokens/second for Mixtral 8x7B Inference
  • Over 1,000 tokens/second for Llama3.1-405b model
  • 62,265.70 Tokens/s for llama2-70b-interactive-99

The company offers over 100 GPU-optimized systems with both cooling options. The new liquid-cooling technology features enhanced cold plates and a 250kW coolant distribution unit, doubling the previous generation's cooling capacity. The air-cooled 10U system accommodates eight 1000W TDP Blackwell GPUs, delivering up to 15x the inference and 3x the training performance of the previous generation.

Positive
  • Achieved 3x performance improvement in token generation compared to previous generation
  • First-to-market with both air-cooled and liquid-cooled NVIDIA HGX B200 systems
  • Doubled cooling capacity with new technology
  • 15x inference and 3x training performance improvement in air-cooled systems
Negative
  • None.

Insights

Supermicro's announcement marks a significant competitive advantage in the AI infrastructure market with their first-to-market NVIDIA HGX B200 systems. The 3x performance increase in token generation compared to previous H200-based systems represents a substantial leap forward for large language model inference workloads.

What's particularly impressive is Supermicro's dual approach offering both liquid-cooled and air-cooled solutions that achieved top benchmark positions. This demonstrates exceptional thermal engineering capabilities, especially considering the 1000W TDP of each Blackwell GPU. Their innovations in cooling technology—with newly developed cold plates and 250kW coolant distribution units—effectively address the critical power density challenges that have become bottlenecks in AI data centers.

The MLPerf benchmark results provide credible validation of Supermicro's performance claims, with standout results including 129,047 tokens/second for Mixtral-8x7b and 1,521 tokens/second for the massive Llama3.1-405b model. These metrics suggest Supermicro's systems will be particularly compelling for enterprises running inference workloads at scale.

Supermicro's building block architecture has enabled their rapid time-to-market—a crucial advantage in the fast-moving AI hardware space where being first with next-generation technology typically translates to premium pricing opportunities and customer mindshare. The fact that they're already delivering these systems to customers while conducting benchmarks indicates strong production readiness and supply chain execution.

The performance leadership Supermicro has demonstrated with their B200-based systems represents more than just incremental improvement—it's a step-change that could reshape AI infrastructure economics. When inference performance increases by 3x, organizations can achieve the same workload throughput with fewer systems, potentially reducing total cost of ownership despite higher upfront hardware costs.
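The fleet-size arithmetic behind that claim can be sketched in a few lines. This is a back-of-the-envelope illustration only: the target throughput and per-system rate below are hypothetical placeholders, not Supermicro or MLPerf figures; the only number taken from the release is the >3x speedup.

```python
import math

# Back-of-the-envelope fleet sizing: if per-system inference throughput
# triples, a fixed aggregate workload needs roughly a third of the systems.
# The target and previous-generation rate are illustrative assumptions.

def systems_needed(target_tokens_per_s: float, per_system_tokens_per_s: float) -> int:
    """Smallest whole number of systems that meets the target throughput."""
    return math.ceil(target_tokens_per_s / per_system_tokens_per_s)

target = 1_000_000                 # aggregate tokens/s the service must sustain (assumed)
prev_gen_rate = 20_000             # tokens/s per previous-generation system (assumed)
b200_rate = 3 * prev_gen_rate      # the release cites >3x token generation per second

print(systems_needed(target, prev_gen_rate))  # previous generation: 50 systems
print(systems_needed(target, b200_rate))      # B200 generation: 17 systems
```

Under these assumptions the same workload runs on roughly a third of the nodes, which is the mechanism by which higher per-system throughput can offset higher per-system price.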

Supermicro's engineering achievement is particularly notable in the context of the world's largest language models. Their systems achieved 1,080 tokens/second on Llama3.1-405b (server) benchmarks—models of this scale were previously impractical for real-time inference. This capability opens new possibilities for enterprise AI applications requiring both massive parameter counts and responsive user experiences.

The dual cooling approach (air and liquid) shows strategic market awareness. While liquid cooling delivers maximum performance density, many enterprises still prefer air cooling for simplicity and compatibility with existing infrastructure. By optimizing both solutions to perform comparably "within operating margin," Supermicro addresses the full spectrum of deployment environments.

Their rack-scale redesign with vertical coolant distribution manifolds that no longer consume valuable rack units demonstrates sophisticated system-level thinking. This allows packing up to 96 NVIDIA Blackwell GPUs in a 52U rack—extraordinary compute density that addresses data center space constraints while maximizing AI processing capability per square foot. For organizations building or expanding AI infrastructure, this density advantage translates to tangible real estate and operational savings.

Latest Benchmarks Show Supermicro Systems with the NVIDIA B200 Outperformed the Previous Generation of Systems with 3X the Token Generation Per Second

SAN JOSE, Calif., April 3, 2025 /PRNewswire/ -- Super Micro Computer, Inc. (SMCI), a Total IT Solution Provider for AI/ML, HPC, Cloud, Storage, and 5G/Edge, is announcing first-to-market, industry-leading performance on several MLPerf Inference v5.0 benchmarks, using the NVIDIA HGX™ B200 8-GPU. The 4U liquid-cooled and 10U air-cooled systems achieved the best performance in select benchmarks. Supermicro demonstrated more than 3 times the tokens per second (Token/s) generation for the Llama2-70B and Llama3.1-405B benchmarks compared to H200 8-GPU systems.

"Supermicro remains a leader in the AI industry, as evidenced by the first new benchmarks released by MLCommons in 2025," said Charles Liang, president and CEO of Supermicro. "Our building block architecture enables us to be first-to-market with a diverse range of systems optimized for various workloads. We continue to collaborate closely with NVIDIA to fine-tune our systems and secure a leadership position in AI workloads."

Learn more about the new MLPerf v5.0 Inference benchmarks at: https://mlcommons.org/benchmarks/inference-datacenter/

Supermicro is the only system vendor publishing record MLPerf inference performance (on select benchmarks) for both the air-cooled and liquid-cooled NVIDIA HGX™ B200 8-GPU systems. Both air-cooled and liquid-cooled systems were operational before the MLCommons benchmark start date, and Supermicro engineers optimized the systems and software to showcase the impressive performance. Within the operating margin, the Supermicro air-cooled B200 system exhibited the same level of performance as the liquid-cooled B200 system. Supermicro was already delivering these systems to customers while the benchmarks were being conducted.

MLCommons requires that all results be reproducible, that the products be available, and that the results can be audited by other MLCommons members. Supermicro engineers optimized the systems and software as allowed by the MLCommons rules.

The SYS-421GE-NBRT-LCC (8x NVIDIA B200-SXM-180GB) and SYS-A21GE-NBRT (8x NVIDIA B200-SXM-180GB) showed performance leadership on the Mixtral 8x7B (Mixture of Experts) inference benchmark at 129,000 tokens/second. The Supermicro air-cooled and liquid-cooled NVIDIA B200-based systems delivered over 1,000 tokens/second of inference for the large Llama3.1-405b model, where previous generations of GPU systems delivered far lower throughput. For smaller inference tasks, on the Llama2-70B benchmark, a Supermicro system with the NVIDIA B200-SXM-180GB installed shows the highest performance from a Tier 1 system supplier.

Specifically:

  • Stable Diffusion XL (Server): SYS-A21GE-NBRT (8x B200-SXM-180GB), #1 at 28.92 queries/s
  • llama2-70b-interactive-99 (Server): SYS-A21GE-NBRT (8x B200-SXM-180GB), #1 at 62,265.70 Tokens/s
  • Llama3.1-405b (Offline): SYS-421GE-NBRT-LCC (8x B200-SXM-180GB), #1 at 1,521.74 Tokens/s
  • Llama3.1-405b (Server): SYS-A21GE-NBRT (8x B200-SXM-180GB), #1 at 1,080.31 Tokens/s (for an 8-GPU node)
  • mixtral-8x7b (Server): SYS-421GE-NBRT-LCC (8x B200-SXM-180GB), #1 at 129,047.00 Tokens/s
  • mixtral-8x7b (Offline): SYS-421GE-NBRT-LCC (8x B200-SXM-180GB), #1 at 128,795.00 Tokens/s
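For a sense of scale, the 8-GPU node figures above can be normalized to per-GPU throughput. This is a simple division using the quoted results, not an official MLPerf metric:

```python
# Normalize the quoted 8-GPU system results to per-GPU throughput.
# Totals are the MLPerf v5.0 figures cited above; dividing by 8 is a
# plain normalization, not a metric MLPerf itself reports.

results_tokens_per_s = {
    "llama2-70b-interactive-99 (Server)": 62_265.70,
    "mixtral-8x7b (Server)": 129_047.00,
    "Llama3.1-405b (Offline)": 1_521.74,
}
GPUS_PER_NODE = 8

per_gpu = {name: total / GPUS_PER_NODE for name, total in results_tokens_per_s.items()}
for name, rate in per_gpu.items():
    print(f"{name}: {rate:,.2f} tokens/s per GPU")
```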

"MLCommons congratulates Supermicro on their submission to the MLPerf Inference v5.0 benchmark. We are pleased to see their results showcasing significant performance gains compared to earlier generations of systems," said David Kanter, Head of MLPerf at MLCommons. "Customers will be pleased by the performance improvements achieved which are validated by the neutral, representative and reproducible MLPerf results."

Supermicro offers a comprehensive AI portfolio with over 100 GPU-optimized systems, both air-cooled and liquid-cooled options, with a choice of CPUs, ranging from single-socket optimized systems to 8-way multiprocessor systems. Supermicro rack-scale systems include computing, storage, and network components, which reduce the time required to install them once they are delivered to a customer site.

Supermicro's NVIDIA HGX B200 8-GPU systems utilize next-generation liquid-cooling and air-cooling technology. The newly developed cold plates and the new 250kW coolant distribution unit (CDU) more than double the cooling capacity of the previous generation in the same 4U form factor. In the rack-scale design, available in 42U, 48U, or 52U configurations, the new vertical coolant distribution manifolds (CDM) no longer occupy valuable rack units. This enables eight systems, comprising 64 NVIDIA Blackwell GPUs, in a 42U rack, and up to 12 systems with 96 NVIDIA Blackwell GPUs in a 52U rack.
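The rack-level GPU counts follow directly from 8 GPUs per HGX B200 system; a quick sanity check using only figures from the release:

```python
# Sanity-check the rack-level GPU counts stated in the release:
# 8 GPUs per HGX B200 system, 8 systems per 42U rack, 12 per 52U rack.

GPUS_PER_SYSTEM = 8

def gpus_per_rack(systems_per_rack: int) -> int:
    return systems_per_rack * GPUS_PER_SYSTEM

print(gpus_per_rack(8))   # 42U rack: 64 GPUs
print(gpus_per_rack(12))  # 52U rack: 96 GPUs
```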

The new air-cooled 10U NVIDIA HGX B200 system features a redesigned chassis with expanded thermal headroom to accommodate eight 1000W TDP Blackwell GPUs. Up to 4 of the new 10U air-cooled systems can be installed and fully integrated in a rack, the same density as the previous generation, while providing up to 15x inference and 3x training performance.

About Super Micro Computer, Inc.

Supermicro (NASDAQ: SMCI) is a global leader in Application-Optimized Total IT Solutions. Founded and operating in San Jose, California, Supermicro is committed to delivering first-to-market innovation for Enterprise, Cloud, AI, and 5G Telco/Edge IT Infrastructure. We are a Total IT Solutions provider with server, AI, storage, IoT, switch systems, software, and support services. Supermicro's motherboard, power, and chassis design expertise further strengthens our development and production, enabling next-generation innovation from cloud to edge for our global customers. Our products are designed and manufactured in-house (in the US, Taiwan, and the Netherlands), leveraging global operations for scale and efficiency and optimized to improve TCO and reduce environmental impact (Green Computing). The award-winning portfolio of Server Building Block Solutions® allows customers to optimize for their exact workload and application by selecting from a broad family of systems built from our flexible and reusable building blocks that support a comprehensive set of form factors, processors, memory, GPUs, storage, networking, power, and cooling solutions (air-conditioned, free air cooling, or liquid cooling).

Supermicro, Server Building Block Solutions, and We Keep IT Green are trademarks and/or registered trademarks of Super Micro Computer, Inc.

All other brands, names, and trademarks are the property of their respective owners.


View original content to download multimedia: https://www.prnewswire.com/news-releases/industrys-first-to-market-supermicro-nvidia-hgx-b200-systems-demonstrate-ai-performance-leadership-on-mlperf-inference-v5-0-results-302419115.html

SOURCE Super Micro Computer, Inc.

FAQ

What performance improvements did SMCI achieve in MLPerf Inference v5.0 benchmarks?

SMCI systems achieved 3x higher token generation per second compared to H200 8-GPU systems, with 129,000 tokens/second for Mixtral 8x7B and over 1,000 tokens/second for Llama3.1-405b model.

How many NVIDIA Blackwell GPUs can SMCI's new rack configurations support?

SMCI's racks can support up to 64 NVIDIA Blackwell GPUs in a 42U rack and 96 GPUs in a 52U rack configuration.

What cooling improvements has SMCI introduced for the NVIDIA HGX B200 systems?

SMCI introduced new cold plates and a 250kW coolant distribution unit that doubles the cooling capacity of the previous generation in the same 4U form factor.

What is the performance capability of SMCI's new 10U air-cooled system?

The 10U air-cooled system supports eight 1000W TDP Blackwell GPUs and delivers up to 15x inference and 3x training performance improvement.