STOCK TITAN

Industry's First-to-Market Supermicro NVIDIA HGX™ B200 Systems Demonstrate AI Performance Leadership on MLPerf® Inference v5.0 Results


Super Micro Computer (SMCI) has achieved industry-leading performance in MLPerf Inference v5.0 benchmarks using NVIDIA HGX B200 8-GPU systems. The company's 4U liquid-cooled and 10U air-cooled systems delivered more than 3x the token generation per second of H200 8-GPU systems on the Llama2-70B and Llama3.1-405B benchmarks.

Key performance highlights include:

  • 129,000 tokens/second for Mixtral 8x7B Inference
  • Over 1,000 tokens/second for Llama3.1-405b model
  • 62,265.70 Tokens/s for llama2-70b-interactive-99

The company offers over 100 GPU-optimized systems with both cooling options. The new liquid-cooling technology features enhanced cold plates and a 250kW coolant distribution unit, doubling the previous generation's cooling capacity. The air-cooled 10U system accommodates eight 1000W TDP Blackwell GPUs, delivering up to 15x the inference and 3x the training performance of the previous generation.

Positive
  • Achieved 3x performance improvement in token generation compared to previous generation
  • First-to-market with both air-cooled and liquid-cooled NVIDIA HGX B200 systems
  • Doubled cooling capacity with new technology
  • 15x inference and 3x training performance improvement in air-cooled systems
Negative
  • None.

Insights

Supermicro's announcement marks a significant competitive advantage in the AI infrastructure market with their first-to-market NVIDIA HGX B200 systems. The 3x performance increase in token generation compared to previous H200-based systems represents a substantial leap forward for large language model inference workloads.

What's particularly impressive is Supermicro's dual approach offering both liquid-cooled and air-cooled solutions that achieved top benchmark positions. This demonstrates exceptional thermal engineering capabilities, especially considering the 1000W TDP of each Blackwell GPU. Their innovations in cooling technology—with newly developed cold plates and 250kW coolant distribution units—effectively address the critical power density challenges that have become bottlenecks in AI data centers.

The MLPerf benchmark results provide credible validation of Supermicro's performance claims, with standout results including 129,047 tokens/second for Mixtral-8x7b and 1,521 tokens/second for the massive Llama3.1-405b model. These metrics suggest Supermicro's systems will be particularly compelling for enterprises running inference workloads at scale.

Supermicro's building block architecture has enabled their rapid time-to-market—a crucial advantage in the fast-moving AI hardware space where being first with next-generation technology typically translates to premium pricing opportunities and customer mindshare. The fact that they're already delivering these systems to customers while conducting benchmarks indicates strong production readiness and supply chain execution.

The performance leadership Supermicro has demonstrated with their B200-based systems represents more than just incremental improvement—it's a step-change that could reshape AI infrastructure economics. When inference performance increases by 3x, organizations can achieve the same workload throughput with fewer systems, potentially reducing total cost of ownership despite higher upfront hardware costs.
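The fleet-size arithmetic behind that claim can be sketched in a few lines. This is a back-of-the-envelope illustration only: the target throughput and per-system rate below are hypothetical placeholders, not Supermicro or MLPerf figures; the only number taken from the release is the >3x speedup.

```python
import math

# Back-of-the-envelope fleet sizing: if per-system inference throughput
# triples, a fixed aggregate workload needs roughly a third of the systems.
# The target and previous-generation rate are illustrative assumptions.

def systems_needed(target_tokens_per_s: float, per_system_tokens_per_s: float) -> int:
    """Smallest whole number of systems that meets the target throughput."""
    return math.ceil(target_tokens_per_s / per_system_tokens_per_s)

target = 1_000_000                 # aggregate tokens/s the service must sustain (assumed)
prev_gen_rate = 20_000             # tokens/s per previous-generation system (assumed)
b200_rate = 3 * prev_gen_rate      # the release cites >3x token generation per second

print(systems_needed(target, prev_gen_rate))  # previous generation: 50 systems
print(systems_needed(target, b200_rate))      # B200 generation: 17 systems
```

Under these assumptions the same workload runs on roughly a third of the nodes, which is the mechanism by which higher per-system throughput can offset higher per-system price.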

Supermicro's engineering achievement is particularly notable in the context of the world's largest language models. Their systems achieved 1,080 tokens/second on Llama3.1-405b (server) benchmarks—models of this scale were previously impractical for real-time inference. This capability opens new possibilities for enterprise AI applications requiring both massive parameter counts and responsive user experiences.

The dual cooling approach (air and liquid) shows strategic market awareness. While liquid cooling delivers maximum performance density, many enterprises still prefer air cooling for simplicity and compatibility with existing infrastructure. By optimizing both solutions to perform comparably "within operating margin," Supermicro addresses the full spectrum of deployment environments.

Their rack-scale redesign with vertical coolant distribution manifolds that no longer consume valuable rack units demonstrates sophisticated system-level thinking. This allows packing up to 96 NVIDIA Blackwell GPUs in a 52U rack—extraordinary compute density that addresses data center space constraints while maximizing AI processing capability per square foot. For organizations building or expanding AI infrastructure, this density advantage translates to tangible real estate and operational savings.

Latest Benchmarks Show Supermicro Systems with the NVIDIA B200 Outperformed the Previous Generation of Systems with 3X the Token Generation Per Second

SAN JOSE, Calif., April 3, 2025 /PRNewswire/ -- Super Micro Computer, Inc. (SMCI), a Total IT Solution Provider for AI/ML, HPC, Cloud, Storage, and 5G/Edge, is announcing first-to-market, industry-leading performance on several MLPerf Inference v5.0 benchmarks, using the NVIDIA HGX™ B200 8-GPU. The 4U liquid-cooled and 10U air-cooled systems achieved the best performance in select benchmarks. Supermicro demonstrated more than 3 times the tokens per second (Token/s) generation for the Llama2-70B and Llama3.1-405B benchmarks compared to H200 8-GPU systems.

"Supermicro remains a leader in the AI industry, as evidenced by the first new benchmarks released by MLCommons in 2025," said Charles Liang, president and CEO of Supermicro. "Our building block architecture enables us to be first-to-market with a diverse range of systems optimized for various workloads. We continue to collaborate closely with NVIDIA to fine-tune our systems and secure a leadership position in AI workloads."

Learn more about the new MLPerf v5.0 Inference benchmarks at: https://mlcommons.org/benchmarks/inference-datacenter/

Supermicro is the only system vendor publishing record MLPerf inference performance (on select benchmarks) for both the air-cooled and liquid-cooled NVIDIA HGX™ B200 8-GPU systems. Both air-cooled and liquid-cooled systems were operational before the MLCommons benchmark start date, and Supermicro engineers optimized the systems and software to showcase the impressive performance. Within the operating margin, the Supermicro air-cooled B200 system exhibited the same level of performance as the liquid-cooled B200 system. Supermicro was already delivering these systems to customers while the benchmarks were being conducted.

MLCommons requires that all results be reproducible, that the products be available, and that the results can be audited by other MLCommons members. Supermicro engineers optimized the systems and software as allowed by the MLCommons rules.

The SYS-421GE-NBRT-LCC (8x NVIDIA B200-SXM-180GB) and SYS-A21GE-NBRT (8x NVIDIA B200-SXM-180GB) showed performance leadership on the Mixtral 8x7B (Mixture of Experts) inference benchmark at 129,000 tokens/second. The Supermicro air-cooled and liquid-cooled NVIDIA B200-based systems delivered over 1,000 tokens/second of inference for the large Llama3.1-405b model, where previous generations of GPU systems delivered far lower throughput. For smaller inference tasks, on the Llama2-70B benchmark, a Supermicro system with the NVIDIA B200-SXM-180GB installed shows the highest performance from a Tier 1 system supplier.

Specifically:

  • Stable Diffusion XL (Server): SYS-A21GE-NBRT (8x B200-SXM-180GB), #1 at 28.92 queries/s
  • llama2-70b-interactive-99 (Server): SYS-A21GE-NBRT (8x B200-SXM-180GB), #1 at 62,265.70 Tokens/s
  • Llama3.1-405b (Offline): SYS-421GE-NBRT-LCC (8x B200-SXM-180GB), #1 at 1,521.74 Tokens/s
  • Llama3.1-405b (Server): SYS-A21GE-NBRT (8x B200-SXM-180GB), #1 at 1,080.31 Tokens/s (for an 8-GPU node)
  • mixtral-8x7b (Server): SYS-421GE-NBRT-LCC (8x B200-SXM-180GB), #1 at 129,047.00 Tokens/s
  • mixtral-8x7b (Offline): SYS-421GE-NBRT-LCC (8x B200-SXM-180GB), #1 at 128,795.00 Tokens/s
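For a sense of scale, the 8-GPU node figures above can be normalized to per-GPU throughput. This is a simple division using the quoted results, not an official MLPerf metric:

```python
# Normalize the quoted 8-GPU system results to per-GPU throughput.
# Totals are the MLPerf v5.0 figures cited above; dividing by 8 is a
# plain normalization, not a metric MLPerf itself reports.

results_tokens_per_s = {
    "llama2-70b-interactive-99 (Server)": 62_265.70,
    "mixtral-8x7b (Server)": 129_047.00,
    "Llama3.1-405b (Offline)": 1_521.74,
}
GPUS_PER_NODE = 8

per_gpu = {name: total / GPUS_PER_NODE for name, total in results_tokens_per_s.items()}
for name, rate in per_gpu.items():
    print(f"{name}: {rate:,.2f} tokens/s per GPU")
```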

"MLCommons congratulates Supermicro on their submission to the MLPerf Inference v5.0 benchmark. We are pleased to see their results showcasing significant performance gains compared to earlier generations of systems," said David Kanter, Head of MLPerf at MLCommons. "Customers will be pleased by the performance improvements achieved which are validated by the neutral, representative and reproducible MLPerf results."

Supermicro offers a comprehensive AI portfolio with over 100 GPU-optimized systems, both air-cooled and liquid-cooled options, with a choice of CPUs, ranging from single-socket optimized systems to 8-way multiprocessor systems. Supermicro rack-scale systems include computing, storage, and network components, which reduce the time required to install them once they are delivered to a customer site.

Supermicro's NVIDIA HGX B200 8-GPU systems utilize next-generation liquid-cooling and air-cooling technology. The newly developed cold plates and the new 250kW coolant distribution unit (CDU) more than double the cooling capacity of the previous generation in the same 4U form factor. In the rack-scale design, available in 42U, 48U, or 52U configurations, the new vertical coolant distribution manifolds (CDM) no longer occupy valuable rack units. This enables eight systems, comprising 64 NVIDIA Blackwell GPUs, in a 42U rack, and up to 12 systems with 96 NVIDIA Blackwell GPUs in a 52U rack.
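The rack-level GPU counts follow directly from 8 GPUs per HGX B200 system; a quick sanity check using only figures from the release:

```python
# Sanity-check the rack-level GPU counts stated in the release:
# 8 GPUs per HGX B200 system, 8 systems per 42U rack, 12 per 52U rack.

GPUS_PER_SYSTEM = 8

def gpus_per_rack(systems_per_rack: int) -> int:
    return systems_per_rack * GPUS_PER_SYSTEM

print(gpus_per_rack(8))   # 42U rack: 64 GPUs
print(gpus_per_rack(12))  # 52U rack: 96 GPUs
```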

The new air-cooled 10U NVIDIA HGX B200 system features a redesigned chassis with expanded thermal headroom to accommodate eight 1000W TDP Blackwell GPUs. Up to 4 of the new 10U air-cooled systems can be installed and fully integrated in a rack, the same density as the previous generation, while providing up to 15x inference and 3x training performance.

About Super Micro Computer, Inc.

Supermicro (NASDAQ: SMCI) is a global leader in Application-Optimized Total IT Solutions. Founded and operating in San Jose, California, Supermicro is committed to delivering first-to-market innovation for Enterprise, Cloud, AI, and 5G Telco/Edge IT Infrastructure. We are a Total IT Solutions provider with server, AI, storage, IoT, switch systems, software, and support services. Supermicro's motherboard, power, and chassis design expertise further strengthens our development and production, enabling next-generation innovation from cloud to edge for our global customers. Our products are designed and manufactured in-house (in the US, Taiwan, and the Netherlands), leveraging global operations for scale and efficiency and optimized to improve TCO and reduce environmental impact (Green Computing). The award-winning portfolio of Server Building Block Solutions® allows customers to optimize for their exact workload and application by selecting from a broad family of systems built from our flexible and reusable building blocks that support a comprehensive set of form factors, processors, memory, GPUs, storage, networking, power, and cooling solutions (air-conditioned, free air cooling, or liquid cooling).

Supermicro, Server Building Block Solutions, and We Keep IT Green are trademarks and/or registered trademarks of Super Micro Computer, Inc.

All other brands, names, and trademarks are the property of their respective owners.


View original content to download multimedia: https://www.prnewswire.com/news-releases/industrys-first-to-market-supermicro-nvidia-hgx-b200-systems-demonstrate-ai-performance-leadership-on-mlperf-inference-v5-0-results-302419115.html

SOURCE Super Micro Computer, Inc.

FAQ

What performance improvements did SMCI achieve in MLPerf Inference v5.0 benchmarks?

SMCI systems achieved 3x higher token generation per second compared to H200 8-GPU systems, with 129,000 tokens/second for Mixtral 8x7B and over 1,000 tokens/second for Llama3.1-405b model.

How many NVIDIA Blackwell GPUs can SMCI's new rack configurations support?

SMCI's racks can support up to 64 NVIDIA Blackwell GPUs in a 42U rack and 96 GPUs in a 52U rack configuration.

What cooling improvements has SMCI introduced for the NVIDIA HGX B200 systems?

SMCI introduced new cold plates and a 250kW coolant distribution unit that doubles the cooling capacity of the previous generation in the same 4U form factor.

What is the performance capability of SMCI's new 10U air-cooled system?

The 10U air-cooled system supports eight 1000W TDP Blackwell GPUs and delivers up to 15x inference and 3x training performance improvement.