Supermicro's NVIDIA HGX B200 Sets New AI Performance Standard

2025-04-03

San Jose, Thursday, 3 April 2025.
Supermicro’s NVIDIA HGX B200 systems more than tripled token generation throughput relative to the previous H200 generation, posting the highest scores on the MLPerf Inference v5.0 benchmarks and marking a significant leap in AI hardware performance.

Breakthrough Performance Metrics

In a development announced on April 3, 2025, Supermicro’s new systems demonstrated a major step up in AI inference capability, generating more than three times as many tokens per second on the Llama2-70B and Llama3.1-405B benchmarks as the previous-generation H200 8-GPU systems [1]. The company’s SYS-421GE-NBRT-LCC and SYS-A21GE-NBRT systems, both equipped with eight NVIDIA B200-SXM-180GB GPUs, delivered industry-leading performance of 129,000 tokens per second on the Mixtral 8x7B inference benchmark [1].
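To put the aggregate figure in perspective, a back-of-the-envelope calculation converts the reported system-level number into a per-GPU rate. This is a simplification that assumes throughput is spread evenly across the eight GPUs, which the source does not state:

```python
# Back-of-the-envelope per-GPU throughput from the reported Mixtral 8x7B result.
# Assumption (not from the source): load is spread evenly across all GPUs.

SYSTEM_TOKENS_PER_SEC = 129_000  # reported aggregate for the 8-GPU system
NUM_GPUS = 8

per_gpu = SYSTEM_TOKENS_PER_SEC / NUM_GPUS
print(f"~{per_gpu:,.0f} tokens/s per GPU")  # ~16,125 tokens/s per GPU
```

Roughly 16,000 tokens per second per GPU, a useful yardstick when comparing against single-GPU results from earlier generations.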

Advanced Technical Architecture

The systems feature cutting-edge specifications, including dual Intel® Xeon® 6900 series processors supporting up to 128 cores and 256 threads per CPU [4]. The GPU configuration boasts 1.4TB of total GPU memory, achieved through eight NVIDIA HGX B200 GPUs, each equipped with 180GB of HBM3e memory [4]. To maintain optimal performance, the systems utilize an advanced cooling system with 15 counter-rotating 80mm fans and 4 counter-rotating 60mm fans [4], while six 5250W redundant power supplies ensure reliable operation [4].
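The quoted 1.4TB aggregate figure follows directly from the per-GPU memory: eight GPUs at 180GB each. A minimal sketch of that arithmetic, using decimal terabytes as spec sheets typically do:

```python
# Sanity check on the quoted aggregate GPU memory:
# eight NVIDIA B200 GPUs with 180 GB of HBM3e each.

NUM_GPUS = 8
HBM3E_PER_GPU_GB = 180

total_gb = NUM_GPUS * HBM3E_PER_GPU_GB
total_tb = total_gb / 1000  # decimal TB, the convention used in spec sheets
print(f"{total_gb} GB total = {total_tb:.2f} TB")  # 1440 GB total = 1.44 TB
```

The exact total is 1.44TB, which the marketing material rounds down to 1.4TB.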

Market Impact and Industry Recognition

Charles Liang, president and CEO of Supermicro, emphasized the company’s leadership position in the AI industry, highlighting their first-to-market advantage with diverse system optimizations [1]. The achievement has garnered recognition from MLCommons, with David Kanter, Head of MLPerf, acknowledging the significant performance gains compared to earlier generations [1]. This breakthrough coincides with Supermicro’s expansion of its AI portfolio, now featuring over 100 GPU-optimized systems supporting various configurations [2].

Sources

  1. www.prnewswire.com
  2. www.supermicro.com
  3. www.nvidia.com
  4. www.theserver.group
