AMD has responded to NVIDIA’s H100 TensorRT-LLM figures with the MI300X once again leading in the AI benchmarks when running optimized software.
AMD & NVIDIA Are Engaged In A Fierce Battle With Both GPU Makers Claiming AI Superiority Over Each Other Using Optimized Software Stacks For H100 & MI300X Chips
Two days ago, NVIDIA published new benchmarks of its Hopper H100 GPUs to showcase that its chips perform much better than what AMD showed during its “Advancing AI” event. The red team had compared its brand-new Instinct MI300X GPU against the Hopper H100 chip, which is over a year old now but remains the most popular choice in the AI industry. AMD's benchmarks did not use optimized libraries such as TensorRT-LLM, which provides a big boost to NVIDIA’s AI chips.
Using TensorRT-LLM resulted in the Hopper H100 GPU gaining almost a 50% performance uplift over AMD’s Instinct MI300X GPU. Now, AMD is firing back at NVIDIA on all cylinders, showcasing that the MI300X is still faster than the H100 even when the Hopper chip runs its optimized software stack. According to AMD, the numbers published by NVIDIA:
- Used TensorRT-LLM on H100 instead of vLLM used in AMD benchmarks
- Compared performance of FP16 datatype on AMD Instinct MI300X GPUs to FP8 datatype on H100
- Inverted the AMD-published performance data from relative latency numbers to absolute throughput
So AMD has decided to go for a fairer comparison, and with the latest figures, we see the Instinct MI300X running on vLLM offering 30% faster performance than the Hopper H100 running on TensorRT-LLM.
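To see why the third point matters, note that throughput is the reciprocal of latency, so re-plotting relative latency (lower is better) as absolute throughput (higher is better) flips the direction that favors each chip and can change the visual impression of the same data. A minimal sketch, using made-up numbers that come from neither vendor:

```python
# Hypothetical per-request latencies in seconds (illustrative values only,
# not AMD's or NVIDIA's published figures).
latency_a = 2.0   # chip A
latency_b = 2.5   # chip B

# Relative latency, as AMD published it: chip A is the baseline, lower is better.
relative_latency_b = latency_b / latency_a   # 1.25 -> B is 25% slower

# Absolute throughput (requests per second): the reciprocal of latency,
# where higher is better -- the form NVIDIA's charts used.
throughput_a = 1 / latency_a   # 0.5 req/s
throughput_b = 1 / latency_b   # 0.4 req/s

print(relative_latency_b, throughput_a, throughput_b)
```

The underlying measurements are identical in both presentations; only the axis and the "bigger bar" change.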
- MI300X to H100 using vLLM for both.
- At our launch event in early December, we highlighted a 1.4x performance advantage for MI300X vs H100 using equivalent datatype and library setup. With the latest optimizations we have made, this performance advantage has increased to 2.1x.
- We selected vLLM based on its broad adoption by the user and developer community, and because it supports both AMD and Nvidia GPUs.
- MI300X using vLLM vs H100 using Nvidia’s optimized TensorRT-LLM
- Even when using TensorRT-LLM for H100 as our competitor outlined, and vLLM for MI300X, we still show a 1.3x improvement in latency.
- Measured latency results for MI300X using the FP16 datatype vs H100 using TensorRT-LLM and the FP8 datatype.
- MI300X continues to demonstrate a performance advantage when measuring absolute latency, even with the lower-precision FP8 datatype and TensorRT-LLM on H100 vs. vLLM and the higher-precision FP16 datatype on MI300X.
- We use FP16 datatype due to its popularity, and today, vLLM does not support FP8.
These results again show that the MI300X using FP16 is comparable to the H100 running at its Nvidia-recommended best-performance settings, even when the latter uses FP8 and TensorRT-LLM.
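The FP16-vs-FP8 caveat matters because the two formats carry very different precision: FP16 has a 10-bit mantissa, while the common FP8 E4M3 format has only 3 mantissa bits. A minimal sketch of the resulting rounding gap (a simplified model that ignores exponent range and subnormals, for illustration only):

```python
import math

def round_mantissa(x: float, mantissa_bits: int) -> float:
    """Round x to a reduced mantissa width (toy model: ignores
    exponent range, overflow, and subnormal handling)."""
    if x == 0:
        return 0.0
    m, e = math.frexp(x)               # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)   # +1 accounts for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)

value = 3.14159
fp16_like = round_mantissa(value, 10)  # FP16: 10 explicit mantissa bits
fp8_like = round_mantissa(value, 3)    # FP8 E4M3: 3 explicit mantissa bits
print(fp16_like, fp8_like)             # FP8 lands on a much coarser grid
```

Lower precision generally buys higher throughput on hardware that supports it, which is exactly why AMD flags the mismatched datatypes as an unfair comparison.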
These back-and-forth numbers are somewhat unexpected, but given just how important AI has become for the likes of AMD, NVIDIA, and Intel, we can expect to see more such exchanges in the future. Even Intel has recently stated that the whole industry is motivated to end NVIDIA’s CUDA dominance. The fact remains that NVIDIA has years of software expertise in the AI segment, and while the Instinct MI300X offers some beastly specs, it will soon be competing with an even faster Hopper solution in the form of the H200 and the upcoming Blackwell B100 GPUs in 2024.
Intel is also ready to roll out its Gaudi 3 accelerators in 2024, which will further heat up the AI space. In a way, this competition makes for a more vibrant and lively AI industry, where each vendor continues to innovate and outdo the others, offering customers better capabilities and even faster performance. NVIDIA, despite having no competition for years, has continued to innovate in this segment, and with AMD and Intel ramping up their AI production and software, we can expect it to respond with even better hardware and software of its own.