Intel says its Gaudi2 accelerator is more than a match for the Nvidia A100

In brief: Intel has drummed up a rivalry between its new Gaudi2 accelerator and the now two-year-old market leader, the Nvidia A100. In two benchmarks suited to its niche, the new, gaudily named accelerator pulls ahead.

Gaudi2 is made for Intel by Habana Labs, an Israeli company that Intel acquired at the end of 2019 for $2 billion. Habana actually makes two kinds of specialized accelerators: some for training neural networks, like Gaudi2, and others for running them (i.e., "inferencing"), such as Goya and Greco.


Habana and Intel announced Gaudi2 in May but waited until last week to upload its benchmark scores to the public MLPerf database. In their graphs, they compare the scores of their Gaudi2 system against the public scores of A100-equipped systems from Nvidia and Dell.

ResNet-50 tests hardware's ability to train an AI to classify images. Habana's Gaudi2 system took just 18 minutes to train the AI well enough for it to pass the test, easily surpassing Nvidia's A100 system, which needed almost half an hour.

Habana's Gaudi2 system took just 17 minutes to train the BERT model, beating the time of Nvidia's A100 system by about a minute. BERT is a natural language processing model, and in this test it is trained on Wikipedia articles.
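To put the two results side by side, the training times can be turned into rough speedup ratios. The sketch below is only illustrative: the A100 times are rounded assumptions read off the wording above ("almost half an hour" is taken as 30 minutes, and "about a minute" slower than 17 is taken as 18), not official MLPerf figures, and the `speedup` helper is a hypothetical name.

```python
# Approximate training times in minutes, based on the article's wording.
# The A100 values are rounded assumptions, not exact MLPerf results.
times = {
    "ResNet-50": {"gaudi2": 18.0, "a100": 30.0},
    "BERT": {"gaudi2": 17.0, "a100": 18.0},
}


def speedup(baseline_minutes, candidate_minutes):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_minutes / candidate_minutes


for name, t in times.items():
    ratio = speedup(t["a100"], t["gaudi2"])
    print(f"{name}: Gaudi2 is roughly {ratio:.2f}x faster")
```

Under these assumptions the ResNet-50 lead works out to roughly 1.67x and the BERT lead to roughly 1.06x. Because the A100's ResNet-50 time was slightly under half an hour, the first figure is best read as an upper bound.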

For both benchmarks, all of the systems used eight accelerators/GPUs. Habana's system paired them with two 40-core Intel Xeon 8380 CPUs, and Nvidia's used two 64-core AMD Epyc 7742 CPUs.


Gaudi2 features 24 TPCs (tensor processor cores) and two MMEs (matrix multiplication engines) that run partly in parallel. It supports a wide array of data types, including FP32, TF32, BF16, FP16, and FP8. It also has a dedicated media engine for processing audio and visual media as inputs.

For memory, Gaudi2 has six 16 GB stacks of HBM2e that sum to 96 GB and 2.45 TB/s of total memory bandwidth. Internally, it has a 48 MB cache. For connectivity, it uses an x16 PCIe 4.0 connection and has 24 100 Gbps RoCE2 (RDMA over Converged Ethernet 2) ports.
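The memory figures can be checked with some quick arithmetic. The sketch below uses only the stack count, stack size, and total bandwidth quoted above. The even split of bandwidth across stacks and the use of decimal units (1 TB = 1000 GB) are assumptions made for illustration, not published per-stack specifications.

```python
# Memory figures quoted in the article for Gaudi2.
HBM_STACKS = 6
GB_PER_STACK = 16
TOTAL_BANDWIDTH_TB_PER_S = 2.45

# Total capacity is the stack count times the capacity of each stack.
total_memory_gb = HBM_STACKS * GB_PER_STACK

# Assumption: bandwidth is shared evenly across stacks, 1 TB = 1000 GB.
total_bandwidth_gb_per_s = TOTAL_BANDWIDTH_TB_PER_S * 1000
per_stack_bandwidth_gb_per_s = total_bandwidth_gb_per_s / HBM_STACKS

# Time to read the entire memory once at the quoted peak bandwidth.
full_sweep_ms = total_memory_gb / total_bandwidth_gb_per_s * 1000

print(f"Total HBM2e capacity: {total_memory_gb} GB")
print(f"Bandwidth per stack: {per_stack_bandwidth_gb_per_s:.0f} GB/s")
print(f"Full memory sweep: {full_sweep_ms:.1f} ms")
```

This confirms the 96 GB total and suggests roughly 408 GB/s per stack if the bandwidth is evenly divided. At peak rate, reading the whole memory once would take about 39 ms.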


Habana has clearly created a real A100 competitor for Intel. Its timing could be better, given that Nvidia announced the H100 three months ago, but the two are such different products that even though they might compete in benchmarks, they won't really be competing for motherboard slots.

Whereas the A100 and H100 are versatile behemoths, Gaudi2 is a streamlined accelerator trying to do something different, and it will be interesting to see whether it succeeds.
