As far as natural language processing is concerned, we are talking about 1.8x – 3.0x performance improvements over A100 machines. Part of this advantage can be attributed to Intel's industry-leading media processing engines incorporated into Gaudi2. But it looks like the internal bandwidth and compute capabilities, along with the SynapseAI software advantages (keep in mind the improvements Intel has made to its PyTorch and TensorFlow support in recent quarters), do the significant part of the job here.

Scaling Out

Among the things that Intel submitted to the MLCommons database (which have not been published yet) were performance results of 128- and 256-accelerator configurations demonstrating the parallel scale-out capability of the Gaudi2 platform with the commercial software stack available to Habana customers (bear in mind, this chip has 24 100GbE RDMA ports and can scale out in different ways).
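As a back-of-the-envelope illustration of that I/O headroom (figures derived only from the 24x 100GbE spec mentioned above, nothing vendor-confirmed beyond that):

```python
# Aggregate interconnect bandwidth per Gaudi2 chip, assuming all
# 24 of its 100GbE RDMA ports can be driven at line rate.
ports = 24
gbits_per_port = 100

total_gbits = ports * gbits_per_port   # 2400 Gb/s per chip
total_gbytes = total_gbits / 8         # 300 GB/s per chip

print(f"{total_gbits} Gb/s aggregate, {total_gbytes} GB/s")
```

That aggregate figure is what lets the ports be split flexibly between intra-node and inter-node links, which is what "may scale in different ways" refers to.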

Amdahl's law holds that performance scaling beyond a single execution core is limited by the serial portion of the workload, along with factors such as within-chip latency as well as software and interconnect speeds. GPU developers have long contended with this law. When it comes to scale-out capability, Intel says Gaudi2 scales well across existing AI models given its vast I/O. Meanwhile, Intel does not disclose how AMD- and Nvidia-based solutions perform in the same cases (should we presume that they scale better with tensor ops?).
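To see why Amdahl's law bites at 128 and 256 accelerators, here is a minimal sketch of the formula, S(n) = 1 / ((1 - p) + p/n), where p is the perfectly parallelizable fraction of the work (the 95% figure below is purely illustrative, not a measured Gaudi2 number):

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Theoretical speedup on n workers when parallel_fraction of
    the workload is perfectly parallelizable (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n)

# Even a small serial fraction caps scaling: with 95% parallel work,
# 256 accelerators deliver nowhere near a 256x speedup.
for n in (8, 128, 256):
    print(f"{n:>3} workers -> {amdahl_speedup(0.95, n):.1f}x")
```

This is why raw I/O alone does not settle the scale-out question; the serial and communication overheads of the actual training workload matter just as much.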

"Gaudi2 delivers clear leadership training performance as proven by our latest MLPerf results,"| said Eitan Medina, chief operating officer at Habana Labs. "And we continue to innovate on our deep-learning training architecture and software to deliver the most cost-competitive AI training solutions."

Some Thoughts

Without any doubt, the performance results of Intel's Habana 8-way Gaudi2 96GB-based deep learning machine are nothing but impressive when compared to Nvidia's 8-way A100 DL system. Beating a competitor by two times on the same process node is spectacular, to say the least. But that competitor is two years old.

Yet this is without consideration of power consumption, which we do not know. We can only assume that Intel's Gaudi2 OAM cards are rated at a 560W maximum (as specced) per board. But TDP alone is hardly the deciding metric for those deploying machines like Gaudi2.

Intel's Gaudi2 system partners currently include DDN and Supermicro. Given the nature of DDN, we are talking about an AI-enabled storage solution here (bear in mind, this is an Intel PDF). Supermicro is mentioned only by name.