There's a lot more going on in AI chip development than Nvidia and AMD. GPUs happen to be useful for neural networks, including the transformers used by large language model AI software. (You can do a search for all of these terms and get explanations for what they mean. But keep in mind that understanding the explanations does require some knowledge of moderately advanced computer science concepts.) GPUs are designed around a general computing model called Single Instruction Multiple Data (SIMD), which means that a single instruction can be simultaneously executed in parallel on multiple data streams. GPUs consist of hundreds or thousands of arithmetic-logic units (ALUs), each of which executes a simpler instruction set than a general-purpose CPU, but there is sufficient generality to allow GPUs to be applied to different problem spaces, such as graphics, database processing, image processing... there's actually a rather long list of applications, and AI is just the currently hot addition.
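To make the SIMD idea concrete, here's a minimal NumPy sketch (ordinary CPU Python, not actual GPU code): the scalar version issues one multiply-add per element, while the vectorized version expresses a single operation over a whole array at once, which is roughly what a GPU's thousands of ALUs execute in parallel.

```python
import numpy as np

# Scalar model: one instruction per data element, one at a time.
def saxpy_scalar(a, x, y):
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]   # one multiply-add per iteration
    return out

# SIMD model: a single vector operation applied to all elements.
# On a GPU, each ALU would handle its own slice of x and y in parallel.
def saxpy_simd(a, x, y):
    return a * x + y

x = np.arange(8, dtype=np.float32)
y = np.ones_like(x)
assert np.allclose(saxpy_scalar(2.0, x, y), saxpy_simd(2.0, x, y))
```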
Nvidia's GPU development is, IMO, way ahead of anything AMD or Intel has in that product category. Nvidia has the best GPUs close-coupled with many-core Arm-based server CPUs, and Nvidia has far and away the best system interconnect strategy of these three companies with NVLink and InfiniBand. AMD and Intel, at best, can only apply their coherent CPU interconnects and CXL, which are very limited by comparison. Even CXL, which Intel conceived as an accelerator-to-CPU interconnect, has been mostly repositioned to support remote shared memory pools, and does not seem focused on the CPU-accelerator problem. And on top of all of these advantages, Nvidia's CUDA ecosystem is years ahead of the open software competition (PyTorch).
But there are other strategies for AI processing that may be better than Nvidia's in the long run.
Google, for several years now, has been using custom ASICs called TPUs (Tensor Processing Units), which consist of multiple matrix-multiply units, each a 128x128 systolic array capable of 16,384 multiply-accumulate operations per clock cycle in TPU v4.
It's called a Tensor Processing Unit because tensors are mathematical objects from linear algebra that generalize the vectors and matrices that are the basic data structures in neural networks. In other words, a TPU is a custom hardware implementation of what people are programming GPUs to do. Google assembles multiple TPUs on a board, and then networks thousands of them together with a leading-edge end-to-end optical interconnect called OCS, which incorporates optical switches based on MEMS chips. I think it's amazing. From a technical standpoint, when I read about TPUs and OCS, I'm much more impressed by Google's technology than by anything Nvidia has. The question is, can the AI application technology Google runs on these TPU systems be leading edge? I don't know, but they claim to be running various LLMs on these systems already.
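For a feel of where that 16,384 figure comes from: a 128x128 systolic array has one multiply-accumulate cell per grid position, and all of them fire on every clock. Here's a minimal Python model of that dataflow (a simplified functional sketch, not Google's actual design), multiplying two 128x128 matrices one "cycle" at a time:

```python
import numpy as np

N = 128  # systolic array dimension cited above for TPU v4

rng = np.random.default_rng(0)
A = rng.random((N, N), dtype=np.float32)
B = rng.random((N, N), dtype=np.float32)

# Functional model of a systolic matmul: each of the N*N MAC cells
# holds one accumulator, and every cell does one multiply-accumulate
# per clock. One step of k below is one "cycle" of the whole array.
acc = np.zeros((N, N), dtype=np.float32)
for k in range(N):
    acc += np.outer(A[:, k], B[k, :])   # 128*128 = 16,384 MACs fire at once

assert np.allclose(acc, A @ B, rtol=1e-3)
print("MACs per cycle:", N * N)          # 16384
```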
A new paper describes how Google's Cloud TPU v4 outperforms TPU v3 by 2.1x on a per-chip basis, and improves performance/Watt by 2.7x. (cloud.google.com)
Amazon has announced that it too has developed its own inference ASIC, called the Inferentia accelerator. As you can see from their website, they are already working with customers and have a software ecosystem.
Learn about AWS Inferentia, an ML chip presented by AWS. (aws.amazon.com)
So Google and Amazon could win the AI-in-the-cloud market by just being good enough at far lower cost (and likely lower power consumption).
Intel, while it appears to still be working on GPUs, is extending the x86 instruction set with fast matrix instructions (AMX), and seems to be working from the bottom up rather than the top down in the AI market:
Sanchit Misra is a senior research scientist and leads the efforts in computational biology/HPC research at Intel Labs. Highlights: Intel is democratizing AI inference by delivering a better price and performance for real-world use cases on the 4th gen Intel® Xeon® Scalable Processors... (community.intel.com)
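The idea behind these matrix extensions is to let the CPU chew through a large matrix multiply in small fixed-size tiles, accumulating low-precision (e.g. int8) products into int32. Here's a rough NumPy sketch of that tiling pattern; the 16x16 tile size and the function name are illustrative assumptions, not Intel's actual AMX tile shapes or intrinsics:

```python
import numpy as np

T = 16  # illustrative tile size (an assumption, not Intel's real tile shape)

def tiled_matmul_int8(A, B):
    """Multiply int8 matrices tile by tile, accumulating into int32.
    Each innermost block update stands in for one AMX-style tile
    instruction operating on a whole TxT block at once."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % T == 0 and N % T == 0 and K % T == 0
    C = np.zeros((M, N), dtype=np.int32)
    for i in range(0, M, T):
        for j in range(0, N, T):
            for k in range(0, K, T):
                C[i:i+T, j:j+T] += (A[i:i+T, k:k+T].astype(np.int32)
                                    @ B[k:k+T, j:j+T].astype(np.int32))
    return C

rng = np.random.default_rng(0)
A = rng.integers(-128, 128, size=(64, 64), dtype=np.int8)
B = rng.integers(-128, 128, size=(64, 64), dtype=np.int8)
assert np.array_equal(tiled_matmul_int8(A, B),
                      A.astype(np.int32) @ B.astype(np.int32))
```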
Tenstorrent appears to have ideas similar to Intel's, and is extending RISC-V to better handle AI applications.
(The Google-Intel-Tenstorrent situation is somewhat weird, because Dave Patterson, one of the leaders of the RISC-V initiative at UC Berkeley, is a distinguished engineer at Google working on TPUs, while Tenstorrent's CEO, Jim Keller, was previously an SVP at Intel.)
It might be that CPU-based AI capabilities are simply easier to develop applications for, and much cheaper to start small and grow with, so this strategy shouldn't be dismissed too quickly.
Tenstorrent is a next-generation computing company that builds computers for AI. Headquartered in the U.S. with offices in Austin, Texas, and Silicon Valley, and global offices in Toronto, Belgrade, Seoul, Tokyo, and Bangalore, Tenstorrent brings together experts in the field of computer... (tenstorrent.com)
And then there's Cerebras, a start-up currently valued at $4B+, which uses wafer-scale integration to produce the world's largest monolithic chips. The chips have hundreds of thousands of custom AI processors on each wafer-chip, interconnected by the world's fastest interconnection network (because it's on-die), but they're only available as fully proprietary systems. That makes Cerebras more like the old Cray full-custom supercomputers than the systems Nvidia GPUs are used in, but if we're talking pure technical capability, Cerebras is the current world champion. Why is Cerebras even mentioned in the same breath as Nvidia? Because most of Nvidia's datacenter story is now about assembling GPUs into what are really AI supercomputers, which is exactly the kind of machine Cerebras builds.
Cerebras is the go-to platform for fast and effortless AI training. (www.cerebras.net)
Note the write-up on their website about the creation of a 4 exaflop AI supercomputer.
So I'm not so sure that Nvidia maintains its current far-and-away leadership position in the long run, and it doesn't look like AMD will be the challenger. To me, Google looks like the most impressive long-term challenger.