The AI chip market is growing rapidly, with chip developers building products for both training and inference. Training chips are designed to run at full capacity, while inference chips are more varied and are often optimized for power efficiency. Software is essential to getting the most out of these chips, and with hundreds of frameworks in use, it is difficult to build a chip that is optimized for all of them. Nvidia GPUs are currently the default solution for training, although AMD and a number of startups have tried to build training chips with limited success. Hyperscalers such as Google and AWS have also developed their own proprietary training chips, with AWS’s Trainium proving to be a modest success. Ultimately, most people in the market for training compute will likely choose to build their models on Nvidia GPUs.
AI chips perform two functions. AI builders first take a large (or truly massive) data set and run complex software to find patterns in that data. These patterns are expressed as a model, so we have chips that “train” the system to generate a model.
This model is then used to make a prediction from a new piece of data: the model infers a likely outcome from that data. Here, inference chips run the new data against the already trained model. These two purposes are very different.
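The two phases described above can be sketched in a few lines of Python. This is only a toy illustration, not how production systems are built: the tiny data set, the straight-line model, and the function names `train` and `infer` are all assumptions made for brevity. The point is the shape of the work: training loops over the data many times to find the pattern, while inference is a single cheap evaluation of the finished model.

```python
# Toy sketch: "training" finds a pattern (a model); "inference" applies it.
# The data, the linear model, and the function names are illustrative assumptions.

def train(xs, ys, steps=5000, lr=0.01):
    """Fit y ~ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):  # the expensive, repeated part of the job
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b  # the trained "model" is just these two numbers


def infer(model, x):
    """Apply an already trained model to one new data point."""
    w, b = model
    return w * x + b


# Hypothetical training data that follows y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

model = train(xs, ys)            # done once, up front
prediction = infer(model, 10.0)  # done for every new input; close to 21
print(model, prediction)
```

Even in this toy, training performs thousands of passes over the data while inference is one multiply and one add, which hints at why the two kinds of chips are designed so differently.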
Training chips are designed to run at full capacity, sometimes for weeks at a time, until the model is complete. As a result, training chips tend to be large, “heavy iron.”
Inference chips are more diverse. Some are used in data centers; others are used at the “edge” in devices such as smartphones and video cameras. These chips tend to be more varied and are designed to optimize different aspects, such as power efficiency at the edge. And, of course, there are all sorts of in-between variants. The point is that there are big differences between “AI chips.”
For chip developers, these are very different products, but, as with all things semiconductor, what matters most is the software that runs on them. Viewed in this light, the situation is much simpler, but also dizzyingly complex.
Simple, because inference chips usually just need to run the models that come from training chips (yes, we’re oversimplifying). Complex, because the software running on training chips is hugely varied. And this is crucial. Hundreds, and possibly thousands, of frameworks are currently used to train models. There are some incredibly good open source libraries out there, but many large AI companies and hyperscalers also build their own.
Because the field of training software frameworks is so fragmented, it is almost impossible to build a chip that is optimized for all of them. As we noted earlier, small software changes can effectively negate the gains provided by special-purpose chips. Moreover, the people running training software want that software to be as optimized as possible for the silicon it runs on. The programmers working with this software probably don’t want to wrestle with the intricacies of each chip; their lives are hard enough building these training systems. They don’t want to learn low-level code for one chip, then re-learn the hacks and shortcuts for a new one. Even if the new chip offers 20% better performance, the effort of re-optimizing the code and learning the new chip makes that advantage moot.
This brings us to CUDA, Nvidia’s low-level programming environment for its chips. By now, any software engineer working on training systems probably knows at least a little about using CUDA. CUDA isn’t perfect, it’s not elegant, and it’s not particularly simple, but it’s familiar. Huge fortunes are built on such quirks. Because the training software environment is already so diverse and changing so rapidly, Nvidia GPUs are the default solution for training chips.
The market for all these AI chips is now worth several billion dollars and is forecast to grow at 30-40% per year for the foreseeable future. According to a McKinsey study (perhaps not the most authoritative source), the AI chip market for data centers will be between $13 billion and $15 billion by 2025. By comparison, the overall processor market is now around $75 billion.
Of this $15 billion AI market, about two-thirds is inference and one-third is training. So it’s a significant market. One wrinkle in all of this is that training chips are priced in the $1,000s or even $10,000s, while inference chips are priced in the $100s and up, meaning that training chips make up only a small fraction of total units, roughly 10%-20%.
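The relationship between revenue share and unit share can be checked with some back-of-the-envelope arithmetic. The sketch below uses the figures from the text ($15 billion, one-third training by revenue), but the average selling prices of $2,000 and $500 are purely illustrative assumptions picked from within the quoted price ranges, not data from the study, and the function name `training_unit_share` is hypothetical.

```python
def training_unit_share(market_usd, training_revenue_share,
                        training_asp, inference_asp):
    """Share of total chip units that are training chips.

    market_usd:             total market size in dollars
    training_revenue_share: fraction of revenue from training chips
    training_asp:           assumed average price of a training chip
    inference_asp:          assumed average price of an inference chip
    """
    training_units = market_usd * training_revenue_share / training_asp
    inference_units = market_usd * (1 - training_revenue_share) / inference_asp
    return training_units / (training_units + inference_units)


# $15B market, one-third training by revenue (figures from the text).
# The $2,000 and $500 prices are assumptions for illustration only.
share = training_unit_share(15e9, 1 / 3, training_asp=2_000, inference_asp=500)
print(f"Training chips as a share of units: {share:.1%}")  # about 11%
```

Under these assumed prices the result lands inside the 10%-20% range quoted above. The answer is very sensitive to the price gap: the wider the spread between training and inference prices, the smaller the training share of units becomes.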
In the long term, this will be important for how the market takes shape. Nvidia will earn substantial profits from training that it can bring to bear in the fight for the inference market, just as Intel once used PC processors to fill its factories and data center processors to generate much of its profit.
To be clear, Nvidia is not the only player in this market. AMD also makes GPUs, but has never developed an effective (or at least widespread) alternative to CUDA. They have a relatively small market share in AI GPUs and we don’t expect that to change anytime soon.
A number of startups have tried to build training chips, but they have mostly run into the software problem described above. And for what it’s worth, AWS has also deployed its own proprietary training chip, called Trainium. From what we can tell, this has been a modest success; AWS has no clear advantage here other than its own (huge) internal workloads. However, we understand that it is moving forward with the next generation of Trainium, so it must be happy with the results so far.
Some other hyperscalers may also be building their own training chips, notably Google, which will soon have new TPU variants specifically tuned for training. And that’s the market. Simply put, we think most people in the market for training compute will want to build their models on Nvidia GPUs.