Ljubisa Bajic is the CEO of Tenstorrent, a company he co-founded in 2016 to bring to market full-stack AI compute solutions with a compelling new approach to addressing the exponentially growing complexity of AI models.
What is Tenstorrent?
Tenstorrent is a next-generation computing company that is bringing to market the first dynamic artificial intelligence architecture that facilitates scalable deep learning. Tenstorrent’s vision is to enable Artificial General Intelligence – a more sentient capability. Tenstorrent’s mission is to build the substrate for radically more efficient and scalable AI computation while simplifying software development. This would enable larger, exponentially more complex models to be handled and deployed faster and more cost-effectively, and create ubiquitous, advanced AI solutions from the edge to the cloud. The first announced product from Tenstorrent is called Grayskull – a processor that has benchmarked as the fastest single-chip AI solution and that will be available to ship to customers by Q4’20.
Why did you see a need for another AI company?
AI is a burgeoning field, and the computation required for its use cases is evolving rapidly. We are still early in the growth phase where newer, emerging models and use cases are overwhelming even State of the Art (SoTA) architectures and implementations. The use cases and specializations continue to grow. What everyone agrees on, however, is that the complexity of the problems, and hence models, continues to explode; that there is a huge diversity of use cases that need specialized solutions to deliver on the Service Level Agreements (SLA); and that the cost of delivering the next generation of AI solutions – or their TCO – will grow exponentially if we stick to current SoTA.
Moreover, there are solutions that can thrive in some use cases, but fail in others. For example, specialized AI acceleration solutions that do well at image processing today might not do well for natural language processing or recommendation engines. GPU-based solutions may be better than CPU-based for quite a few use cases, but not versus specialized AI acceleration. You might be able to get to the answer for some models, but it may take too long (years) with a CPU and be prohibitively expensive, even with specialized hardware. The estimated cost of training the recently announced GPT-3 is exorbitant, and the training would take an inordinately long time even with specialized acceleration. Some cloud companies offer hundreds of different SKUs of compute to fit the right requirements. Simply put, while great strides have been made by various players in the AI space, the richness of the problem, when paired with emerging requirements, means that new approaches, architectures and solutions are needed to solve the future problems. There was a slew of search companies in the late ’90s, yet Google came in much later with a novel approach that revolutionized search.
What makes Tenstorrent different?
First, Tenstorrent has rethought the computation problem from the software point of view. We believe the software of the future, Software 2.0, will not be written in the traditional fashion but at a higher level, with a lot of the lower-level code generated automatically. Greater intelligence is therefore needed in the compute elements so that they can handle allocation, communication and other such adjustments at run time.
Tenstorrent starts off with the idea of making a new system much more brain-like. A human brain operates at (effectively) less than 20W – considering the amount it computes, the efficiency is sky high. A lot of that efficiency comes from the human ability to drop unnecessary data and computation, especially with learned information. With Tenstorrent’s conditional execution, you can achieve similar gains, speeding up some compute models by orders of magnitude.
This is built on top of a hardware architecture that is focused on compute efficiency – finding the right granule size for the most efficient compute, memory and storage. The compute element is designed for optimal compute, integrating lossless compression for storage efficiency and network processing for communication efficiency. It is designed to scale efficiently up to 100K nodes.
The software approach is also geared to allow massive scale. It takes the pressure off Ahead of Time (AOT) scheduling by having run-time control and firmware that can programmably control compute, storage allocation and other activities. This is a non-trivial problem that has been handled in a novel way. The software tooling not only supports compiler generality for neural networks, finding the balance between AOT scheduling and run-time allocation for maximum efficiency; it is also versatile enough to handle non-neural pre- or post-processing steps, for a more holistic, simpler overall solution from a programmer’s point of view.
To summarise, the solution is a scalable distributed network computer that brings the non-linear benefits of conditional computing and performance that is decoupled from both the size of the model and batching, combined with a software approach that simplifies deployment across a wide variety of AI models.
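As a rough illustration of the conditional-execution idea described above, here is a toy early-exit sketch in Python. It is not Tenstorrent's implementation: the function names, the confidence threshold and the "sharpening" stages are all hypothetical, chosen only to show how stopping once a result is confident enough skips the remaining compute.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_with_early_exit(stages, logits, threshold=0.9):
    """Run stages in order and stop as soon as the top class is confident.

    `stages` is a list of functions that refine a list of logits.
    Returns (predicted_class, confidence, stages_executed).
    All names and the threshold are illustrative assumptions.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    executed = 0
    for stage in stages:
        logits = stage(logits)
        executed += 1
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            break  # skip the remaining stages: the answer is good enough
    return probs.index(confidence), confidence, executed


# Hypothetical stage: each pass doubles the logits, sharpening the prediction.
def sharpen(logits):
    return [2 * v for v in logits]


stages = [sharpen] * 4
easy = run_with_early_exit(stages, [2.0, 0.0, 0.0])
hard = run_with_early_exit(stages, [0.5, 0.0, 0.0])
print("easy input used", easy[2], "of", len(stages), "stages")
print("hard input used", hard[2], "of", len(stages), "stages")
```

In this toy setup the easy input exits after one stage while the harder one needs three, so the work done depends on the data rather than on the full depth of the model.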
What is a Tensix?
The Tensix is the smallest self-contained constituent unit of the network computer that forms the base of Tenstorrent’s solutions. The Tensix consists of a packet engine that routes packets into and out of the Tensix, a compute engine, 1MB of SRAM and five RISC processors that give it its unique, granular programmability. The compute engine consists of a SIMD engine and a Tensor/matrix engine, which gives the Tensix its name. Tenstorrent made a conscious decision to move away from large matrix multiplication units that offer lots of parallelism but little control over what gets computed and how. The Tensix is a compact granule that, within its network, can easily be filled with parallel tasks, while the processors regulate conditional execution through their ability to autonomously stop compute on threads that have reached their optimum level of accuracy. It still packs a lot of punch. The SIMD unit can do vector math for AI and non-AI calculations such as signal processing. It can support numerous floating-point formats. The Tensor unit can accelerate both convolutional and GEMM type operations. With 1MB of SRAM, there is plenty of space for the computation to occur without stressing the rest of the Network on Chip or DRAM – once the necessary information is loaded.
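To give a feel for what 1MB of local SRAM means for a GEMM-style operation, the back-of-the-envelope sketch below estimates the largest square tile for which two inputs and one output fit at once. The assumptions are mine, not Tenstorrent's: "1MB" is taken as 2**20 bytes, all of it is assumed usable for data, and the formats listed are generic examples rather than a confirmed list of what the Tensix supports.

```python
import math

SRAM_BYTES = 1024 * 1024  # assumption: "1MB" means 2**20 bytes, all usable for data

# Generic element sizes in bytes; illustrative, not a confirmed Tensix format list.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "fp8": 1}


def max_square_tile(sram_bytes, bytes_per_element, num_operands=3):
    """Largest n such that `num_operands` n-by-n matrices fit in SRAM.

    For C = A @ B, num_operands=3 accounts for A, B and C together.
    """
    elements_per_matrix = sram_bytes // (num_operands * bytes_per_element)
    return math.isqrt(elements_per_matrix)


for name, size in BYTES_PER_ELEMENT.items():
    n = max_square_tile(SRAM_BYTES, size)
    print(f"{name}: up to {n} x {n} tiles for A, B and C")
```

Under these assumptions, narrower formats allow noticeably larger tiles to stay resident, which is one reason support for multiple floating-point formats matters for keeping work local.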
What applications / use cases is Grayskull best suited for? Wormhole?
Grayskull is specified to be ideal for inference. Tenstorrent’s approach of having the right amount of onboard SRAM and efficient, high-bandwidth DRAM gives the product a great deal of flexibility and versatility. For inference, its strengths lie not only in its performance but also in its low latency, which is especially important for real-time response. In its production 65W PCIe card form factor, it does extremely well at inference on Natural Language Processing models like BERT and GPT, demonstrating scores that are over five times the performance of GPU-based alternatives in the same power envelope. Grayskull represents the world’s first conditional compute processor, which can enable additional performance gains over the previously stated 5X. It is also well-suited for vision-recognition systems and recommender networks – which are currently critical in various verticals. Grayskull can also be used for training in some configurations. However, the follow-on product – Wormhole – which will be sampling in early 2021 and shipping in the second half of the year, is designed specifically to improve training capabilities by introducing a much higher bandwidth memory system and coherence connectivity that will scale to much larger accelerator configurations with few host CPUs.
What trends in AI / ML do you find most interesting right now? How do you think they will play out in the coming years?
The reality is that AI is transforming almost every industry. Deep learning will continue to become more sophisticated and will eventually enable machines to piece together new ideas through old ones – potentially even “expressing” emotion. But even without that, AI is becoming increasingly useful as people and industries rely more on it for applications like drug and vaccine discovery, or using ECGs or other usual reports to predict health issues or provide early diagnosis. This requires more sophisticated models. AI systems are still just beginning to do activities that humans take for granted – the dexterity of hands to handle delicate objects, the ability to drive vehicles, conduct conversations, realize context-based comprehension and so many more. The models are growing exponentially in size and complexity – which puts a severe strain on the compute resources required to just keep up. The result is that very few companies can afford to train these sophisticated models, which limits the proliferation of these capabilities. There will be new techniques that simplify software and models but, even with that, models will continue to grow. So we see a trend toward greater innovation in architecture, software and hardware development to achieve the next growth phase.
What’s the future for Tenstorrent?
At its heart, Tenstorrent is an AI systems company. We are quite excited about the first generation of products hitting the performance target and demonstrating the capabilities of our base architecture and the promise of conditional execution. We are learning very quickly from customer use cases in AI inference and clearly seeing the pain points that we can solve – needless to say, performance and TCO figure prominently, as does software simplification. We are looking forward to getting our products into the market and then accelerating our roadmap on both hardware and software to revolutionize training with our Wormhole product’s conditional computation and hyper-efficient scaling. The Grayskull inference solution is beginning evaluations and can show a multi-fold improvement with models trained on other platforms. However, if the training platform is also Tenstorrent based, you will see an even bigger gain at both ends.
Why Toronto vs. Silicon Valley?
That is a fair question. The founding team and most of the core engineering team were educated in Toronto and worked there prior to Tenstorrent. Toronto is North America’s fourth largest city and is known as a cosmopolitan metropolis with a thriving cultural scene and a very respectable presence in the hardware and software world, being a strong center for graphics and parallel compute. More importantly, over the last two decades, Toronto has become the epicenter of artificial intelligence outside Silicon Valley, with universities that pioneered deep learning and three Turing award winners who all taught there and created a groundswell of thinkers, implementers and engineers. It also has the fourth largest pool of tech resources in North America, and is particularly rich in artificial intelligence talent. It is fertile recruiting ground for some of the top talent, and of course a great city to live in.
Tenstorrent is a next-generation computing company that is bringing to market the first dynamic artificial intelligence architecture facilitating scalable deep learning. The company’s mission is to address the rapidly growing computing demands for training and inference, and to produce highly programmable and efficient AI processors with the right computational density designed from the ground up.
Headquartered in Toronto, Canada, with U.S. offices in Austin, Texas, and Silicon Valley, Tenstorrent brings together experts in the fields of computer architecture, complex processor and ASIC design, advanced systems and neural network compilers, who have developed successful projects for companies including Altera, AMD, Arm, IBM, Intel, Nvidia, Marvell and Qualcomm. Tenstorrent is backed by Eclipse Ventures and Real Ventures, among others. For more information visit www.tenstorrent.com.