I had a chance to catch up with Arun Iyengar, CEO of Untether AI. Untether AI recently unveiled its tsunAImi accelerator cards powered by the company’s runAI devices. Using at-memory computation, Untether AI breaks through the barriers of traditional von Neumann architectures, offering industry-leading compute density with power and price efficiency.
What brought you to Untether AI? (After almost 20 years in the FPGA business)
I spent a long time with FPGA and processor companies during a period when the industry viewed hardware as important but not critical. Artificial intelligence changed all that and made hardware a critical component in meeting difficult machine learning requirements. As I considered the impact of AI on existing chip companies, I realized that it would fundamentally alter the chip landscape. I wanted to fully realize the impact of such a change by being part of a pure-play AI company rather than a larger company that would have to go through the painful process of migrating its existing silicon to add AI capability. So that meant joining a startup. However, it was important to me to find a technology and architecture that would be differentiated and would scale readily, both for production and for targeting various end markets. Untether AI, with its at-memory compute architecture, met these criteria. The company is well positioned to scale across technology nodes as well as to scale die size to target various end markets.
Neural Net Inference is an exciting but competitive market, how will you differentiate? (Who do you really compete with?)
Available chips for neural net inference are mostly based on the von Neumann architecture. As a quick aside, von Neumann described a computer architecture in 1945 that remains the mainstream approach for silicon. It is well suited to general-purpose compute but ill-suited to neural net inference. With the expected exponential growth in power consumption for AI processing, this leads to an untenable situation. When Untether AI examined the von Neumann architecture, we found that 90% of the power is wasted in data movement. We set about solving that with the company’s at-memory architecture, which reduces data movement by a factor of six. The resulting product runs at 8 TOPS/W and offers over 2,000 TOPS per PCIe card. Few companies can match this compute density and performance.
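As a back-of-envelope check, the two figures quoted above imply a card-level power budget. This is a derived estimate from the interview's numbers only, not a vendor specification:

```python
# Back-of-envelope power estimate from the figures quoted above.
# These are derived values, not official datasheet specifications.

tops_per_watt = 8    # quoted efficiency: 8 TOPS/W
card_tops = 2000     # quoted throughput: over 2,000 TOPS per PCIe card

# Dividing total throughput by efficiency gives the implied power draw.
card_power_watts = card_tops / tops_per_watt
print(f"Implied card power: ~{card_power_watts:.0f} W")  # ~250 W
```

That ~250 W figure is in the same range as high-end accelerator cards, which is what makes the TOPS/W efficiency claim the interesting part of the comparison.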
What can you tell me about your silicon? (Availability? Foundry partner? Process node? Benchmarks?)
We use standard CMOS technology with redundancy incorporated for high yields. Our runAI200 chips are produced on TSMC’s 16 nm process. The product is sampling now and is sold in two form factors:
- tsunAImi accelerator PCIe card with 4 runAI200 devices
- standalone runAI200 devices.
As examples of inference benchmarks, the tsunAImi accelerator card can compute 80,000 ResNet-50 images per second and 12,000 BERT-base queries per second, both at least three times better than the closest competitor’s numbers. On a total-cost-of-ownership basis (using benchmark performance per watt per square millimeter of die area), the 16 nm runAI200 is an impressive 8X better than the GPU competitor’s 7 nm part.
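Combining the ResNet-50 figure with the card power implied earlier (2,000 TOPS at 8 TOPS/W suggests roughly 250 W) gives a rough throughput-per-watt number. Both inputs come from the interview; the per-watt result is my own back-of-envelope derivation, not a published benchmark:

```python
# Rough perf/W estimate combining the two quoted figures.
# The ~250 W card power is implied (2,000 TOPS / 8 TOPS/W),
# not a datasheet value, so treat the result as an estimate.

resnet50_images_per_sec = 80_000   # quoted tsunAImi card throughput
implied_card_power_w = 2000 / 8    # ~250 W, derived above

images_per_sec_per_watt = resnet50_images_per_sec / implied_card_power_w
print(f"~{images_per_sec_per_watt:.0f} ResNet-50 images/s per watt")  # ~320
```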
What type of software effort will be required?
While our tsunAImi cards will be deployed in servers and the cloud, we consider our customer to be the data scientist. Data scientists are great at modeling and proficient in machine learning frameworks like TensorFlow and PyTorch, so we use these popular frameworks as our entry point. From there, our goal is to make implementing the neural network as pain-free as possible. Therefore, our imAIgine software development kit requires no knowledge of how we translate the neural network into code running on our devices. The imAIgine compiler performs automated graph lowering and has sophisticated optimization and allocation algorithms. The imAIgine toolkit provides extensive feedback to the modeler, highlighting resource allocation and congestion, and providing cycle-accurate simulation. The imAIgine runtime engine handles hardware abstraction, communication, and health monitoring as it places the net on the chip(s). The overall vision is a software development flow that lets the data scientist stay at the ML framework level while providing more advanced capabilities to power users who choose to use them.
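The compile-report-runtime flow described above can be sketched in a few lines. To be clear, every name in this sketch is a placeholder invented for illustration; it is not the actual imAIgine SDK API, which is not documented in this interview:

```python
# Illustrative sketch of the flow described above: a framework model goes
# in, a compiled artifact comes out, with feedback to the modeler along
# the way. All names here are placeholders invented for this example --
# they are NOT the real imAIgine SDK API.

from dataclasses import dataclass, field

@dataclass
class CompileReport:
    """Feedback for the modeler: resource allocation and congestion."""
    resource_allocation: dict = field(default_factory=dict)
    congestion_hotspots: list = field(default_factory=list)

def compile_model(framework_model: str) -> tuple[str, CompileReport]:
    # Stand-in for automated graph lowering plus optimization and
    # allocation; the real compiler consumes TensorFlow/PyTorch graphs.
    artifact = framework_model + ".accel"
    report = CompileReport(resource_allocation={"memory_banks": "n/a"})
    return artifact, report

def run_on_device(artifact: str, batch: list) -> list:
    # Stand-in for the runtime engine, which handles hardware
    # abstraction, communication, and health monitoring.
    return [f"prediction_for_{x}" for x in batch]

artifact, report = compile_model("resnet50_savedmodel")
outputs = run_on_device(artifact, ["img0", "img1"])
print(artifact, len(outputs))  # resnet50_savedmodel.accel 2
```

The point of the sketch is the shape of the flow: the data scientist stays at the framework level, the compiler returns a report rather than requiring hand-tuning, and the runtime hides the device entirely.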
Software will be the metric by which any AI startup succeeds or fails. At Untether AI, more than half of our employees are software engineers, and a large number of them hold advanced degrees.
Can you talk about your relationship with Intel?
Intel Capital has been an investor in Untether AI from the early days. Along with Radical Ventures, they have been a huge supporter of the company, providing guidance and connections that help us access technology that would be hard for a startup to reach on its own. Intel Capital has a good network across its portfolio companies, and Untether AI taps into that network when we have specific questions to resolve. For example, as we were bringing up our runAI200 silicon, we wanted answers to some specific questions and were able to talk to another AI company in the Intel Capital network.
I am excited about how silicon can change and enable the use of AI. We are truly back in a golden age if you are a silicon enthusiast.
Untether AI’s goal is sustainable AI: AI that does not consume the world’s energy resources in order for humanity to get its benefits. With this, we will have the best combination of the golden age of silicon and the democratization of AI.