A physicist by training, Axel is used to the need for large-scale compute. He discovered over 30 years ago that scalability of processor performance was paramount for solving any computational problem. That necessitated a new paradigm in computer architecture. At Parimics, SSRLabs and Axiado he was able to show that new thinking was needed, and what novel practical solutions could look like. Axel is now repeating that approach with Abacus Semi.
What is Abacus Semiconductor Corporation’s vision?
Abacus Semi envisions a future in which supercomputers can be built from Lego-like building blocks: mix and match any combination of processors, accelerators and smart multi-homed memories. We believe that today’s supercomputers do not fulfill the requirements of their users because they do not scale anywhere near linearly. Oftentimes, 100,000 servers making up a supercomputer provide just 5,000 times the performance of a single server. That is largely because today’s supercomputers are in essence built from commercial off-the-shelf (COTS) devices, without any consideration for low-latency, high-bandwidth communication between those servers for instruction and data sharing. Another drawback is that accelerators for special-purpose applications do not integrate easily into supercomputers. We have a different view of the basic building blocks, one that is very similar to Legos: programmable elements such as processors orchestrate the workload, accelerators carry out the work, data comes in and exits through dedicated I/O nodes, and large-scale smart multi-homed memory subsystems keep the intermediate data at hand at all times.
How did Abacus Semiconductor Corporation begin?
Axel is a physicist and computer scientist by training. He has used supercomputers for decades and was frustrated by the complexity of deploying and using them, by the lack of linear scaling, and by the enormous cost associated with them. As a result, he set out to fix what could be fixed, always assuming a few basic fundamentals. He started on this journey with Parimics, a vision processor company, in 2004, continued with Scalable Systems Research Labs, Inc. (SSRLabs) in 2011, took a short detour to a secure processor startup, and founded Abacus Semiconductor Corporation in 2020.
A modern supercomputer should allow the easy integration of accelerators in both hardware and software, it should be able to provide very large memory configurations in both exclusive and shared memory partitions, and it should be on par in cost with COTS-based systems while keeping operating costs down. In particular, the integration of accelerators for numerically intensive applications, for matrix and tensor math, and for Artificial Intelligence (AI) and Machine Learning (ML), as well as the need for very large cache-coherent memory shared across many processors, have proven to be good and future-proof calls: today’s workloads such as GPT-3 and ChatGPT require memory arrays of sizes that today’s processors do not support.
As a computer scientist, it was clear to Axel that fixed-function devices provide vastly superior performance and use less power and less silicon real estate than programmable elements. As such, a modern supercomputer should allow for the integration of all kinds of accelerators while keeping the programmability of a processor at hand for orchestrating workloads and for executing those tasks for which no dedicated hardware exists.
You mentioned you have some recent developments to share. What are they?
We are very excited to let you know that we have assessed all of the code and the building blocks that we have created over more than a decade, and our requirements are all met. With our Server-on-a-Chip, our smart multi-homed memory subsystems, and our math and database accelerators, we have shown in simulations that we will achieve vastly better linearity of scale-out. For most applications and configurations, it seems that we will hit an 80% scale-out factor, i.e. a supercomputer consisting of 100,000 servers should provide roughly 80,000 times the performance of a single one. Our interface will provide enough bandwidth per pin to allow for over 3.2 TB/s of bandwidth into and out of our accelerators and processors. The smart multi-homed memory subsystem will provide nearly 1 TB/s of bandwidth into and out of the chip. The security and coherency domains can be set for each memory subsystem. We have made progress in building our team, both engineering and management, and we have a term sheet in hand. We are still assessing the validity and veracity of this term sheet, but at this point in time the conditions look good.
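To make the scale-out figures quoted in this interview concrete, here is a minimal Python sketch of the arithmetic. It assumes the simple definition "scale-out factor = achieved speedup divided by number of servers"; the function and variable names are illustrative only and do not come from Abacus Semi.

```python
def scale_out_factor(speedup: float, servers: int) -> float:
    """Fraction of ideal linear speedup achieved (assumed definition)."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    return speedup / servers


def projected_speedup(factor: float, servers: int) -> float:
    """Speedup over a single server for a given scale-out factor."""
    return factor * servers


SERVERS = 100_000

# Figure quoted for today's COTS-based systems: 5,000x from 100,000 servers.
cots_factor = scale_out_factor(5_000, SERVERS)

# Simulated target quoted above: an 80% scale-out factor.
target_speedup = projected_speedup(0.80, SERVERS)

print(f"COTS scale-out factor: {cots_factor:.0%}")
print(f"Projected speedup at 80%: {target_speedup:,.0f}x")
print(f"Ratio of the two speedups: {target_speedup / 5_000:.0f}x")
```

Under that assumed definition, the COTS example corresponds to a scale-out factor of about 5%, against the 80% seen in the company's simulations for the same server count.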
Tell us about these new chips you are building?
As stated before, we believe that in order to build a new generation of supercomputers, new processors, accelerators and smart multi-homed memories are needed. We also touched on the fact that today’s cores are incredibly good, and that the problem in supercomputers is not the processor cores, but nearly everything around them. We are using RISC-V processor cores that we modified as the basic programmable building element. Doing that allows us to partake in the growth of the ecosystem around RISC-V, which I believe shows the fastest growth of any processor that I have seen in my career. We removed all of the performance-limiting factors around RISC-V, added hardware support for virtualization and hypervisors, optimized the cache interfaces, and made sure it can connect to our internal processor information superhighway. We are also using accelerators for all I/O and legacy interfaces, and because we do this in a Lego-like fashion, these blocks are reused in our Server-on-a-Chip and in our integer database processor and orchestration processor, which are in fact the same hardware with different firmware. The Lego-like principles extend to our smart multi-homed memory subsystem as well. As such, our development effort is relatively low compared to other companies that focus on processor design and supercomputers. Because our philosophy is parallelism rather than cranking up clock frequencies, we do not need to spend tons of money on the old cat-and-mouse game of physical design, with dynamic timing closure going through multiple iterative rounds to squeeze out one more Hertz of clock frequency. All of that simplifies code and building block reuse, and that is why we try to build our own IP in-house and keep it that way.
What are the chips in the Abacus Semi family?
The chips we are designing are the Server-on-a-Chip, which effectively combines an entire server onto one processor; the identical Supercomputer I/O Frontend; an Orchestration Processor and an Integer Database Processor (both of these deploy the same hardware but use different firmware); a math accelerator; and a set of smart multi-homed memories.
How are the Abacus Semi chips programmed?
Since we use a RISC-V processor as the underlying programmable element, we can call on the existing ecosystem. Our Server-on-a-Chip, the integer database processor and the orchestration processor are all fully RISC-V Instruction Set Architecture compatible. In other words, they all run Linux and FreeBSD, and GCC and LLVM/Clang have been available as compilers for a while now. In fact, the entire LAMP (Linux/Apache/MySQL/PHP) and FAMP (FreeBSD/Apache/MySQL/PHP) stack is available for them, and as such, any PHP and Perl application runs on them unchanged. Because we use a DPU-plus approach to networking, we have a piece of firmware available for our processors that acts like a filtering Network Interface Card (NIC) with offload capabilities, with DMA and Remote DMA functions, and with direct memory access to the applications processors. A similar offload is available for mass storage. It relieves the applications processors of mass storage tasks, thereby making more of their time available for the user applications, with or without a hypervisor. Since the Server-on-a-Chip doubles as an I/O frontend for supercomputers, the supercomputer core does not need to carry out I/O or legacy interface functions; these are all relegated to the Server-on-a-Chip. That allows the users of a supercomputer to deploy the core in a bare-metal fashion, if so desired. The math accelerator for matrix and tensor math as well as for transforms uses OpenACC and OpenCL as outward-facing APIs, but we have a translation layer available that converts CUDA into our native command set.
Can you tell us more about your technology behind the scale-out improvement?
We believe that communication is key in scale-out, and more specifically, low-latency and high-bandwidth communication. As a result, we reviewed everything we had built for unnecessary layers of communication hierarchy in the form of bridges, interface adapters and interface converters, and we removed them wherever possible. The communication between any two or more elements in our architecture therefore provides the highest possible bandwidth given the restrictions in bump and ball count and the need to traverse Printed Circuit Boards (PCBs), which necessitates CML-type High Speed Serial Links. We also use the shortest possible flits and commensurate encoding, both of which enable optical and electrical communication. The interface that we have designed is available for broader adoption by anyone who is interested in using it, for a nominal licensing fee. It is wide enough to provide class-leading bandwidth while allowing resilience and error-detection features for system availability in the six nines region. It is also a smart interface in that it can autonomously recognize the topology of the network up to three levels deep in the hierarchy, and it is designed to be on its own chiplet in case we find a partner that wants it but cannot integrate it into their own designs.
When will the Abacus Semi chips be available?
We are working with customers and partners to ensure a prototype tapeout in Q3 of 2025, and a volume-production set for FCS in Q1 of 2026.