Building next-generation systems is a real balancing act. The high-performance computing demands presented by increasing AI and ML content in systems mean there are growing challenges for power consumption, thermal load, and the never-ending appetite for faster data communications. Power, performance, and cooling are all top of mind for system designers. These topics were discussed at a lively webinar held in association with the AI Hardware Summit this year. Several points of view were presented, and the impact of better data channels was explored in some detail. If you’re interested in adding AI/ML to your next design (and who isn’t), don’t be dismayed if you missed this webinar. A replay link is coming so you can see how much impact flexible, scalable interconnect for AI HW system architectures can have.
Introduction by Jean Bozman, President Cloud Architects LLC
Jean moderated the webinar and provided some introductory remarks. She discussed the need to balance power, thermal, and performance in systems that transfer terabytes of data and deliver zettaflops of computing power across hundreds to thousands of AI computing units. She also touched on 112G PAM4 transceiver technology and the importance of system interconnects in achieving the balance these new systems need. Jean then introduced Matt Burns, who set the stage for what is available and what is possible for system interconnects.
Matt Burns, Technical Marketing Manager at Samtec
Matt observed that system and semiconductor products all need faster data communication channels to achieve their goals. Whether the problem is electrical, mechanical, optical or RF, the challenges are ever-present. Matt discussed Samtec’s Silicon-to-Silicon™ connectivity portfolio. He discussed the wide variety of interconnect solutions offered by Samtec. You really need to hear them all, but the graphic below gives you a sense of the breadth of Samtec’s capabilities.
Samtec solutions
Next, the system integrator perspective was discussed.
Cédric Bourrasset, Head of High-Performance AI Computing at Atos
Atos develops large-scale computing systems. Cédric focuses on large AI model training, which brings many challenges. He integrates thousands of computing units with collaboration from a broad ecosystem to address the challenges of model training. Scalability is a main challenge. The ability to implement efficient, fast data communication is a big part of that.
Dawei Huang, Director of Engineering at SambaNova Systems
SambaNova Systems builds an AI platform to help quickly deploy state-of-the-art AI and deep learning capabilities. Their focus is to bring enterprise-scale AI system benefits to all sizes of businesses. Get the benefits of AI without massive investment. They are a provider of the technology used by system integrators.
Panel Discussion
What followed these introductory remarks was a very spirited and informative panel discussion moderated by Jean Bozman. You need to hear the detailed responses, but I’ll provide a sample of the questions that were discussed:
What is driving the dramatic increase in size and power of compute systems? Is it the workloads, the size of the data or something else?
What are foundation models and what are their unique requirements?
What are the new requirements to support AI being seen by ODMs and OEMs – what do they need?
Energy, power, compute – what is changing here with the new pressures seen? Where does liquid cooling fit?
What are the new bandwidth requirements for different parts of the technology stack?
How do the communication and power requirements change between enterprise, edge, cloud and multi-cloud environments?
To Learn More
This is just a sample of the very detailed and insightful discussions that were captured during this webinar. If you are considering either adding or introducing AI/ML for your next project, I highly recommend you check out this webinar. You’ll learn a lot from folks who are in the middle of enabling these transitions.
The system design process can incorporate linear thinking, parallel thinking, or both, depending on the nature of the anticipated system, subsystem, or element of a subsystem. The structure, composition, scale, or focal point of a new or incremental system design incorporates the talents and gifts of the designer in either a top-down or bottom-up design style. Is a centralized or distributed approach to processing the best method? Is a symmetrical or asymmetrical topology warranted? Is power or speed the driving criterion? The answers to these questions can lead to a conceptual block diagram that starts the design process, leading to a design specification.
Conceptual Block Diagrams
Everyone is familiar with a conceptual block diagram, where differences between block diagrams might reflect the level of abstraction, or conversely, how much detail is presented. A two-dimensional block diagram may implement a three-dimensional nodal topology, or a linear communication block diagram may implement a multi-modulation scheme (Figure 1). After creating a conceptual block diagram, what methodologies are available to evaluate system performance in terms of throughput, power, latency, and resource utilization, as related to cost? In many cases, the system throughput, power, latency, utilization, or cost targets are established by customers directly, or by product marketing indirectly working with customers. The result might be termed a marketing requirements document, a design specification, a product specification, or simply a spec sheet. There are many designations for a “design” level specification, which contains one or more conceptual block diagrams.
Figure 1: An example of a block diagram engineers use to describe a new system specification.
Design Level Specification
A design level specification captures a new or incremental approach to improving system throughput, power, latency, utilization, or cost; these are typically referred to as price-performance tradeoffs in product marketing. In medium to large system design organizations, a design specification may be coordinated and/or approved by the executive staff, marketing, R&D, manufacturing, field support, or a specific customer. The degree of coordination or approval for system design specifications between intra-company groups varies from company to company, depending on the type of system market, time to market, consumer or industrial market segment, and market maturity. At each step in the evolution of a design specification, well-intentioned modifications, or improvements, may occur. What happens to the system design process if a well-intentioned design specification change impacts the original conceptual block diagrams, such that design margin for system throughput drops from 20% to 5%? While the R&D group, who created the conceptual block diagram, may realize that the system throughput will be impacted by a marketing modification, there may not be an easy way to determine that the worst-case design margin has shrunk to 5%. In other words, the time required to evaluate a design modification before, or after, the system design process has started can vary dramatically, depending on the system evaluation methodology selected.
Figure 2: An example of BONeS Designer from the 1990s.
System Evaluation Methodologies
Applying Moore’s law in reverse reveals that systems created in the 1970s or early 1980s could be designed and modified on the proverbial napkin, sent to a development team, and still allow man to explore the moon on schedule, cost notwithstanding. Intel’s development of the math co-processor in the mid-1980s marked the increasing sophistication of system design, given the transition from medium-scale integration to large-scale integration in chip design.
Excel spreadsheets became popular for estimating average throughput, power, latency, utilization, and cost at the system level when napkin designs began to have problems accurately estimating overall system performance as complexity increased. The problems encountered were mathematical discontinuities related to system operation (especially digital), difficulty estimating peak system performance, and simple spreadsheet mistakes that were not readily apparent.
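The average-versus-peak pitfall mentioned above can be made concrete with a short sketch. All numbers here are invented for illustration: a spreadsheet-style average says a data channel has comfortable margin, while a peak-aware backlog check over the same demand profile exposes the transient overload the average hides.

```python
# Hypothetical numbers: a link serving bursty traffic. A spreadsheet-style
# average says the link is fine; a peak-aware check shows transient overload.
link_capacity = 100.0          # MB/s the channel can sustain
per_second_demand = [40, 60, 180, 20, 50, 150, 30, 70]  # MB requested each second

# Spreadsheet view: average demand vs. capacity
avg_demand = sum(per_second_demand) / len(per_second_demand)
avg_utilization = avg_demand / link_capacity   # 0.75 -> looks like 25% margin

# Peak-aware view: track the backlog that averages hide
backlog = 0.0
worst_backlog = 0.0
for demand in per_second_demand:
    backlog = max(0.0, backlog + demand - link_capacity)
    worst_backlog = max(worst_backlog, backlog)

print(f"average utilization: {avg_utilization:.0%}")   # 75%
print(f"worst-case backlog:  {worst_backlog:.0f} MB")  # 80 MB latency spike
```

The 80 MB backlog (and the latency it implies) is exactly the kind of mathematical behavior a cell of averages never surfaces.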
C and C++ golden reference models became popular in the late 1980s and early 1990s, since they could resolve some of the Excel spreadsheet issues with a modest programming effort. To resolve digital system modeling issues, C/C++ provided internal synchronization in the form of software-generated clocks or events, common resource objects, and user-defined classes. The problems encountered were related to the envisioned “modest” programming effort. Software bugs were more difficult to find in the increasingly complex software that resolved some of the spreadsheet issues. Nonetheless, better performance modeling results were obtained with substantial programming effort. Different companies, or even different groups within the same company, typically made different assumptions in their internal golden reference models, such that it was difficult to exchange models between companies or groups. Their golden reference models lacked a common frame of reference, sometimes referred to as interoperability. In the early 1990s, the combination of low-cost workstations and modeling tools offering a common frame of reference started to appear in the marketplace.
Several system level tools, such as BONeS Designer (Block-Oriented Network System Designer) (Figure 2), Signal Processing Workstation (SPW), OPNET Modeler, SES Workbench, CACI COMNeT and Virtual Component Co-Design (VCC), appeared, providing the notion of time-ordered, concurrent system processes, embedded software algorithms, and data types. C/C++ programming languages do not explicitly provide for time-sequenced operations, parallel time-sequenced operations, or design-related data types. Some companies shifted from C/C++ golden reference models to these standard modeling tool methodologies. In addition, many of these tools were graphically oriented, which reduced the need for extensive C/C++ coding efforts, replacing standard modeling functionality with graphical representations of common functions. If specific functionality was required, the user could create a custom-coded element, or block, depending on the modeling libraries supported by the tool.
Ability to spatially refine an abstract model to a more detailed model
Ability to reuse system level modeling modules
The aforementioned tools focused on improving modeling capabilities in terms of performance modeling, ease of use, model creation time, and post-processing of modeling results. Some of the issues with these early system level modeling tools were that they were suited to specific classes of systems, added their own syntax to graphical modeling, and sometimes lacked sufficient libraries to solve certain modeling problems.
System Level Modeling
The system level modeling space consists of both methodology-specific and application-specific modeling domains that overlap to some extent. Methodology-specific domains consist of the discrete-event, cycle-based, synchronous data flow, and continuous time models of computation, which provide modeling methodologies for general classes of modeling problems. The discrete-event model of computation is used for digital portions of a design that may have a strong control component. A discrete-event model of computation is very efficient for higher levels of abstraction, as the current simulation time is based on both deterministic synchronous and asynchronous events. Discrete-event models provide a user with both time-ordered (asynchronous) and concurrent (synchronous) event modeling capabilities.
A cycle-based model of computation is similar to a discrete-event model of computation with the proviso that the model is clock-driven, executing the simulation engine for each clock cycle. Cycle-based simulators provide a user with more modeling fidelity, meaning that they are usually used for more detailed modeling of digital systems, including verification of the final design. A synchronous data flow model of computation is more DSP-algorithm oriented, meaning the mathematical processing of baseband digital signals, whether vectors, matrices, or complex data types. Internal processing of synchronous data flow models can be simpler than a discrete-event modeling engine, requiring only the concurrence of tokens at each modeling block to start processing and the generation of new tokens to subsequent modeling blocks.
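As a rough illustration of the discrete-event approach described above, here is a minimal event-queue kernel sketched in Python. The class and event names are invented for illustration; the point is that simulated time jumps directly to the next scheduled event rather than ticking every clock cycle as a cycle-based engine would.

```python
import heapq

# Toy discrete-event kernel: events are (time, seq, action) tuples popped in
# time order, so simulated time advances straight to the next event.
class DiscreteEventSim:
    def __init__(self):
        self._queue, self._seq, self.now = [], 0, 0.0

    def schedule(self, delay, action):
        heapq.heappush(self._queue, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action()

# Example: a producer hands a packet to a consumer after a 5-time-unit delay.
sim = DiscreteEventSim()
log = []

def consume():
    log.append(("consumed", sim.now))

def produce():
    log.append(("produced", sim.now))
    sim.schedule(5.0, consume)      # model link latency as an event delay

sim.schedule(1.0, produce)
sim.run()
print(log)   # [('produced', 1.0), ('consumed', 6.0)]
```

Only two events ever execute here; a cycle-based engine covering the same interval would invoke its kernel once per clock tick, which is why discrete-event is so efficient at high abstraction levels.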
New Thinking
System level modeling is evolving to solve the original problem cited: how to determine, quickly and efficiently, how a change to a design specification might impact the performance of a proposed system. Is the throughput margin now 15% or 5%? One aspect of the system design process, the design specification itself, typically remains a Word document with text, block diagrams, etc. It is difficult to exchange models with executive staff, marketing, manufacturing, or field support, simply because they lack the modeling expertise certain tools require. If the system level, or golden, model could be exchanged among these disparate groups, or within design groups located around the globe, as part of the design specification, then the evaluation of a proposed change might be performed by the marketing organization directly.
One approach is to use the power of the web browser and Internet-based collaboration to embed a system level model into a design specification document as a Java Applet. Companies such as Mirabilis Design of Sunnyvale, California have created a methodology around this to enable design team collaboration and better communication between suppliers and system companies (Figure 3). Any Internet browser can run a system level model embedded as a Java Applet within an HTML document. In other words, the design specification can now contain an “executable” system level model that other people within the organization can run with a web browser; no additional software or license is required. The model contained within the Java Applet is identical to the tool-level model, containing all the parameters and block attributes the original model contains. One can modify parameters such as bus speed and input data rate, but cannot modify the base architecture and connectivity.
Similarly, models must be stitched together with Intellectual Property (IP) that is produced and maintained in disparate regions of the world. The system-level models associated with this IP can be located at the source, thus ensuring the latest version is always available to engineers. Using a graphical model construction environment and a sophisticated search engine, the most suitable set of technology can be easily evaluated and selected based on performance/power characteristics. Updates to any technology will be immediately and automatically available at the designer’s desktop. This would be a vast improvement over current IP communication techniques, which depend on expensive custom modeling effort and a transition period for the user to upgrade their database setup. Moreover, the designer gains vast degrees of freedom to create a quality product.
Another part of the puzzle is a seamless integration of the disparate models of computation. Just as hardware implementation tools support mixed-language mode, system tools now support a common modeling platform. For example, Synopsys integrated Cossap (dynamic data flow) and SystemC (digital) into System Studio, while VisualSim from Mirabilis Design combines the SystemC (digital), synchronous dataflow (DSP), finite state machine (FSM), and continuous time (analog) domains. Previous system level tools typically supported a single modeling domain. Additionally, making the modeling software platform operating-system agnostic reduces support costs and enables easy data transition across the company.
Finally, a new approach to standard library development is required. Prior generations of graphical modeling tools might advertise 3,000-plus libraries as a sign of modeling tool maturity and robustness. However, if a new Ultra-Wideband model was required, most of the prior libraries would not be reusable for UWB, since many were prior-generation application-specific libraries, or bottom-up component libraries. A new system level modeling approach will not measure the tool by the quantity of libraries, but rather by the quality and integration of the system level libraries. Such libraries will have a high likelihood of reuse in a new technology or system level model. Relative to prior generations of graphical modeling tools, VisualSim integrates as many as thirty bottom-up component functions into a single, system level, easy to use, reusable block, or module. Four VisualSim queue blocks replace 24 prior-generation queue blocks through polymorphic port support and block-level menu attributes, while improving simulation performance.
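The polymorphic-port idea can be sketched in a few lines. This is illustrative Python, not VisualSim’s actual API: one queue block that accepts any payload type stands in for a family of per-type queue blocks from older libraries.

```python
from collections import deque

# Sketch: a single queue block whose port accepts any payload type, so one
# block covers the roles that older, type-specific queue blocks each filled.
class QueueBlock:
    def __init__(self, capacity):
        self.capacity = capacity
        self._q = deque()
        self.dropped = 0           # count of items rejected when full

    def push(self, item):          # item can be an int, a packet dict, a str...
        if len(self._q) >= self.capacity:
            self.dropped += 1
            return False
        self._q.append(item)
        return True

    def pop(self):
        return self._q.popleft() if self._q else None

q = QueueBlock(capacity=2)
q.push(42)                         # plain token
q.push({"hdr": 0xA5, "len": 64})   # structured packet, same block
q.push("overflow")                 # dropped: capacity reached
print(len(q._q), q.dropped)        # 2 1
```

Because the port is type-agnostic, library count shrinks while reuse across new technologies rises, which is the quality-over-quantity point made above.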
About the Authors:
Darryl Koivisto is the CTO at Mirabilis Design. Dr. Koivisto has a Ph.D. from Golden Gate University, MS from Santa Clara University and BS from California Polytechnic University, San Luis Obispo. Deepak Shankar is Founder at Mirabilis Design. Mr. Shankar has an MBA from UC Berkeley, MS from Clemson University and BS from Coimbatore Institute of Technology.
Logistics is even more important today than it was in the early 1800s. Further, the effectiveness of Defense systems is increasingly driven by sophisticated electronics. As the recent Ukraine conflict reveals, weapons such as precision munitions, autonomous drones, and other similar systems generate asymmetrical advantages on the battlefield. However, all these systems also generate a massive and complex electronics logistical tail which must be managed carefully. For the electronics supply ecosystem, Defense falls into the broader category of Long Lifecycle (LLC) system products.
“Two Speed” Challenge
LLC (Long Lifecycle) products are products which need to be supported for many years – typically five years or more. Over this time, these products need legacy part support, yet rarely drive very high chip volumes. However, the economics of semiconductor design imply that custom semiconductors only make sense for high-volume markets. Today, this consists largely of the consumer marketplace (cell phones, laptops, tablets, cloud, etc.), which comprises short lifecycle products with typical lifecycles of less than 5 years. This hard economic fact has optimized the semiconductor industry towards consumer-driven short lifecycle products. This is at odds with the requirements of long lifecycle products, both in terms of frequent End-of-Life obsolescence and long-term product reliability. Reliability is further impacted in Defense by the fact that these components must perform under strenuous external environments with challenging thermal, vibration, and even radiation conditions. This “Two Speed” challenge results in very frequent failure and obsolescence of electronic components.
Move towards “Availability” Contracts in Aerospace & Defense
Traditionally in the aerospace and defense industry, an initial contract for development and manufacture is followed by a separate contract for spares and repairs. More recently, there has been a trend towards “availability” contracts where industry delivers a complete Product Service System (PSS). The key challenge in such contracts is to estimate the “Whole Life Cost” (WLC) of the product, which may span 30 or even 40 years. As one might imagine, this PSS paradigm skyrockets the cost of systems and still is not foolproof because of its need to predict the future. This has led to some embarrassing costs for defense part procurement as compared to commercial equivalents.
US Secretary of Defense William Perry’s 1994 memorandum resulted in a move towards performance-based specifications, which led to the virtual abandonment of the MIL-STD and MIL-SPEC system that had been the mainstay of all military procurement in the US for several decades. Coupled with “Diminishing Manufacturing Sources” (DMS) for military-grade components, the imperative was to move towards COTS (Commercial Off-The-Shelf) components while innovating at the system level to cater to the stringent operating environment requirements. The initial reasoning and belief in using COTS to circumvent system costs was effective. However, it did expose defense systems to some key issues.
Key Issues for Defense Industry today
Component Obsolescence
Primarily as a result of the “Two-Speed” challenge described above, components become harder to source over time or even grow obsolete, and the rate of discontinuance of part availability is increasing steadily. Many programs such as the F-22 stealth fighter, AWACS, Tornado, and Eurofighter are suffering from such component obsolescence. As a result, OEMs are forced to design replacements for those obsolete components and incur nonrecurring engineering costs. As per McKinsey’s recent estimates [“How industrial and aerospace and defense OEMs can win the obsolescence challenge, April 2022, McKinsey Insights”], the aggregate obsolescence-related nonrecurring costs for the military aircraft segment alone are in the range of US $50 billion to US $70 billion.
Whole Life Cost (WLC)
As mentioned above, with the increasing move towards “availability” contracts in defense and aerospace, one of the huge challenges has been to compute a realistic “Whole Life Cost” (WLC) of products through the product lifecycle. Poor estimates lead to massive held inventory costs, with associated waste when held inventory is no longer useful. Moreover, any good estimate of WLC will require an accurate prediction of the future.
Reliability
Semiconductors for the consumer market are optimized for consumer lifetimes – typically 5 years or so. For LLC markets like Defense, the longer product life in non-traditional environmental situations often leads to product reliability and maintenance issues, especially with the increased use of COTS components.
Logistics chain in forward deployed areas
One of the unique issues in Defense which is further accentuated due to increased move towards “availability” contracts is the logistics nightmare to support equipment deployed in remote forward areas. A very desirable characteristic would be to have “in theatre” maintenance and update capability for electronic systems. The last mile is the hardest mile in logistics.
Future Function
Given the timeframes of interest, upgrades in functionality are virtually guaranteed to happen. Since Defense products often have the characteristic of being embedded in the environment, upgrade costs are typically very high. A classic example is a satellite, where the upgrade cost is prohibitively high; similarly, with weapon systems deployed in forward areas, upgrade costs are prohibitive. Another example is the obsolescence of industry standards and protocols and the need to adhere to newer ones. In fact, field-embedded electronics (more so in defense) require the flexibility to manage derivative design function WITHOUT hardware updates. How does one design for this capability, and how does a program manager understand the band of flexibility in defining new products, derivatives, and upgrades?
Figure 1: Design for Supply Chain
What is the solution to these issues? The answer is to build a Design for Supply Chain methodology and associated Electronic Design Automation (EDA) capability.
Just as manufacturing test was optimized by “Design for Test,” power by “Design for Power,” and performance by “Design for Performance,” one should be designing for “Supply Chain and Reliability”! What are the critical aspects of a Design for Supply Chain capability?
Programmable Semiconductor Parts: Programmable parts (CPU, GPU, FPGA, etc) have the distinct advantages of:
Parts Obsolescence: A smaller number of programmable parts minimizes inventory SKUs, can be forward deployed, and can be repurposed across a large number of defense electronic systems. Further, the aggregation of function around a small number of programmable parts raises the volume of these parts and thus minimizes the chances of parts obsolescence.
Redundancy for Reliability: Reliability can be greatly enhanced by the use of redundancy within and across multiple programmable devices. Similar to RAID storage, one can leave large parts of an FPGA unprogrammed and dynamically move functionality based on detected failures.
Future Function: Programmability enables the use of “over the air” updates which update functionality dynamically.
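The RAID-style redundancy argument above can be made concrete with standard k-of-n reliability math. The figures below are illustrative, not from the article: with three programmable devices of which any two must work, system reliability exceeds that of any single part.

```python
from math import comb

# Probability that at least k of n independent parts (each working with
# probability p) are still functional -- the classic k-of-n redundancy model.
def k_of_n_reliability(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_single = 0.90                               # hypothetical single-part reliability
r_2of3 = k_of_n_reliability(2, 3, p_single)   # 2-of-3 redundant configuration
print(f"single part: {p_single:.3f}, 2-of-3 redundant: {r_2of3:.3f}")
```

Here 2-of-3 redundancy lifts reliability from 0.900 to 0.972; spare FPGA fabric that can absorb relocated functions plays the role of the standby parts.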
Electronic Design Automation (EDA): To facilitate a design for supply chain approach, a critical piece is the EDA support. Critical functionality required is:
Total Cost of Ownership Model: With LLCs, it is very important to consider lifetime costs based on downstream maintenance and function updates. An EDA system should help calculate lifetime cost metrics based on these factors to avoid mistakes that simply optimize near-term costs. This model has to be sophisticated enough to understand that derivative programmable devices can often provide performance/power improvements that compare favorably with older-technology custom devices.
Programming Abstractions: Programmable devices are based on abstractions (computer ISA, Verilog, Analog models, etc) from which function is mapped onto physical devices. EDA functionality is critical to maintain these abstractions and automate the process of mapping which optimizes for power, performance, reliability, and other factors. Can these abstractions & optimizations further be extended to obsolescence?
Static and Dynamic Fabrics: When the hardware configuration does not have to be changed, EDA functionality is only required for programming the electronic system. However, if hardware devices require changes, there is a need for a flexible fabric to accept the updates in a graceful manner. The nature of the flexible fabric may be mechanical (e.g., rack-mountable boards) or chemical (quick respins of a PCB, which may be done in-field). All of these methods have to be managed by a sophisticated EDA system. These methods are the key to the ease of integration of weapons systems.
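A minimal sketch of the kind of comparison the Total Cost of Ownership model above would make follows; every figure is hypothetical. The point it illustrates is that a cheaper custom part can lose over a 30-year program once obsolescence-driven redesign spins are priced in, while a pricier programmable part absorbs updates in software.

```python
# Illustrative total-cost-of-ownership comparison (all figures hypothetical).
def lifetime_cost(unit_cost, units_per_year, years, redesign_events, redesign_cost):
    """Unit spend over the program life plus nonrecurring redesign spend."""
    return unit_cost * units_per_year * years + redesign_events * redesign_cost

years = 30
custom = lifetime_cost(unit_cost=50, units_per_year=1_000, years=years,
                       redesign_events=3, redesign_cost=2_000_000)  # obsolescence respins
programmable = lifetime_cost(unit_cost=120, units_per_year=1_000, years=years,
                             redesign_events=0, redesign_cost=0)    # updates stay in software

print(f"custom part 30-yr cost:       ${custom:,}")        # $7,500,000
print(f"programmable part 30-yr cost: ${programmable:,}")  # $3,600,000
```

A real TCO model would add maintenance, inventory holding, and discounting, but even this toy version shows why optimizing near-term unit cost alone is a mistake for LLC products.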
With the above capability, one can perform proactive logistics management. One of the best practices that can yield rich dividends is to constitute a cross-functional team (with representation from procurement, R&D, manufacturing, and quality functions) which continuously scans for potential issues. This team can be tasked with developing a set of lead indicators to assess components, identify near-term issues, and develop countermeasures. In order for these cross-functional programs to work, the EDA functionality has to be tied into the Product Lifecycle Management (PLM) systems in the enterprise.
Currently, a great deal of system intent is lost across the enterprise or over time as the design team moves to other projects. Thus, even the OEMs who use such proactive obsolescence management best practices are significantly hampered by lack of structured information or sophisticated tools which allow them to accurately predict these issues and plan mitigating actions including last-time strategic buys, finding alternate suppliers and even finding optimal FFF (Fit-Form-Function) replacements. It is imperative that such software (EDA) tool functions are available yesterday.
Summary
The semiconductor skew towards short lifecycle products (aka consumer electronics) has created a huge opportunity for the Defense industry to access low-cost electronics. However, it also generates issues when products must be supported in excess of 30 years, as a result of fast obsolescence and shorter component reliability. What further worsens the situation for Defense are a couple of situations unique to it: first, the logistics issues in supporting equipment deployed in forward areas, and second, the increasing use of “availability” or full Product Service System contracts, where establishing WLC (Whole Life Costs) for the equipment becomes critical. The only way this can be solved efficiently is to bring in a paradigm shift to “Design for Supply Chain.” EDA innovation is needed to support these levels of abstraction at the system/PCB level and to be a catalyst for this paradigm shift. McKinsey estimated a 35% reduction in obsolescence-related nonrecurring costs from merely using a structured, albeit still reactive, obsolescence management methodology; a proactive “Design for Supply Chain” approach can be truly transformational.
Acknowledgements:
Anurag Seth : Special thanks to Anurag for co-authoring this article.
D. “Jake” Polumbo (Major General USAF (ret.)): Thanks for Review (https://www.twoblueaces.com/)
Mark Montgomery (Executive Director): Thanks for Review (CyberSolarium.org)
Dan is joined by Graham Curren, CEO of ASIC provider Sondrel. Graham founded Sondrel in 2002 after identifying a gap in the market for an international company specialising in complex digital IC design. Prior to establishing Sondrel, Graham worked in both ASIC design and manufacturing before joining EDA company, Avant! Corporation.
Dan explores what makes Sondrel unique with Graham. They discuss the company’s recent IPO, as well as how the company’s design skills, track record and market focus have helped them to grow.
The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.
How do you reconfigure system characteristics? The answer to that question is well established – through software. Make the underlying hardware general enough and use platform software to update behaviors and tweak hardware configuration registers. This simple fact drove the explosion of embedded processors everywhere and works very well in most cases, but not all. Software provides flexibility at the expense of performance and power, which can be a real problem in constrained IoT applications. Consider tight-loop encryption/decryption for example. Meeting performance and power goals requires acceleration for such functions, which seems to require custom silicon. But that option only makes sense for high-volume applications. Is there a better option, allowing for a high-volume platform which can offer ISA extensibility for acceleration post-silicon? This is what Menta and Codasip are offering.
Step 1: First build a RISC-V core for your design
This step looks like any RISC-V instantiation though here you use a Codasip core, for reasons you’ll understand shortly. Codasip provides a range of directly customizable RISC-V cores which a semi supplier might choose to optimize to a specific yet broad market objective. The tool suite (Codasip Studio) offers all the features you would expect to get in support of such a core, including generating an SDK and the ability to hardwire customize the ISA. (Here, by “hardwire” I mean extensions built directly into the silicon implementation.) Codasip Studio also provides tools to explore architecture options and generate a custom compiler.
The hardware implementation of custom instructions is through an HDL block in parallel to the regular datapath, as is common in these cases. The HDL for this block is defined by the integrator to implement custom instructions, a byte-swap for example. Codasip Studio takes care of vectoring execution to the HDL block rather than the ALU as needed, and also connects the appropriate register accesses.
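The byte-swap example can be sketched as a software reference model of what the parallel HDL block would compute for a 32-bit register. This is an illustrative Python sketch of the instruction’s semantics, not Codasip-generated code; the actual extension would be described in HDL and in the ISA description.

```python
# Reference semantics for a 32-bit byte-swap custom instruction:
# reverse the order of the four bytes in a word.
def bswap32(x):
    return ((x & 0x000000FF) << 24 |
            (x & 0x0000FF00) << 8  |
            (x & 0x00FF0000) >> 8  |
            (x & 0xFF000000) >> 24)

print(hex(bswap32(0x12345678)))   # 0x78563412
```

In software this takes four masks, shifts, and ORs per word; as a hardwired instruction it is pure wiring, which is exactly why such tight-loop operations are attractive ISA extensions.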
Step 2: Add an eFPGA block in the datapath
So far, this is just regular RISC-V customization. Extending customization options to post-silicon requires reprogrammable logic, such as that offered by Menta. Their technology is standard-cell based and is claimed to be portable to any process technology, making it readily embeddable in most SoC platforms. You can start to see how such a RISC-V core could host not only hardwired extensions but also programmable extensions.
This needs involvement from Codasip Studio (CS) at two stages. First, you as the SoC integrator must tell the system that you plan to add ISA customization after manufacture. This instructs CS to embed an unprogrammed eFPGA IP into the datapath.
Second, when silicon is available, you (or perhaps your customer?) will re-run CS to define added ISA instructions, along with RTL to implement those instructions. This will generate a revised compiler and SDK, plus a bitstream to program the eFPGA. Voilà – you have a post-silicon customized RISC-V core!
Post-silicon ISA customization
To recap, this partnership between Codasip and Menta offers the ability to customize RISC-V cores not only pre-silicon but also post-silicon, enabling an SoC vendor to deliver products that can be optimized for multiple applications, with potential for high-volume appeal. You can learn more in this white paper.
Codasip is based in Europe but has customers worldwide, including Rambus, Microsemi, Mythic, Mobileye and others. Menta is also based in Europe and has particular strengths in secure, defense and space applications. As a technologist with roots in the UK, it’s nice to see yet more successful growth in European IP 😊.
Ann Kelleher is Intel’s Executive Vice President and General Manager of Technology Development, and she gave the first plenary talk to kick off the 2022 IEDM, “Celebrating 75 Years of the Transistor: A Look at the Evolution of Moore’s Law Innovation”. I am generally not a fan of plenary talks because I think they are often too broad and general, without any real information, but I thought this was a good talk.
She started with the messages that Moore’s law is critical to addressing the needs of computing, that System Technology Co-Optimization (STCO) is the next major evolution of Moore’s law and that there are many challenges to overcome.
She went on to review some of the innovations in device structures and all the products they have enabled.
In her view Moore’s law has gone through four evolutions:
Evolution 1 – geometric scaling. This was the classic Dennard scaling era, when a geometric shrink of a transistor improved power, performance, and area.
Evolution 2 – as scaling of certain elements of a transistor began to reach limits, novel materials and device structures were introduced to continue scaling. These were innovations like strained silicon and HKMG, and the transition to FinFETs.
Evolution 3 – Design Technology Co-Optimization – co-optimization of the device design and process. For example, shrinking cell height requires fin depopulation and that reduces performance unless the process is improved to compensate. The last several years have seen DTCO becoming increasingly important.
Evolution 4 – System Technology Co-Optimization – just optimizing the device is no longer enough and we need to optimize the entire system!
She gave an example of STCO where chiplets, IP design and validation are DTCO with process technology, and are combined with package, chiplet design and validation integrated with packaging technology, along with software and hardware architecture and validation, to produce an optimized system.
A device includes process technology, transistor & interconnect and foundational IP.
DTCO adds core & accelerator IP and chiplets.
STCO further adds advanced packaging, system architecture, software and applications and workload.
There are many opportunities for innovation:
People – highlighted as the most important, people are the drivers of innovation.
Figure 1 illustrates that Intel continues to be on track to meet their silicon roadmap. I was skeptical Intel could meet the original Intel accelerated timeline, then they pulled it in (18A moved from 2025 to the second half of 2024), and so far, they are meeting it. I am very impressed that Ann and her team are executing at such a high level after a decade of delays at Intel.
On the packaging front Intel is working on Foveros Direct for the second half of 2023, with a 10x higher bump density versus the current Foveros process, alongside higher bandwidth, lower latency, lower power and smaller die area. Intel is also working on pluggable optical interconnect.
Longer term Intel research is working on:
Enabling seamless integration of chiplets with an additional 10x improvement in density and placement flexibility.
Super-thin materials to scale further beyond RibbonFET by implementing 2D materials (a hot topic at the conference, with multiple papers from Intel, TSMC and others).
New possibilities in energy efficiency with more energy-efficient memory such as FeRAM (another hot topic), magnetoelectric devices, and GaN on silicon.
In summary Ann Kelleher presented a fascinating view of the opportunities and challenges to continue Moore’s law for another decade or more. Intel is clearly back as a leading technology innovator.
Register management tools have mostly been used in a bottom-up approach. Documents and/or spreadsheets created by the System Architects are delivered to the design and verification teams, who then start capturing the HW/SW interface of the peripheral IPs in their in-house or commercial register management tool, working their way up to the full SoC.
Using these tools ensures that the different teams are in sync and that there is a single source of truth for the HW/SW interface. This is a great step forward from when different teams manually maintained their own views of the SoC registers. Jade Design Automation thinks this can be taken to the next level: register management tools can be used in a top-down approach as well, right from the moment the first draft of the SoC is created by the System Architect.
How can cross-domain knowledge help create better SoCs?
This webinar shows how we can combine tools and practices from seemingly far away domains and how simple ideas can have great impact.
As the System Architect starts capturing the system-level memory map and the high-level HW/SW interface of the SoC, the Tech Leads need to review the proposed changes, assess the impact on their domains, and raise potential issues as early as possible.
The solution proposed in this webinar came from the Embedded SW team, from the other end of the flow. Using a code review tool and process on the design data enables the Tech Leads to review changes before they go live and creates a track record of these reviews. As long as the design data format is human readable (for visual diff) and machine parsable (to auto-generate collateral) this flow can be used on any design aspect with any toolchain.
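To make the idea concrete, here is a hypothetical Python sketch of what "human readable and machine parsable" design data might look like: a trivial register-map format that diffs cleanly in a code review tool and can still auto-generate SW collateral. The format, register names, and generated header are invented for illustration; Jade Design Automation's actual data model will differ.

```python
# Hypothetical human-readable register description: one register per
# line, so a visual diff in a code review tool shows exactly what changed.
REGMAP_SRC = """\
CTRL   0x00  RW  Control register
STATUS 0x04  RO  Status register
IRQEN  0x08  RW  Interrupt enable
"""

def parse_regmap(text):
    """Parse 'NAME OFFSET ACCESS DESCRIPTION' lines into dicts."""
    regs = []
    for line in text.strip().splitlines():
        name, offset, access, desc = line.split(None, 3)
        regs.append({"name": name, "offset": int(offset, 16),
                     "access": access, "desc": desc})
    return regs

def to_c_header(regs, base="PERIPH"):
    """Auto-generate one piece of SW collateral: C #define offsets."""
    lines = [f"#define {base}_{r['name']}_OFFSET 0x{r['offset']:02X}"
             f"  /* {r['desc']} */" for r in regs]
    return "\n".join(lines)
```

Because the source is plain text, the Tech Leads review a diff of `REGMAP_SRC` before it goes live, and the same file feeds the collateral generators downstream.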
Learn more about how code review tools and practices can help with the collaboration between the System Architect and the Tech Leads from Tamas Olaszi, Director at Jade Design Automation. Join a live discussion with him on Tue, Dec 13, 2022 10:00 AM – 11:00 AM PST. The concept will be reviewed from the HW/SW interface’s perspective but the same principles can be applied to other aspects of the design flow as well.
Jade Design Automation has a laser sharp focus on register management with a mission to address the register management challenges from system architecture to SW bring-up.
With a good understanding of the challenges that system architects, design and verification engineers, SW teams and technical writers face during complex SoC projects the company offers a tool that mixes the flexibility of the in-house solutions with the robustness of a commercial product.
With uncompromising attention to quality, state-of-the-art technology and best-in-class support, we can address change requests with a turnaround time comparable to that of an in-house solution.
The company was founded in 2019 by Tamas Olaszi, a veteran of the semiconductor industry. He spent a decade with an Irish EDA company offering a tool suite around register management and SoC integration, while living and working in Asia, the US and multiple countries across Europe.
After joining Arm in 2014, he and his team worked on internal register management solutions for IoT subsystems and system IP. From there he moved on to building and managing an embedded SW team responsible for the SW bring-up of IoT test boards.
Having seen how difficult it is to maintain in-house solutions over a long period of time, and how existing solutions are built on decade-old technologies with decade-old data models that are inadequate for today’s challenges, he decided to start Jade Design Automation with the help of a few former co-workers from Arm.
Jade Design Automation is registered and regulated in the European Union.
Successful ASIC providers offer top-notch infrastructure and methodologies that can accommodate varied demands from a multitude of customers. Such ASIC providers also need access to best-in-class IP portfolio, advanced packaging and test capabilities, and heterogeneous chiplet integration capability among other things. Of course, to deliver the above in a viable fashion, the provider needs to pick a focus in terms of what markets it serves. Alchip chose the high-performance markets for its dedicated focus many years ago and has stayed the course. As of last year, more than 80% of its $372M revenue was derived from high performance computing (HPC) related markets.
Prior posts on SemiWiki have spotlighted many of Alchip’s capabilities and accomplishments. Here is one of the latest achievements by Alchip for which it collaborated with Synopsys. Alchip delivered a TSMC 7nm process-based SoC capable of 2.5GHz performance, consuming less than 400 watts of dynamic power on a 480 sq.mm die. Such a performance, power, area (PPA) metric on an AI-enabled data center SoC is noteworthy, with particular attention to be paid to the 400 watts power number.
AI-enabled Data Center SoC
A high-level block diagram of the Alchip implemented SoC (see below) showcases the complexity and highlights the need for a collaborative, multi-company effort.
Of course, for the above data-center-oriented SoC, memory and I/O operations consume an even higher percentage of the total chip power.
With HPC data centers handling increasingly large data sets, memory and other high-speed I/O operations reach extreme levels, leading to elevated thermal conditions inside the server rooms. In addition to the cooling mechanisms deployed within server rooms, lower thermal dissipation from data center chips will go a long way in keeping thermal levels under control. 400 watts of power dissipation is a pretty aggressive target for a data center chip, which in turn calls for aggressively reducing the power consumed by memory and other I/Os.
Alchip-Synopsys Collaboration
A number of members of the TSMC Open Innovation Platform (OIP) teamed up on the AI-Enabled Data Center SoC project, with Synopsys EDA tools, foundation IP and interface IP bringing a lot to bear.
Synopsys offers the broadest portfolio of IP across TSMC technology nodes, ranging from 180nm to 3nm FinFET, through its continuous innovation cycle for optimizing SoC PPA. Alchip leveraged the Synopsys IP portfolio, including its optimized cell set tailored for HPC market applications. Included with the IP set are memory design-for-testability (DFT) and power management support.
Standby Power
Synopsys memory compilers are optimized for power-sensitive applications: memory standby power is reduced by up to 50% in light sleep mode, up to 70% in deep sleep mode, and up to 95% in shutdown mode.
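As a quick sanity check on those figures, here is a small Python sketch applying the quoted best-case reductions to a nominal standby power; the mode names and the nominal value are assumptions for illustration:

```python
def standby_power(nominal_mw: float, mode: str) -> float:
    """Best-case standby power after applying the quoted reductions
    (up to 50% / 70% / 95%); mode names are illustrative."""
    reduction = {"active": 0.00, "light_sleep": 0.50,
                 "deep_sleep": 0.70, "shutdown": 0.95}[mode]
    return nominal_mw * (1.0 - reduction)
```

A memory block idling at a nominal 100 mW of standby power would thus drop to roughly 50 mW, 30 mW, or 5 mW in the three modes.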
Switching Power
Clock power is a significant component of a chip’s total power. Through a fully automated multibit-mapping flow from the RTL stage, Synopsys was able to reduce clock power by more than 30%, resulting in a more than 19% reduction in total power. The optimization techniques involved mapping sequential multibits to combinational multibits through de-banking and re-banking.
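A back-of-envelope check: if a 30% cut in clock power lowers total power by 19%, the clock tree must have accounted for roughly 19/30, or about 63%, of total power. Assuming both percentages refer to the same baseline, the inference can be sketched as:

```python
def implied_share(component_cut: float, total_cut: float) -> float:
    """If cutting one power component by `component_cut` reduces total
    power by `total_cut`, that component's share of total power was
    total_cut / component_cut (both cuts relative to the same baseline)."""
    return total_cut / component_cut

clock_share = implied_share(0.30, 0.19)   # ~0.63: clock ~63% of total power
```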
Optimized Cell Set
Low active power versions of the combinational cells helped reduce dynamic power significantly. Special wide muxes, compressors and adders helped minimize routing congestion and total power. Fast adders helped ensure performance was met post-route.
PPA Benefits
In addition to reducing power, multibit combinational cells also reduced area. The logic restructuring done as part of the optimization reduced congestion to achieve better timing. Flops optimized for setup/hold enabled faster timing closure.
Summary
Alchip met its cloud-infrastructure customer’s PPA challenge through its tight collaboration with Synopsys. For Alchip’s press release on this, visit here. With the TSMC N7 based SoC success under its belt, the partners are working together on TSMC N5 and TSMC N3 engagements. For more details contact Alchip.
I am delighted to share my technical insights into RISC-V in this article, to inspire and prepare the next generation of chip designers for the open era of computing. If you understand how we build complex electronic devices like desktops and smartphones using processors, you will be interested in learning about and exploring Instruction Set Architectures (ISAs).
Usually, we prefer Complex Instruction Set Computer (CISC) architectures for desktops/laptops and Reduced Instruction Set Computer (RISC) architectures for smartphones. OEMs like Dell and Apple have been using x86 CISC processors for their laptops. Let me explain the laptop design approach. The motherboard has a multicore CISC processor as the main component, connected to GPUs, RAM, storage memory, and other subsystems and I/O interfaces. The operating system runs multiple applications in parallel on the multicore processor, managing memory allocation and I/O operations.
This is how we can realize any electronic system using a processor. However, we prefer a System-on-a-Chip (SoC) with a RISC processor for smartphones, as it helps us reduce the motherboard’s size and power consumption. Almost the entire system, with multi-core RISC CPUs, GPUs, DSPs, wireless and interface subsystems, SRAMs, flash memories, and IPs, is implemented on an SoC. As an OEM trendsetter, Apple is following this smartphone SoC design approach even for its MacBooks. All the latest MacBooks use its M-series SoCs, which are based on Arm’s RISC processor architecture.
So, it’s evident that proprietary ISAs, whether Intel’s x86 or Arm’s RISC, have been the choice of OEMs like Apple, Dell, Samsung, and others. But why do we need an open ISA like RISC-V beyond all these well-proven proprietary ISAs?
Today, everyone uses SoCs for their laptops and smartphones. Such complex SoCs demand both general-purpose and specialized processors. To realize chips like Apple’s M-series SoCs, we need different kinds of processors, such as RISC CPUs, GPUs, DSPs, security processors, image processors, machine learning accelerators, and neural engines, based on various general-purpose and specialized ISAs from multiple IP vendors, as shown in Figure 1.
Figure 1: Apple M1 SoC. Ref: AnandTech
In this scenario, the major challenges would be:
Choosing and working with multiple IP vendors
Different IP vendors may have different IP licensing schemes, and engineers will not have the freedom to customize the ISAs and designs as they prefer to meet their design goals.
Not all specialized ISAs will survive for long, affecting long-term product support plans and roadmaps.
Also, the software/application development and updates involving multiple ISAs and toolchains would be challenging.
RISC-V is a general-purpose, license-free, open ISA with multiple extensions. The ISA is separated into a small base integer ISA, usable as a base for customized accelerators, and optional standard extensions to support general-purpose software development.
You can add your own extensions to realize your specialized processor or customize the base ISA if needed because it’s open. No license restrictions. So, in the future, we could create all general-purpose and specialized processors using only one RISC-V ISA and realize any complex SoC.
1. What’s RISC-V, and how is it different from other ISAs?
RISC-V is the fifth major ISA design from UC Berkeley. It’s an open ISA maintained by a non-profit organization, RISC-V International, that involves the whole stakeholder community in implementing and maintaining the ISA specifications, golden reference models, and compliance test suites.
RISC-V is not a CPU implementation. It is an open ISA for both general-purpose and specialized processors, freely available to academia and industry.
The RISC-V ISA is separated into a small base integer ISA, usable by itself as a base for customized accelerators or for educational purposes, and optional standard extensions to support general-purpose software development.
RISC-V supports both 32-bit and 64-bit address space variants for applications, operating system kernels, and hardware implementations. So, it is suitable for all computing systems, from embedded microcontrollers to cloud servers, as mentioned below.
Simple embedded microcontrollers
Secure embedded systems that run RTOS
Desktops/Laptops/Smartphones that run operating systems
Cloud Servers that run multiple operating systems
2. RISC-V Base ISA
RISC-V is a family of related ISAs: RV32I, RV32E, RV64I, RV128I
What RV32I/ RV32E/ RV64I/RV128I means:
RV – RISC-V
32/64/128 – Defines the register width (XLEN) and address space
I – Integer Base ISA
32 Registers for all base ISAs
E – Embedded: Base ISA with only 16 registers
2.1 RISC-V Registers:
All the base ISAs have 32 registers, as shown in Figure 2, except RV32E, which has only 16 registers for simple embedded microcontrollers; the register width is still 32 bits.
The register X0 is hardwired to zero. A special register called the Program Counter (PC) holds the address of the current instruction to be fetched from memory.
As shown in Figure 2, the RISC-V Application Binary Interface (ABI) defines standard roles for the registers. Software development tools usually use the ABI names for simplicity and consistency. The ABI dedicates registers to saved values, function arguments, and temporaries. The RV32E base ISA, intended for simple embedded microcontrollers, provides only the first 16 registers (X0 to X15), while the RV32I base ISA has all 32 registers, X0 to X31.
Figure 2: RISC-V Registers and ABI Names. Ref: RISC-V Specification
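The ABI names can be captured in a few lines of Python, which is a handy sketch when writing tooling such as disassemblers; the names follow the standard RISC-V psABI:

```python
# Standard ABI names for the 32 integer registers, per the RISC-V psABI.
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1"] +
    [f"a{i}" for i in range(8)] +          # x10-x17: arguments/returns
    [f"s{i}" for i in range(2, 12)] +      # x18-x27: saved registers
    [f"t{i}" for i in range(3, 7)]         # x28-x31: temporaries
)

def abi_name(xreg: int) -> str:
    """Map an architectural register number (x0..x31) to its ABI name."""
    return ABI_NAMES[xreg]
```

An RV32E implementation would simply stop at `abi_name(15)`, since only x0 to x15 exist there.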
2.2 RISC-V Memory:
A RISC-V hart (hardware thread/core) has a single byte-addressable address space of 2^XLEN bytes for all memory accesses, where XLEN refers to the width of an integer register in bits: 32, 64, or 128.
A word of memory is defined as 32 bits (4 bytes). Correspondingly, a halfword is 16 bits (2 bytes), a doubleword is 64 bits (8 bytes), and a quadword is 128 bits (16 bytes).
The memory address space is circular, so that the byte at address 2^XLEN −1 is adjacent to the byte at address zero. Accordingly, memory address computations done by the hardware ignore overflow and instead wrap around modulo 2^XLEN.
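The wrap-around behavior is easy to model: address arithmetic is simply masked to XLEN bits. A minimal Python sketch, assuming XLEN = 32:

```python
XLEN = 32                      # register width in bits (assumed here)
MASK = (1 << XLEN) - 1

def effective_address(base: int, offset: int) -> int:
    """Hardware address arithmetic ignores overflow and wraps modulo
    2**XLEN, so the byte at 2**XLEN - 1 is adjacent to address 0."""
    return (base + offset) & MASK
```

For example, adding 1 to the top byte address 0xFFFFFFFF lands back at address 0.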
RISC-V base ISAs have either little-endian or big-endian memory systems, with the privileged architecture further defining bi-endian operation. Instructions are stored in memory as a sequence of 16-bit little-endian parcels, regardless of memory system endianness.
2.3 RISC-V Load-Store Architecture
You can visualize the RISC-V load-store architecture, built on RISC-V registers and memory, as shown below in Figure 3.
The RISC-V processor fetches the instruction from main memory at the address held in the PC, decodes the 32-bit instruction, and then the ALU performs arithmetic, logic, or memory read/write operations. The ALU results are stored back into registers or memory.
Figure 3: RISC-V Load-Store Architecture
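The load-store discipline can be sketched in a few lines of Python: ALU operations only ever touch registers, and memory is reached exclusively through loads and stores. This toy model (register file, sparse byte-addressable memory, little-endian lw/sw, and an add) is illustrative, not a full RV32I simulator:

```python
# Toy load-store machine: ALU ops touch only registers; memory is
# accessed exclusively through loads and stores.
regs = [0] * 32                # x0..x31
mem = {}                       # sparse byte-addressable memory

def sw(rs, addr):
    """Store word: write regs[rs] to memory, little-endian."""
    for i, b in enumerate((regs[rs] & 0xFFFFFFFF).to_bytes(4, "little")):
        mem[addr + i] = b

def lw(rd, addr):
    """Load word: read 4 bytes from memory into regs[rd]."""
    regs[rd] = int.from_bytes(
        bytes(mem.get(addr + i, 0) for i in range(4)), "little")
    regs[0] = 0                # x0 stays hardwired to zero

def add(rd, rs1, rs2):
    """ALU op: register-to-register only, 32-bit wraparound."""
    regs[rd] = (regs[rs1] + regs[rs2]) & 0xFFFFFFFF
    regs[0] = 0
```

Note how `add` never sees memory: a value must first be brought into a register with `lw` before the ALU can operate on it.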
2.4 RISC-V RV32I Base ISA:
The RV32I base ISA has only 40 unique instructions, though a simple hardware implementation needs only 38 of them. The RV32I instructions can be classified as:
Here I would like to explain how the RISC-V ISA enables us to realize an optimized Register Transfer Level design that meets low-power and high-performance goals.
As shown in Figure 4, the RISC-V ISA keeps the source (rs1 and rs2) and destination (rd) registers at the same position in all formats to simplify decoding.
Immediates are always sign-extended and are generally packed towards the leftmost available bits in the instruction, allocated to reduce hardware complexity. Sign-extension is one of the most critical operations on immediates (particularly for XLEN>32), and in RISC-V the sign bit for all immediates is always held in bit 31 of the instruction, so that sign-extension can proceed in parallel with instruction decoding.
To speed up decoding, the base RISC-V ISA puts the most important fields in the same place in every instruction. As you can see in the instruction formats table:
The major opcode is always in bits 0-6.
The destination register, when present, is always in bits 7-11.
The first source register, when present, is always in bits 15-19.
The second source register, when present, is always in bits 20-24.
But why are the immediate bits shuffled? Think about the physical circuit that decodes the immediate field. Since it’s a hardware implementation, the bits are decoded in parallel; each bit of the output immediate has a multiplexer to select which input bit it comes from. The bigger the multiplexer, the costlier and slower it is.
It’s also interesting to note that only the major opcode (bits 0-6) is needed to know how to decode the immediate, so immediate decoding can be done in parallel with decoding the rest of the instruction.
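The fixed field positions above translate directly into a decoder. This Python sketch extracts the common fields and the sign-extended I-type immediate from a 32-bit instruction word:

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret a `bits`-wide field as two's complement."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def decode_rv32(inst: int) -> dict:
    """Extract the fields that sit at fixed positions in every format,
    plus the sign-extended I-type immediate (sign bit = inst bit 31)."""
    return {
        "opcode": inst & 0x7F,                  # bits 0-6
        "rd":     (inst >> 7) & 0x1F,           # bits 7-11
        "funct3": (inst >> 12) & 0x07,          # bits 12-14
        "rs1":    (inst >> 15) & 0x1F,          # bits 15-19
        "rs2":    (inst >> 20) & 0x1F,          # bits 20-24
        "imm_i":  sign_extend(inst >> 20, 12),  # I-type immediate
    }
```

For example, the word 0xFFB10093 encodes `addi x1, x2, -5`, and the decoder recovers rd=1, rs1=2 and an immediate of -5 from it; note that the immediate extraction needs nothing but the bit positions, so it runs in parallel with the rest of the decode.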
2.6 RV32I Base ISA Instructions
3. RISC-V ISA Extensions
All the RISC-V ISA extensions are listed out here:
Figure 5: RISC-V ISA Extensions
We follow the naming convention for RISC-V processors as explained below:
RISC-V Processors: RV32I, RV32IMAC, RV64GC
RV32I: Integer Base ISA implementation
RV32IMAC: Integer Base ISA + Extensions: [Multiply + Atomic + Compressed]
RV64GC: 64-bit IMAFDC [G – General Purpose: IMAFD]
64-bit Integer Base ISA + Extensions: [Multiply + Atomic + Single-Precision Floating-Point + Double-Precision Floating-Point + Compressed]
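The naming convention can be captured in a tiny parser. A hedged Python sketch (note that in current specifications 'G' additionally implies the Zicsr and Zifencei extensions, which single letters cannot express):

```python
def expand_isa(name: str):
    """Split an ISA string like 'RV64GC' into (XLEN, extension letters).
    'G' is shorthand for IMAFD; current specs fold in Zicsr/Zifencei
    too, which this single-letter expansion cannot show."""
    assert name.upper().startswith("RV")
    body = name[2:].upper()
    xlen = int("".join(ch for ch in body if ch.isdigit()))
    letters = "".join(ch for ch in body if ch.isalpha()).replace("G", "IMAFD")
    return xlen, letters
```

So `expand_isa("RV32IMAC")` yields a 32-bit core with the I, M, A and C extensions, and `expand_isa("RV64GC")` unfolds to the full IMAFDC set on a 64-bit core.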
4. RISC-V Privileged Architecture
The RISC-V privileged architecture covers all aspects of RISC-V systems beyond the unprivileged ISA I have explained so far. The privileged architecture includes privileged instructions as well as additional functionality required for running operating systems and attaching external devices.
As per the RISC-V privileged specification, we can realize different kinds of systems from simple embedded controllers to complex cloud servers, as explained below.
Application Execution Environment – AEE: “Bare metal” hardware platforms where harts are directly implemented by physical processor threads and instructions have full access to the physical address space. The hardware platform defines an execution environment that begins at power-on reset. Example: Simple and secure embedded microcontrollers
Supervisor Execution Environment – SEE: RISC-V operating systems that provide multiple user-level execution environments by multiplexing user-level harts onto available physical processor threads and by controlling access to memory via virtual memory.
Example: Systems like Desktops running Unix-like operating systems
Hypervisor Execution Environment – HEE: RISC-V hypervisors that provide multiple supervisor-level execution environments for guest operating systems.
Example: Cloud Servers running multiple guest operating systems
Also, the RISC-V privileged specification defines various Control and Status Registers (CSRs) to implement features like interrupts, debugging, and memory management for any system. You may want to refer to the specification to explore more.
As explained in this article, we could efficiently realize any system, from simple IoT devices to complex smartphones and cloud servers, using a common open RISC-V ISA. With monolithic semiconductor scaling failing, specialization is the only way to increase computational performance. The open RISC-V ISA is modular and supports custom instructions making it ideal for creating a wide range of specialized processors and accelerators.
Just as we witnessed great success in chip verification with the emergence of the IEEE-standard Universal Verification Methodology, the open RISC-V ISA will emerge as an industry-standard ISA, inheriting the good features of various proprietary ISAs and leading us into the open era of computing. Are you ready with RISC-V expertise for this amazing future?
Author: P R Sivakumar, Founder and CEO, Maven Silicon
The interface IP market has grown at a 21% CAGR from 2017 to 2021, and we review the part of this market restricted to the high end of PCIe, DDR, Ethernet and D2D IP, made of PHY and controller IP targeting the most advanced technology nodes and the latest protocol releases. We will show that an IP vendor focusing investment on high-end interconnect IP can enjoy a very healthy ROI, based on a business growing at a 75% CAGR from 2022 to 2026.
To keep a leading market share in this very demanding IP business, the vendor will have to show the best time to market (TTM) and demonstrate 100% perfect execution. Taking Alphawave as an example: if the company can perfectly execute this strategy, it can grow its IP business from $90 million in 2021 to $600 million or more in 2026.
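For reference, the compound annual growth rate implied by a start/end revenue pair follows the standard formula; applied to the $90 million (2021) to $600 million (2026) trajectory above, it works out to roughly 46% per year:

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by a start/end value pair."""
    return (end / start) ** (1.0 / years) - 1.0

# $90M (2021) -> $600M (2026) over five years: roughly 46% per year.
alphawave_growth = cagr(90.0, 600.0, 5)
```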
We have used market information extracted from the “Interface IP Survey & Forecast”, and we have selected the following interconnect protocols, together with the technology nodes showing the highest ROI (7nm, 5nm, 3nm and even below), to focus on:
PCIe 4 and above (PCIe 5, 6…)
CXL 1, CXL 2, CXL 3
UCIe
Ethernet based on 56G SerDes, 112G SerDes, 224G…
LPDDR5 memory controller
HBM3 memory controller
We can see that all these protocols support HPC, which became TSMC’s leading segment (in both size and growth) in 2022.
Revenue and Growth Rate by Platform – TSMC 2022
Selected Protocol Forecast 2022-2026
The IP outsourcing rate and the IP market size are growing year over year (by about 20% per year for the total interface IP market over the last five years) and, as we will see, even faster for high-end protocol-based IP.
For PCIe 5 and 6, mature technology nodes don’t make sense, so we split the PHY IP forecast between mainstream and advanced technology nodes, as ASPs differ; Controller IP ASP is expected to be technology independent. We assume that a combo PCIe/CXL PHY will be proposed.
Advanced memory controller includes:
DDR5 and LPDDR5 memory controllers, in line with the above technology split
GDDR6, GDDR7 targeting 7nm, 5nm and below
HBM3 targeting 7nm, 5nm and below
The picture below describes the number of VHS SerDes IP commercial design starts for the three data rates (56Gbps, 112Gbps and 224Gbps) over the next five years.
SerDes PHY IP Number of Sales Forecast 2021-2026
The semiconductor industry is undertaking a major strategy shift towards multi-die systems.
Multi-die systems are driving the need for standardized die-to-die interconnects. Several industry alliances have come together to define such standards, the most promising being Unified Chiplet Interconnect Express (UCIe).
D2D Design Start IP Forecast 2021-2026
Top4 High-End Interface IP by Revenue 2021-2026
The group made of the Top 4 interface IP protocols (PCIe, DDR, Ethernet and D2D) is forecasted to grow at a 27% CAGR from 2022 to 2026. If we consider the high end only, the global CAGR for the HE Top 4 interfaces will be 75% from 2021 to 2026:
The Top 4 Interface IP by Revenue 2021-2026
The weight of the HE Top 4 protocol IP was $370 million in 2021; the value forecasted for 2026 is $2115 million, a CAGR of 75%, compared with 27% for the overall IP sales of these Top 4 protocols.
We will look at the Alphawave IP vendor case study. Created in 2017, the company has focused on high-end IP from the beginning, with PAM4 DSP-based 112G SerDes supporting the Ethernet and PCIe segments. The strategy was successful, as 2021 IP revenues were $89.9 million. In the meantime, Alphawave IP has acquired an Ethernet controller IP vendor and proposes a PCI-SIG-validated PCIe controller for a complete PCIe 5 solution.
Moreover, in June 2022, they acquired OpenFive. The first result was to enlarge the Alphawave IP portfolio and address two more segments: high-end DDR memory controllers (HBM3 and LPDDR5) and D2D. The second result was to bring design services capability to create a potential emerging chiplet business.
That’s why Alphawave is a good candidate for a case study: what can the potential business development be for an IP vendor focusing primarily on high-end interface IP (PCIe, DDR memory controller, Ethernet and D2D)?
2022-2026 Market Share Evolution When Focusing on
High-End Interface IP
Starting with a 24% market share of HE interface IP ($370 million in 2021), we evaluate the revenues generated by continuing to deploy this strategy under three scenarios: flat market share, market share growing by 1 point per year, and market share growing by 10% per year. The real case will likely fall between the flat and 10% scenarios, putting Alphawave’s revenues from high-end interface IP alone between $500 million and $800 million in 2026.
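A back-of-envelope sketch of the three scenarios, assuming the high-end market reaches the roughly $2115 million forecast for 2026 quoted earlier, and interpreting the scenarios as flat share, +1 share point per year, and +10% relative share growth per year (the exact scenario definitions are my assumptions):

```python
# Assumptions: high-end market ~$2115M in 2026 (forecast quoted above),
# starting share 24%, five-year horizon; scenario definitions are mine.
MARKET_2026_M = 2115.0
START_SHARE = 0.24
YEARS = 5

flat_share = START_SHARE * MARKET_2026_M                           # ~$508M
plus_1pt_per_year = (START_SHARE + 0.01 * YEARS) * MARKET_2026_M   # ~$613M
plus_10pct_per_year = START_SHARE * 1.10**YEARS * MARKET_2026_M    # ~$817M
```

Under these assumptions the flat and +10% cases bracket the $500 to $800 million range quoted in the text.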
In 2020, we saw the emergence of Alphawave IP, building a strong position in the high-end interface IP segment (thanks to PAM4 DSP SerDes) and creating a “Stop-for-Top” strategy, in opposition to Synopsys’ “One-Stop-Shop”. Considering that this high-end segment, strongly driven by HPC (including datacenter, AI, storage, etc.), is expected to grow considerably over the 2020 decade, Alphawave IP could enjoy a major share of this $2 billion interface IP sub-segment by 2026, with revenue between $500 million and $800 million being realistic.
** This white paper has been sponsored by Alphawave IP; nevertheless, the content reflects the author’s own view of the IP market and the way it is expected to evolve during the 2020 decade.