
Podcast EP72: Analog AI/ML with Aspinity
by Daniel Nenni on 04-20-2022 at 10:00 am

Dan is joined by Tom Doyle, CEO of Aspinity. Dan explores the benefits of Aspinity’s analog signal processing technology with Tom. The ultra-low power analog computing capability delivered by Aspinity has significant implications for the design and deployment of AI/ML systems.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Visual Debug for Formal Verification
by Steve Hoover on 04-20-2022 at 6:00 am


Success with Open-Source Formal Verification

The dream of 100% confidence is compelling for silicon engineers. We all want that big red button to push that magically finds all of our bugs for us. Verification, after all, accounts for roughly two-thirds of logic design effort. Without that button, we have to create reference models, focused tests, random stimuli, checkers, coverage monitors, regression suites, etc.

Of course, there is no big red button, and I’d be crazy to suggest that we could abandon all of that work altogether. But, at the same time, that’s not far from what Akos Hadnagy and I did, several years ago, in developing the WARP-V CPU generator.

I wrote WARP-V initially to explore code generation using the emerging Transaction-Level Verilog language. I brought the model to life with a simple test program that summed numbers from 1 to 10. Then, Akos put RISC-V configurations of WARP-V through the wringer, as a student in Google Summer of Code, using the open-source RISC-V Formal framework. By completing formal verification (which has now also been done independently by Axiomise using different tools and checkers), we felt no inclination to bother with any of the standard RISC-V tools and compliance tests.

Formal Verification Hurdles

While our formal-focused approach helped eliminate a considerable amount of work, in other ways it did add some effort, too. Of course it did. If formal verification were a panacea, everyone would be taking this approach, and while formal verification has been around for a long time, it still struggles to attain first-class status in the verification landscape. This has little to do with the core science and everything to do with usability.

The first big leap in usability for formal verification came with the provision of counterexample traces. These let you debug formal failures much like simulation failures, using a waveform viewer. This, however, is not enough to put formal verification on a level playing field with dynamic verification. For one thing, simulations can produce log files in addition to waveforms. These provide high-level context about simulations to help with debugging. For aggressive use of formal verification, getting big picture context is important. Here’s why:

Traditionally, focused testing plays a major role in stabilizing the model. The myriad basic bugs are identified by focused tests, which are written for a specific purpose in a very controlled context. You know what they are doing. You know what to look for. Formal verification, however, will identify a counterexample that could be doing absolutely anything (within your constraints). Fortunately, the trace will be short, but formal tools have a way of finding really gnarly corners you would never expect or never be able to hit in a controlled fashion. That’s what’s so great about formal!

So if we’re going to find a significant portion of our bugs using formal methods, we’d better make it easier to figure out what’s going on in the counterexamples. That’s where visualization comes in.

Streamlining Debugging with Visualization

WARP-V utilizes the Visual Debug framework, now freely available to open-source projects in the Makerchip.com IDE. Visual Debug (or VIZ) makes it easy to define simulation visualizations. These aid in the debugging process of any digital circuit developed using any hardware description language and any design environment that produces industry-standard (.vcd) trace files. You may have seen screenshots of visualizations similar to those of WARP-V in several of my posts about my RISC-V CPU design courses, in which hundreds of students have developed their own RISC-V CPUs.
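
Under the hood, any such visualization is driven by the value changes recorded in the trace. As a purely illustrative sketch (not the Visual Debug API), the Python fragment below shows one way to pull a single signal's value changes out of a .vcd file so they could feed a custom rendering; the file and signal names are hypothetical.

```python
# Minimal .vcd value-change extractor (illustrative only; real tools use full VCD parsers).
# Returns a list of (time, value) pairs for one named signal.

def signal_changes(vcd_path, signal_name):
    code = None                  # the short identifier code the VCD assigns to the signal
    time = 0
    changes = []
    with open(vcd_path) as f:
        for line in f:
            tok = line.split()
            if not tok:
                continue
            if tok[0] == "$var" and signal_name in tok:
                code = tok[3]    # $var <type> <width> <id_code> <name> $end
            elif tok[0].startswith("#") and tok[0][1:].isdigit():
                time = int(tok[0][1:])                      # new simulation timestamp
            elif code and len(tok) == 2 and tok[1] == code and tok[0][0] == "b":
                bits = tok[0][1:]                           # vector change, e.g. "b1010 !"
                changes.append((time, int(bits, 2) if set(bits) <= {"0", "1"} else bits))
            elif code and len(tok) == 1 and tok[0][1:] == code and tok[0][0] in "01xzXZ":
                changes.append((time, tok[0][0]))           # scalar change, e.g. "1!"
    return changes

# Hypothetical usage:
#   for t, v in signal_changes("warp_v.vcd", "pc"):
#       print(t, v)
```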

Using Visual Debug for the first time is like turning the lights on in a room you didn’t realize was dark. Just as you wouldn’t walk into a dark room to find your car keys without turning on the lights, you shouldn’t start debugging without first enabling Visual Debug. Though it wasn’t the case at the start of WARP-V development, as you’ve undoubtedly guessed by now, VIZ now works for formal counterexamples as well as it does for simulation.

Implications of Easier Debugging

Let’s put these visualization benefits in the context of WARP-V’s design methodology. This means I first get to talk about the benefits of TL-Verilog, my favorite topic. Utilizing TL-Verilog, WARP-V is able to support different pipeline depths and even different instruction set architectures from the same codebase. And it is able to do so in less code (and correspondingly fewer bugs) than a single RTL-based CPU core. Furthermore, transaction-level design greatly simplifies the task of creating test harnesses to connect any RISC-V hardware configuration to the RISC-V Formal checkers. As described in “Verifying RISC-V in One Page of Code!”, the reduction in modeling effort across four different CPU configurations was arguably a factor of 70x or more! (These benefits would apply to test harnesses for dynamic verification as well.)

In the face of these TL-Verilog benefits, the effort to debug formal verification failures became a significant portion of the remaining work, and Visual Debug would have streamlined this effort. More generally, being able to easily decipher formal counterexamples can be the boost in productivity that tips the scales for formal verification. This, in turn, makes our resulting hardware more robust and secure. And security is quite possibly the biggest challenge faced by design teams today.

Visual Debug in Action

I leave you with a screen-capture, narrated by yours truly, demonstrating debugging of a register bypass (aka register forwarding) bug in WARP-V.

Related Links: Makerchip.com, Visual Debug, WARP-V CPU generator, RISC-V Formal, RISC-V CPU design courses, “Verifying RISC-V in One Page of Code!”


White Paper: Advanced SoC Debug with Multi-FPGA Prototyping
by Daniel Nenni on 04-19-2022 at 10:00 am


S2C EDA recently released a white paper written by a good friend of mine, Steve Walters. Steve and I have worked together many times throughout our careers and I consider him to be one of my trusted few, especially in regard to prototyping and emulation. Steve is also my co-author on the book “Prototypical II: The Practice of FPGA-Based Prototyping for SoC Design”. Prototypical II and this 10-page white paper are available on the S2C EDA website HERE.

Introduction

As SoC designs advance in complexity and performance, and software becomes more sophisticated and SoC-dependent, SoC designers face a relentless push to “shift left” the co-development of the SoC silicon and software to improve time-to-market.  Consequently, SoC verification has evolved to include multi-FPGA prototyping, and higher prototype performance, to support longer runs of the SoC design prototype, running more of its software, prior to silicon – in an effort to avoid the skyrocketing costs associated with silicon respins.  While FPGA prototyping for SoC design verification by its nature remains a “blunt instrument”, FPGA prototyping is still the only available pre-silicon verification option, beyond hardware emulation, for achieving longer periods of SoC design operation capable of running software, and, in some cases, “plugging” the SoC design prototype directly into real target-system hardware.  Not surprisingly, commercial FPGA prototype suppliers are using the latest FPGA technology to implement FPGA prototyping, offering multi-FPGA prototyping platforms, and advancing FPGA prototyping debug tool capabilities, to meet customer demands for more effective SoC verification.

Ideally, SoC design debug tools for FPGA prototyping would enable software simulation-like verification and debug at silicon speeds – providing visibility of all internal SoC design nodes, not impeding prototype performance, providing unlimited debug trace-data storage, and being quickly reconfigurable for revisions to the SoC design and/or the debug setup. In reality, today’s SoC design debug tools for FPGA prototyping fall short of the ideal, and multi-FPGA prototyping adds to the challenge of achieving ideal SoC design debug tool capabilities. As a result, today’s FPGA prototyping for SoC design debug offers tradeoffs among the ideal debug tool capabilities, and it is left to the SoC design verification team to configure an “optimal” verification strategy for each SoC design project – with consideration for future scaling-up and improved verification capabilities.

This white paper reviews some of the multi-FPGA prototyping challenges for SoC design verification and debug, and, reviews one example of a commercially available multi-FPGA prototyping debug capability offered by S2C Inc., a leading supplier of FPGA prototyping solutions for SoC design verification and debug (s2ceda.com).

Summary and Conclusions

S2C’s MDM Pro hardware, together with S2C’s Prodigy FPGA prototyping platforms and S2C’s Player Pro software, implements a rich set of debug features that gives SoC designers the flexibility to optimize the FPGA prototype debug tools for a given FPGA prototyping project. MDM Pro combines off-FPGA hardware for “deep” trace-data storage and complex hardware trigger logic with probe multiplexing IP in the FPGA, which accesses a large number of debug probes over a few high-speed GTY connections to minimize the consumption of FPGA I/O. It also allows more probe connections to be set up than need to be viewed at the same time, so that additional probes can be viewed when needed without recompiling the FPGA or degrading debug performance. Player Pro software for debug complements the debug hardware with a powerful user interface for managing the debug setup, configuring advanced trace-data trigger conditions, initiating debug runs of the FPGA prototype, and viewing the debug trace-data from multiple FPGAs in a single viewing window.

Also read:

Prototype enables new synergy – how Artosyn helps their customers succeed

S2C’s FPGA Prototyping Solutions

DAC 2021 Wrap-up – S2C turns more than a few heads


SoC Application Usecase Capture For System Architecture Exploration
by Sondrel on 04-19-2022 at 6:00 am


Sondrel is the trusted partner of choice for handling every stage of an IC’s creation. Its award-winning define and design ASIC consulting capability is fully complemented by its turnkey services to transform designs into tested, volume-packaged silicon chips. This single point of contact for the entire supply chain process ensures low risk and faster times to market. Headquartered in the UK, Sondrel supports customers around the world via its offices in China, India, Morocco and North America.

Introduction

Early in the SoC development cycle, Product Managers, Systems Architects and relevant technical stakeholders discuss and elaborate product requirements. Each group tends to have a specific mental model of the product: Product Managers typically focus on the end-use and product applications, while Systems Architects focus on functionality and on how the requirements will be executed and implemented.

The ‘Requirements Capture Phase’ identifies, formulates and records all known functionality and metrics, including performance, in a clear and complete proposal. In addition, this exercise identifies functionality that is not fully understood or may be included later, and seeks to determine and plan what tasks are required to complete the qualification and quantification of such functions.

Once requirements capture is complete, or as complete as possible at the program’s start, the system architecture team takes the requirements through an analysis phase with appropriate inputs from the design and implementation teams. The outcome of this iterative process is an architecture design specification: an architecture design for which all functionality is defined and power, performance and area are estimated.

The inclusion of design and implementation effort at the initial phase ensures better accuracy and validation for the specification and architecture. In addition, it identifies the sensitivities needed to guide design choices.

The architecture analysis includes the architecture exploration, IP selection/specification, verification of requirements, and generation of the project execution plan with major tasks to be elaborated in later phases.

The architecture exploration of the candidate architecture is a significant component. It refines the architecture design by modelling the proposal and evaluating known or reference use cases, dynamically allowing the system topology to be defined and provisioning of resources to be allocated (memory, bus fabric data/control paths etc.).

While it allows aspects of the functionality to be evaluated and validated (connectivity, timing, performance etc.) for confidence in the correctness of the design, later phases using more detailed and accurate models are used to determine and correct potential errors during the implementation of the architecture.

The remaining sections of this article cover the use of modelling in the architecture phase of the program.

SoC application use case capture for system architecture exploration

The initial part of SoC Architecture Exploration is a rigorous way of capturing one or more application use cases and dataflows which an SoC is required to perform.  Accurate and complete description of use cases is necessary to communicate with stakeholders and agree on requirements early in the product definition phase.

The Systems Architect seeks to draw out the product requirements and express them so that technical and non-technical stakeholders can keep up with the product intent and architectural choices without excessive technical detail.

Figure 1 shows an overview of this collaboration process in 8 steps:

  1. Market analysis, industry trends, product requirements definition carried out by the Product Manager for a potential SoC solution
  2. Product Usecase requirements are communicated to the System Architect, usually by presentations, spreadsheets or documents.
  3. Requirements translation to DSL format required by modelling flows
  4. Tools generate an Executable Specification and visualisations of the use case
  5. Tools also generate the cycle-accurate SystemC model required for use case architecture exploration
  6. Systems architect inspects results of an exploration exercise and progressively converges to an optimal architecture for the SoC
  7. System Architect communicates findings with Product Manager
  8. The Product Manager may decide to modify requirements or collaborate with the Systems Architect to further refine the candidate SoC Architecture.

Industry trends show that vision-based applications increasingly combine classical computer vision techniques with neural-net-based AI inferencing, with a fusion step to combine results from the two stages.

Figure 2 shows a typical autonomous vision use case data flow graph, with nodes representing processing functions and edges representing data flow.  The specific stages are:

  • Frame Exposure – The interval during which a camera sensor takes a snapshot of its field of vision. The image sensor may be configured in either global shutter or rolling shutter mode, and each mode has an exposure period associated with it.
  • Frame RX – The interval over which pixels of an image, grouped in lines, are sent to the SoC over a real-time interface such as MIPI CSI-3.
  • Image Conditioning – Any image pre-processing, filtering or summarisation steps performed on the received data before the actual compute stages.
  • Classical Computer Vision – Well-known vision processing algorithms, for example, camera calibration, motion estimation or homography operations for stereo vision.
  • Computational Imaging – Vision algorithms are augmented with custom processing steps such as Pixel Cloud or Depth Map estimation
  • AI Inferencing – Neural Net based image processing for semantic segmentation, object classification and the like.
  • Data Fusion – Final stage sensor fusion and tracking. May also include formatting or packetisation processing.
  • Data TX – Can be over PCIE or a real-time interface such as MIPI CSI-3 at a constant or variable data rate.

Associated with every processing stage are parameters that need to be specified so that the dynamic simulation model can be configured correctly.  These parameters generally describe:

  1. Read DMA characteristics: Number of blocks, block sizes, memory addresses and memory access patterns
  2. Processing characteristics: The delay which the task will require in order to perform its processing.
  3. Write DMA characteristics: Number of blocks, block sizes, memory addresses and memory access patterns

Figure 3 shows that this information is best described in tabular format, where rows represent processing tasks and columns are parameters associated with the task.
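
To make this concrete, the rows of such a table map naturally onto structured records that a modelling flow could consume. The sketch below is a hypothetical illustration only; the field names and values are assumptions, not Sondrel’s DSL, and memory addresses and access patterns are omitted for brevity.

```python
# Illustrative per-task parameter capture (hypothetical fields and values, not Sondrel's DSL).
from dataclasses import dataclass

@dataclass
class TaskParams:
    name: str
    read_blocks: int         # Read DMA: number of blocks
    read_block_bytes: int    # Read DMA: block size in bytes
    proc_delay_us: float     # processing delay required by the task
    write_blocks: int        # Write DMA: number of blocks
    write_block_bytes: int   # Write DMA: block size in bytes

usecase = [
    TaskParams("frame_rx",           read_blocks=0,    read_block_bytes=0,
               proc_delay_us=16_600, write_blocks=1080, write_block_bytes=3840 * 3),
    TaskParams("image_conditioning", read_blocks=1080, read_block_bytes=3840 * 3,
               proc_delay_us=2_000,  write_blocks=1080, write_block_bytes=3840 * 3),
    TaskParams("ai_inferencing",     read_blocks=1,    read_block_bytes=6_220_800,
               proc_delay_us=8_000,  write_blocks=1,    write_block_bytes=4_096),
]
```

Each record corresponds to one row of the Figure 3 table, so the same information can be reviewed by stakeholders and consumed by the modelling flow.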

The use case graph may also have an embedded sub-graph, which is often the case with AI applications that describe the algorithm in terms of a Neural Network computation graph.  Figure 4 shows a sub-graph within a larger use case graph.  The method of describing the sub-graph is in the same tabular format, which may be present in any part of the larger graph, not just with AI processing.

Usecase parameters captured in tabular format as shown in Figure 3 are sufficient to describe the application intent regarding dataflows between processing stages and the processing delay of a given stage. The added benefit of having the graph drawn to the left of the table is that it becomes intuitive to understand the data flow, and hence the relationship between the nodes as processing stages. The method scales to large graphs and keeps supplementary information readily available if required.

Separate to the Application Usecase is a model of the Hardware Platform, which will perform the data transfers and processing delays as prescribed by the Usecase model.  The Hardware Platform model will typically have the following capabilities:

  1. Generate and initiate protocol compliant transactions to local memory, global memory or any IO device
  2. Simulate arbitration delays in all levels of a hierarchical interconnect
  3. Simulate memory access delays in a memory controller model as per the chosen JEDEC memory standard.

Figure 5 shows a block diagram of one such Hardware Platform, which, in addition to a simulation model, forms the basis for elaborating an SoC architecture specification.

So far we have defined two simulation constructs – the Application Usecase Model and the Hardware Platform Model. What is now required is a specification of how the Usecase maps onto the Hardware Platform subsystems. That is, which tasks of the application usecase model are run by which subsystems in the hardware platform model. Figure 6 shows the full simulation model with usecase tasks mapped onto subsystems of the hardware platform.

The Full System Model in Figure 6 is the dynamic performance model used for Usecase and Hardware Platform Exploration.

Every node in the Usecase graph is traversed during simulation, with the Subsystem master transactor generating and initiating memory transactions to one or more slave transactors. Delays due to contention, pipeline stages or outstanding transactions are applied to every transaction, and they cumulatively sum to the total duration for which the task is active.
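
As a heavily simplified, hypothetical stand-in for the generated cycle-accurate model (reusing the TaskParams records sketched earlier), the fragment below walks the task chain once, converts DMA block counts into transfer time against an assumed DRAM bandwidth, applies a crude contention factor, and sums the per-task active durations into a use case latency. The bandwidth and contention numbers are placeholders, not measured values.

```python
# Toy single traversal of the use case chain (illustrative only; not the real SystemC model).

DDR_BW_BYTES_PER_US = 12_800     # assumed effective DRAM bandwidth (12.8 GB/s)
CONTENTION_FACTOR = 1.25         # assumed average slowdown once the pipeline is full

def task_duration_us(task, contention=1.0):
    read_us  = task.read_blocks  * task.read_block_bytes  / DDR_BW_BYTES_PER_US * contention
    write_us = task.write_blocks * task.write_block_bytes / DDR_BW_BYTES_PER_US * contention
    return read_us + task.proc_delay_us + write_us

def usecase_latency_us(tasks, contention=1.0):
    # Every node in the graph is visited once and its active duration accumulated.
    return sum(task_duration_us(t, contention) for t in tasks)

print("transient phase (empty pipeline):", round(usecase_latency_us(usecase)), "us")
print("steady state (max contention):  ", round(usecase_latency_us(usecase, CONTENTION_FACTOR)), "us")
```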

The temporal simulation view in Figure 7 shows the active duration of each task for a single traversal of the Application Usecase. The duration for the entire chain is defined as the Usecase Latency. Having one visualisation showing the Hardware Platform, Application Usecase and Temporal Simulation view often works very well for various stakeholders because it is intuitive to follow.

Now a single traversal is not useful beyond providing some sanity checks about the setup of the environment. For thorough System Performance Exploration, multiple traversals need to be run, and in this setup we see the two phases of the simulation: a transient phase while the pipeline is filling up, followed by the steady state once the pipeline is full and the system is at maximum contention.

Figure 8 highlights a portion of the simulation when the system is at maximum contention. During the steady-state, metrics are gathered to understand the performance characteristics and bounds of the system.  This guides further tuning and exploration of the use case and hardware platform.

Figure 9 shows two configurations of the hardware platform and the resulting temporal views. One system is set up for low latency by using direct streaming interfaces to avoid data exchange in the DDR memory.

Once again, showing the two systems visually brings clarity, so that all stakeholders can follow with a little guidance.

The complete architecture exploration methodology relates to use case and platform requirements, simulation metrics, key performance indicators and reports.

Figure 10 shows the flow of information in the following order:

  1. Application Usecase is defined first. The tabular format for capturing the use case is crucial here, as shown previously in Figure 3
  2. Usecase Requirements associated with the Application Usecase are stated.
  3. Usecase Requirements are converted into Key Performance Indicators, which are thresholds on metrics expected from simulation runs.
  4. Simulation metrics are collected from simulation runs
  5. Usecase performance summary report is produced by checking whether metrics meet their Key Performance Indicators.

A similar flow applies to Hardware Platform Requirements whereby:

  1. Hardware Platform defined first
  2. Platform Requirements stated
  3. Platform KPIs extracted from Requirements
  4. Platform simulation metrics collected
  5. Platform performance summary generated by comparing metrics with KPIs.
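
In both flows, the final step is mechanically simple: compare each collected metric against the threshold derived from its requirement and report pass or fail. A minimal sketch, with hypothetical metric names and limits:

```python
# Minimal KPI check: simulation metrics compared against requirement-derived thresholds.
# Metric names and limits are hypothetical, for illustration only.

kpis = {   # metric name -> (limit, "max" means must not exceed, "min" means must reach)
    "usecase_latency_us":  (50_000, "max"),
    "ddr_utilisation_pct": (80.0,   "max"),
    "frames_per_second":   (30.0,   "min"),
}

metrics = {   # gathered from the steady-state portion of the simulation
    "usecase_latency_us":  42_300,
    "ddr_utilisation_pct": 86.5,
    "frames_per_second":   31.2,
}

def kpi_report(metrics, kpis):
    for name, (limit, kind) in kpis.items():
        value = metrics[name]
        ok = value <= limit if kind == "max" else value >= limit
        print(f"{name:22s} {value:>10} {'PASS' if ok else 'FAIL'} (limit {limit}, {kind})")

kpi_report(metrics, kpis)
```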

Also read:

Sondrel explains the 10 steps to model and design a complex SoC

Build a Sophisticated Edge Processing ASIC FAST and EASY with Sondrel

Sondrel Creates a Unique Modelling Flow to Ensure Your ASIC Hits the Target

 


Power Transistor Modeling for Converter Design
by Tom Simon on 04-18-2022 at 10:00 am


Voltage converters and regulators are a vital part of pretty much every semiconductor-based product. They play an outsized role in mobile devices such as cell phones where there are many subsystems operating at different voltages with different power needs. Many portable devices rely on Lithium Ion batteries whose output voltage can vary from 4.2 volts down to 3.0 volts as they discharge. The power distribution systems in these devices need to operate with extremely high efficiency to meet battery life requirements.

As an example, a typical cell phone contains CPU cores, DRAM, RF radio, display backlight, camera, audio codec and other subsystems which need voltages ranging from 0.8V to ~4V – all from a single voltage source in the lithium ion battery. A combination of buck and boost converters is needed to precisely produce all these voltage levels from the battery regardless of its state of charge. Because switching-based converters can be noisy, low drop out (LDO) voltage regulators are also needed for several power supplies.

In the converters and regulators listed above, one of the most important elements is the pass device, which handles all the current to the load and controls the final output voltage. Pass devices can be made from a wide range of materials and can be designed as bipolar or MOS devices. Regardless of material and device type, the design of the pass device has a major effect on power loss and thermal behavior.

Magwel PTM Field Viewer

Power devices typically have many fingers and large channel widths (W). Connections to the semiconductor layers are made through a complex interconnection of metal and via layers that connect all the active areas in parallel. The size and topology of these devices leads to complex electrical behaviors. There are a large number of gate/base contacts which often have maze like connections to the external device terminal(s). The same is often true for connections to the source and drain, or emitter and collector.

These complex metal connections contribute to device resistance and can also introduce non-uniform delays within the device. To model this electrical behavior, designers need tools like the Magwel Power Transistor Modeler (PTM) suite. Traditional circuit extractors are not designed to deal with the wide metal, large via arrays and unusual shapes found in power devices. Likewise, point-to-point resistance values are needed, along with efficient and accurate ways to model the channel.

Magwel’s PTM tools use a solver based extractor that is optimized for the complex metal shapes and vias found in power devices. PTM can automatically identify the channel and will segment it according to user settings to create multiple parallel devices that can be used for full device modeling.
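
As a back-of-the-envelope illustration of why metal routing matters (and emphatically not a description of Magwel's solver), consider a device split into parallel channel segments, each reached through its own share of routing resistance. Uneven routing skews the current division and raises the effective on-resistance; the values below are arbitrary placeholders.

```python
# Toy parallel-segment view of a power FET: each channel segment has its own metal
# routing resistance in series, and all segments sit in parallel between the terminals.

def effective_r_on(segment_r_ch, segment_r_metal):
    # Conductances of the series (channel + metal) branches add in parallel.
    g_total = sum(1.0 / (r_ch + r_m) for r_ch, r_m in zip(segment_r_ch, segment_r_metal))
    return 1.0 / g_total

n = 8
r_ch = [0.8] * n                                       # ohms per channel segment
uniform_metal = [0.05] * n                             # ideal: every finger sees the same routing
graded_metal  = [0.02 + 0.03 * i for i in range(n)]    # far fingers see longer, more resistive routes

print("uniform routing:", round(effective_r_on(r_ch, uniform_metal), 4), "ohm")
print("graded routing :", round(effective_r_on(r_ch, graded_metal), 4), "ohm")
```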

Usually when power devices are used for switching converters the active area can be modeled effectively as a linear resistive value based on the foundry device model and operating conditions, such as temperature and stimulus. However, Low Drop Out (LDO) regulators are often used to get as much working voltage out of a discharging battery. The lower the drop-out voltage the longer the LDO regulator can use a battery and the less overall power is wasted on internal resistance and converted to heat. For this reason, LDO regulator pass device performance is extremely important, necessitating the use of more sophisticated device modeling for the active region. Magwel’s PTM has the option to use non-linear models to accurately predict the behavior of the active area during LDO power device operation.

Another important aspect of power transistor modeling is the stimulus used at the external device pins for simulation. Magwel’s PTM offers a wide range of easy-to-use options for this. The most basic method is to simply set a constant voltage or current. The user can select the operating temperature for each simulation. There is also a voltage controlled voltage source (VCVS) mode for modeling the device pin voltage as a proportional function of a probe voltage in the device. This is exceptionally useful for working with circuits that have replica or sensing devices.

With the inputs described above, PTM can provide voltage values at every point in the device. Designers can also view the current density throughout each layer. Thresholds for current density can be set to flag potential electromigration violations. In addition to output reports and exportable CSV files, users can bring up a field view for full visualization of the device, for easy debugging and optimization.

Magwel’s PTM is used by many leading converter circuit design companies. Silicon validation results show correlation within a percent or two. Designers can make provisional changes to the device geometry and pin locations and quickly rerun simulations without iterating back through the layout tools to perform what-if analysis when optimizing the design. More information on the PTM suite of tools is available on the Magwel website.


Bespoke Silicon is Coming, Absolutely!
by Daniel Nenni on 04-18-2022 at 6:00 am


It was nice to be at a live conference again. DesignCon was held at the Santa Clara Convention Center, my favorite location, and to me it felt like a back-to-normal crowd. The sessions I attended were full and the show floor was busy. Masks and vaccinations were not required, maybe that was it. Or maybe there was pent-up demand to get re-engaged with the semiconductor ecosystem? Either way it was a great conference, absolutely.

SemiWiki stalwart companies Cadence, Ansys, Siemens, and Samtec were all there. We will have more coverage of their talks over the next week or two. SemiWiki newcomer Xpeedic was there and we will be covering their new announcement as well.

The first panel I attended in the Chip Head Theater was titled Bespoke Silicon: How System Companies are Driving Chip Design. The panelists were John Lee, GM Semiconductor, Electronics, Optics BU, Ansys; Rob Aitken, Arm Fellow and Director of Technology, Arm Research; and Prashant Varshney, Head of Product, Silicon Vertical, Microsoft Azure.

This panel was set up to explore the trend of system/software companies deciding they need semiconductor solutions that cannot be bought off the shelf. Some prominent examples of this are Meta, Amazon, Microsoft, and Google, all of which are defining and designing their own chips. An understanding of what is driving this market trend also gives insight into how it impacts the technical demands on Ansys’ simulation/analysis products.

Why are these companies doing this? The background enabler is, of course, the internet and the pervasive digitalization of society and the economy. But more specifically, it is a confluence of advances in AI/ML algorithms together with semiconductor systems that have become big and complex and capable enough to actually move the needle for an entire business division. Take, for example, Meta’s vision for a VR-enabled future: it all depends critically on the technical capability of the optical headset as well as the power of the AI algorithms driving it – which itself requires a lot of silicon to execute.

Microsoft’s gaming division is only competitive to the degree that its Xbox can stay at the cutting edge of graphics processing. Amazon Web Services finds its cost structure is tied to the price, performance and power profile of the CPUs they use to power their data centers. So they developed their proprietary Graviton2 microprocessor in collaboration with Arm. There are very interesting business dynamics resulting from this that the panel explored.

At the lower, technical level this evolution is driven on the one hand by advances in AI/ML techniques, and on the other hand by advances in integration density with 3D-IC that have accelerated past reliance on just Moore’s Law. We see that the latest HPC products from AMD, Nvidia, and Intel are all multi-die chiplet systems. The recent industry collaboration on the release of the UCIe spec indicates how seriously these companies take the 3D-IC revolution as an enabler for the systems they want to build. Not to mention that AI/ML algorithms alone are driving a leap in design sizes – see the wafer-scale engine from Cerebras, which is explicitly targeted at ML training.

What this means from the Ansys point of view is that they are being called on to analyze increasingly large and complex multi-die systems. That is where the analysis/signoff market is going. However, the technical challenge extends well beyond simple massive capacity (which makes a cloud strategy a must-have for EDA tools). Even more challenging is the emergence of new physical effects that need to be simulated. So, the 3D-IC problem is not just quantitatively bigger, it is also qualitatively different. We call this the multiphysics challenge of 3D-IC.

The primary new physics is thermal analysis since heat dissipation is often the #1 limiting factor on these advanced designs (part of Cerebras’ secret sauce is how they manage to cool their ~15kW wafer). Of course, thermal analysis is not new but it is to most chip designers. It is an example of how chip, package, and PCB design is collapsing into a single design problem. Furthermore, thermal analysis screams out for a computational fluid dynamics simulation engine to model how the air flow and heatsink interact to set boundary conditions for the 3D-IC module. That’s another modeling physics pulled into the mix. And then there are the mechanical stress/warpage issues from having differential thermal expansion in various parts of the 3D-IC stack. Add a mechanical modeling engine to the mix.

One last example of new physics being jammed into the 3D-IC design problem space: electromagnetic analysis of high-speed signals. You see, what makes a 3D-IC integration fundamentally different from just placing two packaged chips next to each other on a PCB is that the inter-chip communication is very low-power and very high-bandwidth. If that can be done, then we can minimize the power/performance cost of going off-chip. But these interconnect traces absolutely require electromagnetic simulation for interference and coupling. How many digital designers are familiar with EM simulation?

Bottom line: The manufacturing process allows us to produce very fine-grain electrical integration of multiple chips. But the success of this market, which is driven in large part by bespoke silicon projects, is gated by the ability of designers to model, simulate, and verify the electrothermal interactions. I believe that is where the true bottleneck to adoption lies, and something Ansys tools are uniquely positioned to alleviate.

Also read:

Webinar Series: Learn the Foundation of Computational Electromagnetics

5G and Aircraft Safety: Simulation is Key to Ensuring Passenger Safety – Part 4

The Clash Between 5G and Airline Safety


Quantum Computing Trends
by Ahmed Banafa on 04-17-2022 at 10:00 am


Quantum Computing is the area of study focused on developing computer technology based on the principles of quantum theory. Tens of billions of dollars of public and private capital are being invested in quantum technologies. Countries across the world have realized that quantum technologies can be a major disruptor of existing businesses, and they collectively invested $24 billion in quantum research and applications in 2021 [1].

A Comparison of Classical and Quantum Computing

Classical computing relies, at its ultimate level, on principles expressed by Boolean algebra. Data must be processed in an exclusive binary state at any point in time: what we call bits. While the time that each transistor or capacitor needs to be in either the 0 or 1 state before switching is now measurable in billionths of a second, there is still a limit as to how quickly these devices can be made to switch state.

As we progress to smaller and faster circuits, we begin to reach the physical limits of materials and the threshold for classical laws of physics to apply. Beyond this, the quantum world takes over. In a quantum computer, a number of elementary particles such as electrons or photons can be used, with either their charge or polarization acting as a representation of 0 and/or 1. Each of these particles is known as a quantum bit, or qubit; the nature and behavior of these particles form the basis of quantum computing [2]. Classical computers use transistors as the physical building blocks of logic, while quantum computers may use trapped ions, superconducting loops, quantum dots or vacancies in a diamond [1].

Physical vs Logical Qubits

When discussing quantum computers with error correction, we talk about physical and logical qubits. Physical qubits are the actual hardware qubits in a quantum computer, whereas logical qubits are groups of physical qubits that we use as a single qubit in our computation to fight noise and improve error correction.

To illustrate this, let’s consider an example of a quantum computer with 100 qubits. Let’s say this computer is prone to noise, to remedy this we can use multiple qubits to form a single more stable qubit. We might decide that we need 10 physical qubits to form one acceptable logical qubit. In this case we would say our quantum computer has 100 physical qubits which we use as 10 logical qubits.

Distinguishing between physical and logical qubits is important. There are many estimates as to how many qubits we will need to perform certain calculations, but some of these estimates talk about logical qubits and others talk about physical qubits. For example: To break RSA cryptography we would need thousands of logical qubits but millions of physical qubits.

Another thing to keep in mind: in a classical computer, compute power increases linearly with the number of transistors and clock speed, while in a quantum computer, compute power increases exponentially with the addition of each logical qubit [4].

Quantum Superposition and Entanglement

The two most relevant aspects of quantum physics are the principles of superposition and entanglement.

Superposition: Think of a qubit as an electron in a magnetic field. The electron’s spin may be either in alignment with the field, which is known as a spin-up state, or opposite to the field, which is known as a spin-down state. According to quantum law, the particle enters a superposition of states, in which it behaves as if it were in both states simultaneously. Each qubit utilized could take a superposition of both 0 and 1. Where a 2-bit register in an ordinary computer can store only one of four binary configurations (00, 01, 10, or 11) at any given time, a 2-qubit register in a quantum computer can store all four numbers simultaneously, because each qubit represents two values. As more qubits are added, the capacity expands exponentially.

Entanglement: Particles that have interacted at some point retain a type of connection and can be entangled with each other in pairs, in a process known as correlation. Knowing the spin state of one entangled particle – up or down – allows one to know that the spin of its mate is in the opposite direction. Quantum entanglement allows qubits that are separated by incredible distances to interact with each other instantaneously (not limited to the speed of light). No matter how great the distance between the correlated particles, they will remain entangled as long as they are isolated. Taken together, quantum superposition and entanglement create an enormously enhanced computing power [3].
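
A minimal numerical sketch (plain linear algebra with NumPy, not a quantum SDK) makes both ideas concrete: a Hadamard on each of two qubits puts the register into an equal superposition of all four basis states, while a Hadamard followed by a CNOT produces an entangled Bell pair whose measurement outcomes are perfectly correlated.

```python
# State-vector illustration of superposition and entanglement for a 2-qubit register.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],                 # control = first qubit, target = second
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

ket00 = np.array([1.0, 0.0, 0.0, 0.0])         # |00>

# Superposition: H on each qubit -> equal amplitude on |00>, |01>, |10>, |11>.
superposed = np.kron(H, H) @ ket00
print("superposition amplitudes:", superposed)          # [0.5 0.5 0.5 0.5]

# Entanglement: H on the first qubit, then CNOT -> Bell state (|00> + |11>)/sqrt(2).
bell = CNOT @ np.kron(H, I) @ ket00
print("Bell state amplitudes:   ", bell)                # [0.707 0 0 0.707]
```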

Quantum computers fall into four categories [1]:

  1. Quantum Emulator/Simulator
  2. Quantum Annealer
  3. Noisy Intermediate Scale Quantum (NISQ)
  4. Universal Quantum Computer – which can be a Cryptographically Relevant Quantum Computer (CRQC)

Quantum Emulator/Simulator

These are classical computers that you can buy today that simulate quantum algorithms. They make it easy to test and debug a quantum algorithm that someday may be able to run on a Universal Quantum Computer (UQC). Since they don’t use any quantum hardware, they are no faster than standard computers.

Quantum Annealer

A special-purpose quantum computer designed to run only combinatorial optimization problems, not general-purpose computing or cryptography problems. While they have more physical qubits than any other current system, they are not organized as gate-based logical qubits. Currently this is a commercial technology in search of a future viable market.

Noisy Intermediate-Scale Quantum (NISQ) computers.

Think of these as prototypes of a Universal Quantum Computer – with several orders of magnitude fewer qubits. They currently have 50-100 qubits, limited gate depths, and short coherence times. As they are several orders of magnitude short on qubits, NISQ computers cannot perform any useful computation; however, they are a necessary phase in the learning process, especially to drive total system and software learning in parallel with the hardware development. Think of them as the training wheels for future universal quantum computers.

Universal Quantum Computers / Cryptographically Relevant Quantum Computers (CRQC)

This is the ultimate goal. If you could build a universal quantum computer with fault tolerance (i.e., millions of error-corrected physical qubits resulting in thousands of logical qubits), you could run quantum algorithms in cryptography, search and optimization, quantum systems simulations, and linear equations solvers.

Post-Quantum / Quantum-Resistant Codes

New cryptographic systems would be secure against both quantum and conventional computers and can interoperate with existing communication protocols and networks. The symmetric key algorithms of the Commercial National Security Algorithm (CNSA) Suite were selected to be secure for national security systems usage even if a CRQC is developed. Cryptographic schemes that commercial industry believes are quantum-safe include lattice-based cryptography, hash trees, multivariate equations, and super-singular isogeny elliptic curves [1].

Difficulties with Quantum Computers [2]

•       Interference – During the computation phase of a quantum calculation, the slightest disturbance in a quantum system (say a stray photon or wave of EM radiation) causes the quantum computation to collapse, a process known as de-coherence. A quantum computer must be totally isolated from all external interference during the computation phase.

•       Error correction – Given the nature of quantum computing, error correction is ultra-critical – even a single error in a calculation can cause the validity of the entire computation to collapse.

•       Output observance – Closely related to the above two, retrieving output data after a quantum calculation is complete risks corrupting the data.

Ahmed Banafa, author of the books:

Secure and Smart Internet of Things (IoT) Using Blockchain and AI

Blockchain Technology and Applications

Quantum Computing

References

1.     https://www.linkedin.com/pulse/quantum-technology-ecosystem-explained-steve-blank/?

2.     https://www.bbvaopenmind.com/en/technology/digital-world/quantum-computing-and-ai/

3.     https://phys.org/news/2022-03-technique-quantum-resilient-noise-boosts.html

4.     https://thequantuminsider.com/2019/10/01/introduction-to-qubits-part-1/

Also read:

Facebook or Meta: Change the Head Coach

The Metaverse: A Different Perspective

Your Smart Device Will Feel Your Pain & Fear


Tesla: Canary in the Coal Mine
by Roger C. Lanctot on 04-17-2022 at 6:00 am


The automotive industry is tied up in knots over cybersecurity. Consumers expect their cars to be secure. Car makers spend millions on securing cars, but don’t know how, what, or whether to charge consumers for security.

Meanwhile, most cyber penetration reports to organizations such as the Auto-ISAC are related to enterprise attacks. The only cars being regularly hacked are Teslas. Tesla is effectively the automotive industry’s canary in a coal mine.

Like the proverbial canary in a cage in a coal mine – whose asphyxiation might serve as a warning to miners – the high-profile attacks on Teslas – the latest reported by a German teenager – are a persistent reminder of what is in store for the rest of the industry. While there have been other infamous hacks (like the Jeep hack of 2016), Tesla has been the target of everyone from teenagers to professional Chinese hacking organizations.

A consumer survey was released by vehicle infrastructure supplier Sonatus last week under the provocative headline: “Sonatus Survey Shows Majority of Consumers Would Spend Big to Alleviate Automotive Cybersecurity Concerns.” The survey found that “despite seemingly constant headlines about automotive cybersecurity breaches, over a third of respondents are not concerned about their vehicles being hacked.” Most automotive industry participants would consider that percentage of unconcerned respondents a little on the low side.

The Sonatus release continues: “Most of the surveyed consumers who did have cybersecurity concerns expressed a willingness to pay a premium for added security features, with nearly 60% of all consumers willing to spend at least $250, and 30% willing to spend at least $1,000.” This finding would be greeted with skepticism by most. The disconnect may reflect a definition of “security” within which Sonatus appears to have included vehicle theft.

Says the Sonatus press release: “With regards to specific concerns over what a hacker might do if they were to infiltrate a vehicle’s security system, consumers are most concerned about their vehicle being physically stolen, which is not something typically associated with cybercrime. 60% cited this as a key concern, compared to 55% that reported concerns of hackers gaining access to their personal data, 53% that have concerns about location tracking, and 52% that are concerned about hackers interfering with driving capabilities.”

Alas, Sonatus polluted its cybersecurity interest level findings (too high) with stolen vehicle and privacy violation concerns – after all, your phone is more likely to be tracked than your car. Cybersecurity is a problematic issue because consumers are less familiar with the likely scenarios associated with cyber vehicle crime – such as ransomware that might lock out a vehicle owner or brick the car by preventing it from being started.

While the spectacular Jeep hack, with its demonstration of remote control was alarming, anyone hacking a car is more likely to be after financial gain of some kind – not a remote joy ride of someone else’s car. What is really remote is the potential for a terrorist attack.

Most vehicle attacks in the news have been for sport and have typically involved disabling a car or remotely activating functions for fun. This contributes to the uneasy confidence of auto makers that continue to invest in hardening their vehicles and their networks in anticipation of an attack that has yet to materialize.

The low level of threat activity directed at vehicles is deceptive. With thousands of suppliers working with the typical auto maker, the level of vulnerability is extraordinarily high. This is especially so when taking into account networks of dealers and independent servicers.

You can add to the risk profile dozens of in-vehicle electronic control units, multiple in-vehicle networks, and a dozen or more wireless connections to the car. In addition, electric vehicles are not only interacting with network operating centers and telematics service providers, they are also plugging into the power grid.

The list of companies providing cybersecurity solutions is long and growing. These companies are targeting everything from in-vehicle gateways and ECUs to car maker network operating centers and engineering operations. At the same time, semiconductor suppliers themselves are building secure elements directly into their devices.

All of this points toward the dashboard-ization of vehicle management. Any car maker worth its welds is going to want a command center where the entire connected fleet can be monitored in real time for physical crashes or cyber penetrations. Some have had this in place for years.

But day after day it is Tesla bearing the brunt of vehicle-centric attacks, while legacy auto makers contend with hackers targeting their enterprise operations. The Sonatus survey highlights the growing awareness of cybersecurity among consumers, but it misrepresents the willingness of consumers to pay for cyber protection.

Consumers expect this protection and auto makers must provide it. In the end, it boils down to the value and reputation of a brand and how it is perceived by consumers. This is a question of consumer confidence, customer retention, and cost avoidance.

It’s time for auto makers to start establishing their cybersecurity credentials – along with theft and privacy protection. Tesla has established a reputation for paying hacker bounties for finding vulnerabilities and also for rapidly fixing them. It’s time to pay attention to that canary.

Also Read:

ISO 26262: Feeling Safe in Your Self-Driving Car

Chip Shortage Killed the Radio in the Car

A Blanche DuBois Approach Won’t Resolve Traffic Trouble


Podcast EP71: Critical Enablers for the Custom Silicon Revolution
by Daniel Nenni on 04-15-2022 at 10:00 am

Dan is joined by Dr. Elad Alon, CEO and co-founder at Blue Cheetah Analog Design. Elad’s experience includes Professor of EECS at UC Berkeley, co-director of the Berkeley Wireless Research Center, and consulting or visiting positions with many global semiconductor companies.

Dan and Elad explore the trends for increasing custom chip development, the complexity of the process and how analog mixed signal and chiplet strategies are becoming critical enablers for success.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


The Lost Opportunity for 450mm
by Scotten Jones on 04-15-2022 at 6:00 am

I spent several days this week at the SEMI International Strategy Symposium (ISS). One of the talks was “Can the Semiconductor Industry Reach $1T by 2030” given by Bob Johnson of Gartner. His conclusion was that $1 trillion is an aggressive forecast for 2030, but we should certainly reach $1 trillion in the next 10 to 12 years. He also noted that the industry would need to nearly double to achieve this forecast (a 73% increase in wafer output). He further forecast ~25 new memory fabs at 100K wafers per month (wpm) and 100 new logic or other fabs at 50K wpm (300mm). It immediately struck me: where are we going to build all these fabs, where will the people come from to run them, and where will we get the resources required? Wafer fabs are incredibly energy and water intensive and produce large quantities of greenhouse gases.

At the same conference there was a lot of discussion of environmental impact. Across the entire semiconductor ecosystem there is growing awareness and actions to reduce our environmental impact – reuse, reduce, recycle.

What does this have to do with 450mm wafers, you ask?

A 450mm wafer has 2.25 times the area of a 300mm wafer. If you build 450mm wafer fabs with the same wpm output as 300mm fabs, you need approximately 2.25 times fewer fabs (even fewer due to lower edge die losses): 25 memory fabs becomes 11 memory fabs and 100 logic or other fabs becomes 44 fabs. These are much more manageable numbers of fabs to build.
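
For clarity, the arithmetic behind those fab counts is just the wafer area ratio:

\[
\left(\frac{450\,\text{mm}}{300\,\text{mm}}\right)^{2} = 2.25,
\qquad
\frac{25\ \text{memory fabs}}{2.25} \approx 11,
\qquad
\frac{100\ \text{logic fabs}}{2.25} \approx 44.
\]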

If you look at the people required to run a fab, the headcount is largely driven by the number of wafers; by running fewer, bigger wafers, the number of people required is reduced.

When 450mm was being actively worked on, the goals were the same tool footprint for the same wafer throughput (likely not achievable) and the same chemical, gas, and utility usage per wafer, a 2.25x reduction in usage per unit area. There was a recognition that beam tools such as exposure, implant, and some metrology tools, where the wafer surface is scanned, would have lower throughput, but even accounting for this my simulations projected a net cost reduction per die of 20 to 25% for 450mm.

Unfortunately, the efforts to develop 450mm have ended and the only 450mm wafer fab has been decommissioned. The 450mm effort was different from past wafer size conversions: at 150mm Intel was the company that led the transition and paid for a lot of the work, and at 200mm it was IBM. At 300mm a lot of the cost was pushed onto the equipment companies, and they were left with a long time to recover their investments. At 450mm once again the costs were being pushed onto the equipment companies, and they were very reluctant to accept this situation. In 2014 Intel (one of the main drivers of 450mm) had low utilization rates and an empty fab 42 shell, so they pulled their resources off 450mm, TSMC backed off, equipment companies put their development efforts on hold, and 450mm died.

At this point it is likely too late to revive 450mm; ASML has its hands full just trying to produce enough EUV systems and getting high-NA into production. High-NA EUV systems for 300mm are already enormous, difficult-to-transport systems, and making much bigger 450mm versions would be an unprecedented engineering challenge. I do think there is an important lesson for the semiconductor industry here. The semiconductor companies have a long history of short-sighted squeezing of their suppliers on price, often to their own long-term detriment. Starting wafers are an excellent example: prices have been driven down so low that it isn’t economical for the wafer manufacturers to invest in new capacity, and now the industry is facing shortages. It is only shortage-driven price increases that are now finally making new investment economical.

Over the next decade, as we potentially double our industry while trying to reduce our environmental footprint, our task would be much easier with 450mm wafers, but unfortunately our inability to work together and unwillingness to take a long-term view has left us without this enhancement in our tool kit.

Also Read:

Intel and the EUV Shortage

Can Intel Catch TSMC in 2025?

The EUV Divide and Intel Foundry Services