
Podcast EP68: The Foundation of Computational Electromagnetics

by Daniel Nenni on 03-25-2022 at 10:00 am

Dan is joined by Dr. Matthew Commens, product manager at Ansys. Matt discusses an upcoming webinar series on the inner workings and capabilities of Ansys simulation software. How the series began, its impact and benefits, and a view of the future are all covered.

Webinar Series: Learn the Foundation of Computational Electromagnetics

Dr. Matthew Commens, Principal Product Manager, HF at Ansys, Inc., first joined Ansys in 2001, working for Ansoft as an application engineer specializing in high frequency electromagnetic simulation. He is responsible for the strategic product direction of Ansys HFSS and is a recognized expert in the application of computational electromagnetics. Prior to joining Ansys he worked as an antenna designer and simulation manager at Rangestar Wireless in Aptos, CA and as a nuclear magnetic resonance (NMR) probe designer at Varian Inc. in Palo Alto, CA. He is co-author on five patents in the areas of NMR, antenna design and electromagnetic simulation and holds a Ph.D. in Physics from Washington University in St. Louis, MO, and a B.S. in Physics from the University of Missouri-Rolla (now Missouri University of Science and Technology).

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


WEBINAR: Overcome Aging Issues in Clocks at Sub-10nm Designs

by Daniel Nenni on 03-25-2022 at 8:00 am


We all know that designers work hard to reach design closure on SoC designs. However, what gets less attention from consumers is the effort that goes into ensuring that these chips will remain fully operational and meet timing specs over their projected lifetime. Of course, this is less important for chips used in devices with projected lifespans of a few years, such as cell phones. Yet aging is a major issue for designs that go into applications that call for many years or even decades of operation. These include medical devices, aerospace, military, automotive, infrastructure and many more. Looking at the list above, it should also be clear that many of these applications have implications for human safety. A broken cell phone is one thing; a malfunctioning aviation or automotive control system is quite another.

Verifying that a design meets timing specification, including clock tree skew, slew and jitter across process corners, while difficult, is a well understood process, with tools and methodologies available to support it. Evaluating if a chip has been designed to operate after 10 or 20 years of aging is a far more complex task, but an essential one. Frequently designers resort to guard banding to compensate for future aging effects. However, due to the nature of the processes involved in aging, simply adding timing margin may not be sufficient.

In fact, seemingly disconnected decisions about clock gating methods can have big effects on how aging manifests in older designs. Infinisim, a leading provider of clock tree analysis solutions, discusses the ins and outs of aging and how it can be minimized and simulated before tape-out in a white paper titled “CMOS Transistor Aging and its impact on sub 10nm Clock Distribution”. The clock tree plays a critical role in aging and is a good place to start when looking to minimize aging effects.

The Infinisim webinar Overcome aging issues in clocks at sub-10nm designs will cover the challenges associated with aging and the limitations of existing methodologies, and will provide a strategy to increase aging verification coverage. An advanced clock analysis methodology to minimize the impact of aging through increased verification coverage will be presented. The target audience is clock architects, clock designers, and timing verification engineers.

Abstract:
Aging is becoming a severe threat to integrated circuits (ICs), leading to field failures as transistor sizes continue to decrease. Aging significantly affects the ability of transistors to maintain their operational characteristics, and if not thoroughly analyzed, aging will eventually slow down the device and cause circuit failure. In previous design processes at 32nm, 14nm, and even 10nm, clock aging was relatively easily accounted for by guard banding on frequency, slew rates, and other design parameters. With sub-10nm processes we find this is no longer the case, and now more than ever, designers must ensure proper aging analysis of their clocks and implement design mitigations.

Presenter:
Roy Reyes has over 20 years of experience designing clocks for integrated circuits. He was the clock designer for three major CPUs and several systems on chip (SoCs) at Intel. Roy has a master’s degree from the University of Miami and completed several other engineering graduate classes at Virginia Tech, among other top schools. He has a patent in optical computing and has published papers with the Department of Defense, Applied Optics, and SPIE (the Society for Optics and Photonics). He has received multiple awards at Intel and while working for the Department of Defense.

Register Here: Overcome aging issues in clocks at sub-10nm designs

Also read:

White Paper: A Closer Look at Aging on Clock Networks

WEBINAR: Challenges in analyzing High Performance clocks at 7nm and below process nodes


Electronics, COVID-19, and Ukraine

by Bill Jewell on 03-25-2022 at 6:00 am

Electronics Production 2022

The outlook for electronics and semiconductors in 2022 is uncertain. Just as the world was returning to more normal conditions after (hopefully) the worst of the COVID-19 pandemic, Russia invaded Ukraine in February.

The International Monetary Fund (IMF) in a March 15 blog post asserted the Russian invasion of Ukraine will impact the global economy through three channels:

High prices for energy and other commodities will dampen demand.
Trade and supply chains in neighboring countries will be disrupted.
Investor uncertainty will tighten financial conditions.

The IMF’s January 2022 forecast called for world GDP growth to moderate from the pandemic-recovery rate of 5.9% in 2021 to more sustainable rates of 4.4% in 2022 and 3.8% in 2023. In its March 15 blog, the IMF stated it would revise its GDP forecasts downward in its April update. The January forecast had Russia’s GDP growing 2.8% in 2022. After the Ukraine invasion, international sanctions on Russia will certainly drive its economy into recession. The three factors listed above will likely lower the economic outlook for most nations.

The impact on the electronics and semiconductor sectors will be indirect. There is no significant semiconductor or electronics manufacturing in Russia, Ukraine, or neighboring countries. However, lower overall demand in 2022 will reduce electronics demand to some extent.

The major electronics producing countries in Asia have mostly returned to normal growth. South Korea, which had a minimal pandemic related slowdown, is the strongest with three-month-average growth over 20% for the last eight months through January 2022. China is back to pre-pandemic growth rates in the 12% to 13% range. Taiwan also did not see a significant pandemic slowdown and is growing 13%. Vietnam electronics production declined in autumn 2021 due to pandemic-related shutdowns, but it returned to growth in February 2022. Japan had production declines at the end of 2021 following pandemic recovery growth of over 10% earlier in the year. Japan is continuing a long-term slowing of electronics production due to shifts to lower-cost Asian countries.

United States electronics production growth has been in the 4% to 5% range for the last five months through January 2022, above the pre-pandemic range of minus 1% to plus 2% in 2019. The 27 countries of the European Union (EU 27) showed a 14% decline in January 2022 after robust growth in the 30% to 40% range in the first half of 2021. In addition to pandemic effects, the EU 27 has benefited from a shift of production from the United Kingdom after Brexit. The UK has experienced flat to declining electronics production over the last three years, except for pandemic-recovery growth from April through August 2021.

Comparing data from the fourth quarter of 2021, when the world was in pandemic recovery, to the fourth quarter of 2019 (pre-pandemic) shows the trends by country and product. The chart shows the change in electronics production in local currency for 4Q 2021 versus 4Q 2019. The EU 27 and South Korea had growth over 30%. China, Taiwan, and Vietnam grew in the mid-teens to low-20% range. U.S. growth was a moderate 8%. Japan and the UK both had 9% declines as production shifted from Japan to other Asian countries and from the UK to EU countries.

PC unit shipments grew 28% from 4Q 2019 to 4Q 2021 driven by more people working and learning from home during the pandemic. Smartphone unit shipments were basically flat, down 2%.

The electronics industry appears to have largely overcome the pandemic slowdown and is back on track for steady growth. However, the Russia-Ukraine war is causing uncertainty for the global economy. The effect on electronics will likely be noticeable but not significant.

Also read:

Semiconductor Growth Moderating

COVID Still Impacting Electronics

2021 Finishing Strong with 2022 Moderating


Analog Design Acceleration for Chiplet Interface IP

by Tom Simon on 03-24-2022 at 10:00 am


Compared to the automation of digital design, the development of automation for analog has taken a much more arduous path. Over the decades there have been many projects, both academic and commercial, to accelerate and improve the process for analog design. One of the most interesting efforts in this area is being spearheaded by Blue Cheetah Analog Design. They are taking a new approach to deliver no-compromise PPA analog IP with standard integration collateral. I had a chance to talk to Blue Cheetah President and Co-Founder Krishna Settaluri recently about what sets them apart.

Analog Generators Boost Designer Productivity

At the core of their analog IP offering is the use of generators to rapidly deliver customized analog IP. They acknowledge that analog design still needs to be done by skilled analog designers and that generators are not a panacea. Rather, expert designers can leverage their generator technology, which is based on work spearheaded by Chief Scientist and Co-Founder Eric Chang, to allow the rapid development of production ready analog blocks.

One calculated move that Blue Cheetah has made is to focus initially on the chiplet interface market. This is a rapidly growing market that is undergoing early efforts at standardization. Blue Cheetah is a proponent of open standards, and CEO and Co-Founder Elad Alon has been a strong voice in various standards committees. In this space, the Open Domain-Specific Architecture (ODSA) group’s Bunch of Wires (BOW), Intel’s AIB, Open HBI, and the recently emerged UCIe are the leading evolving standards. These all reflect the consensus that parallel interfaces make more sense than serial links for communicating between die that are within the same module or package.

Serialization and deserialization add unnecessary complexity for in-package links where there are sufficient numbers of bumps to send data in parallel. To increase efficiency the PHY layer for parallel interfaces can be stripped down to make the individual bit-lines as small as possible. For instance, BOW uses just two optional wires for FEC and AUX per 16 bits of interface and relies on a separate I2C or SPI link for control. Also, BOW offers the option of forgoing termination when it is used for short links. Of course, BOW can also be used with termination on links up to 20mm channel reach – which Blue Cheetah’s generators provide as a design choice.

Blue Cheetah started with the premise that generator technology, based on the Berkeley Analog Generator (BAG) framework initially developed at UC Berkeley, can meaningfully change the nature of analog design. Analog designers are just as important as ever, but generators let them work more efficiently and allow for much easier adjustment of design parameters and migration between processes. Blue Cheetah generators use what they call primitives to capture what is needed for a particular device in a specific process node. Generators allow the rapid creation of the heterogeneous-process PHYs often needed within a single module/package.

Blue Cheetah generators help with circuit and physical design efficiency resulting in fully characterized and optimized output. This makes it possible for designs to go from spec to sign-off quality GDS in their flow. Blue Cheetah’s customers receive the industry standard set of design views needed to integrate the PHY IP into their design, from simulation models to physical layout. The generators make it easy to adjust the IP during the design process to adapt to design changes such as ECOs, etc.

Blue Cheetah’s vision is for horizontalization of the analog mixed signal blocks that are needed as foundational IP. High quality commercialized custom IP for these building blocks can be a market changer for the industry. Starting with chiplet interface IP makes a lot of sense because it is a rapidly growing market with few entrenched proprietary vendors. Blue Cheetah generator technology looks like a realistic blending of automation and skilled designer input to produce no-compromise analog blocks. More information about Blue Cheetah is available on their website.

Also read:

Blue Cheetah Technology Catalyzes Chiplet Ecosystem

Podcast Episode 23: What are chiplets and why are they gaining popularity?

Alphawave IP and the Evolution of the ASIC Business



Experimenting for Better Floorplans

by Bernard Murphy on 03-24-2022 at 6:00 am


There is sometimes an irony in switching to a better solution in design construction or analysis. The new approach is so much better that you want to experiment to further optimize the design. Which then exposes another barrier to enjoying that newfound freedom. SoC design teams often find this when switching from crossbar interconnect architectures to network-on-chip (NoC) architectures. NoCs are much more effective in using floorplan space between blocks while still meeting latency and bandwidth goals. After all, this is the reason many design teams switch.

Then they want to further squeeze out area (and cost) by restructuring the design. Some restructuring in implementation is common, but these teams want to be more aggressive, slicing, dicing and merging blocks over multiple trials. That’s where they hit a barrier. Implementation flows aren’t built for this kind of manipulation. You decide up-front how you want to split logical hierarchy, and you stick to that. If you want a different partitioning, you must redesign the RTL hierarchy, a possible but painful manual process. Surely automation should be possible for this kind of floorplan experimentation?

Why is this problem hard?

Hierarchy is a natural part of any large SoC design. Legacy functions, internal and external IPs, reusable subsystems, and division of labor between RTL design teams all contribute to hierarchy. Which is very useful for many purposes in verification and implementation but not easy to change if you want to experiment with partitioning.

What does repartitioning look like? Perhaps you have a big logical block which looks rather bulky in the floorplan. You might open more floorplan options if you split that into two pieces. Doesn’t seem difficult. Manually create two blocks with half the instances in one and half in the other. But now, you must ensure you maintain the same logical connectivity. In effect, rubber-banding connections across the split, through all the appropriate RTL constructs – new ports and connections as required. Looks manageable in a simple case, maybe, but add in tie-offs, opens, feedthroughs and other complications and this quickly becomes challenging. Further, the best place to split a block may not be in the middle. Some splits may be more effective than others. You need to experiment.
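The connectivity bookkeeping described above can be illustrated with a toy netlist model (all instance and net names here are hypothetical, and a real EDA netlist is far richer): every net whose pins land on both sides of a split needs a new port pair rubber-banded across the cut, so different candidate splits imply different amounts of new-port surgery.

```python
# Toy illustration (not a real EDA flow): when a block is split into two
# partitions, every net that crosses the cut must be "rubber-banded"
# through new ports on both new blocks.

def cut_nets(nets, partition_a):
    """Return nets whose pins land in both partitions after a split.

    nets        -- dict mapping net name -> set of instance names on that net
    partition_a -- set of instance names assigned to the first new block
    """
    crossing = {}
    for net, pins in nets.items():
        in_a = pins & partition_a
        in_b = pins - partition_a
        if in_a and in_b:          # net spans the cut: needs a new port pair
            crossing[net] = (in_a, in_b)
    return crossing

# A tiny hypothetical block with four instances and three nets.
nets = {
    "clk":  {"u_div", "u_buf", "u_core"},
    "data": {"u_core", "u_io"},
    "rst":  {"u_div", "u_core"},
}
split_a = {"u_div", "u_buf"}       # one candidate split to evaluate

crossing = cut_nets(nets, split_a)
# Each crossing net implies a new output/input port pair plus RTL edits --
# exactly the bookkeeping that makes manual repartitioning so error-prone.
print(sorted(crossing))            # -> ['clk', 'rst']
```

Re-running this for each candidate split is trivial in a sketch like this; doing the equivalent by hand in RTL, including tie-offs, opens and feedthroughs, is the barrier the article describes.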

But wait – there’s more

Simple splits are only a start. Think of pushing a block into another block or popping a block out. Think of the grouping and ungrouping functions we find so useful in PowerPoint. All great ways to build new experiments, but each new experiment requires more painstaking manual reconstruction of the connectivity (and lots of opportunity for mistakes) to go along with those changes. Followed by equivalence checking to make sure you didn’t break anything.

Schedules and patience run out quickly when repartitioning is this difficult. Which obviously allows little opportunity to make significant improvements in the layout. There must be a better way.

Automating repartitioning

This problem is a natural for automation. It is tedious and complex, with thousands of things to check and get exactly right, but the intent behind all those changes is simple enough to describe. This capability is built into the Arteris IP SoC & HSI Development Platform and is in production use in many leading SoC design shops today. It sees extensive use, particularly in support of these floorplan repartitioning experiments, and also in support of repartitioning for low-power optimization. If you have suffered through enough manual repartitioning, you might want to give them a call.

Also read:

An Ah-Ha Moment for Testbench Assembly

Business Considerations in Traceability

Traceability and ISO 26262


The EUV Divide and Intel Foundry Services

by Scotten Jones on 03-23-2022 at 10:00 am

Intel IDM 2.0 Process Roadmap
The EUV Divide

I was recently updating an analysis I did last year that looked at EUV system supply and demand. While doing this, I started thinking about Intel and their fab portfolio.

If you look at Intel’s history as a microprocessor manufacturer, they are typically ramping up their newest process node (n), in volume production on their previous node (n-1), and ramping down the node before that (n-2). They don’t typically keep older nodes in production; for example, last year 10nm was n, 14nm was n-1 and 22nm was n-2. Intel had some 32nm capacity in Fab 11X, but that has now been converted to a packaging fab. This contrasts with someone like TSMC, which built their first 130nm, 300mm fab in 2001 and is still running it, plus their 90nm, 65nm, 40nm and 28nm fabs as well.

By the end of 2022 Intel should be ramping up their 4nm node, then in 2023 their 3nm node, and in 2024 their 20A (2nm) and 18A (1.8nm) nodes should ramp up. All of those are EUV based nodes and it would seem reasonable that by the end of 2024 Intel would have little use for non-EUV based processes for microprocessor production since their 7nm/10nm non-EUV nodes would be n-4/n-5 depending on how you treat 10nm/7nm.

If I look at Intel’s current and planned fab portfolio, there are EUV-capable fabs and older fabs that are unlikely to ever be used for EUV. In fact, EUV tools require an overhead crane, and many of Intel’s older fabs would likely require significant structural modifications to accommodate this. Meanwhile, Intel is building 9 EUV-based production fabs.

The following is a site by site look at Intel’s fabs:

  • New Mexico – Fab 11X phases 1 and 2 are Intel’s oldest production fabs and they are being converted to packaging Fabs. 11X-3D may continue to operate for 3D Xpoint. Intel recently discussed two more generations of 3D Xpoint and this is currently the only place to make it.
  • Oregon – Fab D1X phases 1, 2 and 3 now lead all of Intel’s EUV-based development and early production. Fabs D1C/25 and D1D are older development/production fabs that are unlikely to be converted to EUV and are currently being used for non-EUV production.
  • Arizona – Fabs 52 and 62 are EUV fabs under construction. Fab 42 is currently running non-EUV nodes, but it was built as an EUV-capable fab and will likely be used for EUV someday. Fabs 12 and 32 are production fabs running non-EUV nodes and will likely never be converted to EUV.
  • Ireland – Fab 34 is an EUV fab under construction with equipment currently being moved in, this will likely be Intel’s first 4nm EUV node production site. Fabs 24 phases 1 and 2 are non-EUV production sites and will likely never be used for EUV (unless they get combined with Fab 34 at some point).
  • Israel – Fab 38 is an EUV fab under construction and will be a 4nm EUV node production site. Fab 28 phases 1 and 2 are non-EUV node production sites and will likely never be used for EUV (unless they get combined with Fab 38 at some point).
  • Ohio – Silicon Heartland EUV based fabs 1 and 2 are in the planning stage.
  • Germany – Silicon Junction EUV based fabs 1 and 2 are in the planning stage.

In summary, Intel is in various stages of running, building, or planning the following EUV-based fabs: D1X phases 1, 2 and 3; Fabs 42, 52, and 62; Fab 34; Fab 38; Silicon Heartland 1 and 2; and Silicon Junction 1 and 2. That is 3 development fabs/phases and 9 EUV-based production fabs.

For non-EUV fabs still running, Intel has D1C/25, D1D, Fabs 12 and 32, Fab 24 phases 1 and 2, and Fab 28 phases 1 and 2. That is 8 non-EUV production fabs. This really puts into perspective why Intel would want to get into the foundry business and support trailing-edge processes. All these fabs can be used to produce any of Intel’s non-EUV 10nm/7nm and larger processes, plus, likely with reasonable changes in equipment sets, any of the processes they will be acquiring through the Tower acquisition.

Déjà vu all over again

Yogi Berra is famous for being humorously quotable, and one of his famous quotes is “it’s déjà vu all over again”.

The last time Intel tried to get into the foundry business, they failed to gain much traction. Foundry was a second-class citizen at Intel; they didn’t have the design ecosystem, and they eventually exited the foundry business. One of the things that bothered me about Intel’s effort, and in my opinion sent a message to foundry customers that foundry was second class, was that Intel would develop a new process node (for example, 32nm), introduce a high-performance version for internal use, and then a year later introduce the foundry (SOC) version.

Recently I saw an interview with Pat Gelsinger where he talked about 4nm being an internal process for Intel and then 3nm being the foundry version. 3nm is currently expected to come out approximately a year after 4nm. He then talked about 20A as an internal process and 18A as the foundry version. 18A is due to come out 6 to 9 months after 20A. I don’t think foundry customers will accept always being 6 to 12 months behind the leading edge and I think it sends the wrong message. He did say if a foundry customer really wanted to use 4nm they could, but he seemed to view 4nm and 20A as processes that should be tested internally before the next version is released more widely.

I do think Intel has an interesting opportunity. There is a shortage of foundry capacity at the trailing edge, where Intel will likely be freeing up a lot of fab capacity, and there is a shortage at the leading edge as well. In addition to that, there is a need for a second source at the leading edge. Samsung has a long history of over promising and under delivering on technology and yield. Companies like Qualcomm have repeatedly tried to work with Samsung so they aren’t wholly dependent on TSMC, and have been repeatedly forced back to TSMC. The latest example is Qualcomm’s Snapdragon 8 Gen 1, which is reported to have only 35% yield on Samsung’s 4nm node. If Intel can execute on their technology roadmap on a consistent basis with good yield, they can likely pick up a lot of second-source and maybe even some primary-source leading-edge business, particularly at Samsung’s expense. I could even see a company like Apple giving Intel some designs to strengthen their negotiating position with TSMC. I wouldn’t expect MediaTek (a Taiwanese company located near TSMC), or AMD or NVIDIA (due to competitive concerns), to work with Intel, but never say never.

EUV shortage

As I mentioned at the outset, it is an EUV supply-and-demand analysis I have been doing that triggered these EUV gap ideas. As I outlined above, Intel plans to build out and equip 9 EUV-based fabs. At the same time, TSMC 5nm is widely believed to have ended 2021 at 120 thousand wafers per month of capacity. TSMC has announced they expect to double their end-of-2021 5nm capacity by the end of 2024, and that is before the Arizona 5nm fab comes online. TSMC has talked about 3nm being an even bigger node than 5nm. TSMC has also started planning a 4-phase 2nm fab, with a second site in discussion. Samsung started using EUV for one layer on their 1z DRAM and then 5 layers on their 1a DRAM. Samsung is planning a new EUV-based logic fab in Texas and is building out logic and DRAM capacity in Pyeongtaek. SK Hynix has started using EUV for DRAM, Micron has pulled in their DRAM EUV use from the delta to the gamma generation, and even Nanya is talking about using EUV for DRAM. This raises the question: will there be enough EUV tools available to support all these needs? My analysis is that there won’t be.

In fact, I believe there will be demand for 20 more EUV tools than ASML can produce in each of the next 3 years. To put that in perspective, ASML shipped 42 EUV systems in 2021 and is forecasting 55 systems in 2022. Interestingly, I saw a story today in which Pat Gelsinger commented that he is personally talking to the CEO of ASML about system availability and admitted that EUV system availability will likely gate the ability to bring up all the new fabs.
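Taken at face value, a 20-system annual shortfall compounds quickly; a back-of-envelope sketch makes the cumulative gap concrete. Only the 2022 supply figure (55 systems) comes from the article; the 2023 and 2024 supply numbers, and therefore the demand numbers, are illustrative assumptions, not ASML or analyst data.

```python
# Back-of-envelope EUV supply/demand gap. 2022 supply is from the article;
# 2023-2024 supply figures are illustrative assumptions only.
supply = {2022: 55, 2023: 60, 2024: 65}   # systems shipped per year
GAP_PER_YEAR = 20                          # article's estimated annual shortfall

demand = {year: s + GAP_PER_YEAR for year, s in supply.items()}
cumulative_shortfall = sum(demand[y] - supply[y] for y in supply)

for year in supply:
    print(f"{year}: demand ~{demand[year]}, supply ~{supply[year]}, "
          f"gap {demand[year] - supply[year]}")
print(f"Cumulative unmet demand over three years: ~{cumulative_shortfall} systems")
```

At roughly one EUV fab's worth of scanners per a handful of missing tools, a cumulative gap of ~60 systems is enough to delay multiple fab ramps, which is consistent with Gelsinger's comment above.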

Another impact the EUV system shortage will drive is a different view of which layers to use EUV on. If a layer is currently done with multi-patterning more complex than double patterning, EUV is generally cheaper. EUV also enables simpler design rules, more compact layouts, and potentially better performance. EUV will be even more important as the switch is made to horizontal nanosheets. I believe companies will be forced to prioritize EUV use for the layers where it has the most impact and continue to use multi-patterning for other layers.

Also Read

Intel Evolution of Transistor Innovation

Intel Discusses Scaling Innovations at IEDM

Intel 2022 Investor Meeting


Webinar: Simulate Trimming for Circuit Quality of Smart IC Design

by Daniel Nenni on 03-23-2022 at 6:00 am


Advanced nanometer semiconductor technology nodes, together with smart IC design, today enable very complex and powerful systems for communication, automotive, data transmission, AI, IoT, medical, industrial, energy harvesting, and many more applications.

However, increasingly aggressive time-to-market and higher performance requirements force IC designers to look for advanced and seamless design flows, with tools and methodologies to overcome these challenges. In this context, for high-precision circuit applications, quality trimming is becoming a very important step before tape-out, because the increased performance variation induced by process statistics cannot be reduced at the design level alone.

Most of today’s trimming applications are based on Monte Carlo analysis to ensure that a trimming step is executed for each simulation sample. Unfortunately, most of the time this task requires custom scripts to set up the right sequence of multiple simulations, and these are not reliable for high-sigma robustness applications (beyond 3 sigma) with long-tail distributions. MunEDA provides an enhanced Dependent Test feature for circuit trimming within its EDA design tool suite WiCkeD. For each simulation sample, this ensures an easy-to-use and seamless trimming procedure as well as controlled switching of operating conditions, suitable for circuit verification, high-sigma robustness, and circuit sizing/optimization.

In this webinar we’ll discuss typical applications of trimming methods, and will show how to set up sequences of multiple dependent simulations in MunEDA’s WiCkeD circuit sizing & verification tool suite. We’ll then discuss how process variation, local variation, temperature and Vdd variation, have to be treated differently for a correct analysis result. The measurements before trimming are usually taken at fixed operating conditions, whereas after trimming the circuit has to work at multiple operating conditions.

For documentation purposes, the performance variation with and without trimming has to be simulated. Simulation of the trimming procedure involves different methods of calculating trimming settings from initial measurement results. We’ll discuss ways to set up scripts, Verilog code or use simulator outputs to decide on trimming settings.

After setting up the simulation procedure, a typical analysis step is standard Monte Carlo. But since the simulation setup in MunEDA WiCkeD is general and not limited to Monte Carlo, it can run sensitivities, optimization, and high sigma robustness analysis with the trimmed circuit as well.

The topic of parametric high-sigma analysis is especially interesting for post-trimming performance analysis, because the distribution shape of trimmed performance metrics often deviates significantly from the normal distribution.
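The per-sample trimming loop and the resulting non-normal distribution can be illustrated with a generic sketch (this is not MunEDA WiCkeD's actual flow; the trim DAC resolution, step size, and offset statistics are all hypothetical): each Monte Carlo sample gets a pre-trim "measurement", a trim code is chosen from it, and the post-trim performance is re-evaluated.

```python
# Generic Monte Carlo trimming sketch (hypothetical numbers throughout):
# for every process sample, a pre-trim measurement selects the trim code,
# then the post-trim residual is computed. The post-trim distribution is
# bounded by the trim step, so it is distinctly non-Gaussian.
import random
import statistics

random.seed(1)

TRIM_CODES = range(-16, 16)   # 5-bit trim DAC (hypothetical)
TRIM_STEP = 0.5               # mV of offset correction per code (hypothetical)

def trim_sample(raw_offset_mv):
    """Choose the trim code that best cancels the measured offset."""
    code = min(TRIM_CODES, key=lambda c: abs(raw_offset_mv + c * TRIM_STEP))
    return raw_offset_mv + code * TRIM_STEP   # post-trim residual offset

# Monte Carlo: process variation gives each sample a random input offset.
raw = [random.gauss(0.0, 3.0) for _ in range(10_000)]
trimmed = [trim_sample(x) for x in raw]

print(f"pre-trim  sigma: {statistics.stdev(raw):.2f} mV")
print(f"post-trim sigma: {statistics.stdev(trimmed):.2f} mV")
print(f"post-trim range: [{min(trimmed):.2f}, {max(trimmed):.2f}] mV")
# Most post-trim values are confined to +/- TRIM_STEP/2 (near-uniform),
# while rare samples beyond the DAC's correction range form long tails --
# exactly the kind of distribution where high-sigma analysis matters.
```

The long tails come from samples whose raw offset exceeds what the trim DAC can correct, which is why extrapolating a Gaussian fit from the trimmed distribution's core would badly misestimate high-sigma yield.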

Here is the REPLAY

The speaker is Michael Pronath. I have known Michael for many years and enjoy working with him. He has a PhD in Electronics and has been with MunEDA since the beginning. Today he is VP of Products and Solutions. With 20+ years of field experience, Michael is an excellent speaker and absolutely worth listening to.

Also Read

Webinar: AMS, RF and Digital Full Custom IC Designs need Circuit Sizing

WEBINAR: Using Design Porting as a Method to Access Foundry Capacity

Numerical Sizing and Tuning Shortens Analog Design Cycles


Co-Developing IP and SoC Bring Up Firmware with PSS

by Kalar Rajendiran on 03-22-2022 at 10:00 am


With ever more challenging time-to-market requirements, co-developing IP and firmware is imperative for all system development projects. But that doesn’t make the task any easier; the more complex the system being developed, the tougher the task gets. For example, different pieces of IP may be the output of various teams distributed geographically. Some pieces of IP may be sourced from third-party IP suppliers. SoC integration testing not only has to contend with the quality and reliability of the IP blocks but also with testing various operating scenarios. And then there is the matter of verification.

Verification of complex SoC designs involves multiple platforms including emulation and FPGA prototyping. Each platform usually requires its own way of specifying the tests. This results in expending a lot of time and effort in recreating the same test information for various platforms for the same project. What if there is a way to describe the verification intent as a single abstract specification? Accellera Systems Initiative (Accellera) developed the Portable Test and Stimulus Standard (PSS) to address this need. Accellera is an independent, not-for-profit organization dedicated to create, support, promote, and advance system-level design, modeling, and verification standards for use by the worldwide electronics industry.

Portable Test and Stimulus Standard

The following is a description from the Accellera website.

“The Portable Test and Stimulus Standard (PSS) defines a specification to create a single representation of stimulus and test scenarios usable by a variety of users across many levels of integration under different configurations. This representation facilitates the generation of diverse implementations of a scenario that run on a variety of execution platforms, including, but not necessarily limited to, simulation, emulation, FPGA prototyping, and post-silicon. With this standard, users can specify a set of behaviors once and observe consistent behavior across multiple implementations.”

Can PSS help with co-developing IP and SoC bring-up firmware? This is the topic of a talk by Matthew Ballance from Siemens EDA at the 2022 DVCon. He presents details of leveraging PSS for co-developing and co-verifying IP and firmware that eases SoC integration testing challenges. Matthew is a senior principal product engineer and portable stimulus technologist. The following is a synopsis of the salient points from his talk.

While it is straightforward to write directed and constrained-random UVM sequences to exercise IP behavior, integration tests at the SoC level are more complicated. With many different pieces of IP being integrated, there are many scenarios to exercise, and ensuring the interoperability of multiple drivers becomes a challenge.

Interoperability Framework

While the need for interoperability exists, there is no need to reinvent the wheel to implement a framework. A real-time operating system (RTOS) provides exactly such a framework, since it already has to manage multiple drivers from different sources. An RTOS is also designed to run at very low power on constrained resources. Matthew uses Zephyr, an open-source RTOS, in his presentation. For a typical test configuration, the memory footprint of the Zephyr RTOS image is around 8KB, which is attractive for an SoC verification environment.

Creating Drivers

Creating device drivers in C calls upon the same knowledge and skills that UVM sequence writers use with SystemVerilog. The Zephyr RTOS specifies a format for drivers in the system and defines the data types and structures that fit into the driver framework, which makes it easy to configure the various aspects of a driver. Zephyr defines several standard APIs for common devices such as DMAs or timers, but custom APIs are supported as well. Refer to Figure below for a sample piece of driver code.

PSS Building Blocks

A PSS model has two parts: the first addresses the test intent, and the second handles the test realization. This structure allows scenarios to be modeled with high-level abstractions in the test-intent section, reserving the low-level details for the test-realization section. It also makes it easy to create multiple tests, much as in SystemVerilog, where changing the seed creates new test variants. What is handed off to the SoC team includes not just the RTL deliverables but also the PSS code along with the driver code. At the SoC level, all IP blocks and firmware are integrated and managed under the Zephyr RTOS framework.

Refer to Figure below for a sample piece of PSS code. The bottom portion of this code segment is a call to the driver code. In this example, it is an action performing a memory-to-memory transfer on a DMA channel. The call contains the channel ID and spells out the resource that the transfer requires.
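As a rough sketch of what such an action can look like (hypothetical names, not the code in the figure), a PSS action can lock a channel resource from a pool and, in its target-template exec block, call down into the C driver:

```
// Hedged illustration only: component, pool, and function names are
// invented for this sketch.
component dma_c {
    resource channel_r { }        // one claimable DMA channel
    pool [8] channel_r channels;  // eight channels available
    bind channels *;

    action mem2mem_a {
        lock channel_r channel;   // exclusive claim on a channel
        rand bit[32] size;
        constraint size_c { size in [64..4096]; }

        // Test realization: expand to a call into the C driver,
        // passing the claimed channel's ID and the transfer size.
        exec body C = """
            dma_mem2mem({{channel.instance_id}}, {{size}});
        """;
    }
}
```

The `lock` claim is what lets the PSS tool schedule many such actions concurrently without two tests colliding on the same channel.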

Summary

Verification at the SoC level can be performed efficiently using PSS. A key requirement for this is, of course, making the lower-level driver firmware available to be called via PSS actions. More details about creating device drivers and connecting PSS to them can be found in a technical paper authored by Matthew. Zephyr RTOS is one way to enable multiple firmware modules to interoperate. Co-developing IP and firmware with this approach expands the list of IP deliverables: in addition to the usual RTL, driver firmware and reusable PSS code are also handed over to the SoC verification team.

Matthew’s DVCon talk can be accessed here by registering at the DVCon website.

Matthew’s presentation slides can be downloaded here.

Matthew’s technical paper can be downloaded here.

Also read:

Leveraging Virtual Platforms to Shift-Left Software Development and System Verification

Using a GPU to Speed Up PCB Layout Editing

Dynamic Coherence Verification. Innovation in Verification


Optimizing AI/ML Operations at the Edge

by Tom Simon on 03-22-2022 at 6:00 am


AI/ML functions are moving to the edge to save power and reduce latency, enabling local processing without the overhead of transmitting large volumes of data over power-hungry, slow communication links to servers in the cloud. Of course, the cloud offers high performance and capacity for processing these workloads. Yet if they can be handled at the edge, albeit with reduced processing power, there is still likely to be a net advantage in power and latency. In the end it boils down to the performance of the edge-based AI/ML processor.

As a Codasip white paper points out, embedded devices are typically resource constrained. Without the proper hardware resources, AI/ML at the edge is not feasible. The white paper, titled “Embedded AI on L-Series Cores,” states that conventional microcontrollers, even those with FP and DSP units, are hard-pressed to run AI/ML. Even with SIMD instructions, more is required to achieve good results.

Google’s introduction of TensorFlow Lite for Microcontrollers (TFLite-Micro) in 2021 has opened the door to edge-based inference on hardware targeted for IoT and other low-power, small-footprint devices. TFLite-Micro uses an interpreter with a static memory planner. Most importantly, it also supports vendor-specific optimizations. It runs out of the box on just about any embedded platform, delivering operations such as convolution, tensor multiplication, resize, and slicing. And its support for domain-specific optimizations means further improvements are possible through embedded processor customization.

Codasip offers configurable application specific processors that can make good use of the TensorFlow Lite-Micro optimization capability. The opportunity for customization arises because each application will have its own neural network and training data. This makes it advantageous to tailor the processor for the particular needs of its specific application.

All of Codasip’s broad spectrum of processors can run TFLite-Micro. The white paper focuses on their L31 embedded core running the well-known MNIST handwritten-digit classification dataset through a neural net with two convolutional and pooling layers, at least one fully connected layer, vectorized nonlinear functions, and data resize and normalization operations.

During the early stages of the system design process, designers can use Codasip Studio to profile the code and see where things can be improved. In the example, ~84% of the time is spent in the image convolution function. Looking at the source code, they identify the code using the most CPU time. From the disassembler output they determine that a new instruction combining the heavily repeated mul and c.add operations will improve performance. Another change they evaluate is replacing vector loads with byte loads using an immediate address increment.


The Codasip Studio profiler can provide estimates of the processor’s power and area, helping designers choose between standard variants of the L31 core. In this case they explored the effect of removing the FPU. TFLite-Micro supports quantization of neural network parameters and input data; with integer-only data, the FPU can be dispensed with. Of course, there is a trade-off in accuracy, but this too can be evaluated at this stage of the process. The table below shows the benefits of moving to integer data and a quantized neural model.

The Codasip white paper concludes with a closer look at how the L31 operates in this use case with the new instructions and compares it to running before the instructions were added. Using their software tools, it is possible to see the precise savings. Having this kind of control over the performance of an embedded processor can provide a large advantage in the final product. The white paper also shows how Codasip’s CodAL language is used to easily create the assembly encoding for new instructions. CodAL makes it easy to iterate while defining new instructions to achieve the best results.

To move AI/ML operations to the edge, designers must look at every avenue for optimizing the system. To gain the latency improvements and overall power savings that edge-based processing promises, every effort must be made to make the power and performance profile of the embedded processor as good as possible. Codasip demonstrates an effective approach to these challenges. The white paper is available for download on the Codasip website.

Also read:

Webinar: From Glass Break Models to Person Detection Systems, Deploying Low-Power Edge AI for Smart Home Security

Getting to Faster Closure through AI/ML, DVCon Keynote

Unlocking the Future with Robotic Precision and Human Creativity


Webinar Series: Learn the Foundation of Computational Electromagnetics

by Matt Commens on 03-21-2022 at 10:00 am


The electromagnetism problems we spent many hours laboring over in college homework have a mathematical formulation originally developed by Maxwell, Lorentz, Gauss, Faraday and others. In their full form these are partial differential equations, which can be written in several ways, both differential and integral.

The resulting set of equations is elegant in form but complex in use. Closed-form solutions exist only for the simplest systems, yet a multitude of systems that could be analyzed with Maxwell’s Equations lie beyond the reach of pen and paper. The need to solve these more complex systems ultimately led to the birth of Computational Electromagnetics, a field of engineering and applied physics in which Ansys has been at the forefront for over three decades.

Since joining the HFSS team in 2001, I personally have benefitted from working with the very engineers and developers who have built HFSS over these decades. By understanding the many challenges they have overcome, such as scale and speed, I have gained a deeper knowledge of computational electromagnetics as well as a solid understanding of how to leverage these solutions effectively. No doubt others would benefit as well from the deeper background knowledge I was fortunate to accumulate over the years.

To provide this deeper insight to a broader audience, the leading technologists in Computational Electromagnetics at Ansys will publicly discuss, in both breadth and depth, the work they have done over the years. A five-part Electromagnetics Foundation webinar series launches on March 22nd. Ansys experts will pull back the curtain on the EM solvers they have developed over the last 30 years. The discussions will cover numerical methods (time- vs. frequency-domain, integral vs. differential, etc.), analytical methods such as full-wave, asymptotic, quasi-static, and shooting and bouncing rays (SBR), the differences between high- and low-frequency analysis (from electric motors up to photonic wavelengths), modeling and simulation scalability from micro to macro, the applicability of distributed computing to EM simulation, and numerous application examples.

The webinars are spaced roughly 1–2 weeks apart and are available for registration here:

Electromagnetics Foundation Webinar Series | Ansys

And here’s the lineup:

Foundations of Computational Electromagnetics

An overview of different approaches used for computational electromagnetics and the trade-offs for choosing the best solution method.

An Overview of the Foundations of HFSS and Maxwell Solver Technologies

Figure 1 – HFSS 3D Layout

A review of the various numerical methods (finite elements, integral equations, etc.) included in HFSS and Maxwell.

The Foundation of Domain Decomposition Technologies in HFSS

A theoretical overview of domain decomposition formulations and a dive into how HFSS solvers have evolved over the past decade.

Learning Ray Tracing Methods Foundations for Electromagnetics

Figure 2 – 5G Antenna Array

Foundations of shooting and bouncing rays (SBR) as a computational electromagnetic (CEM) methodology.

The Foundation of Computational Optics and Photonics

Covers the basics of ray tracing, surface and volume scattering models, full-wave time and frequency domain electromagnetic solvers for optics, photonics and quantum photonic effects.

This is a unique opportunity to learn how electromagnetic simulation is done at the bleeding edge.

Also read: 

The Clash Between 5G and Airline Safety

The Hitchhiker’s Guide to HFSS Meshing

The 5G Rollout Safety Controversy