TSMC COVID-19 and Double Digit Growth in 2020

by Daniel Nenni on 04-17-2020 at 10:00 am


TSMC has had an incredible run since its founding in 1987, which spans most of my 36-year semiconductor career. Even in these troubled times, TSMC is a shining bellwether with double-digit growth expectations while the semiconductor industry will be flat or slightly down. Let’s take a close look at the TSMC Q1 2020 conference call and see what else we can learn.

“On March 18, we found one employee who tested positive for COVID-19 and immediately began receiving appropriate care. Today, this employee has recovered, is out of the hospital and is staying at home for additional quarantine. We were able to suitably trace all the other individuals who were in contact. The neighboring employees have all tested negative, while all other employees who were in contact has entered and completed the 14-day self-quarantine and now back to work. As a result of the strict preventive measures taken by TSMC, we have not seen any disruption of our fab operations so far.”

This does not surprise me at all. Taiwan learned a very important lesson during the SARS outbreak in 2002. I remember traveling during this time and going through extra medical checks at the TPE airport. Taiwan installed medical imaging equipment that took our temperatures after we got off the planes. It is easy to remember since I had to remove my hat and got to see how big my brain is. It really is big, hat size XL.

One thing you can say about TSMC is that they have built their business on experience and humility, absolutely.

Dr. C.C. Wei:

“Looking ahead to the second half of this year. Due to the market uncertainty, we adopt a more conservative view as we expect COVID-19 to continue to bring some level of disruption to the end market demand. For the whole year of 2020, we now forecast the overall semiconductor market, excluding memory growth, to be flattish to slightly decline, while foundry industry growth is expected to be high single-digit to low-teens percentage.”

In my opinion we will see a hockey-stick-like semiconductor recovery in Q4 2020. Never before have we seen the entire world united in a common cause. Never before have we seen such worldwide compassion and cooperation. COVID-19 really is a globally uniting event and it could not have come at a better time in my opinion. The world will be a much safer and more productive place in 2021 and beyond, that is my heartfelt belief.

“Now let me talk about the progress and development of 5G and HPC. With the recent disruption from COVID-19, we now expect global smartphone units to decline high single digit year-over-year in 2020. However, 5G network deployment continues and OEMs continue to prepare to launch 5G phones. We maintain our forecast for mid-teens penetration rate for 5G smartphone of the total smartphone market in 2020.”

It is understandable that the edge devices will take a pause this year but remember we are in a data driven society. With the entire world sheltering in place the amount of data generated is increasing exponentially. SemiWiki traffic alone is up 30%. Our webinar series is breaking registration and attendance records. The world wide communications infrastructure is being upgraded like never before and that means semiconductor strength.

There has been a lot of fake news of late surrounding the TSMC process technology so let’s get this straight from the horse’s mouth (American idiom for the truth):

“Now let me talk about the ramp-up of N7, N7+ and the status of N6. In its third year of ramp, N7 continue to see very strong demand across a wide spectrum of products for mobile, HPC, IoT and automotive applications. Our N7+ is entering its second year of ramp using EUV lithography technology while paving the way for N6. Our N6 provides a clear migration path for next-wave N7 products, as the design rules are fully compatible with N7.”

“N6 has already entered risk production and is on track for volume production before the end of this year. N6 will have one more EUV layer than N7+ and will further extend our 7-nanometer family well into the future. We expect our 7-nanometer family to continue to grow in its third year and reaffirm it will contribute more than 30% of our wafer revenue in 2020.”

“Now let me talk about our N5 status. N5 is already in volume production with good yield. Our N5 technology is a full node stride from our N7, with 80% logic density gain and about 20% speed gain compared with N7. N5 will adopt EUV extensively. We expect a very fast and smooth ramp of N5 in the second half of this year driven by both mobile and HPC applications. We’ll reiterate 5-nanometer will contribute about 10% of our wafer revenue in 2020.”

“N5 is the foundry industry’s most advanced solution with best PPA. We observed a higher number of tapeouts, as compared with N7 at the same period of time. We will offer continuous enhancements to further improve the performance, power and density of our 5-nanometer technology solution into the future as well. Thus, we are confident that 5-nanometer will be another large and long-lasting node for TSMC.”

“Finally, I will talk about our N3 status. Our N3 technology development is on track, with risk production scheduled in 2021 and target volume production in second half of 2022. We have carefully evaluated all the different technology options for our N3 technology, and our decision is to continue to use FinFET transistor structure to deliver the best technology maturity, performance and costs.”

“Our N3 technology will be another full node stride from our N5, with about 70% logic density gain, 10% to 15% speed gain and 25% to 30% power improvement as compared with N5. Our 3-nanometer technology will be the most advanced foundry technology in both PPA and transistor technology when it is introduced and will further extend our leadership position well into the future.”

If you have questions about this please post in the comments section and let the SemiWiki community of experts answer. Just say no to fake news….


Lithography Resolution Limits – Arrayed Features

by Fred Chen on 04-17-2020 at 6:00 am

State-of-the-art chips will always include some portions which are memory arrays, which also happen to be the densest portions of the chip. Arrayed features are the main targets for lithography evaluation, as the feature pitch is well-defined, and is directly linked to the cost scaling (more features per wafer) from generation to generation. To that end, this article (second in the series on lithography resolution limits) focuses on the lithography resolution limits of arrayed feature patterning.

Minimum pitch resolution
A lithography tool is specified by the wavelength it uses, e.g., 193 nm for ArF, 13.5 nm for EUV, as well as its numerical aperture, i.e., the power of its final optic element (lens for ArF, KrF, i-line, mirror for EUV). The formula for the ideal minimum pitch between two lines in an array is

minimum pitch = wavelength / (2 × numerical aperture)     (1)

This result is derived from the grating equation [1]. Basically, the minimum pitch is realized by the interference of two beams which form the maximum angles with the optical axis, whose sines differ by wavelength/pitch. The difference of sines is at most equal to twice the numerical aperture, which gives the previously stated ideal minimum pitch. Realistically, though, the finite angular tolerance of the beams must be deducted from the difference of sines. The actual minimum pitch should therefore be

minimum pitch = wavelength / (2 × numerical aperture − angular tolerance)     (2)

where the angular tolerance is expressed as the resulting reduction in the difference of sines.

Hence, while for a wavelength of 193 nm, numerical aperture of 1.35, we ideally expect a minimum pitch of 71.5 nm, in reality it is 76 nm. Likewise for the EUV tool with nominal wavelength of 13.5 nm, numerical aperture of 0.33, the minimum pitch was recently demonstrated to be 24 nm [2], not the ideal 20.45 nm.
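As a quick check of these numbers, the sketch below evaluates the ideal limit (wavelength divided by twice the numerical aperture) for both tools and backs out how much of the available difference of sines is given up in each demonstrated case. The function names are illustrative only and do not come from any lithography package.

```python
def ideal_min_pitch(wavelength_nm: float, na: float) -> float:
    """Ideal two-beam limit: the difference of sines equals 2 * NA."""
    return wavelength_nm / (2.0 * na)


def lost_sine_range(wavelength_nm: float, na: float, actual_pitch_nm: float) -> float:
    """Amount by which the usable difference of sines falls short of 2 * NA."""
    return 2.0 * na - wavelength_nm / actual_pitch_nm


# (wavelength in nm, numerical aperture, demonstrated pitch in nm), from the text
tools = {
    "ArF (193 nm)": (193.0, 1.35, 76.0),
    "EUV (13.5 nm)": (13.5, 0.33, 24.0),
}

for name, (wavelength, na, actual) in tools.items():
    ideal = ideal_min_pitch(wavelength, na)
    lost = lost_sine_range(wavelength, na, actual)
    print(f"{name}: ideal {ideal:.2f} nm, actual {actual:.0f} nm, "
          f"difference of sines reduced by {lost:.3f}")
```

The second function simply inverts the relation between pitch and difference of sines, so it reports the tolerance implied by the quoted pitches rather than predicting it.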

For two-dimensional arrays (square arrays, rectangular arrays, triangular arrays), the patterns can be generated by crossed line arrays, with best results achieved by using an attenuated (~5%) phase-shifting effect by the mask [3], so the same minimum pitch resolution limit, given by equation (2), applies as for lines.

In the previous article [4], it was noted that for a pair of features, the Rayleigh criterion (0.61 wavelength/numerical aperture) is used to determine the resolution. With arrayed features, although the pitch is already predetermined, the Rayleigh criterion applies if the array pitch is much wider than the distance set by that criterion; otherwise, it is the pitch (specifically, the half-pitch) that decides the resolution.

Self-aligned patterning: the ideal opportunity for arrayed features
When the minimum pitch needs to go below 0.5 wavelength/numerical aperture, a single exposure is not sufficient to pattern the array. A second exposure, such as the previously described LELE (litho-etch-litho-etch) approach [4], can achieve half the pitch, but alignment between the two exposures cannot be guaranteed. Self-aligned patterning approaches would be better. The most commonly practiced approach is Self-Aligned Double Patterning (SADP). Its earliest comprehensive description is given in US Patent 5328810, assigned to Micron after being filed in 1992 [5].

Figure 1 shows the first steps of basic SADP.

Figure 1. Basic SADP flow following standard lithography.

In this drawing, it is indicated clearly that the top of the spacer is eroded during the process. Also, it is the cost-reducing preference to use photoresist as the starting feature, rather than another etched material.

Figure 2 shows the completion of the SADP process.

Figure 2. Completion of SADP process.

The new feature pitch on the substrate is now half the original photoresist feature pitch. Hence, this allows a doubling of line density, without an additional exposure. Sharp eyes may note that the distance between features in the center is a little wider than the distance between features where the photoresist was originally located. This effect is known as “pitch walking” [6]. It arises from the combination of the original photoresist pattern, the spacer thickness, and the amount of spacer erosion. To manage the pitch walking, the critical dimension (CD) of the starting photoresist feature must be in sync with the spacer thickness and erosion rate. Alternatively, a gapfill material may be deposited after the spacer film is deposited [5,7].

This protects the exposed spacer side from erosion, but leaves extra spacer material to be removed later along with the gapfill material, as well as the starting core feature. This can be extended, however, to more than doubling feature density. For example, Samsung’s US Patent 7842601 [8] describes the double spacer approach to reducing line pitch to one-third its original value (Figure 3). This allows a 78 nm pitch (~22nm foundry node design rule) to be immediately reduced to 26 nm (<5nm foundry node design rule) in a single exposure, without using EUV.
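Returning to the pitch walking discussed above, the balance between core CD and spacer thickness can be made concrete with simple geometry. The sketch below is a simplified model under stated assumptions: vertical spacers of uniform thickness, no spacer erosion, and illustrative function names and dimensions that are not taken from the figures.

```python
def sadp_spaces(litho_pitch_nm: float, core_cd_nm: float, spacer_nm: float):
    """Return the two alternating spaces left after the core is removed.

    core_space: the space where the photoresist core used to be.
    gap_space:  the space between spacers of neighbouring cores.
    Assumes ideal vertical spacers and no erosion (a simplification).
    """
    core_space = core_cd_nm
    gap_space = litho_pitch_nm - core_cd_nm - 2 * spacer_nm
    if gap_space <= 0:
        raise ValueError("spacers from neighbouring cores would merge")
    return core_space, gap_space


def pitch_walk(litho_pitch_nm: float, core_cd_nm: float, spacer_nm: float) -> float:
    """Difference between the two spaces; zero means no pitch walking."""
    core_space, gap_space = sadp_spaces(litho_pitch_nm, core_cd_nm, spacer_nm)
    return gap_space - core_space


def balanced_core_cd(litho_pitch_nm: float, spacer_nm: float) -> float:
    """Core CD that makes both spaces equal in this idealized model."""
    return litho_pitch_nm / 2 - spacer_nm


# Hypothetical numbers: 80 nm lithographic pitch, 20 nm spacer
print(balanced_core_cd(80, 20))   # core CD that balances the two spaces
print(pitch_walk(80, 18, 20))     # a core 2 nm too narrow widens the center gap
```

In this model a core that is narrower than the balanced value makes the center gap wider than the former core space, which matches the asymmetry noted in the figure. Spacer erosion would shift the balance point further, which this sketch does not capture.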

Figure 3. Self-aligned triple patterning (SATP) by the use of two spacers.

Two-dimensional self-aligned patterning
When the SADP process is applied to two-dimensional patterns, the possibilities expand. For example, in Figure 4, features on a square lattice are doubled in density.

Figure 4. Two-dimensional SADP on a square lattice doubles feature density.

The central added feature is expected to round out like the original corner features of the lattice cell. Going even further, a triangular or hexagonal lattice allows feature density to be tripled.

Figure 5. Two-dimensional SATP on a triangular lattice triples feature density.

The latter approach has already been used in Samsung’s 20nm DRAM [9] for the honeycomb capacitor patterning.

Double SADP/SATP in 2D?
By repeating the SADP/SATP processes described above, the arrayed feature density increases in leaps and bounds. Double SADP quadruples density for line arrays and square lattices; hence, this is also referred to as self-aligned quadruple patterning (SAQP). Double SATP in two dimensions noncuples (multiplies 9x) density for triangular lattices.
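The density bookkeeping above is just repeated multiplication, which the following sketch spells out. The table and function names are illustrative; the per-pass gains (2x for SADP, 3x for SATP) and the 78 nm starting pitch are the figures quoted in this article.

```python
# Feature density gain for one pass of each self-aligned process
DENSITY_GAIN_PER_PASS = {"SADP": 2, "SATP": 3}


def density_gain(process: str, passes: int = 1) -> int:
    """Total feature density multiplier after repeating a process."""
    return DENSITY_GAIN_PER_PASS[process] ** passes


def line_pitch_after(start_pitch_nm: float, process: str, passes: int = 1) -> float:
    """Resulting pitch for a line array, where density scales inversely with pitch."""
    return start_pitch_nm / density_gain(process, passes)


for process in ("SADP", "SATP"):
    for passes in (1, 2):
        print(process, "x", passes, "->", density_gain(process, passes), "x density")

print(line_pitch_after(78, "SATP"), "nm")  # the 78 nm starting pitch after one SATP pass
```

Note that `line_pitch_after` applies only to line arrays. For the two-dimensional lattices the gain is in features per unit area, so the lattice pitch does not shrink by the same factor.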

The feasibility of double SADP is tied to the process complexity. Double SADP is more complex than single SADP, but its several consecutive etch steps can be executed at the same etch station, which makes them easier to manage. The etch rates of three materials (core, spacer, substrate) must be considered simultaneously in any case. On the other hand, a new EUV resist process flow may require additional deposition and treatment steps (Figure 6). In particular, the new underlayer material being etched could need its own station, as it may be organic [10] or metal-based [11]. The underlayer benefit is expected from the effects of secondary electrons released by EUV light [12].

Figure 6. EUV resist process steps can still be comparable in complexity to double 2D SADP/SATP.

It is quite clear that self-aligned spacer patterning is a very powerful patterning technique for arrayed features. In upcoming articles, the use of self-aligned patterning for specific cases involving complicated array layouts will be examined.

References
[1] https://en.wikipedia.org/wiki/Diffraction_grating

[2] https://www.imec-int.com/en/articles/imec-demonstrates-24nm-pitch-lines-with-single-exposure-euv-lithography-on-asml-s-nxe-3400b-scanner

[3] A. K-K. Wong, Optical Imaging in Projection Microlithography (SPIE, 2005), p. 87.

[4] https://www.linkedin.com/pulse/lithography-resolution-limits-paired-features-frederick-chen/

[5] T. A. Lowrey, R. W. Chance, D. A. Cathey, US Patent 5328810, assigned to Micron, filed Nov. 25, 1992.

[6] https://www.semiconkorea.org/en/programs/STS/S4.-Plasma-Science-and-Etching-Technology/SAQP-Pitch-Walking-Improvement-Path-Finding-by-Simulation-

[7] A. E. Carlson, US Patent 8101481, assigned to the Regents of the University of California, filed Feb. 25, 2008.

[8] J-Y. Lee, J-S. Park, S-G. Woo, US Patent 7842601, assigned to Samsung, filed Apr. 20, 2006.

[9] J. M. Park et al., “20nm DRAM: A new beginning of another revolution,” IEDM 2015.

[10] J. Li et al., “A Chemical Underlayer Approach to Mitigate Shot Noise in EUV Contact Hole Patterning,” Proc. SPIE 9051, 905117 (2014).

[11] A. De Silva et al., “High-Z metal-based underlayer to improve EUV stochastics,” Proc. SPIE 11147, 111470W (2019).

[12] https://spie.org/news/6518-successes-and-frontiers-in-extreme-uv-patterning?SSO=1



Cadence – Defining a Roadmap to the Future

by Mike Gianfagna on 04-16-2020 at 10:00 am

Cadence recently published a position paper that details a set of enabling technologies that will be needed for product design going forward. Entitled Intelligent System Design, the piece describes the changing landscape of system design and the requirements for success. Cadence has built a branded approach to address these needs called, appropriately, the Intelligent System Design™ strategy. There is a short discussion of Cadence’s capabilities at the end of the piece, but most of the discussion is a thoughtful overview of what is changing in system design and what needs to be done to facilitate those changes.

I have a few comments and observations about what Cadence is up to, but I’ll hold that until later. The vision conveyed by this position paper is far bigger than any specific product.

In my view, Intelligent System Design hits home in meaningful and relevant ways on many fronts. The piece begins by setting the stage for the current wave of innovation. To effectively compete, system companies are designing their own chips and semiconductor companies are delivering software stacks along with their silicon to enable competitive differentiation.

Cadence decomposes these trends in a hierarchical way, examining the requirements for design excellence, system innovation and pervasive intelligence. You really need to read the paper to get the full impact; it’s only five pages long, by the way. To whet your appetite, I’ll provide a quick summary of each of the three areas treated.

Design Excellence: The bread and butter of EDA was, for a long time, logic design, logic synthesis, place and route, timing closure, design rule check, test generation and tapeout. While those items are still necessary, there is now a lot more to deal with. Process variation, IP reuse, power and signal integrity, software interactions and complex system validation are just some of the new requirements that must all be co-optimized to achieve a successful tapeout. Cloud computing factors into the discussion as well.

System Innovation: Co-optimization comes into play here as well. A successful SoC must be analyzed and optimized in the context of the system for which it is intended. The PCB and the complex and potentially 2.5D or 3D package must be co-analyzed and optimized along with the chip itself. There are plenty of signal integrity challenges to address here. Software is also part of system innovation. To make it more interesting, design teams must develop the software for a new SoC before the SoC exists.

Pervasive Intelligence: Deep learning is finding its way into all kinds of everyday products. The challenges to accomplish the design-in of this technology may not be as well known. Power and latency are requiring a lot of these new technologies to be resident in a more local sense, at the edge of the cloud if you will vs. in the cloud. Doing this in a cost-effective way is very challenging. It turns out EDA tools and design flows can be improved to make deep learning design easier by using deep learning in the design process itself. Something of a recursive process.

The Cadence strategy: At the end of the paper, Cadence briefly discusses their strategy to address the three areas mentioned above. You can certainly learn a lot more about their approach by visiting the Cadence website. There’s lots of new and fresh content there.

In closing, I want to touch briefly on the third item, pervasive intelligence. This is an area where I believe Cadence is truly practicing what they preach. I recently posted a conversation with Cadence’s Paul Cunningham on machine learning at Cadence. In it, Paul detailed the Cadence vision of how machine learning can be used to both improve EDA algorithms and leverage learning from prior runs to make the flow better for future runs. Soon after that discussion, Cadence issued a press release about their new digital full flow. That flow uses machine learning in the ways Paul described. Having a good strategy is important. Actually using it is also important, but often difficult.

I think Cadence expresses some great visions in this new position paper, visions that can be implemented thanks to the technology available today. I’ll keep watching as this unfolds.


Breker Tips a Hat to Formal Graphs in PSS Security Verification

by Bernard Murphy on 04-16-2020 at 6:00 am

It might seem paradoxical that simulation (or equivalent dynamic methods) might be one of the best ways to run security checks. Checking security is a problem where you need to find rare corners that a hacker might exploit. In dynamic verification, no matter how much we test we know we’re not going to cover all corners, so how can it possibly be useful? Wouldn’t formal methods be much better?

Dave Kelf (CMO for Breker) makes a point that security verification is inherently a negative verification problem. Unlike positive testing where you’re checking that a specific scenario works as expected, in security verification you need to check all possibilities, as you ideally would in negative testing. For example, in a positive test, we would check the key can be read through the crypto block. In security, we have to ask, “is there any other way that this can be done?”. The strength of formal is that it can analyze that entire state space and find paths you had not considered.

But while formal is ideal for completeness, it’s limited in scope – by the size of the state space and by the degree to which you have to abstract and decompose complex problems, leaving you to wonder what you might have overlooked in all that complexity. Formal also can’t work with software, a real problem for embedded system validation. Conversely, simulation doesn’t care – you can run whatever size system you have with whatever mixed levels you need.

Nevertheless, the completeness of the graph approach is appealing. Breker has developed a way to build a conceptually similar graph at the system level, not automatically from RTL as a formal tool would but semi-manually / semi-automatically from a series of tables describing key aspects of the SoC system architecture.

Then PSS becomes a pretty logical bridge to testing complete negative intent on a high-level graph rather than conventional formal gate-level paths. Breker has an app for that. In the security TrekApp, you can define a security policy through tables, in master/slave connectivity, security/privilege options and memory address zones.

An advantage in starting with these tables is that it’s easy to see what might be missing: trivially, that you missed a master/slave option, or that you forgot to specify whether an access/privilege option on the master and an access/privilege option on the slave form a valid (permitted) combination.

Going one level deeper, you can also define, in another table, various memory regions with corresponding secure and privilege accessibilities. These definitions are essential for later dynamic tests to check that it isn’t possible, through some unapparent sequence of actions (again a negative test), to read from or write into a secure/privileged memory region from a transaction not allowed to perform those actions.

Think for example of an ARM TrustZone environment in which one or more masters may at times be operating in a secure mode with a certain level of privilege, or a non-secure mode. Meanwhile slaves, some secure with low privileges, some secure with higher privileges, are communicating with masters and trying to read from or write to regions in memory, each of which also has assorted privilege and secure settings. That’s a lot of combinations to worry about. Are you sure your tests are really going to cover them all?

The Breker security TrekApp will map the master/slave, secure/privilege and memory region tables into the Trek internal format, then build a graph – in effect a system-level state graph – which can generate tests for all possible transactions across that graph. Their test suite synthesis will then map that to realized sequences of tests, which you can then plug into your UVM testbench or software driven SoC test. The result is a comprehensive sequence of tests that can cover all paths through the graph, including those you might not consider but a hacker may attempt.
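To give a feel for the idea of deriving exhaustive tests from tables, here is a minimal sketch. It is not the TrekApp table format or the Trek internal representation; the master names, memory regions, policy rule and function names are all hypothetical, and it only enumerates single transactions rather than chained sequences through a graph.

```python
from itertools import product

# Hypothetical tables, loosely modelled on the kinds of tables described above
MASTERS = ["cpu", "dma"]
MODES = [(secure, privileged)
         for secure in (True, False)
         for privileged in (True, False)]
REGIONS = {
    "key_store":  {"secure": True,  "privileged": True},
    "secure_ram": {"secure": True,  "privileged": False},
    "shared_ram": {"secure": False, "privileged": False},
}
OPERATIONS = ["read", "write"]


def is_permitted(secure: bool, privileged: bool, region: str) -> bool:
    """Assumed policy: a region's secure/privileged requirements must both be met."""
    rule = REGIONS[region]
    if rule["secure"] and not secure:
        return False
    if rule["privileged"] and not privileged:
        return False
    return True


def enumerate_transactions():
    """Every master/mode/region/operation combination with its expected outcome."""
    tests = []
    for master, (secure, privileged), region, op in product(
            MASTERS, MODES, REGIONS, OPERATIONS):
        expected = "allow" if is_permitted(secure, privileged, region) else "block"
        tests.append((master, secure, privileged, region, op, expected))
    return tests


tests = enumerate_transactions()
negative = [t for t in tests if t[-1] == "block"]
print(len(tests), "transactions,", len(negative), "must be blocked")
```

The entries marked "block" are the negative tests: each one is an access the design must refuse, and a dynamic test would try it and check that it fails. Even this toy example shows how quickly the combinations multiply as tables grow.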

That looks like a pretty valuable capability to me. You can learn more about the security TrekApp HERE.

Also Read

Verification, RISC-V and Extensibility

Build More and Better Tests Faster

Taking the Pain out of UVM


The Story of Ultra-WideBand – Part 5: Low power is gold

by Frederic Nabki & Dominic Deslandes on 04-15-2020 at 10:00 am

How ultra-wideband done right can do more with less energy

In the previous part, we discussed how the time-frequency duality can be used to reduce latency. When you compress a wireless transmission in time, you reduce the time it takes to hop from a transmitter to a receiver. Another very interesting capability enabled by the time-frequency duality is the possibility of reducing power consumption to a level never seen before.

In a world where everything goes wireless and all devices are required to be remotely controlled, the importance of power consumption is growing significantly. In a simple sensor node composed of four parts (sensor, microcontroller, PMU and transceiver), the wireless transceiver is the main contributor to the total power consumption by a large margin. Indeed, the percentage of the power used for the wireless function can exceed 90% of total power consumption. Power consumption of wireless headsets, game controllers, and computer keyboards and mice is dominated by the wireless transceiver.

Power reduction has been driving the development of wireless chips over the last 15 years. After years of development, BLE was ratified in 2006 to address the power consumption of Bluetooth. More recently, Bluetooth 5.2 added features to reduce consumption for different applications, including audio. However, these modifications are mostly incremental. Fundamentally, the reduction in power consumption is physically limited by the architecture; a carrier-based transceiver will always require a significant amount of power to start, stabilize and maintain its RF oscillator. After two decades of optimization, Bluetooth has reached its point of diminishing returns. This is true for all narrowband technologies: gaining an order of magnitude requires a new paradigm in wireless transmission. Here’s why:

The Narrowband Penalty
In the chart above, you can see the two significant power penalties inherent in all narrowband radio architectures like Bluetooth:

  • Crystal oscillator overhead (lower left) cripples low data rate performance: Bluetooth uses a ~20 MHz crystal oscillator, which requires a few milliwatts to power up and stabilize. UWB radios, like the one developed by SPARK Microsystems, can operate using impulses that don’t require a high frequency crystal oscillator and can be designed to operate with a low timing power consumption overhead.
  • Carrier overhead (upper middle) penalizes high data rate performance: Transmitting a large amount of data over a narrow bandwidth channel such as that used in Bluetooth radios requires lots of time and power, as explained in part 4. Large amounts of data can be transmitted far more quickly when spread across a wide bandwidth, keeping the transmitter on for a much shorter duration and reducing power consumption significantly. This means that for the same amount of consumed power, UWB can transmit much more data (far upper right).
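The two penalties can be illustrated with a very simple energy-per-packet model: a fixed startup cost plus an airtime cost that shrinks as the data rate rises. This is a sketch only. The function name is illustrative and every number below is a made-up placeholder chosen to show the shape of the trade-off, not a measurement of Bluetooth or of any SPARK Microsystems device.

```python
def energy_per_packet_uj(payload_bits: float,
                         data_rate_bps: float,
                         active_power_mw: float,
                         startup_time_ms: float,
                         startup_power_mw: float) -> float:
    """Energy to send one packet, in microjoules.

    startup term: power spent starting and stabilizing the timing source
    airtime term: power spent while the transmitter is actually on
    """
    airtime_s = payload_bits / data_rate_bps
    startup_energy_uj = startup_power_mw * startup_time_ms    # mW * ms = uJ
    airtime_energy_uj = active_power_mw * airtime_s * 1000.0   # mW * s = mJ, converted to uJ
    return startup_energy_uj + airtime_energy_uj


# Placeholder radios (assumed values, for illustration only)
narrowband = dict(data_rate_bps=1e6, active_power_mw=10.0,
                  startup_time_ms=1.0, startup_power_mw=3.0)
wideband = dict(data_rate_bps=10e6, active_power_mw=10.0,
                startup_time_ms=0.01, startup_power_mw=3.0)

for bits in (100, 1_000, 100_000):
    nb = energy_per_packet_uj(bits, **narrowband)
    wb = energy_per_packet_uj(bits, **wideband)
    print(f"{bits:>7} bits: narrowband {nb:.2f} uJ, wideband {wb:.2f} uJ")
```

With small packets the startup term dominates, which corresponds to the low data rate penalty; with large packets the airtime term dominates, which corresponds to the high data rate penalty.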

How UWB Avoids the Narrowband Penalties
If you start with a blank page to design a short range (50-100m) wireless protocol that minimizes power consumption and latency and maximizes data rate, you would probably go through this thought process:

  • First, minimize the time the transmitter and the receiver are powered on. To do that, each symbol should be as short as possible. From the time-frequency duality we know that a signal that is short in time has a wide bandwidth, so the solution will utilize wideband communications, hence the choice of the unlicensed UWB spectrum.
  • Second, ensure that the transmitter and receiver can be started and shut down as quickly as possible. This makes it difficult to use transceivers that rely on traditional high accuracy RF oscillators. The optimal architecture to minimize power consumption is a UWB impulse radio that forgoes the need for an RF carrier per se.

As you can see from the data in the previous graph, that approach delivers the lowest possible power profile for short range communications. This is the approach SPARK Microsystems has taken for its UWB transceivers.

UWB’s Advantages
Because UWB does not use a high-frequency carrier oscillator, UWB transceivers can be turned on very quickly and transmit a far higher data rate than a narrowband radio for a given power level. This, coupled with the low latency described in Part 4, makes UWB an ideal solution for the next generation of low-power wireless applications.

Why did Narrowband Prevail in the 1920’s?
Although ships were required to install spark gap radios after the Titanic disaster, as discussed in part 1, wideband technology of the time had two major drawbacks:

  • They were extremely noisy, with poor frequency control. Transmission had to stop to enable reception on nearby frequencies. Interference was thus a big problem.
  • They could not be easily modulated to handle voice or other higher data rate communications.

By the 1920’s, vacuum tube technology and superheterodyne circuits enabled narrowband radios to take over rapidly escalating demand for voice and other communications.

In the final part of this series, we will summarize how military and commercial technology developments, along with worldwide spectrum allocations, have created a unique opportunity for UWB to dominate short range communications in the 2020’s and beyond.

About Frederic Nabki
Dr. Frederic Nabki is cofounder and CTO of SPARK Microsystems, a wireless start-up bringing a new ultra low-power and low-latency UWB wireless connectivity technology to the market. He directs the technological innovations that SPARK Microsystems is introducing to market. He has 17 years of experience in research and development of RFICs and MEMS. He obtained his Ph.D. in Electrical Engineering from McGill University in 2010. Dr. Nabki has contributed to setting the direction of the technological roadmap for start-up companies, coordinated the development of advanced technologies and participated in product development efforts. His technical expertise includes analog, RF, and mixed-signal integrated circuits and MEMS sensors and actuators. He is a professor of electrical engineering at the École de Technologie Supérieure in Montreal, Canada. He has published several scientific publications, and he holds multiple patents on novel devices and technologies touching on microsystems and integrated circuits.

About Dominic Deslandes
Dr. Dominic Deslandes is cofounder and CSO of SPARK Microsystems, a wireless start-up bringing a new ultra low-power and low-latency UWB wireless connectivity technology to the market. He leads SPARK Microsystems’s long-term technology vision. Dominic has 20 years of experience in the design of RF systems. In the course of his career, he has managed several research and development projects in the field of antenna design, RF system integration and interconnections, sensor networks and UWB communication systems. He has collaborated with several companies to develop innovative solutions for microwave sub-systems. Dr. Deslandes holds a doctorate in electrical engineering and a Master of Science in electrical engineering from Ecole Polytechnique of Montreal, where his research focused on high frequency system integration. He is a professor of electrical engineering at the École de Technologie Supérieure in Montreal, Canada.


Artificial Intelligence in Micro-Watts: How to Make TinyML a Reality

by Mike Gianfagna on 04-15-2020 at 6:00 am

TinyML is kind of a whimsical term. It turns out to be a label for a very serious and large segment of AI and machine learning – the deployment of machine learning on actual end user devices (the extreme edge) at very low power. There’s even an industry group focused on the topic. I had the opportunity to preview a compelling webinar about TinyML. A lot of these topics were explained very clearly, with some significant breakthroughs detailed as well.

The webinar will be broadcast on April 21, 2020 at 10AM Pacific time. I strongly urge you to register for Artificial Intelligence in Micro-Watts: How to Make TinyML a Reality here.

The webinar is presented by Eta Compute. The company was founded in 2015 and focuses on ultra-low power microcontroller and SoC technology for IoT. The webinar presentation is given by Semir Haddad, senior director of product marketing at Eta Compute. Semir is a passionate and credible speaker on the topic of AI and machine learning, with 20 years of experience in the field of microprocessors and microcontrollers. Semir also holds four patents. In his own words, “all of my career I have been focused on bringing intelligence in embedded devices.”

The webinar focuses on the deployment of deep learning algorithms at the extreme edge of IoT and presents an innovative new chip from Eta Compute for this market, the ECM3532. Given the latency, power, privacy and cost issues of moving data to the cloud, there is strong momentum toward bringing deep learning closer to the end application. I’m sure you’ve seen many discussions about AI at the edge. This webinar takes it a step further, to the extreme edge. Think of deep learning in products such as thermostats, washing machines, health monitors, hearing aids, asset tracking technology and industrial networks to name a few. The figure below does a good job portraying the spectrum of power and performance for the various processing nodes of IoT.

A power budget of ~1mW is daunting, and this is where the innovation of Eta Compute and the ECM3532 shines. Semir does a great job explaining the challenges of ultra-low power and ultra-low cost deployment for deep learning. I encourage you to attend the webinar to get the full story. Here is a brief summary to whet your appetite.

Traditional MCUs and MPUs are synchronous designs. Closing timing on such a design across process, voltage and temperature conditions is quite challenging. Because dynamic power consumption is proportional to the square of the operating voltage, lowering the voltage reduces power. But the operating frequency must then drop as well to allow timing closure, which makes getting to both low power and high performance an impossible balancing act. Dynamic voltage and frequency scaling (DVFS) is one way to address this problem, but the chip-wide impacts of approaches like this continue to make it difficult to achieve the optimal balance of power and performance for a synchronous design.
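To make the voltage trade-off concrete, here is a minimal sketch of the standard CMOS switching-power relation, P = activity × C × V² × f. The capacitance, voltages and clock frequency are illustrative assumptions chosen for round numbers; they are not ECM3532 figures.

```python
def dynamic_power(c_eff, v_dd, freq, activity=1.0):
    """Switching power in watts: activity * C * V^2 * f."""
    return activity * c_eff * v_dd ** 2 * freq


# Hypothetical values, for illustration only.
C_EFF = 100e-12   # effective switched capacitance, farads
FREQ = 100e6      # clock frequency, hertz

nominal = dynamic_power(C_EFF, v_dd=1.2, freq=FREQ)
scaled = dynamic_power(C_EFF, v_dd=0.6, freq=FREQ)

print(f"nominal: {nominal * 1e3:.2f} mW")
print(f"scaled:  {scaled * 1e3:.2f} mW")
print(f"ratio:   {nominal / scaled:.1f}x")
```

Halving the supply voltage cuts switching power by a factor of four at the same clock frequency. The catch described above is that a synchronous design may no longer meet timing at that frequency once the voltage drops.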

Eta Compute approaches the problem in a different way with continuous voltage and frequency scaling (CVFS). The company invented this technology and holds seven patents covering both hardware and software, with more in the pipeline. The key innovation here is a major re-design of the processor architecture to allow self-timed performance on a device-by-device basis. This allows easier timing closure and results in higher performance for the same voltage when compared to traditional approaches. Their approach also allows frequency and voltage to be controlled by software. For example, if the user sets the frequency for a particular workload, the voltage will adjust automatically.

The bottom line is a 10X improvement in energy efficiency, which is a game changer. Eta Compute also examined what was needed for TinyML from an architectural point of view. It turns out that DSPs are better at some parts of deep learning for IoT and CPUs are better for other parts. So, the ECM3532 supports a dual-core architecture, with both an Arm Cortex-M3 and dual MAC DSPs on board that can operate at independent frequencies. There is a lot more in-depth discussion on this and other topics during the webinar.

I will leave you with some information on availability. An ASIC version of the architecture, the ECM3531, and an evaluation board are available now. Samples of the full ECM3532 AI platform and evaluation board will be available in April 2020 with full production in May 2020. Eta Compute is also working on a software environment (called the TENSAI platform) to help move your deep learning application from the bench to the ECM3532 with full access to all the optimization technologies.

There is a lot more eye-popping power and performance information presented during the webinar. I highly recommend you register and catch this event here.

 


Project-Centric Design Process, or IP-centric

by Daniel Payne on 04-14-2020 at 10:00 am


How do most IC design teams organize their work during the design process?

Most design teams would say that they organize their work into a project-centric view, and that at the beginning of the process they use a tool for requirements management, maybe a bug tracker, or some design management tool. Each of the four IC designs that I worked on in the 70’s and 80’s took a project-centric view, and there was virtually zero IP reuse going on.

Let’s take a closer look at some common issues that arise with a project-centric approach to SoC design.

Scalability

A team starts out on a new SoC, and then someone in the CAD group sets up a new project in each of their tools, like:

  • Requirements management
  • Bug tracking
  • Design management

Most electronic products tend to reuse cells, blocks, modules and sub-systems from previous products, but how does a project-centric flow account for any of this IP reuse?

Any dependencies in these IP blocks are not really handled with point tools that basically silo design data. Each new project then gets a new DM server instance, and who is going to maintain these servers for years or even decades?

If your company has four concurrent projects going on, then who is tracking what is commonly used between the projects, when all of the tools are set up per project?

If your tools only understand the scope of what’s inside each project, then there’s a gap in knowing what happens when a common cell, block, module or sub-system (IP) is changed or a bug is fixed, creating a new version.

Collaboration

Common IP blocks being used within multiple projects make collaborating a challenge, because each project has its own permission settings, as individual servers are set up per project. Who wants to stop an ongoing project to request access to all IP blocks being used?

Traceability

When purchasing design or verification IP you have to sign a license agreement with each vendor, and these vendors want to track how many instances of their valuable IP are being used to ensure that the agreement terms are being met. You really want to know how all IP is being used, across all projects, not just within one project.

Countries have laws in place regarding how silicon IP is being used or exported, and for American companies the U.S. Department of State has defined the International Traffic in Arms Regulations (ITAR). Your company needs to know how each IP block complies with ITAR or other local requirements.

If a bug is found inside some IP block, and that block is re-used in multiple projects, then how does each project team hear about the bug fix?

IP-centric Design Process

There is an alternative approach to a project-centric design process with its challenges, and that is to use an IP-centric design process. Instead of each project being a silo of design data, each project can be treated as an IP block as part of a connected hierarchy of other IP blocks as shown below:

With this IP-centric approach each Project continues to have its own permissions and DM backend as desired. IP metadata goes along with each IP block, so that all users of an IP block have all the info they need when reusing. Even dependencies from bug tracking tools and requirements tools are integrated into this IP-centric view.

Scaling works well because there’s a centralized server that can be quickly updated once there’s an IP update, and the effects are then seen in all projects. This is the approach that Methodics has taken with Percipient, their IP Lifecycle Management (IPLM) tool. Shown below are four projects being managed with the Percipient central server.

Your company can even follow a Zero-Downtime upgrade policy while using a central server approach.

When a bug is found in an IP block in Project A, an engineer files a bug report under Project A. Engineers on Projects B and C then see that a new bug was just filed on the re-used IP block.
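The bug-notification flow above amounts to a reverse lookup over a hierarchy of IP blocks. The sketch below shows that idea only; the `IPCatalog` class, its methods and the block names are all hypothetical and are not the Percipient API.

```python
from collections import defaultdict


class IPCatalog:
    """Toy catalog that records which project or IP block uses which IP block."""

    def __init__(self):
        # parent name -> set of child IP names it instantiates
        self._uses = defaultdict(set)

    def add_dependency(self, parent, child):
        self._uses[parent].add(child)

    def consumers_of(self, ip_block):
        """Return everything that uses ip_block, directly or through a hierarchy."""
        found = set()
        stack = [ip_block]
        while stack:
            current = stack.pop()
            for parent, children in self._uses.items():
                if current in children and parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found


catalog = IPCatalog()
catalog.add_dependency("project_a", "usb_phy")
catalog.add_dependency("io_subsystem", "usb_phy")
catalog.add_dependency("project_b", "io_subsystem")
catalog.add_dependency("project_c", "usb_phy")
catalog.add_dependency("project_d", "ddr_ctrl")

# A bug filed against usb_phy should reach every consumer, however deep.
affected = catalog.consumers_of("usb_phy")
print(sorted(affected))
```

Because the lookup walks the hierarchy, a project that picks up the block indirectly through a sub-system is still found, while a project that never uses the block is left alone.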

Summary

Times have changed, and IC designs are getting larger every day, so the approach that your company and teams take makes a difference. The project-centric approach worked OK for small designs with little IP reuse; however, for today’s SoC projects you’d be better served by the IP-centric approach offered by Methodics. I like how they’ve integrated with other bug tracking and DM tools, so you don’t have to ask your CAD group to customize lots of point tools to play well together.

Here’s a final view of how Percipient provides several useful management features.

To read the complete 10-page white paper, browse here.



Innovation in Verification April 2020

by Bernard Murphy on 04-14-2020 at 6:00 am


This blog is the next in a series in which Paul Cunningham (GM of the Verification Group at Cadence), Jim Hogan and I pick a paper on a novel idea we appreciated and suggest opportunities to further build on that idea.

We’re getting a lot of hits on these blogs but would really like to get feedback as well.

The Innovation

Our next pick is Metamorphic Relations for Detection of Performance Anomalies. The paper was presented at the 2019 IEEE/ACM International Workshop on Metamorphic Testing. The authors are from Adobe, the University of Wollongong, Australia and the Swinburne University of Technology, Australia.

Metamorphic testing (MT) is a broad principle for getting around the oracle problem – not having a golden reference to compare against for correctness. Instead, it checks relationships expected to hold between related tests, for example a distribution in runtimes, or a correspondence between two software runs with code changes, among many other examples.

The authors applied the principle to test performance in software called a tag manager. Tags are slivers of JavaScript inserted in a web page to collect information from page views. Consumer-focused companies may have 50+ tags on a page, a maintenance headache. Tag managers allow marketing to quickly update these without web expertise, at the expense of some added page load time.

The authors tested load times for an Adobe tag manager. Since multiple factors influence load, they expected a distribution. The metamorphic relationship they chose was that load times with tagging should be shifted (by tag support overhead) from load times without tagging, but that distributions should otherwise be similar.

The relationship held in most cases except one where the managed distribution became bimodal. This they tracked to a race condition between different elements of the code. Depending on execution order a certain function would or would not run, causing the bimodal distribution. This was a bug; the function should have run in either case. When fixed, the distribution again became unimodal. The authors also describe how they automated this testing.
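A metamorphic check of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' automation: the load times are synthetic, and the modality test is Sarle's bimodality coefficient in its simple population form, with the value for a uniform distribution (5/9) used as a rough threshold.

```python
import random
import statistics

BIMODAL_THRESHOLD = 5 / 9  # coefficient of a uniform distribution


def bimodality_coefficient(samples):
    """(skewness^2 + 1) / kurtosis, using population moments."""
    n = len(samples)
    mean = statistics.fmean(samples)
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # non-excess kurtosis, 3 for a normal distribution
    return (skewness ** 2 + 1) / kurtosis


def check_relation(untagged, tagged):
    """Return (median shift, whether both runs show the same modality)."""
    shift = statistics.median(tagged) - statistics.median(untagged)
    same_modality = (
        (bimodality_coefficient(untagged) > BIMODAL_THRESHOLD)
        == (bimodality_coefficient(tagged) > BIMODAL_THRESHOLD)
    )
    return shift, same_modality


# Synthetic load times in milliseconds (hypothetical numbers).
rng = random.Random(0)
untagged = [rng.gauss(1000, 50) for _ in range(2000)]
tagged_ok = [rng.gauss(1100, 50) for _ in range(2000)]
# A race makes half the page loads take a slower path.
tagged_racy = ([rng.gauss(1100, 50) for _ in range(1000)]
               + [rng.gauss(1500, 50) for _ in range(1000)])

print(check_relation(untagged, tagged_ok))
print(check_relation(untagged, tagged_racy))
```

The healthy run differs from the baseline only by a shift, so the relation holds. The racy run splits into two modes, the relation fails, and that failure is the signal to go looking for a bug even though no individual load time is "wrong".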

Paul

I like this. I see it as a way to do statistical anomaly-based QA. You compare a lot of runs, looking at distributions to spot bugs. I see a lot of applications: anything performance-related, heuristic-based, machine-learning-based will be naturally statistical. Distribution analyses can then reveal more complex issues than pass/fail analyses. MT gives us tools to find those kinds of problem.

For functional verification, this is a new class of coverage we can plan and track alongside traditional static and dynamic coverage metrics. I’m excited by the idea that a whole new family of chip verification tools could be envisioned around MT, and I welcome any startups in this space who want to reach out to me.

The main contribution in this paper assumes that, given some performance metric with random noise, you’re going to have a distribution. Mu/sigma alone don’t fully classify the distribution. If it’s multi-modal, maybe there’s a race? Now I’m looking to distribution modality to detect things like race conditions. That’s great and got me thinking how we might use this in our QA.

They discuss mechanics to automate detecting bi-modality, but then raise another possibility – using machine learning to check for changes between distributions. Mathematical characterization may not be as general as training a neural network to detect anomalies between different sets of runs. Similar to what credit card companies do in analyzing your spending patterns. If an anomaly is detected, maybe you’ve been hacked.

MT could find problems sooner and at finer levels than traditional software testing. The latter will find obvious memory leaks or race conditions, but MT plus statistical analysis may probe more sensitively for problems that might otherwise be missed.

Finally, the authors discuss outliers in the distribution and note that these should remain similar between distributions. I’m excited to see how they develop this further, how they might detect differences in outliers and what bugs those changes might uncover.

Generally, I see significant opportunity in exploiting these ideas.

Jim

This is the first of the papers we’ve looked at in this series which to me is more than just a feature. This paper would definitely be worth putting money behind, trying to get to production. It looks like a product, perhaps a new class of verification tool. It might even work as a startup.

It reminds me of Solido and SPICE. We used similar techniques to get beyond the regular statistical distributions – they were at six sigma already, very hard to get better. They had to start doing stuff like this to go further. I heard “no-one’s going to buy more, they already have SPICE”. Well, they did buy a lot more. There is appetite out there for innovation of this kind.

I’m also very interested in the security potential, especially for the DoD. Another worthy investment area.

Me

As Paul says, MT is a rich vein, too rich to address in one blog. I’ll add one thought I found in this paper. We invest huge amounts of time and money in testing. For passing tests, the only value we get is that they didn’t fail. Can we extract more? Maybe we can through MT.

To see the previous paper click HERE.


Linley Spring Processor Conference Kicks Off – Virtually

by Mike Gianfagna on 04-13-2020 at 10:00 am

Linley Gwennap

The popular Linley Processor Conference kicked off its spring event at 9AM Pacific on Monday, April 6, 2020. The event began with a keynote from Linley Gwennap, principal analyst and president at The Linley Group. Linley’s presentation provided a great overview of the application of AI across several markets. Almost all of the conference is focused on AI.

Before getting into Linley’s keynote, I want to comment on the overall event. Delivering a live event through the internet is challenging. Holding attention spans, dealing with network glitches and capturing the spontaneous nature of the interaction between the speaker and the audience is not easy to accomplish. I suspect there are a lot of newly minted web meeting aficionados these days, so you know what I mean.

Simply put, the Linley Processor Conference appears to be doing a thoughtful and well-planned job of delivering the closest thing to a live, in-person event. Each presentation is followed by a relatively short Q&A.  Questions are queued from written requests from the audience. This definitely works much better than opening everyone’s audio and hoping you can hear just one person at a time. After the short Q&A, there are separate break-out meetings with each speaker at the end of each day. These tend to be smaller meetings and some speakers do open up audio for these events to foster an interactive discussion.

Mike Demler, senior analyst at The Linley Group, moderated several presentations on ultra-low power AI during the first day. Each presentation was quite engaging, using slides, real-time demos and full-motion video of the speaker. I dropped in on all of the break-out sessions. All had good attendance (with Linley having the largest audience). These Q&A sessions were less formal than the presentations.

Thanks to the strong presenters and highly engaged audience, these sessions touched on all sorts of relevant and useful topics. I particularly liked the way Jonathan Tapson, chief scientific officer at GrAI Matter Labs, demonstrated how his company achieves sparse processing with a real-time self-driving car demo. There are also breaks sprinkled throughout the event with slide shows from the various sponsors. A good time to check out these technologies or get another cup of coffee. The sessions run from 9AM to 12:45PM over four days. Another good move, as a full-day web meeting is too much for most.

If you weren’t able to register for the event, keep watching the Linley site. The Linley Group will develop presentation materials and videos of the conference and make them available sometime after the event concludes.

Back to Linley’s keynote. The topics covered include:

  • Deep learning trends
  • AI in the data center
  • AI in automotive
  • AI at the edge
  • Ultralow power AI

I won’t attempt to capture all the information presented here. You can catch the replay of Linley’s keynote for that. I will offer a few nuggets presented on each topic.

Deep learning trends: Model growth is exploding. Image processing models are growing at 2X per year – increased accuracy means increased size. The same is true for natural language processing.  Some models have 17 billion parameters. That’s not a typo. Architectures support both large numbers of simple processors (hundreds of thousands per chip) as well as a smaller number of complex processors. The decision of which way to go depends on the workload and your business plan. Convolution accelerators, systolic arrays, sparse computing, in-memory computing, binary neural networks, analog computing and more are all touched on.

Data center: NVIDIA is still the leader, but there is a lot of competition in this multi-billion-dollar market. What will the new announcements from NVIDIA be? Competitors discussed in this market include Cerebras, Intel (with its Habana acquisition to replace Nervana), Huawei, Graphcore, Groq, Xilinx, SambaNova, Alibaba, Google, Microsoft, Amazon and Baidu. The challenges of developing a new software stack are discussed as well.

Automotive: Autonomous driving deployment is taking longer than expected. Limited Level 3 capability is available now. Level 4 is next, likely implemented as commercial fleets (taxis, trucking, etc.). Vendors discussed include GM, Tesla, Waymo, Intel/Mobileye and NVIDIA.  Will Level 5 ever happen?  Listen to the keynote.

The edge: The general move from the cloud to the edge, motivated by things like power, latency, scalability, reliability and privacy, was discussed. The edge is really a hierarchy of processing capability. AI accelerators in smartphones were also discussed, as was AI for embedded applications. The barrier to entry here is lower, so this is a potential area of large growth. There is a long list of companies mentioned.

Ultralow power: Power optimization was discussed throughout the presentation. The TinyML Foundation and the TinyML Summit were discussed. Much of this work focuses on embedded applications.

That’s a quick overview of Linley’s keynote. If you missed it, I highly recommend you watch the replay. All event proceedings and video replays are available here.


Webinar on Transient Simulation of Power Transistors in Converter Circuits

by Tom Simon on 04-13-2020 at 6:00 am


Magwel is offering a webinar that takes a deeper look at how power transistors can be simulated in converter circuits to provide highly accurate information about switching efficiency. DC converter circuit efficiency has a big effect on the battery life of mobile devices and can affect performance and efficiency for wall-powered circuits. One large consideration is that PowerMOS devices do not operate as ideal devices. Performing circuit analysis at the device pins leaves out important information about what is happening inside these devices.

PowerMOS devices are really an assembly of large numbers of parallel intrinsic devices with a complex and distributed structure. As such, switching does not occur simultaneously across all the intrinsic devices. In converter circuits PowerMOS RC delays can affect Vgs over time at the gate contacts in low and high side transistors. Previously it has been difficult to run full circuit simulations that take this into consideration. Fine grain extraction of gate, source and drain interconnect is difficult for traditional circuit level extractors. Designers have struggled with this lack of visibility up until now.
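To see why the intrinsic devices do not switch together, consider a deliberately simplified picture: the gate interconnect as a uniform one-dimensional RC ladder driven from one end, with the Elmore delay estimated at each node. This is a textbook approximation for illustration only, not Magwel's solver-based extraction or its Fast3D model, and the resistance, capacitance and segment count are hypothetical.

```python
def elmore_delays(r_seg, c_seg, n_segments):
    """Elmore delay (seconds) at each node of a uniform RC ladder driven from one end.

    Node k sees every capacitor i through the resistance the two paths
    share, which is r_seg * min(i, k) for a ladder.
    """
    delays = []
    for k in range(1, n_segments + 1):
        delay = sum(r_seg * min(i, k) * c_seg
                    for i in range(1, n_segments + 1))
        delays.append(delay)
    return delays


# Hypothetical gate network: 10 segments of 0.5 ohm and 20 pF each.
delays = elmore_delays(r_seg=0.5, c_seg=20e-12, n_segments=10)
for node, delay in enumerate(delays, start=1):
    print(f"segment {node:2d}: {delay * 1e12:7.1f} ps")
```

Segments far from the gate contact see a larger delay than those near it, so for a short time only part of the device is conducting. A real PowerMOS layout is a distributed two- or three-dimensional network, which is why a simple ladder like this only hints at the effect.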

Magwel offers a tool specifically targeted at realizing comprehensive and accurate simulation of converter circuits, including the complex internals of PowerMOS devices. Magwel’s PTM-TR does several unique things to provide transparency into the detailed switching behavior of PowerMOS devices. PTM-TR uses a solver-based extractor to correctly and accurately determine parasitics for the internal metallization within PowerMOS devices. The gate regions are divided up according to user set parameters and the intrinsic device model is applied to create a simulation view of the device that incorporates its full internal structure. This model is known as a Fast3D model and is used by PTM-TR with Cadence Spectre® to co-simulate dynamic gate switching behavior at each time step of circuit operation.

Because the Fast3D model is used in conjunction with Spectre circuit simulation, it can be used with test benches, or to perform any desired simulation, such as corner analysis. PTM-TR comes with the additional benefit of showing graphically a field view of the device internals at each time step. During early switching, with only small sections of the device turned on, higher than expected current densities are possible, leading to EM and thermal issues. With PTM-TR designers can modify and test PowerMOS devices to achieve optimal performance and reliability.

To learn more and see how Magwel’s PTM-TR helps engineers optimize switching performance in converter circuits, sign up for the free webinar replay. Magwel Application Engineer Allan Laser will present an overview of the tool and then go through a demo that shows the simulation results and detailed insight into internal device operation during transient operation.