
Podcast EP89: An Overview of NXP’s MCX MCU Products with CK Phua

by Daniel Nenni on 06-22-2022 at 10:00 am

Dan is joined by CK Phua of NXP. CK joined Philips Semiconductors in 1993 and worked in various roles including quality, applications engineering, product engineering and technical marketing. After Philips, CK joined Freescale in 2012 and rejoined NXP through the Freescale merger. CK is now a Product Manager for Microcontrollers in the Edge Processing Business Line.

CK provides a detailed overview of NXP’s MCX product line and its product families, including architecture and capabilities across a broad range of applications. The supporting development environment is also discussed, as well as security capabilities.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


TSMC 2022 Technology Symposium Review – Process Technology Development

by Tom Dillinger on 06-22-2022 at 5:00 am


TSMC recently held their annual Technology Symposium in Santa Clara, CA.  The presentations provided a comprehensive overview of their status and upcoming roadmap, covering all facets of process technology and advanced packaging development.  This article will summarize the highlights of the process technology updates – a subsequent article will cover the advanced packaging area.

First, here is a brief overview of some of the general observations and broader industry trends, as reported by C.C. Wei, TSMC CEO.

General

  • “This year marks TSMC’s 35th anniversary. In 1987, we had 258 employees in one location, and released 28 products across 3 technologies.  Ten years later, we had 5,600 employees, and released 915 products across 20 technologies.  This year in 2022, we have 63,000 employees, and will release 12,000 products across 300 technologies.” 
  • “From 2018 to 2022, the volume of 12-inch (equivalent) wafers has had an annual CAGR exceeding 70%. In particular, we are seeing a significant increase in the number of ‘big die’ products.” (>500 mm²)
  • “In 2021, TSMC’s North America business segment shipped more than 7M wafers and over 5,500 products. There were 700 new product tapeouts (NTOs). This segment represents 65% of TSMC’s revenue.”
  • “Our gigafab expansion plans have typically involved adding two new ‘phases’ each year – that was the case from 2017-2019. In 2020, we opened six new phases, including our advanced packaging fab.  In 2021, there were seven new phases, including fabs in Taiwan and overseas – advanced packaging capacity was added, as well. In 2022, there will be 5 new phases, both in Taiwan and overseas.” 
    • N2 fabrication: Fab20 in Hsinchu
    • N3: Fab 18 in Tainan
    • N7 and N28: Fab22 in Kaohsiung
    • N28: Fab16 in Nanjing China
    • N16, N28, and specialty technologies: Fab23 in Kumamoto, Japan (in 2024)
    • N5 in Arizona (in 2024)
  • “TSMC has 55% of the worldwide installed base of EUV lithography systems.”
  • “We are expanding our capital equipment investment significantly in 2022.” (The table below highlights the considerable jump in cap equipment planned expenditures.)
  • “We are experiencing stress in the manufacturing capacity of mature process nodes. In 35 years, we have never increased the capacity of a mature node after a subsequent node has ramped to high volume manufacturing – that is changing.”
  • “We continue to invest heavily in “intelligent manufacturing”, focusing on precision process control, tool productivity, and quality. Each gigafab handles 10M dispatch orders per day, and optimizes tool productivity.  Each gigafab generates 70B data points daily to actively monitor.” 
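To put the quoted wafer growth rate in perspective, here is a quick arithmetic sketch (not TSMC data): 2018 to 2022 spans four annual compounding steps, so a CAGR of 70% implies roughly an 8.35x increase in volume, and "exceeding 70%" means at least that.

```python
def cagr_growth_factor(rate: float, years: int) -> float:
    """Total growth multiple after `years` of compounding at `rate` per year."""
    return (1.0 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # 2018 -> 2022 is four annual compounding steps.
    factor = cagr_growth_factor(0.70, 4)
    print(f"{factor:.2f}x")  # prints "8.35x"
```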

For the first time at the Symposium, a special “Innovation Zone” on the exhibit floor was allocated. The recent product offerings from a number of start-up companies were highlighted. TSMC indicated, “We have increased our support investment to assist small companies in adopting our technologies. There is a dedicated team that focuses on start-ups. Support for smaller customers has always been a focus. Perhaps somewhere in this area will be the next Nvidia.”

Process Technology Review

With a couple of exceptions discussed further on, the process technology roadmap presentations were somewhat routine – that’s not a bad thing, but rather an indication of ongoing successful execution of prior roadmaps.

The roadmap updates were presented twice, once as part of the technology agenda, and again as part of TSMC’s focus on platform solutions.  Recall that TSMC has specifically identified four “platforms” that individually receive development investment to optimize the process technology offerings:  mobile; high-performance computing (HPC); automotive; and IoT (ultra-low power).  The summaries below merge the two presentations.

N7/N6

  • over 400 NTOs by year-end 2022, primarily in the smartphone and CPU markets
  • N6 offers transparent migration from N7, enabling IP re-use
  • N6RF will be the RF solution for upcoming WiFi7 products
  • there is an N7HPC variant (not shown in the figure above), providing ~10% performance improvement at overdrive VDD levels

For N6, logic cell-based blocks can be re-implemented in a new library for additional performance improvements, achieving a major logic density improvement (~18%).

N5/N4

  • in the 3rd year of production, with over 2M wafers shipped, 150 NTOs by year-end 2022
  • mobile customers were the first, followed by HPC products
  • roadmap includes ongoing N4 process enhancements
  • N4P foundation IP is ready, interface IP available in 3Q2022 (to the v1.0 PDK)
  • there is an N5HPC variant (not shown in the figure above, ~8% perf improvement, HVM in 2H22)

As with the N7/N6, N4 provides “design re-use” compatibility with N5 hard IP, with a cell-based block re-implementation option.

The complexity of SoC designs for the automotive segment is accelerating.  There will be an N5A process variant for the automotive platform, qualified to AEC-Q100 Grade 1 environmental and reliability targets (target date: 2H22).  The N5A automotive process qualification involves both modeling and analysis updates (e.g., device aging models, thermal-aware electromigration analysis).

N3 and N3E

  • N3 will be in HVM starting in the second half of 2022
  • N3E process variant in HVM one year later; TSMC is expecting broad adoption across mobile and HPC platforms
  • N3E is ready for design start (v0.9 PDK), with high yield on the standard 256Mb memory array qualification testsite
  • N3E adds the “FinFLEX” methodology option, with three different cell libraries optimized for different PPA requirements (more at the end of this article)

Note that N3 and N3E are somewhat of an anomaly compared to prior TSMC process roadmaps. N3E will not offer a transparent migration of IP from N3. The N3E offering is a bit of a “correction”, in that significant design rule changes to N3 were adopted to improve yield.

TSMC’s early-adopter customers push for process PPA updates on an aggressive timeline, whether an incremental, compatible variant to an existing baseline (e.g., N7 to N6, N5 to N4), or for a new node.  The original N3 process definition has a good pipeline of NTOs, but N3E will be the foundation for future variants.

N2

  • based on a nanosheet technology, target production date: 2025
  • compared to N3E, N2 will offer ~10-15% performance improvement (@iso-power, 0.75V) or ~25-30% power reduction (@iso-perf, 0.75V); note also the specified operating range in the figure above down to 0.55V
  • N2 will offer support for a backside power distribution network

Parenthetically, TSMC is faced with the dilemma that the requirements of the different platforms have such a broad range of targets for power, performance, and area/cost.  As was noted above, N3E is addressing these targets with different libraries, incorporating a different number of fins that define the cell height.  For N2 library design, this design decision is replaced by a process technology decision on the number of vertically-stacked nanosheets throughout (with some allowed variation in the device nanosheet width).  It will be interesting to see what TSMC chooses to offer for N2 to cover the mobile and HPC markets, in terms of the nanosheet topology.  (The image below from an earlier TSMC technical presentation at the VLSI 2022 Conference depicts 3 nanosheets.)

NB:  There are two emerging process technologies being pursued to reduce power delivery impedance and improve local routability – i.e., “buried” power rail (BPR) and “backside” power distribution (BSPDN).  The initial investigations into offering BPR have quickly expanded to process roadmaps that integrate full BSPDN, like N2.  Yet, it is easy to get the two acronyms confused.

Specialty Technologies

TSMC defines the following offerings into a class denoted as “specialty technologies”:

  • ultra-low power/ultra-low leakage (utilizing an ultra-high Vt device variant)
    • requires specific focus on ultra-low leakage SRAM bitcell design
    • N12e in production, N6e in development (focus on very low VDD model support)
  • (embedded) non-volatile memory
    • usually integrated with a microcontroller (MCU), typically in a ULP/ULL process
    • RRAM
      • requires 2 additional masks, embedded in BEOL (much lower cost than the 12 masks for eFlash)
      • 10K write cycles (endurance specification), ~10 years retention at 125°C
    • MRAM
      • 22MRAM in production, focus is on improving endurance
      • 16MRAM for Automotive Grade 1 applications in 2023
  • power management ICs (PMIC)
    • based on bipolar-CMOS-DMOS (BCD) devices: 40BCD+, 22BCD+
    • for complex 48V/12V power domains
    • requires extremely low device R_on
  • high voltage applications (e.g., display drivers, using N80HV or N55HV)
  • analog/mixed-signal applications, requiring unique active and passive structures (e.g., precision thin-film resistors and low noise devices, using N22ULL and N16FFC)
  • MEMS (used in motion sensors, pressure sensors)
  • CMOS image sensors (CIS)
      • pixel size of 1.75µm in N65, 0.5µm in N28, transitioning to N12FFC
  • radio frequency (RF), spanning from mmWave to longer wavelength wireless communication; the upcoming WiFi7 standard was highlighted

“The transition from WiFi6 to WiFi7 will require a significant increase in area and power, to support the increased bandwidth requirements – e.g., 2.2X area and 2.1X power.  TSMC is qualifying the N6RF offering, with a ~30-40% power reduction compared to N16RF.  This will allow customers currently using N16RF to roughly maintain existing power/area targets, when developing WiFi7 designs.”

The charts below illustrate how these specialty technologies are a fundamental part of platform products – e.g., smartphones and automotive products.  The characteristic process nodes used for these applications are also shown.

Although the focus of smartphone development tends to be on the main application processor, the chart below highlights the extremely diverse requirements for specialty technology offerings, and their related features.  In the automotive area, the transition to a “zonal control” architecture will require a new set of automotive ICs.

N3E and FinFLEX

The FinFLEX methodology announcement was emphasized, with TSMC indicating “FinFLEX will offer full-node scaling from N5.”

As FinFET technology nodes have scaled (i.e., from N16 to N10 to N7 to N5), the fin profile and drive current per micron have improved significantly. Standard cell library design has evolved to incorporate fewer pFET and nFET fins, which define the cell height (specified in terms of the number of horizontal metal routing tracks). As illustrated above, the N5 library used a 2-2 fin definition; that is, 2 pFET fins and 2 nFET fins define the cell height. (N16/N12 used a 3-3 configuration.)

The library definition for N3E was faced with a couple of issues.  Mobile and HPC platform applications are increasingly divergent, in terms of their PPA (and cost) goals.  Mobile products focus on circuit density to integrate more functionality and/or reduced power, with less demanding performance improvements.  HPC is much more focused on maximizing performance.

As a result, N3E will offer three libraries, as depicted in the figure above:

    • an ultra low power library  (cell height based on a 1-fin library)
    • an efficient library (cell height based on a 2-fin library)
    • a performance library (cell height based on a 3-fin library)

The figure below is from TSMC’s FinFLEX web site, illustrating the concept.

Now, offering multiple libraries for integration on a single SoC is not new.  For years, processor companies have developed unique “datapath” and “control logic” library offerings, with different targets for:  cell heights, circuit performance, routability (i.e., max cell area utilization), and distinct logic offerings (e.g., wide AND-OR gates for datapath multiplexing).  Yet, the physical implementation of SoC designs using multiple libraries relied upon a consistent library per design block.

The unique nature of the FinFLEX methodology is that multiple libraries and multiple track heights will be intermixed within a block. 

After the TSMC Symposium, additional information became available.  A block design will alternate rows for the two libraries.  For example, a 3:2 block design will have alternate row heights accommodating cells from the 3-fin and 2-fin library designs.  A 2:1 block design will have alternate rows for cells from the 2-fin and 1-fin libraries.
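The alternating-row arrangement can be pictured with a small floorplan model. This is a minimal sketch under stated assumptions: the library labels and row heights (in arbitrary track units) are hypothetical placeholders, not TSMC N3E values.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Row:
    index: int
    library: str     # hypothetical library label, e.g. "3-fin"
    y_bottom: float  # bottom edge of the row, in arbitrary track units
    height: float    # row height, in the same units


def build_alternating_rows(num_rows: int,
                           lib_a: str, height_a: float,
                           lib_b: str, height_b: float) -> List[Row]:
    """Stack rows bottom-up, alternating between two libraries (A, B, A, B, ...)."""
    rows: List[Row] = []
    y = 0.0
    for i in range(num_rows):
        lib, h = (lib_a, height_a) if i % 2 == 0 else (lib_b, height_b)
        rows.append(Row(index=i, library=lib, y_bottom=y, height=h))
        y += h
    return rows


def block_height(rows: List[Row]) -> float:
    return sum(r.height for r in rows)


# A "3:2" block: 3-fin rows alternating with 2-fin rows (placeholder heights).
rows_32 = build_alternating_rows(4, "3-fin", 8.0, "2-fin", 6.0)
for r in rows_32:
    print(r.index, r.library, r.y_bottom)
```

A "2:1" block is the same call with the 2-fin and 1-fin labels and heights; only the alternation pattern matters here.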

TSMC indicated, “Different cell heights (in separate rows) are enabled in one block to optimize PPA.  FinFLEX in N3E incorporates new design rules, new layout techniques, and significant changes to EDA implementation flows.”

There will certainly be more information to come about FinFLEX and the changes to the general design flow.  Off-hand, there will need to be new approaches to:

    • physical synthesis
      • how will synthesis improve timing on a critical signal
      • will synthesis strive to provide a netlist with a balanced ratio of cells from the two libraries for the alternating rows

For example, to improve timing on a highly-loaded signal, synthesis would typically update a cell assignment in the library to the next higher drive strength – e.g., NAND2_1X to NAND2_2X.

With FinFLEX, additional options are available with the second library – e.g., whether an update to NAND2_1X_2fin would use NAND2_2X_2fin or NAND2_1X_3fin.  Yet, if the latter is chosen, the new cell will need to be “re-balanced” to a different row in the block floorplan.  The effective changes in performance and input/output wire loading for these choices are potentially quite complex to estimate during physical synthesis.
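The trade-off can be made concrete with a toy cost model. In this sketch, every cell name, delay, capacitance, and the fixed "move penalty" is hypothetical: a candidate from the other library pays an extra delay standing in for the longer wires after it is re-balanced into a row of that library. The only point is that the preferred choice flips depending on that penalty; real physical synthesis uses far richer timing and placement models.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CellOption:
    name: str            # hypothetical cell name
    library: str         # library (and therefore row type) the cell belongs to
    delay_ps: float      # toy stage delay at the current load
    input_cap_ff: float  # toy load presented to the upstream driver


def pick_upsize(current_library: str, options: List[CellOption],
                required_delay_ps: float, move_penalty_ps: float) -> CellOption:
    """Pick the option with the lowest input load that still meets timing.

    Switching libraries forces the cell into a row of the other library,
    modeled here (very crudely) as a fixed extra wire delay.
    """
    feasible = []
    for opt in options:
        penalty = move_penalty_ps if opt.library != current_library else 0.0
        effective_delay = opt.delay_ps + penalty
        if effective_delay <= required_delay_ps:
            feasible.append((opt.input_cap_ff, effective_delay, opt))
    if not feasible:
        raise ValueError("no candidate meets the delay target")
    # Lower upstream loading first, then lower delay as a tie-breaker.
    feasible.sort(key=lambda t: (t[0], t[1]))
    return feasible[0][2]


options = [
    CellOption("NAND2_2X_2fin", "2-fin", delay_ps=18.0, input_cap_ff=1.6),
    CellOption("NAND2_1X_3fin", "3-fin", delay_ps=15.0, input_cap_ff=1.2),
]
print(pick_upsize("2-fin", options, required_delay_ps=20.0, move_penalty_ps=3.0).name)
print(pick_upsize("2-fin", options, required_delay_ps=20.0, move_penalty_ps=6.0).name)
```

With a small move penalty the 3-fin cell wins (lower input load, still meets timing); with a larger one it misses the target and the same-library upsize is chosen.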

The cell selection options get even more intricate when considering specific flop cells to use, given not only the differences in clock-to-Q delays, but also the setup and hold time characteristics, and input clock loading.   When would it be better for individual flop bits in a register to use different output drive strengths in the same library (and be placed locally) versus having register bits re-balanced to a row corresponding to a different library selection?

With an alternating row configuration, the assumption is that there will be an even mix of cells from the two libraries.  Yet, the synthesis of a block may only require a small percentage of “high-performance” cells to meet timing objectives.   An output netlist without a balanced mix of library cells may have low overall utilization, suggesting a uniform row, single-library block floorplan may be suitable instead.  This may result in iterations in the chip floorplan (and likely, revisions in the power distribution network, as well).
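The utilization concern follows from simple area arithmetic. A minimal sketch, assuming rows come in pairs (one row per library), each row can be filled to a hypothetical maximum utilization, and the block grows until the more-demanded library fits; all dimensions are made-up units.

```python
import math
from typing import Tuple


def alternating_row_utilization(area_a: float, area_b: float,
                                height_a: float, height_b: float,
                                row_width: float,
                                max_row_util: float = 0.8) -> Tuple[int, float]:
    """Return (row pairs needed, overall block utilization).

    Each row pair holds one library-A row and one library-B row. Cells may only
    be placed in rows of their own library, so the block must be tall enough
    for whichever library needs more rows.
    """
    cap_a = row_width * height_a * max_row_util  # usable cell area per A row
    cap_b = row_width * height_b * max_row_util  # usable cell area per B row
    # The small tolerance guards against floating-point round-off in the ratio.
    pairs = max(math.ceil(area_a / cap_a - 1e-9),
                math.ceil(area_b / cap_b - 1e-9), 1)
    block_area = pairs * row_width * (height_a + height_b)
    return pairs, (area_a + area_b) / block_area


balanced = alternating_row_utilization(6400, 4800, 8.0, 6.0, 100.0)
skewed = alternating_row_utilization(640, 10560, 8.0, 6.0, 100.0)
print(balanced)  # (10, 0.8)
print(skewed)
```

With a balanced mix the block needs 10 row pairs at 80% utilization. With the same total cell area but only about 6% high-performance cells, the 2-fin rows alone require 22 pairs and overall utilization drops to about 36%.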

    • sub-block level IP integration

Blocks often contain a number of small hard IP macros, such as register files (typically provided by a register file generator).  With non-uniform row heights, the algorithms in the generator become more complex, to align the power continuity between the macro circuits and the cell rows.  And, there will be placement restriction rules that will need to be added to the hard IP models.

    • timing/power optimizations during physical design

Similarly to the physical synthesis block construction options, there will be difficult decisions on cell selection during the timing and power optimization steps in the physical design flow.  For example, if a cell can reduce its assigned drive strength to save power while still meeting timing, would a change in library selection, and thus row re-balancing, be considered?  Would the corresponding changes in the cell placement negate the optimization?

and, last but most certainly not least,

    • Will there be new EDA license costs to enable N3E FinFLEX?

(Years ago, the CAD department manager at a previous employer of mine went ballistic over the license cost adder to enable placement and routing for multipatterning requirements. Given the significant EDA investment required to support FinFLEX, history may repeat itself with additional license feature costs.)

The FinFLEX methodology definitely offers some intriguing options.  It will be extremely interesting to see how this approach evolves.

Analog design migration automation

Lastly, TSMC briefly highlighted work they are pursuing to help designers migrate analog/mixed-signal circuits and layouts to newer process nodes.

Specifically, TSMC has defined a set of “analog cells”, with the capability to take an existing schematic, re-map to a new node, evaluate circuit optimizations, and migrate layouts, including auto-placement and (PG + signal) routing.

The definitions of the analog cell libraries for N5/N4 and N3E are complete, with N7/N6 support to follow. TSMC showed an example of an operational transconductance amplifier (OTA) that had been through the migration flow.

Look for more details to follow. (This initiative appears to overlap with comparable features available from EDA vendor custom physical design platforms.)

A subsequent article will cover TSMC’s advanced packaging announcements at the 2022 Technology Symposium.

-chipguy

Also read:

Three Key Takeaways from the 2022 TSMC Technical Symposium!

Inverse Lithography Technology – A Status Update from TSMC

TSMC N3 will be a Record Setting Node!


Qualcomm’s AI play

by Anand Joshi on 06-21-2022 at 10:00 am


Qualcomm is a household name in the mobile chip industry. The company generated $33 billion in revenue in 2021 and continues to march ahead with its innovations. However, Qualcomm doesn’t get the same visibility and mention as Nvidia and Intel in the world of AI chips. By our estimate, Qualcomm’s contribution to the AI chip market is comparable to that of Intel and Nvidia, given its smartphone shipment volumes and the silicon content dedicated to AI in recent years. Qualcomm has been steadily making progress in key AI chip markets and perhaps has the most diverse and comprehensive portfolio to cater to all AI chip markets.

The figure shows the different segments within the AI chip market and the products in each.

The AI chip market has grown significantly in the past few years; you can read all about it in JP Data’s latest report on AI chips. According to that analysis, the overall AI chip market is best segmented by power consumption: data center AI chips (50+ W), mid-power AI chips (5-50 W, primarily for automotive and similar markets), low-power AI chips (0.1-5 W, primarily for mobile and client computing), and ultra-low-power AI chips (<0.1 W, for always-on applications). There’s no sign of an AI slowdown yet, with enterprises as well as edge device makers eager to test out new solutions. Many use cases and exciting applications continue to emerge, and proof-of-concept applications going into production are driving the need for AI inference chips.
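The power-based segmentation maps naturally to a small classification helper. This is a sketch: the report's ranges share their endpoints (5 W and 50 W each appear in two segments), so the half-open boundaries chosen here are an assumption.

```python
def ai_chip_segment(power_w: float) -> str:
    """Map a chip's power consumption (watts) to the report's four segments.

    Boundary handling is an assumption: [0, 0.1) ultra-low, [0.1, 5) low,
    [5, 50) mid, and 50 W and above data center.
    """
    if power_w < 0:
        raise ValueError("power must be non-negative")
    if power_w < 0.1:
        return "ultra-low power"   # always-on applications
    if power_w < 5:
        return "low power"         # mobile and client computing
    if power_w < 50:
        return "mid power"         # automotive and similar markets
    return "data center"


for watts in (0.01, 2.0, 15.0, 300.0):
    print(watts, ai_chip_segment(watts))
```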

Qualcomm is poised to play in all of these markets, which sets it apart from other companies. For the data center market, the company has introduced the AI100 chip, and its results submitted to MLPerf compete well with Nvidia. Qualcomm touts significantly higher performance per watt than the competition. Qualcomm is actively adapting its Snapdragon product line to the automotive market and recently claimed design wins at BMW. Qualcomm’s dominance of the low-power segment in the mobile world is well known and needs no introduction. The same chips offer an ultra-low-power mode for always-on applications, enabling a whole new set of AI use cases for device manufacturers.

Setting training aside, this makes its portfolio even more comprehensive than those of Nvidia and Intel. Nvidia, for example, doesn’t have products in the mobile space, and neither does Intel. Neither Intel nor Nvidia has solutions for the ultra-low-power market either.

Qualcomm was somewhat late to the party, initially focusing on accelerating AI by enhancing its Hexagon DSP and Adreno GPU. The company then acquired Nuvia to create a new AI accelerator. At Microsoft’s 2022 Build conference, the company announced Project Volterra, a new device powered by Snapdragon chips that contain a dedicated AI accelerator (NPU). The accelerator will become part of the Microsoft Windows 11 platform. Through the included SDK for building AI applications, the chip will enable AI usage within a large number of Windows applications, potentially challenging x86 dominance in the PC world.

Qualcomm has invested heavily in AI since. It announced a $100 million AI fund back in 2018, has invested aggressively in AI R&D, and released an SDK that allows developers to take a model and customize it for mobile, automotive, IoT, robotics or other markets. While there is no data on the number of active AI developers for Qualcomm, we expect it to be much lower than the numbers Nvidia and Intel can boast. In fact, Google Trends reveals that searches for Qualcomm AI are far below those for Nvidia AI or Intel AI, suggesting that there’s a lot of catching up to do.

The AI chip market is still emerging. Nvidia has become the de facto standard in training, but the inference market is just starting to ramp up. If Qualcomm is indeed able to offer a consistent software experience across different market segments, it has the potential to become a formidable player in the AI chip market.

Also read:

A Fresh Look at HLS Value

How to Cut Costs of Conversational AI by up to 90%

HLS in a Stanford Edge ML Accelerator Design


A Fresh Look at HLS Value

by Bernard Murphy on 06-21-2022 at 6:00 am


I’ve written several articles on High-Level Synthesis (HLS): designing in C, C++ or SystemC, then synthesizing to RTL. There is unquestionable appeal to the concept. A higher level of abstraction enables a function to be described in fewer lines of code (LOC). That immediately offers higher productivity and implies fewer bugs, because the number of bugs in any kind of code scales pretty reliably with LOC. Simulation for architectural design and validation runs multiple orders of magnitude faster, allowing for broader experimentation with options. It can also run much larger tests, like image recognition on streaming video, a tough goal for RTL simulation. Yet these methods seemed largely restricted to specialized design objectives: signal processing functions, some simple ML inference engines, that sort of thing.

I’m always willing to be re-educated, especially when I can hear from customers. Siemens EDA just hosted a webinar, mostly customer talks on use of HLS with just a little marketing thrown in. Pretty much a full day of presentations, centering around a few core applications, which made me rethink my position. The algorithm classes the technology best serves haven’t changed so much. What has changed is that big market needs have shifted to overlap more with those algorithms. Check out which companies presented on these topics. Naturally, when these speakers talked about HLS, they meant Catapult from Siemens EDA.

Video Codecs

There’s been a massive worldwide increase in cloud video workload. According to Google, video now accounts for more than 80% of internet traffic, thanks to streaming and YouTube in particular. Aki Kuusela of Google said that this volume demands warehouse-scale encoding with fast throughput. From his perspective, the whole warehouse must be viewed as a system (storage, networking, codec, compute, etc.) to optimize for this level of traffic and throughput. Moreover, codecs must support a variety of video formats, seamlessly spanning the latest formats, popular standards, and legacy standards. Think of YouTube: every minute, 500 hours of new content is uploaded, and tens of thousands of live streams must be served simultaneously.

Off-the-shelf solutions can’t meet this need. For the same reason Google built its own ML training platforms (TPUs), it is building its own codecs, which must be optimized across traffic diversity, quality, throughput, and availability in ways only Google can reproduce. Google started early with HLS to integrate with the YouTube stack. Nvidia is doing very similar work, also on video codecs. As the world leader in GPUs for gaming, graphics, and AI, Nvidia needs to have the fastest and highest-quality video. Of course they are building their own codecs.

Object detection for the Mars sample return program

Another cool video example (but not a codec) is from NASA/JPL, the team that brought you Ingenuity, the Mars helicopter. They are now designing a Harris corner detector, an image-related algorithm, as part of development for the Mars sample return project. The original implementation was in RTL as a DSP-like function, but this proved difficult to optimize. The speaker describes approaches using SystemC, implementing either a DSP process or a Kahn (essentially self-timed) process, using the flexibility HLS offers to experiment with these options.
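For readers unfamiliar with the algorithm, here is a minimal pure-Python sketch of the textbook Harris corner response, R = det(M) - k*trace(M)^2, where M is the structure tensor of image gradients summed over a window. It assumes central-difference gradients, an unweighted 3x3 box window (a Gaussian window is common), and k = 0.04; it illustrates the algorithm, not JPL's implementation.

```python
from typing import List

Image = List[List[float]]


def harris_response(img: Image, k: float = 0.04) -> Image:
    """Harris corner response per pixel; border pixels are left at zero."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    # Central-difference gradients in x and y.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    r = [[0.0] * w for _ in range(h)]
    # Sum the structure tensor over a 3x3 box window around each pixel.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = sxy = syy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            r[y][x] = det - k * trace * trace
    return r


# Synthetic 9x9 image: a bright quadrant whose corner sits at (row 4, col 4).
img = [[1.0 if (y >= 4 and x >= 4) else 0.0 for x in range(9)] for y in range(9)]
resp = harris_response(img)
best = max((resp[y][x], (y, x)) for y in range(9) for x in range(9))
print(best[1])  # (4, 4)
```

The response peaks at the quadrant's corner, is negative along its straight edges, and is zero in flat regions. The fixed-size nested loops are the kind of regular structure that makes such kernels attractive for HLS.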

OK, so video applications like these are still in that same algorithmic niche I was talking about earlier. But the business relevance of the video processing niche has exploded. Carrying HLS along with it.

Wireless applications

NXP, as a leader in automotive electronics, is working on a complete baseband for ultra-wideband (UWB). The technology you will soon be using for ultra-secure keyless entry to your car (your current Bluetooth-enabled keyless entry is not so secure). At some point maybe also contactless payment for the same reason. They found their traditional approach to designing the baseband, starting from Simulink, was too slow to converge. Much of the functionality here is signal processing; think filters and equalizers in multiple channels for example. Such a design demands high levels of parallelism at high clock rates which is difficult to architect in a timing-unaware platform. The application must also be very low power; think of UWB in a car key fob, running off a coin cell battery. These designs must build on custom-crafted signal processing.
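Filters like these are classic HLS material. As an illustration only (the Q15 coefficient format, tap count, and truncating rescale are assumptions, and this is not NXP's design), here is a behavioral model of a direct-form fixed-point FIR filter. In an HLS flow, the inner multiply-accumulate loop is the part one would unroll or pipeline to reach the required parallelism.

```python
from typing import List, Sequence


def fir_q15(samples: Sequence[int], coeffs_q15: Sequence[int]) -> List[int]:
    """Direct-form FIR filter: integer samples, Q15 fixed-point coefficients.

    Each output is sum(c[k] * x[n-k]) rescaled by 2**15. The arithmetic shift
    truncates toward negative infinity; real designs often round instead.
    """
    delay_line = [0] * len(coeffs_q15)  # x[n], x[n-1], ..., most recent first
    out: List[int] = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]  # shift in the new sample
        acc = 0                             # wide accumulator (Python ints don't overflow)
        for c, d in zip(coeffs_q15, delay_line):
            acc += c * d                    # multiply-accumulate: the loop HLS would unroll
        out.append(acc >> 15)               # drop the 15 fractional bits
    return out


# 4-tap moving average: each coefficient is 0.25 in Q15 (0.25 * 32768 = 8192).
print(fir_q15([1000] * 6, [8192] * 4))  # [250, 500, 750, 1000, 1000, 1000]
```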

A new company, Viosoft, is building a complete RAN physical layer for 5G (the radio unit piece of the network), from rate matching/channel mapping to modulation, time/frequency synchronization, MIMO/beamforming, RF processing, and more. This must handle multiple bandwidth and latency requirements and multiple transmission frequencies. Once more, there is lots of signal processing with a huge demand for flexibility. The application will be built on an FPGA but must still be power-optimized, because it may be sitting in a remote location.

Wireless, lots of signal processing, and low power demand. Once again requiring custom design solutions, built through HLS.

Smart sensing and wireless power transfer

ST provided a fascinating three-part pitch. The first section was on infrared sensing for people detection in a room using a smart sensor. This technology can be useful for energy-saving controls. Sensing is on a grid within a room, allowing machine learning of movement patterns; hence a neural network, which is where they use HLS.

The next application was a Qi (wireless power transmission) demodulator, a modem-like (and therefore DSP-like) function extracting power rather than information from the signal. The third application was a contactless infrared sensor, something familiar to all of us now thanks to COVID. A prior implementation did the temperature calcs in an embedded processor. This work pushes the calculation into the smart sensor, first establishing corrections for ambient temperature and for the sensed object temperature, then using the Stefan-Boltzmann law (yay physics!) to compute the temperature of the object. Note these are simply formulae, not DSP or ML operations, but they do use floating-point math for precision, so the HLS approach was an easy choice.
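The physics step can be sketched in a few lines. Assuming the simple gray-body form of the Stefan-Boltzmann law for net radiant exchange, P = e * sigma * A * (T_obj^4 - T_amb^4), the ambient-corrected object temperature follows by inverting it. The emissivity, sensing area, and example temperatures are hypothetical, and a real thermopile sensor would use calibrated constants rather than a raw area; this is not ST's algorithm. Floating-point math keeps the fourth-power terms precise.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def net_radiant_power(t_obj_k: float, t_amb_k: float,
                      emissivity: float, area_m2: float) -> float:
    """Net power exchanged between a gray-body object and its surroundings."""
    return emissivity * SIGMA * area_m2 * (t_obj_k ** 4 - t_amb_k ** 4)


def object_temperature(power_w: float, t_amb_k: float,
                       emissivity: float, area_m2: float) -> float:
    """Invert the law above: ambient-corrected object temperature in kelvin."""
    t4 = power_w / (emissivity * SIGMA * area_m2) + t_amb_k ** 4
    if t4 <= 0.0:
        raise ValueError("measured power implies a non-physical temperature")
    return t4 ** 0.25


# Hypothetical example: skin-like target (~37 C) in a 25 C room.
p = net_radiant_power(310.15, 298.15, emissivity=0.98, area_m2=1e-6)
t = object_temperature(p, 298.15, emissivity=0.98, area_m2=1e-6)
print(f"{t - 273.15:.2f} C")  # 37.00 C
```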

What I like here is the applicability of HLS to these consumer-oriented applications, where cost and power will both be critical.

Wrap up

I skipped a couple of talks, one from Nvidia research on modeling interconnect in SystemC to get some feel for latencies as a function of layout. Another was from Siemens EDA on MatchLib, the open-source library originally developed by Nvidia in support of this modeling. All good stuff but not directly relevant to my theme here of the compelling demand for HLS in multiple applications.

Bottom line, best fit algorithms still tend to be signal processing centric, but big markets now see huge value in custom hardware development around those algorithms. You can watch the entire set of talks HERE.

Also read:

HLS in a Stanford Edge ML Accelerator Design

Standardization of Chiplet Models for Heterogeneous Integration

Using EM/IR Analysis for Efinix FPGAs


How to Cut Costs of Conversational AI by up to 90%

by Dave Bursky on 06-20-2022 at 10:00 am


The burgeoning use of conversational artificial intelligence (CAI) in consumer and business applications places a heavy computational burden on both the front-end and back-end systems that provide natural language processing (NLP). NLP systems rely on deep learning (a subset of machine learning) to automate speech recognition, perform the NLP functions, and then provide the text-to-speech output. To cut the costs of these systems, Achronix and Myrtle.ai have partnered, promising to reduce costs by up to 90% as well as to reduce the hardware requirements, as described in this whitepaper.

Myrtle.ai, a technology specialist in FPGA AI inferencing, implements high-performance recurrent neural network (RNN)-based models on FPGAs using its MAU inferencing acceleration engine. The MAU engine, integrated into the Achronix Speedster®7t AC7t1500 FPGA, leverages key aspects of the Speedster7t architecture to drastically accelerate real-time automatic speech recognition (ASR) neural networks. That translates into a 2500% increase in the number of real-time streams that can be processed compared to a server-class CPU.

The CAI pipeline is often defined by three key functional blocks:

  1. Speech to text (STT), also known as automatic speech recognition (ASR)
  2. Natural language processing (NLP)
  3. Text to speech (TTS) or speech synthesis

Such pipelines are found in millions of virtual voice assistants, such as Apple’s Siri or Amazon’s Alexa, in voice search assistants on laptops, such as Microsoft’s Cortana, in automated call center (or contact center) agents, and in many other applications. The deep learning algorithms that power these CAI services are either processed on the local electronic device or aggregated in the cloud for remote processing at scale. Large-scale deployments supporting millions of consumer interactions represent extremely large compute challenges, which hyperscalers have addressed by developing specialized silicon devices to process these services.

State-of-the-art ASR algorithms are implemented with end-to-end deep learning. Recurrent neural networks (RNNs), unlike convolutional neural networks (CNNs), are common in speech recognition. As noted in “CNN vs. RNN: How are they different?” by David Petersson of TechTarget, RNNs are better suited to processing temporal data, which aligns well with ASR applications. RNN-based models require high compute capability and high memory bandwidth to process the neural network within the strict latency targets of conversational systems. When real-time or automated responses are too slow, the system appears sluggish and unnatural. Low latency is often achieved only at the expense of processing efficiency, which pushes up costs and can make a deployment too large to be practical.
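To see why RNNs are both suited to temporal data and demanding on memory, here is a minimal sketch of the simplest recurrent cell (an Elman RNN) in plain Python. It is an illustration only; production ASR models often use gated variants such as LSTMs, and the dimensions in the usage line are hypothetical. Each time step depends on the previous hidden state, so frames must be processed in order, and every step touches every weight once.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_x: Matrix, w_h: Matrix, b: Vector, x_t: Vector, h_prev: Vector) -> Vector:
    """One Elman RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    pre = [a + c + d for a, c, d in zip(matvec(w_x, x_t), matvec(w_h, h_prev), b)]
    return [math.tanh(p) for p in pre]


def rnn_forward(w_x: Matrix, w_h: Matrix, b: Vector, frames: List[Vector]) -> Vector:
    """Run the cell over a sequence of feature frames; step t needs the result of step t-1."""
    h = [0.0] * len(b)
    for x_t in frames:
        h = rnn_step(w_x, w_h, b, x_t, h)
    return h


def macs_per_step(input_dim: int, hidden_dim: int) -> int:
    """Multiply-accumulates per time step: every weight in W_x and W_h is used once."""
    return hidden_dim * (input_dim + hidden_dim)


# Toy one-unit example, then the per-step work for hypothetical 80-dim features, 1024 hidden units
h = rnn_forward([[1.0]], [[0.5]], [0.0], [[1.0], [0.0]])
print(h, macs_per_step(80, 1024))
```

Because the full weight set is re-read on every step, keeping weights close to the compute (or reusing each fetched weight across many streams) matters as much as raw multiply throughput.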

Competing FPGA architectures in the ML acceleration segment claim inferencing rates as high as 150 tera-operations per second (TOPS). Yet in real-world applications, especially latency-sensitive ones such as ASR, these FPGAs fall well short of their headline TOPS rates because they cannot efficiently transfer data between compute and external memory. The Achronix Speedster7t architecture strikes the right balance of compute engines, eight high-speed GDDR6 memory interfaces (4 Tbit/s in aggregate) and high-throughput data transfers (a 20 Tbit/s network on chip), yielding a device that can deliver 64% of its headline TOPS rate on real-time, low-latency ASR workloads (see the figure).
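The gap between headline and delivered TOPS can be illustrated with a simple roofline model: attainable throughput is the lesser of peak compute and memory bandwidth times arithmetic intensity (operations per byte fetched). The sketch below is a generic illustration, not a description of how the MAU engine works. The 100 TOPS peak is a hypothetical figure, weights are assumed to be 8-bit and streamed from GDDR6 once per time step, and "streams sharing weights" is an assumed batching scheme; it ignores activation traffic, on-chip BRAM reuse, and the extra latency that batching can add.

```python
def attainable_ops_per_s(peak_ops_per_s: float, mem_bw_bytes_per_s: float,
                         ops_per_byte: float) -> float:
    """Roofline model: throughput is capped by peak compute or by memory traffic."""
    return min(peak_ops_per_s, mem_bw_bytes_per_s * ops_per_byte)


GDDR6_BW_BYTES_PER_S = 4e12 / 8   # 4 Tbit/s aggregate GDDR6 bandwidth = 500 GB/s
PEAK_OPS_PER_S = 100e12           # hypothetical 100 TOPS peak, for illustration only


def rnn_ops_per_byte(streams_sharing_weights: int, bytes_per_weight: float = 1.0) -> float:
    """Arithmetic intensity if each weight is fetched once per time step and reused by
    every concurrent stream (one MAC = 2 ops per stream). Assumed scheme, not Myrtle.ai's."""
    return 2.0 * streams_sharing_weights / bytes_per_weight


for streams in (1, 32, 256):
    ops = attainable_ops_per_s(PEAK_OPS_PER_S, GDDR6_BW_BYTES_PER_S, rnn_ops_per_byte(streams))
    print(f"{streams:4d} streams: {ops / 1e12:6.1f} TOPS ({ops / PEAK_OPS_PER_S:.0%} of peak)")
```

Under these assumptions a single stream is badly memory-bound, which is the general effect the paragraph describes for designs that cannot move data efficiently.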

At the heart of the Speedster7t architecture are the 2560 machine-learning processor (MLP) blocks. These blocks contain an optimized matrix/vector multiplication function capable of 32 multiplies and one accumulate in a single clock cycle, which is the foundation of the compute engine architecture. Block RAM (BRAM) is co-located with each of the 2560 MLP instances in the AC7t1500, which yields lower latency and higher throughput. Myrtle.ai’s low-latency, high-throughput MAU inferencing engine has been integrated into the Speedster7t FPGA, using 2000 of the 2560 MLPs. Because the MLP is a hard block, it can run at a much higher clock rate than if it were implemented in the FPGA fabric itself.
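Peak throughput for an array of MLP blocks follows from block count, multiply-accumulates (MACs) per cycle, and clock rate. The sketch below assumes each MLP performs 32 MACs per cycle (reading "32 multiplies and one accumulate" that way) and counts each MAC as 2 operations. The 750 MHz clock is a hypothetical value chosen for illustration, not an Achronix-published figure, so the printed TOPS values are illustrative only.

```python
def peak_tops(num_mlps: int, macs_per_cycle: int, clock_hz: float) -> float:
    """Peak throughput in TOPS, counting each multiply-accumulate as 2 operations."""
    return num_mlps * macs_per_cycle * 2 * clock_hz / 1e12


CLOCK_HZ = 750e6  # hypothetical MLP clock rate, for illustration only

full = peak_tops(2560, 32, CLOCK_HZ)  # all MLPs in the AC7t1500
mau = peak_tops(2000, 32, CLOCK_HZ)   # the 2000 MLPs used by the MAU engine
print(f"{full:.2f} TOPS total, {mau:.2f} TOPS for MAU ({2000 / 2560:.1%} of MLPs)")
```

Because peak scales linearly with clock rate, running the MLPs as hard blocks at a higher clock than soft fabric logic raises this ceiling directly.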

Most ASR solutions offered by large-scale cloud service providers such as Google, Amazon, Microsoft Azure, and Oracle let service providers build products on top of these cloud APIs. However, those service providers face increasingly large bills as their operations scale out and their products succeed in the market.

The publicly advertised cost of the larger ASR providers ranges from $0.01 to $0.025 per minute, and industry reports suggest that the average call center call lasts approximately five minutes. Consider a large enterprise data center or call center services company fielding 50,000 calls per day at five minutes per call, or 250,000 minutes per day. At the rates above, ASR processing would cost from $2,500 to $6,250 per day, or roughly $0.9 million to $2.3 million per year. The Achronix and Myrtle.ai solution can support 4000 real-time streams (RTS) on one accelerator card, enough capacity to handle over one million calls per day.

There are many factors that determine the cost of a stand-alone ASR appliance. For this example, assume the Achronix ASR acceleration solution is delivered on an FPGA-based PCIe card integrated into an x86-based 2U server. Sold through a system integrator, this appliance might cost $50,000, and the annual cost of running the server could double that figure, for a total of about $100,000 in the first year for an on-premise ASR appliance. Compared with cloud API services, this on-premise solution could save the end user roughly 9X to 23X in the first year.
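The cloud-versus-appliance comparison is simple arithmetic, sketched below from the per-minute rates, call volume, call length, and first-year appliance estimate quoted in this example. The 365-day operating year is an assumption, and the sketch ignores volume discounts, staffing, redundancy, and other real-world costs.

```python
from typing import Tuple

CALLS_PER_DAY = 50_000
MINUTES_PER_CALL = 5
DAYS_PER_YEAR = 365                 # assumed: the service runs every day of the year
APPLIANCE_FIRST_YEAR_USD = 100_000  # first-year appliance estimate from the example


def cloud_asr_cost(rate_per_minute: float) -> Tuple[float, float]:
    """Daily and annual cloud ASR cost for the call volume above."""
    daily = CALLS_PER_DAY * MINUTES_PER_CALL * rate_per_minute
    return daily, daily * DAYS_PER_YEAR


for rate in (0.01, 0.025):
    daily, annual = cloud_asr_cost(rate)
    ratio = annual / APPLIANCE_FIRST_YEAR_USD
    print(f"${rate}/min: ${daily:,.0f}/day, ${annual:,.0f}/year, {ratio:.1f}x the appliance")
```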

Achronix and Myrtle.ai are teaming up to deliver an ASR platform consisting of a 200W, x16 PCIe Gen4-based accelerator card and associated software that together can sustain up to 4000 real-time streams concurrently, processing up to 1 million five-minute transcriptions per 24-hour period. Compared with the cost of cloud ASR services, deploying this PCIe accelerator card in a single x86 server can reduce first-year CAPEX and OPEX by as much as 90%.
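The daily capacity figure follows from the stream count and call length. The sketch below treats 4000 concurrent streams kept busy around the clock as an idealized upper bound; it also computes the average number of streams a 50,000-call-per-day workload would occupy, with the caveat (an assumption about real traffic) that peak-hour demand is higher than the daily average.

```python
import math

MINUTES_PER_DAY = 24 * 60


def max_calls_per_day(concurrent_streams: int, minutes_per_call: float) -> float:
    """Upper bound: every stream is busy around the clock with back-to-back calls."""
    return concurrent_streams * MINUTES_PER_DAY / minutes_per_call


def average_streams_needed(calls_per_day: int, minutes_per_call: float) -> int:
    """Average concurrent streams for a daily call volume (real peak demand is higher)."""
    return math.ceil(calls_per_day * minutes_per_call / MINUTES_PER_DAY)


print(f"{max_calls_per_day(4000, 5):,.0f} five-minute transcriptions per day")
print(average_streams_needed(50_000, 5), "streams on average for 50,000 calls/day")
```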

To download the full whitepaper, visit achronix.com.

Also read:

Benefits of a 2D Network On Chip for FPGAs

5G Requires Rethinking Deployment Strategies

Integrated 2D NoC vs a Soft Implemented 2D NoC


Casting Light on OpenLight’s Open Silicon Photonics Platform
by Kalar Rajendiran on 06-20-2022 at 6:00 am

The Growing Silicon Photonics Market

For many decades now, modern optical technology has been deployed in networking infrastructure for the long-haul and medium-haul links that support internet communications. The foundation of this technology is photonics, the science of generating, manipulating, and detecting light to perform functions otherwise achieved using electronics. A fiber-optic module serves as a photoelectric converter that bidirectionally interfaces the optical side of a communications infrastructure with its electronic side.

Current Market Trends

In recent years, data volumes have grown explosively (to zettabytes) due to the proliferation of mobile applications. In addition, hyperscale data centers, deep learning, 5G, and video streaming applications call for higher performance at very low power consumption. Because bandwidth, latency, power, and reach are key elements of connectivity, these trends have renewed interest in silicon photonics.

Silicon Photonics

Silicon photonics uses silicon as an optical medium, patterning it into micro-photonic components. Using current semiconductor fabrication techniques, hybrid devices with both optical and electronic components can be integrated onto a monolithic chip. This enables very high-speed data transfer between and within chips and helps continue the benefits of Moore’s law. Products can gain speed at reduced power consumption for data communications, as well as for ultrasensitive sensing applications such as LiDAR and healthcare.

A major challenge for silicon photonics, however, is laser integration and the high cost of manufacturing, adding, assembling, and aligning discrete lasers. This challenge grows as the number of laser channels and the overall bandwidth increase.

The Birth and Unveiling of OpenLight

Synopsys already offers an electronic-photonic design automation solution consisting of the OptoCompiler, OptSim, PrimeSim, Photonic Device Compiler, and IC Validator design software products. It is therefore not surprising that, back in April of this year, Synopsys announced majority ownership of a new independent company that it jointly launched with Juniper Networks. The April announcement simply stated that the as-yet-unnamed company would deliver the industry’s first open-foundry silicon photonics platform with integrated lasers. The platform would integrate silicon photonics assets spun out from Juniper Networks to the new company, including more than 200 patents on photonic device design and process integration.

Earlier in June, the new company unveiled itself, revealing its brand identity and technology portfolio. OpenLight’s executive team brings decades of hands-on photonics design experience and is led by Dr. Thomas Mader, Chief Operating Officer, Dr. Daniel Sparacin, VP of Business Development and Strategy, and Dr. Volkan Kaman, VP of Engineering.

OpenLight’s Solution

The open platform includes integrated lasers, optical amplifiers, modulators, photodetectors, and other key photonic components to form a complete solution for low-power, high-performance photonics ICs. By processing indium phosphide (InP) materials directly onto the silicon photonics wafer, the platform reduces the cost and time of adding lasers. This in turn enables scalability and improved power efficiency. In addition, monolithically integrating the lasers on silicon wafers increases overall reliability and simplifies packaging.

The first offering of the platform supports Tower Semiconductor’s PH18DA fabrication process and has passed the process qualification and reliability tests. As a demonstration vehicle, first samples of 400G and 800G reference designs with integrated lasers are expected to be available in summer 2022.

In addition, OpenLight offers select photonic integrated circuit (PIC) designs and design services to its customer base to accelerate time-to-market.

Value Proposition to its Customer Base

OpenLight’s platform will provide a new level of laser integration and scalability to accelerate the development of high-performance photonic integrated circuits (PICs).  Customers will benefit from access to a complete photonics library of industry-standard EDA tools and other key photonic components.

The target customer base spans a broad range covering applications such as datacom, telecom, LiDAR, healthcare, HPC, AI, and optical computing.

Integration Enabled Differentiation

In calculus, differentiation and integration are inverse operations, but when it comes to products, integration enables differentiation. That is certainly the case with silicon photonics. OpenLight is boldly pitching its open-foundry model, its silicon photonic integration capability, and its channel and volume scalability with its tagline, “Open. Integrated. Scalable.”

For more details, visit OpenLight’s website.

Also read:

DesignDash: ML-Driven Big Data Analytics Technology for Smarter SoC Design

Coding Guidelines for Datapath Verification

Very Short Reach (VSR) Connectivity for Optical Modules


Obscuration-Induced Pitch Incompatibilities in High-NA EUV Lithography
by Fred Chen on 06-19-2022 at 10:00 am

The next generation of EUV lithography systems is based on a numerical aperture (NA) of 0.55, a 67% increase from the current value of 0.33, and targets printing a 16 nm pitch [1]. High-NA systems are already expected to face complications from four issues: (1) reduced depth-of-focus requires thinner resists, which are more susceptible to pinholes as well as stochastic defects, and require new etch transfer and metrology techniques [1,2]; (2) increased sensitivity to blur from electrons [3]; (3) throughput considerations due to using half the size of the current 26 mm x 33 mm field [4]; and (4) the central obscuration of the pupil [5,6], which leads to a variety of imaging effects [2,7].
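Why the NA increase matters for a 16 nm pitch can be checked with the standard Rayleigh relation, half-pitch = k1 · λ / NA, with λ = 13.5 nm for EUV and k1 = 0.25 as the single-exposure limit for two-beam imaging. The sketch below is textbook arithmetic, not a model of any specific scanner.

```python
WAVELENGTH_NM = 13.5  # EUV wavelength


def k1_factor(pitch_nm: float, na: float, wavelength_nm: float = WAVELENGTH_NM) -> float:
    """Rayleigh k1 for equal lines and spaces: half-pitch = k1 * wavelength / NA."""
    return (pitch_nm / 2) * na / wavelength_nm


def min_pitch_nm(na: float, wavelength_nm: float = WAVELENGTH_NM) -> float:
    """Smallest single-exposure pitch at the k1 = 0.25 limit (two-beam imaging)."""
    return wavelength_nm / (2 * na)


for na in (0.33, 0.55):
    print(f"NA={na}: k1 at 16 nm pitch = {k1_factor(16, na):.3f}, "
          f"minimum pitch = {min_pitch_nm(na):.1f} nm")
```

With these inputs, a 16 nm pitch sits below the k1 = 0.25 limit at 0.33 NA but at a workable k1 of about 0.33 at 0.55 NA.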

The last issue, however, presents the most fundamental limitation on which pitches can be imaged. For the smallest line pitches (18 nm or less), the required illumination wreaks havoc on the diffraction patterns of (a) lines at larger pitches (>25 nm) and (b) staggered 2D arrays at even larger pitches (up to 44 nm) [8]. In the plots below, the dots spanning the range of illumination angles for 16 nm and 18 nm pitches fit inside the dipole leaf shapes, with the red dots indicating illumination angles forbidden by the corresponding features.
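The mechanism can be illustrated with a one-dimensional sketch. For a line grating of pitch p illuminated from normalized pupil position σ, the diffraction orders land at σ + m·λ/(p·NA), and an order is lost if it falls outside the pupil or inside the central obscuration. This is a simplified illustration under loudly stated assumptions, not the model behind the plots in [8]: the obscuration radius of 0.2 (as a fraction of the pupil radius) is a hypothetical value, the pupil is reduced to a single axis, and only the 0th and ±1st orders are tracked (even orders vanish for equal lines and spaces).

```python
WAVELENGTH_NM = 13.5
NA = 0.55
OBSCURATION_RADIUS = 0.2  # hypothetical obscured radius, as a fraction of the pupil radius


def order_positions(pitch_nm, sigma_x, orders=(-1, 0, 1)):
    """Normalized pupil x-positions of diffraction orders for a 1D line grating."""
    step = WAVELENGTH_NM / (pitch_nm * NA)
    return {m: sigma_x + m * step for m in orders}


def collected_orders(pitch_nm, sigma_x):
    """Orders that land inside the pupil but outside the central obscuration."""
    return sorted(m for m, f in order_positions(pitch_nm, sigma_x).items()
                  if OBSCURATION_RADIUS <= abs(f) <= 1.0)


def forms_image(pitch_nm, sigma_x):
    """A line image needs the 0th order plus at least one 1st order to interfere."""
    got = collected_orders(pitch_nm, sigma_x)
    return 0 in got and (1 in got or -1 in got)


SIGMA = 0.77  # pole position chosen so 16 nm pitch gets two-beam (0th + -1st) imaging
for pitch in (16, 22, 28, 40, 50):
    print(pitch, collected_orders(pitch, SIGMA), forms_image(pitch, SIGMA))
```

With the pole placed for 16 nm pitch, this toy model finds larger pitches (28 nm and 40 nm with these assumed numbers) whose -1st order lands in the obscured center, leaving only the 0th order and no line image.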

As the plots show, over half of the possible illumination space is forbidden. The resulting reduction in pupil fill to <20% is enough to impact throughput [6,8]. For 16 nm pitch, the usable illumination space is practically closed. Layouts may need to be separated out by illumination, so that features requiring incompatible illumination are exposed separately.

References

[1] https://www.imec-int.com/en/articles/high-na-euvl-next-major-step-lithography

[2] https://www.linkedin.com/pulse/cautions-using-high-na-euv-frederick-chen/

[3] https://www.linkedin.com/pulse/demonstration-dose-driven-photoelectron-spread-euv-resists-chen/; https://www.linkedin.com/pulse/adding-random-secondary-electron-generation-photon-shot-chen/; https://www.linkedin.com/pulse/electron-spread-function-euv-lithography-frederick-chen/

[4] A. H. Gabor et al., “Effect of high NA ‘half-field’ printing on overlay error,” Proc. SPIE 11609, 1160907 (2021).

[5] B. Kneer et al., “EUV Lithography Optics for sub 9 nm Resolution,” Proc. SPIE 9422 (EUV VI), 94221G (2015).

[6] B. Bilski et al., “High-NA EUV imaging: challenges and outlook,” Proc. SPIE 11177 (EMLC 2019), 111770I (2019).

[7] https://www.linkedin.com/pulse/stochastic-sidelobe-risks-tradeoffs-high-na-euv-systems-chen/

[8] Pitches Forbidden by the Central Obscuration in High-NA EUV Lithography (video): https://www.youtube.com/watch?v=1HV2UYABh4E

This article originally appeared on LinkedIn Pulse: Obscuration-Induced Pitch Incompatibilities in High-NA EUV Lithography

Also read:

The Electron Spread Function in EUV Lithography

Double Diffraction in EUV Masks: Seeing Through The Illusion of Symmetry

Demonstration of Dose-Driven Photoelectron Spread in EUV Resists

Adding Random Secondary Electron Generation to Photon Shot Noise: Compounding EUV Stochastic Edge Roughness


CHIPS for America DOA?
by Robert Maire on 06-19-2022 at 6:00 am

  • We think hopes for CHIPS for America are fading fast
  • Politics, Jan 6th, guns, inflation, partisanship will likely block it
  • Alternative to building US semis is knocking down China chips
  • The only political option may be more restrictions on China

Chips for America act seems drowned out by partisan screaming

We have been saying for some time now that the odds of passing a compromise version of Chips for America have been fading. That probability now seems to be fading fast as we close in on both summer vacation and the fall elections. There has been a non-stop flow of large news items that have taken up everyone’s mindshare, especially in government.

Between Ukraine, guns, the January 6th hearings, and inflation…not to mention the stock market…the news flow is overwhelming.

The partisan divide is wider than ever, with a total lack of cooperation that gets worse as a possible change in control gets closer.

Cars are getting built, even the Ford F-150 Lightning

One of the main things that got legislators’ attention was the car industry grinding to a halt due to a lack of chips. That is no longer the case; America can get its beloved F-150, and even the electric version (although at a huge premium over sticker).

So the issue that brought chips to the forefront of American minds has faded quickly. I can only imagine someone raising the semiconductor issue in the halls of Congress, being laughed at, and being told the problem is over because they can now buy a car.

As inflation moves to center stage, spending money looks bad

Spending money on a problem that seems to no longer exist is likely not popular. The semiconductor industry may be fading back into the woodwork it came from. Spending $52B right now seems far-fetched. We had our moment in the sun and blew it.

Anti China sentiment still exists and may be worse

Meanwhile, the Taiwan Strait is still a parade of US and Chinese Navy ships, and the provocations are worse than ever. It certainly doesn’t help when a Chinese official publicly calls for the seizure of TSMC in the media. It is something I am sure is on China’s mind, but I never expected anyone to be so bold as to suggest it publicly.

Top Economist Urges China to Seize TSMC If US Ramps Up Sanctions

Obviously, China could never seize TSMC, as it would be a hollow prize. Within a week, operations would cease for lack of critical support from the entire tool infrastructure industry, much as we already saw happen in China with Fujian Jinhua, which stole trade secrets from Micron. So threats of seizure are themselves hollow.

This all amounts to an interesting standoff reminiscent of the myth of Tantalus.

Rather than pump up US chips, knock down China Chips

While Chips for America may be dead or dying, the concern about China remains. If you can’t keep the US ahead in the chip industry by throwing a paltry amount of money at it, then the next best alternative is to knock down the Chinese chip industry instead to accomplish a similar goal.

Sanctions and embargoes don’t cost money, so they would not cause an unpopular partisan fight in Congress. Both Republicans and Democrats are concerned about China, and it will likely be much easier to compromise on something that doesn’t cost money, especially amid inflation in a critical election year…sanctions.

Semiconductor sanctions in China- A cheap compromise?

Slapping further sanctions on China is likely to be more palatable to legislators and voters. It doesn’t cost money and will hurt China where it hurts most…in semiconductors.

Russian auto manufacturers have had to go back to stone-age cars due to the lack of chips, which is very clear evidence that chip sanctions work very well for zero money.

The US can tighten restrictions on almost all types of semiconductor equipment and many chip exports to China: not just leading-edge or military-specific chips, but also more mundane chips, as happened in Russia. Obviously, this would not go anywhere near as far as the outright embargo on Russia, but it would at least be a tightening. It’s not like the Chinese can seize TSMC in response.

More political support for sanctions than spending

Given the success of the Russian sanctions, the likely path seems to be sanctions on China instead of spending. This obviously carries risks and victims. Clearly, the US semiconductor equipment industry will not be happy to have its number one market constrained in any way. Companies that depend heavily on China, such as Apple, which are already trapped in the middle, may get squeezed even more.

It’s unclear how China would respond. Would this push it closer to Russia? But China is pretty close to Russia anyway, and the US probably wants to be less reliant on China anyway. Biden’s statements on Taiwan seem to have drawn no significant response. We may come to view sanctions as an effective weapon with less risk than previously thought.

In summary, the death of CHIPS for America may lead to more sanctions rather than spending, which may have the opposite of the desired effect on the chip industry, but perhaps legislators don’t care anymore.

The Stocks

We did not see much positive impact from the CHIPS for America act anyway. $52B spread over five years and cut up into little pieces would have meant very little impact on individual companies and was more in line with pork-barrel politics. So we think the lack of CHIPS for America has near-zero impact on the industry, either short or long term.

The bigger impact comes if we have sanctions instead of spending. There will certainly be more near-term impact if semiconductor and semiconductor equipment sales to China slow, most notably on semiconductor equipment companies.

In the long run, it likely equalizes, as semiconductor demand is likely a zero-sum game: semiconductors not used in China will be used elsewhere.

There is obviously more risk to consumer goods manufacturing in China, as it may be difficult to differentiate chips helping China from chips going back to US consumers. Apple and PC manufacturers would have more risk.

Sanctions may also help the re-shoring of chip production to the US, as companies would not want to be exposed to having their supply chain routed through China.

Over the longer term, US consumer companies would likely lose out to local Chinese companies, as we saw in the smartphone and app markets, so losing the Chinese market may not cost anything that wasn’t going to be lost anyway.

It certainly complicates the supply chain, which has yet to recover, and makes sourcing of rare earth elements and critical gases even harder than the loss of Ukrainian supply already has.

We would hope that if sanctions are deployed instead of CHIPS for America that they are done so slowly and carefully so as to minimize the shock to an already damaged technology supply system.

About Semiconductor Advisors LLC
Semiconductor Advisors is an RIA (a Registered Investment Advisor) specializing in technology companies with particular emphasis on semiconductor and semiconductor equipment companies. We have been covering the space longer and been involved with more transactions than any other financial professional in the space. We provide research, consulting and advisory services on strategic and financial matters to both industry participants as well as investors. We offer expert, intelligent, balanced research and advice. Our opinions are very direct and honest and offer an unbiased view as compared to other sources.

Also read:

Has KLA lost its way?

LRCX weak miss results and guide Supply chain worse than expected and longer to fix

Chip Enabler and Bottleneck ASML


Semiconductors Weakening in 2022
by Bill Jewell on 06-18-2022 at 6:00 am

The semiconductor market in 2022 is weakening. Driving factors include rising inflation, the Russian war on Ukraine, COVID-19 related shutdowns in China, and lingering supply chain issues. Four of the top 14 semiconductor companies (Intel, Qualcomm, Nvidia, and Texas Instruments) are expecting lower revenues in 2Q 2022 versus 1Q 2022. All four cited COVID-19 related lockdowns in China as a factor. China locked down several major cities, including Shanghai and Beijing, in April and May due to rising COVID cases. The shutdowns were lifted on June 1, but temporary shutdowns have since been reimposed to fight emerging cases. The shutdowns significantly impacted manufacturing in China.

Six non-memory companies expect revenue growth in 2Q 2022 from 1Q 2022 ranging from 3% to 7%. Three of these companies (Infineon Technologies, STMicroelectronics, and NXP Semiconductors) have significant automotive businesses contributing to their growth. AMD’s reported 1Q 2022 revenue was up 22% from 4Q 2021, largely due to its acquisition of Xilinx, which was completed midway through the quarter. Its outlook for 2Q 2022 growth is 10%, also including Xilinx. Excluding the effect of the Xilinx acquisition, AMD’s revenue grew 10.4% in 1Q 2022 and is expected to grow about 3% in 2Q 2022. The weighted average revenue growth of the 10 largest non-memory companies in 1Q 2022 versus 4Q 2021 was 4%. The weighted average outlook for 2Q 2022 is a decline of 1% from 1Q 2022.
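A revenue-weighted average lets large companies count for more than small ones, which is why it can differ sharply from a simple average of growth rates. The sketch below assumes the weights are each company's prior-quarter revenue (the article does not state its exact weighting); with that choice, the weighted average equals the growth of the group's combined revenue. The three companies and their revenues are hypothetical numbers for illustration only, not actual company data.

```python
from typing import Dict, Tuple


def weighted_growth(revenues: Dict[str, Tuple[float, float]]) -> float:
    """Revenue-weighted average growth. revenues maps company -> (prior, current);
    each company's growth rate is weighted by its prior-quarter revenue (assumed)."""
    prior_total = sum(prior for prior, _ in revenues.values())
    return sum(prior * (cur / prior - 1) for prior, cur in revenues.values()) / prior_total


def simple_average_growth(revenues: Dict[str, Tuple[float, float]]) -> float:
    """Unweighted mean of the individual growth rates, for comparison."""
    rates = [cur / prior - 1 for prior, cur in revenues.values()]
    return sum(rates) / len(rates)


# Hypothetical revenues in $B (prior quarter, current quarter), for illustration only
sample = {"A": (20.0, 19.0), "B": (10.0, 10.7), "C": (5.0, 5.15)}
print(f"weighted: {weighted_growth(sample):+.1%}, simple: {simple_average_growth(sample):+.1%}")
```

Here one large decliner outweighs two smaller growers, so the weighted figure is slightly negative even though most of the companies grew.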

Memory companies have a brighter outlook than non-memory companies. Micron’s guidance for its fiscal quarter which ended in early June was an increase of 11.7% from the prior quarter. Samsung, SK Hynix and Kioxia all reported demand for both DRAM and flash memory remains solid.

The outlook for the global economy is weakening due to the factors listed earlier. The June 2022 forecast from the World Bank is for only 2.9% growth in global GDP in 2022 following 5.7% growth in 2021. In January 2022, the World Bank projected 4.1% growth in 2022 global GDP. Among advanced economies, the U.S. and the Euro area are expected to show 2.5% GDP growth in 2022, less than half the 2021 rate. Among emerging and developing economies, China’s 2022 GDP growth is forecast at 4.3%, well below 2021’s 8.1%, due primarily to COVID related shutdowns. Russia’s GDP should decline 8.9% in 2022 due to its war on Ukraine and resulting boycotts. India’s economy remains strong, with 2022 GDP growth forecast at 7.5%, the highest among major economies. The outlook for 2023 is similar to 2022, with the World Bank calling for 3.0% global GDP growth.

In the U.S., economists put the probability of a recession in the next 12 months at 30%, according to Bloomberg’s May 2022 survey. The Federal Reserve this month projected 2022 inflation of 5.2%, as measured by the personal consumption expenditures (PCE) price index. The Fed expects inflation to moderate to 2.6% by the end of 2023. Inflation fears led the Federal Reserve this week to raise its benchmark interest rate by 75 basis points, the largest increase in 28 years. The European Central Bank plans to raise interest rates by 25 basis points in July.

The outlook for key semiconductor market drivers is also weakening. Earlier this month, IDC projected declines in 2022 shipments of both smartphones and PCs. Smartphones are forecast to decline 3.5% in 2022 after 6% growth in 2021. IDC expects smartphones to recover to 5% growth in 2023. PCs boomed in 2020 and 2021 with double-digit growth driven by work-at-home and learn-at-home trends due to the COVID-19 pandemic. IDC forecasts a decline of 8.2% for PCs in 2022. PCs should grow 1% in 2023, in line with pre-COVID trends.

The automotive industry is the only bright spot among major drivers. In May 2022, S&P Global Mobility (whose parent, S&P Global, merged with IHS Markit) projected light vehicle production to grow 4.1% in 2022 after 3.5% growth in 2021. Pent-up demand for vehicles would drive even higher growth, but production is limited by shutdowns in China, supply chain issues, and the war in Ukraine. Vehicle production is forecast to grow a healthy 9.4% in 2023.

With the weakening global economy and declines in shipments of key drivers, we at Semiconductor Intelligence have lowered our 2022 semiconductor market forecast to 9% from the 15% we projected in February. The 2Q 2022 semiconductor market will likely decline by about 1% to 2% from 1Q 2022, and the second half of 2022 should be weaker than typical trends. The only reason 2022 could see high single-digit growth is the strong quarter-to-quarter growth in 2021: the 1Q 2022 semiconductor market was up 23% from a year ago. Year-to-year growth should be in the low single digits to flat by 4Q 2022. Other forecasts for the 2022 semiconductor market range from 11% from IC Insights to 16.3% from WSTS.
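The carryover effect behind that reasoning can be shown with a small calculation: when the prior year climbed quarter by quarter, the current year's total can still grow solidly even if quarterly revenue declines or stays flat after the first quarter. The sketch below uses a hypothetical 2021 quarterly revenue index; only the 23% year-over-year rise in 1Q 2022 comes from the text, and the later 2022 quarter-to-quarter rates are assumed for illustration. It is not a reproduction of the Semiconductor Intelligence forecast model.

```python
from typing import List


def annual_growth(prev_year_q: List[float], this_year_q: List[float]) -> float:
    """Year-over-year growth of the annual total, from quarterly values."""
    return sum(this_year_q) / sum(prev_year_q) - 1


# Hypothetical 2021 quarterly revenue index, rising through the year
q2021 = [100.0, 108.0, 116.0, 121.0]

# 2022: 1Q up 23% year over year (from the text); then -1.5%, flat, flat (assumed)
q2022 = [q2021[0] * 1.23]
for qoq in (-0.015, 0.0, 0.0):
    q2022.append(q2022[-1] * (1 + qoq))

print(f"2022 annual growth: {annual_growth(q2021, q2022):.1%}")
print(f"4Q 2022 year over year: {q2022[3] / q2021[3] - 1:.1%}")
```

Even with no sequential growth after 1Q in this toy case, the annual total rises by high single digits while fourth-quarter year-over-year growth is roughly flat.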

The weakness in the 2022 semiconductor market should continue into 2023. Our preliminary forecast for 2023 is 3% growth. Other 2023 forecasts are 3.6% from Gartner and 5.1% from WSTS.

Also Read:

Semiconductor CapEx Warning

Electronics, COVID-19, and Ukraine

Semiconductor Growth Moderating


Podcast EP88: A conversation with Maheen Hamid, one of Silicon Valley’s 100 Most Influential Women
by Daniel Nenni on 06-17-2022 at 10:00 am

Dan is joined by Maheen Hamid, Chief Operating Officer and Chief Financial Officer at Breker Verification Systems and a recipient of the Silicon Valley Business Journal’s 100 Most Influential Women award. Maheen discusses her journey to Silicon Valley and Breker, beginning with her upbringing in Bangladesh. Maheen married Adnan Hamid, Executive President and CTO at Breker, shortly before the company’s formation, and they started the company together.

She offers many insightful comments about high technology, Silicon Valley and its impact on the world through the lens of her experiences beginning in Bangladesh.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.