WEBINAR: FPGAs for Real-Time Machine Learning Inference

by Don Dingee on 11-30-2022 at 6:00 am

A server with an FPGA-based accelerator for real-time machine learning inference reduces costs and energy consumption by up to 90 percent

With AI applications proliferating, many designers are looking for ways to reduce server footprints in data centers – and turning to FPGA-based accelerator cards for the job. In a 20-minute session, Salvador Alvarez, Sr. Manager of Product Planning at Achronix, provides insight on the potential of FPGAs for real-time machine learning inference, illustrating how an automatic speech recognition (ASR) application might work with acceleration.

High-level requirements for ASR

Speech recognition is a computationally intensive task and an excellent fit for machine learning (ML). Language differences aside, speakers have different inflections and accents and vary in their use of vocabulary and grammar. Still, sophisticated ML models can produce accurate speech-to-text results using cloud-based resources. Popular models include connectionist temporal classification, listen-attend-spell, and recurrent neural network transducer.

A deterministic, low-latency response is essential. Transit time from an edge device to the cloud and back is low enough on fast 5G or fiber networks that speech processing becomes the dominant term in response time. Interactive systems add natural language processing and text-to-speech features. Users expect a normal conversational flow and will accept only short delays.

Accuracy is also a must, with a low word error rate. Correct speech interpretation depends on which words are present in the conversational vocabulary. Research into ASR improvements continues, and the flexibility to adopt new algorithms that improve speed or accuracy is a must-have for an ASR system.

While cloud-based resources offer the potential for more processing power than most edge devices, they are not infinitely scalable without tradeoffs. Capital expenditure (CapEx) costs and energy consumption can be substantial in scaled-up, high-throughput configurations that simultaneously take speech input from many users.

FPGA-based acceleration meets the challenge

Multiply-accumulate workloads with high parallelization, typical of most ML algorithms, don’t fit CPUs well, requiring some acceleration to hit performance, cost, and power consumption goals. Three primary ML acceleration vehicles exist: GPUs, ASICs, and FPGAs. GPUs offer flexibility but tend to drive power consumption through the roof with efficiency challenges. ASICs offer tuned performance for specific workloads but can limit flexibility as new models come into play.

FPGA-based acceleration checks all the boxes. By consolidating acceleration in one server with high-performance FPGA accelerator cards, server counts drop drastically while determinism and latency improve. Flexibility for algorithm changes is excellent, requiring only a new FPGA bitstream for new model implementations. Eliminating servers reduces up-front CapEx, helps with space and power consumption, and simplifies maintenance and OpEx.

High-performance FPGAs like the Achronix Speedster7t family have four features suited to real-time ML inference. Logic blocks provide multiply-accumulate resources. High-bandwidth memory keeps data and weighting coefficients flowing, and high-speed interfaces provide the connection to the host server platform. FPGA logic also supports various computational precision needs, preserving ML inference accuracy while lowering ML training requirements.
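To make the precision point concrete, here is a small, hypothetical sketch (not Achronix code): the same multiply-accumulate computed in floating point and in a naive 8-bit integer form, the kind of reduced-precision MAC that maps cheaply onto FPGA logic blocks. The data and scale factor are made up.

```python
# Hypothetical illustration of reduced-precision inference: the same
# dot product in float and in a naive int8-quantized form.
weights = [0.12, -0.5, 0.33, 0.07]   # made-up model weights in [-1, 1]
inputs = [1.0, 0.25, -0.8, 0.5]      # made-up activations in [-1, 1]

full = sum(w * x for w, x in zip(weights, inputs))

SCALE = 127  # map [-1, 1] onto signed 8-bit integers
qw = [round(w * SCALE) for w in weights]
qx = [round(x * SCALE) for x in inputs]
acc = sum(a * b for a, b in zip(qw, qx))  # integer MACs, cheap in FPGA fabric
approx = acc / (SCALE * SCALE)            # rescale back to the float domain

print(full, approx)  # the quantized result tracks the float result closely
```

Real quantization schemes add per-channel scales and zero points, but the trade is the same: integer MACs cost far less hardware than floating-point ones, at a small and controllable loss of accuracy.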

Overlays help non-FPGA designers

Some ML developers may be less familiar with FPGA design tactics. “An overlay can optimally configure the hardware on an FPGA to create a highly-efficient engine, yet leave it software programmable,” says Alvarez. He expands on how accelerator IP from Myrtle.ai can be configured into the FPGA, abstracting the user interface, upping the clock rate, and utilizing hardware better.

Alvarez wraps up this webinar on FPGAs for real-time machine learning with a case study describing how an accelerated ASR appliance might work. With the proper ML training, simultaneously transcribing thousands of voice streams with dynamic language allocation becomes possible. According to Achronix:

  • One server with a 250W PCIe Speedster7t-based accelerator card can replace 20 servers without acceleration
  • Each accelerated server delivers as many as 4000 streaming speech channels
  • Costs and energy consumption both drop by up to 90% by using an accelerated server
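As a sanity check on those bullets, the arithmetic is easy to sketch. The 250W card and the 20:1 replacement ratio come from the claims above; the 400W draw assumed for a baseline server is my own hypothetical figure, not from Achronix.

```python
# Back-of-envelope check of the server-consolidation claim.
BASE_SERVER_W = 400      # assumed power of one unaccelerated server (hypothetical)
CARD_W = 250             # PCIe Speedster7t accelerator card, per the webinar
REPLACEMENT_RATIO = 20   # one accelerated server replaces 20 plain servers

unaccelerated_power = REPLACEMENT_RATIO * BASE_SERVER_W   # 8000 W
accelerated_power = BASE_SERVER_W + CARD_W                # 650 W

savings = 1 - accelerated_power / unaccelerated_power
print(f"energy reduction: {savings:.0%}")  # energy reduction: 92%
```

With these assumed numbers the reduction lands just above 90%, consistent with the “up to 90%” claim; a different baseline server power shifts the figure accordingly.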

Although the example in this webinar is specific to ASR, the principles apply to other machine learning applications where FPGA hardware and IP accelerate inference models. When time-to-market and flexibility matter and high performance is required, FPGAs for real-time machine learning inference are a great fit. Follow the link below to see the entire webinar, including the enlightening case study discussion.

Achronix Webinar: Unlocking the Full Potential of FPGAs for Real-Time Machine Learning Inference

Also Read:

WEBINAR The Rise of the SmartNIC

A clear VectorPath when AI inference models are uncertain

Time is of the Essence for High-Frequency Traders


IDEAS Online Technical Conference Features Intel, Qualcomm, Nvidia, IBM, Samsung, and More Discussing Chip Design Experiences

by Daniel Nenni on 11-29-2022 at 10:00 am

Ansys is hosting IDEAS Digital Forum 2022, a no-cost virtual event that brings together industry executives and technical design experts to discuss the latest in EDA for Semiconductors, Electronics, and Photonics.

See the full online conference agenda and list of speakers at www.ansys.com/IDEAS. The free registration will allow you to attend the event on December 6th or on-demand any time after that.

IDEAS will start with Keynote addresses from Raja Koduri from Intel, Pankaj Kukkal from Qualcomm, and insights into the metaverse from DP Prakash with start-up Youtopian.

Keynote Speakers and Panelists at IDEAS on December 6th, 2022

You can also attend the IDEAS Panel Discussion in the afternoon on the topic of “Thermal Management: How to Keep Your Cool When Chips Get Hot.” The moderated panel discussion will include Jean-Philippe Fricker from Cerebras, Roopashree HM from Texas Instruments, and Bill Mullen, senior director of R&D at Ansys.

Following the keynotes, there are eight technical tracks on topics covering Thermal Integrity, Power Integrity, Timing Closure, Electromagnetics, Machine Learning, Hardware Security, and Photonics. More than 20 companies are participating in IDEAS to present case studies of their production designs, including:

Intel, Qualcomm, Nvidia, Samsung, MediaTek, IBM, GUC, HP Enterprise, and NXP

Select authors will be available for Q&A chat with the event attendees after their presentations – don’t miss this opportunity to interact with industry experts.

To see the full agenda, Register now for IDEAS and add this premier event to your calendar.

For more information, contact Marc Swinnen

Ansys is at the forefront of electronic design enablement in partnership with the world’s leading companies for 2.5D/3D-IC, AI and machine learning, high-performance computing, 5G, telecommunications, aerospace and autonomous vehicles.

Join us for the IDEAS Digital Forum — a place to catch up on industry best practices and the latest advances in semiconductor, electronic, and photonic design. IDEAS will explore future trends with keynotes from industry leaders and offer technical insights from expert chip designers at many of the world’s largest electronic and semiconductor companies. IDEAS will give you a close-up view of some of the world’s most advanced electronic design projects from leading companies.

Meet your industry peers and fellow designers from around the world at this premier virtual event for networking, sharing and learning the latest in multiphysics technology for electronic, photonic, and semiconductor design.

This free event will be hosted online.

About Ansys

When visionary companies need to know how their world-changing ideas will perform, they close the gap between design and reality with Ansys simulation. For more than 50 years, Ansys software has enabled innovators across industries to push boundaries by using the predictive power of simulation. From sustainable transportation to advanced semiconductors, from satellite systems to life-saving medical devices, the next great leaps in human advancement will be powered by Ansys.

Take a leap of certainty … with Ansys.

Also Read:

Whatever Happened to the Big 5G Airport Controversy? Plus A Look To The Future

Ansys’ Emergence as a Tier 1 EDA Player— and What That Means for 3D-IC

What Quantum Means for Electronic Design Automation


Integration Methodology of High-End SerDes IP into FPGAs

by Kalar Rajendiran on 11-29-2022 at 6:00 am

Over the last couple of decades, the electronics communications industry has been a significant driver behind the growth of the FPGA market and continues to be. A major reason is the many different high-speed interfaces built into FPGAs to support a variety of communications standards and protocols. The underlying input-output PHY technology used to implement these standards is serializer-deserializer (SerDes) technology. FPGAs are complex and challenging to begin with, even before high-speed interfaces are taken into account, and SerDes PHY designs are complex and challenging in their own right. When the two are brought together, the implementation gets trickier still, which is generally why there is a lag in incorporating the most advanced SerDes designs into FPGAs. But what if the status quo could be changed? That was the objective behind a collaborative effort between Alphawave IP and Achronix, the results of which were presented at the TSMC OIP Forum in October.

Challenges in Integrating High-End SerDes into FPGAs

Interdependencies between the SerDes and the FPGA fabric may lead to floorplanning challenges for the integrated chip. In addition to the layout challenges, even minor differences in metal stack choices between the fabric and the SerDes may adversely impact the power, performance and area (PPA) of either of these components.

FPGAs have to support a large number of line rates, protocols, and protocol variants with diverse electrical channel requirements. The line rates range from 1Gbps to 112Gbps using NRZ or PAM4 signaling schemes to deliver the speed performance. This combinatorial requirement places a heavy burden on the modeling used for simulations. Each line rate/protocol combination needs to be validated pre-silicon and post-silicon based on highly accurate models.
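One concrete corner of that combinatorial space is the mapping from line rate to symbol (baud) rate for each signaling scheme, since PAM4 carries two bits per symbol versus one for NRZ. A minimal sketch (the function name is mine, not from the presentation):

```python
# Line rate vs. symbol rate for the two signaling schemes mentioned above.
BITS_PER_SYMBOL = {"NRZ": 1, "PAM4": 2}

def symbol_rate_gbaud(line_rate_gbps: float, scheme: str) -> float:
    """Return the symbol rate in Gbaud for a given line rate and scheme."""
    return line_rate_gbps / BITS_PER_SYMBOL[scheme]

print(symbol_rate_gbaud(112, "PAM4"))  # 56.0 -- PAM4 halves the baud rate
print(symbol_rate_gbaud(56, "NRZ"))    # 56.0 -- same baud rate as 112G PAM4
```

Running 112Gbps PAM4 at the same 56Gbaud as 56Gbps NRZ keeps the Nyquist frequency unchanged, which is the usual motivation for adopting PAM4 at high line rates.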

Requirements for Successful Integration

Whether in the SerDes or the FPGA fabric, architectural enhancements are made that impact the integration of the SerDes with the FPGA fabric. To avoid surprises at integration time, architectures need to be discussed and agreed upon early so that proper simulation models can be developed for validation. An overly optimistic model would force a radical change in the architecture, while a pessimistic model would deliver a solution with uncompetitive PPA. Neither situation is desirable.

A close collaboration between the SerDes IP vendor and the FPGA integrator is required early on for developing accurate models. The close partnering is also needed for ensuring optimal floorplanning, power planning, bump map planning, timing, etc.

Scope of Alphawave IP and Achronix Collaboration

Achronix’s high-end FPGAs support multi-standard protocols such as 1GbE through 400GbE and PCIe Gen5, including custom protocols to support non-standard speeds such as 82Gbps. The 112Gbps SerDes uses a different architecture than the 56Gbps SerDes and uses the PAM4 signaling scheme. The design uses a digital ADC and is built around a DSP-based architecture.

The goal of the collaborative effort was to achieve successful integration of Alphawave IP’s AlphaCORE100 multi-standard SerDes with Achronix’s Speedster7t FPGA fabric.

Test Chip

A test chip was built to validate the early simulation models. The test chip was implemented in TSMC’s N7 process and included four data channels, a full AFE, digital PLLs and DLLs, BIST, and additional test circuitry for characterization.

Successful Results

As presented in the plots below, the simulation results based on the early models developed through the collaborative effort correlated very well with test chip measurements in the lab. The high-accuracy models enabled Achronix to produce first-time-right Speedster7t FPGAs with Alphawave IP’s AlphaCORE100 SerDes IP to support PCIe Gen5 x16 and Gen5 x8 as well as 400GbE.

The results of full simulation also correlated well with BER measurements from the lab for a wide range of channel loss conditions.

For more details, please connect with Achronix and Alphawave IP.

Also Read:

WEBINAR The Rise of the SmartNIC

A clear VectorPath when AI inference models are uncertain

Time is of the Essence for High-Frequency Traders


The Role of Clock Gating

by Steve Hoover on 11-28-2022 at 10:00 am

Perhaps you’ve heard the term “clock gating” and you’re wondering how it works, or maybe you know what clock gating is and you’re wondering how to best implement it. Either way, this post is for you.

Why Power Matters

I can’t help but laugh when I watch a movie where the main characters are shrunk down to the size of grains of sand and have to fight off ants the size of T-Rexes. It’s not that I’m amused by giant ants; it’s the unquestioned assumption that our bodies would act the same at such a small scale that gets to me. I know, it’s just a movie, but it still makes me cringe. Things just wouldn’t scale that way. If I’ve got my physics right, our mass would scale cubically, while our surface area would scale quadratically, as would our strength. As a result, we’d be super strong, but it wouldn’t matter since we’d freeze to death almost instantly from heat loss. Plenty of other factors come into play, and I’m sure I’d get them wrong if I tried, but you get the idea.
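The cube-square claim is easy to check with a couple of lines of arithmetic; the shrink factor below is a made-up example.

```python
# Shrink a body by linear factor s: mass (volume) goes as s^3,
# surface area (and strength) as s^2. Numbers are illustrative only.
def scale_factors(s: float) -> tuple[float, float]:
    """Return (mass_factor, area_factor) for linear shrink factor s."""
    return s ** 3, s ** 2

s = 1 / 1000                    # shrink ~1.8 m down to ~1.8 mm
mass_f, area_f = scale_factors(s)

# Surface area per unit mass -- roughly, heat loss per gram -- grows as 1/s:
print(area_f / mass_f)  # ~1000: three orders of magnitude more heat loss per gram
```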

But, what does “Honey I Shrunk the Kids” have to do with clock gating? Well, I began designing silicon in the 90s. At that time the only thing that mattered was performance. Since then, transistors have shrunk a bit–a lot actually–just like Rick Moranis. And their properties scale by different factors. One that’s getting out of control is power. According to “The Dark Silicon Problem and What it Means for CPU Designers”, heat generation per unit of silicon area is “somewhere between the inside of a nuclear reactor and the surface of a star,” and that was in 2013. Power is now a first-order concern. In fact, we find ourselves in a new situation where we have more transistors available to us than we can afford to use. In a very real sense, the best way to get more performance is now to save more power.

I was motivated to write this post by a LinkedIn notification I received this morning, letting me know that I had been quoted in a post by Brian Bailey of Semiconductor Engineering entitled “Taking Power More Seriously”. Bailey provides an excellent high-level overview of the myriad challenges of designing for power. One of those challenges is implementing fine-grained clock gating. As a developer of EDA tools that, among other things, automate clock gating, I felt it timely to dive deeper into the topic.

What is clock gating?

Several factors contribute to a circuit’s power consumption. The logic gates have static or leakage power that is roughly constant as long as a voltage is applied to them, and they have dynamic or switching power resulting from toggling wires. Flip-flops are rather power-hungry, accounting for maybe ~20% of total power. Clocks can consume even more, perhaps ~40%! Global clocks go everywhere, and they toggle twice each cycle. As we’ll see, clock gating avoids toggling the clock when clock pulses are not needed. This reduces the power consumption of clock distribution and flip-flops, and it can even reduce dynamic power for logic gates.

Even in a busy circuit, when you look closer, most of the logic is not doing meaningful work most of the time. In this trace of a WARP-V CPU core, for example, the CPU is executing instructions nearly every cycle. But the logic computing branch targets isn’t busy. It is only needed for branch instructions. And floating-point logic is only needed for floating-point instructions, etc. Most signal values in the trace below are gray, indicating that they aren’t used.

CPU waveform showing clock gating opportunity

As previously noted, a significant portion of overall power is consumed by driving clock signals to flip-flops so the flip-flops can propagate their input values to their outputs for the next cycle of execution. If most of these flip-flop input signals are meaningless, there’s no need to propagate them, and we’re wasting a lot of power.

Clock gating cuts out clock pulses that aren’t needed. (Circuits may also be designed to depend on the absence of a clock pulse, but let’s not confuse matters with that case.) The circuit below shows two clock gating blocks (in blue) that cut out unneeded clock pulses and only pulse the clock when a meaningful computation is being performed.

Illustration of clock gating

In addition to reducing clock distribution and flip-flop power, clock gating also guarantees that flip-flop outputs are not wiggling when there are no clock pulses. This reduces downstream dynamic power consumption. In all, clock gating can save a considerable amount of power relative to an ungated circuit.
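For a rough feel of “considerable”, here is a back-of-envelope estimate using the approximate power breakdown quoted earlier (~40% clock distribution, ~20% flip-flops). The 10% activity factor is a hypothetical assumption, and the model ignores leakage within the gated logic and any downstream dynamic savings, so treat it as a sketch, not a measurement.

```python
# Back-of-envelope clock-gating savings, using the rough breakdown
# quoted above (~40% clock distribution, ~20% flip-flops of total power).
CLOCK_FRACTION = 0.40
FLOP_FRACTION = 0.20
ACTIVITY = 0.10   # hypothetical: fraction of cycles doing meaningful work

# Ideal gating removes clock/flop switching power in the idle cycles:
saved = (CLOCK_FRACTION + FLOP_FRACTION) * (1 - ACTIVITY)
print(f"estimated total-power savings: {saved:.0%}")  # 54%
```

Even with generous error bars on every input, the estimate shows why fine-grained gating is worth pursuing: the clock network and flip-flops dominate, and most of their activity is wasted.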

Implementing Clock Gating

A prerequisite for clock gating is knowing when signals are meaningful and when they are not. This is among the aspects of higher-level awareness inherent in a Transaction-Level Verilog model. The logic of a “transaction” is expressed under the condition that indicates its validity. Since a single condition can apply to all the logic along a path followed by transactions, the overhead of applying validity is minimal.

Validity is not just about clock gating. It helps to separate the wheat from the chaff, so to speak. The earlier CPU waveform, for example, is from a TL-Verilog model. Debugging gets easier as we have automatically filtered out the majority of signal values, having identified them as meaningless. And we know they are meaningless because of automatic checking that ensures that these values are not inadvertently consumed by meaningful computations.

With this awareness in our model, default fine-grained clock gating comes for free. The valid conditions are used by default to enable our clock pulses.

The full implications of having clock gating in place from the start may not be readily apparent. I’ve never been on a project that met its goals for clock gating. We always went to silicon with plenty of opportunity left on the table. This is because power savings is always the last thing to be implemented. Functionality has to come first. Without it, verification can’t make progress, and verification is always the long pole. Logic designers can’t afford to give clock gating any real focus until they have worked through their functional bug backlogs, which doesn’t happen until the end is in sight. At this point, many units have already been successfully implemented without full clock gating. The project is undoubtedly behind schedule, and adding clock gating would necessitate implementation rework including the need to address new timing and area pressure. Worse yet, it would bring with it a whole new flood of functional bugs. As a result, well, let’s just say we’re heating the planet faster than necessary. Getting clock gating into the model from the start, at no cost, completely flips the script.

Conclusion

Power is now a first-order design constraint, and clock gating is an important part of an overall power strategy. Register transfer level modeling does not lend itself to the successful use of clock gating. A transaction-level design can have clock gating in place from the start, having a shift-left effect on the project schedule and resulting in lower-power silicon (and indirectly higher performance and lower area as well). If you are planning to produce competitive silicon, it’s important to have a robust clock-gating methodology in place from the start.

Also Read:

Clock Aging Issues at Sub-10nm Nodes

Analyzing Clocks at 7nm and Smaller Nodes

Methodology to Minimize the Impact of Duty Cycle Distortion in Clock Distribution Networks


Ant Colony Optimization. Innovation in Verification

by Bernard Murphy on 11-28-2022 at 6:00 am

Ant Colony Optimization (ACO) is one possible approach for searching the huge state spaces encountered in model checking. Paul Cunningham (Senior VP/GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and now Silvaco CTO) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Ant Colony Optimization Based Model Checking Extended by Smell-like Pheromone. The authors presented the paper at the Biologically Inspired Information and Communications Technologies 2015 conference. At publication, the authors were at the Tokyo University of Science, NIT, and SRA, all in Japan.

ACO is a method to efficiently find paths through a large state space using swarm intelligence. You might first want to read an introduction to the general ACO technique. The main refinement in this month’s paper adds a pheromone for the food (the goal) to the pheromone trail the ants leave. The paper proposes a method using ACO to validate safety properties (a bad thing doesn’t happen). The method treats the property as a food source. If the ants find a path to the property, that path becomes a counterexample.

ACO has been studied for many optimization domains – the travelling salesman problem, scheduling, device sizing, antenna optimization and more. Google Scholar shows it as a very popular area for study – over 13k results for 2022 alone.

Paul’s view

What an eye-catching title for a paper! I wasn’t aware of ant colony optimization (ACO) as a technique for shortest path finding in a graph. Seeing this paper prompted me to first take a short random walk (no pun intended!) on the internet to learn more. Wikipedia was a good place to start as always.

In nature, ants find an efficient path from their nest to a food source through pheromones. They wander around randomly but with a bias towards walking where other ants have traveled before. Ants taking a shorter route from nest to food will make more trips in a given time interval than ants taking longer routes. Integrated over thousands of ants making hundreds of journeys, shorter paths get more ant traffic, which means more pheromones on those paths. Hence even more ants take those paths, until eventually all ants are marching in a continuous stream along the shortest path from nest to food.

ACO mimics this method algorithmically to find a shortest path through a graph. The probability of an ant following an edge in a graph depends on a combination of a static weight for that edge (e.g. 1/d where d is the distance along that edge) and a “pheromonal” weight that depends on how many other ants walked that same edge recently.

This month’s paper is on using ACO for model checking – to help efficiently search for a shortest path from initial state to error state in the state graph of a circuit. The authors’ key contribution is to augment the classic ACO edge probability function with a third component, a “goal pheromone”, to model ants “smelling” a food source. With each iteration of the ACO algorithm (where ants pick and walk along some edge in the graph), the goal pheromone simultaneously propagates backwards along edges starting from the error states (i.e. the food/goal) of the circuit. The goal pheromone assigns a much higher probability to an edge than a regular ant pheromone, so ants are most likely to walk along any edge they reach that is marked by the goal pheromone.
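The edge-probability mechanics described above can be sketched in a toy implementation. This is not the authors’ code: the graph, the coefficients, the pheromone-update rule, and the strength of the goal pheromone are all simplified assumptions.

```python
import random

# Toy ACO path search with a "goal pheromone", in the spirit of the
# paper's extension. Graph, weights, and all coefficients are made up.
random.seed(0)

graph = {                       # node -> list of (next_node, edge_distance)
    "A": [("B", 1.0), ("C", 3.0)],
    "B": [("D", 1.0)],
    "C": [("E", 1.0)],
    "E": [("D", 1.0)],
    "D": [],                    # the error/goal state ("food")
}
GOAL = "D"

trail = {(u, v): 1.0 for u in graph for v, _ in graph[u]}   # ant pheromone
# Goal pheromone, pre-placed here only on edges entering the goal; it is
# deliberately much stronger than the trail pheromone.
goal_ph = {e: (10.0 if e[1] == GOAL else 0.0) for e in trail}

def pick_edge(node):
    """Pick an outgoing edge with probability proportional to
    (trail pheromone / distance) + goal pheromone."""
    edges = graph[node]
    weights = [trail[(node, v)] / d + goal_ph[(node, v)] for v, d in edges]
    return random.choices(edges, weights=weights)[0]

def walk(start="A", max_steps=10):
    """One ant's bounded walk; successful walks reinforce their trail."""
    node, path = start, [start]
    for _ in range(max_steps):
        if node == GOAL or not graph[node]:
            break
        node, _dist = pick_edge(node)
        path.append(node)
    if path[-1] == GOAL:        # deposit more trail pheromone on shorter paths
        for u, v in zip(path, path[1:]):
            trail[(u, v)] += 1.0 / len(path)
    return path

paths = [walk() for _ in range(200)]
best = min((p for p in paths if p[-1] == GOAL), key=len)
print(best)  # the shortest route found from "A" to the goal
```

Here the `max_steps` bound plays the role of the step limit used by ACO variants for huge graphs, and `goal_ph` stands in for the paper’s goal pheromone; a real model checker would derive the graph from the circuit’s state space rather than hard-code it.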

It’s a neat idea, and the paper is well written and easy to understand. However, the authors acknowledge that in any complex circuit the path depth from initial to error state is deep. So tracing backwards from the error states doesn’t particularly help. The search space diverges long before any ant can “smell” the food. Also, in the BDD/SAT-based world there are many other symbolic methods to narrow the search space in model checking. These would need to be benchmarked with ACO before a compelling commercial case for using it could be made.

Raúl’s view

In this 2015 paper, Kumazawa et al. present an extension to Ant Colony Optimization (ACO) for model checking. The state of the art then was ACOhg – ACO for huge graphs. ACOhg differs from ACO in limiting the number of steps an ant can take to λc to reduce time and memory consumption. This limit may prevent ants from finding a final state that violates the property, so it must be tuned carefully. The extension in this paper, EACOhg (Extended ACOhg), introduces an additional goal pheromone that mimics the smell of food in the real world and is stronger than the normal ant pheromone. The goal pheromone is put on transition edges once a final state is reached, and these edges are selected in preference to the edges with the normal pheromone.

Experimental results on three examples with 3,843, 31,747, and 266,218 states respectively are compared with ACOhg. They show a reduction in execution time of 10-45%, with a slight increase in memory consumption of up to 5% (to hold the goal pheromones). Most importantly, they show a reduction in the length of counterexamples (the path length to a violation of the property being verified) of 15-70%. Since gains are smallest in the largest model, the authors hypothesize that “the reason may be that the effect of smell-like pheromone is compromised in a large model” and plan to overcome that in future work.

The paper is largely self-contained and is an easy and enjoyable read. Results are meaningfully better than the prior state of the art, but the method is heuristic and requires some tuning. The authors list nine coefficients which they deem to be “the best settings we decided through preliminary experiments”. They do not discuss the length of these experiments, and the usefulness of these coefficients for a different set of problems is not a given. Overall, a good introduction to ACO. I’m not aware of EDA solutions that use ACO in practice, but “the use of ACO for the solution of dynamic, multiobjective, stochastic, continuous and mixed-variable optimization problems is a current hot topic, as well as the creation of parallel implementations capable of taking advantage of the new available parallel hardware” [scholarpedia.org], giving about 10,000 hits for 2022 alone in Google Scholar.


A Crash Course in the Future of Technology

by Vivek Wadhwa on 11-27-2022 at 2:00 pm

One of the harshest lessons we learned during the recent pandemic is the power of exponentials. As human beings, we are linear thinkers and can’t fathom how doublings of viruses — or technologies — can be destructive and disrupt everything. In my university classes and talks to business executives, I have always had to explain how information technologies of all kinds double their power, price performance, capacity, and bandwidth every year or two, and how these advances always catch us by surprise. Now almost everyone on the planet knows what exponential means.

But it isn’t all bad. The exponential advances in technology and their convergences are making it possible to solve some of the grand challenges of humanity: problems such as disease, hunger, lack of clean water, energy shortages, and poor education. Amazing things are now possible.

In a talk to some of Silicon Valley’s brightest engineering minds, semiconductor designers at the launch of Cadence’s Certus Closure Solution, I discussed what we can expect in the next decade thanks to the exponential technology advances they are enabling.

This future includes things we long dreamed about, such as humanoid robots (like Rosie from the Jetsons); bionic upgrades (Steve Austin, the Six Million Dollar Man); and flying cars (or drones). Believe it or not, we may also soon be able to cure practically every disease, including cancer; transition into an era of unlimited clean and almost free energy; produce all the meat we need without killing animals; and launch the next green revolution, one that does not require dangerous pesticides and climate-destroying fertilizers. We can literally create the future of Star Trek, the one in which humankind has solved its key problems and focuses on exploring the stars, seeking out new worlds, and building more knowledge and wisdom.

But there is also a dark side, because the same technologies that can be used for good can do evil, and we could end up in the dystopian future of Mad Max. So, in my talk, I also highlighted the dangers of these technologies, including the jobless future and creation of superhumans.

By the way, I covered all this in twenty minutes!

I encourage you to watch the video below and think about how you can help take us into the Star Trek future:

If, after watching the video, you want to learn more, I have a recommendation for two books I’ve written that you may want to read. The first, The Driver in the Driverless Car: How Your Technology Choices Create the Future, explains what the technologies are that are making all this possible as well as their dangers. The second book, From Incremental to Exponential: How Large Companies Can See the Future and Rethink Innovation, will be useful when you are ready to make the leap into the future and help your company transform itself — or to build a world-changing business yourself.

You really can make an impact if you try; the key technologies you need are now relatively inexpensive and available worldwide. Anyone, anywhere, can now contribute to global innovation and solve big problems.

Also Read:

The Metaverse: Myths and Facts

Facebook or Meta: Change the Head Coach

Thick Data vs. Big Data


Podcast EP126: Unifying RF and Optics with POET’s Optical Interposer Platform

by Daniel Nenni on 11-25-2022 at 10:00 am

Dan is joined by Dr. Suresh Venkatesan, chairman and CEO, POET Technologies. Suresh joined POET Technologies from GLOBALFOUNDRIES where he served as senior vice president, Technology Development. He is an industry veteran with over 22 years of experience in semiconductor technology development. Prior to joining GLOBALFOUNDRIES, he held various leadership positions with Freescale Semiconductor in Austin, Texas. Dr. Venkatesan holds over 25 US patents and has co-authored over 50 technical papers.

In this podcast, Suresh provides an overview of POET’s unique Optical Interposer technology. He discusses how it delivers a packaging platform that unifies RF and optical interconnect. Applications at 100/200G, 600G and 1.6TB are discussed.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


TSMC OIP – Enabling System Innovation

by Daniel Payne on 11-25-2022 at 6:00 am


On November 10th I watched the presentation by L.C. Lu, TSMC Fellow & VP, as he talked about enabling system innovation with dozens of slides in just 26 minutes. TSMC is the number one semiconductor foundry in the world, and their Open Innovation Platform (OIP) events are popular and well attended as the process technology and IP offered are quite compelling to many semiconductor design segments. The TSMC technology roadmap showed a timeline of both FinFET and Nanosheet plans out through 2025.

Starting with N3 there’s something new called FinFlex, which uses Design Technology Co-Optimization (DTCO) and promises improved Power, Performance and Area (PPA) for segments ranging from energy-efficient to high-performance. With the FinFlex approach a designer can choose from three transistor configurations, based on their design goals:

  • 3-2 fin, for high performance
  • 2-2 fin, for efficient performance
  • 2-1 fin, for lowest power and best density

The history of fin block choices used in process nodes N16 through N3 is shown below:

EDA vendors Synopsys, Cadence, Siemens EDA and Ansys have updated their tools to support FinFlex, and within a single SoC you can even mix the fin block choices: high-fin cells along timing-critical paths, low-fin cells on non-critical paths. As an example of process scaling benefits, Lu showed an ARM Cortex-A72 CPU implemented in N7 with 2 fin, N5 with 2 fin, and finally N3E with 2-1 fin:

IP cells for N3E come from several vendors: TSMC, Synopsys, Silicon Creations, Analog Bits, eMemory, Cadence, Alphawave, GUC and Credo. There are three stages of IP readiness: silicon report ready, pre-silicon design kit ready, and in development.

Analog IP

TSMC’s analog IP uses a more structured, regular layout, which produces higher yield and lets EDA tools automate the analog flow to improve productivity. The TSMC Analog Cell has uniform poly and oxide density, helping with yield. The analog migration flow, automatic transistor sizing and matching-driven place and route enable design flow automation with Cadence and Synopsys tools.

Analog cells can be migrated through four steps: schematic migration, circuit optimization, auto placement and auto routing. As an example, migrating a VCO cell from N4 to N3E using the analog migration flow took 20 days, versus a manual approach requiring 50 days, some 2.5X faster.
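As a quick sanity check on that speedup figure, the ratio of the reported manual and automated migration times can be worked out directly (a trivial illustration of the arithmetic, not TSMC tooling):

```python
# Reported VCO migration times, N4 -> N3E (from the presentation).
manual_days = 50      # manual migration approach
automated_days = 20   # TSMC analog migration flow

# Speedup is simply the ratio of the two durations.
speedup = manual_days / automated_days
print(f"{speedup:.1f}X faster")  # prints "2.5X faster"
```

The 50/20 ratio confirms the "some 2.5X faster" claim in the talk.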

3DFabric

TSMC has three types of packaging to consider, with eight packaging choices available within 3DFabric:

A recent example using SoIC packaging was the AMD EPYC Processor, a data center CPU, which showed a 200X interconnect density improvement over 2D packaging, a 15X density improvement over traditional 3D stacking, and 50-80% better CPU performance.

3D IC design complexity is addressed through 3Dblox, a methodology using a generic language for EDA tool interoperability, covering the physical architecture and logic connectivity. The top four EDA vendors (Synopsys, Cadence, Siemens, Ansys) have readied their tools for the 3Dblox approach by completing a series of five test cases: CoWoS-S, InFO-3D, SoIC, CoWoS-L 1, CoWoS-L 2.

TSMC has created a 3DFabric Alliance by collaborating with vendors across the realms of IP, EDA, Design Center Alliance (DCA), Cloud, Value Chain Alliance (VCA), Memory, OSAT, Substrate and Testing. For memory integration TSMC partners with Micron, Samsung Memory and SK hynix to enable CoWoS and HBM integration. EDA test vendors include Cadence, Siemens EDA and Synopsys; IC test vendors include Advantest and Teradyne.

Summary

Semiconductor design companies like AMD, AWS and NVIDIA are using the 3DFabric Alliance, and that number will only increase over time as the push toward 2D, 2.5D and 3D packaging attracts more product ideas. TSMC has a world-class engineering team working on DTCO, with enough international competition to keep them constantly innovating for new business. Market segments for digital, analog and automotive will benefit from the TSMC technology roadmap choices announced in FinFlex. 3D chip design is supported by the teamwork gathered in the 3DFabric Alliance.

Related Blogs


Mobility is Dead; Long Live Mobility

by Roger C. Lanctot on 11-24-2022 at 10:00 am


What the hell is going on in the automotive industry? Every automotive executive is talking mobility, mobility, mobility while simultaneously divesting every mobility asset amassed over the past 10 years of surging mobility mania.

The latest spinoff of a mobility asset was Volkswagen’s sale of its WeShare car sharing operation to Berlin-based Miles Mobility. The move was a bit of a shock given Volkswagen’s build-up of autonomous vehicle technology with its Ford-Argo.ai joint venture; its Moia demand-responsive transit solution; and even its acquisition of Europcar car rental operations.

According to the terms of the MILES deal:

  • Volkswagen WeShare has been acquired by MILES Mobility and will be integrated into the car sharing company’s portfolio
  • WeShare customers will “benefit” from offerings in eight German cities
  • Volkswagen will deliver 10,000 all-electric vehicles to MILES from 2023
  • MILES Mobility will be integrated into Volkswagen’s mobility platform

Volkswagen looked ready to bring its unique brand of mobility to the world on a scale comparable to its world-leading vehicle sales. But, no: Volkswagen stepped back from its Argo.ai self-driving venture with Ford Motor Company, and now WeShare has been banished, leaving Moia, Europcar and its ambitious mobility visions in tatters.

To be fair, VW’s WeShare was up against both MILES, an aggressively expanding upstart, and SIXT, a global car rental leader with overwhelming resources, infrastructure, and marketing muscle. Maybe it’s not so much of a shock that VW folded its mobility tent. An important wrinkle, of course, is that VW was up against B2C competitors with greater expertise and ability to communicate and interact directly with consumers – not a strength at VW, a traditional car-making B2B operator selling cars through dealers.

Volkswagen is not alone. The company has simply joined the conga line of car companies shimmying their way out of mobility, from BMW and Mercedes-Benz selling Share Now to Stellantis, to GM shutting down its Maven car sharing service. (The Maven brand lives on, kind of.)

But the list of mobility ventures shuttered or sold off is long – maybe too long to list here. One of the most recent was Ford’s off-loading of its TransLoc demand-responsive transit operation to Modaxo earlier this year. TransLoc was a large, nationwide operation managing hundreds of millions of rides for hundreds of operators – but Ford turned in a different direction.

Ford has been a leader in abandoning mobility acquisitions, startups, and trial programs. Nothing has seemed to stick – which has been the way at most auto makers.

The post-mortems on these ventures usually reveal a reassessment by bean counters proclaiming the obvious – that the operations were unprofitable. The pulling of the plug comes next.

And then there were four – or maybe more. Renault (Mobilize), Stellantis (Free2Move), Hyundai (Mocean) and Toyota (Kinto) remain committed to the mobility mantra, with car sharing, ride hailing, rental and subscription-based mobility operations in various locations around the world.

Chinese car makers and their partners, perhaps reflecting China’s car sharing leadership, have been stepping into the mobility void with car sharing operations of their own and subscription-based vehicle access offerings. Nio Motors and Lynk & Co. are the most prominent here, but they have company.

Subscriptions are rapidly emerging as the go-to solution, displacing “mobility” as the dominant modality. Here, too, car makers have launched and crashed multiple subscription-based solutions, but the concept has caught on with third parties like Autonomy and FINN and dozens of others.

The automobile subscription appears to be an idea whose time has come. Cars are in short supply, interest rates are on the rise, and the workplace is mobile – with multiple rounds of layoffs hitting the news.

Subscriptions eliminate tortuous vehicle acquisition paperwork at the dealership, tend to sidestep credit checks and, most obviously, avoid the long-term commitment to a particular automobile. Speed and simplicity are driving this trend.

The even more fundamental attraction derives from the need among car makers to capture vehicle revenue on an ongoing basis – along with the need to manage battery reverse logistics at the vehicle’s end of use. Car subscription terms and conditions still may look onerous to some – too expensive or restrictive. But car makers and third parties are steadily reducing their rates, easing their terms, and expanding the range of available vehicles.

Strategy Analytics has identified 46 individual car subscription operators including dozens of startups. Many of these operators are focusing on electric vehicles – the price tags of which have placed them beyond the means of many.

Current economic conditions appear to favor car subscriptions as they simultaneously discourage vehicle acquisitions. Subscriptions are clearly a bright light in a bleak mobility landscape.

Also Read:

Requiem for a Self-Driving Prophet

Musk: The Post-Truth Messiah

Flash Memory Market Ushered in Fierce Competition with the Digitalization of Electric Vehicles


AMAT and Semitool Deja Vu all over again

by Robert Maire on 11-24-2022 at 6:00 am


-Lam/Semisysco is a repeat of AMAT/Semitool
-Adds wet processing that larger companies lack
-Semisysco tools amazingly similar to those made by Semitool
-Lightning strikes twice for Ray Thompson

Deja Vu all over again

In a clear example of history repeating itself, Lam Research has bought Semisysco, a company Ray Thompson founded in 2012, three years after he sold his first company, Semitool, to Applied Materials for $364M in 2009.

The tools made by Semisysco look amazingly similar to – although are updated versions of – the wet processing tools made by Semitool back in 2009. These tools include an updated version of the SRD (spin rinser dryer) that has been around for decades, as well as batch and single-wafer wet processing tools.

Wet processing is always the overlooked stepchild

Wet cleaning and processing have always played second fiddle to plasma-based dep and etch tools, which are the rock stars of the front-end process after the lead singer of litho. Tokyo Electron makes track tools, which spin on photoresist, but cleaning and plating are different. Semitool was also first into copper plating years ago when copper took over from aluminum. Wet processing still plays a critical role, but the large companies seem to view it as lesser technology and thus do not focus on it. That lack of focus gave smart entrepreneurs like Ray Thompson an opportunity to take up the slack.

Ray Thompson does it again

We didn’t think that Ray Thompson would sit around and enjoy the coffee shop (Sykes) he bought in Kalispell, Montana after selling Semitool to Applied in 2009. He founded TG Tech and through it also founded Semisysco in 2012, along with Herbert Oetzlinger, a long-time Semitool employee and a close member of the Semitool “family”.

Semisysco adds to prior purchase of SEZ

Lam had purchased wet processing company SEZ of Villach, Austria in 2007 for $447M. Coincidentally, Semisysco is located in both Kalispell, Montana (Ray’s home town) and Villach, Austria. It’s funny how coincidences go…

More coincidences

In another strange family twist/coincidence, David Lam (not the David Lam who founded Lam Research, but close) is on the board of Semisysco representing a VC firm, Atlantic Bridge, which is a shareholder in Semisysco.

Easy acquisition for Lam

This is a very easy, convenient acquisition for Lam, as it can slide into the former SEZ group. It gives them more capabilities in various processes, but it is not much of a needle mover in terms of overall revenue.

The stocks

We remain cautious/negative on the stocks and feel the run-up of the entire group off of ASML’s news was unfounded, as that news was unique to ASML and not an indicator that the downcycle is anywhere near over. While the downcycle will not be as bad as downcycles past, it is nonetheless going to take revenues down about 20% or so in 2024, so it should not be so easily dismissed as over and done with.

In our channel checks it is abundantly clear that orders are being both canceled as well as delayed and equipment makers are quickly rearranging the order book.

Semiconductor makers, with the clear exception of TSMC, are also negatively impacted.

GlobalFoundries announced a restructuring and layoffs, as it’s clear they will see a downturn in business as customers flow back to their preferred fab, TSMC.

In our view, TSMC and ASML are the most immune to the downturn. TSMC will keep its fabs full by taking back overflow customers who had to go to second- and third-tier foundries when TSMC turned them away.

ASML has an order book so long and strong that it can’t increase production fast enough due to the Zeiss limitation. Other companies, not so much… Lam, AMAT, TEL and even KLAC will see weaker business over time, especially into 2024.

The dead-cat bounce in the stocks stems from the belief that the worst news is over (which it may be), but numbers still have to come down, as the flow of news over the next year will remain generally negative.

About Semiconductor Advisors LLC‌

Semiconductor Advisors is an RIA (a Registered Investment Advisor), specializing in technology companies with particular emphasis on semiconductor and semiconductor equipment companies. We have been covering the space longer and been involved with more transactions than any other financial professional in the space. We provide research, consulting and advisory services on strategic and financial matters to both industry participants and investors. We offer expert, intelligent, balanced research and advice. Our opinions are very direct and honest and offer an unbiased view compared to other sources.

Also Read:

KLAC- Strong QTR and Guide but Backlog mutes China and Economic Impact

LRCX down from here – 2023 down more than 20% due to China and Downcycle

Is ASML Immune from China Impact?