3D IC – Managing the System-level Netlist
by Daniel Payne on 09-27-2022 at 10:00 am

I just did a Google search for “3D IC”, and was stunned to see it return a whopping 476,000 results. This topic is trending, because more companies are using advanced IC packaging to meet their requirements, and yet the engineers doing the 3D IC design have new challenges to overcome. One of those challenges is creating a system-level netlist so that 3D netlist verification tools can be run to ensure that there are no connectivity errors.

Here’s a cross-section of a 2.5D IC with chiplets – multiple HBM stacks and an SoC – on a silicon interposer mounted on an organic substrate. The connectivity of this system could be captured in a Verilog netlist format, or even a CDL/SPICE format.

2.5D IC with memory and SoC

Stacking chips in 3D face-to-face is another advanced packaging method.

3D IC

Chip engineers and package engineers often use different tools and flows to solve issues like connectivity. Ideally, there would be a system-level connectivity flow that understands both the chip and package domains.

Siemens EDA is a vendor whose tools span both the IC and packaging realms, and its connectivity product is called Xpedition Substrate Integrator (xSI). With the xSI tool an engineer can import multiple die, interposer, package and PCB abstracts, then build a system-level model of the connectivity. After a system-level netlist has been exported from xSI, it’s ready to be used by an LVS tool like Calibre.

Running Calibre in netlist-versus-netlist mode is a way to check that the system-level netlist from xSI matches each chip netlist. The xSI tool has a wizard GUI to help you create a Calibre 3DSTACK netlist and run control.

xSI wizard for netlist vs netlist

The Calibre runset takes care of netlist conversions, die name mapping between IC and package, and any desired Calibre options. A clean report means that xSI was used properly to build the system connectivity.
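
To make the netlist-versus-netlist idea concrete, here is a minimal Python sketch of a net-by-net connectivity comparison. It is purely illustrative – the net and pin names are invented, and this is not how Calibre 3DSTACK is implemented.

```python
# Illustrative only: a toy net-by-net connectivity check, not Calibre 3DSTACK.
# Each netlist is modeled as {net_name: [pins connected to that net]}.
def compare_netlists(system_nets, chip_nets, name_map=None):
    """Report nets whose connectivity differs between the two descriptions.

    name_map optionally translates a system-level net name to the chip-level
    name, for cases where the two teams use different naming conventions.
    """
    name_map = name_map or {}
    mismatches = {}
    for sys_net, sys_pins in system_nets.items():
        chip_net = name_map.get(sys_net, sys_net)
        chip_pins = chip_nets.get(chip_net)
        if chip_pins is None:
            mismatches[sys_net] = "missing in chip netlist"
        elif set(sys_pins) != set(chip_pins):
            mismatches[sys_net] = sorted(set(sys_pins) ^ set(chip_pins))
    return mismatches

system = {"TEST_CLK": ["SOC.CK_IN", "HBM0.CK"]}
chip = {"TEST_CLK": ["SOC.CK_IN", "HBM0.CK"]}
print(compare_netlists(system, chip))  # {} means a "clean" comparison
```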

For 3D-IC designs the silicon interposer could be described in CDL or Verilog format, but the organic substrate is designed by the packaging group using CSV or ODB++ format. Designers may also need to intentionally short or open certain signals, which would otherwise show up as LVS comparison errors.

For a multi-substrate 3D-IC design using a silicon interposer plus an organic substrate, the package team could use one name for a net while the interposer team uses a different name for the same net. With xSI there’s a way to make the connection between two different net names: it’s called an interface part.

As an example, the following interposer has a net TEST_CLK, which is connected to the package substrate net pkg_TEST_CLK. The interface part allows these two differently named nets to be connected, so that running Calibre 3DSTACK produces no false LVS errors.

Interface part in xSI

Sometimes in a 3D-IC assembly you need to short unneeded signals to ground, or even short two power planes together, but these nets are not connected in the system netlist. While creating the source netlist for Calibre 3DSTACK you can create a shorts list with the net mapping feature.
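
Below is a rough sketch of how a net-name map and an intentional-shorts list might be captured before generating the comparison netlist. The net names are hypothetical and the data structures only illustrate the idea of mapping names and waiving shorts that exist by design; they do not reflect the xSI file formats.

```python
# Hypothetical data: package-to-interposer net-name mapping (the "interface
# part" idea) plus a list of shorts that are intentional in this assembly.
net_name_map = {
    "pkg_TEST_CLK": "TEST_CLK",   # package substrate name -> interposer name
}

intentional_shorts = {
    frozenset({"VDD_CORE", "VDD_SOC"}),   # two power planes tied together
    frozenset({"SPARE_IO3", "VSS"}),      # unused signal tied to ground
}

def is_waived(net_a, net_b):
    """True if a reported short between net_a and net_b is there by design."""
    return frozenset({net_a, net_b}) in intentional_shorts

print(net_name_map.get("pkg_TEST_CLK"))   # TEST_CLK
print(is_waived("VSS", "SPARE_IO3"))      # True -> not a real LVS error
```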

Summary

3D netlists present challenges to the IC and package design process, so Siemens EDA has come up with a tool flow using xSI and the Calibre tools. The system-level netlist is validated by running a netlist-versus-netlist comparison. Intentional opens and shorts can be accounted for and waived by design, and even different net names between the package and interposer design teams are supported with this xSI and Calibre flow.

The complete nine-page white paper is online here.

Related Blogs


Arm 2022 Neoverse Update, Roadmap

Arm 2022 Neoverse Update, Roadmap
by Bernard Murphy on 09-27-2022 at 6:00 am

Arm recently provided their annual update on the Neoverse product line, targeting infrastructure from cloud to communication to the edge. Chris Bergey (SVP and GM for infrastructure) led the update, starting with a shock-and-awe pitch on Neoverse deployment. He played up that Arm-based servers are now in every major public cloud across the world. AWS of course, Google Cloud, Microsoft Azure, Alibaba and Oracle all support Arm-based instances. In 5G RAN, Dell, Marvell, Qualcomm, Rakuten and HPE announced partnerships, joining Nokia, Lenovo, Samsung and more in this space. NVIDIA announced their Grace server CPU and HPE their ProLiant servers, also Arm-based. Like I said – shock and awe.

Perspectives from cloud builders, cloud developers

The cloud/datacenter backbone was at one time purely dependent on x86-based servers. Those servers continue to play an important role, but clouds must now support a rapidly expanding diversity of workloads. CPU types have fragmented into x86-based versus Arm-based. GPUs are more common, for video processing support, gaming in the cloud and AI training. Specialized AI platforms have emerged, like the Google TPUs. Warm storage depends on intelligent access to SSD, through Arm-based interfaces. Software-defined networking interfaces are Arm-based. DPUs – data processing units – are a thing now, a descriptor for many of these data-centric processors. All of these application-specific platforms for the datacenter are building on SystemReady® qualified Arm platforms.

Microsoft Azure made an important point, that the cloud game is now about total throughput at lowest operational cost, not just about highest performance. Power is a particularly important factor; even today power-related costs contribute as much as 40% of TCO in a datacenter. Mitigating this cost must touch all components within the center: compute instances, storage, AI, graphics, networking, everything. The Azure product VP stressed that Arm is working with them on a holistic view of TCO, helping them to define best solutions across the center. I assume Arm have similar programs with other cloud providers, shifting up to become a solutions partner to these hyperscalers.

Arm enables cloud independence

A developer advocate at Honeycomb (which builds an analysis tool for distributed services) made another interesting point: the ubiquity of Arm-based instances in the major clouds provides cloud independence for developers. Of course x86 platforms offer the same independence. I think the point here is that Arm has eliminated a negative through availability on a wide range of clouds. Honeycomb also incidentally highlights the cost and sustainability advantages; Arm is calling this the carbon-intelligent cloud. Young development teams like both of course, but they also have an eye to likely growing advantages to their businesses in deploying on more sustainable platforms.

Product update

As a reminder the Neoverse family breaks down into three classes. The V-series offers highest performance per thread – the most important factor for scale-up workloads, such as scientific compute. The N-series is designed to provide highest performance per socket – the most important factor for scale-out workloads, good for (I’m guessing) massive MIMO basebands. The E-series is designed for efficient throughput in edge to cloud applications; think of a power over ethernet application for example.

The newest V-series platform is the V2, code-named Demeter. This offers improved integer performance, a private L2 cache to handle larger working datasets, and expanded vector processing and ML capability. The platform now supports up to 512MB of system level cache, a coherent mesh network with up to 4TB/s of throughput (!), and CXL for chiplet support, enabling 2.5D/3D coherent designs. Nvidia Grace is built on the V2 platform, which is interesting because Grace is one half of the Grace Hopper platform, in which Hopper is an advanced GPU.

In N-series, they plan an “N-series next” platform release next year with further improved performance per watt. They also have an E-series E2 update, and an “E-series-next” release planned next year. Not a lot of detail here.

About the competition

Seems clear to me that when Arm is thinking about competition these days, they are not looking over their shoulders (RISC-V). They are looking ahead at x86 platforms. For example, Arm compares performance on popular database applications between Graviton2 (AWS) and Xeon-based instances, measuring MongoDB running 117% faster than Intel. They also measured an 80% advantage over Intel in running BERT, a leading natural language processing platform.

I’m sure Arm is also taking steps to defend against other embedded platforms, but the Neoverse focus is clearly forward, not back. You can read more HERE.


UCIe Specification Streamlines Multi-Die System Design with Chiplets
by Dave Bursky on 09-26-2022 at 10:00 am

The UCIe protocol stack

Over the last few years, the design of application-specific ICs as well as high-performance CPUs and other complex ICs has hit a proverbial wall. This wall is built from several issues: first, chip sizes have grown so large that they can fill the entire mask reticle, and that could limit future growth. Second, the large chip size impacts manufacturing yield, often causing diminishing returns for the largest chips. Third, power consumption for the large monolithic chips has also reached critical levels and must be reduced to avoid thermal issues. And fourth, the need to mix different technologies with the advanced processes used for the digital core—non-volatile memories, analog and RF functions, high voltage drivers, and high-speed serial interfaces—can limit what designers can integrate on a single chip due to process incompatibilities.

To deal with these challenges, designers have started to disaggregate their chip designs by splitting the large chips into smaller dies that are now referred to as chiplets. However, therein resides another problem – the lack of standardization regarding chiplet sizes, interfaces, and communication protocols. That, in turn, limits design flexibility and the ability to mix and match chiplets from multiple suppliers. Aiming to solve some of those issues, the recently introduced Universal Chiplet Interconnect Express (UCIe) specification goes a long way towards easing the designer’s job of crafting customizable package-level integration of multi-die systems, explains Manuel Mota, Product Marketing Manager in the Synopsys Solutions Group. It has the backing needed to make the marketplace for disaggregated dies truly vibrant—one with plug-and-play-like flexibility and interoperability.

Mota expects that the specification will help establish a robust ecosystem for a new era of SoC innovation. In addition to supporting chiplets fabricated on different process nodes, each optimized for its particular function, a multi-die architecture also allows integration of dies from digital, analog, or high-frequency processes. Designers can also incorporate three-dimensional high-density memory arrays, such as high-bandwidth memory (HBM) chip stacks, into 2D, 2.5D, or 3D packaging configurations.

Although the UCIe specification is fairly new, several standards prior to UCIe address the challenges of multi-die systems, though mostly from the physical design aspects of multi-die system design. The OIF Extra Short Reach (XSR), Open Compute Project Bunch of Wires (BOW) and OpenHBI (OHBI), and CHIPS Alliance Advanced Interface Bus (AIB) are the alliances and standards for 2D and 2.5D package types. These standards provide bandwidth versus power tradeoffs with a primary focus on providing transport connectivity between chiplets.

UCIe is the only specification that defines a complete stack for the die-to-die interface. The other standards focus only on specific layers and, unlike UCIe, do not offer a comprehensive specification for the complete die-to-die protocol stack. As Mota explains, Synopsys looks forward to future contributions to the UCIe specification and, along with the promoting members AMD, Arm, ASE, Alibaba Group, Google Cloud, Intel, Meta, Microsoft, NVIDIA, Qualcomm, Samsung, and TSMC, is looking to actively help promote a healthy ecosystem for UCIe.

Not only does UCIe accommodate the bulk of designs today, from 8 Gbps to 16 Gbps per pin, but it also accommodates designs at 32 Gbps per pin for high-bandwidth applications from networking to hyperscale data centers. UCIe comprises two package variants: UCIe for advanced packages, such as silicon interposer, silicon bridge, or redistribution layer (RDL) fanout; and UCIe for standard packages, such as organic substrate or laminate.
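
As a back-of-the-envelope illustration of what those per-pin rates mean at the module level, the sketch below converts a per-pin data rate and a lane count into raw one-direction bandwidth. The lane counts in the example calls are assumptions chosen for illustration, not values quoted from the specification.

```python
# Rough, illustrative arithmetic only: raw one-direction bandwidth of a
# die-to-die module, ignoring protocol and link-layer overheads.
def raw_bandwidth_gbytes_per_s(gbps_per_pin, data_lanes):
    return gbps_per_pin * data_lanes / 8.0   # 8 bits per byte

# Assumed module widths for illustration:
print(raw_bandwidth_gbytes_per_s(16, 64))  # 128.0 GB/s for a wide module
print(raw_bandwidth_gbytes_per_s(32, 16))  # 64.0 GB/s for a narrow module
```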

The UCIe stack consists of three layers. The top Protocol Layer ensures maximum efficiency and reduced latency through a flow-control-unit-based (FLIT-based) protocol implementation, supporting the most popular protocols, including PCI Express® (PCIe®), Compute Express Link (CXL), and/or user-defined streaming protocols. The second layer is where protocols are arbitrated and negotiated and where link management occurs, through a die-to-die adapter. The third layer, the PHY, specifies the electrical interface with the package media. This is where the electrical analog front end (AFE), transmitter and receiver, and sideband channel allow parameter exchange and negotiation between two dies. The logic PHY implements link initialization, training and calibration algorithms, and test-and-repair functionality (see the figure).

Whether the primary goal is high energy efficiency, high edge-usage efficiency, low latency, or all of the above, the UCIe specification has very competitive performance targets. To help you in your journey of adoption, Synopsys offers a complete UCIe solution, allowing designers to put the specification into practice with PHY, controller, and verification IP (VIP).

The PHY interface supports both standard and advanced packaging options and is available in advanced FinFET processes for high-bandwidth, low-power, and low-latency die-to-die connectivity. The controller IP supports PCIe, CXL, and other widely used protocols for latency-optimized network-on-chip (NoC)-to-NoC links with streaming protocols; for example, bridging to CXS and AXI interfaces. Lastly, the Synopsys Verification IP (VIP) for UCIe supports various designs under test (DUT) at each layer of the full stack. The VIP includes testbench interfaces with or without the PCIe/CXL protocol stack, an Application Programming Interface (API) for sideband service requests, and an API for traffic generation. Protocol checks and functional coverage are provided at each stack layer and signaling interface, enabling a scalable architecture and Synopsys-defined interoperability test suites.

The Synopsys solution enables robust and reliable die-to-die links with testability features for known good dies and CRC or parity checks for error detection. It enables designers to build seamless interconnects between dies for the lowest latency and highest energy efficiency. With multi-die system designs, the increase in payloads due to multiple streaming protocols could make simulations take days or even months, limiting their usefulness.

To verify a multi-die system, designers can first create various single-node and multi-node models, simulating these minimalistic systems to check the integrity of data. Once those scenarios are tested, designers can then test higher-level system scenarios with multi-protocol layers using the Synopsys ZeBu® emulation system, and then move to prototyping with the Synopsys HAPS® prototyping system. This flow from models to simulation to emulation to prototyping, using Synopsys verification IP and other protocol verification solutions, will help ensure seamless interoperability pre-silicon.

Multi-die system design is a great option to catapult systems beyond the limitations of Moore’s law. With it, designers can realize new levels of efficiencies and performance while reducing power and area footprints. UCIe is helping to fast track this new way of designing for advanced applications. To learn more about how UCIe facilitates multi-die system designs, check out the Synopsys article, Multi-Die SoCs Gaining Strength with Introduction of UCIe.

For a list of UCIe compatible verification IP products, go to https://www.synopsys.com/verification/verification-ip.html, and for UCIe IP, go to https://www.synopsys.com/dw/ipdir.php?ds=dwc_ucie_ip.

Also Read:

Methodology to Minimize the Impact of Duty Cycle Distortion in Clock Distribution Networks

Methodology to Minimize the Impact of Duty Cycle Distortion in Clock Distribution Networks
by Kalar Rajendiran on 09-26-2022 at 6:00 am

Synchronous circuits dominate the electronic world because clocking eases the design of circuits compared to asynchronous approaches. At the same time, clocking also introduces its share of challenges to overcome. No wonder a tremendous amount of time and effort has been spent over the years on developing and implementing various types of clock distribution networks. A lot of time has also been spent on analyzing and addressing clock jitter due to power supply noise. And at the design level, a lot of thought goes into choosing the clock duty cycle when designing a circuit.

In terms of accuracy, SPICE simulations have always been held as the gold standard. But SPICE simulations are compute-time intensive and typically run on just small portions of a design. Instead, gate-level simulation was used as the default signoff tool for chips until the turn of the 21st century. This worked well as most of the designs then were not very large or complex and the process nodes in use were 250nm or larger. As process nodes advanced and design size and complexity started growing, gate-level simulation as a signoff tool started getting strained. Static timing analysis (STA) took over as the default signoff tool and has worked well for the last two decades. But today’s advanced-process-based designs are facing chip-signoff challenges due to limitations of STA and duty cycle distortions (DCD). While the intrinsic limitations of STA were always present, they did not pose practical issues when it came to signoff on less advanced process nodes. And while duty cycle distortions go hand in hand with clocking, they were either corrected with a DCD correcting circuit or were not serious enough to impact the proper functioning of a design. But no longer.

We’re entering an era where STA needs to be augmented to address DCD and increase verification coverage for high confidence at chip signoff. Wouldn’t it be great if overnight simulation runs on multi-million-gate designs could deliver SPICE-level accurate results? Infinisim has published a whitepaper that explains how their analysis tools and methodology can deliver all of the above. This blog covers the salient points from that whitepaper.

Duty Cycle Distortion (DCD)

Duty cycle distortion (DCD) is a propagation delay difference between low-to-high and high-to-low transitions of the clock and is typically expressed in percent. At 7nm and below, deep clock distributions are prone to DCD accumulation as the signal propagates through the levels of the distribution. With millions of gates on a single clock domain, even a picosecond of DCD per gate will add up to significant distortion at the end points. While DCD results from manufacturing process variations, marginal designs and electrical noise, it gets worse with transistor aging. Traditionally, duty cycle correcting circuits have been added to designs to remedy the problem.
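
A toy calculation shows how quickly small per-stage mismatches add up across a deep clock tree. The clock period, number of buffer levels and per-stage rise/fall mismatch below are illustrative assumptions, not measured data.

```python
# Toy model: per-stage duty cycle distortion accumulating down a clock tree.
def duty_cycle_after(levels, period_ps, dcd_per_stage_ps, nominal=0.5):
    """Duty cycle at the end of a buffer chain where each stage stretches the
    high phase by dcd_per_stage_ps (a rise/fall delay mismatch)."""
    high_time_ps = nominal * period_ps + levels * dcd_per_stage_ps
    return high_time_ps / period_ps

# 1 GHz clock (1000 ps period), 20 buffer levels, 1 ps mismatch per stage
print(f"{duty_cycle_after(20, 1000, 1.0):.1%}")  # 52.0% instead of 50.0%
```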

Duty Cycle Correcting Circuit

Duty cycle corrector circuits work by adding or removing delay from the rising or falling transition until the expected duty cycle is reached. While duty cycle corrector circuits may help reduce DCD, they add complexity to the clock design of today’s already complex chips. With time-to-market pressure ever increasing, the goal is to reduce complexity wherever one can in order to get the chip out on schedule. Implementing a methodology that accurately analyzes DCD can eliminate the need for a DCD correcting circuit and reduce the complexity of a design.
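
The feedback idea behind a corrector can be sketched in a few lines: measure the duty cycle, then nudge a programmable delay on one edge until the target is reached. Real correctors are analog or mixed-signal circuits; this loop is only a conceptual model with invented numbers.

```python
# Conceptual sketch of a duty cycle corrector's trim loop (not real hardware).
def correct_duty_cycle(measure, trim_step=0.001, target=0.50, tol=0.002):
    """measure(trim) returns the duty cycle observed for a given edge trim."""
    trim = 0.0
    while abs(measure(trim) - target) > tol:
        trim += trim_step if measure(trim) < target else -trim_step
    return trim

# Example: a clock path whose uncorrected duty cycle is 52%
print(round(correct_duty_cycle(lambda t: 0.52 + t), 3))  # about -0.02 of trim
```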

Limitations of STA

STA tools do not compute the full clock signal waveforms. Instead,  they estimate timing values by inferring them from pre-characterized timing libraries for different PVT corners. While this makes STA fast, it is not accurate enough at finer geometries, failing to detect DCD and rail-to-rail failures directly.

In sub-7nm designs, with greater transistor nonlinearity, increased aging and deep clock distributions, such complex analysis is not possible with traditional STA. In addition, STA is especially inaccurate for advanced clock topologies containing meshes and spines. In essence, DCD, rail-to-rail and minimum clock pulse width problems are critical issues that can go unnoticed during STA, resulting in serious failures in silicon.

Infinisim’s ClockEdge

Infinisim’s ClockEdge is a high-capacity, high-performance, SPICE-accurate, end-to-end integrated clock analysis solution. ClockEdge can handle chips that incorporate multiple high-speed clock topologies and can be used for full-chip sign-off. It plugs into current design flows, allowing designers to simulate large clock domains with millions of gates.

Overnight Runs on the Gold Standard

Infinisim’s ClockEdge computes DCD using SPICE simulations of the entire circuit, including full interconnect parasitics. The simulator identifies the nets that fail duty cycle, minimum pulse width and rail-to-rail checks. It generates clock waveforms and estimates the maximum frequency at which rail-to-rail failures occur. SPICE-accurate results are delivered overnight on clock domains containing 4+ million gates, which is unheard of in standard SPICE simulations.

High Verification Coverage

Designers can run multiple PVT corners and input duty cycles for comprehensive, increased design verification coverage, thereby gaining high confidence in their design. ClockEdge users routinely find DCD issues missed by STA-based CTS methodologies.

Some Salient Features of ClockEdge

    • SPICE accurate results overnight, for clock domains containing 4+ million gates
    • Leverages distributed computing to simulate and analyze large complex clocks
    • Handles complex clock topologies including trees, grids/meshes and spines
    • Reports include timing, process variation, power and current analysis
    • OCV analysis: during design for guard-band reduction, in post-design phase to estimate yield
    • Results from ClockEdge are integrated into CTS flow for optimizing design

Below are some use cases where ClockEdge augments STA for SPICE-accurate, comprehensive timing analysis:

    • Timing optimization during design iterations
    • Base-layer tapeout/Metal-layer tapeout signoff verification
    • Post-fab investigation into performance degradation and potential improvements for next revision

Rail-to-Rail Failures Report

ClockEdge also reports rail-to-rail failures by plotting the maximum and minimum voltages reached by every node in a full clock domain. The figure below shows ClockEdge identifying a gate at level 1 failing to reach the supply voltage of 1.1V.

Fmax Report

The data can also be represented in an Fmax plot to show the expected maximum frequency (Fmax) at which rail-to-rail failures would occur for each node. Refer to the figure below.

The above reporting capability allows designers to quickly determine if there are any rail-to-rail failures amongst the millions of nodes on a particular clock path.
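
The sketch below shows the kind of post-processing such a report implies: scan each node’s simulated voltage extremes and flag any node whose swing does not come within a margin of the supply rails. It is illustrative only, not ClockEdge’s implementation; the node names, margin and voltages are invented.

```python
# Illustrative rail-to-rail check over per-node simulated voltage extremes.
VDD = 1.1   # supply voltage in volts

def rail_to_rail_failures(node_extremes, margin=0.05):
    """node_extremes: {node_name: (v_min, v_max)} from a SPICE-level run."""
    return [node for node, (v_min, v_max) in node_extremes.items()
            if v_max < VDD - margin or v_min > margin]

nodes = {
    "clk_l1_buf3": (0.02, 1.04),   # never gets close to 1.1 V -> flagged
    "clk_l4_buf9": (0.01, 1.09),   # full swing -> OK
}
print(rail_to_rail_failures(nodes))   # ['clk_l1_buf3']
```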

Summary

ClockEdge delivers SPICE-accurate results on clock domains containing 4+ million gates and higher verification coverage compared to competitive products in the market. It easily plugs into current design flows used by customers. And it can accurately analyze top-level, block-level and hard-macro-level clocks to cover all blind spots. The tool finds DCD, jitter, aging and rail-to-rail issues that are routinely missed by traditional STA-based methodologies.

For more details about ClockEdge, you can access the whitepaper here.

To learn about a comprehensive solution for full-chip clock analysis, visit Infinisim.

Also Read:

WEBINAR: Challenges in analyzing High Performance clocks at 7nm and below process nodes

WEBINAR: Overcome Aging Issues in Clocks at Sub-10nm Designs

White Paper: A Closer Look at Aging on Clock Networks


Podcast EP108: Brief History of the Semiconductor Industry – How Did It Get Started?
by Daniel Nenni on 09-23-2022 at 10:05 am

Dan is joined by Chris Miller, Associate Professor of International History at The Fletcher School and author of Chip War: The Fight for the World’s Most Critical Technology, a geopolitical history of the computer chip. Chris provides a far-reaching overview of the forces that shaped the worldwide semiconductor industry, with a special view of the R&D done by US aerospace and defense in the early days. The forward-looking strategies that were developed are truly remarkable.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Semifore is Supplying Pain Relief for Some World-Changing Applications
by Mike Gianfagna on 09-23-2022 at 8:00 am

In a recent post, I discussed how Samtec is fueling the AI revolution. In that post, I talked about how smart everything seems to be everywhere, changing the way we work, the way we think about our health and ultimately improving life on the planet. These are lofty statements, but the evidence is growing that the newest wave of applications could do just that. If you take a closer look at the technology that is enabling all this, you will find two primary drivers – better/faster AI and more ubiquitous and efficient data communication. Semifore has recently issued two press announcements about new customers in AI and data communication. Let’s look at those announcements. I believe you will see a pattern – Semifore is supplying pain relief for some world-changing applications.

Data Communication

The first press announcement is about CommScope. In that announcement it is reported that CommScope will expand the term of its use of Semifore solutions for the design of advanced communication devices through a multi-year agreement.

Who is CommScope and how is this relevant? According to its website:

At CommScope we push the boundaries of communications technology to create the world’s most advanced networks. We design, manufacture, install and support the hardware infrastructure and software intelligence that enable our digital society to interact and thrive. Working with customers, we advance broadband, enterprise and wireless networks to power progress and create lasting connections. Across the globe, our people and solutions are redefining connectivity, solving today’s challenges and driving the innovation that will meet the needs of what’s next.

This is the kind of holistic approach that’s needed to truly unify communications on a global scale. It goes toward the goal of changing the world by unifying the data in the world. Building systems like this isn’t easy. There are many, many hurdles to cross. These systems contain large amounts of hardware as well as massive software stacks. Getting the interaction between the hardware and software right is one of those hurdles.

Here is where Semifore provides pain relief. CSRs, or control and status registers, are where the interface between software and the hardware it controls lives. These registers define the communication protocol between hardware and software, and the correctness of that interface is absolutely critical to success. According to CommScope:

“We have used other CSR EDA tools over the years, but Semifore’s CSRCompiler offers the fullest featured and most flexible tools as well as easiest to use,” said Andy Mansen, senior manager of hardware engineering at CommScope. “It eliminates all confusion around CSR.”

Semifore to the rescue.
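
For readers new to CSRs, here is a minimal sketch of what that hardware/software contract looks like from the firmware side: software writes a control register to start a block and polls a status register until the hardware reports completion. The offsets, bit fields and helper functions are invented for illustration; a generator like CSRCompiler produces the real register map plus matching RTL, firmware headers and verification views from a single description.

```python
# Hypothetical register map for one hardware block (offsets/bits invented).
CTRL_OFFSET = 0x00      # control register: written by software
STATUS_OFFSET = 0x04    # status register: written by hardware
ENABLE_BIT = 1 << 0
DONE_BIT = 1 << 0

def start_block(write32, read32, base):
    """Kick off the block, then poll until it reports completion."""
    write32(base + CTRL_OFFSET, ENABLE_BIT)
    while not (read32(base + STATUS_OFFSET) & DONE_BIT):
        pass   # real firmware would add a timeout or sleep here

# Tiny fake bus so the sketch runs without hardware:
regs = {}
def write32(addr, value): regs[addr] = value
def read32(addr): return regs.get(addr, DONE_BIT)   # pretend hardware is done

start_block(write32, read32, base=0x40000000)
print(hex(regs[0x40000000]))   # 0x1 -> ENABLE_BIT was written
```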

AI Acceleration

The second press announcement is about Flex Logix. In that announcement it is reported that Flex Logix selects Semifore for advanced inference chip design.

Inference is the process AI uses to recognize things – people in front of a car, spoken language, or cancer cells for example. These systems demand very fast response time with very low latency. You do want to recognize a pedestrian in front of your self-driving car long before you hit the person for example. As a result of demands like this, more and more AI processing is moving from the cloud to the edge, or even onto the sensing device itself. There just isn’t time to do it any other way.

This trend, in turn, has created a rather vexing problem. How do you fit all that processing power in the rather small space and energy budget available? It is here that Flex Logix delivers innovation. According to Flex Logix:

Flex Logix is a reconfigurable computing company providing AI inference and eFPGA solutions based on software, systems and silicon. Its InferX™ X1 is the industry’s most-efficient AI edge inference accelerator that will bring AI to the masses in high-volume applications by providing much higher inference throughput.

So, help is on the way for local, efficient inference. You can learn more about Flex Logix on SemiWiki. But, just as with data communication, there is a catch. These are very complex devices, and the hardware/software interface is a challenge. According to Flex Logix:

“We are redefining the deployment of inference at the edge with our highly efficient technology,” said Charlie Roth, VP of Hardware R&D at Flex Logix. “These designs are highly complex, and the hardware and software interfaces are critical to performance and core functionality.  Semifore’s CSRCompiler ensures the hardware and software interfaces function as expected, and that both the hardware and software teams can test interaction during chip development.”

Once again, Semifore to the rescue.

More About Semifore

I caught up with Semifore’s founder and CEO, Rich Weber recently. A lot of the support for specification of the hardware/software interface is provided by industry standards. I wanted to see if Semifore was following that work. What I discovered is that not only is Semifore following relevant standards, but their team is driving and defining many of them.

Rich told me that he personally has been a voting member of the Accellera SystemRDL 1.0 and 2.0 committees, the IEEE 1685 2009 and 2014 committees and the Accellera UVM committee. Rich is also co-chair of the Accellera IP-XACT 1.5, 2.0, and 2.1 committees and he is currently the secretary of the IEEE 1685 2022 committee. Busy guy.

He also told me that Jamsheed Agahi, a Semifore co-founder and VP of quality is the secretary of the Accellera UVM committee, has been the secretary of the IEEE 1800.1 UVM committee and is a voting member of the Accellera Portable Stimulus committee.

These two gentlemen are driving important work for system design.

Are you working on a project that aims to change the world? If so, you are likely to encounter many hurdles with lots of pain. The good news is that help is available for some of that pain. You can learn more about Semifore’s pain-relieving products on SemiWiki here.  There is a lot of good information on Semifore’s website as well.  There’s also a webinar coming up that provides a clever perspective on how the RTL architect, verification engineer and firmware developer try to work together on complex, world-changing projects.

Now you know how Semifore is supplying pain relief for some world-changing applications.


Enjoy the Super-Cycle it’ll Crash in 2023!
by Malcolm Penn on 09-23-2022 at 6:00 am

At our January 2021 industry update webinar, 20 months ago, we forecast “There’s no tight capacity relief before 2022 at the earliest” whilst simultaneously cautioning “Enjoy the super-cycle … it’ll crash in 2023!” At the time, we were dismissed as being ‘ever optimistic’ for the first prognosis and ‘losing the plot’ for the second. 20 months down the road, both predictions have proven astutely prescient.

We made those predictions based on our tried and tested methodology of analyzing what we see as the four key driving factors influencing market growth, namely the economy, unit demand, capacity and ASPs, overseen by our 55 years of hands-on experience in the industry.

Just to recap, the economy determines what users can afford to buy; unit demand reflects what users actually buy; capacity determines how much demand can be met; and ASPs set the price units can be sold for.

Industry Update

Twelve months later, at our Jan 2022 update webinar, we forecast that 10 percent growth was still possible for 2022, but with a downside of 4 percent and an upside of just 14 percent. Our concerns at that time were that the risks outweighed the opportunities; the chip market was seriously overheating; and the road ahead was very stony. We were especially worried about:

  • Supply/Demand Rebalance (Once The 2021 CapEx Increases Bite Home)
  • Slowdown In End Demand (Covid Surge Slowdown and/or Supply Chain Constraints)
  • Economic Slowdown (Covid Restrictions, Inflation Concerns, Unwinding Fiscal Stimuli)

Our over-riding caveat was “When the bubble bursts, it bursts, there are no soft landings”. First, unit shipments plummet and then ASPs collapse, with the added word of caution “Don’t be surprised if the market goes negative … it more often does than not.”

Five months later, at our May update event, despite most firms still reporting stellar first half year sales, we were once again out on a limb, reducing our January 10 percent forecast to just 6 percent, due to concerns about the worsening economic outlook, driven by increased food, energy and gasoline costs wrought by Russia’s war with Ukraine, and its squeeze on consumer spending.  At that time, we reiterated our belief that the downturn would hit in Q3, triggered by the anticipated capacity increases loosening up supply constraints.

The downturn actually broke in June 2022, with a huge implosion in sales driven by a slowdown in unit shipments and an 18 percent fall in ASPs, from Q2’s $1.350 peak to $1.105, wiping out in one month 77 percent of the Super-Cycle’s gain. This pushed our January and May forecasts well into bear territory.

Granted, downturns are always uneven and patchy at the start, partly driven by the commodity nature (or not) of each sector, and partly because there are at least four months of WIP production in the wafer fab pipeline which cannot be stopped, but we firmly believe no sector will prove 100 percent immune. As we move towards 2023, the inflation outlook is much worse than we originally believed, in the high double-digit range, with the Fed now having abandoned its ‘soft landing’ goal. Interest rates everywhere, except China, are now poised to rise much faster in response, despite the obvious and clear risk of triggering an economic recession.

At the same time, Russia has now openly declared the weaponization of its oil, natural gas, and mineral supplies, especially in Europe, and China’s still-aggressive zero-Covid strategy continues to disrupt global supply chains everywhere. War sanctions on Russia are inevitably also disrupting the global economy, and heightened geopolitical tensions with China over Taiwan, together with the US technology sanctions, are also raising the global tension thermometer.

Current Status

As we entered the third quarter, the chip industry was facing a simultaneous series of global economic shocks coincident with a slowdown in market demand. Every single warning light on the economy is now flashing red. Unit shipments are still running about 20 percent above their long-term average and a sharp downward adjustment is inevitable as lead times start to shorten and excess inventory is purged.

CapEx spend, as a percentage of semiconductor sales, is at a 20-year record high, with the massive 74 percent 2H-2021 CapEx splurge about to hit home in 2H-2022, just as market demand turns down. The current ASP plunge has been driven by memory, but all other sectors are weakening and are expected to follow suit in 1H-2023.

We reached the top of the semiconductor Super-Cycle roller-coaster in Q2-2022 and the downward plunge started in June. Sectors in the front are already feeling the impact; for those in the rear? Their start has yet to come. The 17th industry down cycle has now definitely begun.

We have revised our 2022 forecast down to 4 percent growth, but held our 2023 outlook at a 22 percent decline, back to around US$450 billion, and, all things being equal, we should be back to single-digit positive growth in 2024.

Downturn Opportunities

Downturns are a structural part of the industry psyche; they are quite natural and normal. They are also a time when longer-term market share gains are best made.

It is also a time when innovation comes to the fore. Firms need to emphasize new products to differentiate themselves from their competitors and invent their way out of the crisis.

Innovation tends to slow during an upturn as foundries and IDMs ration R&D access to scarce wafer fab capacity. The opposite happens in a downturn when fab capacity becomes abundant.

As a result, R&D and new IC design activity always accelerates during a downturn and, as with the past cycles, the downturn beneficiaries will be the IC design houses, EDA industry and leading-edge suppliers.

Malcolm Penn

20 Sep 2022

Preview of the webinar proceedings here: https://youtu.be/PX9TmRTzD18

Order the webinar slides and recording here: https://www.futurehorizons.com/page/133/

Also Read:

Semiconductor Decline in 2023

Does SMIC have 7nm and if so, what does it mean

Samtec is Fueling the AI Revolution


Flex Logix: Industry’s First AI Integrated Mini-ITX based System
by Kalar Rajendiran on 09-22-2022 at 10:00 am

As the market for edge processing is growing, the performance, power and cost requirements of these applications are getting increasingly demanding. These applications have to work on instant data and make decisions in real time at the user end. The applications span the consumer, commercial and industrial market segments. AI hardware accelerator solutions are being sought to meet the needs of these applications.

With its focus on the embedded vision market, Flex Logix introduced its InferX X1 accelerator chip in 2019. The InferX X1 is the industry’s most efficient AI edge inference accelerator that can bring AI to embedded vision applications across multiple industries. Flex Logix has been working on making AI system adoption and deployment easier in a number of markets and applications. Ease of incorporation into the customers’ applications accelerates the adoption of any useful technology and the InferX X1 is no exception.

At the AI Hardware Summit this week, Flex Logix launched the Hawk X1, the industry’s first Mini-ITX AI x86-based system card. I had an opportunity to chat about that announcement with Barrie Mullins, VP of Product Management for Flex Logix.

Enabling Easier Edge and Embedded AI Deployment

The Hawk X1 is designed for customers looking to upgrade their current Mini-ITX systems with AI, or quickly develop new edge AI appliances and solutions. As the bulk of existing mini-ITX systems are x86 based, the Hawk X1 will be a drop-in upgrade to these systems. This makes it easier for customers to get to market faster at a lower development cost and reduced risk. From an operating system perspective, the Hawk X1 supports both Linux and Windows.

AI Workflow for Development and Deployment

The Hawk X1 system leverages Flex Logix’s InferX accelerator chip, the industry’s most efficient AI inference chip for edge systems. It provides Flex Logix’s customers a price/performance/power advantage over competitive edge inference solutions. Flex Logix’s Easy Vision platform includes pre-trained ready-to-use models for object detection such as hard hat, people counting, face mask and license plate recognition. Customers can save over six months of product development time, additional system costs and power compared to alternate solutions.

Depending on the application, a customer may want to use the pre-trained ready-to-use AI models, load them onto the Hawk X1 board and run their application. Alternatively, they may want to develop their own models using the Easy Vision platform and the InferX Model Development Kit (MDK). The MDK optimizes, quantizes and compiles the models into the format that the Hawk X1 will understand. The customer then uses the Run Time Kit (RTK) to configure and manage the run-time execution.
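
Conceptually the flow looks something like the pseudo-workflow below. The function and class names are invented stand-ins that mirror the steps described above; they are not the actual Flex Logix MDK or RTK APIs.

```python
# Pseudo-workflow only: stand-ins for "compile with the MDK, run with the RTK".
def mdk_compile(model_path):
    """Stand-in for the Model Development Kit: optimize, quantize, compile."""
    return model_path.replace(".onnx", ".inferx")

class Runtime:
    """Stand-in for the Run Time Kit: configures the card and runs inference."""
    def __init__(self, artifact, device):
        self.artifact, self.device = artifact, device
    def infer(self, frame):
        return []   # on real hardware this would return detections

def deploy(model_path, pretrained=False, device="hawk_x1"):
    artifact = model_path if pretrained else mdk_compile(model_path)
    return Runtime(artifact, device)

runtime = deploy("people_counting.onnx")
print(runtime.artifact)   # people_counting.inferx
```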

Hawk X1 Hardware Specification

The interfaces are ready for direct connection to cameras. The Hawk X1 includes two InferX X1 accelerators and an AMD quad-core chip to deliver the highest-performance system. Flex Logix plans to launch another version of the card with one InferX X1 chip and an AMD dual-core chip to address customers who may want a lower-performance system. Customers get to choose their own memories, allowing them more control over the performance, power consumption and cost of the system. The Hawk X1 comes with a thermal solution that sits on top of the card to dissipate the heat.

Target Markets and Applications

  • Safety and Security
    • Mask, personal protection equipment (PPE) detection, building access, data anonymization and privacy
  • Manufacturing and Industrial Optical Inspection
    • Employee safety, logistics and packaging, and inspection of parts, processes and quality
  • Traffic and Parking Management
    • Traffic junction monitoring, vehicle detection and counting, public and private parking structures, toll booths
  • Retail
    • Logistics, safety, consumer monitoring, automated checkout, and stock management
  • Healthcare
    • Medical image analytics, patient monitoring, mask detection, staff and facility access control and safety
  • Agriculture
    • Crop inspection, weed and pest detection, automated harvesting, yield and quality analysis, animal monitoring and health analysis
  • Robotics
    • First/last mile delivery, forklifts, tuggers, drones, and autonomous machines

Benchmarking Hawk X1

The Hawk X1 offers better performance than all NVIDIA boards. The main competition in this space is Xavier AGX-based systems. In the chart below, you can see how the Hawk X1 compares against the Xavier AGX with popular and standard object detection models.

Hawk X1 Availability

Flex Logix is taking orders for delivery starting January 2023. The Hawk X1 is priced at $1,299 for order quantities of 1K units or more.

Summary

Through the latest addition to its product portfolio, Flex Logix has made AI system adoption and deployment easier for a number of market applications. The Hawk X1 in the mini-ITX form factor is deployment-ready for safety and security, manufacturing and industrial, and traffic and parking management applications.

You can read the Hawk X1 product announcement here.

For more details, visit Flex Logix website.

Also Read:

Flex Logix Partners With Intrinsic ID To Secure eFPGA Platform

EasyVision: A turnkey vision solution with AI built-in

WEBINAR: 5G is moving to a new and Open Platform O-RAN or Open Radio Access Network


Semiconductor Decline in 2023
by Bill Jewell on 09-22-2022 at 8:00 am

The semiconductor market dropped 0.8 percent in 2Q 2022 versus 1Q 2022, according to WSTS. The 2Q 2022 decline followed a 0.5% quarter-to-quarter decline in 1Q 2022. The 2Q 2022 revenues of the top 15 semiconductor suppliers match the overall market results, with a 1% decline from 1Q 2022. The results by company were mixed. Memory suppliers SK Hynix and Micron Technology led with 2Q 2022 revenue growth of 13.6% and 11.0% respectively. AMD’s revenues were up 11.3% primarily due to its acquisition of Xilinx. The weakest companies were Nvidia with a 19% decline due to weakness in gaming and Intel with a 16.5% decline because of a weak PC market.

The outlook for 3Q 2022 is also mixed. The strongest growth is from the companies primarily supplying analog ICs and discretes. STMicroelectronics has the highest expectations, with its 3Q 2022 revenue guidance a 10.5% increase from 2Q 2022. ST cited strong overall demand, particularly in the automotive and industrial segments. Automotive and/or industrial segments were also cited for expected 3Q 2022 revenue growth by Infineon Technologies, NXP Semiconductors, and Analog Devices. Analog Devices’ revenue was over $3 billion for the first time in 2Q 2022 (primarily due to its acquisition of Maxim Integrated Products last year) and made it on to our list of top semiconductor companies.

Weakness in the PC and smartphone markets was mentioned as a major factor in expected 3Q 2022 revenue drops by MediaTek and Texas Instruments. Nvidia cited continuing weakness in gaming for an expected 12% decline. The largest declines in 3Q 2022 revenue will be from the memory companies. Micron Technology guided for a 21% decline. Samsung did not provide specific revenue guidance, but Dr. Kye Hyun Kyung, head of Samsung’s semiconductor business, said the second half of 2022 “looks bad.”

The weakness in the smartphone and PC markets is reflected in recent forecasts from IDC. Smartphone unit shipments are projected to decline 7 percent in 2022 following 6 percent growth in 2021. Smartphones are expected to return to 5 percent growth in 2023. PCs, which experienced 13% growth in 2020 and 15% growth in 2021 as a result of the COVID-19 pandemic, are forecast to decline 13% in 2022 and 3% in 2023. Automotive remains healthy, with LMC Automotive projecting light vehicle unit production will increase 6.0% in 2022 and 4.9% in 2023.

The global economic outlook is another factor pointing toward a slowing of the semiconductor market. Recent forecasts for global GDP growth in 2022 are in the range of 2.7% to 3.2%. The percentage point decline (or deceleration) from 2021 growth ranges from 2.9 points to 3.3 points. Our Semiconductor Intelligence model predicts a 3-percentage point deceleration in GDP growth will result in a 16-point deceleration in semiconductor market growth. Our current forecast of 5% semiconductor growth in 2022 is a 21-point deceleration from 26% growth in 2021. Global GDP is expected to show continued growth deceleration in 2023 of 0.3 to 1.0 points. However, a global recession is still a possibility. Bloomberg surveys put the probability of a recession in the next 12 months at 48% for the U.S. and 80% for the Eurozone.

Our Semiconductor Intelligence forecast of a 5.0% increase in the semiconductor market in 2022 is the lowest among recent publicly available forecasts. Other recent projections of 2022 semiconductor market growth range from 6% (Future Horizons) to 13.9% (WSTS).

The semiconductor market will likely show at least five consecutive quarter-to-quarter declines from 1Q 2022 through 1Q 2023. If the global economy does not weaken more than current expectations, the semiconductor market should have a modest recovery in the second half of 2023. However, the quarterly trends will drive the market negative for the year 2023. Our Semiconductor Intelligence forecast is a 6.0% decline in 2023. Future Horizons is the most pessimistic with its August projection of a 22% drop in 2023. Gartner projects a decline of 2.5%. IDC and WSTS expect growth in 2023, but at a slower rate than 2022: 6.0% for IDC and 4.6% for WSTS.

After 2023, the semiconductor market should stabilize toward typical trends. The COVID-19-related shutdowns and resulting supply chain disruptions will be mostly resolved. The traditional market drivers, smartphones and PCs, should be back to normal growth. Emerging applications such as automotive and IoT (internet of things) will become increasingly important as market drivers.

Also Read:

Automotive Semiconductor Shortage Over?

Electronics is Slowing

Semiconductors Weakening in 2022


Load-Managing Verification Hardware Acceleration in the Cloud
by Bernard Murphy on 09-22-2022 at 6:00 am

There’s a reason the verification hardware accelerator business is growing so impressively. Modern SoCs – now routinely multi-billion gate devices – must be verified/validated against massively demanding test plans, requiring high levels of test coverage. Use cases extend all the way up to firmware, OSes, even application software, across a dizzying array of power saving configurations. Testing for functionality, performance, peak and typical power, security and safety goals. All while also enabling early software stack development and debug. None of this would be possible without hardware accelerators, offering many orders of magnitude higher verification throughput than is possible with software simulators.

Maximizing Throughput and ROI

Hardware accelerators are not cheap, but there is no other way to get the job done; SoC design teams must include accelerators in their CapEx budgeting. But they want to exploit their investment as fully as possible, ensuring machines are kept fully occupied and are fairly allocated across product development teams.

In the software world, this load management and balancing problem is well understood. There are plenty of tools for workload allocation, offering a range of sophisticated capabilities. But they all assume a uniform software view of the underlying hardware. Hardware options can range across a spectrum of capacities and speeds, while still allowing, at least in principle, any job to be virtualized anywhere in the cloud/farm. Not so with hardware-accelerated jobs, which must provide their own virtualization options given the radically different nature of their architectures.

Another wrinkle is that there are different classes of hardware acceleration, centered either on emulation or FPGA prototyping. Emulation is better for hardware debug, at some cost in speed. FPGA prototyping is faster but not as good for hardware debug. (GPUs are sometimes suggested as another option for their parallelism, though I haven’t heard of GPUs playing a major role so far in this space.)

Verifiers like to use emulators or prototypers in in-circuit emulation (ICE) configurations. Here they connect the design under test to real hardware. This directly mimics the hardware environment in which the chip under design must ultimately function. It requires physical connectors, and hence application-specific physical setups, further constraining virtualization except where the hardware offers multiple channels between connectors and the core emulator/prototyper.

Swings and Roundabouts

Expensive hardware and multiple suppliers suggest an opportunity for allocation software to maximize throughput and ROI against a range of hardware options. Altair aims to tap this need with their Altair® Hero™ product, an enterprise job scheduler built for multi-vendor emulation environments. As their bona fides for this claim, they mention that they already have field experience deploying with Cadence Palladium Z1 and Z2, Synopsys Zebu Z4 and Synopsys HAPS. They expect to extend this range over time to also include Cadence Protium and Siemens EDA Veloce. They also hint at a deployment in which users can schedule jobs in a hardware accelerator farm, choosing between Palladium and Zebu accelerators.

In a mixed-vendor environment, clearly Altair has an advantage in providing a neutral front-end for user-selected job scheduling. Advantages are less clear if users hope for the scheduler to dynamically load-balance between different vendor platforms. Compilation for FPGA-based platforms is generally not hands-free; a user must pick a path before they start. Unless perhaps the flow compiles for both platforms in parallel, allowing for a switch before running? Equally, scheduling of ICE flows must commit to a platform up-front given physical connectivity demands.

Another point to consider is that some vendors closely couple their emulation and prototyping platforms in order to support quick debug handoff between the two flows. Treating such platforms as independently allocatable would undermine this advantage.

In a single vendor environment, I remain open-minded. The hardware guys are very good at their hardware and have apparently put work into supporting virtualization. But could a 3rd party with significant cloud experience add scheduling software on top? To better optimize throughput and ROI in balancing jobs? I don’t see why that shouldn’t be possible.
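
As a thought experiment, the toy scheduler below matches queued jobs to accelerator resources by platform requirement and free capacity. It is a deliberately naive sketch: the job names, capacities and greedy policy are invented, and a production scheduler such as Hero also has to handle priorities, queueing, preemption and ICE connectivity constraints.

```python
# Naive greedy placement of verification jobs onto accelerator resources.
jobs = [
    {"name": "soc_regression", "platform": "emulator",   "gates_b": 4},
    {"name": "sw_bringup",     "platform": "prototyper", "gates_b": 2},
    {"name": "gpu_subsystem",  "platform": "any",        "gates_b": 1},
]
resources = [
    {"name": "palladium_1", "type": "emulator",   "free_gates_b": 6},
    {"name": "zebu_1",      "type": "prototyper", "free_gates_b": 3},
]

def schedule(jobs, resources):
    placements = {}
    for job in jobs:
        for res in resources:
            type_ok = job["platform"] in ("any", res["type"])
            fits = job["gates_b"] <= res["free_gates_b"]
            if type_ok and fits:
                res["free_gates_b"] -= job["gates_b"]
                placements[job["name"]] = res["name"]
                break
    return placements

print(schedule(jobs, resources))
# {'soc_regression': 'palladium_1', 'sw_bringup': 'zebu_1', 'gpu_subsystem': 'palladium_1'}
```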

You can learn more from this Altair white paper.