
Podcast EP46: Arteris IP – the role and impact of system IP
by Daniel Nenni on 11-05-2021 at 10:00 am

Dan is joined by industry veteran Charlie Janac, chairman, president and CEO of Arteris IP. Dan and Charlie explore the various products that comprise system IP, including the high-growth markets Charlie sees. They also have an interesting discussion about autonomous driving – when and how it will likely be deployed throughout the world.

https://www.arteris.com/

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Update: Tuomas Hollman, Minima Processor CEO
by Daniel Nenni on 11-05-2021 at 6:00 am


Tuomas Hollman is an experienced senior executive, with proficiency that ranges from strategy to product development and business management. He began his semiconductor industry career at Texas Instruments, serving for 15 years in increasingly important roles, including general management and profit and loss responsibility for multiple product lines. From Texas Instruments, Tuomas joined Exar Corporation as division vice president of power management and lighting with oversight of strategy, product development and marketing. Tuomas joined Minima Processor from MaxLinear, through its acquisition of Exar Corporation, where he continued to lead the power management and lighting products. Tuomas holds a Master of Science degree in Microelectronics Design from Helsinki University of Technology and a Master of Science degree in Economics and Business Administration from the Helsinki School of Economics, both in Finland.

Congratulations on your recent news that you have been selected in the first batch of 65 companies to join the European Innovation Council (EIC) Accelerator! What was the selection process like? What were the criteria for selection? What stood out in Minima’s submission that appealed to the EIC Accelerator?

The EIC Accelerator is a unique European funding instrument of the European Innovation Council. This was the first batch of companies selected from more than 800 applications for the EIC Accelerator. Prior to this, Minima participated in the SME Instrument, the predecessor programme to the EIC Accelerator. So they were familiar with us, but the process still required an in-depth video, application, pitch and intensive interview with a jury of investors and entrepreneurs. They are targeting companies looking to fund their development and scale up their ground-breaking innovations in healthcare, digital technologies, energy, biotechnology, and space. They resonated with our value proposition to enable lower-energy chips that save battery life, and to make that available to the broadest market possible through a semiconductor IP and EDA licensing model. We were the only semiconductor IP and EDA company selected this time.

You’re getting a full blended investment. What does that mean? How do you plan to use the investment?

The EIC Accelerator offers grants of up to ~€2.5 million combined with equity investments, which is what they call a full blended investment. It supports the development of top-class innovations by crowding in private investors for equity. Minima will be using the grant, and potentially the equity in combination with private investors, to scale up both the technology and the business. Remember, in the last interview we discussed scaling our business not only by adding resources to support more customers, but also by developing our IP delivery methods and Dynamic Margining EDA tools. This is precisely what the EIC Accelerator enables us to do. Tooling our solution lets our customers explore and implement the technology more independently, enabling a step function in business growth rather than scaling by resource growth alone.

Early in the year you spoke about the focus of Minima on always-on, sensing-type applications such as hearables and wearables. How is that going?

Minima’s Dynamic Margining IP solution is a perfect fit for energy reduction in hearables and wearables, achieved by finding the minimum (stable) voltage at a given operating frequency. We have seen 60%+ energy savings for the chip designer’s processor of choice when combined with Minima’s Dynamic Margining. New energy savings benchmarks such as this one will drive the industry to adopt near-threshold voltage solutions. We’re seeing the number of customer opportunities increase this year, which is why we applied for the EIC Accelerator. To meet the market demand, we need to scale both our solution delivery technology and our customer support resources. Hearables and wearables remain a very active area for us. We also see increasing demand in Edge-AI / AIoT type devices, which are broadly expected to be the fastest growing product category among typical always-on SoCs.
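For readers wondering where savings of that magnitude come from: dynamic switching energy scales roughly with the square of the supply voltage (E ≈ C·V²). A minimal back-of-the-envelope sketch in Python follows; the voltages are assumed example values for illustration, not Minima’s published operating points.

    # Illustrative only: dynamic energy per operation scales ~ C * V^2.
    # The supply voltages below are assumed examples, not Minima data.
    C_EFF = 1.0               # normalized effective switched capacitance

    def dynamic_energy(vdd, c_eff=C_EFF):
        """Relative dynamic energy per operation at supply voltage vdd."""
        return c_eff * vdd ** 2

    v_nominal = 0.8           # assumed nominal supply (V)
    v_scaled = 0.5            # assumed near-threshold operating point (V)

    saving = 1 - dynamic_energy(v_scaled) / dynamic_energy(v_nominal)
    print(f"Energy saving going from {v_nominal} V to {v_scaled} V: {saving:.0%}")
    # -> roughly 61%, in the range of the 60%+ figure quoted above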

How do you work with customers on an implementation of your solution? What do they need to be prepared to do and what does Minima provide? Will that model need to change to scale up the company?

Minima’s approach to what the industry has known as DVFS combined with AVS is totally new because we can scale to the optimal energy point dynamically. It means we are working with customers earlier than design implementation: we work with them at the architectural phase to help implement Dynamic Margining for their application case. At implementation time we are process technology agnostic, compatible with the standard EDA flows, and can basically enable adaptive voltage scaling, driven by our Dynamic Margining technology, down to the threshold voltage level at any given process node.
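For readers unfamiliar with adaptive voltage scaling, the control idea can be pictured as a simple closed loop: on-chip monitors report how much timing margin exists at the current supply voltage, and the regulator steps the voltage down until that margin approaches a small target (and back up when it erodes). The sketch below is a generic illustration of such a loop, not Minima’s Dynamic Margining implementation; the limits and step sizes are assumptions.

    # Generic adaptive-voltage-scaling loop (illustration only, not Minima's IP).
    V_MIN, V_MAX, V_STEP = 0.40, 0.80, 0.005   # volts, assumed limits
    MARGIN_TARGET = 0.02                        # normalized slack target, assumed

    def avs_step(vdd, margin):
        """Return the next supply voltage given the measured timing margin."""
        if margin > MARGIN_TARGET:              # excess slack: step down, save energy
            return max(V_MIN, vdd - V_STEP)
        if margin < 0:                          # margin violated: recover quickly
            return min(V_MAX, vdd + 4 * V_STEP)
        return vdd                              # within the target band: hold

    # Example: converging from nominal toward the minimum stable voltage
    vdd = 0.80
    for measured_margin in [0.20, 0.15, 0.10, 0.05, 0.01, -0.01]:
        vdd = avs_step(vdd, measured_margin)
        print(f"margin={measured_margin:+.2f} -> vdd={vdd:.3f} V")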

What’s in store from Minima in the next 12 months as we look ahead to 2022?

Minima is continually improving our Dynamic Margining solution to make it as automated and tooled as possible for implementation, something we plan to do with the increased funding. The market opportunity in hearables and wearables, plus IoT radios, MCUs, image recognition and edge AI, is huge for us; it’s a multi-billion-dollar revenue opportunity.

Also Read:

CEO Interview: Dr. Ashish Darbari of Axiomise

CEO Interview: Jothy Rosenberg of Dover Microsystems

CEO Interview: Mike Wishart of Efabless


Cliosoft Webinar: What’s Needed for Next Generation IP-Based Digital Design
by Mike Gianfagna on 11-04-2021 at 10:00 am


There’s plenty of talk about requirements for IP data management. The fundamental methods to prevent chaos, waste or worse are popular topics. I’ve covered webinars from Cliosoft on the topic on SemiWiki. But what about the future? What’s really needed to set up a path that scales, addressing the challenges of today and the new ones you’ll face tomorrow? Cliosoft recently presented a webinar that addressed this topic, and I found it quite enlightening. If you want to plan for your next design and be ready for its challenges, you need to watch this webinar. A replay link is coming that will let you know what’s needed for next generation IP-based digital design.


The webinar is entitled IP Based Digital Design Management That Goes Beyond The Basics. It is presented by Simon Rance, vice president of marketing at Cliosoft. Simon has been an IP designer, system integrator, architect, and IP manager at Arm, so he brings substantial perspective to the conversation. Simon went beyond IP management in his discussion and detailed the benefits of IP platforms. I’ll take you on a quick tour of information Simon shares during the webinar.                                                                                                                     

Design Management Basics for Digital Design

The basics covered in this section include managing text and binary files, applying version control and labeling releases. Simon reviews all the sources of these basic capabilities and there are many. The names will be familiar. He goes on to expand the topic to a more complete set of capabilities, the ones required to truly implement IP-centric digital design. The figure below illustrates what’s involved.

IP Centric Digital Design

Simon then takes you through a real design project with various design personas to illustrate how the pieces fit. This is a great overview of the process and the benefits it delivers. The following are some of the topics he covers.

The Project Dashboard

Early in the project, the architect will create a project dashboard. Here, the IP bill of materials (BoM) and various project documents such as architectural block diagrams, memory maps, and register maps will be assembled. To maintain coherency across the project teams, a home page, forum and news feed should be added. A sample block diagram is used for the balance of the discussion.

Finding the Right IP

Next, a structured method to locate the IP needed for the hypothetical design project is presented. Scenarios covered include reuse of internal IP, identifying useful third-party IP that is already licensed, the need to update or fix internal IP and identifying new IP that must be licensed.

IP Design

For the case where IP must be developed internally, Simon provides an overview of the processes required, including review of issue tracking and the knowledge base, and updating and publishing of new IP. The importance of hierarchical visibility is discussed.

System Integration

Here, Simon reviews IP BoM and conflict detection, IP assembly, glue logic requirements for IP integration and label systems. What label systems are and why they are needed is covered as well.

IP & System Verification

Here, simulation and formal verification are covered. How to increase coverage and reduce time to results is discussed, along with an overview of techniques and tools to manage large design files and large storage requirements. Methods to fix issues found during verification are also covered, along with considerations for access control.

IP Traceability

Methods of implementing hierarchical IP and project tracking are discussed, along with the benefits of the approach.

RTL Signoff

To finish the presentation, an RTL and SoC signoff flow that supports design management snapshots is presented.

Hardware Design Management Checklist

As an extra benefit, Simon presents a complete hardware design management checklist to pull it all together.

To Learn More

The strategies and techniques presented in this webinar should find immediate use in any complex design project. I highly recommend you see this webinar. In under 30 minutes you will learn a lot. You can access the webinar replay here. After you watch the webinar, you’ll know what’s needed for next generation IP-based digital design.

Also Read

CEO Interview: Srinath Anantharaman of Cliosoft

Close the Year with Cliosoft – eBooks, Videos and a Fun Holiday Contest

The History and Physics of Cliosoft’s Academic Program!


KLAC- Foundry/Logic Drives Outperformance- No Supply Chain Woes- Nice Beat
by Robert Maire on 11-04-2021 at 8:00 am


KLA- great quarter driven by continued strong foundry/logic
No supply chain hiccups- Riding high in the cycle
Wafer inspection remains driver with rest along for the ride
Financials remain best in industry

A superb quarter
There was little to complain about in the quarter. Revenues of $2.1B and EPS of $4.64 both nicely beat street estimates. Guidance is perhaps even better, with revenue of $2.325B ±$100M and EPS of $4.95 to $5.85.

Free cash flow and return to shareholders remains very high as do gross margins which came in at just shy of 63%. Basically the usual great KLA ATM performance with little to complain about.

Process control continues to outgrow overall market for semi equipment
Process control remains one of the best segments of the equipment market. Even though KLA does not have a monopoly like ASML does in the EUV market, they have a near monopoly in some of the more critical areas of process control, as they are several times the size of their nearest competitor with relatively few threats in the wafer inspection space.

Size matters
One of the reasons that KLA has outgrown the market is the pace at which new and improved products are brought out, which is due to the huge R&D spending that they are doing and continuing to ramp up. R&D spend was $960M over the last 12 months, closing in on the billion-dollar club.

Foundry/Logic is the biggest driver
We continue to live in a market dominated by foundry/logic spend, as logic devices remain in tight supply and require the highest process control spend. While memory, and DRAM in particular, continues to spend, this is a foundry/logic driven cycle.

Since KLA is the “poster child” for foundry/logic spend they are obviously the biggest beneficiary. We don’t see this changing any time soon, and if anything we are more concerned about memory slowing first as compared to foundry/logic. This suggests that KLA will see both a stronger and longer benefit of spend.

China remains big at 33%
China continues to ramp its aspirations in the semi business, and the best way to learn and ramp production is with a lot of process control tools, so it makes sense for them to go with the industry standard.

We see China continuing to spend, and likely continuing to spend even as supply and demand come into balance, as China is not as driven by the near-term shortage. This longer demand cycle bodes well for KLA.

Few places to complain
Wafer inspection is so great it overwhelms the good but not as great segments of KLA. Reticle inspection, while good, has been less stellar than wafer inspection because it has lost share to Lasertec, which has taken some of the wind out of its sales. The ex-Orbotech business lines, while OK, are not much compared to wafer inspection. But we knew going into the Orbotech acquisition that anything KLA bought would likely be lower growth and certainly lower margin.

The stocks
The stock will obviously have a strong positive reaction: a solid beat and guide, coupled with none of the supply chain impact that has haunted other names in the space from ASML on down.

Growth metrics are unlikely to slow any time soon, though we could see moderation in 2022. The last quarter of 2021 is in the bag and likely just as strong as the quarter just reported.

KLA remains the stock to own in the process control space and remains a very solid and likely more defensible player in the semiconductor equipment space going forward.

Also Read:

Intel – “Super” Moore’s Law Time warp-“TSMC inside” GPU & Global Flounders IPO

Intel- Analysts/Investor flub shows disconnect on Intel, Industry & challenges

LRCX- Good Results Despite Supply Chain “Headwinds”- Is Memory Market OK?


Five Reasons Why a High Performance Reconfigurable SmartNIC Demands a 2D NoC
by Kalar Rajendiran on 11-04-2021 at 6:00 am


As part of their webinar series, SemiWiki hosted one in June with the title “Five Reasons Why a High Performance Reconfigurable SmartNIC Demands a 2D NoC.” The talk was given by Scott Schweitzer, Sr. Manager, Product Planning at Achronix. Scott is a lifelong technology evangelist who focuses on recognizing technology trends and identifying ways to accelerate networking communications. I recently watched it on-demand.

Before I summarize Scott’s talk, let’s break down the long title of the webinar. First the NIC and the NoC. NIC stands for Network Interface Card and NoC here stands for “Network on a Chip.” This NoC is not to be confused with the other NOC, which stands for Network Operations Center, both of course relating to communications. A 2D NoC is analogous to a grid of highways that quickly gets traffic through to its final destination. In the case of the NoC, it is data traffic. The pivotal part of the long title is “High Performance.”

To understand high performance in this context, let’s look at an analogy that many of us can relate to. If a broadband connection into a home is very fast but the modem hardware slows things down, the benefit is lost. Similarly, if a home WiFi network is slow, the benefit of a very fast broadband connection is lost. But what if one already has a fast modem and a fast WiFi network? In reality, most home WiFi networks cannot fully benefit from the higher-than-1Gbps broadband connections that are available to them. While this mismatch may not even be noticed at home, today’s data centers, hyperscale data centers, edge AI applications, etc., cannot afford to tolerate these bottlenecks. This is the context for Scott’s talk.

Scott starts off by making a case for single-chip SmartNIC implementations at 100GbE and above. No contention there, as on-chip communications are faster than data paths that have to jump through many different chips before getting to their final destinations. He states that studies show a 2x10GbE interface network could move 25% more data than an 8x PCIe Gen1 link can handle. That gap increases to 56% when we consider a 2x400GbE interface network with a 16x PCIe Gen5 link. In other words, Ethernet bandwidths are fast outpacing PCIe speeds. Refer to the figure below, which shows the first of his five reasons for the need for high-performance SmartNICs overlaid onto a 2D NoC. The maximum data rate coming into a chip could be as high as 3.2Tbps if we consider a Cisco 5500 series router supplying the data. This external data has to be touched a number of times before being sent to the host for processing.
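Those percentages are easy to sanity-check against published line rates. Here is a quick back-of-the-envelope calculation; the encoding-overhead treatment is my own simplification rather than the webinar’s exact methodology.

    # Rough check of the Ethernet-vs-PCIe gap quoted above (nominal rates).
    # PCIe Gen1: 2.5 GT/s per lane with 8b/10b coding -> 2.0 Gbps usable per lane.
    # PCIe Gen5: 32 GT/s per lane (128b/130b coding overhead ~1.5%, ignored here).

    eth_2x10g = 2 * 10                  # Gbps
    pcie_g1_x8 = 8 * 2.5 * (8 / 10)     # = 16 Gbps

    eth_2x400g = 2 * 400                # Gbps
    pcie_g5_x16 = 16 * 32               # = 512 Gbps

    print(f"2x10GbE vs x8 Gen1  : {eth_2x10g / pcie_g1_x8 - 1:.0%} more Ethernet bandwidth")
    print(f"2x400GbE vs x16 Gen5: {eth_2x400g / pcie_g5_x16 - 1:.0%} more Ethernet bandwidth")
    # -> 25% and 56%, matching the figures Scott cites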

 

Reasons 2 through 5:

The current generation of SmartNICs relies heavily on semiconductor devices with many processor cores to process packets. This approach, which is already challenged at 25GbE, becomes very difficult to scale beyond 100GbE.
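To see why this gets hard so quickly, consider the worst-case packet arrival rate. The sketch below uses minimum-size Ethernet frames and an assumed instructions-per-packet budget; both are illustrative numbers of mine, not figures from the webinar.

    # Illustrative packet-rate math for processor-based SmartNICs.
    # 64-byte minimum frame + 20 bytes preamble/inter-packet gap = 84 bytes on the wire.
    WIRE_BYTES_PER_MIN_FRAME = 84
    INSTR_PER_PACKET = 100    # assumed cost to parse/classify/forward one packet

    def packets_per_second(link_gbps):
        return link_gbps * 1e9 / (WIRE_BYTES_PER_MIN_FRAME * 8)

    for gbps in (25, 100, 400):
        pps = packets_per_second(gbps)
        print(f"{gbps:>3} GbE: {pps / 1e6:6.1f} Mpps worst case "
              f"-> ~{pps * INSTR_PER_PACKET / 1e9:.0f} Ginstr/s of packet processing")
    # 25 GbE is already ~37 Mpps; at 400 GbE the instruction budget alone
    # outruns what a practical array of packet-processing cores can sustain.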

Add virtualization requirements and software-defined (SD) overlay networks and we have increased the number of processing/touch points before the data can get to its final destination. The logical network (defined by the virtualization) may look like it has a couple of touch points. But the physical network through which the data is routed may include many SmartNICs that the data has to pass through. And each of these SmartNICs may have to do a lot of work on the data before sending it to the next SmartNIC.

More and more functions are being thrust upon the SmartNICs to handle. Security, filtering and key management are important functions that SmartNICs are tasked with. Processing data to identify if it is safe or not could be a simple task or a complicated deep analysis task depending on the application.

Offloading tasks that were traditionally handled by the host is becoming more common. For example, NVMe storage is being used like network attached storage with access managed by a SmartNIC.

The above reasons revolve around the need for both reconfigurability and fast processing speed. A programmable-logic-based implementation is more efficient at packet processing than a processor-based implementation, which requires executing multiple instructions for this processing. The same programmable logic also enables the reconfigurability of the SmartNIC, which essentially boils down to solution flexibility.

 

It is a big benefit to be able to swap the algorithms running on these SmartNICs as the requirements of the supported applications evolve.

 

2D Network on Chip (NoC)

After handling more data and processing it very fast, it doesn’t make sense for that data to then sit and wait (the classic “hurry up and wait”). This is where overlaying the programmable-logic-based SmartNIC onto a 2D NoC on the same FPGA platform comes in. As you can see in the figure below, the north-south and east-west data highways can get the data quickly to the host or final destination.

 

Summary

SmartNICs are expected to handle more functionality and offer flexibility to handle changing requirements. They are expected to process incoming external data very efficiently and get the data to its final destination rapidly. A programmable-logic-based single-chip SmartNIC solution that leverages a 2D NoC offers an attractive approach as the gap between Ethernet bandwidths and PCIe speeds widens. You can watch the entire webinar on-demand by registering here.

 


Alchip Reveals How to Extend Moore’s Law at TSMC OIP Ecosystem Forum
by Mike Gianfagna on 11-03-2021 at 10:00 am


The TSMC Open Innovation Platform (OIP) event brings together a wide array of companies from TSMC’s rather substantial ecosystem reporting cutting-edge work. The event covers everything from high-performance computing to mobile, automotive, IoT, RF and 3D IC design. Of particular interest for this post is a presentation made by Alchip. At the current incredible pace of innovation, methods to extend and enhance Moore’s law are of great interest to many. At the event, Alchip reveals how to extend Moore’s law with a targeted combination of technology and know-how. Read on to learn more.

Yield vs. number of die

The presentation was given by James Huang, vice president of R&D at Alchip Technologies. James began his presentation by describing the yield challenges associated with a very large die at advanced technology nodes. As die size increases, yield decreases, sometimes approaching single-digit numbers. An alternative approach to address this issue is to segment the design into smaller portions, or chiplets, and integrate the resulting pieces with advanced interface and packaging technology. A group of smaller chips, or chiplets, will deliver a yield far better than a single, large die as shown in the figure.
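The yield argument can be made concrete with a simple defect-density model. The Poisson model and the numbers below are purely illustrative (foundries use more refined models such as Murphy’s or the negative binomial), but they show why splitting a reticle-sized die into chiplets changes the economics.

    import math

    # Simple Poisson yield model: Y = exp(-A * D0). Numbers are illustrative only.
    D0 = 0.1    # defects per cm^2, assumed

    def die_yield(area_cm2, d0=D0):
        return math.exp(-area_cm2 * d0)

    big_die_area = 8.0     # cm^2, one large monolithic die (assumed)
    chiplet_area = 2.0     # same silicon split into four chiplets (assumed)

    print(f"Monolithic die yield: {die_yield(big_die_area):.1%}")   # ~44.9%
    print(f"Per-chiplet yield   : {die_yield(chiplet_area):.1%}")   # ~81.9%
    # Chiplets are tested and binned before assembly, so the package is built
    # from known-good die rather than paying the full large-die yield penalty.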

James pointed out that this approach does introduce additional costs from the advanced package, the die-to-die interfaces and increased testing. The approach still delivers competitive total cost in many cases, however. There are several key ingredients required to make this approach viable. They include:

  • 2.5/3D packaging technology: TSMC 3DFabric™ & CoWoS® makes high bandwidth memory multi-die integration feasible.
  • Die-to-die communication: Alchip APLink (Advanced Package Link) D2D IOs enable high-speed data traffic between multiple chiplets.

James explained that Alchip’s APLink 4.0 is compatible with TSMC’s N3 process and offers a 16 Gbps line rate. It can use TSMC’s most advanced CoWoS technology, offering five metal layers. An approach like this can definitely cut the large-die yield issue down to size. This method of chip decomposition and reintegration is illustrated in the figure below.

2.5D package

He provided some information about APLink 4.0. Key features include:

  • Source-synchronous I/O bus running with standard core voltage
    • 12Tbps per PHY macro
    • 16Gbps per DQ line
    • 25+ pJ/bit
    • 5ns latency
  • Reliable system operation
    • De-skew, DBI
    • Eye training & monitoring
    • Lane repair
    • PVT monitoring

The APLink IP is delivered as a hard PHY macro with a soft controller. The IP supports south/north and east/west orientations. Symmetric PHY alignment achieves minimum die-to-die wire length. James then discussed the steps required to develop an effective chiplet-based design approach. Items to consider include:

  • Fundamentals
    • Process node selection for each chiplet
    • D2D IP performance and readiness
    • 2.5D/3D packaging lengthens the design cycle and introduces new risks
    • Total design and manufacturing cost
  • System architecture considerations
    • How to distribute sub-systems across multiple chiplets (chiplet partitioning)
    • Are power distribution networks sufficient?
    • Timing, thermal and SI/PI budgeting
    • Can individual chiplets be shut off?

As you can see, there is a lot to consider. James went on to point out other considerations, such as thermal dissipation, mechanical stress and warpage, and overall routing space for the interfaces. A lot of these issues must be considered early in the design process. These problems can be tamed, however. A successful 2.5D design containing multiple chiplets and HBM memory stacks was showcased during his presentation.

He also discussed strategies for energy management for complex chiplet-based designs, as well as signal and power integrity considerations. Testing strategies for this kind of design were also discussed. Test can be quite a bit more complex when compared to a monolithic design.

James concluded with an overview of Alchip’s customer engagement model. The company works with the customer at every step in a collaborative way to ensure project success. If you are considering a chiplet-based approach for your next design, you should carefully consider your ASIC partner. There are many challenges with this type of design, and James demonstrated in his presentation a strong command of the requirements needed to achieve success.

You can learn more at www.alchip.com. You can also get lots of good information about Alchip on SemiWiki here, including a copy of the press release about the TSMC OIP presentation. And that is the story of how Alchip reveals how to extend Moore’s law with a targeted combination of technology and know-how.

Also Read:

Alchip is Painting a Bright Future for the ASIC Market

Maximizing ASIC Performance through Post-GDSII Backend Services

Alchip at TSMC OIP – Reticle Size Design and Chiplet Capabilities


Update on TSMC’s 3D Fabric Technology
by Tom Dillinger on 11-03-2021 at 8:00 am


TSMC recently held their 10th annual Open Innovation Platform (OIP) Ecosystem Forum.  An earlier article summarized the highlights of the keynote presentation from L.C. Lu, TSMC Fellow and Vice-President, Design and Technology Platform, entitled “TSMC and Its Ecosystem for Innovation” (link).

Overview of 3D Fabric

The TSMC 3D Fabric advanced packaging technology spans both the 2.5D and vertical die stacking offerings, as depicted below.

The Integrated FanOut (InFO) packages utilize a reconstituted wafer consisting of die embedded face down, surrounded by a molding compound (link).

Redistribution interconnect layers (RDL) are fabricated on the epoxy wafer.  (InFO-L refers to a silicon “bridge chiplet” between die embedded in the InFO package for improved inter-die connectivity over the RDL metallization pitch.)

The 2.5D CoWoS technology integrates die (and often, high-bandwidth memory stacks) on an interposer utilizing microbump attach.  The original CoWoS technology offering (now CoWoS-S) used a silicon interposer, and related silicon-based lithography for RDL fabrication;  through-silicon vias (TSVs) provide connectivity to the package bumps.  The silicon interposer technology offers improved interconnect density, critical for the high signal count HBM interface.  More recently, TSMC has been offering an organic interposer (CoWoS-R), providing a tradeoff between interconnect density versus cost.

The 3D SoIC offering provides vertical integration utilizing hybrid bonding between die pads.  The die may be oriented in face-to-face or face-to-back configurations.  TSVs provide connectivity through the (thinned) die.

InFO and CoWoS offerings have been in high-volume production for several years.  The recent innovations in CoWoS development relate to expanding the maximum silicon interposer dimensions to greater than the maximum reticle size to accommodate a larger number of die (especially, HBM stacks), stitching together the RDL interconnects.

The majority of Jim’s presentation covered advances in SoIC development.

SoIC Testchip

TSMC shared results of a recent SoIC qualification test vehicle, as shown below.

The configuration used was the vertical bonding of an (N5) CPU die with an (N6) SRAM die, in a face-to-back topology.  (Indeed, a major CPU vendor has pre-announced plans for a vertical “last-level” SRAM cache die attached to a CPU using TSMC’s SoIC, to be available in 1Q2022.)

SoIC Design Flow

Jim presented a high-level design flow for vertical die integration, as shown in the figure below.

The flow requires concurrent focus on both top-down system partitioning into individual die implementations, plus early analysis of the thermal heat dissipation in the composite configuration, as highlighted above.

The discussion on thermal analysis highlighted the “chimney” nature of the low thermal resistance paths of the BEOL PDN and interconnect, compared to the surrounding dielectrics, as shown above.  Specifically, TSMC has collaborated with EDA vendors on improving the accuracy of the SoIC model discretization techniques, applying a more detailed mesh in specific “hotspot” areas initially identified with a coarse grid analysis.

TSMC also presented a methodology recommendation to incorporate thermal analysis results into the calculation of SoIC static timing analysis derate factors.  Much like on-chip variation (OCV) is dependent upon the distance spanned by (clock and data) timing paths, the thermal gradient for the SoIC paths is an additional derate factor.  TSMC reported that on-die temperature gradients for a path are typically ~5-10C, and a small flat derate timing margin for temperature should suffice.  For SoIC paths, large gradients of ~20-30C are feasible.  A flat derate to cover this range would be too pessimistic for paths with a small temperature difference – results of SoIC thermal analysis should be used for derate factor calculation.
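In other words, the derate becomes a function of the per-path temperature difference reported by thermal analysis, rather than a single flat number. A hedged sketch of that calculation is below; the delay-sensitivity coefficient is an invented illustration, not a TSMC-published value.

    # Illustrative path-specific derate driven by SoIC thermal-analysis results.
    DERATE_PER_DEGC = 0.001   # assumed +0.1% path delay per degC of gradient

    def thermal_derate(delta_t_degc):
        """Extra timing derate for a path spanning a temperature gradient."""
        return 1.0 + DERATE_PER_DEGC * delta_t_degc

    # On-die paths (~5-10 degC) vs. die-to-die SoIC paths (~20-30 degC)
    for label, dt in [("on-die path, 8 degC", 8), ("SoIC path, 25 degC", 25)]:
        print(f"{label:>22}: derate = {thermal_derate(dt):.3f}")
    # A flat derate sized for the ~30 degC worst case would penalize the many
    # paths whose actual gradient is small, which is the pessimism noted above.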

SoIC Testing

The IEEE 1838 standardization effort pertains to the definition of die-to-die interface testing (link).

Much like the IEEE 1149 standard for boundary-scan chains on-die for package-to-package testing on a printed circuit board, this standard defines the control and data signal ports on each die for post-stack testing.  The primary focus of the standard is to exercise the validity of the face-to-face bonds and TSVs introduced during SoIC assembly.

Jim indicated that this definition is sufficient for low-speed I/Os between SoIC die, yet a more extensive BIST method will be required for high-speed I/O interfaces.

TSMC Foundation IP for SoIC – LiteIO

TSMC’s library development teams commonly provide general-purpose I/O cells (GPIOs) for each silicon process node.  For the die-to-die connections in SoIC configurations, where the driver loading is lower, TSMC offers a “LiteIO” design.  As illustrated below, the LiteIO design focuses on optimizing the layout to reduce parasitic ESD and antenna capacitances, to enable faster data rates between die.

EDA Enablement

The figure below lists the key tool features recently developed in collaboration with major EDA vendors for the InFO and SoIC package technologies.

Summary

TSMC continues to invest heavily in 2.5D/3D advanced packaging technology development.  The key recent initiatives have focused on the methodology for 3D SoIC direct die attach – i.e., partitioning, physical design, analysis.  Specifically, early thermal analysis is a mandatory step.  Additionally, TSMC shared results of their SoIC eTV qualification testchip vehicle.  2022 is shaping up to see the rapid emergence of 3D SoIC designs.

-chipguy

Also read:

Highlights of the TSMC Open Innovation Platform Ecosystem Forum


 

 


Back to Basics in RTL Design Quality
by Bernard Murphy on 11-03-2021 at 6:00 am


Harry Foster waxes philosophical in a recent white paper from Siemens EDA, in this case on the origins of bugs and the best way to avoid them. Spoiler alert, the answer is not to make them in the first place or at least to flush them out very quickly. I’m not being cynical – that really is the answer though practice often falls short of ideal. Harry suggests we need to get back to basics in RTL design quality, and what better place to start than W. Edwards Deming, a founding father of Total Quality Management.

W. Edwards Deming

Quality must be designed in

This seems trite but it’s often the simple mistakes that bite us, like an out-of-range indexing error. Best case they slow down system-level testing, worst case they make it through to silicon. It’s easy for us to believe that we are mostly infallible and what few mistakes we make will be caught in verification. But survey after survey proves that trivial mistakes still slip through; we should know by now that we left the mirage of exhaustive testing behind a long time ago.

Following Deming, we need to design quality in, not try to paste it on in verification. Harry proposes a 3-step process for design, based on a combination of design plus intent. The first step, Produce, starts with producing correct RTL by design (I assume we’re talking here about new IP or subsystems). The argument here is that bugs per line of code (LOC) are more or less constant at 15-50 bugs per thousand LOC, irrespective of whether you are creating RTL, C++ or Javascript. The best way to create fewer bugs is therefore fewer lines of code, using a higher level of abstraction like SystemC/C++, Chisel or some other domain-specific language.
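The arithmetic behind that argument is simple enough to sketch. The block size and the abstraction ratio below are my own illustrative assumptions, not numbers from the white paper.

    # Illustrative bug-count estimate from the 15-50 bugs per KLOC range above.
    BUGS_PER_KLOC = (15, 50)

    def expected_bugs(lines_of_code):
        lo, hi = BUGS_PER_KLOC
        return lines_of_code / 1000 * lo, lines_of_code / 1000 * hi

    rtl_loc = 100_000          # hand-written RTL for a block (assumed)
    hls_loc = rtl_loc // 4     # same block at a higher abstraction (assumed 4:1)

    rtl_lo, rtl_hi = expected_bugs(rtl_loc)
    hls_lo, hls_hi = expected_bugs(hls_loc)
    print(f"RTL ({rtl_loc:,} LOC): {rtl_lo:,.0f} to {rtl_hi:,.0f} expected bugs")
    print(f"HLS ({hls_loc:,} LOC): {hls_lo:,.0f} to {hls_hi:,.0f} expected bugs")
    # If the defect rate per line is roughly constant, a 4:1 reduction in code
    # volume translates directly into ~4x fewer injected bugs.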

Proving that intent is met

Since the method connects design and intent, the second step aims to prove in design that the intent is met. Harry’s suggestion here is particularly to leverage static and formal verification tools. We are designing quality in, so this is a task for RTL designers, who already have access to a wide range of apps to simplify this analysis. They can find FSM deadlocks, arithmetic overflow possibilities and potential indexing errors. For possible domain crossing bugs, they can find metastability potential and other domain crossing errors which in many cases cannot be detected at all in simulation. Another possible source of errors is in X optimism and pessimism. The former may at least waste valuable time in system-level verification, and the latter can create mismatches between RTL and gate-level sims which even equivalence checking may not find.

Your system verification team will thank you. Or they may curse you if they find problems you could have fixed before you checked in your code.

Protecting intent

The third pillar requires that intent should be protected through the rest of the design lifecycle by continued testing. Harry’s suggestion is to adopt a continuous integration (CI) flow here. We simply reuse the static and formal tests we developed and proved in design. These are largely hands-free and fast tests which should quickly flag check-in mistakes (we all make them).

A final (blogger) thought

This is a worthy addition to the canon. We all nod wisely but we still trip up sometimes. With tools like CI we should be able to flush out more of these problems early on.

That said, there are some system-level problems which remain challenging, and which can’t be fixed (I think) at the unit level. Cache coherence problems, emerging only after billions of cycles, are one good example. Power bugs are difficult to cover fully in designs with very complex power and voltage switching. Security problems around speculative execution are another example. It would be great to find some kind of “unit test” methodologies around these system-level “IP”.

You can access the white paper HERE.

Also Read

APR Tool Gets a Speed Boost and Uses Less RAM

DARPA Toolbox Initiative Boosts Design Productivity

Heterogeneous Package Design Challenges for ADAS


On-Chip Sensors Discussed at TSMC OIP
by Tom Simon on 11-02-2021 at 10:00 am


TSMC recently held their Open Innovation Platform (OIP) Ecosystem Forum event where many of their key partners presented on their latest projects and developments. This year one of their top IP provider partners, Analog Bits, gave two presentations. Analog building blocks have always been necessary as enabling technology on leading edge designs. The move to 3nm continues this important relationship. Analog Bits has developed specialized analog IP that can help differentiate end products. For instance, they have focused on optimized high performance and low power SerDes, among other things. Another significant area is specialized on-chip sensors for monitoring chip health and performance.

In his presentation Mahesh Tirupattur, Analog Bits’ EVP, discussed how their sensing IP was used by Cerebras in developing the largest chip ever designed. The Cerebras WSE-2 has 2.6 trillion transistors in 850,000 optimized cores covering 46,225 square mm of silicon. Cerebras faced challenges in power distribution and power supply integrity. Analog Bits IP offered them a solution to monitor chip operation in real time and apply real-time corrective actions. They used 840 distributed glitch detectors to provide real-time coverage of the entire design. The Analog Bits glitch detectors can detect short-duration events that could otherwise easily be missed. They are programmable for trigger voltage, depth of glitch and time span of glitch. Their sensitivity exceeds 5pVs.
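To put that sensitivity number in perspective: if we read the 5 pVs figure as a voltage-time product (my interpretation, made explicit here), it corresponds to catching very small, very brief supply excursions. The droop amplitudes and durations below are assumed examples, not Analog Bits data.

    # Illustrative reading of a 5 pV*s glitch sensitivity as a voltage-time product.
    # The droop amplitudes and durations below are assumed, not Analog Bits data.
    SENSITIVITY_PVS = 5.0    # pV*s threshold quoted above

    def glitch_area_pvs(droop_mv, duration_ps):
        """Voltage-time product of a rectangular supply droop, in pV*s."""
        return (droop_mv * 1e-3) * (duration_ps * 1e-12) * 1e12

    for droop_mv, dur_ps in [(50, 100), (25, 200), (10, 100)]:
        area = glitch_area_pvs(droop_mv, dur_ps)
        verdict = "detectable" if area >= SENSITIVITY_PVS else "below threshold"
        print(f"{droop_mv} mV droop for {dur_ps} ps -> {area:.1f} pV*s ({verdict})")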

Recently Analog Bits expanded their sensor offering by adding power supply glitch detectors with an integrated voltage reference to their lineup of integrated POR sensors and on-die PVT sensors. This allows them to cover all aspects of chip operation in real time, including POR conditions – and now the health of the power supplies.

Of course, sensors require extremely high accuracy to correctly report chip behavior. Similarly, clocking macros need accuracy to enable proper chip operation. So it was good to see that their second presentation at OIP was specifically on the topic of design and verification of these blocks. The paper was titled “Design and Verification of Clocking Macros and Sensors in N5 and N3 Processes Targeting High Performance Compute, Automotive, and IoT Applications”, and was authored by Sweta Gupta, Director of Circuit Engineering at Analog Bits, and Greg Curtis, Sr. Product Manager at Siemens EDA. The paper itself was the result of a three-way collaboration between TSMC, Siemens and Analog Bits.

In the second presentation Analog Bits shared correlation data of silicon measurements to specification for several of their PLLs, a power supply droop detector and a temperature sensor. Here is one of their slides showing the phase noise correlation for a PLL built in TSMC’s N5.

phase noise correlation

Analog Bits has a broad and well-thought-out portfolio of analog IP. They have customers on a wide range of processes from 0.25um to 3nm. I am sure that part of their success stems from their no-royalty business model. They have billions of units shipped from over a thousand IP deliveries since they first started in 1995. While the OIP presentations are over, more detailed information on all of their IP for on-chip sensors, SerDes, clocks and I/Os is available by contacting them. Their website offers detailed product listings and data sheets, and access to their N5 test chip video.

Also read:

Package Pin-less PLLs Benefit Overall Chip PPA

Analog Sensing Now Essential for Boosting SOC Performance

Analog Bits is Taking the Virtual Holiday Party up a Notch or Two


Design Technology Co-Optimization for TSMC’s N3HPC Process
by Tom Dillinger on 11-02-2021 at 8:00 am


TSMC recently held their 10th annual Open Innovation Platform (OIP) Ecosystem Forum.  An earlier article summarized the highlights of the keynote presentation from L.C. Lu, TSMC Fellow and Vice-President, Design and Technology Platform, entitled “TSMC and Its Ecosystem for Innovation” (link).

One of the topics that L.C. discussed was the initiatives that TSMC pursued for the N3 process node, specifically for the High-Performance Computing (HPC) platform.  This article provides more details about the design-technology co-optimization (DTCO) activities that resulted in performance gains for N3HPC, compared to the baseline N3 process.  These details were provided by Y.K. Cheng, Director, Design Solution Exploration and Technology Benchmarking, in his presentation entitled “N3 HPC Design and Technology Co-Optimization”. 

Background

Design technology co-optimization refers to a cooperative effort between process development engineering and circuit/IP design teams.  The technology team optimizes the device and lithography process “window”, typically using TCAD process simulation tools.  At advanced nodes, the allowed lithographic variability in line widths, spacings, uniformity, and density (and density gradient) is limited – technology optimization seeks to define the nominal fabrication parameters where the highly-dimensional statistical window maintains high yield.  The circuit design team(s) evaluate the performance impacts of different lithographic topologies, extracting and annotating parasitic R and C elements to device-level netlist models.

A key element to DTCO is pursued by the library IP team.  The standard cell “image” defines the allocated (vertical) dimension for nFET/pFET device widths and the number of (horizontal) wiring tracks available for intra-cell connections.  The image also incorporates a local power distribution topology, with global power/ground grid connectivity requirements.

In addition to the library cell image, the increasing current density in the scaled metal wires at advanced nodes implies that DTCO includes process litho and circuit design strategies for contact/via connectivity.  As the design variability in contact/via sizes is extremely limited due to litho/etch uniformity constraints, the process and circuit design teams focus on optimization of multiple, parallel contacts/vias and the associated metal coverage.

And, a critically important aspect of DTCO is the design and fabrication of the SRAM bitcell.  Designers push for aggressive cell area lithography, combined with device sizing flexibility for sufficient read/write noise margins and performance (with a large number of dotted cells on the bitlines).  Process engineers seek to ensure a suitable litho/etch window, and concurrently must focus on statistical tolerances during fabrication to support “high-sigma” robustness.

The fact that TSMC enables customers with foundation IP developed internally provides a tight DTCO development feedback loop.

N3HPC DTCO

Y.K. began his presentation highlighting the N3HPC DTCO results, using the power versus performance curves shown in the figure below.  (The reference design block used for these comparisons is from an Arm A78 core;  the curves span a range of supply voltages, at “typical” device characteristics.)

The collective set of optimizations provide an overall 12% performance boost over the baseline N3 offering.  Note that (for the same supply voltage) the power dissipation increases slightly.

Y.K. went into detail on some of the DTCO results that have been incorporated into N3HPC.  Note that each feature results in a relatively small performance gain – a set of (consistent) optimizations is needed to achieve the overall boost.

  • larger cell height

Wider nFET and pFET devices within a cell provide greater drive strength for the (high-fanout) capacitive loads commonly found in HPC architectures.

  • increase in contacted poly pitch (CPP)

A significant parasitic contribution in FinFET devices is the gate-to-source/drain capacitance (Cgd + Cgs) – increasing the CPP increases the cell area (and wire lengths), but reduces this capacitance.

  • increased flexibility in back-end-of-line (BEOL) metal pitch (wider wires), with corresponding larger vias, as illustrated below
  • high-efficiency metal-insulator-metal (MiM) decoupling capacitor topology

The MiM capacitor cross-section illustrated below depicts three metal “plates” (2 VDD + 1 VSS) for improved areal efficiency over 2-plate implementations.

Improved decoupling (and less parasitic Rin to the capacitor) results in less supply voltage “droop” at the switching activity typically found in HPC applications.

  • double-height cells

When developing the cell image, the library design team is faced with a tradeoff between cell height and circuit complexity.  As mentioned above, a taller cell height allows for more intra-cell wiring tracks to connect complex multi-stage and/or high fan-in logic functions.  (The most demanding cell layout is typically a scannable flip-flop.)  Yet, a larger cell height used universally throughout the library will be inefficient for many gates.

The DTCO activities for N3HPC led TSMC to adopt a dual-height library design approach.  (Although dual-height cells have been selectively employed in earlier technologies, N3HPC adopted more than 400 new cells.)  This necessitated extensive collaboration with EDA tool suppliers, to support image techfile definition, valid cell placement rules, and auto-place-and-route algorithms that would successfully integrate single- and double-height cells within the design block.  (More on EDA tool features added for N3HPC shortly.)

As part of the N3HPC library design, Y.K. also highlighted that the device sizings in multi-stage cells were re-designed for optimized PPA.

  • auto-routing features

Timing-driven routing algorithms have leveraged the reduced R*C/mm characteristics of upper metal layers by “promoting” the layer assignment of critical performance nets.  As mentioned above, the N3HPC DTCO efforts have enabled more potential BEOL metal wire lithography width/spacing patterns.

As shown below, routing algorithms needed enhancements to select “non-default rules” (NDRs) for wire width/spacing.  (NDRs have been available for quite a while – typically, these performance-critical nets were routed first or, often, manually pre-routed. The N3HPC DTCO features required extending NDR usage as a general auto-route capability.)  The figure also depicts how via pillar patterns need to be inserted to support increased signal current.

For lower metal layers where the lithography rules are strict and NDRs are not an option, routing algorithms needed to be enhanced to support parallel track routing (and related via insertion), as shown above.

EDA Support

To leverage many of these N3HPC DTCO features, additional EDA tool support was required.  The figure below lists the key tool enhancements added by the major EDA vendors.

Summary

TSMC has made a commitment to the high-performance computing platform, to provide significant performance enhancements as part of an HPC-specific process offering.  A set of DTCO projects were pursued for N3HPC, providing a cumulative 12% performance gain on a sample Arm core design block.  The optimizations spanned a range of design and process lithography window characteristics, from standard cell library design to BEOL interconnect options to MiM capacitor fabrication.  Corresponding EDA tool features – especially for auto-place-and-route – have been developed in collaboration with major EDA vendors.

For upcoming process node announcements – e.g., N2 – it will be interesting to see what additional DTCO-driven capabilities are pursued for the HPC offering.

-chipguy

Also read: Highlights of the TSMC Open Innovation Platform Ecosystem Forum