Thermo-compression bonding for Large Stacked HBM Die
by Tom Dillinger on 07-24-2020 at 8:00 am

HBM stack

Summary

Thermo-compression bonding is used in heterogeneous 3D packaging technology. TSMC applied this attach method to the assembly of large (12-high and 16-high) high bandwidth memory (HBM) die stacks, with significant bandwidth and power improvements over traditional microbump attach.

Introduction

The rapid growth of heterogeneous die packaging technology has led to two innovative product developments.

For high-performance applications, system architects have incorporated a stack of memory die in a 2.5D package configuration with a processor chip – see the figures below for a typical implementation, and expanded cross-section.  These high-bandwidth memory (HBM) architectures typically employ four (HBM, 1st gen) or eight (HBM2/2E) DRAM die attached to a “base” memory controller die.  The stack utilizes microbumps between die, with through-silicon vias (TSVs) for the vertical connections.

A silicon interposer with multiple redistribution metal layers (RDL) and integrated trench decoupling capacitors supports this 2.5D topology, providing both signal connectivity and the power distribution network to the die.

A more recent package innovation provides the capability to attach two heterogeneous die in a 3D configuration, in either face-to-face or face-to-back orientations (with TSVs).  This capability was enabled by the transition of (dense) thermo-compression bonding for die attach from R&D to production status.

Previous SemiWiki articles have reviewed these packaging options in detail.  [1, 2]  Note that the potential for both these technologies to be used together – i.e., 3D heterogeneous die integration (“front-end”) with 2.5D system integration (“back-end”, typically with HBM) – will offer architects a myriad of tradeoffs, in terms of power, performance, yield, cost, area, volume, pin count/density, thermal behavior, and reliability.  A new EDA tools/flows discipline – pathfinding – is emerging to assist product developers with these tradeoffs.  (Look for more SemiWiki articles in this area in the future.)

Thermo-compression Bonding for HBM’s

The high-performance applications that integrate a (general-purpose or application-specific) processor with HBM are growing rapidly, and they need increasing amounts of (local) memory capacity and bandwidth.  To date, a main R&D focus has been to expand the 2.5D substrate area, to accommodate more HBM stacks.  For example, TSMC has recently announced an increase in the maximum substrate area for their 2.5D Chip-on-Wafer-on-Substrate (CoWoS) offering, enabling the interposer to exceed the maximum lithographic reticle size.  RDL connections are contiguous across multiple interposer-as-wafer exposures.

Rather than continuing to push these lateral dimensions for more HBM stacks, there is a concurrent effort to increase the number of individual memory die in each stack.  Yet the microbump standoffs used with the TSV attach technology introduce additional RLC signal losses up the stack, as well as a less-than-optimum thermal profile.

At the recent VLSI 2020 Symposium, TSMC presented their data for the application of thermo-compression bonding used in current 3D topologies directly to the assembly of the HBM stack – see the figure below. [3]

A compatibility requirement was to maintain a low-temperature bonding process similar to the microbump attach method.  Replacing the microbumps between die with thermo-compression bonds results in reduced RLC losses, greater signal bandwidth, and less dissipated energy per bit.  The simulation analysis results from TSMC are shown below, using electrical models for the microbumps, compression bonds, and TSVs.  Note that TSMC pushed the HBM configuration to 12-die and 16-die memory stacks, well beyond current production (microbump-based) designs.

To demonstrate the manufacturability of a very tall stack with bonding, TSMC presented linear resistance data in (bond + TSV) chains up and down the die stack – see the figure below.

A unique characteristic of the bonded HBM stack compared to the microbump stack was the reduction in thermal resistance.  The directly-attached die provide a more efficient thermal path than die separated by microbumps.  The TSMC data is shown below, illustrating the improvement in the temperature delta between the HBM stack and the top (ambient) environment.
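To make the thermal argument concrete, here is a minimal back-of-the-envelope sketch in Python. It treats the stack as a series chain of identical die-to-die thermal interfaces with all heat extracted through the top; every numeric value is an illustrative assumption, not TSMC data.

```python
# Toy thermal model of an HBM stack: a series chain of identical die-to-die
# thermal interfaces, with all heat extracted through the top of the stack.
# All numeric values below are illustrative assumptions, not TSMC measurements.

def bottom_die_delta_t(num_die: int, p_die_w: float, r_if_c_per_w: float) -> float:
    """Temperature rise (deg C) of the bottom die relative to the stack top.

    By superposition, heat from die k (k = 0 at the top) contributes
    p * k * r to the bottom-die rise, so the total is
    p * r * (0 + 1 + ... + (num_die - 1)).
    """
    return p_die_w * r_if_c_per_w * num_die * (num_die - 1) / 2

P_DIE = 0.5          # watts per DRAM die (assumed)
R_MICROBUMP = 0.8    # deg C per W per microbump + underfill interface (assumed)
R_BONDED = 0.3       # deg C per W per direct thermo-compression bond (assumed)

for n in (8, 12, 16):
    print(f"{n}-high stack: microbump dT ~{bottom_die_delta_t(n, P_DIE, R_MICROBUMP):.0f} C, "
          f"bonded dT ~{bottom_die_delta_t(n, P_DIE, R_BONDED):.0f} C")
```

The absolute numbers are placeholders, but the structure shows why the bonded stack's advantage grows with stack height: the per-interface resistance is multiplied by a term that scales roughly with the square of the die count.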

The conclusion of the TSMC presentation offered future roadmap opportunities:

  • Tighter thermo-compression bond pitch (< 10µm) is achievable, offering more die connections/mm².  (Bandwidth = I/O count * data rate – see the quick calculation after this list.)
  • Additional R&D investment is being made to pursue increased thinning of the DRAM die, further reducing the RLC insertion losses and improving the thermal resistance (and allowing more die in the same package volume).  For example, the current ~60µm die thickness after back-side grinding and polishing could be pushed to perhaps ~50µm.
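As a quick illustration of the bandwidth relation in the first bullet, here is a minimal Python sketch. The 1,024-bit interface and 3.2 Gb/s-per-pin rate are typical published HBM2E figures; the doubled-I/O case is purely hypothetical, meant only to show the lever a sub-10µm bond pitch hands to the architect.

```python
# Bandwidth = I/O count * per-pin data rate (the relation cited above).
# 1,024 I/O at 3.2 Gb/s/pin are typical HBM2E figures; the 2,048-I/O case is
# hypothetical, illustrating what a denser bond pitch could enable.

def hbm_bandwidth_gb_per_s(io_count: int, gbps_per_pin: float) -> float:
    """Aggregate stack bandwidth in gigabytes per second."""
    return io_count * gbps_per_pin / 8  # convert bits/s to bytes/s

print(hbm_bandwidth_gb_per_s(1024, 3.2))   # ~410 GB/s per stack
print(hbm_bandwidth_gb_per_s(2048, 3.2))   # ~820 GB/s with doubled I/O count
```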

The figure on the left below highlights the future targets for bond connection density, while the figure on the right shows the additional bandwidth and energy/bit improvements achievable with a more aggressive HBM memory die thickness.

The application of 3D packaging thermo-compression bonding to HBM construction will enable the greater memory capacity and bandwidth required by high-performance computing applications.  System architects now have yet another variable to optimize in their pathfinding efforts.

For more information on the 2.5D and 3D heterogeneous packaging technology offerings from TSMC, please follow this link.

-chipguy

References

[1]  https://semiwiki.com/semiconductor-manufacturers/tsmc/285129-tsmcs-advanced-ic-packaging-solutions/

[2]  https://semiwiki.com/semiconductor-manufacturers/tsmc/8150-tsmc-technology-symposium-review-part-ii/

[3]  Tsai, C.H., et al., “Low Temperature SoIC Bonding and Stacking Technology for 12/16-Hi High Bandwidth Memory (HBM)”, VLSI 2020 Symposium, Paper TH1.1.

Images supplied by the VLSI Symposium on Technology & Circuits 2020.

 


Semiconductors up in 2020? Not so fast
by Bill Jewell on 07-24-2020 at 6:00 am

Semiconductor Forecast 2H 2020

Two months ago most forecasts called for a decline in the semiconductor market in 2020 as reported in our Semiconductor Intelligence May 2020 newsletter. The outlook changed in June, as the Worldwide Semiconductor Trade Statistics (WSTS) consensus forecast projected 3.3% growth in 2020. This month IC Insights revised its IC forecast for 2020 to 3.0% growth from its April forecast of a 4% decline. The optimism is largely based on WSTS data which shows the January to May 2020 semiconductor market up 6.4% versus the same period in 2019.

Most major semiconductor companies have not yet reported second quarter 2020 results. Micron Technology reported revenue for its fiscal quarter ended in May up 13.4% from the prior quarter, above the high end of its guidance. Micron expects revenue growth of 6% to 15% in its current quarter. Texas Instruments reported 2Q 2020 revenue down 2.7% from 1Q 2020; however, this was above the high end of its guidance, which called for a 4.2% decline. TI guidance for 3Q 2020 was 1% to 9% revenue growth. Despite the better-than-expected results, a statement TI made about its industrial customers raises some concern. TI Investor Relations VP Dave Pahl stated, “We do believe that some customers are trying to maintain strong inventory positions to limit exposure to any supply chain disruptions.”

The cautious optimism for the semiconductor market in 2020 may be premature. The impact of the COVID-19 (coronavirus) pandemic on the world economy in 2020 is almost impossible to predict. The outlook looked promising when hard-hit European countries such as Spain, France and the UK saw declines in new cases in April and May. However, in June and July there has been a major surge in new cases in the U.S., Brazil, and India – the world’s third, sixth and second most populous countries, respectively. According to Johns Hopkins University, cumulative worldwide COVID-19 cases were 15.3 million as of July 23, two-and-one-half times the cumulative 6.2 million cases at the end of May.

Mobile phones have been the largest market for semiconductors for several years. China electronics production took a major hit in January and February 2020 as China shut down many factories to try to get COVID-19 under control. Mobile phone units were down 41% from a year ago and PCs were down 29%. In the second quarter of 2020, mobile phone units were down 21% from a year earlier while PC units were up 17%. Total China electronics production in local currency (yuan) recovered from a 14% year-to-year decline in January and February to 12% growth in 2Q 2020. Even if China is able to return to year ago production levels by 4Q 2020, mobile phone units will still be down about 15% in 2020 versus 2019. China accounts for the vast majority of smartphone production.

Second quarter 2020 smartphone shipment estimates have not yet been released, but available data indicates a very weak quarter. Canalys estimates China smartphone shipments were down 22% in 2Q 2020 versus a year ago. India shipments were down 48% as many retail outlets were closed due to COVID-19. Counterpoint Research estimates sell-through of smartphones in the U.S. was down 25% from a year ago. These three countries account for about half of worldwide smartphone shipments.

The latest available forecasts for the 2020 smartphone market range from -11.9% from IDC to -16% from Fitch Ratings. If 2Q 2020 smartphone shipments come in weaker than expected (as is likely based on the above data) these forecasts will be revised downward. It is conceivable the smartphone market could decline 20% in 2020.

The news is better for PCs, the second largest market for semiconductors after smartphones. IDC estimates PC unit shipments in 1Q 2020 were down 10% from a year ago but bounced back to 11% growth in 2Q 2020. For the first half of 2020, PC units were up 1.2% from a year ago. This is impressive given that PC units have declined in six of the last eight years. The PC market has been driven by demand from more people working and learning from home over the last several months.

We at Semiconductor Intelligence will update our forecast next month once 2Q 2020 data is available. Our May forecast was a 6% decline in the 2020 semiconductor market. Our August forecast will probably again call for a decline. COVID-19 will have an impact on the worldwide economy through at least the end of the year. Unemployment rates in much of the world are as high as or higher than rates in the great recession of 2008-2009. Employed people are likely to be more cautious about spending during the pandemic. Though the PC market is healthy, the smartphone market will see the first major decline in its history. As TI indicated, some electronics companies may be building semiconductor inventories due to concerns about the stability of the supply chain. If end markets remain weak in the second half of 2020, these companies may significantly reduce semiconductor purchases to adjust inventory levels.

In the history of the semiconductor market, it has experienced downturns when the overall economy has been growing (1985 for example), but it has never been immune from a global economic downturn.

Also Read:

Is the Worst Over for Semiconductors?

COVID-19 and Semiconductors

Semiconductor Recovery in 2020?


Intel 7NM Slip Causes Reassessment of Fab Model
by Robert Maire on 07-23-2020 at 5:00 pm

Intel vs TSMC

Waving white surrender flag as TSMC dominates-
The quarter was a success but the patient is dying-
Packaging now critical as Moore progress stumbles-
Intel reported a great quarter but weak H2 guidance-
But 7NM slip and “fab lite” talk sends shockwaves-

Intel reported a great quarter beating numbers all around with revenues of $19.7B and EPS of $1.23. Revenue was $1.2B better than expected and EPS was $0.13 better than expected. Guidance was for $18.2B and EPS of $1.10 as a widespread slowdown is expected to hit in H2.

Results for the quarter were great but other more significant issues far outweigh and swamp the quarterly results. We won’t waste time regurgitating the quarterly results, which are well summarized in Intel’s slides:

Intel Earnings presentation

7NM products delayed at least 6 months while process is a year behind
Echoes of the 10NM delay disaster. Perhaps the biggest news was that 7NM will be delayed at least another 6 months due to yield issues. This seems to put the overall 7NM delay at roughly a year. It was unclear whether this is “one and done” or if this is the beginning of another series of rolling delays like those that haunted 10NM. Either way the news is not good at all.

Rather than Intel regaining its “Mojo” as some had hoped at 7NM, to suffer another delay and fall further behind TSMC is just horrible; there is no way around it. It’s a huge disappointment and heads will likely roll.

While management did suggest that the problem is understood and identified we came away without a firm feeling that it was under control, fixed or on its way to being fixed. Further slippage due to not finding a solution could easily happen as we saw at 10NM.

The 7NM slip is pushing Intel into a “fab lite” model following AMD’s lead-
Would make both Intel and AMD dependent upon TSMC…and more even-
During the earnings call, management made it quite clear that they are looking at alternatives for the manufacturing of future nodes – whether or not to outsource to TSMC, and how much.

It seems from the tone of tonight’s call coupled with the 7NM slip that Intel is on the slippery slope to give more of its manufacturing to TSMC and perhaps TSMC will get to do Intel’s most leading edge manufacturing as Intel falls further behind.

Management couched it as a prudent allocation of resources and dollars but it sure sounds a lot more like waving the white flag of surrender after you’ve lost the race.

It sounds like sacrilege but Intel may be on the road to a “fab lite” model. Most semiconductor investors may not be old enough, but I can still hear the echoes of AMD’s founder, Jerry Sanders and his “real men have fabs” speech.

We can only hope that Intel can get its act together and get 7NM back on track and perhaps even make up some lost time, but we wouldn’t bet our investment dollars on it.

Intel joining Apple and AMD at TSMC’s fab on China’s doorstep…..
Apple obviously saw this coming, and investors should have seen this coming with Apple’s recent announcement to give up on Intel. Apple correctly figured out that they could go straight to the source, TSMC, with their own customized design and do much better on their own.

Obviously there will be little if any transistor density advantages between AMD and Intel if their advanced chips are built at the same TSMC fab. Differences will come down to design capability, which Intel continues to tout, but we don’t think there is that much there there.

The other ominous omen of Intel’s issues was likely the recent departure (for “personal reasons”) last month of Jim Keller, the famous CPU “Guru” who has had stints at AMD, Apple and Tesla designing their best CPUs and who had joined Intel in 2018, with many hoping he could revamp things.

TSMC is obviously laughing all the way to the bank as Intel’s business will be huge upside, many times the size of the Huawei business lost.

It would mean that in a couple of years, TSMC will be manufacturing every advanced chip on the planet. The demise of US chip making accelerates. We find the news of Intel going “fab lite” to be a huge contradiction to the recent talk of the US government’s “Chips for America” package of $22.8B in aid for the industry.

Intel’s talk of outsourcing to TSMC is in direct contradiction to Bob Swan’s personal lobbying of the White House and government officials and personal trips to DC to convince officials to have Intel lead a “trusted fab” initiative, while at the exact same time planning on outsourcing more manufacturing to TSMC.

It seems disingenuous to be lobbying to lead a US semiconductor resurgence initiative while at the same time calculating how much of the company’s product to outsource to Taiwan.

The government should be highly embarrassed as Intel is the last advanced US semiconductor logic manufacturer after GloFo gave up the race. Micron is not the leader in memory. If the US government had any smarts they would match China’s $100B checkbook as well as push other efforts to keep manufacturing in the US.

TSMC’s “planned” fab in Arizona isn’t even throwing a bone to a dog as the capacity is far too little and far too far behind the leading edge to be of any consequence at all.

Packaging matters
One very interesting point that came out of the call will be Intel’s increasing reliance on advanced 3D packaging to mix and match heterogeneous die in a mixed package to optimize manufacturing and performance. Intel will be able to mix a 14NM die with a 22NM die, throw a few memory die into a heterogeneous package, and extend Moore’s Law without geometry shrinks, which are obviously harder for them to do and increasingly delayed.

TSMC is already great at packaging and AMD has also pushed chiplet technology, so unfortunately it’s not an advantage but just a “me too” technology for Intel.

The Stocks
Obviously Intel stock will get whacked as it did to the tune of 10% in the after-hours market, and perhaps even more so as the repercussions of the delay and outsourcing sink in.

The weak guidance doesn’t help, but a weaker H2 is something we have been talking about for quite a while and the market should be expecting that. Perhaps there are still investors who think that the good times will continue into H2. Intel should be a wake-up call.

Intel guided capex to $15B, which is no surprise, and the equipment stocks shouldn’t see much reaction from that, but they will likely see a negative reaction to the longer-term negatives out of Intel and the increasing buying power of TSMC.

TSMC is looking a lot more like the old Intel with its dominance of capex spend in the industry. It is certainly not a positive for the US semiconductor industry to be so reliant on a tiny island “runaway province”, soon to be re-united with mother China by any means necessary. All in all, not positive for the chip industry with perhaps the exception of TSMC and AMD.

Semiconductor Advisors

Semiconductor Advisors on SemiWiki


The Polyglot World of Hardware Design and Verification
by Daniel Nenni on 07-23-2020 at 10:00 am

SemiWiki article

It has become a cliché to start a blog post with a cliché, for example “Chip designs are forever getting larger and more complex” or “Verification now consumes 60% of a project’s resources.” Therefore, I’ll open this post with another cliché: “Designers need to know only one language, but verification engineers must know many.” This statement has been around since the earliest days of Verilog and VHDL for RTL-based design. Neither of those hardware description languages (HDLs) offered the full set of programming capabilities needed for verification environments. At a minimum, verification engineers would also use C, often linked to the HDL code by the Programming Language Interface (PLI) or something similar. Of course, the claim that designers needed only to know RTL was a stretch. Both design and verification engineers were well versed in the Unix toolkit (shell scripts, awk, sed, etc.) and some of these utilities surely qualified as languages.

It is true that the demands on verification engineers for new languages and formats grew faster than on designers for many years. Perl, Python, and Tcl appeared frequently in verification environments. Verification engineers added object-oriented programming (OOP) with e, C++, and SystemC, and the rise of formal verification introduced a wide variety of exotic assertion and property languages. The landscape changed again with the introduction of SystemVerilog. Although most designers paid little attention to its OOP features, they had to master other aspects of a language much more complex than Verilog or VHDL. Some became proficient in SystemVerilog Assertions (SVA) since their design knowledge enabled white-box properties and more thorough formal verification. These trends have continued. Many newer standards such as SystemRDL, Unified Power Format (UPF), and Portable Stimulus Standard (PSS) require input from both design and verification engineers. Today, everyone developing chips must be comfortable working with multiple languages.

What does this all mean for EDA tools that deal with languages? To get some answers, I checked in with Cristian Amitroaie, CEO of AMIQ EDA. They provide the Design and Verification Tools (DVT) Eclipse Integrated Development Environment (IDE) and the Verissimo SystemVerilog Testbench Linter, both of which are all about compiling and understanding complex languages to make coding more efficient and more accurate. Cristian confirmed my observation that the number of languages and file formats in use on chip projects has grown a lot in recent years, and that both design and verification engineers are affected. He said that AMIQ is seeing a leap in designer usage of their tools, perhaps driven in part by the new requirements as well as increased design size and complexity. Over the years, they have added support for many new languages in DVT Eclipse IDE. Cristian walked me through some of the tool’s capabilities, and I jotted down a few notes.

For designers, the combination of easy RTL entry and efficient tracing through the design is the most important role of an IDE. For Verilog, VHDL, and the synthesizable subset of SystemVerilog, the IDE is a much better vehicle for design capture and exploration than traditional text editors. Every time that a new file is opened or new code is entered, the IDE compiles it, checks for correctness, and updates its internal model. Errors are detected on the fly, with suggestions for fixes plus templates for new code to be added. Users can follow signals throughout the design without having to search multiple files.

The same capabilities are available to verification engineers, especially when Verissimo is added for deep and sophisticated testbench checks. The IDE understands the OOP features of SystemVerilog and e, including the key concept of extending classes. Since most testbenches today are compliant with the Universal Verification Methodology (UVM), the IDE has built-in knowledge of the UVM library. AMIQ has added support for UPF power intent files and PSS models as well. The C/C++ Development Tooling (CDT) Project provides a fully functional C and C++ Eclipse-based IDE. DVT Eclipse IDE includes this environment and provides links to connect C/C++ with the other languages supported. As a writer and editor, I couldn’t help noticing that AMIQ even provides an English spell-checker to detect typographical errors in comments and text strings. Offhand, I can’t think of anything they’ve missed.

Cristian mentioned three additional points that struck me as important. First, he noted that AMIQ provides a consistent level of capabilities for all design and verification languages. This makes it easy to move from one language to another with a familiar graphical interface and available commands. This is one aspect of a seamless experience in a multi-language environment, but his second point was that users must be able to navigate across language boundaries in a transparent way. This is possible because DVT Eclipse IDE’s unified internal model includes all aspects of the verification environment and the design, spanning all languages. Users can click hyperlinks from one language to another, jumping among source code editors, schematics, hierarchy browsers, and Unified Modeling Language (UML) diagrams.

Cristian’s final point concerned their choice of the “DVT Eclipse IDE” name. Their solution has never been just about Verilog, VHDL, RTL, or verification languages. “Design and Verification Tools” was chosen because they built an IDE for all engineers involved in chip development, no matter their specific roles or languages used. The result is a unified, efficient, and easy to use cockpit that blurs language boundaries.

I’d like to thank Cristian for his thoughts on our multi-language chip development world and for his team’s efforts to make it possible to be a true polyglot engineer. I’ll bet that there will be even more languages coming along in the future, and I expect that AMIQ EDA will keep up with them.

To learn more, visit https://www.dvteclipse.com.

Also Read

An Important Step in Tackling the Debug Monster

Debugging Hardware Designs Using Software Capabilities

Automatic Documentation Generation for RTL Design and Verification


Alchip Delivers Cutting Edge Design Support for Supercomputer Processor
by Mike Gianfagna on 07-23-2020 at 6:00 am

MN-3 Supercomputer

Alchip issued a press announcement recently entitled Alchip Provides Supercomputer Processor Design Support. The release is literally a tour de force of technology, with many advanced design and packaging accomplishments. First, let’s examine the basics of the design.

Preferred Networks, Inc (PFN) is the customer. They are a Japanese leader in deep learning technology, and the product is the company’s ground-breaking MN-3 supercomputer. This system is powered by the MN-Core™ chip from PFN. This chip is a deep learning processor jointly developed by PFN and Kobe University. The device is optimized for matrix operations, often found in deep learning applications. The chip achieves world-class energy efficiency of 1 TFLOPS (half-precision) per watt. The MN-3 supercomputer uses four MN-Core processors per node and began operating in May 2020.

This is also an award-winning design. The MN-3 supercomputer recently won first place in the June 2020 Green500 ranking of the world’s most energy-efficient supercomputers. Green500 rankings are published semi-annually by the non-profit Top500 rating service. You can find out more about Green500 here. To win first place in the Green500 rankings, PFN operated 40 nodes (a total of 160 MN-Cores) and reached 21.11GFLOPS/W.

Let’s examine the Alchip contributions to this project. The MN-Core is a complex, full-reticle-sized SoC, which brings with it many design and manufacturing challenges. Notable among these is heat dissipation for this 500-watt part. Mechanical package samples were developed early in the project to address reliability concerns associated with mounting such a large chip.

The computing ASIC, built on TSMC’s process technology, hosts 512 processor core engines on each die. Alchip used their own clock tree methodology to meet the power specs. The package design is also quite an engineering accomplishment. The four-in-one, ~6,400-ball stacked package contains an organic substrate measuring 85 x 85 mm.
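As a rough sanity check on those package numbers: if the ~6,400 balls were spread uniformly across the full 85 x 85 mm substrate (an approximation – real BGAs reserve keep-out regions), the implied ball pitch works out to about 1 mm, a common pitch for large organic-substrate packages.

```python
import math

substrate_mm = 85.0   # substrate edge length from the announcement
ball_count = 6400     # approximate ball count from the announcement

# Assume a uniform full-area ball grid (an approximation; real packages
# have keep-out regions), so pitch ~= edge length / sqrt(ball count).
pitch_mm = substrate_mm / math.sqrt(ball_count)
print(f"implied ball pitch ~{pitch_mm:.2f} mm")   # ~1.06 mm
```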

Alchip’s proprietary die-to-die interface technology was used in the design. This high-speed interface addresses bandwidth requirements with a small area/power ratio. Alchip also addressed chip and package integration and verification. This is quite a complex project.

Yusuke Doi, Preferred Networks’ Vice President of Computing Infrastructure commented, “Alchip’s leading-edge design technology contributed greatly to the development of PFN’s MN-3 supercomputer and its deep learning performance, especially through their high-density, low-voltage implementation of the matrix computation units. We will continue our research and development in deep learning using MN-Core which Alchip supported us to develop.”

Makoto Onodera, President of TSMC Japan also commented, “As PFN’s foundry partner, TSMC is deeply honored to be part of this joint R&D project with PFN and Alchip, and support PFN’s hardware strategy to promote the practical application of Japan’s R&D in deep learning technology and related technologies. We look forward to seeing outstanding achievements from the further acceleration of PFN’s research in deep learning and related technologies.”

PFN’s supercomputer work dates back to September 2017 with the MN-1, a GPU computer cluster that NTT Communications operates exclusively for PFN. MN-2 was the first GPU cluster built and managed solely by PFN. It started operating in July 2019.

Alchip Technologies, Limited was founded in February 2003 and the company went public on the Taiwan Stock Exchange in 2014.

Also Read:

CEO Interview: Johnny Shen of Alchip

Alchip Reveals How to Extend Moore’s Law at TSMC OIP Ecosystem Forum

Alchip is Painting a Bright Future for the ASIC Market


Die shrink: How Intel scaled down the 8086 processor
by Ken Shirriff on 07-22-2020 at 2:00 pm

Intel 8086 Comparison

The revolutionary Intel 8086 microprocessor was introduced 42 years ago this month so I’ve been studying its die.1 I came across two 8086 dies with different sizes, which reveal details of how a die shrink works. The concept of a die shrink is that as technology improved, a manufacturer could shrink the silicon die, reducing costs and improving performance. But there’s more to it than simply scaling down the whole die.

Although the internal circuitry can be directly scaled down,2 external-facing features can’t shrink as easily. For instance, the bonding pads need a minimum size so wires can be attached, and the power-distribution traces must be large enough for the current. The result is that Intel scaled the interior of the 8086 without change, but the circuitry and pads around the edge of the chip were redesigned.

The photo below shows an 8086 chip from 1979, and a version with a visibly smaller die from 1986.3 (The ceramic lids have been removed to show the silicon dies inside.) In the updated 8086, the internal circuitry was scaled to about 64% of the original size by length, so it took 40% of the original area. The die as a whole wasn’t reduced as much; it was about 54% of the original area. (The chip’s package was unchanged, the 40-pin DIP package commonly used for microprocessors of that era.)

Comparison of two 8086 chips. The newer chip on the bottom has a significantly smaller die. The rectangle in the upper-right of each die is the microcode ROM.
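The scaling arithmetic above is easy to check: a 64% linear shrink gives roughly 41% of the original area for the scaled interior, while the whole die only came down to about 54% because the pad ring and power routing at the edge could not scale as aggressively. A two-line check:

```python
linear_shrink = 0.64                                       # linear shrink factor quoted above
print(f"interior area ratio: {linear_shrink ** 2:.2f}")    # ~0.41, i.e. ~40% of the original area
print("whole-die area ratio: ~0.54 (pad ring and power routing scale less)")
```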

The 8086 is one of the most influential chips ever created; it started the x86 architecture that still dominates desktop and server computing today. Unlike modern CMOS processors, the 8086 was built from NMOS transistors, as were the 6502, Z-80, and other early processors.4 The first chip was built with HMOS,5 Intel’s name for this process. Intel introduced the improved HMOS-II in 1979, and in 1982 moved to HMOS-III, the process used for the newer 8086 chip.6 Each newer HMOS version shrank the size of features on the chip and improved performance.

Two versions of the 8086 die, at the same scale. The bond wires are connected to pads around the edge of the die.

The photo above shows the two 8086 dies at the same scale. The two chips have identical layout in the interior,7 although they may look different at first. The chip on the right has many dark lines in the middle that don’t appear on the left, but this is an artifact. These lines are the polysilicon layer, underneath the metal; the die on the left has the same wiring, but it is very faint. I think the newer chip has a thinner metal layer, making the polysilicon more visible.

The magnified photo below shows the same circuitry on the two dies. There is an exact correspondence between components in the two images, showing the circuitry was reduced in size, not redesigned. (These photos show the metal layer on top of the chip; some polysilicon is visible in the right photo.)

The same region of the two dies at the same scale.

However, there are significant differences around the edges of the dies. The bond pads around the outside are closer together, especially in the bottom right. There are two reasons for this. First, the bond pads can’t shrink very much, since they need to be large enough to attach bond wires.

Second, the power distribution traces around the edges are wider in order to support the necessary current. (Look to the right of the microcode ROM in the lower right, for instance.) Part of this is because the power traces in the middle of the circuitry were scaled down with the rest of the circuitry, so they are smaller; the outside traces need to pick up the slack. In addition, the thinner metal layer in the newer chip can’t support as much current without being widened.

A bond pad and associated transistors, comparing the old chip (left) and new chip (right). In the copyright date, the top of the “6” is strangely flat; it looks like they changed a “1985” to “1986”.

The photo above shows a bonding pad with an attached bond wire. The drive transistors are above the pad. The newer chip has almost the same size pad, but the power drive transistors have both shrunk and been redesigned. Note the much thicker metal power wiring on the newer chip. The Intel logo was moved from the bottom right to the bottom left, probably because that’s where there was room.

A closer look at the dies
First, a bit of background on the NMOS construction used in the 8086 and other chips of that era. These chips consist of a silicon substrate, which is doped (diffusion) with arsenic or boron to form transistors. On top, a layer of polysilicon creates the gates of the transistors as well as providing wiring between components. Finally, a single metal layer on top wires up the components.

A semiconductor process (such as HMOS-III) has specific rules on the minimum size and spacing for features on the silicon, polysilicon, and metal layers. By looking closely at the chips, we can see how the features correspond to the design rules for HMOS I and HMOS III. The table below (from HMOS III Technology) summarizes the characteristics of the different HMOS processes. The features get smaller and the performance gets better with each version. (Intel got a 40% overall performance improvement going from HMOS-II to HMOS-III.)

                              HMOS I   HMOS II   HMOS III
Diffusion Pitch (µm)            8.0      6.4       5.0
Poly Pitch (µm)                 7.0      5.6       4.0
Metal Pitch (µm)               11.0      8.0       6.4
Gate Oxide Thickness (Å)        700      400       250
Channel Length (µm)             3.0      2.0       1.5
Idsat (mA)                      8.0     14.0      27.0
Minimum Gate Delay (ps)        1000      400       200
Speed-Power Product (pJ)        1.0      0.5      0.25
Linear Shrink Factor            1.0      0.8      0.64
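Applying the 0.64 uniform shrink factor to the HMOS I feature sizes in the table gives the dimensions measured later in this article, and comparing them with the HMOS III minimums shows how much of the new process's headroom a uniform shrink leaves unused:

```python
# HMOS I feature sizes and HMOS III minimums taken from the table above (um).
hmos1 = {"channel length": 3.0, "metal pitch": 11.0, "poly pitch": 7.0}
hmos3_min = {"channel length": 1.5, "metal pitch": 6.4, "poly pitch": 4.0}
SHRINK = 0.64  # published linear shrink factor

for feature, old_um in hmos1.items():
    scaled = old_um * SHRINK
    print(f"{feature}: {old_um} um -> {scaled:.1f} um "
          f"(HMOS III minimum {hmos3_min[feature]} um)")
# channel length: 3.0 -> 1.9 um, metal pitch: 11.0 -> 7.0 um, poly pitch: 7.0 -> 4.5 um,
# matching the measurements reported below, each larger than the HMOS III minimum.
```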

The microscope photo below shows a complex arrangement of transistors in the older 8086 chip. The dark regions are doped silicon, while the white rectangles are the transistor gates. (There are about 21 transistors in this photo.) A key measurement is the channel length, the length of the gate between the source and drain. (This is the narrower dimension of the white rectangles.) I measured 3 μm for these transistors, which nicely matches the published value for HMOS I.8 This indicates the chip was manufactured with a 3 μm process; in comparison, processors are now moving to a 5 nm process, 600 times smaller.

Transistors in the older 8086 chip. The metal and polysilicon were removed for this photo. Circles are vias that connect to the metal layer.

The photo below shows transistors in the newer 8086 at the same scale; the transistors are much smaller. The linear dimensions are scaled by 64%, so the transistors have 40% of their original area. Because I processed this die differently, the polysilicon remained on the die (the yellowish lines). The doped silicon appears pinkish, much less visible than before. I measure the gate length as 1.9 μm, which is 64% of the previous 3 μm. Note that HMOS-III supports a considerably smaller 1.5 μm channel length, but since everything shrinks by the same 64% factor, the channel length is larger than necessary. This illustrates that uniformly shrinking the die wastes some of the potential gain from the new process, but it is much easier than completely redesigning the chip.

Transistors in the later 8086 chip. There are many vias between the silicon or polysilicon and the metal (which has been removed).

I also looked at the spacing (pitch) of lines in the metal layer. The photo below shows some horizontal and vertical metal wiring in the older chip. I measured 11μm pitch for the metal lines, which matches the published HMOS I figure. The shrink to 64% yields 7 μm pitch on the new chip, even though HMOS III supported 6.4 μm. As before, the constant shrink factor doesn’t take full advantage of the new process.

The metal layer of the older 8086 chip. Reddish polysilicon wiring is visible underneath the metal.

Finally, I looked at the pitch of the polysilicon wiring. The photo below shows the older 8086; the polysilicon has been removed, leaving faint white traces. These parallel polysilicon lines probably formed a bus, routing signals from one part of the chip to another. I measured 7 μm pitch for the polysilicon lines, matching the published HMOS figure. (Interestingly, polysilicon wiring can be denser than metal wiring under HMOS rules.) The newer chip has 4.5 μm polysilicon pitch, compared to a possible 4.0 μm.

Polysilicon traces on the older 8086 chip.
Conclusions

A die shrink provides a way to improve the performance of a processor and reduce its cost without the effort of a complete redesign. Comparing the two chips, however, shows that a die shrink is more complex than uniformly shrinking the whole die. While most of the circuitry is a straightforward shrink, the bond pads didn’t shrink to the same degree, so they needed to be moved around. The power distribution was also modified, adding more power wiring around the outer part of the chip.

Modern microprocessors still use die shrinks. In 2007, Intel moved to a tick-tock model, where they would alternate shrinks of an existing chip (the “tick”) with the production of a new microarchitecture (the “tock”).

I plan to analyze the 8086 in more detail in future blog posts so follow me on Twitter at @kenshirriff for updates. I also have an RSS feed.

Notes and references

  1. The 8086 was released on June 8, 1978. 
  2. It’s actually quite remarkable that MOSFET circuits still work after being scaled down over a large range, since most things don’t scale as easily. For instance, you can’t scale down an engine by a factor of 10 and expect it to work. Most physical things suffer from the square-cube law: the area scales with the square of the ratio, while the volume scales with the cube of the ratio. For MOS circuits, however, most things either stay the same with scaling, or get better (such as frequency and power consumption). For more details on scaling, see Mead and Conway’s Introduction to VLSI Systems Ch 1 sect 2. Interestingly, that 1978 book says that scaling had a fundamental limit of 1/4 micron (250 nm) channel length due to physical effects. That limit was wildly wrong; transistors are now moving to 5 nm, through technologies such as FinFETs. 
  3. The older chip says ©’78, ©’79 on the package and ©1979 on the die and has a 7947 (47th week of 1979) date code on the underside. The newer chip says ©1978 on the package but ©1986 on the die and has no identifiable date code, so I figure it is from 1986 or slightly later. It’s unclear why the newer chip has an older copyright date on the external package. 
  4. A brief description of the technologies in early processors. N-channel MOSFETs are a particular type of MOSFET transistor. They have considerably better performance than the P-channel MOSFETs used in the earliest microprocessors, such as the Intel 4004. (Modern processors use N-channel and P-channel transistors together for lower power consumption; this is CMOS.) Gates built from N-channel MOSFETs require a pull-up resistor, which is implemented by a transistor. Depletion load transistors are a type of transistor introduced in the mid-1970s that perform better as pull-up resistors and don’t require an extra power supply voltage. Finally, MOS transistors originally used metal for the gate (the M in MOS). But in the late 1960s, Fairchild developed the use of polysilicon for the gate instead of metal. This provided much better performance and was easier to manufacture. The point of all this is that between the late 1960s and mid-1970s, several radical changes were introduced in MOS integrated circuit production, and these led to the success of the 6502, Z-80, 8085, 8086, and other early processors. In the 1980s, CMOS processors took over due to their lower power consumption and better performance. 
  5. Strangely, it’s unclear what the “H” stands for in HMOS. I couldn’t find anywhere that Intel expands the acronym; databooks refer to “Intel’s advanced N-channel silicon gate HMOS process” or say “HMOS is a high-performance n-channel MOS process”. Intel later defined CHMOS as “Complementary High Speed Metal Oxide Semiconductor” (example). Motorola defined HMOS as High-density MOS (example) while other sources defined it as High-speed MOS or High-density, short channel MOS. Intel has a patent on “High density/high speed MOS process and device”, so perhaps the “H” stands for both “high density” and “high speed”. 
  6. Interestingly, Intel used a 4K static RAM chip to develop each of their HMOS processes, before using the process for their microprocessors and other chips. They probably developed with the RAM chip because it has dense circuitry, but is relatively easy to design because it repeats the same memory cell over and over. Once they had all the design rules figured out, then they could create the much more complex processor. 
  7. I scaled complete, high-resolution images of the two chips to compare and the main part of the chips is an exact match except for some trivial changes. I found a couple of places where a via was slightly moved, which is puzzling because I see no logical reason for that. The circuit was unchanged, so it’s not a bug fix. One question is if there were any microcode changes. The microcode looks identical, but I didn’t do a bit-by-bit comparison. 
  8. You may have noticed that three transistors in the photo have much larger gates. These are transistors that are acting as pull-up resistors, as is typical for NMOS circuits. The larger size makes the transistors weaker, so they provide a weak pull-up current. 

How About a Faster Fast SPICE? Much Faster!
by Tom Simon on 07-22-2020 at 10:35 am

Analog FastSPICE eXTreme

When Analog FastSPICE was first introduced in 2006 it changed the landscape for high performance SPICE simulation. During the last 14 years it has been used widely to verify advanced nanometer designs. Of course, since then the most advanced designs have progressed significantly, making verification even more difficult. Just before DAC I had a conversation with Greg Curtis, Senior Product Manager at Mentor, a Siemens business, about these changes and Mentor’s newest improvements for Analog FastSPICE.

Greg pointed out three main drivers making verification more difficult. He said that the first is increasing interconnect resistance, which is jumping by a factor of 3 at nodes below 16nm. The second is increasing parasitic complexity. With more interconnect coupling and larger interconnect networks, the size of the RC networks is mushrooming. The final driver is new device models with large numbers of model equations – sometimes up to ~600.

While probably every tool has benefited from performance improvements during the last 14 years, those gains have at best been enough to keep up with growing design complexity. What designers are really looking for are manifold increases in performance – the kind that are game changing. This is especially true in the world of SPICE simulators, where the number of runs required has increased due to variation at the same time that complexity has increased.

Of course, design teams are used to EDA vendors rolling out new and improved versions of their software, but usually with a hefty price tag for the upgrade. Well, Greg was happy to inform me that on Monday July 20th Mentor is announcing Analog FastSPICE eXTreme with significant performance improvements at no additional cost for existing users. Availability is scheduled for October 2020. OK, but what do you get in this new version?

According to Greg, the focus of the improvements was on post-layout simulation. It has become a reality that pre-layout simulation is no longer useful, and the first real simulation is with parasitics. Mentor developed a new adaptive core SPICE matrix solver. They also enabled new RC reduction algorithms to help users meet performance and accuracy targets. The RC reduction operates with user-defined accuracy. As before, Analog FastSPICE eXTreme will operate in the existing AFS Platform.

Greg had information showing the effectiveness of the speed-up in Analog FastSPICE eXTreme. In the case of SRAM timing simulations there was almost a 4X speed-up. For a transceiver they boast a 15X speed-up. A quick look at their data shows an average speed-up of around 7X across various circuit types. These are impressive numbers.
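The quoted per-circuit figures bracket that average nicely. One common way such a cross-circuit average is formed is a geometric mean; the list below is a hypothetical illustration (only the 4X and 15X endpoints come from the article), included solely to show the calculation.

```python
import math

# Hypothetical per-circuit speedups; only the 4X (SRAM) and 15X (transceiver)
# endpoints come from the article, the rest are placeholders.
speedups = [4, 15, 6, 8, 5, 10]
geo_mean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))
print(f"geometric-mean speedup: ~{geo_mean:.1f}X")   # roughly 7X
```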

Mentor was able to cite work with two early adopters on the results achieved with Analog FastSPICE eXTreme. Analog Bits, a leading provider of mixed-signal IP such as PLLs, sensors, and I/Os in processes down to 3nm, spoke of a 6X improvement in simulation times. Silicon Creations, developers of leading-edge PLL and SerDes IP, also made comments supporting Analog FastSPICE eXTreme’s improved performance. They saw up to 10X improvement in simulation speed.

It is generally acknowledged that a 10X improvement in any design tool is meaningful in terms of getting users to switch tools. Mentor has taken their well-respected tool and given it that kind of performance improvement. Existing users should be delighted. Users of other tools are going to need to kick the tires simply out of necessity. The full announcement is available for review on the Mentor website. If you are doing SPICE on advanced process nodes, it will be well worth your while.


Accelerating High-Performance Computing SoC Designs with Synopsys IP
by Daniel Nenni on 07-22-2020 at 6:00 am

Synopsys DesignWare IP

Semiconductor IP is one of the most talked about topics on SemiWiki. Always has been, always will be. Synopsys is also one of the most talked about topics on SemiWiki and IP is a very big part of that, absolutely.

After reading Eric Esteve’s latest IP Report I Googled around and found some interesting things. First, I found a Brief History of Synopsys DesignWare IP  blog I did back in 2013. I also found the “Executive and Expert Access: Accelerating High-Performance Computing SoC Designs with Synopsys IP” webinar series and the DesignWare IP University (which I will be spending more time on in the coming days).

Executive and Expert Access: Accelerating High-Performance Computing SoC Designs with Synopsys IP, A Webinar Series

Don’t miss this opportunity to hear from Synopsys’ IP senior executives and product experts on how to accelerate your high-performance computing SoC designs. Find out about the latest market trends that will help you make important design decisions. Learn how specific features of Synopsys’ IP enable you to achieve the required functionality for your chip and deliver competitive products to market faster:

  • The New Frontier of Die-to-Die Connectivity: What You Need to Know for Silicon Success
  • Navigating Between DDR5, LPDDR5, and HBM2/2E IP to Meet Your Design Goals
  • Keys to Achieving Maximum Throughput and Lowest Latency for PCI Express 5.0 and CXL Designs

I just finished this on-demand webinar series. It’s easy to register and you get immediate access. The format is a 10-minute executive introduction, a 40-minute technical presentation, and a 10-minute Q&A – this is excellent content!

John Koeter is the Synopsys IP executive in the webinar series. John has been at Synopsys for more than 20 years and is one of the foremost semiconductor IP experts.

The Product Expert Speakers are:

Manmeet Walia brings over 18 years of experience in product management and system engineering covering ASSP, ASIC, and IP products for a broad range of applications. Manmeet holds a Master of Science degree in Electrical Engineering from the University of Toledo, and an MBA from San Diego State University.

Graham Allan brings over 25 years of experience in the memory industry. Graham has spoken at numerous industry conferences and is a significant contributor to the SDRAM, DDR and DDR2 JEDEC memory standards. He currently holds 25 issued patents in the area of memory design.

Gary Ruggles brings over 25 years of experience in electronics and integrated circuit design. Gary began his career as Assistant Professor of Electrical and Computer Engineering at North Carolina State University, where he taught courses in Solid State Physics and VLSI Processing.

This webinar series is definitely worth your time.

The DesignWare IP University is organized into 7 topics:

  • Interface IP
  • Processor IP
  • Foundation IP
  • Security IP
  • Artificial Intelligence
  • Automotive
  • Cloud Computing

Most of these are trending topics on SemiWiki. Under each topic there are on-demand webinars, videos, event presentations, and white papers. This is an excellent resource that should be shared.

DesignWare IP University, eLearning on your schedule

Learn about the latest interface protocols and standards, processor implementation techniques, and market trends in these educational white papers, webinars, and videos. Whether your chip design includes artificial intelligence capabilities, targets next-generation cars, or enables massive data in the cloud, the DesignWare IP University resources will help you create the SoC your market needs.

About DesignWare
Synopsys is a leading provider of high-quality, silicon-proven IP solutions for SoC designs. The broad DesignWare IP portfolio includes logic libraries, embedded memories, embedded test, analog IP, wired and wireless interface IP, security IP, embedded processors and subsystems. To accelerate IP integration, software development, and silicon bring-up, Synopsys’ IP Accelerated initiative provides architecture design expertise, pre-verified and customizable IP subsystems, hardening, signal/power integrity analysis, and IP prototyping kits. Synopsys’ extensive investment in IP quality, comprehensive technical support and robust IP development methodology enables designers to reduce integration risk and accelerate time-to-market.

Download DesignWare IP Overview

Also Read:

Quantifying the Benefits of AI in Edge Computing

Synopsys Introduces Industry’s First Complete USB4 IP Solution

Synopsys – Turbocharging the TCAM Portfolio with eSilicon


PLDA – Delivering Quality IP with a Solid Verification Process and an Extensive Ecosystem
by Mike Gianfagna on 07-21-2020 at 10:00 am


For those who design advanced and complex SoCs, the term “off-the-shelf IP” can be elusive. While this approach works for a wide range of IP titles, the pressure for maximum performance or minimum power can lead to custom-tailoring requirements for the IP.

PLDA has seen these requirements for the class of complex, high-performance IP the company is known for, such as PCIe 5.0 or CXL. Often, PLDA customers will require very specific features and configurations which trigger an IP modification cycle. In spite of this, each customer is expecting the delivered product to be proven, robust and reliable, as though it had been used in many prior tapeouts. This is a daunting requirement, but it’s the price of admission into the high-end IP market.

PLDA has developed a thoughtful and rigorous approach to this challenge. They’ve even developed an ecosystem to support their efforts – more on that later. I had the opportunity to get an overview of the work going on in IP verification from Romain Tourneau, Marketing Manager at PLDA.

Romain started with an overview of the spec requirements from the customer that need to be understood and managed. These include:

  • Functional requirements: behavior rules (this event is causing this consequence)
  • Parametric requirements: performance, gate count, power consumption
  • Structural/physical requirements: must be synthesizable, must not be prone to metastability (CDC)

Aiming for a quality deliverable means requirements qualification to identify the “golden”, or most important items. Maximizing ways to verify these golden requirements then becomes the focus. This activity follows an implementation, debug and improvement process. There are many approaches to manage this process, including:

  • The standard approach (single process start to finish)
  • The incremental approach (following design changes incrementally)
  • The “Sprint” approach (the project is split into small, incremental releases)
  • The “Super-sprint” approach (same as Sprint, but accelerated timing)

PLDA uses the Super-sprint approach, which is summarized below:

While there are still many manual and time-consuming tasks to perform, the iterative nature of the Super-sprint method allows for efficient collaboration with the customer. It requires a deep verification discussion with the customer at project kickoff in order to fulfill these objectives:

  • Understand the customer’s process and tools
  • Know the customer’s verification plan and IP usage and associated cost
  • Explain PLDA process and tools
  • Identify gaps in design criteria and address possible solutions
  • Explain the advantages and importance of the customer starting IP verification early in their project since PLDA can provide a ready-to-use verification environment to perform this stand-alone verification earlier

To further solidify their commitment to robust verification, PLDA recently announced a Robust Verification Toolset, Increasing Design Accuracy and Reducing Time-to-Production for Next Generation SoCs with CXL®, PCIe® 6.0 or Gen-Z® Interconnect. The announcement details a comprehensive verification strategy that includes components from PLDA’s own verification process as well as tools from Aldec, Avery Design Systems and Mentor, a Siemens Business.

You don’t often see such a proactive and broad approach to IP verification – this is noteworthy. The release states:

“The verification process for IP design takes place at the front end of chip design and requires a high level of reliability to prevent production delays. Achieving the necessary levels of verification can be time-consuming, however cutting corners in verification often results in costly and difficult bug fixes at the end of chip fabrication. It is much more efficient to ensure a robust and high-quality initial verification process.” This is a very informed and evolved point of view in my opinion.

Dubbed the Robust Verification Toolset, it includes:

  • Verification IPs covering standards compliance for PCIe, AMBA AXI, CXL, CCIX and Gen-Z
  • Simulators that support mixed-language designs with UVM testbenches
  • Synthesis and static verification tools from classic EDA providers, delivering verification of RTL design quality and of CDC

To manage the data generated by the Robust Verification Toolset during both the verification and validation processes, PLDA has developed an interface named DANA.

This proprietary PLDA tool is used to deliver highly efficient supply chain management through a collection of automatic reports, flow automation and strict follow-up processes. Data from the complete toolset is automatically collected, analyzed, and reported. This reduces review cycles caused by data management and accelerates the decision-making process – a significant time savings for both project leaders and verification engineers.

To learn more about PLDA’s verification solutions:

 


Quantifying the Benefits of AI in Edge Computing
by Bernard Murphy on 07-21-2020 at 6:00 am

Architectures for Edge computing

Many of us are now somewhat fluent in IoT-speak, though at times I have to wonder if I’m really up on the latest terminology. Between edge and extreme edge, fog and cloud, not to mention emerging hierarchies in radio access networks – how this all plays out is going to be an interesting game to watch. Ron Lowman, DesignWare IP Product Marketing Manager at Synopsys, recently released a technical bulletin which provides some quantified insight into the motivation for moving compute and AI closer to the edge, and how these changes affect IP selection and system architectures.

Hierarchy in radio access networks

The basics are well-understood by now, I think. Shipping zettabytes of data from billions or trillions of edge devices to the cloud was never going to happen – too expensive in power and bandwidth. So we start moving more of the compute closer to the edge. Handling more of the data locally, requiring only short hops. Ron cites one Rutgers/Inria study using a Microsoft HoloLens in an augmented reality (AR) application. This was tasked to do QR code recognition, scene segmentation and location and mapping. In each case the HoloLens first connects to an edge server. For one experiment, AI functions are shipped off to a cloud server. In a second experiment, these are performed on the edge server. Total roundtrip latency in the first case was 80-100ms or more. In the second case, only 2-10ms.

Not surprising, but the implications are important. The cloud latency is easily long enough to induce motion sickness in that AR user. In other applications it could be a problem for safety. The edge-compute round-trip latency is much less of a problem. Ron goes on to add that 5G offers use-cases which could drop latency under 1ms. Making the case for edge-based compute no contest. Going to the cloud is fine for latency-insensitive applications (as long as you don’t mind the cost overhead in all that transmission. And privacy concerns. But I digress.) For any real-time application, compute and AI has to sit close to the application.
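To put those round-trip numbers in context, here is a tiny sketch comparing them against a commonly cited ~20 ms motion-to-photon budget for comfortable AR. The 20 ms budget is my assumption for illustration, not a figure from the Rutgers/Inria study.

```python
AR_BUDGET_MS = 20                      # assumed motion-to-photon comfort budget (illustrative)
round_trips_ms = {"cloud": (80, 100),  # latency ranges cited in the study above
                  "edge": (2, 10)}

for tier, (lo, hi) in round_trips_ms.items():
    verdict = "exceeds" if lo > AR_BUDGET_MS else "fits within"
    print(f"{tier}: {lo}-{hi} ms round trip {verdict} a {AR_BUDGET_MS} ms budget")
```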

Architectures from the cloud to the edge

Ron goes on to talk about three different architectures for edge computing, in a way I found novel. He sees edge as anything other than the cloud, drawing on use models and architectures from a number of sources. At the top end are regional data centers; somewhat more locally you have on-premise servers (maybe in a factory or on a farm), and more locally still you have aggregators/gateways. Each with their own performance and power profiles.

Regional data centers are scaled-down clouds, with the same capabilities but lower capacity and power demand. For on-premise servers, he cites the example of Chick-fil-A, which has these in its fast food outlets to gather and process data for optimizing local kitchen operations.

The aggregators/gateways he sees performing quite limited functions. I get the higher-level steps in this architecture; however, I’ve seen this hierarchy go further, right down into the edge device, even battery-operated devices. In a voice-activated TV remote for example. I know of remotes in which voice activation and trigger word recognition happens inside the remote. Ron’s view still looks pretty good; maybe it just needs one more level. And possibly consider that the gateway may do a bit more heavy lifting (command recognition for example).

He wraps up with a discussion on impact on SoC architectures and the IP that goes into server SoCs and AI accelerators. I agree with his point that the x86 vector neural network extensions probably aren’t going to make much of a dent. After all, Intel developed Nervana (and now Habana) for a reason. More generally, AI accelerator architectures are exploding. Very much in support of vertical applications, from the extreme edge to 5G infrastructure to the cloud. AI is finding its place throughout this regime, in every form of edge and non-edge compute.

You can read the technical bulletin HERE.

Also Read:

Synopsys Introduces Industry’s First Complete USB4 IP Solution

Synopsys – Turbocharging the TCAM Portfolio with eSilicon

Synopsys is Changing the Game with Next Generation 64-Bit Embedded Processor IP