
Beyond CMOS: Three Industry Teams Aim at Next Generation of High-performance Computing

by admin on 04-12-2015 at 10:00 pm

Given the limitations of current CMOS designs, such as thermal constraints and inefficient power consumption, there is a pressing need to move toward superconducting computers in order to meet consumers’ demands for power and performance. Although superconducting computers require extremely low operating temperatures, they are capable of considerably higher Floating Point Operations per Second (FLOPS) with similar power requirements. The current goal in superconducting computing is to reach an exaFLOPS; petaFLOPS machines are the best presently attainable. To keep the superconductive metal oxides at optimal temperatures, cryogenic cooling was introduced.

IARPA has contracted IBM, Raytheon-BBN and Northrop Grumman to develop a small yet scalable superconducting computer under the Cryogenic Computing Complexity (C3) program, in order to expand the current capabilities of high-performance computing with a focus on cryogenic memory, logic, communications, and systems (Keller, 2014).

Operating electronics at cryogenic temperatures improves their performance: noise is lower, switching speeds are higher, and efficiency increases (Kirschman, 2009). In superconducting computers, super-cooled copper wire allows current to flow almost indefinitely, so the switching voltage is significantly reduced (Pop, 2014). The threshold voltage is the minimum gate-source voltage needed to create a conducting path between the source and drain terminals, and devices operating at cryogenic temperatures require significantly lower threshold voltages. If the design begins to heat up, the supply voltage may no longer reach the threshold voltage, resulting in sub-threshold leakage. At high temperatures, leakage in p–n junctions becomes excessive, which may render the device useless (Kirschman, 2003).
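The temperature dependence described above can be sketched with the standard exponential subthreshold model. The device parameters below (threshold voltage, I0, ideality factor) are arbitrary illustrative values, not taken from any real process:

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # electron charge, C

def subthreshold_current(vgs, vth, temp_k, i0=1e-6, n=1.5):
    """Illustrative subthreshold drain current (A) from the standard
    exponential model I = I0 * exp((Vgs - Vth) / (n * kT/q))."""
    vt = K_B * temp_k / Q_E  # thermal voltage kT/q
    return i0 * math.exp((vgs - vth) / (n * vt))

# Off-state leakage (Vgs = 0) for the same 0.3 V threshold device
# at room temperature versus liquid-nitrogen temperature.
leak_300k = subthreshold_current(0.0, 0.3, 300.0)
leak_77k = subthreshold_current(0.0, 0.3, 77.0)
print(leak_300k, leak_77k)  # leakage is many orders of magnitude lower at 77 K
```

The shrinking thermal voltage at low temperature steepens the turn-off slope, which is why cryogenic operation both cuts leakage and allows lower threshold voltages.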

Reducing the threshold voltage also reduces resistive and capacitive heating, which allows for smaller designs without the risk of leakage. This led engineers to the idea of cryogenic memory. In the first stage of the C3 program, the three teams will develop components for the memory and subsystems (Keller, 2014). Cryogenic memory is still in its infancy, and the program is designed to explore its possibilities for high-performance computing. While little is known so far, cryogenic memory promises memory and caches capable of keeping up with the processing power of a superconducting computer’s CPU (Anthony, 2014). Combining recent ideas for energy-efficient cryogenic memory with superconducting logic free of static dissipation, the teams aim to meet the energy demands of today’s high-performance computers (Keller, 2014).

Liquid nitrogen, as a means of cooling both components and memory, is the most readily available cryogen and reaches temperatures below −196 °C; however, it is not cold enough to reduce resistivity to zero (Kross). While liquid nitrogen is effective, there are alternatives. Liquid helium is the coldest option, but it is a finite resource and far less abundant than nitrogen. Liquid hydrogen can also reach lower temperatures than liquid nitrogen; however, it is combustible at 1,065 °F (NOAA). Given these drawbacks, liquid nitrogen remains the cheapest and easiest-to-obtain material for cooling these supercomputers (Kross). Nonetheless, further progress will require a more efficient way to cool superconducting metals without risking an explosion.
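The trade-off between the three cryogens can be summarized with their approximate normal boiling points (standard textbook values). A small sketch picks which coolants can hold a bath at a given temperature:

```python
# Normal boiling points of the candidate cryogens discussed above
# (approximate textbook values, in kelvin).
CRYOGENS = {
    "liquid nitrogen": 77.4,   # cheap and abundant
    "liquid hydrogen": 20.3,   # colder, but combustible
    "liquid helium": 4.2,      # coldest, but scarce and expensive
}

def coolants_reaching(target_k):
    """Return the cryogens whose boiling point is at or below target_k,
    i.e. the ones capable of holding a bath at that temperature."""
    return sorted(name for name, bp in CRYOGENS.items() if bp <= target_k)

print(coolants_reaching(77.4))  # all three qualify
print(coolants_reaching(10.0))  # only liquid helium
```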

Apart from expanding cryogenic memory, the second main focus of the program is logic, communications, and systems: building superconducting logic circuits that demonstrate the technology’s potential for high-performance computing (Keller, 2014).

As designs get smaller, the supply voltage is lowered in order to reduce electric fields within the design (Stockinger, 2001); reducing electric fields mitigates the impact of wires in close proximity. In semiconducting computers, such as CMOS designs, lowering temperatures enough to reduce resistivity would also cause the design to break down (Kirschman, 2003). One reason for using silicon is its durability compared to other semiconductors; for example, silicon is used more than germanium in electronics because it can withstand higher temperatures (GSU, 2000). However, as temperatures decrease toward approximately 40 K for silicon, an effect called freeze-out begins to occur, in which dopants are not sufficiently ionized, causing a pronounced lack of carriers. Dopants normally require some thermal energy to ionize and produce carriers in semiconductors; therefore, deep cooling of the superconductive metal oxides becomes the major challenge (Kirschman, 2003).
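The freeze-out trend can be illustrated with a toy Boltzmann factor for the roughly 45 meV phosphorus donor level in silicon. This deliberately ignores carrier statistics and degeneracy factors, so it shows only the exponential temperature dependence, not real ionization fractions:

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def ionization_boltzmann_factor(e_donor_ev, temp_k):
    """Toy Boltzmann factor exp(-Ed / kT) for donor ionization.
    (Real ionization statistics also involve carrier density and
    degeneracy; this only shows the exponential temperature trend.)"""
    return math.exp(-e_donor_ev / (K_B_EV * temp_k))

ED_PHOSPHORUS_SI = 0.045  # ~45 meV donor level for phosphorus in silicon
room = ionization_boltzmann_factor(ED_PHOSPHORUS_SI, 300.0)
cold = ionization_boltzmann_factor(ED_PHOSPHORUS_SI, 40.0)
print(room, cold)  # the factor collapses by orders of magnitude near 40 K
```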

Electronics has come a long way since the first computers, but the field has not yet met its goals. Engineers are now trying to achieve speeds that process at the rate of the human brain, and without proper cooling methods and power distribution, building a supercomputer of this caliber will be difficult.

By Aaron Carnahan and Thomas Garner

The University of Mississippi Electrical Engineering Department introduced a Digital CMOS/VLSI Design course this semester. As part of this course, students researched a contemporary issue and wrote a blog article about their findings for presentation on SemiWiki. Your feedback is greatly appreciated.

References:
Georgia State University. (2000). Silicon and Germanium. Retrieved from http://hyperphysics.phy-astr.gsu.edu/hbase/solids/sili.html

Kirschman, R. K. (2003). Extreme Temperature Electronics. Retrieved from http://www.extremetemperatureelectronics.com/

Pop, Sebastian. (2014). Cryogenic Memory and Superconductors Allow Supercomputer to Reach 1 Exaflop. Retrieved from http://news.softpedia.com/news/Mysterious-Supercomputer-Will-Use-Cryogenic-Memory-and-Superconductors-to-Reach-1-Exaflop-466602.shtml

Kross, Brian. Is there anything colder than liquid nitrogen? Retrieved from http://education.jlab.org/qa/liquidnitrogen_02.html

Keller, John. (2014). Beyond CMOS: three industry teams aim at next generation of high-performance computing (HPC). Retrieved from http://www.militaryaerospace.com/articles/2014/12/iarpa-c3-contracts.html
Kirschman, Randall. (2009). Cryogenic Electronics. Retrieved from http://www.cryogenicsociety.org/resources/cryo_central/cryogenic_electronics/

NOAA. Hydrogen, Refrigerated Liquid (Cryogenic Liquid). Retrieved from http://cameochemicals.noaa.gov/chemical/3606

Stockinger, Michael. (2001). 2.1 Subthreshold Leakage. Retrieved from http://www.iue.tuwien.ac.at/phd/stockinger/node13.html


Safety Dominates Agenda in DAC’s Automotive Track

by Majeed Ahmad on 04-12-2015 at 4:00 pm

The connected car movement is in full bloom, making headlines in the trade media on how the cutting-edge electronics will transform the twenty-first century driving experience. However, a closer look at the Internet of cars juggernaut shows that safety and security of the networked vehicle are still a major stumbling block.

Design Automation Conference (DAC) — June 7-11, 2015

The Automotive Track at the upcoming Design Automation Conference (DAC), to be held June 7-11, 2015 in San Francisco, affirms how crucial safety and security are going to be in connected automotive platforms. Another prominent highlight of DAC’s automotive program is that it is evenly divided between the hardware and software aspects of car safety and security.

For instance, Jeffrey Massimilla, Chief of Cybersecurity at GM, is going to talk about cyber threats to connected cars. He will also join a technology chat along with Craig Smith, the author of Car Hacker’s Manual, and John McElroy, the host of Autoline Daily, the first webcast of automotive industry news and analysis. The session will provide a detailed treatment of how connectivity features like Bluetooth, GPS, LTE and Wi-Fi create entry points for hackers.


John McElroy will host a chat on connected car technologies

Jeffrey Owens, CTO of Delphi Automotive, will elaborate on how electronics and design automation are playing a critical role in shaping the future of automotive in another keynote, titled “The Design of Innovation That Drives Tomorrow.”

Next, DAC’s Automotive Track brings ISO 26262 certification into the technology limelight; a whole session is dedicated to the nitty-gritty of robust chip solutions for connected vehicles. Maik Herzog of Infineon Technologies AG will be the keynote speaker at this session about the physical design of automotive ICs, which will also encompass advanced verification tools for the brave new world of ISO 26262.


Infineon’s EDA expert Maik Herzog will talk about the brave new world of ISO 26262

There are going to be three conference sessions. The first one is about modeling, simulation and testing in automotive embedded systems. The session will feature three talks from Infineon’s Moomen Chaari, Kenji Nishimiya of Honda R&D and Armin Wasicek from University of California at Berkeley.

The second conference session is about energy efficient, safe and secure automotive software and systems. The session features six speakers and 21 authors, and will present new results on embedded automotive systems, architectures and algorithms. For instance, there will be a talk about a novel algorithm related to heating, ventilation and air conditioning (HVAC) systems for improving the energy management of electric vehicles.

The third and final session will cover the different facets of automotive embedded software. Cars now contain several million lines of software code running on a highly distributed architecture of as many as 100 electronic control units (ECUs), connected by a heterogeneous communication subsystem comprising CAN, FlexRay and Ethernet, among others. The session will discuss various aspects of automotive software: model-based design, component integration and timing analysis.
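Timing analysis of the kind this session covers can be illustrated with the classic Liu-Layland rate-monotonic utilization test. The ECU task set below is entirely made up for illustration:

```python
def rm_schedulable(tasks):
    """Liu & Layland rate-monotonic utilization test: a set of n periodic
    tasks (worst-case execution time, period) is schedulable if total
    utilization stays below n * (2**(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return utilization <= bound

# (execution_ms, period_ms) for three hypothetical ECU control tasks
tasks = [(1.0, 10.0), (2.0, 20.0), (4.0, 40.0)]
print(rm_schedulable(tasks))  # True for this lightly loaded task set
```

This is a sufficient (not necessary) test; real automotive timing analysis also accounts for bus latency and blocking, but the utilization check is the usual first sanity pass.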

Majeed Ahmad is former Editor-in-Chief of EE Times Asia and author of six books about wireless and smartphones. His latest book The Next Web of 50 Billion Devices: Mobile Internet’s Past, Present and Future is about the Internet of Things and connected wearable devices.


Sidense NVM Scores Qualification on GLOBALFOUNDRIES 28nm SLP and HPP

by Tom Simon on 04-12-2015 at 7:00 am

A tremendous number of chips being designed for today’s products require some sort of onboard data storage. These needs range from a handful of bytes for trim and calibration storage to something much more substantial, like boot code storage. In both examples the storage ideally should be nonvolatile, with the option of writing during test and calibration, and possibly several more times over the life of the product. Furthermore, this capability should require no process changes such as special layers or masks.

Design teams have several choices for their on-chip storage requirements. The simplest is mask ROM, but it sacrifices several of the useful traits called for above. First, it must be made part of the mask set when the chip is designed, which limits its use for calibration, unique IDs, or microcode that might require updates. On the plus side, its useful life is extremely long, eliminating concerns about reliability.

At the other end of the spectrum is NAND flash memory. It often gets ruled out for on-chip use, first because it requires modifications to the process and additional masks. Also, because its ability to retain data relies on charge stored in a floating gate, it is prone to errors after repeated writing or even reading. NAND flash can therefore be a concern for applications that require high reliability.

For these and other reasons, another type of memory is frequently used for on-chip storage. One-time programmable (OTP) nonvolatile memory (NVM) offers many advantages for storing trim and calibration data, unique hardware addresses, encryption keys, and microcode. By using antifuse technology that selectively breaks down gate oxide, it avoids the localized physical damage that comes from ‘blowing’ fuses in conventional fuse technology. OTP memories have fast read times that can make them suitable for code execution, and the available storage can be managed to provide “few times programmable” functionality. Of course, they cannot compete with the re-write endurance of NAND flash.
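The “few times programmable” idea can be sketched in a few lines: burn each update into a fresh one-time-programmable slot, and read back the last slot burned. This is a conceptual model only, not Sidense’s actual implementation:

```python
class FewTimesProgrammable:
    """Sketch of 'few times programmable' behaviour built on one-time
    programmable storage: each update burns a fresh slot, and reads return
    the most recently burned slot.  The slot array stands in for real OTP
    bit cells, which can only go from unprogrammed to programmed once."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots  # None = still unprogrammed

    def write(self, value):
        for i, slot in enumerate(self.slots):
            if slot is None:           # first unburned slot
                self.slots[i] = value  # one-time, irreversible program
                return
        raise RuntimeError("all OTP slots consumed")

    def read(self):
        burned = [s for s in self.slots if s is not None]
        return burned[-1] if burned else None

cal = FewTimesProgrammable(num_slots=4)
cal.write(0x1A)   # factory calibration
cal.write(0x1B)   # field re-trim
print(hex(cal.read()))  # -> 0x1b
```

The number of usable updates is simply the number of slots reserved, which is why the technique trades capacity for limited re-programmability.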


Even though OTP NVM uses conventional CMOS process layers and masks, it needs to be qualified for a given process to ensure the antifuse devices work correctly during program and read operations. Sidense, a leading supplier of OTP NVM, has just announced that their SHF family is now qualified on GLOBALFOUNDRIES’s 28nm SLP and HPP processes. 28nm is fast becoming one of the most versatile and widely used process nodes, thanks to its relatively low cost and flexibility. The 28nm node is being used for a broad range of products, including networking, wireless, automotive and IoT.

Sidense says their SHF family is available on a wide selection of processes in sizes from 1 Kbit to over 1 Mbit, making it suitable for calibration data, encryption keys, ID tags and code storage. The family uses a so-called 1T, or one-transistor, bit cell to save space and simplify design. Sidense points to SHF family adoption in HDTV processors, PMICs, wireless chip sets, and communications and network processors.

Qualification of an OTP NVM architecture on a specific node is a significant undertaking that involves cooperation between the foundry and the OTP vendor, in this case GLOBALFOUNDRIES and Sidense. After the design of the bit cell, all the other OTP NVM supporting IP needs to be implemented on the target node. This includes the integrated power supply (IPS), which eliminates the need for external supply pins and routing for the programming voltage by generating the necessary voltage internally from the chip supply. On top of this there are addressing and interface blocks in RTL that support the OTP memory core.

Test chips were run at GLOBALFOUNDRIES and then characterized for performance and reliability. After silicon results verified the performance of the Sidense SHF family OTP NVM, the two companies announced completion of the qualification process. OTP NVM is one of those things that by itself does not garner a lot of attention. However, it is a key enabling technology for many growing applications of semiconductor products, and having OTP NVM available on new nodes is ultimately critical to the advancement of many end-user products.


TSMC Unleashes Aggressive 28nm Strategy!

by Daniel Nenni on 04-11-2015 at 10:00 pm

The most interesting presentation at the jam-packed TSMC Symposium last week for me was “Advanced Technology Updates” by Dr. BJ Woo. Coincidentally, I met with BJ during my last visit to Fab 12. Much of what we discussed was about TSMC being more aggressive this year but I wasn’t able to really connect the dots until her presentation. The example I will use here is 28nm but it certainly applies to all of the TSMC process nodes moving forward.

First let me tell you that BJ is engaging and a very credible semiconductor executive. She spent the majority of her 30-year career at Intel in Santa Clara designing both DRAMs and microprocessors (she has 13 patents). In 2009 BJ joined TSMC, taking responsibility for the advanced technology roadmap at 28nm and 20nm, and today she is Vice President of Business Development.

According to recent press releases and the resulting comments by analysts, who don’t know any better, other foundries are eating away at TSMC’s 28nm stronghold. Articles like that will get you lots of clicks but they are misleading. Remember, there are two versions of 28nm: gate-first and gate-last HKMG. Moving a TSMC gate-last 28nm design that is in production with 90%+ yield to a new gate-first process is absolute madness. Even moving a production design to a new gate-last process that is supposedly “T” compatible (UMC and SMIC) is risky. But of course it will happen because if you are negotiating a better price from one vendor you have to actually be in the position to use another vendor to even be at the negotiation table.

Having the best yielding process does not just give you the lowest cost, it also gives you better design margins, and that is the point TSMC made at the symposium. Today TSMC has five versions of 28nm: HP (high performance), HPM (high performance mobile), HPC (high performance computing), HPL (high performance low power), and LP (low power). Two additional processes were added: HPC+, an even faster version of HPC, and ULP, an ultra-low-power process for IoT and other battery-powered applications.

28HPC+ is more compact, with 9- and 7-track cell libraries versus 12- and 9-track for 28HPC. The design rules are the same, but it has better design margins, which offer 15% more performance. 28ULP looks a lot like the 55ULP and 40ULP processes already in production. Compared to the associated LP processes, ULP processes can further reduce operating voltages by 20% to 30% to lower both active and standby power consumption, resulting in a 2x-10x increase in battery life. IoT and wearable devices are of course the target applications for ULP processes.
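The power benefit of a lower supply voltage follows directly from the classic CMOS switching-power relation P = αCV²f. The numbers below are purely illustrative, not TSMC figures:

```python
def dynamic_power(c_eff_f, vdd, freq_hz, activity=1.0):
    """Classic CMOS switching-power estimate P = a * C * V^2 * f.
    All parameter values used below are illustrative placeholders."""
    return activity * c_eff_f * vdd ** 2 * freq_hz

base = dynamic_power(1e-9, 1.0, 100e6)
ulp = dynamic_power(1e-9, 0.7, 100e6)  # 30% lower supply voltage
print(ulp / base)  # roughly 0.49, i.e. about half the active power
```

Because active power scales with the square of Vdd, a 20-30% voltage reduction alone cuts switching power by roughly 36-51%, before counting the leakage savings that drive the quoted 2x-10x battery-life gains.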

The other big 28nm announcement BJ made is that TSMC 28nm is now qualified for automotive use, an industry first. Given the growth of electronics in our cars and the coming autonomous vehicles, this is a very big deal for sure.

In the same vein, BJ also talked about a new 16nm process coming called 16FFC, the C meaning compact. It is a more economical version of 16FF+ aimed at cost and power sensitive markets. Power is said to decrease by more than 50% and the pricing will be very competitive for mainstream markets.

Again, when I met with BJ she said TSMC would be very aggressive moving forward and she had a definite twinkle in her eye and now I know why. What a great year for the fabless semiconductor ecosystem, absolutely!

Also read: TSMC Processes Galore


Xilinx at NAB: Any Media Over Any Network

by Paul McLellan on 04-11-2015 at 7:00 am

The NAB (National Association of Broadcasters) show has just started, April 11-16th in Las Vegas. It covers a very broad range of topics:
As the premier trade association for broadcasters, NAB advances the interests of our members in federal government, industry and public affairs; improves the quality and profitability of broadcasting; encourages content and technology innovation; and spotlights the important and unique ways stations serve their communities.


That is a big range from content to technology. House of Cards to network interface cards.

Xilinx will be there. Programmable logic devices are a key component in a lot of video transmission technologies, being a good combination of flexibility, performance and power. When standards have not totally settled down, programmability is essential, but just using general purpose microprocessors and software consumes too much power, and simply is not high enough performance for many video applications anyway.

In fact Xilinx just launched its next generation of Video over IP connectivity solutions to address the industry’s transition to all IP-based networks. The transition to IP-based technologies is creating huge opportunities for cost savings, video production efficiency, and scalability, but its newness is also creating some confusion and hesitation among vendors in terms of protocol selection. A programmable Xilinx device in conjunction with an Ethernet PHY offers a firmware-upgradeable Video over IP platform that supports any media over any network, with cores and reference designs enabling fast time-to-market and low-risk deployment.

Xilinx defines and deploys Video over IP protocols for contribution and distribution networks with the provision of IP cores and reference designs. These cores encapsulate multiple compressed JPEG 2000 or MPEG transport streams, or uncompressed SDI streams onto 1Gb and 10Gb Ethernet IP networks, and offer optional Forward Error Correction (FEC) to recover lost packets and provide robustness in media transmission.
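As a rough sketch of this kind of encapsulation, the snippet below packs an MPEG transport stream into minimal RTP-style datagrams, with seven 188-byte TS packets per payload (a common choice). It is a toy model for illustration, not a full SMPTE 2022 implementation and not Xilinx's core:

```python
import struct

TS_PACKET_SIZE = 188   # MPEG transport-stream packet size, bytes
TS_PER_DATAGRAM = 7    # common choice: 7 TS packets per RTP payload

def packetize_ts(ts_stream, seq_start=0):
    """Toy encapsulation of an MPEG transport stream into RTP-style
    datagrams: a minimal 12-byte RTP header (version 2, payload type
    33 = MP2T) followed by seven 188-byte TS packets."""
    datagrams = []
    chunk = TS_PACKET_SIZE * TS_PER_DATAGRAM
    for i, off in enumerate(range(0, len(ts_stream), chunk)):
        header = struct.pack(
            "!BBHII",
            0x80,                       # V=2, no padding/extension/CSRC
            33,                         # payload type 33: MPEG-2 TS
            (seq_start + i) & 0xFFFF,   # sequence number, 16-bit wrap
            0,                          # timestamp (omitted in this sketch)
            0x12345678,                 # arbitrary SSRC identifier
        )
        datagrams.append(header + ts_stream[off:off + chunk])
    return datagrams

stream = bytes(TS_PACKET_SIZE * 14)  # 14 TS packets of zeroes
dgrams = packetize_ts(stream)
print(len(dgrams), len(dgrams[0]))  # 2 datagrams, each 12 + 1316 = 1328 bytes
```

The sequence numbers are what FEC and receivers use to detect lost packets, which is why the optional Forward Error Correction mentioned above can reconstruct them.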

Xilinx will be at NAB on booth N5616 demonstrating their video technology cores and full reference designs. Of course all these demonstrations run on Kintex or Zynq programmable platforms.

  • 6G & 12G SDI —this will showcase reference designs that enable developers to implement the latest SMPTE standards for SDI
  • HDMI—IP cores for both HDMI 1.4 Tx/Rx and HDMI 2.0 Tx/Rx
  • 4K Video Processing—the new Real-Time Video Engine reference design, featuring a motion adaptive deinterlacer, scaler, and OSD at 4K.
  • SMPTE ST 2059 & ST 2022—demonstration of the upcoming ST 2059 IP core
  • intoPIX TICO—the intoPIX TICO mezzanine compression provides up to 4:1 visually lossless compression.
  • Omnitek PCIe Streaming DMA Controller
  • NGCodec HEVC Encoder—this will implement an HEVC encoder using the HDMI IP cores.

It is not just Xilinx who will be there, but also their partners showing solutions based on Xilinx fabrics: CoreEL are at SU12203, Fidus at N4739, inrevium at N4739, intoPix at C8425a, Omnitek at 3114, Barco-Silex at C8427b and Pathpartner at SU10826.


Xilinx’s specialized broadcast page is here. NAB’s own webpage is here. If you are at NAB then come by booth N5616 and see what any media over any network means.


Cu-Pillar in Advanced Logic Devices

by Arabinda Das on 04-10-2015 at 7:00 pm

In 2001, flipchip with solder bump was already a dominant technology, replacing wire bonding as the main interconnection choice for a growing number of devices by offering fine-pitch interconnections for increased I/O counts. In the solder bump process, a bump is formed on the chip and on the package substrate and the two are connected by reflow. During the reflow the solder bumps collapse and do not retain their height, and the bumps occupy more space than the pitch of a pad on the chip allows. These problems frustrated two researchers at IBM, who came up with an alternative solution called copper pillars (US 6229220 B1).

Their main idea was a conducting post with two metal layers, in which the lower layer has a melting point higher than that of the upper layer. According to the teachings of the patent, the difference in melting temperature between the two materials must be greater than 20 °C. The lower layer is in contact with the chip substrate and the upper layer with the package substrate; the lower layer could be made of Cu while the upper layer consists of solder. This difference in melting temperature between the two layers is the main innovative concept of the patent, as it helps the conducting post retain its height during solder reflow. The patent also outlined an integration process to fabricate Cu-pillars, which came to be known as IBM’s Cu-pillar process and is widely used in the industry with minor variations. A few years later, around 2007, Intel introduced its own concept of Cu-pillars (US7276801B2), which differs slightly from the IBM process flow: the conductive pillar is encapsulated with a diffusion barrier as a protective layer.

Very quickly, many researchers started working on this concept and realized that the bump height, bump composition and bump pitch influenced the stress transferred to the underlying dielectrics, especially when the dielectrics were made of low-k materials. The stress comes from the differing thermal expansion coefficients of the Si substrate and the organic printed circuit board. This discovery led to major improvements in underfill materials and under-bump metallization (UBM). The semiconductor industry quickly recognized the advantages of the new technology: a bump pitch on the order of 50 µm could be achieved, opening up the possibility of higher connection density; the stand-off height was a definite edge over conventional C4 solder bumps and facilitated void-free underfill; and, biggest of all, copper pillars offer superior electrical and thermal conductance compared to solder.
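The first-order origin of that stress can be sketched from the CTE mismatch: the strain the bumps and low-k dielectric must absorb is roughly (α_board − α_die)·ΔT. The CTE and reflow numbers below are typical textbook values, not measurements of any specific package:

```python
def mismatch_strain(alpha_die, alpha_board, delta_t):
    """First-order thermal-mismatch strain (alpha_board - alpha_die) * dT
    that the bumps and underlying dielectric must absorb on cool-down."""
    return abs(alpha_board - alpha_die) * delta_t

ALPHA_SI = 2.6e-6   # silicon, ~2.6 ppm/K (textbook value)
ALPHA_PCB = 17e-6   # organic laminate, ~17 ppm/K (textbook value)
dT = 195.0          # cool-down from ~220 C reflow to room temperature

strain = mismatch_strain(ALPHA_SI, ALPHA_PCB, dT)
print(strain)  # ~2.8e-3, i.e. ~0.28% mismatch the interconnect absorbs
```

Even a fraction of a percent of strain, concentrated at the bump corners, is enough to crack brittle low-k films, which is why underfill and UBM improvements followed so quickly.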

At the 65 nm node, two companies introduced copper pillars: Intel (Intel Pentium 65 nm D 920 processor) and ST-Ericsson (ST-Ericsson 65 nm DB5730 Baseband Processor). Surprisingly, this technology did not take wing as the industry advanced to the next node. At the 45/40/32 nm technology nodes, several advanced logic processors flooded the market, but only two companies, Intel and Texas Instruments, employed the Cu-pillar integration scheme. A list of the 45 nm node devices analyzed at TechInsights is given below, along with a remark about their packaging process.

[TABLE] border=”1″
|-
| style=”width: 160px” | Company
| style=”width: 160px” | Device
| style=”width: 66px” | Node
| style=”width: 210px” | Packaging
|-
| style=”width: 160px” | Matsushita
| style=”width: 160px” | UniPhier System LSI
| style=”width: 66px” | 45 nm
| style=”width: 210px” | Au ball bond + Wire bonding
|-
| style=”width: 160px” | Intel
| style=”width: 160px” | Penryn Processor QX9650
| style=”width: 66px” | 45 nm
| style=”width: 210px” | Flipchip, Cu-Pillars
|-
| style=”width: 160px” | Apple/Samsung
| style=”width: 160px” | Applications Processor 3[SUP]rd[/SUP] Generation
| style=”width: 66px” | 45 nm
| style=”width: 210px” | Flipchip, Solder bump
|-
| style=”width: 160px” | Texas Instruments
| style=”width: 160px” | X4430SDCCBL OMAP4430 processor
| style=”width: 66px” | 45 nm
| style=”width: 210px” | Flipchip, Cu-Pillars
|-
| style=”width: 160px” | Sony / IBM
| style=”width: 160px” | CXD2992AGB, in the Sony PS3 Slim
| style=”width: 66px” | 45 nm
| style=”width: 210px” | Flipchip, Solder bump
|-
| style=”width: 160px” | AMD
| style=”width: 160px” | Quad-Core-Opteron
| style=”width: 66px” | 45 nm
| style=”width: 210px” | Flipchip, Solder bump
|-
| style=”width: 160px” | Freescale
| style=”width: 160px” | P2020PSE2KZA, Processor
| style=”width: 66px” | 45 nm
| style=”width: 210px” | Au ball bond + Wire bonding
|-
| style=”width: 160px” | Altera-TSMC
| style=”width: 160px” | Stratix IV GX, FPGA
| style=”width: 66px” | 40 nm
| style=”width: 210px” | Flipchip, Solder bump
|-
| style=”width: 160px” | ATI-Radeon-AMD-TSMC
| style=”width: 160px” | Graphics processor
| style=”width: 66px” | 40 nm
| style=”width: 210px” | Flipchip, Solder bump
|-
| style=”width: 160px” | AMD-Global foundries
| style=”width: 160px” | AD3850WNGX Processor
| style=”width: 66px” | 32 nm
| style=”width: 210px” | Flipchip, Solder bump
|-
| style=”width: 160px” | Panasonic
| style=”width: 160px” | MN2WS0150 Processor
| style=”width: 66px” | 32 nm
| style=”width: 210px” | Flipchip, Solder bump
|-
| style=”width: 160px” | Intel
| style=”width: 160px” | Clarkdale/Westmere
| style=”width: 66px” | 32 nm
| style=”width: 210px” | Flipchip, Cu-Pillars
|-

The situation is quite different for devices below the 30 nm node, where fewer device makers can employ this technology; beyond 20 nm, most devices are manufactured by TSMC, Samsung or Intel. Intel has continued to use copper pillars at every technology node since 65 nm. Samsung is still using solder bump technology even for its 20 nm logic devices, while TSMC has adopted Cu-pillars in its packaging modules, and no company is using wire bonding in its advanced logic devices.

[TABLE] border=”1″
|-
| style=”width: 160px” | Company
| style=”width: 160px” | Device
| style=”width: 78px” | Node
| style=”width: 174px” | Packaging
|-
| style=”width: 160px” | Xilinx-Kintex-TSMC
| style=”width: 160px” | 7XC 7 XC7K325T; HKMG
| style=”width: 78px” | 28 nm
| style=”width: 174px” | Flipchip, Solder bump
|-
| style=”width: 160px” | ATI-Radeon-TSMC
| style=”width: 160px” | HD7970 Graphics, HKMG planar
| style=”width: 78px” | 28 nm
| style=”width: 174px” | Flipchip, Solder bump
|-
| style=”width: 160px” | Nvidia-TSMC
| style=”width: 160px” | GK107 Graphics, HKMG planar
| style=”width: 78px” | 28 nm
| style=”width: 174px” | Flipchip, Solder bump
|-
| style=”width: 160px” | Mediatek-TSMC
| style=”width: 160px” | MT6592, HKMG planar
| style=”width: 78px” | 28 nm
| style=”width: 174px” | Flipchip, Cu-Pillars
|-
| style=”width: 160px” | Qualcomm-Samsung
| style=”width: 160px” | MDM9215,Poly planar
| style=”width: 78px” | 28 nm
| style=”width: 174px” | Flipchip, Solder bump
|-
| style=”width: 160px” | Intel
| style=”width: 160px” | i5-3550 Ivy Bridge, HKMG, FinFET
| style=”width: 78px” | 22 nm
| style=”width: 174px” | Flipchip, Cu-Pillars
|-
| style=”width: 160px” | Intel
| style=”width: 160px” | Valley View Atom Z3740, HKMG, FinFET
| style=”width: 78px” | 22 nm
| style=”width: 174px” | Flipchip, Cu-Pillars
|-
| style=”width: 160px” | Qualcomm-TSMC
| style=”width: 160px” | MDM9235, HKMG, planar
| style=”width: 78px” | 20 nm
| style=”width: 174px” | Flipchip, Cu-Pillars
|-
| style=”width: 160px” | Samsung
| style=”width: 160px” | Exynos 5430, HKMG planar
| style=”width: 78px” | 20 nm
| style=”width: 174px” | Flipchip, Solder bump
|-
| style=”width: 160px” | Intel
| style=”width: 160px” | Broadwell 5Y70, HKMG, FinFET
| style=”width: 78px” | 14 nm
| style=”width: 174px” | Flipchip, Cu-Pillars
|-

The copper pillars of two highly successful processes in the industry are shown below: Figures 1 and 2 show TSMC’s 28 nm node, and Figures 3 and 4 show Intel’s 22 nm node. The biggest difference between the two processes is that Intel’s 22 nm bond pad is made of Cu, while the bond pad of TSMC’s 28 nm device is formed in Al.

Figure 1: TSMC 28 nm, Cu-Pillar process on Al bond pad, showing the Cu-pillar pitch
Figure 2: TSMC 28 nm, Cu-Pillar process on Al bond pad, showing the Cu-pillar structure

Figure 3: Intel 22 nm, Cu-Pillar process on Cu bond pad, showing the Cu-pillar pitch

Figure 4: Intel 22 nm, Cu-Pillar process on Cu bond pad, showing the Cu-pillar structure

There are some similarities and differences between the two processes. The cross-sections show that TSMC’s process uses fairly vertical copper pillars compared to Intel’s. The ratio of Cu to solder is smaller in the Intel process than in the TSMC process, and Intel prefers a narrow neck and a broad shoulder. Both employ very relaxed pitches, probably to obtain a void-free underfill. The general process flow is the same, and the main steps are given below:

— Pattern the bond pads,
— Deposit the passivation layers on top of the bond pads
— Pattern the openings in passivation to expose the top surface of the bond pad.
— Deposit a polyimide layer on top of the passivation and on the exposed bond pad
— Pattern the polyimide to have an opening
— Deposit a barrier layer followed by a seed layer (Cu)
— Apply photo-resist and pattern to form a mold for the pillar
— Electro-deposit the pillar material using the seed layer as a nucleation site
— Cap the Cu-pillar with Ni to prevent oxidation and for adhesion with solder
— Deposit solder on top of the Cu-pillar
— Remove the photo-resist and pattern the barrier layers using the pillar as a mask

The differences are mainly in the geometrical aspects and are summarized in the table below:

Table 1: Cu-pillar dimensions and materials for TSMC and Intel

The Cu-pillar dimensions of these two processes are not at the frontier of what the technology can deliver in fine pitch or standoff height, but they are the industry’s forerunners. They are designed to dissipate heat effectively and to offer robust reliability for high-performance processors. Adoption of the Cu-pillar is inevitable for advanced logic devices because Cu-pillar technology is one of the key enablers of 3DIC integration. If Cu-Cu bonding becomes mainstream in the future, the solder cap on the Cu-pillars will eventually be eliminated. Cu-pillar bump technology is also needed for through-silicon vias (TSVs) and other advanced packaging methods.

That is why the most recent 20 nm DRAM device from Samsung employs Cu-pillars. Recently, Dr. Kevin Gibb of TechInsights blogged that Samsung’s latest DRAM is TSV-enabled (TechInsights – Samsung 20 nm DDR4 TSV Enabled DRAM). Adopting Cu-pillars is the first step toward TSV bonding. Companies will realize that Cu-pillars outperform solder bumps and will feel the need to adopt the process. More players running the same process will lead to a greater variety of Cu-pillar designs and lower manufacturing cost.


From Medical and Wearables to Big Data, in 日本語/한국어/中文

From Medical and Wearables to Big Data, in 日本語/한국어/中文
by Paul McLellan on 04-10-2015 at 7:00 am

Whether it’s a tiny always-on medical device or a secure cloud network processing Big Data, the Internet of Things (IoT) is bringing new challenges to IC design. Almost by definition, an IoT device contains a microcontroller of some sort along with some way of communicating. Unlike our smartphones, where we are reasonably happy if they last all day before requiring a charge, IoT systems, and thus the chips in them, have to last much longer. Current wearables such as the Fitbit last about a week, but some IoT devices are expected to run much longer, perhaps their whole lifetime, without a charge, or even to scavenge power from the environment. Under those constraints, every tiny bit of power is important. To make things worse, many of these devices are likely to be “always on,” meaning that at least a small part of the design must be permanently powered up to notice when something interesting happens. Both active power and static power are critical for maintaining long battery life, so optimizing for very low voltage as well as very low leakage is important. These devices typically operate well below 100MHz.
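A back-of-the-envelope duty-cycle calculation shows why static power matters as much as active power for an always-on device. All the currents and the battery capacity below are illustrative assumptions, not figures from the article:

```python
# Battery-life estimate for a duty-cycled "always on" IoT node.
# All numbers here are illustrative assumptions.
def battery_life_days(capacity_mah, active_ma, sleep_ma, duty_cycle):
    """Average the active and sleep currents by duty cycle,
    then divide the battery capacity by the average draw."""
    avg_ma = duty_cycle * active_ma + (1 - duty_cycle) * sleep_ma
    return capacity_mah / avg_ma / 24.0

# A 220 mAh coin cell, 5 mA when active, 5 uA asleep,
# with the radio/CPU active only 0.1% of the time:
life = battery_life_days(220, active_ma=5.0, sleep_ma=0.005,
                         duty_cycle=0.001)
```

With these numbers the 5 µA sleep current consumes about as much charge as the active bursts, and the cell lasts roughly two and a half years, so halving leakage buys nearly as much lifetime as halving active power.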


One of the big challenges under these conditions is having memory that works reliably. Most available memory IP is not optimized for these criteria, and most semiconductor foundries prefer to develop and manage the SRAM bitcells themselves. Because SRAM cells are limited by stability (static noise margin) and writability (write margin), the lowest operating voltage (VDDMIN) is carefully specified. Random threshold variations in nanometer-scale technologies have caused serious yield issues for low-VDD read/write operation of the typical 6T SRAM cell. Different cell topologies may improve SRAM stability at low operating voltages.

eSilicon has developed statistical simulation techniques to determine statistical failure probability distributions for low-voltage failure modes for various bitcell topologies. Bitcell read current, read-disturb margin, write margin, minimum data-retention voltage (MDRV), and leakage current are all thoroughly analyzed. These techniques enable quantification of actual failure rates as a function of critical process parameters, temperature, voltage, and design parameters. Effective design optimization for optimum VDDMIN, power, performance, and yield is enabled by these efficient simulation techniques. The result is a series of optimized SRAM architectures that operate below 100MHz, below 0.7V, and at one-fourth the leakage power of other available SRAM compilers at 65nm, 55nm, 40nm and 28nm technologies.

Associative lookup structures lie at the heart of many computing problems, and content-addressable memories (CAMs) provide fast, constant-time lookups over a large array of data (content keys) using dedicated parallel match circuitry. The two most common search-intensive tasks that use CAMs are packet forwarding and packet classification in Internet routers. For many of these applications, ternary CAM (TCAM) is an even more powerful primitive, able to search a large number of subspaces of a higher-dimensional space simultaneously, in one shot. eSilicon’s 14/16nm TCAM compiler offers 1 gigasearch/s under worst-case conditions with low-power search features.
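The ternary match a TCAM performs can be modeled in a few lines: each stored entry is a (value, care-mask) pair, and bits where the mask is zero are “don’t care.” This is a toy software model of the concept, not eSilicon’s compiler; real hardware evaluates every entry in parallel, while this sketch simply scans in priority order:

```python
# Toy ternary CAM model. Each entry stores (value, care_mask, result);
# bits where care_mask is 0 are wildcards. Hardware compares all
# entries in parallel; this model scans and returns the first
# (highest-priority) match, as a router's longest-prefix table would.
class TCAM:
    def __init__(self):
        self.entries = []

    def add(self, value, care_mask, result):
        self.entries.append((value, care_mask, result))

    def lookup(self, key):
        for value, mask, result in self.entries:
            if (key ^ value) & mask == 0:
                return result
        return None

# A tiny forwarding table over 8-bit "addresses":
t = TCAM()
t.add(0b10100000, 0b11110000, "port A")  # matches 1010xxxx
t.add(0b10000000, 0b11000000, "port B")  # matches 10xxxxxx
```

A key such as `0b10101111` hits the more specific first entry, while `0b10010000` falls through to the broader second entry, mirroring longest-prefix-match packet forwarding.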


eSilicon has created a white paper, From Medical and Wearables to Big Data: Differentiated IP for the IoT Spectrum, available here.

eSilicon also recently created a webinar on this topic, focusing on ultra-low-power and ultra-low-voltage memory solutions. It was by far the most popular webinar eSilicon has done, drawing interest from all over the globe. When the replay went up, interest again came from all over the globe (see the map below).

eSilicon then did something it had never done before: it presented the webinar three more times, in Japanese, Korean and Chinese. So you can watch:

  • In English with Lisa Minwell, Senior Director of IP Marketing, eSilicon
  • In Japanese 日本語 with Zenda Nguyen, Program Manager, IP Business Unit, eSilicon Vietnam
  • In Korean 한국어 with Taeho Kim, Country Manager & GM, eSilicon Korea
  • In Chinese 中文 with Kar Yee Tang, IP Product Marketing Manager, eSilicon


All four videos are available for replay here.


ANSYS Enters the League of 10nm Designs with TSMC

ANSYS Enters the League of 10nm Designs with TSMC
by Pawan Fangaria on 04-09-2015 at 7:00 pm

The way we are seeing technology progress these days is unprecedented. Just about six months ago, I wrote about the intense collaboration between ANSYS and TSMC on the 16nm FinFET-based design flow, with TSMC certifying ANSYS tools for its 16nm FF+ technology and conferring on ANSYS a “Partner of the Year” award. Read “ANSYS Tools Shine at FinFET Nodes!”. Just before that, Intel also certified ANSYS tools for its 14nm Tri-gate process, as described in another article, “Intel & ANSYS Enable 14nm Chip Production”. And this week, TSMC has certified ANSYS Power Integrity and Electromigration (EM) solutions for its 10nm FinFET process node. It’s amazing progress! Read the press release here.

The ANSYS portfolio of products was showcased at the TSMC Technology Symposium held in San Jose, California on April 7, 2015. ANSYS’ RedHawk and Totem were certified by TSMC for the 10nm FinFET DRM and SPICE models. These tools were certified to provide static and dynamic voltage-drop analysis and advanced signal and power EM verification, which are required for ultra-low-power, high-performance SoC designs at 10nm for mobile, computing and networking applications.

At the 10nm process node, devices are left with extremely low noise and reliability margins, and the FinFET structure is particularly prone to self-heating.

As shown in the picture, heating happens at the device (FEOL) as well as the interconnect (BEOL) level, so both need to be considered. At sub-28nm process nodes, current density increases with each successive node, making devices increasingly vulnerable to EM. In a FinFET, the current density can be roughly 25% higher than in a planar transistor. Also, the narrow 3D fin structure and the lower thermal conductivity of the SiO2-dominated substrate can trap heat locally.
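The sensitivity of EM lifetime to current density and temperature is commonly captured by Black’s equation, MTTF = A · J⁻ⁿ · exp(Ea/kT). A quick sketch shows the combined effect of a 25% higher current density and a little extra self-heat; the activation energy, exponent, and temperatures are illustrative values, not TSMC or ANSYS data:

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

def mttf_relative(j_ratio, temp_k, temp_ref_k, n=2.0, ea_ev=0.9):
    """Black's equation, MTTF = A * J^-n * exp(Ea / kT),
    normalized to a reference condition so the prefactor A cancels.
    Parameter values here are illustrative, not foundry data."""
    arrhenius = math.exp(ea_ev / (K_BOLTZMANN_EV * temp_k)
                         - ea_ev / (K_BOLTZMANN_EV * temp_ref_k))
    return j_ratio ** (-n) * arrhenius

# 25% higher current density plus 10 K of extra self-heat,
# versus a planar reference device at 358 K:
rel = mttf_relative(j_ratio=1.25, temp_k=368.0, temp_ref_k=358.0)
```

With these assumed parameters the relative lifetime drops to roughly 30% of the reference, which is why EM analysis at FinFET nodes has to be thermal-aware rather than assuming a uniform chip temperature.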

With such tough challenges and extremely tight window of accuracy, it’s critical to ensure power integrity across the chip, package and board. And an accurate EM analysis at all levels is a must. There are some key critical enhancements added into ANSYS tools to provide the kind of accuracy and versatility needed for the EM, power integrity and reliability solution at 10nm.

To support multi-patterning technology, ANSYS solution provides color-aware resistance extraction and EM analysis capability. And there is a complete system-to-block level EM analysis flow with color-aware metal-fill capability that delivers higher yield and performance along with accurate EM analysis.

To address the increasing difference in current between signal and power rails, the ANSYS solution provides various approaches for applying appropriate EM rating factors to signal and power analysis. At 10nm, there can be a discrepancy between the drawn trapezoidal shape of a wire and its physical implementation in silicon. ANSYS provides a comprehensive wire-width adjustment solution to compensate for the difference, leading to more accurate EM analysis results.

The ANSYS solution also provides a thermal-aware EM methodology. The diagram above shows the thermal-aware EM flow at TSMC for the 16nm FF+ process node, which uses RedHawk, Totem and Sentinel-TI. RedHawk/Totem together with Sentinel-TI uses foundry data to accurately compute the self-heat temperature of an IP or SoC. The temperature can be analyzed on a per-instance or per-metal-layer basis. A Chip Thermal Model (CTM) is generated for back-annotation into RedHawk or Totem. This methodology helps avoid overheating of the device, thus increasing its lifetime and reliability.

With the increasing complexity and size of SoCs at lower nodes, the challenge of managing capacity, performance, and parasitic effects also increases. RedHawk/Totem uses a novel Distributed Machine Processing (DMP) capability that can handle a large power delivery network (PDN) and perform flat simulation with high performance and a small memory footprint. RedHawk-CPA provides chip-package co-simulation and co-analysis within a unified environment, ensuring the integrity of power delivery across the complete chip and accounting for the impact of package parasitics, thus avoiding undesired hotspots.

The overall solution provided by ANSYS delivers the highly accurate results needed at the 10nm FinFET node and also reduces design turnaround time through its innovative methodology, algorithms, and multi-physics simulations. The Power Integrity and EM solutions are ready for early 10nm FinFET design starts. On earlier technologies, the ANSYS solution for SoC/IP power integrity, noise, and reliability sign-off has been proven in thousands of successful silicon wins.


Starvision Pro: Lattice Semiconductor’s Experience

Starvision Pro: Lattice Semiconductor’s Experience
by Paul McLellan on 04-09-2015 at 7:00 am

During SNUG I took the opportunity to chat to Choon-Hoe Yeoh of Lattice Semiconductor about how they use Concept Engineering’s Starvision Pro product. He is the senior director of EDA tools and methodologies there.

Lattice Semiconductor is a manufacturer of low-power, small-footprint, low-cost programmable logic devices. Earlier this month it closed an acquisition of Silicon Image, a leading provider of multimedia connectivity solutions and services for mobile, consumer electronics and PC markets based in Sunnyvale, CA.

One of their products is the world’s smallest, lowest-power, most integrated and most flexible mobile FPGA, with up to 4,000 LUTs and key IP for IR, barcode, voice, USB-C, user ID, LEDs, pedometer, and more. Perfect for the IoT and mobile markets!

StarVision Pro provides engineers with the ability to quickly and easily understand and debug mixed-mode designs and to integrate IP building blocks into their complex SoCs and ICs. Due to the increasing use of building blocks in SoC design, engineers need to work at different design levels (RTL, gate, transistor, analog, parasitic) as well as with different design languages and netlist formats.

Choon is responsible for design enablement at Lattice, including tools, methods, flows, PDKs, license queuing and so on. Lattice has been using Starvision Pro for a couple of years, primarily for better visualization of chip-level designs. These are difficult to work with in a “classical” schematic tool since they are a mixture of actual gate-level and transistor-level schematics along with Verilog. Starvision Pro improves productivity in chip-level design debugging because it gives designers system-level visibility, which is important since FPGA design mixes full-custom and RTL across a number of different variants of the flow.

Choon expects to expand the use in the future, and is looking at various ways that different products and flows could benefit.

Here in a single table is a concise summary of the features of Starvision Pro.

[TABLE]
|-
| style=”text-align: center” | Features
| style=”text-align: center” | Benefits
|-
| Ultra fast HDL reader and graphics on the fly
| Graphical representations make it easier to understand, debug, change and optimize Verilog, VHDL and SystemVerilog code
|-
| Schematics from SPICE netlists
| Schematics provide easier and faster debugging for complex circuits. Supported dialects include SPICE, HSPICE, Spectre, Calibre, CDL, Eldo and PSPICE.
|-
| 32/64-bit database
| Higher performance and increased capacity, for very large designs
|-
| Powerful GUI
| Multiple views, including tree, schematic, waveform and source file plus drag and drop between different views for increased circuit understanding
|-
| Cone Window
| Incremental schematic navigation for easy design exploration
|-
| Tcl UserWare API
| Allows interfacing with tool flow and definition of electrical rule checks
|-
| Circuit fragment save
| Circuit netlists can be saved as SPICE files or Verilog files for future reuse as IP, or for partial simulation
|-
| Automatic clock tree and clock domain extraction and visualization
| Faster detection and resolution of clock domain problems
|-
| Full support for mixed language and mixed-signal designs
| Designers can easily develop and debug today’s most complex heterogeneous designs (SystemVerilog, Verilog, VHDL, SPICE, HSPICE,…)
|-
| Parasitic analysis features
| Allows visualization and analysis of parasitic networks (DSPF, RSPF, SPEF) and provides capabilities to create SPICE netlists for critical circuit fragment simulation.
|-

Lattice Semiconductor’s website is here. Concept Engineering’s page on Starvision Pro is here.


Archives from TI’s Baseband Glory – Part 1

Archives from TI’s Baseband Glory – Part 1
by Majeed Ahmad on 04-08-2015 at 7:30 pm

In 1992, nearly two years after Britain’s Acorn Computers joined hands with Apple and VLSI to create Advanced RISC Machines or ARM, the semiconductor upstart landed its first major licensing breakthrough. In retrospect, while Apple’s Newton handheld computer had played a key role in creating the ARM venture, Texas Instruments Inc. was the most important early licensee that ARM had snagged.


ARM7 was integrated with TI’s DSP for baseband in Nokia phones

ARM’s founding CEO Robin Saxby later acknowledged that it was really TI’s license that put ARM on the semiconductor map. TI, then the sixth-largest chipmaker in the world, was a licensing coup for ARM because it offered an entree into the vast market for embedded control in the automotive industry.

Meanwhile, TI in Europe, which was working closely with Nokia on mobile phone chips, saw potential in ARM’s lean CPU and brought Nokia into the ARM fold. Nokia wasn’t happy with the code density of the ARM7 processor, so ARM developed Thumb, an alternative 16-bit instruction set that addressed the code-density issue. The collaboration between ARM, Nokia and TI eventually led to the Thumb-capable ARM7TDMI, introduced in 1994, which Nokia used in its 6110 GSM phone.
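The code-density argument behind Thumb is simple arithmetic: ARM instructions are 4 bytes wide while Thumb instructions are 2 bytes, though a Thumb routine typically needs somewhat more instructions for the same work. A sketch with invented instruction counts (the widths are real, the counts are illustrative):

```python
# Rough code-density comparison between the 32-bit ARM encoding and
# the 16-bit Thumb encoding. Instruction widths are factual; the
# instruction counts below are invented for illustration, since Thumb
# usually needs extra instructions to express the same computation.
ARM_BYTES_PER_INSN = 4
THUMB_BYTES_PER_INSN = 2

def code_size(n_insns, bytes_per_insn):
    return n_insns * bytes_per_insn

arm_size = code_size(100, ARM_BYTES_PER_INSN)      # 400 bytes
thumb_size = code_size(125, THUMB_BYTES_PER_INSN)  # 250 bytes
saving = 1 - thumb_size / arm_size                 # 37.5% smaller
```

Even allowing 25% more instructions, the Thumb version is well over a third smaller, which mattered enormously when on-chip memory dominated the cost of a phone baseband.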


Robin Saxby: TI license was a powerful endorsement for ARM technology

Thumb provided a low-power 16-bit instruction encoding that took less space, memory, and energy than competing core architectures. The ARM processor core was integrated with TI’s DSP in a baseband solution used in Nokia’s 6110 handset, which became hugely successful during GSM’s early take-off. The 6110 design win gave TI the ability to push its chips into other mobile phones, and that gave ARM market backing for its processor architecture.


Nokia’s 6110 handset provided enormous boost to ARM and TI

By the late 1990s, ARM had a share of nearly 97 percent of that rapidly growing market. The only two major cellular phone makers that didn’t use ARM cores were Hitachi and Siemens. Eventually, TI became not only ARM’s single largest licensee but also went on to gain a 60 percent market share in mobile phone chips. TI had just about sewn up the mobile handset silicon market by devoting vast engineering resources to Nokia for the development of platforms based on its chips.

It was during this time that TI began to focus on its DSP technology for other electronic products such as modems, PC peripherals and television sets. In 1994, for instance, TI launched a multimedia processor, the first single-chip solution that combined parallel DSP and RISC parts. The confidence that came with the baseband triumph was now branching into other semiconductor markets.

Majeed Ahmad is the author of Nokia’s Smartphone Problem: The End of an Icon?