3D NAND – Moore’s Law in the third dimension

by Scotten Jones on 05-07-2016 at 4:00 am

For more than a decade, 2D NAND has been the leading driver of lithography shrinks. Samsung, for example, went from 120nm in 2003 to 16nm in 2014, with shrinks on an almost yearly basis. But the shrinks came at a price. At 16nm, Self-Aligned Quadruple Patterning (SAQP) was required for the most critical layers, and patterning-related costs, including the deposition and etch steps for multi-patterning, grew to represent nearly two thirds of the cost of the wafer fabrication process. At the same time, device-related issues were a growing problem: adjacent cell interference, maintaining control gate to floating gate coupling, and the shrinking number of electrons per cell are just a few of the many issues.

In 2014 Samsung introduced the first 3D NAND part. Instead of horizontal strings of memory cells, Samsung turned the strings on end into the vertical direction. The basic process flow can be broken up into three major segments:

  • CMOS – this is the peripheral circuitry that drives and controls the memory array.
  • Memory Array – the area where the values are stored.
  • Interconnect – connects the memory array and CMOS together.

    The CMOS and Interconnect are similar to the 2D NAND process but the memory array formation is completely different. The memory array fabrication is as follows (Samsung TCAT process):

    • Alternating layers of silicon dioxide and silicon nitride are deposited.
    • Channel hole etch – the channel opening is etched down through all of the oxide/nitride layers.
    • Channel fill – an epitaxial layer is grown in the bottom of the channels and then the channel is filled with polysilicon and oxide to create a “macaroni channel” (a tube of polysilicon filled with oxide).
    • Stair Step Formation – a thick photoresist layer is applied and patterned, one set of oxide/nitride pairs is etched and then the photoresist pattern is shrunk and the next pair of oxide/nitride layers is etched. This sequence is repeated to create a stair step structure at the edge of the array. Ideally this is done with a single mask but in practice multiple masks are required.
    • Planarize – a thick oxide layer is now deposited and planarized.
    • WL Slot – a word line slot mask is applied and a slot is etched down through all of the oxide/nitride layer pairs.
    • Gate Formation – the nitride layers are now etched out through the word line slot. A gate stack of silicon dioxide, silicon nitride, aluminum oxide, tungsten and tantalum nitride is then deposited and etched back, and finally the slot is filled with oxide and tungsten. This is a gate last process; other companies use a gate first process.

    There are a number of advantages to this process:

  • The lithography requirements are relaxed because the cell “length” is set by the depositions. All of the memory array patterns are done with single patterning.
  • The number of cells in a vertical string can be scaled up by depositing more layers. In theory you can add layers without needing any additional masks although the stair step formation may require some additional masks. In theory the whole memory array is fabricated with only three masks although in practice more are required.
  • The memory cells are bigger and hold more electrons.
  • Speed, endurance and other critical performance characteristics are all improved versus 2D NAND.

    With 2D NAND we saw memory density improve from 0.006 Gb/mm² at 120nm to 1.1 Gb/mm² at 16nm for a 3 bit per cell memory cell. In 2014 Samsung introduced a 24 layer 3D NAND part with 0.97 Gb/mm² for a 2 bit per cell part, in 2015 a 32 layer 3 bit per cell part with a density of 1.86 Gb/mm², and in 2016 a 48 layer 3 bit per cell part with 2.62 Gb/mm². 3D NAND has already far surpassed the highest memory density of 2D NAND, and it is expected that layers will continue to be added until parts with over 100 layers and more than 1Tb per part are introduced. In fact, we forecast that a 128 layer, 4 bit per cell part will be produced around 2020 with 8.67 Gb/mm².
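The density figures above can be put side by side with a few lines of arithmetic. The sketch below only reuses the numbers quoted in the text; the helper names (`ratio`, `per_layer_bit`) are illustrative and not from the original article.

```python
# Density figures quoted in the text, in Gb/mm^2. No new data is introduced.
nand_2d = {"120nm": 0.006, "16nm": 1.1}
nand_3d = [
    # (year, layers, bits per cell, density in Gb/mm^2)
    (2014, 24, 2, 0.97),
    (2015, 32, 3, 1.86),
    (2016, 48, 3, 2.62),
]


def ratio(new, old):
    """Return how many times larger `new` is than `old`."""
    return new / old


def per_layer_bit(density, layers, bits):
    """Density normalized by layer count and bits per cell (illustrative metric)."""
    return density / (layers * bits)


print(f"2D gain, 120nm to 16nm: {ratio(nand_2d['16nm'], nand_2d['120nm']):.0f}x")
for year, layers, bits, density in nand_3d:
    print(
        f"{year}: {layers} layers, {bits} bit/cell, "
        f"{ratio(density, nand_2d['16nm']):.2f}x the 16nm 2D density, "
        f"{per_layer_bit(density, layers, bits):.4f} Gb/mm^2 per layer per bit"
    )
```

Normalizing by layers and bits per cell is just one way to separate the gain from stacking from the gain from storing more bits per cell; it is not a metric used in the article.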

    3D NAND is not without its challenges. As the number of layers increases, it may not be possible to etch and fill through the entire stack, and the stack may need to be built in two steps, where half the stack is deposited and patterned and then the other half is deposited and patterned. The relatively low mobility of the polysilicon channel may also become limiting, and IMEC has already demonstrated InGaAs as a channel material.

    See my article on IMEC’s work here.

    Another interesting innovation in 3D NAND was disclosed by Intel and Micron at IEDM last year where they fabricate part of the peripheral CMOS under the memory array. The combination of CMOS under the memory array and a denser array enabled Intel-Micron to achieve a 22% density advantage over Samsung for a 32-layer device.

    See my article on the Intel-Micron disclosure here.

    Of course, no technology succeeds in the semiconductor industry unless it is economical. The switch to 3D NAND has changed the cost paradigm from patterning dominated to deposition and etch dominated. In fact, I estimate that patterning costs make up less than one third of the total fabrication process for Samsung’s 32-layer device (one double patterned layer for interconnect). Some analysts claim that 48 layers is the breakeven technology versus 16nm 2D for bit cost; I disagree. 3D and 2D wafer fabrication costs are similar, although with different cost drivers. 3D NAND has much higher bit density but, to date, poor yield due to the challenges of patterning the memory stack. My modeling is that Micron’s 32-layer part is 25% less expensive per bit than their 16nm 2D NAND after factoring in yield. This is consistent with statements from Micron. Furthermore, Micron has shown a generation 2 part that they say will provide an additional 30% cost reduction over generation 1, also consistent with our modeling.
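To see what the two quoted reductions imply together, here is a small sketch. It assumes the 30% generation 2 reduction is measured against the generation 1 cost per bit, so the two reductions compound; the function name and the normalization of 16nm 2D NAND to 1.0 are illustrative choices, not figures from the article.

```python
def apply_reductions(base_cost, reductions):
    """Apply successive fractional cost reductions, each relative to the previous cost."""
    cost = base_cost
    for reduction in reductions:
        cost *= 1.0 - reduction
    return cost


# Cost per bit of 16nm 2D NAND normalized to 1.0 (assumption for illustration).
cost_2d = 1.0
cost_gen1 = apply_reductions(cost_2d, [0.25])        # 32-layer part, 25% lower
cost_gen2 = apply_reductions(cost_2d, [0.25, 0.30])  # further 30% below generation 1

print(f"Generation 1 relative cost per bit: {cost_gen1:.3f}")
print(f"Generation 2 relative cost per bit: {cost_gen2:.3f}")
```

Under that assumption, generation 2 lands at roughly half the 2D cost per bit.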

    In conclusion, 3D NAND has overcome the limitations of 2D NAND, providing lower cost and better performance with a scaling path into the next decade.


    One FPGA synthesis flow for different IP types
    by Don Dingee on 05-06-2016 at 4:00 pm

    Both Altera and Xilinx are innovative companies with robust ecosystems, right? It would be a terrible shame if you located the perfect FPGA IP block for a design, but couldn’t use it because it was in the “wrong” format for your preferred FPGA. What if there were a way around that?

    There is a compelling argument to use each FPGA vendor’s tool that delivers synthesis results optimized for their particular FPGA. However, that can be a limiting factor in a design with numerous IP blocks. Constraining the IP search to only the FPGA’s vendor ecosystem may artificially rule out what might be the best option for differentiating a design. Imagine what would happen if your firm is looking at acquiring another firm, and you discover they work in the other FPGA environment. “Oh, dang it, we can’t buy them ….” Probably not a good response.


    It makes a lot more sense to be ready for any FPGA IP that comes your way. I’d also like to challenge the assumption that a third-party FPGA synthesis tool can’t deliver the same or better quality of results – QoR is a function of the entire design, after all the IP is comprehended. Synopsys Synplify Premier is designed to handle both Altera and Xilinx IP, working with the various formats and constraints, and deliver better synthesis results.

    Paul Owens, Sr. Corporate Applications Engineer for Synopsys, points out in a recent webinar that there are two broad categories of FPGA IP: interface and datapath. Interface IP often has vendor-specific physical constraints and non-timing constraints, while datapath IP has associated timing constraints. There is also the possibility that vendor IP is in one of several formats. Life is good if everything is in readable RTL source, but Altera IP is often encrypted RTL, and Xilinx IP also comes in DCP (Design Checkpoint) and BD (Block Design) formats.

    Readable RTL can be added in what Owens calls absorb mode. The entire IP is read in, the synthesis engine gets to optimize timing paths and logic in and around the IP, and the final netlist contains the IP netlist in entirety.

    What if IP is encrypted? Synplify Premier also handles a white or grey box method. Timing models for the IP are read in, and the synthesis engine optimizes around it but doesn’t modify the IP itself. These IP blocks typically have more complex constraints, which Synplify Premier imports for synthesis of surrounding logic while preserving the original constraints for place & route of the encrypted block.

    Constraints can make or break a synthesis cycle. Writing and importing constraints is a hugely important step in achieving QoR. About half of Owens’ presentation is devoted to dealing with constraints and achieving QoR, even while working with the disparate IP types. A noteworthy observation is that Synplify Premier works with the Xilinx Vivado place & route engine for congestion improvement, and a similar capability is in development for Altera.


    This webinar provides one of the most packed yet to-the-point descriptions of Synplify Premier capability I’ve seen. Owens discusses the benefits of parallel place & route on a server farm, which can significantly speed up the overall synthesis process. He also touches on the debug process with the Identify debugger, offering a unified environment and one look and feel regardless of which side the FPGA IP came from.

    To view the entire webinar (registration at TechOnline):
    Accelerate your FPGA Design Schedules with Synplify Premier

    I’ve called the concept of free the “f-word” of technology marketing: operating systems, EDA tools, it’s all the same argument. There are times in more advanced scenarios where paying for a tool delivers better results. FPGA synthesis on big projects with disparate IP and complex constraints is one of those scenarios where productivity and QoR gains are worth the investment in a tool like Synopsys Synplify Premier.


    Neural nets for Qualcomm Snapdragon

    by Bernard Murphy on 05-06-2016 at 12:00 pm

    Neural nets are hot these days. In this forum certainly you can’t swing a cat without hitting multiple articles on the topic – I’ve written some myself. For me there are two reasons for this interest. First, neural nets are amazingly successful at what they do, for example in image recognition where they can beat human observers in accuracy and response time. More subtly, they have changed the way we look at some aspects of artificial intelligence, from mathematical models to biological models.

    With the benefit of hindsight this shouldn’t be surprising. If we want to mimic the behavior of say the visual cortex, starting with a low-level model of how the brain actually works (interconnected neurons with connectivity weights trained through learning) seems like a better bet than a high-level algorithmic abstraction of how we think vision works. You lose the benefit of understanding the process, but the effectiveness of the result is more important in this case than scientific insight.

    The way neural nets work is maybe easiest to understand in image recognition. First an image is broken up into small regions. Pixels within each region are tested against a function to detect a particular feature such as a diagonal edge. The function is simple: a weighted sum of the inputs, checked against a threshold to determine if the output should trigger. Other feature tests (e.g. for color) can then be performed, but I’ll skip that complication.
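A minimal sketch of that weighted-sum-and-threshold test, applied to a 3x3 image region. The weights, threshold and patches are hand-picked for illustration (a real net learns its weights in training), and the function name `neuron` is hypothetical.

```python
def neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


# Hand-picked weights for a 3x3 region, flattened row by row (assumption):
# pixels on the main diagonal count for the feature, all others count against it.
diagonal_weights = [
     1, -1, -1,
    -1,  1, -1,
    -1, -1,  1,
]

diagonal_patch = [1, 0, 0,
                  0, 1, 0,
                  0, 0, 1]  # bright pixels along the diagonal
flat_patch = [1, 1, 1,
              1, 1, 1,
              1, 1, 1]      # uniformly bright region, no edge

print(neuron(diagonal_patch, diagonal_weights, threshold=2.5))  # weighted sum is 3
print(neuron(flat_patch, diagonal_weights, threshold=2.5))      # weighted sum is -3
```

The diagonal patch triggers the detector and the flat patch does not, because the off-diagonal pixels pull the weighted sum below the threshold.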

    Outputs are fed into a second layer. The same process repeats, this time with a different set of functions which extract slightly higher level details from the first-level. This then continues through multiple layers until the final outputs provide a high-level characterization of the recognized object. The weighted sums at the core of this method can be modeled very nicely in a DSP or GPU, which is convenient because the Snapdragon 820 core offers both and can perform this modeling with low power consumption.
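The layering can be sketched the same way: each layer is a set of weighted-sum-and-threshold units, and the outputs of one layer become the inputs of the next. The toy below uses hand-set weights (an assumption for illustration; real nets learn them) so that the first layer extracts two simple features and the second layer combines them into an "exactly one input is on" result.

```python
def layer(inputs, weight_rows, thresholds):
    """Evaluate one layer: one weighted sum and threshold check per output unit."""
    return [
        1 if sum(x * w for x, w in zip(inputs, row)) >= t else 0
        for row, t in zip(weight_rows, thresholds)
    ]


def forward(inputs, layers):
    """Feed the inputs through each layer in turn."""
    activations = inputs
    for weight_rows, thresholds in layers:
        activations = layer(activations, weight_rows, thresholds)
    return activations


# Hypothetical two-layer net with hand-set weights.
network = [
    # Layer 1: unit 0 fires if either input is on, unit 1 only if both are on.
    ([[1, 1], [1, 1]], [1, 2]),
    # Layer 2: fires if "either" is on but "both" is not.
    ([[1, -1]], [1]),
]

for pattern in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pattern, "->", forward(pattern, network))
```

No single threshold unit can compute this result on its own, which is a small-scale illustration of why stacking layers extracts higher-level detail.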

    Setting the weights requires a training phase. Once a net has been trained it can be used to distinguish between the classes of objects on which it has been trained – road signs for example. Within their training domain, such neural nets have been shown to achieve 99% or better recognition accuracy in real time.

    A great place to deploy this capability is in mobile systems, because that removes the need to go to the cloud for complex processing. Qualcomm recognized this and has just announced a software development kit to be used with the Snapdragon 820 (the processor at the heart of the Samsung Galaxy S7 and other phones) to enable neural net processing. This Snapdragon Neural Processing Engine SDK is powered by the Qualcomm® Zeroth™ Machine Intelligence Platform and is optimized for Snapdragon 820.

    Just some of the places this capability can be used are smartphones, security cameras, cars and other platforms, for scene detection, text recognition, object tracking and avoidance, gesturing and natural language processing. Think about upcoming electric vehicles, game-stations and “remoteless” home entertainment centers: all of these will be enabled by this kind of technology.

    In many applications, untethering from the cloud is not a nice-to-have. You don’t want collision-avoidance dependent on whether you have line of sight to a cell tower (despite Verizon claims to the contrary, this is not universal). Or be at the mercy of heavy loads on cloud servers. And you don’t want security checks like facial or iris recognition on your phone farmed out to the cloud for similar reasons. Not to mention that man-in-the-middle attacks are an obvious weakness in cloud-based security.

    Thanks to programs like this, we can look forward to much more safety, security and other intelligent usefulness in mobile devices in the near future. You can learn more about the Qualcomm offering HERE.



    Seven Reasons to Attend DAC in Austin

    by Daniel Payne on 05-06-2016 at 7:00 am

    I’m attending the 53rd Design Automation Conference (DAC) in Austin, Texas starting June 5th, and there are at least seven reasons that you should consider attending as well. For decades now DAC has been the premier place for all the players in our semiconductor ecosystem to get together: academics, commercial EDA vendors, foundries, semiconductor IP providers, and media.

    1. Keynotes
    I like to hear about the big picture from industry luminaries, and this year we get to hear from people at NXP Semiconductors, NVIDIA Corporation, Advanced Micro Devices and the University of Texas at Austin. Recall that NVIDIA just announced a 15 billion transistor GPU, the Tesla P100. NXP Semi is well-known in the automotive, IoT and security markets.


    The keynotes and Sky talks are here.

    2. Exhibitors
    As a blogger most of my time will be spent visiting some of the 175 exhibits to find out what’s new with EDA software, how it compares to previous releases, and who is winning against competitors.

    Here’s the online exhibitor list.

    3. Training
    Did you know that at DAC there is a special day just for training? Yes, on Thursday there are interesting training topics that you can sign up for like:

    • How to Build Class-Based Verification Environments in SystemVerilog
    • Learn UVM using the Easier UVM Coding Guidelines and Code Generator
    • SystemVerilog Synthesis Tuned for ASIC and FPGA Design
    • The Definitive Guide to SystemC TLM-2.0: Learn the Technology Standard that Underpins Virtual Platforms
    • Introduction to Embedded Security: Making Security Hard: Hardware Security and How to Use it
    • Introduction to Embedded Linux Security: Keys to Understanding Vulnerabilities in Embedded Systems and How to Secure Them
    • Taking Your C++ to the Next Level
    • Finding Creative Solutions to Complex Problems
    • Maximizing Mental Agility

    Sign up for training classes here.

    4. Networking
    Let’s admit it, reunions are grand fun, and staying connected to coworkers who have landed at various places over the years can also benefit your career path. Each evening of DAC you can network with other semiconductor professionals over cocktails from 6:00-7:00PM. There’s even a Technology Art Show that opens up the creative side of silicon chips and other techno-art.

    5. DAC Tracks
    We live in an era of specialization, and so DAC has organized into multiple tracks based upon your specific interests:

    6. Video Casts
    Since DAC has so much activity going on simultaneously, you cannot be in all places at once, so throughout DAC they will be recording some portions for viewing later. They call this DACtv and you can bookmark this page and return to get caught up a bit. Here’s a glimpse of what a DACtv video looks like:

    7. Customers
    One important part of business is getting to know what your customers really want, so with all of the top EDA and IP executives in one place at one time, attending DAC to meet with key customers is an essential part of keeping close to your customers. I’m not talking about closing business at DAC but I am certain that relationships are started and enhanced by this special face time that cannot be gained through email, phone calls or even video conferencing.


    Is the Future Finally Here? What a GaAs!

    by Mitch Heins on 05-05-2016 at 4:00 pm

    Back in 1983 I was working for Texas Instruments during the beginning of the push to let common electrical engineers develop their own CMOS application specific ICs (ASICs). This would eventually be the fuel that fed the semiconductor engine to reach over $335 billion in 2015. At that time, I was a young guy and I had a rascally old boss who used to say, “Gallium Arsenide – has been and always will be the technology of the future!” Fast forward to 2016 and we witness the announcement by POET Technologies Inc. that it has signed a definitive agreement to acquire all the shares of DenseLight Semiconductors Pte. Ltd. DenseLight is a Singapore-based privately held photonics company that designs, manufactures, and markets photonic optical light source products to the communications, medical, instrumentation, industrial, defense, and security industries. These products are based on DenseLight processes using Indium Phosphide (InP) and, you guessed it, Gallium Arsenide (GaAs). Have we met the future and is it “now”?

    POET Technologies’ name is in fact an acronym for Planar Opto-Electronic Technology, and the company is working on moving monolithically integrated opto-electronics, or what POET calls smart optical components, from the lab into the fab. Much of their value proposition is based on an invention they call DOES (Digital Opto-electronic Switch). This switch is used in their proposed products, in which one POET IC would replace an existing transceiver made up of a hybrid assembly of a VCSEL (vertical cavity surface emitting laser), laser driving electronics, a GaAs photo detector, and a receiver IC consisting of a TIA, limiting amplifiers and an output amplifier. The existing transceivers have been effective operating at 10Gbps to 25Gbps over multi-mode fiber (MMF) up to about 100 meters. However, POET claims that these solutions fail for 500 meter links, especially at the 25Gbps link rates. This has prompted the market to look to multi-die solutions that use a combination of a silicon photonics IC (PIC), an electronics IC (EIC) and a laser along with single mode fiber (SMF) for the reaches past 100 meters.

    POET believes that they can use their III-V (GaAs) VCSEL epitaxy process and their new DOES technology to integrate VCSELs, electronic FETs (field effect transistors) and HBTs (heterojunction bipolar transistors) in a one-chip solution that will provide a 10X improvement over what can be provided by silicon photonics in this space (see POET white paper here).

    POET started life as OPEL Technologies out of Toronto, Canada, selling III-V semiconductor devices through a U.S. company named ODIS Inc. to the military, industrial and commercial market spaces. They specialized in infrared sensor arrays and ultra-low power random access memory. They changed their name to POET Technologies in June of 2013 and have been working ever since to use their expertise in III-V processes to become a premier supplier of opto-electronics processes and smart optical solutions. Over the last year POET has made some major strides to bring their technology out of the lab and into the fab. A short timeline follows:

    • August, 2015: POET announced a manufacturing services agreement with ANADIGICS, a New Jersey company, to prove out their new hybrid VCSELs in ANADIGICS’ 6-inch fabrication line.
    • September, 2015: POET reported they were expecting prototypes of their VCSEL products in Q2 of 2016, with hopes of providing a 10X improvement in energy consumption, component cost and form factors used for data center short reach and very short reach communications.
    • January, 2016: POET made separate announcements of a supplier agreement with EpiWorks, a wafer manufacturer specializing in epitaxial growth and a manufacturing services agreement with Wavetek Microelectronics Corporation, that specializes as a GaAs foundry.
    • March, 2016: POET announced an R&D initiative with IMRE/A*STAR in Singapore to develop smart pixel technology for the Augmented Reality market using GaAs and Gallium Nitride (GaN).
    • April, 2016: POET announced the acquisition of all shares of DenseLight, whose fabs in Singapore specialize in Indium Phosphide (InP) and GaAs. Also in April, POET announced that they had multiple wafer lots of their hybrid VCSEL technology produced as promised in Q2 of 2016 and were in the process of characterizing them. They also announced they are working on a new GaAs-based resonance cavity based photonic sensor, targeting prototypes by the end of 2016.

    All of that said, I’m not quite sure the future is here yet. Prototypes are interesting but the proof is in real production volume products. It appears however that POET is getting closer and showing good progress. For the sake of integrated photonics, whether it be Si or GaAs based, I hope they are successful. It would mark a major milestone towards the commercialization of integrated photonics out of the labs.


    Qualcomm’s New Smartphone Chips Go Straight At MediaTek

    by Patrick Moorhead on 05-05-2016 at 12:00 pm

    Last Thursday at Qualcomm’s Financial Analyst Day the company made a slew of chip announcements ranging from the industry’s first 1 Gbps wireless LTE modem to a custom designed smartwatch SoC and platform called “Snapdragon Wear 2100 SoC”. In between those, Qualcomm also announced a few largely overlooked chips that help strengthen Qualcomm’s position in the mid-tier of the market, which is still the fastest growing portion of the smartphone market. Contrary to some beliefs, Qualcomm is not reducing its focus on the smartphone market, but rather refocusing its efforts to better utilize its vast portfolio of smartphone IP in SoCs. These new chip announcements are all perfect examples of this new focus, giving added value where Qualcomm can with its own IP without undercutting the high-end.

    The Snapdragon X16 LTE Advanced Pro Gigabit modem is the most complex piece of this puzzle and also the crown jewel of Qualcomm’s financial analyst day announcements. As a result, I’ve written a separate piece detailing all of the technological improvements and implications. The X16 LTE modem harnesses Qualcomm’s latest and greatest R&D and modem technologies that could allow Qualcomm to continue to push the wireless envelope (literally) and help maintain their current leadership. Since modems are arguably one of Qualcomm’s strongest technological areas, it makes sense that they would announce this alongside all these other chips. That leads me to the next chip announcements: the Snapdragon 625, Snapdragon 435 and Snapdragon 425.

    The Snapdragon 625, 435 and 425 are by no means Qualcomm’s highest-end processors; in fact they are squarely intended for the middle range of the market. All of these chips are updates on older chips like the Snapdragon 617, Snapdragon 430 and Snapdragon 410/415. The Snapdragon 625 is the fastest and most technologically advanced of the bunch. It is Qualcomm’s first mid-range smartphone chip that features the latest 14nm FinFET process technology, which Qualcomm says helps reduce power by up to 35% over the previous generation Snapdragon 617. These savings are huge for smartphone OEMs that utilize these chips, and they come from the eight lower performance/power A53 CPU cores and the new X9 LTE modem with Cat. 7 down (300 Mbps) and Cat. 13 up (150 Mbps) support. It also has 802.11ac MU-MIMO wireless capabilities, which is something that has traditionally been reserved for high-end smartphone chips.

    Qualcomm also added their 500 series Adreno GPU with the Adreno 506 in the Snapdragon 625, giving it extremely high graphical capabilities. They also added Qualcomm’s own high-end ISPs and DSPs to enable dual high-resolution cameras, up to 24-megapixel main camera and 13-megapixel front-facing cameras. For good measure, it also has Qualcomm’s latest QuickCharge 3.0 charging technology which is one of the fastest charging technologies I have seen to date.

    This chip from top to bottom screams that it is intended for the Chinese smartphone market and I believe the likes of Xiaomi and ZTE will be very eager to get their hands on these chips. Even though I am by no measure a fan of eight-core CPU SoC designs with eight A53 CPU cores, there is no denying that the Chinese market and Qualcomm’s Chinese customers are demanding them and they’re filling a need. Even so, this chip will still very likely be very popular in China, especially with its 4K hardware encode and decode capabilities, which should prove pretty popular as 4K displays continue to get cheaper and gain more market penetration.

    The Snapdragon 435 and 425 chips are also ARM Cortex A53-based SoCs, however they are 28nm processors, making them more affordable for OEMs as they are not using a leading edge process. The 435 is a slight upgrade from the Snapdragon 430, that has an eight core processor and adds a Cat 6 LTE modem which means download speeds of up to 300 Mbps and 2x CA, something that is becoming standard across the world. The Snapdragon 435 also adds differentiated features like an integrated Hexagon DSP and QuickCharge 3.0 which makes it an extremely attractive entry-level smartphone chip. The Snapdragon 425 on the other hand is only a quad core processor SoC, and is designed to replace the Snapdragon 410 and 415. However, it was designed to up the specs of Qualcomm’s entry-level SoCs and integrates a lot of technologies that smartphone vendors playing in that area might want to see in a single chip. Namely, those features are HD display at 60 Hz support as well as dual 13-megapixel camera support, 1080P video and a Cat 4 LTE modem with 2×10 CA for maximum download speeds of up to 150 Mbps. However, both the 435 and 425 also feature 802.11ac MU-MIMO wireless capability which shows Qualcomm bringing their wireless leadership with MU-MIMO all the way down to the 400-tier of SoCs. Qualcomm also included their QuickCharge 2.0 charging technology as well as their DSP and sensor hub to lower development costs for the OEMs in this price sensitive space.

    What makes the Snapdragon 625, 435 and 425 so interesting is that they are all software and pin compatible so that OEMs in China and other developing markets can save money on PCB designs and software development. Qualcomm is clearly making these chips as a way to open up their potential customer base and take away customers from the likes of AllWinner, MediaTek, Rockchip and Spreadtrum. This is also where a few of Huawei’s HiSilicon chips live.

    Qualcomm’s last chip announcement was probably the one that got the most attention outside of their LTE Advanced Pro X16 modem and that was the Snapdragon Wear chip. The Snapdragon Wear 2100 SoC is the first chip in their family of SoCs squarely aimed at the wearable market. Previously, Qualcomm simply repurposed some of their most capable low-cost chips like the Snapdragon 400 in order to satisfy short term demand for decent SoCs. In fact, most current generation smartwatches from Huawei, Motorola, LG and others feature Qualcomm’s Snapdragon 400. To me, this always seemed like a short-term fix until they actually built a purpose-built chip for wearables, which is exactly what the Snapdragon Wear 2100 SoC is designed to be.

    The Snapdragon Wear 2100 is first and foremost 30% smaller than the Snapdragon 400, which should leave more room for other components and free up space for more battery or an overall thinner device. Qualcomm says the Snapdragon Wear 2100 is also 25% lower power than the Snapdragon 400 series processors, which means that wearables with this processor should have much longer battery life than before. The Snapdragon Wear 2100 has a quad core CPU with ARM Cortex-A7 cores clocked around 1 GHz, paired with an Adreno 304 GPU and 400 MHz LPDDR3. This design makes a lot of sense when you think about the power sensitivity of most watches and how small their batteries are. Because of the move from A53 to A7 CPU cores, there should be a significant reduction in power consumption and very little difference in terms of performance, as most wearables really don’t do much heavy on-board computing. Most compute is offloaded to the smartphone or cloud to save on power and improve on performance. This tiny little chip also features Qualcomm’s X5 LTE modem, which allows for untethered voice and data like what AT&T offers with number sync, which lets your smartwatch and smartphone share the same number so that both can take calls and use data. The Snapdragon Wear 2100 also has a built-in sensor hub like the Snapdragon 425. This once again shows Qualcomm harnessing their various IP advantages in modem, sensors and graphics to make their SoC the superior solution. I fully expect to see the Snapdragon Wear 2100 in a multitude of wearables this year, although Qualcomm hasn’t given any guidance on when devices with this chip will be available.

    Qualcomm’s announcements yesterday at their Financial Analyst Day marked a major shift by the company to go after more of the mid-range market and to take the fight to their competitors. They are doing this with their renewed focus on their IP that gives them an advantage over their competitors like their modems, GPUs, ISPs, DSPs and other processors. It will be interesting to see how this renewed focus and improved new chips will attract Asian OEMs towards Qualcomm’s products and how many wearable manufacturers adopt their Snapdragon Wear platform.


    More from Moor Insights and Strategy


    Are Standard Cell Libs, Memories and Mixed-signal IP Available at 7nm FF?

    by Eric Esteve on 05-05-2016 at 7:00 am

    More than 500 designers (562) responded to a survey run by Synopsys in 2015. Asked “What is the fastest clock speed of your design?”, 56% reported a clock higher than 500 MHz, and 40% reported one higher than 1 GHz. Compare that with the results obtained 10 years earlier, when the largest proportion of answers was for clocks ranging from 100 MHz to 300 MHz.

    That means that Moore’s law has been extremely effective during the last ten years, and also that speed improvement is a real need for a high proportion of designs. These designs are the natural candidates for FinFET technologies. From the graph below, you can see that moving from 28nm (bulk) to 14nm FF can provide a 45% higher frequency at constant dynamic and leakage power, and that each further step, to 10nm and then to 7nm, provides another 20% improvement.
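
    To see what those per-node figures add up to, here is a minimal sketch that chains them together. It assumes the quoted gains compound multiplicatively from one node to the next, which the text does not state explicitly; the names `STEP_GAINS` and `cumulative_speedup` are made up for this illustration.

```python
# Frequency gains quoted in the text, at constant dynamic and leakage power.
STEP_GAINS = [
    ("28nm bulk -> 14nm FF", 0.45),
    ("14nm FF -> 10nm FF", 0.20),
    ("10nm FF -> 7nm FF", 0.20),
]


def cumulative_speedup(steps):
    """Return (label, cumulative factor) pairs.

    Assumption: each gain multiplies the previous result (compounding).
    """
    factor = 1.0
    results = []
    for label, gain in steps:
        factor *= 1.0 + gain
        results.append((label, factor))
    return results


if __name__ == "__main__":
    for label, factor in cumulative_speedup(STEP_GAINS):
        print(f"{label}: {factor:.2f}x the 28nm bulk frequency")
```

    Under that assumption, a 7nm FF design would run at roughly twice the frequency of its 28nm bulk equivalent for the same power.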

    Foundries like TSMC, Samsung or GlobalFoundries are in charge of the technology development, and companies like Synopsys have to provide EDA tools and design enablement: foundation, memories and mixed-signal IP. Reading the presentation made by Navraj Nandra during the Silicon Valley SNUG last March will give you a very good understanding of the challenges linked with nodes like 7nm FF, and of the way Synopsys has overcome these challenges to design standard cell libraries, memories and interface (mixed-signal) IP. This foundation IP had to be optimized for Power, Performance and Area for 14nm FF, 10nm FF and 7nm FF, just as was done for the now-mature 65nm, 40nm and 28nm nodes.

    Take the example of the RC associated with the BEOL: the value per µm moves from 5E-15 for 14nm FF to 10.5E-15 for 10nm FF and up to 21.6E-15 for 7nm FF, roughly doubling at every node. Another challenge is the metal pitch, requiring double patterning below 40 nm. Moreover, you can’t just scale down the 10nm standard cell library to target 7nm; you have to lower the fin count, called “Fin Depopulation”, to lower dynamic power while preserving speed and reducing standard cell area. You can even get an additional 20% speed improvement by using specialized cells and assist circuits and by managing electrostatic control.
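
    The “doubling at every node” claim is easy to check against the three values quoted above. The snippet below is only an arithmetic sketch; `RC_PER_UM` and `node_ratios` are illustrative names, and the values are used exactly as printed, without assuming a unit.

```python
# BEOL RC per micron as quoted in the text (unit not stated in the source).
RC_PER_UM = {
    "14nm FF": 5e-15,
    "10nm FF": 10.5e-15,
    "7nm FF": 21.6e-15,
}


def node_ratios(values):
    """Return the growth factor between each pair of consecutive nodes."""
    names = list(values)
    return {
        f"{prev} -> {nxt}": values[nxt] / values[prev]
        for prev, nxt in zip(names, names[1:])
    }


if __name__ == "__main__":
    for step, ratio in node_ratios(RC_PER_UM).items():
        print(f"{step}: RC grows by {ratio:.2f}x")
```

    Both steps come out slightly above 2x, so “doubling” is a fair summary of the quoted numbers.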

    Likewise, you can’t just reuse an existing memory compiler: you have to create new embedded self-test and repair to address new 7nm defects, such as process-variation-induced faults (fin height, fin pitch or lithography issues) as well as systematic and random faults. Moreover, keeping resistance low and managing parasitic line capacitance are becoming critical at 7 nm; remember that the RC value is doubling at every node…

    The above table gives you an overview of the many specialized cells developed by Synopsys for 7 nm, optimized for a combination of Performance, Power and Area to target CPU, GPU or DSP. The effort is worthwhile: after decreasing both the metal pitch and the gate pitch, you benefit from the scaling factor, with improved Area, much higher Performance (see the speed improvement graph) and lower Power consumption. PPA has been optimized, and this is the first consequence of Moore’s law.

    Complex multicore SoCs designed in 7 nm will certainly integrate CPUs, GPUs or DSPs, and probably a combination of all these cores. When integrated in the system, such an SoC will interface with DRAM through a DDR3, DDR4 or LPDDR3/4 protocol and communicate through USB 3.1 or PCI Express 4.0, to name a few. Before developing such complex IP, a vendor has to evaluate the market size (the TAM) to calculate the ROI and decide whether to invest in heavy development. From this well-known graphic built by Synopsys, we can evaluate the cumulative number of design starts per node since the technology introduction. Zooming in on the graphic, we come to about 200 design starts for 16/14 FF and 50 for 10 and 7nm. This sanity check is very positive: if the IP vendor can afford the development cost and expect a reasonable market share (Synopsys usually enjoys a 40-60% market share for interface IP), the ROI should be great.
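
    As a rough sanity check of that reasoning, the sketch below multiplies the quoted design starts by the quoted share range. It assumes, purely for illustration, that a 40-60% market share translates directly into a 40-60% share of design starts, which the source does not claim; `DESIGN_STARTS`, `SHARE_RANGE` and `addressable_designs` are hypothetical names.

```python
# Cumulative design starts read off the graphic, as quoted in the text.
DESIGN_STARTS = {"16/14nm FF": 200, "10nm and 7nm": 50}

# Typical interface IP market share quoted in the text (40-60%).
SHARE_RANGE = (0.40, 0.60)


def addressable_designs(starts, share_range):
    """Return (low, high) design counts a vendor might win per node.

    Assumption: market share maps one-to-one onto share of design starts.
    """
    low, high = share_range
    return {node: (count * low, count * high) for node, count in starts.items()}


if __name__ == "__main__":
    for node, (low, high) in addressable_designs(DESIGN_STARTS, SHARE_RANGE).items():
        print(f"{node}: about {low:.0f} to {high:.0f} designs")
```

    Turning those counts into an actual ROI would also need licence pricing and development cost, neither of which is given here.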

    From the presentation made at SNUG, we understand that developing 7 nm interface IP like DDR4, PCIe 4.0 or USB 3.1 Type-C, to name a few, means satisfying more stringent requirements than for previous bulk-based technologies. The layout effort doubled, due to Restricted Design Rules (RDR) and multi-patterning. To address reliability and process variability issues for digital IP, RAS features have to be implemented for PCIe 4.0 or DDR4, through stronger data protection like parity or ECC in conjunction with protocol-defined mechanisms to detect and correct errors in the data path and RAMs. The designer will have to use event counters and statistics to monitor system availability, and to leverage error injection and silicon debug capabilities to diagnose issues and validate system recovery. The result will be 30% improved area and power compared to previous FinFET nodes.

    You can find many blogs talking about the end of Moore’s law, at least as Moore’s law used to be: faster and cheaper transistors with every new node. The introduction of FinFET-based technologies is a way to continue Moore’s law by packing more transistors and cores in a SoC with higher performance and lower power. If you are asking whether there is a market for these FinFET nodes, the 200 design starts in 14/16 nm FinFET are the answer. The chip makers or OEMs targeting FinFET need EDA tools, foundation IP (standard cell library and memories) and an interface IP portfolio, and Synopsys has demonstrated that they offer these down to 7 nm.

    From Eric Esteve from IPNEST

    You will also benefit from this video here.


    Body-biasing for ARM big or LITTLE in GF 22FDX

    by Don Dingee on 05-04-2016 at 4:00 pm

    GLOBALFOUNDRIES has been evangelizing their 22FDX FD-SOI process for a few months; readers may have seen Tom Simon’s write-up of their preview at ARM TechCon. Dr. Joerg Winkler recently gave an updated webinar presentation of their approach in an implementation of an ARM Cortex-A17 core.

    By now, you’ve probably heard that 22FDX targets a cost/die comparable to 28SLP while offering a performance boost from its next-generation FD transistors. 22FDX also offers the ability to integrate RF features and an opportunity to reduce RF power consumption significantly.

    Most of the story has focused on PPA (power-performance-area) enhancements, but I see two aspects of this people may have overlooked. To prove out the 22FDX process, GF decided to tape out a baseline implementation for comparison. The starting point was the same: a quad-core Cortex-A17 using the same processor core macro.


    However, taking full advantage of 22FDX is not as simple as dropping in a 28SLP design. Details of body-bias routing come into play. We’ve seen the above picture before as well. One can apply reverse body-bias (RBB) to raise VT and lower leakage, or apply forward body-bias (FBB) to lower VT and increase maximum frequency. FBB uses a flipped-well architecture where the nMOS transistor sits on the N-well and the pMOS transistor sits on the P-well.

    Winkler launches into an overview of how GF has teamed with Cadence on tools handling the details of body-biasing and other details of the 22FDX design flow. Philosophically, GF chose to implement one unified body-bias scenario for the Cortex-A17 baseline tests in 22FDX. They placed each of the cores on its own power domain, and brought in 5 pairs of body-bias nets with an outer ring approach (the white lines around the “non-CPU” block and the boundary of the four cores).


    One of the interesting points is that the body-bias networks are known to the design flow. GF leverages support for UPF in the Cadence platform (UPF scripts were heavily used), as well as multi-corner PVT and PVTB support. There is also discussion of the details of handling cache. In this implementation, there are 14 different L1 cache macros and one L2 cache macro. Each has to be supported for periphery body-biasing and bitcell array body-biasing, leading to the need for 5 body-bias net pairs. The routing has to obey high-voltage spacing rules.

    After the extensive discussion of how they added body-biasing to a quad-core Cortex-A17 in 22FDX, I got the distinct feeling that it is very hard to compare this big implementation apples-to-apples to the 28SLP baseline because no specific results were shared. Winkler switches to another story we’ve seen before, a PPA comparison on Cortex-A9 which is much simpler. The punchline of that story is for the same clock speed, the 22FDX version of the Cortex-A9 uses 45% less power and 45% less area – using RBB. One could choose to use FBB, and in that same 45% less area get 30% more frequency at the same power point.
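
    The Cortex-A9 figures can be put side by side as normalized operating points. The sketch below is an illustration only: it normalizes the 28SLP baseline to 1.0 on every axis and derives energy per cycle as power divided by frequency, a metric the webinar summary does not quote. The `OperatingPoint` class and the constant names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """One implementation point, normalized to the 28SLP baseline."""

    name: str
    frequency: float
    power: float
    area: float

    @property
    def energy_per_cycle(self) -> float:
        # Derived metric (an assumption of this sketch): power / frequency.
        return self.power / self.frequency


# Figures quoted for the Cortex-A9 comparison.
BASELINE_28SLP = OperatingPoint("28SLP baseline", 1.00, 1.00, 1.00)
FDX_RBB = OperatingPoint("22FDX with RBB", 1.00, 0.55, 0.55)  # 45% less power and area
FDX_FBB = OperatingPoint("22FDX with FBB", 1.30, 1.00, 0.55)  # 30% more frequency

if __name__ == "__main__":
    for point in (BASELINE_28SLP, FDX_RBB, FDX_FBB):
        print(
            f"{point.name}: f={point.frequency:.2f} P={point.power:.2f} "
            f"A={point.area:.2f} E/cycle={point.energy_per_cycle:.2f}"
        )
```

    On these numbers, both 22FDX points spend less energy per cycle than the baseline, with RBB the more frugal of the two and FBB the faster.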

    That leads to what I think are the two main takeaways of 22FDX. To change the PPA target on bulk nodes, the implementation has to change. On 22FDX, using body-bias (possibly dynamically under software control) one can slide the same implementation up and down the power-frequency curve. Also, up front choices of either RBB or FBB can have a major impact – for example, in the same SoC on 22FDX a big Cortex-A17 cluster could use FBB for maximum performance, and a LITTLE Cortex-A9 cluster could use RBB for minimum power consumption.


    You can see the entire GF webinar on 22FDX in the clear on YouTube:
    How to Implement an ARM Cortex-A17 Processor in 22FDX FD-SOI Technology

    There’s also much more information about 22FDX on the GF website. The investment in getting into 22FDX and having control over tuning an implementation using body-biasing puts it in a unique spot. Instead of just chasing smaller and smaller geometries, 22FDX captures the costs of a now-mature 28nm node with significant performance advantages.


    Eight Improvements for PCB Software

    by Daniel Payne on 05-04-2016 at 12:00 pm

    I first met John Durbetaki at Intel in Aloha, Oregon and we both had a keen interest in the nascent personal computer industry. My first PC was made by Radio Shack and dubbed the TRS-80 which maxed out at 48KB of RAM. I kept watch on Durbetaki as he left Intel and formed his own company OrCAD in 1985 to serve the needs of PC-based CAD software. OrCAD started out with just schematic capture, but then added simulation and PCB layout tools, finally being acquired by Cadence in 1999. Last week I spoke with two folks at Cadence about what’s new with their PCB software, and they shared the eight latest improvements. Cadence divides up the PCB world into two different product lines based on design complexity: OrCAD for low-end designs, and Allegro for high-end designs.

    Hemant Shah started out with an update on what’s new with Allegro, their PCB tools for enterprise users. The four latest improvements in Allegro 17.2-2016 are:


  • Advanced flex and rigid-flex designs
  • New concurrent team design option
  • Interoperable Allegro and Sigrity technology
  • Higher performance and capacity

    Flex
    A lot of our consumer electronics devices use flex and rigid-flex designs: watches, glasses, tablets, phones. By laying electronic components on top of flexible cable you can pack more features into a smaller space. This 17.2 release for Allegro lets you define stack-up by zones, run DRC checks on flex layers and perform arc-aware routing.

    Concurrent Team Design
    Divide and conquer is a proven approach to getting more work done in a shorter time, so now you can use Allegro in either an ad-hoc team design (no setup) or a structured team design (some setup). When you divide the PCB project among five designers, routing can be done concurrently, which saves up to 80% in time versus the old way of routing the board as a single combined job.
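
    The “up to 80%” figure matches the best case of splitting the work evenly. A minimal sketch, assuming a perfectly even split and no time lost to setup or merging (neither of which the text quantifies); `ideal_time_saving` is a name invented for this example.

```python
def ideal_time_saving(designers: int) -> float:
    """Best-case fraction of routing time saved with concurrent routing.

    Assumption: the work divides evenly and merging costs nothing, so the
    elapsed time drops to 1/designers of the single-designer time.
    """
    if designers < 1:
        raise ValueError("at least one designer is required")
    return 1.0 - 1.0 / designers


if __name__ == "__main__":
    for team_size in (1, 2, 5):
        saving = ideal_time_saving(team_size)
        print(f"{team_size} designer(s): up to {saving:.0%} time saved")
```

    Real projects would land below this bound, which is why the claim is worded as “up to”.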

    Allegro and Sigrity
    Cadence purchased Sigrity back in 2012 because of their signal integrity tools for PCB designers, and now this integrated technology ensures that your critical high-speed signals are meeting the performance criteria with features like:

    • Tabbed routing
    • Custom return path via structures
    • Extended in-design rules for backdrilled vias
    • PI (Power Integrity) for PCB designers

    Performance and Capacity
    The 17.2 release now supports a 64-bit OS, so addressable RAM can be up to 18 quintillion bytes and database sizes up to 3GB. Any of your CPU-intensive applications will see performance gains. Two additional improvements are a new cross-section editor and a new padstack editor.
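
    The “18 quintillion” figure is simply the size of a 64-bit address space. The short sketch below shows the arithmetic; it describes the theoretical limit of 64-bit addressing, not what any particular operating system or machine actually supports, and the constant names are illustrative.

```python
# Number of distinct byte addresses for a given pointer width.
ADDRESSABLE_BYTES_32 = 2 ** 32
ADDRESSABLE_BYTES_64 = 2 ** 64

GIB = 2 ** 30  # gibibyte
EIB = 2 ** 60  # exbibyte

if __name__ == "__main__":
    print(f"32-bit: {ADDRESSABLE_BYTES_32:,} bytes = {ADDRESSABLE_BYTES_32 // GIB} GiB")
    print(f"64-bit: {ADDRESSABLE_BYTES_64:,} bytes = {ADDRESSABLE_BYTES_64 // EIB} EiB")
```

    That is about 18.4 quintillion bytes for 64-bit addressing, against 4 GiB for a 32-bit address space.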

    Up next was Kishore Karnane and he talked about four new improvements to the OrCAD 17.2 release:


  • Design differencing in Capture
  • Advanced annotation and auto-referencing in Capture
  • PSpice for virtual prototyping
  • PCB Designer for advanced flex and rigid-flex designs

    The OrCAD tools are ideally suited for designs in the IoT marketplace because of the ease of use, productivity and low cost.

    Design Difference
    Often during the design process there have been multiple changes made, but you want to know how many changes and where each one is located. This graphical design difference feature will show you exactly what has changed, both logically and graphically, saving time so that you don’t have to attempt a manual comparison.

    Advanced Annotation
    In a PCB design every electrical component has a unique instance name, so you can either assign these instance names manually or use some automation. With OrCAD 17.2 you have plenty of automation for instances:

    PSpice for Virtual Prototyping
    IoT systems typically have analog sensors, processors, peripherals, etc. PSpice can simulate all of the mixed-signal sensors, then you can simulate your RTL and TLM with the Incisive simulator as shown below:

    You can even swap out your controller virtual platform with the actual hardware by using DMI through an Arduino board.

    PCB Designer with flex
    Remember those new flex and rigid-flex features in Allegro? Well, you also get the same thing in OrCAD now. Teams that start a project with OrCAD can even decide to upgrade and use Allegro, because they’ve made this a scalable transition.

    Summary
    The PCB marketplace has changed dramatically since the 1980s and Cadence has kept up with current trends by offering two product lines in OrCAD and Allegro, so choose the one that best fits your technical needs and budget.


    Is Tesla Making Their Own CPUs?

    by Daniel Nenni on 05-03-2016 at 4:00 pm

    One of the benefits of administering a leading semiconductor design enablement portal is that I get to see the traffic patterns then try and figure out what’s behind them. For example, a Cupertino domain has been reading all of our automotive content very thoroughly. We also get hits by Google.com, Amazon.com, and dozens of other Fortune 1000 domains from around the world that are not traditional semiconductor companies. Another more recent SemiWiki fan is a car company here in Silicon Valley who is consuming our semiconductor design enablement content. So you have to ask yourself, “Self, what in the heck is going on here?”

    In case you missed the news from Tesla back in January:

    Jim Keller is joining Tesla as Vice President of Autopilot Hardware Engineering. Jim will bring together the best internal and external hardware technologies to develop the safest, most advanced autopilot systems in the world.

    How good is Jim Keller? Jim does not have a LinkedIn profile but he does have a wiki page, that’s how good he is! Jim is a bit famous here in Silicon Valley for his work on the DEC Alpha, the AMD K7 and K8 architectures, and the ARM based Apple A4 and A5 SoCs. Jim and his team landed at Apple as a result of the P.A. Semi acquisition which you can read about in our book “Mobile Unleashed” chapter 7 “from Cupertino”:

    Dan Dobberpuhl, of StrongARM fame, formed fabless firm P. A. Semi in 2003 with industry veterans including Jim Keller and Pete Bannon. They embarked on research of Power Architecture, creating the PA6T core and the highly integrated PWRficient family of processors. PWRficient featured an advanced crossbar interconnect along with aggressive clock gating and power management. Its near-term roadmap had single and dual 2 GHz 64-bit cores. The approach delivered similar performance to an IBM PowerPC 970 – the Apple G5 – at a fraction of the power consumption. This made PWRficient well suited for laptops, or embedded applications… Unexpectedly, Apple bought P. A. Semi and many of its 150 employees in April 2008 for $278M…

    At Apple and P.A. Semi:

    Keller was most recently a director in the platform architecture group at Apple focusing on mobile products, where he architected several generations of mobile processors, including the chip families found in millions of Apple iPads, iPhones, iPods and Apple TVs. Prior to Apple, Keller was vice president of design for P.A. Semi, a fabless semiconductor design firm specializing in low-power mobile processors that was acquired by Apple in 2008. While there, he led the team responsible for building a powerful networking System on a Chip.

    In August of 2012, Jim went back to AMD to build a team and develop an ARM-based server processor (Opteron) which has just started shipping. Next, Jim’s team did a refresh of the x86 core architecture (Zen) which is being built on GF 14nm due out later this year. The latest data points (from SemiWiki and LinkedIn) show that quite a few members of his team from AMD have joined Jim at Tesla, which should make NVIDIA quite nervous since Tesla cars currently use NVIDIA chips.

    And that ends the latest episode of “As the Chip Turns”.