IEDM Blogs – Part 2 – Memory Short Course

by Scotten Jones on 12-16-2015 at 12:00 pm

Each year, on the Sunday before IEDM, two short courses are offered. This year I attended Memory Technologies for Future Systems, held on Sunday, December 6[SUP]th[/SUP]. I have attended several of these short courses over the years and they are a great way to keep up to date on the latest technology.

Introduction and Overview

Dirk Wouters of RWTH Aachen University was the course organizer and he gave a brief introduction to the course objectives and set the stage for the presenters to come.

System Requirements for Memories

Rob Aitken of ARM Research taught this session of the class. With ARM cores in virtually every mobile device made today, ARM is in a unique position to address this aspect of the course.

With mobile devices so important in today’s world, the discussion of system requirements began with energy. The first point was that batteries don’t follow Moore’s law, and what a battery supplies is a fixed budget of energy; memory systems therefore need to be energy efficient in today’s applications.

Even for non-mobile devices every CPU has a power envelope it has to meet. The highest performance server chips have to produce good performance per watt. Heat and energy consumption in massive server farms is a major issue.

In order to meet power requirements, a system approach is required, taking into account everything from the IC process to the other system components, power management policies, and software.

Memory systems today follow complex hierarchies designed to keep up with CPU performance. The memory follows a pyramid structure:

  • Registers – on-die with the CPU, registers are small, very fast memories with wide data paths. Because they are written so frequently, very long endurance is required, on the order of 10[SUP]18[/SUP] cycles. Registers are built with SRAM, which has access speeds in the hundreds of picoseconds. Registers are typically kilobytes in size.
  • Cache – also on-die and also built with SRAM. There are typically several levels of cache, L1, L2, and L3 with each level being a larger and slower memory array than the previous cache. In a multicore system L1 and L2 may be processor specific and L3 unified for all cores. Typical total cache storage on die is in the megabytes in size.
  • Main memory – off-die, main memory is much larger than any on-die memory. Due to cost constraints main memory is made of DRAM, which has a much lower cost per bit than SRAM but at the cost of slower speed (on the order of 10 nanoseconds) and much lower endurance, on the order of 10[SUP]9[/SUP]. Main memory is typically gigabytes in size.
  • Disk – off-die, either a hard drive or more recently NAND Flash based solid state drives. Disk storage is much slower and cheaper than main memory. Access speed is on the order of milliseconds with endurance of around 10[SUP]4[/SUP]. Disk storage is typically hundreds of gigabytes or more.
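
The hierarchy can be summarized in a short sketch. The figures below are the order-of-magnitude values quoted in the talk; the specific numbers chosen (e.g. 200 ps for registers, 1 ns for cache, which the talk gave only as ranges) are illustrative assumptions, not product specifications:

```python
# Order-of-magnitude memory hierarchy figures as quoted in the talk.
# Exact register/cache latencies are assumed within the stated ranges;
# cache endurance was not stated, so it is marked None here.
hierarchy = [
    # (level,        typical size,   access time (s), write endurance)
    ("registers",    "kilobytes",    200e-12,         1e18),  # SRAM
    ("cache",        "megabytes",    1e-9,            None),  # SRAM, L1/L2/L3
    ("main memory",  "gigabytes",    10e-9,           1e9),   # DRAM
    ("disk",         "100s of GB+",  1e-3,            1e4),   # HDD or NAND SSD
]

for level, size, latency, endurance in hierarchy:
    e = f"~{endurance:.0e} cycles" if endurance else "n/a"
    print(f"{level:12s} {size:13s} {latency:8.0e} s  {e}")
```

Each step down the pyramid trades roughly an order of magnitude (or more) of speed and endurance for capacity and cost per bit.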

There was a lot more material covering CPU design, how the CPU interacts with memory, and the challenges with existing memory. One of the key points of the whole discussion was that memory architecture is designed around the capabilities of currently available memory; new memory types will open up new applications.

One interesting area is the concept of storage class memory, where a byte-addressable, non-volatile memory faster than NAND is interposed between main memory and disk storage to speed up system performance. This is the application that the new Intel-Micron 3D XPoint memory is targeted at. 3D XPoint has 1,000x the speed and endurance of NAND and is intermediate in cost between DRAM and NAND.
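
Taking the endurance figures at face value, a quick sketch shows where such a storage class memory would land; the 1,000x multiplier is the claim reported for 3D XPoint, and these are illustrative order-of-magnitude numbers, not datasheet values:

```python
# Endurance figures from the talk: NAND ~1e4 cycles, DRAM ~1e9 cycles.
# 3D XPoint is claimed to have 1,000x the endurance of NAND.
nand_endurance = 1e4
dram_endurance = 1e9
scm_endurance = nand_endurance * 1000  # ~1e7 cycles

# Storage class memory sits between NAND and DRAM on endurance,
# just as it sits between them in cost.
assert nand_endurance < scm_endurance < dram_endurance
print(f"SCM endurance ~{scm_endurance:.0e} cycles")
```

That intermediate position is exactly what makes the main-memory/disk gap a natural slot for it.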

DRAM Life Extension Challenge and Response

This session of the course was taught by Changyeol Lee of SK Hynix. SK Hynix is a top three producer of DRAMs in the market today.

This section began with a brief description of what DRAM is. DRAM allows random access and is volatile, requiring periodic refreshes to maintain stored values. DRAMs are made up of banks – a bank is the minimum subdivision of a DRAM that transacts data independently. There are typically 4 or 8 banks to a DRAM. Banks are made up of mats, where each mat is made up of a memory cell array with word line drivers and bit line sense amplifiers.

A single DRAM memory cell can be described as 8F[SUP]2[/SUP] or 6F[SUP]2[/SUP], where F is the feature size and 6F[SUP]2[/SUP] or 8F[SUP]2[/SUP] is the cell area. DRAM has transitioned from 8F[SUP]2[/SUP] to 6F[SUP]2[/SUP], enabling denser memory arrays but at the cost of more sensitivity to misalignment and noise.
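
As a concrete example of what the F[SUP]2[/SUP] notation means, here is the cell-area arithmetic at an assumed feature size of 20nm (an illustrative value, not tied to any particular process):

```python
# Cell area = multiplier * F^2, where F is the minimum feature size.
# F = 20 nm is an assumed value for illustration only.
F_nm = 20
area_8f2 = 8 * F_nm ** 2  # 3200 nm^2
area_6f2 = 6 * F_nm ** 2  # 2400 nm^2

print(f"8F^2 cell: {area_8f2} nm^2")
print(f"6F^2 cell: {area_6f2} nm^2")
print(f"Density gain from 8F^2 to 6F^2: {area_8f2 / area_6f2:.2f}x")
```

The 8/6 ratio shows the move buys about a 1.33x density gain at the same feature size, which is why it was worth accepting the extra sensitivity to misalignment and noise.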

DRAM cells are made up of an access transistor and a capacitor (1T1C). The transistor controls access to the cell capacitor (and must be very low leakage, requiring long gate lengths) and the capacitor stores the value. Access transistors have been through several transitions, beginning as planar devices and then moving to 3D structures to provide longer channel lengths in smaller areas. The current access transistor of choice is a saddle fin with a buried gate. There is a lot of work being done on vertical gate transistors as the next step, possibly enabling a 4F[SUP]2[/SUP] cell, but there are still a number of challenges to overcome.

Continued DRAM scaling has required multi-patterning due to the well-publicized delays in EUV. As you move from self-aligned double patterning (SADP) to self-aligned triple patterning (SATP) and self-aligned quadruple patterning (SAQP), critical dimension uniformity (CDU) gets worse. Moving from SATP to SAQP, CDU gets much worse for only a minimal gain in linewidth.
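
The diminishing return can be seen in the pitch arithmetic: self-aligned multi-patterning divides the printed pitch by 2 (SADP), 3 (SATP), or 4 (SAQP). The 80nm starting pitch below is an assumed, illustrative value:

```python
# Final pitch = printed (single-exposure) pitch / division factor.
# The 80 nm printed pitch is an assumption for illustration only.
printed_pitch_nm = 80.0
for name, factor in [("SADP", 2), ("SATP", 3), ("SAQP", 4)]:
    print(f"{name}: final pitch {printed_pitch_nm / factor:.1f} nm")
```

The SADP-to-SATP step shrinks the pitch by 13.3nm, but the SATP-to-SAQP step gains only another 6.7nm, while CDU degrades much more, which is the trade-off described above.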

Maintaining the capacitance of the cell capacitor, while scaling below 20nm is a big challenge. Cylinder capacitors will likely give way to taller pillar capacitors because the dielectric films need to be too thick to fit between the cylinder structures. A higher k dielectric is needed with low leakage but there appears to be a fundamental trade-off between high-k values and leakage with leakage increasing as k goes up.

A variety of reliability and bandwidth issues were also discussed.

Although there are solutions to memory bandwidth issues, such as the hybrid memory cube, continued DRAM scaling faces a number of very difficult fundamental issues.

Conventional Memory Technology: Flash Memory

This session was taught by Sujin Ahn of Samsung (corrected from the original blog; Youngwoo Park, as originally reported, wasn’t available to teach the course). Samsung is a top three producer of Flash memory.

The talk began by observing that NAND Flash density has doubled every year while bit cost has come down 35% per year. Some of the key applications that have driven the growth of flash were then reviewed.
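
A 35% per year cost decline compounds quickly. A minimal sketch, using the rate from the talk and an arbitrary five-year window for illustration:

```python
# Bit cost falling ~35% per year, per the talk.
# The 5-year window is an arbitrary choice for illustration.
annual_decline = 0.35
years = 5
remaining = (1 - annual_decline) ** years
print(f"After {years} years, bit cost is ~{remaining:.1%} of the original")
```

At that rate, bit cost drops to roughly an eighth of its starting value in five years, which is the economic engine behind the applications reviewed in the talk.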

Flash has successfully scaled to less than 20nm with low resistance gates, air gaps, and double and now quadruple patterning.

2D Flash faces a number of scaling issues. In order of severity, from most to least, these are: cell-to-cell interference; patterning costs, with gate and active requiring quadruple patterning; inter-poly dielectric thickness limitations; control gate poly filling; and other issues. Each of these issues and more were discussed in detail, making the case that 2D planar NAND scaling is over and 3D NAND is needed.

The move to 3D NAND has also seen a change from floating gate cells to charge trap using silicon nitride as the charge trap (although Intel-Micron are still using floating gate for 3D). The various 3D architectures were then reviewed: Bit Cost Scalable (BiCS) from Toshiba, Vertical Gate NAND (VGNAND) from Macronix, Stacked Memory Array transistor (SMArt) from SK Hynix and Terabit Cell Array Transistor (TCAT) from Samsung. A 3D cell has approximately 20x the effective area of a 2D cell.
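
The ~20x figure sounds alarming until vertical stacking is accounted for. A minimal sketch; the 20x effective area is from the talk, while the layer count is an assumed, illustrative value (real devices vary):

```python
# Effective area of one 3D NAND cell vs one 2D cell, from the talk.
relative_3d_cell_area = 20.0
# Layer count is an assumption for illustration; products differ.
layers = 32
footprint_per_bit = relative_3d_cell_area / layers
print(f"Relative footprint per bit: {footprint_per_bit:.3f}x a 2D cell")
```

With enough layers the footprint per bit drops below that of a 2D cell even though each individual 3D cell is much larger, which is why 3D wins on cost per bit.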

Challenges with 3D NAND include word line (WL) cross talk due to the large parallel WL planes. Silicon nitride trap layers connected down the vertical channel can allow charge loss along the layers, and the polysilicon channels used for 3D have lower mobility than the single crystal silicon channels used for 2D (in a later blog I will discuss work IMEC is doing on creating higher mobility InGaAs channels).

In current 3D NAND the peripheral and core circuitry is outside of the memory array stack. One of the hottest areas of research is to put the peripheral and core circuitry under the memory array. This would reduce the die size but at the cost of process complexity.

3D NAND offers a path for continued cost per bit reductions and 1 terabit on a single die, but issues with high aspect ratio contacts, stress control of the memory layer stack, channel mobility, and more will need to be overcome.

Emerging Memory Technologies: ReRAM and PCM

This session was taught by Daniele Ielmini of DEIB – Politecnico di Milano and IU.NET.

Data generation is growing rapidly, and future systems will need high-density, high-performance memory.

The emerging memory devices covered included resistive switching memory (ReRAM), conductive bridge memory (CBRAM, a subtype of ReRAM), phase change memory (PCM), spin-transfer torque memory (STT-RAM), and ferroelectric memory (FeRAM).

The resistance memories – ReRAM, CBRAM, and PCM – were the focus of the rest of the session. All have reasonable write speeds (~100ns) and small 4F[SUP]2[/SUP] cells suitable for cross point arrays, which makes them candidates for storage class memory.

The issues of scaling ReRAM and PCM were then reviewed. Noise could be a big issue for scaled ReRAM. Both memory types lack multilevel cells, and what to use for the cell selector is still an open issue. The most likely applications are storage class memory, internet of things (IoT) devices where power is an issue, and computing memory.

Basics of STT-RAM

This section was taught by Thibaut Devolder of Institut d’Electronique Fondamentale.

Unfortunately, I had to leave for the IMEC Technology Forum (ITF) and wasn’t able to see this part.

