
Designing Next Generation Memory Interfaces: Modeling, Analysis, and Tips
by Mike Gianfagna on 03-04-2020 at 10:00 am

At DesignCon 2020, there was a presentation by Micron, Socionext and Cadence that discussed design challenges and strategies for using the new low-power DDR specification (LPDDR5). As is the case with many presentations at DesignCon, ecosystem collaboration was emphasized. Justin Butterfield (senior engineer at Micron) discussed the memory aspects and Daniel Lambalot (director of engineering at Socionext) discussed the system aspects. I was able to spend some time with one of the other authors, Zhen Mu (senior principal product engineer at Cadence) as well. Zhen provided background on the tool platform used in this program, which is completely supplied by Cadence.

The LPDDR5 spec was finalized and published last year and is the cutting edge of DDR memory interfaces. Increased speed and lower power don’t come for free. There are many challenges associated with using LPDDR5, including channel bandwidth, reduced voltage margin, the need to route multiple parallel channels, dealing with crosstalk and ensuring proper return currents, multi-drop configurations (2 DRAM loads) and limited equalization capability.

The key features of LPDDR5 can be summarized as follows:

  • Higher data rates (up to 6.4Gbps)
    • Data transfer rate boosted to about 1.5 times that of the previous LPDDR4 interface
  • Power-isolated LVSTL interface with:
    • VDD2H=1.05V for the DRAM core
    • VDDQ=0.5V for the I/O
  • New packaging
  • Non-targeted on-die termination (ODT)
  • New eye mask specifications
    • Change from rectangular mask in LPDDR4 to hexagonal mask in LPDDR5
    • Two timing parameters (tDIVW1/tDIVW2) and a voltage margin parameter (vDIVW)
    • See the diagram below and the mask-check sketch after this list
  • Data bit inversion (DBI)
  • Separate Read strobe (RDQS) and Write strobe (WCK)
  • Advanced equalization technologies such as feed-forward equalization (FFE), continuous time linear equalization (CTLE), and decision feedback equalization (DFE) for the controller and the memory
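
To make the hexagonal mask concrete, here is a minimal Python sketch of a mask-violation check. The vertex parameterization (tDIVW1 as the width of the flat top and bottom edges, tDIVW2 as the full mask width at the crossing level, vDIVW as the mask height) and all numeric values are illustrative assumptions, not normative JEDEC definitions.

```python
# Hexagonal eye-mask check for an LPDDR5-style Rx mask (illustrative sketch).
# Assumed parameterization: tDIVW1 = width of the flat top/bottom edges,
# tDIVW2 = full mask width at the crossing level, vDIVW = mask height.

def hexagon(tdivw1, tdivw2, vdivw):
    """Return the six (time, voltage) vertices of the mask, centered at (0, 0)."""
    return [(-tdivw2 / 2, 0), (-tdivw1 / 2, vdivw / 2), (tdivw1 / 2, vdivw / 2),
            (tdivw2 / 2, 0), (tdivw1 / 2, -vdivw / 2), (-tdivw1 / 2, -vdivw / 2)]

def inside(point, polygon):
    """Ray-casting point-in-polygon test."""
    x, y = point
    hit = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            hit = not hit
    return hit

mask = hexagon(tdivw1=0.25, tdivw2=0.35, vdivw=0.06)   # UI and volts, hypothetical
eye_samples = [(-0.30, 0.01), (0.10, 0.02), (0.00, 0.05)]  # hypothetical eye trace
violations = [p for p in eye_samples if inside(p, mask)]
print(f"{len(violations)} mask violation(s): {violations}")
```

Any eye sample landing inside the keep-out hexagon is a violation; a real compliance flow would sweep millions of sampled bits from channel simulation against this mask.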

Timing is one of the most challenging aspects of LPDDR5; controller jitter must be considered. Accurate modeling is key for success. The presentation discussed the details of modeling and analysis approaches to optimize the use of LPDDR5 in actual designs. Items to be considered include device modeling, system-level design with typical topology, channel simulation for parallel bus analysis, bus characterization, modeling filtering functions implemented in LPDDR5 DRAMs and crosstalk simulation.

IBIS (I/O Buffer Information Specification) and the Algorithmic Modeling Interface (AMI) extensions are standards typically used in SerDes design and analysis. IBIS-AMI modeling can also be applied to parallel bus analysis for LPDDR5 designs. The benefits of this modeling approach include interoperability (different models work together), portability (models run in multiple simulators), accuracy (results correlate to measurements), IP protection (circuit details are not exposed) and performance (million-bit simulations are practical).

There are challenges in applying SerDes modeling to a DDR interface, including the non-symmetrical nature of LPDDR5 timing; see the diagram below.

From a big picture point of view, channel simulations using IBIS-AMI models were correlated against circuit transient analysis using the Cadence Sigrity Explorer tool. To complete the analysis, memory models were supplied by Micron and controller models were supplied by the Cadence IP team. Socionext and Micron provided package models for the controller and memory, respectively. See the diagram below for some results.

Green: Circuit simulation; Blue: Channel simulation

For crosstalk simulation, two approaches were used: characterize each bus signal individually, as is done in SerDes channel simulations, or characterize the entire bus with practical stimulus patterns. Using the following conditions, the effect of channel length on performance can be modeled.

The analysis suggests a trace length of one inch is desirable. This presentation highlighted the modeling and simulation techniques needed to achieve a fully functional system with LPDDR5 memories. Accurate modeling and good ecosystem participation are required for success.


LithoVision – Economics in the 3D Era
by Scotten Jones on 03-04-2020 at 6:00 am

Each year on the Sunday before the SPIE Advanced Lithography Conference, Nikon holds their LithoVision event. This year I had the privilege of being invited to speak for the third consecutive year. Unfortunately, the event had to be canceled due to concerns over the COVID-19 virus, but by the time it was canceled I had already finished my presentation, so I thought I would share it with the SemiWiki community.

Outline
The title of my talk is "Economics in the 3D Era". In the talk I will discuss the three main industry segments: 3D NAND, Logic and DRAM. For each segment I will discuss the current status and then get into roadmaps with technology, mask count, density and cost projections. All the status and projections will be company specific and cover the leaders in each segment. All the data for this presentation (technology, density, mask counts, and cost projections) comes from our IC Knowledge – Strategic Cost and Price Model – 2020 – revision 00. The model is basically a detailed industry roadmap that provides simulations of cost, equipment and materials requirements.

You can read about the model here.

3D NAND
3D NAND is the most “3D” segment of the industry with a layer stacking technology that provides density improvement by adding layers in the third dimension.

Figure 1 illustrates the 3D NAND TCAT Process.

Figure 1. 3D NAND TCAT Process.

In the 3D NAND segment the market leader is Samsung, and they use the TCAT process illustrated in this slide. Number two in the market is Kioxia (formerly Toshiba Memory), and they use an essentially identical process. Micron Technology is also adopting a charge trap process we expect to be similar, making this process representative of the majority of the industry. SK Hynix uses a different process that still shares many key elements with it. The only company not using a charge-trap process is Intel-Micron, and now that Intel and Micron have split apart on 3D NAND, Intel will be the only company pursuing floating gate.

The basic process has three major sections:

  1. Fabricate the CMOS – the CMOS writes, reads and erases the bits. Initially everyone except Intel-Micron fabricated the CMOS next to the memory array with Intel-Micron fabricating some of the CMOS under the memory array. Over time other companies have migrated to CMOS under the array and we expect within a few generations that all companies will migrate to CMOS under the array because it offers better die area utilization.
  2. Fabricate the memory array – for charge trap, the array fabrication takes place by depositing alternating layers of oxide and nitride. A channel hole is then etched down through the layers and refilled with an oxide-nitride-oxide (ONO) layer and a polysilicon tube (the channel), then filled with oxide. A stair step is then fabricated using a mask – etch – mask shrink – etch approach. A slot is then etched down through the array and the nitride film is etched out. Blocking films and tungsten are then deposited to fill the horizontal openings where the nitride was etched out. Finally, vias are etched down to the horizontal sheets of tungsten.
  3. Interconnect – the CMOS and memory array are then interconnected. For CMOS under the array, some interconnect takes place before the memory array fabrication.

This approach is very mask efficient because many layers can be patterned with a few masks. The overall process requires a channel mask and several stair-step masks, depending on the number of layers and process generation; in early-generation processes a single mask could produce approximately 8 layers, but some processes today can reach 32 layers with a single mask. The slot requires a mask, sometimes there is a second shallow slot that requires a mask, and finally the vias require a mask.

The channel hole etching is a very difficult high-aspect-ratio (HAR) etch and once a certain maximum number of layers is reached the process must be broken up into “strings” in something called “string stacking”. In string stacking, basically, a set of layers is deposited, a mask is applied, and the channel is etched and filled. Then another set of layers is deposited and masked, etched and filled. In theory this can be done many times. Intel-Micron use a floating gate process that uses oxide-polysilicon layers that are much more difficult to etch than oxide-nitride layers, and they were the first to string stack. Figure 2 illustrates Intel-Micron string stacking.

Figure 2. Intel-Micron String Stacking.

Each company has their own approach to channel etching and their own limit in terms of when they string stack. Because they use oxide-poly layers, Intel-Micron produced a 64-layer device by stacking two 32-layer strings and then produced a 96-layer device by stacking two 48-layer strings. Intel has announced a 144-layer device that we expect to be three 48-layer strings. SK Hynix began string stacking at 72 layers and Kioxia at 96 layers (both charge trap processes with alternating oxide-nitride layers). Samsung is the last holdout on string stacking, having produced a 92-layer device as a single string, and they have announced a 128-layer single-string device.

Memory density can also be improved by storing multiple bits in a cell. NAND Flash has moved through a progression from 1 bit – single-level cell (SLC), to 2 bit – multi-level cell (MLC), to 3 bit – three-level cell (TLC), to 4 bit – quadruple-level cell (QLC). Companies are now preparing to introduce 5 bit – penta-level cells (PLC) and there is even discussion of 6 bit – hexa-level cells (HLC). Increasing the number of bits per cell helps with density but the benefit is decreasing: SLC to MLC is 2.00x the bits, MLC to TLC is 1.50x the bits, TLC to QLC is 1.33x the bits, QLC to PLC will be 1.25x the bits and, if we get there, PLC to HLC will be 1.20x the bits.
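
That diminishing return is simple arithmetic: adding one more bit to an n-bit cell multiplies the bit count by (n+1)/n. A few lines of Python reproduce the factors above:

```python
# Density gain from adding one more bit per cell is (n + 1) / n.
levels = {1: "SLC", 2: "MLC", 3: "TLC", 4: "QLC", 5: "PLC", 6: "HLC"}
for n in range(1, 6):
    print(f"{levels[n]} -> {levels[n + 1]}: {(n + 1) / n:.2f}x the bits")
```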

Figure 3 presents string stacking by year and company on the left axis and maximum bits per cell on the right axis.

Figure 3. Layers, Stacking and Bits per Cell.

Figure 4 presents our analysis of the resulting mask counts by exposure type, company and year. The dotted line is the average mask count by year, which increases from 42 in 2017 to 73 in 2025; this contrasts with the layers increasing from an average of 60 in 2017 to 512 in 2025. In other words, only a 1.7x increase in masks is required to produce an 8.5x increase in layers, highlighting the mask efficiency of the 3D NAND processes.

Figure 4. Mask Count Trend.
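
The mask-efficiency arithmetic above is worth a quick sanity check, using only the averages quoted in the text:

```python
# 3D NAND mask efficiency: layer count grows far faster than mask count.
masks_2017, masks_2025 = 42, 73
layers_2017, layers_2025 = 60, 512
print(f"mask growth:  {masks_2025 / masks_2017:.1f}x")    # ~1.7x
print(f"layer growth: {layers_2025 / layers_2017:.1f}x")  # ~8.5x
print(f"layers per mask: {layers_2017 / masks_2017:.1f} -> "
      f"{layers_2025 / masks_2025:.1f}")                  # ~1.4 -> ~7.0
```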

Figure 5 presents actual and forecast bit density by company and year for both 2D NAND and 3D NAND. This is the bit density for the whole die or in other words the die bit capacity divided by the die size.

Figure 5. NAND Bit Density.

From 2000 to 2010, 2D NAND bit densities were increasing by 1.78x per year, driven by lithographic shrinks. Around 2010 the difficulty of continuing to shrink 2D NAND led to a slowdown to 1.43x per year until around 2015, when 3D NAND became the driver and continued at a 1.43x per year rate. We are projecting a slight slowdown from 2020 to 2025 to 1.38x per year. This is an improvement over our forecast from last year because we are seeing the companies push the technology faster than we originally expected. Finally, SK Hynix has talked about 500 layers in 2025 and 800 layers in 2030, implying a further slowdown after 2025.
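
Compounding those per-year multipliers shows the cumulative effect of each era; the normalization to the year-2000 density in this sketch is mine, for illustration:

```python
# Cumulative NAND bit density growth from the per-year multipliers above.
eras = [("2000-2010, 2D litho shrinks", 1.78, 10),
        ("2010-2015, 2D slowdown",      1.43, 5),
        ("2015-2020, 3D layers",        1.43, 5),
        ("2020-2025, projected",        1.38, 5)]
density = 1.0  # normalized to year-2000 bit density
for era, rate, years in eras:
    density *= rate ** years
    print(f"{era}: {rate}x/yr -> {density:,.0f}x cumulative")
```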

Figure 6 presents NAND Bit Cost Trends.

Figure 6. NAND Bit Cost Trend.

In this figure we have taken wafer costs calculated using our Strategic Cost and Price Model and combined them with the bit density from figure 5 to produce a bit cost trend. In all cases the fabs are new greenfield 75,000 wafer per month fabs because that is the average capacity of NAND fabs in 2020. The countries where the fabs are located are Singapore for Intel-Micron, China for Intel, Japan for Kioxia and South Korea for Samsung and SK Hynix. These calculations do not include packaging and test costs, do not take into account street width and have only rough die yield assumptions in them.

The first three nodes on the chart are 2D NAND, where we see a 0.7x per node cost trend. With the transition to 3D NAND the bit cost initially increased for most companies but has now come down below 2D NAND bit costs and is following a 0.7x per node trend until around 300 to 400 layers. At 300-400 layers we project the cost per bit will level out, possibly placing an economic limit on this technology unless there are some breakthroughs in process or equipment efficiency.

Logic
For 3D NAND, "nodes" are easy to define based on physical layers; for DRAM, nodes are the active half-pitch; for logic, nodes are pretty much whatever the marketing guys at a company want to call them.

Some people consider the current leading edge FinFET processes to be 3D because the FinFET is a 3D structure, but in the context of this discussion we consider a technology to be 3D when device stacking allows multiple active layers, and therefore stacks of devices, to be created. In this context, 3D logic will really arrive once CFETs are adopted.

Figure 7 presents the nodes by year for the three companies pursuing the state of the art.

Figure 7. Logic Roadmap.

The node comparisons in this chart are complicated by the split between Intel and the foundries. Intel has followed the classic node names, 45nm, 32nm, 22nm, 14nm, whereas the foundries have followed the "new" node names of 40nm, 28nm, 20nm, 14nm. Furthermore, Intel has shrunk more per node, so Intel 14nm has similar density to foundry 10nm and Intel 10nm has similar density to foundry 7nm.

At the top of the figure I have outlined a consistent node name series based on alternating 0.71 and 0.70 shrinks.
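
Generating that series takes a couple of lines: start at 14nm and alternately multiply by 0.71 and 0.70. The rounding to marketing-friendly names is my own gloss, but it lands on the 10/7/5/3.5/2.5/1.8/1.25/0.9 ladder used later in the cost discussion.

```python
# Consistent node-name series from alternating 0.71 and 0.70 shrinks.
node, shrinks = 14.0, (0.71, 0.70)
series = [node]
for i in range(8):
    node *= shrinks[i % 2]
    series.append(node)
print(" -> ".join(f"{n:.2f}" for n in series))
# 14.00 -> 9.94 -> 6.96 -> 4.94 -> 3.46 -> 2.46 -> 1.72 -> 1.22 -> 0.85
# i.e. roughly 14 / 10 / 7 / 5 / 3.5 / 2.5 / 1.8 / 1.25 / 0.9 nm
```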

In the bottom of the figure I have nodes by company and year with transistor density for each node. The transistor density is calculated based on a weighting of NAND and Flipflop cells as I have previously discussed. Next to each node in parentheses is either FF for FinFET, HNS for horizontal nanosheet, HNS/FS for horizontal nanosheets with a dielectric wall (Forksheet) to improve density based on work Imec has done, or CFET for complementary stacked FETs where nFETs and pFETs are vertically stacked. CFETs will be when logic crosses over into a layer-based scaling approach and becomes a true 3D solution; in principle, CFETs can continue to scale by adding more layers.

Bold indicates leading density or technology. In 2014 Intel takes the density lead with their 14nm process. In 2016 TSMC takes the density lead with their 10nm process and maintains the lead in 2017 with their 7nm process. TSMC and Samsung have similar densities at 7nm, but going to 5nm TSMC is producing a much larger shrink than Samsung, and in 2019 TSMC maintains the process density lead with their 5nm technology. If Samsung delivers in 2020 on their HNS technology, which we are calling a 3.5nm node, they may take the density lead and be the first company to manufacture HNS. In 2021 the TSMC node we are calling 3.5nm may return them to the density lead. If Intel can deliver on their two-year cadence with the kind of shrinks they typically target, we believe they could take the density lead in 2023 with their 5nm process. In 2024 we may see a first CFET implementation from Samsung taking the density lead, until 2025 when Intel may regain the lead with their first CFET process.

Figure 8 presents the logic mask counts for these processes. The introduction of EUV is mitigating mask count growth; without EUV we would likely see over 100 masks on this chart. As we did with the NAND mask count figure, the dotted line is the average mask count. We have also grouped processes based on "similar" densities, so for example Intel 14nm is combined with the foundry 10nm processes and Intel 10nm with the foundry 7nm processes.

Figure 8. Logic Mask Counts.

Figure 9 presents the logic density in transistors per millimeter squared based on the NAND/Flipflop weighting metric mentioned previously.

Figure 9. Logic Density Trend.

There are six types of processes plotted on this chart.

Planar transistors were the primary leading-edge logic devices until around 2014 and produced a density improvement of 1.33x per year. FinFETs then took over at the leading edge and have provided a 1.29x per year density improvement. In parallel to FinFETs we have seen the introduction of FDSOI processes. FDSOI offers simpler processes with lower design costs and better analog, RF and power characteristics, but cannot compete with FinFETs for density or raw performance. When HNS takes over from FinFETs we expect the rate of density improvement to slow further to 1.16x per year, and eventually CFETs take over and increase density at 1.11x per year. We have also plotted SRAMs produced by vertical transistors, based on work by Imec, that may provide an efficient solution for cache memory chiplets.

Figure 10 illustrates the trend in logic transistor cost.

Figure 10. Logic Transistor Cost.

Figure 10 presents the cost per billion transistors by combining wafer cost estimates from our Strategic Cost and Price Model with the transistor densities in figure 9. All fabs are assumed to be new greenfield fabs with 35,000 wafers per month capacity because that is the average size of logic fabs in 2020. The assumed countries are Germany for GLOBALFOUNDRIES except for 14nm, which is done in the United States; the United States is also assumed for Intel except for 10nm in Israel; South Korea is assumed for Samsung and Taiwan for TSMC.

This plot does not include mask set or design cost amortization, so while manufacturing cost per transistor is coming down, the number of designs that can afford to access these technologies is limited to high-volume products.

This plot does not include any packaging, test or yield impact.

From 130nm down to the i32/f28 (Intel 32nm/foundry 28nm) node, costs were coming down by 0.6x per node; then at the i22/f20 and f16/f14 nodes the cost reductions slowed because the foundries decided not to scale for their first FinFET processes. This slowdown led to many in the industry erroneously predicting the end of cost reduction. From the f16/f14 node down to the i5/f2.5 node we expect costs to decrease by 0.72x per node and then slow to 0.87x per node thereafter. The g1.25 and g0.9 nodes are generic CFET processes with 3 and 4 decks respectively.
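
Compounding those factors shows why cost per transistor keeps falling even as the rate slows. In the sketch below, the per-segment factors (0.6x, 0.72x, 0.87x) come from the text; the number of node steps in each segment and the flat factor assigned to the FinFET slowdown are my assumptions for illustration.

```python
# Normalized cost per transistor, compounding per-node reduction factors.
segments = [("130nm -> i32/f28",   0.60, 4),  # step count assumed
            ("i32/f28 -> f16/f14", 0.85, 2),  # FinFET-slowdown factor assumed
            ("f16/f14 -> i5/f2.5", 0.72, 5),
            ("i5/f2.5 -> g0.9",    0.87, 3)]
cost = 1.0  # normalized to the 130nm cost
for label, factor, steps in segments:
    cost *= factor ** steps
    print(f"{label}: {factor}x/node -> {cost:.4f} of the 130nm cost")
```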

Figure 11 examines the impact of mask set amortization on wafer cost.

Figure 11. Mask Set Amortization.

The wafer costs in figure 11 are based on a new greenfield fab in Taiwan running 40,000 wafers per month. The amortization is mask set only and does not include design costs.

The table presents the 2020 mask set cost for 250nm, 90nm, 28nm and 7nm mask sets. Please note that at introduction these mask sets were more expensive. The mask set cost is then amortized over a set number of wafers and the resulting normalized costs are shown in the figure. In the table, the wafer cost ratio is the wafer cost with amortization for 100 wafers run on a mask set divided by the wafer cost with amortization for 100,000 wafers run on a mask set.

From the figure and table, we can see that mask set amortization has a small effect at 250nm (1.42x ratio) and a large effect at 7nm (18.05x ratio). Design cost amortization is even worse.
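
The underlying arithmetic is simply the base wafer cost plus the mask set cost divided by the wafers run on that set. The dollar figures below are hypothetical placeholders (the article's actual table values are not reproduced here), but they show how the ratio explodes at advanced nodes:

```python
# Effective wafer cost with mask-set amortization (hypothetical dollars).
def effective_wafer_cost(wafer_cost, mask_set_cost, wafers_on_set):
    return wafer_cost + mask_set_cost / wafers_on_set

examples = {"250nm": (1_500, 60_000), "7nm": (10_000, 15_000_000)}
for node, (wafer, mask_set) in examples.items():
    low  = effective_wafer_cost(wafer, mask_set, 100)      # low-volume part
    high = effective_wafer_cost(wafer, mask_set, 100_000)  # high-volume part
    print(f"{node}: ${low:,.0f} vs ${high:,.0f} -> ratio {low / high:.2f}x")
```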

The bottom line is that design and mask set costs are so high at the leading edge that only high-volume products can absorb the resulting amortization.

DRAM
Leading edge DRAMs have capacitor structures that are high aspect ratio "3D" devices but, as with current logic devices, DRAM does not yet scale by stacking active elements.

Figure 12 presents DRAM nodes by company in the top table and on the bottom of the figure are some of the key structures.

Figure 12. DRAM Nodes.

As DRAM nodes proceeded below the 4x nm generation, the buried saddle fin access transistor with buried word line came into use (see bottom left). The bottom right illustrates the progression of the capacitor structure to higher aspect ratio structures with two layers of silicon nitride "MESH" support. DRAM capacitor structures are reaching the mechanical stability limits of the technology, and with dielectric k values also stalled, DRAM scaling is slowing to single-nanometer steps per node.

Figure 13 illustrates mask counts by exposure type and company.

Figure 13. DRAM Mask Counts.

From figure 13 it can be seen that from the 2x to 2y generation there is a big jump in mask counts. The jump is driven by performance and power requirements that led to the need for more transistor types and threshold voltages in the peripheral logic.

At the 1x node, Samsung is the first company to introduce EUV to DRAM production, and the number of EUV layers grows at the 1a, 1b and 1c nodes. SK Hynix is also expected to implement EUV; we do not currently expect Micron to implement EUV.

Figure 14 illustrates the trend in DRAM bit density by year.

Figure 14. DRAM Bit Density.

In figure 14 the bit density is the product capacity in gigabits divided by the die size in square millimeters.

From figure 14 it can be seen that there has been a slowdown in bit density growth beginning around 2015. DRAM bit density is currently constrained by the capacitor and it isn't clear what the solution will be. Long term, a new type of memory may be needed to replace DRAM. DRAM requires relatively fast access with high endurance, and currently MRAM and FeRAM appear to be the only options that have the potential to meet the speed and endurance requirements. Because MRAM requires relatively high current to switch, large selector transistors are required, constraining the ability to shrink MRAM to competitive density and cost. FeRAM is also a potential replacement and is getting a lot of attention at places like Imec.

Figure 15 illustrates the DRAM bit cost trend.

Figure 15. DRAM Bit Cost Trend.

Figure 15 is based on combining wafer cost estimates from our Strategic Cost and Price Model with the bit densities in figure 14. All fabs are assumed to be new greenfield fabs with 75,000 wafers per month capacity because that is the average size of DRAM fabs in 2020. The assumed countries are Japan for Micron and South Korea for Samsung and SK Hynix.

These calculations do not include packaging and test costs and do not take into account street width or die yield.

In this plot, the combination of higher mask counts and slower bit density growth leads to a slowdown from a 0.70x per node cost trend to a 0.87x per node cost trend.

Conclusion
NAND has successfully transitioned from 2D to 3D and now has a scaling path until around 2025. After 2025, scaling may be possible with very high layer counts, but unless a breakthrough in process or equipment efficiency is made, cost per bit reductions may end.

Leading edge logic today utilizes 3D FinFET structures but won’t be a true stacked device 3D technology until CFETs are introduced around 2025. Logic has the potential to continue to scale until the end of the 2020s by transitioning from FinFET to HNS to CFET although the cost improvements will likely slow down.

DRAM is the most constrained of the three market segments; scaling and cost reductions have already slowed down significantly and no solution is currently known. Slower bit density and cost scaling will likely continue until around 2025, when a new memory type may be needed.

Here is the full presentation:

Lithovision 2020

Also Read:

IEDM 2019 – Imec Interviews

IEDM 2019 – IBM and Leti

The Lost Opportunity for 450mm


The Story of Ultra-WideBand – Part 1: The Genesis
by Frederic Nabki & Dominic Deslandes on 03-03-2020 at 10:00 am

In the middle of the night of April 14, 1912, the R.M.S. Titanic sent a distress message. It had just hit an iceberg and was sinking. Even though broadcasting an emergency wireless signal is common today, this was cutting edge technology at the turn of the 20th century. This was made possible by the invention of a broadband radio developed over the previous 20 years: the spark-gap transmitter.

Developed by Heinrich Hertz in the 1880s, the spark-gap radio was improved by Guglielmo Marconi, who succeeded in sending the first radio transmission across the Atlantic Ocean in 1901. After the Titanic disaster, wireless telegraphy using spark-gap transmitters quickly became universal on large ships, with the Radio Act of 1912 requiring all seafaring vessels to maintain a 24-hour radio watch. The spark-gap radio was then the most advanced technology enabling wireless communication between ships, used through the First World War.

The architecture of the spark-gap radio was significantly different from what is currently used in wireless transceivers, including our cellphones, WiFi networks and Bluetooth devices. Modern narrowband communications systems modulate continuous-waveform radiofrequency (RF) signals to transmit and receive information. But at the turn of the 20th century, the spark-gap transmitter generated electromagnetic waves by means of an electric spark; no narrowband radiofrequency signal was being modulated. The spark was generated by discharging a capacitance through an electric arc across a gap between two conductors. These very short discharges generated oscillating currents in the wires, which then excited an electromagnetic wave that radiated out and could be picked up at a great distance. From the well-known time-frequency duality principle, a short impulse in time, analogous to the electric spark, gives a wideband signal in frequency, and this was the basis of communications for two decades.
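
The duality is easy to demonstrate numerically: the shorter the pulse, the wider the spectrum. A small NumPy sketch (the pulse widths and sample rate are chosen arbitrarily for illustration):

```python
import numpy as np

# Time-frequency duality: a shorter impulse occupies a wider bandwidth.
fs = 1e9                                # 1 GHz sample rate
t = np.arange(-2e-6, 2e-6, 1 / fs)      # 4 microsecond window
for tau in (1e-6, 1e-7, 1e-8):          # Gaussian pulse widths, seconds
    pulse = np.exp(-(t / tau) ** 2)
    spectrum = np.abs(np.fft.rfft(pulse))
    freqs = np.fft.rfftfreq(len(t), 1 / fs)
    bw = freqs[np.argmax(spectrum < spectrum[0] / 2)]  # half-amplitude point
    print(f"pulse width {tau:.0e} s -> bandwidth ~{bw / 1e6:.2f} MHz")
```

Each 10x reduction in pulse width spreads the energy over roughly 10x the bandwidth, which is exactly why a spark, a very short discharge, is inherently wideband.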

An interesting point to note is that the spark-gap radio could not support a continuous transmission, such as a sound signal. A message had to be composed of a series of sparks, transmitting discrete pieces of information, making it the first digital radio. This characteristic was ideal to transmit Morse code. However, it was then believed that it was not possible with the spark-gap radio to transmit a continuous signal like voice or music, without loss of information. It was decades before Shannon and Nyquist showed how to do that with digital modulation techniques.

This gap in digital modulation knowledge, coupled with the difficulty of generating high power spark-gap transmissions, proved fatal to the spark-gap radio. After World War 1, carrier-based transmitters were developed using vacuum tubes, producing continuous waves that could carry audio. Nowadays, virtually all wireless transceivers use the same architecture, based on the work of US engineer Edwin Armstrong in 1918. Called the superheterodyne radio, this architecture uses frequency mixing to convert a received narrowband signal to a relatively low intermediate frequency (IF) that is then processed in baseband circuitry. This innovation gave rise, starting around 1920, to the AM radio, which was followed a decade later by the FM radio. By the late 1920s the only spark transmitters still in operation were legacy installations on naval ships. Wideband radio was effectively dead.

Wideband’s Rebirth after 100 years: A Detective Story
Why then did Apple release the iPhone 11 in 2019 with an ultra-wideband (UWB) transceiver, implemented in silicon on their new U1 wireless processor chip? The answer requires some detective work into clues stretching back to the middle of the last century.

The first clue was another impulse-based wideband radio technology developed in top secret laboratories around the world in the 1930s and during World War 2: RADAR. The story of RADAR has been told many times; it provided a pivotal advantage in both the Battle of Britain and naval battles in the Pacific.

For the purposes of this discussion, RADAR is able to determine the range, angle and velocity of objects. After the war, impulse-based transceivers started once more to gain momentum, now in military applications. From the 1960s to the 1990s, this technology was restricted to military applications under classified programs, both as a location finding and a communication technology. By the mid-1980s, a wide range of research papers, books and patents from UWB pioneers like Harmuth at Catholic University of America and Ross and Robbins at Sperry Rand Corp became available. This great source of information revived the interest in UWB systems because of wideband’s unique ability to deliver location data.

Apple’s first use for UWB is to provide positioning data. Positioning enables many applications in augmented reality (AR), virtual reality (VR), gaming, device recovery, file sharing and advertising beacons. We will explore UWB positioning technology further in Part 3. But positioning by itself is not sufficient reason for Apple to build a custom silicon UWB implementation.

Future articles in this series will discuss five clues to understanding Apple’s adoption of UWB for the iPhone 11:

  • UWB can provide positioning data
  • Its very low power emissions ensure that UWB does not interfere with other communications
  • Low power output also makes UWB signals difficult to detect by unintended users
  • The low duty cycle enables ultra-low power and increases resistance to jamming or interference
  • The very short impulses enable the reduction of the communication latency.

The story continues in Part 2

About Frederic Nabki
Dr. Frederic Nabki is cofounder and CTO of SPARK Microsystems, a wireless start-up bringing a new ultra low-power and low-latency UWB wireless connectivity technology to the market. He directs the technological innovations that SPARK Microsystems is introducing to market. He has 17 years of experience in research and development of RFICs and MEMS. He obtained his Ph.D. in Electrical Engineering from McGill University in 2010. Dr. Nabki has contributed to setting the direction of the technological roadmap for start-up companies, coordinated the development of advanced technologies and participated in product development efforts. His technical expertise includes analog, RF, and mixed-signal integrated circuits and MEMS sensors and actuators. He is a professor of electrical engineering at the École de Technologie Supérieure in Montreal, Canada. He has published several scientific publications, and he holds multiple patents on novel devices and technologies touching on microsystems and integrated circuits.

About Dominic Deslandes
Dr. Dominic Deslandes is cofounder and CSO of SPARK Microsystems, a wireless start-up bringing a new ultra low-power and low-latency UWB wireless connectivity technology to the market. He leads SPARK Microsystems's long-term technology vision. Dominic has 20 years of experience in the design of RF systems. In the course of his career, he managed several research and development projects in the field of antenna design, RF system integration and interconnections, sensor networks and UWB communication systems. He has collaborated with several companies to develop innovative solutions for microwave sub-systems. Dr. Deslandes holds a doctorate in electrical engineering and a Master of Science in electrical engineering from École Polytechnique of Montreal, where his research focused on high frequency system integration. He is a professor of electrical engineering at the École de Technologie Supérieure in Montreal, Canada.


Trends in AI and Safety for Cars
by Bernard Murphy on 03-03-2020 at 6:00 am

The potential for AI in cars, whether for driver assistance or full autonomy, has been trumpeted everywhere and continues to grow. Within the car we have vision, radar and ultrasonic sensors to detect obstacles in front of, behind and to the side of the car. Outside the car, V2X promises to share real-time information between vehicles and other sources so we can see ahead of vehicles in front of us, around corners to detect hazards, and spot congested traffic and emergency vehicles. This AI can also improve on the fly, adapting to new conditions through training updates from the cloud.

This all sounds wonderful, but of course implementation is much more complex than the vision. It demands a lot of specialized devices, each with its own constraints in performance, latency (how quickly it can respond to a change) and power consumption. Put them all together in the car and more constraints emerge: How well can the central brain respond to the massive flow of data being generated by all these sensors? Will it become bogged down and not be able to respond quickly enough to a pedestrian ahead of the car? Will AI be more reliable if object-based or rule-based or a combination of the two? Most important of all, will it be safe?

Taken together, it’s not surprising that the nirvana of full autonomy isn’t right around the corner. But progress is being made, bottom-up as it should be. A good place to see this in action is at the edge of the car, in sensors, sensor fusion and local AI.

Memory Implications

AI is being pushed to the edge. This is not new. Transmitting raw video, radar or ultrasonic streams would bog down the car network, create massive latency and burn a lot of power. Doing all the object detection and fusion close to the sensor reduces those problems. For cost, reliability and, again, power reasons, it's better to do all of that in one chip per sensor cluster.

Now you need to integrate onto a single chip support for multiple AI subsystems along with other administrative and safety functions. This has some interesting design implications. To manage power and latency it is important to share memory between central compute and the AI accelerators. You don't want to go out to external memory any more than absolutely necessary because that is much slower and burns more power than staying on chip. Further, as more accelerators and compute clusters are added to the system-on-chip, it becomes nearly impossible to efficiently manage the data flow using software only. Therefore the AI accelerator subsystems must be cache coherent with the rest of the chip (CPU cluster, memory subsystem, communication, etc.).

Cache coherence goes further. Within that group of AI accelerators it may be important to share memory, in fusion for example. Which means you need cache coherence between accelerators. But you don’t necessarily want to pay the penalty every time for coherence with the whole system; most of the time coherence is needed just between accelerators. Now you want hierarchical cache coherence – between accelerators at one level, then to the full system at the top. Kurt Shuler (VP Marketing at Arteris IP) told me cache coherence requirements of this kind are becoming more common in automotive applications, because they’re dealing with big images across more accelerators, yet they still need to manage to a tight power and performance budget.

Safety

What about safety? There is a larger question of how you quantify safety in non-deterministic systems, as most machine-learning based systems are. This is where SOTIF (ISO/PAS 21448:2019) and UL4600 are headed. But even before we get there, how do ISO 26262 and AI interact? Most accelerators so far have not been ASIL-rated, so must be managed in a larger system aiming for ASIL D compliance.

This mix of safety standards and levels is pushing a trend to a safety island on the system on chip (SoC) to monitor system safety, along with an ability to isolate each IP on the interconnect in turn for temporary in-service testing, or longer-term isolation if an IP is found not to be performing to expectations.

This level of monitoring acknowledges a few realities. We may never be able to build large SoCs in which each component can be brought up to ASIL D. Components will fail; systems must self-monitor to determine if this may be about to happen or has happened and must provide means for self-correction where possible (through say a soft reboot of a subsystem). And where a problem cannot be corrected, systems must provide notification to the central control to enable a fail-operational response – maybe the driver should retake control of the car.

Could AI accelerators be brought up to ASIL D? This is still very much a research topic. A lot of work has been done on the software side. In hardware, Kurt tells me that attention is mostly on conventional functional safety (FuSa) for the various elements inside the accelerator. One interesting observation he made was that FuSa seems to be more important in the later planes of a neural net; sensitivity to errors in earlier planes is not as strong. An interesting topic to follow!

Interconnect Implications

One thing is clear. The interconnect becomes the backbone for mediating all this activity – coherent caching and safety. Coherent caching because a full SoC is inevitably going to depend on a mix of IPs from multiple suppliers, yet caching must still be managed coherently across all those IPs.

And safety because the NoC or NoCs running through these systems must interconnect a wide range of IPs with differing ASIL capabilities. Some of those IPs can be very exotic indeed, serving the needs of some of the most advanced AI suppliers. NoCs must enable and mediate this trend to safety islands, self-testing and isolation support, while also providing safety control and monitoring within the network itself.

This is a complex and fast-moving range of needs. Arteris IP is clearly working to keep up with these needs. Kurt is on the ISO 26262 working group, and they work with a lot of AI companies, including some of the most prominent in automotive applications. Check them out.

Also Read:

Autonomous Driving Still Terra Incognita

Evolving Landscape of Self-Driving Safety Standards

Safety and Platform-Based Design


Reliability Challenges in Advanced Packages and Boards
by Herb Reiter on 03-02-2020 at 10:00 am

Today’s Market Requirements
Complex electronic devices and (sub)systems work for us in important applications such as aircraft, trains, trucks and passenger vehicles, as well as building infrastructure, manufacturing equipment, medical systems and more. Very high reliability (the ability of a product to meet all requirements in the customer environment over the desired lifetime) is becoming increasingly important. Big Data and AI (Artificial Intelligence) are making humans even more reliant on electronic systems and will make insufficient reliability more painful, costly, even deadly. At the recent DesignCon 2020 I had the opportunity to learn how ANSYS is enabling engineers to design highly reliable products.

Brief ANSYS history, focus and key acquisitions
ANSYS, based near Pittsburgh, Pennsylvania, was founded in 1970 and now employs about 4,000 experts in finite element analysis, computational fluid dynamics, electronics, semiconductors, embedded software and design optimization. ANSYS is well known as a partner for very demanding customers in space and aircraft applications. ANSYS grew rapidly, partly by acquiring other EDA suppliers: they bought and integrated Ansoft Corp. in 2008 and Apache Design Solutions in 2011. In May 2019 ANSYS acquired DfR Solutions to deepen their capabilities in electronics reliability, including simulation of semiconductor packaging and PCBAs, plus a physical laboratory capable of characterization and library generation as well as analysis and testing of a broad range of electronic parts (semiconductors, displays, batteries, etc.). DfR's best known product is Sherlock, a powerful pre- and post-processor for reliability prediction of dice, packages and PCBs subjected to thermal, thermo-mechanical and mechanical environments.

The value of FEA tools and accurate inputs (libraries)
Analyzing the reliability of prototypes and/or pre-production units with a test-fail-fix approach is costly, time consuming and provides results very late in a product’s life cycle. ANSYS’ Sherlock applies finite element analysis (FEA) and enables engineers to easily assess a hardware design’s reliability, accurately and at the beginning of a design cycle. This also allows designers to evaluate trade-offs (e.g. different architectures, geometries and materials) early and across a wide range of conditions, to achieve optimal results.

Summary of the ANSYS Design for Reliability presentation at DesignCon 2020
In a fully packed conference room, ANSYS' Kelly Morgan, Lead Application Engineer, presented three examples of failure mechanisms where Sherlock can add significant value. Sherlock and ANSYS Mechanical apply physics-of-failure principles to predict hardware reliability for: 1) low-k cracking, 2) solder joint fatigue and 3) micro-via separation. The pointers lead to much more information than the paragraphs below can provide.

To 1) Low-k cracking: Dielectric material with a low dielectric constant (k) reduces parasitic capacitance, enabling higher circuit performance and lower power dissipation. However, its low mechanical strength sometimes leads to cracks in the dielectric due to thermal-mechanical forces arising from differences in the coefficient of thermal expansion (CTE) during reflow or thermal cycling. Acoustic inspection can reveal these cracks. If the low-k material is found to be cracked at this late stage of a product's introduction, it can trigger a dreaded redesign cycle. In contrast, Sherlock and ANSYS Mechanical allow an IC designer, at the beginning of a project, to predict such failures, take corrective actions right away and pre-empt such problems from occurring.

Figure 1: CTE differences between a copper pillar and a die lead to both compressive and tensile stress — and impact adjacent transistors’ performance as well as reliability  Courtesy: ANSYS
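
For a first-order feel for the numbers, the classic thin-film thermal-mismatch estimate is sigma ≈ E·Δα·ΔT/(1−ν). The sketch below uses textbook copper and silicon constants and an assumed reflow temperature swing; a real copper-pillar stack needs full FEA (which is what Sherlock and ANSYS Mechanical provide), so treat this strictly as an order-of-magnitude illustration.

```python
# First-order thermal-mismatch stress: sigma ~ E * d_alpha * dT / (1 - nu).
# Textbook constants and an assumed reflow swing; real stacks need FEA.
E_cu   = 110e9    # Young's modulus of copper, Pa
nu_cu  = 0.34     # Poisson's ratio of copper
cte_cu = 16.5e-6  # CTE of copper, 1/K
cte_si = 2.6e-6   # CTE of silicon, 1/K
dT     = 225.0    # reflow peak (~250 C) down to room temperature, K

sigma = E_cu * (cte_cu - cte_si) * dT / (1 - nu_cu)
print(f"thermal-mismatch stress ~ {sigma / 1e6:.0f} MPa")  # hundreds of MPa
```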

To 2) Solder joint fatigue: Many integrated circuits (ICs) have traditionally used lead (Pb)-free solder bumps as connections to other dice, the package and even the printed circuit board (PCB). Different CTEs and temperatures in adjacent layers make the materials expand and contract differently. These thermal-mechanical forces, as well as vibration, mechanical shock, etc., strain the solder bumps and may lead to cracks within the bumps and/or at the interconnect surfaces. More recently, copper pillars have become popular because they allow much tighter spacing than solder bumps. However, these interconnects are more rigid and can fail faster, depending on the strain applied. Sherlock's and ANSYS Mechanical's multi-physics capabilities allow users to easily and accurately predict the reliability of such interconnects and, if needed, drive changes early in the design cycle.

Figure 2: Cross section of a solder joint and how different CTEs cause materials to shrink or expand     Courtesy: ANSYS
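
Physics-of-failure tools commonly estimate solder fatigue life with a Coffin-Manson/Engelmaier-type relation: the cyclic shear strain from CTE mismatch, Δγ ≈ L·Δα·ΔT/h, feeds N_f = 0.5·(Δγ/2ε_f')^(1/c). The geometry and material constants below are generic illustrative assumptions, not values from the ANSYS presentation.

```python
# Coffin-Manson / Engelmaier-style solder fatigue sketch (illustrative).
L_dnp  = 5e-3    # distance from neutral point, m (assumed corner joint)
h      = 0.5e-3  # solder joint standoff height, m (assumed BGA ball)
d_cte  = 12e-6   # CTE mismatch, component vs. board, 1/K (assumed)
dT     = 100.0   # thermal cycle swing, K
eps_f  = 0.325   # fatigue ductility coefficient (classic solder value)
c      = -0.442  # fatigue ductility exponent (simplified constant)

d_gamma = L_dnp * d_cte * dT / h                # cyclic shear strain
n_f = 0.5 * (d_gamma / (2 * eps_f)) ** (1 / c)  # cycles to failure
print(f"strain {d_gamma:.4f} per cycle -> ~{n_f:,.0f} cycles to failure")
```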

To 3) Micro-via separation: As spacings in electronics get smaller and smaller, the use of micro-via technology in PCBs has exploded. Micro-vias stacked as many as three or four high have become very common. However, if these designs do not use the right materials and geometries, the micro-vias can experience unexpected cracking and delamination.

Thermal-mechanical stress, moisture, vibration and other forces, can lead to separation of micro-vias, as well as delamination from copper traces at the top or bottom of plated through-hole vias (PTHs). Sherlock analyzes these problem areas, considers overstress conditions during reflow and/or operation and can predict when fatigue will lead to interconnect failures between vias, PTHs and routing layers and/or under bump metal (UBM) contact points.

Figure 3: Likely reliability risks in electronic products during manufacturing and operation  Courtesy: ANSYS

Design flow integration of Sherlock
Even a best-in-class point tool like Sherlock needs to be integrated into a user-friendly, high-productivity design flow to provide its full value in a customer's design environment. Only smooth data exchanges with up- and down-stream tools enable engineers to utilize Sherlock's many capabilities quickly and efficiently. Flow integration minimizes scripting, data format translations, and error-prone, time-consuming manual interventions. Sherlock interacts with ANSYS' Icepak and ANSYS Mechanical to combine these tools into a high-productivity and very reliable design flow for reaching the "ZERO DEFECTS" goal that more and more applications require. Learn more about ANSYS Sherlock HERE.

Figure 4: Important stages where, in a hardware design process, Sherlock can avoid surprises   Courtesy: ANSYS


GLOBALFOUNDRIES Sets a New Bar for Advanced Non-Volatile Memory Technology
by Mike Gianfagna on 03-02-2020 at 6:00 am

Whether it's the solid-state disk in your laptop, IoT/automotive hardware or edge-based AI, embedded non-volatile memory (eNVM) is a critical building block for these and many other applications. The workhorse technology for this capability has typically been NOR flash (eFlash), but a problem looms: eFlash is challenging to scale economically below the 28nm node. That's why a recent press release from GLOBALFOUNDRIES (GF) caught my attention:

GLOBALFOUNDRIES Delivers Industry’s First Production-ready eMRAM on 22FDX Platform for IoT and Automotive Applications.

Embedded magnetoresistive non-volatile memory (eMRAM) is a mouthful. I did a bit of research: MRAM was first presented back in 1974, when IBM developed a component called a magnetic tunnel junction (MTJ). The device had two ferromagnetic layers separated by a thin insulating layer, and a memory cell was created at the intersection of two wires (i.e., a row line and a column line) with an MTJ between them. MRAMs can combine the high speed of SRAM, the storage capacity of DRAM and the nonvolatility of eFlash at low power, so a production embedded implementation of the technology below 28nm is a big deal.

First, a bit about the implementation technology. 22FDX is a 22nm fully depleted silicon-on-insulator (FD-SOI) technology from GF. Another mouthful. FD-SOI delivers near FinFET-like performance without the design and manufacturing complexities of FinFET. The figure at the right summarizes the benefits of GF's 22FDX.

“We continue our commitment to differentiate our FDX platform with robust, feature rich solutions that allow our clients to build innovative products for high performance and low power applications,” said Mike Hogan, senior vice president and general manager of Automotive and Industrial Multi-market at GLOBALFOUNDRIES. “Our differentiated eMRAM, deployed on the industry’s most advanced FDX platform, delivers a unique combination of high performance RF, low power logic and integrated power management in an easy-to-integrate eMRAM solution that enables our clients to deliver a new generation of ultra-low power MCUs and connected IoT applications.”

I caught up with Martin Mason, senior director of the automotive, industrial and multi-market BU at GF, to get a bit more detail about their new, production-ready eMRAM. He took me through a very robust qualification process for the device, including a bit error rate in the 6E-6 range, robust data retention after 5x solder reflows, stand-by data retention sufficient for industrial-grade and automotive grade 2 applications and multiple magnetic immunity tests. Martin summed up our discussion like this: "22FDX with embedded MRAM is an enabling technology platform for Intelligent IoT (IIoT), wearables, MCUs and advanced automotive products. We have a qualified Flash-like robust eMRAM process with our first client single product MRAM tape out in fab, multiple clients running MRAM test chips and many silicon validated MRAM macros (4Mb-48Mb). Unlike other eMRAM solutions, we built GF's 22FDX MRAM to be very robust, with a -40C to 125C operating range, high endurance and long data retention, passing five rigorous real-world (5x) solder reflow tests while maintaining leading magnetic immunity. The GF eMRAM is very much like eFlash, only better, with faster read and write times and reduced mask count manufacturing compared with traditional embedded Flash technologies." The diagram to the right summarizes GF's new eMRAM vs. eFlash.

GF reports they are working with several clients, with multiple production tape-outs scheduled in 2020 using the new, production-ready eMRAM technology in 22FDX. GF's state-of-the-art 300mm production line at Fab 1 in Dresden, Germany will support volume production for these projects. They also report that custom design kits featuring drop-in, silicon-validated MRAM macros ranging from 4 to 48 megabits, along with optional MRAM built-in self-test support, are available today from GF and their design partners.

Looking ahead, GF expects its scalable eMRAM to be available on both FinFET and future FDX platforms as part of the company's advanced eNVM roadmap. If you need an eFlash alternative below 28nm, this is definitely something to look into.

Also Read:

Specialized Accelerators Needed for Cloud Based ML Training

The GlobalFoundries IPO March Continues

Magnetic Immunity for Embedded Magnetoresistive RAM (eMRAM)


Coronavirus Chops SPIE Litho EUV Conference
by Robert Maire on 03-01-2020 at 6:00 am

Corona Curtails already quiet SPIE Litho conference
Our best guess is that attendance was off by 30% from last year's SPIE conference due to a lack of travelers from many Asian areas, obviously out of Corona fear. Even Intel, which is a few miles away, was a virtual no-show with a mass cancellation.

More importantly, virtually all after hours parties and events were canceled with a handful of exceptions.

ASML, Nikon, TEL and KLA all canceled their events.
Aside from the drop in attendance, the conference presentations seemed more subdued, as we are over and done with the EUV controversy, hype and celebration of prior years. EUV is now almost as boring and mundane as DUV because it's in production.

EUV is over and done….
We noted that there seemed to be a reduction in the number of EUV presentations as chip makers have figured out their issues and are likely keeping their solutions for themselves rather than broadcasting their questions and uncertainty in paper proposals. Gone are the controversies and speculation.

There is still a lot of "mopping up" to do on resist, stochastic errors, line edge roughness, pellicles, etc., but it works.

The final and sure sign that EUV has "grown up" and is done was a two-hour retrospective panel of gray-haired industry elders and pioneers reviewing their roles in the 35-year EUV struggle, much like war veterans.

The war is over, the good guys (Moore’s Law) won….on to the next battle

High NA + High Power = High price for ASML
The next battle… There were some early "teases" about the next, completely different version of the EUV tool: the High NA tool. While the moniker is "High NA," the real truth is a "high power," "high throughput" and therefore "high priced" tool.

The biggest change is not just the high NA but the optical "guts," which are completely different so as to reduce the loss of photons between the EUV source and the target wafer. How much of an improvement, you ask? Maybe four, five, even ten times the power…yikes! Could that be like going from the current 250 watt source to a 1000-2500 watt equivalent…holy EUV Batman…will the stage keep up, will the track keep up, will wafers get fried? Stay tuned.
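
To first order, exposure time scales as resist dose divided by usable source power, so the power multiple maps almost directly onto a throughput multiple, before the stage, track and other overheads push back. A back-of-envelope sketch, using the speculative power figures above:

```python
# First-order EUV exposure-time scaling with source power (overheads ignored).
base_power = 250.0  # watts, roughly today's source
for power in (1000.0, 2500.0):
    gain = power / base_power
    print(f"{power:.0f} W -> ~{gain:.0f}x the photons, "
          f"exposure time ~1/{gain:.0f} of today's")
```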

Given what could be an immense throughput increase the “EXE3600” could make the current NXE series look like jalopies compared to a Tesla.

This would make an ASML salesman's job very easy, as the math could easily support a tool price starting with at least a "2" if not a "3" and it would still be more cost effective than current tools.

While the NXE series has been a great proof that EUV works in production, the EXE will be the real money maker for ASML. Of course ASML will be quick to point out that the new 1000kg lenses cost a lot to make, but we would bet that the gross margins will be higher.

Maybe we should rename the “High NA” the “High GM” tool…..

Lam announces “dry” resist
Much as the etch sector went from "wet" etch to "dry" plasma etch decades ago, Lam hopes it can dry out the resist sector. Our guess is that it will be a lot harder. While Lam described it as a "breakthrough," we would describe it as a slow uphill slog to get a slow, reactionary chip industry to change its decades-long acceptance of liquid resist and track machines. Lam's project has been in the works for a number of years now, and we will likely wait several more years to see if it will work.

We would point out that the biggest change in resist has come from a private company called Inpria, which recently raised a $30M round, has been the talk of SPIE for several years and is still not in production (that we know of…). Inpria has many major chipmakers as investors/partners, versus Lam working with the IMEC research institute.

The industry needs to move from organic to inorganic or metal-based resists, amplified resists, etc., but that brings a lot of baggage, such as nasty metals like tin which can "poison" the tools they run through.

We think it's a great, needed idea that gets Lam into the "litho" cell and closer to patterning, but we would come back and revisit it in 3 or 4 years to see how it's doing…it will not impact Lam's financials for quite a while.

Lasertec is filling a significant void in reticle inspection
During a presentation at SPIE, it was revealed that Intel has been in full production with the Lasertec EUV reticle inspection tool since the December quarter.

The tool is apparently doing very well, finding lots of defects and moving EUV reticle production ahead. It's certainly a lot better than the alternative, which is…nothing…as KLA's e-beam and actinic tools are still in development.

Even a turtle can beat a cheetah if given enough of a head start, or at the very least give the cheetah a run for its money… It's likely that Lasertec will sell a lot of tools and make a lot of money before the KLA tools come out, and the market dynamics and values will obviously change after they do.

None of this matters as Corona crushes stocks
Not that anything at the SPIE conference matters anyway as the Corona crisis is crushing the stocks.

We would point out that High NA (or High GM) ASML tools will not start to show up for a couple of years and hopefully by that time Corona will be an old memory.

Similarly, Lam might start to see a few dollars of revenue in a few years if they can get dry resist to work, and finally, KLA will likely get both e-beam and actinic tools out in a couple of years and certainly sell a bunch.

The bottom line is that nothing we saw at SPIE matters to the stocks for at least a couple of years and corona dominates the short term headlines anyway.

We had previously stated that we thought the corona impact was being underestimated, and we think the latest news is starting to underscore that view. We would not be surprised to hear pre-announcements of worse-than-expected Q1 numbers as China, and potentially the rest of Asia, grinds to a virtual halt.

It continues to get uglier…..


Talking Sense With Moortec – The Future Of Embedded Monitoring Part 2
by Stephen Crosher on 02-28-2020 at 10:00 am

The rate of product development is facing very real challenges as the pace of silicon technology evolution begins to slow. Today we are squeezing the most out of transistor physics, which is essentially derived from 60-year-old CMOS technology. To maintain the pace of Moore's law, it is predicted that by 2030 we will need transistors to be a sixth of their current size. Reducing transistor size increases density, which itself presents issues: with the breakdown of Dennard scaling, the relative power for a given area of silicon will increase. When combined with the limitations of parallelism for multi-core architectures, our ability to develop increasingly energy efficient silicon is simply going the wrong way!
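
Dennard's argument, and why its breakdown hurts, is one line of arithmetic: shrink linear dimensions by a factor k and you get k² devices per area, each with roughly 1/k the capacitance; if voltage and frequency also scale (by 1/k and k), power density stays flat, but once voltage scaling stalls it grows. A hedged sketch of the classic first-order CV²f model (leakage ignored):

```python
# First-order CV^2*f power-density scaling; k is the linear shrink factor.
def power_density_multiple(k, v_scales, f_scales):
    devices_per_area = k ** 2
    cap_per_device = 1 / k
    v = (1 / k) if v_scales else 1.0
    f = k if f_scales else 1.0
    return devices_per_area * cap_per_device * v ** 2 * f

k = 1.4  # one classic node step
print(f"Dennard (V and f scale):   {power_density_multiple(k, True,  True):.2f}x")
print(f"V stalled, f still scales: {power_density_multiple(k, False, True):.2f}x")
print(f"V and f both stalled:      {power_density_multiple(k, False, False):.2f}x")
```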

As we descend through the silicon geometries, we see that the variability of the manufacturing process at the advanced nodes is widening. Our loosening grip on thermal conditions presents increasing challenges, which means we cannot simply assume a power-reduction dividend from moving to the next new node. Dynamic fluctuation of voltage supply levels throughout the chip threatens to starve the very digital logic that underpins the chip's functionality. These factors, combined with the increasing urgency to reduce the power consumption of super-scale data systems and to seek efficiencies that reduce global carbon emissions in both the manufacture and the use of electronics, mean that we must think smart and seek new approaches. We need to innovate.

C’mon, we’ve heard all this before!
I'm not the first to report our pending technological gloom and won't be the last. Gloom-mongering over the silicon industry has happened since, well, the beginning of the silicon industry!

As a species we can be smart. We know that if we can see and understand something, we have a better chance of controlling it. The more data we have, the more efficiency we can gain.

Monitoring systems develop in two phases, reflecting our inherent curiosity as humans. First comes 'realisation': the discovery that being able to see inside an entity otherwise considered a black box brings enlightenment and presents us with an opportunity. Second comes the 'evolution' phase: once data is being gathered from a system that until this point hadn't been visible, we seek to improve the quality, accuracy and granularity of that data, increasing its 'data intelligence', contextualising the dynamic circuit conditions, and aiming to identify trends and pull out signatures or patterns within a sea of data. See the previous blog, 'Talking Sense with Moortec – The Future of Embedded Monitoring Part 1'.

What’s next?
Information needs to be good to be of any value. I have had many conversations outlining that the perfect embedded monitoring system would be infinitely accurate, infinitely small, zero-latency and zero-power! As a provider of embedded monitoring subsystems for the advanced nodes, we're not there yet, but we are trying! Until we reach that ideal, SoC developers need to be aware of the area overhead of sensor systems. Although sensors are relatively small, at their core they are often analog by design, which doesn't necessarily scale with shrinking geometries, unlike the neighbouring logic circuits.

For this reason we must seek circuit topologies and schemes that reduce the silicon area occupied by the sensors themselves. To minimise area impact and make the best use of in-chip sensors in terms of placement, such matters are often best considered during the architecting phases of SoC development rather than as a floor-planning afterthought. Sensor subsystems are increasingly becoming the critical foundation of chip power management and performance optimisation; getting them wrong can lead to existential device stress and potentially immense reputational damage to the companies in the technological food chain that build today's automotive, consumer and high-performance computing products. We can therefore no longer treat monitoring as a low-priority endeavour for development teams as they progress through the design flow.

So, in our attempts to continue Moore's Law and counter the breakdown of Dennard scaling, we need to innovate, and of course we will. Such innovative solutions, however, will come from having a clearer view of the dynamic conditions deep within the chip, rather than from how the chip's core function is implemented.

If you missed the first part of this blog you can read it HERE

Watch out for our next blog, entitled Hyper-scaling of Data Centers – The Environmental Impact of the Carbon 'Cloud', which will be dropping mid-March!


An Important Step in Tackling the Debug Monster

An Important Step in Tackling the Debug Monster
by Daniel Nenni on 02-28-2020 at 6:00 am

AMIQ EDA Compare Report

If you’ve spent any time at all in the semiconductor industry, you’ve heard the statement that verification consumes two-thirds or more of the total resources on a chip project. The estimates range up to 80%, in which case verification is taking four times the effort of the design process. The exact ratio is subject to debate, but many surveys have consistently shown that verification dominates chip development. Less widely known is that many of these same surveys identify debug as the dominant verification task. Sure, it takes a lot of time to write testbenches, tests, monitors, scoreboards, assertions, and so on. Modern verification methodologies are quite effective at using these elements to find bugs. But investigating every test failure, determining the root cause, fixing the bug, and verifying the fix takes even more time than development. Further, the large number of bugs early in the development process and the many thousands of error messages generated can be completely overwhelming. In recent years, EDA vendors have focused more on speeding up debug with more precise error messages and better management of large numbers of warnings and errors.

In a recent talk with Cristian Amitroaie, CEO of AMIQ EDA, he mentioned that this is an area of great interest among his customers. His team has put considerable thought and effort into this challenge, producing some valuable new features. Cristian mentioned specifically a comparison and filtering mechanism recently added to AMIQ’s Verissimo SystemVerilog Testbench Linter. You may remember that we discussed this tool about a year ago; it checks SystemVerilog verification code using more than 500 rules. Verissimo finds erroneous and dubious code constructs, enforces consistent coding styles across projects, and fosters reuse. It can be run from a command line or within AMIQ’s flagship product, the Design and Verification Tools (DVT) Eclipse Integrated Development Environment (IDE). Users can easily enable and disable rules, add custom rules, execute the checks, and debug the errors within the IDE’s graphical environment.

AMIQ encourages users to run the testbench lint checks early in the verification process, often before all the code is written. If a user runs Verissimo early and often, code development can be an orderly process. However, it is rare that the results will cover only new code personally written by the user running the tool. Many designs are based on previous generations of chips, with extensive reuse of testbench code. Multiple engineers may also work on the same parts of the testbench. The result may be that running lint checks produces a lot of failure messages, and many of these may not be relevant to the changes being made and the new code being added. Users need ways to filter the messages and focus on the right areas. As they analyze the rule violations, debug the failures, and make fixes in the code, they want to be able to confirm these fixes without being distracted by all the other messages that are deliberately being ignored. It is also common to enable and disable rules as the project evolves, adding another level of possible confusion to the debug process. These are exactly the sorts of challenges that the lint compare and filter feature is intended to address.

As Cristian explains it, the concept is easy to understand. Users run the Verissimo linter on the testbench code to establish a "baseline" report that may contain a whole bunch of violation messages. After some violations of particular interest are debugged and fixed, or after some new code is added, the lint checks are run again and a "current" report is generated. In most cases this new report will also have many messages, so it's hard to see whether the code has improved or degraded after the changes. The compare step examines the baseline and current reports using a clever algorithm that clusters failures into several categories. Users can then use filters to look intelligently at what changed from one run to the next. Showing violations present in the baseline but not in the current report is a quick and easy way to verify that the intended fixes worked. Similarly, showing violations in the current report but not in the baseline reveals new problems introduced by the code changes. In either case, the hundreds or thousands of violations common to the two reports are filtered out. Some of these may be addressed later in the project or by other engineers working on the testbench, but in the meantime they are "noise," and ignoring them is a big productivity boost.
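
To make the idea concrete, here is a minimal sketch of the baseline-versus-current comparison using plain set logic. The (rule_id, file, message) shape is a hypothetical report format invented for illustration; it is not AMIQ's data model, and a production tool must also match violations across runs when line numbers shift as code is edited.

```python
# A minimal sketch of baseline-vs-current lint report comparison.
# The Violation shape below is a hypothetical report format for
# illustration only, not AMIQ's actual algorithm or data model.

from typing import NamedTuple, Set

class Violation(NamedTuple):
    rule_id: str   # identifier of the lint rule that fired
    file: str      # source file where the violation was flagged
    message: str   # human-readable description

def compare_reports(baseline: Set[Violation], current: Set[Violation]):
    """Cluster violations into the three categories described above."""
    fixed = baseline - current   # in baseline only: the intended fixes worked
    new = current - baseline     # in current only: introduced by the changes
    common = baseline & current  # shared "noise", hidden until later
    return fixed, new, common
```

Filtering on "new" catches regressions immediately, filtering on "fixed" confirms repairs, and "common" stays out of the way until someone chooses to look at it.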

The filters also make clear the effects of changing lint rules. If a rule from the baseline run is disabled in the current run, users can filter to view the associated violation messages. If a rule is added for the current run, users can view just the associated new violations. Users can save all the reports and all the compare results generated throughout the project to show verification progress over time. This is surely of interest to managers, who want to ensure that testbench linting is adding value while not forcing reviews of working old code that they do not want to touch. The net effect is that engineers can focus on linting only their own testbench code without being distracted by issues in reused code or code being developed by others. Cristian points out that filtering is a much safer approach than waiving violations not of immediate interest. It’s easy to leave waivers in place and therefore never examine the deferred results. Filtering hides these issues to speed debug, but they can be viewed at any time by looking specifically at the violations common to the baseline and current reports.

You can see how the lint compare and filter feature works with a demo movie. I didn’t really appreciate how valuable this is until I saw Verissimo in action. I congratulate the AMIQ team for taking this step to remove a significant barrier in the verification and debug process. As always, I thank Cristian for his time and his insight.

To learn more, visit https://www.dvteclipse.com/products/verissimo-linter.

Also Read

Debugging Hardware Designs Using Software Capabilities

Automatic Documentation Generation for RTL Design and Verification

An Important Next Step for Portable Stimulus Adoption


Navigating Memory Choices for Your Next Low-Power Design

Navigating Memory Choices for Your Next Low-Power Design
by Mike Gianfagna on 02-27-2020 at 10:00 am

Memory options

Choosing a memory architecture can be a daunting task. There are many options to choose from, each with their own power, performance, area and cost profile. The right choice can make a new design competitive and popular in the market. The wrong choice can doom the whole project to failure.

Vadhiraj Sankaranarayanan, senior technical marketing manager, Solutions Group at Synopsys, has published a technical bulletin that should provide a lot of help and guidance for your next memory decision, especially if it's focused on low power (which almost everything is these days).  Entitled Key Features Designers Should Know About LPDDR5, Vadhiraj's piece explores the advantages of a popular new JEDEC standard, LPDDR5.

Before getting into some of the details of LPDDR5, Vadhiraj provides a good overview of the choices available today and some of the history regarding how these options evolved. You can get all the details by reading the piece, but suffice it to say there are a lot of choices, each with a long list of pros and cons.

The balance of the piece discusses the details of LPDDR5, highlighting its features and diving into how many of those features work. The LPDDR specification addresses the middle piece of the diagram, above – Mobile DDR. As stated by Vadhiraj, “LPDDR DRAMs provide a high-performance solution with significantly low power consumption, which is a key requirement for mobile applications such as tablets, smartphones, and automotive.” The best way to understand the benefits of a new standard is to compare it to the previous generation. Doing that with LPDDR5 vs. LPDDR4 yields the diagram, below. More flexibility, more capacity, more speed, less power.

You can learn a lot about the architecture and benefits of LPDDR5 by reading Vadhiraj’s technical bulletin. To whet your appetite, here are some interesting facts about LPDDR5:

  • Dynamic voltage scaling (DVS) is a method to modify, on the fly, the operating voltage of a device to match the varying needs of the system. LPDDR5 supports two sets of core and I/O voltages through DVS: 1.05V and 0.5V for high-frequency operation, and 0.9V and 0.3V for lower frequencies
  • LPDDR5 adopts a new clocking scheme, where the clock runs at one fourth the data-strobe frequency at speeds above 3200 Mbps and at half the data-strobe frequency at speeds below 3200 Mbps (see the clocking sketch after this list)
  • Decision feedback equalizers (DFEs) reduce inter-symbol interference on received data to improve margin. LPDDR5 DRAMs include a single-tap DFE to improve the margins for the write data, enhancing the robustness of the memory channel (see the DFE sketch after this list)
  • Write X is a power-saving feature that allows a specific bit pattern (such as an all-zero pattern) to be transferred to contiguous memory locations very quickly. LPDDR5 supports Write X
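
The clocking-scheme arithmetic is easy to work through. The sketch below assumes data transfers on both edges of the data strobe (WCK), so WCK runs at half the data rate; the CK:WCK dividers follow the 3200 Mbps threshold from the bullet above. This is illustrative arithmetic only, not a vendor API.

```python
# A small sketch of the LPDDR5 clocking relationship described above,
# assuming double-data-rate transfers on WCK (so WCK = data_rate / 2).

def lpddr5_clocks(data_rate_mbps: float):
    """Return (wck_mhz, ck_mhz) for a given data rate in Mbps."""
    wck_mhz = data_rate_mbps / 2.0               # both edges of WCK carry data (assumed)
    divider = 4 if data_rate_mbps > 3200 else 2  # CK = WCK/4 above 3200 Mbps, else WCK/2
    return wck_mhz, wck_mhz / divider

print(lpddr5_clocks(6400))  # (3200.0, 800.0) -- CK stays at 800 MHz at top speed
print(lpddr5_clocks(3200))  # (1600.0, 800.0)
```

Note what the 4:1 ratio buys: the clock frequency stays bounded even as the data rate doubles, which presumably eases clock distribution at the highest speeds.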
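
The single-tap DFE can also be illustrated in a few lines. The toy model below assumes ±1 decisions and a fixed tap weight; a real LPDDR5 receiver trains the tap value and operates on analog voltage samples, so this only shows the feedback principle.

```python
# A toy single-tap DFE: each new sample is corrected by subtracting the
# ISI contribution of the previously decided bit. Fixed tap weight and
# binary decisions are simplifying assumptions for illustration.

def single_tap_dfe(samples, tap=0.25):
    """Decide each bit after cancelling the previous bit's trailing ISI."""
    decisions = []
    prev = 0.0  # no prior decision before the first sample
    for s in samples:
        corrected = s - tap * prev                 # cancel trailing ISI
        decision = 1.0 if corrected >= 0 else -1.0
        decisions.append(decision)
        prev = decision                            # feed the decision back
    return decisions

# The middle sample is a -1 bit lifted to +0.1 by the previous +1 bit's
# tail; the feedback subtraction still resolves it correctly.
print(single_tap_dfe([0.9, 0.1, -0.6]))  # [1.0, -1.0, -1.0]
```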

As mentioned, you can learn a lot more from Vadhiraj’s technical bulletin. Synopsys provides additional resources on the topic. There is a white paper on DDR SDRAM memories and Vadhiraj conducted a webinar on DDR5 and LPDDR5 that can be viewed as well.

Also Read:

Hybrid Verification for Deep Sequential Convergence

Edge Computing – The Critical Middle Ground

How Good is Your Testbench?