SemiWiki – Page 409 – The Open Forum for Semiconductor Professionals

March 2, 2020 (updated August 22, 2024)

GLOBALFOUNDRIES Sets a New Bar for Advanced Non-Volatile Memory Technology

by Mike Gianfagna on 03-02-2020 at 6:00 am
Categories: GlobalFoundries, IoT, Semiconductor

Whether it’s the solid-state disk in your laptop, IoT/automotive hardware or edge-based AI, embedded non-volatile memory (eNVM) is a critical building block for these and many other applications. The workhorse technology for this capability has typically been NOR flash (eFlash), but a problem looms as eFlash presents challenges to scale economically below the 28nm node. That’s why a recent press release from GLOBALFOUNDRIES (GF) caught my attention:

GLOBALFOUNDRIES Delivers Industry’s First Production-ready eMRAM on 22FDX Platform for IoT and Automotive Applications.

Embedded magnetoresistive random-access memory (eMRAM) is a mouthful. I did a bit of research, and MRAM was presented back in 1974, when IBM developed a component called a Magnetic Tunnel Junction (MTJ). The device had two ferromagnetic layers separated by a thin insulating layer, and a memory cell was created by the intersection of two wires (i.e., a row line and a column line) with an MTJ between them. MRAM can combine the high speed of SRAM, the storage capacity of DRAM, and the nonvolatility of eFlash at low power, so a production embedded implementation of the technology below 28nm is a big deal.
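To make the MTJ description a bit more concrete, here is a minimal Python sketch of how a cell's two magnetic states map to two resistances, and how a read compares the cell against a reference sitting between them. The resistance values, the 100% tunnel magnetoresistance (TMR) ratio, the read voltage and the "parallel means 0" convention are all hypothetical illustration values, not GF 22FDX parameters. Embedded arrays commonly add an access transistor per cell to block sneak currents through neighbouring cells; the sketch ignores that.

```python
from enum import Enum

class Magnetization(Enum):
    """Relative orientation of the MTJ's free layer versus its fixed layer."""
    PARALLEL = "parallel"          # layers aligned: lower tunneling resistance
    ANTIPARALLEL = "antiparallel"  # layers opposed: higher tunneling resistance

# Hypothetical device values, chosen only for illustration.
R_PARALLEL_OHMS = 5_000.0
TMR_RATIO = 1.0  # (R_AP - R_P) / R_P, i.e. 100%
R_ANTIPARALLEL_OHMS = R_PARALLEL_OHMS * (1.0 + TMR_RATIO)

def cell_resistance(state: Magnetization) -> float:
    """Resistance seen between the row line and column line through one MTJ."""
    if state is Magnetization.PARALLEL:
        return R_PARALLEL_OHMS
    return R_ANTIPARALLEL_OHMS

def read_bit(state: Magnetization, v_read: float = 0.1) -> int:
    """Sense the cell by comparing its current with a mid-point reference current.

    Convention (assumed): parallel = 0, antiparallel = 1.
    """
    i_cell = v_read / cell_resistance(state)
    r_ref = (R_PARALLEL_OHMS + R_ANTIPARALLEL_OHMS) / 2.0
    i_ref = v_read / r_ref
    return 0 if i_cell > i_ref else 1

# A tiny 2x2 cross-point array: (row, column) -> stored magnetization.
array = {
    (0, 0): Magnetization.PARALLEL,
    (0, 1): Magnetization.ANTIPARALLEL,
    (1, 0): Magnetization.ANTIPARALLEL,
    (1, 1): Magnetization.PARALLEL,
}
print({addr: read_bit(state) for addr, state in array.items()})
```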

First, a bit about the implementation technology. 22FDX is a 22nm fully depleted silicon-on-insulator (FD-SOI) technology from GF. Another mouthful. FD-SOI delivers near-FinFET performance without the design and manufacturing complexities of FinFET. The figure at the right summarizes the benefits of GF’s 22FDX.

“We continue our commitment to differentiate our FDX platform with robust, feature rich solutions that allow our clients to build innovative products for high performance and low power applications,” said Mike Hogan, senior vice president and general manager of Automotive and Industrial Multi-market at GLOBALFOUNDRIES. “Our differentiated eMRAM, deployed on the industry’s most advanced FDX platform, delivers a unique combination of high performance RF, low power logic and integrated power management in an easy-to-integrate eMRAM solution that enables our clients to deliver a new generation of ultra-low power MCUs and connected IoT applications.”

I caught up with Martin Mason, senior director, automotive, industrial and multi-market BU at GF, to get a bit more detail about their new, production-ready eMRAM. He took me through a very robust qualification process for the device, including a bit error rate in the 6E-6 range, robust data retention after 5X solder reflows, stand-by data retention sufficient for industrial-grade and automotive-grade 2 applications, and multiple magnetic immunity tests. Martin summed up our discussion like this: “22FDX with embedded MRAM is an enabling technology platform for Intelligent IoT (IIoT), wearables, MCUs and advanced automotive products. We have a qualified Flash-like robust eMRAM process with our first client single product MRAM tape out in fab, multiple clients running MRAM test chips and many silicon validated MRAM macros (4Mb-48Mb). Unlike other eMRAM solutions we built GF’s 22FDX MRAM to be very robust with -40C to 125C operating range, high endurance and long data retention, passing five rigorous real-world (5x) solder reflow tests while maintaining leading magnetic immunity. The GF eMRAM is very much like eFlash – only better, with faster read and write times and reduced mask count manufacturing compared with traditional embedded Flash technologies.” The diagram to the right summarizes GF’s new eMRAM vs. eFlash.
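As a rough feel for what a bit error rate in the 6E-6 range means, the Python sketch below treats raw bit errors as independent and uniformly spread (a binomial model), then asks two questions: how many raw errors to expect across a 4Mb macro, and how often a single word would see more errors than a single-error-correcting code can fix. The binomial model, the binary reading of “4Mb” and the 72-bit word (64 data bits plus 8 check bits) are assumptions for illustration; GF did not say how its BER was measured or what error correction, if any, sits on top.

```python
import math

def expected_bit_errors(capacity_bits: int, ber: float) -> float:
    """Mean number of raw bit errors in a macro, assuming independent errors."""
    return capacity_bits * ber

def prob_word_exceeds(word_bits: int, ber: float, correctable: int = 1) -> float:
    """P(more than `correctable` bit errors in one word) under a binomial model.

    Sums the upper tail directly instead of computing 1 - P(ok), which would
    lose precision when the answer is tiny.
    """
    return sum(
        math.comb(word_bits, k) * ber**k * (1.0 - ber) ** (word_bits - k)
        for k in range(correctable + 1, word_bits + 1)
    )

BER = 6e-6                       # from the qualification data quoted above
MACRO_BITS = 4 * 1024 * 1024     # a "4Mb" macro, read as binary megabits (assumption)
WORD_BITS = 72                   # hypothetical 64 data + 8 check bits

print(f"expected raw errors per 4Mb macro: {expected_bit_errors(MACRO_BITS, BER):.1f}")
print(f"P(word has any error): {prob_word_exceeds(WORD_BITS, BER, correctable=0):.2e}")
print(f"P(word needs more than single-bit correction): {prob_word_exceeds(WORD_BITS, BER):.2e}")
```

Under these assumptions the raw error count per macro is in the tens, while the chance that any one word carries two or more errors is several orders of magnitude smaller. That gap is one reason raw BER figures are usually considered together with whatever error correction the system applies.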

GF reports they are working with several clients with multiple production tape-outs scheduled in 2020 using the new, production-ready eMRAM technology in 22FDX. GF’s state-of-the-art 300mm production line at Fab 1 in Dresden, Germany will support volume production for these projects. They also report that custom design kits featuring drop-in, silicon-validated MRAM macros ranging from 4 to 48 megabits, along with optional MRAM built-in self-test (BIST) support, are available today from GF and its design partners.

Looking ahead, GF expects its scalable eMRAM to be available on both FinFET and future FDX platforms as part of the company’s advanced eNVM roadmap. If you need an eFlash alternative below 28nm, this is definitely something to look into.

Also Read:

Specialized Accelerators Needed for Cloud Based ML Training

The GlobalFoundries IPO March Continues

Magnetic Immunity for Embedded Magnetoresistive RAM (eMRAM)

March 1, 2020 (updated August 9, 2020)

Coronavirus Chops SPIE Litho EUV Conference

by Robert Maire on 03-01-2020 at 6:00 am
Categories: China, Lithography, Semiconductor Advisors, Semiconductor Services

Corona Curtails already quiet SPIE Litho conference
Our best guess is that attendance was off by 30% from last year’s SPIE conference, due to a lack of travelers from many Asian areas, obviously out of Corona fear. Even Intel, which is a few miles away, was a virtual no-show with a mass cancellation.

More importantly, virtually all after hours parties and events were canceled with a handful of exceptions.

ASML, Nikon, TEL and KLA all canceled their events.

Aside from the drop in attendance, the conference presentations seemed more subdued, as we are over and done with the EUV controversy, hype and then celebration of prior years. EUV is now almost as boring and mundane as DUV because it’s in production.

EUV is over and done….
We noted that there seemed to be a reduction in the number of EUV presentations as chip makers have figured out their issues and are likely keeping their solutions for themselves rather than broadcasting their questions and uncertainty in paper proposals. Gone are the controversies and speculation.

There is still a lot of “mopping up” to do on resist, stochastic errors, line edge roughness, pellicles, etc., but it works.

The final, and sure, sign that EUV has “grown up” and is done was a two-hour retrospective panel of gray-haired industry elders and pioneers reviewing their roles in the 35-year EUV struggle, much like war veterans.

The war is over, the good guys (Moore’s Law) won….on to the next battle

High NA + High Power = High price for ASML
The next battle….There were some early “teases” about the next, completely different version of the EUV tool, the High NA tool. While the moniker is “High NA”, the real truth is a “high power”, “high throughput” and therefore “high priced” tool.

The biggest change is not just the High NA but the optical “guts”, which are completely different so as to reduce the loss of photons between the EUV source and the target wafer. How much of an improvement, you ask? Maybe four to five, or even ten times the power…..yikes! Could that be like going from the current 250 Watt source to 1,000-2,500 watts equivalent…holy EUV Batman…will the stage keep up, will the track keep up, will wafers get fried?..stay tuned.
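The arithmetic behind “1,000-2,500 watts equivalent” is simply today’s 250 W multiplied by a four- to tenfold gain. Whether the stage keeps up is a separate question, and a toy throughput model shows why it matters: if each wafer needs a dose-limited exposure time that shrinks as power rises, plus a fixed overhead for stage moves and wafer handling, throughput gains flatten out once the overhead dominates. The 20-second exposure time and 10-second overhead below are invented round numbers, not ASML NXE or EXE specifications.

```python
def equivalent_source_power(current_watts: float, optics_gain: float) -> float:
    """Source power today's optics would need to deliver the same photons to the wafer."""
    return current_watts * optics_gain

def wafers_per_hour(
    source_watts: float,
    ref_watts: float = 250.0,       # reference source power (from the text)
    ref_exposure_s: float = 20.0,   # hypothetical exposure time per wafer at ref power
    overhead_s: float = 10.0,       # hypothetical stage/handling time per wafer
) -> float:
    """Toy model: exposure time scales as 1/power at a fixed resist dose; overhead does not."""
    exposure_s = ref_exposure_s * ref_watts / source_watts
    return 3600.0 / (exposure_s + overhead_s)

for gain in (1, 4, 10):
    p = equivalent_source_power(250.0, gain)
    print(f"{gain:>2}x -> {p:6.0f} W equivalent, {wafers_per_hour(p):5.1f} wafers/hour")
```

In this sketch a tenfold power gain yields only about 2.5 times the throughput, because the fixed overhead becomes the bottleneck; that is the “will the stage keep up” question in numerical form.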

Given what could be an immense throughput increase, the “EXE3600” could make the current NXE series look like jalopies compared to a Tesla.

This would make an ASML salesman’s job very easy as the math could easily support a tool price starting with at least a “2” if not a “3” and it would still be more cost effective than current tools.

While the NXE series has been great proof that EUV works in production, the EXE will be the real money maker for ASML. Of course, ASML will be quick to point out that the new 1,000 kg lenses cost a lot to make, but we would bet that the gross margins will be higher.

Maybe we should rename the “High NA” the “High GM” tool…..

Lam announces “dry” resist
Much as the etch sector went from “wet” etch to “dry” plasma etch decades ago, Lam hopes it can dry out the resist sector. Our guess is that it will be a lot harder. While Lam described it as a “breakthrough”, we would describe it as a slow uphill slog to get a slow, reactionary chip industry to change its decades-long acceptance of liquid resist and track machines. Lam’s project has been in the works for a number of years now, and we will likely wait several more years to see if it works.

We would point out that the biggest change in resist has come from a private company called Inpria, which recently raised a $30M round, has been the talk of SPIE for several years already, and is still not in production (that we know of…). Inpria has many major chipmakers as investors/partners, versus Lam working with academic IMEC.

The industry needs to move from organic to inorganic or metal-based resists, amplified resists, etc., but the move brings a lot of baggage, such as nasty metals like tin, which can “poison” the tools they run through.

We think it’s a great, needed idea that gets Lam into the “litho” cell and closer to patterning, but we would come back and revisit to see how it’s doing in 3 or 4 years….it will not impact Lam’s financials for quite a while.

Lasertec is filling a significant void in reticle inspection
During a presentation at SPIE, it was revealed that Intel has been in full production with the Lasertec EUV reticle inspection tool since the December quarter.

The tool is apparently doing very well, finding lots of defects and moving EUV reticle production ahead. It’s certainly a lot better than the alternative, which is……nothing…..as KLA’s e-beam and actinic tools are still in development.

Even a turtle can beat a cheetah if given enough of a head start, or at the very least give the cheetah a run for its money…..It’s likely that Lasertec will sell a lot of tools and make a lot of money before the KLA tools come out, and the KLA tools will obviously alter the market dynamics and values once they do come out.

None of this matters as Corona crushes stocks
Not that anything at the SPIE conference matters anyway as the Corona crisis is crushing the stocks.

We would point out that High NA (or High GM) ASML tools will not start to show up for a couple of years and hopefully by that time Corona will be an old memory.

Similarly, Lam might start to see a few dollars of revenue in a few years if it can get dry resist to work, and finally, KLA will likely get both its e-beam and actinic tools out in a couple of years and certainly sell a bunch.

The bottom line is that nothing we saw at SPIE matters to the stocks for at least a couple of years and corona dominates the short term headlines anyway.

We had previously stated that we thought the corona impact was being underestimated and we think the latest news is starting to underscore that view. We would not be surprised to hear pre-announcements of worse than expected Q1 numbers as China and potentially the rest of Asia grind to a virtual halt.

It continues to get uglier…..

February 28, 2020 (updated July 6, 2020)

Talking Sense With Moortec – The Future Of Embedded Monitoring Part 2

by Stephen Crosher on 02-28-2020 at 10:00 am
Categories: IP, Moortec

The rate of product development is facing very real challenges as the pace of silicon technology evolution begins to slow. Today, we are squeezing the most out of transistor physics, which is essentially derived from 60-year-old CMOS technology. To maintain the pace of Moore’s law, it is predicted that by 2030 we will need transistors to be a sixth of their current size. Reducing transistor size increases density, which itself presents issues: now that Dennard scaling has broken down, the power consumed per unit area of silicon increases as density rises. When combined with the limitations of parallelism for multi-core architectures, our ability to develop increasingly energy efficient silicon is simply going the wrong way!
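The power-density argument can be checked with the textbook dynamic-power relation P = C·V²·f. Under classic Dennard scaling, shrinking linear dimensions by a factor k divides capacitance and voltage by k, multiplies frequency by k and packs k² more transistors into the same area, so power density stays flat. Once supply voltage stops scaling, the same shrink raises power density by roughly k². The sketch below ignores leakage and treats voltage scaling as all-or-nothing, both simplifications; using k = 6 reads “a sixth of their current size” as a linear shrink, which is also an assumption.

```python
def relative_power_density(k: float, voltage_scales: bool = True) -> float:
    """Power density after a linear shrink by factor k (k > 1), relative to before.

    Dynamic power per transistor is P = C * V**2 * f (activity factor held constant).
    Classic Dennard scaling: C -> C/k, V -> V/k, f -> f*k, transistors per area -> *k**2.
    """
    capacitance = 1.0 / k
    voltage = 1.0 / k if voltage_scales else 1.0  # post-Dennard: V stays roughly flat
    frequency = k
    power_per_transistor = capacitance * voltage**2 * frequency
    transistors_per_area = k**2
    return power_per_transistor * transistors_per_area

for k in (2.0, 6.0):
    print(
        f"shrink {k:g}x: Dennard {relative_power_density(k):.2f}, "
        f"fixed voltage {relative_power_density(k, voltage_scales=False):.1f}"
    )
```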

As we descend through the silicon geometries, we see that the variability of the manufacturing process for the advanced nodes is widening. Our loosening grip on thermal conditions presents increasing challenges; this means we cannot simply assume a power reduction dividend by moving to the next node. The dynamic fluctuation of supply voltage levels throughout the chip threatens to starve the very operation of the digital logic that underpins the chip’s functionality. These factors, combined with the increasing urgency to reduce the power consumption of super-scale data systems and to seek efficiencies that reduce global carbon emissions in both the manufacture and the use of electronics, mean that we must think smart and seek new approaches. We need to innovate.

C’mon, we’ve heard all this before!
I’m not the first to report our pending technological gloom and won’t be the last. The ‘gloom mongering’ over the silicon industry has happened since, well, the beginning of the silicon industry!

As a species we can be smart. We know that if we are able to see and understand something, we have a better chance of controlling it. The more data we have the more efficiencies can be gained.

The nature of monitoring systems has two phases and is a reflection of our inherent curiosity as humans. Firstly, there is ‘realisation’: the discovery that introducing the ability to see inside an entity that was otherwise considered a black box brings enlightenment and presents us with an opportunity. Secondly, there is the ‘evolution’ phase. Once data is being gathered from a system (that up until this point hadn’t been visible), we seek to improve the quality, accuracy and granularity of the data, increasing the ‘data intelligence’ of the information we are gathering, contextualising the dynamic circuit conditions, and aiming to identify trends and pull out signatures or patterns within a sea of data. See previous blog, ‘Talking Sense with Moortec – The Future of Embedded Monitoring Part 1’.

What’s next?
Information needs to be good to be of any value. I have had many conversations outlining that the perfect embedded monitoring system must be infinitely accurate, infinitely small, zero latency and zero power! Although as a provider of embedded monitoring subsystems for the advanced nodes we’re not there yet, we are trying! Until we reach that panacea, SoC developers need to be aware of the area overhead of sensor systems. Although sensors are relatively small, at their core they are often analog by design, and analog circuitry doesn’t necessarily scale with reducing geometries, unlike the neighbouring logic circuits.
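The area point can be made concrete with a toy calculation: if analog sensors keep their size while the surrounding logic shrinks with each node, the fraction of the die they occupy grows even though the sensor count is unchanged. The die area, sensor count, per-sensor area and the 0.5x logic shrink per node below are hypothetical, and real analog blocks usually shrink somewhat rather than not at all.

```python
def sensor_area_fraction(logic_mm2: float, n_sensors: int, sensor_mm2: float) -> float:
    """Fraction of total die area taken by sensors."""
    sensors_mm2 = n_sensors * sensor_mm2
    return sensors_mm2 / (logic_mm2 + sensors_mm2)

def fraction_after_shrinks(
    logic_mm2: float, n_sensors: int, sensor_mm2: float, nodes: int, logic_shrink: float = 0.5
) -> float:
    """Shrink logic area `nodes` times; sensor area is assumed not to scale at all."""
    return sensor_area_fraction(logic_mm2 * logic_shrink**nodes, n_sensors, sensor_mm2)

# Hypothetical design: 100 mm^2 of logic, 50 sensors of 0.01 mm^2 each.
for nodes in range(3):
    pct = 100 * fraction_after_shrinks(100.0, 50, 0.01, nodes)
    print(f"after {nodes} shrink(s): sensors use {pct:.2f}% of the die")
```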

So, for this reason we must be aware of this overhead and seek circuit topologies and schemes that reduce the silicon area occupied by the sensors themselves. To minimise area impact and make the best use of in-chip sensors in terms of placement, such matters are often best discussed and considered during the architecting phases of SoC development, rather than as a floor-planning afterthought. Increasingly, sensor subsystems are becoming the critical foundation of chip power management and performance optimisation. Getting it wrong can lead to existential device stress and potentially immense reputational damage to companies within the technological food chain that create the larger products or systems used in today’s automotive, consumer and high performance computing markets. Therefore, we can no longer consider monitoring a low priority endeavour for development teams as they progress through the design flow.

So, in our attempts to continue Moore’s Law and cope with the end of Dennard scaling, we need to innovate, and of course we will. However, such innovative solutions will come from having a clearer view of the dynamic conditions deep within the chip, rather than from how the core function of the chip itself is implemented.

If you missed the first part of this blog you can read it HERE

Watch out for our next blog, entitled Hyper-scaling of Data Centers – The Environmental Impact of the Carbon ‘Cloud’, which will be dropping mid-March!

February 28, 2020 (updated March 16, 2022)

An Important Step in Tackling the Debug Monster

by Daniel Nenni on 02-28-2020 at 6:00 am
Categories: AMIQ EDA, EDA

If you’ve spent any time at all in the semiconductor industry, you’ve heard the statement that verification consumes two-thirds or more of the total resources on a chip project. The estimates range up to 80%, in which case verification is taking four times the effort of the design process. The exact ratio is subject to debate, but many surveys have consistently shown that verification dominates chip development. Less widely known is that many of these same surveys identify debug as the dominant verification task. Sure, it takes a lot of time to write testbenches, tests, monitors, scoreboards, assertions, and so on. Modern verification methodologies are quite effective at using these elements to find bugs. But investigating every test failure, determining the root cause, fixing the bug, and verifying the fix takes even more time than development. Further, the large number of bugs early in the development process and the many thousands of error messages generated can be completely overwhelming. In recent years, EDA vendors have focused more on speeding up debug with more precise error messages and better management of large numbers of warnings and errors.

In a recent talk with Cristian Amitroaie, CEO of AMIQ EDA, he mentioned that this is an area of great interest among his customers. His team has put considerable thought and effort into this challenge, producing some valuable new features. Cristian mentioned specifically a comparison and filtering mechanism recently added to AMIQ’s Verissimo SystemVerilog Testbench Linter. You may remember that we discussed this tool about a year ago; it checks SystemVerilog verification code using more than 500 rules. Verissimo finds erroneous and dubious code constructs, enforces consistent coding styles across projects, and fosters reuse. It can be run from a command line or within AMIQ’s flagship product, the Design and Verification Tools (DVT) Eclipse Integrated Development Environment (IDE). Users can easily enable and disable rules, add custom rules, execute the checks, and debug the errors within the IDE’s graphical environment.

AMIQ encourages users to run the testbench lint checks early in the verification process, often before all the code is written. If a user runs Verissimo early and often, code development can be an orderly process. However, it is rare that the results will cover only new code personally written by the user running the tool. Many designs are based on previous generations of chips, with extensive reuse of testbench code. Multiple engineers may also work on the same parts of the testbench. The result may be that running lint checks produces a lot of failure messages, and many of these may not be relevant to the changes being made and the new code being added. Users need ways to filter the messages and focus on the right areas. As they analyze the rule violations, debug the failures, and make fixes in the code, they want to be able to confirm these fixes without being distracted by all the other messages that are deliberately being ignored. It is also common to enable and disable rules as the project evolves, adding another level of possible confusion to the debug process. These are exactly the sorts of challenges that the lint compare and filter feature is intended to address.

As Cristian explains it, the concept is easy to understand. Users run the Verissimo linter on the testbench code to establish a “baseline” report that may have a whole bunch of violation messages. After some violations of particular interest are debugged and fixed or after some new code is added, the lint checks are run again, and a “current” report is generated. In most cases, this new report will also have many messages, so it’s hard to see whether the code has improved or degraded after the changes. The compare step examines the baseline and current reports using a clever algorithm that clusters failures into several categories. Users can then use filters to intelligently look at what changed from one run to the next. Showing violations present in the baseline but not in the current report is a quick and easy way to verify that the intended fixes worked. Similarly, showing violations in the current report but not in the baseline reveals new problems introduced by the code changes. In either case, the hundreds or thousands of violations common to the two reports are filtered out. Some of these may be addressed later in the project or by other engineers working on the testbench, but in the meantime they are “noise”, and ignoring them is a big productivity boost.
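The baseline/current comparison maps naturally onto set operations. The Python sketch below is not AMIQ’s algorithm (the article only says it clusters failures into categories) and does not use any Verissimo API or report format. It keys each violation on a hypothetical (rule, file, message) triple and leaves line numbers out, on the assumption that edits elsewhere in a file would otherwise make unchanged violations look new. The rule names are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Violation:
    rule: str      # hypothetical rule identifier
    file: str
    message: str   # line numbers deliberately excluded from identity

def compare_reports(baseline: list[Violation], current: list[Violation]) -> dict[str, set[Violation]]:
    """Split two lint reports into fixed, new and common violations."""
    base, cur = set(baseline), set(current)
    return {
        "fixed": base - cur,   # baseline only: the intended fixes worked
        "new": cur - base,     # current only: introduced by the latest changes
        "common": base & cur,  # in both: the "noise" hidden by default
    }

baseline = [
    Violation("UNUSED_VAR", "env.sv", "variable 'tmp' is never read"),
    Violation("MISSING_SUPER", "agent.sv", "build_phase does not call super"),
]
current = [
    Violation("MISSING_SUPER", "agent.sv", "build_phase does not call super"),
    Violation("MAGIC_NUMBER", "seq.sv", "literal 42 used in constraint"),
]
for category, items in compare_reports(baseline, current).items():
    print(category, sorted(v.rule for v in items))
```

Keeping the "common" set rather than discarding it mirrors the point that filtered violations can still be viewed at any time.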

The filters also make clear the effects of changing lint rules. If a rule from the baseline run is disabled in the current run, users can filter to view the associated violation messages. If a rule is added for the current run, users can view just the associated new violations. Users can save all the reports and all the compare results generated throughout the project to show verification progress over time. This is surely of interest to managers, who want to ensure that testbench linting is adding value while not forcing reviews of working old code that they do not want to touch. The net effect is that engineers can focus on linting only their own testbench code without being distracted by issues in reused code or code being developed by others. Cristian points out that filtering is a much safer approach than waiving violations not of immediate interest. It’s easy to leave waivers in place and therefore never examine the deferred results. Filtering hides these issues to speed debug, but they can be viewed at any time by looking specifically at the violations common to the baseline and current reports.
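Filtering on rule changes can be sketched the same way: compare the enabled rule sets of the two runs, then keep only the baseline violations from rules that were switched off and the current violations from rules that were switched on. As with any illustration here, the rule names and data layout are hypothetical and not Verissimo’s report format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Violation:
    rule: str      # hypothetical rule identifier
    file: str
    message: str

def filter_by_rule_changes(
    baseline_rules: set[str],
    current_rules: set[str],
    baseline: list[Violation],
    current: list[Violation],
) -> dict[str, list[Violation]]:
    """Show only the violations explained by rules being disabled or added."""
    disabled = baseline_rules - current_rules
    added = current_rules - baseline_rules
    return {
        "from_disabled_rules": [v for v in baseline if v.rule in disabled],
        "from_added_rules": [v for v in current if v.rule in added],
    }

baseline_rules = {"UNUSED_VAR", "LINE_LENGTH"}
current_rules = {"UNUSED_VAR", "MAGIC_NUMBER"}
baseline = [
    Violation("LINE_LENGTH", "env.sv", "line exceeds 120 characters"),
    Violation("UNUSED_VAR", "env.sv", "variable 'tmp' is never read"),
]
current = [
    Violation("UNUSED_VAR", "env.sv", "variable 'tmp' is never read"),
    Violation("MAGIC_NUMBER", "seq.sv", "literal 42 used in constraint"),
]
result = filter_by_rule_changes(baseline_rules, current_rules, baseline, current)
for category, items in result.items():
    print(category, [v.rule for v in items])
```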

You can see how the lint compare and filter feature works with a demo movie. I didn’t really appreciate how valuable this is until I saw Verissimo in action. I congratulate the AMIQ team for taking this step to remove a significant barrier in the verification and debug process. As always, I thank Cristian for his time and his insight.

To learn more, visit https://www.dvteclipse.com/products/verissimo-linter.

Also Read

Debugging Hardware Designs Using Software Capabilities

Automatic Documentation Generation for RTL Design and Verification

An Important Next Step for Portable Stimulus Adoption

February 27, 2020 (updated July 18, 2025)

Navigating Memory Choices for Your Next Low-Power Design

by Mike Gianfagna on 02-27-2020 at 10:00 am
Categories: EDA, Mobile, Synopsys

Choosing a memory architecture can be a daunting task. There are many options to choose from, each with their own power, performance, area and cost profile. The right choice can make a new design competitive and popular in the market. The wrong choice can doom the whole project to failure.

Vadhiraj Sankaranarayanan, senior technical marketing manager, Solutions Group at Synopsys, has published a technical bulletin that should provide a lot of help and guidance for your next memory decision, especially if it’s focused on low power (which almost everything is these days). Entitled Key Features Designers Should Know About LPDDR5, Vadhiraj’s piece explores the advantages of a popular new JEDEC standard, LPDDR5.

Before getting into some of the details of LPDDR5, Vadhiraj provides a good overview of the choices available today and some of the history regarding how these options evolved. You can get all the details by reading the piece, but suffice it to say there are a lot of choices, each with a long list of pros and cons.

The balance of the piece discusses the details of LPDDR5, highlighting its features and diving into how many of those features work. The LPDDR specification addresses the middle piece of the diagram, above – Mobile DDR. As stated by Vadhiraj, “LPDDR DRAMs provide a high-performance solution with significantly low power consumption, which is a key requirement for mobile applications such as tablets, smartphones, and automotive.” The best way to understand the benefits of a new standard is to compare it to the previous generation. Doing that with LPDDR5 vs. LPDDR4 yields the diagram, below. More flexibility, more capacity, more speed, less power.

You can learn a lot about the architecture and benefits of LPDDR5 by reading Vadhiraj’s technical bulletin. To whet your appetite, here are some interesting facts about LPDDR5:

Dynamic voltage scaling (DVS) is a method to modify, on the fly, the operating voltage of a device to match the varying needs of the system. Through DVS, LPDDR5 supports two sets of core and I/O voltages: 1.05V and 0.5V for high-frequency operation, and 0.9V and 0.3V for lower frequencies
LPDDR5 adopts a new clocking scheme, where the clock runs at one fourth the data-strobe frequency at speeds higher than 3200 Mbps, and at half the data-strobe frequency at speeds under 3200 Mbps
Decision feedback equalizers (DFEs) reduce inter-symbol interference on received data to improve the margin. LPDDR5 DRAMs have a single-tap DFE to improve the margins for the write data, thereby enhancing the robustness of the memory channel
Write X is a power-saving feature that allows the transfer of a specific bit pattern (such as an all-zero pattern) to contiguous memory locations very quickly. LPDDR5 supports Write X
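Two of the items above lend themselves to a quick sketch. The first function turns a per-pin data rate into clock frequencies using the 4:1 and 2:1 ratios described; it assumes the data strobe is LPDDR5’s WCK with data transferred on both of its edges, and assigns exactly 3200 Mbps to the 2:1 case, since the text only speaks of speeds above and below 3200 Mbps. The second part is a generic single-tap DFE on a made-up two-tap channel, showing how subtracting the previous decision’s contribution recovers bits that a plain slicer gets wrong. The channel coefficients, tap weight and idle-level assumption are illustrative, not LPDDR5 values.

```python
def lpddr5_clocks_mhz(data_rate_mbps: float) -> tuple[float, float]:
    """Return (data strobe, command clock) in MHz for a per-pin data rate.

    Assumes double data rate on the strobe, so strobe = data_rate / 2.
    """
    strobe_mhz = data_rate_mbps / 2.0
    ratio = 4 if data_rate_mbps > 3200 else 2  # 3200 Mbps itself treated as 2:1 (assumption)
    return strobe_mhz, strobe_mhz / ratio

def channel(bits: list[int], main: float = 0.5, post: float = 0.6) -> list[float]:
    """Toy channel: each sample = main * current symbol + post * previous symbol (ISI)."""
    out, prev = [], -1.0  # line assumed idle at -1 before the burst
    for b in bits:
        s = 1.0 if b else -1.0
        out.append(main * s + post * prev)
        prev = s
    return out

def slicer(samples: list[float]) -> list[int]:
    """Decide each bit by sign alone, ignoring ISI."""
    return [1 if x > 0 else 0 for x in samples]

def single_tap_dfe(samples: list[float], tap: float) -> list[int]:
    """Subtract tap * (previous decided symbol) before slicing each sample."""
    decisions, prev = [], -1.0  # same idle assumption as the channel
    for x in samples:
        bit = 1 if x - tap * prev > 0 else 0
        decisions.append(bit)
        prev = 1.0 if bit else -1.0
    return decisions

print(lpddr5_clocks_mhz(6400), lpddr5_clocks_mhz(3200))
tx = [1, 0, 1, 1, 0, 0, 1, 0]
rx = channel(tx)
print("slicer:", slicer(rx))
print("DFE:   ", single_tap_dfe(rx, tap=0.6))
```

Under these assumptions, 6400 Mbps and 3200 Mbps both end up with an 800 MHz clock, which is the point of switching ratios. In the DFE, feeding back decisions rather than raw samples is what makes it “decision feedback”; a wrong decision would propagate, which this noise-free toy never triggers.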

As mentioned, you can learn a lot more from Vadhiraj’s technical bulletin. Synopsys provides additional resources on the topic. There is a white paper on DDR SDRAM memories and Vadhiraj conducted a webinar on DDR5 and LPDDR5 that can be viewed as well.

Also Read:

Hybrid Verification for Deep Sequential Convergence

Edge Computing – The Critical Middle Ground

How Good is Your Testbench?

February 27, 2020 (updated September 29, 2020)

Mentor Helps Mythic Implement Analog Approach to AI

by Tom Simon on 02-27-2020 at 6:00 am
Categories: AI, EDA, Siemens EDA

The entire field of Artificial Intelligence (AI) has resulted from what is called “first principles thinking”, where problems are re-examined through a complete reassessment of the underlying issues and potential solutions. It is a testament to how effective this can be that AI is being used for a rapidly expanding number of applications that previously challenged or defied traditional programming approaches. Even on conventional CPU-based architectures, AI offers enormous advantages over conventional sequential “instruction based” coding in a wide range of fields, including autonomous driving, sensor data analysis, resource optimization, IoT, safety systems, etc. Yet even more impressive improvements in AI performance have come from the use of optimized AI processors.

Some of these AI processors rely on several well understood concepts that can improve the efficiency of the types of computations made in a neural network. Adding parallelism is one approach; the other is to move memory closer to the processing elements. The AI chip company Mythic makes AI accelerator chips that use these proven methods, and it adds to them an ingenious new “first principles” approach.

The seed for their idea is that Ohm’s law, V=IR, is a multiplication. The multiply-accumulate (MAC) operation is the mainstay of AI neural network implementation. Digital multiplication is cumbersome, and frequently inefficient and slow, even when reduced to 8-bit precision, which works well enough for many recognition and inference tasks.

Mythic has introduced analog computation as the method for performing MAC, using flash memory cells as precision resistors to hold the trained coefficients. When voltage values are run through the flash memory cells, the output current is the result of an analog computation. Using the memory cell as a computation unit not only saves memory accesses but also significantly reduces computation time.
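The Ohm’s-law trick can be sketched numerically. Writing each stored weight as a conductance G = 1/R, a cell driven by input voltage V contributes a current I = V·G, and currents from cells sharing an output line add up (Kirchhoff’s current law), so the line current is a dot product. An ADC then converts that current back to a digital code. The voltages, conductances, full-scale current and 8-bit resolution below are hypothetical, and the sketch ignores signed weights, noise and device nonlinearity, which real analog compute designs have to deal with. It is a generic illustration, not Mythic’s circuit.

```python
def analog_mac(voltages: list[float], conductances: list[float]) -> float:
    """Output-line current: Ohm's law per cell (I = V * G), summed by Kirchhoff's law."""
    if len(voltages) != len(conductances):
        raise ValueError("one input voltage per weight cell is required")
    return sum(v * g for v, g in zip(voltages, conductances))

def adc(current_a: float, full_scale_a: float, bits: int = 8) -> int:
    """Ideal uniform ADC: clip to [0, full scale] and round to the nearest code."""
    levels = 2**bits - 1
    clipped = min(max(current_a, 0.0), full_scale_a)
    return round(clipped / full_scale_a * levels)

inputs_v = [0.1, 0.2, 0.3]       # activations encoded as voltages (hypothetical)
weights_s = [1e-6, 2e-6, 3e-6]   # weights stored as cell conductances in siemens (hypothetical)
i_out = analog_mac(inputs_v, weights_s)
print(f"line current: {i_out:.3e} A, ADC code: {adc(i_out, full_scale_a=4e-6)}")
```

The ADC step is where the accuracy concern in the next paragraph bites: its resolution and full-scale range bound how precisely the analog sum can be recovered.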

However, this requires significant analog design expertise, especially in designing memory cells and analog-to-digital converters (ADCs). Accuracy is essential: every step of the computation must be performed accurately.

Because this is a mixed signal design, SPICE simulation alone is not adequate for verification; mixed signal simulation is called for. Mentor, a Siemens business, and Mythic just announced Mythic’s use of Mentor’s Analog FastSPICE (AFS) and Symphony mixed signal simulation platform to simulate and verify the thousands of ADCs needed in Mythic’s designs, and to verify overall chip performance. This involves RTL simulation along with the analog simulation.

Mythic chose Mentor’s Analog FastSPICE because of its proven speed and accuracy at nanometer scale. It has demonstrated excellent correlation with silicon when performing full spectrum device noise analysis. The Symphony mixed signal simulation platform helps to verify the integration of digital and analog logic in Mythic’s Intelligence Processing Units (IPUs). Mythic says it has been very pleased with the intuitive use model, powerful debugging features and configuration support.

The development of electronic systems is a layered process involving a chain of steps vital for reaching success. First principles are not only being used in the last step of chip design; they were applied by Mentor as well in the development of its enabling solutions. It’s conceivable that if Mythic had wanted to apply its innovative approach and the needed supporting tools were not available, it might not have had the technical success it is enjoying today. The full announcement of Mythic’s use of Mentor’s analog and mixed signal solutions is available on the Mentor website.

February 26, 2020 (updated January 8, 2021)

Thermal Issues and Solutions for 3D ICs: Latest Updates and Future Prospect

by Mike Gianfagna on 02-26-2020 at 10:00 am
Categories: Ansys, Inc., EDA, Events

At DesignCon 2020, ANSYS held a series of sponsored presentations. I was able to attend a couple of them. These were excellent events with the material delivered by talented and high-energy speakers. The DesignCon technical program has many dimensions beyond the conference tracks. One of the presentations dealt with 3D ICs. It was presented by Professor Sung-Kyu Lim from the School of Electrical and Computer Engineering at the Georgia Institute of Technology.

The work presented by Professor Lim is funded by DARPA, Arm and ANSYS. I should also point out that Professor Lim’s student, Lingjun Zhu, contributed to this work as well. The discussion focused on thermal, IR-drop and PPA analysis of 3D ICs built with Arm A7 and A53 processors. Since 3D IC can mean many things, Professor Lim focused on bare-die stacking. He reviewed several designs using these techniques from companies such as GLOBALFOUNDRIES, Intel and TSMC.

First, a bit about the design flow used for these test cases. Professor Lim took a practical approach here, adapting commercially available 2D IC design tools to a 3D design problem. Logic/memory designs were decomposed into two tiers, one for logic and one for memory. First, the memory tier was designed, resulting in a pinout for that tier. Then a double metal stack was created. This allowed the memory tier and the logic tier to communicate through dense connections using TSVs, face-to-face pads, or monolithic inter-tier vias (MIVs). Next, the logic tier was placed and routed along with connections from the memory tier that were also represented in the logic tier.

The results of this approach were discussed for an Arm Cortex A7 design, containing L1, L2 cache and logic. All of the L2 and some of the L1 cache were placed on the memory tier and the rest of the design was implemented on the logic tier. Interconnect between the cache and logic was shortened quite a bit as a result of this approach. A similar process was applied to a Cortex A53 design. See below.

The experiments showed a smaller footprint thanks to the two-tier approach and a performance improvement thanks to the shorter routes. The flip side was that the faster operating speed led to higher power, higher IR drop and increased temperature. The results are summarized below.

Experiments were run on power savings as well. In this case an LDPC error-correction circuit was used. Thanks to shorter wire lengths and the resulting lower wire capacitance, a 39% power saving was achieved, illustrating another advantage of 3D design.
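The 39% figure comes from the measured LDPC design, but the mechanism follows from the standard first-order switching-power relation P = α·C·V²·f. The sketch below plugs in made-up capacitance and wire-length numbers (not taken from the talk) to show how shorter wires reduce switched capacitance and therefore power, and why running the same capacitance at a higher clock pushes power back up.

```python
def dynamic_power(activity, cap_f, vdd_v, freq_hz):
    """First-order switching power in watts: P = activity * C * Vdd^2 * f."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


def switched_cap(gate_cap_f, wire_cap_per_mm_f, wire_mm):
    """Total switched capacitance: gate load plus a wire load proportional to length."""
    return gate_cap_f + wire_cap_per_mm_f * wire_mm


# Hypothetical numbers: same logic and clock, shorter total wiring in 3D.
c_2d = switched_cap(1e-9, 0.2e-9, wire_mm=10.0)
c_3d = switched_cap(1e-9, 0.2e-9, wire_mm=6.0)
p_2d = dynamic_power(0.1, c_2d, 0.8, 1e9)
p_3d = dynamic_power(0.1, c_3d, 0.8, 1e9)
print(f"power saving from shorter wires: {1 - p_3d / p_2d:.1%}")

# Clocking the 3D design 20% faster raises its power by the same 20%.
p_3d_fast = dynamic_power(0.1, c_3d, 0.8, 1.2e9)
print(f"power increase at higher clock: {p_3d_fast / p_3d - 1:.1%}")
```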

Going back to the Arm designs, the heat maps below compare the 2D and 3D versions of each experiment thermally.

Professor Lim then discussed the tool flow used for these analyses. ANSYS RedHawk was used extensively to perform many tasks, including power, thermal and IR-drop analysis. All of this work was based on very fine-grained analysis of each routing segment and device across many temperature profiles. Below is an overview of the flow.

Professor Lim concluded his talk with a discussion about the impact thermal awareness could have on IC design. He proposed a temperature-aware timing closure flow that would update circuit performance based on actual temperature gradients, which can now be calculated. This approach could produce designs that are much more robust in real-world environments. Below is an overview of the proposed flow.
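A minimal sketch of the idea behind temperature-aware timing, assuming a simple linear delay-vs-temperature coefficient and a per-region temperature map such as a thermal analysis might produce: a path that meets timing at a single uniform temperature can fail once each cell is derated to its local temperature. The coefficient, delays and region names are invented for illustration and are not taken from Professor Lim's proposed flow.

```python
def derated_delay(nominal_ps, temp_c, ref_temp_c=25.0, coeff_per_c=0.001):
    """Scale a cell delay characterized at ref_temp_c to its local temperature.

    A single linear coefficient is a simplifying assumption; real libraries are
    characterized at several corners, and at advanced nodes delay can even fall
    as temperature rises (temperature inversion).
    """
    return nominal_ps * (1.0 + coeff_per_c * (temp_c - ref_temp_c))


def path_slack(cells, period_ps, temp_by_region):
    """cells: list of (instance, nominal_delay_ps, region) along one timing path."""
    arrival = sum(derated_delay(d, temp_by_region[region]) for _, d, region in cells)
    return period_ps - arrival


path = [("u1", 300.0, "core"), ("u2", 300.0, "core"), ("u3", 380.0, "cache")]
uniform = {"core": 25.0, "cache": 25.0}    # single-temperature assumption
hotspot = {"core": 105.0, "cache": 45.0}   # per-region temperatures from a thermal map
print(path_slack(path, 1000.0, uniform))   # positive slack: passes
print(path_slack(path, 1000.0, hotspot))   # negative slack: fails in the hotspot
```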

To learn more about thermal-induced reliability challenges and solutions for advanced IC designs, please check out this recent ANSYS webinar.

February 26, 2020July 18, 2025

Hybrid Verification for Deep Sequential Convergence
by Bernard Murphy on 02-26-2020 at 6:00 am
Categories: EDA, Synopsys

I’m always curious to learn what might be new in clock domain crossing (CDC) verification, having dabbled in this area in my past. It’s an arcane but important field, the sort of thing that if missed can put you out of business, but otherwise only a limited number of people want to think about it to any depth.

The core issue is metastability, which arises in systems that must intermingle multiple clock frequencies, and that describes pretty much any kind of system today. CPUs run at one frequency, interfaces to external IOs run at a whole galaxy of different frequencies, AI accelerators maybe another frequency. Clockwise, our systems are all over the map.

When data is exchanged between these different domains, metastability gremlins can emerge, random chances that individual bits can be dropped or delayed, neither quite making it through the gate to the other side nor not making it. Bitwise there are solutions to this problem, metastability hardened gates (actually registers), though these are also statistical in their ability to limit problems. They’re better than crossings that aren’t hardened, but still not perfect, because this is engineering where perfect is never possible.

Still, if you improve matters to the point that the design meets some acceptable time between failures, everything should be OK, right?
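That "acceptable time between failures" is commonly estimated with the classic first-order synchronizer formula, MTBF = exp(t_r / τ) / (T_w · f_clk · f_data), where t_r is the time allowed for metastability to resolve, τ and T_w are flop-specific constants, and f_clk and f_data are the clock and data-toggle rates. The sketch below evaluates it with made-up parameter values (real τ and T_w come from the process and cell characterization) to show why an extra flop stage helps so much, and why the answer is still a probability rather than a guarantee.

```python
import math


def synchronizer_mtbf_s(t_resolve_s, tau_s, t_window_s, f_clk_hz, f_data_hz):
    """First-order synchronizer MTBF in seconds:
    MTBF = exp(t_resolve / tau) / (T_w * f_clk * f_data)."""
    return math.exp(t_resolve_s / tau_s) / (t_window_s * f_clk_hz * f_data_hz)


SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Hypothetical parameters: 1 GHz clock, 100 MHz data toggle rate, tau = T_w = 20 ps.
# One stage leaves ~0.8 ns to resolve; a second stage adds roughly one more period.
one_stage = synchronizer_mtbf_s(0.8e-9, 20e-12, 20e-12, 1e9, 100e6)
two_stage = synchronizer_mtbf_s(1.8e-9, 20e-12, 20e-12, 1e9, 100e6)
print(f"one stage: {one_stage / SECONDS_PER_YEAR:.3g} years")
print(f"two stages: {two_stage / SECONDS_PER_YEAR:.3g} years")
```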

Afraid not. There’s a problem in CDC called convergence. You have two independent signals from one clock domain, crossing into another. Each separately passes through a metastability hardened gate. They later combine in some calculation in the new domain – maybe “are these signals equal?”. This could be multiple clock cycles later.

Now you may (again statistically) hit a new problem. Metastability hardening ensures (statistically) that a signal gets through or doesn’t get through – none of this “partly getting through”. But in doing that, what emerges on the other side is not always faithful to what went in. It might be delayed or even dropped. Or not; accurately reflecting what went in is also an option.

So when you recombine two signals that were separately gated like this, you can’t be sure they are still in sync the way they were on the other side of the gates. On the input side they might have been equal, but when they’re recombined, they’re not. Or at least not initially; maybe they become equal if you wait for a few cycles. At least as long as the inputs on the other side didn’t change in the meantime.
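A toy model makes the convergence problem concrete. In the Python sketch below (a deliberate simplification, not how any tool models it), two bits rise together on the source side, and each crossing independently takes either its nominal latency or one extra cycle. Enumerating the four outcomes shows that the recombined equality check fails for a cycle whenever the two crossings resolve differently, then recovers.

```python
from itertools import product


def recombined_trace(extra_a, extra_b, cycles=5, sync_latency=2):
    """Both source bits rise 0 -> 1 at cycle 0, so they are equal on the source side.
    Each synchronizer delivers the new value after sync_latency + extra cycles,
    where extra (0 or 1) models how metastability happened to resolve.
    Returns, per destination cycle, whether the recombined check a == b holds."""
    trace = []
    for t in range(cycles):
        a = 1 if t >= sync_latency + extra_a else 0
        b = 1 if t >= sync_latency + extra_b else 0
        trace.append(a == b)
    return trace


for extra_a, extra_b in product((0, 1), repeat=2):
    print(extra_a, extra_b, recombined_trace(extra_a, extra_b))
# When extra_a != extra_b, cycle 2 shows False: the signals briefly disagree.
```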

In VC SpyGlass we’d do a static analysis complemented by some level of formal analysis to try to catch these cases. That isn’t a bad approach as long as re-combination happens within one cycle. But who’s to say such a problem may not crop up after many cycles? Try to trace this using formal methods and you run into the usual problem – analysis explodes exponentially.

The better method, now becoming more common, is a combination of static and dynamic analysis. Use static CDC analysis to find crossings and recombination suspects, then use dynamic analysis to test these unambiguously, at least to the extent that you can cover them.

Synopsys now provides a flow for this, combining VC SpyGlass and VCS analysis. It is a refinement of a commonly used technique called a jitter injection flow, a way to simulate these random offsets by injecting random delays into the simulation whenever the data input to a gate changes.

There are some technical challenges with the standard injection method – you should watch the webinar for more detail. Synopsys say they have made improvements around these limitations. An important challenge that jumped out at me is that there is no obvious way to quantify coverage in that approach. How do you know when you’ve done enough testing?
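The sketch below illustrates the generic jitter-injection idea described above, not Synopsys's improved flow. It is a cycle-based synchronizer model that randomly adds zero or one cycle of latency whenever its input changes, two such crossings feeding an equality check, and a deliberately simple coverage notion: has every crossing been exercised with both delays? The class, the 8-cycle toggle pattern and the coverage rule are all hypothetical.

```python
import random
from collections import deque


class JitterSync:
    """Cycle-based model of a two-flop synchronizer with jitter injection.

    Each time the input changes, 0 or 1 extra cycle of latency is chosen at
    random before the new value reaches the output.
    """

    def __init__(self, rng, stages=2):
        self.rng = rng
        self.stages = stages
        self.last_in = 0
        self.out = 0
        self.pending = deque()     # (cycle the value becomes visible, value)
        self.hits = {0: 0, 1: 0}   # coverage: times each extra delay was injected

    def step(self, cycle, din):
        if din != self.last_in:
            extra = self.rng.randint(0, 1)
            self.hits[extra] += 1
            self.pending.append((cycle + self.stages + extra, din))
            self.last_in = din
        # Changes are at least one cycle apart, so visibility times never decrease.
        while self.pending and self.pending[0][0] <= cycle:
            self.out = self.pending.popleft()[1]
        return self.out


def run(seed, cycles=200):
    """Two bits that toggle together cross separately and are then compared."""
    rng = random.Random(seed)
    sync_a, sync_b = JitterSync(rng), JitterSync(rng)
    mismatches = 0
    for c in range(cycles):
        src = (c // 8) % 2                      # both source bits toggle every 8 cycles
        a, b = sync_a.step(c, src), sync_b.step(c, src)
        mismatches += int(a != b)               # convergence check in the new domain
    # Hypothetical coverage goal: every crossing saw both the short and long delay.
    covered = all(n > 0 for s in (sync_a, sync_b) for n in s.hits.values())
    return mismatches, covered


print(run(seed=1))
```

Rerunning with many seeds is what turns this into a dynamic test; a real coverage model would need to track far more than the delay choices per crossing.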

Himanshu Bhatt (Sr Mgr AE at Synopsys) explains in the webinar how they have improved on traditional jitter injection testing, how they address the coverage question, and the debug facilities they provide to trace problems back to metastability root causes. You can register to watch the webinar HERE.

February 25, 2020July 6, 2020

Webinar – FPGA Native Block Floating Point for Optimizing AI/ML Workloads
by Tom Simon on 02-25-2020 at 10:00 am
Categories: Achronix, AI, eFPGA, FPGA

Block floating point (BFP) has been around for a while but is only now starting to be seen as a very useful technique for performing machine learning operations. It’s worth pointing out up front that bfloat16 is not the same thing. BFP combines the efficiency of fixed-point operations with the dynamic range of full floating point. When examining how BFP works, I am reminded of several ‘tricks’ used for simplifying math problems. The first that came to mind was the so-called Japanese multiplication method, which finds products graphically. Another, of course, is the once popular yet now nearly forgotten slide rule.

As Mike Fitton, senior director of strategy and planning at Achronix, will explain in an upcoming webinar on using BFP in FPGAs for AI/ML workloads, BFP relies on normalized fixed-point mantissas so that a ‘block’ of numbers used in a calculation all share the same exponent value. For multiplication, only a fixed-point multiply is needed on the mantissas, plus a simple addition on the exponents. The surprising thing about BFP is that it offers much higher speed and accuracy with much lower power consumption than traditional floating-point operations. Of course, integer operations are more accurate and use slightly less power, but they lack the dynamic range of BFP. According to Mike, BFP offers a sweet spot for AI/ML workloads, and the webinar will show supporting data for his conclusions.
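To show the mechanics described above, here is a minimal Python sketch of block floating point, assuming 8-bit signed mantissas, round-to-nearest and a shared exponent chosen from the largest magnitude in the block. These choices are illustrative and may not match the formats the Speedster7t MLP actually supports. The dot product uses only integer multiply-accumulate on the mantissas and one addition of the two shared exponents.

```python
import math


def to_bfp(values, mantissa_bits=8):
    """Quantize a block of floats to signed integer mantissas sharing one exponent.

    Returns (mantissas, exponent) such that value ~= mantissa * 2**exponent.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0] * len(values), 0
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so max_abs < 2**e.
    # Choose the shared exponent so the largest value fits in mantissa_bits - 1
    # magnitude bits (one bit is kept for the sign).
    exponent = math.frexp(max_abs)[1] - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = [max(-limit, min(limit, round(v / 2.0 ** exponent))) for v in values]
    return mantissas, exponent


def bfp_dot(a, b, mantissa_bits=8):
    """Dot product with integer MACs on the mantissas and one exponent addition."""
    ma, ea = to_bfp(a, mantissa_bits)
    mb, eb = to_bfp(b, mantissa_bits)
    acc = sum(x * y for x, y in zip(ma, mb))   # fixed-point multiply-accumulate
    return acc * 2.0 ** (ea + eb)              # shared exponents simply add


a = [0.5, -1.25, 3.0, 0.1]
b = [2.0, 0.75, -0.5, 1.0]
print(to_bfp(a))                          # ([16, -40, 96, 3], -5)
print(bfp_dot(a, b))                      # -1.34375
print(sum(x * y for x, y in zip(a, b)))   # full-precision float result, about -1.3375
```

The small error comes from 0.1 being rounded to a mantissa of 3 at the block's shared scale; small values in a block with a large maximum lose the most precision.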

The requirements for AI/ML training and inference are very different from what is typically needed in DSPs for signal processing. This applies to memory access as well as to math unit implementation. Mike will discuss this in some detail and will show how the new Machine Learning Processor (MLP) unit built into the Speedster7t has native support for BFP and also supports a wide range of fully configurable integer and floating-point precisions. In effect, the MLP is ideal for traditional workloads and also excels at AI/ML, without any area penalty. Each MLP has up to 32 multipliers per MAC block.

Achronix MLPs have tightly coupled memory that facilitates AI/ML workloads. Each MLP has a local 72Kb block RAM and a 2Kb register file. The MLP’s math blocks can be configured to cascade memory and operands without using FPGA routing resources. Mike will give a full description of the math block’s features during the webinar.

The Speedster7t is also very interesting because of its high-data-rate Network on Chip (NoC), which can move data between MLPs and to other blocks or data interfaces on the chip. The NoC moves data without consuming valuable FPGA resources and avoids bottlenecks inside the FPGA fabric. It has multiple pipes that are 256 bits wide running at 2GHz, for a 512Gbps data rate per pipe. They can be used to move data from peripherals such as the 400G Ethernet directly to the GDDR6 memories without requiring any FPGA resources.
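The 512G figure is simple arithmetic: 256 bits per cycle at 2 GHz is 512 Gbps per pipe. The sketch below checks that and adds a hypothetical helper estimating how many pipes a given traffic load needs, assuming every cycle carries a full 256-bit payload and ignoring packet headers and protocol overhead, so real headroom will be smaller.

```python
import math


def pipe_bandwidth_gbps(width_bits, clock_ghz):
    """Raw bandwidth of one NoC pipe: bits per cycle times cycles per nanosecond."""
    return width_bits * clock_ghz


def pipes_needed(traffic_gbps, width_bits=256, clock_ghz=2.0):
    """Pipes needed for a given raw traffic load, ignoring protocol overhead."""
    return math.ceil(traffic_gbps / pipe_bandwidth_gbps(width_bits, clock_ghz))


print(pipe_bandwidth_gbps(256, 2.0))   # 512.0
print(pipes_needed(400))               # 1: one 400G Ethernet stream fits in one pipe
print(pipes_needed(2 * 400))           # 2
```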

Achronix will be making a compelling case that the native implementation of BFP in its architecture, which includes many groundbreaking features, is a very attractive choice for AI/ML and for a wide range of more traditional FPGA applications such as data aggregation, IO bridging, compression, encryption and network acceleration. The webinar will include real-world benchmarks and test cases that highlight the capabilities of the Speedster7t. You can register now to view the webinar replay here.

February 25, 2020March 25, 2022

Build Custom SoC Assembly Platforms
by Bernard Murphy on 02-25-2020 at 6:00 am
Categories: Defacto Technologies, EDA

I’ve talked with Defacto, Chouki Aktouf (CEO) and Bastien Gratreaux (Marketing), on and off for several years. I was in a similar line of business back at Atrenta. Now I’m just enjoying myself, and I’ve written a few blogs for them. I’ll confess I wondered why they wouldn’t struggle with the same problems we’d had: script-driven RTL editing and design restructuring are real enough problems, but ones for which a solution is needed only infrequently. Recently I had an animated discussion with Chouki, and now I believe I get it.

To explain, I need to back up a couple of steps. First, automating SoC assembly and related functions is now very common. A lot of this process is very mechanical – dropping in IPs and hooking up top level connections, easy to automate through a script and a bunch of spreadsheets. And where it isn’t purely bookkeeping, it lends itself very well to further script-driven additions – in hookup for IO, power management, interrupts, in the software interface through register and memory-map definitions.
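A minimal sketch of the mechanical part, assuming a hypothetical connection spreadsheet exported as CSV with one row per instance port: it generates a Verilog top level with wire declarations and named port connections. Module, instance and net names are made up, and port directions, bus widths and top-level I/O are deliberately left out.

```python
import csv
import io
from collections import defaultdict

# Hypothetical connection spreadsheet exported as CSV: one row per port hookup.
CONNECTIONS_CSV = """instance,module,port,net
u_cpu,cpu_core,clk,sys_clk
u_cpu,cpu_core,irq,cpu_irq
u_uart,uart,clk,sys_clk
u_uart,uart,irq,uart_irq
u_intc,irq_ctrl,src0,uart_irq
u_intc,irq_ctrl,irq_out,cpu_irq
"""


def generate_top(csv_text, top="soc_top"):
    """Emit a Verilog top module instantiating each IP with named connections."""
    ports = defaultdict(list)   # (instance, module) -> [(port, net)], in CSV order
    nets = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        ports[(row["instance"], row["module"])].append((row["port"], row["net"]))
        nets.add(row["net"])
    lines = [f"module {top};"]
    lines += [f"  wire {net};" for net in sorted(nets)]
    for (inst, module), conns in ports.items():
        hookup = ", ".join(f".{port}({net})" for port, net in conns)
        lines.append(f"  {module} {inst} ({hookup});")
    lines.append("endmodule")
    return "\n".join(lines)


print(generate_top(CONNECTIONS_CSV))
```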

IP-XACT was going to be the unifying standard behind all of this, and some organizations bought in enthusiastically – NXP, certain groups in ST and some groups in Samsung for example. Multiple IP vendors also bought in. What’s not to like about having standardized interfaces with your customers?

A lot of design houses weren’t so sure. Their in-house solutions worked fine. When it was time to upgrade, they’d work on the next generation of their solution (I had a similar discussion with Qualcomm years ago), but they weren’t comfortable with going all the way to IP-XACT. They liked the flexibility of being able to go outside the lines if they needed to. They also had a lot of legacy databases in CSV and other formats they knew how to read, which would be a hassle to manage in switching to the standard.

But they still liked IP-XACT (along with other views) as a way to get IP from vendors. In other words, they wanted it all. Standards where it suited them, backward compatibility with legacy data, and flexibility to adapt and innovate at their pace, not the pace of an industry standard.

This is not a great starting point for a canned product. It’s a much better recipe for a platform/infrastructure product: something that takes care of the mechanics of reading and writing multiple formats, from CSV to Excel, RTL and IP-XACT, and provides a centralized object model on top of which you can script to read, write or modify to your heart’s content.
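To make the platform idea concrete, the Python sketch below shows only the general pattern: a format-neutral object model, readers registered per input format, and a user script that edits the model before it is written back out. None of this reflects Defacto's actual APIs. The class, function and format names are hypothetical, and the JSON reader stands in for a vendor IP view rather than implementing the IP-XACT schema.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class Design:
    """Format-neutral object model: instance -> module, (instance, port) -> net."""
    instances: dict = field(default_factory=dict)
    connections: dict = field(default_factory=dict)

    def connect(self, inst, module, port, net):
        self.instances[inst] = module
        self.connections[(inst, port)] = net


READERS = {}  # format name -> reader function (hypothetical registry)


def reader(fmt):
    """Decorator that registers a reader for one input format."""
    def register(fn):
        READERS[fmt] = fn
        return fn
    return register


@reader("csv")
def read_csv(text, design):
    # Legacy spreadsheet export: one row per instance port.
    for row in csv.DictReader(io.StringIO(text)):
        design.connect(row["instance"], row["module"], row["port"], row["net"])


@reader("json")
def read_json(text, design):
    # Stand-in for a vendor IP view; this schema is made up, not IP-XACT.
    for inst in json.loads(text)["instances"]:
        for port, net in inst["ports"].items():
            design.connect(inst["name"], inst["module"], port, net)


def load(design, fmt, text):
    READERS[fmt](text, design)
    return design


def write_csv(design):
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["instance", "module", "port", "net"])
    for (inst, port), net in sorted(design.connections.items()):
        writer.writerow([inst, design.instances[inst], port, net])
    return out.getvalue()


design = Design()
load(design, "csv", "instance,module,port,net\nu_cpu,cpu_core,clk,sys_clk\n")
load(design, "json", json.dumps({"instances": [
    {"name": "u_dma", "module": "dma", "ports": {"clk": "sys_clk", "irq": "dma_irq"}},
]}))

# A user script touches only the model, never a file format: here it moves
# everything on sys_clk to a (hypothetical) gated clock net.
for key, net in list(design.connections.items()):
    if net == "sys_clk":
        design.connections[key] = "sys_clk_gated"

print(write_csv(design))
```

The design choice is that readers and writers are the only format-aware code; the script in the middle stays the same whichever mix of legacy and standard views fed the model.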

Who cares about this? Pretty much anyone doing SoC design. It doesn’t take a lot of effort to figure out that Apple, Google, Samsung, Qualcomm, storage guys and many others are recruiting people with IP-XACT expertise and/or talking about what they’re doing at a variety of conferences. I’m sure few, if any, of them are diving head-first into full-blown IP-XACT. I didn’t get this from Defacto; I just did a little searching.

So the big reveal – this is what Defacto is providing: an infrastructure that takes care of all the read, modify and update mechanics across all these formats through a unified, persistent data structure, letting customers build their value-add as scripts on top of APIs to the object model. They also provide a number of implementation-centric functions and checks.

Now that makes sense to me.

You can learn more about Defacto HERE.

Also Read

Another Application of Automated RTL Editing

Analysis and Signoff for Restructuring

Design Deconstruction