
Why using new DDR4 allow designing incredibly more efficient Server/Storage applications?

by Eric Esteve on 08-04-2016 at 12:00 pm

The old one-size-fits-all approach no longer works for DDR4 memory controller IP, especially when addressing enterprise segments and applications such as servers, storage and networking. For mobile or high-end consumer segments, we can easily identify two key factors: price (memory amount or controller footprint) and power consumption. The enterprise-specific requirements are just as clearly defined: the DDR4 memory sub-system has to support very large capacity, provide the highest possible bandwidth and low latency, and comply with stringent Reliability, Availability and Serviceability (RAS) requirements.

Server and storage applications are designed to compute and store large amounts of data. It has been shown that using DRAM instead of SSD or HDD to build a new generation of servers leads to 10x to 100x performance improvements (Apache SPARK, IBM DB2 with BLU, Microsoft In-Memory option, etc.), mostly thanks to the better latency and bandwidth offered by DRAM. To build these efficient database systems, you need to be able to aggregate large DDR4 DRAM capacity, and we will see the various options available, like LRDIMM, RDIMM or 3DS architectures. At the DDR4 interface level, new equalization techniques will help support higher speeds. Larger DRAM capacity multiplied by higher bandwidth is the winning recipe for higher-performance compute and storage systems.

But these advanced electronic systems have to be designed on the latest process technologies, which are more and more sensitive to perturbations such as cosmic particles, meta-stability, signal integrity problems and many more (just take a look at the picture below!). At the same time, these applications are expected to run 24 hours a day, 7 days a week, which translates into a need for the highest possible RAS characteristics.

If we review the various approaches to adding memory capacity, the first is certainly to add DDR4 channels to the CPU die. Current servers already support 4 channels per CPU, with a roadmap to 6 or 8 channels. The limits are quite obvious: available PCB area around the chip, CPU ballout and finally silicon area and beachfront.

Is it possible to extend capacity by plugging more DIMMs into the same channel? In fact, at DDR4 speeds, every wire becomes a transmission line, so adding more DIMMs creates more impedance discontinuities, which cause reflections and force the speed to be reduced. The typical maximum configuration with unbuffered DIMMs is 32 GB per channel.

It is not the best option, but you can add more DRAM ranks with Registered DIMMs (RDIMM), where the address bus is buffered on each DIMM, which requires one RDIMM memory buffer chip per DIMM. In this case a typical DDR4 system is limited to 3 slots (3 DIMMs) with 2 ranks per slot, leading to a typical maximum configuration of 96 GB per channel.

An even better option is to buffer the address AND data buses on each DIMM, since the number of ranks is limited by the load on the DQ (data) bus. This creates the Load Reduced DIMM (LRDIMM). In this case, the typical maximum capacity increases to 192 GB per channel using 3 quad-rank LRDIMMs built from 8Gb x4 DDR4 devices.
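The per-channel figures quoted for unbuffered DIMMs, RDIMMs and LRDIMMs follow from simple arithmetic. The sketch below reproduces them, assuming a 64-bit data bus per channel (ECC devices not counted) and 8 Gb devices throughout. The x8 device width and the two dual-rank DIMMs used for the unbuffered case are assumptions chosen to match the 32 GB figure; the article does not state them.

```python
def channel_capacity_gb(dimms, ranks_per_dimm, device_gbit, device_width, bus_width=64):
    """Capacity of one DDR4 channel in GB, not counting ECC devices."""
    devices_per_rank = bus_width // device_width   # devices needed to fill the data bus
    rank_gb = devices_per_rank * device_gbit / 8   # gigabits -> gigabytes
    return dimms * ranks_per_dimm * rank_gb

# Unbuffered: assumed 2 dual-rank DIMMs of 8 Gb x8 devices
udimm = channel_capacity_gb(dimms=2, ranks_per_dimm=2, device_gbit=8, device_width=8)
# RDIMM: 3 slots, 2 ranks per slot, assumed 8 Gb x4 devices
rdimm = channel_capacity_gb(dimms=3, ranks_per_dimm=2, device_gbit=8, device_width=4)
# LRDIMM: 3 quad-rank DIMMs of 8 Gb x4 devices
lrdimm = channel_capacity_gb(dimms=3, ranks_per_dimm=4, device_gbit=8, device_width=4)

print(udimm, rdimm, lrdimm)  # 32.0 96.0 192.0
```

The same function makes it easy to see why buffering matters: each step up doubles or triples the number of ranks the channel can carry, while the capacity of a single rank stays fixed by the device density and width.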

If you want to increase capacity above the LRDIMM limit, or integrate a more area-efficient memory structure, you have to add one dimension and move to DDR DRAM dies that are 3D-stacked using Through Silicon Vias (TSV). The master die, at the bottom, controls from 2 to 8 dies, so the CPU memory controller PHY only sees one load. The first benefit is obvious, as you directly gain 2x to 8x the capacity of a single die. The system also needs less PCB area and volume (you only use one package for up to 8 dies). Because inter-die loads are better than inter-rank loads, you should benefit from better timing and lower power characteristics. 3D stacking with TSV has been demonstrated in manufacturing; the remaining issue is the cost of this advanced solution. We can imagine that for high-end networking, server or storage applications, the cost issue can be solved if the performance improvement justifies higher pricing…

Integrating DDR4 for enterprise applications can be an opportunity to greatly increase DRAM capacity in the system, in such a way that DRAM could replace part of the HDD or SSD capacity, leading to the design of new systems offering 10x to 100x more performance. But we are dealing with the enterprise segment, which means that the system should not fail (Reliability), that if it does fail it can continue operating after the failure (Availability), and that it should be possible to diagnose a system failure and even maintain the system without stopping it (Serviceability). In other words, RAS considerations have to be integrated during DDR4 specification and design.

At the device (DDR4 DRAM) level, using ECC or CRC techniques is the most efficient approach, even if it’s not the only one. For basic operation, Hamming codes provide SECDED protection (Single-Error-Correct, Double-Error-Detect), but for advanced operation you have to use block-based codes for SxCyED protection (x-Error-Correct, y-Error-Detect). Implementing block-based codes for DRAM is equivalent to RAID for HDD.
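To make the SECDED idea concrete, here is a minimal sketch of an extended Hamming code: 4 data bits, 3 Hamming parity bits and 1 overall parity bit. The 4-bit word size and the function names are illustrative assumptions, not something taken from the article or from any DDR4 controller; a real ECC DIMM applies the same principle to 64 data bits with 8 check bits.

```python
def secded_encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0: overall parity, 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total number of 1s even
    return code

def secded_decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'double-error'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions holding a 1
    overall = sum(code) % 2             # 0 when the overall parity still holds
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1             # single error; syndrome 0 is the parity bit
        status = "corrected"
    else:
        return "double-error", None     # parity holds but syndrome is non-zero
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble

word = secded_encode(0b1011)
one_flip = list(word); one_flip[5] ^= 1
two_flips = list(one_flip); two_flips[2] ^= 1
print(secded_decode(word))       # ('ok', 11)
print(secded_decode(one_flip))   # ('corrected', 11)
print(secded_decode(two_flips))  # ('double-error', None)
```

The overall parity bit is what separates the two cases: an odd number of flipped bits breaks it and is treated as a correctable single error, while two flips leave it intact but still produce a non-zero syndrome.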

These ECC codes are traditionally used for data, but an error occurring on the command/address bus may not be acceptable in enterprise systems. The solution is to integrate parity detection and alert on the command/address bus of DDR4 devices or DIMMs.

Good design practices can also help increase RAS, like implementing Data Bus Inversion (DBI) in DDR4, which limits how many data bits can switch the same way at the same time. The physical result is visible: DBI limits the data eye shrinkage caused by simultaneous switching outputs (SSO) or crosstalk, offering a significant margin gain in the system timing budget.
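A minimal sketch of the DBI idea on one byte lane follows. It assumes the scheme usually described for DDR4, where a byte is inverted whenever it contains more than four 0 bits and a ninth signal tells the receiver that it was inverted. The threshold, the boolean flag standing in for that ninth signal, and the function names are assumptions made for illustration.

```python
def dbi_encode(byte):
    """Return (dq, inverted): invert the byte when it holds more than four 0 bits."""
    byte &= 0xFF
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        return (~byte) & 0xFF, True     # send the complement and flag it
    return byte, False

def dbi_decode(dq, inverted):
    """Recover the original byte from the transmitted value and the DBI flag."""
    return (~dq) & 0xFF if inverted else dq & 0xFF

print(dbi_encode(0x00))  # (255, True): eight 0s become eight 1s
print(dbi_encode(0x0F))  # (15, False): exactly four 0s, left as is
```

After encoding, no byte on the bus carries more than four 0 bits, which bounds how many drivers in the lane can pull the same way in one transfer.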

DDR4 for enterprise applications offers much more capacity and bandwidth than previous DDR generations, and its RAS capabilities have been greatly enhanced, allowing DRAM to gain ground at the expense of HDD/SSD. Much higher DRAM capacity has opened the door to higher-performance server/storage applications.

By Eric Esteve of IPNEST

Blog post: https://blogs.synopsys.com/committedtomemory/2016/06/08/breaking-down-another-memory-wall/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+synopsys%2FptCJ+%28Committed+to+Memory%29

Webinar On-Demand: https://webinar.techonline.com/1878?keycode=CAA1CC


Linley Mobile and Wearable Conference Drills into Rapidly Evolving Markets

by Tom Simon on 08-04-2016 at 7:00 am

Last week the Linley conference on mobile and wearables started with an overview and keynote address by the event’s namesake, Linley Gwennap. His talk offered a few surprises and was informative all around. As you have seen recently reported here on SemiWiki, he sees smartphone shipments continuing to rise, but with a declining growth rate. With penetration of smartphone handsets reaching 69%, growth is bound to level off. Nevertheless, an estimated 1.95 billion smartphones will have shipped by 2020.

However, even in the face of this growth there has been significant consolidation. Hopefully this is mostly over by now, but we have seen Intel, Marvell and Broadcom exit the mobile processor space, leaving just 3 big players: Qualcomm, MediaTek and Spreadtrum. At the same time a number of companies are opting to become verticals and develop their own processors: Apple, Samsung and Huawei. Samsung and Huawei are also making their own cellular modems.

While we’ve seen Qualcomm lose some of its luster over the last couple of years, they are still holding their own, owing in part to their use in high-end phones such as the Galaxy S7. The other three players are gaining on Qualcomm. Linley believes that the LTE price wars favor MediaTek. Spreadtrum is making its gains through the growth of low-end smartphones.

Most processor chip vendors in the smartphone market opt to license specific component IP such as the CPU, GPU and others for their SoCs. Some larger players can afford to develop their own IP and choose to do so, but this option is limited to the top players in the market. Linley lists MediaTek, Spreadtrum, Rockchip and Allwinner as examples of houses that use all standard IP blocks. The players that use all standard IP with the exception of one key IP element are Samsung, Apple and Huawei. In the last column we see Qualcomm and Intel incorporating most or all of their own IP in their processors.

Linley also spoke about the design issues revolving around the optimal number and size of processor cores. So-called big.LITTLE designs use 8 or more cores, with half being larger cores and the other half being smaller cores. The larger number of big cores can create a dark silicon problem because they may generate too much heat. Linley believes that a 2+4 configuration with two large cores is a good tradeoff for mainstream applications. It offers nearly double the single-thread performance of an all-small-core design. It also has the advantage of consuming much less die area. However, this configuration suffers in the market from the OEMs’ fixation on 8 as a magic number of cores. Alternatively, there are configurations that use 8 small cores. This layout runs much cooler and does well on benchmarks, but not surprisingly yields much lower single-thread performance.

The larger displays found on newer smartphones are driving the need for more GPU cores. We are seeing resolutions in the range of 4-8 Mpixel. The iPad Pro, which uses the A9X, has 12 GPU cores. Adding more GPU cores allows the main processor to run at a lower clock rate, thus reducing power consumption. Another optimization to accommodate the higher bandwidth is the addition of wide buses. The A9X uses a 128-bit DRAM bus to achieve 51 GB/s.
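The link between bus width and bandwidth is a one-line calculation: peak bandwidth is the bus width in bytes multiplied by the transfer rate. The sketch below shows it for a 128-bit bus. The 3200 MT/s transfer rate is an assumption chosen because it yields the quoted figure; the article itself does not give the memory speed.

```python
def peak_bandwidth_gb_s(bus_width_bits, transfers_per_second):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9

# 128-bit bus at an assumed 3200 MT/s
print(peak_bandwidth_gb_s(128, 3200e6))  # 51.2
# Halving the bus width at the same rate halves the peak
print(peak_bandwidth_gb_s(64, 3200e6))   # 25.6
```

This is a theoretical ceiling; sustained bandwidth in a real system is lower once refresh, bank conflicts and bus turnarounds are accounted for.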

Another significant change for GPUs is the addition of shared virtual memory in OpenCL 2.0. Data no longer needs to be copied between CPU memory and GPU memory. Cache coherency is becoming a critical asset for system performance. Linley pointed to ARM’s Mali-G71 as the first cache-coherent GPU. Both Arteris and NetSpeed have Network on Chip interconnect offerings that support cache coherency, which in turn provides the highest benefit for memory shared between GPUs and CPUs.

Not surprisingly, Linley touched on mobile security concerns during his keynote. Smartphones are a target-rich environment for hackers. They contain a lot of attractive sensitive data: contacts, passwords, financial information, etc. Solving the security challenges for smartphones calls for both a hardware and a software solution. On the hardware side, secure boot, secure storage and advanced encryption are all necessary. The software must then make full use of these features. The addition of biometric sensors can help reduce the likelihood of unauthorized access. Regardless, security will remain a very important issue in the mobile space for quite some time.

Stepping back from the handset side, Linley went on to discuss the progress in LTE. Carrier aggregation is now used across all tiers. This allows the combination of different bands to carry one data stream. In doing so, it linearly increases bandwidth. Within LTE we are seeing Category 9/10 data rates of up to 450 Mbps implemented at the high end. Later this year we can look for the first shipments of handsets with Category 16. This will bring data rates up to 1.0 Gbps – truly impressive for wireless. This is accomplished by using QAM256 and by taking advantage of unlicensed spectrum. The ability to add unlicensed spectrum means that more carriers will be able to offer gigabit data rates, even if they have limited licensed spectrum.
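The linear scaling from carrier aggregation can be illustrated with round numbers. The sketch below assumes that one 20 MHz LTE carrier delivers about 75 Mbps per spatial stream with 64-QAM, and that the rate scales with bits per symbol (6 for 64-QAM, 8 for 256-QAM). Those per-stream figures and the carrier and MIMO combinations are illustrative assumptions, not values from the keynote.

```python
BASE_MBPS_PER_STREAM = 75  # assumed: one 20 MHz carrier, one spatial stream, 64-QAM

def aggregate_mbps(carriers, bits_per_symbol=6):
    """Peak rate for a list of carriers, each given as its number of MIMO streams."""
    per_stream = BASE_MBPS_PER_STREAM * bits_per_symbol / 6
    return sum(streams * per_stream for streams in carriers)

# Three aggregated carriers, 2 streams each, 64-QAM
cat9_like = aggregate_mbps([2, 2, 2])
# Three carriers with 4, 4 and 2 streams, 256-QAM
cat16_like = aggregate_mbps([4, 4, 2], bits_per_symbol=8)

print(cat9_like, cat16_like)  # 450.0 1000.0
```

Under these assumptions each extra carrier or spatial stream adds a fixed increment, which is why access to additional spectrum, licensed or unlicensed, translates so directly into headline data rates.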

So where are we on 5G? Well, it’s on its way, and in some ways sooner than expected. Despite the formal plans to roll out 5G in 2020, some carriers like Verizon are looking at a selective launch of 5G as early as 2017. With 5G will come the use of new bands for higher data rates, increased carrier aggregation, and multiple simultaneous connections over multiple signals – such as small cell and macro cell and even WiFi. Verizon will probably pick a subset of the 5G technologies and leverage new signal bands to boost data rates. But with their “5G” roll out will come confusion about what 5G really is, similar to what happened when 4G rolled out early with features not compliant with the formal specification.

The second half of Linley’s talk covered wearables. Space does not permit going into that in this article. Overall the conference was very informative. The sessions in the two-day conference delved much deeper into key issues such as security, on-chip networks, hardware architecture, etc. For more information on upcoming Linley conferences follow this link.


One transistor for the future of mmWave?

by Don Dingee on 08-03-2016 at 4:00 pm

We’ve heard recently from several sources that millimeter wave radios, once the exclusive realm of defense and satellite use, are now finding homes in applications such as automotive radar and 5G networks. Therein lies a significant opportunity for digital design: moving frequency conversion and filtering from the analog domain into the digital domain.


At What Point Does Transistor Gate Length Stop Getting Smaller?

by Daniel Payne on 08-03-2016 at 12:00 pm

When I started doing IC design back in 1978 we had 6,000 nm channel gate lengths, and today you can buy a smart phone with 16 nm or 14 nm technology, although the gate lengths in those phones are more like 34 nm. The International Technology Roadmap for Semiconductors (ITRS) makes predictions about emerging trends in our industry and they just released a chart showing transistor gate length stopping its typical shrinking trend in the year 2021:

Illustration: Erik Vrielink
Source: IEEE Spectrum

Notice how, in just the two years from 2013 to 2015, the ITRS became more pessimistic about the economics of ever-shrinking transistor gate lengths. Does this mean that it’s impossible to build transistors with gate lengths shorter than 10 nm? No, but it costs so much that you have to question why you would do it.

If it simply costs too much money to get smaller than 10 nm channel length, then what are semiconductor manufacturers going to be doing? There are lots of ideas, like adding 3D fabrication to add more transistor density or even changing the transistor orientation to vertical. Groups like the Semiconductor Industry Association (SIA) will collaborate with the Semiconductor Research Corporation (SRC) to list research priorities that could be used by industry or government programs. There’s even an IEEE initiative called Rebooting Computing that could provide some direction for how semiconductor technology can continue to add value.

The semiconductor roadmap from ITRS started back in 1998 and it really helped the equipment manufacturers focus on achieving milestones for the industry. Our industry had some 19 companies developing leading-edge fabs in 2001, however today we only have the big four: Intel, TSMC, Samsung and GLOBALFOUNDRIES. You won’t find these four competitors sharing much together about their detailed technology challenges and directions, but these companies do drive their equipment and material suppliers.

NAND Flash chips are a leading user of 3D structures as a means to increased density, and Samsung announced a 256 Gb 3D NAND flash memory in April 2016 that uses 48 memory layers.

FinFET transistors have been used for several years now, starting with Intel’s 22 nm TriGate devices, where the transistor gate wraps around three sides of a horizontal, fin-shaped channel to control the current. The ITRS roadmap predicts that a different type of transistor, a lateral gate-all-around device, will surpass the FinFET. Beyond the lateral device, the report predicts vertical transistors with pillars or nanowires that stand up on end. Even the silicon used in the channel region will give way to other materials, such as silicon germanium or compounds of column III and V elements.

Smaller transistor sizes have not always been accompanied by faster chip performance in the same percentage expected because the wires used to connect the transistors have now become the dominant factor in determining speed. The IEEE even has their own roadmap called the International Roadmap for Devices and Systems (IRDS).

There’s an October event, the 1st International Reboot Computing Conference, and IRDS will be having a meeting to continue their roadmap efforts.

I’ve witnessed first-hand in my lifetime the transition from Bipolar to NMOS, NMOS to CMOS, and planar CMOS to FinFET, so I look forward to the continuing saga of semiconductor creativity that battles to extend Moore’s Law to 10 nm gate lengths.


A Credible Player at the Power Table

by Bernard Murphy on 08-03-2016 at 7:00 am

For a while it seemed like Mentor lived on the margins of the (RTL) design-for-power game. They had interesting micro-architectural optimization capabilities through their Calypto heritage but no real industry chops in power estimation, a must-have when you are claiming to reduce power. Better known offerings in RTL power estimation have dominated the landscape: PowerArtist, SpyGlass Power and more recently Joules.

(A quick sidebar to head off complaints from the emulation guys – there are excellent solutions for power estimation built around emulation. But that’s not what you need when you’re tweaking microarchitecture on a block; block designers most commonly use simulation to generate activity files.)

It now looks like Mentor has stepped up to the big table with their PowerPro solution. They have power estimation that looks comparable to the Cadence and Synopsys solutions (with one caveat) and they have further strengthened their tools for finding opportunities for optimization and (if you choose) implementing and verifying those changes automatically.

Accuracy in estimation is important for two reasons. First, you only want to implement proposed changes that give a big saving, so you want to be sure the tool is in the ballpark when ranking candidates. Second, you don’t want your picks to wind up increasing power because implementation realities weren’t considered.

A large part of the art of getting this (reasonably) right is in acceptably modeling how the design will be physically synthesized. You want to do realistic tech mapping, ensure that datapath elements will map to efficient design macros, you want to synthesize a reasonable clock tree with reasonable buffering, you want to guide from UPF (or estimate) reasonable Vth distributions and you absolutely must fold in realistic physical data – wire lengths, clock tree topology and so on.


PowerPro seems to be covering most of these bases, at least to the extent Mentor are willing to share details. They use SPEF to import physical information, a realistic choice given the range of possible physical platforms. The downside is that this lacks tight coupling with physical design which might allow for further optimization. But there’s a potential upside for IP developers – you don’t want to tie optimization choices to just one implementation. Comparison between a few different implementations could be a good way to guide optimization for more general use.

In any case, tool vendors need to be careful not to over-engineer RTL power estimation. Mentor claims ~15% accuracy to gate-level based on their own correlation studies. This is exactly what everyone else claims. You may have read my comments elsewhere that these things are in a distribution. Perhaps 1σ is at 15%, but all solutions show outliers and it’s not clear any vendor has cracked making the distribution any tighter, so PowerPro seems to be right in the middle of the pack.

Next we get to an area where PowerPro may lead the pack – micro-architectural power optimization. This always starts by assuming first-order clock or memory enable gating constraints (or logic) has been correctly defined by the designer. Tools then look for what additional gating could be inserted to save power. These methods are formal, looking upstream and downstream of first-order gates for what additional constraints (or strengthening of existing constraints) those constraints logically imply. These cases can add up. I have heard of potential for up to 15% further saving, a large chunk of that coming especially from memory-gating.


Through Calypto functionality, Mentor is able to look many clock cycles forward and backwards in logic to find opportunities. Since looking this deep could generate an overwhelming number of suggested improvements, PowerPro does a cost-benefit tradeoff analysis, comparing power, area and timing to filter out just those suggestions that look optimal. A number of the power saving techniques are quite familiar – looking for redundant reads and writes on a memory for example. Some techniques sounded innovative (at least to me):

  • Redundant reset checks, for flops in a design which are not directly observable (like many low-power suggestions, making this fix may not be a good idea for other reasons, but in some cases it might be)
  • PowerPro is able to find logic for memory enable and light sleep pins, with no need for labeling on your part
  • PowerPro will look for sequential data gating opportunities, e.g. gating an expensive operation like a multiply in cases when the output is a don’t-care

All changes are formally-verified to be functionally equivalent, using Calypto’s well-proven SLEC.

There’s a nice interface for selecting fixes you want to implement, also to explore other power reduction options (reducing voltage in some areas for example), supported by macros that let you explore pre-canned options like DVFS.


PowerPro supports (user-controlled) automatic implementation of selected fixes in the RTL. A few years ago, I remember auto-fixing being looked on with deep disdain by most RTL designers (no tool is going to mess with my RTL). But it seems that’s changing; Mentor said they now see a 50-50 split. I guess that was going to come at some point; I’m just surprised how quickly the user-base is switching.

You might wonder why I opened this blog with a picture of cupcakes. Mentor served these at the lunch and learn. I was told they are very healthy, gluten-free and they were certainly attractive and tasty. Not so sure about the healthy part.

Overall, PowerPro looks like a pretty complete RTL estimation and optimization solution. You can learn more about the product HERE.



efabless: Think GitHub for ICs and IP

by Daniel Nenni on 08-02-2016 at 4:00 pm

For those of you who don’t know, GitHub is the crowdsourcing version of the de facto industry-standard Git source code management software. Currently, more than 14 million people have deposited more than 35 million software projects (mostly open-source) on GitHub, making it the largest host of source code in the world.

Now think semiconductors. Imagine what could be done with an open crowdsourcing platform that dramatically reduces the cost and administrative barriers of semiconductor design and manufacturing. Sounds disruptive, right? Given the flat nature of the semiconductor industry I think disruption is a very good thing.

Do you remember how the fabless semiconductor transformation started 30 years ago? A pure-play foundry (TSMC) dramatically reduced the cost and administrative barriers of semiconductor manufacturing. Disruption is what made semiconductors the foundation of modern day life and disruption is what we need to maintain the cycle of semiconductor innovation that got us to where we are today, absolutely.

efabless corporation is the world’s first crowdsourcing platform for semiconductors. We harness the creativity of the community and dramatically reduce the cost and administrative barriers that have inhibited semiconductor innovation. In so doing, we create significant new markets for semiconductors and enable system companies to build better products and create new applications.

Another intriguing part of efabless is the people behind the company. The first name that stands out is Lucio Lanza. Lucio is the 2014 Phil Kaufman Award winner (recognizing excellence and vision in EDA) after spending his entire career in electronics. He started with Olivetti and Intel (with Phil Kaufman), then moved to EDA with Daisy Systems and Cadence, then IP with Artisan Components (ARM). Paul McLellan did a nice interview with Lucio HERE which is a must read for semiconductor professionals old and new.

The second intriguing name is Michael Wishart, efabless Chairman and CEO. Michael retired from Goldman Sachs after thirty years covering the technology industry as an investment banker. Michael is currently on the board of Cypress Semiconductor and before that he was on the Spansion board. The first question I asked Mike was why he is coming out of retirement. I already knew the answer but I wanted to hear it in his words. The answer of course is “disruption” and his rationale matched up perfectly with mine.

The second question I asked was how they are going to monetize efabless. I was happy to hear it is a success-based revenue sharing business model, similar to how Artisan Components disrupted the semiconductor IP industry with their “Free IP Business Model” in 1998. As a competitor to Artisan at the time I can tell you this was a VERY disruptive move that transformed the fabless semiconductor ecosystem into what it is today, a force of nature.

Bottom line:
efabless provides community members with a robust design flow that they access without cost or NDA. efabless obfuscates from designers the underlying technology of foundries and thereby facilitates community access to foundry process technologies, again without the requirement of NDAs. The marketplace and community are inherently collaborative, with proprietary and open IP and ICs that can be forked, customized, or improved by other community members to solve interesting problems and open new opportunities. Again, think GitHub for ICs and IP. You can join the efabless community HERE.


SEMICON West – Harry Levinson and Mike Lercel Interview

by Scotten Jones on 08-02-2016 at 12:00 pm

On Tuesday morning at SEMICON I had the opportunity to sit down with Harry Levinson, Sr. Director of Technology Research and Sr. Fellow at GLOBALFOUNDRIES, and Michael Lercel, Director of Strategic Marketing at ASML, to discuss the state of lithography.

I opened the discussion with a question about how we are going to address lithography from 10nm down to 5nm.

Mike Lercel – two specific directions, control and edge placement, the second theme is how multiple patterning introduces sources of variability. Process simplicity with EUV is beneficial at both 7nm and 5nm. A lot of argon fluoride immersion multiple patterning will stay and EUV will be used for the most challenging layers.

Harry Levinson – from a chip maker’s perspective there has been a lot of concern about EUV maturity for 7nm. They are looking at 7nm as a node that can be done with optical.

I asked a question about Line Edge Roughness (LER) and how much it matters for cut masks. Harry noted that even for cuts you do care about LER, for contacts and vias you do care about regularity. Via or contact on line-end is one of the most critical applications so LER does matter. Mike noted that LER affects where the line ends.

Harry noted that the good news is that if we introduce EUV at 7nm we aren’t pushing it too hard. People are working on understanding LER. Shot noise is at the top of the list and you can’t do much about it other than increase the dose. Photoresist also contributes to LER and you need to control it at a molecular level; even the building blocks of the polymer are important, so we need smaller building blocks. Mike – some of the novel materials look interesting because metal and some of the others are different than what we have today.

I asked about smoothing to address LER. Mike said it is spatial frequency dependent, the high frequencies can be smoothed better than the low frequencies. Harry, there is definite potential. Smoothing contact holes is harder than line/space and pesky line ends.

Harry said the 7nm node could be done optically and depending on customer demand could be introduced early with optical and then EUV could come next in 2018.

Mike said ASML systems in the field are at 125 watts and about 85 wafers per hour (wph); ASML’s target is 125 wph at 250 watts. Harry noted that the throughput is based on ASML assumptions and manufacturers have different requirements in terms of fields, dose, etc. Harry went on to say they are struggling to get EUV equivalent to immersion triple patterning on cost and that 5nm will likely be defined by how far you can push EUV and still have single patterning. Mike, a true shrink that requires 6 immersion layers versus 3 EUV layers is kind of a cost wash.

Harry, a 2.5nm overlay budget is really hard because you are dealing with angstroms. Mike went on to say that is why you need really good mix and match of EUV to immersion.

Harry said EUV at 7nm would really help because you could learn before you have to really push it.

Mike noted that there are eight 3300s out in the field running and generating a lot of cycles of learning. 445,000 wafers have been processed through the tools. Without EUV we could be looking at 100 mask layers in a logic technology, slowing down cycles of learning, design verification and manufacturing cycle time. (Author’s note: I commented that this really struck me at the Advanced Lithography Conference this year, that for the first time there are multiple EUV systems around the world running wafers in volume and that is what you need for learning.)

Harry said we will see contacts and vias done first, then metal blocks. Mask defects are still a problem but contact/vias have a lot of space to cover the defects (dark field with small open area). He is concerned about metal masks and defects (light field with high open area). It would be very desirable if line/spaces could be EUV. At N7 metal is 3 masks but he would like to do a single EUV mask. Mike also pointed out that at N5 you could be looking at a grating and more than 2 block masks.

Harry said that for 7nm contacts/vias productivity is still the main issue; they need 250 watts robust in the field. I asked him whether, if he had a 250-watt high-uptime tool today, he could do 7nm contacts/vias and he said yes. For contacts you have local critical dimension uniformity (LCDU), the contact version of LER, and they can hit the specs with high dose; the key is how far you can back off the dose without hitting yield. With respect to mask defects, the ITRS specs were based on planar gates; today there is no rigorous metric, but to get to metal layers you need lower mask blank defects. Mike agreed: with contacts, dark field can cover defects, but metal is light field and they can’t cover the defects.

Harry, once you have the power you need to see if there are mask and wafer heating issues. Mike, Samsung saw a mask blister at 40,000 wafers, it wasn’t that long ago that immersion hazing occurred on masks at 35,000 wafers.

I asked about pellicles and Mike said no new announcements today (Mike and Harry were both scheduled to present in a session after our interview). They have run 200 wafers at a customer tool and they continue to run it at a 40-watt level. They still need to improve the pellicle. Harry jumped in to say the pellicles don’t have the transmission we are used to plus may need a filter; we will lose at least 20% of the light.

In closing, Harry said that the front end of line (FEOL) has lower density, so mask blank defectivity is less of an issue. EUV could enable multiple gate lengths, and Mike noted that it could also eliminate multiple cut masks.


1-T SRAMs in high-density, portable applications

by Farzad Zarrinfar on 08-02-2016 at 7:00 am

For SoCs designed for applications such as mobile, automotive, wearable computing, gaming, virtual reality, PC, imaging, security, and IoT, it is incredibly important to keep area (cost) and power as low as possible. Considering the growing percentage of chip area used for memory, it makes sense to choose the optimum memory IP for each application. Among the memory IPs targeted at high-density consumer applications is the single-transistor dynamic random-access memory from Mentor Graphics (Novelics) called coolSRAM-1T.

You can read all about the coolSRAM-1T in Fundamentals of coolSRAM-1T Memory.

The one-transistor (1T) bit cell offers up to 50% reduction in core area for a given bit capacity compared to the more widely-used six-transistor (6T) bit cell. When your focus is on density over speed, the 1T architecture is an ideal choice. Figure 1 illustrates the density relationship between two embedded memory IP architectures.
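As a rough illustration of the density claim, the arithmetic can be sketched as below. The function name and the example areas are hypothetical; the only figure taken from the text is the "up to 50%" core-area reduction, which is a best case, so the reduction is left as a parameter.

```python
def one_t_core_area(six_t_core_area_mm2, reduction=0.50):
    """Estimate the 1T core area for the same bit capacity as a 6T core.

    reduction is the fraction of core area saved. The default of 0.50 is
    the best case quoted in the article ("up to 50%"); real instances may
    save less. This covers core area only, not peripheral circuits.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return six_t_core_area_mm2 * (1.0 - reduction)


# Hypothetical example: a 6T core of 2.0 mm^2 at the best-case reduction,
# then at a more conservative 25% reduction.
best_case = one_t_core_area(2.0)
conservative = one_t_core_area(2.0, reduction=0.25)
print(best_case, conservative)
```

The sketch deliberately ignores sense amplifiers, write-back logic and level shifters, whose area is not quantified in the article.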

But what about static power for always-on SoCs? A 6T SRAM uses an active driver to maintain data, so leakage power can be a concern in advanced process nodes. The coolSRAM-1T uses passive storage structures optimized for low leakage. To minimize subthreshold and junction leakage, the Mentor Graphics coolSRAM-1T dynamic memory cell uses the thick-oxide or input/output (I/O) transistor option available in all advanced process nodes. The coolSRAM-1T is a nearly seamless replacement for an existing SRAM-6T when lower leakage and smaller chip area are needed. It is also cost-effective, since it can be implemented in a bulk-CMOS process with no additional mask steps.

The peripheral circuits for the coolSRAM-1T include (1) the sense amplifier and (2) the write-back circuit that restores the charge into the cell after a destructive read. To boost the signal for a given cell capacitor area, we operate the cell array at the I/O voltage, which results in a larger signal for the sense amplifier and improves performance. The interface to the system is at the VDD (core) voltage. The signals must be level-shifted from one voltage domain to the other as they travel from the memory interface to the cell array, and we offer three approaches for doing that, each with different tradeoffs depending on your needs.
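To make the destructive read and write-back idea concrete, here is a minimal behavioral sketch. It is a toy model, not the coolSRAM-1T circuit: the class, the normalized charge levels and the sensing threshold are all assumptions made for illustration.

```python
class OneTCell:
    """Toy model of a capacitor-based single-transistor bit cell (hypothetical)."""

    THRESHOLD = 0.5  # assumed sensing threshold on a normalized 0..1 charge scale

    def __init__(self):
        self.charge = 0.0

    def write(self, bit):
        # Writing drives the storage capacitor to a full 1 or a full 0.
        self.charge = 1.0 if bit else 0.0

    def raw_read(self):
        # The sense amplifier resolves the stored level...
        bit = 1 if self.charge > self.THRESHOLD else 0
        # ...but sharing charge with the bit line leaves the cell at a
        # mid level that no longer carries a usable signal.
        self.charge = self.THRESHOLD
        return bit


def read_with_write_back(cell):
    """Read the cell, then restore the sensed value so the data survives."""
    bit = cell.raw_read()
    cell.write(bit)  # write-back step
    return bit


# Without write-back, the second read no longer sees the stored 1.
cell = OneTCell()
cell.write(1)
print(cell.raw_read(), cell.raw_read())

# With write-back, repeated reads return the same value.
cell.write(1)
print(read_with_write_back(cell), read_with_write_back(cell))
```

The point of the sketch is only the ordering: sense first, then restore, which is why the write-back circuit sits alongside the sense amplifier in the periphery.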

The coolSRAM-1T is integrated into the Mentor Graphics MemQuest compiler, the web-based tool suite that lets you specify and implement custom memories. Compiled instances in the 160nm 1.8V/3.3V, 130nm 1.5V/3.3V, and 110nm 1.2V/3.3V technology nodes have been incorporated into customer products and are in volume production; the IP is also silicon proven in 65nm technology. The IP license comes with documentation of the test flow, which includes three major steps:

  1. Internally stress the instance to uncover any over-stress defects that could become failures in time.
  2. Run the SRAM-style BIST (built-in self-test) algorithm to check for defects or failing peripheral circuits.
  3. Verify cell retention at higher temperature operation.
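The article does not say which BIST algorithm is used in the second step. As a hedged illustration of what an SRAM-style march test looks like, the sketch below runs MATS+, a well-known march algorithm chosen here only as an example, against a hypothetical memory model with optional stuck-at faults. The class and function names are assumptions, not part of the Mentor Graphics test flow.

```python
class FaultyMemory:
    """Hypothetical bit-addressable memory with optional stuck-at faults."""

    def __init__(self, size, stuck_at=None):
        self.bits = [0] * size
        self.stuck_at = dict(stuck_at or {})  # address -> forced value

    def write(self, addr, value):
        # A stuck-at cell ignores the written value.
        self.bits[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.stuck_at.get(addr, self.bits[addr])


def mats_plus(mem, size):
    """Run MATS+ {any(w0); up(r0, w1); down(r1, w0)}; return failing addresses."""
    failures = set()
    # Element 1: write 0 to every address, in any order.
    for addr in range(size):
        mem.write(addr, 0)
    # Element 2: ascending addresses, read 0 then write 1.
    for addr in range(size):
        if mem.read(addr) != 0:
            failures.add(addr)
        mem.write(addr, 1)
    # Element 3: descending addresses, read 1 then write 0.
    for addr in reversed(range(size)):
        if mem.read(addr) != 1:
            failures.add(addr)
        mem.write(addr, 0)
    return sorted(failures)


# A fault-free memory passes; a cell stuck at 1 is reported by address.
print(mats_plus(FaultyMemory(8), 8))
print(mats_plus(FaultyMemory(8, stuck_at={3: 1}), 8))
```

The stress step and the high-temperature retention check depend on the instance and the tester, so they are not modeled here.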

The coolSRAM-1T embedded memory IP is the only silicon-proven single-transistor SRAM IP that can be implemented in bulk CMOS. For high-density consumer applications, it can lower the overall system cost and static power consumption by reducing the area and the number of external components.

Learn more about Mentor Graphics coolSRAM-1T and the trade-offs between using the coolSRAM-1T and the coolSRAM-6T in this free Mentor whitepaper Fundamentals of coolSRAM-1T Memory.


Filling out the rest of the mobile device
by Don Dingee on 08-01-2016 at 4:00 pm

We spend an inordinate amount of energy tracking the big chip – the application processor – in a mobile device. As we've seen, this space is coming down to a handful of players. A more interesting competition is heating up around the APU for the rest of the chips needed to make a phone.


Foundry Technology Packaging Solutions
by Tom Dillinger on 08-01-2016 at 12:00 pm

A significant shift is underway in the fabless semiconductor business model. As the application markets have become more diverse (and more cost-sensitive), product requirements have necessitated a new focus on multi-die packaging technology.