As we move down the process node treadmill, new challenges appear that we didn’t really have to worry about before. These challenges often need to be addressed at several levels: the process, the cell libraries, the design, and the EDA tools that we use.
One well-known example is the problem of metal migration (electromigration). If the current through a metal wire is too high for its width, the current actually moves the metal atoms, creating a narrower neck. The narrower neck raises the current density further, which is a positive feedback that makes the problem worse. Eventually the metal opens completely and the chip fails. We address this at many levels. At the process level we design the metal to be able to carry a high current (except in DRAMs, where we do everything we can to keep the metal cost down). At the design level we need to make sure that we do current analysis. We need EDA tools to perform the analysis and allow us to address hot spots. Each process node typically makes the problem worse. For example, did you know that at 20nm a large buffer is no longer in spec if it drives minimum-width metal?
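To make the design-level check concrete, here is a minimal sketch of the kind of current density screen an analysis tool performs. Everything in it is an assumption for illustration: the function names, the wire dimensions, the currents and the limit `J_MAX` are made up, and real limits come from the foundry's design rules (and depend on temperature, layer and wire length).

```python
def current_density_ma_per_um2(current_ma, width_um, thickness_um):
    """Average current density through a rectangular wire cross-section."""
    return current_ma / (width_um * thickness_um)


def em_violations(wires, j_max_ma_per_um2):
    """Return the names of wires whose current density exceeds the limit.

    `wires` maps a wire name to (current in mA, width in um, thickness in um).
    """
    return [
        name
        for name, (current_ma, width_um, thickness_um) in wires.items()
        if current_density_ma_per_um2(current_ma, width_um, thickness_um)
        > j_max_ma_per_um2
    ]


# Hypothetical numbers only; they are not taken from any real process.
J_MAX = 10.0  # mA/um^2, assumed limit
wires = {
    "buf_out_min_width": (0.05, 0.032, 0.06),  # same current, minimum width
    "buf_out_widened": (0.05, 0.096, 0.06),    # same current, 3x the width
}

print(em_violations(wires, J_MAX))  # -> ['buf_out_min_width']
```

With these assumed numbers the same buffer current is out of spec on the minimum-width wire and back in spec once the wire is widened, which is the sort of fix a hot-spot report leads to.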
Another problem like this that is starting to become a real issue is soft errors, where radiation causes single event effects (SEE). SEE cause unpredictable system behavior and threaten safety and reliability. It is no surprise that this threat is increasing with smaller geometries. SEE are generally caused either by alpha particles emitted when trace radioactive impurities in packaging materials decay, or by particles (mostly neutrons) created when cosmic rays hit the atmosphere.
Like the metal migration issue, the problem needs to be addressed at multiple levels. The materials used in manufacture need to be analyzed, not just in the fab but also the packaging material, bumps and solder. But we live on a radioactive planet that is bombarded with cosmic rays, so even with the best materials there is still a risk of SEE. How big a risk depends on the design of the cells (flops and memories that can be flipped into the wrong state) and on the layout of the design itself.
Just as we can accelerate metal migration by raising the temperature, we can analyze a product by putting it in a more radioactive environment. However, while that is great for in-depth reliability analysis, it is pretty useless for a real design, where we need tools to analyze the problem before tapeout and manufacture, when we can still do something about it.
IROC Technologies is the leader in this space. They do everything from working with foundries such as TSMC and GlobalFoundries to analyze the whole manufacturing process, to working with fabless companies such as Qualcomm, Broadcom, Cisco, Rambus and Xilinx to help them determine whether or not they have problems and how to address them. Intel and IBM have in-house groups to do all this, but the fabless ecosystem relies pretty much on IROC for expertise in this area.
IROC can do the radiation testing and alpha particle counting. On the tool side, they have two products:
- TFIT, a simulation tool that quickly and accurately predicts the failure rate (FIT) of cells designed for a specific foundry’s technology.
- SOCFIT, a tool that quickly and accurately predicts the failure rate (FIT) and various derating factors of ASICs and SoCs, using either RTL or a gate-level netlist.
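To show how cell-level FIT and derating factors combine into a chip-level number, here is a rough sketch of the arithmetic. It is not how TFIT or SOCFIT work internally. It only uses the standard definition of FIT (one failure per billion device-hours) and treats a derating factor as the fraction of upsets that actually turn into a visible failure. The class, the function names and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CellGroup:
    name: str
    count: int           # number of flops, or number of bits for a memory
    raw_fit_each: float  # raw FIT per flop or per bit (assumed value)
    derating: float      # fraction of upsets that become failures, 0..1


def effective_fit(groups):
    """Sum the derated FIT contribution of every group of cells."""
    return sum(g.count * g.raw_fit_each * g.derating for g in groups)


def mtbf_years(fit):
    """Mean time between failures in years; 1 FIT = 1 failure per 1e9 hours."""
    hours = 1e9 / fit
    return hours / (24 * 365)


# Hypothetical SoC: the values below are illustrative, not measured data.
groups = [
    CellGroup("flops", count=1_000_000, raw_fit_each=1e-4, derating=0.10),
    CellGroup("sram_with_ecc", count=8_000_000, raw_fit_each=1e-4, derating=0.01),
]

total = effective_fit(groups)
print(f"effective FIT: {total:.1f}")
print(f"MTBF: {mtbf_years(total):.0f} years per chip")
```

With these made-up inputs the flops contribute about 10 FIT and the ECC-protected memory about 8 FIT, roughly 18 FIT in total. The useful part is the shape of the calculation: a large memory with good protection can matter less than a much smaller population of unprotected flops.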
If you are a designer, especially if your designs go into products that require high reliability (medical, automotive, internet infrastructure, etc.), then you need to start to worry about the possibility of SEE. And memory cells are now so small that a single particle can affect more than one bit, so it is critical to understand how the adjacency of different bits interacts with whatever ECC is being used to correct errors. The end customers (automotive companies, cloud infrastructure companies, router and base-station companies, etc.) will start to have specifications for SEE reliability, which will then get driven down into the supply chain.
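The interaction between bit adjacency and ECC can be illustrated with a toy model. The sketch below assumes a single memory row in which physical column `c` holds a bit of logical word `c % interleave`, and an ECC that can correct one flipped bit per word (as a SEC-DED code does). The function names and the layout rule are assumptions for illustration; real memories use a variety of interleaving schemes.

```python
from collections import Counter


def flips_per_word(start_col, upset_width, interleave):
    """Count flipped bits per logical word for a run of adjacent upset cells.

    Assumes physical column c belongs to logical word (c % interleave).
    """
    cols = range(start_col, start_col + upset_width)
    return Counter(col % interleave for col in cols)


def correctable_by_sec(start_col, upset_width, interleave):
    """True if no logical word receives more than one flipped bit."""
    flips = flips_per_word(start_col, upset_width, interleave)
    return max(flips.values(), default=0) <= 1


# No interleaving: adjacent cells belong to the same word.
print(correctable_by_sec(10, 2, interleave=1))  # -> False

# 4-way interleaving: a 3-cell upset lands in three different words.
print(correctable_by_sec(10, 3, interleave=4))  # -> True

# An upset wider than the interleave factor still defeats the ECC.
print(correctable_by_sec(10, 5, interleave=4))  # -> False
```

In this model the ECC copes with a multi-cell upset only while the upset is no wider than the interleave factor, which is why the physical placement of bits matters as much as the choice of code.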
IROC Technologies website is here.