I will address this topic in two blog posts: validation (i.e., post-silicon) in Part 1, and verification (pre-silicon) in Part 2 (coming soon!). This post focuses on validation.
One of the upsides of using catalog chips that have been on the market for a long time and have ramped in substantial volumes is that other system companies have already found a lot of the bugs. The chip supplier has had an opportunity to fix them, screen or calibrate defective parts at automated test equipment (ATE), withdraw the chip from the market, or at least warn new users about the bugs in an errata sheet. Your system may be different, and you may still get bitten by a bug that is exposed by your unique operating conditions. But in general, catalog parts that have ramped in volume for some time provide a certain herd immunity to the system companies that use them.
Unfortunately, in larger volumes catalog chips are also going to cost you significantly more than a custom silicon chip, the footprint will be significantly bigger once you count all the components needed, and the system designers will be left with a less flexible set of options. But when you go the custom silicon route, YOU are the first and possibly the ONLY user of this chip. So how do you prevent silicon bugs from getting into your system?
First, let’s talk about the risks:
- Peanut butter engineering at your chip supplier.
This refers to the reality that your chip supplier is in the business of making as many chips as it can, in as short a time as it can, with a fixed amount of resources. A strong engineering culture at your supplier is a mitigation in general. But given the commercial pressures chip companies are under, they need to deliver revenue, and the pressure from management is generally to produce more with the same engineering resources: to spread the peanut butter, so to speak, over all the chips they are working on. So how does peanut butter engineering manifest itself in real life during validation?
Here are some ways:
- Very liberal (i.e. watered down) interpretations of JEDEC standards.
- Over-leveraging previous chip data. Examples include qualifying by similarity (QBS) more than they should; using old chip data to reduce how much is validated on a new chip on the grounds that it’s “the same circuit”, even though the layout is different and it’s not exactly the same circuit or surrounded by the same circuits and noise; bench validation on too small a sample; and no ATE testing of corner material.
- ECOs without a root cause: shotgun engineering and opening up spec limits to get the silicon out the door without knowing why something is not behaving as expected according to worst-case simulations.
- Automated test equipment (ATE) program changes over the lifetime of the product.
The J-STD-46 standard defines some of the reasons why a chip supplier must inform customers that a major change has been made. It applies to customers who have purchased components in the previous 2 years, and to customers with whom the supplier has contractual obligations. Annex A of J-STD-46, under datasheet changes, lists the “Elimination of final electrical measurement or burn-in (if specifically stated in the datasheet as being performed)” as an example of a major change that requires a PCN (product or process change notice) to be issued. However, most datasheets that I’ve seen don’t explicitly state what is ATE tested and what isn’t. So, absent some other agreement with the system company, the supplier will not issue a PCN for a test program change. In my experience, suppliers remove tests over the lifetime of the chip without issuing a PCN, in response to market pressure to reduce the price of the product while trying to maintain profit margins. They lower costs by removing tests based on “historical data” for that chip. You may then get a lot that is very different from the historical data, and it can cause a problem that goes undetected at ATE because the tests have been removed.
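If the supplier agrees to share the list of test names in each ATE program revision, a system company could track removals itself. The sketch below is only an illustration under that assumption: the test names, the `removed_tests` helper, and the idea of a "protected" list of system-critical tests are all hypothetical, not something a supplier provides by default.

```python
def removed_tests(old_program, new_program, protected):
    """Compare two ATE program revisions (lists of test names).

    Returns (all tests removed in the new revision,
             the removed tests that are on the protected list).
    """
    removed = set(old_program) - set(new_program)
    return sorted(removed), sorted(removed & set(protected))


# Hypothetical test names for two revisions of a test program.
rev_a = ["vout_accuracy", "iq_sleep", "eoc_current", "osc_freq"]
rev_b = ["vout_accuracy", "osc_freq"]
# Tests the system company has flagged as critical to its own factory tests.
protected = ["eoc_current"]

removed, violations = removed_tests(rev_a, rev_b, protected)
print("removed:", removed)
print("removed but protected:", violations)
```

Any entry in the second list would be a reason to ask the supplier for a PCN and the data behind the removal.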
- Supplier development teams in different business units (BUs) don’t necessarily have the same engineering methodologies and same rigor.
You may have a good relationship with one team at a chip supplier, a good working knowledge of how that team operates, and be very satisfied with it. But when it comes to validation and verification, each team tends to do its own thing. Sometimes that is justified by the products they make. Many times, though, it is simply because the teams have different engineering cultures, and there may not be a truly unified methodology in practice across all of the supplier’s development teams. It often happens that smaller chip companies are acquired over time by larger ones and continue doing things the way they have always done them, which may have been great 10 or 20 years ago but may no longer be so great. So every time you start working with a new chip dev team at one of your known suppliers, you need to raise your guard and check as if this were a new supplier you’ve never worked with before.
All of these risks have mitigations, which are fairly straightforward to implement as long as the chip supplier is cooperative and the system company has the right specialists on its side.
The following are the main mitigations for the risks above:
- Contracts. This is not legal advice, and I am not a lawyer, so please make sure to seek legal counsel to draft good contracts.
Require in your contracts with custom silicon suppliers the following:
- Establish your PCN requirements in your supplier agreement, and make sure they cover changes to the ATE program. This can get messy, though, since suppliers change these programs often. Assuming you also implement the next requirement below, they would have to issue a new datasheet showing which parameters are no longer covered by ATE.
- Require the supplier to provide you a datasheet that describes how every parameter will be verified/guaranteed. Usually, the main ways to guarantee a parameter in a datasheet are by design/simulation, by bench evaluation (30 or more units), by corner lot evaluation, by ATE, by qualification testing (JESD47, JESD22, and JESD17), or by a combination of the above. Notice that once the datasheet shows how each parameter is validated/guaranteed (including ATE), the supplier will have to issue a PCN according to J-STD-46 whenever the test program is changed to reduce coverage.
- Require the supplier to provide you with the validation reports used to guarantee the parameters in the datasheet. This is important because you want to see that the data was collected, on how many units, and what the CPK and confidence interval are for the data. The supplier should provide you with detailed bench testing reports for all the blocks and interfaces; detailed bench evaluation reports on 30 or more units for all the parameters guaranteed in the datasheet by bench evaluation, and for whichever parameters guaranteed by design can be characterized; and a detailed report showing, at a minimum, corner samples passing all parameters guaranteed by ATE in the datasheet.
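To see why the confidence interval matters as much as the CPK itself, here is a minimal sketch of a lower confidence bound on CPK as a function of sample size. It uses Bissell's normal approximation for the standard error of an estimated CPK, which assumes the measured data is roughly normally distributed; that assumption, and the function name `cpk_lower_bound`, are mine and not something from a supplier report.

```python
import math
from statistics import NormalDist


def cpk_lower_bound(cpk_hat, n, confidence=0.95):
    """Approximate one-sided lower confidence bound on CPK.

    Uses Bissell's approximation for the standard error of an
    estimated CPK: sqrt(1/(9n) + cpk^2 / (2(n-1))).
    Assumes approximately normal data.
    """
    if n < 2:
        raise ValueError("need at least 2 units")
    z = NormalDist().inv_cdf(confidence)  # one-sided z value
    std_err = math.sqrt(1.0 / (9.0 * n) + cpk_hat ** 2 / (2.0 * (n - 1)))
    return cpk_hat - z * std_err


# How the bound tightens as more units are measured.
for n in (30, 100, 300):
    print(n, round(cpk_lower_bound(1.33, n), 2))
```

Under this approximation, a reported CPK of 1.33 measured on only 30 units has a 95% lower bound only slightly above 1.0, which is one reason to ask for the unit count alongside every CPK in the report.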
- Review all validation reports in detail.
Check the following in the reports:
- Look at the plots to check that the number of data points really adds up to the sample size the supplier says it used to calculate CPK, and that the data was actually taken on your chip and not just re-used from a different chip.
- Check which tests are failing for corner parts. Do you care about the parameter that is failing? Is it one of the corners that broke your system tests during the corner build? Chip suppliers may simply tell you not to worry, that corners don’t really happen in real life, or offer some other justification for doing nothing about the issue. They have many other chips on their work list, so your chip’s corner fails are not their worry. But a system engineer holding the corner sample test fixture results may realize that the parameter failing ATE at the supplier is correlated with the tests showing issues on the system factory tester for the corner build. Those corner units are being filtered out by ATE at the supplier today, but if that ATE test is later removed to cost down chip testing, the corner units will start making it into system builds and showing up as DPPM issues. Bottom line: the chip supplier needs to provide these reports with data plots that use the same parameter naming as the datasheet, so that the system engineers can check them against their factory test fixture data and flag which ATE tests should never be removed by the chip supplier.
- Check the CPK for the bench validation data. Parameters in the datasheet that are guaranteed by bench validation or guaranteed by design are NOT ATE tested, which means whatever data the supplier took to generate those reports is the data that will forever guarantee those parameters. Is the sample size big enough? How many lots did the supplier use for the bench evaluation? Do the data plots look bimodal?
- Check the qual report to make sure the chip supplier is following the JEDEC standards previously mentioned.
- Complete a correlation between your system factory test fixture and the chip supplier’s ATE tests for system-critical parameters. I’ve worked on a lot of PMICs, and a classic test for this is end of charge: once you enter the final phase of charging in voltage mode, the time to reach 100% is heavily influenced by the resistance in the system charge path, which is system specific. Many other types of test correlation could be applicable to your system.
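As a rough illustration of the correlation step, the sketch below pairs per-unit supplier ATE readings with system fixture readings by serial number and computes a Pearson correlation coefficient. It assumes both sides can export data keyed by the same unit serial; the serial numbers, values, and function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the data sets has no variation")
    return sxy / math.sqrt(sxx * syy)


def correlate_by_serial(ate, fixture):
    """Correlate ATE and fixture readings for units present in both sets.

    Returns (correlation coefficient, number of matched units).
    """
    common = sorted(set(ate) & set(fixture))
    r = pearson_r([ate[s] for s in common], [fixture[s] for s in common])
    return r, len(common)


# Made-up readings: supplier ATE parameter vs. system fixture result.
ate_data = {"U1": 48.0, "U2": 50.0, "U3": 52.0, "U4": 55.0}
fixture_data = {"U1": 96.0, "U2": 100.0, "U3": 104.0, "U4": 110.0, "U5": 99.0}

r, matched = correlate_by_serial(ate_data, fixture_data)
print(f"r = {r:.3f} over {matched} matched units")
```

A strong correlation on a system-critical parameter is a signal that the corresponding ATE test is one the supplier should not be allowed to drop quietly.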
- ECO reviews.
When the chip is evaluated, some bugs may be found. It is critical that the system company has chip specialists helping to review the ECOs proposed by the chip supplier, to make sure the supplier completes a proper root cause analysis. Depending on the types of bugs found, the specialists needed may include DV and AMS verification engineers, analog chip designers, digital chip designers, RF chip designers, package engineers, foundry engineers, and others. Chip suppliers are sometimes under pressure to tape out quick fixes, or to simply hand-wave issues away and ask for spec limit changes. This is a very serious danger to your custom chip program: bad or incomplete root causing can land you in a run-break-fix cycle with multiple tape-outs, which puts your system schedule at risk. There are many tools available to debug chip issues, such as focused ion beam (FIB) edits and other failure analysis (FA) techniques. You must check that proper methods are being used to root cause your chip’s bugs, and not accept incomplete root causes for your ECOs. This is why it is vital that your system company has chip technical experts on its side, to ensure your project is not the one where the chip supplier spreads resources thin and you end up with peanut butter engineering adding risk to your system launch.
- Request corner samples for the custom chip, and build some of your systems with them.
Not all suppliers, especially the ones that have their own internal fabs, will want to do this without some pushing from the system company side. But it’s really the only way you have to check whether your system will have issues when you ramp into higher volumes than what you are building at your EVT and DVT builds. Usually you can build 100 samples of each corner and see what the CPK looks like for your factory tests with those units. On a mostly digital chip, the slow NMOS slow PMOS (SS), fast NMOS slow PMOS (FS), slow NMOS fast PMOS (SF), and fast NMOS fast PMOS (FF) corners will give you good insight into whether you will have a problem. A CPK of 1 means the nearest limit is only three standard deviations from the mean, which is a problem in volume, so you want to see a CPK of 1.33 or better at your factory tests with corner samples. This check is part of the validation phase of the custom silicon dev process we run at customsilicon.com.
Some suppliers will tell you that corner units will be filtered out at ATE, to try to avoid having to provide the corner samples. That is not a valid argument, because many parameters in a datasheet are not ATE tested, and even the ones that are may not be tested in the future as the supplier starts to remove testing over the lifetime of the product. So you need to know whether your system is sensitive to process corners, using system factory testers, before you ramp to mass production. If you do all your builds with mostly nominal material you may not see an issue. But once you hit volume you will start seeing DPPM issues if your system is sensitive to some of the chip supplier’s process corners. It’s better to catch this early and either spin the chip to fix it, change the OTP trim, change the ATE program to fix the issue by calibration, or filter the units out at the chip supplier’s ATE so you never receive the parts. Parameters that you find are critical like this should be highlighted to the supplier as “never remove ATE testing” parameters.
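The corner-build check described above can be sketched in a few lines: compute CPK per process corner for one factory test and flag any corner that falls below 1.33. The spec limits, measurements, and function names here are made up for illustration, and real factory data would have far more units per corner.

```python
from statistics import mean, stdev


def cpk(values, lsl, usl):
    """CPK: distance from the mean to the nearest spec limit, in units of 3 sigma."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        raise ValueError("no variation in data; CPK is undefined")
    return min(usl - mu, mu - lsl) / (3.0 * sigma)


def corners_below_target(results, lsl, usl, target=1.33):
    """Return {corner: CPK} for every corner whose CPK is below the target."""
    flagged = {}
    for corner, values in results.items():
        value = cpk(values, lsl, usl)
        if value < target:
            flagged[corner] = value
    return flagged


# Hypothetical factory test readings per corner build (same test, same limits).
factory_results = {
    "SS": [9.8, 10.1, 10.0, 9.9, 10.2],
    "FF": [10.9, 11.3, 11.1, 11.4, 10.8],
}
for corner, value in corners_below_target(factory_results, lsl=9.0, usl=12.0).items():
    print(f"{corner}: CPK {value:.2f} is below 1.33")
```

Any factory test that shows up here for a given corner is a candidate for the “never remove ATE testing” list, or for one of the fixes described above.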
- Trust, but verify.
Custom system silicon, when done with the assistance of silicon experts, puts the system company in control of its own destiny. It’s important to note that when you purchase catalog parts for your system, unless you perform due diligence similar to what is described above, you’re trusting but not verifying that your components will be of good quality and unlikely to cause yield or other issues when you go to production in high volumes.
For more information contact us.