Early in my so-called EE career, I sat in a workshop led by the director of quality for the Ford truck plant in Louisville, KY, where “Quality is Job #1.” At that time, they were gaining experience in electronic control modules (ECMs) for fuel efficiency and emissions control. Who better to transfer the secrets of Crosby and Deming to a bunch of missile designers?
After one of the sessions, he opened the floor for Q&A. One of my colleagues asked him if these methods really worked in improving quality. His response was that quality is a moving target, and that is why we have jobs. He said that even with all the progress Ford had made, he would go down to the end of the F-150 production line and “watch trucks drive away just to convince myself we built stuff that actually worked.” Fortunately, most of what they put out did work.
That sounds funny, until you realize the same thing is probably happening to you right now. Even with all the progress, the process of FPGA and SoC design and verification is still laden with ample opportunity for defects. IP, both hardware and software, has become massively complex. We have moved from board-level design to chip-level design, with Gary Smith citing “average high-end” SoC designs now at 134 million gates.
Software has gone from what Grady Campbell called “a minor element of systems [where] needs were modestly conceived and easily understood” to what he now terms software-intensive systems. He also points out that software defining systems must be regularly modified to stay competitive – see Android and iOS. We hear violent complaints when an older device isn’t upgradable, yet there is a lot of moaning when an upgrade exposes new problems.
Debugging has become far more continuous in this environment. Your own IP, other people’s IP, IP that works with this but not with that for no obvious reason, IP that you thought was perfectly fine in the previous device with previous software – it is all suspect, every day. Debugging is now a continuous process of maintaining quality by agile vigilance.
Vigilance requires visibility. Yet, we have a hard time justifying and locating test points to create visibility. OMG, we can’t afford another I/O pin just to monitor that. We’d like to see more trace depth, but we’re already short on block RAM. We can’t afford to figure out the instrumentation IP and external analyzer for at-speed testing – we’ve simulated the IP blocks, right? We don’t need a test point there, we’ve never seen a problem in that block.
When I reread the case study “Dini Group Turns to Tektronix Certus to Tackle Daunting FPGA Prototype Challenges”, all that came to mind. Neil Palmer of Dini talks about time savings in a lot of areas. One example is a complex piece of third-party IP where he could not throw enough software instructions at it in simulation to find the problem, at least not in the time he has left in his career, which will be considerably shorter if he doesn’t find a major problem.
Palmer’s story revolves around only 5500 test points using Certus. The savings came from not being limited to fewer test points, as with other FPGA tools, and from not having to engineer their own debug IP to see inside. Certus lifts those limitations and allows better visibility, with 100k or more test points placed automatically in a design. It also enables deep trace capability by combining on-chip instrumentation with compression, without eating up a lot of valuable block RAM.
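To get a feel for why compression stretches trace depth, here is a minimal sketch using plain run-length encoding on a captured signal. This is an illustration only, not how Certus does it; the case study does not describe the compression scheme. The function names, the 16-bit sample and counter widths, and the sample trace are all assumptions made up for the example.

```python
from typing import Iterable, List, Tuple


def rle_compress(samples: Iterable[int], max_run: int = 65535) -> List[Tuple[int, int]]:
    """Collapse consecutive identical samples into (value, run_length) pairs.

    max_run models a fixed-width hardware run counter (16 bits assumed here);
    a run longer than max_run is split across several entries.
    """
    runs: List[Tuple[int, int]] = []
    for sample in samples:
        if runs and runs[-1][0] == sample and runs[-1][1] < max_run:
            runs[-1] = (sample, runs[-1][1] + 1)
        else:
            runs.append((sample, 1))
    return runs


def rle_expand(runs: Iterable[Tuple[int, int]]) -> List[int]:
    """Rebuild the original sample stream from (value, run_length) pairs."""
    out: List[int] = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def storage_bits(num_entries: int, sample_width: int, count_width: int = 0) -> int:
    """Bits of trace memory needed for num_entries stored entries."""
    return num_entries * (sample_width + count_width)


# Hypothetical capture: a 16-bit bus that sits idle for long stretches.
trace = [0x0000] * 1000 + [0x00A5, 0x00A6] + [0x0000] * 2000 + [0xFFFF] * 10

runs = rle_compress(trace)
raw_bits = storage_bits(len(trace), sample_width=16)
packed_bits = storage_bits(len(runs), sample_width=16, count_width=16)

print(f"raw: {raw_bits} bits, run-length encoded: {packed_bits} bits")
```

The gain depends entirely on the signal. A bus that idles compresses very well, while one that toggles every clock costs more to store this way than raw, because each entry also carries a run counter.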
Quality is not free, no matter what Crosby said. What he meant was if you don’t deal with it, quality (or the lack thereof) gets really expensive. Time can cost a lot more than hardware and software, especially when entire teams are tied up in a problem. Don’t scrimp on the test points, and think about debug as a continuous process instead of a project milestone you check off.