It was the final day of DAC56 and my head was already spinning from information overload after meeting so many people and hearing so many presentations. But IC functional verification is a huge topic and a looming bottleneck for many SoC design teams, so I made a last-minute email request to attend a luncheon panel discussion featuring verification experts from several tier-one companies.
Fellow Oregonian Brian Bailey was the panel moderator, and he did a fine job keeping the discussion focused and moving along. Brian opened by noting that 20 years ago the IC world was simpler: Verilog was the language, code coverage goals were the metrics, and RTL entry was the methodology. Today there are many languages to consider, plenty of EDA tools to choose from, formal techniques, emulation becoming commonplace, and new standards like PSS.
Q: What does verification throughput mean to you?
Tran – Yes, we do have an infinite verification challenge. Each of our design cycles runs about 9-10 months, and that time frame doesn’t change much with each new project. How efficiently can I verify? Are these new cycles finding any new bugs? Faster verification helps us, but uncovering more bugs is what we really need.
Raju – we need a more holistic picture, from design verification through post-silicon. There are issues like build, resource utilization, run speed, and time to root cause. How fast can a HW fix get in (build)? How fast are jobs launched? Are testbenches running effectively? For debug, how fast can I find and fix a bug? These four verticals are the focus of our work.
Dale – how many jobs can I submit into emulation to iterate, debug, and find RTL bugs? It spans build, run, testbench, and AI scripting to triage any failures. We don’t want QA engineers waiting for jobs to finish, so we need to make their time more productive.
Paul – you can only optimize what you can measure. Raw throughput metrics like time to see waveforms are important. Higher order metrics are time to root cause of bugs. We do have a cockpit to pull all of these analytics together in one place.
Q: How do different SoC designs use verification?
Dale – for GPU verification and development we have functional goals like RTL health, GPU performance, pre-validate GPU with drivers, where each team has their own test bench. Regression and debug tasks have different test benches.
Raju – we need our chips to run across multiple OSes, meeting the requirements.
Tran – across multiple products we like to reuse some parts of test benches to improve throughput.
Paul – there’s raw performance and also root cause rates, but at different levels of abstraction (transistor, gate, RTL, OS) we need to optimize throughput, then pick the right tool for the job.
Dale – as a verification engineer I used to just pick up the tools and use them without thinking about how they work. Collaborating with the Palladium team, I’ve learned different approaches to get the best throughput.
Q: How do you assess each of the verification tools to use them in the right tasks?
Raju – collaborating with the EDA vendor and knowing their product roadmap helps us plan which vendor to work with.
Tran – we need help to understand the verification data that we are generating from the tools. Would like to use some ML to help us analyze our test data better.
Paul – formal is a great example of a technique that complements dynamic verification approaches. What are the test payloads going through?
Q: Is time being spent in debug getting worse?
Raju – yes, verification tasks and tool use are increasing dramatically. Debug techniques need to improve, so more automation is required. We can build more common debug approaches, and we can use ML and AI to help pinpoint verification bottlenecks.
Tran – debugging a system of systems, like verifying an autonomous vehicle, is a big, new challenge. We have a lot of known unknowns to verify, and it’s very complex to achieve.
Paul – the opportunity to use AI in the debug process is ripe, how can we guide the human to look in the best debug areas?
Dale – we have metrics to assess how verification tools help throughput, but engineers need to know tool limitations in order to be most efficient in testbench generation. If generating traces for 1 million cycles takes a full day of run time, use a different approach to find and fix the bugs.
Q: Maxim – we use VIP and metric-driven verification approaches. But our designers and verification engineers have a different understanding of the same specs. Can you help our teams capture the specs correctly?
Raju – that’s a fantastic problem statement, because stale documentation causes differences between design and verification engineers. We’re trying to have standardized documentation requirements with frequent sign-off criteria, keeping specs up to date as design changes occur. Using PSS is going to help us document all of our requirements better.
Dale – making sure test and design specs meet the overall spec is important, as is finding mistakes in the interpretation of specs.
Paul – smart linting can catch mistakes earlier, some VIP can provide 100% coverage of known specifications.
Q: How much metric driven verification do you use?
Raju – we use functional and code coverage metrics in all verification flows to get signoff points. We have legacy coverage goals, and need to be smarter about finding and removing redundant testing.
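Raju's point about removing redundant testing can be sketched mechanically: if the per-test coverage data already being collected is exported (say, as a set of covered bins per test), a greedy set-cover pass flags tests that add no unique coverage. This is a minimal sketch under that assumption; the function name, bin names, and test names are all hypothetical, not from any panelist's actual flow.

```python
# Hypothetical sketch: prune redundant tests from a regression, assuming
# each test's functional-coverage bins are available as a Python set.
# All test and bin names below are illustrative.

def minimize_tests(coverage):
    """Greedy set-cover: repeatedly keep the test that adds the most
    not-yet-covered bins; leftover tests add no unique coverage."""
    remaining = set().union(*coverage.values())  # every bin hit by any test
    keep = []
    while remaining:
        best = max(coverage, key=lambda t: len(coverage[t] & remaining))
        keep.append(best)
        remaining -= coverage[best]
    return keep

coverage = {
    "smoke":      {"reset", "fifo_full"},
    "random_001": {"reset", "fifo_full", "fifo_empty"},
    "directed_a": {"fifo_empty"},
}
kept = minimize_tests(coverage)
print(kept)                                  # → ['random_001']
print(sorted(set(coverage) - set(kept)))     # → ['directed_a', 'smoke']
```

Greedy set cover is not guaranteed to find the smallest possible keep-list, but it is a cheap first pass before retiring legacy tests, and the "redundant" candidates can be reviewed by hand rather than deleted blindly.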
Paul – coverage driven verification methodology is our goal (Formal, Simulation, Emulation), with a single dashboard.
Q: Are you using the right metric for your verification goals?
Raju – how can we improve our verification coverage with each tool: Formal, Simulation, Emulation?
Q: Is there a standard way to use AI and ML, sharing across a verification environment?
Tran – AI is something very new, so we’re still learning how to use it during verification, trying to get a better understanding of our test data. We store and plot our test coverage metrics, but there’s no standard process that AI would automate.
Paul – we see lots of test data gathering going on now, and Cadence has a method to collect it, but there’s no industry standard out there for data gathering. What do you want us to do with this test data? How can we use this data to improve our test goals?
Q: Is the cloud going to help us in collecting data and applying AI to improve test?
Paul – you can do analytics anywhere.
Tran – the cloud is just a technique, the reasoning is the important point.
Paul – yes, the cloud will ensure that we gather more data.
Q: Software is a large part of our SoCs; how does that affect verification?
Raju – having verification drivers is important, getting to SW debug we often use FPGA for prototyping.
Dale – to get Android and Linux booting, we need prototyping sign-off to reach verification goals. Bugs happen between SW, HW, firmware and RTL, so we need emulation to reach our tape out goals.
Tran – to verify SW and OS we use FPGAs for prototyping, but SW verification has a lot of room for improvement.
Dale – SW developers start with virtual debugging, then eventually HW prototyping.
Paul – SW bring up is very expensive, so pre-silicon SW bring up is the goal.
The panelists were consistent in their replies on the topic of optimizing verification throughput, and they have an established approach to verification that now includes formal methods, emulation, FPGA prototyping, PSS, and even ML to help wade through so many logfile results. Successful EDA vendors will continue to automate more verification tasks to help engineers find more bugs, more quickly.