I’m always curious to learn what might be new in clock domain crossing (CDC) verification, having dabbled in this area in my past. It’s an arcane but important field, the sort of thing that if missed can put you out of business, but otherwise only a limited number of people want to think about it to any depth.
The core issue is something called metastability. It arises in systems which must intermingle multiple clock frequencies – which is pretty much any kind of system today. CPUs run at one frequency, interfaces to external IOs run at a whole galaxy of different frequencies, AI accelerators maybe at yet another frequency. Clockwise, our systems are all over the map.
When data is exchanged between these different domains, metastability gremlins can emerge: random chances that individual bits are dropped or delayed, neither quite making it through the gate to the other side nor not making it. Bitwise there are solutions to this problem, metastability-hardened gates (actually registers), though these are also statistical in their ability to limit problems. They’re better than crossings that aren’t hardened, but still not perfect, because this is engineering where perfect is never possible.
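To put a number on "statistical", the usual textbook estimate for a synchronizer's mean time between failures is MTBF = exp(t_resolve / tau) / (t_window * f_clk * f_data). This formula is not from the webinar, and every value in the sketch below is hypothetical, chosen only to show how strongly extra resolution time (for example from a second register stage) moves the result.

```python
import math

def mtbf_seconds(t_resolve, tau, t_window, f_clk, f_data):
    """Textbook synchronizer estimate:
    MTBF = exp(t_resolve / tau) / (t_window * f_clk * f_data).

    t_resolve: time allowed for a metastable output to settle (s)
    tau:       resolution time constant of the register (s)
    t_window:  metastability capture window of the register (s)
    f_clk:     receiving clock frequency (Hz)
    f_data:    rate at which the crossing data changes (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)

# Hypothetical device and clock numbers, for illustration only.
common = dict(tau=50e-12, t_window=20e-12, f_clk=1e9, f_data=100e6)
one_stage = mtbf_seconds(t_resolve=0.5e-9, **common)
two_stage = mtbf_seconds(t_resolve=1.5e-9, **common)  # one extra clock period to settle

print(f"one stage: {one_stage:.3g} s, two stages: {two_stage:.3g} s")
```

The exponential term is the whole story: a modest amount of extra settling time multiplies the MTBF enormously, but it never reaches infinity.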
Still, if you improve matters to the point that the design meets some acceptable time between failures, everything should be OK, right?
Afraid not. There’s a problem in CDC called convergence. You have two independent signals from one clock domain, crossing into another. Each separately passes through a metastability hardened gate. They later combine in some calculation in the new domain – maybe “are these signals equal?”. This could be multiple clock cycles later.
Now you may (again statistically) hit a new problem. Metastability hardening ensures (statistically) that a signal gets through or doesn’t get through – none of this “partly getting through”. But in doing that, what emerges on the other side is not always faithful to what went in. It might be delayed or even dropped. Or not – accurately reflecting what went in is also an option.
So when you recombine two signals separately gated like this, you can’t be sure they are fully in sync with the way they were on the other side of the gates. On the input side they might have been equal, but when they’re recombined, they’re not. Or at least not initially; maybe they become equal if you wait for a few cycles. At least as long as the inputs on the other side didn’t change in the meantime.
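A tiny enumeration makes the convergence problem concrete. This is a simplified sketch, assuming each hardened crossing can only add zero or one cycle of delay. Two signals carry the same waveform in the source domain, and we check every delay combination for cycles where an "are these equal?" test would fail in the destination domain. The function names are mine, purely for illustration.

```python
from itertools import product

def delayed(samples, delay):
    """Shift a waveform right by `delay` cycles, holding its initial value."""
    return [samples[0]] * delay + samples[:len(samples) - delay]

# Both signals carry this same waveform in the source domain.
source = [0, 0, 1, 1, 1, 1]

def mismatch_cycles(delay_a, delay_b):
    """Destination cycles in which the two crossed signals disagree."""
    a = delayed(source, delay_a)
    b = delayed(source, delay_b)
    return [t for t, (x, y) in enumerate(zip(a, b)) if x != y]

# Enumerate every combination of 0- or 1-cycle delay on the two crossings.
results = {(da, db): mismatch_cycles(da, db)
           for da, db in product((0, 1), repeat=2)}

for delays, cycles in results.items():
    print(delays, cycles)
# (0, 0) []
# (0, 1) [2]
# (1, 0) [2]
# (1, 1) []
```

Half of the combinations produce a cycle in which the equality check is wrong, even though each crossing on its own behaved exactly as a hardened crossing should.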
In VC SpyGlass we’d do a static analysis complemented by some level of formal analysis to try to catch these cases. That isn’t a bad approach as long as re-combination happens within one cycle. But who’s to say such a problem may not crop up after many cycles? Try to trace this using formal methods and you run into the usual problem – analysis explodes exponentially.
The better method, now becoming more common, is a combination of static and dynamic analysis. Use static CDC analysis to find crossings and recombination suspects, then use dynamic analysis to test these unambiguously, at least to the extent that you can cover them.
Synopsys now provides a flow for this, combining VC SpyGlass and VCS analysis. This is a refinement of a commonly used technique called a jitter injection flow, a method to simulate these random offsets. That method injects random delays into the simulation when data input to a gate changes.
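To give a feel for the basic idea, here is a toy jitter injection model. It is my own sketch under simplifying assumptions (at most one cycle of injected delay per input change, with a hypothetical probability `p_late`) and says nothing about how the Synopsys flow is actually implemented. It pushes two identical source waveforms through independent crossings and counts the cycles where an equality check on the far side would fail.

```python
import random

def inject_jitter(samples, rng, p_late=0.5):
    """Toy crossing model: each input change reaches the output either in the
    same cycle or one cycle late, decided at random per change."""
    out = []
    prev_in = samples[0]
    current = samples[0]
    pending = None  # value scheduled to appear on the next cycle
    for value in samples:
        if pending is not None:
            current, pending = pending, None
        if value != prev_in:
            if rng.random() < p_late:
                pending = value   # injected one-cycle delay
            else:
                current = value   # change passes through on time
        prev_in = value
        out.append(current)
    return out

def count_mismatches(seed, n_cycles=1000, p_late=0.5):
    """Cross two identical source waveforms independently and count the
    destination cycles in which an equality check on them would fail."""
    rng = random.Random(seed)
    source = [(t // 10) % 2 for t in range(n_cycles)]  # toggles every 10 cycles
    a = inject_jitter(source, rng, p_late)
    b = inject_jitter(source, rng, p_late)
    return sum(1 for x, y in zip(a, b) if x != y)

for seed in range(3):
    print("seed", seed, "mismatched cycles:", count_mismatches(seed))
```

With `p_late=0.0` the count is zero, which is what a simulation without injection would show. With injection on, each seed exercises a different pattern of delays, so different runs expose different mismatches.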
There are some technical challenges with the standard injection method – you should watch the webinar for more detail. Synopsys say they have made improvements around these limitations. An important challenge that jumped out at me is that there is no obvious way to quantify coverage in that approach. How do you know when you’ve done enough testing?
Himanshu Bhatt (Sr Mgr AE at Synopsys) explains in the webinar how they have improved on traditional jitter injection testing, how they address the coverage question, and what debug facilities they provide to trace problems back to metastability root causes. You can register to watch the webinar HERE.