Because you can never have too much to worry about in verification, reset domain crossings (RDCs) are another hazard lying in wait to derail your design. Which hardly seems fair. We like to think of resets as dependable anchors to get us back on track when all else fails, but it seems their dependability is not absolute, especially in modern designs.
We all know about clock domain crossings (CDCs), a problem that has been amplified by the integration onto SoCs of multiple interface standards along with high-performance compute engines, each needing to support different clock speeds. Signals passing between different clock domains on these devices are at risk of lock-up through metastability and/or loss of data. Finding and correcting these potential problems takes careful analysis.
Reset domains have also been with us for a while, but have especially proliferated in SoC design and low-power design where, in addition to standard blanket resets like POR and software reset, we now find an abundance of local reset options, controlled at the IP or functional domain level. In the spirit of providing maximum controllability over power saving, IPs may have separate reset inputs for hard reset, soft reset, a reset that preserves retention registers, and other options. At the system level, application of reset has become more complex, requiring that application or release of reset be sequenced between functions; release on many blocks must wait at least until the controlling CPU has booted to ensure that startup from reset in those downstream blocks is well-controlled.
But this complexity is not the root cause of RDCs, which start with asynchronous resets. The complexity, along with the realities of multi-sourced IP design, simply makes RDCs harder to anticipate and isolate. An “ideal” way to fix the problem might be to forbid the use of asynchronous resets. A lot has been written on the relative merits of synchronous versus asynchronous reset. Without getting into that debate, it is enough to observe that any place you need to ensure a reset where you may not yet have a clock (e.g. in the presence of gated clocks or switchable power domains) requires an asynchronous control to ensure the reset is applied. Then there’s the multi-sourced IP; you may have dominion over reset practices in your own IP, but you can’t control how other IP suppliers choose to reset. So RDCs can’t be banished – you have to learn to deal with them.
There are several different ways in which an RDC hazard can be created. One simple case is a path between two flops, quite possibly using the same clock, where the first flop is asynchronously reset by RST_A and the second is asynchronously reset by RST_B. If RST_A and RST_B are not related, this becomes an asynchronous crossing, and there is a risk that the second flop can become metastable or may sample incorrect data.
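To make the simple case concrete, here is a minimal Python sketch of the kind of structural check involved. It is only an illustration under stated assumptions: the `Flop` record, the `find_rdc_candidates` function and the toy netlist are all hypothetical, and a real RDC tool also has to reason about whether the resets are related and how they are sequenced, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Flop:
    """Hypothetical, heavily simplified model of a flip-flop."""
    name: str
    clock: str
    async_reset: Optional[str]  # None means no asynchronous reset


def find_rdc_candidates(flops: List[Flop],
                        paths: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (source, destination) pairs that may be reset domain crossings.

    A path is flagged when the source flop has an asynchronous reset that
    the destination flop does not share. This is a structural candidate
    only, not proof of a real hazard.
    """
    by_name = {f.name: f for f in flops}
    candidates = []
    for src_name, dst_name in paths:
        src, dst = by_name[src_name], by_name[dst_name]
        if src.async_reset is not None and src.async_reset != dst.async_reset:
            candidates.append((src_name, dst_name))
    return candidates


# Toy netlist (assumed for illustration): all flops share one clock.
flops = [
    Flop("ff_a", clock="CLK", async_reset="RST_A"),
    Flop("ff_b", clock="CLK", async_reset="RST_B"),
    Flop("ff_c", clock="CLK", async_reset="RST_B"),
]
paths = [("ff_a", "ff_b"), ("ff_b", "ff_c")]

print(find_rdc_candidates(flops, paths))  # [('ff_a', 'ff_b')]
```

Note that the flagged path sits entirely within one clock domain, so a check that only looks at clocks would not report it.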
Another case has a reset synchronized in clock domain 1 but used asynchronously in clock domain 2. Because the reset is not synchronized to the second clock domain, again there is a metastability and/or incorrect data sampling hazard.
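This second case can be sketched in the same spirit. The snippet below is a hypothetical illustration, not how any particular tool works: `reset_sync_clock` is an assumed map from each reset signal to the clock it was synchronized in (or `None` if it was never synchronized), and `flop_uses` is an assumed list of flops with their clock and asynchronous reset.

```python
from typing import Dict, List, Optional, Tuple


def find_unsynchronized_reset_uses(
    reset_sync_clock: Dict[str, Optional[str]],
    flop_uses: List[Tuple[str, str, str]],
) -> List[Tuple[str, str]]:
    """Return (flop, reset) pairs where the reset was not synchronized
    to the clock of the flop that it asynchronously resets.

    flop_uses entries are (flop_name, flop_clock, reset_name).
    """
    flagged = []
    for flop_name, flop_clock, reset_name in flop_uses:
        # A reset missing from the map is treated as never synchronized.
        if reset_sync_clock.get(reset_name) != flop_clock:
            flagged.append((flop_name, reset_name))
    return flagged


# Assumed example: rst_sync1 was synchronized in CLK1 only.
reset_sync_clock = {"rst_sync1": "CLK1"}
flop_uses = [
    ("ff_x", "CLK1", "rst_sync1"),  # same domain as the synchronizer
    ("ff_y", "CLK2", "rst_sync1"),  # reused in a different clock domain
]

print(find_unsynchronized_reset_uses(reset_sync_clock, flop_uses))
# [('ff_y', 'rst_sync1')]
```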
Even if you carefully generate reset signals synchronous to the domain clock and you’re not crossing between clock domains, you aren’t necessarily off the hook. In a case like the first one above, where both flops even use the same clock, there is still an RDC hazard because the path between the flops is not timed in STA and crosses between two potentially asynchronous reset domains.
Problems of this nature can be particularly dangerous for configuration registers, which are often exempt from warm resets to speed up recovery after reset. If an upstream warm reset is applied while the configuration register is being written, an asynchronous crossing can corrupt the contents. A similar problem can occur in drivers for memory controller logic. If signals like chip select and write enable can be asynchronously reset, again you may have a hazard.
There are plenty of other examples, distinguished more by the varieties of havoc they can wreak on the correct operation of your design than by differences in root cause. Correction is often not difficult, through more careful selection of resets and use of reset synchronizers. The real challenge here is in finding potential hazards scattered across large SoC designs. That’s where a tool like Meridian RDC from Real Intent can help.
You can learn more about finding and correcting RDC hazards by registering for the Real Intent white paper.