Technology trends demand netlist-level CDC verification
Complex processes and aggressive synthesis interventions are increasing the risks of metastability, creating a need for netlist-level CDC verification
Multiple asynchronous clocks are a fact of life on today’s SoCs. Individual blocks have to run at different speeds so they can handle different functional and power payloads efficiently. The ability to split clock domains across the SoC has also become a key part of timing closure: each clock domain is isolated to a subsection of the device within which traditional skew control can still be used.
As a result, clock domain crossing (CDC) verification is required to ensure logic signals can pass between regions controlled by different clocks without being missed or causing metastability. Traditionally, CDC verification has been carried out on RTL descriptions on the basis that appropriate directives inserted in the RTL will ensure reliable data synchronizers are inserted into the netlist by synthesis. But a number of factors are coming together that demand a re-evaluation of this assumption.
A combination of process technology trends and increased intervention by synthesis tools in logic generation, both intended to improve power efficiency, is leading to a situation in which a design that is considered CDC-clean at RTL can fail in operation. Implementation tools can fail to take CDC into account and unwittingly increase the chances of metastability.
Various synthesis features and post-synthesis tools will insert logic cells that, if used in the path of a CDC, conflict with the assumptions made by formal analysis during RTL verification. Test synthesis will, for example, insert additional registers to enable inspection of logic paths through JTAG. Low-power design introduces further issues through the application of increasingly fine-grained clock gating. The registers and combinatorial cells these tools introduce can disrupt the proper operation of synchronization cells inserted into the RTL.
The key issue is that all clock-domain crossings involve, by their nature, asynchronous logic, and one of the hazards of asynchronous logic is metastability. Any flip-flop can be rendered metastable. If its data input toggles too close to the sampling edge of the clock, inside the flop’s setup-and-hold window, the register may fail to capture the correct value and instead become metastable. The state of the capturing flop may not settle by the end of the current clock period, in which case there is a high chance that it feeds the wrong value to downstream logic (Fig 1).
Figure 1 When data is still changing as a clock changes, the output can become metastable
The risk of metastability with asynchronous logic is always present. Designers can make a metastability failure very unlikely by increasing the mean time between failures (MTBF) of each synchronizer.
Equation 1 The governing equation of MTBF
The MTBF varies with the settling time of the signal, the time window over which data is expected to settle to a known state, the clock frequency, the data frequency, and the resolution time-constant for the synchronizer, written as τ (tau). The parameter τ depends primarily on the capacitance of the first flip-flop in the synchronizer, divided by its transconductance. MTBF exhibits an exponential dependence on τ, as it is proportional to e^(1/τ). The value of τ tends to vary with both process technology and operating temperature because both affect drain current, which, in turn, affects transconductance. The MTBF can drop many orders of magnitude at temperature extremes, making a failure far more likely.
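To make the exponential dependence concrete, the short Python sketch below evaluates the commonly quoted textbook form of the synchronizer MTBF relation, MTBF = e^(t_r/τ) / (T_W × f_clk × f_data), which uses the same five parameters listed above. The symbol names (t_resolve, t_window and so on) and every numeric value are assumptions chosen purely for illustration; they do not describe any particular process or cell library.

```python
import math


def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """Return the MTBF in seconds for one synchronizer.

    t_resolve: time allowed for the first flop to settle (s)
    tau:       resolution time-constant of that flop (s)
    t_window:  metastability window around the clock edge (s)
    f_clk:     receiving clock frequency (Hz)
    f_data:    rate at which the asynchronous data toggles (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


# Illustrative numbers only: the same synchronizer with tau doubled,
# for example at a temperature extreme.
common = dict(t_resolve=1e-9, t_window=20e-12, f_clk=1e9, f_data=100e6)
nominal = synchronizer_mtbf(tau=20e-12, **common)
degraded = synchronizer_mtbf(tau=40e-12, **common)

SECONDS_PER_YEAR = 365.25 * 24 * 3600
print(f"nominal : {nominal / SECONDS_PER_YEAR:.3g} years")
print(f"degraded: {degraded / 3600:.3g} hours")
print(f"ratio   : {nominal / degraded:.3g}")
```

With these assumed values, doubling τ takes the MTBF from tens of millions of years down to roughly ten hours, a drop of more than ten orders of magnitude.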
Technology evolution has generally improved τ, making it less significant as a parameter over the past decade or more, but the property is beginning to become significant again in more advanced nodes because of the failure of some device parameters to scale.
Designs that would probably not have experienced failure before are now at risk of suffering from metastability issues. Given the simultaneous demand for higher performance, the MTBF of each CDC needs to be monitored carefully. Automatically inserted logic can introduce problems for the synchronizer, because register depth and organization affect MTBF. Tools need to be able to take these effects into account if they are to insert cells that reduce the probability of metastability. Further, logic inserted ahead of the synchronizer can introduce glitches that are mistakenly captured as data by the receiver in the other clock domain. Therefore, information about the implementation is vital to guarantee performance during CDC checks. The following examples show some of the situations that can arise due to logic insertion by implementation tools.
Example implementation errors
Implementation tools can introduce a number of potential hazards by failing to take CDC into account. Additional registers inserted by test synthesis, for example, can result in glitches on clock lines that can lead to an increased probability of mis-timing issues (Fig 2).
Figure 2 The addition of test logic post-synthesis can make mis-timing more likely
Clock-gating cells inserted by synthesis tools to reduce switching power may also be incompatible with a good CDC strategy. If a combinatorial cell such as an AND gate follows the register that passes a clock signal across the boundary, the gated signal that drives the receiving registers is more likely to experience glitches (Fig 3).
Figure 3 Clock-gating logic may be susceptible to glitches
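The kind of structural check that catches this class of problem can be sketched in a few lines of Python. The toy netlist model below (the Cell class, its field names and the cell names) is entirely hypothetical and models data pins only, not clock pins; it is not the format or algorithm of any real CDC tool. It flags every combinational cell that sits between a flop in one clock domain and a receiving flop in another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    kind: str            # "flop" or "comb"
    inputs: tuple = ()   # names of the cells driving the data pins
    clock: str = ""      # clock domain (flops only)


def comb_cells_on_crossings(cells):
    """Map each receiving flop to the comb cells found on its crossing paths."""
    by_name = {c.name: c for c in cells}
    findings = {}
    for rx in cells:
        if rx.kind != "flop":
            continue
        # Walk backwards from the data pin, recording comb cells on the way.
        stack = [(drv, ()) for drv in rx.inputs]
        while stack:
            name, path = stack.pop()
            cell = by_name[name]
            if cell.kind == "comb":
                if name not in path:  # guard against combinational loops
                    stack.extend((d, path + (name,)) for d in cell.inputs)
            elif cell.clock != rx.clock and path:
                # Reached a flop in another domain through comb logic.
                findings.setdefault(rx.name, set()).update(path)
    return {rx: sorted(found) for rx, found in findings.items()}


# Hypothetical netlist: an AND gate was inserted ahead of the synchronizer.
netlist = [
    Cell("tx_flop", "flop", clock="clk_a"),
    Cell("en_flop", "flop", clock="clk_b"),
    Cell("and_gate", "comb", inputs=("tx_flop", "en_flop")),
    Cell("sync1", "flop", inputs=("and_gate",), clock="clk_b"),
    Cell("sync2", "flop", inputs=("sync1",), clock="clk_b"),
]
print(comb_cells_on_crossings(netlist))
```

A purely structural check like this only locates the suspicious cell; deciding whether it can actually glitch is where formal analysis comes in.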
Timing optimization can result in significant changes in logic organization. The optimizer may choose to clone flops so that the path following each flop has a lower capacitance to drive, which should improve performance. If the flops being cloned form part of a synchronizer, this can result in CDC problems. A better way of handling the situation is to synchronize the signal first, and then to duplicate the logic beyond the receiving synchronizer (Fig 4).
Figure 4 The introduction of additional flops in parallel to help meet timing can increase the probability of metastability and create correlation issues
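A cloned synchronizer has a recognizable signature in the netlist: one source flop directly driving more than one flop in the same receiving domain. The Python sketch below looks for that pattern in a deliberately simplified, hypothetical flop table (name mapped to clock and driving flop); the data layout and the names in it are assumptions for illustration only.

```python
from collections import defaultdict


def cloned_sync_heads(flops):
    """Find sources that feed several first-stage flops in one other domain.

    flops maps a flop name to (clock, driver), where driver is the name of
    the flop feeding its data pin, or None for a primary input.
    """
    heads = defaultdict(list)
    for name, (clock, driver) in flops.items():
        # A crossing: the driver exists and is clocked by a different domain.
        if driver in flops and flops[driver][0] != clock:
            heads[(driver, clock)].append(name)
    return {key: sorted(rx) for key, rx in heads.items() if len(rx) > 1}


# Hypothetical netlist after the optimizer cloned the whole synchronizer.
flops = {
    "tx": ("clk_a", None),
    "sync1": ("clk_b", "tx"),
    "sync2": ("clk_b", "sync1"),
    "sync1_clone": ("clk_b", "tx"),
    "sync2_clone": ("clk_b", "sync1_clone"),
}
print(cloned_sync_heads(flops))
```

The two first-stage flops can resolve to different values on the same clock edge, which is the correlation issue referred to in Figure 4.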
The introduction of test logic may even result in the splitting of two flops intended for synchronization. In other situations, optimisation of control logic or the use of non-monotonic multiplexer functions can result in the restructuring of CDC interfaces and introduce the potential for glitches (Fig 5).
Figure 5 Control logic optimisations may introduce glitches
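The split-synchronizer case can be screened for in a similar way: the first flop after a crossing should drive exactly one load, and that load should be a flop in the same clock domain with nothing in between. The Python sketch below applies that rule to a small, hypothetical netlist dictionary; the field names, the scan_mux cell and the rule itself are simplifying assumptions, not the behavior of any specific tool.

```python
def broken_sync_chains(cells):
    """Return {first_stage_flop: loads} where the flop-to-flop link is broken.

    cells maps a name to {"kind": "flop" or "comb", "clock": str or None,
    "inputs": [driver names]}.
    """
    fanout = {name: [] for name in cells}
    for name, cell in cells.items():
        for drv in cell["inputs"]:
            fanout[drv].append(name)

    problems = {}
    for name, cell in cells.items():
        if cell["kind"] != "flop":
            continue
        # First stage of a synchronizer: fed directly by a foreign-domain flop.
        is_head = any(
            cells[d]["kind"] == "flop" and cells[d]["clock"] != cell["clock"]
            for d in cell["inputs"]
        )
        if not is_head:
            continue
        loads = fanout[name]
        intact = (
            len(loads) == 1
            and cells[loads[0]]["kind"] == "flop"
            and cells[loads[0]]["clock"] == cell["clock"]
        )
        if not intact:
            problems[name] = sorted(loads)
    return problems


# Hypothetical netlist: test insertion placed a scan mux between the two flops.
cells = {
    "tx": {"kind": "flop", "clock": "clk_a", "inputs": []},
    "scan_in": {"kind": "flop", "clock": "clk_b", "inputs": []},
    "sync1": {"kind": "flop", "clock": "clk_b", "inputs": ["tx"]},
    "scan_mux": {"kind": "comb", "clock": None, "inputs": ["sync1", "scan_in"]},
    "sync2": {"kind": "flop", "clock": "clk_b", "inputs": ["scan_mux"]},
}
print(broken_sync_chains(cells))
```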
Because of these possibilities, CDC verification needs to occur at both RTL and netlist; any solution that does not perform netlist-level verification is not complete. An effective strategy is to ensure that the design is CDC clean at RTL, and then to run physical-level CDC checks on the netlist so that problems created by the various implementation tools are trapped and solved using a combination of structural and formal techniques. Tools such as Meridian Physical CDC take the full netlist into account, which in modern designs can often run to hundreds of millions of gates, ensuring that a design signed off for CDC at RTL remains consistent with its actual implementation.
Dr Roger B Hughes, director of strategic accounts, Real Intent, is a renowned international expert in formal verification technologies and has more than 20 years’ experience in the EDA industry working both at start-up companies in lead engineering roles and publicly traded companies in managing and directing technical product development. He obtained his electronics engineering degree at University of Wales, Swansea, and his Masters in digital systems and his PhD in electronic engineering at Brunel University, UK. He has published more than 70 papers.