CDC-related metastability is hard to catch by hand, and manual processes are error prone. Automated tools offer a more comprehensive approach.
The number of independent clock domains found in today’s highly integrated and concurrent designs is growing. According to a 2018 Wilson Research study, 89 percent of all IC/ASIC designs and 90 percent of FPGAs have two or more asynchronous clock domains, with many designs having five, ten, or more. This means the probability of metastability bugs is significant in almost all designs.
Metastability is a common problem for designs with asynchronous clock domains. Traditional simulation does not accurately analyze multi-clock designs. Verifying metastability effects and clock domain crossings (CDC) by hand is very difficult and extremely error prone. Instead, DO-254 projects for aviation electronics need an automated solution designed specifically for CDC verification to bridge the knowledge gap between design and verification teams and ensure comprehensive prevention of CDC issues, metastability in particular.
The trouble with metastability
Metastability occurs in digital circuits when the clock and data inputs of a flip-flop change values at approximately the same time. This can create problems on paths transmitting data between asynchronous clock domains. When the data changes within the flip-flop’s setup/hold window, the output may oscillate before settling to an unpredictable value (Figure 1). The flip-flop is said to have gone metastable, which can lead to incorrect design functionality, such as data loss or data corruption on CDC paths. This risk arises in every design containing multiple asynchronous clocks.
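The sampling behavior described above can be sketched in a few lines of Python. This is a behavioral illustration only, not a circuit model; the function name, timing units, and window widths are our own assumptions:

```python
import random

def sample_flop(data_prev, data_new, data_change_time, clock_edge_time,
                setup=0.2, hold=0.2):
    """Behavioral sketch of a D flip-flop sampling near a clock edge.

    If the data transition lands inside the setup/hold window around the
    clock edge, the captured value is modeled as random, mimicking the
    unpredictable settling of a metastable output. Times are in
    arbitrary units.
    """
    if clock_edge_time - setup < data_change_time < clock_edge_time + hold:
        # Setup/hold violation: the output settles to an unpredictable value.
        return random.choice([data_prev, data_new])
    if data_change_time <= clock_edge_time - setup:
        return data_new   # new value arrived early enough to be captured
    return data_prev      # change came after the edge; old value is captured

# A transition right at the clock edge may capture either value:
# sample_flop(0, 1, data_change_time=1.0, clock_edge_time=1.0) -> 0 or 1
```

Because the captured value depends on exactly when the data arrives relative to the edge, the failure is intermittent, which is precisely why it slips past typical simulation.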
Metastability is a serious problem in safety-critical designs in that it frequently causes chips to exhibit intermittent failures. These failures generally go undetected during simulation and static timing analysis. A typical verification methodology simply does not consider potential bugs from clock-domain crossing paths. Thus, if CDC paths are not explicitly verified, CDC bugs are typically identified in the actual hardware device in the field. For DO-254 projects, catching faulty operation ‘in the field’ means critical bugs may not be caught until an in-flight failure.
Designers are generally aware of the metastability problem and try to implement logic to isolate the outputs of the metastable registers such that this metastable value does not propagate into the rest of the design. For example, experienced designers add synchronizers between clock domains, create protocols for transferring data between domains, and try to avoid situations where data from multiple clock domains reconverge (Figure 2).
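The most common of these safeguards, the two-flop synchronizer, can be sketched behaviorally in Python. The class and signal names are ours, and metastability is crudely modeled as a random capture in the first stage:

```python
import random

class TwoFlopSynchronizer:
    """Behavioral sketch of a two-stage (double-flop) synchronizer.

    The first flop may go metastable when its input changes near the
    destination clock edge; we model that as capturing a random value.
    The second flop gives the first a full cycle to settle, so its
    output is always a clean, stable value, one or two cycles late.
    """
    def __init__(self):
        self.meta_ff = 0   # first stage: may capture a random value
        self.sync_ff = 0   # second stage: presents a stable value

    def clock(self, async_in, violates_timing=False):
        # Second stage samples the (now settled) first stage.
        self.sync_ff = self.meta_ff
        # First stage samples the asynchronous input; on a timing
        # violation its captured value is unpredictable.
        self.meta_ff = random.choice([0, 1]) if violates_timing else async_in
        return self.sync_ff
```

The output lags the input by a cycle or two, but the rest of the destination domain never sees a half-settled value; the synchronizer trades latency for stability.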
However, it is very easy to leave out a needed synchronizer, or to place one incorrectly so that it does not work as expected. Even careful manual code reviews easily miss these problems. Reconvergence issues are among the most dangerous and insidious CDC problems and are almost impossible to find through manual code review. The effects of CDC issues can be highly data dependent and may appear only in corner cases, when a specific data value crosses the CDC boundary while the design is in a specific vulnerable state.
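The data-dependent nature of these bugs can be seen in a small sketch of a multi-bit value whose bits cross the domain boundary with independent skew, a close relative of the reconvergence problem. The function name and the half-cycle arrival probability are arbitrary assumptions for illustration:

```python
import random

def skewed_sample(old_value, new_value, width=2):
    """Sketch: each bit of a bus crosses the clock boundary independently
    and may land either this destination cycle or the next, so a single
    sample can mix old and new bits."""
    sampled = 0
    for i in range(width):
        old_bit = (old_value >> i) & 1
        new_bit = (new_value >> i) & 1
        bit = new_bit if random.random() < 0.5 else old_bit  # per-bit skew
        sampled |= bit << i
    return sampled

# Crossing 0b01 -> 0b10: the receiver may observe 0b00 or 0b11,
# values the source never drove.
```

Whether the sampled value is dangerous depends entirely on which bits happen to be changing, which is why the bug surfaces only for specific data patterns in specific states.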
To make matters worse, verification engineers—who generally are not as well versed in design as the designers themselves—often do not recognize these types of CDC issues. This is one situation where the independence of design and verification roles, as required by DO-254, can be potentially harmful.
Finally, after completing RTL verification, changes that are introduced in the design during the implementation process—such as logic optimization, physical optimization, design-for-test (DFT) logic, and low-power logic—may cause incorrect behavior on CDC paths as well as introduce new CDC paths. For example, incorrect combinational logic generated by synthesis tools may result in glitches on CDC paths (Figure 3).
Correcting metastability effects before failure
Although DO-254 does not explicitly mandate the verification of clock-domain crossings, verifying them should be a requirement in safety-critical projects. Many prime contractors recognize this issue and in some cases are adopting CDC verification as part of their own methodologies or placing requirements on their sub-contractors to test for CDC issues.
Because verifying metastability effects and CDC by hand is so difficult and error prone, companies should use an automated solution designed specifically for CDC verification. However, not all automated solutions are equivalent. Some tools only do a partial job of finding CDC bugs.
A comprehensive CDC verification solution, such as Questa CDC from Siemens EDA, a part of Siemens Digital Industries Software, must do four distinct things:
- Perform a structural analysis. This is most effectively done on the RTL code to identify and analyze all signals crossing clock domains and determine whether their synchronization schemes are present and correct.
- Verify transfer protocols. This assures that the synchronization schemes are used correctly, by monitoring and verifying that protocols are being followed during simulation or formal verification.
- Globally check for reconvergence. This is most effectively done by injecting the silicon-accurate effects of potential metastability into the RTL simulation environment and verifying that the design will function correctly.
- Netlist glitch analysis. This structural analysis on the design netlist identifies glitch-prone logic introduced by synthesis and other implementation steps.
The combination of these four aspects of CDC verification is very powerful. Hands down, it is far superior to any sort of manual method. While an extensive manual code review might find structural issues (e.g., Are all the synchronizers in place?), it would be tedious, time consuming and error prone. In addition, manual reviews typically cannot ensure that transfer protocols are always used correctly, and almost never address reconvergence issues. Finally, RTL verification is not sufficient because implementation may introduce CDC issues after RTL verification is complete.
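The protocol-checking aspect can be pictured as a simple monitor running alongside simulation. This sketch assumes a hypothetical req/ack handshake in which data must hold stable from request until acknowledge; the protocol, class, and method names are illustrative, not Questa CDC's actual checks:

```python
class HandshakeMonitor:
    """Records a violation whenever data changes between req and ack
    in a hypothetical req/ack CDC transfer protocol."""
    def __init__(self):
        self.pending = False    # a transfer is in flight
        self.held_data = None   # data value latched when req asserted
        self.violations = []    # (cycle, message) tuples

    def cycle(self, t, req, ack, data):
        if self.pending:
            if data != self.held_data:
                self.violations.append((t, "data changed before ack"))
            if ack:
                self.pending = False   # transfer completed
        elif req:
            self.pending = True        # start tracking this transfer
            self.held_data = data
```

A real CDC tool generates such checks automatically from the recognized synchronization scheme and can also prove them formally, rather than relying on a simulation trace happening to exercise the violation.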
Assessing design and verification tools
One key aspect of the DO-254 process is to determine that the tools used to create and verify designs are working properly. The process to ensure this is called ‘tool assessment’ (though it is often mistakenly called ‘tool qualification’; tool qualification is actually one method of tool assessment).
The purpose of tool assessment (and potentially tool qualification) is to ensure that tools that automate, minimize, or replace manual processes for hardware design and/or verification perform to an acceptable level of confidence on the target project. Tools are classified as either design tools or verification tools, depending on which design flow processes they automate. Designs are designated with a DO-254 criticality level — referred to as the design assurance level (DAL) — that corresponds to the resulting severity of the consequences of a failure. The rigor of the tool assessment process depends on both the tool classification as well as the criticality level of the designated project.
As shown in Figure 4 from the DO-254 specification, the tool assessment and qualification process takes one of three forms. Independent Output Assessment (Item 3) dictates that another independent tool or method must validate the results of the tool. Relevant History (Item 5) indicates that the tool has been used before and has been shown to provide acceptable results. Tool Qualification (Item 7) requires establishing and executing a plan to confirm that the tool produces correct outputs for its intended application.
Regardless of these classifications, the task of tool assessment falls upon the airborne applicant or airborne integrator (not the tool vendor). The applicant or integrator proposes the method of tool assessment as part of the DO-254 planning and documentation procedures. The certification agency or its representative (in North America, this would be a Designated Engineering Representative, or DER) will determine if the proposed method of compliance to this requirement is adequate for the development process.
When using a specific tool, such as Questa CDC on a DO-254 project, tool qualification is required only if that tool is identified in the “Plan for Hardware Aspects of Certification” or other DO-254 documents. In general, tools are only identified in DO-254 documents if there is a project requirement that must be fulfilled by using a specific tool. If a design does not have a specific requirement that says clock domain crossings must be verified, teams can run the CDC verification tool without it becoming part of the DO-254 review process. Conversely, when there is a specific requirement that says clock domain crossings must be verified to identify and eliminate metastability issues, then teams must choose a method of tool assessment.
‘Independent output assessment’ is often the most practical method of tool assessment, and there are several ways to perform it. Teams may validate the correctness of the CDC analysis by checking the results against the RTL design. To ‘qualify’ the CDC analysis tool, the design team must provide a set of representative test cases that validate each required CDC check against their coding styles; this demonstrates that the tool is working properly for the specific analysis cases. Alternatively, teams can validate the CDC verification results independently through thorough testing in the lab, which exercises the conditions that were analyzed by the CDC verification tool.
DO-254 methodologies must ensure that a device is going to behave as specified, and that everything possible is done to catch bugs before the device begins operating in flight. Thus, verifying CDCs should be a requirement in design projects. Since verifying metastability effects and CDC by hand is very difficult and extremely error prone, DO-254 projects should use an automated solution, such as Questa CDC, designed specifically for CDC verification.
While a tool vendor cannot assess or qualify their own tools and the FAA does not provide blanket approval for the use of any tools in DO-254 projects, we share explanations and suggestions for getting through the assessment process in a more detailed paper, Automating Clock-Domain Crossing Verification for DO-254 (and Other Safety-Critical) Designs.
About the authors
Kurt Takara has over 25 years of experience in engineering design and verification, technical marketing, and engineering services. He is a Product Engineer at Mentor Graphics Corporation and specializes in assertion-based verification methods and applications, including formal and clock-domain crossing (CDC) verification. Takara has held engineering, marketing, consulting services and project management roles in electronics and EDA companies that include Ikos Systems, Raytheon, and Magnavox. He holds a BSEE from Purdue University and an MBA from Santa Clara University.
Kevin Campbell is a Technical Product Manager at Siemens EDA and is responsible for Questa Design Solutions products. He holds a Bachelor of Science Degree in Electrical and Computer Engineering from New Mexico State University. Prior to Siemens EDA, Kevin held a variety of positions in RTL design and verification, as well as leadership roles on teams at Seagate, Micron and Intel.