Fixing late ECOs in ARM core subsystems at STMicroelectronics
Using equivalence checking to validate ECOs in ARM core subsystem development at STMicroelectronics
ST Microelectronics has been working with ARM cores for more than 15 years and is currently implementing ARM-based CPU subsystems using the Cortex-A15 and A-57 architectures, running at clock rates of multiple gigahertz on FD-SOI processes.
As you can imagine, this represents a considerable engineering challenge given the complexity of the subsystems (see Figure 1, below).
Figure 1 Multicore ARM subsystem implementation (Source: STMicroelectronics)
The CPU subsystem has number of complexities that make it more difficult to change at a late stage in the design than is usually the case. The core has its own PLL to ensure the quality of the clock signal. Parts of the subsystem sit on an area of the chip that can be body biased to increase clock frequency or reduce power consumption.
The interface to the CPU is asynchronous, so the CPU subsystem can run at a different frequency from the rest of the SoC. There are many switchable power domains to manage power consumption, so the CPU subsystem needs isolation cells and always-on cells. And there are multiple supply voltages to the core, so it also needs level shifters and asynchronous bridges.
The use of these advanced techniques has created challenges that can’t be tackled using a standard tool flow alone to achieve the necessary area and power: we have to intervene manually to make the necessary optimizations. This complexity also necessitates many design iterations, sometimes developing up to 10 versions of some blocks and the subsystem before we can choose one to carry forward. We’re even using a special cell library for these CPU implementations.
The change challenge
Achieving the targeted combination of power, performance and area would be difficult enough on such a challenging subsystem if all the blocks were stable. But we have to deal with multiple updates to the RTL, including some from our various IP providers, quite late in the design cycle, where they can have a large impact.
In many cases, we don’t have time to re-run the complete flow from the RTL forward, especially if the physical design is already at an advanced stage. Since we don’t have the time to start over, we must concurrently implement the original design, take into account any changes to the source RTL, and verify that the modified design will meet its function and power, performance and area (PPA) targets.
This means that RTL changes and errors must be dealt with using engineering change orders (ECOs) that adjust the functionality of the final netlist without running the synthesis process again from the source RTL. In the flow we have been using to date, this has been done manually, which requires a lot of time and effort and in complex situations may take a few weeks to conclude.
As already discussed, the CPU cores are highly optimized for PPA using a flow that makes it difficult or even impossible to find ECO solutions manually. For example: we can improve timing and area results by collapsing the hierarchy of the design during synthesis, but this makes it more difficult to find the RTL-equivalent component in the netlist, and to trace through those hierarchies in the implementation.
An ECO flow
To solve this problem we need a quick, automated and efficient methodology for ECO implementation, which also works well with our flow.
Our current flow is as shown here, in Figure 2.
Figure 2 The standard ECO flow at STMicroelectronics (Source: STMicroelectronics)
In this flow:
- We start with the design in RTL
- Synthesize using the Synopsys Design Compiler tool to produce both the netlist and .svf file
- Use Synopsys’ Formality to ensure the equivalence between the original design RTL and the netlist
- If Formality says the two are equivalent, we move ahead; otherwise we go back to synthesis and fix the issue
An example in use
Here are two versions of some design code from an ARM implementation:
Original RTL
assign l2_dirty_ecc_err_l4_dly4_en = l2_tbnk_vld_l4_dly3_q | l2_dirty_single_ecc_err_l4_dly4_q;
assign l2_dirty_single_ecc_err_l4_dly4_nxt = l2_tbnk_vld_l4_dly3_q & l2_dirty_single_ecc_err_l4_dly3
assign l2_dirty_double_ecc_err_l4_dly4_nxt = l2_tbnk_vld_l4_dly3_q & l2_dirty_double_ecc_err_l4_dly3;
Modified RTL
assign l2_dirty_ecc_err_l4_dly4_en = (~l2_tslice & l2_tbnk_vld_l4_dly3_q) | ( l2_tslice & l2_tbnk_vld_l4_dly5_q) | l2_dirty_single_ecc_err_l4_dly4_q;
assign l2_dirty_single_ecc_err_l4_dly4_nxt = l2_dirty_single_ecc_err_l4_dly3;
assign l2_dirty_double_ecc_err_l4_dly4_nxt = l2_dirty_double_ecc_err_l4_dly3;
The difference between the original and the modified RTL is very small, but the changes are part of a group of combinatorial logic.
Using Synopsys Formality Ultra we were able to find all the differences between the ECO RTL and the original netlist of the design.
How do we then debug our design so that the ECO is made but the functionality doesn’t change?
Figure 3 Formality Ultra can show graphically where two versions of the same design differ (Source: STMicroelectronics)
Synopsys’ Formality Ultra tool can analyze two versions of a design and graphically highlight the differences. In this case, the expected value and the implementation value are different, so these points on the design differ.
Formality Ultra has another useful capability: if you pick a net that is failing its equivalence check, it will try to find the equivalent failing net in the implementation.
When it finds the equivalent implementation net of the reference design, it gives users all the scenarios that could be an equivalent of this net in the implementation.
If it is not possible to find the equivalent net in the reference design, the tool can look at all the fan-ins of that net and find equivalent nets of all the fan-ins of that net.
Once the equivalent nets are found, users can pick those nets in the implementation and add the missing logic to the implementation, achieving the required ECO.
Figure 4 Implementing an ECO in Formality Ultra (Source: STMicroelectronics)
In the case shown in figure 4, Formality Ultra found that both the inputs of this cell have an equivalent net present in the implemented design. We were able to add a cell to the implementation to create the equivalent of all the nets and then connect the output of the cell to the failing point and so implement the ECO.
To our surprise, Formality Ultra was even able to find an equivalent net needed for an ECO within an AND/OR cell from the cell library.
Some other examples
Here are some other ECO examples we have worked through with Formality Ultra.
In one case, the necessary ECO was on a signal that was part of the data logic in the RTL, but when we did the implementation it becomes part of the clock-gating enable signal. Design Compiler had used complex logic to implement the enable condition of the clock-gating cells.
This created another complex scenario in which the traceability between the RTL and the implementation was difficult, because the tool had divided the logic up.
We realized that finding the equivalent net or ECO chain in the implementation was not possible manually, because of the complexity of the logic implementation. After a day of trying to find it manually, we turned to Formality Ultra, which implemented and verified the ECO in four hours. Otherwise, the ECO would have been impossible to implement without time-consuming re-synthesis.
In a second case, the ECO was on combinatorial logic and sequential flops. The issue here was that whenever there is a change on sequential flops, a lot of subsequent nodes change, making it very challenging to do the ECO.
When we looked at the complexity of the ECO in the RTL and netlist, we decided not to spend time trying to find it manually. Given our scheme, we couldn’t re-synthesize, so we used Formality Ultra to add 28 cells to implement the ECO in just one week. Without Formality Ultra we would have had to re-synthesise.
An updated flow
We have now updated our ECO flow, as shown in Figure 5, below.
Figure 5 The updated EC flow in use at STMicroelectronics (Source: STMicroelectronics)
Figure 5: the updated EC flow in use at STMicroelectronics (Source: STMicroelectronics)
In this updated flow we:
- Use original svf and the netlist created with the original RTL
- Add in the ECO RTL
- Use Formality Ultra to generate the ECO edits and to check the equivalence
- These edits are given to the Design Compiler, along with the original netlist
- Design Compiler then produces an ECO netlist
Conclusion
We have had good success with Formality Ultra, and particularly value its ability to find functionally equivalent nets. We have also found it useful for tracing complex ECOs, especially those that don’t require re-synthesis. It has accelerated our design process, and in one case we were able to implement an ECO that would have otherwise required re-synthesis.
Author
Kailash Digari is a group manager of the CPU-GPU team with STMicroelectronics in Greater Noida, India. Digari has more than 15 years of experience in the VLSI industry, and has worked in various roles including specifications, design, front-end and DFT flows, and product delivery. Digari also has experience in FPGA architecture definition, ultra-wideband SoC development.