Debugging the debug challenge
Around 70% of the effort involved in taping out a complex SoC is spent on verification. Of that effort, about half, or 35% of the total effort involved in a chip design, is spent on debug. Why is this? Part of the debug effort involves finding errors in the chip. Part of it involves finding errors in the interaction between the chip and its software. And part of it involves the debug of the testbenches and other verification strategies that are used to give the design team the confidence to say, ‘We have done enough’, and tape out the chip.
Debug is a pervasive part of successful chip design, beginning with the application of static checkers (ranging from simpler linting tools to more advanced formal technologies) that help ensure that the RTL is as clean as possible throughout the design process. This also helps design teams find and fix bugs early on, when rectifying them is least costly.
Sensible design teams develop their verification strategies alongside the design code, using SystemVerilog to express test strategies that exercise each part of the design using a constrained set of random stimuli. These smart testbenches are, however, becoming very complex. Advanced users have reported testbenches with up to one million lines of code, and in the case of a more advanced processor, as much as one third of its several million lines of code (for combined RTL and testbench) was dedicated to testing the design. Even in less critical designs, it is not uncommon to see a ratio of five or ten to one between the design (RTL) and the testbench code.
These testbenches are so complex that, like any other code, they are prone to errors and need debugging themselves. In fact, developing these testbenches is very much like any other large software project: it introduces new sources of uncertainty and requires its own “verification” process.
The testbench debug process that resolves this issue focuses on two things: are your tests meaningful, and have you tuned them for greatest effectiveness? The idea is to ensure that tests exercise the most important parts of the design, and that those parts are exercised enough. With simulation cycles costing time and money, verification engineers also need to ensure that they don’t waste simulation resources by over-exercising parts of the design in a way that doesn’t bring them any more insight into its correctness.
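As a rough illustration of the tuning question above, the sketch below counts how often each coverage bin was hit and reports bins that were never exercised alongside bins that were exercised far more than needed. The bin names, the hit data, and the `saturation` threshold are all hypothetical; real coverage databases and the tools that read them are considerably richer.

```python
from collections import Counter


def summarize_coverage(hits, all_bins, saturation=100):
    """Split coverage bins into never-hit and over-exercised groups.

    hits       -- iterable of bin names, one entry per time a bin was hit
    all_bins   -- every bin the test plan expects to see
    saturation -- hypothetical hit count beyond which extra hits add little
    """
    counts = Counter(hits)
    missed = sorted(b for b in all_bins if counts[b] == 0)
    saturated = sorted(b for b in all_bins if counts[b] > saturation)
    return missed, saturated


# Hypothetical regression results: reads dominate, errors are never tested.
all_bins = ["read", "write", "burst", "error"]
hits = ["read"] * 500 + ["write"] * 40 + ["burst"] * 3

missed, saturated = summarize_coverage(hits, all_bins, saturation=100)
print("never hit:", missed)
print("over-exercised:", saturated)
```

In this made-up data set, the report would suggest shifting simulation cycles away from reads and towards the untested error path.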
The debug frontier is constantly expanding, as issues such as power management introduce new challenges. Take a chip that has been designed so that individual blocks can be turned off when not in use, and back on as needed. Turning blocks off and on creates a lot of uncertain (X) states in the simulation, which have to be sorted into real bugs and simulation artifacts and correlated with the designer’s power-management intentions.
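One simplified way to picture that sorting step is to compare each X observation against the windows in which its power domain was intentionally switched off. The sketch below assumes a hypothetical event format of `(time, domain, signal)` and a hypothetical table of off windows per domain; it is not how any particular simulator or power-aware flow represents this data, and a real analysis would also have to consider isolation and retention behaviour.

```python
def classify_x_events(x_events, off_windows):
    """Separate X observations into expected artifacts and suspects.

    x_events    -- list of (time, domain, signal) tuples where X was seen
    off_windows -- dict mapping domain name to a list of (start, end)
                   intervals during which the domain was powered down
    """
    expected, suspect = [], []
    for time, domain, signal in x_events:
        windows = off_windows.get(domain, [])
        if any(start <= time < end for start, end in windows):
            # X while the domain is intentionally off: likely an artifact.
            expected.append((time, domain, signal))
        else:
            # X while the domain should be on: worth a closer look.
            suspect.append((time, domain, signal))
    return expected, suspect


# Hypothetical power intent and simulation observations.
off_windows = {"gpu": [(100, 200)], "dsp": [(300, 350)]}
x_events = [
    (120, "gpu", "gpu.fifo_out"),
    (250, "gpu", "gpu.fifo_out"),
    (310, "dsp", "dsp.acc"),
]

expected, suspect = classify_x_events(x_events, off_windows)
print("expected:", expected)
print("suspect:", suspect)
```

Only the second event, an X on a GPU signal while the GPU domain is supposed to be powered, would be passed on for detailed debug in this example.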
Beyond hardware debug, there’s also the issue of the software running on that hardware. When a bug appears, we need to be able to find which line of code caused it, and that demands effective trace strategies in a hardware/software debug environment.
The biggest issue with these various debug challenges is that they’re not completely orthogonal: they pile up on each other and demand that you debug your logic, your testbenches, the impact of power-management strategies, and the software running on the hardware – all at once.
As is usual in EDA, the way to manage this problem of increasing complexity is to move to a higher level of abstraction. But how do you abstract debug? One way is to analyze designs at the transaction level first, trying to spot bugs at this more abstract level and so narrowing the scope of your analysis of the signal representation of the design.
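To make the idea of transaction-level analysis concrete, the sketch below folds a cycle-by-cycle signal trace into transactions and then flags the slow ones, which narrows the window that needs signal-level inspection. It assumes a hypothetical valid/ready handshake sampled once per cycle as `(cycle, valid, ready, data)`; the `Transaction` class, the sample data, and the latency threshold are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    start: int  # cycle at which valid was first asserted
    end: int    # cycle at which valid and ready were both high
    data: int

    @property
    def latency(self):
        return self.end - self.start


def extract_transactions(samples):
    """Group per-cycle (cycle, valid, ready, data) samples into transactions."""
    txns = []
    start = None
    for cycle, valid, ready, data in samples:
        if valid and start is None:
            start = cycle  # a new request begins
        if valid and ready:
            txns.append(Transaction(start, cycle, data))
            start = None   # handshake completed
        elif not valid:
            start = None   # request withdrawn or bus idle
    return txns


# Hypothetical trace: one immediate transfer, one stalled for three cycles.
samples = [
    (0, 0, 0, 0x0),
    (1, 1, 1, 0xA),
    (2, 1, 0, 0xB),
    (3, 1, 0, 0xB),
    (4, 1, 0, 0xB),
    (5, 1, 1, 0xB),
    (6, 0, 0, 0x0),
]

txns = extract_transactions(samples)
slow = [t for t in txns if t.latency > 2]
print("transactions:", txns)
print("slow:", slow)
```

Here the seven-cycle trace reduces to two transactions, and only the stalled one (cycles 2 to 5) would need to be examined at the signal level.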
Another way to ease the debug issue is to use verification IP (VIP). In simple terms, VIP is a set of pre-written, production-proven testbenches focused on the verification of standard protocols. Advanced verification IP technologies enable a new concept in verification: protocol-aware debug, where you debug at the protocol layer first (for example, in terms of packets and transfers), and only then move down to the signal level to trace and find the root cause of a bug.
The wider issue for the beleaguered SoC designer is that the industry has underinvested in debug. Although the process is not as manual as it once was, it’s still a difficult problem. To counter that underinvestment, Synopsys acquired SpringSoft so that we could offer its de facto standard, Verdi, as an open platform upon which we, users, and third parties can innovate effectively by offering plug-ins and extensions to tackle emerging debug issues.
This will enable the industry to start addressing some of the key pain points in debug, such as moving to the transaction level, dealing with X propagation in simulation, and handling low-power circuits. There are more complex issues to be tackled as well, such as hardware/software co-debug, and the integration of analog and mixed-signal into the digital realm. And there are what one might call ‘ancillary debug’ issues, such as coverage debug, and coverage qualification, an increasingly important issue in markets such as the automotive industry.
Along with each of these point solutions, the industry also needs to do a better job of turning the data that its tools deliver into actionable information for designers, and to develop a way of helping designers orchestrate these multiple approaches into the most efficient and effective overall debug strategy.
I’m convinced that as chip designs become more complex, the debug part of the verification process can only grow. We as an industry need to continue to invest in, and focus on, debug innovation.