Applications won’t find all the bugs, but they have their uses
Applications or test cases extracted automatically from them should be very good for exercising hardware. The hardware has to run them anyway. They are large and complex, so should test many different aspects of the SoC. The problem, panelists in a session at the 51st Design Automation Conference in San Francisco last week (3 June) argued, is that applications do not verify functional behavior nearly as comprehensively as you might think.
There are serious problems with using regular applications for verification, said Klaus-Dieter Schubert, distinguished engineer at IBM: “Applications are optimized for hardware, and the hardware is generally optimized for the software it runs. So, applications tend to avoid corner cases.”
Schubert added that applications are rarely, if ever, built with verification in mind. If the application is a spreadsheet, how will the test harness check that the spreadsheet is returning the right answers?
“If you just pick random applications, you don’t know whether they run correctly because they don’t test themselves,” said Jack Greenbaum, director of advanced product engineering at Green Hills Software.
Failing to find bugs
“As the software guy, I want you to use both traditional and application-oriented verification because I’m the guy whose schedule slips when you hit bugs. If the verification worked, we would not have a business modifying our compiler to avoid certain instructions or fixing linkers to optimize binaries,” Greenbaum claimed.
Synopsys fellow Janick Bergeron said: “I would not want to be the guy who uses IP that’s only been verified using extracted test cases. But there is a place for the use of extracted tests.”
Bergeron argued that the nomenclature for discussing verification at different levels of the total system is currently lacking, particularly when it comes to distinguishing between the overall system, its subsystems and the units that make up those subsystems. “The system is not the sum of its parts,” he argued, which makes the use of scoreboarding, which may work reasonably well for subsystems, problematic.
Greenbaum said: “Based on the quality of IP and SoCs I see, it feels like the verification done today doesn’t cover the entire test space. USB sounds simple, so how come it locks up in implementations?
“Where we uncover the most problems is in our test suites. We know what they look like at a bit-accurate level.”
Unit- or system-level failures?
Alan Hunter, verification architect at ARM, said problems are being left too late, which may make applications-based verification look more useful than it really is. “Unit-level testbenches are king. We should be finding and debugging all our bugs at this level because it’s by far the most efficient way of finding bugs. If we find bugs with system-level testbenches, it often points to flaws in our unit-level testbenches that need to be fixed. We really need to analyze why the testbench was defective in the first place.
“You still have to do system-level testing to cover things that don’t fit easily in a unit-level testbench, such as debug, where it is difficult to isolate all the affected logic to individual units.”
Ziyad Hanna, chief architect and vice president of research at Jasper Design Automation, argued the system level generally contains problems that would not be identified as bugs when verified lower down the stack or even using full applications, characterizing them as “non-mainline functionality”. They involve issues such as deadlocks or security problems. “Many of these things are not visible to the application itself. They require specific techniques and tools. Security needs a lot of dedication to ensure it. And it’s not covered by the applications. Applications can be useful but you can’t rely on them.
“We need to think about the verification of the specification. There is generally lots of ambiguity in the specification,” Hanna added.
Errors of intent
Greenbaum said a lot of the problems Green Hills uncovers do not even fit the category of deadlock but are more like “bus bubbles”, in which the on-chip interconnect inexplicably stalls and interrupts execution due to the interaction between IPs as multiple applications run. “It often turns out the bus has got into an unusual state. The bugs that cost me weeks are system-level bugs like this.
“Running out of bus bandwidth: is that a fault? Where did that fault occur? There was probably no error. But the system doesn’t work because there wasn’t a formalism to express the requirement for performance. I see more mistakes at the bus-bandwidth level than anywhere else.”
One obstacle to checking system-level performance, as well as functional behavior, lies in sheer size. Harry Foster, chief verification scientist at Mentor Graphics, said: “The bigger designs we have worked on don’t fit in FPGAs. The ones we are working on may not fit into emulators.”
Greenbaum commented: “One customer combined three chips into one. The part that was broken was the chip reset, because they couldn’t load the whole thing onto a Palladium. The process of just getting the chip to come up broke because they couldn’t test it that way.”
Hanna said more work at the system level during definition should avoid many of these issues. “I think if we could get people to do higher-level designs it would really simplify the world.”
Greenbaum added: “System-level techniques could shake this out. Not enough customers use them to deal with the problem.”
The problems, said Greenbaum, can be extremely subtle. In one instance, the bus-bandwidth issue didn’t crop up earlier because the early test code and data did not fit neatly into their respective caches. Later on, optimized versions of the code featured better cache utilization, but the application was forced to wait when it finally ran out of cache and had to re-arbitrate for the bus while other activity was ongoing.
“Not enough people are using system-level models that are accurate enough,” Greenbaum claimed.
Schubert said: “When using abstracted modelling, you can fool yourself that everything is perfect. Then you screw it up because you don’t implement what you actually modelled. If you don’t follow up with lower-level RTL performance verification, you will probably make those mistakes.”
The value of application tests
Although applications-based verification will not test corner cases as well as other types of testing, Greenbaum said it is worth trying to run software to see how the SoC behaves. “IP vendors say ‘you don’t need a manual for the IP, I’ll give you the Linux driver’. But the driver is usually written by a hardware guy. I sit there wondering ‘why are you writing that register three times?’ or ‘why is there a timing loop in this driver?’ or ‘why are you toggling that undocumented bit?’
“That’s why I want you to do both types of verification. What can software bring to verification? I really like it when our chip partners let us validate their hardware. But by the time we see the chip, it’s often very late in the process. We recently found a core bug on probably the sixth revision of an IP core. We probably would have found that bug with our standard test suite first time.”
Greenbaum recommended emulating the way that applications behave to exercise SoC interactions. “It’s so easy to mock up applications that bang on the design. You might have code that implements isolated tests but when you run a number of them together as an ensemble that will stress the system. I don’t want to say it’s free but it is easy. This technique is commonly used in real-time systems.”