Three senior verification specialists talk about how they are navigating the challenge of verifying multibillion-transistor SoCs with limited compute resources, increasing coverage demands and shrinking timescales.
Verifying the most complex SoCs is going to take a combination of raw verification horsepower and the careful application of targeted techniques, according to speakers at a luncheon held by Synopsys at DAC, a video of which is now available here.
The single-thread performance challenge
Guy Hutchison, associate vice president hardware engineering, Cavium, talked about the challenges of verifying the company’s Xpliant high-speed switch chips. These have been developed for use in massively scalable data centers, which need lots of server-to-server bandwidth to cope with the traffic created by the use of MapReduce algorithms. The companies that build these massive data centers don’t wait for IEEE standards to reflect their needs, which accelerates the pace of change in the sector.
For these applications, switch chip designs are often reticle-limited, because integration usually pays off in networking chips and the customers care more about the total cost of ownership of the data center than they do about the price of the chips. This means network chip developers are often among the first companies to hit the capacity and scalability limits of EDA tools.
Hutchison said the first verification challenge for Cavium, therefore, is the single-threaded performance of the verification tools. The company verifies at module, subchip and full-chip levels, and at the full-chip level had hit a brick wall.
“We were looking at simulations that could get us on the order of a few packets a day on a device that could hold 96k packets. It wasn’t getting us where we needed to go,” he said. Emulation became a key part of the verification plan simply because it offered exploitable parallelism. The team also took on formal verification techniques, including using them to verify a full block without any functional simulation.
“Doing formal verification takes a different mindset to functional verification.”
Tool performance was an issue even for some of the design’s sub-blocks, which were so complex that they took 24 hours to simulate. Full-chip simulation was limited to integration and connectivity testing.
Performance testing had to be done in emulation. Hutchison commented that emulation compile time needed to be less than ten hours, so that verification engineers could do one job a day and then start an overnight compile run to get a new image to work on the next morning. The emulation environment was virtualised, and based on C and C++, and the goals for it were limited to enabling performance testing and software initialisation.
“The key takeaways are that functional simulation is reaching its useful limits due to limits on single-threaded performance,” said Hutchison, “although we are looking at new parallel simulation technologies that are coming out such as Synopsys’ Cheetah.
“The other thing we are looking at is making much more aggressive use of formal technologies to prove out the low-level blocks to ring out all their permutations,” he added.
Cavium is also considering using parallel simulation for sub-blocks, and even emulation, although “the emulator is always a scarce resource. There’s never enough time on it. Emulation is here to stay for chip-level performance. It’s been extremely valuable in the process of developing our chip.”
Andy McBride, senior director, verification, Samsung Austin R&D Center, talked about his group’s work in verifying high-performance, very low power ARM CPU designs and system IP for cellphone and other chips.
The designs tend to have multiple CPUs, a shared cache, coherent interconnect fabric, and memory controller. The CPUs tend to be highly parallel, highly speculative machines with advanced branch prediction. This leads to a design with tens of millions of coverage points. The testbenches are in UVM, and are used for ‘smoke’ tests, representative tests, plus daily and weekly regressions.
The challenges facing the team include a shift from working on a single design at once to multiple designs in parallel, making the most of limited computing resources, and halving ECO turnaround times from two weeks to one.
Tuning simulation for smaller, faster footprints
“We’re running hundreds of billions of cycles a week on a core by itself, and that doesn’t include other things down from that such as integer units and floating-point units, all of which have their own benches and their own cycle needs,” said McBride.
He said his team had worked with vendors including Synopsys to get more cycles per second out of their verification environment, and was able to increase performance by up to 2.5 times, helping to halve ECO turnaround time (TAT) as requested. A similar effort focused on reducing the memory footprint of simulations to less than 4Gbyte, which enabled the tools to run in 32-bit mode. This change alone gained 10 to 15% in compute resources. Achieving this involved changing code in the tools, and in the team’s RTL and verification code.
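The 4Gbyte threshold follows from the size of a 32-bit address space. The short Python sketch below only works through that arithmetic; the function name and the sample footprints are hypothetical, and the article does not say how the team measured footprints or why 32-bit mode freed up resources. In practice an operating system usually leaves a 32-bit process somewhat less than the full 4Gbyte.

```python
# Illustrative arithmetic only: the names and sample values are assumptions.
ADDRESS_BITS = 32
ADDRESS_SPACE_BYTES = 2 ** ADDRESS_BITS  # upper bound on what a 32-bit pointer can address
GIB = 2 ** 30


def fits_in_32bit(peak_footprint_bytes: int) -> bool:
    """Return True if a job's peak memory could fit in a 32-bit address space."""
    return peak_footprint_bytes < ADDRESS_SPACE_BYTES


if __name__ == "__main__":
    print(ADDRESS_SPACE_BYTES // GIB, "GiB addressable")
    for footprint_gib in (3, 5):  # hypothetical job footprints
        print(footprint_gib, "GiB fits:", fits_in_32bit(footprint_gib * GIB))
```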
McBride also commented on some teams’ ambitions to get rid of gate-level simulation: “In an ideal world all of us would like to get rid of gate simulation. None of us think we can, so the question is how we can cover most of that space as early as we can.”
One way to do this is through the use of X propagation tools. The team has run about 10% of its test cases with X propagation enabled beyond the base 4-state simulation, to find all the X propagation issues before they turn up in the gate netlist. To test the strategy, the team applied it to an existing design that it understood well, and the X propagation tools were able to find both bugs that the team knew about and others that it did not.
“The net of all of this is, that working with your applications engineers you can get a lot of this stuff done,” McBride added. “It took a lot of work for Synopsys to clean up some of their stuff to work with our design style, it required us to clean up some of our verification code. But I can now run more than two times faster than before, I have a higher quality design and we still have to make sure we are compliant to any other simulator in the world because Korea uses them all.”
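To illustrate why plain 4-state RTL simulation can hide X issues, the following Python sketch contrasts two ways of evaluating an if/else whose condition is unknown. It is a simplified model of the general idea, not the algorithm used by any Synopsys tool; the function names and the merge rule are assumptions chosen for illustration.

```python
# Simplified model of X handling in an if/else; not any vendor's implementation.
X = "x"  # unknown value (the high-impedance 'z' state is left out for brevity)


def if_optimistic(cond, then_val, else_val):
    """Standard RTL 'if' semantics: an unknown condition behaves like false,
    so the else branch is taken and the X silently disappears."""
    return then_val if cond == 1 else else_val


def if_xprop(cond, then_val, else_val):
    """Pessimistic merge (assumed rule): with an unknown condition the result
    is known only when both branches would produce the same value."""
    if cond == X:
        return then_val if then_val == else_val else X
    return then_val if cond == 1 else else_val


if __name__ == "__main__":
    print(if_optimistic(X, 1, 0))  # the unknown is masked by the else branch
    print(if_xprop(X, 1, 0))       # the unknown is propagated
    print(if_xprop(X, 1, 1))       # both branches agree, so the value is known
```

In the optimistic case the uninitialised condition never shows up downstream in RTL, which is the kind of problem that otherwise tends to surface only in gate-level simulation.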
Amol Bhinge, senior SoC verification manager, NXP Semiconductors in Austin, offered his take on the current challenges of verification.
The chips his team works on can have more than two billion transistors, multiple cores and 50 or more blocks, and over 20 clock domains. The verification environment may involve 100,000 lines of test code, thousands of SoC-level test cases and hundreds of software bring-up scenarios. The verification challenge is to continuously improve quality, and create larger and more complex chips with the same verification resources.
Bhinge gave the example of working on a flagship chip for NXP’s Digital Network group, which involved integrating complex new cores and fabrics, a shift to ARM-based design, and a move to a UVM-based verification strategy. Other challenges included a new networking system architecture, and making it possible to use the verification environment to bring up software on the design.
The tyranny of choice
The first challenge Bhinge identified in verifying this design was the state-space explosion, and with it the question of which of about 20 different types of coverage and metrics to use. The next step is to hold meetings with all stakeholders involved in the design so that it can be architected from the outset with verification in mind, for example by limiting the number of use cases in order to make the verification more tractable.
The team also chose to use interface/ports toggle coverage as its primary code-coverage metric, and attempted to enable selective functional coverage.
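As a rough illustration of what a port toggle metric measures, the Python sketch below counts a port bit as covered once it has been seen to switch both from 0 to 1 and from 1 to 0. This is a common definition of toggle coverage, but the article does not say which definition the team used; the function name, the trace format and the port names are hypothetical.

```python
# Minimal sketch of port toggle coverage; names and trace format are assumptions.
def toggle_coverage(traces):
    """traces maps a port-bit name to its list of sampled 0/1 values.
    Returns the fraction of bits that toggled in both directions."""
    if not traces:
        return 0.0
    covered = 0
    for samples in traces.values():
        pairs = list(zip(samples, samples[1:]))  # consecutive sample pairs
        rose = any(a == 0 and b == 1 for a, b in pairs)
        fell = any(a == 1 and b == 0 for a, b in pairs)
        if rose and fell:
            covered += 1
    return covered / len(traces)


if __name__ == "__main__":
    example = {
        "valid": [0, 1, 0],  # rises and falls: covered
        "ready": [0, 1, 1],  # only rises: not covered
        "err":   [0, 0, 0],  # never toggles: not covered
        "data0": [1, 0, 1],  # falls and rises: covered
    }
    print(toggle_coverage(example))
```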
“SoC verification is an art. There’s a science to it too, but if you do too little of it the chip won’t work and if you do too much you’ll never tape out in your lifetime,” said Bhinge. He recommended developing a verification plan, driven by factors such as legacy issues, IP subsystem checks, coverage needs, architecture documents, software bring-up requirements, and the needs of priority scenarios.
Call for collaboration and standards
Bhinge’s approach to managing this complexity is to try and develop a standard approach, with standard data management strategies and so on, and called for standardisation among EDA vendors to make navigating complex verification processes simpler.
On test cases, Bhinge argued that there are no clear solutions to the problem of deciding which ones to write and how. Do you reuse an existing test case or write from scratch? Do you use UVM, C, C++ or something else? Do you use a test generator, or a custom strategy? And so on.
“We need innovations in this space, like a testcase generator for ARM ecosystem – a standardised way to do it,” he said, calling also for reusable testcases at all levels of the design from IP to SoC, and a metric to check the reusability of testcases.
Bhinge also called for other metrics, such as on checkers, and on gate verification completeness.
He said that formal and static tools could play a complementary role to simulation, for cases such as resets, clocks, IO multiplexing, register and interconnect checks. But he said such tools should be natively integrated with simulation, so users would face the same front end and environment. Simulation could then be used to generate constraints and checkers for formal, while formal tools could be used for unreachable code coverage.
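One way to picture the use of formal tools for unreachable code coverage is as an exclusion step applied to simulation results. The Python sketch below assumes a formal tool has already supplied a set of coverage points proven unreachable; the function name, the set-based data model and the point names are hypothetical and do not reflect any specific tool's flow.

```python
# Sketch of excluding formally proven unreachable points from a coverage score.
def adjusted_coverage(all_points, hit, proven_unreachable):
    """Return coverage over reachable points only.
    Points proven unreachable are removed from the denominator."""
    reachable = set(all_points) - set(proven_unreachable)
    if not reachable:
        raise ValueError("no reachable coverage points")
    return len(set(hit) & reachable) / len(reachable)


if __name__ == "__main__":
    points = {"a", "b", "c", "d", "e"}  # hypothetical coverage points
    hit = {"a", "b", "c", "d"}          # hit in simulation
    raw = len(hit) / len(points)
    print("raw:", raw)
    print("adjusted:", adjusted_coverage(points, hit, {"e"}))
```

The point of the exclusion is that engineers stop spending simulation cycles chasing coverage holes that no test could ever close.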
Bhinge also called for collaboration and standardisation on performance verification tools and strategies. On low-power verification, he argued that the industry needs to settle on a single power format that both back-end and front-end teams can use.
“There is plenty of room to innovate and collaborate between us and the EDA vendors, and among us,” he added.
Michael Sanie, senior director, verification marketing, Synopsys, used the meeting to outline what Synopsys is doing to help engineers find bugs sooner and faster and so make it possible to bring up software more quickly.
Among the steps Synopsys has taken over the last year are the acquisition of Atrenta for its Spyglass linting tools, and emerging offerings in clock domain crossing and reset checks. The company has also been working to bring the unified compile features of VCS, and the unified debug of Verdi, into the Spyglass platform.
The VCS simulator is also being updated over the next couple of years with its Cheetah technology, a form of parallel simulation, and the verification IP portfolio has been expanded.
The Synopsys VC Formal tool has now been used in more than 50 projects, and is being integrated with the VCS unified compile and Verdi debug facilities. Verdi itself is being updated, for example with coverage analysis, formal verification debug and other facilities.
Synopsys has also invested in functional safety verification, acquiring fault simulation technology aimed at the ISO 26262 automotive safety standard.
Finally, Sanie said that the Zebu Server 3 emulator, used in a hybrid prototyping environment, had been able to do a full boot of the Android operating system on an emulated mobile processor in an hour.
Watch the video here.