Early tape-out: smart verification or expensive mistake?

By Chris Edwards | Posted: June 15, 2014
Topics/Categories: EDA - Verification | Tags: DAC 2014, emulation, FPGA prototyping, post-silicon debug, RTL simulation, tapeout-to-mask | Organizations: Freescale Semiconductor, IBM, Mentor Graphics

Is it worth trying to iron out all the bugs in an SoC before taping out, or should design teams anticipating a re-spin go to silicon earlier and use the chips that come back as verification accelerators?

Taping out early, so that real silicon can be used to speed up verification, is an option that some chipmakers are considering, although other speakers on a panel at the 51st Design Automation Conference (DAC) in San Francisco warned that the plan could backfire badly.

Sharad Kumar, senior manager of post-silicon validation, said Freescale’s networking chip operation had needed re-spins on several recent projects aimed at advanced nodes. At the 45nm and 28nm nodes, the first re-spin had both functional and electrical bugs. If a design needed a second re-spin, as chips with significant new content did, the cause was usually functional errors rather than electrical ones.

Kumar said the combination of multiple cores and accelerators with complex power-saving modes often led to bugs that were difficult to track down on simulators or even emulators. Some bugs were reported by customers after they had received silicon samples.

“For 16nm, it seems obvious that we should do something differently, so we don’t have three revisions,” said Kumar. “What I’m proposing: let’s pull in the first tape-out and get the chip out earlier. It gives you the benefit of being able to run a large number of test cases [on the silicon itself].

Test-case speedup

“If you address those things earlier by pulling in tape-out, you can spend a little longer in validation,” Kumar claimed. “It gives you a larger number of test cases, making it easier to wring out issues that cross multiple cores and accelerators. And it gets you to earlier customer sampling, which often shows up new things. Using this approach I think we have a better chance of hitting the production schedule and avoiding further re-spins of the chip.”

Prabhat Mishra of the University of Florida goes further. He argues for focusing effort on the parts of the design for which the product has known use-cases, then progressively updating the silicon or the software to work around bugs that appear as new use-cases are uncovered in the field. In effect, it is an extension of Linus’s Law, named after Linux creator Linus Torvalds: given enough eyeballs, all bugs are shallow. To some extent, Freescale has already encountered this approach: customers have identified elusive bugs that only turn up during software development.

Eric Rentschler, chief validation scientist at Mentor Graphics, said there are advantages to getting silicon from the fab in order to find problems with the design.

Explore the state space

“Why not find all the bugs in pre-silicon?” he asked rhetorically. “Because there are not enough cycles. You can achieve a three to four times speedup from post-silicon. And do we have the technology to model interactions with [power] state changes pre-silicon? Do we have the practical ability to do that over all the possible combinations? There are many hardware features that need a full software stack to exercise. But that can’t be done on an emulator or FPGA prototype because the designs are too big.

“Can you get to all the state space without using the rest of the [software] stack? A lot of the functions that need to be exercised are not enabled without the stack. For example, GPU drivers may control the power management of the GPU.

“We see a lot of onion-peeling when we find bugs. You find one bug and when you look under that you find another one,” Rentschler explained. “One example was a GPU where we weren’t hitting the entire power-management space. We fixed that power-management issue and started getting hardware lockups because it led to a divisor not being set in the clock divider, so it didn’t get a clock.”

“How about things outside the chip?” asked Rentschler, citing an example he had seen of a south-bridge I/O controller that checked out in pre-silicon tests but had been characterized electrically using a different clock source than the one used for bring-up. “It ended up locking on to the third harmonic rather than the fundamental. That took a long time to debug.”

The connections to things outside the chip extend to the software stack, Rentschler said, using an example from the computing space. “Microsoft likes to control a lot of the power management with things like connected standby. There is a convergence between the CPU and GPU at the software level. There are software models around that to connect them and they are not simple at all. And then there are workload-specific corner cases that you need to uncover.”

Use-case complexity

The complexity of the usage environment for computing-oriented SoCs points to the need to be able to debug at full speed, which implies an earlier tape-out to get candidate silicon. But there are problems. One is being able to connect pre- and post-silicon debug techniques effectively. Rentschler said: “We definitely want to shift left but silicon validation is still a black art and not quite a science yet.”

Then there are the risks of taping out with a device that does not provide what the verification team and customers need.

“Really understanding the dependencies is key and understanding what functions on the device are absolutely critical. Would you want to go to tape-release if you couldn’t do reliable JTAG scans?” Rentschler asked.

Kumar said some things can be worked around with partially functional silicon. “We put logic on the chip that helps us work around things. There is scan logic that we can [read] out and that also allows us to see where things are. Getting to silicon seems to find the bugs. We then use that with simulation and emulation during debug. We don’t just rely on the features on-chip. We use whatever’s available.”

The dangers of DOA

Kevin Reick of IBM issued a darker warning against being too reckless with the tape-out schedule, arguing that with re-spins, turnaround time is a bigger problem than cost. “There are a lot of forces driving the tape-out date. Obviously the fab has a start date and the slots are narrow. There is a lot of stress on the design team to get into those slots. And then there are product availability dates to hit.

“For the 22nm process we use on POWER8, the turnaround time is about three months. There is stress on the fab. If we have an unexpected release, that will cause stress and we have to move our products around.”

Reick talked about two situations that make early tape-out problematic. “You can encounter seriously gating bugs. A major bug can stall any further testing. One of those situations is a dead-on-arrival. Thankfully, we haven’t seen that for a long, long time. But it’s a real problem.

“You can work around bugs by disabling functions. But disabling those functions hits the state space. We had one bug where we had no choice but to disable the [data] cache. It left a huge hole in the state space. And there were bugs hidden behind that state space,” Reick explained.

Early tape-out for delayed production?

If the early silicon contains serious showstopper bugs that were not caught pre-silicon, the early tape-out may simply add delays of at least three months while the team tries to get a working device together to test. Reick said: “With a later tape-out, you have longer in the lab for final release. By ensuring all your metrics are met you can get to market earlier [with fewer tape-outs].”

Reick said IBM makes extensive use of hardware accelerators to debug aspects of the processor, checking the power-on reset sequences and other key functional requirements. “The quality of accelerators on the shift left that we do has proved to be immensely useful.”

Rentschler added: “There is huge value that hardware acceleration can bring.”

Although mask sets at advanced nodes are expensive, with an estimated $5m to $6m needed at 14nm/16nm, panelists argued that their cost is not as big an issue as the delays caused by non-functional first silicon.

Rentschler explained: “While the mask costs are going up in the new processes, that’s not really where the real money is. The real money is in the time to market. If you have a $3m to $6m a day revenue stream, the mask cost is nothing. What’s more interesting about re-spins is just how much time it takes to do.

“Ten years ago, the time was probably on the order of a little over a month. Now it’s two or even three depending on the process. That’s how the spins really feed back into the cost,” Rentschler added.
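The panel’s argument can be checked with back-of-envelope arithmetic. The Python sketch below compares the quoted mask-set cost ($5m to $6m) with the revenue pushed back by a re-spin of two to three months, at the $3m to $6m a day quoted by Rentschler. Treating each day of delay as one day of deferred revenue, and a month as 30 days, are simplifying assumptions of this sketch rather than figures from the panel, and the function name `respin_cost` is hypothetical.

```python
# Back-of-envelope comparison of mask-set cost against revenue deferred by a re-spin.
# Input ranges are those quoted by the panel; treating each day of delay as one day
# of lost revenue, and 30 days per month, are simplifying assumptions.
DAYS_PER_MONTH = 30


def respin_cost(mask_cost_musd, revenue_per_day_musd, delay_months):
    """Return (mask cost, deferred revenue), both in millions of US dollars."""
    deferred = revenue_per_day_musd * delay_months * DAYS_PER_MONTH
    return mask_cost_musd, deferred


# Low end: $5m masks, $3m/day, 2-month spin. High end: $6m masks, $6m/day, 3-month spin.
for mask, rev, months in [(5, 3, 2), (6, 6, 3)]:
    m, d = respin_cost(mask, rev, months)
    print(f"mask ${m}m vs deferred revenue ${d}m ({d / m:.0f}x)")
```

With these inputs the sketch prints ratios of 36x at the low end and 90x at the high end, which is the sense in which “the mask cost is nothing” next to the schedule impact.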

A process for tape-out

Reick said timing considerations feed into the process that IBM uses. “We have a very structured tape-out process at IBM. We make sure we cover the entire state space and that the bug rates have tailed off. We spend a lot of effort in validation of the power-on reset sequence. And we model the wafer bring-up systems so that when the chip arrives into wafer test everything runs properly. We have been able to run functional code within an hour of doing that.”

Rentschler agreed: “You definitely want to be able to fetch code and boot. If you have a corner case in a shader core, you are far better off than having a serious bug in your DFX infrastructure. You need to be on a path to convergence with those priorities. You want to be able to do as much debug as you can with the first silicon. But early tape-out is possible with the right rigour and design management.

“Shift left wherever you can but ensure pre-silicon coverage of core functionality,” Rentschler concluded.