20nm timing analysis – a practical and scalable approach

By Robert Hoogenstryd, Synopsys | Posted: December 6, 2012

Using hierarchy and improved constraints management to accelerate static timing analysis at 20nm and below.

The shift to processes at 20nm and below will enable chips with much greater functional integration, performance and energy efficiency than previously possible. However, much of the promise of the new node will remain unexploited if engineers cannot complete 20nm designs on a practical timescale using realistic engineering resources.

The challenge

One of the key practical challenges of complex 20nm designs is getting them through the static timing analysis (STA) and signoff process. For tool vendors, this means finding ways to check designs that are two to four times more complex than those undertaken at 28nm, with a greater number of combinations of process parameters (corners), and a wider variety of usage scenarios.

The response

Tool vendors must respond by ensuring the performance of their offerings increases at least as fast as design complexity grows. For Synopsys, in STA and signoff, this means a long-term commitment to developing better algorithms and finding ways to exploit the increasing amount of parallelism that is becoming available in processors and server farms. Synopsys is working on many other ways to improve the performance of STA: out-of-the-box speedup, scalable support for design hierarchy, the reduction and simultaneous analysis of multiple scenarios, a patent-pending signoff driven approach to handling engineering change orders (ECO), and modeling strategies to reduce pessimism and over-design.

This article focuses on scalable support for design hierarchy, which can dramatically improve the turnaround time for 20nm designs.

Managing hierarchy

One of the most important steps towards making STA and signoff practical at 20nm is to start working with designs in a hierarchical manner, rather than as a flat netlist. This shift reflects the way designs with more than three to five million instances are already handled for physical implementation, extraction and design management: as a set of interconnected blocks rather than as a whole.

Each of these blocks has a budget for factors such as area, power and timing, derived from a top-level budget. Working this way breaks the problem of meeting timing constraints down into a number of sub-problems. It can be difficult to get designs of more than three to five million instances to meet timing constraints when they are handled as a flat netlist, because the growth in runtime and memory requirements of such large netlists is outstripping the performance gains that multicore server processors bring to the tools.
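The budgeting step described above can be sketched in a few lines. This is a minimal illustration, not HyperScale's actual algorithm: the block names, delay estimates and the proportional-split policy are all invented for the example.

```python
# Hypothetical sketch: splitting a top-level timing budget across the
# blocks on a path, in proportion to each block's estimated delay share.

def allocate_budgets(total_budget_ns, est_delays_ns):
    """Divide a top-level path budget among blocks so that each block
    gets a share proportional to its estimated fraction of the delay."""
    total_est = sum(est_delays_ns.values())
    return {blk: total_budget_ns * d / total_est
            for blk, d in est_delays_ns.items()}

# A 2.0 ns top-level budget for a path crossing three invented blocks:
budgets = allocate_budgets(2.0, {"cpu_core": 0.6, "noc": 0.2, "mem_ctrl": 0.2})
print(budgets)  # each block now has its own local constraint to close
```

Each sub-problem can then be closed independently at the block level, which is what makes the divide-and-conquer approach tractable.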

Synopsys has introduced HyperScale to handle very large designs in a hierarchical manner, running five to ten times faster than the flat approach and using five to ten times less working memory. The basic idea is to analyze the chip on a block-by-block basis, and then use the results from the block-level analyses in a full-chip analysis, which in turn creates a more accurate timing context for each block during the next cycle in the design process. This approach is open to parallelization, which makes it faster, avoids the use of excessively conservative block-level timing constraints, and means that late ECOs can be handled efficiently.


Figure 1 HyperScale’s next-generation hierarchical approach speeds up signoff (Source: Synopsys)
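The block/top iteration can be reduced to a toy model: a block is first analyzed against a conservative budgeted context, then re-analyzed once the top-level run supplies the real boundary arrival. The single-path model and all numbers below are invented for illustration; a real tool exchanges far richer context (SI, variation, exceptions).

```python
# Toy model of context refinement between block- and top-level analysis.

def block_analysis(input_arrival_ns, internal_delay_ns, clock_period_ns):
    """Return the block output arrival and slack under a given context."""
    arrival = input_arrival_ns + internal_delay_ns
    return arrival, clock_period_ns - arrival

# First pass: conservative budgeted context (assumed worst-case arrival).
arrival, slack = block_analysis(input_arrival_ns=1.0,
                                internal_delay_ns=0.7, clock_period_ns=2.0)
print(f"budgeted context: slack = {slack:.2f} ns")

# The top-level run shows the real input arrival is earlier, so the
# refined context recovers margin the manual budget had given away.
arrival, slack = block_analysis(input_arrival_ns=0.8,
                                internal_delay_ns=0.7, clock_period_ns=2.0)
print(f"refined context:  slack = {slack:.2f} ns")
```

The point of the iteration is exactly this recovery: each top-level pass tightens the block contexts, so block-level results converge toward what a flat analysis would report.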

The HyperScale process begins once the block-level place and route has started and top-level integration has begun. It will then work with black-box representations of blocks that are still being finalized. At this point, static timing analysis is undertaken for each block using whatever timing budgets are available, whether they have been produced manually or by scripting. Any timing violations in each block can be fixed now, or flagged for attention once a more accurate timing context for the block has been developed from the top level.

The next step is to save the block-level sessions to be re-used at the chip-level timing stage. After the chip-level timing is run, each block then benefits from an accurate top-level context.

This approach improves the accuracy of timing contexts during the design flow, avoiding excessively conservative design decisions made for lack of accurate timing data. For example, in traditional static timing analysis, a hierarchical block timing approach does not capture the context necessary for accurate clock reconvergence pessimism removal (CRPR). When two related clocks enter a block, it is difficult to model the CRPR properties of their common top-level source in the block-level analysis – which leads to unnecessarily conservative design. The problem can be fixed by running a full-chip flat analysis, which exposes the entire clock network and enables accurate CRPR, but at the runtime cost of the full flat analysis.
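The CRPR arithmetic behind this can be shown with a small worked example. The delay values are invented; the principle is standard: the segment of the clock network shared by launch and capture paths cannot simultaneously be at its latest and earliest delay, so the pessimism accumulated on it can be credited back.

```python
# Illustrative CRPR arithmetic with invented delay numbers (ns).
# Launch and capture clocks share a common segment from the top-level root.

common_late, common_early = 1.10, 0.90  # shared clock segment, late/early
launch_only_late   = 0.50               # launch-only branch, late delay
capture_only_early = 0.45               # capture-only branch, early delay
data_delay, period = 1.30, 2.00

launch_edge  = common_late + launch_only_late               # worst-case launch
capture_edge = common_early + capture_only_early + period   # best-case capture

raw_slack = capture_edge - (launch_edge + data_delay)
# The shared segment cannot be late for launch and early for capture at
# the same time, so its late/early spread is credited back as CRPR:
crpr_credit = common_late - common_early
corrected_slack = raw_slack + crpr_credit
print(raw_slack, corrected_slack)
```

Without visibility of the common top-level segment, a block-level analysis is stuck with the raw (pessimistic) slack; with the captured context, it can apply the credit.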

Classic approach – No CRPR correction applied at the block level, producing pessimistic results

With HyperScale – All CRPR effects are replicated for the block-level analysis


Figure 2 Accurate top-level context information (e.g. CRPR) captured with HyperScale analysis (Source: Synopsys)

With HyperScale technology, the boundary clock CRPR relationships are captured as part of the updated context generated by the first run and used for accurate block-level analysis, eliminating the need for a flat run. HyperScale also captures an accurate enough context to take into account advanced on-chip variation, timing exceptions, signal integrity and noise effects.

The enabling technologies

One of the key underlying technologies of HyperScale is an automated constraint extractor. Many customers find that their block and full-chip level constraints do not align and that adjusting them so that they do is a lengthy manual and error-prone process. The HyperScale automated constraint extractor extracts the block-level constraints from the full-chip level constraints as a starting point, enabling constraint consistency. This is an important enabler for an efficient hierarchical timing flow.
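The idea of deriving block constraints from chip-level ones can be sketched as follows. The data model, pin names and latency values are invented; the output keys merely echo the standard SDC command names (create_clock, set_clock_latency, set_input_delay) that such constraints would correspond to.

```python
# Hypothetical sketch of extracting block-level constraints from
# top-level constraints. All names and numbers are invented.

top_clock = {"name": "clk", "period_ns": 2.0}
# Top-level clock network latency to each block clock pin:
latency_to_block_pin = {"u_core/clk": 0.35}
# Top-level arrival of each signal at the block's input pins:
arrival_at_input = {"u_core/data_in": 0.9}

def extract_block_constraints(block):
    """Project the top-level clock and boundary arrivals onto one block,
    giving it self-consistent local constraints as a starting point."""
    clk_pin = f"{block}/clk"
    return {
        "create_clock": {"pin": clk_pin, "period_ns": top_clock["period_ns"]},
        "set_clock_latency": {clk_pin: latency_to_block_pin[clk_pin]},
        "set_input_delay": {pin: t for pin, t in arrival_at_input.items()
                            if pin.startswith(block + "/")},
    }

print(extract_block_constraints("u_core"))
```

Because every block's constraints are projected from the same top-level source, block and chip constraints agree by construction, which is the consistency property the extractor provides.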

Customer feedback suggests that the automated constraint consistency checker ensures consistent block-level constraints and eliminates outliers in timing correlation caused by misaligned constraints. It is also an easier way to debug constraints than a purely manual approach, which can leave errors uncaught.

HyperScale also includes a more flexible and efficient way to handle multiple instances of the same IP block. Block place and route is run on a single instance, but full-chip timing is usually run flat to capture an accurate picture of each instance and its neighborhood.

Classic approach – Flat run required for ECO fixing across all instances, user must manually resolve fixes to work for all instances

With HyperScale – Bounding context captured at block-level, user can optionally choose unique contexts


Figure 3 HyperScale provides added flexibility to handle multiple instances (Source: Synopsys)

This takes a lot of time and computing power. HyperScale allows designers to represent multiple instances of a block as a bounded single instance (which offers the greatest saving in runtime), or as multiple subsets of instances (which is valuable if two or more clusters of blocks are placed in substantially different contexts). If there is no commonality in the context of the multiple instances, they can be handled as unique instances.
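The "bounded single instance" option described above amounts to merging per-instance boundary conditions into one conservative envelope, so a single block run covers every placement. A minimal sketch, with invented instance names and numbers:

```python
# Sketch of a bounding context: one analysis covering all instances of a
# block by taking the worst case of each boundary quantity. Invented data.

instances = {
    "u_ip_0": {"data_in_arrival_ns": 0.70, "clk_latency_ns": 0.30},
    "u_ip_1": {"data_in_arrival_ns": 0.85, "clk_latency_ns": 0.28},
    "u_ip_2": {"data_in_arrival_ns": 0.60, "clk_latency_ns": 0.33},
}

def bounding_context(instance_contexts):
    """Merge per-instance contexts into a conservative envelope:
    latest data arrival and largest clock latency across instances."""
    return {
        "data_in_arrival_ns": max(c["data_in_arrival_ns"]
                                  for c in instance_contexts.values()),
        "clk_latency_ns": max(c["clk_latency_ns"]
                              for c in instance_contexts.values()),
    }

ctx = bounding_context(instances)
print(ctx)  # one block run under this context covers all three placements
```

If the per-instance contexts diverge too much, the envelope becomes overly pessimistic – which is why the tool also offers clustered subsets or fully unique contexts.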

The hierarchical approach of HyperScale also makes handling ECOs easier. With a flat netlist, making a change to meet an ECO means meeting a timing budget at the block level and then running another full flat timing analysis, in order to ensure that the corrected block still works in the full-chip context.


Figure 4 A hierarchical approach makes meeting ECOs easier (Source: Synopsys)

With the HyperScale approach, a block-level fix only has to meet an accurate timing context derived from the top-level constraints, removing the need for another full flat timing analysis run. On one customer design, this new approach made checking an ECO change 3.5 times faster.
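The ECO check reduces to validating the fix against the saved context rather than re-timing the whole chip. A toy sketch, with an invented context and path model:

```python
# Sketch of checking an ECO fix against a saved block context instead of
# rerunning full-chip flat timing. The data model and numbers are invented.

saved_context = {"input_arrival_ns": 0.8, "required_ns": 2.0}

def eco_fix_ok(new_internal_delay_ns, ctx):
    """A block-level fix is acceptable if the path still meets the
    required time under the context saved from the last top-level run."""
    return ctx["input_arrival_ns"] + new_internal_delay_ns <= ctx["required_ns"]

print(eco_fix_ok(1.1, saved_context))  # fix closes within the saved context
print(eco_fix_ok(1.3, saved_context))  # fix violates; needs further work
```

Because the saved context is accurate rather than a conservative manual budget, passing this local check is sufficient for signoff of the fix, which is where the runtime saving comes from.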

The results

Shifting static timing analysis onto a hierarchical footing offers a number of advantages. Working on many relatively small blocks is inherently a more parallel problem than working on one large block, so it is easier to improve productivity by applying more computing resources. Working at the block level means that timing issues can be localized and solved within a block, rather than affecting a complete chip netlist. Using the same signoff algorithms in the hierarchical approach as are applied to a flat netlist analysis means that the results of the two approaches are closely correlated.

In practical terms, a hierarchical approach reduces the number of design iterations between the block and full-chip level necessary to meet signoff. It reduces the number of times that a time-consuming flat netlist analysis is necessary. By making the process of meeting timing requirements more deterministic, the approach enables better forecasting of when a design reaches signoff. Measures of the number of violations per iteration also help design management understand if the signoff process is progressing in the right direction.

Early feedback suggests that customers value the lower runtime and smaller memory footprint of the hierarchical approach, its close correlation with a full flat analysis, the automatic updating of block contexts, and the tool’s checks for constraint consistency. The approach has been used by a number of customers at 28nm and is ready for use on processes at 20nm and below.

Author

Robert Hoogenstryd is director of marketing, design analysis and signoff tools, Synopsys

Company

Synopsys
700 East Middlefield Road
Mountain View, CA 94043
Phone: (650) 584-5000 or
(800) 541-7737
 

