On-chip variation (OCV)
On-chip variation (OCV) refers to the intrinsic variability of semiconductor processes and its impact on factors such as logic timing. Historically, apart from operating temperature, timing variation was primarily a consequence of subtle shifts in manufacturing conditions that would lead to ICs from one batch of wafers being ‘slow’ or ‘fast’ relative to nominal estimates. To account for this, design teams would run two sets of timing analyses: one for the slow corner and one for the fast corner. If the design passed both, the chip was considered to have met its timing constraints.
In recent years, the number of contributors to timing variability has increased, leading to significant variations not just between wafers but across individual wafers and, increasingly, within a single die. This, in turn, has led to significant changes in the way that static timing analysis is performed to account for these effects.
Variation sources
Causes of timing variation include small differences in the way that features defined on a mask print on the surface of the chip because of the effect of surrounding features, variations in doping levels, and variations in etching that may remove more or less of a critical feature such as a gate stack or polysilicon interconnect. Even circuit activity has an impact through factors such as IR drop, which became more prominent in design as supply voltages approached, and then fell below, the 1V level.
The result is that the cells of a chip can no longer all be modelled using the fast or slow process corner alone – some cells will run faster, others slower than expected, depending on the changes in process conditions and the impact of design-dependent effects. Failure to account for these issues can lead to setup and hold violations in designs that are, nominally at least, error-free. As a result, the design’s timing needs to be analyzed in a way that takes into account the potential for timing to change within a given process or temperature corner. The addition of multiple operating modes has, at the same time, increased the number of corners that need to be analyzed and given rise to multi-corner multi-mode (MCMM or MMMC) analysis.
Initially, timing analysis accounting for OCV was handled by telling the static timing analysis (STA) tool to apply a global margin across the entire chip, using a percentage or delay estimate that the designer, or the foundry, considered safe. This approach leads to a more pessimistic estimate of timing margin than is present in reality. The shifts in threshold voltage and other parameters that lead to differences in performance may be correlated within a group of transistors – this is likely to be true for a group of transistors in a small area – but not over a wider area. A global margin does not take account of this correlation, which led some chipmakers to explore the option of statistical timing analysis.
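The effect of a global margin on a single setup check can be sketched as follows. This is a minimal illustration, not the behaviour of any particular STA tool: the function name `setup_slack`, the delay values and the 10 per cent late and early derates are all assumptions chosen for the example. The launch clock and data paths are scaled up, the capture clock path is scaled down, and the slack shrinks accordingly.

```python
# Minimal sketch of a flat (global) OCV derate applied to one setup check.
# All delays (in ns) and the +/-10% derate factors are made-up numbers.

def setup_slack(launch_clk, data, capture_clk, period, setup,
                late=1.0, early=1.0):
    """Return setup slack in ns.

    The launch clock and data delays are multiplied by the 'late' derate;
    the capture clock delay is multiplied by the 'early' derate.
    """
    arrival = (launch_clk + data) * late
    required = period + capture_clk * early - setup
    return required - arrival

# Same path, analyzed without and with a global derate.
nominal = setup_slack(0.50, 1.20, 0.50, period=2.0, setup=0.05)
derated = setup_slack(0.50, 1.20, 0.50, period=2.0, setup=0.05,
                      late=1.10, early=0.90)

print(f"nominal slack: {nominal:.3f} ns")
print(f"derated slack: {derated:.3f} ns")
```

Because the same factors are applied to every cell regardless of where it sits or how long the path is, the derated slack is a worst-case figure that ignores any correlation between neighbouring cells.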
In practice, statistical timing analysis proved difficult and time-consuming to implement, so the industry adopted a variety of more sophisticated margining techniques under the general banner of location-aware OCV or advanced OCV.
OCV enhancements
Advanced OCV, in general, uses context-specific derating instead of a single global derate value, which should in principle reduce design margins and lead to fewer timing violations. An approach used by Synopsys and others, such as Incentia Design Systems, determines derate values as a function of logic depth – used in level- or stage-based OCV – and relative cell or net location.
Logic depth and location attack two different parts of the problem. According to Synopsys, statistical analysis shows that deeper paths are less affected by random variation – because the contributory effects are random, all cells within a deep path are highly unlikely to be simultaneously fast or slow. Using statistical HSPICE models, Monte Carlo analysis can be performed to measure the delay variation at each stage accurately. Derate factors can then be computed as a function of cell depth to apply less pessimistic margins to the path. Stage-based OCV has been an element of TSMC’s Reference Flow since release 9.0.
Location-based derating deals largely with systematic effects. Cells in close proximity exhibit less variation relative to one another than those further apart. Using silicon data from test chips, derate factors can be calculated based on relative cell location. Advanced OCV computes the length of the diagonal of the bounding box that contains the cells being analyzed and uses it to select an appropriate derate value from a table constructed from test-chip results.
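The averaging effect behind depth-based derating can be illustrated with a toy Monte Carlo experiment. The sketch below stands in for the SPICE-level analysis described above and makes strong simplifying assumptions: every stage has a nominal delay of 1.0 with an independent Gaussian variation of 5 per cent, and the derate is taken as the three-sigma spread of the path delay relative to its mean. The function name `depth_derate` and all of the numbers are hypothetical.

```python
import random
import statistics

def depth_derate(depth, sigma=0.05, n_sigma=3.0, samples=5000, seed=1):
    """Estimate a late derate for a path of 'depth' identical stages.

    Assumption: each stage delay is an independent Gaussian with mean 1.0
    and standard deviation 'sigma' (a toy model, not a SPICE simulation).
    """
    rng = random.Random(seed)
    path_delays = [
        sum(rng.gauss(1.0, sigma) for _ in range(depth))
        for _ in range(samples)
    ]
    mean = statistics.fmean(path_delays)
    spread = statistics.stdev(path_delays)
    # Derate = relative n-sigma spread of the whole path delay.
    return 1.0 + n_sigma * spread / mean

# Build a small depth-indexed derate table.
table = {d: depth_derate(d) for d in (1, 2, 4, 8, 16)}
for d, value in table.items():
    print(f"depth {d:2d}: late derate ~ {value:.3f}")
```

Under these assumptions the relative spread falls roughly as one over the square root of the depth, so a 16-stage path needs only about a quarter of the margin of a single stage.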
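The bounding-box lookup can be sketched in a few lines. The table values below are invented for illustration and do not come from any foundry or test chip; `DISTANCE_TABLE`, `bbox_diagonal` and `location_derate` are hypothetical names, and cell positions are assumed to be (x, y) coordinates in microns. A production table would typically be indexed by logic depth as well as distance.

```python
import math
from bisect import bisect_left

# Hypothetical table: (maximum bounding-box diagonal in um, late derate).
# The final row catches any distance larger than the last finite limit.
DISTANCE_TABLE = [
    (100.0, 1.02),
    (500.0, 1.04),
    (2000.0, 1.07),
    (math.inf, 1.10),
]

def bbox_diagonal(cells):
    """Diagonal of the smallest axis-aligned box containing all cells."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))

def location_derate(cells, table=DISTANCE_TABLE):
    """Pick the derate from the first row whose limit covers the diagonal."""
    limits = [limit for limit, _ in table]
    return table[bisect_left(limits, bbox_diagonal(cells))][1]

# Cells on one timing path, given as (x, y) placements in um.
path_cells = [(10.0, 20.0), (130.0, 60.0), (310.0, 420.0)]
print(f"diagonal: {bbox_diagonal(path_cells):.1f} um")
print(f"derate:   {location_derate(path_cells):.2f}")
```

Paths whose cells are clustered together fall into the early rows of the table and receive a smaller derate than paths that span a large part of the die.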
Beyond advanced OCV
Parametric OCV (POCV) is a technique that has been proposed as a means of reducing pessimism further by taking elements of statistical static timing analysis (SSTA) and implementing them in a way that is less compute-intensive. Extreme Design Automation, acquired by Synopsys in 2011, pitched POCV as “practical SSTA”. However, the generation of libraries suitable for POCV is, in itself, considered to be onerous. Synopsys has said that POCV could be adopted for 16/14nm flows. Techniques such as hierarchical static timing analysis, if they prove successful, may be used in conjunction with more sophisticated OCV modelling regimes to reduce runtimes.
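One way to picture the statistical element that POCV borrows is to give each cell a mean delay and a standard deviation, then combine the deviations along a path statistically rather than adding worst cases. The sketch below is a simplified reading of that idea, not a description of any vendor's implementation: it assumes the stage variations are independent and Gaussian, and the function names and delay values are hypothetical.

```python
import math

def pocv_path_delay(stages, n_sigma=3.0):
    """Late path delay from per-stage (mean, sigma) pairs.

    Assumption: stage variations are independent, so their sigmas
    combine as a root-sum-square rather than a straight sum.
    """
    mean = sum(m for m, _ in stages)
    sigma = math.sqrt(sum(s * s for _, s in stages))
    return mean + n_sigma * sigma

def worst_case_path_delay(stages, n_sigma=3.0):
    """Late path delay if every stage is at its n-sigma worst case at once."""
    return sum(m + n_sigma * s for m, s in stages)

# Sixteen identical stages: 0.10 ns mean delay, 0.005 ns sigma (made up).
stages = [(0.10, 0.005)] * 16

print(f"statistical: {pocv_path_delay(stages):.3f} ns")
print(f"worst case:  {worst_case_path_delay(stages):.3f} ns")
```

With these numbers the statistical estimate adds 0.06 ns of margin to the 1.6 ns nominal delay, against 0.24 ns when every stage is assumed to be slow simultaneously.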