Clock tree synthesis

The design of the clock network in an SoC has come under increasing scrutiny for a number of reasons, ranging from its share of overall power consumption (sometimes as much as 40 per cent of the total) to the performance limitations caused by increasing on-chip variation (OCV).

Traditionally, designers have chosen between two competing clock architectures: mesh and tree. More recently, hybrids have appeared that combine attractive aspects of the two main forms. But the most prevalent architecture in ASIC and SoC design today is that provided by clock tree synthesis (CTS).

The tree is synthesized using a variety of buffers in such a way that very few paths share a route back to the clock root. Neighboring cells may have clock sources that have passed through a different number of buffers. This scheme offers high flexibility and provides the ability to tune clock skew down to individual cells as well as supporting fine-grained clock gating.

Pressure on the CTS strategy has come from the increase in process-voltage-temperature (PVT) OCV. The performance of the clock network can be highly temperature dependent, potentially leading to the chip meeting timing under some PVT corners but not others. Multi-corner CTS was developed to estimate clock delays over multiple corners, taking both global and local variation into account. The technique makes dynamic tradeoffs between buffering a wire and assigning it to less resistive layers in order to minimize changes over the different corners.
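The multi-corner problem can be illustrated with a minimal Python sketch. It is not how any CTS tool works internally. It only shows the bookkeeping of checking skew at every corner rather than at one. The corner names, sink names and arrival times are all hypothetical values chosen for illustration.

```python
# Hypothetical clock arrival times (ps) at three sinks for three PVT corners.
# Corner names, sink names and numbers are illustrative, not measured data.
arrivals_ps = {
    "ss_0p72v_125c": {"ff_a": 412.0, "ff_b": 430.0, "ff_c": 405.0},
    "tt_0p80v_25c": {"ff_a": 300.0, "ff_b": 309.0, "ff_c": 298.0},
    "ff_0p88v_m40c": {"ff_a": 221.0, "ff_b": 224.0, "ff_c": 226.0},
}


def skew_ps(sink_arrivals):
    """Global skew at one corner: latest arrival minus earliest arrival."""
    values = list(sink_arrivals.values())
    return max(values) - min(values)


def skew_by_corner(arrivals):
    """Map each corner name to its global skew."""
    return {corner: skew_ps(sinks) for corner, sinks in arrivals.items()}


def worst_corner(arrivals):
    """Return (corner name, skew) for the corner with the largest skew."""
    per_corner = skew_by_corner(arrivals)
    corner = max(per_corner, key=per_corner.get)
    return corner, per_corner[corner]


per_corner = skew_by_corner(arrivals_ps)
worst = worst_corner(arrivals_ps)
print(per_corner)
print(worst)
```

In this made-up data set the tree looks well balanced at the fast corner but not at the slow one. That is the situation in which a tree tuned at a single corner can fail timing at another.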

Clock mesh

The clock-mesh strategy, which is often favored in the custom methodologies employed for high-speed microprocessors, has had some influence on SoC design at advanced nodes largely because of the impact of OCV and concerns over multi-corner signoff.

In the clock-mesh architecture, the root clock signal is split into parallel paths using a tree of drivers. These feed an array of buffers that are cross-connected in a metal mesh, from which paths down to the clock sinks are routed. The cross-linking of the mesh builds a resonant structure in which the delays of individual buffers in the mesh are effectively cancelled out.

Because OCV has become such a major problem in sub-50nm designs, the cross-linking of the standard clock-mesh approach looks initially attractive. But it can cause issues for SoC designs in terms of verification, cost, and power consumption.

In a presentation at ISPD in 2013, Juniper Networks described how the company’s designers use a hybrid tree-mesh, in which the mesh is used to deliver a shared clock signal to local logic. They added cross-links within the tree to try to reduce PVT variations there, at the cost of complicating timing sign-off. Static timing analysis (STA) is not designed to handle cross-linked clock trees, although STA tools have added support for regular meshes in recent years. Juniper used Spice simulations, run in Synopsys CustomSim, to estimate delays through the tree of buffers where cross-linking was used.

The team emphasized that cross-linking in the tree is used only to deal with OCV skew and not structural skew, so as to avoid creating large short-circuit currents. Also, if analysis showed that cross-linking was likely to increase jitter on a path, it was not used.

The clock mesh has two further problems. The first is its high demand for routing resource, as the fabric is typically quite dense. The second is the inability to use clock gating at different levels of the structure. The gating has to be performed at the local level only, with placement used to group cells activated by the same gate signal to reduce the amount of routing needed. The large capacitance of the cross-linked mesh also incurs a power penalty of its own. In 2009, Synopsys reported an average power increase of 30 per cent.
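The link between mesh capacitance and power follows from the standard switching-power relation for a net that toggles every cycle, P = C·V²·f. The Python sketch below applies it under simplifying assumptions: it ignores short-circuit and leakage power, and the capacitance, voltage and frequency values are hypothetical rather than taken from any reported design.

```python
def clock_dynamic_power_w(capacitance_f, vdd_v, frequency_hz):
    """Switching power of a net that toggles every cycle: P = C * V^2 * f."""
    return capacitance_f * vdd_v ** 2 * frequency_hz


# Hypothetical values, chosen only to illustrate the proportionality.
tree_cap_f = 200e-12        # total capacitance of a conventional tree
mesh_extra_cap_f = 50e-12   # extra wire capacitance added by a mesh
vdd_v = 0.8
frequency_hz = 1e9

tree_power_w = clock_dynamic_power_w(tree_cap_f, vdd_v, frequency_hz)
mesh_power_w = clock_dynamic_power_w(
    tree_cap_f + mesh_extra_cap_f, vdd_v, frequency_hz
)
increase = mesh_power_w / tree_power_w - 1.0
print(f"tree: {tree_power_w:.3f} W, mesh: {mesh_power_w:.3f} W, "
      f"increase: {increase:.0%}")
```

Because voltage and frequency are fixed by the design, any capacitance the mesh adds translates directly into a proportional rise in clock switching power.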

Multisource CTS and H-trees

Another compromise between mesh and tree is multisource CTS, in which the mesh is raised to a higher level and trees deliver the clock signal from it to the sinks. This provides more tradeoffs in terms of clock control and allows a coarser and less power-hungry mesh to be used while still providing some of the OCV advantage of the grid structure. Synopsys has said the power consumption of multisource CTS is generally closer to that of a conventional CTS tree but provides many of the benefits of a mesh under OCV. However, the mesh can need manual assistance to deal with blockages caused by macros and power connections within the SoC.

The pre-mesh drivers for the clock tree can be organized in the form of H-trees: five-driver branches that trace the shape of a letter H. The H-tree provides a naturally balanced structure that minimizes skew through its routing symmetry. It is not as OCV-resistant as a mesh but provides better stability over temperature and voltage corners than regular CTS.
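The balance of an ideal H-tree can be checked with a short Python sketch. It builds the geometry recursively and records the wire length from the root to every leaf. The function name, dimensions and level count are hypothetical, and the model ignores blockages, buffers and wire RC, so it shows only the geometric symmetry.

```python
def h_tree_leaves(x, y, half_width, half_height, levels, wire_so_far=0.0):
    """Return (x, y, wire_length_from_root) for each leaf of an ideal H-tree.

    Each level draws one 'H': a horizontal bar from the centre out to both
    sides, then a vertical bar up and down from each end of that bar. The
    four tips of the H become the centres of the next, half-sized level.
    """
    if levels == 0:
        return [(x, y, wire_so_far)]
    leaves = []
    for dx in (-half_width, half_width):
        for dy in (-half_height, half_height):
            leaves.extend(
                h_tree_leaves(
                    x + dx,
                    y + dy,
                    half_width / 2,
                    half_height / 2,
                    levels - 1,
                    wire_so_far + abs(dx) + abs(dy),
                )
            )
    return leaves


# Hypothetical 3-level tree rooted at the origin; units are arbitrary.
leaves = h_tree_leaves(0.0, 0.0, 400.0, 400.0, levels=3)
lengths = {length for _, _, length in leaves}
print(len(leaves), lengths)
```

Every leaf sees the same root-to-leaf wire length, which is why the structure has no structural skew by construction. A routing jog around a macro breaks that symmetry for the affected branch unless the other branches are adjusted to match.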

Ideally, H-trees would be used in full CTS implementations to deliver clock signals to the actual sinks. But routing blockages and other issues have made this difficult in practice: manual assistance may be needed to deal with the blockages, which complicates sign-off. In the top levels of the tree it may be comparatively easy to ensure that the terminals are balanced symmetrically, perhaps with some routing jogs to avoid large memories and other macros. This becomes progressively more time-consuming and difficult as the clock tree divides towards the sinks. Because of the manual intervention normally required, the H-tree structure is not often used in CTS flows.

Recently, Cadence Design Systems introduced a version of its CC-Opt tool, part of the Innovus implementation environment, that the company says is able to use the H-tree topology more widely. The FlexH heuristic algorithm searches a large number of possible tree structures that are electrically equivalent to the core H-tree to find those best able to avoid blockages. The algorithm attempts to find the maximum number of levels for which the H-tree topology can be used, finally switching to regular CTS for the leaf clock routes that lead to the final sinks. The CTS routes can then be refined with logic cells using the concurrent clock and data optimizations in CC-Opt.

Using this approach, the company argues that designers can more easily avoid the need to use clock meshes or hand-assisted trees and so stay within an automated clock flow that is easier to verify under standard flows.

Further clock optimizations

Under regular CTS strategies, a number of power-minimization techniques are available alongside skew reduction across process corners. One is the slew shaping available in tools such as Mentor Graphics’ Olympus-SoC. Slew shaping is used to help reduce dynamic power and pushes the majority of cases closer to target slew, eliminating transitions that are overly pessimistic.

Further development is likely to lead to mesh-based strategies that offer better compatibility with standard CTS flows. For more advanced designs that can support the overhead of a custom flow, some companies have investigated the use of resonant meshes. Some of AMD’s recent processors have used a resonant-mesh technology developed by Cyclos Semiconductor that builds inductors into upper-level metal layers to capture and recycle some of the energy provided by the clock drivers. The technique is intended to save power and support multi-gigahertz clocks.