On-chip clock strategies and GALS

Rising clock speeds, growing intra-die variability and the widespread use of off-the-shelf intellectual property (IP) cores have made it increasingly difficult to use a common clock across an entire system-on-chip (SoC).

In principle, it is possible to use advanced clock-tree synthesis and implementation techniques to distribute a global clock across the SoC with low skew. But variability makes it difficult to close timing for the majority of important process corners and modes. Even with that global clock, individual IP cores will often run according to their local clocks in order to simplify the verification of those blocks and ensure compatibility with key standards such as USB and Ethernet.
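To see how variability erodes timing margin, the following is a minimal sketch of a setup-slack calculation for a single register-to-register path, with flat derating factors applied to the late and early paths. The function name, the delay values and the 8 per cent derates are illustrative assumptions, not figures from any particular process or tool, and the sketch ignores refinements such as removing pessimism on the shared portion of the clock path.

```python
def setup_slack_ps(period_ps, launch_clk_ps, capture_clk_ps, data_ps,
                   setup_ps, late_derate=1.0, early_derate=1.0):
    """Setup slack in picoseconds for one register-to-register path.

    Hypothetical helper: all delays and derates are supplied by the caller.
    """
    # Worst case for setup: the launch clock and data path are slow (late),
    # while the capture clock is fast (early).
    arrival = (launch_clk_ps + data_ps) * late_derate
    required = period_ps + capture_clk_ps * early_derate - setup_ps
    return required - arrival


# Illustrative numbers only: 1 GHz clock, 300 ps of clock-tree delay to
# each register, 900 ps of logic delay, 50 ps setup time.
nominal = setup_slack_ps(1000, 300, 300, 900, 50)
derated = setup_slack_ps(1000, 300, 300, 900, 50,
                         late_derate=1.08, early_derate=0.92)
print(f"nominal slack: {nominal:.1f} ps")
print(f"derated slack: {derated:.1f} ps")
```

With these assumed values the path has about 50 ps of positive slack when the two clock branches are perfectly balanced, but about 70 ps of negative slack once the derates are applied, even though the nominal skew is zero.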

The combination of variability and high clock speeds has made it necessary to adopt advanced techniques that account for on-chip variation (OCV) so that the full chip can close timing (TDF Guide: OCV). This complexity has increased the time spent in the final stages before tapeout, although improved timing analyzers have appeared that use a number of techniques to reduce simulation and analysis overhead. Beyond timing closure, clock-domain crossing (CDC) requires its own analysis to ensure the design is not prone to missed data or metastability. (Further reading: Clock-domain and reset verification in the low-power design era; Clock-domain crossing: guidelines for design and verification success)
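As a rough illustration of why metastability has to be analyzed at each crossing, the sketch below evaluates the commonly used first-order estimate of a synchronizer's mean time between failures, MTBF = exp(t_r / tau) / (T_w * f_clk * f_data). The function name and every parameter value are assumptions chosen for illustration; real values of tau and T_w come from characterizing the flip-flops in a given process.

```python
import math


def synchronizer_mtbf_s(resolve_time_s, tau_s, window_s, f_clk_hz, f_data_hz):
    """First-order MTBF estimate (seconds) for a flip-flop synchronizer.

    resolve_time_s: time allowed for a metastable output to settle
    tau_s:          regeneration time constant of the flip-flop
    window_s:       width of the metastability capture window (T_w)
    f_clk_hz:       receiving clock frequency
    f_data_hz:      rate at which the asynchronous input toggles
    """
    return math.exp(resolve_time_s / tau_s) / (window_s * f_clk_hz * f_data_hz)


# Illustrative, assumed values: 1 GHz receiving clock, 100 MHz data rate,
# tau and T_w of 20 ps each.
period = 1e-9
one_stage = synchronizer_mtbf_s(0.9 * period, 20e-12, 20e-12, 1e9, 100e6)
two_stage = synchronizer_mtbf_s(1.9 * period, 20e-12, 20e-12, 1e9, 100e6)
print(f"one settling period:  {one_stage:.3e} s")
print(f"two settling periods: {two_stage:.3e} s")
```

Because the settling time sits inside the exponential, each extra clock period of resolution time multiplies the estimate by exp(T / tau), which is why adding synchronizer stages is so effective and why the estimate degrades quickly as clock periods shrink.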

Mesochronous architectures

Instead of trying to align the entire SoC with a global clock, an alternative is to use asynchronous protocols that are free from clock-skew issues, although they can be prone to metastability and so must be verified against it. Rather than throw away the synchronous methodology entirely, which has simplified digital logic verification for decades, a middle option is possible: globally asynchronous, locally synchronous (GALS).

A number of recent SoC architectures and experimental devices, such as Intel’s ‘Teraflops’ processor, have used a GALS-based ‘mesochronous’ scheme originally proposed by David Messerschmitt of the Berkeley Wireless Research Center.

With mesochronous clocking, in which clocks share the same frequency but have no fixed phase relationship, there is no requirement to distribute a clock signal with low skew across the chip. A low-skew clock need only be delivered within small islands the size of a few individual processors; the clock delivered to another island can have arbitrary skew. Signals that cross between islands use asynchronous handshakes.
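The handshake between islands can be sketched with a small behavioral model. The code below is a simplified illustration, not a description of any specific device: it assumes a four-phase request/acknowledge protocol with a two-flop synchronizer on each control wire, and the function and parameter names are hypothetical. It models only the ordering of clock edges, not metastability or wire delay.

```python
def transfer(items, period_a=10.0, period_b=10.0, phase_b=3.0):
    """Move items from island A to island B over a four-phase handshake.

    Equal periods with a non-zero phase_b model the mesochronous case;
    unequal periods model fully independent local clocks.
    """
    if period_a <= 0 or period_b <= 0:
        raise ValueError("clock periods must be positive")
    req = ack = 0        # handshake wires crossing between the islands
    data = None          # bundled data, held stable while req is high
    req_ff = [0, 0]      # two-flop synchronizer for req, clocked by B
    ack_ff = [0, 0]      # two-flop synchronizer for ack, clocked by A
    pending = list(items)
    total = len(pending)
    received = []
    t_a, t_b = 0.0, phase_b
    # Run until every item has arrived and both wires are back at idle.
    while len(received) < total or req or ack:
        if t_a <= t_b:                       # rising edge of clock A
            ack_ff = [ack, ack_ff[0]]
            ack_sync = ack_ff[1]
            if not req and not ack_sync and pending:
                data, req = pending.pop(0), 1    # drive data, raise req
            elif req and ack_sync:
                req = 0                          # receiver has the data
            t_a += period_a
        else:                                # rising edge of clock B
            req_ff = [req, req_ff[0]]
            req_sync = req_ff[1]
            if req_sync and not ack:
                received.append(data)            # data is stable here
                ack = 1
            elif not req_sync and ack:
                ack = 0                          # complete the handshake
            t_b += period_b
    return received


print(transfer(["a", "b", "c"], phase_b=7.5))
```

In this model the data arrives intact and in order whatever phase offset is chosen, because the receiver only samples the data after it has seen the synchronized request and the sender only changes the data after it has seen the acknowledge return to idle. The price is latency: each transfer takes several local clock cycles.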

The GALS architecture, particularly in its mesochronous form, has been proposed as a candidate for devices that replace the traditional synchronous bus with a network-on-chip (NoC) communications scheme. Transactions between network routers distributed around the SoC can be implemented asynchronously, while the routers present an easier-to-verify synchronous interface to local IP cores such as processors and I/O modules.

As cores on most SoCs will be designed to power up and down dynamically, the GALS scheme has the benefit of allowing them to decouple from one master clock. The costs are the increased verification overhead of dealing with asynchronous schemes and an area overhead for the synchronizers, which need to be traded off against increasingly complex deskewing mechanisms that themselves impose a die-area penalty. Other considerations that can determine the applicability of GALS are discussed in this interview with Pranav Ashar, CTO of Real Intent.