Dynamic power optimization
FinFETs present a number of problems with respect to dynamic power consumption. Part of the issue is that dynamic power rises in importance because the three-walled devices exhibit reduced leakage from short-channel effects. But the three-dimensional nature of the gate structure leads to increased capacitance that, in turn, leads to higher power consumption on each change in state.
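The link between capacitance and power on each change in state can be sketched with the standard first-order expression for switching power, P = a * C * V^2 * f. The function name, parameter names, and numeric values below are illustrative assumptions, not figures from the article.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order switching power in watts: P = a * C * V^2 * f.

    activity      -- average fraction of the capacitance switched per cycle
    capacitance_f -- total switched capacitance in farads
    vdd_v         -- supply voltage in volts
    freq_hz       -- clock frequency in hertz
    """
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical block: 1 nF switched capacitance, 0.8 V supply, 1 GHz clock.
baseline = dynamic_power(0.1, 1e-9, 0.8, 1e9)
# The same block with 20 per cent more capacitance (an assumed figure).
higher_cap = dynamic_power(0.1, 1.2e-9, 0.8, 1e9)
print(f"baseline {baseline:.4f} W, higher capacitance {higher_cap:.4f} W")
```

In this model power scales linearly with capacitance, so any capacitance added by the gate structure shows up directly in the dynamic power figure unless activity, voltage or frequency is reduced to compensate.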
Figure: Schematic of a merged 2-bit flop (Source: NCTU)
At the physical level, dynamic power optimization techniques today focus on three main areas: clock power reduction, glitch control, and logic activity minimization. Clock gating has provided one means for cutting the power consumption of the clock network and the logic it drives. Recently, tool support has emerged for flop merging, in which one clock signal drives two or more parallel flops, reducing clock power to the combined cell by 50 per cent or more compared to individual cells. A 4-bit cell can save 20 per cent of overall power and 30 per cent of area compared to four normal flops, according to Professor Iris Hui-Ru Jiang and coworkers of the National Chiao-Tung University in a paper presented at ISPD 2013. Process technology changes have steadily made the use of multibit flops more practical, as noted by Yao-Tsung Chang and colleagues from the National Chung Cheng University in a paper at ICCAD 2010:
“Each flip-flop contains two inverters to generate opposite-phase clock signals. As the process technology advances to 65 and beyond, even a minimum-sized inverter can still drive multiple flip-flops.”
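The effect of sharing clock inverters can be illustrated with a simple count. This is a minimal sketch that assumes, following the quoted description, that every cell carries one pair of clock inverters regardless of how many bits it holds. The function name is hypothetical, and the model ignores any upsizing of the shared inverters.

```python
from math import ceil


def clock_inverter_count(n_flops, bits_per_cell=1):
    """Clock inverters needed for n_flops bits, assuming two per cell."""
    if n_flops < 0 or bits_per_cell < 1:
        raise ValueError("n_flops must be >= 0 and bits_per_cell >= 1")
    cells = ceil(n_flops / bits_per_cell)
    return 2 * cells


print(clock_inverter_count(4))                   # four single-bit flops -> 8
print(clock_inverter_count(4, bits_per_cell=2))  # two 2-bit cells -> 4
print(clock_inverter_count(4, bits_per_cell=4))  # one 4-bit cell -> 2
```

The count only shows why fewer clock inverters switch per stored bit. It does not reproduce the power and area percentages reported in the papers, which depend on the cell design.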
Although multibit merging allows significant savings in clock power for register-intensive circuits, there are potential issues with its use. First, the cell library needs to contain multibit flops. Then there is the question of the stage at which merging is performed. If it is handled during synthesis, the merged flops may drive logic paths that need to be placed far away from each other.
Physical synthesis or placement engines can guide compatible flops toward each other and then ‘bond’ them by replacing with a multibit equivalent. Tool flows from companies such as Cadence Design Systems, Mentor Graphics and Synopsys will deal with multibit merging.
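Placement-aware merging can be sketched as a greedy pairing of nearby, compatible flops. This is not the algorithm used by any of the tools or papers mentioned. The `Flop` record, the `merge_flops` function, and the distance threshold are hypothetical names chosen for illustration, and compatibility is reduced here to sharing a clock net.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Flop:
    name: str
    x: float
    y: float
    clock: str  # assumed compatibility rule: flops must share a clock net


def merge_flops(flops, max_dist):
    """Greedily pair each flop with its nearest compatible neighbour.

    Returns (pairs, singles): name pairs to replace with a 2-bit cell, and
    the names of flops left as single-bit cells.
    """
    unmerged = list(flops)
    pairs, singles = [], []
    while unmerged:
        a = unmerged.pop(0)
        candidates = [
            b for b in unmerged
            if b.clock == a.clock and hypot(a.x - b.x, a.y - b.y) <= max_dist
        ]
        if candidates:
            b = min(candidates, key=lambda f: hypot(a.x - f.x, a.y - f.y))
            unmerged.remove(b)
            pairs.append((a.name, b.name))
        else:
            singles.append(a.name)
    return pairs, singles


flops = [
    Flop("f0", 0.0, 0.0, "clk"),
    Flop("f1", 1.0, 0.0, "clk"),
    Flop("f2", 50.0, 50.0, "clk"),  # same clock, but too far away
    Flop("f3", 0.5, 0.0, "clk2"),   # close by, but on a different clock
]
print(merge_flops(flops, max_dist=5.0))
```

The distance limit stands in for the placement concern described above: flops whose logic needs to sit far apart are left unmerged rather than being forced into one cell.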
Glitch and logic activity
Glitching has the potential to become a significant contributor to active power in finFET-based designs as some of the techniques traditionally used to minimize the problem are not readily available. The glitches arise in combinatorial logic because signals arriving at different times can drive spurious outputs that, although they should not be latched downstream, will consume power on each change in output.
Gate delay tends to filter out many glitches, although path balancing through buffer insertion may be needed to reduce the potential for hazards, with the associated problem of increasing power and area. Sizing the transistors to alter their delay profile can be used to tune out glitches but this is more difficult with finFETs because they are quantized in terms of size.
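How arrival-time differences create glitches, and how gate delay or path balancing removes them, can be shown with a toy event model. This is a minimal sketch under simplifying assumptions: a single gate, integer times in picoseconds, and an inertial-delay rule in which any output pulse narrower than the gate delay is swallowed. All names are hypothetical.

```python
def count_output_toggles(events, func, initial, gate_delay=0):
    """Count output toggles of one gate after inertial filtering.

    events     -- list of (time_ps, input_name, new_value)
    func       -- the gate's logic function, called with keyword inputs
    initial    -- dict of initial input values
    gate_delay -- pulses narrower than this (in ps) are filtered out
    """
    state = dict(initial)
    out = func(**state)
    changes = []  # times at which the ideal (zero-delay) output flips
    for t in sorted({time for time, _, _ in events}):
        for time, name, value in events:
            if time == t:
                state[name] = value
        new = func(**state)
        if new != out:
            changes.append(t)
            out = new
    kept = []
    for t in changes:
        if kept and t - kept[-1] < gate_delay:
            kept.pop()  # the pulse is too narrow to propagate
        else:
            kept.append(t)
    return len(kept)


def xor2(a, b):
    return a ^ b


start = {"a": 0, "b": 0}
skewed = [(100, "a", 1), (120, "b", 1)]    # inputs arrive 20 ps apart
balanced = [(100, "a", 1), (100, "b", 1)]  # paths balanced by buffering

print(count_output_toggles(skewed, xor2, start, gate_delay=10))    # 2
print(count_output_toggles(skewed, xor2, start, gate_delay=50))    # 0
print(count_output_toggles(balanced, xor2, start, gate_delay=10))  # 0
```

The XOR output should end where it started, so both toggles in the first case are wasted switching. A slower gate or balanced arrival times remove them, which is the trade-off the paragraph above describes.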
Researchers such as Arunkumar Vijayakumar and Sandip Kundu of the University of Massachusetts have proposed tuning clock skew instead, using a variant of the techniques employed for peak current and ground bounce reduction.
Glitch suppression provides a means of reducing spurious on-chip activity. Attention is now turning to the switching behavior of logic rather than speed and area. Design and synthesis techniques are being developed that attempt to prevent gates from switching needlessly.
For example, 2D-multiplier arrays can be restructured to temporarily disable rows of adders in which the multiplicand is zero. In control structures, the evaluation of some signals can be delayed until the outcome of higher-priority decisions indicates that those signals are needed.
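The multiplier example can be sketched as a shift-and-add model in which a row of adders does work only when its operand bit is set. This is an illustrative software model, not a hardware description. It assumes one adder row per bit of the second operand, and the function name and `width` parameter are hypothetical.

```python
def multiply_with_row_bypass(a, b, width=8):
    """Multiply non-negative integers, skipping rows whose operand bit is 0.

    Returns (product, active_rows). A skipped row passes the partial sum
    through unchanged, so its adders would not need to switch.
    """
    if a < 0 or b < 0 or b >= 1 << width:
        raise ValueError("operands must be non-negative and b must fit in width bits")
    product = 0
    active_rows = 0
    for row in range(width):
        if (b >> row) & 1:
            product += a << row  # this row adds a shifted copy of a
            active_rows += 1
        # otherwise the row is bypassed and contributes nothing
    return product, active_rows


print(multiply_with_row_bypass(13, 5))  # (65, 2): 5 is 0b101, so 2 of 8 rows are active
```

In this model the number of active rows equals the number of set bits in the second operand, so sparse operands leave most of the array idle.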
Re-evaluating logic styles
Figure: Energy efficiency comparison of custom and synthesized ARM cores (Source: Mentor Graphics)
There are suggestions that a move away from the static CMOS logic generally created by synthesis could improve power efficiency. Although not practical for a full SoC, the circuit topologies used in full- or semi-custom design, such as dynamic or domino logic, could be deployed on highly active cores to cut peak power consumption. Because dynamic logic uses fewer transistors for a given function than static CMOS, it is typically more area efficient and the transistors drive lower capacitances, further helping to save energy.
In a presentation at ISPD in 2013, David Chinnery of Mentor Graphics showed how ARM processors designed using custom techniques displayed up to seven times better energy efficiency than later fully synthesized designs. Where the later designs were more power efficient, it was because they used significantly more advanced process technologies.
Pulsed-static CMOS logic could provide a useful compromise between fully static CMOS and the high-speed domino logic used in custom processor core designs, for which automated tool support is rare, if available at all. According to Chinnery, this form of logic is approximately 25 per cent faster than fully static CMOS, providing a little extra timing slack that could be harnessed to reduce power through the use of smaller transistors. Pulsed-static CMOS relies on the use of glitch-free cells. However, an increased focus on glitch removal in fully static CMOS may provide the impetus to develop automated techniques for glitch-free pulsed-static CMOS logic or possibly even domino logic, now that it is possible to take advantage of the lower leakage of finFET or FD-SOI processes.