Cadence reworks implementation for both finFET and older processes
Cadence Design Systems has coupled the parallel-processing techniques behind its recently launched sign-off tools to engines intended to deal with sub-28nm process issues in a suite that revamps the company’s implementation tools.
Rahul Deokar, product marketing manager for the Innovus toolsuite, claimed the optimization engines it uses can deliver up to 20 per cent better power, performance and area. “It’s like half a node,” he said, adding that the parallel processing also improves overall turnaround time.
“The improvements also apply to mixed-signal designs on more mature process nodes,” Deokar added.
One of the engines, GigaPlace, replaces the old placement software and uses slack- rather than timing-driven optimization techniques that avoid the need to manually add weights to nets to control placement. The placer is layer-aware, allowing better control over use of the precious low-resistance, wide-metal layers.
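For readers unfamiliar with the term, slack is the margin between when a signal must arrive and when it actually does. The sketch below is a textbook static-timing calculation on a toy netlist, not Cadence's GigaPlace algorithm; the function name `compute_slacks`, the node names and the picosecond delays are all assumptions made for illustration. It shows how the critical nodes fall out of the timing graph itself, without anyone hand-assigning weights to nets.

```python
from collections import defaultdict, deque


def compute_slacks(edges, required_time):
    """Return {node: slack} for a timing DAG.

    edges: list of (src, dst, delay) tuples, delays in picoseconds.
    required_time: latest allowed arrival at every endpoint, in picoseconds.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for src, dst, delay in edges:
        succ[src].append((dst, delay))
        indeg[dst] += 1
        nodes.update((src, dst))

    # Topological order (Kahn's algorithm) so each node is visited
    # only after all of its fan-in has been processed.
    remaining = {n: indeg[n] for n in nodes}
    queue = deque(sorted(n for n in nodes if remaining[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m, _ in succ[n]:
            remaining[m] -= 1
            if remaining[m] == 0:
                queue.append(m)

    # Forward pass: latest arrival time at each node.
    arrival = {n: 0 for n in nodes}
    for n in order:
        for m, d in succ[n]:
            arrival[m] = max(arrival[m], arrival[n] + d)

    # Backward pass: earliest required time at each node.
    required = {n: required_time for n in nodes}
    for n in reversed(order):
        for m, d in succ[n]:
            required[n] = min(required[n], required[m] - d)

    return {n: required[n] - arrival[n] for n in nodes}


# Hypothetical two-path netlist between two flip-flops, 1000 ps clock period.
toy_edges = [
    ("ff1", "g1", 300), ("g1", "g2", 400), ("g2", "ff2", 200),  # long path
    ("ff1", "g3", 100), ("g3", "ff2", 200),                     # short path
]
slacks = compute_slacks(toy_edges, required_time=1000)
```

In this toy case the nodes on the long path end up with the smallest slack, so a slack-driven placer would know to keep them close together, while `g3` on the short path has room to move.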
Image: Concurrent clock and data path optimization using CCOpt
The placement engine is followed by CCOpt, which performs power-, timing- and area-driven optimizations.
“The older technologies are heuristic. This one is analytical and can model physical parameters like congestion and wire length,” Deokar said. “The transformations in CCOpt are fully power aware. Although leakage power is important, the dynamic power has become hugely important in newer process nodes and so [you] need power-driven optimization.”
Clock optimizations include technology brought in with the acquisition of UK-based Azuro. A further change lies in the adoption of H-tree clocking structures using an approach Cadence calls “FlexH” that is designed to provide better cross-corner variation control.
“This is part of the CCOpt engine,” said Deokar. “Conventional H-trees have to be laid out manually, so very often when [one] is used it is a process performed separately and is error prone and effort-intensive. Also, regular H-trees are very power hungry, which is another limitation. Because of these issues, there has not been massive deployment of the methodology and [it] is typically restricted to high-end microprocessor designs.
“We came up with a method that starts with synthesis that then can shift to an H-tree that goes to the register endpoints. It combines regular clock-tree synthesis with the H-tree structure, but automates the entire process. Because it’s a hybrid structure, it gives better power consumption.”
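To illustrate why H-trees are attractive for skew control, here is a minimal sketch of a conventional, fully symmetric H-tree. It is not FlexH, whose internals the company did not detail; the function name `h_tree` and the dimensions are hypothetical. The point it demonstrates is that every leaf tap sits at the same wire distance from the root, which is also why a full H-tree spends so much wire, and therefore power, on the clock.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def h_tree(center: Point, half_w: float, half_h: float, levels: int):
    """Build a symmetric H-tree.

    Returns (segments, leaves), where segments are wire pieces and
    leaves is a list of (tap_point, wire_length_from_root).
    """
    segments: List[Segment] = []
    leaves: List[Tuple[Point, float]] = []

    def recurse(c: Point, hw: float, hh: float, level: int, dist: float) -> None:
        if level == 0:
            leaves.append((c, dist))  # clock tap point
            return
        x, y = c
        left, right = (x - hw, y), (x + hw, y)
        segments.append((left, right))  # horizontal bar of the H
        for ex, _ in (left, right):
            top, bottom = (ex, y + hh), (ex, y - hh)
            segments.append((bottom, top))  # one vertical leg of the H
            for tap in (top, bottom):
                # Each tap is hw + hh of wire further from the root;
                # the next, smaller H is centered on it.
                recurse(tap, hw / 2, hh / 2, level - 1, dist + hw + hh)

    recurse(center, half_w, half_h, levels, 0.0)
    return segments, leaves


# Two levels of H: 16 taps, all the same wire length from the root.
segments, leaves = h_tree((0.0, 0.0), half_w=4.0, half_h=4.0, levels=2)
distances = {d for _, d in leaves}
```

A hybrid of the kind Deokar describes would, presumably, use a structure like this for the top levels and hand over to ordinary clock-tree synthesis nearer the registers.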
Customer usage
A number of customers have used Innovus so far, including ARM for the design of the recently launched Cortex-A72 microprocessor core. Deokar said the tool was able to deliver an estimated clock speed of 2.6GHz with a five-fold improvement in runtime over the previous generation of implementation tools. “This strong collaboration we have with ARM helps our mutual customers. Out of the box they will get impressive PPA results using the Cadence solution,” Deokar claimed.
Another customer reportedly pushed the frequency of its processor to 3GHz, beating an earlier frequency target by 10 per cent, while using fewer scripts to control placement.
The parallel architecture allowed a number of designs to move from a hierarchical implementation approach back to a flat one through the ability to use blocks with much higher instance counts. According to Cadence, blocks can have 10 million instances or more.
To demonstrate the tool on older nodes, one customer ran a mixed-signal 32-bit microcontroller design. “They are moving to IoT applications and needed a smaller footprint,” Deokar said. “They had both multi-Vt and multi-Vdd requirements. The new flow allowed better mixed-signal floorplanning and reduced the die size by 15 per cent as well as the number of manual iterations needed to complete the design.”