Challenges of tool, process and design collaboration at advanced nodes
A look at how collaboration between design, process and tool development is becoming increasingly important to get the best out of the most advanced nodes.
Producing successful chips at the 10nm and 7nm process nodes is demanding unprecedented levels of collaboration and co-optimisation between designers and process and tool developers to realise the power, performance and area benefits of these advanced processes in real designs.
ARM has already announced a reference implementation of its A73 core, which achieves a 30% higher peak clock rate than previous offerings, with a 20% power saving compared to the A72 core. To do so, it had to use ARM POP IP libraries that had been specifically optimised for the core, and a reference flow from Synopsys tuned for the design.
Ron Preston, senior principal engineer in the physical design group at ARM, said that it now takes end-to-end optimisation to achieve the required power, performance and area (PPA).
“You can’t just operate top down any more – you have to look at what is physically possible in the process and then work back up the food chain,” he said. For example, the shift to finFET-based processes was made to take advantage of the superior electrostatic control of the channel that comes from moving it into a vertical fin. But as process generations advance, the fins have become taller, which allows higher drive currents but also means higher active and leakage power.
The way to counter that trend is to use fewer fins, which reimposes control over power and reduces area. The issue is that the fins are built into library cells, and in a library you want the most-used cells to be the most efficient. As the number of fins used in the cells is reduced, the cells’ height shrinks faster than their width, creating middle-of-line (MEOL) routing issues. Fixing this, in turn, takes co-optimisation between library designers and process developers, and the co-definition of key IP blocks.
It’s also the case that in these tighter geometries, intrinsic device delay is less significant than cell parasitics, so sizing up transistors is no longer as valuable as it used to be in driving longer lines.
“The partnering aspect of being able to collaborate across the flow is really important,” said Preston.
Chris Schreppel, a staff design consultant at Synopsys, described the design flow developed for the A73 implementation project as using the Galaxy Design Platform, DC Graphical, IC Compiler II, and PrimeTime for STA and ECO timing. The flow used multibit libraries and lots of scripting to achieve banking ratios of more than 80%. The banking happened in two passes: first an analysis of the RTL, and then an analysis based on physical proximity.
“By doing this you reduce the clock, reset and scan pins,” said Schreppel, which helps meet PPA targets. He added that this latest ARM core is better architected to enable integrated clock gating techniques to be applied than previous designs, which demanded more analysis to know where to use it.
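The two-pass banking that Schreppel described can be sketched in a few lines of Python. This is an illustration only, not the Synopsys flow: the `Flop` record, the bus-name heuristic, the greedy proximity grouping, the `max_dist` threshold and the definition of banking ratio as banked flops divided by total flops are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flop:
    name: str   # hypothetical instance name, e.g. "data_reg[3]"
    x: float    # hypothetical placement coordinates
    y: float


def bus_of(name):
    """Strip a trailing [index] to get an assumed bus name."""
    return name.split("[")[0]


def bank_by_bus(flops, width):
    """Pass 1 (RTL view): bank flops that share a bus name."""
    groups = defaultdict(list)
    for f in flops:
        groups[bus_of(f.name)].append(f)
    banks, leftover = [], []
    for members in groups.values():
        full = len(members) - len(members) % width
        for i in range(0, full, width):
            banks.append(members[i:i + width])
        leftover.extend(members[full:])
    return banks, leftover


def bank_by_proximity(flops, width, max_dist):
    """Pass 2 (physical view): greedily bank nearby leftover flops."""
    remaining = list(flops)
    banks, single = [], []
    while remaining:
        seed = remaining.pop(0)

        def dist(f):
            # Manhattan distance from the seed flop
            return abs(f.x - seed.x) + abs(f.y - seed.y)

        near = sorted((f for f in remaining if dist(f) <= max_dist), key=dist)
        if len(near) >= width - 1:
            chosen = near[:width - 1]
            for f in chosen:
                remaining.remove(f)
            banks.append([seed] + chosen)
        else:
            single.append(seed)
    return banks, single


def banking_ratio(flops, width=2, max_dist=5.0):
    """Fraction of flops that end up in a multibit cell (assumed definition)."""
    rtl_banks, leftover = bank_by_bus(flops, width)
    prox_banks, _ = bank_by_proximity(leftover, width, max_dist)
    return width * (len(rtl_banks) + len(prox_banks)) / len(flops)
```

In this toy model, a flop that the bus-name pass leaves unpaired can still be banked in the second pass if another leftover flop sits close enough, which is why the second pass raises the ratio.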
To improve the timing of data RAM blocks, a 50% placement blockage was defined around them to reduce the density in the routing channels. Clock tree gating was done using a concurrent clock and data optimisation strategy, to balance the optimisation of each type of signal for best overall results.
The timing flow was based on PrimeTime, with a version of its delay calculator used in IC Compiler II to ensure good correlation between the two tools’ results. Schreppel argued that ICCII has a little more timing pessimism than PrimeTime, which helps designers meet final timing requirements. He advised them to expect that pessimism, and therefore not to drive ICCII optimisations too hard, so that they didn’t end up with a design that uses excess power unnecessarily.
Schreppel also used Synopsys’ Formality tool for equivalence checking in the flow, and advised that the advanced optimisations in ARM core designs demand the use of Formality’s advanced checking strategies.
“We’ve ended up with a repeatable baseline flow,” said Schreppel.
Willy Chen, deputy director, design and technology platform at TSMC, discussed the challenges of 10nm process nodes compared to those of the 20nm processes the industry was talking about just three years ago. At 10nm, the libraries, layouts and I/O pins all have to be coloured for double or multi-patterning, and the analysis of RC, IR and EM issues, and DRC, has to take colouring into account as well. Place and route tools for this node need to be able to handle full colouring, automatically align pins by colour, and route on pre-coloured routing tracks. This is also true for custom design at this node, where designers need help achieving device matching and critical timing in a coloured process.
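For readers unfamiliar with colouring, double patterning can be thought of as a graph two-colouring problem: shapes that sit too close together to print on one mask must go on different masks. The Python sketch below illustrates that idea only. The conflict list is a hypothetical input that a real flow would derive from spacing rules, and nothing here reflects how TSMC’s or any vendor’s tools are implemented.

```python
from collections import deque


def two_colour(num_shapes, conflicts):
    """Assign each shape to mask 0 or 1 so that conflicting shapes differ.

    conflicts: pairs (a, b) of shape indices that are too close to share
    a mask (assumed to be precomputed from spacing rules).
    Returns a list of mask assignments, or None if an odd cycle of
    conflicts makes a legal two-mask decomposition impossible.
    """
    adjacent = [[] for _ in range(num_shapes)]
    for a, b in conflicts:
        adjacent[a].append(b)
        adjacent[b].append(a)

    colour = [None] * num_shapes
    for start in range(num_shapes):
        if colour[start] is not None:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacent[u]:
                if colour[v] is None:
                    colour[v] = 1 - colour[u]   # neighbour takes the other mask
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None                 # colouring conflict
    return colour
```

Four parallel wires in which each conflicts only with its neighbour alternate cleanly between masks, whereas three mutually conflicting shapes cannot be split across two masks, and the layout would have to change.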
For the 7nm node, an extension of the N10 node that shares a lot of its manufacturing flow, Chen said TSMC would be focusing the process on high-performance computing applications, and so it would need to deliver the best possible clock speeds. From a design point of view, this means moving to a mesh-based clock architecture, to reduce the number of buffer stages in the clock tree, and extending to vias the layer-promotion strategies used at previous nodes to reduce the resistance of clock routes.
From a tool point of view, better correlation between what tools predict and the performance of the resultant chips, for example in crosstalk calculations, will be important to help cut design margins.
Zeng Qiuling, senior ASIC designer at Huawei/HiSilicon, talked about the challenges of designing mobile applications processors and networking chips at advanced nodes. The design priorities for the applications processors included PPA, multivoltage performance and usability, and turnaround time (TAT). For the networking chips, the focus shifts slightly, to PPA, TAT and the design tools’ overall and block capacity. Qiuling said that his team had used various features of ICCII to complete its designs, including placement strategies driven by total negative slack, multi-corner multi-mode clocking and placement, which had enabled a 15% increase in maximum operating frequency, and post-route concurrent clock and data optimisation. Runtimes for their designs using ICCII had fallen threefold, and server memory usage had halved.
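Total negative slack (TNS) is the sum of the slack on every failing timing endpoint, as opposed to worst negative slack (WNS), which reports only the single worst one. A minimal Python sketch of the two metrics follows; the function names are hypothetical and the slack values used with them would be invented for illustration, not taken from the HiSilicon designs.

```python
def worst_negative_slack(slacks):
    """Most negative endpoint slack, or 0.0 if every endpoint meets timing."""
    return min(min(slacks, default=0.0), 0.0)


def total_negative_slack(slacks):
    """Sum of the slack on all failing (negative-slack) endpoints."""
    return sum(s for s in slacks if s < 0)
```

Optimising for TNS rather than WNS rewards a tool for fixing many moderately failing paths, not just the single worst one, which is one reason it is used as a placement cost.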
Henry Sheng, senior director, silicon implementation, and group director, R&D, at Synopsys, said that to support 7nm nodes, the company’s tools will have to manage technology changes such as asymmetric lithography requirements, the use of SADP with cut metal masks, new lithography rules and restrictions, and a metal pitch reduction. They’ll need to support fully coloured flows, a cut metal flow, and take into account the kind of ‘invisible’ shapes, such as mandrels, that are necessary to enable other shapes to be formed. They’ll have to do this for processes that have greater variations than before, and in which parameters vary asymmetrically. To ensure that the tools can both do the job and deliver the required quality of results, the development focus will be on creating tools and flows that can give predictable results out of the box.
“From a tools point of view, these kinds of collaborations are very, very important,” said Sheng.
The discussion took place at a Synopsys breakfast at the Design Automation Conference in Austin, Texas, earlier this year.