At the recent ARM TechCon event, Kelvin Chen, principal engineer at HiSilicon, described how a team at the company used Synopsys’ flow to put together a 16nm finFET-based design built around a cluster of ARM’s Cortex-A57 processors.
Introducing Chen at TechCon, Muming Tang, Synopsys staff applications consultant, explained how the EDA company had built features into the flow to help with the challenges of finFET-based design. One of the key issues with this type of design is not the finFET itself but handling the ramifications of double-patterned lithography. Design-rule restrictions make access to the pins of standard cells more difficult than with older nodes. So, where space is available, Tang said, IC Compiler will use long pins so the router can more easily find an access point.
To avoid stitching problems with double patterning, the router will avoid odd-pitch jogs, ensuring that a route stays on one mask as far as possible. For power routing, the router will avoid via arrays so as not to introduce unnecessary jogs.
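The link between jog length and mask assignment can be illustrated with a minimal sketch. It assumes a simplified model, not described in the talk, in which adjacent routing tracks on a double-patterned layer alternate between two masks. The function names are hypothetical and do not correspond to any IC Compiler command.

```python
def mask_for_track(track: int) -> int:
    """Return mask 0 or 1, ASSUMING adjacent tracks alternate between masks."""
    return track % 2


def jog_needs_stitch(from_track: int, to_track: int) -> bool:
    """A jog lands on the other mask when it spans an odd number of pitches."""
    return mask_for_track(from_track) != mask_for_track(to_track)


if __name__ == "__main__":
    # Illustrative track pairs only: jogs of 1, 2 and 3 pitches.
    for start, end in [(4, 5), (4, 6), (3, 6)]:
        pitches = abs(end - start)
        print(f"jog of {pitches} pitch(es): stitch needed = "
              f"{jog_needs_stitch(start, end)}")
```

Under that assumption, an even-pitch jog keeps the wire on the same mask, which is why a router that avoids odd-pitch jogs also avoids stitches.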
Further changes control the way in which cells are placed and sized to cope with design rules for threshold voltage control. As the wells implanted to suit a certain threshold voltage cannot scale below a certain size, placement engines need to be able to detect isolated cells and increase their area to the required size. “IC Compiler supports this very well,” Chen said.
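The kind of check a placement engine has to perform here can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's algorithm: cells in a row are measured in placement sites, abutting cells of the same threshold-voltage class are assumed to share one implant region, and the `Cell` type, the `implant_violations` helper and the minimum width are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cell:
    name: str
    vt: str      # threshold-voltage class, e.g. "lvt" or "svt"
    x: int       # left edge, in placement sites
    width: int   # width, in placement sites


def _check(island: List[Cell], min_sites: int, out: list) -> None:
    """Record the island if its combined width is below the minimum."""
    span = sum(c.width for c in island)
    if span < min_sites:
        out.append(([c.name for c in island], min_sites - span))


def implant_violations(row: List[Cell], min_sites: int) -> List[Tuple[list, int]]:
    """Return (cell names, shortfall in sites) for each undersized same-Vt island."""
    violations: list = []
    island: List[Cell] = []
    for cell in sorted(row, key=lambda c: c.x):
        if island:
            prev = island[-1]
            abuts = prev.x + prev.width == cell.x
            # A gap or a change of Vt class closes the current island.
            if not (abuts and cell.vt == prev.vt):
                _check(island, min_sites, violations)
                island = []
        island.append(cell)
    if island:
        _check(island, min_sites, violations)
    return violations


if __name__ == "__main__":
    # Hypothetical row: a narrow LVT cell sandwiched between SVT cells.
    row = [Cell("u1", "svt", 0, 4), Cell("u2", "lvt", 4, 2), Cell("u3", "svt", 6, 5)]
    print(implant_violations(row, min_sites=3))
```

The shortfall reported for each island is the extra width that would have to be added, for example by padding the cell, to meet the assumed minimum implant size.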
Moving to 16nm
Starting out as an ASIC design center for Huawei, HiSilicon has been using ARM processors for a number of years, and in 2012 licensed the v8 architecture that lies behind the A57. “Since then we have focused significant effort on developing with ARM cores,” Chen said.
An eight-member team worked on the 16nm finFET project at HiSilicon. Each team member was responsible for one or more blocks, with one of the group taking care of top-level and block integration, Chen explained. Many blocks were reused from earlier designs and the team did a test chip with a single A57 to validate the technologies. For the design, HiSilicon based implementation on the Synopsys high-performance core and emerging node reference methodology. “With this test chip, we had a tapeout-proven flow,” Chen said.
The team faced a number of challenges. Not only was the target frequency for the processors 2GHz, but there were constraints in terms of schedule, as well as the issues raised by the new process.
The HiSilicon team employed a number of layout-aware techniques to maintain a balance between performance and routability. An emerging theme in advanced-node design is the tactical use of placement blockages to avoid congestion. Left to themselves, the tools will typically favor tighter placements because these minimize wire delay. But as tight placement can make detailed routing difficult, it makes sense to spread cells out a little artificially.
A further change made to improve routability was to move away from stacked vias in the upper metals towards a staggered structure, Chen said. I/Os were carefully aligned to the fin pitch and spaced on non-double-patterned layers at double pitch. This helped with top-level integration, he added.
The team used a number of techniques to improve timing, including grouping paths and weighting their nets to ensure the connected cells were placed closer together on the die. “Magnetic placement” of clock signals and the critical output paths of RAMs helped increase overall clock speed. According to Chen, pulling the integrated clock-gating cells that drive the RAMs closer to the clock pins raised the achievable clock frequency by 30MHz.
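To put the 30MHz figure in terms of path delay, the short calculation below converts it into a change in clock period. It assumes the gain is measured against the 2GHz target frequency mentioned earlier. The talk did not state the baseline, so the result is indicative only.

```python
def period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


if __name__ == "__main__":
    base_ghz = 2.0        # ASSUMED baseline: the project's target frequency
    gain_ghz = 0.030      # the 30MHz improvement quoted by Chen
    saved = period_ps(base_ghz) - period_ps(base_ghz + gain_ghz)
    print(f"period shrinks by about {saved:.1f} ps")
```

On that assumption, the improvement corresponds to only a few picoseconds of delay recovered on the critical paths, which gives a sense of how fine-grained timing closure is at this frequency.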
Interconnect parasitics proved a key concern during the project. The team enabled via-resistance estimation to improve the correlation between placement-stage and post-route timing results. It also used the RC scaling factors introduced with the 2013.12 release of IC Compiler to better match timing after detailed route, leading to an improvement of 17 per cent in total negative slack.
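For readers unfamiliar with the metric, total negative slack (TNS) is the sum of the slack on every endpoint that fails timing. The sketch below shows how a percentage improvement in TNS is computed. The slack values are invented for illustration and chosen so the arithmetic lands on 17 per cent; they are not HiSilicon's data.

```python
from typing import Iterable


def total_negative_slack(slacks_ps: Iterable[int]) -> int:
    """Sum of the negative endpoint slacks; passing endpoints contribute zero."""
    return sum(min(s, 0) for s in slacks_ps)


def tns_improvement_pct(before_ps: int, after_ps: int) -> float:
    """Percentage reduction in the magnitude of TNS."""
    return 100 * (abs(before_ps) - abs(after_ps)) / abs(before_ps)


if __name__ == "__main__":
    # HYPOTHETICAL endpoint slacks in picoseconds, for illustration only.
    before = total_negative_slack([-100, -50, 20, -50])
    after = total_negative_slack([-90, -40, 25, -36])
    print(before, after, tns_improvement_pct(before, after))
```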
“We found that for different net lengths and metal layers, resistance varied dramatically,” Chen said. That finding led the team to add rules to control clock routing and its use of metal layers and vias. To reduce their resistance, clock trunks were double-spaced and made double-width on the lower metal layers.
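The effect of widening a wire follows from the first-order relation between wire resistance, sheet resistance and geometry, R = Rs × L / W. The sketch below applies it with made-up numbers. The sheet resistance, length and widths are hypothetical, and the model ignores vias and the width-dependent resistivity seen in very narrow wires.

```python
def wire_resistance(sheet_res_ohm_sq: float, length_um: float, width_um: float) -> float:
    """First-order wire resistance in ohms: R = Rs * L / W."""
    return sheet_res_ohm_sq * length_um / width_um


if __name__ == "__main__":
    rs = 0.5        # HYPOTHETICAL sheet resistance, ohms per square
    length = 100.0  # HYPOTHETICAL trunk length, micrometres
    width = 0.05    # HYPOTHETICAL default wire width, micrometres

    r_single = wire_resistance(rs, length, width)
    r_double = wire_resistance(rs, length, 2 * width)
    print(f"default width: {r_single:.0f} ohm, double width: {r_double:.0f} ohm")
```

In this simple model, doubling the width halves the resistance of the trunk regardless of the particular values chosen.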
Chen said HiSilicon collaborated with Synopsys on the use of multi-tap clock-tree synthesis, using features available in IC Compiler 2013.12. The resulting skew was less than 40ps.
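Skew here is taken in the usual sense of the spread in clock arrival times across the sinks of the tree. A minimal sketch of that definition follows. The sink names and latencies are hypothetical, not measurements from the HiSilicon design.

```python
from typing import Dict


def global_skew_ps(arrival_ps: Dict[str, float]) -> float:
    """Difference between the latest and earliest clock arrival at any sink."""
    times = arrival_ps.values()
    return max(times) - min(times)


if __name__ == "__main__":
    # HYPOTHETICAL clock latencies in picoseconds at four sinks.
    arrivals = {"ff_a": 412.0, "ff_b": 430.0, "ram_0": 447.0, "ff_c": 425.0}
    skew = global_skew_ps(arrivals)
    print(f"skew = {skew:.0f} ps, within 40 ps budget: {skew < 40}")
```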
For the overall routing strategy, the designers increased the via cost to reduce the total number of vias and minimized routing on double-patterned layers. They also configured the router to use the preferred direction on these lower layers where possible.