Smartphone chip supplier MediaTek is extending ARM’s big.LITTLE concept, which matches CPUs to the workload at hand, by developing what it calls a ‘tri-cluster’ CPU.
Speaking at a Synopsys event at the Design Automation Conference earlier this month, Denny Liu, special assistant at MediaTek, outlined the thinking behind the three-CPU, 10-core Helio X20 chip, which is being built in TSMC’s 20nm process.
According to Liu, as smartphone displays get bigger they consume more of the total power budget of the smartphone, pressuring the CPU designers to work even harder to reduce the power consumption of their devices.
MediaTek has addressed the issue by developing whole-system simulations of its CPU designs so that it can explore multiple use cases running real application software. It has used hybrid emulation systems from Synopsys to bring together ARM Cortex processor models, AMBA transactors, interface IP and peripherals, and a ZeBu hardware emulation of the Mali GPU and image processors in one simulation set-up that can run operating system and application software at 6MHz.
The designers then profiled the popular applications used on smartphones and found they clustered in three groups: one group that tends to run for a long time with minimal CPU loading; another group that needs high performance for relatively short amounts of time; and a group in the middle that needs a medium amount of processing power for a medium amount of time.
Liu’s group specified a 1.4GHz CPU built using four ARM Cortex-A53 cores and a private L2 cache to service the long-running, low-performance tasks; and a 2.5GHz, dual-core Cortex-A72 CPU with private L2 cache for the high-performance tasks. The dilemma for the design team was then to decide on the spec of a quad-core ‘mid-cluster’ CPU, with the variables being the clock speed and the version of the Cortex-A5x family to use.
MediaTek used Synopsys’s IC Compiler II physical implementation tool to explore a number of options for this processor, and settled upon a 2GHz, quad-core Cortex-A53 implementation, again with a private L2 cache. The three ‘cluster’ CPUs are linked via their private caches to what MediaTek calls its coherence system interconnect scheme.
MediaTek also exploited IC Compiler II functions such as support for dynamic voltage and frequency scaling, and the ability to merge registers, which lowers power consumption in the clock tree.
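The tri-cluster arrangement can be pictured with a short Python sketch. The core types, core counts and clock speeds are those reported above; the `pick_cluster` function and its demand thresholds are purely hypothetical illustrations, not MediaTek’s scheduling policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    core: str
    cores: int
    clock_ghz: float


# Cluster configuration as reported in the article.
CLUSTERS = (
    Cluster("low", "Cortex-A53", 4, 1.4),
    Cluster("mid", "Cortex-A53", 4, 2.0),
    Cluster("high", "Cortex-A72", 2, 2.5),
)

# Hypothetical thresholds on a normalised CPU demand (0.0 to 1.0).
# These are illustrative assumptions, not figures from MediaTek.
LOW_MAX = 0.3
MID_MAX = 0.7


def pick_cluster(demand: float) -> Cluster:
    """Return the cluster a task of the given normalised demand would map to."""
    if not 0.0 <= demand <= 1.0:
        raise ValueError("demand must be between 0.0 and 1.0")
    if demand <= LOW_MAX:
        return CLUSTERS[0]
    if demand <= MID_MAX:
        return CLUSTERS[1]
    return CLUSTERS[2]


if __name__ == "__main__":
    for demand in (0.1, 0.5, 0.9):
        c = pick_cluster(demand)
        print(f"demand {demand}: {c.name} cluster, {c.cores}x {c.core} at {c.clock_ghz}GHz")
```

A real scheduler would also weigh how long a task runs, thermal limits and the cost of migrating work between clusters; the sketch captures only the three-way split by processing demand.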
Liu also discussed the opportunities that TSMC’s current and emerging finFET processes would bring to reduce power for smartphone chips. Taking the speed of the Helio X10, the predecessor of the X20 discussed here, which was built on TSMC’s 28HPM process, as a normalised 1.0, the Helio X20 on the TSMC N20SOC process should offer 1.2x performance at the same nominal voltage, rising to 1.8x on TSMC’s 16FF+ process and 2.1x on the upcoming TSMC N10 process. There’s also the option to reduce power consumption for chips built on the later nodes by running them at lower supply voltages.
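To see why a lower supply voltage is attractive, the following Python sketch applies the standard first-order approximation that CMOS dynamic power scales with the square of the supply voltage times the clock frequency. It assumes the switched capacitance stays constant and ignores leakage; the 10 per cent voltage reduction is a hypothetical example, not a figure from Liu’s talk.

```python
def relative_dynamic_power(v_ratio: float, f_ratio: float = 1.0) -> float:
    """First-order dynamic power relative to a baseline.

    v_ratio: new supply voltage divided by baseline voltage.
    f_ratio: new clock frequency divided by baseline frequency.
    Assumes constant switched capacitance and ignores leakage power.
    """
    if v_ratio <= 0 or f_ratio <= 0:
        raise ValueError("ratios must be positive")
    return v_ratio ** 2 * f_ratio


if __name__ == "__main__":
    # Hypothetical case: supply voltage cut by 10%, clock unchanged.
    saving = 1.0 - relative_dynamic_power(0.9)
    print(f"Dynamic power saving: {saving:.0%}")
```

Under these assumptions, a 10 per cent cut in supply voltage at an unchanged clock speed trims dynamic power by roughly a fifth, which is why trading some of a new node’s speed gain for lower voltage can pay off.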