ARM has unveiled two new cores, the Cortex-A77 CPU and the Mali-G77 GPU, which it is also bundling into a premium IP platform aimed at 5G smartphones. Both cores offer increased traditional performance and significantly higher machine-learning (ML) performance.
The cores form part of ARM’s response to the increasing fragmentation of the compute market. Designers of artificial intelligence and ML systems have been striving to balance workloads across CPU-, GPU- and FPGA-based platforms, and some large players have already moved toward custom and semi-custom silicon (under the buzzphrase ‘domain-specific architecture’) for datacenters, the cloud and cutting-edge applications.
Through an overarching strategy it calls Total Compute, ARM is looking to bring the same flexible design approach, which these workloads make necessary, to the next generation of mobile devices.
MediaTek has already adopted the premium platform for 5G mobile products due to reach the market in 2020. The platform also includes access to ARM’s software framework, its ML NPU, and other tools and peripherals.
In an article published alongside the Cortex-A77 launch at Computex in Taipei, Stefan Rosinger, Director of Product Management at ARM, summarizes some of the new CPU’s most significant features.
“Compared to Cortex-A76, Cortex-A77 demonstrates a number of performance improvements, including 20 percent plus more integer performance [@ 7nm, 3GHz], 35 percent plus more floating-point performance and 15 percent plus more memory bandwidth improvements,” he writes.
“The continuous performance innovation is enabled by the second generation 7nm designs following on from Cortex-A76.”
For ML, Rosinger sees options for greater efficiency across use cases such as AI cameras, visual scene detection, 3D scanning, face recognition, voice recognition, gaming and augmented reality.
The main features that have powered the improvements in the Cortex-A77 are summarized in Figure 2, and further information on its technical specifications is available here.
The Mali-G77 GPU introduces ARM’s Valhall architecture which, the company says, has enabled a 30 percent increase in both performance density and energy efficiency and a 60 percent improvement in machine-learning performance, based on a like-for-like process comparison with the Mali-G76.
“All of this means that we expect Mali-G77-based devices to deliver a 40 percent better peak graphics performance when it arrives in mobile devices,” writes Andy Craigen in a separate launch paper.
Valhall, which will now be used across future iterations of Mali, includes a number of key innovations. These include:
- A new superscalar engine that underpins the lifts in energy efficiency and performance density;
- A simplified and scalar ISA that is “more compiler-friendly”;
- Dynamic instruction scheduling; and
- Reworked data structures for better alignment with APIs such as Vulkan.
For the execution engine, ARM has doubled the warp width from eight to 16 threads and raised the number of FMA lanes from 24 to 32 (arranged as two clusters of 16 FMA lanes per execution engine). Coupled with a move to one execution engine per shader core, this has allowed its design team to fit 33 percent more compute into the same area compared with the Mali-G76.
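The 33 percent figure follows directly from the lane counts quoted above. The sketch below reproduces that arithmetic under stated assumptions: that the 24 and 32 FMA lanes are per shader core, and that compute per core scales with the lane count. The function name `fma_lanes_per_core` is purely illustrative and is not an ARM tool or API.

```python
# Illustrative arithmetic only. Lane counts come from the article; treating
# them as per-core totals and compute as proportional to lanes are assumptions.

def fma_lanes_per_core(engines_per_core: int, clusters_per_engine: int,
                       lanes_per_cluster: int) -> int:
    """Hypothetical helper: total FMA lanes in one shader core."""
    return engines_per_core * clusters_per_engine * lanes_per_cluster

# Mali-G77 (Valhall): one execution engine per core, two clusters of 16 FMA lanes.
g77_lanes = fma_lanes_per_core(engines_per_core=1, clusters_per_engine=2,
                               lanes_per_cluster=16)

# Mali-G76: the article only gives the per-core total of 24 lanes.
g76_lanes = 24

# Relative gain in FMA lanes per core, expressed as a percentage.
gain_percent = (g77_lanes / g76_lanes - 1) * 100

print(f"G77 FMA lanes per core: {g77_lanes}")
print(f"Gain over G76: {gain_percent:.0f}%")
```

Since ARM says the core area is unchanged, a lane ratio of 32/24 corresponds to the quoted 33 percent rise in compute per unit area.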
Largely with the key gaming market in mind, Valhall implements a quad texture mapper. “[This] provides four texels/cycle. This is 2X greater throughput than Mali-G76 and 4X greater than Mali-G73,” writes Craigen.
“In addition, because we increased the compute capability in Mali-G77, we also need to increase the texture capability to keep the machine balanced.”
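Craigen’s balance argument can be checked with the figures already quoted. The sketch below derives the older texel rates from the “2X” and “4X” ratios and compares texture growth against compute growth. It assumes the four texels/cycle figure is per shader core. The “FMA lanes per texel” metric is a hypothetical measure introduced only for this illustration, not an ARM specification.

```python
# Texel rates per core per cycle, derived from Craigen's quoted ratios.
# Assumption: the "four texels/cycle" figure is per shader core.
G77_TEXELS = 4
G76_TEXELS = G77_TEXELS / 2   # "2X greater than Mali-G76"
G73_TEXELS = G77_TEXELS / 4   # "4X greater than Mali-G73"

# FMA lanes per core, as quoted for the execution engine (24 -> 32).
G76_FMA, G77_FMA = 24, 32

def fma_per_texel(fma_lanes: float, texels_per_cycle: float) -> float:
    """Hypothetical balance metric: FMA lanes available per texel fetched per cycle."""
    return fma_lanes / texels_per_cycle

g76_ratio = fma_per_texel(G76_FMA, G76_TEXELS)  # 12 lanes per texel
g77_ratio = fma_per_texel(G77_FMA, G77_TEXELS)  # 8 lanes per texel

compute_growth = G77_FMA / G76_FMA        # about 1.33x
texture_growth = G77_TEXELS / G76_TEXELS  # 2x

print(f"Derived texel rates: G73={G73_TEXELS}, G76={G76_TEXELS}, G77={G77_TEXELS}")
print(f"FMA lanes per texel: G76={g76_ratio}, G77={g77_ratio}")
print(f"Growth: compute x{compute_growth:.2f}, texture x{texture_growth:.1f}")
```

On this reading, texture throughput grows faster than compute. Under the stated assumptions, the larger FMA array is therefore less likely to stall waiting on texture fetches in texture-heavy gaming workloads.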
Tuning for AI and ML
In a Computex presentation, ARM’s Ian Smythe said that the company’s “objective” is to make system development “use-case driven”.
For the premium platform, the initial focus is likely to be on enhancing and extending existing consumer mobile applications and handset features (ARM has a separate AIoT product strand). Kevin Jou, CTO of MediaTek, echoed this view that ML’s initial ‘killer application’ will largely be more of the same when he announced his company’s adoption of the platform.
But the strategy, and the likelihood that it will gradually trickle down from high-end to mainstream handsets, again underlines how AI and ML, with their growing complexity and demands on compute, are forcing developers to tune across multiple processor types to get results.
Even in the course of ARM’s own work to get more ML performance out of its cores, Smythe explained that the company has had to look beyond traditional Moore’s Law/process improvements and optimize at a software and system level.
This, he said, has allowed the company to raise its generational performance gains from 4 percent to 35 percent across the 76 and 77 generations.
The name of the game is “optimization from the ground up” whether you are developing IP, the hardware that runs it or the applications that run on them.