ARM has named the first pair of processor cores that employ its 64bit architecture and revealed that an architectural clean-up is likely to result in the smaller of the pair requiring less silicon real estate than the existing 32bit Cortex-A9.
AMD and other companies, such as Calxeda, plan to use the forthcoming Cortex-A50 series processor cores in servers and network switches that they anticipate will have a smaller energy footprint than those currently in use, most of which are based on x86 processors. STMicroelectronics sees a wider applications base for the 64bit processor cores.
To support its move into servers, ARM has developed a 64bit instruction set to expand the amount of memory its processors can address, following the move by other server architectures such as the x86. Noel Hurley, vice president of marketing and strategy in ARM’s processor division, said the opportunity of getting a “clean sheet of paper to develop a new architecture” would enable performance improvements in the shift from 32bit to 64bit code, not just for server applications but also for smartphones.
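To put the addressing argument in numbers, here is a minimal sketch that assumes a flat, byte-addressed memory in which every address bit is usable. The function name `addressable_bytes` is illustrative only, and real 64bit implementations typically wire up fewer than 64 address bits.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a flat address of the given width (assumes byte addressing)."""
    return 2 ** address_bits


if __name__ == "__main__":
    # A 32bit address reaches 4 GiB; a full 64bit address reaches 2**34 GiB.
    print(addressable_bytes(32) // GIB)
    print(addressable_bytes(64) // GIB)
```

Under these assumptions a 32bit pointer tops out at 4GiB, which is the ceiling that server workloads run into first.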
Gian Luca Bertino, executive vice president and general manager of STMicroelectronics’ digital convergence group, said: “It’s not just for servers. This is a general architecture that will go everywhere. We see 64bit taking over. It will be an evolution that every kind of system will see.”
Bertino argued that, although mobile devices might not require the memory addressing range of servers, being able to process bigger data words in image, network and audio processing would provide advantages over traditional embedded 32bit processors.
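Bertino's point about wider data words can be illustrated with a toy model, offered as a sketch rather than a benchmark. It XORs two buffers one machine word at a time and counts the word operations needed. The function `xor_buffers` and its `word_bytes` parameter are hypothetical names introduced for this example.

```python
def xor_buffers(a: bytes, b: bytes, word_bytes: int):
    """XOR two equal-length buffers one word at a time; return (result, word_ops)."""
    if len(a) != len(b) or len(a) % word_bytes:
        raise ValueError("buffers must match and be a whole number of words")
    out = bytearray()
    ops = 0
    for i in range(0, len(a), word_bytes):
        wa = int.from_bytes(a[i:i + word_bytes], "little")
        wb = int.from_bytes(b[i:i + word_bytes], "little")
        out += (wa ^ wb).to_bytes(word_bytes, "little")
        ops += 1  # one register-wide operation per word
    return bytes(out), ops


if __name__ == "__main__":
    a, b = bytes(range(64)), bytes([0xFF]) * 64
    print(xor_buffers(a, b, 4)[1], xor_buffers(a, b, 8)[1])
```

The result is identical either way; the 64bit pass simply needs half as many word operations over the same data.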
ARM option quicker for AMD low-power plan
Suresh Gopalakrishnan, corporate vice president and general manager of AMD’s server business, said the decision to adopt ARM for low-power servers is a “time to market issue”, adding, “The x86 will get there. There is nothing inherent in the architecture that makes low power impossible to achieve with the x86.”
Hurley said the ARM 64bit processors will be able to switch between 32bit and 64bit modes, with 64bit operating systems hosting older 32bit applications until they are recompiled into 64bit code.
Hurley claimed the Cortex-A57 would offer three times the performance of today’s smartphone processors, with its simpler sibling, the A53, handling low-power tasks when workloads are light. This pairing follows the Big-Little approach that ARM launched last year, joining the low-power A7 processor to the current high-end A15 core.
When implemented on the same process, the A53 takes less silicon space than the older A9. This is largely due to design choices based on experience with real-world code that have allowed the engineering team to streamline parts of the pipeline or remove unnecessary logic.
“The software environment changes as you move forward, which leads you to make different tradeoffs in the pipeline architecture,” said Hurley.
Like the A9, the A53 has a dual-issue, in-order superscalar pipeline, but it implements an ARMv8-A-compliant architecture so that it can execute the same code as the A57 and take over when workloads are light enough to allow the A57 to be switched off.
The performance improvements that ARM claims for the A57 come partially from the expanded single-instruction, multiple-data (SIMD) engine used for digital signal and image processing operations, Hurley said. The three-fold improvement, he added, is for 32bit applications: “They come from a wider pipeline doing more tasks in parallel.”
The A57 is at its heart a four-issue superscalar machine that, like its A15 predecessor, will execute instructions out of order. However, Hurley stressed that some of the performance increase is also down to changes in the cache and memory hierarchy. “There is no silver bullet in terms of performance,” he said. “Performance comes at the cost of logic. But Big-Little allows you to deliver that in a power efficient way.”
Ian Drew, executive vice president of marketing at ARM, added: “If you add Big-Little in, you don’t have to try to optimize one core for two things. You can stretch that battery [with the A53].”
Hurley said one architectural change can help with common compiler optimizations. “The big difference we’ve made on the 64bit architecture is that it does not have banked registers. It’s a flat register file, which provides the compiler with a much richer register set. It means it can fit more of the stack into the register set, which reduces the number of pushes and pops from memory,” he explained.
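Hurley's register argument can be sketched with a deliberately crude model: any live value that does not fit in a register has to be spilled to the stack. The register counts below are assumptions added for illustration and are not figures from ARM's briefing: roughly 13 freely usable registers in the 32bit instruction set (r0-r12) against 31 general-purpose registers in the 64bit set (x0-x30). The function name `spills` is hypothetical.

```python
# Assumed register budgets, for illustration only.
AARCH32_GPRS = 13  # r0-r12; r13-r15 are taken by SP, LR and PC
AARCH64_GPRS = 31  # x0-x30; calling conventions reserve some of these


def spills(live_values: int, usable_registers: int) -> int:
    """Values that must live on the stack when registers run out (toy model)."""
    return max(0, live_values - usable_registers)


if __name__ == "__main__":
    # A function with 20 simultaneously live values.
    print(spills(20, AARCH32_GPRS), spills(20, AARCH64_GPRS))
```

A real compiler's register allocator is far more subtle than this, but the direction of the effect is the one Hurley describes: fewer values pushed to and popped from memory.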
The 64bit instructions are fixed-length, with a regular opcode structure designed to reduce the work needed to decode them. To save space, and to allow more instructions to be fetched on each cycle, the instructions are, like the original ARM set, 32 bits wide. Mode switches, similar in concept to those used for Thumb, maintain a separation between 32bit and 64bit instructions. To save die space, Hurley said, “We can reuse much of the logic in the 32bit world in the 64bit instructions.”
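One consequence of fixed-length encoding is that instruction boundaries are known before anything is decoded. The sketch below assumes little-endian instruction storage and uses a hypothetical helper, `split_fixed_length`, to cut a code buffer into 32bit words. The two example words are the 64bit instruction set's NOP and RET encodings.

```python
def split_fixed_length(code: bytes, width: int = 4) -> list:
    """Cut a code buffer into fixed-width little-endian instruction words."""
    if len(code) % width:
        raise ValueError("buffer is not a whole number of instructions")
    # Every boundary falls at a multiple of `width`; no decoding is needed to find it.
    return [int.from_bytes(code[i:i + width], "little")
            for i in range(0, len(code), width)]


if __name__ == "__main__":
    buffer = bytes.fromhex("1f2003d5c0035fd6")  # NOP followed by RET
    print([hex(word) for word in split_fixed_length(buffer)])
```

With a variable-length instruction set, by contrast, the position of each instruction depends on the length of the one before it, which is part of the decode work the regular structure avoids.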
Hurley said ARM rejected simultaneous multithreading as an option. Although it can be used to hide the latency of memory accesses in parallel applications – a technique used heavily in GPUs – multithreading complicates the design of the pipeline itself. The tradeoff did not make sense for the engineering team, he said.