When ARM’s 64
There’s already some love out there for ARM’s v8 64bit architecture as the processor giant builds out its ecosystem.
It was an architectural announcement. Even its positioning on the last day of the ARM TechCon event was intended to emphasize that the company’s confirmation of its move into 64bit processing is one for the future, albeit the relatively near future.
The ARMv8 was unveiled in its applications form only and then with just the basic details. According to Mike Muller, chief technology officer, the main reasons for making the announcement now are to clarify the roadmap and to allow time for the construction of an appropriate support ecosystem around the new core.
That second point is important. Once upon a time, the ARM “Connected Community” was relatively small even though the technology was becoming increasingly influential. Today, it has more than 770 members and continues to grow.
The idea that ARM could quietly nurture its 64bit architecture toward a commercial release without any details leaking across this size of ecosystem simply doesn’t hold water. More to the point, getting the best support out there for what is likely to be a fairly bloody commercial battle will involve getting the best tools and other support built around the v8 quickly.
A tough fight
And when we say “bloody” we mean it. ARM has again bearded Intel with a direct challenge to the x86 architecture. However, its intentions with the v8 are not directly or even largely focused on the desktop.
Yes, the move to 64bit does feed into Microsoft’s decision to tailor its still dominant Windows PC operating system (OS) for ARM-based chips as well as traditional x86 ones. At the announcement, K.D. Hallman, a general manager with the software giant, was on hand to provide one of the potted quotations.
“ARM is an important partner for Microsoft. The evolution of ARM to support a 64bit architecture is a significant development for ARM and for the ARM ecosystem. We look forward to witnessing this technology’s potential to enhance future ARM-based solutions,” he said.
Also present was Nvidia, with its declared intentions in low-power processors to take it beyond its historical strength in graphics.
“The combination of Nvidia’s leadership in energy-efficient, high-performance processing and the new ARMv8 architecture will enable game-shifting breakthroughs in devices across the full range of computing, from smartphones through to supercomputers,” said Dan Vivoli, senior vice president.
Muller’s comments at ARM TechCon, however, suggested that the sweet spot his company sees is servers and other enterprise-class hardware, a market that faces demands for very high performance and, perhaps more important, pressure to cut power.
Figure 1
Application profiles for the ARMv8
Up and away
According to the Environmental Protection Agency, U.S. spending on the energy that powers servers will exceed $7B this year and has risen by more than 40% in the last five years.
With the move of an increasing amount of functionality and storage to the cloud—key features in the recent high profile launches of the Amazon Fire tablet and the Mac OS X Lion—it seems fair to assume that server activity will increase, but the degree to which such increases in power consumption are acceptable, economically or environmentally, must be open to question.
That has opened a window of opportunity for ARM. While the enterprise market is demanding in terms of performance, it is also relatively conservative at its heart. IT managers of huge, complex systems are wary of major architectural changes unless they are forced upon them. Change is risk, and in a world where the term “mission critical” is more than a cliché, risk must be mitigated to the greatest degree.
Now, however, the burgeoning demands already placed on server farms are being compounded by those likely to spring from consumer devices and productivity-focused tablets that also pull from them. That scenario could apply even within an individual company and its staff network. Throw consumers into the mix, and the likelihood of a further hugely costly ramp in power consumption becomes clear. All that plays to ARM’s strengths.
One other important point here is commoditization. Server chips attract far chunkier margins—some reports have put these as high as 67%—than either PC processors or ARM-based smartphone chips.
By contrast, shortly before putting the v8 on its roadmap, ARM also announced the Cortex A7 MPCore processor and introduced its concept of big.LITTLE processing. By combining the low-power focus of the existing Cortex A8 with high-performance features from the Cortex A15, ARM has assembled a clever combination.
But the product’s focus is largely on “sub-$100 entry-level smartphones.” There’s demand for this stuff alright, and not just in emerging markets. But the inherent price sensitivities are obvious.
Certainly, ARM’s recent relationship with analysts has been marked by plaudits for its success in mobile communications but also warnings about the perennially aggressive shrinkage in margins for that market. And there is also a long-standing requirement set upon ARM to prove itself beyond that space—something it has already addressed with a successful foray into microcontrollers and which now will also play out in servers and elsewhere with v8.
Having addressed a large part of the commercial background to the move to 64bit, let’s now go under the hood.
First look
Last year, ARM introduced the Large Physical Address Extension (LPAE) to translate the 32bit virtual addresses within its v7 architecture into 40bit physical addresses. However, because virtual addresses stayed at 32 bits, the address space available to any one process remained 4GByte, insufficient for the more computationally complex software that runs on servers, particularly for database management. The v8 now fills that gap.
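As a rough check on those figures, the sketch below converts an address width into the number of bytes it can reach. The helper names are purely illustrative and the unit labels simply follow this article's spelling.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given width."""
    return 1 << address_bits


def human(size_in_bytes):
    """Format a power-of-two byte count using binary (x1024) steps."""
    units = ["Byte", "KByte", "MByte", "GByte", "TByte", "PByte"]
    index = 0
    while size_in_bytes >= 1024 and index < len(units) - 1:
        size_in_bytes //= 1024
        index += 1
    return f"{size_in_bytes}{units[index]}"


# The two widths quoted for the v7 architecture with LPAE.
for label, bits in [("32bit virtual address", 32),
                    ("40bit physical address", 40)]:
    print(f"{label}: {human(addressable_bytes(bits))}")
```

The 32bit case gives the 4GByte ceiling mentioned above, while the 40bit physical side reaches 1TByte, which is why LPAE helps a system as a whole more than it helps any single large application.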
For the launch, the detail provided is mainly intended for OS and compiler companies as well as those providing tool support to hardware designers. As such, it focuses on the two execution states, AArch64 and an enhanced AArch32. The AArch64 execution state introduces a new instruction set, A64. Meanwhile, key features of the v7 architecture are maintained or extended in the v8 architecture.
In a separate ARM TechCon presentation from Mike Muller’s launch keynote, ARM fellow Richard Grisenthwaite went into some more detail as to how this will play out.
In addition to A64, headline features for the AArch64 state include: revised exception handling for exceptions in the AArch64 state, with fewer banked registers and modes; support for the same architectural capabilities as in ARMv7, including TrustZone, virtualization and NEON advanced SIMD; and a memory translation system based on the existing LPAE table format.
Noting that work on the 64bit version has been under way since 2007, Grisenthwaite said that last year’s LPAE format “was designed to be easily extendable to AArch64” and that the new technology features up to 48 bits of virtual address space from a translation table base register.
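One way to picture how a table format can stretch to 48 bits is to split a virtual address into per-level table indices plus a page offset. The sketch below assumes 4KByte pages and 9bit indices (512 eight-byte entries per table), so that four levels give 4 x 9 + 12 = 48 bits. Those parameters are assumptions made for illustration; the launch material quoted here gives only the 48bit figure.

```python
PAGE_OFFSET_BITS = 12  # assumption: 4KByte pages
INDEX_BITS = 9         # assumption: 512 entries per translation table
LEVELS = 4             # assumption: 4 * 9 + 12 = 48 bits in total
VA_BITS = PAGE_OFFSET_BITS + INDEX_BITS * LEVELS


def split_virtual_address(va):
    """Return ([index per level, top level first], page offset)."""
    if not 0 <= va < (1 << VA_BITS):
        raise ValueError("address does not fit in 48 bits")
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    indices = []
    for level in range(LEVELS):
        # The top-level index sits in the most significant bits.
        shift = PAGE_OFFSET_BITS + INDEX_BITS * (LEVELS - 1 - level)
        indices.append((va >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


example = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(split_virtual_address(example))
```

Under these assumptions, widening the address space amounts to adding another level of the same table shape rather than inventing a new format.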
Instructions in A64 are 32bit with a clean decode table based on 5bit register specifiers. The semantics are broadly the same as in AArch32 and changes have been made “only where there is a compelling reason.”
Some 31 general purpose registers are accessible at all times, with a view to a balance between performance and energy. The general purpose registers are 64bits wide, with no banking and neither the stack pointer nor the PC is one of them. An additional dedicated zero register is available for most instructions.
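To make the 5bit register specifiers concrete, here is a minimal sketch that pulls the three register numbers out of one 32bit instruction word. The field positions (destination in bits 4:0, first source in bits 9:5, second source in bits 20:16), the example word for a 64bit register-to-register add, and the X0/XZR names are assumptions drawn from general knowledge of the A64 encoding, not from the launch material.

```python
REGISTER_SPECIFIER_BITS = 5
ENCODINGS = 1 << REGISTER_SPECIFIER_BITS  # 32 values fit in 5 bits
GENERAL_PURPOSE_REGISTERS = 31            # as stated in the article
ZERO_REGISTER_ENCODING = 31               # assumption: the spare encoding


def register_fields(word):
    """Extract the three 5bit register specifiers from a 32bit word."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("A64 instructions are exactly 32 bits wide")
    mask = ENCODINGS - 1
    return {
        "Rd": word & mask,          # assumed destination field, bits 4:0
        "Rn": (word >> 5) & mask,   # assumed first source, bits 9:5
        "Rm": (word >> 16) & mask,  # assumed second source, bits 20:16
    }


def register_name(number):
    """Name a register number, treating encoding 31 as the zero register."""
    if number == ZERO_REGISTER_ENCODING:
        return "XZR"
    return f"X{number}"


ADD_X0_X1_X2 = 0x8B020020  # assumed encoding of: add x0, x1, x2
fields = register_fields(ADD_X0_X1_X2)
print({name: register_name(value) for name, value in fields.items()})
```

Five bits give 32 encodings, which is consistent with 31 general purpose registers plus one spare value for the zero register.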
There are obviously differences between AArch64 and AArch32, although much has been done to preserve compatibility and scalability. Here are some of the key points.
There are necessarily new instructions to support 64bit operands, but most instructions can have 32bit or 64bit arguments. Addresses are assumed to be 64 bits in size. The primary target data models are LP64 and LLP64, the models used in Unix and Unix-like systems and in Windows, respectively. Meanwhile, there are far fewer conditional instructions than in AArch32, and there are no arbitrary length load/store multiple instructions.
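The practical difference between the two data models is small but matters when porting code. The sketch below lists the conventional widths of the C integer and pointer types under each model; the table reflects the usual definitions of LP64 and LLP64 rather than anything specific to ARM’s announcement.

```python
# Widths in bits of common C types under each data model.
DATA_MODELS = {
    "LP64":  {"int": 32, "long": 64, "long long": 64, "pointer": 64},
    "LLP64": {"int": 32, "long": 32, "long long": 64, "pointer": 64},
}


def differing_types(model_a, model_b):
    """List the C types whose width differs between two data models."""
    a, b = DATA_MODELS[model_a], DATA_MODELS[model_b]
    return [name for name in a if a[name] != b[name]]


print(differing_types("LP64", "LLP64"))
```

Only `long` changes size between the two, so code that stores a pointer in a `long` works under LP64 but truncates under LLP64.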
Finally, here, Grisenthwaite’s paper set out some details of the A64 Advanced SIMD and floating point (FP) instruction set. It is semantically similar to A32: Advanced SIMD shares the floating-point register file, as in AArch32. A64 then provides three major functional enhancements:
- 1. There are more 128bit registers: 32 registers, each 128 bits wide, which can also be viewed as 64 bits wide;
- 2. Advanced SIMD supports double-precision floating-point execution; and
- 3. Advanced SIMD supports full IEEE754 execution, including rounding modes, denorms, and NaN handling.
The register packing model in A64 is different from that in A32, so the 64bit register view fits in the bottom of the 128bit registers. In line with support for the current IEEE754-2008 standard for floating point arithmetic, there are some new FP instructions (e.g., MaxNum/MinNum instructions, float-to-integer conversions with RoundTiesAway).
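The change in packing can be sketched by asking where a given 64bit register view lives inside the 128bit registers. The A32 mapping used below (two 64bit views packed into each 128bit register, even-numbered view in the low half) is background knowledge assumed for contrast and is not spelled out in the launch material; the function names are illustrative.

```python
MASK64 = (1 << 64) - 1


def a32_d_location(d):
    """A32 (assumed): 64bit view d shares 128bit register d // 2."""
    if not 0 <= d < 32:
        raise ValueError("expected a 64bit register number from 0 to 31")
    return d // 2, "high" if d % 2 else "low"


def a64_d_location(d):
    """A64: 64bit view d is the bottom half of 128bit register d."""
    if not 0 <= d < 32:
        raise ValueError("expected a 64bit register number from 0 to 31")
    return d, "low"


def low_64_bits(register_128):
    """Read the 64bit view of a 128bit register value in the A64 model."""
    return register_128 & MASK64


print(a32_d_location(3), a64_d_location(3))
print(hex(low_64_bits(0x0123456789ABCDEF_FEDCBA9876543210)))
```

In the A64 model every 64bit view has a 128bit register to itself, which is one reason the count of 128bit registers can grow to 32.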
Changes between AArch32 and AArch64 occur on exception entry or exception return only. Increasing the exception level cannot decrease the register width, and decreasing it cannot increase the width; there is no branch and link between AArch32 and AArch64. AArch32 applications are allowed under an AArch64 OS kernel, alongside AArch64 applications. Likewise, an AArch32 guest OS will run under an AArch64 hypervisor, alongside an AArch64 guest OS.
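The width rule can be read as a simple ordering constraint: walking from the least privileged level to the most privileged, the register width never shrinks. The sketch below checks a software stack against that reading. The function name and the list-based representation are illustrative assumptions, and the check covers only this one rule.

```python
def widths_are_consistent(widths):
    """Check that register width never decreases as privilege increases.

    `widths` lists the register width (32 or 64) at each exception level,
    ordered from least privileged (application) to most privileged.
    """
    if any(width not in (32, 64) for width in widths):
        raise ValueError("each level must be 32 or 64 bits wide")
    return all(lower <= higher for lower, higher in zip(widths, widths[1:]))


# AArch32 application, AArch64 kernel, AArch64 hypervisor: allowed.
print(widths_are_consistent([32, 64, 64]))
# AArch32 application and guest OS under an AArch64 hypervisor: allowed.
print(widths_are_consistent([32, 32, 64]))
# AArch64 application under an AArch32 kernel: not allowed.
print(widths_are_consistent([64, 32, 64]))
```

This is why the supported combinations all place 32bit software above 64bit software in the stack and never the other way round.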
Grisenthwaite’s entire introductory description of features underpinning the roll-out of v8 to the ARM ecosystem can be downloaded.
ARM and servers today
Some preparatory work has already taken place with selected partners. The ARM compiler and Fast Models with ARMv8 support have been distributed and, as noted, work has begun on support for a range of open source operating systems, as well as—it is reasonable to assume—for Windows. A number of applications and third-party tools are also in development.
According to Muller, we can expect the full framework for v8 implementations next year and products should begin to appear over the 2013-2014 timeframe. Nevertheless, the first implementation has already been announced.
Applied Micro has unveiled a demonstration based on a Xilinx Virtex6 FPGA running its Server SoC consisting of an ARM-64 CPU complex, coherent CPU fabric, high-performance I/O network, memory subsystem and a fully functional SoC subsystem.
The work is paving the way for Applied’s X-Gene server-on-a-chip family, which the company says will be scalable from 2 to 128 cores running at 3.0GHz with power consumption of just 2W per core.
“The current growth trajectory of data centers, driven by the viral explosion of social media and cloud computing applications, will continue to accelerate,” said Dr. Paramesh Gopi, Applied’s president and CEO. “In offering the world’s first 64bit ARM architecture processor, we harmonize the network with cloud computing and environmental responsibility. Our next-generation of multicore SoCs will bring in a new era of energy-efficient performance that doesn’t break the bank on a limited power supply.”
Applied’s plan, running slightly ahead of Muller’s timetable, is to start offering customer sampling on a TSMC-produced v8 device in the second half of next year. Meanwhile, the Barcelona Supercomputing Center is also part of the push to take ARM into the high-performance market, in conjunction with Nvidia. It showed a hybrid system, Mont-Blanc, which combines ARM-based Nvidia Tegra CPU chips with GPUs based on Nvidia’s CUDA technology, at a supercomputing conference in Seattle this November. This work, much like Applied’s, places great emphasis on energy savings.
“In most current systems, CPUs alone consume the lion’s share of the energy, often 40 percent or more,” said Alex Ramirez, leader of the Mont-Blanc Project. “By comparison, the Mont-Blanc architecture will rely on energy-efficient compute accelerators and ARM processors used in embedded and mobile devices to achieve a four-to-10-times increase in energy-efficiency by 2014.”
Mont-Blanc’s use of the existing v7 architecture makes it more of a pathfinder than a product, but it remains very much a declaration of intent.
The battle ahead
However, one must note that ARM is not alone in pursuing low-power options for the high-performance market. It is, excuse the pun, a hot button issue throughout the server and enterprise business.
“Collectively, data centers around the world consume nearly 1.5 percent of total electricity production and almost $44.5B a year is spent on powering the servers in these data centers,” said Linley Gwennap, principal analyst from Linley Group. “Looking at the growth projections for data center usage and the future of power generation growth, this trajectory is unsustainable. A new paradigm for developing data centers based on energy efficiency will certainly help make data centers scale realistically with future demand growth.”
For example, in its Redstone low-energy server design project, HP plans to use and evaluate both ARM-based and Intel x86-based chips. AMD, which has had a tough time in servers of late, has put low-power on the agenda as part of its recent restructuring, unveiling its Opteron 3000 for the micro-server market as well as promising enhancements to its mainstream devices.
ARM’s other challenge will be getting silicon vendors to adopt the new architecture. It has had success in wooing players in the microcontroller market already. And it is the go-to brand for low power. But x86 is powerful here, and the challenge in general computing is arguably greater than “evolving up” from mobile phones, particularly since, although ARM-based 64bit PCs can be expected, the company essentially wants to leapfrog that heavily commoditized market.
Hence the focus on making the unveiling an architectural rather than product announcement, notwithstanding the innovative play from Applied Micro. Server players will want everything in place before they jump—ARM, though, has an existing infrastructure that can deliver the necessary components.