Co-design underpins infrastructure acceleration at Google
In his keynote at the recent VLSI Symposium, Parthasarathy Ranganathan, vice president and engineering fellow at Google, described how acceleration and parallel processing have changed the nature of Google’s services in recent years, pointing to the evolution of its mapping service from a comparatively simple scalable vector map into an immersive tour of cities. Similarly, its photo-storage service has gained AI-assisted tools for enhancement and editing.
“Underpinning all these advances have been improvements in compute,” he said, improvements that have in many cases come at a rate outstripping classical silicon’s traditional scaling cadence.
“The leading challenge coming up is the slowing down of Moore’s Law,” said Ranganathan. Though scaling continues, he stressed, the slowdown in cost reductions for compute and storage is having a dramatic effect on viability, compounded by slowing gains in power efficiency. Scaling also brings other problems: silent data corruption, in which marginal devices cause errors in calculations, and the need to address side-channel attacks such as Spectre. “The fixes for these lead to performance degradations.”
The main answer is to shift more of the compute into accelerators, but for this to be effective it demands greater use of hardware-software co-design, Ranganathan argued, adding that the organization has been able to learn from its earlier accelerator projects, ranging from the TPU for machine learning to the VPU used to streamline YouTube-related operations.
Stack considerations
In designing these accelerators, Ranganathan said, it is vital to consider how the compiler and other tools will interact with the underlying hardware in order to extract the most performance. The design must also take account of the fact that, in data-center applications, each accelerator will form part of a distributed system. “There are challenges over how we handle quality of service and virtualization,” he said.
It will be a software-defined infrastructure, he explained. “We take a system and disaggregate it, to create pools of infrastructure. The design has to be very efficient in terms of how we design the building blocks. We no longer think about hardware but hardware surrounded by software.”
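To illustrate the disaggregation idea, here is a minimal sketch; the pool types, units, and API are invented for the example and are not Google’s infrastructure:

```python
# Toy sketch of disaggregation: capacity sits in independent pools and a
# "logical machine" is composed on demand instead of being a fixed server.
# All names and numbers are illustrative only.
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    capacity: int  # arbitrary units

    def allocate(self, units: int) -> int:
        if units > self.capacity:
            raise RuntimeError(f"{self.name} pool exhausted")
        self.capacity -= units
        return units

# Separate pools replace per-server bundles of CPU, memory, and accelerators.
pools = {p.name: p for p in (Pool("cpu", 1000), Pool("memory", 4096), Pool("accel", 64))}

def compose_machine(cpu: int, memory: int, accel: int) -> dict:
    """Draw on each pool independently to build one logical machine."""
    return {name: pools[name].allocate(units)
            for name, units in (("cpu", cpu), ("memory", memory), ("accel", accel))}

print(compose_machine(cpu=16, memory=128, accel=2))
```

The point of the pattern is that each building block can be sized and upgraded independently, which is where the efficiency Ranganathan describes comes from.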
He pointed to a paper by Google engineers presented at ASPLOS in 2022, where the emphasis was on how hardware hooks would be presented to the software. “We made fundamentally different assumptions because we designed according to a hardware-software co-design.
“The second highlight is that capabilities are significantly more important than efficiencies,” he claimed. Though it is natural to design an accelerator according to Amdahl’s Law and ensure that the most common operations for a particular task receive the most attention, Ranganathan stressed that the case for building the accelerator in the first place should rest on what it will enable.
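To make the trade-off concrete: Amdahl’s Law bounds the overall speedup S when a fraction p of a workload is accelerated by a factor s, which is why designers conventionally chase the most common operations. In its standard textbook form (not from the talk itself):

```latex
% Amdahl's Law: overall speedup S when a fraction p of the work
% is accelerated by a factor s.
\[
  S = \frac{1}{(1 - p) + \frac{p}{s}}
\]
```

Accelerating 80 per cent of a workload tenfold (p = 0.8, s = 10), for example, yields an overall speedup of only about 3.6x, and it is this efficiency-centric argument that Ranganathan is pushing back against.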
“Every single accelerator that’s been successful at Google has been successful because it focused on capabilities it has enabled,” he explained, pointing to the additional search modes the company’s core product has been able to add.
Chiplet directions
The emphasis on software issues is also beginning to shape Google’s thinking on chiplets. The team sees multichip packaging as a way of increasing design velocity, iterating hardware more quickly than the traditional two-year cycle for monolithic devices. This is likely to put more emphasis on accelerating low-level software functions such as managing buffers, compressing data, and allocating and copying memory. “One out of every three cycles at Google is spent in five functions: I call them the data-center tax,” he said.
Accelerators that offload these functions from the core processors and embed them in the communication channels between chiplets could improve the results of chiplet-based designs and make the modularity of these architectures more attractive.
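A minimal sketch of the offload pattern for one such tax function follows; the OffloadEngine interface is hypothetical, and zlib stands in for whatever compression a real device would implement:

```python
# Toy sketch of offloading a "data-center tax" function (here, compression)
# to an accelerator, with a software fallback on the CPU.
# The OffloadEngine interface is invented for illustration only.
import zlib
from typing import Optional

class OffloadEngine:
    """Stand-in for a device that accelerates common low-level functions."""
    def __init__(self, available: bool):
        self.available = available

    def compress(self, data: bytes) -> bytes:
        # A real engine would run this on dedicated hardware; this stub
        # reuses the software codec so the sketch stays runnable.
        return zlib.compress(data)

def compress(data: bytes, engine: Optional[OffloadEngine]) -> bytes:
    # Dispatch to the accelerator when present; otherwise pay the CPU "tax".
    if engine is not None and engine.available:
        return engine.compress(data)
    return zlib.compress(data)

payload = b"warehouse-scale computing " * 1024
print(len(compress(payload, OffloadEngine(available=True))))
```

The dispatch-with-fallback shape is what lets software keep a single code path whether or not the accelerator is present.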
The final component of future systems design lies in AI: “I call it ML for ML: machine learning for Moore’s Law. We have had some significant results,” he said, pointing to papers over the past few years that have focused on using machine learning for place-and-route, verification, and RTL synthesis.
“We can think of using it at the architectural level as well, using machine learning for prefetching, for example. One of every two cycles at Google is spent waiting for the memory hierarchy. If you can prefetch successfully, you can get nice benefits there.”
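For a flavor of what a learned prefetcher does, here is a toy sketch that predicts the next address delta from the previous one with an online frequency table; it is a simple stand-in for the ML models in the literature, not Google’s design:

```python
# Toy "learned" prefetcher: predict the next address delta from the last
# one using an online frequency table. Illustrative only.
from collections import Counter, defaultdict
from typing import Optional

class DeltaPrefetcher:
    def __init__(self):
        self.table = defaultdict(Counter)  # last delta -> counts of next delta
        self.last_addr: Optional[int] = None
        self.last_delta: Optional[int] = None

    def observe(self, addr: int) -> None:
        """Train online on the observed access stream."""
        if self.last_addr is not None:
            delta = addr - self.last_addr
            if self.last_delta is not None:
                self.table[self.last_delta][delta] += 1
            self.last_delta = delta
        self.last_addr = addr

    def predict(self) -> Optional[int]:
        """Return an address to prefetch, or None if there is no history."""
        counts = self.table.get(self.last_delta)
        if not counts:
            return None
        best_delta, _ = counts.most_common(1)[0]
        return self.last_addr + best_delta

pf = DeltaPrefetcher()
for addr in [0, 64, 128, 192, 256]:  # a simple strided access stream
    pf.observe(addr)
print(pf.predict())  # 320: the learned stride continues
```

Each correct prediction hides a round trip through the memory hierarchy, so at the scale Ranganathan quotes even modest accuracy pays off.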
However, he added, “ML is not a panacea: it is important for us to do it thoughtfully and carefully.”