Almost 21 years ago, Intel sold its tiny flash-based FPGA business to Altera. Now the processor giant is buying back into FPGAs, through an agreed $16.7bn all-cash purchase of Altera in its entirety. Intel aims to monolithically integrate Xeon processors and FPGAs for servers and to do the same for Atoms in IoT applications.
Announcing the deal on a conference call with analysts, Intel CEO Brian Krzanich said the company “sees opportunities to combine Xeon processors and FPGAs to significantly increase performance and through integration reduce cost”. Using the combo parts, customers would “address workloads in creative new ways. In IoT, integrating Atom with FPGAs will allow us to pursue segments previously served by ASSPs”.
Krzanich pointed to automotive driver-assistance systems as possible applications of the Atom-based devices. What these have in common with the Xeon-based parts is that both could end up running similar neural-network-based applications. He highlighted the ability of FPGAs to switch between accelerators for encryption and for facial recognition, the latter using the kind of deep-learning techniques, based on convolutional neural networks, that currently power a number of cloud-based search-engine systems.
Although Intel is already working with Altera on copackaged processor-FPGA combinations, Krzanich claimed the acquisition is a necessary step: “To get the maximum footprint and cost reductions, you have to integrate the two companies. That’s why we wanted to move them inside.”
Krzanich said Altera would be an independent unit within Intel and would continue to support its ARM-based devices, although engineers from both teams would be expected to work closely together on combo products. He said copackaged products would ship in the second half of 2016, with fully integrated parts to follow.
“By 2020, one third of cloud-server nodes may use these FPGA-processors,” Krzanich said.
Companies such as Baidu and Microsoft have publicly said they have switched from GPU implementations to FPGA-based deep-learning systems. GPUs were the platform on which the core algorithms were optimized over the decade since Geoffrey Hinton and Ruslan Salakhutdinov of the University of Toronto published the basis for the technology.
At Hot Chips 26 last year, Baidu senior architect Jian Ouyang said that, although individual GPUs offer better peak computational performance for neural networks, Baidu's FPGA implementation consumed less power for the same level of performance. That allowed the FPGA accelerator to be mounted on a server blade and powered solely from the PCI Express slot.
“With the FPGA, we don’t have to modify the server design and environment, so it is easy to deploy on a large scale,” Ouyang said, adding that the company uses reconfigurability to deploy different algorithms, generally within 10µs.
Server FPGA choices
For its implementation, Baidu uses Kintex-7 devices from Xilinx. Microsoft, by contrast, opted for Altera's Stratix V in its server-based search-acceleration projects. Although Microsoft sees hardware specialization as inevitable if server power consumption in cloud applications is to come down, the company wanted the flexibility of FPGAs despite their lower performance on any given application compared with fully custom silicon.
Although FPGAs currently fit deep-learning applications well, not everyone believes they are the long-term answer. At the recent Mentor U2U conference, Qualcomm engineering vice president Karim Arabi said a custom neuroprocessor would ultimately be a superior choice for neural processing and similar applications.
“The use of programmable logic shows how inefficient existing architectures are for deep learning,” Arabi claimed. “That can’t continue. It works for low volume. But we need a new class of core – a neuroprocessor – that is a hundred times smaller than the FPGA and five times more power efficient.”
A further reason for buying Altera lies in the company's early support for OpenCL as a way of developing hardware algorithms that run on FPGAs without writing them in conventional HDLs. Altera had early successes in high-performance financial trading systems before the banking crisis of 2008 and has since rebuilt a base there, with the help of OpenCL, in an environment where algorithms can change rapidly.
For Intel, the acquisition provides a way of fending off ARM-based server processors that could compete on the strength of customizable acceleration. The current Xilinx Zynq devices, as well as Altera's equivalents, are aimed primarily at embedded processing, and Intel's server competitors do not currently have licences for FPGA technology. However, some startups are attempting to enter the market for programmable-logic IP cores despite the limited success of their predecessors.