Deep learning offers the next major opportunity for specialist processors, Qualcomm engineering vice president Karim Arabi claimed in his keynote at Mentor Graphics’ U2U conference in San Jose (April 21).
Arabi said the design of embedded SoCs is being driven by the need for always-on processing of the surrounding environment and for handling real-world sensor inputs.
“New architectures will emerge. You will need to repartition your workload. Some of the workload will have to go to the cloud, so you will need a repartitioning of the workload between mobile and cloud devices. Overall, it will drive a need for high performance at low cost.”
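The repartitioning Arabi describes could, in the simplest case, look like a budget-driven split of a network's layers between device and cloud. The sketch below is purely illustrative (not from the talk); the layer names, MFLOP costs, and the greedy policy are all assumptions.

```python
# Illustrative sketch only: a toy policy for partitioning an inference
# workload between an embedded device and the cloud. All names,
# costs, and thresholds are hypothetical.

def partition_workload(layers, device_budget_mflops):
    """Split a list of (name, mflops) layers: run early layers on-device
    until the compute budget is exhausted, then offload the remainder."""
    on_device, in_cloud = [], []
    used = 0
    for name, mflops in layers:
        # Once a layer goes to the cloud, everything after it follows,
        # so only one device-to-cloud handoff is needed.
        if not in_cloud and used + mflops <= device_budget_mflops:
            on_device.append(name)
            used += mflops
        else:
            in_cloud.append(name)
    return on_device, in_cloud

# Example: a small convolutional network with made-up per-layer costs.
net = [("conv1", 30), ("conv2", 60), ("fc1", 40), ("fc2", 10)]
edge, cloud = partition_workload(net, device_budget_mflops=100)
```

A real partitioner would also weigh link latency, energy per transferred byte, and privacy constraints, but the same budget-versus-offload trade-off applies.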
Search engine advocates
Some of the necessary image and audio processing will need to be performed on the embedded devices themselves and this will encompass the deep-learning, neural-network technologies currently being exploited by large search-engine systems such as those used by Google, Microsoft and Baidu.
“In applications such as image recognition, machines are starting to beat humans in terms of accuracy,” Arabi claimed.
“Deep machine learning is a workload that is growing in data centers. At some point the majority of the [data center] workload will be deep learning. Today, there is no specialized hardware in the data center for this, but there will be. They will be cores similar to GPUs that do deep learning algorithms as part of their processing.
“In automotive, you will see innovation in ADAS and autopilots. These will be thanks to innovation in mobile computing and deep learning,” Arabi said, pointing also to applications in healthcare and robotics. “The sooner we embrace this, the more successful we will be.”
“Whoever is in the semiconductor industry has to pay close attention to this happening,” said Arabi.
Switch from FPGA
Although some data-center implementations of deep learning have moved from the GPU-based computing that made large-scale applications feasible over the past five years to programmable logic, Arabi said future deep-learning processors will be created specifically for the purpose.
“The use of programmable logic shows how inefficient existing architectures are for deep learning,” Arabi claimed. “That can’t continue. It works for low volume. But we need a new class of core – a neuroprocessor – that is a hundred times smaller than the FPGA and five times more power efficient.”
Drawing analogies between deep learning and brain structure, particularly the way processing and memory are intermixed in the biological mind, Arabi said the arrival of new non-volatile memories such as MRAM will help reshape computer architecture to suit these new applications. “With emerging memories, you have opportunities to disrupt the relationship between memory and CPU. We see in-memory compute emerging, especially with big data. Memory will become smarter.
“The next decade will be quite active with all these new technologies,” said Arabi.
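The “in-memory compute” idea Arabi raises can be caricatured in a few lines: the weight data stays where it is stored, and the multiply-accumulate happens inside the array, so only a small result crosses the memory/CPU boundary. This is a conceptual sketch, not a description of any real device; the class and its interface are invented for illustration.

```python
# Hypothetical illustration of in-memory compute: the memory array
# performs the matrix-vector product in place, rather than shipping
# the whole weight matrix to the CPU. Names are illustrative only.

class ComputeInMemoryArray:
    def __init__(self, weights):
        # Weights are stored in (and never leave) the array.
        self.weights = weights  # list of rows, each a list of numbers

    def multiply(self, vector):
        # The multiply-accumulate happens "inside" the array; only the
        # small result vector crosses the memory/CPU boundary.
        return [sum(w * x for w, x in zip(row, vector))
                for row in self.weights]

array = ComputeInMemoryArray([[1, 2], [3, 4]])
result = array.multiply([1, 1])
```

The attraction for deep learning is that inference is dominated by exactly this operation, so avoiding the weight traffic attacks the memory bottleneck Arabi alludes to.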