ARM targets cache-coherent GPU computing with CoreLink addition

By Chris Edwards | Posted: October 29, 2015
Topics/Categories: Blog - Embedded, IP | Tags: Big.Little, GPGPU, heterogeneous processing, network-on-chip, OpenCL | Organizations: Arm

ARM has developed a version of its CoreLink on-chip interconnect IP for systems built on its big.LITTLE processor combinations that need a cache-coherent GPU connection with lower latency and higher peak throughput.

ARM claims the support for GPU coherency in the CoreLink CCI-550 should reduce development costs and time for applications that rely on heterogeneous processing to use compute engines more efficiently. The interconnect is intended to support OpenCL 2.0, which provides shared virtual memory features, and other programming models that can take advantage of system coherency.

The CCI-550 includes microarchitecture improvements that deliver higher peak throughput, along with quality of service (QoS) enhancements that reduce latency by 20 per cent. SoC designers can configure the number of memory channels, the tracker sizes and the snoop filter capacity, and can combine up to six fully coherent processor clusters.

A memory controller intended for use with the interconnect, the CoreLink DMC-500, has been designed to operate with LPDDR4/3 memories up to LPDDR4-4267. ARM claims that, used together, the two deliver a peak system memory bandwidth of more than 50Gbyte/s.
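
As a rough guide to how a transfer rate such as LPDDR4-4267 relates to a bandwidth figure, the sketch below computes theoretical peak bandwidth as transfer rate times bus width times channel count. The article does not state the channel width or the number of channels behind ARM's 50Gbyte/s claim, so the 16-bit width and the channel counts used here are assumptions for illustration only, and the function name is hypothetical.

```python
def peak_bandwidth_gbytes(transfer_rate_mts: float,
                          bus_width_bits: int,
                          channels: int = 1) -> float:
    """Theoretical peak bandwidth in Gbyte/s (10^9 bytes per second).

    transfer_rate_mts: data rate in megatransfers per second,
                       e.g. 4267 for LPDDR4-4267.
    bus_width_bits:    width of one channel in bits (assumed, not from ARM).
    channels:          number of channels operating in parallel (assumed).
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    # Assumed 16-bit channels; the channel counts are illustrative only.
    for channels in (1, 2, 4, 6, 8):
        gbytes = peak_bandwidth_gbytes(4267, 16, channels)
        print(f"{channels} x 16-bit @ 4267 MT/s: {gbytes:.1f} Gbyte/s")
```

Under these assumptions, a single 16-bit channel peaks at about 8.5Gbyte/s, so a figure above 50Gbyte/s implies an aggregate width of at least 96 bits. Sustained bandwidth in a real system will be lower than this theoretical peak.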

Mike Demler, senior analyst, The Linley Group, said: “To provide advanced features such as 4K video recording/playback, 120fps cameras, and quad-HD displays, [SoC designers] must integrate heterogeneous CPUs, GPUs, and accelerators into a cache-coherent system while keeping within tight power budgets.”


Tech Design Forum