EEMBC looks into heterogeneous compute
The Embedded Microprocessor Benchmark Consortium (EEMBC) has turned its attention to heterogeneous computing with plans to create a set of benchmarks to test systems that couple conventional processors with special-purpose accelerators on applications such as deep learning.
The group said the benchmark suite will be based on real-world workloads drawn from highly parallel applications, including automotive surround-view generation, image recognition, and mobile augmented reality.
“Optimal use of heterogeneous architectures implies load balancing of the compute tasks and distribution of data across multiple compute resources and separate fine-tuning for their individual performance profiles. This requires intimate knowledge of the architecture of the individual compute elements and of the heterogeneous architecture as a whole,” said Rafal Malewski, chair of EEMBC’s compute working group and senior graphics engineering manager at NXP Semiconductors.
“EEMBC has set out to create a benchmark that assists in identifying the performance criteria of the heterogeneous compute architecture and in determining the true potential of the architectures for real-world application use cases.”
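As a rough illustration of the load balancing Malewski describes, the sketch below divides a batch of work items across compute elements in proportion to their relative throughput. The device names, the throughput figures, and the `split_work` helper are hypothetical and chosen only for illustration; they are not taken from EEMBC's benchmark or from any vendor's data.

```python
def split_work(total_items, throughput):
    """Divide total_items across devices in proportion to relative throughput.

    throughput maps a device name to a positive integer rate (hypothetical units).
    """
    total_rate = sum(throughput.values())
    # Integer division gives each device its proportional share, rounded down.
    shares = {name: (total_items * rate) // total_rate
              for name, rate in throughput.items()}
    # Rounding down can leave a few items unassigned; give them to the
    # fastest devices first, one item each.
    leftover = total_items - sum(shares.values())
    fastest_first = sorted(throughput, key=throughput.get, reverse=True)
    for name in fastest_first[:leftover]:
        shares[name] += 1
    return shares


# Assumed relative rates for three compute elements (illustrative only).
rates = {"cpu": 1, "gpu": 6, "dsp": 3}
shares = split_work(1000, rates)
print(shares)
```

A static split like this ignores data-transfer costs and per-kernel differences between devices, which is part of why tuning for a heterogeneous system requires knowledge of the architecture as a whole.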
Jon Peddie, president of Jon Peddie Research, said: “The competition will be fierce amongst the many software and silicon providers targeting the heterogeneous computing industry, and there will be a great need to help sort this out with real-world benchmarks, such as the one being developed by EEMBC.”
Markus Levy, EEMBC president, added: “To ensure consistency between compute implementations, EEMBC’s compute benchmark’s framework will utilize the popular Khronos OpenCL 1.2 Embedded Profile API, which is supported by most vendors providing a heterogeneous architecture. Once the OpenCL reference implementation is validated, the benchmark will be open for vendors to submit platform specific optimizations.”
EEMBC said the working group is fleshing out detailed requirements for the benchmarks that extend down to considerations on how to deal with data-type optimizations. In embedded deep learning applications, for example, a number of accelerator vendors favor the use of fixed-point mathematics during inferencing – which is the most common operation in use – with a small penalty in raw accuracy. Floating-point computation is then used for training, which may be performed offline on a server.
However, GPUs mostly support single-precision floating point and so do not see a benefit from fixed-point conversion. One option for the benchmark may be to use fixed-point implementations for the core tests and then provide floating-point versions as optional tests. The mixture of core and optional benchmarks has been a common feature of EEMBC processor benchmarks.
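The accuracy trade-off between fixed-point and floating-point inference can be sketched with a single dot product, the basic operation inside a neural-network layer. This is a minimal illustration, not EEMBC's method: the signed 8-bit format with 7 fractional bits, the vector length, and the random test data are all assumptions made for the example.

```python
import random

FRAC_BITS = 7  # assumed format: signed 8-bit value with 7 fractional bits


def quantize(x, frac_bits=FRAC_BITS):
    """Round a float to a signed 8-bit fixed-point integer, saturating at the limits."""
    q = round(x * (1 << frac_bits))
    return max(-128, min(127, q))


def dot_float(w, x):
    """Reference dot product in floating point."""
    return sum(wi * xi for wi, xi in zip(w, x))


def dot_fixed(w, x, frac_bits=FRAC_BITS):
    """Dot product on quantized operands, accumulated in a wide integer."""
    qw = [quantize(v, frac_bits) for v in w]
    qx = [quantize(v, frac_bits) for v in x]
    acc = sum(a * b for a, b in zip(qw, qx))
    # Each product carries 2 * frac_bits fractional bits; rescale to a float
    # only so the result can be compared with the reference.
    return acc / float(1 << (2 * frac_bits))


random.seed(0)  # fixed seed so the illustrative data is repeatable
weights = [random.uniform(-0.9, 0.9) for _ in range(64)]
inputs = [random.uniform(-0.9, 0.9) for _ in range(64)]

ref = dot_float(weights, inputs)
approx = dot_fixed(weights, inputs)
error = abs(ref - approx)
print(f"float: {ref:.5f}  fixed: {approx:.5f}  abs error: {error:.5f}")
```

The difference printed at the end is the kind of raw-accuracy penalty the working group has to weigh against the throughput and power gains that fixed-point hardware can offer.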