Ceva shares weights for lower DNN overhead
Ceva has developed a further iteration of its neural-network architecture for embedded systems: the NeuPro-S uses compression techniques to reduce the memory overhead of inferencing. The core is also supported by a compiler intended to make it easier to develop hybrid deep neural network (DNN) pipelines better tuned to the needs of embedded applications.
With a nod to potential applications in self-driving vehicles and ADAS, Ceva launched the core at the Belgium-based AutoSens conference but also expects the NeuPro-S to be used in smartphones, robots and cameras. The main DNN engine is the NPS series of processor cores, with up to 4000 MAC units, each 8 bits wide.
A compression scheme is used to straddle the various weight-optimization strategies now being explored for inferencing engines. As with embedded cores from other IP suppliers, the compression removes the overhead of zero weights, but it goes further by letting groups of weight calculations share a single weight value and by supporting resolutions as low as 2 bits. With extensive weight sharing, the compression supports techniques such as binary or ternary operation. Through weight sharing and compression, Ceva expects to reduce the burden of memory accesses in high-throughput DNN pipelines.
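To make the idea concrete, the sketch below shows one generic way weight sharing can cut memory traffic: non-zero weights in a group are snapped to a small codebook, so only short indices plus the codebook need to be fetched, while zero weights are skipped entirely. This is an illustrative assumption, not Ceva's actual compression scheme; the function name and quantization method are hypothetical.

```python
import numpy as np

def share_weights(weights, bits=2):
    """Illustrative weight sharing: quantize non-zero weights to a
    2**bits-entry codebook (hypothetical, not Ceva's scheme)."""
    flat = weights.flatten()
    mask = flat != 0.0
    nonzero = flat[mask]
    if nonzero.size == 0:
        return weights.copy(), np.array([])
    # Build a tiny codebook of shared values (here: simple quantiles).
    levels = 2 ** bits
    codebook = np.quantile(nonzero, np.linspace(0.0, 1.0, levels))
    # Replace each non-zero weight with its nearest codebook entry;
    # zeros are left out, mirroring zero-weight removal.
    indices = np.abs(nonzero[:, None] - codebook[None, :]).argmin(axis=1)
    compressed = flat.copy()
    compressed[mask] = codebook[indices]
    return compressed.reshape(weights.shape), codebook

# Example: a pruned 3x3 convolution kernel.
kernel = np.array([[0.00,  0.41,  0.00],
                   [0.38,  0.00, -0.52],
                   [0.00, -0.49,  0.44]])
shared, codebook = share_weights(kernel, bits=2)
print(codebook)  # 4 shared values -> 2-bit index per stored weight
print(shared)
```

With 2-bit indices the storage per non-zero weight drops from 8 bits to 2 plus a small shared codebook, which is the kind of saving that reduces memory accesses in a high-throughput pipeline.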
The NPS processor is designed for the convolutional layers that dominate most pipelines. A compiler and API support the development of custom layers within a pipeline, which can be allocated to the DNN engine itself or run on one of Ceva's SIMD DSP processors, in this case the vision-oriented XM6 core.
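The sketch below shows, in schematic form, how such a hybrid pipeline might be partitioned: layers the DNN engine supports go to it, everything else falls back to the DSP. The layer names, operation labels and targets are hypothetical and do not reflect the actual API of Ceva's compiler.

```python
# Hypothetical partitioning sketch, not Ceva's compiler API.
NPS_SUPPORTED = {"conv2d", "depthwise_conv2d", "fully_connected", "pooling"}

def assign_targets(pipeline):
    """Map each layer to the DNN engine when supported, else to the DSP."""
    plan = []
    for layer in pipeline:
        target = "nps_engine" if layer["op"] in NPS_SUPPORTED else "xm6_dsp"
        plan.append((layer["name"], target))
    return plan

pipeline = [
    {"name": "conv1",   "op": "conv2d"},
    {"name": "custom1", "op": "custom_nms"},   # custom layer -> runs on the DSP
    {"name": "fc1",     "op": "fully_connected"},
]

for name, target in assign_targets(pipeline):
    print(f"{name:8s} -> {target}")
```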
The XM6 can also be used to run software and libraries such as the Ceva-SLAM software development kit for 3D mapping, the Ceva-CV and Ceva-VX software libraries for computer vision development, and other software for image dewarping and stitching.