Cadence Design Systems has increased the throughput of its vision-oriented DSP family to cater for the growing number of applications that use deep learning and convolutional neural networks (CNNs).
The Tensilica Vision P6 DSP delivers four times the multiply-accumulate (MAC) performance of the previous-generation Vision P5. The design team also changed the instruction-slot rules to better support the kinds of code found in CNN applications. In addition, the DSP uses on-the-fly data compression to sharply reduce the memory footprint and bandwidth required when processing the fully connected layers commonly found in deep-learning networks.
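To see why compression targets fully connected layers in particular, it helps to count the work. The sketch below is a generic illustration, not Cadence code, and does not attempt the P6's compression scheme, which the announcement does not describe. It computes a fully connected layer one MAC at a time and reports how many weights it touches. Every weight is used exactly once per input vector, so the layer tends to be limited by weight traffic rather than arithmetic. The helper names and the 4096 x 4096, 8-bit layer size are illustrative assumptions.

```python
def fully_connected(weights, x, bias):
    """Compute y = W x + b one multiply-accumulate (MAC) at a time.

    weights: one row of len(x) weights per output neuron (hypothetical layout).
    """
    y = []
    for row, b in zip(weights, bias):
        acc = b
        for w, xi in zip(row, x):
            acc += w * xi  # one MAC; each weight is read once per input vector
        y.append(acc)
    return y


def fc_stats(n_in, n_out, bytes_per_weight):
    """Return (MAC count, weight storage in bytes) for one FC layer."""
    macs = n_in * n_out
    return macs, macs * bytes_per_weight


W = [[1, 2], [3, 4], [5, 6]]
print(fully_connected(W, [1, 1], [0, 0, 0]))  # [3, 7, 11]
print(fc_stats(4096, 4096, 1))                # (16777216, 16777216)
```

With one byte per weight, a single such layer needs 16 MB of weights for only 16.8 million MACs, which is the imbalance that on-the-fly compression aims to ease.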
The core pipeline is fixed-point, so in many cases weights will have to be converted from the floating-point representation used during training, which is generally carried out on a desktop or server machine.
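As a rough illustration of that conversion step, here is a minimal sketch of symmetric per-tensor quantization, one common way of mapping trained floating-point weights onto signed integers. It assumes 8-bit weights and a single shared scale factor; this is a swapped-in generic technique, not Cadence's toolflow, and `quantize_symmetric` and `dequantize` are hypothetical helper names.

```python
def quantize_symmetric(weights, bits=8):
    """Map float weights to signed integers sharing one scale factor.

    Returns (q, scale) such that each w is approximately q * scale.
    Assumption: symmetric, per-tensor scaling (one of several schemes).
    """
    qmax = 2 ** (bits - 1) - 1               # 127 for 8-bit weights
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights, e.g. to measure quantization error."""
    return [v * scale for v in q]


w = [0.5, -1.0, 0.25, 0.0]
q, s = quantize_symmetric(w)
err = max(abs(a - b) for a, b in zip(dequantize(q, s), w))
print(q, s, err)  # the error is at most half a quantization step (s / 2)
```

Scaling by the largest magnitude guarantees no weight overflows the integer range, at the cost of coarser steps when a few outlier weights are large.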
There is an option to add a 32-way SIMD vector floating-point unit that supports the IEEE 754 half-precision format (FP16). Floating-point performance is double that of the Vision P5 DSP. The existing architecture also supports scatter-gather memory accesses, which tend to be useful in CNN implementations.
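What FP16 trades away is easy to see from software. The sketch below is not tied to the Vision P6. It uses Python's standard `struct` module, whose `e` format code packs an IEEE 754 binary16 value, to round ordinary floats to half precision. `to_fp16` is a hypothetical helper name.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 binary16 (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# binary16 keeps 11 significant bits, so above 2048 not every integer fits.
print(to_fp16(2048.0))   # 2048.0
print(to_fp16(2049.0))   # 2048.0 (tie rounded to the even neighbour)
print(to_fp16(0.1))      # 0.0999755859375
print(to_fp16(65504.0))  # 65504.0, the largest finite FP16 value
```

This precision is usually adequate for CNN inference, and halving the storage per value relative to FP32 also halves memory traffic.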