Ceva's latest iteration of its XC architecture aims at the intensive DSP required for 5G basestations.
Emmanuel Gresset, business development director at Ceva, said the XC12 core provides a "major increase in terms of horsepower. We are getting very close to one tera-ops per second". The core is designed to run at up to 1.8GHz on a 10nm process and the company expects multiple instances of the processor to be used in basestations that need multiple-input, multiple-output (MIMO) and channel-aggregation capabilities.
Ceva has added a number of instructions to the core to deal with the algorithms used by 5G applications and has increased the level of floating-point precision used for some parallelized operations. "We need division and square root for inversion, so we have added dedicated instructions.
"5G is very, very challenging. To handle the matrices needed for equalization and beamforming, especially with inversion operations, you need higher-precision arithmetic," Gresset said.
As well as higher precision for some instructions, the width of the vector-based MAC unit has been doubled to increase throughput. For the target applications, Gresset claimed the XC12 needs only a quarter of the cycles used by its predecessor, the XC4500.
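To see why matrix inversion leans on division and square root, consider a minimal Python sketch of a Cholesky-based solver of the kind often used for the Hermitian positive-definite matrices that appear in equalization. This is an illustration only, not Ceva's implementation: the function names `cholesky` and `solve_hpd` are hypothetical, and the article does not say which inversion algorithm the XC12 targets.

```python
import math


def cholesky(a):
    """Return lower-triangular l with a = l * l^H.

    Assumes a is Hermitian positive-definite (illustrative helper only).
    """
    n = len(a)
    l = [[0j] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: this is where the square root appears.
        s = a[j][j].real - sum(abs(l[j][k]) ** 2 for k in range(j))
        l[j][j] = complex(math.sqrt(s))
        for i in range(j + 1, n):
            # Off-diagonal entry: this is where the division appears.
            t = a[i][j] - sum(l[i][k] * l[j][k].conjugate() for k in range(j))
            l[i][j] = t / l[j][j]
    return l


def solve_hpd(a, b):
    """Solve a x = b using the Cholesky factor instead of an explicit inverse."""
    l = cholesky(a)
    n = len(a)
    # Forward substitution: l z = b.
    z = [0j] * n
    for i in range(n):
        z[i] = (b[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i]
    # Back substitution: l^H x = z (the diagonal of l is real).
    x = [0j] * n
    for i in reversed(range(n)):
        acc = sum(l[k][i].conjugate() * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - acc) / l[i][i]
    return x


# Small 2x2 example with a Hermitian positive-definite matrix.
a = [[4, 1 + 1j], [1 - 1j, 3]]
b = [3 + 1j, 1 + 2j]
x = solve_hpd(a, b)
```

Every column of the factorization costs one square root and a run of divisions, and rounding errors in those steps propagate through the substitutions, which is consistent with Gresset's argument for both dedicated instructions and higher-precision arithmetic.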
As the control flow of the software needed for handling 5G is more complex, the architects have used a more sophisticated branch prediction unit, using dynamic rather than static prediction together with a branch-target buffer. "We provide native support for all C operators. And the instruction set is entirely orthogonal to make it compiler friendly," Gresset claimed. The scalar units used for handling control code still have dual-MAC engines so that they can be used for handling the regular channel measurements needed by LTE and 5G-NR, allowing the vector MAC units to be used for the channel data.
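The difference between static and dynamic prediction can be sketched with a toy model in Python. The article does not describe how the XC12's predictor is built, so the following assumes a textbook scheme: a table of 2-bit saturating counters for the taken/not-taken decision and a branch-target buffer that caches the last target of each taken branch. The class name, table size and addresses are all hypothetical.

```python
class BranchPredictor:
    """Toy dynamic predictor: 2-bit saturating counters plus a branch-target buffer."""

    def __init__(self, entries=16):
        self.entries = entries
        # Counter values 0-1 predict not taken, 2-3 predict taken.
        self.counters = [1] * entries
        # Branch-target buffer: branch address -> last observed target.
        self.btb = {}

    def _index(self, pc):
        return pc % self.entries

    def predict(self, pc):
        """Return (taken, target); target is None if not taken or not cached."""
        taken = self.counters[self._index(pc)] >= 2
        target = self.btb.get(pc) if taken else None
        return taken, target

    def update(self, pc, taken, target):
        """Train the predictor with the resolved outcome of a branch."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            self.btb[pc] = target
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, target, outcomes):
    """Run a sequence of branch outcomes and count wrong direction guesses."""
    misses = 0
    for taken in outcomes:
        predicted_taken, _ = predictor.predict(pc)
        if predicted_taken != taken:
            misses += 1
        predictor.update(pc, taken, target)
    return misses


# A loop branch that is taken nine times and then falls through, run three times.
outcomes = ([True] * 9 + [False]) * 3
predictor = BranchPredictor()
dynamic_misses = count_mispredictions(predictor, pc=0x40, target=0x10, outcomes=outcomes)
# A static "always not taken" guess is wrong on every taken branch.
static_misses = sum(1 for taken in outcomes if taken)
```

In this model the dynamic predictor learns the loop's behaviour after the first iteration and afterwards misses only on each loop exit, whereas the static guess is wrong on every taken branch. That gap is the kind of saving that matters as control code becomes more branch-heavy.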
As basestations supporting massive-MIMO will need multiple processors to handle the throughput, Ceva has built a direct memory interface between the cores that runs in parallel to the main AXI-based interconnect. The direct memory channels allow each processor to write directly into the local memory/cache of its neighbors. "This helps address the ultra-low latency requirement of 5G," Gresset said. "On a 10nm process, you are going to see multiple clusters of XC12s in a device, and multiple devices in a macrocell. It is scaling to an incredible level."