Many IoT applications have a very strict energy budget. SoC designers targeting the IoT must balance the features the market demands against the power budgets those applications allow. What are their options?
One way forward is to integrate RISC and DSP functions in one core. IoT devices take in sensor data, operate on it and communicate the results over a network. RISC processors are good for setting up communication channels and transferring data, but may not be as efficient at processing sensor data as a DSP. Key tasks in always-on environments, such as voice triggering, voice control, speech playback, and inertial sensor processing, can use DSP instructions for filtering, Fast Fourier Transform (FFT), and interpolation at a lower energy cost than RISC implementations.
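The filtering workloads mentioned above come down to tight multiply-accumulate (MAC) loops. As a minimal sketch (illustrative C, not ARC code), a Q15 fixed-point FIR filter shows the pattern a DSP-capable core can execute as single MAC instructions, where a plain RISC core needs separate load, multiply, and add steps:

```c
#include <assert.h>
#include <stdint.h>

#define TAPS 4

/* Q15 FIR: coefficients and samples are 16-bit fixed point in [-1, 1);
 * the 32-bit accumulator gathers products, and >> 15 rescales to Q15. */
int32_t fir_q15(const int16_t *x, const int16_t coeff[TAPS])
{
    int32_t acc = 0;                         /* accumulator, as in a MAC unit */
    for (int i = 0; i < TAPS; i++)
        acc += (int32_t)x[i] * coeff[i];     /* one MAC per tap */
    return acc >> 15;
}
```

With four 0.25 (0x2000) coefficients this is a moving average: four 0.5 (0x4000) samples filter to 0.5, as expected.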
You could use separate processors for each function, but this adds cost and complexity to the system and its development and debug environments. This can be avoided by using a core that has both functions, such as the DesignWare ARC EMxD processor family, which integrates a DSP engine, running the ARCv2DSP instruction set, with ARC configurable processor cores (Figure 1).
The ARC EM DSP processors can be configured to balance the DSP and RISC performance needed for the application with the required power and area efficiency. For example, the ARC EM5D and EM7D suit applications needing around 50% DSP processing, while the EM9D and EM11D, with support for XY memory, suit more DSP-intensive workloads. Designers can add custom instructions using ARC Processor EXtension (APEX) technology to support energy-efficient hardware accelerators.
Figure 1 ARC EMxD block diagram (Source: Synopsys)
Another energy saving step is to find ways to access memory more efficiently. A typical DSP MAC operation in a RISC + DSP processor loads data from memory and then does a MAC operation on the operands. This approach is limited to a maximum throughput of 1/3 MAC operation per cycle, as the instruction sequence consists of two data moves through load instructions, followed by the MAC operation (Figure 2).
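This three-instruction sequence can be captured in a simple cycle model (an assumption for illustration: each load and each MAC takes one cycle), which makes the 1/3-MAC-per-cycle ceiling concrete:

```c
#include <assert.h>

/* Without XY memory, each MAC needs: load operand A, load operand B, MAC.
 * So an N-tap dot product costs 3N cycles -- a throughput of 1/3 MAC/cycle. */
unsigned cycles_basic_mac(unsigned n_macs)
{
    unsigned cycles = 0;
    for (unsigned i = 0; i < n_macs; i++)
        cycles += 2 /* two loads */ + 1 /* MAC */;
    return cycles;
}
```

A 64-tap filter therefore spends 192 cycles on what is, arithmetically, 64 MAC operations.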
Figure 2 DSP MAC operation in a RISC + DSP architecture (Source: Synopsys)
DSP applications that need more throughput can use an XY memory-based system, which has multiple memory banks and automated address generation units (AGUs) with pointers and update registers. The AGUs are built into the instruction pipeline, and allow one instruction to execute three data moves, a MAC operation and three address-pointer updates. Multiple address-pointer update modes can be supported. This enables an effective throughput of one MAC operation per cycle (Figure 3). An XY memory system also cuts code size, as there is no need for separate load and increment instructions.
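The AGU behavior can be modeled in software. The sketch below is a toy model, not the ARC ISA (the names are illustrative): each modeled "instruction" fetches one operand from each memory bank, accumulates, and post-increments both pointers with a circular-buffer update mode, giving one MAC per iteration instead of three instructions:

```c
#include <assert.h>
#include <stdint.h>

/* Toy AGU: a base pointer, a current offset, and a modulo for circular
 * addressing -- one of several pointer-update modes an AGU can support. */
typedef struct { const int16_t *base; unsigned ptr, mod; } agu_t;

static int16_t agu_fetch(agu_t *a)            /* read, then post-increment */
{
    int16_t v = a->base[a->ptr];
    a->ptr = (a->ptr + 1) % a->mod;           /* circular-buffer update */
    return v;
}

/* One modeled MAC per loop iteration: two fetches, one accumulate,
 * two pointer updates -- all of which XY-memory hardware folds into
 * a single instruction. */
int32_t xy_dot(agu_t *xa, agu_t *ya, unsigned n)
{
    int32_t acc = 0;
    for (unsigned i = 0; i < n; i++)
        acc += (int32_t)agu_fetch(xa) * agu_fetch(ya);
    return acc;
}
```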
Figure 3 DSP MAC operation in a RISC+DSP architecture with XY memory (Source: Synopsys)
Using XY memory also cuts energy use, since fewer clock cycles are needed for the same functions, especially when they are tailored to a RISC + DSP architecture that allows concurrent accesses for both RISC and DSP.
Demand for increased performance in IoT applications is driving a trend away from tightly coupled, 8-bit microcontroller-based systems towards 32-bit, bus-based embedded systems, which usually increases power consumption, area and cost. It's possible to achieve high performance at lower energy and cost by using tightly coupled extensions to a 32-bit embedded processor architecture, doing away with the power-hungry bus infrastructure of a full 32-bit implementation. The processor can then access memories and peripheral registers directly, reducing both latency and the required clock frequency, and hence the energy needed to perform an equivalent function.
Figure 4 compares a bus-based processor subsystem to a tightly coupled system processing sensor data. The processor core accesses the auxiliary registers in one cycle instead of at least four for the peripheral registers in a bus-based system.
Figure 4 Energy savings for processing sensor data in a tightly coupled system (Source: Synopsys)
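A back-of-envelope model shows why this matters. Assuming, per the comparison above, one cycle per tightly coupled auxiliary-register access versus four cycles per bus-based peripheral access, and that dynamic energy scales with active cycles:

```c
#include <assert.h>

/* Cycle costs per register access (assumed figures from the comparison:
 * 1 cycle tightly coupled vs. at least 4 cycles over a bus). */
unsigned bus_cycles(unsigned accesses)     { return accesses * 4; }
unsigned coupled_cycles(unsigned accesses) { return accesses * 1; }

/* Percentage reduction in active cycles -- a rough proxy for the
 * dynamic-energy saving on register traffic. */
unsigned energy_saving_pct(unsigned accesses)
{
    unsigned bus = bus_cycles(accesses), tc = coupled_cycles(accesses);
    return 100 * (bus - tc) / bus;
}
```

On register traffic alone, the tightly coupled path cuts active cycles by 75% in this model; real savings depend on the mix of register accesses and compute.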
Another way to cut energy use is through direct memory access (DMA), which enables the peripherals to move data without involving the CPU. To ensure an area-efficient system, the DMA has to be optimized for the processor and application. Combining DMA with multibank memory saves even more energy, as the internal DMA moves data in and out of XY memory without affecting the processor pipeline.
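The combination of DMA and multibank memory is typically used as a ping-pong scheme. The sketch below illustrates the idea in plain C (dma_start and dma_wait are stand-ins for a real DMA driver, here modeled with memcpy): the DMA fills one bank while the core processes the other, so the pipeline never stalls waiting on I/O:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BANK 8
static int16_t bank[2][BANK];                 /* two banks of local memory */

/* Stand-ins for a DMA driver: the "transfer" is just a memcpy here. */
static void dma_start(int16_t *dst, const int16_t *src)
{
    memcpy(dst, src, BANK * sizeof *dst);
}
static void dma_wait(void) { /* would block until the DMA raises its IRQ */ }

/* Process a stream block by block: while the core sums the current bank,
 * the "DMA" fetches the next block into the other bank. */
int32_t sum_stream(const int16_t *src, unsigned blocks)
{
    int32_t total = 0;
    unsigned cur = 0;
    dma_start(bank[cur], src);                /* prime the first bank */
    for (unsigned b = 0; b < blocks; b++) {
        dma_wait();
        if (b + 1 < blocks)                   /* fetch next block concurrently */
            dma_start(bank[cur ^ 1], src + (b + 1) * BANK);
        for (unsigned i = 0; i < BANK; i++)   /* process current bank */
            total += bank[cur][i];
        cur ^= 1;                             /* swap banks */
    }
    return total;
}
```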
Synopsys’ µDMA option for ARC EM processors includes only the features needed for IoT applications. The µDMA controller enables lower energy operation by offering the option to put the EM core to sleep while the µDMA moves data, waking the core only when it’s needed. Multiple sleep modes are available to help designers achieve the lowest possible power.
The security requirements of the IoT demand complex algorithms, which add load to systems that are already tight on power and area budgets. One way to address this is for SoC designers to use APEX technology to develop processor extensions that accelerate common cryptographic algorithms, so they take less time, memory, and energy to execute.
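To see what such an extension might replace, consider SHA-256's Σ0 function, which per the FIPS 180-4 definition needs three rotates and two XORs in software; a custom instruction could compute it in a single cycle. The reference software version (the idea of an APEX-accelerated Σ0 is illustrative; no specific ARC intrinsic is implied):

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit rotate right (valid for 0 < r < 32). */
static uint32_t ror32(uint32_t x, unsigned r)
{
    return (x >> r) | (x << (32 - r));
}

/* SHA-256 Sigma0 per FIPS 180-4: ROTR2 ^ ROTR13 ^ ROTR22.
 * In software: three rotates and two XORs per call, executed
 * millions of times per hash-heavy workload -- a natural target
 * for a single-cycle custom instruction. */
uint32_t sha256_sigma0(uint32_t x)
{
    return ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22);
}
```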
Designing IoT devices demands a set of complex trade-offs to get the right balance of functionality, energy consumption and performance for the target application. Working with a processor family that offers numerous options for tuning the architecture and memory addressing, and even for extending the processor with custom instructions and functions, can help SoC designers achieve their IoT design goals. A scalable processor, such as the ARC EM family, also enables work done for the current generation of products to be quickly re-used in future products.