Building processors to enable intuitive human-machine interaction
Our relationships with computers are getting closer all the time. Decades ago, computers were mainframes that were jealously guarded by a caste of IT specialists who might be persuaded to run a program and send you the results if it suited them. The advent of the personal computer brought computing to home offices and eventually schools, while the invention of mobile phones put computers in our pockets. Wearable technology is now creating an intimate relationship between people and computers, mediated in some cases by little more than our skins.
User interfaces have evolved too – from punched-card programming and tape drives through green-screen terminals to the windows, icons, menus and pointer (WIMP) approaches of today’s desktops and laptops. Smartphones have brought us touch, swipe and pinch-to-zoom gestures, and voice assistants such as Siri. The next generation of user interfaces will be more natural and seamless still, using cameras to recognize gestures made in free space, the direction in which we are looking, our levels of attention and moods, and even our body language.
Many of these user interfaces will be embedded in increasingly personal objects, and yet will demand orders of magnitude more computing power to run than, for example, a touchscreen or WIMP interface. That poses a dilemma for processor designers: how to get more performance out of a processor architecture when power constraints mean that it is no longer possible to simply crank up the clock speed. The answer for many has been to build multi-core processors made up of a heterogeneous mix of specialized cores, each tuned to a different task. This seems a particularly appropriate way to address the HMI implementation issue, given that many of the core recognition algorithms are really specialized forms of signal processing.
Synopsys is addressing this issue by implementing a superscalar version of the pipeline of the popular HS3x family of processors in five new DesignWare ARC cores, some of which also have hardware support for advanced DSP algorithms. They are designed for use in embedded applications such as advanced human-machine interface (HMI), solid-state drives, wireless baseband, wireless control, home networking, automotive control, multi-channel home audio, industrial control and home automation.
All five cores are available in single-, dual- and quad-core configurations.
Figure 1 An overview of the HS4x architecture and options (Source: Synopsys)
The ARC HS44, HS46 and HS48 processors use the ARCv2 instruction set architecture (ISA). The cores have a dual-issue 10-stage pipeline that supports out-of-order instruction completion, as well as an advanced branch prediction unit and late-stage ALU to improve instruction throughput. This yields a performance of up to 6000 DMIPS per core at 2.5GHz on typical 16nm FinFET processes.
The HS46 and HS48 offer instruction and data caches (of up to 64Kbyte each) and support for full Level 1 (L1) cache coherency. The HS48 also incorporates up to 8Mbyte of Level 2 (L2) cache and a full-featured memory management unit to support symmetric multiprocessing Linux. The processors are configurable and can also run custom instructions defined through the ARC Processor EXtension (APEX) technology interface.
The ARC HS45D and HS47D processors are based on the ARCv2DSP instruction set and are compatible with the ultra-low power ARC EMxD processors, simplifying code migration between the two processor families.
The HS4xD cores support 150 additional DSP instructions, useful for baseband, audio, voice, speech and other signal-processing tasks. The DSP performance is twice that of the HS3x family cores, because the processor can do two 32bit loads and a multiply-accumulate instruction in the same cycle. RISC instruction performance is 25% greater than in the HS3x cores, at the cost of a 15% increase in area and power consumption.
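To see why two loads and a multiply-accumulate per cycle matter, consider the inner loop of a dot product, which sits at the heart of many filters and transforms: every iteration needs exactly two operand loads and one multiply-accumulate. The following is a minimal sketch in portable C rather than Synopsys code. The function name dot_i32 and the choice of a 64bit accumulator are assumptions made for illustration, and whether a compiler maps each iteration onto a single cycle depends on the toolchain and the core configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the name and signature are illustrative only and
 * are not part of any ARC or MetaWare API.
 *
 * Computes the dot product of two vectors of 32-bit integers. Each loop
 * iteration performs two 32-bit loads (a[i] and b[i]) followed by one
 * multiply-accumulate into a 64-bit accumulator. A single product always
 * fits in 64 bits; the caller must keep n small enough that the running
 * sum does not overflow. */
int64_t dot_i32(const int32_t *a, const int32_t *b, size_t n)
{
    /* Empty vectors are allowed; otherwise both pointers must be valid. */
    assert(n == 0 || (a != NULL && b != NULL));

    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int64_t)a[i] * (int64_t)b[i];
    }
    return acc;
}
```

On a core that needs separate cycles for each load and for the multiply-accumulate, a loop of this shape takes several cycles per element, which is the gap the HS4xD load and MAC capability is meant to close.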
The HS4xD cores can simultaneously manage control tasks such as communications stacks and file-system support, and signal-processing tasks such as audio decoding, post-processing and voice-based HMI processing. Various HS4xD-optimized audio/voice codecs and post-processing software are available from Synopsys and third-party partners.
Both the HS4x and HS4xD cores feature support for closely coupled memory and direct-mapped peripherals to cut system latency with single-cycle access. They can also be configured with native support for AXI or AHB-Lite bus interfaces, with 32bit, 64bit, or 128bit bus widths.
The cores can also be specified with a hardware integer divider, instructions for 64bit multiply, multiply-accumulate, vector addition and vector subtraction, and a configurable IEEE 754-compliant floating point unit (single- or double-precision, or both).
The ARC HS4x and HS4xD cores can be configured with separately licensed trace capabilities. The RTT option is a full real-time trace and debug unit for the HS and EM families that supports instruction and data trace capabilities. The SmaRT trace option is a small real-time trace and debug unit with support for limited trace capabilities.
Despite the dual-issue pipeline, the software view of the processors remains the same as in previous generations of ARC cores, making it very easy for programmers to use the new family.
The MetaWare Development Toolkit features optimized support for the HS4x family’s dual-issue capability. The MetaWare C/C++ compiler profiles the program code and reorders the instructions to maximize instruction throughput in the pipeline. This happens automatically, without any need for intervention from the programmer. To make DSP programming of the HS45D and HS47D easier, an optimized library of DSP functions is available for use with the MetaWare Development Toolkit. It includes FFT and DCT transforms, FIR and IIR filters, and vector and matrix math functions, alongside an ITU-T base-ops library for developing voice codecs.
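For readers unfamiliar with the fixed-point building blocks that voice codecs rely on, the sketch below shows two typical operations on Q15 values (16bit fractions in the range -1 to just under 1): a saturating add and a fractional multiply. It is written in plain C for illustration only. The names sat_add_q15 and mul_q15 are hypothetical and are not taken from the MetaWare libraries or from the ITU-T basic operators, whose exact interfaces and rounding rules should be checked in their own documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not the MetaWare or ITU-T API.
 * Q15 stores a value x in [-1, 1) as the 16-bit integer x * 32768. */

/* Saturating add: clamps to the 16-bit range instead of wrapping. */
int16_t sat_add_q15(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Fractional multiply: the 32-bit product of two Q15 values is Q30, so it
 * is scaled back by 2^15. Division is used instead of a right shift so the
 * result is fully defined in standard C for negative products (it rounds
 * toward zero). The only out-of-range case is -1.0 * -1.0 = +1.0, which
 * is clamped to the largest representable Q15 value. */
int16_t mul_q15(int16_t a, int16_t b)
{
    int32_t prod = ((int32_t)a * (int32_t)b) / 32768;
    assert(prod >= INT16_MIN);   /* invariant: the low side never overflows */
    if (prod > INT16_MAX) return INT16_MAX;
    return (int16_t)prod;
}
```

For example, mul_q15(16384, 16384) multiplies 0.5 by 0.5 and yields 8192, which is 0.25 in Q15, while sat_add_q15(30000, 10000) clamps to 32767 rather than wrapping to a negative number.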
The cores will be available in June 2017.