Accelerating the implementation of application-specific processors

By Markus Willems and Steve Cox | Posted: March 1, 2019

Application-specific processors can provide high performance for specialised tasks at low energy cost.

System-on-chip (SoC) designers are implementing an increasing amount of functionality in software to gain flexibility, to mitigate the uncertainty of supporting evolving standards, and to make it possible for a single chip to serve many end products.

Moving functionality from hardware to software running on a processor comes at a potential cost: lower performance and higher power consumption than optimized hardware. One way to tackle this issue is to use a set of small, task-specific processors instead of one large general-purpose processor. This approach is used in applications such as image recognition, database queries, and machine learning. Figure 1 shows how mobile SoCs deploy a variety of specialized processors to achieve high system performance while controlling system power.

Figure 1 Specialized processors working together in a mobile SoC (Source: Synopsys)

Many SoC developers turn to IP houses for such specialized processors. If the IP providers have yet to develop such processors for an emerging application, SoC designers have traditionally had to choose between using an existing processor that is “close enough” to the required functionality, or reverting to the inflexibility of fixed-function hardware. A third option now available is to build an application-specific instruction-set processor (ASIP), whose instruction set and architecture are tailored to the needs of the target application. The decision whether to develop an ASIP in-house or use off-the-shelf processor IP is a trade-off between the differentiating advantage the ASIP would bring and the engineering effort required to design, optimize, verify and program it for the end application. This, in turn, depends on the effort needed to build a reasonably capable toolchain for programming the ASIP.

Justifying the development of an ASIP

An ASIP is a combination of a software-programmable processor and application-specific functional units, optimized for a set of functions. ASIP designers use parallelism and specialization to achieve this optimization, while trying to retain full C programmability.

Parallelism enables designs to run multiple functions at once, and its three main forms can be applied individually or in combination to boost performance. The options are laid out in Figure 2 and described here.

Instruction-level parallelism uses an orthogonal instruction set, as in very long instruction word (VLIW) architectures, or an encoded instruction set (which delivers the operational parallelism needed without the overhead associated with VLIW architectures).

Data-level parallelism implements vector processing, which involves applying one instruction to multiple data items.

Task-level parallelism, as in multicore/multi-threading implementations, enables multiple cooperating algorithms that have different control flows to run alongside each other.
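
Of the three forms, data-level parallelism is the easiest to illustrate in plain C. The sketch below is only a behavioral model under stated assumptions: the 4-lane width, the vec_t type and the vadd operation are hypothetical names chosen for illustration, not part of any particular ASIP or tool. It contrasts a scalar loop, which issues one add per element, with a vector loop that issues one add per group of four elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* assumed vector width, chosen only for illustration */

/* Hypothetical vector register holding LANES 16-bit elements. */
typedef struct { int16_t lane[LANES]; } vec_t;

/* Behavioral model of one vector-add instruction: the same add is
 * applied to every lane. In hardware the lanes would run in parallel;
 * this loop only models the result. */
static vec_t vadd(vec_t a, vec_t b)
{
    vec_t r;
    for (int i = 0; i < LANES; ++i)
        r.lane[i] = (int16_t)(a.lane[i] + b.lane[i]);
    return r;
}

/* Scalar reference: one add instruction per element. */
void add_scalar(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (int16_t)(a[i] + b[i]);
}

/* Vector version: one vadd per LANES elements.
 * Returns the number of vector instructions issued. */
size_t add_vector(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    size_t issued = 0;
    assert(n % LANES == 0); /* this sketch does not handle a ragged tail */
    for (size_t i = 0; i < n; i += LANES) {
        vec_t va, vb, vr;
        for (int k = 0; k < LANES; ++k) { /* models a vector load */
            va.lane[k] = a[i + k];
            vb.lane[k] = b[i + k];
        }
        vr = vadd(va, vb);
        for (int k = 0; k < LANES; ++k)   /* models a vector store */
            out[i + k] = vr.lane[k];
        ++issued;
    }
    return issued;
}
```

For an 8-element input, add_scalar performs eight adds while add_vector issues two vector adds and produces the same result. How much of that reduction turns into real speed-up depends on the memory and register architecture of the candidate design.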

Figure 2 Design options — parallelism (Source: Synopsys)

Specialization enables designers to perform functions with one or a few instructions, by customizing their processor’s pipelines, internal/external memory, register architectures, and connectivity. Designers can also define application-specific data types and interfaces. Figure 3 depicts various forms of specialization.
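
As a rough illustration of specialization, consider a saturating multiply-accumulate. On a general-purpose core the function below would typically compile to a multiply, an add and two compare-and-clamp steps. An ASIP designer could instead provide a single instruction with this behavior. The function names are hypothetical, and the code is a behavioral model in portable C rather than the output or API of any specific tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Behavioral model of a hypothetical single-cycle custom instruction:
 * acc + x * c, clamped to the 16-bit signed range. */
int16_t mac_sat(int16_t acc, int16_t x, int16_t c)
{
    /* The widest possible result fits comfortably in 32 bits. */
    int32_t wide = (int32_t)acc + (int32_t)x * (int32_t)c;
    if (wide > INT16_MAX) wide = INT16_MAX;
    if (wide < INT16_MIN) wide = INT16_MIN;
    return (int16_t)wide;
}

/* Kernel written against the custom operation. With the instruction in
 * hardware, each loop iteration would need one arithmetic instruction
 * instead of several. */
int16_t dot_sat(const int16_t *x, const int16_t *c, size_t n)
{
    int16_t acc = 0;
    assert(x != NULL && c != NULL);
    for (size_t i = 0; i < n; ++i)
        acc = mac_sat(acc, x[i], c[i]);
    return acc;
}
```

Because the kernel stays in C, the same source can be compiled for a candidate with the fused instruction and for one without it, and the two cycle counts compared.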

Figure 3 Design options — specialization (Source: Synopsys)

Developing ASIPs

An ASIP is only worth developing if it can bring a useful differentiating advantage within the design’s market window. Designers therefore need to be able to rapidly explore the impact of architectural choices upon their ASIP, by doing three things:

1 – Define benchmarks representative of the end application, to enable a quantitative comparison of the architectures being considered.

A benchmark must have:

A functional specification, describing the application kernels that need to be implemented. Benchmarks are usually represented in C for ease of implementation and architectural independence.
An environment describing the stimuli that exercise the benchmark.
Metrics to measure against, such as power consumption, performance, and target frequency.
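
The first two ingredients can be sketched in a few lines of C. The example below is an assumption-laden illustration, not a benchmark from the article: the 4-tap FIR filter, its coefficients and the impulse stimulus are all made up for the purpose. The kernel is the functional specification, and the driver supplies the stimulus and checks the response. The third ingredient, the metrics, is not modeled here, since cycle counts and power figures would come from the simulator and synthesis tools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAPS 4 /* illustrative filter length */

/* Functional specification: the application kernel in plain C. */
void fir(const int16_t *x, size_t n, const int16_t coef[TAPS], int32_t *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        int32_t acc = 0;
        /* Samples before the start of the input are treated as zero. */
        for (size_t k = 0; k < TAPS && k <= i; ++k)
            acc += (int32_t)coef[k] * x[i - k];
        y[i] = acc;
    }
}

/* Environment: a stimulus and its expected response.
 * Returns the number of mismatching output samples (0 means pass). */
size_t run_fir_benchmark(void)
{
    static const int16_t coef[TAPS] = {1, 2, 3, 4};
    static const int16_t impulse[8] = {1, 0, 0, 0, 0, 0, 0, 0};
    /* An impulse input reproduces the coefficients at the output. */
    static const int32_t expected[8] = {1, 2, 3, 4, 0, 0, 0, 0};
    int32_t y[8];
    size_t mismatches = 0;

    fir(impulse, 8, coef, y);
    for (size_t i = 0; i < 8; ++i)
        if (y[i] != expected[i])
            ++mismatches;
    return mismatches;
}
```

Keeping the check inside the benchmark means the same program can be run natively and on each candidate architecture, with a non-zero return value flagging a functional mismatch.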

2 – Describe a candidate architecture

Designers need a quick and easy way to define a candidate architecture, ideally using a modelling approach that avoids specifying deep implementation details early in the design process.

Designers also need software tools to map benchmark code onto the candidate architectures, and, since it is impractical to develop a new toolchain for each candidate architecture manually, this needs to be automated.

3 – Explore the design space

Design exploration involves evaluating each candidate architecture against the defined benchmarks. There are two main ways of doing this.

Compiler-in-the-loop: Designers need a compiler to map benchmarks onto each candidate architecture, rather than hand-writing assembly code, which is time-consuming and error-prone. It’s also useful to have a cycle-accurate instruction set simulator (ISS) and a profiler to analyze results. The C compiler, ISS and profiler can be combined with a debugger, assembler, and linker to form a full software development toolkit (SDK).

The SDK should be available early in the design process and quickly retargetable to the various architectural alternatives, to enable efficient design-space exploration.

Synthesis-in-the-loop: It can be useful to quickly analyze the hardware cost and characteristics of a candidate architecture in terms of its operating frequency, area, and power efficiency. To do this, there should be a way to automatically generate synthesizable RTL, and then use synthesis tools to analyze the hardware characteristics of each candidate architecture.
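
The two loops produce complementary numbers: cycle counts from the simulator, and frequency and area from synthesis. A minimal sketch of how they might be combined is shown below. The candidate_t record and the selection rule (shortest runtime within an area budget) are assumptions made for illustration; a real project would define its own metrics and constraints, and no actual tool output format is implied.

```c
#include <assert.h>
#include <stddef.h>

/* One row of exploration results for a candidate architecture.
 * All field names are hypothetical. */
typedef struct {
    const char   *name;
    unsigned long cycles;   /* benchmark cycle count, from the ISS */
    double        fmax_mhz; /* achievable clock, from synthesis */
    double        area_mm2; /* estimated area, from synthesis */
} candidate_t;

/* Benchmark runtime in microseconds: cycles divided by MHz. */
static double runtime_us(const candidate_t *c)
{
    return (double)c->cycles / c->fmax_mhz;
}

/* Returns the index of the fastest candidate that fits the area
 * budget, or -1 if none fits. */
int pick_candidate(const candidate_t *cands, size_t n, double max_area_mm2)
{
    int best = -1;
    for (size_t i = 0; i < n; ++i) {
        assert(cands[i].fmax_mhz > 0.0);
        if (cands[i].area_mm2 > max_area_mm2)
            continue; /* violates the area constraint */
        if (best < 0 || runtime_us(&cands[i]) < runtime_us(&cands[best]))
            best = (int)i;
    }
    return best;
}
```

Note that a candidate with a lower cycle count can still lose if its extra hardware lowers the achievable clock frequency or breaks the area budget, which is why both loops are needed.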

ASIP Designer

ASIP Designer from Synopsys helps automate the creation of ASIPs. It offers retargetable compilation and architectural exploration technology, fast simulation, and integration with implementation flows.

Figure 4 shows how ASIP development is supported within ASIP Designer, and the way in which it integrates into the Synopsys design and verification flows.

Figure 4 ASIP Designer tool flow (Source: Synopsys)

Processor modeling

The ASIP is described using nML, a structured architecture description language that efficiently and concisely describes processor architectures at the same level of abstraction as a programmer’s manual. The language is used to define the structural characteristics of the design (registers, functional units, signals, etc.) and the instruction-set architecture. nML also enables users to describe the cycle- and bit-accurate behavior of the datapaths and I/O interfaces.

SDK generation

In ASIP Designer, an ASIP’s nML description is used as an input to the retargetable SDK (step 1 in Figure 4), which automatically adapts to the defined processor architecture. The SDK includes an optimizing C/C++ compiler, an assembler/disassembler, a linker, cycle-accurate and instruction-accurate instruction-set simulators, and the graphical debugger shown in Figure 5.

Figure 5 Components of the software development kit (Source: Synopsys)

It’s possible for the compiler to adapt to the details of each candidate architecture because its optimizations are implemented in a generic way. Other compiler frameworks, such as the GNU Compiler Collection (GCC), need an architecture-specific compiler backend for each candidate. The immediate availability of a compiler enables rapid architectural exploration and iteration (see step 2 in Figure 4).

Having a compiler in the loop also means software authors can provide feedback to the ASIP designer, and that the processor’s dynamic performance can be studied and optimized. Making these kinds of adaptations and trade-offs at this level of abstraction is much more efficient than trying to do it once an RTL description has been generated.

The SDK also makes it possible for end users to program the ASIP once it is implemented in an SoC. The architecture-specific SDK can be made available as a standalone package for such customers.

Hardware generation and verification

Once the design meets its functional requirements, ASIP Designer integrates with Synopsys implementation and verification tools to take the design from its RTL description to tape-out.

ASIP Designer translates the nML model into fully synthesizable Verilog or VHDL (see step 3 in Figure 4), with full cycle-accurate and bit-accurate control of the hardware. Design and verification tools, such as Synopsys’ Design Compiler and VCS, can then be used to implement the ASIP. For example, Design Compiler can generate a gate-level description from which to predict the circuit’s power requirement and area, and place-and-route tools such as IC Compiler can be used to explore the risk of routing congestion.

This “synthesis-in-the-loop” approach enables educated decisions, and avoids surprises later in the design process. Should the design face problems during implementation, developers can go back to the nML description and adjust it to address the issue. Because of the single-source entry in nML, the SDK and RTL will remain in sync.

Verifying the ASIP

There are two aspects to verifying the ASIP.

The first is to verify that the processor model, as described in nML, acts as intended. ASIP Designer helps with this by supporting: confirmation of correct test-case execution by comparison with native execution on the designer’s workstation; automatic consistency checks; diagnostic reports that analyze connectivity, hardware conflicts, unused instructions, and pipeline hazards; automatic generation of processor-specific “one-liner” C programs that check whether all units necessary for a compiler are present; and other diagnostics.

The second key aspect is verification of the RTL model, ensuring that the generated RTL implements the processor model correctly. ASIP Designer supports: automatic generation of directed random instruction sequences as assembly code; templates of instruction sequences that are filled in by generating random values for all required fields; automatic generation of coverage points; and more. These are provided in SystemVerilog and can be integrated into an overall testbench.
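
The idea of filling an instruction template with random field values can be sketched generically in C. This is not how ASIP Designer implements it (its generators are delivered in SystemVerilog), and the “add rD, rA, rB” syntax, the register count and the function names are all hypothetical. A small seeded generator is used so that a failing sequence can be reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Small linear congruential generator: the same seed always yields
 * the same sequence, which makes failures reproducible. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16; /* the upper bits are better distributed */
}

/* Fills the template "add rD, rA, rB" with random register numbers in
 * the range [0, num_regs). Writes the assembly line into buf and
 * returns the snprintf result (characters needed, excluding the NUL). */
int random_add(uint32_t *seed, unsigned num_regs, char *buf, size_t len)
{
    unsigned rd, ra, rb;
    assert(seed != NULL && buf != NULL && num_regs > 0);
    rd = (unsigned)(lcg_next(seed) % num_regs);
    ra = (unsigned)(lcg_next(seed) % num_regs);
    rb = (unsigned)(lcg_next(seed) % num_regs);
    return snprintf(buf, len, "add r%u, r%u, r%u", rd, ra, rb);
}
```

Calling random_add repeatedly with the same seed variable yields a stream of legal instructions. A directed variant would constrain some fields, for example forcing rA to equal the previous rD in order to provoke a pipeline hazard.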

The effort spent on verifying an ASIP depends on how it will be used. The more narrowly defined the ASIP is, the more closely its functional verification will resemble that of a fixed-function RTL implementation. The more generic the ASIP, the closer the verification effort will come to that which a processor IP provider must make to ensure that its IP functions properly in almost any use case.

Conclusion

ASIPs provide a useful mix of hardware efficiency and software flexibility – if they can be designed, verified and programmed quickly enough to meet project requirements.

ASIP Designer can help with this by enabling detailed architectural exploration, compiler- and synthesis-in-the-loop analysis, and straightforward programming through the automated generation of an optimized SDK.

Author

Markus Willems is a product marketing manager, and Steve Cox is a business development manager, both at Synopsys.

