Enabling embedded multicore systems with multiple OSes and critical goals

By Colin Walls | Posted: July 16, 2019
Topics/Categories: Embedded - Architecture & Design, Embedded Topics

Embedded multicore systems require engineers to make choices around the hardware and software architectures, approaches to certification and more. This is a guide to the trade-offs involved and how to best leverage your options.

The implementation of embedded multicore systems is becoming increasingly common. The decision to realize a design using multiple processors may be influenced by a number of factors. Broadly stated, these are technical goals, time to market, and target design and production costs. Using multicore in a design requires you to make a number of key decisions. As with most embedded systems, these largely fall under two headings: hardware and software.

Multicore hardware

When referring to multicore hardware, we are usually thinking in terms of a chip that includes multiple CPUs. Of course, there will very likely be other features on the system-on-a-chip (e.g., memory, CPU interconnect and peripheral devices). The processors may be general-purpose CPUs, or they might be specialized cores like GPUs and DSPs.

Broadly speaking, there are three embedded multicore architectures:

  • Homogeneous multicore: all the cores are identical.
  • Heterogeneous multicore: all the cores are different.
  • Hybrid: some of the cores are identical and others are different. This may be a selection of regular CPUs and specialized cores. It could also be pairs of CPUs, with each pair offering a low-power and a high-power option.
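The three categories above can be expressed as a simple rule over the mix of core types on a device. The following is a minimal sketch only: the function name and the core-type labels are hypothetical, chosen for illustration, and do not describe any particular SoC.

```python
from collections import Counter


def classify_multicore(core_types):
    """Classify a device by the mix of core types it contains.

    core_types holds one (hypothetical) type label per core.
    """
    if len(core_types) < 2:
        raise ValueError("a multicore device needs at least two cores")
    counts = Counter(core_types)
    if len(counts) == 1:
        return "homogeneous"    # every core has the same type
    if all(n == 1 for n in counts.values()):
        return "heterogeneous"  # no two cores share a type
    return "hybrid"             # some identical cores, some different


# Example labels are illustrative assumptions only.
print(classify_multicore(["cpu", "cpu", "cpu", "cpu"]))
print(classify_multicore(["cpu", "gpu", "dsp"]))
print(classify_multicore(["big", "big", "little", "little"]))
```

The last call models the paired low-power and high-power arrangement mentioned in the hybrid case.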

Although multicore systems are mostly constructed using a single chip (a multicore SoC), almost every implementation detail is the same when there are multiple CPU chips on a board, or even multiple boards in a system.

Multicore software

The software architecture of an embedded multicore system represents a much more complex set of options. At the highest level, there are broadly three options:

  • Symmetric multi-processing (SMP)
  • Asymmetric multi-processing (AMP)
  • A hybrid of the two above (AMP+SMP)

These options largely describe how the operating system (OS) is deployed across the cores.

Symmetric multi-processing

In an SMP system, a single instance of an OS runs across all the cores. This needs to be a special (‘SMP’) version of the OS and all the cores must be identical (i.e., it must be a homogeneous multicore device).

Figure 1 shows the Nucleus real-time OS (RTOS) deployed in its SMP version across a number of CPUs.

Figure 1. SMP version of Nucleus RTOS deployed across multiple CPUs (Mentor)

The goal of an SMP system is simply to provide more CPU power. This is not a new idea; mainframe computers running an SMP OS were pioneered in the late 1960s. Modern desktop computers commonly feature this architecture, implemented using between two and 28 cores.

For embedded multicore systems, desktop-derived OSes – typically Linux – are available in an SMP variant. A number of RTOS products also have support for SMP available.

No special control or inter-CPU communications software is needed; this is handled by the standard API of the OS. Similarly, the tasks are distributed across the cores by the OS. An embedded SMP OS will most likely provide facilities to tune core allocation for the specific application (e.g., you may be able to lock a particular task to a specific core, if that task requires guaranteed continuous CPU access).
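As an illustration of locking a task to a core, the sketch below uses the CPU affinity calls that Linux exposes through Python's `os` module. It assumes an SMP Linux target; an SMP RTOS would offer its own, differently named API for the same purpose, and the helper name `pin_to_core` is hypothetical.

```python
import os


def pin_to_core(core_id, pid=0):
    """Restrict a process (0 means the caller) to a single core.

    Returns the affinity set that is in force afterwards.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity calls are not available on this platform")
    allowed = os.sched_getaffinity(pid)
    if core_id not in allowed:
        raise ValueError(f"core {core_id} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(pid, {core_id})
    return os.sched_getaffinity(pid)


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    original = os.sched_getaffinity(0)
    print("pinned to:", pin_to_core(min(original)))
    os.sched_setaffinity(0, original)  # restore the original affinity
```

Pinning only limits where the task may run. It does not by itself keep other tasks off that core, so guaranteed continuous access would need further OS configuration.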

Asymmetric multi-processing

AMP is in many ways easier to understand than SMP: Each CPU has its own OS. The OS may be an RTOS, a non-real-time OS or it could be a ‘bare metal’ (i.e., no OS at all) implementation. An AMP system can be built on either homogeneous or heterogeneous multicore hardware. As each OS is independent, provision must be made for control and inter-CPU communication.

In Figure 2, Linux is running on one core and Nucleus RTOS (in its normal, non-SMP variant) on another.

Figure 2. In an AMP system, each CPU has its own OS (Mentor)

Although conceptually simpler, AMP systems offer a very wide range of possibilities and facilitate the flexible design of a variety of types of system. As a result, AMP is a much more common choice for embedded systems than SMP. A more detailed exploration of AMP follows below.


Hybrid (AMP+SMP)

For some applications, a derivative of the AMP architecture can make sense, where some (identical) CPUs are clustered together and run an SMP OS and other CPUs run their own OSes. This is a hybrid system, essentially an AMP system where one or more CPUs are replaced by SMP clusters.

AMP for embedded multicore systems

As soon as the decision is taken to implement an AMP system of any significant complexity, three key issues arise:

  • Inter-CPU communications. How can a task on one CPU/OS send data to or synchronize with a task on another CPU/OS?
  • Inter-CPU safety. How is each CPU protected from interference by another (which may be the result of a malfunction or hacking)?
  • Boot order. Which CPU starts first and how do the others follow? This can be very important when considering the initialization of shared data and peripherals and to ensure that synchronization does not go awry.
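To make the first of these issues concrete, the sketch below models one common pattern for inter-CPU communication: a single-producer, single-consumer ring of message slots in memory that both CPUs can see. It is an illustrative model only, not the mechanism of any particular framework. The class name `SharedRing` is hypothetical, and a `bytearray` stands in for the shared memory region.

```python
class SharedRing:
    """Single-producer, single-consumer ring of fixed-size message slots.

    The bytearray stands in for a memory region visible to both CPUs.
    One slot is always left empty so that 'full' and 'empty' differ.
    """

    def __init__(self, slots, slot_size):
        self.slots = slots
        self.slot_size = slot_size
        self.mem = bytearray(slots * slot_size)
        self.lengths = [0] * slots
        self.head = 0  # next slot to write; advanced only by the sender
        self.tail = 0  # next slot to read; advanced only by the receiver

    def send(self, payload):
        """Copy payload into the next free slot; return False if full."""
        if len(payload) > self.slot_size:
            raise ValueError("payload larger than a slot")
        nxt = (self.head + 1) % self.slots
        if nxt == self.tail:
            return False
        start = self.head * self.slot_size
        self.mem[start:start + len(payload)] = payload
        self.lengths[self.head] = len(payload)
        self.head = nxt  # publish the slot only after the data is written
        return True

    def receive(self):
        """Return the oldest message, or None if the ring is empty."""
        if self.tail == self.head:
            return None
        start = self.tail * self.slot_size
        data = bytes(self.mem[start:start + self.lengths[self.tail]])
        self.tail = (self.tail + 1) % self.slots
        return data


ring = SharedRing(slots=4, slot_size=16)
ring.send(b"sensor:42")
print(ring.receive())
```

On real hardware, the two sides would also need memory barriers or cache maintenance around the index updates, plus a notification mechanism such as an inter-processor interrupt. The boot-order issue shows up here too: one CPU must initialize the ring before the other touches it.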

System management

There are multiple options for the management of an AMP system. The choice you make will vary according to the application. Broadly stated, it involves a trade-off between flexibility, security and overheads. The results of that trade-off will favor the implementation of an unsupervised or supervised system.


Unsupervised AMP

An unsupervised embedded multicore system uses some kind of AMP framework to implement system control and inter-CPU communications. Typically, this framework is based on the OpenAMP standard (e.g., the Mentor Embedded Multicore Framework (MEMF)) and can be implemented on lower-power CPUs or even on bare metal.

A straightforward system might have an OS for each CPU, integrated with the multicore framework (Figure 3).

Figure 3. A basic ‘unsupervised’ system (Mentor)

In a more complex system, a number of CPUs may be clustered using an SMP OS, but also using the framework for overall system management (Figure 4).

Figure 4. A more complex ‘unsupervised’ system (Mentor)


Supervised AMP

A supervised embedded multicore system employs a hypervisor instead of a framework to implement control and inter-CPU communication. This offers more flexibility and control and a higher level of security (Figure 5).

Figure 5. A hypervisor replaces a framework in a ‘supervised’ system (Mentor)

Unlike a framework, a hypervisor requires a more powerful CPU and the support of an OS.


Hybrid supervised/unsupervised AMP

It is possible to build a hybrid embedded multicore system that is part-supervised and part-unsupervised. The supervised part needs a hypervisor but also needs a framework to communicate with the unsupervised sub-system (Figure 6).

Figure 6. A hybrid system needs both a hypervisor and a framework (Mentor)

System implementation

We have looked in some detail at how an AMP system may be built, but there remains the question of why the various choices are made. With SMP, the motivation is simple (more CPU power). The choice of AMP is usually driven by one or both of two characteristics of the application: mixed time domain and mixed criticality.

Mixed time domain

The most common reason for choosing to design an AMP system is that the application has various components that deal with time in different ways. There are typically three possibilities:

  • Parts of the system are real time. They need to be predictable and responsive to external events in a timely fashion. They are most likely to be implemented using an RTOS.
  • Parts of the system have no precise time constraints. That is to say, they are not real time. This might include functionality such as the user interface, where the exact speed is unimportant. The implementation of these parts can take advantage of the opportunities offered by Linux.
  • Parts of the system may need to utilize the maximum amount of CPU power with no overheads. In this instance, it is likely that no OS can be tolerated (i.e., the code is implemented on bare metal).

Figure 7. Three time factors can influence an AMP implementation (Mentor)

Mixed criticality

A less common reason for choosing an AMP design is that the system is critical. This is a context in which a novel design approach can reap great benefits. Such systems include secure systems (e.g., banking and medical) and safety-critical systems (e.g., automotive, mil/aero, medical and industrial).

The common feature of such systems is that authorities require that they satisfy some kind of certification before they are marketed or deployed. Although the certification processes differ from one industry to another, these procedures are always expensive and time consuming. Anything that can reduce their cost and time is a boon.

The cost/time of certification is significantly affected by the volume of code. So, minimizing code size is very helpful. This also affects the choice of OS. A small RTOS, with source code readily available, is obviously an attractive option. Typically, it is not possible to have an OS certified alone, as the whole application must be subject to the process. However, some vendors offer a ‘pre-certification’ package to ease the process. Also, it is clearly sensible to select an OS that has a solid track record of being certified in the specific application area.

If the system is built as an AMP design, the application of subsystem certification can really help. Consider the moderately complex design in Figure 8.

Figure 8. Sample design illustrating a partial certification scenario (Mentor)

This design features a mixture of OSes, including two microcontrollers running a certifiable SMP RTOS. It is only this latter part that really needs to be certified, as that is where the critical application code is executed. The remaining CPUs perform other, non-critical functions. It is possible to certify just this subsystem, so long as there is an approved barrier – termed ‘a zone of trust’ – between the certified and uncertified parts of the system.
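One small part of reasoning about such a barrier is checking the system memory map. The sketch below flags any address range owned by an uncertified subsystem that overlaps a range owned by a certified one. This is an assumption made for illustration, not a description of what any certification authority requires. The `Region` type, the owner names and the addresses are all hypothetical, and in a real system the separation would be enforced by hardware or a hypervisor rather than by a script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    owner: str       # hypothetical subsystem name
    start: int       # first address of the region
    size: int        # length in bytes
    certified: bool  # True if the owner is in the certified subsystem

    @property
    def end(self):
        return self.start + self.size  # one past the last address


def trust_violations(regions):
    """Return (certified owner, uncertified owner) pairs that overlap."""
    violations = []
    for cert in (r for r in regions if r.certified):
        for other in (r for r in regions if not r.certified):
            if cert.start < other.end and other.start < cert.end:
                violations.append((cert.owner, other.owner))
    return violations


# Illustrative memory map; every value here is made up.
memory_map = [
    Region("rtos_cluster", 0x1000_0000, 0x0010_0000, certified=True),
    Region("linux", 0x2000_0000, 0x1000_0000, certified=False),
    Region("bare_metal", 0x100F_0000, 0x0002_0000, certified=False),
]
print(trust_violations(memory_map))
```

In this made-up map the bare-metal region begins inside the certified region, so it is reported, while the Linux region is not.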

Recognizing that a system exhibits mixed criticality, and that subsystem certification may be employed, has numerous benefits:

  • You can select each OS in the system according to an appropriate mixture of price and functionality.
  • Linux becomes a viable option.
  • Existing IP and open-source IP may be reused.

The bottom line is a reduction in costs and a faster time to market.


To summarize with a few overarching observations:

  • Embedded multicore designs are becoming very common.
  • There are multiple hardware options: homogeneous, heterogeneous and hybrid.
  • There are several software architecture options: chiefly SMP and AMP.
  • Developers of multicore systems face many new challenges.
  • An AMP design may be very beneficial for applications that feature mixed time domains and mixed criticality.

A recent webinar also covered this topic; a recording, including the Q&A, is available.
