Introduction to the Compute Express Link (CXL) device types

By Gary Ruggles | Posted: September 13, 2019
Topics/Categories: Embedded - Architecture & Design, IP - Selection | Tags: Compute Express Link, CXL, PCIe, PCIe 5.0 | Organizations: Synopsys

A look at the device types defined by the Compute Express Link (CXL) standard.

Compute Express Link (CXL) is an open interconnect standard for enabling efficient, coherent memory accesses between a host, such as a CPU, and a device, such as a hardware accelerator, that is handling an intensive workload.

A consortium to support the new standard was launched recently, at the same time as the release of CXL 1.0, the first version of the interface specification.

CXL is expected to be used in heterogeneous computing systems that include hardware accelerators for artificial intelligence, machine learning, and other specialized tasks.

This article focuses on the CXL device types.

CXL uses three protocols: CXL.io, CXL.cache, and CXL.mem. The CXL.io protocol handles initialization and link-up, and the link cannot operate without it, so every CXL device must support CXL.io.

Different combinations of the other two protocols produce the three device types defined by the CXL standard. Figure 1 shows the three device types along with their protocols, typical applications, and the types of memory access they support.

Figure 1 Three defined CXL device types (Source: Synopsys)
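The protocol combinations behind the three device types can be captured in a small lookup table. The sketch below is a minimal illustration, not part of any CXL software stack: the names `Protocol`, `DEVICE_TYPES`, and `classify` are hypothetical, and the example devices in the comments are the typical applications commonly cited for each type.

```python
from enum import Flag, auto


class Protocol(Flag):
    IO = auto()     # CXL.io: initialization and link-up; required on every device
    CACHE = auto()  # CXL.cache: the device coherently caches host memory
    MEM = auto()    # CXL.mem: the host accesses memory attached to the device


# Hypothetical table: protocol set supported by each CXL device type
DEVICE_TYPES = {
    1: Protocol.IO | Protocol.CACHE,                 # e.g. accelerator without local memory
    2: Protocol.IO | Protocol.CACHE | Protocol.MEM,  # e.g. accelerator with attached memory
    3: Protocol.IO | Protocol.MEM,                   # e.g. memory expander
}


def classify(protocols: Protocol) -> int:
    """Return the CXL device type matching a set of supported protocols."""
    if Protocol.IO not in protocols:
        raise ValueError("every CXL device must support CXL.io")
    for dev_type, required in DEVICE_TYPES.items():
        if protocols == required:
            return dev_type
    raise ValueError(f"no CXL device type matches {protocols}")
```

For example, `classify(Protocol.IO | Protocol.MEM)` identifies a Type 3 device, while any combination lacking `Protocol.IO` is rejected, reflecting the requirement that CXL.io always be present.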

For Type 2 devices, CXL defines two coherency biases that govern how coherent data in the memory attached to the device is shared between the host and the device. The bias modes are called host bias and device bias, and the mode can be changed while the link is operating to optimize performance for a given task.

When a Type 2 device such as an accelerator is working on data, in the period between the host submitting the work and that work completing, device-bias mode lets the device access its attached memory directly, without communicating with the host's coherency engines. This is possible because, in device-bias mode, the device is guaranteed that the host has none of the lines of that memory cached.

This gives the device the lowest possible access latency, so an accelerator will spend most of its working time in device-bias mode. The host can still access the memory attached to the device while it is in device-bias mode, but with lower performance.

Host-bias mode prioritizes coherent access from the host to the memory attached to the device. It is used during work submission, when the host writes data to the memory attached to the device, and during work completion, when the host reads data from that memory. In host-bias mode, the memory attached to the device appears to the device just like memory attached to the host, so any device access to it is handled as a request to the host.
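The way the bias mode changes across one offloaded job, and how it affects the path taken by each access, can be sketched as follows. This is a minimal model of the behavior described above, not real driver or hardware code; `Bias`, `device_access_path`, `host_access_path`, and `JOB_PHASES` are hypothetical names, and the path descriptions are simplified labels.

```python
from enum import Enum


class Bias(Enum):
    HOST = "host bias"
    DEVICE = "device bias"


def device_access_path(bias: Bias) -> str:
    """Path taken when the Type 2 device accesses its own attached memory."""
    if bias is Bias.DEVICE:
        # Host holds no cached lines of this memory: go straight to it
        return "direct to device memory"
    # Host bias: the memory looks like host memory, so ask the host
    return "request to host"


def host_access_path(bias: Bias) -> str:
    """Path taken when the host accesses memory attached to the device."""
    if bias is Bias.HOST:
        return "coherent, prioritized"
    # Still permitted in device bias, but at lower performance
    return "permitted, slower"


# Hypothetical lifecycle of one offloaded job, following the text
JOB_PHASES = [
    ("work submission: host writes input data", Bias.HOST),
    ("execution: device works on the data", Bias.DEVICE),
    ("work completion: host reads results", Bias.HOST),
]

for phase, bias in JOB_PHASES:
    print(f"{phase:42} {bias.value:12} device: {device_access_path(bias)}")
```

The accelerator spends the execution phase in device bias, where its own accesses avoid the host entirely, and the host regains prioritized access for submission and completion.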

The bias mode can be managed by software or by hardware, using one of the two supported mode-management mechanisms: software-assisted and hardware autonomous. An accelerator or other Type 2 device can choose its bias mode; if no mode is selected, the system defaults to host-bias mode, so all accesses to device-attached memory must be routed through the host. Bias can be controlled at the granularity of a 4KB page and is tracked in a bias table implemented within the Type 2 device.
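A per-page bias table with a host-bias default can be sketched as below. This is an illustrative model only: `BiasTable` and its methods are hypothetical, addresses are assumed to be offsets into the device-attached memory, and `set_bias` simply stands in for whichever mechanism (software-assisted or hardware autonomous) requests a bias change.

```python
from enum import Enum

PAGE_SHIFT = 12  # 4KB pages: 2**12 bytes per bias-table entry


class Bias(Enum):
    HOST = 0
    DEVICE = 1


class BiasTable:
    """Hypothetical per-page bias table held inside a Type 2 device."""

    def __init__(self, mem_size: int):
        num_pages = (mem_size + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT
        # Every page starts in the default host-bias mode
        self._bias = [Bias.HOST] * num_pages

    def page_of(self, addr: int) -> int:
        """Map an offset in device memory to its 4KB page index."""
        if not 0 <= addr < (len(self._bias) << PAGE_SHIFT):
            raise ValueError(f"address {addr:#x} outside device memory")
        return addr >> PAGE_SHIFT

    def set_bias(self, addr: int, bias: Bias) -> None:
        # A change applies to the whole 4KB page containing addr
        self._bias[self.page_of(addr)] = bias

    def lookup(self, addr: int) -> Bias:
        return self._bias[self.page_of(addr)]
```

For instance, with `t = BiasTable(64 * 1024)` (16 pages), calling `t.set_bias(0x1234, Bias.DEVICE)` flips the entire page from `0x1000` to `0x1FFF` to device bias, while `t.lookup(0x2000)` still reports host bias.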

One important feature of the CXL standard is that its coherency protocol is asymmetric. The home agent resides only in the host, where it controls the caching of memory and resolves system-wide coherency for any addresses requested by attached CXL devices. This differs from many other coherency protocols, particularly those used for CPU-to-CPU connections, which are usually symmetric and so treat all the interconnected devices as peers.
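The asymmetry can be illustrated with a toy directory kept only by the host. This is a deliberately simplified sketch, not the CXL.cache state machine: `HomeAgent`, `read_shared`, and `read_exclusive` are hypothetical names, the host's own caches are not modeled, and devices are identified by made-up strings. The point it shows is that devices never track each other; only the host records who caches each line and decides whom to invalidate.

```python
from dataclasses import dataclass, field


@dataclass
class HomeAgent:
    """Hypothetical, highly simplified host-side coherency directory."""

    # Cache-line address -> set of device ids holding a copy
    sharers: dict = field(default_factory=dict)

    def read_shared(self, device: str, line: int) -> None:
        """Record that a device now caches a read-only copy of the line."""
        self.sharers.setdefault(line, set()).add(device)

    def read_exclusive(self, device: str, line: int) -> list:
        """Grant ownership; return the other devices the host must invalidate."""
        others = sorted(self.sharers.get(line, set()) - {device})
        self.sharers[line] = {device}
        return others
```

If `accel0` and `nic0` both read line `0x40` and `accel0` then requests ownership, `read_exclusive` returns `["nic0"]`: the host alone resolves the conflict, and neither device needs any knowledge of the other.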

A symmetric approach has some advantages, but a symmetric cache-coherency protocol is more complex, and that complexity must be handled by every device. Different types of device may also take different approaches to coherency, which makes broad industry adoption harder. Because CXL uses an asymmetric approach controlled by the host, different CPUs and accelerators can join the emerging CXL ecosystem more easily.