SAME: Memory-saving standard to expand
The scope of the Low-Latency Interface (LLI) developed by the MIPI Alliance is expanding as it heads towards version 2. LLI was originally developed to share large blocks of memory between processors that do not necessarily sit on the same chip.
Philippe Martin of network-on-chip specialist Arteris, speaking at the Sophia-Antipolis Microelectronics (SAME) 2012 conference this week, explained the thinking that went into the initial version of LLI, released last year, and how version 2 will add features for interprocessor communication, system-level power management and high-bandwidth data transfer.
The changes in version 2 will make LLI visible to software. In the first releases, the aim of the protocol was to hide it from software. Individual processors do not have to know where their memory sits – the hardware interface maps address spaces so that memory can reside in an external DRAM array connected to a completely different device.
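To make the idea of software-invisible remote memory concrete, here is a minimal Python sketch of an address window: loads and stores that fall inside a local aperture are redirected to an offset in the other device's DRAM, while everything else stays local. The AddressWindow class, the route() function and the window addresses are hypothetical illustrations; LLI defines its own address-mapping mechanism, which this sketch does not reproduce.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AddressWindow:
    """Hypothetical mapping of a local address aperture onto remote DRAM."""

    local_base: int   # start of the aperture in the local processor's map
    size: int         # aperture size in bytes
    remote_base: int  # where the aperture begins in the remote device's map

    def contains(self, addr: int) -> bool:
        return self.local_base <= addr < self.local_base + self.size

    def translate(self, addr: int) -> int:
        # Software issued an ordinary load or store; only the interconnect
        # knows that the target lives on another chip.
        if not self.contains(addr):
            raise ValueError(f"address {addr:#x} is outside the window")
        return self.remote_base + (addr - self.local_base)


def route(addr: int, windows: list[AddressWindow]) -> tuple[str, int]:
    """Return ("remote", translated address) if a window maps addr, else ("local", addr)."""
    for window in windows:
        if window.contains(addr):
            return ("remote", window.translate(addr))
    return ("local", addr)


windows = [AddressWindow(local_base=0x8000_0000, size=0x1000_0000, remote_base=0x0)]
print(route(0x8000_1234, windows))  # ('remote', 4660)
print(route(0x1000_0000, windows))  # ('local', 268435456)
```

The processor sees one flat address map; whether a given access stays on-chip or crosses the link is decided entirely by the window lookup.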
The argument for using LLI is cost. “If you can have a big DRAM shared between various devices – typically the use-case is to share the DRAM between an applications processor and a modem – you can save about $2 for the entire platform. For a volume market like mobile phones, that adds up to millions of dollars,” said Martin.
Although costs fall, designers have to deal with the problem of getting data from an external memory through another chip, possibly crossing several clock domains.
“ARM CPUs typically have to have fast memories,” said Martin. Many of the transactions will be cache-line refills, which puts a greater onus on latency than bandwidth. “Hence the name: low-latency interface. It needs to be on the order of 100ns, similar to what we see from a DRAM. They are mostly short bursts.”
The need to provide a reliable memory-bus interface over a slightly unreliable transport – the SerDes-based MIPI standard Mphy – without imposing high protocol overhead or extra command lines led to a credit-based acknowledgement scheme that could react quickly. In practice, this meant a negative-acknowledgement technique, which in turn pushed the protocol designers towards relatively short 12-symbol packets to avoid long latencies when data needs to be resent.
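The sketch below models that retransmission trade-off under stated assumptions: a sender may only transmit while it holds credits for receiver buffer space, keeps every unreleased frame in a replay buffer, and receives explicit feedback only when a frame arrives corrupted. Replay is assumed to be go-back-N, resending the bad frame and everything queued behind it. The Sender class, transfer() and the credit count are hypothetical; only the 12-symbol frame length comes from the article, and the real LLI retry rules are not modelled.

```python
from __future__ import annotations

from collections import deque

FRAME_SYMBOLS = 12  # short frame length cited for LLI; encoding not modelled


class Sender:
    """Hypothetical credit-limited sender with a go-back-N replay buffer."""

    def __init__(self, credits: int) -> None:
        self.credits = credits  # receiver buffer slots the sender may still fill
        self.next_seq = 0
        self.replay = deque()   # (seq, payload) frames sent but not yet released

    def send(self, payload: bytes) -> tuple[int, bytes]:
        if self.credits == 0:
            raise RuntimeError("no credits: the sender must stall")
        self.credits -= 1
        frame = (self.next_seq, payload)
        self.replay.append(frame)
        self.next_seq += 1
        return frame

    def credit_return(self, upto_seq: int) -> None:
        # The receiver has consumed frames up to upto_seq: release them from
        # the replay buffer and hand their buffer slots back as credits.
        while self.replay and self.replay[0][0] <= upto_seq:
            self.replay.popleft()
            self.credits += 1

    def nack(self, bad_seq: int) -> list[tuple[int, bytes]]:
        # Only errors trigger explicit feedback; resend from the bad frame on.
        return [f for f in self.replay if f[0] >= bad_seq]


def transfer(payloads: list[bytes], corrupt_once: set[int], credits: int = 4):
    """Deliver payloads in order over a link that corrupts the first
    transmission of every sequence number in corrupt_once.
    Returns (delivered payloads, total symbols put on the wire)."""
    tx = Sender(credits)
    pending, wire = deque(payloads), deque()
    delivered, already_hit, frames_sent = [], set(), 0
    while len(delivered) < len(payloads):
        while pending and tx.credits > 0:  # fill the link while credits last
            wire.append(tx.send(pending.popleft()))
            frames_sent += 1
        seq, payload = wire.popleft()
        if seq in corrupt_once and seq not in already_hit:
            already_hit.add(seq)
            wire = deque(tx.nack(seq))  # discard in-flight frames, replay them
            frames_sent += len(wire)
            continue
        delivered.append(payload)  # in this model frames arrive in sequence order
        tx.credit_return(seq)
    return delivered, frames_sent * FRAME_SYMBOLS


payloads = [bytes([i]) for i in range(5)]
print(transfer(payloads, corrupt_once=set())[1])  # 60
print(transfer(payloads, corrupt_once={2})[1])    # 96
```

In this model a single error on a five-frame transfer costs three extra frames (96 symbols against 60 for a clean run). Because every resent frame is charged at full frame length, the same error with longer frames would cost proportionally more link time, which is the latency argument for keeping frames short.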
“Now there are some applications that people are using LLI for where they want to shift a lot of data in the shortest possible time,” Martin explained. “That means larger frames. With LLI2 we are going to mix shorter frames for low-latency traffic with longer frames for more bandwidth, mostly for best-effort traffic.”
By supporting best-effort rather than guaranteed delivery for this traffic, the new protocol avoids a lot of additional overhead. Video and audio data can typically tolerate the occasional bit errors that may occur on the Mphy lanes.
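One way to picture the mix of frame types is a scheduler with two queues. In the minimal sketch below, which is assumed rather than taken from the LLI 2 specification, short frames carrying low-latency traffic go out ahead of long best-effort frames at every frame boundary. A corrupted low-latency frame is queued for replay, while a corrupted best-effort frame is simply passed on with its bit errors. MixedLink, drain() and the 96-symbol long-frame size are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass

SHORT_FRAME = 12  # symbols: the short LLI frame length
LONG_FRAME = 96   # symbols: hypothetical long frame for best-effort traffic


@dataclass
class Frame:
    kind: str     # "low_latency" or "best_effort"
    payload: str
    symbols: int


class MixedLink:
    """Hypothetical two-queue scheduler mixing short and long frames."""

    def __init__(self) -> None:
        self.low_latency = deque()
        self.best_effort = deque()

    def submit(self, kind: str, payload: str) -> None:
        size = SHORT_FRAME if kind == "low_latency" else LONG_FRAME
        queue = self.low_latency if kind == "low_latency" else self.best_effort
        queue.append(Frame(kind, payload, size))

    def next_frame(self) -> Frame | None:
        # Checked at every frame boundary: waiting low-latency frames go first.
        if self.low_latency:
            return self.low_latency.popleft()
        if self.best_effort:
            return self.best_effort.popleft()
        return None

    def on_error(self, frame: Frame) -> bool:
        """Handle a corrupted frame; return True if it was queued for replay."""
        if frame.kind == "low_latency":
            self.low_latency.appendleft(frame)  # guaranteed delivery
            return True
        return False  # best effort: accept the bit errors, no replay


def drain(link: MixedLink, corrupt_once: set[str]) -> list[tuple[str, bool]]:
    """Send everything queued; return (payload, arrived_intact) in arrival order."""
    delivered, already_hit = [], set()
    while (frame := link.next_frame()) is not None:
        damaged = frame.payload in corrupt_once and frame.payload not in already_hit
        if damaged:
            already_hit.add(frame.payload)
            if link.on_error(frame):
                continue  # the replayed copy is picked up on the next pass
        delivered.append((frame.payload, not damaged))
    return delivered


link = MixedLink()
link.submit("best_effort", "video-0")
link.submit("low_latency", "refill-A")
link.submit("best_effort", "video-1")
print(drain(link, corrupt_once={"refill-A", "video-0"}))
# [('refill-A', True), ('video-0', False), ('video-1', True)]
```

The refill overtakes the video frame queued before it and is resent until it arrives intact, while the damaged video frame costs no extra link time.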
In the move to version 2, software will get more control, with the ability to power elements of the LLI link up and down based on the state of the system. And because shared memory is where programmers tend to put the data used to synchronize threads and for other forms of interprocessor communication, the revised protocol will add transactions to support them. Typically, these transactions guarantee atomic behavior so that a core can be certain it was the last to modify a semaphore.
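Atomic transactions of this kind are what make a shared-memory semaphore safe. The Python sketch below uses compare-and-swap as the atomic primitive, with a lock standing in for the guarantee that no other access reaches the semaphore word between the read and the write. Which atomic operations LLI 2 will actually define is not stated in the article, so SharedMemory, compare_and_swap() and the word addresses are hypothetical.

```python
import threading
import time


class SharedMemory:
    """Stand-in for DRAM reached over the link. The lock models a hypothetical
    atomic transaction: nothing else can touch memory between read and write."""

    def __init__(self, words: int) -> None:
        self._words = [0] * words
        self._txn = threading.Lock()

    def load(self, addr: int) -> int:
        with self._txn:
            return self._words[addr]

    def store(self, addr: int, value: int) -> None:
        with self._txn:
            self._words[addr] = value

    def compare_and_swap(self, addr: int, expected: int, new: int) -> bool:
        # Atomic read-modify-write: succeeds only if nobody changed the word.
        with self._txn:
            if self._words[addr] != expected:
                return False
            self._words[addr] = new
            return True


SEM, COUNTER = 0, 1  # hypothetical word addresses; SEM == 0 means "free"


def acquire(mem: SharedMemory, owner_id: int) -> None:
    while not mem.compare_and_swap(SEM, 0, owner_id):
        time.sleep(0)  # yield while another processor holds the semaphore


def release(mem: SharedMemory) -> None:
    mem.store(SEM, 0)


def worker(mem: SharedMemory, owner_id: int, iterations: int) -> None:
    for _ in range(iterations):
        acquire(mem, owner_id)
        mem.store(COUNTER, mem.load(COUNTER) + 1)  # critical section
        release(mem)


mem = SharedMemory(2)
cpus = [threading.Thread(target=worker, args=(mem, cpu_id, 1_000)) for cpu_id in (1, 2)]
for t in cpus:
    t.start()
for t in cpus:
    t.join()
print(mem.load(COUNTER))  # 2000
```

Without an atomic compare-and-swap, two processors could both read the semaphore as free and both enter the critical section; with it, exactly one owner ID write succeeds each time the semaphore is free.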
Given the requirement for low overhead, cache coherency is not on the menu, said Martin. “Those kinds of things are extremely complex. It would be extremely difficult to standardize them and the constraints on latency are so much stronger that we don’t think it makes sense [to implement cache coherency].”