Details of the evolving PCIe 4.0 spec, which brings new features to PCI Express plus the opportunity to shift up to 16GT/s transfer rates
The PCI Express (PCIe) standard has long been used in personal computers, networking and workstations. Its reliability, low power, low latency, and bandwidth options scaling from 2.5GT/s to 16GT/s have also enabled PCIe to spread into storage, cloud-computing, mobile and automotive applications.
The Peripheral Component Interconnect Special Interest Group (PCI-SIG) announced the PCIe 4.0, 16GT/s (Gen4) version of the standard in November 2011, but it was almost two years before work began in earnest. The PCIe 4.0 Draft 0.7 specification has now been released to PCI-SIG members, creating renewed interest from system-on-chip (SoC) designers who want to use the latest version of the standard. The complementary Physical Interface for PCI Express (PIPE) 4.4 specification has also been released by Intel, bringing implementation a step closer.
Developing a new PCIe specification
To understand the importance of the Draft 0.7 release, it is necessary to understand the PCI-SIG specification development process and the history of the PCIe 4.0 releases. There are five main checkpoints in a PCI-SIG specification:
- Draft 0.3 (the concept) may have few details, but outlines the general approach and goals. For PCIe 4.0 this included the 16GT/s signaling rate, re-use of the 128/130 encoding scheme developed for the PCIe 3.0 8GT/s mode, and the maintenance of full backwards compatibility. It was released in February 2014.
- Draft 0.5 (the first draft) has a complete set of architectural requirements and must fully address the goals set out in the 0.3 draft. It was released in February 2015.
- Draft 0.7 (the complete draft) must define a complete set of functional requirements and methods, and no more functionality can be added to the specification after this release. Before the release of this draft, electrical specifications must have been validated using test silicon. For PCIe 4.0, two independent implementations were provided to PCI-SIG workgroup members, one from Synopsys, and the other from Mellanox. The PCIe 4.0 Draft 0.7 was released on 15 November 2016.
- Draft 0.9 (the final draft) enables PCI-SIG member companies to perform an internal review for intellectual property issues. No functional changes are allowed after this draft.
- Version 1.0 (the final release) is the final and definitive specification. Any subsequent changes or enhancements are made through errata documentation and Engineering Change Notices, respectively.
Figure 1 The evolution of the key characteristics of PCIe (Source: Synopsys)
Early adopters of a PCIe specification usually start designing with the Draft 0.5 specification, because they can confidently build up their application logic around the new bandwidth definition and even start developing logic to support new protocol features. At the Draft 0.5 stage, however, there is still a strong likelihood of changes in the actual PCIe protocol-layer implementation, so designers responsible for developing these blocks internally may be more hesitant to begin work than those using interface IP from external sources.
What Draft 0.7 brings to PCIe 4.0
Since new functionality cannot be added after the Draft 0.7 release, early adopters should now be more confident to start work. Designers can develop even the lowest levels of the PCIe protocol stack and be fairly secure in the solidity of the specification. There is still a risk that a misinterpretation or oversight in the specification will force a slight change to the details of implementation, but such cases are uncommon. PCI-SIG members can download Draft 0.7 from the PCI-SIG website.
The evolution from PCIe 8GT/s signaling to 16GT/s is similar to that from PCIe 2.5GT/s to 5GT/s: primarily a new speed, negotiated at link initialization. However, in contrast to earlier data rates, getting to the PCIe 16GT/s data rate requires a two-stage process. First, the link is brought up to 8GT/s using the familiar four-phase equalization process, then the same four-phase process is repeated while running at the 8GT/s rate to shift up to 16GT/s. This requires adding some arcs to the PCIe link state machine, but re-uses methods well-proven in PCIe 8GT/s. The 128/130 encoding scheme from PCIe 8GT/s is used at the PCIe 16GT/s data rate, so designers can re-use most of that logic. Naturally, designers need to make some minor changes to the main protocol state machine, the Link Training and Status State Machine (LTSSM), to accommodate the new equalization. A few other minor symbol and test pattern tweaks are specified to ease operation at the higher speed, but overall a PCIe 4.0 16GT/s link looks very similar to an 8GT/s link.
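As a rough illustration of what the shared 128/130 encoding means for raw throughput, the following sketch computes the per-lane, per-direction bandwidth at each signaling rate. The 8b/10b encoding used for the 2.5GT/s and 5GT/s rates is background knowledge rather than something stated in this article, and the function name is purely illustrative. The figures ignore packet headers and other protocol overhead.

```python
# Line encoding per signaling rate as (payload bits, bits on the wire).
# Assumption: 8b/10b for 2.5 and 5GT/s is background knowledge, not from the article.
ENCODING = {
    2.5: (8, 10),
    5.0: (8, 10),
    8.0: (128, 130),
    16.0: (128, 130),
}


def lane_bandwidth_gbytes(rate_gt_s):
    """Raw payload bandwidth of one lane in one direction, in Gbyte/s."""
    payload_bits, wire_bits = ENCODING[rate_gt_s]
    return rate_gt_s * payload_bits / wire_bits / 8


for rate in sorted(ENCODING):
    print(f"{rate:>4} GT/s: {lane_bandwidth_gbytes(rate):.3f} Gbyte/s per lane")
```

Because the encoding is unchanged between 8GT/s and 16GT/s, the raw per-lane bandwidth simply doubles, from roughly 0.98 to roughly 1.97 Gbyte/s.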
Making the most out of 16GT/s
One concern raised during development of the PCIe 4.0 specification was that certain devices with specific workloads might not be able to fully use the 16GT/s data rate with the existing limits on credits and outstanding transactions. To address this, Draft 0.7 expanded the Tag field in the PCIe 4.0 packet header from 8 to 10 bits. One of the four combinations of the two new bits is reserved to help detect erroneous hierarchy configurations, leaving three usable combinations of 256 tags each, for a total of 768 tags. All devices implementing 16GT/s signaling have to be able to receive 10-bit tags, but may choose whether or not to generate them based on their own needs. Therefore, all designers of PCIe 4.0 16GT/s devices will need to expand their received tag-tracking logic to handle the larger tags, but can continue to rely on header credits to throttle the total number of simultaneous requests they must accept.
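The tag arithmetic can be checked with a short sketch that enumerates every 10-bit value and discards those whose two new upper bits match the reserved combination. Which combination is reserved is an assumption here (taken to be 00), and the constant names are illustrative only.

```python
LEGACY_TAG_BITS = 8   # tag width before PCIe 4.0
EXTRA_TAG_BITS = 2    # new upper bits added by Draft 0.7

# Assumption: the reserved combination of the two new bits is 00.
RESERVED_UPPER_BITS = 0b00

total_bits = LEGACY_TAG_BITS + EXTRA_TAG_BITS

# Keep every 10-bit value whose upper two bits are not the reserved pattern.
usable_tags = [
    tag
    for tag in range(2 ** total_bits)
    if (tag >> LEGACY_TAG_BITS) != RESERVED_UPPER_BITS
]

print(len(usable_tags))  # 768
```

Under that assumption the usable range starts at tag 256, so a requester using 10-bit tags never produces a value that fits in the original 8-bit field.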
To support full use of the additional tags, the PCIe 4.0 specification defines a scaling scheme for the flow-control credit mechanism. Devices requiring more credit than previously available can now advertise a scaling factor of 4x or 16x, whereby each numeric credit in the protocol actually represents 4 or 16 credits, respectively. Here again, all devices implementing PCIe 4.0 16GT/s have to support their link partner scaling by 4x or 16x, but are permitted to use 1x scaling for their own credits if desired. Using the new scaling factors, PCIe 3.1’s maximum of 127 header credits can be extended to 508 (using 4x scaling) or 2032 (using 16x scaling) – independently for each Posted (PH), Non-Posted (NPH) or Completion (CPLH) credit type. Likewise, data credits can grow from PCIe 3.1’s 2047 (~32Kbyte) to 8188 (~128Kbyte) or 32,752 (~512Kbyte) using 4x or 16x scaling, respectively, for each Posted (PD), Non-Posted (NPD) or Completion (CPLD) credit type.
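The credit limits quoted above follow directly from multiplying the PCIe 3.1 maxima by the advertised scaling factor. The sketch below reproduces them; the 16 bytes per data credit is background knowledge used to convert to the approximate byte counts, and the helper name is hypothetical.

```python
MAX_HEADER_CREDITS = 127    # PCIe 3.1 maximum, per credit type
MAX_DATA_CREDITS = 2047     # PCIe 3.1 maximum, per credit type
BYTES_PER_DATA_CREDIT = 16  # assumption: one data credit covers 16 bytes
ALLOWED_SCALING = (1, 4, 16)


def scaled_credits(max_credits, factor):
    """Largest credit count that can be advertised with a given scaling factor."""
    if factor not in ALLOWED_SCALING:
        raise ValueError(f"unsupported scaling factor: {factor}")
    return max_credits * factor


for factor in ALLOWED_SCALING:
    header = scaled_credits(MAX_HEADER_CREDITS, factor)
    data = scaled_credits(MAX_DATA_CREDITS, factor)
    kbytes = data * BYTES_PER_DATA_CREDIT / 1024
    print(f"{factor:>2}x: {header:>4} header credits, "
          f"{data:>5} data credits (~{kbytes:.0f} Kbyte)")
```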
Lane margining at the receiver
Probably the most significant item introduced by the 0.7 draft is ‘Lane Margining at the Receiver.’ This feature uses software that runs on the PCIe system board to evaluate how much margin exists in each lane of the PCIe channel – effectively, how close a given lane is to failing to transfer data reliably. The specification defines a set of registers and a basic command set the host software can use to instruct each receiver in a PCIe channel to move its sampling point in time (and optionally voltage) to determine roughly how wide (and optionally how high) the signal eye is at the receiver. A critical distinction is that this feature is intended for use as a system diagnostic/evaluation tool to provide an approximate measurement of the PCIe channel and not a measurement of the receiver. All devices supporting PCIe 4.0 16GT/s must support Lane Margining but use of Lane Margining is not required to run at 16GT/s. Lastly, implementation of this feature in an SoC requires close cooperation between a PCIe 4.0 16GT/s controller and a 16GT/s PHY.
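Conceptually, the host software's timing sweep can be pictured as stepping the sampling point away from its normal position until errors appear. The sketch below illustrates only that idea: it does not use the registers or command set defined by the specification, and `errors_at`, `simulated_receiver` and the step limits are all hypothetical stand-ins.

```python
def margin_one_direction(errors_at, direction, max_steps, error_limit=0):
    """Count how many timing steps can be taken before errors exceed the limit."""
    last_good = 0
    for step in range(1, max_steps + 1):
        if errors_at(direction * step) > error_limit:
            break
        last_good = step
    return last_good


def eye_width_steps(errors_at, max_steps, error_limit=0):
    """Return the (left, right) timing margin of one lane, in steps."""
    left = margin_one_direction(errors_at, -1, max_steps, error_limit)
    right = margin_one_direction(errors_at, +1, max_steps, error_limit)
    return left, right


def simulated_receiver(offset):
    """Hypothetical lane: error-free between -5 and +7 steps, failing outside."""
    return 0 if -5 <= offset <= 7 else 10


print(eye_width_steps(simulated_receiver, max_steps=20))  # (5, 7)
```

The same loop could be repeated with a voltage offset to estimate eye height where the receiver supports it.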
The PIPE 4.4 specification
Fortunately for designers procuring their PCIe 4.0 16GT/s PHYs and controllers from different sources, Intel has incorporated PCIe 4.0 16GT/s operation into version 4.4 of the PIPE specification. The new PCIe 4.0 16GT/s rate is supported using 32-bit, 16-bit, or 8-bit per-lane datapath options, just as the earlier PCIe 2.5GT/s through 8GT/s rates were. This means that designers will be dealing with clock rates ranging from 500MHz using 32 bits per lane up to 2GHz using 8 bits per lane.
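The relationship between datapath width and interface clock at 16GT/s is a simple division, sketched below. The function name is illustrative, and the calculation is only intended to reproduce the 16GT/s figures quoted in this article, not to model the other rates.

```python
LINE_RATE_GT_S = 16  # PCIe 4.0 signaling rate per lane


def pipe_clock_mhz(width_bits, rate_gt_s=LINE_RATE_GT_S):
    """Interface clock needed to move rate_gt_s of data over a width_bits datapath."""
    return rate_gt_s * 1000 / width_bits


for width in (32, 16, 8):
    print(f"{width:>2}-bit datapath: {pipe_clock_mhz(width):.0f} MHz")
```

The narrower the per-lane datapath, the faster the interface between PHY and controller must run, which is why the 8-bit option reaches 2GHz.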
The basic PHY controller interface signals familiar to users of previous PIPE specifications remain largely unchanged in PIPE 4.4, with expected changes to indicate PCIe 4.0 16GT/s and details related to the minor physical layer changes mentioned earlier. Extending this signaling to the Lane Margining mechanism, however, would have required a large number of new signals in each direction to exchange the needed control and status information between a PCIe 4.0 16GT/s PHY and controller. Using a mechanism originally proposed by Synopsys engineers, the PIPE specification now uses a generic register-type interface to provide control and communication between PHY and controller. Initially defined only for the PCIe 4.0 16GT/s Lane Margining feature, this interface could greatly simplify numerous future PHY features – both existing ones such as L1 sub-states control, and potential future controls for higher data rates, more complex equalization schemes, etc.
The PCI-SIG specification development process freezes functionality at Draft 0.7, so designers can now start work on high-performance SoCs using the PCIe 4.0 16GT/s interface. PCIe 4.0 Draft 0.7 delivers scaled credits (1x, 4x, or 16x) and widened tags (from 8-bit to 10-bit) to improve link bandwidth utilization, and lane margining at the receiver so that system designers can assess how much performance variation their system can tolerate.
Synopsys’ DesignWare IP Solution for PCI Express 4.0 supports the latest Draft 0.7 specification and is available now. The complete PCIe IP solution consisting of PHYs and controllers is silicon-proven and supports a wide range of foundry process nodes.
Richard Solomon is a senior technical marketing manager at Synopsys.