The future of high-performance computing: direct low-latency peripheral-to-CPU connections

Posted: March 1, 2006

For peripheral card and system manufacturers, delivering low-latency, high-performance computing solutions at affordable prices has long been a formidable barrier. Although processor speeds and bandwidth have taken quantum leaps over the last decade, the last few inches between the adapter slot and the system CPU represent a bottleneck that restricts the development of cost-effective high-performance computing solutions. For complex modeling, high-end transactional systems, commercial data centers and large system clusters, high latency, or ‘wait time’, is a stumbling block that holds back the development of systems with supercomputer-level performance built from off-the-shelf components.

Now, a new expansion interconnect brings ultra-low latency and high performance to the input/output (I/O) slot, enabling direct communication with high-performance processors. HTX, an expansion connector specification that leverages industry-standard HyperTransport technology, overcomes the latency barrier common to standard systems. This white paper addresses some of the challenges related to developing low-cost, high-performance computing solutions and introduces the advantages of HTX.

Market Drivers

Every microsecond shaved off latency can translate into hours, days or weeks of processing time saved in complex, transaction-intensive applications. Over the last few decades, processor speed and network bandwidth have improved dramatically, while latency has scarcely advanced. As computer science professor David Patterson of the University of California at Berkeley has said, “In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4.”[1] Had latency kept up with bandwidth, we would have 0.1-nanosecond processor latency today, Patterson added.
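Patterson's observation can be made concrete with a small projection. The sketch below is illustrative only: the starting bandwidth, starting latency and number of doublings are hypothetical values chosen to show how the gap widens, not figures from the text.

```python
# Illustrative sketch of Patterson's rule: each time bandwidth doubles,
# latency improves by only ~1.2-1.4x. Starting values are hypothetical.
def project(bandwidth, latency, doublings, latency_gain=1.3):
    """Project bandwidth and latency after a number of bandwidth doublings."""
    for _ in range(doublings):
        bandwidth *= 2.0          # bandwidth doubles each generation
        latency /= latency_gain   # latency improves by only ~1.3x
    return bandwidth, latency

# Start: a 1 GB/s link with 1000 ns latency; five bandwidth doublings.
bw, lat = project(1.0, 1000.0, 5)
print(f"bandwidth: {bw:.0f} GB/s, latency: {lat:.0f} ns")
# Bandwidth grows 32x while latency improves by only ~3.7x (1.3^5).
```

After five generations, bandwidth has grown 32-fold while latency has improved by less than a factor of four, which is why latency, not bandwidth, becomes the binding constraint.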


Figure 1. An HTX card inserted into an HTX slot communicates directly with on-board CPUs via HyperTransport interconnects. The HTX slot is mechanically identical to a PCI Express connector; however, the HyperTransport signal and pin allocations are completely different. In addition, the connector is mounted in the reverse orientation to prevent accidental insertion of the wrong type of board.

PCI and PCI-X interconnects have been pushed to their performance limits by the addition of advanced I/O functions to the south bridge, such as high-speed drives and peripherals. This has introduced serious I/O bottlenecks between the system processor, the memory controller and the system I/O interconnects.

As systems continue to advance, the need for faster I/O is therefore becoming more critical. For example, the 16-core system processors of tomorrow will be best leveraged by low-latency connections between the peripheral slot and the CPU. As the network infrastructure rockets forward, system latency will become less and less tolerable. High-performance transaction systems, like those processing online airline reservations, are becoming more dependent on low-latency solutions in their quest to deliver rapid feedback to customers. Consumers and businesses alike are coming to expect instant transactional results, and in these applications high latency directly undermines that expectation.

Given the demand for this kind of instant responsiveness, the limitations of existing buses, and the dramatic improvements already made in bandwidth and processor speed, latency is the final holdout preventing low-cost systems from advancing to supercomputer speeds.

System and card manufacturer challenges

Manufacturers of high-performance systems and peripheral devices are held back by the cost of delivering low latency, the restraints of existing peripheral-to-CPU connections and the inherent limitations of standard buses.

The cost of low latency

The cost of developing high-performance systems can appear prohibitive. The expense of committing extra silicon and engineering resources is a deterrent to next-generation system development, particularly when such systems are likely to sell only in relatively small volumes. Simulation, modeling and high-end transaction processing are therefore widely seen as the domain of costly, proprietary mainframe or supercomputer systems.


Figure 2. A possible dual-CPU system with an HTX connector on the system motherboard.

Yet as markets are increasingly commoditized, manufacturers want the option of leveraging off-the-shelf components to rapidly deliver cost-effective solutions to market. The challenge, then, is to reduce the processing latency of these inexpensive server-based systems so they can successfully and profitably target high-performance applications.

Leveraging multiple, lower cost, yet powerful systems in interconnected server clusters is a solution that manufacturers are selling to address data center and scientific research requirements. However, system latency becomes an even bigger issue in clustered platforms.

Symmetric multiprocessing systems used in today’s ultra-high-performance applications leverage expensive, proprietary processor interconnect fabrics that deliver super-low latency. To compete with them, today’s server clusters must be able to deliver the low latency demanded by high-performance computing applications. HTX connectivity is positioned as the only standardized interconnect technology that enables the bridging of multiple systems while delivering the ultra-low latency required by transaction-intensive applications, without the high costs and development time of proprietary interconnect fabrics.

Peripheral-to-CPU limitations

Crossing data over conventional peripheral interconnects, such as PCI, PCI-X or PCI Express, to system CPUs involves forcing data through the many paths and stops that the system’s chipset logic imposes. It is akin to a city bus making multiple stops while carrying eager passengers to their destinations. Latency becomes a significant issue when data travels from chip to chip and passes through layers of intermediate IC controller functions.

With the increasing speed of processors and networks, this time lag has become a serious limitation for high-performance computing systems. In addition, each chip function that bridges data between the card slot and the CPU impacts the system’s cost and its reliability.

Overall system reliability falls as the number of system components rises. The effect is even greater when complex, active components such as chipsets and bridge controllers are used, because they carry a statistically higher chance of failure.
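The relationship between component count and reliability follows from the standard series-reliability model: if every component must work for the system to work, overall reliability is the product of the individual component reliabilities. The per-component figures below are hypothetical, chosen only to illustrate the effect.

```python
from math import prod

def series_reliability(component_reliabilities):
    """Series system: every component must work, so R_sys = product of R_i."""
    return prod(component_reliabilities)

# Hypothetical per-component reliabilities over some mission time.
direct_path = [0.999]                  # card talks straight to the CPU
bridged_path = [0.999, 0.995, 0.995]   # card + north bridge + south bridge

print(f"direct:  {series_reliability(direct_path):.4f}")
print(f"bridged: {series_reliability(bridged_path):.4f}")
# Every added active component multiplies in another factor < 1,
# so overall reliability can only decrease as the chip count grows.
```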

When applications demand low latency, every 100 nanoseconds saved is a very significant improvement. For a 2GHz processor, a 100ns latency reduction frees roughly 200 clock cycles on every transaction. Optimizing the path from the north bridge to the south bridge and back is one way to reduce in-system latency. However, the ideal solution is a super-highway that drives data traffic directly from a peripheral subsystem to the system processor, dramatically reducing latency. HTX aims to provide such a solution.
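The arithmetic behind this claim is simple: cycles saved equal the clock rate multiplied by the latency removed. A small sketch of that calculation, using the 2GHz and 100ns figures from the text:

```python
def cycles_saved(clock_hz, latency_saved_s):
    """Clock cycles freed each time the given latency is removed from the path."""
    return clock_hz * latency_saved_s

# A 2 GHz processor and a 100 ns latency reduction per transaction.
cycles = cycles_saved(2e9, 100e-9)
print(f"{cycles:.0f} cycles saved per transaction")  # 200 cycles
```

On a superscalar core executing more than one instruction per cycle, those 200 cycles translate into several hundred instructions of useful work reclaimed on every transaction.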

Standard bus limitations

Many of today’s standard interconnect technologies, such as PCI, PCI-X and PCI Express, require a bridge chip to connect peripheral devices to the system CPU. As stated earlier, adding ICs increases system development costs, increases latency, lowers performance and lowers reliability.


Figure 3. The HyperTransport packet format is designed to be lean compared to the much larger PCI Express packet format
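The leanness contrast in Figure 3 can be quantified roughly as payload efficiency: the fraction of bytes on the wire that carry actual data. The overhead figures below are assumptions for illustration, not spec-exact values; exact overheads depend on header formats and optional fields in each protocol.

```python
def payload_efficiency(payload_bytes, overhead_bytes):
    """Fraction of bytes on the wire that carry payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)

PAYLOAD = 64  # bytes of data per packet in this comparison

# Approximate per-packet overheads (assumptions, not spec-exact figures):
# HyperTransport: an 8-byte command packet accompanies the data.
# PCI Express:    ~20 bytes of framing, sequence number, header and LCRC.
ht_eff = payload_efficiency(PAYLOAD, 8)
pcie_eff = payload_efficiency(PAYLOAD, 20)

print(f"HyperTransport: {ht_eff:.1%}")   # ~88.9%
print(f"PCI Express:    {pcie_eff:.1%}") # ~76.2%
```

Under these assumptions the leaner packet format both raises effective throughput and, because less framing must be processed per packet, trims per-packet latency.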

Although PCI Express sports a similarly modern, packet-based interconnect architecture, it was not intended to be a front-side bus directly connected to the CPU. Instead, it was designed to supersede the older-generation PCI and PCI-X buses as a new slot-type interconnect. Its legacy support and its handling of many possible configuration settings and system verifications burden PCI Express with significant latency drawbacks that discourage its use in very latency-sensitive applications.

HTX emerges as the most suitable interconnect to enable the highest-speed, lowest-latency communication between peripherals and the system CPU.

The HTX solution

Peripheral-to-CPU interconnect

Designed to enable the development of flexible and powerful high-performance computing systems while leveraging low-cost industry standards, HTX is an expansion interface specification that employs the popular HyperTransport protocol (see sidebar on HyperTransport technology).

HTX allows peripheral cards to plug directly into HyperTransport-enabled servers, bypassing the traditional challenges of developing low-latency systems and opening the door to cost-effective clustering and transactional systems and other low-latency applications.

In this way, system and peripheral manufacturers can extend the power of HyperTransport to peripheral subsystems, bridging the last few inches that have prevented supercomputing-level performance from being achieved with commodity components.

One objective is to obviate the need for market-specific, costly motherboards. Instead, system manufacturers can design and market a single, powerful system or motherboard platform that can be easily tailored for a specific market by simply adding an HTX peripheral card.

Where HyperTransport is an in-system chip-to-chip interconnect, HTX enables plug-in subsystems to achieve the same direct-connect performance benefits. Leveraging the same mechanical connector as PCI Express combined with HTX-specific signal allocation, HTX system manufacturers can benefit from the low cost of industry-standard connectors when adding HTX to their systems. Compared to PCI Express, HTX delivers state-of-the-art latency by eliminating clock recovery circuit logic, adding HyperTransport’s Priority Request Interleaving and employing a lean packet payload protocol.


Figure 4. Main HyperTransport specification features

HTX business benefits for manufacturers

HTX provides system and peripheral card manufacturers with numerous business benefits:

State-of-the-art performance potential

The ultra-low latency available via HTX enables the development of a new class of very high-performance systems and peripheral subsystems that can drive transactional systems, complex modeling applications, security processing, storage control and other types of compute-intensive applications to new performance heights without the need for costly investments.

Extreme cost effectiveness

System manufacturers can tap the full performance of HTX for the cost of the HTX connector. The HTX connectivity standard leverages the economy of scale of widely available, multi-sourced PCI Express-type mechanical connectors.

Ease of integration

HTX integration does not require the use of any control logic or bridge controllers, extending the range of performance possibilities for conventional systems and motherboards, and making HTX integration easy and cost effective.

Market and platform universality

HTX’s low cost enables system manufacturers to implement HTX connectivity on a greater number of motherboards. Alternatively, it allows manufacturers to design fewer, HTX-equipped universal motherboards that can target high-performance markets while reducing bill of materials and stocking requirements.

Increased market share

Due to HTX’s cost-effectiveness and the inherent universality of any motherboard implementing it, system manufacturers can seek to capture high-performance market segments.

HyperTransport Technology

HyperTransport interconnect technology is an industry-standard, high-performance, high bandwidth, point-to-point link that provides the lowest possible latency for chip-to-chip and board-to-board communications. It enables a flexible, scalable interconnect architecture designed to optimize the number of buses within a system. The technology serves a wide array of applications, including embedded systems, personal computers, workstations, servers, network equipment and supercomputers.

Major system manufacturers, including Acer, Apple, Cisco Systems, Cray, Fujitsu-Siemens, Hewlett-Packard, IBM and Sun Microsystems, are fully invested in the technology.

HyperTransport’s aggregate bandwidth of 22.4 Gbyte/s represents more than a 70-fold increase in data throughput over legacy PCI buses. While providing far greater bandwidth, HyperTransport technology complements legacy I/O standards like PCI as well as the latest interconnects such as PCI Express.
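The "more than 70-fold" figure can be sanity-checked with one division. The 22.4 Gbyte/s aggregate comes from the text; the PCI baseline of 266 Mbyte/s (64-bit, 33 MHz PCI) is an assumption, since the text does not say which PCI variant it compares against.

```python
ht_aggregate_gbs = 22.4  # HyperTransport aggregate bandwidth, from the text
pci_gbs = 0.266          # assumed baseline: 64-bit/33 MHz PCI, 266 Mbyte/s

speedup = ht_aggregate_gbs / pci_gbs
print(f"~{speedup:.0f}x over legacy PCI")
# ~84x under this assumed baseline, consistent with "more than 70-fold".
```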

HyperTransport employs a packet-based protocol and a clock-forwarding technique that eliminates the need for many control and command signals. It supports asymmetric, variable-width data paths and operates over low-voltage differential signaling (LVDS) point-to-point links, delivering increased data throughput, minimized signal crosstalk and lowered electromagnetic interference. In short, HyperTransport seeks to combine the best features of parallel and serial interconnects.

Accelerated technology adoption

By leveraging the HTX connectivity standard, equipment and system manufacturers can rapidly develop products without the need for custom development. HTX allows manufacturers to quickly and efficiently deliver their innovations to market.

No need for dedicated system designs

System developers can simply focus on the HTX slot and leverage existing HyperTransport-enabled motherboards. HTX eliminates the need for designing ad-hoc motherboards from scratch. Gone are costly product designs, complex engineering, tooling development and prototype debugging.

Multi-core, multi-processor support

HTX fully supports multi-core, multiprocessor architectures.

High-performance, low-cost clusters

By dramatically reducing the system latency of off-the-shelf systems, HTX enables an entirely new class of commoditized high-performance computing clusters that rival the parallel processing performance and scalability of leading supercomputer platforms.

Complements general-purpose interconnects

HTX is designed to complement general-purpose interconnects, like PCI, PCI-X and PCI Express, by meeting the needs of low latency, compute-intensive applications that could not be otherwise serviced.

Technical Advantages

We argue that HTX provides a number of competitive advantages over alternatives such as PCI Express:

Lowest possible latency

The direct interconnect between the HTX card slot and the system processor, combined with the inherent low-latency features of HyperTransport technology, delivers the lowest latency, and hence the highest performance, of all standard interconnects.

Best of parallel and serial interconnects

HTX and HyperTransport technology employ a packet-based protocol combined with a clock-forwarding technique that eliminates the need for many of the control and command signals of older-generation buses like PCI and PCI-X. HTX and HyperTransport also have no need for clock recovery logic, a major latency burden common to interconnects like PCI Express. With priority request interleaving and a very lean packet protocol, HTX seeks to combine the best features of parallel and serial interconnects.

Supports variable width links

For peripheral devices that receive more data than they transmit, such as graphics subsystems, HTX allows motherboard manufacturers to establish asymmetric traffic paths, such as 16 lanes receiving and 2 lanes sending data, to optimize motherboard real estate and cost.
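The throughput of such an asymmetric configuration follows directly from link width and clock rate. A minimal sketch, assuming the "lanes" above correspond to link bit widths and assuming an illustrative 800 MHz double-data-rate clock (neither figure is given in the text):

```python
def link_bandwidth_gbs(width_bits, clock_mhz, ddr=True):
    """Raw one-direction bandwidth of a HyperTransport-style link in GB/s."""
    # DDR clocking transfers data on both clock edges.
    transfers_per_s = clock_mhz * 1e6 * (2 if ddr else 1)
    return width_bits * transfers_per_s / 8 / 1e9

# Hypothetical asymmetric link: 16 bits inbound, 2 bits outbound, 800 MHz DDR.
rx = link_bandwidth_gbs(16, 800)
tx = link_bandwidth_gbs(2, 800)
print(f"receive: {rx:.1f} GB/s, send: {tx:.1f} GB/s")
```

Under these assumptions the wide inbound path carries eight times the outbound bandwidth, matching the traffic pattern of a receive-heavy device while spending board area and pins only where they are needed.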

HTX Application Examples

HTX is best utilized for low-latency, high-performance applications. In situations where latency improvements multiply into large processing-performance gains, HTX is ideal. Clustering, transactional and modeling systems are excellent candidates for the benefits of HTX. Specific examples include climate modeling, computational chemistry, molecular modeling, weapons simulations, security processing, storage management, data encryption, XML processing and real-time data analysis.

Other possible applications include:

  • 10-gigabit networking
  • High-performance co-processing
  • System cache optimization
  • High-speed storage subsystems
  • Grid systems
  • Medical imaging
  • Rendering

HTX Summary

HTX is an industry-standard interconnect that allows direct peripheral-card to system CPU communications, speeding performance and dramatically reducing latency. For system and peripheral card manufacturers, it offers the ability to deliver leading-edge performance at affordable prices using industry standard components.

References

  1. Patterson, D., ‘Why Latency Lags Bandwidth and What It Means to Computing’, presentation, October 2004.
