Altera is using a combination of Intel’s 14nm process technology and multidie packaging to boost the logic-cell count for its field-programmable gate arrays (FPGAs), together with a superpipelining strategy to help trade area for clock speed.
The FPGA maker has sidestepped the problem of putting advanced mixed-signal design on finFETs by dedicating the process to core logic and then using Intel’s 2.5DIC embedded multidie interconnect bridge (EMIB) packaging technology to let the logic array communicate with the outside world through a selection of copackaged serial interfaces and memory buses. EMIB uses a combination of silicon bridge chips and organic substrates to provide high-density interconnect between the active dice in the package.
Using the multichip-module substrate of EMIB, Altera is able to use the full reticle of Intel’s 14nm process for core logic and place the I/O slices outside the 600mm² area of that core die, rather than trying to squeeze everything onto a reticle-sized silicon interposer, which is the approach that competitor Xilinx currently favors. According to Balough, that will yield an FPGA with five times more programmable logic on a monolithic die than is available today.
“14nm is just one of the underpinnings of this whole thing,” he claimed.
Xilinx today puts several core logic dice onto one interposer, which helps keep yield high for the total package. Balough argued Altera’s approach to redundancy in the core logic array lets the company make much larger monolithic devices without suffering the dramatic yield loss normally associated with such large die sizes.
Balough said transceivers would be made mainly on a 20nm process and the use of multidie packaging offers greater flexibility. “The issue has grown particularly acute around transceivers. Customers ask us for a diversity of speeds and we see a diversity of interface protocols as well as a diversity and uncertainty around modulation schemes because of the different standards customers are looking to use. You can imagine the cost of taping out 14nm products with all those options. Being able to modularise the transceivers provides us with the flexibility.”
Balough said options would include additional memory, copackaged in the same way as the transceivers, and ultimately optical communications modules.
“We did a proof of concept of optical in the past. But optical is clearly out there somewhere,” Balough said.
Craig Davis, field applications engineering manager at Altera, added that the market for intrasystem optical communications is not there yet. “But when speed goes up we expect to see real demand for optical interconnect between chips on a board. It’s more a question of time rather than anything else.”
Although finFET processes potentially provide higher clock speeds, a problem remains with interconnect delay, Balough said. “What we did was go after the routing delay. The way to deal with that was bring a lot more registers to the party.”
Altera took advantage of the routing congestion that is inevitable in its devices’ programmable interconnect to place registers alongside the SRAM-based routing switches. The company calls the approach Hyperflex. The hyper-registers are no different from regular flops, however, and, unlike the full logic cells (ALMs), they have no additional logic next to them.
“There are ten times more hyper-registers than ALM registers. But they only have a 1 per cent effect on total static power and less than a 1 per cent effect on the area of the device,” Balough claimed.
“We can now say to a designer: ‘imagine you have unlimited pipelining for free’,” said Balough.
Although the architecture keeps the ability to drive long lines, the idea behind Altera’s pipelining is to break the connections between long-distance logic into pipelined stages. The latency of data transmission will remain approximately the same but the technique could allow more closely spaced logic to operate at gigahertz speeds without being forced to clock more slowly because of intermodule routes.
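As a rough illustration of that trade-off, the Python sketch below models a long route as a fixed wire delay split evenly across a number of register stages. The delay figures and function names are hypothetical, chosen only to make the arithmetic visible, and register setup and clock-to-output overheads are ignored.

```python
def clock_period_ns(local_logic_ns, route_ns, route_stages):
    """Clock period is limited by the slowest register-to-register hop."""
    per_hop_route_ns = route_ns / route_stages
    return max(local_logic_ns, per_hop_route_ns)


def route_latency_ns(local_logic_ns, route_ns, route_stages):
    """Time for data to cross the route: one clock cycle per stage."""
    return route_stages * clock_period_ns(local_logic_ns, route_ns, route_stages)


# Hypothetical figures: 1 ns of local logic, 4 ns of long-distance routing.
LOCAL_NS, ROUTE_NS = 1.0, 4.0

for stages in (1, 2, 4):
    period = clock_period_ns(LOCAL_NS, ROUTE_NS, stages)
    latency = route_latency_ns(LOCAL_NS, ROUTE_NS, stages)
    print(f"{stages} stage(s): {1000.0 / period:.0f} MHz, "
          f"route latency {latency:.1f} ns")
```

Under these assumptions the clock rises from 250MHz to 1GHz as the route is split into four stages, while the time taken to cross the route stays at 4ns.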
“The more retiming you put into it, the more you can get out of it,” Balough said. “Generally, you will get the most out of this for datapath. But that’s the raison d’être for many FPGAs: datapath with control.”
Balough argued the greater ability to insert pipelining stages could be used to save both power and area compared to traditional FPGA-based designs, particularly those that use parallelized datapaths.
“If you double the clock, you can potentially halve your bus width and halve the FPGA area you need. So you no longer pay the static power [of logic cells that are no longer required]. You could see power savings up to 70%. Hyperflex is not just for people who need extra throughput.”
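The bus-width half of that argument is simple arithmetic, sketched below in Python. The widths and clock rates are hypothetical examples, and the sketch covers throughput only: it does not model area or power, so the 70 per cent figure is Balough’s claim rather than something derived here.

```python
def throughput_gbps(bus_width_bits, clock_mhz):
    """Raw datapath throughput: bits moved per cycle times cycles per second."""
    return bus_width_bits * clock_mhz / 1000.0


# Hypothetical datapath: a 512-bit bus at 250 MHz versus
# a 256-bit bus at twice the clock rate.
wide = throughput_gbps(512, 250)
narrow = throughput_gbps(256, 500)

print(f"512 bits at 250 MHz: {wide:.0f} Gbit/s")
print(f"256 bits at 500 MHz: {narrow:.0f} Gbit/s")
```

Both configurations move 128Gbit/s, which is why a design that can close timing at the higher clock can, in principle, use a datapath half as wide.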
The Quartus II release designed for the finFET-based Stratix 10 will automatically select registers based on their availability. Clock-domain crossing synchronization will typically be performed using the registers that remain within the ALMs. Davis said clock gating would be clustered to avoid the interconnect overhead of gating hyper-registers individually.
“We’ve been working on this architecture for five years so we could be confident that we have the tools for it,” Balough said. “We have now run a hundred designs through our simulation tools. It’s a non-trivial decision to make such fundamental changes to your FPGA architecture. Getting it wrong can be a fundamentally bad thing for your company.”
Further changes lie in the security architecture of the Stratix family. “We’ve continued to add security capabilities to this. Security used to be more of a nichey conversation that you had with the defence industry. Now it’s a conversation you have with many markets,” Balough said. “Security features are now in the top-five requests we get from customers.”
To support device authentication, the devices include a physically unclonable function (PUF) using technology licensed from specialist IntrinsicID. The key-management and encryption architecture has also been beefed up in such a way that it allows different sectors of the FPGA to employ bitstreams with different keys.
“Allowing multiple keys to be used opens up some new applications. You can manage blocks in a hierarchical fashion, with each one using its own encryption and authentication scheme,” Balough said.
This could allow accelerators to be loaded onto the FPGAs deployed in cloud servers that act on behalf of different clients. “I don’t want anyone else to decrypt the IP. The data-center provider can assure customers that only they have the key to that.”
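The idea of giving each sector its own key can be sketched generically in Python. This is not Altera’s scheme, whose details are not described here: the sector names, key sizes and use of HMAC-SHA256 are all assumptions for illustration, and the sketch covers authentication only, not bitstream encryption.

```python
import hashlib
import hmac
import os


def sign_sector(key: bytes, bitstream: bytes) -> bytes:
    """Compute an authentication tag for one sector's bitstream."""
    return hmac.new(key, bitstream, hashlib.sha256).digest()


def verify_sector(key: bytes, bitstream: bytes, tag: bytes) -> bool:
    """Accept the bitstream only if the tag matches this sector's key."""
    return hmac.compare_digest(sign_sector(key, bitstream), tag)


# Hypothetical setup: one independent random key per client sector.
sector_keys = {"client_a": os.urandom(32), "client_b": os.urandom(32)}

bitstream_a = b"placeholder accelerator image for client A"
tag_a = sign_sector(sector_keys["client_a"], bitstream_a)

# The image verifies against its own sector key but not against another's.
print(verify_sector(sector_keys["client_a"], bitstream_a, tag_a))
print(verify_sector(sector_keys["client_b"], bitstream_a, tag_a))
```

Because each sector checks against its own key, a holder of one client’s key cannot authenticate an image for another client’s sector.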