Improving ASIC prototyping on multiple FPGAs through better partitioning

By Zied Marrakchi, Ramsis Farhat and Ramine Roane | Posted: July 5, 2012
Topics/Categories: EDA - DFM

Using a new design-partitioning tool and stacked-silicon interconnect FPGA to develop an ASIC prototyping platform that can be reprogrammed several times a day.

ASIC prototyping with FPGAs enables fast system modelling and verification, and speeds up software and firmware development. Complex ASIC SoC designs have to be partitioned onto multiple FPGAs, which reduces system performance because signals have to exit one FPGA, cross an interconnecting layer and enter another. Given the limited number of I/O pads on an FPGA, it may also be necessary to multiplex these chip-to-chip signals, which reduces system performance further.

Flexras [1] is working with Xilinx's Virtex-7 2000T FPGA [2], which uses a stacked silicon interconnect layer [3] to link multiple FPGA die in one package. The resulting device offers 2 million logic cells (LC), or look-up table (LUT) equivalents, is built from 6.8 billion transistors and includes multiple 12.5Gbit/s serial transceivers. This makes it well suited to ASIC prototyping and emulation.

Figure 1
Silicon Innovation: The first 3D FPGA Virtex-7 2000T (Source: Xilinx)

Xilinx has added a new level of hierarchy to the Virtex-7 FPGA architecture: the device is divided into what Xilinx calls Super Logic Regions (SLRs), each built on a separate die (see figure 1). Taking this extra level of hierarchy into account while the design is being partitioned can give better visibility of interconnection issues and lets designers anticipate constraints early in the implementation process, improving the final result.

The aim of our research is to partition an ASIC design directly across FPGA die so as to minimize both inter-FPGA and inter-die connections. We also want to build a board with two Virtex-7 2000T FPGAs for ASIC prototyping.

The experiments

We used two example ASIC designs, containing the equivalent of approximately 2.4 million LUTs, as the basis of our experiments. The results detailed in tables 1 and 2 are averages over the two designs.

We compared two ways of implementing these designs.

The first used the DN2076K10 DINI board [4], which contains six Virtex-6 LX760 FPGAs, with a total logic capacity of around 2.8 million LUTs.

The second used a board designed with two Virtex-7 2000T FPGAs, with a total logic capacity of around 4 million LUTs and a maximum logic utilization of 70%. The FPGAs are interconnected by 400 LVDS pairs.
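The 70% ceiling matters when comparing the boards, because it sets how much of the Virtex-7 board's raw capacity is actually available. The short calculation below uses only the figures quoted above (the approximately 2.4 million-LUT test designs, a 4 million-LUT board, a 70% utilization ceiling); the helper names are hypothetical and purely illustrative.

```python
# Capacity check for the two-FPGA Virtex-7 board, using only the figures
# quoted in the article. The helper names are illustrative, not from any tool.

DESIGN_LUTS = 2.4e6      # approximate size of the test designs
V7_BOARD_LUTS = 4.0e6    # two Virtex-7 2000T devices
MAX_UTILIZATION = 0.70   # utilization ceiling used on the Virtex-7 board


def usable_luts(total_luts: float, max_utilization: float) -> float:
    """Capacity that can be filled while staying under the utilization ceiling."""
    return total_luts * max_utilization


def fill_ratio(design_luts: float, total_luts: float, max_utilization: float) -> float:
    """Fraction of the usable capacity that the design consumes."""
    return design_luts / usable_luts(total_luts, max_utilization)


usable = usable_luts(V7_BOARD_LUTS, MAX_UTILIZATION)
ratio = fill_ratio(DESIGN_LUTS, V7_BOARD_LUTS, MAX_UTILIZATION)
print(f"usable capacity: {usable / 1e6:.1f}M LUTs")  # usable capacity: 2.8M LUTs
print(f"design fills {ratio:.0%} of it")             # design fills 86% of it
```

Under the 70% ceiling, the Virtex-7 board's usable capacity works out to the same 2.8 million LUTs as the raw capacity of the six-FPGA Virtex-6 board, but it is concentrated in two parts instead of six.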

Our comparison metrics are:

• The multiplexing ratio between FPGAs, which is the maximum number of signals sharing the same I/O pin and physical board track

• The inter-FPGA hops on the critical path, which is the number of times the critical path exits an FPGA (see figure 2)

• The system clock frequency of the device under test

Figure 2
Example of a combinatorial path with 2 hops (Source: Flexras)
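The first two metrics can be computed directly from a partitioned netlist. The sketch below is a simplified model of our own, not the Wasga implementation: each cut net counts as one signal on every FPGA-to-FPGA link it crosses, the multiplexing ratio of a link is its signal count divided by the pins available on it (rounded up), and the hop count of a path is the number of times consecutive cells on it sit in different FPGAs. It assumes direct links between FPGAs; the function names and toy netlist are hypothetical.

```python
import math
from collections import Counter


def _link(a: int, b: int) -> tuple:
    """Undirected FPGA-to-FPGA link key."""
    return (min(a, b), max(a, b))


def mux_ratio(nets: dict, part: dict, pins_per_link: dict) -> int:
    """Worst-case number of signals sharing one pin/track on any link.

    nets: net name -> list of cells, driver first.
    part: cell -> FPGA index.
    pins_per_link: (fpga_a, fpga_b) -> physical pins available on that link.
    Assumes direct links between FPGAs (no routing through a third FPGA).
    """
    signals = Counter()
    for cells in nets.values():
        driver_fpga = part[cells[0]]
        for sink_fpga in {part[c] for c in cells[1:]}:
            if sink_fpga != driver_fpga:
                signals[_link(driver_fpga, sink_fpga)] += 1
    return max(
        (math.ceil(n / pins_per_link[lk]) for lk, n in signals.items()),
        default=0,  # nothing crosses an FPGA boundary
    )


def hops(path: list, part: dict) -> int:
    """Number of times an ordered combinational path leaves one FPGA for another."""
    return sum(1 for a, b in zip(path, path[1:]) if part[a] != part[b])


# Hypothetical toy netlist: four cells split across two FPGAs.
part = {"a": 0, "b": 1, "c": 1, "d": 0}
nets = {"n1": ["a", "b"], "n2": ["c", "d"], "n3": ["a", "d"], "n4": ["a", "c"]}
pins = {(0, 1): 2}

print(mux_ratio(nets, part, pins))       # 3 signals over 2 pins -> 2
print(hops(["a", "b", "c", "d"], part))  # a->b and c->d cross -> 2
```

The path a, b, c, d behaves like the example in figure 2: it leaves its starting FPGA twice, so it has two hops.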

For each board, we used the Flexras Wasga timing-driven tool to partition the design between the available FPGAs. As shown in figure 3, the Wasga compiler automatically partitions large designs onto multiple FPGAs while taking account of chip resources, connectivity and clock frequency constraints.

Figure 3
Wasga compiler: timing-driven partitioning (Source: Flexras)
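Wasga's internal algorithm is not described here. As a generic illustration of what "timing-driven" partitioning means, the sketch below runs greedy single-cell moves between partitions, a much simplified relative of Kernighan-Lin / Fiduccia-Mattheyses refinement. Each net carries a weight, nets on critical paths get a larger weight so the optimizer prefers to keep them inside one FPGA, and each partition has a capacity limit. Function names, weights and the toy netlist are all hypothetical.

```python
def cut_cost(nets: dict, weights: dict, part: dict) -> float:
    """Sum of the weights of nets whose cells span more than one partition."""
    return sum(
        weights[name]
        for name, cells in nets.items()
        if len({part[c] for c in cells}) > 1
    )


def greedy_refine(nets, weights, part, area, capacity, max_passes=10):
    """Move single cells while doing so lowers the weighted cut and keeps every
    partition within its capacity. Returns a new cell -> partition mapping."""
    part = dict(part)
    load = {p: 0 for p in capacity}
    for cell, p in part.items():
        load[p] += area[cell]

    for _ in range(max_passes):
        improved = False
        for cell in sorted(part):
            src = part[cell]
            best_dst, best_cost = None, cut_cost(nets, weights, part)
            for dst in sorted(capacity):
                if dst == src or load[dst] + area[cell] > capacity[dst]:
                    continue
                part[cell] = dst                  # try the move
                cost = cut_cost(nets, weights, part)
                part[cell] = src                  # undo it
                if cost < best_cost:
                    best_dst, best_cost = dst, cost
            if best_dst is not None:              # commit the best move
                part[cell] = best_dst
                load[src] -= area[cell]
                load[best_dst] += area[cell]
                improved = True
        if not improved:
            break
    return part


# Hypothetical example: net "crit" lies on the critical path, so it is
# weighted 10x to discourage cutting it between FPGAs.
nets = {"crit": ["a", "b"], "n1": ["b", "c"], "n2": ["c", "d"]}
weights = {"crit": 10, "n1": 1, "n2": 1}
start = {"a": 0, "b": 1, "c": 0, "d": 1}
area = {"a": 1, "b": 1, "c": 1, "d": 1}
capacity = {0: 3, 1: 3}

result = greedy_refine(nets, weights, start, area, capacity)
print(cut_cost(nets, weights, start), "->", cut_cost(nets, weights, result))  # 12 -> 1
```

Recomputing the whole cut for every trial move keeps the sketch short; production partitioners track move gains incrementally so that they scale to millions of cells.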

As shown in table 1, both the multiplexing ratio and the number of combinatorial hops fall as FPGA capacity increases, as it does on the Virtex-7 based board. Larger parts absorb more signals internally and leave fewer combinatorial paths crossing between them, which enabled us to more than double the system clock frequency.
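Why fewer hops and a lower multiplexing ratio raise the clock can be seen with a first-order timing model. The model below is our own simplifying assumption, not Flexras's: every hop on the critical path pays a fixed board and pad delay plus one time-division slot for each signal sharing the pin. The numbers are hypothetical and are not the Table 1 measurements.

```python
def system_freq_mhz(logic_delay_ns: float, hops: int, mux_ratio: int,
                    slot_ns: float, board_delay_ns: float) -> float:
    """First-order estimate of the prototype's system clock.

    Each hop of the critical path pays the board/pad delay plus one
    time-division slot per signal sharing the pin (the multiplexing ratio).
    """
    period_ns = logic_delay_ns + hops * (mux_ratio * slot_ns + board_delay_ns)
    return 1000.0 / period_ns


# Hypothetical numbers, NOT the Table 1 measurements.
many_small_fpgas = system_freq_mhz(10, hops=3, mux_ratio=8, slot_ns=2, board_delay_ns=3)
few_large_fpgas = system_freq_mhz(10, hops=1, mux_ratio=4, slot_ns=2, board_delay_ns=3)
print(f"{many_small_fpgas:.1f} MHz vs {few_large_fpgas:.1f} MHz")  # 14.9 MHz vs 47.6 MHz
```

Because hops and multiplexing ratio multiply in this model, reducing both at once has a compounding effect on the clock period.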

Table 1
Comparison of Virtex-6 and Virtex-7 based boards (Source: Flexras)

Partitioning the Virtex-7 board

We tried two ways to partition designs onto the Virtex-7 based board, as shown in figure 4.

Figure 4
Board architecture modelling: unbalanced interconnects (FPGA vs die) (Source: Flexras)

Scenario 1: Partitioning without considering that the Virtex-7 parts use multiple die on a silicon interposer. The Flexras Wasga tool splits the design into two partitions, one for each Virtex-7 device. Xilinx's Vivado tool, which can place and route 2 million LUTs in less than five hours, then performs the intra-FPGA compilation.

Scenario 2: Taking into account the multiple die used in the Virtex-7 devices during the partitioning process. In this scenario, we partition directly across the eight die in the two FPGAs. This gives better visibility of both die-to-die connections and I/O pad-to-die connections.
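The difference between the two scenarios is what the partitioner can see. The sketch below, a hypothetical cost model of our own, assigns cells to the eight die (four SLRs per Virtex-7 2000T) and reports inter-FPGA and inter-die cuts separately. The weights, which make board crossings costlier than interposer crossings, are illustrative assumptions, not Wasga settings.

```python
DIES_PER_FPGA = 4  # a Virtex-7 2000T contains four SLRs, one per die


def fpga_of(die: int) -> int:
    """Die indices 0-3 belong to FPGA 0, 4-7 to FPGA 1."""
    return die // DIES_PER_FPGA


def hierarchical_cut(nets: dict, die_of: dict, w_fpga: float = 10.0, w_die: float = 1.0):
    """Return (inter-FPGA cut, inter-die cut, weighted cost) for a die assignment.

    A net counts toward the inter-FPGA cut if it spans both FPGAs, and toward
    the inter-die cut if, inside any one FPGA, it touches more than one die.
    """
    fpga_cut = die_cut = 0
    for cells in nets.values():
        dies = {die_of[c] for c in cells}
        fpgas = {fpga_of(d) for d in dies}
        if len(fpgas) > 1:
            fpga_cut += 1
        if any(len({d for d in dies if fpga_of(d) == f}) > 1 for f in fpgas):
            die_cut += 1
    return fpga_cut, die_cut, w_fpga * fpga_cut + w_die * die_cut


# Hypothetical assignment over the eight die (0-7) of the two FPGAs.
die_of = {"a": 0, "b": 1, "c": 4, "d": 0}
nets = {"n1": ["a", "b"], "n2": ["a", "c"], "n3": ["a", "d"]}
print(hierarchical_cut(nets, die_of))  # (1, 1, 11.0)
```

In scenario 1 only the first number is visible to the partitioner: net n1 crosses the interposer between die 0 and die 1, but a two-way FPGA-level partition cannot see it.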

The Wasga tool generates a sub-netlist for each FPGA and a constraint file assigning design instances to individual die. These XDC constraints, which define instance and I/O pad assignments, are given as an input to the Xilinx place and route tool.
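As a rough idea of what such a per-FPGA constraint file can contain, the sketch below emits XDC-style Tcl lines from an instance-to-SLR mapping and a port-to-pin mapping. It is not the format Wasga produces. `create_pblock`, `resize_pblock`, `add_cells_to_pblock` and `set_property PACKAGE_PIN` are standard Vivado Tcl commands, but whether an SLR name is accepted as a pblock range depends on the Vivado version and device, so treat that line as an assumption. Instance, port and pin names are hypothetical.

```python
def die_constraints(slr_of_instance: dict, package_pins: dict) -> str:
    """Emit XDC-style Tcl lines that pin instances to SLRs and ports to pins.

    slr_of_instance: top-level instance name -> SLR index within this FPGA.
    package_pins: top-level port name -> package pin name.
    """
    lines = []
    for slr in sorted(set(slr_of_instance.values())):
        pblock = f"pblock_slr{slr}"
        cells = " ".join(sorted(i for i, s in slr_of_instance.items() if s == slr))
        lines.append(f"create_pblock {pblock}")
        # Assumes the Vivado version in use accepts an SLR as a pblock range.
        lines.append(f"resize_pblock [get_pblocks {pblock}] -add {{SLR{slr}}}")
        lines.append(f"add_cells_to_pblock [get_pblocks {pblock}] [get_cells {{{cells}}}]")
    for port, pin in sorted(package_pins.items()):
        lines.append(f"set_property PACKAGE_PIN {pin} [get_ports {{{port}}}]")
    return "\n".join(lines)


# Hypothetical instance, port and pin names.
xdc = die_constraints({"u_cpu": 0, "u_ddr_ctrl": 1, "u_dsp": 1}, {"clk_in": "AB12"})
print(xdc)
```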

Table 2 shows that the first approach leads to a lower pin multiplexing ratio but makes FPGA place and route more difficult. The second approach distributes congestion better at partitioning time, as shown by the reduction in the inter-die cut, and it cuts place and route run time by 22%. Congestion is reduced even though the same amount of logic is placed on the die in each scenario; the cost is a 12% reduction in system clock frequency.

Table 2
The impact of the different partitioning approaches on performance and compilation run-time (Source: Flexras)

Figure 5
Hardware verification vs. software validation (Source: Flexras)

As shown in figure 5, each scenario suits a specific verification purpose. In hardware verification, the DUT is not stable, so multiple revisions must be tried out each day. FPGA place and route run time is therefore critical, and scenario 2 seems the better choice. In software validation, the DUT is stable and the FPGA place and route process is probably run only once. Execution speed is then critical, and scenario 1 is the better choice.
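The choice can be framed as a simple break-even between compile time and execution time. The sketch below assumes that the 22% place-and-route saving and 12% clock reduction reported for scenario 2 apply to every revision, and that execution time scales inversely with clock frequency; the 5-hour compile and the workload sizes are hypothetical.

```python
PR_TIME_RATIO = 0.78  # scenario 2 place-and-route time vs scenario 1 (22% shorter)
FREQ_RATIO = 0.88     # scenario 2 system clock vs scenario 1 (12% lower)


def total_hours(scenario: int, revisions: int, pr_hours: float, run_hours: float) -> float:
    """Compile time for all revisions plus execution time of the test workload.

    pr_hours and run_hours are scenario 1 figures; scenario 2 is scaled from them.
    """
    if scenario == 1:
        return revisions * pr_hours + run_hours
    return revisions * pr_hours * PR_TIME_RATIO + run_hours / FREQ_RATIO


def better_scenario(revisions: int, pr_hours: float, run_hours: float) -> int:
    """Pick the scenario with the lower combined compile and execution time."""
    t1 = total_hours(1, revisions, pr_hours, run_hours)
    t2 = total_hours(2, revisions, pr_hours, run_hours)
    return 1 if t1 <= t2 else 2


# Hypothetical workloads with a 5-hour scenario 1 place and route.
print(better_scenario(revisions=3, pr_hours=5.0, run_hours=1.0))   # hardware verification: 2
print(better_scenario(revisions=1, pr_hours=5.0, run_hours=40.0))  # software validation: 1
```

Many short revisions favour the faster compile of scenario 2; a single long software run favours the higher clock of scenario 1.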


We developed a rapid prototyping system based on Xilinx's Virtex-7 2000T FPGA, which makes it possible to build prototyping platforms with high capacity and system performance. Combining the Xilinx place and route tool with the Flexras Wasga partitioning tool offers an efficient way to implement large ASIC and SoC designs on multi-FPGA boards, enabling multiple design revisions per day.

Flexras is now leading a European-funded project called PPR to build a multi-FPGA board for ASIC rapid prototyping that includes two Virtex-7 2000T devices. The board is fully supported by the Wasga partitioning tool.


Zied Marrakchi and Ramsis Farhat

Flexras Technologies, 153 Bd Anatole France, 93200 Saint-Denis, France

Tel: +33 149 22 0023

Ramine Roane

Xilinx, 2100 Logic Drive, San Jose, CA 95124

Tel: (408) 879-6954






