Not Applicable.
As circuit technology scales, the costs of chip-fabrication increase rapidly. Field Programmable Gate Arrays (FPGAs) can be a cost-effective alternative to Application Specific Integrated Circuits (ASICs) for many systems. The implementation of systems on FPGAs has the advantage of shorter time-to-market as compared with ASIC implementations. In addition, a system having FPGAs can be tested and debugged repeatedly in a shorter period of time at a lower cost compared to ASIC implementations.
Some FPGAs, such as those from Xilinx Corporation, have two-dimensional (2-D) arrays of tiles having configurable logic blocks (CLBs) and switches. The switches have programmable interconnection points (PIPs) to enable connection to the four adjacent switches in the 2-D array. Connecting elements in a 2-D FPGA is well known in the art.
Since the speed and density of FPGA chips increase with technology, the application area of FPGAs, which was previously limited primarily to system prototyping, has been extended to higher performance and more complex custom applications. However, FPGAs are slower and less power-efficient than ASIC systems because the speed of internal FPGA interconnects does not scale with technology. The speed and energy consumption of FPGAs are dominated by element interconnects. The speed of FPGAs is limited by the delay of long wires and buffers, and the energy efficiency is limited by the larger capacitance of wires and programmable interconnects.
The present invention provides methods and apparatus for a three-dimensional (3-D) Field Programmable Gate Array (FPGA). In general, components of the FPGA, e.g., 3-D switches, programmable logic blocks (PLBs), and slices (e.g., look-up table, flip-flop), are interconnected to reduce wiring distance and thereby provide more efficient energy consumption as compared to conventional FPGA architectures.
In one aspect of the invention, embodiments of the invention can include one or more of the following features: Designing an FPGA, in a three dimensional array of tiles having respective switches, by connecting a first switch in a first tile in the array of tiles to one or more of at least five other switches in the array of tiles nearest, in one embodiment, to the first tile across first and second strata of the array. Coupling the first switch to one of the at least five other switches using an inter-strata via. The structure for a first wire of the first switch includes four intra-stratum connections and one or more inter-strata connections. Reducing wire-length as a parameter to route and place components in the FPGA. Using simulated annealing to place slices. Performing intra-strata optimization. Using a center of gravity for slices coupled to a net coupled to a selected instance of a slice. Performing an available location search in an inter-strata spiral. Performing power-driven placement. Locating instances having a higher switching activity on a stratum close to a heat sink. Partitioning the FPGA to collocate elements for a tile including a PLB, switch, and configuration memory. Partitioning the FPGA to alternate support strata and core strata. Partitioning the FPGA to place PLBs on a first stratum and switch blocks on a second stratum.
In another aspect of the invention, an FPGA device can include one or more of the following features: A three dimensional (3-D) array of tiles each having a 3-D switch and a PLB, wherein a first one of the switches in the array of tiles can be coupled to at least five other ones of the switches in the array of tiles across first and second or more strata. The first one of the switches includes a number of configurable interconnection points equal to fifteen times the number of wires per channel. Pins of the PLB are connected to wires, wherein a structure for a wire connection includes four intra-stratum connections and two inter-stratum connections. The FPGA is partitioned. The FPGA is partitioned based upon one or more of tile element collocation, support and core stratum, PLB stratum and switch stratum, and alternating stratum.
In another aspect of the invention, a CAD system includes one or more of the following features. A processor coupled to a memory, an operating system, and a CAD application having a placement module and a routing module. The memory can store instructions that when executed enable one or more of the following. Designing an FPGA, in a three dimensional array of tiles having respective switches, by connecting a first switch in a first tile in the array of tiles to one or more of at least five other switches in the array of tiles nearest to the first tile across first and second strata of the array. Coupling the first switch to one or more of the at least five other switches using an inter-strata via. The structure for a first wire of the first switch includes four intra-stratum connections and one or more inter-strata connections. Reducing wire-length as a parameter to route and place components in the FPGA. Using simulated annealing to place slices. Performing intra-strata optimization. Using a center of gravity for slices coupled to a net coupled to a selected instance of a slice. Performing an available location search in an inter-strata spiral. Performing power-driven placement. Locating instances having a higher switching activity on a stratum close to a heat sink. Partitioning the FPGA to collocate elements for a tile including a PLB, switch, and configuration memory. Partitioning the FPGA to alternate support strata and core strata. Partitioning the FPGA to place PLBs on a first stratum and switch blocks on a second stratum.
The exemplary embodiments contained herein will be more fully understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
In an exemplary embodiment, a tile 108 in the 3-D FPGA 100 includes a 3-D switch 104b and a PLB 106b. A 3-D mesh array of tiles 108 constitutes a FPGA where wires of each 3-D switch 104 are connectable to those of the nearest six switches (four sides, top and bottom), for example, for a 3-layer embodiment. For the top and bottom strata, each switch shares wires with the nearest five switches. A two-layer (first and second strata) embodiment can have a switch connectable to five other switches and a three-layer (first, second, and third strata) embodiment can have a switch connectable to six other switches.
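The neighbor counts for the two-layer and three-layer embodiments can be illustrated with a minimal sketch. The function name `neighbor_switches`, the coordinate convention, and the 3x3 example mesh are assumptions made for illustration only and are not part of the described apparatus.

```python
def neighbor_switches(x, y, z, nx, ny, nz):
    """Return the switches adjacent to (x, y, z) in an nx-by-ny-by-nz mesh.

    Hypothetical helper: z indexes the stratum, x and y index the tile
    within a stratum.
    """
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    result = []
    for dx, dy, dz in steps:
        cx, cy, cz = x + dx, y + dy, z + dz
        # Keep only neighbors that exist inside the mesh.
        if 0 <= cx < nx and 0 <= cy < ny and 0 <= cz < nz:
            result.append((cx, cy, cz))
    return result


# Interior tile of a two-layer mesh: four intra-stratum neighbors plus one above.
two_layer = neighbor_switches(1, 1, 0, 3, 3, 2)
# Interior tile on the middle stratum of a three-layer mesh: four sides, top, bottom.
three_layer_middle = neighbor_switches(1, 1, 1, 3, 3, 3)
print(len(two_layer), len(three_layer_middle))
```

The sketch considers only tiles that are not on the edge of a stratum; edge tiles have fewer intra-stratum neighbors.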
The tiles 108 in the 3-D FPGA include a 3-D switch 104 and a PLB 106. In one particular embodiment, the PLB 106 includes a pair formed from a LUT (Look-Up-Table) 110 and a FF (flip-flop) 112, which can be referred to as a slice 114a. A PLB is composed of n slices 114a-n, which are coupled to the 3-D switch 104. It is understood that slice should be construed broadly and is not intended to denote any particular architecture.
It is understood that the PLB can be formed using structures and techniques other than the illustrated slices, LUTs, and particular switch circuits. It is further understood that slice should be construed broadly to include configurations having any number of component-types/elements that provide LUT-FF functionality.
The number of CIPs (Configurable Interconnection Points) in a 3-D switch 104b is larger than that of a 2-D switch. If there are m wires per channel, there are 15m CIPs in a disjoint switch since there are six sides in a regular hexahedron and each wire can be connected to the other five wires on the other five sides. Therefore, six wires on six sides are connected to the other five wires on the other five sides but each connection is counted twice. Thus, the number of connections is 6*5/2=15. A disjoint switch is a switch where each wire on one side is connected to only the corresponding wire on the other side. Assume that there are n wires on one side of a 3-D switch with indices 0, 1, . . . , n−1. In a disjoint switch, wire 0 is connected only to wire 0 on the other side, wire 1 is connected only to wire 1 on the other side, and so on. If all of the n wires are connected to all other n wires on the other side, then it is called a full-crossbar switch. A full-crossbar switch requires a relatively large number of interconnections and thus a disjoint switch is typically used.
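The counting argument above can be checked with a short sketch. The function names are hypothetical, and the full-crossbar count is an assumption derived from the description (every wire on one side connectable to every wire on each other side) rather than a figure stated in the text.

```python
from math import comb


def disjoint_switch_cips(wires_per_channel, sides):
    """CIPs in a disjoint switch: each wire index connects every pair of sides."""
    return wires_per_channel * comb(sides, 2)


def full_crossbar_cips(wires_per_channel, sides):
    """Assumed count for a full crossbar: every wire pair across every pair of sides."""
    return wires_per_channel ** 2 * comb(sides, 2)


m = 8  # example number of wires per channel (illustrative value)
print(disjoint_switch_cips(m, sides=4))  # 2-D switch, four sides: 6m
print(disjoint_switch_cips(m, sides=6))  # 3-D switch, six sides: 15m
print(full_crossbar_cips(m, sides=6))    # grows with m squared
```

Under these assumptions the disjoint 3-D switch needs 15m CIPs, while a full crossbar grows quadratically in m, which is consistent with the preference for a disjoint switch.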
Pins of the PLB are connected to wires. In an exemplary embodiment, if there are p pins in a slice and each pin is connected to q wires in 3-D switch, there are npq connections. The number of CIPs in a switch for pq connections is log2(npq).
While an exemplary switch structure is shown in the illustrated embodiment, modifications, substitutions and alternatives, etc. to the switch structure will be readily apparent to one of ordinary skill in the art without departing from the invention. In addition, while reference may be made to first and second inter-strata connections U, D, and/or first, second, and third strata, it is understood that a 3-D FPGA refers to a FPGA having two or more layers. In an exemplary embodiment, a 3-D FPGA includes two layers and a switch may or may not include both U and D inter-strata connections.
In another aspect of the invention, in an exemplary 3-D FPGA CAD (Computer Aided Design) flow, a HDL (Hardware Description Language) netlist is synthesized and mapped into slices. HDL systems and techniques for describing FPGAs are well known to one of ordinary skill in the art. Using a CAD system, slices in tiles are located in 3-D space after consideration of wire-length, speed, and power consumption. 3-D routing assigns a set of physical wires in 3-D switches to each net. 3-D placement and routing play a role in determining the performance of 3-D FPGAs.
The wire-length of a net is defined as the sum of the lengths of the physical wires assigned to the net after placement and routing. As used herein, a net refers to a logical connection interconnecting ports of a PLB in a FPGA. A circuit contains many logical blocks (instances), and nets are the logical connections that connect each port of a logical block to ports of other blocks. Placement determines the physical location of the logical blocks, and routing assigns a set of physical wires to each net such that signals are transferred through those wires.
It will be appreciated that the inventive 3-D FPGA configuration reduces total wire-length and improves the operating speed of the circuit by reducing the path length of a net, to which the signal delay is proportional.
In an exemplary embodiment, a 3-D placement algorithm is based on simulated annealing. As is well known to one of ordinary skill in the art, simulated annealing is a technique that can escape local minima, and potentially discover global minima, when attempting to find an optimal solution to a given problem. Simulated annealing has been applied to placement in two-dimensional FPGAs, for example, as is well known in the art.
In exemplary embodiments of the inventive 3-D FPGA design techniques, simulated annealing includes a number of iterations, where at each iteration, an instance is randomly selected and moved to a specific location. If the cost decreases, where cost can be determined in terms of speed, power, or other parameter of interest, or a combination of parameters, the movement is accepted. If the movement increases the cost, the acceptance is determined by the current ‘annealing temperature.’ If the ‘temperature’ is high, the probability of acceptance is high. If the temperature is low, the probability of acceptance is low. In general, instances are moved to a specific location to minimize the cost.
In the initial stage of simulated annealing, the instances are randomly selected and moved to a random location. This random movement is repeated many times. The variance of cost is set as an initial temperature. In each iteration, the temperature is computed as follows: Tnew=Tcurrent×α, where the acceptance ratio α is less than 1.0. The decision for movement acceptance is made as follows: r is a random value between 0 and 1, and if r<e^(−cost gain/temperature), then the slice movement is accepted. Otherwise, the slice movement is not accepted. As the temperature goes down, e^(−cost gain/temperature) becomes smaller and the probability of acceptance goes down. The temperature decreases at each iteration, and a lower temperature makes a slice movement with a higher cost gain less likely to be accepted.
In an exemplary embodiment, 3-D placement includes 3-D simulated annealing with two-phase slice movement: an inter-strata optimization phase and an intra-stratum optimization phase. In the inter-strata optimization phase, simulated annealing tries to move slices by computing forces near the selected slice.
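The acceptance rule and temperature update described above can be sketched as follows. This is a minimal illustration, assuming that "cost gain" means the new cost minus the old cost; the function names `accept_move` and `cool` and the default value of α are hypothetical.

```python
import math
import random


def accept_move(cost_gain, temperature, rng=random):
    """Decide whether a slice movement is accepted.

    cost_gain is assumed to be (new cost - old cost); a move that lowers
    the cost is always accepted.
    """
    if cost_gain <= 0:
        return True
    r = rng.random()  # random value between 0 and 1
    return r < math.exp(-cost_gain / temperature)


def cool(temperature, alpha=0.9):
    """Temperature update Tnew = Tcurrent * alpha, with alpha below 1.0."""
    return temperature * alpha


# Compare how often the same cost-increasing move is accepted when hot and cold.
rng = random.Random(0)
hot = sum(accept_move(1.0, 100.0, rng) for _ in range(1000))
cold = sum(accept_move(1.0, 0.1, rng) for _ in range(1000))
print(hot, cold)
```

With the same cost gain, the move is accepted far more often at the high temperature than at the low one, which is the behavior the text describes.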
Slice S0 is the selected instance during simulated annealing in the illustrated embodiment. Three nets, A, B, C are connected to instance S0. For each net, there are several connected slices, as shown. Net A is connected to slices S1 and S2, net B is connected to slice S3, and net C is connected to slices S4, S5, and S6. The ‘location vector’ of S0 is the 3-D vector to the location where slice S0 is located.
As shown in
where vi is the vector of the instance with index i, Ni is a set of nets that are connected to instance i, nk is one of the nets in Ni, and I(nk) is a set of indices of instances which are connected to nk. The term on the right side computes the weighted sum of the location vector difference for the current instance i.
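One possible reading of the center-of-gravity computation is sketched below. The weighting is an assumption: each net contributes the mean location difference of its other slices, and nets are averaged uniformly. The function name, the data layout, and the example coordinates are hypothetical and are not taken from the figures.

```python
def center_of_gravity(instance, nets, locations):
    """Estimate a new tile location for `instance` from its connected nets.

    nets: dict mapping a net name to the list of instances on that net.
    locations: dict mapping an instance to its (x, y, z) tile location.
    Uniform weights per net are an assumption made for this sketch.
    """
    vi = locations[instance]
    connected = [members for members in nets.values() if instance in members]
    shift = [0.0, 0.0, 0.0]
    for members in connected:
        others = [m for m in members if m != instance]
        if not others:
            continue
        for axis in range(3):
            # Mean location difference between the net's slices and the instance.
            diff = sum(locations[m][axis] - vi[axis] for m in others) / len(others)
            shift[axis] += diff / len(connected)
    # Tile locations are discrete, so round to the nearest integer tile.
    return tuple(round(vi[axis] + shift[axis]) for axis in range(3))


# Net membership follows the S0..S6 example; the coordinates are illustrative.
nets = {"A": ["S0", "S1", "S2"], "B": ["S0", "S3"], "C": ["S0", "S4", "S5", "S6"]}
locations = {
    "S0": (0, 0, 0), "S1": (4, 0, 0), "S2": (2, 0, 0), "S3": (0, 6, 0),
    "S4": (3, 3, 2), "S5": (3, 3, 2), "S6": (3, 3, 2),
}
new_location = center_of_gravity("S0", nets, locations)
print(new_location)
```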
In general, the tile locations of the inventive 3-D FPGA are discrete so that the ‘new’ location for S0 may not be available, i.e., the tile at the new location is fully occupied by other slices. In an exemplary embodiment, the location of an instance is an integer value larger than or equal to 0, not a floating-point value—discrete indicates that a location is an integer value. If a new location is not available, the system searches around the new location for an available tile.
In one embodiment, the system searches around the new location S0 in a spiral direction. The search sequence is enumerated by the numbers on the tiles, e.g., 1, 2, . . . , 9 as shown on each stratum ST1, ST2, ST3. At each candidate location, the system attempts to find an available location at a different stratum with the same x and y location (but different z location). Reasons for searching for an available location at different strata include that the number of strata (e.g., 2˜4) will typically be smaller than that of rows and columns in each stratum (e.g., ˜100). If the original location of S0 is relatively far from the new location of S0 and the cost gain, as defined for the selected parameter(s), is relatively large, it is better in speed and wire-length to find an available location just above or below the new location than to search for another location in the spiral direction that is farther from the new location. In addition, the distance between strata, Lz (˜50 μm), is typically smaller than the intra-stratum distance (e.g., Lx or Ly ˜500 μm) although the delay through the inter-strata via (U, D in
In another aspect of the invention, after multiple iterations of the inter-strata phase described above, instances tend toward equilibrium locations. The acceptance ratio α, i.e., the percentage of movements accepted for each iteration, falls below a certain level. If the acceptance ratio α is under a predetermined threshold γ, intra-stratum optimization begins. In the intra-stratum optimization phase, instances are permitted to move only within the same stratum, with local optimization in each stratum.
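The inter-strata spiral search can be sketched as follows. The exact spiral order shown in the figures is not reproduced here; a square spiral is assumed, and the names `spiral_offsets` and `find_available` as well as the `occupied` set of fully used tiles are hypothetical.

```python
def spiral_offsets(max_radius):
    """Yield (dx, dy) offsets in a square spiral starting at the center."""
    x = y = 0
    yield (0, 0)
    dx, dy = 1, 0
    step = 1
    while step <= 2 * max_radius + 1:
        for _ in range(2):
            for _ in range(step):
                x, y = x + dx, y + dy
                if max(abs(x), abs(y)) <= max_radius:
                    yield (x, y)
            dx, dy = -dy, dx  # turn 90 degrees
        step += 1


def find_available(target, occupied, size):
    """Find the first free tile, trying every stratum at each spiral position."""
    nx, ny, nz = size
    tx, ty, tz = target
    # Try the target stratum first, then strata just above or below it.
    strata = sorted(range(nz), key=lambda z: abs(z - tz))
    for dx, dy in spiral_offsets(max(nx, ny)):
        x, y = tx + dx, ty + dy
        if not (0 <= x < nx and 0 <= y < ny):
            continue
        for z in strata:
            if (x, y, z) not in occupied:
                return (x, y, z)
    return None


size = (3, 3, 3)
print(find_available((1, 1, 1), {(1, 1, 1)}, size))
print(find_available((1, 1, 1), {(1, 1, 0), (1, 1, 1), (1, 1, 2)}, size))
```

Checking every stratum at a given x and y before advancing along the spiral reflects the observation that the number of strata is small compared with the number of rows and columns.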
In general, 3-D routing is based on negotiated-cost global routing with advanced 3-D wavefront expansion. In one embodiment, the routing algorithm operates as follows. For each net, the router searches for a wavefront including the ports of the net. The router first includes in the wavefront a driver port that drives the value of the net, and then iteratively includes nearby physical wires in the wavefront. When all ports of the net are included, the wavefront expansion stops. Within the wavefront, there are many possible paths that connect all the ports. So-called backtracing then starts from each port and selects physical wires back to the driver port. Initially, 3-D routing performs breadth-first searching to enumerate paths and compute timing criticality for the paths. Assume there is a tree structure with P as a parent, where A, B, and C are children of P, A0, A1, and A2 are children of A, and so on. In breadth-first searching, the algorithm searches nodes as follows: P->A->B->C->A0->A1->A2->B0->B1->B2 . . . . In contrast, depth-first searching searches nodes as follows: P->A->A0->A1->A2->B->B0->B1->B2 . . . .
For each net, wavefront expansion around a driver pin of the net searches possible routing paths and, after determining that the wavefront meets the load pins, backtracing determines the lowest-cost routing path for each net. The number of overused wires decreases through the cost-based negotiation between nets as the routing of all nets is iterated. During the routing, a physical wire can be temporarily occupied by several nets, in which case it is considered ‘overused.’ As iterations progress, each physical wire becomes occupied by only a single net.
It should be noted that heat removal capability may deteriorate in a 3-D FPGA as multiple strata are integrated. In vertically integrated 3-D FPGAs with a heat sink on the package, the strata not in contact with a heat sink may have relatively limited heat removal capability. The temperature increase on strata farther from a heat sink may have an impact on leakage power consumption as well as device reliability since the leakage power has an exponential dependency on temperature. Typically, wires and switches of FPGAs are responsible for significant power consumption in FPGAs. The switching power of FPGAs is reduced by allocating a smaller number of physical wires to a net with higher switching activity. The leakage power in a 3-D FPGA is a consideration due to temperature increases caused by vertical transistor stacking.
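The two search orders can be reproduced on the example tree with a short sketch. The children of C are an assumption, following the "and so on" in the description.

```python
from collections import deque

# Example tree: P is the parent of A, B, and C; each of those has three children.
TREE = {
    "P": ["A", "B", "C"],
    "A": ["A0", "A1", "A2"],
    "B": ["B0", "B1", "B2"],
    "C": ["C0", "C1", "C2"],  # assumed, by analogy with A and B
}


def breadth_first(root, tree):
    """Visit nodes level by level using a FIFO queue."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order


def depth_first(root, tree):
    """Visit each subtree completely before moving to the next sibling."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(tree.get(node, [])))
    return order


print("->".join(breadth_first("P", TREE)))
print("->".join(depth_first("P", TREE)))
```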
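Wavefront expansion followed by backtracing can be sketched for a single net on a small 3-D grid. This is a simplified illustration with uniform wire cost and no negotiation between nets; the function name `route_net`, the grid model in which each cell stands for a wire segment, and the `blocked` set are assumptions.

```python
from collections import deque


def route_net(driver, loads, size, blocked=frozenset()):
    """Expand a wavefront from the driver, then backtrace from each load pin.

    Returns the set of grid cells used by the net, or None if a load
    cannot be reached. Cells in `blocked` are treated as unavailable.
    """
    nx, ny, nz = size
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    parent = {driver: None}
    frontier = deque([driver])
    remaining = set(loads) - {driver}
    # Wavefront expansion: stop once every load pin is inside the wavefront.
    while frontier and remaining:
        cell = frontier.popleft()
        for dx, dy, dz in steps:
            nxt = (cell[0] + dx, cell[1] + dy, cell[2] + dz)
            if nxt in parent or nxt in blocked:
                continue
            if not (0 <= nxt[0] < nx and 0 <= nxt[1] < ny and 0 <= nxt[2] < nz):
                continue
            parent[nxt] = cell
            remaining.discard(nxt)
            frontier.append(nxt)
    if remaining:
        return None
    # Backtracing: walk from each load pin back toward the driver.
    used = set()
    for load in loads:
        cell = load
        while cell is not None and cell not in used:
            used.add(cell)
            cell = parent[cell]
    return used


direct = route_net((0, 0, 0), [(2, 0, 0), (0, 0, 1)], (3, 1, 2))
detour = route_net((0, 0, 0), [(2, 0, 0)], (3, 1, 2), blocked={(1, 0, 0)})
print(sorted(direct))
print(sorted(detour))
```

In the second call the direct path is blocked, so the route detours through the other stratum. A negotiated-cost router would repeat such per-net routing while raising the cost of overused wires, which this sketch does not attempt.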
In an exemplary embodiment, a 3-D FPGA has a heat sink at the printed circuit board (PCB), i.e., the bottom stratum is in contact with a heat sink. It may be efficient to allocate wires going through 3-D switches at lower strata to a net with higher switching activity to decrease temperature on each stratum in the 3-D FPGA. In other words, it reduces leakage power consumption to locate instances connected to a net with higher switching activity in a lower stratum because the router typically allocates wires that go through 3-D switches located between the maximum stratum number and the minimum stratum number of instances connected to the net. Energy driven 3-D placement locates slices connected to the net with higher switching activity at lower strata.
Initially, the “power criticality” of each slice is initialized. The normalized switching activity of each net is added to the power criticality of slices connected to the net. In an exemplary embodiment, the slices are sorted according to the power criticality and each slice is assigned a “maximum stratum number” (MSN). The slices having higher power criticality are assigned a lower MSN, where the stratum in contact with a heat sink has a stratum number of 0, for example.
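The power criticality and MSN assignment can be sketched as follows. Two points are assumptions, since the description does not specify them: switching activity is normalized by the largest activity among the nets, and the sorted slices are divided evenly among the strata. The function name `assign_msn` and the example values are hypothetical.

```python
def assign_msn(nets, activity, num_strata):
    """Compute power criticality per slice and assign a maximum stratum number.

    nets: dict mapping a net name to the slices connected to it.
    activity: dict mapping a net name to its switching activity.
    Stratum 0 is taken to be the stratum in contact with the heat sink.
    """
    peak = max(activity.values()) or 1.0  # assumed normalization by peak activity
    criticality = {}
    for net, slices in nets.items():
        for s in slices:
            criticality[s] = criticality.get(s, 0.0) + activity[net] / peak
    # Sort by descending criticality; ties are broken by slice name.
    ranked = sorted(criticality, key=lambda s: (-criticality[s], s))
    per_stratum = -(-len(ranked) // num_strata)  # ceiling division
    # Assumed mapping: the most critical group of slices gets MSN 0, and so on.
    msn = {s: rank // per_stratum for rank, s in enumerate(ranked)}
    return criticality, msn


nets = {"A": ["S0", "S1"], "B": ["S0", "S2"], "C": ["S3"]}
activity = {"A": 0.8, "B": 0.4, "C": 0.1}
criticality, msn = assign_msn(nets, activity, num_strata=2)
print(msn)
```

In this example S0 sits on the two most active nets, so it receives the highest power criticality and the lowest MSN.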
In another aspect of the invention shown in
In order to shorten the wire-lengths among the core logic and routing area, it is beneficial to locate configuration memory, here shown as SB config and PLB config, and other peripheral elements to a different stratum. The stratum with the logic blocks, here shown as PLB and switch block, and routing switches is the “core stratum” and the stratum with configuration memory and other non-essential items is the “support stratum.”
With this separation, the configuration information is delivered via the inter-strata vias, as shown in the exemplary partition in
For example as shown in
An additional advantage of using support strata for configuration memory is the ability to use a different process technology for these elements. The support layer can be designed in a first memory process tuned for low leakage power while the core layer can be designed in a second process for high performance logic.
In another embodiment, 3-D partitioning places pipelining registers that are in the switch block on a separate layer. Again, this technique reduces the area overhead of supporting interconnect pipelining by reducing the number of gates in the core layer.
The support strata can be used for the distribution of power. In a distributed power domain design, power switches, which can require significant area, are placed on a support layer and the power rails are supplied by inter-strata vias. The distributed power switches can be independently controlled to provide the optimal voltage level for a particular tile or to greatly reduce current leakage in an unused tile.
The support strata can contain analog components such as Analog-to-Digital Converters (ADC) or Phase-Locked Loops (PLL), which could benefit from use of a different process technology. This partitioning also improves manufacturability as the support layer can be tested separately from the core layer.
Other embodiments are within the scope of the following claims.
The Government may have certain rights in the invention pursuant to DARPA Contract No. N66001-04-C-8032.