The present disclosure generally relates to integrated circuits. More particularly, the present disclosure relates to techniques and systems for performing signal activity extraction in an integrated circuit.
Programmable logic devices are a type of integrated circuit that can be programmed by a designer to implement a desired custom logic function. In a typical scenario, a logic designer uses computer-aided design (CAD) tools to design a custom logic circuit. These tools use information on the hardware capabilities of a given programmable logic device to help the designer implement the custom logic circuit using multiple resources available on that given programmable logic device. To ensure that the customized programmable logic device performs satisfactorily, the computer-aided design tools optimize placement and routing of resources on the device.
To satisfy the needs of system designers, programmable logic devices are being developed that contain increasingly large amounts of circuit resources. Although such devices are able to implement complex circuit designs, these devices also tend to consume large amounts of power. Circuits that consume too much power can create thermal management problems and can adversely affect system performance.
One of the largest contributors to power consumption on an integrated circuit is dynamic power. Dynamic power is consumed when a signal toggles between high and low values. Dynamic power consumption scales with the product of load capacitance and signal switching frequency. As a result, dynamic power consumption increases as a capacitive load being driven increases and as a frequency at which a particular logic signal toggles increases.
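As a hedged illustration, the scaling described above is commonly written using the standard first-order CMOS switching-power model. The symbols below are not part of the original text and are introduced here only for illustration; the exact constant factor depends on the convention used for the activity factor.

```latex
P_{\text{dyn}} \approx \alpha \, C_L \, V_{DD}^{2} \, f
```

Here $\alpha$ is the fraction of clock cycles in which the signal toggles, $C_L$ is the load capacitance being driven, $V_{DD}$ is the supply voltage, and $f$ is the clock frequency. The product $\alpha f$ corresponds to the signal switching frequency referred to above.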
Conventional computer-aided-design tools for designing customized circuits for programmable logic devices are generally unable to help a system designer reduce dynamic power consumption. System designers are therefore unable to make informed decisions regarding tradeoffs between dynamic power consumption, timing performance, and circuit real estate consumption.
In one aspect, a technique for performing signal activity extraction in an integrated circuit is described. The integrated circuit includes multiple nodes. The technique includes compiling a design of the integrated circuit, estimating signal activities at the nodes, determining a node of interest from the nodes, and connecting a signal activity circuit to the node of interest. Determining the node of interest and connecting the signal activity circuit to it before the remaining nodes of the integrated circuit improves efficiency by identifying the nodes at which signals are to be analyzed first. Such signal activity extraction may involve power analysis and power optimization.
The invention may best be understood by reference to the following description taken in conjunction with the accompanying drawings, which illustrate specific embodiments of the present systems and techniques.
Host computing device 102 may be a computer, a cell phone, a tablet, or a Personal Digital Assistant (PDA). Host computing device 102 includes configuration information used to configure integrated circuit 108. Host computing device 102 sends the configuration information via cable 106 to integrated circuit 108. Integrated circuit 108 performs certain logic, such as binary addition, transmission of signals, filtration of signals, or reception of signals, according to the configuration information. Host computing device 102 reads data, such as counts, received from a signal activity circuit, described below. A signal activity includes a toggle rate and/or a static probability, both of which are also described below. The signal activity circuit is connected to integrated circuit 108.
In another embodiment, integrated circuit 108 is a combination of an Application Specific Integrated Circuit (ASIC) and a PLD. In an alternative embodiment, integrated circuit 108 is an ASIC.
Housing 214 is an exemplary housing of host computing device 102 (
Processing unit 202 may be a central processing unit (CPU), a microprocessor, a floating point coprocessor, a graphics coprocessor, a hardware controller, a microcontroller, a programmable logic device programmed for use as a controller, a network controller, or other processing unit. Memory device 204 may be a random access memory (RAM), a read-only memory (ROM), or a combination of RAM and ROM. For example, memory device 204 includes a computer-readable medium, such as a floppy disk, a ZIP™ disk, a magnetic disk, a hard disk, a compact disc-ROM (CD-ROM), a recordable CD, a digital video disc (DVD), or a flash memory. Memory device 204 stores a set of techniques, described herein, for performing signal activity extraction in integrated circuit 108.
Network interface 206 may be a modem or a network interface card (NIC) that allows processing unit 202 to communicate with a network 216, such as a wide area network (WAN) or a local area network (LAN). Processing unit 202 may be connected via a wireless connection or a wired connection to network 216. Examples of the wireless connection include a connection using Wi-Fi protocol or a WiMax protocol. The Wi-Fi protocol may be an IEEE 802.11, IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, or IEEE 802.11i protocol. Examples of input device 208 include a mouse, a keyboard, a stylus, or a keypad. Output device 212 may be a liquid crystal display (LCD) device, a plasma display device, a light emitting diode (LED) display device, or a cathode ray tube (CRT) display device. Examples of output interface 210 include a video controller that drives output device 212 to display one or more images based on instructions received from processing unit 202. Processing unit 202 accesses the techniques, described herein, for performing signal activity extraction in integrated circuit 108, from memory device 204 or from a remote memory device (not shown), similar to memory device 204, via network 216, and executes the techniques. Processing unit 202, memory device 204, network interface 206, input device 208, output interface 210, and output device 212 communicate with each other via a bus 218.
In an alternative embodiment, system 200 may not include input device 208 and/or network interface 206.
PLD 300 includes a two-dimensional array of programmable logic array blocks (LABs) 302 that are interconnected by a network of multiple column interconnects 310 and multiple row interconnects 312 of varying length and speed. For the purpose of avoiding clutter in
A node of PLD 300 may be a register of PLD 300, a RAM block of PLD 300, I/O element 308 of PLD 300, buffer 314 of PLD 300, DSP block 306 of PLD 300, or LAB 302 of PLD 300.
A clock tree 316 is overlaid on PLD 300. Clock tree 316 is shown in dashed lines for maintaining clarity of
In an alternative embodiment, PLD 300 does not include MegaRAM block 304. Moreover, in another embodiment, clock tree 316 has different shapes and sizes of clock spine paths than that shown in
Referring to
Processing unit 202 executes the compilation technique 500 to convert a user design expressed, for example, as a Hardware Description Language (HDL) by a user, into the configuration information used to configure integrated circuit 108 (
Processing unit 202 executes synthesis phase 504 to convert the register transfer layer description of the user design into a set of logic gates. Processing unit 202 executes technology mapping phase 506 to map the set of logic gates into a set of atoms, which are irreducible constituents of the user design. The atoms may correspond to groups of logic gates and other components of the user design matching the capabilities of LEs 402 or other functional blocks of PLD 300. The user design may be converted into any number of different sets of atoms, depending upon the underlying hardware of integrated circuit 108 (
Processing unit 202 further executes cluster phase 508 to group related atoms together into clusters. Processing unit 202 also executes place phase 510 to assign clusters of atoms to locations on PLD 300. Processing unit 202 executes route phase 512 to determine a configuration of multiple configurable switching circuits of PLD 300 used to connect the atoms implementing the user design. Processing unit 202 executes delay annotator phase 514 to determine multiple signal delays, such as data delays, for the set of atoms and their associated connections in the configurable switching circuits by using a timing model of PLD 300. Processing unit 202 executes timing analysis phase 516 to determine whether the implementation of the user design in PLD 300 will meet multiple long-path and short-path timing constraints specified by the user via input device 208.
Processing unit 202 executes assembler phase 518 to generate the configuration information specifying the configuration of PLD 300 implementing the user design, including the configuration of each LE 402 used to implement the user design and the configuration of the configurable switching circuit used to connect the LEs 402. Processing unit 202 executes assembler phase 518 to write the configuration information to a configuration file, which can be stored within memory device 204 and can then be accessed by host computing device 102 (
Processing unit 202 estimates 604 multiple signal activities at all nodes of PLD 300. For example, processing unit 202 estimates 604 static probability at a node of PLD 300 by computing a ratio of a number of clock cycles, for a pre-determined time period, for which a signal output from the node has a pre-determined value, such as 1 or 0, to a total number of clock cycles measured during the pre-determined time period. A signal output from a node is referred to herein as an output signal, which is a data signal and not a clock signal. As an illustration, processing unit 202 calculates static probability of 25% of a signal output from a node of PLD 300 to indicate that the node or the output signal has a value of 1 for 25% of the pre-determined time period. As another example, processing unit 202 calculates static probability of 75% of a signal output from a node of PLD 300 to indicate that the node or the output signal has the pre-determined value of 1 for 75% of the pre-determined time period. The pre-determined time period and the pre-determined value are provided by the user via input device 208 (
As another example, processing unit 202 estimates 604 a toggle rate of a node of PLD 300 by calculating a ratio of a number of clock cycles during which a signal output from the node toggles, that is, transitions from one value, such as 1 or 0, to a different value, such as 0 or 1, to the total number of clock cycles. For instance, processing unit 202 determines that a signal that transitions from 1 to 0 and back from 0 to 1 during the pre-determined time period of, for example, one second has a toggle rate of 2 transitions per second. A toggle rate may be expressed in units of transitions per second.
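The two signal activity measures described above can be sketched in a few lines of Python. This is a minimal illustration, assuming one sample of the output signal per clock cycle; the function names and the sample waveforms are hypothetical and do not appear in the original description.

```python
from typing import Sequence


def static_probability(samples: Sequence[int], value: int = 1) -> float:
    """Fraction of sampled clock cycles in which the signal equals `value`."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(1 for s in samples if s == value) / len(samples)


def toggle_count(samples: Sequence[int]) -> int:
    """Number of transitions between consecutive samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def toggle_rate(samples: Sequence[int], period_seconds: float) -> float:
    """Transitions per second over the observation period."""
    if period_seconds <= 0:
        raise ValueError("period must be positive")
    return toggle_count(samples) / period_seconds


# Hypothetical waveform: the signal is 1 for 2 of 8 sampled cycles.
waveform = [1, 0, 0, 0, 1, 0, 0, 0]
print(static_probability(waveform))      # 0.25

# A signal that goes 1 -> 0 -> 1 within a one-second period.
print(toggle_rate([1, 0, 1], 1.0))       # 2.0
```

The first result corresponds to the 25% static probability illustration, and the second to the toggle rate of 2 transitions per second.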
As yet another example, processing unit 202 applies a vectorless signal activity estimation technique to estimate 604 signal activities at all nodes of PLD 300. Examples of vectorless estimation techniques are described in a patent application having application Ser. No. 11/414,933, inventors David Neto et al., filed on May 1, 2006, and titled “Method And Apparatus For Deriving Signal Activities For Power Analysis And Optimization”, another patent application having application Ser. No. 11/414,803, inventors David Neto et al., filed on May 1, 2006, and titled “Method And Apparatus For Deriving Signal Activities For Power Analysis And Optimization”, and yet another patent application having application Ser. No. 11/414,855, inventors David Neto et al., filed on May 1, 2006, and titled “Method And Apparatus For Deriving Signal Activities For Power Analysis And Optimization”.
Processing unit 202 determines 606 a node of interest of PLD 300 from all the nodes of PLD 300. For example, processing unit 202 determines that a register, I/O element 308, a RAM block, or DSP block 306 (
As another example, processing unit 202 determines that a node of PLD 300 that has a higher number of fanouts than each of the remaining nodes of PLD 300 is a node of interest. For instance, if an output of a node P of PLD 300 is directly connected to nodes Q and R and an output of a node S of PLD 300 is directly connected to only one node, such as node T, Q, or R, processing unit 202 determines that the node P has a higher number of fanouts than node S. It is noted that a direct connection is a connection that includes only a conductor, which is described below.
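The fanout comparison in the instance above can be sketched as follows. This is an illustrative sketch only, assuming the design is available as a mapping from each node to the nodes its output is directly connected to; the `netlist` representation and function names are hypothetical.

```python
from typing import Dict, List


def fanout_counts(netlist: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each node to the number of distinct nodes its output drives directly."""
    return {node: len(set(sinks)) for node, sinks in netlist.items()}


def highest_fanout_node(netlist: Dict[str, List[str]]) -> str:
    """Return the node with the largest fanout (first one wins on a tie)."""
    counts = fanout_counts(netlist)
    return max(counts, key=counts.get)


# Node P drives Q and R; node S drives only T (assumed here for the example).
netlist = {"P": ["Q", "R"], "S": ["T"], "Q": [], "R": [], "T": []}

print(fanout_counts(netlist)["P"])   # 2
print(fanout_counts(netlist)["S"])   # 1
print(highest_fanout_node(netlist))  # P
```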
As yet another example, processing unit 202 determines that a node of PLD 300 that has an output connected to a higher number of nodes of PLD 300 downstream from the node is a node of interest. In this example, the higher number of nodes is higher than a number of nodes connected to the remaining nodes of PLD 300. Moreover, in this example, the node of interest may have a higher number of layers of logic than that of the remaining nodes. For instance, if an output of a node A of PLD 300 is directly connected to a node B and connected to a node C via node B, and an output of a node D is directly connected to node C, node A is connected to a higher number of nodes downstream than a number of nodes connected downstream to node D. In this instance, node B, which receives a signal output from node A, is located downstream from node A. Moreover, in this instance, node C, which receives a signal output from node B, is located downstream from node B. Moreover, node C, which receives a signal output from node D, is located downstream from node D. Further, node C, which is affected by a signal output from node A, is located downstream from node A.
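Counting downstream nodes, as in the instance with nodes A, B, C, and D above, amounts to a reachability search over the connections. The following is a minimal sketch under the assumption that connections are stored as a node-to-sinks mapping; the data structure and function name are hypothetical.

```python
from typing import Dict, List, Set


def downstream_nodes(netlist: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable from the output of `start`."""
    seen: Set[str] = set()
    stack = list(netlist.get(start, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against feedback loops
        seen.add(node)
        stack.extend(netlist.get(node, []))
    return seen


# A drives B, B drives C, and D drives C directly.
netlist = {"A": ["B"], "B": ["C"], "D": ["C"], "C": []}

print(sorted(downstream_nodes(netlist, "A")))  # ['B', 'C']
print(sorted(downstream_nodes(netlist, "D")))  # ['C']
```

Node A has two downstream nodes and node D has one, so node A would be preferred under this criterion.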
As another example, processing unit 202 determines that a node of PLD 300 that has a number of fanouts greater than a pre-determined number of fanouts is a node of interest. The pre-determined number of fanouts is provided by the user via input device 208 (
As another example, processing unit 202 determines that a node of interest is a node of PLD 300 that has an output directly connected to a conductor, such as a wire, of the PLD 300 and the conductor has a capacitance higher than the remaining capacitances of the remaining conductors directly connected to the respective remaining outputs of the remaining nodes of the PLD 300. As still another example, processing unit 202 determines that a node of PLD 300 that is directly connected to a conductor having a capacitance that is greater than a pre-determined capacitance is a node of interest. The pre-determined capacitance is provided by the user via input device 208 (
As another example, processing unit 202 determines that a node of PLD 300 that has a lower depth than depths of the remaining nodes of the PLD 300 is a node of interest. For instance, a first node at an input of PLD 300 has a depth of 0, a second node directly connected to an output of the first node has a depth of 1, and a third node directly connected to an output of the second node has a depth of 2. As yet another example, processing unit 202 determines that a node of PLD 300 that has a depth, within PLD 300, greater than a pre-determined depth is a node of interest. The pre-determined depth is provided by the user via input device 208 (
As another example, processing unit 202 determines a discrepancy between a first signal activity estimated 604 at a node of PLD 300 and a second signal activity estimated at the node. In this example, the second signal activity is estimated at the node during an iteration preceding an iteration estimating 604 the first signal activity. Further, in this example, processing unit 202 determines whether the discrepancy is greater than a pre-determined discrepancy. Upon determining that the discrepancy is greater than the pre-determined discrepancy, processing unit 202 determines that the node is a node of interest. On the other hand, upon determining that the discrepancy is not greater than the pre-determined discrepancy, processing unit 202 determines that the node is not a node of interest. Moreover, in this example, the determination of the node of interest based on the pre-determined discrepancy helps reduce signal activity error by ensuring that highly volatile signals have sampled signal activities. The pre-determined discrepancy is provided by the user via input device 208 (
As yet another example, processing unit 202 determines that a node of interest is a node of PLD 300 that consumes a higher amount of power compared to power consumed by the remaining nodes of the PLD 300. For instance, processing unit 202 determines that a node of PLD 300 consumes the higher amount of power by determining that a result of multiplication of a toggle rate estimated 604 at the node with static probability estimated 604 at the node is greater than the remaining results of multiplications of the remaining toggle rates estimated 604 for the remaining nodes with the remaining static probabilities estimated 604 for the remaining nodes. In this example, this determination of the node of interest by using the higher amount of power helps reduce error in power estimation and provides better insight for power optimization.
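The ranking described above can be sketched as a sort on the product of toggle rate and static probability. This is only an illustration of the comparison as stated in the text, not a calibrated power model; the node names and activity values are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_by_power_proxy(activities: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order nodes from highest to lowest toggle_rate * static_probability.

    `activities` maps a node name to (toggle_rate, static_probability).
    """
    return sorted(
        activities,
        key=lambda node: activities[node][0] * activities[node][1],
        reverse=True,
    )


# Hypothetical estimates: (toggle rate, static probability) per node.
estimates = {
    "n1": (0.10, 0.50),
    "n2": (0.40, 0.25),
    "n3": (0.05, 0.90),
}

print(rank_by_power_proxy(estimates))  # ['n2', 'n1', 'n3']
```

The first entry of the returned list is the candidate node of interest under this criterion.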
Signals of higher interest, such as a signal output at a node of interest, should be prioritized so that higher priority signals are sampled first. To determine a signal of higher interest, it is determined, by using the techniques described herein, how sensitive a power estimate is to a change in signal activity at a node of PLD 300. A signal that has a high amount of sensitivity causes a large change in a power estimate and should be sampled early because the node from which the signal is output may have a large potential for error.
The determination 606 of a node of interest of PLD 300 is also a determination of a node of interest of integrated circuit 108.
Upon determining 606 a node of interest, processing unit 202 makes a determination to connect a signal activity circuit to an output of the node of interest. Upon receiving the determination to connect a signal activity circuit to an output of a node of interest, the signal activity circuit is physically connected 608 to an output of a node of interest of integrated circuit 108 (
In an alternative embodiment, processing unit 202 estimates or determines a signal activity at a node of PLD 300 based on static probability at the node and not on a toggle rate at the node. In yet another alternative embodiment, processing unit 202 estimates or determines a signal activity of a node of PLD 300 based on a toggle rate at the node and not on static probability at the node. In an alternative embodiment, processing unit 202 estimates multiple signal activities at multiple, not all, nodes of PLD 300.
In various embodiments, processing unit 202 determines any number of nodes of interest from all or multiple nodes of PLD 300. For example, processing unit 202 determines multiple nodes of interest based on the pre-determined number of fanouts, the pre-determined number of downstream nodes, the pre-determined capacitance, the pre-determined depth, or the pre-determined discrepancy.
It is noted that in another embodiment, a plurality of signal activity circuits are connected to a plurality of nodes of interest, and the number of signal activity circuits determined to be connected depends on the real estate area on board 104.
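Selecting multiple nodes of interest from user-provided thresholds can be sketched as a simple filter. The record layout, field names, and values below are assumptions made for illustration; the sketch treats a node as a node of interest if it exceeds any one of the thresholds that the user has supplied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeInfo:
    name: str
    fanout: int
    downstream: int
    capacitance: float   # arbitrary units, hypothetical
    depth: int
    discrepancy: float


@dataclass
class Thresholds:
    # None means the user did not supply that threshold.
    fanout: Optional[int] = None
    downstream: Optional[int] = None
    capacitance: Optional[float] = None
    depth: Optional[int] = None
    discrepancy: Optional[float] = None


def nodes_of_interest(nodes: List[NodeInfo], t: Thresholds) -> List[str]:
    """Return names of nodes that exceed at least one supplied threshold."""
    def exceeds(value, limit) -> bool:
        return limit is not None and value > limit

    return [
        n.name
        for n in nodes
        if exceeds(n.fanout, t.fanout)
        or exceeds(n.downstream, t.downstream)
        or exceeds(n.capacitance, t.capacitance)
        or exceeds(n.depth, t.depth)
        or exceeds(n.discrepancy, t.discrepancy)
    ]


nodes = [
    NodeInfo("u1", fanout=8, downstream=20, capacitance=1.5, depth=2, discrepancy=0.01),
    NodeInfo("u2", fanout=2, downstream=3, capacitance=0.2, depth=1, discrepancy=0.30),
    NodeInfo("u3", fanout=1, downstream=1, capacitance=0.1, depth=0, discrepancy=0.02),
]

print(nodes_of_interest(nodes, Thresholds(fanout=4, discrepancy=0.1)))  # ['u1', 'u2']
```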
Glitching occurs at a time an output signal of a node of interest toggles more than once per clock cycle of a clock signal provided by a clock tree, such as clock tree 316 (
Oscillator 802 oscillates to generate high frequency clock signal 814 that has a frequency higher than that of a clock signal applied via clock tree 316 (
Sampling counter 810 counts a number of clock cycles of high frequency clock signal 812 for the pre-determined time period. Synchronizer 806 receives high frequency clock signal 812 from PLL 804, receives an output signal 816 from a node of integrated circuit 108 (
Static probability counter 808 receives synchronized output signal 818 and high frequency clock signal 812, and counts, for the pre-determined time period, a number of clock cycles of high frequency clock signal 812 during which synchronized output signal 818 has the pre-determined value. For example, static probability counter 808 counts that, out of M clock cycles generated during the pre-determined time period, synchronized output signal 818 has the pre-determined value of 1 for N clock cycles and outputs a count of N, where N and M are integers greater than 0, and N is less than or equal to M. In this example, sampling counter 810 counts the M clock cycles.
Flip-flop 908 receives synchronized output signal 922 to output a preceding sample 924. Preceding sample 924 precedes a current sample 926 of synchronized output signal 922 by one clock cycle of high frequency clock signal 918. XOR gate 910 performs an exclusive OR operation on preceding sample 924 and current sample 926 to output an assertion, such as a value of 1, during a clock cycle of high frequency clock signal 918 in which the preceding and current samples 924 and 926 are not matched and to output a deassertion, such as a value of 0, during a clock cycle of high frequency clock signal 918 in which the preceding and current samples 924 and 926 are the same.
Toggle rate counter 914 receives any assertion and/or deassertion from XOR gate 910 and counts, for the pre-determined time period, a number of clock cycles of high frequency clock signal 918 during which the assertions are received to output a count of the assertions.
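The flip-flop, XOR gate, and toggle rate counter described above can be modeled cycle by cycle in software. This is a behavioral sketch only, assuming one synchronized sample per cycle of the high frequency clock and a flip-flop reset value of 0; the function name and sample sequence are hypothetical.

```python
from typing import Iterable


def count_toggles_xor(synchronized_samples: Iterable[int], initial: int = 0) -> int:
    """Behavioral model of the flip-flop / XOR / counter chain.

    Each sample must be 0 or 1. `initial` is the assumed reset value
    held by the flip-flop before the first sample arrives.
    """
    preceding = initial            # flip-flop output (preceding sample)
    count = 0                      # toggle rate counter
    for current in synchronized_samples:
        assertion = preceding ^ current   # XOR: 1 when the samples differ
        count += assertion                # count only the assertions
        preceding = current               # flip-flop captures the current sample
    return count


# Hypothetical samples taken at the high frequency clock.
samples = [0, 1, 0, 1, 1, 0, 0, 0]
print(count_toggles_xor(samples))  # 4
```

If, for example, four high frequency cycles fit in one cycle of the design clock, the first four samples contain three transitions within a single design clock cycle, which is the kind of glitching that sampling at the higher frequency is able to capture.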
Signal activity circuit 1002 includes oscillator 1004, PLL 1010, sampling counter 1006, a synchronizer 1012, a static probability counter 1014, another flip flop 1016, an XOR gate 1018, and a toggle rate counter 1020.
It is noted that a single oscillator 1004, PLL 1010, sampling counter 1006, and synchronizer 1008 is used with both static probability counter 808 and toggle rate counter 914. Each of oscillator 1004, PLL 1010, sampling counter 1006, and synchronizer 1008 is shared by toggle rate counter 914 and static probability counter 808. A high frequency clock signal 1022 generated by oscillator 1004 may be high frequency clock signal 814 (
An output signal 1026 received by synchronizer 1008 may be output signal 816 (
Synchronizer 1012 synchronizes an output signal 1029, output by a node of integrated circuit 108 (
Moreover, toggle rate counter 1020 counts, for the pre-determined time period, a number of clock cycles of high frequency clock signal 1024 during which synchronized clock signal 1030 toggles from one value to another value. It is noted that oscillator 1004, PLL 1010, high frequency clock signal 1024, and sampling counter 1006 are shared by signal activity circuits 1000 and 1002. For example, oscillator 1004, PLL 1010, high frequency clock signal 1024, and sampling counter 1006 are shared by toggle rate counters 914 and 1020 and by static probability counters 808 and 1014.
In an alternative embodiment, signal activity circuits 800, 900, 1000, and 1002 (
Referring back to
Signal activity determination 610 is made upon connecting a signal activity circuit, such as signal activity circuit 800 (
Processing unit 202 determines 612 whether to reiterate performing techniques similar to techniques 608 and 610. For example, processing unit 202 determines that signal activities at all nodes of integrated circuit 108 (
As another example, processing unit 202 determines that signal activities at all nodes of interest of PLD 300 are determined 610 and determines not to reiterate. On the other hand, upon determining, by processing unit 202, that signal activities at all nodes of interest of PLD 300 are not determined 610, processing unit 202 determines to reiterate.
As yet another example, processing unit 202 determines a difference between a first amount of power calculated at a node of interest from signal activity estimated 604 and a second amount of power calculated at the node of interest from signal activity that is determined at technique 610. Processing unit 202 determines the second amount of power from signal activity that is determined 610 in the same manner in which processing unit 202 determines the first amount of power from signal activity estimated at technique 604. Upon determining that the difference is not greater than a first pre-determined difference, processing unit 202 determines 612 not to reiterate. On the other hand, upon determining, by processing unit 202, that the difference is greater than the first pre-determined difference, processing unit 202 determines 612 to reiterate.
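The power-based reiteration test above reduces to a threshold comparison. The sketch below assumes the difference is taken as an absolute value, which the description does not state explicitly; the function name, parameter names, and numeric values are hypothetical.

```python
def should_reiterate(estimated_power: float,
                     measured_power: float,
                     max_difference: float) -> bool:
    """Return True when the estimate and the measurement disagree too much.

    Assumption: the difference is compared as an absolute value.
    """
    return abs(estimated_power - measured_power) > max_difference


# Hypothetical power values in arbitrary units.
print(should_reiterate(1.0, 1.5, 0.2))   # True: difference 0.5 exceeds 0.2
print(should_reiterate(1.0, 1.1, 0.2))   # False: difference is within 0.2
```

A difference exactly equal to the threshold does not trigger another iteration, matching the "not greater than" wording above.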
As another example, processing unit 202 determines a difference between signal activity, such as a toggle rate or static probability, estimated 604 at a node of interest of PLD 300 and signal activity, such as a toggle rate or static probability, determined 610 at the node of interest of integrated circuit 108, determines whether the difference is not greater than a second pre-determined difference, and if so, determines not to reiterate. On the other hand, upon determining, by processing unit 202, that the difference is greater than the second pre-determined difference, processing unit 202 determines to reiterate. Processing unit 202 receives the first pre-determined difference and/or the second pre-determined difference from the user via input device 208 (
Upon determining 612 not to reiterate, processing unit 202 ends technique 600. On the other hand, upon determining 612 to reiterate, processing unit 202 makes a determination to connect a signal activity circuit to the remaining nodes, of integrated circuit 108, other than a node of interest determined at technique 610 (
Referring to
Processing unit 202 determines 704 signal activity at the remaining nodes upon receiving a count from static probability counter 808 (
Processing unit 202 may determine to optimize power consumption by a node of PLD 300 based on signal activities determined at 610 and 704. For example, upon determining that a static probability at a node of interest is high, such as greater than 0.5, processing unit 202 determines to place, during place phase 510 (
Technical effects of the herein described systems and techniques for performing signal activity extraction include determining signal activity at a node of interest efficiently and/or by considering glitching. For example, a signal activity is efficiently determined by determining a node of interest and/or by determining signal activity, first, at the node of interest compared to determining signal activities at the remaining nodes of integrated circuit 108.
Although the foregoing systems and techniques have been described in detail by way of illustration and example for purposes of clarity and understanding, it will be recognized that the above described systems and techniques may be embodied in numerous other specific variations and embodiments without departing from the spirit or essential characteristics of the systems and techniques. Certain changes and modifications may be practiced, and it is understood that the systems and techniques are not to be limited by the foregoing details, but rather are to be defined by the scope of the appended claims.
Entry |
---|
U.S. Appl. No. 11/414,933, filed May 1, 2006, entitled “Method and Apparatus for Deriving Signal Activities for Power Analysis and Optimization”, by David Neto et al. |
U.S. Appl. No. 11/414,803, filed May 1, 2006, entitled “Method and Apparatus for Deriving Signal Activities for Power Analysis and Optimization”, by David Neto et al. |
U.S. Appl. No. 11/414,855, filed May 1, 2006, entitled “Method and Apparatus for Deriving Signal Activities for Power Analysis and Optimization”, by David Neto et al. |