Increasing demand for computer system scalability (i.e., consistent price and performance and higher processor counts) combined with increases in performance of individual components continues to drive systems manufacturers to optimize core system architectures. One such systems manufacturer has introduced a server system that meets these demands for scalability with a family of application specific integrated circuits (“ASICs”) that provide scalability to tens or hundreds of processors, while maintaining a high degree of performance, reliability, and efficiency. The key ASIC in this system architecture is a cell controller (“CC”), which is a processor-I/O-memory interconnect and is responsible for communications and data transfers, cache coherency, and for providing an interface to other hierarchies of the memory subsystem.
In general, the CC comprises several major functional units, including one or more processor interfaces, memory units, I/O controllers, and external crossbar interfaces all interconnected via a central data path (“CDP”). Internal signals from these units are collected on a performance monitor bus (“PMB”). One or more specialized performance counters, or performance monitors, are connected to the PMB and are useful in collecting data from the PMB for use in debugging and assessing the performance of the system of which the CC is a part. Currently, each of the performance counters is capable of collecting data from only one preselected portion of the PMB, such that the combination of all of the performance counters together can collect all of the data on the PMB. While this arrangement is useful in some situations, there are many situations in which it would be advantageous for more than one of the performance counters to access data from the same portion of the PMB. Additionally, it would be advantageous to be able to use the performance counters in conjunction with data collected in a clock domain different from that associated with the performance counter.
In one embodiment, a system for validating data collected in a first clock domain is disclosed. A performance counter is disposed in a second clock domain to perform performance computations relative to the data. Validation circuitry is in communication with the data in order to provide to the performance counter a validation signal indicative of the validity of the data.
In the drawings, like or similar elements are designated with identical reference numerals throughout the several views thereof, and the various elements depicted are not necessarily drawn to scale.
Further, in the illustrated embodiment, the system state space 100 is disposed in a first clock domain and the performance counters 106(1)-106(M) are disposed in a second clock domain. As will be described in further detail hereinbelow, validation circuitry, which may be integrated with the data collection and selection circuitry 102 at the interface of the first clock domain and the second clock domain, forwards the data collected in system state space 100 along with a validation signal to one or more of the performance counters 106(1)-106(M). The validation signal indicates the validity of the data and may inform the performance counters 106(1)-106(M) of invalid cycles that contain no data or duplicate data, for example, especially where the two clock domains are clocked at different frequencies.
In general, the AND/OR circuit 201 enables access to any and all of the bits of the debug_bus signal coming into the performance counter 200 from the observability bus 104, which depending on the configuration of the validation circuitry, may collect data from either the clock domain of the performance counter 200 or another clock domain. In one embodiment, as illustrated in
The match/threshold circuit 202 receives inputs from the sm_sel circuit 204 and szero circuit 206 in addition to a mmask [15:0] input. When the match/threshold circuit 202 is operating in “match” mode, a portion of the circuit activates a match_thresh_event signal to the AND/OR circuit 201 when an N-bit portion of the debug_bus signal, selected as described in greater detail below with reference to the sm_sel circuit 204 and the szero circuit 206, matches an N-bit threshold for all bits selected by a match mask (“mmask”). In particular, for all bits of the selected N-bit debug bus signal portion that are “don't cares”, the corresponding bit of mmask will be set to 0; conversely, for all bits of the selected N-bit debug bus signal portion that are not “don't cares”, the corresponding bit of mmask will be set to 1. The match_thresh_event signal is one of the two bits appended to the debug_bus signal. In the illustrated embodiment, N is equal to 16.
When the match/threshold circuit 202 is operating in “threshold” mode, a portion of the circuit 202 activates the match_thresh_event signal to the AND/OR circuit 201 when an S-bit portion of the debug_bus signal selected and zeroed as described in greater detail below with reference to the sm_sel circuit 204 and the szero circuit 206 is equal to or greater than the threshold. In the illustrated embodiment, S is equal to N/2, or 8.
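By way of illustration only, the two comparison modes may be modeled in software. The following is a minimal Python sketch, not a description of the circuit itself; the function names match_event and threshold_event are hypothetical, and N = 16 and S = 8 are taken from the illustrated embodiment.

```python
N_BITS = 16  # width of the match compare in the illustrated embodiment
S_BITS = 8   # width of the threshold compare in the illustrated embodiment


def match_event(selected: int, threshold: int, mmask: int) -> bool:
    """Match mode: compare only the bits whose mmask bit is 1.

    Bits whose mmask bit is 0 are treated as "don't cares".
    """
    care = mmask & ((1 << N_BITS) - 1)
    return (selected & care) == (threshold & care)


def threshold_event(selected: int, threshold: int) -> bool:
    """Threshold mode: true when the S-bit value is >= the threshold."""
    s_mask = (1 << S_BITS) - 1
    return (selected & s_mask) >= (threshold & s_mask)
```

In this sketch an mmask of all zeros makes every bit a “don't care”, so match_event is true for any input, which follows from the mask definition given above.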
The sm_sel circuit 204 selects an N-bit portion of the debug_bus signal aligned on a selected 10-bit block boundary into both the match portion and the threshold portion of the match/threshold circuit 202 and to a sum input of the counter circuit 208. As previously stated, in the illustrated embodiment, N is equal to 16. The szero circuit 206 zeroes out none through all but one of S bits aligned on a selected 10-bit block boundary into the threshold portion of the match/threshold circuit 202 and the sum input of the counter circuit 208. In the illustrated embodiment, S is equal to eight. The selected 10-bit block boundary is identified by the value of a three-bit control signal sm_sel input to the sm_sel circuit 204.
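The selection and zeroing steps may be sketched in Python as follows. This is an illustrative model under stated assumptions: an 80-bit bus of eight 10-bit blocks, a 16-bit window that wraps around to bit 0 when it runs past the top of the bus (the wrap behavior is assumed, not stated above), and a hypothetical szero count that zeroes the most significant bits of the 8-bit sum. The function names are likewise hypothetical.

```python
BUS_BITS = 80     # assumed width: eight blocks of 10 bits
BLOCK_BITS = 10
N_BITS = 16
S_BITS = 8


def sm_sel_select(debug_bus: int, sm_sel: int) -> int:
    """Return the 16-bit window starting at 10-bit block boundary sm_sel (0..7).

    Assumption: the window wraps around to bit 0 past the top of the bus.
    """
    if not 0 <= sm_sel <= 7:
        raise ValueError("sm_sel is a three-bit control value")
    offset = sm_sel * BLOCK_BITS
    bus = debug_bus & ((1 << BUS_BITS) - 1)
    rotated = (bus >> offset) | (bus << (BUS_BITS - offset))
    return rotated & ((1 << N_BITS) - 1)


def szero_apply(selected: int, szero: int) -> int:
    """Return sum[7:0] with the top `szero` bits (0..7) forced to zero.

    Assumption: zeroing starts from the most significant of the S bits.
    """
    if not 0 <= szero <= S_BITS - 1:
        raise ValueError("szero may zero none through all but one of S bits")
    low = selected & ((1 << S_BITS) - 1)
    return low & ((1 << (S_BITS - szero)) - 1)
```

For example, with only block 2 of the bus set to all ones, sm_sel_select(bus, 2) yields 0x03FF, and szero_apply of that value with a count of 4 yields 0x0F.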
The operation of counter circuit 208 is enabled by setting a counter enable signal B, which comprises one input of a three-input AND gate (not illustrated in this FIG.) associated with the counter circuit 208. The second input of the AND gate comprises a validation signal (valid_cycle) that, in one embodiment, discriminates against invalid cycles of a third input signal (i.e., an inc signal) to the AND gate. In general, the counter circuit 208 is an X bit counter that can hold, increment by one, add S bits, clear or load a value. In one embodiment, the performance counter 200 is 48 bits plus overflow that provides a general purpose counter in that it looks at all D bits of the debug_bus signal for an event mask plus two extra events, eight separate selections of 16 bits for the match compare operation and eight separate selections of eight bits for the threshold compare and the accumulate operations. The eight bits for the threshold compare and the accumulate operations are the bottom eight bits of the 16 bits selected for the match compare operation.
As will be discussed in greater detail hereinbelow, the operation of the edge detect circuit 210 is controlled by the inc_raw signal, a valid_cycle signal, and an edge_op signal. Specifically, when the edge detect circuit 210 is operational, the number of times an event begins is detected and driven to the counter circuit as an inc signal. On the other hand, when the edge detect circuit 210 is nonoperational, the inc_raw signal, which is representative of the event itself, is driven to the counter circuit as the inc signal. The validation signal (valid_cycle) received by the edge detect circuit 210 discriminates against invalid cycles of the inc_raw signal.
Similar to the edge detect circuit 210, the operation of the min/max circuit 212 is controlled by the inc_raw signal and the valid_cycle signal. The min/max circuit 212 forwards a duration signal, i.e., duration_end_ff signal, to the counter circuit 208 that counts the minimum or maximum time an event persists. As will be explained further in
The synchronizer controller 304 drives a control signal to the synchronizer 302 to effectuate the transfer of data across the clock domain interface from the debug_bus_core, which is driven at a core clock rate, to another data signal, debug_bus_core_link, which is driven at a link clock rate. The debug_bus_core_link and the debug_bus_link, which may originate in the link clock domain, provide inputs for a multiplexer (MUX) circuit block 306 comprising a number of MUXes, each of which operates under the control of a MUXSEL signal that may be supplied by a control status register (CSR) (not shown). In one embodiment, if the MUXSEL signal is asserted, then the debug_bus_core_link signal is selected. Otherwise, the debug_bus_link signal is selected. The intermixed data signal output from the MUX block 306 is designated as debug_bus, which can include some groups of data from the debug_bus_core_link signal and some groups of data from the debug_bus_link signal. In one implementation, as explained generally hereinabove and in particular detail in U.S. patent application Ser. No. 10/635,083 (U.S. Pat. No. 7,424,397), filed Aug. 6, 2003 entitled “GENERAL PURPOSE PERFORMANCE COUNTER”, cross-referenced hereinabove, the debug_bus_core_link and debug_bus_link signals comprise 80-bit data signals each, with 8 groups or blocks of 10 bits apiece, wherein the MUX block 306 comprises eight 2-input MUXes for intermixing the data on a block-by-block basis. With this arrangement, the performance monitoring system described herein may perform performance calculations on data collected in the same domain as that of the performance counter 200 or data collected in a domain different from the clock domain of the performance counter 200, even where the debug data is intermixed from different domains.
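The block-by-block intermixing performed by the MUX block 306 may be illustrated with a short Python model. This is a sketch only; it assumes that the per-MUX MUXSEL controls are packed into a single eight-bit value with bit i steering block i, and the function name mux_block is hypothetical.

```python
BLOCKS = 8
BLOCK_BITS = 10


def mux_block(debug_bus_core_link: int, debug_bus_link: int, muxsel: int) -> int:
    """Build the 80-bit debug_bus from eight 2-input selections.

    Assumption: bit i of `muxsel` controls block i; 1 selects the
    debug_bus_core_link block, 0 selects the debug_bus_link block.
    """
    debug_bus = 0
    for i in range(BLOCKS):
        block_mask = ((1 << BLOCK_BITS) - 1) << (i * BLOCK_BITS)
        source = debug_bus_core_link if (muxsel >> i) & 1 else debug_bus_link
        debug_bus |= source & block_mask
    return debug_bus
```

With debug_bus_core_link all ones, debug_bus_link all zeros, and muxsel equal to 0b00000101, only blocks 0 and 2 of the result carry core-domain data.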
Continuing to refer to
The c21_valid_ff_delayed signal is driven to a logic block 310 which comprises an OR gate 312 coupled to an inverter 314 and an AND gate 316. Additionally, a mode control signal called core_mode is driven to the logic block 310, which core_mode signal may be provided via a CSR (not shown). The core_mode signal indicates whether the observability bus data being utilized is obtained from the same link clock domain as performance counter 200 or a clock domain different from the performance counter 200, e.g., the core clock domain. The c21_valid_ff signal or its delayed counterpart signal is utilized in the latter case when the observability bus data being utilized is from a different clock domain. Essentially, the c21_valid_ff signal (or its delayed counterpart) is asserted (e.g., active high) when the data is valid and de-asserted when the data is invalid (which may arise due to the dead cycles between the different clock domains).
In an implementation where data is gathered from the debug_bus_link, the MUXSEL signal is de-asserted such that the MUX block 306 is operable to select the debug_bus_link data, on a block-by-block basis. Additionally, the core_mode signal is driven to the inverter 314, which inverts the core_mode signal and drives the inverted core_mode signal to the OR gate 312. In this case, regardless of the output of the AND gate 316, the OR gate 312 asserts the valid_cycle signal as an active high signal to indicate that the data is valid.
On the other hand, in an implementation where data is gathered from the debug_bus_core_link, the MUXSEL signal is asserted thereby causing the MUX block 306 to select the debug_bus_core_link data. Moreover, the core_mode signal is driven to the AND gate 316 along with the c21_valid_ff_delayed signal. The AND gate 316 drives a valid_and signal high to the OR gate 312 only when the core_mode signal and c21_valid_ff signal are both asserted (e.g., active high). Responsive to the valid_and signal, the OR gate 312 asserts the valid_cycle signal.
When considering both implementations, the valid_cycle signal is asserted high when 1) the core_mode is de-asserted or 2) the core_mode is asserted and the c21_valid_ff is asserted. Accordingly, the valid_cycle signal is asserted (i.e., high) when the observability bus is in the same domain as the performance counter 200 or the observability bus is in a different domain from the performance counter 200 and the synchronizer controller indicates that the data is valid. As will be explained in more detail hereinbelow, the valid_cycle signal, i.e., the validation signal, indicates the validity of the incoming performance data associated with the selected observability bus. As previously observed, the data on the invalid cycles could be a repeat of previous or future cycles, or could be zeroed data for invalid cycles. Regardless, the data on the invalid cycles is data that may adversely affect the performance calculations. In one embodiment, the validation signal disables the performance counter 200 on invalid clock cycles by causing particular advanced features of the performance counter 200 to ignore the invalid cycles.
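The combinational behavior of the logic block 310 may be summarized by a small Python model. This is a sketch for illustration; the function name valid_cycle and the use of Boolean arguments are assumptions, while the gate structure follows the description of the inverter 314, the AND gate 316, and the OR gate 312.

```python
def valid_cycle(core_mode: bool, c21_valid_ff: bool) -> bool:
    """Model of logic block 310.

    core_mode     -- True when the selected data crossed from another domain
    c21_valid_ff  -- True when the synchronizer marks the cycle as valid
    """
    valid_and = core_mode and c21_valid_ff   # AND gate 316
    inverted_core_mode = not core_mode       # inverter 314
    return inverted_core_mode or valid_and   # OR gate 312


# Truth table: (core_mode, c21_valid_ff) -> valid_cycle
TRUTH_TABLE = {
    (False, False): True,   # same domain: every cycle is valid
    (False, True): True,
    (True, False): False,   # cross-domain dead cycle
    (True, True): True,     # cross-domain valid cycle
}
```

The only combination that de-asserts valid_cycle in this model is cross-domain operation on a cycle that the synchronizer has not marked valid.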
As discussed, the edge detect circuit portion of the circuit 400 can operate in an active mode or in an inactive mode, wherein the edge_op control signal of the multiplexer 408 determines the mode of operation. If the edge_op signal is de-asserted, then the edge detect circuit is in an inactive mode and the inc_raw signal is selected and asserted as the inc signal. In an active (i.e., operational) mode, the edge_op signal is asserted active high and the inc_and signal is selected to be asserted as the inc signal. More specifically, an edge is detected when the inc_raw signal is asserted active high, the valid_cycle signal is asserted active high, and the inc_hold_ff signal is de-asserted. Hence, an edge is detected during an asserted active high valid cycle which follows a de-asserted cycle. It should be appreciated that during an invalid cycle, i.e., valid_cycle is a logic 0, the register 404 holds the inc_raw signal until a valid cycle is detected.
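A cycle-by-cycle software model may help to illustrate the edge detect behavior. The Python sketch below is an approximation under an assumption: register 404 is modeled as capturing inc_raw on valid cycles and retaining its previous value on invalid cycles. The class name EdgeDetect and its step method are hypothetical.

```python
class EdgeDetect:
    """Illustrative model of the edge detect portion of circuit 400."""

    def __init__(self) -> None:
        self.inc_hold_ff = False  # register 404 (assumed to reset low)

    def step(self, inc_raw: bool, valid_cycle: bool, edge_op: bool) -> bool:
        """Advance one clock cycle and return the inc signal."""
        # An edge is a valid, asserted inc_raw that follows a de-asserted one.
        inc_and = inc_raw and valid_cycle and not self.inc_hold_ff
        inc = inc_and if edge_op else inc_raw  # multiplexer 408
        # Assumption: the register only updates on valid cycles.
        if valid_cycle:
            self.inc_hold_ff = inc_raw
        return inc
```

In this model an event that stays asserted across an invalid cycle is counted once, because the held value masks the second apparent rising edge.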
With respect to the min/max portion of the circuit 400, the valid_cycle signal, the inc_hold_ff signal and an inverted inc_raw signal are provided as inputs to the AND gate 410 which asserts a duration_end_ff signal that is held in register 412 before being driven to the counter circuit 208. Hence, the end of an event's duration occurs when a de-asserted valid cycle follows an asserted valid cycle. It should be appreciated that with respect to both the edge detect and min/max portions of circuit 400, the valid_cycle signal indicates the validity of the data associated with the inc_raw signal. If the valid_cycle signal is de-asserted, then the respective circuits ignore the data associated with the inc_raw signal. On the other hand, if the valid_cycle signal is asserted active high, then the circuit portions effectuate the respective counterpart circuits described in the following patent applications: “EDGE DETECT CIRCUIT FOR PERFORMANCE COUNTER,” U.S. Patent Publication No. 2005/0283669, in the names of Richard W. Adkisson and Tyler J. Johnson (hereinafter the “Edge Detect Circuit application”); and “DURATION MINIMUM AND MAXIMUM CIRCUIT FOR PERFORMANCE COUNTER,” U.S. Patent Publication No. 2005/0283677, in the names of Richard W. Adkisson and Tyler J. Johnson (hereinafter the “MIN/MAX Circuit application”), both of which are hereby incorporated by reference in their entirety for all purposes. Further information regarding the operation of the edge detect circuit and the min/max circuit may therefore be found in these applications.
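The end-of-duration detection may likewise be sketched in Python. This model is illustrative only and rests on two assumptions: the inc_hold_ff register retains its value on invalid cycles, and register 412 introduces a one-cycle delay before duration_end_ff reaches the counter circuit 208. The class name DurationEnd is hypothetical.

```python
class DurationEnd:
    """Illustrative model of the duration-end portion of circuit 400."""

    def __init__(self) -> None:
        self.inc_hold_ff = False      # last inc_raw seen on a valid cycle
        self.duration_end_ff = False  # register 412

    def step(self, inc_raw: bool, valid_cycle: bool) -> bool:
        """Advance one clock cycle; return duration_end_ff as seen this cycle."""
        seen_by_counter = self.duration_end_ff
        # AND gate 410: valid cycle, event was active, event is now inactive.
        self.duration_end_ff = valid_cycle and self.inc_hold_ff and not inc_raw
        if valid_cycle:
            self.inc_hold_ff = inc_raw
        return seen_by_counter
```

A de-asserted inc_raw that appears only on invalid cycles does not end the duration in this model; the end is reported one cycle after the first valid cycle on which inc_raw is low.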
Additionally, in the context of performing computations on data obtained across clock domain interfaces, the operation is set forth as follows. When the counter circuit 208 is enabled, a valid_cycle signal is asserted, and the validated inc signal is activated, a logic one is output from the AND gate 514. In any other case, the output of the AND gate 514 will be a logic zero. The output of the AND gate 514 is replicated by an 8× replicator 516 and the resulting 8-bit signal is bit-wise ANDed with an 8-bit signal output from a MUX circuit 518. The inputs to the MUX circuit 518 are the sum[7:0] signal output from the szero circuit 206 and an 8-bit signal the value of which is [00000001]. The sum[7:0] signal will be output from the MUX circuit 518 when the acc signal is activated; otherwise, the [00000001] signal will be output from the MUX circuit.
An AND circuit, represented by an AND gate 520, bit-wise ANDs the signals output from the replicator 516 and from the MUX circuit 518. The resulting 8-bit signal is input to a register 522. An adder 524 adds the 8-bit signal stored in the register 522 to the 48-bit sum stored in the count value register 512. The new sum output from the adder 524 is input to a MUX circuit 526 that is connected to receive two other inputs: a logic zero and a csr_write_value, respectively. When a csr_write signal is enabled and the MUX circuit 526 is activated, the value of csr_write_value is output from the MUX circuit 526 and written to the count value register 512. In this manner, a value can be selectively loaded into the count value register 512.
Similarly, when a clear signal is asserted, 48 zero bits are output from the MUX circuit 526 to the count value register 512, thereby clearing the register. The generation of the clear signal involves an OR gate 528 having a max_op signal and a min_op signal as inputs that represent the maximum duration mode and minimum duration mode of the min/max circuit 212, respectively. If either operational mode is activated, an op signal is driven to AND gate 530 which also receives the duration_end_ff signal from the min/max circuit 212. If both the op signal and duration_end_ff are asserted active high then a clear_counter_2 signal is driven to an OR gate 532 which also receives a clear_counter_1 signal. Hence, if either the clear_counter_1 signal or the clear_counter_2 signal is asserted active high, then the clear signal is driven to MUX circuit 526 as discussed hereinabove.
If neither the csr_write signal nor the clear signal is asserted and the acc signal is asserted, the output of the adder 524 is written to the count value register 512, thereby effectively adding S bits (i.e., the value of the sum[7:0] signal) to the previous value of the count value register 512. Not enabling the counter circuit 208 results in the count value register 512 being held at its current value. Finally, to increment the value of the count value register 512 by one, the counter circuit 208 must be enabled, the inc signal must be asserted, and the acc signal must not be asserted.
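The hold, increment, accumulate, clear, and load operations of the counter circuit 208 may be drawn together in a behavioral Python sketch. The model is simplified and makes several assumptions that are not stated above: the pipeline delay of register 522 is omitted, csr_write is given priority over clear, and the overflow indication is modeled as a sticky flag. The class name CounterModel is hypothetical.

```python
MASK48 = (1 << 48) - 1


class CounterModel:
    """Behavioral sketch of counter circuit 208 (pipeline delay omitted)."""

    def __init__(self) -> None:
        self.count = 0         # count value register 512
        self.overflow = False  # assumed sticky overflow flag

    def step(self, enable: bool, valid_cycle: bool, inc: bool, acc: bool,
             sum8: int = 0, csr_write: bool = False,
             csr_write_value: int = 0, clear: bool = False) -> int:
        """Advance one clock cycle and return the new count."""
        if csr_write:          # MUX 526: load (priority is an assumption)
            self.count = csr_write_value & MASK48
            return self.count
        if clear:              # MUX 526: 48 zero bits
            self.count = 0
            return self.count
        gate = enable and valid_cycle and inc   # three-input AND gate 514
        selected = (sum8 & 0xFF) if acc else 1  # MUX 518
        addend = selected if gate else 0        # replicator 516 and AND 520
        total = self.count + addend             # adder 524
        self.overflow = self.overflow or total > MASK48
        self.count = total & MASK48
        return self.count
```

In this model an invalid cycle (valid_cycle low) forces the addend to zero, so the count is held exactly as it is when the counter is not enabled.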
An implementation of the invention described herein thus provides for a general purpose performance counter that may be utilized to validate data collected in a clock domain different from that of the performance counter or to validate data collected from the clock domain of the performance counter. The embodiments shown and described have been characterized as being illustrative only; it should therefore be readily understood that various changes and modifications could be made therein without departing from the scope of the present invention as set forth in the following claims. For example, while the embodiments are described with reference to an ASIC, it will be appreciated that the embodiments may be implemented in other types of ICs, such as custom chipsets, Field Programmable Gate Arrays (“FPGAs”), programmable logic devices (“PLDs”), generic array logic (“GAL”) modules, and the like. Furthermore, while the embodiments shown are implemented using CSRs, it will be appreciated that control signals may also be applied in a variety of other manners, including, for example, directly or may be applied via scan registers or Model Specific Registers (“MSRs”). In addition, the various logic modules may be realized in any known or heretofore unknown hardware implementations where certain signal logic levels or their complements are utilized appropriately. Accordingly, all such modifications, extensions, variations, amendments, additions, deletions, combinations, and the like are deemed to be within the ambit of the present invention whose scope is defined solely by the claims set forth hereinbelow.
This nonprovisional application claims priority based upon the following prior United States provisional patent application entitled: “PERFORMANCE MONITORING SYSTEM,” Application No. 60/576,764, filed Jun. 3, 2004, in the name(s) of: Richard W. Adkisson and Tyler J. Johnson, which is hereby incorporated by reference. This application is related to U.S. patent application Ser. No. 11/021,259 (U.S. Patent Publication No. 2005/0283677), filed Dec. 23, 2004 entitled “DURATION MINIMUM AND MAXIMUM CIRCUIT FOR PERFORMANCE COUNTER”; U.S. patent application Ser. No. 11/022,023 (U.S. Pat. No. 7,346,824), filed Dec. 23, 2004 entitled “MATCH CIRCUIT FOR PERFORMING PATTERN RECOGNITION IN A PERFORMANCE COUNTER”; U.S. patent application Ser. No. 11/022,021 (U.S. Patent Publication No. 2005/0283669), filed Dec. 23, 2004 entitled “EDGE DETECT CIRCUIT FOR PERFORMANCE COUNTER”; and U.S. patent application Ser. No. 10/635,083 (U.S. Pat. No. 7,424,397), filed Aug. 6, 2003 entitled “GENERAL PURPOSE PERFORMANCE COUNTER”; all of which are hereby incorporated by reference in their entirety.
Number | Name | Date | Kind |
---|---|---|---|
4084225 | Anderson | Apr 1978 | A |
4796211 | Yokouchi | Jan 1989 | A |
4799190 | Douglas | Jan 1989 | A |
4821178 | Levin | Apr 1989 | A |
5260979 | Parker et al. | Nov 1993 | A |
5347540 | Karrick | Sep 1994 | A |
5517155 | Yamauchi | May 1996 | A |
5579527 | Chin | Nov 1996 | A |
5581163 | Alves de Lima | Dec 1996 | A |
5588115 | Augarten | Dec 1996 | A |
5590304 | Adkisson | Dec 1996 | A |
5610925 | Takahashi | Mar 1997 | A |
5644578 | Ohsawa | Jul 1997 | A |
5651112 | Matsuno et al. | Jul 1997 | A |
5729678 | Hunt | Mar 1998 | A |
5796633 | Burgess et al. | Aug 1998 | A |
5819053 | Goodrum | Oct 1998 | A |
5835702 | Levine et al. | Nov 1998 | A |
5880671 | Ranson | Mar 1999 | A |
5881223 | Agrawal et al. | Mar 1999 | A |
5881224 | Ranson | Mar 1999 | A |
5887003 | Ranson et al. | Mar 1999 | A |
5930482 | Carter | Jul 1999 | A |
5931926 | Yeung et al. | Aug 1999 | A |
5956477 | Ranson | Sep 1999 | A |
6112317 | Berc et al. | Aug 2000 | A |
6112318 | Jouppi et al. | Aug 2000 | A |
6134676 | VanHuben | Oct 2000 | A |
6189072 | Levine | Feb 2001 | B1 |
6226698 | Yeung et al. | May 2001 | B1 |
6356615 | Coon | Mar 2002 | B1 |
6360337 | Zak et al. | Mar 2002 | B1 |
6360343 | Turnquist | Mar 2002 | B1 |
6463553 | Edwards | Oct 2002 | B1 |
6487683 | Edwards | Nov 2002 | B1 |
6502210 | Edwards | Dec 2002 | B1 |
6546359 | Week | Apr 2003 | B1 |
6557119 | Edwards et al. | Apr 2003 | B1 |
6615370 | Edwards et al. | Sep 2003 | B1 |
6658578 | Laurenti et al. | Dec 2003 | B1 |
6684348 | Edwards et al. | Jan 2004 | B1 |
6732307 | Edwards | May 2004 | B1 |
6750693 | Duewer | Jun 2004 | B1 |
6826247 | Elliott et al. | Nov 2004 | B1 |
6831523 | Pastorello et al. | Dec 2004 | B1 |
7003599 | Warren | Feb 2006 | B2 |
7346824 | Adkisson | Mar 2008 | B2 |
7373561 | Baumer et al. | May 2008 | B2 |
7424397 | Adkisson | Sep 2008 | B2 |
20020054537 | Pascucci | May 2002 | A1 |
20020166012 | Natarajan | Nov 2002 | A1 |
20020196886 | Adkisson | Dec 2002 | A1 |
20030036883 | Mericas | Feb 2003 | A1 |
20030217302 | Chen | Nov 2003 | A1 |
20040003329 | Cote et al. | Jan 2004 | A1 |
20040059967 | Kleppel et al. | Mar 2004 | A1 |
20040083077 | Baumer et al. | Apr 2004 | A1 |
20040210782 | Hsu | Oct 2004 | A1 |
20050162199 | Green et al. | Jul 2005 | A1 |
20050283669 | Adkisson | Dec 2005 | A1 |
20050283677 | Adkisson | Dec 2005 | A1 |
Number | Date | Country |
---|---|---|
3700426 | Aug 1987 | DE |
102005020656 | Dec 2005 | DE |
0897152 | Feb 1999 | EP |
2401447 | Nov 2004 | GB |
2313829 | Dec 2005 | GB |
Number | Date | Country |
---|---|---|
20050273671 A1 | Dec 2005 | US |
Number | Date | Country |
---|---|---|
60576764 | Jun 2004 | US |