This invention relates generally to digital systems. More particularly, this invention relates to forming a bus transaction trace stream for a master-slave system, where the bus transaction trace stream has simplified bus transaction descriptors.
Complex digital systems with multiple master devices (e.g., multi-purpose processors, digital signal processors, audio processors, video computation elements, or direct memory access controllers) commonly share bus resources. Such systems can exhibit poor performance related to bus utilization and bus master priority issues. In such systems, the bus is formed within a single chip and therefore the bus is not visible to a traditional external logic analyzer. An internal logic analyzer may be used to visualize bus traffic so that the system can be tuned for optimal performance. However, implementing an internal logic analyzer is not practical in view of the large amount of data to be processed and the limited silicon area. While comprehensive bus data can be routed off-chip for processing, such an approach still leads to information processing challenges.
Thus, it would be desirable to develop a technique for efficiently processing bus data associated with a complex digital system with multiple master devices.
The invention includes a method of monitoring bus transactions between masters and slaves. Simplified bus transaction descriptors are generated to characterize bus transactions. Simplified bus transaction descriptors are consolidated to form a bus transaction trace stream. The bus transaction trace stream is routed to a probe.
The invention also includes a system with a bus and bus agents connected to the bus. Each bus agent generates simplified bus transaction descriptors characterizing bus traffic. A funnel consolidates the simplified bus transaction descriptors from the bus agents to form a bus transaction trace stream.
The invention also includes a computer readable storage medium with executable instructions to characterize a bus and bus agents connected to the bus. Each bus agent generates simplified bus transaction descriptors characterizing bus traffic. A funnel consolidates the simplified bus transaction descriptors from the bus agents to form a bus transaction trace stream.
The invention is more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which:
Like reference numerals refer to corresponding parts throughout the several views of the drawings.
The computer 120 includes standard components such as a set of input/output devices 122. The input/output devices 122 may include a probe port, a keyboard, mouse, display, printer, and the like. The input/output devices 122 are connected to a central processing unit 124 via a bus 126. A memory 128 is also connected to the bus 126. The memory 128 includes a bus transaction constructor 130, which includes executable instructions to process the bus transaction trace stream and thereby reconstruct bus activity within the system 102. Standard techniques may be used to reconstruct the bus activity. These standard techniques are facilitated by the compressed and efficient nature of the information within the bus transaction trace stream. In other words, because the system 102 efficiently processes and condenses bus activity information, the process of reconstructing bus traffic is simplified. The reconstructed bus activity may be presented on a display associated with the input/output devices 122. The reconstructed bus activity allows for the visualization of bus traffic in a complex digital system, which facilitates debugging and tuning of the complex digital system.
The simplified bus transaction descriptors are then consolidated into a bus transaction trace stream 204. As discussed below, a circuit, referred to herein as a funnel, is used to combine the bus transaction descriptors into a bus transaction trace stream. The bus transaction trace stream is then routed to a probe 206. The probe (e.g., probe 104 of
In one embodiment of the invention, information is directly routed from one or more master devices (e.g., 302_1 and 302_2) to a second circuit, such as a second bus funnel 312. The second bus funnel 312 routes a second bus transaction trace stream to a second probe port 314. This pathway and funnel may be used to support known tracing operations, as discussed below. The known tracing operations may be used to supplement the information generated using the techniques of the invention.
Attention now turns to a discussion of a specific embodiment of the invention that is compatible with systems sold by MIPS Technologies, Inc., Mountain View, Calif. In particular, attention turns to a discussion of a bus agent that may be used in connection with a MIPS 34K processor. In one embodiment, up to two requests and two responses may occur in a bus clock cycle. Therefore, the bus agent includes two request messages and two response messages, designated A and B, with A being the messages from the earlier of the two CPU cycles. In one embodiment, the bus agent uses both a processor clock (to time data sampling from the bus) and the bus clock (to transmit results to a funnel). The agent does not format a trace message, but instead passes enough information to the funnel to allow the funnel to formulate a trace message. All agent outputs are registered using the bus clock rising edge. In one embodiment of the invention, request phase signals from an agent to a funnel adhere to the following format.
Observe that the agent operates to compress an address and offset to the number of bits needed to uniquely identify a slave and an offset into the slave. The burst length signal is another example of compression: by specifying a burst length and ignoring the data associated with the burst, a great deal of information may be omitted from subsequent processing.
In one embodiment, response phase signals adhere to the following format.
The foregoing example relates to a bus agent processing information associated with a master device in the form of a processor. Bus agents may also be configured for other types of master devices, such as a video computation element. For example, assume that master device 302_N is a video computation element attached to switch 307. Bus agent 306_N is connected to the master device 302_N via the switch 307. In this example, one request and one response may occur in any bus clock cycle. Therefore, the bus agent 306_N includes one request message and one response message leading to the funnel 308. The bus agent 306_N does not format a trace message, but instead passes enough information to the funnel 308 to allow the funnel 308 to formulate a trace message. All bus agent outputs are registered using the bus clock rising edge. The request and response phase signals are similar to those of the bus agent of the previous example, except that the MTagID and STagID fields are replaced with a separate field, MConnID[2:0], which indicates the video computation element number (0 to 6). In this example, the request phase signals from the agent to the funnel may observe the following format.
In this example, the response phase signals have the following format.
A different type of agent may be used in connection with direct memory access controllers. For example, a 16-channel direct memory access unit uses a single bus to connect to the switch 307. In this example, the bus associated with the direct memory access unit does not use split or retry signals, and out-of-order responses cannot occur. Therefore, the information required to associate a response with its corresponding request is simpler than in the previous examples. One request and one response can occur in each bus clock cycle. Therefore, in this example, request phase signals from an agent to a funnel may be configured as follows.
Response phase signals in this example may be configured as follows.
Observe that bus agents of the invention may be configured in different ways depending upon their location within the system. Thus, each bus agent may be optimized for the particular set of traffic that it must handle. Consequently, a funnel need not accommodate the complexities of different bus traffic flows within the system. Rather, the bus agents insulate the funnels from this complexity.
In one embodiment, the bus funnel 308 accepts simplified bus transaction descriptors from each agent at the bus clock rate. The simplified bus transaction descriptors are concatenated to form a trace frame or bus transaction trace stream.
The trace frame may or may not include inputs from a particular agent. User selections affect the trace frame format. In one embodiment, both the funnel and the receiver in the probe are configured according to user selections so that they agree on the trace frame format without that information needing to be present in the data itself. In one embodiment, every enabled agent's trace message is included in every trace frame whether or not there is an active request or response in a particular cycle.
If the fractional bus clock is configured slower than necessary to transmit an entire trace frame in each bus clock, there are idle cycles present on the trace port outputs between frames. In one embodiment, the first 16-bit slice of each trace frame includes at least one non-zero bit, marking the first slice of a frame transmission. The receiver knows, based on the user setup, how many slices to expect in the trace frame. Once the entire frame is completed, the trace port outputs zeroes and the receiver waits for the next valid bit to start receiving the next frame.
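By way of illustration only, the receiver-side framing described above may be sketched as follows. The function name `deframe` and the list-of-integers representation of the trace port output are assumptions made for this sketch; the slice ordering (least significant slice first) follows the description of the trace funnel output elsewhere in this document.

```python
def deframe(slices, slices_per_frame):
    """Group a stream of 16-bit trace port slices into trace frames.

    Assumes idle cycles are all zero and that the first slice of every
    frame is non-zero. Later slices of a frame may legitimately be zero,
    so only the first slice is used for synchronization.
    """
    frames = []
    i = 0
    while i < len(slices):
        if slices[i] == 0:
            # Idle cycle between frames: keep waiting for a non-zero slice.
            i += 1
            continue
        chunk = slices[i:i + slices_per_frame]
        if len(chunk) < slices_per_frame:
            break  # capture ended in the middle of a frame
        value = 0
        for n, s in enumerate(chunk):
            # First slice received is treated as the least significant.
            value |= (s & 0xFFFF) << (16 * n)
        frames.append(value)
        i += slices_per_frame
    return frames
```

The number of slices per frame is passed in rather than detected, mirroring the statement that the receiver knows the frame length from the user setup.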
If the fractional bus clock is configured too fast, the funnel does not have time to transmit an entire frame before the next frame arrives. This is a system setup error that can be detected and flagged prior to starting a trace session.
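A setup check of this kind may be sketched as shown below. This is a simplified model, not the check of any particular embodiment: it assumes one 16-bit slice is transmitted per trace port clock cycle, and the function names and clock rates used in the usage note are hypothetical.

```python
import math


def slices_needed(frame_bits, slice_bits=16):
    """Number of trace port slices required to carry one trace frame."""
    return math.ceil(frame_bits / slice_bits)


def frame_fits(frame_bits, bus_clock_hz, trace_clock_hz, slice_bits=16):
    """Return True if a whole frame can be sent before the next one arrives.

    A new frame is produced every bus clock period, so the trace port
    must complete `slices_needed` cycles within one bus clock period.
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    needed = slices_needed(frame_bits, slice_bits)
    return needed * bus_clock_hz <= trace_clock_hz
```

For instance, with a 240-bit frame and a hypothetical 333 MHz trace port clock, `frame_fits(240, 20_000_000, 333_000_000)` holds while `frame_fits(240, 25_000_000, 333_000_000)` does not, so the second configuration would be flagged before the trace session starts.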
Trace messages are generated from each agent's outputs. A message includes both the request and the response that occur in a particular cycle. Trace formats for different agents may be different.
For example, the trace message for an agent associated with a processor may be as follows.
Note that the MasterID is not needed because the probe knows which master it is by the position in the trace frame. In full mode, an additional set of 24 request phase address bits is recorded. A2 is the lowest address bit recorded. A1:A0 are assumed to be zero.
The SlaveID field is computed by the funnel in the same way for all three types of agents discussed herein. The computation is done using a lookup table based on masked comparisons of address bits 30 down to 15. In the present embodiment, up to 7 slaves may be configured by the user or automatic configuration file. A SlaveID of zero indicates that there is no request in this cycle. Values and masks for the comparison are stored in registers accessible through a funnel JTAG port, if provided.
In combination with the SlaveID, the MAddr field can identify the peripheral and offset within that peripheral that is being accessed, assuming the maximum size of any one slave is 2^26 (64M) bytes. In one embodiment, the lookup algorithm is a priority encoder with the following function.
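As a hypothetical sketch only, and not the function of the embodiment, a priority-encoded masked comparison of this kind might be modeled as follows. The table layout (a list of value and mask pairs, earliest entry having highest priority), the `default_id` returned when no entry matches, and the reading of MAddr as address bits 25 down to 2 are all assumptions introduced for illustration.

```python
def slave_id(addr, table, request_valid=True, default_id=7):
    """Map a request address to a SlaveID by masked comparison.

    table: list of (value, mask) pairs compared against address bits
    30..15; the first matching entry wins (priority encoding).
    Returns 0 when there is no request in this cycle.
    `default_id` for unmatched addresses is an assumption of this sketch.
    """
    if not request_valid:
        return 0
    key = (addr >> 15) & 0xFFFF  # address bits 30 down to 15
    for sid, (value, mask) in enumerate(table, start=1):
        if (key & mask) == (value & mask):
            return sid
    return default_id


def maddr_full(addr):
    """Assumed full-mode MAddr: 24 bits, A25..A2 (A1:A0 taken as zero)."""
    return (addr >> 2) & 0xFFFFFF
```

With 24 recorded bits starting at A2, the offset spans 2^26 bytes, which is consistent with the stated maximum slave size.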
The trace message format for a video computation element is similar to that of the previously described processor trace message format. However, the tag bits have a different meaning: they identify the specific video computation element rather than a processor read buffer. The video computation element agent trace message format may be configured as follows.
Note that the MasterID is not needed because the probe knows that it is the video computation element (VCE) bus by the position in the trace frame. Which VCE generated the request is encoded in the MConnID bits. The funnel maintains a counter of the number of outstanding transactions that have been requested. Each response message includes that counter value so that the response can be associated with its corresponding request. The RespOrder field varies from 0 to 14 to indicate the first through 15th preceding requests. A RespOrder value of 15 indicates that the corresponding request is 16 or more requests earlier.
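The RespOrder encoding may be illustrated with the following sketch. It assumes that the "first preceding request" is the most recent request before the response; the function names and the list-based request history are hypothetical.

```python
def encode_resp_order(requests_back):
    """Encode how far back the matching request is.

    requests_back: 1 for the most recent preceding request, 2 for the
    one before it, and so on. Values of 16 or more saturate at 15.
    """
    if requests_back < 1:
        raise ValueError("a response must follow at least one request")
    return min(requests_back - 1, 15)


def match_request(history, resp_order):
    """Find the request a response belongs to, or None if it cannot be
    resolved from RespOrder alone (saturated value or short history).

    history: requests seen so far, oldest first.
    """
    if resp_order >= 15 or resp_order >= len(history):
        return None
    return history[-1 - resp_order]
```

A saturated value of 15 is returned as unresolved in this sketch, since the field only says that the request lies 16 or more requests back.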
The trace message format for a direct memory access agent is slightly different than the previous embodiments. The direct memory access agent is capable of more burst lengths, requiring a 3-bit BurstLength field, but does not allow overlapping or out-of-order cycles, so no tag fields are needed in the response. The trace message format may be configured as follows.
Attention now turns to format issues associated with a trace frame or bus transaction trace stream. In one embodiment, a trace frame may be between 16 and 240 bits, depending on user configuration. In one embodiment, the trace frame begins with 16 or 40 bits from processor agent A1 (processor agent A, phase 1), if enabled, then proceeds with 16 or 40 bits each from processor agent A2, processor agent B1, processor agent B2, the video computation element agent, and the direct memory access agent. Any combination of agents may be enabled, though A1/A2 and B1/B2 are enabled in pairs.
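The frame length arithmetic may be sketched as follows. The sketch assumes a single mode selection applies to every enabled message (16 bits each, or 40 bits each in full mode); the slot labels "VCE" and "DMA" and the function name are shorthand introduced here.

```python
SLOT_ORDER = ("A1", "A2", "B1", "B2", "VCE", "DMA")


def frame_bits(enabled, full_mode):
    """Length in bits of a trace frame for a given user configuration."""
    enabled = set(enabled)
    unknown = enabled - set(SLOT_ORDER)
    if unknown:
        raise ValueError(f"unknown agents: {sorted(unknown)}")
    if not enabled:
        raise ValueError("at least one agent must be enabled")
    # Processor agent phases are enabled in pairs.
    for first, second in (("A1", "A2"), ("B1", "B2")):
        if (first in enabled) != (second in enabled):
            raise ValueError(f"{first}/{second} must be enabled together")
    width = 40 if full_mode else 16
    return width * len(enabled)
```

Under these assumptions a single enabled agent without full mode gives the 16-bit minimum, and all six messages in full mode give the 240-bit maximum.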
One possible trace frame is:
The trace funnel outputs a trace frame in 16-bit slices starting at the least significant enabled trace message. This slice is routed to the trace port (or probe port 310), which re-clocks and outputs it. Time between valid trace frames is filled with zeroes. At least one of the least significant 4 bits of the first enabled trace message is always non-zero, allowing the trace port receiver to identify the first slice of a trace frame.
The trace or probe port receives the 16-bit trace frame slice output and simply re-clocks it. In one embodiment, the trace port is put into a separate module so that it can be easily located close to I/O pads of the chip. In one embodiment, the trace port probe interface ports are intended to connect to chip pins and are shown in the following table.
RRT_TR_CLK and RRT_TR_DATA[15:0] are each driven directly from registers. Skew control between RRT_TR_CLK and each of the signals in RRT_TR_DATA is critical for accurate transmission to a probe. RRT_TR_CLK and RRT_TR_DATA transition simultaneously and the probe is expected to create a reception sampling clock by doubling and phase shifting RRT_TR_CLK in order to latch RRT_TR_DATA at approximately the center of its valid zone. Routing of RRT_TR_CLK and RRT_TR_DATA must meet impedance and maximum skew specifications associated with the MIPS 34K Integrator's Guide, section 4.4.5. These specifications affect both on-chip logic and the board layout.
In one embodiment, the trace port connector is a 38-pin AMP Mictor connector, part number 2-0767004-2 or equivalent, the same connector used by some high-speed logic analyzer probes. Pinout, signal definition, and timing of the connector follow.
In one embodiment, at least one funnel includes a JTAG TAP, which is placed on the JTAG chain of the device in a daisy chain with the 4 TAPs of two processors. The funnel TAP Instruction Register is 4 bits long. The TAP instructions are:
The IDCODE register is a fixed value of 0x465332DX, where X begins at 0 and is incremented for future versions. The RRTCTRL register is organized as follows.
The RRTSLAVE1 through RRTSLAVE6 registers are organized as follows:
As shown in
The following table lists the TCTrace interface signals between each 34K Processor and the PDtrace Funnel (e.g., funnel 312).
In one embodiment, the fractional and full-speed clocks are provided by system logic and are not generated by the PIB. Therefore, the TC_ClockRatio[2:0] output from the TCB is ignored. TCB data always appears at the CPU clock rate and the funnel outputs trace to the trace bus at 333 MHz.
In operation, valid trace words from the TCB that contain data (indicated by lower bits not all zero) are latched from the TC_Data input into an internal 64-bit register. The register is clocked onto funnel outputs at the 333 MHz rate, requiring four cycles to complete transmission of each trace word, or eight cycles to complete one trace word from each processor. If valid data is present on both TCTrace buses, it is accepted alternately from the two buses.
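The slicing and alternation described above may be modeled with the following sketch. It is a static ordering model rather than a cycle-accurate one: all pending words are assumed to be queued up front, and the low-slice-first order within a word is an assumption. The function names are hypothetical.

```python
from collections import deque


def word_to_slices(word):
    """Split a 64-bit trace word into four 16-bit slices.

    Low slice first is an assumed ordering for this sketch.
    """
    return [(word >> (16 * n)) & 0xFFFF for n in range(4)]


def funnel_order(words0, words1):
    """Order in which words from two TCTrace buses are accepted.

    Alternates between the buses while both have data; when one bus
    has nothing pending, the other is serviced back to back.
    Returns a list of (bus_index, word) tuples.
    """
    queues = [deque(words0), deque(words1)]
    accepted = []
    turn = 0
    while queues[0] or queues[1]:
        if not queues[turn]:
            turn ^= 1  # nothing pending on this bus, take the other
        accepted.append((turn, queues[turn].popleft()))
        turn ^= 1
    return accepted
```

Each accepted word occupies four output cycles, so one word from each processor occupies eight, matching the figures given above.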
TC_Stall is used to throttle the inputs but occurs only if the CPU clock rate is more than ⅛ of the 333 MHz PDtrace funnel clock. TC_Stall only affects data flow between the TCB and the funnel. While TC_Stall is asserted, trace words are held in the TCB and, as long as the TCB's own internal FIFO does not fill, real-time CPU operation is not affected.
The funnel probe interface ports may be configured as follows.
PDT_TR_CLK and PDT_TR_DATA[15:0] are each driven directly from registers. Skew control between PDT_TR_CLK and each of the signals in PDT_TR_DATA is critical for accurate transmission to a probe. PDT_TR_CLK and PDT_TR_DATA transition simultaneously and the probe is expected to create a reception sampling clock by doubling and phase shifting PDT_TR_CLK in order to latch PDT_TR_DATA at approximately the center of its valid zone. Routing of PDT_TR_CLK and PDT_TR_DATA must meet impedance and maximum skew specifications listed in the MIPS 34K Integrator's Guide, section 4.4.5.
In one embodiment, the trace port connector is a 38-pin AMP Mictor connector, part number 2-0767004-2 or equivalent, the same connector used by some high-speed logic analyzer probes.
As described in the MIPS 34K Integrator's Guide, VIO configures the probe for the logic level implemented on all other pins of this interface. VIO in the PDtrace and Mictor connectors should be the same voltage level.
In one embodiment, the probe supports two simultaneous 16-bit trace ports with independent clocks along with ordinary JTAG and sideband signals. The SP supports a 333 MHz data transmission speed.
Any number of probe configurations may be used in accordance with embodiments of the invention. The details of any such probe configuration are insignificant. However, trace signal formats and time stamping issues are noteworthy.
In one embodiment, PDtrace trace words are recorded into memory as they arrive from the system. As detailed in the PDtrace specification, a trace word consists of 4 tag bits indicating where the first full message begins, 2 bits indicating which core generated the word, and 58 bits of trace messages. The two source bits in PDtrace trace words are either 00 or 01.
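The three fields of such a 64-bit trace word may be separated as in the sketch below. The bit positions used here (tag in bits 3:0, source in bits 5:4, messages in bits 63:6) are an assumption made for illustration; the authoritative layout is the one given in the PDtrace specification.

```python
DATA_MASK = (1 << 58) - 1


def pack_trace_word(tag, source, data):
    """Assemble a 64-bit trace word from its three fields
    (assumed layout: data[63:6], source[5:4], tag[3:0])."""
    return ((data & DATA_MASK) << 6) | ((source & 0x3) << 4) | (tag & 0xF)


def unpack_trace_word(word):
    """Split a 64-bit trace word into tag, source and message fields."""
    return {
        "tag": word & 0xF,
        "source": (word >> 4) & 0x3,
        "data": (word >> 6) & DATA_MASK,
    }
```

The field widths sum to 64 bits (4 + 2 + 58), so the two functions round-trip any in-range field values.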
A bus transaction trace stream with simplified bus transaction descriptors has trace frames that may be longer or shorter than a 64-bit DRAM trace word. Probe hardware first compresses each trace frame by removing messages in which there is neither a valid request nor a valid response and adds a 6-bit format field to indicate which agents' messages remain in the trace frame. The resulting compressed trace frame may be between 22 bits (if there is one message) and 246 bits (all 6 messages valid in full mode). If no valid messages occur in a frame, then nothing is recorded in trace memory.
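The compression step may be sketched as follows. The placement of the 6-bit format field in the low bits of the result, the packing order of the surviving messages, and the use of `None` to mark a message with neither a valid request nor a valid response are assumptions of this sketch.

```python
def compress_frame(messages, width):
    """Drop empty messages from a trace frame and prepend a format field.

    messages: six entries in frame order; each is an int payload of
    `width` bits, or None when the agent is disabled or carries neither
    a valid request nor a valid response (decided by the caller).
    Returns (value, nbits), or None when nothing needs recording.
    """
    if len(messages) != 6:
        raise ValueError("expected six message slots")
    fmt = 0
    payload = 0
    nbits = 0
    for slot, msg in enumerate(messages):
        if msg is None:
            continue
        fmt |= 1 << slot  # one format bit per surviving message
        payload |= (msg & ((1 << width) - 1)) << nbits
        nbits += width
    if fmt == 0:
        return None
    return (payload << 6) | fmt, nbits + 6
```

With one 16-bit message the result is 22 bits, and with six 40-bit messages it is 246 bits, matching the bounds stated above.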
Probe hardware concatenates compressed trace frames into trace words by appending a 2-bit source field with value 10 and a Type bitfield indicating where a frame begins. If a frame cannot fit in the remaining space in a trace word, the first portion of the frame is inserted and the remainder is put into the next trace word. A 246-bit compressed frame could take more than four words to record. If a trace frame begins somewhere in a trace word, the Type field indicates the nibble number (1 to 15) in the 58-bit data field where the frame begins. Type is zero if a trace word contains only the continuation of a previous trace frame.
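The word-count arithmetic may be checked with the short sketch below. Four 58-bit data fields hold 232 bits, so any compressed frame longer than that spills into a fifth trace word. The `type_field` helper additionally assumes that frames start on nibble boundaries and that nibble 1 begins at bit 0 of the data field; both are assumptions of this sketch.

```python
import math

DATA_BITS = 58  # trace message bits available in each trace word


def words_spanned(frame_bits, start_bit=0):
    """Number of trace words touched by a frame that begins at
    `start_bit` within the data field of its first trace word."""
    if not 0 <= start_bit < DATA_BITS:
        raise ValueError("start_bit must lie inside the data field")
    return math.ceil((start_bit + frame_bits) / DATA_BITS)


def type_field(start_bit):
    """Assumed Type encoding: nibble number (1 to 15) of the frame start,
    or 0 when no frame starts in this word (start_bit is None)."""
    if start_bit is None:
        return 0
    return start_bit // 4 + 1
```

The last nibble of a 58-bit field is only two bits wide, which is why the nibble numbers run from 1 to 15 under this reading.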
Trace words from trace ports may be interleaved and recorded in the order they arrive at the probe. Along with the trace data, the probe records timing information of received trace words. This allows software to determine the time between a request and its corresponding response. The timestamp is created using the local 266 MHz probe DRAM timing clock and therefore has a 3.75 ns resolution.
Normally, the 8 upper bits of the DRAM word represent the timestamp and indicate 0 to 255 clocks of separation from a trace word to the preceding trace word.
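The 8-bit separation field and the stated resolution may be illustrated as follows. The function names are hypothetical, and the sketch simply reports separations that do not fit in the field rather than modeling how they are recorded.

```python
TICK_NS = 3.75  # timestamp resolution stated for the probe DRAM clock


def delta_field(prev_tick, tick):
    """Return the 8-bit separation between two trace words in clocks,
    or None when the gap exceeds 255 and cannot be held in the field."""
    delta = tick - prev_tick
    if delta < 0:
        raise ValueError("trace words must be recorded in arrival order")
    return delta if delta <= 255 else None


def ticks_to_ns(ticks):
    """Convert a clock count to nanoseconds at the stated resolution."""
    return ticks * TICK_NS
```

The largest separation the field can express is 255 clocks, or 956.25 ns at 3.75 ns per clock.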
If there is more than 255 clocks of separation, a full trace word is inserted containing the spacing in clocks. The timestamp counter is 32 bits, which will accommodate a time period of about 16 seconds at 266 MHz. If the timestamp counter overflows (indicating that no valid frames are recorded for a 16-second period), a timestamp record is inserted containing a time value of all ones. A timestamp trace word has the following format:
In one embodiment, a triggering system is a multi-state event detector that controls the capture system and target operation. Event detectors compare incoming capture data on each clock to a set of previously specified patterns. Pattern matching includes don't-care, high, low, rising edge, falling edge, either edge, double high, double low, and steady. When a match is detected, that event “fires” and feeds into the trigger engine.
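One possible reading of these pattern types is sketched below for a single bit observed on two consecutive clocks. The interpretation of "double high" and "double low" as the same level on both clocks, and of "steady" as no change, is an assumption; the pattern names and function names are hypothetical.

```python
def bit_matches(pattern, prev, cur):
    """Test one bit against one pattern, given its previous and
    current sampled values (each 0 or 1)."""
    rules = {
        "dont_care": True,
        "high": cur == 1,
        "low": cur == 0,
        "rising": prev == 0 and cur == 1,
        "falling": prev == 1 and cur == 0,
        "either": prev != cur,
        "double_high": prev == 1 and cur == 1,
        "double_low": prev == 0 and cur == 0,
        "steady": prev == cur,
    }
    return rules[pattern]


def event_fires(patterns, prev_word, cur_word):
    """patterns maps a bit index to a pattern name; bits that are not
    listed are treated as don't-care. All listed bits must match."""
    return all(
        bit_matches(p, (prev_word >> b) & 1, (cur_word >> b) & 1)
        for b, p in patterns.items()
    )
```

For example, `event_fires({0: "rising", 3: "high"}, 0b1000, 0b1001)` matches because bit 0 goes from low to high while bit 3 is high on the current clock.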
Events are defined to apply to one or more of the channels and only those channels are compared with the event settings. An event fires if any enabled event matches the preprogrammed setting. For example, one could set an event to fire when either processor (meaning any of the four associated channels) initiates a write cycle to a specified slave in single-cycle mode.
Events apply only to compressed trace data, not PDtrace trace data. PDtrace has separate event recognition hardware inside each processor core, which can generate breakpoint status (BS) outputs, which in turn can generate an RRT_TR_TRIGOUT signal on the Mictor connector.
The trigger engine generates actions when a specified condition is true. The conditions are combinations of the following trigger engine inputs:
These conditions may be combined in any way (using and, or, and not operators). When a specified trigger condition occurs, one or more actions are generated.
The triggering system always begins in sequencer state 0. The user specifies whether the capture system and each trigger counter begins in the active or inactive state.
On each cycle, the trigger system simultaneously checks for each specified trigger condition and executes the actions corresponding to each active condition. In some cases, conflicting actions are generated and in this case, a priority is defined. For example, if trigger actions occur to both start and stop the capture engine, then the start action is executed.
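A single evaluation cycle of such a trigger system may be sketched as follows. The representation of a trigger program as a list of condition and action pairs, the input dictionary, and the action names are hypothetical; only the start-over-stop priority is taken from the description above.

```python
def trigger_step(program, inputs):
    """Evaluate every trigger condition against the same inputs and
    collect the actions of all conditions that are true.

    program: list of (condition, actions) pairs, where condition is a
    callable taking the inputs and actions is a set of action names.
    """
    actions = set()
    for condition, acts in program:
        if condition(inputs):
            actions.update(acts)
    # Conflict resolution: start takes priority over stop.
    if {"start_capture", "stop_capture"} <= actions:
        actions.discard("stop_capture")
    return actions


# Conditions combine inputs with and, or, and not operators.
example_program = [
    (lambda i: i["event0"] and not i["event1"], {"start_capture"}),
    (lambda i: i["event0"] or i["event1"], {"stop_capture", "goto_state_1"}),
]
```

All conditions see the same inputs before any action takes effect, which models the simultaneous check described above.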
The user specifies the trigger program using a GUI editor or a set of Tcl commands. The editor includes a method to graphically enter an event definition and construct an if/then/else style trigger program.
The bus transaction constructor 130 of
Various techniques may be used to display the resultant bus activity data. For example, the bus activity data may be displayed in a raw mode that shows all requests and responses occurring in each bus clock cycle.
Various techniques associated with the bus transaction constructor 130 may be used to simplify the interpretation of the raw data. For example, graphical indicia may be used to distinguish the activity of each master device. In the example of
Alternately, the bus transaction constructor 130 may filter the raw data and only present data characterizing a subset of bus activity. For example,
A transaction mode shows requests occurring in each clock cycle and associated responses in the display. In the transaction mode, the display also shows a cycle duration which is the time between the request and its associated response, computed by subtracting the timestamps of the request and response messages.
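The cycle duration computation may be sketched as follows, assuming each recorded trace word carries its separation in clocks from the preceding trace word and that one clock is 3.75 ns. The function names and the list-of-deltas input are hypothetical.

```python
from itertools import accumulate

TICK_NS = 3.75  # assumed probe timestamp resolution


def absolute_ticks(deltas):
    """Turn per-word separations into running clock counts."""
    return list(accumulate(deltas))


def cycle_duration_ns(deltas, request_index, response_index):
    """Time between a request and its response, given the index of the
    trace word holding each message."""
    ticks = absolute_ticks(deltas)
    return (ticks[response_index] - ticks[request_index]) * TICK_NS
```

For separations of 0, 10, 4 and 6 clocks, a request in the second word and its response in the fourth are 10 clocks apart, giving 37.5 ns.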
In
The second row of data in
An embodiment of the present invention relates to a computer storage product with a computer-readable medium having computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts. Examples of computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs, DVDs and holographic devices; magneto-optical media; and hardware devices that are specially configured to store and execute program code, such as application-specific integrated circuits (“ASICs”), programmable logic devices (“PLDs”) and ROM and RAM devices. Examples of computer code include machine code, such as produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. For example, an embodiment of the invention may be implemented using Java, C++, or other object-oriented programming language and development tools. Another embodiment of the invention may be implemented in hardwired circuitry in place of, or in combination with, machine-executable software instructions.
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that specific details are not required in order to practice the invention. Thus, the foregoing descriptions of specific embodiments of the invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications; they thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the following claims and their equivalents define the scope of the invention.