The invention is more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which:
The next processing operation of
The traced information is then debugged 14. This is typically accomplished using an image of the program executed by the processor. A debug module operating with the program image may be used to implement this operation. The debug module links the simplified instruction set descriptors to instructions in the program image. Branch instructions are tracked with the periodically traced program counter differential information, as discussed below.
The probe 104 may include a memory 108 to store traced information. Alternately, an external memory may be used in conjunction with the probe 104. In one embodiment, the memory 108 is configured as a FIFO to store traced information. The instruction trace control block 106 is configured to identify when the FIFO is close to being full and in response to this condition, generates a stall signal applied to node 117 to prevent the processor from generating additional trace information that would otherwise overflow the FIFO. A FIFO control circuit 110 is connected to the memory 108 and the instruction trace control block 106 to coordinate this operation.
An optional probe interface block 112 provides an interface between the memory 108 and an external trace port, which delivers trace information to the computer 120. The probe 104 may also include a control bus 114 to apply control signals to the instruction trace control block and to coordinate memory control.
Thus, the invention provides a compressed and minimal set of information to reconstruct a simple instruction trace from an execution stream. The simple mechanism enables a small, efficient tracing methodology for relatively small processor cores.
A trace methodology is often defined by its inputs and outputs. Hence, an embodiment of the invention is described by the inputs to the core tracing logic and by the trace output format from the core. The execution flow of the program is traced at the end of the execution path.
The invention is disclosed in connection with a processor compatible with the family of processors sold by MIPS Technologies, Inc., Mountain View, Calif. The disclosure of the invention in this context is by way of example; naturally, the techniques of the invention are applicable to any number of chip architectures.
Attention initially turns to the trace inputs. One embodiment of the invention uses an In_TraceOn signal. When this signal is on, trace information is received from the core. The information is received when the signal is activated; that is, for the first traced instruction, a full PC value is output. When off, it cannot be assumed that legal trace words are available at the core interface.
An embodiment of the invention also uses an In_Stall signal. The In_Stall signal stalls the processor to avoid buffer (FIFO) overflow that can lose trace information. When off, a buffer overflow will simply throw away trace data and start over again. When on, the processor is signaled from the tracing logic to stall until the buffer is sufficiently drained and then the pipeline is restarted.
Depending on the core pipeline and the ease with which the pipe can be stalled, the trace control block needs to know the maximum number of cycles that can elapse between the assertion of the In_Stall signal and the point at which the pipe is actually halted. This latency is then used to determine how many empty trace FIFO entries are needed to store the trace information that may still arrive after the stall is asserted and before the Out_Valid signal is de-asserted.
For a given core implementation, the maximum pipeline stall latency is known, and the Instruction Trace Control Block (ITCB) 106 uses it for its worst-case calculations of FIFO space requirements. Note that if tracing is turned on, stalls are enabled to ensure that no trace data is lost, and the code being run has a particularly large number of unpredictable jumps, then for a given FIFO size it is possible to make the core stall quite often. This will affect the use of the processor and the performance that one would see on the core when running under these conditions. If it is anticipated that tracing and stalls will both be enabled for full traces as the default configuration, then it is essential to take typical code that will run under these conditions, characterize the number of bits of trace that will be needed for, say, 100 instructions, and correlate that back to both the size of the FIFO in the ITCB 106 and the expected rate at which this FIFO will be drained, in order to prevent an excessive amount of stalling.
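By way of illustration only, the worst-case FIFO space calculation described above may be sketched as follows. The sketch assumes that at most one trace message of up to 36 bits is produced per cycle, that each FIFO entry carries 58 message bits, and that up to 57 bits may already be pending in a partially filled trace word. The function names and the stall latency parameter are hypothetical and are not taken from any particular core implementation.

```python
MAX_MESSAGE_BITS = 36        # longest trace message, per the description
MESSAGE_BITS_PER_WORD = 58   # message bits carried by each 64-bit trace word
WORST_CASE_PENDING = MESSAGE_BITS_PER_WORD - 1  # assumed partially filled word


def reserve_words(stall_latency_cycles: int) -> int:
    """Worst-case number of trace words that may still be written to the
    FIFO between asserting In_Stall and the pipe actually halting."""
    worst_case_bits = WORST_CASE_PENDING + stall_latency_cycles * MAX_MESSAGE_BITS
    # Only completely filled words are written to the FIFO.
    return worst_case_bits // MESSAGE_BITS_PER_WORD


def should_stall(fifo_depth: int, words_used: int, stall_latency_cycles: int) -> bool:
    """Assert the stall once the free entries no longer exceed the reserve."""
    free_entries = fifo_depth - words_used
    return free_entries <= reserve_words(stall_latency_cycles)
```

Under these assumptions, a core with a three-cycle stall latency needs two free FIFO entries held in reserve, so a 16-entry FIFO would request a stall once 14 entries are in use.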
With respect to trace outputs, stall cycles in the pipe are ignored by the tracing logic and are not traced. This is indicated by a valid signal (Out_Valid) that is turned off when no valid instruction is being traced. When the valid signal is on, instructions are traced out as described below. The traced instruction program counter (PC) value is a virtual address. In the output format, every sequentially executed instruction is traced as bit 0. Every instruction that is not sequential to the previous one is traced as either a 10 or an 11. This implies that the target instruction of a branch or jump is traced this way, not the actual branch or jump instruction. A 10 instruction implies a taken branch for a conditional branch instruction whose condition is unpredictable statically, but whose branch target can be computed statically and hence the new PC does not need to be traced out. Note that if this branch was not taken, it would have been indicated by a 0 bit that is sequential flow.
A 11 instruction implies a taken branch for an indirect jump-like instruction whose branch target could not be computed statically and hence the taken branch address is now given in the trace. This includes, for example, instructions like jr and jalr (associated with the MIPS instruction set) and interrupts.
In one embodiment, the instruction trace node 113 consists of 36 data signals plus a valid signal. The 36 data signals encode information about what the processor is doing in each clock cycle. A valid signal on node 115 indicates that the processor 102 is executing an instruction in this cycle and therefore the 36 data signals carry valid execution information. The data bus 113 is encoded as shown in Table I. Note that all the non-defined upper bits of the bus are zeroes.
Thus, when the valid signal is low (0), an instruction is not executed and the PC value is unchanged. When the valid signal is high (1) and the data signal is 0, a sequential instruction is executed. Thus, this simplified instruction state descriptor results in the PC value being incremented when interpreting the trace information in connection with the program image. The remaining simplified instruction state descriptors allow the PC value to be derived on a differential offset basis.
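By way of illustration only, the manner in which a debug module may interpret the simplified instruction state descriptors against a program image can be sketched as follows. The sketch makes several simplifying assumptions that are not part of the disclosure: instructions are a fixed 4 bytes, branch delay slots are ignored, the statically computable branch targets are supplied in a hypothetical `static_targets` lookup keyed by the PC of the preceding instruction, and an indirect target is carried as an absolute address rather than as a differential. The record names (`invalid`, `seq`, `taken`, `indirect`) are symbolic stand-ins and do not reproduce the bit encoding of Table I.

```python
INSTR_BYTES = 4  # assumption: fixed-size instructions, no delay slots


def reconstruct_pcs(start_pc, records, static_targets):
    """Rebuild the sequence of executed PC values from symbolic trace records.

    records: iterable of tuples such as ("seq",), ("taken",),
             ("indirect", address) or ("invalid",).
    static_targets: hypothetical map from a branch PC to its statically
             computed target, derived from the program image.
    """
    pc = start_pc
    pcs = [pc]
    for record in records:
        kind = record[0]
        if kind == "invalid":
            continue                     # valid signal low: PC unchanged
        if kind == "seq":
            pc += INSTR_BYTES            # sequential instruction
        elif kind == "taken":
            pc = static_targets[pc]      # target known from the image
        elif kind == "indirect":
            pc = record[1]               # target carried in the trace
        else:
            raise ValueError(f"unknown record kind: {kind!r}")
        pcs.append(pc)
    return pcs
```

For example, starting at 0x1000 with a static branch at 0x1004 whose target is 0x1100, the records `seq`, `invalid`, `taken`, `seq`, `indirect 0x2000` yield the PC sequence 0x1000, 0x1004, 0x1100, 0x1104, 0x2000.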
Thus, the ITCB 106 controls trace using the In_TraceOn signal. When 0, all data appearing on the trace outputs on node 113 is considered invalid. To turn on trace, the ITCB 106 switches In_TraceOn from 0 to 1. A 1011 record represents the first instruction executed thereafter with a full PC indicating the current execution point.
Records from the trace information are inserted into a memory stream exactly as they appear on node 113. Records are concatenated into a continuous stream starting at the LSB. When a trace word is filled, it is written to memory along with some tag bits. Each record consists of a 64-bit word, which comprises 58 message bits and 6 tag bits or header bits that clarify information about the message in that word.
In one embodiment, the ITCB 106 includes a 58-bit shift register to accommodate trace messages. Once 58 or more bits are accumulated, the 58 bits and 6 tag bits are sent to the memory write interface. Messages may span a trace word boundary. In this case, the 6 tag bits indicate the bit number of the first full trace message in the 58-bit data field.
The tag bits are not strictly binary because they serve a secondary purpose of indicating to off-chip trace hardware when a valid trace word translation begins. At least one of the 4 LSB's of the tag is always a 1. The longest trace message is 36 bits, so the starting position indicated by the tag bits is always between 0 and 35.
When trace stops (ON set to zero), any partially filled trace words are written to memory 108. Any unused space above the final message is filled with 1's. The decoder distinguishes 1111 patterns used for fill in this position from a 1111 overflow message by recognizing that it is the last trace word.
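By way of illustration only, the accumulation of trace messages into 58-bit data fields, including messages that span a trace word boundary and the fill applied when trace stops, may be sketched as follows. The class and attribute names are hypothetical. The sketch records the starting bit position of the first full message as a plain integer and does not attempt to reproduce the actual 6-bit tag encoding, which is not strictly binary.

```python
WORD_BITS = 58      # message bits per trace word
MAX_MSG_BITS = 36   # longest trace message


class TracePacker:
    def __init__(self):
        self.acc = 0          # accumulated message bits, LSB first
        self.nbits = 0        # number of valid bits in acc
        self.first_start = 0  # start of first full message in current word
        self.words = []       # emitted (data_field, first_start) pairs

    def push(self, value, width):
        """Append one trace message of the given bit width."""
        assert 0 < width <= MAX_MSG_BITS and 0 <= value < (1 << width)
        self.acc |= value << self.nbits
        self.nbits += width
        if self.nbits >= WORD_BITS:
            data = self.acc & ((1 << WORD_BITS) - 1)
            self.words.append((data, self.first_start))
            self.acc >>= WORD_BITS
            self.nbits -= WORD_BITS
            # Leftover bits belong to the message that spilled over, so the
            # first full message of the next word begins just above them.
            self.first_start = self.nbits

    def flush(self):
        """Write out a partially filled word, padding the top with 1's."""
        if self.nbits:
            fill = ((1 << (WORD_BITS - self.nbits)) - 1) << self.nbits
            self.words.append((self.acc | fill, self.first_start))
            self.acc, self.nbits, self.first_start = 0, 0, 0
```

Because at most 57 bits can be pending before a message of at most 36 bits is appended, no more than 35 bits spill into the next word, which is consistent with a starting position that always lies between 0 and 35.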
These trace words are written to a trace memory that is either on-chip or off-chip. No particular size of SRAM is specified; the size is user selectable based on the application needs and area trade-offs. Each trace word typically stores about 20 to 30 instructions, so a 1 KWord trace memory could store the history of 20 K to 30 K executed instructions.
In one embodiment, the ITCB 106 includes a drseg memory interface (control bus 114) to allow the MIPS CPU to set up tracing and read current status. There are two drseg register locations to the ITCB as shown in Table II.
In one embodiment, the off-chip interface consists of a 4-bit data port (TR_DATA) and a trace clock (TR_CLK). TR_CLK can be a Double Data Rate (DDR) clock, that is, both edges are significant. TR_DATA and TR_CLK follow the same timing and have the same output structure as the PDTrace TCB described in MIPS specifications (see, e.g., www.mips.com). The trace clock is either the same as the system clock or a divided or multiplied version of the system clock. The OfClk bit in the Control/Status register is of the form X:Y, where X is the trace clock and Y is the core clock. The trace clock is always ½ of the trace port data rate. A “full speed” ITCB therefore outputs data at the CPU core clock rate while the trace clock runs at half that rate, so an OfClk value of 1:2 is full speed and an OfClk value of 1:4 is half speed.
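By way of illustration only, the relationship between the OfClk ratio, the trace clock and the trace port throughput may be worked through as follows. The sketch assumes a DDR trace clock (two 4-bit transfers per trace clock period) and 64-bit trace words, so that each word requires 16 transfers. The function name and the example core frequency are hypothetical.

```python
from fractions import Fraction

TRANSFERS_PER_WORD = 64 // 4  # 64-bit trace word over a 4-bit TR_DATA port


def trace_port_rates(core_hz, x, y):
    """Return (trace clock, transfers per second, trace words per second)
    for an OfClk ratio of x:y (trace clock : core clock)."""
    trace_clk = Fraction(core_hz) * x / y
    transfers_per_s = 2 * trace_clk          # both clock edges carry data
    words_per_s = transfers_per_s / TRANSFERS_PER_WORD
    return trace_clk, transfers_per_s, words_per_s
```

Under these assumptions, a 200 MHz core with an OfClk value of 1:2 has a 100 MHz trace clock and transfers data at the core clock rate, or 12.5 million trace words per second; an OfClk value of 1:4 halves each of these figures.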
When a 64-bit trace word is ready to transmit, the PIB 112 reads it from the FIFO and begins sending it out on TR_DATA. It is sent in 4-bit increments starting at the LSB's. In a valid trace word, the 4 LSB's are never all zero, so a probe listening on the TR_DATA port can easily determine when the transmission begins and then count 15 additional cycles to collect the whole 64-bit word. Between valid transmissions, TR_DATA is held at zero and TR_CLK continues to run. TR_CLK runs continuously whenever a probe is connected. An optional signal TR_PROBE_N may be pulled high when a probe is not connected and could be used to disable the off-chip trace port. If not present, this signal must be tied low at the PIB input.
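By way of illustration only, the transmission of a trace word over TR_DATA and its recovery by a listening probe may be sketched as follows. The function names are hypothetical, and the sketch models only the sequence of 4-bit values on the port, not the clocking.

```python
def serialize_word(word):
    """Split a 64-bit trace word into 16 nibbles, least significant first."""
    assert 0 <= word < (1 << 64)
    assert word & 0xF != 0, "a valid trace word never has all-zero 4 LSB's"
    return [(word >> (4 * i)) & 0xF for i in range(16)]


def receive(nibbles):
    """Recover trace words from a stream of TR_DATA nibbles.

    Zero nibbles between words are idle cycles. A non-zero nibble marks
    the start of a word, after which 15 more nibbles are collected
    regardless of their value.
    """
    words = []
    stream = iter(nibbles)
    for nibble in stream:
        if nibble == 0:
            continue                      # port idle
        word = nibble
        for i in range(1, 16):
            nxt = next(stream, None)
            if nxt is None:
                return words              # stream ended mid-word
            word |= nxt << (4 * i)
        words.append(word)
    return words
```

Counting the 15 additional cycles, rather than watching for a zero value, is what allows zero nibbles to appear inside a word without being mistaken for idle cycles.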
The following encoding is used for the 6 tag bits in each trace word. As discussed above, the four least-significant bits in the encoded field are non-zero to tell the PIB receiver that a valid transmission is starting:
The invention supports breakpoint-based enabling of tracing. Each hardware breakpoint in the EJTAG block has a control bit associated with it that enables a trigger signal to be generated on a break match condition. This trigger signal can be used to turn trace on or off, thus allowing a user to control the trace on/off functionality using breakpoints. For the simple hardware breakpoints, there are already defined registers TraceIBPC, TraceDBPC, etc. in PDtrace that are used to control tracing functionality. Similar registers need to be defined to control the start and stop of trace information. In addition, the new complex Tuple breakpoints need to be added to the list of breakpoints that can trigger trace. The details on the actual register names and drseg addresses are shown in Table III.
The bits in each register are defined as follows:
While various embodiments of the invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant computer arts that various changes in form and detail can be made therein without departing from the scope of the invention. For example, in addition to using hardware (e.g., within or coupled to a Central Processing Unit (“CPU”), microprocessor, microcontroller, digital signal processor, processor core, System on chip (“SOC”), or any other device), implementations may also be embodied in software (e.g., computer readable code, program code, and/or instructions disposed in any form, such as source, object or machine language) disposed, for example, in a computer usable (e.g., readable) medium configured to store the software. Such software can enable, for example, the function, fabrication, modeling, simulation, description and/or testing of the apparatus and methods described herein. For example, this can be accomplished through the use of general programming languages (e.g., C, C++), hardware description languages (HDL) including Verilog HDL, VHDL, and so on, or other available programs. Such software can be disposed in any known computer usable medium such as semiconductor, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM, etc.). The software can also be disposed as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., carrier wave or any other medium including digital, optical, or analog-based medium). Embodiments of the present invention may include methods of providing the apparatus described herein by providing software describing the apparatus and subsequently transmitting the software as a computer data signal over a communication network including the Internet and intranets.
It is understood that the apparatus and method described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL) and transformed to hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Thus, the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.