The present disclosure relates to data processors and more particularly to error detection for data processors.
Some data processors employ watchdog timers to detect an error condition at the processor, such as may result from a problem with a program flow (e.g. a non-exiting loop) at the processor. The watchdog timer continuously counts towards a threshold, and if it reaches the threshold an interrupt is typically generated. In response to the interrupt, the data processor takes a recovery action to address the error condition, such as initiating a system reset. Accordingly, in order to prevent the watchdog timer from generating the interrupt, the watchdog timer must periodically be serviced. Typically the watchdog timer is serviced by placing explicit instructions into the program flow that periodically reset the timer. However, watchdog timers typically do not provide an indication as to the cause of an error condition. For example, it can be difficult to determine whether a watchdog timer timed out due to an infinite loop in a program flow or due to a stall in an instruction pipeline. In addition, it is difficult for a watchdog timer to detect a stall at an execution unit of an instruction pipeline when other execution units continue to function and are therefore able to service the timer. Accordingly, there is a need for an improved technique for detecting error conditions at a data processor.
The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawings.
To detect a non-responsive condition at a processor, a counter is associated with an operation at a first stage of an instruction pipeline. A value stored in the counter is periodically adjusted towards a threshold value. An error indicator is provided in response to the value stored in the counter reaching the threshold value, thereby indicating that a defined amount of time expired before a subsequent stage completed processing of the operation. However, if the subsequent stage completes processing of the operation prior to the value stored in the counter reaching the threshold, the counter is automatically disassociated from the operation and can, therefore, be associated with another operation at the first stage of the pipeline. Accordingly, the counter does not rely on an explicit instruction that is responsible for resetting its value.
Referring to
The instruction pipeline 110 includes a number of pipeline stages including stage 111, stage 112, and additional stages through stage 113. The stage 111 can represent the first stage of the pipeline and the stage 113 can represent the final stage of the pipeline. Alternatively, there may be additional stages before stage 111, and additional stages after stage 113 that are not illustrated. Each stage represents a portion of the instruction pipeline 110 that executes a defined task as part of executing an instruction in a single clock cycle based on an operation that is at the stage for that clock cycle. It will be appreciated that although operations are typically operated on at a stage in a single clock cycle, they can remain at a stage of the instruction pipeline 110 for more than one clock cycle while the processor 100 executes tasks resulting from the operation. For example, an operation can remain at a load/store stage of the instruction pipeline 110 for more than one clock cycle while the processor 100 retrieves data from memory in response to the load/store operation. The instruction pipeline 110 also includes a microcode module 115 which can provide operations to the pipeline for execution.
The control module 120 includes a counter 121 that is configured to be associated with an operation at a specific stage at the instruction pipeline 110 in response to an asserted signal at the R input. In response to assertion of a signal at the R input, the counter 121 is reset. As used herein, the term “reset” means that the value stored by the counter 121 is set to an initialization value or that a new threshold value for comparison to the value stored by the counter 121 is calculated. In addition, the control module 120 is configured to assert a signal at its FAIL output in response to the counter 121 indicating that a defined amount of time has expired, e.g. the threshold value has been reached.
In one embodiment, the control module 120 includes a single counter 121 which is associated with an operation at the instruction pipeline 110 in response to assertion of a signal at the R input. In this case, the control module 120 does not monitor the progress of the operation at the instruction pipeline 110 because the operation is deterministically associated with the counter 121. In another embodiment, the control module 120 can include multiple counters to monitor operations at the instruction pipeline 110, with each counter associated with a different operation. In this case, it may be necessary for the control module 120 to monitor the progress of operations at the instruction pipeline 110, as operations may complete out of order. Accordingly, if a first operation associated with a first counter completes, asserting the OP COMPLETE signal, the control module can determine which of the counters should be reset.
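The multiple-counter embodiment can be illustrated with a minimal software sketch. The class name `MultiCounterControl`, its method names, the use of operation identifiers, and the choice of an up-counting value are all assumptions made for illustration; the disclosure does not specify how counters are tracked or allocated.

```python
class MultiCounterControl:
    """Hypothetical model of a control module with several counters."""

    def __init__(self, num_counters, threshold):
        self.threshold = threshold            # defined amount of time, in ticks
        self.free = list(range(num_counters)) # indices of unassociated counters
        self.by_op = {}                       # op id -> [counter index, value]

    def op_start(self, op_id):
        """Associate a free counter with an operation; None if none is free."""
        if not self.free:
            return None
        idx = self.free.pop(0)
        self.by_op[op_id] = [idx, 0]          # reset the counter value
        return idx

    def op_complete(self, op_id):
        """Find the counter tied to the completed operation and release it."""
        idx, _ = self.by_op.pop(op_id)
        self.free.append(idx)
        return idx

    def tick(self):
        """Adjust every associated counter; return ops that hit the threshold."""
        failed = []
        for op_id, entry in self.by_op.items():
            entry[1] += 1
            if entry[1] >= self.threshold:
                failed.append(op_id)
        return failed


ctl = MultiCounterControl(num_counters=2, threshold=4)
ctl.op_start("A")
ctl.op_start("B")
ctl.tick()
ctl.tick()
released = ctl.op_complete("B")   # B completes before A (out of order)
reused = ctl.op_start("C")        # the released counter is associated with C
ctl.tick()
stalled = ctl.tick()              # A has now been pending for four ticks
```

In this sketch the lookup in `op_complete` stands in for the progress monitoring described above: because operations may complete out of order, the completion event has to be matched to the counter that was associated with that particular operation.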
During operation, the instruction pipeline 110 executes operations based on instructions being executed at the data processor 100. The operations are advanced stage by stage through the instruction pipeline 110. In response to an operation reaching the stage 112, the OP START/COMPLETE signal is asserted, thereby resetting the counter 121 and associating it with the operation at the stage 112. In a particular embodiment, the value stored in the counter 121 is adjusted (e.g. incremented or decremented) over time towards a threshold value, such that the value reaching the threshold indicates that a defined amount of time has elapsed since the operation began execution at the stage 112. In a particular embodiment, the defined amount of time is programmable. In another embodiment, the defined amount of time is predefined.
In response to the value stored at the counter 121 reaching the threshold value, the control module 120 asserts the OPFAIL signal to indicate that the operation did not reach an expected stage, such as stage 113, prior to the threshold value being reached, i.e., a defined amount of time elapsing. This can be indicative of an error condition at the instruction pipeline 110. The assertion of the OPFAIL signal causes the instruction pipeline 110 to simulate completion of the operation. In a particular embodiment, the instruction pipeline 110 simulates completion of the operation by indicating an exception at the pipeline, thereby causing operations to be flushed from one or more of the stages 111-113. In addition, in response to assertion of the OPFAIL signal the instruction pipeline 110 executes a debug procedure by instructing the microcode module 115 to provide debug code to one or more of the stages 111-113 for execution. The debug code can perform a machine check to retrieve state information from the instruction pipeline 110 that can be analyzed to determine which operation resulted in the stall.
In response to a stage of the instruction pipeline 110, such as the stage 113, completing processing of the operation prior to assertion of the OPFAIL signal, the instruction pipeline 110 asserts the OP START/COMPLETE signal to disassociate the counter 121 from the completed operation. In one embodiment, the OP START/COMPLETE signal is subsequently asserted to associate another operation at stage 112 with the counter 121. In another embodiment, when the OP START/COMPLETE signal is asserted to indicate completion of an operation, another operation at the instruction pipeline 110 is immediately and automatically associated with the counter 121.
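The life cycle of the single counter can be sketched in software as follows. This is an illustrative model only: the class `OperationWatchdog` is hypothetical, the shared OP START/COMPLETE signal is split into separate `start` and `complete` methods for readability, and the counter is assumed to count upward from zero in units of one tick.

```python
class OperationWatchdog:
    """Hypothetical model of one counter associated with one operation."""

    def __init__(self, threshold):
        self.threshold = threshold   # defined amount of time, in ticks
        self.value = 0               # value stored in the counter
        self.associated = False      # True while an operation is being timed
        self.op_fail = False         # models the OPFAIL output

    def start(self):
        """An operation reached the monitored stage: reset and associate."""
        self.value = 0
        self.associated = True
        self.op_fail = False

    def complete(self, auto_associate=False):
        """The operation completed before the threshold was reached."""
        if auto_associate:
            self.start()             # immediately time the next operation
        else:
            self.associated = False  # wait for a later start

    def tick(self):
        """Adjust the counter toward the threshold; return the OPFAIL state."""
        if self.associated and not self.op_fail:
            self.value += 1
            if self.value >= self.threshold:
                self.op_fail = True
        return self.op_fail


wd = OperationWatchdog(threshold=4)
wd.start()
on_time = [wd.tick() for _ in range(3)]   # operation finishes after 3 ticks
wd.complete()
idle = wd.tick()                          # no operation associated, no count
wd.start()
stalled = [wd.tick() for _ in range(4)]   # operation never completes
```

The `auto_associate` flag distinguishes the two embodiments: when it is false, a later `start` call associates the next operation, and when it is true, completion of one operation immediately restarts timing for another.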
Referring to
The clock module 230 includes an output to provide a clock signal to adjust a value stored at the counter 221. The clock signal can be a periodic signal or non-periodic signal, such as a system clock, a real time clock, and the like. It will be appreciated that the clock signal can also be received by the control module 220 from an external source rather than generated internally at the clock module 230. In addition, the clock module 230 could receive the external clock signal and modify the received signal to provide the clock signal to the counter 221.
During operation, the OP START/COMPLETE signal at the R input is asserted to associate the counter 221 with an operation at a stage of the instruction pipeline 110 (
If the OP START/COMPLETE signal at the R input is asserted prior to assertion of the OPFAIL signal, indicating that the operation associated with the counter 221 has completed processing at a stage of the instruction pipeline 110, the threshold value stored in the threshold register 225 is again loaded into the counter 221. This prevents an assertion of the OPFAIL signal for the completed operation and associates the counter 221 with another operation at the instruction pipeline 110. Accordingly, as operations associated with the counter 221 are completed at the instruction pipeline 110, each operation is disassociated from the counter 221, and the counter 221 can subsequently be associated with other operations at other stages of the pipeline. In one embodiment, the counter 221 is automatically associated with another operation at the instruction pipeline 110 in response to an assertion of the OP START/COMPLETE signal that indicates an operation has completed processing at a stage of the instruction pipeline 110. In another embodiment, the assertion of the OP START/COMPLETE signal to indicate that the operation has been completed disassociates the counter 221 from the completed operation, but does not associate the counter 221 with another operation. In this case the counter 221 may not be reset, but adjustment of the counter can be stopped so that the counter 221 does not assert the OPFAIL signal. A subsequent assertion of the OP START/COMPLETE signal to indicate that a new operation has reached a particular stage of the instruction pipeline 110 associates the counter 221 with the operation by resetting the counter 221.
Referring to
The counter 321 is a free-running counter that stores a value that is adjusted based on a clock signal provided by the clock module 330. The clock signal can be a periodic signal based on a system clock, a periodic signal based on a real time clock, a signal based on the timing of system events, and the like.
The threshold control module 325 includes a register 327 that stores a time value representing a defined amount of time. The register 327 can be user programmable and can store a value that is expressed in clock cycles of the clock signal provided by the clock module 330. The threshold control module 325 also includes a register 326 to store a threshold value.
During operation, in response to an assertion of the OP START/COMPLETE signal at the R input to indicate that the counter 321 should be associated with an operation, the threshold control module 325 calculates a threshold value based on the time value stored in the register 327 and on the value stored at the counter 321 when the OP START/COMPLETE signal is asserted. For example, the threshold control module 325 can add the time value stored in the register 327 to the value stored in the counter 321 to determine the threshold value. Calculation of the threshold value thus associates the counter 321 with an operation at the instruction pipeline 110 (
The compare module 340 compares the value stored at the counter 321 to the threshold value stored in the register 326. If the values match, indicating that the defined amount of time represented by the time value stored in the register 327 has elapsed, the compare module 340 asserts the OPFAIL signal, thereby indicating an error condition.
If the OP START/COMPLETE signal is asserted to indicate completion of an operation at a stage of the instruction pipeline 110 prior to a match being indicated by the compare module 340, the threshold control module 325 calculates a new threshold value and stores it at the register 326. This prevents assertion of the OPFAIL signal for the completed operation.
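The free-running counter arrangement can be sketched as follows. The class `FreeRunningWatchdog`, the 8-bit counter width, and the wraparound (modular) addition are assumptions for illustration; the disclosure states only that the threshold can be the sum of the time value and the current counter value and that a match asserts OPFAIL.

```python
COUNTER_BITS = 8                          # hypothetical width, not from the text
COUNTER_MASK = (1 << COUNTER_BITS) - 1


class FreeRunningWatchdog:
    """Hypothetical model of counter 321, registers 326/327, and compare 340."""

    def __init__(self, time_value, start=0):
        self.time_value = time_value          # register 327: time in clock cycles
        self.counter = start & COUNTER_MASK   # counter 321: never reset
        self.threshold = None                 # register 326: unset until first use
        self.op_fail = False                  # OPFAIL output

    def op_start_complete(self):
        """Recompute the threshold relative to the current counter value."""
        self.threshold = (self.counter + self.time_value) & COUNTER_MASK
        self.op_fail = False

    def clock(self):
        """Advance the counter and compare it against the stored threshold."""
        self.counter = (self.counter + 1) & COUNTER_MASK
        if self.threshold is not None and self.counter == self.threshold:
            self.op_fail = True
        return self.op_fail


# Stalled operation, with the counter wrapping past its maximum value.
wd = FreeRunningWatchdog(time_value=5, start=253)
wd.op_start_complete()
stalled = [wd.clock() for _ in range(5)]

# Operation that completes in time: the threshold moves ahead of the counter.
wd2 = FreeRunningWatchdog(time_value=5, start=10)
wd2.op_start_complete()
for _ in range(3):
    wd2.clock()
wd2.op_start_complete()
after_complete = [wd2.clock() for _ in range(5)]
```

Because the comparison is an equality match rather than a greater-than test, the sketch tolerates counter wraparound, provided the time value is nonzero and smaller than the counter range. Moving the threshold, rather than resetting the counter, is what prevents the match for a completed operation.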
Referring to
During operation, the instruction pipeline 410 executes instructions in a pipelined fashion at each stage of the portions 440-445. The fetch portion 440 fetches instruction data from an instruction cache (not shown) and provides the instruction data to the decode portion 441. The instruction data represents instructions of a program flow. The decode portion 441 decodes the instruction data to identify individual instructions and to determine one or more operations associated with each individual instruction. These operations are provided to the selection module 442. The selection module 442 receives operations from the decode portion 441 and from the microcode module 415 and based on control signals such as the signal DEBUG determines which operations are provided to the dispatch portion 443.
The dispatch portion 443 provides the received operations to an execution unit (not shown) of the execution portion 444. The execution unit of the execution portion 444 executes the instruction, and provides the instruction to the retire portion 445. The retire portion 445 uses an exception module to determine if the operation has resulted in an exception, such as a mispredicted branch. If an exception is determined, the retire portion 445 can take actions to remedy the exception, such as asserting the FLUSH signal to flush operations from the instruction pipeline 410.
When a first operation reaches a particular stage of the decode portion 441, the operation is available to be associated with the counter 121 (
If the OPFAIL signal is asserted by the control module 120 (
In addition, in response to the OPFAIL signal, the exception module 450 asserts the DEBUG signal. This causes the microcode module 415 to provide debug operations to the selection module 442. Based on the asserted DEBUG signal, the selection module 442 provides the debug operations to the dispatch portion 443 so that they can be executed at the execution portion 444. Accordingly, the error condition at the data processor 102 automatically results in execution of the debug operations. The debug operations can execute tasks to allow the instruction pipeline 410 to be analyzed and the cause of the error condition to be determined.
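The response to the OPFAIL signal can be sketched as a flush followed by a source selection. The function names `select_operations` and `handle_op_fail`, the dictionary used to represent pipeline contents, and the debug operation names are hypothetical; the sketch assumes that in-flight operations are flushed before the debug operations are dispatched.

```python
def select_operations(decoded_ops, microcode_ops, debug):
    """Model of the selection module: choose which source feeds dispatch."""
    return list(microcode_ops) if debug else list(decoded_ops)


def handle_op_fail(pipeline_stages, microcode_debug_ops):
    """Flush in-flight operations, then route debug operations to dispatch."""
    # Record what was in flight so the flush can be inspected afterwards.
    flushed = [op for op in pipeline_stages.values() if op is not None]
    for stage in pipeline_stages:
        pipeline_stages[stage] = None     # FLUSH: empty every stage
    # DEBUG asserted: the microcode source is selected instead of decode.
    dispatched = select_operations([], microcode_debug_ops, debug=True)
    return flushed, dispatched


stages = {"decode": "op3", "dispatch": "op2", "execute": "op1", "retire": None}
flushed, dispatched = handle_op_fail(stages, ["dump_state", "machine_check"])
normal = select_operations(["add"], ["dump_state"], debug=False)
```

In normal operation the same selection function passes decoded operations through unchanged, so the debug path adds no separate dispatch mechanism in this model.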
Referring to
At decision block 504, it is determined whether a fail indicator is received prior to a stage of the instruction pipeline completing processing of the operation. If the processing of the operation is complete before a fail indicator is received, the method flow returns to block 502 and the control module is again set. The counter is therefore available for association with another operation.
If a fail indicator is received, this indicates an error condition at the data processor, e.g. an operation has not been completed at a specific stage of an instruction pipeline. In a particular embodiment, the fail indicator is received in response to the value at the counter indicating that a defined amount of time has elapsed since the counter was set. In response to the fail indicator, the method flow moves to block 506 and the instruction pipeline simulates completion of the operation that was associated with the counter at block 502. The method flow moves to block 508 and a debug operation is executed at the instruction pipeline.
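The method flow can be traced with a short sketch that records which blocks are visited. The function `run_method_flow` and its inputs are hypothetical: each entry gives the number of ticks an operation needs to complete (or `None` if it never completes), and an operation that has not completed by the time the limit is reached is assumed to produce the fail indicator.

```python
def run_method_flow(completion_times, time_limit):
    """Return the sequence of flowchart blocks visited for a run of operations."""
    trace = []
    for ticks_to_complete in completion_times:
        trace.append(502)   # control module set; counter associated with the op
        trace.append(504)   # decision: fail indicator before completion?
        failed = ticks_to_complete is None or ticks_to_complete >= time_limit
        if failed:
            trace.append(506)   # pipeline simulates completion of the op
            trace.append(508)   # debug operation executed
            break
    return trace


healthy = run_method_flow([1, 1], time_limit=5)
with_stall = run_method_flow([2, 3, None], time_limit=5)
```

The sketch stops after block 508 because the text does not describe what follows the debug operation.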
Other embodiments, uses, and advantages of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. For example, it will be appreciated that although it has been described herein that a counter is associated with an operation by resetting the counter when the operation reaches a particular stage of an instruction pipeline, the counter could be associated with the operation when it reaches a first stage of the instruction pipeline, and the counter reset in response to the operation reaching a second stage of the instruction pipeline. In addition, it will be appreciated that the stage which associates an operation with the counter, and the stage which resets the counter, can each be programmable. Similarly, the stage that disassociates the operation from the counter can be programmable. It will further be appreciated that, although some circuit elements and modules are depicted and described as connected to other circuit elements, the illustrated elements may also be coupled via additional circuit elements, such as resistors, capacitors, transistors, and the like. The specification and drawings should be considered exemplary only, and the scope of the disclosure is accordingly intended to be limited only by the following claims and equivalents thereof.