This application is related to the U.S. patent application Ser. No. 09/888,296, titled “System And Method For Reading And Writing A Thread State In A Multithreaded Central Processing Unit,” filed on Jun. 22, 2001, which is incorporated by reference herein in its entirety.
1. Field of the Invention
This invention relates generally to computer processor pipeline control, and more particularly, to a system and method for controlling the pipeline of a multithreaded computer processor to ensure that deterministic execution of the threads is not affected by pipeline hazards.
2. Description of the Related Art
In a pipelined computer processor, pipeline hazards may reduce the performance of software code. A typical cause of a pipeline hazard is that an instruction needs a result that is not yet available from a preceding instruction that is executing concurrently in the same pipeline. In a single-threaded pipeline, a conventional method of resolving the pipeline hazard is to stall the pipeline at the stage holding the instruction until the preceding instruction completes execution and the result is available.
However, this method, if used in a multithreaded pipelined processor, can affect the real-time performance of other threads in the pipeline. In one application, a pipelined computer processor schedules the execution of two types of threads: hard-real-time (HRT) threads and non-real-time (NRT) threads. HRT threads require that a minimum number of instructions be executed per second to satisfy hard timing requirements, which may be imposed by standards such as IEEE 802.3 (Ethernet), USB, HomePNA 1.1 or SPI (Serial Peripheral Interface). NRT threads are programmed to perform tasks that have no hard timing requirements. Therefore, they can be scheduled in any clock cycle in which no HRT thread is actively running. Because the execution time allocated to each HRT thread is fixed and the time required to execute each HRT thread is known, stalling the pipeline to remove a hazard delays every thread in the pipeline and thereby disrupts the deterministic performance of the HRT threads.
Another method of resolving a pipeline hazard is to delay the instruction that encounters a hazard, and to allow other instructions to complete execution before the delayed instruction. However, this method requires complex and costly hardware to implement.
Accordingly, what is needed is a pipeline control mechanism that copes with pipeline hazards in a pipelined computer processor, for example, a multithreaded processor, to ensure deterministic execution of multiple threads. The pipeline control mechanism should also be easier and less expensive to implement than conventional systems.
The present invention is a pipeline control mechanism that maintains deterministic execution of multiple threads in a computer processor pipeline. In one embodiment, the present invention provides a nonstalling computer pipeline that does not delay the execution of threads that do not encounter pipeline hazards. When an instruction of a thread encounters a hazard condition, the instruction, as well as other instructions in the pipeline belonging to the same thread, is annulled. The program counter of the cancelled instruction is recirculated to the fetching stage of the pipeline so that the instruction can later be rescheduled and retried for execution.
Reference will now be made in detail to several embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever practicable, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
The features and advantages described in the specification are not all inclusive, and particularly, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims hereof. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter.
The CPU core 101 is coupled to the instruction memory 103 through a signal line 106. In one implementation, the instruction memory 103 is a conventional flash memory device. The CPU core 101 can fetch instructions from the flash memory device without experiencing wait-states and without stalling the instruction pipeline. In an alternative embodiment, the instruction memory 103 may also include a conventional SRAM (static RAM) device. Accessing data, e.g., instructions, from SRAM is significantly faster than accessing the same data or instruction from a flash memory device.
In one embodiment, the embedded processor 100 executes multiple instructions from multiple threads in its pipeline simultaneously, each at a different stage. The appearance of concurrent execution of multiple threads is achieved by time-division multiplexing of the processor pipeline between the available threads. The details of conventional computer pipeline processing are known to persons of ordinary skill in the art.
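By way of illustration only, the following Python sketch shows how time-division multiplexing can place an instruction from a different thread in each pipeline stage. The stage names, the strict round-robin order and the function name `pipeline_occupancy` are assumptions made for this example and are not taken from the embodiment.

```python
# Assumed stage names for a simple five-stage pipeline (illustrative only).
STAGES = ["fetch", "decode", "operand_fetch", "execute", "writeback"]

def pipeline_occupancy(thread_ids, cycles):
    """Return, for each cycle, the thread occupying each stage (None = empty)."""
    history = []
    pipeline = [None] * len(STAGES)
    for cycle in range(cycles):
        # Every instruction advances one stage; the oldest one leaves the pipeline.
        # The fetch stage takes the next thread in round-robin order (assumed policy).
        pipeline = [thread_ids[cycle % len(thread_ids)]] + pipeline[:-1]
        history.append(list(pipeline))
    return history

if __name__ == "__main__":
    for cycle, stages in enumerate(pipeline_occupancy(["T0", "T1", "T2"], 6)):
        print(cycle, dict(zip(STAGES, stages)))
```

Once the pipeline has filled, consecutive stages hold instructions from different threads, which is what gives the appearance of concurrent execution.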
In one preferred embodiment, the embedded processor 100 provides a pipeline control mechanism to guarantee a deterministic execution of threads even though software programmers may not be aware of the impacts of pipeline hazards upon the compiled software threads. As will be described below in detail with reference to
Each stage of the pipeline 200 can be implemented through various hardware, perhaps with associated firmware. For ease of illustration, the pipeline 200 is fairly simple. In alternate embodiments, the pipeline 200 may be longer. For example, in one alternate embodiment, the pipeline 200 includes two stages for each of the fetch, decode, operand fetch, and writeback stages.
The CPU core 101 includes a scheduler 215, a thread cancellation module 201, a program counter recirculation module 202 and a hazard detection module 203. The hazard detection module 203 is coupled to hardware that is used in each pipeline stage 205-213. The thread cancellation module 201 is also coupled to the hardware that is used in each pipeline stage and is coupled to the hazard detection module 203 through a signal line 240. The recirculation module 202 is coupled to the hazard detection module 203 through a signal line 242.
In one embodiment, the scheduler 215 includes a scheduling table in which the schedule of HRT threads and NRT threads is provided for fetching the corresponding instructions from the instruction memory 103. The scheduler 215 may use a pointer to control which thread is fetched from the instruction memory 103 into the pipeline 200 at the instruction fetch stage 205. In one embodiment, each thread is associated with an independent program counter. The program counter contains identification information for the thread and the address in the instruction memory 103 from which the instruction scheduled for that thread is to be fetched in the next clock cycle. Each time the scheduler 215 issues a fetch for an instruction belonging to a particular thread, the scheduler 215 obtains that thread's program counter and sends it to the hardware of the CPU core 101 at the instruction fetch stage 205. Additional details about multithreaded scheduling are set forth in U.S. patent application Ser. No. 09/888,296, which is incorporated by reference herein in its entirety.
After the CPU core 101 fetches an instruction by using a program counter for a particular thread, the program counter value and the thread identification number are saved in registers associated with the instruction while the instruction continues to be processed in the different stages of the pipeline 200. As an instruction progresses through the pipeline, its program counter and thread identification number are moved to, or associated with, each succeeding pipeline stage. As shown in
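As a minimal sketch of such a table-driven scheduler, the following Python example walks a scheduling table with a pointer and keeps one program counter per thread. The table layout, the use of `None` to mark a slot that is free for NRT threads, the round-robin choice among NRT threads, the unit program counter increment and the name `TableScheduler` are all assumptions made for illustration.

```python
from itertools import cycle

class TableScheduler:
    """Table-driven thread selection (illustrative; the layout is an assumption)."""

    def __init__(self, table, nrt_threads, start_pcs):
        self.table = table             # e.g. ["H0", None, "H1", None]; None = free slot
        self.pointer = 0               # index of the next table entry to use
        self.nrt = cycle(nrt_threads)  # assumed round-robin; needs at least one NRT thread
        self.pc = dict(start_pcs)      # one independent program counter per thread

    def next_fetch(self):
        """Return (thread_id, pc) for the instruction to fetch in this cycle."""
        slot = self.table[self.pointer]
        self.pointer = (self.pointer + 1) % len(self.table)
        # HRT threads own their table slots; free slots go to an NRT thread.
        thread = slot if slot is not None else next(self.nrt)
        pc = self.pc[thread]
        self.pc[thread] = pc + 1       # assumed sequential, word-addressed fetch
        return thread, pc

if __name__ == "__main__":
    sched = TableScheduler(["H0", None, "H1", None], ["N0", "N1"],
                           {"H0": 100, "H1": 200, "N0": 300, "N1": 400})
    print([sched.next_fetch() for _ in range(6)])
```

In this sketch the HRT threads `H0` and `H1` are each fetched in exactly one of every four cycles regardless of what the NRT threads do, which is the property the fixed table is meant to provide.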
While the CPU core 101 processes the instructions in its pipeline 200, the hazard detection module 203 monitors each stage of the pipeline 200 and detects whether any instruction encounters a hazard. If the hazard detection module 203 detects the presence of a hazard, it sends a hazard detection signal to the thread cancellation module 201 and the recirculation module 202 via the signal lines 240 and 242, respectively. This hazard detection signal includes the pipeline stage and thread identification number, or the program counter value of the instruction, or any other information that identifies the instruction encountering the hazard and the related instructions. The thread cancellation module 201 then uses this signal to annul the instructions of the thread in the pipeline 200. Cancelling the instruction having a hazard and all subsequent instructions of the same thread in the pipeline avoids the need to stall the pipeline. As a result, other threads, including HRT threads that are concurrently executed in the pipeline 200, are not affected by the pipeline hazard.
In addition, the recirculation module 202 receives the program counter value of the instruction that encountered the hazard and forwards it to the scheduler 215. The scheduler 215 reschedules the execution of the cancelled instructions beginning with the instruction encountering the hazard.
During the operation of the embedded processor 100, the hazard detection module 203 detects 301 the presence of a pipeline hazard at each stage of the pipeline 200. The techniques for detecting a hazard are well known in the art and may vary depending on the types of software threads. In one example, the hazard detection module 203 may determine that an instruction encounters a hazard condition if its source address is the same as the destination address of a preceding instruction of the same thread. In this scenario, since the preceding instruction may not have completed, the current instruction cannot obtain the result of the preceding instruction in order to proceed, i.e., a potential pipeline hazard occurs.
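The source/destination comparison in this example can be sketched in Python as follows. The record layout `Instr`, its field names and the function `has_hazard` are hypothetical and serve only to make the comparison concrete; actual detection logic would be implemented in hardware.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Instr:
    thread_id: int
    pc: int
    sources: tuple        # addresses (or registers) the instruction reads
    dest: Optional[int]   # address (or register) it writes, or None

def has_hazard(current: Instr, in_flight: Sequence[Instr]) -> bool:
    """True if `current` reads a location that an older, not yet completed
    instruction of the same thread is still going to write."""
    return any(
        older.thread_id == current.thread_id
        and older.dest is not None
        and older.dest in current.sources
        for older in in_flight
    )
```

Instructions of other threads are ignored by the comparison, so a pending write in one thread never flags a hazard for another thread in this sketch.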
Upon detecting the presence of the pipeline hazard, the hazard detection module 203 deploys a hazard resolution procedure. The hazard detection module 203 sends a hazard detection signal to the thread cancellation module 201. The thread cancellation module 201 then cancels 307 the instruction encountering the hazard and subsequent instructions in the same thread. As indicated above, the hazard detection signal may include the program counter, or the pipeline stage and thread identification number, associated with the instruction encountering the hazard. Alternatively, the thread can be determined in another manner, e.g., by checking the values stored with the PCs or the TIDs 216-223. In particular, the thread ID specifies the thread to which the instruction belongs. The thread cancellation module 201 then uses the thread ID to identify all the instructions in the pipeline 200 that belong to the same thread. In one approach, the thread cancellation module 201 cancels all these instructions by sending an invalidation signal to the hardware at each pipeline stage to disable the operation of the identified instructions. For example, the operands of these instructions will not be fetched, or the data to be operated on by these instructions will not be latched into data registers. The result of the cancellation is to annul the thread without delaying the pipeline 200. In an alternative embodiment, the invalidation signal does not stop the execution of these instructions in the pipeline 200. Instead, the thread cancellation module 201 uses the invalidation signals to annul the instructions by preventing the results of such instructions from being written back during the write back stage of the pipeline 200, and the results are also not used by other threads.
While the thread cancellation module 201 cancels the instructions encountering a hazard, the recirculation module 202 also receives the hazard detection signal from the hazard detection module 203 and recirculates 308 the program counter value associated with the cancelled instruction to the scheduler 215. The scheduler 215 retries the cancelled thread by using the program counter value. In one implementation, the scheduler 215 is programmed to recognize the program counter value and to refetch the instructions of the cancelled thread at a new clock cycle. The details of one scheduling process that can be used in accordance with the present invention are described in U.S. patent application Ser. No. 09/748,098, entitled “System and Method for Instruction Level Multithreading in an Embedded Processor Using Zero-time Context Switching,” filed on Dec. 21, 2000, which is incorporated by reference in its entirety.
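The cancellation step can be sketched in Python as shown below, purely as an illustration. The `Slot` record, its `valid` flag standing in for the invalidation signal, the indexing of stages from the fetch stage upward and the function names are assumptions. The sketch also assumes that "subsequent instructions" means the younger instructions of the same thread in earlier stages, so that older instructions of that thread are left to complete.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Slot:
    thread_id: int
    pc: int
    valid: bool = True   # cleared by the (assumed) invalidation signal

def cancel_thread(pipeline: List[Optional[Slot]], hazard_stage: int) -> int:
    """Annul the instruction at `hazard_stage` and every younger instruction
    (earlier stage) of the same thread; return the hazard instruction's PC.
    pipeline[0] is the fetch stage and higher indices are later stages."""
    victim = pipeline[hazard_stage]
    if victim is None or not victim.valid:
        raise ValueError("no valid instruction at hazard_stage")
    for stage in range(hazard_stage + 1):
        slot = pipeline[stage]
        if slot is not None and slot.thread_id == victim.thread_id:
            slot.valid = False   # the slot keeps moving, so nothing stalls
    return victim.pc

def writeback(slot: Optional[Slot], result: int, registers: dict, dest: int) -> None:
    """Alternative embodiment: annulled instructions still flow through the
    pipeline, but their results are never written back."""
    if slot is not None and slot.valid:
        registers[dest] = result
```

Because annulled slots continue to advance like any other slot, instructions of other threads reach each stage in the same cycle they would have reached it without the hazard.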
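A minimal Python sketch of the recirculation step follows. It assumes the scheduler keeps one next-fetch program counter per thread and that recirculation simply rewinds that counter to the cancelled instruction; the class name `RetryScheduler`, its methods and the unit increment are hypothetical.

```python
class RetryScheduler:
    """Keeps one next-fetch PC per thread (all names are illustrative)."""

    def __init__(self, start_pcs):
        self.next_pc = dict(start_pcs)

    def fetch(self, thread_id):
        """Hand out the PC to fetch for `thread_id` and advance it."""
        pc = self.next_pc[thread_id]
        self.next_pc[thread_id] = pc + 1   # assumed sequential fetch
        return pc

    def recirculate(self, thread_id, hazard_pc):
        """Rewind the thread so that its next slot refetches the annulled
        instruction; other threads' program counters are untouched."""
        self.next_pc[thread_id] = hazard_pc

if __name__ == "__main__":
    sched = RetryScheduler({0: 100, 1: 200})
    sched.fetch(0); sched.fetch(0); sched.fetch(0)   # thread 0 fetched PCs 100..102
    sched.recirculate(0, 101)                        # hazard found at PC 101
    print(sched.fetch(0))                            # thread 0 restarts at 101
```

The retried thread consumes only its own future fetch slots, so the cost of the hazard is borne by the thread that encountered it.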
Accordingly, in the present invention, the real-time performance of other threads is not affected by pipeline hazards and the cancelled thread can also be re-executed. This aids in the deterministic execution of HRT threads.
The foregoing discussion discloses and describes merely exemplary methods and embodiments of the present invention. As will be understood by those familiar with the art, the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.