A related U.S. patent application, Ser. No. 10/246,937, now U.S. Pat. No. 6,965,986 B2 issued Nov. 15, 2005, entitled “METHOD AND APPARATUS FOR IMPLEMENTING TWO-TIERED THREAD STATE MULTITHREADING SUPPORT WITH HIGH CLOCK RATE” by Harold F. Kossman and Timothy John Mullins, and assigned to the present assignee, is being filed on the same day as the present patent application.
The present invention relates generally to the data processing field, and more particularly, relates to a method and apparatus for implementing thread replacement for optimal performance in a two-tiered multithreading structure.
Efficiency of hardware within the CPU is improved by dividing a processing task into independently executable sequences of instructions called threads. When the CPU, for any of a number of reasons, cannot continue the processing or execution of one of these threads, the CPU rapidly switches to and executes another thread. Multithreading is an effective way to improve the overall speed of a computer system or system throughput.
Multithreading design techniques have become an important means of enabling processor performance to scale up with clock frequency. Where past processor designs encountered stall conditions that degraded performance, multithreading allows continued execution of instructions by a separate parallel thread of activity. However, as clock frequencies continue to increase, more and more threads need to be supported in hardware to provide a continuously available option for execution by the processor.
Traditional hardware multithreading schemes provide for some number of thread states to be stored in hardware register sets. These register sets are generally implemented at relatively high chip design cost including chip area, circuit speed, and the like in the interest of achieving peak performance.
Alternative multithreading designs use more than one tier of state storage. For example, a first tier or first level state storage can be provided with high chip-resource cost but limited thread-holding capacity, and a second tier or second level state storage can be provided with additional thread capacity to support throughput needs, but with a lower speed of access. Threads supported in second level state storage must be exchanged with those in first level state registers to give every thread an opportunity to run on the processor. Simple hardware schemes could select a second level thread by way of a straightforward algorithm, such as a round-robin algorithm. Unfortunately, this generally leads to non-optimal overall performance, since proper scheduling of when threads are to run on the processor is required to fully leverage the throughput capability of the machine.
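To illustrate why the simple scheme falls short, a round-robin second level selector can be sketched as follows (a minimal sketch, not from the patent; the class name and slot count are hypothetical). It cycles through second level thread slots in fixed order, blind to priority, starvation, and readiness:

```python
class RoundRobinSelector:
    """Naive second-tier thread selector: cycles through slots in
    order, ignoring priority, starvation, and readiness, which is
    why it yields non-optimal overall performance."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.next_slot = 0

    def pick(self):
        # Return the next slot index and advance the pointer, wrapping.
        slot = self.next_slot
        self.next_slot = (self.next_slot + 1) % self.num_slots
        return slot


rr = RoundRobinSelector(4)
picks = [rr.pick() for _ in range(6)]  # wraps around: 0, 1, 2, 3, 0, 1
```

Every slot gets an equal turn regardless of whether its thread is stalled or urgent, which is precisely the behavior the selection data described below is meant to improve upon.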
A need exists for a mechanism to solve the non-optimal performance problem by providing a method to pick the appropriate next thread from second level state storage for exchange with the first level state registers.
A principal object of the present invention is to provide a method and apparatus for implementing thread replacement for optimal performance in a two-tiered multithreading structure. Other important objects of the present invention are to provide such a method and apparatus substantially without negative effect and that overcome many of the disadvantages of prior art arrangements.
In brief, a method and apparatus are provided for implementing thread replacement for optimal performance in a two-tiered multithreading structure. A first tier thread state storage stores a limited number of runnable thread register states. A second tier thread storage facility stores a second number of thread states that is greater than the limited number of runnable thread register states. Each stored thread state includes predefined selection data. A runnable thread selection logic, coupled between the first tier thread state storage and the second tier thread storage facility, uses the stored predefined selection data for selectively exchanging thread states between the first tier limited number of runnable thread register states and the second tier thread storage facility.
In accordance with features of the invention, the stored predefined selection data used by the runnable thread selection logic for selectively exchanging thread states between the first tier runnable thread register states and the second tier thread storage facility includes specific thread historical usage data. The stored predefined selection data used by the runnable thread selection logic includes processor cycle usage efficiency for each particular thread. The stored predefined selection data includes a time since the particular thread ran on the processor, which is used by the runnable thread selection logic to signal a starvation condition. The stored predefined selection data includes thread system priority to enable the runnable thread selection logic to obey system policies and select a higher priority thread over a lower priority thread to move into the first tier runnable thread register states. The stored predefined selection data includes a ready-to-run indicator so that a stalled thread maintained in the second tier thread storage facility does not become activated until its stalled condition is resolved. The stored predefined selection data is used by ranking logic for ranking runnable threads to be maintained in the first tier runnable thread register states. The first tier runnable thread register states are available for selection when a currently executing processor state is changed at idle events.
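Taken together, the selection data enumerated above can be pictured as a record carried with each saved thread state. The following sketch assumes illustrative field and method names; only the categories of data come from the description:

```python
from dataclasses import dataclass


@dataclass
class ThreadSelectionData:
    """Per-thread selection data saved with a thread state.
    Field names are illustrative assumptions; the data categories
    (usage history, efficiency, timestamp, priority, readiness)
    follow the text."""
    thread_id: int
    cycles_used: int      # historical usage: productive cycles consumed
    cycles_held: int      # cycles the thread occupied a first-tier slot
    last_run_time: int    # timestamp of the thread's last run
    system_priority: int  # higher value = favored by system policy
    ready_to_run: bool    # False while a stall condition is unresolved

    def usage_efficiency(self) -> float:
        # Processor cycle usage efficiency: productive cycles as a
        # fraction of the cycles the thread held a slot.
        if self.cycles_held == 0:
            return 0.0
        return self.cycles_used / self.cycles_held


t = ThreadSelectionData(thread_id=7, cycles_used=800, cycles_held=1000,
                        last_run_time=0, system_priority=3,
                        ready_to_run=True)
```

In hardware these fields would occupy a few extra register bits per saved thread state; the sketch simply makes the grouping explicit.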
The present invention together with the above and other objects and advantages may best be understood from the following detailed description of the preferred embodiments of the invention illustrated in the drawings, wherein:
Having reference now to the drawings, in accordance with features of the preferred embodiment, optimal multithreading performance is provided by increasing the information saved with a thread state. Predefined selection data is stored with the thread state and is used for selectively exchanging thread states between the first tier runnable thread register states 102 and the second tier thread storage facility 104. By referencing this saved selection data, the runnable-thread selection logic 108 can choose the proper thread to move into position in the first-tier runnable register states 102 for running on the processor. In particular, past history of a specific thread's usage of the processor resources 106 is maintained and used to rank thread choices.
Referring now to the predefined selection data 200, the selection data includes a time since the specific thread last ran on the processor 206. A timestamp 206 stored with the thread state enables the runnable-thread selection logic 108 to determine how long threads have been inactive on the processor. If a threshold value is exceeded, a starvation condition is signaled for the particular thread, and the particular thread can be given special priority to ensure running on the processor despite other decision criteria that would keep it inactive.
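The starvation test described above reduces to comparing the elapsed inactive time against a threshold (a minimal sketch; the threshold value and function name are arbitrary placeholders, not from the patent):

```python
STARVATION_THRESHOLD = 10_000  # cycles of inactivity; placeholder value


def is_starved(last_run_time, now, threshold=STARVATION_THRESHOLD):
    """Signal a starvation condition when the time since the thread
    last ran on the processor (per its timestamp) exceeds the
    threshold, so the thread can be given special priority."""
    return (now - last_run_time) > threshold
```

A thread flagged this way would be promoted ahead of the usual ranking criteria, as described in the text.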
Predefined selection data 200 includes a system priority 208. System algorithms often need to set policies regarding which threads are to receive favorable treatment in case of contention for system resources. Keeping system priority 208 in the thread state allows the runnable-thread selection logic 108 to obey such system policies and select high-priority threads for running on the processor when they might otherwise be held out of runnable state.
Predefined selection data 200 includes a ready-to-run indicator 210. At higher processor clock frequencies, many threads are supported in hardware and enabled to use the processor when an opportunity arises to start a new execution. Threads that reach a stall point and switch out of active execution 106 need to have state maintained in the first tier runnable thread register states 102 or in the second tier thread storage facility 104 and contend again for run cycles once the stall is resolved. While such threads are stalled and may have state saved only in the second tier thread storage facility 104, they need to be bypassed by the runnable-thread selection logic 108. While a thread is stalled, its ready-to-run indicator 210 is cleared, signaling the selection logic not to spend machine cycles evaluating the particular state for runnability, but to move on to other threads for assessment. The ready-to-run indicator 210 is set again when the stall condition is resolved, making the thread runnable once more.
Ranking of threads 212 is performed using specified criteria set into ranking selection logic 214, for example, set by software. To allow flexibility in selecting a single thread from among several candidate threads in the second tier thread storage facility 104, system software is given the opportunity to establish an evaluation sequence over the predefined selection data 200 thread state metrics. Ranking selection logic 214 evaluates the relative ranking of the threads for each metric, for example, picking the single thread that has the highest overall ranking. For example, the ranking selection logic 214 could be set up by software to pick a ready-to-run thread that has the highest starvation time exceeding the threshold, at the highest system priority, with the lowest long latency count per cycle. If no threads meet the current criteria, the most significant selection attribute would be skipped in the next re-evaluation. If multiple threads meet the current criteria, a random choice could be made.
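The example ranking sequence above (ready-to-run threads only; starved threads first, with the longest starvation winning; then highest system priority; then lowest long-latency count per cycle; random tiebreak) might be sketched as follows. The dictionary keys, threshold value, and sample threads are illustrative assumptions, not taken from the patent's figures:

```python
import random

STARVATION_THRESHOLD = 10_000  # cycles; placeholder value


def select_thread(threads, now, threshold=STARVATION_THRESHOLD):
    """Pick one second-tier candidate using the example criteria:
    ready-to-run only; starved threads first (longest starvation
    wins), then highest system priority, then lowest long-latency
    count per cycle; ties are broken randomly."""
    candidates = [t for t in threads if t["ready_to_run"]]  # bypass stalled threads
    if not candidates:
        return None  # caller would relax the most significant attribute and retry

    def rank(t):
        inactive = now - t["last_run_time"]
        starved = inactive > threshold
        # Tuples compare left to right: starved beats not-starved,
        # then longer starvation, then higher priority, then (negated)
        # lower long-latency count per cycle.
        return (starved, inactive if starved else 0,
                t["priority"], -t["long_latency_per_cycle"])

    best = max(rank(t) for t in candidates)
    top = [t for t in candidates if rank(t) == best]
    return random.choice(top)  # random choice among equally ranked threads


threads = [
    {"id": 1, "ready_to_run": True, "last_run_time": 9_500,
     "priority": 2, "long_latency_per_cycle": 0.1},   # starved at now=20_000
    {"id": 2, "ready_to_run": True, "last_run_time": 15_000,
     "priority": 5, "long_latency_per_cycle": 0.2},   # higher priority, not starved
    {"id": 3, "ready_to_run": False, "last_run_time": 0,
     "priority": 9, "long_latency_per_cycle": 0.0},   # stalled, bypassed
]
chosen = select_thread(threads, now=20_000)
```

Here the starved thread 1 is chosen ahead of the higher-priority thread 2, and the stalled thread 3 is never evaluated, matching the starvation and ready-to-run behaviors described for elements 206 and 210.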
By incorporating the predefined selection data 200 in the saved state for threads supported by processor hardware, runnable thread selection logic 108 determines the most suitable choice for exchanging a currently active thread with an inactive thread. As a runnable thread becomes inactive due to a stall condition, processor resources are managed efficiently by substituting another selection, with optimal operating history, into the active first-tier runnable register states for runnable threads. This provides for peak possible performance by the multithreaded processor, avoiding degradations in efficiency that limit the performance of current state-of-the-art hardware management schemes.
While the present invention has been described with reference to the details of the embodiments of the invention shown in the drawing, these details are not intended to limit the scope of the invention as claimed in the appended claims.
U.S. Patent Documents

Number | Name | Date | Kind |
---|---|---|---|
5692192 | Sudo | Nov 1997 | A |
5771382 | Wang et al. | Jun 1998 | A |
5812811 | Dubey et al. | Sep 1998 | A |
5815727 | Motomura | Sep 1998 | A |
5872963 | Bitar et al. | Feb 1999 | A |
6018759 | Doing et al. | Jan 2000 | A |
6076157 | Borkenhagen et al. | Jun 2000 | A |
6105051 | Borkenhagen et al. | Aug 2000 | A |
6212544 | Borkenhagen et al. | Apr 2001 | B1 |
6223208 | Kiefer et al. | Apr 2001 | B1 |
6418460 | Bitar et al. | Jul 2002 | B1 |
6567839 | Borkenhagen et al. | May 2003 | B1 |
6662204 | Watakabe et al. | Dec 2003 | B1 |
6697935 | Borkenhagen et al. | Feb 2004 | B1 |
6766515 | Bitar et al. | Jul 2004 | B1 |
6785889 | Williams | Aug 2004 | B1 |
6965986 | Kossman et al. | Nov 2005 | B1 |
Foreign Patent Documents

Number | Date | Country |
---|---|---|
61-187116 | Aug 1986 | JP |
08-164867 | Jun 1996 | JP |
09-006007 | Jan 1997 | JP |
09-194346 | Jul 1997 | JP |
10-320759 | Nov 1998 | JP |
2000-333724 | Oct 2000 | JP |
Number | Date | Country |
---|---|---|
20040060052 A1 | Mar 2004 | US |