The present invention relates to microprocessor systems, and more particularly to allocating processor cycles in a block multithreaded processor.
In multithreaded processors, the processor holds the state of several active threads, which can be executed independently. When one of the threads becomes blocked, for example due to a cache miss, another thread can be executed so that processor cycles are not wasted. If thread switching were only performed due to a thread becoming blocked, the percentage of processor cycles allotted to each thread would be nearly impossible to predict. Furthermore, the maximum time between activation of a thread would also be nearly impossible to predict.
Conventional thread switching units have been implemented using timer interrupts and progress-monitoring software in a real-time kernel. In general, the progress-monitoring software can dynamically reconfigure the thread mappings as necessary to maintain the required net allocation of processor cycles to the threads. However, this approach adds software complexity and runtime overhead. Furthermore, the runtime overhead limits the granularity of control that can be obtained by the progress monitoring software. Specifically, as the interrupt timers are set to smaller intervals, the system would spend more time responding to interrupts than actually processing the threads.
Hence, there is a need for a method or system to control thread switching in multithreaded processors so that specified percentages of processor cycles can be allotted to the threads without undue overhead reducing the number of processor cycles available to the threads.
Accordingly, a multithreaded processor in accordance with the present invention implements thread switching in hardware to remove the software overhead of conventional thread switching systems. Furthermore, the present invention includes a novel thread allocation method of selecting a priority thread and executing the priority thread if the priority thread is not blocked. In general, the priority thread is selected independently from thread execution. In one embodiment of the present invention, a set of “maxtime” registers controls the maximum number of cycles a thread remains the “priority thread”.
In one embodiment of the present invention, the thread selection unit includes a priority thread selector configured to generate a priority thread value associated with a priority thread, and an execution thread selector coupled to receive the priority thread value and to generate an execution thread value associated with an execution thread. If the priority thread is not blocked, the execution thread value is set equal to the priority thread value so that the priority thread is executed by the execution unit. However, if the priority thread is blocked, the execution thread value is set to another value so that another thread can be executed. The priority thread selector includes a maxtime register for each active thread, a priority thread counter, a comparator, and a counter. The priority thread counter provides the priority thread value. The maxtime value of each thread is the number of cycles a particular thread can remain the priority thread before another thread is selected as the priority thread. The counter counts the number of cycles since the current priority thread first became the priority thread. The comparator compares the count value from the counter with the maxtime value associated with the priority thread. When the count value matches the maxtime value, the counter is reset and the priority thread counter is incremented.
The present invention will be more fully understood in view of the following description and drawings.
As explained above, conventional multithreaded processors use timer interrupts and progress monitoring software to control thread switching. The software overhead associated with conventional methods is eliminated with the present invention.
In general, a multithreaded processor can have a maximum number of active threads. A priority thread PT and an execution thread ET are selected from among the active threads. For clarity, the embodiments described herein are capable of supporting N active threads numbered from 0 to N-1, which are referred to as thread 0, thread 1, . . . thread N-1. Priority thread PT and execution thread ET each refer to one of the active threads. For clarity, a priority thread value PTV is used herein to refer to priority thread PT. Priority thread value PTV is an integer value between 0 and N-1, inclusive. Thus, priority thread PT is the same as thread PTV. Similarly, an execution thread value ETV is used herein to refer to execution thread ET. Execution thread value ETV is an integer value between 0 and N-1, inclusive. Thus, execution thread ET is the same as thread ETV.
Priority thread selector 110 is used to control the allocation of processor cycles among the threads.
Maxtime registers 220 includes one register for each active thread. Therefore, maxtime registers 220 includes N independent registers. For clarity, each maxtime register is referred to using the notation maxtime register MT_REG[X], where X is an integer from 0 to (N-1), inclusive. Maxtime register MT_REG[X] is associated with thread X. Furthermore, the content of maxtime register MT_REG[X] is referred to using the notation maxtime value MAXTIME[X]. Maxtime registers 220 provides maxtime value MAXTIME[PTV], i.e., the contents of maxtime register MT_REG[PTV], which is associated with thread PTV (priority thread PT). The content of a maxtime register determines how long the associated thread can remain the priority thread, as explained below.
Counter 230 simply counts up from zero and provides a count value COUNT to comparator 240. Counter 230 can be reset via reset signal RESET from comparator 240. Comparator 240 compares maxtime value MAXTIME[PTV] with count value COUNT. If maxtime value MAXTIME[PTV] is equal to count value COUNT, comparator 240 resets counter 230 and increments priority thread counter 210.
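For illustration only, the interaction of priority thread counter 210, counter 230, and comparator 240 can be modeled in software as sketched below. The class name PrioritySelector and the method tick are hypothetical and do not appear in this description. The sketch assumes that the counter is incremented before the comparison in each cycle, that priority thread counter 210 wraps from N-1 back to 0, and that every maxtime value is non-zero; an actual hardware implementation may differ in each respect.

```python
class PrioritySelector:
    """Cycle-level software model of the priority thread selector (illustrative only)."""

    def __init__(self, maxtime):
        self.maxtime = list(maxtime)  # MAXTIME[X] for each thread X (assumed non-zero)
        self.ptv = 0                  # models priority thread counter 210
        self.count = 0                # models counter 230

    def tick(self):
        """Advance one processor cycle and return the PTV in effect during that cycle."""
        current = self.ptv
        self.count += 1
        # Models comparator 240: on a match, reset the counter and advance PTV.
        if self.count == self.maxtime[current]:
            self.count = 0
            self.ptv = (self.ptv + 1) % len(self.maxtime)  # assumed wrap-around
        return current


selector = PrioritySelector([2, 1, 3])
schedule = [selector.tick() for _ in range(12)]
print(schedule)
```

Under these assumptions, thread X remains the priority thread for exactly MAXTIME[X] consecutive cycles before the next thread is selected.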
Thus, each thread X is selected as priority thread PT for maxtime value MAXTIME[X] cycles. By controlling the maxtime values associated with each active thread, the processor cycles can be distributed between the active threads as desired. Specifically, thread X is selected as priority thread PT for 100*MAXTIME[X]/TOTAL_MT percent of the time, where TOTAL_MT is the sum of the N maxtime values. For example, in a system with 4 active threads where the processor cycle allocation should be 10%, 35%, 25%, and 30%, maxtime values MAXTIME[0], MAXTIME[1], MAXTIME[2], and MAXTIME[3] can be assigned values 10, 35, 25, and 30, respectively. Alternatively, the set of maxtime values can be assigned other values in the same ratio, such as 4, 14, 10, and 12.
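The allocation formula above can be checked with a short calculation. The following sketch is illustrative only; the function names priority_share and smallest_equivalent are hypothetical, and at least one maxtime value is assumed to be non-zero. The second function reduces a set of maxtime values by their greatest common divisor, which is one possible way, not stated in this description, to obtain another set of values in the same ratio.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def priority_share(maxtime):
    """Percent of cycles each thread is priority thread: 100*MAXTIME[X]/TOTAL_MT."""
    total_mt = sum(maxtime)
    return [Fraction(100 * m, total_mt) for m in maxtime]


def smallest_equivalent(maxtime):
    """Divide all maxtime values by their greatest common divisor (same ratio)."""
    divisor = reduce(gcd, maxtime)
    return [m // divisor for m in maxtime]


# Both sets of values from the example yield the same percentages.
print(priority_share([10, 35, 25, 30]) == priority_share([4, 14, 10, 12]))
print(smallest_equivalent([10, 35, 25, 30]))
```

Because only the ratio of the maxtime values determines the percentages, the absolute magnitudes remain free to be chosen according to how frequently the priority thread should rotate.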
Another benefit of the present invention is that the interval between the time that thread X is no longer selected as priority thread PT and the time thread X is again selected as priority thread PT can be predetermined through the selection of the maxtime values. Specifically, after thread X is no longer priority thread PT, thread X will become priority thread PT again within TOTAL_MT - MAXTIME[X] cycles. This value also indicates the maximum number of processor cycles that can elapse before a thread that is ready to execute is actually executed, because execution thread selector 120 selects priority thread PT as execution thread ET if priority thread PT is not blocked.
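As an illustration of this bound, the sketch below computes TOTAL_MT - MAXTIME[X] and compares it with the gap measured on an unrolled priority schedule. The function names max_wait_cycles and measured_gap are hypothetical, and the schedule assumes that the threads take the priority role in order 0 to N-1 with non-zero maxtime values.

```python
def max_wait_cycles(maxtime, x):
    """Bound from the text: TOTAL_MT - MAXTIME[X]."""
    return sum(maxtime) - maxtime[x]


def measured_gap(maxtime, x, rounds=3):
    """Longest run of cycles, after thread x has held priority, in which it does not."""
    # Unroll the rotation: thread t is the priority thread for maxtime[t] cycles.
    schedule = [t for _ in range(rounds)
                for t, m in enumerate(maxtime)
                for _ in range(m)]
    longest = gap = 0
    seen = False
    for t in schedule:
        if t == x:
            seen, gap = True, 0
        elif seen:
            gap += 1
            longest = max(longest, gap)
    return longest


for values in ([10, 35, 25, 30], [4, 14, 10, 12]):
    print([max_wait_cycles(values, x) for x in range(len(values))])
```

The two sets of maxtime values give the same percentages but different bounds, so smaller values in the same ratio shorten the interval before a thread regains priority.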
In some embodiments of the present invention, a thread can be assigned a maxtime value of zero. Threads with a maxtime value of zero are never selected as priority thread PT but may be executed when priority thread PT is blocked.
Comparator 340 also compares maxtime value MAXTIME[ITV] to count value COUNT from counter 230. When count value COUNT is equal to maxtime value MAXTIME[ITV], comparator 340 resets counter 230 and increments internal thread counter 310. However, comparator 340 also determines whether maxtime value MAXTIME[ITV] is equal to zero. When maxtime value MAXTIME[ITV] is equal to zero, comparator 340 increments internal thread counter 310 so that thread ITV, which has a maxtime value of zero, cannot become priority thread PT. When maxtime value MAXTIME[ITV] is not equal to zero, comparator 340 causes priority thread register 310 to store internal thread value ITV as priority thread value PTV.
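For illustration only, the zero-skipping behavior described above can be modeled in software as follows. The class name ZeroSkippingSelector is hypothetical. The sketch assumes that skipping a thread whose maxtime value is zero consumes no processor cycle and that at least one maxtime value is non-zero; the description above does not specify either point, and a hardware implementation may spend a cycle on each skipped thread.

```python
class ZeroSkippingSelector:
    """Software model of a priority selector that skips zero-maxtime threads."""

    def __init__(self, maxtime):
        if not any(maxtime):
            raise ValueError("at least one maxtime value must be non-zero")
        self.maxtime = list(maxtime)
        self.itv = 0       # models the internal thread counter
        self.count = 0     # models counter 230
        self.ptv = None    # models the priority thread register
        self._latch()

    def _latch(self):
        # Advance ITV past threads with MAXTIME == 0, then store ITV as PTV.
        while self.maxtime[self.itv] == 0:
            self.itv = (self.itv + 1) % len(self.maxtime)
        self.ptv = self.itv

    def tick(self):
        """Advance one cycle and return the PTV in effect during that cycle."""
        current = self.ptv
        self.count += 1
        if self.count == self.maxtime[self.itv]:
            self.count = 0
            self.itv = (self.itv + 1) % len(self.maxtime)
            self._latch()
        return current


selector = ZeroSkippingSelector([2, 0, 1])
print([selector.tick() for _ in range(6)])
```

In this model thread 1, whose maxtime value is zero, never appears as the priority thread, although it may still be executed when the priority thread is blocked.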
APPENDIX I provides another embodiment of a priority thread selector in accordance with the present invention implemented in pseudocode. One skilled in the art can easily convert the pseudocode to a hardware definition language such as VHDL or Verilog to create a priority thread selector in accordance with the present invention.
As stated above, execution thread selector 120 (
The exact method of selecting a non-priority thread as execution thread ET varies among different embodiments of the present invention. For example, some embodiments may randomly select an unblocked non-priority thread as execution thread ET. Other embodiments may try to select the next closest unblocked thread relative to the priority thread as the execution thread. For example, these embodiments would check thread ((PTV+1) MOD N), then thread ((PTV+2) MOD N), and so on, to find the next unblocked thread, which would be selected as the execution thread.
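The "next closest unblocked thread" search can be sketched as follows. The function name next_unblocked is hypothetical; block is a list holding block value BLOCK[X] for each thread X, where 1 means blocked and 0 means not blocked. Returning None when every non-priority thread is blocked is an assumption of this sketch, not a behavior specified above.

```python
def next_unblocked(ptv, block):
    """Return the first unblocked thread after PTV in circular order, or None."""
    n = len(block)
    for offset in range(1, n):
        candidate = (ptv + offset) % n  # thread ((PTV + offset) MOD N)
        if block[candidate] == 0:
            return candidate
    return None  # assumed result when no non-priority thread is unblocked


# Threads 1, 2, and 3 are blocked, so the search wraps around to thread 0.
print(next_unblocked(2, [0, 1, 1, 1]))
```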
Based on the two basic rules, various conditions could cause a new thread to be selected as execution thread ET. One condition is when priority thread PT was blocked but becomes unblocked. In this situation, priority thread PT should be selected as execution thread ET. Another condition is when a new priority thread is selected. If the new priority thread is not blocked, execution thread selector 120 should select the new priority thread as the execution thread. If the new priority thread is blocked, execution thread selector 120 can either keep the current execution thread as the execution thread or select a new execution thread based on the new priority thread value. If the current execution thread becomes blocked, execution thread selector 120 must select a new execution thread. The exact method of selecting a new execution thread in these situations may differ between embodiments.
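One possible reading of the conditions listed above is sketched below for illustration. The function name select_execution_thread is hypothetical. The sketch makes three assumptions that the description leaves open: when the priority thread is blocked, the current execution thread is kept if it is still unblocked; a replacement is found by scanning circularly from the priority thread; and when every thread is blocked, the execution thread value is left unchanged.

```python
def select_execution_thread(ptv, etv, block):
    """Return the new execution thread value given PTV, ETV, and BLOCK[X] values."""
    n = len(block)
    # The priority thread is executed whenever it is not blocked.
    if block[ptv] == 0:
        return ptv
    # Assumed choice: keep the current execution thread while it can still run.
    if block[etv] == 0:
        return etv
    # Otherwise scan thread ((PTV+1) MOD N), ((PTV+2) MOD N), and so on.
    for offset in range(1, n):
        candidate = (ptv + offset) % n
        if block[candidate] == 0:
            return candidate
    return etv  # assumed behavior when all threads are blocked


# Thread 1 is the new priority thread but is blocked; thread 0 keeps executing.
print(select_execution_thread(ptv=1, etv=0, block=[0, 1, 0]))
```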
Controller 440 follows the two basic rules given above to select a new execution thread value NETV when needed and writes new execution thread value NETV into execution thread register 410.
Controller 440 can be implemented using a state machine 500 as illustrated in
Executing priority thread state E_PT has four transition arrows 510, 520, 530, and 540. Transition arrow 510 which returns to Executing priority thread state E_PT is triggered under the conditions that priority thread value PTV is equal to execution thread value ETV and that block value BLOCK[ETV] is equal to 0. No actions are taken with transition arrow 510. Transition arrow 520 which also returns to Executing priority thread state E_PT is triggered under the conditions that priority thread value PTV is not equal to execution thread value ETV and that block value BLOCK[PTV] is equal to 0. When transition arrow 520 is triggered new execution thread value NETV is set to be equal to priority thread value PTV and stored in execution thread register 410 (
In find unblocked thread state FUBT, controller 440 finds an unblocked thread. The exact method of finding an unblocked thread may vary. For the embodiment of
Executing non-priority thread state E_NPT has three transition arrows 550, 560, and 570. Transition arrow 550 which returns to executing non-priority thread state E_NPT is triggered under the conditions that block value BLOCK[ETV] is equal to zero and block value BLOCK[PTV] is equal to 1. No actions are taken with transition arrow 550. Transition arrow 560 which causes a transition to executing priority thread state E_PT is triggered under the conditions that block value BLOCK[PTV] is equal to 0. When transition arrow 560 is triggered, new execution thread value NETV is set to be equal to priority thread value PTV and stored in execution thread register 410 (
APPENDIX II provides another embodiment of an execution thread selector in accordance with the present invention implemented in pseudocode. Furthermore, APPENDIX III provides yet another embodiment of an execution thread selector in accordance with the present invention implemented in pseudocode. One skilled in the art can easily convert the pseudocode to a hardware definition language such as VHDL or Verilog to create an execution thread selector in accordance with the present invention.
In the various embodiments of this invention, novel structures and methods have been described to fairly allocate processor cycles to various active threads. The various embodiments of the structures and methods of this invention that are described above are illustrative only of the principles of this invention and are not intended to limit the scope of the invention to the particular embodiments described. For example, in view of this disclosure, those skilled in the art can define other priority thread selectors, execution thread selectors, state machines, controllers, comparators, maxtime registers, thread block checkers, and so forth, and use these alternative features to create a method or system according to the principles of this invention. Thus, the invention is limited only by the following claims.
Definitions:
(priority thread starts with Thread 0).
Definitions:
BLOCK[X] indicates whether thread X is blocked. A value of 1 means blocked, 0 means not blocked
Definitions:
BLOCK[X] indicates whether thread X is blocked. A value of 1 means blocked, 0 means not blocked