A key feature of many modern computer systems is multitasking, involving the concurrent execution of many programs comprising one or more threads of execution. Such multitasking commonly involves a task scheduler, which allocates shares (e.g., time slices) of the central processing unit to various threads in turn. Many task schedulers are capable of handling interrupts, in which a thread is permitted to request a larger share of the central processing unit, and of providing preemptive multitasking, in which threads are assigned ordered priorities (e.g., numeric priorities) and the central processing unit is more heavily allocated to higher-priority threads. More recently developed multitasking techniques include the allocation of shares of multiple central processing units, such as in multicore processors having several (e.g., two or four) processing cores that operate asynchronously in parallel to provide processing resources to a potentially large and diverse set of threads of execution.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key factors or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
This disclosure presents task scheduling techniques at a thread task level. A thread of execution may accomplish a particular operation by performing a set of tasks, which may be performed in series in various forms, e.g., in a cyclic or loop order, or in a hierarchical order, or in an arbitrary order. Moreover, the rate of each thread task may be determined by a different factor. For example, a first task may be rate-determined by the central processing unit, such that the allocation of more shares of a processor will more quickly complete the task. A second task may be rate-determined by the speed of a communications bus (e.g., in communicating with memory in a set of memory operations, such as memory compaction.) A third task may be rate-determined by the speed of a network connection (e.g., in transferring data over a network.)
The techniques presented herein involve identifying the rate determinants of various thread tasks, such that a task scheduler may schedule the thread tasks according to the availability of the rate determinants. For example, if a thread task is identified as rate-determined by a network connection, then a task scheduler may allocate the scheduling of this thread task based on the availability of the network connection. On the other hand, if a thread task is identified as rate-determined by the central processing unit, then the task scheduler may allocate more shares of the central processing unit to this thread task. This manner of task scheduling based on the rate determinants of various thread tasks may provide a more incisive allocation of computing resources and may yield a more efficient completion of the thread tasks.
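By way of a non-limiting sketch (the task names and the RateDeterminant enumeration below are illustrative assumptions, not elements of the disclosure), a task scheduler operating in this manner might first group thread tasks by their identified rate determinants, so that each resource's scheduling logic considers only the thread tasks it rate-determines:

```python
from enum import Enum, auto

class RateDeterminant(Enum):
    """Illustrative rate determinant categories for thread tasks."""
    CPU = auto()
    NETWORK = auto()
    BUS = auto()

def route_tasks(tasks):
    """Group thread tasks by their identified rate determinant, so that
    each resource's scheduler considers only the tasks it rate-determines."""
    queues = {rd: [] for rd in RateDeterminant}
    for name, determinant in tasks:
        queues[determinant].append(name)
    return queues

queues = route_tasks([
    ("decode-frame", RateDeterminant.CPU),
    ("fetch-stream", RateDeterminant.NETWORK),
    ("read-block", RateDeterminant.BUS),
    ("compress-block", RateDeterminant.CPU),
])
```

The central processing unit scheduler would then draw from the CPU queue, while the network-rate-determined tasks would be scheduled against the availability of the network connection.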
To the accomplishment of the foregoing and related ends, the following description and annexed drawings set forth certain illustrative aspects and implementations. These are indicative of but a few of the various ways in which one or more aspects may be employed. Other aspects, advantages, and novel features of the disclosure will become apparent from the following detailed description when considered in conjunction with the annexed drawings.
The claimed subject matter is now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the claimed subject matter. It may be evident, however, that the claimed subject matter may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing the claimed subject matter.
Computer systems may be tasked with executing several processes in parallel, where each process comprises one or more threads of execution. For example, a computer system may concurrently handle a word processor application, comprising a user interface thread for displaying the application and handling user input, and a data handling thread for managing access to the word-processor data; a media player application, comprising a user interface thread for displaying the media player window and handling user input, and a media playing thread for rendering increments of the media through multimedia devices; and one or more operating system processes having one or more threads for managing various hardware and software components of the operating environment.
The concurrent processing of various threads is handled by a task scheduler, which allocates shares of the central processing unit to the threads. The task schedulers referred to herein are generally used for scheduling processing time among threads of execution in a computer system, but at least some aspects of this disclosure and the techniques presented herein may relate to other forms of task schedulers. Some simple (“time-slicing”) task schedulers may allocate shares of the central processing unit to all threads in sequence. However, these task schedulers may not distinguish between active threads that can advantageously utilize the central processing unit and passive threads that are awaiting some other event (e.g., through polling), nor between higher-priority processes (e.g., system processes) and lower-priority processes (e.g., background workers), and may yield inefficient computing allocation and reduced system performance. Other task schedulers may preemptively allocate the central processing unit, such that more active or higher-priority threads receive larger shares of processing resources than less active or lower-priority threads.
The media player application 10 and the data compression algorithm 12 are both very active applications, and both involve steady consumption of computing resources, including central processing unit power. A preemptive task scheduler may therefore attempt to facilitate the processing of these applications by allocating a significant amount of computing resources to the respective threads of execution. Moreover, because the threads are both user-level applications, the threads are likely to have the same thread priority, so neither thread may preempt the other thread. As a result, the computing resources are likely to be evenly distributed during the phases of these applications that involve significant central processing unit utilization.
As a result of the frequent wait cycles involved in these algorithms while input/output operations are performed, the task schedule 20 of
While this problem may resemble the recognized inefficiencies of time-slicing task schedulers, the cause of the inefficiencies in this task schedule 20 is significantly different. Time-slicing task schedulers suffer from low performance by remaining insensitive to the comparative priorities of the threads, and by failing to reallocate resources during long thread wait states. By contrast, in this example, each thread cycles very quickly between central processing unit intensive tasks and tasks involving short-term wait operations, e.g., memory, network, and storage device accesses. Such alternation between central processing unit usage and short-term wait operations may be quite common for many applications. A preemptive task scheduler may be capable of detecting a wait state in a thread, but it may be more difficult for the task scheduler to determine why such a rapidly cycling thread is waiting at a particular instant, and therefore whether allocating processing time is likely to generate significant progress in the thread. For instance, a preemptive task scheduler may endeavor to induce a wait state for any instruction involving a memory, storage, or network access. However, the preemptive task scheduler cannot predict whether the wait state will be short (e.g., where only a small amount of data is to be read via a high-performance bus, or where a memory access can be read from a local cache) or long (e.g., where a significant amount of data is to be read from a low-bandwidth device, or where a network access involves high latency.) Accordingly, task schedulers that attempt to detect waiting based on the nature of the executing instructions may be unable to produce significant efficiency gains.
Moreover, a very sensitive preemptive task scheduler that acutely analyzes the status of various threads and makes adjustments may actually diminish performance; the acute analysis might be unable to produce significant efficiency gains, yet may induce additional inefficiencies by diverting computing resources, including central processing unit time segments, away from threads that are performing useful work.
Accordingly, the inefficiency evident in the task schedule 20 of
As a second example, a compiler may be able to identify the rate determinant of an instruction in the context of the preceding operations (e.g., whether an object accessed in memory was recently accessed by a previous instruction, which may increase the probability of a cache hit that reduces the significance of the memory access as a rate determinant.) The compiler may therefore be able to determine the likely rate determinant for a block of instructions, and may be able to specify the rate determinant of the instruction block for use during task scheduling. The compiler may be capable of more accurate predictions than the task scheduler because the compiler is not compelled to make a rapid determination, and because the compiler can utilize the context of the instruction block in view of the preceding instructions and the operating context of the instruction block.
Accordingly, this disclosure presents some techniques for specifying one or more likely rate determinants for a particular computing task that may be operating within a thread (referred to herein as a “thread task”), and for making task scheduling determinations based on the rate determinant information. By providing such information and utilizing such information for task scheduling, a computer system may be able to reduce some inefficiencies of less informed task scheduling techniques, and may therefore improve the allocation of computing resources among threads performing various types of tasks.
In view of the rate determinant information included in the algorithms, an improved task schedule 40 may be devised, such as shown in
Once the thread tasks that are rate-determined by the communications bus are complete, both the media player application 30 and the data compression algorithm 32 move on to tasks that are rate-determined by the central processing unit (at 0.05 seconds and 0.09 seconds, respectively.) In response, the task scheduler re-prioritizes each thread upon initiating the thread tasks that are rate-determined by the central processing unit, and allocates significant shares of the central processing unit to the threads during the performance of these tasks. Conversely, upon completion of the compression and decompression thread tasks, the media player application 30 proceeds (at 0.08 seconds) to another task that is rate-determined by the communications bus, and the data compression algorithm 32 proceeds (at 0.15 seconds) to a task that is rate-determined by the network communications rate, and the task scheduler again temporarily de-prioritizes these threads and spends spare cycles on other tasks or in an idle state.
The improved task schedule 40 of
The techniques described herein and illustrated in the contrasting examples of
The techniques described herein also relate to task scheduling techniques that include a consideration of the rate determinants of the thread tasks currently performed by the threads to be scheduled.
Still another embodiment involves a computer-readable medium comprising processor-executable instructions configured to apply the techniques presented herein. An exemplary computer-readable medium that may be devised in these ways is illustrated in
Many aspects of the techniques described herein may be devised with many variations by those of ordinary skill in the art while implementing the techniques described herein. Such variations may be available for aspects of both the techniques for identifying rate determinants for various thread tasks, and the techniques for scheduling threads and providing resources thereto in view of the identified rate determinants of the thread tasks. Moreover, some variations of such aspects may present additional advantages and/or reduce disadvantages with respect to other variations of these and other techniques.
A first aspect that may vary among embodiments of these techniques relates to the types of rate determinants that may be identified for a thread task. In the context of task scheduling, it may be particularly helpful to determine whether a thread task is significantly rate-determined by a central processing unit; this determination may inform the task scheduler as to whether or not to allocate significant central processing unit time segments to a thread task. However, other rate determinants may be identified for a thread task, where the identification of such rate determinants may facilitate the allocation of the resources represented by a thread task. For example, two applications may share a network resource: a network streaming media application, which significantly depends on a steady throughput of network data, and a high-performance computation application (e.g., a complex mathematics processor) that only uses the network to send progress updates to a network log. The network streaming media application (in particular, the media buffering component thereof) is rate-determined by the network, because most of the computing work involves requesting and receiving new data; by contrast, the high-performance computation application is rate-determined by the central processing unit(s), and its network usage does not significantly determine its rate of progress. Accordingly, the thread tasks comprising the network streaming media application may be associated with a communications network rate determinant indicator, while the thread tasks comprising the high-performance computation application may be associated with a central processing unit rate determinant indicator. The task scheduler may therefore prioritize the high-performance computation application, while the network communication device may prioritize the network streaming media application for network communication.
Other types of rate determinants may also be advantageously associated with thread tasks. For instance, a graphics processing unit may be identified as a rate determinant (e.g., such that a high-detail 3D application running in a window may be prioritized for graphics resources over a relatively static dialog window.) Other advantageous rate determinants include a communications bus (e.g., a data compression algorithm that cannot perform any compression or decompression work until a data block is read from a hard disk drive may be prioritized over a thread that writes files to the hard disk drive in the background); a device, such as a peripheral input or output device (e.g., an application that is unable to proceed without user input may be prioritized over an application that autonomously performs meaningful work but may be altered by user input); and a second thread task (e.g., an application that is blocked in order to synchronize with a task being performed by the second thread may be prioritized over an application that is configured simply to monitor the progress of the second thread.) Still other types of rate determinants may be devised and included in the set of rate determinants that may be associated with various thread tasks. However, it may be detrimental to identify too many types of rate determinants, because the process of identifying and reacting to identified rate determinants of various threads may become so complex as to introduce some inefficiencies.
A second aspect that may vary among embodiments of these techniques relates to the manner of identifying a thread task as associated with a particular rate determinant.
The exemplary source code 90 of
In this example 90, the three instructions comprising the worker-thread loop within the Render method 98 represent three different thread tasks, each of which has a particular rate determinant: the media stream read and the rendering are rate-determined by the communications bus of the system, and the decoding is rate-determined by the central processing unit. Accordingly, these portions of the Render method 98 of the media player class 92 are identified with rate determinant attributes 100, comprising a “using” statement specifying the type of rate determinant (which may be specified, e.g., as a selection among an enumeration, such as the RateDeterminant enumeration included in this exemplary source code 90.) These rate determinant attributes 100 may be included in the compiled code of the media player application, and the thread executing the media player application may (e.g.) alter its rate determinant status based on the attributes associated with the thread task currently being executed by the thread. These rate determinant attributes 100 may additionally be evaluated as synchronization mechanisms, i.e., as requests for a share of the resource comprising the rate determinant. For example, in addition to indicating that the various thread tasks are rate-dependent on these resources, these rate determinant attributes 100 may be interpreted as requests to suspend the processing of the thread altogether if a share of the resource is unavailable, and to unsuspend the thread task once a share of the resource later becomes available.
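A rough analogue of such rate determinant attributes may be sketched in Python (the decorator, enumeration, and method names below are illustrative assumptions; the disclosure's exemplary source code 90 uses a C#-style "using" statement and attribute syntax instead): each thread task function is tagged with its rate determinant so that a scheduler or runtime may read the attribute when the task is about to execute:

```python
from enum import Enum, auto

class RateDeterminant(Enum):
    """Illustrative rate determinant types, analogous to the enumeration
    described in the exemplary source code."""
    CPU = auto()
    BUS = auto()

def rate_determined_by(determinant):
    """Attach a rate determinant attribute to a thread task function,
    where a scheduler may later inspect it."""
    def mark(fn):
        fn.rate_determinant = determinant
        return fn
    return mark

@rate_determined_by(RateDeterminant.BUS)
def read_media_block():
    # Rate-determined by the communications bus (e.g., a stream read).
    return b"..."

@rate_determined_by(RateDeterminant.CPU)
def decode_media_block(block):
    # Rate-determined by the central processing unit (e.g., decoding).
    return block.upper()
```

A thread executing these tagged tasks could update its rate determinant status by reading `fn.rate_determinant` before invoking each task, in the manner described above.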
The association of instruction blocks with various rate determinants, and the generation of rate determinant attributes associated with such instruction blocks and indicating same, may also be performed by various types of automated methods. As one example, the identification may be made (and the association may be performed) by an integrated development environment, such as a programming environment that analyzes instructions received by a programmer and logically determines the rate determinants of various blocks of instructions. The identification may also be made by a compiler, which may perform a similar analysis and may include supplemental instructions or metadata, such as in an assembly manifest, comprising rate determinant attributes associated with various instruction blocks. These automated techniques may be more capable of making such identifications than a task scheduler, because such integrated development environments and compilers may function in a less time-sensitive manner than a task scheduler, and also because these components may benefit from a more direct analysis of the instructions preceding and following the invocation of the instruction block (i.e., the operating context of the instruction block, which may indicate its likely rate determinant.)
Still other components may be capable of identifying the rate determinants of various instruction blocks during the operation of the thread tasks. As one example, a code profiler may be utilized to monitor the flow of execution through the compiled application, and to identify rate determinants of various instruction blocks comprising various thread tasks. Alternatively or additionally, a resource monitor may be utilized to monitor the resources accessed by the thread during the performance of various instruction blocks comprising various thread tasks (e.g., by monitoring the usage of the central processing unit during the execution of an instruction block.) Hence, the identification may be performed by analyzing the causes of the rate determination (i.e., the instructions performed by the thread task) and/or by the effects of the rate determination (i.e., the particular resources used while performing the instructions.) Many such techniques for identifying rate determinants of various thread tasks may be devised by those of ordinary skill in the art while implementing the techniques discussed herein.
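One possible measurement heuristic for such a resource monitor (a sketch under stated assumptions, not the disclosure's mechanism; the threshold value is arbitrary) compares the processor time consumed by an instruction block to the wall-clock time that elapses during its execution: a block that consumes little processor time while much wall-clock time elapses is likely rate-determined by some resource other than the central processing unit:

```python
import time

def classify_block(fn, cpu_fraction_threshold=0.5):
    """Run an instruction block and infer whether the central processing
    unit is its likely rate determinant from the ratio of processor time
    consumed to wall-clock time elapsed."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu_used = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    if elapsed == 0:
        return "cpu"  # too fast to measure; assume CPU-bound
    return "cpu" if cpu_used / elapsed >= cpu_fraction_threshold else "other"
```

For example, a block that merely sleeps (standing in for a network or device wait) would be classified as "other", while a tight computation loop would be classified as "cpu".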
A third aspect that may vary among embodiments of these techniques relates to the manner of associating rate determinants with various thread tasks. As one example, such as described in the example of
A fourth aspect that may vary among embodiments relates to the utilization of the rate determinant information (e.g., rate determinant identifiers associated with thread tasks, and rate determinant attributes associated with instruction blocks comprising task threads.) Computer systems may advantageously utilize this information in many ways. As one example, a task scheduler may allocate shares of a central processing unit in view of whether an executing thread task is rate-determined by the central processing unit. If the thread task is not rate-determined by the central processing unit, the task scheduler may allocate only a small segment of processing time to prevent the thread from starving or becoming suspended. For instance, if a thread task is receiving a file from a network and is rate-determined by the network, the task scheduler may allocate for the thread task a small amount of processing time, so that the thread task can determine whether the file transfer is complete and monitor the status of a download buffer. However, if the central processing unit is a rate determinant for the thread task, then the task scheduler may allocate more processing time for the thread task in order to achieve improved performance.
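As a minimal sketch of this allocation policy (the quantum sizes below are arbitrary illustrative values, not taken from the disclosure), the task scheduler might grant a full time quantum only when the central processing unit is a rate determinant for the thread task, and a token quantum otherwise:

```python
def cpu_quantum_ms(task_rate_determinant, full_quantum=20, token_quantum=1):
    """Choose a processing time quantum for a thread task: a full quantum
    when the CPU rate-determines the task, and a token quantum otherwise,
    so the task can still poll (e.g., check whether a file transfer has
    completed or monitor a download buffer) without starving."""
    if task_rate_determinant == "cpu":
        return full_quantum
    return token_quantum
```

A thread task receiving a file from a network would thus receive `cpu_quantum_ms("network")`, i.e., only the token quantum, while a decoding task would receive the full quantum.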
Some additional techniques may be advantageous for allocating shares of a processing unit (such as a central processing unit or a graphics processing unit) to thread tasks that are rate-determined by such processing units. As one example, in a multiprocessing system, such as a multiprocessor or a multicore environment, a thread task that is rate-determined by such processing units may be assigned an affinity of a processor for the thread task, such that the thread task is preferentially run on a particular processor or processor core. This technique may improve the utilization of a processor memory cache (e.g., by improving the odds that memory accessed by the thread task will remain in the cache) and/or reduce the incidence of context-switching (e.g., by dedicating an unallocated processor to the thread task for uninterrupted performance.) As another example, a task scheduler that identifies a thread task as rate-dependent on a processor may endeavor to provide more contiguous shares of the processor for the thread task, thereby permitting the thread task to run on the processor for longer periods without interruption, which may also improve cache utilization and reduce the incidence of context switching.
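A sketch of the affinity assignment described above (the round-robin policy here is one possible choice among many, and only the task-to-core mapping is computed; on Linux, such a mapping might be applied with a facility such as `os.sched_setaffinity`):

```python
def assign_affinity(cpu_bound_tasks, core_count):
    """Pin each CPU-rate-determined thread task to a preferred core,
    distributing tasks round-robin so that cores are evenly loaded and
    each task tends to re-run on the core holding its cached data,
    reducing context switching and improving cache utilization."""
    return {task: i % core_count for i, task in enumerate(cpu_bound_tasks)}
```

On a two-core system, three CPU-rate-determined tasks would be mapped to cores 0, 1, and 0 respectively, with each task preferentially rescheduled on its assigned core.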
More generally, a computer system may utilize a rate determinant with which a thread task is associated in order to manage the allocation of resources shared by many such threads. When a component of such a computer system (e.g., a resource manager) determines that a thread task is rate-determined by a particular resource, such as a communications network, a graphics processor, or a device such as a tape backup system, the resource manager may query the resource for an unallocated resource share (e.g., a portion of the bandwidth of a network connection.) Upon identifying a free share of the resource, the resource manager may allocate the resource share to the thread task that is rate-determined by the resource. As with the allocation of processing time, if a thread task utilizes a resource but is not identified as being rate-determined by the resource, the resource manager may still allocate a smaller share of the resource to the thread task, but may reserve other shares of the resource for use by thread tasks that may be rate-dependent on the resource. Additionally, upon detecting the completion, failure, or user termination of a thread task to which a share of a resource has been allocated, the resource manager may deallocate the resource share allocated to the thread task.
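The resource manager described above may be sketched as follows (a hedged illustration; the share accounting is simplified to integer share counts, and the class and method names are assumptions):

```python
class ResourceManager:
    """Track allocation of a resource's shares (e.g., units of network
    bandwidth) to thread tasks that are rate-determined by the resource."""

    def __init__(self, total_shares):
        self.free = total_shares
        self.allocated = {}

    def allocate(self, task, shares):
        """Grant shares if available; return the number actually granted.
        A task not rate-determined by the resource might request fewer
        shares, reserving the rest for rate-dependent tasks."""
        granted = min(shares, self.free)
        if granted:
            self.free -= granted
            self.allocated[task] = self.allocated.get(task, 0) + granted
        return granted

    def deallocate(self, task):
        """Reclaim a task's shares upon completion, failure, or user
        termination of the task."""
        self.free += self.allocated.pop(task, 0)
```

For instance, a streaming task might be granted six of ten bandwidth shares, a second task the remaining four, and the first six shares reclaimed when the streaming task completes.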
Another variation of these resource management techniques pertains to techniques for handling a failure to identify a share of a resource that comprises a rate determinant for a thread task. For example, a thread task may be rate-dependent on a communications network, but the resource manager may have allocated all shares (i.e., all of the bandwidth) of the communications network to one or more other thread tasks that are also rate-dependent on the communications network. In this scenario, the resource manager may attempt to redistribute the allocated shares of the resource among the rate-dependent thread tasks to reclaim some shares that may be allocated to the new thread task that is rate-dependent on the resource. This technique may permit the operation of many resource-dependent thread tasks in parallel, but the parallel accesses may cause inefficiency in the use of the resource. For example, if seven thread tasks are simultaneously accessing a hard disk drive, the hard disk drive controller may spend a great deal of time jumping to various sectors of the hard disk platter in order to read or write small amounts of data, and the frequent hard drive head relocations may create additional inefficiency in the form of reduced data throughput.
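The redistribution described above may be sketched as an equal split (one illustrative policy; the disclosure does not mandate a particular redistribution scheme): when a new rate-dependent thread task arrives and all shares are allocated, existing allocations are shrunk so that every rate-dependent task holds an approximately equal portion:

```python
def redistribute(allocations, new_task, total_shares):
    """Rebalance a fully allocated resource so that a newly arriving
    rate-dependent thread task also receives shares; each task receives
    an equal split, with any remainder going to the earliest tasks."""
    tasks = list(allocations) + [new_task]
    base, extra = divmod(total_shares, len(tasks))
    return {t: base + (1 if i < extra else 0) for i, t in enumerate(tasks)}
```

As the text notes, such parallel sharing may itself reduce efficiency (e.g., through frequent hard drive head relocations), which motivates the suspension-based alternative below.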
An alternative technique for handling unavailable shares of a rate-determining resource involves suspending the thread task to which a share cannot be allocated. The suspension of the thread task may be temporary, and perhaps short-term, and may permit some or all of the thread tasks to which the resource is allocated to complete, so that some allocated shares of the resource may be reclaimed by the resource manager. Upon subsequently detecting availability of an unallocated resource share of the resource, the computer system may unsuspend a suspended thread task that is rate-determined by the resource, and may allocate one or more of the unallocated resource shares to the unsuspended thread task. This technique induces a delay in the performance of the suspended thread task, which may starve for lack of resources (e.g., a protracted suspension of a thread task that is rate-determined by an otherwise allocated communications network may end up causing a timeout and the closing of a connection between the suspended thread and a remote server.) However, this technique may achieve an overall more efficient allocation of the shared resource by reducing the amount of context-switching to be performed by the resource. As an intermediate alternative, various thread tasks that are rate-dependent on a resource with limited availability of unallocated shares may be granted temporary allocations of such shares, such that a thread task awaiting a share of a rate-determining resource may be unsuspended, allocated some shares of the resource for a short time, and resuspended, whereupon the shares of the resource allocated to the thread task may be temporarily allocated to the next suspended thread task that is rate-dependent on the same resource. This technique may permit some limited sharing of a heavily allocated resource in order to reduce the incidence of starvation among the suspended thread tasks that are rate-determined by the shared resource. 
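The suspension technique may be sketched as follows (an illustrative single-share-per-task model; the class and method names are assumptions): a thread task that cannot obtain a share of its rate-determining resource is suspended, and when a running task releases its share, the share is handed directly to a suspended task, which is unsuspended:

```python
class SuspendingAllocator:
    """Suspend thread tasks for which no share of the rate-determining
    resource is free; on release of a share, wake one suspended task and
    hand the share straight to it."""

    def __init__(self, total_shares):
        self.free = total_shares
        self.suspended = []   # tasks awaiting a share, oldest first
        self.running = set()  # tasks currently holding a share

    def request(self, task):
        if self.free > 0:
            self.free -= 1
            self.running.add(task)
            return "running"
        self.suspended.append(task)
        return "suspended"

    def release(self, task):
        self.running.discard(task)
        if self.suspended:
            woken = self.suspended.pop(0)  # unsuspend the oldest waiter
            self.running.add(woken)
        else:
            self.free += 1
```

With one share, a second requesting task is suspended until the first releases the resource, at which point the second task is unsuspended and allocated the share.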
Many variations in the management of thread tasks suspended due to the unavailability of a rate-determining resource may be devised by those of ordinary skill in the art while implementing the techniques discussed herein.
An additional variation of this technique involves the association with the resource of a suspended thread task queue that is configured to hold references to the thread tasks that have been suspended pending access to the resource. In this variation, upon determining an unavailability of shares of a resource by which a thread task may be rate-determined, and upon suspending the thread task pending the subsequent availability of shares of the resource, the computer system may place the thread task in the suspended thread task queue that is associated with the resource. The computer system therefore forms a set of thread tasks that are waiting on the completely allocated resource, such that when one or more shares become available (e.g., when a thread task utilizing the resource completes, fails, or is terminated by the user), the computer system may unsuspend a suspended thread task within the suspended thread task queue associated with the resource and remove the unsuspended thread task from the suspended thread task queue, while also allocating at least one share of the rate-determining resource to the newly unsuspended thread task. The selection of a thread task from the suspended thread task queue may be performed in many ways, such as on a first-in-first-out (FIFO) basis.
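The per-resource suspended thread task queue may be sketched as follows (an illustrative FIFO realization; the class and method names are assumptions):

```python
from collections import deque

class SuspendedTaskQueues:
    """Associate a FIFO queue of suspended thread tasks with each fully
    allocated resource, forming the per-resource wait set described above."""

    def __init__(self):
        self.queues = {}

    def suspend(self, resource, task):
        """Place a thread task in the queue of the resource on whose
        shares it is waiting."""
        self.queues.setdefault(resource, deque()).append(task)

    def unsuspend_next(self, resource):
        """When a share of the resource becomes available, wake the task
        that has waited longest (first-in-first-out) and remove it from
        the queue; return None if no task is waiting."""
        q = self.queues.get(resource)
        return q.popleft() if q else None
```

Tasks suspended on the network, for example, are then unsuspended in arrival order as bandwidth shares are reclaimed.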
As a further refinement of the queuing of suspended thread tasks associated with a rate-determining resource, the suspended thread task queue may comprise a priority queue, such as a heap, based on an ordered priority indicator that is used to order the selection of suspended thread tasks from the suspended thread task priority queue. For instance, where the thread tasks are hosted by threads having ordered priorities (e.g., numeric priorities where higher numbers indicate higher thread priority), the priority queue may be structured such that the thread task associated with the highest priority among all such threads is selected first for unsuspending when a share of the rate-determining resource becomes available. Many other techniques for suspending and unsuspending thread tasks in response to the dynamic availability of a rate-determining resource may be devised by those of ordinary skill in the art while implementing the techniques provided herein.
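The priority-queue refinement may be sketched with a binary heap (an illustrative realization using Python's `heapq`; because that structure is a min-heap, priorities are negated so that higher numeric priorities are unsuspended first, and an insertion counter preserves FIFO order among equal priorities):

```python
import heapq

class SuspendedTaskPriorityQueue:
    """Heap-based suspended thread task queue: the task hosted by the
    highest-priority thread is unsuspended first when a share of the
    rate-determining resource becomes available."""

    def __init__(self):
        self._heap = []
        self._order = 0  # tie-breaker preserving FIFO among equal priorities

    def suspend(self, task, priority):
        # heapq is a min-heap, so negate to pop the highest priority first
        heapq.heappush(self._heap, (-priority, self._order, task))
        self._order += 1

    def unsuspend_next(self):
        """Return the highest-priority suspended task, or None if empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None
```

A suspended system-priority task would thus be unsuspended before user- or background-priority tasks waiting on the same resource.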
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
As used in this application, the terms “component,” “module,” “system”, “interface”, and the like are generally intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
Furthermore, the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed subject matter. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. For example, computer readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive . . . ). Additionally it may be appreciated that a carrier wave can be employed to carry computer-readable electronic data such as those used in transmitting and receiving electronic mail or in accessing a network such as the Internet or a local area network (LAN). Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.
Moreover, the word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as advantageous over other aspects or designs. Rather, use of the word exemplary is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A; X employs B; or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims may generally be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form.
Also, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art based upon a reading and understanding of this specification and the annexed drawings. The disclosure includes all such modifications and alterations and is limited only by the scope of the following claims. In particular regard to the various functions performed by the above described components (e.g., elements, resources, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., that is functionally equivalent), even though not structurally equivalent to the disclosed structure which performs the function in the herein illustrated exemplary implementations of the disclosure. In addition, while a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms “includes”, “having”, “has”, “with”, or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising.”