This disclosure relates generally to computer software, and more particularly to systems and methods for caching and reusing threads in a multi-threaded execution environment.
Modern computer systems conventionally include the ability to execute applications that include multiple threads that may execute simultaneously. While some applications may statically allocate a set of executing threads, it is common for applications to dynamically create and destroy threads as processing demands change. This thread creation and destruction, however, requires significant processing time and additionally requires memory allocation operations in both application and operating system kernel memory. Therefore, dynamic thread management may introduce significant latencies, leading to scalability problems as concurrency increases and giving rise to a need to manage thread creation and destruction efficiently. What is needed are techniques that mitigate these creation and destruction latencies to improve scalability in such applications.
Methods, techniques and systems for providing a thread cache are described. A computer may implement a thread manager including a process-local cache of standby threads for an application. Upon request to create a thread for the application, the thread manager may select a standby thread from the process-local cache to create the requested thread, initialize thread-local storage elements for the selected thread and schedule the thread for execution. Upon request to terminate a thread of the application, the thread manager may place the thread in an unscheduled state and add the thread to the process-local cache of standby threads. The thread manager may also add standby threads to, and remove standby threads from, the process-local cache in the event the thread manager determines that the number of standby threads in the process-local cache lies outside a range defined by upper and lower thresholds.
While the disclosure is described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that the disclosure is not limited to embodiments or drawings described. It should be understood that the drawings and detailed description hereto are not intended to limit the disclosure to the particular form disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. Any headings used herein are for organizational purposes only and are not meant to limit the scope of the description or the claims. As used herein, the word “may” is used in a permissive sense (i.e., meaning having the potential to) rather than the mandatory sense (i.e. meaning must). Similarly, the words “include”, “including”, and “includes” mean including, but not limited to.
Various units, circuits, or other components may be described as “configured to” perform a task or tasks. In such contexts, “configured to” is a broad recitation of structure generally meaning “having circuitry that” performs the task or tasks during operation. As such, the unit/circuit/component can be configured to perform the task even when the unit/circuit/component is not currently on. In general, the circuitry that forms the structure corresponding to “configured to” may include hardware circuits. Similarly, various units/circuits/components may be described as performing a task or tasks, for convenience in the description. Such descriptions should be interpreted as including the phrase “configured to.” Reciting a unit/circuit/component that is configured to perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) interpretation for that unit/circuit/component.
This specification includes references to “one embodiment” or “an embodiment.” The appearances of the phrases “in one embodiment” or “in an embodiment” do not necessarily refer to the same embodiment, although embodiments that include any combination of the features are generally contemplated, unless expressly disclaimed herein. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure.
Modern computer systems conventionally include the ability to execute applications that include multiple threads that may execute simultaneously. While some applications may statically allocate a set of executing threads, it is common for applications to dynamically create and destroy threads as processing demands change. Creating and destroying threads, however, may incur significant processing time, leading to high latency even absent concurrency and to scalability problems as concurrency increases. To address these performance issues, a process-local cache of threads may be used. Specifically, instead of destroying terminated threads, the terminated threads may be cached for reuse in the context of subsequent thread creation requests. With caching, the cost of creating a new thread may drop by as much as an order of magnitude compared to conventional approaches.
To mitigate the costs of thread creation and destruction, methods, techniques and systems for providing a thread cache are described below. A computer may implement a thread manager including a process-local cache of standby threads for an application. Upon request to create a thread for the application, the thread manager may select a standby thread from the process-local cache to create the requested thread, initialize thread-local storage elements for the selected thread and schedule the thread for execution. Upon request to terminate a thread of the application, the thread manager may place the thread in an unscheduled state and add the thread to the process-local cache of standby threads. The thread manager may also add standby threads to, and remove standby threads from, the process-local cache in the event the thread manager determines that the number of standby threads in the process-local cache lies outside a range defined by upper and lower thresholds. Threads that have logically terminated are thus retained for subsequent reuse, sparing the cost of creating new threads in the future.
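The create/terminate flow described above can be sketched as follows. This is an illustrative Python analogue, not the disclosed implementation: the class and method names are hypothetical, and a "standby" thread here simply parks on an event, whereas the disclosure contemplates placing a thread in an unscheduled state through a kernel-facing API such as the API 135.

```python
import threading
from collections import deque

class _StandbyThread:
    """A worker that parks between assignments instead of exiting."""
    def __init__(self):
        self._work_ready = threading.Event()
        self._done = threading.Event()
        self._target = None
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            self._work_ready.wait()       # parked ("unscheduled") while idle
            self._work_ready.clear()
            self._target()                # run the application's entry point
            self._target = None
            self._done.set()              # logical termination: back to standby

    def dispatch(self, target):
        self._done.clear()
        self._target = target
        self._work_ready.set()            # "schedule" the standby thread

    def join(self):
        self._done.wait()

class ThreadManager:
    """Process-local cache of standby threads (hypothetical analogue of API 170)."""
    def __init__(self):
        self._cache = deque()             # LIFO stack of standby threads
        self._lock = threading.Lock()

    def create_thread(self, target):
        with self._lock:
            # Reuse a cached standby thread if one exists; otherwise create.
            worker = self._cache.pop() if self._cache else _StandbyThread()
        worker.dispatch(target)
        return worker

    def terminate_thread(self, worker):
        worker.join()                     # wait for logical termination
        with self._lock:
            self._cache.append(worker)    # retain in standby state for reuse
```

In this sketch, terminating a thread never destroys it; the second `create_thread` call after a termination hands back the same underlying worker, which is the latency saving the disclosure describes.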
Thread manager 150 may include a thread cache 160 and an application programming interface (API) 170 to provide thread management for the application 180. The thread cache 160 of the thread manager 150 may further include a dynamically varying number of standby threads, each including an associated standby thread data structure 165.
The thread manager 150 may interface with an operating system kernel 130 through a separate API 135 to manage threads including the standby threads in the thread cache. Threads created through the API 135 may have associated kernel-mode thread data structures 140 used by the operating system kernel 130 for scheduling and executing individual ones of the threads. The kernel-mode thread data structures 140 may further include thread storage, including thread stacks, as well as memory used to save and restore processor and thread state in various embodiments.
The application 180 may include a dynamically varying number of application threads 190. The application 180 may manage the application threads 190 using the thread manager 150 via the API 170. For example, the application 180 may request, through the API 170, the creation of a thread or the termination of a thread.
The thread manager 150 may implement thread creation in response to a request received via the API 170. In some embodiments, the thread manager may allocate a standby thread from the thread cache 160 to create the requested thread. In the event a standby thread exists, the thread manager 150 may remove the standby thread from the thread cache 160, may initialize data structure(s) 165 of the standby thread, and schedule the thread for execution using the API 135. The standby thread data structure(s) 165 may include thread-local storage and a thread stack, in various embodiments.
The thread cache 160 may be implemented in any number of ways in various embodiments. For example, in some embodiments the thread cache may be implemented as a linked list of standby threads, where the linked list implements a stack, or last-in-first-out (LIFO) list, of standby threads. To remove a standby thread from the stack, the thread manager 150 may remove, or "pop", a thread at the head of the list and update the head of the list to identify the next standby thread in the list. To add a new standby thread to the stack, the thread manager 150 may insert, or "push", the new standby thread onto the head of the stack. This stack implementation, however, is only one possible embodiment of the thread cache 160 and is not intended to be limiting, as any number of thread cache implementations may be envisioned.
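The push and pop operations on such a LIFO list can be sketched as follows (illustrative Python; `StandbyNode` and `StandbyStack` are hypothetical names, with each node standing in for a record that references a standby thread data structure 165):

```python
class StandbyNode:
    """Linked-list node referencing one standby thread."""
    def __init__(self, thread_id):
        self.thread_id = thread_id
        self.next = None

class StandbyStack:
    """LIFO list of standby threads: push on terminate, pop on create."""
    def __init__(self):
        self.head = None

    def push(self, node):
        node.next = self.head      # new node points at the old head
        self.head = node           # head now identifies the newest standby thread

    def pop(self):
        node = self.head
        if node is not None:
            self.head = node.next  # head moves to the next standby thread
            node.next = None
        return node                # None signals an empty cache
```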
Should no standby thread exist in the thread cache 160, the thread manager 150 may create a thread using the API 135. Once created, the thread manager 150 may add the newly created thread as a standby thread to the thread cache 160, in some embodiments, or it may use the newly created thread to satisfy the thread creation request, in other embodiments. Implementation of thread creation requests is discussed in further detail below in
In addition, to improve the likelihood that a standby thread will exist in the thread cache 160 when a thread creation request is received, the thread manager 150 may, in some embodiments, monitor the number of standby threads in the thread cache 160. Should the number of standby threads not exceed a lower threshold, the thread manager may create one or more standby threads using the API 135. Once created, the thread manager 150 may then add the newly created standby thread(s) to the thread cache 160. Should the number of standby threads exceed an upper threshold, the thread manager may remove one or more standby threads using the API 135. Further details are discussed below in
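This threshold check can be sketched as a single maintenance routine. The `create_standby` and `destroy_standby` callbacks are hypothetical stand-ins for calls made through an interface such as the API 135:

```python
def maintain_cache(cache, lower, upper, create_standby, destroy_standby):
    """Keep the standby-thread count within the (lower, upper] window."""
    # Below (or at) the lower threshold: pre-create standby threads.
    while len(cache) <= lower:
        cache.append(create_standby())
    # Above the upper threshold: remove and destroy surplus standby threads.
    while len(cache) > upper:
        destroy_standby(cache.pop())
```

Running this routine periodically, or after each create/terminate request, keeps a warm pool available without letting the cache grow without bound.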
The thread manager may implement thread termination in response to a request received via the API 170. In some embodiments, the thread manager may retain the identified thread in the thread cache rather than destroying the thread through the API 135. To retain the thread, in some embodiments the thread manager 150 may place the thread into a standby, or unscheduled, state using the API 135 and add the thread to the thread cache 160. Retaining the thread in the thread cache may result in the thread-specific data structures, including the data structures 165 and 140, being retained to enable lower latency creation of future threads.
If a standby thread is available, as shown in 220, in some embodiments the process may proceed to step 230, where the standby thread is allocated by removing the standby thread from the cache of standby threads, for example by removing a first standby thread from a linked list of available standby threads as discussed above in regard to
If, however, a standby thread is not available, as shown in 220, in some embodiments the process may proceed to step 240, where a new thread is created, such as via the API 135 as shown in
If a standby thread is not available, as shown in 320, in some embodiments the process may proceed to step 330, where one or more new threads may be created, such as via the API 135 as shown in
If, however, a standby thread is available, as shown in 320, in some embodiments the process may proceed directly to step 340, where a standby thread is allocated by removing the standby thread from the cache of standby threads, for example by removing a first standby thread from a linked list of available standby threads as discussed above in regard to
If the number of standby threads in the thread cache exceeds a lower threshold number of standby threads, as shown in 410, then the method proceeds to step 415. If the number of standby threads in the thread cache does not exceed an upper threshold number of standby threads, as shown in 415, then the method is complete.
If, however, the number of standby threads in the thread cache does not exceed the lower threshold number of standby threads, as shown in 410, then the method may proceed to step 420 in some embodiments, where one or more new threads may be created, such as via the API 135 as shown in
If the number of standby threads in the thread cache does exceed the upper threshold number of standby threads, as shown in 415, then the method may proceed to step 430 in some embodiments, where one or more threads may be removed from the thread cache and terminated, such as via the API 135 as shown in
The upper and lower thresholds may be determined statically or dynamically, in various embodiments. For example, thread creation and termination for the application may be tracked to predict future thread management requests in order to dynamically adjust the number of standby threads in the thread cache using the upper and lower thresholds. Memory usage within the application may also be tracked to determine the upper and lower thresholds and system-wide memory resource usage may also be tracked, alone or in combination with application memory usage, in order to optimize system-wide thread caching as well as intra-process caching. These examples are not intended to be limiting, as any number of methods of determining upper and lower thresholds may be envisioned.
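As one purely illustrative heuristic (not prescribed by the disclosure, and with all names and constants assumed for the example), tracked thread demand and memory pressure might be folded into threshold values as follows:

```python
def adjust_thresholds(recent_creations, recent_terminations, mem_pressure):
    """Derive (lower, upper) thresholds from recent thread-management activity.

    recent_creations / recent_terminations: counts observed over a sampling
    window; mem_pressure: a boolean indicating scarce memory. All of this is
    one possible heuristic among the many the text contemplates.
    """
    demand = max(recent_creations, recent_terminations)
    lower = max(1, demand // 2)            # keep at least one standby thread warm
    upper = max(lower + 1, demand * 2)     # allow headroom for bursts
    if mem_pressure:
        upper = max(lower + 1, upper // 2) # shrink the window when memory is scarce
    return lower, upper
```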
Once received, the method proceeds to step 510, where the thread of the application process is placed in an unscheduled, or standby, state such as via the API 135 as shown in
In some embodiments, the method may not retain a portion of memory assigned to the standby thread, or the method may alter the configuration of memory assigned to the standby thread in order to allow the operating system kernel, such as the operating system kernel 130 as shown in
The application 600 may make a first request to create a thread for an application process, as shown in 630, to the thread manager 610 in some embodiments. The thread manager 610 may then determine that no standby threads are available in a thread cache, such as the thread cache 160 as shown in
The application 600 may then make a request to terminate a thread of an application process, as shown in 660, to the thread manager 610 in some embodiments. As shown in 665, the thread manager may then, in some embodiments, retain the thread in a standby, unscheduled state, in a thread cache, such as the thread cache 160 as shown in
The application 600 may make a second request to create a thread for an application process, as shown in 670, to the thread manager 610 in some embodiments. The thread manager 610 may then determine that a standby thread is available in the thread cache and satisfy the second request using the standby thread from the thread cache, as shown in 675. The thread manager 610 may then, in some embodiments, place the standby thread in a scheduled state and return the thread, as shown in 680, to the application 600.
Some of the mechanisms described herein may be provided as a computer program product, or software, that may include a non-transitory, computer-readable storage medium having stored thereon instructions which may be used to program a computer system 700 (or other electronic devices) to perform a process according to various embodiments. A computer-readable storage medium may include any mechanism for storing information in a form (e.g., software, processing application) readable by a machine (e.g., a computer). The machine-readable storage medium may include, but is not limited to, magnetic storage medium (e.g., floppy diskette); optical storage medium (e.g., CD-ROM); magneto-optical storage medium; read only memory (ROM); random access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; electrical, or other types of medium suitable for storing program instructions. In addition, program instructions may be communicated using optical, acoustical or other form of propagated signal (e.g., carrier waves, infrared signals, digital signals, etc.).
In various embodiments, computer system 700 may include one or more processors 710; each may include multiple cores, any of which may be single- or multi-threaded. For example, multiple processor cores may be included in a single processor chip (e.g., a single processor 710), and multiple processor chips may be included in computer system 700. Each of the processors 710 may include a cache or a hierarchy of caches, in various embodiments. For example, each processor chip 710 may include multiple L1 caches (e.g., one per processor core) and one or more other caches (which may be shared by the processor cores on a single processor). The computer system 700 may also include one or more storage devices 770 (e.g. optical storage, magnetic storage, hard drive, tape drive, solid state memory, etc.) and one or more system memories 720 (e.g., one or more of cache, SRAM, DRAM, RDRAM, EDO RAM, DDR RAM, SDRAM, Rambus RAM, EEPROM, etc.). In some embodiments, one or more of the storage device(s) 770 may be implemented as a module on a memory bus (e.g., on I/O interface 730) that is similar in form and/or function to a single in-line memory module (SIMM) or to a dual in-line memory module (DIMM). Various embodiments may include fewer or additional components not illustrated in
The one or more processors 710, the storage device(s) 770, and the system memory 720 may be coupled to the system interconnect 730. The system memory 720 may contain application data 726 and program code 725. Application data 726 may contain various data structures while program code 725 may be executable to implement one or more applications, shared libraries, and/or operating systems.
Program instructions 725 may be encoded in platform native binary, any interpreted language such as Java™ byte-code, or in any other language such as C/C++, the Java™ programming language, etc., or in any combination thereof. In various embodiments, applications, operating systems, and/or shared libraries may each be implemented in any of various programming languages or methods. For example, in one embodiment, the operating system may be based on the Java programming language, while in other embodiments it may be written using the C or C++ programming languages. Similarly, applications may be written using the Java programming language, C, C++, or another programming language, according to various embodiments. Moreover, in some embodiments, applications, operating system, and/or shared libraries may not be implemented using the same programming language. For example, applications may be C++ based, while shared libraries may be developed using C.
Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. For example, although many of the embodiments are described in terms of particular types of operations that support synchronization within multi-threaded applications that access particular shared resources, it should be noted that the techniques and mechanisms disclosed herein for accessing and/or operating on shared resources may be applicable in other contexts in which applications access and/or operate on different types of shared resources than those described in the examples herein. It is intended that the following claims be interpreted to embrace all such variations and modifications.
In conclusion, embodiments providing a thread manager including a thread cache are disclosed. Applications requesting dynamic thread creation and termination may interact with the thread cache to reduce latency and improve scalability in highly concurrent applications.