This invention relates to the field of computational models, and more specifically to the parallel execution of a computer program, and to a method for automatic task-level parallelization and concurrency control.
Given the increasing number of processing elements in computing devices, it becomes apparent that the mainstream sequential computational model is not well suited to creating computer programs that take advantage of the underlying processing capabilities. Exploiting multiple processing elements on a single computing device requires highly parallel computing; therefore, highly parallel computing is moving from a scientific discipline practiced by a few skilled software engineers into mainstream software development. Although declarative languages (functional and other computational models) are arguably more suitable for highly parallel computing and more popular in the scientific community, many real-life applications are naturally represented by imperative programs; thus, most professional programmers in mainstream software development choose imperative languages.
In essence, parallel computing is composed of tasks to be executed in parallel, where a task is a unit of computation. Programmers need the ability to define, start/stop, and coordinate parallel tasks. While significant progress has been made in compilers for automatic data-level parallelization (in which the same operation is performed on many data elements by many processing elements at the same time), task-level parallelization is still done manually. Task-level parallelization refers to operations that are grouped into tasks and performed on the same or different data by many processing elements at the same time. Known systems and methods require programmers to explicitly define task boundaries and to use various concurrency control mechanisms. As a result, it becomes difficult to ensure efficient parallel execution on a large number of processing elements, including automatic scalability as the number of processing elements increases.
This invention provides a method for automatic task-level parallelization of the execution of a computer program with automatic concurrency control. The method frees the application programmer from the details of such parallelization. As a result, embodiments of this invention enable efficient and scalable parallel execution of a computer program regardless of the skills of the application programmer.
This invention primarily addresses the data access mechanism. According to this invention, shared data in memory must be queried. Such memory queries represent the side-effects of their enclosing tasks and allow determining how tasks must be executed with regard to each other, based on intersections of their queried data. Tasks that intend to modify the same data (i.e., whose side-effects intersect) must be executed sequentially; otherwise, tasks can be executed in parallel. The term “task” as used herein refers to a function, method, procedure, etc. This invention does not change the sequential programming model but rather enhances it with intrinsic parallelism.
The object of this invention is to define a method for automatically parallelizing computer programs. The main advantages of this invention are portability and ease of use. These advantages stem from the declarative way of accessing memory. Such a high-level abstraction allows compilers and run-time libraries which embody this invention to provide efficient parallel execution of a computer program. Embodiments of this invention in languages, compilers, and run-time libraries would vary in form across a variety of computing platforms and programming languages.
Here I disclose a general method for automatic task-level parallelization of execution of a computer program with automatic concurrency control. Embodiments of this invention include programming languages, compilers, and run-time libraries.
In general, as illustrated on
According to this invention, a task queries memory 206 to get references (pointers) to shared data 208 instead of using global variables and collections. Said memory query defines the shared data to be processed by a task and an intention to read or modify the data. The result of said memory query is local variables and collections that store references (pointers) to the queried data in memory or to a copy of the queried data. Said references (pointers) are valid only within the lifetime of the task activation 205 that queried memory, and are safe to use for the intended purpose regardless of the parallel execution of other tasks of a computer program. If data are queried for read-only access, then the data are safe to be read; if data are queried for writable access, then the data are safe to be created, updated, or deleted as well as read. The term “memory query” as used herein refers to an application programming interface of a run-time library 200 to create, read, update, or delete shared data 208 in memory. The run-time library 200 plays the roles of memory manager and task scheduler.
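The exact memory-query interface is left open by this specification. As a hypothetical Python-flavored sketch only (the names Runtime, create, and query are assumptions, not the actual interface), a run-time library embodying this description might expose an API along these lines:

```python
# Hypothetical sketch of the memory-query interface described above; all
# class and method names are assumptions for illustration.

class Runtime:
    """Plays the roles of memory manager and task scheduler."""

    def __init__(self):
        self._store = []  # shared data owned by the run-time library

    def create(self, obj):
        """Allocate shared data; the run-time library keeps the reference."""
        self._store.append(obj)
        return obj

    def query(self, cls, writable=False, where=lambda o: True):
        """Return local references (pointers) to matching shared data.

        The references are valid only within the lifetime of the task
        activation that issued the query; 'writable' declares the intention
        to create/update/delete rather than merely read.
        """
        return [o for o in self._store if isinstance(o, cls) and where(o)]


class Foo:
    def __init__(self, X=0, Y=0):
        self.X, self.Y = X, Y


rt = Runtime()
rt.create(Foo(X=1))
readable = rt.query(Foo)                 # read-only intention
writable = rt.query(Foo, writable=True)  # writable intention
print(readable[0].X, len(writable))
```

In a full embodiment the `writable` flag would also drive the queueing decisions described below; here it merely records the declared intention.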
Using a language-independent notation and for illustration purposes only, the following illustrates this invention in comparison with the traditional programming model. In the traditional programming model, function A creates an instance of data structure Foo and assigns it to global variable G, then functions B and C update attribute Y of the instance of data structure Foo in parallel:
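A minimal Python sketch of this traditional scenario (assuming only what the text states: Foo has attributes X and Y, A assigns a new instance to global G, and B and C update Y in parallel) might look as follows:

```python
import threading

# Sketch of the traditional programming model described above: shared state
# lives in global variable G, and the programmer must add a lock manually.

class Foo:
    def __init__(self):
        self.X = 0
        self.Y = 0

G = None                    # shared global variable
G_lock = threading.Lock()   # manual concurrency control chosen by the programmer

def A():
    global G
    G = Foo()

def B():
    with G_lock:            # without this lock, updates to Y could race
        G.Y += 1

def C():
    with G_lock:
        G.Y += 1

A()
threads = [threading.Thread(target=B), threading.Thread(target=C)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(G.Y)
```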
Thus, in the traditional programming model, it is the responsibility of the programmer to use an appropriate concurrency control mechanism to prevent concurrent modification of attribute Y. By contrast, this invention proposes to query memory:
where q and f are local variables and YIELD queries memory for writable access. According to this invention, the run-time library 200 will execute the memory query of function C when function B is complete, or in the reverse order, based on FIFO scheduling. Therefore, it frees the programmer from handling the concurrency manually. Moreover, it deduces the boundaries of the tasks automatically: from the memory queries to the ends of their enclosing functions.
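The serialization behavior described here can be sketched with a single FIFO queue per intersecting data set, drained by one worker, so that B and C never run concurrently. This is an assumed design for illustration, not the patent's implementation:

```python
import queue
import threading

# Sketch: tasks whose writable queries intersect are appended to one FIFO
# queue; a single worker drains it, so the first-submitted task completes
# before the next one starts.

class SerialQueue:
    def __init__(self):
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, task):
        self._q.put(task)

    def join(self):
        self._q.join()

    def _drain(self):
        while True:
            task = self._q.get()   # FIFO: first submitted runs first
            task()
            self._q.task_done()

counter = {"Y": 0}
foo_queue = SerialQueue()          # one queue for tasks that write Foo

def B():
    counter["Y"] += 1

def C():
    counter["Y"] += 1

foo_queue.submit(B)
foo_queue.submit(C)
foo_queue.join()                   # wait until both tasks have completed
print(counter["Y"])
```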
This invention distinguishes said memory query for read-only access from said memory query for writable access. Using a language-independent notation and for illustration purposes only, the following memory query illustrates said memory query for read-only access:
and the following memory query illustrates said memory query for writable access:
According to this invention, a run-time library 200 which embodies this invention is responsible for handling said memory queries 206. The run-time library forms queues 201 of active tasks whose said memory queries produce intersecting data sets, as illustrated on
When the run-time library extracts a task from a queue 201 for execution 300, the run-time library excludes the queue from the subsequent extraction of its tasks. Such a queue is called blocked. When a task is complete, the run-time library restores the corresponding queue for the subsequent extraction of its tasks. Such a queue is called ready. Thus, as illustrated on
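The blocked/ready discipline described here can be sketched as follows (an assumed design for illustration: extracting a task blocks its queue, completing the task makes the queue ready again, so at most one task per queue is ever in flight):

```python
from collections import deque

# Sketch of the blocked/ready queue discipline described above.

class TaskQueue:
    def __init__(self):
        self.tasks = deque()
        self.blocked = False

class Scheduler:
    def __init__(self):
        self.queues = []

    def extract(self):
        """Pick a task from any ready, non-empty queue and block that queue."""
        for q in self.queues:
            if not q.blocked and q.tasks:
                q.blocked = True      # exclude from subsequent extraction
                return q, q.tasks.popleft()
        return None, None

    def complete(self, q):
        q.blocked = False             # restore the queue: it is ready again

s = Scheduler()
q1 = TaskQueue()
q1.tasks.extend(["t1", "t2"])
s.queues.append(q1)

blocked_q, first = s.extract()        # t1 extracted; q1 becomes blocked
_, nothing = s.extract()              # q1 is blocked: nothing to extract
s.complete(blocked_q)                 # t1 done; q1 becomes ready
_, second = s.extract()               # now t2 can be extracted
print(first, nothing, second)
```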
In another embodiment, the run-time library employs the multiple-read, single-write strategy, in which multiple sequential tasks from the same queue with said memory queries for read-only access can be executed in parallel. In that case, when the run-time library extracts a task from a queue for execution, it checks the next task as well. Only when the next task is pending with said memory query for writable access is the queue blocked and excluded from the subsequent extraction of its tasks.
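The lookahead rule of this strategy can be sketched as a small decision function (an assumed design for illustration): a reader blocks its queue only when the next pending task is a writer, so runs of consecutive readers may be extracted and executed in parallel.

```python
from collections import deque

# Sketch of the multiple-read, single-write extraction rule described above.

def extract(q):
    """q is a deque of ('read'|'write', name) tasks.

    Returns the extracted task name and whether the queue must be blocked
    after the extraction.
    """
    mode, name = q.popleft()
    if mode == "write":
        return name, True                  # a writer always blocks the queue
    next_is_write = bool(q) and q[0][0] == "write"
    return name, next_is_write             # a reader blocks only before a writer

q = deque([("read", "r1"), ("read", "r2"), ("write", "w1")])
t1, b1 = extract(q)   # r1: next task is a reader, queue stays ready
t2, b2 = extract(q)   # r2: next task is a writer, queue becomes blocked
print(t1, b1, t2, b2)
```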
In another embodiment, the run-time library employs the copy-on-write strategy, in which tasks with said memory queries for read-only access can be executed in parallel with tasks with said memory queries for writable access. In that case, only when the run-time library extracts a task with said memory query for writable access is the queue blocked and excluded from the subsequent extraction of its tasks.
Other embodiments of this invention can use other strategies to form the queues and extract tasks from them. For instance, in another strategy, tasks with said memory queries for read-only access are executed without said queueing and only tasks with said memory queries for writable access are placed in said queues.
According to this invention, said memory queries are not executed at the point where they are called but are scheduled/queued for execution, as illustrated on
According to this invention, every task activation has its own stack, and the execution of a task can be suspended on one processing element and resumed on another processing element. When the execution of a task is resumed, the memory query is executed. The result of the execution is local variables and collections that store references (pointers) to the queried data in memory or to a copy of the queried data (the queried data can also be copied to a CPU cache). The run-time library is responsible for allocating memory for data, keeping references (pointers) to the allocated data, and deallocating memory. The run-time library can use any suitable collections to keep the references (pointers): lists, hash tables, red-black trees, etc. The programmer is responsible for defining data types. The run-time library uses these definitions as blueprints for allocating memory for new data and for searching existing data by their attributes.
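One way to model "suspend at the query, resume with the result" in Python (an assumption for illustration only, not mandated by this specification) is with generators: yielding a query descriptor suspends the task, and the scheduler resumes it, possibly on a different worker, by sending in the query result.

```python
# Sketch: a task suspends at its memory query (yield) and is later resumed
# with the query result injected by the scheduler.

def task():
    foos = yield ("query", "Foo", "writable")  # suspend at the memory query
    foos[0]["Y"] += 1                          # resumed: references are valid
    return foos[0]["Y"]

shared = [{"X": 0, "Y": 0}]

t = task()
request = next(t)          # run the task until its query; capture the request
assert request[1] == "Foo"
try:
    t.send(shared)         # scheduler executes the query and resumes the task
except StopIteration as done:
    result = done.value
print(result, shared[0]["Y"])
```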
Using a language-independent notation and for illustration purposes only, the following illustrates a definition of data structure Foo with two attributes, X and Y:
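The original definition is not reproduced here. As a hypothetical reconstruction, a Python embodiment might declare Foo as a dataclass that the run-time library introspects as a blueprint for allocating memory and searching data by attribute:

```python
from dataclasses import dataclass

# Hypothetical data definition; the specification leaves the exact syntax
# open, so this dataclass form is an assumption for illustration.

@dataclass
class Foo:
    X: int = 0
    Y: int = 0

f = Foo(X=1, Y=2)
print(f.X, f.Y)
```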
This specification does not provide an exact syntax for said memory query and data definition. One skilled in the art can define an exact syntax for said memory query and data definition with relevance to a concrete programming language and an underlying computing platform. A preferred embodiment of this invention is a specialized programming language with a corresponding compiler and run-time libraries. However, it is to be understood that this invention is not limited to the preferred embodiment and can be embodied in any existing or new programming language without departing from the scope of this invention.