This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2007-119839, filed Apr. 27, 2007, the entire contents of which are incorporated herein by reference.
1. Field
One embodiment of the invention relates to program processing, and in particular to program processing for parallel processing.
2. Description of the Related Art
In conventional multi-thread parallel processing, a plurality of threads are generated, and each thread must be programmed on the assumption of synchronous processing. For example, processing that ensures synchronization must be dispersed at several positions in a program in order to keep the order of execution appropriate. This complicates program debugging and increases maintenance costs.
Jpn. Pat. Appln. KOKAI Publication No. 2005-258920 discloses a method of realizing parallel processing based on a result of execution of a thread and a dependent relationship between threads, when a plurality of threads are generated. In this method, it is necessary to quantitatively define in advance a thread that is redundantly executed. This raises a problem in that the flexibility to change a program is lost.
To process programs in parallel while keeping an appropriate execution order, it is necessary to determine in advance a dependent relationship between programs or between threads. It is also preferable to provide a scheme to dynamically adjust the load of execution of each program according to the situation at the time.
A general architecture that implements the various features of the invention will now be described with reference to the drawings. The drawings and the associated descriptions are provided to illustrate embodiments of the invention and not to limit the scope of the invention.
Various embodiments according to the invention will be described hereinafter with reference to the accompanying drawings. In general, according to one embodiment of the invention, a program processing method includes converting parallel execution control description into graph data structure generating information, extracting a program module based on preceding information included in the graph data structure generating information when input data is given, generating a node indicating an execution unit of the program module for the extracted program module, adding the generated node to a graph data structure configured based on preceding and subsequent information defined in the graph data structure generating information, executing a program module corresponding to a node included in a graph data structure existing at that time, by setting values for the parameter, based on performance information of the node when all nodes indicating a program module defined in the preceding information have been processed, and obtaining and saving performance information of the node when a program module corresponding to the node has been executed.
The processor 100 has a function of interpreting program code stored in various storage units and executing a process previously described as a program. In
The memory 101 indicates a storage unit composed of a semiconductor, for example. A program processed by the processor 100 is previously read into the memory 101 accessible at a relatively high speed, and accessed by the processor 100 during execution of a program.
The HDD 102 indicates a magnetic disc unit, for example. The HDD 102 can store a large amount of data, compared with the memory 101, but the access speed is lower. Program code processed by the processor 100 is previously stored in the HDD 102, and only a processing part is read into the memory 101.
The internal bus 103 is a common bus configured to connect the processor 100, memory 101 and HDD 102, to transfer data among them.
The system may be provided with a not-shown image display to output the processing result, or an input/output unit such as a keyboard to input processing data.
A basic module 200 is a program to be executed by the system according to the embodiment. The basic module 200 is configured to receive more than one parameter 200a, and adjust the load of execution by changing an algorithm in use or by changing threshold values and coefficients in an algorithm.
A parallel execution control description 201 is data to be referred to during execution of a program. The parallel execution control description 201 indicates a dependent relationship between basic modules 200 during parallel processing, and is converted to graph data structure generating information 204 by a translator 202 before execution by the information processing system 203.
The translator 202 may be used by a runtime task, etc. for sequential translation during execution of the basic module 200, in addition to previous conversion before processing the basic module 200.
Software at an execution point in the information processing system 203 consists of the basic module 200, the graph data structure generating information 204, a runtime library 205, and an OS 206. The runtime library 205 includes an application program interface (API) used when the basic module 200 is executed in the information processing system 203, and has a function of realizing the exclusive control necessary for parallel processing of basic modules 200. Alternatively, the software may be configured to call the function of the translator 202 from the runtime library 205 and to convert, each time it is called in the course of processing the basic module 200, the part of the parallel execution control description 201 that is to be processed next.
The OS 206 manages the entire system, including the hardware of the information processing system 203 and the scheduling of tasks.
The programs are not independently processed. When using processing results of other programs, or ensuring data integrity, each program must wait until a specific part of another program is executed. When processing programs with such characteristics in parallel, it is necessary to embed a scheme to know execution states of other programs at several locations in a program. By embedding such a scheme, programs have heretofore been configured to ensure data integrity, realize exclusive control, and cooperate with each other.
For example, when a predetermined event occurs during processing of the program A300, the program A300 requests the program B301 to take some action (event 303). Receiving the event 303, the program B301 executes predetermined processing, and when a predetermined condition is established, issues an event 304 for the program C302. In response to the event 303, the program B301 also returns the result of processing the request received from the program A300 to the program A300, as an event 305.
However, when a program itself is written to realize synchronous processing in parallel processing, considerations beyond the primary logic are required, and the program becomes complex. While waiting for another program to finish, resources are wastefully consumed. Further, processing efficiency fluctuates greatly with a slight shift in timing, which makes later program modification difficult.
In the information processing system according to this embodiment, a method for accelerating component-based design of basic modules and for compact management of the parallel processing definition is proposed, by dividing programs at the portions where synchronous processing and data transfer are necessary, and defining the relation between the divided parts as a parallel execution control description. A method of dynamically adjusting the load of execution of each basic module configured as a component is also proposed.
It is assumed that the program A400 executes a thread 402, and the program B401 executes a thread 407. It is assumed that when the program A400 is executed up to a point 406, the processing result needs to be transferred to the program B401. After executing the thread 402, the program A400 informs the program B401 of the processing result as an event 404. The program B401 can execute a thread 405 only when the processing results of the event 404 and thread 407 are obtained. After the thread 402 is processed, the program A400 executes programs subsequent to the point 406 as a thread 403.
The above thread 402 is a part that can be unconditionally processed. At the point 406, a processing result to be notified to another thread during execution of a program can be obtained. There are other points requiring a processing result from another thread as a condition to start processing.
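The division described above can be illustrated with a minimal Python sketch. It is not the embodiment's implementation: the function names `thread_402`, `thread_403`, `thread_405` and `thread_407` are hypothetical stand-ins for the threads of the same numbers, and the standard `concurrent.futures` pool is an assumed substitute for the runtime. The point it shows is that each part is free of synchronization code, and the dependency (thread 405 needs the results of both thread 402 and thread 407) is expressed outside the parts.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical basic modules; none of them contains synchronization code.
def thread_402():
    return "result of 402"


def thread_407():
    return "result of 407"


def thread_403(r402):
    # Continuation of program A after point 406.
    return f"403 continues after [{r402}]"


def thread_405(r402, r407):
    # Runs only once both inputs are available.
    return f"405 uses [{r402}] and [{r407}]"


def run():
    with ThreadPoolExecutor(max_workers=2) as pool:
        f402 = pool.submit(thread_402)      # can start unconditionally
        f407 = pool.submit(thread_407)      # can start unconditionally
        r402 = f402.result()                # corresponds to event 404
        f403 = pool.submit(thread_403, r402)
        f405 = pool.submit(thread_405, r402, f407.result())
        return f403.result(), f405.result()
```

Calling `run()` returns the outputs of the two dependent parts; the ordering constraints live entirely in `run`, not in the modules.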
As shown in
The node 600 as a graph structure of a basic module has a dependent relationship with other nodes by a link. Viewing as a node as shown in
The link 601 is connected to an output end of another node required to obtain data necessary for the node 600 to execute predetermined processing. The link 601 has definition information to indicate which output end is to be linked.
The connector 602 has identifying information to identify data to be output after the node 600 finishes processing. Subsequent nodes can judge whether the executable conditions are established, based on the identifying information of the connector 602 and parallel execution control description 201.
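As a rough illustration of the node, link and connector just described, the following Python sketch models them as plain data classes. The class names, field names and the `link_matches` helper are assumptions made for this example; the embodiment does not specify a concrete layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Link:
    """Input side of a node: states which output end it must be linked to."""
    data_type: str                        # data type expected from a preceding node
    source_node_id: Optional[str] = None  # optionally, a specific preceding node ID


@dataclass
class Connector:
    """Output side of a node: identifies the data output when the node finishes."""
    data_type: str


@dataclass
class Node:
    node_id: str
    module_name: str                      # the basic module this node executes
    links: List[Link] = field(default_factory=list)
    connectors: List[Connector] = field(default_factory=list)


def link_matches(link: Link, producer: Node) -> bool:
    """Return True if `producer` offers an output end that satisfies `link`."""
    if link.source_node_id is not None and link.source_node_id != producer.node_id:
        return False
    return any(c.data_type == link.data_type for c in producer.connectors)
```

A subsequent node can then check its links against the connectors of candidate preceding nodes to decide which output ends it depends on.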
When the executable conditions are assumed established by the runtime library, the node 600 is queued to an executable queue 603 in units of node as shown in
The information about a link to a preceding node defines conditions of a node that is to become a preceding node of that node. For example, a node to output a predetermined data type or a node having a specific ID is defined.
The graph data structure generating information 700 expresses the corresponding basic module 200 as a node, and is used as information to add this basic module to an existing graph data structure as shown in
The runtime library managing multi-thread processing accepts input data to be processed (block S101). The runtime library sets up the operation environment so that it is called from each core to execute multi-thread processing. Therefore, a parallel program can be captured as a model operated mainly by a core, not a model operated mainly by the runtime, and queuing for synchronization in parallel processing can be reduced by decreasing the runtime overhead. If the operation environment were configured so that only one runtime task calls a basic module, the task executing a basic module and the runtime task would be switched frequently, and the overhead would increase. A runtime task judges whether there is input data (block S102), and when there is no input data (No in block S102), terminates this processing flow.
When there is input data (Yes in block S102), a runtime task extracts and obtains the graph data structure generating information 204 that takes this input data as an input (block S103). The output data of the basic module 200 is divided in advance into several types, which are described as the output buffer types in the graph data structure generating information 700. When the graph data structure generating information 204 using the input data as an input is extracted, the information whose data type matches the input data is selected, based on the input data type included in the information about the link to a preceding node described in the graph data structure generating information 700.
Next, the node 600 corresponding to the graph data structure generating information 700 obtained in block S103 is generated (block S104). When two or more pieces of graph data structure generating information 700 are extracted, a node 600 corresponding to each piece is generated.
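Blocks S103 and S104 can be sketched in Python as a type-matching lookup followed by node creation. The record layout (`GeneratingInfo`), the module names and the data type strings below are hypothetical; only the matching rule (select every piece of generating information whose input type equals the type of the input data, and generate one node per piece) is taken from the description.

```python
from dataclasses import dataclass
from itertools import count
from typing import List


@dataclass(frozen=True)
class GeneratingInfo:
    module_name: str   # basic module the information refers to
    input_type: str    # data type required from a preceding node (link information)
    output_type: str   # type of the output buffer


@dataclass
class Node:
    node_id: int
    module_name: str
    input_type: str
    output_type: str


_next_id = count(1)  # simple source of unique node IDs for this sketch


def extract(infos: List[GeneratingInfo], data_type: str) -> List[GeneratingInfo]:
    """S103: select the generating information that takes `data_type` as input."""
    return [info for info in infos if info.input_type == data_type]


def generate_nodes(infos: List[GeneratingInfo], data_type: str) -> List[Node]:
    """S104: generate one node for each extracted piece of information."""
    return [
        Node(next(_next_id), info.module_name, info.input_type, info.output_type)
        for info in extract(infos, data_type)
    ]


INFOS = [
    GeneratingInfo("decode", "bitstream", "frame"),
    GeneratingInfo("scale", "frame", "scaled"),
    GeneratingInfo("histogram", "frame", "stats"),
]
```

With these sample records, input data of type `"frame"` yields two nodes, one for `scale` and one for `histogram`, while an unknown type yields none.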
The generated node 600 is added to an existing graph data structure (block S105). The existing graph data structure mentioned here is a structure of a dependent relationship before/after generated nodes as shown in
Next, it is judged whether processing has been completed for every node that precedes the added node in the existing graph data structure (block S106). When all preceding nodes of a certain node are completed (Yes in block S106), the conditions for starting execution of this node are regarded as established, and this node is queued to the executable queue 603 (block S107).
In contrast, when there is a preceding node not completely processed (No in block S106), the processing of this node cannot be started, and the flow is terminated. As described above, even if the node 600 is generated, the basic module 200 corresponding to that node is not immediately executed, and the execution is held until a dependent relationship with other added nodes of the graph data structure is satisfied.
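The following Python sketch shows one way blocks S105 to S107 could be arranged. The `GraphData` class and its fields are hypothetical. As a simplifying assumption, this sketch never deletes nodes, so a preceding node that is absent from the graph is treated as not yet completed.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class Node:
    node_id: str
    preceding: List[str] = field(default_factory=list)  # IDs of preceding nodes
    processed: bool = False                              # the "executed" flag


class GraphData:
    """Hypothetical container for the graph data structure and executable queue."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.executable: Deque[str] = deque()

    def add_node(self, node: Node) -> bool:
        """Add a node (S105) and queue it if all preceding nodes are done (S106, S107)."""
        self.nodes[node.node_id] = node
        ready = all(
            p in self.nodes and self.nodes[p].processed for p in node.preceding
        )
        if ready:
            self.executable.append(node.node_id)
        # When not ready, execution is simply held; the node stays in the graph.
        return ready
```

A node with no preceding nodes is queued immediately, whereas a node whose preceding node is still running is only recorded in the graph and is queued later, when that preceding node completes.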
First, the performance information of the executed basic module 200 is obtained and saved (block S201), and the executed flag of that node in the graph data structure is set to “processed” (block S202).
Whether all subsequent nodes included in the graph data structure of that node have been processed is judged (block S203). When all subsequent nodes have been processed (Yes in block S203), that node can be deleted from the graph data structure (block S204). At this time, as the output data of that node is not used, the output buffer secured is released. In contrast, when subsequent nodes include one not yet processed, the output data of that node may be used in the basic module of a subsequent node, and must not be deleted from the graph data structure.
For each node included in the graph data structure, it is judged whether all of its preceding nodes have been processed (block S205). When there is a node all of whose preceding nodes have been processed (Yes in block S205), the execution start conditions of that node are regarded as established, and the node is queued to the executable queue 603 (block S206). For a node whose preceding nodes include one not yet processed, whether that node can be processed is judged again when processing of the preceding nodes ends.
Next, a next processing node is selected from executable nodes queued to the executable queue 603, based on predetermined conditions (block S207). The predetermined conditions favor, for example, the node that was queued earliest, a node with many subsequent nodes, or a costly node. The cost of each node may be calculated by the following equation.
Generally, the throughput of parallel processing is increased by processing nodes in descending order of cost. The frequency of nonscheduled execution means how often no node is queued to the executable queue 603 while the node's basic module is executing. This situation means that an underflow occurs in the executable queue 603, and is not preferable because it lowers the efficiency of parallel processing of the basic modules 200. Because the basic module 200 under execution at such a time is assigned a higher cost, it is processed early, which can be expected to prevent a bottleneck.
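The post-execution steps S201 to S206 can be sketched as a single function. This is an illustrative arrangement only: the `Node` fields, the `perf_log` dictionary and the function name are assumptions. A preceding or subsequent node that is no longer in the graph is taken to have been processed and deleted earlier.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class Node:
    node_id: str
    preceding: List[str] = field(default_factory=list)
    subsequent: List[str] = field(default_factory=list)
    processed: bool = False
    queued: bool = False   # True once the node has been put on the executable queue


def on_node_finished(graph: Dict[str, Node], queue: Deque[str],
                     perf_log: dict, node_id: str,
                     params: dict, elapsed: float) -> None:
    node = graph[node_id]
    # S201: save the parameter set and execution time as performance information.
    perf_log.setdefault(node_id, []).append((dict(params), elapsed))
    # S202: set the executed flag.
    node.processed = True
    # S203/S204: if no subsequent node still needs this node's output, delete it.
    if all(graph[s].processed for s in node.subsequent if s in graph):
        del graph[node_id]
    # S205/S206: queue every waiting node whose preceding nodes are all processed.
    for cand in graph.values():
        if cand.processed or cand.queued:
            continue
        if all(graph[p].processed for p in cand.preceding if p in graph):
            cand.queued = True
            queue.append(cand.node_id)
```

For a graph in which node `a` feeds nodes `b` and `c`, finishing `a` keeps `a` in the graph (its output is still needed) and queues `b` and `c`; finishing `b`, which has no subsequent nodes, removes `b`.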
Coefficients α-δ in the linear expression of the above cost calculation equation may use predetermined values, or may be configured to dynamically change while monitoring the processing situation.
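The selection in block S207 can be sketched as picking the queued node with the highest linear cost. The cost equation itself is not reproduced in this text, so the four terms used below (waiting time in the queue, number of subsequent nodes, execution time, and frequency of nonscheduled execution) are assumptions drawn from the criteria mentioned in the description, and all field and function names are hypothetical. The keyword arguments `alpha` to `delta` stand in for the coefficients α-δ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueuedNode:
    node_id: str
    wait_time: float         # time spent in the executable queue (assumed term)
    num_subsequent: int      # number of subsequent nodes (assumed term)
    exec_time: float         # execution time from saved performance data (assumed term)
    underflow_freq: float    # frequency of nonscheduled execution (assumed term)


def cost(n: QueuedNode, alpha: float = 1.0, beta: float = 1.0,
         gamma: float = 1.0, delta: float = 1.0) -> float:
    """Assumed linear cost; the actual terms of the equation may differ."""
    return (alpha * n.wait_time + beta * n.num_subsequent
            + gamma * n.exec_time + delta * n.underflow_freq)


def select_next(queue: List[QueuedNode], **coeffs: float) -> QueuedNode:
    """S207: remove and return the queued node with the highest cost."""
    best = max(queue, key=lambda n: cost(n, **coeffs))
    queue.remove(best)
    return best
```

Changing the coefficients changes the policy: with `beta=0` and `delta=0`, for instance, the choice depends only on waiting time and execution time, which is one way the coefficients could be tuned while monitoring the processing situation.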
If no executable node exists (No in block S208), the process of
After the output buffer is secured, values of the parameters receivable by the corresponding basic module are set, based on the saved performance information obtained when the basic module corresponding to that node was previously executed (block S211). Execution of the basic module 200 corresponding to this node is then started.
In block S201, a set of parameters and execution time of the processed basic module 200 is recorded as performance information. The principle of determining parameters from this performance information in block S211 will be explained by referring to the flowchart of
First, performance data such as the quality of generated data and the execution time is collected while changing the parameters of each module (starting from a default value and changing in the direction that decreases the load) and keeping the real-time restrictions, by using the information about the execution time of each module (block S301). This collection is repeated until the minimum amount of data needed to build a mathematical model has been obtained.
Then, a mathematical model is built based on the obtained performance data (block S302). The model equation expresses how each parameter influences the quality of generated data and the execution time of a program. It is basically a linear expression, but second- and third-degree terms are sometimes required.
The quality of generated data and execution time are expressed as follows, for example.
According to the mathematical model built as above, whether there is an allowance is judged from the system CPU use rate and the program execution time, and parameters are changed accordingly (block S303). For example, when there is an allowance in the execution time, the parameter belonging to the term that has the largest influence on the quality is changed. When there is no allowance in the execution time, a parameter that has no influence on the quality but has a large influence on the execution time is changed.
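The decision in block S303 can be sketched as follows, assuming a purely linear model in which the predicted execution time is a base value plus a weighted sum of the parameters. The coefficient dictionaries, the parameter names, the step size and the tolerance `tol` are all hypothetical. The sketch further assumes that raising a parameter raises both quality and execution time.

```python
from typing import Dict


def predict(base: float, coef: Dict[str, float], params: Dict[str, float]) -> float:
    """Linear model: base + sum of coefficient * parameter value."""
    return base + sum(coef[k] * params[k] for k in coef)


def adjust_parameters(params: Dict[str, float],
                      q_coef: Dict[str, float],
                      t_base: float,
                      t_coef: Dict[str, float],
                      time_budget: float,
                      step: float = 1,
                      tol: float = 1e-6) -> Dict[str, float]:
    """Return a new parameter set following the S303 rule described above."""
    new = dict(params)
    if predict(t_base, t_coef, params) < time_budget:
        # Allowance exists: raise the parameter with the largest quality influence.
        name = max(q_coef, key=lambda k: abs(q_coef[k]))
        new[name] += step
    else:
        # No allowance: among parameters with (almost) no influence on quality,
        # lower the one with the largest influence on execution time.
        neutral = [k for k in t_coef if abs(q_coef.get(k, 0.0)) <= tol] or list(t_coef)
        name = max(neutral, key=lambda k: t_coef[k])
        new[name] -= step
    return new
```

If no parameter is neutral with respect to quality, this sketch falls back to the parameter with the largest time coefficient; that fallback is a design choice of the example, not something stated in the description.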
Each time program generation data is obtained, blocks S302 to S303 are basically executed. However, when buffering has an allowance in stream processing, the blocks may be executed at a longer interval. A speed may be estimated each time a basic module is executed.
In the above configuration, a runtime task independently selects an executable basic module 200, and sequentially updates the graph data structure, thereby executing parallel processing. Therefore, a series of such processing need not be considered as an application. Further, the basic module 200 does not include a part branched from other tasks, and adjustment is unnecessary for other tasks in execution. It is also possible to realize a scheme that dynamically adjusts an execution load of each program according to the circumstances.
Therefore, it is possible to provide a programming environment, in which a program can be created without considering parallel processing, and can be flexibly executed even in multi-thread parallel processing.
While certain embodiments of the inventions have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2007-119839 | Apr 2007 | JP | national |