This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2008-086933, filed Mar. 28, 2008, the entire contents of which are incorporated herein by reference.
1. Field
One embodiment of the invention relates to a technique of verifying programs for, e.g., a computer that mounts a CPU including a plurality of CPU cores or a computer that mounts a plurality of CPUs.
2. Description of the Related Art
In recent years, various types of computers (personal computers) for private use, such as notebook type computers and desktop type computers, have come into wide use. Demands on the information processing capability of such computers have been increasing and are approaching the limits of CPU performance improvement. For example, there is a demand for playing back high resolution moving image data by software.
In view of this, for example, computers which mount a plurality of CPUs, and recently, a CPU including a plurality of CPU cores, have become available. These computers shorten the turnaround time and improve the performance by processing programs in parallel. Various mechanisms for efficiently executing programs in parallel have been proposed (see, e.g., Jpn. Pat. Appln. KOKAI Publication No. 2005-258920).
One parallel processing technique of a program comprises two components, i.e., runtime processing including a scheduler, which assigns processing units in the program to execution units (when a computer mounts a plurality of CPUs, the scheduler assigns the processing units to the CPUs, and when a computer mounts a CPU including a plurality of CPU cores, the scheduler assigns the processing units to the CPU cores), and a processing unit processed on each execution unit.
To accomplish parallel processing of a program, the processing units must remain independent of one another. Assume that the output data of a processing unit “A” is input to processing units “B” and “C”. In this case, the outputs of processing units “B” and “C” should result only from the output data of processing unit “A”. Therefore, the storage areas of the memory that hold the input data of every processing unit must be managed as read-only areas, i.e., areas in which no data can be rewritten. If processing unit “B”, which starts before processing unit “C” does, overwrites the input data, it will influence the output of processing unit “C”.
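The hazard described above can be illustrated with a minimal sketch. The functions `unit_a`, `unit_b_bad`, `unit_b_good` and `unit_c` are hypothetical stand-ins for processing units “A”, “B” and “C”; they are not taken from the embodiment and are run sequentially here only to make the effect visible.

```python
# Hypothetical processing units; names and logic are illustrative assumptions.

def unit_a():
    """Produce the data that units B and C both consume."""
    return [1, 2, 3]


def unit_b_bad(data):
    """Violates the rule: reverses its input in place before using it."""
    data.reverse()              # overwrites the shared input data
    return data[0]


def unit_b_good(data):
    """Respects the rule: works on a private copy of the input."""
    local = list(reversed(data))
    return local[0]


def unit_c(data):
    """Should depend only on the output of unit A."""
    return data[0]


shared = unit_a()
unit_b_good(shared)
c_after_good = unit_c(shared)   # C still sees A's original output

shared = unit_a()
unit_b_bad(shared)
c_after_bad = unit_c(shared)    # C now sees data modified by B
```

With the well-behaved unit, `c_after_good` reflects the original output of “A”; with the overwriting unit, `c_after_bad` depends on whether “B” happened to run first.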
To prevent such an event, or to verify the authenticity of the program, the program is tested by using the test code embedded in itself. The test requires a higher cost than programming. Moreover, in the parallel processing of a program, the reproducibility of program errors is so low that debugging can hardly be achieved.
In order to process any program in parallel, the processing units constituting the program must be verified to have a re-entrant property, in consideration of the case where the same basic module may be simultaneously called from a plurality of execution units. This verification also has the same problem as described above.
A general architecture that implements the various features of the invention will now be described with reference to the drawings. The drawings and the associated descriptions are provided to illustrate embodiments of the invention and not to limit the scope of the invention.
Various embodiments according to the invention will be described hereinafter with reference to the accompanying drawings. In general, according to one embodiment of the invention, an information processing apparatus includes a plurality of execution modules, a system memory shared by the plurality of execution modules, and a scheduler which controls assignment of a plurality of basic modules to the plurality of execution modules in order to execute a program in parallel by the plurality of execution modules. The scheduler saves data items, which are to be input by the execution modules as input data items of the basic modules and are stored in storage areas of the system memory, in other storage areas of the system memory before the basic modules are executed, and, after the basic modules have been executed, compares the data items stored in the storage areas of the system memory and accessed by the execution modules with the data items saved in the other storage areas of the system memory.
The processor 1 serves as a central processing unit (CPU) which controls the execution of a program loaded in the main memory 2 from the HDD 3, and includes a plurality of cores 11 serving as main arithmetic circuits (CPU cores).
The main memory 2 is, e.g., a semiconductor storage device, and can be accessed by the processor 1.
The HDD 3 is a low-speed mass storage device (in comparison with the main memory 2) serving as an auxiliary storage in the computer.
Although not shown in
The computer which mounts the processor 1 including the plurality of cores 11 can execute a plurality of programs in parallel, and also execute a plurality of processes in one program in parallel. The schematic configuration of a program, based on parallel processing specifications, which is executed by the computer will be described with reference to
As shown in
In so-called multi-thread processing, as shown in
Therefore, in this embodiment, as shown in
“A” in
A method by which the computer executes the execution program 100 having a unique configuration in that a plurality of basic serial modules 101 and a parallel execution control description 102 are included will now be described.
To execute, in parallel, the execution program 100 having such unique configuration, a runtime library 200 shown in
When data is input, there is a need for executing some of the basic serial modules 101 for processing the data. Each time the need arises, the runtime library 200 dynamically generates/updates a graph data structure 202 on the basis of the graph data structure generation information 201. The graph data structure 202 is graph data representing the relationship between preceding and succeeding nodes to be executed as needed. The runtime library 200 adds the nodes to the graph data structure 202 in consideration of the relationship between preceding and succeeding nodes in an execution waiting state as well as the relationship between the preceding and succeeding nodes to be added.
Upon completion of the execution of a node, the runtime library 200 deletes the node from the graph data structure 202, and checks the presence/absence of a succeeding node which designates the node as a preceding node and which does not have other preceding nodes or of which all other preceding nodes have been completed. If there exists a succeeding node which satisfies the condition, the runtime library 200 assigns the node to one of the cores 11.
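The node bookkeeping described above might be sketched as follows. The class `Graph` and its methods are hypothetical names chosen for illustration; the sketch only models adding nodes, deleting a completed node, and finding succeeding nodes whose preceding nodes have all completed, and it does not assign nodes to cores.

```python
class Graph:
    """Toy stand-in for the graph data structure 202 (all names are assumptions)."""

    def __init__(self):
        self.preds = {}   # node -> set of preceding nodes not yet completed
        self.succs = {}   # node -> set of succeeding nodes

    def add_node(self, node, preceding=()):
        # Only preceding nodes still present in the graph are recorded;
        # nodes that have already completed were deleted earlier.
        self.preds[node] = {p for p in preceding if p in self.preds}
        self.succs.setdefault(node, set())
        for p in self.preds[node]:
            self.succs[p].add(node)

    def ready_nodes(self):
        """Nodes with no unfinished preceding node."""
        return sorted(n for n, p in self.preds.items() if not p)

    def complete(self, node):
        """Delete a finished node and return successors that became runnable."""
        runnable = []
        for s in self.succs.pop(node):
            self.preds[s].discard(node)
            if not self.preds[s]:
                runnable.append(s)
        del self.preds[node]
        return sorted(runnable)


g = Graph()
g.add_node("A")
g.add_node("B", ["A"])
g.add_node("C", ["A"])
g.add_node("D", ["B", "C"])

first = g.ready_nodes()      # only "A" has no preceding node
after_a = g.complete("A")    # "B" and "C" become runnable
after_b = g.complete("B")    # "D" still waits for "C"
after_c = g.complete("C")    # now "D" becomes runnable
```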
With this operation of the runtime library 200, the parallel execution of the plurality of basic serial modules 101 progresses on the basis of the parallel execution control description 102 without contradiction. After a basic serial module 101 has been executed, the runtime library 200 is called under exclusive control to check the input/output data, update the graph data and select the basic serial module 101 that should be executed next. Then, the runtime library 200 returns. The core 11 executes the basic serial module 101 selected by the runtime library 200. The other cores 11 call, one after another, the runtime library 200 to acquire a basic serial module 101 and execute the acquired one. Exclusive control among the threads is required only while the runtime library 200 selects a node from the graph data structure 202 or while the graph data structure 202 is updated (see
As indicated above, the program is split into segments that can be executed asynchronously, thus providing a plurality of basic serial modules 101, and the runtime library 200 allocates the basic serial modules 101 to the plurality of cores 11. Hence, the computer provides a mechanism that detects, when the program is executed and without rewriting the source code, the problem that a basic serial module 101 overwrites its input data. The computer further provides a mechanism that determines whether the basic serial modules 101 have a re-entrant property so that they may be simultaneously called. The mechanisms will be described below, in detail.
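The allocation scheme summarized above, in which worker threads repeatedly acquire a module from the runtime and hold exclusive control only while the graph is consulted or updated, can be sketched roughly as follows. The function `run_parallel`, its arguments and the use of Python threads in place of cores 11 are all assumptions made for illustration; the dependency graph is assumed to be acyclic, and error handling is omitted.

```python
import threading


def run_parallel(deps, modules, num_workers=2):
    """deps maps node -> list of preceding nodes; modules maps node -> callable."""
    waiting = {node: set(preds) for node, preds in deps.items()}
    ready = [node for node, preds in waiting.items() if not preds]
    results = {}
    unfinished = len(deps)
    cond = threading.Condition()

    def worker():
        nonlocal unfinished
        while True:
            with cond:                          # exclusive: select a node
                while not ready and unfinished:
                    cond.wait()
                if not ready:                   # nothing left to execute
                    return
                node = ready.pop()
                inputs = [results[p] for p in deps[node]]
            output = modules[node](*inputs)     # module runs outside the lock
            with cond:                          # exclusive: update the graph
                results[node] = output
                unfinished -= 1
                for succ, preds in waiting.items():
                    if node in preds:
                        preds.discard(node)
                        if not preds:
                            ready.append(succ)
                cond.notify_all()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
modules = {
    "A": lambda: 2,
    "B": lambda a: a + 1,
    "C": lambda a: a * 10,
    "D": lambda b, c: b + c,
}
out = run_parallel(deps, modules)
```

Because each module body runs outside the lock, modules “B” and “C” in this example may execute concurrently, while selection and graph updates remain serialized.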
As shown in
The node generating module 210 and graph structure interpretation execution engine 220 cooperate to dynamically generate and update a graph data structure 202 based on the graph data structure generation information 201, and to allocate the basic serial modules 101 to a plurality of cores 11 in accordance with the graph data structure 202. The parallel process verification module 230 verifies the parallel processing implemented by the node generating module 210 and graph structure interpretation execution engine 220.
Such a restriction, if imposed, cannot be checked by a compiler when the basic serial modules 101 are written in C or C++ for the sake of performance. Further, although a program loader can place data items in read-only sections by allocating them to a different part of the address space, the basic serial module 101 of a preceding node must write its execution result there. Thus, the result of executing each basic serial module 101 cannot be fully protected by simple hardware means.
This is why the parallel process verification module 230 of the runtime library 200 determines whether the input data has changed or not while the basic serial module 101 is being executed. More specifically, as shown in
The parallel process verification module 230 of the runtime library 200 not only monitors the change in the input data, but also determines whether the basic serial modules 101 have a re-entrant property (or thread safety). More precisely, as shown in
Whether the input data changes may be determined (see
The parallel process verification module 230 of the runtime library 200 saves all input data items in the work memory before each basic serial module 101 is executed (Block A1).
After all input data items have been saved in the work memory, the parallel process verification module 230 executes each basic serial module 101 (Block A2). Thereafter, the parallel process verification module 230 determines whether the original input coincides with the data saved in the work memory (Block A3). If the original input does not coincide with the saved data (NO in Block A3), the parallel process verification module 230 performs error processing, for example, displaying a warning message and then terminating the execution of the program (Block A4).
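Blocks A1 to A4 can be sketched as a small wrapper. The names `run_with_input_check` and `InputOverwrittenError` are hypothetical, a deep copy stands in for the work memory, and an exception stands in for the warning message and termination.

```python
import copy


class InputOverwrittenError(RuntimeError):
    """Raised when a module is found to have modified its input data."""


def run_with_input_check(module, *inputs):
    saved = copy.deepcopy(inputs)       # Block A1: save all input data items
    output = module(*inputs)            # Block A2: execute the module
    if inputs != saved:                 # Block A3: compare with the saved copy
        # Block A4: error processing (here, an exception)
        raise InputOverwrittenError(module.__name__ + " overwrote its input")
    return output


def total(data):
    """Well-behaved module: only reads its input."""
    return sum(data)


def total_and_clear(data):
    """Faulty module: empties its input after reading it."""
    s = sum(data)
    data.clear()
    return s


ok = run_with_input_check(total, [1, 2, 3])
try:
    run_with_input_check(total_and_clear, [1, 2, 3])
    detected = False
except InputOverwrittenError:
    detected = True
```

The cost of this sketch is one extra copy and one comparison of the input per module execution, which is why such a check would normally be enabled only during verification.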
Before the parallel process verification module 230 executes each basic serial module 101, it allocates a buffer for outputting another result of executing another basic serial module 101 (Block B1). Then, the module 230 executes the basic serial module 101 and another basic serial module 101 (Block B2).
After executing the two basic serial modules 101, the parallel process verification module 230 determines whether the result of executing one module 101 coincides with the result of executing the other module 101 (Block B3). If the result of executing one module 101 does not coincide with that of executing the other module 101 (NO in Block B3), the parallel process verification module 230 performs error processing, for example, displaying a warning message and then terminating the execution of the program (Block B4). If the result of executing one module 101 coincides with that of executing the other module 101 (YES in Block B3), the module 230 deallocates one of these results (Block B5). Then, the parallel process verification module 230 continues the execution of the program.
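Blocks B1 to B5 might be sketched as below, under the assumption that the "other" basic serial module is a second, simultaneous execution of the same module on the same input. The names `check_reentrant`, `ReentrancyError`, `square` and `stateful` are hypothetical, and Python threads stand in for the cores 11.

```python
import threading


class ReentrancyError(RuntimeError):
    """Raised when two simultaneous executions give different results."""


def check_reentrant(module, *inputs):
    results = [None, None]              # Block B1: buffer for a second result

    def call(slot):
        results[slot] = module(*inputs)

    threads = [threading.Thread(target=call, args=(i,)) for i in range(2)]
    for t in threads:                   # Block B2: execute both instances
        t.start()
    for t in threads:
        t.join()
    if results[0] != results[1]:        # Block B3: compare the two results
        # Block B4: error processing (here, an exception)
        raise ReentrancyError(module.__name__ + " is not re-entrant")
    results.pop()                       # Block B5: discard the duplicate result
    return results[0]


def square(x):
    """Output depends only on the input."""
    return x * x


_lock = threading.Lock()
_call_count = 0


def stateful(x):
    """Output depends on hidden state kept between calls."""
    global _call_count
    with _lock:
        _call_count += 1
        return x + _call_count


good = check_reentrant(square, 3)
try:
    check_reentrant(stateful, 3)
    flagged = False
except ReentrancyError:
    flagged = True
```

A check of this kind can only flag modules whose two results actually differ in a given run, so a passing comparison is evidence of, rather than proof of, a re-entrant property.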
Thus, in the computer, the parallel process verification module 230 provided in the runtime library 200 verifies the parallel processing. This helps to reduce the cost of verifying any program that should be processed in parallel.
In the embodiment described above, the parallel process verification module 230 examines each basic serial module 101 for overwriting of the input data and for a re-entrant property. Moreover, the parallel process verification module 230 may be configured to find, as early as possible, a timing problem that the parallel processing may potentially have.
For example, the parallel process verification module 230 may include the function of managing such a table that holds, as shown in
The various modules of the systems described herein can be implemented as software applications, hardware and/or software modules, or components on one or more computers, such as servers. While the various modules are illustrated separately, they may share some or all of the same underlying logic or code.
While certain embodiments of the inventions have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2008-086933 | Mar 2008 | JP | national |