1) Field of the Invention
The present invention relates to a method for loading a Multiple-Program Multiple-Data (MPMD) program into each of a plurality of processing elements (PEs).
2) Description of the Related Art
Recently, some computer systems have included a plurality of processors and adopted a distributed-memory multiprocessor scheme to improve processing performance (see, for example, Japanese Patent Application Laid-Open Publication No. S56-40935 or No. H7-64938).
In such a system, a Single-Program Multiple-Data (SPMD) program is often executed by means of an inter-processor communication mechanism, such as a Message-Passing Interface (MPI).
For example, in the program shown in
In the above scheme, however, each PE must include a memory large enough to store the entire program, because the entire program is allocated to every PE even though each PE executes only a part of it (hereinafter “a partial program”). This inevitably increases cost.
Conventionally, a system adopting the above scheme has been built from a plurality of chips (or a plurality of boards) because of the limitations of semiconductor integration technology. With recent improvements in that technology, however, a plurality of PEs can be accommodated on one chip.
In this case, data exchange among the PEs, conventionally performed via an interconnection network, can be performed at higher speed by directly reading and writing a shared memory (SM). A scheme in which a shared memory can be read and written by a plurality of processors is called “a distributed-shared-memory multiprocessor scheme”.
Suppose that the SM of PE#1 is allocated to an address of 0x3000 or lower in the memory space of PE#0 and to an address of 0x2000 or lower in the memory space of PE#1. Then, for example, PE#0 writes data at 0x3000 and PE#1 reads the data from 0x2000, so that the data is exchanged between PE#0 and PE#1.
Here, only PE#0 can read and write the SMs of all the other PEs. Each of the other PEs, on the other hand, can read and write only its own SM and local memory (LM), which are allocated to its own memory space.
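The access rule in this example can be written as a small predicate. The C sketch below is illustrative only: it assumes that the master PE is PE#0 and covers only the default memory spaces described above, before any loader maps additional memory. The names `mem_kind`, `MASTER_PE`, and `pe_can_access` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical memory kinds: shared memory (SM) and local memory (LM). */
typedef enum { MEM_SM, MEM_LM } mem_kind;

#define MASTER_PE 0 /* assumption: PE#0 is the PE that maps the other PEs' SMs */

/* Returns true if PE `accessor` may read and write memory of kind `kind`
 * owned by PE `owner`, under the default rule in the example. */
bool pe_can_access(int accessor, int owner, mem_kind kind)
{
    if (accessor == owner)
        return true;  /* every PE reaches its own SM and LM */
    if (accessor == MASTER_PE && kind == MEM_SM)
        return true;  /* PE#0 also reaches the SMs of all other PEs */
    return false;     /* anything else lies outside the accessor's memory space */
}
```

For instance, `pe_can_access(0, 1, MEM_SM)` is true, while `pe_can_access(1, 2, MEM_SM)` and `pe_can_access(0, 1, MEM_LM)` are false.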
In such a computer system, a Multiple-Program Multiple-Data (MPMD) program can solve the above cost problem.
Unlike the SPMD program, which includes all of the partial programs, the MPMD program consists of a plurality of programs, each dedicated to one PE. Because the program for each PE does not include the partial programs for the other PEs, the required memory capacity is reduced.
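The saving can be quantified with simple arithmetic. This is a minimal sketch, assuming that the SPMD program is exactly the sum of the partial programs (no code or data shared between them); the function names and the sizes in the usage note are hypothetical.

```c
#include <stddef.h>

/* Under SPMD every PE stores the whole program, i.e. all partial programs. */
size_t spmd_bytes_per_pe(const size_t *partial, size_t n_pes)
{
    size_t total = 0;
    for (size_t i = 0; i < n_pes; i++)
        total += partial[i];
    return total;
}

/* Under MPMD, PE#pe stores only the program dedicated to it. */
size_t mpmd_bytes_for_pe(const size_t *partial, size_t pe)
{
    return partial[pe];
}
```

With partial programs of 16384, 8192, 8192, and 8192 bytes for PE#0 to PE#3, each PE needs 40960 bytes under SPMD, whereas PE#1 needs only 8192 bytes under MPMD.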
A function Th0 shown
After requesting PE#1 to perform the process (that is, after Th0-2), PE#0 performs another process unrelated to PE#1. For convenience of description, only the portion in which PE#0 and PE#1 cooperate is shown.
The applicant has already filed a patent application for an invention regarding the creation of a load module of such a program as shown in
In a computer system adopting the distributed-shared-memory multiprocessor scheme, a single piece of data can have a different address in the memory space of each PE. Therefore, the linker according to the above invention converts, for example, the address of the variable “in” to “0x3000” in the MPMD program for PE#0, and the address of the same variable “in” to “0x2000” in the MPMD program for PE#1, thereby creating a load module executable by each PE.
However, no multi PE loader has conventionally existed for efficiently distributing the load modules created according to that invention.
That is, since the conventional loader is designed for SPMD programs, it only transfers the load module in the ROM 404 to the memory 402 within the PE that executes the loader. Therefore, when there is a plurality of PEs, each PE has to execute its own loader, and since each PE loads a different program, a different loader is required for each PE.
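The per-PE address conversion performed by the linker can be pictured as adding a PE-specific base address to an offset. This is a sketch under stated assumptions, not the linker of the cited application: it assumes that the variable `in` lies at offset 0 of PE#1's SM and that the two addresses in the example act as base addresses. The names `pe1_sm_base` and `relocate_pe1_sm` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N_VIEW_PES 2 /* PE#0 and PE#1 in the example */

/* Address at which PE#1's SM is assumed to start in each PE's memory space
 * (index = PE whose load module is being linked). */
static const uint32_t pe1_sm_base[N_VIEW_PES] = {
    0x3000u, /* as seen by PE#0 */
    0x2000u, /* as seen by PE#1 */
};

/* Hypothetical relocation step: turn an offset inside PE#1's SM into the
 * absolute address that the load module for PE#pe must use. */
uint32_t relocate_pe1_sm(unsigned pe, uint32_t offset)
{
    assert(pe < N_VIEW_PES);
    return pe1_sm_base[pe] + offset;
}
```

Under these assumptions, `relocate_pe1_sm(0, 0)` yields 0x3000 for PE#0's load module and `relocate_pe1_sm(1, 0)` yields 0x2000 for PE#1's, matching the example.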
It is an object of the present invention to at least solve the problems in the conventional technology.
A method according to an aspect of the present invention is a method for loading a multiple program multiple data (MPMD) program into a computer system. The computer system includes a first processing element (PE) and a plurality of second PEs, and the first PE and each of the second PEs include a memory. The method includes allocating the memory of each second PE to the memory space of the first PE, and transferring the MPMD program from the memory of the first PE to the memory of each second PE that is allocated to the memory space.
A computer-readable recording medium according to another aspect of the present invention stores a loader program that causes a computer system to execute the above method.
A computer system according to still another aspect of the present invention includes a first processing element (PE) and a plurality of second PEs. The first PE and each of the second PEs include a memory. The first PE includes an allocating unit that allocates the memory of each second PE to the memory space of the first PE, and a transferring unit that transfers the MPMD program to the memory of each second PE that is allocated to the memory space.
The other objects, features, and advantages of the present invention are specifically set forth in or will become apparent from the following detailed description of the invention when read in conjunction with the accompanying drawings.
Exemplary embodiments of the present invention will be explained in detail below with reference to the accompanying drawings. First, the basic concept of the present invention is briefly described.
On the other hand,
An initializing unit 1200 initializes the loader (for example, by zero-clearing variables or setting parameters). A memory space allocating unit 1201 allocates the LM of each PE other than the master PE to the memory space of the master PE.
A program transferring unit 1202 shown in
An execution instructing unit 1203 instructs each PE to execute the MPMD program loaded into the memory 402 of each PE by the program transferring unit 1202.
In the PE#0 (the master PE) executing the multi PE loader, after initialization of the loader by the initializing unit 1200 (step S1401), the memory space allocating unit 1201 sequentially allocates the LMs of PE#1 to PE#n to the memory space of PE#0. That is, as shown in
Furthermore, in the PE#0, the program transferring unit 1202 sequentially loads the MPMD program for each PE into the LM of each PE. That is, the program transferring unit 1202 loads an MPMD program for PE#0 into the area to which the LM of PE#0 has been allocated, an MPMD program for PE#1 into the area to which the LM of PE#1 has been allocated, . . . , and an MPMD program for PE#n into the area to which the LM of PE#n has been allocated (step S1403).
Then, in the PE#0, the execution instructing unit 1203 instructs the processors 401 of PE#1 to PE#n to execute the loaded program (step S1404). Thereafter, each PE receiving the instruction executes the loaded program (step S1405).
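The flow of the first embodiment can be simulated in software. The C sketch below is a model, not an implementation for real hardware: the LMs are plain arrays, allocating an LM to the memory space of PE#0 is modeled as storing a pointer in a distinct slot of `area`, and the execution instruction is modeled as a flag. `N_PES`, `LM_SIZE`, and all function names are hypothetical, and the allocation step is left unnumbered because its step number is not given in the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define N_PES   4    /* hypothetical: PE#0 (master) plus PE#1..PE#3 */
#define LM_SIZE 256  /* hypothetical LM size in bytes */

static unsigned char lm[N_PES][LM_SIZE]; /* stands in for the memory 402 (LM) of each PE */
static bool start_flag[N_PES];           /* stands in for the execution instruction */

/* Model of PE#0's memory space in the first embodiment: one distinct area per PE. */
static unsigned char *area[N_PES];

/* S1401: initialize the loader state. */
static void init_loader(void)
{
    memset(area, 0, sizeof area);
    memset(start_flag, 0, sizeof start_flag);
}

/* Allocation step: give every LM its own area in PE#0's memory space.
 * PE#0's own LM (k = 0) is already part of that space. */
static void allocate_all(void)
{
    for (int k = 0; k < N_PES; k++)
        area[k] = lm[k];
}

/* S1403: copy the MPMD program for PE#k from the ROM image into area k. */
static void transfer_all(const unsigned char *const rom_image[], const size_t rom_size[])
{
    for (int k = 0; k < N_PES; k++) {
        assert(rom_size[k] <= LM_SIZE);
        memcpy(area[k], rom_image[k], rom_size[k]);
    }
}

/* S1404: instruct PE#1..PE#n to execute the loaded programs. */
static void instruct_execution(void)
{
    for (int k = 1; k < N_PES; k++)
        start_flag[k] = true;
}

void multi_pe_load_v1(const unsigned char *const rom_image[], const size_t rom_size[])
{
    init_loader();
    allocate_all();
    transfer_all(rom_image, rom_size);
    instruct_execution();
}
```

Note that `area` holds one slot per PE at the same time, which corresponds to PE#0 needing room in its memory space for every LM simultaneously.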
According to the first embodiment described above, the programs for the respective PEs stored in the ROM 404 can be distributed to the memories 402 of relevant PEs by the multi PE loader executed by the master PE.
In the first embodiment, however, the memory 402 of the master PE requires a capacity sufficient to allocate all the LMs of PE#1 to PE#n, since each of them is allocated to a different area of the memory space of PE#0. In the second embodiment described below, by contrast, the LMs of PE#1 to PE#n are allocated in turn to the same area, reducing the hardware capacity required of PE#0.
The functional structure of a computer system according to the second embodiment is similar to that according to the first embodiment shown in
In the PE#0 (the master PE) executing the multi PE loader, after initialization of the loader by the initializing unit 1200 (step S1501), the memory space allocating unit 1201 allocates the LM of PE#k to the memory space of PE#0 (step S1502). The program transferring unit 1202 then loads an MPMD program for PE#k into the area to which the LM of PE#k is allocated (step S1503).
Then, after steps S1502 and S1503 have been repeated for k = 1 to n, the execution instructing unit 1203 instructs the processors 401 of PE#1 to PE#n to execute the loaded programs (step S1504). Thereafter, each PE receiving the instruction executes its loaded program (step S1505).
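A corresponding software model of the second embodiment differs mainly in that a single area is reused. As before, this is a hypothetical sketch: `map_area` stands in for the one region of PE#0's memory space, the LMs are plain arrays, and releasing the area after the loop is an added assumption rather than a step from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define N_PES   4    /* hypothetical: PE#0 (master) plus PE#1..PE#3 */
#define LM_SIZE 256  /* hypothetical LM size in bytes */

static unsigned char lm[N_PES][LM_SIZE]; /* stands in for the memory 402 (LM) of each PE */
static bool start_flag[N_PES];           /* stands in for the execution instruction */

/* Second embodiment: ONE area of PE#0's memory space, reused by each LM in turn. */
static unsigned char *map_area;

void multi_pe_load_v2(const unsigned char *const rom_image[], const size_t rom_size[])
{
    /* S1501: initialize the loader state. */
    map_area = NULL;
    memset(start_flag, 0, sizeof start_flag);

    for (int k = 1; k < N_PES; k++) {
        assert(rom_size[k] <= LM_SIZE);
        map_area = lm[k];                             /* S1502: allocate LM of PE#k to the area */
        memcpy(map_area, rom_image[k], rom_size[k]);  /* S1503: load the program for PE#k */
    }
    map_area = NULL; /* assumption: release the area once the last PE is loaded */

    /* S1504: instruct PE#1..PE#n to execute the loaded programs. */
    for (int k = 1; k < N_PES; k++)
        start_flag[k] = true;
}
```

However many PEs there are, `map_area` covers only `LM_SIZE` bytes of PE#0's address space at any time, which is the saving the second embodiment aims at.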
According to the second embodiment described above, the LMs of PE#1 to PE#n are allocated to the same area in the memory space of PE#0 as shown in
In the first and second embodiments described above, the LMs of PE#1 to PE#n are allocated one by one to the memory space of PE#0. However, as the number of PEs increases, the overhead of this mapping becomes non-negligible. In the third embodiment described below, the programs are instead transferred by a DMA controller.
In the third embodiment, PE#0 (the master PE) includes a DMA controller for transferring the programs from PE#0 to each of PE#1 to PE#n, in addition to the hardware components shown in
A program transferring unit 1702 has the same function as the program transferring unit 1202, namely loading the program for each PE recorded in the ROM 404 into the memory 402 of that PE. However, the program transferring unit 1702 is realized not by the processor 401 but by the DMA controller.
The computer system according to the third embodiment includes a definition information setting unit 1701 but does not include a functional unit corresponding to the memory space allocating unit 1201 of the first and second embodiments.
The definition information setting unit 1701 sets the definition information required by the program transferring unit 1702 (that is, the DMA controller) in a predetermined register. Specifically, the definition information includes the following three pieces of information: (1) a transfer destination (the ID of a transfer-destination PE and an address in that PE), (2) the size of a transfer area, and (3) a transfer source (the ID of a transfer-source PE and an address in that PE). It is assumed herein that these pieces of definition information are held in the loader in advance.
In the PE#0 (the master PE) executing the multi PE loader, after initialization of the loader by the initializing unit 1700 (step S1801), the definition information setting unit 1701 sets definition information for transferring data from the ROM 404 to the memory 402 of PE#k (step S1802). Then, the program transferring unit 1702 loads the program into PE#k according to the information (step S1803).
Then, after steps S1802 and S1803 have been repeated for k = 1 to n, the execution instructing unit 1703 instructs the processors 401 of PE#1 to PE#n to execute the loaded programs (step S1804). Thereafter, each PE receiving the instruction executes its loaded program (step S1805).
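The three pieces of definition information map naturally onto a transfer descriptor. The sketch below models the DMA controller in software with `memmove`; the `dma_def` layout, the array-based memories, and the assumption that the ROM image is addressed as a region of PE#0 (source PE ID 0) are all hypothetical. Note that there is no allocation step: PE#0 only fills in one descriptor per PE and lets the simulated controller copy the data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define N_PES    4    /* hypothetical: PE#0 (master) plus PE#1..PE#3 */
#define MEM_SIZE 256  /* hypothetical memory size per PE in bytes */

/* Simulated memories; the ROM image is assumed to be addressable via PE#0. */
static unsigned char pe_mem[N_PES][MEM_SIZE];
static bool start_flag[N_PES]; /* stands in for the execution instruction */

/* Hypothetical layout of the definition information set in the register. */
typedef struct {
    unsigned dst_pe;   uint32_t dst_addr; /* (1) transfer destination */
    uint32_t size;                        /* (2) size of the transfer area */
    unsigned src_pe;   uint32_t src_addr; /* (3) transfer source */
} dma_def;

/* Software stand-in for the DMA controller performing one transfer. */
static void dma_run(const dma_def *d)
{
    assert(d->dst_pe < N_PES && d->src_pe < N_PES);
    assert(d->dst_addr + d->size <= MEM_SIZE && d->src_addr + d->size <= MEM_SIZE);
    /* memmove tolerates overlap if source and destination are in the same PE */
    memmove(&pe_mem[d->dst_pe][d->dst_addr], &pe_mem[d->src_pe][d->src_addr], d->size);
}

/* Loader loop of the third embodiment; defs[k - 1] describes the transfer to PE#k. */
void multi_pe_load_v3(const dma_def defs[N_PES - 1])
{
    memset(start_flag, 0, sizeof start_flag);  /* S1801: initialize the loader state */
    for (unsigned k = 1; k < N_PES; k++) {
        const dma_def *d = &defs[k - 1];       /* S1802: definition information for PE#k */
        assert(d->dst_pe == k);
        dma_run(d);                            /* S1803: DMA transfer to PE#k */
    }
    for (unsigned k = 1; k < N_PES; k++)
        start_flag[k] = true;                  /* S1804: instruct PE#1..PE#n to execute */
}
```

On real hardware the descriptor would be written to the controller's register and the copy would proceed without the processor 401; here `dma_run` simply performs the copy synchronously.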
According to the third embodiment described above, as shown in
The program loading methods according to the first to third embodiments are realized by the processor 401 executing the multi PE loader stored in the ROM 404. Alternatively, the multi PE loader can be recorded on various recording media other than the ROM 404, such as an HD, FD, CD-ROM, MO, or DVD, and can be distributed in the form of such a recording medium or via a network such as the Internet.
As described above, according to the present invention, even when each of the PEs executes a different program, the program for each PE is appropriately loaded into that PE under the control of the master PE, thereby allowing the load module of an MPMD program to be loaded into a computer system adopting a distributed-shared-memory multiprocessor scheme.
Although the invention has been described with respect to a specific embodiment for a complete and clear disclosure, the appended claims are not to be thus limited but are to be construed as embodying all modifications and alternative constructions that may occur to one skilled in the art which fairly fall within the basic teaching herein set forth.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/JP03/05806 | May 2003 | US
Child | 11135659 | May 2005 | US