The present invention relates to a distributed processing system, a distributed processing method, and a program recording medium, and more particularly, to a distributed processing system, a distributed processing method, and a program recording medium which divide data and perform distributed processing.
A distributed processing system as illustrated in
The distributed processing system having such a configuration is operated as described below. Slave computers 321 to 323 divide one piece of data and hold the divided data. The divided data are referred to as data partitions. Master computer 310 generates procedures as tasks, each of which is performed on the data partition held in each of slave computers 321 to 323, and instructs each of the slave computers to perform the task. Each of slave computers 321 to 323 performs the instructed task on the held data partition. In this manner, desired processing is performed on all the data partitions.
In PTL 1, a system that divides image data and performs distributed processing on it is disclosed. This distributed processing system performs distributed image processing by sending divided image data, together with parameters (a procedure and an identification tag) associated with the image data, to workstations that perform the distributed processing.
[PTL 1] Japanese Unexamined Patent Application Publication No. H8-16766
[PTL 2] Japanese Unexamined Patent Application Publication No. 2000-020327
Methods of performing processing on data partitions in distributed processing differ depending on the data formats of the data subjected to the distributed processing. However, the above-mentioned distributed processing systems do not take into account the data formats of the data subjected to the distributed processing. Thus, there arises a problem that distributed processing cannot be performed on various data formats and therefore lacks versatility.
Therefore, an object of the present invention is to solve the above-mentioned problem that distributed processing cannot be performed on various data formats and lacks versatility.
A distributed processing system according to one aspect of the present invention is configured to include an interface means for receiving a data format of data subjected to distributed processing and a parameter depending on the data format of the data subjected to the distributed processing, and a divided data generation means for generating, from the data, data partitions being processing units used when performing the distributed processing on the data, and generating meta data including information based on the parameter that is associated with each of the data partitions and depends on the data format of the original data from which the data partition is generated.
A program recording medium according to another aspect of the present invention is configured to record a program causing an information processing device to realize an interface means for receiving a data format of data subjected to distributed processing and a parameter depending on the data format of the data subjected to the distributed processing, and a divided data generation means for generating, from the data, data partitions being processing units used when performing the distributed processing on the data, and generating meta data including information based on the parameter that is associated with each of the data partitions and depends on the data format of the original data from which the data partition is generated.
A distributed processing method according to another aspect of the present invention is configured to include, by an information processing device, receiving a data format of data subjected to distributed processing and a parameter depending on the data format of the data subjected to the distributed processing, generating, from the data, data partitions being processing units used when performing the distributed processing on the data, and generating meta data including information based on the parameter that is associated with each of the data partitions and depends on the data format of the original data from which the data partition is generated.
According to the present invention having the above-mentioned configuration, distributed processing depending on a data format of data subjected to the distributed processing can be performed, and versatility can be improved.
With reference to
As illustrated in
As illustrated in
In this example embodiment, data obtained by dividing a distributed processing target are referred to as “data partitions”. The distributed processing handled in this example embodiment is realized in such a way that the processing on one data partition is treated as a unit and is performed by the plurality of accelerators in a distributed manner.
Host 1 is an information processing device including an arithmetic operation device and a memory device. Further, as illustrated in
As illustrated in
Now, the configuration of the above-mentioned host 1 is further described in detail.
API unit 12 (interface unit) provides user program 11 with an application program interface for creating a program that causes the plurality of accelerators 2 to perform the distributed processing. API unit 12 requests accelerator control unit 14 to perform user program 11, which is created with use of the interface provided to user program 11 by API unit 12.
In
Specifically, the map processing is an interface for performing the same processing on each of the data elements included in the data. In this case, the processing specified by “ProcessFunc” is applied to each of the elements of the image. “ProcessFunc” is a user-defined function provided by user program 11, and is the specific processing applied to each of the elements of the image. Note that user program 11 is provided arbitrarily from the outside, and hence the user-defined function is also provided arbitrarily from the outside. Further, the output data file is named “FileName2”. In this program, at the point when “outputFile” is called, accelerator control unit 14 is requested to have the accelerators perform the processing specified in the first to third lines.
As in the example of “outputFile”, API unit 12 defines an interface for triggering (starting) the request for the processing. Processing that is deferred in this way, so that the plurality of accelerators 2 actually perform it only after user program 11 calls the triggering interface, is generally referred to as delayed evaluation (lazy evaluation). A person skilled in the art will understand that various types of processing can be performed on the data elements included in “DDD” by defining processing other than “map” as processing provided by API unit 12.
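The delayed evaluation described above can be illustrated with a minimal Python sketch. It is not the interface of API unit 12: the class DDD, the methods map and output_file, the dict standing in for data storage unit 13, and the use of a flat list of integers as the image elements are all hypothetical simplifications, and the work runs locally rather than on accelerators 2.

```python
from typing import Callable, Dict, List


class DDD:
    """Hypothetical handle for distributed data that records operations lazily."""

    def __init__(self, elements: List[int]) -> None:
        self._elements = elements
        self._pending: List[Callable[[int], int]] = []  # recorded, not yet run

    def map(self, func: Callable[[int], int]) -> "DDD":
        # Only record the element-wise function; no element is processed here.
        self._pending.append(func)
        return self

    def output_file(self, file_name: str, storage: Dict[str, List[int]]) -> None:
        # Trigger: the recorded operations run only now (delayed evaluation).
        result = list(self._elements)
        for func in self._pending:
            result = [func(x) for x in result]
        self._pending.clear()
        storage[file_name] = result


# Usage: process_func is not called until output_file() triggers the pipeline.
calls: List[int] = []


def process_func(pixel: int) -> int:
    calls.append(pixel)
    return pixel * 2


image = DDD([1, 2, 3]).map(process_func)  # calls is still empty here
store: Dict[str, List[int]] = {}
image.output_file("FileName2", store)  # now calls == [1, 2, 3]
```

Recording the function in map and executing it only in the trigger method is what allows a controller to see the whole pipeline before dividing the data and scheduling subtasks.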
In this example embodiment, in addition to the above-mentioned “image”, various data formats such as a “dense matrix” and a “sparse matrix” can be handled as data formats for the distributed processing. In such a case, the dense matrix uses “DenseMatrixReader” in place of “ImageReader” illustrated in
In
The widths are expressed in numbers of pixels. A “data partition size” refers to the vertical width and the horizontal width of the divided image included in each data partition. A “partition fringe width” (redundant part size) refers to the width of the image region that each data partition holds redundantly because it overlaps with the adjacent partitions.
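The geometry implied by these parameters can be sketched in Python as follows. The function partition_image and the Region type are hypothetical, and clipping the fringe at the image border (rather than padding it) is an assumption that the text does not state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    top: int
    left: int
    height: int
    width: int


def partition_image(
    image_size: Tuple[int, int], part_size: Tuple[int, int], fringe: int
) -> List[Tuple[Region, Region]]:
    """Return (core, held) regions, in pixels, for each data partition."""
    img_h, img_w = image_size
    part_h, part_w = part_size
    parts = []
    for top in range(0, img_h, part_h):
        for left in range(0, img_w, part_w):
            # Core region: the pixels this partition is responsible for.
            core_h = min(part_h, img_h - top)
            core_w = min(part_w, img_w - left)
            core = Region(top, left, core_h, core_w)
            # Held region: core extended by the fringe, clipped to the image.
            h_top = max(0, top - fringe)
            h_left = max(0, left - fringe)
            h_bottom = min(img_h, top + core_h + fringe)
            h_right = min(img_w, left + core_w + fringe)
            held = Region(h_top, h_left, h_bottom - h_top, h_right - h_left)
            parts.append((core, held))
    return parts
```

For a 4 x 6 image with a 2 x 3 partition size and a fringe width of 1, this yields four partitions; the redundant fringe lets a neighborhood operation near a partition edge run without fetching pixels from another accelerator.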
In
As described above, API unit 12 receives, from user program 11, the data format of the data subjected to the distributed processing and parameters depending on that data format. The parameters depending on the data format include information based on the data structure of the data, such as an image size, a matrix size, and the number of non-zero elements, as described above.
Data storage unit 13 stores the data subjected to the distributed processing before the data are divided. Data storage unit 13 is, for example, a file system, and stores and manages the data by using the memory device of host 1.
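How an interface might check that the parameters received with a data format match that format can be sketched as follows. The parameter names and the function receive_format are hypothetical; the per-format parameter sets are taken from the supplementary notes of this document (image: image size, partition size, fringe width; dense matrix: matrix size, partition size; sparse matrix: additionally the number of non-zero elements).

```python
from typing import Any, Dict, Set

# Hypothetical parameter names for each data format.
REQUIRED_PARAMS: Dict[str, Set[str]] = {
    "image": {"image_size", "partition_size", "fringe_width"},
    "dense_matrix": {"matrix_size", "partition_size"},
    "sparse_matrix": {"matrix_size", "partition_size", "nnz"},
}


def receive_format(data_format: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Validate format-dependent parameters and return them with the format."""
    if data_format not in REQUIRED_PARAMS:
        raise ValueError(f"unsupported data format: {data_format}")
    missing = REQUIRED_PARAMS[data_format] - set(params)
    if missing:
        raise ValueError(f"missing parameters for {data_format}: {sorted(missing)}")
    return {"format": data_format, **params}
```

Keying the required parameters by format is one way to keep the interface generic while still rejecting, for example, a sparse matrix submitted without its non-zero element count.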
Program analysis unit 141 receives, from API unit 12, a request to perform user program 11. The processing specified in user program 11 is performed for each of the data partitions obtained by dividing the data being the processing target. The processing for the entire data, which is specified in user program 11, is referred to as a “task”, and the processing for each data partition obtained by dividing the data is referred to as a “subtask”. The subtasks are generated from the task. Program analysis unit 141 generates as many subtasks as are required for processing the data, and requests data scheduler unit 142 to prepare, in accelerator 2, the data partitions being the processing targets. In the example of
Data scheduler unit 142 requests divided data generation unit 144 to prepare the input data partitions of the subtasks that accelerator 2 is requested to perform. When program analysis unit 141 requests preparation of the input data partitions of a plurality of subtasks, data scheduler unit 142 determines an optimal preparation procedure.
Divided data generation unit 144 (divided data generation means) receives, from data scheduler unit 142, a request to prepare the input data partitions in accelerator 2. At this point, the accelerator in which the input data partitions are to be prepared is also specified. Divided data generation unit 144 reads, from data storage unit 13, the data in the range associated with the input data partitions of the subtasks, and loads the data into specified accelerator 2. In this manner, the data partitions being processing units used when performing the distributed processing are generated. When reading the data, an identifier such as a file name, which user program 11 gives to the interface of API unit 12, is used. Further, at this point, meta data regarding the loaded data partitions are generated and registered in meta data storage unit 146 (meta data storage means).
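For the image format, the generation of one data partition and its meta data can be sketched in Python as follows. The function generate_partition, the dicts standing in for accelerator memory and meta data storage unit 146, and the key names are hypothetical. The meta data fields follow the supplementary notes (image size of the data, image size of the partition, offset from the start of the data), combining a parameter received by the interface (the original image size) with information obtained while generating the partition (its actual size).

```python
from typing import Dict, List, Tuple

Image = List[List[int]]


def generate_partition(
    stored: Image,
    params: Dict[str, Tuple[int, int]],
    top: int,
    left: int,
    accel_memory: Dict[str, Image],
    meta_store: Dict[str, dict],
    part_id: str,
) -> None:
    part_h, part_w = params["partition_size"]
    # Read only the range of the stored image that this partition needs;
    # slicing clips the range automatically at the image border.
    part = [row[left:left + part_w] for row in stored[top:top + part_h]]
    # "Load" the partition into the (simulated) accelerator memory.
    accel_memory[part_id] = part
    # Register meta data: parameter-derived values plus the measured size.
    meta_store[part_id] = {
        "image_size": params["image_size"],
        "partition_size": (len(part), len(part[0]) if part else 0),
        "offset": (top, left),
    }
```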
Examples of the data partitions generated by divided data generation unit 144 are illustrated in
The meta data generated for each data partition are information associated with that data partition and depend on the data format of the original data from which the data partition is generated. That is, the types of parameters included in the meta data depend on the data format. The data formats and the information of the meta data depending on the data formats are generated from the information in
In
An example where API unit 12 illustrated in
As described above, the meta data generated by divided data generation unit 144 include the parameters depending on the data format of the original data before the division, from which the data partitions are generated, and the information based on the data structures of the data partitions.
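For the matrix formats, the combination of parameter-derived and structure-derived information can be sketched as follows. The functions dense_partition and sparse_partition, the representation of a sparse matrix as a dict of (row, column) coordinates, and the key names are hypothetical. Following the supplementary notes, the dense meta data hold only the partition's matrix size, while the sparse meta data also hold the number of non-zero elements, which is counted from the partition itself when it is generated.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def dense_partition(
    matrix: List[List[float]], row0: int, col0: int, rows: int, cols: int
) -> Tuple[List[List[float]], dict]:
    block = [row[col0:col0 + cols] for row in matrix[row0:row0 + rows]]
    meta = {"matrix_size": (len(block), len(block[0]) if block else 0)}
    return block, meta


def sparse_partition(
    entries: Dict[Coord, float],
    matrix_size: Tuple[int, int],
    row0: int,
    col0: int,
    rows: int,
    cols: int,
) -> Tuple[Dict[Coord, float], dict]:
    total_rows, total_cols = matrix_size
    # Clip the requested partition size at the matrix border.
    rows = min(rows, total_rows - row0)
    cols = min(cols, total_cols - col0)
    # Keep non-zero entries inside the block, re-indexed to local coordinates.
    block = {
        (i - row0, j - col0): v
        for (i, j), v in entries.items()
        if row0 <= i < row0 + rows and col0 <= j < col0 + cols and v != 0
    }
    meta = {"matrix_size": (rows, cols), "nnz": len(block)}
    return block, meta
```

A user-defined function for a sparse partition could, for instance, use the nnz value to allocate its output buffers before iterating over the entries.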
Task scheduler unit 143 receives a notification of subtasks, which have completed preparation for the input data partitions, from data scheduler unit 142, and requests task performing unit 145 to perform the subtasks. In a case where a plurality of subtasks are being performed or waiting to be performed, scheduling for determining the performing order of those subtasks is performed.
Task performing unit 145 (task performing means) causes the specified accelerators to perform the subtasks specified by task scheduler unit 143. That is, task performing unit 145 transfers the meta data, together with the data partitions, to a program function for processing the data partitions. The meta data are obtained from meta data storage unit 146. Consider, as an example, the case where accelerator 21 performs a subtask. Processor 21a, which performs the subtask, receives the user-defined function for the subtask, the addresses in memory 21b of the data partitions to be processed by the user-defined function, and the meta data of those data partitions. Processor 21a performs the user-defined function by using the meta data. Accordingly, processing depending on the data format can be realized.
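The hand-off from task scheduling to task performing can be sketched in Python as follows. The class TaskPerformer and the example function global_positions are hypothetical, and the first-in first-out order is an assumption, since the text does not specify the scheduling policy. The point shown is that the user-defined function receives the meta data alongside the partition, so an image function can, for example, use the offset to compute each pixel's position in the original image.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

UserFunc = Callable[[list, dict], object]


class TaskPerformer:
    def __init__(self, accel_memory: Dict[str, list], meta_store: Dict[str, dict]) -> None:
        self.accel_memory = accel_memory
        self.meta_store = meta_store
        self.ready: Deque[Tuple[str, UserFunc]] = deque()

    def notify_ready(self, part_id: str, func: UserFunc) -> None:
        # A subtask whose input partition is prepared; FIFO order is assumed.
        self.ready.append((part_id, func))

    def run_all(self) -> Dict[str, object]:
        results: Dict[str, object] = {}
        while self.ready:
            part_id, func = self.ready.popleft()
            # Pass the partition together with its meta data.
            results[part_id] = func(self.accel_memory[part_id], self.meta_store[part_id])
        return results


def global_positions(part: List[List[int]], meta: dict) -> List[Tuple[int, int]]:
    """Example user-defined function: map local pixels to original coordinates."""
    top, left = meta["offset"]
    return [(top + r, left + c) for r, row in enumerate(part) for c in range(len(row))]
```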
As an example of performing the processing depending on the data format, description is made on the processing on the image illustrated in
Next, mainly with reference to a flowchart of
When user program 11 is performed, the interface provided by API unit 12 is used in user program 11 (Step S1). At this point, the data format of the data to be processed in the distributed manner and the parameters depending on the data format are transferred to the interface.
When a command for triggering the processing is called at the interface provided by API unit 12, accelerator control unit 14 is requested to perform the processing of user program 11 that has been issued to API unit 12 up to that point. That is, delayed evaluation is performed on the processing of user program 11 (Step S2).
Upon receiving the request to perform user program 11, program analysis unit 141 generates entries of the subtasks for performing the processing of user program 11 for each of the data partitions obtained by dividing the data to be processed (Step S3). Subsequently, program analysis unit 141 requests data scheduler unit 142 to prepare, in any of accelerators 2, the data partitions serving as inputs of the subtasks.
Data scheduler unit 142 selects the accelerator for preparing the input data partitions, and requests divided data generation unit 144 to prepare the input data partitions (Step S4). In a case where data scheduler unit 142 receives, from program analysis unit 141, a request to prepare the input data partitions of a plurality of subtasks, scheduling for determining an optimal order for preparing the data partitions is performed.
Divided data generation unit 144 reads, from the data to be processed stored in data storage unit 13, the part associated with the input data partitions of the subtasks. Then, divided data generation unit 144 loads the read part into the memory of accelerator 2 specified by data scheduler unit 142 (Step S5). Divided data generation unit 144 generates meta data depending on the data to be processed, from which the loaded data partitions are obtained, and stores the generated meta data in meta data storage unit 146 (Step S6).
Task scheduler unit 143 receives, from data scheduler unit 142, a notification of the subtasks which have completed the preparation of the input data partitions, and requests task performing unit 145 to perform the subtasks. At this point, in a case where a plurality of subtasks that are not yet performed are present, scheduling for determining an order for performing the subtasks is performed (Step S7).
Task performing unit 145 causes accelerator 2 that has completed the preparation of the input data partitions to perform the subtasks notified by task scheduler unit 143 (Step S8). At this point, the meta data of the input data partitions, which are stored in meta data storage unit 146, are transferred to the user-defined function for performing the subtasks. The user-defined function is then performed by using the transferred meta data.
As described above, this example embodiment includes API unit 12, which provides the interface that receives, from the user program, the data format of the data subjected to the distributed processing and the information depending on the data format. This example embodiment further includes divided data generation unit 144. When generating the data partitions being the units of the distributed processing, divided data generation unit 144 generates, for each data partition, meta data depending on the data format of the data subjected to the distributed processing, by combining the information that API unit 12 receives from the user program with the information acquired at the time of generating the data partition. This example embodiment further includes task performing unit 145, which transfers the meta data to the user-defined function when the user-defined function provided by the user program is performed on the data partitions in the accelerator. With this configuration, this example embodiment receives, from the user program, the data format of the data to be processed in the distributed manner and the information depending on the data format, generates the meta data for each data partition by combining the received information with the information acquired at the time of generating the data partitions, and transfers the meta data to the user-defined function when the processing for the data partitions is performed by the user-defined function. As a result, distributed processing depending on the data format can be realized, and distributed processing can be performed on various data formats.
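Steps S3 to S8 can be strung together in a compact Python sketch on one-dimensional data. Everything here is a hypothetical simplification: run_distributed is not part of the described system, the accelerators are simulated by dicts, the round-robin accelerator choice in Step S4 and the first-in first-out order in Step S7 are assumptions (the text only says an optimal order is determined), and the meta data keys are invented for this illustrative format.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


def run_distributed(
    data: List[int],
    part_size: int,
    num_accels: int,
    user_func: Callable[[List[int], dict], List[int]],
) -> List[int]:
    # S3: one subtask per data partition, identified by its start offset.
    subtasks = list(range(0, len(data), part_size))
    memories: List[Dict[int, List[int]]] = [{} for _ in range(num_accels)]
    meta_store: Dict[int, dict] = {}
    ready: Deque[Tuple[int, int]] = deque()
    for n, start in enumerate(subtasks):
        accel = n % num_accels                   # S4: round-robin (assumption)
        part = data[start:start + part_size]     # S5: read only the needed range
        memories[accel][start] = part
        meta_store[start] = {                    # S6: register meta data
            "offset": start,
            "length": len(part),
            "total_length": len(data),
        }
        ready.append((accel, start))             # S7: FIFO order (assumption)
    output: List[int] = []
    while ready:                                 # S8: perform with meta data
        accel, start = ready.popleft()
        output.extend(user_func(memories[accel][start], meta_store[start]))
    return output
```

A user-defined function such as `lambda part, meta: [v + meta["offset"] + i for i, v in enumerate(part)]` can then compute global indices without knowing how the data were divided.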
Next, with reference to
As illustrated in
Divided data generation means 202 generates the meta data based on, for example, the information received by interface means 201 and the information acquired by reading the original data from which the data partitions are generated.
The distributed processing system having the above-mentioned configuration receives, from the user program, the data format of the data subjected to the distributed processing and the information depending on the data format, generates the meta data for each data partition by combining the received information with the information acquired at the time of generating the data partitions, and transfers the meta data to the user-defined function when the processing for the data partitions is performed by the user-defined function. As a result, distributed processing depending on the data format can be realized, and distributed processing can be performed on various data formats.
Each unit of host 1 illustrated in
In the above-mentioned example embodiments, as an example of performance by processor 50 illustrated in
The provided computer program may be stored in a readable and writable memory (temporary storage medium) or in a computer-readable memory device such as a hard disk device. In such a case, the present invention can be understood as being configured by code representing the computer program or by a storage medium storing the computer program.
A part of or an entirety of the example embodiments can be described as in the following supplementary notes. Now, an outline of the distributed processing system, the program recording medium, and the distributed processing method according to the present invention is described. However, the present invention is not limited to the following configurations.
A distributed processing system including:
interface means for receiving a data format of data subjected to distributed processing and a parameter depending on the data format of the data subjected to the distributed processing; and
divided data generation means for generating, from the data, data partitions being processing units used when performing the distributed processing on the data, and generating meta data including information based on the parameter that is associated with each of the data partitions and depends on the data format of the original data from which the data partition is generated.
The distributed processing system according to supplementary note 1, wherein
the divided data generation means generates the meta data based on information received by the interface means and information acquired by reading the original data from which the data partition is generated.
The distributed processing system according to supplementary note 2, wherein
the divided data generation means generates the meta data including the parameter depending on the data format of the original data from which the data partition is generated.
The distributed processing system according to supplementary note 2 or 3, wherein
the divided data generation means generates the meta data, based on a data structure of the data partition.
The distributed processing system according to any one of supplementary notes 1 to 4, wherein
the parameter includes information based on a data structure of the data.
The distributed processing system according to any one of supplementary notes 1 to 5, wherein
the data format of the data is an image, and
the parameter includes an image size of the data, an image size of the data partition to be generated, and a redundant part size of the data partition to be generated.
The distributed processing system according to supplementary note 5.1, wherein
the meta data include an image size of the data, an image size of the data partition to be generated, and an offset of the data partition to be generated from a start of the data.
The distributed processing system according to any one of supplementary notes 1 to 5, wherein
the data format of the data is a dense matrix, and
the parameter includes a matrix size of the data and a matrix size of the data partition to be generated.
The distributed processing system according to supplementary note 5.3, wherein
the meta data include a matrix size of the data partition to be generated.
The distributed processing system according to any one of supplementary notes 1 to 5, wherein
the data format of the data is a sparse matrix, and
the parameter includes a matrix size of the data, a matrix size of the data partition to be generated, and the number of non-zero elements in the data.
The distributed processing system according to supplementary note 5.5, wherein
the meta data include a matrix size of the data partition to be generated and the number of non-zero elements in the data partition to be generated.
The distributed processing system according to any one of supplementary notes 1 to 5, further including
task performing means for transferring the meta data together with the data partition to a program function for processing the data partition.
The distributed processing system according to supplementary note 6, wherein
the program function for processing the data partition is a user-defined function received from an outside.
The distributed processing system according to supplementary note 6 or 7, further including
meta data storage means for storing the meta data generated by the divided data generation means and providing the task performing means with the meta data being stored, when the task performing means causes the program function for processing the data partition to be executed.
A program recording medium recording a program causing an information processing device to realize:
an interface means for receiving a data format of data subjected to distributed processing and a parameter depending on the data format of the data subjected to the distributed processing; and
a divided data generation means for generating, from the data, data partitions being processing units used when performing the distributed processing on the data, and generating meta data including information based on the parameter that is associated with each of the data partitions and depends on the data format of the original data from which the data partition is generated.
The program recording medium according to supplementary note 9, wherein
the divided data generation means generates the meta data, based on information received by the interface means and information acquired by reading the original data from which the data partition is generated.
The program recording medium according to supplementary note 9.1, wherein
the divided data generation means generates the meta data including the parameter depending on the data format of the original data from which the data partition is generated.
The program recording medium according to supplementary note 9.1 or 9.2, wherein
the divided data generation means generates the meta data, based on a data structure of the data partition.
The program recording medium according to any one of supplementary notes 9 to 9.3, wherein
the parameter includes information based on a data structure of the data.
The program recording medium according to any one of supplementary notes 9 to 9.4, further causing the information processing device to realize
task performing means for transferring the meta data together with the data partition to a program function for processing the data partition.
The program recording medium according to supplementary note 9.5, further causing the information processing device to realize
meta data storage means for storing the meta data generated by the divided data generation means, and providing the task performing means with the meta data being stored, when the task performing means causes the program function for processing the data partition to be executed.
A distributed processing method by an information processing device, the method including:
receiving a data format of data subjected to distributed processing and a parameter depending on the data format of the data subjected to the distributed processing; and
generating, from the data, data partitions being processing units used when performing the distributed processing on the data, and generating meta data including information based on the parameter that is associated with each of the data partitions and depends on the data format of the original data from which the data partition is generated.
The distributed processing method according to supplementary note 10, further including
generating the meta data, based on received information and information acquired by reading the original data from which the data partition is generated.
The distributed processing method according to supplementary note 10.1, further including
generating the meta data including the parameter depending on the data format of the original data from which the data partition is generated.
The distributed processing method according to supplementary note 10.1 or 10.2, further including
generating the meta data, based on a data structure of the data partition.
The distributed processing method according to any one of supplementary notes 10 to 10.3, wherein
the parameter includes information based on a data structure of the data.
The distributed processing method according to any one of supplementary notes 10 to 10.4, wherein
the information processing device further transfers the meta data together with the data partition to a program function for processing the data partition.
The distributed processing method according to supplementary note 10.5, wherein
the information processing device further stores the generated meta data, and provides task performing means with the meta data being stored, when the task performing means causes the program function for processing the data partition to be executed.
The above-mentioned program recording medium is a computer-readable recording medium. The program recording medium is a portable medium such as a flexible disk, an optical disk, a magneto-optical disk, and a semiconductor memory, for example.
As described above, the invention of the present application is described with reference to the example embodiments. However, the invention of the present application is not limited to the above-mentioned example embodiments. Various changes that can be understood by a person skilled in the art can be made to the configuration and details of the invention of the present application without departing from the invention of the present application.
This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2016-204770, filed on Oct. 19, 2016, the disclosure of which is incorporated herein in its entirety by reference.
The present invention is applicable to a case where distributed processing is performed for data of various data formats using an accelerator. As an application field, a computer for image processing or data analysis is conceivable.
Number | Date | Country | Kind |
---|---|---|---|
2016-204770 | Oct 2016 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2017/037460 | 10/17/2017 | WO | 00 |