The present invention relates to a computing job information managing device for managing information by allocating identification information to a computing job independent of an existing computing job control server device, a client terminal of a user communicating data with the computing job information managing device, and a computing job information managing system including the device and the terminal.
In the field of numerical analysis, a required physical quantity, such as a temperature distribution, is obtained for a model by, for example, executing a computation with a specified boundary condition and model.
Conventionally, to check the in-progress data of a computation, a user first refers to a computing job information managing system to identify the computer to which the user's computing job has been allocated, logs in to that computer, and opens the log file generated by the computing program in a text editor to confirm the in-progress computation data. The problem is that this procedure is time-consuming and laborious.
Recently, a computing time has been shortened by performing parallel computation with a plurality of CPUs (central processing units) loaded into a computer.
However, since a computer is loaded with a plurality of CPUs, the computer executes a plurality of computing jobs. In this case, operations of reading data through the interface of an executing program can occur simultaneously on one computer for a plurality of users. That is, there is the possibility of unauthorized access.
As a technique of performing parallel computation, for example, the patent document 1 discloses a parallel processing system in which a plurality of clients are connected to a centralized server. In this system, a client can be a submitter (source of a request) and a client that processes a work-load. In addition, the centralized server performs work-load balancing, and starts and completes a job. Other commands and status information are communicated directly among the plurality of clients that process the work-load, without passing through the centralized server.
Patent Document 1: Japanese Laid-open Patent Publication No. 2005-004740 “Peer-to-Peer Job Monitor and Control in Grid Computing System”
The computing job information managing device according to the first aspect of the present invention manages computing job information using a computing job controller and a plurality of computers allocated computing jobs by the computing job controller. The computing job information managing device includes: an identification information assignment unit for assigning computing job identification information independent of the computing job controller to an input computing job upon receipt of a notification that a terminal has input the computing job to the computing job controller together with user identification information and identification information about a computation executing program from a terminal; a computing job information storage unit for storing a record in which at least the computing job identification information is associated with the user identification information; a matching unit for performing a matching process using user identification information and identification information about a computation executing program upon receipt of the state change information about the computing job from a computer assigned a computing job together with user identification information and identification information about a computation executing program; and a computing job identification information access control unit for writing received state change information to a related item of a record of the computing job information storage unit associated with the received state change information by the matching process.
The computing job information managing device according to the second aspect of the present invention is based on the first aspect above, and further includes: an available port determination unit for determining whether or not there is an available port for a computer executing computation upon receipt of a request to acquire in-progress data of the computation or data of a result of the computation; and a computer information transmission unit for transmitting to a terminal a combination of address information about a computer that performs computation and the number of available ports when there are available ports.
The computing job information managing device according to the third aspect of the present invention is based on the first aspect above, and further includes: a second matching unit for performing a matching process using user identification information and identification information about a computation executing program upon receipt of a feature value as a part of data output by a computation executing program together with user identification information and identification information about a computation executing program as a feature value entry notification; a second computing job identification information access control unit for writing a received feature value to a related item of a record of the computing job information storage unit associated with the received feature value by the matching process; a collective job completion determination unit for determining whether or not all computing jobs belonging to a collective job have been completed when the feature value entry notification is issued from a computing job belonging to the collective job; a statistical analysis execution unit for executing a statistical analysis using each feature value of a computing job belonging to the collective job when it is determined that all computing jobs belonging to the collective job have been completed; and an analysis result transmission unit for transmitting an analysis result to a terminal.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
The embodiments of the present invention are described below in detail with reference to the attached drawings.
In each embodiment of the present invention, a distributed memory system is configured by a cluster forming a parallel computer by connecting computers over a network. An interface for parallelism can be, for example, an MPI (message passing interface).
The client device 4 is a terminal device operated by a user. The user generates data specifying an analysis condition and a model of a computation program for a structure analysis etc. using a UI (user interface) unit (not illustrated in the attached drawings) of the client device 4, issues a job input notification to the computing job information managing device 3 described later, and transmits a visualize request for all jobs input by the user to the computing job information managing device 3.
The job control server device 1 assigns identification information (JOB_ID) to the job input from the client device 4, and allocates the job to one or more computers 2. The JOB_ID allocated to the job functions as the information identifying the job and shared by the job control server device 1 and the (one or more) computers executing the job.
In each of the embodiments of the present invention, a plurality of computers are connected over a LAN to configure a parallel computer so that one job can be processed in parallel. One of a plurality of computers acts as a head node collecting input data and output data of the job. There can be a node storing input data and another node storing output data.
The job control server device 1 and the computer 2 are provided for an existing system.
The computing job information managing device 3 assigns a JOB_ID independent of the JOB_ID assigned by the job control server device 1 to the job input by a user to the job control server device 1 through the client device 4. The computing job information managing device 3 is described later in detail.
In the second embodiment described later, a (collective) JOB_ID is assigned to a plurality of jobs. As distinguished from the collective JOB_ID, a job is defined as a single job in the second embodiment, and an ID assigned to one job is referred to as a single JOB_ID.
In the first embodiment, a user issues a visualize request through the client device 4 to visualize the in-progress data of the computation of all jobs input from the client device 4, and to display the data on the display of the client device 4.
In the second embodiment of the present invention, when a plurality of jobs are executed by a user through the client device 4, a part of the result of the computation of each job is used as a feature value, and the notification of the feature value is transmitted from each computer 2 to the computing job information managing device 3 after the completion of the computation, thereby performing the statistical analysis on the result of the computation of the plurality of jobs.
In any embodiment, it is preferable for a user to specify a combination of analysis condition data and model data as input data for the executing program designated by an execute command name (also referred to as an “executing program ID”) using the UI unit of the client device 4. The number of combinations of analysis condition data and model data specified for an executing program is one for a single job, and a plurality of combinations for a collective job.
Normally, the analysis condition data and the model data are stored in files, which are called an analysis condition file and a model file, respectively.
A model file (model data) is a file (data) storing the information about the shape of a substance simulated in an executing program. An analysis condition file (analysis condition data) is a file (data) specifying the amount (temperature distribution, stress, etc.) computed by an executing program. The analysis condition file stores a user ID.
In
If a combination of the number of jobs belonging to the collective job, the analysis condition file, and the model file is completely specified in step S2, and the “next” button on the selection screen in
On the other hand, if the “next” button on the selection screen in
In step S4, when a combination of an analysis condition file belonging to a single job and a model file is completely specified and the “next” button on the selection screen in
As illustrated in
As illustrated in
As illustrated in
As illustrated in
On the other hand, the computer can recognize only the user ID set in the analysis condition file. Therefore, the DB server device 10 converts the user ID included in the information received from the computer into a user name using the storage unit (not illustrated in the attached drawings) in which the user name is associated with the user ID. In addition, the DB server device 10 also includes a storage unit in which a computer name is associated with a computer IP address.
In
Upon receipt of the job input notification, the DB server device assigns a single JOB_ID to the single job by automatic numbering when the input job is a single job. When the input job is a collective job, it assigns a collective JOB_ID to the collective job by automatic numbering, and also assigns a single JOB_ID to all single jobs belonging to the collective job by automatic numbering.
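The automatic numbering described above can be sketched as follows. This is only an illustration: the class name `JobIdAssigner`, the use of two separate counters, and numbering from 1 are assumptions and are not taken from the embodiment.

```python
from itertools import count
from typing import List, Tuple


class JobIdAssigner:
    """Hypothetical stand-in for the single/collective JOB_ID assignment."""

    def __init__(self) -> None:
        # Assumption: single and collective JOB_IDs come from separate
        # counters, each starting at 1.
        self._single_ids = count(1)
        self._collective_ids = count(1)

    def assign_single(self) -> int:
        """Assign a single JOB_ID to one input single job."""
        return next(self._single_ids)

    def assign_collective(self, number_of_jobs: int) -> Tuple[int, List[int]]:
        """Assign one collective JOB_ID and a single JOB_ID to each member job."""
        collective_id = next(self._collective_ids)
        single_ids = [next(self._single_ids) for _ in range(number_of_jobs)]
        return collective_id, single_ids
```

These IDs are independent of the JOB_ID that the job control server device assigns to the same jobs.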
In (2) illustrated in
The job control server device to which one or more jobs are input assigns a JOB_ID shared between the job control server device and the computer to each of the input jobs, and allocates each of the input jobs to each computer in (3).
When the execution of the allocated job is started or terminated, each computer uses the function responsive to the start and end of the execution, and issues a state change notification to the DB server device in (4). The state change notification has the data structure as illustrated in
Upon receipt of the state change notification, the DB server device performs a matching process that refers to the value of each item included in the state change notification, thereby associating the notification with one of the jobs to which the device has assigned an ID.
In
The job input notification has the data structure as illustrated in
If it is determined that the input job is a single job, then the job input notifying unit 11 activates the single/collective JOB_ID assignment unit 12 and notifies it of the single job, and acquires a single JOB_ID.
If it is determined that the input job is a collective job, then the job input notifying unit 11 activates the single/collective JOB_ID assignment unit 12 and notifies it of the number of the analysis condition file names or the model file names included in
In the next step S12, the job input notifying unit 11 generates a template of 1-line data of the job information storage unit illustrated in
In
The state change notification has the data structure as illustrated in
Then, the state change notifying unit 13 writes the value included in the state change notification to the item of the executing computer IP address in the acquired record in step S22, writes “during computation” to the item of the job state to change the record, and writes the changed record to the job information storage unit 21.
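The matching and the subsequent record update can be sketched as follows. The record layout, the function names, and the "waiting" label for a job that has not started are assumptions made for illustration. Only the "during computation" state and the executing computer IP address item come from the description, and the example address is a documentation placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRecord:
    single_job_id: int                 # ID assigned by the DB server device
    user_name: str
    executing_program_id: str
    job_state: str = "waiting"         # assumed label for a job not yet started
    executing_computer_ip: Optional[str] = None


def find_record(records: List[JobRecord], user_name: str,
                executing_program_id: str) -> Optional[JobRecord]:
    """Return the first not-yet-started record that matches both keys."""
    for record in records:
        if (record.user_name == user_name
                and record.executing_program_id == executing_program_id
                and record.job_state == "waiting"):
            return record
    return None


def apply_start_notification(records: List[JobRecord], user_name: str,
                             executing_program_id: str,
                             computer_ip: str) -> Optional[JobRecord]:
    """Write the executing computer IP address and the new job state."""
    record = find_record(records, user_name, executing_program_id)
    if record is None:
        return None  # no job known to the DB server matches the notification
    record.executing_computer_ip = computer_ip
    record.job_state = "during computation"
    return record
```

A real matching process would need a rule for the case where one user has several waiting jobs for the same executing program. The sketch simply takes the first such record.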
In the first embodiment, the client computer issues a visualize request to the DB server device, thereby acquiring the information about the computer executing the computation of all jobs executed by a user through the client computer. Then, using the first interface (request to acquire in-progress computation data) provided by the executing program, the client computer accesses the array of the executing programs and acquires the in-progress data of the computation at the client computer side.
As illustrated in
The “use permission” is flag information (1 indicating “permitted”, and 0 indicating “not permitted (port unused)”) indicating whether or not the client computer is permitted to use a port. The “during communication” is flag information (1 indicating “during communication”, and 0 indicating “not in communication”) indicating whether or not the client computer permitted to use a port is communicating with the computer having the port.
As illustrated in
The “type of array” is flag information (0 indicating the array storing data of a node, and 1 indicating the array storing data of a cell center) for determining which array storing a specific value on the cell set on an object to be simulated is to be acquired from the executing program. The “size” is an item indicating what size of data is to be acquired from the array. The “starting position” is an item for determining the starting position in the array when the data determined for the item of the “size” is acquired.
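As an illustration of how these three items could select a range of data, consider the sketch below. The names are hypothetical, and it assumes that the "starting position" is a zero-based element index and that the "size" counts elements. The description fixes neither convention.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

NODE_ARRAY = 0         # array storing data of a node
CELL_CENTER_ARRAY = 1  # array storing data of a cell center


@dataclass(frozen=True)
class InProgressDataRequest:
    type_of_array: int      # NODE_ARRAY or CELL_CENTER_ARRAY
    size: int               # assumed: number of elements to acquire
    starting_position: int  # assumed: zero-based index of the first element


def extract_range(arrays: Dict[int, Sequence[float]],
                  request: InProgressDataRequest) -> List[float]:
    """Return the requested slice of the array selected by the flag."""
    source = arrays[request.type_of_array]
    start = request.starting_position
    return list(source[start:start + request.size])
```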
In
The DB server device receives the visualize request, generates the information about the computer that has executed computation on all jobs of the user, and transmits the information as reply information (2) to the client computer.
That is, if there is an available port for a computer that has executed computation, then a notification of a combination of the IP address of the computer and the number of the available port is transmitted to the client computer. If there is no available port for the computer, then a notification of no available port is transmitted to the client computer.
At the client computer that has received the combination of the IP address of the computer and the number of the available port from the DB server device, a “visualized data acquisition” button (not illustrated in the attached drawings) is pressed in (3), thereby transmitting a request to acquire in-progress computation data (socket communication) to an executing program on the computer having the IP address. The request to acquire the in-progress computation data has the data structure as illustrated in
The request to acquire the in-progress computation data is made through the first interface provided by the executing program as described above, but the executing program also has the second interface for answering the request to acquire data to a port and the third interface for answering upon completion of the data transfer from the port.
In (4), a notification that a socket communication is being performed is transmitted from the computer that has received the request to acquire in-progress computation data to the DB server device using the second interface. Upon receipt of the notification, the DB server device changes the value of the item “during communication” of the computer of the computer information storage unit illustrated in
In addition, the executing program of the computer that has received the request to acquire the in-progress computation data transmits the data in the range of the specified array to the client computer in (5).
When the transmission of the data is completed, the notification that the socket communication has been completed is transmitted from the executing program of the computer that has received the request to acquire in-progress computation data to the DB server device using the third interface in (6). Upon receipt of the notification, the DB server device changes the values of the items “use permission” and “during communication” of the computer of the computer information storage unit illustrated in
The client computer that has received the in-progress computation data performs the visualizing process using the in-progress computation data. For example, when only two variables have changed in the in-progress computation data, the two variables are set on the axes of abscissas and ordinates to perform the two-dimensional visualization and display a processing result on the display unit of the client computer. When only three variables have changed in the in-progress computation data, the three variables are set on the X, Y, and Z axis directions orthogonal to one another to perform the three-dimensional visualization and display a processing result on the display unit of the client computer.
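The choice between two-dimensional and three-dimensional visualization can be sketched as below. The function names and the representation of the in-progress data as named value sequences are assumptions. Cases other than two or three changed variables are not covered by the description and are returned as `None` here.

```python
from typing import Dict, List, Optional, Sequence


def changed_variables(data: Dict[str, Sequence[float]]) -> List[str]:
    """Names of the variables whose values are not all identical."""
    return [name for name, values in data.items() if len(set(values)) > 1]


def choose_plot_dimension(data: Dict[str, Sequence[float]]) -> Optional[int]:
    """Return 2 or 3 when exactly two or three variables changed, else None."""
    n_changed = len(changed_variables(data))
    if n_changed in (2, 3):
        return n_changed
    return None  # assumption: handling of other cases is left unspecified
```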
In
In the next step S32, the visualize request processing unit 16 determines whether or not the process has been completed on all computers in the acquired list. If it is determined that the process has not been completed, then a port number acquisition unit 19 acquires the information (the items of computer IP address, connection-permitted port number, use permission, and during communication) about the current computer from the computer information storage unit 22. Then, the visualize request processing unit 16 determines whether or not there is a connection-permitted port in the current computer, that is, whether or not there is a port for which the value of the item “use permission” is set to “0”, indicating that the port is unused.
When there are connection-permitted ports, for example, a port having the smallest port number in those ports can be connected in step S35. In the example illustrated in
On the other hand, if there are no connection-permitted ports, the visualize request processing unit 16 adds the notification that there is no available port for the computer to the reply information to the client computer in step S34, and control is returned to step S32.
If it is determined in step S32 that the processes have been completed on all computers in the list, then the visualize request processing unit 16 determines in step S36 whether or not the reply information to the client computer is null, that is, whether or not a visualize request was issued before starting the computation.
If it is determined in step S36 that the reply information is null, then a reply information transmission unit 18 notifies the client computer (user) that the computation has not been started yet in step S38, thereby terminating the series of processes.
On the other hand, if it is determined that the reply information is not null, then the reply information transmission unit 18 transmits the reply information to which data is sequentially added in steps S34 and S35 to the client computer in step S37, thereby terminating the series of processes.
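The loop over steps S32 to S38 can be sketched as follows. The function name, the dictionary layout of the port table, and the use of `None` both for "no available port" and for "computation not started" are assumptions made for illustration. Any update of the "use permission" flag when a port is handed out is omitted.

```python
from typing import Dict, List, Optional, Tuple

# computer IP address -> {port number: "use permission" flag (0 = unused)}
PortTable = Dict[str, Dict[int, int]]


def build_reply(computer_ips: List[str],
                port_table: PortTable) -> Optional[List[Tuple[str, Optional[int]]]]:
    """Build the reply information for one visualize request.

    Returns None when the list of computers is empty, which corresponds to
    a visualize request issued before the computation has started.
    """
    reply: List[Tuple[str, Optional[int]]] = []
    for ip in computer_ips:                      # step S32: loop over the list
        free_ports = [port for port, flag in port_table.get(ip, {}).items()
                      if flag == 0]
        if free_ports:
            reply.append((ip, min(free_ports)))  # step S35: smallest port number
        else:
            reply.append((ip, None))             # step S34: no available port
    if not reply:                                # step S36: reply is null
        return None                              # step S38: not started yet
    return reply                                 # step S37: transmit the reply
```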
Described next is the second embodiment. In the second embodiment, a user performs a collective job through the client device (client computer) 4. The executing program provides the fourth interface for accessing the service published by the DB server device. The fourth interface outputs a part of the output data as a feature value to the DB server device at the end of the program. For example, when N feature values (integers) and M feature values (double-precision) are output, the following items are output to the DB server device.
Each time a single job belonging to a collective job is completed, each feature value of the single job is output to the DB server device. When all single jobs belonging to the collective job are completed, a statistical analysis is performed on the DB server device using each feature value of the single jobs belonging to the collective job.
As illustrated in
As illustrated in
The operations of the job input notifying unit 11 and the state change notifying unit 13, and the data structure of the job information storage unit 21 are the same as in the first embodiment, and the descriptions are omitted here.
As illustrated in
The “total number of jobs” indicates the number of single jobs included in a collective job. The “number of computation-completed jobs” indicates the number of single jobs, among those included in a collective job, for which the computation has been completed. The “collective job state” is set to “completed” if the computation has been completed for all single jobs included in a collective job, and to “during computation” if the computation has not yet been completed for at least one of the single jobs included in the collective job.
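The relationship among these three items can be sketched as follows. The class and method names are hypothetical, and deriving the state from the two counts rather than storing it is a simplification made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class CollectiveJobRecord:
    collective_job_id: int
    total_number_of_jobs: int
    number_of_computation_completed_jobs: int = 0

    @property
    def collective_job_state(self) -> str:
        """'completed' once every single job has finished, else 'during computation'."""
        if self.number_of_computation_completed_jobs >= self.total_number_of_jobs:
            return "completed"
        return "during computation"

    def count_completed_job(self) -> str:
        """Record that one more single job has finished and return the new state."""
        self.number_of_computation_completed_jobs += 1
        return self.collective_job_state
```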
The “analysis type” is flag information (for example, 1 indicates a statistical analysis, 2 indicates an analysis on the basis of an experiment planning method, and 3 indicates an analysis on the basis of quality engineering) for determining what evaluating method is to be adopted when a statistical analysis is performed using each feature value of a single job belonging to a collective job for which the computation of all single jobs has been completed. The “control factor orthogonal table type” has a significant value when the value set in the “analysis type” is the “analysis on the basis of an experiment planning method”. The “number of control factors” and the “control factor orthogonal table type including errors” have significant values when the value set in the “analysis type” is the “analysis on the basis of quality engineering”.
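The flag values and the items that are significant for each of them can be written down as a small lookup, as sketched below. The enumeration and function names are hypothetical. The flag values 1 to 3 are the ones given as examples above.

```python
from enum import IntEnum
from typing import Dict, Tuple


class AnalysisType(IntEnum):
    STATISTICAL = 1
    EXPERIMENT_PLANNING = 2
    QUALITY_ENGINEERING = 3


# Items of the collective job record that carry a significant value
# for each analysis type.
SIGNIFICANT_ITEMS: Dict[AnalysisType, Tuple[str, ...]] = {
    AnalysisType.STATISTICAL: (),
    AnalysisType.EXPERIMENT_PLANNING: ("control factor orthogonal table type",),
    AnalysisType.QUALITY_ENGINEERING: (
        "number of control factors",
        "control factor orthogonal table type including errors",
    ),
}


def significant_items(analysis_type: int) -> Tuple[str, ...]:
    """Return the item names that are significant for the given flag value."""
    return SIGNIFICANT_ITEMS[AnalysisType(analysis_type)]
```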
In
The DB server device receives a feature value entry notification, associates (matches) the notification with a record in the job information storage unit 21, and writes the feature value to the related item of the record. It also determines whether or not all single jobs in the collective job that includes the single job whose computation has just been completed have been completed.
If it is determined that the computation of all single jobs included in the collective job has been completed, then a statistical analysis is performed using each feature value of the single jobs belonging to the collective job, and the analysis result is transmitted by email to the client computer in (2).
In
The feature value entry notification has the data structure as illustrated in
The feature value entry notifying unit 31 changes the record by writing a value included in the feature value entry notification to the item of the feature value in the acquired record in step S42, and writes the changed record to the job information storage unit 21.
Then, in step S43, the feature value entry notifying unit 31 activates a collective job completion determination unit 33 and determines on the collective job including the single job processed in steps S41 and S42 as to whether or not the computation of all single jobs included in the collective job has been completed.
If it is determined in step S43 that the computation of all single jobs included in the collective job has not been completed, then the series of processes are terminated.
On the other hand, if it is determined in step S43 that the computation of all single jobs included in the collective job has been completed, then the feature value entry notifying unit 31 activates the statistical analysis execution unit 36 in step S44. The statistical analysis execution unit 36 reads the information about the analysis type etc., and performs the statistical analysis depending on the analysis type using each feature value of the single job belonging to the collective job. In step S45, the analysis result transmission unit 37 transmits an analysis result to the client computer (user), thereby terminating the series of processes.
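Steps S42 to S44 can be sketched as follows. The function name and the dictionary of feature values are hypothetical. The mean and the population standard deviation stand in for the statistical analysis, which the description leaves dependent on the analysis type. The matching that precedes step S42 is assumed to have already identified the single JOB_ID.

```python
from statistics import mean, pstdev
from typing import Dict, Optional


def enter_feature_value(feature_values: Dict[int, Optional[float]],
                        single_job_id: int,
                        value: float) -> Optional[Dict[str, float]]:
    """Store one feature value; analyse once every single job has reported.

    feature_values maps each single JOB_ID of one collective job to its
    feature value, or to None while that job is still being computed.
    """
    feature_values[single_job_id] = value                # step S42
    if any(v is None for v in feature_values.values()):  # step S43
        return None                                      # not all jobs completed
    values = [v for v in feature_values.values() if v is not None]
    # Step S44: placeholder statistics, chosen only for illustration.
    return {"mean": mean(values), "stdev": pstdev(values)}
```

The returned dictionary plays the role of the analysis result that is transmitted to the client computer in step S45.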
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment(s) of the present invention has (have) been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International PCT Application No. PCT/JP2007/000310, filed on Mar. 27, 2007, the entire contents of which are incorporated herein by reference.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/JP2007/000310 | Mar 2007 | US |
| Child | 12543563 | | US |