The present invention relates to an output device, a data structure, an output method, and an output program, and particularly relates to an output device, data structure, output method, and output program used in performance prediction of a distributed processing system using a machine learning technique.
A distributed processing system divides a given process (job) into one or more processes (tasks), and executes the tasks in parallel on a plurality of computers to reduce processing time.
To efficiently execute the job using the distributed processing system, the user needs to appropriately control the execution order of the tasks and assign computer resources, according to the features of the tasks.
For example, it is known that the processing time of the whole job is shortened by starting execution from a task having a long processing time. The user needs to recognize the processing time of each task beforehand, in order to achieve such execution order control that starts execution from a task having a long processing time.
For example, it is also known that the number of tasks executed in parallel is maximized by assigning the minimum required amount of computer resources upon task execution. The user needs to recognize the amount of computer resources required for task processing beforehand, in order to achieve the assignment of the minimum required amount of computer resources.
To recognize the features of the tasks such as processing time and the amount of computer resources beforehand, a method of estimating the features of the tasks using a machine learning technique is available. With this estimation method, for example, the user inputs observation information indicating the behavior of a task to a program including a machine learning algorithm, and executes the program.
By executing the program, the user obtains a mathematical model indicating the feature of the task as an output result. Feeding, into the obtained mathematical model, observation information of a task whose feature has not been recognized yet enables the user to obtain estimation information of the feature of the task.
Patent Literatures (PTLs) 1 to 3 and Non Patent Literatures (NPLs) 1 to 2 describe techniques relating to the estimation of the amount of computer resources required for task processing.
PTL 1 describes a technique of estimating the relationship between the resource usage and the load value from a log of the amount of resources used in previously executed tasks.
PTL2 describes a system for estimating the load characteristics of a program.
PTL3 describes a virtual machine arrangement structure control device including a prediction unit for predicting the peak usage of physical resources per time interval.
NPL 1 describes a technique of deriving the basis function of resource amount changes from change information of the amount of resources used by a virtual machine through the use of a wavelet transform, and estimating future resource demand using the derived basis function.
NPL 2 describes a technique of estimating the amount of resources necessary to satisfy a service level objective (SLO) from past task execution history and a result of executing a short duration test for a task scheduled for assignment, using collaborative filtering.
PTL4 describes an enterprise web mining system for generating online prediction and recommendation.
PTLs 5 to 6 each describe a technique relating to the conversion of information used for processing.
PTL5 describes a printer for assisting in settings upon print output to improve user convenience. The printer described in PTL5 divides, in the case where a print feature value is character information, the character information, and handles each character after the division as an independent print feature value.
PTL6 describes a computer system for performing task assignment in consideration of not only the load information of each individual computer but also the task being executed in each individual computer, the degree of association between the assigned task and another task, and the distance of the computer in a network. The computer system described in PTL6 uses a method of converting the amount of communication data of 100 kilobytes into 1, a method of assigning a value for each band, or a method of numerically converting a packet collision rate.
PTL1: Japanese Patent No. 5354138
PTL2: International Patent Application Publication No. 2011/071010
PTL3: Japanese Patent Application Laid-Open No. 2012-159928
PTL4: Japanese Patent No. 5620933
PTL5: Japanese Patent Application Laid-Open No. 2012-022516
PTL6: Japanese Patent Application Laid-Open No. 2005-310120
NPL 1: Hiep Nguyen, Zhiming Shen, Xiaohui Gu, Sethuraman Subbiah, John Wilkes. “AGILE: elastic distributed resource scaling for Infrastructure-as-a-Service.” In Proc. of the 10th International Conference on Autonomic Computing (ICAC '13), pp. 69-82, 2013.
NPL 2: Christina Delimitrou and Christos Kozyrakis. “Quasar: Resource-Efficient and QoS-Aware Cluster Management.” In Proc. of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '14), pp. 127-144, 2014.
To accurately estimate the amount of computer resources required for task processing in the distributed processing system, data indicating the operation of the distributed processing system and task observation data need to be converted into data in a format suitable for the estimation algorithm.
In PTLs 1 to 3 and NPLs 1 to 2, task observation data and the like are not converted into data in a format that allows the estimation algorithm to accurately estimate the amount of computer resources required for task processing in the distributed processing system. Therefore, in the case of simply using the technique described in any of PTLs 1 to 3 and NPLs 1 to 2, the user may be unable to obtain estimates of the amount of computer resources with the estimation accuracy that should be available.
In PTLs 4 to 6, too, there is no particular mention of a data format that contributes to the prediction of the operation of the distributed processing system by the estimation algorithm.
The present invention has an object of solving the problems stated above and providing an output device, data structure, output method, and output program that provide information in a format suitable for a model that estimates the amount of computer resources required for task processing in a distributed processing system.
An output device according to the present invention includes an output unit which outputs estimation model application information on the basis of job feature information indicating the features of a job of a distributed processing system, the estimation model application information being information in a format suitable for an estimation model that estimates the amount of computer resources required for processing a task constituting the job.
A data structure according to the present invention includes estimation model application information generated on the basis of job feature information indicating the features of a job of a distributed processing system, the estimation model application information being information in a format suitable for an estimation model that estimates the amount of computer resources required for processing a task constituting the job.
An output method according to the present invention includes outputting estimation model application information on the basis of job feature information indicating the features of a job of a distributed processing system, the estimation model application information being information in a format suitable for an estimation model that estimates the amount of computer resources required for processing a task constituting the job.
An output program according to the present invention causes a computer to execute an output process of outputting estimation model application information on the basis of job feature information indicating the features of a job of a distributed processing system, the estimation model application information being information in a format suitable for an estimation model that estimates the amount of computer resources required for processing a task constituting the job.
According to the present invention, it is possible to provide information in a format suitable for a model that estimates the amount of computer resources required for task processing in a distributed processing system.
[Structure]
The following describes an exemplary embodiment of the present invention with reference to drawings.
The computer resources usage estimation device 100 depicted in
The input data conversion unit 101 has a function of converting job feature information included in the input data used for the generation of an estimation model into estimation model application information which is information in a format suitable for the estimation model to be generated, and outputting data including the estimation model application information.
As depicted in
The task identifier corresponds to an identification symbol of job feature information. The word candidate indicates whether or not a predetermined word is included. In
For example, consider the case of indicating that a character string which is job feature information A corresponding to task identifier Task1 includes word α1. To indicate that word α1 is included, binary information True (true) is set in the word candidate “job feature information A includes word α1?” in the word-containing information of Task1. The word-containing information of Task1 indicates that job feature information A includes word α1.
Consider the case of indicating that a character string which is job feature information B corresponding to task identifier Task2 does not include word βn. To indicate that word βn is not included, binary information False (false) is set in the word candidate “job feature information B includes word βn?” in the word-containing information of Task2. The word-containing information of Task2 indicates that job feature information B does not include word βn.
The task identifier corresponds to an identification symbol of numerical information. The numerical information corresponds to numerical job feature information. In
For example, consider the case of indicating that the label information of numerical information A corresponding to task identifier Task1 is 8. To indicate that the label information of numerical information A is 8, character string information “8” is set in the label information “label information of numerical information A” in the numerical inversion label information of Task1. The numerical inversion label information of Task1 indicates that the label information of numerical information A is 8.
Consider the case of indicating that the label information of numerical information B corresponding to task identifier Task2 is 0. To indicate that the label information of numerical information B is 0, character string information “0” is set in the label information “label information of numerical information B” in the numerical inversion label information of Task2. The numerical inversion label information of Task2 indicates that the label information of numerical information B is 0.
The computer resources usage estimation model generation unit 102 has a function of receiving the data output from the input data conversion unit 101 and generating the estimation model. As depicted in
The computer resources usage estimation unit 103 has a function of estimating the computer resources usage of a task whose feature has not been recognized yet, using the received estimation model. The computer resources usage estimation unit 103 may output an estimate of an index relating to process execution, such as processing time, other than the computer resources usage.
Although the computer resources usage estimation device 100 in this exemplary embodiment estimates computer resources usage, the computer resources usage estimation device 100 may estimate a value other than computer resources usage. For example, the computer resources usage estimation device 100 may estimate task processing time in the distributed processing system. Any value estimated by the computer resources usage estimation device 100 in this exemplary embodiment is expected to have improved estimation accuracy.
The computer resources usage estimation device 100 in this exemplary embodiment is, for example, realized by a central processing unit (CPU) that executes processes according to a program stored in a storage medium. In other words, the input data conversion unit 101, the computer resources usage estimation model generation unit 102, and the computer resources usage estimation unit 103 are, for example, realized by a CPU that executes processes according to program control.
Each unit in the computer resources usage estimation device 100 may be realized by a hardware circuit.
[Operation]
The following describes the operation of the input data conversion unit 101 in this exemplary embodiment, with reference to
The operation of the input data conversion unit 101 in this exemplary embodiment generating, for a job name which is one type of job feature information, word-containing information indicating whether or not each word of a word group constituting the job name is included on the basis of the job feature information is described first, with reference to
When the job feature information depicted in
When forming the word-containing information, the input data conversion unit 101 generates each word candidate name by, for example, prefixing the identifier of the generation source information. The input data conversion unit 101 may generate each word candidate name by any other method, as long as the generated name is uniquely identifiable.
The job name with the task number “1” in the job feature information depicted in
In detail, the input data conversion unit 101 prefixes “Jobname” to each of the words “Cluster”, “Iterator”, “running”, “iteration”, “3”, “over”, “priorPath”, “kmeans”, “46”, and “clusters-2” present in the job name with the task number “1”, to generate the word candidate name.
The input data conversion unit 101 also prefixes “Jobname” to each of the words “5”, “106”, and “clusters-4” not present in the job name with the task number “1” and only present in the job name with the task number “2”, to generate the word candidate name. The input data conversion unit 101 forms the word-containing information by the word candidate group indicating the generated names.
The input data conversion unit 101 generates the same number of pieces of word-containing information as the number of input pieces of job feature information. The input data conversion unit 101 sets the task number of the input job feature information as the task number of the generated word-containing information corresponding to the job feature information.
The input data conversion unit 101 then sets all word candidates in each generated piece of word-containing information to False, as an initialization process (step S102).
Following this, the input data conversion unit 101 divides the job name of the input job feature information into words (step S104). For example, the job name with the task number “1” is divided into the words “Cluster”, “Iterator”, “running”, “iteration”, “3”, “over”, “priorPath”, “kmeans”, “46”, and “clusters-2”.
The delimiter or delimiter character when the input data conversion unit 101 divides the job name into words is, for example, set by the user, the system, or the like. Alternatively, the input data conversion unit 101 may hold the delimiter or delimiter character beforehand.
The input data conversion unit 101 then sets the word candidates in the word-containing information corresponding to the divided words, to True (step S106). The binary information “True” indicates that the set word candidate is included in the job name. The input data conversion unit 101 sets True for the number of divided words (step S107).
For example, in the case of the job feature information of the task number “1”, True is set in each of the word candidates “Jobname-Cluster”, “Jobname-Iterator”, “Jobname-running”, “Jobname-iteration”, “Jobname-3”, “Jobname-over”, “Jobname-priorPath”, “Jobname-kmeans”, “Jobname-46”, and “Jobname-clusters-2” for which the corresponding words are present. Meanwhile, False remains set in each of the word candidates “Jobname-5”, “Jobname-106”, and “Jobname-clusters-4” for which the corresponding words are not present.
The input data conversion unit 101 may set information other than True in the corresponding word candidate, as long as it is clear that the word candidate is included in the job name. For example, the input data conversion unit 101 may set the numerical value 1 in the corresponding word candidate, instead of True. In the case of setting the numerical value 1, the input data conversion unit 101 sets the numerical value 0 in each word candidate instead of False in the initialization process of step S102.
As a result of the input data conversion unit 101 setting True for the number of divided words (the determination condition in step S107 is met), the word-containing information corresponding to the input job feature information is generated. The input data conversion unit 101 repeatedly performs the process of steps S103 to S108 for the number of input pieces of job feature information.
After generating the word-containing information for the number of input pieces of job feature information (the determination condition in step S108 is met), the input data conversion unit 101 ends the generation process.
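The generation process of steps S102 to S107 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the input data conversion unit 101: the function name `build_word_containing_info`, the whitespace delimiter, and the two job name strings are hypothetical. The job names are assembled from the words listed above purely for illustration, and the words of task number “2” other than “5”, “106”, and “clusters-4” are assumed.

```python
import re

def build_word_containing_info(job_names, source_id="Jobname", delimiter=r"\s+"):
    """Build word-containing information from {task number: job name}.

    The delimiter is an assumption; the text only says it is set by the
    user or the system, or held beforehand.
    """
    # Divide every job name into words (step S104).
    words_by_task = {
        task: [w for w in re.split(delimiter, name) if w]
        for task, name in job_names.items()
    }

    # Word candidate names: the identifier of the generation source
    # prefixed to every word that appears in any job name.
    candidates = []
    for words in words_by_task.values():
        for w in words:
            name = f"{source_id}-{w}"
            if name not in candidates:
                candidates.append(name)

    info = {}
    for task, words in words_by_task.items():
        row = {c: False for c in candidates}   # initialization (step S102)
        for w in words:                        # one pass per divided word (S106, S107)
            row[f"{source_id}-{w}"] = True
        info[task] = row
    return info

# Hypothetical job names, assembled only for illustration.
job_names = {
    1: "Cluster Iterator running iteration 3 over priorPath kmeans 46 clusters-2",
    2: "Cluster Iterator running iteration 5 over priorPath kmeans 106 clusters-4",
}
word_info = build_word_containing_info(job_names)
```

With these assumed inputs, `word_info[1]["Jobname-kmeans"]` is True and `word_info[1]["Jobname-5"]` is False, which matches the pattern described for task number “1”.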
The following describes the effect of the information obtained as a result of the conversion as depicted in
By referencing the word-containing information corresponding to the set of tasks that differ in the relationship between task feature information and the amount of computer resources, the computer resources usage estimation unit 103 can classify the tasks included in the task set depending on whether or not a predetermined word set is included.
For example, the task corresponding to each piece of task feature information depicted in
The operation of the input data conversion unit 101 in this exemplary embodiment generating, for a program class name which is one type of job feature information, word-containing information indicating whether or not each word of the word group constituting the class name is included on the basis of the job feature information is described next, with reference to
When the job feature information depicted in
When forming the word-containing information, the input data conversion unit 101 generates each word candidate name by, for example, prefixing the identifier of the generation source information. The input data conversion unit 101 may generate each word candidate name by any other method, as long as the generated name is uniquely identifiable.
The class name with the task number “1” in the job feature information depicted in
In detail, the input data conversion unit 101 prefixes “Class” to each of the words “org”, “apache”, “mahout”, “clustering”, “iterator”, and “CIMapper” present in the class name with the task number “1”, to generate the word candidate name.
The input data conversion unit 101 also prefixes “Class” to each of the words “cf”, “taste”, “hadoop”, “item”, and “ItemIDIndexMapper” not present in the class name with the task number “1” and only present in the class name with the task number “2”, to generate the word candidate name. The input data conversion unit 101 forms the word-containing information by the word candidate group indicating the generated names.
The input data conversion unit 101 generates the same number of pieces of word-containing information as the number of input pieces of job feature information. The input data conversion unit 101 sets the task number of the input job feature information as the task number of the generated word-containing information corresponding to the job feature information.
The input data conversion unit 101 then sets all word candidates in each generated piece of word-containing information to False, as an initialization process (step S112).
Following this, the input data conversion unit 101 divides the program class name of the input job feature information into words (step S114). For example, the class name with the task number “1” is divided into the words “org”, “apache”, “mahout”, “clustering”, “iterator”, and “CIMapper”.
The delimiter or delimiter character when the input data conversion unit 101 divides the class name into words is, for example, set by the user, the system, or the like. Alternatively, the input data conversion unit 101 may hold the delimiter or delimiter character beforehand.
The input data conversion unit 101 then sets the word candidates in the word-containing information corresponding to the divided words, to True (step S116). The binary information “True” indicates that the set word candidate is included in the class name. The input data conversion unit 101 sets True for the number of divided words (step S117).
For example, in the case of the job feature information of the task number “1”, True is set in each of the word candidates “Class-org”, “Class-apache”, “Class-mahout”, “Class-clustering”, “Class-iterator”, and “Class-CIMapper” for which the corresponding words are present. Meanwhile, False remains set in each of the word candidates “Class-cf”, “Class-taste”, “Class-hadoop”, “Class-item”, and “Class-ItemIDIndexMapper” for which the corresponding words are not present.
The input data conversion unit 101 may set information other than True in the corresponding word candidate, as long as it is clear that the word candidate is included in the program class name. For example, the input data conversion unit 101 may set the numerical value 1 in the corresponding word candidate, instead of True. In the case of setting the numerical value 1, the input data conversion unit 101 sets the numerical value 0 in each word candidate instead of False in the initialization process of step S112.
As a result of the input data conversion unit 101 setting True for the number of divided words (the determination condition in step S117 is met), the word-containing information corresponding to the input job feature information is generated. The input data conversion unit 101 repeatedly performs the process of steps S113 to S118 for the number of input pieces of job feature information.
After generating the word-containing information for the number of input pieces of job feature information (the determination condition in step S118 is met), the input data conversion unit 101 ends the generation process.
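The class name variant, using the numerical values 1 and 0 in place of True and False, might look like the sketch below. It is an illustration only: the function name `class_word_flags` and the “.” delimiter are assumptions, and the two class name strings are reconstructed from the words listed above rather than taken from the drawings.

```python
def class_word_flags(class_names, source_id="Class", delimiter="."):
    """Build word-containing information from {task number: class name},
    using 1/0 instead of True/False."""
    # Divide each class name into words (step S114); "." is an assumed delimiter.
    split = {task: name.split(delimiter) for task, name in class_names.items()}

    # One word candidate per distinct word, prefixed with the source identifier.
    candidates = sorted(
        {f"{source_id}-{w}" for words in split.values() for w in words}
    )

    flags = {}
    for task, words in split.items():
        row = dict.fromkeys(candidates, 0)   # initialization with 0 (step S112)
        for w in words:
            row[f"{source_id}-{w}"] = 1      # word is present (step S116)
        flags[task] = row
    return flags

# Class names reconstructed from the listed words; treat them as hypothetical input.
class_names = {
    1: "org.apache.mahout.clustering.iterator.CIMapper",
    2: "org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper",
}
class_flags = class_word_flags(class_names)
```

Numerical flags of this kind can be passed directly to learners that expect numeric feature vectors, which is one possible reason to prefer 1 and 0 over True and False.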
The following describes the effect of the information obtained as a result of the conversion as depicted in
By referencing the word-containing information corresponding to the set of tasks that differ in the relationship between task feature information and the amount of computer resources, the computer resources usage estimation unit 103 can classify the tasks included in the task set depending on whether or not a predetermined word set is included.
For example, the task corresponding to each piece of task feature information depicted in
Even if the computer resources usage estimation unit 103 has not recognized beforehand that the task executes the program implemented by Apache Mahout, the computer resources usage estimation unit 103 can recognize the tendency of the implementation of Apache Mahout by extracting the task group corresponding to the word-containing information whose word candidate “Class-mahout” is True in
The operation of the input data conversion unit 101 in this exemplary embodiment generating numerical inversion label information on the basis of one type of job feature information that includes an observation value during program execution and an option numerical value designated during program execution is described next, with reference to
When the job feature information depicted in
When forming the numerical inversion label information, the input data conversion unit 101 generates each label information name by, for example, prefixing the identifier of the generation source information. The input data conversion unit 101 may generate each label information name by any other method, as long as the generated name is uniquely identifiable.
The input data conversion unit 101 may set the job feature information whose value has been replaced, as the numerical inversion label information. The numerical inversion label information depicted in
Following this, the input data conversion unit 101 converts value v included in the job feature information into value v′ using function f (step S124). Function f used when the input data conversion unit 101 converts the value is, for example, set by the user, the system, or the like. Alternatively, the input data conversion unit 101 may hold function f beforehand.
The input data conversion unit 101 uses any mathematical function for function f. Function f used for the conversion into the value depicted in
The input data conversion unit 101 then sets the label information of the numerical inversion label information corresponding to value v, to converted value v′ (step S125). The input data conversion unit 101 performs the value conversion and the converted value setting for the number of conversion target values included in the job feature information (step S126).
For example, in the case of the job feature information of the task number “1” depicted in
For example, in the case of the numerical inversion label information of the task number “1” depicted in
As a result of the input data conversion unit 101 performing the value conversion and the converted value setting for the number of conversion target values included in the job feature information (the determination condition in step S126 is met), the numerical inversion label information corresponding to the input job feature information is generated. The input data conversion unit 101 repeatedly performs the process of steps S122 to S127 for the number of input pieces of job feature information.
After generating the numerical inversion label information for the number of input pieces of job feature information (the determination condition in step S127 is met), the input data conversion unit 101 ends the generation process.
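The conversion loop of steps S122 to S127 can be sketched as follows. Because function f is left to the user or the system, the choice made here, the floor of the base-2 logarithm, is purely an assumption for illustration. The names `log2_bucket` and `build_label_info`, the feature names, and the input values are likewise hypothetical.

```python
import math

def log2_bucket(v):
    """Hypothetical function f: floor(log2(v)), with 0 for values below 1."""
    return int(math.floor(math.log2(v))) if v >= 1 else 0

def build_label_info(job_features, f=log2_bucket, source_id="Label"):
    """Convert each numerical value v into label v' = f(v), stored as a string."""
    label_info = {}
    for task, features in job_features.items():      # per piece of job feature information
        row = {}
        for name, v in features.items():             # per conversion target value (S126)
            # Convert v with f (S124) and set the result as label information (S125).
            row[f"{source_id}-{name}"] = str(f(v))
        label_info[task] = row
    return label_info

# Hypothetical observation values and option values.
job_features = {
    1: {"input_records": 300, "option_k": 46},
    2: {"input_records": 0.5, "option_k": 106},
}
labels = build_label_info(job_features)
```

Under this assumed f, the value 300 becomes the label “8” and the value 0.5 becomes the label “0”. Any other monotonic bucketing function could be substituted through the `f` argument.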
The following describes the effect of the information obtained as a result of the conversion as depicted in
Accordingly, in the case of using the numerical inversion label information depicted in
For example, the naive Bayes algorithm handles input data as discrete values. When handling numerical information which is a continuous quantity, the naive Bayes algorithm interprets all values as discontinuous discrete values.
The naive Bayes algorithm is not designed to interpret a continuous quantity as discontinuous discrete values. When the information is interpreted in this way, the naive Bayes algorithm is prone to overfitting or the like in the estimation process. Overfitting or the like degrades the accuracy of the estimates of the amount of computer resources produced by the naive Bayes algorithm.
The numerical inversion label information output from the input data conversion unit 101 in this exemplary embodiment includes the numerical value converted from a continuous quantity to a discrete quantity by function f, as label information. In the case where the numerical inversion label information including the label information is the input data, the computer resources usage estimation unit 103 can use an algorithm, such as the naive Bayes algorithm, that can only handle discrete values. The possibility that the computer resources usage estimation unit 103 can accurately estimate the amount of computer resources required for task processing using the naive Bayes algorithm is thus increased.
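The following sketch illustrates why the conversion helps an algorithm that treats every input as a discrete category. The bucketing function `log2_bucket` and the memory values are hypothetical and are chosen only to make the effect visible.

```python
import math
from collections import Counter

def log2_bucket(v):
    """Hypothetical function f: floor(log2(v)), with 0 for values below 1."""
    return int(math.floor(math.log2(v))) if v >= 1 else 0

# Hypothetical memory usage observations (continuous quantities).
memory_mb = [130.2, 131.7, 140.9, 255.0, 512.4, 530.1, 1000.8, 1023.9]

# Treated as discrete categories, every raw value is a category seen exactly once.
raw_counts = Counter(memory_mb)

# After conversion by f, the observations share a small number of labels.
label_counts = Counter(str(log2_bucket(v)) for v in memory_mb)
```

Here `raw_counts` contains eight categories with one observation each, whereas `label_counts` contains only the labels “7” and “9” with four observations each. A per-category frequency estimate therefore has repeated evidence to work with only in the converted form.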
By adjusting function f, the input data conversion unit 101 can convert the distribution of the input data into another distribution. The conversion of the data distribution increases the possibility that the computer resources usage estimation unit 103 can classify data more clearly.
According to this exemplary embodiment, the amount of computer resources required for task processing in the distributed processing system is estimated accurately. By receiving the information output from the input data conversion unit 101 as input, the computer resources usage estimation model generation unit 102 can easily classify, for each estimation algorithm, the determinant of the format of the function for computing the amount of computer resources. The classification of the determinant for each estimation algorithm corresponds to extracting the task group whose word candidate “Jobname-kmeans” is True or extracting the task group whose word candidate “Class-mahout” is True as mentioned above.
By receiving the classified determinant as input for generating the algorithm that estimates the amount of computer resources, the computer resources usage estimation model generation unit 102 can generate a function in a format close to the value distribution in task processing. The computer resources usage estimation unit 103 can enhance estimation accuracy by estimating computer resources usage using the function in a format close to the value distribution in task processing which has been generated by the computer resources usage estimation model generation unit 102.
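Extracting a task group by a word candidate can be sketched as follows. The helper name `extract_task_group` and the three-task sample are hypothetical. How the computer resources usage estimation model generation unit 102 actually uses the groups is not specified here.

```python
def extract_task_group(word_info, candidate):
    """Return the task numbers whose word candidate is set to True (or 1)."""
    return sorted(task for task, row in word_info.items() if row.get(candidate))

# Hypothetical word-containing information for three tasks.
word_info = {
    1: {"Jobname-kmeans": True, "Class-mahout": True},
    2: {"Jobname-kmeans": False, "Class-mahout": True},
    3: {"Jobname-kmeans": True, "Class-mahout": False},
}

kmeans_tasks = extract_task_group(word_info, "Jobname-kmeans")
mahout_tasks = extract_task_group(word_info, "Class-mahout")
```

One possible use, offered only as an assumption, is to fit a separate estimation function to each extracted group so that each function follows the value distribution of its own group.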
[Structure]
The following describes Exemplary Embodiment 2 of the present invention with reference to drawings.
As depicted in
The estimate reverse conversion unit 104 has a function of reversely converting the value output from the computer resources usage estimation unit 103, into a computer resources usage estimate. The estimate reverse conversion unit 104 is, for example, realized by a CPU that executes processes according to program control.
In this exemplary embodiment, the computer resources usage estimation model generation unit 102 receives the data output from the input data conversion unit 101, and generates the estimation model. The computer resources usage estimation unit 103 receives the data output from the input data conversion unit 101, and outputs, in the same format as the received data, the value of computer resources usage of a task whose feature has not been recognized yet.
The estimate reverse conversion unit 104 converts the value indicating the computer resources usage estimate output from the computer resources usage estimation unit 103 into numerical information indicating the computer resources usage estimate, and outputs the numerical information. The use of the computer resources usage estimation device 100 in this exemplary embodiment enables the user, the distributed processing system scheduler, etc. to estimate the amount of computer resources required for task processing.
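The reverse conversion can be sketched as follows, assuming for illustration that function f on the input side was floor(log2(v)) for values of 1 or more. The function name `reverse_log2_bucket` and the choice of the bucket midpoint as the representative estimate are assumptions, not the method of the estimate reverse conversion unit 104.

```python
def reverse_log2_bucket(label):
    """Map a label k back to the value range [2**k, 2**(k+1)) it came from.

    Assumes the forward conversion was f(v) = floor(log2(v)) for v >= 1.
    """
    k = int(label)
    low, high = 2 ** k, 2 ** (k + 1)
    # The midpoint is one possible representative value; the upper bound
    # would be a more conservative choice when reserving resources.
    return {"low": low, "high": high, "estimate": (low + high) / 2}

# Hypothetical label output by the computer resources usage estimation unit 103.
estimate = reverse_log2_bucket("8")
```

For the label “8”, this yields the range from 256 to 512 with a midpoint of 384.0. A scheduler that must avoid under-allocation might prefer the `high` value instead.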
[Operation]
The following describes the operation of the input data conversion unit 101 and the operation of the estimate reverse conversion unit 104 in this exemplary embodiment, with reference to
The operation of the input data conversion unit 101 in this exemplary embodiment generating numerical inversion label information on the basis of one type of job feature information that includes computer resources usage observed during program execution is described first, with reference to
When the job feature information depicted in
When forming the numerical inversion label information, the input data conversion unit 101 generates each label information name by, for example, prefixing the identifier of the generation source information. The input data conversion unit 101 may generate each label information name by any other method, as long as the generated name is uniquely identifiable.
The input data conversion unit 101 may set the job feature information whose value has been replaced, as the numerical inversion label information. The numerical inversion label information depicted in
Following this, the input data conversion unit 101 converts value v included in the job feature information into value v′ using function f (step S204). Function f used when the input data conversion unit 101 converts the value is, for example, set by the user, the system, or the like. Alternatively, the input data conversion unit 101 may hold function f beforehand.
The input data conversion unit 101 may use any mathematical function as function f. Function f used for the conversion into the value depicted in
The input data conversion unit 101 then sets the label information of the numerical inversion label information corresponding to value v, to converted value v′ (step S205). The input data conversion unit 101 performs the value conversion and the converted value setting for the number of conversion target values included in the job feature information (step S206).
For example, in the case of the job feature information of the task number “1” depicted in
As a result of the input data conversion unit 101 performing the value conversion and the converted value setting for the number of conversion target values included in the job feature information (the determination condition in step S206 is met), the numerical inversion label information corresponding to the input job feature information is generated. The input data conversion unit 101 repeatedly performs the process of steps S202 to S207 for the number of input pieces of job feature information.
After generating the numerical inversion label information for the number of input pieces of job feature information (the determination condition in step S207 is met), the input data conversion unit 101 ends the generation process.
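The generation loop of steps S204 to S207 can be sketched as follows. This is a minimal illustration only, not the embodiment's implementation: the field name `memory_mb`, the `_label` naming scheme, and the choice of a rounded base-2 logarithm as function f are all assumptions made for the example.

```python
import math


def f(v):
    # Hypothetical function f: rounded base-2 logarithm.
    # The embodiment allows any mathematical function here.
    return round(math.log2(v))


def to_label_info(job_feature, targets):
    """Build numerical inversion label information for one piece of job feature information."""
    labels = {}
    for name in targets:  # repeat for every conversion target value (S206)
        v = job_feature[name]
        v_conv = f(v)  # convert value v into value v' (S204)
        # Assumed naming scheme: source identifier followed by "_label".
        labels[name + "_label"] = str(v_conv)  # set the label to v' (S205)
    return labels


def convert_all(job_features, targets):
    # Repeat for every input piece of job feature information (S207).
    return [to_label_info(job, targets) for job in job_features]


if __name__ == "__main__":
    jobs = [{"task": 1, "memory_mb": 1500}, {"task": 2, "memory_mb": 4000}]
    print(convert_all(jobs, ["memory_mb"]))
```

Storing the converted value as a string reflects the description of the label as string label information; an integer label would serve equally well for estimation algorithms that accept one.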
The input data conversion unit 101 outputs the generated numerical inversion label information to the computer resources usage estimation model generation unit 102 including the machine learning algorithm and the like. The computer resources usage estimation model generation unit 102 generates an estimation model for computing a memory usage estimate, using the received numerical inversion label information.
The operation of the estimate reverse conversion unit 104 in this exemplary embodiment reversely converting the output value of the estimation algorithm into estimated computer resources usage is described next, with reference to
As depicted in
As depicted in
The following describes the operation of the estimate reverse conversion unit 104 generating the estimated memory usage information depicted in
The estimate reverse conversion unit 104 feeds output value p′ included in the numerical inversion label information output from the estimation model, to inverse function f −1 of function f used in the conversion target value conversion process in step S204 in
The estimate reverse conversion unit 104 repeatedly performs the process of step S211 for the number of input pieces of numerical inversion label information. After generating estimated memory usage information for the number of input pieces of numerical inversion label information, the estimate reverse conversion unit 104 ends the process.
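The reverse conversion of step S211 can be sketched in the same way. The example assumes that function f was a rounded base-2 logarithm, so that inverse function f −1 is a power of two; the names `memory_mb_label` and `estimated_memory_mb` are hypothetical. Because rounding discards information, the recovered value is a representative of the label's range rather than the originally observed value.

```python
def f_inverse(p_conv):
    # Inverse of the assumed f(v) = round(log2(v)): map label p' back to a value p.
    return 2 ** int(p_conv)


def reverse_convert(label_infos, label_name="memory_mb_label",
                    out_name="estimated_memory_mb"):
    """Turn estimation model output labels into numerical usage estimates."""
    estimates = []
    for info in label_infos:  # repeat step S211 for every input piece
        estimates.append({out_name: f_inverse(info[label_name])})
    return estimates


if __name__ == "__main__":
    model_output = [{"memory_mb_label": "10"}, {"memory_mb_label": "11"}]
    print(reverse_convert(model_output))
```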
Thus, the computer resources usage estimation device 100 in this exemplary embodiment can convert the character string included in the numerical inversion label information output from the estimation model into a computer resources usage estimate, which is numerical information. By using the converted estimate, the distributed processing system can process the task faster or more efficiently. The use of the estimate increases the possibility that the amount of computer resources assigned to the process can be kept to the minimum required quantity.
For example, suppose the user configures all processes in the distributed processing system to use 2 GB of memory each. With this setting, a computer with 4 GB of memory can execute two processes in parallel. In the case where the memory actually used by a process is 1 GB, however, the setting means that 2 GB of memory in total is unnecessarily assigned on the computer.
If it is possible to estimate that the memory required for the process is 1 GB, the user can configure the distributed processing system to assign four processes at once to a computer with 4 GB of memory. By executing the four processes in parallel, the distributed processing system can process the job at double speed, as compared with the aforementioned setting. Moreover, the unnecessary assignment of 2 GB of memory is avoided, which contributes to higher computer resources use efficiency than the aforementioned setting.
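The arithmetic behind this comparison can be checked with a short sketch. The function name and the simplification that memory is the only limiting resource are assumptions made for illustration.

```python
def parallel_processes(node_memory_gb, per_process_gb):
    # Number of processes that fit on one computer when memory is the only limit.
    return node_memory_gb // per_process_gb


NODE_GB = 4      # memory of the computer
FIXED_GB = 2     # fixed per-process setting chosen by the user
ACTUAL_GB = 1    # memory the process actually needs (the estimate)

fixed_parallel = parallel_processes(NODE_GB, FIXED_GB)       # two processes
estimated_parallel = parallel_processes(NODE_GB, ACTUAL_GB)  # four processes

# Memory assigned but never used under the fixed setting.
wasted_gb = fixed_parallel * (FIXED_GB - ACTUAL_GB)

# Relative throughput when all processes take the same time.
speedup = estimated_parallel / fixed_parallel

if __name__ == "__main__":
    print(fixed_parallel, estimated_parallel, wasted_gb, speedup)
```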
The following describes the effect of the information obtained as a result of the conversion as depicted in
The numerical inversion label information depicted in
Accordingly, in the case of using the numerical inversion label information depicted in
For example, the naive Bayes algorithm handles discrete values as an estimation target. When handling numerical information which is a continuous quantity as an estimation target, the naive Bayes algorithm interprets all values as discontinuous discrete values.
Interpreting values as discontinuous discrete values is not an operation that the naive Bayes algorithm is intended to perform. When it interprets the information in this way, the naive Bayes algorithm is prone to overfitting or the like in the estimation process, which degrades the accuracy of its estimate of the amount of computer resources.
The numerical inversion label information output from the input data conversion unit 101 in this exemplary embodiment includes the numerical value converted from a continuous quantity to a discrete quantity by function f, as the label information. In the case where the numerical inversion label information including the label information is an estimation target, the computer resources usage estimation unit 103 can use an algorithm, such as the naive Bayes algorithm, that can only handle discrete values as estimates. The possibility that the computer resources usage estimation unit 103 can accurately estimate the amount of computer resources required for task processing using the naive Bayes algorithm is thus increased.
By adjusting function f, the computer resources usage estimation device 100 can obtain an estimate of appropriate resolution. For example, by using a logarithmic function as function f, the computer resources usage estimation device 100 can estimate a large value without being affected by slight changes. This increases the possibility that the amount of computer resources is estimated to an appropriate degree in conformity with the status of the distributed processing system.
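The effect of a logarithmic function f on resolution can be illustrated with a small sketch. The sample values and the use of a rounded base-2 logarithm are assumptions chosen for the example; they are not taken from the embodiment.

```python
import math


def log_label(v):
    # Assumed function f: rounded base-2 logarithm, returned as a string label.
    return str(round(math.log2(v)))


# Hypothetical memory usage observations in MB. Values that differ only
# slightly should end up sharing one label.
observations = [1000, 1010, 1030, 4000, 4100]
labels = [log_label(v) for v in observations]

distinct_raw = len(set(observations))  # classes if raw values were used as labels
distinct_labels = len(set(labels))     # classes after applying f

if __name__ == "__main__":
    print(labels, distinct_raw, distinct_labels)
```

With raw values, an algorithm that only handles discrete values would see every observation as its own class. After conversion, observations around 1 GB fall into one class and observations around 4 GB into another, which is the coarser resolution the logarithm provides for large values.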
The following describes an overview of the present invention.
With such a structure, the output device can provide information in a format suitable for a model that estimates the amount of computer resources required for task processing in a distributed processing system.
The estimation model application information may include word-containing information having binary information that indicates whether or not a character string indicated by the character string information included in the job feature information includes a prescribed word.
With such a structure, the output device can provide information indicating whether or not a job name or a class name includes a prescribed word.
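A minimal sketch of word-containing information follows. The `contains_` naming scheme, the case-insensitive matching, and the sample job name are assumptions made for illustration only.

```python
def word_containing_info(text, prescribed_words):
    """Return binary information on whether text includes each prescribed word."""
    lowered = text.lower()  # assumed: matching ignores letter case
    return {
        "contains_" + word: int(word.lower() in lowered)
        for word in prescribed_words
    }


if __name__ == "__main__":
    # Hypothetical job name and prescribed words.
    print(word_containing_info("WordCountJob", ["count", "sort"]))
```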
The estimation model application information may include numerical inversion label information having, as string label information, a value derived by converting, by a prescribed function, the numeric value indicated by the numerical information included in the job feature information.
With such a structure, the output device can provide information including string label information that can be easily handled by the estimation model.
The output device 10 may include a form conversion unit (for example, the estimate reverse conversion unit 104) for outputting the estimation model application information output from the estimation model, in the same format as the job feature information corresponding to the estimation model application information.
With such a structure, the output device can provide information of computer resources usage in a format desired by the user.
The output device 10 may include a computer resources estimation unit (for example, the computer resources usage estimation unit 103) for estimating the amount of computer resources required for processing the task included in the job corresponding to the job feature information in the distributed processing system, by feeding the estimation model application information output from the output unit 11 on the basis of the job feature information into the estimation model.
With such a structure, the output device can estimate computer resources usage on the basis of estimation model application information.
The output device 10 may include a computer resources estimation model generation unit (for example, the computer resources usage estimation model generation unit 102) for generating the estimation model for estimating the amount of computer resources required for processing the task included in the job corresponding to the job feature information in the distributed processing system, using the estimation model application information output from the output unit 11 on the basis of the job feature information.
With such a structure, the output device can generate a computer resources usage estimation model on the basis of estimation model application information.
With such a structure, the data structure can provide information in a format suitable for a model that estimates the amount of computer resources required for task processing in a distributed processing system.
The estimation model application information may include word-containing information having binary information that indicates whether or not a character string indicated by the character string information included in the job feature information includes a prescribed word.
With such a structure, the data structure can provide information indicating whether or not a job name or a class name includes a prescribed word.
The estimation model application information may include numerical inversion label information having, as string label information, a value derived by converting, by a prescribed function, the numeric value indicated by the numerical information included in the job feature information.
With such a structure, the data structure can provide information including string label information that can be easily handled by the estimation model.
Although the present invention has been described with reference to the above exemplary embodiments and examples, the present invention is not limited to the above exemplary embodiments and examples. Various changes understandable by those skilled in the art can be made to the structures and details of the present invention within the scope of the present invention.
This application claims priority based on Japanese Patent Application No. 2015-010492 filed on Jan. 22, 2015, the disclosure of which is incorporated herein in its entirety.
10 output device
11 output unit
100 computer resources usage estimation device
101 input data conversion unit
102 computer resources usage estimation model generation unit
103 computer resources usage estimation unit
104 estimate reverse conversion unit
Number | Date | Country | Kind |
---|---|---|---|
2015-010492 | Jan 2015 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2015/006361 | 12/21/2015 | WO | 00 |