This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2018-081907, filed on Apr. 20, 2018, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a computer-readable recording medium, a machine learning method, and a machine learning apparatus.
To avoid situations where workers take an administrative leave of absence (receive medical treatment), a prediction is made, on the basis of worker attendance record data, as to which workers may be in poor mental health a number of months later, so that appropriate steps (e.g., offering counseling) can be taken at an early stage. According to a commonly-used method, dedicated staff members visually go through the data to look for workers whose working state exhibits any of the patterns characteristic of the occurrence of poor mental health, such as frequent business trips, long overtime hours, sudden consecutive absences, absences without notice, or a combination of any of these. It is difficult to clearly define these characteristic patterns, partly because different dedicated staff members use different criteria. In recent years, an endeavor has been made to mechanically reproduce the assessments made by the dedicated staff, by learning the characteristic patterns of poor mental health through a machine learning process that uses a decision tree, a random forest, a Support Vector Machine (SVM), or the like.
Patent Literature 1: Japanese Laid-open Patent Publication No. 2016-151979
According to an aspect of an embodiment, a non-transitory computer-readable recording medium stores therein a machine learning program that causes a computer to execute a process. The process includes receiving time-series data including a plurality of items and including a plurality of records corresponding to a calendar; generating tensor data, based on the time-series data, including a tensor in which calendar information and each of the plurality of items are set as mutually-different dimensions; and, with respect to a learning model that performs a tensor decomposition on input tensor data and that inputs a result of the tensor decomposition to a neural network, performing a deep learning process on the neural network and learning a method of the tensor decomposition by using the tensor data as the input tensor data.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
However, in commonly-used machine learning processes, when data is input to a machine learning model, the data is converted into a form compliant with the input format of the model. As a result, relationships that have not been recognized at the time of the conversion are lost during the conversion, and the learning process is not properly performed.
More specifically, in commonly-used machine learning processes, feature vectors are created from the worker attendance record data. However, the learning process and the predicting process are unfortunately performed while missing the characteristics of the calendar which the worker attendance record data has, such as the attributes of the elements of the data and the relevance among the elements. Next, problems of a commonly-used machine learning process will be explained.
As explained above, the data format of the training data is simply vector information and does not carry attribute information of the elements of the vectors. It is therefore not possible to distinguish which values correspond to attendance/absence information and which values correspond to business trip information. For this reason, the attributes of the elements and the relevance among them, such as those derived from the calendar, are not taken into consideration in the learning process.
Preferred embodiments will be explained with reference to the accompanying drawings. The present disclosure is not limited to the exemplary embodiments. Further, it is possible to combine any of the embodiments together as appropriate, as long as no conflict occurs.
More specifically, the learning apparatus 100 generates the learning model by using a deep tensor that implements a Deep Learning (DL) process on data having a graph structure, while using, as supervised data, worker attendance record data of one or more workers who were in poor health and took an administrative leave of absence (labeled as “a medical treatment: Yes”) and worker attendance record data of one or more workers who are in normal health and have not taken an administrative leave of absence (labeled as “a medical treatment: No”). After that, by using the learning model to which the results of the learning process are applied, accurate estimation of events (labels) is realized for data having a new graph structure.
For example, the learning apparatus 100 receives worker attendance record data including a plurality of items and including a plurality of records corresponding to a calendar. From the input worker attendance record data, the learning apparatus 100 generates tensor data by creating a tensor while using calendar information and each of the plurality of items as mutually-different dimensions. Further, with respect to a learning model that performs a tensor decomposition by using the tensor data as an input and that further inputs a result of the tensor decomposition to a neural network, the learning apparatus 100 performs a deep learning process on the neural network and learns a method of the tensor decomposition. In this manner, the learning apparatus 100 generates the learning model that classifies the tensor data of the worker attendance record data into the categories of “a medical treatment: Yes” and “a medical treatment: No”.
After that, the learning apparatus 100 generates tensor data by similarly creating a tensor from the worker attendance record data of a worker subject to the assessment and inputs the generated tensor data to the learning model resulting from the learning process. Further, the learning apparatus 100 outputs a value indicating a prediction result as to whether the worker is classified as “a medical treatment: Yes” or “a medical treatment: No”.
Next, the deep tensor will be explained. The deep tensor is a deep learning scheme that uses a tensor (graph information) as an input. The deep tensor is designed to learn a neural network and to automatically extract a partial graph structure that will contribute to the assessment. The extracting process is realized by learning the neural network and learning parameters of the tensor decomposition performed on input tensor data.
The process of extracting the partial graph structure described above is realized with a mathematical calculation called a tensor decomposition. The tensor decomposition is a calculation to approximate an input n-th order tensor by using a product of tensors of n-th or lower order. For example, the input n-th order tensor is approximated by using a product of one n-th order tensor (called a core tensor) and n tensors of lower order (usually second-order tensors, i.e., matrices, when n>2 is true). This decomposition is not unique, and it is possible to arrange the core tensor to include an arbitrary partial graph structure of the graph structure expressed by the input data.
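As a concrete illustration, the following is a minimal sketch of this type of decomposition using the open-source tensorly library, which implements the Tucker decomposition (one core tensor multiplied by one factor matrix per mode). The tensor shape (4, 31, 3, 2), the random contents, and the chosen ranks are illustrative assumptions, not values prescribed by the present embodiment.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# A fourth-order binary tensor, e.g., months x dates x attendance x business trip.
X = tl.tensor(np.random.randint(0, 2, size=(4, 31, 3, 2)).astype(float))

# Tucker decomposition: approximate X by a small core tensor and one
# factor matrix per mode (four matrices here, because X is fourth-order).
core, factors = tucker(X, rank=[2, 4, 2, 2])

print(core.shape)                  # (2, 4, 2, 2) -- the core tensor
print([f.shape for f in factors])  # [(4, 2), (31, 4), (3, 2), (2, 2)]

# Reconstruct the approximation from the core and the factors.
X_approx = tl.tucker_to_tensor((core, factors))
print(float(tl.norm(X - X_approx)))  # approximation error
```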
Next, a learning process using a deep tensor will be explained.
In this situation, the learning apparatus 100 performs a learning process on a prediction model by using an extended error backpropagation method obtained by extending the error backpropagation method. In other words, the learning apparatus 100 corrects various types of parameters of the NN so as to minimize the classification error, by propagating the classification error to lower layers with respect to an input layer, an intermediate layer, and an output layer of the NN. Further, the learning apparatus 100 propagates the classification error all the way to the target core tensor and corrects the target core tensor so that it approximates a partial structure of the graph that will contribute to the predicting process, i.e., either a characteristic pattern indicating characteristics of workers who took an administrative leave of absence or a characteristic pattern indicating characteristics of workers who are in normal health. With this arrangement, the optimized target core tensor comes to contain an extracted partial pattern that will contribute to the predicting process.
During the predicting process, it is possible to obtain a prediction result by performing a tensor decomposition that converts the input tensor into a core tensor (a partial pattern of the input tensor) and inputting the core tensor to the neural network. During the tensor decomposition, the core tensor is converted so as to be similar to the target core tensor. In other words, a core tensor having the partial pattern that will contribute to the predicting process is extracted.
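To make this flow concrete, the following is a minimal, hypothetical PyTorch sketch of a deep-tensor-style model: the factor matrices that define the decomposition and the weights of the neural network are all learnable, so that backpropagating the classification error updates both the network and the way the core tensor is extracted. All names, sizes, and the simplified Tucker-style contraction are assumptions for illustration, not the actual implementation of the embodiment.

```python
import torch
import torch.nn as nn

class DeepTensorSketch(nn.Module):
    """Tucker-style contraction with learnable factors, followed by an MLP."""

    def __init__(self, shape=(4, 31, 3, 2), ranks=(2, 4, 2, 2)):
        super().__init__()
        # Learnable factor matrices: backpropagation adjusts these, which
        # corresponds to "learning the method of the tensor decomposition".
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(s, r) * 0.1) for s, r in zip(shape, ranks)]
        )
        core_size = 1
        for r in ranks:
            core_size *= r
        self.mlp = nn.Sequential(
            nn.Linear(core_size, 16), nn.ReLU(), nn.Linear(16, 1)
        )

    def forward(self, x):  # x: (batch, 4, 31, 3, 2)
        # Contract each mode of the input with its factor matrix to obtain
        # a small core tensor (the partial pattern fed to the network).
        core = torch.einsum("nijkl,ia,jb,kc,ld->nabcd", x, *self.factors)
        return self.mlp(core.flatten(1)).squeeze(1)

model = DeepTensorSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

# Dummy batch: 8 random attendance tensors, labels 1 = "a medical treatment: Yes".
x = torch.randint(0, 2, (8, 4, 31, 3, 2)).float()
y = torch.randint(0, 2, (8,)).float()

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)  # classification error
    loss.backward()              # propagates down to the factor matrices
    optimizer.step()
```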
A Functional Configuration
The communicating unit 101 is a processing unit that controls communication with other apparatuses and may be, for example, a communication interface. For instance, the communicating unit 101 receives an instruction to start a process, the worker attendance record data, and the like, from a terminal device of an administrator. Further, the communicating unit 101 outputs a result of a learning process, a result of a predicting process performed after the learning process, and the like, to the terminal device of the administrator.
The storage unit 102 is an example of a storage device storing therein computer programs (hereinafter “programs”) and data and may be configured by using a memory or a hard disk, for example. The storage unit 102 stores therein a worker attendance record data database (DB) 103, a tensor DB 104, a learned result DB 105, and a prediction target DB 106.
The worker attendance record data DB 103 is a database storing therein the worker attendance record data that is related to attendance of workers and the like and has been input thereto by a user or the like. The worker attendance record data DB 103 is an example of the time-series data. The worker attendance record data stored in the present example includes a plurality of items and includes a plurality of records corresponding to a calendar. Further, the worker attendance record data is obtained by expressing worker attendance records used in corporations in the form of data and may be obtained from any of various types of publicly-known worker attendance management systems or the like.
For the item “attendance/absence”, for example, values indicating “attended”, “a medical treatment (an administrative leave of absence)”, “a saved-up vacation”, “a paid vacation”, and the like may be set. For the item “a business trip: Yes/No”, values indicating whether or not the worker had a business trip may be set, so as to store therein one of the values corresponding to either “a business trip: Yes” or “a business trip: No”. In this situation, these values may be distinguished from one another by using numerical values. For example, it is possible to indicate the distinction as follows: “attended=0”; “a medical treatment=1”; “a saved-up vacation=2”; and “a paid vacation=3”. Further, as for the units of the records in the worker attendance record data corresponding to the calendar, the records do not necessarily have to be in units of days and may be in units of weeks or months. Further, to accommodate situations where workers are allowed to take vacations in units of hours, it is also acceptable to set a value “an hourly vacation=4”. Further, the items serve as an example of the attributes.
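As a minimal illustration of such an encoding, the following sketch maps the item values to numerical codes; the dictionaries and the record layout are hypothetical, not a format mandated by the worker attendance management systems mentioned above.

```python
# Hypothetical numerical encoding of the "attendance/absence" item.
ATTENDANCE_CODES = {
    "attended": 0,
    "medical treatment": 1,   # administrative leave of absence
    "saved-up vacation": 2,
    "paid vacation": 3,
    "hourly vacation": 4,     # for vacations taken in units of hours
}

# "a business trip: Yes/No" can likewise be encoded as 1/0.
BUSINESS_TRIP_CODES = {"yes": 1, "no": 0}

record = {"date": "2018-04-02", "attendance": "attended", "business_trip": "no"}
encoded = (ATTENDANCE_CODES[record["attendance"]],
           BUSINESS_TRIP_CODES[record["business_trip"]])
print(encoded)  # (0, 0)
```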
The worker attendance record data stored in the present example is learning data, and a supervised label is appended thereto.
The tensor DB 104 is a database storing therein the tensors (the tensor data) generated from the worker attendance record data of the workers. The tensor DB 104 stores therein training data in which the tensors and the labels are kept in correspondence with each other. For example, as sets each made up of “tensor data and a label”, the tensor DB 104 stores therein “tensor data 1, the label ‘a medical treatment: Yes’”, “tensor data 2, the label ‘a medical treatment: No’”, and so on.
The items of the records in the learning data and the settings of the labels for the tensor data described above are merely examples. Besides the values and the labels indicating “a medical treatment: Yes” and “a medical treatment: No”, it is possible to use any of various values and labels capable of distinguishing whether workers in poor health are present or not, such as “a worker in poor health” and “a worker in normal health”; or “an administrative leave of absence: Yes” and “an administrative leave of absence: No”; or the like.
The learned result DB 105 is a database storing therein results of the learning process. For example, the learned result DB 105 stores therein assessment results (classification results) from the learning data obtained by the controlling unit 110, as well as the various types of parameters of the NN and the various types of parameters of the deep tensor that were learned through the machine learning process and the deep learning process.
The prediction target DB 106 is a database storing therein the worker attendance record data on which a prediction is to be made as to whether an administrative leave of absence will occur or not, by using the prediction model that was learned. For example, the prediction target DB 106 stores therein the worker attendance record data on which the prediction is to be made, and/or the tensor data generated from the worker attendance record data on which the prediction is to be made.
The controlling unit 110 is a processing unit that controls processes performed in the entirety of the learning apparatus 100 and may be, for example, configured by using a processor or the like. The controlling unit 110 includes a tensor generating unit 111, a learning unit 112, and a predicting unit 113. The tensor generating unit 111, the learning unit 112, and the predicting unit 113 are examples of one or more electronic circuits included in the processor or the like; or examples of processes executed by the processor or the like.
The tensor generating unit 111 is a processing unit that generates tensor data by creating tensors from the pieces of worker attendance record data.
More specifically, the tensor generating unit 111 generates the tensors from the worker attendance record data by using, as the mutually-different dimensions, items that are expected to characterize a tendency of taking an administrative leave of absence (e.g., frequent business trips, long overtime hours, sudden consecutive absences, absences without notice, and a combination of any of these). For example, the tensor generating unit 111 generates a fourth-order tensor whose four dimensions represent the months, the dates, “attendance/absence”, and “a business trip: Yes/No”. When data from four months is used, the number of elements for the months is “4”. The number of elements for the dates is “31”, because the maximum number of days in a month is 31. The number of elements for “attendance/absence” is “3”, because the possible options are “attended”, “a vacation”, and “a non-business day”. The number of elements for “a business trip: Yes/No” is “2”, because the options are “a business trip: Yes” and “a business trip: No”. Accordingly, the tensor generated from the worker attendance record data is a “4×31×3×2” tensor. The value of each element that corresponds to the “attendance/absence” and “a business trip: Yes/No” states for each of the months and the dates in the worker attendance record data is “1”, whereas the value of each element that does not correspond thereto is “0”. In this situation, it is possible to arbitrarily select the items used as the dimensions of the tensor, and it is also possible to determine the items from past examples or the like.
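The following is a minimal sketch, under the assumptions of the “4×31×3×2” example above, of how such a binary tensor could be populated from daily records; the record format and the helper names are hypothetical.

```python
import numpy as np

# Axis categories from the example above.
ATTENDANCE = {"attended": 0, "vacation": 1, "non-business day": 2}
TRIP = {"no": 0, "yes": 1}

def build_attendance_tensor(records, num_months=4):
    """records: iterable of (month_index, day_index, attendance, trip) tuples."""
    tensor = np.zeros((num_months, 31, len(ATTENDANCE), len(TRIP)))
    for month, day, attendance, trip in records:
        # Set the element matching this day's state to 1; all others stay 0.
        tensor[month, day, ATTENDANCE[attendance], TRIP[trip]] = 1.0
    return tensor

# Example: the first two days of the first month.
records = [(0, 0, "attended", "no"), (0, 1, "vacation", "no")]
print(build_attendance_tensor(records).shape)  # (4, 31, 3, 2)
```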
In the present embodiment, the tensors explained above are expressed in a simplified form in the drawings.
The learning unit 112 is a processing unit that performs the learning process on the learning model by using the deep tensor scheme, while using the pieces of tensor data and the labels generated from the worker attendance record data as an input. More specifically, the learning unit 112 performs the learning process according to the deep tensor scheme explained above and stores the learned parameters into the learned result DB 105.
The predicting unit 113 is a processing unit that predicts a label of each of the pieces of data subject to the assessment, by using the results of the learning process. More specifically, the predicting unit 113 reads the various types of parameters from the learned result DB 105 and constructs a deep tensor including the neural network or the like in which the various types of parameters are set. Further, the predicting unit 113 reads a piece of worker attendance record data on which a prediction is to be made from the prediction target DB 106, creates a tensor from the read data, and inputs the created tensor into the deep tensor. Subsequently, the predicting unit 113 outputs a result of the predicting process indicating “a medical treatment: Yes” or “a medical treatment: No”. Further, the predicting unit 113 displays the result of the predicting process on a display device or transmits the result of the predicting process to the administrator terminal device.
A Flow in the Process
Next, a flow in the learning process will be explained.
Subsequently, when the worker corresponding to the piece of worker attendance record data is a worker who took an administrative leave of absence (step S104: Yes), the tensor generating unit 111 appends the label “a medical treatment: Yes” (step S105). On the contrary, when the worker corresponding to the piece of worker attendance record data is a worker who did not take an administrative leave of absence (step S104: No), the tensor generating unit 111 appends the label “a medical treatment: No” (step S106).
After that, when the process of creating tensors from the pieces of worker attendance record data has not been finished, and there is at least one piece of worker attendance record data that has not been processed (step S107: No), the processes at step S102 and thereafter are repeatedly performed. On the contrary, when the process of creating tensors from the pieces of worker attendance record data has been finished (step S107: Yes), the learning unit 112 performs the learning process by using the pieces of tensor data resulting from the tensor creating process (step S108).
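As a compact illustration of the flow from step S102 through step S108, the following sketch labels each tensor and hands the result to a learning routine; the record fields and the helper functions are hypothetical stand-ins (build_attendance_tensor is the sketch shown earlier).

```python
def prepare_training_data(worker_records, build_attendance_tensor):
    """worker_records: iterable of dicts with 'days' and 'took_leave' keys."""
    training_data = []
    for rec in worker_records:                 # steps S102-S103: tensorize
        tensor = build_attendance_tensor(rec["days"])
        # Steps S104-S106: append the supervised label.
        label = 1 if rec["took_leave"] else 0  # 1 = "a medical treatment: Yes"
        training_data.append((tensor, label))
    return training_data                       # step S108 learns from these

# Usage (with the build_attendance_tensor sketch defined earlier):
# data = prepare_training_data(workers, build_attendance_tensor)
# learn(data)  # hypothetical deep tensor learning routine
```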
Advantageous Effects
As explained above, the learning apparatus 100 is able to perform the machine learning process that takes into consideration the relationships among the plurality of items included in the data subject to the learning process. For example, the learning apparatus 100 is capable of performing the learning process and the predicting process without losing the characteristics of the calendar. It is therefore possible to improve the level of precision of the predicting process.
Accordingly, a partial pattern that will affect the predicting process is extracted into the core tensor as described above and is reflected in the prediction result.
The exemplary embodiments of the present disclosure have thus been explained. However, it is possible to carry out the present disclosure in various different modes other than those in the embodiments described above.
The Learning Process
The learning process described above may be performed as many times as arbitrarily determined. For example, it is acceptable to perform the learning process by using all the pieces of training data, and it is also acceptable to perform the learning process only a predetermined number of times. Further, as for the method for calculating the classification error, it is acceptable to use a publicly-known calculation method such as a least squares method, and it is also acceptable to use a commonly-used calculation method applied to neural networks. Further, the learning of the learning model corresponds to, for example, learning weights and the like of the neural network by inputting the tensors to the neural network, so that events (e.g., “a medical treatment: Yes” and “a medical treatment: No”) can be classified by using the learning data.
Further, in the above explanation, the worker attendance record data for six months is used as an example of the data used for the predicting process. However, possible embodiments are not limited to this example. It is acceptable to arbitrarily change the time period to four months or the like. Further, the example is explained in which, with respect to the worker attendance record data for the six months, the label is appended depending on whether or not the worker took an administrative leave of absence within the following three months. However, possible embodiments are not limited to this example. It is acceptable to arbitrarily change the time period to within the following two months or the like. Further, the tensor data does not necessarily have to be four-dimensional. It is possible to generate tensor data having fewer than four dimensions or having five or more dimensions. Further, besides the worker attendance record data, it is possible to use data in any format, as long as the data is attendance/absence data exhibiting the status of workers arriving at work, leaving work, taking vacations, and the like. Further, it is also possible to use pieces of data starting at mutually-different times, such as a piece of worker attendance record data from January to June and another piece of worker attendance record data from February to July.
The Neural Network
In the present embodiment, it is possible to use any of various types of neural networks such as a Recurrent Neural Network (RNN) or a Convolutional Neural Network (CNN). Further, as for the method used in the learning process, it is possible to use any of various publicly-known methods, besides the error backpropagation method. Incidentally, neural networks have a multi-layer structure including, for example, an input layer, an intermediate layer (a hidden layer), and an output layer, while each of the layers has a structure in which a plurality of nodes are connected by edges. Each of the layers has a function called an “activation function”, while each of the edges has a “weight”. The value of each of the nodes is calculated from the values of nodes in the preceding layer, the values of the weights (weight coefficients) of the connecting edges, and the activation function of the layer. As for the calculation method, it is possible to use any of various publicly-known methods.
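As a minimal numerical illustration of the node-value calculation just described (the layer sizes and weights are arbitrary assumptions):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # a common activation function

# A node's value is computed from the preceding layer's node values,
# the weights of the connecting edges, and the layer's activation function.
x = np.array([1.0, 0.0, 1.0])     # input layer (3 nodes)
W1 = np.random.randn(3, 4) * 0.1  # weights: input -> hidden (4 nodes)
W2 = np.random.randn(4, 1) * 0.1  # weights: hidden -> output (1 node)

hidden = relu(x @ W1)             # intermediate (hidden) layer values
output = hidden @ W2              # output layer value
print(output)
```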
Further, a learning process in a neural network is a process of correcting parameters (i.e., weights and biases) so that an output layer has a correct value. According to the error backpropagation method, a loss function indicating how much the value of the output layer is different from that in the correct state (a desirable state) is defined with respect to a neural network, so as to update the weights and biases in such a manner that the loss function is minimized, while using a steepest descent method or the like.
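And a minimal sketch of the parameter correction described above: define a loss measuring how far the output is from the correct value, and update a weight in the direction of steepest descent. The one-parameter model and the learning rate are illustrative assumptions.

```python
# One-parameter model y = w * x; squared-error loss L(w) = 0.5 * (w*x - t)**2.
x, t = 2.0, 6.0   # input and correct (desirable) output
w = 0.0           # initial weight
lr = 0.1          # learning rate

for step in range(20):
    y = w * x            # forward pass
    grad = (y - t) * x   # dL/dw, as computed by the error backpropagation method
    w -= lr * grad       # steepest-descent update that minimizes the loss
print(w)  # approaches 3.0, the weight for which the loss is minimized
```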
A System
Unless noted otherwise, it is acceptable to arbitrarily modify any of the processing procedures, the controlling procedures, specific names, and various information including various types of data and parameters that are presented in the above text and the drawings. Further, the specific examples, the distributions, and the numerical values explained in the embodiments are merely examples and may arbitrarily be modified.
The constituent elements of the apparatuses and the devices illustrated in the drawings are based on functional concepts. Thus, there is no need to physically configure the constituent elements as indicated in the drawings. In other words, the specific modes of distribution and integration of the apparatuses and the devices are not limited to those illustrated in the drawings. It is acceptable to functionally or physically distribute or integrate all or a part of the apparatuses and the devices in any arbitrary units, depending on various loads and the status of use. Further, all or an arbitrary part of the processing functions performed by the apparatuses and the devices may be realized by a CPU and a program analyzed and executed by the CPU or may be realized as hardware using wired logic.
Hardware
The communication device 100a is realized by using a network interface card or the like and communicates with another server. The HDD 100b stores therein a program and databases that bring the functions described above into operation.
The processor 100d brings into operation the processes that implement the functions explained above, by reading the program from the HDD 100b or the like and executing the read program.
In this manner, the learning apparatus 100 operates as an information processing apparatus that implements the learning method by reading and executing the program. Further, the learning apparatus 100 is also capable of realizing the same functions as those described in the above embodiments, by reading the program from a recording medium while using a medium reading device and executing the read program. In this situation, the program referred to in the present alternative embodiment does not necessarily have to be executed by the learning apparatus 100. For example, the present disclosure is similarly applicable to situations where the program is executed by another computer or a server or where the program is executed by collaboration of one or more computers and/or one or more servers.
It is possible to distribute the program via a network such as the Internet. Further, the program may be recorded on a computer-readable recording medium such as a hard disk, a flexible disk (FD), a Compact Disk Read-Only Memory (CD-ROM), a Magneto-Optical (MO) disk, a Digital Versatile Disk (DVD), or the like so as to be executed as being read from the recording medium by a computer.
According to one aspect of the embodiments, it is possible to implement the machine learning process while taking into consideration the relationships among the plurality of attributes included in the data subject to the learning process.
All examples and conditional language recited herein are intended for pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.