This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2018-081898, filed on Apr. 20, 2018, the entire contents of which are incorporated herein by reference.
The embodiment discussed herein is related to a machine learning method, a machine learning device, and a computer-readable recording medium.
Predicting mentally unwell conditions of employees a few months in advance based on their attendance record data, and taking actions such as counseling at an early stage to prevent them from taking a suspension of work (sick leave), has been performed. Generally, dedicated staff members perform a visual check to find an employee whose work conditions exhibit feature patterns such as frequent business trips, long overtime, repeated sudden absences, absence without notice, or a combination of these patterns. It is difficult to define these feature patterns clearly, because such dedicated staff members may each have their own standards. In recent years, machine learning using a decision tree, random forest, SVM (Support Vector Machine), or the like has been performed to learn feature patterns specific to mentally unwell conditions and to automatically provide a prediction that previously relied on the judgment of the dedicated staff members. Examples of related art are described in Japanese Laid-open Patent Publication No. 2007-156721 and Japanese Laid-open Patent Publication No. 2006-163521.
Machine learning, however, requires at least a certain number of pieces of learning data. Unwell persons account for only about 2 to 3% of an organization, and thus it is difficult to collect a satisfactory number of pieces of data for learning. Accordingly, it is difficult to increase the accuracy of learning.
For general machine learning, inputting a feature vector with a fixed length is a prerequisite. One simple vector representation method for an attendance record is to arrange the daily attendance statuses in chronological order. Learning data is generated by vectorizing the daily statuses in the attendance record data in the order of the arrows in the attendance record data.
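To make this concrete, the following is a minimal sketch of such a chronological vectorization; the record layout and status codes are hypothetical illustrations, not those of any particular attendance management system.

```python
# Minimal sketch: flatten daily attendance records into one fixed-length
# feature vector by concatenating the daily statuses in chronological order.
# The record layout and status codes are hypothetical examples.

records = [
    # (date, attendance/absence, with/without business trip)
    ("2018-04-02", 0, 1),   # 0 = attendance, business trip taken
    ("2018-04-03", 0, 0),
    ("2018-04-04", 1, 0),   # 1 = sick leave
]

# Arrange the daily statuses in chronological order into a single vector.
feature_vector = []
for _, attendance, trip in sorted(records):
    feature_vector.extend([attendance, trip])

print(feature_vector)  # [0, 1, 0, 0, 1, 0] -- positions carry no attribute labels
```

As the resulting vector shows, each position holds only a bare number; which position means attendance/absence and which means business trip is not part of the data itself, which is the problem described next.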
In this manner, the data format of the learning data simply provides vector information, but does not provide attribute information on each element of the vector. Accordingly, it is not possible to distinguish which value represents attendance/absence information and which value represents business trip information. Generating a plurality of pieces of learning data from one piece of attendance record data does not always increase the number of feature patterns of an unwell person, because the relations among the attributes are unclear. Conversely, a plurality of feature patterns may be attributed to a single unwell person. This causes overfitting and degrades the accuracy of learning.
According to an aspect of an embodiment, a non-transitory computer-readable recording medium stores therein a machine learning program that causes a computer to execute a process including: generating pieces of learning data based on time series data including a plurality of items and including a plurality of records corresponding to a calendar, each of the pieces of learning data being learning data of a certain period, the certain period being composed of a plurality of unit periods, start times of the certain period of each of the pieces of learning data being different from each other for the unit period, in which each of the pieces of the learning data and a label corresponding to the start time are paired; generating, based on the generated learning data, tensor data in which a tensor is created with calendar information and the plurality of items having different dimensions; and performing deep learning of a neural network and learning of a method of tensor decomposition with respect to a learning model in which the tensor data is subjected to the tensor decomposition as input tensor data to be inputted to the neural network.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Preferred embodiments will be explained with reference to the accompanying drawings. These embodiments are, however, not intended to limit the scope of the present invention in any way. Moreover, it is possible to combine the embodiments with one another as appropriate within a scope without inconsistency.
To be specific, the learning device 100 generates a learning model using Deep Tensor (registered trademark), which performs deep learning (DL) on graph-structured data, with attendance record data (label: with sick leave) of an unwell person who took sick leave (suspension of work) and attendance record data (label: without sick leave) of a person who has not taken sick leave (suspension of work) as supervised data. Then, the learning device 100 uses the learning model, to which the learning findings are applied, to infer an accurate event (label) for new graph-structured data.
For example, the learning device 100 generates a plurality of pieces of learning data (supervised data) from time series data having a plurality of items and having a plurality of records corresponding to a calendar, each piece of learning data being a certain period of data, the certain period being composed of a plurality of unit periods, start times of the certain period of data being different from each other for the unit period, in which each piece of the certain period of data and a label corresponding to the start time thereof are paired. Then, the learning device 100 generates, from the generated learning data, tensor data in which a tensor is created with calendar information and the items having different dimensions. After that, the learning device 100 performs deep learning of a neural network and learning of a method of tensor decomposition with respect to a learning model in which the tensor data is subjected to the tensor decomposition as input tensor data, so as to be input to the neural network. In this manner, the learning device 100 generates a plurality of pieces of learning data from the attendance record data and generates a learning model to provide classification into “take sick leave” and “not take sick leave” based on the tensor data of each piece of learning data.
After that, the learning device 100 generates tensor data by similarly creating a tensor from the attendance record data of an employee who is to be determined, and inputs the tensor data to the learned learning model. The learning device 100 outputs an output value representing a prediction result of whether the target employee "takes sick leave" or "does not take sick leave."
The following explains Deep Tensor. Deep Tensor is a deep learning technique that takes a tensor (graph information) as input. Deep Tensor automatically extracts a partial graph structure that contributes to a determination, together with learning a neural network. This extraction processing is provided by learning parameters for tensor decomposition of the input tensor data together with learning the neural network.
Next, the following explains a graph structure with reference to
Such processing to extract a partial graph structure is implemented by a mathematical operation referred to as tensor decomposition. Tensor decomposition is an operation to approximate an input n-th order tensor with a product of tensors of the n-th or lower order. For example, the input n-th order tensor is approximated with a product of one n-th order tensor (referred to as a core tensor) and n tensors of an order lower than n (when n > 2, normally second order tensors, that is, matrices, are used). This decomposition is not unique, and any desired partial graph structure in the graph structure represented by the input data can be included in the core tensor.
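As one concrete illustration of this kind of operation (an assumption for illustration only; Deep Tensor learns its own decomposition parameters rather than applying a fixed decomposition), the following sketch uses the Tucker decomposition from the open-source TensorLy library to approximate a third order tensor with a core tensor and three factor matrices.

```python
# Minimal sketch of tensor decomposition using the TensorLy library.
# Tucker decomposition approximates an n-th order tensor with a (smaller)
# core tensor multiplied by n factor matrices. The ranks chosen here are
# illustrative only.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

X = tl.tensor(np.random.rand(4, 31, 3))            # a third order input tensor
core, factors = tucker(X, rank=[2, 8, 2])          # core tensor + 3 factor matrices

X_approx = tl.tucker_to_tensor((core, factors))    # reconstruct the approximation
print(core.shape)                                  # (2, 8, 2)
print(float(tl.norm(X - X_approx) / tl.norm(X)))   # relative approximation error
```

The smaller core tensor plays the role described above: it concentrates the structure of the input that the decomposition chooses to keep, and different choices of decomposition place different partial structures in it.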
The following explains learning of Deep Tensor.
The learning device 100 executes learning of a prediction model using an extended error back-propagation method obtained by extending the error back-propagation method. That is, the learning device 100 corrects various kinds of parameters of the NN so that the classification error becomes smaller as the classification error is propagated toward lower layers through the input layer, the intermediate layer, and the output layer included in the NN. Furthermore, the learning device 100 causes the classification error to be propagated to the target core tensor to correct the target core tensor so as to be closer to a partial graph structure that contributes to the prediction, that is, a feature pattern of a person who takes a suspension of work or a feature pattern of a person who has not taken sick leave. As this correction proceeds, the optimized target core tensor comes to hold the partial patterns that contribute to the prediction.
When a prediction is performed, an input tensor is converted to a core tensor (a partial pattern of the input tensor) by tensor decomposition, and the core tensor is input to the neural network, whereby a prediction result is obtained. The tensor decomposition converts the core tensor so as to resemble the target core tensor. That is, a core tensor having a partial pattern that contributes to the prediction is extracted.
The communicating unit 101 is a processing unit that controls communication with other devices, and is provided by a communication interface, for example. For example, the communicating unit 101 receives a process start instruction, attendance record data, and the like from the terminal of an administrator. The communicating unit 101 also outputs learning results, prediction results of prediction target data, and the like to the administrator terminal.
The storage unit 102 is an example of a storage device that stores therein a computer program and data, such as a memory or a hard disk. This storage unit 102 stores therein an attendance record data DB 103, a learning data DB 104, a tensor DB 105, a learning result DB 106, and a prediction target DB 107.
The attendance record data DB 103 is a database that stores therein attendance record data concerning the attendance of employees or the like, input by a user or the like, and is an example of time series data. The attendance record data stored here is composed of a plurality of items and contains a plurality of records corresponding to the calendar. Furthermore, the attendance record data is organized based on the attendance records used in the respective companies, and can be obtained from various kinds of well-known attendance management systems or the like.
It is noted that, for the classification item of attendance/absence, a value corresponding to any one of items such as coming to the office, sick leave (suspension of work), accumulated holiday, and paid holiday is set. For the item of with/without business trip, a value indicating whether a business trip was taken is set. These values can be distinguished with numbers or the like; for example, attendance=0, sick leave=1, accumulated holiday=2, and paid holiday=3. It is noted that the record unit corresponding to the calendar of the attendance record data may be not only a daily unit but also a weekly or monthly unit. Moreover, for a case where leave can be taken in an hourly unit, a value of hourly leave=4 may be set.
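As a small illustrative sketch, the numeric encoding described above might be held in mappings such as the following; the key names are hypothetical.

```python
# Example numeric encoding of the classification items, using the codes
# given in the text (the hourly-leave code is the optional extension
# mentioned above). Key names are hypothetical.
ATTENDANCE_CODES = {
    "attendance": 0,
    "sick_leave": 1,
    "accumulated_holiday": 2,
    "paid_holiday": 3,
    "hourly_leave": 4,   # optional, for organizations allowing hourly leave
}
BUSINESS_TRIP_CODES = {"without_trip": 0, "with_trip": 1}
```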
The learning data DB 104 is a database that stores therein the learning data from which tensors are created. Specifically, the learning data DB 104 stores therein a plurality of pieces of learning data in which a certain period of data having a different start time in the attendance record data and a label corresponding to the start time are paired. For example, the learning data DB 104 stores therein "learning data a, label (without sick leave)," "learning data b, label (with sick leave)," and the like, as "data, label." It is noted that the learning data will be explained later.
The tensor DB 105 is a database that stores therein a tensor (tensor data) generated from each piece of the learning data. This tensor DB 105 stores therein training data in which tensors and labels are associated with each other. For example, the tensor DB 105 stores therein “tensor data a, label (with sick leave),” “tensor data b, label (without sick leave),” and the like, as “tensor data, label.” It is noted that the label that the tensor DB 105 stores therein is a label that is associated with the learning data from which the tensor is generated.
It is noted that the settings of the record items and tensor data labels in the above-described learning data are merely examples, and are not limited to values and labels such as "with sick leave" and "without sick leave." It is also possible to use various types of values and labels, such as "a person who takes a suspension of work" and "a person who has not taken a suspension of work," or "with a suspension of work" and "without a suspension of work," as long as they can distinguish the existence of an unwell person.
The learning result DB 106 is a database that stores therein learning results. For example, the learning result DB 106 stores therein a determination result (classification result) of the learning data by the control unit 110, various parameters of the NN and various parameters of Deep Tensor learned through machine learning and deep learning, and the like.
The prediction target DB 107 is a database that stores therein attendance record data of a target for which the existence of sick leave (suspension of work) is predicted using a learned prediction model. For example, the prediction target DB 107 stores therein attendance record data of a prediction target, tensor data generated from the attendance record data of the prediction target, and the like.
The control unit 110 is a processing unit that manages the whole processing of the learning device 100, and is a processor, for example. This control unit 110 includes a learning data generator 111, a tensor generator 112, a learning unit 113, and a prediction unit 114. It is noted that the learning data generator 111, the tensor generator 112, the learning unit 113, and the prediction unit 114 are examples of processes executed by an electronic circuit included in the processor, by the processor itself, or the like.
The learning data generator 111 is a processing unit that generates a plurality of pieces of learning data from the pieces of attendance record data stored in the attendance record data DB 103, each piece of learning data being a certain period of data, the certain period being composed of a plurality of unit periods, the start times of the certain period of data being different from each other by the unit period, in which each piece of the certain period of data and a label corresponding to the start time thereof are paired. Specifically, the learning data generator 111 samples data for a specified period from the attendance record data of one person, with overlapping allowed. For example, the learning data generator 111 extracts, from each piece of attendance record data, a plurality of pieces of data having different beginnings of periods (start times), and sets a label "with sick leave" when a sick leave period (suspension of work period) exists within three months after the end time of each piece of data, or a label "without sick leave" when a sick leave period (suspension of work period) does not exist within three months after the end time thereof.
Subsequently, the learning data generator 111 extracts data 1b for six months from May to October by shifting the start time by 30 days (one month) from April. Then, the learning data generator 111 determines the label as “with sick leave” because “sick leave” occurred in January out of November, December, and January, which are within three months from October. As a result, the learning data generator 111 stores “data 1b, label (with sick leave)” in the learning data DB 104.
Next, the learning data generator 111 extracts data 1c for six months from June to November by shifting the start time by 30 days (one month) from May. Then, the learning data generator 111 determines the label as “with sick leave” because “sick leave” occurred in January out of December, January, and February, which are within three months from November. As a result, the learning data generator 111 stores “data 1c, label (with sick leave)” in the learning data DB 104.
Finally, the learning data generator 111 extracts data 1d for six months from July to December by shifting the start time by 30 days (one month) from June. Then, the learning data generator 111 determines the label as "with sick leave" because "sick leave" occurred in January and March out of January, February, and March, which are within three months from December. As a result, the learning data generator 111 stores "data 1d, label (with sick leave)" in the learning data DB 104.
In this manner, the learning data generator 111 is capable of generating a maximum of four samples of learning data from a one-year attendance record for one person, as in the sketch below. It is noted that the learning data generator 111 is capable of generating a maximum of 12 pieces of learning data from the attendance record data of one person when sampling is performed with attendance record data for six months as one sample by shifting the start time by ten days.
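The following is a minimal sketch of this sampling procedure, assuming one chronological status entry per day and using the example parameters above (a six-month window of about 180 days, a 30-day shift, and a three-month look-ahead of about 90 days); the function and status names are hypothetical.

```python
# Minimal sketch of the sliding-window sampling described above.
# statuses: chronological list of daily status strings for one person;
# "sick_leave" marks a suspension-of-work day. Window length, shift, and
# look-ahead are the example values from the text.
WINDOW = 180      # six months of daily records
SHIFT = 30        # shift the start time by 30 days (one month)
LOOKAHEAD = 90    # label depends on sick leave within three months after the end

def generate_learning_data(statuses):
    """Return (window, label) pairs sampled with overlapping allowed."""
    samples = []
    start = 0
    # Only sample windows whose full three-month look-ahead fits in the record.
    while start + WINDOW + LOOKAHEAD <= len(statuses):
        window = statuses[start:start + WINDOW]
        future = statuses[start + WINDOW:start + WINDOW + LOOKAHEAD]
        label = "with sick leave" if "sick_leave" in future else "without sick leave"
        samples.append((window, label))
        start += SHIFT
    return samples
```

With a 12-month record, the loop yields the four windows described above (starting in April, May, June, and July), since only a three-month range of start times leaves room for both the six-month window and the three-month look-ahead.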
The tensor generator 112 is a processing unit that generates tensor data by creating a tensor from each piece of learning data. The tensor generator 112 creates a tensor with calendar information and each of the items of "month, date, attendance/absence, with/without business trip, attendance time, and leave time" included in each piece of attendance record data as a dimension. The tensor generator 112 stores the created tensor (tensor data), associated with the label attached by the learning data generator 111 to the learning data from which the tensor is created, in the tensor DB 105. Learning is executed by Deep Tensor with the generated tensor data as input. It is noted that Deep Tensor, during learning, extracts a target core tensor that identifies a partial pattern of the learning data having an influence on the prediction, and executes the prediction based on the extracted target core tensor when the prediction is performed.
Specifically, the tensor generator 112 generates a tensor from learning data with items that are assumed to characterize a tendency to take sick leave, such as frequent business trips, long overtime, repeated sudden absences, absence without notice, frequent holiday work, or a combination of any of these items, as dimensions. For example, the tensor generator 112 generates a fourth order tensor of four dimensions using the four elements of month, date, attendance/absence, and with/without business trip. When four months of data are used, the element count for month is "4," the element count for date is "31" based on the fact that the maximum number of days in a month is 31, the element count for attendance/absence is "3" based on the fact that the types of attendance/absence are coming to the office, leave, and holiday, and the element count for with/without business trip is "2" based on the fact that a business trip is either taken or not taken. Thus, a tensor generated from the learning data is a tensor of "4×31×3×2"; the value of an element corresponding to the attendance/absence and with/without business trip statuses on each month and date in the learning data is 1, and the value of an element not corresponding to them is 0. Any desired item is selectable as a dimension of the tensor, or may be determined based on past events.
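The following numpy sketch builds such a "4×31×3×2" tensor from daily records; the index encodings are illustrative assumptions consistent with the element counts given above.

```python
# Minimal sketch: create the fourth order tensor of "4 x 31 x 3 x 2"
# (month x date x attendance/absence x with/without business trip)
# described above. Each daily record sets exactly one element to 1.
import numpy as np

# One record per day: (month_index 0-3, day_index 0-30,
#                      attendance 0=coming to office / 1=leave / 2=holiday,
#                      business_trip 0=without / 1=with) -- illustrative codes.
records = [(0, 0, 0, 1), (0, 1, 0, 0), (0, 2, 1, 0)]

tensor = np.zeros((4, 31, 3, 2))
for month, day, attendance, trip in records:
    tensor[month, day, attendance, trip] = 1  # corresponding element is 1

print(tensor.shape)       # (4, 31, 3, 2)
print(int(tensor.sum()))  # one nonzero element per daily record
```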
In the first embodiment, the above-described tensor is simplified to be described as in
The learning unit 113 is a processing unit that performs deep learning of the neural network and learning of a method of tensor decomposition with respect to a learning model in which the tensor data is subjected to tensor decomposition as input tensor data, so as to be input to the neural network (NN). That is, the learning unit 113 executes learning of the learning model by Deep Tensor with the tensor data generated from each piece of the learning data and the label as input.
Specifically, similarly to the method explained in
The prediction unit 114 is a processing unit that predicts, using the learning results, the label of data that is to be determined. Specifically, the prediction unit 114 reads out the various kinds of parameters from the learning result DB 106, and builds Deep Tensor including the neural network in which the various kinds of parameters are set, and the like. Then, the prediction unit 114 reads out the attendance record data of a prediction target from the prediction target DB 107, creates a tensor therefrom, and inputs the created tensor to Deep Tensor. After that, the prediction unit 114 outputs a prediction result indicating with or without sick leave. The prediction unit 114 then displays the prediction result on a display or transmits it to the administrator terminal. It is noted that the attendance record data of a prediction target may be input as it is, or may be input after being divided into six-month segments.
Processing Flow
The following explains a flow of the learning process.
Then, when "sick leave" is contained within three months after the end time of the sampled data (S104: Yes), the learning data generator 111 attaches a label of "with sick leave" to the sampled data (S105). By contrast, when "sick leave" is not contained within those three months (S104: No), the learning data generator 111 attaches a label of "without sick leave" to the sampled data (S106).
When continuing the sampling (S107: Yes), the learning data generator 111 samples the data corresponding to the next start time (S108), and executes S104 and the subsequent steps. By contrast, when ending the sampling (S107: No), the learning data generator 111 determines whether there is any unprocessed attendance record data (S109).
When there is unprocessed attendance record data (S109: Yes), the learning data generator 111 repeats S102 and the subsequent steps for the next attendance record data. By contrast, when there is no unprocessed attendance record data (S109: No), the tensor generator 112 creates tensors from the pieces of learning data stored in the learning data DB 104 (S110), and the learning unit 113 executes the learning process using the tensors and labels stored in the tensor DB 105 (S111).
For an organization of 1,000 persons containing approximately 30 unwell persons, the number of samples for the unwell persons is no more than 30 when one sample is taken from the attendance record data of each person in the organization. However, as described above, the learning device 100 according to the first embodiment is able to generate a maximum of 120 samples of learning data for the unwell persons by shifting the start time by 30 days. Furthermore, the learning device 100 is able to generate a maximum of 360 samples of learning data for the unwell persons by shifting the start time by ten days.
Thus, the learning device 100 is able to secure a sufficient number of samples for learning, thereby executing learning by Deep Tensor and improving the accuracy of learning. Moreover, in a case where an unwell condition prediction model is newly built, as opposed to using a learned model, for example because different items are handled in the attendance record data, even a small organization is able to build an unwell condition prediction model by applying the method according to the first embodiment.
The following explains an example of increasing the number of pieces of learning data by applying the method according to the first embodiment to general machine learning.
In general machine learning, elements at the same position in the feature vectors are learned as having the same attribute (
By contrast, the following explains an example using the method according to the first embodiment under the same condition.
The learning device 100 according to the first embodiment thus generates a plurality of pieces of learning data by changing the range from which the original data is taken, using Deep Tensor (core tensor). As a result, it is possible to collect the number of pieces of data needed for learning, thereby improving the accuracy of learning.
The following explains simulation results of Deep Tensor and the general machine learning.
For each of Deep Tensor (A), Deep Tensor (B), decision tree (A), and decision tree (B), a comparison is made in terms of accuracy (accuracy rate), precision (precision rate), recall (recall rate), and F-measure, each serving as an index of the accuracy of learning. Deep Tensor (A) provides the results of executing learning by Deep Tensor using 290 samples, without increasing the number of samples. Deep Tensor (B) provides the results of executing learning by Deep Tensor with the number of samples increased to 1,010 by the method of the first embodiment. Decision tree (A) provides the results of executing learning by decision tree using 290 samples, without increasing the number of samples. Decision tree (B) provides the results of executing learning by decision tree with the number of samples increased to 1,010 by the method of the first embodiment.
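For reference, these four indices are the standard classification metrics; a minimal sketch computing them from the counts of true/false positives and negatives (the counts shown are placeholders, not the simulation's values) is as follows.

```python
# Standard definitions of the four indices compared above, computed from the
# counts of true/false positives and negatives of the "with sick leave" class.
# The counts passed in below are placeholders for illustration only.
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of actual positives that are found
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

print(metrics(tp=25, fp=10, fn=5, tn=960))
```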
In the results of
Although the embodiments of the present invention have been explained, the present invention may be implemented in various kinds of different aspects in addition to the above-described embodiments.
The above-described learning process may be executed any desired number of times. For example, the learning process may be executed using all pieces of training data, or may be executed a certain number of times. Furthermore, as a method for calculating the classification error, a known calculation method such as the least squares method may be employed, or a general calculation method used in NNs may be employed. It is noted that a neural network whose weights and the like are learned by inputting tensor data to the neural network so that an event (for example, with sick leave or without sick leave) can be classified using the learning data corresponds to an example of a learning model.
While the explanation is made with attendance record data for six months as example data used for prediction, the period is not limited thereto and may be optionally changed to, for example, attendance record data for four months. Moreover, while the explanation is made with the example in which a label is attached to attendance record data for six months depending on whether sick leave (suspension of work) is taken within three months after the end time thereof, the period is not limited thereto and may be optionally changed to, for example, within two months. The order of the tensor data is not limited to the fourth order; tensor data of an order lower than the fourth order may be generated, or tensor data of the fifth order or higher may be generated.
Not only attendance record data but also any other format of data may be used as long as it provides the conditions of employees or the like, such as coming to the office, leaving the office, and taking leave. In addition, the start time may be set at any desired point in the attendance data, without being limited to the top of the attendance data.
In the second embodiment, various kinds of neural networks such as RNNs (Recurrent Neural Networks) and CNNs (Convolutional Neural Networks) may be used. For the method of learning, various kinds of known methods may be employed in addition to the error back-propagation method. A neural network has a multistage configuration including, for example, an input layer, an intermediate layer (hidden layer), and an output layer, each layer having a structure in which a plurality of nodes are tied with edges. Each layer has a function called an "activation function," each edge has a "weight," and the value of each node is calculated from the values of the nodes in the previous layer, the weights of the connecting edges (weighting factors), and the activation function of the layer. For the calculation method, various kinds of known methods can be employed.
Learning in a neural network refers to correcting the parameters, that is, the weights and biases, so that the output layer has a correct value. In the error back-propagation method, a "loss function" is determined that indicates how far the value of the output layer is from the proper (desired) condition, and the weights and biases are updated so that the loss function is minimized using the steepest descent method or the like.
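As a minimal illustration of this update rule (a toy example, not the learning device's actual implementation), the following sketch minimizes a squared-error loss for a single node by the steepest descent method.

```python
# Minimal sketch of the weight/bias update in the steepest descent method:
# the parameters are moved against the gradient of the loss function.
# A one-node "network" with squared-error loss, for illustration only.
import numpy as np

x, t = np.array([1.0, 2.0]), 1.0   # input and desired output
w, b = np.zeros(2), 0.0            # weight and bias
lr = 0.1                           # learning rate

for _ in range(100):
    y = w @ x + b                  # output layer value
    grad_y = y - t                 # error propagated back from the loss 0.5*(y-t)**2
    w -= lr * grad_y * x           # update weight against the gradient
    b -= lr * grad_y               # update bias against the gradient

print(round(float(w @ x + b), 4))  # approaches the desired output 1.0
```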
Process procedures, control procedures, specific names, and information including various kinds of data and parameters represented in the above description and drawings may be optionally changed unless otherwise specified. The specific examples, distributions, and numeric values explained in the embodiments are merely examples, and may be optionally changed.
The components of the devices illustrated in the drawings are conceptual, and do not necessarily have to be physically configured as illustrated. That is, the specific forms of the distribution and integration of each device are not limited to those in the drawings. In other words, all or part of the devices may be functionally or physically distributed or integrated in any desired unit according to various kinds of loads and operating conditions. Moreover, all or any desired part of the processing functions of the devices may be implemented by a CPU and a computer program analyzed and executed by the CPU, or may be implemented as hardware with wired logic.
The communication device 100a is a network interface card or the like, which communicates with other servers. The HDD 100b stores therein a computer program and a database that operate the functions illustrated in
The processor 100d reads out from the HDD 100b or the like to develop in the memory 100c a computer program that executes the same processing as that executed by the processing units illustrated in
In this manner, the learning device 100 operates as an information processing device that executes the learning method by reading out and executing the computer program. Moreover, the learning device 100 is capable of implementing the same functions as those described in the above-described embodiments by having a media reader read out the computer program from a recording medium and executing the read computer program. It is noted that the computer program referred to in the other embodiments is not limited to being executed by the learning device 100. For example, the present invention is similarly applicable when another computer or server executes the computer program, or when such a computer and server execute the computer program in cooperation with each other.
This computer program is distributable via a network such as the Internet. Furthermore, this computer program may be stored in a computer-readable recording medium such as a hard disk, a flexible disk (FD), a compact disc read-only memory (CD-ROM), a magneto-optical disk (MO), or a digital versatile disc (DVD), and may be executed by being read out from the recording medium.
According to one embodiment, it is possible to improve the accuracy of learning.
All examples and conditional language recited herein are intended for pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventors to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention.
Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.