The present invention relates to a technique for predicting a personal operation using machine learning.
In many situations, individuals perform specific operations on multiple objects (candidates) based on their individual (or personal) evaluation, such as subjective evaluation or evaluation based on experiences. Examples from everyday life include choosing a delicious-looking dish from a restaurant menu, ranking photographs of multiple candidate persons in order of attractiveness, and choosing appealing items for coordination from a group of products displayed at a store.
Research has been underway to predict, using machine learning, such personal evaluation dependent on human subjectivity and experiences. Non-Patent Literature 1 below describes a method for evaluating the degree to which food in a photograph looks delicious. Non-Patent Literature 2 below describes a method for evaluating the aesthetics of an image.
Known studies quantify personal evaluation by defining subjective indices, such as the degree to which an item looks delicious or the degree to which an item appears beautiful, and expressing the evaluation as numbers on those indices. For machine learning, true values (true data) for such subjective indices are to be preset for the images prepared as training data.
However, people have different tastes and feelings, which makes defining a true value for each subjective index difficult. Evaluation based on personal subjectivity, experiences, or a personal sense of value is also difficult to express using numbers, so appropriate subjective indices themselves may be difficult to define. Known approaches have this limitation.
In response to the above issue, one or more aspects of the present invention are directed to a technique for predicting a personal operation on multiple objects using machine learning. One aspect of the present invention is directed to a technique for learning, using machine learning, a personal operation dependent on human subjectivity or experiences without presetting definitions and true values for subjective indices.
An operation prediction apparatus according to an aspect of the present disclosure includes a dataset obtainer that obtains a dataset including a plurality of objects, an attribute information obtainer that obtains attribute information about a user, and a predictor that predicts an operation to be performed on the dataset by the user based on the dataset and the attribute information using a model trained through machine learning. The model includes a first module that calculates, for each of the plurality of objects in the dataset, an index value corresponding to a combination of the object and the attribute information using a neural network, and a second module that calculates a prediction result of the operation to be performed by the user by performing a predetermined process on a plurality of index values obtained from the first module and corresponding to the respective plurality of objects. The model is trained through machine learning on training data including a sample dataset including a plurality of samples, true data being a result of an operation performed on the sample dataset by an operator, and attribute information about the operator.
The operation prediction apparatus with this structure can predict, in response to a dataset and attribute information, an operation likely to be performed on the dataset by a person corresponding to the attribute information. The operation prediction apparatus may predict operations based on personal evaluation, such as selecting, sorting, and grouping of objects based on hobbies, preferences, or experiences.
For machine learning of such personal operations, known techniques preset definitions and true values for subjective indices. In contrast, the operation prediction apparatus with the above structure uses an index value corresponding to a subjective index as an internal parameter of the model and eliminates explicit presetting of definitions or true values for the index value. In other words, during learning, the result of an operation performed by an operator (tester) on a sample dataset (e.g., a result of selection, sorting, or grouping) is simply input as true data, and an index value correlated with the operation is generated automatically. This allows a subjective operation performed by a human to be predicted through machine learning without presetting definitions and true values for subjective indices.
When the second module simulates a user operation such as selecting, sorting, or grouping objects, the predetermined process is usually non-differentiable. With the second module including a non-differentiable process, backpropagation cannot be used during model training to propagate an error in the prediction result, which is the output of the second module, to an error in the index value, which is the input into the second module (and also the output of the first module). In this case, for example, a dedicated model for estimating the error in the second module may be used to convert the error in the prediction result to the error in the index value.
To avoid this constraint during model training, the predetermined process may be approximated by a process using a differentiable function or by a combination of processes using differentiable functions. Configuring the internal calculations of the second module as differentiable processes allows the error in the prediction result to be propagated back through the second module to the first module. The entire model (more specifically, the first module and the second module together) can then be trained through machine learning by backpropagation, making the model easy to train.
The neural network may output a probability distribution of the index value in response to an input of at least one object and the attribute information. The first module may sample an index value from the probability distribution output from the neural network and output the sampled index value to the second module. The validity of the prediction result is expected to be improved by such a probability model.
The first module may use a differentiable function to sample the index value from the probability distribution. This allows the first module to be trained through machine learning by backpropagation.
The first module and the neural network may have various structures. For example, the neural network may receive an input of a value obtained with one of the first module or the second module recursively. The first module may output, in response to an input of the plurality of objects, the index value with a condition in which the plurality of objects appear simultaneously or consecutively. At least one object of the plurality of objects may include a plurality of items of information, or the attribute information may include a plurality of items of information. In this case, the first module may include a plurality of submodules corresponding to the respective plurality of items of information, and output, for the at least one object, a plurality of index values obtained with the plurality of submodules.
The predetermined process may include at least one selected from the group consisting of at least one of four arithmetic operations performed on the plurality of index values, sorting of the plurality of objects based on the plurality of index values or values computed from the plurality of index values, a threshold process performed on each of the plurality of index values or a value computed from the index value, selection of at least one object from the plurality of objects based on the plurality of index values or the values computed from the plurality of index values, and grouping of the plurality of objects based on the plurality of index values or the values computed from the plurality of index values.
A model training method according to an aspect of the present disclosure is a method for training a model through machine learning. The model is usable in an operation prediction apparatus. The method includes obtaining a sample dataset including a plurality of samples, obtaining true data being a result of an operation performed on the plurality of samples by an operator, obtaining attribute information about the operator, and training the model through machine learning on the sample dataset including the plurality of samples, the true data, and the attribute information about the operator. In the above method, the training the model through machine learning may include training the first module and the second module by backpropagation based on an error between the true data and an output of the model in response to an input of the plurality of samples and the attribute information about the operator.
An operation prediction method according to an aspect of the present disclosure includes obtaining a dataset including a plurality of objects, obtaining attribute information about a user, and predicting an operation to be performed on the dataset by the user based on the dataset and the attribute information using a model trained through machine learning. The model includes a first module that calculates, for each of the plurality of objects in the dataset, an index value corresponding to a combination of the object and the attribute information using a neural network, and a second module that calculates a prediction result of the operation to be performed by the user by performing a predetermined process on a plurality of index values obtained from the first module and corresponding to the respective plurality of objects. The model is trained through machine learning on training data including a sample dataset including a plurality of samples, true data being a result of an operation performed on the sample dataset by an operator, and attribute information about the operator.
One or more aspects of the present invention may be directed to an operation prediction apparatus including at least a part of the above elements or structures, or to a system that performs, for example, operations, evaluations, selections of actions, control, simulations, suggestions, recommendations, and searches using the prediction result from the apparatus. One or more aspects of the present invention may also be directed to a model training method and a model training apparatus for a model used in the operation prediction apparatus. One or more aspects of the present invention may also be directed to an operation prediction method or a control method for the operation prediction apparatus including at least a part of the above processes, or to a method for operations, evaluations, selections of actions, control, simulations, suggestions, recommendations, and searches, using the prediction result obtained with the method. One or more aspects of the present invention may also be directed to a program for causing a processor to perform the steps included in the method, or a recording medium recording the program. The above elements and processes may be combined with one another in any manner to form one or more aspects of the present invention.
The technique according to the above aspects of the present invention generates a machine learning model that has learned personal operations dependent on human subjectivity and experiences without presetting definitions and true values for subjective indices, and predicts the personal operations on multiple objects using the model.
An operation prediction apparatus, a model training method, and an operation prediction method according to one embodiment of the present invention will now be described with reference to the drawings.
<Prediction Model>
A model for predicting personal operations dependent on human subjectivity and experiences (hereafter, a prediction model) will now be described with reference to the drawings.
In response to an input of a dataset x={x1, . . . , xN} of multiple objects and user attribute information a, a prediction model 1 outputs operations likely to be performed by the user on the dataset x as a prediction result y={y1, . . . , yN}.
The objects xi (i=1, . . . , N) may be data of any type and designed as appropriate for an application using the prediction model 1. For example, image data (including moving images), text data, and voice data may be used as objects, or a combination of multiple items of data may be used as a single object. Examples of combinations of multiple items of data include a combination of an image and the description of a subject contained in the image (e.g., a cooking recipe and the description of a sightseeing spot) and a combination of an image and geographic information about a location contained in the image (e.g., positional information such as latitude and longitude, the nearest station or airport, and time taken from the nearest station or airport). A combination of items of data may also be a combination of data pieces of the same type, such as a combination of multiple images. The number of objects N may be set as appropriate.
The attribute information a is information for stratifying users. The stratification may be performed differently depending on the application using the prediction model 1. The attribute information may thus be any information. The attribute information may include, for example, age, an age group, sex, blood type, occupation, income, assets, height, weight, health conditions, past illnesses, the place of birth, the place of residence, nationality, family structure, hobbies, and preferences. For an application for outputting personalized results, information for identifying an individual user (the name, personal identification or ID, and social security and tax number) may be used as the attribute information. A single item of information or a combination of multiple items of information (e.g., age and sex) may be input as the attribute information a.
The prediction result y represents the result of an operation performed by the user on the dataset x. Examples of the operation include selection of k objects (1≤k<N), sorting (ranking) and grouping objects, and combinations of these operations. The specific task to be achieved by the operation may be designed as appropriate for an application using the prediction model 1.
The operation performed on the dataset x herein refers to an operation performed on some or all of the objects xi in the dataset x under a condition that reflects all the objects x1, . . . , xN in the dataset x. More specifically, the prediction model 1 does not output an individual prediction result for a single input object but outputs an overall prediction result for a group of N input objects.
As shown schematically in the drawings, the prediction model 1 includes a first module M1 that calculates, for each object xi, an index value vi corresponding to the combination of the object xi and the attribute information a using a neural network, and a second module M2 that performs a predetermined process on the set of index values obtained from the first module M1 and outputs the prediction result y.
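As a concrete illustration only (not part of the specification), the two-module structure can be sketched in PyTorch-style code; the class names, layer sizes, and the hard top-k selection in the second module are all assumptions, and differentiable alternatives to the hard selection are discussed later.

```python
import torch
import torch.nn as nn

class FirstModule(nn.Module):
    """Scores each (object x_i, attribute a) pair with a shared neural network."""
    def __init__(self, obj_dim, attr_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obj_dim + attr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, x, a):               # x: (N, obj_dim), a: (attr_dim,)
        a = a.expand(x.size(0), -1)        # the same attribute information for every object
        return self.net(torch.cat([x, a], dim=-1)).squeeze(-1)   # (N,) index values v_1..v_N

class SecondModule(nn.Module):
    """Predetermined set-level process; here, a hard selection of the top-k objects."""
    def __init__(self, k):
        super().__init__()
        self.k = k

    def forward(self, v):                  # v: (N,) index values
        y = torch.zeros_like(v)
        y[torch.topk(v, self.k).indices] = 1.0    # 1 = selected, 0 = not selected
        return y

class PredictionModel(nn.Module):
    """Prediction model 1: first module M1 followed by second module M2."""
    def __init__(self, obj_dim, attr_dim, k):
        super().__init__()
        self.m1 = FirstModule(obj_dim, attr_dim)
        self.m2 = SecondModule(k)

    def forward(self, x, a):
        return self.m2(self.m1(x, a))
```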
<First Module>
The neural network used in the first module M1 can have any structure and be designed as appropriate for the application using the prediction model 1. For example, a convolutional neural network or an improved version of such a neural network may be used. A neural network that recursively receives an input of values obtained in the first module M1 or the second module M2, such as a recurrent neural network, may also be used.
In the example of the drawings, the first module M1 includes a neural network 40 that calculates, for each object xi, the index value vi corresponding to the combination of the object xi and the attribute information a.
The structure of the first module M1 is not limited to the example described above. For example, the neural network 40 may output parameters of a probability distribution of the index value (e.g., a mean μi and a standard deviation σi) in response to an input of the object xi and the attribute information a, and a sampling section 41 may sample the index value vi from the probability distribution using the differentiable function below.
vi=μi+σiv*, v*∼N(0, 1)
This function is designed to output the mean value μi plus the probabilistic noise σiv* as the index value vi.
This structure for sampling using the differentiable function allows an error in the index value vi to be propagated back through the sampling section 41 to the neural network 40. This allows the first module M1 to be trained through machine learning by backpropagation.
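As a purely illustrative sketch (PyTorch, the layer names, and the log-σ parameterization are assumptions), the structure above corresponds to the so-called reparameterization trick: the neural network 40 predicts μi and log σi, and the sampling section 41 adds the noise σiv* with v* drawn from N(0, 1), so the result stays differentiable with respect to the network parameters.

```python
import torch
import torch.nn as nn

class StochasticFirstModule(nn.Module):
    """Hypothetical first module: NN 40 outputs (mu_i, log sigma_i); sampler 41 draws v_i."""
    def __init__(self, obj_dim, attr_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obj_dim + attr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))                 # per object: [mu_i, log sigma_i]

    def forward(self, x, a):                      # x: (N, obj_dim), a: (attr_dim,)
        a = a.expand(x.size(0), -1)
        mu, log_sigma = self.net(torch.cat([x, a], dim=-1)).unbind(-1)
        v_star = torch.randn_like(mu)             # v* ~ N(0, 1), no gradient flows through it
        return mu + log_sigma.exp() * v_star      # v_i = mu_i + sigma_i * v*, differentiable in mu, sigma
```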
<Second Module>
The second module M2 performs a predetermined process on the set of index values V={v1, . . . , vN} and outputs the prediction result y={y1, . . . , yN}. In particular, the second module M2 does not perform the process on a single element (a single index value vi), but on a set of multiple elements (the set V of index values). This allows the second module M2 to perform operations (e.g., sorting, selection, and grouping) reflecting the relationship with other elements. The predetermined process can be designed as appropriate for an application using the prediction model 1. For example, the predetermined process may include a combination of one or more of the following processes 1) to 5).
1) Four arithmetic operations performed on the index values vi
2) Sorting of the objects xi based on the index values vi or the values wi computed from the index values
3) A threshold process performed on each index value vi or value wi
4) Selection of at least one object from the objects xi based on the index values vi or the values wi
5) Grouping of the objects xi based on the index values vi or the values wi
In the processes, the value wi is obtained with the four arithmetic operations performed on the index value vi.
When, for example, the user operation to be predicted by the prediction model 1 is object selection, the process of the second module M2 can be a combination of the processes 1) and 4). When the user operation to be predicted is sorting (ranking) of all objects, the process of the second module M2 can be a combination of processes 1) and 2). When the user operation to be predicted is sorting (ranking) of some objects, the process of the second module M2 can be a combination of processes 1), 2), and 4).
The prediction result y has predicted values y1, . . . , yN as its elements in response to inputs of the respective objects x1, . . . , xN. A predicted value yi can be of any type, such as a binary value (0 or 1), a continuous value, (a parameter representing) a probability distribution, or a vector. The type of the predicted value yi and the meanings (definitions) of the value can be designed as appropriate for the application using the prediction model 1.
When, for example, the user operation to be predicted is object selection, the predicted value yi may be a binary value, with 1 representing being selected and 0 representing being unselected. In this case, the prediction result y is output as a binary vector such as y={0, 0, 1, 0, 1} (this example represents a prediction result in which two objects, the third and fifth objects, are selected out of five objects). When the user operation to be predicted is sorting (ranking) objects, a sorting matrix may be output as the prediction result y. The sorting matrix is an N×N matrix (N is the number of objects xi), in which the position of the element with the value 1 in the N-dimensional vector yi in the i-th row represents the order (rank) of the object xi. For example, the sorting matrix y shown below defines the order (rank) of the five objects x1 to x5 as 4th, 2nd, 5th, 3rd, and 1st.
y1={0, 0, 0, 1, 0}
y2={0, 1, 0, 0, 0}
y3={0, 0, 0, 0, 1}
y4={0, 0, 1, 0, 0}
y5={1, 0, 0, 0, 0}
The objects sorted in accordance with this matrix y are in the order of x5, x2, x4, x1, and x3.
When the user operation to be predicted is grouping objects, information identifying the cluster to which the object xi belongs may be output as a predicted value yi. For example, a K-dimensional one-hot vector may be output as the predicted value yi with the k-th element alone being 1 and the other elements being 0, where K is the number of clusters, and k is the cluster number to which the object xi belongs (k=1, . . . , K).
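For illustration only, the output representations described above (the binary selection vector, the sorting matrix, and the one-hot cluster assignment) can be constructed as in the following sketch; the helper functions are hypothetical and PyTorch is an assumption.

```python
import torch

def selection_vector(selected, n):
    """Binary prediction result: 1 = selected, 0 = not selected."""
    y = torch.zeros(n)
    y[torch.tensor(selected)] = 1.0
    return y

def sorting_matrix(ranks):
    """Row i is one-hot at the (1-based) rank of object x_{i+1}."""
    n = len(ranks)
    y = torch.zeros(n, n)
    y[torch.arange(n), torch.tensor(ranks) - 1] = 1.0
    return y

def grouping_vectors(clusters, num_clusters):
    """Row i is a one-hot vector indicating the cluster of object x_{i+1}."""
    return torch.nn.functional.one_hot(torch.tensor(clusters), num_clusters).float()

print(selection_vector([2, 4], n=5))                # {0, 0, 1, 0, 1}: third and fifth objects selected
print(sorting_matrix([4, 2, 5, 3, 1]))              # the example sorting matrix for x1 to x5
print(grouping_vectors([0, 1, 0], num_clusters=2))  # x1 and x3 in one cluster, x2 in another
```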
The predetermined process performed with the second module M2 may be approximated by a process using a differentiable function or by a combination of processes using differentiable functions. Configuring the internal calculations of the second module M2 as differentiable processes allows an error in the prediction result y to be propagated back through the second module M2 to the first module M1.
The four arithmetic operations in the process 1) are clearly differentiable. For the sorting in the process 2), for example, the processes described in Non-Patent Literatures 3 and 4 may be used. For the threshold process in the process 3), for example, a sigmoid function or a hard-sigmoid function can be combined with a straight-through estimator (STE). For a forward pass, the sigmoid function or the hard-sigmoid function can be used to binarize the data. For a backward pass, a gradient can be calculated using prestored pre-binarized values. For the selection in the process 4), for example, the Gumbel-softmax (refer to Non-Patent Literature 5) or a softmax function may be combined with the straight-through estimator.
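A minimal sketch of the threshold process in the process 3) combined with a straight-through estimator is shown below; PyTorch is an assumption, and the specification only names the technique, so the details here are illustrative. The forward pass outputs hard 0/1 values, while the backward pass uses the gradient of the sigmoid evaluated at the prestored pre-binarized values.

```python
import torch

class STEThreshold(torch.autograd.Function):
    """Hard threshold in the forward pass; sigmoid gradient in the backward pass (STE)."""
    @staticmethod
    def forward(ctx, w):
        s = torch.sigmoid(w)
        ctx.save_for_backward(s)          # prestore the pre-binarized values for the backward pass
        return (s >= 0.5).float()         # binarize: 1 if sigmoid(w) >= 0.5, else 0

    @staticmethod
    def backward(ctx, grad_output):
        (s,) = ctx.saved_tensors
        return grad_output * s * (1 - s)  # gradient of sigmoid, as if no binarization occurred

w = torch.tensor([-1.2, 0.3, 2.0], requires_grad=True)
y = STEThreshold.apply(w)                 # tensor([0., 1., 1.])
y.sum().backward()                        # gradients still reach w despite the hard threshold
```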
<Apparatus Configuration>
The operation prediction apparatus 5 mainly includes a dataset obtainer 50, an attribute information obtainer 51, a predictor 52, an information output unit 53, a storage 54, and a training unit 55. The dataset obtainer 50 obtains the dataset x. The attribute information obtainer 51 obtains the user attribute information a. The predictor 52 predicts a user operation from the dataset x and the attribute information a using the prediction model 1. The information output unit 53 provides, to the user, various items of information, such as a prediction result and a processing result. The storage 54 is an internal storage for storing various items of data such as the dataset x, the attribute information a, the prediction result, and the processing result. The training unit 55 trains the prediction model 1 through machine learning. The operation prediction apparatus 5 includes the training unit 55 when the apparatus itself trains (or retrains) the prediction model 1. However, an operation prediction apparatus 5 that uses a trained model generated by another training device may not include the training unit 55.
The operation prediction apparatus 5 may be a general-purpose computer including, for example, a central processing unit (CPU) or a processor, a memory, a storage, a communication device, an input device, and a display device. In this case, the functional units described above may be implemented by the CPU or the processor loading a program into the memory and executing the program.
<Model Training>
In step S100, the training unit 55 obtains training data. The training data may be obtained from the storage 54, which is an internal storage, or from an external storage. The training data includes a sample dataset xs={xs1, . . . , xsN} including N samples, true data yt={yt1, . . . , ytN} that results from an operation performed by an operator O on the sample dataset xs, and the attribute information aO of the operator O. To achieve sufficient prediction accuracy, the training unit 55 may obtain a large amount of training data, including many sample variations and attribute information variations.
In step S101, the training unit 55 sets initial values for all parameters of the first module M1 (such as the weight of each layer of the neural network), as well as parameter values used for machine learning, such as the learning rate.
In step S102, the training unit 55 inputs the sample dataset xs and the attribute information aO included in the training data into the first module M1.
In step S103, a forward pass operation is performed. More specifically, the first module M1 calculates the index value vsi from each sample xsi and the attribute information aO, and inputs a set of index values Vs={vs1, . . . , vsN} into the second module M2. The second module M2 then performs a predetermined process on the set Vs of index values and outputs a prediction result ys={ys1, . . . , ysN}.
In step S104, the training unit 55 calculates an error between the prediction result ys and the true data yt.
In step S105, the training unit 55 performs a backward pass operation by backpropagation to update the parameters of the first module M1.
In step S106, the training unit 55 performs end determination and ends the training process when a predetermined end condition is satisfied.
The trained prediction model 1 obtained through the above process is stored into the predictor 52 and used for a prediction process described below.
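For reference, steps S100 to S106 can be condensed into a training-loop sketch like the one below; PyTorch, the loss function, and the placeholders `model` and `training_data` are assumptions, and the second module is assumed to use the differentiable approximations described above so that backpropagation reaches the first module M1.

```python
import torch

# Hypothetical training loop following steps S100-S106. "model" stands for a prediction model
# whose second module uses differentiable operations; "training_data" is a placeholder iterable
# of (sample dataset xs, operator attribute information a_o, true data yt) triples.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # S101: learning rate and other settings
loss_fn = torch.nn.BCELoss()                                # e.g., for a binary selection result

for epoch in range(100):                                    # S106: simple fixed-iteration end condition
    for xs, a_o, yt in training_data:                       # S100: training data
        optimizer.zero_grad()
        ys = model(xs, a_o)                                 # S102-S103: forward pass through M1 and M2
        loss = loss_fn(ys, yt)                              # S104: error against the true data
        loss.backward()                                     # S105: backward pass (backpropagation)
        optimizer.step()                                    #        update of the first-module parameters
```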
<Prediction Process>
In step S200, the dataset obtainer 50 obtains a dataset x={x1, . . . , xN} including N objects. The dataset x may be obtained from the storage 54, which is an internal storage, or from an external storage.
In step S201, the attribute information obtainer 51 obtains the user attribute information a. For example, the attribute information obtainer 51 may display an attribute information input screen as a graphical user interface (GUI) and request the user to input or select the attribute information. The attribute information obtainer 51 may also estimate the user attribute information by analyzing the dataset x, or obtain the user attribute information from another application or a network service (e.g., obtain personal information through ID integration with a social networking service, or SNS, application).
In step S202, the predictor 52 inputs the dataset x and the attribute information a into the first module M1 in the prediction model 1.
In step S203, a forward pass operation is performed. More specifically, the first module M1 calculates the index value vi from each object xi and the attribute information a, and inputs a set of index values V={v1, . . . , vN} into the second module M2. The second module M2 then performs a predetermined process on the set V of index values and calculates the prediction result y={y1, . . . , yN}.
In step S204, the information output unit 53 outputs information indicating the prediction result y. Any method for outputting the prediction result y may be used. For example, a list of selected images may appear on the screen, an album may be generated using the selected images, or recommendations may be provided to the user.
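In code, the prediction process of steps S200 to S204 reduces to a single forward call on a trained model; the snippet below is a sketch only, with random tensors standing in for the obtained dataset and attribute information, and `model` standing for a trained model such as the PredictionModel sketched earlier.

```python
import torch

# S200-S201: the dataset x and the attribute information a are random placeholders here;
# in practice they come from the dataset obtainer 50 and the attribute information obtainer 51.
x = torch.randn(5, 128)     # five objects with assumed 128-dimensional features
a = torch.randn(8)          # assumed 8-dimensional encoding of the attribute information

with torch.no_grad():       # S202-S203: forward pass through the trained prediction model
    y = model(x, a)

selected = y.nonzero().squeeze(-1).tolist()   # S204: e.g., indices of the selected objects
print(selected)
```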
In the embodiment described below, the prediction model 1 is used for a continuous knapsack problem. In the present embodiment, the application is to select, from multiple sightseeing spots, the sightseeing spots that can be visited within a predetermined time period.
Inputs are a set of images of sightseeing spots x={x1, . . . , xN}, the time taken at each sightseeing spot c={c1, . . . , cN}, and an age group a as tourist attribute information. A total visit time Ctotal is preset as a constraint condition. An output is an indicator yi∈{0, 1} indicating whether the i-th object xi is selected.
Step 1: The first module M1 calculates the index values v1, . . . , vN from the images x1, . . . , xN and the age group a.
Step 2: The second module M2 calculates wi=log(vi)−log(ci).
Step 3: The second module M2 sorts the N objects x1, . . . , xN in descending order based on w1, . . . , wN, where p(i) denotes the original index of the object at the i-th position after sorting.
Step 4: The second module M2 calculates the total time Ci taken up to the i-th sightseeing spot in the sorted order with the following formula.
Ci=cp(1)+cp(2)+ . . . +cp(i)
Step 5: The second module M2 calculates sigmoid(Ctotal−Ci), sets yp(i)=1 for each p(i) with the calculated value of 0.5 or greater (that is, Ci≤Ctotal), and sets the other elements of y to 0.
The above processes allow extracting popular sightseeing spots that can be visited within the predetermined time Ctotal for people in the age group a. The prediction result can be used in, for example, applications for arranging and recommending sightseeing courses in accordance with the age of each traveler.
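Assuming the index values v1, . . . , vN have already been produced by the first module M1, Steps 2 to 5 of the second module M2 can be sketched as follows; PyTorch is an assumption, and a hard 0.5 threshold is used here for readability in place of a differentiable approximation.

```python
import torch

def select_within_budget(v, c, c_total):
    """Steps 2-5: select the sightseeing spots that fit within the total time c_total.

    v: (N,) index values from the first module M1; c: (N,) time taken at each spot.
    Returns a 0/1 vector y with y[i] = 1 if the i-th spot is selected."""
    w = torch.log(v) - torch.log(c)             # Step 2: w_i = log(v_i) - log(c_i)
    p = torch.argsort(w, descending=True)       # Step 3: p[i] = original index of the i-th sorted spot
    cum = torch.cumsum(c[p], dim=0)             # Step 4: C_i = c_p(1) + ... + c_p(i)
    keep = torch.sigmoid(c_total - cum) >= 0.5  # Step 5: keep spots while C_i <= C_total
    y = torch.zeros_like(v)
    y[p[keep]] = 1.0
    return y

# Illustrative numbers only: five spots, visit times in minutes, and a 120-minute budget.
v = torch.tensor([0.9, 0.4, 0.7, 0.2, 0.8])
c = torch.tensor([60.0, 30.0, 90.0, 20.0, 45.0])
print(select_within_budget(v, c, c_total=120.0))   # tensor([1., 0., 0., 0., 1.])
```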
In the first embodiment, the prediction model 1 is used for the knapsack problem in which the cost ci of each object xi is preset. In the second embodiment, the cost ci is an unknown parameter and estimated internally by the first module M1.
The prediction model 1 may be, for example, used in the applications described below.
1. Imitation of Expert Recommendation
From a large number of food images, an expert (e.g., a registered dietitian) creates a menu for a week based on a concept such as Western style or Japanese style. A prediction model that has learned such menu creation outputs, for a new dataset of food images, a menu that appears to have been created by the expert.
2. Recommendation for User
For example, menu images for various users may be used as a sample dataset, and the attributes of each user (e.g., favoring Japanese food) may be estimated from the menu trends and histories. The prediction model generated using the training data can generate a menu preferred by a person with those attributes (e.g., a person who likes Japanese food) in response to an input of a new dataset of the food images and attributes.
3. Automatic Summary Generation
The prediction model 1 described in the first embodiment can be used for automatic generation of content summaries. One example use of the prediction model 1 is generating a 60-second promotional video from input video content. The original video content is first split into short clips ranging from a few seconds to a dozen or so seconds. The video can be split using known methods, such as by detecting scene transitions or by splitting based on metadata embedded in the video. Each clip is then input into the prediction model 1 as an object xi, with the playback time of the clip as its cost ci. An optimal combination of objects xi that maximizes the index values vi can then be calculated under the constraint that the total cost is within 60 seconds. At this time, for example, an age group and an interest category, such as twenties and fashion, may be input as attribute information to create promotional videos that are highly effective in appealing to people in specific age groups and interest categories. In addition to video content, the prediction model 1 can be used to generate summaries of any digital content, such as text documents and recorded data.
4. Narrowing Inspection Areas of Device
Example use of the prediction model 1 described in the second embodiment to assist in device maintenance will now be described. For example, a target device may have more than 100 inspection items, and the time taken to inspect all of them thoroughly may be more than one hour. When the device malfunctions during operation and the causes are to be identified within ten minutes, completing more than 100 inspection items in time is impossible. The prediction model 1 may thus be used to narrow the inspection items to the areas that are likely to be the causes of the malfunction and that can be inspected within the time limit. For example, each inspection item may be input into the prediction model 1 as an object xi, with the time taken for its inspection treated as a cost ci estimated internally by the first module M1, so that the items to be inspected within the ten-minute limit can be selected.
5. Product Inspection
Example use of the prediction model 1 described in the second embodiment to assist in sampling inspection of products will now be described. When, for example, inspecting all products on a production line is difficult, sampling inspection is performed on a predetermined number of products. At this time, the inspection details and the inspection time vary depending on the conditions of the products. The target objects to be sampled and the time taken for the inspection also vary depending on the skills of each inspector. The prediction model 1 may thus be used to assist in the selection of products to be inspected, allowing as many defects as possible to be detected within a time limit (e.g., working hours per day).
<Appendix>
Number | Date | Country | Kind |
---|---|---|---|
2020-115035 | Jul 2020 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/022970 | 6/17/2021 | WO |