The present invention relates to model learning and label estimation.
In tests for examining conversation skills by evaluating impression such as the likability of telephone voices (NPL 1) or the level/fluency of foreign language pronunciation (NPL 2), voices are evaluated with quantitative impression values (such as, for example, five-stage evaluation from “good” to “bad”, five-stage evaluation from “high” to “low” in terms of likability, or five-stage evaluation from “high” to “low” in terms of spontaneity).
Currently, experts in various skills evaluate the impression of a voice and give impression values, and thereby a judgement of passing or failing is made. However, if the impression of a voice can be automatically estimated and an impression value can be obtained, the value can be used as the pass mark of the test or the like, or as a reference value for experts who are inexperienced in evaluation (for example, persons who have just started working as evaluators).
In order to realize automatic estimation of a label (e.g., an impression value) for data (e.g., voice data) using machine learning, it is sufficient to perform learning processing using pairs of data and the labels given to the data as learning data, and thereby generate a model for estimating a label for input data.
However, there are individual differences among evaluators, and there may be cases where an evaluator who is inexperienced in giving a label gives a label to data. Accordingly, different evaluators may give different labels to the same data.
In order to learn a model for estimating a label obtained by averaging the label values given by a plurality of evaluators, it is sufficient that a plurality of evaluators give labels to the same data, and that a pair of the data and a label obtained by averaging the values of those labels is used as learning data. To enable stable estimation of an average label, it is preferable that as many evaluators as possible give labels to the same data. For example, in NPL 3, ten evaluators each give labels to the same data.
Evaluators include those having a high evaluation ability and those having a low evaluation ability. In a case where there are many evaluators per piece of data, even if some of the evaluators have a low evaluation ability, the label of learning data is corrected to a label with some degree of correctness by the labels given by evaluators having a high evaluation ability. However, if the number of evaluators per piece of data is small, label errors of learning data may increase due to the lack of evaluation ability of the evaluators, and a model for estimating an accurate label cannot be learned.
The present invention was made in view of the aforementioned problem, and an object thereof is to provide a technique that enables learning of a model capable of performing accurate label estimation, even if learning data is used for which the number of evaluators per piece of data is small.
Learning data is received that includes learning feature data and label data indicating a label given to the learning feature data by an evaluator. Estimation label probability values are obtained by applying a label estimation model, which estimates a probability distribution of labels given to feature data, to the learning feature data serving as the feature data. Ability data indicates a probability that an evaluator gives a correct label to the feature data and a probability that the evaluator gives a wrong label to the feature data. Based on the estimation label probability values and the ability data, an estimation observation label probability value is obtained that is a weighted sum of the estimation label probability values with the ability data. Updated ability data and an updated label estimation model are then obtained by updating the ability data and the label estimation model, respectively, so that an error value is reduced, the error value indicating an error of the estimation observation label probability value with respect to the label indicated by the label data.
According to the present invention, a weighted sum of estimation label probability values with ability data that indicates abilities of evaluators in probability is evaluated, and the ability data and a label estimation model are updated, thus making it possible to learn a model that is capable of performing accurate label estimation even if learning data is used for which the number of evaluators per piece of data is small.
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
First, a first embodiment of the present invention will be described.
<Configuration>
As exemplified in
<Preprocessing>
As preprocessing of the model learning processing performed by the model learning device 11, the following three processes are performed. As the first process, learning label data is stored in the learning label data storage unit 111. As the second process, learning feature data is stored in the learning feature data storage unit 112. As the third process, ability data is stored in the ability data storage unit 113.
The learning label data includes label data that indicates the values of labels respectively given by a plurality of evaluators to each of a plurality of pieces of learning feature data (label data indicating labels respectively given by the evaluators to the learning feature data). “Label” refers to a correct label that an evaluator, having perceived “information perceptible by humans (such as, for example, voices, music, texts, images, and videos)” that corresponds to the learning feature data, gives to the learning feature data at the evaluator's own discretion. The value of a label may be a numerical value or a symbol such as an alphabet character. For example, a label is a numerical value that indicates the result of an evaluation made by an evaluator who has perceived the “information perceptible by humans” corresponding to the learning feature data (for example, a numerical value indicating an impression).
Learning feature data refers to feature data for use in learning. Feature data may be data that indicates information perceptible by humans (such as, for example, voice data, music data, text data, image data, and video data). Feature data may also be data that indicates features of such information perceptible by humans (for example, data regarding the feature amount).
Ability data refers to data that indicates the probability that each of a plurality of evaluators gives a correct label to feature data, and the probability that each of the plurality of evaluators gives a wrong label thereto. For example, ability data may be a set of numerical values or symbols such as alphabet characters, or may be a function such as a probability density function.
Learning label data exemplified in
The pieces of learning feature data x(i) corresponding to the label data number i∈{1, . . . , I} that are exemplified in
Ability data a(k, c, c′) exemplified in
The default values of the ability data a(k, c, c′) may be set randomly. Alternatively, a test may be conducted as to whether or not each evaluator can give a correct label to feature data, and the default values may be set based on the result thereof. For example, it is assumed that, in the test, a plurality of evaluators evaluate the same feature data and respectively give labels to the feature data. At this time, the label given by another evaluator who has evaluated the same feature data is regarded as a correct label, and thereby the default values of the ability data a(k, c, c′) may be set. For example, out of the pieces of feature data to which the label corresponding to the label data c is given, the set of label data numbers i to which the labels are given by the evaluators corresponding to the evaluator numbers k(i)≠k′, that is, evaluators other than the one with the evaluator number k′∈{1, . . . , K}, is expressed by
$$A_{c,\setminus k'} = \{\, i \mid y(i) = c \land k(i) \neq k' \,\}$$
Also, out of the pieces of the same feature data as those of $A_{c,\setminus k'}$, the set of label data numbers i to which the labels corresponding to the label data c′ are given by the evaluator corresponding to the evaluator number k(i)=k′ who has evaluated the feature data is expressed by
$$B_{c',\setminus k'} = \{\, i \mid x(i) \in XA$$
At this time, the default values of the ability data a(k, c, c′) may be set as follows:
where |⋅| denotes the number of elements of the set “⋅”, and \k′ denotes a symbol other than k′.
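The expression for the default values is not reproduced in this text, so the following is only a minimal sketch under a stated assumption: a(k′, c, c′) is taken to be the fraction of times the evaluator k′ gave the label c′ to feature data that another evaluator labeled c. The function name `init_ability`, the record layout, the additive `smoothing` term, and the 0-based label indices are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def init_ability(records, num_labels, smoothing=1.0):
    """Estimate default ability data from a test in which evaluators share data.

    records: list of (data_id, evaluator, label), labels in 0..num_labels-1.
    Returns {evaluator: matrix}, where matrix[c][c_given] approximates
    a(k, c, c') and every row sums to 1.
    The smoothing term is an assumption that avoids empty rows.
    """
    by_data = defaultdict(list)
    for data_id, evaluator, label in records:
        by_data[data_id].append((evaluator, label))

    evaluators = sorted({k for _, k, _ in records})
    counts = {k: [[smoothing] * num_labels for _ in range(num_labels)]
              for k in evaluators}

    # Treat the label given by another evaluator to the same data as correct.
    for ratings in by_data.values():
        for k, given in ratings:
            for other_k, reference in ratings:
                if other_k != k:
                    counts[k][reference][given] += 1.0

    # Normalize each row so that it is a probability distribution over c'.
    return {k: [[v / sum(row) for v in row] for row in rows]
            for k, rows in counts.items()}
```

With two evaluators who agree on one piece of data and disagree on another, each row of the returned matrices is a probability distribution over the label that the evaluator is expected to give.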
<Model Learning Processing>
The following will describe model learning processing according to the present embodiment.
In the model learning processing of the present embodiment, the updating unit receives an input of learning data that contains: the learning feature data x(i); and the label data y(i) that indicates a label given to the learning feature data by an evaluator. Then, in the model learning processing, updated ability data and an updated label estimation model λ are obtained in accordance with the guideline described below. The updating unit first evaluates an error value L(i) that indicates an error of an estimation observation label probability value y^(i, c′), which is a weighted sum of estimation label probability values h(i, c) with the ability data a(k, c, c′), with respect to the label indicated by the label data y(i). Here, the estimation label probability values h(i, c) are obtained by applying the label estimation model λ, which estimates the probability distribution of labels given to feature data, to the learning feature data x(i) serving as the feature data. The ability data a(k, c, c′) indicates the probability that an evaluator gives a correct label to the feature data and the probability that the evaluator gives a wrong label. Then, the ability data a(k, c, c′) and the label estimation model λ are updated so that the error value L(i) is reduced. Hereinafter, the model learning processing will be described in detail with reference to
<<Processing of Evaluation Label Estimation Unit 114 (Step S114)>>
The evaluation label estimation unit 114 receives inputs of the label estimation model λ output from the estimation model learning unit 118, and the learning feature data x(i) extracted from the learning feature data storage unit 112. Note that examples of the label estimation model λ include a neural network, a hidden Markov model, and a support vector machine. Any default value of the label estimation model λ may be set. The evaluation label estimation unit 114 applies the label estimation model λ to the learning feature data x(i) to obtain and output estimation label probability values h(i, c) (where i∈{1, . . . , I} and c∈{1, . . . , C}). Here, the estimation label probability values h(i, c) indicate the probability that the label data of the correct label of the learning feature data x(i) corresponding to the label data number i is c. That is, the estimation label probability values h(i, c) exemplified in the present embodiment are the probability distribution p(c|x(i), λ) obtained by applying the label estimation model λ to the learning feature data x(i). However, the following expression
$$\sum_{c=1}^{C} h(i, c) = 1$$
should be satisfied. p(c|x(i),λ) is the probability distribution in which label data of the correct label corresponding to the learning feature data x(i) is c∈{1, . . . , C} in the label estimation model λ.
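As an illustration of step S114, the following minimal sketch computes h(i, c) for one feature vector. The source allows any label estimation model λ (a neural network, a hidden Markov model, a support vector machine); the linear layer followed by a softmax used here, and the names `softmax` and `estimate_label_probs`, are assumptions chosen only because they guarantee that the values sum to 1.

```python
import math


def softmax(scores):
    """Convert arbitrary scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_label_probs(weights, bias, x):
    """Return h(i, c) for every label c, given one feature vector x.

    weights: C rows of n coefficients, bias: C values (hypothetical model).
    """
    scores = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)
```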
<<Processing of Observation Label Estimation Unit 115 (Step S115)>>
The observation label estimation unit 115 receives inputs of the estimation label probability values h(i, c) obtained in step S114, the evaluator number k(i) extracted from the learning label data storage unit 111, and the ability data a(k, c, c′) extracted from the ability data storage unit 113. The observation label estimation unit 115 obtains an estimation observation label probability value y^(i, c′) based on the input estimation label probability values h(i, c), the evaluator number k(i), and the ability data a(k, c, c′), and outputs the obtained estimation observation label probability value y^(i, c′). As described above, the estimation observation label probability value y^(i, c′) is a weighted sum of the estimation label probability values h(i, c) with the ability data a(k(i), c, c′). With this, a situation in which an evaluation value deviates from the true value depending on the evaluator's ability is reproduced. As described above, the ability data a(k(i), c, c′) indicates the probability that, when the evaluator corresponding to the evaluator number k(i) evaluates feature data having the label indicated by the label data c, the label indicated by the label data c′∈{1, . . . , C} is given. The estimation observation label probability value y^(i, c′) reproduces the probability that the label corresponding to the label data c′ is given to the learning feature data x(i), based on both the probability (probability of c=c′) that the evaluator corresponding to the evaluator number k(i) gives the correct label, and the probability (probability of c≠c′) that the evaluator gives a wrong label. For example, the observation label estimation unit 115 obtains the estimation observation label probability value y^(i, c′) in the following manner, and outputs the obtained estimation observation label probability value y^(i, c′).
$$\hat{y}(i, c') = \sum_{c=1}^{C} h(i, c)\, a(k(i), c, c')$$
Note that, as indicated by the expression, the “^” of “y^(i, c′)” should essentially be placed immediately above “y”, but there may be cases where it is placed to the upper right of “y” due to restricted description notation.
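The weighted sum of step S115 can be sketched as follows. The function name `observed_label_probs` and the 0-based label indices are illustrative assumptions; `ability_k` stands for the C×C slice a(k(i), ·, ·) belonging to the evaluator who labeled the data.

```python
def observed_label_probs(h, ability_k):
    """Return y^(i, c') = sum over c of h(i, c) * a(k(i), c, c').

    h: list of C probabilities of the correct label.
    ability_k: C x C matrix, ability_k[c][c_obs] is the probability that
    this evaluator gives label c_obs when the correct label is c.
    """
    num_labels = len(h)
    return [sum(h[c] * ability_k[c][c_obs] for c in range(num_labels))
            for c_obs in range(num_labels)]
```

If every row of `ability_k` sums to 1 and `h` sums to 1, the result also sums to 1, and an evaluator whose matrix is the identity leaves `h` unchanged.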
<<Processing of Error Evaluation Unit 116 (Step S116)>>
The error evaluation unit 116 receives inputs of the estimation observation label probability value y^(i, c′) obtained by the observation label estimation unit 115, and the label data y(i) extracted from the learning label data storage unit 111. The error evaluation unit 116 obtains an error value L(i) that indicates an error of the estimation observation label probability value y^(i, c′) with respect to the label indicated by the label data y(i), and outputs the obtained error value L(i). The error value L(i) indicates the deviation of the estimation observation label probability value y^(i, c′) with respect to the label indicated by the label data y(i). For example, the error evaluation unit 116 evaluates an error between the label data y(i) and the estimation observation label probability value y^(i, c′) based on the Categorical Cross-Entropy, which is an error value that is used frequently in class identification, so as to obtain and output the error value L(i). For example, the error evaluation unit 116 obtains the error value L(i) based on the following expression:
where the following expression is satisfied.
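The expressions themselves are not reproduced in this text. Assuming the standard form of the categorical cross-entropy with a one-hot encoding of the label data y(i), the error value reduces to the negative logarithm of the probability assigned to the observed label, which the following minimal sketch computes. The function name `cross_entropy`, the 0-based label index, and the small `eps` floor are assumptions.

```python
import math


def cross_entropy(y_hat, label, eps=1e-12):
    """Return L(i) = -log y^(i, y(i)) for one labeled example.

    y_hat: list of C observed-label probabilities y^(i, c').
    label: index of the label actually given by the evaluator.
    eps:   floor that avoids log(0) (an assumption, not from the source).
    """
    return -math.log(max(y_hat[label], eps))
```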
<<Processing of Ability Learning Unit 117 (Step S117)>>
The ability learning unit 117 receives inputs of the estimation label probability values h(i, c) obtained in step S114, the estimation observation label probability value y^(i, c′) obtained in step S115, the error value L(i) obtained in step S116, the evaluator number k(i) extracted from the learning label data storage unit 111, and the ability data a(k, c, c′) extracted from the ability data storage unit 113. The ability learning unit 117 uses them to update the ability data a(k, c, c′), thereby obtaining the updated ability data a′(k, c, c′). For example, the ability learning unit 117 updates the ability data a(k, c, c′) so that the error value L(i) is reduced, and obtains the updated ability data a(k, c, c′). For example, the ability learning unit 117 first updates a(k, c, c′) with respect to all of c∈{1, . . . , C} as follows.
where the following expression is satisfied.
Also, η is a preset learning rate parameter. η is a positive real number, and if this processing is performed by a neural network, η is 0.01 or smaller, for example. After a(k, c, c′) for all of c∈{1, . . . , C} have been updated in the above-described manner, the ability learning unit 117 performs normalization with respect to, for example, all of c, c″∈{1, . . . , C} in the following manner so that a(k, c, c″) is a probability value, and obtains the updated ability data a(k, c, c″).
The obtained updated ability data a(k, c, c″) is stored as new ability data a(k, c, c″) in the ability data storage unit 113.
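The update and normalization expressions of step S117 are not reproduced in this text. The following minimal sketch therefore assumes a plain gradient descent step on the cross-entropy error with learning rate η, followed by dividing each row by its sum so that it is again a probability distribution. The function name `update_ability`, the `eps` floor, and the 0-based label indices are hypothetical.

```python
def update_ability(ability_k, h, label, learning_rate=0.01, eps=1e-12):
    """One assumed gradient step on a(k, c, c') for a single example.

    ability_k: C x C matrix for the evaluator k(i) who gave `label`.
    h:         estimation label probability values h(i, c).
    With L(i) = -log y^(i, label), the only nonzero partial derivatives are
    dL/da(k, c, label) = -h(i, c) / y^(i, label).
    """
    num_labels = len(h)
    y_hat_label = max(
        sum(h[c] * ability_k[c][label] for c in range(num_labels)), eps)

    updated = [row[:] for row in ability_k]  # leave the input untouched
    for c in range(num_labels):
        grad = -h[c] / y_hat_label
        updated[c][label] -= learning_rate * grad

    # Normalize each row so that it is a probability distribution over c''.
    return [[v / sum(row) for v in row] for row in updated]
```

Under these assumptions the step raises the probability that the evaluator gives the observed label, most strongly in the rows c that the model currently considers likely to be correct.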
<<Processing of Estimation Model Learning Unit 118 (Step S118a)>>
The estimation model learning unit 118 receives inputs of the estimation observation label probability value y^(i, c′) obtained in step S115, the error value L(i) obtained in step S116, the evaluator number k(i) extracted from the learning label data storage unit 111, and the ability data a(k, c, c′) updated in step S117 and extracted from the ability data storage unit 113. The estimation model learning unit 118 uses them to obtain an updated label estimation model λ obtained by updating the label estimation model λ, and outputs the obtained updated label estimation model λ. For example, the estimation model learning unit 118 updates the label estimation model λ so that the error value L(i) is reduced, and obtains the updated label estimation model λ. For example, the estimation model learning unit 118 updates the parameter of the label estimation model λ so that the error value L(i) is reduced, based on the following gradient.
If the label estimation model λ is a neural network, the estimation model learning unit 118 updates, based on the above-described gradient, the parameter of the label estimation model λ using a gradient descent method, for example. If the label estimation model λ is a neural network, the estimation model learning unit 118 may obtain a gradient for updating the parameter based on the above-described gradient, and may update the parameter. The updated label estimation model λ obtained in the above-described manner is transmitted, as a new label estimation model λ, to the evaluation label estimation unit 114.
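The gradient expression is not reproduced in this text. As a minimal sketch of step S118a, the following assumes that λ is a single linear layer followed by a softmax (a hypothetical stand-in for the neural network) and that the error is the cross-entropy of the observed label. Under those assumptions the chain rule gives dL/ds_j = h(i, j) − h(i, j)·a(k(i), j, y(i)) / y^(i, y(i)) for the score s_j of label j. The function name `sgd_step` is illustrative.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(weights, bias, x, ability_k, label, learning_rate=0.01):
    """One gradient descent step on an assumed linear-softmax model.

    Returns (new_weights, new_bias, loss_before_update).
    """
    scores = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    h = softmax(scores)
    num_labels = len(h)
    y_hat_label = sum(h[c] * ability_k[c][label] for c in range(num_labels))
    loss = -math.log(y_hat_label)

    # Gradient of the loss with respect to each pre-softmax score.
    grad_scores = [h[j] - h[j] * ability_k[j][label] / y_hat_label
                   for j in range(num_labels)]

    new_weights = [[w - learning_rate * g * v for w, v in zip(row, x)]
                   for row, g in zip(weights, grad_scores)]
    new_bias = [b - learning_rate * g for b, g in zip(bias, grad_scores)]
    return new_weights, new_bias, loss
```

Calling the function repeatedly on the same example with fixed ability data should make the returned loss decrease, which is a quick way to check the gradient.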
<<Processing of Control Unit 119 (Step S119)>>
The control unit 119 determines whether or not a termination condition is satisfied. The termination condition is not limited, but, for example, a case where the amount of change in the parameter of the label estimation model λ between before and after step S118a is a predetermined value or smaller (the parameter of the label estimation model λ has sufficiently converged), a case where update of the parameter of the label estimation model λ is executed a predetermined number of times, or the like can be used as the termination condition. If it is determined that the termination condition is not satisfied, the procedure moves back to step S114. That is, the processing from step S114 onwards is repeated again using the updated ability data updated in step S117 as new ability data a(k, c, c′), and the updated label estimation model updated in step S118a as a new label estimation model λ.
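The two example termination conditions named above (sufficient convergence of the parameter, or a fixed number of updates) can be sketched as a single check. The function name `should_stop`, the flat parameter list, and the default values of `tol` and `max_iters` are assumptions; the source leaves the thresholds as predetermined values.

```python
def should_stop(old_params, new_params, iteration, tol=1e-4, max_iters=1000):
    """Return True if the assumed termination condition is satisfied.

    old_params, new_params: flat lists of model parameters before and after
    the update of step S118a.
    iteration: number of updates executed so far.
    """
    change = max(abs(n - o) for o, n in zip(old_params, new_params))
    return change <= tol or iteration >= max_iters
```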
<<Processing of Estimation Model Learning Unit 118 (Step S118b)>>
On the other hand, if it is determined in step S119 that the termination condition is satisfied, the estimation model learning unit 118 outputs the parameter for specifying the label estimation model λ obtained ultimately in step S118a (information for specifying the updated label estimation model λ). Alternatively, the estimation model learning unit 118 may output the parameter for specifying the label estimation model λ before being updated ultimately in step S118a (information for specifying the label estimation model λ).
<Estimation Processing>
The following will describe estimation processing according to the present embodiment.
As described above, the parameter that specifies the label estimation model λ output from the model learning device 11 is stored in the model storage unit 121 of the label estimation device 12 (
Hereinafter, a second embodiment of the present invention will be described. In the second embodiment, the functions of the updating unit of the first embodiment, which includes the ability data storage unit 113, the evaluation label estimation unit 114, the observation label estimation unit 115, the error evaluation unit 116, the ability learning unit 117, the estimation model learning unit 118, and the control unit 119, are implemented by a single neural network. Hereinafter, differences from the first embodiment are mainly described; matters that have already been described are given the same reference numerals, and descriptions thereof are simplified.
<Configuration>
As exemplified in
<Preprocessing>
As preprocessing of model learning processing performed by the model learning device 21, learning label data is stored in the learning label data storage unit 111, and learning feature data is stored in the learning feature data storage unit 112. The difference from the first embodiment is that although ability data is stored in the ability data storage unit 113 in the preprocessing of the first embodiment, this process is omitted in the preprocessing of the present embodiment. Otherwise, this preprocessing is the same as the preprocessing of the first embodiment.
<Model Learning Processing>
The following will describe model learning processing of the present embodiment with reference to
In the model learning processing of the present embodiment, as will be described below, learning processing that uses an error value as a loss function is performed, until a predetermined termination condition is satisfied, on a neural network that includes a first node N(1) (one or more nodes), a second node N(2) (one or more nodes), and a third node N(3) (one or more nodes), and the label estimation model λ or the updated label estimation model λ thus obtained is output.
Here, the first node N(1) is a normal neural network that functions as the label estimation model λ, and obtains estimation label probability values h(i, c) upon input of learning feature data x(i)=(x(i, 1), . . . , x(i, n)). The second node N(2) performs, upon input of an evaluator number k(i), conversion using an embedding layer or the like, and outputs the obtained ability data a(k(i), c, c′). The third node N(3) performs, upon input of the estimation label probability values h(i, c) and the ability data a(k(i), c, c′), conversion based on probability calculation, and outputs the obtained estimation observation label probability value y^(i, c′).
Here, n is an integer of 1 or more, and k(i)∈{1, . . . , K}, i∈{1, . . . , I}, y(i)∈{1, . . . , C}, c∈{1, . . . , C}, and c′∈{1, . . . , C} are satisfied.
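The forward pass through the three nodes can be sketched as follows. This is only an illustration under assumptions: N(1) is reduced to one linear layer with a softmax, N(2) is an embedding table that holds C×C logits per evaluator and applies a softmax to each row so that it is a probability distribution, and N(3) is the weighted sum described for the first embodiment. The function name `forward`, the table layout, and the 0-based indices are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, evaluator, weights, bias, embedding):
    """Return (h, ability, y_hat) for one example.

    x:         feature vector x(i) of length n.
    evaluator: evaluator number k(i), used as an index into `embedding`.
    embedding: one list of C*C logits per evaluator (assumed layout).
    """
    # N(1): label estimation model, here an assumed linear-softmax layer.
    h = softmax([sum(w * v for w, v in zip(row, x)) + b
                 for row, b in zip(weights, bias)])
    num_labels = len(h)

    # N(2): embedding lookup, then a row-wise softmax gives a(k(i), c, c').
    logits = embedding[evaluator]
    ability = [softmax(logits[c * num_labels:(c + 1) * num_labels])
               for c in range(num_labels)]

    # N(3): probability calculation y^(i, c') = sum_c h(i, c) a(k(i), c, c').
    y_hat = [sum(h[c] * ability[c][c_obs] for c in range(num_labels))
             for c_obs in range(num_labels)]
    return h, ability, y_hat
```

Because N(3) has no parameters of its own in this sketch, only the weights of N(1) and the embedding table of N(2) would be trained, which matches the parameter update described for step S218a.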
<<Processing of Loss Function Calculation Unit 211 (Step S211)>>
The loss function calculation unit 211 receives the estimation observation label probability value y^(i, c′) output from the third node N(3), which is obtained as a result of the learning feature data x(i) extracted from the learning feature data storage unit 112 being input to the first node N(1) and the evaluator number k(i) extracted from the learning label data storage unit 111 being input to the second node N(2), as well as the label data y(i) extracted from the learning label data storage unit 111. Using these, the loss function calculation unit 211 obtains an error value L(i) in the manner described with reference to step S116 of the first embodiment, and outputs the obtained error value L(i) as a loss function L(i).
<<Processing of Parameter Updating Unit 218 (Step S218a)>>
The parameter updating unit 218 receives an input of the loss function L(i) obtained in step S211 and performs learning processing using the loss function L(i), thereby updating parameters (for example, at least one of a weight and an activation function) of the first node N(1) and the second node N(2) of the above-described neural network. For example, the parameter updating unit 218 updates parameters of the first node N(1) and the second node N(2) so that the loss function L(i) is reduced. A back propagation method, a gradient descent method, or the like can be used for the parameter update.
<<Processing of Control Unit 219 (Step S219)>>
The control unit 219 determines whether or not a termination condition is satisfied. The termination condition is not limited, but any of, for example, the following four cases falls under the termination condition. The first case is that the amount of change from the estimation observation label probability value y^(i, c′) obtained in step S211 in the previous procedure to the estimation observation label probability value y^(i, c′) obtained in step S211 in the current procedure is a predetermined value or less (a case where the estimation observation label probability value y^(i, c′) has sufficiently converged). The second case is that the amount of change from the loss function L(i) obtained in step S211 in the previous procedure to the loss function L(i) obtained in step S211 in the current procedure is a predetermined value or less (a case where the loss function L(i) has sufficiently converged). The third case is that the amount of change from the parameter updated in step S218a in the previous procedure to the parameter updated in step S218a in the current procedure is a predetermined value or less (a case where the parameter of the label estimation model λ has sufficiently converged). The fourth case is that the parameter update in step S218a has been executed a predetermined number of times. Any of these cases can be defined as the termination condition. If it is determined that the termination condition is not satisfied, the procedure moves back to step S211, and the processing in steps S211, S218a, and S219 is executed again. On the other hand, if it is determined that the termination condition is satisfied, the parameter updating unit 218 outputs the parameter of the first node N(1) as the parameter of the label estimation model λ.
<<Processing of Parameter Updating Unit 218 (Step S218b)>>
On the other hand, if it is determined in step S219 that the termination condition is satisfied, the parameter updating unit 218 outputs the parameter of the first node N(1) ultimately updated in step S218a as the parameter for specifying the label estimation model λ (information for specifying the updated label estimation model λ). Alternatively, the parameter updating unit 218 may output the parameter of the first node N(1) before being ultimately updated in step S218a as the parameter for specifying the label estimation model λ (information for specifying the label estimation model λ).
<Estimation Processing>
The following will describe estimation processing of the present embodiment. In the first embodiment, the parameter for specifying the label estimation model λ output from the model learning device 11 is stored in the model storage unit 121 of the label estimation device 12 (
[Other Modifications and the Like]
Note that the present invention is not limited to the above-described embodiments. For example, the respective pieces of processing of the evaluation label estimation unit 114, the observation label estimation unit 115, the error evaluation unit 116, the ability learning unit 117, the estimation model learning unit 118, and the control unit 119 that have been described in the first embodiment may be executed by a single processing unit. Alternatively, the respective pieces of processing of a plurality of processing units included in the evaluation label estimation unit 114, the observation label estimation unit 115, the error evaluation unit 116, the ability learning unit 117, the estimation model learning unit 118, and the control unit 119 may be executed by a single processing unit. In this case, the implementing method is not limited to a neural network. For example, in the second embodiment, the functions of the updating unit that includes the ability data storage unit 113, the evaluation label estimation unit 114, the observation label estimation unit 115, the error evaluation unit 116, the ability learning unit 117, the estimation model learning unit 118, and the control unit 119 are implemented by a single neural network, but they may be implemented together by another method.
The above-described various types of processing may be not only executed in time-series manner in accordance with the description, but also executed in parallel or individually according to the throughput of a device that executes the processing or according to the need. Moreover, it is needless to say that changes may be suitably made without departing from the spirit of the present invention.
The above-described devices are configured by, for example, a general-purpose computer or a dedicated computer, which includes a processor (hardware processor) such as a CPU (central processing unit) and a memory such as a RAM (random-access memory) or a ROM (read-only memory), executing a predetermined program. This computer may be provided with one processor and one memory, or may be provided with a plurality of processors and a plurality of memories. This program may be installed in the computer or may be recorded in advance in the ROM or the like. Also, some or all of the processing units may be configured using electronic circuitry that realizes the processing functions without using the program, instead of electronic circuitry such as a CPU that realizes the functional configuration as a result of the program being read. The electronic circuitry constituting one device may include a plurality of CPUs.
If the above-described configuration is realized by a computer, the processing content of the functions that the devices should have is described in a program. By executing this program by the computer, the processing functions are realized on the computer. The program in which the processing content is described can be recorded on a computer-readable recording medium. Examples of the computer-readable recording medium include a non-transitory recording medium. Examples of such a recording medium include a magnetic recording device, an optical disk, a magnetooptical medium, and a semiconductor memory.
This program is distributed by the sale, transfer, lending, or the like, of a portable recording medium such as a DVD or a CD-ROM on which the program is recorded. Furthermore, the program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to another computer via a network.
First, the computer that executes such a program once stores the program recorded in a portable recording medium or transferred from the server computer for example, in its own storage device. During the execution of the processing, this computer reads the program stored in its own storage device and executes processing in accordance with the read program. As another aspect of execution of the program, the computer may directly read the program from the portable recording medium and may execute the processing in accordance with this program, and the computer may also execute, upon receiving programs transferred from the server computer, processing sequentially in accordance with the received programs. A configuration is also possible in which the above-described processing is executed not by transferring the programs from the server computer to this computer, but using a so-called ASP (Application Service Provider) service, which realizes processing functions only based on an execution instruction and acquisition of a result.
Instead of the processing functions of the present devices being realized by a predetermined program being executed on a computer, at least some of the processing functions may be realized by hardware.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2019-040240 | Mar 2019 | JP | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/JP2020/007287 | 2/25/2020 | WO | 00 |