The present invention relates to a learning data generation device, a learning data generation method, and a program, for generating learning data.
Machine learning techniques may be broadly classified into supervised learning, in which learning is performed on learning data to which ground truth labels have been added; unsupervised learning, in which learning is performed without adding labels to the learning data; and reinforcement learning, in which a computer is induced to autonomously derive an optimal method by rewarding good results. For example, a support vector machine (SVM) that performs class classification is known as an example of supervised learning (see NPL 1).
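By way of reference only, the following is a minimal sketch of supervised learning with an SVM, written in Python using scikit-learn; the feature vectors and labels are hypothetical and serve only to illustrate learning with ground truth labels.

```python
# Minimal supervised-learning sketch with an SVM (scikit-learn).
# The feature vectors and labels below are hypothetical.
from sklearn.svm import SVC

X_train = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]]   # learning data
y_train = ["opening", "opening", "closing", "closing"]        # ground truth labels

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)           # learning is performed with labeled data
print(clf.predict([[0.8, 0.2]]))    # expected: ['closing']
```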
By hierarchically combining multiple classifiers, it is also possible to perform a more advanced classification. However, in doing so, because it is necessary to build and/or update a classifier for each individual classification, a problem arises in that correction of classification results and generation of learning data are time and effort intensive.
An objective of the present invention, made in view of the above background, is to provide a learning data generation device, a learning data generation method, and a program, that can efficiently generate learning data necessary for the learning of models when performing classification entailing a hierarchical combination of classifiers.
In order to solve the abovementioned problem, a learning data generation device of the present invention is a learning data generation device for generating learning data in a system that performs classification of an input data group using a plurality of classifiers that are combined hierarchically, and comprises: a learning scope determination unit for determining input data to be a learning scope, on the basis of classification results from a multi-class classification of the input data group using the plurality of classifiers; and a training data generation unit for generating training data that is the input data determined to be the learning scope to which the classification results of the input data are appended as labels.
Further, in order to solve the abovementioned problem, a learning data generation method of the present invention is a learning data generation method for generating learning data in a system that performs classification of an input data group using a plurality of classifiers that are combined hierarchically, and comprises: determining input data to be a learning scope, on the basis of classification results from a multi-class classification of the input data group using the plurality of classifiers; and generating training data that is the input data determined to be the learning scope to which the classification results of the input data are appended as labels.
Further, in order to solve the abovementioned problems, a program pertaining to the present invention causes a computer to function as the abovementioned learning data generation device.
According to the present invention, it is possible to efficiently generate learning data necessary for the learning of models when performing a classification that entails a hierarchical combination of classifiers.
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
Firstly, a system for classifying input data groups using multiple classifiers that are hierarchically combined is explained.
The primary classifier uses a dialogue scene prediction model to predict the dialogue scene in a contact center; in the present example, the dialogue scene is classified as one of "opening", "inquiry understanding", "contract confirmation", "response", and "closing".
Inquiry understanding is a scene in which the inquiry content of the customer is acquired, such as: “I'm enrolled in your auto insurance, and I have an inquiry regarding the auto insurance.”, “So, you have an inquiry regarding the auto insurance policy you are enrolled in?”, “Umm, the other day, my son got a driving license. I want to change my auto insurance policy so that my son's driving will be covered by the policy; can you do this?”, “So, you want to add your son who has newly obtained a driving license to your automobile insurance?”.
Contract confirmation is a scene in which contract confirmation is performed, such as: “I will check your enrollment status, please state the full name of the party to the contract.”, “The party to the contract is Ichiro Suzuki.”, “Ichiro Suzuki. For identity confirmation, please state the registered address and phone number.”, “The address is ______ in Tokyo, and the phone number is 090-1234-5678.”, “Thank you. Identity has been confirmed.”.
Response is a scene in which a response to an inquiry is given, such as: “I have checked this matter; your present policy does not cover family members under the age of 35.”, “What should I do to add my son to the insurance?”, and “This can be modified on this phone call. The monthly insurance fee would increase by JPY 4,000, to a total of JPY 8,320; do you accept?”.
Closing is a scene in which dialogue termination confirmation is performed, such as “Thank you for calling us today.”.
The secondary classifier further predicts, with respect to the dialogue for which the dialogue scene was predicted by the primary classifier, the utterance type in an utterance-wise manner. The secondary classifier may use multiple models to predict multiple kinds of utterance types. In the present embodiment, with respect to a dialogue for which the dialogue scene is predicted to be inquiry understanding, a topic utterance prediction model is used to predict whether, in an utterance-wise manner, utterances are topic utterances; a regard utterance prediction model is used to predict whether, in an utterance-wise manner, utterances are regard utterances; and a regard confirmation utterance prediction model is used to predict whether, in an utterance-wise manner, utterances are regard confirmation utterances. Further, with respect to a dialogue for which the dialogue scene is predicted to be contract confirmation, a contract confirmation utterance prediction model is used to predict whether, in an utterance-wise manner, utterances are contract confirmation utterances; and a contract responsive utterance prediction model is used to predict whether, in an utterance-wise manner, utterances are contract responsive utterances.
A topic utterance is an utterance by the customer that is intended to convey the topic of the inquiry. A regard utterance is an utterance by the customer that is intended to convey the regard of the inquiry. A regard confirmation utterance is an utterance by the service person that is intended to confirm the inquiry regard (e.g. a readback of the inquiry regard). A contract confirmation utterance is an utterance by the service person that is intended to confirm the details of the contract. A contract responsive utterance is an utterance by the customer that is intended to provide a response to the service person with respect to the contract content.
The tertiary classifier predicts or extracts, on the basis of the classification results of the primary and secondary classifiers, utterance focus point information. Specifically, from utterances predicted by the secondary classifier to be topic utterances, the focus point information of the topic utterances is predicted using the topic prediction model. Further, from utterances predicted by the secondary classifier to be regard utterances, the entirety of the text is extracted as the focus point information of the regard utterances, and from utterances predicted by the secondary classifier to be regard confirmation utterances, the entirety of the text is extracted as the focus point information of the regard confirmation utterances. Further, from utterances predicted by the secondary classifier to be contract confirmation utterances and utterances predicted to be contract responsive utterances, the name of the party to the contract, the address of the party to the contract, and the telephone number of the party to the contract are extracted. The extraction of the name, address, and telephone number of the party to the contract may be performed using models, or may be performed in accordance with pre-stipulated rules.
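By way of illustration only, the three-level combination described above may be sketched as follows; the model objects (scene_model, type_models, topic_model) and their predict interfaces are assumptions made for the sketch and do not represent a definitive implementation.

```python
# Sketch of the three-level cascade described above. The model objects are
# hypothetical stand-ins for the trained prediction models; the .predict()
# interface is assumed for illustration.

def classify_dialogue(utterances, scene_model, type_models, topic_model):
    results = []
    for utt in utterances:
        # First level: dialogue scene prediction.
        scene = scene_model.predict(utt)      # e.g. "inquiry understanding"
        entry = {"utterance": utt, "scene": scene, "types": {}, "focus": {}}
        # Second level: utterance type prediction, gated by the scene.
        for utype, model in type_models.get(scene, {}).items():
            if model.predict(utt):            # binary true/false result
                entry["types"][utype] = True
                # Third level: utterance focus point information.
                if utype == "topic":
                    entry["focus"]["topic"] = topic_model.predict(utt)
                else:
                    entry["focus"][utype] = utt   # whole text is extracted
        results.append(entry)
    return results
```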
The learning data generation device 1 is a device that generates learning data for models in a system for classifying input data groups using multiple classifiers that are hierarchically combined.
The classification dependency relation store 11 stores, in relation to each classification, a classification dependency relation table that defines the order in which the classifiers are applied (i.e. the classifier combinations). The classification dependency relation table defines the classifiers to be used at each level and their conditional values.
The multi-class classifier 12 reads out the classification dependency relation table from the classification dependency relation store 11 and, in accordance with the classification dependency relation table, performs a multi-class classification with respect to the input data group, and generates and saves a classification results table representing the classification results. Here, any known method such as an SVM, a deep neural network (DNN), and the like may be applied as the classification method. With regard to DNNs, models appropriate for dealing with time-series data, such as recurrent neural networks (RNN), long short-term memory (LSTM) networks, and the like, may be utilized. Further, classification may be performed in accordance with pre-stipulated rules. The rules may include exact matching on a string or word, prefix (forward) matching, suffix (backward) matching, partial matching, and, besides these, matching based on regular expressions.
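By way of illustration only, a classification dependency relation table may be held as a simple list of levels, and the rule-based matching variants listed above may be implemented as follows; all names, rules, and values here are hypothetical.

```python
import re

# Hypothetical shape of a classification dependency relation table: each
# row names the classifier used at a level and the conditional value that
# must hold for the next level to be executed.
DEPENDENCY_TABLE = [
    {"level": 1, "classifier": "dialogue_scene", "condition": "inquiry understanding"},
    {"level": 2, "classifier": "topic_utterance", "condition": True},
    {"level": 3, "classifier": "topic", "condition": None},  # leaf level
]

def rule_match(text, pattern, mode):
    """Rule-based classification by matching; `mode` selects the variant."""
    if mode == "exact":
        return text == pattern
    if mode == "prefix":            # forward matching
        return text.startswith(pattern)
    if mode == "suffix":            # backward matching
        return text.endswith(pattern)
    if mode == "partial":
        return pattern in text
    if mode == "regex":             # matching based on regular expressions
        return re.search(pattern, text) is not None
    raise ValueError(f"unknown mode: {mode}")
```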
The learning form generation unit 13 generates a learning form having classification results based on the classification results table generated by the multi-class classifier 12 and a correction interface for rectifying said classification results, and causes the learning form to be displayed on the display 2. The correction interface is an object for rectifying the classification results and is associated with the classification level and the targeted point.
Specifically, the learning form generation unit 13 generates a learning form which shows, in a categorized manner for the respective classification results, the classification results from the first level (top level) classifier, and shows, within the region for displaying the classification results by the first level classifier, classification results by the classifiers of the respective lower levels.
Further, the learning form generation unit 13 generates a correction interface including buttons for adding classification results, buttons for deleting classification results, and regions for inputting corrected classification results. Moreover, in some embodiments, modification may be possible by clicking the classification results display region, in which case the classification results display region and the corrected classification results input region become one and the same.
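By way of illustration only, one possible in-memory representation of such a learning form is sketched below; the field names are assumptions and not the actual structure used by the learning form generation unit 13.

```python
# Sketch of a learning form entry: classification results are grouped under
# the first-level result, with lower-level results nested inside, and each
# correction-interface object is tied to a classification level and a
# target point. All field names are illustrative assumptions.
learning_form = {
    "display_regions": [
        {
            "level1_result": "inquiry understanding",
            "level2_results": {"topic": True, "regard": True},
            "level3_results": {"topic": "auto insurance"},
            "controls": [
                {"action": "add", "level": 3, "target": "topic"},
                {"action": "delete", "level": 3, "target": "topic"},
                {"action": "edit", "level": 3, "target": "topic"},
            ],
        },
    ],
}
```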
Specifically, the primary display region 21 displays only “opening” which is the classification result of the primary classifier, and the primary display region 25 displays only “closing” which is the classification result of the primary classifier.
The primary display region 22 displays "inquiry understanding", which is the classification result of the primary classifier. When the classification dependency relation table is followed, in a case in which the classification result of the primary classifier is "inquiry understanding", the processing proceeds to the second level. Then, utterance type prediction is performed at the second level and, in a case in which the result of this is "true", the processing proceeds to the third level. Accordingly, the primary display region 22 displays, in the secondary display region 221, "topic", "regard", and "regard confirmation", which indicate that the classification results at the secondary classifier are "true". Further, the classification results relating to topic utterances and the extraction results relating to the utterance focus point information of regard utterances and regard confirmation utterances are displayed in the tertiary display region 222. Moreover, as the extraction results relating to the utterance focus point information of regard utterances and regard confirmation utterances are often similar, only one of them may be displayed.
Similarly, the primary display region 23 displays “contract confirmation” which is the classification result of the primary classifier, and “name”, “address”, and “contact details”, which indicate that the classification result at the secondary classifier is “true”. Further, with respect to “name”, “address”, and “contact details”, extraction results pertaining to utterance focus point information are displayed in the tertiary display region 232.
Further, as part of the correction interface, in the primary display regions 21 to 25, “add focus point” buttons for adding utterance focus point information are displayed, and in the primary display regions 22 to 24, “X” buttons, shown by X symbols, for deleting utterance focus point information are displayed.
With respect to the third level topic prediction results shown in the tertiary display region 222, in a case in which the prediction is from multiple candidates, the user can select from a pulldown to perform a correction and save it. Further, with respect to the third level utterance focus point information extraction results shown in the tertiary display regions 232 and 242, the user can rectify and save the text. Unnecessary utterance focus point information can be deleted by depressing the "X" button.
The corrected point record unit 14 generates correction information that records the corrected point and the corrected classification results in a case in which the learning form generated by the learning form generation unit 13 has been corrected by the user via the correction interface (i.e. in a case in which the classification results have been corrected). Moreover, the user can perform corrections on classification results in the midst of the multiple levels, via the buttons associated with the classification levels. Correction includes modification, addition, and deletion. In a case in which the classification result of the top level classifier has been corrected, the corrected point record unit 14 changes said classification result (in the present embodiment, the dialogue scene corresponding to the dialogue content) and generates correction information. The training data can entail only the correction information of the top level classifier, the classification results from the start of the service up to the corrected point, or all classification results including the correction information. For example, in the present embodiment, in a case in which the classification result of the dialogue scene prediction of the primary classifier was corrected from "inquiry understanding" to "response", the classification result of the primary classifier is changed from "inquiry understanding" to "response". The learning scope can be set to at least each utterance up to the utterance for which the classification result was corrected, together with the time-series data of those classification results, and may be set to the time-series data of the classification results of all successive utterances including the utterance for which the classification result was corrected.
Further, in a case in which a classification result of a classifier at a particular level is corrected, the corrected point record unit 14 also corrects the classification results of the classifiers at levels higher than said particular level, in conformance with the correction. In a case in which there is no need to rectify the classification result of the top level classifier, that result can be left as it is. For example, in the present embodiment, even if the classification result of the topic utterance prediction by the secondary classifier was left at "true" and not subjected to correction, in a case in which the classification result of the topic prediction by the tertiary classifier was deleted, the classification result of the secondary classifier is corrected from "true" to "false", because the deletion implies that the classification result of the secondary classifier was incorrect. It suffices to go back to the binary classification at the second level; it is not necessary to go back to the first level.
Further, the corrected point record unit 14 may, in a case in which a classification result of a classifier at a particular level is corrected, also exclude from the training data, in conformance with the correction, the classification results of classifiers at levels lower than said particular level. For example, in the present embodiment, in a case in which the classification result of the dialogue scene prediction by the primary classifier is corrected from "inquiry understanding" to "response", and the classification result of the regard utterance prediction by the secondary classifier had been predicted to be "true", then that "true" is excluded from the training data. Moreover, the corrected point record unit 14 checks for the existence of corrections from the higher levels, and only if there are none does it check for the existence of corrections at the lower levels. Thus, hypothetically, even if the user, after having corrected the topic prediction classification result of the tertiary classifier, went on to rectify the dialogue scene prediction classification result of the primary classifier, the topic prediction correction of the tertiary classifier would, in a case in which the corrected dialogue scene prediction of the primary classifier is not "inquiry understanding", be deleted from the training data, because the corrected point record unit 14 checks for corrections from the first level.
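By way of illustration only, the correction-handling rules described above may be sketched as follows, checking corrections from the first level downward; the data layout and names are assumptions, continuing the earlier sketch.

```python
# Sketch of corrected-point handling. A deletion at the third level implies
# the second-level binary result was wrong, and a correction at the first
# level excludes dependent lower-level results from the training data.
# Corrections are checked from the first level downward, as described above.

def apply_corrections(entry, corrections):
    # corrections: list of {"level": int, "field": str, "value": ...};
    # value None means the classification result was deleted.
    for corr in sorted(corrections, key=lambda c: c["level"]):
        level, field, value = corr["level"], corr["field"], corr["value"]
        if level == 1:
            # Correcting the dialogue scene invalidates the lower-level
            # results that depended on the old scene.
            entry["scene"] = value
            entry["types"].clear()
            entry["focus"].clear()
        elif level == 2:
            entry["types"][field] = value
        elif level == 3 and value is None:
            # Deletion at the third level: the second-level result is
            # corrected from "true" to "false".
            entry["types"][field] = False
            entry["focus"].pop(field, None)
        elif level == 3:
            entry["focus"][field] = value
    return entry
```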
Moreover, in the case of topic addition, the user can, by selecting (e.g. by clicking) separately displayed utterance data, establish an association with the utterances corresponding to the topic. For example, suppose that, in the interest of differentiation from other utterance data, a prescribed background color is to be applied to utterance data predicted by the topic utterance prediction model to be a topic utterance. If the prediction of the topic utterance prediction model is erroneous, the background color needed to make the service person recognize that the utterance data concerns a topic utterance is not applied. In this case, by clicking on the utterance data recognized as being a topic utterance, the prescribed background color is applied. Further, if the prescribed background color has been applied to utterance data on the basis of the operations of the service person, the utterance type may be added in correspondence with that utterance data.
With respect to segment 4, the user can add a "topic"; with respect to segment 5, the user can modify the "topic".
The learning scope determination unit 15 reflects the correction information generated by the corrected point record unit 14 in the classification results table generated by the multi-class classifier 12. Then, the learning scope determination unit 15 determines the learning scope on the basis of the classification results. The learning scope determination unit 15 may also include, within the learning scope, first level classification results for which the user has not performed correction. For example, when the user depresses a confirmation button provided in the learning form, the corresponding classification results may be included in the learning scope even in a case in which there is no correction by the user. The learning scope may be configured for each level by providing a confirmation button for the entirety of the dialogue, a confirmation button for each dialogue scene of the first level, or a confirmation button for confirming the subordinate levels, i.e. the second and third levels.
For example, the learning scope determination unit 15 determines the learning scope to be one or more consecutive input data that include the input data corresponding to the corrected classification results and that share the same first level (top level) classification result. That is, the learning scope (the training data scope) is determined to be a consecutive range that includes the corrected points and shares the same first level classification result, and within the learning scope, not only corrected information but also non-corrected information is set as a target of the training data. Even if there is a range in which the same classification results are consecutive at the first level, in a case in which no points corrected by the user are included and the abovementioned confirmation button is not provided, because it is not possible to determine whether the user has confirmed the classification results of said range, that range is not included in the learning scope. On the other hand, in a case in which the user has performed a correction, because it can be considered that the user has confirmed the range in which the same first level classification results are consecutive, that range is set as a target for the training data.
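By way of illustration only, this scope rule may be sketched as follows: the learning scope is the maximal run of consecutive inputs that contains a corrected point and shares the same first level classification result (the names are assumptions).

```python
# Sketch of learning-scope determination: given per-utterance first-level
# results and the index of a corrected utterance, return the maximal
# consecutive range sharing that first-level result.

def learning_scope(level1_results, corrected_index):
    scene = level1_results[corrected_index]
    start = corrected_index
    while start > 0 and level1_results[start - 1] == scene:
        start -= 1
    end = corrected_index
    while end + 1 < len(level1_results) and level1_results[end + 1] == scene:
        end += 1
    return range(start, end + 1)

# e.g. learning_scope(["opening", "inquiry", "inquiry", "response"], 1)
# -> range(1, 3): both consecutive "inquiry" utterances enter the scope.
```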
The training data generation unit 16 generates, with respect to the learning scope determined by the learning scope determination unit 15, training data for the respective classification items, segments, and labels, by associating the correction information with the multi-class classification results and updating them. In a case in which a third level classification result has been deleted, because the ground truth is unclear, the training data generation unit 16 excludes the corresponding classification item from the training data.
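By way of illustration only, the generation of training data within the determined learning scope may be sketched as follows, reusing the apply_corrections sketch above; the data layout is an assumption.

```python
# Sketch of training-data generation: within the determined scope, apply
# the correction information to the multi-class classification results and
# append the results to each input as its labels. A classification item
# whose third-level result was deleted is excluded, because its ground
# truth is unclear.

def generate_training_data(entries, scope, corrections_by_index):
    training_data = []
    for i in scope:
        entry = apply_corrections(entries[i], corrections_by_index.get(i, []))
        labels = {"scene": entry["scene"]}
        for utype, result in entry["types"].items():
            labels[utype] = result
            # Deleted third-level results were removed from entry["focus"]
            # by apply_corrections and are therefore not appended as labels.
            if result and utype in entry["focus"]:
                labels[utype + "_focus"] = entry["focus"][utype]
        training_data.append({"input": entry["utterance"], "labels": labels})
    return training_data
```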
Next, the learning data generation method pertaining to the learning data generation device 1 will be explained.
The learning data generation device 1, using the multi-class classifier 12, classifies the input data group (S101). Moreover, although the abovementioned embodiment has been explained for a case having a hierarchy of three levels, cases involving more or fewer levels may also be conceived. That is, the present invention sets no limitation on the number of levels. For example, in a case in which classification is performed at two levels, dialogue scene prediction would be performed at the first level, and the second level regard utterance prediction would only be performed in a case in which the dialogue scene prediction result is "inquiry understanding". Further, in a case in which classification is performed at four levels, the result of the third level topic prediction would be subclassified at the fourth level. For example, in a case in which the topic is predicted to be "auto insurance" at the third level, the fourth level would entail classification into any of "new contract", "modification", and "cancellation".
Next, the learning data generation device 1 generates, using the learning form generation unit 13, a learning form (S102), and causes the classification results to be displayed on the display 2 (S103).
When the classification results displayed on the display 2 are corrected by the user (S104—Yes), the learning data generation device 1 records the corrected point using the corrected point record unit 14 (S105). Then, the learning scope is determined using the learning scope determination unit 15 (S106), and training data is generated using the training data generation unit 16 (S107). In a case in which the classification results displayed on the display 2 are not corrected by the user (S104—No), step S105 is not performed and the processing of steps S106 and S107 is performed.
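By way of illustration only, steps S101 to S107 may be gathered into a single flow as follows; the unit objects and their method names are hypothetical stand-ins for the units described above.

```python
# Sketch of the overall flow (S101 to S107); the unit objects and method
# names are assumptions, not the actual interfaces of the device.
def generate_learning_data(device, input_data_group, display):
    table = device.multi_class_classifier.classify(input_data_group)  # S101
    form = device.learning_form_generator.generate(table)             # S102
    display.show(form)                                                # S103
    corrections = display.collect_corrections()                       # S104
    if corrections:                                                   # S104: Yes
        device.corrected_point_recorder.record(corrections)           # S105
    scope = device.learning_scope_determiner.determine(table, corrections)  # S106
    return device.training_data_generator.generate(table, scope)     # S107
```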
Moreover, a computer can be used to realize the functions of the abovementioned learning data generation device 1; this is achieved by causing a CPU of the computer to read out and execute a program, wherein the program describes the procedures for realizing the respective functions of the learning data generation device 1 and is stored in a database of the computer.
Further, the program can be recorded on a computer readable medium. By using the computer readable medium, installation on a computer is possible. Here, the computer readable medium on which the program is recorded can be a non-transitory recording medium. Though the non-transitory recording medium is not particularly limited, it can be a recording medium such as a CD-ROM and/or a DVD-ROM, for example.
As explained above, according to the present invention, in a case in which a classification result at the nth level is corrected, the correction can be automatically reflected in the classification results of the classifiers at levels above the nth level by following the dependency relations of the classification/prediction across the multiple levels. Thus, training data for all levels can be efficiently generated. Further, because not only points corrected by the user but also points not corrected by the user can be set as training data that has been confirmed by the user, a large amount of training data can be prepared. Thus, it is possible to efficiently generate learning data for each of the classifiers.
Further, according to the present invention, by displaying a learning form having the classification results for multiple levels and a correction interface for rectifying the classification results, the user can readily perform correction of the classification results, and operability can be improved.
Although the above embodiments have been described as typical examples, it will be evident to those skilled in the art that many modifications and substitutions are possible within the spirit and scope of the present invention. Therefore, the present invention should not be construed as being limited by the above embodiments, and various changes, modifications, and the like can be made without departing from the claims. For example, it is possible to combine a plurality of constituent blocks described in the configuration diagram of the embodiment into one, or to divide one constituent block.
Priority application: JP 2018-152893, filed August 2018 (national).
Filing document: PCT/JP2019/031934, filed August 14, 2019 (WO).