The present invention relates to a technique for evaluating dialogues in a group including two or more dialogue participants (referred to as group dialogues).
As techniques for evaluating group dialogues, there are techniques for estimating leadership and the degree of contribution from the number of utterances during a dialogue, the frequency of words in spoken sentences, the number of nods observed in camera images, and the like. There are also techniques that use questionnaires asking participants directly how much they contributed and whether they were satisfied, and techniques in which a third party evaluates the result of the dialogue. There is also a technique for evaluating system-generated sentences when evaluating dialogues conducted by a dialogue system.
In the related art, dialogues between participants are often evaluated using data obtained by digitizing the actual state of the dialogue, such as the number of utterances of each participant and features of the words used. It is also possible to acquire or estimate the personality and the sense of values of each participant, such as leadership, and use them in the evaluation of the dialogue.
However, the important point of the evaluation is different between a case of dialogue participants having a dialogue for the first time and a case of a continuing dialogue over a long period of time.
For example, when participants who are meeting for the first time hold only a single discussion, they may avoid revealing their personality or sense of values, or may place importance on brightening the mood rather than trying to understand the personality or sense of values of the other participants. When there is a clear goal, such as arriving at a correct answer through the dialogue, participants are likely to evaluate the dialogue as a whole and their own behavior mainly by whether the goal was reached.
On the other hand, when the same members hold discussions continuously over a long period, as in company meetings, brightening the mood is not considered important; what matters is often whether more advanced opinions were expressed or whether consensus was reached. Even when there is a clear goal as in the example above, it may be sufficient to reach a conclusion over the course of continued discussions, so the correct answer need not be reached within a single dialogue.
Thus, even among task-oriented dialogues of the same type, what matters in evaluating a dialogue depends greatly on the relationships between the participants. Digitizing these relationships and using them in dialogue evaluation therefore enables a more accurate evaluation. However, the related art (for example, PTL 1, NPL 1, and NPL 2) does not take the relationships between dialogue participants into consideration.
The present invention has been made in view of the above points, and an object of the present invention is to provide a technique capable of evaluating a dialogue performed in a group while taking relationships between dialogue participants into consideration.
According to the disclosed technique, there is provided a dialogue evaluation device that evaluates a dialogue performed in a group of two or more participants, the dialogue evaluation device including:
According to the disclosed technique, it is possible to evaluate a dialogue performed in a group while taking relationships between dialogue participants into consideration.
An embodiment of the present invention (present embodiment) will be described below with reference to the drawings. The embodiment described below is merely an example, and embodiments to which the present invention is applied are not limited to the following embodiment.
First, an overview of the present embodiment will be described. In the present embodiment, a dialogue evaluation device 100, described later, predicts a score that digitizes an evaluation of a group dialogue among two or more dialogue participants (also called members or participants), reflecting, for example, each participant's degree of achievement, degree of satisfaction, and degree of contribution in the dialogue.
When performing an evaluation, the dialogue evaluation device 100 uses dialogue data, in which the dialogue to be evaluated is recorded, and dialogue experience data, which indicates past dialogue experiences between the participants. The dialogue data may also include individuality data indicating the personalities and the sense of values of the participants constituting the dialogue group.
The dialogue data is a record of dialogues in chronological order, and is, for example, voice data collected by a microphone, text data transcribed from each member's utterance, video data of the movement of each member, vital data such as a heartbeat of each member recorded using a device such as a smart watch, and the like. As described above, the dialogue data may include individuality data indicating the individuality of each participant participating in the dialogue, and may include, for example, the results of questionnaires on personality, data on individuality predicted from past data using existing technology, and data on attributes such as age, work history, and position.
The dialogue evaluation device 100 obtains the degree of activity of the dialogue from the dialogue data. The degree of activity is a value obtained by converting how lively the dialogue is, the uplifting feeling of the members, the individuality of the dialogue participants themselves, and the like into comparable numerical values. For example, as the degree of activity, in the case of voice data, the magnitude or change of voice of each participant can be used, in the case of text data, the number of utterances or the number of words uttered by each participant can be used, in the case of video data, the magnitude of gestures or nods can be used, and in the case of vital data, the speed or change of heartbeat can be used. Note that the degree of activity may be called a “feature value.”
The individuality data that can be included in the dialogue data is data expressed for each participant of the dialogue, and is data obtained by, for example, selecting an answer that suits oneself from options such as “I agree,” “I agree a little,” “I don't agree much,” or “I don't agree at all,” in response to the question “I like to talk about myself in front of others,” selecting words that match one's personality and sense of values from words such as “taciturn,” “talkative,” and “extrovert,” or evaluating each item on a nine-stage scale. Alternatively, data evaluated for each participant by others may be used as the individuality data, or a score of a result of predicting the individuality by using past dialogue data or the like may be used.
When the results of questionnaires on personality and attributes are used as the individuality data, answers given as stepwise numerical values may be used as they are, or their totals may be used. For Yes/No answers, numerical values such as 1 for Yes and 0 for No are used. For answers chosen from non-continuous items, such as occupation and preference, one feature value can be prepared per item, and the answer can be converted into a vector with 1 for the selected item and 0 for the others. For free descriptions, the words in the description can be used as feature values as they are, or the words can be classified into categories according to the content, one feature value can be prepared per category, and the result can be converted into a vector with 1 for the selected category and 0 for the others and used in the activity score. Questions with different answer formats, such as Yes/No questions, questions answered on a seven-stage scale, and free-description questions, may also be mixed.
The dialogue experience data digitizes the participants' dialogue experience so far; for example, the number of times the participants have had dialogues with each other and the total time they have spent in dialogues together are used. Alternatively, a period during which the participants are expected to have had dialogue experience, such as a period in which they belonged to the same project at work or were in the same class at school, may be used instead. One or more types of data may be used as the dialogue experience data. When there are three or more participants, the participants may be grouped in any way: for example, dialogues that all participants experienced together may be used, or the participants may be divided into pairs and the dialogues each pair experienced together may be used. As another example, the dialogues experienced together by every combination of participants, for every group size, may be extracted and used as the dialogue experience data.
The dialogue evaluation device 100 calculates a dialogue evaluation score by combining the activity score obtained from the dialogue data with the dialogue experience score obtained from the dialogue experience data. Here, a dialogue evaluation score is given to each participant and expresses that participant's degree of achievement, degree of satisfaction, or degree of contribution in the dialogue. The dialogue evaluation score may be a vector of all three of these degrees, a single one of them, or any combination of them. The score of the entire group can also be represented by calculating a statistic, such as the average, of the scores of the participants.
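The encoding rules above can be sketched in Python as follows. This is a minimal illustration, not the device's actual implementation: it assumes a hypothetical item format of (answer type, answer, options), and the names `encode_item` and `encode_questionnaire` are illustrative. Free-description answers, which would first need a category classifier, are not covered.

```python
from typing import List, Optional, Sequence, Tuple, Union

Answer = Union[int, float, str]
# Hypothetical item format: (answer_type, answer, options); options is only
# used for "choice" items such as occupation or preference.
Item = Tuple[str, Answer, Optional[Sequence[str]]]


def encode_item(answer_type: str, answer: Answer,
                options: Optional[Sequence[str]] = None) -> List[float]:
    if answer_type == "scale":
        # Stepwise numerical answers (e.g. a seven-stage scale) are used as-is.
        return [float(answer)]
    if answer_type == "yes_no":
        # Yes -> 1, No -> 0.
        return [1.0 if str(answer).strip().lower() == "yes" else 0.0]
    if answer_type == "choice":
        # Non-continuous items: one feature per option, 1 for the selected one.
        if options is None:
            raise ValueError("choice items need a list of options")
        return [1.0 if answer == option else 0.0 for option in options]
    raise ValueError(f"unknown answer type: {answer_type}")


def encode_questionnaire(items: Sequence[Item]) -> List[float]:
    """Concatenate the encodings of mixed-format questions into one vector."""
    vector: List[float] = []
    for answer_type, answer, options in items:
        vector.extend(encode_item(answer_type, answer, options))
    return vector


answers: List[Item] = [
    ("scale", 6, None),
    ("yes_no", "Yes", None),
    ("choice", "engineer", ["sales", "engineer", "manager"]),
]
print(encode_questionnaire(answers))  # [6.0, 1.0, 0.0, 1.0, 0.0]
```

Concatenating the per-question encodings in a fixed order keeps the vector length constant across participants, which lets mixed answer formats coexist in one feature vector.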
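As an illustration of substituting a shared period for dialogue counts, the following sketch computes the number of days two people belonged to the same project. The data layout (a mapping from a hypothetical project name to inclusive date ranges) and the function name `shared_project_days` are assumptions, and it assumes that one person's periods within a single project do not overlap each other.

```python
from datetime import date
from typing import Dict, List, Tuple

Period = Tuple[date, date]  # (start, end), both inclusive


def shared_project_days(a: Dict[str, List[Period]],
                        b: Dict[str, List[Period]]) -> int:
    """Total number of days on which both people belonged to the same project."""
    total = 0
    for project in a.keys() & b.keys():  # only projects both belonged to
        for start_a, end_a in a[project]:
            for start_b, end_b in b[project]:
                start = max(start_a, start_b)
                end = min(end_a, end_b)
                if start <= end:  # the two periods actually overlap
                    total += (end - start).days + 1
    return total


alice = {"P1": [(date(2021, 1, 1), date(2021, 3, 31))]}
bob = {
    "P1": [(date(2021, 3, 1), date(2021, 6, 30))],
    "P2": [(date(2021, 1, 1), date(2021, 12, 31))],
}
print(shared_project_days(alice, bob))  # 31 (March 2021 in project P1)
```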
Hereinafter, the configuration and operation of the dialogue evaluation device 100 will be described in detail as an embodiment.
The dialogue log DB 108 may be provided outside the dialogue evaluation device 100 and connected to the dialogue evaluation device 100 via a network. Also, the feature value weight calculation unit 104 may include the dialogue experience data extraction unit 102 and the dialogue experience score calculation unit 103.
Next, an operation example of the dialogue evaluation device 100 will be described. In the operation example described below, the number of participants in the dialogue to be evaluated is k, and the k participants are defined as h1, . . . , hk. The dialogue evaluation scores of the participants are defined as s1, . . . , sk, and are calculated by the dialogue evaluation device 100.
Dialogue data and a participant list (h1, . . . , hk) are input from the input unit 106. In the present embodiment, the dialogue data used as input consists of transcripts of the utterances made by each participant and the personality traits of each participant obtained from a questionnaire survey. As described above, the dialogue data is not limited to these: data indicating the state of the dialogue, such as spoken voice, moving images, and biological sensor information, may be used, as may data indicating attributes, such as age, work history, and position, or individuality, such as questionnaire results on the sense of values, experience, and preferences. The number of types of data is also not limited.
The procedure of the flowchart of
In S101, the activity score calculation unit 101 receives dialogue data as an input, extracts a feature value from the dialogue data, and outputs the feature value. Here, the total utterance time (u1, . . . , uk) of each participant, the number of back channels (b1, . . . , bk) of each participant, the number of important word utterances (w1, . . . , wk) of each participant, and the personality trait score (p1, . . . , pk) are extracted as feature values.
Here, a back channel is a short response such as an acknowledgment; back channels can be extracted, for example, by defining them as utterances that contain no verbs, nouns, adjectives, numerals, or the like. The number of important word utterances is the number of utterances of words included in a word list determined in advance to be important. For the personality trait score, the questionnaire results are totaled and converted into numerical values by using a classification scheme such as the Big Five personality traits or another existing scale.
Each feature value of each participant may be either a scalar or a vector. For example, the number of important word utterances may be a vector wi=(w′i1, . . . , w′ic) of the numbers of appearances of each of c words, and the personality trait score may be a vector pi=(p′i1, . . . , p′id) of d types of personality trait factors or attributes. Further, feature values are not necessarily required for every participant: for example, only the feature values of the person whose dialogue is being evaluated may be extracted and used, or statistics over all participants, such as the average, maximum, minimum, and variance, may be used.
The feature values are not limited to those described above and may be any numerical values that can be extracted or calculated from the dialogue data, such as the average and maximum microphone volume, the number of actions such as nodding, the number of times the maximum heart rate was exceeded, and the average heart rate. Further, attributes such as age and position may be classified from the data indicating individuality, and numbers representing the groups to which participants belong may be used as feature values. In the present embodiment, the feature values extracted as described above are concatenated into a vector A=(a1, . . . , al), which is used as the activity score.
In S102, the dialogue experience data extraction unit 102 extracts dialogue experience data from the dialogue log DB 108. The dialogue experience data consists of logs of past dialogues in which the dialogue participants have participated.
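The extraction of the utterance-related feature values in S101 can be sketched as follows. This is a hedged illustration: it assumes each utterance has already been transcribed with start and end times and tagged with Universal POS tags by some external tagger, and the `Utterance` class, `CONTENT_POS` set, and `extract_features` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple

# Universal POS tags treated as content words; an utterance containing none
# of them is counted as a back channel (one possible definition).
CONTENT_POS = {"VERB", "NOUN", "ADJ", "NUM"}


@dataclass
class Utterance:
    speaker: str
    start: float                   # start time in seconds
    end: float                     # end time in seconds
    tokens: List[Tuple[str, str]]  # (word, POS tag) pairs from a tagger


def extract_features(utterances: Sequence[Utterance],
                     participants: Sequence[str],
                     important_words: Set[str]) -> Dict[str, Dict[str, float]]:
    """Per participant: total utterance time u, back channels b, important words w."""
    feats = {p: {"u": 0.0, "b": 0.0, "w": 0.0} for p in participants}
    for utt in utterances:
        f = feats[utt.speaker]
        f["u"] += utt.end - utt.start
        if not any(pos in CONTENT_POS for _, pos in utt.tokens):
            f["b"] += 1
        f["w"] += sum(1 for word, _ in utt.tokens
                      if word.lower() in important_words)
    return feats


utts = [
    Utterance("h1", 0.0, 3.0, [("budget", "NOUN"), ("is", "AUX"), ("fine", "ADJ")]),
    Utterance("h2", 3.0, 3.5, [("uh-huh", "INTJ")]),
    Utterance("h2", 3.5, 6.0, [("schedule", "NOUN"), ("slipped", "VERB")]),
]
features = extract_features(utts, ["h1", "h2"], {"budget", "schedule"})
print(features["h2"])  # {'u': 3.0, 'b': 1.0, 'w': 1.0}
```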
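When statistics over all participants are used instead of per-participant values, one scalar feature can be summarized as in the sketch below. The function name `aggregate_feature` is hypothetical, and the population variance is assumed as the measure of dispersion.

```python
from statistics import mean, pvariance
from typing import Dict, List


def aggregate_feature(values_by_participant: Dict[str, float]) -> List[float]:
    """Summarize one scalar feature over all participants as
    (average, maximum, minimum, population variance)."""
    values = [float(v) for v in values_by_participant.values()]
    return [mean(values), max(values), min(values), pvariance(values)]


stats = aggregate_feature({"h1": 2.0, "h2": 4.0, "h3": 6.0})
print(stats)  # average 4.0, max 6.0, min 2.0, variance 8/3
```

Because the summary has a fixed length regardless of k, this variant keeps the activity score the same size for groups of different sizes.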
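Concatenating scalar and vector feature values into the activity score A can be sketched as follows. The fixed participant order and fixed feature-key order are assumptions needed so that each position of A always refers to the same feature; the function name `activity_score` is illustrative.

```python
from typing import Dict, List, Sequence, Union

Feature = Union[float, Sequence[float]]


def activity_score(features: Dict[str, Dict[str, Feature]],
                   participants: Sequence[str],
                   keys: Sequence[str]) -> List[float]:
    """Flatten per-participant features into A = (a1, ..., al)."""
    A: List[float] = []
    for p in participants:    # fixed participant order h1, ..., hk
        for key in keys:      # fixed feature order within a participant
            value = features[p][key]
            if isinstance(value, (int, float)):
                A.append(float(value))
            else:             # vector-valued feature such as p_i
                A.extend(float(v) for v in value)
    return A


feats = {
    "h1": {"u": 3.0, "b": 0.0, "w": 1.0, "p": [0.5, 0.2]},
    "h2": {"u": 3.0, "b": 1.0, "w": 1.0, "p": [0.1, 0.9]},
}
A = activity_score(feats, ["h1", "h2"], ["u", "b", "w", "p"])
print(len(A))  # 10, i.e. l = 10
```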
The dialogue experience data extraction unit 102 extracts dialogue experience data in which the dialogue participant to be evaluated has participated from the dialogue log DB 108. At this time, only the data in which all the dialogue participants participate together may be extracted, or a plurality of combinations such as data in which some participants participate together may be extracted. In the present embodiment, a log of the dialogue in which all the dialogue participants participate together and a log of the dialogue in which the dialogue participants are paired together and the two-person pairs participate together are extracted as dialogue experience data.
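The extraction in S102 can be sketched as simple filtering of a log. This assumes a hypothetical log format in which each entry is a dict with an `"id"` and a set of `"participants"`, and it assumes that a pair's experience includes every dialogue both members attended, even if others were also present; restricting it to dialogues with exactly those two people would be an equally valid reading.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple, Union

Key = Union[str, Tuple[str, str]]


def extract_experience_data(log: List[dict],
                            participants: Sequence[str]) -> Dict[Key, List[dict]]:
    """Logs shared by all participants ("all") and by each two-person pair."""
    everyone = set(participants)
    result: Dict[Key, List[dict]] = {
        "all": [d for d in log if everyone <= d["participants"]]
    }
    for pair in combinations(participants, 2):
        result[pair] = [d for d in log if set(pair) <= d["participants"]]
    return result


log = [
    {"id": "d1", "participants": {"h1", "h2", "h3"}},
    {"id": "d2", "participants": {"h1", "h2"}},
    {"id": "d3", "participants": {"h2", "h3", "h4"}},
]
extracted = extract_experience_data(log, ["h1", "h2", "h3"])
print([d["id"] for d in extracted[("h2", "h3")]])  # ['d1', 'd3']
```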
In S103, the dialogue experience score calculation unit 103 receives the dialogue experience data extracted in S102 as an input, and calculates and outputs a dialogue experience score. Specifically, the dialogue experience score calculation unit 103 digitizes and extracts dialogue experiences performed in the past by participants participating in the dialogue to be evaluated from the dialogue experience data.
More specifically, in the present embodiment, the number of past dialogues and the value of the frequency of dialogues in the most recent month are used as the dialogue experience score. The dialogue experience score calculation unit 103 calculates the number of past dialogues (sall, s12, . . . , s(k−1)k) and the frequency of dialogues in the most recent month (fall, f12, . . . , f(k−1)k) from the dialogue logs of all the participants and the two-person pairs output by the dialogue experience data extraction unit 102. Note that sall is the number of dialogues for all participants, and s12 is the number of dialogues for the pair of participant 1 and participant 2. Also, fall is the frequency of dialogues in the most recent month for all participants, and f12 is the frequency of dialogues in the most recent month for the pair of participant 1 and participant 2. Any unit may be used for the frequency of dialogues in the most recent month, and for example, the number of dialogues in the most recent month may be used.
In the present embodiment, the dialogue experience score calculation unit 103 outputs, as the dialogue experience score E, the concatenation of the numbers of past dialogues and the frequencies of dialogues in the most recent month, that is, E=(sall, s12, . . . , s(k−1)k, fall, f12, . . . , f(k−1)k).
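The computation of the dialogue experience score E can be sketched as follows. The "most recent month" is assumed here to be the 30 days up to a reference date, and frequency is measured as a count of dialogues in that window; the input layout (each log entry carrying a `"date"`) and the name `experience_score` are hypothetical.

```python
from datetime import date, timedelta
from itertools import combinations
from typing import Dict, List, Sequence


def experience_score(extracted: Dict[object, List[dict]],
                     participants: Sequence[str],
                     today: date,
                     window_days: int = 30) -> List[float]:
    """E = (s_all, s_12, ..., s_(k-1)k, f_all, f_12, ..., f_(k-1)k)."""
    keys = ["all"] + list(combinations(participants, 2))
    since = today - timedelta(days=window_days)
    counts = [float(len(extracted[k])) for k in keys]       # s values
    recent = [float(sum(1 for d in extracted[k]
                        if since < d["date"] <= today))     # f values
              for k in keys]
    return counts + recent


d1 = {"id": "d1", "date": date(2021, 7, 20)}
d2 = {"id": "d2", "date": date(2021, 5, 1)}
d3 = {"id": "d3", "date": date(2021, 7, 30)}
extracted = {
    "all": [d1],
    ("h1", "h2"): [d1, d2],
    ("h1", "h3"): [d1],
    ("h2", "h3"): [d1, d3],
}
E = experience_score(extracted, ["h1", "h2", "h3"], today=date(2021, 8, 4))
print(E)  # [1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0]
```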
In S104, the feature value weight calculation unit 104 receives the dialogue experience score as an input, and calculates and outputs the weight of each feature value in the activity score. The weight of the feature value may be a continuous value corresponding to the continuous value of the dialogue experience score, or the weight of the feature value may be determined for each class by performing classification according to the size and distribution of the dialogue experience score.
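The class-based alternative can be sketched as a lookup table. Everything in this example is illustrative: the three classes, the thresholds on E[0] (the number of past dialogues of all participants), and the weight values for three feature values (total utterance time, back channels, important word utterances) are assumptions chosen only to mirror the contrast between first meetings and long-term groups described earlier.

```python
from typing import Dict, List, Sequence, Tuple

# Illustrative per-class weights for three feature values:
# (total utterance time, back channels, important word utterances).
CLASS_WEIGHTS: Dict[str, List[float]] = {
    "first_meeting": [0.6, 0.3, 0.1],  # lively conversation weighted most
    "acquainted": [0.4, 0.2, 0.4],
    "long_term": [0.2, 0.1, 0.7],      # substantive content weighted most
}


def classify_experience(E: Sequence[float],
                        thresholds: Tuple[float, float] = (1.0, 10.0)) -> str:
    """Classify by E[0], the number of past dialogues of all participants."""
    s_all = E[0]
    if s_all < thresholds[0]:
        return "first_meeting"
    if s_all < thresholds[1]:
        return "acquainted"
    return "long_term"


def class_weights(E: Sequence[float]) -> List[float]:
    return list(CLASS_WEIGHTS[classify_experience(E)])


print(class_weights([0.0, 0.0]))  # [0.6, 0.3, 0.1]
```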
One method of generating the weight is to build a model that outputs the feature value weights when the dialogue experience score E is input, using past dialogue logs in which participants have annotated dialogue evaluation scores. Such a model may be, for example, a machine learning model such as a neural network.
For example, if the model is expressed as a function f and the weight of the feature values as w, the feature value weight calculation unit 104 can calculate the weight as f(E)=w. When generating the model, the activity score A may be calculated from the annotated dialogue log, and the parameters of the model f may be adjusted so that f(E)A matches the correct dialogue evaluation score s.
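The training idea can be sketched with the simplest possible model in place of a neural network: a linear map f(E) = W E + b, trained by full-batch gradient descent so that the predicted score f(E)A matches the annotated score s in mean squared error. The class `LinearWeightModel`, the functions `mse` and `train`, the learning rate, and the toy samples are all hypothetical; the prediction f(E)A is linear in W and b, so the loss is convex.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], List[float], float]  # (E, A, annotated score s)


class LinearWeightModel:
    """Hypothetical f: w = W E + b, with W of shape (dim_a, dim_e)."""

    def __init__(self, dim_e: int, dim_a: int):
        self.W = [[0.0] * dim_e for _ in range(dim_a)]
        self.b = [0.0] * dim_a

    def weights(self, E: Sequence[float]) -> List[float]:
        return [sum(wjm * em for wjm, em in zip(row, E)) + bj
                for row, bj in zip(self.W, self.b)]

    def score(self, E: Sequence[float], A: Sequence[float]) -> float:
        return sum(wj * aj for wj, aj in zip(self.weights(E), A))


def mse(model: LinearWeightModel, samples: Sequence[Sample]) -> float:
    return sum((model.score(E, A) - s) ** 2 for E, A, s in samples) / len(samples)


def train(model: LinearWeightModel, samples: Sequence[Sample],
          lr: float = 0.02, epochs: int = 2000) -> LinearWeightModel:
    n = len(samples)
    for _ in range(epochs):
        grad_W = [[0.0] * len(model.W[0]) for _ in model.W]
        grad_b = [0.0] * len(model.b)
        for E, A, s in samples:
            err = model.score(E, A) - s
            for j, aj in enumerate(A):
                # d(score)/d b_j = a_j and d(score)/d W_jm = a_j * e_m
                grad_b[j] += 2.0 * err * aj / n
                for m, em in enumerate(E):
                    grad_W[j][m] += 2.0 * err * aj * em / n
        for j in range(len(model.b)):
            model.b[j] -= lr * grad_b[j]
            for m in range(len(model.W[j])):
                model.W[j][m] -= lr * grad_W[j][m]
    return model


samples: List[Sample] = [
    ([1.0, 0.0], [1.0, 2.0], 3.0),
    ([0.0, 1.0], [2.0, 1.0], 1.0),
    ([1.0, 1.0], [1.0, 1.0], 2.0),
]
model = LinearWeightModel(dim_e=2, dim_a=2)
before = mse(model, samples)
train(model, samples)
after = mse(model, samples)
print(before, after)  # the loss drops from 14/3 to nearly zero
```

Replacing `LinearWeightModel` with a neural network changes only how w is computed from E; the training target, f(E)A compared against s, stays the same.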
By using the dialogue evaluation score s of the specific participant as the dialogue evaluation score s of the correct answer, a weight w capable of calculating the dialogue evaluation score s of the specific participant can be obtained. Further, by using the dialogue evaluation score s (for example, a vector having a degree of satisfaction and a degree of contribution as elements) of a specific factor as the dialogue evaluation score s of the correct answer, a weight w capable of calculating the dialogue evaluation score s of the specific type can be obtained.
The weight of the feature values is assumed to include values indicating which of the feature values in the activity score should be given more consideration in the dialogue evaluation; for example, it can be expressed as w=(w1, . . . , wl). Here, each wi corresponds to the feature value ai in the activity score A and determines how much importance is given to ai. Although w is a vector in this example, w may also be a matrix.
As described above, since the dialogue experience score E is used when calculating the weight w, the weight w in which dialogue experience (relationship between dialogue participants) is taken into consideration can be obtained.
In S105, the dialogue evaluation calculation unit 105 calculates a dialogue evaluation score si of a certain dialogue participant hi by using the activity score A and the weight w of each feature value as follows. The calculated result is output from the output unit 107.
$$s_i = wA = \sum_{j=1}^{l} w_j a_j$$
In the above example, since w and A are vectors, the dialogue evaluation score is a scalar value. The activity score A in the above formula may contain only the feature values related to the participant hi whose score is being calculated, or it may also contain feature values related to other participants.
Calculating the dialogue evaluation score as a scalar value as described above is an example; the dialogue evaluation score may also be calculated as a vector. For example, when w is a matrix so that the dialogue evaluation score is a vector, the dialogue evaluation may be determined by the magnitude of the vector, or each element of the vector may represent a factor of dialogue evaluation such as “degree of agreement,” “degree of activity,” or “degree of satisfaction.”
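Both forms of S105 can be sketched in a few lines: a dot product when w is a vector, and a matrix-vector product when w is a matrix whose rows correspond to evaluation factors. The function names and the example factor rows (agreement and satisfaction) are hypothetical, and the Euclidean norm is assumed as the "magnitude" of a vector score.

```python
import math
from typing import List, Sequence


def score_scalar(w: Sequence[float], A: Sequence[float]) -> float:
    """s_i = wA for a weight vector w of the same length l as A."""
    if len(w) != len(A):
        raise ValueError("w and A must have the same length l")
    return sum(wj * aj for wj, aj in zip(w, A))


def score_vector(W: Sequence[Sequence[float]], A: Sequence[float]) -> List[float]:
    """Vector score: one row of W per evaluation factor."""
    return [score_scalar(row, A) for row in W]


def magnitude(v: Sequence[float]) -> float:
    """Euclidean norm, one way to reduce a vector score to a single value."""
    return math.sqrt(sum(x * x for x in v))


A = [2.0, 4.0, 1.0]
print(score_scalar([0.5, 0.25, 1.0], A))      # 3.0
W = [[1.0, 0.0, 0.0],   # hypothetical "degree of agreement" row
     [0.0, 0.0, 1.0]]   # hypothetical "degree of satisfaction" row
v = score_vector(W, A)
print(v, magnitude(v))                         # [2.0, 1.0] and sqrt(5)
```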
By executing the calculation of the above formula si=wA for each participant, the dialogue evaluation score of each participant can be obtained. However, obtaining the dialogue evaluation score of each participant is an example.
When the activity score includes feature values related to the dialogue of all participants, and the feature value weight calculation unit 104 calculates a weight based on the dialogue experience of all participants, the result of wA can be used as the evaluation score of the entire dialogue, independent of any individual participant.
Further, after obtaining the dialogue evaluation score of each participant, the statistics (for example, average, total) may be used as the evaluation score of the entire dialogue in the group.
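Computing each participant's score and then a group statistic can be sketched as follows. The input layout (participant mapped to a pair of weight vector and activity score) and the function names are hypothetical; the average and total are the two statistics named in the text.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def participant_score(w: Sequence[float], A: Sequence[float]) -> float:
    if len(w) != len(A):
        raise ValueError("w and A must have the same length l")
    return sum(wj * aj for wj, aj in zip(w, A))


def group_scores(inputs: Dict[str, Tuple[Sequence[float], Sequence[float]]]):
    """Return per-participant scores s_i and group-level statistics."""
    scores = {h: participant_score(w, A) for h, (w, A) in inputs.items()}
    values = list(scores.values())
    return scores, {"average": mean(values), "total": sum(values)}


w = [1.0, 0.5]
scores, group = group_scores({
    "h1": (w, [2.0, 2.0]),
    "h2": (w, [1.0, 4.0]),
    "h3": (w, [0.0, 2.0]),
})
print(scores, group)  # s = 3.0, 3.0, 1.0; total 7.0; average 7/3
```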
The dialogue evaluation device 100 can be implemented by, for example, causing a computer to execute a program. This computer may be a physical computer, or may be a virtual machine on a cloud.
In other words, the dialogue evaluation device 100 can be implemented by executing a program corresponding to the processing performed by the dialogue evaluation device 100 using hardware resources such as a CPU and a memory that are built into the computer. The program can be recorded in a computer-readable recording medium (such as a portable memory) to be saved or distributed. Furthermore, the program can also be provided through a network such as the Internet or an electronic mail.
The program for implementing the processing in the computer is provided by, for example, a recording medium 1001 such as a CD-ROM or a memory card. When the recording medium 1001 in which the program is stored is set in the drive device 1000, the program is installed from the recording medium 1001 to the auxiliary storage device 1002 through the drive device 1000. However, the program need not necessarily be installed from the recording medium 1001, and may be downloaded from another computer via a network. The auxiliary storage device 1002 stores the installed program and stores necessary files, data, and the like.
The memory device 1003 reads and stores the program from the auxiliary storage device 1002 when there is an instruction to start the program. The CPU 1004 implements functions related to the dialogue evaluation device 100 according to the program stored in the memory device 1003. The interface device 1005 is used as an interface for connection to a network or the like. The display device 1006 displays a graphical user interface (GUI) or the like according to a program. The input device 1007 includes a keyboard and mouse, buttons, a touch panel, or the like, and is used to input various operation instructions. The output device 1008 outputs a calculation result.
The evaluation of a group dialogue is affected both by the utterances and behavior of the participants in the actual dialogue and by the relationships between the participants. These relationships change over time; for example, the more dialogue experience participants share, the deeper their relationship becomes. In the present embodiment, this is expressed as the dialogue experience score and combined into the evaluation, making it possible to evaluate the dialogue accurately.
In other words, in the present embodiment, when evaluating the group dialogue, the dialogue evaluation score is obtained on the basis of the activity score calculated from dialogue data indicating the state of dialogue and the individuality of the participants and the dialogue experience score calculated from dialogue experience data indicating the past dialogue experience between the participants of the dialogue. Thus, the evaluation representing the degree of contribution, degree of satisfaction, degree of achievement, or the like in consideration of the relationship between the participants can be predicted.
Further, the predicted evaluation can be used to examine group compositions that increase the participants' degree of satisfaction, or to intervene in how the dialogue proceeds. For example, for a participant who places importance on the number of utterances in a dialogue among participants meeting for the first time, adding an intervention that increases the number of utterances may achieve a dialogue with a higher degree of satisfaction.
Further, after a person has a dialogue with another party with whom they have little dialogue experience, the other party's dialogue evaluation can be predicted, and the actions needed to improve that evaluation can be deliberately taken in the next dialogue or reflected in materials.
Alternatively, by virtually simulating the dialogue and observing the change in the dialogue evaluation of the dialogue participant, it is also possible to formulate a strategy for what kind of dialogue should be performed and what kind of intervention should be made.
The present specification discloses at least a dialogue evaluation device, a dialogue evaluation method, and a program according to each of the following Items.
A dialogue evaluation device that evaluates a dialogue performed in a group of two or more participants, the dialogue evaluation device including:
The dialogue evaluation device according to Item 1, in which the dialogue data includes individuality data including personality traits of participants constituting the group, and the one or more feature values include a feature value related to an utterance and a feature value related to a personality trait.
The dialogue evaluation device according to Item 1 or 2, in which the feature value weight calculation unit receives the dialogue experience score as an input and calculates the weight using a model that outputs the weight.
The dialogue evaluation device according to any one of Items 1 to 3, in which the dialogue evaluation score calculation unit calculates a vector having a plurality of factors of dialogue evaluation as a plurality of elements as the dialogue evaluation score.
The dialogue evaluation device according to any one of Items 1 to 4, in which the dialogue evaluation score calculation unit calculates a dialogue evaluation score of each participant in the group, and calculates a statistic of dialogue evaluation scores of all participants as the dialogue evaluation score of the entire group.
A dialogue evaluation method executed by a dialogue evaluation device that evaluates a dialogue performed in a group of two or more participants, the dialogue evaluation method including:
A program for causing a computer to function as each unit in the dialogue evaluation device according to any one of Items 1 to 5.
Although the present embodiment has been described above, the present invention is not limited to such a specific embodiment, and various modifications and changes can be made within the scope of the gist of the present invention described in the claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/028989 | 8/4/2021 | WO |