This disclosure relates to a voice evaluation system, a voice evaluation method, and a computer program that evaluate voice.
A known system of this type obtains uttered voice and estimates a speaker's feeling. For example, Patent Literature 1 discloses a technique of quantitatively analyzing a feeling of anger and a feeling of embarrassment from the voice of a customer who calls a call center. Patent Literature 2 discloses a technique of classifying feelings into "laugh," "anger," "sadness," and the like, by using a parameter of a voice feature amount extracted from input voice data. Patent Literature 3 discloses a technique of outputting a quantitative index obtained by converting feelings such as joy, anger, satisfaction, stress, and reliability into numerals by using interactive voice data as an input.
Each of the Patent Literatures described above mainly targets one-to-one conversation and does not consider evaluation of voice uttered by a group.
It is an example object of this disclosure to provide a voice evaluation system, a voice evaluation method, and a computer program for solving the problems described above.
A voice evaluation system according to an example aspect of this disclosure includes: an acquisition unit that obtains voice uttered by a group of a plurality of persons; a detection unit that detects an element corresponding to a feeling from the obtained voice; and an evaluation unit that evaluates the obtained voice on the basis of the detected element.
A voice evaluation method according to an example aspect of this disclosure includes: obtaining voice uttered by a group of a plurality of persons; detecting an element corresponding to a feeling from the obtained voice; and evaluating the obtained voice on the basis of the detected element.
A computer program according to an example aspect of this disclosure operates a computer: to obtain voice uttered by a group of a plurality of persons; to detect an element corresponding to a feeling from the obtained voice; and to evaluate the obtained voice on the basis of the detected element.
Hereinafter, a voice evaluation system, a voice evaluation method, and a computer program according to example embodiments will be described with reference to the drawings.
A voice evaluation system according to a first example embodiment will be described with reference to the drawings.
First, an overall configuration of the voice evaluation system according to the first example embodiment will be described with reference to the drawings.
As illustrated in the drawings, the voice evaluation system 10 according to the first example embodiment includes a voice acquisition unit 110, a feeling element detection unit 120, and a voice evaluation unit 130.
The voice acquisition unit 110 is configured to obtain voice uttered by the group (hereinafter referred to as "collective voice" as appropriate). The voice acquisition unit 110 includes, for example, a microphone located where a group is formed. The voice acquisition unit 110 may be configured to perform various processes on the obtained voice (e.g., a noise cancellation process, a process of extracting a particular section, etc.). The collective voice obtained by the voice acquisition unit 110 is outputted to the feeling element detection unit 120.
The feeling element detection unit 120 is configured to detect a feeling element from the collective voice obtained by the voice acquisition unit 110. The "feeling element" herein is an element indicating a feeling of the group included in the voice; examples of the feeling element include an element corresponding to a feeling of "joy," an element corresponding to a feeling of "anger," and an element corresponding to a feeling of "sadness." The feeling element detection unit 120 is configured to detect at least one type of feeling element set in advance. An existing technique can be adopted as appropriate as the method of detecting the feeling element from voice; for example, it is possible to use a method based on frequency analysis of the voice, a method using deep learning, or the like. Information about the feeling element detected by the feeling element detection unit 120 is outputted to the voice evaluation unit 130.
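The detector itself is left open by this disclosure; the following is a minimal sketch in Python, assuming simple band-energy features obtained by frequency analysis and a pretrained classifier with a scikit-learn-style predict_proba interface (both the features and the classifier are assumptions for illustration, not the claimed method):

```python
import numpy as np

FEELINGS = ["joy", "anger", "sadness"]

def extract_features(waveform: np.ndarray) -> np.ndarray:
    """Frequency analysis: normalized energy in coarse frequency bands."""
    spectrum = np.abs(np.fft.rfft(waveform))
    bands = np.array_split(spectrum, 8)              # 8 coarse bands
    energy = np.array([band.mean() for band in bands])
    return energy / (energy.sum() + 1e-9)

def detect_feeling_elements(waveform: np.ndarray, classifier) -> dict:
    """Return a per-feeling score using a hypothetical pretrained classifier."""
    features = extract_features(waveform).reshape(1, -1)
    scores = classifier.predict_proba(features)[0]
    return dict(zip(FEELINGS, scores))
```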
The voice evaluation unit 130 is configured to evaluate the collective voice on the basis of the feeling element detected by the feeling element detection unit 120. Specifically, the voice evaluation unit 130 is configured to evaluate a degree of the feeling of the group from the feeling element detected from the collective voice. The voice evaluation unit 130 evaluates the collective voice, for example, by converting the feeling element into numerals. For example, when the element corresponding to the feeling of “joy” is detected, the voice evaluation unit 130 calculates a score corresponding to the feeling of “joy” of the group and makes an evaluation. Specifically, when the collective voice mainly includes the element corresponding to the feeling of “joy”, the score corresponding to the feeling of “joy” may be calculated as a high value. On the other hand, when the collective voice does not mainly include the element corresponding to the feeling of “joy”, the score corresponding to the feeling of “joy” may be calculated as a low value.
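As one possible concretization of "converting the feeling element into numerals," the following sketch assumes the detector returns each feeling element's share of the collective voice and maps it linearly to a 0-100 score; the actual conversion is not fixed by this disclosure:

```python
def evaluate_collective_voice(element_shares: dict) -> dict:
    """Map each feeling element's share of the voice to a 0-100 score."""
    return {feeling: round(100 * share) for feeling, share in element_shares.items()}

# A voice dominated by the "joy" element yields a high "joy" score.
print(evaluate_collective_voice({"joy": 0.7, "anger": 0.2, "sadness": 0.1}))
# -> {'joy': 70, 'anger': 20, 'sadness': 10}
```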
Next, a hardware configuration of the voice evaluation system 10 according to the first example embodiment will be described with reference to the drawings.
The voice evaluation system 10 according to the first example embodiment includes a processor 11, a RAM 12, a ROM 13, a storage apparatus 14, an input apparatus 15, and an output apparatus 16.
The processor 11 reads a computer program. For example, the processor 11 is configured to read a computer program stored in at least one of the RAM 12, the ROM 13, and the storage apparatus 14. Alternatively, the processor 11 may read a computer program stored in a computer-readable recording medium by using a not-illustrated recording medium reading apparatus. The processor 11 may obtain (i.e., read) a computer program through a network interface from a not-illustrated apparatus located outside the voice evaluation system 10. The processor 11 controls the RAM 12, the storage apparatus 14, the input apparatus 15, and the output apparatus 16 by executing the read computer program. Especially in the first example embodiment, when the computer program read by the processor 11 is executed, a functional block for evaluating the obtained voice is implemented in the processor 11.
The RAM 12 temporarily stores the computer program to be executed by the processor 11. The RAM 12 also temporarily stores data used by the processor 11 while the processor 11 executes the computer program. The RAM 12 may be, for example, a D-RAM (Dynamic RAM).
The ROM 13 stores the computer program to be executed by the processor 11. The ROM 13 may also store fixed data. The ROM 13 may be, for example, a P-ROM (Programmable ROM).
The storage apparatus 14 stores the data that is stored for a long term by the voice evaluation system 10. The storage apparatus 14 may operate as a temporary storage apparatus of the processor 11. The storage apparatus 14 may include, for example, at least one of a hard disk apparatus, a magneto-optical disk apparatus, an SSD (Solid State Drive), and a disk array apparatus.
The input apparatus 15 is an apparatus that receives an input instruction from a user of the voice evaluation system 10. The input apparatus 15 may include, for example, at least one of a keyboard, a mouse, and a touch panel.
The output apparatus 16 is an apparatus that outputs information about the voice evaluation system 10 to the outside. For example, the output apparatus 16 may be a display apparatus (e.g., a display) that is configured to display the information about the voice evaluation system 10.
Next, a flow of operation of the voice evaluation system 10 according to the first example embodiment will be described with reference to the drawings.
In operation of the voice evaluation system 10 according to the first example embodiment, first, the voice acquisition unit 110 obtains the collective voice uttered by the group (step S11).
Subsequently, the feeling element detection unit 120 detects the feeling element from the collective voice obtained by the voice acquisition unit 110 (step S12). Then, the voice evaluation unit 130 evaluates the collective voice on the basis of the feeling element detected by the feeling element detection unit 120 (step S13). A result of the evaluation by the voice evaluation unit 130 may be outputted, for example, to a not-illustrated display apparatus.
Next, an example of a technical effect obtained by the voice evaluation system 10 according to the first example embodiment will be described.
For example, at venues of various events, such as stage performances and sports games, the voice uttered by the group (e.g., a cheer, a scream, etc.) varies depending on the excitement. Therefore, if such voice can be properly evaluated, it should be possible to determine to what extent an event is accepted by visitors.
As described above, the voice evaluation system 10 according to the first example embodiment detects the feeling element from the collective voice uttered by the group and evaluates the collective voice on the basis of the detected element.
Since the voice evaluation system 10 according to the first example embodiment evaluates the collective voice uttered by the group, it is possible to properly evaluate the feeling of the group as a whole, for example, even in a situation where it is difficult to obtain the voice of each person. Moreover, since an evaluation can be made from the voice alone, without using a face image or the like, it is possible to properly evaluate the feeling of the group even under poor illumination.
A voice evaluation system according to a second example embodiment will be described with reference to the drawings.
First, a configuration of the voice evaluation system 10 according to the second example embodiment will be described with reference to the drawings.
In the voice evaluation system 10 according to the second example embodiment, the voice acquisition unit 110 includes an utterance section recording unit 111 and a silence section recording unit 112, and the feeling element detection unit 120 includes a first element detection unit 121, a second element detection unit 122, a third element detection unit 123, and a fourth element detection unit 124.
The utterance section recording unit 111 records the voice obtained in a section in which the group utters the voice. The voice recorded by the utterance section recording unit 111 is outputted to the feeling element detection unit 120. On the other hand, the silence section recording unit 112 records a section in which the group does not utter the voice (e.g., a section in which a volume is less than or equal to a predetermined threshold). The section recorded by the silence section recording unit 112 is not outputted to the feeling element detection unit 120 but is directly outputted to an evaluation data generation unit 140 (in other words, it is excluded from the evaluation target). In this way, limiting the sections subjected to voice evaluation makes it possible to reduce a processing load of the system, as sketched below.
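A minimal sketch of this section splitting, assuming a volume threshold applied to the frame RMS; the frame length and threshold value are assumptions for illustration:

```python
import numpy as np

def split_sections(waveform, sample_rate, threshold=0.02, frame_sec=0.5):
    """Split audio into utterance and silence sections by frame volume.

    Returns two lists of (start_time, end_time) tuples in seconds."""
    frame = int(sample_rate * frame_sec)
    utterance, silence = [], []
    for start in range(0, len(waveform), frame):
        chunk = waveform[start:start + frame]
        rms = float(np.sqrt(np.mean(chunk ** 2)))    # frame volume (RMS)
        span = (start / sample_rate, (start + len(chunk)) / sample_rate)
        # Frames at or below the threshold are silence and are excluded
        # from feeling element detection.
        (utterance if rms > threshold else silence).append(span)
    return utterance, silence
```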
The first element detection unit 121, the second element detection unit 122, the third element detection unit 123, and the fourth element detection unit 124 are configured to detect respective different feeling elements. For example, the first element detection unit 121 may detect the feeling element corresponding to the feeling of “joy”. The second element detection unit 122 may detect the feeling element corresponding to the feeling of “anger”. The third element detection unit 123 may detect the feeling element corresponding to the feeling of “sadness”. The fourth element detection unit 124 may detect a feeling element corresponding to a feeling of “pleasure”.
A hardware configuration of the voice evaluation system 10 according to the second example embodiment may be the same as the hardware configuration of the voice evaluation system 10 according to the first example embodiment.
Next, a flow of operation of the voice evaluation system 10 according to the second example embodiment will be described with reference to the drawings.
In operation of the voice evaluation system 10 according to the second example embodiment, first, the voice acquisition unit 110 obtains the collective voice uttered by the group, and the utterance section recording unit 111 and the silence section recording unit 112 record the utterance section and the silence section, respectively.
Subsequently, the feeling element detection unit 120 detects the feeling elements from the collective voice obtained by the voice acquisition unit 110 (step S23). Specifically, the first element detection unit 121, the second element detection unit 122, the third element detection unit 123, and the fourth element detection unit 124 detect the respective feeling elements corresponding to different feelings.
The respective feeling elements detected by the first element detection unit 121, the second element detection unit 122, the third element detection unit 123, and the fourth element detection unit 124 are inputted to the voice evaluation unit 130. Then, the voice evaluation unit 130 evaluates the collective voice on the basis of the feeling elements detected by the feeling element detection unit 120 (step S24a).
Next, an example of a technical effect obtained by the voice evaluation system 10 according to the second example embodiment will be described.
As described above, in the voice evaluation system 10 according to the second example embodiment, elements corresponding to a plurality of types of feelings are detected from the collective voice, which makes it possible to evaluate the feeling of the group in more detail. Moreover, since the silence section is excluded from the evaluation target, it is possible to reduce the processing load of the system.
A voice evaluation system according to a third example embodiment will be described with reference to the drawings.
First, a configuration of the voice evaluation system 10 according to the third example embodiment will be described with reference to the drawings.
In the voice evaluation system 10 according to the third example embodiment, the voice evaluation unit 130 includes a first evaluation unit 131, a second evaluation unit 132, a third evaluation unit 133, and a fourth evaluation unit 134.
The first evaluation unit 131 is configured to evaluate the voice on the basis of the feeling element detected by the first element detection unit 121. The second evaluation unit 132 is configured to evaluate the voice on the basis of the feeling element detected by the second element detection unit 122. The third evaluation unit 133 is configured to evaluate the voice on the basis of the feeling element detected by the third element detection unit 123. The fourth evaluation unit 134 is configured to evaluate the voice on the basis of the feeling element detected by the fourth element detection unit 124.
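The pairing of each detection unit with its own evaluation unit can be sketched as follows; the 0-100 score formula is an assumption for illustration, since the disclosure does not fix the evaluation method:

```python
class FeelingEvaluationUnit:
    """Evaluates the voice using only its corresponding feeling element."""

    def __init__(self, feeling: str):
        self.feeling = feeling

    def evaluate(self, element_share: float) -> int:
        return round(100 * element_share)   # 0-100 score for this feeling

# One evaluation unit per detection unit, as in the block structure above.
units = {f: FeelingEvaluationUnit(f) for f in ("joy", "anger", "sadness", "pleasure")}
detected = {"joy": 0.5, "anger": 0.1, "sadness": 0.1, "pleasure": 0.3}
scores = {f: units[f].evaluate(share) for f, share in detected.items()}
```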
A hardware configuration of the voice evaluation system 10 according to the third example embodiment may be the same as the hardware configuration of the voice evaluation system 10 according to the first example embodiment.
Next, a flow of operation of the voice evaluation system 10 according to the third example embodiment will be described with reference to the drawings.
In operation of the voice evaluation system 10 according to the third example embodiment, first, the voice acquisition unit 110 obtains the collective voice uttered by the group.
Subsequently, the feeling element detection unit 120 detects the feeling elements from the collective voice obtained by the voice acquisition unit 110 (the step S23). Specifically, the first element detection unit 121, the second element detection unit 122, the third element detection unit 123, and the fourth element detection unit 124 detect the respective feeling elements corresponding to different feelings. The respective feeling elements detected by the first element detection unit 121, the second element detection unit 122, the third element detection unit 123, and the fourth element detection unit 124 are inputted to the voice evaluation unit 130.
Subsequently, the voice evaluation unit 130 evaluates the collective voice on the basis of the feeling elements detected by the feeling element detection unit 120 (step S24). Specifically, the first evaluation unit 131, the second evaluation unit 132, the third evaluation unit 133, and the fourth evaluation unit 134 separately make evaluations on the basis of the feeling elements detected by the first element detection unit 121, the second element detection unit 122, the third element detection unit 123, and the fourth element detection unit 124, respectively.
Next, an example of a technical effect obtained by the voice evaluation system 10 according to the third example embodiment will be described.
As described above, the voice evaluation system 10 according to the third example embodiment evaluates the collective voice separately for each of the plurality of types of feelings. It is therefore possible to understand in more detail what feeling the group has.
A voice evaluation system according to a fourth example embodiment will be described with reference to the drawings.
First, a configuration of the voice evaluation system 10 according to the fourth example embodiment will be described with reference to the drawings.
The voice evaluation system 10 according to the fourth example embodiment further includes an evaluation data generation unit 140.
The evaluation data generation unit 140 is configured to generate evaluation data by integrating evaluation results of the first evaluation unit 131, the second evaluation unit 132, the third evaluation unit 133, and the fourth evaluation unit 134 with information about the section stored in the silence section recording unit 112. The evaluation data are generated as data for the user of the voice evaluation system 10 to properly understand the evaluation results. A specific example of the evaluation data will be described in detail later in a fifth example embodiment.
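A minimal sketch of this integration step, assuming per-section scores from the evaluation units and the silence sections recorded by the silence section recording unit 112; the data format is an assumption, since the disclosure leaves it open:

```python
def generate_evaluation_data(section_scores, silence_sections):
    """Merge scored utterance sections and silence sections into one
    time-ordered list (a simple time series of evaluation results).

    section_scores: list of ((start, end), {feeling: score}) tuples
    silence_sections: list of (start, end) tuples
    """
    data = [{"span": span, "scores": scores, "silent": False}
            for span, scores in section_scores]
    data += [{"span": span, "scores": None, "silent": True}
             for span in silence_sections]
    return sorted(data, key=lambda item: item["span"][0])
```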
A hardware configuration of the voice evaluation system 10 according to the fourth example embodiment may be the same as the hardware configuration of the voice evaluation system 10 according to the first example embodiment.
Next, a flow of operation of the voice evaluation system 10 according to the fourth example embodiment will be described with reference to the drawings.
In operation of the voice evaluation system 10 according to the fourth example embodiment, first, the voice acquisition unit 110 obtains the collective voice uttered by the group.
Subsequently, the feeling element detection unit 120 detects the feeling elements from the collective voice obtained by the voice acquisition unit 110 (the step S23). Specifically, the first element detection unit 121, the second element detection unit 122, the third element detection unit 123, and the fourth element detection unit 124 detect the respective feeling elements corresponding to different feelings. Then, the voice evaluation unit 130 evaluates the collective voice on the basis of the feeling elements detected by the feeling element detection unit 120 (the step S24). Specifically, the first evaluation unit 131, the second evaluation unit 132, the third evaluation unit 133, and the fourth evaluation unit 134 evaluate the collective voice by using the respective different feeling elements.
Subsequently, the evaluation data generation unit 140 generates the evaluation data from the evaluation result of the collective voice (step S25). The evaluation data generated by the evaluation data generation unit 140 may be outputted, for example, to a not-illustrated display apparatus or the like.
Next, an example of a technical effect obtained by the voice evaluation system 10 according to the fourth example embodiment will be described.
As described above, the voice evaluation system 10 according to the fourth example embodiment generates the evaluation data from the evaluation result of the collective voice. It is therefore possible for the user of the system to properly understand the evaluation result of the voice uttered by the group.
Next, the voice evaluation system 10 according to a fifth example embodiment will be described with reference to the drawings. The fifth example embodiment describes specific examples (display examples) of the evaluation data generated by the evaluation data generation unit 140 of the fourth example embodiment.
As illustrated in the drawings, the evaluation data may be displayed, for example, as time series data in which the score of each feeling is graphed along a time axis, or in another form that graphically shows the evaluation result for each feeling.
It is also possible to combine and use the respective display examples described above as appropriate. Furthermore, the display examples of the evaluation data described above are merely examples, and the evaluation data may be displayed in another display aspect.
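As one concrete illustration of the time series display, a minimal sketch using matplotlib is shown below; the line-graph form and axis choices are assumptions, and other display aspects are equally possible:

```python
import matplotlib.pyplot as plt

def plot_evaluation_data(times, scores_by_feeling):
    """Draw one score line per feeling along a time axis.

    times: list of timestamps in seconds
    scores_by_feeling: dict mapping each feeling to scores aligned with times
    """
    for feeling, scores in scores_by_feeling.items():
        plt.plot(times, scores, label=feeling)
    plt.xlabel("time [s]")
    plt.ylabel("score")
    plt.legend()
    plt.show()
```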
Next, an example of a technical effect obtained by the voice evaluation system 10 according to the fifth example embodiment will be described.
As described above, the voice evaluation system 10 according to the fifth example embodiment can display the evaluation data in various aspects. It is therefore possible for the user to intuitively understand the evaluation result of the collective voice.
A voice evaluation system according to a sixth example embodiment will be described with reference to the drawings.
First, a configuration of the voice evaluation system 10 according to the sixth example embodiment will be described with reference to the drawings.
In the voice evaluation system 10 according to the sixth example embodiment, the feeling element detection unit 120 further includes a scream element detection unit 125, and the voice evaluation unit 130 further includes an abnormality determination unit 135.
The scream element detection unit 125 is configured to detect a feeling element corresponding to a scream (hereinafter referred to as a "scream element" as appropriate) from the voice obtained by the voice acquisition unit 110. Here, the "scream" is a scream uttered by the group when abnormality occurs in a surrounding environment of the group (e.g., in a natural disaster such as an earthquake), and is clearly differentiated, for example, from a scream similar to a shout of joy or a cheer. The differentiation between the scream in occurrence of abnormality and another scream can be realized, for example, by machine learning that uses a neural network. Information about the scream element detected by the scream element detection unit 125 is outputted to the abnormality determination unit 135.
The abnormality determination unit 135 is configured to determine whether or not abnormality has occurred in the surrounding environment of the group, on the basis of the scream element detected by the scream element detection unit 125. Specifically, the abnormality determination unit 135 determines whether or not abnormality has occurred on the basis of the degree of the feeling corresponding to the scream, obtained as an evaluation result using the scream element. For example, the abnormality determination unit 135 calculates a score of the feeling corresponding to the scream from the scream element; when the score exceeds a predetermined threshold, it may determine that abnormality has occurred, and when the score does not exceed the threshold, it may determine that abnormality has not occurred.
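A minimal sketch of this threshold-based determination; the 0-100 scale and the threshold value are assumptions for illustration:

```python
SCREAM_THRESHOLD = 80  # hypothetical threshold on a 0-100 score scale

def abnormality_occurred(scream_element_share: float) -> bool:
    """Score the feeling corresponding to the scream and compare it with
    a predetermined threshold, as described above."""
    scream_score = 100 * scream_element_share
    return scream_score > SCREAM_THRESHOLD
```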
A hardware configuration of the voice evaluation system 10 according to the sixth example embodiment may be the same as the hardware configuration of the voice evaluation system 10 according to the first example embodiment.
Next, a flow of operation of the voice evaluation system 10 according to the sixth example embodiment will be described with reference to the drawings.
In operation of the voice evaluation system 10 according to the sixth example embodiment, first, the voice acquisition unit 110 obtains the collective voice uttered by the group.
Subsequently, the feeling element detection unit 120 detects the feeling elements from the collective voice obtained by the voice acquisition unit 110 (the step S23). Specifically, the first element detection unit 121, the second element detection unit 122, the third element detection unit 123, and the fourth element detection unit 124 detect the respective feeling elements corresponding to different feelings. In addition, especially in the sixth example embodiment, the scream element detection unit 125 detects the scream element (step S31).
Subsequently, the voice evaluation unit 130 evaluates the collective voice on the basis of the feeling elements detected by the feeling element detection unit 120 (the step S24). Specifically, the first evaluation unit 131, the second evaluation unit 132, the third evaluation unit 133, and the fourth evaluation unit 134 evaluate the collective voice by using the respective different feeling elements. Furthermore, especially in the sixth example embodiment, the abnormality determination unit 135 determines whether or not abnormality has occurred in the surrounding environment of the group on the basis of the scream element detected by the scream element detection unit 125 (step S32).
Subsequently, the evaluation data generation unit 140 generates the evaluation data from the evaluation result of the collective voice (the step S25). Here, in particular, when the abnormality determination unit 135 determines that abnormality has occurred, the evaluation data generation unit 140 generates the evaluation data including information about the abnormality (e.g., abnormality occurrence timing, etc.). Alternatively, the evaluation data generation unit 140 may generate abnormality notification data for notifying the occurrence of abnormality, separately from the normal evaluation data. In this case, the abnormality notification data may include, for example, data for controlling an operation of an alarm of an event venue.
Next, an example of a technical effect obtained by the voice evaluation system 10 according to the sixth example embodiment will be described.
As described above, the voice evaluation system 10 according to the sixth example embodiment determines, on the basis of the scream element detected from the collective voice, whether or not abnormality has occurred in the surrounding environment of the group. It is therefore possible to quickly notify the user of the occurrence of abnormality, for example, at an event venue.
A voice evaluation system according to a seventh example embodiment will be described with reference to the drawings.
An overall configuration of the voice evaluation system 10 according to the seventh example embodiment may be the same as the overall configurations of the voice evaluation systems 10 according to the first to sixth example embodiments.
A hardware configuration of the voice evaluation system 10 according to the seventh example embodiment may be the same as the hardware configuration of the voice evaluation system 10 according to the first example embodiment.
Next, a specific operation example of the voice evaluation system 10 according to the seventh example embodiment will be described with reference to the drawings.
In the seventh example embodiment, an area in which the group is located is divided into a plurality of areas, namely, an area A, an area B, and an area C, and the voice uttered in each of the areas is obtained and evaluated separately.
The voices uttered by respective groups in the area A, the area B, and the area C can be obtained as different voices. Specifically, the voice uttered by the group in the area A may be obtained by a microphone 200a. The voice uttered by the group in the area B may be obtained by a microphone 200b. The voice uttered by the group in the area C may be obtained by a microphone 200c. Each of the microphones 200a to 200c is configured as a part of the voice acquisition unit 110, and each voice in respective one of the areas A to C is obtained by the voice acquisition unit 110.
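A minimal sketch of this per-area acquisition and evaluation; the microphone objects and their record() method are hypothetical stand-ins for the microphones 200a to 200c:

```python
def evaluate_by_area(microphones, detector, evaluator):
    """Obtain and evaluate the collective voice separately for each area.

    microphones: dict mapping an area name (e.g., "A") to an object with a
    hypothetical record() method returning (waveform, sample_rate).
    """
    results = {}
    for area, microphone in microphones.items():
        waveform, sample_rate = microphone.record()
        elements = detector(waveform, sample_rate)   # per-area feeling elements
        results[area] = evaluator(elements)          # per-area evaluation
    return results
```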
In operation of the voice evaluation system 10 according to the seventh example embodiment, the same steps as those in the voice evaluation systems 10 according to the first to sixth example embodiments may be performed for each of the areas A to C.
Next, an example of a technical effect obtained by the voice evaluation system 10 according to the seventh example embodiment will be described.
As described above, the voice evaluation system 10 according to the seventh example embodiment obtains and evaluates the collective voice in each of the plurality of areas. It is therefore possible to evaluate the feeling of the group for each of the areas and, for example, to compare the evaluation results among the areas.
The example embodiments described above may be further described as, but not limited to, the following Supplementary Notes.
A voice evaluation system described in Supplementary Note 1 is a voice evaluation system including: an acquisition unit that obtains voice uttered by a group of a plurality of persons; a detection unit that detects an element corresponding to a feeling from the obtained voice; and an evaluation unit that evaluates the obtained voice on the basis of the detected element.
A voice evaluation system described in Supplementary Note 2 is the voice evaluation system described in Supplementary Note 1, wherein the detection unit detects elements corresponding to a plurality of types of feelings from the obtained voice.
A voice evaluation system described in Supplementary Note 3 is the voice evaluation system described in Supplementary Note 2, wherein the evaluation unit evaluates the obtained voice for each feeling, on the basis of the elements corresponding to the plurality of types of feelings.
A voice evaluation system described in Supplementary Note 4 is the voice evaluation system described in any one of Supplementary Notes 1 to 3, wherein the evaluation unit generates evaluation data indicating an evaluation result of the obtained voice.
A voice evaluation system described in Supplementary Note 5 is the voice evaluation system described in Supplementary Note 4, wherein the evaluation unit generates the evaluation data as time series data.
A voice evaluation system described in Supplementary Note 6 is the voice evaluation system described in Supplementary Note 4 or 5, wherein the evaluation unit generates the evaluation data by graphically showing the evaluation result.
A voice evaluation system described in Supplementary Note 7 is the voice evaluation system described in any one of Supplementary Notes 1 to 6, wherein the evaluation unit detects occurrence of abnormality in a surrounding environment of the group, from the evaluation result of the obtained voice.
A voice evaluation system described in Supplementary Note 8 is the voice evaluation system described in any one of Supplementary Notes 1 to 7, wherein the acquisition unit obtains the voice uttered by the group by dividing the group into a plurality of areas, and the evaluation unit evaluates the obtained voice in each of the areas.
A voice evaluation method described in Supplementary Note 9 is a voice evaluation method including: obtaining voice uttered by a group of a plurality of persons; detecting an element corresponding to a feeling from the obtained voice; and evaluating the obtained voice on the basis of the detected element.
A computer program described in Supplementary Note 10 is a computer program that operates a computer: to obtain voice uttered by a group of a plurality of persons; to detect an element corresponding to a feeling from the obtained voice; and to evaluate the obtained voice on the basis of the detected element.
This disclosure is not limited to the examples described above and is allowed to be changed, if desired, without departing from the essence or spirit of the invention which can be read from the claims and the entire specification. A voice evaluation system, a voice evaluation method, and a computer program with such modifications are also intended to be within the technical scope of this disclosure.
Filing Document | Filing Date | Country | Kind |
---|---|---|---
PCT/JP2020/012381 | 3/19/2020 | WO |