This application is based on and claims priority under 35 USC 119 from Japanese Patent Application No. 2012-011218 filed Jan. 23, 2012.
The present invention relates to a voice analyzer, a voice analysis system, and a non-transitory computer readable medium storing a program.
According to an aspect of the invention, there is provided a voice analyzer including: a plate-shaped body; a plurality of first voice acquisition units that are placed on both surfaces of the plate-shaped body and that acquire a voice of a speaker; a sound pressure comparison unit that compares sound pressure of a voice acquired by the first voice acquisition unit placed on one surface of the plate-shaped body with sound pressure of a voice acquired by the first voice acquisition unit placed on the other surface and determines a larger sound pressure; and a voice signal selection unit that selects information regarding a voice signal which is associated with the larger sound pressure and is determined by the sound pressure comparison unit.
Exemplary embodiments of the present invention will be described in detail based on the accompanying figures.
Hereinafter, an exemplary embodiment of the invention will be described in detail with reference to the accompanying drawings.
Example of the System Configuration
As shown in the drawings, the voice analysis system 1 according to the present exemplary embodiment includes terminal apparatuses 10, each worn by a user, and a host apparatus 20 that receives data from the terminal apparatuses 10 through a radio communication line.
The terminal apparatus 10 includes plural microphones 11 (a first microphone 11a, a second microphone 11b, a third microphone 11c, and a fourth microphone 11d) and plural amplifiers 13 (a first amplifier 13a, a second amplifier 13b, a third amplifier 13c, and a fourth amplifier 13d) as examples of plural voice acquisition units that acquire the voice of a speaker (hereinafter referred to as the microphones 11a, 11b, 11c, and 11d when they need not be distinguished from one another). In addition, the terminal apparatus 10 includes a voice analysis unit 15 that analyzes the acquired voice, a data transmission unit 16 that transmits the analysis result to the host apparatus 20, and a power supply unit 17.
In the present exemplary embodiment, the first and second microphones 11a and 11b are placed so as to be separated from each other in the horizontal direction by a predetermined distance. Here, the first and second microphones 11a and 11b are placed near the mouth of the wearer, side by side in the horizontal direction. The distance between the first and second microphones 11a and 11b is, for example, 10 cm to 20 cm.
In addition, the third and fourth microphones 11c and 11d are placed on both surfaces of the plate-shaped body 30, which will be described later. The third and fourth microphones 11c and 11d are placed farther from the mouth (speaking portion) of the wearer than the first and second microphones 11a and 11b; here, they are placed about 35 cm below the first and second microphones 11a and 11b, for example. That is, in the present exemplary embodiment, both two microphones whose distances from the mouth of the wearer differ and two microphones separated from each other in the horizontal direction may be selected from the microphones placed in the terminal apparatus 10. For the former, a pair of the first microphone 11a and the third microphone 11c (or the fourth microphone 11d), or a pair of the second microphone 11b and the third microphone 11c (or the fourth microphone 11d), may be selected. For the latter, the pair of the first microphone 11a and the second microphone 11b may be selected.
Various types of known microphones, such as dynamic type microphones and capacitor type microphones, may be used as the microphones 11a, 11b, 11c, and 11d in the present exemplary embodiment. In particular, it is preferable to use a non-directional MEMS (Micro Electro Mechanical Systems) type microphone.
The first to fourth amplifiers 13a to 13d amplify electrical signals that the first to fourth microphones 11a to 11d output, according to the acquired voice. Known operational amplifiers or the like may be used as the first to fourth amplifiers 13a to 13d in the present exemplary embodiment.
The voice analysis unit 15 analyzes the electrical signals output from the first to fourth amplifiers 13a to 13d. In addition, the voice analysis unit 15 determines the front and back of the body 30, identifies whether the speaker is the wearer or another person, and, when the speaker is identified as another person, outputs the face-to-face angle, which is the angle at which the wearer and the speaker face each other. This will be described in detail later.
The data transmission unit 16 transmits the acquired data, including the analysis result of the voice analysis unit 15 and the ID of the terminal apparatus 10, to the host apparatus 20 through the radio communication line. Depending on the processing performed in the host apparatus 20, the transmitted information may include not only the analysis result but also, for example, the voice acquisition times and the sound pressures of the voices acquired by the microphones 11a to 11d. In addition, a data storage unit that stores the analysis results of the voice analysis unit 15 may be provided in the terminal apparatus 10, and the data stored for a certain period of time may be transmitted collectively. Transmission using cables is also possible. In the present exemplary embodiment, the data transmission unit 16 functions as a voice signal transmission unit that transmits information regarding a voice signal of the voice.
The power supply unit 17 supplies electric power to the microphones 11a to 11d, the first to fourth amplifiers 13a to 13d, the voice analysis unit 15, and the data transmission unit 16. As the power supply, it is possible to use known power supplies, such as a dry battery and a rechargeable battery, for example. In addition, the power supply unit 17 includes known circuits, such as a voltage conversion circuit and a charging control circuit, when necessary.
The host apparatus 20 includes a data receiving unit 21 that receives the data transmitted from the terminal apparatus 10, a data storage unit 22 that stores the received data, a data analysis unit 23 that analyzes the stored data, and an output unit 24 that outputs the analysis result. The host apparatus 20 is realized by an information processing apparatus, such as a personal computer, for example. Moreover, in the present exemplary embodiment, the plural terminal apparatuses 10 are used as described above, and the host apparatus 20 receives the data from each of the plural terminal apparatuses 10.
The data receiving unit 21 is compatible with the radio communication line described above; it receives the data from each terminal apparatus 10 and passes it to the data storage unit 22. In the present exemplary embodiment, the data receiving unit 21 functions as a receiving unit that receives information regarding a voice signal transmitted from the data transmission unit 16. The data storage unit 22 stores the data acquired from the data receiving unit 21 for each speaker. Here, speaker identification is performed by checking the terminal ID transmitted from the terminal apparatus 10 against the speaker names and terminal IDs registered in the host apparatus 20 in advance. Alternatively, information identifying the wearer may be transmitted from the terminal apparatus 10 instead of the terminal ID.
The data analysis unit 23 analyzes the data stored in the data storage unit 22. As the specific analysis content and analysis method, various kinds of content and methods may be adopted depending on the purpose or aspect of use of the system according to the present exemplary embodiment. For example, the frequency of conversation between wearers of the terminal apparatuses 10 or the tendencies of a conversation partner of each wearer are analyzed, or the relationship between speakers in a conversation is estimated from the information regarding the length or sound pressure of the voice in the conversation.
The output unit 24 outputs the analysis result of the data analysis unit 23 or performs output based on the analysis result. As a unit that outputs the analysis result or the like, various kinds of units including display of a display device, printout using a printer, and voice output may be adopted according to the purpose or aspect of use of the system, the content or format of an analysis result, and the like.
Example of the Configuration of a Terminal Apparatus
As described above, the terminal apparatus 10 is used in a state worn by each user. The terminal apparatus 10 includes the body 30 and a strap 40 connected to the body 30; the user hangs the strap 40 from the neck to wear the body 30.
The body 30 is configured such that at least the circuits realizing the first to fourth amplifiers 13a to 13d, the voice analysis unit 15, the data transmission unit 16, and the power supply unit 17, together with the power supply (battery) of the power supply unit 17, are housed in a plate-shaped case 31, for example, a thin rectangular parallelepiped case formed of metal, resin, or the like. In addition, in the present exemplary embodiment, the third and fourth microphones 11c and 11d are provided on both surfaces of the case 31. In addition, a pocket into which an ID card displaying ID information, such as the name or team of the wearer, can be inserted may be provided in the case 31, or such ID information may be written on the surface of the case 31 itself.
In addition, the body 30 need not be a rigid body, nor a rectangle. Accordingly, the body 30 may be formed of a material, such as cloth, that is neither rigid nor rectangular. For example, the body 30 may be a cloth bib or an apron to which the required members (the microphones 11a, 11b, 11c, and 11d and the like) are attached.
The first and second microphones 11a and 11b are provided in the strap 40. As materials of the strap 40, it is possible to use known various materials, such as leather, synthetic leather, cotton, other natural fibers, synthetic fiber using resin, and metal. In addition, coating processing using silicon resin, fluorine resin, or the like may be performed.
The strap 40 has a cylindrical structure, and the microphones 11a and 11b are housed inside the strap 40. Providing the microphones 11a and 11b inside the strap 40 prevents damage to and contamination of the microphones, and makes the conversation partner less likely to be aware of their presence.
Explanation Regarding a Method of Identifying Whether a Speaker is a Wearer or Others
A method of identifying whether the speaker is a wearer or others who are persons other than the wearer (speaker identification) in the above configuration will be described.
The system according to the present exemplary embodiment identifies a voice of the wearer of the terminal apparatus 10 or voices of others using the voice information acquired, for example, by the first and third microphones 11a and 11c among the microphones provided in the terminal apparatus 10. In other words, in the present exemplary embodiment, it is identified whether the speaker of the acquired voice is a wearer or others. In addition, in the present exemplary embodiment, speaker identification is performed on the basis of non-linguistic information, such as sound pressure (volume input to the first and third microphones 11a and 11c), instead of linguistic information acquired using morphological analysis or dictionary information, among information items of the acquired voice. That is, a speaker of the voice is identified from the speaking situation specified by the non-linguistic information instead of the content of speaking specified by the linguistic information.
As described with reference to the microphone placement above, the first microphone 11a is placed near the mouth of the wearer, while the third microphone 11c is placed farther from the mouth. Accordingly, assuming that the mouth (speaking portion) of the wearer is a sound source, the distance between the first microphone 11a and the sound source is greatly different from the distance between the third microphone 11c and the sound source; the latter is several times the former. Since the sound pressure of an acquired voice attenuates as the distance from the sound source increases, the sound pressure of the voice of the wearer acquired by the first microphone 11a is largely different from the sound pressure of the voice acquired by the third microphone 11c.
On the other hand, assuming that the mouth (speaking portion) of a person other than the wearer (another person) is a sound source, the distance between the first microphone 11a and the sound source and the distance between the third microphone 11c and the sound source do not differ greatly, since the other person is separated from the wearer. Although there may be some difference between the two distances depending on the position of the other person with respect to the wearer, the distance between the third microphone 11c and the sound source is not several times the distance between the first microphone 11a and the sound source, unlike the case where the mouth (speaking portion) of the wearer is the sound source. Therefore, for the voice of another person, the sound pressure of the acquired voice at the first microphone 11a is not largely different from the sound pressure of the acquired voice at the third microphone 11c, unlike the case of the voice of the wearer.
In the present exemplary embodiment, therefore, the sound pressure ratio, which is the ratio between the sound pressure of the acquired voice at the first microphone 11a and the sound pressure of the acquired voice at the third microphone 11c, is calculated, and a voice of the wearer and voices of others among the acquired voices are identified using this difference in the sound pressure ratio. More specifically, a threshold value for the ratio between the sound pressure at the first microphone 11a and the sound pressure at the third microphone 11c is set. A voice with a sound pressure ratio larger than the threshold value is determined to be the voice of the wearer, and a voice with a sound pressure ratio smaller than the threshold value is determined to be the voice of another person.
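For illustration only, a minimal Python sketch of this thresholding follows. The RMS-based sound pressure measure, the function names, and the threshold value of 2.0 are assumptions made for the sketch; the exemplary embodiment does not fix concrete values.

    import numpy as np

    def sound_pressure(frame: np.ndarray) -> float:
        """Approximate the sound pressure of a voice frame by its RMS amplitude
        (an assumption made for this sketch)."""
        return float(np.sqrt(np.mean(frame.astype(np.float64) ** 2)))

    def identify_speaker(near_mouth_frame: np.ndarray,
                         far_from_mouth_frame: np.ndarray,
                         threshold: float = 2.0) -> str:
        """Classify a voice frame as the wearer's or another person's from the
        ratio of the sound pressure near the mouth (first microphone 11a) to
        that far from the mouth (third microphone 11c)."""
        ratio = sound_pressure(near_mouth_frame) / sound_pressure(far_from_mouth_frame)
        return "wearer" if ratio > threshold else "other"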
In addition, although the speaker identification is performed using the first and third microphones 11a and 11c in the example described above, the invention is not limited to this, and the same is true even if the second and third microphones 11b and 11c are used.
In addition, although the speaker identification is performed on the basis of the sound pressure of the voices acquired by the first and third microphones 11a and 11c in the example described above, information regarding the phase difference between the acquired voices may also be added for the identification. That is, assuming that the mouth (speaking portion) of the wearer is a sound source, the distance between the first microphone 11a and the sound source is greatly different from the distance between the third microphone 11c and the sound source, as described above. For this reason, the phase difference between the voice acquired by the first microphone 11a and the voice acquired by the third microphone 11c is large. On the other hand, assuming that the mouth (speaking portion) of a person other than the wearer (another person) is a sound source, the distance between the first microphone 11a and the sound source and the distance between the third microphone 11c and the sound source do not differ greatly, as described above. For this reason, the phase difference between the voice acquired by the first microphone 11a and the voice acquired by the third microphone 11c is small. Therefore, the accuracy of the speaker identification is improved by taking into consideration the phase difference between the voices acquired by the first and third microphones 11a and 11c.
Explanation Regarding the Face-to-Face Angle
In the present exemplary embodiment, the face-to-face angle is an angle between a wearer of the terminal apparatus 10 and a speaker facing the wearer.
Explanation Regarding a Method of Calculating the Face-to-Face Angle
Here, it is assumed that a point S is the position of the speaker, more precisely, the position of the speaking point which is the sound source of the voice of the speaker. The voice emitted from the speaking point spreads concentrically from the point S. However, since the voice spreads at the speed of sound, which is finite, the time taken for the voice to reach the first microphone 11a differs from the time taken for it to reach the second microphone 11b. As a result, a time difference Δt corresponding to the voice path difference δ occurs. Assuming that the distance between the first and second microphones 11a and 11b is D, that the distance between the point S and the midpoint C of the line connecting the two microphones is L, and that the angle between this line and the line CS is the face-to-face angle α, the following Expression (1) is satisfied.
δ = √(L² + LD cos α + D²/4) − √(L² − LD cos α + D²/4)  (1)
If L is sufficiently large compared with D (L ≫ D), the influence of the terms other than LD cos α is small. Accordingly, Expression (1) may be approximated by the following Expression (2).
δ ≅ D cos α  (2)
In addition, using the speed of sound c and the time difference Δt, the following Expression (3) is satisfied.
δ = cΔt  (3)
That is, the face-to-face angle α may be calculated using Expressions (2) and (3): the face-to-face angle α, which is the angle at which the wearer and the speaker face each other, may be calculated on the basis of the time difference Δt with which the voice of the speaker reaches the first and second microphones 11a and 11b, which are two voice acquisition units, and the distance D between the first and second microphones 11a and 11b.
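For illustration only, the following Python sketch computes the face-to-face angle from Expressions (2) and (3); the speed-of-sound constant, the function names, and the example numbers are assumptions. It also evaluates Expression (1) directly to show numerically that the approximation in Expression (2) is close when L is large compared with D.

    import math

    SPEED_OF_SOUND = 343.0  # m/s, an assumed room-temperature value

    def path_difference_exact(L: float, D: float, alpha: float) -> float:
        """Voice path difference delta per Expression (1)."""
        return (math.sqrt(L * L + L * D * math.cos(alpha) + D * D / 4)
                - math.sqrt(L * L - L * D * math.cos(alpha) + D * D / 4))

    def face_to_face_angle(delta_t: float, D: float) -> float:
        """Face-to-face angle alpha (radians) from Expressions (2) and (3):
        delta = c * delta_t and delta ~= D * cos(alpha), so cos(alpha) = c * delta_t / D."""
        cos_alpha = SPEED_OF_SOUND * delta_t / D
        # Clamp to the valid range to guard against measurement noise.
        cos_alpha = max(-1.0, min(1.0, cos_alpha))
        return math.acos(cos_alpha)

    # Checking the approximation: with L = 1.5 m and D = 0.15 m at alpha = 60 degrees,
    # Expression (1) gives ~0.0749 m while D * cos(alpha) gives 0.0750 m.
    print(path_difference_exact(1.5, 0.15, math.radians(60)))
    print(0.15 * math.cos(math.radians(60)))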
In addition, the time difference Δt when the voice of the speaker reaches the first and second microphones 11a and 11b may be calculated as follows.
Here, the horizontal axis indicates the data number given to the 5,000 sampled data points, and the vertical axis indicates the amplitude of the voice of the speaker. The solid line indicates the waveform signal of the voice of the speaker reaching the first microphone 11a, and the dotted line indicates the waveform signal of the voice reaching the second microphone 11b.
In the present exemplary embodiment, the cross-correlation function of these two waveform signals is calculated. That is, one waveform signal is held fixed while the other is shifted, and the sum of products is calculated at each shift amount.
In the calculated cross-correlation function, a peak appears at a certain shift amount, and the time difference Δt is obtained from the shift amount at which the cross-correlation function takes its maximum value.
Moreover, in the present exemplary embodiment, the voice signal is divided into predetermined frequency bands, and a large weight is given to the frequency band with the largest amplitude when calculating the cross-correlation function; the time difference Δt calculated in this manner is more accurate. In addition, in order to calculate the time difference Δt more accurately, it is preferable that the distance between the first and second microphones 11a and 11b fall within the range of 1 cm to 100 cm. If the distance between the first and second microphones 11a and 11b is less than 1 cm, the time difference Δt becomes too small, and the error of the face-to-face angle derived from it tends to be large. If the distance is larger than 100 cm, the influence of reflected voices on the derived time difference Δt increases; moreover, the cross-correlation function must then be calculated over a longer time width, which increases the calculation load.
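For illustration only, the following Python sketch shows the peak-picking step, assuming uniformly sampled signals; plain cross-correlation via numpy.correlate is used, and the frequency-band weighting described above is omitted. The function name and signature are hypothetical.

    import numpy as np

    def estimate_time_difference(sig_a: np.ndarray, sig_b: np.ndarray,
                                 sample_rate: float) -> float:
        """Estimate the arrival-time difference (seconds) between two microphone
        signals from the peak of their cross-correlation function: one signal is
        held fixed while the other is shifted, and the sum of products is taken.
        A positive result means sig_a arrives later than sig_b."""
        corr = np.correlate(sig_a, sig_b, mode="full")
        # Zero lag sits at index len(sig_b) - 1 of the full correlation output.
        lag = int(np.argmax(corr)) - (len(sig_b) - 1)
        return lag / sample_rate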
Explanation Regarding the Wearing State of a Terminal Apparatus
Meanwhile, when the wearer wears the terminal apparatus 10, the front and back of the body 30 may be reversed from the intended state.
Here, in the first wearing state, the third microphone 11c faces the outside of the wearer and the fourth microphone 11d faces the inside (the wearer's side).
On the other hand, in the reversed wearing state, the fourth microphone 11d faces the outside of the wearer and the third microphone 11c faces the inside.
In addition, when the terminal apparatus 10 is worn in the reversed state, the positional relationship in the horizontal direction between the first and second microphones 11a and 11b is also reversed when viewed from the wearer.
Moreover, in the present exemplary embodiment, as described above, the strap 40 is connected to the body 30 in a state where both ends of the strap 40 are separated from each other by a predetermined distance in the horizontal direction of the body 30. Therefore, the positional relationship in the horizontal direction between the microphones 11a and 11b provided in the strap 40 and the front-and-back orientation of the microphones 11c and 11d placed on both surfaces of the body 30 are associated with each other; the body 30 alone is unlikely to rotate independently of the strap 40. Accordingly, when the third microphone 11c faces the outside of the wearer, the first microphone 11a is located on the left side and the second microphone 11b on the right side when viewed from the wearer (the first wearing state), and when the fourth microphone 11d faces the outside, this left-right relationship is reversed (the reversed wearing state).
In the present exemplary embodiment, therefore, the positional relationship of the microphones 11a, 11b, 11c, and 11d is checked to determine to which of the two wearing states the current state corresponds.
Explanation Regarding a Voice Analysis Unit
The voice analysis unit 15 includes a sound pressure comparison section 151, a voice signal selection section 152, a positional relationship determination section 153, a speaker identification section 154, and a face-to-face angle output section 155.
The terminal apparatus 10 operates in the following flow.
First, the microphones 11a, 11b, 11c, and 11d acquire the voice of the speaker (step 101). Then, the first to fourth amplifiers 13a to 13d amplify voice signals from the microphones 11a to 11d, respectively (step 102).
Then, the amplified voice signals are transmitted to the voice analysis unit 15, and the sound pressure comparison section 151 compares the sound pressure of the voice acquired by the third microphone 11c with the sound pressure of the voice acquired by the fourth microphone 11d (step 103). Then, on the basis of this comparison result of the sound pressure comparison section 151, the voice signal selection section 152 determines that the one of the third and fourth microphones 11c and 11d with the larger sound pressure faces the outside of the wearer, and selects the information regarding the voice signal of the voice acquired by that microphone (step 104).
That is, given the placement of the third and fourth microphones 11c and 11d, voice acquisition is performed more satisfactorily when a microphone faces the outside of the wearer than when it faces the inside, and therefore the sound pressure is higher at the microphone facing the outside. Accordingly, when the sound pressure of the voice acquired by the third microphone 11c is larger than the sound pressure of the voice acquired by the fourth microphone 11d, it may be determined that the third microphone 11c faces the outside, that is, that the terminal apparatus 10 is worn in the first wearing state.
In contrast, when the sound pressure of the voice acquired by the fourth microphone 11d is larger than the sound pressure of the voice acquired by the third microphone 11c, it may be determined that the fourth microphone 11d faces the outside, that is, that the terminal apparatus 10 is worn in the reversed wearing state.
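For illustration only, a minimal Python sketch of this front-and-back determination follows; the RMS sound pressure measure (repeated here from the earlier sketch for self-containment) and the state labels are assumptions made for the sketch.

    import numpy as np

    def sound_pressure(frame: np.ndarray) -> float:
        """RMS amplitude as a stand-in for sound pressure (an assumption)."""
        return float(np.sqrt(np.mean(frame.astype(np.float64) ** 2)))

    def determine_wearing_state(third_mic_frame: np.ndarray,
                                fourth_mic_frame: np.ndarray):
        """Compare the sound pressures of the two body-mounted microphones and
        return the inferred wearing state plus the selected (outside) signal."""
        if sound_pressure(third_mic_frame) >= sound_pressure(fourth_mic_frame):
            return "third_outside", third_mic_frame    # 11c faces the outside
        return "fourth_outside", fourth_mic_frame      # 11d faces the outside (reversed)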
Then, on the basis of the determination result of the sound pressure comparison section 151, the positional relationship determination section 153 determines the positional relationship between the microphones 11a and 11b (step 105). That is, the positional relationship between the microphones 11a and 11b is one of the two cases described above and is determined by which of the third and fourth microphones 11c and 11d faces the outside of the wearer.
Then, the speaker identification section 154 identifies whether the speaker is the wearer or another person (speaker identification) (step 106). In this case, between the microphones 11c and 11d placed in the body 30, the microphone which faces the outside of the wearer and whose signal is selected by the voice signal selection section 152 is used. If the microphone facing the inside of the wearer were used, the voice acquisition state would be degraded and a correct speaker identification result might not be obtained. Therefore, the sound pressure of the voice acquired by the third microphone 11c is used in the first wearing state, and the sound pressure of the voice acquired by the fourth microphone 11d is used in the reversed wearing state.
In addition, when the speaker identification section 154 identifies that the speaker is the wearer (that is, when the speaker identification section 154 determines that the speaker is not another person) (No in step 107), the process returns to step 101. On the other hand, when the speaker identification section 154 identifies that the speaker is another person (Yes in step 107), the face-to-face angle output section 155 first calculates the time difference Δt with which the voice of the speaker reaches the first and second microphones 11a and 11b, using the cross-correlation method described above (step 108), and then calculates the face-to-face angle α from the time difference Δt and the distance D between the first and second microphones 11a and 11b using Expressions (2) and (3) (step 109).
Then, information regarding a voice signal of the voice including the information of the face-to-face angle α or the speaker identification result is output to the host apparatus 20 through the data transmission unit 16 (step 110). In this case, the data transmission unit 16 selects and transmits the information regarding a voice signal of the voice acquired by the microphone placed on the surface that faces the outside of the wearer. In addition, for the microphones 11a and 11b, information regarding the voice signal of the voice is transmitted corresponding to the positional relationship in the horizontal direction determined by the positional relationship determination section 153.
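For illustration only, the following sketch mirrors the flow of steps 101 to 110, assuming that the hypothetical helpers from the earlier sketches (sound_pressure, identify_speaker, estimate_time_difference, face_to_face_angle, and determine_wearing_state) are in scope; the amplification and data transmission steps are abstracted away.

    def process_frame(mic_a, mic_b, mic_c, mic_d, sample_rate, mic_distance):
        """One pass over simultaneously acquired voice frames, mirroring
        steps 101 to 110 of the flow described above (assumed helpers)."""
        # Steps 103 to 104: determine the wearing state from the body-mounted
        # microphones and select the signal of the microphone facing outside.
        state, outside_frame = determine_wearing_state(mic_c, mic_d)
        # Step 105: in the reversed wearing state, the left-right order of the
        # strap microphones 11a and 11b is swapped when viewed from the wearer.
        if state == "fourth_outside":
            mic_a, mic_b = mic_b, mic_a
        # Steps 106 to 107: speaker identification by the sound pressure ratio.
        if identify_speaker(mic_a, outside_frame) == "wearer":
            return {"state": state, "speaker": "wearer"}
        # Steps 108 to 109: arrival-time difference and face-to-face angle.
        delta_t = estimate_time_difference(mic_a, mic_b, sample_rate)
        alpha = face_to_face_angle(delta_t, mic_distance)
        # Step 110: the result would be passed to the data transmission unit 16.
        return {"state": state, "speaker": "other", "face_to_face_angle": alpha}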
With the voice analysis system 1 described above in detail, the speaker identification result and the face-to-face angle information may be used as information for determining the relationship between the wearer and the speaker.
Here, as the relationship between the wearer and the speaker, for example, the communication relationship between them is determined. For example, if the wearer and the speaker are located close to each other and the face-to-face angle information indicates that they face each other, it is highly likely that the wearer and the speaker are in conversation. If they face away from each other, it is highly likely that they are not in conversation. In practice, other information, such as the timing or interval of acquisition of the voices of the speaker and the wearer, is also used for this determination. In addition, as the relationship between the wearer and the speaker, a relationship in which one person looks down at the other from above may also be determined using the face-to-face angle in the vertical direction. Moreover, the positional relationship between plural persons in a conversation may be determined on the basis of the information from the plural terminal apparatuses 10.
In addition, although the speaker identification and the output of the face-to-face angle are performed by the terminal apparatus 10 in the example described above, the invention is not limited to this, and the speaker identification and the output of the face-to-face angle may also be performed by the host apparatus 20. In the voice analysis system 1 of this form, the functions of the speaker identification section 154 and the face-to-face angle output section 155 performed in the voice analysis unit 15 are performed in the data analysis unit 23 of the host apparatus 20, for example, rather than in the voice analysis unit 15 of the terminal apparatus 10.
Explanation Regarding a Program
The processing performed by the terminal apparatus 10 in the present exemplary embodiment, which has been described above, is realized by cooperation between software and hardware resources. That is, the processing is realized when a CPU of a control computer provided in the terminal apparatus 10 executes a program that implements the functions of the terminal apparatus 10.
Therefore, the processing performed by the terminal apparatus 10 described above may also be understood as a program causing a computer to realize: a function of comparing the sound pressure of a voice acquired by the voice acquisition unit placed on one surface of the plate-shaped body with the sound pressure of a voice acquired by the voice acquisition unit placed on the other surface and determining the larger sound pressure; and a function of selecting information regarding a voice signal associated with the larger sound pressure.
The foregoing description of the exemplary embodiments of the present invention has been provided for the purpose of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention is defined by the following claims and their equivalents.