The present invention relates to a self-speech detection device, a voice input device, and a hearing aid having a function of distinguishing a voice uttered by a wearer from a voice coming from the outside world.
Conventionally, a device using a bone conduction microphone has been put into practical use as a device for separating and identifying self-speech, that is, the voice uttered by the wearer, from external voices.
The bone conduction microphone detects the vibration of the skull caused by the wearer's own voice through an attachment part that is in close contact with the wearer's ear canal. Since it has no sensitivity to external noise, which is an air vibration, only the wearer's self-speech sound can be picked up.
However, the bone conduction microphone requires close contact between the mounting portion and the ear canal in order to pick up the self-speech sound with high sensitivity. For this reason, the shape of the mounting portion must be matched to the shape of each wearer's ear canal, and in the case of a cell-type body in which the main body is integrated with the mounting portion, the entire cell must be replaced, so that fitting the device to each wearer has been complicated.
An object of the present invention is to provide a self-speech detection device, a voice input device, and a hearing aid that can detect self-speech with a simpler configuration.
According to the invention of claim 1 of the present application, there are provided: a mounting portion that is inserted into the ear canal of a wearer and insulates the ear canal from the outside world; a first microphone provided toward the outside world; a second microphone provided toward the ear canal; a delay unit that delays the signal of the first microphone by a predetermined time; a receiver that outputs the signal delayed by the delay unit into the ear canal; and a signal processing unit that detects the self-utterance of the wearer based on the correlation between the signal waveform of the first microphone and the signal waveform of the second microphone.
In the invention of claim 2 of this application, in the invention of claim 1, the signal processing unit is a processing unit that calculates a difference between the signal waveform of the first microphone and the signal waveform of the second microphone and detects the wearer's self-utterance based on the correlation between this difference signal waveform and the signal waveform of the first microphone.
According to the invention of claim 3 of this application, in the invention of claim 1, the signal processing unit is a processing unit that convolves the signal waveform of the first microphone delayed by the delay means with the transfer function of the propagation path, including the ear canal, from the receiver to the second microphone, calculates the difference between the convolved signal waveform and the signal waveform of the second microphone, and detects the wearer's self-utterance based on the correlation between the difference signal waveform and the signal waveform of the first microphone.
According to the invention of claim 4 of this application, the invention of claim 3 further comprises a sound source that generates a test sound signal and inputs it to the delay means, and means for calculating the transfer function of the propagation path based on the direct waveform of the test sound signal and the propagation waveform of the test sound signal output from the receiver and received by the second microphone.
According to the invention of claim 5 of this application, there are provided: a mounting portion that is inserted into the ear canal of the wearer and insulates the ear canal from the outside world; a first microphone provided toward the outside world; a second microphone provided toward the ear canal; a delay unit that delays the signal received by the first microphone by a predetermined time; a receiver that outputs the signal delayed by the delay unit into the ear canal; and a signal processing unit that convolves the signal waveform of the first microphone delayed by the delay unit with the transfer function of the propagation path, including the ear canal, from the receiver to the second microphone, extracts the difference signal waveform between this convolved signal waveform and the signal waveform of the second microphone, and outputs this difference signal waveform to the outside.
According to the invention of claim 6 of this application, the invention of claim 5 further comprises a sound source that generates a test sound signal and inputs it to the delay means, and means for calculating the transfer function of the propagation path based on the direct waveform of the test sound signal and the propagation waveform of the test sound signal output from the receiver and received by the second microphone.
According to the invention of claim 7 of this application, there are provided: a mounting portion that is inserted into the ear canal of the wearer and insulates the ear canal from the outside world; a first microphone provided toward the outside world; a second microphone provided toward the ear canal; a speech speed conversion means that expands the signal received by the first microphone on the time axis; a receiver that outputs the signal delayed by the speech speed conversion means into the ear canal; and a signal processing unit that convolves the signal waveform of the first microphone processed by the speech speed conversion means with the transfer function of the propagation path, including the ear canal, from the receiver to the second microphone, detects the self-utterance of the wearer based on the correlation between the signal waveform of the first microphone and the difference signal waveform between this convolved signal waveform and the signal waveform of the second microphone, and prohibits the operation of the speech speed conversion means when the self-utterance is detected.
According to the invention of claim 8 of this application, the invention of claim 7 further comprises a sound source that generates a test sound signal and inputs it to the delay means, and means for calculating the transfer function of the propagation path based on the direct waveform of the test sound signal and the propagation waveform of the test sound signal output from the receiver and received by the second microphone.
In the present invention, the second microphone receives not only the self-speech sound but also the external voice output from the receiver, while the first microphone receives the external voice. By correlating the signal waveform of the second microphone with the signal waveform of the first microphone and canceling the external voice component, only the self-speech sound component can be separated and extracted. The external voice received by the first microphone also includes the self-speech sound uttered from the mouth, but the same processing can still be applied.
Since the signal processing unit can be configured by a DSP or the like, the structure can be simplified even when performing complicated signal processing.
Therefore, according to the present invention, only the self-speech sound can be separated and extracted with a mounting portion that merely shields the ear canal from the outside world, a configuration far simpler than that of a bone conduction microphone.
Embodiments of the present invention will be described with reference to the drawings.
In the cell 1 are provided a microphone 5 for detecting sound coming from the outside world, a receiver 6 for outputting sound into the ear canal 4, a microphone 7 for detecting sound inside the ear canal 4, and a pipe hole 8 that spatially connects the receiver 6 and the microphone 7 with the ear canal 4. The microphone 5, the receiver 6, and the microphone 7 are connected to the signal processing unit 9. The signal processing unit 9 is composed of an electronic circuit such as a DSP; it analyzes and processes the audio signals input from the microphone 5 and the microphone 7 and outputs the processed audio signal from the receiver 6. Hereinafter, examples of various configurations of the signal processing unit 9 and their operations will be described.
FIGS. 8B and 8C are diagrams in which the level of the input waveform is modeled; in practice, actual waveforms such as those shown in the drawing are input.
On the other hand, FIGS. 9D and 9E are diagrams in which the level of the input waveform is modeled; the actual waveforms are as shown in the drawing.
Based on the difference in the correlation values described above, the utterance determination device 12 determines that the correlated waveform is due to self-speech when the peak of the correlation value appears at 0 sec, and that it is due to the external voice when the peak of the correlation value appears after t sec.
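For illustration only, this peak-timing decision can be sketched as follows in Python with NumPy, assuming the correlation is computed between the signal of the first (outer) microphone and the signal of the second (ear-canal) microphone. The function name, the tolerance value, and the assumption of equal-length digitized signals are illustrative and do not appear in the specification.

```python
import numpy as np

def classify_by_peak_lag(mic_out, mic_in, fs, delay_t, tol=0.002):
    """Classify self-speech vs. external voice from the lag of the correlation peak.

    mic_out, mic_in : equal-length 1-D arrays (outer and ear-canal microphones)
    fs              : sampling rate in Hz
    delay_t         : delay t of the delay device, in seconds
    tol             : tolerance around the expected lags, in seconds
    """
    n = len(mic_out)
    # c[k] = sum_n mic_in[n + k] * mic_out[n]  (k = lag of mic_in relative to mic_out)
    corr = np.correlate(mic_in, mic_out, mode="full")
    lags = np.arange(-(n - 1), n) / fs
    peak_lag = lags[np.argmax(np.abs(corr))]

    if abs(peak_lag) < tol:
        return "self-speech"        # correlation peak at about 0 sec
    if abs(peak_lag - delay_t) < tol:
        return "external voice"     # correlation peak at about t sec
    return "undetermined"
```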
With reference to the drawings, the signal waveform and the correlation when an external voice is received will be described first.
On the other hand, a signal waveform and a correlation when a self-speech is performed will be described with reference to the drawings. When the wearer utters, the voice signal is emitted from the mouth to the outside world and also propagates into the ear canal 4 through the body propagation path. The sound emitted to the outside world is received by the microphone 5, and the sound transmitted to the external auditory meatus 4 is received by the microphone 7. Since both propagation distances are extremely short, the two sounds are received almost simultaneously (0 sec). Therefore, the waveform of point A appears at the microphone 5 at time 0 sec, and a waveform similar to point A also appears at the microphone 7 at time 0 sec. Further, the waveform of point A is delayed by the delay device 10 by t sec and then emitted from the receiver 6 into the ear canal 4, so that the microphone 7 also receives this voice. That is, the microphone 7 receives both the wearer's own voice generated in the ear canal and the delayed voice emitted from the receiver 6, and their synthesized waveform, shown in the third row of the drawing, becomes the waveform of point C. The difference processing device 13 subtracts the point B waveform, which is the voice signal waveform input from the delay device 10, from the point C waveform, which is the voice signal waveform input from the microphone 7. Since the point C waveform is a composite of the internally transmitted waveform and the voice waveform output from the receiver 6, subtracting the point B waveform cancels the waveform component of the voice output from the receiver 6, and, as shown in the fourth row, the resulting waveform becomes similar to point A. The correlation calculation device 11 examines, in time series, how similar the point D waveform is to the reference waveform, using the point A waveform as the reference waveform, and outputs a correlation value proportional to the degree of correlation. In this example, since the waveform of point D almost coincides with the waveform of point A, a large correlation value is output at the timing of 0 sec. The utterance determination device 12 determines, from the input of this large correlation value, that a self-utterance started at the timing of 0 sec. As described above, in this example, the self-utterance of the wearer can be detected from the timing at which the correlation peak appears.
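A minimal sketch of the point A/B/C/D processing of this example is shown below, assuming digitized signals of equal length and a normalized correlation at lag 0 compared against a threshold. The function name and the threshold value are assumptions for illustration, not values given in the specification.

```python
import numpy as np

def detect_self_speech(mic_out, mic_in, fs, delay_t, threshold=0.5):
    """Sketch of the delay / difference / correlation chain described above.

    point A : signal of the microphone 5 (outer microphone)
    point B : point A delayed by t sec (signal fed to the receiver 6)
    point C : signal of the microphone 7 (ear-canal microphone)
    point D : point C minus point B (output of the difference processing)
    """
    d = int(round(delay_t * fs))
    point_a = np.asarray(mic_out, dtype=float)
    point_c = np.asarray(mic_in, dtype=float)

    # Delay device 10: shift point A by t sec (zero-padded at the start).
    point_b = np.concatenate([np.zeros(d), point_a[:len(point_a) - d]])

    # Difference processing device 13: cancel the receiver output from point C.
    point_d = point_c - point_b

    # Correlation calculation device 11: compare point D with point A at lag 0.
    denom = np.linalg.norm(point_a) * np.linalg.norm(point_d) + 1e-12
    corr0 = float(np.dot(point_a, point_d)) / denom

    # Utterance determination device 12: large correlation at 0 sec -> self-speech.
    return corr0 > threshold
```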
The examples described above detect the self-speech by directly subtracting the delayed signal of the microphone 5 from the signal of the microphone 7. In the example described next, a transfer function correction device 14 is further provided between the delay device 10 and the difference processing device 13. The external sound received by the microphone 5 is delayed by the delay device 10 and output from the receiver 6 into the ear canal 4, and the delayed signal is also convolved by the transfer function correction device 14 with the transfer function of the propagation path from the receiver 6 to the microphone 7 before being input to the difference processing device 13.
On the other hand, the microphone 7 receives the sound generated in the external auditory meatus 4 and inputs it to the difference processing device 13. The microphone 7 receives both the voice output by the receiver 6 and the voice of the wearer transmitted through the ear canal 4. The difference processing device 13 subtracts the audio signal waveform input from the transfer function correction device 14 from the audio signal waveform input from the microphone 7, thereby canceling the component output by the receiver 6. The audio signal waveform from which the output of the receiver 6 has been canceled is input to the correlation calculation device 11. The correlation calculation device 11 calculates a correlation value between the voice signal input from the microphone 5 and the voice signal input from the difference processing device 13.
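The correction-and-subtraction step can be sketched as follows, assuming the transfer function of the receiver-to-microphone path is available as a measured impulse response and the correction is realized as a time-domain convolution; these choices and the variable names are assumptions made for illustration.

```python
import numpy as np

def cancel_receiver_output(point_b, point_c, g_ir):
    """Sketch of the transfer function correction device 14 followed by the
    difference processing device 13.

    point_b : delayed signal of the microphone 5 (output of the delay device 10)
    point_c : signal of the microphone 7
    g_ir    : impulse response of the propagation path from the receiver 6
              to the microphone 7 (e.g. obtained in the transfer-function mode)
    """
    # Transfer function correction device 14: convolve the delayed signal
    # with the measured impulse response of the receiver-to-microphone path.
    corrected = np.convolve(point_b, g_ir)[:len(point_c)]

    # Difference processing device 13: subtract the corrected waveform so that
    # the component output from the receiver 6 is canceled from point C.
    return point_c - corrected
```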
With reference to the drawings, the signal waveforms when an external voice is received and when a self-speech is performed are examined in the same manner as in the preceding example.
The correlation calculation device 11 examines, in time series, how similar the point D waveform is to the reference waveform, using the point A waveform as the reference waveform, and outputs a correlation value proportional to the degree of correlation. In this example, since the waveform of point D almost coincides with the waveform of point A, a large correlation value is output at the timing of 0 sec. The utterance determination device 12 determines, from the input of this large correlation value, that a self-utterance started at the timing of 0 sec.
Although the transfer function correction device 14 is inserted between the delay device 10 and the difference processing device 13 in the device described above, the insertion position is not limited to this arrangement.
Next, FIG. 6 shows a concrete configuration example of the system of the microphone 5, the delay device 10, and the receiver 6 described above.
When the instruction device 33 gives an instruction to turn on, the control device 30 first sets the transfer function mode: the control device 30 turns on the sound source 34 and instructs the arithmetic processing device 35 to perform the transfer function operation. The test sound signal generated by the sound source 34 may be white noise or pink noise, or may be another signal, for example an impulse or a low-frequency signal having a fixed frequency. The test sound signal output from the sound source 34 is emitted into the external auditory meatus 4 via the A/D converter 37, the delay device 38, the D/A converter 39, the amplifier 40, and the receiver 6. The test sound emitted into the ear canal 4 is detected by the microphone 7 provided toward the ear canal 4. The output of the microphone 7 is input to the arithmetic processing device 35 via the amplifier 41 and the A/D converter 42. The arithmetic processing device 35 may be configured by a DSP or the like. On the other hand, the arithmetic processing device 35 is also supplied, from the A/D converter 37, with the signal obtained by converting the test sound signal generated by the sound source 34 into a digital signal. As the transfer function operation, the arithmetic processing device 35 compares the point A signal (the signal input from the A/D converter 37) with the point B signal (the signal input from the A/D converter 42) and calculates a correction coefficient according to the difference between these signals. There are various methods for obtaining the correction coefficient; for example, denoting the point A signal by A(k) and the point B signal by B(k), a function G(k) is obtained such that the error E(k) in the following equation becomes zero:

G(k)·A(k) − B(k) = E(k)

Once the function G(k) is obtained, it remains in a relatively stable state unless the device is removed from the ear canal, so it only needs to be calculated once. When the calculation is completed, the arithmetic processing device 35 enters a standby state. When the control device 30 detects that the arithmetic processing device 35 is in the standby state, the control device 30 stops the sound source 34 and instructs the arithmetic processing device 35 to perform the signal processing operation, switching to the normal operation mode.
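As one of the "various methods" mentioned above for obtaining G(k), the following sketch estimates the transfer function in the frequency domain by frame averaging of the test signals. The window length, the averaging estimator, and the function name are assumptions made only for illustration and are not taken from the specification.

```python
import numpy as np

def estimate_transfer_function(test_ref, test_meas, n_fft=512):
    """Estimate a complex frequency response G(k) that drives
    G(k)*A(k) - B(k) toward zero, from a noise-like test signal.

    test_ref  : point A signal (test signal fed toward the receiver)
    test_meas : point B signal (test sound picked up by the microphone 7)
    returns   : complex frequency response of length n_fft//2 + 1
    """
    hop = n_fft // 2
    num = np.zeros(n_fft // 2 + 1, dtype=complex)  # average of B(k) * conj(A(k))
    den = np.zeros(n_fft // 2 + 1)                 # average of |A(k)|^2
    win = np.hanning(n_fft)

    for start in range(0, min(len(test_ref), len(test_meas)) - n_fft, hop):
        a = np.fft.rfft(win * test_ref[start:start + n_fft])
        b = np.fft.rfft(win * test_meas[start:start + n_fft])
        num += b * np.conj(a)
        den += np.abs(a) ** 2

    return num / (den + 1e-12)
```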
In the normal operation mode, when an external sound is input from the microphone 5, it passes through the amplifier 36, the A/D converter 37, the delay device 38, the D/A converter 39, and the amplifier 40, and is output from the receiver 6 into the ear canal 4 after t sec. The external sound output into the external auditory meatus 4 is received by the microphone 7 provided toward the external auditory meatus 4, and is input to the arithmetic processing device 35 via the amplifier 41 and the A/D converter 42. The arithmetic processing device 35, set to the signal processing operation, convolves the transfer function G(k) obtained above with the point A signal input from the A/D converter 37, that is, the direct external voice signal, and then performs a difference calculation with the point B signal, that is, the signal including the external voice signal transmitted through the external auditory meatus 4. By this processing, the external voice component is removed from the point B signal and only the self-speech voice is extracted. If the extracted self-speech voice is connected to the voice input terminal of a communication device, a voice signal with little noise can be picked up and sent to the other party even in a place where environmental noise is large.
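A corresponding sketch of the normal-operation processing is given below; multiplication by G(k) in the frequency domain is used in place of the convolution, frame by frame with overlap-add. The frame length and the overlap scheme are illustrative assumptions.

```python
import numpy as np

def extract_self_speech(point_a, point_b, G, n_fft=512):
    """Apply the estimated G(k) to the reference signal and subtract the
    result from the ear-canal signal so that only the self-speech remains."""
    hop = n_fft // 2
    win = np.hanning(n_fft)
    out = np.zeros(min(len(point_a), len(point_b)))

    for start in range(0, len(out) - n_fft, hop):
        A = np.fft.rfft(win * point_a[start:start + n_fft])
        B = np.fft.rfft(win * point_b[start:start + n_fft])
        E = B - G * A                      # cancel the external-voice component
        frame = np.fft.irfft(E, n=n_fft)
        out[start:start + n_fft] += frame  # overlap-add of the residual frames

    return out
```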
In the above example, the output of the A/D converter 37 (point A signal) is input to the arithmetic processing device 35 as the reference signal, but the output of the delay device 38 may be input instead.
Next, a configuration in which the present invention is applied to a speech speed conversion hearing aid will be described.
When the instruction device 53 gives an instruction to turn on, the control device 50 first sets the transfer function mode: the control device 50 turns on the sound source 54 and instructs the arithmetic processing device 55 to perform the transfer function operation. The test sound signal generated by the sound source 54 may be white noise or pink noise, or may be another signal, for example an impulse or a low-frequency signal having a fixed frequency. The test sound signal output from the sound source 54 is output from the receiver 9, received by the microphone 5, and converted into an electric signal; it then passes through the A/D converter 57, the speech speed conversion/gain control device 58, the D/A converter 59, and the amplifier 60, and is emitted from the receiver 6 into the ear canal 4. The test sound emitted into the ear canal 4 is detected by the microphone 7 provided toward the ear canal 4. The output of the microphone 7 is input to the arithmetic processing device 55 via the amplifier 61 and the A/D converter 62. The arithmetic processing device 55 may be composed of a DSP or the like. On the other hand, the arithmetic processing device 55 is also supplied with the point C signal, that is, the test sound signal generated by the sound source 54 and output from the receiver 9, as processed by the speech speed conversion/gain control device 58. As the transfer function operation, the arithmetic processing device 55 compares the point C signal (the signal input from the speech speed conversion/gain control device 58) with the point B signal (the signal input from the A/D converter 62) and calculates a correction coefficient according to the difference between these signals. There are various methods for obtaining the correction coefficient; for example, denoting the reference signal (here, the point C signal) by A(k) and the point B signal by B(k), a function G(k) is obtained such that the error E(k) in the following equation becomes zero:

G(k)·A(k) − B(k) = E(k)

Once the function G(k) is obtained, it remains in a relatively stable state unless the device is removed from the ear canal, so it only needs to be calculated once. When the calculation is completed, the arithmetic processing device 55 enters the standby state. When the control device 50 detects that the arithmetic processing device 55 is in the standby state, the control device 50 stops the sound source 54 and instructs the arithmetic processing device 55 to perform the signal processing operation, switching to the normal operation mode.
In the normal operation mode, when an external sound is input from the microphone 5, it passes through the amplifier 56, the A/D converter 57, the speech speed conversion/gain control device 58, the D/A converter 59, and the amplifier 60, and is output from the receiver 6 into the ear canal 4 after t sec. The external sound output into the external auditory meatus 4 is received by the microphone 7 provided toward the external auditory meatus 4, and is input to the arithmetic processing device 55 via the amplifier 61 and the A/D converter 62. The arithmetic processing device 55, set to the signal processing operation, convolves the transfer function G(k) obtained above with the point C signal input from the speech speed conversion/gain control device 58, that is, the direct external voice signal, and then performs a difference calculation with the point B signal, that is, the signal including the external voice signal transmitted through the external auditory meatus 4. By this processing, the external voice component is removed from the point B signal and only the self-speech voice component is extracted. When the microphone 5 receives only external voice, the self-speech voice component is zero; when the wearer is speaking, a component of the corresponding level is extracted. When the arithmetic processing device 55 extracts a self-speech voice component, the control device 50 prohibits the speech speed conversion/gain control device 58 from performing the speech speed conversion and controls the gain to be small. As a result, the speech speed conversion is not applied to the self-speech, and the wearer can speak smoothly.
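The control behavior described above might be sketched as follows; the RMS threshold, the attenuation value, and the naive time-stretch placeholder are assumptions made only for illustration and do not represent the actual speech speed conversion algorithm.

```python
import numpy as np

def control_speech_speed_conversion(self_component, frame, rms_threshold=0.01,
                                    attenuation=0.3):
    """Illustrative control logic for the speech speed conversion/gain
    control device 58: when a self-speech component is detected, skip the
    speech speed conversion and reduce the gain.

    self_component : self-speech component extracted for the current frame
    frame          : current input frame from the microphone 5
    returns        : frame to be sent toward the receiver 6
    """
    if np.sqrt(np.mean(self_component ** 2)) > rms_threshold:
        # Self-speech detected: prohibit speech speed conversion, lower gain.
        return attenuation * frame
    # External voice only: apply speech speed conversion (expansion on the
    # time axis); a real device would use e.g. WSOLA or a phase vocoder.
    return apply_speech_speed_conversion(frame)

def apply_speech_speed_conversion(frame, stretch=1.3):
    """Placeholder time-stretch by naive sample repetition, for illustration only."""
    idx = np.minimum((np.arange(int(len(frame) * stretch)) / stretch).astype(int),
                     len(frame) - 1)
    return frame[idx]
```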
In the above example, the output of the speech speed conversion/gain control device 58 (point C signal) is input to the arithmetic processing device 55 as the reference signal, but the output of the A/D converter 57 (point A signal) may be input instead.
The speech speed conversion is desirably applied to the external voice, but not to the wearer's own uttered voice. When a person speaks, the person controls the utterance while listening to his or her own voice, so if the uttered speech is delayed, the person cannot speak well. Therefore, it is necessary to detect the utterance section and not apply the speech speed conversion processing to the self-utterance. In addition, since the self-speech voice sounds louder than the external voice, the gain must be controlled so as to obtain an appropriate volume.
In this way, since the self-speech and the external sound can be separated with a simple configuration, when this is applied to a speech speed conversion hearing aid, the same cell can be used regardless of the shape of the wearer's ear canal.
According to the present invention, since the external voice signal is canceled from the input signal of the second microphone provided toward the ear canal and only the self-speech sound is extracted, it is possible to extract only the self-speech voice, with external voices such as environmental noise removed, without using a bone conduction microphone, which has a complicated structure and requires delicate fitting for each wearer.
Also, when this is used as a hearing aid, the speech speed conversion can be prohibited only in the sections of self-speech, so that a hearing aid with which the wearer can speak easily can be configured.
Number | Date | Country
---|---|---
61096128 | Sep 2008 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 12555570 | Sep 2009 | US
Child | 13917079 | | US

Relation | Number | Date | Country
---|---|---|---
Parent | 17244202 | Apr 2021 | US
Child | 18425025 | | US
Parent | 16571973 | Sep 2019 | US
Child | 17244202 | | US
Parent | 13917079 | Jun 2013 | US
Child | 16571973 | | US