This application claims priority from European Patent Application No. 19166291.5 filed on Mar. 29, 2019. The content of this application is incorporated herein by reference in its entirety.
This disclosure relates to a hearing device comprising a vibration sensor configured to detect a vibration at least partially caused by an own voice activity of a user, according to the preamble of claim 1. The disclosure further relates to a method of operating a hearing device comprising such a vibration sensor, according to the preamble of claim 15.
Hearing devices may be used to improve the hearing capability or communication capability of a user, for instance by compensating a hearing loss of a hearing-impaired user, in which case the hearing device is commonly referred to as a hearing instrument such as a hearing aid, or hearing prosthesis. A hearing device may also be used to produce a sound in a user's ear canal. Sound may be communicated by a wire or wirelessly to a hearing device, which may reproduce the sound in the user's ear canal. For example, earpieces such as earbuds, earphones or the like may be used to generate sound in a person's ear canal. Furthermore, hearing devices may be employed as hearing protection devices that suppress or at least substantially attenuate loud sounds and noises that could harm or even damage the user's sense of hearing. Hearing devices are often employed in conjunction with communication devices, such as smartphones, for instance when listening to sound data processed by the communication device and/or during a phone conversation operated by the communication device. More recently, communication devices have been integrated with hearing devices such that the hearing devices at least partially comprise the functionality of those communication devices.
Identifying an own voice activity of a user of the hearing device can be desirable for a number of reasons. For instance, an occlusion of an ear of the user can provoke an unnatural perception of a sound associated with the own voice activity. Occlusion occurs when an inner region of an ear canal is at least partially sealed from an ambient environment outside the ear canal. For instance, an otoplastic or other hearing device component inserted into the ear canal can provoke such a sealing. As a consequence of the sealing, an acoustic connection between the inner region of the ear canal and the ambient environment outside the ear canal can be strongly reduced or cut off such that little or no pressure equalisation in between the isolated regions can take place. The occlusion effect can then be caused by bone-conducted vibrations reverberating in the sealed inner region of the ear canal, so that speaking, chewing, body movement, heart beat or the like may create echoes or reverberations in the inner region. Those reverberations can add to an airborne sound produced by the own voice activity and even dominate the sound perception of the user. The user then may perceive “hollow” or “booming” echo-like sounds during the own-voice activity and/or the user may perceive his own voice as too loud. After identifying the own voice activity, the occlusion effect can be at least partially mitigated, for instance by an appropriate processing of an audio signal for reproducing a sound of the user's own voice and/or by activating a venting of the ear canal to the ambient environment. Own voice detection can also be desirable to recognize a situation in which the user is involved in a conversation or intends to communicate. 
Identifying such a hearing situation can be useful to adjust the audio processing or other hearing device parameters accordingly, for instance to provide a certain directionality of a beamformer or an ambient sound level particularly suitable for the user's communication, as compared to other hearing situations such as streaming a television program or listening to music. The own voice activity can include any deliberately caused vibration of the user's vocal cords, for instance speech or coughing by the user.
Own voice detection can be applied in a hearing device to identify an own voice activity of the user. Various solutions for own voice detection have been proposed. Some solutions rely on a signal analysis of a sound signal detected by a microphone outside the ear canal. U.S. Pat. No. 8,477,973 B2 discloses two microphones arranged at different locations of the ear and an adaptive filter to process a difference signal of the signals obtained by the two microphones, wherein the presence of the wearer's own voice is determined by a comparison of the difference signal with the signal obtained by one of the microphones. U.S. Pat. No. 9,584,932 B2 proposes computing a difference between an audio signal picked up by an ear canal microphone and a filtered audio signal obtained from a signal processing unit after recording by an ambient microphone in order to identify the presence of an own voice sound. U.S. Pat. No. 9,271,091 B2 discloses a method of own voice shaping by estimating an ambient sound portion and an own voice sound portion from audio signals recorded by an outer microphone and an ear canal microphone and adding the sound portions after a separate signal processing. Other solutions are based on picking up the user's voice transmitted from the user's vocal cords to the ear canal wall via bone conduction through the user's head. For this purpose, a bone conductive microphone or a pressure sensor, as disclosed in European Patent No. EP 2 699 021 B1, may be employed to probe the bone conducted signal. U.S. Pat. No. 9,313,572 B2 discloses a voice activity detector comprising microphones and an inertial sensor for detecting a voiced speech of the user by computing a coincidence of the speech included in the audio signal detected by the microphone and of the vibration of the user's vocal cords detected by the inertial sensor.
Those prior art solutions can require a rather processing-intensive analysis of a detection signal recorded by the microphones or bone vibration detectors, which impedes a desirable quick identification of the own voice activity. Moreover, the reliability of own voice detection can be compromised by the detector position, for instance when a good contact of the bone vibration probe to the irregular shape of an ear canal wall is required.
It is an object of the present disclosure to avoid at least one of the above mentioned disadvantages and to provide a hearing device and/or a method of operating the hearing device allowing a detection of the user's own voice activity in a rather uncomplicated manner, in particular such that a signal processing required for the own voice detection can be kept less intensive. It is another object to enable the own voice detection in a rather time efficient manner, in particular such that an occurrence of an own voice activity can be determined with a minimum delay. It is a further object to increase the reliability of own voice detection, in particular such that the likelihood of false detections and/or missing detections of the own voice activity can be reduced. It is another object to provide enhancement of additional own voice detection devices and/or methods, in particular such that the additional own voice detection can be equipped with increased reliability. It is yet another object to enable the own voice detection in a manner allowing recognition of a content of a speech of the user, in particular to increase the reliability of current speech recognition techniques.
At least one of these objects can be achieved by a hearing device comprising the features of patent claim 1 and/or by a method of operating a hearing device comprising the features of patent claim 15. Advantageous embodiments of the invention are defined by the dependent claims.
The present disclosure proposes a hearing device configured to be worn at least partially at a head of a user. The hearing device comprises a vibration sensor configured to detect a vibration conducted through the user's head to the hearing device, wherein at least a part of the vibration is caused by an own voice activity of the user. The vibration sensor is configured to output a vibration signal comprising information about said vibration. The hearing device further comprises a processor communicatively coupled to the vibration sensor. The processor is configured to determine a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic. The own voice characteristic is indicative of said part of the vibration caused by the own voice activity. The processor is further configured to identify the own voice activity based on an identification criterion comprising said presence of the own voice characteristic in the vibration signal at the associated vibration frequency.
Own voice detection based on such an identification criterion of the own voice activity can be implemented in a rather processing efficient manner such that the own voice activity may be identified with small delay. Beyond that, the detection reliability can be enhanced, in particular when the identification criterion incorporates further conditions, as further detailed below, and/or when such an own voice identification is complemented by another own voice detection technique. The identification criterion can also be employed for speech recognition, as further described below.
Independently, the present disclosure proposes a binaural hearing system comprising said hearing device as a first hearing device, and further comprising a second hearing device.
Independently, the present disclosure proposes a method of operating a hearing device configured to be worn at least partially at a head of a user. The method comprises detecting a vibration conducted through the user's head to the hearing device, wherein at least a part of the vibration is caused by an own voice activity of the user. The method further comprises providing a vibration signal comprising information about said vibration. The method further comprises determining a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with the own voice characteristic, wherein the own voice characteristic is indicative of said part of the vibration caused by the own voice activity. The method further comprises identifying the own voice activity based on an identification criterion comprising said determined presence of the own voice characteristic in the vibration signal at the associated vibration frequency.
Independently, the present disclosure also proposes a non-transitory computer-readable medium storing instructions that, when executed by a processor, cause a hearing device to perform operations of this method.
In some implementations, the hearing device can comprise at least one of the subsequently described features. Each of those features can be provided solely or in combination with at least another feature. Each of those features can be correspondingly provided in some implementations of the binaural hearing system and/or of the method of operating the hearing device and/or of the computer-readable medium.
The processor can be configured to determine a signal feature of the vibration signal. The signal feature can comprise at least one frequency dependent characteristic of the vibration signal. In some implementations, the determining the presence of the own voice characteristic at the associated vibration frequency comprises simultaneous determining of a signal feature in the vibration signal and determining of a presence of the signal feature at the vibration frequency associated with the own voice characteristic. This can contribute to a processing efficient and fast detection of the own voice activity. In some implementations, the signal feature can comprise a peak in the vibration signal. The peak can be determined by a peak detection. In some implementations, the signal feature can comprise a signal level of the vibration signal.
In some implementations, the processor is configured to associate said own voice characteristic with a vibration frequency selected from a frequency range detectable by the vibration sensor. The processor can be configured to select the associated vibration frequency from the frequency range detectable by the vibration sensor. The processor can be configured to distinguish between a plurality of own voice characteristics each associated with a respective vibration frequency. The processor can be configured to determine a presence of a signal feature determined in the vibration signal at the vibration frequency associated with the own voice characteristic. In some implementations, the processor is configured to select the associated vibration frequency from a set of associated vibration frequencies. For instance, such a set of associated vibration frequencies may comprise frequencies separated from one another by a frequency difference.
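Evaluating a signal level at each frequency of such a set can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes a sampled vibration signal and uses the Goertzel recurrence, a standard single-frequency DFT technique that is not named in this disclosure. The function names, the sampling rate, the candidate frequencies and the level threshold are hypothetical.

```python
import math

def goertzel_amplitude(samples, sample_rate, freq):
    """Estimate the amplitude of `samples` at `freq` (Hz) with the Goertzel recurrence."""
    n = len(samples)
    k = round(n * freq / sample_rate)            # nearest DFT bin to the requested frequency
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2  # squared DFT magnitude at bin k
    return 2.0 * math.sqrt(max(power, 0.0)) / n

def present_frequencies(samples, sample_rate, candidate_freqs, level_threshold):
    """Return the candidate frequencies whose estimated level exceeds the threshold."""
    return [f for f in candidate_freqs
            if goertzel_amplitude(samples, sample_rate, f) > level_threshold]

# Hypothetical example: 0.2 s of a 100 Hz vibration sampled at 1 kHz.
SAMPLE_RATE = 1000
vibration = [math.sin(2.0 * math.pi * 100 * i / SAMPLE_RATE) for i in range(200)]
candidates = [100, 150, 200, 250]                # assumed set, separated by 50 Hz
detected = present_frequencies(vibration, SAMPLE_RATE, candidates, level_threshold=0.5)
```

Evaluating only the frequencies in the set, rather than a full spectrum, is one possible way to keep the processing load small.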
The vibration signal can be time dependent. In particular, the vibration signal can be provided in a time domain representing a progressing time during which the vibration signal is detected. In some implementations, the vibration signal is recorded, in particular sampled, at successive points in time during a recording time, in particular sampling time. A recording rate, in particular sampling rate, can be defined as the number of recorded values of the vibration signal per time. In some implementations, the vibration signal is provided as an analog signal.
In some implementations, the processor is configured to evaluate the vibration signal in a frequency domain comprising a spectrum of vibration frequencies. Based on this evaluation, said presence of the own voice characteristic in the vibration signal at the associated vibration frequency may be determined. For instance, the vibration signal can be evaluated in the frequency domain by determining the power spectral density (PSD) of the vibration signal. In some implementations, the processor is configured to determine said presence of the own voice characteristic at the associated vibration frequency in the time dependent vibration signal. For instance, the presence of the own voice characteristic at the associated vibration frequency may be determined by evaluating the vibration signal in the time domain with respect to zero crossings, in particular in a zero crossing analysis, and/or in a time series analysis, in particular by a dynamic time warping.
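A zero crossing analysis in the time domain might look like the following sketch. It assumes a sampled vibration signal dominated by a single frequency. The function names, the tolerance and the example signal are hypothetical and only illustrate the idea.

```python
import math

def zero_crossing_frequency(samples, sample_rate):
    """Estimate the dominant frequency (Hz) from sign changes between successive samples."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    duration = (len(samples) - 1) / sample_rate
    return crossings / (2.0 * duration)          # two zero crossings per period

def characteristic_present(samples, sample_rate, associated_freq, tolerance_hz):
    """Check whether the estimated frequency lies near the associated vibration frequency."""
    estimate = zero_crossing_frequency(samples, sample_rate)
    return abs(estimate - associated_freq) <= tolerance_hz

# Hypothetical example: 0.1 s of a 100 Hz vibration sampled at 8 kHz.
SAMPLE_RATE = 8000
vibration = [math.sin(2.0 * math.pi * 100 * i / SAMPLE_RATE + 0.3) for i in range(800)]
present = characteristic_present(vibration, SAMPLE_RATE, associated_freq=100, tolerance_hz=5)
```

Counting sign changes needs no transform into the frequency domain, but the estimate degrades when several frequency components or strong noise are present.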
In some implementations, the own voice characteristic is a first own voice characteristic in the vibration signal associated with a first vibration frequency and the processor is configured to determine a presence of a second own voice characteristic in the vibration signal at an associated second vibration frequency. The identification criterion can further comprise said presence of the second own voice characteristic in the vibration signal at the associated second vibration frequency. In this way, a reliability of the own voice detection may be enhanced. In some implementations, the first vibration frequency and the second vibration frequency can be selected to be different. The processor can be configured to determine a frequency distance of said different frequencies. The identification criterion can further comprise the frequency distance corresponding to a predetermined distance value.
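The identification criterion with two own voice characteristics and a predetermined frequency distance can be sketched as below. The function names, the tolerance and the example frequencies are assumptions chosen for illustration and are not taken from this disclosure.

```python
def frequency_distance_matches(first_freq_hz, second_freq_hz,
                               expected_distance_hz, tolerance_hz):
    """Check whether the distance of the two frequencies corresponds to the expected value."""
    distance = abs(second_freq_hz - first_freq_hz)
    return abs(distance - expected_distance_hz) <= tolerance_hz

def identification_criterion(first_present, second_present,
                             first_freq_hz, second_freq_hz,
                             expected_distance_hz, tolerance_hz=5.0):
    """Own voice is identified only if both characteristics are present at matching distance."""
    return (first_present and second_present and
            frequency_distance_matches(first_freq_hz, second_freq_hz,
                                       expected_distance_hz, tolerance_hz))

# Hypothetical example: characteristics found at 120 Hz and 242 Hz, expected distance 120 Hz.
own_voice = identification_criterion(True, True, 120.0, 242.0, expected_distance_hz=120.0)
```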
In some implementations, the processor is configured to determine a temporal sequence of the presence of the first own voice characteristic and the second own voice characteristic in the vibration signal. The identification criterion can further comprise said presence of the first own voice characteristic temporally preceding said presence of the second own voice characteristic in the vibration signal. In particular, the processor can be configured to determine a first time of said presence of the first own voice characteristic in the vibration signal at the associated first vibration frequency, and to determine a second time of said presence of the second own voice characteristic in the vibration signal at the associated second vibration frequency. The identification criterion can further comprise said first time temporally preceding said second time. The first vibration frequency and the second vibration frequency can be selected to be different or equal. In some implementations, the processor is configured to evaluate a modulation of the vibration signal, for instance in a modulation analysis, to determine said temporal sequence of the own voice characteristics. In particular, the modulation of an amplitude of the vibration signal can be evaluated. In some implementations, the processor is configured to determine a time period of said temporal sequence. The identification criterion can further comprise the time period corresponding to a predetermined time interval.
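The temporal part of the identification criterion can be sketched as follows, assuming that the presence of each own voice characteristic has already been determined frame by frame. The frame spacing, the predetermined time interval, the tolerance and all names are hypothetical.

```python
def first_occurrence(times_ms, flags):
    """Return the first time at which a characteristic is flagged present, else None."""
    for t, present in zip(times_ms, flags):
        if present:
            return t
    return None

def temporal_sequence_criterion(first_time_ms, second_time_ms,
                                expected_period_ms, tolerance_ms):
    """True if the first time precedes the second and the period matches the interval."""
    if first_time_ms is None or second_time_ms is None:
        return False
    period = second_time_ms - first_time_ms
    return period > 0 and abs(period - expected_period_ms) <= tolerance_ms

# Hypothetical per-frame detections, one frame every 10 ms.
times_ms = [0, 10, 20, 30, 40, 50]
first_flags = [False, True, True, False, False, False]
second_flags = [False, False, False, False, True, True]
t1 = first_occurrence(times_ms, first_flags)
t2 = first_occurrence(times_ms, second_flags)
sequence_ok = temporal_sequence_criterion(t1, t2, expected_period_ms=30, tolerance_ms=10)
```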
In some implementations, the hearing device comprises a microphone configured to detect a sound conducted through an ambient environment of the user and to output an audio signal comprising information about said sound. The processor can be communicatively coupled to the microphone. In some implementations, the processor is configured to determine a presence of an own voice characteristic in the audio signal at an audio frequency associated with the own voice characteristic. The own voice characteristic in the audio signal can be indicative of at least a part of said sound which is caused by the own voice activity. The identification criterion can further comprise said presence of the own voice characteristic in the audio signal at the associated audio frequency. Thus, the own voice detection reliability may be improved. In some implementations, the processor is configured to distinguish between a plurality of vibration frequencies each associated with a respective own voice characteristic in the vibration signal and between a plurality of audio frequencies each associated with a respective own voice characteristic in the audio signal. The processor can be configured to relate each audio frequency of said plurality of audio frequencies with a respective vibration frequency of said plurality of vibration frequencies. Said identification criterion can further comprise said associated vibration frequency related to said associated audio frequency.
In some implementations, the own voice characteristic is a first own voice characteristic in the audio signal associated with a first audio frequency and the processor is configured to determine a presence of a second own voice characteristic in the audio signal at an associated second audio frequency. The identification criterion can further comprise said presence of the second own voice characteristic in the audio signal at the associated second audio frequency. In some implementations, the first audio frequency and the second audio frequency can be selected to be different. The processor can be configured to determine a frequency distance of said different frequencies. The identification criterion can further comprise the frequency distance corresponding to a predetermined distance value.
In some implementations, the processor is configured to determine a temporal sequence of the presence of the first own voice characteristic and the second own voice characteristic in the audio signal. The identification criterion can further comprise said presence of the first own voice characteristic temporally preceding said presence of the second own voice characteristic in the audio signal. In particular, the processor can be configured to determine a first time of said presence of the first own voice characteristic in the audio signal at the associated first audio frequency, and to determine a second time of said presence of the second own voice characteristic in the audio signal at the associated second audio frequency. The identification criterion can further comprise said first time temporally preceding said second time. The first audio frequency and the second audio frequency can be selected to be different or equal. In some implementations, the processor is configured to evaluate a modulation of the audio signal, for instance in a modulation analysis, to determine said temporal sequence of the own voice characteristics in the audio signal. In some implementations, the processor is configured to determine a time period of said temporal sequence of own voice characteristics in the audio signal. The identification criterion can further comprise the time period corresponding to a predetermined time interval.
In some implementations, the hearing device comprises a database. The processor can be configured to retrieve the frequency associated with the own voice characteristic from the database, in particular at least one of the associated vibration frequency and the associated audio frequency. The processor can be configured to store the frequency associated with the own voice characteristic in the database, in particular at least one of the associated vibration frequency and the associated audio frequency.
In some implementations, the hearing device is configured to operate in a first mode of operation and in a second mode of operation. In the first mode of operation, the own voice activity of the user can be detected by identifying the own voice activity based on the identification criterion. In the second mode of operation, the hearing device can be prepared for the detection of the own voice activity of the user by providing a frequency associated with the own voice characteristic. Providing the frequency associated with the own voice characteristic can comprise deriving the frequency associated with the own voice characteristic by the hearing device. The derived frequency can thus be identified, in particular learned, by the hearing device such that it can then be used for determining said presence of the own voice characteristic at the derived frequency. The derived frequency can be the vibration frequency associated with the own voice characteristic in the vibration signal and/or the audio frequency associated with the own voice characteristic in the audio signal. In some implementations, the derived frequency can be stored in a database, in particular in the second mode of operation of the hearing device, such that it can be retrieved before said determining of said presence of the own voice characteristic at the associated frequency, in particular in the first mode of operation of the hearing device.
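The interplay of the two modes of operation with a database can be sketched as below. This is only an illustration: the database is modeled as an in-memory dictionary, the spectrum as a mapping from frequency to signal level, and the frequency is derived as the strongest spectral peak, which is an assumption and not prescribed by this disclosure. All names and values are hypothetical.

```python
class FrequencyDatabase:
    """Minimal in-memory stand-in for the database of associated frequencies."""
    def __init__(self):
        self._entries = {}

    def store(self, signal_kind, freq_hz):
        self._entries[signal_kind] = freq_hz

    def retrieve(self, signal_kind):
        return self._entries.get(signal_kind)

def second_mode_prepare(database, spectrum):
    """Second mode: derive the associated frequency (assumed: strongest peak) and store it."""
    derived = max(spectrum, key=spectrum.get)
    database.store("vibration", derived)
    return derived

def first_mode_detect(database, spectrum, level_threshold):
    """First mode: retrieve the frequency and test for the characteristic at that frequency."""
    freq = database.retrieve("vibration")
    if freq is None:
        return False                      # device has not been prepared yet
    return spectrum.get(freq, 0.0) > level_threshold

# Hypothetical spectra: frequency in Hz mapped to a signal level.
db = FrequencyDatabase()
unprepared = first_mode_detect(db, {120: 0.9}, level_threshold=0.5)
learned = second_mode_prepare(db, {100: 0.1, 120: 0.8, 140: 0.2})
detected = first_mode_detect(db, {100: 0.2, 120: 0.7, 140: 0.1}, level_threshold=0.5)
```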
In some implementations, the processor is configured to determine a signal feature of the vibration signal and to determine a signal feature of the audio signal. The processor can be configured to determine a similarity measure between the signal feature of the vibration signal and the signal feature of the audio signal. At least one of said determining of the presence of the own voice characteristic in the vibration signal at the associated vibration frequency and said determining of the presence of the own voice characteristic in the audio signal at the associated audio frequency can be based on the similarity measure. In some implementations, a threshold value for the similarity measure is provided. In some implementations, after the similarity measure is determined to be larger than the threshold value, the signal feature of the vibration signal can be identified as the own voice characteristic in the vibration signal and/or the signal feature of the audio signal can be identified as the own voice characteristic in the audio signal. Thus, the vibration frequency associated with the own voice characteristic can be identified as the frequency of the signal feature of the vibration signal and/or the audio frequency associated with the own voice characteristic can be identified as the frequency of the signal feature of the audio signal. In some implementations, the identification criterion further comprises the similarity measure determined to be larger than the threshold value. In some implementations, the similarity measure can comprise a correlation between the signal feature of the vibration signal and the signal feature of the audio signal, for instance a cross-correlation. In some implementations, the similarity measure can comprise a correlation between the vibration signal and the audio signal. The correlation and/or comparison can be carried out with respect to the frequency associated with the own voice characteristic.
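One possible similarity measure is a normalized correlation between signal features of the two signals, sketched below. The sketch assumes that each signal feature is available as a short sequence of levels (for instance per frame or per frequency band). The names, the example values and the threshold of 0.8 are hypothetical.

```python
import math

def normalized_correlation(feature_a, feature_b):
    """Pearson-style correlation of two equally long feature sequences, in [-1, 1]."""
    if len(feature_a) != len(feature_b) or len(feature_a) < 2:
        raise ValueError("features must have equal length of at least 2")
    mean_a = sum(feature_a) / len(feature_a)
    mean_b = sum(feature_b) / len(feature_b)
    dev_a = [x - mean_a for x in feature_a]
    dev_b = [y - mean_b for y in feature_b]
    denom = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denom == 0.0:
        return 0.0                        # a constant feature carries no correlation
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom

def features_match(vibration_feature, audio_feature, threshold=0.8):
    """True if the similarity measure is larger than the threshold value."""
    return normalized_correlation(vibration_feature, audio_feature) > threshold

# Hypothetical feature levels from the vibration sensor and the microphone.
vibration_levels = [0.1, 0.9, 0.2, 0.7]
audio_levels = [0.2, 1.0, 0.3, 0.8]       # follows the vibration feature
unrelated_levels = [0.9, 0.1, 0.8, 0.2]   # e.g. sound from another source
```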
In some implementations, in particular in a first mode of operation of the hearing device in which the own voice activity of the user is detected, employing such a similarity measure can be exploited for an enhanced reliability during detection of the own voice activity. For instance, in a situation in which at least one of the vibration signal and the audio signal includes a rather large signal to noise ratio (SNR), the similarity measure can contribute to an improved identification criterion for the own voice activity which may compensate the poorer quality of at least one of the signals. Thus, a required signal threshold for said determining of the own voice characteristic at the associated frequency may be lowered by employing the similarity measure.
In some implementations, in particular in a second mode of operation of the hearing device in which the hearing device is prepared for the detection of the own voice activity of the user, employing such a similarity measure can be exploited for deriving at least one of said vibration frequency associated with the own voice characteristic in the vibration signal and said audio frequency associated with the own voice characteristic in the audio signal. For instance, when both the associated vibration frequency and the associated audio frequency are unknown, both frequencies may be derived based on the similarity measure being larger than the threshold value. When only one of the associated vibration frequency and the associated audio frequency is unknown, that unknown frequency may be derived based on the similarity measure being larger than the threshold value.
In some implementations, the processor is configured to determine the SNR from the audio signal. The processor can be configured to derive at least one of the associated vibration frequency and the associated audio frequency when the SNR is determined to be smaller than a threshold value. For instance, in a situation in which at least one of the vibration signal and the audio signal includes a rather low SNR, the good quality of the respective signal may be exploited to derive the respective frequency associated with the own voice characteristic based on the similarity measure in the above described way. The derived frequency can thus be used for the own voice detection at a later time by determining said presence of the own voice characteristic at the derived frequency, for instance at a larger SNR of the vibration signal and/or the audio signal. The derived frequency can be stored in a database such that it can be retrieved at the later time for the determining of said presence of the own voice characteristic at the derived frequency associated with the own voice characteristic.
In some implementations, the microphone is communicatively coupled to a beamformer. The processor can be configured to steer a directionality of the beamformer toward a mouth of the user during said detection of the sound. In this way, said part of said sound caused by the own voice activity can be detected in an improved manner. Thus, the audio signal can be provided in a suitable way for an improved reliability of the own voice detection. The microphone can be included in a microphone array communicatively coupled to the beamformer. In some implementations of the binaural hearing system, the second hearing device also comprises a microphone communicatively coupled to the beamformer. By the binaural beamforming, the signal quality of the audio signal can be further improved regarding the own voice detection.
In some implementations, the processor is configured to determine an audio signal characteristic from the audio signal. The audio signal characteristic can comprise a SNR of the audio signal. The identification criterion can further comprise the SNR smaller than a threshold value of the SNR. The audio signal characteristic can comprise an intensity of the audio signal. The identification criterion can further comprise the intensity larger than a threshold value of the intensity.
In some implementations, the processor is configured to determine an intensity of the audio signal and to select at least one of said associated vibration frequency and said associated audio frequency depending on said audio signal intensity. The intensity can be indicative of a volume of said sound detected by the microphone, in particular a volume level. In this way, a frequency shift of the frequency associated with the own voice characteristic at different speech volumes of the user can be accounted for, which can be caused by the “Lombard effect”. In some implementations, the processor is configured to determine an intensity of the vibration signal and to select at least one of said associated vibration frequency and said associated audio frequency depending on said vibration signal intensity, in order to account for the Lombard effect. In some implementations, in particular in said second mode of operation of the hearing device, the processor is configured to determine an intensity of the audio signal and/or the vibration signal and to derive at least one of said associated vibration frequency and said associated audio frequency. The derived vibration frequency and/or audio frequency can then be employed during own voice detection at varying speech volumes of the user, in particular in said first mode of operation of the hearing device. The processor can be configured to store said determined intensity and said derived vibration frequency and/or audio frequency in a database, in particular in said second mode of operation of the hearing device, and to retrieve the data, in particular in said first mode of operation of the hearing device.
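Selecting the associated frequency depending on the signal intensity can be sketched as a table lookup. The table below stands in for intensities and derived frequencies stored in the second mode of operation. Its intensity bounds and frequencies are purely hypothetical values for illustration, as are the names.

```python
import bisect

# Hypothetical stored data: upper intensity bounds (dB) of the quieter ranges
# and the associated frequency (Hz) derived for each intensity range.
INTENSITY_BOUNDS_DB = [55.0, 70.0]
ASSOCIATED_FREQS_HZ = [110.0, 125.0, 140.0]   # quiet, normal, loud speech

def select_associated_frequency(intensity_db):
    """Select the associated frequency for the intensity range containing `intensity_db`."""
    index = bisect.bisect_right(INTENSITY_BOUNDS_DB, intensity_db)
    return ASSOCIATED_FREQS_HZ[index]

quiet_freq = select_associated_frequency(50.0)
normal_freq = select_associated_frequency(60.0)
loud_freq = select_associated_frequency(80.0)
```

A finer table or an interpolation between stored entries would be equally conceivable. The step-wise lookup is chosen here only for brevity.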
In some implementations, the processor is configured to determine a signal feature of at least one of the vibration signal and the audio signal, and to classify, based on a pattern of own voice characteristics, the signal feature as the own voice characteristic. The processor can be configured to derive at least one of the vibration frequency associated with the own voice characteristic in the vibration signal and the audio frequency associated with the own voice characteristic in the audio signal. In this way, the processor can be configured to learn the associated vibration frequency from the vibration signal and/or the associated audio frequency from the audio signal. In some implementations, in particular in said second mode of operation of the hearing device, the hearing device can thus be prepared for the detection of the own voice activity of the user by providing the derived frequency as the frequency associated with the own voice characteristic. For instance, the derived frequency can be stored in a database in the second mode of operation such that it can be retrieved from the database by the processor in the first mode of operation of the hearing device. In some implementations, in particular in said first mode of operation of the hearing device, the presence of the own voice characteristic can thus be determined at the derived frequency associated with the own voice characteristic and the own voice activity of the user can be detected by identifying the own voice activity based on the identification criterion.
In some implementations, the processor is configured to determine a similarity measure between the signal feature and the pattern of own voice characteristics. The signal feature can be classified as the own voice characteristic if the similarity measure is determined to be larger than a threshold value of the similarity measure. The classification can be provided by a classification algorithm, in particular a classifier, executed by the processor. The classification algorithm can comprise, for instance, a linear classifier such as a Bayes classifier.
The pattern of own voice characteristics can be determined, in particular learned, from a set of own voice characteristics that have been previously determined at an associated frequency, in particular a vibration frequency and/or audio frequency. The set of own voice characteristics can comprise, for instance, own voice characteristics determined from different users, in particular to determine the pattern of own voice characteristics common to the users. The set of own voice characteristics can comprise own voice characteristics determined at various times, in particular to determine the pattern of own voice characteristics over time. The set of own voice characteristics can comprise own voice characteristics determined at different SNR values of the signal, in particular to determine the pattern of own voice characteristics at different SNR values and/or common to different SNR values. The set of own voice characteristics can comprise own voice characteristics determined at various speech volumes of the user, in particular to determine the pattern of own voice characteristics at different speech volumes and/or common to different speech volumes. The latter pattern may be employed, for instance, when determining a presence of an own voice characteristic at the associated frequency at different speech volumes of the user causing a frequency shift by the Lombard effect, as described above. In some implementations, the pattern of own voice characteristics is provided to the processor. In particular, the pattern of own voice characteristics can be stored in a database such that it can be retrieved by the processor from the database. In some implementations, the pattern of own voice characteristics is determined and/or customized by the processor. For instance, the processor can be configured to determine and/or customize the pattern based on classifying the signal feature in the above described way.
The processor can also be configured to collect the set of own voice characteristics over time and to determine and/or customize the pattern from the set. The determined and/or customized pattern of own voice characteristics can be stored in a database such that it can be retrieved by the processor from the database.
In some implementations, the vibration signal comprises first directional data indicative of a first direction of said part of the vibration caused by the own voice activity, and second directional data indicative of a second direction of said part of the vibration caused by the own voice activity. The first direction can be different from the second direction, for instance perpendicular to the first direction. The processor can be configured to determine said presence of the own voice characteristic in the first directional data and in the second directional data. The identification criterion can further comprise a coincidence, in particular a correlation, of said presence of the own voice characteristic in the first directional data and in the second directional data. This can also contribute to a better own voice detection reliability. The vibration signal can also comprise third directional data indicative of a third direction of said part of the vibration caused by the own voice activity and the processor can be configured to determine said presence of the own voice characteristic in the third directional data. The identification criterion can further comprise a coincidence, in particular a correlation, of said presence of the own voice characteristic in the first, second, and third directional data.
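By way of a non-limiting illustration, a coincidence check between two directions may be sketched as follows. The sketch assumes that a signal level at the associated vibration frequency has already been computed per time frame for the first and the second directional data, and it uses a Pearson correlation as the correlation measure. The level values and both thresholds are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def directions_coincide(levels_first, levels_second, min_level=0.5, min_corr=0.8):
    """True if the characteristic is present in both directions and the
    per-frame levels of the two directions are correlated."""
    present_first = max(levels_first) > min_level
    present_second = max(levels_second) > min_level
    return (present_first and present_second
            and pearson(levels_first, levels_second) > min_corr)

# Hypothetical per-frame levels at the associated vibration frequency.
first_direction = [0.1, 0.2, 0.9, 1.0, 0.8, 0.2]
second_direction = [0.1, 0.3, 0.8, 0.9, 0.7, 0.1]
unrelated = [0.9, 0.1, 0.1, 0.1, 0.1, 0.9]

print(directions_coincide(first_direction, second_direction))
print(directions_coincide(first_direction, unrelated))
```

The same check could be extended to third directional data by requiring the condition for each pair of directions.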
In some implementations, the vibration sensor comprises an accelerometer. Said vibration can be detectable by the accelerometer as an acceleration measurable by the accelerometer. The accelerometer can be configured to detect said vibration in a first spatial direction and a second spatial direction. The accelerometer can thus be configured to provide the vibration signal with said first directional data and said second directional data. The accelerometer can be configured to detect said vibration in a third spatial direction and to provide the vibration signal with said third directional data. For instance, the spatial directions can correspond to the directions of a cartesian coordinate system.
In some implementations, the own voice characteristic comprises a peak of the vibration signal at the associated vibration frequency. The own voice characteristic can be detected at the associated vibration frequency by a peak detection in the vibration signal. In some implementations, the own voice characteristic comprises a minimum signal level of the vibration signal at the associated vibration frequency. The own voice characteristic can be detected by determining the signal level larger than the minimum signal level in the vibration signal at the associated vibration frequency. In some implementations, the own voice characteristic comprises a peak of the audio signal at the associated audio frequency. The own voice characteristic can be detected at the associated audio frequency by a peak detection in the audio signal. In some implementations, the own voice characteristic comprises a minimum signal level of the audio signal at the associated audio frequency. The own voice characteristic can be detected by determining the signal level larger than the minimum signal level in the audio signal at the associated audio frequency.
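By way of a non-limiting illustration, determining whether the signal level at an associated frequency exceeds a minimum signal level may be sketched as follows. The sketch uses the Goertzel algorithm, which evaluates a signal at a single frequency. This disclosure does not prescribe that algorithm, and the sampling rate of 500 Hz, the example frequencies, and the minimum level are assumed example values.

```python
import math

def goertzel_amplitude(samples, sample_rate, freq_hz):
    """Amplitude estimate of the component at freq_hz (Goertzel algorithm)."""
    n = len(samples)
    omega = 2.0 * math.pi * freq_hz / sample_rate
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return 2.0 * math.sqrt(max(power, 0.0)) / n

def has_characteristic(samples, sample_rate, freq_hz, min_level):
    """True if the level at the associated frequency exceeds the minimum level."""
    return goertzel_amplitude(samples, sample_rate, freq_hz) > min_level

# Assumed example: one second of a unit-amplitude 78 Hz tone sampled at 500 Hz.
sample_rate = 500.0
tone = [math.sin(2.0 * math.pi * 78.0 * i / sample_rate) for i in range(500)]

print(has_characteristic(tone, sample_rate, 78.0, min_level=0.5))
print(has_characteristic(tone, sample_rate, 92.0, min_level=0.5))
```

Evaluating only one frequency avoids computing a full spectrum, which may suit a low-power processor.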
In some implementations, the associated vibration frequency comprises a harmonic frequency of said part of the vibration caused by the own voice activity. In particular, the harmonic frequency can be defined as a frequency of a harmonic content of the vibration signal produced by said part of the vibration caused by the own voice activity. For instance, the harmonic frequency can be a frequency of a harmonic content of the vibration signal provided by the accelerometer. The harmonic frequency can comprise the fundamental frequency of said part of the vibration. The harmonic frequency may also comprise a higher harmonic frequency of said part of the vibration, in particular a frequency corresponding to the fundamental frequency multiplied by a positive integer. In some implementations, the processor is configured to select the associated vibration frequency such that it comprises the harmonic frequency, in particular the fundamental frequency, of said part of the vibration. Thus, the own voice detection reliability may be further improved with comparatively little signal processing effort. Selecting the fundamental frequency, as compared to a higher harmonic frequency, may provide the least error-prone determination of the own voice characteristic. In some implementations, the associated audio frequency comprises a harmonic frequency of said part of the sound caused by the own voice activity, in particular the fundamental frequency of said part of the sound. In some implementations, the processor is configured to select the associated audio frequency such that it comprises the harmonic frequency of said part of the sound. The harmonic frequency, in particular fundamental frequency, associated with the own voice characteristic may be stored in a database such that it can be retrieved by the processor from the database for determining said presence of the own voice characteristic at the associated frequency.
In particular, multiple fundamental frequencies associated with multiple own voice characteristics can be stored in the database.
In some implementations, the associated vibration frequency comprises an alias frequency of a frequency of said part of the vibration caused by the own voice activity. In some implementations, the processor is configured to select the associated vibration frequency such that it comprises said alias frequency. In some implementations, the processor can be configured to process the vibration signal at a sampling rate producing an aliasing of the vibration signal at the alias frequency. In some implementations, the vibration detector can be configured to provide the vibration signal at a sampling rate producing an aliasing of the vibration signal at the alias frequency, in particular when processed by the processor. For instance, the processing rate and/or recording rate can be less than double the rate corresponding to a vibration frequency which can be mirrored in the vibration signal at the alias frequency. In some implementations, the signal content at the alias frequency is provided by providing the vibration signal to the processor unfiltered at the mirrored frequency. For instance, an anti-aliasing filter, in particular a low pass filter, provided at an input of the processor may be configured in such a way and/or may be omitted. In this way, said presence of the own voice characteristic can be determined for higher frequencies associated with the own voice characteristic at the associated alias frequency, the higher frequencies being mirrored into a lower frequency range. Thus, the recording and/or processing of the vibration signal can be less power intensive and more processing efficient, contributing to a reduced complexity of the own voice detection.
In some implementations, a frequency range of a female voice, in particular a frequency range between 150 Hz and 250 Hz, and/or a frequency range of a child's voice, in particular a frequency range between 250 Hz and 650 Hz, is reproduced at a range of alias frequencies in the vibration signal comprising vibration frequencies in a frequency range of a male voice, in particular a frequency range below 150 Hz. In some implementations, the frequency reproduced at the alias frequency in the vibration signal corresponds to a harmonic frequency of said part of the vibration caused by the own voice activity, in particular the fundamental frequency. In some implementations, the associated audio frequency comprises an alias frequency of a frequency of said part of the sound caused by the own voice activity. In some implementations, the processor is configured to select the associated audio frequency such that it comprises said alias frequency. In some implementations, the frequency reproduced at the alias frequency in the audio signal corresponds to a harmonic frequency of said part of the sound caused by the own voice activity, in particular the fundamental frequency. The alias frequency associated with the own voice characteristic may be stored in a database such that it can be retrieved by the processor from the database for determining said presence of the own voice characteristic at the associated alias frequency. In particular, multiple alias frequencies associated with multiple own voice characteristics can be stored in the database.
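By way of a non-limiting illustration, the alias frequency at which a given frequency is reproduced after sampling may be computed as sketched below. The sketch relies on the standard folding relation for uniformly sampled real signals. The sampling rate of 300 Hz, corresponding to a Nyquist frequency of 150 Hz, and the example voice frequencies are assumed values chosen only to show a mirroring into the range below 150 Hz.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which freq_hz appears after sampling at sample_rate_hz.

    Folding relation: f_alias = |f - fs * round(f / fs)|, which always
    lies between 0 and fs / 2.
    """
    return abs(freq_hz - sample_rate_hz * round(freq_hz / sample_rate_hz))

sample_rate_hz = 300.0  # assumed example value

# Assumed example frequencies in Hz.
for label, freq_hz in [("male", 100.0), ("female", 200.0),
                       ("child", 400.0), ("child", 650.0)]:
    print(label, freq_hz, "->", alias_frequency(freq_hz, sample_rate_hz))
```

With these assumed values, 200 Hz and 400 Hz are both reproduced at 100 Hz, and 650 Hz is reproduced at 50 Hz, so that all example frequencies are evaluated in the range below 150 Hz.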
In some implementations, the processor is configured to evaluate the vibration signal at a sampling rate of at most 1 kHz. In some implementations, the vibration detector is configured to provide the vibration signal at a sampling rate of at most 1 kHz. Such a sampling rate can allow an efficient signal processing such that the own voice detection can be provided in a time efficient manner. In some implementations, such a sampling rate may exploit the aliasing of the associated vibration frequency as described above, in particular for determining a frequency range of a child's voice, to ensure a good reliability of the own voice detection. In some implementations, the processor is configured to evaluate the vibration signal at a sampling rate of at most 500 Hz. In some implementations, the vibration detector is configured to provide the vibration signal at a sampling rate of at most 500 Hz. Thus, the efficiency of the signal processing can be further improved. In some implementations, such a sampling rate may also exploit the aliasing of the associated vibration frequency, in particular for determining a frequency range of a child's voice and/or female voice. In some implementations, the processor is configured to evaluate the audio signal at a sampling rate of at most 1 kHz, in particular at most 500 Hz.
The vibration frequency associated with the own voice characteristic in the vibration signal and/or the audio frequency associated with the own voice characteristic in the audio signal can comprise a frequency bandwidth. In some implementations, determining the presence of the own voice characteristic comprises simultaneously determining a signal feature and determining a presence of the signal feature at the frequency bandwidth associated with the own voice characteristic. For instance, the processor can be configured to select the frequency bandwidth and determine the presence of the signal feature at the frequency bandwidth. In some implementations, determining the presence of the own voice characteristic comprises determining a signal feature and subsequently determining a presence of the signal feature at said frequency bandwidth. For instance, the processor can be configured to determine the presence of the signal feature and subsequently select the frequency bandwidth and determine the signal feature present at the frequency bandwidth. In some implementations, the frequency bandwidth corresponds to a width of at most 50 Hz, in particular at most 20 Hz. This can improve the reliability of the own voice detection, in particular in conjunction with a speech recognition of the user's voice.
In some implementations, the hearing device further comprises a high pass filter configured to provide the vibration signal with vibration frequencies above a cut-off frequency of at most 100 Hz, in particular at most 80 Hz. In this way, the vibration signal can be provided to the processor with a signal content in which signal artefacts, in particular artefacts of vibrations caused by a body movement of the user, are removed. Thus, the own voice detection reliability can be enhanced. In some implementations, the high pass filter is configured to provide the vibration signal to the processor with vibration frequencies above a cut-off frequency of at most 50 Hz, in particular at most 30 Hz. Such a range of vibration frequencies provided by the high pass filter can have the additional advantage to exploit the above described aliasing effect during said determining of the own voice characteristic by still allowing to remove artefacts caused by a body movement of the user. In some implementations, the cut-off frequency is at least 1 Hz.
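By way of a non-limiting illustration, a high pass filtering of the vibration signal may be sketched as follows. The sketch uses a first-order recursive filter, which is only one of many possible filter designs and is not prescribed by this disclosure. The cut-off frequency of 30 Hz, the sampling rate of 500 Hz, and the test tones of 2 Hz (standing in for a body movement) and 100 Hz (standing in for a voice component) are assumed example values.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz):
    """First-order high pass filter; returns the filtered samples."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in, prev_out = 0.0, 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def rms(values):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def gain_at(freq_hz, sample_rate=500.0, cutoff_hz=30.0, seconds=2.0):
    """Measured gain of the filter for a pure tone (second half of the
    signal only, so that the start-up transient is excluded)."""
    n = int(sample_rate * seconds)
    tone = [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    filtered = high_pass(tone, sample_rate, cutoff_hz)
    half = n // 2
    return rms(filtered[half:]) / rms(tone[half:])

print("gain at 2 Hz:", gain_at(2.0))
print("gain at 100 Hz:", gain_at(100.0))
```

A higher filter order would give a steeper transition between the attenuated and the passed frequencies at the cost of more computation.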
In some implementations, the hearing device further comprises a low pass filter configured to provide said audio signal with audio frequencies below a cut-off frequency of at most 8 kHz, in particular at most 4 kHz. In this way, the audio signal can be provided to the processor with a signal content adjusted to the own voice detection, in particular to improve the own voice detection reliability and/or a speech recognition, wherein a variety of own voice characteristics related to vowels, consonants, keywords etc. may be distinguishable. In some implementations, the low pass filter is configured to provide the vibration signal to the processor with vibration frequencies below a cut-off frequency of at most 1 kHz, in particular at most 500 Hz. Such a configuration can be particularly advantageous for own voice detection in noisy environments.
In some implementations, the processor is configured to provide a speech recognition of the user during the own voice activity. The speech recognition can be based on said identification criterion including said determined presence of said own voice characteristic at the associated frequency. The determined own voice characteristic may thus be allocated to a speech component of the user, and recognized based on the allocation. The determined own voice characteristic may also be allocated to a speech component of the user which has been recognized by another speech recognition method, and confirmed or not confirmed based on this allocation, to increase the reliability of the speech recognition. In some implementations, in particular for said speech recognition, the processor is configured to recognize, based on the identification criterion, at least one of a word and a phrase spoken by the user during said own voice activity.
In some implementations, the associated vibration frequency is selected such that the own voice characteristic is representative for a speech component characteristic of said own voice activity. In some implementations, the speech component comprises at least one of a vowel, a consonant, a voiced phoneme, an unvoiced phoneme, a syllable, and a word rate spoken by the user during the own voice activity. In some implementations, the processor is configured to select the associated vibration frequency such that it corresponds to a selected speech component, for instance at least one vowel spoken by the user. The associated vibration frequency corresponding to the selected speech component can be user specific. Such an associated vibration frequency may be derived from the processor, in particular learned by the processor, for instance in any of the above described ways. In some implementations, the processor is configured to select the associated vibration frequency from a set of associated vibration frequencies each corresponding to a different speech component, for instance a different vowel spoken by the user.
In some implementations of the binaural hearing system, the second hearing device comprises an additional vibration sensor configured to detect said vibration and to output an additional vibration signal comprising information about said vibration. Said identification criterion can further comprise a coincidence, in particular a correlation, of said presence of the own voice characteristic in the vibration signal of the first hearing device and in the additional vibration signal of the second hearing device at the associated vibration frequency. In some implementations of the binaural hearing system, the second hearing device comprises an additional microphone configured to detect said sound, and to output an additional audio signal comprising information about said sound. Said identification criterion can further comprise a coincidence, in particular a correlation, of said presence of the own voice characteristic in the audio signal of the first hearing device and in the additional audio signal of the second hearing device at the associated audio frequency.
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. The drawings illustrate various embodiments and are a part of the specification. The illustrated embodiments are merely examples and do not limit the scope of the disclosure. Throughout the drawings, identical or similar reference numbers designate identical or similar elements. In the drawings:
Referring to
Hearing device 100 may be implemented by any type of hearing device configured to enable or enhance hearing of a user wearing hearing device 100. For example, hearing device 100 may be implemented by a hearing aid configured to provide an amplified version of audio content to a user, an earphone, a cochlear implant system configured to provide electrical stimulation representative of audio content to a user, a sound processor included in a bimodal hearing system configured to provide both amplification and electrical stimulation representative of audio content to a user, or any other suitable hearing prosthesis. Different types of hearing devices can also be distinguished by the position at which a housing accommodating output transducer 110 is intended to be worn at a head of a user relative to an ear canal of the user. Hearing devices which are configured such that the housing enclosing the transducer can be worn at a wearing position outside the ear canal, in particular behind an ear of the user, can include, for instance, behind-the-ear (BTE) hearing aids. Hearing devices which are configured such that the housing enclosing the transducer can be at least partially inserted into the ear canal can include, for instance, earbuds, earphones, and hearing instruments such as receiver-in-the-canal (RIC) hearing aids, in-the-ear (ITE) hearing aids, invisible-in-the-canal (IIC) hearing aids, and completely-in-the-canal (CIC) hearing aids. The housing can be an earpiece adapted for an insertion and/or a partial insertion into the ear canal. Some hearing devices comprise a housing having a standardized shape intended to fit into a variety of ear canals of different users. Other hearing devices comprise a housing having a customized shape adapted to an ear canal of an individual user. The customized housing can be, for instance, a shell formed from an ear mould or an earpiece that is customizable in-situ by the user.
Microphone 106 may be implemented by any suitable audio detection device and is configured to detect a sound presented to a user of hearing device 100. The sound can comprise audio content (e.g., music, speech, noise, etc.) generated by one or more audio sources included in an environment of the user. The sound can also include audio content generated by a voice of the user during an own voice activity, such as a speech by the user. In particular, a vibration of the user's vocal cords during the own voice activity may produce airborne sound in the environment of the user, which is detectable as the audio signal by microphone 106. Microphone 106 is configured to output an audio signal comprising information about the sound detected from the environment of the user. Microphone 106 may be included in or communicatively coupled to hearing device 100 in any suitable manner. Output transducer 110 may be implemented by any suitable audio output device, for instance a loudspeaker of a hearing device or an output electrode of a cochlear implant system.
Vibration sensor 108 may be implemented by any suitable sensor configured to detect a vibration conducted during an own voice activity through the user's head. In particular, the vibrations can be conducted from the user's vocal cords through the bones and tissue of the head. In some implementations, sensor 108 may also be referred to as a bone vibration sensor. Vibration sensor 108 is configured to output a vibration signal comprising information about the detected vibrations. Vibration sensor 108 may be positioned at any position at the user's head allowing the detection of the vibrations conducted through the head. In some implementations, vibration sensor 108 can be positioned behind an ear of the user. For instance, vibration sensor 108 can be included in a part of a BTE or RIC hearing aid intended to be worn behind the user's ear. In some implementations, vibration sensor 108 can be positioned inside an ear canal of the user. For instance, vibration sensor 108 can be included in a part of an earbud or of a RIC or ITE or IIC or CIC hearing aid intended to be worn inside the ear canal.
In some implementations, vibration sensor 108 can be included inside a housing of the hearing device. The vibrations can be transmitted from the user's head through the housing to vibration sensor 108. In some implementations, vibration sensor 108 can be provided externally from a housing of the hearing device. In particular, vibration sensor 108 can be provided at a head surface, for instance behind the ear or inside the ear canal, to directly pick up the vibrations from the user's head. Thus, while hearing device 100 is being worn by a user, the detected vibrations are representative of the own voice activity. In some implementations, vibration sensor 108 comprises an inertial sensor, in particular an accelerometer and/or a gyroscope. The inertial sensor can be positioned inside the ear canal or at a different position at the user's head. In some implementations, vibration sensor 108 comprises a bone conductive microphone and/or a pressure sensor and/or a strain gauge to be positioned inside an ear canal as disclosed in European patent application No. EP 18195686.3, which is herewith included by reference. In some implementations, vibration sensor 108 comprises an optical sensor employing a light emitter, such as a laser diode or a LED, and a photodetector to detect the vibrations, as disclosed in U.S. patent application publication Nos. US 2018/0011006 A1 and US 2018/0011006 A1, which are herewith included by reference.
In some implementations, vibration sensor 108 is configured to output the vibration signal while microphone 106 outputs the audio signal. Both the vibration signal and the audio signal can be representative of the own voice activity. For example, the audio signal may represent audio content generated, on the one hand, by one or more audio sources included in an environment and, on the other hand, by the own voice activity, while the vibration signal may represent vibrations mostly generated by the own voice activity. As another example, the vibration signal may contain additional artefacts caused, for instance, by a movement of the user and/or impacts from the environment.
Memory 104 may be implemented by any suitable type of storage medium and may be configured to maintain (e.g., store) data generated, accessed, or otherwise used by processor 102. For example, memory 104 may maintain data representative of an own voice processing program that specifies how processor 102 processes the vibration signal and/or the audio signal. Memory 104 may also be used to maintain a database including data representative of parameters that are employed for the own voice detection. To illustrate, memory 104 may maintain data associated with own voice characteristics that can be representative for an own voice activity in the vibration signal provided by vibration sensor 108 and/or in the audio signal provided by microphone 106. The data may include values of a vibration frequency of the vibration signal and/or values of an audio frequency of the audio signal which are associated with a respective own voice characteristic in the vibration signal and/or audio signal.
Processor 102 may be configured to access the vibration signal generated by vibration sensor 108 and/or the audio signal generated by microphone 106. Processor 102 may use the vibration signal and/or the audio signal to identify an own voice activity of the user. For example, processor 102 may be configured to determine a presence of an own voice characteristic in the vibration signal at a vibration frequency associated with an own voice characteristic, and to identify the own voice activity based on an identification criterion comprising said determined presence of the own voice characteristic in the vibration signal at the associated vibration frequency. As another example, processor 102 may determine a presence of an own voice characteristic in the audio signal at an audio frequency associated with the own voice characteristic, and identify the own voice activity based on an identification criterion comprising said determined presence of the own voice characteristic in the audio signal at the associated audio frequency. These and other operations that may be performed by processor 102 are described in more detail in the description that follows. References to operations performed by hearing device 100 may be understood to be performed by processor 102 of hearing device 100.
Signal features produced in vibration signals 132-134 by the own voice activity can be visualized in frequency spectra 142-144. In the example, such a signal feature of vibration signals 132-134 produced by the pronunciation of the first vowel can be seen as a peak 147, 148, 149 visible in frequency spectra 142-144 at an associated vibration frequency of approximately 78 Hz. Signal features 147-149 each are indicative of the vibration caused by the own voice activity and thus correspond to an own voice characteristic. Own voice characteristic 147-149 is produced in each vibration signal 132-134 for the different spatial directions 122-124. Determining a presence of the own voice characteristic in vibration signals 132-134 at the associated vibration frequency can thus be exploited to provide an identification criterion for the own voice activity. On the one hand, such an identification criterion can facilitate the own voice detection, in particular to allow a faster detection. On the other hand, such an identification criterion can increase the reliability of the own voice detection, in some implementations also in conjunction with additional requisites satisfying the identification criterion.
Signal features produced in vibration signals 132-134 by the pronunciation of the second vowel can be seen in frequency spectra 152-154 as a spectral peak 157, 158, 159. Signal features 157-159 each are indicative of the vibration caused by the own voice activity and thus each correspond to an own voice characteristic. The vibration frequency associated with own voice characteristics 157-159 is approximately 92 Hz in each vibration signal 132-134 for the different spatial directions 122-124. The vibration frequency associated with own voice characteristics 147-149 produced in vibration signals 132-134 by the pronunciation of the first vowel thus differs from the vibration frequency associated with own voice characteristics 157-159 produced in vibration signals 132-134 by the pronunciation of the second vowel. This shows that the vibration frequency associated with the own voice characteristics produced in vibration signals 132-134 can depend on the content of the own voice activity, in particular the content of the user's speech. Moreover, the vibration frequency associated with the own voice characteristics generally can also depend on properties of the user. For instance, different voices of different users generally may produce an own voice characteristic associated with a different vibration frequency in the vibration signal, in particular for an own voice activity including the same content. Moreover, different speech volumes of the own voice activity, for instance when the user speaks louder due to noise occurring in the environment, can lead to a frequency shift of the vibration frequency associated with the own voice characteristic. The latter phenomenon is also known as the “Lombard effect”.
An own voice detection relying on an identification criterion comprising a presence of the own voice characteristic in vibration signals 132-134 may thus account for the occurring variations of the vibration frequency associated with the own voice characteristic in order to increase the detection reliability. Some embodiments of hearing device 100 and methods of its operation, which allow to employ such an identification criterion for own voice detection at varying vibration frequencies associated with the own voice characteristic, are addressed in the subsequent description.
In the method illustrated in
Determining the signal feature can comprise a peak detection in the vibration signal. In some implementations, the vibration signal can be evaluated in a frequency domain comprising a spectrum of vibration frequencies in order to determine the signal feature. This may imply converting a time dependent vibration signal from a time domain into the frequency domain. In some implementations, the signal feature can be determined directly from a time dependent vibration signal. To illustrate, at least one of peaks 147-149 and/or at least one of peaks 157-159 produced in vibration signals 132-134 at an associated vibration frequency may be extracted after converting at least a temporal section of vibration signals 132-134 from the time domain into the frequency domain, as illustrated in
In operation 607, a decision is made depending on an identification criterion. The identification criterion can be based on whether the signal feature is determined to be present in the vibration signal at a vibration frequency associated with an own voice characteristic. The signal feature can thus be identified as the own voice characteristic which is determined to be present at the associated vibration frequency. In some implementations, determining the presence of the own voice characteristic at the associated vibration frequency comprises simultaneously determining the signal feature in the vibration signal and determining the presence of the signal feature at the vibration frequency associated with the own voice characteristic in operation 603. In particular, the vibration signal can be evaluated at the associated vibration frequency with respect to the presence of the signal feature which is thus identified as the own voice characteristic. In some implementations, determining the presence of the own voice characteristic at the associated vibration frequency comprises the operations of determining the signal feature in operation 603, and subsequently determining the presence of the signal feature at the vibration frequency associated with the own voice characteristic. For instance, the vibration signal can be evaluated for any vibration frequency or a plurality of vibration frequencies with respect to the presence of the signal feature and then it can be determined if a vibration frequency at which the signal feature is present corresponds to the vibration frequency associated with the own voice characteristic.
To illustrate, vibration signals 132-134 may be evaluated at the vibration frequency associated with at least one of peaks 147-149 and/or at least one of peaks 157-159 in order to determine the presence of the respective peak at the associated vibration frequency, and/or vibration signals 132-134 may be first evaluated with respect to the presence of at least one of peaks 147-149 and/or at least one of peaks 157-159 and then it may be determined if the respective peak is present at the associated vibration frequency.
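The second variant described above (first detecting peaks, then checking whether a peak lies at the associated vibration frequency) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names, the naive discrete Fourier transform, the magnitude threshold, the tolerance and the synthetic 120 Hz test tone are all hypothetical choices made for the example.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate_hz):
    """Naive DFT: return (frequencies, normalised magnitudes) up to the Nyquist frequency."""
    n = len(samples)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        freqs.append(k * sample_rate_hz / n)
        mags.append(abs(acc) / n)
    return freqs, mags


def find_peaks(freqs, mags, min_magnitude):
    """Return the frequencies of local spectral maxima above min_magnitude."""
    peaks = []
    for k in range(1, len(mags) - 1):
        is_local_max = mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
        if is_local_max and mags[k] > min_magnitude:
            peaks.append(freqs[k])
    return peaks


def peak_present_at(peaks, associated_freq_hz, tolerance_hz):
    """True if any detected peak lies within tolerance_hz of the associated frequency."""
    return any(abs(p - associated_freq_hz) <= tolerance_hz for p in peaks)


# Synthetic stand-in for a temporal section of a vibration signal (assumed values).
SAMPLE_RATE_HZ = 1000.0
NUM_SAMPLES = 200
signal = [math.sin(2 * math.pi * 120.0 * i / SAMPLE_RATE_HZ)
          for i in range(NUM_SAMPLES)]

freqs, mags = magnitude_spectrum(signal, SAMPLE_RATE_HZ)
peaks = find_peaks(freqs, mags, min_magnitude=0.2)
print(peaks, peak_present_at(peaks, associated_freq_hz=120.0, tolerance_hz=5.0))
```

The first variant, in which the signal is evaluated only at the associated vibration frequency, would correspond to computing a single spectral bin instead of the whole spectrum.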
The vibration frequency associated with the own voice characteristic can comprise a frequency bandwidth. The frequency bandwidth can be selected such that it accounts for inaccuracies and/or variances of a value of the vibration frequency occurring during the detection of the vibration. In some implementations, the frequency bandwidth can be selected such that it is associated with a plurality of own voice characteristics. To illustrate, the vibration frequency can be a frequency bandwidth comprising the vibration frequency associated with at least one of peaks 147-149 and the vibration frequency associated with at least one of peaks 157-159 produced in vibration signals 132-134. The own voice activity may thus be identified depending on at least one of the own voice characteristics determined to be present at the associated vibration frequency. In some implementations, the frequency bandwidth can be selected such that it is associated with a single own voice characteristic. To illustrate, the vibration frequency associated with one own voice characteristic can be a frequency bandwidth comprising the vibration frequency associated with at least one of peaks 147-149 produced in vibration signals 132-134 and not comprising the vibration frequency associated with at least one of peaks 157-159 produced in vibration signals 132-134. The vibration frequency associated with another own voice characteristic can be a frequency bandwidth comprising the vibration frequency associated with at least one of peaks 157-159 produced in vibration signals 132-134 and not comprising the vibration frequency associated with at least one of peaks 147-149 produced in vibration signals 132-134. The own voice activity may thus be identified depending on the respective own voice characteristic determined to be present at the associated vibration frequency.
Depending on the outcome of the decision performed in operation 607, a non-occurring own voice activity of the user is identified in operation 608, if the own voice characteristic has not been determined to be present in the vibration signal at the vibration frequency associated with the own voice characteristic. Conversely, an occurrence of an own voice activity of the user is identified in operation 609, if the own voice characteristic has been determined to be present in the vibration signal at the associated vibration frequency.
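The role of the frequency bandwidth in the decision of operation 607 can be illustrated with a short sketch. The class and function names and all band limits below are hypothetical; the sketch only shows the difference between one bandwidth shared by several own voice characteristics and separate bandwidths per characteristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrequencyBand:
    """Closed interval [low_hz, high_hz] of vibration frequencies."""
    low_hz: float
    high_hz: float

    def contains(self, freq_hz: float) -> bool:
        return self.low_hz <= freq_hz <= self.high_hz


def matched_characteristics(peak_freqs_hz, bands):
    """Return the names of all bands containing at least one detected peak frequency."""
    return [name for name, band in bands.items()
            if any(band.contains(f) for f in peak_freqs_hz)]


def own_voice_detected(peak_freqs_hz, bands) -> bool:
    """True corresponds to operation 609 (occurrence), False to operation 608."""
    return bool(matched_characteristics(peak_freqs_hz, bands))


# Assumed band limits for illustration only; they are not taken from the disclosure.
SEPARATE_BANDS = {
    "first_characteristic": FrequencyBand(100.0, 140.0),
    "second_characteristic": FrequencyBand(200.0, 260.0),
}
SHARED_BAND = {"any_characteristic": FrequencyBand(100.0, 260.0)}

print(matched_characteristics([120.0], SEPARATE_BANDS))
print(own_voice_detected([230.0], SHARED_BAND))
```

With separate bands the result also indicates which characteristic was found, whereas the shared band only indicates that at least one of them was present.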
To illustrate, the own voice characteristic can be produced in the vibration signal at an alias frequency of the fundamental frequency by employing a sampling rate causing an aliasing effect. Vibration sensor 108 can be configured to record the vibrations caused by the own voice activity at this sampling rate and/or to provide the vibration signal at this sampling rate. To this end, vibration sensor 108 may be configured to sample the vibrations from an analog input without applying an anti-aliasing filter (e.g. a low pass filter) in between. Vibration sensor 108 can thus be configured to produce the own voice characteristic in the vibration signal at the fundamental vibration frequency and/or at the alias vibration frequency, in particular such that aliasing components can be produced in the vibration signal. Determining the presence of the own voice characteristic at the alias vibration frequency can have the advantage of allowing vibration sensor 108 to operate at a sampling rate lower than the Nyquist rate. This can allow determining the presence of an own voice characteristic in the vibration signal exhibiting a fundamental frequency beyond the Nyquist frequency. For instance, at least one of peaks 147-149 and/or at least one of peaks 157-159 produced in vibration signals 132-134 may be produced by a pronunciation of the vowel at a fundamental frequency corresponding to the associated vibration frequency, or they can be produced by a pronunciation of the vowel at a fundamental frequency larger than the associated vibration frequency, wherein an alias frequency of the fundamental frequency corresponds to the associated vibration frequency.
For example, an own voice activity of a female voice characterized by higher vibration frequencies may thus be determined by a presence of the own voice characteristic at the alias vibration frequency of the fundamental frequency, whereas an own voice activity of a male voice characterized by lower vibration frequencies may be determined by a presence of the own voice characteristic at the fundamental frequency.
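As a worked example of the aliasing relationship, the frequency at which an unfiltered tone appears after sampling can be computed by folding it into the range between zero and half the sampling rate. The function name and the numerical values (a 300 Hz sampling rate, 110 Hz and 220 Hz fundamentals) are assumptions chosen for illustration and are not taken from the disclosure.

```python
def alias_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a tone of freq_hz appears when sampled at sample_rate_hz
    without an anti-aliasing filter (result lies in [0, sample_rate_hz / 2])."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


SAMPLE_RATE_HZ = 300.0  # assumed sensor sampling rate; Nyquist frequency is 150 Hz

# A lower fundamental below the Nyquist frequency is observed unchanged.
print(alias_frequency(110.0, SAMPLE_RATE_HZ))  # 110.0
# A higher fundamental above the Nyquist frequency is observed at its alias.
print(alias_frequency(220.0, SAMPLE_RATE_HZ))  # 80.0
```

In this sketch, a voice with a 220 Hz fundamental would be looked for at 80 Hz in the sampled vibration signal, while a voice with a 110 Hz fundamental would be looked for at 110 Hz.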
Depending on the outcome of the decision performed in operation 907, an occurrence of an own voice activity of the user is identified in operation 609 if the own voice characteristic in the vibration signal has been determined to be present at the fundamental vibration frequency associated with the own voice characteristic. Depending on the outcome of the decision performed in operation 908, an occurrence of an own voice activity of the user is identified in operation 609, if the own voice characteristic in the vibration signal has been determined to be present at the alias frequency of the fundamental vibration frequency. Conversely, a non-occurring own voice activity of the user is identified in operation 608 if the own voice characteristic in the vibration signal neither has been determined to be present at the fundamental vibration frequency after the decision in operation 907, nor at the alias vibration frequency after the decision in operation 908. The decisions according to operations 907, 908 may be performed simultaneously or in any order.
In some implementations, the decision performed in operation 907 can be omitted. Those implementations may correspond to some embodiments of the method illustrated in
In some implementations, the maintaining of data relative to the own voice characteristic in operation 702, and/or the retrieving of the associated vibration frequency from the data in operation 703, as illustrated in
The decision in operation 1007 depending on an identification criterion whether the own voice characteristic is determined to be present at the associated vibration frequency substantially corresponds to operation 607 described above, wherein the identification criterion further depends on whether the own voice characteristic is determined to be present at the first time in the vibration signal. The decision in operation 1008 depending on an identification criterion whether the own voice characteristic is determined to be present at the associated vibration frequency substantially corresponds to operation 607 described above, wherein the identification criterion further depends on whether the own voice characteristic is determined to be present at the second time in the vibration signal. Operations 1007, 1008 can be performed in any order or they can be performed simultaneously. In particular, operation 607 in the method illustrated in
In some implementations, the maintaining of data relative to the own voice characteristic in operation 702, and/or the retrieving of the associated vibration frequency from the data in operation 703, as illustrated in
In operation 1107, a decision is performed depending on an identification criterion. The identification criterion can be based on at least one of whether the own voice characteristic is determined to be present in the vibration signal at a vibration frequency associated with the own voice characteristic, and whether the own voice characteristic is determined to be present in the audio signal at an audio frequency associated with the own voice characteristic. In some implementations, determining the presence of the own voice characteristic at the associated frequency can comprise determining the signal feature in the vibration signal and/or audio signal and simultaneously determining a presence of the signal feature at the frequency associated with the own voice characteristic in at least one of operations 603, 1103. In some implementations, determining the presence of the own voice characteristic at the associated frequency can also comprise subsequent determining of a signal feature in the vibration signal and/or audio signal in at least one of operations 603, 1103 and then determining the presence of the signal feature at the frequency associated with the own voice characteristic.
In some implementations, the identification criterion can be based on a similarity measure between the signal feature determined in the vibration signal in operation 603 and the signal feature determined in the audio signal in operation 1103. Determining the similarity measure can comprise determining a comparison and/or a correlation, for instance a cross-correlation, of the vibration signal and the audio signal with respect to the frequency at which the signal feature determined in operations 603, 1103 has been determined to be present. Thus, the vibration frequency and the audio frequency at which the signal feature has been determined to be present in operations 603, 1103 can be evaluated with respect to the comparison and/or correlation. The decision in operation 1107 can be performed depending on whether the similarity measure has been determined to be large enough. In particular, the identification criterion may be provided such that the vibration frequency at which the signal feature has been determined to be present in operation 603 and the audio frequency at which the signal feature has been determined to be present in operation 1103 must be similar to a specified degree, for instance such that they are shifted by a certain frequency difference or by at most a maximum value of a frequency difference or such that they are substantially equal. When the similarity measure has been determined to be large enough, at least one of the signal feature determined in operation 603 can be identified as the own voice characteristic determined to be present in the vibration signal at the associated vibration frequency and the signal feature determined in operation 1103 can be identified as the own voice characteristic determined to be present in the audio signal at the associated audio frequency.
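One possible reading of the frequency-based similarity criterion is sketched below. The linear similarity score, the function names, the default maximum deviation and the threshold are hypothetical; a cross-correlation of the two signals, which the text also mentions, would be an alternative not shown here.

```python
def frequency_similarity(vibration_hz: float, audio_hz: float,
                         expected_shift_hz: float = 0.0,
                         max_deviation_hz: float = 10.0) -> float:
    """Return a score in [0, 1]: 1.0 when the audio peak is shifted from the
    vibration peak by exactly expected_shift_hz, 0.0 when the deviation from
    that shift reaches or exceeds max_deviation_hz."""
    deviation = abs((audio_hz - vibration_hz) - expected_shift_hz)
    return max(0.0, 1.0 - deviation / max_deviation_hz)


def is_similar_enough(vibration_hz: float, audio_hz: float,
                      threshold: float = 0.5) -> bool:
    """Decision comparable to operation 1107 under the assumed scoring."""
    return frequency_similarity(vibration_hz, audio_hz) >= threshold


print(frequency_similarity(120.0, 120.0))  # 1.0
print(frequency_similarity(120.0, 125.0))  # 0.5
print(is_similar_enough(120.0, 140.0))     # False
```

Setting `expected_shift_hz` to a non-zero value covers the case in which the two frequencies are required to be shifted by a certain frequency difference rather than to be substantially equal.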
In some implementations, at least one of the vibration frequency associated with the own voice characteristic in the vibration signal and the audio frequency associated with the own voice characteristic in the audio signal is set to a predetermined frequency. For instance, at least one of the associated vibration frequency and the associated audio frequency can be retrieved from a database by applying an operation corresponding to operation 703 illustrated in
In some implementations, the maintaining of data relative to the own voice characteristic in operation 702, and/or the retrieving of the associated vibration frequency from the data in operation 703, as illustrated in
In some implementations, an audio signal characteristic is determined from the audio signal in operation 1113. Determining the audio signal characteristic can comprise estimating a signal-to-noise ratio (SNR) of the audio signal. Determining the audio signal characteristic can comprise estimating a volume level of the audio signal, in particular a volume level of the own voice activity and/or a volume level of other sound in the environment. The determined audio signal characteristic can be employed during the decision performed in operation 1107. For instance, a significance of the signal feature determined to be present in the audio signal can depend on an estimated SNR of the audio signal. For instance, the identification criterion applied in the decision in operation 1107 may predominantly depend on whether the signal feature is determined to be present in the vibration signal at the vibration frequency associated with the own voice characteristic when the SNR is estimated to be rather low in the audio signal. In some implementations, at least one of the vibration frequency associated with the own voice characteristic in the vibration signal and the audio frequency associated with the own voice characteristic in the audio signal is set depending on the audio signal characteristic. In particular, the audio signal characteristic can comprise an estimated volume level of the audio signal and at least one of the vibration frequency associated with the own voice characteristic in the vibration signal and the audio frequency associated with the own voice characteristic in the audio signal can be set depending on the estimated volume level, in order to account for the “Lombard effect” causing a frequency shift of the detected own voice activity at different speech volumes of the user.
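Setting the associated frequency depending on the estimated volume level could, for instance, be done by interpolating between calibration points. The sketch below assumes that such points are available as pairs of volume level and frequency; the function name, the linear interpolation and the numerical values are hypothetical and only illustrate the idea of compensating a level-dependent frequency shift.

```python
def associated_frequency_for_level(level_db, calibration):
    """Linearly interpolate the associated frequency for an estimated volume level.

    calibration: iterable of (level_db, frequency_hz) pairs with distinct levels.
    Levels outside the calibrated range are clamped to the nearest point.
    """
    points = sorted(calibration)
    if not points:
        raise ValueError("calibration must contain at least one point")
    if level_db <= points[0][0]:
        return points[0][1]
    if level_db >= points[-1][0]:
        return points[-1][1]
    for (l0, f0), (l1, f1) in zip(points, points[1:]):
        if l0 <= level_db <= l1:
            return f0 + (level_db - l0) / (l1 - l0) * (f1 - f0)


# Assumed calibration: quiet speech at 55 dB -> 120 Hz, loud speech at 75 dB -> 140 Hz.
CALIBRATION = [(55.0, 120.0), (75.0, 140.0)]

print(associated_frequency_for_level(65.0, CALIBRATION))  # 130.0
print(associated_frequency_for_level(80.0, CALIBRATION))  # 140.0 (clamped)
```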
In some implementations, a speech recognition is performed in operation 1109. The speech recognition can be used to identify a content of a speech of the user during the own voice activity, for instance keywords spoken by the user. The speech recognition can employ the own voice characteristic determined in the vibration signal at the associated vibration frequency and/or the own voice characteristic determined in the audio signal at the associated audio frequency. To illustrate, peaks 147-149 and/or peaks 157-159 produced in vibration signals 132-134 may be identified as the respective vowels spoken by the user. In order to identify a plurality of vowels, consonants, words, phonemes, speech pauses, etc. successively spoken by the user, the own voice characteristic can be determined in the vibration signal and/or in the audio signal at different times, in particular by correspondingly applying operations 1003, 1004 illustrated in
In some implementations, operation 1203 of deriving the own voice characteristic can comprise determining a signal feature in operation 603 and classifying the signal feature as the own voice characteristic. In particular, classifying operation 804 based on a pattern of own voice characteristics provided in operation 805, as illustrated in
In some implementations, operation 1203 of deriving the own voice characteristic can comprise initiating a training operation for an individual user. During the training operation, the user can be instructed to perform a predetermined own voice activity. The own voice characteristic in the vibration signal that can be attributed to the own voice activity can thus be identified during operation 1203. The associated vibration frequency can thus be identified during operation 1204, in particular as the vibration frequency at which the own voice characteristic has been determined to be present in operation 1203. Initiating the training operation can comprise, for instance, instructing the user to pronounce a certain number of vowels, consonants, phonemes, words, etc. The user may also be instructed to perform the own voice activity at different volume levels.
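The outcome of such a training operation might be condensed into an associated vibration frequency and a surrounding bandwidth as sketched below. The use of the median as the associated frequency, the safety margin and the example peak frequencies are assumptions made for illustration; the disclosure does not prescribe how repeated measurements are combined.

```python
import statistics


def derive_associated_band(peak_freqs_hz, margin_hz=5.0):
    """Condense peak frequencies measured during training into
    (band_low_hz, associated_hz, band_high_hz).

    associated_hz is the median of the measurements; the band spans all
    measurements plus a margin to allow for measurement variance.
    """
    if not peak_freqs_hz:
        raise ValueError("at least one measured peak frequency is required")
    associated_hz = statistics.median(peak_freqs_hz)
    return (min(peak_freqs_hz) - margin_hz,
            associated_hz,
            max(peak_freqs_hz) + margin_hz)


# Hypothetical peak frequencies from three repetitions of the same prompted vowel.
measured = [118.0, 120.0, 123.0]
print(derive_associated_band(measured))  # (113.0, 120.0, 128.0)
```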
A decision in operation 1305 can then be performed depending on the determined similarity measure. In a situation in which the similarity measure has been determined to be larger than a similarity threshold, for instance when a correlation has been determined to be large enough, at least one of a vibration frequency associated with the own voice characteristic in the vibration signal and an audio frequency associated with the own voice characteristic in the audio signal can be identified based on the similarity measure in operation 1204. For instance, the associated vibration frequency and/or the associated audio frequency may then be selected to correspond to the vibration frequency and/or audio frequency at which the at least one of the signal features has been determined in operations 603, 1103. The associated vibration frequency and/or the associated audio frequency can then be stored in the database for own voice characteristics in operation 1209. In the contrary situation, in which the similarity measure has not been determined to be larger than the similarity threshold, the associated vibration frequency and/or the associated audio frequency cannot be identified and the database for own voice characteristics is maintained in its present state in operation 702.
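The branch taken after the decision in operation 1305 can be sketched as a conditional update of a store of own voice characteristics. The dictionary layout, the function name, the key names and the threshold value are hypothetical and stand in for whatever database the hearing device maintains.

```python
def update_own_voice_database(database, name, vibration_hz, audio_hz,
                              similarity, threshold=0.8):
    """Store the associated frequencies only if similarity exceeds the threshold.

    Returns True if an entry was stored (comparable to operation 1209) and
    False if the database was left in its present state (operation 702).
    """
    if similarity > threshold:
        database[name] = {"vibration_hz": vibration_hz, "audio_hz": audio_hz}
        return True
    return False


# Hypothetical usage with assumed values.
own_voice_db = {}
stored = update_own_voice_database(own_voice_db, "vowel_a", 120.0, 121.0,
                                   similarity=0.9)
rejected = update_own_voice_database(own_voice_db, "vowel_e", 230.0, 310.0,
                                     similarity=0.3)
print(stored, rejected, own_voice_db)
```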
In some implementations, operation 1113 of determining an audio signal characteristic, as described above in conjunction with the method illustrated
In some implementations, the hearing device is configured to operate in a first mode of operation in which an own voice activity of the user is detected and in a second mode of operation in which the hearing device can be prepared for the detection of the own voice activity. The first mode of operation may be implemented by at least one of the methods illustrated in
In some implementations, peak detector 1403 is configured for peak detection at a harmonic frequency, for instance the fundamental frequency, of the vibration detected by vibration sensor 108, as illustrated by component 1404 constituting a harmonic frequency peak detector. A determination of whether the detected peak is present at the harmonic frequency can be carried out simultaneously during peak detection, for instance by harmonic frequency peak detector 1404, or after peak detection, for instance by own voice identifier 1407. In some implementations, peak detector 1403 is configured for peak detection at an alias frequency of the vibration detected by vibration sensor 108, as illustrated by component 1405 constituting an alias frequency peak detector. A determination of whether the detected peak is present at the alias frequency can be carried out simultaneously during peak detection, for instance by alias frequency peak detector 1405, or after peak detection, for instance by own voice identifier 1407.
In some implementations, signal processing configuration 1601 further comprises a speech recognizer 1609. Speech recognizer 1609 is configured to identify a content of a speech of the user identified as an own voice activity by own voice identifier 1407. The speech recognition can be based on spectral information comprising the frequencies associated with the previously detected peaks by peak detectors 1403, 1503 and/or temporal information comprising the time interval between the detected peaks provided by modulation analyzers 1605, 1606. For instance, keywords and/or commands and/or sentences spoken by the user may be identified in such a configuration.
In some implementations, an audio signal comprising information about the multiple audio signals provided by microphones 106, 1706 is provided by beamformer 1702 to audio signal peak detector 1503 and to audio signal modulation analyzer 1606. In some implementations, the audio signal provided by microphone 106 and the audio signal provided by microphone 1706 are provided separately to audio signal peak detector 1503 and to audio signal modulation analyzer 1606. Correlator and/or comparator 1506 can be configured to correlate and/or compare the peaks detected by vibration signal peak detector 1403 in the vibration signal and the peaks detected by audio signal peak detector 1503 in the respective audio signal of both microphones 106, 1706.
While the principles of the disclosure have been described above in connection with specific devices and methods, it is to be clearly understood that this description is made only by way of example and not as limitation on the scope of the invention. The above described preferred embodiments are intended to illustrate the principles of the invention, but not to limit the scope of the invention. Various other embodiments and modifications to those preferred embodiments may be made by those skilled in the art without departing from the scope of the present invention that is solely defined by the claims.
Number | Date | Country | Kind |
---|---|---|---|
19166291.5 | Mar 2019 | EP | regional |