The present disclosure relates to audio signal processing and relates more specifically to a method and computing system for correcting a spectral shape of a voice signal measured by an audio sensor located inside an ear canal of a user of the audio system.
The present disclosure finds an advantageous application, although in no way limiting, in wearable devices such as earbuds, earphones or smart glasses used to pick up voice for a voice call established using any voice communication system.
To improve picking up a user's voice signal in noisy environments, wearable devices like earbuds or earphones are typically equipped with different types of audio sensors such as microphones and/or accelerometers.
These audio sensors are usually positioned such that at least one audio sensor, referred to as external sensor, picks up mainly air-conducted voice and such that at least another audio sensor, referred to as internal sensor, picks up mainly bone-conducted voice. Compared to an external sensor, an internal sensor picks up the user's voice with less ambient noise but with a limited spectral bandwidth (mainly low frequencies), such that the bone-conducted voice provided by the internal sensor can be used to enhance the air-conducted voice provided by the external sensor, and vice versa.
External sensors are usually air conduction sensors (e.g. microphones), while internal sensors can be either air conduction sensors or bone conduction sensors (e.g. accelerometers).
Voice signals measured by a bone conduction sensor are usually unaffected by the fit of an earbud, wherein a tight fit corresponds to substantially no gap between the earbud and the user's ear while a loose fit corresponds to the presence of a gap between the earbud and the user's ear. As long as the earbud is in contact with the skin inside the ear canal, a consistent voice signal capture is obtained with minimal ambient noise leakage.
On the other hand, voice signals captured by an internal air conduction sensor are affected by the fit of the earbud. In particular, a loose fit will usually result in a reduction in the low frequency (below approximately 600 Hertz) components due to less occlusion effect. A loose fit may also result in a boost in the mid frequency (in the range of around 600 Hertz to 1500 Hertz) components due to more resonance in the ear canal and due to increased ambient noise leakage.
The use of an Active Noise Cancellation (ANC) unit may also affect voice signals captured by an internal air conduction sensor, especially in the case of a feedback ANC unit. More specifically, the use of an ANC unit causes a reduction in the low frequency components of voice signals captured by an internal air conduction sensor, thereby reducing the occlusion effect.
In some existing solutions, audio signals from an internal sensor and an external sensor are mixed together for mitigating noise, by using the audio signal provided by the internal sensor mainly for low frequencies while using the audio signal provided by the external sensor for higher frequencies. However, in the case of loose fitting of the earbud or with an active ANC unit, the reduction of the low frequency components and/or the boost of the mid frequency components of the audio signal provided by the internal sensor eventually results in an inconsistent sounding voice in the output signal.
Audio signals from internal sensors may also be used for purposes other than mixing with audio signals from e.g. external sensors. For instance, audio signals from internal sensors may be used for voice activity detection (VAD), speech level estimation, speech recognition, etc., which are also affected by loose fitting of the earbud and/or by an active ANC unit.
The present disclosure aims at improving the situation. In particular, the present disclosure aims at overcoming at least some of the limitations of the prior art discussed above, by proposing a solution that makes it possible to mitigate the effects, on the audio signals provided by internal sensors, of loose fitting of an earbud (or earphone) and/or of an active ANC unit.
For this purpose, and according to a first aspect, the present disclosure relates to an audio signal processing method implemented by an audio system which comprises at least an internal sensor, wherein the internal sensor is an air conduction sensor located in an ear canal of a user of the audio system and arranged to measure acoustic signals which propagate internally to a head of the user, wherein the audio signal processing method comprises:
Hence, the present disclosure proposes to perform a spectral analysis of the internal audio signal produced by the internal sensor, and more specifically to compute a spectral center of an audio spectrum of the internal audio signal. Indeed, as discussed above, the presence of a loose fit of an earbud and/or of an active ANC unit results in a reduction in low frequency components due to a reduction of the occlusion effect (and possibly also in a boost of mid frequency components). Accordingly, the presence of a reduction of the occlusion effect will result in a greater value for the spectral center compared to an expected value of the spectral center with a tight fit of the earbud and an inactive ANC unit (or no ANC unit at all). The spectral center of the audio spectrum of the internal audio signal may therefore be used to evaluate a level of the occlusion effect, since the higher the spectral center the lower the occlusion effect. Since the global effects of loose fitting and/or of an active ANC unit are known (reduction of low frequency components and possibly boost of mid frequency components), the spectral center can be used to determine a spectrum shape correction filter aiming at correcting these global effects. If the spectral center corresponds substantially to the expected value (for the case with a tight fit and an inactive ANC unit), then the spectrum shape correction filter may be e.g. an identity filter (i.e. which does not modify the shape of the audio spectrum of the internal audio signal). If the spectral center is significantly greater than said expected value, then the spectrum shape correction filter may be configured to e.g. boost the low frequency components and possibly to reduce the middle/high frequency components of the internal audio signal.
In specific embodiments, the audio signal processing method may further comprise one or more of the following optional features, considered either alone or in any technically possible combination.
In specific embodiments, the spectral center is a spectral centroid or a spectral median of the audio spectrum.
In specific embodiments, determining the spectrum shape correction filter comprises comparing the spectral center with one or more predetermined thresholds.
In specific embodiments, responsive to the spectral center being greater than at least one predetermined threshold, determining the spectrum shape correction filter comprises configuring said spectrum shape correction filter to modify the audio spectrum of the internal audio signal to reduce the spectral center of said audio spectrum.
In specific embodiments, one of the one or more predetermined thresholds is between 200 Hertz and 800 Hertz, or between 300 Hertz and 600 Hertz.
In specific embodiments, the audio signal processing method further comprises:
In specific embodiments, determining the spectrum shape correction filter comprises selecting, based on the spectral center, a spectrum shape correction filter among a plurality of predetermined different spectrum shape correction filters.
In specific embodiments:
In specific embodiments, filtering the internal audio signal is performed by applying the spectrum shape correction filter in time domain or in frequency domain.
In specific embodiments, the audio system further comprises an external sensor arranged to measure acoustic signals which propagate externally to the user's head, and said audio signal processing method further comprises:
According to a second aspect, the present disclosure relates to an audio system comprising at least an internal sensor, wherein the internal sensor corresponds to an air conduction sensor to be located in an ear canal of a user of the audio system and arranged to measure acoustic signals which propagate internally to a head of the user, wherein the internal sensor is configured to produce an internal audio signal, wherein said audio system further comprises a processing circuit configured to:
According to a third aspect, the present disclosure relates to a non-transitory computer readable medium comprising computer readable code to be executed by an audio system comprising at least an internal sensor, wherein the internal sensor corresponds to an air conduction sensor to be located in an ear canal of a user of the audio system and arranged to measure acoustic signals which propagate internally to a head of the user, wherein said audio system further comprises a processing circuit, wherein said computer readable code causes said audio system to:
The invention will be better understood upon reading the following description, given as an example that is in no way limiting, and made in reference to the figures which show:
In these figures, references identical from one figure to another designate identical or analogous elements. For reasons of clarity, the elements shown are not to scale, unless explicitly stated otherwise.
Also, the order of steps represented in these figures is provided only for illustration purposes and is not meant to limit the present disclosure which may be applied with the same steps executed in a different order.
As indicated above, the present disclosure relates inter alia to an audio signal processing method 20 for mitigating the effects of loose fitting of an earbud (or earphone) and/or of an active ANC unit.
As illustrated by
The present disclosure finds an advantageous application, although non-limitative, to the case where the internal sensor 11 is an air conduction sensor. In the sequel, we assume in a non-limitative manner that the internal sensor 11 is an air conduction sensor, e.g. a microphone, to be located in an ear canal of a user and arranged towards the interior of the user's head.
In the non-limitative example illustrated by
For instance, if the audio system 10 is included in a pair of earbuds (one earbud for each ear of the user), then the internal sensor 11 is for instance arranged in a portion of one of the earbuds that is to be inserted in the user's ear, while the external sensor 12 is for instance arranged in a portion of one of the earbuds that remains outside the user's ears. It should be noted that, in some cases, the audio system 10 may comprise two or more internal sensors 11 (for instance one or two for each earbud) and/or two or more external sensors 12 (for instance one for each earbud).
As illustrated by
In some embodiments, the processing circuit 13 comprises one or more processors and one or more memories. The one or more processors may include for instance a central processing unit (CPU), a graphical processing unit (GPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), an application specific integrated circuit (ASIC), etc. The one or more memories may include any type of computer readable volatile and non-volatile memories (magnetic hard disk, solid-state disk, optical disk, electronic memory, etc.). The one or more memories may store a computer program product (software), in the form of a set of program-code instructions to be executed by the one or more processors in order to implement all or part of the steps of an audio signal processing method 20.
As illustrated by
As illustrated by
For instance, the internal audio signal may be sampled at e.g. 16 kilohertz (kHz) and buffered into time-domain frames of e.g. 4 milliseconds (ms). For instance, it is possible to apply on these frames a 128-point DCT or FFT to produce the audio spectrum up to the Nyquist frequency fNyquist, i.e. half the sampling rate (i.e. 8 kHz if the sampling rate is 16 kHz).
In the sequel, we assume in a non-limitative manner that the frequency band over which the audio spectrum of the internal audio signal is determined is composed of N discrete frequency values fn with 1≤n≤N, wherein fmin=f1 corresponds to the minimum frequency and fmax=fN corresponds to the maximum frequency, and fn-1<fn for any 2≤n≤N (i.e. the frequencies are sorted in increasing order). For instance, fmin=0 and fmax=fNyquist, but the spectral analysis of the internal audio signal may also be carried out on a frequency sub-band in [0, fNyquist]. For instance, fmin=0 and fmax is lower than or equal to 4000 Hz, or 3000 Hz, or 2000 Hz (for instance fmax=1500 Hz). It should be noted that the determination of the audio spectrum may be performed with any suitable spectral resolution. Also, the frequencies fn may be regularly spaced or irregularly spaced.
The audio spectrum SI of the internal audio signal sI corresponds to a set of values {SI(fn), 1≤n≤N}. The audio spectrum SI is a magnitude spectrum such that SI(fn) is representative of the power of the internal audio signal sI at the frequency fn. For instance, if the audio spectrum is computed by an FFT, then SI(fn) can correspond to |FFT[sI](fn)| (i.e. modulus or absolute level of FFT[sI](fn)), or to |FFT[sI](fn)|² (i.e. power of FFT[sI](fn)), etc.
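By way of non-limiting illustration, the computation of the audio spectrum of one frame may be sketched as follows. The function name `magnitude_spectrum`, the zero-padding of a 64-sample frame (4 ms at 16 kHz) to 128 points, and the use of a direct DFT instead of an optimized FFT are assumptions made for this sketch only.

```python
import cmath
import math

def magnitude_spectrum(frame, n_fft=128, fs=16000.0, power=False):
    """Return (freqs, spectrum) for DFT bins 0..n_fft/2 of a zero-padded frame.

    Hypothetical helper: a direct DFT is used for readability; an FFT
    would give the same values faster.
    """
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    freqs, spectrum = [], []
    for k in range(n_fft // 2 + 1):
        acc = 0j
        for t, x in enumerate(padded):
            acc += x * cmath.exp(-2j * math.pi * k * t / n_fft)
        mag = abs(acc)
        freqs.append(k * fs / n_fft)                  # bin frequency fn in Hz
        spectrum.append(mag * mag if power else mag)  # |.|^2 or |.|
    return freqs, spectrum

# Usage: a 4 ms frame at 16 kHz holds 64 samples. A 500 Hz tone falls
# exactly on bin 4 of a 128-point transform (bin spacing 125 Hz).
fs = 16000.0
frame = [math.sin(2 * math.pi * 500.0 * t / fs) for t in range(64)]
freqs, spec = magnitude_spectrum(frame, n_fft=128, fs=fs)
peak_freq = freqs[max(range(len(spec)), key=spec.__getitem__)]
```

With these assumed parameters the spectrum has 65 values, from 0 Hz up to the Nyquist frequency of 8 kHz.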
It should be noted that, in some embodiments, the audio spectrum can optionally be smoothed over time, for instance by using exponential averaging with a configurable time constant.
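By way of non-limiting illustration, exponential averaging of successive audio spectra may be sketched as follows. The mapping from time constant to smoothing coefficient, and the names `smoothing_coefficient` and `smooth_spectrum`, are assumptions of this sketch.

```python
import math

def smoothing_coefficient(time_constant_s, frame_period_s):
    """Hypothetical mapping: coefficient alpha of a one-pole average whose
    memory decays by a factor 1/e after time_constant_s seconds."""
    return math.exp(-frame_period_s / time_constant_s)

def smooth_spectrum(previous, current, alpha):
    """Return alpha * previous + (1 - alpha) * current, bin by bin.
    The first frame initializes the average."""
    if previous is None:
        return list(current)
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(previous, current)]

# Usage with an assumed 100 ms time constant and 4 ms frames.
alpha = smoothing_coefficient(time_constant_s=0.1, frame_period_s=0.004)
state = None
for frame_spectrum in ([1.0, 0.0], [1.0, 0.0], [0.0, 2.0]):
    state = smooth_spectrum(state, frame_spectrum, alpha)
```

A longer time constant gives a coefficient closer to 1, and therefore a smoother but slower-reacting spectrum estimate.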
As illustrated by
Basically, the spectral center is a scalar value (a frequency value) representative of how the magnitude is distributed in the audio spectrum.
In preferred embodiments, the spectral center corresponds to a spectral centroid of the audio spectrum. Basically, the spectral centroid corresponds to a center of mass of the audio spectrum and may be calculated as a weighted sum of the frequencies present in the audio spectrum, weighted by their respective associated magnitudes given by the audio spectrum and normalized by the sum of said magnitudes. With the above notations, the spectral centroid fcentroid may be computed as:
$$ f_{\mathrm{centroid}} = \frac{\sum_{n=1}^{N} f_n \, S_I(f_n)}{\sum_{n=1}^{N} S_I(f_n)} $$
According to another example, the spectral center may be a spectral median of the audio spectrum. The spectral median corresponds to a frequency for which the sum of the magnitudes for frequencies below the spectral median is substantially equal to the sum of the magnitudes for frequencies above the spectral median. With discrete frequencies, the spectral median fmedian may be determined by finding the index k such that, for instance,
$$ \sum_{n=1}^{k} S_I(f_n) \;\le\; \frac{1}{2} \sum_{n=1}^{N} S_I(f_n) \;\le\; \sum_{n=1}^{k+1} S_I(f_n) $$
and the spectral median fmedian may for instance be set to fk or fk+1.
Other examples are possible for the spectral center, as long as it is representative of how the magnitude is distributed in the audio spectrum. In the sequel, we assume in a non-limitative manner that the spectral center of the audio spectrum corresponds to the spectral centroid fcentroid.
It should be noted that, in some embodiments, the spectral centroid can optionally be smoothed over time, for instance by using exponential averaging with a configurable time constant.
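By way of non-limiting illustration, the spectral centroid and the spectral median described above may be sketched as follows. The function names, the handling of an all-zero spectrum, and the choice of returning the first frequency at which the cumulative magnitude reaches half of the total are assumptions of this sketch; the example spectra are made-up values, not measurements.

```python
def spectral_centroid(freqs, spectrum):
    """Magnitude-weighted mean frequency, or None for an all-zero spectrum."""
    total = sum(spectrum)
    if total <= 0.0:
        return None
    return sum(f * s for f, s in zip(freqs, spectrum)) / total

def spectral_median(freqs, spectrum):
    """First frequency at which the cumulative magnitude reaches half of
    the total magnitude (one possible convention among several)."""
    total = sum(spectrum)
    if total <= 0.0:
        return None
    running = 0.0
    for f, s in zip(freqs, spectrum):
        running += s
        if running >= 0.5 * total:
            return f
    return freqs[-1]

# Made-up spectra: energy concentrated at low frequencies ("tight")
# versus energy shifted towards mid frequencies ("loose").
freqs = [0.0, 250.0, 500.0, 750.0, 1000.0]
tight = [4.0, 3.0, 2.0, 1.0, 0.0]
loose = [1.0, 1.0, 2.0, 3.0, 3.0]
centroid_tight = spectral_centroid(freqs, tight)  # 250.0 Hz
centroid_loose = spectral_centroid(freqs, loose)  # 650.0 Hz
```

In this toy example, removing low frequency content moves both the centroid and the median towards higher frequencies, which is the behavior the method relies upon.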
As illustrated by
Indeed, as discussed above, the presence of a loosely fit earbud and/or of an active ANC unit results in a reduction in low frequency components due to a reduction of the occlusion effect (and possibly also in a boost of mid frequency components). Accordingly, the presence of a reduction of the occlusion effect will result in a greater value for the spectral centroid fcentroid compared to an expected value of the spectral centroid fcentroid with a tight fit of the earbud and an inactive ANC unit (or no ANC unit at all). The spectral centroid fcentroid of the audio spectrum of the internal audio signal may therefore be used to evaluate a level of the occlusion effect in the internal audio signal compared to acoustic signals which propagate externally to the head of the user of the audio system 10, since the higher the spectral centroid fcentroid the lower the occlusion effect.
Since the global effects of loose fitting and/or of an active ANC unit are known (reduction of low frequency components and possibly boost of mid frequency components), the spectral centroid fcentroid can be used to determine a spectrum shape correction filter aiming at correcting these global effects. If the spectral centroid fcentroid corresponds substantially to an expected value (for the case with a tight fit and an inactive ANC unit), then the spectrum shape correction filter may be e.g. an identity filter (i.e. which does not modify the shape of the audio spectrum of the internal audio signal, which is equivalent to not applying the spectrum shape correction filter). If the spectral centroid fcentroid is significantly greater than said expected value, then the spectrum shape correction filter may be configured to e.g. boost the low frequency components and possibly to reduce the middle/high frequency components of the internal audio signal.
For instance, the spectral centroid fcentroid may be compared to one or more predetermined thresholds to evaluate the level of occlusion effect in the internal audio signal (which is representative of a fit quality level of the earbud). For instance, it is possible to consider a threshold fTH1 between 200 Hertz (Hz) and 800 Hz, or between 300 Hz and 600 Hz, for instance equal to 400 Hz.
Hence, if the spectral centroid fcentroid is lower than fTH1, then the earbud may be considered to be tightly fit (and the ANC unit to be inactive). In that case, the spectrum shape correction filter may be an identity filter.
In turn, if the spectral centroid fcentroid is greater than fTH1, then the earbud may be considered to be loosely fit (or the ANC unit to be active). In that case, the spectrum shape correction filter may be configured to modify the audio spectrum of the internal audio signal to produce a modified audio spectrum having a modified spectral centroid f′centroid which is lower than the original spectral centroid fcentroid. Typically, the spectrum shape correction filter, in that case, applies greater gains for low frequency components than for middle/high frequency components of the audio spectrum.
It is also possible to use more than one threshold, to define different possible ranges for the spectral centroid fcentroid, related to different levels of occlusion effect (e.g. representative of different fit quality levels). For instance, it is possible to consider another threshold fTH2>fTH1 between 800 Hertz (Hz) and 1400 Hz, or between 900 Hz and 1200 Hz, for instance equal to 1000 Hz.
Hence, if the spectral centroid fcentroid is lower than fTH1, then the spectrum shape correction filter may be an identity filter, as discussed above.
If the spectral centroid fcentroid is greater than fTH1 and lower than fTH2, then the earbud may be considered to be loosely fit (or the ANC unit to be active). In that case, the spectrum shape correction filter may be configured to modify the audio spectrum of the internal audio signal to reduce the spectral centroid. If the spectral centroid fcentroid is greater than fTH2, then the earbud may be considered to be extremely loosely fit. In that case, the spectrum shape correction filter is also configured to modify the audio spectrum of the internal audio signal to reduce the spectral centroid, but the expected shift of the spectral centroid needs to be greater than for the spectrum shape correction filter used when fTH1<fcentroid<fTH2. For instance, each spectrum shape correction filter which is not the identity filter should be configured to produce a modified audio spectrum having a modified spectral centroid f′centroid which is likely to be lower than the threshold fTH1.
For instance, it is possible to define beforehand a plurality of different spectrum shape correction filters, associated respectively with different possible ranges of the spectral centroid. For instance, a first spectrum shape correction filter may be used when fcentroid<fTH1 (identity filter), a second spectrum shape correction filter may be used when fTH1<fcentroid<fTH2, a third spectrum shape correction filter may be used when fcentroid>fTH2, etc.
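By way of non-limiting illustration, the selection of a predetermined spectrum shape correction filter from the spectral centroid may be sketched as follows. The default thresholds reuse the example values 400 Hz and 1000 Hz given above; the filter labels and the treatment of a centroid exactly equal to a threshold are assumptions of this sketch.

```python
def select_correction_filter(f_centroid, f_th1=400.0, f_th2=1000.0):
    """Return a label identifying one of three predetermined filters.

    The labels are hypothetical placeholders for actual filter designs.
    Equality with a threshold is assigned to the upper range here; the
    description does not specify that case.
    """
    if f_centroid < f_th1:
        return "identity"            # tight fit, ANC inactive: no correction
    if f_centroid < f_th2:
        return "moderate_low_boost"  # loose fit or active ANC
    return "strong_low_boost"        # extremely loose fit

# Usage with made-up centroid values in Hz.
labels = [select_correction_filter(f) for f in (250.0, 650.0, 1200.0)]
```

A lookup of this kind keeps the per-frame cost low, since no filter has to be designed at run time.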
According to other examples, the spectrum shape correction filter may be adjusted dynamically to the audio spectrum to ensure that the modified spectral centroid f′centroid is lower than the threshold fTH1. For instance, if fcentroid<fTH1, the spectrum shape correction filter may be the identity filter. If fcentroid>fTH1, then the spectrum shape correction filter may be adjusted dynamically to the audio spectrum to obtain a modified spectral centroid f′centroid that is lower than the threshold fTH1. For instance, a plurality of candidate spectrum shape correction filters may be evaluated until a candidate spectrum shape correction filter, or a combination of cascaded candidate spectrum shape correction filters, is found for which f′centroid<fTH1.
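By way of non-limiting illustration, the evaluation of candidate spectrum shape correction filters may be sketched as follows. The candidates are assumed here to be flat low frequency boosts of increasing gain applied below an assumed corner frequency of 600 Hz; the function names, the candidate gains and the example spectrum are hypothetical.

```python
def centroid(freqs, spectrum):
    """Magnitude-weighted mean frequency (spectrum assumed non-zero)."""
    return sum(f * s for f, s in zip(freqs, spectrum)) / sum(spectrum)

def apply_low_boost(freqs, spectrum, gain_db, corner_hz):
    """Scale magnitude bins below corner_hz by gain_db (amplitude gain)."""
    g = 10.0 ** (gain_db / 20.0)
    return [s * g if f < corner_hz else s for f, s in zip(freqs, spectrum)]

def find_correction(freqs, spectrum, f_th1=400.0, corner_hz=600.0,
                    candidate_gains_db=(0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0)):
    """Try candidates in order and return the first (gain, spectrum) whose
    modified centroid is below f_th1; fall back to the last candidate."""
    corrected = list(spectrum)
    gain_db = 0.0
    for gain_db in candidate_gains_db:
        corrected = apply_low_boost(freqs, spectrum, gain_db, corner_hz)
        if centroid(freqs, corrected) < f_th1:
            break
    return gain_db, corrected

# Made-up magnitude spectrum whose centroid (525 Hz) exceeds 400 Hz.
freqs = [0.0, 250.0, 500.0, 750.0, 1000.0]
loose = [1.0, 2.0, 3.0, 3.0, 1.0]
gain_db, corrected = find_correction(freqs, loose)
```

For this example spectrum the 12 dB candidate still leaves the centroid slightly above 400 Hz, so the search stops at the 15 dB candidate.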
As illustrated by
In a conventional manner, the internal audio signal may be filtered in time domain, by using a time-domain spectrum shape correction filter applied directly to the time-domain internal audio signal, or in frequency domain, by using a frequency-domain spectrum shape correction filter applied to a frequency-domain internal audio signal.
Hence, the spectrum shape correction filter to be applied for fit compensation can be designed in multiple ways, using time-domain infinite impulse response (IIR) and finite impulse response (FIR) filters, frequency-domain weights, or a combination of both techniques. For instance, a blend of flat gain, low-pass, high-pass, band-pass, peaking, low-shelf and high-shelf filters can be used, depending on how the audio spectrum is affected by the earbud fit and/or by the active ANC unit and on the correction needed.
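By way of non-limiting illustration, a time-domain low-shelf IIR filter boosting the low frequency components may be sketched as follows. The second-order section uses the widely published "Audio EQ Cookbook" low-shelf formulas with a shelf slope of 1; the 6 dB gain, the 400 Hz corner frequency and the function names are assumptions of this sketch and are not taken from the present disclosure.

```python
import math

def low_shelf_coefficients(gain_db, corner_hz, fs):
    """Normalized biquad coefficients (b0, b1, b2), (a1, a2) of a low-shelf
    filter (Audio EQ Cookbook form, shelf slope S = 1)."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * corner_hz / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2.0 * math.sqrt(2.0)
    sq = 2.0 * math.sqrt(a_lin) * alpha
    b0 = a_lin * ((a_lin + 1.0) - (a_lin - 1.0) * cos_w0 + sq)
    b1 = 2.0 * a_lin * ((a_lin - 1.0) - (a_lin + 1.0) * cos_w0)
    b2 = a_lin * ((a_lin + 1.0) - (a_lin - 1.0) * cos_w0 - sq)
    a0 = (a_lin + 1.0) + (a_lin - 1.0) * cos_w0 + sq
    a1 = -2.0 * ((a_lin - 1.0) + (a_lin + 1.0) * cos_w0)
    a2 = (a_lin + 1.0) + (a_lin - 1.0) * cos_w0 - sq
    return (b0 / a0, b1 / a0, b2 / a0), (a1 / a0, a2 / a0)

def biquad(signal, b, a):
    """Direct-form I filtering of a list of samples, zero initial state."""
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

# Usage: +6 dB below roughly 400 Hz at a 16 kHz sampling rate.
b, a = low_shelf_coefficients(gain_db=6.0, corner_hz=400.0, fs=16000.0)
dc_gain = biquad([1.0] * 4000, b, a)[-1]
nyquist_gain = abs(biquad([(-1.0) ** n for n in range(4000)], b, a)[-1])
```

After the transient has decayed, a constant input is amplified by about 6 dB while a signal at the Nyquist frequency passes with unit gain, so the filter lowers the spectral centroid of the signal it is applied to.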
In
As illustrated by
In preferred embodiments, and as illustrated by
Hence, the proposed audio signal processing method 20 enhances the internal audio signal in the presence of a loosely fit earbud and/or an active ANC unit, by filtering the internal audio signal by a spectrum shape correction filter. Hence, as such, the filtered internal audio signal may be used to improve the performance of different applications, including the applications which may use only the internal audio signal from the internal sensor 11 (e.g. speech recognition, VAD, speech level estimation, etc.).
In some embodiments, it is also possible to combine the filtered internal audio signal with an external audio signal produced by the external sensor 12. In such a case, and as illustrated by
It should be noted that the combining of the external audio signal with the filtered internal audio signal may be performed in time domain or in frequency domain. In the examples illustrated by
For instance, the cutoff frequency may be a static frequency, which is preferably selected beforehand in the frequency band in which the audio spectrum of the internal audio signal is computed.
According to another example, the cutoff frequency may be dynamically adapted to the actual noise conditions. For instance, the setting of the cutoff frequency may use the method described in U.S. patent application Ser. No. 17/667,041, filed on Feb. 8, 2022, the contents of which are hereby incorporated by reference in their entirety.
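By way of non-limiting illustration, a frequency-domain combination using a static cutoff frequency may be sketched as follows, the filtered internal audio signal being used below the cutoff frequency and the external audio signal above it. The hard switch at the cutoff frequency (rather than a gradual cross-fade), the 1000 Hz cutoff value and the function name are assumptions of this sketch.

```python
def combine_spectra(freqs, internal_filtered, external, cutoff_hz):
    """Per-bin selection: filtered internal bins below cutoff_hz,
    external bins at and above cutoff_hz. Bins may be real or complex."""
    return [i if f < cutoff_hz else e
            for f, i, e in zip(freqs, internal_filtered, external)]

# Usage with made-up bin values and an assumed 1000 Hz cutoff.
freqs = [0.0, 500.0, 1000.0, 1500.0, 2000.0]
internal_filtered = [1.0, 2.0, 3.0, 4.0, 5.0]
external = [10.0, 20.0, 30.0, 40.0, 50.0]
output = combine_spectra(freqs, internal_filtered, external, cutoff_hz=1000.0)
```

A dynamically adapted cutoff frequency would simply change the `cutoff_hz` argument from frame to frame.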
It is emphasized that the present disclosure is not limited to the above exemplary embodiments. Variants of the above exemplary embodiments are also within the scope of the present invention.
The above description clearly illustrates that by its various features and their respective advantages, the present disclosure reaches the goals set for it.
Indeed, by computing a spectral center of the audio spectrum of the internal audio signal, it is possible to detect a loosely fit earbud and/or an active ANC unit, and to configure a spectrum shape correction filter accordingly. While the present disclosure is particularly advantageous for compensating for loosely fit earbuds, it is also advantageous for compensating for active ANC units. Indeed, it might not be possible to obtain the information on whether the ANC unit is active or inactive from said ANC unit, and the spectral center can also be used to detect that the ANC unit is likely to be active, even if the spectral center alone does not make it possible to differentiate the effects of a loosely fit earbud from the effects of an active ANC unit.