This application is based on and claims priority under 35 U.S.C. § 119 to French Patent Application No. FR2211921, filed on Nov. 16, 2022, in the French Intellectual Property Office, the disclosure of which is incorporated by reference herein in its entirety.
The field of the disclosure is that of processing audio frequency signals.
The disclosure relates more specifically to a method for equalizing such a signal in a potentially noisy environment, in particular when the noise in question is likely to vary over time.
The disclosure has many applications, in particular, but not exclusively, for broadcasting an audio frequency signal in any type of broadcasting environment, e.g. a sports stadium, a theatre, the interior of a car or equivalent, etc.
In the remainder of this document, we will focus in particular on describing a problem in the field of broadcasting an audio frequency signal in a car's interior, which the inventors of the present patent application have faced. Of course, the disclosure is not limited to this particular field of application, but rather is of interest for broadcasting audio frequency signals in any type of broadcasting environment (e.g. a sports stadium, a theatre, etc.), in particular when the environment is noisy and the noise in question is likely to vary over time.
The effect of masking a first audio frequency signal by a second audio frequency signal is the process by which the hearing threshold for the first signal is raised by the presence of the second signal. In other words, spectrum masking occurs in a given frequency band when the presence of the second signal prevents detection of the first, lower amplitude signal in the same frequency band.
In a car, this effect is generally produced by the aerodynamic noise associated with the car's movement, as well as by the sound of the engine. If there is noise, the perception of the spectral balance of the music played in the car's interior can then be altered as certain frequencies will be masked.
The perceived tonal balance depends on the difference between the broadcast sound level and the masking threshold. As musical signals have a given dynamic range (difference between the highest and lowest amplitudes), for a given average level value in dB SPL (Sound Pressure Level) close to the threshold, certain components of the signal will be perceived and others will be masked.
To avoid the masking effect and preserve the perceived tonal balance, it is necessary to increase certain frequencies of the broadcast audio frequency signal above the masking threshold. In the prior art, two types of techniques are conventionally used to deal with this issue of masking:
However, the background noise in a car has various sources, including by way of example:
Background noise can generally be described as broadband noise with a decay of 6 dB per octave in the high frequencies. However, depending on the sources of noise listed above, this definition may not be sufficient to describe the masking effects encountered in practice. For example, the high frequencies may also be masked when it rains. Similarly, depending on the type of vehicle and its speed, the frequency bands actually masked may change over time.
Faced with such variable masking effects, it can be noted that:
A technique is therefore needed for equalizing an audio frequency signal broadcast in an environment having a background noise, the characteristics of which (in intensity and/or spectral shape) vary over time, as may be the case, for example, in a car.
In one aspect of the disclosure, a method is proposed for equalizing an audio frequency signal broadcast in a broadcasting environment by a broadcasting system comprising at least one loudspeaker. Such a method comprises:
The disclosure thus proposes a novel and innovative solution for equalizing an audio frequency signal broadcast in a broadcasting environment.
More specifically, the fact that the actual noise present in the broadcasting environment (e.g. a vehicle, a sports stadium, a room in a building, a theatre, etc.) is taken into account via the microphone(s) enables the equalization to be adapted to all types of noise that may be present in such a broadcasting environment (e.g. for a vehicle: aerodynamic driving noise, engine noise, tyre contact noise on the road in the case of a car, etc.) as well as their evolution over time.
Furthermore, equalization by weighting the spectrum of the audio frequency signal provides more precise equalization than using a conventional shelf-type filter.
In some aspects, the acoustic frequency mask represents, for each frequency component, said difference when said difference is greater than a predetermined threshold.
In other words, the audio frequency signal, for a given frequency component, is considered to be masked if the energy of the background noise exceeds the target value for the audio frequency signal by an amount at least equal to the predetermined threshold. The threshold can therefore be seen as an offset applied to the acoustic mask. Such a threshold allows the dynamics of the audio frequency signal to be taken into account and preserved.
In some aspects, the frequency weighting mask is obtained by weighting different frequency components of the acoustic frequency mask by applying predetermined weighting values.
In this way, high-frequency harshness or sibilance can be controlled. This weighting control also allows the lack of precision in noise extraction to be taken into account perceptually by adjusting it by ear in operational conditions for a given type of broadcasting environment.
In some aspects, the values of the frequency weighting mask are limited to a maximum value and a minimum value.
In this way, the maximum value defines a maximum weighting of the spectrum of the audio frequency signal, avoiding any discrepancy in determining the correction and limiting the overall gain. Excessive gain could overly modify the target audio perception (via the “loudness” effect) of the audio frequency signal.
Similarly, the minimum value, e.g. corresponding to a weighting of 0 dB, allows the dynamic range of the audio frequency signal not to be reduced (or to be reduced only to a limited extent).
In some aspects, the determination of a desired frequency profile involves calculating a desired frequency distribution of an energy of the audio frequency signal as a function of at least one parameter belonging to the group comprising:
In this way, the desired frequency profile for the audio frequency signal in the broadcasting environment is obtained, for example at a given listening point.
In some aspects, said estimation of a frequency profile of the noise signal involves correcting a transfer function of said at least one microphone.
In this way, the noise signal capture errors caused by the microphone(s) are compensated for.
In some aspects, the method comprises:
The steps of estimating, determining and equalizing are carried out periodically for various samples of the captured signal and the audio frequency signal. The frequency equalization implements, for a given implementation:
In this way, the correction parameters are frozen when voice signals not initially present in the audio frequency signal are detected in the signal captured by the microphone(s) (e.g. for a vehicle: the voice of the passengers in the vehicle). This avoids discrepancies or artefacts in the equalization.
In some aspects, the method comprises:
The steps of estimating, determining and equalizing are carried out periodically for various samples of the captured signal and the audio frequency signal. Detection of at least one voice signal involves estimating a likelihood of the presence of at least one voice signal in the noise signal. The frequency equalization implements, for a given implementation, the frequency weighting mask corresponding to a weighted linear combination of, on the one hand, the acoustic frequency mask determined during a previous implementation of said steps and, on the other hand, the acoustic frequency mask determined during the given implementation of said steps. The weighting is a function of the likelihood of presence such that the linear combination is reduced to:
In some aspects, the weighted linear combination is expressed as Pvp(f)=P0(f)+α(p)·(Pm(f)−P0(f)), where:
In some aspects, the frequency equalization implements temporal smoothing of the frequency weighting mask according to the law Pvp_m(n,f)=P(n)·(Pvp(f)−Pvp_m(n−1,f)), where:
In some aspects, the estimation of the noise signal involves a method of spectral estimation of background noise, based on, on the one hand, the captured signal and, on the other hand, the audio frequency signal. The estimation of the frequency profile of the noise signal comprises:
In some aspects, the estimation of the frequency profile of the noise signal comprises:
The estimation of the noise signal involves a summation of each of the filtered noise signals.
For example, the method of spectral estimation of background noise in question is a method of spectral estimation of background noise as implemented in methods for reducing noise by echo cancellation, known as ECNR, such as encountered, for example, in the mobile phone sector.
In some aspects, the method comprises an averaging of a plurality of signals each captured by a different microphone implemented in the broadcasting environment. The averaging provides the captured signal.
The disclosure also relates to a computer program comprising program code instructions for implementing a method as described above, according to one of its various aspects, when it is run on a computer.
The disclosure also relates to a device for equalizing an audio frequency signal broadcast in a broadcasting environment by a broadcasting system comprising at least one loudspeaker. Such an equalization device comprises a reprogrammable computing machine or a dedicated computing machine configured to carry out the steps of the equalization method according to the disclosure (according to one of the various aforementioned aspects). The features and advantages of this device are thus the same as those of the corresponding steps of the equalization method described above. As such, they are not described in more detail.
Other aims, features and advantages of the disclosure will become more apparent upon reading the following description, provided simply by way of non-limiting example, with reference to the figures, wherein:
The general principle of the disclosure is based on estimating a frequency profile of a signal representing a background noise present in a broadcasting environment based on, on the one hand, a signal captured by one (or more) microphone(s) implemented in the broadcasting environment and, on the other hand, an audio frequency signal broadcast in the broadcasting environment in question. An acoustic frequency mask is then determined; it represents, for each frequency component, a difference between the frequency profile of the noise signal and the desired frequency profile for the broadcast audio frequency signal (e.g. when the frequency profiles in question are expressed in logarithmic units). The audio frequency signal is equalized via a weighting of its spectrum by applying a frequency weighting mask that is a function of the acoustic frequency mask.
Thus, the fact that the actual noise present in the broadcasting environment is taken into account via the microphone(s) enables the equalization to be adapted to all types of noise that may be present in such a broadcasting environment (e.g. for a vehicle: aerodynamic driving noise, engine noise, tyre contact noise on the road in the case of a car, etc.) as well as their evolution over time.
Furthermore, equalization by weighting the spectrum of the audio frequency signal provides more precise equalization than using a conventional shelf-type filter.
With reference to [
The vehicle is shown here in the form of a car, but the method according to the disclosure applies likewise to all types of vehicles.
Returning to [
In certain aspects, the equalization device 110eq is not part of the broadcasting system 110, but rather is connected to the broadcasting system 110 via a wire connection (e.g. USB connection or equivalent) or radio connection (e.g. Bluetooth, Wi-Fi or equivalent) in order to exchange data, e.g. the broadcast audio frequency signal and the equalized audio frequency signal.
In certain aspects, the broadcasting system 110 comprises a single loudspeaker 110hp.
Returning to [
In certain aspects, a single microphone 120 is used to capture the signal in the vehicle 100.
With reference to [
Returning to [
For example, the captured signal corresponds to an averaging of signals each captured by one of the microphones 120. In this way, the background noise present throughout the vehicle is estimated more accurately. In the aforementioned aspects in which a single microphone is used to capture the signal in the vehicle 100, the captured signal corresponds to the signal captured by the microphone in question.
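By way of illustration only, the averaging of the microphone signals can be sketched as follows. This is a minimal sketch, assuming time-aligned signals of equal length and a sample-by-sample arithmetic mean; the disclosure does not specify the domain in which the averaging is carried out, and the function name `average_captured_signals` is hypothetical.

```python
def average_captured_signals(mic_signals):
    """Average several microphone signals sample by sample.

    Assumption: the signals are time-aligned and have the same length.
    With a single microphone, the result is that microphone's signal.
    """
    if not mic_signals:
        raise ValueError("at least one microphone signal is required")
    length = len(mic_signals[0])
    if any(len(signal) != length for signal in mic_signals):
        raise ValueError("all microphone signals must have the same length")
    count = len(mic_signals)
    # zip(*...) groups the samples captured at the same instant.
    return [sum(samples) / count for samples in zip(*mic_signals)]
```

With a single microphone, the function returns the signal captured by that microphone, which matches the single-microphone aspects described above.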
Returning to [
More specifically, during a step E210tf, a spectrogram of the captured signal is estimated, e.g. based on Fourier transforms of the captured signal. Such a spectrogram is, for example, estimated periodically. For example, an updated spectrogram is delivered with each new available sample of the captured signal.
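A spectrogram estimate based on Fourier transforms can be sketched as below. This is only an illustrative sketch: the frame size, the hop size, the absence of a windowing function and the names `dft_magnitudes` and `spectrogram` are assumptions that do not come from the disclosure, and a direct DFT is used for readability rather than a fast transform.

```python
import cmath


def dft_magnitudes(frame):
    """Magnitude of the discrete Fourier transform of one frame (bins 0..N/2)."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        component = sum(
            frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(component))
    return magnitudes


def spectrogram(signal, frame_size=4, hop=4):
    """Split the signal into frames and return one magnitude spectrum per frame.

    Assumption: rectangular frames of `frame_size` samples taken every `hop` samples.
    """
    starts = range(0, len(signal) - frame_size + 1, hop)
    return [dft_magnitudes(signal[s:s + frame_size]) for s in starts]
```

Choosing a hop smaller than the frame size makes successive frames overlap, which yields a more frequently updated spectrogram at a higher computational cost.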
Thus, for each of the frequency components of the spectrograms of the captured signal:
In this way, during a step E210spb, a spectrogram of the noise signal representing the background noise present in the vehicle 100 is estimated by concatenating the values of the frequency components retained during steps E210fb3b or E210fb3a.
During a step E210tfi, an inverse Fourier transform is applied to the spectrogram of the noise signal, producing the estimated noise signal.
In other aspects, other methods for estimating the noise signal are implemented. The noise signal is, for example, estimated by implementing a method of spectral estimation of background noise as implemented in methods for reducing noise by echo cancellation, known as ECNR (Echo Cancellation Noise Reduction). Such ECNR methods are, for example, conventionally used in the mobile phone sector.
Returning to [
In certain aspects, the estimation of the frequency profile of the noise signal involves estimating a spectrogram of the noise signal, for example based on Fourier transforms.
In certain aspects, the estimation of the frequency profile of the noise signal involves correcting the transfer function(s) of the microphone(s) 120. In this way, the noise signal capture errors caused by the microphones are compensated for.
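As a hedged illustration, correcting the microphone transfer function can be sketched as a per-band subtraction when both the noise frequency profile and the microphone magnitude response are expressed in dB. That representation, and the name `correct_microphone_response`, are assumptions made for this sketch; the disclosure does not state how the correction is carried out.

```python
def correct_microphone_response(noise_profile_db, mic_response_db):
    """Remove the microphone magnitude response from a noise profile.

    Assumption: both sequences hold one value in dB per frequency band,
    so dividing by the transfer function becomes a subtraction.
    """
    if len(noise_profile_db) != len(mic_response_db):
        raise ValueError("profiles must cover the same frequency bands")
    return [
        noise - response
        for noise, response in zip(noise_profile_db, mic_response_db)
    ]
```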
The bars 300br shown in [
Returning to [
The diagram 300tg shown in [
Returning to [
In certain aspects, the acoustic frequency mask represents, for each frequency component, said difference when the difference in question is greater than a predetermined threshold. In other words, the audio frequency signal, for a given frequency component, is considered to be masked if the energy of the background noise exceeds the target value for the audio frequency signal by an amount at least equal to the predetermined threshold. The threshold can therefore be seen as an offset applied to the acoustic mask. Such a threshold allows the dynamics of the audio frequency signal to be taken into account and preserved. According to the implementations, such a threshold can have a default value and/or also be adapted over time as a function of e.g. a user setting in the vehicle, the strength of the noise signal, etc.
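The determination of the acoustic frequency mask with a threshold can be sketched as follows. This is a minimal sketch under stated assumptions: the profiles are expressed in dB, and a component whose difference does not exceed the threshold is given a mask value of 0 dB (the disclosure does not state the value used in that case). The function name `acoustic_frequency_mask` is hypothetical.

```python
def acoustic_frequency_mask(noise_db, target_db, threshold_db=0.0):
    """Per-band difference (dB) between the noise profile and the desired profile.

    The difference is kept only when it is greater than the threshold.
    Assumption: other bands are considered unmasked and set to 0 dB.
    """
    mask = []
    for noise, target in zip(noise_db, target_db):
        difference = noise - target
        mask.append(difference if difference > threshold_db else 0.0)
    return mask
```

For example, with a desired profile of 60 dB in every band and a threshold of 5 dB, a band where the noise reaches 70 dB receives a mask value of 10 dB, whereas a band where the noise reaches 63 dB is left at 0 dB.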
The bars 310a and 310b shown in [
Returning to [
For example, a filter bank is applied to the audio frequency signal providing a corresponding plurality of filtered audio frequency signals. Each filtered audio frequency signal is weighted by a component of the frequency weighting mask corresponding to the frequency band of the filtered audio frequency signal in question.
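The weighting of the filter bank outputs can be illustrated by the sketch below. It assumes that the frequency weighting mask is expressed in dB and converted to an amplitude gain, and that the weighted band signals are recombined by summation; neither point is specified in the disclosure. The name `equalize_with_filter_bank` is hypothetical, and the filter bank itself is assumed to have already produced `band_signals`.

```python
def equalize_with_filter_bank(band_signals, weighting_mask_db):
    """Weight each filtered band signal by its mask component and recombine.

    Assumptions: one mask value in dB per band, converted to a linear
    amplitude gain; the equalized signal is the sum of the weighted bands.
    """
    if len(band_signals) != len(weighting_mask_db):
        raise ValueError("one mask component is required per band")
    gains = [10.0 ** (gain_db / 20.0) for gain_db in weighting_mask_db]
    weighted = [
        [gain * sample for sample in band]
        for gain, band in zip(gains, band_signals)
    ]
    # Sum the weighted bands sample by sample.
    return [sum(samples) for samples in zip(*weighted)]
```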
Thus, the fact that the actual noise present in the vehicle is taken into account via the microphone(s) enables the equalization to be adapted to all types of noise that may be present in such a vehicle (e.g. aerodynamic noise, engine noise, tyre contact noise on the road in the case of a travelling vehicle, etc.) as well as their evolution over time.
Furthermore, equalization by weighting the spectrum of the audio frequency signal provides more precise equalization than using a conventional shelf-type filter.
In certain aspects, the frequency weighting mask is obtained by weighting different frequency components of the acoustic frequency mask by applying predetermined weighting values.
In this way, high-frequency harshness or sibilance can be controlled. This weighting control also allows the lack of precision in noise extraction to be taken into account perceptually by adjusting it by ear in operational conditions for a given type of vehicle.
According to the implementations, such weighting values can have a default value and/or also be adapted over time as a function of e.g. a user setting in the vehicle, the strength of the noise signal, etc.
In certain aspects, the values of the frequency weighting mask are limited to a maximum value and a minimum value.
For example, the maximum value defines a maximum weighting of the spectrum of the audio frequency signal, avoiding any discrepancy in determining the correction and limiting the overall gain. Excessive gain could overly modify the target audio perception (via the “loudness” effect) of the audio frequency signal.
Similarly, the minimum value, e.g. corresponding to a weighting of 0 dB, allows the dynamic range of the audio frequency signal not to be reduced (or to be reduced only to a limited extent).
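The weighting of the acoustic frequency mask and the limitation of the resulting values can be sketched as follows. This is only a sketch: the multiplicative form of the weighting and the 12 dB maximum are hypothetical choices, whereas the 0 dB minimum follows the example given above. The name `frequency_weighting_mask` is also hypothetical.

```python
def frequency_weighting_mask(acoustic_mask_db, weights, min_db=0.0, max_db=12.0):
    """Apply per-band weighting values, then limit the result to [min_db, max_db].

    Assumptions: the weighting values multiply the mask values in dB;
    max_db=12.0 is an arbitrary illustrative limit.
    """
    if len(acoustic_mask_db) != len(weights):
        raise ValueError("one weighting value is required per band")
    return [
        min(max(weight * value, min_db), max_db)
        for value, weight in zip(acoustic_mask_db, weights)
    ]
```

A weighting value below 1 in the upper bands reduces the correction applied there, which is one possible way of controlling high-frequency harshness.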
Returning to [
Indeed, in certain aspects, the aforementioned steps of estimating (E210, E220), determining (E230, E240) and equalizing (E250) are carried out periodically for various samples of the captured signal and the audio frequency signal. In this way, the frequency equalization implements, for a given implementation:
In this way, the correction parameters are frozen when voice signals not initially present in the audio frequency signal are detected in the signal captured by the microphone(s) (e.g. the voice of the occupant(s) of the vehicle). This avoids discrepancies or artefacts in the equalization.
In other aspects, the detection information provided during step E260 represents a likelihood rate of the presence of voice signals in the noise signal. In this case, the step E260 comprises, for example, estimating the likelihood of the presence of one (or more) voice signal(s) in the noise signal (e.g. the voice of one (or more) occupant(s) of the vehicle). To do this, a voice detection method, for example the G.729 VAD (Voice Activity Detection) technique, is implemented in combination with a comparison between, on the one hand, the audio frequency signal sent to the loudspeakers 110hp and, on the other hand, the signal captured by the microphone(s) 120. Indeed, the VAD can detect the presence of one (or more) voice signal(s) in the noise signal. The comparison between the audio frequency signal and the captured signal is used to check whether the voice signal(s) that may have been detected by the VAD represent(s) the voice of one (or more) occupant(s) present in the audio frequency signal. ECNR techniques can be used for such a comparison, for example:
A likelihood p of the presence of one (or more) voice signal(s) in the noise signal is thus calculated as a function of the aforementioned correlation (e.g. p=f1(correlation)), or the aforementioned energy ratio (e.g. p=f2(signal energy ratio)). An example of the f1 function is: f1(x)=x. An example of the f2 function is: f2(x)=x if x<1 and f2(x)=1 if x≥1.
In such aspects of the step E260, the frequency weighting mask is weighted as a function of the likelihood of the presence of one (or more) voice signal(s) in the noise signal. For example, the weighting α takes the form: α(p)=1−p.
In such aspects, the frequency weighting mask Pvp(f) is expressed, for example, as Pvp(f)=P0(f)+α(p)·(Pm(f)−P0(f)), where:
In this way, the frequency weighting mask Pvp(f) is reduced to the acoustic frequency mask P0(f) determined during a previous implementation of the aforementioned steps when α(p)=0, i.e. when the likelihood p of the presence of a voice signal in the noise signal is equal to 1. Similarly, the frequency weighting mask Pvp(f) is reduced to the acoustic frequency mask Pm(f) determined during the given implementation of the aforementioned steps when α(p)=1, i.e. when the likelihood p of the presence of a voice signal in the noise signal is zero.
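The weighted linear combination Pvp(f)=P0(f)+α(p)·(Pm(f)−P0(f)) with α(p)=1−p can be sketched as follows. The function names `alpha` and `blend_masks` are hypothetical, and the check that p lies between 0 and 1 is an assumption added for this sketch.

```python
def alpha(p):
    """Example weighting from the text: alpha(p) = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("the likelihood p is assumed to lie in [0, 1]")
    return 1.0 - p


def blend_masks(previous_mask, current_mask, p):
    """Compute Pvp(f) = P0(f) + alpha(p) * (Pm(f) - P0(f)) for every band.

    previous_mask is P0(f), current_mask is Pm(f), p is the voice likelihood.
    """
    if len(previous_mask) != len(current_mask):
        raise ValueError("masks must cover the same frequency bands")
    a = alpha(p)
    return [p0 + a * (pm - p0) for p0, pm in zip(previous_mask, current_mask)]
```

With p equal to 1 the function returns the previous mask, and with p equal to 0 it returns the mask determined during the given implementation, in line with the two limiting cases described above.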
In certain aspects, other expressions are used for the weighting α(p) and for the frequency weighting mask Pvp(f). However, in such aspects, the frequency weighting mask Pvp(f) is reduced to the acoustic frequency mask P0(f) determined during a previous implementation of the aforementioned steps when the likelihood p of the presence of a voice signal in the noise signal is equal to 1. Similarly, the frequency weighting mask Pvp(f) is reduced to the acoustic frequency mask Pm(f) determined during the given implementation of the aforementioned steps when the likelihood p of the presence of a voice signal in the noise signal is zero.
In certain aspects, temporal smoothing (or time averaging) is applied to the frequency weighting mask. The temporal smoothing follows the law: Pvp_m(n,f)=P(n)·(Pvp(f)−Pvp_m(n−1,f)) where:
In certain aspects, the detection of one (or more) voice signal(s) in the captured signal is carried out in a narrow band of the captured signal to reduce the calculations required for this detection. Sub-sampling is carried out to adapt said signal to the narrow band. Such a narrow band is limited, for example, to 0 to 4 kHz, which contains the most significant part of the voice's energy.
However, in certain aspects, the step E260 is not carried out and the correction parameters used for the equalization during step E250 are not frozen, but rather updated each time the steps of the method are carried out again.
With reference to [
The aspect shown in [
More specifically, during step E220′, the frequency profile of the noise signal is estimated by carrying out the following steps:
For example, the method of spectral estimation of background noise in question is a method of spectral estimation of background noise as implemented in the aforementioned ECNR methods. Alternatively, the steps described above with reference to step E210 can be carried out instead of a method of spectral estimation of background noise as implemented in the ECNR methods in order to estimate each filtered noise signal.
Returning to [
Furthermore, according to the aspect shown in [
In this way, by carrying out step E260 as described above with reference to [
However, in certain aspects, the steps E210′ and E260 are not carried out and the correction parameters used for the equalization during step E250 are not frozen, but rather updated each time the steps of the method are carried out again.
With reference to [
The device 110eq comprises a random access memory 403 (for example a RAM), a processing unit 402 equipped for example with one (or more) processor(s), and controlled by a computer program stored in a read-only memory 401 (for example a ROM or a hard disk). During initialization, the code instructions of the computer program are, for example, loaded into the random access memory 403 before being executed by the processor of the processing unit 402.
This
If the device 110eq is designed at least partly with a reprogrammable computing machine, the corresponding program (i.e. the sequence of instructions) may or may not be stored in a removable storage medium (such as a CD-ROM, a DVD-ROM, a USB stick), this storage medium being partly or totally readable by a computer or processor.
In certain aspects, the broadcasting system 110 comprises the device 110eq.
In certain aspects, the device 110eq is connected to the broadcasting system 110.
Number | Date | Country | Kind |
---|---|---|---|
2211921 | Nov 2022 | FR | national |