The present application claims priority from Danish patent application PA202270098 dated Mar. 11, 2022, the disclosure of which is incorporated herein by reference in its entirety.
The present invention concerns a computer-implemented method for processing an audio signal, a computer system for performing the method and a non-transitory computer-readable storage medium.
The generation of high-quality audio is key for conveying a story to a broader audience, both for pure audio content and for video content, and is therefore of great importance for today's content providers. However, conventional audio recording methods often suffer from mediocre or poor audio quality. The reduction of quality in the recorded sound can have many causes, one of them being artificial sounds at certain audible frequencies. Such noises are generated, for example, by ground loops (the typical 50 Hz or 60 Hz hum), light hums and room modes.
While such artificial noise can significantly hamper the listener's experience, another drawback lies in the fact that post processing of audio requires substantial efforts. Particularly, improving audio quality often requires several iterations, in which the content provider listens to the processed audio signal until the desired result is achieved.
The above-mentioned issues are addressed by a denoising process, which aims, among other aspects, to specifically suppress the frequencies at which the above artificial noise occurs. This process can be part of, or added to, already existing denoising processes, in which the noise of the recorded audio signal is identified and subsequently removed. This process often requires manual work and the setting of several parameters until the desired quality and best intelligibility are obtained.
Some solutions propose pre-defined profiles against which the noisy recorded sound signal is matched. While such an approach may result in an improved sound, it is rather inflexible, as it requires many pre-specified profiles to be able to process sound signals recorded in different environments. Generally, this approach also changes the characteristics of the recorded sound signal, an effect that is usually undesirable. Another solution lies in iterative approaches or the use of deep learning networks to build a model able to suppress the artificial noise. However, such approaches are computationally quite expensive.
Consequently, there is a desire to simplify or at least reduce the complexity of the denoising process and provide a flexible denoising process that requires less computational efforts.
This and other objects are addressed by the subject matter of the independent claims. Features and further aspects of the proposed principles are outlined in the dependent claims.
The inventor proposes the use of a simpler geometrically aided model to obtain a plurality of filter elements that can be used to suppress previously identified artificial noise peaks. The proposal is based on the finding that an equalizing peaking filter cascade, referred to as peaking EQ, with a negative gain can be used to surgically suppress frequencies that are annoying to listeners. The cascade is built in such a way that the output of one filter element acts as the input for a subsequent element, such that the overall transfer function of the cascade is the product of the transfer functions of the individual filter elements.
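The cascade property stated above can be written compactly. As a sketch, with $H_n(z)$ denoting the transfer function of the n-th filter element and $N$ the number of elements (symbols chosen here for illustration only):

```latex
H(z) = \prod_{n=1}^{N} H_n(z),
\qquad
20\log_{10}\bigl|H(e^{j\omega})\bigr| = \sum_{n=1}^{N} 20\log_{10}\bigl|H_n(e^{j\omega})\bigr|
```

On a decibel scale the element responses therefore add, which is why closely spaced elements contribute to the total attenuation at each other's center frequencies.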
In contrast to conventional models, which require typical profiles to identify the frequencies of interest, the proposed method is also adaptive, meaning that the center frequencies for the respective filter elements are adjusted to the actual sound signal and therefore highly flexible. In other words, actual noise peaks are identified within the recorded sound signal without the need for or the existence of a pre-recorded profile.
In an aspect, a computer implemented method for processing an audio signal is proposed, in which an audio signal, in particular containing speech is obtained. The audio signal is pre-processed to generate a smoothed difference signal therefrom as well as a smoothed spectrum thereof. In accordance with the proposed principle, at least one peak, particularly originating from a noise source is detected in the generated smoothed difference signal or the smoothed spectrum thereof.
This aspect refers to the adaptive portion of the proposed process, as the detection and identification of the peak originating from a noise source is adjustable and not based on a pre-specified profile. In some instances, one may ignore certain detected peaks below a first frequency threshold or above a certain second frequency threshold. While the first frequency threshold is useful to avoid attenuating portions of speech (as speech fundamentals are often below 1 kHz with harmonics usually much above 1 kHz), the second frequency threshold can be used to reduce the computational efforts. The latter may often lie above 15 kHz and in frequency ranges barely audible and not affecting the listener's experience. Typical values for a first frequency threshold value are 150 Hz or 200 Hz and generally below 300 Hz. On the other hand peaks at frequencies above 20 kHz can also often be ignored, particularly if the magnitude of such peaks is not very high.
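The two frequency thresholds can be illustrated with a minimal sketch. The function name, the peak representation as `(frequency_hz, magnitude_db)` pairs and the default values of 200 Hz and 15 kHz are assumptions chosen from the ranges mentioned above, not prescribed values.

```python
def keep_peaks_in_band(peaks, f_low=200.0, f_high=15000.0):
    """Keep only the (frequency_hz, magnitude_db) peaks between the two thresholds."""
    return [(f, m) for f, m in peaks if f_low <= f <= f_high]


# Hypothetical detected peaks: one below the first threshold, one above the second.
peaks = [(50.0, 12.0), (6500.0, 9.0), (19000.0, 3.0)]
kept = keep_peaks_in_band(peaks)
print(kept)
```

Only the peak at 6.5 kHz remains a candidate for a filter element in this example.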
In a further step, after the at least one peak has been detected, a peaking filter is generated based on it. The filter is subsequently used to surgically suppress the peak in the obtained sound signal. For this purpose, one needs to define the bandwidth of the respective filter as well as its negative gain. The bandwidth can also be expressed as a Q factor. For the purpose of the present application the Q factor is used, although the proposed method can be implemented by setting a certain bandwidth as well. In this regard, Q factor and bandwidth shall be understood as equivalent and included in the sense of the claims unless expressed otherwise.
It has been found that the Q factor necessary to provide a good suppression is not a constant value for the various possible scenarios, but actually varies depending on the characteristics of each peak. Consequently, a suitable Q factor for peak suppression has to be determined for each of the detected peaks. Hence, the Q factor is geometrically obtained based on determined cut-off frequencies around the at least one detected peak.
However, adjacent negative peaking filters may affect each other in case of suppressing more than one peak. In order to reduce such undesired interaction, the gain for the respective filter elements is derived from a gain interaction matrix with elements based on magnitude responses at cut-off frequencies around the at least one detected peak. In other words, the gain interaction matrix takes the position, Q factor or bandwidth of the respective filter elements into account to achieve a compromise between suppressing the peak and affecting the adjacent filters.
In some instances, the bandwidth of the respective filter elements in a series of filter elements of the peaking filter is adjusted after the Q factor for the respective elements is determined. This ensures that the filter elements do not overlap and the interaction between them is kept at minimum. This limitation may start for filter elements suppressing peaks at low frequency and then continue to higher frequencies. In some instances, the overall bandwidth of filter elements in a peaking filter with several elements may not exceed ⅓ octave.
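One possible reading of this bandwidth limitation is sketched below. It assumes bandwidths expressed in octaves, treats each element as occupying a symmetric interval on a log2 frequency axis, and processes the elements from low to high frequency. The function name and the overlap rule are illustrative assumptions.

```python
import math


def limit_bandwidths(centers_hz, bandwidths_oct, max_oct=1.0 / 3.0):
    """Cap each bandwidth at max_oct and shrink it where it would overlap the previous element."""
    order = sorted(range(len(centers_hz)), key=lambda i: centers_hz[i])
    limited = list(bandwidths_oct)
    prev_upper = None  # upper edge of the previous element in log2(Hz)
    for i in order:
        bw = min(limited[i], max_oct)
        center = math.log2(centers_hz[i])
        if prev_upper is not None and center - bw / 2.0 < prev_upper:
            # Shrink symmetrically so that the lower edge touches the previous upper edge.
            bw = max(0.0, 2.0 * (center - prev_upper))
        limited[i] = bw
        prev_upper = center + bw / 2.0
    return limited


limited = limit_bandwidths([1000.0, 2000.0, 2000.0 * 2 ** 0.2], [0.5, 0.3, 0.3])
print(limited)
```

In this example the first bandwidth is capped at one third of an octave, the second is left unchanged, and the third is narrowed because it would otherwise overlap the second.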
With the proposed method, it is possible to detect noise peaks and individually suppress them in a flexible and efficient manner. The detection of such peaks allows a flexible adjustment and permits the application of this method to sound and speech recorded in different environments. The present method is largely independent of other sound processing tasks, but can easily be implemented in existing workflows. It requires little computational effort and can therefore be used for pre-processing on recording devices with low computational capabilities.
As stated before, the proposed method can be applied to a single peak, but also to a plurality of peaks. In the latter instance, the plurality of peaks is detected, and the peaking filter is generated based on the individual peaks of the plurality of detected peaks. The peaking filter comprises a cascade of filter elements, and each filter element is associated with one of the plurality of detected peaks. Such an approach is suitable because the gain and Q factor of a respective filter element affect adjacent filter elements, which will be the case if the peaks are close together. Alternatively, an individual peak can be detected and a peaking filter with a single element applied to the detected peak. Then a second peak is detected, and a new peaking filter is applied to the second peak. This process is repeated through all peaks and, due to its low computational effort, is particularly useful if the various peaks are spaced apart and do not affect each other.
The peaking filter may comprise a plurality of filter elements, each filter element defined by a Q factor and a gain parameter. The gain parameter of each filter element is derived from a gain interaction matrix based on magnitude responses at cut-off frequencies at each of the plurality of detected peaks. The Q factor is based on the geometrically obtained cut-off frequencies around the associated one of the plurality of detected peaks. In other words, the Q factor and the gain parameter define the respective filter element on the peaking filter.
Some aspects concern the step of obtaining smoothed difference signal. For instance, the audio signal is processed to generate a smoothed spectrum of the obtained audio signal. Then, a difference between the obtained audio signal and the smoothed spectrum is determined and calculated to obtain the smoothed difference signal. The smoothing factor of the smoothed spectrum may be adjustable in some instances. In addition and optionally, the difference signal is further smoothed as well with a smoothing factor different from a smoothing factor used for providing a smoothed spectrum. The latter optional step is not necessary, but it will further soften the spectral differences and subsequently simplify the calculation of the Q factor.
One aspect concerns the generation of the smoothed spectrum. It has been generally observed that a “good spectrum” is a smooth spectrum. That is, if any portion “stands out” in the frequency spectrum, like a sharp peak or a hill, then it is very likely a spurious signal, a noise or some other undesired component. Depending on the characteristics of such “mistakes” in the smoothed spectrum, the peaking filter can be applied. Consequently, one may first window the obtained audio signal with a certain window length. The window length is in the range between 5 s and 30 s, in particular between 10 s and 25 s, in particular between 15 s and 20 s, and in particular shorter than 22 s.
Subsequently, a Short Time Fourier transform of the windowed signal over time is computed and averaged. Alternatively, a periodogram of the windowed signal can be computed, using for example Welch's method, and taking the square root therefrom. The so obtained computed spectrum can be optionally converted to dB scale and also further smoothed.
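A minimal sketch of the averaged short-time spectrum is given below. To stay self-contained it uses a direct DFT instead of an FFT, a Hann window, and a short synthetic signal. The frame length, hop size and sampling rate are arbitrary illustrative values and far shorter than the window lengths mentioned above.

```python
import cmath
import math


def averaged_spectrum_db(signal, frame_len, hop):
    """Average the Hann-windowed DFT magnitude over all frames and return dB values per bin."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    acc = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        for k in range(n_bins):
            x_k = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            acc[k] += abs(x_k)
        n_frames += 1
    # The small offset avoids log10(0) for empty bins.
    return [20 * math.log10(a / n_frames + 1e-12) for a in acc]


fs = 800.0  # hypothetical sampling rate in Hz
signal = [math.sin(2 * math.pi * 100.0 * n / fs) for n in range(256)]
spec = averaged_spectrum_db(signal, frame_len=64, hop=32)
peak_bin = spec.index(max(spec))
print(peak_bin * fs / 64)
```

For real recordings an FFT-based routine or a Welch periodogram would be the practical choice; the direct DFT here only keeps the example free of dependencies.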
In some instances, the at least one peak may be detected by applying a threshold to the obtained difference signal. The peak or peaks are identified as they exceed the threshold. In another approach, the step of detecting at least one peak comprises applying a threshold to the obtained difference signal and identifying a largest first peak exceeding the threshold. The position of this peak may be stored for later processing. Then, a largest second peak towards lower frequencies is identified, wherein said peak is at least a minimum distance in frequency from the largest first peak, said largest first peak forming a center.
The latter step above can be repeated with the largest second peak forming a new center until all peaks above the threshold towards the lower frequencies have been identified. Likewise, a largest third peak towards higher frequencies is identified, wherein the third peak is at least a minimum distance in frequency from the largest first peak forming a center. As for the lower frequencies, the step can be repeated with the largest third peak forming the center. All of the identified peaks may be stored in memory. The peaking filter is then applied to the respective sound signal, whereas each filter element with its gain parameter and Q factor calculated is associated with one of the peaks.
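The search outward from the largest peak can be sketched as follows. This is one possible reading of the procedure: candidates are local maxima above the threshold, and from the current center the nearest candidate that is at least the minimum distance away becomes the new center. The function name and this tie-breaking rule are assumptions.

```python
def detect_peaks(freqs, diff_db, threshold_db, min_distance_hz):
    """Return indices of local maxima above the threshold, found by walking outward from the largest."""
    candidates = [i for i in range(1, len(diff_db) - 1)
                  if diff_db[i] > threshold_db
                  and diff_db[i] >= diff_db[i - 1]
                  and diff_db[i] > diff_db[i + 1]]
    if not candidates:
        return []
    first = max(candidates, key=lambda i: diff_db[i])
    found = [first]
    for direction in (-1, +1):  # first towards lower, then towards higher frequencies
        center = first
        while True:
            ahead = [i for i in candidates
                     if (freqs[i] - freqs[center]) * direction >= min_distance_hz]
            if not ahead:
                break
            center = min(ahead, key=lambda i: abs(freqs[i] - freqs[center]))
            found.append(center)
    return sorted(found)


freqs = [100.0 * i for i in range(12)]
diff_db = [0, 0, 5, 0, 4, 9, 0, 0, 6, 0, 0, 0]
found = detect_peaks(freqs, diff_db, threshold_db=3.0, min_distance_hz=200.0)
print(found)
```

The value of 4 dB at index 4 is not reported because it lies on the rising flank of the largest peak and is therefore not a local maximum.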
In yet another aspect, the step of applying a peaking filter on the at least one detected peak comprises after detecting the at least one peak the calculation of the gain parameter for the filter element associated with the at least one peak. For this purpose, one may use the gain interaction matrix based on magnitude responses at cut-off frequencies around the at least one peak. The gain interaction matrix takes possible interaction of the gains of different adjacent filter elements into account. Afterwards, the Q factor may be calculated based on the determined cut-off frequencies around the at least one detected peak and the gain thereof.
Some aspects concern the calculation of the Q factor. For example, the Q factor may be based on a bandwidth given by the logarithmic value of a difference by the respective cut-off frequencies around the at least one detected peak. In some instances, the Q factor can be derived by identifying the two inflection points of the smoothed difference signal around the at least one detected peak (on the left and right side). Then, a virtual tangent through the inflection point corresponding to the steepest slope is computed. The expression “computing a virtual tangent” includes approximating a function that closely resembles the function of the tangent. The crossing point between the virtual tangent and an average gain value is subsequently determined. The average gain value in such a case is derived from the gain value of the target peak and is considered to be half of said value. Based on the determined crossing point, the bandwidth can be computed (i.e. by mirroring the crossing point on an axis through the peak on the frequency axis). Based on the bandwidth, the Q factor is determined, which therefore also depends on the crossing point.
In some other instances, the Q factor is determined by calculating a bandwidth given by the identified crossing point and a second crossing point having the same frequency distance from the at least one detected peak as the identified crossing point. In yet another instance, the Q factor is derived by computing the second derivative of the smoothed difference signal. The frequency coordinates or the respective pair (i.e. frequency and gain value) are identified in the smoothed difference signal around the at least one peak from this derivative. The frequency coordinates correspond to the positions, in which the second derivative changes its sign. The gain values of the two coordinates are determined from the smoothed difference signal and the Q factor is obtained in response to the two gain values.
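Locating the sign changes of the second derivative can be sketched with finite differences. The helper below is illustrative: it assumes a uniformly sampled curve and returns, on each side of the peak, the first sample at which the discrete second derivative changes sign.

```python
import math


def inflection_indices(values, peak_index):
    """Return the sample indices left and right of the peak where the discrete
    second derivative changes sign (None if no sign change is found)."""
    n = len(values)
    # d2[i] approximates the second derivative at sample i; both ends are padded with 0.
    d2 = ([0.0]
          + [values[i - 1] - 2.0 * values[i] + values[i + 1] for i in range(1, n - 1)]
          + [0.0])
    left = next((i for i in range(peak_index, 0, -1) if d2[i] * d2[i - 1] < 0), None)
    right = next((i for i in range(peak_index, n - 1) if d2[i] * d2[i + 1] < 0), None)
    return left, right


# Synthetic dip: Gaussian shape, 10 dB deep, centered at sample 20, sigma = 4 samples.
values = [-10.0 * math.exp(-((i - 20) ** 2) / 32.0) for i in range(41)]
left, right = inflection_indices(values, 20)
print(left, right, values[left], values[right])
```

For a Gaussian shape the inflection points lie one standard deviation from the center, so the indices found should be close to samples 16 and 24. The gain values at these two indices are then read from the curve, as described above.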
In yet another aspect, the bandwidth is determined based on an average of the determined gain values and a second derivative function through one of the identified two points, said one of the identified two points having a local extreme in the first derivative. Alternatively, the bandwidth is determined based on half of the center gain, also referred to as gain midpoint, and a second derivative function through one of the identified two points, said one of the identified two points having a local maximum in the first derivative.
Another aspect concerns a computer system having one or more processors and a memory coupled to the one or more processors. The memory comprises instructions, which when executed by the one or more processors cause the one or more processors to perform the method according to any of the preceding claims. A non-transitory computer-readable storage medium may also comprise computer-executable instructions for performing the method according to any of the preceding claims.
Further aspects and embodiments in accordance with the proposed principle will become apparent in relation to the various embodiments and examples described in detail in connection with the accompanying drawings in which
The following embodiments and examples disclose various aspects and their combinations according to the proposed principle. The embodiments and examples are not always to scale. Likewise, different elements can be displayed enlarged or reduced in size to emphasize individual aspects. It goes without saying that the individual aspects of the embodiments and examples shown in the figures can be combined with each other without further ado, without this contradicting the principle according to the invention. Some aspects show a regular structure or form. It should be noted that in practice slight differences and deviations from the ideal form may occur without, however, contradicting the inventive idea.
Recorded sound signals may contain continuous or periodic noises coming from various but artificial origin. Those origins include for example electric humming from circuits, open or closed electric loops and the like. Usually, such noises are on a single frequency or rather narrow band within the audible range and not considered broad or wide band. During processing of the recorded signals, those noise peaks should be suppressed to improve the listener's experience.
For the purpose of denoising, one may apply a negative peaking EQ filter to the signal, wherein the negative peaking EQ filter has a narrow but highly negative gain and therefore selectively attenuates the selected narrow region in the frequency spectrum. Several filter types can be used for this purpose. For the present examples, a second order IIR filter is utilized; however, it is to be understood that various filter types and even combinations thereof can be used to achieve the desired result, that is, to suppress the spurious noise without affecting the useful signal too much. The transfer function of a negative peaking EQ filter for suppressing a single peak at the center frequency $f_c$ with a sampling rate of $f_s$ and a given Q factor as well as a gain $g_{dB}$ is given by
The recorded sound signal may comprise more than a single noise peak.
Filter element P1 has a center frequency at approximately 12 kHz, while filter elements P2 and P3 have their center frequencies at 15 kHz and 20 kHz, respectively. Each filter is characterized by a Q factor and a gain value, the latter being −6 dB, −4 dB and −3 dB, respectively. For the proposed negative peaking EQ filter and the method in this application, the Q factors, which are based on the bandwidth of the filter elements, are adjustable to account for the different possible bandwidths of the noise peaks. This is visible in the bandwidth of the negative peak in
It has also been found that the Q factors as well as the gain parameters of adjacent filters will affect each other if the peaks to be suppressed are closely spaced. This behavior is visible between peaks P1 and P2. The overall gain for the negative peaking filter will therefore be based on a gain interaction matrix that takes the various peaks and their bandwidths into account. The gain for each filter will be calculated based on those parameters.
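The interaction between neighbouring elements can be made visible with a small sketch. It assumes the widely used "Audio EQ Cookbook" form of a second order peaking filter, which is not necessarily identical to the transfer function of this application, a sampling rate of 48 kHz and a Q factor of 10 for all three elements. These values are hypothetical; only the center frequencies and gains are taken from the example above.

```python
import cmath
import math


def peaking_biquad(fc, fs, q, gain_db):
    """Coefficients (b, a) of a cookbook-style second order peaking filter (assumed form)."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin)
    a = (1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin)
    return b, a


def response_db(b, a, f, fs):
    """Magnitude response in dB of one biquad at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))


fs = 48000.0  # assumed sampling rate
elements = [peaking_biquad(12000.0, fs, 10.0, -6.0),   # P1
            peaking_biquad(15000.0, fs, 10.0, -4.0),   # P2
            peaking_biquad(20000.0, fs, 10.0, -3.0)]   # P3


def cascade_db(f):
    """Response of the cascade: the dB responses of the elements add up."""
    return sum(response_db(b, a, f, fs) for b, a in elements)


print(response_db(*elements[0], 12000.0, fs), cascade_db(12000.0))
```

Taken alone, P1 attenuates by its nominal gain at 12 kHz, whereas the cascade attenuates slightly more there because P2 and P3 also contribute. This extra attenuation is the interaction that the gain interaction matrix compensates.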
After recording, the sound file is processed using processes 3 or 4. As illustrated, the sound file can be processed in parallel using processes 3 and 4, or sequentially, such that the output of process 3 is applied as input to process 4.
The audio signal is obtained in step S1 and stored as a digital signal in the memory. A smoothed difference signal is then obtained from the stored audio signal and a smoothed spectrum thereof in step S2. Various embodiments are possible for such process and will be explained in greater detail with regards to
The smoothed difference signal is represented as a frequency-gain spectrum. In step S3, at least one peak is detected within the smoothed difference signal. A portion of the smoothed difference signal is identified as a peak if it exceeds a certain threshold. The threshold is adjustable and may be set in dependence on the frequency at which the peaks occur. Such an approach can ensure that noise peaks in frequency bands of interest, that is, the frequency range to which the ear is most sensitive, are suppressed more strongly than those in other bands. For example, since this method is employed to correct speech, peaks found at frequencies below 300 Hz are substantially ignored. This way the risk of accidentally suppressing the fundamental frequency of the speaker is reduced. The position and “strength” or volume of the detected peaks are stored in memory.
In a subsequent step S4, a negative peaking filter response is determined based on the above detected peaks. More particularly, the negative peaking filter response to be calculated is defined as one or more filter elements, each filter element centered around one or more of the detected peaks. Each filter element is given by a Q factor and a gain parameter, which in turn are determined from the respective detected peaks of the smoothed difference signal. Due to the above-mentioned possible interaction between adjacent filter elements, one may calculate the gain parameter and Q factor based on this behavior.
The gain parameter as calculated in step S5 is derived from a gain interaction matrix with elements based on magnitude responses at cut-off frequencies around the at least one detected peak. The gain interaction matrix is used to adjust for the interactions between the filter elements in the negative peaking EQ filter. If there is only a single filter element, that is, if the cascade matching the target spectrum consists of a single negative peaking EQ filter, the desired gain is simply the value of the magnitude spectrum at its peak. The same approach can be applied if the filter elements are spaced far apart from each other, without interaction between the elements.
However, this is rarely the case. Usually, one needs to estimate a series of filter elements, closely spaced, where each filter element of the negative peaking EQ filter interacts with the others in a non-linear fashion. An adjustment for the interactions between each filter element in the cascade is required to calculate the gains for each of those filter elements within the negative peaking EQ filter. For this reason, an interaction matrix $M_I$ is constructed that reads:
A matrix element $H_n(e^{j\omega_m})$ corresponds to the magnitude response of the n-th filter element at the m-th point, were it placed alone in the cascade. The above-mentioned m points are associated with the cut-off frequencies of each of the N filter elements, as well as their geometric means
The filter elements of the negative peaking EQ filter used to compute the interaction matrix are characterised by the Q factor, whose calculation is explained with respect to
A prototype gain of $g_{dB} = 17$ dB is used as an initial starting point for each filter element. It is found that this is a good starting point for the calculation. Then, the individual gains for the filter elements are calculated and stacked in a vector using the following iterative equation:
Element $t$ of equation (4) corresponds to the values of the target spectrum at the m points, and $M_I^{(i),\dagger}$ is the Moore-Penrose pseudoinverse of $M_I$ at the i-th step of the iteration, defined as:
and $M_I$ is the interaction matrix computed from equation (3) but calculated with the filter gains at the i-th step, $g_{dB}^{(i)}$. Equation (4) converges in very few iterations (1 or 2 are usually sufficient).
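The iteration can be sketched as follows, under several stated assumptions: the filter elements are cookbook-style peaking biquads, each matrix entry is the dB response of element n at point m divided by the current gain of that element, and the pseudoinverse is applied through the normal equations, which is equivalent when the matrix has full column rank. The evaluation points, function names and test values are hypothetical.

```python
import cmath
import math


def peaking_db(f, fc, fs, q, gain_db):
    """Magnitude in dB at f of a cookbook-style peaking biquad (assumed filter form)."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    z1 = cmath.exp(-2j * math.pi * f / fs)
    num = (1 + alpha * a_lin) - 2 * math.cos(w0) * z1 + (1 - alpha * a_lin) * z1 * z1
    den = (1 + alpha / a_lin) - 2 * math.cos(w0) * z1 + (1 - alpha / a_lin) * z1 * z1
    return 20.0 * math.log10(abs(num / den))


def solve_least_squares(m, t):
    """Least-squares solution of m g = t via the normal equations (full column rank assumed)."""
    n = len(m[0])
    ata = [[sum(row[i] * row[j] for row in m) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * ti for row, ti in zip(m, t)) for i in range(n)]
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        atb[col], atb[piv] = atb[piv], atb[col]
        for r in range(n):
            if r != col:
                factor = ata[r][col] / ata[col][col]
                ata[r] = [x - factor * y for x, y in zip(ata[r], ata[col])]
                atb[r] -= factor * atb[col]
    return [atb[i] / ata[i][i] for i in range(n)]


def estimate_gains(points, target_db, centers, qs, fs, prototype_db=17.0, iterations=3):
    """Start from the prototype gain and re-solve with a matrix rebuilt from the current gains."""
    gains = [prototype_db] * len(centers)
    for _ in range(iterations):
        matrix = [[peaking_db(f, fc, fs, q, g) / g
                   for fc, q, g in zip(centers, qs, gains)] for f in points]
        gains = solve_least_squares(matrix, target_db)
    return gains


fs = 48000.0
centers, qs, true_gains = [1000.0, 1400.0], [8.0, 8.0], [-8.0, -5.0]
points = [fc * r for fc in centers for r in (0.95, 1.0, 1.05)]
target = [sum(peaking_db(f, fc, fs, q, g) for fc, q, g in zip(centers, qs, true_gains))
          for f in points]
gains = estimate_gains(points, target, centers, qs, fs)
print(gains)
```

Here the target is synthesised from two elements with known gains, so the estimate can be compared against them. A gain of exactly zero would need separate handling because of the division by the current gain.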
Apart from the gain factor, the Q factor is also calculated for each filter element using an approach explained in
If there is only a single peak detected, the process is relatively simple; however with more than a single peak, the following steps are repeated until filter parameters Q and gain are determined for all identified peaks. For this process, the step S36, namely identifying the bandwidth for each filter element is branched into steps S360 to S367. Those steps are repeated.
In step S360, the largest negative peak is selected as the first peak for which the filter parameters are to be determined. The expression “minimum gain” thereby corresponds to the “most negative” peak in the target response. Once the initial peak is selected, the process continues by identifying the inflection point on the side with the steepest curve. The inflection point x is thereby defined by f″(x)=0, where f″(x) is the second derivative of the target response. Likewise, the steepest slope corresponds to the extreme, more precisely the minimum, of the first derivative of the target response function. In other words, one starts at the peak and identifies the position in frequency and gain for which the second derivative becomes zero while the first derivative has its extreme value. Basically, there are two points for which the second derivative becomes zero, namely one on the left side and one on the right side, and the one with the lower value of the first derivative is chosen.
In step S362, a virtual line is extended from the identified point of step S361 in a linear fashion. Particularly, the tangent at the identified point is derived. The tangent will virtually intersect with a constant corresponding to $g_{dB}/2$ at an intersection point ip, wherein $g_{dB}$ is the target gain value. The determined intersection point ip corresponds to half of the bandwidth for the filter element centered around the peak, as indicated in S364. The bandwidth is expressed in octaves and stored in a memory in step S365.
The above steps S361 to S365 are then repeated by first considering the next peak towards lower frequencies of the target response spectrum that is separated by a specified distance from the current peak. This newly identified peak is set as the new current peak and the steps are repeated. Once all peaks towards lower frequencies have been processed, the procedure is repeated in step S367, this time with all identified peaks towards higher frequencies. Of course, the direction (first towards lower frequencies, then towards higher frequencies) can be changed.
When the final bandwidth has been calculated, the process continues by transforming the obtained bandwidths for the respective filter elements into corresponding Q factors, using for instance the above-mentioned equations, in step S40. The gain interaction matrix is constructed in step S41 from the respective cut-off frequencies as in the previous example. Finally, the respective gains for each of the individual filter elements are calculated based on the gain interaction matrix and the type of filter in step S42. The Q factor and the gain parameter for each filter element are combined to generate a full peaking filter for the audio signal. The generated filter is then applied to the sound signal to suppress the noise peaks.
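The tangent construction can be sketched as below. The sketch assumes that the frequency axis is handled in log2 units, so that mirroring the intersection point about the peak directly yields a bandwidth in octaves, and it estimates the tangent slope with a central difference. The function name is hypothetical, and the piecewise linear test response only serves to check the arithmetic.

```python
import math


def bandwidth_octaves(freqs, resp_db, peak_idx, infl_idx):
    """Bandwidth in octaves from the tangent at infl_idx intersecting half of the peak gain."""
    x = [math.log2(f) for f in freqs]
    # Central-difference estimate of the tangent slope in dB per octave.
    slope = ((resp_db[infl_idx + 1] - resp_db[infl_idx - 1])
             / (x[infl_idx + 1] - x[infl_idx - 1]))
    half_gain = resp_db[peak_idx] / 2.0
    # Intersection of the tangent with the constant line at half of the target gain.
    x_cross = x[infl_idx] + (half_gain - resp_db[infl_idx]) / slope
    # Mirror the intersection point about the peak to obtain the full bandwidth.
    return 2.0 * abs(x[peak_idx] - x_cross)


# Synthetic V-shaped dip: 12 dB deep at 1 kHz, reaching 0 dB half an octave to each side.
ks = list(range(-8, 9))
freqs = [1000.0 * 2.0 ** (k / 8.0) for k in ks]
resp_db = [-12.0 * max(0.0, 1.0 - abs(k) / 4.0) for k in ks]
bw = bandwidth_octaves(freqs, resp_db, peak_idx=8, infl_idx=5)
print(bw)
```

The tangent taken on the left flank meets the half-gain line a quarter of an octave below the peak, so the mirrored bandwidth is half an octave.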
where Δf is the bandwidth of a single spectrum bin, and the two smoothing functions starting from equation (6) are applied one after the other.
In accordance with the proposed method, the target response is now to be matched by a negative peaking EQ filter having the same or almost the same response with a negative peak at 6.5 kHz. The proposed approach disclosed herein generates such response, with the result being presented in
The estimation or determination of the Q factor for each filter element within the negative peaking EQ filter is an important aspect in obtaining a filter that surgically suppresses the noise peaks but leaves other portions of the signal unharmed. The Q factor should not be too low, to avoid affecting adjacent filters or suppressing useful portions of the signal, but also not too high, to ensure sufficient suppression. It has been found that suitable Q factors can vary considerably (Q values in the range from 10 to 100 have been found useful). The flexible and adjustable determination of the Q factor is based on a geometric approach.
The term B is the bandwidth given in octaves and can be expressed by
wherein $f_l$ and $f_r$ are the midpoints at the left and right of the peak frequency, respectively. This means that if we are able to specify the frequencies at which the midpoints of the filter element of the peaking EQ filter should lie (and therefore its bandwidth B), the Q factor can be determined with the above equation. However, as stated previously, adjacent peaking filters (or other filters) would affect the magnitude response. Still, the following behaviour can be observed:
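The conversion from midpoints to a Q factor can be sketched with the commonly used relation between a bandwidth B in octaves and Q, namely Q = 2^(B/2) / (2^B − 1) with B = log2(f_r / f_l). Whether this matches the equation referred to above in every detail is an assumption, and the function name is hypothetical.

```python
import math


def q_from_midpoints(f_left, f_right):
    """Q factor from the left and right midpoint frequencies using the octave-bandwidth relation."""
    bandwidth_oct = math.log2(f_right / f_left)
    return math.sqrt(2.0 ** bandwidth_oct) / (2.0 ** bandwidth_oct - 1.0)


q_one_octave = q_from_midpoints(1000.0, 2000.0)            # bandwidth of one octave
q_narrow = q_from_midpoints(6500.0 / 1.01, 6500.0 * 1.01)  # narrow band around 6.5 kHz
print(q_one_octave, q_narrow)
```

A bandwidth of one octave gives a Q of about 1.41, while the narrow band of roughly 0.03 octave gives a Q of about 50, which lies inside the range of 10 to 100 reported as useful.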
If there is another filter element of the peaking EQ filter close by, the additional filter would “pull” the magnitude response of the first filter element down with it, making it less “steep”. This behaviour can for example be observed in
The proposed principle now “guesses” where the two midpoints should be, and therefore defines the bandwidth, by first identifying the inflection point on the steepest slope. For this purpose, one should calculate the first derivative of the target response curve as well as the second derivative therefrom. These two curves are depicted in
As one can see, the first derivative F1 has its extreme value on the left side of center frequency point P1, whereas the inflection point P2 is given by the zero crossing P2-0 on said steep side. When an inflection point can be identified, it means one of two possible outcomes:
Both cases (a) and (b) can be addressed in a similar fashion depicted in
The midpoint is the point P3 where this line intersects the constant line at $g_{dB}/2$ as shown in
The two midpoints P3 and P3′ are part of the filter element response curve, that can be calculated by using the bandwidth and determining the Q factor therefrom.
Another aspect to be addressed is the dynamic behaviour of the detected peaks over time. Peaks may occur only periodically or may change their frequency slightly. For this reason, the filter elements of the negative peaking EQ are negatively extended to adapt to changes across time. The configuration of the filter is therefore changed over time as well. For a biquad second order IIR filter as used in the examples presented herein, its time-independent configuration can be written as a difference equation:
The parameters ak and bk are the same as the ones in equation (1) above. With the above-mentioned dynamic behaviour of the detected peaks over time, the parameters ak and bk are becoming time-dependent themselves and the overall configuration is also time dependent, resulting in:
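As a sketch in the common direct-form notation, with input $x[n]$ and output $y[n]$ (symbols assumed here for illustration), the time-independent and the time-dependent configuration of a biquad can be written as:

```latex
\begin{aligned}
y[n] &= \frac{1}{a_0}\Bigl(b_0\,x[n] + b_1\,x[n-1] + b_2\,x[n-2] - a_1\,y[n-1] - a_2\,y[n-2]\Bigr) \\
y[n] &= \frac{1}{a_0[n]}\Bigl(b_0[n]\,x[n] + b_1[n]\,x[n-1] + b_2[n]\,x[n-2] - a_1[n]\,y[n-1] - a_2[n]\,y[n-2]\Bigr)
\end{aligned}
```

In the second line the coefficients are re-evaluated for every sample or block from the current values of the Q factor and the gain.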
Since the parameters ak and bk are functions of Q and $g_{dB}$, it is possible to parametrize them over time. In addition, constraints can be placed on the Q factor and the gain parameter, for example such that Q cannot change by more than 10% of its original value and $g_{dB}$ always has to be between 0 dB and −30 dB. The constraints ensure that Q and $g_{dB}$ do not diverge over time but stay within well-defined boundaries. To reduce the likelihood of sudden jumps or peaks in the Q factor or gain over time, one may further smooth the Q factor and the gain, resulting in:
The parameters $l_1$ and $l_2$ comprise small values, close to 0.
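The constraints and the smoothing can be illustrated with a simple recursive (exponential) smoother. The recursion, the parameter names `l1` and `l2` and their value of 0.05 are assumptions; the limits of ±10% around the original Q and of −30 dB to 0 dB for the gain follow the example above.

```python
def clamp(value, low, high):
    """Restrict value to the interval [low, high]."""
    return max(low, min(high, value))


def smooth_parameters(q_values, gain_values, q_ref, l1=0.05, l2=0.05):
    """Clamp per-frame Q and gain estimates, then smooth them recursively over time."""
    q_prev = q_ref
    g_prev = clamp(gain_values[0], -30.0, 0.0)
    out = []
    for q, g in zip(q_values, gain_values):
        q = clamp(q, 0.9 * q_ref, 1.1 * q_ref)   # Q stays within 10% of its original value
        g = clamp(g, -30.0, 0.0)                 # gain stays between -30 dB and 0 dB
        q_prev = (1.0 - l1) * q_prev + l1 * q    # small l1 means slow adaptation
        g_prev = (1.0 - l2) * g_prev + l2 * g
        out.append((q_prev, g_prev))
    return out


track = smooth_parameters([20.0, 40.0, 20.0], [-10.0, -50.0, -10.0], q_ref=20.0)
print(track)
```

The outlier in the second frame (Q = 40, gain = −50 dB) is first clamped to 22 and −30 dB and then enters the smoothed track with a weight of only 5%, so the filter configuration changes gradually.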
The present application provides a simple but efficient way to suppress noise peaks in a recorded sound signal, particularly in a sound signal containing speech. The proposed method utilizes a geometric approach for estimating the Q factor of a filter element in a peaking filter or a peaking filter cascade, respectively. The method is flexible due to its use of a smoothed spectrum and the determination of a target response instead of pre-recorded target “profiles” against which the speech is matched. This makes it possible to process all kinds of different sound signals with varying noise peaks and levels. While the present exemplary embodiment of the proposed principle concerns a negative peaking EQ filter, that is, a filter used for noise suppression, one can also utilize this method to enhance certain portions of the frequency spectrum in the same way. To this extent, the present application proposes a method for processing sound signals, in which a peaking EQ filter is determined based on a target response determined by the difference between a smoothed spectrum and the original sound signal. The gain itself can be negative for suppressing the frequencies around the center frequencies of the filter elements of the peaking EQ filter, positive for enhancement, or a combination thereof.
| Number | Date | Country | Kind |
|---|---|---|---|
| PA202270098 | Mar 2022 | DK | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/EP2023/056200 | 3/10/2023 | WO | |