The present invention relates to the field of acoustic feedback suppression in closed-loop systems. The closed-loop systems referred to in the present invention are systems whose inputs are influenced by their outputs, for example hearing aid systems and public address systems. The present invention specifically relates to a deep learning-based method for acoustic feedback suppression in a closed-loop system.
Sound reinforcement systems are widely used in multimedia classrooms, local conference systems, hearing aids, cochlear implants, and the like. Such an electro-acoustic system includes at least one microphone, one amplifier, and one sound-generating unit such as a speaker.

Acoustic feedback arises when the microphone and the speaker are in the same acoustic environment and, because of the small distance between them, are acoustically coupled. The cycle runs as follows:
- the microphone picks up an external audio signal;
- the amplifier amplifies it and the speaker plays it back;
- the played-back sound travels along a feedback path and is picked up by the microphone again;
- it is amplified and played back again.

This forms a continuously circulating positive feedback loop. At any frequency that satisfies the Nyquist instability conditions, the signal magnitude grows continuously and howling occurs. An excessively large signal magnitude can even seriously damage audio equipment. Suppressing acoustic feedback therefore improves the sound reinforcement performance of the system and also ensures its stability and safety.
An object of the present invention is to overcome the problem in the prior art that the signal magnitude can become excessively large and seriously damage audio equipment.
To achieve the above object, the present invention is implemented by the following technical solution.
The present invention proposes a deep learning-based method for acoustic feedback suppression in a closed-loop system, the method including:
As one of the improvements of the above technical solution, the model is trained offline, and the training includes the following steps:
As one of the improvements of the above technical solution, the closed-loop system includes a forward path amplification module and a delay module, and the acoustic feedback closed-loop system is modeled as:
$$y(t)=v(t)+u(t)*f(t)$$
As one of the improvements of the above technical solution, generating a unit impulse response of an acoustic feedback path by simulation comprises:
$$Y(\omega)=V(\omega)+U(\omega)F(\omega)$$
$$U(\omega)=Y(\omega)G(\omega)$$
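As one of the improvements of the above technical solution, G(ω) is set to a constant G. If G depends on the angular frequency, the frequency-dependent part is incorporated into the frequency response of the acoustic feedback path. The transfer function of the closed-loop system is then: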
As one of the improvements of the above technical solution, if the closed-loop system further comprises an adaptive feedback cancellation module and a post-processing module, the transfer function in the closed-loop system is expressed as:
As one of the improvements of the above technical solution, generating a unit impulse response of an acoustic feedback path by simulation includes:
$$g(t)=G\,\delta(t-\tau f_s)$$
As one of the improvements of the above technical solution, the mapping target of the deep learning neural network includes:
$$z(t)=v(t)+\alpha n(t)$$
$$s(t)=G\,v(t-\tau f_s)$$
$$S(k,l)=S_r(k,l)+iS_i(k,l)$$
$$U(k,l)=U_r(k,l)+iU_i(k,l)$$
$$\{\tilde{S}_r^c,\tilde{S}_i^c\}=G(U_r^c,U_i^c;\Phi),\qquad S^c=|S|^{\beta}\exp\big(j\angle(S)\big)$$
As one of the improvements of the above technical solution, the mean squared error between the estimated result and the training target is used directly as the loss function. Both the complex spectra and the magnitude spectra are constrained in the loss function:
$$L_{\mathrm{Mag+RI}}=\lambda L_{\mathrm{RI}}+(1-\lambda)L_{\mathrm{Mag}},$$
$$L_{\mathrm{Mag}}=\big\|\,|\tilde{S}^c|-|S^c|\,\big\|_F^2,$$
$$L_{\mathrm{RI}}=\big\|\tilde{S}_r^c-S_r^c\big\|_F^2+\big\|\tilde{S}_i^c-S_i^c\big\|_F^2$$
As one of the improvements of the above technical solution, when the trained model is applied to the closed-loop system, the model outputs a compressed complex spectrum $\tilde{S}^c(k,l)$ of the estimated signal. $\tilde{S}^c(k,l)$ is then decompressed to recover the complex spectrum $\tilde{S}(k,l)$, which is expressed as:
$$\tilde{S}(k,l)=|\tilde{S}^c(k,l)|^{1/\beta}\exp\big(j\angle\tilde{S}^c(k,l)\big)$$
Compared with the prior art, the present invention has the following advantages:
In the method, first, closed-loop signals are obtained by simulating a feedback path, and a training dataset is built by using parallel training data for deep learning formed by the closed-loop signals and open-loop signals together; then, a deep learning model is trained by using the built training dataset in an offline training mode; and after the trained model is applied to a closed-loop system, it can effectively suppress feedback signals, improve the quality and intelligibility of speech, and significantly improve the gain of the sound reinforcement system.
Advantage 1: Generating unit impulse responses of the acoustic feedback path by simulation removes the need to measure a large number of real acoustic feedback paths. This is important in hearing aid applications. Measuring many acoustic feedback path impulse responses is very difficult and labor-intensive, and it is hard to cover the full variety of complex situations that way.
Advantage 2: It provides the first deep learning-based feedback suppression system for a marginally stable system. It simultaneously solves three problems caused by feedback (marginal howling, comb filtering, and coloration), thereby achieving high-quality audio output.
Advantage 3: It achieves denoising and feedback removal simultaneously. By combining a closed-loop data generation method with offline model training, it achieves denoising and feedback removal in an online closed-loop system. Deep denoising methods can suppress noise but not the feedback components of an audio signal in a closed-loop system, so the present method has clear advantages over them.
In view of the acoustic feedback phenomenon in sound reinforcement systems, the present invention proposes a deep learning-based method for acoustic feedback suppression. The method proceeds in three stages.

First, a training set is constructed:
- a large number of unit impulse responses of acoustic feedback paths are generated by simulation;
- with speech and audio signals as the external audio input, a target audio signal is generated under the open-loop condition;
- an audio signal with feedback is generated under a closed-loop, marginally stable condition;
- noise is superimposed on it to produce a noisy audio signal with feedback.

Next, framing and feature extraction are performed on the noisy audio signal with feedback. Learning targets are extracted frame by frame and point by point from the target audio signal and the noisy audio signal with feedback. A deep neural network model is established and trained offline until the error converges to within a certain range.

Finally, in the actual system testing and application stage, framing and feature extraction are performed on the noisy audio signal with feedback in the closed-loop system. The trained deep neural network model processes this signal to obtain the time-frequency complex spectrum of the target audio signal, from which the time-domain target audio signal is reconstructed.
The present invention provides a deep learning-based method for acoustic feedback suppression. It addresses the howling that may occur in hearing aids, live sound reinforcement, and other acoustic feedback systems. The method trains a deep neural network model offline and then places the model in an actual closed-loop system to perform feedback suppression on signals, with the following specific steps:
The technical solution provided in the present invention is further described below in conjunction with embodiments.
A flow diagram of a deep learning-based method for acoustic feedback suppression in a closed-loop system in Embodiment 1 of the present invention is shown in
$$y(t)=v(t)+u(t)*f(t)\tag{1}$$
where t is the sampling time index and * denotes convolution. Applying the Fourier transform to the time-domain signals gives:
$$\begin{aligned}Y(\omega)&=V(\omega)+U(\omega)F(\omega)\\U(\omega)&=Y(\omega)G(\omega)\end{aligned}\tag{2}$$
where ω is the angular frequency. Without loss of generality, we assume that the forward path gain is a full-band gain, i.e., G(ω)=G is a constant. If G(ω) is frequency-dependent, the frequency-dependent part can be incorporated into the frequency response of the acoustic feedback path. This gives the speaker-to-microphone closed-loop transfer function:
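Eliminating U(ω) from equation (2) puts the closed-loop responses in a form that makes the instability condition visible. The lines below are a worked derivation from (2) under the constant-gain assumption above:

```latex
Y(\omega) = V(\omega) + G\,F(\omega)\,Y(\omega)
\;\Longrightarrow\;
\frac{Y(\omega)}{V(\omega)} = \frac{1}{1 - G\,F(\omega)},
\qquad
\frac{U(\omega)}{V(\omega)} = \frac{G}{1 - G\,F(\omega)}
```

Both responses share the denominator 1 − G F(ω), which vanishes exactly when the loop gain G F(ω) equals 1. That is the magnitude-and-phase condition examined next.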
According to the Nyquist instability criterion, if the loop gain satisfies the following conditions:
where ∠(⋅) denotes the phase and |⋅| denotes the modulus. That is, if at an angular frequency ω the modulus of the loop gain function is greater than or equal to 1 and its phase is an integer multiple of 2π, the sound reinforcement system oscillates at that frequency, resulting in howling. This yields the marginally stable gain of the closed-loop system:
where τ is the total delay of all signal processing in the sound reinforcement system, in seconds (s), and $f_s$ is the sampling frequency in hertz (Hz).
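The marginally stable gain can be estimated numerically from a feedback-path impulse response. The sketch below assumes the common definition G_max = 1 / max_ω |F(ω)|, evaluated on an FFT grid. It also assumes that the phase condition can be met near the magnitude peak, which the forward-path delay usually makes possible, since the delay rotates the phase without changing the magnitude. The function name `max_stable_gain` and the FFT size are illustrative choices, not taken from the source.

```python
import numpy as np

def max_stable_gain(f, n_fft=4096):
    """Estimate G_max = 1 / max_w |F(w)| for a feedback-path impulse response f.

    Assumes the phase condition can be satisfied at the magnitude peak;
    the discrete FFT grid approximates the continuous maximum."""
    F = np.fft.rfft(f, n=n_fft)          # sampled frequency response of the feedback path
    return 1.0 / np.max(np.abs(F))

# Toy two-tap feedback path: |F| peaks at DC with value 0.75, so G_max = 1 / 0.75.
f = np.array([0.0, 0.5, 0.25])
g_max = max_stable_gain(f)

# Forward-path gain drawn from [0.5 G_max, 0.999 G_max], as described for training data.
rng = np.random.default_rng(0)
G = rng.uniform(0.5 * g_max, 0.999 * g_max)
```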
where $f_{\mathrm{env}}$ is a modulation frequency, $\varphi_{\mathrm{env}}$ is a random phase, r(t) is a zero-mean Gaussian process, σ≥0 is a decay constant, and $t_f$ denotes the time at which the exponential decay of the transfer function starts.
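To illustrate how such ingredients can be combined into a random feedback-path impulse response, here is a minimal sketch. The functional form used (Gaussian noise × raised-cosine envelope × exponential decay after t_f) is a hypothetical assumption and not the formula of this method. The same goes for measuring `t_f` in samples and `sigma` per sample, and for the function name `simulate_feedback_path`.

```python
import numpy as np

def simulate_feedback_path(length, fs, f_env, sigma, t_f, rng):
    """Hypothetical feedback-path impulse response combining the listed ingredients.

    NOT the exact model of the source: r(t) is white Gaussian noise, the envelope is
    a raised cosine at f_env Hz with random phase, and the response is flat up to
    sample t_f and decays as exp(-sigma * (t - t_f)) afterwards."""
    t = np.arange(length)
    r = rng.standard_normal(length)                      # zero-mean Gaussian process r(t)
    phi_env = rng.uniform(0.0, 2.0 * np.pi)              # random envelope phase
    envelope = 0.5 * (1.0 + np.cos(2.0 * np.pi * f_env * t / fs + phi_env))
    decay = np.exp(-sigma * np.maximum(t - t_f, 0))      # 1 before t_f, exponential after
    return r * envelope * decay

h = simulate_feedback_path(256, 16000, 50.0, 0.05, 32, np.random.default_rng(0))
```

Drawing many such responses with different random seeds and parameters is what allows a large training set without measuring real feedback paths.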
In the closed-loop system, a forward path amplification module is expressed as:
$$g(t)=G\,\delta(t-\tau f_s)\tag{10}$$
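where δ denotes the Dirac (unit impulse) function. In this method, G is drawn from the range $G\in[0.5G_{\max},0.999G_{\max}]$. This keeps the constructed closed-loop signals from becoming unbounded, which can happen at or above the marginally stable gain. It also ensures that the signals are not effectively free of feedback.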
The signal u(t) fed to the speaker without feedback suppression processing, and the microphone pickup signal y(t), can be obtained from equations (9) and (10) and the target signal source v(t).
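Equations (1) and (10) together define a recursion that can be run sample by sample. In this minimal sketch, `delay` stands for τ·f_s rounded to whole samples and must be at least one sample, so the loop is causal. The function name `simulate_closed_loop` is illustrative. The forward path is the pure gain-plus-delay of equation (10), with no feedback suppression.

```python
import numpy as np

def simulate_closed_loop(v, f, G, delay):
    """Run the closed loop sample by sample.

    u(t) = G * y(t - delay)              forward path g(t) = G * delta(t - delay)
    y(t) = v(t) + sum_k f(k) * u(t - k)  microphone = source + feedback (equation (1))
    Returns (u, y): the speaker signal and the microphone signal."""
    if delay < 1:
        raise ValueError("delay must be at least one sample for a causal loop")
    n, L = len(v), len(f)
    y = np.zeros(n)
    u = np.zeros(n)
    for t in range(n):
        if t >= delay:
            u[t] = G * y[t - delay]
        # Feedback term f(0)u(t) + f(1)u(t-1) + ..., using the most recent samples first.
        k = min(t + 1, L)
        y[t] = v[t] + np.dot(f[:k], u[t::-1][:k])
    return u, y

# A unit impulse through a loop with gain 1 and a 0.5 one-sample echo rings down by halves.
u, y = simulate_closed_loop(np.array([1.0, 0, 0, 0, 0, 0]), np.array([0.0, 0.5]), 1.0, 1)
```

For training data, the noisy source z(t) of equation (11) would be passed in place of v(t).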
$$z(t)=v(t)+\alpha n(t)\tag{11}$$
where n(t) is the noise signal and α is the noise scaling factor calculated from the desired signal-to-noise ratio.
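One standard way to compute α is to choose it so that the power ratio of v(t) to αn(t) equals the requested SNR. The sketch below assumes mean-square power and a noise segment already trimmed to the length of v. The helper name `mix_at_snr` is hypothetical.

```python
import numpy as np

def mix_at_snr(v, n, snr_db):
    """Return (z, alpha) with z = v + alpha * n and 10*log10(P_v / P_{alpha n}) = snr_db.

    Powers are mean squares; n is assumed to have the same length as v."""
    p_v = np.mean(v ** 2)
    p_n = np.mean(n ** 2)
    alpha = np.sqrt(p_v / (p_n * 10.0 ** (snr_db / 10.0)))
    return v + alpha * n, alpha

z, alpha = mix_at_snr(2.0 * np.ones(100), np.ones(100), 20.0)
```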
The resulting signal z(t) is used as the input to the closed-loop system to obtain the noisy signal u(t) with feedback. This signal is used as the input to the neural network, which is trained to map it to the target signal:
$$s(t)=G\,v(t-\tau f_s)\tag{12}$$
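Equation (12) is the clean source passed through the forward path only, with no feedback and no noise. Its delay therefore matches the delay in the closed-loop input u(t). A minimal sketch, assuming `delay` is τ·f_s rounded to whole samples; the name `make_target` is illustrative.

```python
import numpy as np

def make_target(v, G, delay):
    """Open-loop target s(t) = G * v(t - delay), zero-padded at the start."""
    s = np.zeros(len(v))
    if delay < len(v):
        s[delay:] = G * v[:len(v) - delay]
    return s

s = make_target(np.array([1.0, 2.0, 3.0, 4.0]), 2.0, 1)
```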
K-point short-time Fourier transforms are applied to u(t) and s(t), respectively, to obtain their complex spectra at time frame l and frequency bin k:
where w(t) is a window function and R is the frame shift. Writing equation (13) in terms of real and imaginary parts gives:
$$\begin{aligned}S(k,l)&=S_r(k,l)+iS_i(k,l)\\U(k,l)&=U_r(k,l)+iU_i(k,l)\end{aligned}\tag{14}$$
where $S_r(k,l)$ and $S_i(k,l)$ are the real and imaginary parts of S(k,l), respectively, and $U_r(k,l)$ and $U_i(k,l)$ are the real and imaginary parts of U(k,l), respectively.
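The framing and transform steps above can be sketched with NumPy. This version assumes a periodic Hann window for w(t), K = 512, and R = 256, none of which are specified in the source. It keeps only the one-sided spectrum (K/2 + 1 bins) because the input is real, and returns the real and imaginary parts indexed as [k, l].

```python
import numpy as np

def stft(x, K=512, R=256):
    """K-point STFT with an (assumed) periodic Hann window and frame shift R.

    Returns (real, imag), each of shape (K // 2 + 1, n_frames), indexed [k, l]."""
    if len(x) < K:
        raise ValueError("signal shorter than one frame")
    w = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(K) / K)   # periodic Hann window w(t)
    n_frames = 1 + (len(x) - K) // R
    frames = np.stack([x[l * R: l * R + K] * w for l in range(n_frames)], axis=1)
    X = np.fft.rfft(frames, n=K, axis=0)                     # one-sided spectrum X[k, l]
    return X.real, X.imag

# A cosine completing exactly 32 cycles per frame peaks in frequency bin 32.
t = np.arange(1024)
Ur, Ui = stft(np.cos(2.0 * np.pi * 32 * t / 512))
```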
In this method, we use complex spectral mapping: the neural network is trained to learn the mapping from $\{U_r(k,l),U_i(k,l)\}$ to $\{S_r(k,l),S_i(k,l)\}$. This process can be expressed as:
$$\{\tilde{S}_r^c,\tilde{S}_i^c\}=G(U_r^c,U_i^c;\Phi),\qquad S^c=|S|^{\beta}\exp\big(j\angle(S)\big)\tag{15}$$
where G(⋅,⋅;Φ) is the mapping function of the neural network, with Φ being the network parameters, and $(\cdot)^c$ denotes the compression operation on a spectrum, with β∈[0,1] the compression coefficient. $\tilde{S}_r^c$ and $\tilde{S}_i^c$ are the real and imaginary parts of the compressed complex spectrum of the estimated signal, respectively, and $U_r^c$ and $U_i^c$ are the real and imaginary parts of the compressed complex spectrum of the input feature signal, respectively.
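The compression in equation (15) raises the magnitude to the power β and keeps the phase unchanged. It is applied to both the input spectrum U and the target S before training. A minimal sketch; the value β = 0.5 is an assumed example and is not given in the source.

```python
import numpy as np

def compress(X, beta=0.5):
    """Power-law compression X^c = |X|**beta * exp(j * angle(X)).

    Returns the real and imaginary parts of the compressed spectrum."""
    Xc = np.abs(X) ** beta * np.exp(1j * np.angle(X))
    return Xc.real, Xc.imag

Xr_c, Xi_c = compress(np.array([4 + 0j, -9 + 0j, 16j]), beta=0.5)
```

With β < 1, large spectral peaks are flattened relative to low-energy bins, so the network does not concentrate only on the loudest components.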
Parameters of the convolutional layers are given as (kernel size, number of channels, stride), and their input and output dimensions as (number of channels, time dimension, feature dimension). The training settings are:
- batch size: 16;
- number of iterations: 30;
- optimizer: Adam, with a learning rate of 1.0×10⁻³ and a decay rate of 1.0×10⁻⁷.

Training then starts. The deep neural network here may take other forms, such as a deep neural network based on magnitude mapping, or one based on real or complex mask mapping. The goal can also be achieved with a shallow neural network, which is still a simple extension of the present invention. Combining the parallel data construction method proposed in the present invention with the offline training and online application mode is likewise a simple extension of the present invention, even with a time-domain deep neural network model. This implementation uses a network model based on deep learning. Other machine learning methods may also be used, and their use is also protected by the present invention.
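$$\begin{aligned}L_{\mathrm{Mag+RI}}&=\lambda L_{\mathrm{RI}}+(1-\lambda)L_{\mathrm{Mag}},\\L_{\mathrm{Mag}}&=\big\|\,|\tilde{S}^c|-|S^c|\,\big\|_F^2,\\L_{\mathrm{RI}}&=\big\|\tilde{S}_r^c-S_r^c\big\|_F^2+\big\|\tilde{S}_i^c-S_i^c\big\|_F^2\end{aligned}\tag{16}$$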
where λ is a weight coefficient between 0 and 1, typically 0.5. λ should approach 0 in low signal-to-noise ratio scenarios and approach 1.0 in high signal-to-noise ratio scenarios. If magnitude spectra are used for mapping in the network, λ is 0. Using other loss functions, such as SI-SDR, is still a simple extension of the present invention.
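Equation (16) can be evaluated directly on the compressed real and imaginary parts. The NumPy sketch below only computes the loss value. Actual training would express the same formula in an automatic-differentiation framework, which the source does not name. The name `mag_ri_loss` is illustrative.

```python
import numpy as np

def mag_ri_loss(Sr_hat, Si_hat, Sr, Si, lam=0.5):
    """L = lam * L_RI + (1 - lam) * L_Mag on compressed spectra (squared Frobenius norms)."""
    l_ri = np.sum((Sr_hat - Sr) ** 2) + np.sum((Si_hat - Si) ** 2)
    l_mag = np.sum((np.hypot(Sr_hat, Si_hat) - np.hypot(Sr, Si)) ** 2)
    return lam * l_ri + (1.0 - lam) * l_mag

# Estimate 3 + 0j against target 0 + 4j: L_RI = 9 + 16 = 25, L_Mag = (3 - 4)^2 = 1.
loss = mag_ri_loss(np.array([3.0]), np.array([0.0]), np.array([0.0]), np.array([4.0]))
```

Setting `lam=0` leaves only the magnitude term, matching the magnitude-mapping case mentioned above.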
$$\tilde{S}(k,l)=|\tilde{S}^c(k,l)|^{1/\beta}\exp\big(j\angle\tilde{S}^c(k,l)\big)$$
The inverse Fourier transform is applied to the complex spectrum, followed by overlap-add, to obtain the time-domain estimated signal $\tilde{s}(t)$.
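The reconstruction steps (decompression, inverse FFT per frame, and overlap-add) can be sketched as follows. This is a weighted overlap-add under the same assumptions as an analysis STFT with a periodic Hann window, K = 512, and R = 256. The analysis window is reused as the synthesis window and the result is normalized by the summed squared window. The name `decompress_and_istft` and these parameter values are illustrative, not from the source.

```python
import numpy as np

def decompress_and_istft(Sr_c, Si_c, beta=0.5, K=512, R=256):
    """Undo power-law compression, inverse-rFFT each frame, and overlap-add.

    Assumes a periodic Hann analysis window; samples where the summed squared
    window is ~0 (the very first samples) are not recoverable."""
    Sc = Sr_c + 1j * Si_c
    S = np.abs(Sc) ** (1.0 / beta) * np.exp(1j * np.angle(Sc))   # |S^c|^(1/beta) e^{j angle S^c}
    frames = np.fft.irfft(S, n=K, axis=0)                         # time frames, shape (K, n_frames)
    w = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(K) / K)
    n_frames = S.shape[1]
    out = np.zeros(K + (n_frames - 1) * R)
    norm = np.zeros_like(out)
    for l in range(n_frames):
        out[l * R: l * R + K] += frames[:, l] * w                 # synthesis window, overlap-add
        norm[l * R: l * R + K] += w ** 2
    return out / np.maximum(norm, 1e-8)
```

Feeding in a compressed STFT of a signal recovers that signal wherever the frames fully overlap.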
The present invention relates to acoustic feedback suppression in closed-loop systems such as hearing aid systems and live sound reinforcement systems. It generates dedicated training datasets and designs a deep neural network architecture to achieve acoustic feedback suppression in the marginally stable state of a closed-loop system. In this method, a large number of unit impulse responses of acoustic feedback paths are generated by simulation and used to produce closed-loop feedback signals. These signals are combined with noise data to generate large training datasets, with which the model is trained. Applying the model to the closed-loop system can effectively suppress the feedback signals, improve the quality and intelligibility of speech, and significantly improve the gain of the sound reinforcement system.

The innovation of the method lies in three steps:
- closed-loop signals are obtained by simulating a feedback path, and a training dataset is built from parallel deep learning training data formed by the closed-loop and open-loop signals together;
- a deep learning model is trained offline on the built training dataset;
- the model is applied to a closed-loop system to achieve acoustic feedback suppression.
From the above specific description of the present invention, it is apparent that after applying the trained closed-loop system acoustic feedback suppression model to the closed-loop system, the present invention can effectively suppress the feedback signals, improve the quality and intelligibility of speech, and significantly improve the gain of the sound reinforcement system.
Finally, it should be noted that the above embodiments are only used for describing instead of limiting the technical solutions of the present invention. Although the present invention is described in detail with reference to the embodiments, persons of ordinary skill in the art should understand that modifications or equivalent substitutions of the technical solutions of the present invention should be encompassed within the scope of the claims of the present invention so long as they do not depart from the spirit and scope of the technical solutions of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
2022108251680 | Jul 2022 | CN | national |