This application claims the priority benefit of Taiwan application serial no. 107105469, filed on Feb. 14, 2018. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The invention relates to a source separation method and a source separation device, and particularly relates to a source separation method and a source separation device capable of training a phase of a time-frequency signal.
Deep learning is a commonly used approach in signal source separation. In a typical pipeline, a mixed signal is converted from the time domain to the frequency domain through a Short-Time Fourier Transform (STFT), and the magnitude (absolute value) of the STFT coefficients serves as the input of a deep neural network. The trained deep neural network then produces time-frequency data of the signals to be separated, which are transformed back to the time domain through an inverse STFT (iSTFT). However, using only the magnitude of the spectrum of the mixed signal as network training data while ignoring the phase information, which is important information implied in the STFT coefficients, may result in poor hearing quality of the separated signals. Therefore, how to add the phase information to the deep neural network for training is a target to be achieved by those skilled in the art.
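As a concrete illustration of this magnitude-only baseline, the following sketch computes an STFT and keeps only the magnitude as a network feature; the frame length, hop size, and random stand-in signal are illustrative choices, not values from the specification:

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    # Hann-windowed Short-Time Fourier Transform (minimal sketch;
    # frame_len and hop are illustrative values, not from the source).
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # complex STFT coefficients

rng = np.random.default_rng(0)
mixed = rng.standard_normal(4096)   # stand-in for a mixed time-domain signal

Z = stft(mixed)                     # complex time-frequency representation
magnitude = np.abs(Z)               # what the magnitude-only baseline trains on
phase = np.angle(Z)                 # the information that baseline discards
```

Since `Z = magnitude * exp(1j * phase)`, discarding `phase` loses exactly the structure the complex-valued network described in this specification is meant to preserve.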
The invention is directed to a source separation method and a source separation device, in which phase information is added to a deep neural network for training, so as to improve hearing quality of a separated signal.
The invention provides a source separation method including: obtaining at least two source time-frequency signals and a mixed time-frequency signal corresponding to the at least two source time-frequency signals; disposing the mixed time-frequency signal at an input layer of a complex-valued deep neural network, and taking the at least two source time-frequency signals as a target of the complex-valued deep neural network; calculating a cost function of the complex-valued deep neural network; and performing partial differentiation on a real part and an imaginary part of a network parameter of the complex-valued deep neural network, respectively, to minimize the cost function.
In an embodiment of the invention, the source separation method further includes: performing partial differentiation on the real part of the network parameter to train a magnitude of the mixed time-frequency signal.
In an embodiment of the invention, the source separation method further includes: performing partial differentiation on the imaginary part of the network parameter to train a phase of the mixed time-frequency signal.
In an embodiment of the invention, the source separation method further includes: taking a quadratic error as the cost function of the complex-valued deep neural network.
In an embodiment of the invention, the source separation method further includes: performing partial differentiation on the real part and the imaginary part of the network parameter of the complex-valued deep neural network, respectively, by a gradient descent method.
In an embodiment of the invention, the network parameter includes a weight value and a deviation value.
The invention provides a source separation device including a processor and a memory coupled to the processor. The processor obtains at least two source time-frequency signals and a mixed time-frequency signal corresponding to the at least two source time-frequency signals. The processor disposes the mixed time-frequency signal at an input layer of a complex-valued deep neural network, and takes the at least two source time-frequency signals as a target of the complex-valued deep neural network. The processor calculates a cost function of the complex-valued deep neural network. The processor performs partial differentiation on a real part and an imaginary part of a network parameter of the complex-valued deep neural network, respectively, to minimize the cost function.
In an embodiment of the invention, the processor performs partial differentiation on the real part of the network parameter to train a magnitude of the mixed time-frequency signal.
In an embodiment of the invention, the processor performs partial differentiation on the imaginary part of the network parameter to train a phase of the mixed time-frequency signal.
In an embodiment of the invention, the processor takes a quadratic error as the cost function of the complex-valued deep neural network.
In an embodiment of the invention, the processor performs partial differentiation on the real part and the imaginary part of the network parameter of the complex-valued deep neural network, respectively, by a gradient descent method.
In an embodiment of the invention, the network parameter includes a weight value and a deviation value.
According to the above description, the source separation method and the source separation device of the invention calculate the cost function of the complex-valued deep neural network, and perform partial differentiation on the real part and the imaginary part of the network parameter of the complex-valued deep neural network, respectively, to minimize the cost function. In the process of performing partial differentiation on the imaginary part of the network parameter, the phase of the mixed time-frequency signal is trained, so that the complex-valued deep neural network may produce a separated signal of better quality.
In order to make the aforementioned and other features and advantages of the invention comprehensible, several exemplary embodiments accompanied with figures are described in detail below.
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Referring to
In a forward transfer derivation of the complex-valued deep neural network, a hidden layer excitation value x^(1) ∈ ℂ^(N^(1)) is obtained according to the following equations (1) and (2):

net^(1) = W^(1)x + b^(1)   (1)

x^(1) = ƒ(net^(1))   (2)

In the above equations, x ∈ ℂ^(N^(0)) is the input vector, W^(1) ∈ ℂ^(N^(1)×N^(0)) is the weight matrix between the input layer and the hidden layer, b^(1) ∈ ℂ^(N^(1)) is the deviation value (bias) vector of the hidden layer, and ƒ is the activation function.
In the forward transfer derivation of the complex-valued deep neural network, an output layer excitation value x^(2) ∈ ℂ^(N^(2)) is obtained according to the following equations (3) and (4):

net^(2) = W^(2)x^(1) + b^(2)   (3)

x^(2) = ƒ(net^(2))   (4)

In the aforementioned equations, W^(2) ∈ ℂ^(N^(2)×N^(1)) is the weight matrix between the hidden layer and the output layer, and b^(2) ∈ ℂ^(N^(2)) is the deviation value (bias) vector of the output layer.
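The forward transfer of equations (1) through (4) can be sketched as follows; the split-sigmoid activation, layer sizes, and random initialization are illustrative assumptions, since the specification does not fix a particular ƒ:

```python
import numpy as np

def f(z):
    # Split-type complex activation: a real sigmoid applied separately to the
    # real and imaginary parts (one common choice; assumed, not from the source).
    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    return sig(z.real) + 1j * sig(z.imag)

rng = np.random.default_rng(1)
N0, N1, N2 = 8, 16, 4  # example layer sizes N^(0), N^(1), N^(2)

def cmat(rows, cols):
    # small random complex matrix for illustration
    return 0.1 * (rng.standard_normal((rows, cols))
                  + 1j * rng.standard_normal((rows, cols)))

W1, b1 = cmat(N1, N0), np.zeros(N1, dtype=complex)
W2, b2 = cmat(N2, N1), np.zeros(N2, dtype=complex)

x = rng.standard_normal(N0) + 1j * rng.standard_normal(N0)  # complex input

net1 = W1 @ x + b1    # equation (1)
x1 = f(net1)          # equation (2)
net2 = W2 @ x1 + b2   # equation (3)
x2 = f(net2)          # equation (4)
```

Because the weights, biases, and activations are all complex-valued, both the magnitude and the phase of the input propagate through the network.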
Referring to
Referring to the flowchart of the source separation method, in step S301, at least two source time-frequency signals and a mixed time-frequency signal corresponding to the at least two source time-frequency signals are obtained.
In step S303, the mixed time-frequency signal is disposed at an input layer of a complex-valued deep neural network, and the at least two source time-frequency signals are taken as a target of the complex-valued deep neural network.
In step S305, a cost function of the complex-valued deep neural network is calculated. To be specific, the source separation method of the invention first adopts a quadratic error as the cost function of the complex-valued deep neural network, where the error value output by the complex-valued deep neural network is given by the following equation (5):
ε_j = d_j − y_j,   j = 1, 2, …, N^(2)   (5)
Where d_j is an expected output value of the complex-valued deep neural network, i.e., the correct result, and y_j is the actual predicted output value.
Therefore, a cost value E is calculated according to the following equation (6):

E = Σ_(j=1)^(N^(2)) |ε_j|² = Σ_(j=1)^(N^(2)) ε_j ε_j*   (6)
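Equations (5) and (6) amount to the following computation; the values of d_j and y_j here are made up purely for illustration:

```python
import numpy as np

# Illustrative expected outputs d_j and predicted outputs y_j (made-up values).
d = np.array([1.0 + 1.0j, 0.5 - 0.2j, -0.3 + 0.4j])
y = np.array([0.8 + 1.1j, 0.4 - 0.1j, -0.2 + 0.5j])

eps = d - y                    # equation (5): error per output unit
E = np.sum(np.abs(eps) ** 2)   # equation (6): sum of eps_j times conj(eps_j)
```

Because E sums each ε_j multiplied by its complex conjugate, it is always real and non-negative, which is what makes it usable as a cost value.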
In step S307, partial differentiation is performed on a real part and an imaginary part of a network parameter of the complex-valued deep neural network, respectively, to minimize the cost function.
To be specific, in the invention, a gradient descent method is adopted to respectively perform partial differentiation on the network parameters (i.e., the weight value W and the deviation value b) of the complex-valued deep neural network to iterate toward an optimal solution. Since E is a real-valued function (a sum of products of ε_j and its complex conjugate), it is non-analytic on the complex plane, so a complex derivative cannot be taken directly. Instead, partial differentiation and updates are performed on the real part Re(W_jk^(2)) and the imaginary part Im(W_jk^(2)) of the network parameter, and the derived results are as follows:
W_jk^(2)[n+1] = W_jk^(2)[n] + μ((d_j − y_j) ƒ′(net_j^(2)*) x_k^(1)*)   (7)

b_j^(2)[n+1] = b_j^(2)[n] + μ((d_j − y_j) ƒ′(net_j^(2)*))   (8)
Then, the following equations (9) and (10) give the corresponding derivation between the input layer (subscript l) and the hidden layer (subscript k):
w_kl^(1)[n+1] = w_kl^(1)[n] + μ(Σ_(j=1)^(N^(2)) ((d_j − y_j) ƒ′(net_j^(2)*) w_jk^(2)*)) ƒ′(net_k^(1)*) x_l*   (9)

b_k^(1)[n+1] = b_k^(1)[n] + μ(Σ_(j=1)^(N^(2)) ((d_j − y_j) ƒ′(net_j^(2)*) w_jk^(2)*)) ƒ′(net_k^(1)*)   (10)
By performing partial differentiation on the real part of the network parameter, the magnitude of the mixed time-frequency signal is trained, and by performing partial differentiation on the imaginary part of the network parameter, the phase of the mixed time-frequency signal is trained. In the invention, since the complex-valued deep neural network directly learns the Short-Time Fourier Transform (STFT) characteristic, the learned characteristic preserves the original structure of the magnitude and the phase of the signal. Moreover, in the invention, since the parameters of each layer of the network are adjusted iteratively through the back-propagated results, learning of the complex-valued deep neural network is completed after the iteration converges.
In summary, the source separation method and the source separation device of the invention calculate the cost function of the complex-valued deep neural network, and perform partial differentiation on the real part and the imaginary part of the network parameter of the complex-valued deep neural network, respectively, to minimize the cost function. In the process of performing partial differentiation on the imaginary part of the network parameter, the phase of the mixed time-frequency signal is trained, so that the complex-valued deep neural network may produce a separated signal of better quality.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the invention without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the invention cover modifications and variations of this invention provided they fall within the scope of the following claims and their equivalents.
Number | Date | Country | Kind
---|---|---|---
107105469 | Feb 2018 | TW | national