Conventional independent component analysis suffers from degraded performance when the number of sound sources exceeds the number of microphones. The conventional l1 norm minimization method assumes that no noise exists other than the sound sources, so its performance deteriorates in environments that contain noise other than voices, such as echoes and reverberation. When separating sounds, the present invention includes the power of the noise component in the cost function, in addition to the l1 norm that the l1 norm minimization method uses as its cost function. The l1 norm minimization method defines its cost function on the assumption that voice has no relation in the time direction. The present invention instead defines a cost function on the assumption that voice has a relation in the time direction, so that, by construction, a solution having a relation in the time direction is more readily selected.
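The selection described above can be illustrated for a single frequency band and a single frame. The following is a minimal sketch and not the implementation of the invention: the function name `select_sparse_solution`, the parameters `p` and `error_weight`, the placement of the weight on the error term, and the use of a least-squares fit for each candidate set of active sources are all assumptions made for illustration. For every set of 1 to M active sources, it computes the minimum-error solution and then keeps the candidate whose weighted sum of lp norm and error is smallest.

```python
from itertools import combinations

import numpy as np


def select_sparse_solution(x, A, p=1.0, error_weight=1.0):
    """Pick the source vector minimizing (lp norm) + error_weight * (error).

    x : (M,) complex microphone observation for one band and one frame.
    A : (M, N) complex matrix whose columns are steering vectors (N may exceed M).
    All names and the exact form of the weighted sum are illustrative assumptions.
    """
    M, N = A.shape
    best_cost, best_s = np.inf, None
    # Enumerate every candidate set of active sources of size 1..M.
    for k in range(1, M + 1):
        for active in combinations(range(N), k):
            cols = list(active)
            # Minimum-error (least-squares) solution for this candidate set.
            s_active, *_ = np.linalg.lstsq(A[:, cols], x, rcond=None)
            error = np.sum(np.abs(x - A[:, cols] @ s_active) ** 2)
            lp_norm = np.sum(np.abs(s_active) ** p)
            cost = lp_norm + error_weight * error
            if cost < best_cost:
                s = np.zeros(N, dtype=complex)
                s[cols] = s_active  # inactive sources stay at zero
                best_cost, best_s = cost, s
    return best_s, best_cost


# Two microphones, three candidate sources; only source 0 is active, plus a small offset.
A = np.array([[1.0, 0.0, 1.0 / np.sqrt(2.0)],
              [0.0, 1.0, 1.0 / np.sqrt(2.0)]], dtype=complex)
x = np.array([2.01, 0.01], dtype=complex)
s_hat, cost = select_sparse_solution(x, A)
```

In this example the small residual on the second microphone is treated as noise, so the single-source candidate is preferred over the zero-error two-source candidates. With a much smaller `error_weight` the balance shifts toward candidates with a smaller lp norm and a larger error, so the weight has to be chosen with the expected noise level in mind.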
Description
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a drawing showing a hardware configuration of the present invention;
FIG. 2 is a block diagram of software of the present invention; and
FIG. 3 is a processing flowchart of the present invention.
Claims
1. A sound source separating device, comprising:
an A/D converting unit that converts an analog signal, from a microphone array having number M microphones, wherein M includes at least two microphones, into a digital signal;
a band splitting unit that band-splits the digital signal for conversion to a frequency domain input;
an error minimum solution calculating unit that, for each of the bands, has vectors for sound sources exceeding the number M, and has vectors for sound sources that are from 1 to equal to the number M, and that outputs a solution set having minimized error between an estimated signal calculated from the vectors for sound sources 1 to M, a predetermined steering vector, and the frequency domain input;
an optimum model calculation part that, for each of the bands in the error minimized solution set, selects a frequency domain solution having a weighted sum of an lp norm value and the error that is minimized; and
a signal synthesizing unit that converts the selected frequency domain solution into time domain.
2. The sound source separating device according to claim 1,
wherein the steering vector is obtained by performing source location.
3. The sound source separating device according to claim 1,
wherein the error minimum solution calculating unit calculates a solution with a minimum error for each of the vectors that are equal in number of sound sources to the value zero and number of elements to the value zero, and
wherein the optimum model calculation part, from among the outputted error minimum solution set, selects a solution having a weighted sum of a moving average value of the error and the moving average value of lp norm.
4. The sound source separating device according to claim 3,
wherein the error minimum solution calculating unit calculates a solution with a minimum error for each of the vectors that are equal in the number of sound sources to the value zero and the number of elements to the value zero, and
wherein the optimum model calculation part, from among the outputted error minimum solution set, selects a solution having a weighted sum of the moving average value of the error and the moving average value of lp norm at a minimum.
5. A sound source separating program, comprising the steps of:
converting an analog signal from a microphone array including M microphones, wherein M is greater than or equal to 2, into a digital signal;
band-splitting the digital signal into frequency domain;
for each of the bands split, and from among vectors in which sound sources exceeding the number of microphone elements have value zero, and for each vector having sound sources of a number of elements between 1 and M, outputting a solution set having a minimum error between an estimated signal calculated from the vector, a steering vector, and the frequency domain signal;
for each of the bands split, and from among error minimum solution set, selecting a solution for which a weighted sum of an lp norm value and the error is minimum; and
converting the selected solution into time domain.
6. A method for sound source separation, comprising:
receiving, at M microphones, an analog sound input;
converting the analog sound input from at least two sound sources to a digital sound input;
converting the digital sound input from a time domain to a frequency domain;
generating a first solution set minimizing errors in an estimation of sound from active ones of the sound sources of number 1 to M;
estimating a number of sound sources active to generate an optimal separated solution set that most closely approximates each sound source of the received analog sound input in accordance with the first solution set; and
converting the optimal separated solution set to the time domain.
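Claims 3 and 4 recite a selection based on moving averages of the error and of the lp norm. The following is a minimal sketch of one way such a selection could be arranged for a single frequency band over several frames; it is not taken from the specification. The function names, the centered window of `width` frames, the zero padding at the sequence edges, and the placement of `error_weight` on the error term are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def moving_average(values, width):
    """Centered moving average; frames beyond the edges count as zero (assumption)."""
    kernel = np.ones(width) / width
    return np.convolve(values, kernel, mode="same")


def select_supports_over_time(X, A, p=1.0, error_weight=1.0, width=3):
    """Per frame, pick the active-source set with the lowest time-smoothed cost.

    X : (M, T) complex observations for one band over T frames.
    A : (M, N) complex matrix whose columns are steering vectors.
    Returns the (N, T) selected solutions and the chosen source set per frame.
    """
    M, T = X.shape
    N = A.shape[1]
    supports = [list(c) for k in range(1, M + 1)
                for c in combinations(range(N), k)]
    costs = np.empty((len(supports), T))
    solutions = np.zeros((len(supports), N, T), dtype=complex)
    for i, cols in enumerate(supports):
        # Minimum-error solution for this source set, all frames at once.
        S, *_ = np.linalg.lstsq(A[:, cols], X, rcond=None)
        error = np.sum(np.abs(X - A[:, cols] @ S) ** 2, axis=0)
        lp_norm = np.sum(np.abs(S) ** p, axis=0)
        # Smooth both terms along time before forming the weighted sum.
        costs[i] = (moving_average(lp_norm, width)
                    + error_weight * moving_average(error, width))
        solutions[i, cols, :] = S
    best = np.argmin(costs, axis=0)  # index of the winning source set per frame
    selected = solutions[best, :, np.arange(T)].T  # shape (N, T)
    return selected, [supports[i] for i in best]


# Two microphones, three candidate sources, five frames with source 0 active.
A = np.array([[1.0, 0.0, 1.0 / np.sqrt(2.0)],
              [0.0, 1.0, 1.0 / np.sqrt(2.0)]], dtype=complex)
X = np.tile(np.array([[2.01], [0.01]], dtype=complex), (1, 5))
S_hat, chosen = select_supports_over_time(X, A)
```

Because the cost at each frame depends on neighboring frames, a source set that fits well over a run of frames tends to be kept even when one frame in isolation would favor a different set.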