The present application claims priority from Japanese Application Nos. 2004-045237 filed Feb. 20, 2004 and 2004-045238 filed Feb. 20, 2004, the disclosures of which are hereby incorporated by reference herein.
The present invention relates to a method and an apparatus for separating a sound-source signal and a method and a device for detecting the pitch of the sound-source signal. More particularly, the present invention relates to a method and an apparatus for separating one audio signal from among audio signals from a plurality of sound sources with stereomicrophones, and a method and a device for detecting the pitch of the audio signal.
Techniques for separating a target sound-source signal from an audio signal that is a mixture of a plurality of sound-source signals are known. For example, as shown in
For example, Japanese Unexamined Patent Application Publication No. 2001-222289 discloses one of the known sound-source signal separating techniques which utilizes an audio signal separating circuit and a microphone employing the audio signal separating circuit. In the disclosed technique, a plurality of mixed signals, each mixed signal containing the linear sum of a plurality of mutually independent linear sound-source signals, are frame divided, and the inverses of mixed matrices that minimize correlation of a plurality of signals separated by the separating circuit in connection with zero lag time are multiplied by each other on a per frame basis. An original voice signal is thus separated from the mixed signal.
Japanese Unexamined Patent Application Publication No. 7-28492 discloses a sound-source signal estimating device for estimating a target sound source. The sound-source signal estimating device is intended for use in extracting a target audio signal under a noisy environment.
The pitch of a target sound is determined to separate a sound-source signal. As a technique to detect pitch, Japanese Unexamined Patent Application Publication No. 2000-181499 discloses an audio signal analysis method, an audio signal analysis device, an audio signal processing method and an audio signal processing apparatus. According to the disclosure, an input signal having a predetermined duration of time is sliced every frame, a frequency analysis is performed for each frame, and a harmonic component assessment is performed based on the result of the frequency analysis for each frame. A harmonic component assessment is performed on the inter-frame difference in the amplitudes in the results of frequency analysis for each frame. The pitch of the input signal is thus detected using the result of the harmonic component assessment.
More microphones than sound sources are required to separate a plurality of sound-source signals. The use of a plurality of microphones is actually being studied. For example, Japanese Unexamined Patent Application Publication No. 2001-222289 discloses that separating a sound-source signal from three or more sound-sources using two microphones is difficult. Japanese Unexamined Patent Application Publication No. 7-28492 discloses a technique to extract an audio signal from a target sound source using a plurality of microphones (a microphone array). According to these disclosed techniques, more microphones than sound sources are required to separate a target sound-source signal from a mixed signal of a plurality of sound-source signals.
In accordance with known techniques, stereomicrophones used in a mobile audio-visual (AV) device, such as a video camera, have difficulty in separating three or more sound-source signals.
When the pitch of a target sound is determined prior to the separation of the sound-source signals, the pitch detection is preferably appropriate for the separation of the sound-source signals.
Accordingly, it is an object of the present invention to provide a sound-source signal separating apparatus, a sound-source signal separating method, a pitch detecting device, and a pitch detecting method for picking up audio signals (typically acoustic signals) from a plurality of sound sources using a small number of sound pickup devices, such as stereomicrophones, and separating an audio signal of a target sound source.
According to a first aspect of the present invention, a sound-source signal separating apparatus includes a sound-source signal enhancing unit operable to enhance a target sound-source signal in an input audio signal to produce an enhanced sound-source signal, the input audio signal including a mixture of acoustic signals from a plurality of sound sources picked up by a plurality of sound pickup devices; a pitch detector operable to detect a pitch of the target sound-source signal in the input audio signal; and a sound-source signal separating unit operable to separate the target sound-source signal from the input audio signal based on the detected pitch and the enhanced sound-source signal.
The sound-source signal separating unit preferably includes a filter for separating the target sound-source signal from a signal output from the sound-source signal enhancing unit; and a filter coefficient output unit operable to output a filter coefficient of the filter based on information detected by the pitch detector.
The filter coefficient preferably features a frequency characteristic of the filter which causes a frequency component to pass through the filter, the frequency component having a frequency which is an integer multiple of the pitch frequency of the target sound-source signal.
The filter coefficient output unit preferably includes a memory storing filter coefficients corresponding to a plurality of pitches, the filter coefficient output unit reading and outputting a filter coefficient from the memory corresponding to the pitch of the target sound-source signal.
The sound-source signal separating apparatus further includes a high-frequency region processing unit operable to process a portion of the output signal in a consonant band; and a filter bank operable to extract the portion of the output signal in the consonant band from the sound-source signal enhancing unit and to transfer the portion of the output signal in the consonant band to the high-frequency region processing unit, to extract a portion of the output signal in a band other than the consonant band from the sound-source signal enhancing unit and to transfer the portion of the output signal in the band other than the consonant band to the filter, and to extract a portion of the output signal in a vowel band from the sound-source signal enhancing unit and to transfer the portion of the output signal in the vowel band to the pitch detector.
The plurality of sound pickup devices preferably include a left stereomicrophone and a right stereomicrophone.
The sound-source signal enhancing unit preferably corrects the acoustic signals from the plurality of sound pickup devices with a time difference between sound propagation delays, each sound propagation delay being measured from a target sound source to each of the plurality of sound pickup devices, and adds the corrected acoustic signals from the plurality of sound pickup devices in order to enhance the acoustic signal from only the target sound source. The pitch detector preferably detects the pitch of the target sound-source signal using two wavelengths of the pitch of the target sound-source signal as a unit of detection.
The sound-source signal separating unit preferably includes a fundamental waveform producing unit operable to produce a fundamental waveform based on information detected by the pitch detector using a steady portion of a signal output from the sound-source signal enhancing unit, the steady portion having the same or about the same pitch consecutively repeated throughout; and a fundamental waveform substituting unit operable to substitute a repetition of the fundamental waveform produced by the fundamental waveform producing unit for at least a portion of a signal based on the input audio signal.
Preferably, the pitch detector detects the pitch of the target sound-source signal using two wavelengths of the pitch of the target sound-source signal as a unit of detection. The plurality of sound pickup devices preferably includes a left stereomicrophone and a right stereomicrophone.
Preferably, the sound-source signal enhancing unit corrects the acoustic signals from the plurality of sound pickup devices with a time difference between sound propagation delays, each sound propagation delay being measured from a target sound source to each of the plurality of sound pickup devices, and adds the corrected acoustic signals from the plurality of sound pickup devices in order to enhance the acoustic signal from only the target sound source.
The fundamental waveform producing unit preferably averages the target sound-source signal in a steady portion of the target sound-source signal having the same or about the same pitch consecutively repeated throughout using two wavelengths of the pitch as a unit of detection.
According to a second aspect of the present invention, a sound-source signal separating method includes enhancing a target sound-source signal in an input audio signal to produce an enhanced sound-source signal, the input audio signal including a mixture of acoustic signals from a plurality of sound sources picked up by a plurality of sound pickup devices; detecting a pitch of the target sound-source signal in the input audio signal; and separating the target sound-source signal from the input audio signal based on the detected pitch and the enhanced sound-source signal.
According to a third aspect, a pitch detector includes a sound-source signal enhancing unit operable to enhance a target sound-source signal in an input audio signal to produce an enhanced sound-source signal, the input audio signal including a mixture of acoustic signals from a plurality of sound sources picked up by a plurality of sound pickup devices; a period detector operable to detect a two-wavelength period of a signal output from the sound-source signal enhancing unit using two wavelengths of a pitch of the output signal as a unit of detection; and a continuity determining unit operable to determine, in response to a change in the two-wavelength period detected by the period detector, whether the same or about the same pitch has been consecutively repeated, and to output pitch information as the result of the determination.
The plurality of sound pickup devices preferably include a left stereomicrophone and a right stereomicrophone. The sound-source signal enhancing unit preferably corrects the acoustic signals from the plurality of sound pickup devices with a time difference between sound propagation delays, each sound propagation delay being measured from a target sound source to each of the plurality of sound pickup devices, and adds the corrected acoustic signals from the plurality of sound pickup devices in order to enhance the acoustic signal from only the target sound source.
According to a fourth aspect of the present invention, a pitch detecting method includes enhancing a target sound-source signal in an input audio signal to produce an enhanced sound-source signal, the input audio signal including a mixture of acoustic signals from a plurality of sound sources picked up by a plurality of sound pickup devices; detecting a two-wavelength period of a signal output from the sound-source signal enhancing step using two wavelengths of a pitch of the output signal as a unit of detection; and determining, in response to a change in the two-wavelength period detected in the period detecting step, whether about the same pitch has been consecutively repeated, and outputting pitch information as the result of the determination.
According to a fifth aspect of the present invention, a sound-source signal separating apparatus includes a pitch detector operable to detect a pitch of a target sound-source signal of an input audio signal using a wavelength twice the pitch of the target sound-source signal as a unit of detection, the input audio signal including a mixture of acoustic signals from a plurality of sound sources; and a sound-source signal separating unit operable to separate the target sound-source signal based on the detected pitch.
According to a sixth aspect of the present invention, a sound-source signal separating method includes detecting a pitch of a target sound-source signal of an input audio signal using a wavelength twice the pitch of the target sound-source signal as a unit of detection, the input audio signal including a mixture of acoustic signals from a plurality of sound sources; and separating the target sound-source signal based on the detected pitch.
The embodiments of the present invention are described below with reference to the drawings.
As shown in
In such a sound-source signal separating apparatus, the pitch detector 12 detects the pitch (the degree of highness) of a steady portion of the audio sound where the same or about the same pitch, such as a vowel, continues. The pitch detector 12 outputs the detected pitch and also information indicating the steady portion (for example, coordinate information along a time axis representing a continuous duration of the steady portion) as necessary. The delay correction adder 13 serves as sound-source signal enhancing means for enhancing a target sound-source signal. The delay correction adder 13 adds a time delay to the signal from each of the microphones in accordance with the difference in a propagation delay time from each of the sound sources to each of a plurality of microphones (two microphones in the case of a stereophonic system) and sums the delay corrected signals. The signal from a target sound source is thus strengthened and the signal from the other sound source is attenuated. This process will be discussed in more detail later. The separation coefficient generator 14 generates the filter coefficient to separate the signal from the target sound source in accordance with the pitch detected by the pitch detector 12. The separation coefficient generator 14 also will be discussed in more detail later. The filter calculating circuit 15 performs a filter process on a signal output from the delay correction adder 13 (via the filter 20A as necessary) using the filter coefficient from the separation coefficient generator 14 to separate the sound-source signal from the target sound source. The high-frequency region processor 17 performs a predetermined process on the output, such as a non-steady waveform including a consonant, from the delay correction adder 13 (via the high-pass filter 20B as necessary). The output of the high-frequency region processor 17 is supplied to the adder 16. 
The adder 16 adds the output from the filter calculating circuit 15 to the output from the high-frequency region processor 17, thereby outputting a separated output signal of the target sound to an output terminal 18.
The basic structure of the delay correction adder 13 of
The delay correction adder having the structure of
The adder 34 sums the delayed output signals from the delay circuit 32L and the delay circuit 32R, thereby enhancing only the audio signal having a higher correlation factor. In the vowel portion having a repeated waveform, phase aligned segments are summed for enhancement while phase non-aligned segments are attenuated. The signal with only the target sound intensified or enhanced is thus output from the output terminal 35. When the subtracter 36 performs a subtraction operation on the delayed output signals from the delay circuits 32L and 32R, the phase aligned segments are subtracted one from another, and only the sound from the target sound source is attenuated. A signal with only the target sound attenuated is thus output from the output terminal 37.
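The delay correction, summation, and subtraction described above can be illustrated with a minimal sketch. It assumes integer sample delays and plain Python lists; the function names `delay` and `delay_correct` are hypothetical and do not appear in the specification.

```python
def delay(signal, n):
    """Delay a list of samples by n >= 0 samples, zero-padding the start."""
    if n == 0:
        return list(signal)
    return [0.0] * n + list(signal[:len(signal) - n])


def delay_correct(left, right, delay_left, delay_right):
    """Return (enhanced, attenuated) signals after delay correction.

    delay_left / delay_right are assumed integer delays (in samples) that
    align the target sound in the two channels.
    """
    dl = delay(left, delay_left)
    dr = delay(right, delay_right)
    # Role of the adder 34: phase-aligned target components reinforce.
    enhanced = [a + b for a, b in zip(dl, dr)]
    # Role of the subtracter 36: phase-aligned target components cancel.
    attenuated = [a - b for a, b in zip(dl, dr)]
    return enhanced, attenuated


# Example: the target reaches the right microphone 2 samples later.
target = [0.0, 1.0, 0.5, -1.0, 0.25, 0.0, 0.0, 0.0]
left = target
right = delay(target, 2)
enhanced, attenuated = delay_correct(left, right, 2, 0)
```

In this example the target is doubled in the summed output and cancelled in the subtracted output, whereas a sound arriving with a different inter-microphone delay would not align and would be only partially reinforced.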
The correlation factor is now described. The delay corrected waveform as described above offers a higher degree of waveform match while the other waveform with the phase thereof out of alignment offers a low degree of waveform match. The correlation factor “cor” representing the degree of waveform match is determined using equation (1):
where m1 and m2 are time samples of the microphones MCL and MCR, and S1 and S2 are standard deviations. Equation (1) determines a correlation factor cor of n pairs of samples (m11, m21), (m12, m22), . . . , (m1n, m2n).
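Equation (1) itself is not reproduced in this text. As a sketch only, the following assumes that the correlation factor takes the standard form of a covariance normalized by the two standard deviations S1 and S2 (the Pearson correlation coefficient, using population statistics). The function name `correlation_factor` is hypothetical.

```python
import math


def correlation_factor(m1, m2):
    """Correlation of n sample pairs (m1[i], m2[i]); assumed Pearson form."""
    n = len(m1)
    mean1 = sum(m1) / n
    mean2 = sum(m2) / n
    # Population standard deviations S1 and S2 (an assumption of this sketch).
    s1 = math.sqrt(sum((x - mean1) ** 2 for x in m1) / n)
    s2 = math.sqrt(sum((y - mean2) ** 2 for y in m2) / n)
    if s1 == 0.0 or s2 == 0.0:
        return 0.0  # a constant channel carries no waveform to match
    cov = sum((x - mean1) * (y - mean2) for x, y in zip(m1, m2)) / n
    return cov / (s1 * s2)
```

Under this assumption, a value near 1 indicates that the two delay-corrected waveforms match closely, and a value near 0 indicates that they do not.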
A pitch detection operation of the pitch detector 12 is described below.
If the signal waveform of
The actual signal waveform contains a wave having a wavelength longer than the pitch period Tx (pitch wavelength λx) corresponding to the duration between the adjacent peak intervals. In particular, a component having a pitch period Ty (=2Tx) twice the pitch period Tx, namely, a component of a frequency Fy (=Fx/2) half the pitch frequency Fx is relatively strong as shown in the spectral diagram of
In accordance with one embodiment of the present invention, a period Ty twice the period Tx between peaks (pitch wavelength λx) is used as a unit in the pitch detection. If the peak is detected every two wavelengths, the pitch detection is performed at each peak having a similar shape, and the error tends to become smaller. Even if the timing of the start of the pitch detection is shifted by one wavelength, the results are statistically the same. Other integer multiples of wavelengths, such as four wavelengths, six wavelengths, eight wavelengths, . . . , can be used as the peak detection interval. However, if the peak is detected every four wavelengths, for example, the error level is lowered. A disadvantage with the four wavelengths is the increased number of samples.
The pitch detection operation is described below with reference to
In step S44, the peak value detector 24 detects a maximal peak value. In this step, local peak values represented by the letter X in the waveform diagram of
d(n)−d(n−1)>th and d(n+1)−d(n)<−th (2)
where the point “n” is a maximal peak point and the sample value at the point “n” is the maximal peak value.
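Condition (2) can be sketched as follows, assuming the waveform is held in a list `d` of sample values. The function name `maximal_peaks` is hypothetical.

```python
def maximal_peaks(d, th):
    """Return sample indices n satisfying condition (2).

    A point n is a maximal peak when the waveform rises into it by more
    than th and falls away from it by more than th.
    """
    return [n for n in range(1, len(d) - 1)
            if d[n] - d[n - 1] > th and d[n + 1] - d[n] < -th]


# The small bump at index 5 is rejected by the threshold.
samples = [0.0, 1.0, 3.0, 1.0, 0.0, 0.05, 0.0, 2.0, 5.0, 2.0]
peaks = maximal_peaks(samples, 0.5)
```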
In step S45, the maximum value detector 25 of
In step S46, the maximum-to-maximum value pitch detector 26 detects an interval between a first maximum value and a third maximum value of the maximal peak values, detected in step S45, namely, a pitch of every two maximum values (equal to two wavelengths). In other words, the pitch detection is performed every two wavelengths. Pitch detection means detection of the period Ty (=2Tx). The detected period Ty (or the frequency Fy=1/Ty) is used instead of the original pitch period Tx (or the original pitch frequency Fx). When the coordinate of the sample point of the signal waveform is expressed by the sample number, the period Ty determined in the pitch detection is expressed by the number of samples (the difference between the sample numbers). Let max 1 represent the coordinate (sample number) of the first maximum value and max 3 represent the coordinate of the third maximum value, and the following equation (3) holds:
Ty=max 3−max 1 (3)
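Equation (3) can be sketched as follows. The sketch assumes that the sample numbers of successive maximum values (one per pitch wavelength) are already available in a list; the function names and the example numbers are hypothetical.

```python
def two_wavelength_periods(max_positions):
    """Ty for each detection unit, per equation (3).

    Each unit spans from one maximum to the maximum two positions ahead,
    so consecutive units do not overlap.
    """
    return [max_positions[i + 2] - max_positions[i]
            for i in range(0, len(max_positions) - 2, 2)]


def pitch_frequency(ty_samples, fs):
    """Original pitch frequency Fx in Hz, using Tx = Ty / 2."""
    tx = ty_samples / 2
    return fs / tx


# Maxima roughly 240 samples apart at an assumed 48 kHz sampling frequency.
periods = two_wavelength_periods([100, 340, 580, 821, 1060])
fx = pitch_frequency(periods[0], 48000)
```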
Step S47 and the subsequent steps correspond to the process performed by the continuity determiner 27. In step S47, the pitches prior to and subsequent to the pitch detection interval unit are compared to each other. In this case, the pitch period Tx can be determined from Ty/2. Alternatively, the period Ty detected in the pitch detection process can be used as is. The ratio “r” of the pitch (or the period Ty) of one pitch detection unit to that of a next pitch detection unit is determined. For example, the period Ty of the two wavelengths is used, and let Ty(n) represent the two wavelength period of the current pitch detection unit “n”, and the pitch ratio r (here the ratio of the period Ty) is expressed by the following equation (4):
r(n)=Ty(n)/Ty(n−1) (4)
In step S48, a steady portion having stable pitch ratios “r” (the ratio of the period Ty), from among those determined in step S47, is determined. It is determined in step S48 whether the absolute value |Δr| (=|1−r|) of a rate of change of the ratio “r” is smaller than a predetermined threshold th_r. If it is determined that the absolute value |Δr| is smaller than the threshold th_r (i.e., yes), processing proceeds to step S49. The continuity determination flag is set (to 1), or a counter for counting the steady portions having stable pitches is counted up. If it is determined in step S48 that the absolute value |Δr| of the rate of change of the ratio “r” is larger than or equal to the threshold th_r (i.e., no), processing proceeds to step S50. The continuity determination flag is reset (to 0). The predetermined threshold th_r is 0.05, for example. As shown in
In step S51, it is determined whether the detected pitches (or the detected periods Ty) exhibit continuity. If the continuity determination flag, set in step S49, is consecutively counted by five times or more, it is determined that there is continuity. The detected pitch (or the period Ty) is thus determined as being effective. For example, as shown in
If it is determined in step S51 that there is continuity (i.e., yes), processing proceeds to step S52. The coordinates (time) of the steady portion throughout which the same or about the same pitch is repeated on the time axis is output. In step S53, the representative pitch (the mean value of the period Ty within the steady duration) is output, and processing thus ends. If it is determined in step S51 that no continuity is observed (i.e., no), processing ends. By repeating the process shown in
In summary, at least two sound sources are handled with respect to the stereomicrophones. To separate the sound emitted from a target person, the pitch of a steady portion of the mixed signal waveform, such as a vowel, is detected. In this case, the highness of the sound, and the sex of the person are not important. If the waveform is not a mixture, the variation in the level direction thereof is retained, and the period of the waveform changes with autocorrelation. In the case of a mixed signal, the variation in the level direction is not retained. However, the pitch on the time axis is retained. In accordance with the embodiment of the present invention, the pitch is detected according to a two-wavelength period rather than by detecting the peak-to-peak period. In this way, the pitch detection is performed reliably and accurately. A sound separation process is easily performed later.
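The continuity determination of steps S47 through S53 can be sketched as follows, using equation (4), the example threshold th_r of 0.05, and the example count of five consecutive stable ratios. The function name `steady_portions` is hypothetical, and the returned positions are indices of detection units rather than sample numbers, which is a simplification of this sketch.

```python
def steady_portions(periods, th_r=0.05, min_run=5):
    """Find runs of stable two-wavelength periods Ty.

    Returns a list of (first_unit, last_unit, mean_Ty) tuples, one per run
    in which |1 - r(n)| < th_r held at least min_run times in a row.
    """
    results = []
    run_start = 0
    count = 0  # plays the role of the continuity counter

    def close_run(last_unit):
        if count >= min_run:
            run = periods[run_start:last_unit + 1]
            results.append((run_start, last_unit, sum(run) / len(run)))

    for n in range(1, len(periods)):
        r = periods[n] / periods[n - 1]      # equation (4)
        if abs(1 - r) < th_r:                # stable: count up
            if count == 0:
                run_start = n - 1
            count += 1
        else:                                # unstable: reset
            close_run(n - 1)
            count = 0
    close_run(len(periods) - 1)
    return results


found = steady_portions([300, 480, 482, 479, 481, 480, 483, 700, 300])
```

In this example the six periods near 480 samples form one steady portion, and their mean serves as the representative pitch.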
The operation of the sound-source signal separating apparatus of
The pitch detector 12 of
The pitch detector 12 determines the pitch according to the pitch detection unit, and determines the coordinate (sample number) in each continuity duration or steady portion throughout which the same or about the same pitch is repeated. The sound signal separator using the stereomicrophones of
The pitch detected by the pitch detector 12 is sent to the separation coefficient generator 14. The separation coefficient generator 14 generates a filter coefficient (separation coefficient) for the filter calculating circuit 15 that separates a target sound. The separation coefficient generator 14 generates the filter coefficient in accordance with a band-pass filter coefficient producing equation (5) with the representative pitch obtained by the pitch detector 12 as a fundamental frequency:
where h[i] represents a filter coefficient of a tap position “i”, FIRLEN is the number of filter taps, HLFLEN is (FIRLEN−1)/2, Pi represents a circular constant π, m represents the number of harmonics, and FS represents a sampling frequency. The sampling frequency FS is 48000 for 48 kHz. Furthermore, Lo[n] and Hi[n] represent bandwidths in frequencies of harmonics, where Lo[n] is for a lower frequency, and Hi[n] is for a higher frequency. Any bandwidth is acceptable, but is typically determined taking into account separation performance. The integer number of harmonics “m” can be max_freq/f[1] if the maximum frequency is max_freq and the fundamental frequency is f[1]. If m=0, f[0]=f[1]/2 applies. The fundamental frequency can be f[0].
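Equation (5) itself is not reproduced in this text. The following sketch is therefore not the equation of the specification; it assumes a conventional design in which the coefficient is the sum of ideal band-pass (difference-of-sinc) impulse responses, one per harmonic, truncated to FIRLEN taps without a window. Lo[n] and Hi[n] are taken as the widths below and above each harmonic and are assumed constant; the function names, the default tap count, and the default bandwidths are hypothetical.

```python
import math


def separation_coefficients(f1, fs=48000, firlen=4801,
                            max_freq=1000.0, lo=20.0, hi=20.0):
    """Sum of ideal band-pass responses centred on harmonics n * f1."""
    hlflen = (firlen - 1) // 2
    m = int(max_freq // f1)                  # number of harmonics
    h = [0.0] * firlen
    for n in range(1, m + 1):
        f_lo = n * f1 - lo                   # lower band edge of harmonic n
        f_hi = n * f1 + hi                   # upper band edge of harmonic n
        for i in range(firlen):
            k = i - hlflen                   # tap offset from the centre
            if k == 0:
                h[i] += 2.0 * (f_hi - f_lo) / fs
            else:
                h[i] += (math.sin(2 * math.pi * f_hi * k / fs)
                         - math.sin(2 * math.pi * f_lo * k / fs)) / (math.pi * k)
    return h


def gain_at(h, freq, fs=48000):
    """Magnitude of the filter's frequency response at freq (Hz)."""
    re = sum(c * math.cos(2 * math.pi * freq * i / fs) for i, c in enumerate(h))
    im = sum(c * math.sin(2 * math.pi * freq * i / fs) for i, c in enumerate(h))
    return math.hypot(re, im)


h = separation_coefficients(200.0)           # representative pitch of 200 Hz
```

With these assumed parameters, the response passes components near integer multiples of 200 Hz and attenuates components midway between harmonics, such as 300 Hz.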
The filter calculating circuit 15 handles a middle frequency region and lower frequency regions. Using the filter coefficient generated by the separation coefficient generator 14, the filter calculating circuit 15, like an FIR filter having a multiplication and summing function, separates the target sound containing the detected pitch and the lower frequency component thereof.
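The multiplication and summing function of an FIR filter can be sketched as follows, assuming a direct-form structure with zero initial state. The function name `fir_filter` is hypothetical.

```python
def fir_filter(x, h):
    """Direct-form FIR filter: y[n] = sum over i of h[i] * x[n - i]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, coeff in enumerate(h):
            if n - i >= 0:                   # samples before the start are zero
                acc += coeff * x[n - i]      # multiply and accumulate
        y.append(acc)
    return y


impulse_response = fir_filter([1.0, 0.0, 0.0, 0.0], [0.5, 0.25])
```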
A non-steady waveform, such as a consonant, is input to the high-frequency region processor 17. The audio signal is divided into a high-frequency region and medium and low frequency regions because the vowel and the consonant have different vocalization mechanisms. The steadiness is easier to determine if the vowel distributed in the medium and low frequency regions and the consonant distributed in the high-frequency region are processed in different bands. The vowel, generated by periodically vibrating the vocal cords, becomes a steady signal. The consonant is a fricative sound or a plosive sound with the vocal cords not vibrated. The waveform of the consonant tends to become random. If a random waveform is contained in the vowel portion, the random component is noise, thereby adversely affecting the pitch detection. Given the same number of samplings, a higher frequency signal is subject to waveform destruction because the repeatability thereof is poorer than that of a low frequency signal. The pitch detection becomes erratic. For this reason, the audio signal is divided into the high-frequency region and the medium to low frequency regions in the determination of the steadiness to enhance determination precision.
The high-frequency region processor 17 removes a random portion at a high frequency due to a consonant, such as a fricative sound or a plosive sound, normally not occurring in the steady portion of the target sound, namely, the vowel portion.
In voices, high-level consonants are rarely present in the vowel portion. Even if a target sound is separated from a vowel portion of the sound from a plurality of sound sources, the separated sound sounds different from the original target sound when a random high-frequency wave is contained in the vowel portion. The high-frequency region processor 17 lowers the gain for the high-frequency wave in the steady vowel portion so that the high-frequency wave may not be applied to the adder 16. The resulting output thus becomes close to the original target sound.
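The gain lowering in the steady vowel portion can be sketched as follows. The sketch assumes that the steady portions are given as inclusive sample ranges; the function name `suppress_high_band` and the default gain value are hypothetical, since the specification does not state a value here.

```python
def suppress_high_band(high_band, steady_ranges, gain=0.1):
    """Lower the high-band gain inside steady (vowel) portions only.

    steady_ranges holds inclusive (start, end) sample ranges.
    """
    out = list(high_band)
    for start, end in steady_ranges:
        for n in range(start, min(end + 1, len(out))):
            out[n] *= gain
    return out


# Samples 2 and 3 belong to a steady portion and are attenuated.
result = suppress_high_band([1.0] * 6, [(2, 3)], gain=0.5)
```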
The output from the filter calculating circuit 15 and the output from the high-frequency region processor 17 are summed by the adder 16. The separated waveform output signal of the target sound is output from the output terminal 18.
The relationship between the stereomicrophones and the sound source (humans) is described below. Although the spacing between the stereomicrophones is not particularly specified, it typically falls within a range from several centimeters to several tens of centimeters if the system is portable. For example, the stereomicrophones mounted on a mobile apparatus, such as a camera integrated VCR (so-called video camera), are used to pick up sounds. Persons, as sound sources, are positioned at three sectors (center, left, and right), each covering several tens of degrees. In this arrangement, the target sound separation is possible regardless of what sector each person is positioned in. The wider the spacing between the stereomicrophones, the more sectors the area is segmented into, taking into consideration the propagation of sounds to the stereomicrophones. A wider spacing, however, makes the apparatus more difficult to carry. Conversely, the narrower the stereomicrophone spacing, the smaller the number of sectors (for example, three sectors), but the apparatus is easy to carry.
The LPF 22 of
As shown in
The pitch detector 12 discussed with reference to
A separation coefficient generator 76 in a sound-source signal separator 191 generates a filter coefficient (separation coefficient) of a filter calculating circuit 77 in accordance with equation (5). The separation coefficient generator 76 is substantially identical to the separation coefficient generator 14 of
In this embodiment, the pitch is detected in the steady portion. The voice of a single speaking person typically extends beyond the steadiness determination portion of the mixed waveform on the time axis. The separation filter coefficient is generated each time the pitch is detected. Applying the filter to the steadiness determination area only is not considered to be an efficient process. Using the filter coefficient in the vicinity of the steadiness determination area is preferred to enhance separation performance in the time direction.
If all harmonic components of the pitch frequency are subjected to the filter to improve separation performance in the separation of the target sound, sounds other than the target sound cannot be attenuated. Using statistical data, some harmonic bands can be excluded from the summing operation.
Another embodiment of the present invention is described below with reference to
The coefficient memory and coefficient selection unit 86 of
In a speaker determination, the voice of a target person is identified from among a plurality of persons (sound sources). The speaker determiner 82 uses a signal waveform obtained through the LPF 81. The low-frequency signal obtained via the LPF 81 is a signal falling within the same low band provided by the filter bank 73 in the pitch detection. In the speaker determination, a correlation is determined based on the output from the delay correction adder 13 of FIGS. 1 and 3 and a correlation factor cor discussed with reference to equation (1) to determine whether the target person is speaking. More specifically, as shown in
An output from the speaker determiner 82 is transferred to the steadiness determiner 74 and the area designator 83. Upon determining a steady area, the steadiness determiner 74 outputs time axis coordinates, and sends the coordinate data to the area designator 83. Upon determining the speaker, the area designator 83 performs a process to expand the steadiness determination area by a certain duration of time, and notifies buffers 84 and 85 of the timing of the expanded steadiness determination area for area adjustment. The buffer 84 is interposed between the filter bank 73 and the filter calculating circuit 77 in the sound-source signal separator 192, and the buffer 85 is interposed between the filter bank 73 and the high-frequency region processor 79. For a duration of time (area) that is determined as being outside the steadiness determination area by the area designator 83, gain is simply lowered. To adjust gain, the same taps as those of the filter calculating circuit 77 are prepared, and the taps other than the center one are set to be zero, and the center tap is set to be a coefficient other than one. To set the gain to 1/10, only the center tap is set to be a coefficient of 0.1.
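The gain-only tap setting described above can be sketched as follows. The function name `gain_only_taps` is hypothetical, and an odd number of taps is assumed so that a single center tap exists.

```python
def gain_only_taps(firlen, gain):
    """Taps that only scale the signal: all zero except the center tap."""
    taps = [0.0] * firlen
    taps[(firlen - 1) // 2] = gain           # center tap carries the gain
    return taps


# Five taps with the gain set to 1/10.
taps = gain_only_taps(5, 0.1)
```

Because the tap count equals that of the separation filter, a signal passed through these taps is delayed by the same number of samples as a signal passed through the separation coefficients, so the two areas remain time-aligned.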
The rest of the sound-source signal separating apparatus of
In summary, at least two sound sources are handled with respect to the stereomicrophones. To separate the sound emitted from a target person, the pitch of the steady duration of the mixed signal waveform, such as a vowel, is detected. In this case, the highness of the sound and the sex of the person are not important. The band-pass coefficient (separation filter coefficient) is determined to obtain transfer characteristics of the target sound with respect to the pitch. The sounds in the band other than a peak along the frequency axis relating to the target sound are thus attenuated. The use of the coefficient memory eliminates the need for calculation of the coefficients.
As shown in
In the sound-source signal separating apparatus, the pitch detector 12 and the delay correction adder 13 remain unchanged from the respective counterparts of
The pitch detector 12 of
The fundamental waveform generator 140 generates a fundamental waveform based on the pitch of the steady portion detected by the pitch detector 12. A waveform having a wavelength equal to an integer multiple of the pitch wavelength is used as a fundamental wave. In this embodiment, a wavelength twice the pitch wavelength is used. The fundamental waveform substituting unit 150 substitutes a repeating waveform of the fundamental waveform generated by the fundamental waveform generator 140 for the steady portion of the audio signal from the delay correction adder 13 (or from the stereophonic audio input 11). The fundamental waveform substituting unit 150 thus outputs, to an output terminal 160, a separated waveform output signal with only the audio signal from the target sound source enhanced.
The operation of the sound-source signal separating apparatus of
The pitch detector 12 detects a pitch on a per pitch detection unit basis, and determines a continuous duration throughout which the same or about the same pitch is repeated, or coordinates (sample numbers) of the steady portion of the audio signal. The sound-source signal separating apparatus of
As previously discussed, phase matching is performed by performing the delay correction process on the target sound on each microphone, and the phase corrected signals are summed to enhance the target sound. The remaining sounds are attenuated. The signal waveforms in the steady portions are summed with the period equal to the pitch detection unit. The fundamental waveform of the steady portion is thus generated.
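The two summing operations described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the names `delay_and_sum` and `average_periods` are hypothetical, the delay is taken to be a known non-negative integer number of samples by which the target sound reaches the right microphone later than the left one, and the periodic sum is normalized by the number of periods so that the result is an average.

```python
def delay_and_sum(left, right, delay_samples):
    """Delay the left channel by delay_samples and add it to the right channel.

    Components that arrive with this inter-microphone delay add in phase;
    components with other delays do not, so they are relatively attenuated.
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative in this sketch")
    length = min(len(left), len(right))
    summed = []
    for n in range(length):
        delayed_left = left[n - delay_samples] if n >= delay_samples else 0.0
        summed.append(delayed_left + right[n])
    return summed


def average_periods(steady_portion, period):
    """Sum consecutive blocks of `period` samples and divide by the block count."""
    count = len(steady_portion) // period
    if period < 1 or count == 0:
        raise ValueError("steady portion must contain at least one full period")
    return [
        sum(steady_portion[i * period + j] for i in range(count)) / count
        for j in range(period)
    ]


# Target sound reaches the right microphone 2 samples after the left one.
left = [1, 2, 3, 4, 0, 0]
right = [0, 0, 1, 2, 3, 4]
enhanced = delay_and_sum(left, right, 2)  # target samples are doubled

# Three blocks of 3 samples; the trailing partial block is ignored.
fundamental = average_periods([1, 2, 3, 3, 2, 1, 2, 2, 2, 9], 3)
```

In the second function, `period` would correspond to the pitch detection unit expressed in samples.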
As previously discussed with reference to
The fundamental waveform substituting unit 150 substitutes the repetition of the fundamental waveform generated by the fundamental waveform generator 140 for the pitch duration or the steady portion within the output signal waveform from the delay correction adder 13. A waveform “a” represented by the solid line in
The output waveform signal from the fundamental waveform substituting unit 150, in which the pitch duration or the steady portion has been replaced by the fundamental waveform, is output from the output terminal 160 as a separated output waveform signal of the target sound.
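The substitution step can be sketched as below. The function name `substitute_fundamental` and the representation of the steady portion as a half-open range of sample numbers are assumptions for illustration; the sketch simply repeats the fundamental waveform over the steady portion and leaves the other samples untouched.

```python
def substitute_fundamental(signal, fundamental, start, end):
    """Replace signal[start:end] with a repetition of the fundamental waveform.

    `start` and `end` are sample numbers of the steady portion (end exclusive).
    Samples outside that range are copied unchanged.
    """
    if not fundamental:
        raise ValueError("fundamental waveform must not be empty")
    if not 0 <= start <= end <= len(signal):
        raise ValueError("steady portion must lie inside the signal")
    output = list(signal)
    for n in range(start, end):
        output[n] = fundamental[(n - start) % len(fundamental)]
    return output


# Steady portion covers samples 2 to 6; the fundamental waveform is 3 samples long.
separated = substitute_fundamental([9] * 8, [1, 2, 3], 2, 7)
# separated == [9, 9, 1, 2, 3, 1, 2, 9]
```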
The relationship between the stereomicrophone and the sound source (person) remains unchanged from the preceding embodiment, and the discussion thereof is omitted herein.
In summary, at least two sound sources are handled with respect to the stereomicrophones. To separate the sound emitted from a target person, the pitch of a steady duration of the mixed signal waveform, such as a vowel, is detected. In this case, how high the voice is and the sex of the person are not important. Continuity is determined to be present if the error between a prior pitch and a subsequent pitch is small. The steady portions are summed and averaged, and the resulting waveform is regarded as the fundamental waveform. The fundamental waveform is substituted for the original waveform. As more waveforms are summed to form the substituted waveform, the mixed components are increasingly attenuated. Only the target sound is enhanced and then separated.
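The continuity determination mentioned above can be sketched as follows. The text states only that the error between a prior pitch and a subsequent pitch should be small; the relative tolerance of 5%, the minimum run length, and the name `find_steady_runs` are hypothetical choices made for this sketch. The input is assumed to be one pitch value per pitch detection unit, with zero standing for "no pitch detected".

```python
def find_steady_runs(pitches, tolerance=0.05, min_units=2):
    """Return (start, end) index pairs, end exclusive, of runs with nearly constant pitch.

    Two neighboring detection units are treated as continuous when the later pitch
    differs from the earlier one by at most `tolerance` times the earlier pitch.
    """
    runs = []
    start = 0
    for i in range(1, len(pitches) + 1):
        continues = (
            i < len(pitches)
            and pitches[i - 1] > 0
            and abs(pitches[i] - pitches[i - 1]) <= tolerance * pitches[i - 1]
        )
        if not continues:
            # The current run ends just before index i; keep it if long enough.
            if i - start >= min_units:
                runs.append((start, i))
            start = i
    return runs


# Pitch per detection unit: two steady stretches separated by an outlier.
steady = find_steady_runs([100, 101, 99, 150, 80, 81, 82])
# steady == [(0, 3), (4, 7)]
```

Each returned pair would then be converted to sample numbers to give the coordinates of a steady portion.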
The present invention is not limited to the above-referenced embodiments. The pitch detection may be performed not only with a period of two wavelengths but also with a period of four wavelengths. However, if the pitch detection period is set to four wavelengths or more, the number of samples to be processed increases. The pitch detection period is therefore set appropriately in view of this trade-off. The arrangement of the pitch detector is applicable not only to the above-referenced sound-source signal separating apparatus but also to a variety of sound-source signal separating apparatuses that separate a sound-source signal by detecting the pitch.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
P2004-045237 | Feb 2004 | JP | national |
P2004-045238 | Feb 2004 | JP | national |
Number | Date | Country |
---|---|---|
20050195990 A1 | Sep 2005 | US |