The present invention relates to a sound source separation system, a sound source separation method and an acoustic signal acquisition device which separate a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, and is available for a case where a desired speech is acquired through a portable device like a cellular phone, and an in-vehicle device like a car navigation system.
In normal voice recognition, a speech uttered from a mouth is recorded through a close-talking type microphone, and is subjected to a recognition process. On the other hand, there are many applications, such as interaction with a robot, operation of an in-vehicle device like a car navigation system through a speech, and creation of conference minutes, where forcing a user to use a close-talking type microphone is unnatural. In such applications, it is desirable that a speech should be recorded through a microphone provided at a system side and should be subjected to a recognition process. In a case where speech recording and voice recognition are performed through a microphone provided away from an utterer, however, the S/N ratio deteriorates, the speech becomes difficult to hear, and the accuracy of voice recognition is extremely reduced.
In response to such problems, there is an attempt that a desired speech is selectively recorded by controlling the directivity using a microphone array. As such devices which control the directivity using a few microphones, there are an ultra directional microphone using two single-directional microphone units (see, patent literature 1) and a recording device for multi-channel stereo using four non-directional microphones (see, patent literature 2). Further, there is a microphone device having three pairs of microphones disposed around a base microphone (see, patent literature 3).
Moreover, there is proposed a scheme called SAFIA which separates a sound by utilizing a difference between sound pressures, reaching individual microphones and caused due to differences in positional relationships between the individual microphones and a sound source (see, patent literature 4). The scheme called SAFIA is a sound separation technique which causes output signals of a plurality of fixed microphones to undergo narrow-band spectrum analysis, and for a microphone that gives the largest power for each frequency band, performs band selection of assigning a sound of that frequency band to that microphone (see
It is, however, difficult both to sufficiently separate a desired speech from background noises merely by controlling the directivity through a microphone array and to miniaturize the device. With the ultra directional microphone disclosed in patent literature 1 and the recording device for multi-channel stereo disclosed in patent literature 2, the directivity is controlled by a few microphones, so that miniaturization of the device may be possible, but the performance of separating a desired sound is not good enough. Further, the microphone device disclosed in patent literature 3 uses a total of seven microphones, so that it has the same problems as those of the microphone array.
According to the foregoing SAFIA disclosed in patent literature 4, band selection is performed by utilizing a difference between sound pressure levels of signals between microphones, which originates from the positional relationships of a plurality of fixed microphones. In performing band selection, however, unlike the present invention to be discussed later, directivity control appropriate for separation of a desired speech and noises is not performed, so that the separation performance thereof is not good enough. Note that, of the scheme called SAFIA, only the separation process through band selection (see FIG. 8 to be discussed later), not including the process of generating the spectra subject to that separation process, will be hereinafter described as maximum level band selection (BS-MAX). According to the maximum level band selection (BS-MAX) performed in the SAFIA, powers at the same frequency band are compared for each frequency band between spectra subject to comparison, and band selection of assigning the largest power at individual frequency bands to a spectrum obtained by separation is performed. According to the invention, in addition to performing such a maximum level band selection (BS-MAX), powers at the same frequency band are compared for each frequency band between spectra subject to comparison, and band selection of assigning the smallest powers at individual frequency bands to a spectrum obtained by separation is also performed, and this will be described as minimum level band selection (BS-MIN).
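The two band selections defined above can be illustrated with a minimal sketch. The function names `bs_max` and `bs_min`, the zero-filling of bands that are not assigned, the tie-breaking rule, and the sample values are all assumptions made for illustration and are not taken from the specification.

```python
def bs_max(power_a, power_b):
    """Maximum level band selection between two power spectra.

    For each frequency band, the larger power is kept in the output of the
    spectrum it came from; the other output gets 0.0 in that band.
    Zero-filling and giving ties to A are assumptions of this sketch.
    """
    out_a, out_b = [], []
    for pa, pb in zip(power_a, power_b):
        if pa >= pb:
            out_a.append(pa)
            out_b.append(0.0)
        else:
            out_a.append(0.0)
            out_b.append(pb)
    return out_a, out_b


def bs_min(power_a, power_b):
    """Minimum level band selection: keep the smaller power in each band."""
    return [min(pa, pb) for pa, pb in zip(power_a, power_b)]


# Hypothetical three-band power spectra, for illustration only.
spectrum_a = [4.0, 1.0, 9.0]
spectrum_b = [1.0, 2.0, 3.0]
separated_a, separated_b = bs_max(spectrum_a, spectrum_b)
smallest = bs_min(spectrum_a, spectrum_b)
```

In this sketch each band is decided independently of its neighbours, which is the property that both BS-MAX and BS-MIN share.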
Further, according to the present invention, it is determined not only whether or not one condition, such as selecting the maximum or the minimum power, is satisfied, but also whether or not a plurality of conditions are satisfied simultaneously. This will be described as a multidimensional band selection (BS-multiD); the case of two conditions will be described as a two-dimensional band selection (BS-2D), and the case of three conditions will be described as a three-dimensional band selection (BS-3D).
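A multidimensional band selection can be sketched as a per-band test in which several conditions must hold at once. The specific conditions used here (the candidate power must exceed the power of every other spectrum in the same band), the function name `bs_multi_d`, and the sample values are assumptions chosen only to make the idea concrete.

```python
def bs_multi_d(power_candidate, other_powers):
    """Keep a band of the candidate spectrum only if ALL conditions hold.

    Each spectrum in other_powers contributes one condition per band:
    candidate power > that spectrum's power (an assumed condition).
    Two other spectra give a BS-2D style test, three give BS-3D.
    Rejected bands are set to 0.0 (also an assumption).
    """
    selected = []
    for band, power in enumerate(power_candidate):
        all_hold = all(power > other[band] for other in other_powers)
        selected.append(power if all_hold else 0.0)
    return selected


# Hypothetical spectra: one candidate and two spectra to compare against.
candidate = [5.0, 2.0, 7.0]
others = [[1.0, 3.0, 6.0], [4.0, 1.0, 8.0]]
result_2d = bs_multi_d(candidate, others)
```

Band 0 passes both conditions, band 1 fails the first, and band 2 fails the second, so only band 0 survives.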
It is an object of the invention to provide a sound source separation system, a sound source separation method and an acoustic signal acquisition device which can accurately separate a target sound and a disturbance sound coming from an arbitrary direction, and enables miniaturization of a device.
<<Invention of a Sound Source Separation System>>
<Two Microphones Type Invention> Invention of a Type that Two Microphones Are Used
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: two microphones disposed in such a manner as to be spaced away from each other; a target sound superior signal generator which performs a linear combination process for emphasizing the target sound using received sound signals of the two microphones on a time domain or a frequency domain to generate at least one target sound superior signal; a target sound inferior signal generator which performs a linear combination process for suppressing the target sound using the received sound signals of the two microphones on a time domain or a frequency domain to generate at least one target sound inferior signal to be paired with the target sound superior signal; and a separator which separates the target sound and the disturbance sound from each other using a spectrum of the target sound superior signal generated by the target sound superior signal generator or obtained by a subsequent frequency analysis and a spectrum of the target sound inferior signal generated by the target sound inferior signal generator or obtained by a subsequent frequency analysis.
“A sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from” means a system that can perform sound source separation in a case where a direction in which the disturbance sound comes from is not specified, other than a case where both directions in which the target sound and the disturbance sound come from are already known, like a case where sound source separation is performed through independent component analysis (ICA). Moreover, “a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from” does not always mean all directions in 360 degrees other than the direction in which the target sound comes from, but may be an arbitrary direction in a range other than the direction in which the target sound comes from and the adjacent directions, and for example, when θ=0 degree is the direction in which the target sound comes from, only a range of θ=−90 to 90 degrees may be a separation target range, and in short, the disturbance sound comes from an unspecified direction. The same is true on other inventions.
“Performing a linear combination process for emphasizing the target sound using received sound signals of the two microphones on a time domain or a frequency domain” and “performing a linear combination process for suppressing the target sound using the received sound signals of the two microphones on a time domain or a frequency domain” include (1) performing linear combination processes for emphasizing and suppressing the target sound using the received sound signals of the two microphones as signals on a time domain, and generating a target sound superior signal and a target sound inferior signal as signals on a time domain, and (2) performing frequency analysis on the received sound signals (signals on a time domain) of the two microphones to make signals on a frequency domain (spectra), performing linear combination processes for emphasizing and suppressing the target sound, and generating a target sound superior signal and a target sound inferior signal as signals (spectra) on a frequency domain. The same is true on other inventions.
Further, “a spectrum of the target sound superior signal generated by the target sound superior signal generator or obtained by a subsequent frequency analysis” is that signal itself when the target sound superior signal generated by the target sound superior signal generator is a signal on a frequency domain, and is a signal on a frequency domain obtained by frequency analysis of that signal when the target sound superior signal generated by the target sound superior signal generator is a signal on a time domain. The same is true on “a spectrum of the target sound inferior signal generated by the target sound inferior signal generator or obtained by a subsequent frequency analysis”. The same is true on other inventions.
The “linear combination process” includes a process of acquiring a sum or a difference, and a process of multiplying a coefficient. The same is true on other inventions.
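As a minimal sketch of what such a linear combination process amounts to, the function below forms a weighted sum of two received sound signals sample by sample. The name `linear_combination`, the weights, and the sample values are hypothetical; the same arithmetic applies to complex spectra on a frequency domain.

```python
def linear_combination(x1, x2, w1, w2):
    """Return w1*x1 + w2*x2, element by element.

    w1 = w2 = 1 gives a sum, w1 = 1 and w2 = -1 gives a difference,
    and any other weights correspond to multiplying by a coefficient.
    """
    if len(x1) != len(x2):
        raise ValueError("signals must have the same length")
    return [w1 * a + w2 * b for a, b in zip(x1, x2)]


# Hypothetical received sound signals of two microphones.
mic_1 = [1.0, 2.0, 3.0]
mic_2 = [0.5, 2.0, -1.0]
signal_sum = linear_combination(mic_1, mic_2, 1.0, 1.0)
signal_difference = linear_combination(mic_1, mic_2, 1.0, -1.0)
```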
“Separating the target sound and the disturbance sound” using “the spectrum of the target sound superior signal” and “the spectrum of the target sound inferior signal” includes, for example, a process for each frequency band, i.e., a process of using both powers of the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal at the same frequency band. The same is true on other inventions. The same process can be performed when amplitude values at the same frequency band are used, so that a process using powers represents both processes in the specification.
“The target sound” and “the disturbance sound” are mainly human speech, but include, for example, music, an animal call, natural sounds, such as thunder, a rippling wave, and a murmur, various sound effects, such as a buzzer, an alarm sound, a horn, and an alarm whistle, and various mechanical sounds, such as a sound from a road, the running sound of a vehicle, the takeoff sound of an airplane, and the operational sound of a machine. The same is true on other inventions.
According to the sound source separation system of such an invention, linear combination processes of emphasizing the target sound and suppressing the target sound are performed on a time domain or a frequency domain using the received sound signals of the two microphones to generate the target sound superior signal and the target sound inferior signal, so that controlling of the directivity appropriate for separation of the target sound and the disturbance sound becomes possible.
Because a separation process is performed using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal both generated by controlling the directivity, the target sound and the disturbance sound are precisely separated from each other. Accordingly, in comparison with the case of patent literature 4 where band selection is performed utilizing a sound-pressure difference of signals between the microphones originating from the positional relationships of the plurality of microphones, the separation performance can be improved.
The directivity is controlled by performing linear combination processes of emphasizing and suppressing the target sound, so that a sound coming from an unspecific direction can be separated unlike the case of a separation process utilizing independent component analysis (ICA) which separates only a sound coming from a specific direction.
The number of microphones to be used is two, and sound source separation can be realized by a few microphones, so that miniaturization of a device becomes possible, thereby achieving the foregoing object.
<Invention of a Type that Two Microphones are Disposed in Parallel with a Direction in which the Target Sound Comes from> Invention of a Type that Two Microphones are Disposed in the Direction in which the Target Sound Comes from or in an Approximately Same Direction as that Direction
To be more precise, it is possible to employ the following structure. That is, in the foregoing sound source separation system, the two microphones may be disposed side by side in the direction in which the target sound comes from or an approximately same direction as that direction, the target sound superior signal generator may acquire a difference between a received sound signal of one microphone disposed near a sound source of the target sound in the two microphones and a received sound signal of an other microphone disposed away from the sound source of the target sound on a time domain or a frequency domain, and the target sound inferior signal generator may acquire a difference between the received sound signal of the one microphone undergone a delayed process and the received sound signal of the other microphone on a time domain or a frequency domain (e.g., the case shown in
“Acquiring a difference between the received sound signal of the one microphone undergone a delayed process and the received sound signal of the other microphone on a time domain or a frequency domain” includes (1) after performing a delayed process on the received sound signal (signal on a time domain) of the one microphone on a time domain, acquiring a difference between the signal (signal on a time domain) undergone a delayed process and the received sound signal (signal on time domain) of the other microphone, and generating a signal on a time domain, (2) performing frequency analysis on both received sound signals (signals on a time domain) of the one and other microphones to generate signals (spectra) on a frequency domain, after performing a delayed process on the spectrum of the received sound signal of the one microphone on a frequency domain, acquiring a difference between the spectrum undergone the delayed process and the spectrum of the received sound signal of the other microphone, and generating a signal on a frequency domain, and (3) performing a delayed process on a received sound signal (signal on a time domain) of the one microphone on a time domain, performing frequency analysis on the signal undergone a delayed process (signal on a time domain) to generate a signal on a frequency domain (spectrum), and after performing frequency analysis on the received sound signal (signal on a time domain) of the other microphone to generate a signal on a frequency domain (spectrum), acquiring a difference between the spectrum of the received sound signal of the one microphone undergone a delayed process and the spectrum of the received sound signal of the other microphone, and generating a signal on a frequency domain. The same is true on other inventions.
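The relationship between cases (1) and (2) can be checked with a small sketch: delaying a signal on a time domain and subtracting gives the same spectrum as rotating the phase of each frequency bin and subtracting on a frequency domain. The sketch assumes a circular delay by a whole number of samples and a single plain DFT over the whole signal rather than frame-by-frame analysis; the function names and sample values are hypothetical.

```python
import cmath


def dft(x):
    """Plain discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def delayed_difference_time(x_one, x_other, delay):
    """Case (1): circularly delay x_one by `delay` samples, then subtract x_other."""
    n = len(x_one)
    return [x_one[(t - delay) % n] - x_other[t] for t in range(n)]


def delayed_difference_freq(spec_one, spec_other, delay):
    """Case (2): apply the delay as a phase rotation per bin, then subtract."""
    n = len(spec_one)
    return [spec_one[k] * cmath.exp(-2j * cmath.pi * k * delay / n) - spec_other[k]
            for k in range(n)]


# Hypothetical received sound signals of the one and other microphones.
x_one = [1.0, 2.0, 0.0, -1.0, 3.0, 0.0, 0.5, -2.0]
x_other = [0.0, 1.0, 2.0, 0.0, -1.0, 3.0, 0.0, 0.5]
via_time = dft(delayed_difference_time(x_one, x_other, 1))
via_freq = delayed_difference_freq(dft(x_one), dft(x_other), 1)
```

Under these assumptions `via_time` and `via_freq` agree up to rounding error, which is why the delayed process may be placed on either domain.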
In a case where the two microphones are disposed side by side in the direction in which the target sound comes from or in an approximately same direction as that direction, the separator may compare powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal for each frequency band, and perform band selection (maximum level band selection: BS-MAX) of assigning larger powers at the individual frequency bands to a spectrum obtained by separation.
“Assigning power to a spectrum obtained by separation” means that when the power of the spectrum of the target sound superior signal is large, for the frequency band thereof, the larger power is assigned to the spectrum of the target sound, and when the power of the spectrum of the target sound inferior signal is large, for the frequency band thereof, the larger power is assigned to the spectrum of the disturbance sound (see
In a case where the two microphones are disposed side by side in the direction in which the target sound comes from or in an approximately same direction as that direction, the separator may perform spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal at a same frequency band.
The “coefficient” is a coefficient depending on, for example, the magnitude of the difference between the power of the target sound superior signal and the power of the target sound inferior signal. The same is true on other inventions when spectral subtraction is performed.
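The spectral subtraction described above can be sketched per frequency band as follows. For simplicity this sketch uses one constant coefficient for all bands, whereas the coefficient may depend on the power difference as noted above; clamping negative results to a floor of zero is also an assumption, as are the function name and sample values.

```python
def spectral_subtraction(power_superior, power_inferior, coefficient, floor=0.0):
    """Subtract coefficient * inferior power from superior power in each band.

    Results below `floor` are clamped to `floor` (an assumption of this
    sketch, so that the output remains a valid power spectrum).
    """
    return [max(ps - coefficient * pi, floor)
            for ps, pi in zip(power_superior, power_inferior)]


# Hypothetical three-band power spectra.
superior = [4.0, 1.0, 9.0]
inferior = [1.0, 2.0, 3.0]
target_estimate = spectral_subtraction(superior, inferior, coefficient=2.0)
```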
In a case where the two microphones are disposed side by side in the direction in which the target sound comes from or in an approximately same direction as that direction, it is preferable that a target sound to be separated should be changed over to a target sound in a normal mode and a target sound in a changeover mode coming from a direction opposite to the normal mode target sound, the one microphone should be disposed near a sound source of the normal mode target sound and the other microphone should be disposed away from the sound source of the normal mode target sound in the normal mode, the other microphone should be disposed near a sound source of the changeover mode target sound and the one microphone should be disposed away from the sound source of the changeover mode target sound in the changeover mode, and the target sound inferior signal generator should comprise: a first target sound inferior signal generation unit which acquires a difference between the received sound signal of the one microphone undergone a delayed process and the received sound signal of the other microphone on a time domain or a frequency domain; a second target sound inferior signal generation unit which acquires a difference between the received sound signal of the other microphone undergone a delayed process and the received sound signal of the one microphone on a time domain or a frequency domain; and a changeover unit which changes over a first target sound inferior signal for the normal mode generated by the first target sound inferior signal generation unit and a second target sound inferior signal for the changeover mode generated by the second target sound inferior signal generation unit as the target sound inferior signal to be processed by the separator.
In a case where changeover of a mode between the normal mode and the changeover mode is possible, it is possible to change over the direction of the target sound to be acquired without changing the position of the two microphones, thereby improving the usability of the system.
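A minimal time-domain sketch of the two target sound inferior signal generation units and the changeover unit is given below. It assumes a delay of a whole number of samples with zero padding and an idealized target sound that reaches the farther microphone exactly that many samples later; the names `delay`, `inferior_signal` and the mode strings are hypothetical.

```python
def delay(signal, samples):
    """Delay a signal by a whole number of samples, padding with zeros."""
    if not 0 <= samples <= len(signal):
        raise ValueError("delay must be between 0 and the signal length")
    return [0.0] * samples + list(signal[:len(signal) - samples])


def inferior_signal(x_one, x_other, samples, mode="normal"):
    """Generate both inferior signals and select one according to the mode."""
    # First unit: delayed "one" microphone minus "other" microphone.
    first = [d - r for d, r in zip(delay(x_one, samples), x_other)]
    # Second unit: delayed "other" microphone minus "one" microphone.
    second = [d - r for d, r in zip(delay(x_other, samples), x_one)]
    # Changeover unit: pick the signal handed to the separator.
    if mode == "normal":
        return first
    if mode == "changeover":
        return second
    raise ValueError("mode must be 'normal' or 'changeover'")


# Idealized target sound and the copy heard one sample later by the far microphone.
source = [1.0, 2.0, 3.0, 4.0, 5.0]
lagged = delay(source, 1)
normal_out = inferior_signal(source, lagged, 1, "normal")          # target near "one"
changeover_out = inferior_signal(lagged, source, 1, "changeover")  # target near "other"
```

In both modes the selected inferior signal cancels the idealized target sound, while the microphones themselves stay where they are.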
In a case where the two microphones are disposed side by side in the direction in which the target sound comes from or in an approximately same direction as that direction, the target sound inferior signal generator may apply a time delay which is a same as or an approximately same as a sound wave propagation time between the two microphones to the received sound of the microphone subject to the delayed process on a time domain or a frequency domain (see,
In a case where it is structured in such a way that a time delay which is the same as or an approximately same as the sound wave propagation time between the two microphones is applied, a directivity such that the amplitude value of the target sound inferior signal becomes zero can be created in the direction in which the target sound comes from (in the case of
In a case where the two microphones are disposed side by side in the direction in which the target sound comes from or in an approximately same direction as that direction, the target sound inferior signal generator may apply a time delay which is shorter than a sound wave propagation time between the two microphones to the received sound of the microphone subject to the delayed process on a time domain or a frequency domain (see,
In a case where it is structured in such a way that a time delay which is shorter than the sound wave propagation time between the two microphones is applied, a directivity that expands a range where the amplitude value of the target sound inferior signal is suppressed can be created in the vicinity of the direction in which the target sound comes from (in the case of
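The effect of the two delay settings can be examined with a simple plane-wave model, offered here only as a sketch. It assumes a single-frequency plane wave arriving at angle θ from the line through the microphones (θ = 0 being the target direction), a sound speed of 340 m/s, and a hypothetical spacing of 2 cm; none of these values come from the specification.

```python
import cmath
import math


def inferior_gain(theta_deg, freq_hz, spacing_m, delay_s, sound_speed=340.0):
    """Amplitude of (delayed near-microphone signal - far-microphone signal).

    The far microphone hears the wave (spacing / c) * cos(theta) later than
    the near one, so the difference has gain |e^{-jw*delay} - e^{-jw*lag}|.
    """
    omega = 2.0 * math.pi * freq_hz
    arrival_lag = spacing_m / sound_speed * math.cos(math.radians(theta_deg))
    return abs(cmath.exp(-1j * omega * delay_s) - cmath.exp(-1j * omega * arrival_lag))


spacing = 0.02                       # hypothetical microphone spacing in metres
propagation_time = spacing / 340.0   # sound wave propagation time between microphones

# Delay equal to the propagation time: zero gain in the target direction.
gain_full_at_target = inferior_gain(0.0, 1000.0, spacing, propagation_time)

# Delay of half the propagation time: the zero moves to cos(theta) = 0.5.
gain_half_at_60 = inferior_gain(60.0, 1000.0, spacing, 0.5 * propagation_time)
gain_half_at_target = inferior_gain(0.0, 1000.0, spacing, 0.5 * propagation_time)
gain_half_at_rear = inferior_gain(180.0, 1000.0, spacing, 0.5 * propagation_time)
```

Under this model, the shorter delay places zeros on either side of the target direction while the gain in the target direction itself stays small compared with the rear direction, so the suppressed range around the target direction becomes wider.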
In a case where the two microphones are disposed side by side in the direction in which the target sound comes from or in an approximately same direction as that direction, it is possible to employ a structure such that the two microphones are respectively provided at a corresponding portion of a front face of a portable device at which an operation unit and/or a screen display unit is provided and a corresponding portion of a rear face opposite thereto.
The “portable device” includes, for example, a cellular phone (including a PHS), or a portable information terminal (PDA).
A “corresponding portion” means a directly opposite portion as viewed from each other.
In a case where the two microphones are respectively provided at the front and rear face of the portable device, the portable device may be a foldable cellular phone which is folded and closed when not in use and opened when in use, and it is possible to employ a structure such that a clearance between the two disposed microphones changes in accordance with an opening/closing operation of the cellular phone, and a clearance when the cellular phone is opened is larger than a clearance when the cellular phone is closed.
“Changing in accordance with an opening/closing operation” includes, for example, causing the microphone provided at the front face side to be retained when the portable device is closed, and causing the microphone to automatically protrude outwardly when opened, or causing the microphone provided at the rear face side to be retained when closed, and causing that microphone to automatically protrude outwardly when opened, and the combination thereof. For example, the microphone provided at the front face side of a cellular phone is urged outwardly by an elastic member, such as a spring or a rubber, and when the cellular phone is folded and closed, the microphone is pressed by an opposing surface (a surface constituting a face and becoming an opposing surface when folded) of the cellular phone, the elastic member is compressed and the microphone is retained, and when the cellular phone is opened, the microphone is caused to protrude outwardly by force of the elastic member returning to an original state, and such an operation may be realized by various mechanisms using a gear, cam, a belt, and a linkage, a mechanism using an air pressure or an oil pressure, and an electrical mechanism using a motor or the like. The same is true on other inventions that the microphones are disposed on both front and rear faces.
In a case where the two microphones are respectively provided at the front and rear face of the portable device, it is possible to employ a structure such that the two microphones are provided at end portions of both sides of a rotation support member attached in such a manner as to be rotatable around an axis parallel to the front/rear face of the cellular phone, and the rotation support member is retained in a state parallel to or approximately parallel to the front/rear surface of the cellular phone when not in use, and becomes orthogonal or approximately orthogonal to the front/rear face of the cellular phone when in use (e.g., the case shown in
As mentioned above, a mode can be changed over to the normal mode and the changeover mode when the target sound inferior signal generator is structured in such a manner as to include the first target sound inferior signal generator and the second target sound inferior signal generator and a changeover unit (e.g., the case shown in
When the foregoing structure is taken as the normal mode, the changeover mode can be structured as follows. That is, the target sound superior signal generator may acquire a difference between the received sound signal of the one microphone undergone a delayed process and the received sound signal of the other microphone on a time domain or a frequency domain (executing a process corresponding to a process executed by the first target sound inferior signal generator), and the target sound inferior signal generator may acquire a difference between the received sound signal of the other microphone undergone a delayed process and the received sound signal of the one microphone on a time domain or a frequency domain (executing a process corresponding to a process executed by the second target sound inferior signal generator), and in this case, it is preferable that at least one difference in the difference obtained by the target sound superior signal generator and the difference obtained by the target sound inferior signal generator should be multiplied by a coefficient, and the difference obtained by the target sound superior signal generator should be set relatively smaller than the difference obtained by the target sound inferior signal generator (e.g., the case shown in
<Invention of a Type that the Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from and Sum/Difference are Both Acquired> Invention of a Type that the Two Microphones are Disposed Side by Side in a Direction Orthogonal to or Approximately Orthogonal to the Direction in which the Target Sound Comes from, and a Sum and Difference of Received Sound Signals are Used
In addition to the structure that the two microphones are disposed side by side in the direction in which the target sound comes from or in an approximately same direction, the following structure can be employed. That is, in the foregoing sound source separation system, the two microphones are disposed side by side in a direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from, the target sound superior signal generator acquires a sum of the received sound signals of the two microphones on a time domain or a frequency domain, and the target sound inferior signal generator acquires a difference between the received sound signals of the two microphones on a time domain or a frequency domain (e.g., the case shown in
In a case where the two microphones are disposed side by side in a direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from, and a sum of the received sound signals of the two microphones is acquired to generate the target sound superior signal, the separator may multiply at least one spectrum in the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal by a coefficient depending on a frequency, compare powers of the spectra at a same frequency band, and perform band selection of assigning larger powers at the individual frequency bands to a spectrum obtained by separation (maximum level band selection: BS-MAX).
In a case where the two microphones are disposed side by side in a direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from, and a sum of the received sound signals of the two microphones is acquired to generate the target sound superior signal, the separator may perform spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal at a same frequency band.
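The sum/difference structure combined with spectral subtraction can be sketched end to end as follows. The sketch assumes an idealized target sound that reaches both microphones at the same instant, a plain DFT over the whole signal instead of frame-by-frame analysis, a constant coefficient, and clamping at zero; all names and values are hypothetical.

```python
import cmath


def power_spectrum(x):
    """Power per frequency bin from a plain DFT (illustration only)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n)]


def separate_sum_difference(x_a, x_b, coefficient=1.0):
    """Sum emphasizes the target, difference suppresses it, then subtract powers."""
    superior = [a + b for a, b in zip(x_a, x_b)]   # target sound superior signal
    inferior = [a - b for a, b in zip(x_a, x_b)]   # target sound inferior signal
    p_sup = power_spectrum(superior)
    p_inf = power_spectrum(inferior)
    # Spectral subtraction, clamped at zero (an assumption of this sketch).
    return [max(ps - coefficient * pi, 0.0) for ps, pi in zip(p_sup, p_inf)]


# Idealized target sound heard identically by both microphones.
target = [1.0, -2.0, 0.5, 3.0]
target_only = separate_sum_difference(target, target)
reference = power_spectrum([2.0 * s for s in target])
```

With the target sound alone, the difference signal vanishes and the output equals the power spectrum of the sum, so the target sound passes through unchanged in this idealized case.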
<Invention of a Type that Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from and a Difference is Acquired> Invention of a Type that the Two Microphones are Disposed Side by Side in a Direction Orthogonal to or Approximately Orthogonal to the Direction in which the Target Sound Comes from and a Difference Between the Received Sound Signals is Used but a Sum Thereof is Not Used
In addition to a structure that the two microphones are disposed side by side in a direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from, and a sum of the received sound signals of the two microphones are acquired to generate the target sound superior signal, the following structure can be employed. That is, in the foregoing sound source separation system, the two microphones may be disposed side by side in a direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from, the target sound superior signal generator may comprise: a first target sound superior signal generation unit which acquires a difference between the received sound signal of the one microphone in the two microphones and the received signal of the other microphone undergone a delayed process on a time domain or a frequency domain to generate a first target sound superior signal; and a second target sound superior signal generation unit which acquires a difference between the received sound signal of the other microphone and the received sound signal of the one microphone undergone a delayed process on a time domain or a frequency domain to generate a second target sound superior signal, and the target sound inferior signal generator acquires a difference between the received sound signals of the two microphones on a time domain or a frequency domain (e.g., the case shown in
In a case where the two microphones are disposed side by side in a direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from, and the two first and second target sound superior signals are generated, the separator may comprise: a first separation unit which compares powers at a same frequency band between the spectrum of the first target sound superior signal and the spectrum of the target sound inferior signal for each frequency band, and performs band selection (maximum level band selection: BS-MAX) of assigning larger powers at the individual frequency bands to a spectrum obtained by separation; a second separation unit which compares powers at a same frequency band between the spectrum of the second target sound superior signal and the spectrum of the target sound inferior signal for each frequency band, and performs band selection (maximum level band selection: BS-MAX) of assigning larger powers at the individual frequency bands to a spectrum obtained by separation; and an integration unit which performs a spectrum integration process of adding those powers of the spectra for each frequency band or comparing the powers for each frequency band and assigning inferior power to a spectrum of the target sound, using a spectrum of one sound including the target sound separated by the first separation unit and a spectrum of an other sound including the target sound separated by the second separation unit.
In a case where the two microphones are disposed side by side in a direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from, and the two first and second target sound superior signals are generated, the separator may comprise: a first separation unit that performs spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the target sound inferior signal by a coefficient, from power of the spectrum of the first target sound superior signal at a same frequency band; a second separation unit that performs spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the target sound inferior signal by a coefficient, from power of the spectrum of the second target sound superior signal of the same frequency band; and an integration unit which performs a spectrum integration process of adding those powers of the spectra for each frequency band or comparing the powers for each frequency band and assigning inferior power to a spectrum of the target sound, using a spectrum of one sound including the target sound separated by the first separation unit and a spectrum of an other sound including the target sound separated by the second separation unit.
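The two alternatives of the spectrum integration process performed by the integration unit can be sketched as below. The function name `integrate_spectra`, the `method` strings, and the sample values are hypothetical.

```python
def integrate_spectra(power_first, power_second, method="min"):
    """Combine the outputs of the first and second separation units per band.

    method="add": add the two powers in each frequency band.
    method="min": compare the two powers and assign the smaller one
                  to the spectrum of the target sound.
    """
    if len(power_first) != len(power_second):
        raise ValueError("spectra must have the same number of bands")
    if method == "add":
        return [a + b for a, b in zip(power_first, power_second)]
    if method == "min":
        return [min(a, b) for a, b in zip(power_first, power_second)]
    raise ValueError("method must be 'add' or 'min'")


# Hypothetical outputs of the first and second separation units.
first_unit = [4.0, 0.0, 9.0]
second_unit = [3.0, 2.0, 0.0]
added = integrate_spectra(first_unit, second_unit, "add")
minimum = integrate_spectra(first_unit, second_unit, "min")
```

Choosing the smaller power keeps a band only to the extent that both separation units agree it contains the target sound, whereas adding keeps whatever either unit passed.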
<Invention of Three Microphones/Two Combinations Type> Invention of a Type that Two Combinations of Microphones are Made Using Three Microphones
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a total of three first, second and third microphones disposed at respective vertices of a triangle; a target sound superior signal generator which performs a linear combination process for emphasizing the target sound on a time domain or a frequency domain, using received sound signals of the two first and second microphones to generate at least one target sound superior signal; a target sound inferior signal generator which performs a linear combination process for suppressing the target sound on a time domain or a frequency domain, using received sound signals of the two first and third microphones to generate at least a target sound inferior signal to be paired with the target sound superior signal; and a separator that separates the target sound and the disturbance sound from each other using a spectrum of the target sound superior signal generated by the target sound superior signal generator or obtained by a subsequent frequency analysis and a spectrum of the target sound inferior signal generated by the target sound inferior signal generator or obtained by a subsequent frequency analysis.
It is preferable that the “triangle” should be a right-angle isosceles triangle, an approximately right-angle isosceles triangle, or a right-angle or approximately right-angle triangle other than an isosceles triangle, but it may also be a triangle that is neither right-angled nor approximately right-angled.
According to such a sound source separation system of the invention (e.g., the case shown in
A separation process is performed using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal, both generated through directivity control in this manner, which enables precise separation of the target sound and the disturbance sound. Accordingly, the separation performance can be improved in comparison with the case of patent literature 4, where band selection is performed using a sound pressure difference between the microphone signals that originates from the fixed positional relationships of the plurality of microphones.
Because the directivity is controlled by performing the linear combination processes for emphasizing and suppressing the target sound, unlike the case of the separation process using independent component analysis (ICA), not only a sound coming from a specified direction but also a sound coming from an unspecified direction can be separated.
Further, only three microphones are used, and sound source separation is realized with this small number of microphones, so that miniaturization of the device is enabled, thereby achieving the foregoing object.
In the foregoing sound source separation system, it is desirable that the first and second microphones should be disposed side by side in a direction in which the target sound comes from or in an approximately same direction as that direction, the first and third microphones should be disposed side by side in a direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from, the target sound superior signal generator should acquire a difference between the received sound signal of the first microphone and the received sound signal of the second microphone on a time domain or a frequency domain, and the target sound inferior signal generator should acquire a difference between the received sound signal of the first microphone and the received sound signal of the third microphone on a time domain or a frequency domain.
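A minimal time-domain sketch of the two difference signals follows. It assumes an idealized plane-wave target, an integer sample delay between the first and second microphones, and no disturbance sound; the signal, the delay of 2 samples and all variable names are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, delay = 1024, 2                   # delay: assumed travel time mic 1 -> mic 2, in samples
s = rng.standard_normal(n + delay)   # hypothetical target sound (plane wave)

x1 = s[delay:]   # first microphone
x2 = s[:n]       # second microphone: same wave, `delay` samples later
x3 = s[delay:]   # third microphone: same arrival time as the first microphone

# Target sound superior signal: difference of the first and second microphones.
superior = x1 - x2
# Target sound inferior signal: difference of the first and third microphones.
inferior = x1 - x3

print("mean power, superior:", np.mean(superior ** 2))
print("mean power, inferior:", np.mean(inferior ** 2))
```

Because the target wavefront reaches the first and third microphones at the same time in this idealized setup, the target cancels in the inferior signal, while it remains in the superior signal.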
In the foregoing sound source separation system, the separator may compare powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal for each frequency band, and perform band selection (maximum level band selection: BS-MAX) of assigning larger powers at the individual frequency bands to a spectrum obtained by separation.
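The band selection can be sketched as follows. This is an illustrative reading rather than a definitive implementation: it assumes that a band lost by a spectrum is set to zero in that spectrum and that ties are assigned to the disturbance side, neither of which is stated in the text. The name `bs_max` is hypothetical.

```python
import numpy as np

def bs_max(power_superior, power_inferior):
    """Maximum level band selection between two power spectra.

    For each frequency band the larger power is kept; the band is given to
    the target side when the superior spectrum is larger, otherwise to the
    disturbance side (assumption: the losing side gets zero in that band).
    """
    ps = np.asarray(power_superior, dtype=float)
    pi = np.asarray(power_inferior, dtype=float)
    target_wins = ps > pi
    target = np.where(target_wins, ps, 0.0)
    disturbance = np.where(target_wins, 0.0, pi)
    return target, disturbance

# Hypothetical per-band powers.
p_sup = np.array([5.0, 1.0, 2.0, 0.3])
p_inf = np.array([1.0, 4.0, 2.0, 0.1])
target, disturbance = bs_max(p_sup, p_inf)
print(target)
print(disturbance)
```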
Further, in the foregoing sound source separation system, the separator may perform spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal at a same frequency band.
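The spectral subtraction can be sketched in a few lines. The flooring of negative results at zero and the default coefficient of 1.0 are assumptions made for this illustration, and `spectral_subtraction` is a hypothetical name.

```python
import numpy as np

def spectral_subtraction(power_superior, power_inferior, coefficient=1.0):
    """Subtract the scaled inferior power from the superior power per band.

    Negative results are clipped to zero (an assumption; the text does not
    say how negative values are handled).
    """
    ps = np.asarray(power_superior, dtype=float)
    pi = np.asarray(power_inferior, dtype=float)
    return np.maximum(ps - coefficient * pi, 0.0)

# Hypothetical per-band powers.
p_sup = np.array([5.0, 1.0, 2.0, 0.3])
p_inf = np.array([1.0, 4.0, 2.0, 0.1])
separated = spectral_subtraction(p_sup, p_inf, coefficient=1.5)
print(separated)
```

A larger coefficient removes more of the disturbance estimate at the cost of also removing more of the target sound in bands where both are present.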
<Invention of Four Microphones/Two Combinations Type> Invention of a Type that Two Combinations of Microphones are Made Using Four Microphones
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a total of four microphones, respective two microphones being disposed side by side as to be spaced away in a first direction and a second direction intersecting with each other; a target sound superior signal generator which performs a linear combination process for emphasizing the target sound on a time domain or a frequency domain using received sound signals of the two microphones disposed side by side in the first direction in the four microphones to generate at least one target sound superior signal; a target sound inferior signal generator which performs a linear combination process for suppressing the target sound on a time domain or a frequency domain using received sound signals of the two microphones disposed side by side in the second direction in the four microphones to generate at least one target sound inferior signal; and a separator which separates the target sound and the disturbance sound from each other using a spectrum of the target sound superior signal generated by the target sound superior signal generator or obtained by a subsequent frequency analysis and a spectrum of the target sound inferior signal generated by the target sound inferior signal generator or obtained by a subsequent frequency analysis.
The expression “the first and second directions intersecting with each other” covers not only a case where the first and second directions intersect with each other at a right angle, but also a case where those directions intersect with each other at an angle other than 90 degrees.
In such a sound source separation system of the invention (e.g., the case shown in
A separation process is performed using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal, both generated through directivity control in this manner, which enables precise separation of the target sound and the disturbance sound. Accordingly, the separation performance can be improved in comparison with the case of patent literature 4, where band selection is performed using a sound pressure difference between the microphone signals that originates from the fixed positional relationships of the plurality of microphones.
Because the directivity is controlled by performing the linear combination processes for emphasizing and suppressing the target sound, unlike the case of the separation process using independent component analysis (ICA), not only a sound coming from a specified direction but also a sound coming from an unspecified direction can be separated.
Further, only four microphones are used, and sound source separation is realized with this small number of microphones, so that miniaturization of the device is enabled, thereby achieving the foregoing object.
In the foregoing sound source separation system, it is desirable that the first direction should be the direction in which the target sound comes from or an approximately same direction as that direction, the second direction should be orthogonal to or approximately orthogonal to the direction in which the target sound comes from, the target sound superior signal generator should acquire a difference between the received sound signals of the two microphones disposed side by side in the first direction on a time domain or a frequency domain, and the target sound inferior signal generator should acquire a difference between the received sound signals of the two microphones disposed side by side in the second direction on a time domain or a frequency domain.
In the foregoing sound source separation system, the separator may compare powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal for each frequency band, and perform band selection (maximum level band selection: BS-MAX) of assigning larger powers at the individual frequency bands to a spectrum obtained by separation.
In the foregoing sound source separation system, the separator may perform spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal at a same frequency band.
<Invention of Four Microphones/Three Combinations Type> Invention of a Type that Three Combinations of Microphones are Made Using Four Microphones
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a total of four first, second, third and fourth microphones disposed at respective vertices of a rectangle; a target sound superior signal generator which performs a linear combination process for emphasizing the target sound on a time domain or a frequency domain using received sound signals of the two first and second microphones to generate a target sound superior signal; a first target sound inferior signal generator which performs a linear combination process for suppressing the target sound on a time domain or a frequency domain using received sound signals of the two first and third microphones to generate a first target sound inferior signal; a second target sound inferior signal generator which performs a linear combination process for suppressing the target sound on a time domain or a frequency domain using received sound signals of the two first and fourth microphones to generate a second target sound inferior signal; a first separator which separates one sound including the target sound, using a spectrum of the target sound superior signal generated by the target sound superior signal generator or obtained by a successive frequency analysis, and a spectrum of the first target sound inferior signal generated by the first target sound inferior signal generator or obtained by a successive frequency analysis; a second separator which separates an other sound including the target sound, using the spectrum of the target sound superior signal generated by the target sound superior signal generator or obtained by a successive frequency analysis, and a spectrum of the second target sound inferior signal generated by the second target sound inferior signal generator or obtained by a successive frequency analysis; and an 
integration unit which performs a spectrum integration process of adding those powers of the spectra for each frequency band or comparing the powers for each frequency band and assigning inferior power to a spectrum of the target sound, using a spectrum of the one sound including the target sound separated by the first separator and a spectrum of the other sound including the target sound separated by the second separator.
It is preferable that the “rectangle” should be a rhomboid, an approximate rhomboid, a square, an approximate square, or another rectangle that is line-symmetric about a diagonal line, but it may also be a rectangle that is not line-symmetric about a diagonal line.
In such a sound source separation system of the invention (e.g., the case shown in
A separation process is performed using the spectrum of the target sound superior signal and the spectra of the first and second target sound inferior signals, all generated through directivity control in this manner, which enables precise separation of the target sound and the disturbance sound. Accordingly, the separation performance can be improved in comparison with the case of patent literature 4, where band selection is performed using a sound pressure difference between the microphone signals that originates from the fixed positional relationships of the plurality of microphones.
Because the directivity is controlled by performing the linear combination processes for emphasizing and suppressing the target sound, unlike the case of the separation process using independent component analysis (ICA), not only a sound coming from a specified direction but also a sound coming from an unspecified direction can be separated.
Further, only four microphones are used, and sound source separation is realized with this small number of microphones, so that miniaturization of the device is enabled, thereby achieving the foregoing object.
In the foregoing sound source separation system, it is desirable that the first and second microphones should be disposed side by side in a direction in which the target sound comes from or in an approximately same direction as that direction, the third microphone should be disposed on one side of a line interconnecting the first microphone and the second microphone, the fourth microphone should be disposed on the other side of the line interconnecting the first microphone and the second microphone, the target sound superior signal generator should acquire a difference between received sound signals of the first and second microphones on a time domain or a frequency domain, the first target sound inferior signal generator should acquire a difference between received sound signals of the first and third microphones on a time domain or a frequency domain, and the second target sound inferior signal generator should acquire a difference between received sound signals of the first and fourth microphones on a time domain or a frequency domain.
In the foregoing sound source separation system, the first separator may compare powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the first target sound inferior signal for each frequency band, and perform band selection (maximum level band selection: BS-MAX) of assigning larger powers at the individual frequency bands to a spectrum obtained by separation, and the second separator may compare powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the second target sound inferior signal for each frequency band, and perform band selection (maximum level band selection: BS-MAX) of assigning larger powers at the individual frequency bands to a spectrum obtained by separation.
Further, in the foregoing sound source separation system, the first separator may perform spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the first target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal at a same frequency band, and the second separator may perform spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the second target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal at a same frequency band.
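The overall data flow of this four-microphone, three-combination arrangement can be sketched for a single analysis frame. The sketch assumes time-domain difference signals, a Hann-windowed FFT as the frequency analysis, the spectral subtraction variant in both separators, and per-band minimum selection in the integration unit; the frame length, the coefficient and all function names are hypothetical, and random noise stands in for real microphone frames.

```python
import numpy as np

def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame (assumed frequency analysis)."""
    window = np.hanning(len(frame))
    return np.abs(np.fft.rfft(frame * window)) ** 2

def separate_frame(x1, x2, x3, x4, coefficient=1.0):
    """One frame of the superior/inferior/separator/integration chain."""
    superior = x1 - x2      # target sound superior signal
    inferior_1 = x1 - x3    # first target sound inferior signal
    inferior_2 = x1 - x4    # second target sound inferior signal

    p_sup = power_spectrum(superior)
    p_inf1 = power_spectrum(inferior_1)
    p_inf2 = power_spectrum(inferior_2)

    # First and second separators: spectral subtraction, clipped at zero
    # (the clipping is an assumption of this sketch).
    out_1 = np.maximum(p_sup - coefficient * p_inf1, 0.0)
    out_2 = np.maximum(p_sup - coefficient * p_inf2, 0.0)

    # Integration unit: keep the smaller power in each frequency band.
    return np.minimum(out_1, out_2)

rng = np.random.default_rng(1)
frames = rng.standard_normal((4, 512))   # stand-in frames for microphones 1 to 4
target_power = separate_frame(*frames)
print(target_power.shape)  # (257,)
```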
<Invention of Three Microphones/Three Combinations Type> Invention of a Type that Three Combinations of Microphones are Made Using Three Microphones
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a total of three first, second and third microphones disposed at respective vertices of a triangle; a target sound superior signal generator which performs a linear combination process for emphasizing the target sound on a time domain or a frequency domain, using received sound signals of the three microphones to generate a target sound superior signal; a first target sound inferior signal generator which performs a linear combination process for suppressing the target sound on a time domain or a frequency domain, using received sound signals of the two first and second microphones to generate a first target sound inferior signal to be paired with the target sound superior signal; a second target sound inferior signal generator which performs a linear combination process for suppressing the target sound on a time domain or a frequency domain, using received sound signals of the two first and third microphones to generate a second target sound inferior signal to be paired with the target sound superior signal; a first separator which separates one sound including the target sound, using a spectrum of the target sound superior signal generated by the target sound superior signal generator or obtained by a successive frequency analysis, and a spectrum of the first target sound inferior signal generated by the first target sound inferior signal generator or obtained by a successive frequency analysis; a second separator which separates an other sound including the target sound, using the spectrum of the target sound superior signal generated by the target sound superior signal generator or obtained by a successive frequency analysis, and a spectrum of the second target sound inferior signal generated by the second target sound 
inferior signal generator or obtained by a successive frequency analysis; and an integration unit which performs a spectrum integration process of adding those powers of the spectra for each frequency band or comparing the powers for each frequency band and assigning inferior power to a spectrum of the target sound, using a spectrum of the one sound including the target sound separated by the first separator and a spectrum of the other sound including the target sound separated by the second separator.
It is preferable that the “triangle” should be a right-angle isosceles triangle, an approximately right-angle isosceles triangle, or another isosceles or approximately isosceles triangle, but it may also be a triangle that is neither isosceles nor approximately isosceles.
In such a sound source separation system of the invention (e.g., the case shown in
A separation process is performed using the spectrum of the target sound superior signal and the spectra of the first and second target sound inferior signals, all generated through directivity control in this manner, which enables precise separation of the target sound and the disturbance sound. Accordingly, the separation performance can be improved in comparison with the case of patent literature 4, where band selection is performed using a sound pressure difference between the microphone signals that originates from the fixed positional relationships of the plurality of microphones.
Because the directivity is controlled by performing the linear combination processes for emphasizing and suppressing the target sound, unlike the case of the separation process using independent component analysis (ICA), not only a sound coming from a specified direction but also a sound coming from an unspecified direction can be separated.
Further, only three microphones are used, and sound source separation is realized with this small number of microphones, so that miniaturization of the device is enabled, thereby achieving the foregoing object.
In the foregoing sound source separation system, it is desirable that the first and second microphones should be disposed side by side in a direction inclined with respect to a direction in which the target sound comes from, the first and third microphones should be disposed side by side in a direction inclined in an opposite direction to the inclined direction of the first and second microphones, the target sound superior signal generator should acquire a difference between the received sound signal of the first microphone and a sum, obtained by multiplying received sound signals of the second and third microphones by the same or different proportionality coefficients, on a time domain or a frequency domain, the first target sound inferior signal generator should acquire a difference between the received sound signals of the first and second microphones on a time domain or a frequency domain, and the second target sound inferior signal generator should acquire a difference between the received sound signals of the first and third microphones on a time domain or a frequency domain.
The “sum obtained by multiplying received sound signals of the second and third microphones by the same or different proportionality coefficients on a time domain or a frequency domain” is a sum obtained by multiplying the received sound signals of the second and third microphones by the same proportionality coefficient when the disposed positions of the three microphones form an isosceles triangle with the position of the first microphone serving as a vertex, or a sum obtained by multiplying the received sound signals of the second and third microphones by different coefficients, respectively, when the disposed positions of the microphones do not form an isosceles triangle.
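The three linear combinations of this arrangement can be sketched in the time domain as follows. The coefficient value 0.5 is only a placeholder and is not taken from the text, and the function names are hypothetical; equal coefficients correspond to the isosceles case and unequal ones to the non-isosceles case.

```python
import numpy as np

def superior_signal(x1, x2, x3, coeff2=0.5, coeff3=0.5):
    """First microphone minus the weighted sum of the second and third."""
    return x1 - (coeff2 * x2 + coeff3 * x3)

def inferior_signals(x1, x2, x3):
    """Differences of the first microphone with the second and with the third."""
    return x1 - x2, x1 - x3

# Hypothetical short received sound signals.
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([0.0, 1.0, 2.0])
x3 = np.array([0.0, 0.0, 0.0])

sup = superior_signal(x1, x2, x3)                             # same coefficient for both
sup_uneven = superior_signal(x1, x2, x3, coeff2=1.0, coeff3=0.25)  # different coefficients
inf_1, inf_2 = inferior_signals(x1, x2, x3)
print(sup, sup_uneven, inf_1, inf_2)
```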
In the foregoing sound source separation system, the first separator may compare powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the first target sound inferior signal for each frequency band, and perform band selection (maximum level band selection: BS-MAX) of assigning larger powers at the individual frequency bands to a spectrum obtained by separation, and the second separator may compare powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the second target sound inferior signal for each frequency band, and perform band selection (maximum level band selection: BS-MAX) of assigning larger powers at the individual frequency bands to a spectrum obtained by separation.
Further, in the foregoing sound source separation system, the first separator may perform spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the first target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal at a same frequency band, and the second separator may perform spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the second target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal at a same frequency band.
<Invention of Two Sensitive Regions Integration Type that Three Microphones are Disposed on a Plane Orthogonal to the Direction in which the Target Sound Comes from> Invention of a Type that Three Microphones are Disposed on a Plane Orthogonal to or Approximately Orthogonal to the Direction in which the Target Sound Comes from, and Two Sensitive Regions are Integrated
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a total of three first, second and third microphones disposed at respective vertices of a triangle on a plane orthogonal to or approximately orthogonal to a direction in which the target sound comes from; a first sensitive region formation signal generator that uses received sound signals of the two first and second microphones to generate a spectrum of a first sensitive region formation signal which forms a first sensitive region along a plane orthogonal to a line interconnecting those microphones; a second sensitive region formation signal generator that uses received sound signals of the two second and third microphones to generate a spectrum of a second sensitive region formation signal which forms a second sensitive region along a plane orthogonal to a line interconnecting those microphones; and a sensitive region integration unit that forms a sensitive region for separating the target sound at a common part of the first sensitive region and the second sensitive region using the spectrum of the first sensitive region formation signal generated by the first sensitive region formation signal generator and the spectrum of the second sensitive region formation signal generated by the second sensitive region formation signal generator.
According to such a sound source separation system of the invention (e.g., the case shown in
The number of microphones to be used is three, and sound source separation is realized with the few microphones, so that miniaturization of the device is enabled, thereby achieving the foregoing object.
<Invention of Two Sensitive Regions Integration Type that Three Microphones are Disposed on a Plane Orthogonal to the Direction in which the Target Sound Comes from, and a Process Including the Process of the Invention of the Type that Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from and a Difference is Acquired is Performed>
In the foregoing sound source separation system (invention of the two sensitive regions integration type that the three microphones are disposed on a plane orthogonal to the direction in which the target sound comes from), the first sensitive region formation signal generator may perform the same process as that of the sound source separation system (invention of the type that the two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from), using the received sound signals of the first and second microphones, and generate, as the spectrum of the first sensitive region formation signal, the same spectrum as the spectrum of the target sound obtained through separation by that sound source separation system. Likewise, the second sensitive region formation signal generator may perform the same process as that of that sound source separation system, using the received sound signals of the second and third microphones, and generate, as the spectrum of the second sensitive region formation signal, the same spectrum as the spectrum of the target sound obtained through separation by that sound source separation system. The sensitive region integration unit may then perform a spectrum integration process of comparing the powers of the spectra for each frequency band and assigning inferior power to a spectrum of the target sound, using the spectrum of the first sensitive region formation signal generated by the first sensitive region formation signal generator and the spectrum of the second sensitive region formation signal generated by the second sensitive region formation signal generator (the case shown in
In the foregoing sound source separation system (invention of the two sensitive regions integration type that the three microphones are disposed on a plane orthogonal to the direction in which the target sound comes from), the first sensitive region formation signal generator may perform a same process as that of the sound source separation system (invention of the type that the two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from), using the received sound signals of the two first and second microphones, and generate a same spectrum as the spectrum of the target sound obtained through separation by the sound source separation system (invention of the type that the two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from), as the spectrum of the first sensitive region formation signal, the second sensitive region formation signal generator may perform same processes as those of the sound source separation system (invention of the type that the two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from) other than a process of the integration unit of the separator, using the received sound signals of the two second and third microphones, and have a sensitive region limitation unit which limits the second sensitive region to either of a region at the second microphone side and a region at the third microphone side, instead of the integration unit of the separator which constitutes the sound source separation system (invention of the type that the two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from), when the first target sound superior signal generator performs a delayed process on the received sound signal of the second microphone and the second target sound superior signal generator performs a delayed process on the received sound signal of the third 
microphone, the first target sound superior signal generator and the second target sound superior signal generator constituting the sound source separation system (invention of the type that the two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from), the sensitive region limitation unit may compare powers at a same frequency band between the spectrum of one sound including the target sound separated by the first separation unit and the spectrum of an other sound including the target sound separated by the second separation unit for each frequency band, perform band selection (minimum level band selection: BS-MIN) of assigning smaller power to a spectrum of one sound including the target sound separated by the first separation unit for a frequency band where power of the spectrum of the one sound including the target sound separated by the first separation unit is smaller than power of a spectrum of an other sound including the target sound separated by the second separation unit to generate the spectrum of the second sensitive region formation signal which forms the second sensitive region limited to the region at the second microphone side, or perform band selection (minimum level band selection: BS-MIN) of assigning smaller power to the spectrum of the other sound including the target sound separated by the second separation unit for a frequency band where power of the spectrum of the other sound including the target sound separated by the second separation unit is smaller than power of the spectrum of the one sound including the target sound separated by the first separation unit to generate a spectrum of the second sensitive region formation signal which forms the second sensitive region limited to the region at the third microphone side, and the sensitive region integration unit may perform a spectrum integration process of comparing the powers of the spectra for each frequency band, using the spectrum of 
the first sensitive region formation signal generated by the first sensitive region formation signal generator and the spectrum of the second sensitive region formation signal generated by the second sensitive region formation signal generator, and assigning inferior power to a spectrum of the target sound (the case shown in
The foregoing sensitive region limitation unit may be able to change over limitation of the second sensitive region to either of the region at the second microphone side and the region at the third microphone side (see,
<Invention of Three Sensitive Regions Integration Type that Three Microphones are Disposed on a Plane Orthogonal to the Direction in which the Target Sound Comes from>
Moreover, according to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a total of three first, second and third microphones disposed at respective vertices of a triangle perpendicular to or approximately perpendicular to a direction in which the target sound comes from; a first sensitive region formation signal generator that generates a spectrum of a first sensitive region formation signal which forms a first sensitive region along a plane orthogonal to a line interconnecting the first and second microphones, using received sound signals of those two microphones; a second sensitive region formation signal generator that generates a spectrum of a second sensitive region formation signal which forms a second sensitive region along a plane orthogonal to a line interconnecting the second and third microphones, using received sound signals of those two microphones; a third sensitive region formation signal generator that generates a spectrum of a third sensitive region formation signal which forms a third sensitive region along a plane orthogonal to a line interconnecting the first and third microphones, using received sound signals of those two microphones; and a sensitive region integration unit that forms a sensitive region for separating the target sound at a common part of the first sensitive region, the second sensitive region and the third sensitive region, using the spectrum of the first sensitive region formation signal generated by the first sensitive region formation signal generator, the spectrum of the second sensitive region formation signal generated by the second sensitive region formation signal generator, and the spectrum of the third sensitive region formation signal generated by the third sensitive region formation signal generator.
According to such a sound source separation system of the invention (e.g., the case shown in
Only three microphones are used, so sound source separation is realized with a small number of microphones and the device can be miniaturized, thereby achieving the foregoing object.
<Invention of Three Sensitive Regions Integration Type that Three Microphones are Disposed on a Plane Orthogonal to the Direction in which the Target Sound Comes from, and a Process Including the Process of the Invention of the Type that Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from and a Difference is Acquired is Performed>
In the foregoing sound source separation system (invention of three sensitive regions integration type that three microphones are disposed on a plane orthogonal to the direction in which the target sound comes from), the first sensitive region formation signal generator may perform a same process as that of the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), using the received sound signals of the two first and second microphones, and generate a same spectrum as the spectrum of the target sound obtained through separation by the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), as the spectrum of the first sensitive region formation signal, the second sensitive region formation signal generator may perform a same process as that of the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), using the received sound signals of the two second and third microphones, and generate a same spectrum as the spectrum of the target sound obtained through separation by the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), as the spectrum of the second sensitive region formation signal, the third sensitive region formation signal generator may perform a same process as that of the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), using the received 
sound signals of the two first and third microphones, and generate a same spectrum as the spectrum of the target sound obtained through separation by the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), as the spectrum of the third sensitive region formation signal, and the sensitive region integration unit may perform a spectrum integration process of comparing the powers of the spectra for each frequency band and assigning most inferior power to a spectrum of the target sound, using the spectrum of the first sensitive region formation signal generated by the first sensitive region formation signal generator, the spectrum of the second sensitive region formation signal generated by the second sensitive region formation signal generator and the spectrum of the third sensitive region formation signal generated by the third sensitive region formation signal generator.
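As a non-limiting illustration of the spectrum integration process described above, the following Python sketch compares the powers of three spectra for each frequency band and assigns the bin of most inferior (that is, smallest) power to the spectrum of the target sound. The function name `integrate_min_power` and the representation of each spectrum as a list of complex bins for one analysis frame are assumptions made for this illustration and do not appear in the specification.

```python
def integrate_min_power(*spectra):
    """Per frequency band, keep the complex bin whose power |X|^2 is smallest.

    Each argument is one spectrum (a list of complex bins for one frame),
    e.g. the spectra of the first, second and third sensitive region
    formation signals. All spectra must have the same number of bands.
    """
    if not spectra:
        raise ValueError("at least one spectrum is required")
    n_bands = len(spectra[0])
    if any(len(s) != n_bands for s in spectra):
        raise ValueError("all spectra must have the same number of bands")
    # zip(*spectra) yields, for each band, the bins of every input spectrum.
    return [min(bins, key=lambda z: abs(z) ** 2) for bins in zip(*spectra)]
```

Because a disturbance sound outside any one of the sensitive regions is already attenuated in at least one of the input spectra, taking the smallest power per band leaves large power only in the common part of the regions, which is the behavior the integration unit is described as providing.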
In the foregoing sound source separation system (invention of three sensitive regions integration type that three microphones are disposed on a plane orthogonal to the direction in which the target sound comes from), the first sensitive region formation signal generator may perform a same process as that of the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), using the received sound signals of the two first and second microphones, and generate a same spectrum as the spectrum of the target sound obtained through separation by the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), as the spectrum of the first sensitive region formation signal, the second sensitive region formation signal generator may perform same processes as those of the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) other than a process of the integration unit of the separator, using the received sound signals of the two second and third microphones, and have a sensitive region limitation unit which limits the second sensitive region to either of a region at the second microphone side and a region at the third microphone side, instead of the integration unit of the separator which constitutes the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), when the first target sound superior signal generator performs a delayed process on the received sound signal of the second microphone and the second target sound 
superior signal generator performs a delayed process on the received sound signal of the third microphone, the first target sound superior signal generator and the second target sound superior signal generator constituting the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), the sensitive region limitation unit of the second sensitive region formation signal generator may compare powers at a same frequency band between the spectrum of one sound including the target sound separated by the first separation unit and the spectrum of an other sound including the target sound separated by the second separation unit for each frequency band, perform band selection (minimum level band selection: BS-MIN) of assigning smaller power to a spectrum of one sound including the target sound separated by the first separation unit for a frequency band where power of the spectrum of the one sound including the target sound separated by the first separation unit is smaller than power of a spectrum of an other sound including the target sound separated by the second separation unit to generate the spectrum of the second sensitive region formation signal which forms the second sensitive region limited to the region at the second microphone side, or perform band selection (minimum level band selection: BS-MIN) of assigning smaller power to the spectrum of the other sound including the target sound separated by the second separation unit for a frequency band where power of the spectrum of the other sound including the target sound separated by the second separation unit is smaller than power of the spectrum of the one sound including the target sound separated by the first separation unit to generate a spectrum of the second sensitive region formation signal which forms the second sensitive region limited to the region at the third microphone 
side, the third sensitive region formation signal generator may perform same processes as those of the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) other than a process of the integration unit of the separator, using the received sound signals of the two first and third microphones, and have a sensitive region limitation unit which limits the third sensitive region to either of a region at the first microphone side and a region at the third microphone side, instead of the integration unit of the separator which constitutes the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), when the first target sound superior signal generator performs a delayed process on the received sound signal of the first microphone and the second target sound superior signal generator performs a delayed process on the received sound signal of the third microphone, the first target sound superior signal generator and the second target sound superior signal generator constituting the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), the sensitive region limitation unit of the third sensitive region formation signal generator may compare powers at a same frequency band between the spectrum of one sound including the target sound separated by the first separation unit and the spectrum of an other sound including the target sound separated by the second separation unit for each frequency band, perform band selection (minimum level band selection: BS-MIN) of assigning smaller power to a spectrum of one sound including the target sound separated by the 
first separation unit for a frequency band where power of the spectrum of the one sound including the target sound separated by the first separation unit is smaller than power of a spectrum of an other sound including the target sound separated by the second separation unit to generate the spectrum of the third sensitive region formation signal which forms the third sensitive region limited to the region at the first microphone side, or perform band selection (minimum level band selection: BS-MIN) of assigning smaller power to the spectrum of the other sound including the target sound separated by the second separation unit for a frequency band where power of the spectrum of the other sound including the target sound separated by the second separation unit is smaller than power of the spectrum of the one sound including the target sound separated by the first separation unit to generate a spectrum of the third sensitive region formation signal which forms the third sensitive region limited to the region at the third microphone side, and the sensitive region integration unit may perform a spectrum integration process of comparing the powers of the spectra for each frequency band and assigning most inferior power to a spectrum of the target sound, using the spectrum of the first sensitive region formation signal generated by the first sensitive region formation signal generator, the spectrum of the second sensitive region formation signal generated by the second sensitive region formation signal generator and the spectrum of the third sensitive region formation signal generated by the third sensitive region formation signal generator (e.g., the case shown in
<Invention of Three Microphones Type that a Control Signal is Generated Using Two Signals, an Opposite Disturbance Sound is Suppressed, and a Process Including the Process of the Invention of the Type that Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from and a Difference is Acquired is Performed>
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a total of three first, second and third microphones disposed at respective vertices of a triangle; an orthogonal-disturbance-sound-suppressing-signal generator that generates an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the two first and second microphones; an opposite-disturbance-sound-suppressing-control-signal generator that generates a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the two second and third microphones; and an opposite-disturbance-sound suppressing unit that compares powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal generated by the orthogonal-disturbance-sound-suppressing-signal generator and a spectrum of the control signal generated by the opposite-disturbance-sound-suppressing-control-signal generator for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performs band selection (minimum level band selection: BS-MIN) of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein the orthogonal-disturbance-sound-suppressing-signal generator performs a same process as that of the sound source separation system (invention of the type that two microphones 
are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) using received sound signals of the two first and second microphones, and generates the same spectrum as the spectrum of the target sound obtained through separation by the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), as the spectrum of the orthogonal-disturbance-sound suppressing signal, and the opposite-disturbance-sound-suppressing-control-signal generator has a control target sound superior signal generator which acquires, in the time domain or the frequency domain, a difference between the received sound signal of the third microphone that has undergone a delayed process and the received sound signal of the second microphone.
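The band selection (minimum level band selection: BS-MIN) performed by the opposite-disturbance-sound suppressing unit may be sketched, for example, as follows. The function name `bs_min_suppress` and the parameter `floor` are hypothetical. The passage above states only what is assigned for a frequency band where the power of the orthogonal-disturbance-sound suppressing signal is smaller than the power of the control signal; scaling the remaining bands by `floor` (zero by default) is an assumption of this sketch, not a statement of the specification.

```python
def bs_min_suppress(suppressing_spectrum, control_spectrum, floor=0.0):
    """BS-MIN between the suppressing-signal spectrum and the control spectrum.

    For a band where the suppressing signal has smaller power than the
    control signal, that (smaller-power) bin is assigned to the output.
    ASSUMPTION: every other band is scaled by `floor` (0.0 removes it).
    """
    if len(suppressing_spectrum) != len(control_spectrum):
        raise ValueError("spectra must have the same number of bands")
    output = []
    for s, c in zip(suppressing_spectrum, control_spectrum):
        if abs(s) ** 2 < abs(c) ** 2:
            output.append(s)          # band kept as target sound
        else:
            output.append(s * floor)  # band treated as opposite disturbance
    return output
```

Under this assumption, a band dominated by the opposite disturbance sound has small power in the control signal, because the delayed difference cancels sound arriving from that direction, so the band fails the comparison and is suppressed in the output.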
In such a sound source separation system of the invention (e.g., the case shown in
Only three microphones are used, so sound source separation is realized with a small number of microphones and the device can be miniaturized, thereby achieving the foregoing object.
<Invention of Three Microphones Type that a Control Signal is Generated Using Three Signals, an Opposite Disturbance Sound is Suppressed, and a Process Including the Process of the Invention of the Type that Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from and a Difference is Acquired is Performed>
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a total of three first, second and third microphones disposed at respective vertices of a triangle; an orthogonal-disturbance-sound-suppressing-signal generator that generates an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the two first and second microphones; an opposite-disturbance-sound-suppressing-control-signal generator that generates a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the three first, second and third microphones; and an opposite-disturbance-sound suppressing unit that compares powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal generated by the orthogonal-disturbance-sound-suppressing-signal generator and a spectrum of the control signal generated by the opposite-disturbance-sound-suppressing-control-signal generator for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performs band selection (minimum level band selection: BS-MIN) of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein the orthogonal-disturbance-sound-suppressing-signal generator performs a same process as that of the sound source separation system (invention of the type that two 
microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) using received sound signals of the two first and second microphones, and generates the same spectrum as the spectrum of the target sound obtained through separation by the sound source separation system (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), as the spectrum of the orthogonal-disturbance-sound suppressing signal, and the opposite-disturbance-sound-suppressing-control-signal generator has: a first control target-sound-superior-signal generator which acquires, in the time domain or the frequency domain, a difference between the received sound signal of the third microphone that has undergone a delayed process and the received sound signal of the second microphone; a second control target-sound-superior-signal generator which acquires, in the time domain or the frequency domain, a difference between the received sound signal of the third microphone that has undergone a delayed process and the received sound signal of the first microphone; and a control signal integration unit that performs a spectrum integration process of comparing powers for each frequency band, using a spectrum of a first control target sound superior signal generated by the first control target-sound-superior-signal generator or obtained by a subsequent frequency analysis, and a spectrum of a second control target sound superior signal generated by the second control target-sound-superior-signal generator or obtained by a subsequent frequency analysis, and of assigning inferior power to a spectrum of a control target sound superior signal.
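For the case where the differences are acquired in the frequency domain, the two control target-sound-superior-signal generators and the control signal integration unit may be sketched as below. The sketch assumes that a delay of `delay` samples is applied to the third-microphone spectrum as a per-band phase rotation of a DFT of length `n_fft`, which corresponds to a circular delay; the function names, the whole-sample delay and the list-of-complex-bins representation are all assumptions made for illustration.

```python
import cmath


def delayed_difference_spectrum(near_spectrum, third_spectrum, delay, n_fft):
    """Difference between one microphone and the delayed third microphone.

    Band k of the third-microphone spectrum is multiplied by
    exp(-2j*pi*k*delay/n_fft), i.e. a (circular) delay of `delay` samples,
    and subtracted from band k of `near_spectrum`.
    """
    if len(near_spectrum) != len(third_spectrum):
        raise ValueError("spectra must have the same number of bands")
    return [
        near - third * cmath.exp(-2j * cmath.pi * k * delay / n_fft)
        for k, (near, third) in enumerate(zip(near_spectrum, third_spectrum))
    ]


def integrate_control_spectra(first_control, second_control):
    """Per band, assign the bin of inferior (smaller) power to the control spectrum."""
    if len(first_control) != len(second_control):
        raise ValueError("spectra must have the same number of bands")
    return [a if abs(a) ** 2 <= abs(b) ** 2 else b
            for a, b in zip(first_control, second_control)]
```

In use, `delayed_difference_spectrum` would be called once with the second-microphone spectrum and once with the first-microphone spectrum, and the two results passed to `integrate_control_spectra` to obtain the spectrum of the control target sound superior signal.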
According to such a sound source separation system of the invention (e.g., the case shown in
Only three microphones are used, so sound source separation is realized with a small number of microphones and the device can be miniaturized, thereby achieving the foregoing object.
<Invention of Three Microphones/Opposite Disturbance Sound Suppressing Type that a Process Including the Process of the Invention of a Type that Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from and Sum/Difference are Both Acquired is Performed>
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a total of three first, second and third microphones disposed at respective vertices of a triangle; an orthogonal-disturbance-sound-suppressing-signal generator that generates an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the two first and second microphones; an opposite-disturbance-sound-suppressing-control-signal generator that generates a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the two second and third microphones; and an opposite-disturbance-sound suppressing unit that compares powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal generated by the orthogonal-disturbance-sound-suppressing-signal generator and a spectrum of the control signal generated by the opposite-disturbance-sound-suppressing-control-signal generator for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performs band selection (minimum level band selection: BS-MIN) of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein the orthogonal-disturbance-sound-suppressing-signal generator performs a same process as that of the sound source separation system (invention of a type that the two 
microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and sum/difference are both acquired) using received sound signals of the two first and second microphones, and generates the same spectrum as the spectrum of the target sound obtained through separation by the sound source separation system (invention of a type that the two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and sum/difference are both acquired), as the spectrum of the orthogonal-disturbance-sound suppressing signal, and the opposite-disturbance-sound-suppressing-control-signal generator has a control target sound superior signal generator which acquires, in the time domain or the frequency domain, a difference between the received sound signal of the third microphone that has undergone a delayed process and the received sound signal of the second microphone.
According to such a sound source separation system of the invention (e.g., the case shown in
Only three microphones are used, so sound source separation is realized with a small number of microphones and the device can be miniaturized, thereby achieving the foregoing object.
<Invention of Three Microphones/Opposite Disturbance Sound Suppressing Type that a Process Including the Process of the Invention of Three Microphone/Two Combinations Type is Performed>
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a total of three first, second and third microphones disposed at respective vertices of a triangle; an orthogonal-disturbance-sound-suppressing-signal generator that generates an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the three first, second and third microphones; an opposite-disturbance-sound-suppressing-control-signal generator that generates a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the two first and second microphones; and an opposite-disturbance-sound suppressing unit that compares powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal generated by the orthogonal-disturbance-sound-suppressing-signal generator and a spectrum of the control signal generated by the opposite-disturbance-sound-suppressing-control-signal generator for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performs band selection (minimum level band selection: BS-MIN) of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein the orthogonal-disturbance-sound-suppressing-signal generator performs a same process as that of the sound source separation system (invention of three microphone/two 
combinations type) using received sound signals of the three first, second and third microphones, and generates the same spectrum as the spectrum of the target sound obtained through separation by the sound source separation system (invention of three microphone/two combinations type), as the spectrum of the orthogonal-disturbance-sound suppressing signal, and the opposite-disturbance-sound-suppressing-control-signal generator has a control target sound superior signal generator which acquires, in the time domain or the frequency domain, a difference between the received sound signal of the second microphone that has undergone a delayed process and the received sound signal of the first microphone.
According to such a sound source separation system of the invention (e.g., the case shown in
Only three microphones are used, so sound source separation is realized with a small number of microphones and the device can be miniaturized, thereby achieving the foregoing object.
<Invention of Four Microphones/Opposite Disturbance Sound Suppressing Type that a Process Including the Process of the Invention of Four Microphones/Two Combinations Type is Performed>
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a total of four microphones, respective two of which are disposed side by side in such a manner as to be spaced away from each other in a first direction and a second direction orthogonal to each other; an orthogonal-disturbance-sound-suppressing-signal generator that generates an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the four microphones; an opposite-disturbance-sound-suppressing-control-signal generator that generates a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the two microphones disposed side by side in the first direction in the four microphones; and an opposite-disturbance-sound suppressing unit that compares powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal generated by the orthogonal-disturbance-sound-suppressing-signal generator and a spectrum of the control signal generated by the opposite-disturbance-sound-suppressing-control-signal generator for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performs band selection (minimum level band selection: BS-MIN) of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein the 
orthogonal-disturbance-sound-suppressing-signal generator performs the same process as that of the sound source separation system (invention of four microphones/two combinations type) using received sound signals of the four microphones, and generates the same spectrum as the spectrum of the target sound obtained through separation by the sound source separation system (invention of four microphones/two combinations type), as the spectrum of the orthogonal-disturbance-sound suppressing signal, and the opposite-disturbance-sound-suppressing-control-signal generator has a control target sound superior signal generator which acquires, in the time domain or the frequency domain, a difference between the received sound signal of the microphone at the opposite disturbance sound side, of the two microphones disposed side by side in the first direction, that has undergone a delayed process and the received sound signal of the microphone at the target sound side.
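For the case where the difference is acquired in the time domain, the control target sound superior signal generator may be sketched as a simple delay-and-subtract operation. The sketch assumes that the delay equals the sound travel time between the two microphones expressed as a whole number of samples `d`; the function names and this choice of delay are assumptions for illustration only.

```python
def delay_samples(signal, d):
    """Delay a sampled signal by d whole samples, padding the start with zeros."""
    if d < 0:
        raise ValueError("delay must be non-negative")
    d = min(d, len(signal))
    return [0.0] * d + list(signal[:len(signal) - d])


def control_target_superior_signal(target_side, opposite_side, d):
    """Target-side signal minus the opposite-side signal delayed by d samples.

    A sound from the opposite direction reaches the opposite-side microphone
    first and the target-side microphone d samples later, so after the delay
    the two copies align and cancel in the difference. A sound from the
    target direction does not align and therefore remains.
    """
    if len(target_side) != len(opposite_side):
        raise ValueError("signals must have the same length")
    delayed = delay_samples(opposite_side, d)
    return [t - o for t, o in zip(target_side, delayed)]
```

The spectrum of the control signal used for the band selection would then be obtained by frequency analysis of the returned difference signal.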
According to such a sound source separation system of the invention (e.g., the case shown in
Only four microphones are used, so sound source separation is realized with a small number of microphones and the device can be miniaturized, thereby achieving the foregoing object.
<Invention of Four Microphones/Opposite Disturbance Sound Suppressing Type that a Process Including the Process of the Invention of Four Microphones/Three Combinations Type is Performed>
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a total of four first, second, third and fourth microphones disposed at respective vertices of a rectangle; an orthogonal-disturbance-sound-suppressing-signal generator that generates an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the four microphones; an opposite-disturbance-sound-suppressing-control-signal generator that generates a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the two first and second microphones; and an opposite-disturbance-sound suppressing unit that compares powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal generated by the orthogonal-disturbance-sound-suppressing-signal generator and a spectrum of the control signal generated by the opposite-disturbance-sound-suppressing-control-signal generator for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performs band selection (minimum level band selection: BS-MIN) of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein the orthogonal-disturbance-sound-suppressing-signal generator performs a same process as that of the sound source separation system (invention of four microphones/three combinations 
type) using received sound signals of the four microphones, and generates a same spectrum as a spectrum of the target sound obtained through separation by the sound source separation system (invention of four microphones/three combinations type), as the spectrum of the orthogonal-disturbance-sound suppressing signal, and the opposite-disturbance-sound-suppressing-control-signal generator has a control target sound superior signal generator which acquires a difference between a received sound signal of the second microphone undergone a delayed process and the received sound signal of the first microphone on a time domain or a frequency domain.
According to such a sound source separation system of the invention (e.g., the case shown in
The number of microphones used is four, and sound source separation is realized with this small number of microphones, so that the device can be miniaturized, thereby achieving the foregoing object.
<Invention of Three Microphones/Opposite Disturbance Sound Suppressing Type that a Process Including the Process of the Invention of Three Microphones/Three Combinations Type is Performed>
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a total of three first, second and third microphones disposed at respective vertices of a triangle; an orthogonal-disturbance-sound-suppressing-signal generator that generates an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the three first, second and third microphones; an opposite-disturbance-sound-suppressing-control-signal generator that generates a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the three first, second and third microphones; and an opposite-disturbance-sound suppressing unit that compares powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal generated by the orthogonal-disturbance-sound-suppressing-signal generator and a spectrum of the control signal generated by the opposite-disturbance-sound-suppressing-control-signal generator for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performs band selection (minimum level band selection: BS-MIN) of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein the orthogonal-disturbance-sound-suppressing-signal generator performs a same process as that of the sound source separation system (invention of three 
microphones/three combinations type) using received sound signals of the three first, second and third microphones, and generates a same spectrum as a spectrum of the target sound obtained through separation by the sound source separation system (invention of three microphones/three combinations type), as the spectrum of the orthogonal-disturbance-sound suppressing signal, and the opposite-disturbance-sound-suppressing-control-signal generator has: a first control target-sound-superior-signal generator which acquires a difference between the received sound signal of the second microphone undergone a delayed process and the received sound signal of the first microphone on a time domain or a frequency domain; a second control target-sound-superior-signal generator which acquires a difference between the received sound signal of the third microphone undergone a delayed process and the received sound signal of the first microphone on a time domain or a frequency domain; and a control signal integration unit that performs a spectrum integration process of comparing powers for each frequency band, using a spectrum of a first control target sound superior signal generated by the first control target-sound-superior-signal generator or obtained by a successive frequency analysis, and a spectrum of a second control target sound superior signal generated by the second control target-sound-superior-signal generator or obtained by a successive frequency analysis, and of assigning inferior power to a spectrum of a control target sound superior signal.
According to such a sound source separation system of the invention (e.g., the case shown in
The number of microphones used is three, and sound source separation is realized with this small number of microphones, so that the device can be miniaturized, thereby achieving the foregoing object.
Further, the following structure (e.g., the case shown in
<Invention of Performing Multidimensional Band Selection>
According to the invention, there is provided a sound source separation system that separates a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from and comprises: a plurality of different-directional-signal-group generators each generating two or more combinations of spectra of a plurality of signals each of which has a different directivity, using received sound signals of a plurality of microphones; and a sensitive region formation unit which determines whether or not a relationship between powers of the spectra in a combination simultaneously satisfies a plurality of conditions each defined for a combination, for each frequency band, using the two or more combinations of the spectra of the plurality of signals generated by the respective different-directional-signal-group generators, and performs multidimensional band selection (BS-MultiD) of assigning power of a spectrum selected beforehand to a spectrum of the target sound to be separated, for a frequency band where the plurality of conditions are simultaneously satisfied.
According to such a sound source separation system of the invention (e.g., the case shown in
Sound source separation is realized with a few microphones, so that miniaturization of the device is enabled, thereby achieving the foregoing object.
In the foregoing sound source separation system (invention of performing multidimensional band selection), each different-directional-signal-group generator may generate a spectrum of a target sound superior signal and a spectrum of a target sound inferior signal using the received sound signals of the plurality of microphones, and the sensitive region formation unit may set a condition for each combination as a condition that power of the spectrum of the target sound superior signal is larger than power of the spectrum of the target sound inferior signal, and determine whether or not those conditions are simultaneously satisfied for each frequency band.
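The multidimensional band selection can be sketched as follows, under stated assumptions. Each combination is represented as a pair of per-band power lists (target sound superior, target sound inferior), the condition for each combination is that the superior power exceeds the inferior power, and a band that fails any condition is set to a floor value of zero. The zero floor, the function name `bs_multi_d` and the data layout are assumptions made for illustration only.

```python
def bs_multi_d(pairs, selected_power, floor=0.0):
    """Multidimensional band selection over several spectrum combinations.

    pairs: list of (superior_power, inferior_power), one per combination,
           each a list of per-band powers.
    selected_power: per-band power of the spectrum chosen beforehand.
    floor: value used for bands where a condition fails (assumption).
    """
    out = []
    for k in range(len(selected_power)):
        # The band passes only if every combination satisfies its condition.
        passes = all(sup[k] > inf[k] for sup, inf in pairs)
        out.append(selected_power[k] if passes else floor)
    return out


pairs = [
    ([3.0, 1.0, 5.0], [1.0, 2.0, 4.0]),  # first combination
    ([2.0, 4.0, 1.0], [1.0, 1.0, 3.0]),  # second combination
]
target_power = bs_multi_d(pairs, selected_power=[3.0, 1.0, 5.0])
```

With two pairs this corresponds to a two-dimensional selection, and with three pairs to a three-dimensional one; only the length of `pairs` changes.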
<Invention of Performing Two-Dimensional Band Selection>
More specifically, as the invention of performing two-dimensional band selection, there may be provided the sound source separation system having a total of three first, second and third microphones disposed at respective vertices of a triangle, and wherein a first different-directional-signal-group generator comprises: a first target sound superior signal generator which acquires a difference between a received sound signal of the first microphone and a received sound signal of the second microphone undergone a delayed process on a time domain or a frequency domain and generates a first target sound superior signal; a second target sound superior signal generator which acquires a difference between a received sound signal of the second microphone and a received sound signal of the first microphone undergone a delayed process on a time domain or a frequency domain, and generates a second target sound superior signal; a target sound inferior signal generator which acquires a difference between received sound signals of the first and second microphones on a time domain or a frequency domain; and an integration unit which compares powers for each frequency band using a spectrum of the first target sound superior signal generated by the first target sound superior signal generator or obtained by a successive frequency analysis and a spectrum of the second target sound superior signal generated by the second target sound superior signal generator or obtained by a successive frequency analysis, and performs a spectrum integration process of assigning inferior power to a spectrum of a target sound superior signal, a second different-directional-signal-group generator comprises: a first target sound superior signal generator which acquires a difference between a received sound signal of the third microphone and a received sound signal of the second microphone undergone a delayed process on a time domain or a frequency domain and generates a first target sound superior 
signal; a second target sound superior signal generator which acquires a difference between a received sound signal of the second microphone and a received sound signal of the third microphone undergone a delayed process on a time domain or a frequency domain, and generates a second target sound superior signal; a target sound inferior signal generator which acquires a difference between received sound signals of the second and third microphones on a time domain or a frequency domain; and an integration unit which compares powers for each frequency band using a spectrum of the first target sound superior signal generated by the first target sound superior signal generator or obtained by a successive frequency analysis and a spectrum of the second target sound superior signal generated by the second target sound superior signal generator or obtained by a successive frequency analysis, and performs a spectrum integration process of assigning inferior power to a spectrum of a target sound superior signal, and the sensitive region formation unit performs two-dimensional-band selection of assigning power of a spectrum of a target sound superior signal generated by either one of the first and second different-directional-signal-group generators to a spectrum of the target sound to be separated (e.g., the case shown in
<Invention of Performing Three-Dimensional Band Selection>
As the invention of performing three-dimensional band selection, there may be provided the sound source separation system having a total of three first, second and third microphones disposed at respective vertices of a triangle, and wherein a first different-directional-signal-group generator comprises: a first target sound superior signal generator which acquires a difference between a received sound signal of the first microphone and a received sound signal of the second microphone undergone a delayed process on a time domain or a frequency domain and generates a first target sound superior signal; a second target sound superior signal generator which acquires a difference between a received sound signal of the second microphone and a received sound signal of the first microphone undergone a delayed process on a time domain or a frequency domain, and generates a second target sound superior signal; a target sound inferior signal generator which acquires a difference between received sound signals of the first and second microphones on a time domain or a frequency domain; and an integration unit which compares powers for each frequency band using a spectrum of the first target sound superior signal generated by the first target sound superior signal generator or obtained by a successive frequency analysis and a spectrum of the second target sound superior signal generated by the second target sound superior signal generator or obtained by a successive frequency analysis, and performs a spectrum integration process of assigning inferior power to a spectrum of a target sound superior signal, a second different-directional-signal-group generator comprises: a first target sound superior signal generator which acquires a difference between a received sound signal of the third microphone and a received sound signal of the second microphone undergone a delayed process on a time domain or a frequency domain and generates a first target sound superior signal; a second 
target sound superior signal generator which acquires a difference between a received sound signal of the second microphone and a received sound signal of the third microphone undergone a delayed process on a time domain or a frequency domain, and generates a second target sound superior signal; a target sound inferior signal generator which acquires a difference between received sound signals of the second and third microphones on a time domain or a frequency domain; and an integration unit which compares powers for each frequency band using a spectrum of the first target sound superior signal generated by the first target sound superior signal generator or obtained by a successive frequency analysis and a spectrum of the second target sound superior signal generated by the second target sound superior signal generator or obtained by a successive frequency analysis, and performs a spectrum integration process of assigning inferior power to a spectrum of a target sound superior signal, and a third different-directional-signal-group generator comprises: a first target sound superior signal generator which acquires a difference between a received sound signal of the third microphone and a received sound signal of the first microphone undergone a delayed process on a time domain or a frequency domain and generates a first target sound superior signal; a second target sound superior signal generator which acquires a difference between a received sound signal of the first microphone and a received sound signal of the third microphone undergone a delayed process on a time domain or a frequency domain, and generates a second target sound superior signal; a target sound inferior signal generator which acquires a difference between received sound signals of the first and third microphones on a time domain or a frequency domain; and an integration unit which compares powers for each frequency band using a spectrum of the first target sound superior signal generated by the 
first target sound superior signal generator or obtained by a successive frequency analysis and a spectrum of the second target sound superior signal generated by the second target sound superior signal generator or obtained by a successive frequency analysis, and performs a spectrum integration process of assigning inferior power to a spectrum of a target sound superior signal, and the sensitive region formation unit performs three-dimensional-band selection of assigning power of a spectrum of a target sound superior signal generated by either one of the first, second and third different-directional-signal-group generators to a spectrum of the target sound to be separated (e.g., the case shown in
<Invention of Applying a Delay which is an Integer Multiple of a Sampling Period>
In the foregoing sound source separation system, it is desirable that, when a difference is acquired between one signal of a pair of two signals that has undergone a delayed process and the other signal, the delayed process should apply a delay which is an integer multiple of a sampling period on a time domain or a frequency domain.
In a case where a delay which is an integer multiple of the sampling period is applied, a delay operation through a digital filter requiring a large amount of computation becomes unnecessary, and a process of giving a large delay to both of the two signals to be paired with each other also becomes unnecessary.
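As a minimal time-domain sketch of this point, a delay of a whole number of samples reduces to shifting the sample sequence, so no interpolating filter is needed. The helper names `delay_samples` and `delayed_difference`, and the choice to zero-pad the start of the delayed signal, are assumptions for illustration.

```python
def delay_samples(x, n):
    """Delay x by n whole samples (n >= 0), zero-padding the start.

    The output has the same length as x; samples shifted past the end
    are dropped.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    pad = min(n, len(x))
    return [0.0] * pad + list(x[:len(x) - pad])


def delayed_difference(a, b, n):
    """Return a[t] - b[t - n] for each sample index t."""
    b_delayed = delay_samples(b, n)
    return [ai - bi for ai, bi in zip(a, b_delayed)]


diff = delayed_difference([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0], 1)
```

Only one of the two signals is shifted here, which is the simplification the paragraph above refers to.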
<Common Feature>
According to the foregoing sound source separation system, the microphone may be a non-directional or an approximately non-directional microphone.
<<Invention of Sound Source Separation Method>>
As sound source separation methods which realize the foregoing sound source separation systems of the invention, the following sound source separation methods of the invention are provided.
<Invention of Two Microphones Type> Invention of a Type that Two Microphones are Used
That is, according to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing two microphones in such a manner as to be spaced away from each other; performing a linear combination process for emphasizing the target sound using received sound signals of the two microphones on a time domain or a frequency domain to generate at least one target sound superior signal; performing a linear combination process for suppressing the target sound using the received sound signals of the two microphones on a time domain or a frequency domain to generate at least one target sound inferior signal to be paired with the target sound superior signal; and separating the target sound and the disturbance sound from each other using a spectrum of the target sound superior signal and a spectrum of the target sound inferior signal.
According to such a sound source separation method of the invention, the operation and effects of the foregoing sound source separation system of the invention are obtained directly, thus achieving the foregoing object.
<Invention of a Type that Two Microphones are Disposed in Parallel with a Direction in which the Target Sound Comes from> Invention of a Type that Two Microphones are Disposed in the Direction in which the Target Sound Comes from or in an Approximately Same Direction as that Direction
Specifically, the foregoing sound source separation method may further comprise disposing the two microphones side by side in the direction in which the target sound comes from or an approximately same direction as that direction, acquiring a difference between a received sound signal of one microphone disposed near a sound source of the target sound in the two microphones and a received sound signal of an other microphone disposed away from the sound source of the target sound on a time domain or a frequency domain when generating the target sound superior signal; and acquiring a difference between the received sound signal of the one microphone undergone a delayed process and the received sound signal of the other microphone on a time domain or a frequency domain when generating the target sound inferior signal.
In a case where the two microphones are disposed in the direction in which the target sound comes from or in an approximately same direction as that direction, when the target sound and the disturbance sound are separated from each other, powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal may be compared for each frequency band, and band selection of assigning larger powers at the individual frequency bands to a spectrum obtained by separation may be performed.
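The band selection in the preceding paragraph can be sketched as below. The sketch assumes per-band power lists of equal length, assigns a band to the target sound when the target sound superior signal has the larger power, and sets the band to zero otherwise. Zeroing the unselected bands and giving ties to the disturbance side are assumptions, and the function name `band_select_max` is hypothetical.

```python
def band_select_max(sup_power, inf_power):
    """Assign each band to whichever spectrum has the larger power.

    Returns (target_power, disturbance_power). A band appears in exactly
    one of the two outputs; the other output holds 0.0 for that band.
    """
    target, disturbance = [], []
    for s, i in zip(sup_power, inf_power):
        if s > i:
            target.append(s)
            disturbance.append(0.0)
        else:
            # Ties are given to the disturbance side (assumption).
            target.append(0.0)
            disturbance.append(i)
    return target, disturbance


target_power, disturbance_power = band_select_max([4.0, 1.0, 2.0], [1.0, 3.0, 2.0])
```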
In a case where the two microphones are disposed in the direction in which the target sound comes from or in an approximately same direction as that direction, when the target sound and the disturbance sound are separated from each other, spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal at a same frequency band may be performed.
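The spectral subtraction in the preceding paragraph can be written per band as P_target = P_superior - alpha * P_inferior. The sketch below assumes per-band power lists and clips negative results to a floor of zero; the clipping, the default coefficient and the function name `spectral_subtract` are assumptions not stated in the text.

```python
def spectral_subtract(sup_power, inf_power, alpha=1.0, floor=0.0):
    """Subtract alpha times the inferior power from the superior power.

    Results below `floor` are clipped to `floor` so that no band ends up
    with negative power (assumption).
    """
    return [max(s - alpha * i, floor) for s, i in zip(sup_power, inf_power)]


target_power = spectral_subtract([4.0, 1.0, 2.0], [1.0, 3.0, 2.0], alpha=0.5)
```

A larger `alpha` suppresses the disturbance sound more strongly at the cost of removing more of the target sound in bands where the two overlap.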
In a case where the two microphones are disposed in the direction in which the target sound comes from or in an approximately same direction as that direction, to change over a target sound to be separated to a target sound in a normal mode and a target sound in a changeover mode coming from a direction opposite to the normal mode target sound, it is desirable that the one microphone should be disposed near a sound source of the normal mode target sound and the other microphone should be disposed away from the sound source of the normal mode target sound in the normal mode, the other microphone should be disposed near a sound source of the changeover mode target sound and the one microphone should be disposed away from the sound source of the changeover mode target sound in the changeover mode, when the target sound inferior signal is generated, a difference between the received sound signal of the one microphone undergone a delayed process and the received sound signal of the other microphone should be acquired on a time domain or a frequency domain to generate a first target sound inferior signal in the normal mode, a difference between the received sound signal of the other microphone undergone a delayed process and the received sound signal of the one microphone should be acquired on a time domain or a frequency domain to generate a second target sound inferior signal in the changeover mode, and when the target sound and the disturbance sound are separated from each other, as the target sound inferior signal, the first target sound inferior signal should be used in the normal mode and the second target sound inferior signal should be used in the changeover mode.
In a case where the two microphones are disposed in the direction in which the target sound comes from or in an approximately same direction as that direction, when the target sound inferior signal is generated, a time delay which is the same as or approximately the same as a sound wave propagation time between the two microphones may be applied, on a time domain or a frequency domain, to the received sound signal of the microphone subject to the delayed process.
In a case where the two microphones are disposed in the direction in which the target sound comes from or in an approximately same direction as that direction, when the target sound inferior signal is generated, a time delay which is shorter than a sound wave propagation time between the two microphones may be applied, on a time domain or a frequency domain, to the received sound signal of the microphone subject to the delayed process.
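The sound wave propagation time between the two microphones is the microphone spacing divided by the speed of sound, and multiplying it by the sampling rate expresses it in samples. The sketch below assumes a speed of sound of 343 m/s, and the spacing and sampling rate in the usage lines are illustrative values only. Rounding to the nearest integer gives an approximately equal delay, while rounding down gives a delay that does not exceed the propagation time.

```python
import math


def propagation_delay_samples(spacing_m, sample_rate_hz, speed_of_sound=343.0):
    """Propagation time between two microphones, expressed in samples."""
    return spacing_m / speed_of_sound * sample_rate_hz


def integer_delay(spacing_m, sample_rate_hz, shorter=False, speed_of_sound=343.0):
    """Whole-sample delay derived from the propagation time.

    shorter=False: nearest integer (approximately the propagation time).
    shorter=True:  rounded down, so it never exceeds the propagation time.
    """
    exact = propagation_delay_samples(spacing_m, sample_rate_hz, speed_of_sound)
    return math.floor(exact) if shorter else round(exact)


exact = propagation_delay_samples(0.02, 48000)       # about 2.8 samples
nearest = integer_delay(0.02, 48000)                 # nearest whole sample
reduced = integer_delay(0.02, 48000, shorter=True)   # rounded down
```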
Further, in a case where the two microphones are disposed in the direction in which the target sound comes from or in an approximately same direction as that direction, the two microphones may be respectively provided at a corresponding portion of a front face of a portable device at which an operation unit and/or a screen display unit is provided and a corresponding portion of a rear face opposite thereto.
In a case where the two microphones are provided at the front and rear of the portable device one by one, the portable device may be a foldable cellular phone which is folded and closed when not in use and opened when in use, and a clearance between the two disposed microphones may change in accordance with an opening/closing operation of the cellular phone, and a clearance when the cellular phone is opened may be larger than a clearance when the cellular phone is closed.
Further, in a case where the two microphones are provided at the front and rear of the portable device one by one, the two microphones may be provided at end portions of both sides of a rotation support member attached in such a manner as to be rotatable around an axis parallel to the front/rear face of the cellular phone, and the rotation support member may be retained in a state parallel to or approximately parallel to the front/rear surface of the cellular phone when not in use, and may become orthogonal or approximately orthogonal to the front/rear face of the cellular phone when in use.
<Invention of a Type that the Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from And Sum/Difference are Both Acquired> Invention of a Type that the Two Microphones are Disposed Side by Side in a Direction Orthogonal to or Approximately Orthogonal to the Direction in which the Target Sound Comes From, and a Sum and Difference of Received Sound Signals are Used
In addition to disposing the two microphones side by side in the direction in which the target sound comes from or in an approximately same direction as that direction, the following structure may be employed. That is, in the foregoing sound source separation method, the two microphones may be disposed side by side in a direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from, when the target sound superior signal is generated, a sum of the received sound signals of the two microphones may be acquired on a time domain or a frequency domain, and when the target sound inferior signal is generated, a difference between the received sound signals of the two microphones may be acquired on a time domain or a frequency domain.
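For two microphones placed orthogonally to the target direction, the target sound reaches both at the same time, so the sum reinforces it and the difference cancels it. The time-domain sketch below illustrates this; the function name `sum_and_difference` and the sample values are hypothetical.

```python
def sum_and_difference(x1, x2):
    """Return (superior, inferior) signals from two microphone signals.

    superior: sample-wise sum, which reinforces sound arriving at both
              microphones simultaneously.
    inferior: sample-wise difference, which cancels such sound.
    """
    superior = [a + b for a, b in zip(x1, x2)]
    inferior = [a - b for a, b in zip(x1, x2)]
    return superior, inferior


# A target arriving simultaneously at both microphones gives identical signals.
target = [0.5, -1.0, 0.25]
superior, inferior = sum_and_difference(target, target)
```

A disturbance sound arriving from the side reaches the two microphones at different times, so it is not cancelled in the difference signal.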
In a case where the two microphones are disposed side by side in the direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from and a sum of the received sound signals of the two microphones is acquired to generate the target sound superior signal, when the target sound and the disturbance sound are separated from each other, at least one spectrum in the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal may be multiplied by a coefficient depending on a frequency, powers of the spectra may be compared at a same frequency band, and band selection of assigning larger powers at the individual frequency bands to a spectrum obtained by separation may be performed.
In a case where the two microphones are disposed side by side in the direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from and a sum of the received sound signals of the two microphones is acquired to generate the target sound superior signal, when the target sound and the disturbance sound are separated from each other, spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal may be performed at a same frequency band.
<Invention of a Type that Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from and a Difference is Acquired> Invention of a Type that the Two Microphones are Disposed Side by Side in a Direction Orthogonal to or Approximately Orthogonal to the Direction in which the Target Sound Comes from and a Difference Between the Received Sound Signals is Used but a Sum Thereof is Not Used
In addition to disposing the two microphones side by side in the direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from and acquiring a sum of the received sound signals of the two microphones to generate the target sound superior signal, the following structure may be employed. That is, in the foregoing sound source separation method, the two microphones may be disposed side by side in a direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from, when the target sound superior signal is generated, a difference between the received sound signal of the one microphone in the two microphones and the received sound signal of the other microphone undergone a delayed process may be acquired on a time domain or a frequency domain to generate a first target sound superior signal, and a difference between the received sound signal of the other microphone and the received sound signal of the one microphone undergone a delayed process may be acquired on a time domain or a frequency domain to generate a second target sound superior signal, and when the target sound inferior signal is generated, a difference between the received sound signals of the two microphones may be acquired on a time domain or a frequency domain.
In a case where the two microphones are disposed side by side in the direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from and the two first and second target sound superior signals are generated, when the target sound and the disturbance sound are separated from each other, powers at a same frequency band between the spectrum of the first target sound superior signal and the spectrum of the target sound inferior signal may be compared for each frequency band, and band selection of assigning larger powers at the individual frequency bands to a spectrum obtained by separation may be performed to separate one sound including the target sound, powers at a same frequency band between the spectrum of the second target sound superior signal and the spectrum of the target sound inferior signal may be compared for each frequency band, and band selection of assigning larger powers at the individual frequency bands to a spectrum obtained by separation may be performed to separate an other sound including the target sound, and a spectrum integration process of adding those powers of the spectra for each frequency band or comparing the powers for each frequency band and assigning inferior power to a spectrum of the target sound may be performed, using a spectrum of one sound including the target sound and a spectrum of an other sound including the target sound.
In a case where the two microphones are disposed side by side in the direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from and the two first and second target sound superior signals are generated, when the target sound and the disturbance sound are separated from each other, spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the target sound inferior signal by a coefficient, from power of the spectrum of the first target sound superior signal may be performed at a same frequency band to separate one sound including the target sound, spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the target sound inferior signal by a coefficient, from power of the spectrum of the second target sound superior signal of the same frequency band may be performed to separate an other sound including the target sound, and a spectrum integration process of adding those powers of the spectra for each frequency band or comparing the powers for each frequency band and assigning inferior power to a spectrum of the target sound may be performed, using a spectrum of one sound including the target sound and a spectrum of an other sound including the target sound.
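The spectrum integration process named in the two preceding paragraphs has two variants, adding the powers for each frequency band or keeping the inferior (smaller) power for each frequency band. The sketch below assumes per-band power lists for the two separated sounds; the function name `integrate_spectra` and the `mode` argument are hypothetical.

```python
def integrate_spectra(power_a, power_b, mode="min"):
    """Combine two per-band power spectra into one target spectrum.

    mode="add": sum of the two powers in each band.
    mode="min": smaller of the two powers in each band.
    """
    if mode == "add":
        return [a + b for a, b in zip(power_a, power_b)]
    if mode == "min":
        return [min(a, b) for a, b in zip(power_a, power_b)]
    raise ValueError("mode must be 'add' or 'min'")


added = integrate_spectra([4.0, 0.0, 2.0], [1.0, 3.0, 2.0], mode="add")
smallest = integrate_spectra([4.0, 0.0, 2.0], [1.0, 3.0, 2.0], mode="min")
```

The minimum variant keeps a band only to the extent that both separated sounds contain it, whereas the additive variant keeps anything that survived in either one.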
<Invention of Three Microphones/Two Combinations Type> Invention of a Type that Two Combinations of Microphones are Made Using Three Microphones
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of three first, second and third microphones at respective vertices of a triangle; performing a linear combination process for emphasizing the target sound on a time domain or a frequency domain, using received sound signals of the two first and second microphones to generate at least one target sound superior signal; performing a linear combination process for suppressing the target sound on a time domain or a frequency domain, using received sound signals of the two first and third microphones to generate at least a target sound inferior signal to be paired with the target sound superior signal; and separating the target sound and the disturbance sound from each other using a spectrum of the target sound superior signal and a spectrum of the target sound inferior signal.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
In the foregoing sound source separation method, it is desirable that the first and second microphones should be disposed side by side in a direction in which the target sound comes from or in an approximately same direction as that direction, the first and third microphones should be disposed side by side in a direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from, when the target sound superior signal is generated, a difference between the received sound signal of the first microphone and the received sound signal of the second microphone should be acquired on a time domain or a frequency domain, and when the target sound inferior signal is generated, a difference between the received sound signal of the first microphone and the received sound signal of the third microphone should be acquired on a time domain or a frequency domain.
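The effect of the two differences can be sketched on the time domain as follows. This is a simplified illustration, not the method as implemented: the waveform values, the one-sample lag at the second microphone and the helper names `delay` and `difference` are all assumptions. A target arriving along the line of the first and second microphones reaches the first and third microphones at the same time, so the second difference cancels it.

```python
def delay(signal, samples):
    """Delay a signal by a whole number of samples, padding with zeros."""
    return [0.0] * samples + signal[:len(signal) - samples]


def difference(a, b):
    """Sample-by-sample difference of two received sound signals."""
    return [x - y for x, y in zip(a, b)]


# Hypothetical target waveform arriving along the mic1-mic2 line.
target = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]

mic1 = target              # reference microphone
mic2 = delay(target, 1)    # further along the target direction (assumed 1-sample lag)
mic3 = target              # beside mic1, so the target arrives simultaneously

superior = difference(mic1, mic2)   # the target survives this difference
inferior = difference(mic1, mic3)   # the target cancels in this difference
```

A disturbance sound coming from the side would reach the first and third microphones at different times and would therefore remain in `inferior`, which is what makes that signal usable as a reference for the disturbance.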
According to the foregoing sound source separation method, when the target sound and the disturbance sound are separated from each other, powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal may be compared for each frequency band, and band selection of assigning larger powers at the individual frequency bands to a spectrum obtained by separation may be performed.
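The band selection can be sketched as below. The text only states that the larger power in each band is assigned to the separated spectrum; setting the remaining bands to zero, as well as the function name and sample values, is an assumption of this sketch.

```python
def band_select(superior_power, inferior_power):
    """Keep the superior power in bands where it is the larger of the two.

    Zeroing the bands where the inferior signal dominates is an assumption.
    """
    return [s if s > i else 0.0
            for s, i in zip(superior_power, inferior_power)]


# Hypothetical power spectra with four frequency bands.
superior = [5.0, 0.5, 3.0, 2.0]
inferior = [1.0, 4.0, 3.5, 0.1]

separated = band_select(superior, inferior)
```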
Further, according to the foregoing sound source separation method, when the target sound and the disturbance sound are separated from each other, spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal may be performed at a same frequency band.
<Invention of Four Microphones/Two Combinations Type> Invention of a Type that Two Combinations of Microphones are Made Using Four Microphones
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of four microphones, two each being disposed side by side so as to be spaced apart in a first direction and in a second direction which intersect each other; performing a linear combination process for emphasizing the target sound on a time domain or a frequency domain using received sound signals of the two microphones disposed side by side in the first direction in the four microphones to generate at least one target sound superior signal; performing a linear combination process for suppressing the target sound on a time domain or a frequency domain using received sound signals of the two microphones disposed side by side in the second direction in the four microphones to generate at least one target sound inferior signal; and separating the target sound and the disturbance sound from each other using a spectrum of the target sound superior signal and a spectrum of the target sound inferior signal.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
In the foregoing sound source separation method, it is desirable that the first direction should be the direction in which the target sound comes from or an approximately same direction as that direction, the second direction should be orthogonal to or approximately orthogonal to the direction in which the target sound comes from, when the target sound superior signal is generated, a difference between the received sound signals of the two microphones disposed side by side in the first direction should be acquired on a time domain or a frequency domain, and when the target sound inferior signal is generated, a difference between the received sound signals of the two microphones disposed side by side in the second direction should be acquired on a time domain or a frequency domain.
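Because the difference is a linear operation, acquiring it on the time domain and then transforming gives the same spectrum as transforming each received sound signal and acquiring the difference per frequency band. The short Python sketch below checks this numerically; the naive `dft` helper and the sample values are assumptions introduced for illustration, and a practical system would use a windowed FFT instead.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform, adequate for a short illustration."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def power(spectrum):
    """Power of each frequency band."""
    return [abs(c) ** 2 for c in spectrum]


# Hypothetical received signals of the pair disposed in the first direction.
mic_a = [0.0, 1.0, 0.0, -1.0]
mic_b = [1.0, 0.0, -1.0, 0.0]

# Difference acquired on the time domain, then transformed.
time_first = dft([a - b for a, b in zip(mic_a, mic_b)])

# Each signal transformed first, then the difference acquired per band.
freq_first = [a - b for a, b in zip(dft(mic_a), dft(mic_b))]

# Largest discrepancy between the two routes (rounding error only).
max_gap = max(abs(x - y) for x, y in zip(time_first, freq_first))
superior_power = power(freq_first)
```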
According to the foregoing sound source separation method, when the target sound and the disturbance sound are separated from each other, powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal may be compared for each frequency band, and band selection of assigning larger powers at the individual frequency bands to a spectrum obtained by separation may be performed.
Further, according to the foregoing sound source separation method, when the target sound and the disturbance sound are separated from each other, spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal may be performed at a same frequency band.
<Invention of Four Microphones/Three Combinations Type> Invention of a Type that Three Combinations of Microphones are Made Using Four Microphones
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of four first, second, third and fourth microphones at respective vertices of a rectangle; performing a linear combination process for emphasizing the target sound on a time domain or a frequency domain using received sound signals of the two first and second microphones to generate a target sound superior signal; performing a linear combination process for suppressing the target sound on a time domain or a frequency domain using received sound signals of the two first and third microphones to generate a first target sound inferior signal; performing a linear combination process for suppressing the target sound on a time domain or a frequency domain using received sound signals of the two first and fourth microphones to generate a second target sound inferior signal; separating one sound including the target sound, using a spectrum of the target sound superior signal and a spectrum of the first target sound inferior signal; separating an other sound including the target sound, using the spectrum of the target sound superior signal and a spectrum of the second target sound inferior signal; and performing a spectrum integration process of adding those powers of the spectra for each frequency band or comparing the powers for each frequency band and assigning inferior power to a spectrum of the target sound, using a spectrum of the one sound including the target sound and a spectrum of the other sound including the target sound.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
In the foregoing sound source separation method, it is desirable that the first and second microphones should be disposed side by side in a direction in which the target sound comes from or in an approximately same direction as that direction, the third microphone should be disposed at one end of a line interconnecting the first microphone and the second microphone, the fourth microphone should be disposed at an other end of the line interconnecting the first microphone and the second microphone, when the target sound superior signal is generated, a difference between received sound signals of the first and second microphones should be acquired on a time domain or a frequency domain, when the first target sound inferior signal is generated, a difference between received sound signals of the first and third microphones should be acquired on a time domain or a frequency domain, and when the second target sound inferior signal is generated, a difference between received sound signals of the first and fourth microphones should be acquired on a time domain or a frequency domain.
According to the foregoing sound source separation method, when the one sound including the target sound is separated, powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the first target sound inferior signal may be compared for each frequency band, and band selection of assigning larger powers at the individual frequency bands to a spectrum obtained by separation may be performed, and when the other sound including the target sound is separated, powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the second target sound inferior signal may be compared for each frequency band, and band selection of assigning larger powers at the individual frequency bands to a spectrum obtained by separation may be performed.
In the foregoing sound source separation method, when the one sound including the target sound is separated, spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the first target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal may be performed at a same frequency band, and when the other sound including the target sound is separated, spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the second target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal may be performed at a same frequency band.
<Invention of Three Microphones/Three Combinations Type> Invention of a Type that Three Combinations of Microphones are Made Using Three Microphones
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of three first, second and third microphones at respective vertices of a triangle; performing a linear combination process for emphasizing the target sound on a time domain or a frequency domain, using received sound signals of the three microphones to generate a target sound superior signal; performing a linear combination process for suppressing the target sound on a time domain or a frequency domain, using received sound signals of the two first and second microphones to generate a first target sound inferior signal to be paired with the target sound superior signal; performing a linear combination process for suppressing the target sound on a time domain or a frequency domain, using received sound signals of the two first and third microphones to generate a second target sound inferior signal to be paired with the target sound superior signal; separating one sound including the target sound, using a spectrum of the target sound superior signal and a spectrum of the first target sound inferior signal; separating an other sound including the target sound, using the spectrum of the target sound superior signal and a spectrum of the second target sound inferior signal; and performing a spectrum integration process of adding those powers of the spectra for each frequency band or comparing the powers for each frequency band and assigning inferior power to a spectrum of the target sound, using a spectrum of the one sound including the target sound and a spectrum of the other sound including the target sound.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
According to the foregoing sound source separation method, it is desirable that the first and second microphones should be disposed side by side in a direction inclined with respect to a direction in which the target sound comes from, the first and third microphones should be disposed side by side in a direction inclined in an opposite direction to the inclined direction of the first and second microphones, when the target sound superior signal is generated, a difference between the received sound signal of the first microphone and a sum of the received sound signals of the second and third microphones, each multiplied by a proportionality coefficient that may be the same or different, should be acquired on a time domain or a frequency domain, when the first target sound inferior signal is generated, a difference between the received sound signals of the first and second microphones should be acquired on a time domain or a frequency domain, and when the second target sound inferior signal is generated, a difference between the received sound signals of the first and third microphones should be acquired on a time domain or a frequency domain.
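The three linear combinations of this type can be wired together as in the following sketch. It shows only how the signals are combined: the sample values are arbitrary, and the helper names and the equal coefficients of 0.5 are assumptions, since the text leaves the proportionality coefficients open.

```python
def weighted_sum(a, b, coeff_a, coeff_b):
    """Sum of two signals, each multiplied by its own coefficient."""
    return [coeff_a * x + coeff_b * y for x, y in zip(a, b)]


def difference(a, b):
    """Sample-by-sample difference of two received sound signals."""
    return [x - y for x, y in zip(a, b)]


# Hypothetical received sound signals of the three microphones.
mic1 = [1.0, 2.0, 3.0, 4.0]
mic2 = [0.5, 1.0, 2.5, 3.0]
mic3 = [1.5, 2.0, 2.5, 5.0]

# Target sound superior signal: mic1 minus a weighted sum of mic2 and mic3.
superior = difference(mic1, weighted_sum(mic2, mic3, 0.5, 0.5))

# Target sound inferior signals: plain pairwise differences.
first_inferior = difference(mic1, mic2)
second_inferior = difference(mic1, mic3)
```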
In the foregoing sound source separation method, when the one sound including the target sound is separated, powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the first target sound inferior signal may be compared for each frequency band, and band selection of assigning larger powers at the individual frequency bands to a spectrum obtained by separation may be performed, and when the other sound including the target sound is separated, powers at a same frequency band between the spectrum of the target sound superior signal and the spectrum of the second target sound inferior signal may be compared for each frequency band, and band selection of assigning larger powers at the individual frequency bands to a spectrum obtained by separation may be performed.
Further, according to the foregoing sound source separation method, when the one sound including the target sound is separated, spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the first target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal may be performed at a same frequency band, and when the other sound including the target sound is separated, spectral subtraction of subtracting a value, obtained by multiplying power of the spectrum of the second target sound inferior signal by a coefficient, from power of the spectrum of the target sound superior signal may be performed at a same frequency band.
<Invention of Two Sensitive Regions Integration Type that Three Microphones are Disposed on a Plane Orthogonal to the Direction in which the Target Sound Comes from> Invention of a Type that Three Microphones are Disposed on a Plane Orthogonal to or Approximately Orthogonal to the Direction in which the Target Sound Comes from, and Two Sensitive Regions are Integrated
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of three first, second and third microphones at respective vertices of a triangle on a plane orthogonal to or approximately orthogonal to a direction in which the target sound comes from; generating a spectrum of a first sensitive region formation signal which forms a first sensitive region along a plane orthogonal to a line interconnecting the first and second microphones, using received sound signals of those two microphones; generating a spectrum of a second sensitive region formation signal which forms a second sensitive region along a plane orthogonal to a line interconnecting the second and third microphones, using received sound signals of those two microphones; and forming a sensitive region for separating the target sound at a common part of the first sensitive region and the second sensitive region, using the spectrum of the first sensitive region formation signal and the spectrum of the second sensitive region formation signal.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
<Invention of Two Sensitive Regions Integration Type that Three Microphones are Disposed on a Plane Orthogonal to the Direction in which the Target Sound Comes from, and a Process Including the Process of the Invention of the Type that Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from and a Difference is Acquired is Performed>
According to the foregoing sound source separation method, when the first sensitive region formation signal is generated, a same process as that of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be performed, using the received sound signals of the two first and second microphones, and a same spectrum as the spectrum of the target sound obtained through separation by the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be generated, as the spectrum of the first sensitive region formation signal; when the second sensitive region formation signal is generated, a same process as that of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be performed, using the received sound signals of the two second and third microphones, and a same spectrum as the spectrum of the target sound obtained through separation by the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be generated, as the spectrum of the second sensitive region formation signal; and when the sensitive region to separate the target sound is formed at the common part of the first sensitive region and the second sensitive region, a spectrum integration process of comparing the powers of the spectra for each frequency band and assigning inferior power to a spectrum of the target sound may be performed, using the spectrum of the first sensitive region formation signal and the spectrum of the second sensitive region formation signal.
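Assigning the inferior power in each band is what restricts the result to the common part of the two sensitive regions: a band keeps power only if both sensitive region formation signals have power there. A minimal Python sketch, with an assumed function name and hypothetical power values:

```python
def integrate_regions(first_region_power, second_region_power):
    """Assign the smaller of the two powers to the target spectrum per band."""
    return [min(a, b)
            for a, b in zip(first_region_power, second_region_power)]


# Hypothetical spectra: the first region passes bands 0 and 1,
# the second region passes bands 1 and 2.
first_region = [4.0, 3.0, 0.0, 0.0]
second_region = [0.0, 2.5, 5.0, 0.0]

# Only band 1, which lies in both regions, keeps any power.
target = integrate_regions(first_region, second_region)
```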
Moreover, according to the foregoing sound source separation method, when the first sensitive region formation signal is generated, a same process as that of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be performed, using the received sound signals of the two first and second microphones, and a same spectrum as the spectrum of the target sound obtained through separation by the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be generated, as the spectrum of the first sensitive region formation signal; when the second sensitive region formation signal is generated, same processes as those of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) other than the spectrum integration process in the separation process may be performed, using the received sound signals of the two second and third microphones, and a sensitive region limitation process of limiting the second sensitive region to either of a region at the second microphone side and a region at the third microphone side may be performed, instead of the spectrum integration process of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired); in performing the sensitive region limitation process, when a delayed process is performed on the received sound signal of the second microphone in a first target sound superior signal generation process and a delayed process is performed on the received sound signal of the third microphone in a second target sound superior signal generation process, the first target sound superior signal generation process and the second target sound superior signal generation process constituting the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), powers at a same frequency band between the spectrum of one sound including the target sound separated by a first separation process and the spectrum of an other sound including the target sound separated by a second separation process may be compared for each frequency band, band selection of assigning smaller power to a spectrum of one sound including the target sound separated by the first separation process may be performed for a frequency band where power of the spectrum of the one sound including the target sound separated by the first separation process is smaller than power of a spectrum of an other sound including the target sound separated by the second separation process to generate the spectrum of the second sensitive region formation signal which forms the second sensitive region limited to the region at the second microphone side, or band selection of assigning smaller power to the spectrum of the other sound including the target sound separated by the second separation process may be performed for a frequency band where power of the spectrum of the other sound including the target sound separated by the second separation process is smaller than power of the spectrum of the one sound including the target sound separated by the first separation process to generate a spectrum of the second sensitive region formation signal which forms the second sensitive region limited to the region at the third microphone side; and when the sensitive region to separate the target sound is formed at the common part of the first sensitive region and the second sensitive region, a spectrum integration process of comparing the powers of the spectra for each frequency band, using the spectrum of the first sensitive region formation signal and the spectrum of the second sensitive region formation signal, and assigning inferior power to a spectrum of the target sound may be performed.
Further, according to the foregoing case, when the sensitive region limitation process is performed, limitation of the second sensitive region to either of the region at the second microphone side and the region at the third microphone side can be changed over.
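The sensitive region limitation process with a changeable side might look like the sketch below. The `side` argument, the function name and the sample values are hypothetical, and setting the non-selected bands (including bands of equal power) to zero is an assumption, since the text does not state what is assigned to them.

```python
def limit_region(one_power, other_power, side):
    """Limit the second sensitive region to one microphone side.

    one_power:   spectrum separated by the first separation process
    other_power: spectrum separated by the second separation process
    side:        "second" or "third" (hypothetical switch for the changeover)
    """
    if side == "second":
        # Keep bands where the first separated spectrum is the smaller one.
        return [a if a < b else 0.0 for a, b in zip(one_power, other_power)]
    if side == "third":
        # Keep bands where the second separated spectrum is the smaller one.
        return [b if b < a else 0.0 for a, b in zip(one_power, other_power)]
    raise ValueError("side must be 'second' or 'third'")


# Hypothetical power spectra with three frequency bands.
one_sound = [1.0, 4.0, 2.0]
other_sound = [3.0, 0.5, 2.0]

second_side = limit_region(one_sound, other_sound, "second")
third_side = limit_region(one_sound, other_sound, "third")
```

Changing over the limited side then amounts to calling the same routine with the other value of `side`, without altering the preceding separation processes.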
<Invention of Three Sensitive Regions Integration Type that Three Microphones are Disposed on a Plane Orthogonal to the Direction in which the Target Sound Comes from> Invention of a Type that Three Microphones are Disposed on a Plane Orthogonal to or Approximately Orthogonal to the Direction in which the Target Sound Comes from and Three Sensitive Regions are Integrated
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of three first, second and third microphones at respective vertices of a triangle perpendicular to or approximately perpendicular to a direction in which the target sound comes from; generating a spectrum of a first sensitive region formation signal which forms a first sensitive region along a plane orthogonal to a line interconnecting the first and second microphones, using received sound signals of those two microphones; generating a spectrum of a second sensitive region formation signal which forms a second sensitive region along a plane orthogonal to a line interconnecting the second and third microphones, using received sound signals of those two microphones; generating a spectrum of a third sensitive region formation signal which forms a third sensitive region along a plane orthogonal to a line interconnecting the first and third microphones, using received sound signals of those two microphones; and forming a sensitive region for separating the target sound at a common part of the first sensitive region, the second sensitive region and the third sensitive region, using the spectrum of the first sensitive region formation signal, the spectrum of the second sensitive region formation signal, and the spectrum of the third sensitive region formation signal.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
<Invention of Three Sensitive Regions Integration Type that Three Microphones are Disposed on a Plane Orthogonal to the Direction in which the Target Sound Comes from, and a Process Including the Process of the Invention of the Type that Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from and a Difference is Acquired is Performed>
According to the foregoing sound source separation method, when the first sensitive region formation signal is generated, a same process as that of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be performed, using the received sound signals of the two first and second microphones, and a same spectrum as the spectrum of the target sound obtained through separation by the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be generated, as the spectrum of the first sensitive region formation signal; when the second sensitive region formation signal is generated, a same process as that of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be performed, using the received sound signals of the two second and third microphones, and a same spectrum as the spectrum of the target sound obtained through separation by the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be generated, as the spectrum of the second sensitive region formation signal; when the third sensitive region formation signal is generated, a same process as that of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be performed, using the received sound signals of the two first and third microphones, and a same spectrum as the spectrum of the target sound obtained through separation by the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be generated, as the spectrum of the third sensitive region formation signal; and when the sensitive region to separate the target sound is formed at the common part of the first sensitive region, the second sensitive region, and the third sensitive region, a spectrum integration process of comparing the powers of the spectra for each frequency band and assigning the most inferior power to a spectrum of the target sound may be performed, using the spectrum of the first sensitive region formation signal, the spectrum of the second sensitive region formation signal and the spectrum of the third sensitive region formation signal.
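With three sensitive region formation signals, assigning the most inferior power amounts to taking the minimum over the three spectra in each band. The following sketch assumes per-band power lists of equal length; the function name and the values are hypothetical.

```python
def integrate_three_regions(first_power, second_power, third_power):
    """Assign the smallest of the three powers to the target spectrum per band."""
    return [min(bands)
            for bands in zip(first_power, second_power, third_power)]


# Hypothetical spectra of the three sensitive region formation signals.
first_region = [4.0, 3.0, 2.0, 0.0]
second_region = [5.0, 1.0, 2.0, 6.0]
third_region = [6.0, 2.0, 0.0, 7.0]

# A band keeps power only if all three regions pass it.
target = integrate_three_regions(first_region, second_region, third_region)
```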
Further, according to the foregoing sound source separation method, when the first sensitive region formation signal is generated, a same process as that of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be performed, using the received sound signals of the two first and second microphones, and a same spectrum as the spectrum of the target sound obtained through separation by the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be generated, as the spectrum of the first sensitive region formation signal; when the second sensitive region formation signal is generated, same processes as those of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) other than a spectrum integration process in a separation process may be performed, using the received sound signals of the two second and third microphones, and a sensitive region limitation process of limiting the second sensitive region to either of a region at the second microphone side and a region at the third microphone side may be performed, instead of the spectrum integration process of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired); in performing the sensitive region limitation process when the second sensitive region formation signal is generated, when a delayed process is performed on the received sound signal of the second microphone in a first target sound superior signal generation process and a delayed process is performed on the received sound signal of the third microphone in a second target sound superior signal generation process, the first target sound superior signal generation process and the second target sound superior signal generation process constituting the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), powers at a same frequency band between the spectrum of one sound including the target sound separated by a first separation process and the spectrum of an other sound including the target sound separated by a second separation process may be compared for each frequency band, band selection of assigning smaller power to a spectrum of one sound including the target sound separated by the first separation process may be performed for a frequency band where power of the spectrum of the one sound including the target sound separated by the first separation process is smaller than power of a spectrum of an other sound including the target sound separated by the second separation process to generate the spectrum of the second sensitive region formation signal which forms the second sensitive region limited to the region at the second microphone side, or band selection of assigning smaller power to the spectrum of the other sound including the target sound separated by the second separation process may be performed for a frequency band where power of the spectrum of the other sound including the target sound separated by the second separation process is smaller than power of the spectrum of the one sound including the target sound separated by the first separation process to generate a spectrum of the second sensitive region formation signal which forms the second sensitive region limited to the region at the third microphone side; when the third sensitive region formation signal is generated, same processes
as those of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) may be performed other than the spectrum integration process in the separation process, using the received sound signals of the two first and third microphones, and a sensitive region limitation process of limiting the third sensitive region to either of a region at the first microphone side and a region at the third microphone side may be performed, instead of the spectrum integration process of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), in performing the sensitive region limitation process when the third sensitive region formation signal is generated, when a delayed process is performed on the received sound signal of the first microphone in a first target sound superior signal generation process and a delayed process is performed on the received sound signal of the third microphone in a second target sound superior signal generation process, the first target sound superior signal generation process and the second target sound superior signal generation process constituting the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired), powers at a same frequency band between the spectrum of one sound including the target sound separated by the first separation process and the spectrum of an other sound including the target sound separated by the second separation process may be compared for each frequency band, band selection of assigning smaller power to a spectrum of one sound including the target sound separated by the first separation process may be 
performed for a frequency band where power of the spectrum of the one sound including the target sound separated by the first separation process is smaller than power of a spectrum of an other sound including the target sound separated by the second separation process to generate the spectrum of the third sensitive region formation signal which forms the third sensitive region limited to the region at the first microphone side, or band selection of assigning smaller power to the spectrum of the other sound including the target sound separated by the second separation process may be performed for a frequency band where power of the spectrum of the other sound including the target sound separated by the second separation process is smaller than power of the spectrum of the one sound including the target sound separated by the first separation process to generate a spectrum of the third sensitive region formation signal which forms the third sensitive region limited to the region at the third microphone side, and when the sensitive region to separate the target sound is formed at the common part of the first sensitive region, the second sensitive region, and the third sensitive region, a spectrum integration process of comparing the powers of the spectra for each frequency band and assigning most inferior power to a spectrum of the target sound may be performed, using the spectrum of the first sensitive region formation signal, the spectrum of the second sensitive region formation signal and the spectrum of the third sensitive region formation signal.
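The spectrum integration process over the three sensitive region formation signals can be illustrated with a minimal sketch. It assumes each spectrum is a list of complex coefficients, one per frequency band, and that "most inferior power" means the smallest squared magnitude; the function name `integrate_min_power` is hypothetical and is not part of the specification.

```python
def integrate_min_power(*spectra):
    """Per frequency band, keep the coefficient whose power is smallest.

    Each argument is one spectrum (a list of complex values, one per band).
    Power is taken as the squared magnitude, which is an assumption here.
    """
    if not spectra:
        raise ValueError("at least one spectrum is required")
    n_bands = len(spectra[0])
    if any(len(s) != n_bands for s in spectra):
        raise ValueError("all spectra must have the same number of bands")
    # zip(*spectra) yields, for each band, the candidates from every spectrum.
    return [min(band, key=lambda c: abs(c) ** 2) for band in zip(*spectra)]


first = [1 + 0j, 0.2j, 3 + 0j]
second = [0.5 + 0j, 2 + 0j, 1j]
third = [2 + 0j, 1 + 0j, 0.1 + 0j]
target = integrate_min_power(first, second, third)
```

A band survives with high power only if all three formation signals carry high power there, which is how the common part of the three sensitive regions is obtained in this sketch.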
<Invention of Three Microphones Type that a Control Signal is Generated Using Two Signals, an Opposite Disturbance Sound is Suppressed, and a Process Including the Process of the Invention of the Type that Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from and a Difference is Acquired is Performed>
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of three first, second and third microphones at respective vertices of a triangle; generating an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the two first and second microphones; generating a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the two second and third microphones; and comparing powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal and a spectrum of the control signal for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performing band selection of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein when the orthogonal-disturbance-sound suppressing signal is generated, a same process as that of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) is performed using received sound signals of the two first and second microphones, and a same spectrum as a spectrum of the target sound obtained through separation by the sound source separation method (invention of the type that two microphones are disposed in a 
direction orthogonal to the direction in which the target sound comes from and a difference is acquired) is generated, as the spectrum of the orthogonal-disturbance-sound suppressing signal, and when the control signal is generated, a difference between the received sound signal of the third microphone undergone a delayed process and the received sound signal of the second microphone is acquired on a time domain or a frequency domain to generate a control target sound superior signal.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
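As a rough illustration of the two steps described above, the following sketch forms a control signal on the time domain as a delayed difference and then performs the band selection against a control spectrum. It rests on several assumptions that the text does not fix: the delay is a whole number of samples, power is the squared magnitude, and bands that fail the comparison are set to zero. The names `delayed_difference` and `band_select` are hypothetical.

```python
def delayed_difference(front, rear, delay):
    """Subtract the rear signal, delayed by `delay` samples, from the front signal.

    `front` and `rear` are equal-length lists of samples; samples before the
    start of `rear` are treated as silence (an assumption of this sketch).
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return [f - (rear[n - delay] if n >= delay else 0.0)
            for n, f in enumerate(front)]


def band_select(suppressed, control):
    """Keep a band of `suppressed` only where its power is below the control power.

    Zeroing the remaining bands is an assumption; the text only states which
    bands are assigned to the target sound.
    """
    if len(suppressed) != len(control):
        raise ValueError("spectra must have the same number of bands")
    return [s if abs(s) ** 2 < abs(c) ** 2 else 0j
            for s, c in zip(suppressed, control)]


# A sound that reaches the rear microphone one sample before the front one
# is cancelled by the delayed difference.
rear = [1.0, 2.0, 3.0, 4.0, 0.0]
front = [0.0, 1.0, 2.0, 3.0, 4.0]
control_time = delayed_difference(front, rear, 1)

selected = band_select([1 + 1j, 3 + 0j], [2 + 0j, 1 + 0j])
```

In this sketch a band dominated by the opposite disturbance sound has little power in the control spectrum, so the comparison fails and the band is dropped from the target spectrum.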
<Invention of Three Microphones Type that a Control Signal is Generated Using Three Signals, an Opposite Disturbance Sound is Suppressed, and a Process Including the Process of the Invention of the Type that Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from and a Difference is Acquired is Performed>
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of three first, second and third microphones at respective vertices of a triangle; generating an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the two first and second microphones; generating a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the three first, second and third microphones; and comparing powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal and a spectrum of the control signal for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performing band selection of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein when the orthogonal-disturbance-sound suppressing signal is generated, a same process as that of the sound source separation method (invention of the type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) is performed using received sound signals of the two first and second microphones, and a same spectrum as a spectrum of the target sound obtained through separation by the sound source separation method (invention of the type that two microphones are 
disposed in a direction orthogonal to the direction in which the target sound comes from and a difference is acquired) is generated, as the spectrum of the orthogonal-disturbance-sound suppressing signal, and when the control signal is generated, a difference between the received sound signal of the third microphone undergone a delayed process and the received sound signal of the second microphone is acquired on a time domain or a frequency domain to generate a first control target sound superior signal, a difference between the received sound signal of the third microphone undergone a delayed process and the received sound signal of the first microphone is acquired on a time domain or a frequency domain to generate a second control target sound superior signal, and a spectrum integration process of comparing powers for each frequency band, using a spectrum of the first control target sound superior signal and a spectrum of the second control target sound superior signal, and of assigning inferior power to a spectrum of a control target sound superior signal is performed.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
<Invention of Three Microphones/Opposite Disturbance Sound Suppressing Type that a Process Including the Process of the Invention of a Type that Two Microphones are Disposed in a Direction Orthogonal to the Direction in which the Target Sound Comes from and Sum/Difference are Both Acquired is Performed>
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of three first, second and third microphones at respective vertices of a triangle; generating an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the two first and second microphones; generating a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the two second and third microphones; and comparing powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal and a spectrum of the control signal for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performing band selection of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein when the orthogonal-disturbance-sound suppressing signal is generated, a same process as that of the sound source separation method (invention of a type that two microphones are disposed in a direction orthogonal to the direction in which the target sound comes from and sum/difference are both acquired) is performed using received sound signals of the two first and second microphones, and a same spectrum as a spectrum of the target sound obtained through separation by the sound source separation method (invention of a type that two microphones are 
disposed in a direction orthogonal to the direction in which the target sound comes from and sum/difference are both acquired) is generated, as the spectrum of the orthogonal-disturbance-sound suppressing signal, and when the control signal is generated, a difference between the received sound signal of the third microphone undergone a delayed process and the received sound signal of the second microphone is acquired on a time domain or a frequency domain to generate a control target sound superior signal.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
<Invention of Three Microphones/Opposite Disturbance Sound Suppressing Type that a Process Including the Process of the Invention of Three Microphone/Two Combinations Type is Performed>
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of three first, second and third microphones at respective vertices of a triangle; generating an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the three first, second and third microphones; generating a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the two first and second microphones; and comparing powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal and a spectrum of the control signal for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performing band selection of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein when the orthogonal-disturbance-sound suppressing signal is generated, a same process as that of the sound source separation method (invention of three microphone/two combinations type) is performed using received sound signals of the three first, second and third microphones, and a same spectrum as a spectrum of the target sound obtained through separation by the sound source separation method (invention of three microphone/two combinations type) is generated, as the spectrum of the orthogonal-disturbance-sound suppressing signal, and when 
the control signal is generated, a difference between the received sound signal of the second microphone undergone a delayed process and the received sound signal of the first microphone is acquired on a time domain or a frequency domain to generate a control target sound superior signal.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
<Invention of Four Microphones/Opposite Disturbance Sound Suppressing Type that a Process Including the Process of the Invention of Four Microphones/Two Combinations Type is Performed>
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of four microphones, respective two of which are disposed side by side in such a manner as to be spaced away from each other in a first direction and a second direction orthogonal to each other; generating an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the four microphones; generating a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the two microphones disposed side by side in the first direction in the four microphones; and comparing powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal and a spectrum of the control signal for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performing band selection of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein when the orthogonal-disturbance-sound suppressing signal is generated, a same process as that of the sound source separation method (invention of four microphones/two combinations type) is performed using received sound signals of the four microphones, and a same spectrum as a spectrum of the target sound obtained through separation by the sound source separation method (invention of four microphones/two 
combinations type) is generated, as the spectrum of the orthogonal-disturbance-sound suppressing signal, and when the control signal is generated, a difference between the received sound signal of the microphone at the opposite disturbance sound side undergone a delayed process in the two microphones disposed side by side in the first direction and the received sound signal of the microphone at the target sound side is acquired on a time domain or a frequency domain to generate a control target sound superior signal.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
<Invention of Four Microphones/Opposite Disturbance Sound Suppressing Type that a Process Including the Process of the Invention of Four Microphones/Three Combinations Type is Performed>
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of four first, second, third and fourth microphones at respective vertices of a rectangle; generating an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the four microphones; generating a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the two first and second microphones; and comparing powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal and a spectrum of the control signal for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performing band selection of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein when the orthogonal-disturbance-sound suppressing signal is generated, a same process as that of the sound source separation method (invention of four microphones/three combinations type) is performed using received sound signals of the four microphones, and a same spectrum as a spectrum of the target sound obtained through separation by the sound source separation method (invention of four microphones/three combinations type) is generated, as the spectrum of the orthogonal-disturbance-sound suppressing signal, and when the control signal is generated, a difference between 
a received sound signal of the second microphone undergone a delayed process and the received sound signal of the first microphone is acquired on a time domain or a frequency domain to generate a control target sound superior signal.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
<Invention of Three Microphones/Opposite Disturbance Sound Suppressing Type that a Process Including the Process of the Invention of Three Microphones/Three Combinations Type is Performed>
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of three first, second and third microphones at respective vertices of a triangle; generating an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the three first, second and third microphones; generating a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the three first, second and third microphones; and comparing powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal and a spectrum of the control signal for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performing band selection of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein when the orthogonal-disturbance-sound suppressing signal is generated, a same process as that of the sound source separation method (invention of three microphones/three combinations type) is performed using received sound signals of the three first, second and third microphones, and a same spectrum as a spectrum of the target sound obtained through separation by the sound source separation method (invention of three microphones/three combinations type) is generated, as the spectrum of the orthogonal-disturbance-sound suppressing 
signal, and when the control signal is generated, a difference between the received sound signal of the second microphone undergone a delayed process and the received sound signal of the first microphone is acquired on a time domain or a frequency domain to generate a first control target sound superior signal, a difference between the received sound signal of the third microphone undergone a delayed process and the received sound signal of the first microphone is acquired on a time domain or a frequency domain to generate a second control target sound superior signal, and a spectrum integration process of comparing powers for each frequency band, using a spectrum of the first control target sound superior signal and a spectrum of the second control target sound superior signal, and of assigning inferior power to a spectrum of a control target sound superior signal is performed.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
Further, according to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: disposing a total of three first, second and third microphones at respective vertices of a triangle; generating an orthogonal-disturbance-sound suppressing signal which suppresses an orthogonal disturbance sound coming from a direction orthogonal to the direction in which the target sound comes from, using received sound signals of the three first, second and third microphones; generating a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the direction in which the target sound comes from, using received sound signals of the three first, second and third microphones; and comparing powers at a same frequency band between a spectrum of the orthogonal-disturbance-sound suppressing signal and a spectrum of the control signal for each frequency band, and for a frequency band where power of the spectrum of the orthogonal-disturbance-sound suppressing signal is smaller than power of the control signal, performing band selection of assigning smaller power to a spectrum of the target sound to be separated, thereby suppressing a spectrum of the opposite disturbance sound included in the spectrum of the orthogonal-disturbance-sound suppressing signal, and wherein when the orthogonal-disturbance-sound suppressing signal is generated, a same process as that of the sound source separation method (invention of three microphones/three combinations type) is performed using received sound signals of the three first, second and third microphones, and a same spectrum as a spectrum of the target sound obtained through separation by the sound source separation method (invention of three microphones/three combinations type) is generated, as the spectrum of the orthogonal-disturbance-sound 
suppressing signal, and when the control signal is generated, a difference between a sum signal, obtained by multiplying received sound signals of the second and third microphones by the same or different proportionality coefficients, undergone a delayed process and the received sound signal of the first microphone is acquired on a time domain or a frequency domain to generate a control target sound superior signal.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
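The sum-signal variant of the control signal can be sketched on the time domain as follows. The sketch assumes that the sum signal is the weighted sum of the second and third microphone signals, that the delay is an integer number of samples, and that the proportionality coefficients default to 0.5 each; none of these values comes from the specification, and the name `weighted_sum_control` is hypothetical.

```python
def weighted_sum_control(first, second, third, delay, a=0.5, b=0.5):
    """Control target sound superior signal from three microphones (time domain).

    The second and third microphone signals are scaled by the coefficients
    `a` and `b`, summed, delayed by `delay` samples, and subtracted from the
    first microphone signal. Default coefficients are illustrative only.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    mixed = [a * s + b * t for s, t in zip(second, third)]
    return [f - (mixed[n - delay] if n >= delay else 0.0)
            for n, f in enumerate(first)]


# An impulse that hits the second and third microphones one sample before
# the first microphone is cancelled when the coefficients sum to one.
first = [0.0, 1.0, 0.0]
second = [1.0, 0.0, 0.0]
third = [1.0, 0.0, 0.0]
cancelled = weighted_sum_control(first, second, third, delay=1)
partial = weighted_sum_control(first, second, third, delay=1, a=0.25, b=0.25)
```

Different coefficients for the two microphones would let the null be steered or the channel gains be balanced; which choice is appropriate depends on the microphone arrangement.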
<Invention of Performing Multidimensional Band Selection>
According to the invention, there is provided a sound source separation method of separating a target sound and a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from, comprising: performing a plurality of different-directional-signal-group generation processes, each generating more than or equal to two combinations of spectra of a plurality of signals each of which has a different directivity, using received sound signals of a plurality of microphones; and determining whether or not a relationship between powers of the spectra in a combination simultaneously satisfies a plurality of conditions each defined for a combination, for each frequency band, using more than or equal to two combinations of the spectra of the plurality of signals generated by the respective different-directional-signal-group generation processes, and performing multidimensional band selection of assigning power of a spectrum selected beforehand to a spectrum of the target sound to be separated, for a frequency band where the plurality of conditions are simultaneously satisfied to form a sensitive region.
According to such a sound source separation method of the invention, the working and effectiveness of the foregoing sound source separation system of the invention can be directly obtained, thus achieving the foregoing object.
Further, in the foregoing sound source separation method, when each different-directional-signal-group generation process is performed, a spectrum of a target sound superior signal and a spectrum of a target sound inferior signal may be generated using the received sound signals of the plurality of microphones, and when the sensitive region is formed, a condition for each combination may be set as a condition that power of the spectrum of the target sound superior signal is larger than power of the spectrum of the target sound inferior signal, and it may be determined for each frequency band whether or not those conditions are simultaneously satisfied.
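The multidimensional band selection with the superior/inferior condition can be sketched as below. The sketch assumes each combination is given as a pair of spectra (target sound superior, target sound inferior), that power is the squared magnitude, and that bands where the conditions are not all satisfied are set to zero; the function name `multidimensional_band_select` is hypothetical.

```python
def multidimensional_band_select(pairs, selected):
    """Assign `selected` to the target spectrum where every condition holds.

    `pairs` is a list of (superior_spectrum, inferior_spectrum) tuples, one
    per combination. A band is kept only if, in every combination, the
    superior power exceeds the inferior power. Zeroing the other bands is an
    assumption of this sketch.
    """
    target = []
    for k, value in enumerate(selected):
        satisfied = all(abs(sup[k]) ** 2 > abs(inf[k]) ** 2
                        for sup, inf in pairs)
        target.append(value if satisfied else 0j)
    return target


group1 = ([2 + 0j, 2 + 0j, 1 + 0j], [1 + 0j, 1 + 0j, 2 + 0j])
group2 = ([3 + 0j, 1 + 0j, 3 + 0j], [1 + 0j, 2 + 0j, 1 + 0j])
# The spectrum selected beforehand is taken to be the first group's superior spectrum.
result = multidimensional_band_select([group1, group2], group1[0])
```

Only the first band passes both conditions here, so it alone falls inside the sensitive region formed by the two combinations.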
<Invention of Performing Two-Dimensional Band Selection>
Specifically, in the foregoing sound source separation method, a total of three first, second and third microphones may be disposed at respective vertices of a triangle, when a first different-directional-signal-group generation process is performed, a difference between a received sound signal of the first microphone and a received sound signal of the second microphone undergone a delayed process may be acquired on a time domain or a frequency domain to generate a first target sound superior signal, a difference between a received sound signal of the second microphone and a received sound signal of the first microphone undergone a delayed process may be acquired on a time domain or a frequency domain to generate a second target sound superior signal, a difference between received sound signals of the first and second microphones may be acquired on a time domain or a frequency domain to generate a target sound inferior signal, and powers for each frequency band may be compared using a spectrum of the first target sound superior signal and a spectrum of the second target sound superior signal, and a spectrum integration process of assigning inferior power to a spectrum of a target sound superior signal may be performed, when a second different-directional-signal-group generation process is performed, a difference between a received sound signal of the third microphone and a received sound signal of the second microphone undergone a delayed process may be acquired on a time domain or a frequency domain to generate a first target sound superior signal, a difference between a received sound signal of the second microphone and a received sound signal of the third microphone undergone a delayed process may be acquired on a time domain or a frequency domain to generate a second target sound superior signal, a difference between received sound signals of the second and third microphones may be acquired on a time domain or a frequency domain to generate a target sound inferior signal, 
and powers for each frequency band may be compared using a spectrum of the first target sound superior signal and a spectrum of the second target sound superior signal, and a spectrum integration process of assigning inferior power to a spectrum of a target sound superior signal may be performed, and when the sensitive region is formed, two-dimensional-band selection of assigning power of a spectrum of a target sound superior signal generated by either one of the first and second different-directional-signal-group generation processes to a spectrum of the target sound to be separated may be performed.
<Invention of Performing Three-Dimensional Band Selection>
Moreover, in the foregoing sound source separation method, a total of three microphones (first, second and third) may be disposed at the respective vertices of a triangle.
When a first different-directional-signal-group generation process is performed, a difference between a received sound signal of the first microphone and a received sound signal of the second microphone that has undergone a delayed process may be acquired on a time domain or a frequency domain to generate a first target sound superior signal; a difference between a received sound signal of the second microphone and a received sound signal of the first microphone that has undergone a delayed process may be acquired on a time domain or a frequency domain to generate a second target sound superior signal; and a difference between the received sound signals of the first and second microphones may be acquired on a time domain or a frequency domain to generate a target sound inferior signal. Powers may then be compared for each frequency band using a spectrum of the first target sound superior signal and a spectrum of the second target sound superior signal, and a spectrum integration process of assigning the smaller power to a spectrum of a target sound superior signal may be performed.
When a second different-directional-signal-group generation process is performed, a difference between a received sound signal of the third microphone and a received sound signal of the second microphone that has undergone a delayed process may be acquired on a time domain or a frequency domain to generate a first target sound superior signal; a difference between a received sound signal of the second microphone and a received sound signal of the third microphone that has undergone a delayed process may be acquired on a time domain or a frequency domain to generate a second target sound superior signal; and a difference between the received sound signals of the second and third microphones may be acquired on a time domain or a frequency domain to generate a target sound inferior signal. Powers may then be compared for each frequency band using a spectrum of the first target sound superior signal and a spectrum of the second target sound superior signal, and a spectrum integration process of assigning the smaller power to a spectrum of a target sound superior signal may be performed.
When a third different-directional-signal-group generation process is performed, a difference between a received sound signal of the third microphone and a received sound signal of the first microphone that has undergone a delayed process may be acquired on a time domain or a frequency domain to generate a first target sound superior signal; a difference between a received sound signal of the first microphone and a received sound signal of the third microphone that has undergone a delayed process may be acquired on a time domain or a frequency domain to generate a second target sound superior signal; and a difference between the received sound signals of the first and third microphones may be acquired on a time domain or a frequency domain to generate a target sound inferior signal. Powers may then be compared for each frequency band using a spectrum of the first target sound superior signal and a spectrum of the second target sound superior signal, and a spectrum integration process of assigning the smaller power to a spectrum of a target sound superior signal may be performed.
When the sensitive region is formed, three-dimensional band selection of assigning power of a spectrum of a target sound superior signal generated by any one of the first, second and third different-directional-signal-group generation processes to a spectrum of the target sound to be separated may be performed.
<Invention of Applying a Delay which is an Integer Multiple of a Sampling Period>
In the foregoing sound source separation method, it is desirable that, when a process of acquiring a difference between one signal of a pair of two signals that has undergone a delayed process and the other signal is performed, the delayed process should be a process of applying a delay which is an integer multiple of the sampling period on a time domain or a frequency domain.
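The delay of a whole number of sampling periods can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names `delay_time` and `delay_freq`, the 64-sample test signal and the 3-sample delay are hypothetical and do not appear in the specification. The sketch shows that a delay of k samples applied on a time domain gives the same result as a linear phase shift applied on a frequency domain, after which the difference between the pair of signals can be taken.

```python
import numpy as np

def delay_time(x, k):
    """Delay x by k whole samples on the time domain (zeros shifted in)."""
    y = np.zeros_like(x)
    y[k:] = x[:len(x) - k]
    return y

def delay_freq(x, k):
    """Apply the same k-sample delay on the frequency domain.

    Multiplying the DFT by exp(-j*2*pi*f*k) is a circular shift by k samples.
    """
    n = len(x)
    phase = np.exp(-2j * np.pi * np.fft.fftfreq(n) * k)
    return np.fft.ifft(np.fft.fft(x) * phase).real

rng = np.random.default_rng(0)
k = 3                          # delay in sampling periods (assumed value)
x1 = rng.standard_normal(64)   # stand-in for one received sound signal
x1[-k:] = 0.0                  # trailing zeros so circular and linear shifts agree
x2 = rng.standard_normal(64)   # stand-in for the other received sound signal

# Difference between one signal and the delayed other signal.
diff_time = x2 - delay_time(x1, k)
diff_freq = x2 - delay_freq(x1, k)
```

Because the delay is a whole number of samples, no interpolation filter is needed on the time domain, which is one plausible reason why such a delay is described as desirable.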
<Common Feature>
In the foregoing sound source separation method, the microphone may be a non-directional or an approximately non-directional microphone.
<<Invention of an Acoustic Signal Acquisition Device>>
As an acoustic signal acquisition device which is a structural component of the foregoing sound source separation system of the invention, the following acoustic signal acquisition device can be used.
That is, according to the invention, there is provided an acoustic signal acquisition device that acquires a target sound under a circumstance where a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from is present, comprising: two microphones respectively provided at a corresponding portion of a front face of a portable device at which an operation unit and/or a screen display unit is provided, and a corresponding portion of a rear face opposite thereto; a target sound superior signal generator which performs a linear combination process for emphasizing the target sound, using received sound signals of the two microphones to generate at least one target sound superior signal; and a target sound inferior signal generator which performs a linear combination process for suppressing the target sound, using the received sound signals of the two microphones to generate at least one target sound inferior signal to be paired with the target sound superior signal.
Moreover, according to the invention, there is provided an acoustic signal acquisition device that acquires a target sound under a circumstance where a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from is present, comprising: two microphones provided in such a manner as to be spaced away from each other at a front face of a portable device at which an operation unit and/or a screen display unit is provided; a target sound superior signal generator which performs a linear combination process for emphasizing the target sound, using received sound signals of the two microphones to generate at least one target sound superior signal; and a target sound inferior signal generator which performs a linear combination process for suppressing the target sound, using the received sound signals of the two microphones to generate at least one target sound inferior signal to be paired with the target sound superior signal.
Further, according to the invention, there is provided an acoustic signal acquisition device that acquires a target sound under a circumstance where a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from is present, comprising: first and second microphones respectively provided at a corresponding portion of a front face of a portable device at which an operation unit and/or a screen display unit is provided, and a corresponding portion of a rear face opposite thereto; a third microphone provided at the front face in such a manner as to be spaced away from the first microphone; a target sound superior signal generator which performs a linear combination process for emphasizing the target sound, using received sound signals of the two first and second microphones to generate at least one target sound superior signal; and a target sound inferior signal generator which performs a linear combination process for suppressing the target sound, using the received sound signals of the two first and third microphones to generate at least one target sound inferior signal to be paired with the target sound superior signal.
Still further, according to the invention, there is provided an acoustic signal acquisition device that acquires a target sound under a circumstance where a disturbance sound coming from an arbitrary direction other than a direction in which the target sound comes from is present, comprising: a first microphone provided at a front face of a portable device at which an operation unit and/or a screen display unit is provided; second and third microphones provided at a rear face opposite to the front face where the first microphone is provided in such a manner as to be displaced from a position corresponding to that position where the first microphone is provided; a target sound superior signal generator which performs a linear combination process for emphasizing the target sound, using received sound signals of the three first, second and third microphones to generate a target sound superior signal; a first target sound inferior signal generator which performs a linear combination process for suppressing the target sound, using the received sound signals of the two first and second microphones to generate a first target sound inferior signal to be paired with the target sound superior signal; and a second target sound inferior signal generator which performs a linear combination process for suppressing the target sound, using the received sound signals of the two first and third microphones to generate a second target sound inferior signal to be paired with the target sound superior signal.
The acoustic signal acquisition device of the invention can be used as the structural component of the sound source separation system of the invention, and can also be used as, for example, a sound-source-location determination device which determines a direction in which a sound source is present. In using such a device as the sound-source-location determination device, for example, the respective energies (sums of powers over the individual frequency bands) of the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal are calculated and compared. When the energy of the spectrum of the target sound superior signal is larger, it can be determined that a sound source is present in the set direction of the target sound; when the energy of the spectrum of the target sound inferior signal is larger, it can be determined that no sound source is present in the set direction of the target sound.
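The energy comparison used for sound-source-location determination can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name `source_in_target_direction` and the example power values are hypothetical, and the treatment of exactly equal energies (reported as "no source") is an assumption that the text does not specify.

```python
import numpy as np

def source_in_target_direction(superior_power, inferior_power):
    """Compare energies (sum of band powers) of the two spectra.

    Returns True when the target sound superior signal carries more energy,
    i.e. a sound source is judged to be present in the set target direction.
    Equal energies are treated as "not present" (an assumption).
    """
    energy_superior = float(np.sum(superior_power))
    energy_inferior = float(np.sum(inferior_power))
    return energy_superior > energy_inferior

# Hypothetical per-band powers for three frequency bands.
superior_power = np.array([4.0, 9.0, 1.0])
inferior_power = np.array([1.0, 2.0, 3.0])
present = source_in_target_direction(superior_power, inferior_power)
```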
As explained above, according to the invention, linear combination processes of emphasizing and suppressing the target sound are performed using a few microphones to generate the target sound superior signal and the target sound inferior signal, so that directivity control appropriate for separation of the target sound and the disturbance sound is enabled. A separation process is performed using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal both generated through the directivity control performed in this manner, thus enabling precise separation of the target sound and the disturbance sound and realizing sound source separation with a few microphones, resulting in an effect such that the device can be miniaturized.
Hereinafter, embodiments of the invention will be explained with reference to the accompanying drawings.
With reference to
The two microphones 21, 22 are both non-directional or approximately non-directional microphones in the embodiment, and as shown in
The clearance between the two microphones 21, 22 may change in accordance with an opening/closing operation of the cellular phone 80, and the clearance when the cellular phone is opened may be larger than the clearance when the cellular phone is closed. For example, the one microphone 21 may be always biased outwardly by an elastic member like a spring, pressed by a front face 85 with which the screen display unit 84 is provided, retained when the cellular phone 80 is closed, and caused to protrude outwardly when the cellular phone 80 is opened.
The sound source separation system 10 can change over its mode between a normal mode that the target sound coming from the front face 82 side of the cellular phone 80 is acquired (e.g., a conversation mode that the speech of a user who holds the cellular phone 80 by hands to use is acquired), and a changeover mode that the target sound coming from the rear face 83 side is acquired (e.g., a motion picture shooting mode that a motion picture is shot by a camera provided at the rear of the screen display unit 84 of the cellular phone 80 and a speech is also acquired).
As shown in
In
As shown in
As shown in
The changeover unit 43 is a switch that selects either the first target sound inferior signal for the normal mode generated by the first target sound inferior signal generator 41 or the second target sound inferior signal for the changeover mode generated by the second target sound inferior signal generator 42 as the target sound inferior signal to be subjected to the process of the separation unit 60. Specifically, the changeover unit 43 may be realized by a key constituting the operation unit 81 of the cellular phone 80, or by a switch provided separately from the ordinary operation unit 81.
The frequency analyzer 50 performs frequency analysis on the target sound superior signal on a time domain generated by the target sound superior signal generator 30, and the target sound inferior signal on a time domain generated by the target sound inferior signal generator 40 (the first target sound inferior signal in the normal mode, and the second target sound inferior signal in the changeover mode). As the frequency analysis, for example, Fast Fourier Transform (FFT) or Generalized Harmonic Analysis (GHA) can be adopted, but from the standpoint of calculating a more accurate frequency characteristic or analyzing finer frequency components without the effect of a window function, GHA is desirable. The same is true of the other embodiments. If the target sound superior signal generator 30 and the target sound inferior signal generator 40 generate signals on a frequency domain, the frequency analyzer 50 may be omitted.
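As a rough illustration of the FFT option only (GHA is not shown), the power in each frequency band of one frame could be computed as below. The function name `power_spectrum`, the Hann window, the 256-sample frame and the 8000 Hz sampling rate are all assumptions made for this sketch; the specification does not fix any of them.

```python
import numpy as np

def power_spectrum(frame):
    """Power per frequency band of one time-domain frame, using an FFT.

    A Hann window is applied here; the window choice is an assumption.
    """
    window = np.hanning(len(frame))
    spectrum = np.fft.rfft(frame * window)
    return np.abs(spectrum) ** 2

fs = 8000                                 # sampling rate in Hz (assumed)
n = 256                                   # frame length in samples (assumed)
t = np.arange(n) / fs
frame = np.sin(2 * np.pi * 1000.0 * t)    # 1 kHz test tone
power = power_spectrum(frame)
peak_bin = int(np.argmax(power))          # band with the largest power
peak_hz = peak_bin * fs / n
```

The same routine would be applied to both the target sound superior signal and the target sound inferior signal so that their powers can be compared band by band.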
The separation unit 60 performs maximum level band selection (BS-MAX) or Spectral Subtraction (SS) using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal (the first target sound inferior signal in the normal mode and the second target sound inferior signal in the changeover mode), and separates the target sound and the disturbance sound from each other.
In a case where maximum level band selection is performed, individual powers at the same frequency band are compared between the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal (first target sound inferior signal in the normal mode, and second target sound inferior signal in the changeover mode) for each frequency band, and larger powers at the individual frequency bands are assigned to the spectrum of a sound to be obtained by separation.
In a case where spectral subtraction is performed, a value, obtained by multiplying the power of the spectrum of the target sound inferior signal (first target sound inferior signal in the normal mode, and second target sound inferior signal in the changeover mode) by a coefficient is subtracted for each frequency band from the power of the spectrum of the target sound superior signal at the same frequency band.
According to such a first embodiment, the sound source separation system 10 performs a separation process for the target sound and the disturbance sound as follows.
First, a user of the cellular phone 80 performs mode selection through the changeover unit 43 between the normal mode and the changeover mode in accordance with the sound source position of a target sound that the user wants to obtain. For example, when the user wants to obtain his/her own speech while looking at the screen display unit 84, the normal mode is selected.
Next, the target sound superior signal generator 30 generates a target sound superior signal (signal on a time domain) and the target sound inferior signal generator 40 generates a target sound inferior signal (signal on a time domain), using the received sound signals (signals on a time domain) of the two microphones 21, 22. Subsequently, the frequency analyzer 50 performs frequency analysis on the obtained target sound superior signal and target sound inferior signal (first target sound inferior signal in the normal mode, and second target sound inferior signal in the changeover mode), thereby acquiring the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal.
At this time, let the received sound signal of the one microphone 21 be X1(t), and the received sound signal of the other microphone 22 be X2(t), then a difference X1(t)−X2(t) between those signals is acquired by the target sound superior signal generator 30, and this difference becomes the target sound superior signal (see,
Let the received sound signal X1(t) of the one microphone 21 be represented by the following equation (1) and the received sound signal X2(t) of the other microphone 22 be represented by the following equation (2), then the difference X1(t)−X2(t) can be represented by the following equation (3), and a signal |F<X1(t)−X2(t)>| is represented by the following equation (4), so that the directional characteristic of the target sound superior signal can be represented by solid lines in
In contrast, let the received sound signal X1(t) of the one microphone 21 undergone a delayed process be D(X1(t)), and the received sound signal of the other microphone 22 be X2(t), then a difference D(X1(t))−X2(t) between those signals is acquired by the first target sound inferior signal generator 41 in the normal mode, and the difference becomes a first target sound inferior signal (see,
Further, let the signal D(X1(t)), obtained by applying a delayed process to the received sound signal X1(t) of the one microphone 21, be expressed by the following equation (5), and the received sound signal X2(t) of the other microphone 22 be expressed by the foregoing equation (2), then a difference D(X1(t))−X2(t) of those signals is expressed by the following equation (6), and a signal |F<D(X1(t))−X2(t)>| can be represented by the following equation (7), so that the directional characteristic of the first target sound inferior signal can be represented by dotted lines in
A delay time is L/V0 (sec), and is equal or approximately equal to the sound wave propagation time of the distance L between the two microphones 21, 22. Therefore, as shown in
The same is true of the changeover mode: let the received sound signal X2(t) of the other microphone 22 that has undergone a delayed process be D(X2(t)), and the received sound signal of the one microphone 21 be X1(t), then a difference D(X2(t))−X1(t) is acquired by the second target sound inferior signal generator 42, and the difference becomes a second target sound inferior signal (see,
Thereafter, the separation unit 60 performs maximum level band selection (BS-MAX) or spectral subtraction (SS), using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal (first target sound inferior signal in the normal mode, and second target sound inferior signal in the changeover mode), thereby separating the target sound and the disturbance sound from each other.
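The signals that feed this separation step, namely the difference X1(t)−X2(t) and the delayed difference D(X1(t))−X2(t) with a delay of L/V0, can be checked numerically with a small sketch. Everything concrete here is an assumption for illustration: the values of `V0`, `L` and `FS`, the helper names, the idealized propagation model (pure delay, no attenuation), and the supposition that a sound from the front reaches the microphone 21 before the microphone 22.

```python
import numpy as np

V0 = 340.0     # speed of sound in m/s (assumed value)
L = 0.01       # distance between the two microphones in m (assumed value)
FS = 34000     # sampling rate in Hz, chosen so that L/V0 is one sample
K = round(L / V0 * FS)   # delay L/V0 expressed in samples

def delay(x, k):
    """Delay x by k whole samples (zeros shifted in at the start)."""
    y = np.zeros_like(x)
    y[k:] = x[:len(x) - k]
    return y

def superior(x1, x2):
    """Target sound superior signal: X1(t) - X2(t)."""
    return x1 - x2

def inferior_normal(x1, x2):
    """First target sound inferior signal (normal mode): D(X1(t)) - X2(t)."""
    return delay(x1, K) - x2

rng = np.random.default_rng(0)
s = rng.standard_normal(1000)     # stand-in source waveform

# Sound from the front: reaches microphone 21 first (assumption).
x1_front, x2_front = s, delay(s, K)
# Sound from the rear: reaches microphone 22 first.
x1_rear, x2_rear = delay(s, K), s

front_in_inferior = np.sum(inferior_normal(x1_front, x2_front) ** 2)
rear_in_inferior = np.sum(inferior_normal(x1_rear, x2_rear) ** 2)
front_in_superior = np.sum(superior(x1_front, x2_front) ** 2)
```

Under these assumptions the front (target) sound cancels completely in the inferior signal while remaining in the superior signal, and the rear sound remains in the inferior signal, which is the contrast that band selection or spectral subtraction then exploits.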
With reference to
At this time, the power α1 in the frequency band f1 and the power β1 in the same frequency band f1 are compared. When α1>β1 as illustrated in the figure, the larger power α1 is selected and is assigned to the spectrum of the target sound. Note that the smaller power β1 is not used for a process, that is, not assigned to the spectrum after separation and is abandoned.
Moreover, the power α2 in the frequency band f2 and the power β2 in the same frequency band f2 are compared. When β2>α2 as illustrated in the figure, the larger power β2 is selected and assigned to the disturbance sound. Note that the smaller power α2 is not used for a process, that is, not assigned to the spectrum after separation and is abandoned.
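The two comparisons above can be written compactly as a per-band selection. The following is a minimal sketch, assuming hypothetical numeric values for α1, α2, β1 and β2 and a hypothetical function name `bs_max`; how a tie between the two powers is handled is not stated in the text, so assigning ties to the disturbance sound is an assumption.

```python
import numpy as np

def bs_max(superior_power, inferior_power):
    """Maximum level band selection (BS-MAX) over frequency bands.

    In each band the larger power is kept and the smaller one is discarded.
    Bands won by the superior signal go to the target sound spectrum,
    the remaining bands go to the disturbance sound spectrum.
    """
    superior_power = np.asarray(superior_power, dtype=float)
    inferior_power = np.asarray(inferior_power, dtype=float)
    superior_wins = superior_power > inferior_power   # ties: disturbance (assumed)
    target = np.where(superior_wins, superior_power, 0.0)
    disturbance = np.where(superior_wins, 0.0, inferior_power)
    return target, disturbance

# Band f1: alpha1 > beta1.  Band f2: beta2 > alpha2.  (Values are hypothetical.)
alpha = [5.0, 1.0]   # powers of the target sound superior signal
beta = [2.0, 4.0]    # powers of the target sound inferior signal
target, disturbance = bs_max(alpha, beta)
```

With these values the band f1 contributes α1 to the target sound and nothing to the disturbance sound, and the band f2 contributes β2 to the disturbance sound and nothing to the target sound.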
On the other hand, in a case where the separation unit 60 performs spectral subtraction, the procedure is as follows. Let δ be the power of the spectrum of the target sound inferior signal (first target sound inferior signal in the normal mode, and second target sound inferior signal in the changeover mode) generated by the target sound inferior signal generator 40 and obtained through the process of the frequency analyzer 50, and let γ be the power of the spectrum of the target sound superior signal generated by the target sound superior signal generator 30 and obtained through the process of the frequency analyzer 50. For each frequency band, the value K×δ, obtained by multiplying δ by a coefficient K, is subtracted from γ. That is, the calculated value γ−K×δ becomes the power of the spectrum of the target sound obtained after separation in each frequency band. The coefficient K is, for example, a coefficient that depends on the magnitude of the difference between the power γ for the target sound superior signal and the power δ for the target sound inferior signal. Note that at a frequency band where γ is smaller than K×δ, the calculated value may be replaced with, for example, a minimum value defined by a certain rule (a fixed value for each frequency band, or a value proportional to the power at each frequency band of the spectrum of the target sound superior signal), or may be set to zero.
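The subtraction γ−K×δ with a floor for bands where γ is smaller than K×δ can be sketched as follows. The function name `spectral_subtraction`, the parameter name `floor` and the numeric values are hypothetical; a constant K is used here for simplicity, whereas the text allows K to depend on the difference between γ and δ.

```python
import numpy as np

def spectral_subtraction(gamma, delta, k=1.0, floor=0.0):
    """Per-band spectral subtraction: gamma - k * delta.

    gamma: powers of the target sound superior signal spectrum
    delta: powers of the target sound inferior signal spectrum
    floor: value used where gamma < k * delta (scalar, or one value per band)
    """
    gamma = np.asarray(gamma, dtype=float)
    delta = np.asarray(delta, dtype=float)
    scaled = k * delta
    return np.where(gamma < scaled, floor, gamma - scaled)

gamma = [10.0, 2.0]   # hypothetical superior-signal powers in two bands
delta = [3.0, 4.0]    # hypothetical inferior-signal powers in the same bands

zero_floored = spectral_subtraction(gamma, delta, k=2.0)
# Floor proportional to the superior-signal power in each band (one option in the text).
prop_floored = spectral_subtraction(gamma, delta, k=2.0,
                                    floor=0.5 * np.asarray(gamma))
```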
After the separation unit 60 separates the target sound, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed. At this time, a synthesis process of converting the target sound, which is a signal on a frequency domain obtained through the process of the separation unit 60, into a sound wave, which is a signal on a time domain, may be performed, a noise may be added, frequency analysis may be performed, and then voice recognition may be performed. Addition of a noise may be performed on a frequency domain, not on a time domain.
According to such a first embodiment, the following effects can be obtained. That is, because the sound source separation system 10 has the target sound superior signal generator 30 and the target sound inferior signal generator 40, it is possible to generate the target sound superior signal and the target sound inferior signal using the received sound signals of the two microphones 21, 22. This enables directivity control appropriate for separating the target sound and the disturbance sound from each other.
Because the sound source separation system 10 has the separation unit 60, the target sound and the disturbance sound can be precisely separated, using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal both generated by performing directivity control. Therefore, in comparison with the case like the patent literature 4 where band selection is performed using a sound-pressure-level difference of signals between the microphones originating from the fixed positional relationships of the plurality of microphones, the separation performance is improved.
The sound source separation system 10 has two microphones to be used, and sound source separation is realized with the few microphones, resulting in miniaturization of a device.
Further, because the target sound inferior signal generator 40 has the first target sound inferior signal generator 41, the second target sound inferior signal generator 42 and the changeover unit 43, a user can change over the mode between the normal mode and the changeover mode. Accordingly, the direction of the target sound to be obtained can be changed over without changing the positions of the two microphones 21, 22, so that a user-friendly system can be realized.
Still further, because the first target sound inferior signal generator 41 and the second target sound inferior signal generator 42 perform processes of applying a delay which is equal to or approximately equal to the sound wave propagation time of the distance between the two microphones 21, 22, it is possible to create a directional characteristic that the amplitude of the target sound inferior signal becomes zero in a direction in which the target sound comes from (as shown in
With reference to
The two microphones 221, 222 are both non-directional or approximately non-directional microphones in the embodiment. As indicated by a dashed line in
The target sound superior signal generator 230 performs a process of acquiring a sum of the received sound signal of the one microphone 221 and the received sound signal of the other microphone 222 on a time domain. This process may be a digital process or an analog process, and the process is executed on a time domain in the embodiment, but may be executed on a frequency domain.
The target sound inferior signal generator 240 performs a process of acquiring a difference between the received sound signal of the one microphone 221 and the received sound signal of the other microphone 222 on a time domain. This process may be a digital process or an analog process, and the process is executed on a time domain in the embodiment, but may be executed on a frequency domain.
The frequency analyzer 250 performs frequency analysis on both the target sound superior signal on a time domain generated by the target sound superior signal generator 230 and the target sound inferior signal on a time domain generated by the target sound inferior signal generator 240. Like the first embodiment, Fast Fourier Transform (FFT) and Generalized Harmonic Analysis (GHA) can be adopted as frequency analysis. Note that in a case where the target sound superior signal generator 230 and the target sound inferior signal generator 240 generate signals on a frequency domain, the frequency analyzer 250 may be omitted.
The separation unit 260 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal, and separates the target sound and a disturbance sound from each other. The schemes of band selection and spectral subtraction are the same as those of the first embodiment, thus omitting the detailed explanations.
In the embodiment, however, because the target sound superior signal generator 230 performs a process of acquiring a sum of the received sound signals of the two microphones 221, 222, the relative amplitude relationship in each direction (angle θ) between the directional characteristic of the target sound superior signal and the directional characteristic of the target sound inferior signal changes from frequency to frequency and is not stable. Therefore, when the separation unit 260 performs a process, the spectrum of the target sound superior signal is multiplied by a coefficient A(ω), the spectrum of the target sound inferior signal is multiplied by a coefficient B(ω), and then band selection or spectral subtraction is performed. Only one of A(ω) and B(ω) may be applied, as long as the relative magnitude relationship is adjusted according to the frequency.
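The sum and difference signals with frequency-dependent weighting can be sketched as below. This is an illustration under assumptions only: the function name `weighted_spectra`, the frame length and the placeholder weights of 1.0 for A(ω) and B(ω) are hypothetical (the specification does not give their values here), and the target sound is supposed to reach both microphones at the same time.

```python
import numpy as np

def weighted_spectra(x1, x2, a, b):
    """Amplitude spectra of the sum and difference signals, weighted per band.

    a, b play the roles of A(omega) and B(omega); their values are assumptions.
    """
    superior = np.abs(np.fft.rfft(x1 + x2)) * a   # A(w) * |F<X1(t)+X2(t)>|
    inferior = np.abs(np.fft.rfft(x1 - x2)) * b   # B(w) * |F<X1(t)-X2(t)>|
    return superior, inferior

n = 256
rng = np.random.default_rng(0)
s = rng.standard_normal(n)        # stand-in target waveform
n_bins = n // 2 + 1
a = np.ones(n_bins)               # placeholder A(omega)
b = np.ones(n_bins)               # placeholder B(omega)

# Target assumed to arrive at both microphones simultaneously: X1 = X2 = s.
superior, inferior = weighted_spectra(s, s, a, b)
```

In this idealized case the difference signal cancels the target entirely, so the weighted superior spectrum is at least as large as the weighted inferior spectrum in every band; for other arrival directions the balance depends on frequency, which is what A(ω) and B(ω) are meant to compensate.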
According to such a second embodiment, the sound source separation system 200 performs a separation process for the target sound and the disturbance sound as follows.
First, the target sound superior signal generator 230 generates the target sound superior signal (signal on a time domain) and the target sound inferior signal generator 240 generates the target sound inferior signal (signal on a time domain), using the received sound signals (signals on a time domain) of the two microphones 221, 222. Next, the frequency analyzer 250 performs frequency analysis on both obtained target sound superior signal and target sound inferior signal, thereby acquiring the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal.
At this time, let the received sound signal of the one microphone 221 be X1(t), and the received sound signal of the other microphone 222 be X2(t), then the target sound superior signal generator 230 acquires the sum X1(t)+X2(t) of those signals, and this sum becomes the target sound superior signal. The directional characteristic of the target sound superior signal obtained by multiplying a signal |F<X1(t)+X2(t)>|, obtained by performing frequency analysis on the sum X1(t)+X2(t) of the signals, by the coefficient A(ω) is as shown in
On the other hand, the target sound inferior signal generator 240 acquires a difference X1(t)−X2(t) between the received sound signal X1(t) of the one microphone 221 and the received sound signal X2(t) of the other microphone 222, and this difference becomes the target sound inferior signal. The directional characteristic of the target sound inferior signal obtained by multiplying a signal |F<X1(t)−X2(t)>|, obtained by performing frequency analysis on the difference X1(t)−X2(t) between those signals, by a coefficient B(ω) is as shown in
Thereafter, the separation unit 260 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal, thereby separating the target sound and the disturbance sound.
After the target sound is separated by the separation unit 260, like the first embodiment, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to such a second embodiment, the following effects can be obtained. That is, because the sound source separation system 200 has the target sound superior signal generator 230 and the target sound inferior signal generator 240, it is possible to generate the target sound superior signal and the target sound inferior signal using the received sound signals of the two microphones 221, 222. This enables directivity control appropriate for separating the target sound and the disturbance sound from each other.
Because the sound source separation system 200 has the separation unit 260, it is possible to separate the target sound and the disturbance sound precisely, using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal generated by performing directivity control. Therefore, in comparison with the case like the patent literature 4 where band selection is performed using a sound-pressure-level difference of signals between microphones originating from the fixed positional relationships of the plurality of microphones, a separation performance can be improved.
Further, according to the sound source separation system 200, the number of the microphones to be used is two, and sound source separation can be realized with the few microphones, resulting in miniaturization of a device.
With reference to
The two microphones 321, 322 are both non-directional or approximately non-directional microphones in the embodiment. As indicated by a dashed line in
The target sound superior signal generator 330 comprises a first target sound superior signal generator 331 and a second target sound superior signal generator 332.
The first target sound superior signal generator 331 performs a process of acquiring a difference between the received sound signal of the one microphone 321 and the received sound signal of the other microphone 322 that has undergone a delay process, and generating a first target sound superior signal on a time domain. The first target sound superior signal is a signal that emphasizes a sound including a target sound which comes from a space (left side space in
The second target sound superior signal generator 332 performs a process of acquiring a difference between the received sound signal of the other microphone 322 and the received sound signal of the one microphone 321 that has undergone a delay process, and generating a second target sound superior signal on a time domain. The second target sound superior signal is a signal that emphasizes a sound including the target sound which comes from a space (right space in
The target sound inferior signal generator 340 performs a process of acquiring a difference between the received sound signal of the one microphone 321 and the received sound signal of the other microphone 322, and generating a target sound inferior signal on a time domain. This process may be a digital process or an analog process, and the process is executed on a time domain in the embodiment, but may be executed on a frequency domain.
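The three time-domain signals of the third embodiment can be sketched as follows. The sketch assumes sampled signals of equal length and models the delay process D as a shift by a whole number of samples with zero padding; the function names and the parameter `n_delay` are hypothetical, and the embodiment does not state how the delay amount is chosen.

```python
from typing import List, Sequence, Tuple


def delay(x: Sequence[float], n: int) -> List[float]:
    """D(x): delay x by n samples, zero-padded at the start, same length as x."""
    if n <= 0:
        return list(x)
    keep = max(len(x) - n, 0)
    return [0.0] * min(n, len(x)) + list(x[:keep])


def third_embodiment_signals(
    x1: Sequence[float], x2: Sequence[float], n_delay: int
) -> Tuple[List[float], List[float], List[float]]:
    """Return (first superior, second superior, inferior) time-domain signals."""
    first_superior = [a - b for a, b in zip(x1, delay(x2, n_delay))]   # X1(t) - D(X2(t))
    second_superior = [b - a for a, b in zip(delay(x1, n_delay), x2)]  # X2(t) - D(X1(t))
    inferior = [a - b for a, b in zip(x1, x2)]                         # X1(t) - X2(t)
    return first_superior, second_superior, inferior


if __name__ == "__main__":
    x1 = [1.0, 2.0, 3.0, 4.0]
    x2 = delay(x1, 1)  # the same wavefront reaches the other microphone one sample later
    print(third_embodiment_signals(x1, x2, n_delay=1))
```

With the delay matched to the inter-microphone travel time, as in the usage example, a sound arriving from the side of the one microphone cancels in the second target sound superior signal and remains in the first.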
The frequency analyzer 350 performs frequency analysis on the first and second target sound superior signals on a time domain generated by the target sound superior signal generator 330 and the target sound inferior signal on a time domain generated by the target sound inferior signal generator 340. Like the first embodiment and the second embodiment, Fast Fourier Transform (FFT), Generalized Harmonic Analysis (GHA) or the like can be adopted as frequency analysis. Note that in a case where the target sound superior signal generator 330 and the target sound inferior signal generator 340 generate signals on a frequency domain, the frequency analyzer 350 may be omitted.
The separation unit 360 comprises a first separation unit 361, a second separation unit 362, and an integration unit 363.
The first separation unit 361 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) using the spectrum of the first target sound superior signal and the spectrum of the target sound inferior signal, and separates a sound including the target sound which comes from that space (left space in
The second separation unit 362 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) using the spectrum of the second target sound superior signal and the spectrum of the target sound inferior signal, and separates a sound including the target sound which comes from that space (right space in
The integration unit 363 adds powers of the spectra for each frequency band (addition) or compares powers for each frequency band and assigns inferior powers to the spectrum of the target sound (minimization), using the spectrum of a sound including the target sound which is separated by the first separation unit 361 and comes from that space (left space in
According to such a third embodiment, the sound source separation system 300 performs a separation process for the target sound and the disturbance sound as follows.
First, the first target sound superior signal generator 331 and the second target sound superior signal generator 332 generate first and second target sound superior signals (signals on a time domain), using the received sound signals (signals on a time domain) of the two microphones 321, 322, and the target sound inferior signal generator 340 generates a target sound inferior signal (signal on a time domain). Next, the frequency analyzer 350 performs frequency analysis on the obtained first and second target sound superior signals and target sound inferior signal, thereby acquiring the spectra of the first and second target sound superior signals and the spectrum of the target sound inferior signal.
At this time, let the received sound signal of the one microphone 321 be X1(t), and the received sound signal of the other microphone 322 be X2(t), then the first target sound superior signal generator 331 acquires a difference X1(t)−D(X2(t)) between the received sound signal X1(t) of the one microphone 321 and a signal D(X2(t)), which is the received sound signal X2(t) of the other microphone 322 that has undergone a delay process, and this difference becomes the first target sound superior signal. In illustrating a signal |F<X1(t)−D(X2(t))>| that is obtained by performing frequency analysis on the first target sound superior signal X1(t)−D(X2(t)), the directional characteristic of the first target sound superior signal as shown in
Further, the second target sound superior signal generator 332 acquires a difference X2(t)−D(X1(t)) between the received sound signal X2(t) of the other microphone 322 and a signal D(X1(t)), which is the received sound signal X1(t) of the one microphone 321 that has undergone a delay process, and this difference becomes the second target sound superior signal. In illustrating a signal |F<X2(t)−D(X1(t))>| obtained by performing frequency analysis on the second target sound superior signal X2(t)−D(X1(t)), the directional characteristic of the second target sound superior signal as shown in
On the other hand, the target sound inferior signal generator 340 acquires a difference X1(t)−X2(t) between the received sound signal X1(t) of the one microphone 321 and the received sound signal X2(t) of the other microphone 322, and this difference becomes the target sound inferior signal. In illustrating a signal |F<X1(t)−X2(t)>| obtained by performing frequency analysis on the difference X1(t)−X2(t) of those signals, the directional characteristic of the target sound inferior signal as shown in
Thereafter, the first separation unit 361 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) using the spectrum of the first target sound superior signal and the spectrum of the target sound inferior signal, and performs a process of separating a sound including the target sound which comes from that space (left space in
Thereafter, the integration unit 363 performs a spectrum integration process by addition or minimization, using the spectrum of the sound including the target sound which is separated by the first separation unit 361 and comes from that space (left space in
After the target sound is separated by the separation unit 360, like the first and second embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
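The spectrum integration performed by the integration unit 363 before recognition can be sketched as below, assuming that the outputs of the first and second separation units are available as lists of per-band powers of equal length. The function names are hypothetical.

```python
from typing import List, Sequence


def integrate_addition(p_first: Sequence[float], p_second: Sequence[float]) -> List[float]:
    """Addition: sum the powers of the two separated spectra for each frequency band."""
    return [a + b for a, b in zip(p_first, p_second)]


def integrate_minimization(p_first: Sequence[float], p_second: Sequence[float]) -> List[float]:
    """Minimization: for each frequency band, assign the smaller power to the target sound."""
    return [min(a, b) for a, b in zip(p_first, p_second)]


if __name__ == "__main__":
    p_first = [1.0, 4.0]    # output of the first separation unit (per-band power)
    p_second = [3.0, 2.0]   # output of the second separation unit (per-band power)
    print(integrate_addition(p_first, p_second))
    print(integrate_minimization(p_first, p_second))
```

Minimization retains a band only to the extent that both separation results contain it, whereas addition retains a band that either result contains.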
According to such a third embodiment, the following effects can be obtained. That is, because the sound source separation system 300 has the target sound superior signal generator 330 and the target sound inferior signal generator 340, it is possible to generate the target sound superior signal and the target sound inferior signal using the received sound signals of the two microphones 321, 322. This enables directivity control appropriate for separation of the target sound and the disturbance sound from each other.
Because the sound source separation system 300 has the separation unit 360, the target sound and the disturbance sound can be separated precisely, using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal generated through directivity control. Therefore, in comparison with the case like the patent literature 4 where band selection is performed using a difference in sound pressure levels of signals between the microphones originating from the fixed positional relationships of the plural microphones, separation performance can be improved.
Further, according to the sound source separation system 300, only two microphones are used, and sound source separation can be realized with few microphones, resulting in miniaturization of a device.
With reference to
The three microphones 421, 422, 423 are all non-directional or approximately non-directional microphones in the embodiment. As shown in
The target sound superior signal generator 430 performs a process of acquiring a difference between the received sound signal of the first microphone 421 and the received sound signal of the second microphone 422 on a time domain. This process may be a digital process or an analog process, and the process is executed on a time domain in the embodiment, but may be executed on a frequency domain.
The target sound inferior signal generator 440 performs a process of acquiring a difference between the received sound signal of the first microphone 421 and the received sound signal of the third microphone 423 on a time domain. This process may be a digital process or an analog process, and the process is executed on a time domain in the embodiment, but may be executed on a frequency domain.
The frequency analyzer 450 performs frequency analysis on the target sound superior signal on a time domain generated by the target sound superior signal generator 430 and the target sound inferior signal on a time domain generated by the target sound inferior signal generator 440. Like the first to third embodiments, Fast Fourier Transform (FFT), Generalized Harmonic Analysis (GHA) or the like can be adopted as frequency analysis. Note that in a case where the target sound superior signal generator 430 and the target sound inferior signal generator 440 generate signals on a frequency domain, the frequency analyzer 450 can be omitted.
The separation unit 460 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal, and performs a process of separating the target sound and the disturbance sound from each other. The schemes of band selection and spectral subtraction are the same as those of the first embodiment, thus omitting the detailed explanations.
According to the fourth embodiment, the sound source separation system 400 performs a separation process for the target sound and the disturbance sound as follows.
First, the target sound superior signal generator 430 generates a target sound superior signal (signal on a time domain) using the received sound signals (signals on a time domain) of the first and second microphones 421, 422, and the target sound inferior signal generator 440 generates a target sound inferior signal (signal on a time domain) using the received sound signals (signals on a time domain) of the first and third microphones 421, 423. Subsequently, the frequency analyzer 450 performs frequency analysis on both obtained target sound superior signal and target sound inferior signal, thereby acquiring the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal.
At this time, let the received sound signal of the first microphone 421 be X1(t), and the received sound signal of the second microphone 422 be X2(t), then the target sound superior signal generator 430 acquires a difference X1(t)−X2(t) between those signals, and this difference becomes the target sound superior signal. In illustrating a signal |F<X1(t)−X2(t)>| obtained by performing frequency analysis on the difference X1(t)−X2(t) between those signals, the directional characteristic of the target sound superior signal indicated by solid lines in
Let the received sound signal of the first microphone 421 be X1(t), and the received sound signal of the third microphone 423 be X3(t), then the target sound inferior signal generator 440 acquires a difference X1(t)−X3(t) between those signals, and this difference becomes the target sound inferior signal. In illustrating a signal |F<X1(t)−X3(t)>| obtained by performing frequency analysis on the difference X1(t)−X3(t) between those signals, the directional characteristic of the target sound inferior signal indicated by dotted lines in
Thereafter, the separation unit 460 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal, thereby separating the target sound and the disturbance sound from each other.
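The processing chain of the fourth embodiment, from the three received sound signals to the separated spectrum, can be sketched end to end. The sketch assumes sampled signals of equal length, uses a plain discrete Fourier transform as a stand-in for the FFT, and applies a simple form of maximum level band selection in which a band is kept only where the target sound superior signal has the larger power. The function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def power_spectrum(x: Sequence[float]) -> List[float]:
    """Power of each non-negative frequency bin of a plain DFT (stand-in for the FFT)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def separate_fourth_embodiment(
    x1: Sequence[float], x2: Sequence[float], x3: Sequence[float]
) -> List[float]:
    """Return the per-band power assigned to the target sound."""
    superior = [a - b for a, b in zip(x1, x2)]   # X1(t) - X2(t)
    inferior = [a - c for a, c in zip(x1, x3)]   # X1(t) - X3(t)
    p_sup = power_spectrum(superior)
    p_inf = power_spectrum(inferior)
    # Band selection sketch: keep the band only where the superior power is larger.
    return [s if s > i else 0.0 for s, i in zip(p_sup, p_inf)]


if __name__ == "__main__":
    x1 = [1.0, 0.0, -1.0, 0.0] * 2   # synthetic test tone, not measured data
    x2 = [0.0] * 8
    x3 = list(x1)
    print(separate_fourth_embodiment(x1, x2, x3))
```

The direct DFT is quadratic in the frame length and is used here only to keep the sketch free of external libraries.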
After the target sound is separated by the separation unit 460, like the first to third embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to such a fourth embodiment, the following effects can be obtained. That is, because the sound source separation system 400 has the target sound superior signal generator 430 and the target sound inferior signal generator 440, it is possible to generate the target sound superior signal and the target sound inferior signal using the received sound signals of the three microphones 421, 422, and 423. This enables directivity control appropriate for separation of the target sound and the disturbance sound from each other.
Because the sound source separation system 400 has the separation unit 460, the target sound and the disturbance sound can be separated precisely using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal generated through directivity control. Therefore, in comparison with the case like the patent literature 4 where band selection is performed using a difference in sound pressure levels of signals between microphones originating from the fixed positional relationship of the plural microphones, separation performance can be improved.
Further, according to the sound source separation system 400, only three microphones are used, and sound source separation can be realized with few microphones, resulting in miniaturization of a device.
With reference to
The first to fourth microphones 521 to 524 are all non-directional or approximately non-directional microphones in the embodiment. The first and second microphones 521, 522 are disposed side by side in the direction in which the target sound comes from or in an approximately same direction as that direction, and this direction is set as the first direction in the embodiment. The third and fourth microphones 523, 524 are disposed side by side in a direction orthogonal to or approximately orthogonal to the direction in which the target sound comes from, and this direction is set as the second direction in the embodiment. In a case where those four microphones 521 to 524 are provided on a cellular phone that is a portable device, for example, the first microphone 521 is provided at a front face side, the second microphone 522 is provided at a rear face side, and the third and fourth microphones are provided at right and left side portions. In a case where the cellular phone is used in a folded state, as shown in
According to the fifth embodiment, the function of the first microphone 421 in the fourth embodiment (see,
According to the embodiment, the four microphones 521 to 524 are disposed in such a way that a line connecting the first microphone 521 and the second microphone 522 (not including an extended portion) and a line connecting the third microphone 523 and the fourth microphone 524 (not including an extended portion) intersect with each other, i.e., form a cross, but may not intersect with each other, and in a word, those microphones may be disposed in such a manner as to form the first direction and the second direction intersecting (intersecting at a right angle or approximately right angle in the embodiment) with each other.
The target sound superior signal generator 530 performs a process of acquiring a difference between the received sound signal of the first microphone 521 and the received sound signal of the second microphone 522 on a time domain. This process may be a digital process or an analog process, and the process is executed on a time domain in the embodiment, but may be executed on a frequency domain.
The target sound inferior signal generator 540 performs a process of acquiring a difference between the received sound signal of the third microphone 523 and the received sound signal of the fourth microphone 524 on a time domain. This process may be a digital process or an analog process, and the process is executed on a time domain in the embodiment, but may be executed on a frequency domain.
The frequency analyzer 550 performs frequency analysis on the target sound superior signal on a time domain generated by the target sound superior signal generator 530 and the target sound inferior signal on a time domain generated by the target sound inferior signal generator 540. Like the first to fourth embodiments, Fast Fourier Transform (FFT), Generalized Harmonic Analysis (GHA) or the like can be adopted as frequency analysis. Note that in a case where the target sound superior signal generator 530 and the target sound inferior signal generator 540 generate signals on a frequency domain, the frequency analyzer 550 may be omitted.
The separation unit 560 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal, thereby separating the target sound and the disturbance sound from each other. The schemes of band selection and spectral subtraction are the same as those of the first embodiment, thus omitting the detailed explanations.
According to such a fifth embodiment, the sound source separation system 500 performs a separation process for the target sound and the disturbance sound as follows.
First, the target sound superior signal generator 530 generates a target sound superior signal (signal on a time domain) using the received sound signals (signals on a time domain) of the first and second microphones 521, 522, and the target sound inferior signal generator 540 generates a target sound inferior signal (signal on a time domain) using the received sound signals (signals on a time domain) of the third and fourth microphones 523, 524. Subsequently, the frequency analyzer 550 performs frequency analysis on the obtained target sound superior signal and target sound inferior signal, thereby acquiring the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal.
At this time, let the received sound signal of the first microphone 521 be X1(t), and the received sound signal of the second microphone 522 be X2(t), then the target sound superior signal generator 530 acquires a difference X1(t)−X2(t) between those signals, and this difference becomes the target sound superior signal. In illustrating a signal |F<X1(t)−X2(t)>| obtained by performing frequency analysis on the difference X1(t)−X2(t) between those signals, the directional characteristic of the target sound superior signal indicated by solid lines in
On the other hand, let the received sound signal of the third microphone 523 be X3(t), and the received sound signal of the fourth microphone 524 be X4(t), then the target sound inferior signal generator 540 acquires a difference X3(t)−X4(t), and this difference becomes the target sound inferior signal. In illustrating a signal |F<X3(t)−X4(t)>| obtained by performing frequency analysis on the difference X3(t)−X4(t) between those signals, the directional characteristic of the target sound inferior signal indicated by dotted lines in
Thereafter, the separation unit 560 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal, thereby separating the target sound and the disturbance sound from each other.
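Why the pair disposed along the first direction yields a target sound superior signal and the pair disposed along the second direction yields a target sound inferior signal can be illustrated numerically. The sketch below is an idealization that is not stated in the embodiment: it assumes a far-field plane wave, identical non-directional microphones, and free-field propagation at a speed of sound `c`. Under these assumptions, the magnitude of the spectrum of the difference of a microphone pair, relative to that of a single microphone, is 2|sin(π f d cos θ / c)|, where d is the microphone spacing and θ is the arrival angle measured from the axis of the pair. The function name and the numerical values are hypothetical.

```python
import math


def pair_response(freq_hz: float, spacing_m: float, angle_rad: float, c: float = 343.0) -> float:
    """Relative magnitude of |F<Xa(t) - Xb(t)>| for a plane wave arriving at angle_rad
    from the axis through the two microphones (idealized far-field model)."""
    return 2.0 * abs(math.sin(math.pi * freq_hz * spacing_m * math.cos(angle_rad) / c))


if __name__ == "__main__":
    f, d = 1000.0, 0.02  # hypothetical frequency (Hz) and microphone spacing (m)
    # Target sound arriving along the first direction:
    print(pair_response(f, d, 0.0))            # pair on the first direction: clearly non-zero
    print(pair_response(f, d, math.pi / 2.0))  # pair on the second direction: close to zero
```

In this model a sound arriving along the first direction is retained by the difference X1(t)−X2(t) and almost cancelled by the difference X3(t)−X4(t), while a sound arriving along the second direction behaves the other way round.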
After the target sound is separated by the separation unit 560, like the first to fourth embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to such a fifth embodiment, the following effects can be obtained. That is, because the sound source separation system 500 has the target sound superior signal generator 530 and the target sound inferior signal generator 540, it is possible to generate the target sound superior signal and the target sound inferior signal using the received sound signals of the four microphones 521 to 524. This enables directivity control appropriate for separation of the target sound and the disturbance sound from each other.
Because the sound source separation system 500 has the separation unit 560, the target sound and the disturbance sound can be separated precisely using the spectrum of the target sound superior signal and the spectrum of the target sound inferior signal generated through directivity control. Accordingly, in comparison with the case like the patent literature 4 where band selection is performed using a difference of sound pressure levels of signals between microphones originating from the fixed positional relationships between the plural microphones, separation performance can be improved.
According to the sound source separation system 500, only four microphones are used, and sound source separation can be realized with few microphones, resulting in miniaturization of a device.
With reference to
All of the first to fourth microphones 621 to 624 are non-directional or approximately non-directional microphones in the embodiment. The first and second microphones 621, 622 are disposed side by side in a target sound coming direction or in the direction approximate to the same, while the third microphone 623 is disposed on one side (left side in
The target sound superior signal generator 630 performs a process of acquiring a difference between the received sound signals of the first and second microphones 621, 622. This process may be a digital process or an analog process, and the process is executed on a time domain in the embodiment, but may be executed on a frequency domain.
The target sound inferior signal generator 640 comprises a first target sound inferior signal generator 641 and a second target sound inferior signal generator 642.
The first target sound inferior signal generator 641 performs a process of acquiring a difference between the received sound signals of the first and third microphones 621, 623 on a time domain and generating a first target sound inferior signal. The first target sound inferior signal is a signal that suppresses a sound coming from a space at one side of the target sound coming direction, i.e., the space (left space in
The second target sound inferior signal generator 642 performs a process of acquiring a difference between the received sound signals of the first and fourth microphones 621, 624 on a time domain and generating a second target sound inferior signal. The second target sound inferior signal is a signal that suppresses a sound coming from the other side of the target sound coming direction, i.e., from a space where the fourth microphone 624 is provided (right space in
The frequency analyzer 650 performs frequency analyses on the target sound superior signal on a time domain generated by the target sound superior signal generator 630 and the first and second target sound inferior signals on a time domain generated by the target sound inferior signal generator 640. Like the first to fifth embodiments, Fast Fourier Transform (FFT), Generalized Harmonic Analysis (GHA) or the like can be adopted as frequency analyses. Note that in a case where signals on a frequency domain are generated by the target sound superior signal generator 630 and the target sound inferior signal generator 640, the frequency analyzer 650 can be omitted.
The separation unit 660 comprises a first separation unit 661, a second separation unit 662, and an integration unit 663.
The first separation unit 661 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) by using the target sound superior signal spectrum and the first target sound inferior signal spectrum to perform a separation process for the sound including the target sound coming from the one side, i.e., from the space (left space in
The second separation unit 662 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) by using the target sound superior signal spectrum and the second target sound inferior signal spectrum to perform a separation process for a sound including the target sound coming from the other side, i.e., from the space (right space in
Using the spectrum of the sound including the target sound separated by the first separation unit 661 and coming from the one side, i.e., from the space (left space in
According to such a sixth embodiment, the sound source separation system 600 performs a separation process for the target sound and the disturbance sound as follows.
First, using the received sound signals (signals on a time domain) of the first and second microphones 621, 622, the target sound superior signal (a signal on a time domain) is generated by the target sound superior signal generator 630, while using the received sound signals (signals on a time domain) of the first, third and fourth microphones 621, 623, 624, the first and second target sound inferior signals (signals on a time domain) are generated by the target sound inferior signal generator 640. Subsequently, the frequency analyzer 650 performs frequency analyses on the obtained target sound superior signal and first and second target sound inferior signals, thereby acquiring the spectrum of the target sound superior signal and the spectra of the first and second target sound inferior signals.
Let the received signal of the first microphone 621 be X1(t), and the received sound signal of the second microphone 622 be X2(t), then a difference X1(t)−X2(t) between these signals is acquired by the target sound superior signal generator 630, and the difference becomes the target sound superior signal. In illustrating a signal |F<X1(t)−X2(t)>| obtained by performing frequency analysis on the difference X1(t)−X2(t) between these signals, the directional characteristics of the target sound superior signal indicated by solid lines in
On the other hand, let the received sound signal of the first microphone 621 be X1(t), and the received sound signal of the third microphone 623 be X3(t), then a difference X1(t)−X3(t) between these signals is acquired by the first target sound inferior signal generator 641, and the difference becomes the first target sound inferior signal. In illustrating a signal |F<X1(t)−X3(t)>| obtained by performing frequency analysis on the difference X1(t)−X3(t) between these signals, the directional characteristics of the first target sound inferior signal indicated by dotted lines in
Further, let the received signal of the first microphone 621 be X1(t), and the received signal of the fourth microphone 624 be X4(t), then a difference X1(t)−X4(t) between these signals is acquired by the second target sound inferior signal generator 642, and the difference becomes the second target sound inferior signal. In illustrating a signal |F<X1(t)−X4(t)>| obtained by performing frequency analysis on the difference X1(t)−X4(t) between these signals, the directional characteristics of the second target sound inferior signal indicated by dashed lines in
Thereafter, the first separation unit 661 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) by using the target sound superior signal spectrum and the first target sound inferior signal spectrum, and performs a process of separating the sound including the target sound coming from the one side, i.e., from the space (left space in
Using the spectrum of the sound including the target sound separated by the first separation unit 661 and coming from the one side, i.e., from the space (left space in
After the separation unit 660 has separated the target sound, like the first to fifth embodiments, voice recognition can be performed, using an acoustic model obtained by performing an adaptation process or a learning process beforehand.
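The data flow inside the separation unit 660, in which one target sound superior signal spectrum is shared by two separation units whose outputs are then integrated, can be sketched as follows. The sketch assumes per-band power lists, uses spectral subtraction with flooring at zero in both separation units, and exposes the choice between addition and minimization as a parameter. The function names and the parameters `alpha` and `mode` are hypothetical.

```python
from typing import List, Sequence


def spectral_subtraction(
    superior: Sequence[float], inferior: Sequence[float], alpha: float = 1.0
) -> List[float]:
    """Subtract the scaled inferior power from the superior power, floored at zero."""
    return [max(s - alpha * i, 0.0) for s, i in zip(superior, inferior)]


def separate_sixth_embodiment(
    superior: Sequence[float],
    inferior_first: Sequence[float],
    inferior_second: Sequence[float],
    mode: str = "minimization",
) -> List[float]:
    """Two separation passes against the same superior spectrum, then integration."""
    first = spectral_subtraction(superior, inferior_first)    # first separation unit
    second = spectral_subtraction(superior, inferior_second)  # second separation unit
    if mode == "addition":
        return [a + b for a, b in zip(first, second)]
    if mode == "minimization":
        return [min(a, b) for a, b in zip(first, second)]
    raise ValueError("mode must be 'addition' or 'minimization'")


if __name__ == "__main__":
    p_sup = [5.0, 5.0, 5.0]
    p_inf1 = [1.0, 6.0, 0.0]
    p_inf2 = [0.0, 1.0, 6.0]
    print(separate_sixth_embodiment(p_sup, p_inf1, p_inf2))
    print(separate_sixth_embodiment(p_sup, p_inf1, p_inf2, mode="addition"))
```

With minimization, a band survives only if neither target sound inferior signal dominates it, which suppresses disturbance sounds coming from either side.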
According to such a sixth embodiment, the following effects can be obtained. Namely, because the sound source separation system 600 has the target sound superior signal generator 630 and the target sound inferior signal generator 640, the target sound superior signal and the first and second target sound inferior signals can be generated using the received sound signals of the four microphones 621 to 624. This enables directivity control appropriate for separating the target sound and the disturbance sound.
Further, because the sound source separation system 600 has the separation unit 660, the target sound and the disturbance sound can be separated precisely, using the spectrum of the target sound superior signal and the spectra of the first and second target sound inferior signals generated through directivity control. Consequently, separation performance can be improved as compared to the case like the patent literature 4 where band selection is performed by using a difference of sound pressure levels of signals between microphones originating from a relationship between fixed positions of a plurality of microphones.
Furthermore, only four microphones are used in the sound source separation system 600, and sound source separation can be realized with few microphones, resulting in miniaturization of a device.
With reference to
All of the first to third microphones 721 to 723 are non-directional or approximately non-directional microphones in the embodiment. The first and second microphones 721, 722 are disposed side by side in an inclined direction (a diagonally-right-up direction in
The target sound superior signal generator 730 performs a process of acquiring a difference between the received sound signal of the first microphone 721 and a value, obtained by multiplying the sum of the received sound signals of the second and third microphones 722, 723 by a proportional coefficient k, on a time domain. This process may be a digital process or an analog process. The process is executed on a time domain in the embodiment, but may be executed on a frequency domain. In addition, in a case where the three microphones 721, 722, 723 are disposed at vertices of a triangle that is not an isosceles triangle, in acquiring a difference between the received sound signal of the first microphone 721 and that value, a sum of a value obtained by multiplying the received sound signal of the second microphone 722 by a proportional coefficient k1, and a value obtained by multiplying the received sound signal of the third microphone 723 by a proportional coefficient k2 is used instead of a value obtained by multiplying the sum of the received sound signals of the second and third microphones 722, 723 by the proportional coefficient k.
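The two forms of the target sound superior signal described above can be written compactly. The sketch assumes sampled time-domain signals of equal length; the function name is hypothetical, and the coefficient values used in the example are placeholders, since the embodiment does not give numerical values for k, k1, or k2.

```python
from typing import List, Optional, Sequence


def target_sound_superior_signal(
    x1: Sequence[float],
    x2: Sequence[float],
    x3: Sequence[float],
    k1: float,
    k2: Optional[float] = None,
) -> List[float]:
    """X1(t) - (k1*X2(t) + k2*X3(t)); with k2 omitted this is X1(t) - k*(X2(t) + X3(t))."""
    if k2 is None:
        k2 = k1  # single coefficient k for both microphones (isosceles arrangement)
    return [a - (k1 * b + k2 * c) for a, b, c in zip(x1, x2, x3)]


if __name__ == "__main__":
    x1, x2, x3 = [2.0, 4.0], [1.0, 1.0], [1.0, 3.0]
    print(target_sound_superior_signal(x1, x2, x3, k1=0.5))          # single coefficient k
    print(target_sound_superior_signal(x1, x2, x3, k1=1.0, k2=0.0))  # separate k1 and k2
```

Using separate coefficients k1 and k2 lets the contributions of the second and third microphones be weighted individually when their distances from the first microphone differ.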
The target sound inferior signal generator 740 comprises a first target sound inferior signal generator 741 and a second target sound inferior signal generator 742.
The first target sound inferior signal generator 741 performs a process of acquiring a difference between the received sound signals of the first and second microphones 721, 722 on a time domain, and of generating a first target sound inferior signal. The first target sound inferior signal is a signal that suppresses a sound coming from one side of the target sound coming direction, i.e., from the space (left space in
The second target sound inferior signal generator 742 performs a process of acquiring a difference between the received sound signals of the first and third microphones 721, 723 on a time domain and of generating a second target sound inferior signal. The second target sound inferior signal is a signal that suppresses a sound coming from the other side of the target sound signal coming direction, i.e., from the space (right space in
The frequency analyzer 750 performs frequency analysis on the target sound superior signal on a time domain generated by the target sound superior signal generator 730 and on the first and second target sound inferior signals on a time domain generated by the target sound inferior signal generator 740. Like the first to sixth embodiments, Fast Fourier Transform (FFT), Generalized Harmonic Analysis (GHA) or the like can be adopted as the frequency analysis. When signals on a frequency domain are generated by the target sound superior signal generator 730 and the target sound inferior signal generator 740, the frequency analyzer 750 can be omitted.
The separation unit 760 comprises a first separation unit 761, a second separation unit 762, and an integration unit 763.
The first separation unit 761 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) by using the target sound superior signal spectrum and a first target sound inferior signal spectrum and performs a process of separating the sound including the target sound coming from one side, i.e., from the space (left space in
The second separation unit 762 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) by using the spectra of the target sound superior signal and second target sound inferior signal, and performs a process of separating the sound including the target sound coming from the other side, i.e., from the space (right space in
Using the spectrum of the sound including the target sound separated by the first separation unit 761 and coming from one side, i.e., from the space (left space in
According to the seventh embodiment, the sound source separation system 700 separates the target sound and a disturbance sound in the following manner.
First, using the received sound signals (signals on a time domain) of the first, second and third microphones 721, 722, 723, the target sound superior signal (signal on a time domain) is generated by the target sound superior signal generator 730, while using the same received sound signals, the first and second target sound inferior signals (signals on a time domain) are generated by the target sound inferior signal generator 740. Subsequently, the frequency analyzer 750 performs frequency analysis on the obtained target sound superior signal and first and second target sound inferior signals, thereby acquiring the target sound superior signal spectrum and the first and second target sound inferior signal spectra.
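As a rough sketch of this first stage, the following code forms the three time-domain signals and applies a short-time FFT to each. The frame length, hop size, Hann window, the value of k, and the function names are all assumptions made for illustration; the embodiment only states that FFT, GHA or the like may be used.

```python
import numpy as np

def frame_spectra(x, frame_len=256, hop=128):
    """Split x into overlapping Hann-windowed frames and FFT each frame."""
    x = np.asarray(x, dtype=float)
    if x.size < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (x.size - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, frame_len // 2 + 1)

def analyze_seventh(x1, x2, x3, k=0.5, frame_len=256, hop=128):
    """Return spectra of the superior and the two inferior signals."""
    x1, x2, x3 = (np.asarray(x, dtype=float) for x in (x1, x2, x3))
    superior = x1 - k * (x2 + x3)   # target sound superior signal
    inferior1 = x1 - x2             # first target sound inferior signal
    inferior2 = x1 - x3             # second target sound inferior signal
    return tuple(
        frame_spectra(s, frame_len, hop) for s in (superior, inferior1, inferior2)
    )
```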
At this time, let the received sound signals of the first, second and third microphones 721, 722, 723 be X1(t), X2(t), X3(t), respectively, then X1(t)−k(X2(t)+X3(t)) is acquired using these signals by the target sound superior signal generator 730, and this becomes the target sound superior signal. In illustrating a signal |F<X1(t)−k(X2(t)+X3(t))>| obtained by performing frequency analysis on the target sound superior signal X1(t)−k(X2(t)+X3(t)), the directional characteristics of the target sound superior signal indicated by solid lines in
Let the received sound signals of the first and second microphones 721, 722 be X1(t), X2(t), respectively, then a difference X1(t)−X2(t) between these signals is acquired by the first target sound inferior signal generator 741, and the difference becomes the first target sound inferior signal. In illustrating a signal |F<X1(t)−X2(t)>| obtained by performing frequency analysis on the difference X1(t)−X2(t) between these signals, the directional characteristics of the first target sound inferior signal indicated by dotted lines in
Further, let the received sound signals of the first and third microphones 721, 723 be X1(t), X3(t), respectively, then a signal difference X1(t)−X3(t) between these signals is acquired by the second target sound inferior signal generator 742, and the difference becomes the second target sound inferior signal. In illustrating a signal |F<X1(t)−X3(t)>| obtained by performing frequency analysis on the signal difference X1(t)−X3(t) between these signals, directional characteristics of the second target sound inferior signal indicated by dashed lines in
Thereafter, the first separation unit 761 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) by using the target sound superior signal spectrum and the first target sound inferior signal spectrum, and performs a process of separating the sound including the target sound coming from one side, i.e., from the space (left space in
Then, using a spectrum of the sound including the target sound separated by the first separation unit 761 and coming from one side, i.e., from the space (left space in
After the separation unit 760 has separated the target sound, like the first to sixth embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process can be performed.
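The two separation options named above, BS-MAX and SS, can be illustrated per frequency band as below. This passage does not spell out either rule, so the sketch rests on assumptions: BS-MAX is taken to keep a band of the superior spectrum only where its power exceeds that of the inferior spectrum, and SS is taken to subtract magnitudes, floor the result at zero, and reuse the phase of the superior spectrum. The function names and the `alpha` parameter are hypothetical.

```python
import numpy as np

def bs_max(superior, inferior):
    """Assumed BS-MAX: keep superior bands whose power is the larger one."""
    superior = np.asarray(superior, dtype=complex)
    inferior = np.asarray(inferior, dtype=complex)
    keep = np.abs(superior) ** 2 > np.abs(inferior) ** 2
    return np.where(keep, superior, 0)

def spectral_subtraction(superior, inferior, alpha=1.0):
    """Assumed SS: magnitude subtraction floored at zero, superior phase kept."""
    superior = np.asarray(superior, dtype=complex)
    inferior = np.asarray(inferior, dtype=complex)
    magnitude = np.maximum(np.abs(superior) - alpha * np.abs(inferior), 0.0)
    return magnitude * np.exp(1j * np.angle(superior))
```

Either function would be applied once with the first inferior spectrum (first separation unit 761) and once with the second (second separation unit 762).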
According to such a seventh embodiment, the following effects can be achieved. Namely, because the sound source separation system 700 has the target sound superior signal generator 730 and the target sound inferior signal generator 740, the target sound superior signal and the first and second target sound inferior signals can be generated using the received sound signals of the three microphones 721 to 723. This enables directivity control appropriate for separation of the target sound and the disturbance sound.
Further, because the sound source separation system 700 has the separation unit 760, the target sound and the disturbance sound can be separated precisely using the target sound superior signal spectrum and the first and second target sound inferior signal spectra, which have undergone directivity control. Consequently, the separation function can be improved as compared to a case like the patent literature 4 where band selection is performed using a difference of sound pressure levels of signals between microphones originating from the fixed positional relationship of a plurality of microphones.
Furthermore, the number of the microphones used in the sound source separation system 700 is three, and sound source separation is realized with few microphones, resulting in miniaturization of a device.
With reference to
The sound source separation system 1000 also comprises a first sensitive region formation signal generator 1001 which generates a first sensitive region formation signal spectrum for forming, by using received sound signals of the two first and second microphones 1021, 1022, a first sensitive region along a surface C1 (see,
The first sensitive region formation signal generator 1001 performs the same processes as those of the sound source separation system 300 (see,
The second sensitive region formation signal generator 1002 performs the same processes as those of the sound source separation system 300 (see
Using the first sensitive region formation signal spectrum S1 generated by the first sensitive region formation signal generator 1001 and the second sensitive region formation signal spectrum S2 generated by the second sensitive region formation signal generator 1002, the sensitive region integration unit 1003 compares powers for each frequency band and performs a spectrum integration process (minimization) of assigning the inferior powers to a spectrum S3 of the target sound. Specifically, as shown in
According to such an eighth embodiment, the sound source separation system 1000 performs a process of separating the target sound and a disturbance sound in the following manner.
First, using the received sound signals (signals on a time domain) of the two first and second microphones 1021, 1022, the first and second target sound superior signals (signals on a time domain) are generated by the first and second target sound superior signal generators 331, 332 of the first sensitive region formation signal generator 1001, and the target sound inferior signal (signal on a time domain) is generated by the target sound inferior signal generator 340 of the first sensitive region formation signal generator 1001. Subsequently, the frequency analyzer 350 performs frequency analysis on the obtained first and second target sound superior signals and target sound inferior signal, to acquire first and second target sound superior signal spectra and a target sound inferior signal spectrum.
On this occasion, let the received sound signals of the first and second microphones 1021, 1022 be X1(t), X2(t), respectively, then a difference X1(t)−D(X2(t)) between the received sound signal X1(t) of the first microphone 1021 and a signal D(X2(t)) generated by performing a delayed process on the received sound signal X2(t) of the second microphone 1022 is acquired by the first target sound superior signal generator 331, and this difference becomes the first target sound superior signal. In illustrating a signal |F<X1(t)−D(X2(t))>| obtained by performing frequency analysis on the first target sound superior signal X1(t)−D(X2(t)), the directional characteristic of the first target sound superior signal indicated by a solid (heavy) line in
Further, a difference X2(t)−D(X1(t)) between the received sound signal X2(t) of the second microphone 1022 and a signal D(X1(t)) generated by performing a delayed process on the received sound signal X1(t) of the first microphone 1021 is acquired by the second target sound superior signal generator 332, and this difference becomes a second target sound superior signal. In illustrating a signal |F<X2(t)−D(X1(t))>| obtained by performing frequency analysis on the second target sound superior signal X2(t)−D(X1(t)), the directional characteristic of the second target sound superior signal indicated by a dashed (heavy) line in
A difference X1(t)−X2(t) between the received signals X1(t), X2(t) of the first and second microphones 1021, 1022 is acquired by the target sound inferior signal generator 340, and this difference becomes the target sound inferior signal. In illustrating a signal |F<X1(t)−X2(t)>| obtained by performing frequency analysis on the difference X1(t)−X2(t) between these signals, likewise the case shown in
Thereafter, the first separation unit 361 of the first sensitive region formation signal generator 1001 performs maximum level band selection (BS-MAX) or spectral subtraction (SS) by using the spectra of the first target sound superior signal and target sound inferior signal, and performs a separation process for a sound including the target sound coming from the space (left space in
Then, using the spectra of the sound including the target sound separated by the first separation unit 361 and coming from the space (left space in
In parallel with the foregoing process by the first sensitive region formation signal generator 1001, a process by the second sensitive region formation signal generator 1002 is performed by the same procedure as that of the first sensitive region formation signal generator 1001 to generate a spectrum S2 of the second sensitive region formation signal. At this time, the directional characteristic of each signal generated by the second sensitive region formation signal generator 1002 becomes one shown in
Thereafter, using the first sensitive region formation signal spectrum S1 generated by the first sensitive region formation signal generator 1001 and the second sensitive region formation signal spectrum S2 generated by the second sensitive region formation signal generator 1002, the sensitive region integration unit 1003 compares powers for each frequency band, and performs the spectrum integration process (minimization) of assigning the inferior power to the spectrum S3 of the target sound. At this time, in performing the spectrum integration process through minimization, at the common part (intersecting part) of the first sensitive region formed along the plane C1 of the center of the first sensitive region and second sensitive region formed along the plane C2 of the center of the second sensitive region, a sensitive region subsequent to spectrum integration is formed. Namely, as shown in
After the sensitive region integration unit 1003 has separated the target sound, like the first to seventh embodiments, voice recognition can be performed using an acoustic model obtained by performing an adaptation process or a learning process beforehand.
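The spectrum integration process (minimization) performed by the sensitive region integration unit 1003 can be sketched as a per-band choice between S1 and S2. The function name is hypothetical, and keeping the complex value (phase included) of whichever spectrum has the smaller power, as well as preferring S1 when the powers are equal, are assumptions; the text only states that the inferior power is assigned to S3.

```python
import numpy as np

def integrate_min(s1, s2):
    """Per band, assign to S3 the spectrum value with the smaller power."""
    s1 = np.asarray(s1, dtype=complex)
    s2 = np.asarray(s2, dtype=complex)
    # Ties go to s1; the source does not say how equal powers are handled.
    return np.where(np.abs(s1) ** 2 <= np.abs(s2) ** 2, s1, s2)
```

Because a band survives with high power only if it is strong in both S1 and S2, the result is sensitive only where the two sensitive regions overlap.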
According to such an eighth embodiment, the following effects can be achieved. Namely, because the sound source separation system 1000 has the first sensitive region formation signal generator 1001, the second sensitive region formation signal generator 1002 and the sensitive region integration unit 1003, the sensitive region can be formed by performing directivity control appropriate for separation of the target sound and the disturbance sound using the received sound signals of the three microphones 1021, 1022, 1023. This results in precise separation of the target sound and the disturbance sound.
Furthermore, the number of the microphones to be used in the sound source separation system 1000 is three, and sound source separation is realized with few microphones, resulting in miniaturization of a device.
With reference to
The sound source separation system 1100 also comprises a first sensitive region formation signal generator 1101 which generates a first sensitive region formation signal spectrum for forming, by using received sound signals of the two first and second microphones 1121, 1122, a first sensitive region along a surface C1 (the same as in
Like the first sensitive region formation signal generator 1001 in the eighth embodiment, the first sensitive region formation signal generator 1101 performs the same processes as those of the sound source separation system 300 (see,
Although the second sensitive region formation signal generator 1102 has approximately the same configuration as that of the second sensitive region formation signal generator 1002 in the eighth embodiment, the second sensitive region formation signal generator 1102 has a partially different configuration. Namely, the separation unit 360A of the second sensitive region formation signal generator 1002 in the eighth embodiment has the integration unit 363A which performs the spectrum integration process, but the separation unit 360B of the second sensitive region formation signal generator 1102 in the embodiment has a sensitive region limitation unit 1104, instead of the integration unit 363A. The other configurations are the same as those of the second sensitive region formation signal generator 1002 in the eighth embodiment, the same processes other than the spectrum integration process as those of the sound source separation system 300 (see,
The sensitive region limitation unit 1104 performs the sensitive region limitation process of limiting the second sensitive region to either of a region on a second microphone 1122 side or a region on a third microphone 1123 side. Namely, the sensitive region limitation unit 1104 limits the second sensitive region to either one of the regions with the surface C2 (see
More specifically, when limiting the second sensitive region to the second microphone 1122 side, the sensitive region limitation unit 1104 performs the following process. Namely, the sensitive region limitation unit 1104 compares powers at the same frequency band for each frequency band between a spectrum SA of a sound on one side (third microphone 1123 side) including the target sound separated by the first separation unit 361B of the second sensitive region formation signal generator 1102 and a spectrum SB of a sound on the other side (second microphone 1122 side) including the target sound separated by the second separation unit 362B of the second sensitive region formation signal generator 1102. With respect to a frequency band where power of the spectrum SA of the sound on one side (the third microphone 1123 side) including the target sound separated by the first separation unit 361B is smaller than power of the spectrum SB of the sound on the other side (the second microphone 1122 side) including the target sound separated by the second separation unit 362B, the sensitive region limitation unit 1104 performs minimum level band selection (BS-MIN) of assigning the smaller power to the spectrum SA, and causes the obtained spectrum (part of the spectrum SA before the process) to serve as the spectrum S2 of the second sensitive region formation signal.
As shown in, for example,
In focusing the spectrum SA of the sound on the one side (third microphone 1123 side) including the target sound separated by the first separation unit 361B, performing minimum level band selection (BS-MIN), and causing the spectrum (part of the spectrum SA before the processing) thus obtained to serve as the spectrum S2 of the second sensitive region formation signal, a sound in the part H in
On the contrary, when limiting the second sensitive region to a region where the third microphone 1123 is provided, the sensitive region limitation unit 1104 performs the following process. Namely, the sensitive region limitation unit 1104 compares powers at the same frequency band for each frequency band between the spectrum SA of the sound on one side (third microphone 1123 side) including the target sound separated by the first separation unit 361B of the second sensitive region formation signal generator 1102 and the spectrum SB of the sound on the other side (second microphone 1122 side) including the target sound separated by the second separation unit 362B of the second sensitive region formation signal generator 1102. With respect to a frequency band where the power of the spectrum SB of the sound on the other side (the second microphone 1122 side) including the target sound separated by the second separation unit 362B is smaller than that of the spectrum SA of the sound on the one side (the third microphone 1123 side) including the target sound separated by the first separation unit 361B, the sensitive region limitation unit 1104 performs minimum level band selection (BS-MIN) of assigning the smaller power to the spectrum SB, and causes the obtained spectrum (part of the spectrum SB before processing) to serve as the spectrum S2 of the second sensitive region formation signal.
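The two limitation cases described above differ only in which spectrum is focused on, so they can be sketched with one function and a switch. The function name and the `side` argument are hypothetical, and setting the non-selected bands to zero is an assumption drawn from the phrase "part of the spectrum SA (or SB) before processing"; the text does not state it explicitly for this unit.

```python
import numpy as np

def limit_sensitive_region(sa, sb, side="second"):
    """BS-MIN based limitation of the second sensitive region (sketch).

    sa: spectrum of the sound on the third-microphone side
    sb: spectrum of the sound on the second-microphone side
    side="second": keep SA bands where power(SA) < power(SB)
    side="third":  keep SB bands where power(SB) < power(SA)
    Bands that are not kept are set to zero (assumption).
    """
    sa = np.asarray(sa, dtype=complex)
    sb = np.asarray(sb, dtype=complex)
    pa, pb = np.abs(sa) ** 2, np.abs(sb) ** 2
    if side == "second":
        return np.where(pa < pb, sa, 0)
    if side == "third":
        return np.where(pb < pa, sb, 0)
    raise ValueError("side must be 'second' or 'third'")
```

Switching `side` at run time corresponds to changing over the limitation between the two regions.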
As shown in
In a case where the spectrum SB of the sound on the other side (second microphone 1122 side) including the target sound separated by the second separation unit 362B is focused, minimum level band selection (BS-MIN) is performed, and the obtained spectrum (part of the spectrum SB before processing) is caused to serve as the spectrum S2 of the second sensitive region formation signal, a sound in the G parts in
Further, the sensitive region limitation unit 1104 may be capable of changing over limitation of the second sensitive region to either of the region on the second microphone 1122 side and the region on the third microphone 1123 side. For example, as shown in
Like the case of the sensitive region integration unit 1003 in the eighth embodiment, using the first sensitive region formation signal spectrum S1 generated by the first sensitive region formation signal generator 1101 and the second sensitive region formation signal spectrum S2 generated by the second sensitive region formation signal generator 1102, the sensitive region integration unit 1103 compares powers for each frequency band, and performs a spectrum integration process (minimization) of assigning the inferior power to the spectrum S3 of the target sound (see,
According to such a ninth embodiment, the sound source separation system 1100 performs the separation process of the target sound and a disturbance sound in the following manner.
First, the first sensitive region formation signal generator 1101 generates the spectrum S1 of the first sensitive region formation signal. In parallel with this, the second sensitive region formation signal generator 1102 generates the spectrum S2 of the second sensitive region formation signal. At this time, the second sensitive region is limited to the region on the second microphone 1122 side or to the region on the third microphone 1123 side by the sensitive region limitation unit 1104.
Thereafter, using the first sensitive region formation signal spectrum S1 generated by the first sensitive region formation signal generator 1101 and the second sensitive region formation signal spectrum S2 generated by the second sensitive region formation signal generator 1102, the sensitive region integration unit 1103 compares powers for each frequency band, and performs the spectrum integration process (minimization) of assigning the inferior power to the spectrum S3 of the target sound. As a result, for example, when the second sensitive region has been limited to a region of the second microphone 1122 side by the sensitive region limitation unit 1104, in the common part (the intersecting part) of the first sensitive region formed along the plane C1 (see
After the sensitive region integration unit 1103 has separated the target sound, like the first to eighth embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to the ninth embodiment described above, the following effects can be achieved. Namely, because the sound source separation system 1100 has the first sensitive region formation signal generator 1101, the second sensitive region formation signal generator 1102 and the sensitive region integration unit 1103, a sensitive region can be formed by performing directivity control appropriate for separation of the target sound and the disturbance sound, using the received sound signals of the three microphones 1121, 1122, 1123. This results in precise separation of the target sound and the disturbance sound.
Further, the number of the microphones used in the sound source separation system 1100 is three, and sound source separation is realized with few microphones, resulting in miniaturization of a device.
With reference to
The sound source separation system 1200 further comprises a first sensitive region formation signal generator 1201 which generates a first sensitive region formation signal spectrum for forming, by using received sound signals of the two first and second microphones 1221, 1222, a first sensitive region along a plane C1 (see
Like the first sensitive region formation signal generator 1001 in the eighth embodiment, the first sensitive region formation signal generator 1201 performs the same processes as those of the sound source separation system 300 (see
The second sensitive region formation signal generator 1202 employs the same structure as that of the second sensitive region formation signal generator 1102 (see,
The sensitive region limitation unit 1205 has the same structure as that of the sensitive region limitation unit 1104 in the ninth embodiment, and performs a sensitive region limitation process of limiting the second sensitive region to any one of a region on the second microphone 1222 side and region on the third microphone 1223 side by performing minimum level band selection (BS-MIN). Namely, the sensitive region limitation unit 1205 limits the second sensitive region to either one of the regions with the plane C2 (see
The third sensitive region formation signal generator 1203 has the same structure as that of the second sensitive region formation signal generator 1102 (see,
Like the sensitive region limitation unit 1205, the sensitive region limitation unit 1206 has the same structure as that of the sensitive region limitation unit 1104 in the ninth embodiment, and performs the sensitive region limiting process of limiting the third sensitive region to either one of a region on the first microphone 1221 side and region on the third microphone 1223 side by performing minimum level band selection (BS-MIN). Namely, the sensitive region limitation unit 1206 limits the third sensitive region to either one of the regions with the plane C3 (see
Like the sensitive region limitation unit 1104 in the ninth embodiment, the sensitive region limitation units 1205, 1206 may be capable of changing limitation of the second sensitive region to either one of the regions on the second microphone 1222 side and on the third microphone 1223 side, or may be capable of changing limitation of the third sensitive region to either one of the regions on the first microphone 1221 side and on the third microphone 1223 side. Such structures enable mode change between the conversation mode and the motion picture shooting mode like the ninth embodiment.
Instead of the sensitive region limitation units 1205, 1206, like the eighth embodiment (see,
Like the sensitive region integration unit 1003 (see,
According to such a tenth embodiment, the sound source separation system 1200 performs the separation process of the target sound and a disturbance sound in the following manner.
First, the first sensitive region formation signal generator 1201 generates the spectrum S1 of the first sensitive region formation signal. In parallel with this, the second sensitive region formation signal generator 1202 generates the spectrum S2 of the second sensitive region formation signal. Further, at the same time, the third sensitive region formation signal generator 1203 generates the spectrum S3 of the third sensitive region formation signal. At this time, by the sensitive region limitation units 1205, 1206, the second sensitive region is limited to the region at the second microphone 1222 side or the region at the third microphone 1223 side, and the third sensitive region is limited to the region at the first microphone 1221 side or the region at the third microphone 1223 side.
Subsequently, using the first sensitive region formation signal spectrum S1 generated by the first sensitive region formation signal generator 1201, the second sensitive region formation signal spectrum S2 generated by the second sensitive region formation signal generator 1202, and the third sensitive region formation signal spectrum S3 generated by the third sensitive region formation signal generator 1203, the sensitive region integration unit 1204 performs the spectrum integration process (minimization) of comparing powers for each frequency band, and assigning the inferior power to the spectrum S4 of the target sound. As a result, for example, when the second sensitive region has been limited to the region on the second microphone 1222 side and the third sensitive region has been limited to the region on the first microphone 1221 side by the sensitive region limitation units 1205, 1206, at the common part (intersecting part) of the first sensitive region formed along the plane C1 (see
After the sensitive region integration unit 1204 has separated the target sound, like the first to ninth embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
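For the three-spectrum case handled by the sensitive region integration unit 1204, the minimization can be sketched for any number of input spectra. The function name is hypothetical, and carrying over the complex value of the lowest-power spectrum in each band (with ties going to the earliest argument) is an assumption beyond what the text states.

```python
import numpy as np

def integrate_min_many(*spectra):
    """Per band, pick the value of whichever spectrum has the least power."""
    stack = np.stack([np.asarray(s, dtype=complex) for s in spectra])
    # Index of the lowest-power spectrum for every band.
    idx = np.argmin(np.abs(stack) ** 2, axis=0)
    return np.take_along_axis(stack, idx[None, ...], axis=0)[0]
```

Calling `integrate_min_many(s1, s2, s3)` would yield the spectrum S4 in this sketch.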
According to such a tenth embodiment, the following effects can be achieved. Namely, because the sound source separation system 1200 has the first sensitive region formation signal generator 1201, the second sensitive region formation signal generator 1202, the third sensitive region formation signal generator 1203, and the sensitive region integration unit 1204, the sensitive region can be formed by performing directivity control appropriate for separation of the target sound and the disturbance sound, using the received sound signals of the three microphones 1221, 1222, 1223. This results in precise separation of the target sound and the disturbance sound.
Further, the number of the microphones used in the sound source separation system 1200 is three, and sound source separation can be realized with few microphones, resulting in miniaturization of a device.
With reference to
The sound source separation system 1300 further comprises an orthogonal-disturbance-sound-suppressing-signal generator 1301 that generates an orthogonal-disturbance-sound suppressing signal for suppressing an orthogonal disturbance sound coming from a direction orthogonal to the target sound coming direction, using received sound signals of the two first and second microphones 1321, 1322, an opposite-disturbance-sound-suppressing-control-signal generator 1302 that generates a control signal for suppressing the opposite disturbance sound coming from a direction opposite to the target sound coming direction, using received sound signals of the two second and third microphones 1322, 1323, and an opposite-disturbance-sound-suppressing unit 1303 that suppresses an opposite disturbance sound spectrum included in an orthogonal-disturbance-sound-suppressing-signal spectrum, using a spectrum of the orthogonal-disturbance-sound suppressing signal generated by the orthogonal-disturbance-sound-suppressing-signal generator 1301 and a spectrum of a control signal generated by the opposite-disturbance-sound-suppressing-control-signal generator 1302.
Using the received sound signals of the two first and second microphones 1321, 1322, the orthogonal-disturbance-sound-suppressing-signal generator 1301 performs the same processes as those of the sound source separation system 300 (see,
The opposite-disturbance-sound-suppressing-control-signal generator 1302 has a control target-sound-superior-signal generator 1304 that generates a control target-sound-superior signal by acquiring a difference between a signal (on a time domain) produced after a delayed process has been applied to the received sound signal (on a time domain) of the third microphone 1323 and the received sound signal (on a time domain) of the second microphone 1322, and a frequency analyzer 1305 that performs frequency analysis on a control target-sound-superior signal, on a time domain, generated by the control target-sound-superior-signal generator 1304.
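The delay-and-subtract operation of the control target-sound-superior-signal generator 1304 can be sketched as follows. Several points are assumptions: the delay is modeled as a whole number of samples `d`, the difference is taken as the second-microphone signal minus the delayed third-microphone signal (the text does not fix the sign), and the function names are hypothetical.

```python
import numpy as np

def delay(x, d):
    """Delay x by d whole samples, padding the start with zeros."""
    x = np.asarray(x, dtype=float)
    if d < 0:
        raise ValueError("d must be non-negative")
    out = np.zeros_like(x)
    if d < x.size:
        out[d:] = x[:x.size - d]
    return out

def control_superior(x2, x3, d):
    """Control target-sound-superior signal: x2 - D(x3) (assumed sign)."""
    x2 = np.asarray(x2, dtype=float)
    return x2 - delay(x3, d)
```

In this sketch, a wave that reaches the third microphone exactly `d` samples before it reaches the second microphone is cancelled, which is how a delay-and-subtract pair produces a null in one direction.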
The control target-sound-superior signal generated by the control target-sound-superior-signal generator 1304 has the directional characteristic of a cardioid (heart-shaped curved line) that expands largely in the target sound coming direction and becomes narrow in an opposite disturbance sound coming direction, as shown by a chain double-dashed line in
In order to suppress the opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, the opposite-disturbance-sound-suppressing unit 1303 compares powers at the same frequency band between the orthogonal-disturbance-sound-suppressing-signal spectrum S1 generated by the orthogonal-disturbance-sound-suppressing-signal generator 1301 and the control-target-sound-superior-signal spectrum S2 generated by the opposite-disturbance-sound-suppressing-control-signal generator 1302, for each frequency band. With respect to a frequency band where power of the orthogonal-disturbance-sound-suppressing-signal spectrum S1 is smaller than power of the control signal spectrum S2, the opposite-disturbance-sound-suppressing unit 1303 performs minimum level band selection (BS-MIN), and causes the obtained spectrum (part of the spectrum S1 before processing) to serve as a separated target sound spectrum S3. At this time, with respect to a frequency band where the power of the spectrum S1 is larger than the power of the control signal spectrum S2, the power of the spectrum S1 is set to zero. The spectrum S2 is used only as the control signal and is discarded after the comparison.
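The band selection performed by the opposite-disturbance-sound-suppressing unit 1303 can be sketched per frequency band as below. The function name is hypothetical, and treating bands of exactly equal power as suppressed is an assumption, since the text only covers the "smaller" and "larger" cases.

```python
import numpy as np

def suppress_opposite(s1, s2):
    """BS-MIN with S2 as a control signal only (sketch).

    Keeps S1 where power(S1) < power(S2) and zeroes S1 elsewhere.
    Values of S2 never appear in the output.
    """
    s1 = np.asarray(s1, dtype=complex)
    s2 = np.asarray(s2, dtype=complex)
    keep = np.abs(s1) ** 2 < np.abs(s2) ** 2
    return np.where(keep, s1, 0)
```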
According to the eleventh embodiment, the sound source separation system 1300 performs the separation process for the target sound and a disturbance sound in the following manner.
First, the orthogonal-disturbance-sound-suppressing-signal generator 1301 generates the orthogonal-disturbance-sound-suppressing-signal spectrum S1. In parallel with this, the opposite-disturbance-sound-suppressing-control-signal generator 1302 generates the control-target-sound-superior-signal spectrum S2.
Subsequently, the opposite-disturbance-sound-suppressing unit 1303 performs minimum level band selection (BS-MIN), using the control-target-sound-superior-signal spectrum S2 to suppress the opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, thus obtaining the separated target sound spectrum S3.
After the opposite-disturbance-sound-suppressing unit 1303 has separated the target sound, like the first to tenth embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to the eleventh embodiment, the following effects can be achieved. Namely, because the sound source separation system 1300 has the orthogonal-disturbance-sound-suppressing-signal generator 1301, the opposite-disturbance-sound-suppressing-control-signal generator 1302, and the opposite-disturbance-sound suppressing unit 1303, the target sound and the disturbance sound can be separated precisely by performing directivity control appropriate for separation of the target sound and the disturbance sound, using the received sound signals of the three microphones 1321, 1322, 1323.
Further, the sound source separation system 1300 uses only three microphones, so sound source separation can be realized with few microphones, which allows the device to be miniaturized.
With reference to
The sound source separation system 1400 further comprises an orthogonal-disturbance-sound-suppressing-signal generator 1401 that generates an orthogonal-disturbance-sound suppressing signal for suppressing an orthogonal disturbance sound coming from a direction orthogonal to the target sound coming direction, using received sound signals of the two first and second microphones 1421, 1422, an opposite-disturbance-sound-suppressing-control-signal generator 1402 that generates a control signal for suppressing the opposite disturbance sound coming from a direction opposite to the target sound coming direction, using received sound signals of the three first, second and third microphones 1421, 1422, 1423, and an opposite-disturbance-sound suppressing unit 1403 that suppresses an opposite-disturbance sound spectrum included in an orthogonal-disturbance-sound-suppressing-signal spectrum, using a spectrum of the orthogonal-disturbance-sound suppressing signal generated by the orthogonal-disturbance-sound-suppressing-signal generator 1401 and a spectrum of a control signal generated by the opposite-disturbance-sound-suppressing-control-signal generator 1402.
Using the received sound signals of the two first and second microphones 1421, 1422, like the eleventh embodiment (see,
The opposite-disturbance-sound-suppressing-control-signal generator 1402 has a first control target-sound-superior-signal generator 1404, a second control target-sound-superior-signal generator 1405, a frequency analyzer 1406, and a control signal integration unit 1407. The first control target-sound-superior-signal generator 1404 generates a first control target-sound-superior signal by acquiring a difference between a signal (on a time domain) produced after a delay process has been applied to the received sound signal (on a time domain) of the third microphone 1423, and the received sound signal (on a time domain) of the second microphone 1422. The second control target-sound-superior-signal generator 1405 generates a second control target-sound-superior signal by acquiring a difference between a signal (on a time domain) produced after a delay process has been applied to the received sound signal (on a time domain) of the third microphone 1423, and the received sound signal (on a time domain) of the first microphone 1421. The frequency analyzer 1406 performs frequency analysis on each of the first and second control target-sound-superior signals, on a time domain, generated by the first and second control target-sound-superior-signal generators 1404, 1405, thereby obtaining a spectrum SA of the first control target-sound-superior signal and a spectrum SB of the second control target-sound-superior signal. The control signal integration unit 1407 performs a spectrum integration process (minimization) that compares the powers of the spectra SA and SB for each frequency band and assigns the smaller power to the spectrum S2 of the control target-sound-superior signal.
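The spectrum integration process (minimization) performed by the control signal integration unit can be sketched as follows. The name integrate_min is hypothetical, and keeping the complex bin of whichever spectrum has the smaller power (rather than only its power value) is an assumption; ties are resolved in favor of SA here, which the description does not address.

```python
def integrate_min(sa, sb):
    """Sketch of spectrum integration by minimization.

    sa, sb: complex bins of the first and second control
            target-sound-superior-signal spectra SA and SB.
    Returns S2: for each band, the bin whose power is smaller.
    """
    if len(sa) != len(sb):
        raise ValueError("spectra must have the same number of bands")
    s2 = []
    for a, b in zip(sa, sb):
        # Assumption: on equal power the bin from SA is used.
        s2.append(a if abs(a) ** 2 <= abs(b) ** 2 else b)
    return s2


# Hypothetical three-band example.
sa = [1 + 0j, 3j, 2 + 0j]      # powers 1, 9, 4
sb = [2 + 0j, 1 + 0j, 2j]      # powers 4, 1, 4
s2 = integrate_min(sa, sb)
```

Because only the power of S2 is used in the subsequent band selection, the choice between keeping the complex bin and keeping the power alone does not affect the result of the comparison.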
Each of the first and second control target-sound-superior signals generated by the first and second control target-sound-superior-signal generators 1404, 1405 has a cardioid (heart-shaped) directional characteristic that expands largely in the target sound coming direction and becomes narrow in the opposite disturbance sound coming direction, as shown by chain double-dashed lines in
In order to suppress the opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, the opposite-disturbance-sound suppressing unit 1403 compares, for each frequency band, the power of the orthogonal-disturbance-sound-suppressing-signal spectrum S1 generated by the orthogonal-disturbance-sound-suppressing-signal generator 1401 with the power of the control target-sound-superior-signal spectrum S2 generated by the opposite-disturbance-sound-suppressing-control-signal generator 1402. In a frequency band where the power of the spectrum S1 is smaller than the power of the control signal spectrum S2, the opposite-disturbance-sound suppressing unit 1403 performs minimum level band selection (BS-MIN), which assigns the smaller power to the spectrum S1, and the spectrum thus obtained (part of the spectrum S1 before processing) serves as the separated target sound spectrum S3. In a frequency band where the power of the spectrum S1 is larger than the power of the control signal spectrum S2, the power of the spectrum S1 is set to zero. The spectrum S2 serves only as the control signal and is discarded after the comparison.
According to the twelfth embodiment, the sound source separation system 1400 performs the separation process for the target sound and a disturbance sound in the following manner.
First, the orthogonal-disturbance-sound-suppressing-signal generator 1401 generates the orthogonal-disturbance-sound-suppressing-signal spectrum S1. In parallel with this, the opposite-disturbance-sound-suppressing-control-signal generator 1402 generates the control target-sound-superior-signal spectrum S2.
Subsequently, the opposite-disturbance-sound suppressing unit 1403 performs minimum level band selection (BS-MIN), using the control signal spectrum S2 thereby suppressing the opposite disturbance sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, and obtaining the separated target sound spectrum S3.
After the opposite-disturbance-sound suppressing unit 1403 has separated the target sound, like the first to eleventh embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to the twelfth embodiment, the following effects can be achieved. Namely, because the sound source separation system 1400 has the orthogonal-disturbance-sound-suppressing-signal generator 1401, the opposite-disturbance-sound-suppressing-control-signal generator 1402, and the opposite-disturbance-sound suppressing unit 1403, directivity control appropriate for separation of the target sound and the disturbance sound can be performed using the received sound signals of the three microphones 1421, 1422, 1423, thus separating the target sound and the disturbance sound precisely.
Further, the sound source separation system 1400 uses only three microphones, so sound source separation can be realized with few microphones, which allows the device to be miniaturized.
With reference to
The sound source separation system 1500 further comprises an orthogonal-disturbance-sound-suppressing-signal generator 1501 that generates an orthogonal-disturbance-sound suppressing signal for suppressing an orthogonal disturbance sound coming from a direction orthogonal to the target sound coming direction, using received sound signals of the two first and second microphones 1521, 1522, an opposite-disturbance-sound-suppressing-control-signal generator 1502 that generates a control signal for suppressing the opposite disturbance sound coming from a direction opposite to the target sound coming direction, using received sound signals of the two second and third microphones 1522, 1523, and an opposite-disturbance-sound suppressing unit 1503 that suppresses an opposite-disturbance-sound spectrum included in an orthogonal-disturbance-sound-suppressing-signal spectrum, using the orthogonal-disturbance-sound-suppressing-signal spectrum generated by the orthogonal-disturbance-sound-suppressing-signal generator 1501 and a control signal spectrum generated by the opposite-disturbance-sound-suppressing-control-signal generator 1502.
Using the received sound signals of the two first and second microphones 1521, 1522, the orthogonal-disturbance-sound-suppressing-signal generator 1501 performs the same processes as those of the sound source separation system 200 (see,
The opposite-disturbance-sound-suppressing-control-signal generator 1502 has a control target-sound-superior-signal generator 1504 that generates a control target-sound-superior signal by acquiring a difference between a signal (on a time domain) produced after a delay process has been applied to the received sound signal (on a time domain) of the third microphone 1523 and the received sound signal (on a time domain) of the second microphone 1522, and a frequency analyzer 1505 that performs frequency analysis on the control target-sound-superior signal, on a time domain, generated by the control target-sound-superior-signal generator 1504.
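The delay-and-difference operation of the control target-sound-superior-signal generator can be sketched in the time domain as follows. Several points are assumptions for illustration and are not stated in the description: the delay is an integer number of samples equal to the acoustic travel time between the two microphones, the delayed microphone is the one farther from the target sound source (called rear here), and the helper shift merely simulates a plane wave reaching one microphone later than the other.

```python
def delay_and_subtract(front, rear, delay_samples):
    """Return front[n] - rear[n - delay_samples], treating samples before
    the start of the recording as zero."""
    out = []
    for n in range(len(front)):
        k = n - delay_samples
        delayed = rear[k] if k >= 0 else 0.0
        out.append(front[n] - delayed)
    return out


def shift(signal, d):
    """Hypothetical helper: delay a signal by d samples (zero-padded)."""
    return [0.0] * d + signal[:len(signal) - d]


source = [0.0, 1.0, -2.0, 3.0, 0.5, -1.0, 0.0, 0.0, 0.0, 0.0]
D = 2  # assumed inter-microphone travel time in samples

# Sound from the opposite direction reaches the rear microphone first
# and the front microphone D samples later.
from_behind = delay_and_subtract(shift(source, D), source, D)

# Sound from the target direction reaches the front microphone first.
from_target = delay_and_subtract(source, shift(source, D), D)
```

Under these assumptions a sound arriving from the opposite direction cancels completely, while a sound from the target direction passes through, which is the behavior that gives the control signal its cardioid directional characteristic.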
The control target-sound-superior signal generated by the control target-sound-superior-signal generator 1504 has a cardioid (heart-shaped) directional characteristic that expands largely in the target sound coming direction and becomes narrow in the opposite disturbance sound coming direction, as shown by a chain double-dashed line in
In order to suppress the opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, the opposite-disturbance-sound suppressing unit 1503 compares, for each frequency band, the power of the orthogonal-disturbance-sound-suppressing-signal spectrum S1 generated by the orthogonal-disturbance-sound-suppressing-signal generator 1501 with the power of the control target-sound-superior-signal spectrum S2 generated by the opposite-disturbance-sound-suppressing-control-signal generator 1502. In a frequency band where the power of the spectrum S1 is smaller than the power of the control signal spectrum S2, the opposite-disturbance-sound suppressing unit 1503 performs minimum level band selection (BS-MIN), which assigns the smaller power to the spectrum S1, and the spectrum thus obtained (part of the spectrum S1 before processing) serves as the separated target sound spectrum S3. In a frequency band where the power of the spectrum S1 is larger than the power of the control signal spectrum S2, the power of the spectrum S1 is set to zero. The spectrum S2 serves only as the control signal and is discarded after the comparison.
According to the thirteenth embodiment, the sound source separation system 1500 performs the separation process for the target sound and a disturbance sound in the following manner.
First, the orthogonal-disturbance-sound-suppressing-signal generator 1501 generates the orthogonal-disturbance-sound-suppressing-signal spectrum S1. In parallel with this, the opposite-disturbance-sound-suppressing-control-signal generator 1502 generates the control target-sound-superior-signal spectrum S2.
Thereafter, the opposite-disturbance-sound suppressing unit 1503 performs minimum level band selection (BS-MIN), using the control target-sound-superior-signal spectrum S2 to suppress an opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, thus obtaining the separated target sound spectrum S3.
After the opposite-disturbance-sound suppressing unit 1503 has separated the target sound, like the first to twelfth embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to the thirteenth embodiment, the following effects can be achieved. Namely, because the sound source separation system 1500 has the orthogonal-disturbance-sound-suppressing-signal generator 1501, the opposite-disturbance-sound-suppressing-control-signal generator 1502, and the opposite-disturbance-sound suppressing unit 1503, the target sound and the disturbance sound can be separated precisely by performing directivity control appropriate for separation of the target sound and the disturbance sound, using the received sound signals of the three microphones 1521, 1522, 1523.
Further, the sound source separation system 1500 uses only three microphones, so sound source separation can be realized with few microphones, which allows the device to be miniaturized.
With reference to
The sound source separation system 1600 further comprises an orthogonal-disturbance-sound-suppressing-signal generator 1601 that generates an orthogonal-disturbance-sound suppressing signal for suppressing an orthogonal-disturbance sound coming from a direction orthogonal to the target sound coming direction, using received sound signals of the two first and second microphones 1621, 1622, an opposite-disturbance-sound-suppressing-control-signal generator 1602 that generates a control signal for suppressing the opposite disturbance sound coming from the direction opposite to the target sound coming direction, using received sound signals of the two first and second microphones 1621, 1622, and an opposite-disturbance-sound suppressing unit 1603 that suppresses an opposite-disturbance-sound spectrum included in an orthogonal-disturbance-sound-suppressing-signal spectrum, using an orthogonal-disturbance-sound-suppressing-signal spectrum generated by the orthogonal-disturbance-sound-suppressing-signal generator 1601 and a control signal spectrum generated by the opposite-disturbance-sound-suppressing-control-signal generator 1602.
Using the received sound signals of the three first, second and third microphones 1621, 1622, 1623, the orthogonal-disturbance-sound-suppressing-signal generator 1601 performs the same processes as those of the sound source separation system 400 in the fourth embodiment (see
The opposite-disturbance-sound-suppressing-control-signal generator 1602 has a control target-sound-superior-signal generator 1604 that generates a control target-sound-superior signal by acquiring a difference between a signal (on a time domain) produced after a delay process has been applied to the received sound signal (on a time domain) of the second microphone 1622 and the received sound signal (on a time domain) of the first microphone 1621, and a frequency analyzer 1605 that performs frequency analysis on the control target-sound-superior signal, on a time domain, generated by the control target-sound-superior-signal generator 1604.
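The reason a delay followed by a difference yields a heart-shaped directional characteristic can be seen from a short derivation. The following is a sketch under assumptions that the description does not state: a far-field plane wave of angular frequency \omega, a microphone spacing d along the target sound coming direction, a sound speed c, an applied delay \tau = d/c, and an arrival angle \theta measured from the target sound coming direction. X(\omega) denotes the spectrum at the undelayed microphone and Y(\omega) the spectrum of the difference signal; all of these symbols are introduced here for illustration.

```latex
\begin{aligned}
Y(\omega) &= X(\omega)\left[\,1 - e^{-j\omega\frac{d}{c}\left(1+\cos\theta\right)}\right],\\[4pt]
\left|\frac{Y(\omega)}{X(\omega)}\right|
  &= 2\left|\sin\!\left(\frac{\omega d\,(1+\cos\theta)}{2c}\right)\right|
  \;\approx\; \frac{\omega d}{c}\,(1+\cos\theta)
  \qquad \text{for } \frac{\omega d}{c} \ll 1 .
\end{aligned}
```

The factor (1 + \cos\theta) is largest at \theta = 0 (the target sound coming direction) and vanishes at \theta = \pi (the opposite direction), which is the cardioid shape referred to in the text.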
The control target-sound-superior signal generated by the control target-sound-superior-signal generator 1604 has a cardioid (heart-shaped) directional characteristic that expands largely in the target sound coming direction and becomes narrow in the opposite disturbance sound coming direction, as shown by a chain double-dashed line in
In order to suppress the opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, the opposite-disturbance-sound suppressing unit 1603 compares, for each frequency band, the power of the orthogonal-disturbance-sound-suppressing-signal spectrum S1 generated by the orthogonal-disturbance-sound-suppressing-signal generator 1601 with the power of the control target-sound-superior-signal spectrum S2 generated by the opposite-disturbance-sound-suppressing-control-signal generator 1602. In a frequency band where the power of the spectrum S1 is smaller than the power of the control signal spectrum S2, the opposite-disturbance-sound suppressing unit 1603 performs minimum level band selection (BS-MIN), which assigns the smaller power to the spectrum S1, and the spectrum thus obtained (part of the spectrum S1 before processing) serves as the separated target sound spectrum S3. In a frequency band where the power of the spectrum S1 is larger than the power of the control signal spectrum S2, the power of the spectrum S1 is set to zero. The spectrum S2 serves only as the control signal and is discarded after the comparison.
According to the fourteenth embodiment, the sound source separation system 1600 performs the separation process for the target sound and a disturbance sound in the following manner.
First, the orthogonal-disturbance-sound-suppressing-signal generator 1601 generates the orthogonal-disturbance-sound-suppressing-signal spectrum S1. In parallel with this, the opposite-disturbance-sound-suppressing-control-signal generator 1602 generates the control target-sound-superior-signal spectrum S2.
Thereafter, the opposite-disturbance-sound suppressing unit 1603 performs minimum level band selection (BS-MIN), using the control target-sound-superior-signal spectrum S2 to suppress an opposite-disturbance-sound spectrum contained in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, thus obtaining the separated target sound spectrum S3.
After the opposite-disturbance-sound suppressing unit 1603 has separated the target sound, like the first to thirteenth embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to the fourteenth embodiment, the following effects can be achieved. Namely, because the sound source separation system 1600 has the orthogonal-disturbance-sound-suppressing-signal generator 1601, the opposite-disturbance-sound-suppressing-control-signal generator 1602, and the opposite-disturbance-sound suppressing unit 1603, directivity control appropriate for separation of the target sound and the disturbance sound can be performed using the received sound signals of the three microphones 1621, 1622, 1623, thereby separating the target sound and the disturbance sound precisely.
Further, the sound source separation system 1600 uses only three microphones, so sound source separation can be realized with few microphones, which allows the device to be miniaturized.
With reference to
The sound source separation system 1700 further comprises an orthogonal-disturbance-sound-suppressing-signal generator 1701 that generates an orthogonal-disturbance-sound suppressing signal for suppressing an orthogonal disturbance sound coming from a direction orthogonal to the target sound coming direction, using received sound signals of the four first, second, third and fourth microphones 1721, 1722, 1723, 1724, an opposite-disturbance-sound-suppressing-control-signal generator 1702 that generates a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the target sound coming direction, using received sound signals of the two first and second microphones 1721, 1722, and an opposite-disturbance-sound suppressing unit 1703 that suppresses an opposite-disturbance-sound spectrum included in an orthogonal-disturbance-sound-suppressing-signal spectrum, using an orthogonal-disturbance-sound-suppressing-signal spectrum generated by the orthogonal-disturbance-sound-suppressing-signal generator 1701 and a control signal spectrum generated by the opposite-disturbance-sound-suppressing-control-signal generator 1702.
Using the received sound signals of the four first, second, third and fourth microphones 1721, 1722, 1723, 1724, the orthogonal-disturbance-sound-suppressing-signal generator 1701 performs the same processes as those of the sound source separation system 500 (see,
The opposite-disturbance-sound-suppressing-control-signal generator 1702 has a control target-sound-superior-signal generator 1704 that generates a control target-sound-superior signal by acquiring a difference between a signal (on a time domain) produced after a delay process has been applied to the received sound signal (on a time domain) of the second microphone 1722 and the received sound signal (on a time domain) of the first microphone 1721, and a frequency analyzer 1705 that performs frequency analysis on the control target-sound-superior signal, on a time domain, generated by the control target-sound-superior-signal generator 1704.
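The description does not say which frequency analysis method the frequency analyzer uses. As one possible illustration, the sketch below applies a Hann window to a single frame of the time-domain control signal and computes a direct discrete Fourier transform; the function name analyze_frame, the window, the frame length and the test tone are all assumptions.

```python
import cmath
import math


def analyze_frame(frame):
    """Return the complex spectrum (bins 0..N/2) of one Hann-windowed frame.

    A direct DFT is used for clarity; a practical system would more
    likely use an FFT.
    """
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, x in enumerate(windowed):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        spectrum.append(acc)
    return spectrum


# Hypothetical frame: a pure tone centered on bin 8 of a 64-sample frame.
N = 64
tone = [math.cos(2.0 * math.pi * 8 * i / N) for i in range(N)]
spectrum = analyze_frame(tone)
powers = [abs(v) ** 2 for v in spectrum]
peak_bin = powers.index(max(powers))
```

The per-band powers computed this way are the quantities that the opposite-disturbance-sound suppressing unit compares between the spectra S1 and S2.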
The control target-sound-superior signal generated by the control target-sound-superior-signal generator 1704 has a cardioid (heart-shaped) directional characteristic that expands largely in the target sound coming direction and becomes narrow in the opposite disturbance sound coming direction, as shown by a chain double-dashed line in
In order to suppress the opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, the opposite-disturbance-sound suppressing unit 1703 compares, for each frequency band, the power of the orthogonal-disturbance-sound-suppressing-signal spectrum S1 generated by the orthogonal-disturbance-sound-suppressing-signal generator 1701 with the power of the control target-sound-superior-signal spectrum S2 generated by the opposite-disturbance-sound-suppressing-control-signal generator 1702. In a frequency band where the power of the spectrum S1 is smaller than the power of the control signal spectrum S2, the opposite-disturbance-sound suppressing unit 1703 performs minimum level band selection (BS-MIN), which assigns the smaller power to the spectrum S1, and the spectrum thus obtained (part of the spectrum S1 before processing) serves as the separated target sound spectrum S3. In a frequency band where the power of the spectrum S1 is larger than the power of the control signal spectrum S2, the power of the spectrum S1 is set to zero. The spectrum S2 serves only as the control signal and is discarded after the comparison.
According to the fifteenth embodiment, the sound source separation system 1700 performs the separation process for the target sound and a disturbance sound in the following manner.
First, the orthogonal-disturbance-sound-suppressing-signal generator 1701 generates the orthogonal-disturbance-sound-suppressing-signal spectrum S1. In parallel with this, the opposite-disturbance-sound-suppressing-control-signal generator 1702 generates the control target-sound-superior-signal spectrum S2.
Thereafter, the opposite-disturbance-sound suppressing unit 1703 performs minimum level band selection (BS-MIN), using the control target-sound-superior-signal spectrum S2 to suppress an opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, thus obtaining the separated target sound spectrum S3.
After the opposite-disturbance-sound suppressing unit 1703 has separated the target sound, like the first to fourteenth embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to the fifteenth embodiment, the following effects can be achieved. Namely, because the sound source separation system 1700 has the orthogonal-disturbance-sound-suppressing-signal generator 1701, the opposite-disturbance-sound-suppressing-control-signal generator 1702, and the opposite-disturbance-sound suppressing unit 1703, directivity control appropriate for separation of the target sound and the disturbance sound is performed using the received sound signals of the four microphones 1721, 1722, 1723, 1724, thus separating the target sound and the disturbance sound precisely.
Further, the sound source separation system 1700 uses only four microphones, so sound source separation can be realized with few microphones, which allows the device to be miniaturized.
With reference to
The sound source separation system 1800 further comprises an orthogonal-disturbance-sound-suppressing-signal generator 1801 that generates an orthogonal-disturbance-sound suppressing signal for suppressing an orthogonal disturbance sound coming from a direction orthogonal to the target sound coming direction, using received sound signals of the four first, second, third and fourth microphones 1821, 1822, 1823, 1824, an opposite-disturbance-sound-suppressing-control-signal generator 1802 that generates a control signal for suppressing an opposite disturbance sound coming from a direction opposite to the target sound coming direction, using the received sound signals of the two first and second microphones 1821, 1822, and an opposite-disturbance-sound suppressing unit 1803 that suppresses an opposite-disturbance-sound spectrum included in an orthogonal-disturbance-sound-suppressing-signal spectrum, using an orthogonal-disturbance-sound-suppressing-signal spectrum generated by the orthogonal-disturbance-sound-suppressing-signal generator 1801 and a control signal spectrum generated by the opposite-disturbance-sound-suppressing-control-signal generator 1802.
Using the received sound signals of the four first, second, third and fourth microphones 1821, 1822, 1823, 1824, the orthogonal-disturbance-sound-suppressing-signal generator 1801 performs the same processes as those of the sound source separation system 600 in the sixth embodiment (see,
The opposite-disturbance-sound-suppressing-control-signal generator 1802 has a control target-sound-superior-signal generator 1804 that generates a control target-sound-superior signal by acquiring a difference between a signal (on a time domain) produced after a delay process has been applied to the received sound signal (on a time domain) of the second microphone 1822 and the received sound signal (on a time domain) of the first microphone 1821, and a frequency analyzer 1805 that performs frequency analysis on the control target-sound-superior signal, on a time domain, generated by the control target-sound-superior-signal generator 1804.
The control target-sound-superior signal generated by the control target-sound-superior-signal generator 1804 has a cardioid (heart-shaped) directional characteristic that expands largely in the target sound coming direction and becomes narrow in the opposite disturbance sound coming direction, as shown by a chain double-dashed line in
In order to suppress the opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, the opposite-disturbance-sound suppressing unit 1803 compares, for each frequency band, the power of the orthogonal-disturbance-sound-suppressing-signal spectrum S1 generated by the orthogonal-disturbance-sound-suppressing-signal generator 1801 with the power of the control target-sound-superior-signal spectrum S2 generated by the opposite-disturbance-sound-suppressing-control-signal generator 1802. In a frequency band where the power of the spectrum S1 is smaller than the power of the control signal spectrum S2, the opposite-disturbance-sound suppressing unit 1803 performs minimum level band selection (BS-MIN), which assigns the smaller power to the spectrum S1, and the spectrum thus obtained (part of the spectrum S1 before processing) serves as the separated target sound spectrum S3. In a frequency band where the power of the spectrum S1 is larger than the power of the control signal spectrum S2, the power of the spectrum S1 is set to zero. The spectrum S2 serves only as the control signal and is discarded after the comparison.
According to the sixteenth embodiment, the sound source separation system 1800 performs the separation process for the target sound and a disturbance sound in the following manner.
First, the orthogonal-disturbance-sound-suppressing-signal generator 1801 generates the orthogonal-disturbance-sound-suppressing-signal spectrum S1. In parallel with this, the opposite-disturbance-sound-suppressing-control-signal generator 1802 generates the control target-sound-superior-signal spectrum S2.
Thereafter, the opposite-disturbance-sound suppressing unit 1803 performs minimum level band selection (BS-MIN), using the control target-sound-superior-signal spectrum S2 to suppress an opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, thus obtaining the separated target sound spectrum S3.
After the opposite-disturbance-sound suppressing unit 1803 has separated the target sound, like the first to fifteenth embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to the sixteenth embodiment, the following effects can be achieved. Namely, because the sound source separation system 1800 has the orthogonal-disturbance-sound-suppressing-signal generator 1801, the opposite-disturbance-sound-suppressing-control-signal generator 1802, and the opposite-disturbance-sound suppressing unit 1803, directivity control appropriate for separation of the target sound and the disturbance sound is performed using the received sound signals of the four microphones 1821, 1822, 1823, 1824, thus separating the target sound and the disturbance sound precisely.
Further, the sound source separation system 1800 uses only four microphones, so sound source separation can be realized with few microphones, which allows the device to be miniaturized.
With reference to
The sound source separation system 1900 further comprises an orthogonal-disturbance-sound-suppressing-signal generator 1901 that generates an orthogonal-disturbance-sound suppressing signal for suppressing an orthogonal disturbance sound coming from a direction orthogonal to the target sound coming direction, using received sound signals of the three first, second and third microphones 1921, 1922, 1923, an opposite-disturbance-sound-suppressing-control-signal generator 1902 that generates a control signal for suppressing the opposite disturbance sound coming from the direction opposite to the target sound coming direction, using the received sound signals of the three first, second and third microphones 1921, 1922, 1923, and an opposite-disturbance-sound suppressing unit 1903 that suppresses an opposite disturbance sound spectrum included in an orthogonal-disturbance-sound-suppressing-signal spectrum, using an orthogonal-disturbance-sound-suppressing signal spectrum generated by the orthogonal-disturbance-sound-suppressing-signal generator 1901 and a control signal spectrum generated by the opposite-disturbance-sound-suppressing-control-signal generator 1902.
Using the received sound signals of the three first, second and third microphones 1921, 1922, 1923, the orthogonal-disturbance-sound-suppressing-signal generator 1901 performs the same processes as those of the sound source separation system 700 in the seventh embodiment (see,
The opposite-disturbance-sound-suppressing-control-signal generator 1902 has a first control target-sound-superior-signal generator 1904 that generates a first control target-sound-superior signal by acquiring a difference between a signal (on a time domain) produced after a delayed process has been applied to the received sound signal (on a time domain) of the second microphone 1922, and the received sound signal (on a time domain) of the first microphone 1921, a second control target-sound-superior-signal generator 1905 that generates a second control target-sound-superior signal by acquiring a difference between a signal (on a time domain) produced after a delayed process has been applied to the received sound signal (on a time domain) of the third microphone 1923, and the received sound signal (on a time domain) of the first microphone 1921, a frequency analyzer 1906 that performs frequency analysis on the first and second control target-sound-superior signals, on a time domain, generated by the first and second control target-sound-superior-signal generators 1904, 1905, and a control signal integration unit 1907 that performs a spectrum integration process (minimization) by comparing powers for each frequency band and assigning the inferior power to the control target-sound-superior-signal spectrum S2, using a spectrum SA of the first control target-sound-superior signal generated by the first control target-sound-superior-signal generator 1904 and obtained through frequency analysis by the frequency analyzer 1906, and a spectrum SB of the second control target-sound-superior signal generated by the second control target-sound-superior-signal generator 1905 and obtained through frequency analysis by the frequency analyzer 1906.
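As an illustration only, the delay-and-difference operation and the spectrum integration (minimization) described above might be sketched as follows. The function names, the whole-sample delay, and the assumption that a sound arriving from the opposite direction reaches the delayed microphone exactly `n_delay` samples before it reaches the first microphone are hypothetical choices made for this sketch and are not taken from the embodiment.

```python
import numpy as np

def delay_samples(x, n):
    """Delay x by n whole samples (n >= 0), zero-padding the start."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([np.zeros(n), x])[:len(x)]

def control_superior_signal(first_mic, other_mic, n_delay):
    """Difference between the delayed other-microphone signal and the
    first-microphone signal (time domain)."""
    return delay_samples(other_mic, n_delay) - np.asarray(first_mic, dtype=float)

def integrate_min(spec_a, spec_b):
    """Spectrum integration (minimization): for each frequency band,
    keep the bin of whichever spectrum has the smaller power."""
    spec_a = np.asarray(spec_a)
    spec_b = np.asarray(spec_b)
    take_a = np.abs(spec_a) ** 2 <= np.abs(spec_b) ** 2
    return np.where(take_a, spec_a, spec_b)

def control_spectrum(mic1, mic2, mic3, n_delay):
    """Hypothetical end-to-end helper: two difference signals,
    frequency analysis of one frame, then minimization."""
    sa = np.fft.rfft(control_superior_signal(mic1, mic2, n_delay))
    sb = np.fft.rfft(control_superior_signal(mic1, mic3, n_delay))
    return integrate_min(sa, sb)
```

Under the stated assumption, a wave from the opposite direction cancels in the difference signal, while a wave from the target direction does not, which is consistent with a directional characteristic that is narrow toward the opposite disturbance sound.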
The first and second control target-sound-superior signals generated by the first and second control target-sound-superior-signal generators 1904, 1905 each have a cardioid (a heart-shaped curve) directional characteristic that expands largely in the target sound coming direction and becomes narrow in an opposite disturbance sound coming direction, as shown by a chain double-dashed line in
In order to suppress the opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, the opposite-disturbance-sound suppressing unit 1903 compares, for each frequency band, the power of the spectrum S1 generated by the orthogonal-disturbance-sound-suppressing-signal generator 1901 with the power of the control target-sound-superior-signal spectrum S2 generated by the opposite-disturbance-sound-suppressing-control-signal generator 1902. For a frequency band where the power of the orthogonal-disturbance-sound-suppressing-signal spectrum S1 is smaller than the power of the control signal spectrum S2, the opposite-disturbance-sound suppressing unit 1903 performs minimum level band selection (BS-MIN), which keeps that smaller power of the spectrum S1, and the spectrum obtained in this way (part of the spectrum S1 before processing) serves as the separated target sound spectrum S3. For a frequency band where the power of the spectrum S1 is larger than the power of the control signal spectrum S2, the power of the spectrum S1 is set to zero. The spectrum S2 is used only as the control signal; it does not appear in the output and is discarded.
In the seventeenth embodiment, the sound source separation system 1900 performs the separation process for the target sound and a disturbance sound in the following manner.
First, the orthogonal-disturbance-sound-suppressing-signal generator 1901 generates the orthogonal-disturbance-sound-suppressing-signal spectrum S1. In parallel with this, the opposite-disturbance-sound-suppressing-control-signal generator 1902 generates the control target-sound-superior-signal spectrum S2.
Thereafter, the opposite-disturbance-sound suppressing unit 1903 performs minimum level band selection (BS-MIN), using the control target-sound-superior-signal spectrum S2, to suppress an opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, thus obtaining the separated target sound spectrum S3.
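The minimum level band selection (BS-MIN) step can be pictured with the following minimal sketch. The function name `bs_min` is hypothetical, and the handling of bands where the two powers are exactly equal (set to zero here) is an assumption, since the description only covers the smaller and larger cases.

```python
import numpy as np

def bs_min(s1, s2):
    """BS-MIN sketch: keep a bin of S1 only where its power is smaller
    than the power of the control spectrum S2; zero it otherwise.
    S2 itself never appears in the output."""
    s1 = np.asarray(s1)
    s2 = np.asarray(s2)
    keep = np.abs(s1) ** 2 < np.abs(s2) ** 2  # ties are zeroed (assumption)
    return np.where(keep, s1, 0)

# Example with three frequency bands (values are arbitrary).
s1 = np.array([1.0 + 0.0j, 3.0 + 0.0j, 0.0 + 0.5j])
s2 = np.array([2.0 + 0.0j, 1.0 + 0.0j, 1.0 + 0.0j])
s3 = bs_min(s1, s2)  # bands 0 and 2 are kept, band 1 is zeroed
```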
After the opposite-disturbance-sound suppressing unit 1903 has separated the target sound, like the first to sixteenth embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to the seventeenth embodiment, the following effects can be achieved. Namely, because the sound source separation system 1900 has the orthogonal-disturbance-sound-suppressing-signal generator 1901, the opposite-disturbance-sound-suppressing-control-signal generator 1902, and the opposite-disturbance-sound suppressing unit 1903, the target sound and the disturbance sound can be separated precisely by performing directivity control appropriate for separation of the target sound and the disturbance sound, using the received sound signals of the three microphones 1921, 1922, 1923.
Further, the sound source separation system 1900 uses only three microphones, so sound source separation is realized with few microphones, which allows the device to be miniaturized.
With reference to
The sound source separation system 2000 further comprises an orthogonal-disturbance-sound-suppressing-signal generator 2001 that generates an orthogonal-disturbance-sound suppressing signal for suppressing an orthogonal disturbance sound coming from a direction orthogonal to the target sound coming direction, using received sound signals of the three first, second and third microphones 2021, 2022, 2023, an opposite-disturbance-sound-suppressing-control-signal generator 2002 that generates a control signal for suppressing the opposite disturbance sound coming from a direction opposite to the target sound coming direction, using the received sound signals of the three first, second and third microphones 2021, 2022, 2023, and an opposite-disturbance-sound suppressing unit 2003 that suppresses an opposite-disturbance-sound spectrum included in an orthogonal-disturbance-sound-suppressing-signal spectrum, using an orthogonal-disturbance-sound-suppressing-signal spectrum generated by the orthogonal-disturbance-sound-suppressing-signal generator 2001, and a control signal spectrum generated by the opposite-disturbance-sound-suppressing-control-signal generator 2002.
Using the received sound signals of the three first, second and third microphones 2021, 2022, 2023, the orthogonal-disturbance-sound-suppressing-signal generator 2001 performs, like the seventeenth embodiment (see,
The opposite-disturbance-sound-suppressing-control-signal generator 2002 has a control target-sound-superior-signal generator 2004 and a frequency analyzer 2005. The control target-sound-superior-signal generator 2004 multiplies the received sound signals (on a time domain) of the second and third microphones 2022, 2023 by the same or different proportional coefficients (in the embodiment, the same proportional coefficient k as an example), adds the results to obtain a sum signal, applies a delayed process to the sum signal, and generates a control target-sound-superior signal by acquiring a difference between the delayed signal (on a time domain) and the received sound signal of the first microphone 2021. The frequency analyzer 2005 performs frequency analysis on the control target-sound-superior signal, on a time domain, generated by the control target-sound-superior-signal generator 2004.
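A rough sketch of this weighted-sum, delay and difference operation is given below. The function name, the whole-sample delay, and the example value k = 0.5 are assumptions for illustration; the embodiment only states that the same or different proportional coefficients may be used.

```python
import numpy as np

def control_signal_from_sum(mic1, mic2, mic3, n_delay, k2=0.5, k3=None):
    """Scale and add the second and third microphone signals, delay the
    sum by n_delay whole samples, then subtract the first microphone
    signal. k3 defaults to k2 (the 'same coefficient k' case)."""
    if k3 is None:
        k3 = k2
    mic1 = np.asarray(mic1, dtype=float)
    summed = k2 * np.asarray(mic2, dtype=float) + k3 * np.asarray(mic3, dtype=float)
    delayed = np.concatenate([np.zeros(n_delay), summed])[:len(summed)]
    return delayed - mic1

def control_spectrum_from_sum(mic1, mic2, mic3, n_delay, k=0.5):
    """Frequency analysis of one frame of the control signal."""
    return np.fft.rfft(control_signal_from_sum(mic1, mic2, mic3, n_delay, k))
```

With k = 0.5, and assuming a wave from the opposite direction reaches the second and third microphones simultaneously and the first microphone `n_delay` samples later, the delayed sum equals the first-microphone signal and the difference cancels.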
The control target-sound-superior signal generated by the control target-sound-superior-signal generator 2004 has the cardioid (a heart-shaped curve) directional characteristic that expands largely in the target sound coming direction and becomes narrow in an opposite disturbance sound coming direction, as shown by a chain double-dashed line in
In order to suppress the opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, the opposite-disturbance-sound suppressing unit 2003 compares, for each frequency band, the power of the orthogonal-disturbance-sound-suppressing-signal spectrum S1 generated by the orthogonal-disturbance-sound-suppressing-signal generator 2001 with the power of the control target-sound-superior-signal spectrum S2 generated by the opposite-disturbance-sound-suppressing-control-signal generator 2002. For a frequency band where the power of the orthogonal-disturbance-sound-suppressing-signal spectrum S1 is smaller than the power of the control signal spectrum S2, the opposite-disturbance-sound suppressing unit 2003 performs minimum level band selection (BS-MIN), which keeps that smaller power of the spectrum S1, and the spectrum obtained in this way (part of the spectrum S1 before processing) serves as the separated target sound spectrum S3. For a frequency band where the power of the spectrum S1 is larger than the power of the control signal spectrum S2, the power of the spectrum S1 is set to zero. The spectrum S2 is used only as the control signal; it does not appear in the output and is discarded.
In the eighteenth embodiment, the sound source separation system 2000 performs the separation process for the target sound and a disturbance sound in the following manner.
First, the orthogonal-disturbance-sound-suppressing-signal generator 2001 generates the orthogonal-disturbance-sound-suppressing-signal spectrum S1. In parallel with this, the opposite-disturbance-sound-suppressing-control-signal generator 2002 generates the control target-sound-superior-signal spectrum S2.
Thereafter, the opposite-disturbance-sound suppressing unit 2003 performs minimum level band selection (BS-MIN), using the control target-sound-superior-signal spectrum S2 to suppress an opposite-disturbance-sound spectrum included in the orthogonal-disturbance-sound-suppressing-signal spectrum S1, thus obtaining the separated target sound spectrum S3.
After the opposite-disturbance-sound suppressing unit 2003 has separated the target sound, like the first to seventeenth embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to the eighteenth embodiment, the following effects can be achieved. Namely, because the sound source separation system 2000 has the orthogonal-disturbance-sound-suppressing-signal generator 2001, the opposite-disturbance-sound-suppressing-control-signal generator 2002, and the opposite-disturbance-sound suppressing unit 2003, directivity control appropriate for separation of the target sound and the disturbance sound is performed to separate the target sound and the disturbance sound precisely, using the received sound signals of the three microphones 2021, 2022, 2023.
Further, the sound source separation system 2000 uses only three microphones, so sound source separation is realized with few microphones, which allows the device to be miniaturized.
With reference to
The sound source separation system 2100 further comprises a first different-directional-signal-group generator 2101 that generates a combination of a plurality (two in the embodiment) of signal spectra S1A, S1B with different directivities from one another, using received sound signals of the two first and second microphones 2121, 2122, a second different-directional-signal-group generator 2102 that generates a combination of a plurality (two in the embodiment) of signal spectra S2A, S2B with different directivities from each other, using received sound signals of the two second and third microphones 2122, 2123, and a sensitive region formation unit 2103 that performs multidimensional band selection (BS-MultiD, two-dimensional band selection: BS-2D in the embodiment), using two-set combinations of a plurality (two) of signal spectra each generated by the first and second different-directional-signal-group generators 2101, 2102.
The first different-directional-signal-group generator 2101 performs partially the same processes as those of the sound source separation system 300 in the third embodiment (see,
The first different-directional-signal-group generator 2101 has an integration unit 2104 that performs a spectrum integration process (minimization) by comparing powers for each frequency band, and assigning the inferior power to a target sound superior signal spectrum, using a first target sound superior signal spectrum generated by the first target sound superior signal generator 331 and obtained through frequency analysis by the frequency analyzer 350 and a second target sound superior signal spectrum generated by the second target sound superior signal generator 332 and obtained through frequency analysis by the frequency analyzer 350. The directional characteristic of the target sound superior signal obtained through the spectrum integration (minimization) by the integration unit 2104 is the overlapped portion of the cardioid (a heart-shaped curve) directional characteristic, shown by a solid line in
Accordingly, the first different-directional-signal-group generator 2101 generates a combination of a target sound superior signal spectrum S1A having a directional characteristic formed by the overlapped portion of the two cardioids shown in
Like the first different-directional-signal-group generator 2101, the second different-directional-signal-group generator 2102 performs partially the same processes as those of the sound source separation system 300 in the third embodiment (see,
Besides, like the first different-directional-signal-group generator 2101, the second different-directional-signal-group generator 2102 has an integration unit 2105 that performs a spectrum integration process (minimization) by comparing powers for each frequency band and assigning the inferior power to the target sound superior signal spectrum, using a first target sound superior signal spectrum generated by the first target sound superior signal generator 331B and obtained through frequency analysis by the frequency analyzer 350B, and a second target sound superior signal spectrum generated by the second target sound superior signal generator 332B and obtained through frequency analysis by the frequency analyzer 350B.
Accordingly, like the first different-directional-signal-group generator 2101, the second different-directional-signal-group generator 2102 generates a combination of a target sound superior signal spectrum S2A having the directional characteristic of the overlapped portion of the two cardioids shown in
A condition on the magnitude relationship of powers is defined within the combination of the target sound superior signal spectrum S1A and the target sound inferior signal spectrum S1B generated by the first different-directional-signal-group generator 2101, and another such condition is defined within the combination of the target sound superior signal spectrum S2A and the target sound inferior signal spectrum S2B generated by the second different-directional-signal-group generator 2102. The sensitive region formation unit 2103 determines, for each frequency band, whether or not the plurality of (two in the embodiment) conditions are satisfied at the same time, and performs multidimensional band selection (two-dimensional band selection because there are two conditions) of assigning, for frequency bands where the plurality of conditions are satisfied at the same time, the power of a preliminarily selected spectrum (the target sound superior signal spectrum S1A generated by the first different-directional-signal-group generator 2101 in the embodiment) to a target sound spectrum S3 to be separated.
More specifically, for the spectra S1A, S1B of the plurality of (two) signals generated by the first different-directional-signal-group generator 2101, the sensitive region formation unit 2103 sets a condition that power of the target sound superior signal spectrum S1A is larger than power of the target sound inferior signal spectrum S1B (S1A>S1B), and for the spectra S2A, S2B of the plurality of (two) signals generated by the second different-directional-signal-group generator 2102, the sensitive region formation unit 2103 sets a condition that power of the target sound superior signal spectrum S2A is larger than power of the target sound inferior signal spectrum S2B (S2A>S2B), and determines whether or not S1A>S1B and S2A>S2B are satisfied for each frequency band. For a frequency band where both conditions are satisfied at the same time, the power of the spectrum S1A of that frequency band is assigned to the spectrum S3 of the target sound to be separated, and for other frequency bands, the power is set to zero. In the embodiment, the target sound superior signal spectrum S1A generated by the first different-directional-signal-group generator 2101 is focused on, and whether the power of the spectrum S1A is assigned to the target sound to be separated or discarded is determined. However, the same process may be performed with the target sound superior signal spectrum S2A generated by the second different-directional-signal-group generator 2102 being focused on.
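The two-dimensional band selection (BS-2D) just described could be sketched as follows. The function name `bs_2d` and the array-based representation of the spectra are assumptions made for illustration.

```python
import numpy as np

def power(spec):
    """Per-band power of a complex spectrum."""
    return np.abs(np.asarray(spec)) ** 2

def bs_2d(s1a, s1b, s2a, s2b):
    """BS-2D sketch: pass a bin of S1A through only where both
    S1A > S1B and S2A > S2B hold in power; zero it otherwise."""
    both = (power(s1a) > power(s1b)) & (power(s2a) > power(s2b))
    return np.where(both, np.asarray(s1a), 0)

# Example with three frequency bands (values are arbitrary).
s1a = np.array([2.0 + 0.0j, 2.0 + 0.0j, 0.5 + 0.0j])
s1b = np.array([1.0 + 0.0j, 1.0 + 0.0j, 1.0 + 0.0j])
s2a = np.array([3.0 + 0.0j, 0.2 + 0.0j, 3.0 + 0.0j])
s2b = np.array([1.0 + 0.0j, 1.0 + 0.0j, 1.0 + 0.0j])
s3 = bs_2d(s1a, s1b, s2a, s2b)  # only band 0 satisfies both conditions
```

Focusing on S2A instead, as the text allows, would amount to returning `s2a` rather than `s1a` in the bands where both conditions hold.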
In the nineteenth embodiment, the sound source separation system 2100 performs the separation process for the target sound and a disturbance sound in the following manner.
First, using the received sound signals of the first and second microphones 2121, 2122, the first different-directional-signal-group generator 2101 generates the combination of the target sound superior signal spectrum S1A and target sound inferior signal spectrum S1B. In parallel with this, the second different-directional-signal-group generator 2102 generates the combination of the target sound superior signal spectrum S2A and target sound inferior signal spectrum S2B, using the received sound signals of the second and third microphones 2122, 2123.
Next, using the target sound superior signal spectrum S1A and the target sound inferior signal spectrum S1B generated by the first different-directional-signal-group generator 2101, and the target sound superior signal spectrum S2A and the target sound inferior signal spectrum S2B generated by the second different-directional-signal-group generator 2102, i.e., using two sets of the combinations of the two signals, the sensitive region formation unit 2103 performs two-dimensional band selection (BS-2D), thereby obtaining the target sound spectrum S3 to be separated.
After the sensitive region formation unit 2103 has separated the target sound, like the first to eighteenth embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to the nineteenth embodiment, the following effects can be achieved. Namely, because the sound source separation system 2100 has the first different-directional-signal-group generator 2101, the second different-directional-signal-group generator 2102 and the sensitive region formation unit 2103, directivity control appropriate for separation of the target sound and the disturbance sound is performed to form a sensitive region, using the received sound signals of the three microphones 2121, 2122, 2123. This results in precise separation of the target sound and the disturbance sound.
Further, the sound source separation system 2100 uses only three microphones, so sound source separation is realized with few microphones, which allows the device to be miniaturized.
With reference to
The sound source separation system 2200 further comprises a first different-directional-signal-group generator 2201 that generates a combination of spectra S1A, S1B of a plurality of (two in the embodiment) signals with different directivities (two directivities in the embodiment) from one another, using received sound signals of the two first and second microphones 2221, 2222, a second different-directional-signal-group generator 2202 that generates a combination of spectra S2A, S2B of a plurality of signals with different directivities (two directivities in the embodiment) from one another, using received sound signals of the two second and third microphones 2222, 2223, a third different-directional-signal-group generator 2203 that generates a combination of spectra S3A, S3B of a plurality of signals with different directivities (two directivities in the embodiment) from one another, using received sound signals of the first and third microphones 2221, 2223, and a sensitive region formation unit 2204 that performs multidimensional band selection (BS-MultiD, in embodiment, three-dimensional band selection: BS-3D), using three sets of combinations of the spectra in a plurality of (two) signals generated by the first, second and third different-directional-signal-group generators 2201, 2202, 2203.
The first different-directional-signal-group generator 2201 performs partially the same processes as those of the sound source separation system 300 (see,
The first different-directional-signal-group generator 2201 has an integration unit 2205 that performs a spectrum integration process (minimization) by comparing powers for each frequency band and assigning the inferior power to a target sound superior signal spectrum, using a first target sound superior signal spectrum generated by the first target sound superior signal generator 331 and obtained through frequency analysis by the frequency analyzer 350, and a second target sound superior signal spectrum generated by the second target sound superior signal generator 332 and obtained through frequency analysis by the frequency analyzer 350. The directional characteristic of the target sound superior signal obtained through the spectrum integration (minimization) by the integration unit 2205 is the overlapped portion of the cardioid (a heart-shaped curve) directional characteristic, shown by a solid line in
Accordingly, the first different-directional-signal-group generator 2201 generates the combination of the target sound superior signal spectrum S1A, whose directional characteristic is the overlapped portion of the two cardioids shown in
Like the first different-directional-signal-group generator 2201, the second different-directional-signal-group generator 2202 performs partially the same processes as those of the sound source separation system 300 (see,
Besides, like the first different-directional-signal-group generator 2201, the second different-directional-signal-group generator 2202 has an integration unit 2206 that performs a spectrum integration process (minimization) by comparing powers for each frequency band and assigning the inferior power to the target sound superior signal spectrum, using a first target sound superior signal spectrum generated by the first target sound superior signal generator 331C and obtained through frequency analysis by the frequency analyzer 350C, and a second target sound superior signal spectrum generated by the second target sound superior signal generator 332C and obtained through frequency analysis by the frequency analyzer 350C.
Accordingly, like the first different-directional-signal-group generator 2201, the second different-directional-signal-group generator 2202 generates a combination of the target sound superior signal spectrum S2A, whose directional characteristic is the overlapped portion of the two cardioids shown in
Like the first different-directional-signal-group generator 2201, the third different-directional-signal-group generator 2203 performs partially the same processes as those of the sound source separation system 300 (see,
Besides, like the first different-directional-signal-group generator 2201, the third different-directional-signal-group generator 2203 has an integration unit 2207 that performs a spectrum integration process (minimization) by comparing powers for each frequency band and assigning the inferior power to the target sound superior signal spectrum, using a first target sound superior signal spectrum generated by the first target sound superior signal generator 331D and obtained through frequency analysis by the frequency analyzer 350D, and a second target sound superior signal spectrum generated by the second target sound superior signal generator 332D and obtained through frequency analysis by the frequency analyzer 350D.
Accordingly, like the first different-directional-signal-group generator 2201, the third different-directional-signal-group generator 2203 generates a combination of the target sound superior signal spectrum S3A, whose directional characteristic is the overlapped portion of the two cardioids shown in
A condition on the magnitude relationship of powers is defined within each of three combinations: the combination of the target sound superior signal spectrum S1A and the target sound inferior signal spectrum S1B generated by the first different-directional-signal-group generator 2201, the combination of the target sound superior signal spectrum S2A and the target sound inferior signal spectrum S2B generated by the second different-directional-signal-group generator 2202, and the combination of the target sound superior signal spectrum S3A and the target sound inferior signal spectrum S3B generated by the third different-directional-signal-group generator 2203. The sensitive region formation unit 2204 determines, for each frequency band, whether or not the plurality of (three in the embodiment) conditions are satisfied at the same time, and for a frequency band where the plurality of conditions are satisfied at the same time, performs multidimensional band selection (three-dimensional band selection since there are three conditions in the embodiment) of assigning the power of a pre-selected spectrum (in the embodiment, the spectrum S1A of the target sound superior signal generated by the first different-directional-signal-group generator 2201) to the spectrum S4 of the target sound to be separated.
More specifically, for the spectra S1A, S1B of the plurality of (two) signals generated by the first different-directional-signal-group generator 2201, the sensitive region formation unit 2204 sets a condition that power of the target sound superior signal spectrum S1A is larger than power of the target sound inferior signal spectrum S1B (S1A>S1B); for the plurality of (two) signal spectra S2A, S2B generated by the second different-directional-signal-group generator 2202, it sets a condition that power of the target sound superior signal spectrum S2A is larger than power of the target sound inferior signal spectrum S2B (S2A>S2B); and for the plurality of (two) signal spectra S3A, S3B generated by the third different-directional-signal-group generator 2203, it sets a condition that power of the target sound superior signal spectrum S3A is larger than power of the target sound inferior signal spectrum S3B (S3A>S3B). It then determines, for each frequency band, whether or not S1A>S1B, S2A>S2B and S3A>S3B are satisfied. For a frequency band where the three conditions are satisfied at the same time, the sensitive region formation unit 2204 assigns the power of the spectrum S1A of that frequency band to the target sound spectrum S4 to be separated, and for other frequency bands, the power is set to zero.
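Because the three-dimensional case differs from the two-dimensional one only in the number of conditions, the multidimensional band selection (BS-MultiD) can be sketched for an arbitrary number of superior/inferior pairs. The function name `bs_multi_d` and the list-of-pairs interface are assumptions for illustration, not part of the embodiment.

```python
import numpy as np

def bs_multi_d(pairs, selected):
    """BS-MultiD sketch.

    pairs    : list of (superior_spectrum, inferior_spectrum) tuples,
               one tuple per condition (three tuples give BS-3D).
    selected : the pre-selected spectrum whose bins are passed through
               (S1A in the embodiment).
    A bin is kept only if superior power > inferior power holds for
    every pair in that frequency band; otherwise it is set to zero.
    """
    selected = np.asarray(selected)
    mask = np.ones(selected.shape, dtype=bool)
    for superior, inferior in pairs:
        mask &= np.abs(np.asarray(superior)) ** 2 > np.abs(np.asarray(inferior)) ** 2
    return np.where(mask, selected, 0)

# Example with four frequency bands (values are arbitrary).
ones = np.ones(4, dtype=complex)
s1a = np.array([2.0, 2.0, 2.0, 0.5], dtype=complex)
s2a = np.array([2.0, 0.5, 2.0, 2.0], dtype=complex)
s3a = np.array([2.0, 2.0, 0.5, 2.0], dtype=complex)
s4 = bs_multi_d([(s1a, ones), (s2a, ones), (s3a, ones)], s1a)
```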
In the twentieth embodiment, the sound source separation system 2200 performs the separation process for the target sound and a disturbance sound in the following manner.
First, using the received sound signals of the first and second microphones 2221, 2222, the first different-directional-signal-group generator 2201 generates the combination of the target sound superior signal spectrum S1A and target sound inferior signal spectrum S1B. In parallel with this, the second different-directional-signal-group generator 2202 generates the combination of the target sound superior signal spectrum S2A and target sound inferior signal spectrum S2B using the received sound signals of the second and third microphones 2222, 2223. In parallel with these, the third different-directional-signal-group generator 2203 generates the combination of the target sound superior signal spectrum S3A and target sound inferior signal spectrum S3B using the received sound signals of the first and third microphones 2221, 2223.
Next, using the target sound superior signal spectrum S1A and the target sound inferior signal spectrum S1B generated by the first different-directional-signal-group generator 2201, the target sound superior signal spectrum S2A and the target sound inferior signal spectrum S2B generated by the second different-directional-signal-group generator 2202, and the target sound superior signal spectrum S3A and the target sound inferior signal spectrum S3B generated by the third different-directional-signal-group generator 2203, i.e., using three sets of combinations of the two signal spectra, the sensitive region formation unit 2204 obtains the target sound spectrum S4 to be separated by performing three-dimensional band selection (BS-3D).
After the sensitive region formation unit 2204 has separated the target sound, like the first to nineteenth embodiments, voice recognition using an acoustic model obtained by performing an adaptation process or a learning process beforehand can be performed.
According to the twentieth embodiment, the following effects can be achieved. Namely, because the sound source separation system 2200 has the first different-directional-signal-group generator 2201, the second different-directional-signal-group generator 2202, the third different-directional-signal-group generator 2203 and the sensitive region formation unit 2204, directivity control appropriate for separation of the target sound and the disturbance sound is performed to form a sensitive region. Accordingly, the target sound and the disturbance sound can be precisely separated.
Further, the sound source separation system 2200 uses only three microphones, so sound source separation is realized with few microphones, which allows the device to be miniaturized.
The invention is not limited to each of the foregoing embodiments, and various modifications or the like within the scope where the object of the invention can be achieved are included in the invention.
Namely, in each of the embodiments, an explanation has been given of the case where the sound source separation system of the invention is applied to a portable device like a cellular phone, but the invention is not limited to this case, and can be applied to cases where remote uttering is necessary, such as an in-vehicle device like a car navigation system, and a conference minute drafting device.
In the first embodiment, as shown in
Further, when the structure in
In the first embodiment, as shown in
In the first embodiment, the target sound inferior signal generator 40 applies a time delay, equal to or approximately equal to a sound wave propagation time between the two microphones 21, 22, to the received sound signal of the microphone subject to a delayed process (the directional characteristic shown by a chain double-dashed line in
In each of the embodiments, a delay is applied to one of the two signals to be paired with each other in order to obtain the cardioid (a heart-shaped curve) directional characteristic. This does not necessarily mean applying a delay to only one signal; a process of applying a delay to both signals to be paired with each other, such that the delay amount of one signal is relatively large with respect to that of the other signal, is also included. Although not particularly mentioned in each embodiment, the foregoing delayed process may be a process of applying a delay that is an integer multiple of the sampling period, on a time domain or a frequency domain. When the delay is an integer multiple of the sampling period, delay calculation by a digital filter requiring a large amount of computation becomes unnecessary, and a process of applying a large delay to both signals to be paired with each other also becomes unnecessary.
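To illustrate why a delay of an integer multiple of the sampling period is inexpensive, the sketch below rounds the propagation time between two microphones to a whole number of samples and applies it either as an index shift on a time domain or as a per-bin phase rotation on a frequency domain. The speed of sound value, the function names, and the use of a circular shift (chosen so that the two versions match exactly within one frame) are assumptions for this sketch.

```python
import numpy as np

SPEED_OF_SOUND = 340.0  # m/s, assumed value

def delay_in_samples(mic_spacing_m, sample_rate_hz):
    """Propagation time between the microphones, rounded to a whole
    number of sampling periods."""
    return int(round(mic_spacing_m / SPEED_OF_SOUND * sample_rate_hz))

def delay_time_domain(frame, n):
    """Circularly delay one frame by n samples: a plain index shift,
    no interpolation filter needed."""
    return np.roll(np.asarray(frame, dtype=float), n)

def delay_frequency_domain(spectrum, n):
    """The same circular delay applied to a full DFT spectrum as a
    phase rotation of each bin."""
    spectrum = np.asarray(spectrum)
    size = len(spectrum)
    k = np.arange(size)
    return spectrum * np.exp(-2j * np.pi * k * n / size)
```

A fractional-sample delay would instead require an interpolating filter, which is the heavier calculation that the integer-multiple delay avoids.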
The first and second different-directional-signal-group generators 2101, 2102 (see,
For example, the same microphone arrangement as that of the microphones 2121, 2122, 2123 (see
Further, the same microphone arrangement as that of the microphones 2221, 2222, 2223 (see
The first and second sensitive region formation signal generators 1001, 1002 (see,
For example, the same microphone arrangement as that of the microphones 1021, 1022, 1023 (see,
Further, the same microphone arrangement as that of the microphones 1221, 1222, 1223 (see,
As described above, the sound source separation system, the sound source separation method and the acoustic signal acquisition device of the invention are appropriate for a case where a desired speech is acquired through, for example, a portable device like a cellular phone, an in-vehicle device like a car navigation system, and a conference minute drafting device.
Number | Date | Country | Kind |
---|---|---|---|
2004-366202 | Dec 2004 | JP | national |
2005-270931 | Sep 2005 | JP | national |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 11721953 | Jun 2007 | US |
Child | 13486798 | | US |