This application is a U.S. National Phase of International Patent Application No. PCT/JP2015/080481 filed on Oct. 29, 2015, which claims priority benefit of Japanese Patent Application No. JP 2014-228896 filed in the Japan Patent Office on Nov. 11, 2014. Each of the above-referenced applications is hereby incorporated herein by reference in its entirety.
The present technology relates to a sound processing device, a sound processing method, and a program. More specifically, the present technology relates to a sound processing device, a sound processing method, and a program which can extract a desired sound while properly removing noise.
In recent years, user interfaces that use sound have become popular. A user interface that uses sound is used, for example, to make a phone call or search for information on a mobile phone (a device such as a smartphone).
However, when such a user interface is used in a noisy condition, a sound generated by a user cannot be properly analyzed due to the noise, and a wrong process may be executed. Patent Document 1 proposes to emphasize a sound by a fixed beamformer, emphasize a noise by a block matrix unit, and perform generalized sidelobe canceling. Further, Patent Document 1 proposes to switch a coefficient of the fixed beamformer by a beamformer switching unit, switching between two filters depending on whether a sound is present or absent.
When filters having different characteristics are switched between a case with a sound and a case without a sound as described in Patent Document 1, switching to a proper filter requires a proper sound zone to be detected. However, detecting a proper sound zone is difficult, so a correct sound zone may not be detected and the filter may not be switched to a proper filter.
Further, according to Patent Document 1, since the filters are rapidly switched between a case with a sound and a case without a sound, the sound quality may change suddenly and the user may feel discomfort.
Further, the effect on the sound quality might be considered small if the existing noise were generated by a point sound source; in general, however, a noise is widespread. In addition, a sudden noise may occur. It is preferable to obtain a desired sound while handling such various noises.
The present technology is made in view of the above problems so that the filter can be properly switched and a desired sound can be obtained.
A sound processing device of an aspect of the present technology includes: a sound collection unit configured to collect a sound; an application unit configured to apply a predetermined filter to a signal of the sound collected by the sound collection unit; a selection unit configured to select a filter coefficient of the filter applied by the application unit; and a correction unit configured to correct the signal from the application unit.
The selection unit may select the filter coefficient on the basis of the signal of the sound collected by the sound collection unit.
The selection unit may create, on the basis of the signal of the sound collected by the sound collection unit, a histogram which associates a direction where the sound occurs and a strength of the sound and may select the filter coefficient on the basis of the histogram.
The selection unit may create the histogram on the basis of signals accumulated for a predetermined period of time.
The selection unit may select a filter coefficient of a filter that suppresses the sound in an area other than an area including a largest value in the histogram.
A conversion unit configured to convert the signal of the sound collected by the sound collection unit into a signal of a frequency range may further be included, wherein the selection unit may select the filter coefficient for all frequency bands by using the signal from the conversion unit.
A conversion unit configured to convert the signal of the sound collected by the sound collection unit into a signal of a frequency range may further be included, wherein the selection unit may select the filter coefficient for each frequency band by using the signal from the conversion unit.
The application unit may include a first application unit and a second application unit, the sound processing device may further include a mixing unit configured to mix signals from the first application unit and the second application unit, when a first filter coefficient is switched to a second filter coefficient, a filter with the first filter coefficient may be applied in the first application unit and a filter with the second filter coefficient may be applied in the second application unit, and the mixing unit may mix the signal from the first application unit and a signal from the second application unit with a predetermined mixing ratio.
After a predetermined period of time has passed, the first application unit may start a process in which the filter with the second filter coefficient is applied and the second application unit stops processing.
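The mixing of the signals from the first and second application units with a predetermined mixing ratio can be sketched, for example, as follows. This is an illustrative NumPy sketch; the function name `crossfade_mix`, the linear ramp, and the fade length are assumptions, not the patent's actual implementation.

```python
import numpy as np

def crossfade_mix(old_out, new_out, fade_len):
    """Mix the outputs of two filter applications with a ramped mixing
    ratio, so that switching from the first filter coefficient to the
    second does not change the sound quality abruptly.
    old_out: output with the first (old) filter coefficient.
    new_out: output with the second (new) filter coefficient.
    fade_len: number of samples over which the ratio moves from 0 to 1."""
    n = len(old_out)
    # Mixing ratio rises linearly from 0 to 1, then stays at 1.
    ratio = np.minimum(np.arange(n) / float(fade_len), 1.0)
    return (1.0 - ratio) * old_out + ratio * new_out
```

After the ramp reaches 1, only the second application unit contributes, so the first unit's processing can be stopped, as described above.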
The selection unit may select the filter coefficient on the basis of an instruction from a user.
The correction unit may perform a correction to further suppress a signal which has been suppressed in the application unit when the signal of the sound collected by the sound collection unit is smaller than the signal to which a predetermined filter is applied by the application unit, and may perform a correction to suppress a signal which has been amplified by the application unit when the signal of the sound collected by the sound collection unit is larger than the signal to which a predetermined filter is applied by the application unit.
The application unit may suppress a constant noise, and the correction unit may suppress a sudden noise.
A sound processing method of an aspect of the present technology includes: collecting a sound; applying a predetermined filter to a signal of the collected sound; selecting a filter coefficient of the applied filter; and correcting the signal to which the predetermined filter is applied.
A program of an aspect of the present technology causes a computer to execute a process including the steps of: collecting a sound; applying a predetermined filter to a signal of the collected sound; selecting a filter coefficient of the applied filter; and correcting the signal to which the predetermined filter is applied.
According to an aspect of the sound processing device, sound processing method, and program according to the present technology, a noise can be suppressed and a desired sound can be collected by collecting a sound, applying a predetermined filter to a signal of the collected sound, selecting a filter coefficient of the applied filter, and correcting the signal to which the predetermined filter is applied.
According to an aspect of the present technology, filters can be properly switched and a desired sound can be obtained.
The effects described here are not limiting, and any of the effects described in this specification may be realized.
In the following, a mode (hereinafter, referred to as “an embodiment”) for carrying out the present technology will be described. It is noted that the descriptions will be given in the following order.
1. External configuration of sound processing device
2. About sound source
3. Internal configuration and operation of first sound processing device (first-1 and first-2 sound processing devices)
4. Internal configuration and operation of second sound processing device (second-1 and second-2 sound processing devices)
5. About recording medium
<External Configuration of Sound Processing Device>
Further, the sound processing device according to the present technology may be a mobile terminal or a device used while placed at a predetermined location. Further, the present technology may be applied to a so-called wearable device, such as a glasses-type terminal or a terminal wearable on an arm or the like.
Here, the explanation will be given using a mobile phone (smartphone) as an example.
The speaker 21 and the microphone 23 are used for a voice phone call. The display 22 displays various information. The display 22 may be a touch panel.
The microphone 23 has a function to collect a voice of a user and is a part to which a target sound processed in a later-described process is input. The microphone 23 is an electret condenser microphone, a MEMS microphone, or the like. Further, sampling is performed by the microphone 23 at 16000 Hz, for example.
Further, in
The illustrated installed position of the microphone 23 in the mobile phone 10 is an example and the installed position is not limited to the lower center portion illustrated in
The placement and the number of the microphones 23 may differ depending on the device that includes them, as long as the microphones 23 are provided at a proper installation position for each device.
<About Sound Source>
With reference to
Out of the sounds collected by the microphone 51, a sound that causes a noise and is not desired to be collected is assumed to be generated by a sound source 61. The noise generated by the sound source 61 is, for example, a noise that is constantly generated from the same direction, such as the fan noise of a projector or the noise of an air conditioner. Such a noise is defined here as a constant noise.
When a sudden noise is generated while a process to remove a constant noise and extract a desired sound is being executed, the sudden noise cannot be handled; in other words, the sudden noise cannot be removed, and this may affect the extraction of the desired sound. Alternatively, if, while a constant noise is being processed by applying a predetermined filter, a sudden noise is generated, a filter for processing the sudden noise is used, and then the filter for processing the constant noise is used again, the filter switching is frequently repeated and the switching itself may cause a noise.
In view of the above, a sound processing device will be described that reduces a constant noise, properly handles a generated sudden noise, and performs the noise reduction process without causing a new noise.
<Internal Configuration and Operation of First Sound Processing Device>
<Internal Configuration and Operation of First-1 Sound Processing Device>
Here, the mobile phone 10 also includes a communication unit to function as a telephone and a function to connect to a network; however, a configuration of the sound processing device 100 related to sound processing is illustrated, and illustration and explanation of other functions are omitted here.
The sound collection unit 101 includes the plurality of microphones 23 and, in the example illustrated in
A sound signal collected by the sound collection unit 101 is provided to the time-frequency conversion unit 102. The time-frequency conversion unit 102 converts the provided signal of a time range into a signal of a frequency range and provides the signal to each of the beamforming unit 103, filter selection unit 104, and correction coefficient calculation unit 107.
The beamforming unit 103 performs a process of beamforming by using the sound signals of the microphones 23-1 to 23-M, which are provided from the time-frequency conversion unit 102, and a filter coefficient provided from the filter coefficient storage unit 105. The beamforming unit 103 has a function for performing a process with a filter and beamforming is one of the examples of the function. The beamforming executed by the beamforming unit 103 is a process of beamforming of an addition-type or a subtraction-type.
The filter selection unit 104 calculates an index of a filter coefficient used in beamforming by the beamforming unit 103, for each frame.
The filter coefficient storage unit 105 stores the filter coefficient used in the beamforming unit 103.
The sound signal output from the beamforming unit 103 is provided to the signal correction unit 106 and correction coefficient calculation unit 107.
The correction coefficient calculation unit 107 receives the sound signal from the time-frequency conversion unit 102 and a beamformed signal from the beamforming unit 103, and calculates a correction coefficient used in the signal correction unit 106, on the basis of the signals.
The signal correction unit 106 corrects the signal output from the beamforming unit 103 by using the correction coefficient calculated by the correction coefficient calculation unit 107.
The signal corrected by the signal correction unit 106 is provided to the time-frequency reverse conversion unit 108. The time-frequency reverse conversion unit 108 converts the provided signal of a frequency range into a signal of a time range and outputs the signal to an unillustrated unit in a later stage.
With reference to the flowcharts of
In step S101, sound signals are respectively collected by the microphones 23-1 to 23-M of the sound collection unit 101. Here, the collected sound in this example is a sound generated by a user, a noise, and a sound of mixture of those.
In step S102, input signals are clipped for each frame. The sampling in the case of clipping is performed at 16000 Hz, for example. In this example, a signal of a frame clipped from the microphone 23-1 is set as a signal x1(n), a signal of a frame clipped from the microphone 23-2 is set as a signal x2(n), . . . , and a signal of a frame clipped from the microphone 23-M is set as a signal xm(n). Here, m represents an index (1 to M) of the microphones, and n represents a sample number of a signal in which a sound is included.
The clipped signals x1(n) to xm(n) are each provided to the time-frequency conversion unit 102.
In step S103, the time-frequency conversion unit 102 converts the provided signals x1(n) to xm(n) into respective time-frequency signals. With reference to
In this example, the description will be given under an assumption that the time range signal x1(n) is converted into a frequency range signal x1(f,k), a time range signal x2(n) is converted into a frequency range signal x2(f,k), . . . , and a time range signal xm(n) is converted into a frequency range signal xm(f,k). The letter f of (f,k) is an index indicating a frequency band, and the letter k of (f,k) is a frame index.
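The conversion of a time range signal xm(n) into frequency range signals xm(f,k) can be sketched, for example, as follows. This is an illustrative NumPy sketch; the frame length, the shift amount, and the Hann window are assumptions and not values specified in this description.

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=256):
    """Clip the input time signal into overlapping frames, apply a
    window, and convert each frame to the frequency range with an FFT.
    Returns a complex array X[f, k], where f is the frequency band
    index and k is the frame index."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    X = np.empty((frame_len // 2 + 1, n_frames), dtype=complex)
    for k in range(n_frames):
        frame = x[k * hop:k * hop + frame_len] * window  # clip and window
        X[:, k] = np.fft.rfft(frame)                     # time -> frequency
    return X
```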
As illustrated in
Back to the explanation of the flowchart of
In step S104, the filter selection unit 104 calculates an index I(k) of a filter coefficient used in beamforming for each frame. The calculated index I(k) is transmitted to the filter coefficient storage unit 105. A filter selection process is performed in the following three steps.
First Step: Sound Source Azimuth Estimation
Second Step: Creation of Sound Source Distribution Histogram
Third Step: Determination of Filter to be Used
First Step: Sound Source Azimuth Estimation
Firstly, the filter selection unit 104 performs a sound source azimuth estimation by using signals x1(f,k) to xm(f,k) which are time-frequency signals provided from the time-frequency conversion unit 102. The sound source azimuth estimation can be performed on the basis of a multiple signal classification (MUSIC) method for example. In the MUSIC method, a method described in the following document may be applied.
R. O. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Trans. Antennas Propagation, vol. AP-34, no. 3, pp. 276-280, March 1986.
The estimation result by the filter selection unit 104 is assumed as P(f,k). For example, in a case that microphones 23-1 to 23-M (
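A narrowband MUSIC pseudo-spectrum, whose peak gives the estimated sound source azimuth, can be sketched as follows. This is an illustrative NumPy sketch assuming a linear microphone array, a single sound source, and a speed of sound of 340 m/s; it stands in for, and does not reproduce, the method of the document cited above.

```python
import numpy as np

SOUND_SPEED = 340.0  # m/s, assumed

def music_spectrum(snapshots, mic_pos, freq, angles_deg):
    """MUSIC pseudo-spectrum for one frequency band.
    snapshots: complex array (M microphones, K frames).
    mic_pos: microphone positions along the array axis (meters).
    Returns P(theta); its largest value marks the estimated azimuth."""
    M, K = snapshots.shape
    R = snapshots @ snapshots.conj().T / K      # spatial covariance matrix
    w, v = np.linalg.eigh(R)                    # eigenvalues ascending
    En = v[:, :M - 1]                           # noise subspace (1 source assumed)
    spec = []
    for th in np.deg2rad(angles_deg):
        # Steering vector toward azimuth theta.
        a = np.exp(-2j * np.pi * freq * mic_pos * np.sin(th) / SOUND_SPEED)
        denom = np.linalg.norm(En.conj().T @ a) ** 2
        spec.append(1.0 / max(denom, 1e-12))    # peak where a(theta) leaves noise subspace
    return np.array(spec)
```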
Second Step: Creation of Sound Source Distribution Histogram
The result estimated in First step is accumulated. The accumulation time may be set, for example, to the previous ten seconds. A histogram is created by using the estimation results accumulated over this time. Providing such an accumulation time makes it possible to handle a sudden noise.
As will become clear in the following description, when the histogram is created on the basis of data accumulated over a predetermined time, even if a sudden noise occurs, the histogram is prevented from being significantly changed by the data of the sudden noise.
When the histogram does not change by more than a certain amount, the filter is not switched in a later process, which prevents the filter from being switched due to the effect of a sudden noise. Thus, the filter can be prevented from being frequently switched by a sudden noise, and stability is improved.
Referring to the histogram, a condition of distribution of a sound source such as a target sound and a noise existing in the space can be clearly seen. For example, on the basis of the histogram illustrated in
Such a histogram may be created for each frequency or for all frequencies. The following description will be given with an example in which the histogram is created by integrating all frequencies.
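The accumulation of per-frame azimuth estimates into a direction-versus-strength histogram can be sketched as follows. This is an illustrative Python sketch; the class name, the 10-degree bin width, and the frame capacity are assumptions. Because many past frames are pooled, a short sudden noise contributes only a few entries and cannot change the histogram significantly.

```python
import numpy as np
from collections import deque

class AzimuthHistogram:
    """Accumulate azimuth estimates over a fixed span (e.g. the frames
    of the previous ten seconds) and build a histogram that associates
    the direction where a sound occurs with its strength."""
    def __init__(self, max_frames, bins=np.arange(-90, 91, 10)):
        self.buf = deque(maxlen=max_frames)  # old frames drop out automatically
        self.bins = bins

    def push(self, azimuth_deg, strength):
        self.buf.append((azimuth_deg, strength))

    def histogram(self):
        hist = np.zeros(len(self.bins) - 1)
        for az, p in self.buf:
            idx = np.searchsorted(self.bins, az, side='right') - 1
            idx = min(max(idx, 0), len(hist) - 1)
            hist[idx] += p                   # accumulate strength per direction bin
        return hist
```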
Third Step: Determination of Filter to be Used
When a histogram is created, a filter to be used is determined in Third step. In this example, the description will be given under an assumption that the filter coefficient storage unit 105 maintains filters of three patterns illustrated in
The filter A is a filter that significantly reduces gain on the left side (−90-degree azimuth) seen from the sound processing device. The filter A is selected, for example, when it is desired to obtain a sound on the right side (+90-degree azimuth) seen from the sound processing device, or when it is determined that there is a noise on the left side and it is desired to reduce the noise.
The filter B is a filter that enlarges gain at a center (0-degree azimuth) seen from the sound processing device and reduces gain in other directions compared to the center area. The filter B is selected, for example, when it is desired to obtain a sound at the center area (0-degree azimuth) seen from the sound processing device, when it is determined that there are noises in both right side and left side and it is desired to reduce the noises, or when noises occur in a wide area and neither filter A nor filter C (later described) can be applied.
The filter C is a filter that significantly reduces gain on the right side (+90-degree azimuth) seen from the sound processing device. The filter C is selected, for example, when it is desired to obtain a sound on the left side (−90-degree azimuth) seen from the sound processing device, or when it is determined that there is a noise on the right side and it is desired to reduce the noise.
Here, the description will be continued under the assumption that these filters are switched; however, any configuration may be used as long as each filter extracts a sound to be collected and suppresses sounds other than the sound to be collected, and more than one such filter is provided and switched.
Further, as the filters (filter coefficients), a plurality of filters corresponding to a plurality of environmental noises are set in advance, each having a fixed coefficient, and one or more filters corresponding to the current environmental noise are selected from the plurality of fixed-coefficient filters.
Here, a description will be continued with an example that the above described three filters are provided. When such three filters are provided, the histogram generated in Second step will be divided into three areas.
In the example illustrated in
Highest signal strengths in the three areas are compared. The highest signal strength in the area A is strength Pa, the highest signal strength in the area B is strength Pb, and the highest signal strength in the area C is strength Pc.
The relationship among the strengths is described as follows.
strength Pb > strength Pa > strength Pc
In a case of such a relationship, it is determined that the strength Pb is the sound from the desired sound source. In other words, in this case, the sound having the strength Pb in the area B is the sound which is desired to be obtained compared to the sounds in other areas.
In this manner, when the sound having the strength Pb is the sound desired to be obtained, it is likely that the sounds of the remaining strength Pa and strength Pc are noises. When the remaining area A and area C are compared, the strength Pa in the area A is greater than the strength Pc in the area C. In this case, it is preferable to suppress the noise in the area A, which has the greater strength.
In other words, in this case, the filter A is selected. With the filter A, the sound in the area A is suppressed and the sounds in the area B and area C are output without being suppressed.
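The selection logic described above — treat the area with the largest peak as the desired sound and select the filter that suppresses the strongest remaining area — can be sketched as follows. This is an illustrative NumPy sketch; the ±30-degree area boundaries and the function name are assumptions.

```python
import numpy as np

def select_filter(hist, bin_centers_deg):
    """Divide the histogram into three areas (left / center / right),
    compare the highest strengths, treat the largest as the desired
    sound, and return the name of the filter ('A', 'B', or 'C') that
    suppresses the strongest of the remaining (noise) areas."""
    areas = {
        'A': hist[bin_centers_deg < -30],                       # left side
        'B': hist[(bin_centers_deg >= -30) & (bin_centers_deg <= 30)],  # center
        'C': hist[bin_centers_deg > 30],                        # right side
    }
    peaks = {k: (v.max() if v.size else 0.0) for k, v in areas.items()}
    desired = max(peaks, key=peaks.get)          # area holding the desired sound
    noise = {k: p for k, p in peaks.items() if k != desired}
    return max(noise, key=noise.get)             # suppress the strongest noise area
```

With Pb > Pa > Pc as in the example above, the area B holds the desired sound and the filter A, which suppresses the stronger noise in the area A, is selected.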
In this manner, a filter is selected by generating a histogram, dividing the histogram into areas corresponding to the number of the filters, and comparing the signal strengths in the divided areas. As described above, since the histogram is generated by accumulating past data, even when a rapid change such as a sudden noise occurs, the histogram can be prevented from being significantly changed by the data of the rapid change.
Thus, in the selection of the filter A, filter B, and filter C, drastic switching to another filter and frequent switching of filters can be prevented, so that stable filtering is ensured.
Here, in this example, the above description has been given with an example that the number of filters is three; however, it is obvious that the number may be any number other than three. Further, the description has been given that the number of filters and the dividing number of the histogram are the same number; however the numbers may be different numbers.
Further, for example, the filter A and filter C illustrated in
Further, more than one filter group including a plurality of filters may be maintained and a filter group may be selected.
Further, in the above described example, the filter is determined on the basis of the histogram; however, the application range of the present technology is not limited to this method. For example, there may be a method in which a relationship between a shape of the histogram and a most preferable filter is learned in advance by using a machine learning algorithm and the filter to be selected is determined accordingly.
In this example, as illustrated in
As illustrated in
The following explanation will be continued under an assumption that a filter index is output to the filter coefficient storage unit 105 for each frame as illustrated in
The explanation will be given referring back to the flowchart of
In step S105, it is determined whether the filter is changed. For example, in step S104, the filter selection unit 104 sets a filter, stores the set filter index, compares the set filter index with a filter index stored at a previous timing, and determines whether or not the indexes are the same. By executing this process, the process in step S105 is performed.
When it is determined in step S105 that the filter is not changed, the process in step S106 is skipped and the process proceeds to step S107 (
In step S106, the filter coefficient is read from the filter coefficient storage unit 105 and supplied to the beamforming unit 103. The beamforming unit 103 performs beamforming in step S107. Here, the explanation will be given about the beamforming performed in the beamforming unit 103 and a filter index which is used in the beamforming and is read from the filter coefficient storage unit 105.
With reference to
A sound enhancement process may be executed by addition-type beamforming. Delay and Sum beamforming (hereinafter, referred to as DS) is addition-type beamforming and enhances gain of a target sound azimuth.
A sound attenuation process may be executed by attenuation-type beamforming. Null beamforming (hereinafter, referred to as NBF) is attenuation-type beamforming and attenuates gain of a target sound azimuth.
Firstly, with reference to
When a sound enhancement process is performed on the basis of DS beamforming, the beamforming unit 103 has a configuration illustrated in
The sound signal from the microphone 23-1 is provided to the adder 132, and the sound signal from the microphone 23-2 is delayed by a predetermined time by the delay device 131 and provided to the adder 132. The microphone 23-1 and microphone 23-2 are provided apart with a predetermined distance and receive signals with propagation delay times which are different by an amount of a channel difference.
In beamforming, a signal from one of the microphones 23 is delayed so as to compensate for a propagation delay related to a signal which comes from a predetermined direction. The delay is performed by the delay device 131. In the DS beamforming illustrated in
In
When the sound waves come from the direction as illustrated in
In the beamforming unit 103 that performs DS beamforming illustrated in
With this, as illustrated in
The target sound of the signal D(f,k) output from the beamforming unit 103 is enhanced compared to the target sound included in the signals x1(f,k) to xm(f,k) input to the beamforming unit 103. Further, the noise of the signal D(f,k) output from the beamforming unit 103 is reduced compared to the noise included in the signals x1(f,k) to xm(f,k), which are input to the beamforming unit 103.
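The delay-compensate-and-add operation of DS beamforming in the frequency range can be sketched as follows. This is an illustrative NumPy sketch assuming a linear array and a speed of sound of 340 m/s; the function name and array layout are assumptions.

```python
import numpy as np

SOUND_SPEED = 340.0  # m/s, assumed

def ds_beamform(X, mic_pos, freqs, theta_deg):
    """Delay-and-sum (addition-type) beamforming.
    X: complex array (M microphones, F frequency bands, K frames).
    Compensates the propagation delay toward azimuth theta on each
    channel and averages, so sound from that direction adds in phase
    (enhanced) while sound from other directions adds out of phase."""
    delays = mic_pos * np.sin(np.deg2rad(theta_deg)) / SOUND_SPEED
    comp = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])  # delay compensation
    return np.mean(comp[:, :, None] * X, axis=0)                  # D(f, k)
```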
Next, with reference to
When performing the sound attenuation process on the basis of NULL beamforming, the beamforming unit 103 has a configuration as illustrated in
The sound signal from the microphone 23-1 is provided to the subtractor 142, and the sound signal from the microphone 23-2 is delayed by a predetermined time by the delay device 141 and provided to the subtractor 142. The configuration for performing Null beamforming and the configuration for performing DS beamforming described above with reference to
When sound waves come from a direction indicated by the arrows in
In the beamforming unit 103 that performs the NULL beamforming illustrated in
With this, as illustrated in
The target sound of the signal D(f,k) output from the beamforming unit 103 is attenuated compared to the target sound included in the signals x1(f,k) to xm(f,k) input to the beamforming unit 103. Further, the noise included in the signals x1(f,k) to xm(f,k) input to the beamforming unit 103 is in a similar level with the noise of the signal D(f,k) output from the beamforming unit 103.
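The align-and-subtract operation of Null beamforming can be sketched as follows, reusing the same conventions as the DS sketch above. This is an illustrative NumPy sketch for the two-microphone case; the function name is an assumption.

```python
import numpy as np

SOUND_SPEED = 340.0  # m/s, assumed

def null_beamform(X, mic_pos, freqs, theta_deg):
    """Null beamforming: align the two channels toward azimuth theta
    and subtract, so sound arriving from that direction cancels (a
    null) while sound from other directions remains.
    X: complex array (2 microphones, F frequency bands, K frames)."""
    delays = mic_pos * np.sin(np.deg2rad(theta_deg)) / SOUND_SPEED
    comp = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    aligned = comp[:, :, None] * X
    return 0.5 * (aligned[0] - aligned[1])   # subtractor output D(f, k)
```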
The beamforming by the beamforming unit 103 can be expressed by the following expressions (1) to (4).
As expressed by the expression (1), signal D(f,k) can be obtained by multiplying the input signals x1(f,k) to xm(f,k) and filter coefficient vector C(f,k). The expression (2) is an expression related to the filter coefficient vector C(f,k), and Cm(f,k) (m=1 to M), which is provided from the filter coefficient storage unit 105 and composes the filter coefficient vector C(f,k), is expressed by the expression (3).
In the expression (3), f is a sampling frequency, n is the number of FFTs, dm is a position of the microphone m, θ is an azimuth desired to be emphasized, i is an imaginary unit, and s is a constant that expresses the speed of sound. In the expression (4), the superscript ".T" represents a transposition.
The beamforming unit 103 executes beamforming by assigning a value to the expressions (1) to (4). Here, in this example, the description has been given with DS beamforming as an example; however, a sound enhancement process and a sound attenuation process by other beamforming such as adaptive beamforming or a method other than beamforming may be applied to the present technology.
The description refers back to the flowchart of
In step S108, the correction coefficient calculation unit 107 calculates a correction coefficient from the input signal and the beamformed signal. In step S109, the calculated correction coefficient is supplied from the correction coefficient calculation unit 107 to the signal correction unit 106.
In step S110, the signal correction unit 106 corrects the beamformed signal by using the correction coefficient. The processes in steps S108 to S110, which are processes in the correction coefficient calculation unit 107 and signal correction unit 106, will be described.
As illustrated in
[Mathematical Formula 2]
Z(f,k)=G(f,k)D(f,k)  (5)
In the expression (5), G(f,k) represents a correction coefficient provided from the correction coefficient calculation unit 107. The correction coefficient G(f,k) is calculated by the correction coefficient calculation unit 107. As illustrated in
The correction coefficient calculation unit 107 calculates a correction coefficient in the following two steps.
First Step: Calculation of Signal Change Rate
Second Step: Determination of Gain Value
First Step: Calculation of Signal Change Rate
Regarding the signal change rate, by using the levels of the input signal x(f,k) from the time-frequency conversion unit 102 and the signal D(f,k) from the beamforming unit 103, a change rate Y(f,k), which indicates how much the signal has changed by beamforming, is calculated on the basis of the following expressions (6) and (7).
As written in the expression (6), the change rate Y(f,k) is obtained as the ratio between the absolute value of the beamformed signal D(f,k) and the absolute value of the average of the input signals x1(f,k) to xm(f,k). The expression (7) calculates the average of the input signals x1(f,k) to xm(f,k).
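The change rate of the expressions (6) and (7) can be sketched as follows; an illustrative NumPy sketch in which the small constant guarding against division by zero is an assumption.

```python
import numpy as np

def change_rate(D, X):
    """Expressions (6)-(7): the ratio of the beamformed signal's
    magnitude to the magnitude of the average of the M input signals.
    D: complex beamformed signal D(f,k).
    X: complex array (M, ...) of the input signals x1..xm."""
    x_ave = np.mean(X, axis=0)                           # expression (7)
    return np.abs(D) / np.maximum(np.abs(x_ave), 1e-12)  # expression (6)
```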
Second Step: Determination of Gain Value
By using the change rate Y(f,k) obtained in First step, a correction coefficient G(f,k) is determined. The correction coefficient G(f,k) is, for example, determined by using a table illustrated in
[Mathematical Formula 4]
|D(f,k)| < |Xave(f,k)|  Condition 1
|D(f,k)| > |Xave(f,k)|  Condition 2
|D(f,k)| ≅ |Xave(f,k)|  Condition 3
The condition 1 is a case where the absolute value of the beamformed signal D(f,k) is smaller than the absolute value of the average of the input signals x1(f,k) to xm(f,k); in other words, a case where the change rate Y(f,k) is smaller than 1.
The condition 2 is a case where the absolute value of the beamformed signal D(f,k) is greater than the absolute value of the average of the input signals x1(f,k) to xm(f,k); in other words, a case where the change rate Y(f,k) is greater than 1.
The condition 3 is a case where the absolute value of the beamformed signal D(f,k) and the absolute value of the average of the input signals x1(f,k) to xm(f,k) are substantially the same; in other words, a case where the change rate Y(f,k) is approximately 1.
When the condition 1 is satisfied, a correction is performed to further suppress the beamformed signal D(f,k), which has already been suppressed by the process in the beamforming unit 103. The condition 1 is satisfied when the average of the input signals x1(f,k) to xm(f,k) increases due to a sudden noise that occurred in a direction where a noise is being suppressed and becomes greater than the beamformed signal D(f,k).
Thus, a correction is performed to further suppress the beamformed signal D(f,k) and to suppress an effect caused by the increased sound due to the sudden noise.
When the condition 2 is satisfied, a correction is performed to suppress the beamformed signal D(f,k), which has been amplified by the process in the beamforming unit 103. The condition 2 is satisfied when a sudden noise occurs in a direction different from the direction where the noise is being suppressed, and the sudden noise is amplified by the beamforming process so that the beamformed signal D(f,k) becomes larger than the average of the input signals x1(f,k) to xm(f,k).
Thus, to suppress the sudden noise which is enhanced by beamforming, a correction to suppress the beamformed signal D(f,k) which has been amplified in the process by the beamforming unit 103 is performed.
When the condition 3 is satisfied, no correction is performed. In this case, since no sudden noise is occurring, there is no significant change in the sounds, and the beamformed signal D(f,k) and the average value of the input signals x1(f,k) to xm(f,k) remain at substantially the same level, so that no correction is needed and none is performed.
Such a correction can prevent a noise from being amplified by mistake when a sudden noise is input, while the constant noise is suppressed by the beamforming process.
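The three conditions can be sketched in Python as follows. The gain formula min(Y, 1/Y) is an assumption for illustration: the source states only that the signal is suppressed under conditions 1 and 2 and left unchanged under condition 3, and this particular gain satisfies those three cases. The function and variable names are hypothetical.

```python
import numpy as np

def correct_beamformed(D, X, eps=1e-12):
    """Illustrative correction of the beamformed signal D(f,k).

    D: complex spectrum from the beamforming unit, shape (num_bins,)
    X: complex input spectra x1(f,k)..xm(f,k), shape (m, num_bins)
    """
    avg = X.mean(axis=0)                  # average of the input signals
    Y = np.abs(D) / (np.abs(avg) + eps)   # change rate Y(f,k)
    # Condition 1 (Y < 1): suppress further; condition 2 (Y > 1): suppress
    # the amplified signal; condition 3 (Y == 1): gain is 1, no correction.
    gain = np.minimum(Y, 1.0 / np.maximum(Y, eps))  # assumed correction gain
    return gain * D
```

With this assumed gain, a bin where the beamformed signal exceeds the input average is pulled back toward that average, and a bin where it was already suppressed is suppressed further, matching the behavior described for the three conditions.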
Here, the table illustrated in
The description refers back to the flowchart in
In step S111, the time-frequency reverse conversion unit 108 converts the time-frequency signal z(f,k) from the signal correction unit 106 into a time signal z(n). The time-frequency reverse conversion unit 108 generates an output signal z(n) by adding the frames while shifting them. As described above with reference to
In step S113, the generated output signal z(n) is output from the time-frequency reverse conversion unit 108 to an unillustrated processing unit in a later stage.
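The reverse conversion that adds frames while shifting them can be sketched as inverse-FFT overlap-add synthesis. The frame length and shift below are illustrative assumptions, not values from the source.

```python
import numpy as np

def overlap_add_synthesis(Z, frame_len=1024, shift=512):
    """Convert per-frame spectra z(f,k) back to a time signal z(n) by
    inverse FFT and overlap-add, shifting each frame by `shift` samples.
    Z: complex array of shape (num_frames, frame_len)."""
    num_frames = Z.shape[0]
    z = np.zeros(frame_len + shift * (num_frames - 1))
    for k in range(num_frames):
        frame = np.fft.ifft(Z[k]).real                 # back to the time domain
        z[k * shift : k * shift + frame_len] += frame  # add while shifting
    return z
```

In practice a synthesis window would normally be applied to each frame before the addition so that the overlapping regions sum to a constant; the sketch omits it to keep the frame-shifting addition itself visible.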
Here, a brief description of an operation of the above described first-1 sound processing device 100 will be provided again with reference to
The first section 151 is a part that reduces a constant noise, such as the fan noise of a projector or the noise of an air conditioner, by beamforming. In the first section 151, the filter maintained in the filter coefficient storage unit 105 is a linear filter, which realizes a high quality sound and a stable operation.
Further, through the process in the first section 151, a follow-up process is executed to select the most preferable filter as needed when, for example, the azimuth of a noise changes or the position of the sound processing device 100 itself changes, and its follow-up speed (the accumulation time used to create a histogram) can be set arbitrarily by the designer. When the follow-up speed is set properly, the process can be performed without the sudden change of the sound and the uncomfortable feeling during listening that may occur in a case of adaptive beamforming, for example.
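The follow-up behavior can be sketched as follows. The bin width, the accumulation length, and the mapping from the dominant direction bin to a filter index are all illustrative assumptions; the bounded buffer length plays the role of the designer-set accumulation time.

```python
import numpy as np
from collections import deque

class FilterSelector:
    """Sketch of histogram-based filter follow-up: accumulate estimated
    sound directions over a bounded window, then pick the filter index
    associated with the dominant direction bin."""

    def __init__(self, num_bins=8, accumulation_frames=100):
        self.num_bins = num_bins
        # maxlen acts as the accumulation time, i.e. the follow-up speed
        self.directions = deque(maxlen=accumulation_frames)

    def observe(self, direction_deg):
        self.directions.append(direction_deg)

    def select(self):
        hist, _ = np.histogram(list(self.directions),
                               bins=self.num_bins, range=(0.0, 360.0))
        return int(np.argmax(hist))  # filter index for the dominant azimuth
```

Because the histogram is built over the whole window, a brief sudden noise barely shifts the counts and the selected filter does not flicker; only a sustained change of azimuth moves the dominant bin, which is why a properly set follow-up speed avoids abrupt changes of the sound.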
The second section 152 is a part to reduce a sudden noise which comes from a direction other than the azimuth being attenuated by beamforming. In addition, a process to further reduce the constant noise which has been reduced by beamforming is executed according to the situation.
Here, operations by the first section 151 and second section 152 will be further described with reference to
At time T1, the filter A described above with reference to
At time T2, it is assumed that a sudden noise 172 occurs in a direction of 90 degrees. Also at time T2, the filter A is applied and the sound from the direction of 90 degrees is amplified (in a condition with a high gain). When a sudden noise occurs in the direction being amplified, the sudden noise is also amplified.
However, since a correction to reduce the gain by the increased amount is performed by the signal correction unit 106, the final output sound is a sound in which the increase due to the sudden noise is prevented.
In other words, in this case, even when a process to amplify the sudden noise is performed in the first section 151 (
At time T3, the constant noise moves, for example because the orientation of the sound processing device 100 is changed or the sound source of the noise moves, and this results in a condition in which the constant noise 173 is in the direction of 90 degrees. When a predetermined period of time, that is, the accumulation time for creating a histogram, has passed since this condition arose, the filter is switched from the filter A to the filter C to reflect the change.
When the sound source of the noise moves in this manner, the filter can be properly switched according to the direction of the sound source and frequent filter switching can be prevented.
According to the present technology, which can perform a process in this manner, a sudden noise occurring in a different direction can be reduced while a constant noise is suppressed. Further, the noise can be suppressed even when it is not generated at a point sound source but is widespread in a space. Further, a stable operation can be achieved without the rapid change in sound quality caused by adaptive beamforming of the related art.
Further, since a sound zone does not need to be detected, the above described effects can be achieved regardless of the accuracy of sound zone detection.
Further, according to the present technology, for example, since a target sound can be obtained only with a small omnidirectional microphone and signal processing, without using a directional microphone (shotgun microphone) having a large body, a smaller and lighter product can be made. Further, the present technology may also be applied, and operates, in a case in which a directional microphone is used, so that higher performance can be expected.
Further, since the desired sound can be collected while the effects of the constant noise and the sudden noise are reduced, the accuracy of sound processing, such as a sound recognition rate, can be improved.
<Internal Configuration and Operation of First-2 Sound Processing Device>
Next, a configuration and an operation of a first-2 sound processing device will be described. The above described first-1 sound processing device 100 (
The sound processing device 200 illustrated in
As the information needed to select a filter, which is provided to the filter instruction unit 201, information input by the user is used, for example. For example, there may be a configuration in which the user selects a direction of the sound the user desires to collect and the selected information is input.
For example, a screen illustrated in
The options are an area 221 on the left, an area 222 in the middle, and an area 223 on the right. The user looks at the message and the options and selects a direction of the sound the user desires to collect from the options. For example, when the sound desired to be collected is in the middle (front), the area 222 is selected. Such a screen may be shown to the user and the user may select a direction of the sound the user desires to collect.
In this example, a direction of the sound to be collected is selected; however, for example, a message such as “In which direction is there a large noise?” may be displayed to let the user select a direction of a noise.
Further, a list of filters may be displayed, a user may select a filter from the list, and the selected information may be input. For example, although it is not illustrated, a list of filters may be displayed, on the display 22 (
Or, the sound processing device 200 may include a switch for switching a filter and information of an operation on the switch may be input.
The filter instruction unit 201 obtains such information and, on the basis of the obtained information, instructs the filter coefficient storage unit 105 on the filter coefficient index to be used in beamforming.
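A minimal sketch of how the filter instruction unit 201 might map the user's on-screen selection to a filter coefficient index is shown below. The area names and index values are hypothetical, since the source states only that the selected information is input.

```python
# Hypothetical mapping from the selected on-screen area (areas 221-223)
# to a filter coefficient index held in the filter coefficient storage unit.
AREA_TO_FILTER_INDEX = {"left": 0, "middle": 1, "right": 2}

def filter_instruction(selected_area):
    """Return the filter coefficient index to instruct for beamforming."""
    return AREA_TO_FILTER_INDEX[selected_area]
```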
An operation of the sound processing device 200, which has the above described configuration, will be described with reference to the flowcharts in
Each process described in steps S201 to S203 (
In the first-1 sound processing device 100, a process to determine a filter is executed in step S104; however, such a process is not needed in the first-2 sound processing device 200 and the process is omitted in the process flow. Then, in the first-2 sound processing device 200, in step S204, it is determined whether or not there is an instruction to change the filter.
In step S204, when it is determined that there is an instruction to change the filter, for example, when an instruction is received from the user in the above described method, the process proceeds to step S205, and, when it is determined that there is not an instruction to change the filter, the process in step S205 is skipped and the process proceeds to step S206 (
The process in step S205, similarly to step S106 (
Since each process in steps S206 to S212 (
In this manner, in the first-2 sound processing device 200, the information used to select a filter is input from outside (by a user). Also in the first-2 sound processing device 200, similarly to the first-1 sound processing device 100, a proper filter can be selected and a sudden noise or the like can be properly handled so that an accuracy of sound processing such as a sound recognition rate can be improved.
<Internal Configuration and Operation of Second Sound Processing Device>
<Internal Configuration of Second-1 Sound Processing Device>
The beamforming unit 301 includes a main beamforming unit 302 and a secondary beamforming unit 303. The parts having a function similar to that in the sound processing device 100 illustrated in
The sound processing device 300 according to the second embodiment is different from the sound processing device 100 according to the first embodiment in that the beamforming unit 103 (
As illustrated in
The beamforming unit 301 includes the main beamforming unit 302 and secondary beamforming unit 303 to prevent a sound from being changed at a moment when the filter coefficient C(f,k) provided from the filter coefficient storage unit 105 is switched. The beamforming unit 301 performs the following operation.
Normal condition (a condition that filter coefficient C(f,k) is not switched)
Only the main beamforming unit 302 of the beamforming unit 301 operates and the secondary beamforming unit 303 stays without operating.
Case that the filter coefficient C(f,k) is switched
Both of the main beamforming unit 302 and secondary beamforming unit 303 in the beamforming unit 301 operate, the main beamforming unit 302 executes a process with a previous filter coefficient (a filter coefficient before switching), and the secondary beamforming unit 303 executes a process with a new filter coefficient (a filter coefficient after the switching).
After a predetermined frame (a predetermined period of time), which is t frames in this example, has passed, the main beamforming unit 302 starts an operation with the new filter coefficient and the secondary beamforming unit 303 stops operating. Here, “t” is the number of transition frames and is set arbitrarily.
From the beamforming unit 301, when the filter coefficient C(f,k) is switched, beamformed signals are each output from the main beamforming unit 302 and secondary beamforming unit 303. The signal transition unit 304 executes a process to mix the signals each output from the main beamforming unit 302 and secondary beamforming unit 303.
When mixing, the signal transition unit 304 may perform the process with a fixed mixing ratio or may perform the process while changing the mixing ratio. For example, immediately after the filter coefficient C(f,k) is switched, the process is performed with a mixing ratio that weights the signals from the main beamforming unit 302 more heavily than the signals from the secondary beamforming unit 303; after that, the proportion of the signals from the main beamforming unit 302 is gradually reduced, until the mixing ratio weights the signals from the secondary beamforming unit 303 more heavily.
In this manner, when the filter coefficient is changed, by mixing the respective signals from the main beamforming unit 302 and the secondary beamforming unit 303 with a predetermined mixing ratio, the output signals do not give the user an uncomfortable feeling even if the filter coefficient changes. The signal transition unit 304 performs the following operation.
Normal condition (a condition that the filter coefficient C(f,k) is not changed)
The signals from the main beamforming unit 302 are simply output to the signal correction unit 106.
Until t frames pass after the filter coefficient C(f,k) is switched
The signals from the main beamforming unit 302 and the signals from the secondary beamforming unit 303 are mixed on the basis of the following expression (8) and the mixed signals are output to the signal correction unit 106.
[Mathematical Formula 5]
D(f,k)=αDmain(f,k)+(1−α)Dsub(f,k) (8)
In the expression (8), α is a coefficient that takes a value from 0.0 to 1.0 and is set arbitrarily by the designer. The coefficient α may be a fixed value, and the same value may be used until t frames pass after the filter coefficient C(f,k) is switched.
Or, the coefficient α may be a variable value, for example, a value that is set to 1.0 when the filter coefficient C(f,k) is switched, decreases as time passes, and is set to 0.0 when t frames have passed.
According to the expression (8), the output signal D(f,k) from the signal transition unit 304 after the filter coefficient has been switched is calculated by adding the signal Dmain(f,k) from the main beamforming unit 302 multiplied by α and the signal Dsub(f,k) from the secondary beamforming unit 303 multiplied by (1−α).
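Expression (8) with a variable coefficient can be sketched as below. The linear ramp of α from 1.0 down to 0.0 over the t transition frames is one possible choice consistent with the text, and the function name is illustrative.

```python
def transition_mix(d_main, d_sub, frame_index, t):
    """Expression (8): D(f,k) = alpha*Dmain(f,k) + (1 - alpha)*Dsub(f,k).

    alpha starts at 1.0 at the moment the filter coefficient is switched
    (frame_index == 0) and decreases linearly to 0.0 at frame_index == t,
    so the output crossfades from the previous filter to the new one.
    """
    alpha = max(0.0, 1.0 - frame_index / float(t))
    return alpha * d_main + (1.0 - alpha) * d_sub
```

A fixed α, the other option the text mentions, corresponds to ignoring frame_index and mixing with the same ratio for all t transition frames.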
An operation of the sound processing device 300 including the main beamforming unit 302, secondary beamforming unit 303, as well as the signal transition unit 304 in this manner will be described with reference to the flowcharts of
In steps S301 to S305, processes by the sound collection unit 101, time-frequency conversion unit 102, and filter selection unit 104 are executed. Since the processes in steps S301 to S305 are performed similarly to the processes in steps S101 to S105 (
In step S305, when it is determined that the filter is not changed, the process proceeds to step S306. In step S306, the main beamforming unit 302 performs a beamforming process by using a filter coefficient C(f,k) which is set at the time. In other words, the process with the filter coefficient which is set at the time is continued.
The beamformed signal from the main beamforming unit 302 is supplied to the signal transition unit 304. In this case, since the filter coefficient is not changed, the signal transition unit 304 simply outputs the supplied signal to the signal correction unit 106.
In step S312, the correction coefficient calculation unit 107 calculates a correction coefficient from an input signal and a beamformed signal. Since each process performed by the signal correction unit 106, correction coefficient calculation unit 107, and time-frequency reverse conversion unit 108 in steps S312 to S317 is performed similarly to the process executed by the first-1 sound processing device 100 in steps S108 to S113 (
On the other hand, in step S305, when it is determined that a filter is changed, the process proceeds to step S306. In step S306, the filter coefficient is read from the filter coefficient storage unit 105 and supplied to the secondary beamforming unit 303.
In step S307, the beamforming process is executed by each of the main beamforming unit 302 and secondary beamforming unit 303. The main beamforming unit 302 executes beamforming with a filter coefficient before the filter is changed (hereinafter, referred to as a previous filter coefficient), and the secondary beamforming unit 303 executes beamforming with a filter coefficient after the filter is changed (hereinafter, referred to as a new filter coefficient).
In other words, the main beamforming unit 302 continues the beamforming process without changing the filter coefficient, and the secondary beamforming unit 303 starts a beamforming process in step S307 by using a new filter coefficient provided from the filter coefficient storage unit 105.
When the beamforming process is performed in each of the main beamforming unit 302 and secondary beamforming unit 303, the process proceeds to step S309 (
In step S310, it is determined whether or not the number of signal transition frames has passed and, when it is determined that the number of signal transition frames has not passed, the process returns to step S309 and repeats the processes in step S309 and subsequent steps. In other words, until it is determined that the number of signal transition frames has passed, the signal transition unit 304 performs a process of mixing the signal from the main beamforming unit 302 and the signal from the secondary beamforming unit 303 and outputting the signals.
Here, from the time when it is determined that the filter coefficient is switched until it is determined that the number of signal transition frames has passed, the processes in steps S312 to S317 are performed on the output from the signal transition unit 304, and the signal continues to be supplied to an unillustrated processing unit in a later stage.
In step S310, when it is determined that the number of the signal transition frames has passed, the process proceeds to step S311. In step S311, a process to transfer a new filter coefficient to the main beamforming unit 302 is executed. After that, the main beamforming unit 302 starts a beamforming process by using the new filter coefficient, and the secondary beamforming unit 303 stops the beamforming process.
By mixing the signal from the main beamforming unit 302 and the signal from the secondary beamforming unit 303 in this manner when the filter coefficient is changed, the output signal can be prevented from changing suddenly, and the output signals do not give the user an uncomfortable feeling even if the filter coefficient is changed.
Further, the above described effects of the first-1 sound processing device 100 and first-2 sound processing device 200 can be obtained with the second-1 sound processing device 300.
<Internal Configuration and Operation of Second-2 Sound Processing Device>
Next, an internal configuration and operation of a second-2 sound processing device will be described. The above described second-1 sound processing device 300 (
The sound processing device 400 illustrated in
The filter instruction unit 401 may have a configuration same as that of the filter instruction unit 201 of the first-2 sound processing device 200.
As the information needed to select a filter, which is supplied to the filter instruction unit 401, information input by a user is used, for example. For example, there may be a configuration in which the user is made to select a direction of a sound the user desires to collect and the selected information is input.
For example, the above described screen illustrated in
Or, a list of filters may be displayed, the user may select a filter from the list, and the selected information may be input. Or, a switch (not illustrated) for switching filters may be provided to the sound processing device 400 and information of an operation on the switch may be input.
The filter instruction unit 401 obtains such information and, on the basis of the obtained information, instructs the filter coefficient storage unit 105 on the filter coefficient index to be used in beamforming.
An operation of the sound processing device 400 having such a configuration will be explained with reference to the flowcharts of
Each process of steps S401 to S403 (
In other words, the second-1 sound processing device 300 performs a process of determining a filter in step S304; however, such a process is not needed in the second-2 sound processing device 400 and is omitted from the flowchart. Then, in the second-2 sound processing device 400, in step S404, it is determined whether or not there is an instruction to change the filter.
When it is determined in step S404 that there is not an instruction to change the filter, the process proceeds to step S405 and, when it is determined that there is an instruction to change the filter, the process proceeds to step S406.
Since each process in steps S405 to S416 (
In this manner, in the second-2 sound processing device 400, information used to select a filter is input from outside (by the user). Similarly to the first-1 sound processing device 100, first-2 sound processing device 200, and second-1 sound processing device 300, also in the second-2 sound processing device 400, a proper filter can be selected and an occurrence of a sudden noise or the like can be properly handled so that the accuracy of the sound processing such as a sound recognition rate can be improved.
Further, similarly to the second-1 sound processing device 300, also in the second-2 sound processing device 400, the output signals do not give the user an uncomfortable feeling even if the filter coefficient is changed.
<About Recording Medium>
The series of the above described processes may be executed by hardware or may be executed by software. When the series of the processes is executed by software, a program composing the software is installed to a computer. Here, the computer may be a computer mounted in dedicated hardware, a general personal computer which executes various functions by installing various programs, or the like.
The input unit 1006 is composed of a keyboard, a mouse, a microphone, or the like. The output unit 1007 is composed of a display, a speaker, or the like. The storage unit 1008 is composed of a hard disk, a non-volatile memory, or the like. The communication unit 1009 is composed of a network interface, or the like. The driver 1010 drives a removable medium 1011 such as a magnetic disk, an optical disk, a magnetic optical disk, a semiconductor memory, or the like.
In the computer having an above described configuration, for example, the above described series of processes is performed by the CPU 1001 by loading a program stored in the storage unit 1008 to the RAM 1003 via the input/output interface 1005 and bus 1004 and executing the program.
The program executed by the computer (CPU 1001) can be recorded in the removable medium 1011 as a packaged medium or the like and provided for example. Further, the program can be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
In the computer, the program can be installed to the storage unit 1008 via the input/output interface 1005 by attaching the removable medium 1011 to the driver 1010. Further, the program can be received by the communication unit 1009 via a wired or wireless transmission medium and installed to the storage unit 1008. In addition, the program may be installed to the ROM 1002 or storage unit 1008 in advance.
Here, the program executed by the computer may be a program in which the processes are executed in chronological order following the order described in this specification, or a program in which the processes are executed in parallel or at necessary timings, such as when a call is performed.
Further, in this specification, the system represents an entire device composed of a plurality of devices.
Here, the effects described in this specification are examples and do not set any limitation, and there may be another effect.
Here, embodiments according to the present technology are not limited by the above described embodiments and various modifications can be made within a scope of the present technology.
Here, the present technology may have the following configurations.
(1)
A sound processing device including:
a sound collection unit configured to collect a sound;
an application unit configured to apply a predetermined filter to a signal of the sound collected by the sound collection unit;
a selection unit configured to select a filter coefficient of the filter applied by the application unit; and
a correction unit configured to correct the signal from the application unit.
(2)
The sound processing device according to (1), wherein the selection unit selects the filter coefficient on the basis of the signal of the sound collected by the sound collection unit.
(3)
The sound processing device according to (1) or (2), wherein the selection unit creates, on the basis of the signal of the sound collected by the sound collection unit, a histogram which associates a direction where the sound occurs and a strength of the sound and selects the filter coefficient on the basis of the histogram.
(4)
The sound processing device according to (3), wherein the selection unit creates the histogram on the basis of signals accumulated for a predetermined period of time.
(5)
The sound processing device according to (3), wherein the selection unit selects a filter coefficient of a filter that suppresses the sound in an area other than an area including a largest value in the histogram.
(6)
The sound processing device according to any of (1) to (5), further including a conversion unit configured to convert the signal of the sound collected by the sound collection unit into a signal of a frequency range,
wherein the selection unit selects the filter coefficient for all frequency bands by using the signal from the conversion unit.
(7)
The sound processing device according to any of (1) to (5), further including a conversion unit configured to convert the signal of the sound collected by the sound collection unit into a signal of a frequency range,
wherein the selection unit selects the filter coefficient for each frequency band by using the signal from the conversion unit.
(8)
The sound processing device according to any of (1) to (7),
wherein the application unit includes a first application unit and a second application unit,
the sound processing device further includes a mixing unit configured to mix signals from the first application unit and the second application unit,
when a first filter coefficient is switched to a second filter coefficient, a filter with the first filter coefficient is applied in the first application unit and a filter with the second filter coefficient is applied in the second application unit, and
the mixing unit mixes the signal from the first application unit and a signal from the second application unit with a predetermined mixing ratio.
(9)
The sound processing device according to (8), wherein, after a predetermined period of time has passed, the first application unit starts a process in which the filter with the second filter coefficient is applied and the second application unit stops processing.
(10)
The sound processing device according to (1), wherein the selection unit selects the filter coefficient on the basis of an instruction from a user.
(11)
The sound processing device according to any of (1) to (10), wherein
the correction unit
performs a correction to further suppress a signal which has been suppressed in the application unit when the signal of the sound collected by the sound collection unit is smaller than the signal to which a predetermined filter is applied by the application unit, and
performs a correction to suppress a signal which has been amplified by the application unit when the signal of the sound collected by the sound collection unit is larger than the signal to which a predetermined filter is applied by the application unit.
(12)
The sound processing device according to any of (1) to (11),
wherein
the application unit suppresses a constant noise, and
the correction unit suppresses a sudden noise.
(13)
A sound processing method including:
collecting a sound;
applying a predetermined filter to a signal of the collected sound;
selecting a filter coefficient of the applied filter; and
correcting the signal to which the predetermined filter is applied.
(14)
A program that causes a computer to execute a process including the steps of:
collecting a sound;
applying a predetermined filter to a signal of the collected sound;
selecting a filter coefficient of the applied filter; and
correcting the signal to which the predetermined filter is applied.
Number | Date | Country | Kind
---|---|---|---
2014-228896 | Nov 2014 | JP | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2015/080481 | 10/29/2015 | WO | 00

Publishing Document | Publishing Date | Country | Kind
---|---|---|---
WO2016/076123 | 5/19/2016 | WO | A

Number | Name | Date | Kind
---|---|---|---
6868365 | Balan | Mar 2005 | B2
8666090 | Townsend | Mar 2014 | B1
9191738 | Niwa | Nov 2015 | B2
20030063759 | Brennan | Apr 2003 | A1
20040013038 | Kajala | Jan 2004 | A1
20100215184 | Buck | Aug 2010 | A1
20130142359 | Murata et al. | Jun 2013 | A1
20160088392 | Huttunen | Mar 2016 | A1

Number | Date | Country
---|---|---
103166597 | Jun 2013 | CN
2001-100800 | Apr 2001 | JP
2010-091912 | Apr 2010 | JP
2013-120987 | Jun 2013 | JP

Entry
---
Tatsuta, et al., “Blind Source Separation by the Method of Orientation Histograms”, IEICE Technical Report, Jun. 2005, 6 pages.
Tatsuta, et al., “Blind Source Separation by the Method of Orientation Histograms”, Technical Report of IEICE, The Institute of Electronics, Information and Communication Engineers, Jun. 2005, pp. 1-6.
International Search Report and Written Opinion of PCT Application No. PCT/JP2015/080481, dated Feb. 2, 2016, 12 pages of English Translation and 10 pages of ISRWO.
International Preliminary Report on Patentability of PCT Application No. PCT/JP2015/080481, dated May 26, 2017, 12 pages of English Translation and 6 pages of IPRP.

Number | Date | Country |
---|---|---|---
20170332172 A1 | Nov 2017 | US |