SOUND RECOGNITION DEVICE AND SOUND RECOGNITION METHOD

BACKGROUND OF THE INVENTION

(1) Field of the Invention

The present invention relates to a sound recognition device which discriminates between a periodic sound, such as engine sound or voice, and an aperiodic sound, such as wind noise, rain sound, or background noise, to determine a frequency signal of the periodic or aperiodic sound.

(2) Description of the Related Art

The following are the sound recognition technologies having conventionally been employed.

Japanese Unexamined Utility Model Application Publication No. 5-92767 discloses a technology to sense a nearby vehicle present around a user's vehicle by detecting a sound of the nearby vehicle. This technology is referred to as a first conventional technology hereafter. The first conventional technology uses a spectral subtraction method (referred to as the SS method hereafter) to eliminate an engine sound of the user's vehicle and ambient noise. Then, this technology senses the nearby vehicle on the basis of power of a sound signal from which the noises have been eliminated, and detects a direction of the nearby vehicle on the basis of an arrival time difference between the engine sounds received by microphones.

Moreover, Japanese Patent No. 4310371, for example, discloses a technology related to eliminating noises such as wind noise. This technology is referred to as a second conventional technology. The second conventional technology focuses on differences of temporal fluctuations in the phases of sound signals, and accordingly discriminates between a periodic sound, such as voice, and an aperiodic sound, such as wind noise.

SUMMARY OF THE INVENTION

The first conventional technology uses the SS method to eliminate the noises. In the SS method, frequency analysis is performed on sounds included in a certain period of time, and then power for each obtained frequency is subtracted as noise to extract a sound included in the certain period of time. For doing so, it is necessary to estimate the noises beforehand. In the case where a sound having steady power is present in the ambient noise, the noise can be estimated and thus eliminated. However, an unsteady noise, such as wind noise, fluctuates in power over time. The SS method is not sufficiently robust against such an unsteady noise, and cannot accurately discriminate between the wind noise and the vehicle sound.
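As a rough illustration of the SS method described above, the following minimal sketch subtracts a previously estimated noise power from the power of each frequency. The function name, the flooring of negative results at zero, and the sample values are assumptions made only for illustration and are not taken from the first conventional technology.

```python
def spectral_subtraction(power, noise_estimate):
    """Subtract an estimated noise power from the power of each frequency.

    Negative results are floored at zero (an assumed convention).
    """
    return [max(p - n, 0.0) for p, n in zip(power, noise_estimate)]


# Hypothetical per-frequency powers; the noise estimate was made beforehand.
noise_estimate = [1.0, 1.0, 1.0, 1.0]
steady_frame = [1.0, 5.0, 1.0, 1.0]  # steady noise plus a tone at index 1
gusty_frame = [4.0, 5.0, 0.2, 3.0]   # noise power has changed since the estimate

print(spectral_subtraction(steady_frame, noise_estimate))
print(spectral_subtraction(gusty_frame, noise_estimate))
```

With the steady frame only the tone remains, whereas with the gusty frame residual noise power is left at indices 0 and 3, which is the weakness against unsteady noise noted above.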

The second conventional technology recognizes a periodic sound on the basis of characteristics that the periodic sound, such as an engine sound, is approximately constant in frequency and is constant in phase with respect to the time.

When the vehicle is running at a constant speed and the number of engine revolutions is constant (meaning that the frequency of the engine sound is constant with respect to the time), the periodic sound can be recognized.

However, when the number of engine revolutions fluctuates according to acceleration or deceleration of the vehicle, the recognition accuracy needs to be improved so as to respond to the temporal fluctuations in frequency. In particular, in an application for detecting a vehicle present in a blind spot of the user's vehicle, for example, it is important for supporting safer driving to accurately detect an accelerating vehicle, which is highly likely to cause a serious accident.

The present invention is conceived in view of the stated problem, and has an object to provide a sound recognition device which discriminates between a periodic sound, such as engine sound or voice, and an aperiodic sound, such as wind noise, rain sound, or background noises, to determine a frequency signal of the periodic or aperiodic sound, and to provide particularly a sound recognition device which accurately recognizes the periodic sound fluctuating in frequency over time.

In order to achieve the aforementioned object, the sound recognition device according to an aspect of the present invention is a sound recognition device including: a frequency analysis unit which analyzes a frequency signal of a sound signal; a phase curve calculation unit which calculates a phase curve approximating temporal fluctuations of a phase of the frequency signal; an error calculation unit which calculates an error between the phase curve and the phase of the frequency signal; and a sound signal recognition unit which recognizes whether or not the sound signal is a signal of a periodic sound, based on the calculated error, wherein the phase curve is expressed by a quadratic polynomial in which a value of the phase is a variable.

When the frequency fluctuates over time, the phase also fluctuates over time. The temporal phase fluctuations can be represented by a phase curve. Based on the error with respect to the phase curve, the sound signal can be determined as being of a periodic sound or not. As a result, the sound recognition device can discriminate between a periodic sound, such as engine sound or voice, and an aperiodic sound, such as wind noise, rain sound, or background noise, to determine a frequency signal of the periodic or aperiodic sound. In particular, the sound recognition device can accurately recognize the periodic sound fluctuating in frequency over time.

When the frequency fluctuations of the sound signal can be expressed by a linear equation, the phase fluctuations can be expressed by a quadratic polynomial. Thus, the phase curve can be expressed using a curve represented by a quadratic polynomial, so that the phase fluctuations can be expressed with accuracy.

Preferably, the sound recognition device may further include a phase modification unit which modifies a phase which is different from a predetermined number of phases, by adding ±2 π*m (radian), where m is a natural number, to the phase to reduce a difference between the phase and the predetermined number of phases.

With this, the phase which is significantly shifted with respect to the phases at other times can be modified, so that the sound recognition can be performed with accuracy.
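The modification by ±2π*m can be sketched as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, the predetermined number of phases are represented by their mean, and the multiple of 2π is chosen by rounding. The present description does not prescribe these choices.

```python
import math


def correct_phase(phase, reference_phases):
    """Shift `phase` by a multiple of 2*pi so that it lies closest to the
    mean of `reference_phases` (assumed representative of the other phases)."""
    reference = sum(reference_phases) / len(reference_phases)
    m = round((reference - phase) / (2.0 * math.pi))
    return phase + 2.0 * math.pi * m


# Hypothetical phases (radian): the value 0.1 is shifted by about 2*pi
# relative to its neighbours and is therefore moved up by one turn.
neighbours = [6.0, 6.1, 6.2]
corrected = correct_phase(0.1, neighbours)
unchanged = correct_phase(6.0, neighbours)
```

A phase that already lies near the reference phases yields m = 0 and is left as it is.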

Moreover, the sound recognition device may further include a phase modification unit which modifies the phase of the frequency signal by adding ±2 π*m (radian), where m is a natural number, to the phase to include the phase within an angular range, the modification being performed for each of different angular ranges, wherein the phase curve calculation unit calculates the phase curve for each of the angular ranges, the error calculation unit calculates the error for each of the angular ranges, the phase modification unit further selects one of the angular ranges in which the error is a minimum, and the sound signal recognition unit recognizes whether or not the sound signal is the signal of the periodic sound, based on the error in the selected angular range.

With this, the phase which is significantly shifted with respect to the phases at other times can be modified, so that the sound recognition can be performed with accuracy.

More preferably, the frequency analysis unit may analyze the frequency signal for each of a plurality of sound signals received, respectively, by a plurality of microphones arranged at a distance from each other, and the sound recognition device may further include a direction detection unit which detects a sound source direction of the periodic sound on the basis of an arrival time difference between the sound signals received by the microphones, when the sound signal recognition unit recognizes that the sound signal received by at least one of the microphones is the signal of the periodic sound.

When the periodic sound is recognized, a direction of an approaching vehicle is detected from the arrival time difference between the sound signals received by the microphones. Thus, the direction of the approaching vehicle can be accurately detected without being influenced by the noises.
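The relation between the arrival time difference and the sound source direction can be illustrated by the following minimal sketch. It assumes two microphones, a sound source far enough away for the wavefront to be treated as planar, and a speed of sound of 340 m/s. The function name and these values are assumptions for illustration and are not specified in the present description.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value


def source_direction(time_difference, mic_distance, speed=SPEED_OF_SOUND):
    """Return the direction (radian) of a far-field source, measured from the
    perpendicular to the line joining two microphones."""
    ratio = speed * time_difference / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding outside [-1, 1]
    return math.asin(ratio)


# Hypothetical setup: microphones 0.2 m apart, arrival time difference
# equal to half of the maximum possible delay.
mic_distance = 0.2
time_difference = 0.5 * mic_distance / SPEED_OF_SOUND
angle = source_direction(time_difference, mic_distance)
```

In this hypothetical setup the resulting angle is approximately 30 degrees from the perpendicular.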

It should be noted that the present invention can be implemented not only as a sound recognition device including the characteristic units as described above, but also as a sound recognition method having, as steps, the characteristic processing units included in the sound recognition device. Also, the present invention can be implemented as a computer program causing a computer to execute the characteristic steps included in the sound recognition method. It should be obvious that such a computer program can be distributed via a recording medium such as a Compact Disc-Read Only Memory (CD-ROM) or via a communication network such as the Internet.

The sound recognition device according to the present invention is capable of discriminating between a periodic sound, such as engine sound or voice, and an aperiodic sound, such as wind noise, rain sound, or background noises, to determine a frequency signal of the periodic or aperiodic sound. In particular, the present invention can provide a sound recognition device which accurately recognizes the periodic sound fluctuating in frequency over time.

Further Information About Technical Background to this Application

The disclosure of Japanese Patent Application No. 2010-025930 filed on Feb. 8, 2010 including specification, drawings and claims is incorporated herein by reference in its entirety.

The disclosure of PCT application No. PCT/JP2011/000036 filed on Jan. 7, 2011, including specification, drawings and claims is incorporated herein by reference in its entirety.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other objects, advantages and features of the invention will become apparent from the following description thereof taken in conjunction with the accompanying drawings that illustrate a specific embodiment of the invention. In the Drawings:

FIG. 1 is a diagram explaining a phase according to the present invention;

FIG. 2 is a diagram explaining a phase according to the present invention;

FIG. 3 is a diagram explaining an engine sound;

FIG. 4 is a diagram explaining a phase of an engine sound in the case where the number of engine revolutions is constant;

FIG. 5 is a diagram explaining a phase of an engine sound in the case where the number of engine revolutions increases and a vehicle thus accelerates;

FIG. 6 is a diagram explaining a phase of an engine sound in the case where the number of engine revolutions decreases and a vehicle thus decelerates;

FIG. 7 is a block diagram showing an entire configuration of a noise elimination device in a first embodiment according to the present invention;

FIG. 8 is a block diagram showing a configuration of a sound determination unit included in the noise elimination device, in the first embodiment according to the present invention;

FIG. 9 is a flowchart showing an operational procedure executed by the noise elimination device in the first embodiment according to the present invention;

FIG. 10 is a flowchart showing an operational procedure for determining a frequency signal of an extracted sound in the first embodiment according to the present invention;

FIG. 11 is a diagram explaining a frequency analysis;

FIG. 12 is a diagram explaining an engine sound and a wind noise;

FIG. 13 is a diagram explaining a phase modification process;

FIG. 14 is a diagram explaining a phase modification process;

FIG. 15 is a diagram explaining a process of calculating a phase curve;

FIG. 16 is a diagram explaining a process of calculating a phase distance;

FIG. 17 is a diagram explaining a phase curve of an engine sound;

FIG. 18 is a diagram explaining an error with respect to the phase curve;

FIG. 19 is a diagram explaining a process of extracting an engine sound;

FIG. 20 is a diagram explaining a phase modification process;

FIG. 21 is a diagram explaining a phase modification process;

FIG. 22 is a block diagram showing an entire configuration of a vehicle detection device in a second embodiment according to the present invention;

FIG. 23 is a block diagram showing a configuration of a sound determination unit of the vehicle detection device in the second embodiment according to the present invention;

FIG. 24 is a flowchart showing an operational procedure executed by the vehicle detection device in the second embodiment according to the present invention; and

FIG. 25 is a flowchart showing an operational procedure for determining a frequency signal of an extracted sound in the second embodiment according to the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention focuses attention on characteristics of temporal frequency fluctuations of a periodic sound such as engine sound or voice. The inventors of the present invention analyzed the sound-generating mechanism and the data of sound actually collected. As a result, the inventors made a new finding that the temporal frequency fluctuations of the periodic sound in a time-frequency domain can be approximated by a piecewise linear function. From this new finding, the inventors further found that the temporal phase fluctuations which have been piecewise-linearly approximated can be modeled by a curve. Thus, the periodic sound can be recognized with accuracy even when the frequency fluctuates over time. It should be noted that the periodic sound in the present invention refers to a sound whose phase is constant or whose phase fluctuations are cyclic.

Here, the term “phase” used in the present invention is defined with reference to FIG. 1. In (a) of FIG. 1, an example of a received engine sound is schematically shown. The horizontal axis represents time whereas the vertical axis represents amplitude. This diagram shows a case, as an example, where the number of engine revolutions is constant with respect to the time and the frequency of the engine sound does not fluctuate.

Moreover, (b) of FIG. 1 shows a sine wave at a predetermined frequency f which is a base waveform used when a frequency analysis is performed via a Fourier transform (in this example, a value which is the same as the frequency of the engine sound is used as the predetermined frequency f). The horizontal axis and the vertical axis are the same as those in (a) of FIG. 1. A frequency signal (phase) is obtained by the convolution process performed on this base waveform and the received engine sound. In the present example, by performing the convolution process on the received engine sound while the base waveform is fixed without being shifted in the direction of the time axis, the frequency signal (phase) is obtained for each of the times.

The result obtained by this process is shown in (c) of FIG. 1. The horizontal axis represents time and the vertical axis represents phase. In this example, the number of engine revolutions is constant with respect to the time, and the frequency of the received engine sound is constant with respect to the time. In other words, the phase at the predetermined frequency f does not increase at an accelerating rate nor decrease at an accelerating rate. In the present example, the value which is the same as the frequency of the engine sound whose number of revolutions is constant is used as the predetermined frequency f. In the case where a value smaller than the frequency of the engine sound is used as the predetermined frequency f, the phase increases like a linear function. In the case where a value greater than the frequency of the engine sound is used as the predetermined frequency f, the phase decreases like a linear function. In either of these cases, the phase at the predetermined frequency f does not increase at an accelerating rate nor decrease at an accelerating rate.
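The three cases described above can be checked numerically with the following minimal sketch, which assumes an ideal tone of constant frequency and omits the wrapping of the phase into a 2π range. The function name and the numerical values are hypothetical.

```python
import math


def phase_relative_to_base(t, f_signal, f_base, psi0=0.0):
    """Phase of a constant-frequency tone measured against a fixed base
    waveform of frequency f_base (no mod-2*pi wrapping applied)."""
    return 2.0 * math.pi * f_signal * t + psi0 - 2.0 * math.pi * f_base * t


f_signal = 100.0  # hypothetical engine-sound frequency (Hz)
times = [0.0, 0.01, 0.02, 0.03]

equal = [phase_relative_to_base(t, f_signal, 100.0) for t in times]
smaller = [phase_relative_to_base(t, f_signal, 95.0) for t in times]
greater = [phase_relative_to_base(t, f_signal, 105.0) for t in times]
```

Here `equal` stays constant, `smaller` rises by a fixed amount per step, and `greater` falls by a fixed amount per step, so none of the three changes at an accelerating rate.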

It should be noted that, in sound signal processing using the Fast Fourier Transform (FFT) and the like, it is common to perform the convolution process while the base waveform is being shifted in the direction of the time axis. In the case where the convolution process is performed while the base waveform is being shifted in the direction of the time axis, the phase can be modified later to be converted into a phase defined in the present invention. The explanation is given as follows, with reference to the drawings.

FIG. 2 is a diagram explaining a phase. In (a) of FIG. 2, an example of a received engine sound is schematically shown. The horizontal axis represents time whereas the vertical axis represents amplitude.

Moreover, (b) of FIG. 2 shows a sine wave at a predetermined frequency f which is a base waveform used when a frequency analysis is performed via a Fourier transform (in this example, a value which is the same as the frequency of the engine sound is used as the predetermined frequency f). The horizontal axis and the vertical axis are the same as those in (a) of FIG. 2. A frequency signal (phase) is obtained by the convolution process performed on this base waveform and the received engine sound. In the present example, by performing the convolution process on the received engine sound while the base waveform is being shifted in the direction of the time axis, the frequency signal (phase) is obtained for each of the times.

The result obtained by this process is shown in (c) of FIG. 2. The horizontal axis represents time and the vertical axis represents phase. In this example, since the received engine sound is at the frequency f, the pattern of the phase at the frequency f is cyclically repeated in a cycle of 1/f. When the phase cyclically repeated in the calculated phase ψ (t) is modified (that is, modified to a phase ψ′ (t)=mod 2 π(ψ(t)−2 π f t) (where f is the analysis-target frequency)), a phase shown in (d) of FIG. 2 is obtained. More specifically, the phase modification process can convert the phase into the phase defined in the present invention as shown in (c) of FIG. 1.
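The phase modification ψ′(t)=mod 2π(ψ(t)−2πft) can be sketched as follows. The function name, the choice of the range [0, 2π) for the result, and the sample values are assumptions for illustration.

```python
import math


def modify_phase(psi, f, t):
    """Return psi'(t) = mod_2pi(psi(t) - 2*pi*f*t), wrapped into [0, 2*pi)."""
    return (psi - 2.0 * math.pi * f * t) % (2.0 * math.pi)


# Hypothetical example: a 100 Hz tone with an initial phase of 0.5 rad,
# observed at several times with the base waveform shifted along the time axis.
f = 100.0
phi0 = 0.5
times = [n / 8000.0 for n in range(0, 400, 40)]
raw = [(2.0 * math.pi * f * t + phi0) % (2.0 * math.pi) for t in times]
modified = [modify_phase(p, f, t) for p, t in zip(raw, times)]
```

For this constant-frequency tone, the cyclically repeating values in `raw` become a constant value in `modified`, as in the conversion from (c) of FIG. 2 to (d) of FIG. 2.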

Next, an explanation is given about temporal fluctuations in the frequency of the engine sound. The frequency of the engine sound fluctuates as the number of engine revolutions fluctuates over time.

FIG. 3 is a conceptual diagram explaining the characteristics of the following embodiments.

FIG. 3 is a diagram showing a spectrogram obtained as a result of an analysis performed on the engine sound of a vehicle by a Discrete Fourier Transform (DFT) analysis unit 2402 which is described later. The horizontal axis represents time whereas the vertical axis represents frequency. The color density of the spectrogram represents the magnitude of power of a frequency signal. When the color is darker (i.e., closer to black), the power of the frequency signal is greater. FIG. 3 shows data in which noise such as wind noise has been eliminated as much as possible and, therefore, the darker parts (i.e., the blackish parts) basically indicate the engine sound. Generally speaking, the engine sound can be represented by the data of the revolutions fluctuating over time, as shown in FIG. 3. From the spectrogram, it can be seen that the frequency fluctuates over time.

In an engine, a predetermined number of cylinders make piston motion to cause revolutions to a powertrain. The engine sound from the vehicle includes: a sound dependent on the engine revolutions; and a fixed vibration sound and an aperiodic sound which are independent of the engine revolutions. In particular, the sound mainly detected from the outside of the vehicle is the periodic sound dependent on the engine revolutions. In the following embodiments, the periodic sound dependent on the engine revolutions is extracted as the engine sound.

It can be seen from dashed-line circles 501, 502, and 503 in FIG. 3 that, as the number of engine revolutions fluctuates, the frequency of the engine sound fluctuates, period by period, with respect to the time. Here, attention is focused on the fluctuations in the frequency. As can be seen, the frequency seldom randomly fluctuates and is seldom discretely scattered. On this account, the frequency fluctuations in a certain time period can be approximated by a linear model. Thus, the engine sound can be approximated by a piecewise linear function represented by Equation 1 as follows.

f(t)=At+f₀ (Equation 1)

To be more specific, the frequency f at a time t can be linearly approximated using a line segment which increases or decreases from an initial value f₀ in proportion to the time t (with a proportionality coefficient A) in a predetermined time period. For example, when the vehicle is accelerating, the number of engine revolutions generally increases almost linearly. In a period B, which shows the frequency fluctuations while the vehicle is accelerating, the frequency increases, that is, rises to the right. During the period B, the number of engine revolutions is increasing, meaning that the vehicle is accelerating. Thus, the frequency of this engine sound can be approximated by a piecewise linear function where the slope A is positive. When the vehicle is decelerating, the number of engine revolutions decreases linearly. In a period A, which shows the frequency fluctuations while the vehicle is decelerating, the frequency decreases, that is, falls to the right. Thus, the frequency of this engine sound can be approximated by a piecewise linear function where the slope A is negative. When the vehicle is running at a constant speed, the number of engine revolutions remains constant. In a period C, which shows the frequency fluctuations while the vehicle is running at the constant speed, the frequency remains approximately constant. Thus, the frequency of this engine sound can be approximated by a piecewise linear function where the slope A is zero.

When the frequency f is expressed by Equation 1 above, the phase ψ at the time t can be expressed as follows.

ψ(t)=2π∫f(t)dt=2π∫(At+f₀)dt=πAt²+2πf₀t+ψ₀ (Equation 2)

In Equation 2, ψ₀ in the third term on the right-hand side indicates an initial phase, and the second term (2πf₀t) indicates that the phase advances at an angular frequency 2πf₀ in proportion to the time t. Also, the first term (πAt²) indicates that the phase can be approximated by a quadratic curve.
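The quadratic form of the phase can be checked numerically by integrating the linear frequency of Equation 1 and comparing the result with πAt²+2πf₀t+ψ₀. The following is a minimal sketch. The function names, the use of the trapezoid rule, and the numerical values are assumptions for illustration.

```python
import math


def linear_frequency(t, A, f0):
    """Equation 1: f(t) = A*t + f0."""
    return A * t + f0


def quadratic_phase(t, A, f0, psi0):
    """Closed form: pi*A*t^2 + 2*pi*f0*t + psi0."""
    return math.pi * A * t * t + 2.0 * math.pi * f0 * t + psi0


def integrated_phase(t_end, A, f0, psi0, steps=1000):
    """2*pi times the integral of f(t) from 0 to t_end (trapezoid rule), plus psi0."""
    dt = t_end / steps
    total = 0.0
    for k in range(steps):
        f_a = linear_frequency(k * dt, A, f0)
        f_b = linear_frequency((k + 1) * dt, A, f0)
        total += 0.5 * (f_a + f_b) * dt
    return 2.0 * math.pi * total + psi0


# Hypothetical values: frequency rising from 100 Hz at 50 Hz per second.
A, f0, psi0, t_end = 50.0, 100.0, 0.3, 0.2
closed_form = quadratic_phase(t_end, A, f0, psi0)
numerical = integrated_phase(t_end, A, f0, psi0)
```

The two values agree up to rounding error, since the trapezoid rule is exact for a linear integrand.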

As described above, the temporal phase fluctuations of the periodic sound, such as an engine sound, can be modeled by a curve. On the other hand, the temporal phase fluctuations of the aperiodic sound, such as wind noise, are random and show no periodicity, meaning that these fluctuations cannot be approximated by a quadratic curve. The inventors of the present invention noted the difference of the temporal phase fluctuations between the periodic sound and the aperiodic sound. That is, the inventors found out that a frequency signal of the periodic or aperiodic sound can be determined by discriminating between the periodic sound, such as the engine sound, which shows change in the periodicity and the aperiodic sound, such as wind noise, rain sound, or background noise. In particular, an application for detecting a vehicle present in a blind spot, for example, can instantaneously detect an accelerating vehicle.

A relation between the fluctuations in the number of engine revolutions and the phase of the engine sound is analyzed as follows.

In FIG. 4, (a) schematically shows the engine sound in the period C where the number of engine revolutions is constant. Note that the frequency of the engine sound is represented by “f”. In FIG. 4, (b) shows a base waveform. In this diagram, the frequency of the base waveform is represented by the same value as the frequency f of the engine sound. In FIG. 4, (c) shows a phase with respect to the base waveform. When the number of revolutions is constant, the engine sound shows a certain periodicity as is the case with the sine wave shown in (a) of FIG. 1. Thus, as shown in (c) of FIG. 4, the phase at the predetermined frequency f does not increase at an accelerating rate over time nor decrease at an accelerating rate over time.

It should be noted that, when the frequency of the target sound is constant and lower than the frequency of the base waveform, the phase gradually delays. Since the amount of decrease is constant, the phase decreases linearly. On the other hand, when the frequency of the target sound is constant and higher than the frequency of the base waveform, the phase gradually advances. Since the amount of increase is constant, the phase increases linearly.

In FIG. 5, (a) schematically shows the engine sound in the period B where the number of engine revolutions increases and the vehicle thus accelerates. During the period B, the frequency of the engine sound increases over time. In FIG. 5, (b) shows a base waveform. Note that the frequency of the engine sound is represented by “f”, for example. In FIG. 5, (c) shows a phase with respect to the base waveform. The engine sound has a periodicity like a sine wave, and the frequency gradually increases. Thus, as shown in (c) of FIG. 5, the phase with respect to the base waveform increases at an accelerating rate over time.

In FIG. 6, (a) schematically shows the engine sound in the period A where the number of engine revolutions decreases and the vehicle thus decelerates. During the period A, the frequency of the engine sound decreases over time. In FIG. 6, (b) shows a base waveform. Note that the frequency of the engine sound is represented by “f”, for example. In FIG. 6, (c) shows a phase with respect to the base waveform. The engine sound has a periodicity like a sine wave, and the frequency gradually decreases. Thus, as shown in (c) of FIG. 6, the phase with respect to the base waveform decreases at an accelerating rate over time.

The following is a description of the embodiments according to the present invention, with reference to the drawings.

First Embodiment

A noise elimination device in the first embodiment is described as follows.

FIG. 7 and FIG. 8 are diagrams each showing a configuration of the noise elimination device in the first embodiment according to the present invention.

In FIG. 7, a noise elimination device 1500 includes a microphone 2400, the DFT analysis unit 2402, and a noise elimination processing unit 1504. The DFT analysis unit 2402 corresponds to a frequency analysis unit described in the claims set forth below.

The microphone 2400 collects a mixed sound 2401 from the outside. The mixed sound 2401 includes an engine sound of a vehicle and wind noise.

Receiving the mixed sound 2401, the DFT analysis unit 2402 performs the Fourier transform processing on the mixed sound 2401 to obtain a frequency signal of the mixed sound 2401 for each of frequency bands.

It should be noted that, instead of the Fourier transform processing, the DFT analysis unit 2402 may perform the frequency conversion according to a different method of processing, such as the fast Fourier transform processing, the discrete cosine transform processing, or the wavelet transform processing.

The number of frequency bands included in the frequency signal obtained by the DFT analysis unit 2402 is represented as M and a number identifying a frequency band is represented as a symbol j (j=1 to M).

The noise elimination processing unit 1504 includes a phase modification unit 1501(j) (j=1 to M), a sound determination unit 1502(j) (j=1 to M), and a sound extraction unit 1503(j) (j=1 to M). That is to say, the phase modification unit, the sound determination unit, and the sound extraction unit are provided for each of the frequency bands. The phase modification unit 1501(j) (j=1 to M) corresponds to a phase modification unit described in the claims set forth below. The sound extraction unit 1503(j) (j=1 to M) corresponds to a sound signal recognition unit in the claims set forth below.

The phase modification unit 1501(j) (j=1 to M) includes an M number of phase modification units, and a j-th phase modification unit 1501(j) executes processing for a j-th frequency band. In the present specification, the same processing is performed for the other frequency bands by the corresponding units having reference numbers assigned as above.

Supposing that a phase of the frequency signal at a time t is represented as ψ (t) (radian), the phase modification unit 1501(j) (j=1 to M) makes a phase modification to the frequency signal of the frequency band j obtained by the DFT analysis unit 2402. To be more specific, the phase ψ (t) of the frequency signal at the time t is modified to ψ′ (t)=mod 2 π (ψ(t)−2 π f t) (where f is the analysis-target frequency).

The sound determination unit 1502(j) (j=1 to M) calculates a phase curve (an approximate curve) by approximating temporal phase fluctuations using a phase-modified signal at an analysis-target time in a predetermined period, and then calculates an error between the calculated phase curve and the phase at the analysis-target time. Here, a phase distance (i.e., the error between the phase curve and the phase at the analysis-target time) is calculated using ψ′ (t).

Then, finally, on the basis of the error (i.e., the phase distance) calculated by the sound determination unit 1502(j) (j=1 to M), the sound extraction unit 1503(j) (j=1 to M) extracts, as an extracted sound, a frequency signal whose error is equal to or smaller than a threshold.

These processes are performed while the predetermined period is being shifted in the direction of the time axis. Accordingly, a frequency signal 2408 of the extracted sound can be extracted for each time-frequency domain.

FIG. 8 is a block diagram showing a configuration of the sound determination unit 1502(j) (j=1 to M).

The sound determination unit 1502(j) (j=1 to M) includes a frequency signal selection unit 1600(j) (j=1 to M), a phase distance determination unit 1601(j) (j=1 to M), and a phase curve calculation unit 1602(j) (j=1 to M). The phase distance determination unit 1601(j) (j=1 to M) corresponds to an error calculation unit in the claims set forth below.

The frequency signal selection unit 1600(j) (j=1 to M) selects frequency signals which are to be used for calculating a phase curve and phase distances, from among the frequency signals, in the predetermined period, to which the phase modification unit 1501(j) (j=1 to M) has made phase modifications.

The phase curve calculation unit 1602(j) (j=1 to M) calculates, as a quadratic curve, a phase form which fluctuates over time, using the modified phase ψ′ (t) of the frequency signal selected by the frequency signal selection unit 1600(j) (j=1 to M). Following this, the phase distance determination unit 1601(j) (j=1 to M) determines a phase distance between the phase curve calculated by the phase curve calculation unit 1602(j) (j=1 to M) and the modified phase at the analysis-target time.
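The calculation of the phase curve and of the phase distance can be sketched as follows. This is a minimal sketch under stated assumptions: the quadratic curve is obtained by an ordinary least-squares fit, the time is expressed as a frame index, the phase at the analysis-target time is excluded from the fit, wrapping of the phase is ignored, and the threshold value is hypothetical. None of these choices is prescribed by the description above.

```python
def fit_quadratic(ts, ys):
    """Least-squares fit of y ~ a*t^2 + b*t + c; returns (a, b, c)."""
    s = [sum(t ** k for t in ts) for k in range(5)]            # s[k] = sum of t^k
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    # Augmented normal equations for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], r[2]],
        [s[3], s[2], s[1], r[1]],
        [s[2], s[1], s[0], r[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def phase_distance(ts, phases, t_target, phase_target):
    """Error between the fitted phase curve and the phase at the target time."""
    a, b, c = fit_quadratic(ts, phases)
    return abs(phase_target - (a * t_target ** 2 + b * t_target + c))


# Hypothetical modified phases following a quadratic curve (frame index as time).
ts = [0.0, 1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
phases = [0.01 * t * t + 0.2 for t in ts]
threshold = 0.5  # hypothetical threshold (radian)

on_curve = phase_distance(ts, phases, 5.0, 0.01 * 25.0 + 0.2)
off_curve = phase_distance(ts, phases, 5.0, 0.01 * 25.0 + 0.2 + 1.0)
is_extracted = [on_curve <= threshold, off_curve <= threshold]
```

In this example the phase lying on the curve gives a phase distance close to zero, whereas the phase displaced by 1 radian gives a phase distance of about 1 and exceeds the hypothetical threshold.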

It should be noted that essential components in the present invention are the DFT analysis unit 2402 and the sound extraction unit 1503(j) shown in FIG. 7 and the phase distance determination unit 1601(j) and the phase curve calculation unit 1602(j) shown in FIG. 8. In the case where the DFT analysis unit 2402 is capable of directly deriving the phase defined in the present invention as shown in (c) of FIG. 1, the phase modification unit 1501(j) is unnecessary. Moreover, note that the microphone 2400 is not an essential component in the present invention.

Next, an operation performed by the noise elimination device 1500 configured as described thus far is explained.

In the following, the j-th frequency band is described. The same processing is performed for the other frequency bands. Here, the explanation is given, as an example, about the case where a center frequency and an analysis-target frequency of the frequency band agree with each other.

The analysis-target frequency refers to a frequency f as in ψ′ (t)=mod 2 π (ψ(t)−2 π f t) used in calculating the phase distance. The noise elimination device 1500 determines whether or not a to-be-extracted sound exists in the frequency f.

As another method, the to-be-extracted sound may be determined using a plurality of frequencies included in the frequency band as the analysis-target frequencies. In such a case, whether or not the to-be-extracted sound exists in the frequencies around the center frequency can be determined.

FIG. 9 and FIG. 10 are flowcharts each showing an operational procedure executed by the noise elimination device 1500.

The microphone 2400 collects the mixed sound 2401 from the outside and then outputs the collected mixed sound 2401 to the DFT analysis unit 2402 (step S200).

Next, supposing that the phase of the frequency signal at the time t is represented as ψ (t) (radian), the phase modification unit 1501(j) (j=1 to M) makes a phase modification to the phase ψ (t) of the frequency signal obtained by the DFT analysis unit 2402 to convert the phase ψ (t) into the phase ψ′ (t)=mod 2 π (ψ(t)−2 π f t) (where f is the analysis-target frequency) for each frequency band j (step S1700(j)).

The following explains a reason why the phase is used in the present invention and also describes an example of a phase modification method.

FIG. 3 is a spectrogram obtained as a result of the analysis performed on the engine sound of the vehicle by the DFT analysis unit 2402. The vertical axis represents frequency whereas the horizontal axis represents time. The color density of the spectrogram represents the magnitude of power of a frequency signal. When the color is darker, the power of the frequency signal is greater. FIG. 3 shows data in which noise such as wind noise has been eliminated as much as possible and, therefore, the darker parts basically indicate the engine sound. The engine sound used in the analysis is represented by the data of the revolutions fluctuating over time. From the spectrogram, it can be seen that the frequency fluctuates over time.

FIG. 11 is a diagram explaining about power and phase in the DFT analysis. In FIG. 11, (a) shows a spectrogram obtained as a result of the analysis performed on the engine sound of the vehicle, as in FIG. 3.

In FIG. 11, (b) is a diagram showing a frequency signal 601 in a complex space using the Hanning window with a predetermined time window width measured from a time t1. A power and a phase are calculated for each of the frequencies such as frequencies f1, f2, and f3. A length of the frequency signal 601 indicates the power, and an angle which the frequency signal 601 forms with the real axis indicates the phase.

Then, the frequency signal is obtained for each of the times while the time shift is being executed as shown by t1, t2, t3, and so on in (a) of FIG. 11. In general, the spectrogram shows only the power of the frequency at each of the times and omits the phase. Thus, each of the spectrograms shown in FIG. 3 and (a) of FIG. 11 shows only the magnitude of power obtained as a result of the DFT analysis.

In FIG. 11, (c) shows temporal phase fluctuations of a predetermined frequency (a frequency f4, for example) shown in (a) in FIG. 11. The horizontal axis represents time. The vertical axis represents the phase of the frequency signal, and the phase is represented by a value from 0 to 2 π (radian).

In FIG. 11, (d) shows temporal power fluctuations of the predetermined frequency (the frequency f4, for example) shown in (a) in FIG. 11. The horizontal axis represents time whereas the vertical axis represents the magnitude (power) of the frequency signal.

Suppose that a real part of the frequency signal is represented as x (t) and that an imaginary part of the frequency signal is represented as y (t). In this case, the phase ψ (t) and the magnitude (power) P (t) are expressed as follows.

ψ(t)=mod 2π(arctan(y(t)/x(t))) (Equation 3)

P(t)=√(x(t)²+y(t)²) (Equation 4)

In the above equations, “t” represents a time corresponding to the frequency signal. Here, a vehicle engine sound in the presence of a noise such as a wind noise is explained with reference to FIG. 12. In FIG. 12, (a) shows a spectrogram obtained as a result of the DFT analysis performed on the engine sound of the vehicle, as in FIG. 3. The horizontal axis represents time whereas the vertical axis represents frequency. The color density of the spectrogram represents the magnitude of power of the frequency signal. Note that the spectrogram in FIG. 12 is different from the one shown in FIG. 3 in that a noise such as a wind noise is included in the spectrogram shown in FIG. 12. Therefore, there are darker parts in frequencies (at the times t1 and t2, for example) other than the frequency of the engine sound. This makes it difficult to determine, only from the power, whether the engine sound or the wind noise is present.

In FIG. 12, (b) is a graph showing temporal fluctuations in power of the frequency f4 including the engine sound at the time t2 in the predetermined period. As can be seen, the power is erratic due to the wind noise. In FIG. 12, (c) is a graph showing temporal fluctuations in power of the frequency f4 including no engine sound at the time t3 in the predetermined period. It can be seen that unsteady power is present. By a comparison between the graphs shown in (b) and (c) of FIG. 12, it is still difficult to determine, only from the power, whether the wind noise or the engine sound is present.

With this being the situation, the engine sound is extracted using the temporal phase fluctuations in the present invention. Firstly, phase characteristics of the engine sound are explained.

In an engine, a predetermined number of cylinders make piston motion to cause revolutions to a powertrain. The engine sound from the vehicle includes: a sound dependent on the engine revolutions; and a fixed vibration sound or an aperiodic sound which is independent of the engine revolutions. In particular, the sound mainly detected from the outside of the vehicle is the periodic sound dependent on the engine revolutions. In the present invention, this periodic sound dependent on the engine revolutions is extracted as the engine sound.

It can be seen from the dashed-line circles 501, 502, and 503 in FIG. 3 that, as the number of engine revolutions fluctuates, the frequency of the engine sound fluctuates. Here, attention is focused on the fluctuations in the frequency. As can be seen, the frequency seldom randomly fluctuates and is seldom discretely scattered. In a predetermined time period, the frequency fluctuates almost proportionately with the time. Thus, the engine sound can be approximated by the piecewise linear function represented by Equation 1 as above. To be more specific, the frequency f at the time t can be linearly approximated using the line segment which increases or decreases from the initial value f₀ in proportion to the time t (i.e., the proportionality coefficient A) in the predetermined time period.

When the frequency f is expressed by Equation 1 above, the phase ψ at the time t can be expressed by Equation 2 above.

Next, the phase modification process to ease the approximation performed on the temporal phase fluctuations is explained.

The phase modification is made to convert the phase ψ (t) of the frequency signal shown in (c) of FIG. 11 into the phase ψ′ (t)=mod 2 π (ψ(t)−2 π f t) (where f is the analysis-target frequency).

Firstly, the phase modification unit 1501(j) determines a reference time. Here, (a) of FIG. 13 shows the same temporal phase fluctuations as in (c) of FIG. 11. In the example shown in (a) of FIG. 13, a time t0 indicated by a filled circle is determined as the reference time.

Next, the phase modification unit 1501(j) determines a plurality of times of the frequency signals to which phase modifications are to be made. In this example, five times (t1, t2, t3, t4, and t5) indicated by open circles in (a) of FIG. 13 are determined as the times of the frequency signals to which the phase modifications are to be made.

Here, note that the phase of the frequency signal at the reference time t0 is expressed as follows.

ψ(t₀)=mod 2π(arctan(y(t₀)/x(t₀))) (Equation 5)

Also note that the phases of the to-be-modified frequency signals at the five times are expressed as follows.

ψ(t_i)=mod 2π(arctan(y(t_i)/x(t_i))) (i=1,2,3,4,5) (Equation 6)

Each of the phases before the modifications is indicated by X in (a) of FIG. 13. Also, the magnitudes of the frequency signals at these times can be expressed as follows.

P(t_i)=√(x(t_i)²+y(t_i)²) (i=1,2,3,4,5) (Equation 7)

FIG. 14 shows a method of modifying the phase of the frequency signal at the time t2. The details in (a) of FIG. 14 are identical to those in (a) of FIG. 13. In (b) of FIG. 14, the phase cyclically fluctuating from 0 to 2 π (radian) at a constant angular velocity in a cycle of 1/f (where f is the analysis-target frequency) is drawn by a solid line. The modified phase is expressed as follows.

ψ′(t_i) (i=0,1,2,3,4,5)

In (b) of FIG. 14, as compared with the phase at the reference time t0, the phase at the time t2 is larger than the phase at the time t0 by Δψ which is expressed as follows.

Δψ=2πf(t₂−t₀) (Equation 8)

Thus, in order to modify this phase difference caused by a time difference between the phases at the times t0 and t2 in (a) of FIG. 14, a phase ψ′ (t2) is calculated by subtracting Δψ from the phase ψ (t2) at the time t2. This obtained phase is the modified phase at the time t2. Here, since the phase at the time t0 is the phase at the reference time, the value of the present phase remains the same after the phase modification. To be more specific, the phase to be obtained after the phase modification is calculated by the following equations.

ψ′(t₀)=ψ(t₀) (Equation 9)

ψ′(t_i)=mod 2π(ψ(t_i)−2πf(t_i−t₀)) (i=1,2,3,4,5) (Equation 10)

The phases of the frequency signals obtained as a result of the phase modifications are indicated by X in (b) in FIG. 13. The representations in (b) of FIG. 13 are the same as those in (a) in FIG. 13 and, therefore, the explanation is not repeated.
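The phase modification of Equations 9 and 10 can be illustrated with a minimal Python sketch. The function name `modify_phases` and the sample values (a 100 Hz tone observed every 3 ms with an initial phase of 1.0 radian) are assumptions chosen only for illustration and do not appear in the embodiment.

```python
import math

TWO_PI = 2.0 * math.pi


def modify_phases(phases, times, f, t0):
    """Apply Equations 9 and 10 to a list of phases (radians).

    Each phase has the rotation 2*pi*f*(t_i - t0), which is caused only by
    the time difference from the reference time t0, subtracted from it and
    is then wrapped into [0, 2*pi). For t_i == t0 the phase is unchanged.
    """
    return [(psi - TWO_PI * f * (t - t0)) % TWO_PI
            for psi, t in zip(phases, times)]


# Hypothetical input: a pure tone at the analysis-target frequency f.
f = 100.0                                   # analysis-target frequency (Hz)
times = [0.003 * i for i in range(6)]       # t0 to t5 in seconds
phases = [(TWO_PI * f * t + 1.0) % TWO_PI for t in times]

modified = modify_phases(phases, times, f, times[0])
print(modified)
```

For a sound whose frequency equals the analysis-target frequency, the modified phases stay at a constant value, which is why a steady periodic sound appears as a flat line after the modification.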

Returning to FIG. 9, the sound determination unit 1502(j) calculates a form of the phase using the phase information obtained by the phase modification unit 1501(j) as a result of the modifications. Then, the sound determination unit 1502(j) calculates the phase distances (i.e., errors) between the frequency signal at the analysis-target time and the frequency signals at a plurality of times other than the analysis-target time (step S1701(j)).

FIG. 10 is a flowchart showing an operational procedure performed in the process (step S1701(j)) of determining the frequency signal of the extracted sound.

Firstly, the frequency signal selection unit 1600(j) selects the frequency signals which are to be used by the phase curve calculation unit 1602(j) for calculating the phase curve, from among the frequency signals, in the predetermined period, to which the phase modification unit 1501(j) has made the phase modifications (step S1800(j)). In this example, the analysis-target time is t0, and the phase curve is calculated from the phases of the frequency signals at the times t1 to t5 with respect to the phase at the time t0. Here, the number of frequency signals (six signals in total at the times t0 to t5) used for calculating the phase curve is equal to or greater than a predetermined value. This is because it would be difficult to determine the regularity of the temporal phase fluctuations when the number of frequency signals selected for the phase curve calculation is small. The time length of the predetermined period may be determined on the basis of characteristics of the temporal phase fluctuations of the extracted sound.

Next, the phase curve calculation unit 1602(j) calculates the phase curve (step S1801(j)). Note that the phase curve is calculated via approximation according to, for example, a quadratic polynomial expressed by Equation 11 as follows.

Ψ(t)=A₂t²+A₁t+A₀ (Equation 11)

FIG. 15 is a diagram explaining a process of calculating the phase curve. As shown in FIG. 15, a quadratic curve can be calculated from the predetermined number of points. In the present invention, the quadratic curve is calculated as a multiple regression curve. To be more specific, when the modified phase at a time t_i (where i=0, 1, 2, 3, 4, and 5) is represented as ψ′ (t_i), coefficients A₂, A₁, and A₀ of the quadratic curve Ψ (t) are represented as follows.

$\begin{matrix} A_{2} = \frac{S_{(t \times t, ψ)} \times S_{(t, t)} - S_{(t, ψ)} \times S_{(t, t \times t)}}{S_{(t, t)} \times S_{(t \times t, t \times t)} - S_{(t, t \times t)} \times S_{(t, t \times t)}} & (Equation 12) \\ A_{1} = \frac{S_{(t, ψ)} \times S_{(t \times t, t \times t)} - S_{(t \times t, ψ)} \times S_{(t, t \times t)}}{S_{(t, t)} \times S_{(t \times t, t \times t)} - S_{(t, t \times t)} \times S_{(t, t \times t)}} & (Equation 13) \\ A_{0} = \frac{\sum ψ_{i}^{'}}{n} - A_{1} \times \frac{\sum t_{i}}{n} - A_{2} \times \frac{\sum {(t_{i})}^{2}}{n} & (Equation 14) \end{matrix}$

Moreover, coefficients in the above equations are expressed as follows.

$\begin{matrix} S_{(t, t)} = \sum (t_{i} \times t_{i}) - \frac{\sum t_{i} \times \sum t_{i}}{n} & (Equation 15) \\ S_{(t, ψ)} = \sum (t_{i} \times ψ^{'} (t_{i})) - \frac{\sum t_{i} \times \sum ψ^{'} (t_{i})}{n} & (Equation 16) \\ S_{(t, t \times t)} = \sum (t_{i} \times t_{i} \times t_{i}) - \frac{\sum t_{i} \times \sum (t_{i} \times t_{i})}{n} & (Equation 17) \\ S_{(t \times t, ψ)} = \sum (t_{i} \times t_{i} \times ψ^{'} (t_{i})) - \frac{\sum (t_{i} \times t_{i}) \times \sum ψ^{'} (t_{i})}{n} & (Equation 18) \\ S_{(t \times t, t \times t)} = \sum (t_{i} \times t_{i} \times t_{i} \times t_{i}) - \frac{\sum (t_{i} \times t_{i}) \times \sum (t_{i} \times t_{i})}{n} & (Equation 19) \end{matrix}$
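The regression of Equations 12 to 19 can be sketched in plain Python as follows. The function name `fit_phase_curve` and the sample points are hypothetical; the sketch also assumes that the modified phases have already been made continuous (no jump of 2 π inside the period) and that at least three distinct times are supplied, since otherwise the denominator becomes zero.

```python
def fit_phase_curve(times, phases):
    """Return (A2, A1, A0) of the quadratic curve of Equation 11.

    times  : list of t_i
    phases : list of modified phases psi'(t_i), same length as times
    """
    n = len(times)
    sum_t = sum(times)
    sum_tt = sum(t * t for t in times)
    sum_p = sum(phases)

    # Equations 15 to 19
    s_t_t = sum_tt - sum_t * sum_t / n
    s_t_p = sum(t * p for t, p in zip(times, phases)) - sum_t * sum_p / n
    s_t_tt = sum(t ** 3 for t in times) - sum_t * sum_tt / n
    s_tt_p = sum(t * t * p for t, p in zip(times, phases)) - sum_tt * sum_p / n
    s_tt_tt = sum(t ** 4 for t in times) - sum_tt * sum_tt / n

    # Equations 12 to 14 (common denominator of A2 and A1)
    denom = s_t_t * s_tt_tt - s_t_tt * s_t_tt
    a2 = (s_tt_p * s_t_t - s_t_p * s_t_tt) / denom
    a1 = (s_t_p * s_tt_tt - s_tt_p * s_t_tt) / denom
    a0 = sum_p / n - a1 * sum_t / n - a2 * sum_tt / n
    return a2, a1, a0


# Hypothetical check: points taken from a known quadratic are recovered.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
phases = [0.02 * t * t - 0.3 * t + 1.0 for t in times]
a2, a1, a0 = fit_phase_curve(times, phases)
print(a2, a1, a0)
```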

Returning to FIG. 10, the phase distance determination unit 1601(j) calculates the phase distance between the form calculated by the phase curve calculation unit 1602(j) and the modified phase at the analysis-target time (step S1802(j)). In the present example, a phase distance (i.e., an error) E₀ is a difference error between the phases, and is calculated as follows.

E₀=|Ψ(t₀)−ψ′(t₀)| (Equation 20)

It should be noted that the analysis-target point may be excluded in calculating the form of the phase, and that a phase difference between the calculated form and the analysis-target point may be calculated. With this method, when a noise shifted significantly from the calculated form is included in the analysis-target point, the form can be approximated more accurately.

It should be noted that, in the present example, the phase form is calculated from the phases at the times t1 to t5 with respect to the phase at the analysis-target time t0. For example, when the time t2 is an analysis target time (in other words, the time t2 is set as a time t0′), a phase curve may be newly calculated from phases at times t1′, t2′, t3′, t4′, and t5′ to calculate an error. Alternatively, the phase curve which has been already calculated from the phases at the times t0 to t5 may be used for calculating the error. To be more specific, the error calculated using the already-calculated phase curve is expressed as follows.

E_i=|Ψ(t_i)−ψ′(t_i)| (Equation 21)

With this method, the number of times to calculate the phase curve is reduced, so that the amount of calculation can be accordingly reduced. Moreover, a predetermined period may be set as an analysis target, and it may be determined, on the basis of an average of errors, whether all of the frequency signals included in the analysis-target period have errors. For example, the average of the errors may be expressed as follows.

$\begin{matrix} E = \frac{1}{n} \sum_{k = 1}^{n} \left| Ψ (t_{k}) - ψ^{'} (t_{k}) \right| & (Equation 22) \end{matrix}$

It should be noted that the analysis-target period may be variable depending on circumstances. For example, the analysis-target period may be set shorter around an intersection where vehicles are likely to suddenly accelerate or decelerate, and may be longer where acceleration or deceleration is relatively unlikely to happen.

Returning to FIG. 9, the sound extraction unit 1503(j) extracts, as the extracted sound, each of the analysis-target frequency signals each having a phase distance (i.e., an error) equal to or smaller than the threshold (step S1702(j)).

FIG. 16 is a diagram schematically showing the modified phase ψ′ (t) of the frequency signal of the mixed sound in a predetermined period (96 ms) for which the phase distance is calculated. The horizontal axis represents the time t whereas the vertical axis represents the modified phase ψ′ (t). A filled circle indicates the phase of the analysis-target frequency signal. Open circles indicate the phases of the frequency signals used for calculating the phase curve. A thick dashed line 1101 is the calculated phase curve. It can be seen that a quadratic curve is calculated, as the phase curve, from the phase-modified points. Each thin dashed line 1102 indicates an error threshold (20 degrees, for example). More specifically, the upper dashed line 1102 is shifted upward from the dashed line 1101 by the threshold degrees whereas the lower dashed line 1102 is shifted downward from the dashed line 1101 by the threshold degrees. When the phase of the analysis-target frequency signal is present between the two dashed lines 1102, the present frequency signal is determined to be a frequency signal of the to-be-extracted sound (i.e., the periodic sound). When the phase of the analysis-target frequency signal is not present between the two dashed lines 1102, the present frequency signal is determined to be a frequency signal of the noise.

In (a) of FIG. 16, an error between the phase of the analysis-target frequency signal indicated by the filled circle and the quadratic curve of the phase is smaller than the threshold. Thus, the sound extraction unit 1503(j) extracts this frequency signal as the frequency signal of the to-be-extracted sound. In (b) of FIG. 16, each error between the phases of the analysis-target frequency signals indicated by the filled circles and the quadratic curve of the phase is greater than the threshold. Thus, instead of extracting these signals as the frequency signals of the to-be-extracted sound, the sound extraction unit 1503(j) eliminates these frequency signals as noises.
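The error calculation of Equations 20 to 22 and the threshold decision of step S1702(j) can be sketched as follows. The function names, the curve coefficients, and the sample phases are hypothetical values used only for illustration; the 20-degree threshold is the example value given above.

```python
import math


def curve_value(coeffs, t):
    """Evaluate the phase curve of Equation 11 for coeffs = (A2, A1, A0)."""
    a2, a1, a0 = coeffs
    return a2 * t * t + a1 * t + a0


def phase_errors(coeffs, times, phases):
    """Equation 21: absolute error between the curve and each modified phase."""
    return [abs(curve_value(coeffs, t) - p) for t, p in zip(times, phases)]


def is_extracted(error, threshold_deg=20.0):
    """A signal is kept when its error is equal to or smaller than the threshold."""
    return error <= math.radians(threshold_deg)


# Hypothetical example: a flat phase curve at 1.0 rad and three modified phases.
coeffs = (0.0, 0.0, 1.0)
times = [0.0, 1.0, 2.0]
phases = [1.0, 1.2, 3.0]

errors = phase_errors(coeffs, times, phases)
mean_error = sum(errors) / len(errors)      # Equation 22
flags = [is_extracted(e) for e in errors]
print(errors, mean_error, flags)
```

In this example the first two signals lie within 20 degrees (about 0.349 radian) of the curve and would be extracted, whereas the third would be eliminated as noise.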

FIG. 17 is a diagram explaining a process of extracting the engine sound according to the method described in the present embodiment. When the engine sound is approximated by the piecewise linear function as expressed by Equation 1, the phase can be approximated by the quadratic curve as expressed by Equation 11.

In FIG. 17, (a) shows the same spectrogram that is shown in FIG. 5. In FIG. 17, (b) to (e) are graphs respectively showing frequency signals included in four areas indicated by squares in (a) of FIG. 17. Each of the areas has one frequency band. In each of the graphs shown in (b) to (e) of FIG. 17, the horizontal axis represents time whereas the vertical axis represents phase. Also, in each of the graphs, open circles indicate the frequency signals which have been actually analyzed and a thick dashed line indicates the calculated approximate curve. Moreover, each thin dashed line indicates a threshold between a to-be-extracted sound and a noise.

In (b) of FIG. 17, the number of engine revolutions is decreasing. This graph shows the modified phase of the engine sound part which can be approximated by a linear expression representing the temporal frequency fluctuations as a negative slope in the time-frequency domain. As can be seen from this graph, the phase curve is convex upward. Also, almost all the analyzed frequency signals are present between the thin dashed lines each indicating the threshold.

In (c) of FIG. 17, the number of engine revolutions is increasing. This graph shows the modified phase of the engine sound part which can be approximated by a linear expression representing the temporal frequency fluctuations as a positive slope in the time-frequency domain. As can be seen from this graph, the phase curve is convex downward. Also, almost all the analyzed frequency signals are present between the thin dashed lines each indicating the threshold.

In (d) of FIG. 17, the number of engine revolutions is constant. This graph shows the modified phase of the engine sound part whose frequency does not fluctuate in the time-frequency domain, so that the quadratic coefficient is zero. A second-order term of the phase curve is 0 and, as can be seen, the graph is a straight line. Also, almost all the analyzed frequency signals are present between the thin dashed lines each indicating the threshold. From this graph, the engine sound including a sound part whose frequency does not fluctuate can be recognized using a quadratic curve.

In (e) of FIG. 17, the graph shows the modified phase of the wind noise part. The phase of the frequency signal of the wind noise is erratic. For this reason, even when an approximate quadratic curve is calculated, an error between the phase and the curve is significant. Thus, as can be seen, only a few signals are present between the thin dashed lines each indicating the threshold.

As described thus far, the wind noise and the engine sound can be discriminated on the basis of the calculated curve and the error with respect to the curve.
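The convexity observed in (b) to (d) of FIG. 17 can be traced back to the linear frequency model. The following is only a sketch under stated assumptions: the frequency in the predetermined period is taken as f(t)=At+f₀ as described for Equation 1, the phase is taken as 2 π times the time integral of the frequency, ψ₀ is a symbol introduced here for the phase at t=0, and the mod 2 π operation is omitted for readability.

$\begin{matrix} ψ(t) = 2π \int_{0}^{t} (Aτ + f_{0}) dτ + ψ_{0} = πAt^{2} + 2πf_{0}t + ψ_{0} \\ ψ^{'}(t) = ψ(t) - 2πft = πAt^{2} + 2π(f_{0} - f)t + ψ_{0} \end{matrix}$

Under these assumptions, the second-order coefficient A₂ of Equation 11 corresponds to π A. It is negative when the number of engine revolutions is decreasing (a curve convex upward), positive when the number is increasing (a curve convex downward), and zero when the number is constant (a straight line).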

FIG. 18 is a diagram explaining an error with respect to the phase curve. The horizontal axis represents sound signals of an engine sound, a rain sound, and a wind noise. The vertical axis represents an average and distribution of errors with respect to the phase curve calculated according to the present method. To be more specific, a width of a line segment shown in the vertical axis indicates a range of allowable errors, and a rhombus indicates the average. In the case of the engine sound, for example, the range of allowable errors is from 1 degree to 18 degrees and the average of errors is 10 degrees.

Analysis conditions are that: frequency analyses are performed at 256 points (32 ms) of each of the sounds sampled at 8 kHz; and a phase curve calculation is performed using 768 points as a period (96 ms). Then, the average and distribution of the errors with respect to the phase curve are calculated. As shown in FIG. 18, the error average value of the engine sound with respect to the phase curve is 10 degrees which is small while the error average values of the rain sound and wind noise are 68 degrees and 48 degrees, respectively, which are large. It can be understood that there is a significant difference in the error with respect to the phase curve between the periodic sound such as an engine sound and the aperiodic sound such as a wind noise. In the present embodiment, the threshold is set at, for example, 20 degrees so that a sound having an error equal to or smaller than the threshold is appropriately extracted as an engine sound.

FIG. 19 is a diagram explaining sound recognition. In each of graphs shown in FIG. 19, the horizontal axis represents time whereas the vertical axis represents frequency. In FIG. 19, (a) shows a spectrogram obtained as a result of frequency analysis performed on a sound including both a wind noise and an engine sound. The color density of the spectrogram represents the magnitude of power. When the color is darker, the power is greater. Analysis conditions are that: frequency analyses are performed at 512 points of the sound sampled at 8 kHz; and a phase curve calculation is performed using 1536 points as a period. The threshold of an error with respect to the phase curve is set at 20 degrees, and then the engine sound is extracted.

In FIG. 19, (b) shows a graph in which the wind noise and the engine sound are recognized according to the method described in the present embodiment. The darker parts indicate the extracted engine sound. The graph shown in (a) of FIG. 19 includes noises such as a wind noise. Thus, it is difficult to extract, from this graph, the engine sound. However, according to the method in the present embodiment, it can be seen that the engine sound is appropriately extracted. In particular, the present method can extract sound parts where the number of engine revolutions suddenly increases and decreases, as well as a steady sound.

Note that the phase modification unit 1501(j) may further perform the following process during the phase modification. When the following phase modification process is further performed, processes including calculating a phase curve and calculating errors with respect to the phase curve are also performed. Thus, the phase modification unit 1501(j) performs the following process, referring to as necessary the calculation results given by the sound determination unit 1502(j).

FIG. 20 is a diagram explaining the phase modification process which is further performed. Each of graphs shown in FIG. 20 is obtained as a result of the frequency analysis performed on a part of the engine sound. In each of the graphs, the horizontal axis represents time whereas the vertical axis represents phase. In the graphs, open circles indicate the frequency signals obtained as a result of the phase modifications performed by the phase modification unit 1501(j).

In (a) of FIG. 20, when a phase curve is calculated using the phases of the frequency signals indicated by the open circles, a curve indicated by a thick dashed line is obtained as a result. Each of thin dashed lines indicates an error threshold. It can be seen that errors between the calculated phase curve and the frequency signals are significant and that many points are significantly shifted from the threshold. In particular, the phases of the frequency signals at the times t6 to t9 are significantly shifted from the phases at the other times. This is because the phases lie on a torus, cyclically from 0 to 2 π. Thus, the phase curve may be calculated, with consideration given to this torus state. With this, the phase significantly shifted from the phases at the other times can be modified, so that the sound recognition can be performed with accuracy.

For example, the phase may be modified using an N number of phases which are present before, after, or before and after the present phase. Suppose, as an example, that an average of the phases at the times t1 to t5 (N=5) shown in (b) of FIG. 20 is calculated, and that the average phase is calculated as ψ=2 π*10/360. Also suppose that the phase at the time t6 is ψ (6)=2 π*170/360. Here, since the phases lie on a torus as mentioned above, the phase at the time t6 may possibly be ψ (6)=(2 π*170/360)±2 π. Although there is, in fact, a possibility that “±2 π” may be “±2 π*m” (where m represents a natural number), the present example considers only the case where m=1. When the frequency fluctuates significantly, so does the phase. On account of this, the value of m may be variable depending on a sound which is to be analyzed. The times selected for calculating the average of the phases are not limited to the times t1 to t5, and any times may be selected.

Next, the phase ψ (6) at the time t6 is modified to a value such that an error between the phase at the time t6 and the average phase ψ becomes smaller. In the case shown in (b) of FIG. 20, ψ (6)=(2 π*170/360)−2 π. Similarly, the phase at the time t7 is modified using the phases at the times t2 to t5 and the modified phase at the time t6. In the present example, the phase at the time t7 is modified into ψ (7)=ψ (7)−2 π. In this way, the same process is performed on the phases at the times t8, t9, and so on.

In FIG. 20, (c) shows the modified phases. As shown, the phases at the times t6 to t9 have been modified. When the phase curve is calculated using the phase information obtained as a result of the modifications, the curve indicated by a thick dashed line is obtained. In the case shown in (c) of FIG. 20, since all the frequency signals are present between the curve and the threshold, the sound is appropriately extracted as the engine sound.
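The modification using the N preceding phases can be sketched as follows. This is a minimal illustration, assuming that only the case m=1 is considered and that the window always consists of the N most recent, already-modified phases; the function name `unwrap_with_running_mean` and the sample phases are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def unwrap_with_running_mean(phases, n=5):
    """Shift each phase by 0, +2*pi, or -2*pi toward the mean of the
    n preceding (already modified) phases. The first n phases are kept."""
    out = list(phases[:n])
    for psi in phases[n:]:
        mean = sum(out[-n:]) / n
        candidates = (psi, psi + TWO_PI, psi - TWO_PI)
        # Keep the candidate with the smallest error to the running mean.
        out.append(min(candidates, key=lambda c: abs(c - mean)))
    return out


# Hypothetical phases (radians): the last two have wrapped around 2*pi.
phases = [0.1, 0.2, 0.1, 0.2, 0.1, 6.2, 6.1]
modified = unwrap_with_running_mean(phases, n=5)
print(modified)
```

Because each modified value is fed back into the window, a run of wrapped phases such as those at the times t6 to t9 is corrected one after another.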

It should be noted that the phase modification method is not limited to the method described thus far. For example, the phase curve may be firstly calculated, and then the phase modification using ±2 π may be performed on each point at which an error with respect to the curve is significant. Alternatively, the range of possible angles for the phase may be modified. The explanation is presented as follows, with reference to the drawing.

FIG. 21 is a diagram explaining a phase modification process. In each of graphs shown in FIG. 21, the vertical axis represents phase whereas the horizontal axis represents time. In the graphs, open circles indicate the phases of the frequency signals at the corresponding times. In FIG. 21, (a) shows the phases of the frequency signals in the case where the angular range is from 0 to 2 π. A phase curve has been calculated from the phases, and is indicated by a solid line. In (c) of FIG. 21, the phases are modified on the basis of errors between the curve and the present phases. To be more specific, a phase modification is performed by adding +2 π to the phase at the time t1. Moreover, a phase modification is performed by adding −2 π to the phase at the time t8.

In FIG. 21, (b) shows the phases of the frequency signals in the case where the angular range is from −π to π. As in the case shown in (a) of FIG. 21, a phase curve has been calculated from the phases, and is indicated by a solid line. In (d) of FIG. 21, the phase is modified on the basis of an error between the curve and the present phase. To be more specific, a phase modification is performed by adding −2 π to the phase at the time t10. When the errors are compared between the angular ranges shown in (c) and (d) of FIG. 21, the error in the case of the angular range shown in (c) is smaller. Hence, the phase curve based on the angular range shown in (c) is used. In this way, the angular range may be controlled to calculate the phase curve. As a result, a phase which is significantly shifted from the phases at the other times can be modified, so that the sound recognition can be performed with a higher degree of accuracy.
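The comparison between angular ranges can be sketched as follows. This is a simplified illustration only: it wraps the phases into each candidate range, fits a quadratic curve by least squares, and keeps the range with the smaller mean error, whereas the ±2 π modification of individual points shown in (c) and (d) of FIG. 21 is omitted. All function names and sample values are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap(phase, low):
    """Map a phase into the half-open range [low, low + 2*pi)."""
    return low + (phase - low) % TWO_PI


def fit_quadratic(ts, ps):
    """Least-squares quadratic fit; returns (A2, A1, A0)."""
    n = len(ts)
    st, stt, sp = sum(ts), sum(t * t for t in ts), sum(ps)
    s11 = stt - st * st / n
    s12 = sum(t ** 3 for t in ts) - st * stt / n
    s22 = sum(t ** 4 for t in ts) - stt * stt / n
    s1p = sum(t * p for t, p in zip(ts, ps)) - st * sp / n
    s2p = sum(t * t * p for t, p in zip(ts, ps)) - stt * sp / n
    det = s11 * s22 - s12 * s12
    a2 = (s2p * s11 - s1p * s12) / det
    a1 = (s1p * s22 - s2p * s12) / det
    return a2, a1, sp / n - a1 * st / n - a2 * stt / n


def mean_abs_error(ts, ps):
    """Mean absolute error between the fitted curve and the phases."""
    a2, a1, a0 = fit_quadratic(ts, ps)
    return sum(abs(a2 * t * t + a1 * t + a0 - p) for t, p in zip(ts, ps)) / len(ts)


def choose_range(ts, ps, lows=(0.0, -math.pi)):
    """Return (error, low) for the angular range giving the smaller error."""
    return min((mean_abs_error(ts, [wrap(p, low) for p in ps]), low) for low in lows)


# Hypothetical phases observed in [0, 2*pi) that cross zero during the period.
ts = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ps = [6.08, 6.18, 0.0, 0.1, 0.2, 0.3]
best_error, best_low = choose_range(ts, ps)
print(best_error, best_low)
```

For these sample values the range starting at −π gives the smaller error, because the phases no longer jump across the boundary of the range.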

As described thus far, the present embodiment can discriminate between the periodic sound, such as engine sound or voice, and the aperiodic sound, such as wind noise, rain sound, or background noise, for each time-frequency domain, so as to determine a frequency signal of the periodic or aperiodic sound. The present embodiment can accurately recognize especially the periodic sound, such as the engine sound, which fluctuates in frequency over time in the time-frequency domain. In particular, an application for detecting a vehicle present in a blind spot can accurately detect an accelerating vehicle which may cause a serious accident with a high probability.

Second Embodiment

The following is a description of a vehicle detection device in the second embodiment. The vehicle detection device in the second embodiment determines a frequency signal of an engine sound (i.e., a to-be-extracted sound) from each of mixed sounds received by a plurality of microphones, calculates an arrival direction of an approaching vehicle from a sound arrival time difference, and informs a driver about the direction and presence of the approaching vehicle.

FIG. 22 and FIG. 23 are diagrams each showing a configuration of the vehicle detection device in the second embodiment according to the present invention.

In FIG. 22, a vehicle detection device 4100 includes a microphone 4107(1), a microphone 4107(2), a DFT analysis unit 1100, a vehicle detection processing unit 4101, and a direction detection unit 4108.

The vehicle detection processing unit 4101 includes a phase modification unit 4102(j) (j=1 to M), a sound determination unit 4103(j) (j=1 to M), and a sound extraction unit 4104(j) (j=1 to M).

In FIG. 23, the sound determination unit 4103(j) (j=1 to M) includes a phase distance determination unit 4200(j) (j=1 to M), a phase curve calculation unit 4201(j) (j=1 to M), and a frequency signal selection unit 4202(j) (j=1 to M).

The microphone 4107(1) shown in FIG. 22 receives a mixed sound 2401(1) from the outside. The microphone 4107(2) shown in FIG. 22 receives a mixed sound 2401(2) from the outside. In the present example, the microphone 4107(1) and the microphone 4107(2) are set on left and right front bumpers, respectively. Each of the mixed sounds includes an engine sound of a vehicle and a wind noise sampled at, for example, 8 kHz. It should be noted that a sampling frequency is not limited to 8 kHz.

The DFT analysis unit 1100 performs the discrete Fourier transform processing on the mixed sound 2401(1) and the mixed sound 2401(2) to obtain the respective frequency signals of the mixed sound 2401(1) and the mixed sound 2401(2). In this example, the time window width for the DFT is 256 points (32 ms). Hereinafter, the number of frequency bands obtained by the DFT analysis unit 1100 is represented as M and a number specifying a frequency band is represented as a symbol j (j=1 to M). In this example, a frequency band from 10 Hz to 500 Hz where an engine sound of a vehicle exists is divided into 10-Hz bands (M=50) to obtain the frequency signal.
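The band analysis described above can be illustrated with a minimal sketch. It assumes that the frequency signal of each band is obtained by evaluating a single-frequency DFT at 10 Hz, 20 Hz, ..., 500 Hz over one 256-point window with no window function; how the DFT analysis unit 1100 actually maps the window onto the 10-Hz bands is not specified here, and the function name band_frequency_signals is hypothetical.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000   # example sampling frequency from the text
WINDOW_POINTS = 256     # example DFT time window width from the text
BAND_FREQS_HZ = [10 * j for j in range(1, 51)]  # 10 Hz to 500 Hz, M = 50


def band_frequency_signals(samples, start):
    """Return one complex frequency signal per band for the window at `start`.

    Assumption: each band is analyzed by a direct single-frequency DFT at the
    band frequency, with time measured from the start of the window.
    """
    window = samples[start:start + WINDOW_POINTS]
    if len(window) < WINDOW_POINTS:
        raise ValueError("not enough samples for one window")
    signals = []
    for f in BAND_FREQS_HZ:
        acc = 0j
        for n, x in enumerate(window):
            acc += x * cmath.exp(-2j * math.pi * f * n / SAMPLE_RATE_HZ)
        signals.append(acc)
    return signals
```

For each band j, the phase ψ (t) used in the subsequent processing would then be cmath.phase() of the corresponding complex value.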

Supposing that a phase of a frequency signal at a time t is ψ (t) (radian), the phase modification unit 4102(j) (j=1 to M) modifies the phase ψ (t) of the frequency signal of the frequency band j (j=1 to M) obtained by the DFT analysis unit 1100 to a phase ψ″ (t)=mod 2 π (ψ (t)−2 π f′ t) (where f′ is a frequency of the frequency band). In the present example, the phase ψ (t) is modified using the frequency f′ of the frequency band where the frequency signal is obtained, instead of using the analysis-target frequency.
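As an illustration of this phase modification, the following minimal sketch applies ψ″ (t)=mod 2 π (ψ (t)−2 π f′ t) to a single phase value. The function name modify_phase and the choice of the range [0, 2π) for the result are assumptions made for the example.

```python
import math


def modify_phase(psi, t, f_band):
    """Return psi''(t) = mod 2*pi (psi(t) - 2*pi*f'*t), in the range [0, 2*pi).

    psi    : phase of the frequency signal at time t (radian)
    t      : time (s)
    f_band : frequency f' of the frequency band (Hz)
    """
    return (psi - 2.0 * math.pi * f_band * t) % (2.0 * math.pi)
```

For a tone whose frequency is exactly f′, the phase ψ (t) advances by 2 π f′ t, so the modified phase stays constant over time. A tone whose frequency drifts away from f′ yields a modified phase that changes over time, which is the form that the phase curve is later fitted to.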

The sound determination unit 4103(j) (j=1 to M) calculates the phase curve from the phase-modified frequency signal at an analysis-target time in a predetermined period, and then determines a to-be-extracted sound on the basis of the calculated phase curve. Here, the number of frequency signals used for calculating a phase distance is equal to or greater than a first threshold. In the present example, the predetermined period is 96 ms. Also, the phase distance is calculated using ψ″ (t). The sound determination unit 4103(j) (j=1 to M) performs the same processing as the processing performed by the sound determination unit 1502(j) (j=1 to M) in the first embodiment. Therefore, the detailed description is not repeated here.

FIG. 23 is a block diagram showing a configuration of the sound determination unit 4103(j) (j=1 to M).

The sound determination unit 4103(j) (j=1 to M) includes a phase distance determination unit 4200(j) (j=1 to M), a phase curve calculation unit 4201(j) (j=1 to M), and a frequency signal selection unit 4202(j) (j=1 to M).

The frequency signal selection unit 4202(j) (j=1 to M) selects frequency signals which are to be used for calculating a phase curve and phase distances, from among the frequency signals, in the predetermined period, to which the phase modification unit 4102(j) (j=1 to M) has made phase modifications. The frequency signal selection unit 4202(j) (j=1 to M) performs the same processing as the processing performed by the frequency signal selection unit 1600(j) (j=1 to M) in the first embodiment. Therefore, the detailed description is not repeated here.

The phase curve calculation unit 4201(j) (j=1 to M) calculates, as a curve, a phase form which fluctuates over time, using the modified phase ψ″ (t) of the frequency signal. The phase curve calculation unit 4201(j) (j=1 to M) performs the same processing as the processing performed by the phase curve calculation unit 1602(j) (j=1 to M) in the first embodiment. Therefore, the detailed description is not repeated here.

The phase distance determination unit 4200(j) (j=1 to M) determines whether a phase distance with respect to the phase curve calculated by the phase curve calculation unit 4201(j) (j=1 to M) is equal to or smaller than a second threshold. To be more specific, the phase curve calculation is performed using 768 points as a period (96 ms), and the phase distance is calculated. The phase distance determination unit 4200(j) (j=1 to M) employs the same methods for calculating the phase curve and phase distance as those employed by the phase distance determination unit 1601(j) (j=1 to M) in the first embodiment. Therefore, the detailed description is not repeated here.

Next, the sound extraction unit 4104(j) (j=1 to M) extracts the engine sound on the basis of the phase distance determined by the sound determination unit 4103(j) (j=1 to M). To be more specific, the threshold of error is set at 20 degrees, and then a sound having an error equal to or smaller than the threshold is extracted as the engine sound. The sound extraction unit 4104(j) (j=1 to M) performs the same processing as the sound extraction unit 1503(j) (j=1 to M) in the first embodiment. Therefore, the detailed description is not repeated here. It should be noted that, when the engine sound is extracted, the sound extraction unit 4104(j) (j=1 to M) also outputs a sound detection flag 4105.
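The determination and extraction steps can be sketched as follows. The details of the first embodiment are not repeated in this section, so the sketch rests on several assumptions: the phase curve is a quadratic curve fitted by least squares, the modified phases are unwrapped before fitting, and the phase distance is the absolute angular difference between each phase and the curve. All function names are hypothetical.

```python
import math

ERROR_THRESHOLD_RAD = math.radians(20.0)  # example threshold from the text


def unwrap(phases):
    """Remove 2*pi jumps between consecutive phases (assumed preprocessing)."""
    out = [phases[0]]
    for p in phases[1:]:
        step = (p - out[-1] + math.pi) % (2.0 * math.pi) - math.pi
        out.append(out[-1] + step)
    return out


def fit_quadratic(ts, ys):
    """Least-squares fit of y = a*t^2 + b*t + c; returns (a, b, c)."""
    s = [sum(t ** k for t in ts) for k in range(5)]
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    # Normal equations as an augmented matrix; unknowns ordered (c, b, a).
    m = [[s[0], s[1], s[2], r[0]],
         [s[1], s[2], s[3], r[1]],
         [s[2], s[3], s[4], r[2]]]
    for i in range(3):  # Gaussian elimination with partial pivoting
        pivot = max(range(i, 3), key=lambda k: abs(m[k][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for k in range(i + 1, 3):
            factor = m[k][i] / m[i][i]
            m[k] = [x - factor * y for x, y in zip(m[k], m[i])]
    sol = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        tail = sum(m[i][k] * sol[k] for k in range(i + 1, 3))
        sol[i] = (m[i][3] - tail) / m[i][i]
    c, b, a = sol
    return a, b, c


def phase_distances(ts, modified_phases):
    """Angular distance (radian) between each modified phase and the curve."""
    ys = unwrap(modified_phases)
    a, b, c = fit_quadratic(ts, ys)
    dists = []
    for t, y in zip(ts, ys):
        err = y - (a * t * t + b * t + c)
        dists.append(abs((err + math.pi) % (2.0 * math.pi) - math.pi))
    return dists


def select_engine_sound(ts, modified_phases):
    """True for each time whose phase distance is within the threshold."""
    return [d <= ERROR_THRESHOLD_RAD
            for d in phase_distances(ts, modified_phases)]
```

In this sketch, a frequency signal whose modified phase follows a smooth quadratic trend over the predetermined period is kept as the engine sound, whereas a phase that deviates from the fitted curve by more than 20 degrees is rejected.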

Returning to FIG. 22, when the sound detection flag 4105 is outputted from the sound extraction unit 4104(j) (j=1 to M), the direction detection unit 4108 identifies a direction in which the nearby vehicle is present, for the time-frequency domain of the engine sound extracted by the sound extraction unit 4104(j) (j=1 to M). The direction detection unit 4108 detects the direction of the nearby vehicle on the basis of, for example, a sound arrival time difference of the engine sound in the present domain. For example, when the engine sound is extracted from the mixed sound received by either one of the microphones, the direction of the nearby vehicle is identified using both of the microphones. This is because the wind noise is not uniformly detected by both of the microphones, that is, one of the microphones detects the wind noise while the other microphone does not. It should be noted that the direction may alternatively be identified when the engine sound is detected by both of the microphones.

Suppose that a spacing between the microphone 4107(1) and the microphone 4107(2) is d (m). Also suppose that an engine sound is detected from an angle θ (radian) with respect to the driver's vehicle. In this case, the angle θ (radian) can be expressed by Equation 23 as follows, where a sound arrival time difference is represented as Δt (s) and a sound speed is represented as c (m/s).

θ=sin⁻¹(Δtc/d) (Equation 23)
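Equation 23 can be evaluated as in the following minimal sketch. The default sound speed of 340 m/s and the rejection of physically impossible time differences (|Δt c/d| greater than 1) are assumptions made for the example, and the function name arrival_angle is hypothetical.

```python
import math


def arrival_angle(delta_t, mic_spacing, sound_speed=340.0):
    """Return the angle theta (radian) of Equation 23: asin(delta_t * c / d).

    delta_t     : sound arrival time difference (s)
    mic_spacing : spacing d between the two microphones (m)
    sound_speed : sound speed c (m/s); 340 m/s is an assumed default
    """
    ratio = delta_t * sound_speed / mic_spacing
    if abs(ratio) > 1.0:
        # The time difference exceeds what the spacing d allows.
        raise ValueError("arrival time difference too large for this spacing")
    return math.asin(ratio)
```

With this convention, a time difference of zero gives θ=0 (the sound arrives at both microphones at the same time), and the sign of Δt indicates the side from which the sound arrives.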

Finally, the presentation unit 4106 connected to the vehicle detection device 4100 informs the driver about the direction of the nearby vehicle detected by the direction detection unit 4108. For example, the presentation unit 4106 may show, on a display, the direction from which the nearby vehicle is approaching.

The vehicle detection device 4100 and the presentation unit 4106 perform these processes while the predetermined period is being shifted in the direction of the time axis.

Next, an operation performed by the vehicle detection device 4100 configured as described thus far is explained.

In the following, the j-th frequency band (where the frequency is f′) is described.

FIG. 24 and FIG. 25 are flowcharts each showing an operational procedure performed by the vehicle detection device 4100.

Firstly, each of the microphone 4107(1) and the microphone 4107(2) receives the mixed sound 2401 from the outside, and sends the received mixed sound to the DFT analysis unit 1100 (step S201).

Receiving the mixed sound 2401(1) and the mixed sound 2401(2), the DFT analysis unit 1100 performs the discrete Fourier transform processing on the mixed sound 2401(1) and the mixed sound 2401(2) to obtain the respective frequency signals of the mixed sound 2401(1) and the mixed sound 2401(2) (step S300).

Supposing that a phase of a frequency signal at a time t is ψ (t) (radian), the phase modification unit 4102(j) modifies the phase ψ (t) of the frequency signal of the frequency band j (the frequency f′) obtained by the DFT analysis unit 1100 to a phase ψ″ (t)=mod 2 π (ψ (t)−2 π f′ t) (where f′ is the frequency of the frequency band) (step S4300(j)).

Next, the sound determination unit 4103(j) (the phase distance determination unit 4200(j)) determines the analysis-target frequency f, for each of the mixed sound 2401(1) and the mixed sound 2401(2), using the phase ψ″ (t) of the phase-modified frequency signals in the predetermined period. Here, the number of phase-modified signals is equal to or greater than the first threshold. Also, the first threshold is represented by a value which corresponds to 80% of the frequency signals at the times in the predetermined period. Then, the sound determination unit 4103(j) (the phase distance determination unit 4200(j)) calculates the phase distance using the determined analysis-target frequency f (step S4301(j)).
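The condition on the first threshold can be sketched as a simple count check. The function name has_enough_signals and the use of integer arithmetic for the percentage are assumptions made for the example.

```python
FIRST_THRESHOLD_PERCENT = 80  # example value from the text


def has_enough_signals(num_selected, num_times_in_period,
                       percent=FIRST_THRESHOLD_PERCENT):
    """True when the number of phase-modified signals reaches the threshold.

    num_selected        : number of frequency signals available for the
                          phase distance calculation
    num_times_in_period : number of times in the predetermined period
    """
    if num_times_in_period <= 0:
        raise ValueError("the predetermined period must contain some times")
    # Integer comparison avoids rounding issues at exactly 80%.
    return num_selected * 100 >= percent * num_times_in_period
```

For example, when the predetermined period contains 10 times, at least 8 frequency signals are needed before the phase distance is calculated.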

The process performed in step S4301(j) is described in detail with reference to FIG. 25. Firstly, the frequency signal selection unit 4202(j) selects frequency signals which are to be used by the phase curve calculation unit 4201(j) for calculating a phase form, from among the frequency signals, in a predetermined period, to which the phase modification unit 4102(j) has made phase modifications (step S1800(j)).

Following this, the phase curve calculation unit 4201(j) calculates the phase curve (step S1801(j)).

Next, the phase distance determination unit 4200(j) calculates the phase distance between the form calculated by the phase curve calculation unit 4201(j) and the modified phase at the analysis-target time (step S1802(j)).

Returning to FIG. 24, the sound extraction unit 4104(j) determines, as the frequency signal of the engine sound, the frequency signal whose phase distance is equal to or smaller than the second threshold in the predetermined period (step S4302(j)).

The direction detection unit 4108 identifies the direction in which the nearby vehicle is present, for the time-frequency domain of the engine sound extracted by the sound extraction unit 4104(j), and the presentation unit 4106 informs the driver about the direction of the nearby vehicle detected by the direction detection unit 4108 (step S4304).

As described thus far, when the engine sound is extracted, the vehicle detection device in the second embodiment identifies the direction of the vehicle on the basis of the arrival time difference of the engine sound. Thus, the direction of the vehicle can be accurately detected without any influence from the noises.

Although the noise elimination device and the vehicle detection device in the embodiments according to the present invention have been described, the present invention is not limited to these embodiments.

In the above embodiments, the engine sound is extracted as an example. Note that the extraction target in the present invention is not limited to the engine sound. The present invention is applicable in any case as long as the sound is periodic like a human voice, an animal sound, or a motor sound.

In the above embodiments, the sound extraction unit determines, for each frequency signal, whether the signal represents a periodic sound or a noise. However, the sound extraction unit may perform this determination for each predetermined period, and thus may determine whether the frequency signals included in the predetermined period represent a periodic sound or a noise. For example, referring to FIG. 16, when a proportion of the phases of the frequency signals within the predetermined period whose errors with respect to the quadratic curve calculated by the phase curve calculation unit are below the threshold is equal to or higher than a predetermined proportion, the sound extraction unit may determine all the frequency signals included in this period as belonging to the periodic sound. On the other hand, when the proportion is below the predetermined proportion, the sound extraction unit may determine all the frequency signals included in this period as belonging to the noise.
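This per-period variation can be sketched as follows. The predetermined proportion is not given a value in the text, so it is left as a required parameter; the function name classify_period and the label strings are hypothetical.

```python
def classify_period(phase_errors, error_threshold, min_proportion):
    """Label every frequency signal in one predetermined period at once.

    phase_errors    : errors of the phases with respect to the quadratic curve
    error_threshold : threshold on the error (same unit as phase_errors)
    min_proportion  : predetermined proportion, between 0 and 1 (the text
                      gives no value, so the caller must choose one)
    """
    if not phase_errors:
        raise ValueError("the period contains no frequency signals")
    within = sum(1 for e in phase_errors if abs(e) < error_threshold)
    is_periodic = within / len(phase_errors) >= min_proportion
    label = "periodic" if is_periodic else "noise"
    # All frequency signals in the period share the same determination.
    return [label] * len(phase_errors)
```

Compared with the per-signal determination, this variation trades time resolution for stability: a few outlying phases within an otherwise periodic period no longer cause individual frequency signals to be discarded.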

Also, to be more specific, each of the above-described devices may be a computer system configured with a microprocessor, a ROM, a RAM, a hard disk drive, a display unit, a keyboard, a mouse, and so forth. The RAM or the hard disk drive stores a computer program. The microprocessor operates according to the computer program, so that functions of the components included in the computer system are carried out. Here, note that the computer program includes a plurality of instruction codes indicating instructions to be given to the computer so as to achieve a specific function.

Moreover, some or all of the components included in each of the above-described devices may be realized as a single system Large Scale Integration (LSI). The system LSI is a super multifunctional LSI manufactured by integrating a plurality of components onto a single chip. To be more specific, the system LSI is a computer system configured with a microprocessor, a ROM, a RAM, and so forth. The RAM stores a computer program. The microprocessor operates according to the computer program, so that a function of the system LSI is carried out.

Furthermore, some or all of the components included in each of the above-described devices may be implemented as an IC card or a standalone module that can be inserted into and removed from the corresponding device. The IC card or the module is a computer system configured with a microprocessor, a ROM, a RAM, and so forth. The IC card or the module may include the aforementioned super multifunctional LSI. The microprocessor operates according to the computer program, so that a function of the IC card or the module is carried out. The IC card or the module may be tamper resistant.

Also, the present invention may be the methods described above. Each of the methods may be a computer program implemented by a computer, or may be a digital signal of the computer program.

Moreover, the present invention may be the aforementioned computer program or digital signal recorded on a computer-readable recording medium, such as a flexible disk, a hard disk, a CD-ROM, an MO, a DVD, a DVD-ROM, a DVD-RAM, a Blu-ray Disc (BD) (registered trademark), or a semiconductor memory. Also, the present invention may be the digital signal recorded on such a recording medium.

Furthermore, the present invention may be the aforementioned computer program or digital signal transmitted via a telecommunication line, a wireless or wired communication line, a network represented by the Internet, or data broadcasting.

Also, the present invention may be a computer system including a microprocessor and a memory. The memory may store the aforementioned computer program and the microprocessor may operate according to the computer program.

Moreover, by transferring the recording medium having the aforementioned program or digital signal recorded thereon or by transferring the aforementioned program or digital signal via the aforementioned network or the like, the present invention may be implemented by a different independent computer system.

Furthermore, the above embodiments and variations may be combined.

Although only some exemplary embodiments of this invention have been described in detail above, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this invention. Accordingly, all such modifications are intended to be included within the scope of this invention.

INDUSTRIAL APPLICABILITY

The present invention is applicable to a sound recognition device capable of discriminating, for each time-frequency domain, between a periodic sound, such as engine sound, and an aperiodic sound, such as wind noise, rain sound, or background noise, to determine a frequency signal of the periodic or aperiodic sound, and also applicable to a vehicle detection device capable of detecting a direction of a vehicle on the basis of a recognized periodic sound.

	Number	Date	Country
Parent	PCT/JP2011/000036	Jan 2011	US
Child	13282902		US
