This application claims the benefit under 35 U.S.C. § 371 as a U.S. National Stage Entry of International Application No. PCT/JP2016/074332, filed in the Japanese Patent Office as a Receiving Office on Aug. 22, 2016, which claims priority to Japanese Patent Application Number JP2015-192866, filed in the Japanese Patent Office on Sep. 30, 2015, each of which is hereby incorporated by reference in its entirety.
The present disclosure relates to a signal processing device, a signal processing method, and a program.
Stereo recording is performed using stereo microphones, in which two microphones (hereinafter, also simply referred to as mics in some cases) are provided on the left and right. Recording through stereo mics has the effect that, for example, a sense of localization can be obtained. However, since the distance between the mics is short in a small-sized device such as an IC recorder, a sufficient sense of localization cannot be obtained in some cases.
Accordingly, directional mics are used for improving a sense of localization. For example, the following Patent Literature 1 discloses a technology that can adjust a sense of localization by adjusting an angle of two directional mics.
Patent Literature 1: JP 2008-311802A
However, using directional mics can increase costs. Therefore, it is preferable to obtain an output with a superior sense of localization even in a case of using a non-directional mic, which is less expensive than a directional mic.
Accordingly, the present disclosure proposes a novel and improved signal processing device, signal processing method, and program capable of obtaining an output signal with a superior sense of localization even if an input signal is an audio signal obtained on the basis of a non-directional mic.
According to the present disclosure, there is provided a signal processing device including: a first arithmetic processing unit that performs first suppressing processing for suppressing a first audio signal based on a first microphone on a basis of a second audio signal based on a second microphone; and a second arithmetic processing unit that performs second suppressing processing for suppressing the second audio signal on a basis of the first audio signal.
In addition, according to the present disclosure, there is provided a signal processing method to be executed by a signal processing device, the signal processing method including: performing first suppressing processing for suppressing a first audio signal based on a first microphone on a basis of a second audio signal based on a second microphone; and performing second suppressing processing for suppressing the second audio signal on a basis of the first audio signal.
In addition, according to the present disclosure, there is provided a program for causing a computer to implement: a first arithmetic processing function of performing first suppressing processing for suppressing a first audio signal based on a first microphone on a basis of a second audio signal based on a second microphone; and a second arithmetic processing function of performing second suppressing processing for suppressing the second audio signal on a basis of the first audio signal.
As mentioned above, according to the present disclosure, it is possible to obtain an output signal with a superior sense of localization even if an input signal is an audio signal obtained on the basis of a non-directional mic.
Note that the effects described above are not necessarily limitative. With or in the place of the above effects, there may be achieved any one of the effects described in this specification or other effects that may be grasped from this specification.
Hereinafter, (a) preferred embodiment(s) of the present disclosure will be described in detail with reference to the appended drawings. Note that, in this specification and the appended drawings, components that have substantially the same functional configuration are denoted with the same reference symbols, and repeated explanation of these components is omitted.
Note that, in this description and the drawings, components that have substantially the same functional configuration are sometimes distinguished from each other using different letters after the same reference symbol. However, when there is no particular need to distinguish components that have substantially the same functional configuration, the same reference symbol alone is attached.
Note that an explanation will be given in the following order.
«1. First Embodiment»
«2. Second Embodiment»
«3. Third Embodiment»
«4. Fourth Embodiment»
«5. Modified example»
«6. Example of hardware configuration»
«7. Conclusion»
«1. First Embodiment»
<1-1. Outline According to First Embodiment>
First, an explanation will be given of an outline of a signal processing device according to a first embodiment of the present disclosure with reference to
A recording and reproducing device 1 illustrated in
In a small-sized device such as an IC recorder, it is difficult to increase a distance between two mics (for example, a distance d between the left mic 110L and the right mic 110R illustrated in
In a case where the left and right mics have directivity in the left and right directions, respectively, a sense of localization can be improved. Accordingly, a configuration having two directional mics, for example, is considered for the purpose of obtaining a sufficient sense of localization even in a case where a distance between mics is short. However, it is often the case that a directional mic is more expensive than a non-directional mic. Further, in a case of the configuration using directional mics, in order to adjust a sense of localization, an angle adjusting mechanism is needed to physically adjust an angle of the directional mics, and there is a possibility that the structure becomes complicated.
Hence, the present embodiment has been developed in view of the above circumstances. According to the present embodiment, even in a case where input signals are audio signals obtained by non-directional mics, the directivity of the audio signals is emphasized by suppressing each of the left and right audio signals on the basis of the audio signal of the opposite side, and an output signal with a superior sense of localization can be obtained. Further, according to the present embodiment, a sense of localization can be adjusted by changing a parameter without requiring a physical angle adjusting mechanism of mics. Hereinafter, a configuration and operations of a recording and reproducing device according to the present embodiment exhibiting such effects will be described in detail.
<1-2. Configuration According to First Embodiment>
The background to an invention of a recording and reproducing device according to the present embodiment has been described above. Subsequently, a configuration of a recording and reproducing device will be described according to the present embodiment with reference to
The left mic 110L (first microphone) and the right mic 110R (second microphone) are, for example, non-directional mics. The left mic 110L and the right mic 110R convert ambient sound into analog audio signals (electrical signals), and supply the analog audio signals to the A/D converting unit 120L and the A/D converting unit 120R, respectively.
The A/D converting unit 120L and the A/D converting unit 120R respectively convert the analog audio signals supplied from the left mic 110L and the right mic 110R into digital audio signals (hereinafter, also simply referred to as audio signals in some cases).
The gain correcting unit 130L and the gain correcting unit 130R respectively perform gain correcting processing for correcting a gain difference (a sensitivity difference) between the left mic 110L and the right mic 110R. The gain correcting unit 130L and the gain correcting unit 130R according to the present embodiment respectively correct a difference in audio signals outputted from the A/D converting unit 120L and the A/D converting unit 120R.
For example, the gain correcting unit 130L and the gain correcting unit 130R may measure in advance a gain difference between the left mic 110L and the right mic 110R, and perform gain correcting processing by multiplying the audio signals by a predetermined value that suppresses the gain difference. With this configuration, it is possible to suppress the influence of the gain difference between the left mic 110L and the right mic 110R and to emphasize directivity with higher accuracy by the processing described later.
Note that the above description has been given of an example in which gain correcting processing is performed on a digital audio signal after A/D conversion. However, gain correcting processing may be performed on an analog audio signal before A/D conversion.
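The gain correcting processing described above can be sketched as follows. This is only an illustrative sketch: the source does not state how the gain difference is measured, so the use of the RMS level of a calibration recording, and the function names rms, measure_gain_factor, and apply_gain, are assumptions.

```python
import math


def rms(x):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def measure_gain_factor(cal_left, cal_right):
    """Assumed calibration step: factor that brings the right channel
    to the level of the left channel for the same calibration sound."""
    return rms(cal_left) / rms(cal_right)


def apply_gain(x, factor):
    """Gain correcting processing: multiply each sample by a predetermined value."""
    return [factor * v for v in x]
```

In such a sketch the factor would be measured once in advance and then applied to every recorded block of the right channel, while the left channel is left unchanged.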
Further, hereinafter, there is a case where an audio signal outputted from the gain correcting unit 130L is referred to as a left input signal or a first audio signal, and an audio signal outputted from the gain correcting unit 130R is referred to as a right input signal or a second audio signal.
The first arithmetic processing unit 140L and the second arithmetic processing unit 140R perform arithmetic processing on the basis of the left input signal and the right input signal. For example, the first arithmetic processing unit 140L performs first suppressing processing to suppress the left input signal on the basis of the right input signal. Further, the second arithmetic processing unit 140R performs second suppressing processing to suppress the right input signal on the basis of the left input signal.
Functions of the first arithmetic processing unit 140L and the second arithmetic processing unit 140R may be implemented by, for example, different processors, respectively. Further, one processor may have both functions of the first arithmetic processing unit 140L and the second arithmetic processing unit 140R. Note that, hereinafter, an example will be described in which functions of the first arithmetic processing unit 140L and the second arithmetic processing unit 140R are implemented by a digital signal processor (DSP).
As illustrated in
The delay filters 142L and 142R are filters that perform processing to delay input signals. As illustrated in
The above-mentioned first delay processing and second delay processing are performed on the basis of the distance between the left mic 110L and the right mic 110R (distance between the mics). Since the time at which sound reaches each mic depends on the distance between the mics, this configuration makes it possible to obtain a directivity emphasizing effect based on the distance between the mics, for example, in combination with the suppressing processing described later.
For example, the first delay processing and the second delay processing using the delay filters 142L and 142R may delay the signal by the number of samples corresponding to the time sound takes to travel the distance between the mics. When the distance between the mics is d [cm], the sampling frequency is f [Hz], and the speed of sound is c [m/s], the number D of delay samples applied by the delay filters 142L and 142R is calculated by, for example, the following formula.
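Formula (1) is not reproduced in this text. A form consistent with the stated units (d in cm, f in Hz, c in m/s) would be D = d·f/(100·c), since d/100 is the distance in meters and d/(100·c) is the travel time in seconds. The following sketch evaluates that expression; the function name and the default speed of sound of 340 m/s are assumptions.

```python
def delay_samples(d_cm, fs_hz, c_m_per_s=340.0):
    """Number of samples sound needs to travel the mic spacing.

    Assumed form of Formula (1): D = d * f / (100 * c),
    with d in cm, f in Hz, and c in m/s.
    """
    if c_m_per_s <= 0:
        raise ValueError("speed of sound must be positive")
    return (d_cm / 100.0) * fs_hz / c_m_per_s
```

For example, a spacing of 3.4 cm at 48 kHz gives about 4.8 samples, which is a non-integer delay of the kind discussed next.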
Herein, in general, the number D of delay samples calculated by Formula (1) is not limited to an integer. In a case where the number D of delay samples is a non-integer, the delay filters 142L and 142R are non-integer delay filters. Strictly speaking, an implementation of a non-integer delay filter requires a filter with an infinite number of taps. However, in practice, a filter truncated to a finite number of taps or a filter approximated with linear interpolation or the like may be used as the delay filters 142L and 142R. Hereinafter, a configuration example of a delay filter 142 will be described in a case of implementing the delay filter 142 (delay filters 142L and 142R) as a filter approximated with linear interpolation or the like with reference to
When the integer part and the fractional part of the number D of delay samples are M and η, respectively, an approximate value of a signal obtained by delaying a signal y(n) inputted to the delay filter 142 by the number D of delay samples is obtained as the following formula.
[Math. 2]
y(n−D)≈ŷ(n−M−η)=(1−η)·y(n−M)+η·y(n−M−1) (2)
The above-mentioned Formula (2) is represented as a block diagram shown in
The delay filter 1421 is an integer delay filter that delays by the number M of delay samples. Further, the delay filter 1423 is an integer delay filter that delays by one sample. Further, the linear filter 1425 and the linear filter 1427 multiply the inputted signals by 1−η and η, respectively, and output the results. Furthermore, the adder 1429 adds the inputted signals and outputs the sum.
The above-mentioned first delay processing and second delay processing by the delay filter 142L and the delay filter 142R are performed on the basis of a predetermined filter coefficient. The filter coefficient may be specified to obtain the above-mentioned delay filter on the basis of a distance between mics. Note that according to the present embodiment, the left mic 110L and the right mic 110R are fixedly provided for the recording and reproducing device 1. Therefore, for example, the filter coefficient may be determined in advance on the basis of an implementation method of the above-mentioned delay filter 142.
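The linear-interpolation delay of Formula (2) can be sketched as follows. The function name fractional_delay and the treatment of samples before the start of the signal as zero are assumptions made for illustration.

```python
def fractional_delay(y, D):
    """Approximate y(n - D) per Formula (2):
    (1 - eta) * y(n - M) + eta * y(n - M - 1),
    where M is the integer part and eta the fractional part of D (D >= 0).
    """
    M = int(D)
    eta = D - M

    def at(i):
        # Assumption: the signal is zero before its first sample.
        return y[i] if 0 <= i < len(y) else 0.0

    return [(1.0 - eta) * at(n - M) + eta * at(n - M - 1)
            for n in range(len(y))]
```

The same operation corresponds to an FIR filter whose coefficients are M zeros followed by 1−η and η, which is one way the predetermined filter coefficient could be stored.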
Returning to
The suppressing unit 146L subtracts a signal based on the first delay processing from a left input signal to perform the first suppressing processing. Further, the suppressing unit 146R subtracts a signal based on the second delay processing from a right input signal to perform the second suppressing processing. With the configuration, an output signal of the suppressing unit 146L obtains directivity in a left direction by suppressing a signal in a right direction. Furthermore, an output signal of the suppressing unit 146R obtains directivity in a right direction by suppressing a signal in a left direction.
For example, as illustrated in
The equalization filter 148L is a filter that corrects frequency characteristics of a signal obtained by the first suppressing processing by the suppressing unit 146L. Further, the equalization filter 148R is a filter that corrects frequency characteristics of a signal obtained by the second suppressing processing by the suppressing unit 146R. The equalization filter 148L and the equalization filter 148R may perform correction to compensate for suppression in a frequency band that is suppressed irrespective of directivity with the above-mentioned suppressing processing. For example, with the above-mentioned suppressing processing, signals in a low band having a long wavelength are suppressed because a phase difference is small between a delayed signal and a non-delayed signal. The equalization filter 148L and the equalization filter 148R therefore may correct the frequency characteristics to emphasize signals in the low band. With the configuration, it is possible to reduce a change in frequency characteristics due to the suppressing processing. Note that a filter coefficient for performing the above-mentioned correction may be specified on the basis of a distance between mics.
Herein, when a left input signal is xl(n) and a right input signal is xr(n), an output signal yl(n) of the first arithmetic processing unit 140L and an output signal yr(n) of the second arithmetic processing unit 140R are expressed by the following formulae. Note that, hereinafter, it is assumed that the parameter α relating to the directivity correcting units 144L and 144R is 1.
[Math. 3]
yl(n)={xl(n)−xr(n)*p(n)}*q(n) (3)
yr(n)={xr(n)−xl(n)*p(n)}*q(n) (4)
Note that in Formulae (3) and (4), reference symbol “*” denotes a convolution operation, p(n) denotes the delay filters 142L and 142R, and q(n) denotes the equalization filters 148L and 148R.
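Formulae (3) and (4) can be sketched in floating point as follows, with α = 1 as assumed in the text. The helper convolve, the function name emphasize_directivity, and the truncation of each output to the input length are assumptions made for illustration; the two input signals are assumed to have equal length.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def emphasize_directivity(xl, xr, p, q):
    """Formulae (3) and (4) with alpha = 1; returns (yl, yr)."""
    n = len(xl)
    xr_delayed = convolve(xr, p)[:n]  # first delay processing
    xl_delayed = convolve(xl, p)[:n]  # second delay processing
    inner_l = [a - b for a, b in zip(xl, xr_delayed)]  # first suppressing
    inner_r = [a - b for a, b in zip(xr, xl_delayed)]  # second suppressing
    return convolve(inner_l, q)[:n], convolve(inner_r, q)[:n]
```

As a simple check of the intended behavior, when a sound reaches the right mic first and the left mic exactly one delay later, the delayed right signal cancels the left signal and the left output becomes zero.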
In a case of implementing the arithmetic operations of Formulae (3) and (4) with fixed-point operations, if the result of the arithmetic operations in { } is rounded and stored in a short-length word, the rounding error in the low band is amplified by the convolution of that result with the equalization filter q(n). Thus, the signal-to-noise ratio (S/N ratio) in the low band may be reduced.
Alternatively, the result of the arithmetic operations in { } of Formulae (3) and (4) may be stored in a long-length word, and the convolution operation with the equalization filter q(n) may be executed in double precision. However, this increases the memory required for the buffer area that stores the result of the arithmetic operations, and the cost of arithmetic operations in double precision is also high.
Herein, by using a synthesized filter u(n)=p(n)*q(n) of the delay filter p(n) and the equalization filter q(n), the output signal yl(n) of the first arithmetic processing unit 140L and the output signal yr(n) of the second arithmetic processing unit 140R are expressed by the following formulae.
[Math. 4]
yl(n)=xl(n)*q(n)−xr(n)*u(n) (5)
yr(n)=xr(n)*q(n)−xl(n)*u(n) (6)
When Formulae (5) and (6) are computed with, for example, a DSP that can perform fixed-point arithmetic processing, the number of multiply-add operations increases as compared with Formulae (3) and (4), but the convolution operations need not be cascaded. The arithmetic operation results of Formulae (5) and (6) are obtained by subtracting the two convolution operation results held in a long-length-word accumulator of the DSP. Therefore, the arithmetic operations using Formulae (5) and (6) avoid a reduction of the S/N ratio and require neither storage of intermediate results in double precision nor a convolution operation in double precision.
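The algebraic equivalence of Formula (3) and Formula (5) can be checked with the following floating-point sketch. It illustrates only that both forms give the same left-channel output when u(n)=p(n)*q(n); it does not model the fixed-point word lengths or the accumulator behavior of a DSP. The function names are assumptions.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def left_output_cascaded(xl, xr, p, q):
    """Formula (3): subtract first, then convolve with q."""
    n = len(xl)
    inner = [a - b for a, b in zip(xl, convolve(xr, p)[:n])]
    return convolve(inner, q)[:n]


def left_output_synthesized(xl, xr, p, q):
    """Formula (5): two independent convolutions, then one subtraction."""
    n = len(xl)
    u = convolve(p, q)  # synthesized filter u(n) = p(n) * q(n)
    direct = convolve(xl, q)[:n]
    cross = convolve(xr, u)[:n]
    return [a - b for a, b in zip(direct, cross)]
```

The right-channel output of Formulae (4) and (6) follows by exchanging xl and xr.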
Note that, although the parameter α relating to the directivity correcting units 144L and 144R is 1 in the above description, the arithmetic operations can be performed similarly even in a case where the parameter α is not 1.
An output signal of the first arithmetic processing unit 140L obtained as mentioned above is an audio signal of a left channel in stereo audio signals, and an output signal of the second arithmetic processing unit 140R is an audio signal of a right channel in the stereo audio signals. That is, the above-mentioned processing results in obtaining a stereo audio signal by combining an audio signal of a left channel with directivity in a left direction and an audio signal of a right channel with directivity in a right direction. With this configuration, the stereo audio signals have a sense of localization superior to that of stereo audio signals obtained, for example, by simply combining the left input signal and the right input signal.
The encoding unit 150 encodes the combination of the above-mentioned audio signal of the left channel and audio signal of the right channel. An encoding method executed by the encoding unit 150 is not limited and may be, for example, a non-compression method, a lossless compression method, or a lossy compression method.
The storing unit 160 stores data obtained by an encoding with the encoding unit 150. The storing unit 160 may be implemented by, for example, a flash memory, a magnetic disc, an optical disc, a magneto-optical disc, or the like.
The decoding unit 170 decodes data stored in the storing unit 160. The decoding by the decoding unit 170 may be performed in accordance with an encoding method of the encoding unit 150.
The D/A converting unit 180L and the D/A converting unit 180R convert an audio signal of a left channel and an audio signal of a right channel that are outputted from the decoding unit 170 into an analog audio signal of the left channel and an analog audio signal of the right channel, respectively.
The speaker 190L and the speaker 190R reproduce (output sound) the analog audio signal of the left channel and the analog audio signal of the right channel that are respectively outputted from the D/A converting unit 180L and the D/A converting unit 180R. Note that the analog audio signal of the left channel and the analog audio signal of the right channel that are outputted from the D/A converting unit 180L and the D/A converting unit 180R may be outputted to an external speaker, an earphone, a headphone, or the like.
<1-3. Operation According to First Embodiment>
As mentioned above, a configuration example of the recording and reproducing device 1 has been described according to the first embodiment of the present disclosure. Subsequently, an operational example of a recording and reproducing device 1 will be described according to the present embodiment by paying attention to, in particular, operations of the first arithmetic processing unit 140L and the second arithmetic processing unit 140R with reference to
As illustrated in
Subsequently, the delay filter 142L performs a delay processing (first delay processing) of the right input signal, and the delay filter 142R performs a delay processing (second delay processing) of the left input signal (S104). The signals obtained by the above-mentioned delay processing are corrected to adjust directivity by the directivity correcting unit 144L and the directivity correcting unit 144R (S106).
Subsequently, the suppressing unit 146L suppresses the left input signal (first suppressing processing), and the suppressing unit 146R suppresses the right input signal (second suppressing processing). The equalization filter 148L and the equalization filter 148R correct frequency characteristics of suppressed signals obtained by the suppression (S110).
<1-4. Effect According to First Embodiment>
The first embodiment has been described above. According to the present embodiment, each of left and right audio signals is suppressed on the basis of the audio signal of each opposite side thereto to emphasize directivity of the audio signals. Even in the case where the input signal is an audio signal obtained by a non-directional mic, it is possible to obtain an output signal with a superior sense of localization. Further, according to the present embodiment, a sense of localization can be adjusted by changing the parameter α for adjusting directivity without requiring the physical mechanism for adjusting an angle of the mics.
«2. Second Embodiment»
<2-1. Outline According to Second Embodiment>
In the above-mentioned first embodiment, an example has been described in which the same device performs a recording and a reproduction. However, a device that performs a recording and a device that performs a reproduction are not limited to being the same device. A recording device that performs a recording and a reproducing device that performs a reproduction may be, for example, IC recorders, respectively.
For example, there is a case of reproducing contents recorded with one IC recorder (recording device) by another IC recorder (reproducing device) via a network, and a case of copying a file of the contents to another IC recorder (reproducing device) and reproducing the file.
In such a case, for example, the reproducing device performs a suppressing processing on the basis of a distance between mics of the recording device and, thus, directivity of an audio signal can be emphasized and an output signal with a superior sense of localization can be obtained. Hence, herein, according to the second embodiment, an example will be described of a case where a recording device that performs a recording is different from a reproducing device that performs a reproduction.
<2-2. Configuration According to Second Embodiment>
A recording and reproducing system according to the second embodiment of the present disclosure will be described with reference to
(Recording Device)
The recording device 22 has at least a recording function. As illustrated in
Note that the recording device 22 according to the present embodiment performs processing corresponding to step S102 described with reference to
The meta-data storing unit 229 stores meta data used in a case where the reproducing device 24, which will be described later, performs a suppressing processing (processing for emphasizing directivity). The meta data stored in the meta-data storing unit 229 may include, for example, distance information associated with a distance between the left mic 221L and the right mic 221R, or information associated with a filter coefficient calculated on the basis of the distance between the mics. Further, the meta data stored in the meta-data storing unit 229 may include a device model code for identifying a model of the recording device 22, or the like. Further, the meta data stored in the meta-data storing unit 229 may include information associated with a gain difference between the left mic 221L and the right mic 221R.
Note that a format of meta data stored in the meta-data storing unit 229 may be of a chunk type used for Waveform Audio Format or the like or of a type using a structure of eXtensible Markup Language (XML) or the like.
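As an illustration of meta data using an XML structure, the following sketch serializes and parses the kinds of information listed above (distance between the mics, device model code, and gain difference). The element names recording_metadata, mic_distance_cm, device_model_code, and gain_ratio, and the representation of the gain difference as a ratio, are hypothetical and are not taken from the source.

```python
import xml.etree.ElementTree as ET


def build_metadata(mic_distance_cm, model_code, gain_ratio):
    """Serialize hypothetical meta data fields as an XML string."""
    root = ET.Element("recording_metadata")
    ET.SubElement(root, "mic_distance_cm").text = str(mic_distance_cm)
    ET.SubElement(root, "device_model_code").text = model_code
    ET.SubElement(root, "gain_ratio").text = str(gain_ratio)
    return ET.tostring(root, encoding="unicode")


def parse_metadata(xml_text):
    """Recover the hypothetical meta data fields on the reproducing side."""
    root = ET.fromstring(xml_text)
    return {
        "mic_distance_cm": float(root.findtext("mic_distance_cm")),
        "device_model_code": root.findtext("device_model_code"),
        "gain_ratio": float(root.findtext("gain_ratio")),
    }
```

A chunk-type format would carry the same fields in a binary chunk instead of a text structure.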
Hereinafter, an example will be described in which meta data stored in the meta-data storing unit 229 includes information associated with a filter coefficient used in a case of performing at least a suppressing processing. Another example will be described later as a complement.
The multiplexer 231 outputs a plurality of input signals as one output signal. The multiplexer 231 according to the present embodiment outputs an audio signal encoded by the encoding unit 227 and meta data stored by the meta-data storing unit 229 as a single output signal.
The output signal outputted from the multiplexer 231 is stored in the storing unit 233 as a data file including audio data and meta data.
(Reproducing Device)
As illustrated in
Note that the reproducing device 24 according to the present embodiment performs a processing corresponding to steps S104 to S110 described with reference to
The de-multiplexer 241 receives, from the recording device 22, a signal in which an audio signal and meta data stored in the storing unit 233 of the recording device 22 are multiplexed together, de-multiplexes the signal into the audio signal and the meta data, and outputs them. The de-multiplexer 241 provides the audio signal to the decoding unit 243 and provides the meta data to the first arithmetic processing unit 249L and the second arithmetic processing unit 249R. As mentioned above, in the example illustrated in
Note that the example illustrated in
The UI unit 245 receives an input of a user for selecting whether or not the first arithmetic processing unit 249L and the second arithmetic processing unit 249R perform a processing for emphasizing directivity. A sound outputted by the processing for emphasizing directivity is spatially separated and therefore easier to listen to. However, some users may prefer the raw recorded contents, and therefore the reproducing device 24 may include the UI unit 245.
The UI unit 245 may be implemented by various input mechanisms.
Further, as illustrated on the right in
Note that, needless to say, a user may operate a physical switch or a touch panel to input the selection even without the above-mentioned automatic notification that prompts the user to input the selection.
Referring again to
The first arithmetic processing unit 249L includes, as illustrated in
The delay filters 2491L and 2491R are filters that perform a processing for delaying an input signal, similarly to the delay filters 142L and 142R described with reference to
Similarly to the equalization filters 148L and 148R described with reference to
<2-3. Effect According to Second Embodiment>
The second embodiment has been described above. According to the present embodiment, meta data based on a distance between mics at the time of recording is provided to a device that performs a reproduction, thereby making it possible to obtain an output signal with a superior sense of localization even in a case where a device that performs a recording is different from a device that performs a reproduction.
<2-4. Complement According to Second Embodiment>
In the foregoing, an example has been described in which meta data stored in the meta-data storing unit 229 in the recording device 22 includes information associated with a filter coefficient used at least in the case of performing a suppressing processing. However, the present embodiment is not limited to the example.
For example, meta data may be a device model code for identifying a model of the recording device 22. In this case, for example, the reproducing device 24 may determine whether or not the recording device 22 and the reproducing device 24 are of the same device model by using the device model code and perform a processing for emphasizing directivity only in a case where the devices are of the same device model.
Further, meta data may be distance information associated with a distance between mics. In this case, the de-multiplexer 241 in the reproducing device 24 functions as a distance information obtaining unit that obtains the distance information. For example, the reproducing device 24 may further include a storing unit that stores a plurality of the filter coefficients and a filter coefficient selecting unit that selects, from the plurality of the filter coefficients stored in the storing unit, the filter coefficient corresponding to the distance information obtained by the de-multiplexer 241. Alternatively, the reproducing device 24 may further include a filter coefficient specifying unit that specifies the filter coefficient on the basis of the distance information obtained by the de-multiplexer 241 to dynamically generate the filter at the time of reproduction.
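The two options above, selecting a stored filter coefficient and specifying one dynamically, can be sketched as follows. The function names, the nearest-distance selection rule, the linear-interpolation form of the delay coefficients, and the default speed of sound of 340 m/s are all assumptions made for illustration.

```python
def select_filter_coefficients(distance_cm, table):
    """Filter coefficient selecting unit (sketch).

    table maps a mic distance in cm to a stored coefficient list.
    Assumption: the entry with the nearest stored distance is selected.
    """
    if not table:
        raise ValueError("no stored filter coefficients")
    nearest = min(table, key=lambda stored: abs(stored - distance_cm))
    return table[nearest]


def specify_delay_coefficients(distance_cm, fs_hz, c_m_per_s=340.0):
    """Filter coefficient specifying unit (sketch).

    Generates linear-interpolation delay coefficients from the distance:
    M zeros followed by (1 - eta) and eta, where M and eta are the
    integer and fractional parts of the delay in samples.
    """
    D = (distance_cm / 100.0) * fs_hz / c_m_per_s
    M = int(D)
    eta = D - M
    return [0.0] * M + [1.0 - eta, eta]
```

Selection from a stored table avoids computation at the time of reproduction, whereas dynamic specification handles distances that were not anticipated when the table was prepared.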
Further, meta data may include information associated with a gain difference between the left mic 221L and the right mic 221R. In this case, for example, instead of the recording device 22 including the gain correcting units 225L and 225R, the reproducing device 24 may include gain correcting units, and the gain correcting units in the reproducing device 24 may correct the gain on the basis of the information associated with the gain difference.
«3. Third Embodiment»
In the above-mentioned first embodiment and second embodiment, an example of storing a sound obtained via mics in a storing unit and thereafter reproducing the sound has been described. On the other hand, hereinafter, an example of reproducing in real time a sound obtained via mics will be described according to a third embodiment.
<3-1. Outline According to Third Embodiment>
An outline according to the third embodiment of the present disclosure will be described with reference to
The sending system 32 is a system that simultaneously sends sound and other data, as in character multiplex broadcasting. For example, the sending system 32 obtains a first audio signal and a second audio signal via stereo mics, and sends (broadcasts) information including the first audio signal, the second audio signal, and meta data to the compatible receiving devices 34A and 34B and the incompatible receiving devices 36A and 36B. Meta data according to the present embodiment may include information similar to the meta data described with some examples in the second embodiment, and may further include meta data (character information, etc.) associated with broadcasting.
The compatible receiving devices 34A and 34B are signal processing devices that support the suppressing processing (processing for emphasizing directivity) using meta data, and can perform the suppressing processing in a case of receiving meta data for the processing for emphasizing directivity. Further, the incompatible receiving devices 36A and 36B are devices that do not support the suppressing processing using meta data; they ignore meta data for the processing for emphasizing directivity and process only the audio signal.
With this configuration, even in a case of reproducing in real time a sound obtained via the mics, a device that supports the processing for emphasizing directivity can obtain an output signal with a superior sense of localization.
<3-2. Configuration According to Third Embodiment>
In the foregoing, an outline of the broadcasting system 3 has been described according to the present embodiment. Subsequently, configuration examples of the sending system 32, a compatible receiving device 34, and an incompatible receiving device 36 which are provided for the broadcasting system 3 will be sequentially described in detail according to the present embodiment with reference to
(Sending System)
Note that the sending system 32 according to the present embodiment performs a processing corresponding to step S102 described with reference to
The obtaining unit 329 obtains meta data such as a distance between the left mic 321L and the right mic 321R or a filter coefficient based on the distance between the mics thereof. The obtaining unit 329 can obtain meta data by various methods.
Further, the obtaining unit 329 may be a sensor that is attached to both the left mic 321L and the right mic 321R to measure and output a distance between the mics.
For example, in audio recording of live broadcasting on TV or the like, it is assumed that a stereo mic is set to each camera. A distance between mics, however, is not uniquely defined because of camera size or the like, so the distance between mics may change each time the camera is switched. Further, even with the same mics, the distance between the mics may be varied in real time. With the above-mentioned configuration of the obtaining unit 329, for example, even in a case of switching to a stereo mic with a different distance between mics or varying a distance between mics in real time, it is possible to send meta data such as a distance between mics obtained in real time.
Note that processing of the obtaining unit 329 may be included in the processing in step S102 described with reference to
The sending unit 331 illustrated in
(Compatible Receiving Device)
Note that the compatible receiving device 34 according to the present embodiment performs a processing corresponding to steps S104 to S110 described with reference to
The receiving unit 341 receives information including a first audio signal based on the left mic 321L of the sending system 32, a second audio signal based on the right mic 321R of the sending system 32, and meta data from the sending system 32.
The decoding unit 343 decodes the first audio signal and the second audio signal from the information received by the receiving unit 341. Further, the decoding unit 343 retrieves the meta data from the information received by the receiving unit 341 and provides the meta data to the meta-data parser 345.
The meta-data parser 345 analyzes meta data received from the decoding unit 343, and switches the switch units 347A to 347D in accordance with the meta data. For example, in a case where meta data includes distance information associated with a distance between mics or information associated with a filter coefficient, the meta-data parser 345 may switch the switch units 347A to 347D to perform a processing for emphasizing directivity including the first suppressing processing and the second suppressing processing.
With the configuration, in a case where the processing for emphasizing directivity is possible, the processing is executed automatically, making it possible to obtain a superior sense of localization.
Further, in the case where meta data includes distance information associated with a distance between mics or information associated with a filter coefficient, the meta-data parser 345 provides the information to the first arithmetic processing unit 349L and the second arithmetic processing unit 349R.
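The switching behavior of the meta-data parser described above can be sketched roughly as follows. This is a minimal illustration only: the meta data is assumed to arrive as a dictionary, the key names `mic_distance_m` and `filter_coefficients` are hypothetical, and the callable `emphasize` merely stands in for the first and second arithmetic processing units.

```python
def plan_processing(meta):
    """Decide whether directivity emphasis should run for received meta data.

    meta is a hypothetical dict decoded from the received information; the
    keys 'mic_distance_m' and 'filter_coefficients' are illustrative names.
    Returns (enabled, params), where params holds only the entries that
    would be handed to the arithmetic processing units.
    """
    meta = meta or {}
    wanted = ("mic_distance_m", "filter_coefficients")
    params = {key: meta[key] for key in wanted if key in meta}
    return bool(params), params


def route(left, right, meta, emphasize):
    """Apply directivity emphasis only when the meta data allows it.

    emphasize is a stand-in callable taking (left, right, params); when the
    meta data carries neither a distance nor filter coefficients, the
    signals pass through unchanged as ordinary stereo.
    """
    enabled, params = plan_processing(meta)
    if enabled:
        return emphasize(left, right, params)
    return left, right
```

In this sketch, broadcast-related entries such as character information are simply left out of `params`, which mirrors the idea that only distance or filter coefficient information triggers the processing for emphasizing directivity.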
As illustrated in
Stereo audio signals (left output and right output) outputted from the D/A converting units 351L and 351R may be reproduced via an external speaker, a headphone, or the like.
(Incompatible Receiving Device)
The decoding unit 363 decodes a first audio signal and a second audio signal from information received by the receiving unit 361. Note that, in a case where information received by the receiving unit 361 includes meta data, the decoding unit 363 may discard the meta data.
With the configuration, a receiving device incompatible with the processing for emphasizing directivity does not perform that processing and instead performs general stereo reproduction. Therefore, the user does not perceive anything unnatural.
<3-3. Effect According to Third Embodiment>
The third embodiment has been described above. According to the third embodiment, even in a case where a sound obtained via mics is reproduced in real time, a device compatible with the processing for emphasizing directivity can obtain an output signal with a superior sense of localization.
«4. Fourth Embodiment»
In the above-mentioned first embodiment, second embodiment, and third embodiment, examples have been described in which mics and a signal processing device are integrated, or completely disconnected (the mics are included in a device other than the signal processing device). On the other hand, hereinafter, according to a fourth embodiment, an example will be described in which mics and a signal processing device can be connected/disconnected and a mic component can be replaced as an accessory of the signal processing device.
<4-1. Outline According to Fourth Embodiment>
The stereo microphone devices 42A to 42C respectively have different distances d1, d2, and d3 between mics. A user can connect any of the stereo microphone devices 42A to 42C to a connector unit 441 of the smartphone 44.
With the above-mentioned connection, the smartphone 44 can receive a stereo audio signal and meta data from the stereo microphone devices 42A to 42C. Note that meta data according to the present embodiment may include information similar to meta data described as some examples in the second embodiment.
With the configuration, even in a case where a mic component can be replaced as an accessory of the smartphone 44, processing for emphasizing directivity is possible. Note that the smartphone 44 may obtain meta data of the stereo microphone devices 42A to 42C, other contents (stereo audio signal), and meta data corresponding thereto from the external server 8 via the communication network 9.
<4-2. Configuration According to Fourth Embodiment>
An outline according to the present embodiment has been described above. Subsequently, respective configurations of the stereo microphone devices 42A to 42C and the smartphone 44 will be described according to the present embodiment with reference to
(Stereo Microphone Device)
Hereinafter, configurations of the stereo microphone devices 42A to 42C will be described. However, the stereo microphone devices 42A to 42C have no difference in configurations other than the different distances between mics. Thus, the stereo microphone device 42A will be described as an example, and a description of the stereo microphone devices 42B and 42C is omitted.
As illustrated in
Respective configurations of the left mic 421AL, the right mic 421AR, and the A/D converting units 423AL and 423AR are similar to those of the left mic 110L, the right mic 110R, and the A/D converting units 120L and 120R which are described with reference to
Note that the stereo microphone devices 42A to 42C according to the present embodiment perform a processing corresponding to step S102 described with reference to
The connector unit 427A is a communication interface that is connected to the connector unit 441 of the smartphone 44 and provides stereo audio signals received from the A/D converting units 423AL and 423AR and meta data received from the meta-data storing unit 425A to the smartphone 44. The connector unit 427A may be, for example, a 3.5 mm phone plug that can multiplex the stereo audio signal and the meta data and send the signal and data. In that case, the connector unit 441 of the smartphone 44 may be a 3.5 mm phone jack corresponding to the plug. Note that a connection for communication between the stereo microphone device 42A and the smartphone 44 may use another connection method, for example, a physical connecting method such as USB or a non-contact connecting method such as NFC or Bluetooth (registered trademark).
(Smartphone)
Respective configurations of the D/A converting units 457L and 457R are similar to those of the D/A converting units 180L and 180R described with reference to
Note that the smartphone 44 according to the present embodiment implements processing corresponding to steps S104 to S110 described with reference to
The connector unit 441 is connected to the stereo microphone devices 42A to 42C to obtain from the stereo microphone devices 42A to 42C meta data such as distance information associated with a distance between mics or filter coefficient information.
With the configuration, the smartphone 44 can receive stereo data and meta data from the stereo microphone devices 42A to 42C. Even in a case where a mic component can be replaced as an accessory of the smartphone 44, processing for emphasizing directivity is possible.
The data buffer 443 temporarily stores data obtained from the connector unit 441, and provides the data to the contents parser 445 and the meta-data parser 447. The contents parser 445 receives a stereo audio signal from the data buffer 443, and distributes the signal to a left input signal and a right input signal.
Note that contents parser 445 may obtain a stereo audio signal from the server 8 illustrated in
<4-3. Effect According to Fourth Embodiment>
The fourth embodiment has been described above. According to the present embodiment, the smartphone 44 can receive meta data required for processing for emphasizing directivity from the stereo microphone devices 42A to 42C. With the configuration, even if a mic and a signal processing device can be connected/disconnected and a mic component has a configuration that can be replaced as an accessory of a signal processing device, an output signal with a superior sense of localization can be obtained.
«5. Modified Example»
The first embodiment, the second embodiment, the third embodiment, and the fourth embodiment of the present disclosure have been described above. Hereinafter, modified examples of the respective embodiments will be described. Note that the modified examples, which will be described hereinafter, may be applied in place of the configurations described above in the respective embodiments, or may additionally be applied to the configurations described above in the respective embodiments.
In the above-mentioned embodiments, although an example has been described in which two mics are provided for one device, the present disclosure is not limited to the example. For example, a device according to the present disclosure may have three or more mics. Hereinafter, with reference to
A signal processing device 6 illustrated in
In that case, the signal processing device 6 may select two mics that are effective (aligned horizontally) depending on the direction of the device, select the distance between the two mics, and execute processing such as storing or sending thereof. For example, the signal processing device 6 may include a sensor that can sense information associated with a direction of the signal processing device 6, e.g., an acceleration sensor, a gyro sensor, or the like, and may determine the direction from information obtained by the sensor.
For example, in an example of using a vertical direction illustrated in
With the configuration, proper mics are selected depending on the direction in which the user uses the device, and the distance between the selected mics is used for the processing for emphasizing directivity.
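The selection of a horizontally aligned mic pair can be sketched as follows. This is only an illustrative assumption of how such a selection might be made: the mic layout, the coordinate convention, and the function name `select_mic_pair` are hypothetical, and the gravity direction is assumed to be available from an acceleration sensor in the same device coordinates as the mic positions.

```python
import itertools
import math


def select_mic_pair(mic_positions, gravity):
    """Pick the two mics best aligned with the horizontal, and their spacing.

    mic_positions: dict of name -> (x, y) in meters, in device coordinates
                   (hypothetical layout).
    gravity: (gx, gy) direction of gravity in the same coordinates.
    Returns ((name_a, name_b), distance_in_meters).
    """
    norm = math.hypot(gravity[0], gravity[1])
    if norm == 0:
        raise ValueError("gravity vector must be non-zero")
    ux, uy = gravity[0] / norm, gravity[1] / norm  # unit vector pointing down
    hx, hy = -uy, ux                               # unit vector along the horizontal

    def score(pair):
        (ax, ay), (bx, by) = mic_positions[pair[0]], mic_positions[pair[1]]
        dx, dy = bx - ax, by - ay
        # Round to micrometers so floating-point noise does not decide ties.
        vertical = round(abs(dx * ux + dy * uy), 6)
        horizontal = round(abs(dx * hx + dy * hy), 6)
        # Prefer a small vertical offset, then a large horizontal spacing.
        return (vertical, -horizontal)

    pairs = itertools.combinations(sorted(mic_positions), 2)
    a, b = min(pairs, key=score)
    return (a, b), math.dist(mic_positions[a], mic_positions[b])
```

The returned distance is the value that would then be used, or sent as meta data, for the processing for emphasizing directivity.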
Note that in a case of sending the above-mentioned selected distance between the mics, as meta data, from the signal processing device 6 to another device, the other device may perform a processing for emphasizing directivity or reproducing processing.
«6. Example of Hardware Configuration»
The above description has been given according to each embodiment and the modified example of the present disclosure. The above-mentioned signal processing such as signal delay processing, processing for correcting directivity, signal suppressing processing, and processing for correcting the frequency characteristics may be implemented by hardware such as a combination of arithmetic units, or may alternatively be implemented by cooperation between software and the hardware of a signal processing device described later. Hereinafter, with reference to
The CPU 1001 functions as an arithmetic processing unit and a control device, and controls overall operation of the signal processing device 1000 in accordance with various kinds of programs. Further, the CPU 1001 may be a microprocessor. The ROM 1002 stores a program and a parameter used by the CPU 1001. The RAM 1003 temporarily stores a program used in execution by the CPU 1001 and a parameter that changes as appropriate during the execution. These are mutually connected by a host bus including a CPU bus or the like. Mainly, cooperation of software with the CPU 1001, the ROM 1002, and the RAM 1003 implements functions of the first arithmetic processing units 140L, 249L, 349L, and 455L and the second arithmetic processing units 140R, 249R, 349R, and 455R.
The input device 1004 includes an input mechanism that allows a user to input information, such as a mouse, a keyboard, a touch panel, a button, a mic, a switch, and a lever, and an input control circuit that generates an input signal on the basis of an input by a user and outputs the signal to the CPU 1001. By operating the input device 1004, a user of the signal processing device 1000 can input various kinds of data to the signal processing device 1000 or instruct a processing operation.
The output device 1005 includes a display device such as a liquid crystal display (LCD) device, an OLED device, or a lamp, for example. Further, the output device 1005 includes an audio output device such as a speaker or a headphone. For example, a display device displays a captured image or a generated image. On the other hand, an audio output device converts audio data or the like into sound and outputs the sound. The output device 1005 corresponds to, for example, the speakers 190L and 190R described with reference to
The storage device 1006 is a device for data storage. The storage device 1006 may include a storage medium, a recording device that records data to a storage medium, a reading device that reads data from a storage medium, a deleting device that deletes data recorded to a storage medium, or the like. The storage device 1006 stores a program executed by the CPU 1001 and various kinds of data. The storage device 1006 corresponds to, for example, the storing unit 160 described with reference to
The communication device 1007 is a communication interface that includes, for example, a communication device for connection to the communication network 9 or the like. Further, the communication device 1007 may include a wireless local area network (LAN) compatible communication device, a long term evolution (LTE) compatible communication device, a wired communication device that performs a wired communication, or a Bluetooth (registered trademark) communication device. The communication device 1007 corresponds to, for example, the receiving unit 341 described with reference to
As above, an example of a hardware configuration that can implement functions of the signal processing device 1000 according to the present embodiment has been illustrated. The respective components may be implemented by generic parts or may be implemented by hardware specific to functions of the respective components. Therefore, it is possible to appropriately change hardware configurations to be used in accordance with a technical level at the time when the present embodiments are in use.
Note that a computer program for implementing the respective functions of the above-mentioned signal processing device 1000 according to the present embodiment can be created and be mounted in a PC or the like. Further, it is also possible to provide a computer-readable recording medium that stores such a computer program. The recording medium is, for example, a magnetic disc, an optical disc, a magneto-optical disc, a flash memory, or the like. Furthermore, the above-mentioned computer program may be delivered without using a recording medium, for example, via a network.
«7. Conclusion»
As mentioned above, according to the embodiments of the present disclosure, even if the input signal is an audio signal obtained on the basis of a non-directional mic, it is possible to emphasize directivity and obtain an output signal with a superior sense of localization. For example, according to the embodiments of the present disclosure, even in a case of recording by using a small-sized device such as an IC recorder, sound localization is obtained as if a binaural recording were performed.
In particular, in the case where a conference is recorded and thereafter reproduced to prepare minutes of the meeting, identifying the speaker is important. According to the present disclosure, the position of the sound image of the speaker can be perceived. Therefore, with a so-called cocktail-party effect, it is easy to identify who is speaking and to follow what is being said.
The preferred embodiment(s) of the present disclosure has/have been described above with reference to the accompanying drawings, whilst the present disclosure is not limited to the above examples. A person skilled in the art may find various alterations and modifications within the scope of the appended claims, and it should be understood that they will naturally come under the technical scope of the present disclosure.
For example, each step according to the above-mentioned embodiments does not always need to be processed in time series in the order described as the flowcharts. For example, each step in the processing according to the above-mentioned embodiments may be processed in order different from that described as the flowcharts, or be processed in parallel.
Further, the effects described in this specification are merely illustrative or exemplified effects, and are not limitative. That is, with or in the place of the above effects, the technology according to the present disclosure may achieve other effects that are clear to those skilled in the art from the description of this specification.
Additionally, the present technology may also be configured as below.
(1) A signal processing device including:
a first arithmetic processing unit that performs first suppressing processing for suppressing a first audio signal based on a first microphone on a basis of a second audio signal based on a second microphone; and
a second arithmetic processing unit that performs second suppressing processing for suppressing the second audio signal on a basis of the first audio signal.
(2) The signal processing device according to (1), in which
an output signal of the first arithmetic processing unit is an audio signal of one channel in a stereo audio signal, and an output signal of the second arithmetic processing unit is an audio signal of another channel in the stereo audio signal.
(3) The signal processing device according to (1) or (2), in which
the first arithmetic processing unit performs first delay processing for delaying the second audio signal, and performs the first suppressing processing by subtracting a signal based on the first delay processing from the first audio signal, and
the second arithmetic processing unit performs second delay processing for delaying the first audio signal, and performs the second suppressing processing by subtracting a signal based on the second delay processing from the second audio signal.
(4) The signal processing device according to (3), in which
the first delay processing and the second delay processing are performed on a basis of a distance between the first microphone and the second microphone.
(5) The signal processing device according to (4), in which
the first delay processing and the second delay processing are processing for delay by a number of samples corresponding to a time taken to transmit sound for the distance.
(6) The signal processing device according to (4) or (5), in which
the first delay processing and the second delay processing are performed on a basis of a filter coefficient specified on a basis of the distance.
(7) The signal processing device according to (6), further including:
a filter coefficient obtaining unit that obtains information associated with the filter coefficient.
(8) The signal processing device according to (6), further including:
a distance information obtaining unit that obtains distance information associated with the distance;
a storing unit that stores a plurality of filter coefficients corresponding to the distance information; and
a filter coefficient selecting unit that selects the filter coefficient corresponding to the distance information obtained by the distance information obtaining unit from the plurality of the filter coefficients stored in the storing unit.
(9) The signal processing device according to (6), further including:
a distance information obtaining unit that obtains distance information associated with the distance; and
a filter coefficient specifying unit that specifies the filter coefficient on a basis of the distance information.
(10) The signal processing device according to any one of (4) to (9), further including:
a receiving unit that receives information including at least the first audio signal and the second audio signal,
in which the first suppressing processing and the second suppressing processing are performed in a case where the receiving unit further receives distance information associated with the distance.
(11) The signal processing device according to any one of (6) and (7), further including:
a receiving unit that receives at least the first audio signal and the second audio signal,
in which the first suppressing processing and the second suppressing processing are performed in a case where the receiving unit receives information associated with the filter coefficient.
(12) The signal processing device according to any one of (4) to (11), in which
the distance is specified by a jig that connects the first microphone and the second microphone and fixes the distance.
(13) The signal processing device according to any one of (4) to (12), further including:
a connector unit that is connected to a stereo microphone device including the first microphone and the second microphone,
in which the connector unit obtains distance information associated with the distance from the stereo microphone device.
(14) The signal processing device according to (6) or (7), further including:
a connector unit that is connected to a stereo microphone device including the first microphone and the second microphone,
in which the connector unit obtains information associated with the filter coefficient from the stereo microphone device.
(15) The signal processing device according to any one of (3) to (14), in which
the first arithmetic processing unit performs the first suppressing processing by subtracting a signal obtained by multiplying a signal obtained through the first delay processing by a predetermined value, from the first audio signal, and
the second arithmetic processing unit performs the second suppressing processing by subtracting a signal obtained by multiplying a signal obtained through the second delay processing by a predetermined value, from the second audio signal.
(16) The signal processing device according to any one of (1) to (15), in which
the first arithmetic processing unit corrects a frequency characteristic of a signal obtained through the first suppressing processing, and
the second arithmetic processing unit corrects a frequency characteristic of a signal obtained through the second suppressing processing.
(17) The signal processing device according to any one of (1) to (16), further including:
a gain correcting unit that corrects a difference in gain between the first microphone and the second microphone.
(18) The signal processing device according to any one of (1) to (17), in which
the first microphone and the second microphone are non-directional microphones.
(19) A signal processing method to be executed by a signal processing device, the signal processing method including:
performing first suppressing processing for suppressing a first audio signal based on a first microphone on a basis of a second audio signal based on a second microphone; and
performing second suppressing processing for suppressing the second audio signal on a basis of the first audio signal.
(20) A program for causing a computer to implement:
a first arithmetic processing function of performing first suppressing processing for suppressing a first audio signal based on a first microphone on a basis of a second audio signal based on a second microphone; and
a second arithmetic processing function of performing second suppressing processing for suppressing the second audio signal on a basis of the first audio signal.
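The delay-and-subtract form of the first and second suppressing processing listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the present disclosure: the delay is rounded to a whole number of samples rather than realized with a filter coefficient, the speed of sound is taken as 343 m/s, correction of the frequency characteristic is omitted, and all function and parameter names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def delay_in_samples(distance_m, sample_rate_hz):
    """Number of samples sound needs to travel the mic spacing (rounded)."""
    return round(distance_m / SPEED_OF_SOUND_M_S * sample_rate_hz)


def delayed(signal, n):
    """Delay a sequence by n samples, padding the start with zeros."""
    return ([0.0] * n + list(signal))[:len(signal)]


def emphasize_directivity(first, second, distance_m, sample_rate_hz, gain=1.0):
    """Suppress each channel using the delayed, scaled other channel.

    first, second: equal-length sample sequences from the two mics.
    gain: the "predetermined value" multiplied onto the delayed signal.
    Returns (first_output, second_output) as a stereo pair.
    """
    n = delay_in_samples(distance_m, sample_rate_hz)
    delayed_second = delayed(second, n)
    delayed_first = delayed(first, n)
    first_out = [a - gain * b for a, b in zip(first, delayed_second)]
    second_out = [b - gain * a for a, b in zip(delayed_first, second)]
    return first_out, second_out
```

As a usage example under these assumptions, an impulse that reaches the second mic one sample before the first mic is cancelled in the first output and retained in the second output, which is the intended emphasis of directivity toward each mic's own side.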
Number | Date | Country | Kind |
---|---|---|---|
2015-192866 | Sep 2015 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2016/074332 | 8/22/2016 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2017/056781 | 4/6/2017 | WO | A |
Number | Date | Country | |
---|---|---|---|
20180262837 A1 | Sep 2018 | US |