This application is a U.S. National Stage Application filed under 35 U.S.C. § 371 claiming priority to International Patent Application No. PCT/JP2021/004641, filed on 8 Feb. 2021, which application claims priority to and the benefit of International Patent Application No. PCT/JP2020/010080, filed on 9 Mar. 2020; International Patent Application No. PCT/JP2020/010081, filed on 9 Mar. 2020; and International Patent Application No. PCT/JP2020/041216, filed on 4 Nov. 2020, the disclosures of which are hereby incorporated herein by reference in their entireties.
The present disclosure relates to a technique of obtaining a monaural sound signal from a plurality of channel sound signals for the purpose of monaural coding of a sound signal, coding of a sound signal by a combination of monaural coding and stereo coding, monaural signal processing of a sound signal, and signal processing of a stereo sound signal using a monaural sound signal.
PTL 1 discloses a technique of obtaining a monaural sound signal from a 2-channel sound signal and performing embedded coding/decoding of the 2-channel sound signal and the monaural sound signal. In the technique disclosed in PTL 1, a monaural signal is obtained by averaging an input left-channel sound signal and an input right-channel sound signal for each corresponding sample, a monaural code is obtained by coding (monaural coding) the monaural signal, a monaural local decoding signal is obtained by decoding (monaural decoding) the monaural code, and, for each of the left channel and the right channel, the difference (predictive residual signal) between the input sound signal and a predictive signal obtained from the monaural local decoding signal is coded. For each channel, the predictive signal is the monaural local decoding signal provided with a delay and an amplitude ratio. The technique disclosed in PTL 1 suppresses the degradation of the sound quality of the decoding sound signal of each channel by either selecting the predictive signal whose delay and amplitude ratio achieve a minimum error between the input sound signal and the predictive signal, or using the predictive signal whose delay and amplitude ratio maximize the cross-correlation between the input sound signal and the monaural local decoding signal, and by subtracting that predictive signal from the input sound signal so as to obtain the predictive residual signal to be subjected to coding/decoding.
PTL 1 WO2006/070751
In the technique disclosed in PTL 1, the coding efficiency for each channel can be increased by optimizing the delay and the amplitude ratio given to the monaural local decoding signal when obtaining the predictive signal. In the technique disclosed in PTL 1, however, the monaural local decoding signal is obtained by coding/decoding the monaural signal obtained by averaging the left-channel sound signal and the right-channel sound signal. That is, the technique disclosed in PTL 1 is disadvantageous in that no contrivance is made to obtain a monaural signal useful for signal processing such as coding processing from a sound signal of a plurality of channels.
An object of the present disclosure is to provide a technique for obtaining a monaural signal useful for signal processing such as coding processing from a sound signal of a plurality of channels.
A sound signal downmix method according to an aspect of the present disclosure is a method of obtaining a downmix signal that is a monaural sound signal from input sound signals of N channels, N being an integer of three or greater. The sound signal downmix method includes an inter-channel relationship information obtaining step and a downmix step.
In the inter-channel relationship information obtaining step, an inter-channel correlation value and preceding channel information are obtained for every pair of two channels included in the N channels, the inter-channel correlation value being a value indicating a degree of a correlation between the input sound signals of the two channels, and the preceding channel information being information indicating which of the input sound signals of the two channels is preceding.
In the downmix step, the downmix signal is obtained by weighting and adding the input sound signals of the N channels, the input sound signal of each channel being weighted based on the inter-channel correlation value and the preceding channel information such that the larger the correlation with an input sound signal of a preceding channel that precedes the channel, the smaller the weight, whereas the larger the correlation with an input sound signal of a succeeding channel that succeeds the channel, the larger the weight.
The inter-channel relationship information obtaining step includes a channel sorting step, an inter-adjacent-channel relationship information estimation step, and an inter-channel relationship information complement step.
In the channel sorting step, sorting is sequentially performed in an order from a first channel such that an adjacent channel is the channel with the most similar input sound signal among the remaining channels, and a first sorted input sound signal to an Nth sorted input sound signal, which are the signals of the N channels after the sorting, and first original channel information to Nth original channel information of the N channels for the sorted input sound signals are obtained, the first original channel information to the Nth original channel information being the channel numbers of the N channels for the input sound signals.
In the inter-adjacent-channel relationship information estimation step, an inter-channel correlation value and an inter-channel time difference are obtained for every pair of two channels with adjacent channel numbers after the sorting among the first to Nth sorted input sound signals.
The inter-channel relationship information complement step includes: obtaining an inter-channel correlation value of every pair of two channels with non-adjacent channel numbers after the sorting from the inter-channel correlation values of the pairs of two channels with adjacent channel numbers after the sorting; obtaining the inter-channel correlation value between the input sound signals of every pair of two channels included in the N channels by associating the inter-channel correlation value of every pair of channels after the sorting with a pair of channels for the input sound signals of the N channels by using the original channel information; obtaining an inter-channel time difference of every pair of two channels with non-adjacent channel numbers after the sorting from the inter-channel time differences of the pairs of two channels with adjacent channel numbers after the sorting; and obtaining the preceding channel information of every pair of two channels included in the N channels by associating the inter-channel time difference of every pair of channels after the sorting with a pair of channels for the input sound signals of the N channels by using the original channel information, and determining the preceding channel information based on whether the inter-channel time difference is positive, negative, or zero.
Here, the two channel numbers of every pair of two channels with adjacent channel numbers after the sorting are denoted as i and i+1, i being an integer from 1 to N−1, the inter-channel correlation value of that pair is denoted as γ′i(i+1), and the inter-channel time difference of that pair is denoted as τ′i(i+1). The two channel numbers of every pair of two channels with non-adjacent channel numbers after the sorting are denoted as n and m, n being an integer from 1 to N−2 and m being an integer from n+2 to N, the inter-channel correlation value of that pair is denoted as γ′nm, and the inter-channel time difference of that pair is denoted as τ′nm. The inter-channel correlation value γ′nm is a value that has a monotonically non-decreasing relationship with each of one or more of the inter-channel correlation values γ′i(i+1) of the pairs of two channels with adjacent channel numbers after the sorting, i being from n to m−1, the one or more values including the minimum value of those inter-channel correlation values γ′i(i+1). The inter-channel time difference τ′nm is a value obtained by adding up all of the inter-channel time differences τ′i(i+1) of the pairs of two channels with adjacent channel numbers after the sorting, i being from n to m−1.
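The complement step summarized above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `complement_relationships` and its data layout are hypothetical, the minimum is used as one possible value that is monotonically non-decreasing in the adjacent correlation values, and a positive time difference is assumed to mean that the channel with the smaller channel number after the sorting is preceding.

```python
from itertools import combinations


def complement_relationships(gamma_adj, tau_adj, orig):
    """Fill in correlation and preceding-channel info for every channel pair.

    gamma_adj[i], tau_adj[i]: values for sorted channels i+1 and i+2 (adjacent).
    orig[i]: original channel number of sorted channel i+1.
    Returns {(a, b): (gamma, preceding)} keyed by original channel numbers a < b,
    where preceding is the original number of the preceding channel or None.
    """
    n_ch = len(orig)
    result = {}
    for n, m in combinations(range(n_ch), 2):  # n < m, 0-based sorted indices
        # Assumed choice: the minimum over the adjacent pairs between n and m,
        # which is monotonically non-decreasing in each adjacent value.
        gamma = min(gamma_adj[n:m])
        # The time difference is the sum over the same adjacent pairs.
        tau = sum(tau_adj[n:m])
        # Assumed sign convention: positive tau means sorted channel n precedes.
        if tau > 0:
            preceding = orig[n]
        elif tau < 0:
            preceding = orig[m]
        else:
            preceding = None
        key = (min(orig[n], orig[m]), max(orig[n], orig[m]))
        result[key] = (gamma, preceding)
    return result
```

For an adjacent pair the slices contain a single element, so the estimated values pass through unchanged; only the non-adjacent pairs are actually derived.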
A sound signal coding method according to an aspect of the present disclosure includes the sound signal downmix method as a sound signal downmix step, a monaural coding step of obtaining a monaural code by coding the downmix signal obtained in the downmix step, and a stereo coding step of obtaining a stereo code by coding the input sound signals of the N channels.
According to the present disclosure, it is possible to obtain a monaural signal useful for signal processing such as coding processing from a sound signal of a plurality of channels.
A 2-channel sound signal that is the target of signal processing such as coding processing is often a digital sound signal obtained through an AD conversion of sounds picked up by a left-channel microphone and a right-channel microphone disposed in a certain space. In this case, a left-channel input sound signal, which is a digital sound signal obtained through an AD conversion of a sound picked up by the left-channel microphone disposed in the space, and a right-channel input sound signal, which is a digital sound signal obtained through an AD conversion of a sound picked up by the right-channel microphone disposed in the space, are input to an apparatus for performing signal processing such as coding processing. The left-channel input sound signal and right-channel input sound signal each include the sound output by each sound source in the space with a given difference (so-called arrival time difference) between the arrival time at the left-channel microphone from the sound source and the arrival time at the right-channel microphone from the sound source.
In the above-described technique disclosed in PTL 1, a predictive residual signal is obtained by subtracting, from an input sound signal, a predictive signal, which is a monaural local decoding signal provided with a delay and an amplitude ratio, and the predictive residual signal is subjected to coding/decoding. That is, for each channel, the higher the similarity between the input sound signal and the monaural local decoding signal, the higher the efficiency of the coding. Consider, however, the case where the left-channel input sound signal and the right-channel input sound signal include only a sound output by one sound source in a certain space, with a given arrival time difference, and the monaural local decoding signal is a signal obtained by coding/decoding a monaural signal obtained by averaging the left-channel sound signal and the right-channel sound signal. In this case, neither the similarity between the left-channel sound signal and the monaural local decoding signal nor the similarity between the right-channel sound signal and the monaural local decoding signal is particularly high, even though the left-channel sound signal, the right-channel sound signal, and the monaural local decoding signal each include only a sound output by the same single sound source. In this manner, when a monaural signal is obtained by merely averaging the left-channel sound signal and the right-channel sound signal, a monaural signal useful for signal processing such as coding processing cannot be obtained in some situations.
In view of this, a sound signal downmix apparatus of a first embodiment performs downmix processing that takes into account the relationship between the left-channel input sound signal and the right-channel input sound signal so that a monaural signal useful for signal processing such as coding processing can be obtained. The sound signal downmix apparatus of the first embodiment will be described below.
First, a sound signal downmix apparatus of a first example of the first embodiment will be described. As illustrated in
Left-Right Relationship Information Estimation Unit 183
A left-channel input sound signal input to the sound signal downmix apparatus 401 and a right-channel input sound signal input to the sound signal downmix apparatus 401 are input to the left-right relationship information estimation unit 183. The left-right relationship information estimation unit 183 obtains a left-right correlation value γ and preceding channel information from the left-channel input sound signal and the right-channel input sound signal and outputs the left-right correlation value γ and the preceding channel information (step S183).
The preceding channel information is information representing whether a sound output by a main sound source in a certain space has arrived first at the left-channel microphone disposed in the space or the right-channel microphone disposed in the space. That is, the preceding channel information is information indicating whether the same sound signal is included first in the left-channel input sound signal or the right-channel input sound signal. When the case where the same sound signal is included first in the left-channel input sound signal is referred to as “the left channel is preceding” or “the right channel is succeeding” and the case where the same sound signal is included first in the right-channel input sound signal is referred to as “the right channel is preceding” or “the left channel is succeeding”, the preceding channel information is information indicating which of the left channel and the right channel is preceding. The left-right correlation value γ is a correlation value that takes into account the time difference between the left-channel input sound signal and the right-channel input sound signal. That is, the left-right correlation value γ is a value indicating the degree of the correlation between the sample sequence of the input sound signal of the preceding channel and the sample sequence of the input sound signal of the succeeding channel shifted backward by τ samples relative to the sample sequence of the preceding channel. In the following description, τ is also referred to as a left-right time difference. The preceding channel information and the left-right correlation value γ are information indicating the relationship between the left-channel input sound signal and the right-channel input sound signal, and therefore can be referred to as left-right relationship information.
A case where, for example, the absolute value of a correlation coefficient is used as a value indicating the degree of the correlation will be described. For each candidate number of samples τcand from τmax to τmin set in advance (for example, τmax is a positive number and τmin is a negative number), the left-right relationship information estimation unit 183 obtains and outputs, as the left-right correlation value γ, a maximum value of an absolute value γcand of the correlation coefficient between the sample sequence of the left-channel input sound signal and the sample sequence of the right-channel input sound signal shifted backward relative to the sample sequence of the left-channel input sound signal by the candidate number of samples τcand, obtains and outputs information indicating that the left channel is preceding as the preceding channel information in the case where τcand when the absolute value of the correlation coefficient is a maximum value is a positive value, and obtains and outputs information indicating that the right channel is preceding as the preceding channel information in the case where τcand when the absolute value of the correlation coefficient is a maximum value is a negative value. In the case where τcand when the absolute value of the correlation coefficient is a maximum value is zero, the left-right relationship information estimation unit 183 may obtain and output information indicating that the left channel is preceding as the preceding channel information or obtain and output information indicating that the right channel is preceding as the preceding channel information, while it is preferable to obtain and output information indicating that no channel is preceding as the preceding channel information.
The candidate numbers of samples set in advance may be the integer values from τmax to τmin, may include fractional values between τmax and τmin, and need not include every integer value between τmax and τmin. In addition, τmax may or may not be equal to −τmin. When the target is an input sound signal whose preceding channel is unknown, it is preferable that τmax be a positive number and that τmin be a negative number. When the target is a special input sound signal in which one of the channels is necessarily preceding, both τmax and τmin may be positive numbers, or both may be negative numbers. To calculate the absolute value γcand of the correlation coefficient, one or more samples of a past input sound signal contiguous with the sample sequence of the input sound signal of the current frame may also be used. In this case, it suffices to store the sample sequences of the input sound signals of a predetermined number of past frames in a storage unit (not illustrated in the drawing) in the left-right relationship information estimation unit 183.
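The search over candidate numbers of samples described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the left-right relationship information estimation unit 183: the function names are hypothetical, only integer candidates within the current frame are considered, and the correlation coefficient is assumed to be computed over the samples where the two shifted sequences overlap.

```python
import math


def pearson_abs(a, b):
    """Absolute value of the correlation coefficient of two equal-length lists."""
    n = len(a)
    if n < 2:
        return 0.0
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return abs(cov) / math.sqrt(var_a * var_b)


def estimate_left_right(x_left, x_right, tau_min, tau_max):
    """Return (gamma, tau, preceding) for integer lags tau_min..tau_max."""
    T = len(x_left)
    best_gamma, best_tau = -1.0, 0
    for tau in range(tau_min, tau_max + 1):
        # Pair x_left[t] with x_right[t + tau] over the overlapping samples.
        start = max(0, -tau)
        stop = min(T, T - tau)
        gamma_cand = pearson_abs(x_left[start:stop],
                                 x_right[start + tau:stop + tau])
        if gamma_cand > best_gamma:
            best_gamma, best_tau = gamma_cand, tau
    # Positive lag: left precedes; negative lag: right precedes; zero: none.
    if best_tau > 0:
        preceding = "left"
    elif best_tau < 0:
        preceding = "right"
    else:
        preceding = "none"
    return best_gamma, best_tau, preceding
```

Returning "none" for a zero lag follows the option described above as preferable; an implementation could equally report either channel in that case.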
In addition, for example, instead of the absolute value of the correlation coefficient, a correlation value using information about a phase of a signal may be set as γcand as follows. In this example, the left-right relationship information estimation unit 183 first obtains frequency spectra XL(k) and XR(k) at each frequency k of 0 to T−1 by performing Fourier transform on each of the left-channel input sound signals xL(1), xL(2) . . . , xL(t) and the right-channel input sound signals xR(1), xR(2) . . . , xR(t) as in the following Equation (1-1) and Equation (1-2).
Next, the left-right relationship information estimation unit 183 obtains a phase difference spectrum φ(k) at each frequency k through the following Equation (1-3) by using the frequency spectra XL(k) and XR(k) at each frequency k obtained through Equation (1-1) and Equation (1-2).
Next, the left-right relationship information estimation unit 183 obtains a phase difference signal ψ(τcand) for each candidate number of samples τcand from τmax to τmin as in the following Equation (1-4) by performing inverse Fourier transform on the phase difference spectrum obtained through Equation (1-3).
The absolute value of the phase difference signal ψ(τcand) obtained through Equation (1-4) represents a kind of correlation corresponding to the plausibility of τcand as the time difference between the left-channel input sound signals xL(1), xL(2) . . . , xL(t) and the right-channel input sound signals xR(1), xR(2) . . . , xR(t), and therefore the left-right relationship information estimation unit 183 uses, as a correlation value γcand, the absolute value of the phase difference signal ψ(τcand) for each candidate number of samples τcand. Specifically, the left-right relationship information estimation unit 183 obtains and outputs a maximum value of the correlation value γcand that is the absolute value of the phase difference signal ψ(τcand) as the left-right correlation value γ, obtains and outputs information indicating that the left channel is preceding as the preceding channel information in the case where τcand when the correlation value is a maximum value is a positive value, and obtains and outputs information indicating that the right channel is preceding as the preceding channel information in the case where τcand when the correlation value is a maximum value is a negative value. In the case where τcand when the correlation value is a maximum value is zero, the left-right relationship information estimation unit 183 may obtain and output information indicating that the left channel is preceding as the preceding channel information, or may obtain and output information indicating that the right channel is preceding as the preceding channel information, while it is preferable to obtain and output information indicating that no channel is preceding as the preceding channel information.
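The phase-based procedure described above can be sketched in Python as follows. The sketch rests on several assumptions that the text here does not fix: the phase difference spectrum is taken to be the cross-spectrum of the two channels normalized to unit magnitude, the inverse transform is scaled by 1/T, the sign convention is chosen so that a positive τcand corresponds to the left channel preceding, and only integer candidates are evaluated. The function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Plain O(T^2) discrete Fourier transform of a real sequence."""
    T = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / T) for t in range(T))
            for k in range(T)]


def phase_correlation(x_left, x_right, tau_min, tau_max):
    """Return (gamma, tau, preceding) from the phase difference signal."""
    T = len(x_left)
    X_left, X_right = dft(x_left), dft(x_right)
    # Assumed form of the phase difference spectrum: unit-magnitude
    # cross-spectrum, so only the phase difference per frequency remains.
    phi = []
    for k in range(T):
        cross = X_right[k] * X_left[k].conjugate()
        phi.append(cross / abs(cross) if abs(cross) > 0.0 else 0.0)
    best_gamma, best_tau = -1.0, 0
    for tau in range(tau_min, tau_max + 1):
        # Inverse transform evaluated only at the candidate lag (1/T scaling
        # is an assumption; it makes a pure delay give a peak of 1).
        psi = sum(phi[k] * cmath.exp(2j * math.pi * k * tau / T)
                  for k in range(T)) / T
        if abs(psi) > best_gamma:
            best_gamma, best_tau = abs(psi), tau
    if best_tau > 0:
        preceding = "left"
    elif best_tau < 0:
        preceding = "right"
    else:
        preceding = "none"
    return best_gamma, best_tau, preceding
```

Because the transform is circular, the sketch behaves exactly only for circular delays; a practical implementation would typically use an FFT and handle frame edges by windowing or zero padding.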
Note that instead of using the absolute value of the phase difference signal ψ(τcand) as it is as the correlation value γcand, the left-right relationship information estimation unit 183 may use a normalized value, for example, the relative difference, for each τcand, between the absolute value of the phase difference signal ψ(τcand) and the average of the absolute values of the phase difference signals obtained for a plurality of candidate numbers of samples before and after τcand. That is, the left-right relationship information estimation unit 183 may obtain, for each τcand, an average value ψc(τcand) through the following Equation (1-5) using a positive number τrange set in advance, and may use, as γcand, the normalized correlation value obtained through the following Equation (1-6) using the obtained average value ψc(τcand) and the phase difference signal ψ(τcand).
Note that the normalized correlation value obtained through Equation (1-6) is a value from 0 to 1, with a property in which the higher the plausibility of τcand as the left-right time difference, the closer it is to 1, whereas the lower the plausibility of τcand as the left-right time difference, the closer it is to 0.
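One way such a normalization could look is sketched below in Python. The exact forms of Equation (1-5) and Equation (1-6) are not reproduced in this text, so the sketch is an assumption: the average is taken over the candidates within τrange of τcand, including τcand itself and truncated at the ends of the candidate range, and the normalized value is taken as 1 minus the ratio of that average to the absolute value of the phase difference signal, clipped at 0. The function name and the dictionary layout are hypothetical.

```python
def normalized_correlation(psi_abs, tau_range):
    """Map |psi(tau)| per candidate lag to a normalized value in [0, 1].

    psi_abs: dict {tau_cand: |psi(tau_cand)|} for consecutive integer lags.
    tau_range: positive integer half-width of the averaging window.
    """
    gamma = {}
    for tau, value in psi_abs.items():
        # Average |psi| over the candidates within tau_range of tau
        # (assumption: the window includes tau itself and is cut at the edges).
        window = [psi_abs[t] for t in range(tau - tau_range, tau + tau_range + 1)
                  if t in psi_abs]
        average = sum(window) / len(window)
        # Assumed relative difference, clipped so the result stays in [0, 1].
        gamma[tau] = max(0.0, 1.0 - average / value) if value > 0.0 else 0.0
    return gamma
```

With this form, an isolated peak in the absolute value of the phase difference signal yields a value near 1, while a candidate that does not stand out from its neighbours yields a value near 0, which matches the property stated above.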
Downmix Unit 112
The left-channel input sound signal input to the sound signal downmix apparatus 401, the right-channel input sound signal input to the sound signal downmix apparatus 401, the left-right correlation value γ output by the left-right relationship information estimation unit 183, and the preceding channel information output by the left-right relationship information estimation unit 183 are input to the downmix unit 112. The downmix unit 112 obtains a downmix signal by weighting and averaging the left-channel input sound signal and the right-channel input sound signal such that the larger the left-right correlation value γ, the more heavily the input sound signal of the preceding channel, of the left-channel input sound signal and the right-channel input sound signal, is included in the downmix signal, and outputs the downmix signal (step S112).
For example, in the case where the absolute value of the correlation coefficient or the normalized value is used as the correlation value as in the above-described examples of the left-right relationship information estimation unit 183, the left-right correlation value γ input from the left-right relationship information estimation unit 183 is a value from 0 to 1. Therefore, the downmix unit 112 may obtain, for each corresponding sample number t, a downmix signal xM(t) by weighting and adding the left-channel input sound signal xL(t) and the right-channel input sound signal xR(t) by using weights determined by the left-right correlation value γ. To be more specific, the downmix unit 112 may obtain the downmix signal xM(t) as xM(t)=((1+γ)/2)×xL(t)+((1−γ)/2)×xR(t) in the case where the preceding channel information is information indicating that the left channel is preceding, that is, in the case where the left channel is preceding, and the downmix unit 112 may obtain the downmix signal xM(t) as xM(t)=((1−γ)/2)×xL(t)+((1+γ)/2)×xR(t) in the case where the preceding channel information is information indicating that the right channel is preceding, that is, in the case where the right channel is preceding. When the downmix unit 112 obtains the downmix signal in the above-described manner, the smaller the left-right correlation value γ, that is, the smaller the correlation between the left-channel input sound signal and the right-channel input sound signal, the closer the downmix signal is to a signal obtained by averaging the left-channel input sound signal and the right-channel input sound signal, whereas the larger the left-right correlation value γ, that is, the larger the correlation between the left-channel input sound signal and the right-channel input sound signal, the closer the downmix signal is to the input sound signal of the preceding channel of the left-channel input sound signal and the right-channel input sound signal.
Note that in the case where no channel is preceding, it is preferable that the downmix unit 112 obtain and output the downmix signal by averaging the left-channel input sound signal and the right-channel input sound signal such that the left-channel input sound signal and the right-channel input sound signal are included in the downmix signal with the same weight. That is, in the case where the preceding channel information indicates that no channel is preceding, the downmix unit 112 preferably obtains, for each sample number t, the downmix signal xM(t) as xM(t)=(xL(t)+xR(t))/2 obtained by averaging the left-channel input sound signal xL(t) and the right-channel input sound signal xR(t).
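The weighting described for the downmix unit 112 can be written as the following short Python sketch. The function name `downmix` and the string values used for the preceding channel information are hypothetical; the weights are the ones given above, with γ assumed to lie between 0 and 1.

```python
def downmix(x_left, x_right, gamma, preceding):
    """Weighted sum of two channels; preceding is "left", "right", or "none"."""
    if preceding == "left":
        # The preceding left channel gets the larger weight (1 + gamma) / 2.
        w_left, w_right = (1 + gamma) / 2, (1 - gamma) / 2
    elif preceding == "right":
        # The preceding right channel gets the larger weight (1 + gamma) / 2.
        w_left, w_right = (1 - gamma) / 2, (1 + gamma) / 2
    else:
        # No channel is preceding: plain average of the two channels.
        w_left = w_right = 0.5
    return [w_left * l + w_right * r for l, r in zip(x_left, x_right)]
```

With γ = 0 the result is the plain average regardless of the preceding channel information, and with γ = 1 the result is the input sound signal of the preceding channel alone.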
For example, in the case where an apparatus different from the sound signal downmix apparatus performs stereo coding processing of the left-channel input sound signal and the right-channel input sound signal, or in the case where the left-channel input sound signal and the right-channel input sound signal are signals obtained through stereo decoding processing in an apparatus different from the sound signal downmix apparatus, either one or both of the preceding channel information and the left-right correlation value γ identical to those obtained by the left-right relationship information estimation unit 183 may already have been obtained in the different apparatus. In the case where either one or both of the left-right correlation value γ and the preceding channel information has been obtained in the different apparatus, whichever has been obtained in the different apparatus is input to the sound signal downmix apparatus, and the left-right relationship information estimation unit 183 obtains whichever of the left-right correlation value γ and the preceding channel information has not been input to the sound signal downmix apparatus. A second example, which is an example of the sound signal downmix apparatus on the assumption that either one or both of the left-right correlation value γ and the preceding channel information is input from the outside, will be described below, focusing on the differences from the first example.
As illustrated in
Left-Right Relationship Information Obtaining Unit 185
The left-right relationship information obtaining unit 185 obtains and outputs the left-right correlation value γ, which is a value indicating the degree of the correlation of the left-channel input sound signal and the right-channel input sound signal, and the preceding channel information, which is information indicating which of the left-channel input sound signal and the right-channel input sound signal is preceding (step S185).
As indicated by the dashed line in
As indicated by the broken line in
As indicated by the broken line in
Even in the case where the number of channels is three or more, a monaural signal useful for signal processing such as coding processing can be obtained by setting the same relationship between the downmix signal and the input sound signal of each channel as that of the sound signal downmix apparatuses 401 and 405 of the first embodiment. This configuration will be described as a second embodiment.
The way of including the input sound signal of a certain channel in a downmix signal in the sound signal downmix apparatuses 401 and 405 of the first embodiment will be described below with the channel number of each of the left channel and the right channel set as n. The sound signal downmix apparatuses 401 and 405 of the first embodiment operate such that, for each nth channel, the larger the correlation of the input sound signal of a channel succeeding the nth channel and the input sound signal of the nth channel, the larger the weight of the input sound signal of the nth channel included in the downmix signal, whereas the larger the correlation of the input sound signal of a channel preceding the nth channel and the input sound signal of the nth channel, the smaller the weight of the input sound signal of the nth channel included in the downmix signal. The sound signal downmix apparatus of the second embodiment expands the above-described relationship between the input sound signal and the downmix signal, so as to support the case with a plurality of preceding channels, the case with a plurality of succeeding channels, and the case with both a preceding channel and a succeeding channel. The sound signal downmix apparatus of the second embodiment will be described below. Note that the sound signal downmix apparatus of the second embodiment is an apparatus that expands the sound signal downmix apparatus of the first embodiment so as to support the case where the number of channels is three or more, and operates in the same manner as that of the sound signal downmix apparatus of the first embodiment when the number of channels is two.
In the first embodiment, an example has been described in which the smaller the correlation of the input sound signals between channels, the more similar the downmix signal obtained by the sound signal downmix apparatuses 401 and 405 is to a signal obtained by averaging all input sound signals. The above-described relationship between the input sound signal and the downmix signal can be achieved even when the number of channels is three or more, and therefore it is described as an example of the sound signal downmix apparatus of the second embodiment.
First, a sound signal downmix apparatus of a first example of the second embodiment will be described. As illustrated in
Inter-Channel Relationship Information Estimation Unit 186
The input sound signals of the N channels input to the sound signal downmix apparatus 406 are input to the inter-channel relationship information estimation unit 186. The inter-channel relationship information estimation unit 186 obtains an inter-channel correlation value and the preceding channel information from the input sound signals of the N channels input thereto and outputs the inter-channel correlation value and the preceding channel information (step S186). The inter-channel correlation value and the preceding channel information are information indicating the relationship between channels for the input sound signals of the N channels, and can be referred to as inter-channel relationship information.
The inter-channel correlation value is a value indicating the degree of the correlation for each pair of two channels included in the N channels in consideration of the time difference between input sound signals. (N×(N−1))/2 pairs of two channels are included in the N channels. In the case where n is an integer from 1 to N, m is an integer greater than n and equal to or smaller than N, and the inter-channel correlation value between the nth channel input sound signal and mth channel input sound signal is γnm, the inter-channel relationship information estimation unit 186 obtains the inter-channel correlation value γnm of each of (N×(N−1))/2 pairs of n and m.
The preceding channel information is information, for each pair of two channels included in the N channels, indicating which of the input sound signals of the two channels includes the same sound signal first and thus indicating which of the two channels is preceding. In the case where the preceding channel information between the nth channel input sound signal and mth channel input sound signal is referred to as INFOnm, the inter-channel relationship information estimation unit 186 obtains the preceding channel information INFOnm of each of the above-described (N×(N−1))/2 pairs of n and m. Note that in the following description, for each pair of n and m, the case where the same sound signal is included in the nth channel input sound signal earlier than the mth channel input sound signal may be referred to as “the nth channel is preceding the mth channel”, “the nth channel precedes the mth channel”, “the mth channel is succeeding the nth channel”, “the mth channel succeeds the nth channel”, and the like. Likewise, in the following description, for each pair of n and m, the case where the same sound signal is included in the mth channel input sound signal earlier than the nth channel input sound signal may be referred to as “the mth channel is preceding the nth channel”, “the mth channel precedes the nth channel”, “the nth channel is succeeding the mth channel”, “the nth channel succeeds the mth channel”, and the like.
It suffices that the inter-channel relationship information estimation unit 186 obtains the inter-channel correlation value γnm and the preceding channel information INFOnm as with the left-right relationship information estimation unit 183 of the first embodiment for each of the (N×(N−1))/2 pairs of the nth channel and the mth channel. Specifically, the inter-channel relationship information estimation unit 186 can obtain the inter-channel correlation value γnm and the preceding channel information INFOnm of each pair of the nth channel and the mth channel by performing the same operation as that of each example of the left-right relationship information estimation unit 183 of the first embodiment for each of the (N×(N−1))/2 pairs of the nth channel and the mth channel. Here, in each example of the description of the left-right relationship information estimation unit 183 of the first embodiment, the left channel is read as the nth channel, the right channel is read as the mth channel, L is read as n, R is read as m, the preceding channel information is read as the preceding channel information INFOnm, and the left-right correlation value γ is read as the inter-channel correlation value γnm, for example.
For example, the absolute value of a correlation coefficient is used as a value indicating the degree of the correlation. In such a case, for each of the (N×(N−1))/2 pairs of the nth channel and the mth channel, the inter-channel relationship information estimation unit 186 obtains, for each candidate number of samples τcand from τmax to τmin set in advance, the absolute value γcand of the correlation coefficient between the sample sequence of the nth channel input sound signal and the sample sequence of the mth channel input sound signal shifted backward relative to the sample sequence of the nth channel input sound signal by the candidate number of samples τcand, and obtains and outputs the maximum value of γcand as the inter-channel correlation value γnm. The inter-channel relationship information estimation unit 186 also obtains and outputs, as the preceding channel information INFOnm, information indicating that the nth channel is preceding in the case where the τcand at which the absolute value of the correlation coefficient is the maximum value is a positive value, and information indicating that the mth channel is preceding in the case where that τcand is a negative value. In the case where the τcand at which the absolute value of the correlation coefficient is the maximum value is zero, the inter-channel relationship information estimation unit 186 may obtain and output information indicating that the nth channel is preceding as the preceding channel information INFOnm, or may obtain and output information indicating that the mth channel is preceding as the preceding channel information INFOnm, for each pair of the nth channel and the mth channel. Note that τmax and τmin are the same as those of the first embodiment.
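The correlation-coefficient example above can be illustrated for a single pair of channels as follows. This is a minimal sketch and not a definitive implementation: the function names `corr_coef` and `estimate_pair`, the use of the ordinary (mean-removed) correlation coefficient over the overlapping samples, and the choice of the nth channel when the best candidate is zero are all assumptions made for illustration.

```python
import math

def corr_coef(a, b):
    """Correlation coefficient of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0.0 else 0.0

def estimate_pair(x_n, x_m, tau_min, tau_max):
    """Return (gamma_nm, best_tau, preceding) for one pair of channels.

    A positive candidate tau compares x_n(t) with x_m(t + tau), so a positive
    best_tau means that the same sound appears earlier in channel n.
    """
    T = len(x_n)
    best_gamma, best_tau = -1.0, 0
    for tau in range(tau_max, tau_min - 1, -1):
        # Keep only the samples where both shifted sequences overlap.
        if tau >= 0:
            a, b = x_n[:T - tau], x_m[tau:]
        else:
            a, b = x_n[-tau:], x_m[:T + tau]
        if len(a) < 2:
            continue
        gamma_cand = abs(corr_coef(a, b))
        if gamma_cand > best_gamma:
            best_gamma, best_tau = gamma_cand, tau
    # For best_tau == 0 either answer is allowed; "n" is an arbitrary choice here.
    preceding = "n" if best_tau >= 0 else "m"
    return best_gamma, best_tau, preceding
```

Applying `estimate_pair` to each of the (N×(N−1))/2 pairs of n and m would give the inter-channel correlation value and the preceding channel information of every pair.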
In addition, for example, instead of the absolute value of the correlation coefficient, a correlation value using information about a phase of a signal may be set as γcand as follows. In this example, first, the inter-channel relationship information estimation unit 186 obtains the frequency spectrum Xi(k) at each frequency k of 0 to T−1 by performing Fourier transform on input sound signals xi(1), xi(2) . . . , xi(T) as in the following Equation (2-1) for each channel i from the first channel input sound signal to the Nth channel input sound signal.
Next, the inter-channel relationship information estimation unit 186 performs subsequent processing for each of the (N×(N−1))/2 pairs of the nth channel and the mth channel. First, the inter-channel relationship information estimation unit 186 obtains the phase difference spectrum φ(k) at each frequency k through the following Equation (2-2) by using the nth channel frequency spectrum Xn(k) and the mth channel frequency spectrum Xm(k) at each frequency k obtained through Equation (2-1).
Next, the inter-channel relationship information estimation unit 186 obtains the phase difference signal ψ(τcand) for each candidate number of samples τcand from τmax to τmin as in Equation (1-4) by performing inverse Fourier transform on the phase difference spectrum obtained through Equation (2-2). Next, the inter-channel relationship information estimation unit 186 obtains and outputs the maximum value of the correlation value γcand that is the absolute value of the phase difference signal ψ(τcand) as the inter-channel correlation value γnm, obtains and outputs information indicating that the nth channel is preceding as the preceding channel information INFOnm in the case where τcand when the correlation value is a maximum value is a positive value, and obtains and outputs information indicating that the mth channel is preceding as the preceding channel information INFOnm in the case where τcand when the correlation value is a maximum value is a negative value. In the case where τcand when the correlation value is a maximum value is zero, the inter-channel relationship information estimation unit 186 may obtain and output information indicating that the nth channel is preceding as the preceding channel information INFOnm, or information indicating that the mth channel is preceding as the preceding channel information INFOnm.
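The phase-based example above can be sketched as follows for one pair of channels. Equations (2-1), (2-2), and (1-4) are not reproduced in this section, so the sketch substitutes a generic phase-transform cross-correlation: the names `dft` and `phase_correlation`, the 1/T scaling, and the sign convention (chosen so that a positive τcand means that the nth channel is preceding) are assumptions and may differ from the equations of the disclosure.

```python
import cmath
import math

def dft(x):
    """Plain O(T^2) discrete Fourier transform; adequate for a short illustration."""
    T = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / T) for t in range(T))
            for k in range(T)]

def phase_correlation(x_n, x_m, tau_min, tau_max):
    """Return (gamma_nm, best_tau, preceding) using only phase information."""
    T = len(x_n)
    X_n, X_m = dft(x_n), dft(x_m)
    # Phase difference spectrum: unit-magnitude cross spectrum at each frequency k.
    phi = []
    for a, b in zip(X_n, X_m):
        cross = b * a.conjugate()
        mag = abs(cross)
        phi.append(cross / mag if mag > 1e-12 else 0j)
    best_gamma, best_tau = -1.0, 0
    for tau in range(tau_max, tau_min - 1, -1):
        # Inverse transform of the phase difference spectrum, evaluated at lag tau.
        psi = sum(phi[k] * cmath.exp(2j * math.pi * k * tau / T)
                  for k in range(T)) / T
        gamma_cand = abs(psi)
        if gamma_cand > best_gamma:
            best_gamma, best_tau = gamma_cand, tau
    # For best_tau == 0 either answer is allowed; "n" is an arbitrary choice here.
    preceding = "n" if best_tau >= 0 else "m"
    return best_gamma, best_tau, preceding
```

With this scaling the correlation value lies between 0 and 1, and it approaches 1 when one channel is a circularly delayed copy of the other.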
Note that instead of using as it is the absolute value of the phase difference signal ψ(τcand) as the correlation value γcand, the inter-channel relationship information estimation unit 186, as with the left-right relationship information estimation unit 183, may use a normalized value such as a relative difference between the average of the absolute values of phase difference signals obtained for a plurality of candidate numbers of samples before and after τcand and the absolute value of the phase difference signal ψ(τcand) for each τcand, for example. That is, the inter-channel relationship information estimation unit 186 may obtain an average value through Equation (1-5) by using the positive number τrange set in advance for each τcand, and use, as γcand, a normalized correlation value obtained through Equation (1-6) by using the obtained average value ψc(τcand) and phase difference signal ψ(τcand).
Downmix Unit 116
The input sound signals of the N channels input to the sound signal downmix apparatus 406, the inter-channel correlation value γnm of each of the (N×(N−1))/2 pairs of n and m (that is, the inter-channel correlation value of each pair of two channels included in the N channels) output by the inter-channel relationship information estimation unit 186, and the preceding channel information INFOnm of each of the (N×(N−1))/2 pairs of n and m (that is, the preceding channel information of each pair of two channels included in the N channels) output by the inter-channel relationship information estimation unit 186 are input to the downmix unit 116. The downmix unit 116 weights the input sound signal of each channel such that the larger the correlation with the input sound signal of each channel that precedes the channel, the smaller the weight, whereas the larger the correlation with the input sound signal of each channel that succeeds the channel, the larger the weight, and thus obtains and outputs a downmix signal by weighting and adding the input sound signals of the N channels (step S116).
Specific Example 1 of Downmix Unit 116
A specific example 1 of the downmix unit 116 will be described below with the channel number of each channel (channel index) as i, input sound signals of the ith channel as xi(1), xi(2) . . . , xi(T), and the downmix signals as xM(1), xM(2) . . . , xM(T). Assume that in the specific example 1, the inter-channel correlation value is a value from 0 to 1 as with the absolute value of the correlation coefficient and the normalized value in the above-described example of the inter-channel relationship information estimation unit 186. In addition, here, M is not a channel number, but is a subscript indicating that a downmix signal is a monaural signal. The downmix unit 116 obtains a downmix signal by performing the processing of step S116-1 to step S116-3 described below, for example. First, for each ith channel, the downmix unit 116 obtains the set ILi of the channel numbers of the channels preceding the ith channel and the set IFi of the channel numbers of the channels succeeding the ith channel from the preceding channel information of the (N−1) pairs of two channels including the ith channel of the preceding channel information INFOnm input to the downmix unit 116 (step S116-1). Next, for each ith channel, the downmix unit 116 obtains a weight wi of the ith channel through the following Equation (2-3) using the inter-channel correlation value of the (N−1) pairs of two channels including the ith channel of the inter-channel correlation value γnm input to the downmix unit 116, the set ILi of the channel numbers of the channels preceding the ith channel, and the set IFi of the channel numbers of the channels succeeding the ith channel (step S116-2).
Note that for each pair of n and m described above, the inter-channel correlation value γmn is the same value as the inter-channel correlation value γnm, and therefore both an inter-channel correlation value γij of the case where i is greater than j and an inter-channel correlation value γik of the case where i is greater than k are included in the inter-channel correlation value γnm input to the downmix unit 116.
Next, the downmix unit 116 obtains the downmix signals xM(1), xM(2) . . . , xM(T) by obtaining a downmix signal sample xM(t) through the following Equation (2-4) for each sample number t (sample index t) by using the input sound signals xi(1), xi(2) . . . , xi(T) of each ith channel whose i is from 1 to N, and the weight wi of each ith channel whose i is from 1 to N (step S116-3).
Note that the downmix unit 116 may obtain the downmix signal by using an equation in which the weight wi of Equation (2-4) is replaced with the right side of Equation (2-3) instead of sequentially performing step S116-2 and step S116-3. Specifically, it suffices that the downmix unit 116 obtains each sample xM(t) of the downmix signal through Equation (2-4) with the set of the channel numbers of the channels preceding each ith channel as ILi, the set of the channel numbers of the channels succeeding each ith channel as IFi, the inter-channel correlation value of a pair of each ith channel and each channel j preceding the ith channel as γij, the inter-channel correlation value of a pair of each ith channel and each channel k succeeding the ith channel as γik, and the weight of each ith channel as wi expressed by Equation (2-3).
Equation (2-4) is an equation for obtaining a downmix signal by weighting and adding the input sound signals of the N channels, and Equation (2-3) is for obtaining the weight wi of each ith channel given to the input sound signal of each ith channel in the weighted addition. The part of the following Equation (2-3-A) in Equation (2-3) sets the weight such that the larger the correlation between the input sound signal of the ith channel and the input sound signal of each channel preceding the ith channel, the smaller the value of the weight wi, and that the weight wi is set to a value close to zero when there is at least one channel with a significantly large correlation between the input sound signal of the ith channel and the input sound signal of the preceding channel in the channels preceding the ith channel.
The part of the following Equation (2-3-B) in Equation (2-3) sets the weight such that the larger the correlation with the input sound signal of each channel succeeding the ith channel, the more the weight wi has a value greater than 1.
When the input sound signals of all channels are independent, i.e., when there is no correlation among the channels, it is desirable to set the simple additive average of the input sound signals of all the channels as the downmix signal. In view of this, in Equation (2-3), the weight wi is obtained by multiplying Equation (2-3-A), Equation (2-3-B) and 1/N such that the maximum value of the part of Equation (2-3-A) is 1 and that the minimum value of the part of Equation (2-3-B) is 1. Thus, when all the correlations among channels have small values, the weight wi of all channels is set to a value close to 1/N.
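The weighting described for the specific example 1 can be sketched as follows. Equations (2-3) and (2-4) are not reproduced in this section, so the sketch assumes the form that the description implies: the weight wi is 1/N multiplied by the product of (1 − γij) over the preceding channels j and by the product of (1 + γik) over the succeeding channels k, and the downmix sample is the weighted sum of the input samples. The function names and the dictionary layout keyed by (n, m) with n smaller than m are hypothetical.

```python
def downmix_weights(N, gamma, n_precedes):
    """Weights w_1..w_N under the assumed form of Equation (2-3).

    gamma[(n, m)]      : inter-channel correlation value, 1 <= n < m <= N, in [0, 1]
    n_precedes[(n, m)] : True if channel n precedes channel m, False otherwise
    """
    weights = {}
    for i in range(1, N + 1):
        w = 1.0 / N
        for j in range(1, N + 1):
            if j == i:
                continue
            pair = (min(i, j), max(i, j))
            # True when channel i precedes channel j.
            i_precedes_j = n_precedes[pair] == (i < j)
            if i_precedes_j:
                w *= 1.0 + gamma[pair]   # j succeeds i: part (2-3-B), at least 1
            else:
                w *= 1.0 - gamma[pair]   # j precedes i: part (2-3-A), at most 1
        weights[i] = w
    return weights

def downmix(signals, weights):
    """Weighted sum of the N input signals, sample by sample.

    signals[i] is the sample list of the ith channel (keys 1..N).
    """
    T = len(next(iter(signals.values())))
    return [sum(weights[i] * signals[i][t] for i in signals) for t in range(T)]
```

For example, with N = 3, γ12 = 1, γ13 = γ23 = 0, and each lower-numbered channel preceding, the sketch gives weights 2/3, 0, and 1/3: the second channel, which duplicates the earlier first channel, is excluded from the downmix signal. When all correlation values are zero, every weight is 1/N.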
Specific Example 2 of Downmix Unit 116
Since the sum over all channels of the weights wi obtained by the downmix unit 116 at step S116-2 of the specific example 1 is not 1 in some situations, the downmix unit 116 may obtain the downmix signal by using, instead of the weight wi of Equation (2-4), a value obtained by normalizing the weight wi of each ith channel such that the sum of the weights of all channels is 1, or by using a transformed equation of Equation (2-4) that includes normalization of the weight wi such that the sum of the weights of all channels is 1. Differences of this example, referred to as a specific example 2 of the downmix unit 116, from the specific example 1 will be described below.
For example, the downmix unit 116 may obtain the downmix signals xM(1), xM(2) . . . , xM(T) by obtaining the weight wi for each ith channel through Equation (2-3), obtaining a normalized weight w′i by normalizing the weight wi for each ith channel such that the sum of all channels is 1 (that is, obtaining the normalized weight w′i through the following Equation (2-5) for each ith channel), and obtaining the downmix signal sample xM(t) through the following Equation (2-6) for each sample number t by using the input sound signals xi(1), xi(2) . . . , xi(T) of each ith channel whose i is from 1 to N and the normalized weight w′i.
That is, it suffices that the downmix unit 116 obtains each sample xM(t) of the downmix signal through Equation (2-6) with the set of the channel numbers of the channels preceding each ith channel as ILi, the set of the channel numbers of the channels succeeding each ith channel as IFi, the inter-channel correlation value of a pair of each ith channel and each channel j preceding the ith channel as γij, the inter-channel correlation value of a pair of each ith channel and each channel k succeeding the ith channel as γik, the weight of each ith channel as wi expressed by Equation (2-3), and the weight normalized for each ith channel as w′i expressed by Equation (2-5).
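The normalization of the specific example 2 can be sketched as follows. Equations (2-5) and (2-6) are not reproduced in this section, so the sketch assumes that the normalized weight w′i is wi divided by the sum of the weights of all channels and that the downmix sample is the sum of w′i × xi(t). The function names are hypothetical, and the fallback to equal weights when all weights are zero is an added assumption that the description does not discuss.

```python
def normalize_weights(weights):
    """Scale the weights so that they sum to 1 (assumed form of Equation (2-5))."""
    total = sum(weights)
    if total <= 0.0:
        # Degenerate case not covered by the description: use plain averaging.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

def downmix_normalized(signals, weights):
    """Weighted sum with normalized weights (assumed form of Equation (2-6)).

    signals is a list of N equal-length sample lists, in channel order.
    """
    norm = normalize_weights(weights)
    T = len(signals[0])
    return [sum(norm[i] * signals[i][t] for i in range(len(signals)))
            for t in range(T)]
```

As an illustration under the assumed form of Equation (2-3), three channels in which each lower-numbered channel precedes and every correlation value is 0.5 give weights 0.75, 0.25, and 1/12, whose sum is 13/12 rather than 1; normalization rescales them to 9/13, 3/13, and 1/13.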
For example, in the case where an apparatus different from the sound signal downmix apparatus performs the stereo coding processing on the input sound signals of the N channels, or the case where the input sound signals of the N channels are signals obtained through stereo decoding processing at an apparatus different from the sound signal downmix apparatus, any or all of the same inter-channel correlation value γnm and preceding channel information INFOnm as those obtained by the inter-channel relationship information estimation unit 186 may possibly be obtained by the apparatus different from the sound signal downmix apparatus. It suffices that in the case where any or all of the inter-channel correlation value γnm and the preceding channel information INFOnm are obtained by the different apparatus, any or all of the inter-channel correlation value γnm and the preceding channel information INFOnm obtained by the different apparatus are input to the sound signal downmix apparatus, and the inter-channel relationship information estimation unit 186 obtains the inter-channel correlation value γnm and/or the preceding channel information INFOnm that has not been input to the sound signal downmix apparatus. Hereinafter, differences of a second example from the first example will be mainly described. The second example is an example of a sound signal downmix apparatus on the assumption that any or all of the inter-channel correlation value γnm and the preceding channel information INFOnm are input from the outside.
As illustrated in
Inter-Channel Relationship Information Obtaining Unit 187
The inter-channel relationship information obtaining unit 187 obtains and outputs the inter-channel correlation value γnm, which is a value indicating the degree of the correlation of each pair of two channels included in the N channels, and the preceding channel information INFOnm, which is information indicating which of the input sound signals of two channels includes the same sound signal first, for each pair of two channels included in the N channels (step S187).
As indicated by the dashed line in
As indicated by the broken line in
As indicated by the broken line in
Note that there may be a case where a part of the inter-channel correlation value γnm is obtained by the different apparatus while the remaining part of the inter-channel correlation value γnm is not obtained by the different apparatus, a case where a part of the preceding channel information INFOnm is obtained by the different apparatus while the remaining part of the preceding channel information INFOnm is not obtained by the different apparatus, and the like. In such cases, it suffices to include the inter-channel relationship information estimation unit 186 in the inter-channel relationship information obtaining unit 187 such that, as described above, the inter-channel relationship information obtaining unit 187 outputs one obtained by the different apparatus and input to the sound signal downmix apparatus 407 to the downmix unit 116, and that the inter-channel relationship information estimation unit 186 obtains, from the input sound signals of the N channels, one that is not obtained by the different apparatus and not input to the sound signal downmix apparatus 407, and outputs it to the downmix unit 116, as with the inter-channel relationship information estimation unit 186 of the first example.
The inter-channel relationship information estimation unit 186 of the second embodiment obtains the inter-channel correlation value γnm and the preceding channel information INFOnm for each pair of two channels included in the N channels. There are (N×(N−1))/2 pairs of two channels included in the N channels, and as such, in the case where the inter-channel correlation value γnm and the preceding channel information INFOnm are obtained by the method exemplified in the description of the inter-channel relationship information estimation unit 186 of the second embodiment, the amount of arithmetic processing can become an issue when the number of channels is large. The third embodiment describes a sound signal downmix apparatus performing inter-channel relationship information estimation processing of obtaining the inter-channel correlation value γnm and the preceding channel information INFOnm in an approximate manner by a method with a smaller amount of arithmetic processing than the inter-channel relationship information estimation unit 186. The downmix processing of the third embodiment is the same as that of the second embodiment.
The downmix processing performed by the downmix unit 116 of the second embodiment is processing in which, for example, when only the same sound output by a certain sound source with a given time difference is included in each of signals of a plurality of channels, one of the input sound signals of the plurality of channels including the same sound output at the earliest timing is included in the downmix signal. This processing will be described with an example in which input sound signals of six channels from a first channel (1ch) to a sixth channel (6ch) are those schematically illustrated in
γ13=γ12×γ23=1×0=0
γ14=γ12×γ23×γ34=1×0×1=0
γ15=γ12×γ23×γ34×γ45=1×0×1×1=0
γ16=γ12×γ23×γ34×γ45×γ56=1×0×1×1×1=0
γ24=γ23×γ34=0×1=0
γ25=γ23×γ34×γ45=0×1×1=0
γ26=γ23×γ34×γ45×γ56=0×1×1×1=0
γ35=γ34×γ45=1×1=1
γ36=γ34×γ45×γ56=1×1×1=1
γ46=γ45×γ56=1×1=1
Likewise, no problem arises even when the time differences of non-adjacent channels are obtained by the following equations in an approximate manner using time differences τ12, τ23, τ34, τ45, and τ56 of the adjacent channels, and the preceding channel information INFOnm is obtained in an approximate manner based on whether each obtained time difference between channels is positive, negative, or zero.
τ13=τ12+τ23
τ14=τ12+τ23+τ34
τ15=τ12+τ23+τ34+τ45
τ16=τ12+τ23+τ34+τ45+τ56
τ24=τ23+τ34
τ25=τ23+τ34+τ45
τ26=τ23+τ34+τ45+τ56
τ35=τ34+τ45
τ36=τ34+τ45+τ56
τ46=τ45+τ56
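The two families of equations above follow one pattern: the correlation value of a non-adjacent pair is the product of the correlation values of the adjacent pairs between the two channels, and its time difference is the sum of the corresponding adjacent time differences. A minimal sketch of that pattern is given below. The function names are hypothetical, and the rule that a positive time difference means that the lower-numbered channel is preceding (with zero resolved in favor of the lower-numbered channel) is an assumption carried over from the sign convention of the candidate number of samples.

```python
def approximate_from_adjacent(gamma_adj, tau_adj):
    """Approximate all pairwise values from adjacent-pair values.

    gamma_adj[i] and tau_adj[i] belong to the adjacent pair of channels
    (i + 1, i + 2), so for six channels the lists hold the values for
    pairs 12, 23, 34, 45, 56.
    Returns two dicts keyed by (n, m) with 1 <= n < m <= N.
    """
    N = len(gamma_adj) + 1
    gamma, tau = {}, {}
    for n in range(1, N):
        g, t = 1.0, 0
        for m in range(n + 1, N + 1):
            g *= gamma_adj[m - 2]   # multiply in the pair (m - 1, m)
            t += tau_adj[m - 2]     # add the time difference of pair (m - 1, m)
            gamma[(n, m)] = g
            tau[(n, m)] = t
    return gamma, tau

def preceding_from_tau(tau):
    """Preceding channel per pair: 'n' if tau >= 0, otherwise 'm' (assumed rule)."""
    return {pair: ("n" if t >= 0 else "m") for pair, t in tau.items()}
```

With the adjacent correlation values 1, 0, 1, 1, 1 used in the equations above, the sketch reproduces γ13 = 0, γ26 = 0, γ35 = 1, γ36 = 1, and γ46 = 1.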
It should be noted that the inter-channel correlation value γnm and the preceding channel information INFOnm can be obtained using the above-mentioned equations in an approximate manner only in the case where the input sound signals with the same or similar waveforms are located at successive channels as exemplified in
A sound signal downmix apparatus of a first example of the third embodiment is described below. As illustrated in
Inter-Channel Relationship Information Estimation Unit 188
The input sound signals of the N channels input to the sound signal downmix apparatus 408 are input to the inter-channel relationship information estimation unit 188. While the number of channels N is an integer of two or greater in the second embodiment, the number of channels N is an integer of three or greater in the third embodiment because no channel with a significantly different waveform of the input sound signal can be present between the channels with the same or similar waveforms of the input sound signal when the number of channels N is two. As illustrated in
Channel Sorting Unit 1881
The channel sorting unit 1881 sequentially performs sorting in the order from the first channel such that the adjacent channel is the channel with the highest similarity of the waveform of the input sound signal among the remaining channels when the time differences are aligned, and obtains and outputs a first sorted input sound signal to an Nth sorted input sound signal, which are signals after the sorting of the N channels, and first original channel information c1 to Nth original channel information cN, which are the channel numbers (that is, the channel numbers of the input sound signals) when each input sound signal to be sorted has been input to the sound signal downmix apparatus 408, for example (step S1881A). As the similarity in waveform after the aligning of the time differences, it suffices that the channel sorting unit 1881 uses a value indicating the degree of the correlation, such as a value indicating the closeness of the distance between the input sound signals of two channels after the aligning of the time differences, or a value obtained by dividing the inner product of the input sound signals of the two channels after the aligning of the time differences by the geometric mean of the energy of the input sound signals of the two channels.
For example, when a value indicating the closeness of the distance between the input sound signals of two channels after the aligning of the time differences is used as the similarity in waveform after the aligning of the time differences, the channel sorting unit 1881 performs the following step S1881A-1 to step S1881A-N. First, the channel sorting unit 1881 obtains the first channel input sound signal as the first sorted input sound signal, and obtains “1” that is the channel number of the first channel as the first original channel information c1 (step S1881A-1).
Next, for each candidate number of samples τcand from τmax to τmin set in advance (for example, τmax is a positive number and τmin is a negative number) for each channel m from the second channel to the Nth channel, the channel sorting unit 1881 obtains the distance between the sample sequence of the first sorted input sound signal and the sample sequence of the mth channel input sound signal shifted backward relative to the sample sequence of the first sorted input sound signal by the candidate number of samples τcand, obtains the input sound signal of the channel m with the minimum distance value as a second sorted input sound signal, and obtains the channel number of the channel m with the minimum distance value as second original channel information c2 (step S1881A-2).
Next, for each candidate number of samples τcand from τmax to τmin for each channel m that has not been set as a sorted input sound signal among channels from the second channel to the Nth channel, the channel sorting unit 1881 obtains the distance between the sample sequence of the second sorted input sound signal and the sample sequence of the mth channel input sound signal shifted backward relative to the sample sequence of the second sorted input sound signal by the candidate number of samples τcand, obtains the input sound signal of the channel m with the minimum distance value as a third sorted input sound signal, and obtains the channel number of the channel m with the minimum distance value as third original channel information c3 (step S1881A-3). Thereafter, the same processing is repeated until there is only one channel left that has not been set as a sorted input sound signal, so that a fourth sorted input sound signal to an (N−1)th sorted input sound signal, and fourth original channel information c4 to (N−1)th original channel information c(N-1) are obtained (step S1881A-4 to step S1881A-(N−1)).
Finally, the channel sorting unit 1881 obtains the input sound signal of the remaining one channel that has not been set as a sorted input sound signal as the Nth sorted input sound signal, and obtains the channel number of the remaining one channel that has not been set as a sorted input sound signal as the Nth original channel information cN (step S1881A-N). Note that in the following description, the nth sorted input sound signal for each n from 1 to N is referred to also as the input sound signal of the nth channel after the sorting, and the n of the nth sorted input sound signal is referred to also as the channel number after the sorting.
Note that the channel sorting unit 1881 may perform the sorting by evaluating the similarity without aligning the time differences, considering that the purpose is to sort the input sound signals of the N channels such that there is no channel with a significantly different waveform of the input sound signal between the channels with the same or similar waveforms of the input sound signals, and that it is preferable that the amount of arithmetic processing for the sorting processing be small. For example, the channel sorting unit 1881 may perform the following step S1881B-1 to step S1881B-N. First, the channel sorting unit 1881 obtains the first channel input sound signal as the first sorted input sound signal, and obtains “1” that is the channel number of the first channel as the first original channel information c1 (step S1881B-1).
Next, the channel sorting unit 1881 obtains the distance between the sample sequence of the first sorted input sound signal and the sample sequence of the mth channel input sound signal for each channel m from the second channel to the Nth channel, obtains the input sound signal of the channel m with a minimum distance value as the second sorted input sound signal, and obtains the channel number of the channel m with a minimum distance value as the second original channel information c2 (step S1881B-2).
Next, for each channel m that has not been set as a sorted input sound signal among channels from the second channel to the Nth channel, the channel sorting unit 1881 obtains the distance between the sample sequence of the second sorted input sound signal and the sample sequence of the mth channel input sound signal, obtains the input sound signal of the channel m with the minimum distance value as the third sorted input sound signal, and obtains the channel number of the channel m with the minimum distance value as the third original channel information c3 (step S1881B-3). Thereafter, the same processing is repeated until there is only one channel left that has not been set as a sorted input sound signal, so that the fourth sorted input sound signal to the (N−1)th sorted input sound signal, and the fourth original channel information c4 to the (N−1)th original channel information c(N-1) are obtained (step S1881B-4 to step S1881B-(N−1)).
Finally, the channel sorting unit 1881 obtains the input sound signal of the remaining one channel that has not been set as a sorted input sound signal as the Nth sorted input sound signal, and obtains the channel number of the remaining one channel that has not been set as a sorted input sound signal as the Nth original channel information cN (step S1881B-N).
In short, it suffices that regardless of whether or not the time differences are aligned or regardless of the value to be used as the similarity of the signals, the channel sorting unit 1881 sequentially performs the sorting in the order from the first channel such that the adjacent channel is the channel with the most similar input sound signal among the remaining channels, and obtains and outputs the first sorted input sound signal to the Nth sorted input sound signal as the signals after the sorting of the N channels, and the first original channel information c1 to the Nth original channel information cN as the channel numbers (that is, the channel numbers of the input sound signals) when each sorted input sound signal is input to the sound signal downmix apparatus 408 (step S1881).
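The sorting without time alignment (step S1881B-1 to step S1881B-N) amounts to a greedy nearest-neighbor chain starting from the first channel. A minimal sketch is given below; the function names are hypothetical, and the squared Euclidean distance is only one possible choice of distance, used here as an assumption.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sample sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sort_channels(signals):
    """Greedy sorting of N channels, starting from the first channel.

    signals: list of N equal-length sample lists; signals[0] is the first channel.
    Returns (sorted_signals, original_channel_numbers), where
    original_channel_numbers[n - 1] corresponds to c_n (1-based channel numbers).
    """
    order = [0]                               # the first channel stays first
    remaining = list(range(1, len(signals)))
    while remaining:
        last = signals[order[-1]]
        # Pick the not-yet-sorted channel closest to the most recently sorted one.
        nearest = min(remaining, key=lambda m: squared_distance(last, signals[m]))
        order.append(nearest)
        remaining.remove(nearest)
    return [signals[i] for i in order], [i + 1 for i in order]
```

For four channels whose first and third signals are nearly identical and whose second and fourth signals are nearly identical, for instance [1, 0, 0, 0], [0, 0, 5, 5], [1, 0.1, 0, 0], and [0, 0, 5, 4], the sketch yields the original channel information 1, 3, 4, 2, so that similar waveforms end up at successive channels after the sorting. The variant with time alignment would replace the distance by the minimum distance over the candidate numbers of samples.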
Inter-Adjacent-Channel Relationship Information Estimation Unit 1882
The N sorted input sound signals from the first sorted input sound signal to the Nth sorted input sound signal are input to the inter-adjacent-channel relationship information estimation unit 1882. The inter-adjacent-channel relationship information estimation unit 1882 obtains and outputs the inter-channel correlation value and the inter-channel time difference of each pair of two channels after the sorting with adjacent channel numbers after the sorting in the N sorted input sound signals (step S1882).
The inter-channel correlation value obtained at step S1882 is a correlation value that takes into account the time difference between the sorted input sound signals for each pair of two channels after the sorting with adjacent channel numbers after the sorting, that is, a value indicating the degree of the correlation that takes into account the time difference between the sorted input sound signals. There are (N−1) pairs of two channels with adjacent channel numbers after the sorting among the N channels after the sorting. In the case where n is an integer from 1 to N−1, and the inter-channel correlation value between the nth sorted input sound signal and the (n+1)th sorted input sound signal is γ′n(n+1), the inter-adjacent-channel relationship information estimation unit 1882 obtains the inter-channel correlation value γ′n(n+1) for each of the (N−1) pairs of two channels after the sorting with adjacent channel numbers after the sorting.
The inter-channel time difference obtained at step S1882 is information indicating, for each pair of two channels after the sorting with adjacent channel numbers after the sorting, which of the two sorted input sound signals includes the same sound signal earlier and how much earlier it is included. In the case where the inter-channel time difference between the nth sorted input sound signal and the (n+1)th sorted input sound signal is τ′n(n+1), the inter-adjacent-channel relationship information estimation unit 1882 obtains the inter-channel time difference τ′n(n+1) for each of the (N−1) pairs of two channels after the sorting with adjacent channel numbers after the sorting.
For example, the absolute value of a correlation coefficient is used as a value indicating the degree of the correlation. In such a case, for each candidate number of samples τcand from τmax to τmin set in advance for each n from 1 to N−1 (that is, for each pair of two channels after the sorting with adjacent channel numbers after the sorting), the inter-adjacent-channel relationship information estimation unit 1882 obtains and outputs, as the inter-channel correlation value γ′n(n+1), the maximum value of the absolute value γcand of the correlation coefficient between the sample sequence of the nth sorted input sound signal and the sample sequence of the (n+1)th sorted input sound signal shifted backward relative to the sample sequence of the nth sorted input sound signal by the candidate number of samples τcand, and obtains and outputs, as the inter-channel time difference τ′n(n+1), τcand when the absolute value of the correlation coefficient is a maximum value.
In addition, for example, instead of the absolute value of the correlation coefficient, a correlation value using information about a phase of a signal may be set as γcand as follows. In this example, first, for each channel i from the first channel input sound signal to the Nth channel input sound signal, the inter-adjacent-channel relationship information estimation unit 1882 obtains the frequency spectrum Xi(k) at each frequency k of 0 to T−1 by performing Fourier transform on the input sound signals xi(1), xi(2) . . . , xi(T) as in Equation (2-1).
Next, the inter-adjacent-channel relationship information estimation unit 1882 performs the following processing for each n from 1 to N−1, that is, each pair of two channels after the sorting with adjacent channel numbers after the sorting. First, the inter-adjacent-channel relationship information estimation unit 1882 obtains the phase difference spectrum φ(k) at each frequency k through the following Equation (3-1) by using the frequency spectrum Xn(k) of the nth channel and the frequency spectrum X(n+1)(k) of the (n+1)th channel at each frequency k obtained through Equation (2-1).
Next, the inter-adjacent-channel relationship information estimation unit 1882 obtains the phase difference signal ψ(τcand) for each candidate number of samples τcand from τmax to τmin as in Equation (1-4) by performing inverse Fourier transform on the phase difference spectrum obtained through Equation (3-1). Next, the inter-adjacent-channel relationship information estimation unit 1882 obtains and outputs, as the inter-channel correlation value γ′n(n+1), the maximum value of the correlation value γcand that is the absolute value of the phase difference signal ψ(τcand), and obtains and outputs, as the inter-channel time difference τ′n(n+1), τcand when the correlation value is a maximum value.
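The phase-based variant can be sketched as below. Because Equations (2-1), (3-1) and (1-4) are not reproduced in this text, the sketch rests on assumptions: the Fourier transform is a DFT scaled by 1/√T, the phase difference spectrum is the cross-spectrum of the two channels divided by its magnitude, and the sign of the inverse-transform exponent is chosen so that a positive shift means the nth channel precedes. The names `dft` and `phase_correlation` are hypothetical, and the naive O(T²) transform is used only to keep the example self-contained.

```python
import cmath
import math

def dft(x):
    """Naive DFT with 1/sqrt(T) scaling (assumed form of the Fourier transform)."""
    T = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / T) for t in range(T)) / math.sqrt(T)
        for k in range(T)
    ]

def phase_correlation(x_n, x_n1, tau_min, tau_max, eps=1e-12):
    """Return (gamma, tau) using only the phase of the cross-spectrum."""
    T = len(x_n)
    X_n, X_n1 = dft(x_n), dft(x_n1)
    # Phase difference spectrum: unit-magnitude cross-spectrum (assumption).
    phi = []
    for a, b in zip(X_n, X_n1):
        magnitude = abs(a) * abs(b)
        phi.append(a * b.conjugate() / magnitude if magnitude > eps else 0j)
    best_gamma, best_tau = -1.0, 0
    for tau_cand in range(tau_max, tau_min - 1, -1):
        # Phase difference signal at this candidate shift (assumed sign convention).
        psi = sum(
            phi[k] * cmath.exp(-2j * math.pi * k * tau_cand / T) for k in range(T)
        ) / math.sqrt(T)
        gamma_cand = abs(psi)
        if gamma_cand > best_gamma:
            best_gamma, best_tau = gamma_cand, tau_cand
    return best_gamma, best_tau
```

Under these assumptions the peak of the phase difference signal can reach √T, so the resulting correlation value is not bounded by 1.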
Note that instead of using as it is the absolute value of the phase difference signal ψ(τcand) as the correlation value γcand, the inter-adjacent-channel relationship information estimation unit 1882, as with the left-right relationship information estimation unit 183 and the inter-channel relationship information estimation unit 186, may use a normalized value such as a relative difference between the absolute value of the phase difference signal ψ(τcand) for each τcand and the average of the absolute values of phase difference signals obtained for a plurality of candidate numbers of samples before and after τcand, for example. That is, the inter-adjacent-channel relationship information estimation unit 1882 may obtain an average value through Equation (1-5) by using the positive number τrange set in advance for each τcand, and use, as γcand, a normalized correlation value obtained through Equation (1-6) by using the obtained average value ψc(τcand) and phase difference signal ψ(τcand).
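The normalization described above can be sketched as follows. Equations (1-5) and (1-6) are not reproduced in this text, so the exact forms here are assumptions: the average is taken over the candidates within ±τrange of τcand (including τcand itself, and ignoring shifts outside the candidate set), and the normalized value is the relative difference 1 − ψc(τcand)/|ψ(τcand)|. The function name `normalized_correlation` and the dictionary input are hypothetical.

```python
def normalized_correlation(psi_abs, tau_cand, tau_range):
    """Relative difference between |psi(tau_cand)| and its local average.

    psi_abs maps each candidate shift to |psi(shift)|.
    """
    window = [
        psi_abs[t]
        for t in range(tau_cand - tau_range, tau_cand + tau_range + 1)
        if t in psi_abs  # shifts outside the candidate set are skipped (assumption)
    ]
    psi_c = sum(window) / len(window)  # assumed form of the average value
    if psi_abs[tau_cand] == 0:
        return 0.0
    return 1.0 - psi_c / psi_abs[tau_cand]  # assumed form of the normalized value
```

A sharp isolated peak yields a value close to 1, while a flat phase difference signal yields a value close to 0.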
Inter-Channel Relationship Information Complement Unit 1883
The inter-channel correlation value and the inter-channel time difference of each pair of two channels after the sorting with adjacent channel numbers after the sorting output by the inter-adjacent-channel relationship information estimation unit 1882, and the original channel information for each channel after the sorting output by the channel sorting unit 1881 are input to the inter-channel relationship information complement unit 1883. The inter-channel relationship information complement unit 1883 obtains and outputs the inter-channel correlation value and the preceding channel information for all pairs of two channels (that is, all pairs of two channels being the sorting targets) by performing the processing of step S1883-1 to step S1883-5 described below (step S1883).
First, from the inter-channel correlation value of each pair of two channels after the sorting with adjacent channel numbers after the sorting, the inter-channel relationship information complement unit 1883 obtains the inter-channel correlation value of each pair of two channels after the sorting with non-adjacent channel numbers after the sorting (step S1883-1). In the case where n is an integer from 1 to N−2, m is an integer from n+2 to N, and the inter-channel correlation value between the nth sorted input sound signal and the mth sorted input sound signal is γ′nm, the inter-channel relationship information complement unit 1883 obtains the inter-channel correlation value γ′nm of each pair of two channels after the sorting with non-adjacent channel numbers after the sorting.
In the case where the two channel numbers of each pair of two channels after the sorting with adjacent channel numbers after the sorting are i (i is an integer from 1 to N−1) and i+1, and the inter-channel correlation value of each pair of two channels after the sorting with adjacent channel numbers after the sorting is γ′i(i+1), the inter-channel relationship information complement unit 1883 obtains, as the inter-channel correlation value γ′nm, a value obtained by multiplying all inter-channel correlation values γ′i(i+1) of pairs of two channels with adjacent channel numbers after the sorting whose i is from n to m−1, for each pair of n and m (that is, for each pair of two channels after the sorting with non-adjacent channel numbers after the sorting), for example. That is, the inter-channel relationship information complement unit 1883 obtains the inter-channel correlation value γ′nm through the following Equation (3-2).
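The equation itself is not reproduced in this text. From the description above (the product of all adjacent-pair correlation values whose i runs from n to m−1), Equation (3-2) can be reconstructed as follows; this is a reconstruction from the prose, not the original typesetting.

```latex
\gamma'_{nm} = \prod_{i=n}^{m-1} \gamma'_{i(i+1)} \qquad (3\text{-}2)
```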
Note that the inter-channel relationship information complement unit 1883 may obtain, as the inter-channel correlation value γ′nm, the geometric mean of all the inter-channel correlation values γ′i(i+1) of pairs of two channels with adjacent channel numbers after the sorting whose i is from n to m−1, for each pair of n and m (that is, for each pair of two channels after the sorting with non-adjacent channel numbers after the sorting). That is, the inter-channel relationship information complement unit 1883 may obtain the inter-channel correlation value γ′nm through the following Equation (3-3).
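The equation itself is not reproduced in this text. Since there are m−n adjacent pairs with i from n to m−1, the geometric mean described above can be reconstructed as follows; this is a reconstruction from the prose, not the original typesetting.

```latex
\gamma'_{nm} = \left( \prod_{i=n}^{m-1} \gamma'_{i(i+1)} \right)^{\frac{1}{m-n}} \qquad (3\text{-}3)
```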
It should be noted that in the case where the inter-channel correlation value is a value whose upper limit is not 1, unlike the absolute value of the correlation coefficient and the normalized value, it is preferable that the inter-channel relationship information complement unit 1883 obtain the geometric mean expressed by Equation (3-3) as the inter-channel correlation value γ′nm, rather than the multiplication value expressed by Equation (3-2), such that the inter-channel correlation value of each pair of two channels after the sorting with non-adjacent channel numbers after the sorting does not exceed the normal upper limit of the inter-channel correlation value.
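The difference between the multiplication value and the geometric mean can be illustrated with a short sketch. The function names are hypothetical, and `gamma_adj[i - 1]` is assumed to hold γ′i(i+1) for the 1-based channel number i.

```python
import math

def complement_product(gamma_adj, n, m):
    """Product of gamma'_{i(i+1)} for i = n .. m-1 (1-based sorted channel numbers)."""
    return math.prod(gamma_adj[i - 1] for i in range(n, m))

def complement_geometric_mean(gamma_adj, n, m):
    """Geometric mean of the same m - n adjacent-pair correlation values."""
    return complement_product(gamma_adj, n, m) ** (1.0 / (m - n))
```

For example, with adjacent values 2.0 and 3.0 on a scale whose upper limit is 3.0, the product is 6.0 and leaves the valid range, whereas the geometric mean (about 2.45) lies between the smallest and largest input and therefore stays within it.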
Note that for example, for each pair of n and m (that is, for each pair of two channels after the sorting with non-adjacent channel numbers after the sorting), if there is a pair whose correlation is extremely small due to different sound signals included in two input sound signals of the pair in the pairs of two channels with adjacent channel numbers after the sorting whose i is from n to m−1, the inter-channel correlation value γ′nm may be set to a value that depends on the inter-channel correlation value γ′i(i+1) of that pair. For example, for each pair of n and m (that is, for each pair of two channels after the sorting with non-adjacent channel numbers after the sorting), the inter-channel relationship information complement unit 1883 may obtain, as the inter-channel correlation value γ′nm, the minimum value of the inter-channel correlation values γ′i(i+1) of pairs of two channels with adjacent channel numbers after the sorting whose i is from n to m−1. In addition, for example, for each pair of n and m (that is, for each pair of two channels after the sorting with non-adjacent channel numbers after the sorting), the inter-channel relationship information complement unit 1883 may obtain, as the inter-channel correlation value γ′nm, a multiplication value or a geometric mean of a plurality of the inter-channel correlation values γ′i(i+1) including the minimum value in the inter-channel correlation values γ′i(i+1) of pairs of two channels with adjacent channel numbers after the sorting whose i is from n to m−1. 
It should be noted that in the case where the inter-channel correlation value is a value whose upper limit is not 1, unlike the absolute value of the correlation coefficient and the normalized value, it is preferable that the inter-channel relationship information complement unit 1883 obtain the geometric mean rather than the multiplication value as the inter-channel correlation value γ′nm such that the inter-channel correlation value of each pair of two channels after the sorting with non-adjacent channel numbers after the sorting does not exceed the normal upper limit of the inter-channel correlation value.
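The minimum-based alternatives can be sketched as follows. The function names are hypothetical. Using the `count` smallest values as the subset that "includes the minimum value" is one possible choice assumed here; the text does not prescribe which values accompany the minimum.

```python
import math

def complement_minimum(gamma_adj, n, m):
    """Minimum of gamma'_{i(i+1)} for i = n .. m-1 (1-based sorted channel numbers)."""
    return min(gamma_adj[i - 1] for i in range(n, m))

def complement_geomean_of_smallest(gamma_adj, n, m, count):
    """Geometric mean of the `count` smallest adjacent values (count >= 1).

    The subset always contains the minimum; choosing the smallest values
    is an assumption made for this sketch.
    """
    values = sorted(gamma_adj[i - 1] for i in range(n, m))[:count]
    return math.prod(values) ** (1.0 / len(values))
```

Both functions are monotonically non-decreasing in the minimum adjacent value, which is the property the description requires of γ′nm.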
In short, in the case where the two channel numbers of each pair of two channels after the sorting with adjacent channel numbers after the sorting are i (i is an integer from 1 to N−1) and i+1, the inter-channel correlation value of each pair of two channels after the sorting with adjacent channel numbers after the sorting is γ′i(i+1), n is an integer from 1 to N−2, m is an integer from n+2 to N, and the inter-channel correlation value between the nth sorted input sound signal and mth sorted input sound signal is γ′nm, it suffices that, for each pair of n and m (that is, for each pair of two channels after the sorting with non-adjacent channel numbers after the sorting), the inter-channel relationship information complement unit 1883 obtains, as the inter-channel correlation value γ′nm, a value that has a monotonically non-decreasing relationship with each of one or more of the inter-channel correlation values γ′i(i+1) including the minimum value of the inter-channel correlation values γ′i(i+1) of pairs of two channels with adjacent channel numbers after the sorting whose i is from n to m−1. 
Further, in the case where the two channel numbers of each pair of two channels after the sorting with adjacent channel numbers after the sorting are i (i is an integer from 1 to N−1) and i+1, the inter-channel correlation value of each pair of two channels after the sorting with adjacent channel numbers after the sorting is γ′i(i+1), n is an integer from 1 to N−2, m is an integer from n+2 to N, and the inter-channel correlation value between the nth sorted input sound signal and mth sorted input sound signal is γ′nm, it suffices that, for each pair of n and m (that is, for each pair of two channels after the sorting with non-adjacent channel numbers after the sorting), the inter-channel relationship information complement unit 1883 obtains, as the inter-channel correlation value γ′nm, a value that has a monotonically non-decreasing relationship with each of one or more of the inter-channel correlation values γ′i(i+1) including the minimum value of the inter-channel correlation values γ′i(i+1) of pairs of two channels with adjacent channel numbers after the sorting whose i is from n to m−1 within the possible range of the inter-channel correlation value.
The inter-channel correlation value of each pair of two channels after the sorting with adjacent channel numbers after the sorting obtained by the inter-adjacent-channel relationship information estimation unit 1882 has been input, and the inter-channel correlation value of each pair of two channels after the sorting with non-adjacent channel numbers after the sorting is obtained at step S1883-1. Therefore, at the time point when step S1883-1 is performed, the inter-channel relationship information complement unit 1883 has all inter-channel correlation values for (N×(N−1))/2 pairs of two channels after the sorting included in the N channels after the sorting. That is, in the case where n is an integer from 1 to N, m is an integer greater than n and equal to or smaller than N, and the inter-channel correlation value between the nth sorted input sound signal and the mth sorted input sound signal is γ′nm, the inter-channel relationship information complement unit 1883 has the inter-channel correlation value γ′nm for each of (N×(N−1))/2 pairs of two channels after the sorting at the time point when step S1883-1 is performed.
After step S1883-1, the inter-channel relationship information complement unit 1883 obtains the inter-channel correlation value between input sound signals for each pair of two channels included in the N channels by associating the inter-channel correlation value γ′nm for each of the (N×(N−1))/2 pairs of two channels after the sorting with a pair of channels for the input sound signals of the N channels (that is, a pair of channels being the sorting targets) by using the original channel information c1 to cN for the channels after the sorting (step S1883-2). In the case where n is an integer from 1 to N, m is an integer greater than n and equal to or smaller than N, and the inter-channel correlation value between the nth channel input sound signal and the mth channel input sound signal is γnm, the inter-channel relationship information complement unit 1883 obtains the inter-channel correlation value γnm for each of (N×(N−1))/2 pairs of two channels.
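Step S1883-2 amounts to relabeling each sorted pair with its original channel numbers. A minimal sketch follows, assuming the original channel information is a list in which `original_channel[n - 1]` is cn and that the resulting pairs are keyed with the smaller original channel number first. The function name and the dictionary representation are hypothetical.

```python
def map_correlations_to_original(gamma_sorted, original_channel):
    """Relabel {(n, m): gamma'_{nm}} from sorted numbering to original numbering.

    gamma_sorted uses 1-based sorted channel numbers with n < m.
    The correlation value is symmetric, so only the pair labels change.
    """
    gamma = {}
    for (n, m), value in gamma_sorted.items():
        a, b = original_channel[n - 1], original_channel[m - 1]
        gamma[(min(a, b), max(a, b))] = value
    return gamma
```

For instance, if the sorted order is original channels 2, 3, 1, the value for sorted pair (2, 3) becomes the value for original pair (1, 3).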
In addition, the inter-channel relationship information complement unit 1883 obtains the inter-channel time difference of each pair of two channels after the sorting with non-adjacent channel numbers after the sorting from the inter-channel time difference of each pair of two channels after the sorting with adjacent channel numbers after the sorting (step S1883-3). In the case where n is an integer from 1 to N−2, m is an integer from n+2 to N, and the inter-channel time difference between the nth sorted input sound signal and the mth sorted input sound signal is τ′nm, the inter-channel relationship information complement unit 1883 obtains the inter-channel time difference τ′nm of each pair of two channels after the sorting with non-adjacent channel numbers after the sorting. In the case where the two channel numbers of each pair of two channels after the sorting with adjacent channel numbers after the sorting are i (i is an integer from 1 to N−1) and i+1, and the inter-channel time difference of each pair of two channels after the sorting with adjacent channel numbers after the sorting is τ′i(i+1), the inter-channel relationship information complement unit 1883 obtains, as the inter-channel time difference τ′nm, a value obtained by adding up all of the inter-channel time differences τ′i(i+1) of pairs of two channels with adjacent channel numbers after the sorting whose i is from n to m−1, for each pair of n and m (that is, for each pair of two channels after the sorting with non-adjacent channel numbers after the sorting). That is, the inter-channel relationship information complement unit 1883 obtains the inter-channel time difference τ′nm through the following Equation (3-4).
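The equation itself is not reproduced in this text. From the description above (the sum of all adjacent-pair time differences whose i runs from n to m−1), Equation (3-4) can be reconstructed as follows; this is a reconstruction from the prose, not the original typesetting.

```latex
\tau'_{nm} = \sum_{i=n}^{m-1} \tau'_{i(i+1)} \qquad (3\text{-}4)
```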
The inter-channel time difference of each pair of two channels after the sorting with adjacent channel numbers after the sorting obtained by the inter-adjacent-channel relationship information estimation unit 1882 has been input, and the inter-channel time difference of each pair of two channels after the sorting with non-adjacent channel numbers after the sorting is obtained at step S1883-3. Therefore, at the time point when step S1883-3 is performed, the inter-channel relationship information complement unit 1883 has all the inter-channel time differences of (N×(N−1))/2 pairs of two channels after the sorting included in the N channels after the sorting. That is, in the case where n is an integer from 1 to N, m is an integer greater than n and equal to or smaller than N, and the inter-channel time difference of the pair of the nth channel after the sorting and the mth channel after the sorting is τ′nm, the inter-channel relationship information complement unit 1883 has the inter-channel time difference τ′nm of each of the (N×(N−1))/2 pairs of two channels after the sorting at the time point when step S1883-3 is performed.
After step S1883-3, the inter-channel relationship information complement unit 1883 obtains the inter-channel time difference between input sound signals for each pair of two channels included in the N channels by associating the inter-channel time difference τ′nm for each of the (N×(N−1))/2 pairs of two channels after the sorting with a pair of channels for the input sound signal of the N channels (that is, a pair of channels being the sorting targets) by using the original channel information c1 to cN for the channels after the sorting (step S1883-4). In the case where n is an integer from 1 to N, m is an integer greater than n and equal to or smaller than N, and the inter-channel time difference between the nth channel input sound signal and the mth channel input sound signal is τnm, the inter-channel relationship information complement unit 1883 obtains the inter-channel time difference τnm of each of the (N×(N−1))/2 pairs of two channels.
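Steps S1883-3 and S1883-4 can be sketched together as below. The function names and dictionary representation are hypothetical. The sign flip applied when the original channel numbers of a sorted pair appear in reverse order is an assumption: it follows from treating τnm as defined for n < m with a positive value meaning that the nth channel precedes, but the text does not state it explicitly.

```python
def complement_time_differences(tau_adj):
    """Return {(n, m): tau'_{nm}} for all n < m from adjacent differences.

    tau_adj[i - 1] holds tau'_{i(i+1)}; non-adjacent values are running sums.
    """
    N = len(tau_adj) + 1
    tau_sorted = {}
    for n in range(1, N):
        total = 0
        for m in range(n + 1, N + 1):
            total += tau_adj[m - 2]  # add tau'_{(m-1)m}
            tau_sorted[(n, m)] = total
    return tau_sorted

def map_time_differences_to_original(tau_sorted, original_channel):
    """Relabel sorted pairs with original channel numbers (smaller number first)."""
    tau = {}
    for (n, m), value in tau_sorted.items():
        a, b = original_channel[n - 1], original_channel[m - 1]
        if a < b:
            tau[(a, b)] = value
        else:
            tau[(b, a)] = -value  # assumed: reversing the pair reverses the sign
    return tau
```

Because only N−1 adjacent differences are estimated from the signals, the remaining pairs cost only additions.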
After step S1883-4, the inter-channel relationship information complement unit 1883 obtains the preceding channel information INFOnm of each of the (N×(N−1))/2 pairs of two channels from the inter-channel time difference τnm of each of the (N×(N−1))/2 pairs of two channels (step S1883-5). The inter-channel relationship information complement unit 1883 obtains information indicating that the nth channel is preceding as the preceding channel information INFOnm when the inter-channel time difference τnm is a positive value, and obtains information indicating that the mth channel is preceding as the preceding channel information INFOnm when the inter-channel time difference τnm is a negative value. When the inter-channel time difference τnm is zero, the inter-channel relationship information complement unit 1883 may obtain, for each pair of two channels, either information indicating that the nth channel is preceding or information indicating that the mth channel is preceding as the preceding channel information INFOnm.
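The sign rule of step S1883-5 can be sketched as a small function. The name `preceding_channel` and the `tie_to_first` flag are hypothetical; the flag simply exposes the free choice that the description allows when the time difference is zero.

```python
def preceding_channel(n, m, tau_nm, tie_to_first=True):
    """Return the channel number treated as preceding for the pair (n, m).

    Positive tau_nm: the nth channel precedes. Negative: the mth channel precedes.
    Zero: either choice is permitted; tie_to_first selects which one.
    """
    if tau_nm > 0:
        return n
    if tau_nm < 0:
        return m
    return n if tie_to_first else m
```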
Note that instead of step S1883-4 and step S1883-5, the inter-channel relationship information complement unit 1883 may perform step S1883-4′ of obtaining preceding channel information INFO′nm from the inter-channel time difference τ′nm as in step S1883-5 for each of the (N×(N−1))/2 pairs of two channels after the sorting, and step S1883-5′ of obtaining the preceding channel information INFOnm of each pair of two channels included in the N channels by associating the preceding channel information INFO′nm for each of the (N×(N−1))/2 pairs of two channels after the sorting obtained at step S1883-4′ with a pair of channels for the input sound signals of the N channels (that is, a pair of channels being the sorting targets) by using the original channel information c1 to cN for the channels after the sorting. That is, it suffices that the inter-channel relationship information complement unit 1883 obtains the preceding channel information INFOnm of each pair of two channels included in the N channels by establishing an association with a pair of channels for the input sound signals of the N channels using the original channel information c1 to cN, from the inter-channel time difference τ′nm of each of the (N×(N−1))/2 pairs of two channels after the sorting, and by obtaining the preceding channel information based on whether the inter-channel time difference is positive, negative or zero.
Instead of the inter-channel relationship information estimation unit 186 of the second example of the second embodiment, the inter-channel relationship information estimation unit 188 of the first example of the third embodiment may be used. In this case, it suffices that the inter-channel relationship information obtaining unit 187 of the sound signal downmix apparatus 407 includes the inter-channel relationship information estimation unit 188 instead of the inter-channel relationship information estimation unit 186, and that the inter-channel relationship information obtaining unit 187 performs an operation in which the inter-channel relationship information estimation unit 186 is read as the inter-channel relationship information estimation unit 188. In this case, the sound signal downmix apparatus 407 has the apparatus configuration exemplified in
It is possible to provide the sound signal downmix apparatus of the second and third embodiments as a sound signal downmix unit in a coding apparatus for coding sound signals, and this configuration will be described as a fourth embodiment.
Sound Signal Coding Apparatus 106
As illustrated in
Sound Signal Downmix Unit 407
The sound signal downmix unit 407 obtains and outputs a downmix signal from N input sound signals of the first channel input sound signal to the Nth channel input sound signal input to the sound signal coding apparatus 106 (step S407). As with the sound signal downmix apparatus 407 of the second embodiment or the third embodiment, the sound signal downmix unit 407 includes the inter-channel relationship information obtaining unit 187 and the downmix unit 116. The inter-channel relationship information obtaining unit 187 performs the above-described step S187, and the downmix unit 116 performs the above-described step S116. That is, the sound signal coding apparatus 106 includes the sound signal downmix apparatus 407 of the second embodiment or the third embodiment as the sound signal downmix unit 407, and performs the processing of the sound signal downmix apparatus 407 of the second embodiment or the third embodiment as step S407.
Coding Unit 196
At least the downmix signal output by the sound signal downmix unit 407 is input to the coding unit 196. The coding unit 196 obtains a sound signal code by coding at least the input downmix signal, and outputs the sound signal code (step S196). The coding unit 196 may also code the N input sound signals of the first channel input sound signal to the Nth channel input sound signal, and may output a sound signal code that includes the code obtained through that coding. In this case, as indicated with the broken line in
The coding processing performed by the coding unit 196 is not limited. For example, a sound signal code may be obtained by coding the input T-sample downmix signal xM(1), xM(2), . . . , xM(T) by a monaural coding scheme such as the 3GPP EVS standard. Moreover, for example, in addition to obtaining a monaural code by coding the downmix signal, a stereo code may be obtained by coding the N input sound signals of the first channel input sound signal to the Nth channel input sound signal by a stereo coding scheme supporting a stereo decoding scheme of the MPEG-4 AAC standard, and a combination of the monaural code and the stereo code may be obtained and output as a sound signal code. Furthermore, for example, in addition to obtaining a monaural code by coding the downmix signal, a stereo code may be obtained by coding, for each channel of the N input sound signals of the first channel input sound signal to the Nth channel input sound signal, the difference or weighted difference from the downmix signal, and a combination of the monaural code and the stereo code may be obtained and output as a sound signal code.
It is possible to provide the sound signal downmix apparatus of the second embodiment and the third embodiment as a sound signal downmix unit in a signal processing apparatus for processing a sound signal, and this configuration is described as a fifth embodiment below.
Sound Signal Processing Apparatus 306
As illustrated in
Sound Signal Downmix Unit 407
The sound signal downmix unit 407 obtains a downmix signal from the N input sound signals of the first channel input sound signal to the Nth channel input sound signal input to the sound signal processing apparatus 306, and outputs the downmix signal (step S407). As with the sound signal downmix apparatus 407 of the second embodiment or the third embodiment, the sound signal downmix unit 407 includes the inter-channel relationship information obtaining unit 187 and the downmix unit 116. The inter-channel relationship information obtaining unit 187 performs the above-described step S187, and the downmix unit 116 performs the above-described step S116. That is, the sound signal processing apparatus 306 includes the sound signal downmix apparatus 407 of the second embodiment or the third embodiment as the sound signal downmix unit 407, and performs the processing of the sound signal downmix apparatus 407 of the second embodiment or the third embodiment as step S407.
Signal Processing Unit 316
At least the downmix signal output by the sound signal downmix unit 407 is input to the signal processing unit 316. The signal processing unit 316 performs signal processing on at least the input downmix signal, and obtains and outputs a signal processing result (step S316). The signal processing unit 316 may also perform signal processing on the N input sound signals of the first channel input sound signal to the Nth channel input sound signal and obtain a signal processing result. In this case, as indicated with the broken line in
Program and Recording Medium
The processing of each unit of each sound signal downmix apparatus, sound signal coding apparatus and sound signal processing apparatus may be implemented using a computer, and in this case, the processing detail of the function that should be provided in each apparatus is described in a program. When this program is read in a storage unit 1020 of a computer 1000 illustrated in
The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium is, for example, a non-transitory recording medium, specifically, a magnetic recording device, an optical disk, or the like.
Further, distribution of this program is performed, for example, by selling, transferring, or renting a portable recording medium such as a DVD or CD-ROM on which the program has been recorded. Further, the program may be distributed by being stored in a storage device of a server computer and transferred from the server computer to another computer via a network.
For example, a computer executing such a program first temporarily stores the program recorded on the portable recording medium or the program transmitted from the server computer in an auxiliary recording unit 1050 that is its own non-transitory storage device. Then, when executing the processing, the computer reads the program stored in the auxiliary recording unit 1050 into the storage unit 1020 and executes the processing in accordance with the read program. As another execution mode of this program, the computer may read the program directly from the portable recording medium into the storage unit 1020 and execute processing in accordance with the program, or may sequentially execute processing in accordance with the received program each time the program is transferred from the server computer to the computer. Further, the above-described processing may be executed by a so-called application service provider (ASP) type service, which implements a processing function only through execution instructions and result acquisition, without transferring the program from the server computer to the computer. It is assumed that the program in the present embodiment includes information that is provided for processing by a computer and is equivalent to a program (such as data that is not a direct command to the computer but has properties defining the processing of the computer).
Further, in this embodiment, although the present device is configured by a predetermined program being executed on the computer, at least a part of the processing content may be achieved by hardware.
It is needless to say that the present disclosure can appropriately be modified without departing from the gist of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
PCT/JP2020/010080 | Mar 2020 | WO | international |
PCT/JP2020/010081 | Mar 2020 | WO | international |
PCT/JP2020/041216 | Nov 2020 | WO | international |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/004641 | 2/8/2021 | WO | |

Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2021/181976 | 9/16/2021 | WO | A |

Number | Name | Date | Kind |
---|---|---|---|
20050180579 | Baumgarte | Aug 2005 | A1 |
20080010072 | Yoshida et al. | Jan 2008 | A1 |
20110211702 | Mundt | Sep 2011 | A1 |
20120093321 | Shim | Apr 2012 | A1 |
20130195276 | Ojala | Aug 2013 | A1 |
20140112482 | Virette | Apr 2014 | A1 |

Number | Date | Country |
---|---|---|
2006070751 | Jul 2006 | WO |

Number | Date | Country |
---|---|---|
20230107976 A1 | Apr 2023 | US |