One or more example embodiments relate to a multichannel signal processing method and a multichannel signal processing apparatus for performing the method, and more particularly, to a method and apparatus that may compress a multichannel signal without degrading a sound quality regardless of an increase in the number of channels included in the multichannel signal.
MPEG Surround (MPS) is a codec for coding a multichannel signal, such as a 5.1-channel signal, a 7.1-channel signal, etc. MPS may compress a multichannel signal at a relatively high compression ratio and may transmit the compressed multichannel signal.
MPS is subject to some constraints, such as backward compatibility, during an encoding/decoding process. That is, a bitstream of a multichannel signal generated through MPS must remain backward compatible so that the bitstream can be reproduced in a mono format or a stereo format through an existing codec.
Accordingly, although a multichannel signal having the number of channels greater than the number of channels defined in MPS is input, a signal that is output and transmitted from MPS is to be represented in the same mono format or stereo format as MPS. A decoder may decode a multichannel signal from a bitstream based on additional information received from an encoder. The decoder may restore the multichannel signal using additional information for up-mixing.
Currently, with enhancement of a communication environment, a transmission bandwidth has increased and a bandwidth to be allocated to a signal has also increased. Thus, technology is developing to maintain a sound quality of an original multichannel signal rather than excessively compressing the multichannel signal to correspond to a bandwidth. However, compression is still required for transmission in order to process the multichannel signal having a large number of channels.
Accordingly, there is a need for a method that may reduce a data amount and perform transmission by compressing a multichannel signal at a threshold level or more while maintaining quality of the multichannel signal in the case of processing an input signal having the number of channels greater than the number of channels defined in an MPS standard.
An aspect of an example embodiment provides a method and apparatus that may process a multichannel signal through an N-N/2-N configuration.
A multichannel signal processing method according to an example embodiment includes identifying an N/2-channel downmix signal derived from an N-channel input signal; and generating an N-channel output signal from the identified N/2-channel downmix signal using a plurality of one-to-two (OTT) boxes. If a low frequency effect (LFE) channel is absent in the output signal, the number of OTT boxes is equal to N/2 where N/2 denotes the number of channels of the downmix signal.
Each of the plurality of OTT boxes may generate a 2-channel output signal using a 1-channel downmix signal and a decorrelated signal generated from a corresponding decorrelator.
If N exceeds M where N denotes the number of channels of the output signal and M denotes the preset number of channels, the decorrelator may include a first decorrelator corresponding to a channel of M or less and a second decorrelator corresponding to a channel greater than M, and the second decorrelator may reuse a filter set of the first decorrelator.
An OTT box from which an LFE channel is output, among the plurality of OTT boxes, may generate a 2-channel output signal without using the decorrelated signal.
If a transmitted residual signal is present, each of the plurality of OTT boxes may generate a 2-channel output signal using the residual signal and the 1-channel downmix signal instead of using the decorrelated signal.
The generating of the N-channel output signal may include generating the N-channel output signal using a pre-decorrelator matrix M1 and a mix matrix M2.
Each of the plurality of OTT boxes may generate the N-channel output signal using a channel level difference (CLD).
N denoting the number of channels of the output signal may be an even number among numbers from 10 to 32.
A multichannel signal processing method according to another example embodiment includes decoding an N/2-channel downmix signal encoded based on a first coding scheme; and generating an N-channel output signal from the N/2-channel downmix signal based on a second coding scheme. If an LFE channel is absent in the output signal, the number of OTT boxes is equal to N/2 where N/2 denotes the number of channels of the downmix signal.
A multichannel signal processing apparatus according to an example embodiment includes a processor to implement a multichannel signal processing method. The processor is configured to identify an N/2-channel downmix signal derived from an N-channel input signal, and generate an N-channel output signal from the identified N/2-channel downmix signal using a plurality of OTT boxes. If an LFE channel is absent in the output signal, the number of OTT boxes is equal to N/2 where N/2 denotes the number of channels of the downmix signal.
Each of the plurality of OTT boxes may generate a 2-channel output signal using a 1-channel downmix signal and a decorrelated signal generated from a corresponding decorrelator.
If N exceeds M where N denotes the number of channels of the output signal and M denotes the preset number of channels, the decorrelator may include a first decorrelator corresponding to a channel of M or less and a second decorrelator corresponding to a channel greater than M, and the second decorrelator may reuse a filter set of the first decorrelator.
An OTT box from which an LFE channel is output, among the plurality of OTT boxes, may generate a 2-channel output signal without using the decorrelated signal.
If a transmitted residual signal is present, each of the plurality of OTT boxes may generate a 2-channel output signal using the residual signal and the 1-channel downmix signal instead of using the decorrelated signal.
The processor may generate the N-channel output signal using a pre-decorrelator matrix M1 and a mix matrix M2.
Each of the plurality of OTT boxes may generate the N-channel output signal using a CLD.
N denoting the number of channels of the output signal may be an even number among numbers from 10 to 32.
A multichannel signal processing apparatus according to another example embodiment includes a processor to implement a multichannel signal processing method. The processor is configured to decode an N/2-channel downmix signal encoded based on a first coding scheme, and generate an N-channel output signal from the N/2-channel downmix signal based on a second coding scheme. If an LFE channel is absent in the output signal, the second coding scheme uses the number of OTT boxes equal to N/2 where N/2 denotes the number of channels of the downmix signal.
According to example embodiments, it is possible to efficiently process a multichannel signal having the number of channels greater than the number of channels defined in MPEG Surround (MPS) by processing the multichannel signal based on an N-N/2-N configuration.
Hereinafter, example embodiments will be described with reference to the accompanying drawings. A process of generating an N/2-channel downmix signal from an N-channel input signal through an MPEG Surround (MPS) encoder and generating an N-channel output signal using the N/2-channel downmix signal through an MPS decoder according to example embodiments will be described. Here, N/2 denotes the number of channels greater than the number of channels defined in the existing MPS standard. For example, the MPS decoder according to example embodiments may satisfy an expanded MPS standard for an MPEG-H 3D audio standard.
Herein, an encoding apparatus and a decoding apparatus correspond to a multichannel signal processing apparatus.
An encoding apparatus 100 according to an example embodiment may generate an N/2-channel downmix signal by downmixing an N-channel input signal. A decoding apparatus 101 may generate an N-channel output signal using the N/2-channel downmix signal. Here, N may be 10 or more.
Referring to
The sampling rate converter 202 may convert a sampling rate of the N/2-channel downmix signal. The sampling rate converter 202 may perform down-sampling according to the bitrate allocated to the USAC encoder, i.e., the second encoder 203. If a sufficiently high bitrate is allocated to the USAC encoder, i.e., the second encoder 203, the sampling rate converter 202 may be bypassed.
The second encoder 203 may perform encoding with respect to a core band of the N/2-channel downmix signal of which the sampling rate is converted. In this manner, the N/2-channel downmix signal encoded through the second encoder 203 may be generated. The encoded N/2-channel downmix signal may be a signal of M channels where M is less than or equal to N/2. Here, when a frequency band is expanded through Spectral Band Replication (SBR) applied at the USAC encoder, the core band indicates a low frequency band of which a frequency band is not expanded.
According to the existing MPS standard, the number of channels of a downmix signal (also referred to as the number of downmix signal channels) output through the MPS encoder corresponding to the first encoder 201 is limited to 1 channel, 2 channels, and 5.1 channels. However, the first encoder 201 according to an example embodiment may exceed the number of downmix signal channels defined in the MPS standard. That is, the first encoder 201 may generate the N/2-channel downmix signal by downmixing the N-channel input signal. In the N/2-channel downmix signal, the number of channels, N/2, may be 1, 2, 5.1, or more than 5.1.
If the first encoder 401 follows the existing MPS standard, only 1 channel, 2 channels, or 5.1 channels may be allowed for a downmix signal generated at the first encoder 401. According to an example embodiment, the first encoder 401 may generate an N/2-channel downmix signal from an N-channel input signal based on MPS. Here, N/2 channels may be 1 channel, 2 channels, or 5.1 channels, or 5.1 or more channels. If the number of N channels is greater than the number of channels defined in MPS, the first encoder 401 may need to consider additional syntax to control MPS. For example, the first encoder 401 may define additional syntax to control MPS based on a coding mode using an arbitrary tree.
The sampling rate converter 502 may convert a sampling rate of the N/2-channel downmix signal. Here, the sampling rate converter 502 may convert a sampling rate of an audio signal converted at an encoding apparatus to an original sampling rate. That is, if a sampling rate conversion is performed in
The second decoder 503 may generate an N-channel output signal by upmixing the N/2-channel downmix signal output from the sampling rate converter 502.
The number of channels for a downmix signal input to a conventional MPS decoder is limited to 1 channel, 2 channels, and 5.1 channels. However, the number of channels for a downmix signal input to the second decoder 503 according to an example embodiment may be expanded up to N/2 channels in addition to 1 channel, 2 channels, and 5.1 channels. The second decoder 503 may generate the N-channel output signal by upmixing the N/2-channel downmix signal. Here, the N/2-channel downmix signal input to the second decoder 503 indicates 5.1 channels or more and thus, N may be 10.2 channels or more.
In
As described above with reference to
Accordingly, to generate the N-channel output signal by upmixing the N/2-channel downmix signal, the second decoder 701 may include N/2 OTT boxes 702.
If the second decoder 701 follows the existing MPS standard, the number of channels of a downmix signal to be input to and processed at the second decoder 701 may be 1 channel, 2 channels, or 5.1 channels. According to an example embodiment, the second decoder 701 may generate the N-channel output signal from the N/2-channel downmix signal based on MPS. Here, N may be 10.2 or more.
Here, the second decoder 701 may need to consider additional syntax to control MPS. For example, the second decoder 701 may define additional syntax to control MPS based on a coding mode using an arbitrary tree.
An MPS decoder described in
The decoder operates in a hybrid subband. The decoder may generate output signals from input signals by performing spatial synthesis based on spatial parameters transferred from the encoder. The decoder may inversely convert output signals from the hybrid subband to the time domain using the hybrid QMF synthesis bank.
A process of processing a multichannel signal through a matrix mixed with spatial synthesis performed at the decoder is described with reference to
The N-N/2-N configuration provides a process of converting an N-channel input signal to an N/2-channel downmix signal and generating an N-channel output signal from the N/2-channel downmix signal. A decoder according to an example embodiment may generate the N-channel output signal by upmixing the N/2-channel downmix signal. Basically, the number of N channels is not limited in the N-N/2-N configuration. That is, the N-N/2-N configuration may support a channel configuration of a multichannel signal not supported in MPS, as well as a channel configuration supported in MPS.
In
In
In
The input vector X to be multiplied by a vector M1n,k corresponding to a matrix M1 denotes a vector that includes the N/2-channel downmix signals. When a low frequency effect (LFE) channel is absent in an N-channel output signal, a maximum of N/2 decorrelators may be used to generate a decorrelated signal. However, if the number of output signal channels, N, exceeds 20, filters of a decorrelator may be reused.
To guarantee orthogonality between output signals of decorrelators, if N=20, the number of available decorrelators needs to be limited to a specific number, for example, 10. Thus, indices of some decorrelators may be repeated. According to an example embodiment, in the N-N/2-N configuration, the number of output signal channels, N, is to be less than twice the specific number, for example, N<20. If an LFE channel is included in an output signal, the number of output signal channels may need to be set to less than twice the specific number, adjusted for the number of LFE channels, for example, N<24.
Output results of decorrelators may be replaced with residual signals for a specific frequency domain based on a bitstream. If an LFE channel is one of outputs of OTT boxes, a decorrelator is not used for an upmix-based OTT box.
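As a non-limiting illustration, the channel bookkeeping described above may be sketched as follows. The function name `n_n2_n_layout` and the parameters `num_lfe` and `max_filter_sets` are hypothetical and are not part of the bitstream syntax. The sketch assumes that each LFE channel occupies exactly one OTT box that operates without a decorrelator, and that at most 10 distinct decorrelator filter sets are available.

```python
def n_n2_n_layout(n_out, num_lfe=0, max_filter_sets=10):
    """Return illustrative OTT-box and decorrelator counts for an N-N/2-N decoder.

    Assumption: one OTT box per LFE channel runs without a decorrelator.
    """
    if n_out % 2 != 0:
        raise ValueError("N must be even so that output channels form pairs")
    num_ott_boxes = n_out // 2  # one OTT box per downmix channel
    if not 0 <= num_lfe <= num_ott_boxes:
        raise ValueError("num_lfe must lie between 0 and N/2")
    num_decorrelators = num_ott_boxes - num_lfe
    return {
        "num_ott_boxes": num_ott_boxes,
        "num_decorrelators": num_decorrelators,
        # Filter sets are shared once more decorrelators are needed than exist.
        "distinct_filter_sets": min(num_decorrelators, max_filter_sets),
        "filters_reused": num_decorrelators > max_filter_sets,
    }


layout = n_n2_n_layout(24)  # 12 OTT boxes, so filter sets are reused
```

In this sketch, N=20 without an LFE channel uses all 10 filter sets exactly once, whereas any larger N requires reuse.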
In
Hereinafter, a vector and a matrix used in the N-N/2-N configuration are defined. In the N-N/2-N configuration, an input signal input to a decorrelator is defined as a vector vn,k.
The vector vn,k may be determined to be different depending on whether a temporal shaping tool is used or not used.
(1) In an example in which the temporal shaping tool is not used:
If the temporal shaping tool is not used, the vector vn,k is derived based on the vector Xn,l and M1n,k corresponding to the matrix M1 according to Equation 1. Here, M1n,k denotes a matrix including an N-th row and a first column.
In Equation 1, among elements of the vector Vn,k, VM
A vector Wn,k includes direct signals, d1 to dM that are decorrelated signals output from decorrelators, and res1 to resM that are residual signals output from the decorrelators. The vector wn,k may be determined according to Equation 2.
In Equation 2,
and kset denotes a set of all k satisfying K(k)<mresProc (X). Dx(vxn,k) denotes a decorrelated signal output from a decorrelator in response to a signal vxn,k being input to a decorrelator Dx. In particular, DX(vxn,k) denotes a signal that is output from a decorrelator if an OTT box is OTTx and a residual signal is Vres
A subband of an output signal may be defined to be dependent on all of time slots n and all of hybrid subbands k. An output signal yn,k may be determined based on the vector w and the matrix M2 according to Equation 3.
In Equation 3, M2n,k denotes the matrix M2 including a row NumOutCh and a column NumInCh-NumLfe. Here, M2n,k may be defined with respect to 0≤l<L and 0≤k<K, as expressed by Equation 4.
In Equation 4,
and W2l,k may be smoothed as expressed by Equation 5.
In Equation 5, κ(k) denotes a function of which a first row is a hybrid band k and of which a second row is a processing band, and W2−1,k corresponds to a last parameter set of a previous frame.
Meanwhile, yn,k may denote hybrid subband signals synthesizable to the time domain through a hybrid synthesis filter bank. Here, the hybrid synthesis filter bank is combined with a QMF synthesis bank through Nyquist synthesis banks, and yn,k may be converted from the hybrid subband domain to the time domain through the hybrid synthesis filter bank.
(2) In an example in which the temporal shaping tool is used:
If the temporal shaping tool is used, the vector Vn,k may be the same as described above, however, the vector Wn,k may be classified into two types of vectors as expressed by Equation 6 and Equation 7.
Here, Wdirectn,k denotes a direct signal that is directly input to the matrix M2 without passing through decorrelators and residual signals output from the decorrelators, and Wdiffusen,k denotes a decorrelated signal output from a decorrelator. Further,
and kset denotes a set of all k satisfying K(k)<mresProc (X). Also, Dx(Vxn,k) denotes a decorrelated signal output from the decorrelator Dx in response to the input signal vxn,k being input to the decorrelator Dx.
Signals finally output by wdirectn,k and wdiffusen,k defined in Equation 6 and Equation 7 may be classified into ydirectn,k and ydiffusen,k. ydirectn,k includes a direct signal and ydiffusen,k includes a diffuse signal. That is, ydirectn,k is a result that is derived from the direct signal directly input to the matrix M2 without passing through a decorrelator and ydiffusen,k is a result that is derived from the diffuse signal output from the decorrelator and input to the matrix M2.
In addition, ydirectn,k and ydiffusen,k may be derived based on a case in which Subband Domain Temporal Processing (STP) is applied to the N-N/2-N configuration and a case in which Guided Envelope Shaping (GES) is applied to the N-N/2-N configuration. In this instance, ydirectn,k and ydiffusen,k are identified using bsTempShapeConfig that is a data stream element.
<Case in Which STP is Applied>
To synthesize decorrelation levels between output signal channels, a diffuse signal is generated through a decorrelator for spatial synthesis. Here, the generated diffuse signal may be mixed with a direct signal. In general, a temporal envelope of the diffuse signal does not match an envelope of the direct signal.
In this instance, STP is applied to shape an envelope of a diffuse signal portion of each output signal to be matched to a temporal shape of a downmix signal transmitted from an encoder. Such processing may be performed by calculating an envelope ratio between the direct signal and the diffuse signal or by estimating an envelope such as shaping of an upper spectrum portion of the diffuse signal.
That is, temporal energy envelopes with respect to a portion corresponding to the direct signal and a portion corresponding to the diffuse signal may be estimated from the output signal generated through upmixing. A shaping factor may be calculated based on a ratio between the temporal energy envelopes with respect to the portion corresponding to the direct signal and the portion corresponding to the diffuse signal.
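As a non-limiting illustration, the shaping factor may be sketched as follows. The function names are hypothetical. The sketch assumes that the factor is the square root of the ratio between the direct and diffuse temporal energies of one time slot, and it omits the band-pass weighting, normalization, and limiting that a complete implementation would likely include.

```python
import math


def temporal_energy(samples):
    """Sum of squared magnitudes over one time slot (works for complex samples)."""
    return sum(abs(s) ** 2 for s in samples)


def stp_scale_factor(direct_slot, diffuse_slot, eps=1e-9):
    """Assumed shaping factor: sqrt(E_direct / E_diffuse) for one time slot."""
    e_direct = temporal_energy(direct_slot)
    e_diffuse = temporal_energy(diffuse_slot)
    return math.sqrt(e_direct / (e_diffuse + eps))  # eps avoids division by zero


def shape_diffuse(direct_slot, diffuse_slot):
    """Scale the diffuse portion so its energy follows the direct portion."""
    gain = stp_scale_factor(direct_slot, diffuse_slot)
    return [gain * s for s in diffuse_slot]


shaped = shape_diffuse([2.0, 2.0], [1.0, 1.0])
```

After shaping, the energy of the diffuse portion in the time slot approximately equals the energy of the direct portion, which is the intended envelope match.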
STP may be signaled by bsTempShapeConfig=1. If bsTempShapeEnableChannel(ch)=1, the diffuse signal portion of the output signal generated through upmixing may be processed through the STP.
Meanwhile, to reduce the necessity of a delay alignment of an original downmix signal transmitted with respect to spatial upmixing for generating an output signal, downmixing of spatial upmixing may be calculated as an approximation of the transmitted original downmix signal.
With respect to the N-N/2-N configuration, a direct downmix signal for NumInCh-NumLfe may be defined by Equation 8.
In Equation 8, chd includes a pair-wise output signal corresponding to a channel d of an output signal with respect to the N-N/2-N configuration, and chd may be defined with respect to the N-N/2-N configuration, as expressed by Table 1.
Downmix broadband envelopes and an envelope with respect to a diffuse signal portion of each upmix channel may be estimated based on the normalized direct energy according to Equation 9.
$$E_{direct}^{n,sb} = \left| \hat{Z}_{direct}^{n,sb} \cdot BP^{sb} \cdot GF^{sb} \right|^{2}$$ [Equation 9]
In Equation 9, BPsb denotes a band pass factor and GFsb denotes a spectral flattening factor.
In the N-N/2-N configuration, since the direct signal for NumInCh-NumLfe is present, energy Edirect_norm,d of the direct signal that satisfies 0≤d<(NumInCh−NumLfe) may be obtained using the same method as that used in a 5-1-5 configuration defined in MPS. A scale factor associated with final envelope processing may be defined by Equation 10.
In Equation 10, the scale factor may be defined if 0≤d<(NumInCh−NumLfe) is satisfied with respect to the N-N/2-N configuration. By applying the scale factor to the diffuse signal portion of the output signal, the temporal envelope of the output signal may be substantially mapped to the temporal envelope of the downmix signal. Accordingly, the diffuse signal portion processed using the scale factor in each of channels of the N-channel output signal may be mixed with the direct signal portion. Through this process, whether the diffuse signal portion is processed using the scale factor may be signaled for each of output signal channels. If bsTempShapeEnableChannel(ch)=1, it indicates that the diffuse signal portion is processed using the scale factor.
<Case in Which GES is Applied>
In the case of performing temporal shaping on the diffuse signal portion of the output signal, a characteristic distortion is likely to occur. Accordingly, GES may enhance temporal/spatial quality by overcoming the distortion issue. The decoder may individually process the direct signal portion and the diffuse signal portion of the output signal. In this instance, if GES is applied, only the direct signal portion of the upmixed output signal may be altered.
GES may restore a broadband envelope of a synthesized output signal. GES includes a modified upmixing process after flattening and reshaping an envelope with respect to a direct signal portion for each of output signal channels.
Additional information of a parametric broadband envelope included in a bitstream may be used for reshaping. The additional information includes an envelope ratio between an envelope of an original input signal and an envelope of a downmix signal. The decoder may apply the envelope ratio to a direct signal portion of each of time slots included in a frame for each of output signal channels. Due to GES, a diffuse signal portion for each output signal channel is not altered.
If bsTempShapeConfig=2, a GES process may be performed. If GES is available, each of a diffuse signal and a direct signal of an output signal may be synthesized using a post mixing matrix M2 modified in a hybrid subband domain according to Equation 11.
$$y_{direct}^{n,k} = M_2^{n,k}\, w_{direct}^{n,k}, \qquad y_{diffuse}^{n,k} = M_2^{n,k}\, w_{diffuse}^{n,k}, \qquad 0 \le k < K,\ 0 \le n < numSlots$$ [Equation 11]
In Equation 11, a direct signal portion for an output signal y provides a direct signal and a residual signal, and a diffuse signal portion for the output signal y provides a diffuse signal. Overall, only the direct signal may be processed using GES.
A GES processing result may be determined according to Equation 12.
$$y_{ges}^{n,k} = y_{direct}^{n,k} + y_{diffuse}^{n,k}$$ [Equation 12]
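As a non-limiting illustration, the combination in Equation 12 may be sketched as follows. The function name `apply_ges` and the argument `env_ratio` are hypothetical. The sketch assumes that the envelope ratio is applied as a plain gain per time slot to the direct signal portion before the sum, while the diffuse signal portion is passed through unaltered.

```python
def apply_ges(y_direct, y_diffuse, env_ratio):
    """Reshape the direct portion per time slot and add the unaltered diffuse portion.

    y_direct, y_diffuse: lists of time slots, each a list of subband samples.
    env_ratio: one assumed envelope gain per time slot.
    """
    if not (len(y_direct) == len(y_diffuse) == len(env_ratio)):
        raise ValueError("all inputs must cover the same number of time slots")
    y_ges = []
    for direct_slot, diffuse_slot, ratio in zip(y_direct, y_diffuse, env_ratio):
        reshaped = [ratio * s for s in direct_slot]  # only the direct part is altered
        y_ges.append([d + f for d, f in zip(reshaped, diffuse_slot)])
    return y_ges


y = apply_ges([[1.0, 2.0]], [[0.5, 0.5]], [2.0])
```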
GES may extract an envelope with respect to a downmix signal for performing spatial synthesis aside from an LFE channel depending on a tree structure and a specific channel of an output signal upmixed from the downmix signal by the decoder.
In the N-N/2-N configuration, an output signal choutput may be defined as expressed by Table 2.
In the N-N/2-N configuration, an input signal chinput may be defined as expressed by Table 3.
Also, in the N-N/2-N configuration, a downmix signal Dch(choutput) may be defined as expressed by Table 4.
Hereinafter, the matrix M1 (M1n,k) and the matrix M2 (M2n,k) defined with respect to all of time slots n and all of hybrid subbands k will be described. The matrices are interpolated versions of R1l,mG1l,mHl,m and R2l,m, which are defined with respect to a given parameter time slot l and a given processing band m based on channel level difference (CLD), inter-channel correlation (ICC), and channel prediction coefficient (CPC) parameters valid for the parameter time slot and the processing band.
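As a non-limiting illustration, the interpolation from parameter time slots to individual time slots may be sketched for a single matrix element as follows. The function name `interpolate_parameter` is hypothetical, and linear interpolation between the previous and the current parameter set is an assumption made for this sketch only.

```python
def interpolate_parameter(prev_value, curr_value, num_slots):
    """Linearly move one matrix element from the previous to the current parameter set.

    Returns one value per time slot; the last slot reaches curr_value.
    Assumption: interpolation is linear across the time slots.
    """
    if num_slots < 1:
        raise ValueError("at least one time slot is required")
    step = (curr_value - prev_value) / num_slots
    return [prev_value + (n + 1) * step for n in range(num_slots)]


per_slot = interpolate_parameter(0.0, 1.0, 4)
```

The same routine may be applied element by element, including to complex-valued elements, so that the matrix varies smoothly between parameter sets.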
<Definition of Matrix M1 (pre-matrix)>
A process of inputting a downmix signal to decorrelators used at the decoder in the N-N/2-N configuration of
A size of the matrix M1 depends on the number of channels of a downmix signal input to the matrix M1 and the number of decorrelators used at the decoder. Here, elements of the matrix M1 may be derived from CLD and/or CPC parameters. The matrix M1 may be defined by Equation 13.
In Equation 13,
Meanwhile, W1n,k may be smoothed according to Equation 14.
In Equation 14, in each of K (k) and Kkonj(k,x), a first row is a hybrid subband k, a second row is a processing band, and a third row is a complex conjugation x* of x with respect to a specific hybrid subband k. Further, W1−1,k denotes a last parameter set of a previous frame.
Matrices R1l,m, G1l,m, and Hl,m for the matrix M1 may be defined as follows:
(1) Matrix R1:
The matrix R1 may control the number of signals that are input to decorrelators, and may be expressed as a function of CLD and CPC since a decorrelated signal is not added.
The matrix R1l,m may be differently defined based on a channel configuration. In the N-N/2-N configuration, all of input signal channels may be input in pairs to an OTT box to prevent OTT boxes from being cascaded. In the N-N/2-N configuration, the number of OTT boxes is N/2.
In this case, the matrix R1l,m depends on the number of OTT boxes equal to a column size of the vector Xn,k that includes an input signal. However, LFE upmix based on an OTT box does not require a decorrelator and thus, is not considered in the N-N/2-N configuration. All of elements of the matrix R1l,m may be either 1 or 0.
In the N-N/2-N configuration, the matrix R1l,m may be defined by Equation 15.
In the N-N/2-N configuration, all of the OTT boxes represent parallel processing stages instead of cascade. Accordingly, in the N-N/2-N configuration, none of the OTT boxes are connected to other OTT boxes. The matrix R1l,m may be configured using unit matrix INumInCh and unit matrix INumInCh-NumLfe. Here, unit matrix IN may be a unit matrix with the size of N*N.
(2) Matrix G1:
To handle a downmix signal, or an externally supplied downmix signal, prior to MPS decoding, correction factors controlled by the data stream may be applied. A correction factor may be applied to the downmix signal or the externally supplied downmix signal based on the matrix G1l,m.
The matrix G1l,m may guarantee that a level of a downmix signal for a specific time/frequency tile represented by a parameter is equal to a level of a downmix signal obtained when an encoder estimates a spatial parameter.
The matrix G1l,m may be classified into three cases: (i) a case in which external downmix compensation is absent (bsArbitraryDownmix=0), (ii) a case in which parameterized external downmix compensation is present (bsArbitraryDownmix=1), and (iii) a case in which residual coding based on external downmix compensation is performed (bsArbitraryDownmix=2). If bsArbitraryDownmix=1, the decoder does not support the residual coding based on the external downmix compensation.
If the external downmix compensation is not applied in the N-N/2-N configuration (bsArbitraryDownmix=0), the matrix G1l,m in the N-N/2-N configuration may be defined by Equation 16.
$$G_1^{l,m} = \left[\, I_{NumInCh} \mid O_{NumInCh} \,\right]$$ [Equation 16]
In Equation 16, INumInCh denotes a unit matrix with a size of NumInCh*NumInCh and ONumInCh denotes a zero matrix with a size of NumInCh*NumInCh.
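As a non-limiting illustration, the block structure of Equation 16 may be built as follows. The function names are hypothetical, and the matrix is represented as a plain list of rows for simplicity.

```python
def identity(n):
    """n-by-n unit matrix as a list of rows."""
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def zeros(n):
    """n-by-n zero matrix as a list of rows."""
    return [[0.0] * n for _ in range(n)]


def build_g1_no_compensation(num_in_ch):
    """Concatenate the unit matrix and the zero matrix side by side: [I | O]."""
    return [i_row + o_row for i_row, o_row in zip(identity(num_in_ch), zeros(num_in_ch))]


g1 = build_g1_no_compensation(2)
# g1 == [[1.0, 0.0, 0.0, 0.0],
#        [0.0, 1.0, 0.0, 0.0]]
```

The result has NumInCh rows and 2*NumInCh columns; the left half passes each downmix channel through unchanged and the right half contributes nothing when no external downmix compensation is applied.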
On the contrary, if the external downmix compensation is applied in the N-N/2-N configuration (bsArbitraryDownmix=1), the matrix G1l,m in the N-N/2-N configuration may be defined by Equation 17:
In Equation 17, gXl,m=G(X,l,m), 0≤X<NumInCh, 0≤m<Mproc, 0≤l<L.
Meanwhile, if residual coding based on the external downmix compensation is applied in the N-N/2-N configuration (bsArbitraryDownmix=2), the matrix G1l,m may be defined by Equation 18:
In Equation 18, gXl,m=G(X,l,m), 0≤X<NumInCh, 0≤m<Mproc, 0≤l<L, and α may be updated.
(3) Matrix H:
In the N-N/2-N configuration, the number of downmix signal channels may be 5 or more. Accordingly, inverse matrix H may be a unit matrix having a size corresponding to the number of columns of vector Xn,k of an input signal with respect to all of parameter sets and processing bands.
<Definition of Matrix M2 (post-matrix)>
In the N-N/2-N configuration, M2n,k that is the matrix M2 defines a combination between a direct signal and a decorrelated signal in order to generate a multi-channel output signal. M2n,k may be defined by Equation 19:
In Equation 19,
Meanwhile, W2l,k may be smoothed according to Equation 20.
In Equation 20, in each of K(k) and Kkonj(k,x), a first row is a hybrid subband k, a second row is a processing band, and a third row is a complex conjugation x* of x with respect to a specific hybrid subband k. Further, W2−1,k denotes a last parameter set of a previous frame.
An element of the matrix R2n,k for the matrix M2 may be calculated from an equivalent model of an OTT box. The OTT box includes a decorrelator and a mixing processor. A mono input signal input to the OTT box may be transferred to each of the decorrelator and the mixing processor. The mixing processor may generate a stereo output signal based on the mono input signal, a decorrelated signal output through the decorrelator, and CLD and ICC parameters. Here, CLD controls localization in a stereo field and ICC controls a stereo wideness of an output signal.
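As a non-limiting illustration, the mixing performed in one OTT box may be sketched as follows. The function names are hypothetical. The gain rule is an assumption borrowed from common parametric stereo practice: the CLD, taken in dB, splits the energy between the two outputs, and the ICC sets the angle between the contributions of the mono signal and the decorrelated signal. The sketch is not asserted to reproduce the exact matrix elements of the embodiments.

```python
import math


def ott_gains(cld_db, icc):
    """Return ((h11, h12), (h21, h22)) for one OTT box (assumed gain rule)."""
    ratio = 10.0 ** (cld_db / 10.0)           # energy ratio between the two outputs
    c1 = math.sqrt(ratio / (1.0 + ratio))     # level of the first output
    c2 = math.sqrt(1.0 / (1.0 + ratio))       # level of the second output
    alpha = 0.5 * math.acos(max(-1.0, min(1.0, icc)))
    beta = math.atan(math.tan(alpha) * (c2 - c1) / (c2 + c1))
    first = (c1 * math.cos(alpha + beta), c1 * math.sin(alpha + beta))
    second = (c2 * math.cos(beta - alpha), c2 * math.sin(beta - alpha))
    return first, second


def ott_upmix(mono, decorrelated, cld_db, icc):
    """Mix a mono signal and its decorrelated version into two output channels."""
    (h11, h12), (h21, h22) = ott_gains(cld_db, icc)
    first = [h11 * m + h12 * d for m, d in zip(mono, decorrelated)]
    second = [h21 * m + h22 * d for m, d in zip(mono, decorrelated)]
    return first, second


out_a, out_b = ott_upmix([1.0, 0.5], [0.2, -0.1], cld_db=6.0, icc=0.5)
```

With this rule, the two output gains have a power ratio equal to the CLD and a normalized cross term equal to the ICC, provided that the mono signal and the decorrelated signal are uncorrelated and have equal energy.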
A result output from an arbitrary OTT box may be defined by Equation 21.
The OTT box may be labeled with OTTx where 0≤X<numOttBoxes, and each of H11OTT
Here, a post gain matrix may be defined by Equation 22.
In Equation 22,
Meanwhile,
where λ0=−11/72 for 0≤m<Mproc,0≤l<L.
Further,
Here, in the N-N/2-N configuration, R2l,m may be defined by Equation 23.
In Equation 23, CLD and ICC may be defined by Equation 24.
$$CLD_X^{l,m} = D_{CLD}(X, l, m)$$
$$ICC_X^{l,m} = D_{ICC}(X, l, m)$$ [Equation 24]
In Equation 24, 0≤X<NumInCh,0≤m<Mproc,0≤l<L.
<Definition of Decorrelator>
In the N-N/2-N configuration, decorrelators may be executed by reverberation filters in a QMF subband domain. The reverberation filters may represent different filter characteristics based on a current corresponding hybrid subband among all of hybrid subbands.
A reverberation filter refers to an infinite impulse response (IIR) lattice filter. IIR lattice filters have different filter coefficients with respect to different decorrelators in order to generate mutually decorrelated orthogonal signals.
A decorrelation process performed by a decorrelator may proceed through a plurality of processes. Initially, vn,k that is an output of the matrix M1 is input to an all-pass decorrelation filter set. Filtered signals may be energy-shaped. Here, energy shaping indicates shaping a spectral or temporal envelope so that decorrelated signals match the input signals more closely.
The input signal vXn,k input to an arbitrary decorrelator is a portion of the vector vn,k. To guarantee orthogonality between decorrelated signals derived through a plurality of decorrelators, the plurality of decorrelators has different filter coefficients.
A decorrelator filter includes a constant frequency-dependent delay and a plurality of all-pass IIR sections. A frequency axis may be divided into different regions to correspond to QMF divisional frequencies. For each region, the length of the delay and the lengths of the filter coefficient vectors are the same. The filter coefficients of a decorrelator that has a fractional delay due to additional phase rotation depend on a hybrid subband index.
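The all-pass property that the decorrelation filters rely on can be sketched with a first-order IIR all-pass section in Python. This is a simplified stand-in chosen for illustration only. The lattice filters and coefficients actually used by the decorrelators are not reproduced here, and the coefficient values 0.5 and -0.3 as well as the function names are hypothetical.

```python
from typing import List, Sequence


def allpass_first_order(x: Sequence[float], g: float) -> List[float]:
    """Filter x with H(z) = (-g + z^-1) / (1 - g z^-1), which has unit magnitude."""
    if not -1.0 < g < 1.0:
        raise ValueError("g must satisfy |g| < 1 for a stable filter")
    y: List[float] = []
    x_prev = 0.0
    y_prev = 0.0
    for x_n in x:
        # Difference equation: y[n] = -g*x[n] + x[n-1] + g*y[n-1]
        y_n = -g * x_n + x_prev + g * y_prev
        y.append(y_n)
        x_prev, y_prev = x_n, y_n
    return y


def energy(signal: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(s * s for s in signal)


# Two sections with different (hypothetical) coefficients applied to a unit impulse.
impulse = [1.0] + [0.0] * 63
out_a = allpass_first_order(impulse, 0.5)
out_b = allpass_first_order(impulse, -0.3)
```

Both outputs keep the energy of the input while differing in waveform, which is the reason all-pass sections are suitable as a building block for decorrelation. The sketch does not by itself demonstrate orthogonality between the two outputs.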
As described above, the filters of the decorrelators have different filter coefficients to guarantee orthogonality between the decorrelated signals that are output from the decorrelators. In the N-N/2-N configuration, N/2 decorrelators are required. Here, in the N-N/2-N configuration, the number of distinct decorrelators may be limited to 10. In the N-N/2-N configuration in which an LFE mode is absent, if N/2, i.e., the number of OTT boxes, exceeds 10, the OTT boxes beyond the tenth may reuse the existing decorrelators according to a modulo-10 operation.
Table 5 shows an index of a decorrelator in the decoder of the N-N/2-N configuration. Referring to Table 5, indices of N/2 decorrelators are repeated based on a unit of “10”. That is, a zero-th decorrelator and a tenth decorrelator have the same index of D1OTT ( ). In detail, if N, i.e., the number of output signal channels exceeds M corresponding to a preset number of channels, the decorrelator may include a first decorrelator corresponding to a channel of M or less and a second decorrelator corresponding to a channel greater than M. The second decorrelator may reuse a filter set of the first decorrelator.
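The reuse of decorrelator indices in units of 10 can be sketched as follows. The function name decorrelator_index and the constant REUSE_PERIOD are hypothetical names introduced for this illustration, and the sketch assumes that OTT boxes are numbered from zero.

```python
REUSE_PERIOD = 10  # number of distinct decorrelator filter sets, per the description above


def decorrelator_index(ott_index: int, reuse_period: int = REUSE_PERIOD) -> int:
    """Map an OTT box index to the index of the decorrelator filter set it uses."""
    if ott_index < 0:
        raise ValueError("ott_index must be non-negative")
    return ott_index % reuse_period


# Example: N = 24 output channels without LFE gives N/2 = 12 OTT boxes.
indices_for_24_channels = [decorrelator_index(i) for i in range(24 // 2)]
```

For 12 OTT boxes, the eleventh and twelfth boxes (indices 10 and 11) map back to filter sets 0 and 1.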
The N-N/2-N configuration may be configured based on syntax as expressed by Table 6.
Here, bsTreeConfig may be expressed by Table 7. Table 7 shows a configuration of a decoding apparatus in the N-N/2-N configuration if bsTreeConfig=7. The number (numOttBoxes) of OTT boxes is equal to the number of downmix signal channels (NumInCh). The number of TTT boxes is zero.
Here, if bsTreeConfig=0,1,2,3,4,5,6, Table 40 of ISO/IEC 23003-1:2007 corresponding to the MPS standard is defined by Table 8.
In the N-N/2-N configuration, the number of downmix signal channels, i.e., bsNumInCh, may be expressed by Table 9.
Here, NumInCh denotes the number of channels of a downmix signal input to the decoding apparatus in the N-N/2-N configuration, and NumOutCh denotes the number of output signal channels by upmixing the downmix signal. In the N-N/2-N configuration, NLFE, i.e., the number of LFE channels among output signals may be expressed by Table 10. NumLfe denotes the number of LFE channels (NLFE) in the N-N/2-N configuration.
In the N-N/2-N configuration, channel ordering of output signals may be performed based on the number of output signal channels and the number of LFE channels as expressed by Table 11.
In Table 6, bsHasSpeakerConfig denotes a flag indicating whether a layout of an output signal to be played is different from a layout corresponding to channel ordering in Table 11. If bsHasSpeakerConfig==1, audioChannelLayout that is a layout of a loudspeaker for actual play may be used for rendering.
In addition, audioChannelLayout denotes the layout of the loudspeaker for actual play.
If the output signal includes an LFE channel, a channel order of the LFE channel may be determined to satisfy (i) a condition that the LFE channel is processed by an OTT box together with a channel other than an LFE channel and (ii) a condition that the LFE channel is located at a last position in a channel list. For example, the LFE channel is located at a last position among L, Lv, R, Rv, Ls, Lss, Rs, Rss, C, LFE, Cvr, and LFE2 that are included in the channel list.
The N-N/2-N structure of
Referring to
A tree structure on the left of
If the LFE channel is not included in the N-channel output signal, the N/2 OTT boxes may generate the N-channel output signal using a residual signal (res) and a downmix signal (M). However, if the LFE channel is included in the N-channel output signal, an OTT box from which the LFE channel is output among the N/2 OTT boxes may use only a downmix signal, without a residual signal.
In addition, if the LFE channel is included in the N-channel output signal, an OTT box from which the LFE channel is not output among the N/2 OTT boxes may upmix a downmix signal using CLD and ICC and an OTT box from which the LFE channel is output may upmix a downmix signal using only CLD.
If the LFE channel is included in the N-channel output signal, an OTT box from which the LFE channel is not output among the N/2 OTT boxes generates a decorrelated signal through a decorrelator and an OTT box from which the LFE channel is output does not perform a decorrelation process and thus, does not generate a decorrelated signal.
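The per-box rules above can be summarized in a small configuration sketch. The class OttBoxConfig and the function configure_ott_boxes are hypothetical names. The sketch assumes that the OTT boxes producing LFE channels are the last ones in the box order, which is an illustrative choice and not a rule stated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OttBoxConfig:
    index: int
    outputs_lfe: bool
    uses_residual: bool
    uses_icc: bool
    uses_decorrelator: bool


def configure_ott_boxes(num_out_ch: int, num_lfe: int) -> List[OttBoxConfig]:
    """Build N/2 parallel OTT box configurations for an N-channel output."""
    if num_out_ch <= 0 or num_out_ch % 2 != 0:
        raise ValueError("num_out_ch must be a positive even number")
    num_boxes = num_out_ch // 2
    if not 0 <= num_lfe <= num_boxes:
        raise ValueError("num_lfe must be between 0 and N/2")
    boxes = []
    for i in range(num_boxes):
        # Assumption: LFE-producing boxes occupy the last num_lfe positions.
        is_lfe = i >= num_boxes - num_lfe
        boxes.append(
            OttBoxConfig(
                index=i,
                outputs_lfe=is_lfe,
                uses_residual=not is_lfe,      # LFE box: downmix signal only
                uses_icc=not is_lfe,           # LFE box: CLD only
                uses_decorrelator=not is_lfe,  # LFE box: no decorrelated signal
            )
        )
    return boxes
```

For a 12-channel output with 2 LFE channels, this yields 6 OTT boxes of which 4 use a decorrelator, matching N/2 minus the number of LFE channels.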
According to an example embodiment, an N/2-channel downmix signal may be generated from an N-channel input signal through MPS encoding. An N-channel output signal may be generated from the N/2-channel downmix signal through MPS decoding.
Although 1 channel, 2 channels, and 5.1 channels may be output as a downmix signal channel through an encoder in the existing MPS standard, the present disclosure is not limited thereto. The definition of additional syntax is required to support the number of downmix signal channels not defined in the existing MPS standard.
In the MPS standard, an input/output relationship may be defined through BsTreeConfig as shown in Table 8. A decoding process of an input signal and an output signal is defined based on BsTreeConfig.
BsTreeConfig 0 defines a process of generating a 1-channel downmix signal from a 6-channel (5.1-channel) input signal, and generating a 6-channel (5.1-channel) output signal from the 1-channel downmix signal. To this end, the decoder requires 5 OTT boxes and CLD may be applicable to each of the OTT boxes.
Here, defaultCLD [0˜5] may be defined as CLD that is input to an OTT box based on a position of the OTT box. CLD corresponding to the OTT box is enabled. That is, once the CLD is enabled, the CLD may be input to the OTT box. ottModeLfe also indicates whether an LFE channel is output from the OTT box.
According to Table 8 defined in the current MPS standard, defaultCLD [0˜5] corresponding to 6 OTT boxes are defined. The current MPS standard does not cover a case of generating 5 or more channels of a downmix signal where the number of channels of an input signal exceeds 10.
According to an example embodiment, it is possible to process an input signal having a number of channels different from the number of channels defined in the existing MPS standard by applying a reserved bit to the MPS standard. For example, if the number of input signal channels is N=24 and the number of downmix signal channels is 12, a definition may be made as shown in Table 12.
The decoder of
In
A vector v 1003 of
Equation 25 corresponds to Equation 1. If a residual signal (res) is absent in Equation 25, xM0 to xM11 may be mapped to vM0 to vM11. The same number of decorrelated signals as the number of downmix signals may be derived.
A vector w 1004 may be determined according to Equation 26.
Equation 26 corresponds to Equation 2. The decorrelator 1007 operates if the residual signal is absent. That is, if the residual signal is absent, the decorrelated signal may be generated. D( ) is used when the decorrelator generates the decorrelated signal. In Equation 26, if the residual signal is present, δi=0, and otherwise, δi=1. That is, if δi=1, the decorrelated signal may be generated according to Equation 15.
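The role of δi can be sketched as a per-box selection between a residual signal and a decorrelated signal. The layout of the vector w assumed below, with the direct signals first and one entry per OTT box after them, is an assumption for illustration. The function name build_w is hypothetical, and a missing residual is represented by None.

```python
from typing import Callable, List, Optional, Sequence


def build_w(
    direct: Sequence[float],
    dec_inputs: Sequence[float],
    residuals: Sequence[Optional[float]],
    decorrelators: Sequence[Callable[[float], float]],
) -> List[float]:
    """Assemble w from direct signals and, per OTT box, a residual or decorrelated signal."""
    if not (len(dec_inputs) == len(residuals) == len(decorrelators)):
        raise ValueError("one decorrelator input, residual slot, and decorrelator per OTT box")
    w = list(direct)
    for v_i, res_i, d_i in zip(dec_inputs, residuals, decorrelators):
        # delta_i = 0 when a residual signal is present, 1 otherwise.
        delta_i = 0 if res_i is not None else 1
        if delta_i == 1:
            w.append(d_i(v_i))   # residual absent: use the decorrelated signal
        else:
            w.append(res_i)      # residual present: decorrelator is bypassed
    return w
```

In this sketch, a box with a residual never calls its decorrelator, which mirrors the statement that the decorrelator operates only if the residual signal is absent.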
In
A process of deriving the matrix M1 1002 and the matrix M2 1005 may refer to description of
(the number of “1” is 12 and is equal to the number of downmix signal channels)
In Equation 29, HLL, HLR, HRL, and HRR may be derived from CLD and ICC corresponding to each OTT box.
Herein, proposed is a parallel OTT-based MPS decoder that may generate an N-channel output signal from an N/2-channel downmix signal based on newly defined BsTreeConfig information.
Referring to
A multichannel audio signal processing method according to an example embodiment may include identifying a residual signal and an N/2-channel downmix signal generated from an N-channel input signal; applying the N/2-channel downmix signal and the residual signal to a first matrix; outputting a first signal input to N/2 decorrelators corresponding to N/2 OTT boxes through the first matrix and a second signal transferred to a second matrix instead of being input to the N/2 decorrelators; outputting a decorrelated signal from the first signal through the N/2 decorrelators; applying the decorrelated signal and the second signal to the second matrix; and generating an N-channel output signal through the second matrix.
If an LFE channel is not included in the N-channel output signal, the N/2 decorrelators may correspond to the N/2 OTT boxes, respectively.
If the number of decorrelators exceeds a reference value of a modulo operation, an index of a decorrelator may be repeatedly reused based on the reference value.
If the LFE channel is included in the N-channel output signal, a number of decorrelators equal to N/2 minus the number of LFE channels may be used. The LFE channel may not use the decorrelator of the OTT box.
If a temporal shaping tool is not used, a single vector that includes the second signal, the decorrelated signal derived from the decorrelator, and the residual signal derived from the decorrelator may be input to the second matrix.
Conversely, if the temporal shaping tool is used, a vector corresponding to a direct signal including the second signal and the residual signal derived from the decorrelator and a vector corresponding to a diffuse signal including the decorrelated signal derived from the decorrelator may be input to the second matrix.
The generating of the N-channel output signal may include shaping a temporal envelope of an output signal by applying a scale factor according to the diffuse signal and the direct signal to a diffuse signal portion of the output signal if an STP is used.
The generating of the N-channel output signal may include flattening and reshaping an envelope with respect to a direct signal portion for each channel of the N-channel output signal if GES is used.
A size of the first matrix may be determined based on the number of decorrelators and the number of downmix signal channels used to apply the first matrix, and an element of the first matrix may be determined based on a CLD parameter or a CPC parameter.
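The sequence of steps in the method above can be sketched for a single time-frequency sample as follows. This is a minimal sketch under stated assumptions: the residual signal is omitted, the first matrix has one row per downmix channel followed by one row per decorrelator, and the decorrelators are passed in as placeholder callables. The names matvec and upmix_tile are hypothetical.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    """Multiply matrix m by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def upmix_tile(
    downmix: Sequence[float],
    m1: Matrix,
    m2: Matrix,
    decorrelators: Sequence[Callable[[float], float]],
) -> List[float]:
    """Apply the first matrix, the decorrelators, and the second matrix in turn."""
    v = matvec(m1, downmix)
    n_direct = len(v) - len(decorrelators)
    if n_direct < 0:
        raise ValueError("first matrix has fewer rows than there are decorrelators")
    second_signal = v[:n_direct]   # passed to the second matrix directly
    first_signal = v[n_direct:]    # routed through the decorrelators
    decorrelated = [d(s) for d, s in zip(decorrelators, first_signal)]
    w = second_signal + decorrelated
    return matvec(m2, w)
```

With a 2-channel downmix and 2 decorrelators, the first matrix in this sketch has 4 rows and 2 columns and the second matrix has 4 rows and 4 columns, so that a 4-channel output is produced.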
A multichannel audio signal processing method according to an example embodiment may include identifying an N/2-channel downmix signal and an N/2-channel residual signal; and generating an N-channel output signal by inputting the N/2-channel downmix signal and the N/2-channel residual signal to each of the N/2 OTT boxes. Here, the N/2 OTT boxes are disposed in parallel without mutual connection. Among the N/2 OTT boxes, an OTT box from which an LFE channel is output (1) receives only a downmix signal and no residual signal, (2) uses only the CLD parameter out of the CLD parameter and an ICC parameter, and (3) does not output a decorrelated signal through a decorrelator.
A multichannel signal processing apparatus according to an example embodiment includes a processor to implement a multichannel signal processing method, and the multichannel signal processing method may include identifying a residual signal and an N/2-channel downmix signal generated from an N-channel input signal; applying the N/2-channel downmix signal and the residual signal to a first matrix; outputting a first signal input to N/2 decorrelators corresponding to N/2 OTT boxes through the first matrix and a second signal transferred to a second matrix instead of being input to the N/2 decorrelators; outputting a decorrelated signal from the first signal through the N/2 decorrelators; applying the decorrelated signal and the second signal to a second matrix; and generating an N-channel output signal through the second matrix.
If an LFE channel is not included in the N-channel output signal, the N/2 decorrelators may correspond to the N/2 OTT boxes, respectively.
If the number of decorrelators exceeds a reference value of a modulo operation, an index of a decorrelator may be repeatedly reused based on the reference value. If the LFE channel is included in the N-channel output signal, a number of decorrelators equal to N/2 minus the number of LFE channels may be used. The LFE channel may not use the decorrelator of the OTT box.
If a temporal shaping tool is not used, a single vector that includes the second signal, the decorrelated signal derived from the decorrelator, and the residual signal derived from the decorrelator may be input to the second matrix.
Conversely, if the temporal shaping tool is used, a vector corresponding to a direct signal including the second signal and the residual signal derived from the decorrelator and a vector corresponding to a diffuse signal including the decorrelated signal derived from the decorrelator may be input to the second matrix.
The generating of the N-channel output signal may include shaping a temporal envelope of an output signal by applying a scale factor according to the diffuse signal and the direct signal to a diffuse signal portion of the output signal if an STP is used.
The generating of the N-channel output signal may include flattening and reshaping an envelope with respect to a direct signal portion for each channel of the N-channel output signal if GES is used.
A size of the first matrix may be determined based on the number of decorrelators and the number of downmix signal channels used to apply the first matrix, and an element of the first matrix may be determined based on a CLD parameter or a CPC parameter.
A multichannel signal processing apparatus according to another example embodiment includes a processor to perform a multichannel signal processing method, and the multichannel signal processing method may include identifying an N/2-channel downmix signal and an N/2-channel residual signal; and generating an N-channel output signal by inputting the N/2-channel downmix signal and the N/2-channel residual signal to each of the N/2 OTT boxes.
Here, the N/2 OTT boxes are disposed in parallel without mutual connection. Among the N/2 OTT boxes, an OTT box that outputs an LFE channel (1) receives only a downmix signal and no residual signal, (2) uses only the CLD parameter out of the CLD parameter and an ICC parameter, and (3) does not output a decorrelated signal through a decorrelator.
The embodiments described herein may be implemented using hardware components, software components, and/or a combination of hardware components and software components. For example, the processing device(s) described herein may include a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of responding to and executing instructions in a defined manner. The processing device may run an operating system (OS) and one or more software applications that run on the OS. The processing device also may access, store, manipulate, process, and create data in response to execution of the software. For purpose of simplicity, the description of a processing device is used as singular; however, one skilled in the art will appreciate that a processing device may include multiple processing elements and multiple types of processing elements. For example, a processing device may include multiple processors or a processor and a controller. In addition, different processing configurations are possible, such as parallel processors.
The software may include a computer program, a piece of code, an instruction, or some combination thereof, to independently or collectively instruct and/or configure the processing device to operate as desired, thereby transforming the processing device into a special purpose processor. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical or virtual equipment, computer storage medium or device, or in a propagated signal wave capable of providing instructions or data to or being interpreted by the processing device. The software also may be distributed over network coupled computer systems so that the software is stored and executed in a distributed fashion. The software and data may be stored by one or more non-transitory computer readable recording mediums.
The methods according to the above-described embodiments may be recorded in non-transitory computer-readable media including program instructions to implement various operations of the above-described embodiments. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. The program instructions recorded on the media may be those specially designed and constructed for the purposes of embodiments, or they may be of the kind well-known and available to those having skill in the computer software arts. Examples of non-transitory computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM discs, DVDs, and/or Blu-ray discs; magneto-optical media such as optical discs; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory (e.g., USB flash drives, memory cards, memory sticks, etc.), and the like. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. The above-described devices may be configured to act as one or more software modules in order to perform the operations of the above-described embodiments, or vice versa.
Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While this disclosure includes specific example embodiments, it will be apparent to one of ordinary skill in the art that various changes and modifications in form and details may be made in these example embodiments without departing from the spirit and scope of the claims and their equivalents. For example, suitable results may be achieved if the described techniques are performed in a different order, and/or if components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents. Therefore, the scope of the disclosure is defined not by the detailed description, but by the claims and their equivalents, and all variations within the scope of the claims and their equivalents are to be construed as being included in the disclosure.
Number | Date | Country | Kind |
---|---|---|---|
10-2015-0024464 | Feb 2015 | KR | national |
10-2016-0018462 | Feb 2016 | KR | national |
The present application is a continuation of U.S. patent application Ser. No. 15/551,734 filed Aug. 17, 2017, which is a U.S. national stage application of International Application No. PCT/KR2016/001613 filed on Feb. 17, 2016, which claims the benefit under 35 U.S.C. § 119(a) of Korean Patent Application Nos. 10-2015-0024464 filed on Feb. 17, 2015, and 10-2016-0018462 filed on Feb. 17, 2016, in the Korean Intellectual Property Office, the entire disclosures of which are incorporated herein by reference for all purposes.
Number | Date | Country | |
---|---|---|---|
20190200150 A1 | Jun 2019 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 15551734 | US | |
Child | 16290469 | US |