Embodiments of the present invention relate to a multichannel audio signal processing apparatus included in a three-dimensional (3D) audio decoder and a multichannel audio signal processing method.
As the quality of multimedia content has improved, high quality multichannel audio signals having a relatively large number of channels compared to an existing 5.1 channel audio signal, such as a 7.1 channel audio signal, a 10.2 channel audio signal, a 13.2 channel audio signal, and a 22.2 channel audio signal, have come into use. However, in many cases, the high quality multichannel audio signal may be listened to with a 2-channel stereo loudspeaker or a headphone through a personal terminal such as a smartphone or a personal computer (PC).
Accordingly, binaural rendering technology for down-mixing a multichannel audio signal to a stereo audio signal has been developed to make it possible to listen to the high quality multichannel audio signal with a 2-channel stereo loudspeaker or a headphone.
The existing binaural rendering may generate a binaural stereo audio signal by filtering each channel of a 5.1 channel audio signal or a 7.1 channel audio signal through a binaural filter such as a head related transfer function (HRTF) or a binaural room impulse response (BRIR). In the existing method, an amount of filtering calculation may increase according to an increase in the number of channels of an input multichannel audio signal.
Accordingly, in a case in which an amount of calculation increases according to an increase in the number of channels of a multichannel audio signal, such as a 10.2 channel audio signal and a 22.2 channel audio signal, it may be difficult to perform a real-time calculation for playback using a 2-channel stereo loudspeaker or a headphone. In particular, a mobile terminal having a relatively low calculation capability may not readily perform a binaural filtering calculation in real time according to an increase in the number of channels of a multichannel audio signal.
Accordingly, there is a need for a method that may decrease an amount of calculation required for binaural filtering to make it possible to perform a real-time calculation when rendering a high quality multichannel audio signal having a relatively large number of channels to a binaural signal.
An aspect of the present invention provides an apparatus and method that may down-mix an input multichannel audio signal and then perform binaural rendering, thereby decreasing an amount of calculation required for binaural rendering even when the number of channels of the multichannel audio signal increases.
According to an aspect of the present invention, there is provided a multichannel audio signal processing method including: generating an N-channel audio signal of N channels by down-mixing an M-channel audio signal of M channels; and generating a stereo audio signal by performing binaural rendering of the N-channel audio signal.
The generating of the stereo audio signal may include: generating channel-by-channel stereo audio signals using filters corresponding to playback locations of channel-by-channel audio signals of the N channels; and generating the stereo audio signal by mixing the channel-by-channel stereo audio signals.
The generating of the stereo audio signal may include generating the stereo audio signal using a plurality of binaural renderers respectively corresponding to the channels of the N-channel audio signal.
According to another aspect of the present invention, there is provided a multichannel audio signal processing method including: sub-sampling the number of channels of the multichannel audio signal based on a virtual loudspeaker layout; and generating a stereo audio signal by performing binaural rendering of the sub-sampled multichannel audio signal.
The generating of the stereo audio signal may include performing binaural rendering of the sub-sampled multichannel audio signal in a frequency domain.
The generating of the stereo audio signal may include generating the stereo audio signal using a plurality of binaural renderers respectively corresponding to the channels of the N-channel audio signal.
According to still another aspect of the present invention, there is provided a multichannel audio signal processing method including: sub-sampling the number of channels of the multichannel audio signal based on a three-dimensional (3D) loudspeaker layout; and generating a stereo audio signal by performing binaural rendering of the sub-sampled multichannel audio signal.
The generating of the stereo audio signal may include performing binaural rendering of the sub-sampled multichannel audio signal in a frequency domain.
The generating of the stereo audio signal may include generating the stereo audio signal using a plurality of binaural renderers respectively corresponding to the channels of the N-channel audio signal.
According to still another aspect of the present invention, there is provided a multichannel audio signal processing apparatus including: a channel down-mixing unit configured to generate an N-channel audio signal of N channels by down-mixing an M-channel audio signal of M channels; and a binaural rendering unit configured to generate a stereo audio signal by performing binaural rendering of the N-channel audio signal.
The binaural rendering unit may generate channel-by-channel stereo audio signals using filters corresponding to playback locations of channel-by-channel audio signals of the N channels, and may generate the stereo audio signal by mixing the channel-by-channel stereo audio signals.
The binaural rendering unit may generate the stereo audio signal using a plurality of binaural renderers respectively corresponding to the channels of the N-channel audio signal.
According to still another aspect of the present invention, there is provided a multichannel audio signal processing apparatus including: a channel down-mixing unit configured to sub-sample the number of channels of a multichannel audio signal based on a virtual loudspeaker layout; and a binaural rendering unit configured to generate a stereo audio signal by performing binaural rendering of the sub-sampled multichannel audio signal.
The binaural rendering unit may perform binaural rendering of the sub-sampled multichannel audio signal in a frequency domain.
The binaural rendering unit may generate the stereo audio signal using a plurality of binaural renderers respectively corresponding to the channels of the N-channel audio signal.
According to still another aspect of the present invention, there is provided a multichannel audio signal processing apparatus including: a channel down-mixing unit configured to sub-sample the number of channels of the multichannel audio signal based on a 3D loudspeaker layout; and a binaural rendering unit configured to generate a stereo audio signal by performing binaural rendering of the sub-sampled multichannel audio signal.
The binaural rendering unit may perform binaural rendering of the sub-sampled multichannel audio signal in a frequency domain.
The binaural rendering unit may generate the stereo audio signal using a plurality of binaural renderers respectively corresponding to the channels of the N-channel audio signal.
According to embodiments of the present invention, it is possible to down-mix an input multichannel audio signal and then perform binaural rendering, thereby decreasing an amount of calculation required for binaural rendering even when the number of channels of the multichannel audio signal increases.
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures. A multichannel audio signal processing method according to an embodiment of the present invention may be performed by a multichannel audio signal processing apparatus according to an embodiment of the present invention.
Referring to
The channel down-mixing unit 110 may generate an N-channel audio signal of N channels by down-mixing an M-channel audio signal of M channels. Here, M, the number of input channels, is greater than N, the number of output channels (N<M).
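As a minimal sketch of such down-mixing, each of the N output channels may be formed as a weighted sum of the M input channels. The function name `downmix`, the 4-to-2 channel counts, and the gain values below are hypothetical and chosen only for illustration; the embodiments do not specify down-mix coefficients.

```python
def downmix(frames, matrix):
    """Down-mix M-channel frames to N channels (N < M).

    frames: list of frames, each a list of M samples (one per channel).
    matrix: N rows of M gains. The gains are illustrative assumptions,
            not coefficients taken from the described embodiments.
    """
    m = len(matrix[0])
    if any(len(row) != m for row in matrix):
        raise ValueError("every matrix row must hold M gains")
    out = []
    for frame in frames:
        if len(frame) != m:
            raise ValueError("frame does not have M channels")
        # Each output channel is a weighted sum of all input channels.
        out.append([sum(g * x for g, x in zip(row, frame)) for row in matrix])
    return out


# Toy example: M = 4 input channels, N = 2 output channels.
GAINS = [
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.5],
]
frames = [[1.0, 2.0, 4.0, 8.0], [0.0, 0.0, 2.0, 2.0]]
result = downmix(frames, GAINS)
print(result)  # [[3.0, 6.0], [1.0, 1.0]]
```

A down-mix that preserves 3D spatial information would use gains derived from the target channel layout; the values above do not attempt this.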
For example, when an M-channel audio signal includes three-dimensional (3D) spatial information, the channel down-mixing unit 110 may down-mix the M-channel audio signal to minimize loss of the 3D spatial information included in the M-channel audio signal. Here, the 3D spatial information may include a height channel.
For example, in the case of down-mixing the M-channel audio signal having a 3D channel layout to an N-channel audio signal having a two-dimensional (2D) channel layout, it may be difficult to reproduce 3D spatial information of the M-channel audio signal using the N-channel audio signal.
Accordingly, when the M-channel audio signal includes the 3D spatial information, the channel down-mixing unit 110 may down-mix the M-channel audio signal so that even the N-channel audio signal generated through down-mixing may include the 3D spatial information. In detail, when the M-channel audio signal includes the 3D spatial information, the channel down-mixing unit 110 may down-mix the M-channel audio signal based on a channel layout including the 3D spatial information.
For example, when an input multichannel audio signal has a 22.2 channel layout among 3D channel layouts, the channel down-mixing unit 110 may generate a 10.2 channel or 8.1 channel audio signal that provides a sound field similar to a 22.2 channel audio signal through down-mixing and also has the minimum number of channels.
The binaural rendering unit 120 may generate a stereo audio signal by performing binaural rendering of the N-channel audio signal generated by the channel down-mixing unit 110. For example, the binaural rendering unit 120 may generate channel-by-channel stereo audio signals using a plurality of binaural rendering filters corresponding to playback locations of channel-by-channel audio signals of the N channels of the N-channel audio signal, and may generate a single stereo audio signal by mixing the channel-by-channel stereo audio signals.
The channel down-mixing unit 110 may receive an M-channel audio signal 210 of M channels corresponding to a multichannel audio signal. The channel down-mixing unit 110 may output an N-channel audio signal 220 of N channels by down-mixing the M-channel audio signal 210. Here, the number of channels of the N-channel audio signal 220 may be less than the number of channels of the M-channel audio signal 210.
When the M-channel audio signal 210 includes 3D spatial information, the channel down-mixing unit 110 may down-mix the M-channel audio signal 210 to the N-channel audio signal 220 having a 3D layout to minimize loss of the 3D spatial information included in the M-channel audio signal.
The binaural rendering unit 120 may output a stereo audio signal 230 including a left channel 221 and a right channel 222 by performing binaural rendering of the N-channel audio signal 220.
Accordingly, the multichannel audio signal processing apparatus 100 may down-mix the input M-channel audio signal 210 and then perform binaural rendering of the resulting N-channel audio signal 220, instead of directly performing binaural rendering of the M-channel audio signal 210. Through this operation, the number of channels to be processed in binaural rendering decreases and thus, an amount of filtering calculation required for binaural rendering may decrease.
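The reduction may be illustrated by a rough count of multiply-accumulate operations per output sample. The sketch below assumes, purely for illustration, time-domain FIR binaural filters of equal length and a dense down-mix matrix. The function names, the tap count of 1024, and the treatment of every channel (including low-frequency channels) as a filtered channel are assumptions not taken from the embodiments.

```python
def binaural_macs(num_channels, taps):
    """Multiply-accumulates per output sample for direct binaural filtering.

    Each channel is filtered twice (left ear and right ear) with a
    filter of `taps` coefficients. Direct-form FIR filtering is assumed.
    """
    return 2 * num_channels * taps


def downmix_then_binaural_macs(m, n, taps):
    """Cost of a dense M-to-N down-mix followed by N-channel binaural filtering."""
    downmix_cost = n * m  # one multiply-accumulate per matrix entry
    return downmix_cost + binaural_macs(n, taps)


# Illustrative numbers: 24 input channels (22.2) and 12 output channels (10.2).
direct = binaural_macs(24, 1024)
reduced = downmix_then_binaural_macs(24, 12, 1024)
print(direct, reduced)  # 49152 24864
```

Under these assumptions the cost of the down-mix itself is small compared to the filtering it avoids, so the total is roughly proportional to N rather than M.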
The N-channel audio signal 220 down-mixed from the M-channel audio signal 210 may indicate N 1-channel mono audio signals. A binaural rendering unit 310 may perform binaural rendering of the N-channel audio signal 220 using N binaural rendering filters 410 that correspond to the N mono audio signals on a one-to-one basis.
Here, the binaural rendering filter 410 may generate a left channel audio signal and a right channel audio signal by performing binaural rendering of an input mono audio signal. Accordingly, when binaural rendering is performed by the binaural rendering unit 310, N left channel audio signals and N right channel audio signals may be generated.
The binaural rendering unit 310 may output the stereo audio signal 230 including a single left channel audio signal and a single right channel audio signal by mixing the N left channel audio signals and the N right channel audio signals. In detail, the binaural rendering unit 310 may output the stereo audio signal 230 by mixing channel-by-channel stereo audio signals generated by the plurality of binaural rendering filters 410.
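The filtering and mixing described above can be sketched as follows, assuming each binaural rendering filter is represented by a pair of finite impulse responses (one per ear) applied by time-domain convolution. The function names and the short impulse responses are hypothetical; practical HRTF or BRIR filters are much longer.

```python
def convolve(signal, ir):
    """Plain time-domain convolution of a mono signal with an impulse response."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def binaural_render(channels, filters):
    """Render N mono signals to one stereo pair.

    channels: list of N non-empty mono signals.
    filters:  list of N (left_ir, right_ir) pairs, one pair per channel.
              The impulse responses used below are illustrative only.
    """
    if len(channels) != len(filters):
        raise ValueError("one filter pair is required per channel")
    length = max(
        len(c) + max(len(h_l), len(h_r)) - 1
        for c, (h_l, h_r) in zip(channels, filters)
    )
    left = [0.0] * length
    right = [0.0] * length
    for mono, (h_l, h_r) in zip(channels, filters):
        # Each channel yields its own left and right signals,
        # which are mixed into the single stereo output.
        for k, v in enumerate(convolve(mono, h_l)):
            left[k] += v
        for k, v in enumerate(convolve(mono, h_r)):
            right[k] += v
    return left, right


channels = [[1.0, 0.0], [0.0, 1.0]]
filters = [([1.0, 0.5], [0.5, 0.25]), ([0.25], [1.0])]
left, right = binaural_render(channels, filters)
print(left)   # [1.0, 0.75, 0.0]
print(right)  # [0.5, 1.25, 0.0]
```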
The channel down-mixing unit 110 may receive and then down-mix a 22.2 channel audio signal 510. The channel down-mixing unit 110 may output a 10.2 channel or 8.1 channel audio signal 520 from the 22.2 channel audio signal 510. Since the 22.2 channel audio signal 510 includes 3D spatial information, the channel down-mixing unit 110 may output the 10.2 channel or 8.1 channel audio signal 520 that maintains a sound field similar to the 22.2 channel audio signal 510 and has the minimum number of channels.
The binaural rendering unit 120 may output a stereo audio signal 530 including a left channel audio signal and a right channel audio signal by performing binaural rendering on each of a plurality of mono audio signals constituting the down-mixed 10.2 channel or 8.1 channel audio signal 520.
The multichannel audio signal processing apparatus 100 may down-mix the input 22.2 channel audio signal 510 to the 10.2 channel or 8.1 channel audio signal 520, which has fewer channels than the 22.2 channel audio signal 510, and may input the down-mixed audio signal 520 to the binaural rendering unit 120. Accordingly, an amount of calculation required for binaural rendering may decrease compared to the existing method, and binaural rendering of a multichannel audio signal having a relatively large number of channels may be performed.
5.1 channel, 8.1 channel, 10.1 channel, and 22.2 channel audio signals may have input formats and output formats of
Referring to
Here, audio signals played back using the loudspeakers positioned on the upper layer, the top layer, and the lower layer may further include 3D spatial information compared to an audio signal played back using a loudspeaker positioned on a middle layer. For example, the 5.1 channel audio signal played back using only the loudspeaker positioned on the middle layer may not include 3D spatial information. The 22.2 channel, 8.1 channel, and 10.1 channel audio signals using the loudspeakers positioned on the upper layer, the top layer, and the lower layer may include 3D spatial information.
In this case, when an input multichannel audio signal is the 22.2 channel audio signal, the 22.2 channel audio signal may need to be down-mixed to the 10.1 channel or 8.1 channel audio signal including the 3D spatial information in order to maintain a sound field corresponding to a 3D effect of the 22.2 channel audio signal.
Referring to
The plurality of channel/prerendered objects, the plurality of objects, and the HOA signals may be input through a dynamic range control (DRC 1) and may be input to a format conversion unit, an object renderer, and a HOA renderer, respectively.
Output results of the format conversion unit, the object renderer, the HOA renderer, and a SAOC 3D decoder may be input to a mixer. An audio signal corresponding to a plurality of channels may be output from the mixer.
The audio signal corresponding to the plurality of channels, output from the mixer, may pass through a DRC 2 and then may be input to a DRC 3 or a frequency domain binaural renderer (FD-Bin), depending on the playback terminal. Here, FD-Bin indicates a binaural renderer operating in a frequency domain.
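As a hedged illustration of filtering in a frequency domain, the sketch below applies one impulse response to one mono signal by transforming both, multiplying the spectra, and transforming back. The embodiments do not specify which transform a frequency domain binaural renderer uses, so the plain discrete Fourier transform, the zero-padding scheme, and the function names here are assumptions. A practical renderer would use a fast transform or a filter bank and process the signal block by block.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), adequate for a short example."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(x):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def filter_in_frequency_domain(signal, ir):
    """Filter a mono signal with an impulse response via spectral multiplication."""
    n = len(signal) + len(ir) - 1
    # Zero-pad both inputs to the full output length so that the
    # circular convolution implied by the DFT equals linear convolution.
    s = dft(list(signal) + [0.0] * (n - len(signal)))
    h = dft(list(ir) + [0.0] * (n - len(ir)))
    y = dft([a * b for a, b in zip(s, h)], inverse=True)
    return [v.real for v in y]


filtered = filter_in_frequency_domain([1.0, 2.0, 3.0], [1.0, 0.5])
print([round(v, 6) for v in filtered])
```

The result matches time-domain convolution of the same inputs up to floating-point rounding, which is why the per-channel filters can be applied in either domain.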
Most renderers described in
The format conversion unit of
Here, when the format conversion unit performs a binaural rendering function, the format conversion unit may down-mix an audio signal corresponding to a plurality of channels and then perform binaural rendering on the down-mixed result, thereby decreasing the complexity of binaural rendering. That is, the format conversion unit may sub-sample the number of channels of a multichannel audio signal in a virtual layout, instead of using the entire set of binaural room impulse responses (BRIRs), such as a set given for a 22.2 channel layout, thereby decreasing the complexity of binaural rendering.
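One way to picture the sub-sampling is as a selection of only those BRIR pairs whose loudspeaker positions belong to the virtual layout. The sketch below assumes that BRIRs are stored per position, keyed by hypothetical (azimuth, elevation) pairs in degrees; the function name, the positions, and the one-tap responses are illustrative and not taken from the embodiments.

```python
def select_brirs(full_set, virtual_layout):
    """Keep only the BRIR pairs whose positions appear in the virtual layout."""
    missing = [p for p in virtual_layout if p not in full_set]
    if missing:
        raise KeyError(f"no BRIR available for positions: {missing}")
    return {p: full_set[p] for p in virtual_layout}


# Hypothetical BRIR set: position -> (left_ir, right_ir).
full_set = {
    (0, 0): ([1.0], [1.0]),
    (30, 0): ([0.6], [0.9]),
    (-30, 0): ([0.9], [0.6]),
    (0, 35): ([0.8], [0.8]),
    (110, 0): ([0.4], [0.7]),
    (-110, 0): ([0.7], [0.4]),
}
# Hypothetical virtual layout that retains one elevated position.
virtual_layout = [(30, 0), (-30, 0), (0, 35)]
reduced = select_brirs(full_set, virtual_layout)
print(len(full_set), len(reduced))  # 6 3
```

Only the selected pairs then need to be applied, so the number of filtering operations follows the size of the virtual layout rather than the size of the full set.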
According to embodiments of the present invention, it is possible to decrease an amount of calculation required for binaural rendering by initially down-mixing an M-channel audio signal corresponding to a multichannel audio signal to an N-channel audio signal having fewer channels than the M-channel audio signal, and by performing binaural rendering of the N-channel audio signal. In addition, it is possible to effectively perform binaural rendering of the multichannel audio signal having a relatively large number of channels.
The above-described embodiments of the present invention may be recorded in non-transitory computer-readable media including program instructions to implement various operations embodied by a computer. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. Examples of non-transitory computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD ROM disks and DVDs; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. The described hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described embodiments of the present invention, or vice versa.
Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2013-0043383 | Apr 2013 | KR | national |
10-2014-0046741 | Apr 2014 | KR | national |
Number | Date | Country | |
---|---|---|---|
20190007778 A1 | Jan 2019 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 14767538 | US | |
Child | 16126466 | US |