The present invention relates to coding a main and a side signal that result from the first step of performing parametric coding of multichannel signals.
Stereophonic audio signals comprise a left (L) and a right (R) signal component which may originate from a stereo signal source, for example from separated microphones. The coding of audio signals aims at reducing the bit rate of a stereophonic signal, e.g. in order to allow an efficient transmission of sound signals via a communications network, such as the Internet, via a modem and via analogue telephone lines, mobile communication channels or via other wireless networks, etc., and in order to store a stereophonic sound signal on a chip card or another storage medium with limited storage capacity.
EP 1,107,232 discloses a method of performing parametric coding to generate a representation of a stereo audio signal, which is composed of a left channel signal and a right channel signal. To utilize transmission bandwidth efficiently, such a representation contains information concerning only one of the L and R signals, and parametric information based on which the other signal can be recovered. Because of the design of the parametric coding, the representation advantageously captures localization cues of the stereo audio signal, including intensity and phase characteristics of L and R. As a result, the stereo audio signal recovered from the transmitted representation affords a high stereo quality.
Even though parametric stereo encoding does improve the bit-rate utilisation, it is of interest to improve this utilisation further by reducing the bit rate required for a given sound quality.
It is an object of the present invention to provide a solution to the above-mentioned problem.
The object of the present invention is achieved by a method of encoding a main and a side signal, where at least said main and side signal represent a multichannel audio signal, where the main and the side signal have the properties that the relation between the power spectral energies of said main and side signal is intact per psycho-acoustical band and that said side signal is psycho-acoustically uncorrelated with the main signal. The method of encoding the main and the side signal comprises the steps of:
Thereby the bit rate required when transmitting data can be decreased, and less storage space is needed when storing encoded data.
In an embodiment the predetermined transformation comprises the step of:
This is an efficient way of representing the essential information from the side signal.
In a specific embodiment the step of generating the transformation parameters comprises the steps of:
Based on these transformation parameters the side signal can be reproduced very accurately.
In another embodiment the step of generating the transformation parameters comprises the steps of:
Then only one set of prediction coefficients is necessary, which further decreases the bit rate required when transmitting the encoded signal.
In an embodiment the step of generating the transformation parameters comprises the steps of:
This is a very simple and thereby resource-efficient method of generating transformation parameters.
In a specific embodiment, transforming the side signal into a set of transformation parameters is performed on overlapping segments of at least the side signal, by determining transformation parameters corresponding to each segment. Because the signal is segmented before encoding, the parameters only have to describe a small amount of data, and based on these few parameters a more precise regeneration of each segment can be performed. Further, signal variations can be followed more easily, and encoding can be performed on segments of streaming data.
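The overlapping segmentation can be illustrated with a minimal sketch. It assumes a periodic Hann window and a hop of half the segment length, neither of which is specified in the text; with this choice the windows sum to one, so per-segment results can later be recombined by overlap-add. The function names are hypothetical.

```python
import math

def hann(n):
    """Periodic Hann window of length n; two copies offset by n/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]

def segment(signal, seg_len, hop):
    """Split a signal into overlapping, windowed segments starting every `hop` samples."""
    win = hann(seg_len)
    segments = []
    start = 0
    while start < len(signal):
        chunk = list(signal[start:start + seg_len])
        chunk += [0.0] * (seg_len - len(chunk))  # zero-pad the final segment
        segments.append([w * x for w, x in zip(win, chunk)])
        start += hop
    return segments

def overlap_add(segments, hop, length):
    """Recombine (possibly regenerated) segments by overlap-add."""
    out = [0.0] * (hop * (len(segments) - 1) + len(segments[0]))
    for i, seg in enumerate(segments):
        for k, v in enumerate(seg):
            out[i * hop + k] += v
    return out[:length]
```

Transformation parameters would be estimated on each windowed segment; in this sketch only the first half-segment lacks a second overlapping window and is therefore not reconstructed exactly.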
The invention further relates to a method for decoding which corresponds to the methods of encoding as described above. Accordingly, the same advantages apply.
The invention relates to a method of decoding main and side signal information, where at least said main and side signal represent a multichannel audio signal. The main and the side signal have the properties that the relation between the power spectral energies of said main and side signal is intact per psycho-acoustical band and that said side signal is psycho-acoustically uncorrelated with the main signal. The method comprises the steps of:
In an embodiment the step of generating the third signal comprises the steps of:
In a specific embodiment the step of generating the third signal comprises the steps of:
In a specific embodiment the step of generating the temporal signal comprises the steps of:
In another embodiment the step of generating the temporal signal comprises the steps of:
In another embodiment, when the transformation parameters have been generated for specific segments, the step of generating the third signal, having the same properties as the side signal, is performed by first interpolating the transformation parameters between the specific segments.
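The interpolation step can be sketched as plain linear interpolation of a parameter vector across sub-blocks between two segments. This is an illustrative assumption: the text does not say which interpolation rule or parameter representation is used. Interpolating direct prediction coefficients does not in general guarantee a stable synthesis filter, which is why practical coders often interpolate in another domain such as reflection coefficients or line spectral frequencies.

```python
def interpolate_params(p_prev, p_next, n_steps):
    """Linearly interpolate two parameter vectors over n_steps sub-blocks.

    The first returned vector equals p_prev; the sequence moves towards
    p_next, which is reached at the start of the following segment.
    """
    out = []
    for i in range(n_steps):
        t = i / n_steps  # interpolation weight in [0, 1)
        out.append([(1 - t) * a + t * b for a, b in zip(p_prev, p_next)])
    return out
```

For example, `interpolate_params([0.0, 2.0], [4.0, 6.0], 4)` yields `[0.0, 2.0]`, `[1.0, 3.0]`, `[2.0, 4.0]` and `[3.0, 5.0]`.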
The present invention can be implemented in different ways, e.g. through the methods described above. The following describes arrangements for encoding and for decoding multichannel signals, respectively, as well as a data signal and further product means, each yielding one or more of the benefits and advantages described in connection with the first-mentioned method, and each having one or more preferred embodiments corresponding to the preferred embodiments described in connection with the first-mentioned method and disclosed in the dependent claims.
It is noted that the features of the methods described above and in the following may be implemented in software and carried out in a data processing system or through other processing means caused by the execution of computer-executable instructions. The instructions may be program code means loaded in a memory, such as a RAM, from a storage medium or from another computer via a computer network. Alternatively, the described features may be implemented by hardwired circuitry instead of software or in combination with software.
The invention further relates to an arrangement for encoding a main and a side signal, where at least said main and side signal represent a multichannel audio signal, where the main and side signal have the properties that the relation between the power spectral energies of said main and side signal is intact per psycho-acoustical band and that said side signal is psycho-acoustically uncorrelated with the main signal, the arrangement comprising:
The invention further relates to an arrangement for decoding main and side signal information, where at least said main and side signal represent a multichannel audio signal, where the main and side signal have the properties that the relation between the power spectral energies of said main and side signal is intact per psycho-acoustical band and that said side signal is psycho-acoustically uncorrelated with the main signal, the arrangement comprising:
The above arrangements may be part of any electronic equipment including computers, such as stationary and portable PCs, stationary and portable radio communications equipment and other handheld or portable devices, such as mobile telephones, pagers, audio players, multimedia players, communicators, i.e. electronic organisers, smart phones, personal digital assistants (PDAs), handheld computers or the like.
The term processing means comprises general- or special-purpose programmable microprocessors, Digital Signal Processors (DSP), Application Specific Integrated Circuits (ASIC), Programmable Logic Arrays (PLA), Field Programmable Gate Arrays (FPGA), special purpose electronic circuits, etc., or a combination thereof. The above first and second processing means may be separate processing means or they may be comprised in one processing means.
The term receiving means includes circuitry and/or devices suitable for enabling the communication of data, e.g. via a wired or a wireless data link. Examples of such receiving means include a network interface, a network card, a radio receiver, a receiver for other suitable electromagnetic signals, such as infrared light, e.g. via an IrDA port, radio-based communications, e.g. via Bluetooth transceivers or the like. Further examples of such receiving means include a cable modem, a telephone modem, an Integrated Services Digital Network (ISDN) adapter, a Digital Subscriber Line (DSL) adapter, a satellite transceiver, an Ethernet adapter or the like.
The term receiving means further comprises other input circuits/devices for receiving data signals, e.g. data signals stored on a computer-readable medium. Examples of such receiving means include a floppy-disk drive, a CD-ROM drive, a DVD drive, or any other suitable disc drive, a memory card adapter, a smart card adapter, etc.
In the following, preferred embodiments of the invention will be described referring to the figures, where
The coding device 101 comprises an encoder 102 for encoding a stereophonic signal according to the invention, where the stereophonic signal includes an L signal component and an R signal component. The encoder receives the L and R signal components and generates a coded signal T. The stereophonic signal components L and R may originate from a set of microphones, e.g. via further electronic equipment such as mixing equipment, etc. The signals may further be received as an output from another stereo player, over-the-air as a radio signal, or by any other suitable means. Preferred embodiments of such an encoder, according to the invention, will be described below. According to one embodiment, the encoder 102 is connected to a transmitter 103 for transmitting the coded signal T via a communications channel 109 to the decoding device 105. The transmitter 103 may comprise circuitry suitable for enabling the communication of data, e.g. via a wired or a wireless data link 109. Examples of such a transmitter include a network interface, a network card, a radio transmitter, a transmitter for other suitable electromagnetic signals, such as an LED for transmitting infrared light, e.g. via an IrDA port, radio-based communications, e.g. via a Bluetooth transceiver or the like. Further examples of suitable transmitters include a cable modem, a telephone modem, an Integrated Services Digital Network (ISDN) adapter, a Digital Subscriber Line (DSL) adapter, a satellite transceiver, an Ethernet adapter or the like. Correspondingly, the communications channel 109 may be any suitable wired or wireless data link, for example of a packet-based communications network, such as the Internet or another TCP/IP network, a short-range communications link, such as an infrared link, a Bluetooth connection or another radio-based link.
Further examples of the communications channel include computer networks and wireless telecommunications networks, such as a Cellular Digital Packet Data (CDPD) network, a Global System for Mobile Communications (GSM) network, a Code Division Multiple Access (CDMA) network, a Time Division Multiple Access (TDMA) network, a General Packet Radio Service (GPRS) network, a third-generation network, such as a UMTS network, or the like. Alternatively, or additionally, the coding device may comprise one or more other interfaces 104 for communicating the coded stereo signal T to the decoding device 105.
Examples of such interfaces include a disc drive for storing data on a computer-readable medium 110, e.g. a floppy-disk drive, a read/write CD-ROM drive, a DVD drive, etc. Other examples include a memory card slot, a magnetic card reader/writer, an interface for accessing a smart card, etc. Correspondingly, the decoding device 105 comprises a corresponding receiver 108 for receiving the signal transmitted by the transmitter and/or another interface 106 for receiving the coded stereo signal communicated via the interface 104 and the computer-readable medium 110. The decoding device further comprises a decoder 107 which receives the signal T and decodes it into corresponding stereo components L′ and R′. Preferred embodiments of such a decoder, according to the invention, will be described below. The decoded signals L′ and R′ may subsequently be fed into a stereo player for reproduction via a set of speakers, headphones or the like.
The main signal used in the decoder may be either the original m signal or a main signal that has been encoded and decoded, e.g. by quantisation.
The main and the side signal that are generated by the first step of parametric stereo encoding, as described above, are characterised by the fact that the waveform of the main signal has to be kept intact, whereas the waveform of the side signal is rather arbitrary and only has to satisfy two conditions. Firstly, the relation between the power spectral energies of the main and the side signal has to be kept intact per psycho-acoustical band. Secondly, the side signal has to be uncorrelated with the main signal in a psycho-acoustical sense. The method of encoding the main and the side signal, according to the present invention, is twofold. Firstly, a filter is estimated which is able to re-instate the desired spectral amplitude relation and temporal profile. Secondly, in specific embodiments, as described below, a filter is derived which guarantees the desired uncorrelatedness.
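The two conditions on the side signal can be made concrete with a small measurement sketch for one segment: band energies of the main and side signal, whose ratio per band should be preserved, and a lag-0 normalised cross-correlation as a crude broadband proxy for uncorrelatedness. The band edges, given here in DFT bins, are hypothetical stand-ins for psycho-acoustical bands (such as Bark or ERB bands), and a plain O(n²) DFT is used for readability only.

```python
import cmath
import math

def band_energies(x, band_edges):
    """Energy of x per band, with bands given as DFT-bin ranges [lo, hi)."""
    n = len(x)
    # Direct DFT of the non-negative frequency bins (for clarity, not speed).
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]
    return [sum(abs(spec[k]) ** 2 for k in range(lo, hi))
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]

def normalized_correlation(a, b):
    """Broadband lag-0 normalised cross-correlation (a crude proxy only)."""
    num = sum(u * v for u, v in zip(a, b))
    den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
    return num / den if den > 0.0 else 0.0
```

As a check, a cosine and a sine at the same bin frequency have a per-band energy ratio of one and zero lag-0 correlation, so such a pair satisfies both conditions in this simplified sense.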
In the following, specific embodiments of the above-described encoding of the m and s signals, and of the decoding to obtain m and s′, are presented.
The segmentation of both the m and the s signal is performed in the segmentation unit 601. Then, in 603, linear prediction is performed on each segment of the m signal, resulting in a set of prediction coefficients a. In 605, linear prediction is performed on each segment of the s signal, resulting in a set of prediction coefficients a_s. Further, in 607, the energy e of each segment of the s signal is estimated. The prediction coefficients a and a_s and the estimated energy e are multiplexed in 609 into the set of transformation parameters pF. The m signal and the set of transformation parameters pF now represent the m and the s signal and can be used for regenerating a signal corresponding to the s signal in a decoder.
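The per-segment analysis in 603, 605 and 607 can be sketched with the autocorrelation method and the Levinson-Durbin recursion. The prediction order, the use of the sum of squares as the segment energy e, and packing pF as a dictionary are assumptions made for illustration; a real bitstream would quantise these parameters.

```python
def autocorr(x, order):
    """Autocorrelation r[0..order] of a (windowed) segment."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns a = [1, a1, ..., ap] for the prediction-error filter
    A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    if err <= 0.0:
        return a  # silent segment: flat (trivial) predictor
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k  # remaining prediction-error power
    return a

def encode_segment(m_seg, s_seg, order=8):
    """Hypothetical per-segment encoder producing the parameter set pF."""
    a = levinson(autocorr(m_seg, order), order)      # step 603
    a_s = levinson(autocorr(s_seg, order), order)    # step 605
    e = sum(v * v for v in s_seg)                    # step 607 (energy as sum of squares)
    return {"a": a, "a_s": a_s, "e": e}              # step 609 (multiplexing)
```

For a segment following x[n] = 0.9^n, a first-order analysis returns a coefficient close to -0.9, i.e. the predictor x̂[n] = 0.9 x[n-1].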
To make the synthesis filters simpler (i.e. of lower order), it may be convenient to encapsulate the decorrelation filter in the prediction coefficients. The filter described by the prediction coefficients then performs a form of psycho-acoustic decorrelation, which consequently no longer needs to be done by a separate decorrelation filter. However, this encapsulation has to be done in the encoder, and the total filter (spectral shaping and decorrelation) has to be transmitted. This will typically lead to an increased bit rate.
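One plausible decoder counterpart, with a separate decorrelation filter, is sketched below under stated assumptions: the main-signal segment is whitened with its prediction-error filter A(z), passed through a decorrelation filter, shaped with the side-signal synthesis filter 1/A_s(z), and scaled to the transmitted energy e. The Schroeder all-pass used as decorrelator and its parameters g and d are hypothetical choices; the text does not specify the decorrelation filter. Coefficients from the autocorrelation method yield a minimum-phase A_s(z), so the all-pole filter is stable.

```python
import math

def fir(b, x):
    """Filter x by B(z) = b[0] + b[1] z^-1 + ...; zero initial state."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]

def all_pole(a, x):
    """Filter x by 1/A(z), with a[0] == 1; zero initial state."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def schroeder_allpass(x, g=0.5, d=3):
    """All-pass H(z) = (-g + z^-d) / (1 - g z^-d): flat magnitude, altered phase."""
    y = []
    for n in range(len(x)):
        x_d = x[n - d] if n >= d else 0.0
        y_d = y[n - d] if n >= d else 0.0
        y.append(-g * x[n] + x_d + g * y_d)
    return y

def decode_segment(m_seg, p):
    """Hypothetical regeneration of a side-signal segment s' from m and pF."""
    residual = fir(p["a"], m_seg)            # whiten m with A(z)
    residual = schroeder_allpass(residual)   # assumed decorrelation filter
    shaped = all_pole(p["a_s"], residual)    # impose the side-signal envelope 1/A_s(z)
    energy = sum(v * v for v in shaped)
    gain = math.sqrt(p["e"] / energy) if energy > 0.0 else 0.0
    return [gain * v for v in shaped]        # match the transmitted energy e
```

Encapsulating the decorrelation in the prediction coefficients, as described above, would amount to folding `schroeder_allpass` into the transmitted synthesis filter instead of applying it separately in the decoder.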
For audio and speech coding purposes, it is advantageous to use linear prediction filters whose behaviour is in some way reminiscent of auditory filters. Examples of such filters are Kautz filters, Laguerre filters and gammatone filters, which are described e.g. in WO2002089116.
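The frequency-warping idea behind such filters can be sketched by replacing the unit delays of an ordinary predictor with first-order all-pass sections D(z) = (z^-1 - λ) / (1 - λ z^-1). The sketch computes the tap signals of this all-pass chain and a warped autocorrelation that could be fed to a Levinson-Durbin recursion to obtain warped prediction coefficients. It is an illustration of the general technique, not of the filters in the cited document; a true Laguerre network additionally has a first-order low-pass front section, which is omitted here. With λ = 0 the chain reduces to ordinary delays.

```python
def allpass1(x, lam):
    """First-order all-pass D(z) = (z^-1 - lam) / (1 - lam z^-1), zero initial state."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for v in x:
        out = -lam * v + x_prev + lam * y_prev
        y.append(out)
        x_prev, y_prev = v, out
    return y

def warped_taps(x, order, lam):
    """Tap signals x_0 = x, x_k = D(z) x_{k-1} of the warped delay line."""
    taps = [list(x)]
    for _ in range(order):
        taps.append(allpass1(taps[-1], lam))
    return taps

def warped_autocorr(x, order, lam):
    """Warped autocorrelation r_w[k] = sum_n x_0[n] * x_k[n]."""
    taps = warped_taps(x, order, lam)
    return [sum(a * b for a, b in zip(taps[0], t)) for t in taps]
```

The warping parameter λ controls how the frequency axis is bent; choosing it to approximate an auditory frequency scale is what makes the resulting predictor "reminiscent of auditory filters".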
It is understood that a skilled person may adapt the above embodiments, e.g. by adding or removing features or by combining features of the above embodiments. It is further noted that the invention is not limited to stereophonic signals, but may also be applied to other multi-channel input signals having two or more input channels. Examples of such multi-channel signals include signals received from a Digital Versatile Disc (DVD) or a Super Audio Compact Disc, etc. In this more general case, a principal component signal y and one or more residual signals r may still be generated according to the invention. The number of residual signals transmitted depends on the number of channels and the desired bit rate, as higher order residuals may be omitted without significantly degrading the signal quality.
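For the two-channel case, a principal component signal y and a residual signal r can be sketched with a standard principal component rotation: the angle α is chosen so that y carries the maximum energy and r is uncorrelated with y over the segment. This is an assumed, textbook construction shown for illustration; the text does not prescribe how y and r are obtained, and extending it to more channels would use an eigen-decomposition of the channel covariance matrix.

```python
import math

def principal_rotation(left, right):
    """Rotate two channels into a principal component y and a residual r."""
    rll = sum(l * l for l in left)
    rrr = sum(r * r for r in right)
    rlr = sum(l * r for l, r in zip(left, right))
    # tan(2*alpha) = 2*R_lr / (R_ll - R_rr) diagonalises the 2x2 covariance;
    # atan2 selects the angle for which y gets the larger eigenvalue.
    alpha = 0.5 * math.atan2(2.0 * rlr, rll - rrr)
    c, s = math.cos(alpha), math.sin(alpha)
    y = [c * l + s * r for l, r in zip(left, right)]
    res = [-s * l + c * r for l, r in zip(left, right)]
    return y, res, alpha
```

Because the rotation is orthogonal, the total energy of y and r equals that of the two input channels, so omitting a low-energy residual removes only a small part of the signal energy.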
In general, it is an advantage of the invention that bit-rate allocation may be adaptively varied, thereby allowing graceful degradation. For example, if the communication channel momentarily only allows a reduced bit rate to be transmitted, e.g. due to increased network traffic, noise, or the like, the bit rate of the transmitted signal may be reduced without significantly degrading the perceptible quality of the signal. For example, in the case of a stationary sound source as discussed above, the bit rate may be reduced by a factor of approximately two without significantly degrading the signal quality, which corresponds to transmitting a single channel instead of two.
It is noted that the above arrangements may be implemented as general- or special-purpose programmable microprocessors, Digital Signal Processors (DSP), Application Specific Integrated Circuits (ASIC), Programmable Logic Arrays (PLA), Field Programmable Gate Arrays (FPGA), special purpose electronic circuits, etc., or a combination thereof.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims any reference signs placed between parentheses shall not be construed as limiting the claim. The word ‘comprising’ does not exclude the presence of other elements or steps than those listed in a claim. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Number | Date | Country | Kind |
---|---|---|---|
03100752.9 | Mar 2003 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB04/50288 | 3/18/2004 | WO | 9/20/2005 |