This application claims priority under 35 USC §119 to Finnish Patent Application No. 20031069 filed on Jul. 14, 2003.
The invention generally concerns the technology of digital encoding and decoding of sound. In particular, the invention concerns the problem of enabling natural reconstruction of sounds after transmission through a channel in which band split coding methods are utilised for encoding the sound for transmission in digital form.
Linear Predictive Coding (LPC) is a digital sound encoding principle according to which the encoder repeatedly constructs, for each short sequence of input samples, a linear all-pole filter that, with a certain excitation signal, enables producing a replica of the corresponding input sample sequence. The encoder transmits information representing the filter parameters and the excitation signal to the decoder. Known variations of LPC include but are not limited to transformation coding or code excitation, according to the selected approach to generating the excitation signal, as well as various selections with respect to whether filter parameters are transmitted directly or in some transformed form. Such variations have no effect on the applicability of the general principle of the present invention.
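As an illustration of the all-pole synthesis principle described above, the following minimal sketch passes an excitation sequence through a filter 1/A(z). The function name lpc_synthesize, the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and the assumption of zero filter memory at the start of the sequence are choices made for this example only and are not taken from any particular codec.

```python
def lpc_synthesize(excitation, coeffs):
    """Run an excitation through the all-pole filter 1 / A(z).

    Assumed convention (illustrative only):
    A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, which gives
    y[n] = e[n] - sum_k a_k * y[n - k].
    The filter memory is assumed to be zero before the first sample.
    """
    output = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc -= a * output[n - k]
        output.append(acc)
    return output
```

With a single coefficient a_1 = -0.5, a unit impulse as excitation yields a geometrically decaying output, which is the impulse response of the corresponding one-pole filter.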
The selection of input signal bandwidth has great influence on the naturalness of the eventually reproduced sound. A narrow bandwidth of the input signal is advantageous in terms of saving required transmission capacity. Accepting a wider band of input frequencies for encoding would enable reproducing the sound in a more natural way at the receiving end, but simultaneously increases the demand for transmission bandwidth.
In a very basic arrangement the low and high band encoders 103 and 104 operate independently, and selection is applied according to whether the outputs of both of them or only the low band encoder 103 are transmitted. More advanced arrangements utilise some information from the low band encoding and decoding in performing the high band encoding and decoding respectively, which is illustrated as vertical arrows between the appropriate functional blocks in
The drawback of the arrangement of
An objective of the present invention is to present a method and an apparatus for digitally encoding and decoding sound in a band split arrangement, so that the synthesized sound after decoding would be as natural as possible regardless of the type of the input signal. A further objective of the invention is to implement a principle of said kind without causing extensive need for additional transmission resources. A yet further objective of the invention is to enable implementation of the above-explained principles with reasonable requirements on system complexity.
The objectives of the invention are achieved by having at least one alternative source for the high band excitation signal, and by selecting the appropriate excitation signal source for the high band on the basis of analysed characteristics of the audio signal to be encoded.
The features of encoding and decoding methods according to the invention are characterised by the features recited in the characterising parts of the independent patent claims directed to encoding and decoding methods respectively.
The invention also applies to transmitting and receiving devices. The characterised features of the transmitting and receiving devices are recited in the characterising parts of the independent patent claims directed to transmitting and receiving devices respectively.
The suboptimal performance of the known prior art band split encoding and decoding arrangement stems from the fact that using an excitation signal associated with a strongly voiced first band input signal tries to introduce periodicity onto the second band even when none should be present. According to the invention it is possible to avoid such unintentional distortion of the second band frequency spectrum by using an alternative excitation signal for the upper band, when a comparison of the degree of voicedness shows a mismatch between the bands.
There are a number of ways of examining whether an input signal on a certain frequency band has voiced or unvoiced characteristics. For example, the long-term correlation gain calculated for long-term prediction is a good indicator of periodicity and thus voicedness of an input signal. Other possible indicators include but are not limited to various statistical values derived from the Fourier transform of a signal sequence. An encoder according to the invention analyses separately the first (lower) band input signal and the second (higher) band input signal. It produces values indicative of the voiced/unvoiced character of the signals on the different bands. If these values show that the first (lower) band signal is voiced but the second (higher) band signal is not, excitation taken from the first band is not copied into the encoding of the second band, but an alternative (preferably random) excitation is used instead.
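One possible way of deriving a voicedness indicator from long-term correlation is sketched below. The sketch computes the largest normalised autocorrelation of a signal sequence over a range of candidate lags and compares it with a threshold. The function names, the lag range and the threshold value of 0.5 are assumptions made for this example; a practical encoder would choose them according to its sampling rate and frame length.

```python
import math


def long_term_gain(x, min_lag, max_lag):
    """Return the largest normalised autocorrelation of x over the lag range."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        idx = range(lag, len(x))
        num = sum(x[n] * x[n - lag] for n in idx)
        e_now = sum(x[n] ** 2 for n in idx)
        e_past = sum(x[n - lag] ** 2 for n in idx)
        den = math.sqrt(e_now * e_past)
        if den > 0.0:
            best = max(best, num / den)
    return best


def is_voiced(x, min_lag=20, max_lag=120, threshold=0.5):
    """Classify x as voiced when its long-term gain exceeds the threshold.

    The default lag range and threshold are illustrative assumptions.
    """
    return long_term_gain(x, min_lag, max_lag) > threshold
```

A strongly periodic sequence produces a gain close to one at a lag equal to its period, whereas a noise-like sequence produces a gain close to zero at all lags.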
Using an alternative (typically random) excitation signal for the second band potentially introduces a problem of excitation gain mismatch. In prior art solutions the excitation gain is determined so as to set the copied first band excitation energy to the same level as the second band LPC residual. It is natural that there is some dependence between the second band LPC residual and the first band excitation, which basically represents the low band LPC residual. If the excitation for the second band is independent of the first band, any such dependence in excitation energy is lost. Therefore the difference in energy between the independent second band excitation signal and the second band LPC residual may become extremely large compared to that between an excitation signal derived from the first band and the LPC residual of the second band. The quantisation of the excitation gain becomes more difficult when its dynamic range increases.
A solution to the excitation gain mismatch problem is to normalise the second (independent) excitation signal energy to that of the first band excitation signal, even if the former and not the latter is used as the actual second band excitation signal due to detected difference in voiced/unvoiced characteristics of the bands. Two advantages are gained therethrough. Firstly, the dynamics of the excitation signal gain on the second band are the same and the above-explained extremely large differences are avoided. Secondly the arrangement enhances robustness against errors in the transmission channel. The selection of the second band excitation signal must be transmitted to the receiver, which involves a risk of a transmission error that causes the receiver to misinterpret the transmitted selection signal. Due to the excitation signal energy normalisation, such an error will not cause severe distortion in the second band, because the energy level of the wrongly selected excitation signal is the same as that of the correct one.
The novel features which are considered as characteristic of the invention are set forth in particular in the appended claims. The invention itself, however, both as to its construction and its method of operation, together with additional objects and advantages thereof, will be best understood from the following description of specific embodiments when read in connection with the accompanying drawings.
The exemplary embodiments of the invention presented in this patent application are not to be interpreted to pose limitations to the applicability of the appended claims. The verb “to comprise” is used in this patent application as an open limitation that does not exclude the existence of also unrecited features. The features recited in depending claims are mutually freely combinable unless otherwise explicitly stated.
The deductions from the signal analysis functionalities 302 and 303 are taken to an excitation selection switch 304. It is arranged to select one of a resampled low band excitation coming from a resampling block 305 or a random excitation, such as white noise excitation, coming from a random excitation source 306. The excitation selection switch 304 delivers the selected excitation to an LPC synthesis functionality 307, which also receives the LPC parameters from the LPC analysis block 301. A synthesized high band audio signal goes from the LPC synthesis functionality 307 to a gain control block 308, which also receives the original high band audio signal. The gain control block 308 is arranged to determine a gain control signal that is needed to align the synthesized signal energy with that of the original high band audio signal.
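The task of the gain control block may be illustrated with the following minimal sketch, which computes a single gain value that aligns the energy of a synthesized sequence with that of the original sequence. The function name synthesis_gain and the choice of one gain per sequence are assumptions made for this example; returning zero for a silent synthesized sequence is likewise an illustrative choice.

```python
import math


def synthesis_gain(original, synthesized):
    """Return the gain that matches the synthesized energy to the original.

    Multiplying every synthesized sample by the returned value gives a
    sequence with the same energy as the original sequence.
    """
    e_orig = sum(s * s for s in original)
    e_syn = sum(s * s for s in synthesized)
    if e_syn == 0.0:
        # Assumption for this sketch: no gain can be defined for silence.
        return 0.0
    return math.sqrt(e_orig / e_syn)
```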
Information that will be sent to a receiving device comprises (inverse) LPC parameters from the LPC synthesis functionality 307, a high band synthesis gain control signal from the gain control block 308 as well as an excitation selection signal from the excitation selection switch 304. The last-mentioned signal indicates which of the available excitation sources was used.
The deductions produced in the signal analysis functionalities 302 and 303 should enable the excitation selection switch 304 to select the resampled low band excitation signal whenever there is enough correlation between the low band and the high band to justify such selection. On the other hand the excitation selection switch 304 should select the random excitation signal in all cases where such correlation does not exist. A general rule for making the deductions and the selection based thereupon is the following: “If the low band signal is voiced and the high band signal is unvoiced, select the random excitation signal. In all other cases select the resampled low band excitation signal.”
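The general rule quoted above can be expressed as the following minimal sketch. The function name and the string labels used for the two excitation sources are hypothetical and serve only to illustrate the rule.

```python
def select_high_band_excitation(low_band_voiced, high_band_voiced):
    """Apply the general selection rule to two voicedness decisions.

    Returns "random" only when the low band is voiced and the high band
    is unvoiced; in all other cases returns "resampled_low_band".
    """
    if low_band_voiced and not high_band_voiced:
        return "random"
    return "resampled_low_band"
```

Of the four possible combinations of voicedness decisions, only one leads to the selection of the random excitation signal.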
In the functional block diagram of
The basic arrangement described above with reference to
We may consider a situation in which the high band is voiced but the low band is not. Such a situation is exceptional and will be rarely encountered in practice. However, it must be noted that in such cases the arrangement described above with reference to
When we compare the use of the resampled low band excitation signal to the use of some other excitation signal generated “locally” for the needs of the high band encoder, we note that the former comes with a variable signal power that basically represents the low band LPC residual. Locally generated excitation signals have no similar correlation with any part of the original audio signal, but come at more or less constant signal power level. This creates a problem, because a momentary difference in energy between a locally generated excitation signal and the high band LPC residual may become extremely large. When the required dynamic range of gain control increases, the quantization of the excitation gain becomes more difficult.
The LPC encoding process handles the input signal in discrete, consecutive sample trains. Similarly the excitation signals come in short pieces so that the finite number of samples that constitute one piece of an excitation signal may be expressed as a vector. We may denote a low band excitation vector as lb_exc and a corresponding random excitation vector as rand_exc. If we further assume the existence of scalar real variables exc_energy, rand_energy and scale_factor that describe the energy (sum of squared samples) of the low band excitation signal, the energy of the random excitation signal and the scaling factor respectively, we may give the following pseudocode representation of the excitation gain scaling process:
Here x^T x means the inner product (dot product) of vector x with itself, and SQRT(x) means the square root of x. The operator * on the last line of the pseudocode listing is a plain multiplication operator that is used e.g. in a product of a scalar and a vector. Comments not affecting the flow of execution are displayed between /* and */ signs.
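The excitation gain scaling process described above may be sketched in runnable form as follows. The sketch uses the variable names lb_exc, rand_exc, exc_energy, rand_energy and scale_factor introduced above; the function name scale_random_excitation and the handling of an all-zero random vector are assumptions made for this example.

```python
import math


def scale_random_excitation(lb_exc, rand_exc):
    """Scale rand_exc so that its energy equals the energy of lb_exc."""
    exc_energy = sum(s * s for s in lb_exc)      # inner product lb_exc^T lb_exc
    rand_energy = sum(s * s for s in rand_exc)   # inner product rand_exc^T rand_exc
    if rand_energy == 0.0:
        # Assumption for this sketch: an all-zero vector is returned unscaled.
        return list(rand_exc)
    scale_factor = math.sqrt(exc_energy / rand_energy)
    return [scale_factor * s for s in rand_exc]  # scalar times vector
```

For example, a low band excitation vector (3, 4) has energy 25, so a random excitation vector (1, 0) with energy 1 is scaled by a factor of 5.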
The arrangement of
It should be noted that it is not absolutely necessary to perform excitation gain scaling, if the large variations in energy differences described above can be accepted or compensated for otherwise. However, the principle shown in
The use of excitation gain scaling also enhances robustness against errors, or at least helps to minimise the effects of errors. As was explained previously in the description of blocks 304 and 502, the transmitter needs to signal to the receiver at least the information about whether the resampled low band excitation signal or the locally generated random excitation signal was used in the high band encoder. Signalling is typically accomplished by inserting a certain bit value into a signalling field. A transmission error may cause the receiver to interpret the transmitted signal value incorrectly, so that the receiver selects the wrong excitation signal for high band decoding. If, however, the transmitter applied excitation gain scaling to ensure that the energy of the excitation signal was the same in any case, inadvertently selecting an incorrect excitation signal at the receiver does not cause as severe an audible artefact as could occur without excitation gain scaling at the transmitting end.
In accordance with the presented embodiment of the invention the source encoding means 802 comprise band splitting means 811, low band encoding means 812, low band excitation extracting means 813, voicedness analysing means 814, additional excitation generating means 815, excitation gain scaling means 816, excitation selecting means 817, high band encoding means 818 and bit stream multiplexing means 819. Of these the band splitting means 811 are arranged at least to separate the audio signal of one (low) band from the audio signal of another (high) band and to deliver the separated signals to the appropriate band-specific encoders. Some route must also exist from the band splitting means 811 to voicedness analysing means 814, so that the last-mentioned may examine, whether the separated bands comprise signals of voiced character. This route has been drawn as a direct connection in
The low band encoding means 812, sometimes also referred to as the core encoder means, are arranged to receive the separated low band audio signal, to encode it using LPC encoding and to deliver the low band excitation signal (through certain conceptually defined low band excitation extracting means 813, which also include resampling if any is required) to the excitation selecting means 817. If excitation gain scaling is applied, the low band excitation signal is also arranged to be conveyed to the excitation gain scaling means 816, which are arranged to receive a locally generated excitation signal from the additional excitation generating means 815 and to scale its signal energy appropriately. In embodiments of the invention where information about the potential voicedness of the high band signal is used to introduce periodicity into the locally generated excitation signal, there must be a connection from the voicedness analysing means 814 to the additional excitation generating means 815 for conveying the required information.
The excitation selecting means 817 are arranged to receive the low band excitation signal, the voicedness information and the locally generated excitation signal from blocks 813, 814 and 816 (or 815) respectively, to select the excitation according to the received voicedness information and preprogrammed selection rules, and to deliver the selected excitation signal to the high band encoding means 818 as well as the appropriate excitation signal selection information to the bit stream multiplexing means 819. The high band encoding means 818 are arranged to perform high band LPC encoding with the help of the excitation signal received from the excitation selecting means 817. The bit stream multiplexing means 819 are arranged to receive the encoding results of the low band encoding means 812 and the high band encoding means 818 and the excitation signal selection information from the excitation selecting means 817. The bit stream multiplexing means 819 are additionally arranged to multiplex said information into an appropriate bit stream that represents complete source encoded information, which bit stream can be delivered to the channel encoding means 803.
In accordance with the presented embodiment of the invention the source decoding means 903 comprise bit stream demultiplexing means 911, low band decoding means 912, low band excitation signal extracting means 913, excitation selection checking means 914, additional excitation signal generating means 915, excitation selecting means 916, high band decoding means 917 and band reconstructing means 918. Of these the bit stream demultiplexing means 911 are arranged to demultiplex the received bit stream and to direct the appropriate portions thereof to the low band decoding means 912, the excitation selection checking means 914 and the high band decoding means 917. The low band decoding means 912 are arranged to perform standard LPC decoding for the low band audio signal and to deliver decoding results to the band reconstructing means 918. The low band decoding means 912 also deliver the low band excitation signal (through certain conceptually defined low band excitation extracting means 913, which also include resampling if any is required) to the excitation selecting means 916.
The excitation selection checking means 914 are arranged to examine an appropriate part of the received bit stream to find an indication about whether the high band encoder in the transmitting device used the low band excitation signal or a locally generated excitation signal in encoding the high band. The excitation selection checking means 914 are arranged to deliver this indication as an instruction to the excitation selecting means 916. In embodiments of the invention where the locally generated excitation signal may comprise periodicity, the excitation selection checking means 914 also recover the appropriate periodicity information from the received bit stream and deliver it to the additional excitation signal generating means 915. The excitation selecting means 916 are arranged to receive the low band excitation signal, the locally generated excitation signal and the excitation selection information from blocks 913, 915 and 914 respectively, to select the appropriate excitation according to the received selection information, and to deliver the selected excitation signal to the high band decoding means 917.
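The co-operation of the excitation selection checking means and the excitation selecting means in the receiver may be illustrated with the following minimal sketch. The flag values, the constant names and the function name are hypothetical; the actual representation of the excitation selection information in the bit stream is not specified here.

```python
# Hypothetical values of a one-bit excitation selection field.
RESAMPLED_LOW_BAND = 0
LOCAL_EXCITATION = 1


def choose_decoder_excitation(selection_flag, lb_exc, local_exc):
    """Return the excitation that the received selection flag indicates.

    lb_exc is the (resampled) low band excitation obtained from the low
    band decoder, local_exc is the locally generated excitation.
    """
    if selection_flag == LOCAL_EXCITATION:
        return local_exc
    return lb_exc
```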
It should be noted that the receiver need not be affected at all by whether excitation gain scaling is applied in the transmitter or not. The receiver just accepts the excitation selection information and the high band gain information from the transmitter, regardless of the way in which they were produced. Naturally the application of excitation gain scaling in the transmitter and the resulting enhanced accuracy in quantization of the excitation gain enable the receiver to reproduce the high band audio signal more accurately, but the receiver does not need to know whether the advantageous circumstances were due to deliberately taken action in the transmitter or just good luck.
The high band decoding means 917 are arranged to perform LPC decoding within the high band by starting from the encoded high band information received from the bit stream demultiplexing means 911 and with the help of the excitation signal received from the excitation selecting means 916. The band reconstructing means 918 are arranged to collect the decoded audio information from the low band decoding means 912 and the high band decoding means 917 and to combine them into a single wideband audio signal that can be delivered to the sound reproducing means 904.
The invention has been presented above in the exclusive context of LPC. However, it is possible to generalise the same principle so that we just assume the following:
Number | Date | Country | Kind |
---|---|---|---|
20031069 | Jul 2003 | FI | national |
Number | Name | Date | Kind |
---|---|---|---|
6182031 | Kidder et al. | Jan 2001 | B1 |
6680972 | Liljeryd et al. | Jan 2004 | B1 |
20020007280 | McCree | Jan 2002 | A1 |
20030093264 | Miyasaka et al. | May 2003 | A1 |
Number | Date | Country | |
---|---|---|---|
20050065783 A1 | Mar 2005 | US |