Post filter for audio signals

Description

TECHNICAL FIELD

The present invention generally relates to digital audio coding and more precisely to coding techniques for audio signals containing components of different characters.

BACKGROUND

A widespread class of coding methods for audio signals containing speech or singing includes code excited linear prediction (CELP) applied in time alternation with different coding methods, including frequency-domain coding methods especially adapted for music or methods of a general nature, to account for variations in character between successive time periods of the audio signal. For example, a simplified Moving Picture Experts Group (MPEG) Unified Speech and Audio Coding (USAC; see standard ISO/IEC 23003-3) decoder is operable in at least three decoding modes, Advanced Audio Coding (AAC; see standard ISO/IEC 13818-7), algebraic CELP (ACELP) and transform-coded excitation (TCX), as shown in the upper portion of accompanying FIG. 2.

The various embodiments of CELP are adapted to the properties of the human organs of speech and, possibly, to the human auditory sense. As used in this application, CELP will refer to all possible embodiments and variants, including but not limited to ACELP, wide- and narrow-band CELP, SB-CELP (sub-band CELP), low- and high-rate CELP, RCELP (relaxed CELP), LD-CELP (low-delay CELP), CS-CELP (conjugate-structure CELP), CS-ACELP (conjugate-structure ACELP), PSI-CELP (pitch-synchronous innovation CELP) and VSELP (vector sum excited linear prediction). The principles of CELP are discussed by R. Schroeder and S. Atal in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 10, pp. 937-940, 1985, and some of its applications are described in references 25-29 cited in Chen and Gersho, IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, 1995. As further detailed in the former paper, a CELP decoder (or, analogously, a CELP speech synthesizer) may include a pitch predictor, which restores the periodic component of an encoded speech signal, and a pulse codebook, from which an innovation sequence is added. The pitch predictor may in turn include a long-delay predictor for restoring the pitch and a short-delay predictor for restoring formants by spectral envelope shaping. In this context, the pitch is generally understood as the fundamental frequency of the tonal sound component produced by the vocal cords and further coloured by resonating portions of the vocal tract. This frequency together with its harmonics will dominate speech or singing. Generally speaking, CELP methods are best suited for processing solo or one-part singing, for which the pitch frequency is well-defined and relatively easy to determine.

To improve the perceived quality of CELP-coded speech, it is common practice to combine it with post filtering (or pitch enhancement by another term). U.S. Pat. No. 4,969,192 and section II of the paper by Chen and Gersho disclose desirable properties of such post filters, namely their ability to suppress noise components located between the harmonics of the detected voice pitch (long-term portion; see section IV). It is believed that an important portion of this noise stems from the spectral envelope shaping. The long-term portion of a simple post filter may be designed to have the following transfer function:

$H_{E}(z) = 1 + \alpha \left( \frac{z^{T} + z^{-T}}{2} - 1 \right),$

where T is an estimated pitch period in terms of number of samples and α is a gain of the post filter, as shown in FIGS. 1 and 2. In a manner similar to a comb filter, such a filter attenuates frequencies 1/(2T), 3/(2T), 5/(2T), . . . , which are located midway between harmonics of the pitch frequency, and adjacent frequencies. The attenuation depends on the value of the gain α. Slightly more sophisticated post filters apply this attenuation only to low frequencies, where the noise is most perceptible (hence the commonly used term bass post filter). This can be expressed by cascading the transfer function H_E described above and a low-pass filter H_LP. Thus, the post-processed decoded signal S_E provided by the post filter will be given, in the transform domain, by

$S_{E}(z) = S(z) - \alpha S(z) P_{LT}(z) H_{LP}(z),$

where

$P_{LT}(z) = 1 - \frac{z^{T} + z^{-T}}{2}$

and S is the decoded signal which is supplied as input to the post filter. FIG. 3 shows an embodiment of a post filter with these characteristics, which is further discussed in section 6.1.3 of the Technical Specification ETSI TS 126 290, version 6.3.0, release 6. As this figure suggests, the pitch information is encoded as a parameter in the bit stream signal and is retrieved by a pitch tracking module communicatively connected to the long-term prediction filter carrying out the operations expressed by P_LT.
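By way of illustration only, the behaviour of the long-term portion can be sketched in Python. The sketch evaluates H_E on the unit circle and applies the corresponding time-domain operation; the function names, the zero padding at the signal edges and the omission of the low-pass stage H_LP (taken as identity) are assumptions of this example and are not taken from the cited specification.

```python
import cmath
import math


def long_term_response(freq, pitch_period, gain):
    """Evaluate H_E(z) = 1 + gain * ((z^T + z^-T)/2 - 1) at z = exp(j*2*pi*freq).

    freq is in cycles per sample; pitch_period is T in samples.
    """
    z = cmath.exp(2j * math.pi * freq)
    return 1 + gain * ((z ** pitch_period + z ** (-pitch_period)) / 2 - 1)


def long_term_filter(signal, pitch_period, gain):
    """Apply s_E(n) = s(n) - gain * (s(n) - 0.5*s(n-T) - 0.5*s(n+T)).

    Samples outside the signal are treated as zero (assumption of this
    sketch), and the low-pass stage H_LP is omitted.
    """
    def sample(n):
        return signal[n] if 0 <= n < len(signal) else 0.0

    out = []
    for n in range(len(signal)):
        residual = (sample(n)
                    - 0.5 * sample(n - pitch_period)
                    - 0.5 * sample(n + pitch_period))
        out.append(sample(n) - gain * residual)
    return out


T, alpha = 40, 0.5
at_harmonic = abs(long_term_response(1 / T, T, alpha))      # pitch harmonic
midway = abs(long_term_response(1 / (2 * T), T, alpha))     # between harmonics
print(at_harmonic, midway)
```

With these example values the response is essentially unity at the pitch harmonic and essentially 1 − 2α midway between harmonics, which is the comb-like behaviour described above.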

The long-term portion described in the previous paragraph may be used alone. Alternatively, it is arranged in series with a noise-shaping filter that preserves components in frequency intervals corresponding to the formants and attenuates noise in other spectral regions (short-term portion; see section III), that is, in the ‘spectral valleys’ of the formant envelope. As another possible variation, this filter aggregate is further supplemented by a gradual high-pass-type filter to reduce a perceived deterioration due to spectral tilt of the short-term portion.

Audio signals containing a mixture of components of different origins—e.g., tonal, non-tonal, vocal, instrumental, non-musical—are not always reproduced by available digital coding technologies in a satisfactory manner. It has more precisely been noted that available technologies are deficient in handling such non-homogeneous audio material, generally favouring one of the components to the detriment of the other. In particular, music containing singing accompanied by one or more instruments or choir parts which has been encoded by methods of the nature described above, will often be decoded with perceptible artefacts spoiling part of the listening experience.

SUMMARY OF THE INVENTION

In order to mitigate at least some of the drawbacks outlined in the previous section, it is an object of the present invention to provide methods and devices adapted for audio encoding and decoding of signals containing a mixture of components of different origins. As particular objects, the invention seeks to provide such methods and devices that are suitable from the point of view of coding efficiency or (perceived) reproduction fidelity or both.

The invention achieves at least one of these objects by providing an encoder system, a decoder system, an encoding method, a decoding method and computer program products for carrying out each of the methods, as defined in the independent claims. The dependent claims define embodiments of the invention.

The inventors have realized that some artefacts perceived in decoded audio signals of non-homogeneous origin derive from an inappropriate switching between several coding modes of which at least one includes post filtering at the decoder and at least one does not. More precisely, available post filters remove not only interharmonic noise (and, where applicable, noise in spectral valleys) but also signal components representing instrumental or vocal accompaniment and other material of a ‘desirable’ nature. The fact that the just noticeable difference in spectral valleys may be as large as 10 dB (as noted by Ghitza and Goldstein, IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-4, pp. 697-708, 1986) may have been taken as a justification by many designers to filter these frequency bands severely. The quality degradation by the interharmonic (and spectral-valley) attenuation itself may however be less important than that of the switching occasions. When the post filter is switched on, the background of a singing voice sounds suddenly muffled, and when the filter is deactivated, the background instantly becomes more sonorous. If the switching takes place frequently, due to the nature of the audio signal or to the configuration of the coding device, there will be a switching artefact. As one example, a USAC decoder may be operable either in an ACELP mode combined with post filtering or in a TCX mode without post filtering. The ACELP mode is used in episodes where a dominant vocal component is present. Thus, the switching into the ACELP mode may be triggered by the onset of singing, such as at the beginning of a new musical phrase, at the beginning of a new verse, or simply after an episode where the accompaniment is deemed to drown the singing voice in the sense that the vocal component is no longer prominent. 
Experiments have confirmed that an alternative solution, or rather circumvention of the problem, by which TCX coding is used throughout (and the ACELP mode is disabled) does not remedy the problem, as reverb-like artefacts appear.

Accordingly, in a first and a second aspect, the invention provides an audio encoding method (and an audio encoding system with the corresponding features) characterized by a decision being made as to whether the device which will decode the bit stream, which is output by the encoding method, should apply post filtering including attenuation of interharmonic noise. The outcome of the decision is encoded in the bit stream and is accessible to the decoding device.

By the invention, the decision whether to use the post filter is taken separately from the decision as to the most suitable coding mode. This makes it possible to maintain one post filtering status throughout a period of such length that the switching will not annoy the listener. Thus, the encoding method may prescribe that the post filter will be kept inactive even though it switches into a coding mode where the filter is conventionally active.

It is noted that the decision whether to apply post filtering is normally taken frame-wise. Thus, firstly, post filtering is not applied for less than one frame at a time. Secondly, the decision whether to disable post filtering is only valid for the duration of a current frame and may be either maintained or reassessed for the subsequent frame. In a coding format enabling a main frame format and a reduced format, which is a fraction of the normal format, e.g., ⅛ of its length, it may not be necessary to take post-filtering decisions for individual reduced frames.

Instead, a number of reduced frames summing up to a normal frame may be considered, and the parameters relevant for the filtering decision may be obtained by computing the mean or median of the reduced frames comprised therein.
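As a non-limiting sketch of this aggregation, the following Python fragment collapses a parameter measured per reduced frame into one value per normal frame by taking the mean or the median. The function name and the default of eight reduced frames per normal frame (matching the ⅛ example above) are assumptions made for illustration.

```python
from statistics import mean, median


def aggregate_reduced_frames(values, per_frame=8, use_median=False):
    """Collapse per-reduced-frame parameters into one value per normal frame."""
    if len(values) % per_frame != 0:
        raise ValueError("expected a whole number of normal frames")
    reducer = median if use_median else mean
    return [reducer(values[i:i + per_frame])
            for i in range(0, len(values), per_frame)]
```

The median variant is less sensitive to a single outlying reduced frame, which may be preferable when the parameter fluctuates strongly within a normal frame.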

In a third and a fourth aspect of the invention, there is provided an audio decoding method (and an audio decoding system with corresponding features) with a decoding step followed by a post-filtering step, which includes interharmonic noise attenuation, and being characterized in a step of disabling the post filter in accordance with post filtering information encoded in the bit stream signal.

A decoding method with these characteristics is well suited for coding of mixed-origin audio signals by virtue of its capability to deactivate the post filter in dependence of the post filtering information only, hence independently of factors such as the current coding mode. When applied to coding techniques wherein post filter activity is conventionally associated with particular coding modes, the post-filtering disabling capability enables a new operative mode, namely the unfiltered application of a conventionally filtered decoding mode.

In a further aspect, the invention also provides a computer program product for performing one of the above methods. Further still, the invention provides a post filter for attenuating interharmonic noise which is operable in either an active mode or a pass-through mode, as indicated by a post-filtering signal supplied to the post filter. The post filter may include a decision section for autonomously controlling the post filtering activity.

As the skilled person will appreciate, an encoder adapted to cooperate with a decoder is equipped with functionally equivalent modules, so as to enable faithful reproduction of the encoded signal. Such equivalent modules may be identical or similar modules or modules having identical or similar transfer characteristics. In particular, the modules in the encoder and decoder, respectively, may be similar or dissimilar processing units executing respective computer programs that perform equivalent sets of mathematical operations.

In one embodiment, encoding by the present method includes decision making as to whether to apply a post filter which further includes attenuation of spectral valleys (with respect to the formant envelope, see above). This corresponds to the short-term portion of the post filter. It is then advantageous to adapt the criterion on which the decision is based to the nature of the post filter.

One embodiment is directed to an encoder particularly adapted for speech coding. As some of the problems motivating the invention have been observed when a mixture of vocal and other components is coded, the combination of speech coding and the independent decision-making regarding post filtering afforded by the invention is particularly advantageous. In particular, such an encoder may include a code-excited linear prediction encoding module.

In one embodiment, the encoder bases its decision on a detected simultaneous presence of a signal component with dominant fundamental frequency (pitch) and another signal component located below the fundamental frequency. The detection may also be aimed at finding the co-occurrence of a component with dominant fundamental frequency and another component with energy between the harmonics of this fundamental frequency. This is a situation wherein artefacts of the type under consideration are frequently encountered. Thus, if such simultaneous presence is established, the encoder will decide that post filtering is not suitable, which will be indicated accordingly by post filtering information contained in the bit stream.

One embodiment uses as its detection criterion the total signal power content in the audio time signal below a pitch frequency, possibly a pitch frequency estimated by a long-term prediction in the encoder. If this is greater than a predetermined threshold, it is considered that there are other relevant components than the pitch component (including harmonics), which will cause the post filter to be disabled.
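A minimal sketch of such a criterion, under stated assumptions, is given below in Python. It sums the discrete Fourier transform power of all non-DC bins lying strictly below the pitch frequency and compares the sum with a threshold. The function names, the exclusion of the DC bin, the normalization and the threshold value are hypothetical choices of this example.

```python
import cmath
import math


def power_below_pitch(frame, sample_rate, pitch_hz):
    """Sum the DFT power of all non-DC bins strictly below the pitch frequency."""
    n = len(frame)
    total = 0.0
    for k in range(1, n // 2 + 1):
        if k * sample_rate / n >= pitch_hz:
            break
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        total += abs(coeff) ** 2 / n ** 2
    return total


def disable_post_filter(frame, sample_rate, pitch_hz, threshold):
    """Return True when the sub-pitch power exceeds the (hypothetical) threshold."""
    return power_below_pitch(frame, sample_rate, pitch_hz) > threshold
```

For instance, a frame containing a 200 Hz voiced component together with a 60 Hz bass component would exceed a small threshold, whereas the voiced component alone would not.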

In an encoder comprising a CELP module, use can be made of the fact that such a module estimates the pitch frequency of the audio time signal. Then, a further detection criterion is to check for energy content between or below the harmonics of this frequency, as described in more detail above.

As a further development of the preceding embodiment including a CELP module, the decision may include a comparison between an estimated power of the audio signal when CELP-coded (i.e., encoded and decoded) and an estimated power of the audio signal when CELP-coded and post-filtered. If the power difference is larger than a threshold, this may indicate that a relevant, non-noise component of the signal will be lost, and the encoder will decide to disable the post filter.
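The comparison may be sketched as follows in Python. The use of mean sample power as the power estimate, the function names and the threshold are assumptions of this example.

```python
def mean_power(signal):
    """Mean squared sample value, used here as a simple power estimate."""
    return sum(x * x for x in signal) / len(signal)


def power_loss_exceeds(decoded, post_filtered, threshold):
    """True when post filtering removes more power than the threshold allows."""
    return mean_power(decoded) - mean_power(post_filtered) > threshold
```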

In an advantageous embodiment, the encoder comprises a CELP module and a TCX module. As is known in the art, TCX coding is advantageous in respect of certain kinds of signals, notably non-vocal signals. It is not common practice to apply post-filtering to a TCX-coded signal. Thus, the encoder may select either TCX coding, CELP coding with post filtering or CELP coding without post filtering, thereby covering a considerable range of signal types.

As one further development of the preceding embodiment, the decision between the three coding modes is taken on the basis of a rate-distortion criterion, that is, applying an optimization procedure known per se in the art.
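One common form of rate-distortion optimization minimizes a Lagrangian cost J = D + λR over the candidate modes. The following Python sketch assumes this form; the text above does not prescribe a particular cost function, and the mode names, distortion values and bit counts are purely hypothetical.

```python
def select_mode(candidates, lagrange_multiplier):
    """Pick the mode with the lowest cost D + lambda * R.

    candidates maps a mode name to a (distortion, rate_in_bits) pair.
    """
    def cost(mode):
        distortion, rate = candidates[mode]
        return distortion + lagrange_multiplier * rate

    return min(candidates, key=cost)


# Hypothetical measurements for one frame.
frame_candidates = {
    "tcx": (4.0, 100),
    "celp_post_filtered": (3.0, 120),
    "celp_unfiltered": (2.5, 150),
}
```

A larger multiplier weights the bit rate more heavily, so the selection may shift towards the cheapest mode as the multiplier grows.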

In another further development of the preceding embodiment, the encoder further comprises an Advanced Audio Coding (AAC) coder, which is also known to be particularly suitable for certain types of signals. Preferably, the decision whether to apply AAC (frequency-domain) coding is made separately from the decision as to which of the other (linear-prediction) modes to use. Thus, the encoder can be apprehended as being operable in two super-modes, AAC or TCX/CELP, in the latter of which the encoder will select between TCX, post-filtered CELP or non-filtered CELP. This embodiment enables processing of an even wider range of audio signal types.

In one embodiment, the encoder can decide that a post filtering at decoding is to be applied gradually, that is, with gradually increasing gain. Likewise, it may decide that post filtering is to be removed gradually. Such gradual application and removal makes switching between regimes with and without post filtering less perceptible. As one example, a singing episode, for which post-filtered CELP coding is found to be suitable, may be preceded by an instrumental episode, wherein TCX coding is optimal; a decoder according to the invention may then apply post filtering gradually at or near the beginning of the singing episode, so that the benefits of post filtering are preserved even though annoying switching artefacts are avoided.
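Gradual application or removal may, for example, be realized as a per-frame gain schedule. The Python sketch below assumes a linear ramp; the ramp shape, the function name and the number of frames are illustrative assumptions only.

```python
def gain_ramp(start_gain, target_gain, num_frames):
    """Linear per-frame gain schedule ending at target_gain."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    step = (target_gain - start_gain) / num_frames
    return [start_gain + step * (i + 1) for i in range(num_frames)]
```

Calling `gain_ramp(0.0, 0.5, 5)` yields five increasing gains for fading the post filter in, and swapping the first two arguments yields the corresponding fade-out.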

In one embodiment, the decision as to whether post filtering is to be applied is based on an approximate difference signal, which approximates that signal component which is to be removed from a future decoded signal by the post filter. As one option, the approximate difference signal is computed as the difference between the audio time signal and the audio time signal when subjected to (simulated) post filtering. As another option, an encoding section extracts an intermediate decoded signal, whereby the approximate difference signal can be computed as the difference between the audio time signal and the intermediate decoded signal when subjected to post filtering. The intermediate decoded signal may be stored in a long-term prediction buffer of the encoder. It may further represent the excitation of the signal, implying that further synthesis filtering (vocal tract, resonances) would need to be applied to obtain the final decoded signal. The point in using an intermediate decoded signal is that it captures some of the particularities, notably weaknesses, of the coding method, thereby allowing a more realistic estimation of the effect of the post filter. As a third option, a decoding section extracts an intermediate decoded signal, whereby the approximate difference signal can be computed as the difference between the intermediate decoded signal and the intermediate decoded signal when subjected to post filtering. This procedure probably gives a less reliable estimation than the two first options, but can on the other hand be carried out by the decoder in a standalone fashion.

The approximate difference signal thus obtained is then assessed with respect to one of the following criteria, which when settled in the affirmative will lead to a decision to disable the post filter:

- a) whether the power of the approximate difference signal exceeds a predetermined threshold, indicating that a significant part of the signal would be removed by the post filter;
- b) whether the character of the approximate difference signal is rather tonal than noise-like;
- c) whether a difference between magnitude frequency spectra of the approximate difference signal and of the audio time signal is unevenly distributed with respect to frequency, suggesting that it is not noise but rather a signal that would make sense to a human listener;
- d) whether a magnitude frequency spectrum of the approximate difference signal is localized to frequency intervals within a predetermined relevance envelope, based on what can usually be expected from a signal of the type to be processed; and
- e) whether a magnitude frequency spectrum of the approximate difference signal is localized to frequency intervals within a relevance envelope obtained by thresholding a magnitude frequency spectrum of the audio time signal by a magnitude of the largest signal component therein downscaled by a predetermined scale factor.
  
  When evaluating criterion e), it is advantageous to apply peak tracking in the magnitude spectrum, that is, to distinguish portions having peak-like shapes normally associated with tonal components rather than noise. Components identified by peak tracking, which may take place by some algorithm known per se in the art, may be further sorted by applying a threshold to the peak height, whereby the remaining components are tonal material of a certain magnitude. Such components usually represent relevant signal content rather than noise, which motivates a decision to disable the post filter.
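Criteria a) and e) may be sketched as follows in Python, operating on magnitude spectra given as lists of per-bin magnitudes. The function names, the interpretation of "localized" as a minimum fraction of the difference-signal energy lying inside the envelope, and the default fraction of 0.9 are assumptions of this example.

```python
def exceeds_power_threshold(diff_signal, threshold):
    """Criterion a): mean power of the approximate difference signal vs. threshold."""
    power = sum(x * x for x in diff_signal) / len(diff_signal)
    return power > threshold


def relevance_envelope(audio_magnitudes, scale_factor):
    """Criterion e), step 1: keep bins at or above max magnitude times scale_factor."""
    limit = max(audio_magnitudes) * scale_factor
    return [m >= limit for m in audio_magnitudes]


def localized_in_envelope(diff_magnitudes, envelope, min_fraction=0.9):
    """Criterion e), step 2: is most difference-signal energy inside the envelope?"""
    total = sum(m * m for m in diff_magnitudes)
    if total == 0:
        return False
    inside = sum(m * m for m, keep in zip(diff_magnitudes, envelope) if keep)
    return inside / total >= min_fraction
```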

In one embodiment of the invention as a decoder, the decision to disable the post filter is executed by a switch controllable by the control section and capable of bypassing the post filter in the circuit. In another embodiment, the post filter has variable gain controllable by the control section, or a gain controller therein, wherein the decision to disable is carried out by setting the post filter gain (see previous section) to zero or by setting its absolute value below a predetermined threshold.

In one embodiment, decoding according to the present invention includes extracting post filtering information from the bit stream signal which is being decoded. More precisely, the post filtering information may be encoded in a data field comprising at least one bit in a format suitable for transmission. Advantageously, the data field is an existing field defined by an applicable standard but not in use, so that the post filtering information does not increase the payload to be transmitted.

In other embodiments, an audio decoder for decoding an audio bitstream is disclosed. The decoder includes a first decoding module adapted to operate in a first coding mode and a second decoding module adapted to operate in a second coding mode, the second coding mode being different from the first coding mode. The decoder further includes a pitch filter in either the first coding mode or the second coding mode, the pitch filter adapted to filter a preliminary audio signal generated by the first decoding module or the second decoding module to obtain a filtered signal. The pitch filter is selectively enabled or disabled based on a value of a first parameter encoded in the audio bitstream, the first parameter being distinct from a second parameter encoded in the audio bitstream, the second parameter specifying a current coding mode of the audio decoder.
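The separation between the two parameters may be illustrated by the following Python sketch. The one-byte header layout (two bits for the coding mode, one bit for the post filtering flag), the mode numbering and all names are hypothetical and do not correspond to any standardized bit stream syntax; the rule that only ACELP output is filtered is likewise an assumption of the example.

```python
from dataclasses import dataclass

# Hypothetical mode numbering for illustration only.
MODES = {0: "ACELP", 1: "TCX", 2: "AAC"}


@dataclass
class FrameHeader:
    coding_mode: str            # second parameter: current coding mode
    post_filter_enabled: bool   # first parameter: post filtering information


def parse_header(byte):
    """Hypothetical layout: bits 0-1 hold the mode, bit 2 the post filter flag."""
    mode = MODES[byte & 0b11]
    flag = bool((byte >> 2) & 1)
    return FrameHeader(mode, flag)


def apply_pitch_filter(header):
    """Filter only when the flag is set and the mode is one that admits filtering."""
    return header.coding_mode == "ACELP" and header.post_filter_enabled
```

Because the flag is carried separately from the mode, an ACELP frame can be decoded either with or without the pitch filter.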

In some embodiments, a pitch filter for filtering a preliminary audio signal generated from an audio bitstream is disclosed. The pitch filter has an operating mode selected from one of either: (i) an active mode where the preliminary audio signal is filtered using filtering information to obtain a filtered audio signal, and (ii) an inactive mode where the pitch filter is disabled. The preliminary audio signal is generated in an audio encoder or audio decoder having a coding mode selected from at least two distinct coding modes, and the pitch filter is capable of being selectively operated in either the active mode or the inactive mode while operating in the coding mode based on control information.

It is noted that the methods and apparatus disclosed in this section may be applied, after appropriate modifications within the skilled person's abilities including routine experimentation, to coding of signals having several components, possibly corresponding to different channels, such as stereo channels. Throughout the present application, pitch enhancement and post filtering are used as synonyms. It is further noted that AAC is discussed as a representative example of frequency-domain coding methods. Indeed, applying the invention to a decoder or encoder operable in a frequency-domain coding mode other than AAC will only require small modifications, if any, within the skilled person's abilities. Similarly, TCX is mentioned as an example of weighted linear prediction transform coding and of transform coding in general.

Features from two or more embodiments described hereinabove can be combined, unless they are clearly complementary, in further embodiments. The fact that two features are recited in different claims does not preclude that they can be combined to advantage. Likewise, further embodiments can also be provided by the omission of certain features that are not necessary or not essential for the desired purpose.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will now be described with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram showing a conventional decoder with post filter;

FIG. 2 is a schematic block diagram of a conventional decoder operable in AAC, ACELP and TCX mode and including a post filter permanently connected downstream of the ACELP module;

FIG. 3 is a block diagram illustrating the structure of a post filter;

FIGS. 4 and 5 are block diagrams of two decoders according to the invention;

FIGS. 6 and 7 are block diagrams illustrating differences between a conventional decoder (FIG. 6) and a decoder (FIG. 7) according to the invention;

FIG. 8 is a block diagram of an encoder according to the invention;

FIGS. 9 and 10 are block diagrams illustrating differences between a conventional decoder (FIG. 9) and a decoder (FIG. 10) according to the invention; and

FIG. 11 is a block diagram of an autonomous post filter which can be selectively activated and deactivated.

DETAILED DESCRIPTION OF EMBODIMENTS

FIG. 4 is a schematic drawing of a decoder system 400 according to an embodiment of the invention, having as its input a bit stream signal and as its output an audio signal. As in the conventional decoder shown in FIG. 1, a post filter 440 is arranged downstream of a decoding module 410 but can be switched into or out of the decoding path by operating a switch 442. The post filter is enabled in the switch position shown in the figure. It would be disabled if the switch were set in the opposite position, whereby the signal from the decoding module 410 would instead be conducted over the bypass line 444. As an inventive contribution, the switch 442 is controllable by post filtering information contained in the bit stream signal, so that post filtering may be applied and removed irrespective of the current status of the decoding module 410. Because a post filter 440 operates at some delay (for example, the post filter shown in FIG. 3 will introduce a delay amounting to at least the pitch period T), a compensation delay module 443 is arranged on the bypass line 444 to maintain the modules in a synchronized condition at switching. The delay module 443 delays the signal by the same period as the post filter 440 would, but does not otherwise process the signal. To minimize the change-over time, the compensation delay module 443 receives the same signal as the post filter 440 at all times. In an alternative embodiment where the post filter 440 is replaced by a zero-delay post filter (e.g., a causal filter, such as a filter with two taps, independent of future signal values), the compensation delay module 443 can be omitted.
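The switching arrangement with a compensation delay may be sketched as follows in Python. The function names are hypothetical, the post filter is passed in as an arbitrary callable, and the zero padding at the start of the delayed signal is an assumption of this example.

```python
def delay(signal, num_samples):
    """Delay by num_samples, padding the start with zeros and keeping the length."""
    padded = [0.0] * num_samples + list(signal)
    return padded[:len(signal)]


def decode_path(decoded, post_filter, filter_delay, enabled):
    """Select between the post-filtered path and the delay-compensated bypass.

    Both paths are fed the same decoded signal at all times, so that the
    outputs stay time-aligned when the switch changes position.
    """
    filtered = post_filter(decoded)
    bypass = delay(decoded, filter_delay)
    return filtered if enabled else bypass
```

Since the bypass path is delayed by the same amount as the filter path, toggling `enabled` between frames does not shift the output in time.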

FIG. 5 illustrates a further development according to the teachings of the invention of the triple-mode decoder system 500 of FIG. 2. An ACELP decoding module 511 is arranged in parallel with a TCX decoding module 512 and an AAC decoding module 513. In series with the ACELP decoding module 511 is arranged a post filter 540 for attenuating noise, particularly noise located between harmonics of a pitch frequency directly or indirectly derivable from the bit stream signal for which the decoder system 500 is adapted. The bit stream signal also encodes post filtering information governing the positions of an upper switch 541 operable to switch the post filter 540 out of the processing path and replace it with a compensation delay 543 as in FIG. 4. A lower switch 542 is used for switching between different decoding modes. With this structure, the position of the upper switch 541 is immaterial when one of the TCX or AAC modules 512, 513 is used; hence, the post filtering information does not necessarily indicate this position except in the ACELP mode. Whatever decoding mode is currently used, the signal is supplied from the downstream connection point of the lower switch 542 to a spectral band replication (SBR) module 550, which outputs an audio signal. The skilled person will realize that the drawing is of a conceptual nature, as is clear notably from the switches which are shown schematically as separate physical entities with movable contacting means. In a possible realistic implementation of the decoder system, the switches as well as the other modules will be embodied by computer-readable instructions.

FIGS. 6 and 7 are also block diagrams of two triple-mode decoder systems operable in an ACELP, TCX or frequency-domain decoding mode. With reference to the latter figure, which shows an embodiment of the invention, a bit stream signal is supplied to an input point 701, which is in turn permanently connected via respective branches to the three decoding modules 711, 712, 713. The input point 701 also has a connecting branch 702 (not present in the conventional decoding system of FIG. 6) to a pitch enhancement module 740, which acts as a post filter of the general type described above. As is common practice in the art, a first transition windowing module 703 is arranged downstream of the ACELP and TCX modules 711, 712, to carry out transitions between the decoding modules. A second transition module 704 is arranged downstream of the frequency-domain decoding module 713 and the first transition windowing module 703, to carry out transitions between the two super-modes. Further, an SBR module 750 is provided immediately upstream of the output point 705. Clearly, the bit stream signal is supplied directly (or after demultiplexing, as appropriate) to all three decoding modules 711, 712, 713 and to the pitch enhancement module 740. Information contained in the bit stream controls which decoding module is to be active. By the invention, however, the pitch enhancement module 740 performs an analogous self-actuation: responsive to post filtering information in the bit stream, it may act as a post filter or simply as a pass-through. This may for instance be realized through the provision of a control section (not shown) in the pitch enhancement module 740, by means of which the post filtering action can be turned on or off. The pitch enhancement module 740 is always in its pass-through mode when the decoder system operates in the frequency-domain or TCX decoding mode, wherein strictly speaking no post filtering information is necessary. It is understood that modules not forming part of the inventive contribution and whose presence is obvious to the skilled person, e.g., a demultiplexer, have been omitted from FIG. 7 and other similar drawings to increase clarity.

As a variation, the decoder system of FIG. 7 may be equipped with a control module (not shown) for deciding whether post filtering is to be applied using an analysis-by-synthesis approach. Such a control module is communicatively connected to the pitch enhancement module 740 and to the ACELP module 711, from which it extracts an intermediate decoded signal s_{i_DEC}(n) representing an intermediate stage in the decoding process, preferably one corresponding to the excitation of the signal. The control module has the necessary information to simulate the action of the pitch enhancement module 740, as defined by the transfer functions P_LT(z) and H_LP(z) (cf. Background section and FIG. 3), or equivalently their filter impulse responses p_LT(n) and h_LP(n). As follows from the discussion in the Background section, the component to be subtracted at post filtering can be estimated by an approximate difference signal s_AD(n) which is proportional to [(s_{i_DEC}*p_LT)*h_LP](n), where * denotes discrete convolution. This is an approximation of the true difference between the original audio signal and the post-filtered decoded signal, namely

s_ORIG(n)−s_E(n)=s_ORIG(n)−(s_DEC(n)−α[s_DEC*p_LT*h_LP](n)),

where α is the post filter gain. By studying the total energy, low-band energy, tonality, actual magnitude spectrum or past magnitude spectra of this signal, as disclosed in the Summary section and the claims, the control module may find a basis for the decision whether to activate or deactivate the pitch enhancement module 740.
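As an illustration only, the decoder-side estimate s_AD(n) and an energy-based decision could be sketched as follows. The function names, the truncation of the convolution to the frame length and the threshold `max_energy` are assumptions made for this sketch and are not taken from the description; the impulse responses p_LT(n) and h_LP(n) are passed in as plain lists.

```python
def convolve(x, h):
    """Full discrete convolution of two finite sequences."""
    if not x or not h:
        return []
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def approximate_difference(s_i_dec, p_lt, h_lp, alpha):
    """Return alpha * [(s_i_dec * p_lt) * h_lp](n), truncated to the frame length."""
    full = convolve(convolve(s_i_dec, p_lt), h_lp)
    return [alpha * v for v in full[:len(s_i_dec)]]


def total_energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def post_filter_enabled(s_i_dec, p_lt, h_lp, alpha, max_energy):
    """Hypothetical rule: keep the post filter active only while the
    component it would subtract carries little energy."""
    s_ad = approximate_difference(s_i_dec, p_lt, h_lp, alpha)
    return total_energy(s_ad) <= max_energy
```

Total energy is only one of the listed criteria; low-band energy or tonality of the same signal could be substituted for `total_energy` without changing the structure of the sketch.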

FIG. 8 shows an encoder system 800 according to an embodiment of the invention. The encoder system 800 is adapted to process digital audio signals, which are generally obtained by capturing a sound wave with a microphone and transducing the wave into an analog electric signal. The electric signal is then sampled into a digital signal that can be provided, in a suitable format, to the encoder system 800. The system generally consists of an encoding module 810, a decision module 820 and a multiplexer 830. By virtue of switches 814, 815 (symbolically represented), the encoding module 810 is operable in either a CELP, a TCX or an AAC mode, by selectively activating modules 811, 812, 813. The decision module 820 applies one or more predefined criteria to decide whether to disable post filtering during decoding of a bit stream signal produced by the encoder system 800 to encode an audio signal. For this purpose, the decision module 820 may examine the audio signal directly or may receive data from the encoding module 810 via a connection line 816. A signal indicative of the decision taken by the decision module 820 is provided, together with the encoded audio signal from the encoding module 810, to the multiplexer 830, which concatenates the signals into a bit stream constituting the output of the encoder system 800.

Preferably, the decision module 820 bases its decision on an approximate difference signal computed from an intermediate decoded signal s_{i_DEC}, which can be extracted from the encoding module 810. The intermediate decoded signal represents an intermediate stage in the decoding process, as discussed in preceding paragraphs, but may be extracted from a corresponding stage of the encoding process. However, in the encoder system 800 the original audio signal s_ORIG is available so that, advantageously, the approximate difference signal is formed as:

s_ORIG(n)−(s_{i_DEC}(n)−α[(s_{i_DEC}*p_LT)*h_LP](n))

The approximation resides in the fact that the intermediate decoded signal is used in lieu of the final decoded signal. This enables an appraisal of the nature of the component that a post filter would remove at decoding, and by applying one of the criteria discussed in the Summary section, the decision module 820 will be able to take a decision whether to disable post filtering.
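The encoder-side expression can be regrouped to show what it contains. This is plain algebra on the formula as given; the labels under the braces are informal descriptions added here, not terms used in the description.

```latex
\begin{align*}
s_{\mathrm{ORIG}}(n) - \bigl(s_{i\_\mathrm{DEC}}(n) - \alpha\,[(s_{i\_\mathrm{DEC}} * p_{\mathrm{LT}}) * h_{\mathrm{LP}}](n)\bigr)
&= \underbrace{s_{\mathrm{ORIG}}(n) - s_{i\_\mathrm{DEC}}(n)}_{\text{coding error at the intermediate stage}}
 + \underbrace{\alpha\,[(s_{i\_\mathrm{DEC}} * p_{\mathrm{LT}}) * h_{\mathrm{LP}}](n)}_{\text{component the post filter would subtract}}
\end{align*}
```

The second term has the same form as the decoder-side estimate s_AD(n); the first term is available only at the encoder, where the original signal is known.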

As a variation to this, the decision module 820 may use the original signal in place of an intermediate decoded signal, so that the approximate difference signal will be [(s_ORIG*p_LT)*h_LP](n). This is likely to be a less faithful approximation but on the other hand makes the presence of a connection line 816 between the decision module 820 and the encoding module 810 optional.

In such other variations of this embodiment where the decision module 820 studies the audio signal directly, one or more of the following criteria may be applied:

- Does the audio signal contain both a component with dominant fundamental frequency and a component located below the fundamental frequency? (The fundamental frequency may be supplied as a by-product of the encoding module 810.)
- Does the audio signal contain both a component with dominant fundamental frequency and a component located between the harmonics of the fundamental frequency?
- Does the audio signal contain significant signal energy below the fundamental frequency?
- Is post-filtered decoding (likely to be) preferable to unfiltered decoding with respect to rate-distortion optimality?
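The third criterion in the list above (significant signal energy below the fundamental frequency) could be evaluated along the following lines. This is a minimal sketch using a naive DFT; the function names, the exclusion of the DC bin and the idea of comparing the returned fraction against a threshold are assumptions of the sketch, and the fundamental frequency `f0` is taken as given, for example as a by-product of the encoding module.

```python
import cmath
import math


def dft_power(x):
    """Power of the one-sided DFT bins 0 .. len(x)//2 (naive O(N^2) transform)."""
    n = len(x)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    return power


def energy_fraction_below(x, sample_rate, f0):
    """Fraction of the non-DC spectral energy located strictly below f0 (in Hz)."""
    n = len(x)
    power = dft_power(x)
    total = sum(power[1:])  # the DC bin is ignored
    if total == 0.0:
        return 0.0
    below = sum(p for k, p in enumerate(power)
                if k >= 1 and k * sample_rate / n < f0)
    return below / total
```

A decision module might, for instance, disable post filtering when this fraction exceeds some tuned value. In practice a windowed FFT and a guard band around `f0` would probably be preferred to limit spectral leakage.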

In all the described variations of the encoder structure shown in FIG. 8 (that is, irrespective of the basis of the detection criterion), the decision module 820 may be enabled to decide on a gradual onset or gradual removal of post filtering, so as to achieve smooth transitions. The gradual onset and removal may be controlled by adjusting the post filter gain.
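A gradual onset or removal controlled through the post filter gain might look like the following sketch. The per-frame step size and the function names are hypothetical; `removed` stands for the already computed signal [s_DEC*p_LT*h_LP](n), so that a gain of zero is a pass-through and the full gain α gives ordinary post filtering.

```python
def step_gain(current, target, step):
    """Move the gain one step toward the target without overshooting it."""
    if current < target:
        return min(current + step, target)
    return max(current - step, target)


def post_filter_frame(s_dec, removed, gain):
    """s_E(n) = s_DEC(n) - gain * removed(n) for one frame."""
    return [s - gain * r for s, r in zip(s_dec, removed)]


def gain_trajectory(start, target, step, frames):
    """Gains applied over consecutive frames while ramping toward the target."""
    gains = []
    g = start
    for _ in range(frames):
        g = step_gain(g, target, step)
        gains.append(g)
    return gains
```

For example, `gain_trajectory(0.0, 0.5, 0.2, 4)` ramps the gain up over three frames and then holds it at the target, which avoids an abrupt change when post filtering is switched on.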

FIG. 9 shows a conventional decoder operable in a frequency-domain decoding mode and a CELP decoding mode depending on the bit stream signal supplied to the decoder. Post filtering is applied whenever the CELP decoding mode is selected. An improvement of this decoder is illustrated in FIG. 10, which shows a decoder 1000 according to an embodiment of the invention. This decoder is operable not only in a frequency-domain-based decoding mode, wherein the frequency-domain decoding module 1013 is active, and a filtered CELP decoding mode, wherein the CELP decoding module 1011 and the post filter 1040 are active, but also in an unfiltered CELP mode, in which the CELP module 1011 supplies its signal to a compensation delay module 1043 via a bypass line 1044. A switch 1042 controls which decoding mode is currently used, responsive to post filtering information contained in the bit stream signal provided to the decoder 1000. In this decoder and that of FIG. 9, the last processing step is effected by an SBR module 1050, from which the final audio signal is output.
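The choice between the filtered and the unfiltered CELP path could be sketched as below. The description does not state the purpose of the compensation delay module 1043; the sketch assumes it gives the bypass path the same latency as the post filter, so that switching between the two paths does not shift the signal in time. The class and function names and the delay length are hypothetical.

```python
from collections import deque


class DelayLine:
    """Delays a sample stream by a fixed number of samples (zero-initialised)."""

    def __init__(self, delay):
        self.buf = deque([0.0] * delay)

    def process(self, samples):
        out = []
        for v in samples:
            self.buf.append(v)
            out.append(self.buf.popleft())
        return out


def route_celp_output(celp_out, post_filter_on, post_filter, delay_line):
    """Send the CELP output through the post filter or through the delayed bypass."""
    if post_filter_on:
        return post_filter(celp_out)
    return delay_line.process(celp_out)
```

A complete decoder would also have to keep the state of the inactive path consistent across switches; that bookkeeping is left out here.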

FIG. 11 shows a post filter 1100 suitable to be arranged downstream of a decoder 1199. The filter 1100 includes a post filtering module 1140, which is enabled or disabled by a control module (not shown), notably a binary or non-binary gain controller, in response to a post filtering signal received from a decision module 1120 within the post filter 1100. The decision module performs one or more tests on the signal obtained from the decoder to arrive at a decision whether the post filtering module 1140 is to be active or inactive. The decision may be taken along the lines of the functionality of the decision module 820 in FIG. 8, which uses the original signal and/or an intermediate decoded signal to predict the action of the post filter. The decision of the decision module 1120 may also be based on information similar to that used by the decision modules in those embodiments where an intermediate decoded signal is formed. As one example, the decision module 1120 may estimate a pitch frequency (unless this is readily extractable from the bit stream signal) and compute the energy content in the signal below the pitch frequency and between its harmonics. If this energy content is significant, it probably represents a relevant signal component rather than noise, which motivates a decision to disable the post filtering module 1140.
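One way such a decision module might work is sketched below. Both techniques are stand-ins chosen for brevity and are not taken from the description: the pitch lag is estimated by maximising a plain autocorrelation, and the energy away from the harmonics is approximated in the time domain by the energy of x(n) − x(n − T), which vanishes for a signal that is exactly periodic with period T. All names and the threshold `max_ratio` are hypothetical.

```python
import math


def estimate_pitch_lag(x, min_lag, max_lag):
    """Lag in samples (within [min_lag, max_lag]) that maximises the autocorrelation."""
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


def off_harmonic_ratio(x, lag):
    """Energy of x(n) - x(n - lag) relative to the energy of x(n), for n >= lag."""
    num = sum((x[n] - x[n - lag]) ** 2 for n in range(lag, len(x)))
    den = sum(x[n] ** 2 for n in range(lag, len(x)))
    return num / den if den > 0.0 else 0.0


def keep_post_filter(x, lag, max_ratio):
    """Hypothetical rule: disable post filtering when much energy lies off the harmonics."""
    return off_harmonic_ratio(x, lag) <= max_ratio
```

The pitch frequency follows as the sample rate divided by the estimated lag. A spectral measure of the energy below the pitch frequency and between its harmonics, as the paragraph above describes, would be a closer match to the text; the time-domain ratio is used here only because it needs no transform.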

A 6-person listening test has been carried out, during which music samples encoded and decoded according to the invention were compared with reference samples containing the same music coded while applying post filtering in the conventional fashion but maintaining all other parameters unchanged. The results confirm a perceived quality improvement.

Further embodiments of the present invention will become apparent to a person skilled in the art after reading the description above. Even though the present description and drawings disclose embodiments and examples, the invention is not restricted to these specific examples. Numerous modifications and variations can be made without departing from the scope of the present invention, which is defined by the accompanying claims.

The systems and methods disclosed hereinabove may be implemented as software, firmware, hardware or a combination thereof. Certain components or all components may be implemented as software executed by a digital signal processor or microprocessor, or be implemented as hardware or as an application-specific integrated circuit. Such software may be distributed on computer readable media, which may comprise computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to a person skilled in the art, computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Further, it is well known to the skilled person that communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.

Claims

1. A method of decoding a bit stream signal as an audio time signal, the method including the steps of: decoding a bit stream signal as a preliminary audio time signal according to a coding mode selected from a plurality of coding modes, wherein the plurality of coding modes includes at least a first coding mode which includes a post-filtering step and at least a second coding mode which does not include the post-filtering step, wherein the post-filtering step applies a pitch-enhancement filter to the preliminary audio time signal, thereby obtaining an audio time signal, wherein the post-filtering step is selectively omitted responsive to post-filtering information encoded in the bit stream signal, the post-filtering information being indicative of an encoder-side decision of whether or not to omit the post-filtering step, whereby the post-filtering step is selectively omitted in the first coding mode, and wherein the post-filtering step further includes attenuating noise located in spectral valleys.
2. The method of claim 1, wherein the pitch-enhancement filter is a bass post filter.
3. The method of claim 1, wherein the decoding step includes applying code-excited linear prediction, CELP, decoding.
4. The method of claim 1, wherein the bit stream signal is segmented into time frames and the post-filtering step is omitted for an entire time frame or a sequence of entire time frames.
5. A non-transitory computer readable storage medium containing a program of instructions which, when executed by one or more processors, cause the one or more processors to perform the method of claim 1.
6. A decoding system configured to perform the method of claim 1.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 17/532,775 filed on Nov. 22, 2021, which is a continuation of U.S. patent application Ser. No. 17/073,228 filed on Oct. 16, 2020, (now U.S. Pat. No. 11,183,200, issued Nov. 23, 2021), which is a continuation of U.S. patent application Ser. No. 16/351,133 filed on Mar. 12, 2019, (now U.S. Pat. No. 10,811,024, issued Oct. 20, 2020), which is a continuation of U.S. patent application Ser. No. 15/792,589, filed Oct. 24, 2017 (now U.S. Pat. No. 10,236,010, issued Mar. 19, 2019), which is a divisional of U.S. patent application Ser. No. 15/086,409, filed Mar. 31, 2016 (now U.S. Pat. No. 9,858,940, issued Jan. 2, 2018), which is a continuation of U.S. patent application No. 14/936,408, filed Nov. 9, 2015 (now U.S. Pat. No. 9,343,077, issued May 17, 2016), which is a continuation of U.S. patent application No. 13/703,875, filed Dec. 12, 2012 (now U.S. Pat. No. 9,224,403, issued Dec. 29, 2015), which is the 371 National Stage of International Application No. PCT/EP2011/060555 having an international filing date of Jun. 23, 2011. PCT/EP2011/060555 claims priority to U.S. Provisional Patent Application No. 61/361,237, filed Jul. 2, 2010, the contents of each of which are hereby incorporated by reference in their entirety.

US Referenced Citations (94)

Number	Name	Date	Kind
4052568	Jankowski	Oct 1977	A
4696040	Doddington	Sep 1987	A
4896361	Gerson	Jan 1990	A
4969192	Chen	Nov 1990	A
5664055	Kroon	Sep 1997	A
5802109	Sano	Sep 1998	A
5864798	Miseki	Jan 1999	A
6072844	Inoue	Jun 2000	A
6073092	Kwon	Jun 2000	A
6098036	Zinser, Jr.	Aug 2000	A
6114859	Koda	Sep 2000	A
6138093	Ekudden	Oct 2000	A
6240386	Thyssen	May 2001	B1
6334105	Ehara	Dec 2001	B1
6363340	Sluijter	Mar 2002	B1
6385195	Sicher	May 2002	B2
6385578	Lee	May 2002	B1
6584441	Ojala	Jun 2003	B1
6658383	Koishida et al.	Dec 2003	B2
6714908	Naka	Mar 2004	B1
6735567	Gao et al.	May 2004	B2
6785645	Khalil	Aug 2004	B2
6810381	Sasaki	Oct 2004	B1
6862567	Gao	Mar 2005	B1
7020605	Gao	Mar 2006	B2
7110942	Thyssen	Sep 2006	B2
7222070	Stachurski	May 2007	B1
7426466	Ananthapadmanabhan et al.	Sep 2008	B2
7478040	Thyssen	Jan 2009	B2
7529660	Nagai	Sep 2009	B2
7873511	Herre et al.	Jan 2011	B2
7933769	Bessette	Apr 2011	B2
7945055	Taleb	May 2011	B2
7979271	Bessette	Jul 2011	B2
8095362	Gao	Jan 2012	B2
RE43191	Arslan et al.	Feb 2012	E
8332213	Gournay	Dec 2012	B2
8554548	Ehara	Oct 2013	B2
8682652	Herre	Mar 2014	B2
8738385	Chen	May 2014	B2
9031834	Coorman	May 2015	B2
9070364	Oh	Jun 2015	B2
20020016161	Dellien	Feb 2002	A1
20030004711	Koishida	Jan 2003	A1
20030088405	Chen	May 2003	A1
20030088406	Chen	May 2003	A1
20040068399	Ding	Apr 2004	A1
20050004793	Ojala	Jan 2005	A1
20050091046	Thyssen	Apr 2005	A1
20050165603	Bessette	Jul 2005	A1
20050246164	Ojala	Nov 2005	A1
20050267742	Makinen	Dec 2005	A1
20060047522	Ojanpera	Mar 2006	A1
20060167683	Hoerich	Jul 2006	A1
20060182091	Park	Aug 2006	A1
20060190247	Lindblom	Aug 2006	A1
20060271354	Sun	Nov 2006	A1
20060293902	Kim	Dec 2006	A1
20070147518	Bessette	Jun 2007	A1
20070174051	Oh	Jul 2007	A1
20070225971	Bessette	Sep 2007	A1
20070282603	Bessette	Dec 2007	A1
20080004869	Herre	Jan 2008	A1
20080279388	Oh	Nov 2008	A1
20090012797	Boehm	Jan 2009	A1
20090016426	Sato	Jan 2009	A1
20090022261	Suhling	Jan 2009	A1
20090043574	Gao	Feb 2009	A1
20090044231	Oh	Feb 2009	A1
20090046815	Oh	Feb 2009	A1
20090055196	Oh	Feb 2009	A1
20090067642	Buck	Mar 2009	A1
20090110201	Kim	Apr 2009	A1
20090210234	Sung	Aug 2009	A1
20090210237	Shen	Aug 2009	A1
20090216541	Oh et al.	Aug 2009	A1
20090240504	Pang et al.	Sep 2009	A1
20090265168	Kang et al.	Oct 2009	A1
20090299757	Guo	Dec 2009	A1
20090313009	Kovesi	Dec 2009	A1
20090319264	Yoshida	Dec 2009	A1
20100017200	Oshikiri	Jan 2010	A1
20100098199	Oshikiri	Apr 2010	A1
20100217607	Neuendorf	Aug 2010	A1
20100241427	Kovesi	Sep 2010	A1
20100262420	Herre	Oct 2010	A1
20100268542	Kim	Oct 2010	A1
20110076968	Seshadri	Mar 2011	A1
20110173011	Geiger	Jul 2011	A1
20110282656	Grancharov et al.	Nov 2011	A1
20110320196	Choo et al.	Dec 2011	A1
20120101824	Chen	Apr 2012	A1
20160021383	Wittmann	Jan 2016	A1
20200042415	Maddukuri	Feb 2020	A1

Foreign Referenced Citations (64)

Number	Date	Country
2094780	Oct 1994	CA
1104010	Jun 1995	CN
1567205	Jan 2005	CN
1873778	Dec 2006	CN
101145343	Mar 2008	CN
101256771	Sep 2008	CN
101617362	Dec 2009	CN
1262956	Apr 2005	EP
1747556	Jan 2007	EP
1990799	Nov 2008	EP
2096629	Sep 2009	EP
2128858	Dec 2009	EP
H09-46268	Feb 1997	JP
H09-50298	Feb 1997	JP
H09-81192	Mar 1997	JP
H09-261184	Oct 1997	JP
H09-326772	Dec 1997	JP
H10-143195	May 1998	JP
11-003099	Jan 1999	JP
H11-045100	Feb 1999	JP
2000-206999	Jul 2000	JP
2001-147700	May 2001	JP
2001-249700	Sep 2001	JP
2001-513916	Sep 2001	JP
2002-149200	May 2002	JP
2002-517022	Jun 2002	JP
2003-186487	Jul 2003	JP
2007-525707	Sep 2007	JP
2010-520503	Jun 2010	JP
2010-520505	Jun 2010	JP
2012-505423	Mar 2012	JP
2013-533983	Aug 2013	JP
2262748	Oct 2005	RU
2319223	Mar 2008	RU
2339088	Nov 2008	RU
2376656	Dec 2009	RU
2390856	May 2010	RU
2008146294	May 2010	RU
9528699	Oct 1995	WO
1995028699	Oct 1995	WO
9731367	Aug 1997	WO
9938155	Jul 1999	WO
1999038155	Jul 1999	WO
2005081230	Sep 2005	WO
2005081231	Sep 2005	WO
2005104095	Nov 2005	WO
2005111567	Nov 2005	WO
2005112004	Nov 2005	WO
2007055507	May 2007	WO
2007086646	Aug 2007	WO
2007142434	Dec 2007	WO
2008071353	Jun 2008	WO
2008072701	Jun 2008	WO
2008072913	Jun 2008	WO
2008082133	Jul 2008	WO
2008086920	Jul 2008	WO
2008104663	Sep 2008	WO
2008151755	Dec 2008	WO
2009022193	Feb 2009	WO
2009100768	Aug 2009	WO
2009114656	Sep 2009	WO
2010003532	Jan 2010	WO
2010040522	Apr 2010	WO
2011048117	Apr 2011	WO

Non-Patent Literature Citations (24)

Entry
ETSI, Digital cellular telecommunications system (Phase 2+), 3GPP TS 26.290 version 6.3.0 Release 6, 200506, p. 57-59, Jan. 2019.
Anonymous: “Study on ISO/IEC 23003-3:201X/CD of Unified Speech and Audio Coding” MPEG Meeting Motion Picture Expert Group or ISO/IEC JTC1/SC29/WG11 Nov. 16, 2010.
Bessette, B. et al. “A Wideband Speech and Audio Codec at 16/24/32 kbit/s Using Hybrid ACELP/TCX Techniques” 1999 IEEE Workshop on Speech Coding Proceedings, pp. 7-9.
Bessette, B. et al. “Universal Speech/Audio Coding Using Hybrid ACELP/TCX Techniques” ICASSP 2005 International Conference on IEEE, Mar. 18-23, 2005, vol. 3.
Chen, J.H. et al. “Adaptive Postfiltering for Quality Enhancement of Coded Speech” IEEE Transactions on Speech and Audio Processing, vol. 3, No. 1, Jan. 1995.
Ghitza, O. et al. “Scalar LPC Quantization Based on Formant JND's” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 34, Issue 4, pp. 697-708, published in Aug. 1986.
Grancharov, V et al. “Noise-Dependent Postfiltering” IEEE International Conference on Acoustics, Speech, and Signal Processing, May 17-21, 2004, pages I-457-460, vol. 1.
Labonte, Francis, “Etude, Optimisation et Implementation d'un Quantificateur Vectoriel Algebrique Encastre Dans Un Codeur Audio Hybride ACELP/TCX” 2003, Corporate Source Institution.
Lecomte, J. et al. “An Improved Low Complexity AMR-WB+ Encoder Using Neural Networks for Mode Selection” AES Convention Oct. 2007.
Neuendorf, Max, “WD7 of USAC” MPEG Meeting Apr. 19-23, 2010.
Ojanpera, J. et al. “Long term predictor for transform domain perceptual audio coding,” in Proceedings of the 107th AES Convention, New York, NY, USA, Sep. 1999.
Ramprashad, Sean A. “The Multimode Transform Predictive Coding Paradigm” IEEE Transactions on Speech and Audio Processing, vol. 11, No. 2, Mar. 2003.
Resch, B. et al. “Finalization of CE on an Improved Basspost Filter Operation for the ACELP of USAC” ISO/IEC JTC1/SC29/WG11, MPEG 2010, Oct. 2010, Guangzhou, China.
Resch, B. et al. “CE Proposal on Improved Bass-Post Filter Operation for the ACELP of USAC” MPEG Meeting Jul. 26-30, 2010, Geneva.
Schroeder, R. et al. “Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates” ICASSP 1985, Apr. 1985, vol. 10, pp. 937-940.
Song, J. et al. “Harmonic enhancement in low bitrate audio coding using an efficient long-term predictor,” EURASIP Journal on Advances in Signal Processing, Received Feb. 8, 2010; Published Aug. 15, 2010.
ETSI TS 126 071 v5.0.0 (Jun. 2002), Universal Mobile Telecommunications System (UMTS); AMR Speech Codec; General Description.
ETSI TS 126 090 v5.0.0 (Jun. 2002) Universal Mobile Telecommunications System (UMTS); AMR Speech Codec; Transcoding Functions (3GPP TS 26.090 version 5.0.0 Release 5).
ETSI TS 126 290 v10.0.0 (Apr. 2011) Digital Cellular Telecommunications System (Phase 2+); Universal Mobile Telecommunications System (UMTS); LTE; Audio Codec Processing Functions; Extended Adaptive Multi-Rate-Wideband (AMR-WB+) codec; Transcoding Functions.
ETSI TS 132 612 “Digital Cellular Telecommunications System (Phase 2+)” 3GPP TS 26.290 version 6.3.0, pp. 57-59 (Jun. 2005).
ITU-T G.722.2 Wideband Coding of Speech at Around 16 kbit/s Using Adaptive Multi-Rate Wideband (AMR-WB) Series G: Transmission Systems and Media, Digital Systems and Networks, Digital Terminal Equipments—Coding of Analogue Signals by Methods other than PCM (Jul. 2003).
Song, J. et al. “Enhanced Long-Term Predictor for Unified Speech and Audio Coding” IEEE International Conference on Acoustics, Speech and Signal processing, Prague, 2011, pp. 505-508.
3GPP TS 26.290 v8.0.0 (Dec. 2008), 3rd Generation Partnership Project; Technical Specification Group Service and System Aspects; Audio Codec Processing Functions; Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec; Transcoding functions (Release 8), 84 pages.
ISO/IEC 14496-3:2009(E) “Information Technology—Coding of Audio-Visual Objects” Part 3: Audio, Fourth Edition, Sep. 1, 2009, 1178 pages.

Related Publications (1)

	Number	Date	Country
	20230282222 A1	Sep 2023	US

Provisional Applications (1)

	Number	Date	Country
	61361237	Jul 2010	US

Divisions (1)

	Number	Date	Country
Parent	15086409	Mar 2016	US
Child	15792589		US

Continuations (6)

	Number	Date	Country
Parent	17532775	Nov 2021	US
Child	18185691		US
Parent	17073228	Oct 2020	US
Child	17532775		US
Parent	16351133	Mar 2019	US
Child	17073228		US
Parent	15792589	Oct 2017	US
Child	16351133		US
Parent	14936408	Nov 2015	US
Child	15086409		US
Parent	13703875		US
Child	14936408		US
