The present disclosure relates to the field of coding/decoding digital signals.
The disclosure advantageously applies to coding/decoding sounds which may contain speech and music, either mixed together or alternating.
In order to efficiently code the speech sounds at low rate, CELP type techniques (“Code Excited Linear Prediction”) are recommended. In order to efficiently code the music sounds, transform coding techniques are recommended instead.
CELP type coders are predictive coders. Their objective is to model speech production from various elements: a short-term linear prediction to model the vocal tract, a long-term prediction to model the vocal cord vibration during a voiced period, and an excitation derived from a fixed dictionary (white noise, algebraic excitation) to represent the “innovation” which could not be modeled.
The transform coders such as MPEG AAC, AAC-LD, AAC-ELD, or ITU-T G.722.1 Annex C, for example, use critical sampling transforms in order to compact the signal in the transform domain. “Critical sampling transform” refers to a transform for which the number of coefficients in the transformed domain is equal to the number of time samples in each analyzed frame.
A solution for efficiently coding a signal with mixed speech/music content consists in selecting over time the best technique between at least two coding modes, one of CELP type, the other of transform type.
This is the case, for example, of the 3GPP AMR-WB+ and MPEG USAC (“Unified Speech Audio Coding”) codecs. The applications targeted by AMR-WB+ and USAC are not conversational; they correspond to storage and distribution services, without strong constraints on the algorithmic delay.
The initial version of the USAC codec, called RM0 (Reference Model 0), is described in the paper by M. Neuendorf et al., “A Novel Scheme for Low Bitrate Unified Speech and Audio Coding – MPEG RM0”, 7-10 May 2009, 126th AES Convention. This RM0 codec alternates between several coding modes:
In the USAC codec, the transitions between LPD and FD modes are crucial for ensuring a sufficient quality without switching flaws, knowing that each mode (ACELP, TCX, FD) has a specific “signature” (in terms of artefacts) and that the FD and LPD modes are different in nature: the FD mode is based on transform coding in the signal domain, whereas the LPD modes use linear predictive coding in the perceptual weighted domain, with filter memories that must be managed correctly. Intermodal switching management in the USAC RM0 codec is detailed in the paper by J. Lecomte et al., “Efficient cross-fade windows for transitions between LPC-based and non-LPC based audio coding”, 7-10 May 2009, 126th AES Convention. As explained in this paper, the main difficulty lies in the transitions from LPD to FD modes and vice versa. Only the case of CELP to FD transitions is considered here.
In order to fully understand how it works, the principle of MDCT transform coding is recalled through a typical example of implementation.
At the coder the MDCT transformation is typically divided into three steps, the signal being split into frames of M samples prior to MDCT coding:
The MDCT window is divided into four adjacent portions with equal lengths M/2, here called “quarters”.
The signal is multiplied by the analysis window, and then aliasing is performed: the first quarter (windowed) is aliased (i.e. time reversed and overlapped) on the second quarter and the fourth quarter is aliased on the third.
More precisely, time-domain aliasing a quarter on another is performed in the following way: the first sample of the first quarter is added to (or subtracted from) the last sample of the second quarter, the second sample of the first quarter is added to (or subtracted from) the penultimate sample of the second quarter, and so on, until the last sample of the first quarter which is added to (or subtracted from) the first sample of the second quarter.
Therefore, from four quarters, 2 aliased quarters are obtained where each sample is the result of a linear combination of 2 signal samples to code. This linear combination induces time-domain aliasing.
The 2 aliased quarters are then jointly coded after the DCT transformation (of type IV). For the following frame there is a shift by half of a window (i.e. 50% of overlap), the third and fourth quarters of the preceding frame then become the first and the second quarter of the current frame. After aliasing, a second linear combination of the same sample pairs is sent like in the preceding frame, but with different weights.
At the decoder, after the DCT transformation, the decoded version of these aliased signals is therefore obtained. Two consecutive frames contain the result for two different aliasing events of the same quarters, i.e. for each sample pair there is the result for two linear combinations with different but known weights: an equation system may therefore be solved to obtain the decoded version of the input signal, time-domain aliasing may thus be eliminated using two consecutive decoded frames.
Solving the above-mentioned equation systems is generally performed implicitly by opening, multiplying by a judiciously chosen synthesis window, and then overlapping and adding the common parts of 2 consecutive decoded frames (which avoids discontinuities due to quantization errors); this operation is known as overlap-add. When the window is zero for every sample of the first quarter or of the fourth quarter, the MDCT transformation is said to be without time-domain aliasing in this part of the window. In this case the smooth transition is not ensured by the MDCT transformation and must be provided by other means, for example an external overlap-add.
It should be noted that there are variants of implementations for the MDCT transformation, in particular in the definition of the DCT transform and in the way the block to transform is time-domain aliased (for example, signs applied to the quarters aliased left and right may be reversed, or the second and the third quarter may be aliased on the first and fourth quarters respectively), etc. These variants do not change the MDCT analysis-synthesis principle with sample block reduction by windowing, time-domain aliasing, then transforming, and finally windowing, aliasing, and overlap-adding.
In the case of the USAC RM0 coder described in the paper by Lecomte et al., the transition between a frame coded by ACELP coding and a frame coded by FD coding is performed in the following way:
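The analysis-synthesis principle recalled above can be illustrated with a minimal sketch. It is not the implementation of any particular codec: it uses the direct MDCT formula rather than an explicit fold followed by a type-IV DCT, it assumes a sine window for both analysis and synthesis, and the function names (sine_window, mdct, imdct, analysis_synthesis) are hypothetical. It shows that a single decoded frame still contains time-domain aliasing, and that overlap-adding consecutive frames removes it.

```python
import math

def sine_window(M):
    """2M-sample sine window; it satisfies w[n]**2 + w[n + M]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def _basis(n, k, M):
    # MDCT cosine kernel; the phase term M/2 produces the folding of the outer quarters.
    return math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))

def mdct(block, window):
    """Analysis: 2M windowed samples -> M coefficients (critical sampling)."""
    M = len(block) // 2
    return [sum(window[n] * block[n] * _basis(n, k, M) for n in range(2 * M))
            for k in range(M)]

def imdct(coeffs, window):
    """Synthesis: M coefficients -> 2M windowed samples that still contain aliasing."""
    M = len(coeffs)
    return [window[n] * (2.0 / M) * sum(coeffs[k] * _basis(n, k, M) for k in range(M))
            for n in range(2 * M)]

def analysis_synthesis(x, M):
    """Code and decode x (length assumed to be a multiple of M) with 50% overlap."""
    window = sine_window(M)
    padded = [0.0] * M + list(x) + [0.0] * M
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * M + 1, M):
        frame = imdct(mdct(padded[start:start + 2 * M], window), window)
        for n, value in enumerate(frame):
            out[start + n] += value  # overlap-add cancels the aliasing terms
    return out[M:M + len(x)]

M = 8  # illustrative frame length, not a value from the text
x = [math.sin(0.3 * n) + 0.05 * n for n in range(4 * M)]
single = imdct(mdct(x[:2 * M], sine_window(M)), sine_window(M))
alias_error = max(abs(a - b) for a, b in zip(x[:2 * M], single))
recon_error = max(abs(a - b) for a, b in zip(x, analysis_synthesis(x, M)))
print(alias_error, recon_error)
```

In this sketch the error of a single decoded frame is large, whereas the error after overlap-add is at the level of floating-point rounding, since no quantization is applied.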
A transition window for the FD mode is used with an overlap to the left of 128 samples.
Time-domain aliasing on this overlap area is cancelled by introducing an artificial time-domain aliasing to the right of the reconstructed ACELP frame. The MDCT window used for the transition has a size of 2304 samples and the DCT transformation operates on 1152 samples, whereas normally the FD mode frames are coded with a window with a size of 2048 samples and a DCT transformation of 1024 samples. Thus the MDCT transformation of the normal FD mode is not directly usable for the transition window; the coder must also integrate a modified version of this transformation, which complicates the transition implementation for the FD mode.
This coding technique from the state-of-the-art has an algorithmic delay in the order of 100 to 200 ms. This delay is incompatible with conversational applications for which the coding delay is generally in the order of 20 to 25 ms for speech coders for the mobile applications (e.g. GSM EFR, 3GPP AMR and AMR-WB) and in the order of 40 ms for conversational transform coders for videoconferencing (for example ITU-T G.722.1 Annex C and G.719). Moreover, occasionally increasing the DCT transformation size (2304 vs 2048) causes a peak in complexity at the moment of transition.
To overcome these disadvantages, the international patent application WO2012/085451, the content of which is incorporated by reference in the present application, proposes a new method of coding a transition frame. The transition frame is defined as the transform coded current frame following a preceding frame coded by predictive coding. According to the above-mentioned new method, part of the transition frame, for example a sub-frame of 5 ms, in the case of CELP coding at 12.8 kHz, and two extra CELP sub-frames of 4 ms each, in the case of a CELP coding at 16 kHz, are coded by predictive coding restricted with respect to predictive coding the preceding frame.
The restricted predictive coding consists in using the stable parameters of the preceding frame coded by predictive coding, such as for example the linear prediction filter coefficients and in only coding some minimum parameters for the extra sub-frame in the transition frame.
As the preceding frame has not been coded with transform coding, cancelling time-domain aliasing in the first part of the frame is impossible. The above-mentioned patent application WO2012/085451 further proposes to modify the first half of the MDCT window so as to not have time-domain aliasing in the first quarter which is usually aliased. It is also proposed to integrate part of the overlap-add between the decoded CELP frame and the decoded MDCT frame by modifying the coefficients of the analysis/synthesis window. In reference to
At the coder, the transition window is null until the aliasing point. Thus the coefficients of the left part of the aliased window are identical to those of the non-aliased window. The part between the aliasing point and the end of this transition (TR) CELP sub-frame corresponds to a sinusoidal half-window. At the decoder, after opening, the same window is applied to the signal. On the segment between the aliasing point and the start of the MDCT frame, the window coefficients correspond to a sine window. To ensure the overlap-add between the decoded CELP sub-frame and the signal originating from the MDCT, it is only required to apply a cos² type window to the overlapping part of the CELP sub-frame and to sum the latter with the MDCT frame. The method achieves perfect reconstruction.
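The perfect-reconstruction property of this overlap can be checked with a short sketch. It rests on assumptions not spelled out in the text: the sine window applied at analysis and again at synthesis yields a sin² weight on the MDCT output, the window applied to the decoded CELP samples is a squared cosine with the same phase, and both decoders are taken to be error-free. The names overlap_weights and cross_fade are hypothetical.

```python
import math

def overlap_weights(length):
    """Return (sin^2, cos^2) weights over an overlap of `length` samples."""
    theta = [math.pi * (n + 0.5) / (2 * length) for n in range(length)]
    sin2 = [math.sin(t) ** 2 for t in theta]  # sine analysis window times sine synthesis window
    cos2 = [math.cos(t) ** 2 for t in theta]  # complementary weight for the CELP samples
    return sin2, cos2

def cross_fade(celp_tail, mdct_head):
    """Overlap-add of the decoded CELP tail and the MDCT output.

    mdct_head is assumed to be already weighted by sin^2 (analysis and
    synthesis windows); celp_tail is the unweighted decoded CELP signal.
    """
    _, cos2 = overlap_weights(len(celp_tail))
    return [c * wc + m for c, wc, m in zip(celp_tail, cos2, mdct_head)]

# Illustrative signal over a 24-sample overlap (the length is arbitrary here).
signal = [math.cos(0.2 * n) for n in range(24)]
sin2, cos2 = overlap_weights(len(signal))
mdct_head = [s * w for s, w in zip(signal, sin2)]
blended = cross_fade(signal, mdct_head)
max_error = max(abs(a - b) for a, b in zip(signal, blended))
print(max_error)
```

Because sin² + cos² = 1 at every sample, the two weighted contributions add up to the original signal in the absence of quantization.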
However, the application WO2012/085451 provides allocating a bit budget Btrans for coding the CELP sub-frame corresponding to the required budget for CELP coding a classic frame, brought down to a single sub-frame. The remaining budget for transform coding the transition frame is then insufficient and might lead to a quality decrease at low rate.
The present disclosure improves the situation.
For this purpose, a first aspect of the disclosure relates to a method of determining a distribution of bits for coding a transition frame. This method is implemented in a coder/decoder for coding/decoding a digital signal. The transition frame is preceded by a predictive coded preceding frame and coding this transition frame comprises transform coding and predictive coding a single sub-frame of the transition frame. The method further comprises the following steps:
The predictive coding bit rate is thus capped by a maximum value. The number of bits allocated for predictive coding depends on this bit rate. Since a lower bit rate means fewer bits allocated for predictive coding, a minimum remaining budget for transform coding the transition frame is guaranteed.
Moreover, the number of bits allocated for predictive coding the sub-frame is optimized with respect to the transform coding bit rate. Indeed, if the bit rate for transform coding the transition frame is lower than the first predetermined value, the bit rate for predictive coding and the bit rate for transform coding are identical. The coherence of the signal is thereby improved, which simplifies the subsequent steps of coding (channel coding) and processing the received frames at the decoder.
In another embodiment, the coder/decoder comprises a first core working, for predictive coding/decoding a signal frame, at a first frequency, and a second core working, for predictive coding/decoding a signal frame, at a second frequency. The first predetermined bit rate value depends on the core selected from the first and second cores for coding/decoding the predictive coded preceding frame.
The working frequency of the coder/decoder core has an influence on the number of bits required for correctly representing the input digital signal. For example, for some working frequencies, additional bits must be provided for coding frequency bands which are non-directly processed by the core.
In an embodiment, when the first core has been selected for coding/decoding the predictive coded preceding frame, the assigned bit rate is also equal to the maximum between the bit rate for the transform coded transition frame and a second predetermined bit rate value, the second value being lower than the first value. Thus, a minimum bit rate is guaranteed in order to prevent the rate differences between the different coded frames from being too large.
In another embodiment, the digital signal is decomposed into at least one frequency low band and one frequency high band. In this situation, the first calculated number of bits is assigned for predictive coding the transition frame for the frequency low band. A third predetermined number of bits is thus allocated for coding the transition sub-frame for the frequency high band. Moreover, the second number of bits allocated for transform coding the transition frame is then further determined from the third predetermined number of bits. Thus, it is possible to efficiently code the whole frequency spectrum of the input signal without sacrificing the quality of the signal restored upon decoding.
In an embodiment, the number of bits available for coding the transition frame is fixed. This reduces the complexity of the coding steps.
In another embodiment, the second number of bits is equal to the fixed number of bits for coding the transition frame minus the first number of bits minus the third number of bits. The final determination of the distribution of bits in the transition frame is thus limited to subtracting integer values, which simplifies coding.
Alternatively, the second number of bits is equal to the fixed number of bits for coding the transition frame minus the first number of bits minus the third number of bits minus a first bit minus a second bit. The first bit indicates whether a low-pass filtering is performed during the determination of predictive coding parameters for the transition sub-frame, the parameters being relative to the pitch lag. The second bit indicates the frequency used by the coder/decoder core for predictive coding/decoding the transition sub-frame. Such indication allows more flexible coding.
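The two variants of this subtraction can be written as a short sketch. The function name transform_bits, its parameter names, and the numeric values in the example are hypothetical; only the arithmetic (fixed total minus first number, minus third number, optionally minus two signalling bits) comes from the description above.

```python
def transform_bits(total_bits, first_bits, third_bits,
                   lowpass_flag_bit=False, core_flag_bit=False):
    """Second number of bits, left for transform coding the transition frame.

    total_bits: fixed number of bits for the whole transition frame
    first_bits: bits for predictive coding the sub-frame (low band)
    third_bits: predetermined bits for the high band of the sub-frame
    lowpass_flag_bit / core_flag_bit: reserve one bit each when signalled
    """
    remaining = total_bits - first_bits - third_bits
    remaining -= int(lowpass_flag_bit) + int(core_flag_bit)
    if remaining < 0:
        raise ValueError("predictive coding exceeds the frame budget")
    return remaining

# Illustrative values only (not taken from the text, apart from the arithmetic).
plain = transform_bits(488, 88, 6)
with_flags = transform_bits(488, 88, 6, lowpass_flag_bit=True, core_flag_bit=True)
print(plain, with_flags)
```

Since every term is an integer known to both coder and decoder, the same subtraction can be repeated on the decoding side.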
A second aspect of the disclosure relates to a method of coding a digital signal in a coder able to code signal frames according to predictive coding or according to transform coding, comprising the following steps:
The distribution of bits in the transition frame is thus determined prior to coding. As described below, determining the distribution of bits is reproducible at the decoder, which avoids an explicit transmission of information about this distribution.
Moreover, this coding guarantees a balanced distribution between predictive coding and transform coding within this transition frame.
In an embodiment, predictive coding comprises generating determined predictive coding parameters for the bit rate assigned during the distribution of bits in the transition frame. The use of such predictive parameters allows optimising the ratio between the bit rate assigned for predictive coding and the rate remaining assigned for transform coding, and therefore optimizing the quality of the reconstructed signal. Indeed, at constant quality, the number of bits attributed for this predictive parameter or another may vary in non-linear proportions with respect to the bit rate assigned for predictive coding.
In another embodiment, predictive coding comprises generating predictive coding parameters restricted with respect to predictive coding the preceding frame by reusing at least one predictive coding parameter of the preceding frame. Thus, upon decoding, additional information is extracted from the preceding frame to complete decoding the transition sub-frame to decode. This reduces the number of bits that must be reserved for predictive coding the transition sub-frame.
The combination of reusing parameters from a preceding frame and assigning the bit rate for transform coding the transition frame ensures a coherent transition at low cost.
A third aspect of the disclosure relates to a method of decoding a digital signal coded by predictive coding and transform coding, comprising the steps of:
As mentioned above, the method of determining the distribution of bits in the transition frame is directly reproducible at the decoder. Indeed, the distribution of bits is determined only from the bit rate from the transform coded part of the transition. Therefore, no extra bit is necessary for implementing the step of determining the distribution of bits and bandwidth savings are therefore made.
A fourth aspect of the disclosure further aims for a computer program comprising instructions for implementing the method according to the aspects of the disclosure described above, when these instructions are executed by a processor.
A fifth aspect of the disclosure relates to a device for determining a distribution of bits for coding a transition frame, this device being implemented in a coder/decoder for coding/decoding a digital signal, the transition frame being preceded by a predictive coded preceding frame, coding the transition frame comprising transform coding and predictive coding a single sub-frame of the transition frame, the number of bits for coding the transition frame being fixed, the device comprising a processor arranged for performing the following operations:
A sixth aspect of the disclosure further aims for a coder able to code frames for a digital signal according to predictive coding or according to transform coding, comprising:
A seventh aspect of the disclosure further aims for a decoder for a digital signal coded by predictive coding and transform coding, comprising:
Other features and advantages of the disclosure will appear upon examining the detailed description below, and the accompanying drawings on which:
The coder 100 comprises a receiving unit 101 for receiving, at step 201, an input signal sampled at a given frequency fs (for example 8, 16, 32, or 48 kHz) and decomposed into frames, for example of 20 ms.
Upon receiving a current frame, a pre-processing unit 102 is able to select, at step 202, the coding mode which is most adequate for coding the current frame, between at least one LPD mode and one FD mode. In the following description, it is considered, for illustrative purposes, that MDCT coding is used for the FD mode and that CELP coding is used for the LPD mode. There is no restriction on the coding techniques employed for the LPD and FD modes respectively. Thus, modes other than the CELP and MDCT modes may be used: for example, CELP coding may be replaced with another type of predictive coding, and the MDCT transform may be replaced with another type of transform.
It is assumed here that the frame type is explicitly transmitted via the block 206, with for example fixed length coding indicating the mode chosen from a predefined list. In variants of the disclosure, this coding for the mode chosen in each frame may be of variable length. It is also provided that the CELP coding type (12.8 or 16 kHz) may be explicitly transmitted through a bit so as to facilitate decoding the transition frame.
Step 203 verifies whether CELP coding has been selected at step 202. In cases where the LPD mode is selected, the signal frame is transmitted to a CELP coder 103 for coding a CELP frame at step 204. The CELP coder may use two “cores” working at two respective internal sampling frequencies, for example fixed at 12.8 kHz and 16 kHz, which requires re-sampling the input signal (at frequency fs) to an internal frequency of 12.8 or 16 kHz. Such re-sampling may be implemented in a re-sampling unit in the pre-processing block 102 or in the CELP coder 103. The frame is then predictive coded by the CELP coder 103 by deriving the CELP parameters, which generally depend on a signal categorisation. The CELP parameters typically include LPC coefficients, a fixed and adaptive gain vector, an adaptive dictionary vector, and a fixed dictionary vector. This list may also be modified based on a signal category in the frame, such as in ITU-T G.718 coding. The parameters thus calculated may then be quantized, multiplexed, and transmitted at step 206 to the decoder by a transmitting unit 108. The CELP coding parameters, such as the LPC coefficients, the fixed and adaptive gain vector, the adaptive dictionary vector, the fixed dictionary vector, and the CELP decoder states may further be stored, at step 205, in a memory 107 in cases where the frame following the current frame is an MDCT transition frame.
As explained below, a band extension may also be performed with coding associated with the high band when the current frame is of CELP type.
In cases where MDCT coding has been selected by the unit 102 at step 203, it is verified at step 207 that the frame preceding the current frame has been MDCT transform coded. In cases where the frame preceding the current frame has been MDCT transform coded, the current frame is transmitted to the MDCT coder 105 directly, for MDCT transform coding the current frame at step 208. The MDCT coder may code a frame covering 28.75 ms of non-re-sampled signal, including 20 ms of frame and 8.75 ms of lookahead for example. There is no restriction on the MDCT window size. Furthermore, a delay corresponding to the CELP coder delay due to the re-sampling of the input signal, is applied to the frame coded by the MDCT coder, in such a way that the MDCT and CELP frames are synchronized. Such delay at the coder may be of 0.9375 ms according to the re-sampling type before CELP coding. The MDCT transform coded frame is transmitted to the decoder at step 206.
In cases where MDCT coding is selected by the unit 102, and in cases where the frame preceding the current frame has been predictive coded, the current frame is a transition frame and is transmitted to a transition unit 104. As described in the following, the MDCT transition frame comprises an extra CELP sub-frame.
The transition unit 104 is able to implement the following steps:
In an embodiment, at least one of these steps is performed by the transition frame coding unit 106 described below.
The transition MDCT frame is coded by the MDCT coder 105, at step 212, as described in the following, and based on a budget of bits allocated at step 209. The extra CELP sub-frame is also coded by a CELP coder 103, at step 213, as described in the following in reference to
A frame to code 301 is received at the coder 100 and is coded by the CELP coder 103. A current frame 302 is then received at the input of the coder 100 to be MDCT transform coded. It is thus a transition frame. The following frame 303 received at the input of the coder is also MDCT transform coded. According to the disclosure, the following frame 303 may alternatively be coded by CELP coding, and there is no restriction on the coding used for the following frame 303.
An asymmetric MDCT window 304 may be used for coding the current frame. This window 304 shows a rising edge 307 of 14.375 ms, a level with a gain at 1 of 11.25 ms, a falling edge 309 of 8.75 ms corresponding to the lookahead, and a null part 310 of 5.625 ms. Adding the null part 310 allows reducing the lookahead and thus the corresponding delay. In an embodiment, the form of this MDCT analysis window for MDCT coding is modified for example for further reducing the lookahead or for using a symmetric window with examples given in the patent application WO2012/085451.
The dashed lines 312 represent the middle of the MDCT window 304. On both sides of the line 312, the 10 ms quarters of the MDCT window 304 are aliased as described in the introductory part. The continuous line 311 indicates the aliasing area between the first and second quarters of the MDCT window 304. The MDCT window of the following frame 303 is referenced 306 and shows an overlap-add area with the MDCT window 304 corresponding to the falling edge 309 of the MDCT window 304.
An MDCT window 305 theoretically represents the window which would have been applied to the preceding frame if it had been MDCT transform coded. However, the preceding frame 301 being coded by the CELP coder 103, it is necessary, in order to allow opening of the first part of the MDCT transform coded frame at the decoder, that the window is null in the first quarter (since the second part of the preceding MDCT frame is not available).
For this purpose, the MDCT window 304 is modified into an MDCT window 313 having a first quarter at zero, so that the first part of the MDCT frame can be opened at the decoder without time-domain aliasing.
At the decoder, the analysis windows 304, 305, 306, and 313 correspond to synthesis windows 324, 325, 326, and 327 respectively. Each synthesis window is time-reversed with respect to the corresponding analysis window. In variants of the disclosure, the analysis and synthesis windows may be identical, of sinusoidal type or other.
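As a consistency check on these durations, the sketch below derives the null part as the remainder of a 40 ms window (four 10 ms quarters, as stated for window 304) and converts each segment to a sample count. The variable and function names are hypothetical, and the sampling frequencies passed in are the example values of fs given for the input signal.

```python
WINDOW_MS = 40.0    # four quarters of 10 ms each
RISING_MS = 14.375  # rising edge 307
FLAT_MS = 11.25     # level with a gain at 1
FALLING_MS = 8.75   # falling edge 309, equal to the lookahead

# The null part 310 is whatever remains of the window.
null_ms = WINDOW_MS - (RISING_MS + FLAT_MS + FALLING_MS)

def segment_samples(fs_hz):
    """Length of each window segment in samples at sampling frequency fs_hz."""
    def to_samples(ms):
        return round(ms * fs_hz / 1000)
    return {
        "rising": to_samples(RISING_MS),
        "flat": to_samples(FLAT_MS),
        "falling": to_samples(FALLING_MS),
        "null": to_samples(null_ms),
    }

segments_16k = segment_samples(16000)
segments_48k = segment_samples(48000)
print(null_ms, segments_16k, segments_48k)
```

All four durations are multiples of 1/16 ms, so they map to whole numbers of samples at 16 kHz and 48 kHz.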
A first frame 320 of new samples coded by CELP coding is received at the decoder. It corresponds to the coded version of this CELP frame 301. It is here recalled that the decoded frame is shifted by 8.75 ms with respect to the frame 320.
The coded version of the transition frame 302 is then received (references 321 and 322 forming a complete frame). Between the end of the CELP frame 320 and the start of the rising edge of the synthesis window 327 (corresponding to the aliasing line), a gap is created. In the particular example represented here, a quarter of the MDCT window being 10 ms and the null part of the MDCT synthesis window 324 covering this CELP frame 320 being 5.625 ms (corresponding to the part 310 of the MDCT analysis window 304), the gap is 4.375 ms. Furthermore, to ensure a satisfying overlap-add length with the start of the non-null part of the MDCT window 327, the delay between this CELP frame 320 and the start of the MDCT window 327 is prolonged to the required length. In the following example, for illustrative purposes, a satisfying overlap-add length of 1.875 ms is considered, the above-mentioned delay (corresponding to a missing signal length) thus being brought up to 6.25 ms, as represented by reference 321 on
It should be noted that the signal frames represented on
As previously mentioned, the application WO2012/085451 proposes to code an extra CELP sub-frame of 5 ms at the start of the MDCT transition frame, in cases of 12.8 kHz CELP coding, and two extra CELP sub-frames of 4 ms each at the start of the MDCT transition frame, in cases of 16 kHz CELP coding.
In cases of 12.8 kHz, the 6.25 ms delay is not filled and the overlap-add is affected: there are only 0.625 ms of overlap-add at the decoder, which is insufficient.
In the case of 16 kHz, two extra CELP sub-frames are coded at the start of the transition frame, which leaves very little budget for coding the transition MDCT frame and may lead to a significant quality decrease at low rate.
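The timing figures of this example can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and the values are the 10 ms quarter, the 5.625 ms null part, the 1.875 ms target overlap, and the sub-frame durations of WO2012/085451 quoted above.

```python
QUARTER_MS = 10.0          # one quarter of the MDCT window
NULL_PART_MS = 5.625       # null part of the window covering the CELP frame
TARGET_OVERLAP_MS = 1.875  # overlap-add length considered satisfying

# Gap between the end of the CELP frame and the aliasing point.
gap_ms = QUARTER_MS - NULL_PART_MS
# Signal that must be produced by other means to fill the gap and overlap.
missing_ms = gap_ms + TARGET_OVERLAP_MS

def overlap_left_ms(extra_celp_ms):
    """Overlap-add length remaining once the gap has been filled."""
    return extra_celp_ms - gap_ms

overlap_12k8 = overlap_left_ms(5.0)     # one 5 ms sub-frame at 12.8 kHz
overlap_16k = overlap_left_ms(2 * 4.0)  # two 4 ms sub-frames at 16 kHz
print(gap_ms, missing_ms, overlap_12k8, overlap_16k)
```

With a single 5 ms sub-frame the remaining overlap is below the 1.875 ms target, while two 4 ms sub-frames exceed it at the price of coding twice as many CELP parameters.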
In order to overcome these disadvantages, the present disclosure may provide coding a single extra CELP sub-frame at 12.8 or 16 kHz by the CELP coder 103. The extra samples are generated at the decoder, as detailed in the following, in order to generate the missing signal on the above-mentioned 6.25 ms length.
In order to code the transition CELP sub-frame, the unit 106 may reuse at least one CELP parameter of the preceding CELP frame. For example, the unit 106 may reuse the linear prediction coefficient A(z) of the preceding CELP sub-frame as well as the energy from the preceding frame innovation (stored in the memory 107 such as previously described) in order to code only the adaptive dictionary vector, the adaptive gain, the fixed gain, and the fixed dictionary vector of the transition CELP sub-frame. Thus, the extra CELP sub-frame may be coded with the same core (12.8 kHz or 16 kHz) as the preceding CELP frame.
A transition frame coding unit 106 ensures coding the transition frame according to the disclosure. The disclosure may further provide the insertion by the unit 106 in the bit flow of an extra bit indicating that the coded frame 322 is a transition frame. In general, however, this transition frame indication may also be transmitted in the global indication of the current frame coding mode, without taking extra bits.
The disclosure may further provide that this unit 106 codes the signal high band at steps 204 and 214 (method of so-called “band extension”), when the latter is required, with a fixed budget since the sampling frequency of the synthesis signal at the decoder is not necessarily identical to the CELP core frequency.
For this purpose, the coding unit of the transition frame 106 may implement the following steps:
The above-mentioned step 209 is illustrated in more details in reference to
At step 400, the total rate (in bit/s), noted core_brate, which may be allocated to coding the current frame is fixed as being equal to the output rate of the MDCT coder. The duration of the frame being considered in this example as 20 ms, the number of frames per second is 50 and the total budget in bits is equal to core_brate/50. The total budget may be fixed, in the case of a fixed rate coder, or variable, in the case of a variable rate coder when adapting to the coding rate is implemented. In the following, a num_bits variable is used, initialised at the value of core_brate/50.
At step 401, the transition unit 104 determines the CELP core, from at least two CELP cores, which has been used for coding this preceding CELP frame. In the following example, two CELP cores are considered, working at frequencies of 12.8 kHz and 16 kHz respectively. Alternatively, a single CELP core is implemented upon coding and/or upon decoding.
In the case where the CELP core used for the preceding CELP frame has a 12.8 kHz frequency, the method comprises a step 402 of assigning a bit rate, labelled cbrate, for CELP coding the transition sub-frame, the bit rate being equal to the minimum between the bit rate for MDCT coding of the transition frame and a first predetermined bit rate value. The first predetermined value may be fixed at 24.4 kbit/s for example, which ensures a satisfying bit budget for transform coding.
Thus, cbrate=min(core_brate, 24400). This limitation is equivalent to limiting the operation of the restricted CELP coding of the extra sub-frame so that the CELP parameters are coded as if by CELP coding at 24.40 kbit/s at most.
At an optional step 403, the assigned bit rate is compared to a 11.60 kbit/s CELP bit rate. If the assigned bit rate is higher, a bit may be reserved for coding a bit indication for low pass filtering the adaptive dictionary (such as for example for AMR-WB coding at rates higher or equal to 12.65 kbit/s). The num_bits variable is updated:
num_bits:=num_bits−1
At step 404, a first number of bits, labelled budg1, is allocated for predictive coding the additional CELP sub-frame. The first number of bits budg1 represents the number of bits representing the CELP parameters used for coding the CELP sub-frame. As previously detailed, coding the CELP sub-frame may be restricted in that a restricted number of CELP parameters is used, some parameters used for coding the preceding CELP frame being reused advantageously.
For example, only the excitation may be modelled for coding the extra CELP sub-frame, and bits are thus reserved only for the fixed dictionary vector, for the adaptive dictionary vector, and for the gain vector. The number of bits attributed to each of these parameters is deduced from the bit rate assigned for coding this extra CELP sub-frame at step 402. For example, Table 1/G722.2—Distribution of bits for the AMR-WB coding algorithm for a 20 ms frame, originating from the July 2003 version of the G.722.2 of the ITU-T, gives examples of bit allocations by a CELP parameter depending on the assigned bit rate.
In the previous example, where coding the sub-frame is restricted, budgl corresponds to the sum of bits attributed to the adaptive dictionary, to the fixed dictionary, and to the gain vector respectively. For example, for an assigned bit rate of 19.85 kbit/s, by referring to the above-mentioned Tablel/G722, 9 bits are allocated to the fixed dictionary (tonal lead time), and 7 bits are allocated to the gain vector (directory gain). In this case, budg1 is equal to 88 bits.
The num_bits variable may thus be updated:
num_bits:=num_bits−budg1
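As a worked illustration of step 404 for the 19.85 kbit/s example, the sketch below sums a per-sub-frame split of 9 + 72 + 7 bits, which is the split consistent with the stated total of 88 bits. The dictionary keys and function names are hypothetical, and the starting value of 639 bits is an assumed example.

```python
# Assumed per-sub-frame split for the 19.85 kbit/s example (sums to the stated 88 bits).
BITS_19K85 = {"adaptive_dictionary": 9, "fixed_dictionary": 72, "gains": 7}


def celp_subframe_budget(allocation: dict) -> int:
    """budg1: total bits of the CELP parameters coded for the extra sub-frame."""
    return sum(allocation.values())


def spend(num_bits: int, budget: int) -> int:
    """Apply num_bits := num_bits - budget, refusing to overdraw the frame."""
    if budget > num_bits:
        raise ValueError("budget exceeds the remaining bits of the frame")
    return num_bits - budget


budg1 = celp_subframe_budget(BITS_19K85)
print(budg1, spend(639, budg1))
```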
The disclosure may also provide for taking the frame categories into account in the allocation of bits to the CELP parameters. For example, the ITU-T G.718 standard, in its June 2008 version, sections 6.8 and 8.1, gives the budgets to allocate to each CELP parameter depending on categories, or modes, such as the unvoiced mode (UC), the voiced mode (VC), the transition mode (TC), and the generic mode (GC), and depending on the allocated bit rate (layer 1 or layer 2, corresponding to rates of 8 kbit/s and 8+4 kbit/s, respectively). The G.718 coder is a hierarchical coder, but it is possible to combine the CELP coding principles using a G.718 categorisation with the multi-rate allocation of AMR-WB.
If it has been determined at step 401 that the CELP core used for the preceding CELP frame operates at 16 kHz, the method comprises a step 405 of assigning a bit rate, labelled cbrate, for CELP coding the transition sub-frame, the bit rate being equal to the minimum of the bit rate for MDCT coding of the transition frame and a first predetermined value of the bit rate. In the case of the 16 kHz core, the first predetermined value may be fixed at 22.6 kbit/s for example, which ensures a satisfactory bit budget for transform coding. Thus, the first predetermined value depends on the CELP core used for coding the preceding CELP frame. Furthermore, for the 16 kHz core, threshold values may be applied when assigning a bit rate to CELP coding. Thus, the assigned bit rate is further equal to the maximum of the bit rate for the transform coded transition frame and at least one second predetermined bit rate value, the second value being lower than the first value. The second predetermined value of the bit rate may for example be 14.8 kbit/s. Thus, if the bit rate for the transform coded transition frame is lower than 14.8 kbit/s, the bit rate assigned for CELP coding the transition sub-frame may be 14.8 kbit/s.
In a complementary embodiment, if the bit rate for the transform coded transition frame is lower than 8 kbit/s, the assigned rate may be 8 kbit/s.
Thus, according to this complementary embodiment, the following algorithm is obtained:
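One possible reading of the rate assignment for the 16 kHz core, including the complementary 8 kbit/s floor, is sketched below. The tiered structure, the treatment of the exact boundary values (strict "lower than" comparisons) and the function name are assumptions drawn from the wording above, not a reproduction of a reference implementation.

```python
CBRATE_CAP_16K = 22600    # bit/s, first predetermined value for the 16 kHz core
CBRATE_FLOOR_HI = 14800   # bit/s, second predetermined value
CBRATE_FLOOR_LO = 8000    # bit/s, floor of the complementary embodiment


def assign_cbrate_16k(core_bitrate: int) -> int:
    """Rate assigned to the CELP transition sub-frame for the 16 kHz core (sketch)."""
    if core_bitrate < CBRATE_FLOOR_LO:
        return CBRATE_FLOOR_LO
    if core_bitrate < CBRATE_FLOOR_HI:
        return CBRATE_FLOOR_HI
    return min(core_bitrate, CBRATE_CAP_16K)


for rate in (7200, 9600, 16400, 32000):
    print(rate, assign_cbrate_16k(rate))
```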
At an optional step 407, the assigned bit rate is compared to a CELP bit rate of 11.60 kbit/s. If the assigned bit rate is higher, one bit may be reserved for coding an indication of low-pass filtering of the adaptive dictionary. The num_bits variable is updated:
num_bits:=num_bits−1
At step 408, in the same manner as at step 404, a first number of bits budg1 is allocated for predictive coding the extra CELP sub-frame, and budg1 depends on the bit rate assigned for CELP coding the transition sub-frame.
At step 410, which is common to coding at the various core frequencies, a second number of bits, labelled budg2, allocated for transform coding the transition frame, is calculated from the first number of bits budg1 and the total number of bits of the transition frame. With the calculations above, budg2 is equal to the num_bits variable. Generally, the mode information of the current transition frame is assumed here to be charged to the MDCT coding budget; this information is thus not explicitly taken into account.
The preceding steps may have been implemented for coding a low-frequency band of the transition sub-frame, in cases where the audio signal is decomposed into at least one low-frequency band and one high-frequency band. At an optional step 409 preceding step 410, also common to coding at the different core frequencies, the method may comprise allocating a third predetermined number of bits, labelled budg3, for coding the high-frequency band of the transition sub-frame. In this case, the second number of bits budg2 is calculated both from the first number of bits budg1 and from the third number of bits budg3.
As previously explained, coding the high-frequency band (or extending the band) of the transition sub-frame may be based on a correlation between the preceding frame of the audio signal and the transition sub-frame. For example, coding the high-frequency band may be decomposed into two steps.
In the first step, the preceding frame and the current frame of the audio signal are filtered by a high-pass filter so as to keep only the higher part of the spectrum. The high part of the spectrum may correspond to frequencies higher than that of the CELP core used. For example, if the CELP core used is the 12.8 kHz CELP core, the high band corresponds to the audio signal from which the frequencies lower than 12.8 kHz have been filtered out. Such filtering may be implemented by means of an FIR filter.
In a second step, a correlation search is performed between the filtered parts of the preceding frame and of the current frame. Such a correlation search allows estimating a delay parameter and then a gain. The gain corresponds to the amplitude ratio between the filtered part of the current frame and the signal predicted by applying the delay.
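The second step can be illustrated by the sketch below, which is only one way of carrying out such a search. It assumes integer delays, a prediction taken entirely from the filtered preceding frame, a normalised cross-correlation as the selection criterion, and a gain computed as a ratio of RMS amplitudes; none of these choices, nor the function name, is specified in the description. The default of 64 candidate delays corresponds to a 6-bit delay index.

```python
import math


def search_delay_and_gain(prev_hb, cur_hb, num_delays=64):
    """Return (delay, gain) for the high band of the current sub-frame (sketch).

    prev_hb: high-pass filtered preceding frame; cur_hb: filtered current part.
    The prediction for a delay d is the segment of prev_hb located d samples back.
    """
    length, start = len(cur_hb), len(prev_hb)
    best_delay, best_score = None, -math.inf
    for delay in range(length, length + num_delays):
        if delay > start:
            break  # not enough past samples for this delay
        pred = prev_hb[start - delay:start - delay + length]
        energy = sum(p * p for p in pred)
        if energy == 0.0:
            continue
        score = sum(c * p for c, p in zip(cur_hb, pred)) / math.sqrt(energy)
        if score > best_score:
            best_delay, best_score = delay, score
    if best_delay is None:
        return None, 0.0
    pred = prev_hb[start - best_delay:start - best_delay + length]
    cur_energy = sum(c * c for c in cur_hb)
    gain = math.sqrt(cur_energy / sum(p * p for p in pred))  # amplitude ratio
    return best_delay, gain


# Assumed test signal: a sinusoid of period 20 samples, halved in the current part.
prev = [math.sin(2 * math.pi * n / 20) for n in range(80)]
cur = [0.5 * math.sin(2 * math.pi * (80 + n) / 20) for n in range(16)]
print(search_delay_and_gain(prev, cur))
```

In a coder, the delay and the gain found this way would then be quantised before transmission.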
For example, 6 bits may be allocated for the gain and 6 bits for the delay. The third number of bits budg3 is then equal to 12.
The num_bits variable may then be updated:
num_bits:=num_bits−budg3.
The second number of bits budg2 is then equal to the updated num_bits variable.
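The successive updates of num_bits leading to budg2 can be summarised by the sketch below. The function name and the example values (a 640-bit transition frame, the low-pass filtering bit reserved, budg1 of 88 bits and budg3 of 12 bits) are illustrative assumptions.

```python
def split_transition_budget(total_bits, budg1, lpf_flag_bit=False, budg3=0):
    """Return budg2, the bits left for transform coding the transition frame."""
    num_bits = total_bits
    if lpf_flag_bit:
        num_bits -= 1        # optional step 403 or 407
    num_bits -= budg1        # step 404 or 408: restricted CELP sub-frame
    num_bits -= budg3        # optional step 409: high band (gain and delay)
    if num_bits < 0:
        raise ValueError("CELP and high-band budgets exceed the frame size")
    return num_bits          # step 410: budg2


print(split_transition_budget(640, 88, lpf_flag_bit=True, budg3=12))
```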
The decoder 500 comprises a receiving unit 501 for receiving, at step 601, the coded digital signal (or bit flow) originating from the coder 500 of
At step 603, it is verified that the current frame is a transition frame.
If the current frame is not a transition frame, it is verified at step 604 whether the current frame is a CELP frame. If that is the case, the frame is transmitted to a CELP decoder 504 able to decode a CELP frame at step 605, with the core frequency indicated by the categorising unit 502. After decoding a CELP frame, the CELP decoder 504 may store, at step 606, in a memory 506, parameters such as the coefficients of the linear prediction filter A(z) and internal states such as the predictive energy, in case the following frame is a transition frame.
At the output of the CELP decoder 504, the signal may be re-sampled, at step 607, to the output frequency of the decoder 500 by a re-sampling unit 505. In an embodiment of the disclosure, the re-sampling unit comprises an FIR filter and re-sampling introduces a delay of (for example) 1.25 ms. In an embodiment, post-processing may be applied to CELP decoding before or after re-sampling.
As mentioned above, in an embodiment, the band may also be extended, by a band extension managing unit 5051 at steps 6071 and 6151, with decoding associated with the high band when the current frame is of CELP type. The high band is then combined with CELP coding, potentially with an additional delay applied to the low-band CELP synthesis.
The signal decoded by the CELP decoder and re-sampled, potentially post-processed before or after re-sampling, is transmitted to an output interface 510 of the decoder at step 608.
The decoder 500 further comprises an MDCT decoder 507. In cases where it has been determined at step 604 that the current frame is an MDCT frame, the MDCT decoder 507 is able to decode the MDCT frame in a conventional manner at step 609. Furthermore, a delay corresponding to the delay required for re-sampling the signal originating from the CELP decoder 504 is applied at the decoder output by a delay unit 508, so as to synchronise the MDCT synthesis with the CELP synthesis, at step 610. The signal decoded by MDCT and delayed is transmitted to the output interface 510 of the decoder at step 608.
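The role of the delay unit 508 can be illustrated by a simple delay line, sketched below. The class name is hypothetical, and the figure of 40 samples in the comment assumes a 32 kHz output frequency, which the description does not specify.

```python
class DelayLine:
    """Delay a frame-by-frame signal by a fixed number of samples (sketch)."""

    def __init__(self, delay_samples: int):
        # The memory initially holds silence; it always keeps delay_samples values.
        self._memory = [0.0] * delay_samples

    def process(self, frame):
        buffered = self._memory + list(frame)
        out = buffered[:len(frame)]            # oldest samples leave first
        self._memory = buffered[len(frame):]   # newest samples wait for the next call
        return out


# A 1.25 ms delay would be 40 samples at an assumed 32 kHz output rate.
line = DelayLine(2)
print(line.process([1, 2, 3]), line.process([4, 5, 6]))
```

Applying the same delay to every MDCT frame keeps the MDCT synthesis aligned with a CELP synthesis that has gone through the FIR re-sampling.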
In cases where the current frame is determined as being a transition frame after step 603, a device for determining the distribution of bits 503 is able to determine, at step 611, the first number of bits budg1 allocated for CELP coding the transition frame and the second number of bits budg2 allocated for transform coding the transition frame. The device 503 may correspond to the device 700 described in detail in reference to
The MDCT decoder 507 uses the second number of bits budg2 calculated by the determining unit 503 for adjusting the rate necessary for decoding the transition frame. The MDCT decoder 507 further zeroes the memory of the MDCT transformation and decodes the transition frame at step 612. The signal originating from the MDCT decoder is then delayed by the delay unit 508 at step 613.
In parallel, the CELP decoder 504 decodes the transition CELP sub-frame based on the first number of bits budg1, at step 614. For this purpose, the CELP decoder 504 decodes the CELP parameters, which may depend on the current frame category and comprise for example the pitch values of the adaptive dictionary, the fixed dictionary, and the gains of the CELP sub-frame, and uses the linear prediction filter coefficients. Furthermore, the CELP decoder 504 updates the CELP decoding states. These states may typically comprise the predictive energy of the innovation originating from the preceding CELP frame, for generating the signal sub-frame over 4 ms or 5 ms according to whether the 16 kHz or the 12.8 kHz CELP core is being used (in the case of restricted coding of the transition CELP sub-frame).
As previously mentioned, the application WO2012/085451 provides for coding one extra sub-frame of 5 ms for the 12.8 kHz CELP core and two extra sub-frames of 4 ms for the 16 kHz CELP core.
As explained in reference to
In the 16 kHz case, two extra CELP sub-frames are coded at the start of the transition frame, which leaves very little budget for coding the transition MDCT frame and may lead to a quality decrease with respect to MDCT coding at “full rate” in the current frame.
Thus, the solution of the international application WO2012/085451 is not satisfactory.
An independent aspect of the disclosure provides for partially generating a second sub-frame from a single extra transition CELP sub-frame, by reusing the coding parameters used for coding the transition CELP sub-frame. The delay is thus filled while ensuring a sufficient overlap-add and without affecting the MDCT coding rate of the transition frame.
For this purpose, the disclosure also relates to a method of decoding a coded digital signal P, in a decoder 500 able to decode the signal frames according to predictive decoding or according to transform decoding, comprising the following steps:
The disclosure further relates to the decoder 500 for implementing the method of decoding P, as well as to a computer program comprising instructions for performing the method of decoding P when these instructions are executed by a processor.
The CELP parameters reused for generating the second sub-frame may be the gain vector, the adaptive dictionary vector, and the fixed dictionary vector.
According to an embodiment of the method of decoding P, a minimum overlap value may be predefined for transform decoding and the number of generated samples from the second sub-frame is determined based on the minimum overlap value. This last sub-frame may be generated without extra information by prolonging the CELP synthesis by repeating the pitch prediction with the same pitch delay and the same adaptive dictionary gain as in the first sub-frame, and by performing a synthesis LPC filtering with the same LPC coefficients and a de-emphasis or de-accentuation.
The second CELP sub-frame may then be truncated so as to preserve only 1.25 ms of signal in the case of the 12.8 kHz CELP core, and 2.25 ms of signal in the case of the 16 kHz CELP core. The first CELP sub-frame (5 ms or 4 ms, respectively) is thus completed so as to have 6.25 ms of extra signal, which fills the gap and ensures a satisfactory overlap-add (minimum overlap value, for example of 1.875 ms) with the MDCT transition frame. In an embodiment, the extra CELP sub-frame has a length extended to 6.25 ms for the 12.8 and 16 kHz CELP cores, which implies modifying “normal” CELP coding in order to handle such an extended sub-frame length, in particular for the fixed dictionary.
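The generation of the truncated second sub-frame without extra information can be sketched as follows. This is a simplified illustration under several assumptions: an integer pitch delay, A(z) = 1 + a_1 z^-1 + ... + a_M z^-M with the coefficients passed as [a_1, ..., a_M], a first-order de-emphasis 1/(1 − β z^-1) with β = 0.68 as a default borrowed from AMR-WB, and hypothetical function and parameter names.

```python
def extend_celp_synthesis(exc_hist, syn_hist, pitch_delay, pitch_gain, lpc,
                          n_samples, deemph=0.68, deemph_mem=0.0):
    """Prolong the CELP synthesis by n_samples using pitch repetition (sketch).

    exc_hist: past excitation (at least pitch_delay samples).
    syn_hist: past synthesis before de-emphasis (at least len(lpc) samples).
    """
    if pitch_delay < 1 or len(exc_hist) < pitch_delay or len(syn_hist) < len(lpc):
        raise ValueError("not enough history for the requested extension")
    exc, syn, out = list(exc_hist), list(syn_hist), []
    prev = deemph_mem
    for _ in range(n_samples):
        # Pitch prediction repeated with the same delay and the same gain.
        e = pitch_gain * exc[-pitch_delay]
        exc.append(e)
        # LPC synthesis filtering 1/A(z) with the same coefficients.
        s = e - sum(a * syn[-k] for k, a in enumerate(lpc, start=1))
        syn.append(s)
        # De-emphasis.
        prev = s + deemph * prev
        out.append(prev)
    return out


print(extend_celp_synthesis([1.0, 2.0, 3.0, 4.0], [], 4, 1.0, [], 6, deemph=0.0))
```

With the durations given above, n_samples would be 16 for the 12.8 kHz core (1.25 ms) and 36 for the 16 kHz core (2.25 ms).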
In addition to the previous embodiment of the method of decoding P, the method P may further comprise a step 615 of re-sampling performed by a finite impulse response filter. As previously explained, the FIR filter may be integrated into the re-sampling unit 505. Re-sampling uses the FIR filter memory from the preceding CELP frame and processing induces an extra delay of 1.25 ms in this example.
The method P may further comprise a step of adding an additional signal, obtained from samples stored in the finite impulse response filter memory, to fill the delay introduced by the re-sampling step. Thus, 1.25 ms of signal, in addition to the 6.25 ms of additional signal previously generated, are generated by the decoder 500, these samples advantageously filling the delay introduced by re-sampling the 6.25 ms of additional signal.
For this purpose, the FIR filter memory of the re-sampling unit 505 may be saved for each frame after CELP decoding. The number of samples in this memory corresponds to 1.25 ms at the CELP core frequency considered (12.8 or 16 kHz).
According to a complementary embodiment of the method P, re-sampling the stored samples is performed by an interpolation method introducing a second delay, shorter than the first delay from the finite impulse response filter, which may be considered as null. Thus, the 1.25 ms of signal generated from the FIR filter memory are re-sampled according to a method involving a minimum delay. For example, re-sampling the 1.25 ms of signal generated from the FIR filter memory may be performed by cubic interpolation, which involves a delay of only two samples, a minimal delay compared to the delay from the FIR filter. Thus, two extra signal samples are required for re-sampling the above-mentioned 1.25 ms of signal: these two extra samples may be obtained by repeating the last value of the re-sampling memory of the FIR filter.
The decoder may further decode the high-frequency part from the 6.25 ms of CELP signal obtained from the first and second transition sub-frames. For this purpose, the CELP decoder 504 may use the adaptive gain and the fixed dictionary vector from the last sub-frame of the preceding CELP frame.
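A minimal sketch of such a low-delay re-sampling is given below. The description only states that cubic interpolation is used and that the last value of the memory is repeated twice; the Catmull-Rom form of the cubic, the repetition of the first sample at the leading edge, the 32 kHz output rate in the example and the function name are assumptions.

```python
def resample_cubic(memory, fs_in, fs_out):
    """Re-sample the stored samples by 4-point cubic interpolation (sketch)."""
    if not memory:
        return []
    # One repeated sample in front (assumption), two behind (last value repeated).
    x = [memory[0]] + list(memory) + [memory[-1]] * 2
    n_out = len(memory) * fs_out // fs_in
    out = []
    for m in range(n_out):
        i, rem = divmod(m * fs_in, fs_out)   # integer and fractional input position
        f = rem / fs_out
        p0, p1, p2, p3 = x[i], x[i + 1], x[i + 2], x[i + 3]
        out.append(0.5 * (2 * p1
                          + (p2 - p0) * f
                          + (2 * p0 - 5 * p1 + 4 * p2 - p3) * f * f
                          + (3 * p1 - p0 - 3 * p2 + p3) * f ** 3))
    return out


# 1.25 ms at 12.8 kHz is 16 samples; at an assumed 32 kHz output it becomes 40.
print(len(resample_cubic([0.0] * 16, 12800, 32000)))
```

Each output sample depends on at most two input samples beyond its position, which is why only two padding values are needed at the end of the memory.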
The decoder 500 further comprises an overlap-add unit 509 able to ensure the overlap-add, at a step 616, between the decoded and re-sampled CELP transition sub-frames, the samples re-sampled by cubic interpolation, and the decoded signal of the transition frame originating from the MDCT decoder 507.
For this purpose, the unit 509 applies the synthesis modified window 327 of
The transition frame thus obtained is transmitted to the output interface 510 of the decoder at step 608.
The device comprises a random access memory 704 and a processor 703 for storing instructions that implement the method of determining the distribution of bits for a transition frame described above. The device also comprises a mass memory 705 for storing data intended to be preserved after the method has been applied. The device 700 further comprises an input interface 701 and an output interface 706, intended respectively for receiving the digital signal frames and for outputting the details of the budget allocated to these frames.
The device 700 may further comprise a digital signal processor (DSP) 702. This DSP 702 receives the digital signal frames in order to shape, demodulate, and amplify these frames, in a manner known per se.
The present disclosure does not limit itself to the embodiments described above for example purposes; it extends to other variants.
Thus, an embodiment has been described wherein the compression or decompression devices are stand-alone entities. Of course, the devices may be embedded in all types of larger devices such as, for example, a digital camera, a photo camera, a mobile phone, a computer, a cinema projector, etc.
Moreover, an embodiment proposing a particular design of the compression, decompression, and comparison devices has been described. These designs are given for illustrative purposes only. Thus, a different arrangement of the components and a different distribution of the tasks assigned to each of the components may also be considered. For example, the tasks performed by the digital signal processor (DSP) may also be performed by a conventional processor.
Number | Date | Country | Kind |
---|---|---|---|
14 57353 | Jul 2014 | FR | national |
This application is a continuation of U.S. patent application Ser. No. 15/329,671 filed Jan. 27, 2017, which is the U.S. national phase of the International Patent Application No. PCT/FR2015/052073 filed Jul. 27, 2015, which claims the benefit of French Application No. 14 57353 filed Jul. 29, 2014, the contents being incorporated herein by reference.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 15329671 | Jan 2017 | US |
Child | 16775569 | | US |