The present invention is related to audio signal processing and, particularly, to audio encoders/decoders applying a signal-dependent control of the number and the precision of the coded audio data items.
Modern transform-based audio coders apply a series of psychoacoustically motivated processing steps to a spectral representation of an audio segment (a frame) to obtain a residual spectrum. This residual spectrum is quantized, and the coefficients are encoded using entropy coding.
In this process, the quantization step-size, which is usually controlled through a global gain, has a direct impact on the bit-consumption of the entropy coder and needs to be selected such that the bit-budget, which is usually limited and often fixed, is met. Since the bit-consumption of an entropy coder, and in particular of an arithmetic coder, is not known exactly prior to encoding, the optimal global gain can only be calculated in a closed-loop iteration of quantization and encoding. This is, however, not feasible under certain complexity constraints, as arithmetic encoding comes with a significant computational complexity.
State-of-the-art coders, such as the 3GPP EVS codec, therefore usually feature a bit-consumption estimator for deriving a first global gain estimate, which usually operates on the power spectrum of the residual signal. Depending on complexity constraints, this may be followed by a rate-loop to refine the first estimate. Using such an estimate alone or in conjunction with a very limited correction capacity reduces complexity but also reduces accuracy, leading to significant under- or overestimations of the bit-consumption.
Overestimation of the bit-consumption leads to excess bits after the first encoding stage. State-of-the-art encoders use these excess bits to refine the quantization of the encoded coefficients in a second coding stage referred to as residual coding. Residual coding is fundamentally different from the first encoding stage, as it works on bit granularity and thus does not incorporate any entropy coding. Furthermore, residual coding is usually only applied at frequencies with non-zero quantized values, leaving dead-zones that are not further improved.
On the other hand, an underestimation of the bit-consumption inevitably leads to partial loss of spectral coefficients, usually the highest frequencies. In state of the art encoders this effect is mitigated by applying noise substitution at the decoder, which is based on the assumption that high frequency content is usually noisy.
In this setup, it is evident that it is desirable to encode as much of the signal as possible in the first encoding step, which uses entropy coding and is therefore more efficient than the residual coding step. Therefore, one would like to select the global gain such that the bit estimate is as close to the available bit-budget as possible. While the power-spectrum-based estimator works well for most audio content, it can cause problems for highly tonal signals, where the first-stage estimate is mainly based on irrelevant side-lobes of the frequency decomposition of the filter-bank, while important components are lost due to underestimation of the bit-consumption.
According to an embodiment, an audio encoder for encoding audio input data may have: a preprocessor for preprocessing the audio input data to obtain audio data to be coded; a coder processor for coding the audio data to be coded; and a controller for controlling the coder processor so that, depending on a first signal characteristic of a first frame of the audio data to be coded, a number of audio data items of the audio data to be coded by the coder processor for the first frame is reduced compared to a second signal characteristic of a second frame, and a first number of information units used for coding the reduced number of audio data items for the first frame is stronger enhanced compared to a second number of information units for the second frame.
According to another embodiment, a method of encoding audio input data may have the steps of: preprocessing the audio input data to obtain audio data to be coded; coding the audio data to be coded; and controlling the coding so that, depending on a first signal characteristic of a first frame of the audio data to be coded, a number of audio data items of the audio data to be coded for the first frame is reduced compared to a second signal characteristic of a second frame, and a first number of information units used for coding the reduced number of audio data items for the first frame is stronger enhanced compared to a second number of information units for the second frame.
Still another embodiment may have a non-transitory digital storage medium having stored thereon a computer program for performing a method of encoding audio input data having the steps of: preprocessing the audio input data to obtain audio data to be coded; coding the audio data to be coded; and controlling the coding so that, depending on a first signal characteristic of a first frame of the audio data to be coded, a number of audio data items of the audio data to be coded for the first frame is reduced compared to a second signal characteristic of a second frame, and a first number of information units used for coding the reduced number of audio data items for the first frame is stronger enhanced compared to a second number of information units for the second frame, when said computer program is run by a computer.
The present invention is based on the finding that, in order to improve efficiency with respect to both the bitrate and the audio quality, a signal-dependent departure from the typical behavior dictated by psychoacoustic considerations is needed. Typical psychoacoustic models or psychoacoustic considerations result in good audio quality at a low bitrate on average over all signal classes, i.e., over all audio signal frames irrespective of their signal characteristic. However, it has been found that for certain signal classes, or for signals having certain signal characteristics such as quite tonal signals, the straightforward psychoacoustic model or the straightforward psychoacoustic control of the encoder results only in sub-optimal outcomes with respect to audio quality (when the bitrate is kept constant) or with respect to bitrate (when the audio quality is kept constant).
Therefore, in order to address this shortcoming of typical psychoacoustic considerations, the present invention provides, in the context of an audio encoder with a preprocessor for preprocessing the audio input data to obtain audio data to be coded and a coder processor for coding the audio data to be coded, a controller for controlling the coder processor such that, depending on a certain signal characteristic of a frame, the number of audio data items of the audio data to be coded by the coder processor is reduced compared to the straightforward results obtained by state-of-the-art psychoacoustic considerations. Furthermore, this reduction of the number of audio data items is signal-dependent, so that, for a frame with a certain first signal characteristic, the number is more strongly reduced than for another frame with a signal characteristic that differs from that of the first frame. This reduction can be considered a reduction in the absolute number or in the relative number; this distinction is not decisive. It is, however, a central feature that the information units “saved” by the intentional reduction of the number of audio data items are not simply lost but are used for coding the remaining data items more precisely, i.e., the data items that have not been eliminated by the intentional reduction of the number of audio data items.
In accordance with the invention, the controller for controlling the coder processor operates in such a way that, depending on the first signal characteristic of a first frame of the audio data to be coded, a number of audio data items of the audio data to be coded by the coder processor for the first frame is reduced compared to a second signal characteristic of a second frame, and, at the same time, a first number of information units used for coding the reduced number of audio data items for the first frame is stronger enhanced compared to a second number of information units for the second frame.
In an embodiment, the reduction is done such that, for more tonal signal frames, a stronger reduction is performed and, at the same time, the number of bits for the individual lines is more strongly increased than for a frame that is less tonal, i.e., more noisy. For such a less tonal frame, the number is not reduced to such a high degree and, correspondingly, the number of information units used for encoding the less tonal audio data items is not increased as much.
The present invention provides a framework in which typical psychoacoustic considerations are, in a signal-dependent way, more or less violated. However, this violation is not treated as in normal encoders, where psychoacoustic considerations are violated only in an emergency situation, for example when higher frequency portions are set to zero in order to maintain the required bitrate. Instead, in accordance with the present invention, such a violation of normal psychoacoustic considerations is done irrespective of any emergency situation, and the “saved” information units are applied to further refine the “surviving” audio data items.
In embodiments, a two-stage coder processor is used that has, as an initial coding stage, an entropy encoder such as an arithmetic encoder, or a variable-length encoder such as a Huffman coder. The second coding stage serves as a refinement stage and is typically implemented as a residual coder or bit coder operating on bit granularity, for example by adding a certain defined offset in case of a first value of an information unit (e.g., a first bit value) and subtracting an offset in case of the opposite value (e.g., a second bit value). In an embodiment, the reduction of the number of audio data items changes the distribution of the available bits in a typical fixed-frame-rate scenario such that the initial coding stage receives a lower bit-budget, and the refinement coding stage a correspondingly higher bit-budget, than without the reduction. Up to now, the paradigm was that the initial coding stage should receive a bit-budget that is as high as possible irrespective of the signal characteristic, since the initial coding stage, such as an arithmetic coding stage, was believed to have the highest efficiency and therefore to code much better than a residual coding stage from an entropy point of view. In accordance with the present invention, however, this paradigm is abandoned, since it has been found that, for certain signals such as signals with a higher tonality, the efficiency of an entropy coder such as an arithmetic coder is not as high as the efficiency obtained by a subsequently connected residual coder such as a bit coder.
While it is true that the entropy coding stage is highly efficient for audio signals on average, the present invention does not look at the average but reduces the bit-budget for the initial coding stage in a signal-dependent way and, advantageously, for tonal signal portions.
In an embodiment, the bit-budget shift from the initial coding stage to the refinement coding stage based on the signal characteristic of the input data is done such that at least two refinement information units are available for at least one, advantageously for 50%, and even more advantageously for all audio data items that have survived the reduction of the number of data items. Furthermore, it has been found that a particularly efficient procedure for calculating these refinement information units on the encoder side and applying them on the decoder side is an iterative procedure in which the remaining bits of the bit-budget for the refinement coding stage are consumed one after the other in a certain order, such as from low to high frequency. Depending on the number of surviving audio data items and on the number of information units for the refinement coding stage, the number of iterations can be significantly greater than two; it has been found that for strongly tonal signal frames, the number of iterations can be four, five, or even higher.
In an embodiment, the determination of a control value by the controller is done in an indirect way, i.e., without an explicit determination of the signal characteristic. To this end, the control value is calculated based on manipulated input data, where this manipulated input data are, for example, the input data to be quantized or amplitude-related data derived from the data to be quantized. Although the control value for the coder processor is determined based on manipulated data, the actual quantization/encoding is performed without this manipulation. In such a way, the signal-dependent procedure is obtained by determining a manipulation value for the manipulation in a signal-dependent way where this manipulation more or less influences the obtained reduction of the number of audio data items, without explicit knowledge of the specific signal characteristic.
In another implementation, a direct mode can be applied, in which a certain signal characteristic is estimated directly and, depending on the result of this signal analysis, a certain reduction of the number of data items is performed in order to obtain a higher precision for the surviving data items.
In a further implementation, a separated procedure can be applied for reducing the number of audio data items. In the separated procedure, a certain number of quantized data items is first obtained from the input audio signal by a quantization controlled by a typical, psychoacoustically driven quantizer control; the already quantized audio data items are then reduced in number, advantageously by eliminating the smallest audio data items with respect to their amplitude, their energy, or their power. The control for the reduction can, once again, be obtained by a direct/explicit signal characteristic determination or by an indirect/non-explicit signal control.
In a further embodiment, an integrated procedure is applied, in which the variable quantizer performs a single quantization that is controlled based on manipulated data, while the non-manipulated data is quantized. A quantizer control value such as a global gain is calculated from the signal-dependent manipulated data, while the data without this manipulation is quantized, and the result of the quantization is coded using all available information units so that, in the case of a two-stage coding, a typically high number of information units remains for the refinement coding stage.
Embodiments provide a solution to the problem of quality loss for highly tonal content, based on a modification of the power spectrum that is used for estimating the bit-consumption of the entropy coder. This modification consists of a signal-adaptive noise-floor adder that keeps the estimate for common audio content with a flat residual spectrum practically unchanged while increasing the bit-budget estimate for highly tonal content. The effect of this modification is twofold. First, it causes filter-bank noise and irrelevant side-lobes of harmonic components, which are covered by the noise floor, to be quantized to zero. Second, it shifts bits from the first encoding stage to the residual coding stage. While such a shift is not desirable for most signals, it is fully efficient for highly tonal signals, since the bits are used to increase the quantization accuracy of harmonic components. This means they are used to code bits of low significance, which usually follow a uniform distribution and are therefore encoded fully efficiently with a binary representation. Furthermore, the procedure is computationally inexpensive, making it a very effective tool for solving the aforementioned problem.
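As a minimal sketch of the separated procedure, assuming the number of items to keep has already been supplied by the (direct or indirect) control, the following Python function zeroes the smallest non-zero quantized items by magnitude. The function name `reduce_items` and the count-based `keep` parameter are hypothetical; for individual items, ranking by amplitude, energy, or power gives the same order.

```python
def reduce_items(quantized, keep):
    """Zero all but the `keep` largest-magnitude non-zero quantized items.

    Hypothetical helper: `keep` stands in for whatever the controller decides.
    """
    nonzero = [i for i, q in enumerate(quantized) if q != 0]
    if keep >= len(nonzero):
        return list(quantized)  # nothing to eliminate
    keep = max(keep, 0)
    # Rank surviving candidates by ascending magnitude (stable for ties).
    ranked = sorted(nonzero, key=lambda i: abs(quantized[i]))
    drop = set(ranked[: len(ranked) - keep])  # the smallest items are eliminated
    return [0 if i in drop else q for i, q in enumerate(quantized)]
```

For example, `reduce_items([5, -1, 0, 3, 2, -7], 3)` keeps the three dominant items and returns `[5, 0, 0, 3, 0, -7]`.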
Embodiments of the present invention are subsequently disclosed with respect to the accompanying drawings, in which:
The controller 20 of
Furthermore, the refinement coding stage 152 is configured to use a first frame remaining number of information units for a refinement coding of the reduced number of audio data items for the first frame, and the first frame initial number of information units added to the first frame remaining number of information units results in a predetermined number of information units for the first frame. In particular, the refinement coding stage 152 outputs the first frame remaining number of bits and the second frame remaining number of bits, and there exist at least two refinement bits for at least one, advantageously for at least 50%, or even more advantageously for all non-zero audio data items, i.e., the audio data items that survive the reduction of audio data items and that are initially coded by the initial coding stage 151.
Advantageously, the predetermined number of information units for the first frame is equal to the predetermined number of information units for the second frame or quite close to the predetermined number of information units for the second frame so that a constant or substantially constant bitrate operation for the audio encoder is obtained.
As illustrated in
In an embodiment, the refinement coding stage is configured to iteratively assign the first frame remaining number of information units to the reduced number of audio data items of the first frame in at least two sequentially performed iterations. The values of the assigned information units for the at least two sequentially performed iterations are calculated and introduced into the encoded output frame in a predetermined order. In the first iteration, the refinement coding stage sequentially assigns an information unit to each audio data item of the reduced number of audio data items for the first frame, in order from low-frequency to high-frequency audio data items. The audio data items may be individual spectral values obtained by a time/spectral conversion or, alternatively, tuples of two or more spectral lines that are typically adjacent to each other in the spectrum. The bit values are thus calculated from a starting item with the lowest frequency information to an end item with the highest frequency information and, in a further iteration, the same procedure is performed again, i.e., once again from low spectral values/tuples to high spectral values/tuples.
Particularly, the refinement coding stage 152 is configured to check whether the number of already assigned information units is lower than the predetermined number of information units for the first frame minus the first frame initial number of information units. The refinement coding stage is also configured to stop the second iteration in case of a negative check result or, in case of a positive check result, to perform a number of further iterations (1, 2, ...) until a negative check result is obtained. Advantageously, the maximum number of iterations is bounded by a two-digit number such as a value between 10 and 30, advantageously 20 iterations. In an alternative embodiment, a check for a maximal number of iterations can be omitted if the non-zero spectral lines are counted first and the number of residual bits is adjusted accordingly for each iteration or for the whole procedure. Hence, when there are, for example, 20 surviving spectral tuples and 50 residual bits, one can determine without any check during the procedure in the encoder or the decoder that the number of iterations is three (20 + 20 + 10 bits) and that, in the third iteration, a refinement bit is to be calculated or is available in the bitstream only for the first ten spectral lines/tuples. Thus, this alternative does not require a check during the iteration processing, since the number of non-zero or surviving audio items is known after the processing of the initial stage in the encoder or the decoder.
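The check-free alternative can be sketched as follows: once the number K of surviving items and the number B of residual bits are known, the bits consumed per iteration follow directly. This is a hypothetical helper, assuming one refinement bit per surviving item per iteration and the cap of 20 iterations mentioned above.

```python
def refinement_schedule(num_items, num_bits, max_iterations=20):
    """Return how many refinement bits are spent in each iteration.

    Assumes one bit per surviving item per iteration, items visited
    from low to high frequency, and a hard cap on the iteration count.
    """
    if num_items <= 0:
        return []  # no surviving items, nothing to refine
    schedule = []
    remaining = num_bits
    while remaining > 0 and len(schedule) < max_iterations:
        spent = min(num_items, remaining)  # a partial last iteration covers the lowest items
        schedule.append(spent)
        remaining -= spent
    return schedule
```

`refinement_schedule(20, 50)` returns `[20, 20, 10]`, matching the example: three iterations, with the third covering only the first ten tuples. Equivalently, there are B // K full iterations plus one partial iteration of B % K bits, subject to the cap.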
In step 300, surviving audio data items are determined. This determination can be automatically performed by operating on the audio data items that have already been processed by the initial coding stage 151 of
In step 306, the offset is reduced according to a predetermined rule. This predetermined rule may, for example, be that the offset is halved, i.e., that the new offset is half the original offset. However, other offset reduction rules that differ from the 0.5 weighting can be applied as well.
In step 308, the bit values for each item in the predefined sequence are calculated again, now in the second iteration. The refined items after the first iteration, illustrated at 307, are input into the second iteration. Thus, for the calculation in step 308, the refinement represented by the first iteration refinement information units is already applied and, under the prerequisite that refinement bits are still available as indicated in step 314, the second iteration refinement information units are calculated and output at 318.
In step 310, the offset is again reduced with a predetermined rule to be ready for the third iteration and the third iteration once again relies on the refined items after the second iteration illustrated at 309 and again under the prerequisite that the refinement bits are still available as indicated at 314, the third iteration refinement information units are calculated and output at 320.
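A minimal Python sketch of the encoder-side iteration (steps 300 to 310) follows. Beyond the text, it assumes that the targets are the unquantized values expressed in quantizer units, that a bit value of 1 means the target lies at or above the current reconstruction (so the offset is added), that the start offset is 0.25, and that the offset is halved per iteration; the function name and these constants are illustrative only.

```python
def encode_refinement(targets, quantized, num_bits, start_offset=0.25, max_iterations=20):
    """Iteratively compute refinement bits for the surviving (non-zero) items.

    targets:   unquantized values in quantizer units (assumed, e.g. Xf(k)/g_glob)
    quantized: integer values already coded by the initial (entropy) stage
    num_bits:  frame remaining number of information units for refinement
    Returns (bits in bitstream order, refined reconstruction values).
    """
    # Step 300: surviving items are the non-zero quantized values, low to high frequency.
    surviving = [k for k, q in enumerate(quantized) if q != 0]
    refined = [float(q) for q in quantized]
    bits = []
    offset = start_offset  # hypothetical start offset
    for _ in range(max_iterations):
        if not surviving:
            break
        for k in surviving:
            if len(bits) == num_bits:  # refinement budget exhausted
                return bits, refined
            # Bit 1: target at or above the current reconstruction, so add the offset.
            bit = 1 if targets[k] >= refined[k] else 0
            bits.append(bit)
            refined[k] += offset if bit else -offset
        offset *= 0.5  # steps 306/310: halve the offset for the next iteration
    return bits, refined
```

For example, `encode_refinement([1.9, 0.1, -1.6], [2, 0, -2], 4)` returns the bits `[0, 1, 1, 1]` and the refined values `[1.875, 0.0, -1.625]`; the zero-valued item at index 1 receives no refinement bits. Because each iteration starts from the already refined values, a decoder that applies the same offsets reaches the same reconstruction.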
Furthermore, the numbers of initial bits on the one hand and of remaining bits on the other hand are only exemplary. Typically, the initial number of bits, which typically encode the most significant bit portion of the audio data items such as spectral values or tuples of spectral values, is greater than the number of iteration refinement bits, which represent the least significant portion of the “surviving” audio data items. Furthermore, the initial number of bits 400 is typically determined by means of an entropy coder or arithmetic encoder, whereas the iteration refinement bits are determined using a residual or bit encoder operating on an information-unit granularity. Although the refinement coding stage does not perform any entropy coding, the least significant bit portion of the audio data items is nevertheless encoded more efficiently by the refinement coding stage, since the least significant bit portion of audio data items such as spectral values can be assumed to be uniformly distributed; therefore, entropy coding with a variable-length code or an arithmetic code with a certain context does not provide any additional advantage but, on the contrary, introduces additional overhead.
In other words, for the least significant bit portion of the audio data items, an arithmetic coder would be less efficient than a bit encoder, since the bit encoder does not spend any bitrate on a context. The intentional reduction of audio data items induced by the controller not only enhances the precision of the dominant spectral lines or line tuples but additionally provides a highly efficient encoding operation for refining the MSB portions of these audio data items represented by the arithmetic or variable-length code.
In view of that several and for example the following advantages are obtained by means of the implementation of the coder processor 15 of
An efficient two-stage coding scheme is proposed, comprising a first entropy coding stage and a second residual coding stage based on single-bit (non-entropy) encoding.
The scheme employs a low complexity global gain estimator which incorporates an energy based bit-consumption estimator for the first coding stage featuring a signal-adaptive noise floor adder.
The noise floor adder effectively transfers bits from the first encoding stage to the second encoding stage for highly tonal signals while leaving the estimate for other signal types unchanged. This shift of bits from an entropy coding stage to a non-entropy coding stage is fully efficient for highly tonal signals.
With respect to a decoder-side processing, reference is made to
In an implementation for an audio decoder that corresponds to the audio encoder of
As illustrated in
Subsequently, an implementation of the refinement decoding stage under the control of the controller 60 is illustrated with respect to
This procedure is continued until all iteration refinement bits included in the bitstream for a frame are processed. This is checked by the controller 60 via control line 814, which monitors the remaining availability of refinement bits, advantageously for each iteration but at least for the second and third iterations processed in blocks 808 and 812. In each iteration, the controller 60 controls the refinement decoding stage to check whether the number of already read information units is lower than the number of frame remaining information units for the frame, and to stop the second iteration in case of a negative check result or, in case of a positive check result, to perform a number of further iterations until a negative check result is obtained. The number of further iterations is at least one. Due to the application of similar procedures on the encoder-side discussed in the context of
In the implementation, the refinement decoding stage 52 is configured to add an offset to the initially decoded data item, when a read information data unit of the frame remaining number of information units has a first value and to subtract an offset from the initially decoded item, when the read information data unit of the frame remaining number of information units has a second value. This offset is, for the first iteration, the start offset 805 of
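On the decoder side, the same traversal can be sketched as below, again assuming a hypothetical start offset of 0.25 that is halved after each iteration and a bit value of 1 meaning "add the offset". Reading stops as soon as all frame remaining information units have been consumed, which corresponds to the availability check performed via control line 814. The function name is illustrative only.

```python
def decode_refinement(quantized, bits, start_offset=0.25, max_iterations=20):
    """Apply refinement bits to the initially decoded (non-zero) items.

    quantized: integer values decoded by the initial decoding stage
    bits:      frame remaining information units read from the bitstream
    Returns the refined values as floats.
    """
    # Same traversal as the encoder: non-zero items from low to high frequency.
    surviving = [k for k, q in enumerate(quantized) if q != 0]
    refined = [float(q) for q in quantized]
    pos = 0
    offset = start_offset  # hypothetical start offset, must match the encoder
    for _ in range(max_iterations):
        if not surviving:
            break
        for k in surviving:
            if pos == len(bits):  # all frame remaining information units consumed
                return refined
            # First value (1): add the offset; second value (0): subtract it.
            refined[k] += offset if bits[pos] else -offset
            pos += 1
        offset *= 0.5  # same offset reduction rule as the encoder
    return refined
```

For example, `decode_refinement([2, 0, -2], [0, 1, 1, 1])` yields `[1.875, 0.0, -1.625]`: the first iteration moves the two non-zero items by ±0.25 and the second by ±0.125.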
In an implementation of the present invention using an indirect mode illustrated in
When, however, the manipulation is not applied to the audio data of the first frame or the second frame but to amplitude-related values derived from the audio data of the first frame or the second frame, the control preprocessor 22 is present and the bypass line does not exist. The actual manipulation is performed by the combiner 24, which combines the manipulation value output by block 23 with the amplitude-related values derived from the audio data of a certain frame. At the output of the combiner 24, there exist manipulated (advantageously energy) data, and based on these manipulated data, a global gain calculator 25 calculates a global gain, or at least a control value for the global gain, indicated at 404. The global gain calculator 25 has to apply restrictions with respect to an allowed bit-budget for the spectrum so that a certain data rate or a certain number of information units allowed for a frame is obtained.
In the direct mode illustrated at
Although the indirect mode of
As illustrated in
The present invention does not result in the coarser quantization that would typically be obtained by applying a greater global gain. Instead, calculating the global gain based on signal-dependent manipulated data only results in a bit-budget shift from the initial coding stage, which receives a smaller bit-budget, to the refinement coding stage, which receives a higher bit-budget; this bit-budget shift is signal-dependent and is greater for signal portions with higher tonality.
Advantageously, the control preprocessor 22 of
Alternatively, as indicated by the bypass line, values can be added to all audio values of a plurality of audio values included in the frame, where these values are obtained from the same magnitude of the manipulation value calculated by block 23, advantageously with randomized signs, and/or by subtracting slightly different terms from the same magnitude (again advantageously with randomized signs), or from a complex manipulation value or, more generally, as samples from a certain normalized probability distribution scaled by the calculated complex or real magnitude of the manipulation value. The procedures performed by the control preprocessor 22, such as calculating a power spectrum and downsampling, can be included within the global gain calculator 25. Hence, advantageously, a noise floor is added either directly to the spectral audio values or to the amplitude-related values derived from the audio data per frame, i.e., to the output of the control preprocessor 22. Advantageously, the control preprocessor calculates a downsampled power spectrum, which corresponds to exponentiation with an exponent equal to 2. Alternatively, however, a different exponent greater than 1 can be used; for example, an exponent equal to 3 would represent a loudness rather than a power. Other exponent values, smaller or greater, can be used as well.
In the implementation illustrated in
Advantageously, the signal-independent contribution is determined by the bitrate, the frame duration, or the sampling frequency of the actual encoder session. Furthermore, the calculator 28 for calculating one or more moments per frame is configured to calculate a signal-dependent weighting value derived from at least one of a first sum of magnitudes of the audio data or downsampled audio data within the frame, a second sum of magnitudes of the audio data or downsampled audio data within the frame, each magnitude being multiplied by an index associated with it, and the quotient of the second sum and the first sum.
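The three moment-based quantities named above can be computed as in the following sketch; the quotient is the center of mass of the magnitude spectrum. How the weighting value (or the parameter lowBits of the detailed embodiment) is then derived from them is not reproduced here. The function name and the 0-based index convention are assumptions.

```python
def spectral_moments(values):
    """Return (first_sum, second_sum, center_of_mass) of a frame's magnitudes.

    first_sum:  sum of |x(k)|
    second_sum: sum of k * |x(k)|, using 0-based indices (assumed convention)
    center:     second_sum / first_sum, or 0.0 for an all-zero frame
    """
    first_sum = sum(abs(v) for v in values)
    second_sum = sum(k * abs(v) for k, v in enumerate(values))
    center = second_sum / first_sum if first_sum > 0 else 0.0
    return first_sum, second_sum, center
```

For `[1, 0, 3]` this gives `(4, 6, 1.5)`: most of the magnitude sits at index 2, pulling the center of mass upward. A low center of mass indicates dominant low-frequency content.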
In an implementation performed by the global gain calculator 25 of
Subsequently, embodiments are illustrated.
Detailed Description of the Encoder (e.g.
Notation
We denote by fs the underlying sampling frequency in Hz, by Nms the underlying frame duration in milliseconds and by br the underlying bitrate in bits per second.
Derivation of Residual Spectrum (e.g. Preprocessor 10)
The embodiment operates on a real residual spectrum Xf(k), k = 0, ..., N−1, that is typically derived by a time-to-frequency transform such as an MDCT, followed by psychoacoustically motivated modifications such as temporal noise shaping (TNS) to remove temporal structure and spectral noise shaping (SNS) to remove spectral structure. For audio content with a slowly varying spectral envelope, the envelope of the residual spectrum Xf(k) is therefore flat.
Global Gain Estimation (e.g.
Quantization of the spectrum is controlled by a global gain gglob via
The initial global gain estimate (item 22 of
$$P_{X_{lp}}(k) = X_f(4k)^2 + X_f(4k+1)^2 + X_f(4k+2)^2 + X_f(4k+3)^2$$
and a signal adaptive noise floor N(Xf) which is given by
The parameter regBits depends on bitrate, frame duration and sampling frequency and is computed as
with C(Nms, fs) as specified in the table below.
The parameter lowBits depends on the center of mass of the absolute values of the residual spectrum and is computed as
are moments of the absolute spectrum.
The global gain is estimated in the form
from the values
$$E(k) = 10\log_{10}\left(P_{X_{lp}}(k) + N(X_f) + 2^{-31}\right)$$
(e.g. output of combiner 24 of
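It should be noted that adding the noise-floor term N(Xf) to PXlp(k) gives, in expectation, the same result as adding a corresponding noise floor to the residual spectrum Xf(k) before calculating the power spectrum, e.g. by randomly adding or subtracting the term $0.5\sqrt{N(X_f)}$ to or from each spectral line: the cross terms average to zero, so each of the four lines in a band contributes $N(X_f)/4$ on average, i.e. $N(X_f)$ per band.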
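This equivalence can be checked numerically. The sketch below, built around a hypothetical helper `mean_band_power_with_dither`, averages the power of one 4-line band over all 16 sign patterns of ±0.5√N(Xf). The result equals the undithered band power plus N(Xf).

```python
import itertools
import math
from typing import Sequence


def mean_band_power_with_dither(band: Sequence[float], noise_floor: float) -> float:
    """Average power of a band over all sign patterns of +/- 0.5*sqrt(N) added per line."""
    a = 0.5 * math.sqrt(noise_floor)
    patterns = list(itertools.product((-1.0, 1.0), repeat=len(band)))
    total = 0.0
    for signs in patterns:
        # Power of the band after adding (or subtracting) a to each line.
        total += sum((x + s * a) ** 2 for x, s in zip(band, signs))
    # Each sign appears equally often, so the cross terms 2*x*s*a cancel in the mean.
    return total / len(patterns)
```

For the band `[1.0, -2.0, 0.5, 0.0]` (power 5.25) and N(Xf) = 4, the mean dithered power is 9.25 = 5.25 + 4.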
Pure power-spectrum-based estimates can already be found, e.g., in the 3GPP EVS codec (3GPP TS 26.445, section 5.3.3.2.8.1). The embodiments additionally add the noise floor N(Xf). This noise floor is signal adaptive in two ways.
First, it scales with the maximal amplitude of Xf. Therefore, its impact on the energy of a flat spectrum, where all amplitudes are close to the maximal amplitude, is very small. For highly tonal signals, however, where the spectrum and, by extension, the residual spectrum feature a number of strong peaks, the overall energy is increased significantly, which increases the bit estimate in the global gain computation as outlined below.
Second, the noise floor is lowered through the parameter lowBits if the spectrum exhibits a low center of mass. In this case, low-frequency content is dominant, so the loss of high-frequency components is likely not as critical as for high-pitched tonal content.
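To make the first effect concrete, the sketch below assumes a hypothetical noise floor of the form N = (eps·max|Xf|)², added once per 4-line band as in E(k). The actual expression for N(Xf), which also involves regBits and lowBits, is not reproduced in this text, so `eps` and the function name `relative_energy_increase` are purely illustrative.

```python
from typing import Sequence


def relative_energy_increase(xf: Sequence[float], eps: float) -> float:
    """Total added noise floor divided by total spectral energy.

    Assumes a hypothetical noise floor N = (eps * max|Xf|)^2 that is added
    once to each band of 4 lines; this is not the original N(Xf) formula.
    """
    peak = max(abs(x) for x in xf)
    noise_floor = (eps * peak) ** 2          # scales with the maximal amplitude
    energy = sum(x * x for x in xf)
    num_bands = len(xf) // 4
    return num_bands * noise_floor / energy
```

With 64 lines and eps = 0.1, a flat spectrum of unit amplitude gives a relative increase of 0.0025, whereas a single unit peak among 64 otherwise silent lines gives 0.16: the tonal spectrum's energy, and hence its bit estimate, is raised far more.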
The actual estimate of the global gain is performed (e.g. block 25 of
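The estimation formula itself is not reproduced in this text. Purely to illustrate how a gain can be derived from E(k) without any arithmetic encoding, the sketch below swaps in a generic bisection search over a gain level G in dB, assumed to relate to the global gain via gglob = 10^(G/20), together with a hypothetical bit model `estimate_bits` (about 1 bit per 6.02 dB of band energy above G, for each of the 4 lines in a band). Neither the bit model nor the search range is taken from the original.

```python
from typing import Sequence


def estimate_bits(energies_db: Sequence[float], gain_db: float) -> float:
    """Hypothetical bit model: ~1 bit per 6.02 dB above the gain, 4 lines per band."""
    return sum(4.0 * max(0.0, e - gain_db) / 6.02 for e in energies_db)


def estimate_global_gain_db(energies_db: Sequence[float], bit_budget: float,
                            lo: float = -60.0, hi: float = 140.0,
                            iterations: int = 20) -> float:
    """Bisection for the smallest gain (in dB) whose estimated bit count fits the budget."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if estimate_bits(energies_db, mid) > bit_budget:
            lo = mid  # too many bits: quantize more coarsely (larger gain)
        else:
            hi = mid  # fits the budget: try a finer quantizer (smaller gain)
    return hi  # hi always satisfies the budget, provided the initial hi did
```

Because the estimated bit count never increases as G grows, the upper end of the interval always stays within the budget, so under this model the returned gain never overshoots it. The linear gain would then be `10 ** (g_db / 20)`.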
Residual Coding (e.g.
Residual coding uses the excess bits that remain after arithmetic encoding of the quantized spectrum Xq(k). Let B denote the number of excess bits and K the number of encoded non-zero coefficients Xq(k). Furthermore, let ki, i=1 . . . K, denote the enumeration of these non-zero coefficients from lowest to highest frequency. The residual bits bi(j) (taking values 0 and 1) for coefficient ki are calculated so as to minimize the error
This can be done in an iterative fashion testing whether
If (1) is true then the nth residual bit bi(n) for coefficient ki is set to 0 and otherwise it is set to 1. The calculation of residual bits is carried out by calculating a first residual bit for every ki and then a second bit and so on until all residual bits are spent or a maximal number nmax of iterations is carried out. This leaves
residual bits for coefficient Xq(ki). This residual coding scheme improves on the one applied in the 3GPP EVS codec, which spends at most one bit per non-zero coefficient.
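The per-coefficient bit count follows directly from the pass structure described above: every pass hands one bit to each non-zero coefficient, from lowest to highest frequency, until B bits are used or nmax passes are done. The following sketch computes that count. The closed form and the name `residual_bits_per_coefficient` are our own and are not the original's equation.

```python
from typing import List


def residual_bits_per_coefficient(excess_bits: int, num_nonzero: int, n_max: int) -> List[int]:
    """Number of residual bits spent on each of k_1..k_K (lowest to highest frequency)."""
    if num_nonzero == 0:
        return []
    # Complete passes over all K coefficients, plus a partial last pass.
    full_passes, remainder = divmod(excess_bits, num_nonzero)
    counts = []
    for i in range(num_nonzero):
        bits = full_passes + (1 if i < remainder else 0)  # low frequencies get the extra bit
        counts.append(min(bits, n_max))                   # at most n_max passes are made
    return counts
```

For example, B = 10 and K = 4 give [3, 3, 2, 2]: two full passes plus one extra bit for each of the two lowest-frequency coefficients.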
The calculation of residual bits with nmax=20 is illustrated by the following pseudo-code, where gg denotes the global gain:
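The original pseudo-code is not reproduced here. The following Python sketch follows the described pass structure (a first bit for every ki, then a second bit, and so on, stopping after B bits or nmax passes). The comparison it uses, however, is an assumption: each bit is treated as a successive-approximation step that halves the refinement offset, so that after n bits the reconstruction is gg·(Xq(ki) + Σ_j (bi(j) - 0.5)·2^(-j)). The function name and signature are hypothetical.

```python
from typing import List, Sequence


def encode_residual_bits(xf: Sequence[float], xq: Sequence[int], gg: float,
                         excess_bits: int, n_max: int = 20) -> List[int]:
    """Return residual bits in transmission order (pass 1 for all k_i, then pass 2, ...).

    Assumed (hypothetical) rule: bit n of coefficient k_i is 0 if Xf(k_i) lies below
    the current reconstruction gg * (Xq(k_i) + sum_{j<n} (b(j) - 0.5) * 2**-j), else 1.
    """
    nonzero = [k for k, q in enumerate(xq) if q != 0]  # k_1..k_K, low to high frequency
    offsets = [0.0] * len(nonzero)  # running refinement offset per coefficient
    bits: List[int] = []
    for n in range(1, n_max + 1):
        for i, k in enumerate(nonzero):
            if len(bits) == excess_bits:  # all excess bits spent
                return bits
            current = gg * (xq[k] + offsets[i])
            b = 0 if xf[k] < current else 1
            bits.append(b)
            offsets[i] += (b - 0.5) * 2.0 ** -n  # halve the step for the next pass
    return bits
```

With Xf = [0.0, 2.3, 0.0, -1.9], Xq = [0, 2, 0, -2], gg = 1.0 and B = 4, the sketch returns [1, 1, 1, 0].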
Description of the Decoder (e.g.
At the decoder, the entropy encoded spectrum is obtained by entropy decoding. The residual bits are used to refine this spectrum as demonstrated by the following pseudo code (see also e.g.
The decoded residual spectrum is given by
$\hat{X}_f(k) = g_{glob}\,\hat{X}_q(k)$.
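A decoder-side sketch, under an assumed successive-approximation rule (after n bits, a non-zero coefficient is refined to Xq(ki) + Σ_j (bi(j) - 0.5)·2^(-j)), reads the residual bits in the described pass order and then scales the refined values by the global gain. Coefficients quantized to zero are not refined. The function name `decode_residual_spectrum` is hypothetical, and the refinement rule is not taken from the original pseudo-code.

```python
from typing import List, Sequence


def decode_residual_spectrum(xq: Sequence[int], gg: float, bits: Sequence[int],
                             n_max: int = 20) -> List[float]:
    """Refine the decoded Xq with residual bits, then dequantize by multiplying with gg."""
    nonzero = [k for k, q in enumerate(xq) if q != 0]  # same ordering as the encoder
    refined = [float(q) for q in xq]
    pos = 0
    for n in range(1, n_max + 1):
        for k in nonzero:
            if pos == len(bits):  # no residual bits left
                return [gg * v for v in refined]
            # Assumed rule: bit 1 moves the value up, bit 0 down, by half the previous step.
            refined[k] += (bits[pos] - 0.5) * 2.0 ** -n
            pos += 1
    return [gg * v for v in refined]
```

For Xq = [0, 2, 0, -2], gg = 1.0 and bits [1, 1, 1, 0], the result is [0.0, 2.375, 0.0, -1.875].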
Alternatively to the procedure in
Contrary thereto,
The present invention can be implemented in at least four different modes. The control value can be determined either in a direct mode, with an explicit determination of a signal characteristic, or in an indirect mode, without an explicit signal characteristic determination but with the addition of a signal-dependent noise floor to the audio data or to derived audio data as an example of a manipulation. Independently of this, the reduction of audio data items is done either in an integrated manner or in a separated manner. This yields four combinations: indirect determination with integrated reduction, indirect determination with separated reduction, direct determination with integrated reduction, and direct determination with separated reduction. For reasons of efficiency, an indirect determination of the control value together with an integrated reduction of audio data items is advantageous.
It is to be mentioned here that all alternatives or aspects as discussed before and all aspects as defined by the independent claims in the following claims can be used individually, i.e., without any other alternative or object than the contemplated alternative, object or independent claim. However, in other embodiments, two or more of the alternatives or the aspects or the independent claims can be combined with each other and, in other embodiments, all aspects, or alternatives and all independent claims can be combined with each other.
An inventively encoded audio signal can be stored on a digital storage medium or a non-transitory storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet. Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier or a non-transitory storage medium.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any hardware apparatus.
The above described embodiments are merely illustrative for the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
This application is a continuation of copending International Application No. PCT/EP2020/066088, filed Jun. 10, 2020, which is incorporated herein by reference in its entirety, and additionally claims priority from International Application No. PCT/EP2019/065897, filed Jun. 17, 2019, which is also incorporated herein by reference in its entirety.
Entry |
---|
Universal Mobile Telecommunications System (UMTS); LTE; “Codec for Enhanced Voice Services (EVS)”, Detailed algorithmic description, May 16, 2019 (3GPP TS 26.445 version 12.13.0 Release 12), pp. 1-662 (Year: 2019). |
Herre et al., “The Integrated Filterbank Based Scalable MPEG-4 Audio Coder”, 105th AES Convention, 1998, pp. 1-20 (Year: 1998). |
Sugiura et al., “Golomb-rice coding optimized via LPC for frequency domain audio coder”, 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Dec. 3, 2014, pp. 1024-1028, IEEE (Year: 2014). |
C Naveen Andrew, “Office Action for IN Application No. 202227002162”, dated Nov. 8, 2022, Intellectual Property India, India. |
ETSI TS, “Universal Mobile Telecommunications System (UMTS); LTE; Codec for Enhanced Voice Services (EVS); Detailed algorithmic description (3GPP TS 26.445 version 13.4.1 Release 13)”, ETSI TS 126 445 V13.4.1 (Apr. 2017), Apr. 6, 2017, pp. 2017-2024, XP055553737. |
Martin Dietz et al., “Overview of the EVS codec architecture”, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 1, 2015, pp. 5698-5702, XP055290998. |
Srikanth Nagisetty et al., “Low bit rate high-quality MDCT audio coding of the 3GPP EVS standard”, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 1, 2015, pages, XP055297162. |
P. A. Volkov, “Office Action for RU Application No. 2022100599”, dated Apr. 26, 2022, Rospatent, Russia. |
Number | Date | Country | |
---|---|---|---|
20220101866 A1 | Mar 2022 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/EP2020/066088 | Jun 2020 | WO |
Child | 17546540 | | US |
Parent | PCT/EP2019/065897 | Jun 2019 | WO |
Child | PCT/EP2020/066088 | | US |