This invention relates to a frequency-differential encoding of sinusoidal model parameters.
In recent years, model-based approaches for low bit-rate audio compression have gained increased interest. Typically, these parametric schemes decompose the audio waveform into various co-existing signal parts, e.g., a sinusoidal part, a noise-like part, and/or a transient part. Subsequently, model parameters describing each signal part are quantized, encoded, and transmitted to a decoder, where the quantized signal parts are synthesised and summed to form a reconstructed signal. Often, the sinusoidal part of the audio signal is represented using a sinusoidal model specified by amplitude, frequency, and possibly phase parameters. For most audio signals, the sinusoidal signal part is perceptually more important than the noise and transient parts, and consequently, a relatively large amount of the total bit budget is assigned for representing the sinusoidal model parameters. For example, in a known scalable audio coder described by T. S. Verma and T. H. Y. Meng in “A 6 kbps to 85 kbps scalable audio coder”, Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 877-880, 2000, more than 70% of the available bits are used for representing sinusoidal parameters.
Usually, in order to reduce the bit rate needed for the sinusoidal model, inter-frame correlation between sinusoidal parameters is exploited using time-differential (TD) encoding schemes. Sinusoidal components in a current signal frame are associated with quantized components in the previous frame (thus forming ‘tonal tracks’ in the time-frequency plane), and the parameter differences are quantized and encoded. Components in the current frame that cannot be linked to past components are considered as start-ups of new tracks and are usually encoded directly, with no differential encoding. While efficient for reducing the bit rate in stationary signal regions, TD encoding is less efficient in regions with abrupt signal changes, since relatively few components can be associated with tonal tracks, and, consequently, a large number of components are encoded directly. Furthermore, to be able to reconstruct a signal from the differential parameters at the decoder, TD encoding is critically dependent on the assumption that the parameters of the previous frame have arrived unharmed. With some transmission channels, e.g. lossy packet networks like the Internet, this assumption may not be valid. Thus, in some cases an alternative to TD encoding is desirable.
One such alternative is frequency-differential (FD) encoding, where intra-frame correlation between sinusoidal components is exploited. In FD encoding, differences between parameters belonging to the same signal frame are quantized and encoded, thus eliminating the dependence on parameters from previous frames. FD encoding is well-known in sinusoidal based speech coding, and has recently been used for audio coding as well. Typically, sinusoidal components within a frame are quantized and encoded in increasing frequency order; first, the component with lowest frequency is encoded directly, and then higher frequency components are quantized and encoded one at a time relative to their nearest lower-frequency neighbor. While this approach is simple, it may not be optimal. For example, in some frames it may be more efficient to relax the nearest-neighbor constraint.
In arriving at the present invention, the inventors have sought to derive a more general method for FD encoding of sinusoidal model parameters. For given parameter quantizers and code-word lengths (in bits) corresponding to each quantization level, the proposed method finds the optimal combination of frequency differential and direct encoding of the sinusoidal components in a frame. The method is more general than existing schemes in the sense that it allows for parameter differences involving any component pair, that is to say, not necessarily frequency domain neighbors. Furthermore, unlike the simple scheme described above, several (in the extreme case, all) components may be encoded directly, if this turns out to be most efficient.
From a first aspect, the invention provides a method of coding an audio signal, the method being characterised by a step of encoding parameters of a given sinusoidal component in encoded frames either differentially relative to other components in the same frame or directly, i.e. without differential encoding.
From various further aspects, the invention provides methods and apparatus set forth in the independent claims below. Further preferred features of embodiments of the invention are set forth in the dependent claims below.
Embodiments of the invention will now be described in detail, by way of example, and with reference to the accompanying drawings, in which:
FIGS. 6a to 6c show examples of topologically identical and distinct solution trees;
Embodiments of the invention can be constituted in a system for transmitting audio signals over an unreliable communication link, such as the Internet. Such a system, shown diagrammatically in
Within the encoding device 22, the signal is encoded in accordance with a coding method comprising a step of encoding parameters of a given sinusoidal component either differentially relative to other components in the same frame or directly, i.e. without differential encoding. The method must determine whether or not to use differential coding at any stage in the encoding process.
In order to formulate the problem that must be solved by the method to arrive at this determination, consider the situation where a number of sinusoidal components s1, . . . , sK have been estimated in a signal frame. Each component sk is described by an amplitude ak and a frequency value ωk. For the purposes of the present description it is not necessary to consider phase values since these may be derived from the frequency parameters or quantized directly. Nonetheless, it will be seen that the invention may in fact be extended to phase values and/or other values such as damping coefficients.
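As a concrete illustration of the model just described, the sinusoidal signal part of a frame can be synthesised as a sum of constant-amplitude, constant-frequency sinusoids. The following minimal sketch uses illustrative function names and assumes zero phases by default (consistent with phase being derived or quantized separately):

```python
import math

def synthesize(amps, freqs, n_samples, phases=None):
    """Synthesise a frame as a sum of K constant-amplitude,
    constant-frequency sinusoids; phases default to zero."""
    if phases is None:
        phases = [0.0] * len(amps)
    return [
        sum(a * math.cos(w * n + p)
            for a, w, p in zip(amps, freqs, phases))
        for n in range(n_samples)
    ]

# Two components: amplitudes ak and normalised angular frequencies wk (rad/sample)
frame = synthesize([1.0, 0.5], [0.1, 0.25], 4)
```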
Consider the following possibilities for quantization of the parameters of a given component:
The set of all possible combinations of direct and differential quantization is represented using a directed graph (digraph) D as illustrated in
The vertices s1, . . . , sK represent the sinusoidal components to be quantized. Edges between these vertices represent the possibilities for differential encoding, e.g., the edge between s1 and s4 represents quantization of the parameters of s4 relative to s1 (that is, â4=â1+Δâ14 for amplitude parameters). The vertex s0 is a dummy vertex introduced to represent the possibility of direct quantization. For example, the edge between s0 and s2 represents direct quantization of the parameters of s2. Each edge is assigned a weight wij, which corresponds to a cost in terms of rate and distortion of choosing the particular quantization represented by the edge. The basic task is to find a rate-distortion optimal combination of direct and differential encoding. This corresponds to finding the subset of K edges in D with minimum total cost, such that each vertex s1, . . . , sK has exactly one in-edge assigned.
The calculation of edge weights will now be described. In principle, each edge weight is of the form:
wij = rij + λdij   (Equation 1)
where rij and dij are the rate (i.e. the numbers of bits) and the distortion, respectively, associated with this particular quantization, and λ is a Lagrange multiplier. Generally, since higher-indexed components sj are quantized relative to (already quantized) lower-indexed components as shown in
With this assumption, the quantizer levels that can be reached through direct and differential quantization are identical, and a given component will be quantized in the same way, independent of whether direct or differential quantization is used. This in turn means that the total distortion is constant for any combination of direct and differential encoding, and we can set λ=0 in equation 1. Furthermore, now all weight values of D can be calculated in advance as wij=rij, where
and the integer r(·) denotes the number of bits needed to represent the quantized parameter (·). In this example, the values of r(·) are found as entries in pre-calculated Huffman code-word tables.
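By way of illustration, the weight values of D can be pre-calculated along the following lines. The code-length functions r_direct and r_delta below are toy stand-ins for the pre-calculated Huffman code-word tables (their particular lengths are assumptions, not the actual tables):

```python
# Toy stand-ins for the pre-calculated Huffman code-word tables (assumptions):
# a fixed length for direct encoding, and lengths that grow with the
# magnitude of the quantized difference for differential encoding.
def r_direct(level):
    return 8                        # bits for a directly encoded level

def r_delta(level_diff):
    return 2 + 2 * abs(level_diff)  # bits for an encoded level difference

def edge_weights(levels):
    """Pre-calculate the weight matrix of D: w[i][j] is the rate (in bits,
    with lambda = 0) of encoding component sj directly (i == 0) or
    relative to component si (i >= 1)."""
    K = len(levels)
    w = [[None] * (K + 1) for _ in range(K + 1)]
    for j in range(1, K + 1):
        w[0][j] = r_direct(levels[j - 1])     # direct-encoding edge s0 -> sj
        for i in range(1, K + 1):
            if i != j:
                w[i][j] = r_delta(levels[j - 1] - levels[i - 1])
    return w

# Quantizer level indices for K = 4 components
w = edge_weights([10, 12, 13, 40])
```

Note how a small level difference (e.g. between s2 and s3) yields a cheaper edge than direct encoding, while a large difference (e.g. between s1 and s4) is more expensive.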
In order to clearly understand the example, it is necessary to formulate the problem that is being addressed. Assuming that the signal frame in question contains K sinusoidal components to be encoded, we formulate the optimal FD encoding problem as follows:
Problem 1: For a given digraph D with edge weights wij, find the set of K edges with minimum total weight such that:
Constraint a) is essential since it ensures that each of the K sinusoidal components is quantized and encoded exactly once. Constraint b) enforces a particularly simple structure on the K-edge solution tree. This is of importance for reducing the amount of side information needed to tell the decoder how to combine the transmitted (delta-) amplitudes and frequencies.
In solving the above problem, two algorithms (referred to as Algorithm 1 and Algorithm 2) are provided. Algorithm 1 is mathematically optimal, while Algorithm 2 provides an approximate solution at a lower computational cost.
Algorithm 1: In order to solve Problem 1, we reformulate it as a so-called assignment problem, which is a well-known problem in graph theory. Using the digraph D, a bipartite graph G with vertex sets X and Y is constructed.
A number of edges connect the vertices of X and Y. Edges connected to vertices in X correspond to out-edges in the digraph D, while edges connected to vertices s1, . . . , sK ∈ Y correspond to in-edges in D. For example, the edge from s2 ∈ X to s4 ∈ Y in G corresponds to the edge s2s4 in the digraph D. Thus, the solid-line edges in graph G represent the ‘differential encoding’ edges in digraph D. Furthermore, the dashed-line edges from the vertices {s0} ∈ X to s1, . . . , sK ∈ Y all correspond to direct encoding of components s1, . . . , sK. The weights of the edges connecting vertices in X with vertices s1, . . . , sK ∈ Y are identical to the weights of the corresponding edges in digraph D. Finally, the K−1 dummy vertices {†} ∈ Y are used to represent the fact that some vertices in the solution trees may be ‘leaves’, i.e., do not have any out-edges. For example, in
It can be shown that each set of K edges in D that satisfies constraints a) and b) of Problem 1, can be represented as an assignment in G of the vertices in X to the vertices in Y, i.e., a subset of 2K−1 edges in G such that each vertex is assigned exactly one edge.
Problem 2: Find in graph G the set of 2K−1 edges with minimum total weight such that each vertex is assigned exactly one edge.
Several algorithms exist for solving Problem 2, such as the so-called Hungarian Method, as discussed in H. W. Kuhn, “The Hungarian Method for the Assignment Problem”, Naval Research Logistics Quarterly, 2:83-97, 1955, which solves the problem in O((2K−1)^3) arithmetic operations. An alternative implementation is the algorithm described in R. Jonker and A. Volgenant, “A Shortest Augmenting Path Algorithm for Dense and Sparse Linear Assignment Problems”, Computing, vol. 38, pp. 325-340, 1987. Its complexity is similar to that of the Hungarian Method, but the Jonker and Volgenant algorithm is faster in practice. Further, their algorithm can solve sparse problems faster, which is of importance for the multi-frame linking algorithm of this embodiment.
In summary, Algorithm 1 consists of the following steps. First, the digraph D (and as a result the graph G) is constructed. Then, the assignment in G with minimal weight (Problem 2) is determined. Finally, from the assignment in G, the optimal combination of direct and differential coding is easily derived.
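For small K, the optimum that Algorithm 1 computes can be illustrated by exhaustive search over all in-edge choices. This is a brute-force reference sketch only, not the assignment-problem solver itself, and it interprets constraint b) as requiring the selected edges to form trees rooted at s0 (no cycles):

```python
from itertools import product

def optimal_encoding(w):
    """Brute-force reference for Problem 1: choose for each component
    j = 1..K one parent p[j] (p[j] = 0 meaning direct encoding) with
    minimum total weight, such that the selected edges form trees
    rooted at s0 (no cycles)."""
    K = len(w) - 1
    best_cost, best_parents = float("inf"), None
    for parents in product(range(K + 1), repeat=K):
        if any(parents[j - 1] == j for j in range(1, K + 1)):
            continue                                  # no self-loops
        if any(w[parents[j - 1]][j] is None for j in range(1, K + 1)):
            continue                                  # edge not present in D
        ok = True
        for j in range(1, K + 1):                     # reject cycles
            seen, v = set(), j
            while v != 0:
                if v in seen:
                    ok = False
                    break
                seen.add(v)
                v = parents[v - 1]
            if not ok:
                break
        if not ok:
            continue
        cost = sum(w[parents[j - 1]][j] for j in range(1, K + 1))
        if cost < best_cost:
            best_cost, best_parents = cost, parents
    return best_cost, best_parents

# Toy weight matrix for K = 2 (w[i][j]: cost of the edge si -> sj)
w = [[None, 5, 5],
     [None, None, 1],
     [None, 1, None]]
best_cost, parents = optimal_encoding(w)   # s1 direct, s2 relative to s1
```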
Algorithm 2 is an iterative, greedy algorithm that treats the vertices s1, . . . , sK of the graph D one at a time for increasing indices. At iteration k, one of the in-edges of vertex sk is selected from a candidate edge set. The candidate set consists of the in-edges of sk originating from vertices with no previously selected out-edge, and the direct encoding edge s0sk. From this set, the edge with minimal weight is selected. With this procedure, a set of K edges is obtained that satisfies constraints a) and b) of Problem 1. Generally, this greedy approach is not optimal, i.e., there may exist another set of K edges with a lower total weight satisfying constraints a) and b). Algorithm 2 has a computational complexity of O(K^2).
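A minimal sketch of Algorithm 2 follows, assuming (per the encoding order described earlier) that differential candidates for sk are restricted to lower-indexed components:

```python
def greedy_encoding(w):
    """Algorithm 2 (greedy sketch): visit s1..sK in increasing index order;
    for sk, select the cheapest in-edge among the direct edge s0 -> sk and
    the edges from lower-indexed vertices with no previously selected
    out-edge.  Complexity O(K^2), but generally not optimal."""
    K = len(w) - 1
    has_out = [False] * (K + 1)
    parents, total = [], 0
    for k in range(1, K + 1):
        candidates = [(w[0][k], 0)]          # direct encoding always available
        candidates += [(w[i][k], i) for i in range(1, k) if not has_out[i]]
        cost, i = min(candidates)
        if i != 0:
            has_out[i] = True                # si now has a selected out-edge
        parents.append(i)
        total += cost
    return total, parents

# Toy weight matrix for K = 3 (w[i][j]: cost of the edge si -> sj, i < j)
w = [[None, 5, 5, 5],
     [None, None, 1, 1],
     [None, None, None, 2],
     [None, None, None, None]]
total, parents = greedy_encoding(w)
```

On this example the greedy result costs 8 (s3 must fall back to the edge from s2 because s1 already has an out-edge), whereas the branching tree with both s2 and s3 encoded relative to s1 would cost 7, illustrating the suboptimality noted above.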
In addition to the sinusoidal (delta-) parameters encoded as described above, an encoded signal embodying the invention must include side information that describes how to combine the parameters at the decoder. One possibility is to assign to each possible solution tree one symbol in the side information alphabet. However, the number of different solution trees is large; for example with K=25 sinusoidal components in a frame, it can be shown that the number of different solution trees is approximately 10^18, corresponding to 62 bits for indexing the solution tree in the side information alphabet. Clearly, this number is excessive for most applications. Fortunately, the side information alphabet only needs to represent topologically distinct solution trees, provided that a particular ordering is applied to the (delta-) parameter sequence. To clarify the notion of topologically distinct trees and parameter ordering, consider the examples of solution trees in
Consequently, preferred embodiments of the invention provide a side information alphabet whose symbols correspond to topologically distinct solution trees. An upper bound for the side information is given by the number of such trees. Expressions for the number of topologically distinct trees follow.
As illustrated in the examples of
where
The performance of the proposed algorithms can be demonstrated in a simulation study with audio signals. Four different audio signals sampled at a rate of 44.1 kHz and with a duration of approximately 20 seconds each were divided into frames of a fixed length of 1024 samples using a Hanning window with a 50% overlap between consecutive frames.
Each signal frame was represented using a sinusoidal model with a fixed number of K=25 constant-amplitude, constant-frequency sinusoidal components, whose parameters were extracted using a matching pursuit algorithm. Amplitude and frequency parameters were quantized uniformly in the log-domain using relative quantizer level spacings of 20% and 0.5%, respectively. Similar relative quantization levels were used for direct and differential quantization, as shown in
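The log-domain uniform quantization used in the experiments can be sketched as follows. Function names are illustrative; quantizer levels are spaced by a constant factor (1 + rel_step), which yields the stated fixed relative spacings:

```python
import math

def quantize_log(x, rel_step):
    """Uniform quantization in the log domain: quantizer levels are spaced
    by a constant factor (1 + rel_step), i.e. a fixed relative spacing."""
    step = math.log(1.0 + rel_step)
    return round(math.log(x) / step)          # integer level index

def reconstruct_log(level, rel_step):
    return math.exp(level * math.log(1.0 + rel_step))

# 20% relative spacing for amplitudes, 0.5% for frequencies
amp_level = quantize_log(0.8, 0.20)
amp_hat = reconstruct_log(amp_level, 0.20)    # reconstruction within ~10%
```

Because the level spacing is uniform in the log domain, a difference of level indices corresponds to a quantized amplitude ratio (or frequency ratio), which is what the differential edges of the digraph D encode.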
Experiments were conducted where Algorithms 1 and 2 were used to determine how to combine direct and FD encoding for each frame. In addition, simulations were run where amplitude and frequency parameters were quantized using the ‘standard’ FD encoding configuration illustrated in
For each of these encoding procedures, the bit rate RPars needed for encoding of (delta-) amplitudes and frequencies was estimated (using first-order entropies). Furthermore, since Algorithms 1 and 2 require that information about the solution tree structure be sent to the decoder, the bit rate RS.I. needed for representing this side information was estimated as well. Table 1 below shows the estimated bit rates for the various coding strategies and test signals. In this context, comparison of bit rates is reasonable because similar quantizers are used for all experiments, and, consequently, the test signals are encoded at the same distortion level.
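The first-order entropy estimates used for the rate figures can be computed along these lines (a minimal sketch; the example symbol stream is illustrative):

```python
from collections import Counter
import math

def first_order_entropy(symbols):
    """First-order entropy (bits per symbol) of a stream of quantized
    (delta-) parameter symbols; multiplying by the symbol rate gives an
    estimate of the corresponding bit rate."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# e.g. delta-frequency level indices collected over encoded frames
rate = first_order_entropy([0, 1, 0, 0, -1, 0, 1, 0])
```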
The columns in Table 1 below show bit rates [kbps] for various coding schemes and test signals. The table columns are RPars: bit rate for representing (delta-) amplitudes and frequencies, RS.I: rate needed for side information (tree structures), and RTotal: total rate. Gain is the relative improvement with various FD encoding schemes over direct encoding (non-differential).
Table 1 shows that using Algorithm 1 for determining the combination of direct and FD encoding gives a bit-rate reduction in the range of 18.8-27.0% relative to direct encoding. Algorithm 2 performs nearly as well, with bit-rate reductions in the range of 18.5-26.7%. The slightly lower side information resulting from Algorithm 2 is due to the fact that Algorithm 2 tends to produce solution trees with fewer but longer ‘branches’, thereby reducing the number of different solution trees observed. Finally, the ‘standard’ method of FD encoding reduces the bit rate by 12.7-24.0%.
Therefore, encoding methods are provided that use two algorithms for determining the bit-rate optimal combination of direct and FD encoding of sinusoidal components in a given frame. In simulation experiments with audio signals, the presented algorithms showed bit-rate reductions of up to 27% relative to direct encoding. Furthermore, the proposed methods reduced the bit rate by up to 7% compared to a typically used FD encoding scheme. While consideration of the invention has been focused on FD encoding as a stand-alone technique, in further embodiments the scheme generalizes to FD encoding in combination with TD encoding. With such joint TD/FD encoding schemes, it is possible to provide embodiments that combine the strengths of the two encoding techniques.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word ‘comprising’ does not exclude the presence of other elements or steps than those listed in a claim. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Number | Date | Country | Kind
---|---|---|---
01203934 | Oct 2001 | EP | regional
02077844 | Jul 2002 | EP | regional

Number | Date | Country
---|---|---
20040204936 A1 | Oct 2004 | US