Apparatus and method for low complexity combinatorial coding of signals

Information

  • Patent Grant
  • 8495115
  • Patent Number
    8,495,115
  • Date Filed
    Friday, August 22, 2008
  • Date Issued
    Tuesday, July 23, 2013
Abstract
The invention utilizes low complexity estimates of complex functions to perform combinatorial coding of signal vectors. The invention disregards the accuracy of such functions as long as certain sufficient properties are maintained. The invention in turn may reduce computational complexity of certain coding and decoding operations by two orders of magnitude or more for a given signal vector input.
Description
FIELD OF THE INVENTION

The present invention relates generally to coding vectors and in particular, to low-complexity combinational Factorial Pulse Coding of vectors.


BACKGROUND OF THE INVENTION

Methods for coding vector or matrix quantities for speech, audio, image, video, and other signals are well known. One such method, described in U.S. Pat. No. 6,236,960 by Peng et al. (which is incorporated by reference herein), is known as Factorial Pulse Coding (or FPC). FPC can code a vector $x_i$ using a total of M bits, given that:










$$m = \sum_{i=0}^{n-1} |x_i|, \tag{1}$$








and all values of vector $x_i$ are integral valued such that $-m \le x_i \le m$, where m is the total number of unit amplitude pulses, and n is the vector length. The total M bits are used to code N combinations in a maximally efficient manner, such that the following expression, which describes the theoretical minimum number of combinations, holds true:









$$N = \sum_{d=1}^{\min(m,n)} F(n,d)\, D(m,d)\, 2^{d} \le 2^{M}. \tag{2}$$








For this equation, F(n,d) are the number of combinations of d non-zero vector elements over n positions given by:











$$F(n,d) = \frac{n!}{d!\,(n-d)!}, \tag{3}$$








D(m,d) are the number of combinations of d non-zero vector elements given m total unit pulses given by:

D(m,d)=F(m−1,d−1),  (4)

and $2^d$ represents the combinations required to describe the polarity (sign) of the d non-zero vector elements. The term min(m, n) allows for the case where the number of unit magnitude pulses m exceeds the vector length n. A method and apparatus for coding and decoding vectors of this form have been fully described in the prior art. Furthermore, a practical implementation of this coding method has been described in 3GPP2 standard C.S0014-B, where the vector length n=54 and the number of unit magnitude pulses m=7 produce an M=35 bit codeword.


While these values of n and m do not cause any unreasonable complexity burden, larger values can quickly cause problems, especially in mobile handheld devices which need to keep memory and computational complexity as low as possible. For example, use of this coding method for some applications (such as audio coding) may require n=144 and m=28, or higher. Under these circumstances, the cost associated with producing the combinatorial expression F(n,d) using prior art methods may be too high for practical implementation.


In looking at this cost in greater detail, we can rewrite Eq. 3 as:










$$F(n,d) = \frac{\prod_{i=n-d+1}^{n} (i)}{\prod_{j=1}^{d} (j)}. \tag{5}$$








Direct implementation is problematic because F(144, 28) would require 197 bits of precision in the numerator and 98 bits of precision in the denominator to produce a 99 bit quotient. Since most digital signal processors (DSPs) used in today's handheld devices typically support only 16 bit×16 bit multiply operations, special multi-precision multiply/divide routines would need to be employed. Such routines require a series of nested multiply/accumulate operations that typically require on the order of k multiply/accumulate (MAC) operations, where k is the number of 16 bit segments in the operand. For a 197 bit operand, k=⌈197/16⌉=13. So, execution of a single 197×16 bit multiply would require a minimum of 13 MAC operations plus shifting and store operations. The denominator term is calculated in a similar manner to produce a 98 bit result. In addition, a 197/98 bit division is required, which is an extremely complex operation; thus computation of the entire factorial relation in Eq. 5 would require considerable resources.
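The operand sizes quoted above can be reproduced with arbitrary-precision integers. The following is a minimal sketch in Python (used here purely for illustration; the variable names are not from the original text), which forms the numerator and denominator of Eq. 5 for n=144, d=28 and reports their bit lengths:

```python
from math import ceil, comb, factorial

n, d = 144, 28

# Numerator of Eq. 5: product of i for i = n-d+1 .. n
numerator = factorial(n) // factorial(n - d)
# Denominator of Eq. 5: product of j for j = 1 .. d
denominator = factorial(d)
# Exact combinatorial value F(n, d)
quotient = numerator // denominator

num_bits = numerator.bit_length()
den_bits = denominator.bit_length()
quo_bits = quotient.bit_length()

# Number of 16-bit segments needed to hold the numerator
segments = ceil(num_bits / 16)

print(num_bits, den_bits, quo_bits, segments)
```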


In an effort to reduce complexity, Eq. 5 can be rewritten to distribute the divide operations to produce the following:










$$F(n,d) = \mathrm{round}\left[\left(\frac{n}{d}\right)\cdot\left(\frac{n-1}{d-1}\right)\cdot\left(\frac{n-2}{d-2}\right)\cdots\left(\frac{n-d+2}{2}\right)\cdot\left(\frac{n-d+1}{1}\right)\right] \tag{6}$$








In this expression, the dynamic range of the divide operations is reduced, but unfortunately, increased resolution of the quotient is needed to accurately represent division by 3, 7, 9, etc. In order to accommodate this structure, a rounding operation is also needed to guarantee an integer result. Given the large number of high precision divide operations, this implementation does not adequately address the complexity problem for large m and n, and further has the potential to produce an incorrect result due to accumulated errors in precision.


In yet another implementation, Eq. 5 can be rearranged in the following manner:










$$F(n,d) = n\cdot(n-1)\cdot\left(\frac{1}{2}\right)\cdot(n-2)\cdot\left(\frac{1}{3}\right)\cdots(n-d+2)\cdot\left(\frac{1}{d-1}\right)\cdot(n-d+1)\cdot\left(\frac{1}{d}\right). \tag{7}$$








If this expression is evaluated from left to right, the result will always produce an integer value. While this method controls the precision and dynamic range issue to some degree, large values of m and n still require extensive use of multi-precision multiply and divide operations.
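The left-to-right evaluation of Eq. 7 can be sketched as follows. This is an illustrative Python version only; the function name `f_left_to_right` is hypothetical, and Python's unbounded integers hide the multi-precision cost that a 16-bit DSP would incur. After each divide by j the running value equals F(n, j), which is why every intermediate result is an integer:

```python
from math import comb

def f_left_to_right(n: int, d: int) -> int:
    """Evaluate Eq. 7 from left to right using integer arithmetic only."""
    acc = 1
    for j in range(1, d + 1):
        acc *= n - j + 1        # multiply by n, n-1, ..., n-d+1 in turn
        assert acc % j == 0     # the divide by j is always exact
        acc //= j               # acc now equals F(n, j)
    return acc

print(f_left_to_right(54, 7))
print(f_left_to_right(144, 28).bit_length())
```

The intermediate values still grow to roughly the size of the final result, so the multi-precision arithmetic is reduced but not eliminated.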


Finally, in order to minimize computational complexity, it may be possible to pre-compute and store all factorial combinations in a lookup table. Thus, all values of F(n,m) may be simply stored in an n×m matrix and appropriately retrieved from memory using very few processor cycles. The problem with this approach, however, is that as n and m become large, so does the associated memory requirement. Citing the previous example, F(144, 28) would require 144×28×⌈99 bits / 8 bits/byte⌉=52,416 bytes of storage, which is unreasonable for most mobile handheld devices. Therefore, a need exists for a method and apparatus for low-complexity combinational Factorial Pulse Coding of vectors.
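The storage figure above follows from simple arithmetic, sketched here in Python for illustration (the variable names are hypothetical). For comparison, the sketch also totals the two 32 bit tables of 144 and 28 entries that the detailed description uses in place of the full matrix:

```python
from math import ceil

n, m, bits_per_entry = 144, 28, 99

# Full n x m lookup table, each entry rounded up to whole bytes
full_table_bytes = n * m * ceil(bits_per_entry / 8)

# Two small 32-bit (4-byte) tables with n and m entries respectively
small_tables_bytes = n * 4 + m * 4

print(full_table_bytes, small_tables_bytes)
```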





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a block diagram of an encoder.



FIG. 2 is a block diagram of a decoder.



FIG. 3 is a flow chart showing operation of a Combinatorial Function Generator of FIG. 1 and FIG. 2.



FIG. 4 is a flow chart showing operation of the encoder of FIG. 1.



FIG. 5 is a flow chart showing operation of the decoder of FIG. 2.





DETAILED DESCRIPTION OF THE DRAWINGS

In order to address the above-mentioned need, a method and apparatus for low-complexity combinatorial coding of vectors is provided herein. During operation an encoder and decoder will use relatively low resolution approximations of factorial combinations F′(n, d), which provide only enough precision to allow a valid codeword to be generated. Particularly, both an encoder and a decoder will utilize a combinatorial function generator to derive F′(n, d) such that F′(n,d)≧F(n,d), and F′(n,d)≧F′(n−1,d)+F′(n−1,d−1). F′(n, d) will be provided to either coding or decoding circuitry to produce a combinatorial codeword or vector xi, respectively.


Because F′(n, d) will have a lower precision than F(n, d), it is generally much easier to compute on a fixed point digital signal processor (DSP), general purpose microprocessor, or implement in hardware, such as on a programmable logic device or application specific integrated circuit (ASIC). In essence, complicated multi-precision multiplications and divisions are replaced with relatively low precision additions and subtractions, coupled with a small amount of table memory. Generally, the lower complexity operations do not affect the overall coding rate of the input vectors, but it may be possible to lower the complexity even further than is described herein at the expense of a slightly higher bit-rate.


The present invention encompasses a method for combinatorial coding and decoding. The method comprises the steps of receiving a value n based on the number of positions in a vector, receiving a value d based on the number of occupied positions within the vector, creating F′(n, d) based on n and d, wherein F′(n, d) is an estimate of F(n, d) such that F′(n, d)>F(n, d) and F′(n, d)>F′(n−1, d)+F′(n−1, d−1), and wherein








$$F(n,d) = \frac{n!}{d!\,(n-d)!},$$





and using F′(n, d) to code or decode the vector.


The present invention additionally encompasses an apparatus comprising a combinatorial function generator outputting F′(n, r) having the properties F′(n,r)≧F(n,r) and F′(n,r)≧F′(n−1,r)+F′(n−1,r−1), which are sufficient to uniquely encode/decode vector Xcc. The function F′(n,r) is given as:









$$F'(n,r) = R'\left(\sum_{i=n-r+1}^{n} P'(i) - Q'(r)\right),$$





where P′(i) and Q′(r) are 32 bit lookup tables given as:









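$$P'(i) = 2^{-21} \left\lfloor 2^{21} \log_2(i) + 1 \right\rfloor, \quad i \in [1, 2, \ldots, 144]$$

and

$$Q'(r) = \begin{cases} 0, & r = 1 \\ \sum_{j=2}^{r} 2^{-14} \left\lfloor 2^{14} \log_2(j) - 1 \right\rfloor, & r \in [2, \ldots, 28] \end{cases},$$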







and where R′(k) is an approximation of the function $R'(k) \approx 2^k$, given as:

$$R'(k) = \left\lfloor 2^{k_i - 19} \left\lfloor 2^{19} K_f \right\rfloor \right\rfloor, \tag{4.13.5-1}$$

where $k = k_i + k_f$ is broken down into integer and fractional components of k, and $K_f = 2^{k_f}$ is a low resolution Taylor series expansion of the fractional component of k. The apparatus additionally comprises a coder or decoder receiving F′(n, r) and a vector and outputting a codeword or a vector based on F′(n, r).


The present invention additionally encompasses an apparatus comprising a combinatorial function generator receiving a value n based on the number of positions in a vector, receiving a value d based on the number of occupied positions within the vector and creating F′(n, d) based on n and d, wherein F′(n, d) is an estimate of F(n, d) such that F′(n, d)>F(n, d) and F′(n, d)>F′(n−1, d)+F′(n−1, d−1), and where







$$F(n,d) = \frac{n!}{d!\,(n-d)!}.$$






The apparatus additionally comprises an encoder using F′(n, d) to code the vector, and outputting a codeword.


The present invention additionally encompasses an apparatus comprising a combinatorial function generator receiving a value n based on the number of positions in a vector, receiving a value d based on the number of occupied positions within the vector and creating F′(n, d) based on n and d, wherein F′(n, d) is an estimate of F(n, d) such that F′(n, d)>F(n, d) and F′(n, d)>F′(n−1, d)+F′(n−1, d−1), and where







$$F(n,d) = \frac{n!}{d!\,(n-d)!}.$$






The apparatus additionally comprises a decoder using F′(n, d) to decode a codeword, and outputting the vector.


Turning now to the drawings, wherein like numerals designate like components, FIG. 1 is a block diagram of encoder 100. Encoder 100 comprises vector generator 102, combinatorial coding circuitry (coder) 106, combinatorial function generator 108, and other coding circuitry 104. During operation, an input signal to be coded is received by vector generator 102. As is known in the art, the input signal may comprise such signals as speech, audio, image, video, and other signals.


Vector generator 102 receives the input signal and creates vector xi. Vector generator 102 may comprise any number of encoding paradigms including, but not limited to, Code-Excited Linear Prediction (CELP) speech coding as described by Peng, et. al, transform domain coding for audio, images and video including Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), and Modified Discrete Cosine Transform (MDCT) based methods, wavelet based transform coding, direct time domain pulse code modulation (PCM), differential PCM, adaptive differential PCM (ADPCM), or any one of a family of sub-band coding techniques that are well known in the art. Virtually any signal vector of the form given above may be advantageously processed in accordance with the present invention.


Combinatorial coding circuitry 106 receives vector xi and uses Factorial Pulse Coding to produce a codeword C. As discussed above Factorial Pulse Coding can code a vector xi using a total of M bits, given that







$$m = \sum_{i=0}^{n-1} |x_i|,$$





and all values of vector xi are integral valued such that −m≦xi≦m, where m is the total number of unit amplitude pulses, and n is the vector length. As discussed above, larger values of m and n can quickly cause problems, especially in mobile handheld devices which need to keep memory and computational complexity as low as possible.


In order to address this issue, combinatorial function generator 108 utilizes a low complexity technique for producing F′(n,d). Combinatorial coding circuitry 106 then utilizes F′(n,d) to produce codeword C. Circuitry 108 utilizes relatively low resolution approximations (bits of precision) of factorial combinations F′(n,d), which provide only enough precision to allow a valid codeword to be generated. That is, as long as certain properties are maintained, a suitable approximation of the function F(n, d) is sufficient to guarantee that the resulting codeword is uniquely decodable.


In order to describe the generation of F′(n,d), let us proceed by first deriving a function F′(n,d) that is a suitable approximation of F(n, d). The first step is to take the logarithm, to an arbitrary base a, of Eq. 5, and then to take the inverse logarithm base a of the rearranged terms:











$$F(n,d) = \exp_a\left(\sum_{i=n-d+1}^{n} \log_a(i) - \sum_{j=1}^{d} \log_a(j)\right), \tag{8}$$








where the function $\exp_a(k) = a^k$. Next, define functions P(i), Q(d), and R(k), and substitute into Eq. 8 such that:

$$F(n,d) = R\left(\sum_{i=n-d+1}^{n} P(i) - Q(d)\right),$$

where

$$P(i) = \log_a(i), \quad Q(d) = \sum_{j=1}^{d} \log_a(j), \quad \text{and} \quad R(k) = \exp_a(k) = a^k. \tag{9}$$
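As an illustration of the log-domain identity, the following Python sketch evaluates P(i), Q(d), and R(k) directly in double-precision floating point with a=2. The function name `f_log_domain` is hypothetical, and floating point is used here only to show that the rearrangement reproduces F(n, d); it is not the fixed-point approximation developed in the remainder of the description:

```python
from math import comb, log2

def f_log_domain(n: int, d: int) -> float:
    """Evaluate F(n, d) as R(sum of P(i) - Q(d)) with base a = 2."""
    p_sum = sum(log2(i) for i in range(n - d + 1, n + 1))  # sum of P(i) = log2(i)
    q = sum(log2(j) for j in range(1, d + 1))              # Q(d) = sum of log2(j)
    return 2.0 ** (p_sum - q)                              # R(k) = 2^k

print(round(f_log_domain(54, 7)), comb(54, 7))
```

The multiplications and divisions of Eq. 5 have become additions and one subtraction, followed by a single exponentiation.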







However, in accordance with the preferred embodiment of the present invention, it is not necessary for F(n, d) and F′(n, d) to be equivalent in order for the resulting codeword to be uniquely decodable. There are only two conditions that are sufficient for this to hold true:

F′(n,d)≧F(n,d),  (10)
and
F′(n,d)≧F′(n−1,d)+F′(n−1,d−1).  (11)


For the first condition, the restriction simply says that if F′(n,d)<F(n,d), then there will be overlapping code-spaces, and subsequently, there will be more than one input capable of generating a particular codeword; thus, the codeword is not uniquely decodable. The second condition states that the “error” for a given n, d shall be greater than or equal to the sum of the error terms associated with the previous element of the recursive relationship described by Peng et al. in U.S. Pat. No. 6,236,960. It can be shown that F(n,d)=F(n−1,d)+F(n−1,d−1), which is only true if the combinatorial expression is exactly equal to $F(n,d) = C_d^n = n!/(d!\,(n-d)!)$. However, while the inequality in Eq. 11 is sufficient, it may not necessarily be true for all values of n and d. For such values, F(n,d) may satisfy another inequality derived from Eq. 31 of Peng et al. and is given by:










$$F(n,d) > \sum_{i=1}^{d} F(n-(d-i+1),\, i). \tag{12}$$








In this case, Eq. 11 has to be satisfied with strict inequality for certain (m,k), (m≦n), (k≦d), that is:

F(m,k)>F(m−1,k)+F(m−1,k−1),m≦n,k≦d.  (13)


Referring back to Eq. 9, we now wish to generate F′(n, d) by creating the functions P′(i), Q′(d), and R′(k), with low complexity approximations of the original functions such that:












$$F'(n,d) = R'\left(\sum_{i=n-d+1}^{n} P'(i) - Q'(d)\right), \tag{14}$$








and where the conditions given in Eqs. 10 and 11 are satisfied. Considering P(i), we may wish to approximate the function such that $P'(i) \ge \log_a(i)$, $i \in [1, 2, \ldots, n]$. If we choose a=2 and then restrict P′(i) to 32 bits of precision, the resulting operations are easy to implement on a handheld mobile device since most DSPs support single cycle 32 bit additions. Therefore, we define:

$$P'(i) = 2^{-l(i)} \left\lfloor 2^{l(i)} \log_2(i) + 1 \right\rfloor, \quad i \in [1, 2, \ldots, n], \tag{15}$$

where l(i) is a shift factor that may vary as a function of i. In the preferred embodiment, l(i)=l=21, but many other sets of values are possible. For this example, the $2^l$ factor is equivalent to a shift of l bits to the left, whereby the floor function $\lfloor x+1 \rfloor$ removes the fractional bits while rounding up to the next highest integer, and finally the $2^{-l}$ factor shifts the results back to the right by l bits. Using this methodology, the function $P'(i) \ge \log_2(i)$ for all i≧1, and also provides sufficient dynamic range and precision using only 32 bits because 9 bits of positive integer resolution in the $\log_2$ domain can represent a 512 bit number. To avoid the complexity of computing these values in real-time, they can be pre-computed and stored in a table using only 144×4 bytes of memory for the F(144, 28) example. Using a similar methodology for approximating Q(d), we get:
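A table of P′(i) per Eq. 15 might be pre-computed along the following lines. This is a sketch under stated assumptions: Python is used for illustration, the entries are stored as integers scaled by 2^21 (so P′(i) is the stored value divided by 2^21), and the names `L_P`, `build_p_table`, and `P_FIXED` are hypothetical rather than taken from any reference implementation:

```python
from math import floor, log2

L_P = 21  # shift factor l for P'(i), as in the preferred embodiment

def build_p_table(n_max: int = 144) -> list:
    """Return P'(i) * 2**L_P as integers; index 0 is unused padding."""
    table = [0]
    for i in range(1, n_max + 1):
        # floor(x + 1) drops the fractional bits and rounds up,
        # so the stored value always exceeds 2**L_P * log2(i)
        table.append(floor((1 << L_P) * log2(i) + 1))
    return table

P_FIXED = build_p_table()
print(P_FIXED[1], P_FIXED[2], P_FIXED[144].bit_length())
```

Each entry fits comfortably in a 32 bit word, and summing up to 28 of them stays within 32 bits as well.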











$$Q'(d) = \begin{cases} 0, & d = 1 \\ \sum_{j=2}^{d} 2^{-l(j)} \left\lfloor 2^{l(j)} \log_2(j) - 1 \right\rfloor, & d \in [2, \ldots, m] \end{cases}, \tag{16}$$








where the floor function $\lfloor x-1 \rfloor$ is used because of the subtraction of the quantity from the total. This guarantees that $Q'(d) \le \sum_{j=1}^{d} \log_2(j)$, so that the contribution of Q′(d) will guarantee F′(n,d)≧F(n,d). While l(j) can assume many values depending on the configuration of m and n, the preferred embodiment uses a value of l(j)=l=14 for the variable shift factor. Like P′(i), Q′(d) can be pre-computed and stored in a table using only 28×4 bytes of memory for the F(144, 28) example. For defining R′(k), we need to first define k as:
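The Q′(d) table of Eq. 16 can be sketched in the same way. As before, this is an illustrative Python version with hypothetical names (`L_Q`, `build_q_table`, `Q_FIXED`), storing each entry as an integer scaled by 2^14:

```python
from math import factorial, floor, log2

L_Q = 14  # shift factor l for Q'(d), as in the preferred embodiment

def build_q_table(d_max: int = 28) -> list:
    """Return Q'(d) * 2**L_Q as integers; index 0 is unused padding."""
    table = [0, 0]  # Q'(1) = 0
    acc = 0
    for j in range(2, d_max + 1):
        # floor(x - 1) rounds each term down, so the running sum
        # never exceeds 2**L_Q * log2(j!)
        acc += floor((1 << L_Q) * log2(j) - 1)
        table.append(acc)
    return table

Q_FIXED = build_q_table()
print(Q_FIXED[2], Q_FIXED[28])
```

Because Q′(d) is subtracted from the accumulated P′(i) terms, rounding it down has the same effect as rounding P′(i) up: both push the log-domain estimate above the exact value.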









$$k = \sum_{i=n-d+1}^{n} P'(i) - Q'(d). \tag{17}$$








With P′(i) and Q′(d) defined above, k is preferably a 32 bit number with an 8 bit unsigned integer component $k_i$ and a 24 bit fractional component $k_f$. Using this, we may derive $R'(k) \ge \exp_2(k) = 2^k$ by letting $k = k_i + k_f$ and then taking the inverse logarithm base 2 to yield $2^k = 2^{k_i} 2^{k_f}$. We may then use a Taylor series expansion to estimate the fractional component to the desired precision, represented by $K_f = 2^{k_f}$, rounding up the result using the ceiling function, and then appropriately shifting the result to form a multi-precision result (with only l significant bits), such that:

$$R'(k) = 2^{k_i - l} \left\lceil 2^{l} K_f \right\rceil, \tag{18}$$

where $2^{k_i}$ is the integer shift factor applied to the Taylor series expansion result. Here, l is a shift factor used in a similar manner to Eqs. 15 and 16 to guarantee $R'(k) \ge 2^k$. However, since R′(k) cannot be practically pre-computed for efficient real-time operation, great care must be taken in specifying the exact operations necessary in both the encoder and decoder to ensure that the reconstructed signal vector matches the input signal vector exactly. Note that R′(k) may be obtained from left shifting $\lceil 2^{l} K_f \rceil$, which can be accurately represented by l bits.


In the above discussion, functions P′(i), Q′(d), and R′(k) have been chosen such that each individual function estimate guarantees that the resulting F′(n,d)≧F(n,d). However, it is only necessary for the aggregate effect to satisfy this condition. For example, P′(i) and Q′(d) may be as described above, but R′(k) may be a more conventional $R'(k) \approx 2^k$ function which may truncate or round the least significant bits such that R′(k) may be less than $2^k$ for some values of k. This is acceptable as long as this effect is small relative to the effects of P′(i) and Q′(d), so the properties in Eqs. 10 and 11 still hold true.


Also, any functions P′(i), Q′(d), and R′(k) may be used without loss of generality as long as the properties on Eqs. 10 and 11 are satisfied. Care must be taken however, that an increase in bit rate may occur if too little precision is used. It should also be noted that there is an inherent tradeoff in bit rate and complexity, and for large values of m, n, an increase of 1 or 2 bits may be a reasonable tradeoff for a significant reduction in complexity.



FIG. 2 is a block diagram of decoder 200. As shown, decoder 200 comprises combinatorial decoding circuitry (decoder) 206, signal reconstruction circuitry 210, other decoding circuitry 204, and combinatorial function generator 108. During operation a combinatorial codeword is received by combinatorial decoding circuitry 206. Combinatorial decoding circuitry 206 provides n and d to combinatorial function generator, and receives F′(n,d) in response. Decoding circuitry 206 then creates vector xi based on F′(n,d). Vector xi is passed to signal reconstruction circuitry 210 where the output signal (e.g., speech, audio, image, video, or other signals) is created based on xi and other parameters from other decoding circuitry 204. More specifically, the other parameters may include any number of signal reconstruction parameters associated with the signal coding paradigm being used in a particular embodiment. These may include, but are not limited to, signal scaling and energy parameters, and spectral shaping and/or synthesis filter parameters. Normally these parameters are used to scale the energy of and/or spectrally shape the reconstructed signal vector xi in such a manner as to reproduce the final output signal.



FIG. 3 is a flow chart showing operation of a combinatorial function generator of FIG. 1 and FIG. 2. More particularly, the logic flow of FIG. 3 shows those steps necessary for combinatorial function generator 108 to produce F′(n,d). The logic flow begins at step 302 where the inputs n and d are received. At step 303 accumulator A is set to 0. At step 304 the counter i is set equal to n−d+1. At step 306 logarithm approximation P′(i) is added to the accumulator A. At step 310 counter i is incremented by 1. Steps 306 and 310 are repeated in a loop until the counter i is greater than n. Step 312 tests i>n and terminates the loop when i becomes greater than n. At this stage the accumulator contains the logarithm approximation of the numerator of the combinatorial function F(n, d). A logarithm approximation of the denominator of the combinatorial function Q′(d) is subtracted from the accumulator at step 316 to obtain a logarithm approximation of the combinatorial function. At step 318 an exponential approximation R′(A) of the accumulator is taken to generate the approximation B of the combinatorial function. At step 314, B is outputted as F′(n, d).



FIG. 4 is a flow chart showing operation of the encoder of FIG. 1. The logic flow begins at step 401 where an input signal is received by vector generator 102. As discussed above, the input signal may comprise speech, audio, image, video, or other signals. At step 403 vector xi is produced and input into combinatorial coding circuitry 106 where m and d are determined and passed to combinatorial function generator 108. As discussed above, m is the total number of unit amplitude pulses (or sum of the absolute values of the integral valued components of xi) and d is the number of non-zero vector elements of xi. At step 405 F′(n, d) is created by combinatorial function generator 108 and passed to combinatorial coding circuitry 106, where vector xi is coded to create combinatorial codeword C (step 407).
As discussed above, F′(n, d) is created by replacing the functions P(i), Q(d), and R(k) in F(n, d), with low complexity approximations of the original functions such that the conditions given in Equations 10 and 11 are satisfied.



FIG. 5 is a flow chart showing operation of the decoder of FIG. 2. The logic flow begins at step 501 where a combinatorial codeword is received by combinatorial decoder 206. At step 503 n and d are passed from combinatorial decoder 206 to combinatorial function generator 108 and F′(n, d) is returned to decoder 206 (step 505). The codeword is decoded by decoder 206 based on F′(n, d) (step 507) to produce vector xi and xi is passed to signal reconstruction circuitry 210 where an output signal is created (step 509).


Table 1 shows the complexity reduction associated with the present invention as compared to the prior art. For different values of m and n, the associated number of bits M and average number of function calls per frame to F(n, m) are given. For these examples, the frame length interval is 20 ms, which corresponds to a rate of 50 frames per second. The unit of measure for the complexity comparison is weighted millions of operations per second, or WMOPS. A computer simulation was used to produce an estimate of the complexity as it would be executed on a limited precision fixed point DSP. For these examples, multi-precision libraries were used when appropriate, and each primitive instruction was assigned an appropriate weighting. For example, multiplies and additions were given a weight of one operation, while primitive divide and transcendental (e.g., $2^x$) operations were given a weight of 25 operations. From the table, it is easy to see that using F′(n, d) provides significant complexity reduction over the prior art, and that the proportional reduction in complexity increases as n and m increase. This complexity reduction is shown to be as high as two orders of magnitude for the F(144, 60) case, but would continue to grow as n and m increase further. This is primarily due to the growth in precision of the operands that is required to carry out exact combinatorial expressions in the prior art. These operations prove to result in an excessive complexity burden and virtually eliminate factorial pulse coding as a method for coding vectors having the potential for large m and n. The invention solves these problems by requiring only single cycle low precision operations coupled with a small amount of memory storage to produce estimates of the complex combinatorial expressions required for this type of coding.









TABLE 1

Complexity Comparison of F(n, m) vs. F′(n, m)

                                      Prior Art F(n, m)     Invention F′(n, m)
                   Avg Calls per      Peak       Avg        Peak       Avg
   n     m   Bits  frame F(n, m)      WMOPS      WMOPS      WMOPS      WMOPS

   54    7    35        44              0.44       0.32      0.09       0.07
  144   28   131       191             24.50      16.45      0.51       0.37
  144   44   180       279             76.45      46.65      0.96       0.64
  144   60   220       347            150.00      83.25      1.50       0.90









The following text and equations implement the above technique for coding and decoding into the Third Generation Partnership Project 2 (3GPP2) C.P0014-C specification for Enhanced Variable Rate Codec, Speech Service Options 3, 68, and 70 for Wideband Spread Spectrum Digital Systems.


4.13.5 MDCT Residual Line Spectrum Quantization


The MDCT coefficients, referred to as the residual line spectrum, are quantized in a similar manner to the FCB factorial codebook of 4.11.8.3. Basically, factorial coding of $N = {}^{n}\mathrm{FPC}_{m}$ possible combinations can be achieved given that the length n vector v has the properties






$$m = \sum_{i=0}^{n-1} |v_i|,$$









and all elements νi are integral valued. That is, the sum of the absolute value of the integer elements of v is equal to m. For this case, we wish to code an energy scaled version of Xk such that:










$$m = \sum_{k=0}^{143} \left| \mathrm{round}\{\gamma_m X_k\} \right|, \tag{4.13.5-1}$$








where $\gamma_m$ is a global scale factor, and the range 0 to 143 corresponds to the frequency range 0 to 3600 Hz. For this case, m can be either 28 for NB or 23 for WB inputs. The value of $\gamma_m$ used to achieve the above objective is determined iteratively (for non-zero $\|X_k\|^2$) according to the following pseudo-code:














/* Initialization */
emin = −100, emax = 20
e = max{emin, −10 log10(∥Xk∥²)/1.2}
s = +1, Δe = 8
/* main loop */
do {
    γm = 10^(e/20)
    m′ = Σ(k = 0 … 143) |round{γm Xk}|
    if (m′ == m) then break
    else if (m′ > m and s == +1) then s = −1, Δe = Δe/2
    else if (m′ < m and s == −1) then s = +1, Δe = Δe/2
    end
    e = e + s · Δe
} while e ≦ emax and Δe ≧ Δmin
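The iterative search above might be rendered in Python as follows. This is a sketch under several assumptions that are not stated in the text: ∥Xk∥² is taken to be the sum of squares of the 144 coefficients, round{} is taken to round halves away from zero, and Δmin (not given a value in the pseudo-code) is exposed as a hypothetical parameter `delta_min` with an arbitrary default. The names `find_gain` and `round_half_away` are likewise illustrative only:

```python
import math

def round_half_away(x: float) -> int:
    """Round to nearest integer, halves away from zero (assumed rounding rule)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def find_gain(X, m, e_min=-100.0, e_max=20.0, delta_min=0.01):
    """Search for a scale factor so the rounded spectrum holds m unit pulses."""
    energy = sum(x * x for x in X)          # assumed meaning of ||X_k||^2
    if energy == 0:
        raise ValueError("the search is defined only for non-zero energy")
    # Initialization
    e = max(e_min, -10.0 * math.log10(energy) / 1.2)
    s, delta_e = +1, 8.0
    # Main loop (do ... while)
    while True:
        gamma = 10.0 ** (e / 20.0)
        quantized = [round_half_away(gamma * x) for x in X]
        m_prime = sum(abs(q) for q in quantized)
        if m_prime == m:
            break
        elif m_prime > m and s == +1:
            s, delta_e = -1, delta_e / 2    # overshoot: reverse and halve the step
        elif m_prime < m and s == -1:
            s, delta_e = +1, delta_e / 2    # undershoot: reverse and halve the step
        e += s * delta_e
        if not (e <= e_max and delta_e >= delta_min):
            break
    return gamma, quantized, m_prime

# Toy input: 28 unit-amplitude lines followed by zeros
spectrum = [1.0] * 28 + [0.0] * 116
gamma, quantized, m_prime = find_gain(spectrum, 28)
print(m_prime)
```

The function returns m′ alongside the quantized values so that a caller can detect the case where the loop ends with m′ different from m.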










The quantized residual line spectrum Xcc is then calculated as:











$$X_{cc}(k) = \begin{cases} \mathrm{round}\{\gamma_m X_k\}; & 0 \le k < 144 \\ 0; & 144 \le k < 160 \end{cases}, \tag{4.13.5-2}$$








If, on the rare occasion, the values of m and m′ are different, the line spectrum shall be modified by adding or subtracting unit values to the quantized line spectrum Xcc. This guarantees that the resulting line spectrum can be reliably coded using the factorial coding method. The output index representing the line spectrum Xcc is designated RLSIDX. This index comprises 131 bits for the 144FPC28 case and 114 bits for the 144FPC23 case.


In order to address complexity issues associated with encoding and decoding vector Xcc, a low resolution combinatorial approximation function F′(n, r) shall be used in place of the standard combinatorial relation $F(n,r) = {}^{n}C_{r} = n!/(r!\,(n-r)!)$. In particular, both the encoder and decoder utilize a combinatorial function generator F′(n, r) having the properties F′(n, r)≧F(n, r) and F′(n, r)≧F′(n−1,r)+F′(n−1,r−1), which are sufficient to uniquely encode/decode vector Xcc. The function F′(n, r) is given as:












$$F'(n,r) = R'\left(\sum_{i=n-r+1}^{n} P'(i) - Q'(r)\right), \tag{4.13.5-3}$$








where P′(i) and Q′(r) are 32 bit lookup tables given as:












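$$P'(i) = 2^{-21} \left\lfloor 2^{21} \log_2(i) + 1 \right\rfloor, \quad i \in [1, 2, \ldots, 144], \tag{4.13.5-4}$$

and

$$Q'(r) = \begin{cases} 0, & r = 1 \\ \sum_{j=2}^{r} 2^{-14} \left\lfloor 2^{14} \log_2(j) - 1 \right\rfloor, & r \in [2, \ldots, 28] \end{cases}; \tag{4.13.5-5}$$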








and where R′(k) is a multi-precision integer approximation of the function $R'(k) \approx 2^k$, given as:

$$R'(k) = \left\lfloor 2^{k_i - 19} \left\lfloor 2^{19} K_f \right\rfloor \right\rfloor, \tag{4.13.5-6}$$

where $k = k_i + k_f$ is broken down into integer and fractional components of k, and $K_f = 2^{k_f}$ is a Taylor series expansion of the fractional component of k. These operations significantly reduce the complexity necessary in calculating the combinatorial expressions by replacing multi-precision multiply and divide operations with 32 bit additions and a low complexity Taylor series approximation of $2^k$ followed by a multi-precision shift operation. All other components of the encoding/decoding operations are similar to that in 4.11.8.3.
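Putting Eqs. 4.13.5-3 through 4.13.5-6 together, a combinatorial function generator might look like the following Python sketch. Several points are assumptions made for illustration: the tables are held as scaled integers, Q′(r) is shifted left by 7 bits to align its 14 fractional bits with the 21 fractional bits of P′(i), and the fractional power $2^{k_f}$ is computed with a floating-point exponentiation rather than the Taylor series expansion the text calls for. The names `P_FIXED`, `Q_FIXED`, `r_prime`, and `f_prime` are hypothetical. Because the exponentiation differs from the specified one, the outputs need not be bit-exact with a conforming encoder or decoder:

```python
from math import comb, floor, log2

L_P, L_Q, L_R = 21, 14, 19  # shift factors from Eqs. 4.13.5-4, -5 and -6

# P'(i) * 2**21 for i = 1..144 (index 0 unused)
P_FIXED = [0] + [floor((1 << L_P) * log2(i) + 1) for i in range(1, 145)]

# Q'(r) * 2**14 for r = 1..28 (index 0 unused, Q'(1) = 0)
Q_FIXED = [0, 0]
for j in range(2, 29):
    Q_FIXED.append(Q_FIXED[-1] + floor((1 << L_Q) * log2(j) - 1))

def r_prime(k_fixed: int) -> int:
    """Approximate 2**k, where k_fixed = k * 2**21 is a non-negative integer."""
    k_i = k_fixed >> L_P                              # integer part of k
    k_f = (k_fixed & ((1 << L_P) - 1)) / (1 << L_P)   # fractional part of k
    mantissa = floor((1 << L_R) * 2.0 ** k_f)         # floor(2**19 * K_f)
    shift = k_i - L_R
    # Multiply by 2**(k_i - 19); a right shift performs the outer floor
    return mantissa << shift if shift >= 0 else mantissa >> -shift

def f_prime(n: int, r: int) -> int:
    """Low-resolution estimate F'(n, r) for 1 <= r <= 28 and r <= n <= 144."""
    acc = 0
    for i in range(n - r + 1, n + 1):   # accumulate the P'(i) terms
        acc += P_FIXED[i]
    acc -= Q_FIXED[r] << (L_P - L_Q)    # subtract Q'(r) in the same scale
    return r_prime(acc)

print(f_prime(144, 28) >= comb(144, 28))
print(f_prime(144, 28).bit_length())
```

Only additions, one subtraction, one small exponentiation, and a shift are involved, regardless of how large F(n, r) becomes. This sketch does not attempt to verify the second property, F′(n, r)≧F′(n−1,r)+F′(n−1,r−1).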


While the invention has been particularly shown and described with reference to a particular embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention. It is intended that such changes come within the scope of the following claims.

Claims
  • 1. A method for combinatorial coding and decoding, the method comprising the steps of: receiving a vector comprising speech, audio, image, or video;receiving a value n based on a number of positions in the vector;receiving a value d based on a number of occupied positions within the vector;generating a logarithmic approximation of a combinatorial function based on n and d;generating a value F′(n,d) based on the logarithmic approximation of the combinatorial function of n and d, such that
  • 2. The method of claim 1 wherein Q′(d) is an approximation of the function
  • 3. The method of claim 1 wherein the step of generating the value based on the logarithmic approximation of the combinatorial function of n and d further comprises the step of generating a value Q′(d), is generated by a summation wherein each quantity in the summation is less than logarithm of numbers from 1 to d.
  • 4. The method of claim 1 wherein the step of producing the value $R'(k)$ based on $K_f$ further comprises generating: $R'(k) = \lfloor 2^{k_i - l} \lfloor 2^{l} K_f \rfloor \rfloor$.
  • 5. The method of claim 3 wherein the summation is generated as
  • 6. The method of claim 1 wherein the approximation of the function $a^k$ is obtained using a Taylor series expansion method.
  • 7. An apparatus comprising: vector generator circuitry receiving a vector comprising speech, audio, image, or video; combinatorial function generator circuitry receiving a value n based on a number of positions in the vector, receiving a value d based on a number of occupied positions within the vector, generating a logarithmic approximation of a combinatorial function based on n and d, and generating a value F′(n,d) based on the logarithmic approximation of the combinatorial function of n and d, such that
  • 8. The apparatus of claim 7 wherein Q′(d) is an approximation of the function
  • 9. The apparatus of claim 7 wherein the combinatorial function generator circuitry generates the value based on Q′(d), wherein
  • 10. The apparatus of claim 7 wherein F′(n,d)=R′(k), where R′(k) is an approximation of the function $a^k$, and where a is a logarithm base.
Related Publications (1)
Number Date Country
20090024398 A1 Jan 2009 US
Divisions (1)
Number Date Country
Parent 11531122 Sep 2006 US
Child 12196414 US