This invention relates to coding of data.
A binary constant weight code is a code where each member of the code (i.e., each codeword) has the same number of 1's. Constant weight codes have numerous applications.
A conventional general purpose technique for encoding data into constant weight codes is based on a recursive expression for determining the lexicographic index of an element of a codebook. The operation of encoding is equivalent to determining the codeword, given its index, and the operation of decoding is equivalent to determining the index, given the codeword. If b=(b1, b2, . . . , bn) is used to denote the codeword, bi∈{0, 1}, the lexicographic index v(b) is

v(b)=Σ (m=1 to n) bm·C(n−m, w−wm+1),  (1)

where wm is the number of ones in the m-bit prefix of b, w is the weight of the codeword, and C(·,·) denotes the binomial coefficient. See T. M. Cover, “Enumerative source encoding,” IEEE Trans. Information Theory, vol. 19, no. 1, pp. 73-77, January 1973; and J. P. M. Schalkwijk, “An algorithm for source coding,” IEEE Trans. Information Theory, vol. IT-18, pp. 395-399, May 1972. The resulting code is fully efficient, but the complexity of the technique limits its direct application to small block lengths. This is mainly due to the fact that the binomial coefficients in (1) become extremely large, requiring extended precision arithmetic to prevent overflow errors.
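By way of illustration, the following sketch (not part of the method disclosed herein; the helper names are ours) computes the lexicographic index of equation (1) as reconstructed above, and inverts it. It also makes plain why extended precision is needed: the binomial coefficients grow very large.

```python
from math import comb

def lex_index(b):
    """Lexicographic index v(b) of a constant weight codeword, per (1)."""
    n, w = len(b), sum(b)
    wm, v = 0, 0
    for m, bit in enumerate(b, start=1):
        wm += bit                      # ones in the m-bit prefix of b
        if bit:
            # count codewords that agree on the first m-1 bits, have a
            # 0 in position m, and place the remaining ones further right
            v += comb(n - m, w - wm + 1)
    return v

def codeword(v, n, w):
    """Inverse operation: recover the length-n, weight-w codeword of index v."""
    b = []
    for m in range(1, n + 1):
        c = comb(n - m, w)             # completions if position m holds a 0
        if v < c:
            b.append(0)
        else:
            v -= c
            b.append(1)
            w -= 1
    return b

# All C(5,2) = 10 weight-2 words of length 5 receive the indices 0..9:
words = sorted([int(c) for c in format(i, "05b")] for i in range(32)
               if format(i, "05b").count("1") == 2)
assert [lex_index(b) for b in words] == list(range(10))
assert all(codeword(lex_index(b), 5, 2) == b for b in words)
```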
Arithmetic coding is an efficient variable length coding technique for finite alphabet sources. Given a source alphabet and a simple probability model for sequences, with p(x) and F(x) denoting the probability distribution and cumulative distribution function of sequence x, respectively, an arithmetic encoder represents x by a number in the interval [F(x)−p(x), F(x)]. The implementation of such an arithmetic coder can also run into problems with very long registers, but elegant finite-length implementations are known and are widely used. See I. H. Witten et al., “Arithmetic coding for data compression,” Communications of the ACM, vol. 30, pp. 520-540, June 1987. For constant weight codes, the idea is to reverse the roles of encoder and decoder, i.e., to use an arithmetic decoder as an encoder and an arithmetic encoder as a constant weight decoder. An efficient algorithm for implementing such codes using the arithmetic coding approach is given in T. V. Ramabadran, “A coding scheme for m-out-of-n codes,” IEEE Trans. Communications, vol. 38, no. 8, pp. 1156-1163, August 1990. The probability model used by the coder is adaptive, in the sense that the probability that the incoming bit is a 1 depends on the number of 1's that have already occurred. This approach successfully overcomes the finite-register-length constraints associated with computing the binomial coefficients, and the resulting efficiency is often very high, with a loss of one information bit or less in most cases. The encoding complexity of the method is O(n).
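As an illustration of such an adaptive model (our reading of the cited scheme, stated as an assumption rather than a description of Ramabadran's implementation), the probability that the next bit is a 1 can be taken as the fraction of remaining positions that must still hold a 1:

```python
from fractions import Fraction

def prob_next_is_one(n, w, m, w1):
    """P(next bit = 1) after m bits containing w1 ones (assumed model)."""
    return Fraction(w - w1, n - m)

# For n=5, w=2, every weight-2 word receives probability exactly 1/10,
# i.e., log2 C(5,2) bits in total: the model is fully efficient.
p, w1 = Fraction(1), 0
for m, bit in enumerate("00110"):
    q = prob_next_is_one(5, 2, m, w1)
    p *= q if bit == "1" else 1 - q
    w1 += int(bit)
assert p == Fraction(1, 10)
```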
A different method for encoding and decoding balanced constant weight codes was developed by Knuth, as described in D. E. Knuth, “Efficient balanced codes,” IEEE Trans. Information Theory, vol. 32, no. 1, pp. 51-53, January 1986, and is referred to as the complementation method. The method relies on the key observation that if the bits of a length-k binary sequence are complemented sequentially, starting from the left, there must be a point at which the weight is equal to └k/2┘. Given the transformed sequence, it is possible to recover the original sequence by specifying how many bits were complemented (or the weight of the original sequence). This information is provided using check bits of constant weight, and the resulting code consists of the transformed original sequence followed by the constant weight check bits.
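A minimal sketch of the complementation observation (our illustration; the constant weight encoding of the check bits is omitted):

```python
def balance(bits):
    """Find a prefix length s whose complementation balances the word.

    Complementing bits one at a time changes the weight by exactly 1,
    so on the way from weight(bits) to weight(complement(bits)) every
    intermediate value, including k//2, must be hit.
    """
    k = len(bits)
    word = list(bits)
    for s in range(k + 1):
        if sum(word) == k // 2:
            return s, word        # transmit word plus an encoding of s
        if s < k:
            word[s] ^= 1          # complement the next bit from the left
    raise AssertionError("unreachable: the weight k//2 is always hit")

s, balanced = balance([1, 1, 1, 1, 0, 0, 1, 1])   # weight 6, k = 8
assert s == 2 and sum(balanced) == 4
```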
In a series of papers, Bose and colleagues extended Knuth's method in various ways and determined the limits of this approach. See, for example, J.-H. Youn and B. Bose, “Efficient encoding and decoding schemes for balanced codes,” IEEE Trans. Computers, vol. 52, no. 9, pp. 1229-1232, September 2003, and the references therein. Knuth's method is simple and efficient; even though its overall complexity is also O(n), for n=100 it can be eight times as fast as the method based on arithmetic codes. However, the method only works for balanced codes, which restricts its applicability.
In light of the available prior art, what is still needed is an effective and fast method for encoding and decoding constant weight codes that is not restricted in its applicability.
An advance in the art is realized with a method that employs a piecewise linear algorithm, P, to map m-dimensional symbols into code tuples, followed by the construction of codes of weight m from the code tuples. To reverse the operation, constant weight codes are converted to code tuples, and a reverse piecewise linear algorithm, P′, is used to map the code tuples into symbols, from which the data is recovered. The m-dimensional symbols are obtained by mapping the input data into symbols that are contained within an m-dimensional parallelepiped, with each coordinate having a possibly different span, but with the symbols equally spaced along each coordinate axis. The code tuples, which are obtained by employing process P, are contained within an m-dimensional simplex.
A binary constant weight five-bit code of weight 2 is a code whose members (code words) have 5 bits each, and precisely 2 of the bits are 1's. This is illustrated in the first (left most) column below:

codeword | code tuple
---|---
00011 | (4,5)
00101 | (3,5)
00110 | (3,4)
01001 | (2,5)
01010 | (2,4)
01100 | (2,3)
10001 | (1,5)
10010 | (1,4)
10100 | (1,3)
11000 | (1,2)
This code can be described by two-number tuples as shown in the second column in the above table, where each number describes the ordinal position of the “1” in the code. Thus, the (3,4) tuple (third row of the table), for example, states that there is a 1 in the third and fourth bit (counting from the left) of the associated code word. Henceforth herein, tuples that describe a code word in a constant weight code are referred to as code tuples.
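The codeword/tuple correspondence is simply the list of positions of the 1's, as the following sketch (with hypothetical helper names) makes explicit:

```python
def to_tuple(codeword):
    """Code tuple: 1-based positions of the 1's, counting from the left."""
    return tuple(i for i, bit in enumerate(codeword, start=1) if bit == "1")

def to_codeword(t, n):
    """Rebuild the n-bit codeword from its code tuple."""
    return "".join("1" if i in t else "0" for i in range(1, n + 1))

assert to_tuple("00110") == (3, 4)        # third row of the table above
assert to_codeword((3, 4), 5) == "00110"
```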
It may be noted that the code tuples in the above table are in descending lexicographic order, from one row to the next row, and that the numbers within each tuple are in ascending order (viewed from the left). If the first number is designated by y1 and the second number is designated by y2, then one can say that
0<y1<y2≦n, (2)
where n is the number of bits in a code word, i.e., the code length. It may also be noted that the code tuples reduce dimensionality, in this case from 5 to 2, and that the two-dimensional tuples, when normalized to 1 (i.e., all numbers are divided by 5) and depicted in a two-dimensional graph, occupy a triangle, as shown in the figures.
What can be further realized is that a constant weight code of weight 3 may be described by tuples having 3 numbers each which, when depicted in three dimensional space, are enclosed in a three dimensional polyhedron, a tetrahedron, with each of the four faces being right triangles.
Extending the above concepts to w dimensions, for weight w codes, one can realize that the code tuples, having w numbers each, are circumscribed by a w-dimensional simplex having an edge path consisting of w successive orthogonal vectors. A simplex with these properties is called an orthoscheme (H. S. M. Coxeter, Regular Polytopes, 3rd ed., Macmillan, 1968.)
The process of coding data bits can be viewed as a process of mapping those data bits into points of the simplex that are tuples representing the codewords; and once the tuples are identified, mapping the tuples to the binary codewords. Two difficulties arise in mapping data bits into points of the simplex. First, the number of code tuples in a code is generally not a power of two. For example, the weight 2, n=5, constant weight code shown above has 10 code tuples, and 10 is not a power of 2. Second, the code tuples occupy a simplex rather than a rectangular region, so the coordinates of a tuple cannot be selected independently of one another.
We realized, however, that if a bijective function, or mapping, P (and its inverse, P′) can be found between the w-dimensional orthoscheme and a w-dimensional parallelepiped (a w-dimensional “brick”), then the process of coding and decoding data becomes straightforward and efficient. (Bijective means that P′(P(a))=a for every a in A and P(P′(b))=b for every b in B, where A is the brick and B is the orthoscheme.)
To illustrate, as demonstrated above, the code tuples of a constant weight code of length 10 and weight 2 belong to a right triangle 10 shown in the figures. Triangle 10 can be dissected into two regions: a region 12 that is left in place, and a region 13 that is translated to a new position, 13′, such that regions 12 and 13′ together form a rectangle; that is, a two-dimensional brick.
Given an incoming stream of symbols defined by number pairs a1 and a2 (with dynamic ranges 0 to 4 and 0 to 8, respectively), mapping the number pairs to points in the space defined by regions 12 and 13′ (herein, symbols) is quite simple. What is left, then, is to map the symbols in the space defined by region 13′ of the brick back into region 13 of triangle 10, so that every symbol corresponds to a code tuple. Noting that the normalized coordinates (x1, x2) of the symbols in the brick satisfy
0<x1≦1, and ½<x2≦1 (3)
by construction, it follows that:
if x1≧x2
then set x′1=1−x1 and x′2=1−x2+1/n (4)
else set x′1=x1 and x′2=x2
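By way of illustration, one concrete integer realization of this dissection for n=10, w=2 (consistent with the reflection rule of equation (4), though the 1-based indexing conventions here are our own) is:

```python
# Hypothetical illustration of the weight-2 dissection for n = 10:
# the 5 x 9 brick of symbols (a1, a2), 0 <= a1 <= 4, 0 <= a2 <= 8,
# is mapped one-to-one onto the 45 code tuples (y1, y2), 1 <= y1 < y2 <= 10.

N = 10

def pair_to_tuple(a1, a2):
    """Map a symbol from the brick to a code tuple (forward map)."""
    i, j = a1 + 1, a2 + 1          # 1-based coordinates: i in 1..5, j in 1..9
    if j <= N - i:                 # point already inside the triangle: keep it
        return (i, i + j)
    # otherwise reflect, mirroring the x1 >= x2 branch of (4)
    jp = j - (N - i)               # 1 <= jp <= i - 1
    return (N + 1 - i, N + 1 - i + jp)

def tuple_to_pair(y1, y2):
    """Recover the symbol from a code tuple (inverse map)."""
    if y1 <= N // 2:               # image of the "keep" branch
        i, j = y1, y2 - y1
    else:                          # image of the "reflect" branch
        i = N + 1 - y1
        j = (N - i) + (y2 - y1)
    return (i - 1, j - 1)

# Sanity check: the map is a bijection onto all C(10,2) = 45 code tuples.
images = {pair_to_tuple(a1, a2) for a1 in range(5) for a2 in range(9)}
assert len(images) == 45
assert all(1 <= y1 < y2 <= 10 for (y1, y2) in images)
assert all(tuple_to_pair(*pair_to_tuple(a1, a2)) == (a1, a2)
           for a1 in range(5) for a2 in range(9))
```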
Similarly, for codes of weight 3, an inductive process exists for converting the orthoscheme to a “brick.” This is illustrated in the figures.
Algorithmically, the same result is achieved by first handling the mapping of a point's x1 and x2 coordinates, and then handling the mapping of the x3 coordinate.
It is hard to visualize dissections in dimensions greater than 3, and even harder to visualize the necessary mapping of points from the brick (into which the symbols of a block of data are mapped) to the simplex (to create code tuples), and vice versa, so an alternative approach for visualization is needed.
Returning to two dimensions, it can be seen from equation (4) and the figures that the two-dimensional mapping is a conditional reflection: a point whose coordinates already satisfy x1<x2 is left in place, and every other point is reflected into the triangle.
Once the relationship of x1 and x2 is properly set, that is, once it is ensured that x1 is smaller than x2, one proceeds to the third dimension to handle x3.
Our aim is to convert the orthoscheme of the figures into the brick. After the two-dimensional step has ensured that x1<x2, the third coordinate, x3, may stand in any one of three relationships to x1 and x2:
x1<x2<x3
x1<x3<x2 or (5)
x3<x1<x2
The set of operations that are depicted in the figures accomplishes this conversion by means of shift and switch operations on the coordinates.
We discovered a piecewise algorithm that is not only simple and reversible, but that also embeds, within the symbols, the information needed to carry out the forward map and, within the code tuples, the information needed to reverse it. The mappings shown in the figures have precisely this property.
Before delving into the algorithm's equations, let us observe that the added dimension, x3, is last in the order in only the first of the three cases of (5); in the remaining cases the coordinates must be rearranged, and it is this rearrangement that the piecewise algorithm carries out reversibly.
To reiterate, we discovered a piecewise algorithm that is simple and reversible, and that inherently relies on the data itself to determine how the forward and reverse mappings are to be carried out. Moreover, the algorithm applies to dimensions higher than 3, meaning that it may be used for constant weight codes of any desired weight. The following describes the algorithm in mathematical terms which, as indicated above, is iterative in the sense that it starts by handling 2 coordinates, then handles the third coordinate, then the fourth coordinate, etc.
Expressed formally, the problem is to find a bijection between sets Aw and Bw, assuming that the required bijection between Aw−1 and Bw−1 is already known. (Here Bw denotes the w-dimensional orthoscheme {(x1, . . . , xw): 0<x1<x2< . . . <xw≦1}, and Aw the corresponding brick.) The induction is advanced by finding a bijection between Bw−1×((w−1)/w, 1] and Bw (where × designates the Cartesian product of two sets). The wth step in the forward mapping, fw: Bw−1×((w−1)/w, 1]→Bw, is described by the following.
The input to the forward mapping is the vector (x1, x2, x3, . . . , xw), where (x1, x2, x3, . . . , xw−1)∈Bw−1 and xw∈((w−1)/w, 1]. The mapping produces the vector (x′1, x′2, x′3, . . . , x′w)∈Bw, i.e., x′1<x′2< . . . <x′w≦1.
Forward mapping: fw
The above piecewise equation identifies the shift and switch operations required to obtain x′k for different ranges of the variable k. We follow the convention that if the starting index of a range of k-values is larger than the ending index, the range is empty, and the corresponding transformation is not carried out. Also, if an index for x is not in the range 1, . . . , w, it is regarded as a void index, and thus voids the operation. Note that i0=w implies j0=0, in which case Step 2 is the identity.
The next algorithm describes the wth step in the inverse mapping, gw: Bw→Bw−1×((w−1)/w, 1], which is used to recover symbols from code tuples. The input to the mapping is the vector (x′1, x′2, x′3, . . . , x′w)∈Bw, and the output is the vector x=(x1, x2, x3, . . . , xw), where (x1, x2, x3, . . . , xw−1)∈Bw−1 and xw∈((w−1)/w, 1].
Inverse Mapping: gw
and let i0=j0+m0.
3) then x is obtained from x′ by:
To apply the above algorithm to the problem of encoding and decoding constant weight codes, positive integers must be used, and this results in a certain rate loss. The algorithms remain largely unchanged. In a manner analogous to the real-valued case, we find a bijection between AwN⊂Nw and BwN⊂Nw for given w and n (n>2w), where AwN is a w-dimensional brick of integer points and BwN={(y1, y2, . . . , yw)∈Nw: 1≦y1<y2< . . . <yw≦n}. Note that |AwN|≦|BwN|, and the inequality is usually strict, which means that some rate loss is incurred.
The following algorithm provides the forward mapping, i.e., the mapping from AwN into BwN.
Given w and n=pw+q, where p≧0 and 0≦q≦w−1, we divide the range 1, 2, . . . , n into w partitions, where the first w−q−1 partitions each have p elements, the next q partitions each have p+1 elements, and the last partition has p elements, which accounts for all n elements.
where Ti=(w−i0+i−1)p+max(q−i0+i, 0).
The following algorithm provides the inverse mapping, i.e., the mapping from BwN back into AwN.
Again, assume n=pw+q.
where Si=q+(i−1)p+min(i−q−1,0).
The overall complexity of the transform algorithm is O(w²), because at each induction step the complexity is linear in the weight at that step. Recall that the complexities of the arithmetic coding method and Knuth's complementation method are both O(n). Thus, when the weight w is larger than √n, the geometric approach is less competitive. When the weight is low, the proposed geometric technique is more efficient: Knuth's complementation method is not applicable, and the dissection operations of the proposed algorithm make it faster than the arithmetic coding method. Furthermore, due to the structure of the algorithm, it is possible to parallelize part of the computation within each induction step to further reduce the computation time.
So far, little has been said about mapping a binary sequence to an integer sequence y1, y2, . . . , yw such that yi∈[Li, Ui], where Li and Ui are the lower and upper bounds of the valid range as specified by the algorithm. A straightforward method is to treat the binary sequence as an integer and then use the “quotient and remainder” method to find such a mapping. However, this requires division operations, and when the binary sequence is long, the computation is not very efficient. A simplification is to partition the binary sequence into short sequences, and map each short binary sequence to a pair of integers, as in the case of a weight-two constant weight code. Through proper pairing of the ranges, the loss in rate can be minimized.
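A sketch of the quotient-and-remainder translation follows (the ranges below are illustrative placeholders; in the actual coder they would come from the brick dimensions for the chosen n and w):

```python
# Hypothetical sketch of the "quotient and remainder" translation step:
# a block of bits, read as one integer, is expanded into mixed-radix
# digits, one digit per coordinate, each within its own valid range.

def bits_to_symbol(bits, L, U):
    """Map a bit string to integers y[i] with L[i] <= y[i] <= U[i]."""
    value = int(bits, 2)
    sizes = [u - l + 1 for l, u in zip(L, U)]
    y = []
    for l, size in zip(L, sizes):
        value, r = divmod(value, size)   # one division per coordinate
        y.append(l + r)
    assert value == 0, "bit string too long for these ranges"
    return y

def symbol_to_bits(y, L, U, nbits):
    """Invert bits_to_symbol: recover the original bit string."""
    sizes = [u - l + 1 for l, u in zip(L, U)]
    value = 0
    for yi, l, size in reversed(list(zip(y, L, sizes))):
        value = value * size + (yi - l)
    return format(value, f"0{nbits}b")

# Example with the weight-2, n=10 brick (5 x 9 = 45 points >= 2**5):
L_, U_ = [0, 0], [4, 8]
word = "10110"                      # 5 information bits, since 2**5 <= 45
y = bits_to_symbol(word, L_, U_)
assert symbol_to_bits(y, L_, U_, 5) == word
```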
The overall rate loss consists of two parts: the first part comes from the rounding involved in using natural numbers, and the second from the loss in the above simplified translation step. However, when the weight is on the order of √n and n is in the range of 100-1000, the rate loss is usually 1-3 bits per block. For example, when n=529 and w=23, the rate loss is 2 bits/block compared to the best possible code, which would encode k0=132 information bits.
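The stated example can be checked directly, assuming that “best possible” means floor(log2 C(n, w)) information bits per block of a length-n, weight-w code:

```python
from math import comb

# floor(log2 C(529, 23)) computed via exact integer arithmetic
n, w = 529, 23
k_best = comb(n, w).bit_length() - 1   # bit_length - 1 == floor(log2) for ints
assert k_best == 132                   # matches the k0 quoted above
print(k_best - 2)                      # -> 130 information bits/block after the 2-bit loss
```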