Data compression is a technique that enables data to be coded in order to minimize the number of bits required to represent the original data.
Adaptive lossless data compression (ALDC) is, as the name suggests, a technique that performs compression in a dynamic manner without any data being lost, so that the original data can be regenerated to exactly its original state during a decompression operation.
An ALDC system typically uses a content addressable memory (CAM), which comprises a history buffer that stores a dictionary of data sequences. Incoming file strings to be compressed are adaptively matched against the data stored in the history buffer, such that the original data is represented by a succession of matches with the dictionary.
Existing CAM based ALDC compression engines require match signals (or flags) to be created at each history buffer location. The match signals are often combined logically, for example OR'd together, with the result being used to drive out a control signal to all locations of the history buffer.
This means that in one clock cycle there can be a large fan-in of signals to an OR gate, and a large fan-out of signals back to all history buffer locations in a cascade arrangement. For compression engines with relatively small history buffers this does not present any issues. However, for large compression engines the large fan-in and fan-out can be disadvantageous.
For example, if the size of a history buffer is increased from 1024 to 16384 bytes (i.e. a 16 times increase), current technology does not allow the fan-in and fan-out to be achieved in one clock cycle for such a history buffer.
For a better understanding of the present invention, and to show more clearly how it may be carried into effect, reference will now be made, by way of example only, to the following drawings in which:
a further illustrates the copy pointer signals that may be used with the implementation of
b shows an example of the method steps that may be performed by the control logic of
c is a state machine further describing an implementation;
a shows an example of combinatorial logic that may be used with the implementation of
b shows a further example of combinatorial logic that may be used with the implementation of
The implementations described in the examples below provide a method and apparatus for use in adaptive lossless data compression (ALDC), for example ALDC used with a content addressable memory (CAM) having a history buffer. Although the various implementations are described in relation to a CAM having a history buffer, it is noted that the implementations may be used with any buffer memory having a set of storage locations that are capable of receiving data that is to be matched, and that are capable of generating corresponding match flags.
A content addressable memory comprises a history buffer that stores a plurality of bytes, with incoming sequences being compared with those bytes which are stored in the history buffer.
Consider a history buffer of a content addressable memory that comprises the sequence shown in Table 1 below stored in locations 0 to 12:
As a first example, consider that an input sequence comprising the sequence A B C D is compared with the contents of the history buffer of Table 1 (i.e. an input sequence comprising A (first), then B, then C, then D). Such a sequence would correctly match at locations 0, then 1, then 2 and finally 3. This matching sequence would result in a “copy pointer” starting at address 0 lasting 4 bytes.
As a second example, consider that a new input sequence B C D E is compared with the contents of the history buffer of Table 1. Such a sequence would match as follows:
This results in a copy pointer starting at location 1 and lasting 4 bytes.
As a third example, consider that a new sequence A B C D K L M is compared with the contents of the history buffer of Table 1. Such a sequence would match as follows:
It will be noted that B also matches at location 7. However, the matching of B with location 7 is ignored because a match sequence has already been started at location 0. This is because no match can commence if an existing matching sequence continues. In other words, a new sequence is not allowed to start when there is a currently active matching sequence.
Thus, when considering the sequence A B C D K L M with the contents of the history buffer shown in Table 1, it can be seen that this received sequence results in a copy pointer starting at address 0 lasting 4 bytes. However, it can be seen that, had the matching sequence been started when B matched at location 7, then this would have resulted in a copy pointer starting at address 7 lasting 6 bytes (i.e. because the sequence B C D K L M matches with the contents of locations 7 to 12).
As such, the conventional hardware is not able to determine that the sequence ABCD followed by KLM is a worse set of codewords than A followed by BCDKLM.
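The third example can be reproduced with a small software model. This is only an illustrative sketch, not the hardware described here: the buffer contents are reconstructed from the three examples above, locations 5 and 6 are not specified in the text and are filled with a placeholder character, and the names HISTORY, longest_match and greedy_parse are hypothetical.

```python
# Assumed contents of locations 0..12; '?' marks locations the text does not specify.
HISTORY = "ABCDE??BCDKLM"


def longest_match(history, data, pos):
    """Return (length, address) of the longest match for data[pos:] in history."""
    best = (0, 0)
    for addr in range(len(history)):
        length = 0
        while (pos + length < len(data)
               and addr + length < len(history)
               and history[addr + length] == data[pos + length]):
            length += 1
        if length > best[0]:
            best = (length, addr)
    return best


def greedy_parse(history, data):
    """Model the conventional behaviour: once a match starts, follow it to its end."""
    out, pos = [], 0
    while pos < len(data):
        length, addr = longest_match(history, data, pos)
        if length == 0:
            out.append(("literal", data[pos]))
            pos += 1
        else:
            out.append(("copy", addr, length))
            pos += length
    return out


conventional = greedy_parse(HISTORY, "ABCDKLM")
alternative = longest_match(HISTORY, "ABCDKLM", 1)
```

Under these assumptions, conventional holds a 4-byte copy pointer at address 0 followed by a 3-byte copy pointer at address 10, whereas alternative shows that skipping the first byte would have allowed a single 6-byte copy pointer at address 7.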
The apparatus 101 comprises a plurality of AND gates 1090 to 109n, each AND gate 1090 to 109n coupled to receive the output of a corresponding comparison unit 1050 to 105n on a first input. Each AND gate 1090 to 109n is also coupled to receive the output of an OR gate 1110 to 111n on its second input. Each OR gate receives a control signal “A” as a first input (“A” being termed an “ALLOW” signal in the art), and a match signal “m” of a preceding memory location (i.e. m(x−1)) on a second input. The first OR gate 1110 will have its second input coupled to the match signal m(n) of the last memory location 105n in the history buffer 103, thus forming a “circular” arrangement. For example, for a 1024 element history buffer, match signal m(1023) will be coupled to the second input of OR gate 1110, the match signal m(n) thereby being the effective “preceding” match signal m(x−1). The apparatus 101 also comprises a plurality of delay units 1130 to 113n (for example D-type Flip Flops). Each delay unit 1130 to 113n is coupled to receive the output of a corresponding AND gate 1090 to 109n, and output a match signal m(0) to m(n).
A NOR gate 115 receives the plurality of match signals m(0) to m(n) and generates the control signal A (i.e. which is coupled to the first input of each OR gate 1110 to 111n).
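The gate arrangement just described can be modelled cycle by cycle. The following is a minimal sketch assuming one input byte per clock cycle; the function name step_conventional, the five-location history and the input bytes are illustrative assumptions rather than values taken from the drawings.

```python
def step_conventional(history, m, byte):
    """One clock cycle: return the next registered match signals m(0)..m(n)."""
    n = len(history)
    allow = int(not any(m))                     # NOR gate: control signal A
    cmp_ = [int(h == byte) for h in history]    # comparison unit outputs
    # AND gate x: comparison AND (A OR m(x-1)); index -1 wraps to the last
    # location, giving the "circular" arrangement.
    return [cmp_[x] & (allow | m[x - 1]) for x in range(n)]


history = "ABCDE"          # assumed buffer contents
m = [0] * len(history)
trace = []
for byte in "ABCA":        # assumed input sequence
    m = step_conventional(history, m, byte)
    trace.append(m)
```

In this model the match signal moves from location 0 to location 2 over the first three bytes. The final A is compared while A is still at 0, so no new match is registered on that cycle.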
The operation of the circuit shown in
Referring to
Because the match signals m(0) to m(n) are no longer all at zero, the output of the NOR gate 115 (i.e. the control signal A) goes to logic 0. This prevents any new matches from starting.
It can be seen that the match signal bus (the m-bus) has identified a 3-byte sequence, starting at location 0 and ending at location 2.
Referring to
Although input A matches at address 0, because the control signal A is at 0 and there is no incoming match from below, the match signal m(0) does not become set.
It can therefore be seen that the arrangement of
A history buffer 403 is shown as having a plurality of memory locations 4050 to 405n. In the example, the history buffer 403 is illustrated as having the values A, B, C, D and E in memory locations 4050 to 4054. A plurality of comparison units 4070 to 407n each determine whether or not a value of an input sequence received on an input line d2m matches a value of a corresponding memory location 4050 to 405n. The apparatus 401 comprises a plurality of AND gates 4090 to 409n, each AND gate 4090 to 409n coupled to receive the output of a corresponding comparison unit 4070 to 407n on a first input. Each AND gate 4090 to 409n is also coupled to receive a sequence signal s(x−1) of a preceding memory location on its second input. The first AND gate 4090 will have its second input coupled to the sequence signal s(n) of the last memory location 405n in the history buffer 403, thus forming a “circular” arrangement. For example, for a 1024 element history buffer, sequence signal s(1023) will be coupled to the second input of AND gate 4090, the sequence signal s(n) thereby being the effective “preceding” sequence signal s(x−1).
The apparatus 401 further comprises a plurality of multiplexer units 4110 to 411n. Each multiplexer unit 4110 to 411n is coupled to receive the output of a corresponding comparison unit 4070 to 407n on a first input, and the output of a corresponding AND gate 4090 to 409n on a second input, the outputs of the AND gates 4090 to 409n being the match signals m(0) to m(n). Each multiplexer unit 4110 to 411n is controlled by a control signal A, and in the example passes the output of the corresponding AND gate 4090 to 409n, i.e. the match signal m(0) to m(n), when the control signal A is at 0, and passes the output of the corresponding comparison unit 4070 to 407n when the control signal A is at 1.
The apparatus 401 also comprises a plurality of delay units 4130 to 413n (for example D-type Flip Flops). Each delay unit 4130 to 413n is coupled to receive the output of a corresponding multiplexing unit 4110 to 411n, and provide a corresponding sequence signal s(0) to s(n).
A NOR gate 415 receives the plurality of match signals m(0) to m(n) and generates the control signal A (i.e. which controls the multiplexer units 4110 to 411n).
If there are no matches at all, then the match signal bus (i.e. the m-bus) will be at 0 (i.e. all match signals m(0) to m(n) are at 0), which means that the NOR gate 415 sets the control signal A to 1. This condition allows any match to become registered as a sequence. However, once a match sequence starts, then the control signal A becomes 0, and as a consequence no new match sequences can start. Once the end of a match sequence is reached, then the control signal A will become 1 again.
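The behaviour of this arrangement can be sketched in software as follows. This is a simplified model assuming one input byte per clock cycle; the function name step_sequence and the input bytes are illustrative assumptions, while the history contents A to E follow the example above.

```python
def step_sequence(history, s, byte):
    """One clock cycle: return (next sequence signals, match signals, A)."""
    n = len(history)
    cmp_ = [int(h == byte) for h in history]      # comparison unit outputs
    # AND gate x: comparison AND s(x-1); index -1 wraps to the last location.
    m = [cmp_[x] & s[x - 1] for x in range(n)]
    allow = int(not any(m))                       # NOR gate: control signal A
    # Multiplexer x: pass the raw comparison when A is 1, else the match signal.
    s_next = cmp_ if allow else m
    return s_next, m, allow


history = "ABCDE"
s = [0] * len(history)
trace = []
for byte in "ABCA":        # assumed input sequence
    s, m, allow = step_sequence(history, s, byte)
    trace.append((s, allow))
```

In this model the final A ends the running sequence, A returns to 1 in the same cycle, and the match at location 0 is registered on the sequence bus straight away.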
Referring to
It is noted that the sequence bus (i.e. comprising sequence signals s(0) to s(n)) denotes match sequences:
It is noted that the arrangement shown in
Such an arrangement has the disadvantage that a large fan-in and a large fan-out must both be completed, in a cascaded arrangement, within one clock cycle.
As mentioned in the background section, while a fan-in and fan-out of this type might be acceptable in a history buffer comprising 1,024 bytes, the fan-in and fan-out becomes more of an issue for a larger history buffer, for example a history buffer comprising 16,384 bytes.
During operation each incoming byte is written into the history buffer 601 in sequential address locations, i.e. sequentially from memory location 6050 to memory location 605n (for example 0 . . . 16383, 0 . . . 16383, etc). Each contiguous segment 6030 to 603P is configured to look for matches independently and in parallel with other contiguous segments 6030 to 603P. As such, the matching operation involving a fan-in and fan-out is carried out within each of the smaller contiguous segment 6030 to 603P, rather than across the entire history buffer 601. The number of memory locations handled by each contiguous segment 6030 to 603P can therefore be chosen to provide a desired fan-in or fan-out. For example, the configuration of the contiguous segments 6030 to 603P can be chosen such that a maximum fan-in or fan-out is not exceeded.
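The choice of segment size can be illustrated with a short calculation. This is a sketch only: the function names segment_layout and locate are hypothetical, and the limit of 1024 locations per segment is an assumed figure used for illustration.

```python
def segment_layout(total_locations, max_fan_in):
    """Return (segment_count, segment_size) so no segment exceeds max_fan_in."""
    count = -(-total_locations // max_fan_in)   # ceiling division
    size = -(-total_locations // count)
    return count, size


def locate(address, segment_size):
    """Map a history buffer address to (segment index, offset within segment)."""
    return divmod(address, segment_size)


count, size = segment_layout(16384, 1024)
segment, offset = locate(5000, size)
```

With these figures a 16384-byte history buffer is split into 16 segments of 1024 locations each, and address 5000 falls in segment 4 at offset 904.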
When matches are found, each contiguous segment 6030 to 603P generates respective copy pointer signals 6070 to 607P. Control logic 609 is coupled to receive the copy pointer signals 6070 to 607P from the plurality of contiguous segments 6030 to 603P. The control logic 609 is configured to select and prioritise the copy pointer signals 6070 to 607P, as will be described in greater detail below. The control logic 609 is also adapted to truncate overlapping match lengths from parallel contiguous segments 6030 to 603P as required.
Although match sequences are truncated at segment boundaries, the inventors have found that in practice a large percentage of match sequences (typically 97%) have lengths below 16 bytes. The distribution of match sequences is massively skewed towards short match lengths with a “long tail”. As a consequence, the probability of a match sequence hitting a segment boundary is quite low, and thus only has a negligible impact on the compression ratio. For example, for a 16384 byte history buffer, when using the standard Calgary Corpus data set the compression ratio reduces from 2.549 to 2.546, which is a negligible effect.
Referring to
According to another implementation, the control logic may be configured to detect the first match, or multiple matches if more than one match sequence commences at the same time, and then continue selecting whichever one of these matches lasts the longest. When the match ends (or the longest match ends in the case of multiple matches), the control logic 609 may be configured to check whether there is another match (or matches) in progress, or commencing, on another of the contiguous segments 6030 to 603P, i.e. by monitoring the sequence in progress signals 7010 to 701P. The control logic can be configured to continue selecting such a match or matches. In the event that multiple matches are already in sequence and/or commencing, the control logic 609 can be configured to select the longest match until that match ends. This process can be repeated until such time as there are no longer any matches in progress, or commencing, on another contiguous segment when the current match being monitored ends. For example, in the example of
Once the control logic 609 has determined that the longest match has ended (i.e. sequence in progress signal 7010 completed at time T3), the control logic 609 checks whether any other matches are currently in progress on the other contiguous segments, and in the example determines that sequence in progress signals 7011, 7012 and 7013 meet this criterion. The control logic monitors each of these matches and continues with the longest one of these match signals, i.e. T3 to T5 of sequence in progress signal 7011 in
This sequence of operations can continue at time T5, whereby the control logic 609 determines whether any other matches are currently in progress in any one or more of the plurality of other contiguous segments 6030 to 603P, until such time as no other matches in progress are found. In the example of
Since the sequence in progress signal 7010 was already in progress (i.e. from time T1 to T3) when the sequence in progress signal 7011 matches from time T2 to T5, the copy pointer for sequence in progress signal 7011 is truncated to T3-T5.
As such, the resultant copy pointers for the example in
b shows the method steps performed by the control logic 609 in further detail.
In step 801 the control logic 609 is configured to detect a first match or matches to occur in the plurality of contiguous segments, and store a copy pointer (START) for each contiguous segment having a match. In step 802 the control logic 609 is configured to monitor whether the match has ended. In the case where multiple matches were detected in step 801, this involves monitoring when the longest of such matches has ended. When the match or longest match has ended, in step 803 the control logic is configured to use a copy pointer (END) for that particular contiguous segment. The control logic 609 then checks in step 804 whether any of the other of the plurality of contiguous segments have a match in progress. If not, the copy pointer (START) and copy pointer (END) already obtained are used as the copy pointer signals.
However, if it is determined in step 804 that one or more of the other contiguous segments does have a match sequence in progress (or commencing), then that point in time is used as a copy pointer (START) signal for each of such matches, step 806. As such, any contiguous segment already having a match in progress will have its original copy pointer (START) signal truncated to the new copy pointer (START) signal determined in step 806. In the example of
The procedure above is then repeated in steps 802, 803, 804 and 806 until such time as there are no other matches in progress or commencing when a particular match comes to an end in step 804, in which case processing moves to step 805 where the already gathered copy pointer (START) and copy pointer (END) signals for each relevant contiguous segment are used as the copy pointer signals.
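Steps 801 to 806 can be sketched in software as follows. This is an illustrative model rather than the control logic itself. It assumes that each match is given as a (segment, start, end) tuple with integer times and an exclusive end, and that a single chain of matches is followed from the earliest start. The function name select_copy_pointers is hypothetical.

```python
def select_copy_pointers(matches):
    """Follow steps 801-806 for one chain; return [(segment, START, END), ...]."""
    if not matches:
        return []
    # Step 801: detect the first match, or all matches commencing together.
    first = min(start for _, start, _ in matches)
    candidates = [m for m in matches if m[1] == first]
    pointers = []
    while candidates:
        # Steps 802 and 803: follow the longest candidate until it ends.
        seg, start, end = max(candidates, key=lambda m: m[2])
        pointers.append((seg, start, end))
        # Steps 804 and 806: any match in progress (or commencing) on another
        # segment at that time has its START truncated to that time.
        candidates = [(s, end, e) for s, b, e in matches
                      if s != seg and b <= end < e]
    # Step 805: no further matches in progress, so the gathered pointers are used.
    return pointers


# Segment 0 matches from T1 to T3 and segment 1 from T2 to T5.
result = select_copy_pointers([(0, 1, 3), (1, 2, 5)])
```

Here result contains the copy pointer for segment 0 unchanged, followed by the copy pointer for segment 1 with its START truncated from T2 to T3.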
c shows a state diagram corresponding to the operations described above. The reading of the state diagram will be clear to a person skilled in the art, using the syntax and actions explained below:
a shows an example of the combinatorial logic that may be used to generate a match flag MFn for one memory location in the history buffer. It is noted that the implementations described herein are not limited to the combinatorial logic described in
The combinatorial logic shown in
The multiplexer 903n is coupled to receive the following input signals:
The multiplexer 903n is configured to pass one of the input signals according to the status of control signals “sel1” and “sel0”. For example, the multiplexer 903n may be configured to pass the input signals according to the truth table shown below:
The output of the multiplexer is latched using a latch 905n, the output of which provides the match flag MFn.
b shows an example of the combinatorial logic that may be used to generate the control signals sel1 and sel0 for a particular contiguous segment. Such circuitry exists for each segment, and it is therefore noted that the 1024 inputs to the OR gate 909 are merely an example corresponding to how a history buffer having 16384 bytes, for example, may be split into 16 segments. It is noted that the implementations described herein are not limited to the combinatorial logic described in
Referring to
The output of the AND gate 917 generates the control signal sel1, and the output of AND gate 919 generates the control signal sel0.
The implementations described above have the advantage of enabling higher transfer rates with improved compression ratios to be achieved using cost effective ASIC hardware.
The implementations can also be applied retrospectively to existing products for cost saving or improved speed performance.
Although the implementations have been described using certain logic gates to provide a desired function, it will be appreciated that other logic gates may be used which are configured differently, while providing the same logic function, without departing from the scope of the appended claims.
As such, it should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word “comprising” does not exclude the presence of elements or steps other than those listed in a claim, “a” or “an” does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.