Claims
- 1. A data compression technique comprising:
from a first position in an input stream of data characters, searching in a preceding portion of said input stream of data characters from said first position for a sequence of a plurality of data characters that matches a sequence of a plurality of data characters at said first position; and at said first position, replacing the sequence of a plurality of data characters for which a matching sequence of a plurality of data characters was found in said preceding portion of said input stream with a reference to said matching sequence of a plurality of data characters in said preceding portion of said input stream; wherein said reference comprises an offset from said first position to a position in said preceding portion of said input stream at which said matching sequence of a plurality of data characters is located and a size of said matching sequence.
- 2. The method of claim 1 wherein said preceding portion is defined by a fixed number of characters relative to said first position.
- 3. The method of claim 1 further comprising the step of:
encoding those characters in said input stream that are not replaced by substitution codes.
- 4. The method of claim 1 further comprising the step of:
Huffman encoding those characters in said input stream that are not replaced by substitution codes.
- 5. The method of claim 1 further comprising the steps of:
advancing through said input stream to identify each sequence of a plurality of data characters that matches a sequence of a plurality of data characters in a preceding portion of said input stream; and at each location at which a sequence of a plurality of data characters is identified that matches a sequence of a plurality of data characters in a preceding portion of said input stream, replacing said sequence of a plurality of data characters for which a matching sequence of a plurality of data characters was found with a reference to said matching sequence of a plurality of data characters in said preceding portion of said input stream, wherein each reference includes an offset from the location of the reference into the preceding input data stream and an indication of a number of characters replaced.
- 6. The method of claim 5 further comprising the step of:
Huffman encoding each offset.
- 7. The method of claim 5 further comprising the step of:
Huffman encoding each indication of a number of characters replaced.
- 8. A method of compressing a geographic database comprising:
advancing through a portion of the geographic database; identifying matching substrings of data in said portion; and when a substring of data is encountered that matches a previous substring in said portion, replacing the substring with a substitution code, wherein said substitution code comprises a backwards offset from said position at which said substitution code replaces said substring to the previous substring.
- 9. The method of claim 8 wherein said substitution code further comprises a data component that indicates the length of the matching substring.
- 10. The method of claim 8 further comprising the step of:
forming a compressed version of said geographic database that includes substitution codes replacing substrings of data for which matching substrings were identified.
- 11. The method of claim 10 further comprising:
prior to forming a compressed version of the geographic database, inserting literal length codes in at least said portion of said geographic database, wherein each of said literal length codes indicates a number of immediately following consecutive characters that are not substitution codes.
- 12. The method of claim 11 further comprising the step of:
replacing literal length codes with Huffman encoded representations thereof.
- 13. The method of claim 8 wherein said geographic database is comprised of data records that represent physical features in a geographic region and wherein the method further comprises the step of:
separating said geographic database into a plurality of parcels wherein each parcel includes a plurality of data records, wherein the pluralities of data records in the plurality of parcels together comprise the geographic database, and wherein each of said plurality of parcels comprises a separate portion of the geographic database which is examined to find matching substrings.
- 14. The method of claim 13 wherein a previous matching substring is constrained to occur within the same parcel as the substitution code that refers thereto.
- 15. The method of claim 8 further comprising the step of:
replacing literal characters in said portion with Huffman codes.
- 16. The method of claim 8 further comprising the step of:
replacing backwards offsets with Huffman codes.
- 17. The method of claim 8 further comprising the step of:
replacing data components that indicate lengths of matching substrings with Huffman codes.
- 18. The method of claim 8 further comprising the step of:
encoding characters in said portion of the geographic database with compressed representations thereof; and storing an index in another portion of the database apart from the portion in which substrings were replaced by substitution codes, wherein said index associates each of said encoded characters with said compressed representations.
- 19. A method of forming a geographic database comprising:
separating a first plurality of data records into a plurality of groupings of data records, wherein each grouping includes a separate plurality of data records that are accessed together as a group when using the geographic database; with respect to each of said groupings, identifying matching substrings of data within said grouping; and when a substring of data is encountered at a position in a grouping that matches a previous substring in said grouping, replacing the substring with a substitution code.
- 20. The method of claim 19 wherein each substitution code comprises a backwards offset from the position of said substitution code to said previous matching substring.
- 21. The method of claim 19 further comprising:
prior to separating at least the first plurality of data records into a plurality of groupings, forming separate types of data records, wherein each type includes a separate plurality of data records; then, with respect to each type, separating the plurality of data records within the type into a plurality of groupings, each of which includes a separate plurality of data records of the given type which are accessed together as a group; and then further within each of said groupings, identifying matching substrings of data and replacing the substring with a substitution code.
- 22. The method of claim 19 further comprising the step of:
determining character occurrence frequencies within at least part of said geographic database; forming an index that associates characters with coded representations thereof based upon said occurrence frequencies; and replacing those characters within said at least part of said geographic database with said coded representations.
- 23. The method of claim 22 further comprising the step of:
storing said index in said geographic database.
- 24. The method of claim 22 further comprising the step of:
storing said index in a global portion of said geographic database.
- 25. A compression format for storing a collection of data on a medium, wherein said data are required to be decompressed to an uncompressed form in order to use the data for performing functions, the compression format comprising:
an arrangement of said collection of data wherein said collection is separated into a plurality of parcels each of which includes a plurality of data items which form at least part of said collection, wherein said plurality of data items in each parcel are accessed together as a group in a given sequence; and a plurality of substitution codes included among said arrangement of a plurality of data items, each of said plurality of substitution codes including an offset from a position in said arrangement of a plurality of data items at which said substitution code is located into a position sequentially backwards therefrom.
- 26. The compression format of claim 25 wherein each of said plurality of substitution codes also includes a substring length.
- 27. The compression format of claim 25 wherein the offset of each of said plurality of substitution codes is constrained to reference a position sequentially backwards within the same parcel as the substitution code including said offset.
- 28. A method for decompressing a compressed data stream comprising:
starting at a first end of the compressed data stream, advancing through a portion of the data stream until encountering a substitution code that indicates a substitution substring length and an offset backwards into said data stream toward said first end; and forming an uncompressed output from said compressed data stream, wherein said uncompressed output comprises the portion of the data stream up to said substitution code and a substitution substring appended thereto, wherein said substitution substring corresponds to that part of said portion of said substitution substring length located at said offset from said substitution code within said portion.
- 29. The method of claim 28 further comprising:
after encountering said substitution code, continuing to advance through the data stream to a second end thereof, wherein said second end is opposite from said first end; and during said step of continuing to advance, as each substitution code is encountered, wherein each substitution code indicates a substitution substring length and an offset backwards into said data stream toward said first end, continuing to form the uncompressed output from said compressed data stream, wherein said uncompressed output comprises the portion of the data stream up to each substitution code and a substitution substring appended thereto, wherein each said substitution substring corresponds to that part of said portion of said substitution substring length located at said offset from said substitution code within said portion.
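The substitution scheme recited in claims 1, 5, 13-14, and 28-29 can be illustrated with a minimal Python sketch. It is not the patented implementation. The token representation, the names `compress`, `decompress`, and `compress_parcels`, the `MIN_MATCH` threshold, the default window size, and the greedy longest-match search are all assumptions made for illustration. The sketch also assumes that each offset is measured in uncompressed characters and that a match lies entirely within the preceding portion.

```python
from typing import List, Tuple, Union

# Hypothetical token form: a literal byte value (int) or an
# (offset, length) substitution code pointing backwards.
Token = Union[int, Tuple[int, int]]

MIN_MATCH = 3  # assumed: shortest match worth replacing with a reference


def compress(data: bytes, window: int = 4096) -> List[Token]:
    """Greedily replace repeated substrings with backward references."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(data):
        best_len, best_off = 0, 0
        # Search only a fixed-size preceding portion (compare claim 2).
        for cand in range(max(0, pos - window), pos):
            length = 0
            # Keep the match entirely inside the preceding portion.
            while (pos + length < len(data)
                   and cand + length < pos
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - cand
        if best_len >= MIN_MATCH:
            tokens.append((best_off, best_len))  # offset and size
            pos += best_len
        else:
            tokens.append(data[pos])  # literal character
            pos += 1
    return tokens


def decompress(tokens: List[Token]) -> bytes:
    """Rebuild the output by copying from already-decoded data."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            offset, length = tok
            start = len(out) - offset
            out.extend(out[start:start + length])
        else:
            out.append(tok)
    return bytes(out)


def compress_parcels(parcels: List[bytes], window: int = 4096) -> List[List[Token]]:
    """Compress each parcel on its own, so no reference crosses a parcel."""
    return [compress(parcel, window) for parcel in parcels]
```

Because each parcel is compressed independently in this sketch, any single parcel can be decompressed without reading the others, which is one plausible reading of the constraint in claims 14 and 27.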
REFERENCE TO RELATED APPLICATION
[0001] The present application is a divisional of Ser. No. 09/153,996, filed Sep. 17, 1998, now U.S. Pat. No. 6,393,149 and Ser. No. 10/104,947 filed Mar. 22, 2002, the entire disclosures of which are incorporated herein by reference.
Divisions (2)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 09153996 | Sep 1998 | US |
| Child | 10464717 | Jun 2003 | US |
| Parent | 10104947 | Mar 2002 | US |
| Child | 10464717 | Jun 2003 | US |