This invention addresses the need to transport high bit-rate text to multiple users over wired and wireless links. Specifically, this disclosure describes a dynamic pattern elimination compression method to eliminate redundant patterns, the content of which is not known a priori.
Any text-based protocol has predefined keywords with special purposes that the communicating parties agree upon. A simple way to reduce the size of messages is to replace those long, predefined keywords with shorter forms. However, a message may still contain text patterns that are repeated or redundant.
Existing text-based compression technologies fall into two groups: dictionary-based techniques and techniques that apply a standard compression algorithm such as Huffman coding. Dictionary-based techniques usually use static dictionaries that are created before a message is transmitted and/or dynamic dictionaries that are included in the message. Such techniques include U.S. Pat. Nos. 6,976,081, 5,999,949, 7,412,541, 6,807,173, and 6,883,035. Replacing longer words with a shorter form is a simple example of using a static dictionary at both the compressor and the decompressor. This disclosure proposes a method, Dynamic Pattern Elimination, to eliminate redundant patterns whose content is not known a priori. The proposed method identifies the redundant patterns on the fly and does not require any dictionary.
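As an illustration of the static-dictionary case only (it is not part of the proposed method), the following Python sketch shortens SIP header names using the compact forms that RFC 3261 defines for them. The choice of dictionary entries, the function names `shorten` and `expand`, and the header-matching regular expression are assumptions made for this example; the sketch also ignores the fact that SIP header names are case-insensitive.

```python
import re

# Static dictionary known in advance to both compressor and decompressor.
# The pairs are SIP compact header forms defined in RFC 3261.
COMPACT = {
    "Via": "v", "From": "f", "To": "t", "Contact": "m",
    "Call-ID": "i", "Content-Length": "l", "Content-Type": "c",
}
EXPAND = {short: full for full, short in COMPACT.items()}

# A header name is the token before ":" at the start of a line.
HEADER = re.compile(r"(?m)^([A-Za-z-]+):")


def shorten(msg: str) -> str:
    """Compressor side: replace known header names with their compact forms."""
    return HEADER.sub(lambda m: COMPACT.get(m.group(1), m.group(1)) + ":", msg)


def expand(msg: str) -> str:
    """Decompressor side: map compact forms back to full header names."""
    return HEADER.sub(lambda m: EXPAND.get(m.group(1), m.group(1)) + ":", msg)


if __name__ == "__main__":
    msg = ("INVITE sip:bob@10.0.0.2 SIP/2.0\r\n"
           "Via: SIP/2.0/UDP 10.0.0.1:5060\r\n"
           "To: <sip:bob@10.0.0.2>\r\n"
           "Max-Forwards: 70\r\n\r\n")
    print(repr(shorten(msg)))
    assert expand(shorten(msg)) == msg
```

Because both sides must hold the same table, this approach can only shorten what the table already lists; the repeated addresses and user names in the example are left untouched, which is the redundancy that Dynamic Pattern Elimination targets.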
For a fuller understanding of the nature and objects of the invention, reference should be made to the following detailed description taken in connection with the accompanying drawings.
For a fuller understanding of the nature and objects of the invention, reference should be made to the accompanying drawings, in which:
This disclosure describes a method that achieves a higher compression ratio than simply replacing known long patterns with shorter forms. The preferred embodiment is designed specifically for wireless environments, because wireless links are prone to errors. A smaller message has a higher probability of successful transmission over the wireless link and also incurs less latency.
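As a rough illustration of why message size matters, assume (a simplification not made in the text) that bit errors on the link are independent with bit error rate p, and that a message of n bits is lost if any one of its bits is corrupted. The probability of error-free delivery then falls exponentially with message length. The numbers below use a hypothetical p = 10^-5 and hypothetical message sizes of 1000 bytes (uncompressed) and 500 bytes (compressed).

```latex
\begin{aligned}
P_{\mathrm{success}}(n) &= (1 - p)^{n} \approx e^{-pn} \\
P_{\mathrm{success}}(8000) &\approx e^{-0.08} \approx 0.923 \quad (p = 10^{-5},\ 1000\ \text{bytes}) \\
P_{\mathrm{success}}(4000) &\approx e^{-0.04} \approx 0.961 \quad (p = 10^{-5},\ 500\ \text{bytes})
\end{aligned}
```

In this example, halving the message roughly halves the loss probability, from about 7.7% to about 3.9%. Real wireless links add channel coding and retransmission, so the figures only indicate the trend.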
The basic idea is to identify duplicate patterns that cannot be known beforehand. However, such patterns and their locations can be predicted. Therefore, regular expressions are used in a first stage to identify candidate patterns, and duplicate patterns are removed in the next stage. In this disclosure, the SIP signaling protocol is used as the preferred embodiment to illustrate the compression method.
In order to remove duplicate dynamic patterns, one first needs to identify them. This is done by inserting a marker before a candidate pattern so that it can be analyzed later. The marker characters are chosen so that they do not appear in normal SIP messages. The examples and notations shown in this document illustrate the preferred embodiment only; other notations can easily be substituted by those skilled in the art. After analyzing the characteristics of SIP messages, the inventors found that IP address and user name patterns are particularly likely to be repeated at several points within a message. For example, the following regular expressions identify IP addresses and user names and insert markers around them:
IP address: s/([: ;\"@])([0-9\.]+)([: ;\">]|\r)/\1^\2~\3/g
User name: s/([:\"])([a-zA-Z0-9\.]+)([\"@])/\1^\2~\3/g
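The two rules can be tried directly in Python. The sketch below transcribes the sed-style expressions into Python's `re` module; the function name `mark_candidates` and the order in which the rules are applied (IP addresses first, then user names) are assumptions made for illustration. As the text requires, it assumes that `^` and `~` never occur in an unmarked message.

```python
import re

# Stage 1: the two marking rules, transcribed to Python.
# Each match is rewritten as \1^\2~\3, so "^" opens and "~" closes a candidate.
IP_ADDRESS = re.compile(r'([: ;"@])([0-9.]+)([: ;">]|\r)')
USER_NAME = re.compile(r'([:"])([a-zA-Z0-9.]+)(["@])')


def mark_candidates(msg: str) -> str:
    """Insert ^...~ markers around candidate IP addresses and user names.

    Assumes "^" and "~" never occur in the unmarked message.
    """
    for rule in (IP_ADDRESS, USER_NAME):  # IP rule first, then user names
        msg = rule.sub(r"\1^\2~\3", msg)
    return msg


if __name__ == "__main__":
    print(repr(mark_candidates('From: "alice" <sip:alice@10.0.0.1>\r\n')))
```

Applied to the header line `From: "alice" <sip:alice@10.0.0.1>`, the sketch marks the display name, the user part, and the host address, giving `From: "^alice~" <sip:^alice~@^10.0.0.1~>`. The IP-address rule also matches any bare digit string between delimiters (for example the sequence number in `CSeq: 1 INVITE`), so later stages must tolerate such extra candidates.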
Note that rules for identifying additional kinds of dynamic patterns can be added later, as discussed below.
After identifying the candidate dynamic patterns, one checks to see if there are any duplicate occurrences within the entire message using the following steps.
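One possible way to carry out the duplicate check and substitution is sketched below in Python; it is an illustration under stated assumptions, not necessarily the exact procedure of this disclosure. The sketch assumes that the first repeated occurrence of a pattern keeps its `^pattern~` markers and implicitly defines a variable numbered in order of appearance, that later occurrences are replaced by a hypothetical variable token (a number enclosed in backticks), and that markers are stripped from patterns that occur only once or are shorter than a hypothetical threshold `MIN_LEN`, since replacing those would not save space. The names `eliminate_duplicates`, `CANDIDATE`, and `MIN_LEN` are illustrative.

```python
import re
from collections import Counter

# A marked candidate as produced by stage 1: ^pattern~
CANDIDATE = re.compile(r"\^([^~^]*)~")
MIN_LEN = 4  # hypothetical: shorter patterns would not shrink when replaced


def eliminate_duplicates(marked: str) -> str:
    """Stage 2 (sketch): keep ^pattern~ on the first occurrence of a repeated
    pattern, replace later occurrences with a hypothetical `n` token, and
    strip markers from patterns that are not worth replacing.

    Assumes the backtick character never occurs in the message.
    """
    counts = Counter(CANDIDATE.findall(marked))
    variables = {}  # pattern -> variable number, in order of first occurrence

    def replace(m: re.Match) -> str:
        pattern = m.group(1)
        if counts[pattern] < 2 or len(pattern) < MIN_LEN:
            return pattern                    # unique or too short: unmark
        if pattern not in variables:
            variables[pattern] = len(variables)
            return m.group(0)                 # first occurrence defines variable
        return f"`{variables[pattern]}`"      # later occurrence: reference it

    return CANDIDATE.sub(replace, marked)


if __name__ == "__main__":
    marked = ('From: "^alice~" <sip:^alice~@^10.0.0.1~>\r\n'
              'Contact: <sip:^alice~@^10.0.0.1~:5060>\r\n')
    print(repr(eliminate_duplicates(marked)))
    # 'From: "^alice~" <sip:`0`@^10.0.0.1~>\r\nContact: <sip:`0`@`1`:5060>\r\n'
```

Because variable numbers follow the order in which definitions appear, no dictionary has to be transmitted: the receiver can rebuild the same numbering by reading the message from left to right.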
At the decompressor, one only needs to find the markers and restore the pattern corresponding to each marker. A special marker, ^ in the example above, indicates the beginning of a pattern and of the corresponding variable. In this way the decompressor can reconstruct the mapping between variables and patterns: whenever it finds a variable in the message, it replaces the variable with the corresponding pattern. Because the purpose of a marker is only to identify possible duplicate patterns, rules identifying further dynamic patterns can be added later without breaking compatibility: the additional markers are inserted by the compressor, and the decompressor can still decompress a message that contains them.
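A matching decompressor sketch in Python needs no dictionary. It assumes hypothetical conventions (not specified in the text): the k-th `^pattern~` definition encountered defines variable k, and a reference is written as a number enclosed in backticks. The names `decompress` and `TOKEN` are illustrative. Because the decompressor only looks at markers and never at the rules that produced them, a marker inserted by a newly added rule is restored in exactly the same way, which is the compatibility property described above.

```python
import re

# Either a definition ^pattern~ or a hypothetical variable reference `n`.
TOKEN = re.compile(r"\^([^~^]*)~|`(\d+)`")


def decompress(data: str) -> str:
    """Rebuild the variable table on the fly: the k-th ^pattern~ seen defines
    variable k, and each `n` is replaced by the pattern of variable n."""
    table = []

    def restore(m: re.Match) -> str:
        if m.group(1) is not None:      # definition: record it, drop markers
            table.append(m.group(1))
            return m.group(1)
        return table[int(m.group(2))]   # reference to an earlier definition

    return TOKEN.sub(restore, data)


if __name__ == "__main__":
    data = ('From: "^alice~" <sip:`0`@^10.0.0.1~>\r\n'
            'Contact: <sip:`0`@`1`:5060>\r\n')
    print(repr(decompress(data)))
    # 'From: "alice" <sip:alice@10.0.0.1>\r\nContact: <sip:alice@10.0.0.1:5060>\r\n'
```

`re.sub` visits matches from left to right, so every definition is recorded before any later reference to it is resolved.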
This application discloses a general approach to eliminating duplicate patterns in text-based protocols. Regular expressions are used to identify candidate patterns to be removed; the message is then examined for special markers and variables in order to compress and decompress it. The advantages of this method include:
Since certain changes may be made in the above-described dynamic compression method for text-based signaling protocols without departing from the scope of the invention herein involved, it is intended that all matter contained in the description thereof, or shown in the accompanying figures, shall be interpreted as illustrative and not in a limiting sense.
The present application claims the benefit of previously filed, co-pending Provisional Patent Application Ser. No. 61/269,951.
Number | Date | Country
---|---|---
61269951 | Jul 2009 | US