The present invention relates to message compression; more specifically, message compression and sharing according to a derived compression dictionary.
With the increase in Internet use over recent years, Internet and network bandwidth capabilities are being increasingly taxed. Within this increased usage, network users are requesting and receiving more and more data. Further, businesses are conducting more and more business over the Internet and other large scale networks. This growth in use has come at the cost of communication and processing efficiency, which has increased network latency.
One method of addressing the resultant increase in network latency has been to compress messages between nodes on networks. For example, gzip (Lempel-Ziv) compression, described in IETF RFC 1952, is one such method. HTTP 1.1, defined in IETF RFC 2616, specifies content encodings based on gzip compression. Many web browsers are capable of accepting compressed HTTP content, and some web servers are capable of delivering either statically or dynamically compressed HTTP content.
However, compressed messages, such as those compressed using gzip compression, include a compression dictionary within the message stream that is necessary for message decompression. This included compression dictionary adds to the size of the message, which reduces the effectiveness of compression in reducing network traffic, slows message transport across networks, and increases the use of processor bandwidth.
The following drawings are various representations of embodiments of the invention. Other embodiments are within the scope of the claims herein.
The following description and the drawings illustrate specific embodiments of the invention sufficiently to enable those skilled in the art to practice it. Other embodiments may incorporate structural, logical, electrical, process, and other changes. Examples merely typify possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in or substituted for those of others. The scope of the invention encompasses the full ambit of the claims and all available equivalents. The following description is, therefore, not to be taken in a limited sense, and the scope of the present invention is defined by the appended claims.
The leading digit(s) of reference numbers appearing in the Figures generally correspond to the Figure number in which that component is first introduced, such that the same reference number is used throughout to refer to an identical component which appears in multiple Figures. Signals and connections may be referred to by the same reference number or label, and the actual meaning will be clear from its use in the context of the description.
The functions described herein are implemented in software in one embodiment, where the software comprises computer executable instructions stored on computer readable media such as memory or other types of storage devices. The term “computer readable media” is also used to represent carrier waves on which the software is transmitted. Further, such functions correspond to modules, which are software, hardware, firmware, or any combination thereof. Multiple functions are performed in one or more modules as desired, and the embodiments described are merely examples.
One or more of the servers 102 hold a compression dictionary 104, which is available for download to the other servers 102 and workstation clients 106 connected 124 to the network 122. Compression dictionary 104 is used by operable software 105 on both servers 102 and client workstations 106 for compressing and decompressing a message for communication over the network 122. Operable software 105 has instructions for encoding and decoding messages according to compression dictionary 104, wherein the compression dictionary 104 maps character segments to character codes. In some embodiments of system 100, the character segments are Extensible Markup Language (XML) tags. In one such embodiment, the character codes are single characters that are mapped to commonly occurring XML tags.
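As an illustration only, the mapping of XML tags to single-character codes might be sketched as follows. The tag names, the function names `compress` and `decompress`, and the use of Unicode private-use code points as character codes are assumptions made for this example and are not specified by the description; the sketch also assumes the chosen codes never occur in the message text itself.

```python
# Hypothetical dictionary: XML tags mapped to single-character codes.
# Private-use code points are an assumption of this sketch, chosen so the
# codes are unlikely to collide with ordinary message text.
COMPRESSION_DICTIONARY = {
    "<order>": "\ue000",
    "</order>": "\ue001",
    "<item>": "\ue002",
    "</item>": "\ue003",
}


def compress(message, dictionary):
    """Replace every dictionary segment in the message with its code."""
    # Longest segments first, so a short segment cannot pre-empt a longer
    # segment that happens to contain it.
    for segment in sorted(dictionary, key=len, reverse=True):
        message = message.replace(segment, dictionary[segment])
    return message


def decompress(message, dictionary):
    """Replace every code in the message with its original segment."""
    for segment, code in dictionary.items():
        message = message.replace(code, segment)
    return message


original = "<order><item>widget</item><item>gear</item></order>"
packed = compress(original, COMPRESSION_DICTIONARY)
restored = decompress(packed, COMPRESSION_DICTIONARY)
```

Because both ends hold the same dictionary, only the coded message needs to cross the network; in this example the 51-character message shrinks to 16 characters and decodes back to the original.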
In some embodiments, system 100 further comprises software 101 for the generation of a compression dictionary on one or more of the servers 102 and/or clients 106. In some embodiments, the software 101 for generating a compression dictionary 104 comprises instructions for identifying and extracting character segments of one or more files, wherein the character segments appear one or more times in the one or more files, and instructions for creating a compression dictionary 104 based on extracted character segments from the one or more files, wherein the compression dictionary 104 maps the extracted character segments to a character code. In one such embodiment of system 100, the character segments are Extensible Markup Language (XML) tags. In one such embodiment, the character codes are single characters that are mapped to commonly occurring XML tags.
In some embodiments, system 100 may be implemented with servers 102 utilizing one of many available operating systems. Servers 102 may also include, for example, machine variants such as personal computers, handheld personal digital assistants, RISC processor computers, MIP single and multiprocessor class computers, and other personal, workgroup, and enterprise class servers. Further, servers 102 may also be implemented with relational database management systems and application servers. Other servers 102 may be file servers.
Client workstations 106 within embodiments of system 100 may include personal computers, computer terminals, handheld devices, mobile phones, household appliances, and wristwatches. Client workstations 106 include software thereon for performing operations in accordance with received messages. For example, a client workstation 106 may include a web browser for displaying web pages.
The network 122 within an embodiment of a system 100 may include a Local Area Network (LAN), Wide Area Network (WAN), or other similar network 145 within network 122. Network 122 may itself be a LAN, WAN, the Internet, or other large scale regional, national, or global network or a combination of several types of networks. Some embodiments of system 100 include a LAN, WAN, or other similar network 145 that utilizes one or more compression dictionaries 150 on servers 152 and clients 155 behind a firewall 160 within the LAN, WAN, or other similar network 145.
In the present embodiment of method 300, the method further comprises searching 310 for XML tags in the file based on character sequences. When an XML tag is found, method 300 determines 312 if the XML tag has been previously identified. If not, method 300 writes 314 the XML tag to the compression dictionary 104 with a character code. Once the XML tag is added to the compression dictionary 104, or if the identified XML tag already existed in the dictionary 104, the method determines 316 if the end of the file has been reached. If not, the method again searches 310 for XML tags and proceeds until the end of the file is reached.
Once the end of the file is reached, the file is removed 318 from memory by performing an operation such as closing the file, and the method determines 320 if there are files remaining to be processed. If there are files remaining to be processed, the next file is read 306 into memory and method 300 proceeds until all files have been processed. Once all of the files have been processed, the compression dictionary 104 is written 322 to a non-volatile memory or storage location so as to preserve the compression dictionary 104.
In further accord with an embodiment of method 300, preparing 308 the files for processing may include, in some embodiments, removing white space from a file, removing certain characters, or changing other attributes or properties of the file or its contents. Preparation 308 of the file is performed in some embodiments as a normalizing technique to make a file ready for further processing according to the specific requirements of a specific embodiment of method 300.
Character sequences include sequences such as “<****>” wherein the asterisks indicate any character between the characters “<” and “>” as frequently used in XML. Other character sequences may be relevant in other embodiments of method 300, and other embodiments of method 300 for generating a compression dictionary for a language or purpose other than XML are readily apparent to one skilled in the art.
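By way of example only, the steps of method 300 (preparing each file, searching for tags by character sequence, checking whether a tag was previously identified, assigning a character code, and preserving the result) might be sketched as follows. The function names, the regular expression used for the “<****>” sequence, the private-use character codes, and the JSON storage format are all assumptions of this sketch rather than details of the described embodiment.

```python
import json
import re

# Assumed rendering of the "<****>" character sequence as a regular expression.
TAG_PATTERN = re.compile(r"<[^<>]+>")
FIRST_CODE = 0xE000  # assumption: private-use code points serve as character codes


def prepare(text):
    """Example normalization: trim the text and drop white space between tags."""
    return re.sub(r">\s+<", "><", text.strip())


def build_dictionary(documents):
    """Scan each document in turn and assign a code to every new tag found."""
    dictionary = {}
    for text in documents:
        for tag in TAG_PATTERN.findall(prepare(text)):
            if tag not in dictionary:  # tag not previously identified
                dictionary[tag] = chr(FIRST_CODE + len(dictionary))
    return dictionary


def save_dictionary(dictionary, path):
    """Preserve the dictionary in non-volatile storage (JSON is an assumption)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(dictionary, handle)


documents = ["<a> <b>x</b> </a>", "<a><c/></a>"]
dictionary = build_dictionary(documents)
```

In this sketch the documents are passed in as strings for brevity; an implementation following method 300 would read each file into memory and close it before moving to the next.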
Another embodiment of a method 400 for the creation of a compression dictionary is shown in
In one embodiment where a request is decompressed directly to DOM, the compression dictionary maps compression entries to DOM elements. Thus, when compressing from DOM, rather than converting DOM elements to XML compression entries, as in some other embodiments, the DOM elements are converted to DOM compression elements that are later decompressed directly to the original DOM elements.
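One possible reading of this embodiment is sketched below, using Python's `xml.etree.ElementTree` as a stand-in for a DOM implementation. The element names, the short codes, and the helper names are hypothetical, and the sketch assumes that no uncompressed element already uses one of the codes as its name.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary mapping element names to short compression entries.
ELEMENT_CODES = {"order": "a", "item": "b"}


def rename_elements(root, mapping):
    """Rename every element in the tree whose name appears in the mapping."""
    for element in root.iter():
        element.tag = mapping.get(element.tag, element.tag)
    return root


def compress_tree(root):
    """Convert elements to their compressed counterparts, in place."""
    return rename_elements(root, ELEMENT_CODES)


def decompress_tree(root):
    """Restore the original elements from the compressed ones, in place."""
    inverse = {code: name for name, code in ELEMENT_CODES.items()}
    return rename_elements(root, inverse)


tree = ET.fromstring("<order><item>widget</item></order>")
compact = ET.tostring(compress_tree(tree), encoding="unicode")
restored = ET.tostring(decompress_tree(tree), encoding="unicode")
```

The tree is never flattened to uncompressed XML text between the two steps, which is the point of working at the element level rather than on tag strings.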
In some embodiments of a method 550, one or more compression dictionaries 104 are generated 560 from a Web Services Description Language (WSDL) definition for an entire system interface. In such an embodiment, the one or more compression dictionaries 104 are available on one or more network resources to allow for compression of messages sent between nodes on a network using one or more synchronized compression dictionaries.
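As a hedged illustration of generating 560 a dictionary from a WSDL definition, the sketch below collects the element names declared in the schema portion of a WSDL document and assigns a code to each opening and closing tag. The sample WSDL, the function name `dictionary_from_wsdl`, and the private-use character codes are assumptions of this example; it also ignores namespace prefixes, which real SOAP messages would normally carry.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, heavily abbreviated WSDL definition used only for this sketch.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <types>
    <xsd:schema>
      <xsd:element name="PrintRequest">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="document" type="xsd:string"/>
            <xsd:element name="copies" type="xsd:int"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:schema>
  </types>
</definitions>"""


def dictionary_from_wsdl(wsdl_text, first_code=0xE000):
    """Map the opening and closing tag of each declared element to a code."""
    root = ET.fromstring(wsdl_text)
    dictionary = {}
    for element in root.iter(XSD + "element"):
        name = element.get("name")
        if name is None:  # skip element references that declare no name
            continue
        for tag in (f"<{name}>", f"</{name}>"):
            if tag not in dictionary:
                dictionary[tag] = chr(first_code + len(dictionary))
    return dictionary


wsdl_dictionary = dictionary_from_wsdl(SAMPLE_WSDL)
```

Because every node that parses the same WSDL in the same order would derive the same mapping, a dictionary built this way could be generated independently on each node or generated once and distributed.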
In some embodiments, processing an XML request comprises querying a database, such as a relational database management system (RDBMS), a file server, or a flat file, resident or stored on the same server 102 or another server connected to the network 122.
A block diagram of a computer system (i.e. server 102 or client 106) that executes programming for performing the above method is shown in
Computer-readable instructions stored on a computer-readable medium are executable by the processing unit 1102 of the computer 1110. A hard drive, CD-ROM, and RAM are some examples of articles including a computer-readable medium. For example, a computer program 1125 capable of providing a generic technique to perform an access control check for data access and/or for doing an operation on one of the servers in a COM-based system according to the teachings of the present invention may be included on a CD-ROM and loaded from the CD-ROM to a hard drive. The computer-readable instructions allow computer system 1100 to provide generic access controls in a computer network system having multiple users and servers, wherein communication between the computers comprises utilizing XML, Simple Object Access Protocol (SOAP), and Web Services Description Language (WSDL).
To achieve quicker compression using the compression systems and methods described above, a compression dictionary may be cached at both the client and server ends of a web service conversation. This is possible when the web services are defined in advance using WSDL (Web Services Description Language). The WSDL definition of a web service can be used to determine the commonly invariant XML tags used in SOAP messages passed back and forth between web service clients and servers. Using this information, a compression dictionary can be produced, distributed, and cached for future re-use by both clients and servers.
Web services commonly publish their WSDL definitions alongside their service endpoints (e.g. http://some.service.com/printme?WSDL). They could also publish compression dictionaries derived from this WSDL.
Thus, a client might send an HTTP GET request to http://service.com/printme?WSDICT in order to retrieve a compression dictionary for the printme web service. If the client already had a copy of the compression dictionary (perhaps generated when the client was built or obtained from a different server), then it could verify that it was using the right version by adding a hash value to the WSDICT request, thus http://service.com/printme?WSDICT=<dictionary-hash>. The server could then dictionary (i.e. the server also new dictionary or it could dictionary.
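The request forms described above might be built as in the following sketch. The hash algorithm (SHA-256 over a sorted JSON rendering of the dictionary), the function names, and the server-side comparison are assumptions of this example; the description does not specify how the hash is computed or how the server responds.

```python
import hashlib
import json


def dictionary_hash(dictionary):
    """Hash a canonical rendering of the dictionary (SHA-256 is an assumption)."""
    canonical = json.dumps(dictionary, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def wsdict_url(service_url, cached_dictionary=None):
    """Build a WSDICT request, adding a hash when a cached copy exists."""
    if cached_dictionary is None:
        return f"{service_url}?WSDICT"
    return f"{service_url}?WSDICT={dictionary_hash(cached_dictionary)}"


def server_has_same_dictionary(requested_hash, server_dictionary):
    """Hypothetical server-side check of the hash carried in the request."""
    return requested_hash == dictionary_hash(server_dictionary)


cached = {"<order>": "\ue000", "</order>": "\ue001"}
first_request = wsdict_url("http://service.com/printme")
verify_request = wsdict_url("http://service.com/printme", cached)
```

Sorting the keys before hashing means two nodes holding the same mapping compute the same hash regardless of the order in which their entries were added.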