Documents are commonly exchanged or transmitted between devices over the Internet or other networks. Some languages, due to their data-intensive nature, may require significant resources to store and communicate documents. For example, XML (eXtensible Markup Language) may be used to exchange documents, e.g., between a browser and a server. When the browser later returns to edit a document, the document may be retrieved, reloaded, edited, and then saved again for future processing. Thus, processing and communicating documents may in some cases consume significant computing or memory resources to store the documents and significant network resources to communicate them. It may be desirable in some cases to reduce the amount of resources required to store and communicate these documents.
Various embodiments are disclosed relating to compression of a document.
According to an example embodiment, a document may be compressed using a number of different techniques, which may be used separately or in combination. For example, a document may be compressed by replacing one or more language constructs in the document with a language-based replacement code. In addition, the document may be compressed by replacing one or more text strings in the document with a schema-based replacement code. In an example embodiment, a schema-based replacement table may be generated, according to a set of rules, based on a schema for use with the document. This may allow, for example, both a transmitter and a receiver to generate the schema-based replacement table independently from the schema.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Referring to the Figures in which like numerals indicate like elements,
According to an example embodiment, computing device 102 may be a user device (such as a PC or handheld device) and may include an application program, such as a browser 104. Computing device 108 may be, for example, a server. Computing devices 102 and 108 may communicate with each other via network 106, and may exchange information using one or more protocols, for example. The information exchanged by computing devices 102 and 108 may be provided in any format. In an example embodiment, information exchanged between computing devices 102 and 108 may include structured documents, such as XML (Extensible Markup Language) documents, although other structured documents may be used. XML is merely provided as an example, and the various embodiments are not limited thereto.
According to an example embodiment, a document may be based upon a language, such as XML or another language. According to an embodiment, the document may be compressed based on the structure of that language, for example, by replacing one or more language constructs in the document with a language-based replacement code. This may allow one or more language constructs in the document to be replaced based on either a required structure (e.g., the well-formed nature or required syntax of an XML document) or a common usage pattern for the language.
For example, in XML, each start tag must have a matching end tag that uses the same name, and start and end tags must nest properly within other sets of start and end tags. Thus, according to an example embodiment, once the start tag is known, the corresponding end tag may be replaced with a language-based replacement code indicating “end tag” or “end element.” Because of the nesting required by the XML language, the identity of each end tag can be determined from the earlier start tags in the document. The replacement of an end tag (or end element) is an example of replacing a language construct based on a required structure or required syntax of a language.
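A minimal sketch of the end-tag replacement follows, assuming a hypothetical token stream of ("start", name), ("end", name), and ("text", value) tuples from some XML tokenizer, and using the End-Element code 2 described for Table 1. Because well-formed XML closes elements in the reverse order in which they were opened, a stack of open element names is enough to confirm that each end tag is redundant:

```python
END_ELEMENT = 2  # Table 1: End-Element

def elide_end_tags(tokens):
    """Replace every end tag with the END_ELEMENT code.

    tokens: ("start", name), ("end", name) or ("text", value) tuples
    (a hypothetical tokenizer output format assumed for this sketch).
    """
    open_elements = []  # names of elements opened but not yet closed
    out = []
    for kind, value in tokens:
        if kind == "start":
            open_elements.append(value)
            out.append((kind, value))
        elif kind == "end":
            # Well-formedness means this tag must close the innermost open
            # element, so its name carries no information and can be dropped.
            if not open_elements or open_elements.pop() != value:
                raise ValueError(f"end tag </{value}> is not well-formed here")
            out.append(END_ELEMENT)
        else:
            out.append((kind, value))
    return out

tokens = [("start", "my:Order"), ("start", "my:CustomerName"), ("text", "Ann"),
          ("end", "my:CustomerName"), ("end", "my:Order")]
print(elide_end_tags(tokens))
```

Both end tags collapse to the same code 2; the decoder relies on the nesting rule, not on the names, to tell them apart.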
Another example of a language construct based on required structure may be a start element with the same prefix as the last element. For example, a start element (or start tag) may have the form <prefix:element-name>. An example may be <my:CustomerName>, where my is the prefix and CustomerName is the element name. If the prefix “my” was also used as the prefix of the last element (e.g., the last start element), then the replacement code for “Element-with-same-Prefix-as-last-Element” may be used to replace the element construct <my: >. The element name (e.g., CustomerName) may, for example, remain as text, or may itself be replaced with a replacement code described in more detail below (e.g., a schema-based replacement code).
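The choice between the two prefix-related codes can be sketched as follows. This is a hypothetical illustration that assumes every element name is prefixed and uses the codes 0 and 1 described for Table 1; the element names are left as text here, although they could in turn be replaced with schema-based codes:

```python
ELEMENT_WITH_PREFIX = 0               # Table 1: Element-With-Prefix
ELEMENT_WITH_SAME_PREFIX_AS_LAST = 1  # Table 1: Element-With-Same-Prefix-As-Last-Element

def encode_start_elements(qnames):
    """Encode prefixed start-element names such as "my:CustomerName" in document order."""
    out = []
    last_prefix = None
    for qname in qnames:
        prefix, sep, local = qname.partition(":")
        if not sep:
            raise ValueError(f"expected a prefixed name, got {qname!r}")
        if prefix == last_prefix:
            # Prefix repeats: one code stands in for "<my:" and the prefix text is dropped.
            out += [ELEMENT_WITH_SAME_PREFIX_AS_LAST, local]
        else:
            # New prefix: emit the construct code followed by the prefix text.
            out += [ELEMENT_WITH_PREFIX, prefix, local]
        last_prefix = prefix
    return out

print(encode_start_elements(["my:Order", "my:CustomerName", "ns:Item"]))
# [0, 'my', 'Order', 1, 'CustomerName', 0, 'ns', 'Item']
```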
According to another example embodiment, language constructs may also be replaced based on a common usage pattern for the language. For example, namespace declarations, although not required by XML, are commonly used. A namespace declaration may have the form xmlns:namespace-prefix=“namespace-name”. The namespace declaration syntax itself (xmlns: =“ ”) may be replaced with a language-based replacement code. An example of a namespace declaration is xmlns:my=“http://mysite/order”. The specific values within the namespace declaration, including the namespace prefix (e.g., my) and the namespace name (e.g., http://mysite/order), may remain as text, or may themselves be replaced with separate replacement codes described in greater detail below.
Table 1 above is an example language-based replacement table that lists several example language constructs in the left-hand column and the associated (language-based) replacement codes in the right-hand column. For example, an Element-With-Prefix is assigned the replacement code “0”, an Element-With-Same-Prefix-As-Last-Element is assigned the replacement code “1”, an End-Element (or end tag) is assigned the replacement code “2”, a namespace declaration is assigned the replacement code “3”, etc. Only a few language constructs are shown in Table 1; many other language constructs may be included in a language-based replacement table to compress a document. If a language other than XML is used, the language constructs may differ based on the particular syntax, format, rules, common usage patterns, etc., of that language. XML is merely an example language, and other languages may be used.
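For illustration only, the four Table 1 entries described above could be held in a Python `IntEnum`, which provides a two-way mapping between construct names and numeric codes. The class and member names are assumptions made for this sketch:

```python
from enum import IntEnum

class LanguageCode(IntEnum):
    """Language-based replacement codes for XML constructs (the Table 1 entries described in the text)."""
    ELEMENT_WITH_PREFIX = 0
    ELEMENT_WITH_SAME_PREFIX_AS_LAST_ELEMENT = 1
    END_ELEMENT = 2
    NAMESPACE_DECLARATION = 3

# Compressing: construct -> code written into the output stream.
code = LanguageCode.END_ELEMENT.value   # 2
# Decompressing: code read from the stream -> construct to reconstruct.
construct = LanguageCode(3).name        # "NAMESPACE_DECLARATION"
print(code, construct)
```

A table for another language would simply be a different enumeration built from that language's constructs.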
According to an example embodiment, one or more language constructs (such as end tags, etc.) may be replaced in a document with language-based replacement codes, as described above. After such compression or replacement, there may remain a number of additional text strings in the document, such as element names, start tags, etc. Therefore, an additional replacement technique may be used to further compress the document using additional replacement codes that may be generated based on a schema for use with the document, according to an example embodiment.
According to an example embodiment, a structured document (such as an XML document) may be compressed based on a value replacement table. In an example embodiment, a value replacement table may indicate codes (e.g., hex values, alphanumeric values, or other codes) that may be used to replace text strings or other values in a structured document in order to decrease the size of the document. In one example embodiment, the value replacement table may be generated, according to a set of rules, based on a schema for use with the document to be compressed. This may allow both a transmitting device and a receiving device to generate the value replacement table independently from the schema. In this manner, further compression or transmission efficiency may be obtained, because the value replacement table itself need not be transmitted: the receiving device may generate the same table from the schema for the document by applying the same set of rules used by the transmitting device.
A variety of different sets of rules may be used to generate the (schema-based) value replacement table. For example, a plurality of text strings or values in the schema may be put in alphabetical order, and a number (replacement code) assigned to each text string or value in increasing order (e.g., 0, 1, 2, 3, ...). This is merely one example set of rules for generating a value replacement table, and many other rules may be used. According to an example embodiment, text strings or values that are typically present in both the schema and the (e.g., XML) document may be good candidates for replacement codes (and thus for compression), since the replacement codes may be identified based only on the schema (i.e., not the document) and then used to compress (and later decompress) the document. For example, for an XML schema, value replacement codes may be identified for each of the following text strings or values present in the schema (merely as examples): element names, attribute names, enumeration values, default values, fixed values, values of namespace declarations, and other text strings or values.
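One way to apply the example rule set is sketched below with Python's standard `xml.etree.ElementTree`. It collects element and attribute names, default and fixed values, enumeration values, and namespace-declaration prefixes and URIs from an XML schema, sorts them alphabetically, and numbers them from 0. Sorting case-insensitively is an assumption (the text does not say how letter case is ordered), as is skipping the XML Schema namespace itself; the sample schema is hypothetical and is not schema 200 from the figures. Because the function depends only on the schema, a transmitter and a receiver running it would arrive at the same table:

```python
import io
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
XS = "{" + XSD_NS + "}"

def build_value_table(schema_text):
    """Build a schema-based value replacement table under an assumed rule set:
    collect candidate strings, sort them case-insensitively, number them 0, 1, 2, ..."""
    values = set()
    source = io.BytesIO(schema_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            if uri != XSD_NS:  # assumed rule: skip the XML Schema namespace itself
                values.update(v for v in (prefix, uri) if v)
        elif item.tag in (XS + "element", XS + "attribute"):
            values.update(item.get(k) for k in ("name", "default", "fixed") if item.get(k) is not None)
        elif item.tag == XS + "enumeration" and item.get("value") is not None:
            values.add(item.get("value"))
    ordered = sorted(values, key=lambda v: (v.casefold(), v))
    return {value: code for code, value in enumerate(ordered)}

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:my="http://mysite/order" targetNamespace="http://mysite/order">
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="CustomerName" type="xs:string"/>
        <xs:element name="Status">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="open"/>
              <xs:enumeration value="shipped"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="currency" default="USD"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

print(build_value_table(SCHEMA))
```

For this sample schema the table has nine entries, running from "currency" (0) to "USD" (8), with "http://mysite/order" at 2.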
In schema 200 of
Note that the text strings or values in Table 2 are listed in alphabetical order. Also, the schema-based replacement codes may, for example, be assigned in numerical order (e.g., 0, 1, 2, ..., 8), per the example rules for generating the example value replacement table shown in Table 2. Thus, in an example embodiment, a transmitting device and one or more receiving devices may agree in advance to use the same set of rules to generate the value replacement table based on a schema. These are merely example rules, and other types of rules may be used.
Table 3 below lists an example uncompressed XML document. Table 4 below lists an example output after compression (e.g., a compressed XML document) using both: 1) a language-based replacement table (e.g., language-based replacement codes to replace language constructs), and 2) a value replacement table (or schema-based replacement table) based on a schema (e.g., schema-based replacement codes). Some examples are described below, with reference to Table 3 and Table 4, to illustrate aspects of this compression process.
The first element of the uncompressed document is a start element with a prefix, <my:Order xmlns:my=“http://mysite/order”>. Thus, this language construct is replaced with a zero (0), indicating element-with-prefix (replacement code 0 from the language-based replacement table, see Table 1). Next, the prefix “my” (within that first element) is a text string that has been assigned the schema-based replacement code 4 (see Table 2). Next, the text string “Order” is replaced with the schema-based replacement code 6 (see Table 2). Next, the namespace declaration (xmlns: =“ ”) is replaced with the language-based replacement code 3 (indicating a namespace declaration, see Table 1). The prefix “my” (within the namespace declaration) is replaced with the schema-based code 4 (see Table 2). Next, the text string http://mysite/order is replaced with the schema-based replacement code 2 (see Table 2). This may result in a compressed document output beginning with 046342... Other values, strings, or language constructs may be similarly replaced with their appropriate replacement codes. This process may be reversed at a receiver, for example, in order to decompress the compressed structured document (e.g., by replacing the replacement codes in the compressed document with the associated language constructs, text strings, or values).
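The arithmetic of this example can be checked with a short sketch. It hard-codes only the replacement codes quoted in the text (0 and 3 from Table 1; 2, 4, and 6 from Table 2) and assumes the start element has already been split into its qualified name and its namespace declarations; the function name is hypothetical:

```python
# Language-based codes from Table 1 and the three Table 2 entries quoted in the text.
ELEMENT_WITH_PREFIX = 0
NAMESPACE_DECLARATION = 3
VALUE_TABLE = {"http://mysite/order": 2, "my": 4, "Order": 6}

def encode_start_element(qname, namespace_decls):
    """qname like "my:Order"; namespace_decls is a list of (prefix, uri) pairs."""
    prefix, local = qname.split(":", 1)
    # Construct code, then schema-based codes for the prefix and element name.
    codes = [ELEMENT_WITH_PREFIX, VALUE_TABLE[prefix], VALUE_TABLE[local]]
    for ns_prefix, uri in namespace_decls:
        # Construct code for xmlns: ="", then codes for its prefix and namespace name.
        codes += [NAMESPACE_DECLARATION, VALUE_TABLE[ns_prefix], VALUE_TABLE[uri]]
    return codes

codes = encode_start_element("my:Order", [("my", "http://mysite/order")])
print("".join(map(str, codes)))  # 046342
```

Joining the codes as bare digits works here only because every code is a single digit; a practical encoding would need fixed-width codes or delimiters to keep the stream decodable.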
Note that while the language-based replacement table is applied before the schema-based replacement table in this example, these two techniques may be performed in any order, and each may also be used alone to compress a document.
At 320, a schema is determined for use with the structured document. For example, at 322, an XML schema or other schema may be received. At 324, a schema may be retrieved from a network device, network resource, server, etc., for example based on a Uniform Resource Identifier (URI) or other identifier.
At 330, a value replacement table (e.g., a schema-based replacement table) is generated based on the schema according to a set of rules. For example, this may include, at 332, creating a list of one or more text strings or other values in the schema, and at 334, assigning a schema-based replacement code to each text string or value in the list according to the set of rules.
At 340, the structured document may be compressed based on the value replacement table (e.g., a schema-based replacement table). For example, this may include replacing one or more text strings or values in the structured document with a schema-based replacement code according to the generated value replacement table.
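Step 340 might look like the following, as a sketch only. It assumes the document's remaining text strings have already been isolated, and it tags each output item as either a value code ("V") or literal text ("T") so that strings the schema does not cover, such as free-text content, pass through unchanged; this tagging scheme is an assumption, not part of the described format:

```python
def compress_values(strings, value_table):
    """Replace strings found in the schema-based value table with their codes.

    Returns ("V", code) for a replaced string and ("T", text) for a literal
    that the schema does not cover (e.g. free-text element content).
    """
    out = []
    for s in strings:
        code = value_table.get(s)
        out.append(("V", code) if code is not None else ("T", s))
    return out

# Only the three Table 2 entries quoted in the text; a full table would be generated from the schema.
VALUE_TABLE = {"http://mysite/order": 2, "my": 4, "Order": 6}
print(compress_values(["my", "Order", "Ann Smith"], VALUE_TABLE))
# [('V', 4), ('V', 6), ('T', 'Ann Smith')]
```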
At 420, according to an example embodiment, the structured document may be based on a language, such as XML. The compressed structured document may be decompressed (or partially decompressed) based on a structure of the language. This may include, for example, replacing each of one or more language-based replacement codes in the compressed structured document with a language construct.
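On the receiving side, End-Element codes can be expanded back into named end tags using the nesting rule: each code closes the innermost element that is still open. This sketch assumes a hypothetical token format of ("start", name) and ("text", value) tuples interleaved with the integer End-Element code 2 from Table 1:

```python
END_ELEMENT = 2  # Table 1: End-Element

def restore_end_tags(items):
    """Expand END_ELEMENT codes into ("end", name) tokens using a stack of open elements."""
    open_elements = []
    out = []
    for item in items:
        if item == END_ELEMENT:
            if not open_elements:
                raise ValueError("End-Element code with no open element")
            # The code closes the innermost element that is still open.
            out.append(("end", open_elements.pop()))
        else:
            kind, value = item
            if kind == "start":
                open_elements.append(value)
            out.append(item)
    return out

print(restore_end_tags([("start", "my:Order"), ("start", "my:CustomerName"),
                        ("text", "Ann"), 2, 2]))
```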
At 430, a schema is determined for use with the received compressed document. For example, at 432, an XML schema may be received, or at 434, an XML schema may be retrieved.
At 440, a value replacement table (e.g., a schema-based replacement table) is generated based on the determined schema according to a set of rules. This may include, for example, at 442, creating a list of one or more text strings or values in the schema, and at 444, assigning a schema-based replacement code to each text string or value in the list according to a set of rules.
At 450, the compressed structured document may be decompressed based on the value replacement table (e.g., schema-based replacement table). For example, this may include replacing one or more schema-based replacement codes in the compressed document with the assigned text string or value according to the generated value replacement table.
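Step 450 can be sketched as inverting the regenerated table. Because the example rules assign each string a distinct code, the inverse mapping is unambiguous. The item format, ("V", code) for a value code and ("T", text) for literal text, is an assumption made for this sketch, and only the three Table 2 entries quoted in the text are used:

```python
def decompress_values(items, value_table):
    """Replace ("V", code) items with the table's string; pass ("T", text) literals through."""
    # Invert string -> code into code -> string (codes are unique under the rules).
    code_to_value = {code: value for value, code in value_table.items()}
    return [code_to_value[payload] if tag == "V" else payload for tag, payload in items]

VALUE_TABLE = {"http://mysite/order": 2, "my": 4, "Order": 6}
print(decompress_values([("V", 4), ("V", 6), ("T", "Ann Smith")], VALUE_TABLE))
# ['my', 'Order', 'Ann Smith']
```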
Compressing a document may offer a number of advantages, such as one or more of: reducing the amount of data stored on servers, databases, and other computing devices; reducing the amount of data transmitted between devices; reducing the network latency of such data transmission; and freeing up resources and reducing the load on servers. These are merely some examples, and the various embodiments are not limited thereto.
Implementations of the various techniques described herein may be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. Implementations may be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program, such as the computer program(s) described above, can be written in any form of programming language, including compiled or interpreted languages, and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
Method steps may be performed by one or more programmable processors executing a computer program to perform functions by operating on input data and generating output. Method steps also may be performed by, and an apparatus may be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. Elements of a computer may include at least one processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also may include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in special purpose logic circuitry.
To provide for interaction with a user, implementations may be implemented on a computer having a display device, e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
Implementations may be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation, or any combination of such back-end, middleware, or front-end components. Components may be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.
While certain features of the described implementations have been illustrated as described herein, many modifications, substitutions, changes and equivalents will now occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the various embodiments.
Number | Date | Country |
---|---|---|
20070162479 A1 | Jul 2007 | US |