Online documents accessed by a client or a server may be transformed using a transformation processor such as an XSL Transformation (XSLT) processor. The XSLT processing model utilizes a source document, a stylesheet and an XSLT processing engine to produce a result document. The XSLT processing model follows a fixed algorithm, building a source tree from the source document. The model processes the source tree's root node, finding in the stylesheet a matching template for that node, and evaluating the template's contents. Instructions in each template generally direct the processor to either create nodes in the result tree, or process more nodes. Output is generally derived from the result tree.
Processing web applications using such a model may present obstacles for the client or the server. For example, when a client or a server retrieves a complex data structure from a third party service, the computational resources required to consume the data structure are great, and the time to create an output document is considerable. Generally this is the result of the need to construct an intermediate structure prior to any output. The creation of an intermediate structure, such as an intermediate tree or index structure, dramatically increases the resources and time required by the client or server to create and deliver the output document.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In view of the above, this disclosure describes example methods, systems, and computer-readable media for implementing a transformation engine and transformation processes to reduce computational resources used by a client or a server during the consumption of a document.
In an example implementation, a data stream is received in a first format over a network. For example, the data stream may be in the form of an extensible markup language (XML) format, a simple object access protocol (SOAP) format, a JavaScript object notation (JSON) format, or any structured data format. A mapping template is then associated with the data stream. A forward-traversal of the mapping template is performed without the accumulation of an intermediate state. Following the traversal of the mapping template, an output stream is emitted in a custom binary format.
A transformation engine is used to transform an input stream from one format to an output stream in another format. For example, the transformation engine converts the input stream to a token stream. The tokens of the token stream are used to traverse a mapping template associated with the input stream, resulting in the production of an output stream.
The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
Some implementations herein provide a transformation engine and transformation processes to reduce computational resources used by a client or a server during consumption of a document. More specifically, an example process may transform a complex data structure, such as, without limitation, an extensible markup language (XML) document, to a new data structure, such as a custom binary format, without allocating an intermediate tree or index structure. The transformation engine receives the complex data structure and utilizes an associated mapping template to emit a stream in any desired format.
The computing device 102 may connect to one or more network(s) 104 and is associated with a user 106. The computing device 102 may include a transformation engine 108 to transform one or more documents or other data structures during consumption by the computing device 102. Transformation engine 108 may also, without limitation, be used to create output for printing, direct video displays, translate messages between different schemas, or make changes to a document within a scope of a single schema.
For example, as illustrated in
The network(s) 104 represent any type of communications network(s), including, but not limited to, wire-based networks (e.g., cable), wireless networks (e.g., cellular, satellite), cellular telecommunications network(s), and IP-based telecommunications network(s). The network(s) 104 may also include a traditional landline or public switched telephone network (PSTN), or combinations of the foregoing (e.g., Unlicensed Mobile Access or UMA networks, circuit-switched telephone networks or IP-based packet-switched networks).
The network services 110(1)-110(N) are illustrated in this example as web-based services available over the Internet, but may additionally or alternatively include services on a variety of other wide area networks (WANs), such as an intranet, a wired or wireless telephone network, a satellite network, a cable network, a digital subscriber line network, a broadcast, and so forth. The network services 110(1)-110(N) may include or be coupled to one or more types of system memory (not shown). The network services 110(1)-110(N) may communicate a data transmission, such as input stream 112, to the computing device 102. In one implementation, the data transmission is an XML transmission. In other implementations, the data transmission may include substantially real-time content, non-real time content, or a combination of the two. Sources of substantially real-time content generally include those sources for which content is changing over time, such as, for example, live television or radio, webcasts, or other transient content. Non-real time content sources generally include fixed media readily accessible by a consumer, such as, for example, pre-recorded video, audio, text, multimedia, or games.
Memory 204 may store programs of instructions that are loadable and executable on the processor 202, as well as data generated during the execution of these programs. Depending on the configuration and type of server, memory 204 may be volatile (such as RAM) and/or non-volatile (such as ROM, flash memory, etc.). The computing device 102 may also include additional removable storage 208 and/or non-removable storage 210 including, but not limited to, magnetic storage, optical disks, and/or tape storage. The disk drives and their associated computer-readable media may provide non-volatile storage of computer readable instructions, data structures, program modules, and other data for the computing device 102.
Memory 204, removable storage 208, and non-removable storage 210 are all examples of computer storage media. Additional types of computer storage media that may be present include, but are not limited to, RAM, ROM, flash memory or other memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage (e.g., floppy disk, hard drive) or other magnetic storage devices, or any other medium which may be used to store the desired information.
Turning to the contents of memory 204 in more detail, the memory may include an operating system 212. In one implementation, the memory 204 includes a data management module 214 and an automatic module 216. The data management module 214 stores and manages storage of information, such as images, return on investment (ROI), equations, and the like, and may communicate with one or more local and/or remote databases or services. The automatic module 216 allows the process to operate without human intervention. The computing device 102 may also contain communication connection(s) 218 that allow processor 202 to communicate with other services. Communications connection(s) 218 is an example of a communication medium. A communication medium typically embodies computer-readable instructions, data structures, and program modules. By way of example and not limitation, communication medium includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
The operating system 212 comprises a transformation engine 108. The transformation engine 108 may be a standalone application or a software component. In some implementations, the transformation engine is a processor utilized to process input stream 112 to produce output stream 114. To facilitate the process and reduce the resources utilized by the computing device 102, the transformation engine may utilize a custom binary format such as, without limitation, WAP Binary XML (WBXML), Binary JSON (BSON), or the like. For example, in one implementation, the input stream 112 is an XML stream. The transformation engine 108 may choose a corresponding custom binary format, reducing the resources utilized by the computing device 102 and maintaining the XML data structure of the input stream 112. Preservation of the XML data structure ensures an accurate transmission between the network service 110(1) and the computing device 102.
The computing device 102, as described above, may be implemented in various types of systems or networks. For example, the computing device may be a stand-alone system, or may be a part of, without limitation, a client-server system, a peer-to-peer computer network, a distributed network, a local area network, a wide area network, a virtual private network, a storage area network, and the like.
In one implementation, the named strings emitted in the optional tags 404(3)-404(4) need not match the node. For example, description node 402(4) does not match the emitted named string 404(3) (desc : string), representing what information is to be emitted in the output stream 114. Alternatively, the names emitted in the optional tags 404(3)-404(4) may match the corresponding node.
As illustrated in
Mapping template(s) 302(1)-302(N) may be pivoted into the perspective of the input document tree based upon inferences made from one or more matching expressions. The matching expressions are evaluated on a forward-only traversal of the input stream 112, emitting the corresponding optional tags 404(1)-404(4) as the traversal process proceeds.
In one implementation, the match expressions are plain paths, relative to expressions from their parent. For example, as illustrated in
Looking at this example, “RSS/channel” matches nodes named ‘channel’ that are children of ‘RSS’ nodes. Because this is a root level match expression, ‘RSS’ is the root of the mapping template. Next, ‘item’ is a relative match for nodes named ‘item’ and implies that these nodes are children of ‘channel’ because of the parent's match expression. Matches are generally relative to the parent's expression. Finally, ‘description/’ and ‘link/’ match the values of those nodes which are children of ‘item’. The trailing slash in ‘description/’ and ‘link/’ indicates a match with an anonymous child node of a node named ‘description’ or ‘link’. For example, an XML document may contain <description>foo</description>, which emits a start branch node named ‘description’ followed by an anonymous value ‘foo’. By adding the trailing slash, the anonymous value tokens may be extracted. A specific example is:
Once the appropriate mapping template to be used during the transformation process is determined by the transformation engine 108, the transformation engine walks up and down the mapping template as nodes are seen streaming in. The mapping tree may be pivoted into the perspective of the input stream 112 using the matching expressions described above. Therefore, the actual work performed by the transformation engine 108 to complete the transformation is minimal at the time of the transformation process. For example, the transformation engine 108 processes incoming data contained within input stream 112 as the input document streams over the network 104, without building up any intermediate per-request data structures.
In one implementation, the input stream 112 is transformed to a stream of tokens to be used by the transformation engine 108 to guide the transformation engine through the mapping template 302(1). In one implementation, the transformation engine 108 performs the transformation as follows: as tokens are streaming in over network 104, the transformation engine recognizes and moves to the RSS node 402(1); then, as the tokens continue to stream in, the transformation engine 108 recognizes and moves to the channel node 402(2) and emits an anonymous start vector; this process continues until the branch ends; the transformation engine 108 walks back up the mapping template 302(1) and any tokens the transformation engine does not recognize are ignored. An example of how the mapping template 302(1) can utilize the data from input stream 112 is as follows:
The code set forth above is defined in terms of the desired output structure and the type of each node. Each node carries a match expression indicating where to find the corresponding data in the input stream. Multiple matches, such as “item”, may produce multiple results.
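The forward-only walk described above can be illustrated with a minimal Python sketch. It is not the listing referred to in this description; the `TemplateNode` class, the `(kind, name, value)` token tuples, and the `pivot` and `transform` helpers are hypothetical names introduced only for illustration. The sketch assumes that a trailing slash in a match expression denotes the anonymous value child, and that the only per-request state is the current path (the cursor position), not a tree of the input.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical token shape: (kind, name, value); kind is "start", "end" or "leaf".
Token = Tuple[str, str, str]


@dataclass
class TemplateNode:
    match: str                      # match expression, relative to the parent
    emit: str = ""                  # emitted name; need not equal the matched name
    children: List["TemplateNode"] = field(default_factory=list)


def pivot(node: TemplateNode, prefix: Tuple[str, ...] = ()) -> Dict[Tuple[str, ...], str]:
    """Resolve relative match expressions into absolute input paths.

    A trailing slash yields an empty last segment, which stands for the
    anonymous value child (e.g. the text inside <description>).
    """
    path = prefix + tuple(node.match.split("/"))
    table = {path: node.emit}
    for child in node.children:
        table.update(pivot(child, path))
    return table


def transform(tokens: Iterable[Token], table: Dict[Tuple[str, ...], str]) -> Iterator[Token]:
    """Emit output tokens in one forward pass; unrecognized tokens are ignored."""
    path: List[str] = []            # cursor position only, bounded by nesting depth
    for kind, name, value in tokens:
        if kind == "start":
            path.append(name)
            if tuple(path) in table:
                yield ("start", table[tuple(path)], "")
        elif kind == "end":
            if tuple(path) in table:
                yield ("end", table[tuple(path)], "")
            path.pop()
        else:
            key = tuple(path) + (name,)
            if key in table:
                yield ("leaf", table[key], value)


# Hypothetical template mirroring the RSS example: anonymous vectors for
# 'channel' and 'item', and named strings 'desc' and 'link' for the values.
TEMPLATE = TemplateNode("RSS/channel", "", [
    TemplateNode("item", "", [
        TemplateNode("description/", "desc"),
        TemplateNode("link/", "link"),
    ]),
])
```

Because `pivot` runs once per template rather than once per request, the per-token work in `transform` reduces to a dictionary lookup on the current path.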
In one implementation, a general notion of a serialized mapping template (or data tree) represented by a stream of tokens may be used. For example, when implemented in C#, the stream of tokens may be represented as an IEnumerable<Token>. In this example, each token consists of an optional ‘Name’ and an optional ‘Value’, both of which are strings, and a ‘type’ which may be a Start or End branch or a leaf. Using such a general notion, in one example, an XML character stream is converted to a token stream straightforwardly. Elements become named, valueless start/end nodes. Attributes become named leaf values. Text and CDATA become anonymous leaf values. In another example, JSON is converted to a token stream in much the same way as XML. One small difference is that JSON allows for anonymous branch nodes (or objects) while XML has no such construct, only named branches (elements).
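As an illustration of the XML-to-token conversion just described, the following Python sketch uses the standard `xml.parsers.expat` streaming parser. It is a sketch under assumptions, not the C# representation mentioned above: the `Token` tuple and the `xml_to_tokens` function are hypothetical names, and whitespace-only text is dropped for brevity.

```python
import xml.parsers.expat
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    kind: str    # "start", "end", or "leaf"
    name: str    # "" for an anonymous token
    value: str   # "" for a valueless token


def xml_to_tokens(chunks: Iterable[str]) -> Iterator[Token]:
    """Yield tokens as XML text chunks arrive; no document tree is built."""
    pending: List[Token] = []

    def on_start(name, attrs):
        # Elements become named, valueless start nodes ...
        pending.append(Token("start", name, ""))
        # ... and attributes become named leaf values.
        for attr_name, attr_value in attrs.items():
            pending.append(Token("leaf", attr_name, attr_value))

    def on_end(name):
        pending.append(Token("end", name, ""))

    def on_text(data):
        # Text and CDATA become anonymous leaf values.
        if data.strip():
            pending.append(Token("leaf", "", data))

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True           # coalesce adjacent character data
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text

    for chunk in chunks:
        parser.Parse(chunk, False)      # feed one network-sized chunk
        yield from pending
        pending.clear()
    parser.Parse("", True)              # signal end of input
    yield from pending
```

The `pending` list holds only the tokens produced by the most recent chunk, so memory use does not grow with document size.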
As discussed above, the mapping template 302(1) may also be represented as a token stream 500, illustrated in
As illustrated in
As illustrated in
As illustrated in
As illustrated in
The cursor 502 traverses the mapping template 302(1) until a BranchEnd is identified. Once a BranchEnd is identified, the cursor 502 moves back up the mapping template 302(1) to the ‘channel’ node 602, ready to match the next item, if there are any, or move further up the mapping template 302(1) when a ‘channel’ BranchEnd is identified. Traversing the mapping template 302(1) as described above in
One example which may necessitate a small intermediate state would be the use of a search schema. An example search schema is:
As displayed in the example search schema above, there is no apparent way to specify a match against Result/Item nodes within the ‘Local’ domain set as opposed to the ‘Web’ domain, where a “DomainResults/Results/Item” would match either the Result node or the Item node. For example, filter expressions such as “DomainResults[Domain=‘Local’]/Results/Item” and “DomainResults[Domain=‘Web’]/Results/Item” may be generated to capture a match for the Result node and the Item node. Such a pattern may not be able to be evaluated in a forward-only manner because it is difficult, without breaking the schema, to change the order of elements such that the sibling node may not be seen in time. For example:
However, utilizing a simple mode concept, assumptions may be made about the order of the child or sibling nodes. In one implementation, a “SetMode” match occurs for “DomainResults/Domain/”, setting the mode to “Web” or “Local”. Alternatively, matches may be scoped and arranged to work only in a particular mode.
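The mode concept can be sketched in Python as a single remembered string that scopes later matches. This is an illustrative sketch only: the function name `transform_with_mode`, the `(kind, name, value)` token tuples, and the emitted names are hypothetical, and the sketch assumes the ‘Domain’ value arrives before the sibling ‘Results’ branch, which is the ordering assumption the mode concept relies on.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Token = Tuple[str, str, str]            # hypothetical (kind, name, value)
Path = Tuple[str, ...]


def transform_with_mode(
    tokens: Iterable[Token],
    set_mode_path: Path,
    scoped: Dict[Tuple[str, Path], str],
) -> Iterator[Tuple[str, str]]:
    """Emit (name, value) pairs for leaf matches that are valid in the current mode.

    `scoped` maps (mode, absolute input path) to the emitted name, so the same
    input path can be routed differently depending on the mode.
    """
    path: List[str] = []
    mode: Optional[str] = None          # the only state carried forward
    for kind, name, value in tokens:
        if kind == "start":
            path.append(name)
        elif kind == "end":
            path.pop()
        else:
            key = tuple(path) + (name,)
            if key == set_mode_path:
                mode = value            # the "SetMode" match
            elif mode is not None and (mode, key) in scoped:
                yield (scoped[(mode, key)], value)
```

For example, with `set_mode_path` set to `("DomainResults", "Domain", "")`, an ‘Item’ value seen after a ‘Web’ domain and one seen after a ‘Local’ domain can be emitted under different names even though their input paths are identical.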
Another example may necessitate a mechanism involving a small intermediate state. The mechanism enables changing plain “Results” tokens to “WebResults” or “LocalResults” by remembering the value matched at “DomainResults/Domain/”. Specifically, a transformation object, such as an ITransformer object, may be employed to massage the token stream as the input stream 112 is transmitted over the network 104. For example, without limitation, when implemented in C#:
While the transformation process attempts to continue in the stream processing style described above with respect to
One example where the mechanism may be implemented is a search engine. For example, movie/theater/showtime results found during a search on a search engine are restructured, while each individual result remains intact and is re-emitted exactly as the movie time was received. This enables the mapping template layer to dominate the transformation process, and if the individual item changes, the transformation engine 108 would be agnostic and only the mapping template would be affected.
An example search template is:
While this appears to be a lot of code, it replaces approximately three times as much custom code generally utilized by a search engine to conduct the requested search.
As illustrated in the example search template above, the search template comprises at least two portions, a request portion and a response portion. The request portion establishes parameters formulating the request, in this example movie times, while the response portion, utilizing the transformation process described above with respect to
In one implementation, the ‘request’ section of the search template includes the uniform resource locator (URL), one or more optional headers, and an optional POST body. Within each of the URL, optional headers, and optional POST body sections, there may be replacement tokens, for example, ~replace_me~. In one implementation, the replacement tokens are values taken from the query string of the client's request to the transformation engine 108. In this implementation, the replacement is carried out efficiently: the request URL, headers and body are broken into fragments surrounding the replacement tokens and are streamed out in chunks while slipping in the replacement values, avoiding parsing and allocations at request time. Replacement tokens may be in the form of ~foo:bar~, where the token is “foo”, with a default value “bar”.
Replacement tokens may specify a conversion factor to be applied to input parameters before substituting. For example, ~{mapx}|foo:int~ will convert a longitude, given by a parameter ‘foo’, into integer map coordinates ({mapy} would do the same for latitude).
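The fragment-based replacement described above can be sketched in Python as a one-time split of the request template followed by a streaming render. The function names `compile_template` and `render` are hypothetical, the delimiter is assumed to be an ASCII tilde, and conversion prefixes are not handled in this sketch.

```python
import re
from typing import Dict, Iterator, List, Tuple, Union

# Assumed token syntax: ~name~ or ~name:default~ with ASCII tildes.
_TOKEN = re.compile(r"~([^~:]+)(?::([^~]*))?~")

Part = Union[str, Tuple[str, str]]      # literal fragment, or (name, default)


def compile_template(template: str) -> List[Part]:
    """Split the template once into literal fragments and replacement slots."""
    parts: List[Part] = []
    pos = 0
    for m in _TOKEN.finditer(template):
        if m.start() > pos:
            parts.append(template[pos:m.start()])
        parts.append((m.group(1), m.group(2) or ""))
        pos = m.end()
    if pos < len(template):
        parts.append(template[pos:])
    return parts


def render(parts: List[Part], params: Dict[str, str]) -> Iterator[str]:
    """Stream out fragments, slipping in values; no parsing at request time."""
    for part in parts:
        if isinstance(part, str):
            yield part
        else:
            name, default = part
            yield params.get(name, default)
```

A caller might compile `"http://example.test/search?q=~q~&n=~count:10~"` once at startup and then write each chunk yielded by `render` directly to the outgoing request; the URL here is a placeholder, not one taken from this description.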
Portions of the request template may be delimited by “conditional tokens” in the form of:
Generally, conditional tokens begin with ‘?’ and end with an “=bar” condition. This example corresponds to a parameter named “foo”: when “bar” is passed, the “some content” is included. The “.default” is a default value used if “foo” isn't supplied. The content may contain replacement tokens, but cannot contain nested conditional blocks.
In this example, portions of the request template may be delimited by “repeater tokens” in the form:
Generally, repeater tokens begin with ‘!’. In this example, the parameter ‘foo’ is used to produce the repeated block. In one implementation, the repeated block is a UrlEncoded value that looks very similar to a query string. For example, the repeated block may have the form <set>&<set>& . . . , where the set is <pair1>|<pair2> . . . , where the pair is <name>=<value>. A specific example is:
This example may be decoded to:
The decoded example above contains two pipe-separated sets of ‘a’ and ‘b’ parameters. This enables the repeater content to be emitted twice. Accordingly, within the repeater block, there are replacement tokens for the ‘a’ and ‘b’ parameters. For example, a template of:
Given the above sets of values for the ‘a’ and ‘b’ parameters, the result will be:
Alternatively, the request parameters may be simplified to foo=42&bar=baz, and the result will be the same.
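Decoding a repeater parameter can be sketched in Python as follows. The sketch assumes the grammar stated above (sets joined by ‘&’, pairs within a set joined by ‘|’, each pair written as name=value, the whole value URL-encoded) and assumes ASCII-tilde replacement tokens inside the repeated body. The function names and the sample body are hypothetical; the repeater token syntax itself is not reproduced here.

```python
from typing import Dict, List
from urllib.parse import unquote


def decode_sets(encoded: str) -> List[Dict[str, str]]:
    """Decode a URL-encoded repeater value into a list of name/value sets."""
    sets: List[Dict[str, str]] = []
    for raw_set in unquote(encoded).split("&"):
        pairs: Dict[str, str] = {}
        for raw_pair in raw_set.split("|"):
            name, _, value = raw_pair.partition("=")
            pairs[name] = value
        sets.append(pairs)
    return sets


def repeat(body: str, sets: List[Dict[str, str]]) -> str:
    """Emit the repeated body once per set, substituting ~name~ tokens."""
    out: List[str] = []
    for values in sets:
        chunk = body
        for name, value in values.items():
            chunk = chunk.replace("~" + name + "~", value)
        out.append(chunk)
    return "".join(out)
```

With two decoded sets, the body is emitted twice, once with each set's ‘a’ and ‘b’ values.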
At block 1002, an input stream 112 in a first format is received by the transformation engine 108. In some implementations, the input stream 112 is in an XML format. However, in other implementations, the input stream 112 may be in a SOAP format, a JSON format, or any suitable format.
At block 1004, a mapping template is associated with the input stream 112. The mapping template may be modified to correspond to the perspective of the input stream 112 using one or more matching expressions described above.
At block 1006, the input stream 112 is converted into a stream of tokens. The stream of tokens is used as a guide through the mapping template. In some implementations, when implemented in .NET, an ITransformer object may be employed within the token stream to manipulate the input stream 112.
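A token-stream transformer of the kind mentioned at block 1006 can be sketched in Python as a generator that sits between the tokenizer and the template walk. This is a Python analog only; it does not reproduce the ITransformer interface, whose signature is not given in this text. The function name `rename_results` and the `(kind, name, value)` token tuples are hypothetical, and the sketch assumes the ‘Domain’ value precedes the ‘Results’ branch.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str, str]            # hypothetical (kind, name, value)


def rename_results(tokens: Iterable[Token]) -> Iterator[Token]:
    """Rewrite 'Results' tokens to '<Domain>Results' using one remembered value."""
    path: List[str] = []
    domain = ""                         # small intermediate state
    for kind, name, value in tokens:
        if kind == "start":
            path.append(name)
        elif kind == "leaf" and name == "" and path[-2:] == ["DomainResults", "Domain"]:
            domain = value              # remember DomainResults/Domain/
        if kind in ("start", "end") and name == "Results" and domain:
            name = domain + "Results"   # e.g. "WebResults" or "LocalResults"
        yield (kind, name, value)
        if kind == "end":
            closed = path.pop()
            if closed == "DomainResults":
                domain = ""             # forget the domain when its branch closes
```

After this rewrite, the mapping template can match ‘WebResults’ and ‘LocalResults’ with plain paths, so the rest of the traversal stays forward-only.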
At block 1008, a corresponding token is recognized by the transformation engine 108, enabling a forward traversal of the mapping template and the emission of an associated string. The associated string corresponds to information relating to output stream 114.
At block 1010, a transformed output stream 114 is created.
Although a transformation process for the transformation of an input stream using a mapping template has been described in language specific to structural features and/or methods, it is to be understood that the subject matter of the appended claims is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as example implementations.