Computing systems—i.e. devices capable of processing electronic data such as computers, telephones, Personal Digital Assistants (PDA), etc.—communicate with other computing systems by exchanging data messages according to a communications protocol that is recognizable by the systems. Such a system utilizes filter engines containing queries that are used to analyze messages that are sent and/or received by the system and to determine if and how the messages will be processed further.
A filter engine may also be called an “inverse query engine.” Unlike a database, wherein an input query is tried against a collection of data records, an inverse query engine tries an input against a collection of queries. Each query includes one or more conditions, criteria, or rules that must be satisfied by an input for the query to evaluate to true against the input.
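The inverse-query idea can be illustrated with a minimal sketch. The `Filter` type, the flat dictionary message, and the filter names below are hypothetical and are used only for illustration; they are not taken from the embodiments described herein.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical message representation: a flat dict of header values.
Message = Dict[str, str]


@dataclass
class Filter:
    name: str
    conditions: List[Callable[[Message], bool]]

    def matches(self, message: Message) -> bool:
        # A query evaluates to true only if every condition is satisfied.
        return all(cond(message) for cond in self.conditions)


def inverse_query(filters: List[Filter], message: Message) -> List[str]:
    """Try one input against a collection of queries; return the matching names."""
    return [f.name for f in filters if f.matches(message)]


filters = [
    Filter("to_billing", [lambda m: m.get("to") == "billing"]),
    Filter("urgent_billing", [lambda m: m.get("to") == "billing",
                              lambda m: m.get("priority") == "high"]),
    Filter("to_support", [lambda m: m.get("to") == "support"]),
]
matched = inverse_query(filters, {"to": "billing", "priority": "low"})
```

Note that each filter here is evaluated independently, so the shared condition on `to` is checked twice; that redundancy is what the instruction-tree approaches discussed in this description aim to reduce.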
An XPath filter engine is a type of inverse query engine in which the filters are defined using the XPath language. The message bus filter engine matches filters against eXtensible Markup Language (XML) input to determine which filters evaluate to true and which evaluate to false. In one conventional implementation, the XML input may be a Simple Object Access Protocol (SOAP) envelope or other XML document received over a network.
A collection of queries usually takes the form of one or more filter tables that may contain hundreds or thousands of queries, and each query may contain several conditions. Significant system resources (e.g., setting up query contexts, allocating buffers, maintaining stacks, etc.) are required to process an input against each query in the filter table(s) and, therefore, processing an input against hundreds or thousands of queries can be quite expensive.
Queries included in a particular system may be somewhat similar since the queries are used within the system to handle data in a like manner. As a result, several queries may contain common portions or sub-expressions that typically had to be evaluated individually. Recently, however, developments have allowed identifying redundant portions of query expressions in an attempt to reduce the processing required to evaluate each expression against inputs for each message or XML document.
For example, some inverse query systems represent an expression as a hierarchical instruction tree, in which each node of the instruction tree represents an instruction, and in which each branch of an execution path in the instruction tree when executed from a root node to a terminating branch node represents a full query expression. As such, when an input message is received, the instruction tree is iterated over by executing instructions against the inputs or a message leading from the root node to a main or common branching node. Whenever the inverse query engine encounters a branching node, which represents a divergence from some of the redundant portions of query expressions, the inverse query engine may preserve the processing context or state for the sequential execution up to that branching node. The preserved processing context is then sequentially passed to each branch leading from the branching node such that when one branch is considered fully processed the next branch may then receive the processing context to evaluate its remaining expression path.
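One way such a hierarchical instruction tree might be built is sketched below, assuming simple slash-separated path queries merged into a prefix tree. The `InstructionNode` class, the helper names, and the four queries are hypothetical. Unlike a tree with a separate branching node, this sketch treats the last shared step itself as the branching point.

```python
from typing import Dict, List, Tuple


class InstructionNode:
    def __init__(self, step: str) -> None:
        self.step = step
        self.children: Dict[str, "InstructionNode"] = {}
        self.queries: List[str] = []  # names of queries that terminate here


def build_tree(queries: Dict[str, str]) -> InstructionNode:
    """Merge path queries so that common leading steps are stored only once."""
    root = InstructionNode("")
    for name, path in queries.items():
        node = root
        for step in path.strip("/").split("/"):
            if step not in node.children:
                node.children[step] = InstructionNode(step)
            node = node.children[step]
        node.queries.append(name)
    return root


def find_stem(root: InstructionNode) -> Tuple[str, InstructionNode]:
    """Follow the single shared path until the first node with several children."""
    steps: List[str] = []
    node = root
    while len(node.children) == 1 and not node.queries:
        node = next(iter(node.children.values()))
        steps.append(node.step)
    return "/" + "/".join(steps), node


# Hypothetical queries, chosen only to show a shared prefix and a divergence.
QUERIES = {
    "Q1": "/a/b/c/d/g",
    "Q2": "/a/b/c/e/f",
    "Q3": "/a/b/c/i/j",
    "Q4": "/a/b/c/i/k",
}
tree = build_tree(QUERIES)
stem_path, branching_node = find_stem(tree)
```

In this sketch the shared prefix is executed once per message, and only the sub-paths below `branching_node` need the preserved context handed to them.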
Although these systems allow for the processing of query expressions to occur more rapidly, there are still several drawbacks and shortcomings to such systems. For example, as mentioned above, when a query engine encounters a branching node, it evaluates the branches individually by sequentially handing state from the branching node to each extending branch. Assuming each extending branch operates on an input nodeset with the same axis, the input nodeset will be passed to each branch, each of which will iterate over all the input nodes, choosing the ones that fit its criteria. Thus, this process takes the form of an O(mn) algorithm, wherein m equals the number of nodes in the nodeset and n equals the number of extending branches. When the number of branches or “splay” extending from the branching node is small, such evaluation may be acceptable. With a large splay, however, iterating over the nodeset once per branch can consume significant processing time and resources.
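The per-branch cost can be made concrete with a small sketch. It assumes, purely for illustration, that input nodes and branch criteria are plain name strings compared for equality; the function name and the comparison counter are hypothetical.

```python
from typing import Dict, List, Tuple


def evaluate_per_branch(nodeset: List[str],
                        branches: List[str]) -> Tuple[Dict[str, List[str]], int]:
    """Conventional evaluation: the whole nodeset is re-scanned for every branch."""
    comparisons = 0
    results: Dict[str, List[str]] = {}
    for branch in branches:        # n extending branches
        selected: List[str] = []
        for node in nodeset:       # m input nodes, visited again for each branch
            comparisons += 1
            if node == branch:
                selected.append(node)
        results[branch] = selected
    return results, comparisons


results, comparisons = evaluate_per_branch(["d", "e", "d", "x"], ["d", "e", "i"])
```

With m = 4 input nodes and n = 3 branches the counter reaches m × n = 12, and it grows linearly with the splay even when most branches select nothing.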
The above-identified deficiencies of current inverse query filter engines are overcome through example embodiments of the present invention. For example, embodiments described herein provide for efficiently evaluating select instructions of query expressions by simultaneously producing outputs thereof. Note that this Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Example embodiments provide for an inverse query filter engine sequentially executing instructions in an instruction tree leading from a root node to a branching node based on input(s) within a received message, wherein the branching node includes a plurality of query sub-paths extending therefrom. The branching node is then identified as a select index node in that some of the query sub-paths include select instruction nodes that operate on input nodes that share a common axis relative to the processing context for the sequential execution up to the branching node. Based upon identifying the branching node as a select index node, empty node sets are initialized within an associative array data structure for each of the select instruction nodes. The identifiers for each of the input nodes are compared to indices within the associative data structure for determining matches thereof. Based upon the comparison, at least one of the input nodes is added to the empty node set(s) within the associative data structure for simultaneously producing results of evaluating each of the select instruction nodes. Accordingly, the results are passed to branches extending from each of the plurality of query sub-paths for further evaluation.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The features and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.
In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
The present invention extends to methods, systems, and computer program products for evaluating nodes in parallel that diverge from a branch of an instruction tree for an inverse query engine. The embodiments of the present invention may comprise a special purpose or general-purpose computer including various computer hardware, as discussed in greater detail below.
Exemplary embodiments provide for select indexing, which uses a select index (an associative array data structure, e.g., a hash table) to simultaneously construct output sets for instructions that operate on input nodesets with common axes. More specifically, during evaluation of a message (e.g., an XML document) against an instruction tree, when a branching node with typically a large “splay” is iterated over, rather than iterating over an input nodeset with a common axis once per branch, the select indexing described herein iterates over the nodeset once in total.
More specifically, the select index data structure includes an array of empty node sets for instructions that both branch from a branching node and evaluate input nodes that share a common axis. Identifiers (e.g., namespace, local name, Qname, type, etc.) for the input nodeset of the common axis are then compared to indices within the select index for identifying those instructions interested in the input node. As matches are found, the input node is added to the corresponding nodeset for the appropriate instruction. The results of the evaluation, along with the processing context, are passed to the branches of the select instructions for further processing. Accordingly, embodiments described herein generate the exact same nodesets as a typical inverse query engine, except the select indexing creates the resultant output sets simultaneously rather than in series.
Although more specific reference to advantageous features is described in greater detail below with regard to the Figures, embodiments within the scope of the present invention also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of computer-readable media.
Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
As used herein, the term “module” or “component” can refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system (e.g., as separate threads). While the system and methods described herein are preferably implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated. In this description, a “computing entity” may be any computing system as previously defined herein, or any module or combination of modules running on a computing system.
Although the instruction tree 120 is illustrated schematically as a box in
To clarify this principle, a specific example is provided with respect to
These groups are represented in
A “stem” of the instruction tree is defined as those instructions that lead from a root node of the instruction tree to the first branching node of the instruction tree. For example, the instruction tree 200 has a root node “/a” and a first branching node “BN1”. Accordingly, the stem of the instruction tree is represented by the instruction group sequence “/a/b/c”. The first branching node will also be referred to herein as a “first-order” branching node or “main” branching node. For example, node “BN1” is the first-order or main branching node of instruction tree 200.
It should be noted that the terms “branch”, “sub-expression”, and “sub-path” are used herein interchangeably to refer to any portion of an overall expression. For example, the stem “/a/b/c” is a sub-path for all of the queries Q1 to Q6, and the branch formed by “/d/g” extending from main branching node “BN1” is a portion of the query Q1. Note, however, that the stem “/a/b/c”, although referred to as a sub-expression, may also be considered a branch since it extends from the root node “/a”, thereby forming the only branch of the root node.
In one embodiment, the queries are XPath queries. XPath is a functional language for representing queries that are often evaluated against XML documents. During conventional evaluation of inverse query paths (such as XPath statements) against XML documents, there is significant looping in order to fully navigate the XML document. For example, if the XML document has one parent element having at least one child element, at least one of the child elements having at least one second-order child element, and at least one of the second-order child elements having at least one third-order child element, there would be a three layer “for” loop nest conventionally used to navigate the tree.
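The three-layer loop nest described above can be sketched with Python's standard `xml.etree.ElementTree` module. The sample document and the helper name `third_order_children` are hypothetical and serve only to show the shape of the nesting.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical document: one parent, with children, second-order children,
# and third-order children.
doc = ET.fromstring("<a><b><c><d>1</d><e>2</e></c><c><d>3</d></c></b></a>")


def third_order_children(parent: ET.Element) -> List[str]:
    """Navigate three levels below the parent with one loop per level."""
    found: List[str] = []
    for child in parent:                  # first-order children
        for second in child:              # second-order children
            for third in second:          # third-order children
                found.append(third.tag)
    return found


tags = third_order_children(doc)
```

Each additional level of the document adds another loop to the nest, which is why conventional navigation involves significant looping.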
A loop is an expression that executes a group of one or more sub-expressions repeatedly. Each repetition is termed an “iteration”. The number of times a loop iterates over a group of one or more sub-expressions is known as the loop's “iteration count”.
Conventional loops run sequentially from a branching node. A loop with an iteration count of “n” evaluates its groups of one or more sub-expressions “n” times, one iteration at a time, with the second iteration beginning from the branching node only when the first completes. This process is repeated for each branch or sub-path extending from the branching node; thus nodesets (i.e., those input nodes with a common axis relative to the branching node, e.g., children, descendant, parent(s), sibling, etc.) are iterated over once per branch. Each of these iterations, however, has implicit overhead, such as the stack manipulation required to make function calls.
In accordance with example embodiments, rather than iterating over an input nodeset with a common axis once per branch, the select indexing described herein iterates over the nodeset once in total. This is accomplished by identifying select index nodes that include select branching instructions that operate on inputs with a common axis. A select index or associative array data structure (e.g., a hash table) is then used to generate empty nodesets indexed to each of the select instructions. The input nodeset for the common axis is iterated through by looking up identifiers, e.g., local name, namespace, or type of input for a type test. These identifiers are then used as the keys into the select index table for identifying matching select instruction(s), whereupon the node is added to the empty (or populated, as the case may be) nodeset for the appropriate instruction. The resultant context is then passed to the branches of the select instructions for further processing. Accordingly, embodiments described herein generate the exact same nodesets as typical inverse query engines described above, except the select indexing produces simultaneous construction of all of the output sets by iterating over the input nodeset once in total, rather than once per branch.
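A minimal sketch of the single-pass technique follows, using a Python `dict` as the hash table. The node representation (a pair of local name and text) and the function names are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical input node: (local_name, text). Only the name is used as the key.
Node = Tuple[str, str]


def build_select_index(select_instructions: Iterable[str]) -> Dict[str, List[Node]]:
    """Create one empty nodeset per select instruction, keyed by its name."""
    return {name: [] for name in select_instructions}


def evaluate_with_select_index(nodeset: Iterable[Node],
                               select_instructions: Iterable[str]) -> Dict[str, List[Node]]:
    """Iterate over the input nodeset once in total, filling every output set."""
    index = build_select_index(select_instructions)
    for node in nodeset:                # single pass over the m input nodes
        bucket = index.get(node[0])     # hash lookup instead of a scan per branch
        if bucket is not None:
            bucket.append(node)         # nodeset may already be populated
    return index


nodes = [("d", "1"), ("e", "2"), ("d", "3"), ("x", "4")]
output_sets = evaluate_with_select_index(nodes, ["d", "e", "i"])
```

The output sets are the same ones a per-branch scan would produce, but they are constructed together in one pass, so the work depends on the size of the nodeset rather than on the product of nodeset size and splay (assuming constant-time hash lookups on average).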
For example, when the evaluation of message 355 against instruction tree 305 reaches the main branching node BN1, the message processor 330, or some other module, may first identify the branching node as a select index node 310. Note that the identification of select index node 310 may occur in any one of numerous well-known ways; however, in order for a branching node, e.g., BN1, to qualify as a “select index” node 310, it should meet some basic requirements. For example, the select index node 310 acts as a super node that includes instruction nodes for query expressions with common properties. More specifically, as shown in
In addition, the instruction nodes within the node set should operate on the same input axis within a message. For example, as shown in
Note that these instruction nodes “/d”, “/e”, and “/i” selectively operate on the input nodes of the common axis by default. More specifically, because a resultant operating axis is not specified in each of these examples, the default of child is assumed. Embodiments described herein, however, are also applicable to other axes relative to the branching node. In fact, as previously noted, all that is needed is that the instruction nodes within the select index node 310 all operate on the same or common input axis 365. Accordingly, the axis may be any one of a child, parent, grandparent, descendant, sibling, or any other known axis relative to the node from which they diverge. Accordingly, any specific reference to any specific axis as described herein is used for illustrative purposes only and is not meant to limit or otherwise narrow the scope of the embodiments unless explicitly claimed.
Regardless of the common axis 365 on which the instruction nodes (e.g., “/d”, “/e”, and “/i”) within the select index node 310 operate, once message processor 330, or other component, identifies the branching node as a select index node 310, the message processor will use a select index for the common axis 350 for generating an index table as described in greater detail below. Note that the select index for the common axis 350 may be chosen from a plurality of select indexes 345 generated for different common axes 365. Further note that the indexes may be generated on the fly or predefined at compile time. In fact, embodiments herein contemplate many well-known ways of generating the select index tables or associative data structures, and allow for many different forms thereof (e.g., hash tables). Accordingly, any specific reference to any way of generating an index table 350 or to any specific type of array is used herein for illustrative purposes only and is not meant to limit or otherwise narrow the scope of embodiments herein unless explicitly claimed.
Regardless of how or what type of select index 350 for common axis 365 is generated, the resultant associative array will generate an empty nodeset 340 for each select instruction node in the select index node 310. For example, as shown by the “{}” brackets in the select index for the common axis 350, the instruction nodes “/d”, “/e”, and “/i” have corresponding empty node sets 340 to which they are indexed. The inputs 360 within the message 355 that correspond to the common axis (in this example, shown circled 365 in message 355) may then be compared to the select index for the common axis 350.
As can be seen, the common axis 365 in message 355 includes input nodes “<d>” and “<e>”. Accordingly, the namespaces, local names, qualified names, or types (in the case of type tests) of these inputs or elements may then be compared with the values within the select index for the common axis 350. Note, however, that any other alphanumeric or other equality value may be used for determining matches. In fact, anything that can be used as the key to a random access, associative data structure can be used as the basis for the comparison. Accordingly, any specific value is used for illustrative purposes only unless otherwise explicitly claimed. In addition, note that for efficiency purposes the select indexes 345 will typically be hash tables; however, that need not be the case. In any event, if the select index for the common axis 350 is a hash table, the inputs will be converted to hash values using message processor 330 or some other component for comparison against the select index table.
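As a sketch of how a qualified name might serve as the key, the following assumes the standard `xml.etree.ElementTree` parser, which reports namespaced tags in the form `{namespace}local`. The namespace URI, the `node_key` helper, and the sample headers are hypothetical. Python's `dict` hashes the (namespace, local name) tuple internally, which stands in for the conversion to hash values described above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

NS = "urn:example:headers"  # hypothetical namespace used only for this sketch


def node_key(element: ET.Element) -> Tuple[str, str]:
    """Split an ElementTree tag into a (namespace, local name) key."""
    tag = element.tag
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


# One empty nodeset per select instruction, keyed by qualified name.
select_index: Dict[Tuple[str, str], List[ET.Element]] = {
    (NS, "d"): [],
    (NS, "e"): [],
    (NS, "i"): [],
}

context = ET.fromstring(
    '<c xmlns:h="urn:example:headers"><h:d/><h:e/><d/></c>'
)
for child in context:  # the child axis of the context node
    bucket = select_index.get(node_key(child))
    if bucket is not None:
        bucket.append(child)
```

Here the unqualified `<d/>` element is not selected, because its key carries an empty namespace; this shows why the choice of identifier used as the key determines which inputs count as matches.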
Regardless of the type of alphanumeric values used in the comparison, message processor 330 iterates through the common axis 365 searching for the appropriate identifiers (e.g., Qnames for nodes “<d>” and “<e>”) to compare to the indices (e.g., “/d”, “/e”, and “/i”) within the table 340 and matches the appropriate input nodes 365 with the corresponding index in the select index 350. In this example, note that the select index for the common axis 350 matches the <d> input element from the common axis 365 in the message 355 with instruction node “/d”. Similarly, “<e>” input node from the common axis 365 matches with the “/e” instruction node, whereas there is no match in the common axis 365 for “/i” instruction node.
Accordingly, as matches are identified, their corresponding empty node set 340 is populated with the appropriate input node information to produce the output results 365. These results 365 may then be passed using message processor 330 to the branches extending from the instruction nodes, in this case “/d”, “/e”, and “/i”, for further processing. More specifically, the results will be passed to the children nodes of the select index node 310, in this case “/g”, “/f”, and “BN3”. Note that although in this example only one match was found for each empty nodeset, this need not be the case. Accordingly, the empty nodesets within the instruction tree may be populated with multiple input nodes for passing to the corresponding branches extending from the select index node 310.
Also as previously noted, the above process can be repeated for any other branching node within the instruction tree 305. For example, other select indexes 345 may be used for other axes relative to other branching nodes, e.g., BN2, BN3, etc. Typically, however, the branching nodes should have a large splay in order to reap optimal benefits from embodiments described herein.
The present invention may also be described in terms of methods comprising functional steps and/or non-functional acts. The following is a description of steps and/or acts that may be performed in practicing the present invention. Usually, functional steps describe the invention in terms of results that are accomplished, whereas non-functional acts describe more specific actions for achieving a particular result. Although the functional steps and/or non-functional acts may be described or claimed in a particular order, the present invention is not necessarily limited to any particular ordering or combination of steps and/or acts. Further, the use of steps and/or acts in the recitation of the claims, and in the following description of the flow diagram for FIG. 4, is used to indicate the desired specific use of such terms.
As previously mentioned,
Method 400 includes an act of sequentially executing 405 instructions in the instruction tree leading from a root node to a branching node. For example, based on inputs 360 within a received message 355, message processor 330 within inverse query engine 300 (e.g., an XPath query filter engine) can sequentially execute instructions in instruction tree 305 leading from root node “/a” to main branching node BN1. Note that the main branching node includes a plurality of query sub-paths (“/d”, “/e”, and “/i”) for a plurality of query expressions (e.g., Q1 to Q6) leading from the branching node BN1. Further, the message may be an XML document such as a SOAP message that includes headers, and the common axis 365 may represent a series of headers in the XML document.
Upon encountering the branching node, method 400 includes a step for simultaneously producing 430 results of evaluating select index node(s). More specifically, step for 430 includes an act of identifying 410 the branching node as a select index node. For example, message processor 330, or other component, may identify branching node BN1 as part of a select index node 310 in that the query sub-paths extending therefrom (e.g., “/d”, “/e”, and “/i”) each include a select instruction node that operates on input nodes 360 for the message 355 that share a common axis 365 relative to the processing context for the sequential execution up to the branching node BN1. Note that the common axis 365 may be a child, parent, sibling, descendant, or any other related node relative to the branching node.
Based on the identification of the branching node as a select index node, step for 430 also includes an act of initializing 415 empty nodesets within an associative array data structure. For example, upon identifying the branching node BN1 as corresponding to a select index node 310, message processor 330 may initialize empty nodesets 340 within select index for common axis 350 for each of the select instruction nodes, e.g., “/d”, “/e”, and “/i”. Note, as mentioned previously, that the initialization of the empty node sets 340 may include retrieving the select index for common axis 345 from memory, generating it on the fly, or obtaining it in any other well-known way.
In any event, step for 430 also includes an act of comparing 420 identifiers for inputs to indices within the associative array data structure. For example, message processor 330 may compare the input nodes “<d>” and “<e>” to the indices “/d”, “/e”, and “/i” within the select index 350 for determining matches thereof. Based on the comparison, step for 430 includes an act of adding at least one of the inputs to the empty node set(s). For instance, upon identifying the matches for “/d” and “/e”, message processor 330 may add the input nodes “<d>” and “<e>” to the empty node sets 340 within the select index 350 for simultaneously producing results 365 of evaluating each of the select instruction nodes, e.g., “<d>” and “<e>”.
Finally, method 400 includes an act of passing the results to the sub-paths for further evaluation. For example, the child nodes of the select index node 310 may be passed the results 365 for further evaluation.
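The acts of method 400 can be strung together in a short end-to-end sketch. It assumes the child axis throughout, treats the identification of the branching node as a select index node as already done, and uses a hypothetical instruction tree (`STEM`, `SUB_PATHS`) in which each select instruction is followed by a single further step; the step “h” under “/i” is invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical instruction tree: a shared stem, then select instructions,
# each followed by one further step on its own branch.
STEM = ["a", "b", "c"]
SUB_PATHS = {"d": "g", "e": "f", "i": "h"}


def execute_stem(root: ET.Element, stem: List[str]) -> List[ET.Element]:
    """Sequentially execute the shared instructions; return the context nodes."""
    if root.tag != stem[0]:
        return []
    context = [root]
    for step in stem[1:]:
        context = [child for node in context for child in node if child.tag == step]
    return context


def evaluate(root: ET.Element) -> Dict[str, bool]:
    context = execute_stem(root, STEM)
    # Initialize one empty nodeset per select instruction.
    select_index: Dict[str, List[ET.Element]] = {name: [] for name in SUB_PATHS}
    # Compare identifiers and add matches in a single pass over the child axis.
    for node in context:
        for child in node:
            bucket = select_index.get(child.tag)
            if bucket is not None:
                bucket.append(child)
    # Pass each resulting nodeset to its own branch for further evaluation.
    results: Dict[str, bool] = {}
    for name, next_step in SUB_PATHS.items():
        results["/" + name + "/" + next_step] = any(
            grandchild.tag == next_step
            for selected in select_index[name]
            for grandchild in selected
        )
    return results


message = ET.fromstring("<a><b><c><d><g/></d><e><x/></e></c></b></a>")
outcome = evaluate(message)
```

Only the final loop runs once per branch, and it operates on the already-partitioned nodesets rather than on the full input nodeset.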
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Number | Date | Country |
---|---|---|
20070078874 A1 | Apr 2007 | US |