This disclosure is related to simplifying tree expressions, such as for pattern matching.
In a variety of fields, data, or a set of data, may be represented in a hierarchical fashion. This form of representation may, for example, convey information, such as particular relationships or patterns between particular pieces of data or groups of data and the like. However, manipulating and/or even recognizing specific data representations or patterns is not straightforward, particularly where the data is arranged in a complex hierarchy. Without loss of generality, examples may include a database, and further, without limitation, a relational database. Techniques for performing operations on such databases or recognizing specific patterns, for example, are computationally complex, time consuming, and/or otherwise cumbersome. A need, therefore, continues to exist for improved techniques for performing such operations and/or recognizing such patterns.
Subject matter is particularly pointed out and distinctly claimed in the concluding portion of the specification. The claimed subject matter, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference of the following detailed description when read with the accompanying drawings in which:
In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the claimed subject matter. However, it will be understood by those skilled in the art that the claimed subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components and/or circuits have not been described in detail so as not to obscure the claimed subject matter.
Some portions of the detailed description which follow are presented in terms of algorithms and/or symbolic representations of operations on data bits or binary digital signals stored within a computing system, such as within a computer or computing system memory. These algorithmic descriptions and/or representations are the techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. An algorithm is here, and generally, considered to be a self-consistent sequence of operations and/or similar processing leading to a desired result. The operations and/or processing involve physical manipulations of physical quantities. Typically, although not necessarily, these quantities may take the form of electrical and/or magnetic signals capable of being stored, transferred, combined, compared and/or otherwise manipulated. It has proven convenient, at times, principally for reasons of common usage, to refer to these signals as bits, data, values, elements, symbols, characters, terms, numbers, numerals and/or the like. It should be understood, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels. Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout this specification discussions utilizing terms such as “processing”, “computing”, “calculating”, “determining” and/or the like refer to the actions and/or processes of a computing platform, such as a computer or a similar electronic computing device, that manipulates and/or transforms data represented as physical electronic and/or magnetic quantities and/or other physical quantities within the computing platform's processors, memories, registers, and/or other information storage, transmission, and/or display devices.
As previously discussed, in a variety of fields, it is convenient and/or desirable to represent data, a set of data and/or other information in a hierarchical fashion. In this context, such a hierarchy of data shall be referred to as a “tree.” In a particular embodiment, a tree may comprise a finite, rooted, connected, acyclic graph. Likewise, such trees may be either ordered or unordered. Here, ordered refers to the notion that there is an ordering or precedence among nodes attached to a common node corresponding to the order of the attached nodes shown in a graphical illustration. An ordered tree is illustrated here, for example, in
As previously suggested, in a variety of contexts, it may be convenient and/or desirable to represent a hierarchy of data and/or other information using a structure, such as the embodiment illustrated in
One example of an ordered BELT is illustrated by embodiment 200 of
A subset of BELTs may be referred to, in this context, as binary edge labeled strings (BELSs). One embodiment, 400, is illustrated in
As may be apparent by a comparison of
Despite the prior observation, as shall be described in more detail hereinafter, an association may be made between any particular binary edge labeled string and a binary edge labeled tree or vice-versa, that is, between any particular binary edge labeled tree and a binary edge labeled string. See, for example, U.S. provisional patent application Ser. No. 60/543,371, filed on Feb. 9, 2004, titled “Manipulating Sets of Hierarchical Data,” assigned to the assignee of the presently claimed subject matter. In particular, an association may be constructed between binary edge labeled trees and binary edge labeled strings by enumerating in a consecutive order binary edge labeled strings and binary edge labeled trees, respectively, and associating the respectively enumerated strings and trees with natural numerals. Of course, many embodiments of associations between trees, whether or not BELTs, and strings, whether or not BELSs, or between trees, whether or not BELTs, and natural numerals are possible. It is intended that the claimed subject matter include such embodiments, although the claimed subject matter is not limited in scope to the aforementioned provisional patent application or to employing any of the techniques described in the aforementioned provisional patent application.
Binary edge labeled trees may also be listed or enumerated. See, for example, previously cited U.S. provisional patent application Ser. No. 60/543,371. This is illustrated, here, for example, in
However, for this particular embodiment, although the claimed subject matter is not limited in scope in this respect, a method of enumerating a set of ordered trees may begin with enumeration of an empty binary edge labeled tree and a one node binary edge labeled tree. Thus, the empty tree is associated with the natural numeral zero and has a symbolic representation as illustrated in
As illustrated, for this particular embodiment, and as previously described, the empty tree has zero nodes and is associated with the natural numeral zero. Likewise, the one node tree root comprises a single node and is associated with the natural numeral one. Thus, to obtain the tree at position two, a root node is attached and connected to the prior root node by an edge. Likewise, here, by convention, the edge is labeled with a binary zero. If, however, the tree formed by the immediately preceding approach were present in the prior enumeration of trees, then a similar process embodiment is followed, but, instead, the new edge is labeled with a binary one rather than a binary zero. Thus, for example, to obtain the binary edge labeled tree for position three, a new root node is connected to the root node by an edge and that edge is labeled with a binary one.
Continuing with this example, to obtain the binary edge labeled tree for position four, observe that numeral four is the product of numeral two times numeral two. Thus, a union is formed at the root of two trees, where, here, each of those trees is associated with the positive natural numeral two. Likewise, to obtain the binary edge labeled tree for position five, begin with the binary edge labeled tree for position two and follow the previously articulated approach of adding a root and an edge and labeling it with a binary zero.
In this context, adding a root node and an edge and labeling it binary zero is referred to as a “zero-push” operation and adding a root node and an edge and labeling it binary one is referred to as a “one-push” operation. Thus, referring again to
In the embodiment just described, binary edge labeled trees use binary numerals “0” and “1.” However, the claimed subject matter is not limited in scope to binary edge labeled trees. For example, trees may employ any number of numeral combinations as labels, such as triplets, quadruplets, etc. Thus, using a quadruplet example, it is possible to construct trees, such as a zero-push of a particular tree, a one-push of that tree, a two-push of that tree, and a three-push of that tree. Thus, for such trees, edges may be labeled 0, 1, 2 or 3, etc.
The foregoing discussion has begun to characterize an algebra involving trees, in this particular embodiment, an algebra for ordered binary edge labeled trees or ordered BELTs. The foregoing discussion, therefore, defines a value zero, a zero node tree for this particular embodiment, value one, a one node tree for this particular embodiment, and a monadic operation, previously described as zero-push. For this particular embodiment, the push operation shall also be referred to as the successor operation. For this particular embodiment, this shall be denoted as S(x), where x refers to the tree to which the successor operation is applied. Of course, the claimed subject matter is not limited in scope to the successor operation, S(x), being limited to a zero-push. For example, alternatively, a “one-push” may be employed. For this embodiment, this is analogous, for example, to the convention that “0” represents “off” and “1” represents “on.” Alternatively and equivalently, “1” may be employed to represent “off,” and “0” may be employed to represent “on,” without loss of generality.
For this particular embodiment, two additional operations may be characterized, an “inversion” operation and a “merger” operation. For this particular embodiment, the inversion operation, when applied to a binary edge labeled tree, such as an ordered BELT, refers to replacing a “1” with a “0” and replacing a “0” with a “1”. Likewise, the merger operation with respect to trees refers to merging two trees at their roots. These two operations are illustrated, for example, in
As will now be appreciated, the inversion operation comprises a monadic operator while the merger operation comprises a binary operator. Likewise, the constants zero/one, referred to above, may be viewed as an operation having no argument or as a zero argument operator or operation. Thus, this operation, in effect, returns the same value whenever applied. Here, for this particular embodiment, the constant value zero, or zero argument operation that returns “0,” is denoted as “c,” the merger operator is denoted as “*”, the inversion operation is denoted as “'”, and the successor operator is denoted as previously described.
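The four operations just named can be sketched in Python under one possible encoding. The encoding is an assumption made for illustration and does not come from the specification: an ordered BELT is a tuple of (label, child) edges read left to right, the one node tree is the empty tuple, and the zero tree is a sentinel. The names ZERO, ONE, S, inv and merge are likewise hypothetical.

```python
# Hypothetical encoding (not from the specification): an ordered BELT is a
# tuple of (label, child) edges read left to right; the zero tree is a sentinel.
ZERO = None   # the constant c (zero tree)
ONE = ()      # S(c): the one node tree, a root with no edges


def S(x, label=0):
    """Successor: a zero-push by default, a one-push when label=1."""
    if x is ZERO:
        return ONE               # the successor of the zero tree is the one node tree
    return ((label, x),)         # new root joined to the old root by one labeled edge


def inv(x):
    """Inversion: swap every edge label 0 <-> 1 throughout the tree."""
    if x is ZERO:
        return ZERO
    return tuple((1 - label, inv(child)) for label, child in x)


def merge(x, y):
    """Merger: join two trees at their roots, keeping edge order."""
    if x is ZERO or y is ZERO:
        return ZERO              # merging with the zero tree yields the zero tree
    return x + y


two = S(ONE)              # position two: a single edge labeled 0
three = S(ONE, label=1)   # position three: a single edge labeled 1
four = merge(two, two)    # position four: two edges labeled 0 at the root
five = S(two)             # position five: zero-push of the tree at position two
```

Under this encoding, inverting the tree at position two gives the tree at position three, and merging any tree with ONE leaves it unchanged.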
It may be shown that ordered binary edge labeled trees satisfy the following nine tree expressions referred to, for this particular embodiment, as the “basis expressions.” Specifically, any combination of four operations on a set of objects that satisfy these nine basis expressions generates an isomorph of the ordered binary edge labeled trees under the operations described above.
x″=x; (1)
((x*y)*z)=(x*(y*z)); (2)
(c*x)=c; (3)
(x*c)=c; (4)
(S(c)*x)=x; (5)
(x*S(c))=x; (6)
c′=c; (7)
S(c)′=S(c); (8)
(x*y)′=(x′*y′). (9)
Thus, examining these tree expressions, in turn, should provide insight into the properties of ordered BELTs. The first tree expression above demonstrates that applying the inversion operation successively to an object returns the original object, here an ordered binary edge labeled tree. The second tree expression above is referred to as the associative property. This property with respect to ordered binary edge labeled trees is demonstrated, for example, in
As previously described, for this particular embodiment, the zero tree is denoted by the constant “c.” Thus, as illustrated, in tree expressions (3) and (4), the merger of the zero tree with any other ordered BELT produces the zero tree. Likewise, the successor of the zero tree is the one tree or one node tree, also referred to here as the unity tree. As illustrated by tree expressions (5) and (6), a merger of the unity tree with any other ordered BELT produces the ordered BELT. Likewise, as demonstrated by these tree expressions, changing the order of the merger does not change the result for either the zero tree or the unity tree.
Tree expressions (7) and (8) demonstrate that the inverse or inversion of the zero tree and the inverse or inversion of the unity tree in each respective case produces the same tree. Likewise, the final tree expression above, (9), demonstrates the distributive property for the inversion operation over the merger operation. Thus, the inverse of the merger of two ordered BELTs provides the same result as the merger of the inverses of the respective ordered BELTs. In this context, any set of tree expressions, such as the nine basis tree expressions, may be thought of as a symmetric set of “re-write rules” with the tree expression on the left-hand side of an equality sign being replaced by the tree expression on the right-hand side of the equality sign or vice-versa.
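As a sanity check rather than a proof, the nine basis expressions can be tested by brute force on a handful of small trees. The sketch below assumes a tuple encoding that is not part of the specification (an ordered BELT is a tuple of (label, child) edges, the zero tree is None) and uses the hypothetical helpers small_trees and basis_holds.

```python
from itertools import product

ZERO, ONE = None, ()   # assumed encoding: c is None, S(c) is the empty edge tuple


def S(x):
    return ONE if x is ZERO else ((0, x),)


def inv(x):
    return ZERO if x is ZERO else tuple((1 - lab, inv(ch)) for lab, ch in x)


def merge(x, y):
    return ZERO if x is ZERO or y is ZERO else x + y


def small_trees(rounds=2):
    """Close {c, S(c)} under S, inversion and merger for a few rounds."""
    trees = {ZERO, ONE}
    for _ in range(rounds):
        new = set(trees)
        for x in trees:
            new.add(S(x))
            new.add(inv(x))
        for x, y in product(trees, repeat=2):
            new.add(merge(x, y))
        trees = new
    return trees


def basis_holds(trees):
    """Return True when expressions (1) through (9) hold on the given trees."""
    c, one = ZERO, S(ZERO)
    unary = all(
        inv(inv(x)) == x                                   # (1)
        and merge(c, x) == c and merge(x, c) == c          # (3), (4)
        and merge(one, x) == x and merge(x, one) == x      # (5), (6)
        for x in trees
    )
    constants = inv(c) == c and inv(one) == one            # (7), (8)
    binary = all(
        inv(merge(x, y)) == merge(inv(x), inv(y))          # (9)
        for x, y in product(trees, repeat=2)
    )
    ternary = all(
        merge(merge(x, y), z) == merge(x, merge(y, z))     # (2)
        for x, y, z in product(trees, repeat=3)
    )
    return unary and constants and binary and ternary


print(basis_holds(small_trees()))   # True
```

A finite check of this kind only shows that the assumed encoding does not contradict the basis expressions on the sampled trees.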
One additional aspect of the foregoing basis relationships that was omitted from this embodiment, but that might be included in alternate embodiments, is the addition of a second monadic operator, denoted here as “T(x).” This particular operator is omitted here without loss of generality at least in part because it may be defined in terms of operators previously described. More particularly, T(x)=S(x′)′, may be included in alternate embodiments. This approach, though not necessary from an implementation perspective, may add some symmetry and elegance to the above basis relationships. For example, it may be demonstrated that S(x)′=T(x′) and S(x′)=T(x)′. In some respects, this relationship is analogous to the relationship between the logical operations OR and AND in Boolean algebra, where ¬(A AND B)=¬A OR ¬B, and ¬(A OR B)=¬A AND ¬B. However, as indicated above, this may be omitted without loss of generality and, therefore, for implementation purposes, it may be easier to implement four operators rather than five.
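Both identities follow in one step each from the definition T(x)=S(x′)′ together with basis expression (1). A short derivation, written here in LaTeX notation for clarity:

```latex
\begin{align*}
T(x') &= S\big((x')'\big)' = S(x)'
  && \text{definition of } T \text{, then expression (1) applied to } (x')',\\
T(x)' &= \big(S(x')'\big)' = S(x')
  && \text{definition of } T \text{, then expression (1) applied to } S(x').
\end{align*}
```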
As is well-known and straightforward, a set of rewrite rules defines an equivalence relationship if and only if the rewrite rules are reflexive, symmetric and transitive. Likewise, as follows from another well-known mathematical theorem concerning such reflexive, symmetric, and transitive relations, see, for example, Theorem 16.3 on page 65 of An Introduction to Modern Algebra and Matrix Theory, by Ross A. Beaumont and Richard W. Ball, available from Rinehart and Co., NY, 1954, such a set of rewrite rules divides the ground terms of the associated algebra, here binary edge labeled trees with no variables, into disjoint equivalence classes that may be viewed as elements of a model of the tree expressions. The operations indicated and/or implied by the tree expressions, therefore, are interpreted as operations applied to one or more equivalence classes to produce an equivalence class. For example, in the case of the binary operation merger, to know the equivalence class of the result of a merger, a representative ordered BELT is selected from a first equivalence class, a representative is selected from a second equivalence class and the representatives are combined. The product of a merger operation, in this embodiment, is the equivalence class associated with the merger of the representatives. A similar approach may be likewise applied to the previously described monadic operations.
As previously suggested, a set of operations that satisfies the nine basis tree expressions previously described is isomorphic to the set of finite, rooted, ordered binary edge labeled trees. Thus, or more particularly, there is a one-to-one relationship between the equivalence classes that satisfy the nine tree expressions and the finite, rooted, ordered binary edge labeled trees. In this particular embodiment, this algebra includes the zero tree, denoted c, the unity tree, denoted S(c), the monadic operator of inversion, denoted ' (apostrophe), the monadic operator of succession, denoted S(x), and the binary operator of merger, denoted * (asterisk).
For this particular embodiment, to illustrate the concept of equivalence classes,
Of course, as previously alluded to, for this particular embodiment, a useful distinction is also made between an ordered binary edge labeled tree and an unordered binary edge labeled tree. In this context, and as previously suggested, the notion of “ordered” refers to the property that the nodes attached to a particular node form an ordered set, the order corresponding to the order in which those nodes are displayed in the graph of the tree. However, it may likewise be observed that two ordered trees are resident in the same equivalence class of unordered BELTs if and only if the two trees are commutative translates of each other. In other words, the two trees are equivalent and in the same unordered BELT equivalence class where the trees differ only in the order of the attached nodes. This prior observation demonstrates that by adding a tenth tree expression as follows:
(x*y)=(y*x) (10a)
referred to here as commutativity, the nine basis tree expressions, plus the tree expression above, form a model or algebra that is isomorphic to the unordered binary edge labeled trees. In a similar fashion, if, instead, the tenth tree expression is the following expression:
S(x*y)=S(x)*y (10b)
referred to here as linearity, alternately, an algebra of objects that satisfies such ten tree expressions is isomorphic to the four operations applied to the set of finite binary edge labeled strings. See, for example, previously referenced U.S. provisional patent application 60/543,371.
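The commutativity case can be illustrated by choosing one representative per unordered equivalence class. The sketch below is an assumption-laden illustration, not the specification's method: it uses a hypothetical tuple encoding (a tree is a tuple of (label, child) edges, the zero tree is None) and picks the representative by sorting the edges at every node.

```python
ZERO = None   # assumed encoding of the zero tree c


def canonical(x):
    """Sort the edges at every node so that sibling order no longer matters."""
    if x is ZERO:
        return ZERO
    return tuple(sorted((label, canonical(child)) for label, child in x))


def same_unordered_class(x, y):
    """True when two ordered trees differ only in the order of attached nodes."""
    return canonical(x) == canonical(y)


a = ((0, ()), (1, ()))   # root with a 0-edge followed by a 1-edge
b = ((1, ()), (0, ()))   # the same edges in the opposite order

print(a == b)                      # False: distinct as ordered trees
print(same_unordered_class(a, b))  # True: commutative translates of each other
```

Sorting is only one way to pick a representative; any deterministic ordering of sibling edges would serve the same purpose.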
The power of the observations made above, as shall become more clear hereinafter, is that by applying the nine basis tree expressions, it shall be possible, through repetitive application of algebraic manipulations, to simplify and, in some cases, solve a tree expression in ordered binary edge labeled trees. Likewise, in a similar fashion a similar proposition may be made and a similar process may be applied using a set of basis expressions to simplify tree expressions for unordered binary edge labeled trees and/or to simplify string expressions for finite binary edge labeled strings.
As shall be demonstrated below, by employing the nine basis tree expressions, in the case of ordered BELTs, and eight basis tree expressions in the case of unordered BELTs or BELS, it may be possible to derive a mechanism for simplifying any expression in the four previously described operators, regardless of complexity. For this particular embodiment, for example, regarding ordered BELTs, the approach employed is to match all possible single term expressions for ordered trees on the left-hand side of an equality with all possible single term expressions for ordered trees on the right-hand side of the equality. Any tree expression to be simplified comprises a combination of such delineated possible single term tree expressions. Therefore, for this particular embodiment, to simplify or reduce a tree expression, a set of algebraic manipulations may be repetitively applied to reduce or simplify a tree expression into a set of interrelated queries, in this particular embodiment, queries interrelated by Boolean operations, as illustrated below.
Although the claimed subject matter is not limited in scope in this respect, one technique for implementing this approach may be to apply a table look up approach. Therefore, the table may contain the single term tree expressions discussed above and illustrated in more detail below. Thus, when confronted with a tree expression to be simplified or reduced, table look ups may be applied repetitively to reduce a tree expression until it is simplified to a point where it may not be reduced or simplified further by applying such algebraic manipulations. Once this has been accomplished, the set of queries that has resulted may then be applied to one or more tree objects to produce a result that makes the original or initial tree expression true. Likewise, alternatively, this approach may ultimately indicate that no such tree expression is possible.
Techniques for performing table look-ups are well-known and well-understood. Thus, this will not be discussed in detail here. However, it shall be appreciated that any and all of the previously described and/or later described processing, operations, conversions, transformations, manipulations, etc. of strings, trees, numerals, data, etc. may be performed on one or more computing platforms or similar computing devices, such as those that may include a memory to store a table as just described, although, the claimed subject matter is not necessarily limited in scope to this particular approach. Thus, for example, a hierarchy of data, such as a tree as previously described, for example, may be formed. Likewise, operations and/or manipulations, as described, may be performed; however, operations and/or manipulations in addition to those described or instead of those described may also be applied. It is intended that the claimed subject matter cover such embodiments.
The following algebraic manipulation examples, numbered zero through forty-six below, delineate the extent of the possibilities of single term tree expressions and their simplification to accomplish the result described above for this particular embodiment. What follows is a listing of these forty-seven expressions to demonstrate a technique for applying algebraic manipulations that take a given tree expression and return a simpler form. Of course, for some embodiments, such as a table look-up, it may be desirable to omit from such a table look-up those expressions that are “always” and/or “never” true, such as c=c and/or c=S(c), respectively, for example. It is, of course, understood here that terms such as “always,” “never,” “if and only if” (iff) and the like, are limited in context to the particular embodiment.
c=c
This is a logical truth.
c=x
This is true iff x=c.
c=S(x)
This is true nowhere because c is never equal to S(x), for any x, in this embodiment.
c=x′
This is true iff c=x. This may be shown by the following argument:
c=x′ iff c′=x″ iff c′=x iff c=x.
c=(x*y)
This is true iff x=c or y=c.
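The five cases with c on the left-hand side can be read as a small recursive reduction. The sketch below is one hypothetical way to code it. The term encoding ("c", variable names as strings, ("S", t), ("inv", t) for t′, ("mul", a, b) for (a*b)) and the function name are assumptions, and applying the cases to arbitrary subterms rather than only to single variables is an extension made here for illustration.

```python
def reduce_c_equals(term):
    """Reduce the equation c = term to True, False, an atomic condition
    ("eq", var, "c"), or a disjunction ("or", cond, cond)."""
    if term == "c":
        return True                        # c = c is a logical truth
    if isinstance(term, str):
        return ("eq", term, "c")           # c = x is true iff x = c
    op = term[0]
    if op == "S":
        return False                       # c = S(x) is true nowhere
    if op == "inv":
        return reduce_c_equals(term[1])    # c = x' iff c = x
    if op == "mul":
        # c = (x*y) iff x = c or y = c
        left = reduce_c_equals(term[1])
        right = reduce_c_equals(term[2])
        if left is True or right is True:
            return True
        parts = [p for p in (left, right) if p is not False]
        if not parts:
            return False
        return parts[0] if len(parts) == 1 else ("or", *parts)
    raise ValueError(f"unknown term: {term!r}")


print(reduce_c_equals(("mul", "x", "y")))
# ('or', ('eq', 'x', 'c'), ('eq', 'y', 'c'))
print(reduce_c_equals(("inv", ("mul", "x", ("S", "y")))))
# ('eq', 'x', 'c')
```

The result is a Boolean combination of simpler queries, in the sense used elsewhere in this description.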
The examples above have c on the left-hand side with all other syntactic variations on the right-hand side. Next are cases where the left-hand side is x. In these cases, the right hand side is assumed to be something other than c because this case was addressed above.
x=x
This is a logical validity and true everywhere.
x=y
This is true iff x=y.
x=S(x)
This is never true.
x=S(y)
This is true iff x=S(y).
x=x′
This is true iff x=c or x=S(c).
x=y′
This is true iff x=y′ or y=x′.
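Two of the entries above, x=S(x) and x=x′, can be spot-checked by exhaustive search over a few small trees. This is a sketch under assumptions: the tuple encoding (edges as (label, child) pairs, zero tree as None) and the helper small_trees are hypothetical, and a finite search illustrates the entries without proving them.

```python
from itertools import product

ZERO, ONE = None, ()   # assumed encoding: c is None, S(c) is the empty edge tuple


def S(x):
    return ONE if x is ZERO else ((0, x),)


def inv(x):
    return ZERO if x is ZERO else tuple((1 - lab, inv(ch)) for lab, ch in x)


def merge(x, y):
    return ZERO if x is ZERO or y is ZERO else x + y


def small_trees(rounds=3):
    """Close {c, S(c)} under S, inversion and merger for a few rounds."""
    trees = {ZERO, ONE}
    for _ in range(rounds):
        new = set(trees)
        for x in trees:
            new.add(S(x))
            new.add(inv(x))
        for x, y in product(trees, repeat=2):
            new.add(merge(x, y))
        trees = new
    return trees


trees = small_trees()
fixed_by_successor = {x for x in trees if x == S(x)}   # solutions of x = S(x)
fixed_by_inversion = {x for x in trees if x == inv(x)}  # solutions of x = x'

print(fixed_by_successor == set())          # True: x = S(x) has no solution here
print(fixed_by_inversion == {ZERO, ONE})    # True: only c and S(c) satisfy x = x'
```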
Next are the cases where the term on the right is a merger, *, of two simpler terms.
x=(x*x)
This is true iff x=c or x=S(c).
x=(x*y)
This is true iff x=c or y=S(c).
x=(y*x)
This is true iff x=c or y=S(c).
x=(y*z)
This is true iff x can be decomposed into two parts connected by the * operation where y is the first part of the decomposition and z is the remainder. The number of possible terms that y might be for this to be true is found by seeing x as an ordered multiple product and letting y be the initial product of any number of these terms. Thus, this statement is true iff the disjunction that lists all possible ways of assigning y to an initial product of the parts and assigning z to the remaining parts is true. The last term in this disjunction assigns y to x and S(c) to z.
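For a ground tree x other than the zero tree, the disjunction described for x=(y*z) can be enumerated directly. The sketch below assumes a tuple encoding that is not part of the specification (x is a tuple of (label, child) edges, so the merger is tuple concatenation and S(c) is the empty tuple); the function name is hypothetical.

```python
ONE = ()   # S(c) under the assumed encoding: a root with no edges


def decompositions(x):
    """List every pair (y, z) with x == y + z, where y is an initial product
    of the edges at the root of x and z is the remainder."""
    return [(x[:i], x[i:]) for i in range(len(x) + 1)]


x = ((0, ()), (1, ()), (0, ()))   # three edges at the root
pairs = decompositions(x)

print(len(pairs))              # 4: one split point before, between and after the edges
print(pairs[-1] == (x, ONE))   # True: the last term assigns y to x and S(c) to z
```

Each pair is one disjunct; the case x=c is not covered by this sketch.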
Next are cases where the left-hand side is S(x) or S(x′).
S(x)=S(x)
This is always true.
S(x)=S(y)
This is true iff x=y.
Here, by way of illustration, a more complex equation is reduced to a simpler equation. In general, as previously suggested, for this particular embodiment, reducing a complex expression to a possible Boolean combination of simpler expressions provides a mechanism to determine conditions that make the tree expression true.
S(x)=x′
This is never true.
S(x)=y′
This is true iff (x=c and y=S(c)) or y=S(x)′.
S(x)=(x*x)
This is never true.
S(x)=(x*y)
This equation is true iff x=S(c) and y=S(S(c)).
S(x)=(y*x)
This is solvable iff x=S(c) and y=S(S(c)).
S(x)=(y*z)
This equation is solvable iff either (a) (y=S(c) and z=S(x)) or
S(x′)=(x*x)
This equation is never true.
S(x′)=(y*y)
This equation is true iff x=c and y=S(c).
S(x′)=(x*y)
This is true iff x=S(c) and y=S(S(c)).
S(x′)=(y*x)
This is true iff x=S(c) and y=S(S(c)).
S(x′)=(y*z)
This is solvable iff ((x=c and y=S(c)) and z=S(c)) or (x=S(c) and ((y=S(c) and z=S(S(c))) or (z=S(c) and y=S(S(c))))).
Next are the cases where the left-hand side is x′.
x′=x′
This is always true.
x′=y′
This is true iff x=y.
x′=(x*x)
This is true iff x=c or x=S(c).
x′=(x*y)
This is true iff x=c or (x=S(c) and y=S(c)).
x′=(y*x)
This is true iff x=c or (x=S(c) and y=S(c)).
x′=(y*z)
This is solvable iff x=(y′*z′) which is solvable iff
S(x)′=(x*x)
This equation is never true.
S(x)′=(y*y)
This equation is true iff x=c and y=S(c).
S(x)′=(x*y)
This is true iff x=S(c) and y=S(S(c))′.
S(x)′=(y*x)
This is true iff x=S(c) and y=S(S(c))′.
S(x)′=(y*z)
This is solvable iff S(x)=(y′*z′) which is solvable iff
The remaining cases are where the left-hand side is a merger operation.
(x*x)=(x*x)
This is always true.
(x*x)=(y*y)
True iff x=y.
(x*x)=(x*y)
True iff x=y.
(x*x)=(y*x)
True iff x=y.
(x*x)=(u*v)
This is solvable iff
(x*y)=(x*v)
True iff x=c or (y=c and v=c) or y=v.
(x*y)=(u*x)
True iff x=c or (y=c and u=c) or y=u.
(x*y)=(u*v)
This is solvable iff
Previously described is an embodiment of a method or technique for reducing or simplifying a tree expression to obtain a result that provides the condition(s) under which the tree expression is true. For example, if the left-hand side is a tree expression in x and the right-hand side is a tree expression in ground terms, this may reduce to another tree expression that makes the overall initial tree expression true. Of course, the claimed subject matter is not limited in scope to only this particular embodiment.
Another embodiment of a method or technique for reducing or simplifying a tree expression may involve segregating simplification with the basis relationships from simplifying using the 47 expressions provided above. For example, from an implementation perspective, it may simplify results to apply the basis expressions before applying the 47 derived expressions. In one particular embodiment, although, of course, the claimed subject matter is not limited in scope in this respect, the basis expressions may be applied in the following specific order to simplify the given tree expression:
x″=x; (1)
(x*y)′=(x′*y′); (9)
x″=x; (1)
c′=c; (7)
(c*x)=c; (3)
(x*c)=c; (4)
S(c)′=S(c); (8)
(S(c)*x)=x; (5)
(x*S(c))=x; (6)
((x*y)*z)=(x*(y*z)). (2)
Once the tree expression has been simplified using these basis expressions, the 47 derived expressions may be applied to further reduce or simplify the tree expression. Of course, again, this is merely one possible approach and the claimed subject matter is not limited in scope to this particular embodiment.
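One way to picture this first stage is a small term rewriter that uses each basis expression as a left-to-right reduction. The sketch below is hypothetical throughout: the term encoding ("c", variable names as strings, ("S", t), ("inv", t) for t′, ("mul", a, b) for (a*b)) and the function names are assumptions. It also applies the expressions at each node, innermost first, until none applies, which is only one possible reading of the order listed above.

```python
UNITY = ("S", "c")   # the unity tree S(c)


def rewrite_root(t):
    """Try the basis expressions at the root of t, in the listed order.
    Returns the rewritten term, or None when no expression applies."""
    if isinstance(t, str):
        return None
    op = t[0]
    if op == "inv":
        a = t[1]
        if not isinstance(a, str) and a[0] == "inv":
            return a[1]                                     # (1)  x'' = x
        if not isinstance(a, str) and a[0] == "mul":
            return ("mul", ("inv", a[1]), ("inv", a[2]))    # (9)  (x*y)' = (x'*y')
        if a == "c":
            return "c"                                      # (7)  c' = c
        if a == UNITY:
            return UNITY                                    # (8)  S(c)' = S(c)
    if op == "mul":
        a, b = t[1], t[2]
        if a == "c" or b == "c":
            return "c"                                      # (3), (4)
        if a == UNITY:
            return b                                        # (5)  (S(c)*x) = x
        if b == UNITY:
            return a                                        # (6)  (x*S(c)) = x
        if not isinstance(a, str) and a[0] == "mul":
            return ("mul", a[1], ("mul", a[2], b))          # (2)  reassociate to the right
    return None


def simplify(t):
    """Simplify subterms first, then keep rewriting until nothing applies."""
    if isinstance(t, str):
        return t
    t = (t[0],) + tuple(simplify(arg) for arg in t[1:])
    rewritten = rewrite_root(t)
    return t if rewritten is None else simplify(rewritten)


print(simplify(("inv", ("mul", ("inv", "x"), ("S", "c")))))   # x
print(simplify(("mul", ("mul", "x", "y"), "z")))              # ('mul', 'x', ('mul', 'y', 'z'))
```

Terms returned by simplify contain no double inversion, no inversion of a merger, and no merger with c or S(c), which leaves fewer shapes for a later table look-up to handle.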
Embodiments of a method of reducing or simplifying tree expressions have a variety of potentially useful applications. As described previously, trees provide a technique for structuring and/or depicting hierarchical data. Thus, for example, trees may be employed to represent language sentence structures, computer programs, algebraic formulae, molecular structures, family relationships and more. For example, one potential application of such a tree reduction technique is in the area of pattern matching. Thus, in pattern matching, substructures, in the form of a tree, for example, may be located within a larger structure, also in the form of a tree, referred to in this context as the target. This may be accomplished by comparing the structures; however, typically, such a comparison is complex, cumbersome, and/or time consuming. Although the claimed subject matter is not limited in scope to pattern matching or to any of the other potential applications described above, it may be instructive to work through at least one particular example of applying the previously described tree reduction approach to a pattern matching problem to demonstrate the power and/or versatility of this particular embodiment.
Within this particular context and for this particular embodiment, there are a number of potential pattern matching inquiries that may be made. Although these are simply examples and the claimed subject matter is not limited in scope to only these particular inquiries, one such inquiry, for example, may be whether a first tree, such as an ordered binary edge labeled tree, is equal to a second binary edge labeled tree. To phrase this differently, it may be useful to determine whether the trees match exactly. Likewise, another such query, or active verb, may be referred to in this context as a rooted partial subtree (RPS) query or inquiry. This particular type of query or inquiry is demonstrated with reference to
Thus, in Examples 1 and 2 of
One query or question to be posed, for the purposes of pattern matching, is whether the tree on the left-hand side, such as in example one, is a rooted partial subtree of the tree on the right-hand side. In addition to that, several other potential questions may be posed and potentially answered. For example, if the tree on the left-hand side is a rooted partial subtree of the tree on the right-hand side, it may be useful to know how many times this rooted partial subtree is present in the right-hand side tree. Likewise, assume that a rooted partial subtree is present more than once. It may be useful to have a mechanism to identify one of the several rooted partial subtrees to a machine, for example, for further processing.
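The counting question can be sketched in Python under an assumed reading of the query, since the defining figure is not reproduced in this text. The assumption is that a pattern is a rooted partial subtree of a target when the roots coincide and the pattern's edges at each node match, in order and with equal labels, a subsequence of the target's edges, recursively. The tuple encoding (edges as (label, child) pairs) and the name count_rps are hypothetical.

```python
def count_rps(pattern, target):
    """Count order-preserving, label-preserving embeddings of pattern into
    target that map root to root (an assumed reading of the RPS query)."""
    n = len(target)
    # ways[j]: number of ways to embed the pattern edges handled so far
    # into the first j edges of the target.
    ways = [1] * (n + 1)
    for p_label, p_child in pattern:
        nxt = [0] * (n + 1)
        for j, (t_label, t_child) in enumerate(target, start=1):
            nxt[j] = nxt[j - 1]                       # target edge j left unused
            if p_label == t_label:
                nxt[j] += ways[j - 1] * count_rps(p_child, t_child)
        ways = nxt
    return ways[n]


target = ((0, ()), (1, ()), (0, ((0, ()),)))   # three edges at the root
single_zero_edge = ((0, ()),)

print(count_rps(single_zero_edge, target))          # 2: either 0-edge at the root
print(count_rps(((1, ()), (0, ())), target))        # 1: the 1-edge, then the last 0-edge
print(count_rps(((1, ()), (1, ())), target))        # 0: not a rooted partial subtree
```

A nonzero count answers the yes/no query, the count itself answers how many times the pattern is present, and recording the chosen edge indices during the recursion would be one way to identify a particular occurrence for further processing.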
It also may be desirable, in other circumstances, to determine whether there is a match between a rooted tree and a subtree that is not rooted. In this context this may be referred to, for example, as a “projected match”. In this context, this refers to projecting one tree into another tree without matching corresponding roots and having the form and labels of the projected tree still be preserved in the tree in which it is projected.
Likewise, with reference to Example 2, in which the tree on the left-hand side does not match the tree on the right-hand side, an alternative query or question may relate to a measurement of the similarities and/or differences, as an embodiment of a measurement of the matching. For example, particular branches of the tree on the left-hand side may match with particular branches of the tree on the right-hand side, although overall, the entire tree on the left-hand side may not match to a subportion of the tree on the right-hand side, in this particular example. Thus, it may be appropriate, for example, to weight the matching in some form. Such an approach, for example, might be employed in data analysis, as simply one example. In one embodiment, for example, it may be desirable to identify a partial match that results in the maximum number of matching nodes and edges; likewise, in a different embodiment, it may be desirable to identify a partial match such that the match is closest to or most remote from the root. Again, any one of a number of other approaches is possible and such approaches are included within the scope of the claimed subject matter. Thus, it may be desirable, assuming there is no identical match, to identify the closest match where “closest” or “most remote” is defined with respect to a particular weighted criterion designed to achieve a particular objective, such as the examples previously described.
For this particular embodiment, as previously suggested, a method of performing pattern matching may include constructing a tree expression. A tree expression may be constructed in a manner so as to match a first pattern, represented as a first tree, with or against a second pattern, represented as a second tree. After such a tree expression has been formulated or constructed, the conditions under which the tree expression is true may be determined. This may be accomplished, for example, by applying the previously described techniques for simplifying or reducing tree expressions. Likewise, as previously suggested, alternatively, such an approach may determine the conditions under which the formulated or constructed tree expression is false rather than true.
In order to appreciate the power and/or versatility of this particular application, a simple example is desirable. Therefore,
Continuing with this example, emanating from the root node “book” are five edges, three of them labeled with a “0” and two of them labeled with a “1”. For this particular embodiment, a “0” designates chapters of the book, whereas a “1” designates appendices of the book. Each edge includes another node other than the root node to support the edge that is labeled as just described. Likewise, as illustrated in
The simplification may be accomplished as follows. The tree expression, S(x)<=(R) y, where “<=” denotes “partial rooted subtree of,” indicates that the successor of tree x is a rooted partial subtree of the tree, T, in which book is a root node. The second half of the tree expression above comprises a tree equation in x. Thus, the expression (S(x)<=(R) y) may be reduced by simplifying the expression S(x) in the manner previously described above, for example. Likewise, simplifying the expression (y=S(S(S(c))′) involves several successive simplification operations involving the inverse operation and applying the successor operation. Again, as previously discussed, this may be accomplished through application of a table look up technique. As previously explained, a complex tree expression may therefore be reduced into simpler tree expressions that are interrelated by Boolean logic. This is demonstrated, for example, through the simplification of tree expressions as illustrated in Examples 0 to 46 above for ordered trees. Thus, in this example, these simplified tree expressions comprise queries interrelated by Boolean operations.
Furthermore, applying queries such as, for example, determining whether a first tree is a rooted partial subtree of another tree, as indicated by the tree expression above, involves the application of known programming techniques. See, for example, Chapter 4, “Tree Isomorphism,” of Algorithms on Trees and Graphs, by Gabriel Valiente, published by Springer, 2002. Such well-known and well-understood programming techniques will not be discussed here in any detail.
Much of the prior discussion was provided in the context of ordered binary edge labeled trees. However, a similar approach may be applied to unordered binary edge labeled trees, for example. In general, it is understood that performing such simplifications or reductions to unordered BELTs presents more of a processing challenge. See, for example, “Tree Matching Problems with Applications to Structured Text Databases,” by Pekka Kilpelainen, Ph.D. dissertation, Department of Computer Science, University of Helsinki, Finland, November, 1992. A potential reason may be that a greater number of possibilities are present combinatorially in those situations in which nodes may be unordered rather than ordered. Nonetheless, to address such unordered trees, as previously for ordered trees, we begin with the basis expressions. More specifically, a set of basis expressions may be isomorphic to applying the four previously described operations to unordered BELTs. Here, there are eight basis expressions rather than nine.
x″=x; (1)
((x*y)*z)=(x*(y*z)); (2)
(c*x)=c; (3)
(S(c)*x)=x; (4)
c′=c; (5)
S(c)′=S(c); (6)
(x*y)′=(x′*y′); (7)
(x*y)=(y*x). (8)
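The basis expressions above may be read, from left to right, as rewrite rules. The following Python sketch applies expressions (1) and (3) through (7), together with commutativity (8) for the two constant cases, to reduce a term. The term encoding (the string "c" for the constant, other strings for variables, and tuples tagged "S", "inv", and "mul" for the successor, inverse, and * operations) and the name simplify are assumptions made for illustration. The sketch does not reorder or reassociate products, so it does not produce a canonical form under expressions (2) and (8).

```python
C = "c"           # the constant c
ONE = ("S", C)    # the term S(c)


def simplify(term):
    """Rewrite a term using basis expressions (1) and (3) through (7)."""
    if isinstance(term, str):          # constant or variable
        return term
    op = term[0]
    if op == "S":
        return ("S", simplify(term[1]))
    if op == "inv":
        a = simplify(term[1])
        if a == C or a == ONE:         # (5) c' = c and (6) S(c)' = S(c)
            return a
        if isinstance(a, tuple) and a[0] == "inv":
            return a[1]                # (1) x'' = x
        if isinstance(a, tuple) and a[0] == "mul":
            # (7) (x*y)' = (x' * y'), then simplify the pushed-in inverses
            return simplify(("mul", ("inv", a[1]), ("inv", a[2])))
        return ("inv", a)
    if op == "mul":
        a, b = simplify(term[1]), simplify(term[2])
        if a == C or b == C:           # (3) c*x = c, either side by (8)
            return C
        if a == ONE:                   # (4) S(c)*x = x
            return b
        if b == ONE:                   # (4) with (8)
            return a
        return ("mul", a, b)
    raise ValueError(f"unknown operation: {op!r}")


# (x'' * S(c)') reduces to x
reduced = simplify(("mul", ("inv", ("inv", "x")), ("inv", ONE)))
```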
It is worth observing that expressions (4) and (6) from the previous set of expressions have been omitted here. Such expressions are redundant in light of expression (8) above, which has been added. As previously, then, these expressions may be employed to solve single term expressions that may be employed through a table look-up process, for example, to simplify more complex tree expressions. As previously, the tree expressions are provided below. Again, “never,” “always,” “iff” and the like are limited in context to the particular embodiment.
c=c
This is a logical truth.
c=x
This is true iff x=c.
c=S(x)
This is true nowhere because c is never equal to S(x), for any x, in this embodiment.
c=x′
This is true iff c=x. This may be shown by the following argument:
c=x′ iff c′=x″ iff c′=x iff c=x.
c=(x*y)
This is true iff x=c or y=c.
x=x
This is a logical validity and true everywhere.
x=y
This is true iff x=y.
x=S(x)
This is never true.
x=S(y)
This is true iff x=S(y).
x=x′
This is true iff x=c or x=S(c).
x=y′
This is true iff x=y′ or y=x′.
x=(x*x)
This is true iff x=c or x=S(c).
x=(x*y)
This is true iff x=c or y=S(c).
x=(y*x)
This is true iff x=c or y=S(c).
x=(y*z)
This is true iff x can be decomposed into two parts connected by the * operation where y is the first part of the decomposition and z is the remainder. The number of possible terms that y might be for this to be true is found by seeing x as an unordered multiple product and letting y be the initial product of any number of these terms. Thus, this statement is true iff the disjunction that lists all possible ways of assigning y to an initial product of the parts and assigning z to the remaining parts is true. The last term in this disjunction assigns y to x and S(c) to z.
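The disjunction described for x=(y*z) can be made concrete by listing the ways of splitting an unordered product into two parts. In the following Python sketch, x is assumed to be given as a list of factor names that are not split further, the empty tuple stands for S(c) (the identity for *, by basis expression (4)), and the name decompositions is hypothetical. Allowing y to take zero factors, so that y is S(c) and z is x, is likewise an assumption of this sketch.

```python
from itertools import combinations


def decompositions(factors):
    """Yield each distinct (y, z) split of an unordered product of factors.

    Parts are returned as sorted tuples so that splits differing only in
    the order of factors are reported once. Factors must be sortable.
    """
    seen = set()
    n = len(factors)
    for k in range(n + 1):
        for chosen in combinations(range(n), k):
            y = tuple(sorted(factors[i] for i in chosen))
            z = tuple(sorted(factors[i] for i in range(n) if i not in chosen))
            if (y, z) not in seen:
                seen.add((y, z))
                yield y, z


# Hypothetical product x = a*a*b; the final split assigns y to x and S(c) to z.
splits = list(decompositions(["a", "a", "b"]))
```

Each yielded pair corresponds to one disjunct of the form (y = first part and z = remaining part).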
Next are cases where the left-hand side is S(x) or S(x′).
S(x)=S(x)
This is always true.
S(x)=S(y)
This is true iff x=y.
Here, by way of illustration, a more complex equation is reduced to a simpler equation. In general, as previously suggested, for this particular embodiment, reducing a complex expression to a possible Boolean combination of simpler expressions provides a mechanism to determine conditions that make the tree expression true.
S(x)=x′
This is never true.
S(x)=y′
This is true iff (x=c and y=S(c)) or y=S(x)′.
S(x)=(x*x)
This is never true.
S(x)=(x*y)
This equation is true iff x=S(c) and y=S(S(c)).
S(x)=(y*x)
This is solvable iff x=S(c) and y=S(S(c)).
S(x)=(y*z)
This equation is solvable iff either (a) (y=S(c) and z=S(x)) or
S(x′)=(x*x)
This equation is never true.
S(x′)=(y*y)
This equation is true iff x=c and y=S(c).
S(x′)=(x*y)
This is true iff x=S(c) and y=S(S(c)).
S(x′)=(y*x)
This is true iff x=S(c) and y=S(S(c)).
S(x′)=(y*z)
This is solvable iff
((x=c and y=S(c)) and z=S(c)) or
(x=S(c) and ((y=S(c) and z=S(S(c))) or (z=S(c) and y=S(S(c))))).
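To show how such a result may be used as a set of simple queries interrelated by Boolean operations, the following Python sketch evaluates the condition stated above for S(x′)=(y*z) against candidate assignments. The encoding of closed terms as nested tuples, the helper S, and the name condition are assumptions made for illustration; equality is taken to be syntactic equality of terms built only from c and S.

```python
C = "c"


def S(term):
    """Build the successor term S(term) as a tagged tuple."""
    return ("S", term)


def condition(x, y, z):
    """Condition listed above under which S(x') = (y*z) is solvable."""
    first = (x == C and y == S(C)) and z == S(C)
    second = x == S(C) and (
        (y == S(C) and z == S(S(C))) or (z == S(C) and y == S(S(C)))
    )
    return first or second


# Each simple equality is one query; the result combines them with and/or.
holds = condition(S(C), S(S(C)), S(C))
```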
Next are the cases where the left-hand side is x′.
x′=x′
This is always true.
x′=y′
This is true iff x=y.
x′=(x*x)
This is true iff x=c or x=S(c).
x′=(x*y)
This is true iff x=c or (x=S(c) and y=S(c)).
x′=(y*x)
This is true iff x=c or (x=S(c) and y=S(c)).
x′=(y*z)
This is solvable iff either
S(x)′=(x*x)
This equation is never true.
S(x)′=(y*y)
This equation is true iff x=c and y=S(c).
S(x)′=(x*y)
This is true iff x=S(c) and y=S(S(c))′.
S(x)′=(y*x)
This is true iff x=S(c) and y=S(S(c))′.
S(x)′=(y*z)
This is solvable iff S(x)=(y′*z′) which is solvable iff
(x*x)=(x*x)
This is always true.
(x*x)=(y*y)
True iff x=y.
(x*x)=(x*v)
True iff x=v.
(x*x)=(u*x)
True iff x=u.
(x*x)=(u*v)
This is solvable iff
(x*y)=(x*v)
True iff y=v.
(x*y)=(u*y)
True iff x=u.
(x*y)=(u*v)
This is solvable iff
Thus, a similar process as previously described for ordered trees may be applied for unordered trees using the 47 expressions above.
Of course, the claimed subject matter is not limited to ordered or unordered binary edge labeled trees. For example, as described in previously cited U.S. provisional patent application 60/543,371, binary edge labeled trees and binary node labeled trees may be employed nearly interchangeably to represent substantially the same hierarchy of data. In particular, a binary node labeled tree may be associated with a binary edge labeled tree where the nodes of the binary node labeled tree take the same values as the edges of the binary edge labeled tree, except that the root node of the binary node labeled tree may comprise a node having a zero value or a null value. This is illustrated, for example, in
In accordance with the claimed subject matter, therefore, any tree, regardless of whether it is binary edge labeled, binary node labeled, non-binary, a feature tree, or otherwise, may be manipulated and/or operated upon in a manner similar to the approach of the previously described embodiments. Typically, different association embodiments shall be employed, depending at least in part, for example, upon the particular type of tree and/or string, as described, for example in the previously referenced U.S. provisional patent application 60/543,371. For example, as described in the previously referenced US provisional patent application, a node labeled tree in which the nodes are labeled with natural numerals or data values may be converted to a binary edge labeled tree. Furthermore, this may be accomplished with approximately the same amount of storage. For example, for this particular embodiment, this may involve substantially the same amount of node and/or edge data label values. However, for convenience, without intending to limit the scope of the claimed subject matter in any way, here, operations and/or manipulations and the like have been described primarily in the context of BELTs.
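The association between binary edge labeled trees and binary node labeled trees described above may be sketched as follows, with each node of the node labeled tree taking the value of the edge that leads to it and the root taking a null value. The representations (an edge labeled tree as a list of (edge label, subtree) pairs, a node labeled tree as a (node label, children) pair) and the names belt_to_bnlt and bnlt_to_belt are assumptions made for illustration, not the specific association embodiment of the referenced provisional application.

```python
def belt_to_bnlt(belt, root_label=None):
    """Convert an edge labeled tree to a node labeled tree.

    Each child node receives the label of its incoming edge; the root
    receives root_label, which defaults to a null value.
    """
    return (root_label, [belt_to_bnlt(sub, label) for label, sub in belt])


def bnlt_to_belt(bnlt):
    """Inverse conversion: move each child's node label onto its edge."""
    _, children = bnlt
    return [(child[0], bnlt_to_belt(child)) for child in children]


# Hypothetical tree with three edges; the round trip returns the original.
belt = [(0, [(1, [])]), (1, [])]
bnlt = belt_to_bnlt(belt)
```

The two forms carry the same number of label values, apart from the null root label, which is consistent with the statement that the conversion uses approximately the same amount of storage.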
In another embodiment, however, a particular tree may include null types or, more particularly, some node values denoted by the empty set. This is illustrated, for example, by the tree in
Likewise, in an alternative embodiment, a node labeled tree, for example, may comprise fixed length tuples of numerals. For such an embodiment, such multiple numerals may be combined into a single numeral, such as by employing Cantor pairing operations, for example. See, for example, Logical Number Theory, An Introduction, by Craig Smorynski, pp. 14-23, available from Springer-Verlag, 1991. This approach should produce a tree to which the previously described embodiments may then be applied. Furthermore, for one embodiment, a tree in which nodes are labeled with numerals or numerical data, rather than binary data, may be converted to a binary edge labeled tree and/or binary node labeled tree, and, for another embodiment, a tree in which edges are labeled with numerals or numerical data, rather than binary data, may be converted to a binary edge labeled tree and/or binary node labeled tree. See previously referenced U.S. provisional patent application Ser. No. 60/543,371.
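As an illustration of combining a fixed length tuple of numerals into a single numeral, the following Python sketch uses the standard Cantor pairing function and its inverse, folding the tuple from left to right. The function names pair, unpair, encode, and decode are hypothetical, the tuple is assumed to hold non-negative integers and to have at least one element, and the left-to-right folding order is one choice among several.

```python
from functools import reduce
from math import isqrt


def pair(a, b):
    """Cantor pairing of two non-negative integers into one."""
    return (a + b) * (a + b + 1) // 2 + b


def unpair(n):
    """Inverse of pair: recover (a, b) from a single numeral."""
    w = (isqrt(8 * n + 1) - 1) // 2   # w equals a + b
    b = n - w * (w + 1) // 2
    return w - b, b


def encode(numerals):
    """Fold a fixed length tuple into one numeral: pair(pair(a, b), c) ..."""
    return reduce(pair, numerals)


def decode(n, length):
    """Recover a tuple of the given fixed length from its encoding."""
    out = []
    for _ in range(length - 1):
        n, last = unpair(n)
        out.append(last)
    out.append(n)
    return tuple(reversed(out))


label = encode((1, 2, 3))
```

Because the tuple length is fixed, it does not need to be stored with the encoded numeral; decode is simply told the length.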
Furthermore, a tree in which both the nodes and the edges are labeled may be referred to in this context as a feature tree and may be converted to a binary edge labeled tree and/or binary node labeled tree. For example, without intending to limit the scope of the claimed subject matter, in one approach, a feature tree may be converted by converting any labeled node with its labeled outgoing edge to an ordered pair of labels for the particular node. Using the embodiment described, for example in the previously referenced US provisional patent application, this tree may then be converted to a binary edge labeled tree.
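One reading of the feature tree conversion just described, offered only as a sketch, attaches to each node an ordered pair made of the label of the edge leading to that node and the node's own label, leaving a tree in which only nodes are labeled. The representation (a feature tree as a (node label, list of (edge label, child)) pair), the choice to pair a node with its incoming edge, the null edge label for the root, and the name to_node_labeled are all assumptions of this sketch.

```python
def to_node_labeled(tree, incoming=None):
    """Fold each edge label into the node it leads to.

    Input:  (node_label, [(edge_label, child_tree), ...])
    Output: ((edge_label, node_label), [child_tree, ...]) with no edge labels.
    """
    node_label, edges = tree
    return ((incoming, node_label),
            [to_node_labeled(child, edge_label) for edge_label, child in edges])


# Hypothetical feature tree: a root "book" with one chapter edge (0)
# and one appendix edge (1); the child node labels are made up.
feature_tree = ("book", [(0, ("ch1", [])), (1, ("appA", []))])
node_labeled = to_node_labeled(feature_tree)
```

The resulting ordered pairs could then be combined into single numerals, for example by a pairing operation, before conversion to a binary edge labeled tree.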
In yet another embodiment, for trees in which data labels do not comprise simply natural numerals, such as, as one example, trees that include negative numerals, such data labels may be converted to an ordered pair of numerals. For example, the first numeral may represent a data type. Examples include a data type such as negative, dollars, etc. As described above, such trees may also be converted to binary edge labeled trees, such as by applying the embodiment of the previously referenced US provisional patent application, for example. However, again, this is provided for purposes of explanation and illustration. The claimed subject matter is not limited in scope to employing the approach of the previously referenced provisional patent application.
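The ordered pair approach for labels that are not natural numerals may be sketched as follows for the case of negative numerals. The type codes 0 and 1 and the names to_pair and from_pair are assumptions made for illustration; an embodiment may use any other set of data types, such as dollars.

```python
NON_NEGATIVE = 0   # hypothetical type code for ordinary natural numerals
NEGATIVE = 1       # hypothetical type code for negative numerals


def to_pair(value):
    """Represent an integer label as (data type, natural numeral)."""
    if value < 0:
        return (NEGATIVE, -value)
    return (NON_NEGATIVE, value)


def from_pair(label):
    """Recover the original integer from its (data type, numeral) pair."""
    kind, magnitude = label
    return -magnitude if kind == NEGATIVE else magnitude


encoded = to_pair(-7)
```

Both components of the pair are natural numerals, so the pair may be treated like any other fixed length tuple of numerals.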
It will, of course, be understood that, although particular embodiments have just been described, the claimed subject matter is not limited in scope to a particular embodiment or implementation. For example, one embodiment may be in hardware, such as implemented to operate on a device or combination of devices, for example, whereas another embodiment may be in software. Likewise, an embodiment may be implemented in firmware, or as any combination of hardware, software, and/or firmware, for example. Likewise, although the claimed subject matter is not limited in scope in this respect, one embodiment may comprise one or more articles, such as a storage medium or storage media. Such storage media, such as one or more CD-ROMs and/or disks, for example, may have stored thereon instructions that, when executed by a system, such as a computer system, computing platform, or other system, for example, may result in an embodiment of a method in accordance with the claimed subject matter being executed, such as one of the embodiments previously described, for example. As one potential example, a computing platform may include one or more processing units or processors, one or more input/output devices, such as a display, a keyboard and/or a mouse, and/or one or more memories, such as static random access memory, dynamic random access memory, flash memory, and/or a hard drive. For example, a display may be employed to display one or more queries, such as those that may be interrelated, and/or one or more tree expressions, although, again, the claimed subject matter is not limited in scope to this example.
In the preceding description, various aspects of the claimed subject matter have been described. For purposes of explanation, specific numbers, systems and/or configurations were set forth to provide a thorough understanding of the claimed subject matter. However, it should be apparent to one skilled in the art having the benefit of this disclosure that the claimed subject matter may be practiced without the specific details. In other instances, well-known features were omitted and/or simplified so as not to obscure the claimed subject matter. While certain features have been illustrated and/or described herein, many modifications, substitutions, changes and/or equivalents will now occur to those skilled in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and/or changes as fall within the true spirit of the claimed subject matter.
This disclosure claims priority pursuant to 35 USC 119(e) from U.S. provisional patent application Ser. No. 60/575,784, filed on May 28, 2004, by LeTourneau, titled, “METHOD AND/OR SYSTEM FOR SIMPLIFYING TREE EXPRESSIONS, SUCH AS FOR PATTERN MATCHING,” assigned to the assignee of the presently claimed subject matter.
Number | Date | Country
---|---|---
60575784 | May 2004 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 15043267 | Feb 2016 | US
Child | 16911282 | | US
Parent | 11007139 | Dec 2004 | US
Child | 15043267 | | US