1. Field of the Invention
This invention relates to code string searches, in which a computer searches for codes or code strings consisting of bit strings in the same way that a character string search looks for character codes or character code strings consisting of bit strings.
2. Description of Related Art
Recently it has become customary to use word processing to create business documents, and with the spread of the Internet, the number and size of electronic documents, which use character codes consisting of bit strings that can be processed by computers, have grown immensely throughout the world. For this reason, various character string search methods are being developed in order to fetch a necessary document from among this huge number of documents using computers.
As an example of these character string search methods, a longest prefix match search that searches variable length character strings (hereinbelow expressed as a longest prefix match search for variable length character strings), is described, referencing
The example shown in
When these character strings to be searched 10 are searched using the search character string 40a “ABEABC”, the character strings to be searched that prefix-match search character string 40a are “A”, “AB”, and “ABEAB”. Because the longest character string to be searched among these three is “ABEAB”, “ABEAB” is the search result character string 50a for the longest prefix match search.
When these character strings to be searched 10 are searched using the search character string 40b “ABE”, the character strings to be searched that prefix-match are “A” and “AB”. Because the longest character string to be searched among these two is “AB”, “AB” is the search result character string 50b. Also, although the search character string 40b “ABE” prefix-matches the character string “ABEAB” included in the character strings to be searched 10, the longest prefix match search of this application, as was noted above, is a search that searches the set of character strings to be searched for the longest character string that prefix-matches the search character string, and because the character string “ABEAB” does not prefix match the search character string 40b “ABE”, it cannot be obtained as a search result character string.
Also, when the character strings to be searched 10 are searched using the search character string 40c “AB”, the character strings to be searched that prefix-match are the same “A” and “AB” as above. Because the longer of these two is “AB”, the same “AB” as above becomes the search result character string 50b.
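The three searches above can be reproduced with a naive linear scan. The following is a minimal sketch for illustration only and is not the method of this invention. The function name `longest_prefix_match` is hypothetical, and the target set holds only the three strings that the text above confirms are among the character strings to be searched 10.

```python
def longest_prefix_match(targets, search_key):
    """Return the longest target that is a prefix of search_key, or None."""
    best = None
    for target in targets:
        # A target "prefix-matches" when the search key begins with it.
        if search_key.startswith(target):
            if best is None or len(target) > len(best):
                best = target
    return best


# Only these three members of the set are confirmed by the text.
targets = ["A", "AB", "ABEAB"]

result_40a = longest_prefix_match(targets, "ABEABC")  # "ABEAB"
result_40b = longest_prefix_match(targets, "ABE")     # "AB"
result_40c = longest_prefix_match(targets, "AB")      # "AB"
```

Note that “ABEAB” is not returned for the search key “ABE”, because the target must be a prefix of the search key and not the reverse.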
Among the longest prefix match searches for a variable length character string noted above, there is a method that divides the variable length character string into a front section with a certain length as a prefix and the remaining part as a suffix, and searches using the prefix as an index, and, after reducing the number of candidates, collates them with the suffix.
Among these kinds of methods, a variable length character string search apparatus and search method have been proposed (Patent Document 1) that seek to increase search efficiency even if the lengths of duplicate parts in the stored patterns that are subject to searches are variable, by making prefixes with a plurality of lengths to be indexes, enabling an index with an appropriate length to be selected.
Also, in order to perform the search at high speed, a method using the data configuration called a Patricia tree is well known. A Patricia tree is one kind of a binary tree and a node of a Patricia tree is formed to include an index key, a test bit position for a search key, and right and left link pointers. Although search processing using a Patricia tree has the advantages of being able to perform a search by testing only the required bits and of only being necessary to perform an overall key comparison one time, there are the disadvantages of an increase in storage capacity caused by the inevitable two links from each node, the added complexity of the decision processing because of the existence of back links, the delay in the search processing by returning by a back link in order to compare with an index key for the first time, and the difficulty of data maintenance such as adding and deleting a node.
Accordingly, this applicant proposed (Patent Document 2 and Patent Document 3) a bit string search apparatus and search method that use a data configuration called a coupled-node tree in order to resolve the disadvantages of the Patricia tree, reduce the amount of memory needed, speed up the search, and simplify data maintenance.
The coupled-node tree disclosed in Patent Document 2 and Patent Document 3 has branch nodes, which hold data for link targets, and leaf nodes, which hold the index keys that are search targets. The tree is configured from a root node and node pairs disposed in adjacent storage areas, each pair consisting of a branch node and a leaf node, or two branch nodes, or two leaf nodes.
The branch node includes a discrimination bit position in the search key and information indicating a position of a primary node, which is one node of a node pair that is a link target, and the leaf node includes an index key that is a target bit string of a bit string search. The root node is a branch node unless there is only one node in the tree.
Although the discrimination bit position in the search key is the same as the inspection bit position of a Patricia tree from the point that the bit value at that position in the search key is being used, they differ in the point that the bit value at the inspection bit position of a Patricia tree is analyzed and used to obtain the link target whereas the bit value at the discrimination bit position of a coupled-node tree is used in a calculation to obtain the node that is the link target.
The execution of a search using a search key is performed, at each branch node including the root node, by successively linking to one of the nodes in the node pair that is the link target in accordance with the bit value in the search key at the discrimination bit position included in that branch node until a leaf node is reached.
When a leaf node is reached, the index key kept in the leaf node is extracted. The extracted index key can be compared with the search key and if they coincide the search can be taken to be a success, and if no index key that is an object of searches matches the search key, the search can be taken to be a failure. Or, the extracted index key can be simply taken to be the search result key.
Also, this applicant has proposed (Patent Document 4) that the leaf nodes in a coupled-node tree do not directly include the index keys that are the object of searches and instead include a reference pointer which is a pointer to an area holding the index keys.
To simplify notation hereinafter, in the description below the wording “leaf node including an index key” and “index key included in a leaf node” may at times be used even if the leaf node includes a reference pointer instead of an index key. Also, for a coupled-node tree, which has leaf nodes that include index keys, expressions such as “a coupled-node tree wherein index keys are stored” or “index keys stored in a coupled-node tree” may at times be used. Furthermore, expressions such as “index key related to the leaf node” or “leaf node related to the index key” may be used regardless of whether the leaf node includes an index key or a reference pointer to the index key.
Referring to
The array element with the array element number 20 has stored therein a node [0] 112, which is the primary node of the node pair 111. A node [1] 113 forming a pair with the primary node is stored into the next, adjacent, array element (array element number 20+1). Node [0] 112 is also a branch node like node 101. The value 0 is stored in the node type 114 of the node [0] 112, the value 3 is stored in the discrimination bit position 115, and the value 30 is stored in the coupled node indicator 116. Also, node [1] 113 is formed from a node type 117 and a reference pointer 118a. The value 1 is stored in the node type 117, indicating that node [1] 113 is a leaf node. In the reference pointer 118a is stored a pointer referencing a storage area for a code string that is the target of searches. To simplify notation hereinafter, the data stored in the reference pointer may also at times be called the reference pointer.
Primary nodes are indicated as the node [0], and nodes that are paired therewith are indicated as the node [1]. The node paired with a primary node may at times also be called a non-primary node. Also, the node stored in an array element with some array element number is called the node of that array element number, and the array element number of the array element in which a node is stored is also called the array element number of the node.
The contents of the node pair 121 formed by the node 122 and the node 123 that are stored in the array elements having array element numbers 30 and 31 are not shown.
The 0 or 1 that is appended to the node [0] 112, the node [1] 113, the node 122, and the node 123 indicates respectively to which node of the node pair linking is to be done when performing a search using a search key. The node in the position where a “0” is appended may at times be called the node on the [0] side and the node in the position where a “1” is appended may at times be called the node on the [1] side. Also the position in a node pair wherein a “0” is appended may at times be called the node [0] position and the position in a node pair wherein a “1” is appended may at times be called the node [1] position. In a search using a coupled node tree, linking is done to the node at the node [0] position or the node [1] position depending on the bit value of the search key at the discrimination bit position of the immediately previous branch node. Therefore, by adding the bit value of the discrimination bit position of the search key to the coupled node indicator of the immediately previous branch node, it is possible to determine the array element number of an array element storing a node at the link target.
Although in the above-noted example the smaller of the array element numbers at which the node pair is located is used as the coupled node indicator, it will be understood that it is also possible to use the larger of the array element numbers in the same manner.
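The link computation described above, adding the bit value of the search key to the coupled node indicator, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Node` class, the function names, and the leaf contents at array elements 30 and 31 are hypothetical (the text does not show the contents of node pair 121), and the search key is assumed to be long enough to contain every discrimination bit position tested.

```python
from dataclasses import dataclass

BRANCH, LEAF = 0, 1  # node type values as used in the text


@dataclass
class Node:
    node_type: int
    discrimination_bit: int = 0      # branch nodes only
    coupled_node_indicator: int = 0  # branch nodes only
    index_key: str = ""              # leaf nodes only (hypothetical bit string)


def link_target(branch, search_key):
    """Array element number of the node to link to from a branch node."""
    bit = int(search_key[branch.discrimination_bit])
    # Node [0] sits at the coupled node indicator, node [1] right after it.
    return branch.coupled_node_indicator + bit


def search(array, start, search_key):
    """Follow links from the node at `start` until a leaf node is reached."""
    node = array[start]
    while node.node_type == BRANCH:
        node = array[link_target(node, search_key)]
    return node.index_key


# Node [0] 112 from the text: discrimination bit position 3, indicator 30.
# The two leaf nodes are hypothetical stand-ins for node pair 121.
array = {
    20: Node(BRANCH, discrimination_bit=3, coupled_node_indicator=30),
    30: Node(LEAF, index_key="0100"),
    31: Node(LEAF, index_key="0101"),
}
```

Because only the discrimination bits are tested, `search(array, 20, "1111")` also reaches the leaf holding "0101"; this is why the extracted index key is compared with the search key when an exact match is required.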
Furthermore, this applicant has also proposed a bit string search method using a coupled-node tree that includes index keys comprising bit strings that include a “don't care” bit (Patent Document 5).
Bit string searches using a coupled-node tree have the special features of requiring less memory capacity for holding the tree, very fast search speed, and easy maintenance. Nevertheless, no technology currently exists for applying a coupled-node tree to a longest prefix match search for variable length character strings or variable length code strings.
Accordingly, this invention has the objective of proposing a coupled-node tree that can be applied to longest prefix match searches for variable length code strings, and of realizing a longest prefix match search for variable length code strings that takes advantage of the special characteristics intrinsic to coupled-node trees.
In order to achieve the objective noted above, in accordance with this invention, a search is performed on a coupled-node tree with a configuration prescribed by the bit values of index keys whose bit strings are encodings of the search target code strings, by means of an encoded search key which is a bit string that encodes a search key consisting of a code string.
The coupled-node tree, as noted above, has a configuration prescribed by the bit values of index keys whose bit strings are encodings of the search target code strings, and it has a root node and node pairs, which are compositional elements of a tree, and which are two nodes, a primary node and a non-primary node, disposed in adjacent storage areas. The nodes have an area for storing a node type that indicates whether that node is a branch node or a leaf node. The branch node has, in addition to the node type, an area for storing a discrimination bit position in the encoded search key and an area for storing information indicating the position of the primary node of a node pair that is the link target. The leaf node has, in addition to the node type, an area for storing the search target code string or a reference pointer pointing to a storage area for the search target code string. Also, regardless whether the leaf node includes the search target code string or includes a reference pointer to the search target code string the wording “the search target code string related to the leaf node” or “the leaf node related to the search target code string” may at times be used.
The encoded search key is a bit string in which a differentiating bit indicating that a code follows (hereinbelow this may be called a continue bit) is appended at the head of the bit string for each code included in the code string that is the above noted search key, and a differentiating bit indicating that no more codes follow (hereinbelow this may be called an end bit) is appended at the tail end of the code string. Likewise, the index keys are bit strings wherein a continue bit is appended at the head of the bit string for each code included in the search target code string and an end bit is appended at the tail end of the code string.
Thus, when considering that a non-significant code with length 0 can exist both in the code string that is the search key and at the tail end of the search target code strings, the differentiating bit differentiates whether the codes following the differentiating bit are significant codes or non-significant codes. The differentiating bit can also indicate whether or not there are any following codes.
In accordance with this invention, first, an initial search is executed that searches a coupled-node tree by means of an encoded search key and obtains a search target code string as the search result code string. For each branch node traversed during the search whose discrimination bit position coincides with the position of one of the differentiating bits in the bit string configuring the encoded search key (hereinafter such a branch node may be called a code string delimiter branch node), the initial search also stores in a stack information indicating the position of that branch node together with information for accessing the search target code string related to the code string terminus node. The code string terminus node is the node, in the node pair that is the link target of the code string delimiter branch node, whose position is computed when the bit at the discrimination bit position has the value of the end bit. If the nodes configuring the node pair that is the link target of the code string delimiter branch node are defined as child nodes of the branch node and the branch node that is the link source is defined as the parent node, the information indicating the position of the code string delimiter branch node is stored in the stack as information indicating the position of the parent node. Also, for example, if information indicating the position of one of the child nodes of the code string delimiter branch node is used as the information for accessing the search target code string related to the code string terminus node, that information is stored as information indicating the position of that child node. By the definition of a code string delimiter branch node, of the child nodes, either the node on the [0] side or the node on the [1] side is a leaf node.
Next, a longest prefix match search is executed. The search result code string is encoded as an index key and compared with the encoded search key, and a determination is made whether the search result code string is the longest prefix matching code string (hereinbelow this may be called the longest prefix matching key). If the search result code string is not the longest prefix matching key, the information for accessing a search target code string related to a code string terminus node is read out from the stack, that search target code string is examined, and a longest prefix matching key is obtained from among the search target code strings.
In accordance with this invention, the configuration of a coupled-node tree is prescribed by the index keys, which are encoded by combining the bit strings corresponding to the codes with differentiating bits that indicate whether or not following codes exist in the search target code strings. An initial search is done using an encoded search key that encodes the search key in the same way as the search target code strings, and the path traversed during the search is stored. Then, a longest prefix match search using a search key consisting of a code string can be realized by examining the search result code string obtained by the initial search and the search target code strings accessed by means of the stored information about the search path.
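The two-stage procedure summarized above can be sketched as follows. This is a simplified illustration under several assumptions, not the disclosed implementation. The tree is written as nested tuples `(discrimination bit position, node [0], node [1])` with code strings as leaves, instead of node pairs in an array. Its shape follows the example coupled-node tree 200 described later in this specification. The stack holds the terminus code strings directly instead of node positions. Bits beyond the end of the encoded search key are read as 0. The 3-bit values for “A”, “B”, and “E” are taken from the bit expression of “ABEAB*” given later, with a continue bit of “1” and an end bit of “0”. All function names are hypothetical.

```python
CODE_TABLE = {"A": "001", "B": "010", "E": "101"}
BITS_PER_ENCODED_CODE = 4  # one differentiating bit plus three code bits


def encode(code_string):
    """Continue bit "1" before each code, end bit "0" at the tail."""
    return "".join("1" + CODE_TABLE[c] for c in code_string) + "0"


def bit_at(bits, position):
    # Assumption: positions past the end of the key are treated as 0.
    return int(bits[position]) if position < len(bits) else 0


# (discrimination bit position, node [0], node [1]); a str is a leaf node.
TREE = (0, "", (2, (4, "A", (8, "AB", "ABEAB")), (5, "BAB", "BEAB")))


def prefix_matches(target, encoded_key):
    # Drop the end bit of the target before comparing.
    return encoded_key.startswith(encode(target)[:-1])


def longest_prefix_match(tree, search_key):
    encoded_key = encode(search_key)
    stack = []
    node = tree
    # Initial search: remember the terminus node of each delimiter branch node.
    while isinstance(node, tuple):
        position, child0, child1 = node
        if position % BITS_PER_ENCODED_CODE == 0:
            stack.append(child0)  # end bit is "0", so node [0] is the terminus
        node = child1 if bit_at(encoded_key, position) else child0
    if prefix_matches(node, encoded_key):
        return node
    # Fall back to the stored terminus nodes, longest candidate first.
    while stack:
        candidate = stack.pop()
        if prefix_matches(candidate, encoded_key):
            return candidate
    return None
```

With this tree, the search key “ABE” first reaches the leaf for “ABEAB”, which does not prefix-match, and the stack then yields “AB”. The empty string plays the role of the code string consisting of only the terminal code “*”, so a result is always obtained.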
Next, details of a preferred embodiment of this invention are described. Hereinbelow, after describing an example of an encoding method for the code string and an example of a coupled-node tree, the search processing, insertion processing, and deletion processing are each described. Also, although the description below assumes that the leaf nodes include a reference pointer pointing to the storage area holding the search target code string, it is clear to one skilled in the art that the same description applies even if the leaf nodes include the search target code strings directly.
This invention takes as its object code strings consisting of codes used to distinguish not only letters but also any symbol or any item. This invention does not handle the code strings directly as they are, but rather handles strings of encoded codes that encode each code included in the code string. As was noted above, each code is encoded as a combination of a differentiating bit indicating whether or not a following code exists and a plurality of bits that express the code. This invention performs searches and so forth by means of encoded code strings, each of which is a string of encoded codes encoding each code in the code string.
One example of an encoding method for code strings for the code string search apparatus, search method, and program of this invention is described referencing
The example shown in
Also, the code “*” is one equivalent to the non-significant code with a length of zero noted above, as will be understood by the description hereinbelow.
Here, a case is described wherein the code string 50, which is a concatenation of the codes “A”, “B”, “E”, “A”, and “B”, is encoded. The label 52 in the drawing indicates the code positions (in this example, P1 to P6). As shown in the drawing, the code string 50 consists of six codes with the code “A” at code position P1, the code “B” at code position P2, the code “E” at code position P3, the code “A” at code position P4, the code “B” at code position P5, and the terminal code “*”, which indicates the end of the code string, at code position P6.
The above noted code string 50 “ABEAB*” becomes the code string expressed in bits shown by the label 60 in the drawing, by using the bit values of the codes described in the above noted code table 13. In this example, the code string expressed in bits 60 is “001 010 101 001 010 000”.
As is noted above, each code in the code string is encoded by combining a differentiating bit, which shows whether or not there is a following code, with the plurality of bits that are the bit-expression for each code. As shown in
Also it is assumed that the end bit 73b, showing the string end, is not included in the “encoded bit length” that shows the length of the encoded code string. Thus, as shown in
In accordance with this encoding method, it is easy to determine from the bit expression of the encoded code string whether or not there is a following significant code in the code string before encoding. In other words, the bit at position (number of bits accommodating a code [in this example, 3] + 1) × n in the encoded code string (n being an integer with a value of 0 or greater) is a differentiating bit, and depending on whether the bit value at this position is a “0” or a “1”, a determination can be made whether or not there is a following significant code.
Also, in the above the value of the continue bit is taken to be a “1”, and the value of the end bit is taken to be a “0”, but the reverse is also possible. Also, a differentiating bit consisting of a plurality of bits may also be used.
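The encoding described above can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and the code table lists only the three codes whose 3-bit values can be read from the bit expression of “ABEAB*” given above. The terminal code “*” is not passed in; it is represented by the end bit alone.

```python
CODE_BITS = 3
CODE_TABLE = {"A": "001", "B": "010", "E": "101"}  # from "001 010 101 001 010 000"
CONTINUE_BIT = "1"
END_BIT = "0"


def encode(code_string):
    """Prefix every code with a continue bit and append one end bit."""
    body = "".join(CONTINUE_BIT + CODE_TABLE[code] for code in code_string)
    return body + END_BIT


def encoded_bit_length(code_string):
    """Length of the encoded code string, not counting the end bit."""
    return len(encode(code_string)) - 1


def is_differentiating_position(bit_position):
    """True at positions (CODE_BITS + 1) * n for n = 0, 1, 2, ..."""
    return bit_position % (CODE_BITS + 1) == 0


def has_following_code(encoded, bit_position):
    """Read a differentiating bit: continue bit means a code follows."""
    assert is_differentiating_position(bit_position)
    return encoded[bit_position] == CONTINUE_BIT
```

For the code string “ABEAB”, `encode` yields the five 4-bit groups 1001, 1010, 1101, 1001, 1010 followed by the end bit 0.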
This invention configures a coupled-node tree by means of a set of index keys that are encoded bit strings that encode, with the above noted encoding method, search target code strings and this invention performs searches and so forth using an encoded search key that is an encoded bit string that encodes, with the above noted encoding method, a search key consisting of a code string.
Next an example of a coupled-node tree in one embodiment of the present invention is described.
Here, the reason why the coupled-node tree 200 is made to include also a code string consisting of only the terminal code “*” is to prevent a case wherein, in a longest prefix match search, details of which are described hereinbelow, not even one of the search target code strings prefix-matches the search key.
Of course, it is also permissible for none of the search target code strings to prefix-match the search key, in which case the coupled-node tree 200 can be made so that it does not include a code string consisting of only the terminal code “*”.
Details of how a search result key can always be obtained for any search with any kind of search key by making the coupled-node tree 200 include also a code string consisting of only the terminal code “*” are explained hereinbelow in the description of a longest prefix match search.
In the drawing the reference numeral 210a shows the root node. In the example shown, the root node 210a is the primary node of the node pair 201a located at the array element number 220.
In this tree structure, a node pair 201b is located below the root node 210a, and below that is located the node pair 201c. Below the node pair 201c are located the node pair 201f and the node pair 201d. Below the node pair 201d is located the node pair 201e. The 0 or 1 code that is appended before each node is the same as the labels that are appended before the array element numbers described in
In the example shown, the node type 260a of the root node 210a is “0”, thereby indicating that this is a branch node, and the discrimination bit position 230a indicates “0”. The coupled node indicator is 220a, which is the array element number of the array element in which the primary node 210b of the node pair 201b is stored.
The node pair 201b consists of node 210b and node 211b. Because a “1” is stored in the node type 260b of node 210b, this node is a leaf node and it includes the reference pointer 250b. The pointer that is stored in the reference pointer 250b references an area in the code string storage area 311 wherein is stored the code string 290b consisting of only the terminal code “*”. As was noted hereinabove, the pointer stored in reference pointer 250b may also be called the reference pointer and is expressed with the label 280b. The same applies to the other leaf nodes: the pointer stored in the reference pointer may at times be called a reference pointer. Also the “0” depicted immediately below the reference pointer 250b is the bit expression for the encoded code string that encodes the code string referenced by reference pointer 280b, and the (*) shows that that bit expression is the bit expression for the code string “*”. The same applies to the other leaf nodes. In the description hereinbelow, the bit expression for any arbitrary code string “ABC” may at times be notated as (ABC).
Also the node type 261b of node 211b is a “0”, indicating that the node is a branch node. A “2” is stored in the discrimination bit position 231b in node 211b, and the array element number of the array element 221b wherein is stored the primary node 210c of the node pair 201c is stored in the coupled node indicator for the link target.
The node pair 201c is configured by node 210c and node 211c. Both of their node types 260c and 261c are “0”, indicating that they are branch nodes. The discrimination bit position 230c in node 210c is a “4”, and the array element number of the array element 220c wherein is stored the primary node 210d of the node pair 201d is stored in the coupled node indicator.
Because a “1” is stored in the node type 260d for node 210d, this node is a leaf node, and the reference pointer 280d, which points to the area wherein is stored the code string “A*” shown with the label 290d, is stored in reference pointer 250d.
The node type 261d for node 211d that is a pair to node 210d is a “0”, and an “8” is stored in the discrimination bit position 231d. And the array element number of the array element 221d wherein is stored the primary node 210e of the node pair 201e is stored in the coupled node indicator.
The node pair 201e is configured by node 210e and node 211e, and their node types 260e and 261e are both “1”, indicating that both are leaf nodes. The reference pointer 280e, which points to the area wherein is stored the code string “AB*” shown with the label 290e, is stored in reference pointer 250e for node 210e, and the reference pointer 281e, which points to the area wherein is stored the code string “ABEAB*” shown with the label 291e, is stored in reference pointer 251e for node 211e.
The discrimination bit position 231c in node 211c, which is the other node of the above noted node pair 201c, is a “5”, and the array element number of the array element 221c wherein is stored the primary node 210f of the node pair 201f is stored in the coupled node indicator.
The node pair 201f is configured by node 210f and node 211f, and their node types 260f and 261f are both “1”, indicating that both are leaf nodes. The reference pointer 280f, which points to the area wherein is stored the code string “BAB*” shown with the label 290f, is stored in reference pointer 250f for node 210f, and the reference pointer 281f, which points to the area wherein is stored the code string “BEAB*” shown with the label 291f, is stored in reference pointer 251f for node 211f.
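The node pairs described above can be laid out in an array as in the following sketch. The node types, discrimination bit positions, and referenced code strings follow the description. The array element numbers other than 220 for the root node, the tuple layout, the use of list indexes as reference pointers, and the rule that bits beyond the end of the key are read as 0 are assumptions made for illustration.

```python
BRANCH, LEAF = 0, 1

# Code string storage area; a leaf node's reference pointer is an index into it.
code_string_storage = ["*", "A*", "AB*", "ABEAB*", "BAB*", "BEAB*"]

# array element number -> (node type, discrimination bit position, coupled node indicator)
#                      or (node type, reference pointer)
nodes = {
    220: (BRANCH, 0, 230),  # root node 210a, links to node pair 201b
    230: (LEAF, 0),         # node 210b -> "*"
    231: (BRANCH, 2, 232),  # node 211b, links to node pair 201c
    232: (BRANCH, 4, 234),  # node 210c, links to node pair 201d
    233: (BRANCH, 5, 238),  # node 211c, links to node pair 201f
    234: (LEAF, 1),         # node 210d -> "A*"
    235: (BRANCH, 8, 236),  # node 211d, links to node pair 201e
    236: (LEAF, 2),         # node 210e -> "AB*"
    237: (LEAF, 3),         # node 211e -> "ABEAB*"
    238: (LEAF, 4),         # node 210f -> "BAB*"
    239: (LEAF, 5),         # node 211f -> "BEAB*"
}


def search(encoded_key, root=220):
    """Follow links by discrimination bit until a leaf node is reached."""
    element = root
    while nodes[element][0] == BRANCH:
        _, bit_position, indicator = nodes[element]
        # Assumption: bits beyond the end of the key are read as 0.
        bit = int(encoded_key[bit_position]) if bit_position < len(encoded_key) else 0
        element = indicator + bit
    return code_string_storage[nodes[element][1]]
```

For example, the encoded key "100110100" (the encoding of “AB*”) passes through nodes 210a, 211b, 210c, and 211d and arrives at node 210e.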
Next, the meaning of the coupled-node tree configuration is described.
The search target code strings in the coupled-node tree 200 shown in
In the above noted Table 1, significant code strings, those other than the code string “*”, have a “1” in the 0-th bit of their encoded bit string, and the encoded bit string for the code string “*” has a “0” for the value of the 0-th bit. Thus the code string “*” can be differentiated from the other code strings by a determination of the value at the 0-th bit in the encoded bit string. In
Next, if we look at the significant code strings in the encoded bit strings, we can see that the bits at bit 1 are alike in all being “0” while the bit at bit 2 is a “1” for the code strings “BEAB*” and “BAB*” and a “0” for the code strings “ABEAB*”, “AB*”, and “A*”.
Because there exist encoded bit strings whose bit values at bit 2 mutually differ, the discrimination bit position 231b for branch node 211b, which is the link target when the value at bit 0 in the encoded bit string is a “1”, has the value “2”, and when the value at bit 2 in the encoded bit string is a “0” a link is made to primary node 210c of the node pair 201c and when the value is “1” a link is made to node 211c.
When the branching at the above noted branch node 211b is seen from the point of view of the code string, that branching reflects the fact that the code positioned in the first code position in the code strings in the search target code strings is either an “A” or a “B”. In the description hereinbelow, a branch node, like branch node 211b, wherein the value in the discrimination bit position does not coincide with the position of a differentiating bit, may be called a code distinguishing branch node. In the above noted example, the bifurcation at code distinguishing branch node 211b completely separates the code strings according to whether their first code is an “A” or a “B”, but in general the code at a given position in the code string is not completely separated at a single code distinguishing branch node.
The discrimination bit position 230c in node 210c, which is the link target when the value at bit 2 in the encoded bit string is a “0”, is “4”. This number is based on the fact that when we look at the bit values at bit 3 and thereafter in the encoded bit strings for the code strings “ABEAB*”, “AB*” and “A*”, for which the value at bit 2 in the above noted Table 1 is a “0”, we find that the value at bit 3 is a “1” in each of them and the value at bit 4 is a “1” for the code strings “ABEAB*” and “AB*” and a “0” for code string “A*”. In other words, this branching is based on separating code strings wherein the number of significant codes is one from code strings wherein the number of significant codes is two or more. And the reference pointer 280d, which points to the area wherein is stored the code string “A*”, is stored in the primary node 210d of node pair 201d, which is the link target when the value at bit 4 in the encoded bit string is a “0”.
Also, an “8” is stored in the discrimination bit position 231d of node 211d, which is the link target when the value at bit 4 in the encoded bit string is a “1”. This number is based on the fact that when we look at the bit values at bit 5 and thereafter in the encoded bit strings for the code strings “ABEAB*” and “AB*”, for which the value at bit 2 is a “0” and the value at bit 4 is a “1”, we find that the values at bit 5 through bit 7 are the same, but the value at bit 8 is different. In other words, this branching distinguishes code strings wherein the number of significant codes is two from code strings wherein the number of significant codes is three or more.
And the reference pointer 280e, which points to the area wherein is stored the code string “AB*”, is stored in the primary node 210e (the link target when bit 8 in the encoded bit string is a “0”) of node pair 201e, which is the link target from node 211d, and the reference pointer 281e, which points to the area wherein is stored the code string “ABEAB*”, is stored in node 211e, which is the link target when bit 8 in the encoded bit string is a “1”.
The value “5” is stored as the discrimination bit position 231c in node 211c, which is the link target when bit 2 in the encoded bit string is a “1”. This number is based on the fact that when we look at the bit values at bit 3 and thereafter in the encoded bit strings for the code strings “BEAB*” and “BAB*”, for which the value at bit 2 is a “1”, we find that the values at bit 3 and bit 4 are the same, but the value at bit 5 is different. And the reference pointer 280f, which points to the area wherein is stored the code string “BAB*”, is stored in node 210f, which is the link target when the value at bit 5 in the encoded bit string is a “0”, and the reference pointer 281f, which points to the area wherein is stored the code string “BEAB*”, is stored in node 211f, which is the link target when the value at bit 5 in the encoded bit string is a “1”. The branching at node 211c, which is a code distinguishing branch node, reflects the fact that, among the code strings in the search target code strings at that point, the code positioned in the second code position is either that for an “E” or that for an “A”.
In this way, the configuration of a coupled-node tree is prescribed by the bit values at each bit position in each key included in the set of index keys (encoded bit strings that encode the search target code strings).
In other words, delta information about the index keys can be said to be stored in the coupled-node tree.
And a branch is taken at each bit position with a mutually differing bit value, in the sequence from the bit position closest to the beginning of an index key, to the node for which the bit value is a “1” or to the node for which the bit value is a “0”. Also, the magnitude relation among the code strings is not changed by the encoding. From this fact, when we traverse the tree to the leaf nodes giving priority to the node [1] side and to the depth direction in the tree, and look at the search target code strings stored in those leaf nodes or referenced by means of the reference pointers stored in those leaf nodes, we can see that the search target code strings are sorted in descending order.
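The descending order noted above can be checked with a minimal Python sketch. The array layout is hypothetical: array element numbers 0 to 10 stand in for the nodes of coupled-node tree 200, a branch node is written as a tuple of discrimination bit position and coupled node indicator, and the leaf on the [0] side of the root is a placeholder, because that part of the tree is not described in this passage. The function name is likewise chosen only for illustration.

```python
# Hypothetical array for the tree discussed above. A branch node is
# ("branch", discrimination bit position, coupled node indicator) and a
# leaf node is ("leaf", code string). Array element numbers are illustrative.
ARRAY = [
    ("branch", 0, 1),    # 0: root node
    ("leaf", "*"),       # 1: placeholder leaf on the [0] side of the root (assumption)
    ("branch", 2, 3),    # 2: node [1] of the root, branches at bit 2
    ("branch", 4, 5),    # 3: first code is "A", branches at bit 4
    ("branch", 5, 9),    # 4: first code is "B", branches at bit 5
    ("leaf", "A*"),      # 5
    ("branch", 8, 7),    # 6: branches at bit 8
    ("leaf", "AB*"),     # 7
    ("leaf", "ABEAB*"),  # 8
    ("leaf", "BAB*"),    # 9
    ("leaf", "BEAB*"),   # 10
]


def leaves_node1_first(array, number=0):
    """Yield the code strings of the leaf nodes, visiting node [1] before node [0]."""
    node = array[number]
    if node[0] == "leaf":
        yield node[1]
        return
    _, _, indicator = node
    yield from leaves_node1_first(array, indicator + 1)  # node [1] side first
    yield from leaves_node1_first(array, indicator)      # then node [0] side


ordered = list(leaves_node1_first(ARRAY))
print(ordered)
```

In this sketch the comparison `ordered == sorted(ordered, reverse=True)` holds because the character “*” happens to sort below the letters in ASCII, which mirrors the end code being encoded as a “0”.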
Also, because the coupled-node tree of this invention stores encoded bit strings that encode the search target code strings, it has the special characteristic that the node [0] that is the link target of a code string delimiter branch node is a leaf node. In the example of the coupled-node tree 200 shown in
Also, of the child nodes for the above noted code string delimiter branch node, the fact that the node [0] is a leaf node corresponds to the fact that the code “*” is encoded as a “0”. It is clear that if the code “*” is encoded as a “1”, of the child nodes for the code string delimiter branch node, the node [1] becomes the leaf node. Here, of the child nodes for the code string delimiter branch node, the leaf node that branches by means of the bit value that shows that a following code does not exist is called a code string terminus node or a code string terminus child node, and the node that is a pair of that node is called a code string linked node or a code string linked child node. And thus the code string terminus node is a leaf node. Also, the code string related to the code string terminus node prefix-matches the code strings related to the leaf nodes disposed below the code string linked node that is a pair to that code string terminus node. Furthermore, it is clear that the length of the code string related to the code string terminus node is shorter than the lengths of the code strings related to the leaf nodes disposed below the code string linked node that is a pair to the code string terminus node.
Also because a coupled-node tree can be identified by the array element number of the root node, the coupled-node tree can be managed using the array element number of the root node. Thus the array element number of the root node for the coupled-node tree is taken to be registered in the coupled-node tree management means.
Search processing and data maintenance are implemented with the search apparatus of the present invention by a data processing apparatus 301 having at least a central processing unit 302 and a cache memory 303, and a data storage apparatus 308. The data storage apparatus 308, which has an array 309 in which a coupled-node tree is disposed, a search path stack 310, in which are stored the array element numbers of nodes traversed during the search, and a code string storage area 311, can be implemented by a main memory 305 or a storage device 306, or alternatively, by using a remotely disposed apparatus connected via a communication apparatus 307. The array 100 in
In the example shown in
Also, although it is not particularly illustrated, a temporary memory area can of course be used to enable various values obtained during processing to be used in subsequent processing.
Basic search processing using this kind of a coupled-node tree is described referencing
In a preferred embodiment of this invention, a search path stack is prepared for holding the array element numbers of the array elements wherein are stored nodes passed during a search as a means for remembering the path traversed in a search of a coupled-node tree. As shown in
Next, at step S502, the array element number set at step S501 or obtained at step S509 noted below is stored in the search path stack, and at step S503, the array element corresponding to that array element number is read out as the node to be referenced. Then, at step S504, the node type is extracted from the read-out node, and at step S505, a determination is made whether the node type is that of a branch node.
If the determination at step S505 is that the read-out node is a branch node, processing proceeds to step S506, wherein information regarding the discrimination bit position is extracted from the node, and furthermore, at step S507, the bit value corresponding to the extracted discrimination bit position is extracted from the encoded search key. Then, at step S508, the coupled node indicator is extracted from the node, and at step S509, the bit value extracted from the encoded search key is added to the coupled node indicator and the result is made to be a new array element number and processing returns to step S502.
Thereafter, the processing from step S502 to step S509 is repeated until the determination in step S505 is that of a leaf node, whereupon processing proceeds to step S510. At step S510, the reference pointer is extracted from the leaf node, and processing is terminated.
In this way, the search terminates when a leaf node is reached, and the array element numbers of the array elements holding the branch nodes traversed during the search up to the leaf node have been successively stored in the search path stack.
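The flow of steps S502 to S510 can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the array layout, the tuple form of the nodes, the placeholder leaf on the [0] side of the root, and the helper `bit_at` (which reads bit positions past the end of the key as “0”) are all hypothetical.

```python
# Hypothetical array: ("branch", discrimination bit position, coupled node
# indicator) or ("leaf", code string reached via the reference pointer).
ARRAY = [
    ("branch", 0, 1),    # 0: root node
    ("leaf", "*"),       # 1: placeholder leaf (assumption)
    ("branch", 2, 3),    # 2
    ("branch", 4, 5),    # 3
    ("branch", 5, 9),    # 4
    ("leaf", "A*"),      # 5
    ("branch", 8, 7),    # 6
    ("leaf", "AB*"),     # 7
    ("leaf", "ABEAB*"),  # 8
    ("leaf", "BAB*"),    # 9
    ("leaf", "BEAB*"),   # 10
]


def bit_at(key, position):
    """Bit value of an encoded key given as a string of "0"/"1" characters.

    Positions past the end of the key read as 0 (assumption of this sketch).
    """
    return int(key[position]) if position < len(key) else 0


def basic_search(array, encoded_key, start=0):
    """Return (code string of the leaf reached, search path stack)."""
    stack = []
    number = start
    while True:
        stack.append(number)                 # S502: store the array element number
        node = array[number]                 # S503: read out the node
        if node[0] == "leaf":                # S504, S505: node type check
            return node[1], stack            # S510: follow the reference pointer
        _, position, indicator = node        # S506, S508
        number = indicator + bit_at(encoded_key, position)  # S507, S509


print(basic_search(ARRAY, "1001101111010"))  # encoded search key for "ACE*"
```

Because step S502 stores the array element number before the node type is examined, the sketch also records the array element number of the leaf node that ends the search.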
Next, code string search processing in one embodiment of the present invention is described referencing the flowchart in
The search processing in
In this preferred embodiment of the invention, the longest prefix matching key is the longest of the index keys that prefix-match the encoded search key, which is an encoding of the search key. An index key that prefix-matches the encoded search key coincides perfectly with the encoded search key throughout the length of that index key. Because an index key that is exactly the same as the encoded search key is the longest index key of all the index keys that prefix-match the encoded search key, it is the longest prefix matching key.
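As a reference point for this definition, a brute-force version can be written directly on the unencoded code strings. The sketch below is only a specification-style check, not the coupled-node tree method; the function name and the list of search targets (the five code strings mentioned in this description, without the end code “*”) are chosen for illustration.

```python
def longest_prefix_match_bruteforce(targets, search):
    """Return the longest target that is a prefix of `search`, or None."""
    matches = [t for t in targets if search.startswith(t)]
    return max(matches, key=len) if matches else None


TARGETS = ["A", "AB", "ABEAB", "BAB", "BEAB"]

print(longest_prefix_match_bruteforce(TARGETS, "ABEABC"))
print(longest_prefix_match_bruteforce(TARGETS, "ABE"))
```

Note that a target which is identical to the search string is itself a prefix of it, so an exact match, when present, is always the value returned.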
As shown in
Next, proceeding to step S602, encode processing is done wherein the search key set in the code string is encoded using the encoding method described referencing
The processing of the above noted step S601 and step S603 applies to the search key the encode processing in step S602, which is the encode processing shown in
Continuing, at step S604, the root node of the coupled-node tree that is the object of searches is set as the search start node, and next, at step S605, initial search processing is executed. This processing uses the encoded search key to search, from the search start node, the array holding the nodes of the coupled-node tree, and obtains a reference pointer as the search result while at the same time storing in the search path stack 310 the array element numbers of the code string delimiter branch nodes and code string linked nodes traversed up to the end of the search. Details of the processing in step S605 are described hereinafter referencing
Next, proceeding to step S606, a longest prefix match search is executed to obtain the longest prefix matching key by means of the encoded search key, and processing is terminated. The candidates in this longest prefix match search processing are the index keys corresponding to the code strings referenced by the reference pointer obtained as the search result of the initial search processing and by the reference pointers stored in the code string terminus nodes that are pairs to the code string linked nodes whose array element numbers are stored in search path stack 310. From among these index keys, the processing obtains the longest one that prefix-matches the encoded search key, in other words, the longest prefix matching key. Details of the processing in step S606 are described hereinafter referencing
This encode processing is the processing executed in step S602 of
First, in step S701, the bit length of each code set in the code string (in the example shown in the above noted
Next, proceeding to step S702, the code position showing the position of the code to be processed next from among the codes in the code string is initialized. In one embodiment of this invention, in order to process the codes successively from the 0th code, the code position is initialized as “0”.
Then, in step S703, the encoded code storage position, which shows where the next encoded code is to be stored in the encoded code string generated by this encode processing, is set to its initial value.
Continuing, in step S704, a determination is made whether the code position is at the end of the code string, in other words, whether the code pointed to by the code position is the code “*” that indicates the end of the code string. When it is not the code “*”, processing proceeds to step S705, and when it is the code “*”, processing proceeds to step S709.
At step S705, the bit values in the code pointed to by the code position are extracted from the code string.
Then, at step S706a, the differentiating bit (in this example, “1”) that indicates the existence of a following code is set in the encoded code.
Next, at step S706b, the bit values of the code obtained at step S705 are appended to the end of the encoded code. Continuing, at step S707, the encoded code to which a bit value is appended at step S706b is stored in the position pointed to by the encoded code storage position in the encoded code string.
Then, at step S708a, the code position is advanced to the next code position, and at step S708b, the storage position of the encoded code is advanced to the next storage position for the encoded code, and processing returns to step S704. In the example shown in
When the determination at step S704 is that the code position is at the end of the code string, processing proceeds to step S709, wherein the differentiating bit (in this example, “0”) that indicates the end of the code string is stored in the position pointed to by the encoded code storage position for the encoded code string.
Then, at step S710 the encoded code storage position is set in the encoded bit length, and processing is terminated. By means of the above processing, an encoded code string encoded by the encoding method shown in
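The encode processing of steps S702 to S710 can be sketched in Python as follows. The 3-bit code values for “A”, “B”, “C” and “E” are inferred from the encoded bit strings quoted in this description (for example “1001101111010” for “ACE*”); the table `CODE_VALUES`, the constant names and the use of character strings of “0” and “1” to hold bits are assumptions made for the illustration.

```python
CODE_BITS = 3                      # assumed fixed bit length of each code
CODE_VALUES = {"A": 0b001, "B": 0b010, "C": 0b011, "E": 0b101}  # inferred values
END_CODE = "*"


def encode(code_string):
    """Return (encoded bit string, encoded bit length) for a code string ending in "*"."""
    code_position = 0                                        # S702
    storage_position = 0                                     # S703
    encoded = ""
    while code_string[code_position] != END_CODE:            # S704
        value = CODE_VALUES[code_string[code_position]]      # S705
        encoded_code = "1"                                   # S706a: a following code exists
        encoded_code += format(value, f"0{CODE_BITS}b")      # S706b: append the code bits
        encoded += encoded_code                              # S707
        code_position += 1                                   # S708a
        storage_position += len(encoded_code)                # S708b
    encoded += "0"                                           # S709: end of the code string
    return encoded, storage_position                         # S710


print(encode("ACE*"))
```

As in the examples of this description, the encoded bit length returned here (12 for “ACE*”, 8 for “AB*”, 20 for “ABEAB*”) is the storage position reached at step S710 and so does not count the final differentiating bit.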
Also, as was noted above, the encode processing shown in
Although all the codes configuring a code string are encoded in a batch according to this preferred embodiment of the invention as shown in the example in
Next, an initial search in one embodiment of the present invention is described referencing
Encoded bit string “1001101111010” (hereinafter this may at times be called encoded search key 70) which is the encoded search key (ACE*) that encodes the search key “ACE*” is stored in the encoded search key 270.
The parts below node 211c in coupled-node tree 200 are omitted, and the search path for the initial search from root node 210a using the encoded search key 70 is shown by the bold boxes and bold arrows.
In the initial search, first the array element number 220 for the root node 210a is set as the search start node. The value of the discrimination bit position 230a in root node 210a is “0”, and because the bit value at bit position 0 in encoded search key 70 is a “1” a link is made to node 211b which is the node on the [1] side of node pair 201b. Also, because the value “0” in discrimination bit position 230a for root node 210a matches one of the bit positions 0, 4, 8, . . . wherein reside the differentiating bits of encoded bit string 70, in other words, because the root node is a code string delimiter branch node, the array element number 220 of root node 210a (parent node) and the array element number 220a+1 for node 211b on the [1] side, which, of the two child nodes of root node 210a, is the code string linked node, are stored in search path stack 310.
Next, because the value for discrimination bit position 231b is “2” and the bit value at bit position 2 in encoded search key 70 is “0”, a link is made to node 210c, which is the node on the [0] side of node pair 201c. Because the value of the discrimination bit position 231b in node 211b is “2” and that does not match one of the bit positions wherein reside differentiating bits of encoded bit string 70, the array element number of this node is not stored in search path stack 310.
Next, because the value at discrimination bit position 230c in node 210c is “4” and the bit value at bit position 4 in encoded search key 70 is “1”, a link is made to node 211d, which is the node on the [1] side of node pair 201d. Because the value “4” in discrimination bit position 230c for node 210c matches one of the bit positions wherein reside the differentiating bits of encoded bit string 70, node 210c is a code string delimiter branch node noted above. Thus the array element number 221b of node 210c (parent node) and the array element number 220c+1 for the node 211d that is on the [1] side for the two child nodes of node 210c are stored in search path stack 310.
Next because the value at discrimination bit position 231d in node 211d is “8” and the bit value at bit position 8 in encoded search key 70 is “1”, a link is made to node 211e, which is the node on the [1] side for node pair 201e. Because node 211d is a code string delimiter branch node, the array element number 220c+1 for node 211d (parent node) and the array element number 221d+1 for the node 211e that is on the [1] side for the two child nodes of node 211d are stored in search path stack 310.
The value for the node type 261e in node 211e is “1”, indicating that node 211e is a leaf node. At this point the initial search finishes by extracting the reference pointer 281e stored in reference pointer 251e.
As shown in the drawing, the code string “ABEAB*” is stored in the storage area pointed to by reference pointer 281e. The bit expression for the encoded code string that encodes code string “ABEAB*” is “1001101011011 . . . ”.
In the initial search noted above, the array element numbers of the code string delimiter branch nodes (parent nodes), together with the array element numbers of whichever child node of each such branch node is a code string linked node, are stored in search path stack 310. This is done so that, in the longest prefix match search that follows, the code string terminus child nodes (the leaf nodes noted above) of the code string delimiter branch nodes traversed during the initial search can be found and the code strings pointed to by their reference pointers can be read out.
In the example of the initial search shown in
Also, instead of the array element numbers of code string linked nodes or code string terminus nodes, the code string terminus node itself, which is a leaf node, could also be stored, or the reference pointer, or the code string related to the leaf node could also be stored. In other words, it is sufficient to store information related to the parent node and information for accessing the code string related to the code string terminus child node.
Next the processing flow for an initial search is described.
Continuing, at step S802, the array element number of the search start node is set in the array element number. Because the processing executed in
Next, at step S803, the array element pointed to by the array element number is read out, as a node, from the array holding the nodes of the coupled-node tree. Then, at step S804, the node type information is extracted from the node read out at step S803, and at step S805, a determination is made whether that node is a branch node.
If the determination at step S805 is that the read-out node is a branch node (node type is “0”), processing proceeds to step S806, and information about the discrimination bit position is extracted from that node.
Then, at step S807, the bit value corresponding to the extracted discrimination bit position in the encoded search key is extracted, and at step S808, coupled node indicator information is extracted from that node.
Continuing, at step S811, a determination is made whether the discrimination bit position extracted at step S806 coincides with any of the positions wherein resides a differentiating bit in the encoded bit string. This determination, in accordance with the naming convention noted hereinabove, is the determination whether the node read out at step S803 is a code string delimiter branch node.
Also, as was noted above, the position of the differentiating bit depends on the encoding method. Although the position of the differentiating bit can be determined by computation and so forth in the case of a fixed length code, as shown in the example in the above noted
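For the fixed length case, the computation can be as small as the following Python sketch. It assumes, as in the examples of this description, that every code occupies 3 bits preceded by one differentiating bit, so that the differentiating bits reside at bit positions 0, 4, 8 and so on; the function and constant names are hypothetical.

```python
CODE_BITS = 3  # assumed fixed bit length of each code


def is_differentiating_bit(position, code_bits=CODE_BITS):
    """True when `position` holds a differentiating bit in the encoded bit string."""
    return position % (code_bits + 1) == 0


print([p for p in range(13) if is_differentiating_bit(p)])
```

A branch node whose discrimination bit position passes this test is a code string delimiter branch node in the terminology used above.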
If the result of the determination in step S811 is that the discrimination bit position is a differentiating bit position, processing proceeds to step S812 in order to determine whether there is a following bit included in the encoded search key (a bit corresponding to a significant code), and a determination is made whether the bit value of the differentiating bit extracted at step S807 is a “1”.
If the bit value for the differentiating bit is “1”, that indicates that a bit having a value corresponding to a significant code exists in the bit position lower in the encoded search key than the discrimination bit position.
In this case, processing proceeds to step S813, and the array element number of the node read out at step S803 is stored in search path stack 310 as the array element number of the parent node.
Continuing, at step S814, the value computed by adding the value 1 to the coupled node indicator extracted at step S808 is set as the new array element number. Then, at step S815, the array element number obtained at step S814 is stored in search path stack 310 as the array element number of the child node, and, after incrementing the stack pointer by one, processing returns to step S803.
Also, the expression here of “incrementing by 1” is an expression arranged to match a description that illustrates an example wherein the search path stack 310 is divided into two columns, as shown in the example in
In other words, the storage place, in the search path stack 310 in this preferred embodiment of the invention, specified by a single value of the stack pointer, holds a set of two array element numbers consisting of the array element number of a code string delimiter branch node and the array element number of the code string linked node, which is one of the child nodes of that code string delimiter branch node.
Also, as an implementation variation of the processing of step S815, instead of the array element number obtained at step S814, the coupled node indicator extracted at step S808 can be stored in search path stack 310 as the array element number for the child node; in other words, as was noted hereinabove, the array element number for the code string terminus node can be stored in search path stack 310 as the array element number for the child node.
Other implementation variations are also possible, such as storing in the search path stack 310 the code string terminus node itself, or the reference pointer extracted from the code string terminus node, or the code string pointed to by the reference pointer.
Regardless, the processing of step S815 is the processing to store in the search path stack information for accessing the search target code string related to the code string terminus node.
Conversely, if the determination at step S811 is that the discrimination bit position is not the position of a differentiating bit, or if the determination at step S811 is that the discrimination bit position is the position of a differentiating bit but the determination at step S812 is that the value of the differentiating bit at the discrimination bit position is a “0”, in either case, processing proceeds to step S809, wherein the bit value extracted from the encoded search key at step S807 is added to the coupled node indicator extracted at step S808 and the result of that addition is set as a new array element number and processing returns to step S803.
Thereafter, the processing loop of step S803 to step S815 is repeated until the determination at step S805 is that of a leaf node. In this processing loop, the array element number set at step S809 or at step S814 is used at step S803.
If the determination in step S805 is that the node read out at step S803 is not a branch node, in other words, if the determination is that of a leaf node (node type is a “1”), processing proceeds to step S810, wherein the reference pointer included in that leaf node is extracted and processing is terminated.
As described above, in accordance with an initial search in this preferred embodiment of the invention, a coupled-node tree is searched using an encoded search key until a leaf node is reached, the reference pointer stored in the leaf node is read out, and at the same time, the array element numbers of the code string delimiter branch nodes traversed in that search and the array element numbers of their code string linked child nodes are successively stored in search path stack 310.
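The initial search of steps S802 to S815 can be sketched in Python as follows. The array layout, the tuple form of the nodes, the placeholder leaf on the [0] side of the root, and the helpers `bit_at` and `is_differentiating_bit` are assumptions of this illustration; the search path stack is modelled as a list of (parent, child) pairs, one pair per stack pointer value.

```python
# Hypothetical array: ("branch", discrimination bit position, coupled node
# indicator) or ("leaf", code string reached via the reference pointer).
ARRAY = [
    ("branch", 0, 1),    # 0: root node
    ("leaf", "*"),       # 1: placeholder leaf (assumption)
    ("branch", 2, 3),    # 2
    ("branch", 4, 5),    # 3
    ("branch", 5, 9),    # 4
    ("leaf", "A*"),      # 5
    ("branch", 8, 7),    # 6
    ("leaf", "AB*"),     # 7
    ("leaf", "ABEAB*"),  # 8
    ("leaf", "BAB*"),    # 9
    ("leaf", "BEAB*"),   # 10
]


def bit_at(key, position):
    """Bit of the encoded key; positions past the end read as 0 (assumption)."""
    return int(key[position]) if position < len(key) else 0


def is_differentiating_bit(position, code_bits=3):
    """Differentiating bits sit at positions 0, 4, 8, ... for 3-bit codes."""
    return position % (code_bits + 1) == 0


def initial_search(array, encoded_key, start=0):
    """Return (code string of the leaf reached, list of (parent, child) pairs)."""
    stack = []                                        # search path stack, initially empty
    number = start                                    # S802
    while True:
        node = array[number]                          # S803
        if node[0] == "leaf":                         # S804, S805
            return node[1], stack                     # S810
        _, position, indicator = node                 # S806, S808
        bit = bit_at(encoded_key, position)           # S807
        if is_differentiating_bit(position) and bit == 1:   # S811, S812
            child = indicator + 1                     # S814: code string linked node
            stack.append((number, child))             # S813, S815
            number = child
        else:
            number = indicator + bit                  # S809


print(initial_search(ARRAY, "1001101111010"))  # encoded search key for "ACE*"
```

With the hypothetical numbering used here, the three pairs stored for “ACE*” correspond to the three code string delimiter branch nodes on the search path described above (the root node, node 210c and node 211d) and their [1] side child nodes.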
Next a longest prefix match search related to one embodiment of this invention is described referencing
As shown in
The parts below node 211c in coupled-node tree 200 are omitted, just like in
In the longest prefix match search, first, the encoded bit length of the index key (ABEAB*) that encodes the search target code string “ABEAB*” and which is obtained in the initial search is compared with the encoded bit length of the encoded search key (ACE*). In the example noted above, the encoded bit length of the index key (ABEAB*) is 20, and the encoded bit length of the encoded search key (ACE*) is 12. Thus because the encoded bit length of the index key is longer than the encoded bit length of the encoded search key, the code string “ABEAB*” does not prefix-match the search key “ACE*”.
Next, the array element number 221d+1 for the child node on the [1] side, pointed to by the stack pointer at the end of the initial search, is extracted from search path stack 310, and from that array element number the array element number of the child node on the [0] side, in other words, array element number 221d for the code string terminus child node 210e, is obtained and node 210e is read out. Then the code string “AB*” is read out via the reference pointer from node 210e, the encoded bit string (AB*) that encodes that code string is taken to be a new index key, and the encoded bit length of that index key is compared with the encoded bit length of the encoded search key (ACE*).
Here, the encoded bit length of the index key (AB*) is 8, which is shorter than the encoded bit length 12 of the encoded search key (ACE*). Thereafter, a code string terminus child node is identified by means of the relative position relationship between the difference bit positions between the index keys and the encoded search key and the discrimination bit positions of the parent nodes for the code string terminus child nodes related to those index keys, and the code string pointed to by the reference pointer in the identified code string terminus child node is taken to be the longest prefix matching key.
In other words, the array element numbers of the parent nodes are successively read out from the search path stack and the discrimination bit positions are extracted from the code string delimiter branch node disposed in the array elements pointed to by those array element numbers. Then, if that discrimination bit position coincides with the above noted difference bit positions or has a higher position relationship, the code string pointed to by the reference pointer in the code string terminus child node for that code string delimiter branch node is taken to be the longest prefix matching key.
The discrimination bit position search shown by the arrows with bold lines in
Also, the determination of the longest prefix matching key shown by the arrows with bold lines in
In the example shown in
Next, why the longest prefix matching key obtained by the above noted method is the longest code string that prefix-matches the search key, of all the search target code strings, is described.
First, terms are defined for the description hereinbelow.
In the initial search, the code strings related to the code string terminus child nodes for the code string delimiter branch nodes whose array element numbers are stored in the search path stack as the array element number of a parent node are called code strings in the search path for the initial search. In the example shown in
Thus, as was noted above, the code strings in the search path for the initial search prefix-match the code strings related to the leaf nodes disposed at levels lower than the code string linked child nodes paired with those code string terminus child nodes related to those code strings. Also, the lengths of the code strings in the search path for the initial search are shorter than the lengths of the code strings related to the leaf nodes disposed at levels lower than the code string linked child nodes paired with those code string terminus child nodes related to those code strings.
If the search result key for the initial search prefix-matches the search key, the code strings in the search path for the initial search also prefix-match the search key, because they prefix-match the search result key; their lengths, however, are equal to or less than the length of the search result key. Moreover, by the special properties of the coupled-node tree related to this invention, no code strings that prefix-match the search key are stored in the coupled-node tree other than the search result key and the code strings in the search path for the initial search. Thus, if the search result key for the initial search prefix-matches the search key, that search result key is the longest prefix matching key.
Next, if the search result key for the initial search does not prefix-match the search key and a code string that prefix-matches the search key is stored in the coupled-node tree, then that code string is included among the code strings in the search path for the initial search. Thus, the longest code string of all the code strings in the search path that prefix-match the search key is the longest prefix matching key.
For that reason, the longest prefix matching key obtained by the above noted method is the longest code string that prefix-matches the search key, of all the search target code strings.
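The method described above, starting from the result of the initial search and its search path stack, can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: the array layout with a placeholder leaf on the [0] side of the root, the inferred 3-bit code values, the (parent, child) pair form of the stack, the step labels in the comments (which follow the processing flow only loosely), and the return value `None` when no candidate remains.

```python
ARRAY = [  # hypothetical: ("branch", discrimination bit, coupled node indicator) / ("leaf", code string)
    ("branch", 0, 1), ("leaf", "*"),                       # 0: root, 1: placeholder leaf (assumption)
    ("branch", 2, 3), ("branch", 4, 5), ("branch", 5, 9),  # 2, 3, 4
    ("leaf", "A*"), ("branch", 8, 7),                      # 5, 6
    ("leaf", "AB*"), ("leaf", "ABEAB*"),                   # 7, 8
    ("leaf", "BAB*"), ("leaf", "BEAB*"),                   # 9, 10
]
CODE_VALUES = {"A": 0b001, "B": 0b010, "C": 0b011, "E": 0b101}  # inferred 3-bit codes


def encode(code_string):
    """Return (encoded bit string, encoded bit length) for a code string ending in "*"."""
    bits = "".join("1" + format(CODE_VALUES[c], "03b") for c in code_string[:-1])
    return bits + "0", len(bits)


def longest_prefix_match(array, search_key, initial_result, search_path_stack):
    """Pick the longest prefix matching code string from the initial search results."""
    encoded_search, search_length = encode(search_key)
    stack = list(search_path_stack)              # work on a copy of the stack
    code_string = initial_result
    while True:
        index_key, index_length = encode(code_string)         # S901 to S903
        if index_length <= search_length:                      # S904
            break
        if not stack:
            return None                                        # assumption: no candidate left
        _parent, child = stack.pop()                           # S905, S906
        code_string = array[child - 1][1]                      # S907 to S909: paired node [0]
    if encoded_search[:index_length] == index_key[:index_length]:   # S910, S911
        return code_string                                     # S911a
    difference = next(i for i in range(index_length)
                      if encoded_search[i] != index_key[i])    # S912, S912a
    while stack:
        parent, child = stack.pop()                            # S913
        if array[parent][1] < difference:                      # S914 to S916: higher position
            return array[child - 1][1]                         # terminus child of that parent
    return None                                                # assumption: nothing matches


STACK = [(0, 2), (3, 6), (6, 8)]   # pairs stored by the initial search in this layout
print(longest_prefix_match(ARRAY, "ACE*", "ABEAB*", STACK))
```

In this hypothetical layout the search keys “ACE*”, “ABE*” and “ABEABC*” all reach the leaf holding “ABEAB*” with the same stack contents, so the one `STACK` value above serves for all three.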
Next, the processing flow for a longest prefix match search based on the results of an initial search is described referencing
As shown in
Next, proceeding to step S902, encode processing is performed wherein the code string set at step S901 is encoded using the encoding method described using
Next, in step S903, the encoded code string generated at step S902 is set in the index key and the encoded bit length of the encoded code string obtained at step S902 is set in the encoded bit length of the index key. In the example shown in
The processing of the above noted step S901 and step S903, the same as for the processing in step S601 and step S603 in
Also, the code string set in the first-time processing of step S901 may at times be called the search result code string for the initial search. Also, the index key set in the first-time processing of step S902 and step S903 may at times be called the index key obtained in the initial search.
Next, in step S904, a determination is made whether the encoded bit length of the index key is equal to or less than the encoded bit length of the encoded search key. Here, the encoded bit length of the encoded search key is the one set at step S603 shown in
If the encoded bit length of the index key is not equal to or less than the encoded bit length of the encoded search key, in other words, if the number of codes in the search target code string before encoding is larger than the number of codes in the search key, that search target code string does not prefix-match the search key.
Accordingly, when the determination at step S904 is negative, the processing of step S905 to step S909 is done and processing returns to step S901, and the successive access to the code strings in the search path for the initial search is repeated until the determination at step S904 is positive.
At step S905, the array element number for the child node pointed to by the stack pointer is read out from the search path stack, and at step S906, the stack pointer for the search path stack is decremented by one.
Next, at step S907, the array element number that is paired with the array element number for the child node read out above is obtained. Then, proceeding to step S908, the array element pointed to by the array element number obtained at step S907 is read out, as a node, from the array holding the nodes of the coupled-node tree.
Continuing, at step S909, the reference pointer is extracted from the node read out at step S908, and processing returns to step S901. In the second and subsequent executions of step S901, the reference pointer used is the one extracted at step S909.
If, in the initial search, the array element number of a code string terminus node is stored in the search path stack as the array element number of a child node, the above noted step S907 is unnecessary, and at step S908, the array element pointed to by the array element number obtained at step S905 is then read out as a node.
Also, if, in the initial search, the code string terminus node is stored in the search path stack, in step S905, the code string terminus node pointed to by the stack pointer is read out from the search path stack, and step S907 and step S908 are skipped, and in step S909, the reference pointer is extracted from the code string terminus node read out at step S905 and processing then returns to step S901.
Furthermore, it is clear to one skilled in the art, from the above description, how the processing flow in
When the determination at step S904 becomes positive in the above noted processing loop of step S901 to step S909, processing moves to step S910 shown in
In the example shown in
As shown in
Then, at step S911, a determination is made whether the bit values of the encoded search key and the index key coincide within the range of the comparison bit length. This is equivalent to a determination whether the search key and the search result code string coincide within the range of the length of the search result code string. If the result of this determination is that the encoded search key and the index key coincide within the range of the comparison bit length, in other words, within the encoded bit length of the index key, processing proceeds to step S911a, and the code string encoded in that index key is set as the search result code string and processing is terminated. That search result code string is the longest code string that prefix-matches the search key.
Conversely, when the result of the determination at step S911 is that the encoded search key and the index key do not coincide within the range of the comparison bit length, processing proceeds to step S912.
At step S912, a bit comparison is done between the encoded search key and the index key within the range of the comparison bit length and a difference bit string for the length of the comparison bit length is obtained. The difference bit string consists of, for example, a “0” at each bit position where the values in the encoded search key and the index key coincide and a “1” at each bit position where they do not coincide, and it can be obtained, for example, by an exclusive OR operation between the encoded search key and the index key.
Continuing, at step S912a, the highest position in the difference bit string, in other words, the bit position of the first non-coinciding bit, seen from the 0th bit, is set in the difference bit position, and processing proceeds to the processing in step S913 and thereafter shown in
In the example shown in
As shown in the drawing, in step S913, the array element number is extracted from the search path stack, and the stack pointer is decremented by one. Then, at step S914, the array element pointed to by the array element number is read out from the array as a node, and in step S915, the discrimination bit position is extracted from the node.
Next, in step S916, a determination is made whether the extracted discrimination bit position has a higher position relationship than the difference bit position set at step S912a. Then, if the discrimination bit position has a higher position relationship than the difference bit position, processing proceeds to step S916a, and if it does not, processing returns to step S913. In other words, when the discrimination bit position included in the node with the array element number extracted from the search path stack 310 does not have a higher position than the difference bit position, a processing loop is executed to traverse the search path stack and extract array element numbers until a node whose discrimination bit position has a higher position relationship than the difference bit position is read out. This processing loop is equivalent to the difference bit position search shown in the example in
Because, in the example shown in
At step S916a, the previous status is returned by incrementing by 1 the stack pointer for the search path stack that has been decremented at step S913, and at step S917, the array element number of the child node pointed to by the stack pointer for the search path stack is read out.
Then, at step S918, the array element number of the node that is a pair with the array element number of that child node is obtained, and at step S919, the node pointed to by the array element number of the node comprising that pair is read out.
Then, at step S920, the reference pointer is extracted from that node, and at step S921, the code string pointed to by the reference pointer is read out from code string storage area 311 and is set in the search result code string.
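The backward traversal of steps S913 to S921 can be pictured with the following Python sketch. It is a simplification under stated assumptions: instead of array element numbers, each stack entry here directly holds the discrimination bit position of a code string delimiter branch node together with the code string reached through its code string terminus child node, and the name `backtrack` is hypothetical.

```python
from typing import List, Optional, Tuple

# (discrimination bit position of a code string delimiter branch node,
#  code string reached through its code string terminus child node)
StackEntry = Tuple[int, str]


def backtrack(search_path_stack: List[StackEntry],
              difference_bit_position: int) -> Optional[str]:
    """Pop entries until one branches at a higher position than the difference bit."""
    stack = list(search_path_stack)  # work on a copy; the last item is the stack top
    while stack:
        discrimination_bit_position, code_string = stack.pop()
        # A smaller bit position is a "higher" position (closer to the 0th bit).
        if discrimination_bit_position < difference_bit_position:
            return code_string
    return None  # no stored entry lies above the difference bit position


print(backtrack([(4, "A*"), (8, "AB*")], 7))
```

With a difference bit position of 7, the entry with discrimination bit position 8 is skipped and the entry with discrimination bit position 4 is returned, because only the latter branches above the first non-coinciding bit.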
In the example shown in
Also, if, in the initial search, the array element number of the code string terminus node is stored in the search path stack as the array element number of the child node, the processing of the above noted step S918 is unnecessary and at step S919, the array element pointed to by the array element number obtained at step S917 is read out as a node.
Also, if the code string terminus node itself is stored in the search path stack in the initial search, then in step S917 the code string terminus node pointed to by the stack pointer is read out from the search path stack, steps S918 and S919 are skipped, and in step S920 the reference pointer is extracted from the code string terminus node read out at step S917. Furthermore, it is clear to one skilled in the art, from the above description, how the processing flow in
Next, we describe how a search result key can always be obtained by making the coupled-node tree also include a code string comprised only of the terminal code “*”, even for searches using any kind of a search key.
When an initial search is executed using an encoded search key that encodes any arbitrary search key and then a longest prefix match search is performed, after the processing shown in
Conversely, if the bit values for the bit strings within the range of the comparison bit length for the encoded search key and the index key do not coincide, as shown in
Now, from the fact that the coupled-node tree includes a code string consisting only of the terminal code “*”, the root node is a code string delimiter branch node, and its discrimination bit position is 0. Also, as long as the search key consists of significant codes, the above noted difference bit position is a position lower than 0. Thus, because the determination in step S916 of
If the coupled-node tree is made so that it does not include a code string consisting only of the terminal code “*”, for a longest prefix match search in that case it is sufficient to insert in the processing loop of
Hereinabove, details of a preferred embodiment related to a longest prefix match search in this invention were described. Hereinbelow, a concrete example of a longest prefix match search is described, referencing
The coupled-node tree in the concrete example described hereinbelow is the one shown in the example in
In search path stack 310 are stored array element numbers, the same as those shown in
As shown in
Next, as shown by the downward-pointing arrow, array element number 221b and array element number 220c+1 are stored in search path stack 310, followed by array element number 220c+1 and array element number 221d+1.
As shown by the arrows with dotted lines from each of these, the index key (A*) with the reference label 61c corresponds with array element number 220c+1, and when at step S905 shown in
In a bit expression it becomes “1001101011011001101010110” and its encoded bit length 52a is 24 bits.
When an initial search is executed with this encoded search key 51a using the coupled-node tree 200 shown in
Then, in the first-time processing of step S901 to step S903 in the longest prefix match search shown in
Continuing, in step S904, a magnitude comparison is made between the encoded bit length 62a of the index key and the encoded bit length of the encoded search key 52a, and because the encoded bit length 62a is equal to or smaller than the encoded bit length of the encoded search key 52a, the encoded bit length 62a of the index key is set in the comparison bit length 71a.
Then, as shown in
As was noted above, encoded search key 51b is (ACEABC*), which encodes the search key “ACEABC*”. In a bit expression it becomes “1001101111011001101010110” and its encoded bit length 52b is 24 bits.
As shown in
Because the values of the 0th, 2nd, 4th, and 8th bits in encoded search key 51b coincide with the values at those respective positions in encoded search key 51a, the result of the initial search is the same as the result for an initial search using encoded search key 51a. Thus, just as in the example shown in
In bit string comparison 1 (91b), the determination at step S911 is that the bit values in encoded search key 51b and index key 61a do not coincide within the range of comparison bit length 71b, and the bit position of the 7th bit is set in the difference bit position 72b by the processing of step S912 to step S912a.
Next, by means of the processing loop of steps S913 to S916 shown in
The bit string comparison 2 (92b) shows encoded search key 51b and the index key (AB*) related to the code string terminus child node for the code delimiter branch node 211d and shown with the reference label 61b. The bit expression for index key 61b is “100110100”, and its encoded bit length 62b is 8 bits.
The bit string comparison 2 (92b) depicts an arrow showing which of the bit positions in encoded search key 51b and index key 61b is the bit position corresponding to the difference bit position 72b and an arrow showing which of the bit positions in index key 61b has the value “8”, which is the bit position corresponding to discrimination bit position 81b.
In bit string comparison 2 (92b), it is determined that discrimination bit position 81b does not have a higher position relative to difference bit position 72b. Thus, as shown in the drawing, because, in the code string “AB*” (61b) in the search path for the initial search, the part that encodes significant codes located higher than the discrimination bit position 81b has a different value at difference bit position 72b than encoded search key 51b, the code string 61b does not prefix-match encoded search key 51b.
Then, the processing loop of steps S913 to S916 shown in
The bit string comparison 3 (93b) shows encoded search key 51b and the index key (A*) related to the code string terminus child node for the code delimiter branch node 210c and shown with the reference label 61c. The bit expression for index key 61c is “10010”, and its encoded bit length 62c is 4 bits.
The bit string comparison 3 (93b) depicts an arrow showing which of the bit positions in index key 61c has the value “4”, which is the bit position corresponding to discrimination bit position 81c, and an arrow showing that, in the index key 61c, the part that encodes significant codes located higher than discrimination bit position 81c prefix-matches encoded search key 51b.
In bit string comparison 3 (93b) a determination is made that discrimination bit position 81c has a higher position relationship than difference bit position 72b. Then, because the values in the bits in encoded search key 51b and index key 61c coincide at positions higher than difference bit position 72b, the part encoding significant codes located higher than discrimination bit position 81c in the code string “A*” (61c) in the search path for the initial search coincides with the part encoding significant codes located higher than discrimination bit position 81c in encoded search key 51b, and the index key 61c prefix-matches encoded search key 51b. Also, the index key 61c is the longest key among the keys that prefix-match encoded search key 51b and is the longest prefix matching key.
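The outcome of bit string comparisons 2 and 3 can be checked directly against the bit expressions given above. The following Python sketch is an illustration only: it compares each index key with the encoded search key bit by bit, rather than using the discrimination bit position shortcut of the embodiment, and the function name `compare_with_candidate` and the returned labels are hypothetical.

```python
from typing import Optional, Tuple

KEY_51B = "1001101111011001101010110"  # encoded search key (ACEABC*), 24 bits
INDEX_61B = "100110100"                # index key (AB*), 8 bits
INDEX_61C = "10010"                    # index key (A*), 4 bits


def compare_with_candidate(search_bits: str, search_len: int,
                           index_bits: str, index_len: int
                           ) -> Tuple[str, Optional[int]]:
    """Compare one index key with the encoded search key.

    Returns ("too_long", None) when the index key is longer than the search key,
    ("mismatch", position) for the first non-coinciding bit within the
    comparison bit length, or ("match", None) when the index key prefix-matches.
    """
    if index_len > search_len:
        return ("too_long", None)
    comparison_bit_length = index_len
    for pos in range(comparison_bit_length):
        if search_bits[pos] != index_bits[pos]:
            return ("mismatch", pos)
    return ("match", None)


print(compare_with_candidate(KEY_51B, 24, INDEX_61B, 8))
print(compare_with_candidate(KEY_51B, 24, INDEX_61C, 4))
```

The first call reports a mismatch at bit position 7, which lies above discrimination bit position 8, and the second call reports a match, in agreement with the conclusion that index key 61c prefix-matches encoded search key 51b while index key 61b does not.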
As was noted above, encoded search key 51c is (ACE*), which encodes the search key “ACE*”. Its bit expression is “1001101111010”, and its encoded bit length 52c is 12 bits.
As shown in
Because the values of the 0th, 2nd, 4th, and 8th bits in encoded search key 51c coincide with the values at those respective positions in encoded search key 51a and encoded search key 51b, the result of the initial search is the same as the result for an initial search using encoded search key 51a and encoded search key 51b. Thus, just as in the examples shown in
During bit string comparison 1 (91c), the determination at step S904 is that the encoded bit length 62a for index key 61a is longer than the encoded bit length 52c for encoded search key 51c.
Due to the determination at step S904, the processing of step S905 to step S909 is done and then once again the processing of step S901 to step S903 is done. As a result, the index key (AB*) related to the code string terminus child node 210e for the code delimiter branch node 211d with array element number 220c+1 that has been last stored by the stack pointer and its encoded bit length 62b are set, and bit string comparison 2 (92c) shown in
The bit string comparison 2 (92c) shows encoded search key 51c and the index key (AB*) related to the code string terminus child node for the code delimiter branch node 211d and shown with the reference label 61b. The bit expression for index key 61b is “100110100”, and its encoded bit length 62b is 8 bits.
In bit string comparison 2 (92c), first, at step S904, a determination is made that the encoded bit length 62b for the index key 61b is shorter than the encoded bit length 52c for the encoded search key 51c. Then, the encoded bit length 62b for the index key 61b is set in the comparison bit length 71c by the processing in step S910.
Also, the bit string comparison 2 (92c) depicts the encoded search key 51c, an arrow showing which of the bit positions in index key 61b is the bit position corresponding to the difference bit position 72c, and an arrow showing which of the bit positions in index key 61b has the value “8”, which is the bit position corresponding to discrimination bit position 81b.
Then, in bit string comparison 2 (92c), it is further determined that discrimination bit position 81b does not have a higher position relative to difference bit position 72c. Thus, as shown in the drawing, because, in the code string “AB*” in the search path for the initial search, the part that encodes significant codes located higher than the discrimination bit position 81b has a different value at difference bit position 72c than encoded search key 51c, “AB*” does not prefix-match encoded search key 51c.
Then, the processing loop of steps S913 to S916 shown in
As is clear from a comparison between the bit string comparison 3 (93c) shown in
Next, the processing to insert, in accordance with the specification of an insertion key, a leaf node into a coupled-node tree related to one preferred embodiment of this invention is described referencing
First, at step S1201, the pointer to the storage area wherein is stored the code string (insertion key) that is to be inserted in the coupled-node tree is obtained.
Continuing, in step S1202, a determination is made whether the array element number of the root node for the coupled-node tree has been registered. As was noted above, in one embodiment of this invention, the array element number of the root node for the coupled-node tree is registered in the management means for the coupled-node tree, and at this step S1202, a check is made whether the array element number of the root node has been registered. If the result is that it has been registered, processing proceeds to step S1203.
At step S1203, the insertion key stored in the storage area pointed to by the pointer obtained at step S1201 is set in the insertion code string, and next, in step S1203a, an encoded insertion key is generated from the insertion code string. The encode processing in step S1203a can be implemented by the processing flow shown in
Next, proceeding to step S1204, the array wherein the coupled-node tree is stored is searched from the root node using the encoded insertion key, and the processing is performed to insert a leaf node that includes a reference pointer pointing to the area wherein is stored the insertion key, and this insertion processing is terminated. Details of the processing in this step S1204 are described hereinbelow referencing
Conversely, if the determination at step S1202 is that a root node is not registered, the registration and generation of a completely new coupled-node tree begins. In other words, proceeding to step S1205, an empty node pair is obtained from the array, and the array element number of the array element that shall be the primary node of that node pair is acquired.
Next, in step S1206, an array element number computed by adding the value “0” to the array element number acquired at step S1205 is obtained. (Because, in this preferred embodiment of the invention, the computed array element number obtained in this step is identical to the array element number acquired at step S1205, step S1206 can be omitted).
Continuing, in step S1207, the root node is inserted by writing a “1”, indicating a leaf node, in the node type of the array element with the array element number obtained at step S1206 and writing, in its reference pointer, the above noted pointer pointing to the storage area wherein is stored the insertion key acquired at step S1201.
Then, at step S1208, the array element number obtained at step S1206 is registered in the management means for the coupled-node tree as the array element number of the root node and the processing of
Next the processing of the above noted step S1204, in other words, the processing to insert, into an already-existing coupled-node tree, a leaf node holding a reference pointer pointing to the storage area wherein is stored the insertion code string, is described referencing
First, in step S1301 of
Next, proceeding to step S1310a, the array wherein the coupled-node tree is stored is searched from the root node using the encoded insertion key, and a reference pointer is obtained. This processing is realized by the basic search processing shown in
Then, at step S1310b, the code string pointed to by the reference pointer obtained at step S1310a is read out from the code string storage area 311, and, at step S1310c, the read-out code string is encoded and an encoded bit string (index key) is generated. The encode processing in step S1310c can be realized by the processing flow shown in
Next, in step S1311, a determination is made whether the encoded insertion key coincides with the index key generated at step S1310c. If the encoded insertion key and the index key coincide, the insertion fails because a leaf node related to a search target code string corresponding to the insertion key already exists in the coupled-node tree, and processing is terminated.
When the encoded insertion key and the index key do not coincide, processing proceeds to step S1312 in
In this step S1312, an empty node pair is obtained from the array, and the array element number of the array element that shall be the primary node of that node pair is acquired.
Next, proceeding to step S1313, a magnitude comparison is made between the encoded insertion key and the index key generated at step S1310c, and when the encoded insertion key is larger, a Boolean value of “1” (true) is obtained, and when it is smaller, a Boolean value of “0” (false) is obtained.
Then proceeding to step S1314, the Boolean value obtained at step S1313 is added to the array element number of the array element obtained at step S1312, obtaining an array element number. As is noted hereinbelow, the array element number obtained at this step S1314 becomes the array element number of the array element wherein is stored the leaf node holding the reference pointer pointing to the storage area holding the insertion key.
Continuing to step S1315, the value that is a bit inversion of the Boolean value obtained at step S1313 (logical negation value for the Boolean value) is added to the array element number of the primary node obtained at step S1312, obtaining an array element number. This array element number becomes the array element number of the array element wherein is stored the node that is the other pair to the leaf node holding the reference pointer pointing to the storage area holding the insertion key.
In other words, as a result of a magnitude comparison between the encoded insertion key and the index key obtained as an encoding of the code string referenced by the reference pointer stored in the leaf node obtained in the search processing shown in
Next, processing proceeds to the processing of step S1316 and thereafter shown in
As shown in
Then, in step S1318, a determination is made whether the stack pointer for search path stack 310 points to the array element number of the root node. If it points to the array element number of the root node, processing proceeds to step S1324, and if it does not point to the array element number of the root node, processing proceeds to step S1319.
At step S1319, the stack pointer for search path stack 310 is decremented by 1 and the array element number stored therein is extracted. Next, proceeding to step S1320, the array element with the array element number extracted at step S1319 is read out from the array as a node. Next, proceeding to step S1321, the discrimination bit position is extracted from the node read out at step S1320.
Then, proceeding to step S1322, a determination is made whether the discrimination bit position extracted at step S1321 has a higher position relationship than the bit position obtained at step S1317. If the result of the determination at step S1322 is “no”, processing returns to step S1318, and the processing loop of step S1318 to step S1322 is repeated until the result of the determination at step S1318 becomes “yes” or the result of the determination at step S1322 becomes “yes”. When the result of the determination at step S1322 becomes “yes”, at step S1323, the stack pointer for the search path stack is incremented by 1, and processing moves to the processing in step S1324 and thereafter.
This processing loop of step S1316 to step S1322 is the processing to check the relative position relationship between the bit position of the first differing bit in the difference bit string and the discrimination bit position in a branch node stored in the array element with the array element number stored in search path stack 310, and to decide the insertion position, in the coupled-node tree, for the node pair to be inserted, by successively traversing the search path stack in reverse until the discrimination bit position becomes the higher position.
In step S1324, the array element number pointed to by the stack pointer is extracted from search path stack 310. Then, in step S1325, a “1” (leaf node) is written in the node type of the array element pointed to by the array element number obtained at step S1314 and the pointer pointing to the storage area wherein the insertion key is stored is written into the reference pointer. In this way, the reference pointer pointing to the insertion code string is written into the leaf node.
Next, proceeding to step S1326, the array element with the array element number obtained at step S1324 is read out from the array. Continuing, in step S1327, the contents read out at step S1326 are written into the array element with the array element number obtained at step S1315.
Finally, in step S1328, a “0” (branch node) is written into the node type of the array element pointed to by the array element number obtained at step S1324, the bit position obtained at step S1317 is written into the discrimination bit position, the array element number obtained at step S1312 is written into the coupled node indicator, and processing is terminated.
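The node writes of steps S1325 to S1328 can be sketched in Python as follows. The sketch assumes that the array is a list whose elements are dictionaries or `None` for empty elements, and that the field names mirror the terms used above; the dictionary layout and the name `write_inserted_pair` are hypothetical.

```python
from typing import Dict, List, Optional

Node = Optional[Dict[str, int]]
LEAF, BRANCH = 1, 0


def write_inserted_pair(array: List[Node], insert_position: int, primary: int,
                        leaf_slot: int, other_slot: int,
                        difference_bit_position: int,
                        insertion_key_pointer: int) -> None:
    # S1325: write the new leaf node holding the pointer to the insertion key.
    array[leaf_slot] = {"node_type": LEAF,
                        "reference_pointer": insertion_key_pointer}
    # S1326-S1327: move the node at the insertion position into the other slot.
    array[other_slot] = dict(array[insert_position])
    # S1328: turn the insertion position into a branch node linking to the pair.
    array[insert_position] = {"node_type": BRANCH,
                              "discrimination_bit_position": difference_bit_position,
                              "coupled_node_indicator": primary}


array: List[Node] = [{"node_type": LEAF, "reference_pointer": 100}, None, None, None]
write_inserted_pair(array, insert_position=0, primary=2, leaf_slot=3,
                    other_slot=2, difference_bit_position=7,
                    insertion_key_pointer=200)
print(array)
```

Only the element at the insertion position and the two elements of the newly obtained node pair are written, which is why the insertion touches so few existing nodes.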
In this way, by the processing in step S1324 and thereafter, data is set in each node and insertion processing is completed.
Next, referencing
First, at step S1401, the code string (deletion key) to be deleted from the coupled-node tree is set in the deletion code string. Next, at step S1402, the deletion code string is encoded and an encoded deletion key is generated. The encode processing in step S1402 can be implemented by the processing flow shown in
Next, in step S1403, the array element number of the root node is set in the array element number of the search start node, and at step S1404, the encoded deletion key is set in the encoded search key, and processing proceeds to step S1405. At step S1405, the array is searched from the search start node using the encoded search key, and a reference pointer is obtained. This processing is implemented using the basic search processing shown in
Next, proceeding to step S1406, the code string pointed to by the reference pointer obtained in step S1405 is read out from code string storage area 311. Then, at step S1407, an encoded code string (index key) is generated from the code string read out at step S1406. The encode processing in step S1407 can be implemented by the processing flow shown in
Then, at step S1408, the encoded deletion key set at step S1404 is compared with the index key generated at step S1407. If they do not coincide, the deletion fails because a leaf node related to a search target code string that corresponds to the deletion key does not exist in the coupled-node tree, and processing is terminated. If they do coincide, processing proceeds to the processing of step S1412 in
When the result of that determination is “no”, there is only one array element number stored and that array element number is the one for the array element wherein the root node is stored. In this case, processing proceeds to step S1418, and the node pair related to the array element number of the root node set at step S1403 is deleted. Then, proceeding to step S1419, the array element number of the root node registered in the management means for the coupled-node tree is deleted and processing is terminated.
Conversely, when the determination in step S1412 is that 2 or more array element numbers are stored in search path stack 310, processing proceeds to step S1413, and the bit value obtained at step S507 in
Next, in step S1414, the contents of the array element with the array element number obtained at step S1413 are read out from the array, and in step S1415, the stack pointer for the search path stack is decremented by 1 and the array element number is extracted.
Next, proceeding to step S1416, the contents of the array element read out at step S1414 are written over the contents in the array element with the array element number obtained at step S1415. This processing is the processing to replace the branch node that is the link source for the leaf node holding the reference pointer pointing to the area wherein is stored the deletion key with the node that is the pair to the leaf node.
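The replacement performed in steps S1414 to S1416, together with the release of the vacated node pair, can be sketched in Python as follows. The list-of-dictionaries layout, the explicit `sibling_slot` and `pair_primary` parameters, and the name `delete_leaf` are assumptions made for illustration; the embodiment derives these array element numbers from the search path stack and the coupled node indicator.

```python
from typing import Dict, List, Optional

Node = Optional[Dict[str, int]]


def delete_leaf(array: List[Node], parent_slot: int,
                pair_primary: int, sibling_slot: int) -> None:
    # S1414-S1416: overwrite the link-source branch node with the node
    # that is the pair to the leaf node being deleted.
    array[parent_slot] = dict(array[sibling_slot])
    # Release both elements of the node pair that held the deleted leaf node.
    array[pair_primary] = None
    array[pair_primary + 1] = None


tree: List[Node] = [
    {"node_type": 0, "discrimination_bit_position": 7, "coupled_node_indicator": 2},
    None,
    {"node_type": 1, "reference_pointer": 100},
    {"node_type": 1, "reference_pointer": 200},  # leaf node to delete
]
delete_leaf(tree, parent_slot=0, pair_primary=2, sibling_slot=2)
print(tree)
```

After the call, the remaining leaf node occupies the position of the former branch node, and no other element of the array has been modified.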
Finally, in step S1417, the node pair pointed to by the coupled node indicator obtained at step S508 in
As was described above, this invention retains the advantages of a coupled-node tree: the range of existing nodes affected by the insertion processing and deletion processing noted above is minimal, and the maintenance cost for inserting and deleting is low. These advantages are retained when the above noted encoding method is used, and a high-speed longest prefix match search is enabled.
Hereinabove, the processing flows for realizing a code string search method related to a preferred embodiment of this invention were described. It is clear that these processing flows can be placed in programs executed in a computer like the processing apparatus 301 exemplified in
As shown in
The initial search part 510 prepares the search result code string obtaining means 511 and the search path storage means 512. The longest prefix match search part 520 prepares the prefix match determination means 521, the first longest prefix matching key obtaining means 522, and the second longest prefix matching key obtaining means 523.
The functions of the initial search part 510 are implemented by step S605 in
Also, although, in the preferred embodiment described hereinabove, as shown in
It is also allowed that the search path stack 310 wherein is stored the array element numbers of code string delimiter branch nodes and the array element numbers of child nodes may be divided into an area wherein is stored the array element numbers of code string delimiter branch nodes and an area wherein is stored the array element numbers of child nodes, and in the storage processing a stack pointer for each may be operated on and storing done, and in the extraction processing the stack pointers may be synchronized and the extraction done. For example, in step S813 and S815 in
Also, although, in the preferred embodiment noted above, the leaf nodes in the coupled-node tree are made to include a search target code string or a reference pointer pointing to a storage area wherein is stored the search target code string, and the search result code string is encoded in the bit string comparison with the encoded search key, it is also allowed to encode the search target code string from the very beginning and to directly obtain the index key that is the encoded code string as the search result. Which of those methods is used should be decided by considering the storage capacity needed for the search target code string and the processing cost needed for the encoding during the search.
Number | Date | Country | Kind |
---|---|---|---|
2010-293635 | Dec 2010 | JP | national |
This application is a continuation of PCT/JP2011/079375 filed on Dec. 19, 2011. PCT/JP2011/079375 is based on and claims the benefit of priority of the prior Japanese Patent Application No. 2010-293635, filed on Dec. 28, 2010, the entire contents of which are incorporated herein by reference. The contents of PCT/JP2011/079375 are incorporated herein by reference in their entirety.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/JP2011/079375 | Dec 2011 | US |
Child | 13926545 | | US |