1. Field of the Invention
This invention relates to technology for classifying bit-string keys and to technology for distributing the classified keys to an output target.
2. Description of Related Art
In recent years, with advancements in information-based societies, large-scale databases have come to be used in various places. To search such large-scale databases, a desired record is usually retrieved by using, as indexes, items within the records that are associated with the addresses at which the records are stored. Character strings in full-text searches can also be treated as index keys.
Then, because the index keys can be expressed as bit strings, the searching of a database is reduced to searching for bit strings in the database.
Furthermore, the processing of a database, as recited in the patent document 1 and patent document 2 cited below, includes merge sorting of the records in the database. This merge sort is also reduced to a merge sort of bit strings.
A basic merge sort method consists of dividing the data into pairs, ordering each pair, and then combining the ordered pairs. In other words, the process is divided into an initial stage of repeatedly dividing the data to be sorted and sorting them, thus obtaining several groups of sorted data, and a later stage of repeatedly merging the sorted data, thus completely sorting the data to be sorted.
Patent document 2 discloses the processing shown in
The latter stage processing of a merge sort assumes the existence of the above described block 1 to block N. Thus the data must be classified into N blocks in order to execute the merge sort shown in
Then, as shown in
Meanwhile, as shown in
On the other hand, the patent document 3 below discloses an example of storing a coupled-node tree in an array as a data configuration used in a search for bit-string data. Storing the coupled-node tree in an array allows the node positions to be expressed as array element numbers and enables the amount of information needed to express the position of primary nodes to be reduced.
Also, the patent document 4 below discloses methods for searching for the smallest value or the largest value in any arbitrary subtree in a coupled-node tree and methods for extracting index keys in ascending or descending sequence from any arbitrary subtree in a coupled-node tree. Hereinbelow, a coupled-node tree is described, referencing
The array element having the array element number 20 has stored therein a node [0] 112, which is the primary node of the node pair 111. Then the node [1] 113 forming a pair with the primary node is stored into the next, adjacent, array element (array element number 20+1). Node [0] 112, like node 101, is a branch node. The value 0 is stored in the node type 114 of the node [0] 112, the value 3 is stored in the discrimination bit position 115, and the value 30 is stored in the coupled node indicator 116. Also, the node [1] 113 consists of the node type 117 and the reference pointer 118a. The value 1 is stored in the node type 117, thereby indicating that the node [1] 113 is a leaf node. A pointer referencing the index key storage area is stored in reference pointer 118a. Hereinbelow, the data stored in the reference pointer is also called the reference pointer in order to abbreviate the notation.
The contents of the node pair 121 consisting of nodes 122 and 123 stored in the array elements with the array element numbers 30 and 31 are omitted. Also primary nodes are indicated as the node [0], and nodes that are paired therewith are indicated as the node [1]. Also the node stored in an array element with some array element number is called the node of that array element number, and the array element number of the array element in which a node is stored is also called the array element number of the node. Furthermore, in order to show the relationship between a given leaf node and the index key stored in the storage area shown by the reference pointer in that leaf node, we may speak of the index key associated with the leaf node and of the leaf node associated with the index key.
The 0 or 1 prefixed to the array elements of node [0] 112, node [1] 113, node 122, and node 123, respectively, shows to which node in a node pair a link is to be made if a search is performed with a search key. The bit value, 0 or 1, in the search key at the discrimination bit position of the previous stage branch node is added to the coupled node indicator and linking is done to the node with that array element number. Thus, by adding the bit value at the discrimination bit position in the search key to the coupled node indicator of the previous stage branch node, the array element number of the array element holding the node that is the link target can be obtained.
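The link computation described above can be sketched as follows. This is a minimal illustration only, not the implementation of the cited documents: the function names bit_at and link_target are hypothetical, keys are modelled as strings of the characters 0 and 1, and bit positions are assumed to be counted from 0 at the left.

```python
def bit_at(key: str, position: int) -> int:
    """Return the bit value (0 or 1) of `key` at `position`, counted from the left from 0."""
    return int(key[position])


def link_target(coupled_node_indicator: int, search_key: str,
                discrimination_bit_position: int) -> int:
    """Array element number of the link target node.

    The bit value of the search key at the discrimination bit position of the
    branch node is added to that branch node's coupled node indicator.
    """
    return coupled_node_indicator + bit_at(search_key, discrimination_bit_position)


# Node [0] 112 in the text: discrimination bit position 3, coupled node indicator 30.
# The two search keys below are made-up values chosen only to show both outcomes.
to_node_0 = link_target(30, "000000", 3)  # bit 3 is 0, so the link goes to element 30
to_node_1 = link_target(30, "000100", 3)  # bit 3 is 1, so the link goes to element 31
```

Because the two nodes of a pair occupy adjacent array elements, a branch node needs to hold only one array element number to reach either of them.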
The 0 or 1 code that is appended before each node is the same as the codes that are appended before the array element numbers described in
In the example described, the node type 460a of the root node 410a is 0, thereby indicating that this is a branch node, and the discrimination bit position 430a indicates 0. The coupled node indicator is 420a, which is the array element number of the array element in which the primary node 410b of the node pair 401b is stored.
The node pair 401b is formed by the node 410b and the node 411b, the node types 460b and 461b thereof both being 0, indicating branch nodes. The discrimination bit position 430b of the node 410b has 1 stored therein, and in the coupled node indicator of the link target is stored the array element number 420b of the array element in which is stored the primary node 410c of the node pair 401c.
Because 1 is stored in the node type 460c of the node 410c, this node is a leaf node, and thus includes the reference pointer 450c. In the reference pointer 450c is stored a pointer that references the storage area wherein is stored the index key 270c. The data stored in the reference pointer 450c is also called the reference pointer and is shown by the reference code 480c. The same applies to the other leaf nodes: the same word, reference pointer, is used to refer both to the reference pointer and to the data stored in the reference pointer. The value “000111” is stored as an index key in the area pointed to by the reference pointer 480c to the index key storage area 311 shown in
The node type 461c of the node 411c is 0, the discrimination bit position 431c of the node 411c is 2, and in the coupled node indicator is stored the array element number 421c of an array element in which is stored the primary node 410d of the node pair 401d.
The node type 460d of the node 410d is 0, the discrimination bit position 430d of the node 410d is 5, and in the coupled node indicator is stored the array element number 420d of an array element in which is stored the primary node 410e of the node pair 401e. The node type 461d of the node 411d that is paired with the node 410d is 1, and “011010” is stored in the index key 271d, which is stored in the storage area shown by the reference pointer 481d.
The node types 460e and 461e of the nodes 410e and 411e of the node pair 401e are both 1, indicating that both are leaf nodes. In the reference pointers 450e and 451e of the nodes 410e and 411e are stored the reference pointers 480e and 481e which point to the storage areas wherein are stored the index key 270e with the value “010010” and the index key 271e with the value “010011”, respectively.
The discrimination bit position 431b of the node 411b, which is the other node of the node pair 401b, has a 2 stored therein, and the array element number 421b of the array element in which is stored the primary node 410f of the node pair 401f is stored in the coupled node indicator of the link target.
The node types 460f and 461f of the nodes 410f and 411f of the node pair 401f are both 0, indicating that both are branch nodes. In the discrimination bit positions 430f and 431f of each are stored a 5 and a 3, respectively. The array element number 420f of the array element in which is stored the primary node 410g of the node pair 401g is stored in the coupled node indicator of the node 410f, and the array element number 421f of an array element in which is stored the node [0] 410h, which is the primary node of the node pair 401h, is stored in the coupled node indicator of the node 411f.
The node types 460g and 461g of the nodes 410g and 411g of the node pair 401g are both 1, indicating that both are leaf nodes. In the reference pointers 450g and 451g of the nodes 410g and 411g are stored the reference pointers 480g and 481g which point to the storage areas wherein are stored the index key 270g with the value “100010” and the index key 271g with the value “100011”, respectively.
In the same manner, the node types 460h and 461h of the node [0] 410h of the node pair 401h, and the node [1] 411h, which is paired therewith, are both 1, indicating that both are leaf nodes. In the reference pointers 450h and 451h of the nodes 410h and 411h are stored the reference pointers 480h and 481h which point to the storage areas wherein are stored the index key 270h with the value “101011” and the index key 271h with the value “101100”, respectively.
The processing flow in searching for the index key “100010” from the above-noted tree 400 is briefly described below. The discrimination bit positions are numbered 0, 1, 2, . . . and so on from the left. First, processing is started from the root node 410a using the bit string “100010” as the search key. Because the discrimination bit position 430a of the root node 410a is 0, examining the bit value of the discrimination bit position 0 reveals 1. This being the case, 1 is added to the array element number 420a stored in the coupled node indicator and linking is done to the node 411b stored in the resulting array element number. Because 2 is stored in the discrimination bit position 431b of the node 411b, examination of the bit value of the discrimination bit position 2 reveals 0, resulting in linking to the node 410f stored in the array element having the array element number 421b stored in the coupled node indicator.
Because 5 is stored in the discrimination bit position 430f of the node 410f, and because examination of the bit value of the discrimination bit position 5 of the search key “100010” reveals 0, linking is done to the node 410g stored in the array element having the array element number 420f stored in the coupled node indicator.
Because the node type 460g of the node 410g is 1, indicating a leaf node, the storage area shown by the reference pointer 480g is referenced and the index key 270g stored therein is read out and a comparison is performed with the search key. Both the index key 270g and the search key are “100010”, thus coinciding. In this way, searching is performed using the coupled node tree.
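The search walked through above can be reproduced with the following minimal sketch. It is an illustration under stated assumptions, not the implementation of the cited documents: the array element numbers 0 to 15 are hypothetical stand-ins for the reference codes 420a, 420b and so on, the index key is stored directly in the leaf in place of the reference pointer, and the names Node, branch, leaf, TREE_400 and search are introduced only for this example.

```python
from typing import List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    node_type: int                     # 0 = branch node, 1 = leaf node
    discrimination_bit: int = -1       # branch node only
    coupled_node_indicator: int = -1   # branch node only: element number of node [0] of the linked pair
    index_key: Optional[str] = None    # leaf node only: stands in for the reference pointer


def branch(bit: int, indicator: int) -> Node:
    return Node(0, bit, indicator)


def leaf(key: str) -> Node:
    return Node(1, index_key=key)


# Tree 400 laid out in an array; the element numbers are assumptions of this sketch.
TREE_400: List[Optional[Node]] = [
    branch(0, 2),    # 0: root node 410a
    None,            # 1: unused
    branch(1, 4),    # 2: node 410b
    branch(2, 10),   # 3: node 411b
    leaf("000111"),  # 4: node 410c
    branch(2, 6),    # 5: node 411c
    branch(5, 8),    # 6: node 410d
    leaf("011010"),  # 7: node 411d
    leaf("010010"),  # 8: node 410e
    leaf("010011"),  # 9: node 411e
    branch(5, 12),   # 10: node 410f
    branch(3, 14),   # 11: node 411f
    leaf("100010"),  # 12: node 410g
    leaf("100011"),  # 13: node 411g
    leaf("101011"),  # 14: node 410h
    leaf("101100"),  # 15: node 411h
]


def search(tree: List[Optional[Node]], search_key: str, root: int = 0) -> Tuple[str, List[int]]:
    """Follow links from the root to a leaf; return its index key and the path taken."""
    path = [root]
    node = tree[root]
    while node.node_type == 0:
        bit = int(search_key[node.discrimination_bit])  # bit positions count from the left
        next_element = node.coupled_node_indicator + bit
        path.append(next_element)
        node = tree[next_element]
    return node.index_key, path


found_key, path = search(TREE_400, "100010")
search_succeeded = found_key == "100010"  # final comparison of index key and search key
```

With this layout the path visits the elements standing for nodes 410a, 411b, 410f and 410g, matching the walk-through. The final comparison is still needed because a search key that is absent from the tree also ends at some leaf.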
A search using the coupled-node tree 400 described above can be seen as the processing to classify 8 index keys into a group corresponding to 8 leaf nodes respectively. In other words, patent document 3 can also be thought to disclose an index key classification method wherein there is 1 index key in a block. However, the classification method for index keys when there are a plurality of index keys in a block, which presumes the latter stage processing of the above noted merge sort, is not disclosed.
Thus, the problem to be solved by this invention is, when data consisting of a bit string (hereinbelow called a bit-string key, simply a key, or an index key) is to be classified into a plurality of blocks, to provide a classification method such that the ranges of the key values do not overlap and a distribution method for outputting the classified bit-string keys to an output target, applying the art of a coupled-node tree.
In accordance with this invention, a classification method that classifies bit-string keys into N blocks successively selects, as a classification key, keys from a key storage means that holds the keys to be classified, and generates, by means of the classification key, a classification tree that is an application of the art of a coupled-node tree, and associates its leaf nodes with the keys classified into each of the N blocks. The number of levels in the classification tree is restricted as a function of the number N of the blocks.
In accordance with one embodiment of this invention, the leaf nodes in the classification tree include key access information used to obtain position information for classification keys stored in a key storage means. Then a key position search table for obtaining the key position information is generated using the key access information. The number of levels n in the classification tree is limited to the smallest value for which 2 to the power of (n−1) is equal to or larger than the number of blocks N. For example if N=8, then n=4.
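The relationship between the number of blocks N and the number of levels n can be checked with a short calculation. The following is a minimal sketch; the function name max_levels is hypothetical and is not taken from the specification.

```python
def max_levels(num_blocks: int) -> int:
    """Smallest n such that 2 ** (n - 1) is equal to or larger than num_blocks."""
    if num_blocks < 1:
        raise ValueError("the number of blocks must be at least 1")
    n = 1
    while 2 ** (n - 1) < num_blocks:
        n += 1
    return n


levels_for_8 = max_levels(8)  # 2 ** 3 == 8, so n = 4 as in the example in the text
levels_for_5 = max_levels(5)  # 2 ** 2 < 5 <= 2 ** 3, so n = 4 as well
```

A tree of n levels has at most 2 to the power of (n−1) leaf nodes, which is why this bound leaves room for one leaf node per block.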
In accordance with one embodiment of this invention, a leaf node is successively extracted from the classification tree generated from all the keys to be classified, and using the key access information read out from the leaf node, position information for classified keys in the block associated with the leaf node is obtained from the key position search tables, and the classified keys are read out from the key storage means and output to the output target.
Due to a special characteristic of a coupled-node tree, the higher level bit values in a key up to the discrimination bit position in the branch node immediately above a given leaf node coincide with those for any key classified in the same block, and any key with a bit value differing up to the discrimination bit position in the branch node immediately above that leaf node is classified in a different block. Thus, in accordance with this invention, keys can be effectively classified in such a way that the range of the values of the keys to be classified does not overlap, by generating a classification tree that has the structure of a coupled-node tree.
In the example shown in
The discrimination bit position 142a used in the classification at classification by discrimination bit position 141a, which is the first level in classification by discrimination bit position (for 2 levels of classification) 140, is 0. Those of the index keys whose value in bit value 143a at bit position 0 is 0 are further classified by classification by discrimination bit position 141b at the second level, as shown by the dotted-line arrow group 150a. Conversely, those of the index keys whose value in bit value 143a at bit position 0 is 1 are further classified by classification by discrimination bit position 141c at the second level, as shown by the dotted-line arrow group 151a.
The discrimination bit position 142b used in the classification at classification by discrimination bit position 141b at the second level is 2. Of the index keys that are to be classified by classification by discrimination bit position 141b at the second level, the index key “0001” wherein the value in bit value 143b at bit position 2 is 0 is stored in classified key array 130a as classified key 131a, as shown by the dotted-line arrow 150b. Also of the index keys that are to be classified by classification by discrimination bit position 141b at the second level, the index key “0011” wherein the value in bit value 143b at bit position 2 is 1 is stored in classified key array 130b as classified key 131b, as shown by the dotted-line arrow 151b.
Conversely, the discrimination bit position 142c used in the classification at classification by discrimination bit position 141c at the second level is 1. Of the index keys that are to be classified by classification by discrimination bit position 141c at the second level, the index key “1010” wherein the value in bit value 143c at bit position 1 is 0 is stored in classified key array 130c as classified key 131c, as shown by the dotted-line arrow 151c. Also, of the index keys that are to be classified by classification by discrimination bit position 141c at the second level, the index keys “1111” and “1110” wherein the value in bit value 143c at bit position 1 is 1 are stored in classified key array 130d as classified keys 131d and 131e respectively, as shown by the dotted-line arrow group 151d.
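The two-level classification of this example can be sketched as follows. This is an illustration only: the discrimination bit positions 0, 2 and 1 are taken from the description above and written in as constants, whereas in the invention they result from generating the classification tree. The function name classify_two_levels and the order of the input keys are assumptions.

```python
from typing import List

FIRST_LEVEL_BIT = 0                # discrimination bit position 142a
SECOND_LEVEL_BIT = {0: 2, 1: 1}    # 142b when bit 0 is 0, 142c when bit 0 is 1


def classify_two_levels(keys: List[str]) -> List[List[str]]:
    """Distribute keys over 4 blocks standing for classified key arrays 130a to 130d."""
    blocks: List[List[str]] = [[], [], [], []]
    for key in keys:
        first = int(key[FIRST_LEVEL_BIT])             # bit positions count from the left
        second = int(key[SECOND_LEVEL_BIT[first]])
        blocks[2 * first + second].append(key)
    return blocks


blocks = classify_two_levels(["0001", "0011", "1010", "1111", "1110"])

# Key ranges of consecutive blocks do not overlap; bit strings of equal length
# compare as strings in the same order as their binary values.
ranges_do_not_overlap = all(
    max(lower) < min(upper) for lower, upper in zip(blocks, blocks[1:])
)
```

Within a block the keys keep their input order, so the block holding “1111” and “1110” is not itself sorted; only the ranges of different blocks are separated.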
Also, the apparatus that is the target of distribution is not limited to being connected via a network, and, for example, can be made to be the central processing unit in a multiprocessing system.
Classification in accordance with this preferred embodiment is implemented by a data processing apparatus 301 having at least a central processing unit 302 and a cache memory 303, and using a data storage apparatus 308. Also the distribution of sorted keys by data processing apparatus 301 to the bit-string key sort apparatuses 340a, 340b, . . . 340m can also be implemented using data storage apparatus 308. The data storage apparatus 308, which has an array 309 wherein is stored the classification tree, a search path stack 310, into which are stored array element numbers of array elements holding nodes which are traversed during the search on the classification tree, and an index key management area 320 which holds the data for searching the position information of keys classified into each of the blocks, can be implemented by a main memory 305 or a storage device 306.
The bit-string key sort apparatuses 340a, 340b, . . . 340m sort in parallel the classified keys that are distributed to them. The exemplary configuration of the bit-string key sort apparatus 340a is shown as an example of their configurations. As shown in
In the example described in
Also, although it is not particularly illustrated, a temporary memory area in main memory 305 can of course be used to enable various values obtained during processing or initial values and so on to be used in subsequent processing, depending on the processing to be done. The same also applies to the configuration of the data storage devices and the connection method for the various devices in the bit-string key sort apparatuses 340a, 340b, . . . 340m.
The classification tree 200 in the example shown in
In the index key management area 320 shown in
Reference code 210a indicates the root node of the classification tree 200 shown in the example in
The example in the drawing shows that the node type 260a for root node 210a is a 0, indicating a branch node and that the discrimination bit position 230a is a 0. The coupled node indicator is 220a, and that is the array element number of the array element holding the primary node 210b of the node pair 201b.
Node pair 201b is configured from node 210b and 211b, and both of their node types 260b and 261b are 0, indicating branch nodes. In the discrimination bit position 230b of node 210b is stored a 1, and in the coupled node indicator for the link target is stored the array element number 220b of the array element holding the primary node 210c of the node pair 201c.
Because a 1 is stored in the node type 260c of node 210c, this node is a leaf node. The leaf node in the classification tree includes the classification reference pointer 250c. In the classification reference pointer 250c is stored a pointer that points to a key classification table entry included in the index key management area 320. The data stored in the classification reference pointer 250c is also called a classification reference pointer and it is indicated with the reference code 280c. In the same way for the other leaf nodes, both the classification reference pointer and the data in the classification reference pointer are expressed by the same words classification reference pointer.
Because a 1 is also stored in the node type 261c of node 211c, which is the other node of the node pair 201c, this node is also a leaf node. In the classification reference pointer 251c for node 211c is stored the classification reference pointer 280e.
In the discrimination bit position 231b for node 211b, which is the other node of the node pair 201b, is stored a 2, and in the coupled node indicator for the link target is stored the array element number 221b of the array element wherein is disposed node [0] 210f which is the primary node of the node pair 201f.
The node types 260f and 261f of node [0] 210f, which is the primary node of the node pair 201f, and of node [1] 211f, which is its pair, are both 1, indicating that both are leaf nodes. In the classification reference pointers 250f and 251f for nodes 210f and 211f are stored classification reference pointers 280g and 280h respectively.
As shown in
Also, the index key management area 320 includes the key classification table 321 and the key link table 322. The storage status of each of the data in index key management area 320 shown in
The key classification table 321 has 4 entries associated with the number of blocks into which the index keys are classified. The starting addresses of the entries are the classification reference pointers 280c, 280e, 280g, and 280h stored in the 4 leaf nodes 210c, 211c, 210f, and 211f in the classification tree 200, and each entry is designated by its address.
Also, in the example shown in the drawing, the key classification table 321 includes in each entry a smallest value key 312a, a largest value key 312b, a key output target 312e, a head link 312c, and a tail link 312d. In the example shown in the drawing, both the head link 312c and the tail link 312d in the entry pointed to by the classification reference pointer 280c hold the same key management pointer 370c pointing to an entry in the key link table 322. In the head link 312c and the tail link 312d in the entry pointed to by classification reference pointer 280e are stored the key management pointers 370e and 371d, respectively. In the head link 312c and the tail link 312d in the entry pointed to by classification reference pointer 280g are stored the key management pointers 370g and 371g, respectively. In the head link 312c and the tail link 312d in the entry pointed to by classification reference pointer 280h are stored the key management pointers 370h and 371h, respectively.
Notation of the values stored in the smallest value key 312a, the largest value key 312b, and the key output target 312e is omitted. Although the values in the smallest value key 312a and the largest value key 312b are written and updated as the classification tree 200 is being generated, the key output target 312e is written after the generation of the classification tree is completed. Also the key output target can be set to be associated with the blocks classifying the keys to be classified in another table than the key classification table.
The key link table 322 is a table wherein is written the link relationship between index keys that enables the index keys associated with the same leaf node to be traversed successively, and it has entries corresponding to the number of index keys to be classified. For example, if the index key storage area is that shown in
In the example shown in the drawing, the key link table 322 includes a key reference pointer 313a and a link 313b in each entry. Key reference pointer 313a points to the index key storage area associated with the key management pointer pointed to by that entry. Thus, if the index storage area is that shown in
The key management pointer associated with another index key classified into the same block as the index key associated with current key management pointer is stored in link 313b in the key link table 322 entry pointed to by the current key management pointer. In the example shown in
In the same way, the index key 270g associated with the key management pointer 370g and the index key 271g associated with key management pointer 371g are classified into the block associated with the classification reference pointer 280g. The key management pointer 371g is stored in link 313b of the key link table 322 entry pointed to by the key management pointer 370g associated with the index key 270g, which is the head link 312c of that block. The key management pointer 371g, which is associated with index key 271g, is stored in the tail link in key classification table 321, and because index key 271g is the tail index key in the block, nothing is stored in link 313b pointed to by key management pointer 371g.
The index key 270e associated with the key management pointer 370e, and the index key 271e associated with key management pointer 371e, and the index key 271d associated with the key management pointer 371d are classified into the block associated with the classification reference pointer 280e. The key management pointer 371e is stored in link 313b of the key link table 322 entry pointed to by the key management pointer 370e associated with the index key 270e, which is the head link 312c of that block. The key management pointer 371d is stored in link 313b of the key link table 322 entry pointed to by the key management pointer 371e. Then key management pointer 371d, which is associated with index key 271d, is stored in the tail link in key classification table 321, and because index key 271d is the tail index key in the block, nothing is stored in link 313b of the key link table 322 entry pointed to by key management pointer 371d.
Also, the index key 270c associated with key management pointer 370c is classified into the block associated with the classification reference pointer 280c. Because the key management pointer 370c associated with the index key 270c is stored in both the head link 312c and tail link 312d in the key classification table 321 entry pointed to by the classification reference pointer 280c, the index keys classified into the block associated with classification reference pointer 280c consist of only index key 270c. Thus, nothing is stored in link 313b of the key link table 322 entry pointed to by the key management pointer 370c.
Also, in the description hereinbelow, associating index keys with a leaf node is sometimes said to be linking index keys into the key link table for the leaf node. Also the leaf node with which index keys have been associated may at times be said to be the leaf node linking those index keys.
The index keys in a block associated with a classification reference pointer in the above noted key classification table 321 can all be extracted in the following way. First, the head link 312c pointed to by the classification reference pointer is made to be the key management pointer for key link table 322 and the key reference pointer 313a is read out, and the index key pointed to by the key reference pointer 313a is extracted from the index key storage area. Next the operation of making the link 313b to be the key management pointer for key link table 322, and reading out the key reference pointer 313a, and extracting the index key pointed to by the key reference pointer 313a from the index key storage area is repeated until link 313b coincides with the tail link 312d in the key classification table 321.
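The extraction procedure above can be sketched as follows. This is a minimal illustration, not the embodiment itself: pointers are modelled as strings named after the reference codes in the description (for example "370e" for key management pointer 370e and "270e" for the storage position of index key 270e), the tables are modelled as dictionaries, and the names ClassificationEntry, LinkEntry and extract_block are hypothetical. Only the blocks of classification reference pointers 280c and 280e are filled in.

```python
from typing import Dict, List, NamedTuple, Optional


class ClassificationEntry(NamedTuple):
    head_link: str   # key management pointer of the first key in the block
    tail_link: str   # key management pointer of the last key in the block


class LinkEntry(NamedTuple):
    key_reference_pointer: str   # position of the key in the index key storage area
    link: Optional[str]          # next key management pointer, None for the tail key


INDEX_KEY_STORAGE: Dict[str, str] = {
    "270c": "000111", "270e": "010010", "271e": "010011", "271d": "011010",
}
KEY_LINK_TABLE: Dict[str, LinkEntry] = {
    "370c": LinkEntry("270c", None),
    "370e": LinkEntry("270e", "371e"),
    "371e": LinkEntry("271e", "371d"),
    "371d": LinkEntry("271d", None),
}
KEY_CLASSIFICATION_TABLE: Dict[str, ClassificationEntry] = {
    "280c": ClassificationEntry("370c", "370c"),
    "280e": ClassificationEntry("370e", "371d"),
}


def extract_block(classification_reference_pointer: str) -> List[str]:
    """Return every index key of one block by following the links from head to tail."""
    entry = KEY_CLASSIFICATION_TABLE[classification_reference_pointer]
    keys: List[str] = []
    pointer = entry.head_link
    while True:
        link_entry = KEY_LINK_TABLE[pointer]
        keys.append(INDEX_KEY_STORAGE[link_entry.key_reference_pointer])
        if pointer == entry.tail_link:   # the tail key has just been extracted
            return keys
        pointer = link_entry.link


block_280e = extract_block("280e")
block_280c = extract_block("280c")
```

Stopping when the pointer just processed equals the tail link also covers a block with a single key, where the head link and the tail link hold the same key management pointer.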
Next, an overview of the processing to generate classification tree 200 is described, referencing
As shown by the dotted-line arrow 290d in
Also the count value for the search path counter is depicted. The count value in the search path counter is used, when a classification key is to be inserted in the classification tree, to enable the determination whether the restriction on the number of levels in the classification tree is satisfied and whether the node pair that includes the leaf node that links to the classification key can be inserted into the classification tree. The count value in the search path counter is counted up, for example, from an initial value 0, and, as shown in
In the example in
Because the index key 271d is larger than the classification key 271e, the leaf node linking to the classification key 271d becomes node 211d, which is the node [1] in node pair 201d. As shown in
The node 210d, which is the node [0] in node pair 201d, has become a leaf node linking to classification key 271e. The classification reference pointer 280e is stored in its classification reference pointer 250d. Then, as shown by the dotted-line arrow 290e, index key 271e “010011” stored in index key storage area 311 is associated with leaf node 210d by means of the classification reference pointer 280e via the index key management area 320. In other words, classification key 271e “010011” is classified into leaf node 210d.
In addition to the classification tree 200b,
In addition to the classification tree 200c,
As shown in
In addition to the classification tree 200d,
Because the classification key 270c is smaller than the index key 270e, the leaf node linking to classification key 270c becomes node 210c which is the node [0] in the node pair 201c. The classification reference pointer 280c is stored in its classification reference pointer 250c. Then, as shown by the dotted-line arrow 290c, index key 270c “000111” stored in index key storage area 311 is associated with leaf node 210c by means of the classification reference pointer 280c via the index key management area 320. In other words, classification key 270c “000111” is classified into leaf node 210c.
Also, as shown in
Next, referencing
As shown in
Next in step S701a, the top key storage position in the index key storage area is set in the key reference pointer for the index key storage area. Here the key reference pointer for the index key storage area is one of the temporary memory areas not especially illustrated but used to enable various values obtained during processing to be used in subsequent processing, noted above.
Next in step S702, a determination is made whether all the keys to be classified have been processed, and if they are finished, processing proceeds via step S706 to step S711 and thereafter shown in
At step S703, the key pointed to by the key reference pointer is read out from the index key storage area and is set in the classification key. Then in step S704, the classification tree is generated using the classification key. Details of the classification tree generation processing in step S704 are explained later referencing
Next in step S705, the storage position of the next key stored in the index key storage area is set in the key reference pointer for the index key storage area and a return is made to step S702. The processing loop of step S702 to step S705 is repeated until the determination at step S702 is that all of the keys have been processed, and when the determination at step S702 is that all of the keys have been processed, classification processing is terminated and in step S706, the output target for the keys is set in the key output target in the key classification table, and processing proceeds to the distribution processing in step S711 and thereafter shown in
As shown in
Next, in step S714, the array is searched from the search start node, and the leaf node with the smallest value of the index keys is obtained, and the index keys linked to the key link table for that leaf node are successively extracted. Details of the processing in step S714 are described later referencing
Next, at step S718, an array element number is extracted from the search path stack, the stack pointer for the search path stack is decremented by 1, and processing proceeds to step S719. At step S719, a determination is made whether the array element number extracted at step S718 is the termination number. If the result of the determination is that it is the termination number, processing is terminated, and if it is not the termination number, processing proceeds to step S721.
At step S721, the node position, that is, whether the node with that array element number is stored as the node [0] or the node [1] of its node pair, is obtained from the array element number extracted at step S718. For example, because a node [0] would be stored in an array element with an even array element number, the node position can be obtained from the array element number.
Then, at step S722, a determination is made whether the node position obtained at step S721 is a node [1]. If the determination in step S722 is that it is the node [1], a return is made to step S718.
When the determination in step S722 is that it is a node [0], processing proceeds to step S723, wherein a 1 is added to the array element number, and the array element number of the node [1] that is a pair to that node is obtained. Then, at step S724, the array element number for the node [1] obtained at step S723 is set in the array element number for the search start node, and a return is made to step S714.
The processing loop of the above noted steps S714 to S724 is repeated while decrementing the stack pointer for the search path stack by 1 at step S718, until the determination at step S719 is that the array element number extracted from the search path stack is the termination number. If the array element number extracted from the search path stack is the termination number, processing is terminated because processing is completed for all the leaf nodes in the classification tree.
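By way of illustration only, the loop of steps S714 to S724 can be sketched as follows. The tuple layout of the nodes, the sentinel `TERMINATION`, and the function names are assumptions of this sketch and do not appear in the embodiment. The sketch also stops when the search start node itself is extracted from the stack, which is a simplification made here rather than a step stated above.

```python
TERMINATION = -1  # hypothetical sentinel standing in for the termination number

# Assumed node layout: ("branch", discrimination_bit_position, coupled_node_indicator)
# or ("leaf", payload). Node [0] of a pair occupies an even array element number.


def leftmost_leaf(array, start, stack):
    """Follow node [0] links from `start`, stacking every visited element number."""
    index = start
    while True:
        stack.append(index)
        node = array[index]
        if node[0] == "leaf":
            return node
        index = node[2] + 0  # coupled node indicator + 0 selects node [0]


def leaves_in_order(array, root):
    """Return leaf payloads in ascending order (steps S714 to S724, simplified)."""
    stack = [TERMINATION]
    result = []
    start = root
    while True:
        result.append(leftmost_leaf(array, start, stack)[1])  # S714
        while True:
            index = stack.pop()                               # S718
            # S719; stopping at the root as well is an assumption of this sketch
            if index == TERMINATION or index == root:
                return result
            if index % 2 == 0:                                # S721, S722: node [0]
                start = index + 1                             # S723, S724
                break
            # node [1]: keep popping (return to S718)


example = [
    ("branch", 0, 2), None,            # root pair; the second element is unused
    ("leaf", "A"), ("branch", 1, 4),
    ("leaf", "B"), ("leaf", "C"),
]
print(leaves_in_order(example, 0))
```

Because a node [0] always precedes its paired node [1], visiting the leftmost leaf first and then the sibling of each node [0] on the way back up yields the leaf nodes in ascending key order.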
The processing to extract the classification keys stored in the classification tree shown in the above noted
Next, details of the processing in step S704 shown in
As shown in
At step S802, the classification reference pointer and the key management pointer for the index key management area are obtained. The classification reference pointer and key management pointer obtained here are set for the index key management area 320 in data storage apparatus 308 and are obtained by determining the addresses of the entries to be used first in the key classification table and key link table.
Next, in step S803, the key management pointer is written in the head link and tail link of the key classification table entry pointed to by the classification reference pointer and the classification key is written in the smallest value key and largest value key, and in step S804, the key reference pointer used to read out the key at step S703 in
Next, in step S805, an empty node pair is obtained from the array, and the array element number of the array element that is intended to be the primary node in that node pair is obtained, and in step S806, a 0 is added to the array element number obtained at step S805 and an array element number is obtained. (This number is actually the same as the array element number obtained at step S805.)
Furthermore, in step S807, in the array element with the array element number obtained at step S806, a 1 (leaf node) is written in the node type of the root node that is to be generated and the classification reference pointer obtained at step S802 is written in the classification reference pointer, and at step S808, the array element number of the root node obtained at step S805 is registered and processing is terminated.
When the above noted classification key first to be processed is made to be key 271d, the leaf node 210b shown in
Also, although the entry pointed to by classification reference pointer 280d in the key classification table shown in
When the determination in step S801 is that the array element number of the root node has been registered, processing proceeds to step S809, wherein an updated classification tree is generated by inserting the classification key in the classification tree for which the root node has already been registered, and processing is terminated. Details of the processing in step S809 are described next referencing
As shown in
At step S903, the array element pointed to by the array element number is read out from the array as a node, and at step S904, the node type is extracted from the node, and at step S905, a determination is made whether the node type is a branch node.
If, in the determination in step S905, the determination is that the read-out node is a branch node, processing proceeds to step S906, wherein the discrimination bit position is extracted from the node, and furthermore, at step S907, the bit value corresponding to the extracted discrimination bit position is extracted from the classification key. Then, in step S908, the coupled node indicator is extracted from the node. Also in step S909, the bit value extracted from the classification key is added to the coupled node indicator, an updated array element number is obtained, and a return is made to step S902.
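The branch-node descent of steps S903 to S909 can be sketched as below. This is only a minimal illustration under assumed conventions: keys are fixed-width integers, bit position 0 is the highest bit, and nodes are tuples of the form `("branch", discrimination_bit_position, coupled_node_indicator)` or `("leaf", classification_reference_pointer)`. The names `bit_at`, `descend_to_leaf`, and `tree` are hypothetical.

```python
def bit_at(key, position, width):
    """Bit of `key` at `position`, counting position 0 as the highest bit."""
    return (key >> (width - 1 - position)) & 1


def descend_to_leaf(array, root, key, width):
    """Walk from the root to a leaf, choosing node [0] or node [1] by key bits."""
    index = root
    while True:
        node = array[index]                                  # S903
        if node[0] != "branch":                              # S904, S905
            return index, node
        _, discrimination_bit, coupled = node                # S906, S908
        index = coupled + bit_at(key, discrimination_bit, width)  # S907, S909


# Root branches on bit 0; its node [1] branches again on bit 1.
tree = [
    ("branch", 0, 2), None,
    ("leaf", "low"), ("branch", 1, 4),
    ("leaf", "mid"), ("leaf", "high"),
]
print(descend_to_leaf(tree, 0, 0b1000, width=4))
```

Adding the extracted bit value to the coupled node indicator is what lets a single stored number address both nodes of the pair.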
Thereafter, the processing loop of step S903 to step S909 is repeated until the determination in step S905 is that the node is a leaf node and processing proceeds to step S910. At step S910, the classification reference pointer is extracted from the leaf node, and processing proceeds to step S911 shown in
As shown in
If the classification key is not smaller than the smallest value key, in step S916, a further determination is made whether the classification key is larger than the largest value key. If the determination in step S916 is that the classification key is not larger than the largest value key, in other words, the value of the classification key is determined to be between the values of the smallest value key and the largest value key, then at step S917, the classification key is linked to the key link table for the leaf node, and insertion processing is terminated. Details of the processing in step S917 are described later referencing
Conversely, when the determination in step S916 is that the classification key is larger than the largest value key, processing proceeds to step S918 wherein the largest value key is set in the index key and processing proceeds to step S920.
Also, when the determination in step S915 noted above is that the classification key is smaller than the smallest value key, in step S919, the smallest value key is set in the index key and processing proceeds to step S920.
In step S920, a bit string comparison is performed, for example with an exclusive OR, between the classification key and the index key set at step S918 or step S919, and a difference bit string is obtained. Next, in step S921, the bit position (difference bit position) of the first differing bit seen from the highest 0th bit is obtained from the difference bit string obtained at step S920. This processing can be done, for example, by inputting the difference bit string to a CPU that has a priority encoder and thus obtaining the differing bit position. The bit position of the first differing bit can also be obtained by having software perform the same kind of processing as a priority encoder.
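As one possible software counterpart to the priority encoder, steps S920 and S921 can be sketched as follows, assuming that keys are held as non-negative integers of a known fixed width and that bit position 0 is the highest bit. The function name `difference_bit_position` is hypothetical.

```python
def difference_bit_position(key_a, key_b, width):
    """Position of the first differing bit, counting from the highest bit as 0.

    Returns None when the two keys are identical.
    """
    difference = key_a ^ key_b                 # S920: exclusive OR
    if difference == 0:
        return None
    # bit_length() is the index of the highest set bit plus one,
    # so the distance from the top of a `width`-bit string is:
    return width - difference.bit_length()     # S921


print(difference_bit_position(0b1010, 0b1000, width=4))
```

Here `int.bit_length()` plays the role of the priority encoder, since it locates the most significant set bit of the difference bit string.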
Next, in step S922a, the array element number of the root node is set in the array element number for the insertion position, and processing proceeds to step S922b shown in
As shown in
At step S923, the count value in the search path counter is incremented by 1. Next, in step S924a, the array element pointed to by the array element number of the insertion position is read out from the array as a node, and in step S924b, the node type is extracted from the node, and processing proceeds to step S925.
At step S925, a determination is made whether the node type extracted at step S924b is a branch node. If the node type does not indicate a branch node, in other words, it indicates a leaf node, then processing proceeds to step S932.
Conversely, if the node type indicates a branch node, processing proceeds to step S926, wherein the discrimination bit position is extracted from the node, and in step S927, a determination is made whether the discrimination bit position has a positional relationship higher than the difference bit position obtained at step S921. If the discrimination bit position is not higher than the difference bit position, processing proceeds to step S935.
Conversely, if the discrimination bit position is higher than the difference bit position, processing proceeds to step S928a. At step S928a, the bit value pointed to by the discrimination bit position is extracted from the classification key, and at step S928b the coupled node indicator is extracted from the node. Then, in step S929, the value obtained by adding the value obtained at step S928a to that coupled node indicator is set in the array element number of the insertion position, and a return is made to step S923.
The processing loop of the above noted step S923 to step S929 repeats a search from the root node until the determination at step S925 is that the node type is a leaf node or the determination at step S927 is that the discrimination bit position is not higher than the difference bit position. The array element number of the insertion position set at step S929 immediately before the processing loop is escaped at step S925 or step S927 or, if the root node is a leaf node, the array element number of the insertion position set at step S922a indicates the insertion position for a node pair that includes the leaf node linking to the classification key.
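A minimal sketch of the loop of steps S923 to S929 follows. It assumes the same tuple node layout as is used for illustration elsewhere, fixed-width integer keys with bit position 0 as the highest bit (so that a "higher" position is a numerically smaller one), and an initial search path counter of 0; the initial value and all names are assumptions of this sketch.

```python
def bit_at(key, position, width):
    """Bit of `key` at `position`, counting position 0 as the highest bit."""
    return (key >> (width - 1 - position)) & 1


def find_insertion_position(array, root, key, difference_bit, width):
    """Return (array element number, search path counter, kind of node found)."""
    index = root                                    # S922a
    counter = 0                                     # assumed initial value
    while True:
        counter += 1                                # S923
        node = array[index]                         # S924a
        if node[0] != "branch":                     # S924b, S925
            return index, counter, "leaf"           # continues at S932
        _, discrimination_bit, coupled = node       # S926
        if not discrimination_bit < difference_bit:  # S927: not higher
            return index, counter, "branch"         # continues at S935
        index = coupled + bit_at(key, discrimination_bit, width)  # S928a to S929


tree = [
    ("branch", 0, 2), None,
    ("leaf", "p"), ("branch", 2, 4),
    ("leaf", "q"), ("leaf", "r"),
]
print(find_insertion_position(tree, 0, 0b1011, difference_bit=1, width=4))
```

The returned counter corresponds to the level of the node at the insertion position, which is the value later compared with the maximum number of levels.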
The processing loop of the above noted step S923 to step S929 traverses the branch nodes in the search path and, the same as for the search processing shown in
When the determination at the above noted step S925 is that the node type extracted at step S924b indicates a leaf node, processing proceeds to step S932, wherein a determination is made whether the search path counter shows the largest value, which is the value of the maximum number of levels in the classification tree.
When the determination is that the count value in the search path counter is not the largest value, processing proceeds to step S933, wherein the classification key is inserted at the insertion position, and insertion processing is terminated. One example of the status wherein the node at the insertion position is a leaf node and the count value in the search path counter is not the largest value is the status described above referencing
Conversely, when the determination at step S932 is that the count value in the search path counter is the largest value, processing proceeds to step S934, wherein the classification key is linked to the key link table for the leaf node, and insertion processing is terminated. One example of the status wherein the count value in the search path counter is the largest value is the status described above referencing
Also, the ranges of the classification key values classified into each block do not overlap even if the classification key is linked to the leaf node in step S934. This is because the upper level bit values up to the discrimination bit position in the branch node immediately above the leaf node are identical for any key classified into the same block, and a key with a bit value differing up to that discrimination bit position is classified into a different block. Details of the processing in step S934 are described later referencing
When the determination at step S927 noted above is that the discrimination bit position does not have a higher positional relationship than the difference bit position obtained at step S921, processing proceeds to step S935.
At step S935, the processing is performed to guarantee that the restriction on the number of levels in the classification tree is not exceeded when the classification key is inserted at the insertion position. In other words, when the classification key is inserted at the insertion position, a check is performed whether the number of levels of leaf nodes below the insertion position exceeds the largest value, and if there is a leaf node exceeding the restriction, the parent node of the node pair including that leaf node is made into a leaf node, and the keys linked to the key link table for the leaf nodes configuring that node pair are linked to the key link table for the parent node that is made a leaf node, and this process is repeated for all leaf nodes below the insertion position. By means of the processing in this step S935, even when a classification key is inserted in the insertion position, the number of levels in the classification tree does not exceed the largest value.
One example of the status wherein the number of levels of leaf nodes below the insertion position exceeds the largest value when a classification key is inserted in the insertion position is the status explained above referencing
One example of the status wherein the parent node 210b of the node pair 201d which includes the leaf node 210d is made to be a leaf node and the keys 270e and 271e linked to the key link table for leaf node 210d and the key 271d linked to the key link table for leaf node 211d, which is the other node in the node pair 201d that includes leaf node 210d, are linked to the key link table for the parent node 210b, which has been made a leaf node, is the status shown in
Details of the processing in step S935 are described later referencing
Following step S935, processing proceeds to step S936 wherein, the same as for the above noted step S933, the classification key is inserted in the insertion position, and insertion processing is terminated. One example of the status wherein, by making the parent node of the node pair that includes the leaf node into a leaf node and by linking to the key link table for the parent node that is made a leaf node the keys linked to the key link table for the leaf nodes that configure the node pair, the value of search path counter has been decremented by 1 from the largest value, and the classification key has been inserted in the insertion position is the status described above referencing
Details of the processing in step S936, in the same way as noted above for step S933, are described below referencing
As shown in
Then, proceeding to step S1003, the array element number computed by adding the boolean value obtained at step S1002 to the array element number of the primary node obtained at step S1001 is obtained. Also, in step S1004, the array element number computed by adding the logical negation value of the boolean value obtained at step S1002 to the array element number of the primary node obtained at step S1001 is obtained.
Next, proceeding to step S1005, the classification reference pointer and the key management pointer for the index key management area are obtained. Here, the classification reference pointer and the key management pointer for the index key management area are obtained in order to secure the key classification table and key link table entries associated with the leaf node that includes the classification key to be inserted.
Because the number of entries in the key link table is the number of keys to be classified, the acquisition of the key management pointer can be executed by securing beforehand in the index key management area an area of empty entries equal to the number of keys to be classified and by passing successively the starting address of an empty entry as the key management pointer whenever there is an acquisition request.
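The acquisition scheme just described can be illustrated with a simple pre-allocated pool. The class name `EntryPool` and the use of list positions in place of starting addresses are assumptions of this sketch.

```python
class EntryPool:
    """Pre-allocated table of empty entries handed out one at a time."""

    def __init__(self, number_of_keys):
        # Secure beforehand as many empty entries as there are keys to classify.
        self.entries = [None] * number_of_keys
        self.next_free = 0

    def acquire(self):
        """Return the position of the next empty entry as the pointer."""
        if self.next_free >= len(self.entries):
            raise IndexError("no empty entry left in the pool")
        pointer = self.next_free
        self.next_free += 1
        return pointer


key_link_pool = EntryPool(number_of_keys=3)
first = key_link_pool.acquire()
second = key_link_pool.acquire()
print(first, second)
```

Because exactly one key link table entry is consumed per classified key, a pool sized to the number of keys cannot be exhausted during classification.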
The number of entries eventually necessary in the key classification table is the number of blocks for classifying the keys, in other words, the number of leaf nodes in the classification tree. Even so, the acquisition of the classification reference pointer can be executed in the same way as for the key link table, by securing beforehand in the index key management area an area of empty entries equal to the number of keys to be classified and by successively passing the starting address of an empty entry as the classification reference pointer whenever there is an acquisition request.
As is noted above in the description of step S935 shown in
Next, in step S1006, the key management pointer obtained at step S1005 is written in the head link and tail link in the key classification table entry pointed to by the classification reference pointer obtained at step S1005, and the classification key is written in the smallest value key and the largest value key, and in step S1007, the key reference pointer for the index key storage area is written in the key reference pointer in the key link table entry pointed to by the key management pointer. The key reference pointer for the index key storage area is the pointer set in step S701 or step S705 shown in
Next, in step S1008, a 1 (leaf node) is written in the node type of the array element pointed to by the array element number obtained at step S1003 and the classification reference pointer obtained at step S1005 is written in the classification reference pointer.
Proceeding to step S1009, the contents of the array element with the array element number in the insertion position are read out from the array, and in step S1010, the contents read out at step S1009 are written in the array element pointed to by the array element number obtained at step S1004. Here, the array element number in the insertion position is the one set at step S929 shown in
Finally, in step S1011, a 0 (branch node) is written in the node type of the array element pointed to by the array element number in the insertion position, and the difference bit position obtained at step S921 shown in
When the example of processing flow shown in
As shown in
Next, proceeding to step S1103, the tail link in the key classification table entry pointed to by the classification reference pointer extracted at step S910 shown in
Next in step S1106, the smallest value key and the largest value key in the key classification table entry pointed to by the classification reference pointer extracted at step S910 shown in
If the classification key is smaller than the smallest value key, in step S1108, the classification key is written into the smallest value key in the key classification table entry pointed to by the classification reference pointer extracted at step S910 shown in
If the classification key is not smaller than the smallest value key, in step S1109, a determination is made whether the classification key is larger than the largest value key. If the classification key is not larger than the largest value key, processing is terminated, and if the classification key is larger than the largest value key, in step S1110, the classification key is written into the largest value key in the key classification table entry pointed to by the classification reference pointer extracted at step S910 shown in
Also, although the processing flow shown in the above noted example in
Then, when the classification key is inserted in the insertion position, a check is made whether the number of levels of the leaf nodes below the insertion position exceeds the largest value. If there is a leaf node that exceeds the restriction, the parent node of the node pair that includes that leaf node is made a leaf node, and the keys linked to the key link table of the leaf nodes configuring that node pair are linked to the key link table for that parent node. This is repeated for all the leaf nodes below the insertion position.
Whereat, the processing shown in
As shown in
Next, in step S1205, a determination is made whether the count value for the search path counter is the largest value. Here, the count value for the search path counter is the one counted when the leaf node is obtained at step S1204, and it indicates the level in the classification tree wherein is located the leaf node obtained at that point.
When, in step S1205, the determination is made that the count value for the search path counter is not that of the largest value, processing proceeds to step S1206, wherein the array element number is extracted from the search path stack, and the stack pointer for the search path stack is decremented by 1, and processing proceeds to step S1209. The array element number extracted from the search path stack at step S1206 is either the array element number, obtained in step S1204, of the array element disposed in the leaf node that includes the smallest value of the index keys or the array element number pointed to by the stack pointer that is decremented by 1 at step S1206 in the last previous processing loop from step S1206 to step S1212a.
Conversely, when the determination in step S1205 is that the count value for the search path counter is that of the largest value, processing branches to step S1207. At step S1207, the stack pointer for the search path stack is decremented by 1, and the array element number is extracted from the search path stack, and the extracted array element number is set in the array element number of the parent node. Here the parent node is the node immediately above the leaf node, obtained in step S1204, that includes the smallest value of the index keys. The node immediately above a given node is called the parent node of that node, and the node immediately below is called a child node.
Next, in step S1208, a leaf node is written into the array element whose array element number was set at step S1207, in other words, into the array element of the parent node. This leaf node is linked both to the keys that have been linked to the key link table for the leaf node obtained at step S1204 and to the keys that have been linked to the key link table for the leaf node that is a pair to that leaf node. In other words, the branch node (parent node) immediately above the leaf node obtained at step S1204 is made a leaf node, and the keys of both leaf nodes of that node pair are linked to the key link table for the parent node that is made a leaf node.
Details of the processing in step S1208 are described later referencing
Next, in step S1208a, the count value for the search path counter is decremented by 1, and processing proceeds to step S1209. Here the count value for the search path counter is decremented by 1 because the number of levels of the leaf node has been decremented by 1 by making the parent node into a leaf node in the processing in the above noted step S1208. By means of this processing, the count value for the search path counter when there is once again a search for the smallest value in step S1204 can be made to coincide with the number of levels in the search path.
At step S1209, a determination is made whether the array element number extracted at step S1206 or step S1207 is the array element number of the insertion position. If the determination result is that the array element number extracted at step S1206 or step S1207 is the array element number of the insertion position, link processing is terminated because the processing of all the leaf nodes below the node at the insertion position has been completed. If the determination result is that the array element number extracted at step S1206 or step S1207 is not the array element number of the insertion position, processing proceeds to step S1211.
At step S1211, a node position is obtained that indicates in which of the array elements of a node pair the node with the array element number extracted at step S1206 or step S1207 is stored. The node position can be obtained from the array element number because, for example, a node [0] is stored in an array element whose array element number is even.
Then, at step S1212, a determination is made whether the node position obtained at step S1211 is that of a node [1]. If the determination at step S1212 is that of a node [1], in step S1212a, the count value of the search path counter is decremented by 1, and a return is made to step S1206.
When the determination at step S1212 is that of a node [0], processing proceeds to step S1213, wherein the array element number is incremented by 1, and the array element number of the node [1] that is a pair to that node is obtained. Then, at step S1214, the array element number of the node [1] obtained at step S1213 is set in the array element number of the search start node, and a return is made to step S1204.
The processing loop of the above noted steps S1204 to S1214 is repeated, while decrementing by 1 the stack pointer for the search path stack at step S1206 or while reducing the number of levels of a leaf node by making the parent node to be a leaf node in the processing from step S1207 to step S1208a, until a determination at step S1209 is made that the array element number extracted from the search path stack is the array element number of the insertion position.
Just as for the example shown in
Supposing we were to change the example in
When a search for the smallest value is first executed, because the search is made for the leaf node that includes the smallest value in the subtree which has the node at the insertion position as its root node, the node position for that leaf node is a node [0], and the node [1] that configures the same node pair is made the search start node and a search is done for the next smallest value. Finally, a search is done for the leaf node that includes the largest value in the subtree which has the node at the insertion position as its root node. The node position of that leaf node is a node [1], and the processing loop of step S1206 to step S1212a is repeated until the array element number of the insertion position is extracted from the search path stack in step S1206, and the determination at step S1209 is that the array element number is the array element number of the insertion position, and processing is terminated.
As shown in
Next, in step S1302, the array element number is stored in the search path stack. Then, at step S1303, the array element pointed to by the array element number is read out from the array as a node, and at step S1304, the node type is extracted from the read-out node, and processing proceeds to step S1305.
At step S1305, a determination is made whether the node type is branch node, and when the determination is that the node type is branch node, processing proceeds to step S1305a, wherein the value in the search path counter is incremented by 1. Next, proceeding to step S1306, the coupled node indicator is extracted from the node, and at step S1307, the value 0 is added to the extracted coupled node indicator, and the result is made to be a new array element number, and a return is made to step S1302.
Thereafter, the processing from step S1302 to step S1307 is repeated until the node extracted at step S1304 is determined to be a leaf node in step S1305, at which point processing is terminated.
Also, although the processing to search for a leaf node that includes the smallest value in the index keys described above referencing
As shown in
Next proceeding to step S1403, the value 1 is added to the coupled node indicator, and the array element number for node [1] is obtained. Then, in step S1404, the array element pointed to by the array element number for node [1] is read out from the array as a node, and in step S1405, the classification reference pointer is extracted from the node and is set in the classification reference pointer for node [1]. In the example shown in
Next, proceeding to step S1406, the value 0 is added to the coupled node indicator, and the array element number for node [0] is obtained. Then, in step S1407, the array element pointed to by the array element number for node [0] is read out from the array as a node, and in step S1408, the classification reference pointer is extracted from the node and is set in the classification reference pointer for node [0]. In the example shown in
Next, in step S1409, the head link, tail link, and largest value key in the key classification table entry pointed to by the classification reference pointer for node [1] are read out and the head link and tail link are set in the head link for node [1] and in the tail link for node [1] respectively, and the largest value key is set in the largest value key for node [1].
Next, in step S1410, the tail link in the key classification table entry pointed to by the classification reference pointer for node [0] is read out, and in step S1411, the head link for node [1] set at step S1409 is written in the link in the key link table entry pointed to by the read-out tail link. Then, in step S1412, the tail link for node [1] is written in the tail link in the key classification table entry pointed to by the classification reference pointer for node [0], and the largest value key for node [1] is written in its largest value key.
By means of the above processing, the classification reference pointer for the parent node that is to be made a leaf node is made to be the classification reference pointer for the child node [0], and, in line with that, the key link table and the key classification table are rewritten. The process, in the above noted step S1411, of writing the head link for node [1] into the link in the key link table entry pointed to by the tail link in the key classification table entry pointed to by the classification reference pointer for node [0] sets the links to the keys that are linked to the key link table for leaf node [1] after the keys that are linked to the key link table for the leaf node [0].
Next, proceeding to step S1413, the contents read out at step S1407 are written in the array element pointed to by the array element number for the parent node. In the examples shown in
Finally, at step S1414, the node pair pointed to by the coupled node indicator extracted at S1402 is deleted, and at step S1415, the key classification table entry pointed to by the classification reference pointer for node [1] obtained at S1405 is deleted, and processing is terminated.
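The sequence of steps S1403 to S1415 can be illustrated by the following sketch. The data layout is assumed for illustration only: nodes are tuples `("branch", discrimination_bit_position, coupled_node_indicator)` or `("leaf", classification_reference_pointer)`, the key classification table is a dictionary of entries, and the key link table is a list of entries. Reading the coupled node indicator from the parent node at the start is inferred from the reference to step S1402 above.

```python
def merge_pair_into_parent(array, parent_index, classification_table, link_table):
    """Turn a parent branch node into a leaf holding the keys of both children."""
    _, _, coupled = array[parent_index]                   # S1402 (as inferred)
    node1 = array[coupled + 1]                            # S1403, S1404
    ref1 = node1[1]                                       # S1405
    node0 = array[coupled + 0]                            # S1406, S1407
    ref0 = node0[1]                                       # S1408
    entry1 = classification_table[ref1]                   # S1409
    entry0 = classification_table[ref0]
    # S1410, S1411: chain node [1]'s keys after node [0]'s last key
    link_table[entry0["tail"]]["link"] = entry1["head"]
    entry0["tail"] = entry1["tail"]                       # S1412
    entry0["largest"] = entry1["largest"]
    array[parent_index] = node0                           # S1413
    array[coupled] = array[coupled + 1] = None            # S1414: delete the pair
    del classification_table[ref1]                        # S1415


array = [("branch", 1, 2), None, ("leaf", 0), ("leaf", 1)]
classification_table = {
    0: {"head": 0, "tail": 1, "smallest": 2, "largest": 3},
    1: {"head": 2, "tail": 2, "smallest": 8, "largest": 8},
}
link_table = [
    {"key_ref": 0, "link": 1},
    {"key_ref": 1, "link": None},
    {"key_ref": 2, "link": None},
]
merge_pair_into_parent(array, 0, classification_table, link_table)
print(array[0], classification_table[0])
```

Only the largest value key of the surviving entry needs updating, because every key under node [0] is smaller than every key under node [1].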
Next, referencing
As shown in
Next, at step S1501a, the array element number is stored in the search path stack, and proceeding to step S1502, the array element pointed to by the array element number is read out from the array as a node. Then at step S1503, the node type is extracted from the read-out node, and processing proceeds to step S1504.
At step S1504, a determination is made whether the node type extracted at step S1503 indicates a branch node. If the node type indicates a branch node, processing proceeds to step S1505, wherein the coupled node indicator is extracted from the node read out at step S1502, and, in step S1506, the value 0 is added to the extracted coupled node indicator, and an array element number is obtained, and a return is made to step S1501a.
Conversely, when the determination in step S1504 is that the node type extracted at step S1503 is a leaf node, processing proceeds to step S1508, wherein the classification reference pointer is extracted from the node read out at step S1502, and processing proceeds to step S1511 shown in
As shown in
Next in step S1513, the key reference pointer and link in the key link table entry pointed to by the read out pointer are read out. Here, the read out pointer is the one set at step S1512 or at step S1516, noted below.
Next in step S1513a, the key pointed to by the key reference pointer read out at step S1513 is read out from the index key storage area, and, in step S1514, the read out key is output to the key output target read out at step S1511, and processing proceeds to step S1515.
At step S1515, a determination is made whether the read out pointer coincides with the tail link, and if they do not coincide, in step S1516, the link read out at step S1513 is set in the read out pointer, and a return is made to step S1513.
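The read-out loop of steps S1513 to S1516 can be sketched as follows. The dictionary and list layouts, the field names, and the use of a Python list as the key output target are assumptions of this sketch; the initial reading of the output target, head link, and tail link is likewise assumed from the references to steps S1511 and S1512.

```python
def output_block(classification_ref, classification_table, link_table, key_storage):
    """Output every key linked to one leaf node, following the link chain."""
    entry = classification_table[classification_ref]
    target = entry["output"]                # key output target (S1511)
    tail = entry["tail"]
    pointer = entry["head"]                 # read out pointer (S1512, assumed)
    while True:
        row = link_table[pointer]           # S1513: key reference pointer and link
        key = key_storage[row["key_ref"]]   # S1513a
        target.append(key)                  # S1514
        if pointer == tail:                 # S1515
            return
        pointer = row["link"]               # S1516


key_storage = [0b0110, 0b0001, 0b0100]
link_table = [
    {"key_ref": 1, "link": 1},
    {"key_ref": 2, "link": 2},
    {"key_ref": 0, "link": None},
]
block_output = []
classification_table = {0: {"head": 0, "tail": 2, "output": block_output}}
output_block(0, classification_table, link_table, key_storage)
print(block_output)
```

Comparing the read out pointer with the tail link, rather than testing for an empty link, means that stale link values in the last entry of a chain never need to be cleared.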
When the determination in step S1515 is that the read out pointer coincides with the tail link, processing is terminated because all the index keys linked to the key link table for the leaf node that includes the smallest value of the index keys, which is obtained in the prior stage of processing shown in
The above describes the processing flow that realizes a bit-string key classification method and a distribution method related to a preferred embodiment of this invention. It is clear that the bit-string key classification apparatus and bit-string key distribution apparatus related to this invention can be constructed on a computer by means of a program that executes this processing flow on a computer like the data processing apparatus 301 shown in the example in
The bit-string key distribution apparatus 600 includes the leaf node extracting means 610 and the classification key outputting means 620; and the leaf node extracting means 610 successively extracts leaf nodes from the classification tree 530, and the classification key outputting means 620 reads out key access information from the leaf node extracted by the leaf node extracting means 610, and extracts key position information from the key position search table using the key access information, and reads out keys from the key storage means based on the key position information, and outputs them to the output target. Also, although the key output target can be set in the key classification table at the end of classification processing, the key output target can also be set in association with the blocks for classifying the keys to be classified and then the output target can be determined for each extraction of a leaf node from the classification tree.
Number | Date | Country | Kind |
---|---|---|---|
2009-246868 | Oct 2009 | JP | national |
This application is a continuation of PCT/JP2010/006305 filed on Oct. 25, 2010. PCT/JP2010/006305 is based on and claims the benefit of priority of the prior Japanese Patent Application 2009-246868 filed on Oct. 27, 2009, the entire contents of which are incorporated by reference. The contents of PCT/JP2010/006305 are incorporated by reference.
 | Number | Date | Country |
---|---|---|---|
Parent | PCT/JP2010/006305 | Oct 2010 | US |
Child | 13456955 | | US |