1. Field of Invention
This invention relates to the field of building new structure in an interlocking trees datastore.
2. Description of Related Art
In many applications it is useful to identify when data or input sequences that have not been previously encountered, i.e. new variables or records, are received into a datastore. In known systems identifying a new sequence has required the very computationally intensive procedure of comparing the new sequence with all of the previously received sequences to search for a match. In another known procedure for identifying new sequences a table of the distinct values was constructed for comparison. Therefore, more efficient methods for detecting a new sequence are required.
Additionally, when a new sequence is identified the rules for the construction of a particular interlocking trees datastore may require building a new node or nodes to record the new sequence. When a new node is being built care should be taken to prevent access to nodes being changed, for example, by another thread executing in the datastore. Therefore, nodes that are being changed should be locked until the changes are complete in order to prevent such an access. In known systems the entire interlocking trees datastore was locked from threads adding new sequences to prevent other threads from accessing changing nodes. This was a severe restriction because it slowed the system down and essentially limited the construction of interlocking trees datastore to a single thread.
All references cited herein are incorporated herein by reference in their entireties.
A method for recording information in an interlocking trees datastore having a plurality of K paths includes receiving an input particle to provide a received input particle and building a new K node in accordance with the received input particle. A K node is locked in accordance with the building of the new K node to provide a locked node. The locked node can be the Case node of the new K node. The Case bi-directional link between the locked Case node and the new K node is completed while the locked Case node is locked. A pointer to the new K node is added to an asCase list of the locked Case node. The locked node is locked only while the pointer to the new K node is added to the asCase list of the locked Case node. The locked node can be a Result node of the new K node. A pointer to the new K node is added to an asResult list of the locked Result node while the Result node is locked. The Result node is locked only while the pointer to the new K node is added to the asResult list of the locked Result node. A memory location of a Case node and a memory location of a Result node are associated with the new K node. The locked node is locked after the memory locations are associated with the new K node. The locked node can be the Case node, and the new K node is added to the asCase list of the locked Case node while the locked Case node is locked.
As a part of its regular processing of streams of input particles the KStore engine recognizes that a sequence being processed is ‘new’ when there is no existing structure to record the sequence being processed. This condition may occur at any level in the K structure. At this point it is possible to implement many different processes to deal with the new sequence at substantially low incremental cost.
For example, new K structure may be created to record the event, an error statement may be issued, the event may be logged, processes may be initiated to evaluate the new sequence in accordance with some established criteria, or any combination of processes may be initiated in response to encountering a new sequence.
The invention will be described in conjunction with the following drawings in which like reference numerals designate like elements and wherein:
FIGS. 2F,G show interlocking trees datastores that may be used to represent data according to the system and method of the present invention.
The invention will be illustrated in more detail with reference to the following examples, but it should be understood that the present invention is not deemed to be limited thereto.
Referring now to
Additionally, according to the system and method of the invention, the K Engine 14 can be provided with a new sequence process 18 to permit the processing of new sequences as described in detail herein below. For example, one of the processes that are related to the new sequence process 18 is the lock process for locking previously existing nodes when a new node is constructed. The performance of the new sequence process 18 may be facilitated by providing specialized utilities 16. In the preferred embodiment of the invention the learn engine 26 and the API utility 23 can communicate with the K Engine 14 both directly and by way of the utilities 16.
Referring now to
In order to build the minimal KStore structure 100 the subcomponent K node 106 can be created by establishing the Case and Result bi-directional links. The Case bi-directional link 104 between the primary root node 102 and the K node 106, as best seen in the datastore element 130 of
The Result bi-directional link 120 is established between the elemental root K node 122 and the K node 106. A pointer to node 122 becomes the Result entry of node 106 and the asResult list of the elemental root node 122 is updated to include the subcomponent node 106. It will be understood that the foregoing operations for building a subcomponent node such as the subcomponent node 106 can be described in any order for illustrative purpose and can be performed in any order when practicing the present invention.
The Case bi-directional link 110 can then be established between the end product node 112 and the subcomponent node 106, as best seen in the datastore elements 140, 150 of
Referring now to
When the letter particle A of the sequence C-A-T is received the BOT-C-A node 258 can be built into the interlocking trees datastore 250 by establishing the Case bi-directional link to the BOT-C node 256. The Result bi-directional link to the A root node 270 can then be established. Thus, the BOT-C-A node 258 is built in response to receiving the letter particle A. The BOT-C-A node 258 is then designated as the current K location.
In order to form the BOT-C-A-T node 262 when the T letter particle is received the Case bi-directional link can be established to the BOT-C-A node 258 and the Result bi-directional link can be established to the T root node 280. The end product node 266 can then be created by forming the Case bi-directional link to the BOT-C-A-T node 262 and the Result bi-directional link to the EOT node 282. In this manner the K path 265 of the interlocking trees datastore 250 is built for representing the sequence of letter particles within the string C-A-T. With the processing of an EOT node the current K location is set to BOT.
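The construction of the K path for the string C-A-T described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the class `KNode`, its attribute names and the helper `build_path` are hypothetical names introduced for this example, and the sketch always builds new nodes without checking for existing structure.

```python
class KNode:
    """Hypothetical K node holding Case/Result pointers and asCase/asResult lists."""

    def __init__(self, label, case=None, result=None):
        self.label = label
        self.case = case          # Case pointer
        self.result = result      # Result pointer
        self.as_case = []         # nodes for which this node is the Case node
        self.as_result = []       # nodes for which this node is the Result node
        # Complete the second half of each bi-directional link.
        if case is not None:
            case.as_case.append(self)
        if result is not None:
            result.as_result.append(self)


def build_path(bot, roots, eot, particles):
    """Build one K path from BOT through each particle to an end product node."""
    current = bot
    for particle in particles:
        # The node just built becomes the current K location.
        current = KNode(f"{current.label}-{particle}", case=current, result=roots[particle])
    return KNode(f"{current.label}-EOT", case=current, result=eot)


bot, eot = KNode("BOT"), KNode("EOT")
roots = {letter: KNode(letter) for letter in "CAT"}
end_product = build_path(bot, roots, eot, "CAT")
print(end_product.label)  # prints BOT-C-A-T-EOT
```

In this sketch each subcomponent node records its Case node and its Result node, and the Case and Result nodes record the subcomponent node in their asCase and asResult lists respectively, which mirrors the bi-directional links described for nodes 256, 258, 262 and 266.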
In one preferred embodiment a count can be kept within each node of the interlocking trees datastore 250 in order to keep track of the number of times the node is traversed. The counts of the nodes within a K path of an interlocking trees datastore may be incremented each time they are encountered during later traversals of the K path. Thus, it will be understood that in alternate embodiments of the system and method of the invention the counts of the individual nodes of an interlocking trees datastore such as the interlocking trees datastore 250 can be incremented either as they are built, encountered or when the building and traversal is complete.
Referring now to
If a match is found in block 504 the particle is considered valid. The asCase list of the current K location node is obtained. A comparison is then performed between the Result node of each node in the asCase list and the root node of the input particle. The comparison between the nodes on the asCase list of the current node and the particle root node is shown in block 508. If there is a match in the comparison of block 508 the matched node becomes the current K location node as shown in block 512. The counts of the new current K location node and the input particle root node may be incremented at this time or may be incremented along with the other nodes in its K path when an end product node is encountered. If there is no match in the comparison of block 508 a new node may be created as shown in block 516 of new sequence determining procedure 500.
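A single step of the new sequence determining procedure 500 might be sketched as below. The names `Node` and `process_particle` and the status strings are hypothetical. The handling of a particle with no matching root (returning an "invalid" status) is an assumption made for the example, since the text above states only what happens when a match is found in block 504.

```python
class Node:
    """Hypothetical K node with a count field."""

    def __init__(self, name, case=None, result=None):
        self.name, self.case, self.result = name, case, result
        self.as_case, self.as_result, self.count = [], [], 0
        if case is not None:
            case.as_case.append(self)
        if result is not None:
            result.as_result.append(self)


def process_particle(current, particle, roots):
    """Return (next current K location, status) for one input particle."""
    root = roots.get(particle)
    if root is None:
        # Assumption: a particle with no matching root node is rejected.
        return current, "invalid"
    for candidate in current.as_case:
        if candidate.result is root:        # comparison of block 508
            candidate.count += 1            # counts incremented immediately here
            root.count += 1
            return candidate, "matched"     # block 512
    # No node on the asCase list has this root as its Result node: block 516.
    new_node = Node(f"{current.name}-{particle}", case=current, result=root)
    return new_node, "created"


bot = Node("BOT")
roots = {"C": Node("C"), "A": Node("A")}
first, status_first = process_particle(bot, "C", roots)     # builds BOT-C
second, status_second = process_particle(bot, "C", roots)   # finds BOT-C again
_, status_third = process_particle(bot, "Z", roots)         # no root for Z
```

The sketch increments counts at match time; deferring the increments until an end product node is reached, as the text also permits, would only move the two increment statements.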
Referring now to
Although the particles set forth herein are input letter particles, it will be understood that the system and method of the invention can apply to any type of particle processing within an interlocking trees datastore. For example, they can apply to input words, sentences, pixels, molecules, amino acids, or any other data that can be received and stored in a datastore.
It will be understood by those skilled in the art that during its next node processing operations the next node particle processing procedure 300 will inherently detect the occurrence of a new sequence by the absence of structure to record an event within the interlocking trees datastore 250. This feature of the next node processing procedure 300 eliminates the need for any additional operations, beyond the normal input processing procedures, to determine when a new sequence is received. When a new sequence is determined within a KStore by the next node processing procedure 300 new KStore structure may be built to represent the new sequence. It will be understood that the next node processing procedure 300, and any other process set forth herein, may be applied to a single level KStore such as the interlocking trees datastore 25 as well as any level of a multi-level KStore and to any other KStore.
Furthermore, the next node particle processing procedure 300 may be used in building new structure when it determines that a new sequence has arrived. For example, the particle next node procedure 300 may be applied to processing the sequence C-A-T-S within the interlocking trees datastore 250 to create the interlocking trees datastore 290.
When the sequence of letter data particles required for representing the sequence C-A-T-S is streamed into the KEngine for interlocking trees datastore 250 there is already structure within the interlocking trees datastore for the sequence. This structure was previously created for the C-A-T input sequence. The next node processing procedure 300 may thus begin by traversing the existing structure starting at the BOT node and possibly incrementing the count fields in the nodes encountered during the traversal. When the letter particle C of the sequence C-A-T-S is received, the current K location node is BOT and the asCase list of the BOT root node 252 may be followed to the BOT-C node 256. Since the Result node of the BOT-C node 256 (the C root node 274) matches the input letter particle C, traversal can continue. The C root node 274 is defined as a non-adjacent node of the BOT-C node 256 since it is not on the asCase list of the BOT-C node 256. At this point the count of the BOT node 252, the BOT-C node 256, and the C root node 274 can be incremented.
When the letter particle A of the sequence C-A-T-S is received execution of the next node processing procedure 300 can proceed to block 302. The next node processing procedure 300 may then make a determination whether the Result pointer of any subcomponent node in the asCase list of the BOT-C node 256 points to the A root node 270.
The only node in the asCase list of the BOT-C node 256 is the BOT-C-A node 258. Therefore, the current K location node is set to point to the BOT-C-A node 258 in block 302. Since Node is thus not null as determined in decision 304, a determination is made in decision 310 whether the Result pointer of the BOT-C-A node 258 points to the A root node 270.
Since a match is found in this determination the count of the BOT-C-A node 258 and the A root node 270 may be incremented. The BOT-C-A node 258 is then made the current K location node in block 312. In this manner the next node processing procedure 300 can process the input stream. No new structure has been built within the interlocking trees datastore 250 thus far since the BOT-C-A node 258 of the sequence C-A-T-S was already formed when the sequence C-A-T was previously received.
When the letter particle T of the sequence C-A-T-S is received, the asCase list of the BOT-C-A node 258 can be followed to the BOT-C-A-T node 262 and the Result pointer of the BOT-C-A-T node 262 can be followed to the T root node 280, and another match is found. Again, no new structure is built within the interlocking trees datastore 250. Thus, the next node processing procedure 300 has traversed the interlocking trees datastore 250 incrementing the counts of the nodes encountered as the input particles C-A-T of the sequence C-A-T-S were received.
However, when the letter particle S is received it will be determined by the next node procedure 300 that the structure for representing the sequence C-A-T-S does not exist within the interlocking trees datastore 250. Accordingly, when block 302 is executed the BOT-C-A-T-EOT node 266 is located using the asCase list of the BOT-C-A-T node 262 and the BOT-C-A-T-EOT node 266 is assigned to the variable Node in block 302. Execution of the next node processing procedure 300 proceeds to decision 310 by way of decision 304 since the BOT-C-A-T-EOT node 266 is not null. When the determination of decision 310 is made execution returns to block 302, since there is no match between the received letter particle S and the EOT node 282 indicated by the Result pointer of the BOT-C-A-T-EOT node 266.
Since there are no more nodes in the asCase list of the BOT-C-A-T node 262 the determination whether Node is null is affirmative the next time decision 304 is encountered. It will be understood that the affirmative determination in decision 304 indicates that the interlocking trees datastore 250 does not include a K path representing the sequence C-A-T-S. Therefore, the sequence C-A-T-S has not been previously entered into the interlocking trees datastore 250, and it may be determined that the sequence C-A-T-S is a new sequence at this point.
It will thus be appreciated by those skilled in the art that using a procedure such as the next node processing procedure 300 the system and method of the present invention may input sequences, and inherently and simultaneously determine the occurrence of a new sequence as part of the normal process of receiving the sequences. There is no need to perform any separate comparisons or any other operations in addition to the processing of the input stream in order to detect the need for new structure since the detection of need for new structure is inherent in decision 304 as part of the process.
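The walk through the sequence C-A-T-S traced above can be illustrated with a short sketch. It assumes a datastore that already holds the K path for C-A-T and reports the point at which structure is missing, without building anything. The names `Node` and `find_new_sequence` are hypothetical, and the return convention (a node and an index, or `None`) is a choice made for this example only.

```python
class Node:
    """Hypothetical K node; linking updates the asCase/asResult lists."""

    def __init__(self, name, case=None, result=None):
        self.name, self.case, self.result = name, case, result
        self.as_case, self.as_result = [], []
        if case is not None:
            case.as_case.append(self)
        if result is not None:
            result.as_result.append(self)


def find_new_sequence(bot, roots, particles):
    """Return (current node, index) at the first particle lacking structure, else None."""
    current = bot
    for index, particle in enumerate(particles):
        root = roots[particle]
        # Walk the asCase list; `node` is None once the list is exhausted.
        node = next((n for n in current.as_case if n.result is root), None)
        if node is None:              # affirmative outcome of decision 304
            return current, index
        current = node                # block 312: matched node becomes current
    return None


# Existing structure for C-A-T, including its end product node.
bot, eot = Node("BOT"), Node("EOT")
roots = {letter: Node(letter) for letter in "CATS"}
c = Node("BOT-C", bot, roots["C"])
ca = Node("BOT-C-A", c, roots["A"])
cat = Node("BOT-C-A-T", ca, roots["T"])
Node("BOT-C-A-T-EOT", cat, eot)

where = find_new_sequence(bot, roots, "CATS")   # new at particle S, index 3
known = find_new_sequence(bot, roots, "CAT")    # all structure exists
```

No comparison against previously stored sequences is made in the sketch; the new sequence is found only because the asCase list of the BOT-C-A-T node offers no node whose Result node is the S root node.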
In response to the determination in decision 304 that Node is null and that new sequence has been encountered or captured, a determination can be made in block 306 whether a new node should be built in the interlocking trees datastore 250. Additionally, a determination can be made whether the occurrence of the new sequence should be reported to the user, administrator, log, etc. of the system and method of the present invention in block 308.
The determinations of blocks 306, 308 may be made according to predetermined rules or administrative guidelines or any other set of protocols. The parameters of the guidelines or protocols can be communicated to the next node processing procedure 300 by a calling procedure, a GUI or any other source. The rules or administrative guidelines can be set up by a user or administrator or any other party. For example, the user or administrator or other party can adapt the system and method of the invention to ignore the detection of new structure made in decision 304 and do nothing. Under these guidelines the occurrence of the new sequence can be treated substantially as noise.
Another possibility is to adapt the next node processing procedure 300 to build no new structure when a new sequence is detected or captured, but to report the occurrence to the user or administrator. The reporting can be done through the KEngine 14 or it can be performed by passing information to a calling procedure such as the learn engine 26 or the API utilities 23. Additionally, reporting can be done by way of an email. The reports can be used, for example, to trigger events such as the starting of another process, a new field event or a new record event, to set flags, to log messages, to send information or reports to a graphical user interface (GUI) 36, to send emails to a list of recipients or to perform any other operations. Additionally, the GUI 36 can report the occurrence to the user or administrator and permit the user to determine how to handle it. If a new node is built in view of the determination of block 306 the guidelines may or may not require providing a report.
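One way to picture the configurable guidelines discussed above is a small dispatch routine. This is a sketch under assumptions: the enumeration `NewSequencePolicy`, the function `handle_new_sequence` and the `build` and `report` callables are hypothetical names, and the four policies shown are only the combinations mentioned in the text (ignore, report only, build, build and report).

```python
from enum import Enum


class NewSequencePolicy(Enum):
    IGNORE = "ignore"                    # treat the new sequence as noise
    REPORT_ONLY = "report_only"          # build nothing, report the occurrence
    BUILD_ONLY = "build_only"            # build new structure, no report
    BUILD_AND_REPORT = "build_and_report"


def handle_new_sequence(policy, context, build, report):
    """Apply a configured guideline to a detected new sequence.

    `build` and `report` are caller-supplied callables. Returns whatever
    `build` returns, or None when nothing is built.
    """
    built = None
    if policy in (NewSequencePolicy.BUILD_ONLY, NewSequencePolicy.BUILD_AND_REPORT):
        built = build(context)           # determination of block 306
    if policy in (NewSequencePolicy.REPORT_ONLY, NewSequencePolicy.BUILD_AND_REPORT):
        report(context)                  # determination of block 308
    return built


reports = []
context = {"current": "BOT-C-A-T", "particle": "S"}
make_label = lambda ctx: f"{ctx['current']}-{ctx['particle']}"

reported_only = handle_new_sequence(
    NewSequencePolicy.REPORT_ONLY, context, build=make_label, report=reports.append
)
built_and_reported = handle_new_sequence(
    NewSequencePolicy.BUILD_AND_REPORT, context, build=make_label, report=reports.append
)
```

Here the `report` callable simply appends to a list; in practice it could stand for any of the reporting channels named above, such as a log, a GUI or an email.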
When a report is made according to block 308 it may contain any information that may prove useful. For example, the value represented by the current node at the time that the new sequence was determined in decision 304 can be reported. The identity of the newly received particle can be reported. A list of possible next nodes obtained in block 302 of the particle processing procedure 300 according to the asCase list of the current node can be reported. It is also possible to anticipate the next node by making an estimate of which of the possible next nodes obtained in block 302 is the most likely next node. For example, the most likely next node may be determined by comparing the counts of the possible next nodes. The determination of the most likely next node can also be made according to how recently the possible next nodes have been accessed or according to the context in which the determination is made. In another embodiment the determination can be made based upon processing a further node/particle or nodes/particles.
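The count-based estimate of the most likely next node can be sketched in a few lines. The class `Candidate`, the function `most_likely_next` and the count values used below are hypothetical and chosen only for illustration; the recency-based and context-based alternatives mentioned above are not shown.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for a node on the asCase list of the current node."""
    label: str
    count: int


def most_likely_next(candidates):
    """Pick the possible next node with the highest count, or None if there are none."""
    return max(candidates, key=lambda node: node.count, default=None)


# Illustrative counts only; they are not taken from the description.
possible_next = [Candidate("BOT-C-A-T-EOT", 5), Candidate("BOT-C-A-T-S", 2)]
best = most_likely_next(possible_next)
```

With these made-up counts the end product node would be reported as the most likely next node. When two candidates share the highest count, `max` returns the first one encountered, so a real implementation would need its own tie-breaking rule.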
Referring again to
In order to form the BOT-C-A-T-S node 264 the Case bi-directional link may be established to the BOT-C-A-T node 262 by the routine for creating new nodes called in block 306. A Result bi-directional link may be established to the S root node 278. Thus, a branch is created in the path 265 of the interlocking trees datastore 290 at the BOT-C-A-T node 262 to provide the new K path 263.
Referring now to
The exemplary node 400 can also include a Result pointer 408. Thus, when the exemplary node 400 represents the subcomponent node 264 of the interlocking trees datastore 290, the Result pointer 408 would point to the S root node 278.
A pointer to asCase list 410 can also be included in the exemplary node 400. The pointer to asCase list 410 is a pointer to a list of the subcomponent nodes or end product nodes for which the node represented by the exemplary node 400 is the Case node. The pointer to asResult list 412 is a pointer to a list of the subcomponent nodes or end product nodes for which the node represented by the exemplary node 400 is the Result node. The nodes of the interlocking trees datastores 250, 290 can also include one or more additional fields 414. The additional fields 414 may be used for an intensity or count associated with the node or for any number of different items associated with the structure. Another example of a parameter that can be stored in an additional field 416 is the particle value for an elemental root node.
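The fields of the exemplary node 400 could be modelled roughly as the record below. This is a sketch, not the node layout of the invention: the class name `ExemplaryNode` and all attribute names are hypothetical, Python lists stand in for the lists reached through the pointers 410 and 412, and a boolean stands in for the lock definition field 413.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class ExemplaryNode:
    case: Optional["ExemplaryNode"] = None      # Case pointer
    result: Optional["ExemplaryNode"] = None    # Result pointer 408
    as_case: List["ExemplaryNode"] = field(default_factory=list)    # via pointer 410
    as_result: List["ExemplaryNode"] = field(default_factory=list)  # via pointer 412
    locked: bool = False                        # lock definition field 413
    count: int = 0                              # additional field: intensity or count
    value: Any = None                           # additional field: particle value of an elemental root


# The subcomponent node for C-A-T-S points to its Case node and to the S root node.
s_root = ExemplaryNode(value="S")
cat_node = ExemplaryNode()
cats_node = ExemplaryNode(case=cat_node, result=s_root)
cat_node.as_case.append(cats_node)
s_root.as_result.append(cats_node)
```

Identity comparison is chosen because the nodes refer to one another; comparing field by field would recurse through the links.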
Referring now to
Referring now to
In the node locking procedure 600 a new node is created as shown in block 604. The new node can be created as previously described with respect to
Additionally, the asCase list of the current K location node may be updated to include the newly created node as shown in block 608. Therefore, the current K location node may be locked while its asCase list is updated and unlocked after the asCase list is updated. In a preferred embodiment all list updating operations are performed with the node being locked and immediately updated and unlocked in this manner in order to minimize the lock time.
Furthermore, no other nodes are locked by this thread during the period that the current node is locked. Thus, it is an important feature of the present invention that only one node at a time is locked by a particular thread, as described in more detail below. In a preferred embodiment, the current K location node may be locked and unlocked by setting and resetting the lock definition in the lock definition field 413 of the exemplary node 400 of the current K location node at the beginning and at the end of the operation of block 608.
Additionally, the asResult list of the root node may be updated by the node locking procedure 600 to include the newly created node. Therefore, as shown in block 612 the root node is locked and its asResult list is updated. The root node may then be unlocked. In a preferred embodiment of the invention the root node is locked by the node locking procedure 600 only during the time it takes to update its asResult list.
During the time that the pointers of the exemplary node 400 for the new node are being established there are no pointers pointing to the new node. Therefore, no nodes can access the new node and there is no need to lock the new node. Furthermore, there is no need to lock either the Case node or the Result node while the new node is being established since their data is not changing during this period.
When the pointers of the new node are established the asCase list of the Case node and the asResult list of the root node may be updated one at a time so that only one or the other of the two are locked. Therefore, a very important feature of the system and method of the present invention is that only one node is locked at a time for this thread. All of the remaining nodes in the interlocking trees datastores 250, 290 may remain unlocked. This reduces the scope of the node locking to the minimum amount possible and permits faster operation of the interlocking trees datastore 250, 290.
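The one-node-at-a-time locking of the node locking procedure 600 can be sketched as follows. The sketch makes several assumptions: a `threading.Lock` per node stands in for the lock definition field, and the names `Node` and `add_node` are hypothetical. It shows only the order of operations described above, not the implementation of the invention.

```python
import threading


class Node:
    """Hypothetical K node carrying its own lock."""

    def __init__(self, name):
        self.name = name
        self.case = None
        self.result = None
        self.as_case = []
        self.as_result = []
        self.lock = threading.Lock()   # stands in for the lock definition field


def add_node(current, root, name):
    """Create a node and link it in, holding at most one lock at any time."""
    # Block 604: nothing points to the new node yet, so it needs no lock,
    # and the Case and Result nodes are only read, not changed.
    new_node = Node(name)
    new_node.case = current
    new_node.result = root
    # Block 608: lock the Case node only while its asCase list is updated.
    with current.lock:
        current.as_case.append(new_node)
    # Block 612: the Case node is unlocked again before the root node is locked.
    with root.lock:
        root.as_result.append(new_node)
    return new_node


cat_node, s_root = Node("BOT-C-A-T"), Node("S")
cats_node = add_node(cat_node, s_root, "BOT-C-A-T-S")
```

Because the two `with` blocks are sequential rather than nested, the thread never holds the Case node lock and the root node lock together, and every other node in the datastore stays unlocked throughout.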
Referring now to
A determination is then made in decision 706 whether the new node has been added prior to the locking operations of block 702. This determination should be made in the node locking procedure 700 since it is possible for a thread other than the instant thread to begin building the new node between the time that the instant thread determines that the new node must be built and the time that the current node is actually locked according to block 702. The determination of decision 706 can be made by checking the asCase list of the current node.
If the new node has not been added during the foregoing time period as determined in decision 706 of the node locking procedure 700, the new node may be created as shown in block 710. Since the current node was locked in block 702 a pointer to the new node can be added to the asCase list of the current node at this time as shown in block 714. Regardless of whether a new node is created in block 710 the current node is unlocked as shown in block 716.
If a new node was created in block 710 as determined in decision 718, the root node for the new node is locked as shown in block 720. A pointer to the new node is added to the asResult list of the root node as shown in block 722. The root node can then be unlocked as shown in block 724. Thus, within the node locking procedure 700 only one node is locked at a time.
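The check-after-lock sequence of the node locking procedure 700 might look like the sketch below, which also exercises it from several threads. As before, the per-node `threading.Lock` and the names `Node`, `add_or_find` and `worker` are assumptions introduced for illustration only.

```python
import threading


class Node:
    """Hypothetical K node carrying its own lock."""

    def __init__(self, name, case=None, result=None):
        self.name, self.case, self.result = name, case, result
        self.as_case, self.as_result = [], []
        self.lock = threading.Lock()


def add_or_find(current, root, name):
    """Return the child of `current` whose Result node is `root`, creating it if absent."""
    created = None
    with current.lock:                                      # block 702, released as in block 716
        # Decision 706: another thread may have added the node before the lock was taken.
        existing = next((n for n in current.as_case if n.result is root), None)
        if existing is None:
            created = Node(name, case=current, result=root)  # block 710
            current.as_case.append(created)                  # block 714
    if created is None:                                     # decision 718
        return existing
    with root.lock:                                         # blocks 720 to 724
        root.as_result.append(created)
    return created


cat_node, s_root = Node("BOT-C-A-T"), Node("S")
results = []


def worker():
    results.append(add_or_find(cat_node, s_root, "BOT-C-A-T-S"))


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Although eight threads request the same node, only the first to acquire the lock on the current node creates it; the others find it on the asCase list and return the existing node, so no duplicate structure is built.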
While the invention has been described in detail and with reference to specific examples thereof, it will be apparent to one skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope thereof.
This application claims the benefit of U.S. Provisional Application No. 60/625,922 filed Nov. 8, 2004.