The present invention relates to methods of implementing different types of data structures in database systems, and in particular, to a complex data structure for retrieving data from a database.
Data such as letters, numbers, and words are often maintained and stored in different types of data structures that complement their eventual use with an application program. Since a List data structure enables the storing of data objects in an ordered set, this data structure is typically employed with an application program that often retrieves several sequentially related data objects at a time. For example, a multi-media player program usually generates streaming video from sequentially ordered data objects in a List data structure. Although retrieval times for large numbers of sequentially ordered or related data objects can be improved by the use of a List data structure, it may not always be the optimal data structure for retrieving individual and/or several non-sequential or unrelated data objects from a large set of data objects. It is well known that the maximum amount of time to locate an individual data object on a list increases as the number of data objects referenced by the list increases.
Another type of data structure is the Trie, which is often used with an application program that requests individual and/or several non-sequential or unrelated data objects. A Trie stores data in each transition between each node in a multi-level data structure, rather than at the node itself. In this way, all of the transitions in the path between the root and each leaf represent the data object (key) and each transition between each node in the path is associated with a single character/number of the key. Since the nodes themselves are unlabeled, each transition between each node is labeled with a particular character/number. Also, the Trie data structure facilitates the calculation of a constant maximum amount of time to retrieve a stored data object based on the number of levels (nodes associated with the number of alphanumeric characters in the longest data object) in the Trie, e.g., O(1) with respect to the number of stored data objects. Thus, the maximum search time for a given number of levels in a Trie data structure remains relatively constant as the actual number of data objects stored in the data structure increases.
For example, a dictionary program often employs a Trie data structure to store words and provide relatively constant maximum search times as the number of words (but not their length) in the dictionary increases. In a Trie data structure for a dictionary program, every character in the alphabet (A through Z) is partitioned into individual and disjoint main level search nodes. Also, the total number of search node levels in a Trie data structure for a dictionary corresponds to the number of characters in the dictionary's longest possible word.
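The dictionary case can be illustrated with a minimal sketch. The names TrieNode, trie_insert and trie_lookup are hypothetical and are not taken from the specification; the sketch only assumes one character per transition, as described above, and reports how many transitions a lookup followed.

```python
class TrieNode:
    """A node carries no key data; the labels live on its outgoing transitions."""

    def __init__(self):
        self.children = {}  # transition label (one character) -> child node
        self.value = None   # reference to the stored data object, if a key ends here


def trie_insert(root, key, value):
    """Create one transition per character of the key and store the value at the end."""
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.value = value


def trie_lookup(root, key):
    """Follow one transition per character; return (value, transitions followed)."""
    node = root
    steps = 0
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return None, steps  # no such transition: the key is not stored
        steps += 1
    return node.value, steps


root = TrieNode()
for word in ["cat", "car", "cart", "dog"]:
    trie_insert(root, word, word.upper())

value, steps = trie_lookup(root, "car")
```

In this sketch the number of transitions followed is bounded by the length of the key, regardless of how many other words have been inserted.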
Although the List and Trie data structures can complement the operation of particular application programs, there are many other types of application programs that can generate requests for both individual/unrelated and sequential/related data objects. A facility that could employ a particular request generated by an application program to choose a type of data structure most suited to reference and retrieve requested data would be an improvement over the prior art. Also, a data structure that could provide access by one or more unique keys to the same data and enable the automatic removal of all references to deleted data would complement the operation of application programs that generate different types of requests.
A more complete appreciation of the invention and its improvements can be obtained by reference to the accompanying drawings, which are briefly summarized below, to the following detailed description of presently preferred embodiments of the invention, and to the appended claims.
In accordance with the invention, the above and other problems are solved by employing a plurality of data structures to optimize the retrieval of at least one data object over a network. Each data object is stored in a data store and each data object is separately referenced in each of the plurality of data structures. In response to a request for one data object, one of the plurality of data structures is automatically determined to be best suited to retrieve the one data object. The determined data structure is employed to locate and retrieve the one data object from the data store. Also, in response to a request for a plurality of related data objects, another one of the plurality of data structures is automatically determined to be best suited to retrieve the plurality of related data objects. The other one of the plurality of data structures is employed to locate and retrieve the plurality of related data objects from the data store. Additionally, in response to a request to delete at least one data object, each reference to each deleted data object in the plurality of data structures is automatically deleted.
In accordance with other aspects of the invention, a parent object is associated with each data object. Each parent object identifies each reference for the associated data object in the plurality of data structures. When a data object is deleted, the parent object associated with the deleted data object is employed to identify each reference for the deleted data object in the plurality of data structures and each reference is deleted.
In accordance with still other aspects of the invention, the data object can be a collector object that is associated with a member object that identifies one or more other data objects that are related to the collector object. The member object is employed to reference and retrieve each data object related to the collector object when the collector object is retrieved.
In accordance with yet other aspects of the invention, the plurality of related data objects have at least one related characteristic, including port, IP address and type. Also, the plurality of data structures may include List, Hash and Trie.
In accordance with other aspects of the invention, one of the plurality of data structures can be a Trie data structure that identifies a key in the request for a data object. The Trie data structure divides the key into segments and each segment is employed to search the Trie data structure and locate a reference to the requested data object. The key can represent an IP address and/or a port. Also, each segment can be represented by at least one bit.
In accordance with still other aspects of the invention, the data store is a database. The database can be relational, object-oriented, or some combination of the two. Also, the data store can be a data warehouse.
The invention may be implemented as a computer process, a computing system or as an article of manufacture such as a computer program product or computer readable medium. The computer program product may be a computer storage medium readable by a computer system and encoding a computer program of instructions for executing a computer process. The computer program product may also be a propagated signal on a carrier readable by a computing system and encoding a computer program of instructions for executing a computer process.
These and other features as well as advantages, which characterize the invention, will be apparent from a reading of the following detailed description and a review of the associated drawings.
An embodiment of the invention provides for employing a complex data structure to optimize the retrieval of data from a data store over a network. The complex data structure may include multiple data structures, e.g., List and Trie, which in parallel separately reference the same data objects in a data store. For a particular functional request to retrieve data, the complex data structure examines the request to determine whether it is associated with the List or the Trie data structure. In most cases, the Trie data structure is associated with a functional request for a single data object and the List data structure is associated with a functional request to retrieve several related data objects.
Each data object is associated with a parent object that includes a list of every reference to the data object in both the Trie and List data structures. When a data object is subsequently deleted, the complex data structure employs the parent object list to automatically delete every reference to the deleted data object in both the Trie and List data structures. Additionally, the complex data structure provides for a particular type of data object, i.e., a collector object, that is associated with a member object and which includes a list of other related data/collector objects that are referenced in a sub-tree below the node level of the collector object in the Trie data structure. Also, when the data associated with a collector object is requested, other data associated with the other data/collector objects on the member object list are automatically retrieved.
Another embodiment of the invention may employ a complex data structure that includes a List data structure and Hash data structure, which can be used with the List data structure to process functional requests for single objects. Generally, the Hash data structure requires less overhead, e.g., memory and processor cycles, than a Trie data structure to process a request for data. However, the Hash data structure would not support collector objects as discussed in greater detail below.
The logical operations of the various embodiments of the invention are implemented (1) as a sequence of computer implemented actions or program modules running on a computing device; and/or (2) as interconnected hardware or logic modules within the computing device. The implementation is a matter of choice dependent on the performance requirements of the computing system implementing the invention. Accordingly, the logical operations making up the embodiments of the invention described herein are referred to alternatively as operations, actions or modules. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of program modules may be combined or distributed in various embodiments.
One or more application programs 24 are loaded into the memory system 22 and run on the operating system 26. Examples of applications include email programs, scheduling programs, network management programs, PIM (personal information management) programs, word processing programs, spreadsheet programs, Internet browser programs, and so forth. The computer 20 has a power supply (not shown), which can be implemented as one or more batteries or might further include an external power source that overrides or recharges built-in batteries, such as an AC adapter.
The computer 20 may operate with at least some form of computer readable media. Computer readable media can be any available media that can be accessed by the computer readable media device 32. By way of example, and not limitation, computer readable media may comprise computer storage media and communications media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 20. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
Objects in the data store 108 are retrieved/added/deleted through the Functional Interface 102. Retrieval functions are specific to either the Trie data structure 104 or the List data structure 106. For the Trie data structure 104, there is a Trie retrieval function that accepts a top level accessor object and a unique key and returns a container object, e.g., a data object or a collector object, which are discussed in greater detail below. The List data structure 106 supports several retrieval functions, including retrieving mid level list accessor objects, changing the current state of a list, and retrieving at least one of the container objects that is a member of a list. Since the mid level list accessor objects reference an associated top level Trie accessor object, they do not need to be specified by list functions.
A Trie retrieval function is generally employed to find a single container object when at least one of the Trie keys that give access to the object is known. For example, a Trie retrieval occurs when a data packet arrives at an appliance on an interface where a network address translation data store is not empty. In this example, a six byte source IP address and port can be used to attempt to retrieve information about a currently active network address translation connection. When the retrieval fails, the four byte source IP address can be used to attempt to find a network address translation initiator object, which is stored as a collector object in the same complex data structure. If an initiator is found, a connection is added to the data store under the six byte source IP address and port retrieval path. At the same time, a connection object will be automatically added to the initiator object's member list because the initiator object is a collector object with a retrieval path that is the prefix of the connection object's retrieval path.
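The packet example above can be sketched as follows. This is an illustrative sketch only: the names Node, Initiator, Connection, ip_key, conn_key, trie_get, trie_add and on_packet are hypothetical, and the sketch assumes one byte per key segment for brevity rather than any segment width used by an actual embodiment.

```python
import ipaddress
import struct


class Node:
    def __init__(self):
        self.children = {}  # key segment (one byte, by assumption) -> child node
        self.obj = None     # referenced container object, if any


class Initiator:
    """Collector object reached by the four byte source IP address."""

    def __init__(self, ip):
        self.ip = ip
        self.members = []   # stands in for the member list


class Connection:
    def __init__(self, ip, port):
        self.ip, self.port = ip, port


def ip_key(ip):
    return ipaddress.IPv4Address(ip).packed       # four byte key


def conn_key(ip, port):
    return ip_key(ip) + struct.pack("!H", port)   # six byte key: IP address, then port


def trie_get(root, key):
    node = root
    for segment in key:
        node = node.children.get(segment)
        if node is None:
            return None
    return node.obj


def trie_add(root, key, obj):
    node = root
    for segment in key:
        # A collector met on the way down has a path that prefixes the new key.
        if isinstance(node.obj, Initiator):
            node.obj.members.append(obj)
        node = node.children.setdefault(segment, Node())
    node.obj = obj


def on_packet(root, src_ip, src_port):
    """Try the six byte key first, then fall back to the four byte initiator key."""
    conn = trie_get(root, conn_key(src_ip, src_port))
    if conn is not None:
        return conn                     # currently active connection
    if trie_get(root, ip_key(src_ip)) is None:
        return None                     # no initiator: nothing is added
    conn = Connection(src_ip, src_port)
    trie_add(root, conn_key(src_ip, src_port), conn)
    return conn


root = Node()
initiator = Initiator("10.0.0.1")
trie_add(root, ip_key("10.0.0.1"), initiator)
first = on_packet(root, "10.0.0.1", 8080)    # miss on six bytes, hit on four bytes
again = on_packet(root, "10.0.0.1", 8080)    # now a direct six byte hit
unknown = on_packet(root, "10.0.0.2", 8080)  # no initiator for this address
```

After these calls, the initiator's member list holds the single connection created by the first packet, so a later request for all connections of that initiator can step through the member list directly.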
The list retrieval functions are used whenever the appliance needs to sequentially access a group of container objects. For example, a connection in a network address translation data store can be deleted when it has not been used for a length of time. In this case, an appliance could periodically step through part of the list of all connection objects in the data store to determine whether some connections should be deleted.
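One way such a periodic, partial pass over the list might look is sketched below. The Sweeper class, its batch_size parameter and the last_used timestamp are assumptions made for illustration; in the described system a removal would go through the delete function so that every other reference to the connection is also removed.

```python
class Connection:
    def __init__(self, name, last_used):
        self.name = name
        self.last_used = last_used  # time of last use, in seconds (assumed field)


class Sweeper:
    """Hypothetical cursor that remembers where the previous partial pass stopped."""

    def __init__(self, connections, max_idle):
        self.connections = connections
        self.max_idle = max_idle
        self.position = 0

    def sweep(self, now, batch_size):
        """Examine up to batch_size connections; remove and return the idle ones."""
        expired = []
        for _ in range(min(batch_size, len(self.connections))):
            if self.position >= len(self.connections):
                self.position = 0  # wrap around to the head of the list
            conn = self.connections[self.position]
            if now - conn.last_used > self.max_idle:
                # The next element shifts into this slot, so the cursor stays put.
                expired.append(self.connections.pop(self.position))
            else:
                self.position += 1
        return expired


connections = [
    Connection("a", 0),
    Connection("b", 90),
    Connection("c", 10),
    Connection("d", 95),
]
sweeper = Sweeper(connections, max_idle=60)
first_pass = sweeper.sweep(now=100, batch_size=2)   # examines a and b
second_pass = sweeper.sweep(now=100, batch_size=2)  # resumes at c, then d
```

Each call examines only part of the list, so the cost of aging out connections is spread over several passes.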
In another example, both Trie and List functions may be used when a request is received for information about all of the network address translation connection objects that were created using an initiator object identified by a four byte IP address. In this case, the initiator object is retrieved using the Trie retrieval function and a four byte key. Then, using the member list of the initiator object, the connection objects collected by the initiator are sequentially retrieved.
Additionally, each collector object automatically adds all of the objects (data and/or collector) to its member object from the sub-tree (node levels) in the Trie data structure below the reference to the collector object. Thus, when a collector object is referenced by a Trie leaf node, the collector object will have an empty list in its member object.
In one embodiment of the invention, an add function automatically adds an object to the list for the member object of a collector object when the added object is added beneath the existing level of the collector object in a sub-tree of the Trie data structure. Additionally, when another add function is employed to add a new collector object to the Trie data structure at a level above the levels of previously existing objects in a sub-tree of the Trie data structure, these objects are automatically added to the list in the member object associated with the new collector object.
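The two add cases described above can be sketched as follows. The class and function names are hypothetical, the member object is reduced to a plain list, and the sketch does not handle replacing an object that is already stored under the same key.

```python
class Node:
    def __init__(self):
        self.children = {}  # key segment -> child node
        self.obj = None     # referenced data or collector object


class DataObject:
    def __init__(self, name):
        self.name = name


class CollectorObject(DataObject):
    def __init__(self, name):
        super().__init__(name)
        self.members = []   # stands in for the list held by the member object


def subtree_objects(node):
    """Yield every object referenced strictly below the given node."""
    for child in node.children.values():
        if child.obj is not None:
            yield child.obj
        yield from subtree_objects(child)


def add(root, key, obj):
    node = root
    collectors_above = []
    for segment in key:
        if isinstance(node.obj, CollectorObject):
            collectors_above.append(node.obj)
        node = node.children.setdefault(segment, Node())
    node.obj = obj
    # Case 1: the new object sits in the sub-tree of existing collectors.
    for collector in collectors_above:
        collector.members.append(obj)
    # Case 2: a new collector is added above previously existing objects.
    if isinstance(obj, CollectorObject):
        obj.members.extend(subtree_objects(node))


root = Node()
conn_a = DataObject("conn_a")
add(root, (10, 0, 0, 1, 31, 144), conn_a)      # added before any collector exists
initiator = CollectorObject("initiator")
add(root, (10, 0, 0, 1), initiator)            # case 2: picks up conn_a
conn_b = DataObject("conn_b")
add(root, (10, 0, 0, 1, 0, 80), conn_b)        # case 1: added beneath the collector
other = DataObject("other")
add(root, (10, 0, 0, 2, 0, 80), other)         # different prefix: not collected
leaf_collector = CollectorObject("leaf")
add(root, (10, 0, 0, 9), leaf_collector)       # referenced by a leaf: empty list
```

Here the initiator ends up listing conn_a and conn_b, while the collector referenced by a leaf node keeps an empty member list.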
Returning to
In greater detail,
Additionally, a collector object 142 is referenced by the transition values associated with a path through the first four levels of the Trie data structure and which represent the first four values in a 32 bit IP address. In this case, the collector object 142 has automatically included a member object (not shown) that lists a reference to each data object (139 and 140) that is disposed in the sub-tree (node levels) below the reference to the collector object 142. Since the first four key segments of the collector object 142 and the data objects 139 and 140 are the same, these data objects were automatically included in the member object for the collector object 142.
Alternatively, when the determination at the decision block 150 is false, the flow moves to a block 152 where key segments are created for a Trie data structure based on the port and IP address associated with the requested data. In one embodiment of the invention, each key segment is four bits. In this example, four key segments are created for a 16 bit port and eight key segments are generated for a 32 bit IP address.
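The segment arithmetic can be checked with a short sketch. The helper names key_segments and make_key are hypothetical, and the ordering (IP address segments first, then port segments, most significant segment first) is an assumption chosen so that an IP address key is a prefix of the combined IP address and port key.

```python
import ipaddress

SEGMENT_BITS = 4  # segment width of the embodiment described above


def key_segments(value, total_bits, segment_bits=SEGMENT_BITS):
    """Split an unsigned integer into fixed-width segments, most significant first."""
    if total_bits % segment_bits:
        raise ValueError("total_bits must be a multiple of segment_bits")
    mask = (1 << segment_bits) - 1
    return [
        (value >> shift) & mask
        for shift in range(total_bits - segment_bits, -1, -segment_bits)
    ]


def make_key(ip, port):
    """Hypothetical helper: eight IP address segments followed by four port segments."""
    ip_value = int(ipaddress.IPv4Address(ip))
    return key_segments(ip_value, 32) + key_segments(port, 16)


ip_segments = key_segments(int(ipaddress.IPv4Address("192.168.1.10")), 32)
port_segments = key_segments(8080, 16)
full_key = make_key("192.168.1.10", 8080)
```

With four bit segments, each segment takes one of sixteen values, so a Trie node in such a layout needs at most sixteen outgoing transitions.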
The operational flow moves to a block 154 where each key segment is employed to sequentially transition along a path through all of the node levels in the Trie data structure. Advancing to a decision block 156, a determination is made whether a null pointer value was found at any level in the path through the nodes of the Trie data structure. If true, the operational flow moves to an End block and resumes calling program modules.
Alternatively, when the determination at the decision block 156 is false, the operational flow advances to a block 158 where the complete path through the Trie data structure is employed to reference a data object and retrieve the requested data. The operational flow moves to a block 160 where, if the referenced data object is determined to also be a collector object, a member list included with the collector object is employed to retrieve related data associated with other data objects on the member list. The operational flow transitions to the end block and resumes calling program modules.
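The retrieval flow of blocks 154 through 160 might be sketched as shown here. The sixteen-slot child array follows from the four bit segments of the embodiment above; the class names, the use of a members attribute to mark a collector object, and the extra check for a path that exists but references no object are assumptions of this sketch.

```python
class Node:
    def __init__(self):
        self.children = [None] * 16  # one pointer per four bit segment value
        self.obj = None              # reference to a data or collector object


class DataObject:
    def __init__(self, data, members=None):
        self.data = data
        self.members = members       # a list marks a collector object (assumption)


def insert(root, segments, obj):
    node = root
    for segment in segments:
        if node.children[segment] is None:
            node.children[segment] = Node()
        node = node.children[segment]
    node.obj = obj


def retrieve(root, segments):
    node = root
    for segment in segments:             # block 154: follow each key segment
        node = node.children[segment]
        if node is None:                 # decision block 156: null pointer found
            return None
    obj = node.obj                       # block 158: path references the object
    if obj is None:
        return None                      # path exists but nothing is stored here
    results = [obj.data]
    if obj.members is not None:          # block 160: collector object
        results.extend(member.data for member in obj.members)
    return results


root = Node()
connection = DataObject("connection")
initiator = DataObject("initiator", members=[connection])
insert(root, [1, 2, 3, 4], initiator)
insert(root, [1, 2, 3, 4, 5, 6], connection)

found = retrieve(root, [1, 2, 3, 4])
missing = retrieve(root, [1, 2, 9])
```

A lookup therefore follows at most one pointer per key segment and stops early as soon as a null pointer is met.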
Alternatively, when the object to be deleted is determined to not be a collector object at the decision block 170, the operational flow moves directly to the block 172 where substantially the same actions discussed above are performed.
Next, the operational flow advances to a decision block 174 where a determination is made whether the parent object is a Trie node. If so, the operational flow moves to a block 184 where every Trie node that references this parent object is deleted. The operational flow transitions to a decision block 178 where a determination is made whether more parent objects are included on the list associated with the data object to be deleted. If affirmative, the operational flow loops back to the block 172 and performs substantially the same actions as discussed above. However, when the determination at the decision block 178 is negative, the operational flow moves to a block 180 where the data object to be deleted is deallocated. The operational flow transitions to an end block and resumes calling other program modules.
Alternatively, if the determination at the decision block 174 is negative, i.e., the parent object is not a Trie node, then the operational flow advances to a block 176 where the reference in the List data structure to the parent object is deleted. The operational flow moves to the decision block 178 where substantially the same actions discussed above are repeated.
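The deletion loop can be sketched under simplifying assumptions. In this sketch each data object keeps a parents list holding every Trie node and every list that references it, a Trie reference is removed by clearing the node's pointer rather than pruning nodes, and deallocation is represented by clearing the object's data. All names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.obj = None


class DataObject:
    def __init__(self, data):
        self.data = data
        self.parents = []  # every Trie node or list that references this object


def trie_add(root, key, obj):
    node = root
    for segment in key:
        node = node.children.setdefault(segment, TrieNode())
    node.obj = obj
    obj.parents.append(node)   # record the reference so delete can find it


def trie_get(root, key):
    node = root
    for segment in key:
        node = node.children.get(segment)
        if node is None:
            return None
    return node.obj


def list_add(lst, obj):
    lst.append(obj)
    obj.parents.append(lst)


def delete(obj):
    while obj.parents:                    # decision block 178: more parents?
        parent = obj.parents.pop()        # take the next parent from the list
        if isinstance(parent, TrieNode):  # decision block 174
            parent.obj = None             # block 184: drop the Trie reference
        else:
            parent.remove(obj)            # block 176: drop the List reference
    obj.data = None                       # block 180: stand-in for deallocation


root = TrieNode()
all_connections = []
conn = DataObject("connection")
trie_add(root, (10, 0, 0, 1, 31, 144), conn)  # first key
trie_add(root, (7, 7), conn)                  # second key to the same object
list_add(all_connections, conn)

delete(conn)
```

Because the object itself records where it is referenced, a single delete call clears both keys and the list entry without searching any of the structures.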
One embodiment of the invention enables each data object in the data store to be accessed by one or more keys in constant time when the Trie data structure is employed to resolve a request for a data object. Each data object in the data store may be accessed by zero or more sequential lists. For one embodiment of the invention, an iterator object is employed to support multiple concurrent list states in the List data structure.
It is envisioned that the data store can be a relational database, an object-oriented database, or a combination of the two. Also, the data store could be a data warehouse.
The above specification, examples and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 60/140,272, filed Jun. 18, 1999.
Number | Date | Country |
---|---|---|
60140272 | Jun 1999 | US |