Distributed database management system with dynamically split B-Tree indexes

Information

  • Patent Grant
  • Patent Number
    12,050,578
  • Date Filed
    Friday, October 15, 2021
  • Date Issued
    Tuesday, July 30, 2024
  • CPC
    • G06F16/2246
    • G06F16/2379
  • Field of Search
    • CPC
    • G06F16/2246
    • G06F16/2379
  • International Classifications
    • G06F16/22
    • G06F16/23
  • Disclaimer
    This patent is subject to a terminal disclaimer.
  • Term Extension
    168 days
Abstract
A distributed data processing system for a database composed of data records organized into tables for responding to a request to split an index in a consistent and concurrent fashion. A request to split an index atom at any given node is processed by a designated “chairman” for that index atom. The chairman splits its index by maintaining a lower portion thereof and by moving the contents of an upper portion to a newly created index atom as the right sibling. Each other node takes appropriate steps to assure orderly transfer of this information from the chairman consistently and concurrently across the distributed database processing system.
Description
BACKGROUND
Field

This invention generally relates to database management systems and more specifically to a methodology for splitting indexes in a distributed database management system.


Description of Related Art

Databases typically incorporate indexes for enabling the efficient retrieval of certain information. A B-Tree data structure is a popular indexing structure that is optimized for use in databases that read and write large blocks of data and that enables efficient database searching. A B-Tree data structure includes a root and a plurality of leaves. The root uses a key value to identify a corresponding leaf. Each leaf points to any records that contain the key value. The key values are sorted in order, typically across a plurality of leaves, thereby to form a sorted list. Specifically, a given leaf includes a “left sibling” and a “right sibling” that identify a leaf to the left of and a leaf to the right of the given leaf, thereby to maintain a list in sorted order. The first or left-most leaf and the last or right-most leaf include entries denoting the ends of the list of leaves for that root.
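
By way of illustration only, a minimal sketch of such a leaf, with sorted key values and sibling references, might look like the following; the class and field names are hypothetical and are not drawn from the patent.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Leaf:
    """Hypothetical B-Tree leaf: sorted key values, the records that contain
    each key, and links to the left and right sibling leaves."""
    keys: List[int] = field(default_factory=list)
    records: Dict[int, List[int]] = field(default_factory=dict)  # key -> ids of records containing the key
    left_sibling: Optional["Leaf"] = None    # None marks the first (left-most) leaf
    right_sibling: Optional["Leaf"] = None   # None marks the last (right-most) leaf

    def add(self, key: int, record_id: int) -> None:
        """Record that 'record_id' contains 'key', keeping the key list sorted."""
        if key not in self.records:
            bisect.insort(self.keys, key)
            self.records[key] = []
        self.records[key].append(record_id)
```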


Typically each leaf has a fixed memory size. When a size threshold is reached, it becomes necessary to define a key value and to “split” that leaf into “left” and “right” leaves. The “left” leaf receives values that are less than the defined key value and the “right” leaf receives the remaining values with appropriate modifications to the root. In centrally based and non-shared databases, the splitting process is efficient because generally there is only one copy of the index in the database system. The split is easy to effect by quiescing the data processing system during the actual splitting operation.
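
Continuing the sketch above, a leaf split at a defined key value could be expressed as follows; the function and names are assumptions, and the accompanying modifications to the root mentioned in the text are omitted.

```python
def split_leaf(leaf: Leaf, split_key: int) -> Leaf:
    """Split 'leaf' (reusing the Leaf sketch above) at split_key: keys below the
    split key stay in the left leaf, the remaining keys move to a new right leaf."""
    right = Leaf()
    right.keys = [k for k in leaf.keys if k >= split_key]
    right.records = {k: leaf.records.pop(k) for k in right.keys}
    leaf.keys = [k for k in leaf.keys if k < split_key]

    # Re-link the sibling chain so the leaves remain a sorted list.
    right.right_sibling = leaf.right_sibling
    if right.right_sibling is not None:
        right.right_sibling.left_sibling = right
    right.left_sibling = leaf
    leaf.right_sibling = right
    return right
```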


Recently there has been interest in the development of distributed databases. The above-identified U.S. Pat. No. 8,224,860 discloses an implementation of a distributed database wherein user access is provided through a network of transactional nodes and complete copies of the information in the database are only stored on archival nodes that act as storage managers and are not directly accessed by users. More specifically, a user connects to a transactional node to perform operations on the database by generating high-level queries that the transactional node processes. In this implementation a given transactional node need only contain that data and metadata required to process queries from users connected to that node. The data and metadata are defined by an array of atom classes, such as an index class, and atoms, where each atom corresponds to a different instance of the class, such as an index atom for a specific index. Replications or copies of an atom may reside in multiple nodes, and the atom copy at a given node is processed in that node at least in part independently of the copies at other nodes. When an atom is created at a node, that copy is designated as a “chairman.” The designation of chairman can be transferred to a replication of that atom in another node.


In the implementation of U.S. Pat. No. 8,224,860 asynchronous messages transfer atoms and information about atoms among the different nodes to maintain the database in a consistent and a concurrent state. Specifically, each node in the database network has a unique communications path to every other node. When one node generates a message involving a specific atom, it can communicate as necessary with those other nodes that contain replications of that specific atom. Each node generates these messages independently of other nodes. So, it is possible that, at any given instant, multiple nodes will contain copies of a given atom and different nodes may be at various stages of processing them. As these operations in different nodes normally are not synchronized, it is necessary to operate the database so each user is assured that interaction is with a concurrent and consistent database.


Splitting an index in a distributed database such as that disclosed in the above-identified U.S. Pat. No. 8,224,860 involves splitting replications of an index atom that performs as a leaf at the transactional node requesting the split, at each archival node, and at any other transactional node that has a copy of that index atom. It is possible for multiple nodes to request a split of a given index, whereupon a race condition can arise and produce an erroneous outcome. Prior methods, such as those involving quiescence, are not readily applicable to implementations of a distributed database of the type discussed above without introducing unacceptable system performance degradation. What is needed is a method for handling requests for splitting an index in a distributed database wherein copies of the index are located in multiple locations.


SUMMARY

Therefore it is an object of this invention to provide an implementation of a distributed database that processes requests to split an index in a consistent and concurrent fashion.


Another object of this invention is to provide an implementation of a distributed database that processes requests to split an index in a consistent and concurrent fashion without any significant performance degradation.


Yet another object of this invention is to provide an implementation of a distributed database that processes a requested split of an index and eliminates the involvement of nodes that do not include that specific index.


In accordance with one aspect of this invention, a distributed database processing system includes a plurality of nodes, each of which includes means for establishing communications with every other node, wherein the database has an atom class for each category of metadata and data, including an index atom class that provides an index atom for each index in the database, and each index atom includes a range of key values. An index atom can be replicated to other nodes. An index atom is split when a node detects a need to split the index atom based upon a split key value that defines lower and upper portions of the index. One node identifies a location in its index atom based upon the split key value, thereby defining the approximate lower and upper portions of the keys for the index atom, creates a second index atom as a right sibling to the first index atom, transfers the key values in the upper portion of the first index atom to the lower portion of the second index atom, and transmits to all other nodes with the identified index atom an index split message including the split key value. Each other node responds to the receipt of the index split message by deleting the entries corresponding to the key values in the upper portion of the first index atom being split, retrieving the populated second index atom copy, as the right sibling, from the one node, and sending a split done message to the one node as chairman, whereupon the one node broadcasts an index split done message when all other nodes have generated the split done message.


In accordance with another aspect of this invention, a distributed database management system includes a plurality of transactional and archival nodes wherein each transactional node responds to queries by interacting with a portion of the database thereat and wherein an archival node stores a version of the entire database. Communications are established between each node and every other node. The system has an atom class for each category of metadata and data, including an index atom class that provides a B-Tree index atom for each index in the database. Each index atom can be replicated on demand to the archival node and at least one transactional node. One copy of each index atom in one node is designated as the chairman for that index atom. The process of splitting an index atom includes detecting at one of the transactional nodes a need to split a B-Tree index at that node and, if that node does not contain the chairman, transmitting a split index request message to the chairman node including the identification of the requesting node and a split key value. The chairman responds to its own internal request for splitting the index or to the receipt of the split index request message from another node by defining the contents of the lower and upper portions of the index in response to the split key value, creating a second index atom as a right sibling of the index atom being split, moving the upper portion of the index for that index atom to the second index atom, and transmitting, to all nodes that contain that index atom, an index split message including the split key value that defines the split between the lower and upper portions. Each non-chairman node responds by deleting from its existing index atom, in response to the split key value, the contents in the upper portion of the index atom being split and retrieving from the chairman the populated second index atom copy as the right sibling for the index atom being split. Each node then transmits a split done message to the chairman. The chairman sends an index split done message when all nodes involved in the index split operation have reported completion to the chairman.
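
By way of illustration only, the following in-memory sketch walks through the protocol just summarized; the function name, node identifiers and dictionary-based atom copies are hypothetical stand-ins, and the message exchange is simulated by ordinary function calls rather than by the asynchronous messaging the patent describes.

```python
from typing import Dict, Tuple

def split_index_atom(copies: Dict[str, Dict[int, str]], chairman: str,
                     split_key: int) -> Tuple[Dict[str, Dict[int, str]], Dict[str, Dict[int, str]]]:
    """Walk through the summarized protocol on plain dictionaries.
    'copies' maps a node id to that node's copy of the index atom
    (key value -> record id). The chairman keeps the lower portion and
    builds the right sibling; every other node truncates its copy, fetches
    the populated right sibling, and reports back a split done message."""
    chairman_copy = copies[chairman]
    right_sibling = {k: v for k, v in chairman_copy.items() if k >= split_key}
    for k in right_sibling:
        del chairman_copy[k]                      # chairman keeps only the lower portion

    right_copies = {chairman: right_sibling}
    split_done = set()
    for node, copy in copies.items():             # "index split message" with the split key
        if node == chairman:
            continue
        for k in [k for k in copy if k >= split_key]:
            del copy[k]                           # delete the upper portion locally
        right_copies[node] = dict(right_sibling)  # retrieve the right sibling from the chairman
        split_done.add(node)                      # "split done message" back to the chairman

    # Chairman broadcasts "index split done" once every other node has reported.
    assert split_done == set(copies) - {chairman}
    return copies, right_copies
```

For example, calling split_index_atom with copies {1: "r1", 7: "r2"} held at nodes "N1" and "A1", chairman "N1", and split key 5 leaves key 1 in each original copy and places key 7 in each node's copy of the new right-sibling atom.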





BRIEF DESCRIPTION OF THE DRAWINGS

The appended claims particularly point out and distinctly claim the subject matter of this invention. The various objects, advantages and novel features of this invention will be more fully apparent from a reading of the following detailed description in conjunction with the accompanying drawings in which like reference numerals refer to like parts, and in which:



FIG. 1 is a diagram in schematic form of one implementation of an elastic, scalable, on-demand, distributed database to which this invention applies;



FIG. 2 depicts the organization of a transactional node;



FIGS. 3A and 3B depict a local organization of “atom” objects generated by atom classes shown in FIG. 2 that might be present at any given time in any node;



FIG. 4 depicts the information of an index atom that can be split in accordance with this invention;



FIG. 5 depicts the syntax of an exemplary asynchronous message that transfers between transactional and archival nodes of FIG. 1;



FIG. 6 depicts messages that are useful in one implementation of this invention;



FIG. 7 is a flow diagram that is useful in understanding the response to a request for splitting an index atom in accordance with this invention;



FIG. 8, comprising FIGS. 8A through 8D, depicts a sequence of changes to an index being split in both a chairman and non-chairman node;



FIG. 9 depicts an index split process executed by the chairman;



FIG. 10 depicts an index split process of each non-chairman with a copy of the index atom being split; and



FIG. 11 is a flow diagram useful in understanding the processing of messages stored during a split operation.





DETAILED DESCRIPTION


FIG. 1 depicts an implementation of an elastic, scalable, on-demand, distributed database system 30 that operates over a plurality of nodes. Nodes N1 through N6 are “transactional nodes” that provide user access to the database; nodes A1 and A2 are “archival nodes” that act as storage managers and function to maintain a disk archive of the entire database at each archival node. While an archival node normally stores the entire database, a single transactional node contains only that portion of the database it determines to be necessary to support transactions being performed at that node at that time.


Each node in FIG. 1 can communicate directly with each other node in the system 30 through a database system network 31. For example, node N1 can establish a communications path with each of nodes N2 through N6, A1 and A2. Communications between any two nodes are by way of serialized messages. In one embodiment, the messaging is performed in an asynchronous manner to maximize the bandwidth used by the system, thereby to perform various operations in a timely and prompt manner. Typically, the database system network 31 will operate with a combination of high-bandwidth, low-latency paths (e.g., an Ethernet network) and high-bandwidth, high-latency paths (e.g., a WAN).


Each node has the capability to restrict use of a low-latency path to time-critical communications (e.g., fetching an atom). The high-latency path can be used for non-critical communications (e.g., a request to update information for a table). Also, and preferably, the data processing network of this invention incorporates a messaging protocol, such as the Transmission Control Protocol (TCP), and assures that each node processes messages in the same sequence in which they were sent to it by other nodes.



FIG. 2 depicts a representative transactional node 32 that links to the database system network 31 and various end users 33. The transactional node 32 includes a central processing system (CP) 34 that communicates with the database system network 31 through a network interface 35 and with the various users through a user network interface 37. The central processing system 34 also interacts with RAM memory 38 that contains a copy of the database management program that implements this invention. This program functions to provide a remote interface 40, a database request engine 41 and a set 42 of classes or objects. The database request engine 41 only exists on transactional nodes and is the interface between high-level input and output commands at the user level and input and output commands at the system level. In general terms, the database request engine parses, compiles and optimizes user queries, such as SQL queries, into commands that are interpreted by the various classes or objects in the set 42.


In this system, the classes/objects set 42 is divided into a subset 43 of “atom classes,” a subset 44 of “message classes” and a subset 45 of “helper classes.” At any given time, a transactional node only contains those portions of the database that are then relevant to active user applications. Moreover, all portions of the database in use at a given time at any transactional node are resident in random access memory 38. There is no need to provide supplementary storage, such as disk storage, at a transactional node during the operation of this system.


Referring to FIG. 3A, a Master Catalog atom 70 tracks the status of transactional and archival nodes in database system 30 of FIG. 1. It also can be considered as an active index that creates and monitors the Transaction Manager atom 71, the Database atom 72, each Schema atom 73, each corresponding set of Table atoms 74 and Table Catalog atoms 75, and Sequence ID Managers 82. The Table Catalog atom 75 acts as an active index that creates and monitors Index atoms 76, Record States atoms 77, Data atoms 78, Blob States atoms 80 and Blob atoms 81 associated with a single table. There is one Table Catalog atom 75 for each table.



FIG. 3B is useful in understanding the interaction and management of different atom types. In this context, neither the Master Catalog atom 70 nor the Table Catalog atom 75 performs any management functions. With respect to the remaining atoms, the Database atom 72 manages each Schema atom 73. Each Schema atom 73 manages each related Table atom 74 and Sequence ID Manager atom 82. Each Table atom 74 manages its corresponding Table Catalog atom 75, Index atoms 76, Record States atoms 77, Data atoms 78, Blob States atom 80 and Blob atoms 81. Still referring to FIG. 3B, the database request engine 41 communicates with the Master Catalog atom 70, Transaction Manager atom 71, the Database atom 72, each Schema atom 73, each Table atom 74 and the Sequence ID Managers 82. The database request engine 41 acts as a compiler for a high-level language such as SQL. As a compiler, it parses, compiles, and optimizes queries and obtains metadata and data from atoms for the formation of the various fragments of database information.


Each atom has certain common elements and other elements that are specific to its type. For purposes of describing this invention, FIG. 4 depicts an index atom 76 that is implemented as a B-Tree index and is split according to this invention. Element 76A is a unique identification for the index atom 76. Pointers 76B and 76C identify a master catalog atom and the creating catalog atom, respectively. Each atom must have a chairman that performs functions as described later. Element 76D points to the node where the chairman for that atom resides.


Each time a copy of an atom is changed in any transactional node, it receives a new change number. Element 76E records that change number. Whenever a node requests an atom from another node, there is an interval during which the requesting node will not be known to other transactional nodes. Element 76F is a list of all the nodes to which the supplying node must relay messages that contain the atom until the request is completed.


Operations of the database system are also divided into cycles. A cycle reference element 76G provides the cycle number of the last access to the atom. Element 76H is a list of the active nodes that contain the atom. Element 76I includes several status indicators. Element 76J contains a binary tree of index nodes to provide a conventional indexing function. Element 76K contains an index level. Such index structures and operations are known to those skilled in the art.
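
A hypothetical rendering of the elements just enumerated as a data structure follows; the field names are assumptions chosen only to mirror elements 76A through 76K.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class IndexAtom:
    """Hypothetical mirror of the index atom elements of FIG. 4."""
    atom_id: int                       # 76A unique identification of the index atom
    master_catalog_ref: int            # 76B pointer to the master catalog atom
    creating_catalog_ref: int          # 76C pointer to the creating catalog atom
    chairman_node: int                 # 76D node where the chairman for this atom resides
    change_number: int = 0             # 76E incremented each time a copy of the atom changes
    relay_nodes: List[int] = field(default_factory=list)   # 76F nodes awaiting relayed messages
    last_access_cycle: int = 0         # 76G cycle number of the last access
    active_nodes: List[int] = field(default_factory=list)  # 76H active nodes that contain the atom
    status: Dict[str, bool] = field(default_factory=dict)  # 76I status indicators
    index_tree: Optional[object] = None                    # 76J binary tree of index nodes
    index_level: int = 0               # 76K index level
```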


As previously indicated, communications between any two nodes are by way of serialized messages which are transmitted asynchronously using TCP or another protocol with controls to maintain messaging sequences. FIG. 5 depicts the basic syntax of a typical message 90 that includes a variable-length header 91 and a variable-length body 92. The header 91 includes a message identifier code 93 that specifies the message and its function. As this invention envisions a scenario under which different nodes may operate with different software versions, the header 91 also includes identification 94 of the software version that created the message. The remaining elements in the header include a local identification 95 of the sender, information 96 for the destination of the message, and an atom identification 97. From this information, a recipient node can de-serialize, decode, and process the message.
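
As an illustration of the header/body layout of FIG. 5, the sketch below serializes and de-serializes such a message; the JSON encoding and field names are assumptions, not the wire format used by the system.

```python
import json

def encode_message(message_id: int, software_version: str, sender_id: int,
                   destination: int, atom_id: int, body: dict) -> bytes:
    """Hypothetical serialization of a message with the header fields of FIG. 5."""
    header = {
        "message_id": message_id,              # identifier code 93: the message and its function
        "software_version": software_version,  # identification 94 of the creating software version
        "sender": sender_id,                    # local identification 95 of the sender
        "destination": destination,             # destination information 96
        "atom_id": atom_id,                      # atom identification 97
    }
    return json.dumps({"header": header, "body": body}).encode("utf-8")

def decode_message(raw: bytes) -> dict:
    """De-serialize so a recipient node can decode and process the message."""
    return json.loads(raw.decode("utf-8"))
```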



FIG. 6 depicts a set of messages that are helpful in implementing this invention. An Index Split Request message 147 is sent to the chairman by another (non-chairman) node to institute an index split operation. An Index Split message 148 contains an index atom identifier and the key on which to split the index atom (i.e., a “split key”). An Index Split Start message 160 indicates that the chairman has begun processing the split and is broadcast to all nodes with a copy of the index being split. An Index Split Done message 161 indicates that a node has completed the split operation.
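
For reference, the split-related message identifiers described above might be collected as follows; the enumeration is a hypothetical convenience, with only the names and reference numbers taken from FIG. 6.

```python
from enum import IntEnum

class SplitMessage(IntEnum):
    """Hypothetical identifiers for the split-related messages of FIG. 6."""
    INDEX_SPLIT_REQUEST = 147   # non-chairman asks the chairman to split an index atom
    INDEX_SPLIT = 148           # carries the index atom identifier and the split key
    INDEX_SPLIT_START = 160     # chairman has begun processing the split
    INDEX_SPLIT_DONE = 161      # a node has completed the split operation
```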



FIG. 7 broadly discloses a process 200 for implementing an index splitting function in accordance with this invention. This disclosure can be better understood by referring to FIG. 8, in which FIGS. 8A through 8D depict the state of the index at various stages of the process as initiated by a non-chairman node. Specifically, FIG. 8A depicts an un-split index 201C where “C” designates an index atom located at the chairman's node. Other copies of the index atom are designated as 201N to indicate that they are located in a node that does not contain the chairman. FIG. 8A also discloses the index 201C at the chairman node as comprising a lower portion 202C and an upper portion 203C. Similarly, the index 201N in a non-chairman node comprises a lower portion 202N and an upper portion 203N.


Whether the chairman's copy or a non-chairman's copy of the index atom needs to be split, only the chairman controls the splitting operation. If the chairman determines that the atom 201C requires splitting, step 204 transfers control to step 205 whereupon the chairman selects a key value upon which to base the split. In FIG. 8 this value is represented by a vertical dashed line 206C and typically would be chosen so that the lower portion 202C and the upper portion 203C have approximately the same size. If a non-chairman node determines that its index atom needs to be split, it generates an Index Split Request message at step 208 with a split key value corresponding to a split position 206N. As will be apparent, either request is directed to step 207, whereupon the chairman takes control of the operation.


At step 207, the chairman broadcasts an Index Split Started message to all nodes having a copy of the index atom to be split. Each receiving node responds to the Index Split Started message by buffering, and then processing, subsequent incoming messages that involve the index atom to be split. As the chairman now controls the split, further operations will involve the index atom 201C.


Next, the chairman creates a new index atom at step 210, shown as an empty atom 211C in FIG. 8B. This new index atom then becomes the right sibling of the index atom being split by the chairman and the left sibling of the index atom that had been the right sibling prior to the split. Then the chairman transfers control to an index split process 212 to split the index as shown in FIG. 9, wherein the chairman uses step 213 to determine a key value of the index upon which to base the splitting operation; i.e., a split key value. As previously indicated, that value is established by the chairman or non-chairman that originates the request and defines a boundary in the index atom between the lower and upper key value ranges. The chairman uses this information to place all the keys in the upper key value portion into the new index atom and to truncate the original index by deleting the keys with values in the upper portion, as shown in FIG. 8C.


In step 214 of FIG. 9 the chairman broadcasts an Index Split message to all other nodes that contain an existing copy of the index atom, including the non-chairman node in FIG. 7 and all other non-chairman nodes. At the chairman atom, this message initiates a process for assuring that messages received for and generated by the chairman atom have correct addresses for the left and right siblings for certain messages. An Index Node Added message 150 in FIG. 6 indicates that a new index node is added and contains an index key, record identification and other information; an Index Node Removed message contains corresponding information concerning the removal of an index node. From the occurrence of the split in the chairman node in step 214 until the end of the splitting operation, there exists the possibility that non-chairman nodes may not yet have processed the split. Each time one of these messages is received during this interval, messages received at the left sibling that should be processed on the right sibling are rebroadcast to the right sibling, and messages received at the right sibling that should be processed on the left sibling are rebroadcast to the left sibling.
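
A minimal sketch of this rebroadcast rule, under the assumption that a key-bearing message can simply be forwarded to whichever sibling now owns its key range, might read:

```python
from typing import Callable

def rebroadcast_if_misdirected(message_key: int, received_at: str, split_key: int,
                               forward: Callable[[str], None]) -> bool:
    """During the split window, a key-bearing message (e.g., Index Node Added)
    that arrives at the wrong sibling is forwarded to the sibling that now owns
    that key range. 'forward' is an assumed callable that re-sends the message."""
    if received_at == "left" and message_key >= split_key:
        forward("right")     # key now belongs to the new right sibling
        return True
    if received_at == "right" and message_key < split_key:
        forward("left")      # key still belongs to the original left sibling
        return True
    return False             # already at the correct sibling; process locally
```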


This process continues until the chairman receives Index Split Done messages from all the other nodes/atoms involved in the split operation. When this occurs, step 217 terminates the rebroadcasts of step 215 and broadcasts an Index Split Done message 161 to all the nodes with a copy of the index atom.


When a non-chairman node receives an Index Split message from the chairman, an Index Split process 220 in FIG. 7, shown in more detail in FIG. 10, begins. Each non-chairman node responds to that message by truncating its existing index atom 201N at step 222 to obtain a copy of the chairman's index atom, as shown in FIG. 8C, which becomes the left sibling of the new index atom. Step 223 obtains a copy of the populated new index 211C, as shown in FIG. 8D, which becomes the lower portion of the new index atom, and prevents any processing on the keys from the right sibling.


Next, step 224 “prunes” the right sibling by adding all its local keys from the upper portion copy 213N, as also shown in FIG. 8D. This occurs before allowing any message processing on the right sibling 213N and assures that all messages are processed against the same indexes as if no split had occurred. Next, step 225 allows the non-chairman node to begin to process any messages directed to it. Step 227 broadcasts an Index Split Done message to all the nodes with an existing index atom copy. This signifies that the foregoing processing has been completed at the respective node. Control then transfers from process 220 back to the chairman Index Split Process 214 shown in FIG. 9. Specifically, as shown in FIG. 9, at this point the chairman broadcasts an Index Split Message to all of the nodes with an existing atom copy, whereupon control transfers to a non-chairman tracked message process 230 as shown in FIG. 11.
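
A hedged sketch of the non-chairman steps 222 through 227, using plain dictionaries for the index atom copies and an assumed callable that fetches the chairman's populated right sibling:

```python
from typing import Callable, Dict, Tuple

def non_chairman_split(local_copy: Dict[int, str], local_pending_upper: Dict[int, str],
                       fetch_right_sibling: Callable[[], Dict[int, str]],
                       split_key: int) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Hypothetical walk-through of the non-chairman side of the split."""
    # Step 222: truncate the existing index atom; it becomes the left sibling.
    for k in [k for k in local_copy if k >= split_key]:
        del local_copy[k]

    # Step 223: obtain the populated right sibling from the chairman.
    right_sibling = dict(fetch_right_sibling())

    # Step 224: "prune" by adding this node's local keys from the upper portion
    # before any message processing is allowed on the right sibling.
    right_sibling.update(local_pending_upper)

    # Steps 225/227: resume message processing and report "Index Split Done".
    return local_copy, right_sibling
```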


Referring now to FIGS. 7 and 11, a stored message process 230 uses step 231 to divert control to step 232 to process each message received at that node for the index atom. When an Index Split Start message is received, step 231 diverts control to step 233, which monitors for the receipt of the Index Split Done message. During the interval between these two messages, steps 231 and 233 divert control to step 234 to buffer each incoming message. Each message includes its atom ID, a sender ID represented by a node number, and a change number. If step 234 receives a first message or a new message not previously received, step 235 diverts control to record the message in some manner, as by entry in a bit map at step 236. Then the message is processed. If step 235 determines that the bit map has recorded the receipt of that message from another node, control transfers to step 237 to drop or ignore the message because it is a duplicate. Consequently, redundant processing does not occur.
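
The buffering and duplicate-suppression behavior of FIG. 11 could be sketched as follows; a Python set stands in for the bit map, and the message field names are assumptions.

```python
from typing import Callable

class SplitMessageTracker:
    """Sketch of FIG. 11 under stated assumptions: between Index Split Start and
    Index Split Done, each incoming message is recorded (a set stands in for the
    bit map) and processed once; a message seen again, e.g., via rebroadcast from
    another node, is dropped as a duplicate."""
    def __init__(self) -> None:
        self.in_split_window = False
        self.seen = set()                       # records (atom_id, sender_id, change_number)

    def on_message(self, msg: dict, process: Callable[[dict], None]) -> None:
        kind = msg.get("type")
        if kind == "IndexSplitStart":
            self.in_split_window = True         # begin tracking incoming messages
            return
        if kind == "IndexSplitDone":
            self.in_split_window = False        # split complete; stop tracking
            self.seen.clear()
            return
        if self.in_split_window:
            key = (msg["atom_id"], msg["sender_id"], msg["change_number"])
            if key in self.seen:
                return                          # duplicate: drop or ignore
            self.seen.add(key)
        process(msg)                            # first (or only) copy is processed
```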


As will now be apparent, this invention ensures that an index in a distributed database can be split in a concurrent and consistent fashion even though copies of that index may exist at multiple nodes at which data processing occurs. This occurs without any need to quiesce the system so there is no significant performance degradation. Also, this invention eliminates the involvement of nodes that do not include that specific index.


This invention has been disclosed in terms of certain implementations that are directed to a specific implementation of a distributed database. Although the invention is disclosed in one specific implementation, its application to other implementations will be apparent to others without departing from the invention. Therefore, it is the intent of the appended claims to cover all such variations and modifications as come within the true spirit and scope of this invention.

Claims
  • 1. A method of splitting an index atom in a distributed database, the index atom defining data and metadata for an index in the distributed database, the method comprising: storing, at each of a plurality of nodes in the distributed database, a corresponding instance of the index atom, each node in the plurality of nodes comprising a corresponding processor and a corresponding memory to store the corresponding instance of the index atom; receiving, at a first node in the plurality of nodes, a request to insert a key value into the corresponding instance of the index atom; in response to the request, splitting the corresponding instances of the index atom into respective instances of a first index atom and a second index atom stored by the plurality of nodes without quiescing the distributed database, the respective instances of the second index atom including respective instances of the key value; while splitting the corresponding instances of the index atom, rebroadcasting messages involving the index atom; receiving, by the first node, a corresponding split done message from each other node in the plurality of nodes, each corresponding split done message indicating that the corresponding instance of the index atom has been split at that other node; and in response to receiving the corresponding split done messages from the other nodes in the plurality of nodes, terminating the rebroadcasting messages involving the index atom.
  • 2. The method of claim 1, wherein the plurality of nodes comprises transactional nodes to perform database transactions on respective instances of the index atom and an archival node to archive another instance of the index atom.
  • 3. The method of claim 1, wherein: splitting the corresponding instances of the index atom comprises selecting a split key value at which to split the corresponding instances of the index atom, and the respective instances of the first index atom include key values less than the split key value and the respective instances of the second index atom include key values greater than the split key value.
  • 4. The method of claim 3, wherein splitting the corresponding instances of the index atom further comprises truncating at least one instance of the index atom at the split key value.
  • 5. The method of claim 1, wherein rebroadcasting messages involving the index atom comprises: receiving, at a first instance of the first index atom in a first node in the plurality of nodes, a message affecting a key value stored in a first instance of the second index atom; and transmitting the message to each instance of the second index atom until each of the corresponding instances of the index atom has been split.
  • 6. The method of claim 1, further comprising: at each node in the plurality of nodes except the first node, after splitting the corresponding instance of the index atom, transmitting the corresponding split done message to the first node indicating that the corresponding instance of the index atom has been split.
  • 7. The method of claim 6, further comprising, in response to receiving the split done messages from the other nodes in the plurality of nodes: transmitting, by the first node to each other node in the plurality of nodes, a split done message indicating that each instance of the index atom in the distributed database has been split.
  • 8. The method of claim 1, wherein the plurality of nodes is a first plurality of nodes, and further comprising: processing database transactions with a second plurality of nodes of the distributed database while splitting the corresponding instance of the index atom in the first plurality of nodes.
  • 9. The method of claim 8, wherein no node in the second plurality of nodes stores an instance of the index atom.
  • 10. The method of claim 8, wherein the second plurality of nodes comprises transactional nodes to perform database transactions on respective instances of another atom of the distributed database and an archival node to archive respective instances of the other atom.
  • 11. The method of claim 1, wherein splitting the corresponding instances of the index atom into the respective instances of the first index atom and the second index atom without quiescing the distributed database comprises: at the first node, creating a first instance of the second index atom; transferring key values greater than a split key value from a first instance of the first index atom to the first instance of the second index atom; sending an index split message from the first node to a second node in the plurality of nodes, the index split message including the split key value and the second node including a second instance of the first index atom; at the second node, generating a second instance of the second index atom in response to the index split message; moving key values greater than the split key value from the second instance of the first index atom to the second instance of the second index atom; and sending a split done message from the second node to the first node, the split done message indicating that the second instance of the first index atom has been split.
  • 12. The method of claim 1, wherein splitting the corresponding instances of the index atom into the respective instances of the first index atom and the second index atom without quiescing the distributed database, rebroadcasting messages involving the index atom, and terminating the rebroadcasting messages involving the index atom comprise: splitting, at the first node, a first instance of the index atom into a first instance of a first index atom and a first instance of the second index atom; receiving, at the first instance of a first index atom, a message affecting a key stored in the second index atom; rebroadcasting the message from the first instance of the first index atom to the first instance of the second index atom and to each other node in the plurality of nodes; receiving, by the first node, an indication that each instance of the index atom in the distributed database has been split into a corresponding instance of the first index atom and a corresponding instance of the second index atom; and in response to receiving the indication that each instance of the index atom in the distributed database has been split, terminating, by the first node, rebroadcasting the message.
  • 13. The method of claim 12, wherein splitting the first instance of the index atom into the first instance of the first index atom and the first instance of the second index atom comprises: designating the first instance of the index atom as the first instance of the first index atom; creating the first instance of the second index atom; and moving a key value greater than a split key value from the first instance of the first index atom to the first instance of the second index atom.
  • 14. The method of claim 13, wherein splitting the first instance of the first index atom further comprises: truncating the first instance of the first index atom based on the split key value.
  • 15. The method of claim 1, wherein the first node receives an index split message to split the first instance of the index atom before receiving the corresponding split done message from each node in the plurality of nodes.
  • 16. The method of claim 15, wherein splitting the first instance of the index atom occurs before the first node receives the corresponding split done message from each node in the plurality of nodes.
  • 17. The method of claim 15, further comprising: determining that the index split message is a duplicate index split message; and dropping or ignoring the duplicate index split message.
CROSS REFERENCE TO RELATED PATENT

U.S. Pat. No. 8,224,860 granted Jul. 17, 2012, for a Database Management System and assigned to the same assignee as this invention is incorporated in its entirety herein by reference. This application is a continuation of U.S. application Ser. No. 16/129,661, filed Sep. 12, 2018, for a “Distributed Database Management System with Dynamically Split B-Tree Indexes,” which is a continuation of U.S. application Ser. No. 14/215,401 filed Mar. 17, 2014, for a “Distributed Database Management System with Dynamically Split B-Tree Indexes,” which in turn claims priority from U.S. Provisional Application Ser. No. 61/789,479 filed Mar. 15, 2013 for a “Distributed Database Management System with Dynamically Split B-Tree Indexes.” Each of these applications is incorporated in its entirety herein by reference.

US Referenced Citations (125)
Number Name Date Kind
4733353 Jaswa Mar 1988 A
4853843 Ecklund Aug 1989 A
5446887 Berkowitz Aug 1995 A
5524240 Barbara et al. Jun 1996 A
5555404 Torbjornsen et al. Sep 1996 A
5568638 Hayashi et al. Oct 1996 A
5625815 Maier Apr 1997 A
5701467 Freeston Dec 1997 A
5764877 Lomet et al. Jun 1998 A
5806065 Lomet Sep 1998 A
5960194 Choy et al. Sep 1999 A
6216151 Antoun Apr 2001 B1
6226650 Mahajan et al. May 2001 B1
6275863 Leff et al. Aug 2001 B1
6334125 Johnson et al. Dec 2001 B1
6401096 Zellweger Jun 2002 B1
6424967 Johnson et al. Jul 2002 B1
6480857 Chandler Nov 2002 B1
6499036 Gurevich Dec 2002 B1
6523036 Hickman et al. Feb 2003 B1
6748394 Shah et al. Jun 2004 B2
6792432 Kodavalla Sep 2004 B1
6862589 Grant Mar 2005 B2
7026043 Jander Apr 2006 B2
7039669 Wong May 2006 B1
7080083 Kim et al. Jul 2006 B2
7096216 Anonsen Aug 2006 B2
7184421 Liu et al. Feb 2007 B1
7219102 Zhou et al. May 2007 B2
7233960 Boris et al. Jun 2007 B1
7293039 Deshmukh et al. Nov 2007 B1
7353227 Wu Apr 2008 B2
7395352 Lam et al. Jul 2008 B1
7401094 Kesler Jul 2008 B1
7403948 Ghoneimy et al. Jul 2008 B2
7562102 Sumner et al. Jul 2009 B1
7599395 Wolf Oct 2009 B1
7853624 Friedlander et al. Dec 2010 B2
7890508 Gerber et al. Feb 2011 B2
8108343 Wang et al. Jan 2012 B2
8122201 Marshak et al. Feb 2012 B1
8224860 Starkey Jul 2012 B2
8266122 Newcombe et al. Sep 2012 B1
8488943 Sharifi Jul 2013 B1
8504523 Starkey Aug 2013 B2
8756237 Stillerman et al. Jun 2014 B2
8892569 Bowman Nov 2014 B2
8930312 Rath Jan 2015 B1
9008316 Acar et al. Apr 2015 B2
9501363 Ottavio Nov 2016 B1
9734021 Sanocki et al. Aug 2017 B1
9824095 Taylor et al. Nov 2017 B1
10067969 Rice et al. Sep 2018 B2
10740323 Palmer et al. Aug 2020 B1
11176111 Palmer et al. Nov 2021 B2
11561961 Palmer et al. Jan 2023 B2
11573940 Dashevsky Feb 2023 B2
20020112054 Hatanaka Aug 2002 A1
20020152261 Arkin et al. Oct 2002 A1
20020152262 Arkin et al. Oct 2002 A1
20020178162 Ulrich et al. Nov 2002 A1
20030051021 Hirschfeld et al. Mar 2003 A1
20030149709 Banks Aug 2003 A1
20030204486 Berks et al. Oct 2003 A1
20030220935 Vivian et al. Nov 2003 A1
20040153459 Whitten et al. Aug 2004 A1
20040263644 Ebi Dec 2004 A1
20050013208 Hirabayashi et al. Jan 2005 A1
20050086384 Ernst Apr 2005 A1
20050171960 Lomet Aug 2005 A1
20050198062 Shapiro Sep 2005 A1
20050216502 Kaura et al. Sep 2005 A1
20060010130 Leff et al. Jan 2006 A1
20060168154 Zhang et al. Jul 2006 A1
20070067349 Jhaveri et al. Mar 2007 A1
20070156842 Vermeulen et al. Jul 2007 A1
20070288526 Mankad et al. Dec 2007 A1
20080086470 Graefe Apr 2008 A1
20080106548 Singer May 2008 A1
20080228795 Lomet Sep 2008 A1
20080320038 Liege Dec 2008 A1
20090113431 Whyte Apr 2009 A1
20100094802 Luotojarvi Apr 2010 A1
20100115246 Seshadri et al. May 2010 A1
20100153349 Schroth et al. Jun 2010 A1
20100191884 Holenstein et al. Jul 2010 A1
20100235606 Oreland et al. Sep 2010 A1
20100297565 Waters et al. Nov 2010 A1
20110087874 Timashev et al. Apr 2011 A1
20110231447 Starkey Sep 2011 A1
20120016851 Hrle Jan 2012 A1
20120136904 Ravi May 2012 A1
20120254175 Horowitz et al. Oct 2012 A1
20120303576 Calder Nov 2012 A1
20130060922 Koponen et al. Mar 2013 A1
20130086018 Horii Apr 2013 A1
20130110766 Promhouse et al. May 2013 A1
20130110774 Shah et al. May 2013 A1
20130110781 Golab et al. May 2013 A1
20130159265 Peh et al. Jun 2013 A1
20130159366 Lyle et al. Jun 2013 A1
20130232378 Resch et al. Sep 2013 A1
20130259234 Acar et al. Oct 2013 A1
20130262403 Milousheff et al. Oct 2013 A1
20130278412 Kelly et al. Oct 2013 A1
20130297565 Starkey Nov 2013 A1
20130311426 Erdogan et al. Nov 2013 A1
20140108414 Stillerman et al. Apr 2014 A1
20140258300 Baeumges et al. Sep 2014 A1
20140279881 Tan et al. Sep 2014 A1
20140297676 Bhatia et al. Oct 2014 A1
20140304306 Proctor et al. Oct 2014 A1
20150019739 Attaluri et al. Jan 2015 A1
20150032695 Tran et al. Jan 2015 A1
20150066858 Sabdar et al. Mar 2015 A1
20150135255 Theimer et al. May 2015 A1
20150370505 Shuma et al. Dec 2015 A1
20160134490 Balasubramanyan et al. May 2016 A1
20160306709 Shaull et al. Oct 2016 A1
20160350357 Palmer Dec 2016 A1
20160350392 Rice et al. Dec 2016 A1
20160371355 Massari et al. Dec 2016 A1
20170039099 Ottavio Feb 2017 A1
20170139910 Mcalister et al. May 2017 A1
20220035786 Palmer et al. Feb 2022 A1
Foreign Referenced Citations (21)
Number Date Country
1859326 Nov 2006 CN
101212679 Jul 2008 CN
101251843 Jun 2010 CN
102014152 Apr 2011 CN
101471845 Jun 2011 CN
102236873 Nov 2011 CN
101268439 Apr 2012 CN
102932754 Feb 2013 CN
002931 Oct 2002 EA
1403782 Mar 2004 EP
2003500919 Feb 2003 JP
2003256256 Sep 2003 JP
2006048507 Feb 2006 JP
2007058275 Mar 2007 JP
2012128516 Jul 2012 JP
2315349 Jan 2008 RU
2008106904 Aug 2009 RU
WO-0104754 Jan 2001 WO
WO-2004013725 Feb 2004 WO
2010034608 Apr 2010 WO
WO-2013028414 Feb 2013 WO
Non-Patent Literature Citations (68)
Entry
Non-Final Office Action dated Sep. 21, 2017 from U.S. Appl. No. 14/688,396, 31 pages.
Non-Final Office Action dated Sep. 23, 2016 from U.S. Appl. No. 14/616,713, 8 pp.
Notice of Allowance dated Apr. 1, 2013 from U.S. Appl. No. 13/525,953, 10 pp.
Notice of Allowance dated Feb. 29, 2012 from U.S. Appl. No. 13/051,750, 8 pp.
Notice of Allowance dated Jul. 27, 2016 from U.S. Appl. No. 14/215,372, 12 pp.
Notice of Allowance dated May 14, 2012 from U.S. Appl. No. 13/051,750, 8 pp.
Oracle Database Concepts 10g Release 2 (10.2), Oct. 2005, 14 pages.
Rahimi, S. K et al., “Distributed Database Management Systems: A Practical Approach,” IEEE Computer Society, John Wiley & Sons, Inc. Publications (2010), 765 pp.
Roy, N et al., “Efficient Autoscaling in the Cloud using Predictive Models for Workload Forecasting,” IEEE 4th International Conference on Cloud Computing, 2011, pp. 500-507.
Searchcloudapplications.techtarget.com, Autoscaling Definition, Aug. 2012, 1 page.
Shaull, R et al., “A Modular and Efficient Past State System for Berkeley DB,” Proceedings of Usenix Atc '14:2014 USENIX Annual Technical Conference, 13 pp. (Jun. 19-20, 2014).
Shaull, R et al., “Skippy: a New Snapshot Indexing Method for Time Travel in the Storage Manager,” SIGMOD'08, Jun. 9-12, 2008, 12 pp.
Shaull, R., “Retro: A Methodology for Retrospection Everywhere,” A Dissertation Presented to the Faculty of the Graduate School of Arts and Sciences of Brandeis University, Waltham, Massachusetts, Aug. 2013, 174 pp.
Veerman, G et al., “Database Load Balancing, MySQL 5.5 vs PostgreSQL 9.1,” Universiteit van Amsterdam, System & Network Engineering, Apr. 2, 2012, 51 pp.
Yousif, M. “Shared Storage Clusters,” Cluster Computing, Baltzer Science Publishers, Bussum, NL, vol. 2, No. 4, pp. 249-257 (1999).
Extended European Search Report in European Patent Application No. 18845799.8 dated May 25, 2021, 8 pages.
International Search Report and Written Opinion in International Patent Application No. PCT/US18/00142 mailed Dec. 13, 2018. 11 pages.
Office Action with translation in Korean Application No. 10-2020-7006901 dated Dec. 16, 2022, 30 pages.
“Album Closing Policy,” Background, retrieved from the Internet at URL:http://tools/wiki/display/ENG/Album+Closing+Policy (Jan. 29, 2015), 4 pp.
“Distributed Coordination in NuoDB,” YouTube, retrieved from the Internet at URL:https://www.youtube.com/watch?feature=player_embedded&v=URoeHvflVKg on Feb. 4, 2015, 2 pp.
“Glossary—NuoDB 2.1 Documentation / NuoDB,” retrieved from the Internet at URL: http://doc.nuodb.com/display/doc/Glossary on Feb. 4, 2015, 1 pp.
“How It Works,” retrieved from the Internet at URL: http://www.nuodb.com/explore/newsql-cloud-database-how-it-works?mkt_tok=3RkMMJW on Feb. 4, 2015, 4 pp.
“How to Eliminate MySQL Performance Issues,” NuoDB Technical Whitepaper, Sep. 10, 2014, Version 1, 11 pp.
“Hybrid Transaction and Analytical Processing with NuoDB,” NuoDB Technical Whitepaper, Nov. 5, 2014, Version 1, 13 pp.
“No Knobs Administration,” retrieved from the Internet at URL: http://www.nuodb.com/explore/newsql-cloud-database-product/auto-administration on Feb. 4, 2015, 4 pp.
“NuoDB at a Glance,” retrieved from the Internet at URL: http://doc.nuodb.com/display/doc/NuoDB+at+a+Glance on Feb. 4, 2015, 1 pp.
“Snapshot Albums,” Transaction Ordering, retrieved from the Internet at URL:http://tools/wiki/display/ENG/Snapshot+Albums (Aug. 12, 2014), 4 pp.
“Table Partitioning and Storage Groups (TPSG),” Architect's Overview, NuoDB Technical Design Document, Version 2.0 (2014), 12 pp.
“The Architecture & Motivation for NuoDB,” NuoDB Technical Whitepaper, Oct. 5, 2014, Version 1, 27 pp.
“Welcome to NuoDB Swifts Release 2.1 GA,” retrieved from the Internet at URL: http://dev.nuodb.com/techblog/welcome-nuodb-swifts-release-21-ga on Feb. 4, 2015, 7 pp.
“What Is A Distributed Database? And Why Do You Need One,” NuoDB Technical Whitepaper, Jan. 23, 2014, Version 1, 9 pp.
Advisory Action dated May 2, 2018 for U.S. Appl. No. 14/215,461, 8 pages.
Advisory Action issued by The United States Patent and Trademark Office for U.S. Appl. No. 14/215,461, dated Jan. 10, 2017, 9 pages.
Amazon CloudWatch Developer Guide API, Create Alarms That or Terminate an Instance, Jan. 2013, downloaded Nov. 16, 2016 from archive.org., pp. 1-11.
Amazon RDS FAQs, Oct. 4, 2012, downloaded Nov. 16, 2016 from archive.org., 39 pp.
Bergsten et al., “Overview of Parallel Architectures for Databases,” The Computer Journal vol. 36, No. 8, pp. 734-740 (1993).
Connectivity Testing with Ping, Telnet, Trace Route and NSlookup (hereafter help.webcontrolcenter), Article ID:1757, Created: Jun. 17, 2013 at 10:45 a.m., https://help.webcontrolcenter.com/kb/a1757/connectivity-testing-with-ping-telnet-trace-route-and-nslookup.aspx, 6 pages.
Dan et al., “Performance Comparisons of Buffer Coherency Policies,” Proceedings of the International Conference on Distributed Computer Systems, IEEE Comp. Soc. Press vol. 11, pp. 208-217 (1991).
Decision to Grant dated Nov. 14, 2016 from Belarus Patent Application No. a20121441 with English Translation, 15 pp.
Durable Distributed Cache Architecture, retrieved from the Internet at URL: http://www.nuodb.com/explore/newsql-cloud-database-ddc-architecture on Feb. 4, 2015, 3 pp.
Final Office Action dated Dec. 13, 2016 from U.S. Appl. No. 14/247,364, 31 pp.
Final Office Action dated Jan. 10, 2018 from U.S. Appl. No. 14/215,461, 30 pages.
Final Office Action dated Nov. 24, 2017 from U.S. Appl. No. 14/215,401, 33 pages.
Final Office Action dated Nov. 3, 2016 from U.S. Appl. No. 14/215,401, 36 pp.
Final Office Action dated Nov. 7, 2017 from U.S. Appl. No. 14/247,364, 13 pages.
Final Office Action dated Sep. 9, 2016 from U.S. Appl. No. 14/215,461, 26 pp.
First Examination Report issued by the Canadian Intellectual Property Office for Application No. 2,793,429, dated Feb. 14, 2017, 3 pages.
Garding, P. “Alerting on Database Mirorring Events,” Apr. 7, 2006, downloaded Dec. 6, 2016 from technet.microsoft.com, 24 pp.
Hull, Autoscaling Mysql On Amazon EC2, Apr. 9, 2012, 7 pages.
International Preliminary Report on Patentability dated Oct. 13, 2015 from PCT/US2014/033270, 4 pp.
International Search Report and Written Opinion dated Aug. 21, 2014 from PCT/US2014/033270, 5 pp.
International Search Report and Written Opinion dated Jul. 15, 2016 from PCT/US2016/27658, 37 pp.
International Search Report and Written Opinion dated Oct. 28, 2016 from PCT/US16/34651, 16 pp.
International Search Report and Written Opinion dated Sep. 8, 2016 from PCT/US16/37977, 11 pp.
International Search Report and Written Opinion dated Sep. 9, 2016 from PCT/US16/34646, 12 pp.
International Search Report dated Sep. 26, 2012 from PCT/US2011/029056, 4 pp.
Iqbal, A. M. et al., “Performance Tradeoffs in Static and Dynamic Load Balancing Strategies,” Institute for Computer Applications in Science and Engineering, 1986, pp. 1-23.
Leverenz et al., “Oracle8i Concepts, Partitioned Tables and Indexes,” Chapter 11, pp. 11-12-11/66 (1999).
Non-Final Office Action dated Apr. 12, 2017 from U.S. Appl. No. 14/247,364, 12 pp.
Non-Final Office Action dated Feb. 1, 2016 from U.S. Appl. No. 14/215,461, 19 pp.
Non-Final Office Action dated Feb. 6, 2014 from U.S. Appl. No. 13/933,483, 14 pp.
Non-Final Office Action dated Jan. 21, 2016 from U.S. Appl. No. 14/215,401, 19 pp.
Non-Final Office Action dated Jun. 1, 2017 from U.S. Appl. No. 14/215,461, 21 pp.
Non-Final Office Action dated Jun. 2, 2017 from U.S. Appl. No. 14/744,546, 25 pp.
Non-Final Office Action dated May 19, 2016 from U.S. Appl. No. 14/247,364, 24 pp.
Non-Final Office Action dated May 31, 2017 from U.S. Appl. No. 14/215,401, 27 pp.
Non-Final Office Action dated Oct. 10, 2012 from U.S. Appl. No. 13/525,953, 8 pp.
Non-Final Office Action dated Sep. 19, 2017 from U.S. Appl. No. 14/726,200, 37 pages.
Related Publications (1)
Number Date Country
20220035786 A1 Feb 2022 US
Provisional Applications (1)
Number Date Country
61789479 Mar 2013 US
Continuations (2)
Number Date Country
Parent 16129661 Sep 2018 US
Child 17502167 US
Parent 14215401 Mar 2014 US
Child 16129661 US