This Application is a Non-Provisional of Provisional (35 USC 119(e)) of U.S. Application Ser. No. 62/522,540, filed Jun. 20, 2017, entitled “SYSTEM, METHODS, AND INTERFACES FOR A NOSQL DATABASE SYSTEM” and is a Non-Provisional of Provisional (35 USC 119(e)) of U.S. Application Ser. No. 62/522,150, filed Jun. 20, 2017, entitled “SYSTEMS AND METHODS FOR OPTIMIZING DISTRIBUTED DATABASE DEPLOYMENTS”, which are herein incorporated by reference in their entirety.
Client systems use database systems for storing data for applications of the client system. A client system may submit operations to the database system. For example, a client system may submit a read operation to read data stored in the database. In another example, a client system may submit a write operation to insert data in the database.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
According to various aspects, a distributed database can provide various levels of consistency and/or redundancy depending on the architecture of the database. In various embodiments, conventional distributed databases can be enhanced to provide for failure resolution with respect to distributed write operations. According to some embodiments, a distributed database architected with an eventual consistency model can see significant improvements in reliability and/or consistency based on enhancing failure resolution of write operations.
In the MongoDB database, data replication and distribution of operations are provided via a replica set architecture. The replica set architecture is based on a primary node hosting a primary copy of at least a portion of the database data. The primary node is responsible for processing write requests, and the executed write operations are then replicated to secondary nodes. The secondary nodes are configured to provide for scalability and redundancy. For example, a secondary node can take over for the primary in the event of failure. In conventional operation, write operations are processed by primary nodes and replicated to the secondary nodes under an eventual consistency model.
In various embodiments, the system includes augmented database drivers that are configured to automatically retry execution of write operations if a failure is encountered. In some embodiments, a database daemon is configured to manage the database functionality for a respective database node (e.g., primary or secondary node). For example, the drivers can automatically retry certain write operations up to a threshold number of times if a network error is encountered or if a healthy primary node is not available.
According to one aspect, a database system is provided. The database system comprises a distributed database having a dynamic schema architecture, the distributed database comprising a replica set hosting a respective shard of data, wherein the replica set comprises: a primary node configured to perform write operations on the distributed database; and at least one secondary node configured to replicate write operations performed by the primary node; at least one processor configured to: receive, from a client system, a submission of a write operation to perform on the distributed database; execute the submitted write operation at least in part by transmitting a command to the primary node to perform the write operation; determine that the execution of the write operation failed responsive to determining occurrence of an error during execution of the write operation; and trigger re-execution of the submitted write operation responsive to determining that the execution of the write operation failed at least in part by re-transmitting the command to the primary node to perform the write operation.
According to one embodiment, the at least one processor is further configured to: receive, from the primary node, an identification of the error that occurred during execution of the write operation.
According to one embodiment, the at least one processor is further configured to determine that the execution of the write operation failed responsive to determining occurrence of a network error that interrupted communication with the primary node.
According to one embodiment, the at least one processor is further configured to determine that the execution of the write operation failed responsive to determining that the primary node was unavailable to perform write operations during execution of the write operation. According to one embodiment, the at least one processor is further configured to re-transmit the command to perform the write operation to a new primary node that becomes available to perform write operations on the database.
According to one embodiment, the at least one processor is further configured to wait a period of time after determining that the execution of the write operation failed before triggering re-execution of the write operation.
According to one embodiment, the at least one processor is further configured to encode the command transmitted to the primary node, the encoding comprising including, in the encoded command, a unique transaction identifier associated with the write operation. According to one embodiment, the at least one processor is further configured to: generate a session with the primary node via which the at least one processor transmits commands to perform one or more write operations to the primary node; and assign a unique transaction identifier to each of the one or more write operations. According to one embodiment, the transaction identifier comprises: a session identifier; and a monotonically increasing integer unique to each of the one or more write operations associated with the session.
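The transaction identifier described above (a session identifier plus a monotonically increasing integer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Session` class, the `encode_write_command` helper, and the `lsid`/`txnNumber` field names are all hypothetical.

```python
import itertools
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionId:
    """Hypothetical identifier: a session id plus a per-session counter."""
    session_id: str
    txn_number: int


class Session:
    """Sketch of a driver-side session that numbers its write operations."""

    def __init__(self) -> None:
        self.session_id = str(uuid.uuid4())
        # itertools.count(1) yields 1, 2, 3, ... so each write gets a larger number.
        self._counter = itertools.count(1)

    def next_transaction_id(self) -> TransactionId:
        return TransactionId(self.session_id, next(self._counter))


def encode_write_command(op: dict, txn_id: TransactionId) -> dict:
    """Embed the transaction identifier in the command sent to the primary."""
    # The field names "lsid" and "txnNumber" are illustrative assumptions.
    return {**op, "lsid": txn_id.session_id, "txnNumber": txn_id.txn_number}
```

In this sketch a retried write would be re-sent with the same `TransactionId`, on the assumption that the primary can use it to recognize a repeated command rather than treat it as a new write.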
According to one embodiment, the at least one processor is further configured to: determine whether a threshold number of execution attempts have been reached; and prevent re-execution of the write operation if the threshold number of execution attempts has been reached. According to one embodiment, the threshold number of execution attempts is one.
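A minimal sketch of the retry logic, with a threshold on attempts and an optional wait before re-transmitting. It assumes the threshold of one means a single retry after the initial attempt, which is one reading of the embodiment; `RetryableError`, `execute_with_retry`, and the `send` callable are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error for failures that warrant a retry (e.g., a network error)."""


def execute_with_retry(
    send: Callable[[], T],
    max_retries: int = 1,
    wait_seconds: float = 0.0,
) -> T:
    """Run send(); on a RetryableError, re-send it up to max_retries more times."""
    retries = 0
    while True:
        try:
            return send()
        except RetryableError:
            if retries >= max_retries:
                raise  # threshold reached: surface the error to the caller
            retries += 1
            time.sleep(wait_seconds)  # optional pause before re-transmitting
```

Errors other than `RetryableError` propagate immediately in this sketch, so only failures that the driver classifies as transient are retried.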
According to another aspect, a computer-implemented method of managing a database is provided. The method comprises acts of: storing data in a distributed database having a dynamic schema architecture, the storing comprising storing a replica set hosting a respective shard of data; performing, by a primary node of the replica set, write operations on the distributed database; replicating, by at least one secondary node of the replica set, write operations performed by the primary node; receiving, by at least one processor from a client system, a submission of a write operation to perform on the distributed database; executing, by the at least one processor, the submitted write operation at least in part by transmitting a command to the primary node to perform the write operation; determining, by the at least one processor, that the execution of the write operation failed responsive to determining occurrence of an error during execution of the write operation; and triggering, by the at least one processor, re-execution of the submitted write operation responsive to determining that the execution of the write operation failed at least in part by re-transmitting the command to the primary node to perform the write operation.
According to one embodiment, the method further comprises receiving, by the at least one processor from the primary node, an identification of the error that occurred during execution of the write operation.
According to one embodiment, the method further comprises determining, by the at least one processor, that the execution of the write operation failed responsive to determining that the primary node was unavailable to perform write operations during execution of the write operation. According to one embodiment, triggering re-execution of the submitted write operation includes re-transmitting the command to perform the write operation to a newly elected primary node.
According to one embodiment, the method further comprises waiting, by the at least one processor, a period of time after determining that the execution of the write operation failed before triggering re-execution of the write operation.
According to one embodiment, the method further comprises encoding, by the at least one processor, the command transmitted to the primary node, the encoding comprising including, in the encoded command, a unique transaction identifier associated with the write operation.
According to one embodiment, the method further comprises generating, by the at least one processor, a session with the primary node to transmit commands to perform one or more write operations to the primary node; and assigning, by the at least one processor, a unique transaction identifier to each of the one or more write operations.
According to another aspect, at least one non-transitory computer-readable storage medium storing processor-executable instructions is provided. The processor-executable instructions, when executed by at least one processor, cause the at least one processor to perform a method comprising: storing data in a distributed database having a dynamic schema architecture, the storing comprising storing a replica set hosting a respective shard of data; performing, by a primary node of the replica set, write operations on the distributed database; replicating, by at least one secondary node of the replica set, write operations performed by the primary node; receiving a submission of a write operation to perform on the distributed database; executing the submitted write operation at least in part by transmitting a command to the primary node of the replica set to perform the write operation; determining that the execution of the write operation failed responsive to determining occurrence of an error during execution of the write operation; and triggering re-execution of the submitted write operation responsive to determining that the execution of the write operation failed at least in part by re-transmitting the command to the primary node to perform the write operation.
Still other aspects, examples, and advantages of these exemplary aspects and examples are discussed in detail below. Moreover, it is to be understood that both the foregoing information and the following detailed description are merely illustrative examples of various aspects and examples, and are intended to provide an overview or framework for understanding the nature and character of the claimed aspects and examples. Any example disclosed herein may be combined with any other example in any manner consistent with at least one of the objects, aims, and needs disclosed herein, and references to “an example,” “some examples,” “an alternate example,” “various examples,” “one example,” “at least one example,” “this and other examples” or the like are not necessarily mutually exclusive and are intended to indicate that a particular feature, structure, or characteristic described in connection with the example may be included in at least one example. The appearances of such terms herein are not necessarily all referring to the same example.
Various aspects of at least one embodiment are discussed below with reference to the accompanying figures, which are not intended to be drawn to scale. The figures are included to provide an illustration and a further understanding of the various aspects and embodiments, and are incorporated in and constitute a part of this specification, but are not intended as a definition of the limits of any particular embodiment. The drawings, together with the remainder of the specification, serve to explain principles and operations of the described and claimed aspects and embodiments. In the figures, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every figure. In the figures:
According to one aspect, a database system is able to automatically retry execution of a failed write operation. A write operation may refer to any database operation that inserts, updates, removes, and/or replaces data stored in the database system. In some embodiments, the database provides for data replication and distribution of operations via a replica set architecture. The replica set architecture is based on a primary node hosting a primary copy of at least a portion of the database data. The primary node of the database data is responsible for performing write operations. Secondary nodes provide for scalability and redundancy by replicating the write operations performed by the primary node. The database system receives a submission of a write operation from a client system. For example, a client application may generate new data that is to be stored in the database. The database system executes the submitted write operation by transmitting a command to perform the write operation to a primary node of the database system, and then determines whether the execution failed. The database system determines whether execution failed by determining whether one or more errors occurred during the execution of the write operation. For example, the database system determines whether a network error occurred during execution of the write operation that may have prevented connection to the primary node. In another example, the database system determines that execution of the write operation failed because the primary node was unavailable when the database system was executing the write operation. If the database system determines that the execution of the write operation fails, the database system triggers re-execution of the submitted write operation by re-transmitting the command to perform the write operation to the primary node.
The inventors have recognized that conventional database systems require failed write operations to be handled by the respective client application using the database. A failure of a write operation may result in loss of data or loss of consistency in the database. For example, a network error may occur during execution of the write operation. In this situation, the client application does not know whether the database was updated according to the submitted write operation. In another example, a primary node of the database system may be down when the write operation is submitted. In this situation, the database system may be unable to execute write operations submitted by the client application. In conventional systems, a client application may therefore need to include logic (e.g., code) that handles situations in which a submitted write operation may have failed.
In some embodiments, a database system can be configured to determine failures of one or more write operations and trigger re-execution of the write operation(s). By doing so, the database system may remove the need for client systems using the database system to handle potential failures of the write operation(s). Client applications therefore no longer need code and logic designed to address failed executions of the write operation(s), and the client system is spared the computations required to (1) determine whether a potential failure occurred in execution of the write operation(s), and (2) retry execution of the write operation(s). Furthermore, by automatically retrying execution of write operations, the database system is more robust in handling transient issues. For example, a temporary network problem may have prevented a write operation from being communicated to a primary node. The temporary network problem may be resolved after a short amount of time. By automatically retrying the write operation after a period of time, the system may be able to communicate the write operation to the primary node. This may eliminate delays caused when a transient issue prevents completion of a write operation, as the system is no longer required to generate and output a failure indication, or to stop execution of a series of write operations due to a single transient issue.
Examples of the methods and systems discussed herein are not limited in application to the details of construction and the arrangement of components set forth in the following description or illustrated in the accompanying drawings. The methods and systems are capable of implementation in other embodiments and of being practiced or of being carried out in various ways. Examples of specific implementations are provided herein for illustrative purposes only and are not intended to be limiting. In particular, acts, components, elements and features discussed in connection with any one or more examples are not intended to be excluded from a similar role in any other examples.
Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. Any references to examples, embodiments, components, elements or acts of the systems and methods herein referred to in the singular may also embrace embodiments including a plurality, and any references in plural to any embodiment, component, element or act herein may also embrace embodiments including only a singularity. References in the singular or plural form are not intended to limit the presently disclosed systems or methods, their components, acts, or elements. The use herein of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. References to “or” may be construed as inclusive so that any terms described using “or” may indicate any of a single, more than one, and all of the described terms.
In some embodiments, each shard of data (e.g., 152-174) can be configured to reside on one or more servers executing database operations for storing, retrieving, managing, removing and/or updating data. In some embodiments, a shard server (e.g., 102-108) contains multiple partitions of data, which can also be referred to as “chunks” of database data. In some embodiments, a shard of data corresponds to a chunk of data. A chunk is also a reference to a partition of database data. A chunk can be configured as a contiguous range of data from a particular collection in the database. In some embodiments, collections are logical organizations of subsets of database data. In some embodiments, a collection can comprise one or more documents. A document can comprise a unit of data storage. The document can include one or more fields and one or more values stored in the field(s). In one example, a collection of documents is a named grouping of the data, for example, a named grouping of documents. The named grouping can be homogeneous or heterogeneous. In some embodiments, collections are organizations of database data similar to relational database tables.
In some embodiments, configurations within a shard cluster can be defined by metadata associated with the managed database, referred to as shard metadata. Shard metadata can include information about collections within a given database, the number of collections, data associated with accessing the collections, database key properties for a given collection, and ranges of key values associated with a given partition, shard, and/or chunk of data within a given collection, to provide some examples.
In some embodiments, establishing an appropriate shard key facilitates the efficient management of data within the shard cluster. To partition a collection, a shard key pattern can be specified. The shard key pattern, in some embodiments, can be similar to the key pattern used to define an index. The shard key pattern establishes one or more fields to define the shard key upon which the managed database can distribute data. In some embodiments, the shard key pattern can be input through a management process. The shard key pattern can be predefined and/or dynamically generated. Once established, the shard key pattern can be used to control the partitioning of data. The data can be partitioned in chunks of data. A shard of data can be a chunk. The chunks of data are typically constructed of contiguous ranges of data. According to one embodiment, the contiguous range of data is defined based on database key values or database key patterns associated with the data. In some examples, chunks are defined by a triple (collection, minKey, and maxKey). A given chunk can be configured with a name for the collection to which the chunk belongs, corresponding to collection in the triple, and a range of key values that define the beginning and the end of the data found within the chunk, corresponding to minKey and maxKey. In one example, the shard key K associated with a given document within a collection assigns that document to the chunk where the value for K falls within the values defined by minKey and maxKey. Thus, the shard database key/shard database key pattern defines the ranges of data found within a given chunk. The shard key ranges associated with a given partition can be used by the shard cluster (e.g., through a router process) to direct database requests to appropriate shard servers hosting the particular partition.
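The (collection, minKey, maxKey) triple and the key lookup described above can be sketched as a binary search over sorted chunk ranges. This is a minimal illustration assuming integer shard keys and half-open ranges [minKey, maxKey); the `Chunk` fields and the `find_chunk` helper are hypothetical names, and the `shard` field stands in for whatever routing metadata identifies the hosting shard server.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    collection: str
    min_key: int  # inclusive lower bound (assumption)
    max_key: int  # exclusive upper bound (assumption)
    shard: str    # hypothetical name of the shard server hosting the chunk


def find_chunk(chunks: List[Chunk], key: int) -> Chunk:
    """Return the chunk whose [min_key, max_key) range contains key.

    Assumes chunks are sorted by min_key and do not overlap.
    """
    starts = [c.min_key for c in chunks]
    # Index of the last chunk whose min_key is <= key.
    i = bisect.bisect_right(starts, key) - 1
    if i < 0 or not (chunks[i].min_key <= key < chunks[i].max_key):
        raise KeyError(f"no chunk covers key {key}")
    return chunks[i]
```

A router process could use a lookup like this to direct a request carrying shard key value K to the shard server that hosts the matching chunk.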
In some implementations, the maximum size can be predetermined. In some embodiments, the maximum size can be dynamically established. In some embodiments, a maximum size of 200 MB establishes a good threshold that balances the costs of sharding (e.g., the computational burden associated with copying/moving the data and versioning the chunks) against the improvement in processing gained by having sharded data. Some embodiments support compound shard keys/shard key patterns.
In some embodiments, the shard key should be selected to ensure it is granular enough to provide for an even distribution of data. For instance, when a shard key is based on name, the database can be checked to ensure there is not a disproportionate number of users with the same name. In such a case, an individual chunk can become too large and further, because of the key selected, be unable to split. In some implementations, logic can be implemented within the shard cluster to assist in selecting the shard key. Distributions can be established and analyzed, for example during a testing phase, to ensure that the key does not produce disproportionate distributions. For example, where the entire range comprises just a single key on name and a disproportionate number of users share the same name, it can become impossible to split chunks of the data without creating a new shard key. Thus, for a database where it is possible that a single value within a shard key range might grow exceptionally large, a compound shard key can be constructed that enables further discrimination of the values than a single key selection.
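Why a compound shard key helps can be shown with a small split-point search. This is a minimal sketch under the assumption that a chunk can only be split at a boundary between two different shard key values; `split_point` is a hypothetical helper, and the (name, user_id) compound key is an illustrative choice.

```python
from typing import Any, List, Optional


def split_point(keys: List[Any]) -> Optional[Any]:
    """Return a key that divides the sorted keys into two non-empty chunks, or None.

    A split must fall between two *different* shard key values, so if every
    document shares one value there is nowhere to split.
    """
    ordered = sorted(keys)
    mid = len(ordered) // 2
    # Search outward from the middle for a position where the value changes.
    for offset in range(len(ordered)):
        for i in (mid + offset, mid - offset):
            if 0 < i < len(ordered) and ordered[i - 1] != ordered[i]:
                return ordered[i]  # becomes the min key of the upper chunk
    return None


# Six users who all share the name "Smith".
docs = [("Smith", user_id) for user_id in range(6)]
name_only = split_point([name for name, _ in docs])  # None: cannot split on name alone
compound = split_point(docs)                         # ("Smith", 3): compound key splits
```

With the single key on name, every document maps to the same value and no split exists; adding a second field to the key gives the chunk distinct values to split between.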
In some embodiments, a chunk of data can also be associated with a maximum size threshold, which defines the maximum size a given chunk can reach before a splitting operation is performed on the data within the chunk. In some embodiments, once the data within a given chunk reaches the maximum size, a managed database or a shard cluster can be configured to automatically generate a new chunk having its own range of contiguous data. In some examples, the data within the original chunk is split, with approximately half the data remaining in the original chunk and approximately half the data being copied into the newly created chunk. In other embodiments, however, the split can occur so that different portions of data remain in the original chunk and/or are copied into the new chunk.
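The size-triggered split described above can be sketched as follows. This is a minimal illustration, assuming integer shard keys, a per-document size in bytes, and a split that puts roughly half of the documents (by count, not by bytes) on each side; `Chunk` and `maybe_split` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    min_key: int
    max_key: int  # exclusive upper bound (assumption)
    docs: Dict[int, int] = field(default_factory=dict)  # shard key -> document size in bytes

    def size(self) -> int:
        return sum(self.docs.values())


def maybe_split(chunk: Chunk, max_size: int) -> List[Chunk]:
    """Split chunk in two once its data reaches max_size; otherwise return it unchanged."""
    if chunk.size() < max_size or len(chunk.docs) < 2:
        return [chunk]
    keys = sorted(chunk.docs)
    split_key = keys[len(keys) // 2]  # roughly half the documents on each side
    lower = Chunk(chunk.min_key, split_key,
                  {k: s for k, s in chunk.docs.items() if k < split_key})
    upper = Chunk(split_key, chunk.max_key,
                  {k: s for k, s in chunk.docs.items() if k >= split_key})
    return [lower, upper]
```

Both resulting chunks keep contiguous key ranges, so the (collection, minKey, maxKey) metadata for the two halves can be derived directly from `split_key`.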
In some embodiments, sharding of the database in data chunks, that is, the partitioning of the data in the database, occurs based on database collections rather than the database as a whole. For example, when implementing a database management system for a service like the well-known TWITTER service, it is appreciated that the collection of “tweets” or messages within the database of the TWITTER service would be several orders of magnitude larger than the next largest collection. The size and throughput associated with the collection of tweets would be ideal for sharding, whereas smaller collections can be configured to reside on a single server. In some implementations, the data within the database is organized into documents. Some examples of document organization formats include the known JSON (JavaScript Object Notation) and BSON (binary encoded serialization of JSON) formatting for documents. BSON is a binary format in which zero or more key/value pairs are stored as a single entity. The BSON entity can be referred to as a document. In some examples, BSON is designed to be efficient in space, but in many cases is not much more efficient than JSON. In some cases BSON can employ more space than JSON to encode information. In one embodiment, this results from one of the BSON design goals: traversability. In some examples, BSON adds some additional information to documents, like length prefixes, that makes the document easier and faster to traverse. BSON is also designed to be fast to encode and decode. For example, integers are stored as 32 (or 64) bit integers, so they don't need to be parsed to and from text. This uses more space than JSON for small integers, but is much faster to parse.
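The space trade-off for small integers can be made concrete by hand-encoding a one-field document. This sketch follows the published BSON layout (a 4-byte little-endian total length that counts itself, then each element as a type byte, a NUL-terminated field name, and the value, then a terminating 0x00 byte); `bson_int32_document` is a hypothetical helper written for illustration, not part of any BSON library.

```python
import json
import struct


def bson_int32_document(name: str, value: int) -> bytes:
    """Encode {name: value} as a BSON document holding one 32-bit integer."""
    # Element: type byte 0x10 (int32), NUL-terminated field name, little-endian int32.
    element = b"\x10" + name.encode("utf-8") + b"\x00" + struct.pack("<i", value)
    body = element + b"\x00"  # terminating byte of the document
    # The length prefix counts its own 4 bytes plus everything that follows.
    return struct.pack("<i", 4 + len(body)) + body


bson_bytes = bson_int32_document("a", 1)
json_text = json.dumps({"a": 1}, separators=(",", ":"))
print(len(bson_bytes), len(json_text))  # 12 7
```

The BSON form is larger, but the length prefix lets a reader skip the whole document without scanning it, and the integer is read as four fixed bytes rather than parsed from text.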
Returning to
In some embodiments, a router process, e.g., 116, can be configured to operate as a routing and coordination process that makes the various components of the cluster look like a single system, for example, to client 120. In response to receiving a client request (e.g., a write operation) via the driver 122, the router process 116 routes the request to the appropriate shard or shards. The shard(s) return any results to the router process. The router process 116 can merge any results and communicate the merged result back to the driver 122. The driver 122 can use the results for additional processing and/or communicate results to the client 120.
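The routing-and-merging step can be sketched as a scatter-gather over the targeted shards. This is a minimal illustration under stated assumptions: `send_to_shard` is a hypothetical callable standing in for the network call to a shard, and when a sort order is requested each shard is assumed to return its results already sorted on that field.

```python
import heapq
from typing import Callable, Iterable, List, Optional


def route_and_merge(
    request: dict,
    target_shards: Iterable[str],
    send_to_shard: Callable[[str, dict], List[dict]],
    sort_field: Optional[str] = None,
) -> List[dict]:
    """Forward request to each target shard and merge what the shards return."""
    per_shard = [send_to_shard(shard, request) for shard in target_shards]
    if sort_field is None:
        # No ordering requested: concatenate results in shard order.
        return [doc for results in per_shard for doc in results]
    # Each shard's list is sorted on sort_field, so a k-way merge
    # produces a globally sorted result without re-sorting everything.
    return list(heapq.merge(*per_shard, key=lambda doc: doc[sort_field]))
```

The client-facing driver then receives a single result list, which is what makes the cluster look like one system.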
In some embodiments, the driver 122 can be configured to manage retrying of write operations submitted by the client 120. The client 120 may submit a write operation to the database 100 via the driver 122. The driver 122 can be configured to transmit a write command to execute the write operation to one or more shard servers (e.g., one or more of shard servers 102-108). In some embodiments, a router process (e.g., router process 116) can be configured to route the write command to the appropriate shard server(s). In some embodiments, the driver 122 can be configured to determine, based on connectivity to the shard server(s), whether to retry execution of the write operation. For example, the driver 122 can be configured to determine whether a network error occurred that prevented the driver 122 from connecting to the shard server(s) to transmit a command to perform a write operation. In this situation, the driver 122 may determine to trigger re-execution of the write operation. In some embodiments, the shard server(s) can be configured to return to the driver 122 an indication of the outcome of executing a command received from the driver 122. In some embodiments, the driver 122 can be configured to determine, based on the returned outcome, whether to retry execution of the write operation. If the driver 122 determines from the indication of the outcome that execution of the write operation by the shard server(s) failed, the driver 122 can be configured to retry execution of the write operation. In some embodiments, the driver 122 can be configured to retransmit the write command to one or more shard servers to retry execution of the write operation.
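The two decision paths above, connectivity and returned outcome, can be sketched as a single predicate. This is a minimal illustration: the outcome codes in `RETRYABLE_OUTCOMES`, the `Outcome` type, and `should_retry` are hypothetical, chosen to represent "the primary could not perform the write" rather than copied from any driver.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical outcome codes indicating the primary could not perform the write.
RETRYABLE_OUTCOMES = frozenset({"NOT_PRIMARY", "PRIMARY_STEPPED_DOWN", "SHUTDOWN_IN_PROGRESS"})


@dataclass
class Outcome:
    ok: bool
    code: Optional[str] = None


def should_retry(network_error: Optional[Exception], outcome: Optional[Outcome]) -> bool:
    """Decide whether the driver should re-transmit a write command."""
    if network_error is not None:
        # The command may never have reached the shard server, or the reply was lost.
        return True
    if outcome is None or outcome.ok:
        return False  # nothing known to have failed
    # Only failures caused by primary unavailability are retried; errors such
    # as a duplicate key would fail again on retry, so they are returned as-is.
    return outcome.code in RETRYABLE_OUTCOMES
```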
In some embodiments, a shard may be hosted by a replica set. The replica set may include a primary node and one or more secondary nodes. In some embodiments, each of the nodes of the replica set may be a separate shard server to provide redundancy, and protection against failures. In some embodiments, the primary node may perform write operations. The secondary node(s) may replicate write operations performed by the primary node to provide redundancy. In some embodiments, if the primary node is unavailable, the database system may be unable to perform a write operation. For example, if the primary node of a replica set hosting a shard shuts down, the database may be unable to execute the write operation on the shard during the period that the primary node is shut down, or until a new primary node is selected.
In some embodiments, the driver 122 can be configured to transmit one or more write commands to a primary node of a replica set to perform one or more write operations submitted by the client 120. For example, the driver 122 can be configured to connect to the primary node to transmit the write command(s) to the primary node to perform write operation(s) submitted by the client 120. In some embodiments, the driver 122 can be configured to determine that an error occurred which may have prevented performance of a write operation. In this case, the driver 122 can be configured to trigger re-execution of the write operation by re-transmitting the write command to the primary node. Example errors are discussed herein.
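When the original primary fails mid-write, the re-transmission should go to whichever node is primary at retry time. The sketch below assumes a simplified driver-side view of replica set membership; `ReplicaSetView`, `PrimaryUnavailable`, `write_to_primary`, and the `send` callable are all hypothetical names.

```python
from typing import Callable, Dict


class PrimaryUnavailable(Exception):
    """Hypothetical error raised when no member can currently accept writes."""


class ReplicaSetView:
    """Driver's simplified view of which replica set member is primary."""

    def __init__(self, members: Dict[str, bool]) -> None:
        # Maps member name -> True if that member currently reports itself primary.
        self.members = members

    def current_primary(self) -> str:
        for name, is_primary in self.members.items():
            if is_primary:
                return name
        raise PrimaryUnavailable("no primary elected")


def write_to_primary(view: ReplicaSetView, command: dict,
                     send: Callable[[str, dict], dict]) -> dict:
    """Send command to the primary; on failure, resolve the primary again and re-send once."""
    try:
        return send(view.current_primary(), command)
    except (ConnectionError, PrimaryUnavailable):
        # After an election the primary may be a different node.
        return send(view.current_primary(), command)
```

This sketch retries immediately; a driver would more plausibly wait until its view shows a newly elected primary before re-sending.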
In some embodiments, the database system 100 can be configured to disable retrying of write operations submitted by the client 120 by default. In some embodiments, the database system 100 can be configured to enable retrying of write operations submitted by the client 120 by default. In some embodiments, the database system 100 can be configured to provide a user configurable option that enables or disables automatic retrying of write operations. One example of code for enabling retrying of write operations in the driver 122 is shown below.
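The original code example is not reproduced here. As a stand-in, the sketch below shows how a driver might read a user-configurable switch from a MongoDB-style connection string, using the `retryWrites` option name that MongoDB connection strings support. The `retry_writes_enabled` function is hypothetical, not a driver API, and its `default` parameter reflects that the default may be either enabled or disabled.

```python
from urllib.parse import parse_qs, urlsplit


def retry_writes_enabled(uri: str, default: bool = False) -> bool:
    """Read a retryWrites=true|false option from a MongoDB-style connection string."""
    query = parse_qs(urlsplit(uri).query)
    values = query.get("retryWrites")
    if not values:
        return default  # option absent: fall back to the deployment's default
    return values[-1].strip().lower() == "true"
```

For example, `retry_writes_enabled("mongodb://localhost/?retryWrites=true")` returns `True`, and the same URI without the option falls back to `default`.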
In some embodiments, the router process 116 is configured to establish current state information for the data distributed throughout the database by requesting metadata information on the database from the configuration server(s) 110-114. The request for metadata information can be executed on startup of a routing process. Further requests can be initiated by the routing process and/or can be initiated by a configuration server. In one example, a change at the configuration server can trigger a distribution of updates to any routing processes.
In some embodiments, any changes that occur on the configuration server(s) can be propagated to each router process 116-118, as needed. In one example, router processes 116-118 can be configured to poll the configuration server(s) 110-114 to update their state information periodically. In other examples, router processes can be configured to poll the configuration server(s) 110-114 to update their state information on a schedule, periodically, or intermittently, can be further configured to receive updates pushed from the configuration server(s) 110-114, and/or can use any combination thereof. According to one embodiment, the router processes capture metadata information on the shard cluster stored at the configuration servers. In some examples, the metadata information includes information on the data stored in the database, how the data is partitioned, version information associated with the partitions, database key values associated with partitions, etc. According to some embodiments, the router process 116 can be configured without persistent state information. For example, at initiation the router process 116 cannot fully route data requests until its state is updated with the metadata describing the distribution of data throughout the shards.
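A router that starts without persistent state and refreshes versioned metadata from the configuration servers can be sketched as follows. This is a minimal illustration assuming the metadata carries a single integer version; `RoutingTable`, `Router`, and the `fetch_metadata` callable (standing in for the call to the configuration server(s)) are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RoutingTable:
    version: int = 0  # 0 means "never loaded"
    chunks: List = field(default_factory=list)


class Router:
    """Router process that caches shard metadata fetched from config servers."""

    def __init__(self, fetch_metadata: Callable[[], Tuple[int, List]]) -> None:
        self._fetch = fetch_metadata  # hypothetical call to the config server(s)
        self.table = RoutingTable()   # no persistent state at start-up

    def refresh(self) -> bool:
        """Fetch metadata; replace the cached table only if it has a newer version."""
        version, chunks = self._fetch()
        if version > self.table.version:
            self.table = RoutingTable(version, chunks)
            return True
        return False

    def can_route(self) -> bool:
        return self.table.version > 0
```

The same `refresh()` could be called on a polling schedule or when a configuration server pushes a change notification; the version comparison makes repeated calls harmless.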
In some embodiments, router processes can run on any server within the managed database and/or on any desired number of servers. For example, the router processes can be executed on stand-alone systems, and in other examples the router processes can be run on the shard servers themselves. In yet other examples, the router processes can be run on application servers associated with the managed database. Under typical installations, there is no limit on the number of router processes that can be invoked. Adding routing processes can permit the managed database to route a greater number of requests to the appropriate shards of data. In some embodiments, additional routing processes can enable additional client connections to the partitioned database. In other embodiments, additional routing processes can facilitate management of the distribution of data within the database.
In some embodiments, each router process can be configured to act independently of any other routing processes being executed within the managed database. In some examples, the router processes do not coordinate processing with one another. In some environments, this property allows the number of router processes to grow with virtually no additional complexity, because every router process receives its state information from the configuration servers and no coordination between router processes is required to route data requests.
In some embodiments, configuration server(s) 110-114 are configured to store and manage the database's metadata. In some embodiments, the metadata includes basic information on each shard in the shard cluster including, for example, network communication information, server information, number of chunks of data, chunk version, number of shards of data, shard version, and other management information for routing processes, database management processes, chunk splitting processes, etc. According to some embodiments, chunk information can be the primary data stored by the configuration server(s) 110-114. In some examples, a chunk is defined by a triple (collection, minKey, maxKey), and the metadata stored on the configuration servers establishes the relevant values for a given chunk of data.
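Routing against chunk metadata can be sketched as a range lookup over (collection, minKey, maxKey) triples. This sketch assumes integer shard keys, half-open ranges [minKey, maxKey), and a hypothetical `shard` field that records where each chunk lives.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    collection: str
    min_key: int  # inclusive lower bound (assumed)
    max_key: int  # exclusive upper bound (assumed)
    shard: str    # hypothetical field naming the shard that holds the chunk


def find_shard(chunks: List[Chunk], collection: str, key: int) -> str:
    """Return the shard whose chunk range [min_key, max_key) contains key."""
    ranges = sorted((c for c in chunks if c.collection == collection),
                    key=lambda c: c.min_key)
    starts = [c.min_key for c in ranges]
    # Rightmost chunk whose lower bound is <= key.
    i = bisect.bisect_right(starts, key) - 1
    if i >= 0 and key < ranges[i].max_key:
        return ranges[i].shard
    raise KeyError(f"no chunk of {collection} covers key {key!r}")
```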
In some embodiments, each of the installed configuration server(s) has a complete copy of all the chunk metadata information for the managed database. According to one aspect, various replication strategies can be implemented to maintain consistency between configuration servers. In some embodiments, updates to configuration data stored on the configuration server can require additional processes for ensuring consistency. For example, a two-phase commit operation is used to ensure the consistency of the configuration data amongst the configuration servers. In another example, various atomic commitment protocols (ACP) are used to ensure consistency of the database metadata across the configuration servers.
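A simplified two-phase commit across configuration servers could look like the following sketch. It assumes hypothetical in-memory `ConfigReplica` objects. It also omits the durable logging and coordinator-failure handling that a real atomic commitment protocol needs.

```python
from typing import Dict, List


class ConfigReplica:
    """Hypothetical configuration server participating in two-phase commit."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.data: Dict[str, str] = {}
        self.staged: Dict[str, str] = {}

    def prepare(self, update: Dict[str, str]) -> bool:
        # Phase 1: stage the update and vote yes only if able to commit.
        if not self.healthy:
            return False
        self.staged = dict(update)
        return True

    def commit(self) -> None:
        # Phase 2: make the staged update visible.
        self.data.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        self.staged = {}


def two_phase_commit(servers: List[ConfigReplica], update: Dict[str, str]) -> bool:
    """Apply the update on every server, or on none of them."""
    votes = [s.prepare(update) for s in servers]
    if all(votes):
        for s in servers:
            s.commit()
        return True
    for s in servers:
        s.abort()
    return False
```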
In some embodiments, the driver 204 can be configured to handle interactions with the client 202. For example, the driver 204 can be configured to include a client library which allows the client 202 to submit operations to the database system 200. In some embodiments, the driver 204 can be configured to interact with the client 202 in a software language (e.g., C, C++, C#, Java, Perl, PHP, Python, Ruby, and/or Scala) appropriate for the client 202. For example, the driver 204 can be configured to interact with the client 202 used by an application of the client system. The driver 204 can be configured to provide functions in the language used by the client 202 by which to interact with the database system 200. In some embodiments, the driver 204 can be configured to provide functions by which the client 202 can submit database operations. For example, the driver 204 may receive one or more write operations to be executed by the database system. The write operation(s) may specify insertion of new data, updating of existing data, removal of data, and/or replacement of data. In another example, the driver 204 may receive one or more read operations which specify data that the client 202 requests to read from the database.
In some embodiments, the driver 204 can be configured to generate and transmit commands to the shard server(s) 206 in order to execute requests received from the client 202. In some embodiments, the driver 204 can be configured to generate commands to execute database operations requested by the client 202. For example, the driver 204 may receive one or more write operations from the client 202, generate one or more write commands, and transmit the write command(s) to the shard server(s) 206 to execute the write operation(s).
In some embodiments, the shard server(s) 206 includes a primary node 206A that performs write commands. For example, the shard server(s) 206 may include multiple shard servers of which one is the primary node 206A. In some embodiments, the driver 204 can be configured to transmit commands to perform write operations to the primary node 206A. The write operations performed by the primary node 206A may then be replicated by secondary nodes 206B-C such that the entire replica set is synchronized. In some embodiments, the primary node 206A can be configured to store an operation log of operations that the primary node performs. The secondary nodes 206B-C can be configured to replicate the write operations by reading the operation log and performing the operations such that the secondary nodes match the primary node.
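Operation-log replication can be sketched as follows. The sketch assumes a hypothetical (kind, key, value) entry format and a per-secondary count of applied entries. Real operation logs carry richer entries, such as timestamps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Op = Tuple[str, str, object]  # (kind, key, value); hypothetical oplog entry format


@dataclass
class Node:
    data: Dict[str, object] = field(default_factory=dict)
    oplog: List[Op] = field(default_factory=list)
    applied: int = 0  # number of primary oplog entries this node has applied

    def apply(self, op: Op) -> None:
        kind, key, value = op
        if kind == "set":
            self.data[key] = value
        elif kind == "delete":
            self.data.pop(key, None)


def primary_write(primary: Node, op: Op) -> None:
    """The primary performs the write and records it in its operation log."""
    primary.apply(op)
    primary.oplog.append(op)


def replicate(primary: Node, secondary: Node) -> None:
    """A secondary replays the log entries it has not yet applied."""
    for op in primary.oplog[secondary.applied:]:
        secondary.apply(op)
    secondary.applied = len(primary.oplog)
```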
In some embodiments, a write operation is successfully executed when the primary node 206A has performed the write operation. In some embodiments, a write operation is successfully executed when a write completion requirement is met. In some embodiments, the write completion requirement may specify a threshold number of nodes that must have performed the write operation before the write operation is determined to be complete. In some embodiments, the threshold number of nodes may be a majority of nodes. In some embodiments, the threshold number of nodes may be a number of nodes specified by a user (e.g., by the client 120).
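The write completion requirement reduces to a threshold comparison. The sketch below assumes a hypothetical `requirement` parameter. It is either absent (primary only), the string "majority", or an explicit node count.

```python
from typing import Union


def write_complete(acks: int, total_nodes: int,
                   requirement: Union[str, int, None] = None) -> bool:
    """Decide whether a write meets its completion requirement.

    acks counts nodes (including the primary) that performed the write.
    requirement=None       -> complete once the primary has performed it
    requirement="majority" -> complete once a strict majority has performed it
    requirement=n (int)    -> complete once n nodes have performed it
    """
    if requirement is None:
        threshold = 1
    elif requirement == "majority":
        threshold = total_nodes // 2 + 1
    else:
        threshold = int(requirement)
    return acks >= threshold
```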
In some embodiments, the driver 204 can be configured to determine whether an error occurred while executing a write operation that may have prevented the write operation from being completed successfully. In some embodiments, if the primary node 206A is unavailable, the database system 200 may be unable to perform a write operation. For example, if the primary node 206A shuts down, the driver 204 may be unable to transmit a command to perform a write operation to the primary node 206A during the period that the primary node 206A is shut down, or until a new primary node is selected. If the driver 204 determines occurrence of an error that interfered with communication with the primary node 206A, the driver 204 may determine to retry execution of the write operation. In some embodiments, the driver 204 can be configured to trigger re-execution of the write operation by re-transmitting a write command to the primary node 206A.
In some embodiments, the driver 204 can be configured to determine whether an error occurred in performing the write operation by the shard server(s) 206. In some embodiments, the driver 204 can be configured to receive a response from the primary node 206A indicating an outcome of performing a received write command. In some embodiments, the driver 204 may receive a message indicating an outcome of performing the write command. The message may indicate that the write operation was performed successfully. For example, the message may include a Boolean field that is set to a first value if the write operation was successful, and a second value if the write operation failed. In some embodiments, the message includes an indication of any errors that occurred during performance of a write operation. The message may include error codes associated with failures or problems that occurred during performance of the write operation. In some embodiments, the message may include additional information about the detected error. For example, the message may include a string describing the error.
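A driver could interpret such a response message as in the sketch below. The field names `ok`, `errors`, `code`, and `errmsg` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class WriteOutcome:
    succeeded: bool
    error_codes: List[int] = field(default_factory=list)
    message: str = ""


def parse_response(response: Dict[str, Any]) -> WriteOutcome:
    """Interpret a primary's reply to a write command.

    A write counts as successful only if the Boolean field is set and no
    error codes were reported.
    """
    errors = response.get("errors", [])
    codes = [e["code"] for e in errors]
    text = "; ".join(e.get("errmsg", "") for e in errors)
    succeeded = bool(response.get("ok")) and not codes
    return WriteOutcome(succeeded, codes, text)
```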
In some embodiments, the database system 200 (e.g., the driver 204 and/or one of the nodes 206A-C) can be configured to determine whether certain errors occurred during execution of a write operation. The database system 200 may generate an error code that corresponds to a particular type of error. In some embodiments, the driver 204 can be configured to determine that execution of a write operation failed based on occurrence of the error. Example errors include:
In some embodiments, the driver 204 can be configured to use indication(s) of the outcome(s) to determine whether execution of one or more operations has failed. The driver 204 can be configured to determine occurrence of one or more errors based on error codes generated by the database system 200 during execution of the write operation. In some embodiments, the driver 204 can be configured to determine to trigger re-execution of the write operation(s) based on the indication(s). The driver 204 can re-transmit write command(s) to retry execution of the operation(s). For example, the driver 204 can be configured to determine, based on the indication(s), that a particular type of error occurred. The driver 204 can be configured to re-transmit the command(s) to the primary node 206A to retry execution of the operation(s).
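The decision to re-transmit based on the type of error can be sketched as a set-membership check. The error names below are modeled on MongoDB server error names. Treating exactly this set as retryable is an assumption of the sketch.

```python
from typing import Iterable

# Hypothetical set of error types treated as retryable.
RETRYABLE_ERRORS = frozenset({
    "NotWritablePrimary",
    "PrimarySteppedDown",
    "ShutdownInProgress",
    "InterruptedDueToReplStateChange",
    "HostUnreachable",
    "NetworkTimeout",
})


def should_retransmit(error_names: Iterable[str]) -> bool:
    """Re-send the write command if any reported error is of a retryable type."""
    return any(name in RETRYABLE_ERRORS for name in error_names)
```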
In some embodiments, the driver 204 can be configured to communicate indications of outcomes of the submitted operation(s) to the client 202. For example, the driver 204 may transmit an indication that the submitted operation(s) were successful. In another example, the driver 204 may transmit data retrieved for a read operation. In yet another example, the driver 204 may transmit an indication of an error that occurred and prevented successful execution of the operation(s).
Although the driver 204 is illustrated in
Co-pending U.S. application Ser. No. 15/074,987 entitled “METHOD AND APPARATUS FOR MAINTAINING REPLICA SETS” filed on Mar. 18, 2016, which is incorporated herein by reference, describes example election protocols and replica set architectures that can be augmented with some embodiments.
Process 300 begins at block 302 where the system initiates a session with a client system (e.g., client 202). In some embodiments, the system can be configured to receive a request from the client to establish a session in which the client can request execution of one or more operations. In some embodiments, the session can be configured to represent a set of operations that are submitted by the client. In some embodiments, the system can be configured to generate a data object that stores information related to the session established with the client. In some embodiments, the system can be configured to generate a session identifier for the session. The database can be configured to associate client-submitted operations with the session using the session identifier. For example, the system can be configured to include the session identifier with commands that the system transmits to a primary node for execution of the operations. In some embodiments, the system can be configured to initiate a session in response to initiation of a session on the client system. For example, the client may start a session in order to perform database operations (e.g., reads and/or writes). In response, the system can be configured to initiate a session via which one or more shard servers can receive commands to execute operations requested by the client. In some embodiments, the system can be configured to associate multiple client sessions with a single session through which the server(s) may receive operations. For example, the database system may associate a single session with multiple different client sessions started by different users of a client application.
Next, process 300 proceeds to block 304 where the system receives submissions of one or more operations from the client. In some embodiments, the system can be configured to provide the client a library with which the client can request execution of operations. Through the established session, the system receives submissions of operations from the client. The pseudocode below illustrates initiation of a session and submission of an operation using the session.
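A minimal Python sketch of starting a session and submitting an operation through it follows. It assumes a hypothetical `Client.start_session` entry point, a random UUID as the session identifier, and an `lsid` field that tags each command with that identifier.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Session:
    # Session identifier generated by the system (a random UUID is assumed).
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    submitted: List[Dict[str, Any]] = field(default_factory=list)

    def submit(self, operation: Dict[str, Any]) -> Dict[str, Any]:
        """Associate a client operation with this session by tagging it."""
        command = dict(operation, lsid=self.session_id)
        self.submitted.append(command)
        return command


class Client:
    """Hypothetical client library entry point."""

    def start_session(self) -> Session:
        return Session()
```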
Next, process 300 proceeds to block 306 where the system assigns a unique transaction identifier to received operation(s). In some embodiments, the system can be configured to generate a unique transaction identifier that includes a session identifier and a unique value that identifies the operation within the session. In some embodiments, the unique value can comprise a monotonically increasing number that is incremented for each operation received in the session. In some embodiments, the number may be an integer (e.g., a 32-, 64-, or 128-bit integer) that is assigned to the operation.
In some embodiments, the system only assigns a transaction identifier to certain types of received operations. In some embodiments, the system can be configured to assign a transaction identifier to write operations for which the system may retry execution in the case of failure. In some embodiments, the system may not assign a transaction identifier to certain types of received operations. For example, read operations may not have a need for a unique transaction identifier. In this example, the database system may not assign a transaction identifier to the read operations.
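Transaction identifiers can be assigned only to writes, using a per-session monotonically increasing number, as in the sketch below. The `kind` field and the set of write kinds are hypothetical.

```python
import itertools
from typing import Any, Dict, Optional, Tuple

WRITE_KINDS = {"insert", "update", "delete", "replace"}  # assumed operation kinds


class TxnNumberAllocator:
    """Assign (session_id, txn_number) identifiers to retryable writes only."""

    def __init__(self, session_id: str):
        self.session_id = session_id
        self._counter = itertools.count(1)  # monotonically increasing per session

    def assign(self, op: Dict[str, Any]) -> Optional[Tuple[str, int]]:
        if op.get("kind") not in WRITE_KINDS:
            return None  # e.g. reads receive no transaction identifier
        return (self.session_id, next(self._counter))
```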
Next, process 300 proceeds to block 308 where the system performs the received operation(s). For example, the system may transmit one or more commands to one or more primary nodes of one or more replica sets hosting shards of the database. In response to the received command(s), the primary node(s) may perform updates to stored data, add new data, replace data, delete data, or read data according to the received operation(s). The system may further generate indications of outcomes of performing the operation(s). In the case of a read operation, the system may retrieve data specified in a read operation, and return the data to the client. In the case of a write operation, the system may generate an indication of an outcome of execution of the operation(s). For example, the system can generate a message indicating success or failure of the operation(s) and/or information specifying one or more particular errors that may have occurred during performance of the operation(s). Example processes by which the database system performs write operations are discussed below with reference to
In some embodiments, the system can be configured to use the transaction identifier to limit the number of times that an operation is performed. If the system commands one or more servers to execute an operation, and the server(s) have already performed the operation, the server(s) can return an indication of the outcome of the operation from a previous execution. For example, if the system transmits a command to a primary node to perform an operation, the primary node may have previously performed the operation. In this case, the primary node may return a stored indication of the outcome of previous performance of the operation. To do so, the primary node may recognize that the transaction identifier of the command matches one that was previously received. If the operation was executed successfully, the primary node may ignore the submission and reply with a stored indication of the outcome of the operation. This may prevent operations from inadvertently being executed multiple times. For example, an unreliable network connection may cause a driver to transmit a command to a primary node multiple times. In this case, the primary node may determine not to perform the command again after a successful completion.
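Server-side deduplication keyed on the transaction identifier can be sketched as below. `PrimaryNode` is a hypothetical class. The sketch stores every outcome but replays only successful ones, so a previously failed attempt can still be re-executed.

```python
from typing import Any, Callable, Dict, Tuple

TxnId = Tuple[str, int]  # (session identifier, transaction number)


class PrimaryNode:
    """Hypothetical primary that remembers outcomes by transaction identifier."""

    def __init__(self):
        self.outcomes: Dict[TxnId, Dict[str, Any]] = {}
        self.executions = 0

    def execute(self, txn_id: TxnId,
                perform: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
        previous = self.outcomes.get(txn_id)
        if previous is not None and previous.get("ok"):
            return previous  # already succeeded: reply with the stored outcome
        self.executions += 1
        outcome = perform()
        self.outcomes[txn_id] = outcome
        return outcome
```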
Next, process 300 proceeds to block 310 where the system determines whether the session has ended. If the system determines that the session has ended 310, YES, process 300 ends. For example, the client may have completed a set of operations. In this case, the system can end the session. If the system determines that the session has not ended 310, NO, the process 300 returns to block 304 where the system may continue receiving operations from the client in the session.
Process 400 begins at block 402 where the system receives a write operation. The system may receive a write operation (e.g., update, insert, replace, and/or delete) to make a modification in the database. For example, the client may submit an operation to modify one or more documents stored in the database.
Next, process 400 proceeds to block 404, where the database system executes the write operation. In some embodiments, the system can be configured to transmit a write command to one or more shard servers to execute the write operation. The database system can be configured to transmit a write command to a primary node of a replica set to perform the write operation in the database. For example, the primary node of the replica set may perform the write operation in a shard hosted by the replica set. The write operation may then be replicated by one or more secondary nodes of a replica set. In some embodiments, the system can be configured to generate a unique transaction identifier associated with the command. In some embodiments, the database system can be configured to select a shard server that is to execute the write operation. For example, nodes of a replica set may be separate shard servers. To execute the write operation, the system can be configured to select the primary node shard server to execute the write operation.
In some embodiments, the system can be configured to select a shard server to perform an operation from among multiple shard servers of a sharded cluster. The system locates one or more shards that are associated with the write operation. For example, the system locates the shard to which a new document is to be added by the write operation. In another example, the system locates a shard where an existing document is to be updated by performing the write operation. The system then transmits a write command to a shard server storing the located shard. In some embodiments, the shard may be hosted by a replica set which includes multiple nodes which each host a copy of the shard. The system can be configured to transmit the command to the primary node of the replica set in order to perform the write operation.
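Locating the shard and then choosing its primary can be sketched as two lookups. The `shard_for_key` and `replica_sets` tables and the `Member` type are hypothetical stand-ins for the routing metadata.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Member:
    host: str
    is_primary: bool
    reachable: bool = True


def select_write_target(shard_for_key: Dict[str, str],
                        replica_sets: Dict[str, List[Member]],
                        key: str) -> Optional[Member]:
    """Locate the shard for a document key, then pick that replica set's primary.

    Returns None when no reachable primary can be selected.
    """
    shard = shard_for_key.get(key)
    if shard is None:
        return None
    for member in replica_sets.get(shard, []):
        if member.is_primary and member.reachable:
            return member
    return None
```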
An example process 500 of executing a write operation is discussed below with reference to
Next, process 400 proceeds to block 406, where the system determines whether execution of the write operation has failed. In some embodiments, a shard server that is performing the operation in response to a write command from the system can be configured to generate an indication of an outcome from performance of the operation. The indication of the outcome may indicate whether the execution of the operation failed. In some embodiments, the shard server can be configured to generate an indication of an error that occurred when executing the operation. For example, the shard server can be configured to generate an error code indicating a specific problem that occurred when performing the operation. Example problems that may cause failure of execution of a write operation are discussed herein.
In some embodiments, a primary node can be configured to receive a write command from a system to perform the write operation. The primary node may generate a response to performing the write command. In some embodiments, the primary node can be configured to generate a value indicating whether the operation was completed successfully. For example, the primary node can be configured to generate an output message that the primary node transmits to the system which includes a Boolean value indicating whether the operation was successfully performed. In some embodiments, the primary node can be configured to generate an indication of an error (e.g., an error code and/or error description) that occurred during performance of an operation. The primary node can be configured to generate an output message that includes error codes associated with one or more errors that occurred during performance of the write operation. The primary node may transmit the message to a system of the database system.
In some embodiments, the shard server can be configured to store an indication of the outcome for a transaction identifier of the write operation. In some embodiments, the shard server can be configured to store the indication of the outcome in a session data object. The shard server may store a record of received write commands and respective outcomes received during a session. For example, the shard server may store a document that includes a record of write commands received from a system during a session. In some embodiments, the shard server may store an outcome of performing the write command in the record of the write command.
In some embodiments, the system can be configured to use an outcome of a write operation received in response to transmission of a write command to determine whether the write operation failed. In some embodiments, the system may determine whether the write operation failed based on a response received from a shard server (e.g., a primary node) indicating the outcome of the shard server performing the write operation in response to a write command transmitted by the system. For example, if the indication of the outcome is an error code indicating a problem that may have prevented the primary node from successfully performing the operation, the system may determine that the write operation failed. In another example, the shard server may generate an acknowledgement message that indicates that the write operation was successful. In this instance, the system may determine that the write operation did not fail.
In some embodiments, the system can be configured to determine that execution of the write operation failed based on an error that interfered with communication with a shard server. For example, a network error may have prevented the system from connecting to a primary node to perform a write operation. In another example, a network error may have interrupted a connection between the system and the primary node. The system may determine that execution of the write operation failed if a network error occurs.
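The two failure signals above, an error response and a network error, can be folded into one check. The `send` callable is a hypothetical transport. Python's built-in `ConnectionError` and `TimeoutError` stand in for network errors.

```python
from typing import Any, Callable, Dict, Tuple


def attempt_write(send: Callable[[Dict[str, Any]], Dict[str, Any]],
                  command: Dict[str, Any]) -> Tuple[bool, str]:
    """Return (failed, reason) for one transmission of a write command."""
    try:
        response = send(command)
    except (ConnectionError, TimeoutError) as exc:
        # Connection could not be made or was interrupted.
        return True, f"network error: {exc}"
    if not response.get("ok"):
        # The primary reported a problem performing the write.
        return True, response.get("errmsg", "unknown error")
    return False, "acknowledged"
```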
If, at block 406, the system determines that the execution of the write operation did not fail 406, NO, process 400 proceeds to block 410 where the database system outputs an indication of an outcome. For example, the database system may output a message to the client that the write operation was successful. After outputting the indication of the outcome, process 400 returns to block 402 where the database system may receive another write operation to execute.
If, at block 406, the system determines that the execution of the write operation failed 406, YES, process 400 proceeds to block 408 where the system determines whether to retry execution of the write operation. In some embodiments, the system can be configured to limit retrying of a write operation to a threshold number of times. The threshold may be 2, 3, 4, 5, 6, 7, 8, 9, or 10 retries. In some embodiments, the system can be configured to retry execution of the write operation one time, because if the execution fails on a single retry, the system may determine that the problem is persistent. As a result, retrying execution of the operation again may waste computational resources and time for the client system. If the system determines that the threshold number of retries has been performed, the system may determine not to retry execution of the write operation.
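A bounded retry loop can be sketched as follows. `attempt` is a hypothetical callable that performs one execution and reports whether it failed.

```python
from typing import Callable, Tuple


def execute_with_retries(attempt: Callable[[], Tuple[bool, str]],
                         max_retries: int = 1) -> Tuple[bool, str, int]:
    """Run a write, retrying on failure at most `max_retries` times.

    `attempt` returns (failed, detail). Returns (failed, detail, attempts).
    The default of one retry reflects the embodiment that treats a failure
    on the single retry as a persistent problem.
    """
    attempts = 0
    while True:
        attempts += 1
        failed, detail = attempt()
        if not failed or attempts > max_retries:
            return failed, detail, attempts
```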
In some embodiments, the system can be configured to wait for a period of time before retrying execution of the write operation. The system can be configured to wait for the period of time in order to allow the shard server(s) to recover from a problem that prevented execution of the write operation in the previous attempt. In some embodiments, the system can be configured to wait for the period of time before re-transmitting a write command to a primary node. For example, if a primary node was not available to execute the write operation in the first attempt, a new primary node may now be available to execute the write operation. In another example, if the primary node shut down and thus could not execute the write operation, the period of time may allow the shard server to restart. In some embodiments, the system can be configured to monitor the status of a shard server (e.g., a primary node) that was unavailable to perform the write operation. The system can be configured to wait for the period of time to allow recovery from the problem that caused the error. For example, the period of time may allow a new primary node to be elected or allow a shard server to start up after being shut down. When the system has determined that the primary node has recovered from the problem, the system may determine to retry execution of the write operation. For example, the system can re-transmit a write command to the primary node to perform the write operation.
In some embodiments, the system can be configured to have a maximum period of time for which the system will wait to retry execution of the write operation. If a failure or problem persists beyond the maximum period of time, the system can be configured to determine not to retry execution of the write operation. If the system recovers from the failure within the maximum period of time, the system may determine to retry execution of the write operation. This may allow retrying of write operations that failed because of temporary problems, while avoiding delays caused by attempting to retry execution for persistent problems. For example, if a first execution of the write operation failed due to a server shutdown, the database system may wait up to the maximum period of time. If the server does not recover within the maximum period of time, the database system may determine not to retry execution of the write operation. If the server does recover within the maximum period of time, the database system may determine to retry execution of the write operation.
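Waiting for recovery up to a maximum period can be sketched as a polling loop with a deadline. `is_available` is a hypothetical status probe. The clock and sleep functions are injectable, so the behavior can be checked without real delays.

```python
import time
from typing import Callable


def wait_for_recovery(is_available: Callable[[], bool],
                      max_wait: float,
                      poll_interval: float = 0.05,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll a server's status until it recovers or max_wait seconds elapse.

    Returns True if the server became available in time (so the write may be
    retried) and False otherwise (so the original error is reported).
    """
    deadline = clock() + max_wait
    while True:
        if is_available():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(poll_interval, remaining))
```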
In some embodiments, the system can be configured to determine whether to retry execution of the write operation based on whether the system is able to select a server to execute the write operation. For example, the system may select a server that is a primary node of a replica set to perform the write operation. If the system is unable to select a server to perform the write operation, the system may determine not to retry execution of the write operation. In some embodiments, the system can be configured to determine whether to retry execution of the write operation based on whether a selected server to execute the write operation is capable of retrying write operations. For example, some servers may run older software that lacks the capability to retry execution of a write operation. If the system determines that a selected server is unable to retry execution of the write operation, the system may determine not to retry execution of the write operation. In some embodiments, the system can be configured to search for another server that can retry execution of the operation.
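The two eligibility checks above, whether a server can be selected and whether it supports retrying, can be sketched as follows. The `Server` type and its `supports_retryable_writes` flag are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    supports_retryable_writes: bool  # hypothetical capability flag


def choose_retry_target(selectable: List[Server],
                        search_alternatives: bool = False) -> Optional[Server]:
    """Return a server on which to retry the write, or None to give up.

    Without search_alternatives only the first selectable server is checked;
    with it, the remaining servers are searched for one that supports retries.
    """
    if not selectable:
        return None  # unable to select a server: do not retry
    if not search_alternatives:
        first = selectable[0]
        return first if first.supports_retryable_writes else None
    return next((s for s in selectable if s.supports_retryable_writes), None)
```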
If, at block 408, the system determines to retry execution of the write operation 408, YES, process 400 proceeds to block 404 where the system retries execution of the write operation. In some embodiments, the system can be configured to re-transmit a write command to a primary node to perform the write operation. For example, the system can be configured to determine, based on an indication of an error in a response from the primary node, or from occurrence of a network error, that the write operation failed. The system may then re-transmit a write command to the primary node to perform the write operation. In some embodiments, when the system retries execution of the write operation, the system can be configured to reselect a server that is to perform the write operation. For example, if a primary node of a replica set changes, the system may select a different server than the one in a previous execution attempt to execute the write operation.
If, at block 408, the system determines not to retry execution of the write operation 408, NO, then process 400 proceeds to block 410 where the database system outputs an indication of the outcome. In this case, the system may output to the client an indication that the write operation has failed. For example, the system can be configured to output an error code that indicates a problem that prevented the write operation from completing. The client may use the indication of the outcome to take appropriate action. After outputting the indication of the outcome at block 410, process 400 returns to block 402 where the system may receive another write operation to execute.
Process 500 begins at block 502 where the system generates a write command. In some embodiments, the system may receive a submission of a request to complete a write operation from a client system. The system can be configured to generate a write command for executing the write operation. In some embodiments, the write operation may be represented in the database system by the generated write command. In some embodiments, the system can be configured to encode the write command that is then submitted to one or more shard servers for execution. In some embodiments, the system can be configured to (1) generate a transaction identifier that is included in the write command, (2) include a specification of which data items in the database to update, (3) include a specification of the updates to make to those data items, and/or (4) include a configuration setting of options associated with the operation. The code below illustrates an example of a write command generated by the system.
The above encoded write command specifies an update to a collection. The write command includes a transaction identifier which includes a session identifier (lsid) and a number (txnNumber) associated with the write operation. The write command further includes the updates to make to the collection (updates), and a setting of the “ordered” configuration for the operation. In some embodiments, the system can be configured to generate the write command as a data item. In some embodiments, the system can be configured to generate the write command as a JSON object, a BSON object, a text file, or in another format.
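The sketch below assembles a command with the fields described above (`update`, `updates`, `ordered`, `lsid`, `txnNumber`) and encodes it as JSON. The `{"id": ...}` shape of `lsid` and the `q`/`u` form of each update entry are assumptions.

```python
import json
import uuid
from typing import Any, Dict, List


def build_update_command(collection: str, session_id: uuid.UUID, txn_number: int,
                         updates: List[Dict[str, Any]],
                         ordered: bool = True) -> Dict[str, Any]:
    """Assemble an update command that carries its transaction identifier."""
    return {
        "update": collection,              # target collection
        "updates": updates,                # the modifications to make
        "ordered": ordered,                # "ordered" configuration setting
        "lsid": {"id": str(session_id)},   # session identifier (assumed shape)
        "txnNumber": txn_number,           # number of the write within the session
    }


def encode_command(command: Dict[str, Any]) -> str:
    """Encode the command as JSON text; BSON or another format would also work."""
    return json.dumps(command, sort_keys=True)
```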
After generating the write command, process 500 proceeds to block 504 where the system attempts execution of the write command. In some embodiments, the system transmits the write command to a shard server that attempts to execute the write command. For example, the primary node may execute the write command according to the encoded instructions. In some embodiments, the system may transmit the write command to a primary node of a replica set. If successfully performed by the primary node, secondary nodes of the replica set may replicate the write operation as described herein.
In some embodiments, the system can be configured to select a shard server that is to perform the write command. For example, the write command may specify a write operation that updates data in a shard hosted by a replica set that includes a primary node and one or more secondary nodes. Each of the nodes may be a separate shard server. In some embodiments, the primary node may be configured to handle write operations. The system can be configured to select the primary node to perform the write operation when executing the write command.
Next, process 500 proceeds to block 506 where the system determines an outcome of the attempted execution of the write command. In some embodiments, a shard server that attempts to perform the write operation may determine whether the write operation succeeded or failed. In some embodiments, the system can be configured to determine whether an error has occurred that affected execution of the write operation. Examples of errors are discussed herein. In some embodiments, the shard server can be configured to generate an indication of the outcome. For example, the system can be configured to generate an acknowledgement indicating that the execution was successful or an error code indicating a problem that occurred during an attempted execution. In some embodiments, the system can be configured to use the generated indication to determine whether to retry execution of the write operation (e.g., as described in process 400 discussed above with reference to
At 613, the system generates a transaction identifier and assigns it to the write operation. For example, the system may generate the transaction identifier and store it in a write command as described at block 502 of process 500 above with reference to
At 615, the system selects a server to retry execution of the write operation. For example, the system selects a primary node of a replica set hosting a shard to perform the write operation. If the system is unable to select a server to retry execution of the write operation, then the system outputs the original error. At 616, the system determines if a selected server supports retrying execution of the write operation. If the selected server does not support retrying execution of the write operation, then the system may output the original error. At 617, the system retries execution of the write operation if it had previously failed. For example, if at 614, the system had determined occurrence of either a network error or that the primary node was unavailable, the system may trigger re-execution of the write operation at 617. The system may re-transmit a write command to a primary node to perform the write operation. If retrying execution of the write operation returns a second error, the database system outputs the second error.
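Steps 615 through 617 can be sketched as a wrapper that retries a failed write at most once. This is a minimal illustration under several assumptions:

- The error classes, the `Server` type with its `supports_retry` flag, and the `select_server` and `send` callables are hypothetical.
- Only a network error or an unavailable primary is treated as retryable, following the example given for step 614.

The identical command, and therefore the same transaction identifier, is re-sent on retry.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


class WriteError(Exception):
    """Hypothetical base class for errors returned by a write attempt."""


class NetworkError(WriteError):
    pass


class PrimaryUnavailableError(WriteError):
    pass


# Errors that trigger a retry in this sketch (the step 614 examples).
RETRYABLE = (NetworkError, PrimaryUnavailableError)


@dataclass
class Server:
    host: str
    supports_retry: bool
    send: Callable[[Dict[str, Any]], Dict[str, Any]]


def execute_with_retry(command: Dict[str, Any],
                       select_server: Callable[[], Optional[Server]]) -> Dict[str, Any]:
    server = select_server()
    if server is None:
        raise PrimaryUnavailableError("no server available for the write")
    try:
        return server.send(command)
    except RETRYABLE as original_error:
        # Step 615: select a server for the retry; if none, output the original error.
        retry_server = select_server()
        if retry_server is None:
            raise original_error
        # Step 616: if the selected server cannot retry writes, output the original error.
        if not retry_server.supports_retry:
            raise original_error
        # Step 617: re-send the identical command (same lsid and txnNumber).
        # If this attempt also fails, its (second) error propagates to the caller.
        return retry_server.send(command)


attempts = []


def flaky_send(cmd):
    attempts.append(cmd["txnNumber"])
    if len(attempts) == 1:
        raise NetworkError("connection reset")  # first attempt fails
    return {"ok": 1, "n": 1}


server = Server("node-b:27017", supports_retry=True, send=flaky_send)
result = execute_with_retry({"update": "orders", "txnNumber": 7}, lambda: server)
print(result, attempts)  # {'ok': 1, 'n': 1} [7, 7]
```

Errors outside `RETRYABLE`, such as a constraint violation, are not caught and reach the caller after a single attempt.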
Example Computer System
Referring to
As illustrated in
The memory 712 stores programs (e.g., sequences of instructions coded to be executable by the processor 710) and data during operation of the computer system 702. The memory 712 may be a relatively high-performance, volatile, random access memory such as a dynamic random access memory (“DRAM”) or static random access memory (“SRAM”). However, the memory 712 may include any device for storing data, such as a disk drive or other nonvolatile storage device. Various examples may organize the memory 712 into particularized and, in some cases, unique structures to perform the functions disclosed herein. These data structures may be sized and organized to store values for particular data and types of data.
Components of the computer system 702 are coupled by an interconnection element such as the interconnection mechanism 714. The interconnection element 714 may include any communication coupling between system components such as one or more physical busses in conformance with specialized or standard computing bus technologies such as IDE, SCSI, PCI and InfiniBand. The interconnection element 714 enables communications, including instructions and data, to be exchanged between system components of the computer system 702.
The computer system 702 also includes one or more interface devices 716 such as input devices, output devices and combination input/output devices. Interface devices may receive input or provide output. More particularly, output devices may render information for external presentation. Input devices may accept information from external sources. Examples of interface devices include keyboards, mouse devices, trackballs, microphones, touch screens, printing devices, display screens, speakers, network interface cards, etc. Interface devices allow the computer system 702 to exchange information and to communicate with external entities, such as users and other systems.
The data storage element 718 includes a computer readable and writeable nonvolatile, or non-transitory, data storage medium in which instructions are stored that define a program or other object that is executed by the processor 710. The data storage element 718 also may include information that is recorded on or in the medium and that is processed by the processor 710 during execution of the program. More specifically, the information may be stored in one or more data structures specifically configured to conserve storage space or increase data exchange performance. The instructions may be persistently stored as encoded signals, and the instructions may cause the processor 710 to perform any of the functions described herein. The medium may, for example, be an optical disk, magnetic disk, or flash memory, among others. In operation, the processor 710 or some other controller causes data to be read from the nonvolatile recording medium into another memory, such as the memory 712, that allows for faster access to the information by the processor 710 than does the storage medium included in the data storage element 718. The memory may be located in the data storage element 718 or in the memory 712; however, the processor 710 manipulates the data within the memory and then copies the data to the storage medium associated with the data storage element 718 after processing is completed. A variety of components may manage data movement between the storage medium and other memory elements, and examples are not limited to particular data management components. Further, examples are not limited to a particular memory system or data storage system.
Although the computer system 702 is shown by way of example as one type of computer system upon which various aspects and functions may be practiced, aspects and functions are not limited to being implemented on the computer system 702 as shown in
The computer system 702 may be a computer system including an operating system that manages at least a portion of the hardware elements included in the computer system 702. In some examples, a processor or controller, such as the processor 710, executes an operating system. Examples of a particular operating system that may be executed include:

- a Windows-based operating system, such as Windows NT, Windows 2000 (Windows ME), Windows XP, Windows Vista, or Windows 7 or 8, available from the Microsoft Corporation;
- a Mac OS X operating system or an iOS operating system, available from Apple Computer;
- one of many Linux-based operating system distributions, for example, the Enterprise Linux operating system available from Red Hat Inc.;
- a Solaris operating system, available from Oracle Corporation; or
- a UNIX operating system, available from various sources.

Many other operating systems may be used, and examples are not limited to any particular operating system.
The processor 710 and operating system together define a computer platform for which application programs in high-level programming languages are written. These component applications may be executable, intermediate, bytecode or interpreted code which communicates over a communication network, for example, the Internet, using a communication protocol, for example, TCP/IP. Similarly, aspects may be implemented using an object-oriented programming language, such as .Net, SmallTalk, Java, C++, Ada, C# (C-Sharp), Python, or JavaScript. Other object-oriented programming languages may also be used. Alternatively, functional, scripting, or logical programming languages may be used.
Additionally, various aspects and functions may be implemented in a non-programmed environment. For example, documents created in HTML, XML or other formats, when viewed in a window of a browser program, can render aspects of a graphical-user interface or perform other functions. Further, various examples may be implemented as programmed or non-programmed elements, or any combination thereof. For example, a web page may be implemented using HTML while a data object called from within the web page may be written in C++. Thus, the examples are not limited to a specific programming language and any suitable programming language could be used. Accordingly, the functional components disclosed herein may include a wide variety of elements (e.g., specialized hardware, executable code, data structures or objects) that are configured to perform the functions described herein.
In some examples, the components disclosed herein may read parameters that affect the functions performed by the components. These parameters may be physically stored in any form of suitable memory including volatile memory (such as RAM) or nonvolatile memory (such as a magnetic hard drive). In addition, the parameters may be logically stored in a proprietary data structure (such as a database or file defined by a user space application) or in a commonly shared data structure (such as an application registry that is defined by an operating system). In addition, some examples provide for both system and user interfaces that allow external entities to modify the parameters and thereby configure the behavior of the components.
Based on the foregoing disclosure, it should be apparent to one of ordinary skill in the art that the embodiments disclosed herein are not limited to a particular computer system platform, processor, operating system, network, or communication protocol. Also, it should be apparent that the embodiments disclosed herein are not limited to a specific architecture or programming language.
It is to be appreciated that embodiments of the methods and apparatuses discussed herein are not limited in application to the details of construction and the arrangement of components set forth in the following description or illustrated in the accompanying drawings. The methods and apparatuses are capable of implementation in other embodiments and of being practiced or of being carried out in various ways. Examples of specific implementations are provided herein for illustrative purposes only and are not intended to be limiting. In particular, acts, elements and features discussed in connection with any one or more embodiments are not intended to be excluded from a similar role in any other embodiments.
Also, the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. Any references to embodiments or elements or acts of the systems and methods herein referred to in the singular may also embrace embodiments including a plurality of these elements, and any references in plural to any embodiment or element or act herein may also embrace embodiments including only a single element. References in the singular or plural form are not intended to limit the presently disclosed systems or methods, their components, acts, or elements. The use herein of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. References to “or” may be construed as inclusive so that any terms described using “or” may indicate any of a single, more than one, and all of the described terms. Use of “at least one of” with a list of elements (e.g., A, B, C) is intended to cover any one selection from A, B, C (e.g., A), any two selections from A, B, C (e.g., A and B), any three selections (e.g., A, B, C), etc., and any multiples of each selection.
Having thus described several aspects of at least one embodiment of this invention, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the spirit and scope of the invention. Accordingly, the foregoing description and drawings are by way of example only.
Prior Publication Data

Number | Date | Country |
---|---|---|
20180365114 A1 | Dec 2018 | US |

Provisional Applications

Number | Date | Country |
---|---|---|
62522540 | Jun 2017 | US |
62522150 | Jun 2017 | US |