This technology generally relates to data storage systems and more particularly to methods and devices for implementing NoSQL database snapshots.
Enterprises increasingly have a need to store large amounts of data in data storage systems to support modern applications. However, relational databases are not generally designed to effectively scale and often are not agile enough to meet current application requirements. Accordingly, enterprises are increasingly employing NoSQL (e.g., “No SQL” or “Not Only SQL”) databases to meet their data storage needs. NoSQL databases are able to store large volumes of data in a highly scalable infrastructure that is relatively fast and agile and generally provides superior performance over relational databases. Advantageously, NoSQL databases can be manipulated through object-oriented Application Programming Interfaces (APIs), offer relatively reliable code integration, and require fewer administrator resources to set up and manage as compared to relational databases.
However, current NoSQL databases are designed to be leveraged as solutions for relatively near term data storage. Accordingly, NoSQL databases do not provide data management features useful for relatively long term data storage, such as snapshots and aging or migration of data to different types of devices or media over time. In particular, current NoSQL databases do not provide effective tiering of ingested or stored data, such as in various types of media, devices, or infrastructure (e.g., flash, SSDs, HDDs, or cloud storage) that might satisfy various data management, data protection and availability, or data access policies established by application administrators.
Additionally, current NoSQL databases also do not provide effective snapshot capabilities that allow users to save and restore the state of the database a particular point in time, which is a desirable feature particularly for relatively long term data storage. More specifically, many NoSQL databases do not support transactions, and therefore cannot provide a consistent image of the database. Other NoSQL databases support snapshots in a storage-inefficient manner or with a significant and negative impact on database performance.
A method for implementing NoSQL database snapshots includes generating by a system node computing device a snapshot identifier and returns the snapshot identifier in response to a received request to create a snapshot of a NoSQL database. When an entry in a transaction table has a first transaction value corresponding to a transaction that has been committed, and a second transaction value that is not assigned or corresponds to another transaction that has not been committed, is determined by the system node computing device. The snapshot identifier is inserted, into the entry when the entry is determined to have the first transaction value corresponding to the transaction that has been committed and the second transaction value that is not assigned or corresponds to the another transaction that has not been committed. The determining and inserting steps are repeated by the system node computing device for each additional entry.
A non-transitory computer readable medium having stored thereon instructions for implementing NoSQL database snapshots comprising executable code which when executed by a processor, causes the processor to perform steps including generating a snapshot identifier and returns the snapshot identifier in response to a received request to create a snapshot of a NoSQL database. When an entry in a transaction table has a first transaction value corresponding to a transaction that has been committed, and a second transaction value that is not assigned or corresponds to another transaction that has not been committed, is determined. The snapshot identifier is inserted into the entry when the entry is determined to have the first transaction value corresponding to the transaction that has been committed and the second transaction value that is not assigned or corresponds to the another transaction that has not been committed. The determining and inserting steps are repeated for each additional entry.
A system node computing device including a processor and a memory coupled to the processor which is configured to be capable of executing programmed instructions comprising and stored in the memory to generate a snapshot identifier and returns the snapshot identifier in response to a received request to create a snapshot of a NoSQL database. When an entry in a transaction table has a first transaction value corresponding to a transaction that has been committed, and a second transaction value that is not assigned or corresponds to another transaction that has not been committed, is determined. The snapshot identifier is inserted into the entry when the entry is determined to have the first transaction value corresponding to the transaction that has been committed and the second transaction value that is not assigned or corresponds to the another transaction that has not been committed. The determining and inserting steps are repeated for each additional entry.
This technology provides a number of advantages including providing methods, non-transitory computer readable media, and devices that facilitate a NoSQL database with integrated management. With this technology, relatively fast, highly available, and application integrated NoSQL databases can be established in a data storage network such that various data management policies are automatically implemented. This technology also provides a highly scalable infrastructure that can add capacity dynamically and that optimizes the storage of data on types of media having different characteristics in order to provide cost-effective storage meeting end-user needs. Moreover, snapshot operations are provided with this technology that are lightweight and storage-efficient and that facilitate snapshot management even for large scale NoSQL databases storing a significant amount of data.
A network environment 10 including exemplary application server computing devices 12(1)-12(n) that access and utilize a data storage network 14 via communication network(s) 16 is illustrated in
Each of the groups 20(2)-20(4) and 20(5)-20(7) in each respective zone 18(2) and 18(3) in this example have the same topology satisfying one or more service level capabilities (SLCs) and zone 18(1) is a system zone storing metadata for the data storage network 14, as described and illustrated in more detail later. This technology provides a number of advantages including methods, non-transitory computer readable media, and devices that facilitate a NoSQL database with integrated data management and more effectively manage stored data to satisfy SLCs required by application administrators.
Referring to
The application server computing device 12 in this example includes a processor 24, a memory 26, and a communication interface 28, which are all coupled together by a bus 30 or other communication link, although the application server computing device 12 can have other types and numbers of components or other elements. The processor 24 of the application server computing device 12 executes a program of stored instructions for one or more aspects of this technology, as described and illustrated by way of the embodiments herein, although the processor 24 could execute other numbers and types of programmed instructions. The processor 24 in the application server computing device 12 may include one or more central processing units or general purpose processors with one or more processing cores, for example.
The memory 26 of the application server computing device 12 may include any of various forms of read only memory (ROM), random access memory (RAM), Flash memory, non-volatile, or volatile memory, or the like, or a combination of such devices for example. In this particular example, the memory 26 further includes hosted application(s) 32 and a software development kit (SDK) 34 that includes a REpresentational State Transfer (REST) Application Programming Interface (API) 36.
The application(s) 32 generally require large scale data storage over a relatively long period of time in this example. The SDK 34 defines functionality for establishing, interacting with, and managing a NoSQL database hosted by the data storage network 14. The SDK 34 can leverage the REST API 36 to make calls to one or more of the system node computing devices 21(1)-21(3) or storage node computing devices 22(1)-22(15) by mapping operations in the HTTP protocol, as described and illustrated in more detail later. In other examples, the application server computing device 12 can invoke operations and submit transactions by calling the REST API 36 (over HTTP) directly. While in this example the SDK 34 is co-located on the application server computing device 12 with the application(s) 32, in other examples, the data storage network 16 can include a front-end server computing device (not shown) that encapsulates the logic of the SDK 34, and other implementations of and locations for the SDK 34 logic can also be used.
The communication interface 30 of the application server computing device 12 operatively couples and communicates over communication network(s) 16 between the application server computing device 12 and at least those system and storage node computing devices 21(3), 22(6), 22(7), 22(11), 22(14), 22(15), and 22(17) designated as primary storage nodes in this example, although other types and numbers of communication networks or systems with other types and numbers of connections and configurations to other devices and elements can also be used. By way of example only, the communication network(s) 16 can use TCP/IP over Ethernet, although other types and numbers of communication networks can be used. The communication network(s) 16 in this example may employ any suitable interface mechanisms and network communication technologies including, for example, teletraffic in any suitable form (e.g., voice, modem, and the like), Public Switched Telephone Network (PSTNs), Ethernet-based Packet Data Networks (PDNs), or combinations thereof.
Referring more specifically to
In this particular example, the system node computing device 21 includes a processor 38, a memory 40, and a communication interface 42, which are all coupled together by a bus 44 or other communication link, although the system node computing device 21 can have other types and numbers of components or other elements. The processor 38 of the system node computing device 21 executes a program of stored instructions for one or more aspects of this technology, as described and illustrated by way of the embodiments herein, although the processor 38 could execute other numbers and types of programmed instructions. The processor 38 may include one or more central processing units or general purpose processors with one or more processing cores, for example.
The memory 40 of the system node computing device 21 may include any of various forms of read only memory (ROM), random access memory (RAM), Flash memory, non-volatile, or volatile memory, or the like, or a combination of such devices for example. In this example, the memory 40 further includes system tables 46 that store metadata for the data storage network 14 including information regarding the nodes, groups, zones, databases, tables, and transactions, for example, as described and illustrated in more detail later.
The communication interface 42 of the system node computing device 21 in this example operatively couples and communicates between the system node computing device 21 and an associated primary system node, when the system node computing device 21 is designated as a secondary system node, or the application server computing devices 12(1)-12(n), when the system node computing device 21 is designated as a primary system node, via the communication network(s) 16, although other types and numbers of communication networks or systems with other types and numbers of connections and configurations to other devices and elements can also be used.
Referring more specifically to
The storage node computing devices 22(1)-22(15) can be designated as primary or secondary, as described and illustrated in more detail later, and have associated roles and responsibilities with respect to the associated ones of the groups 20(2)-20(7) based on the designation, also as described and illustrated in more detail later. However, the storage node computing devices 22(1)-22(15) that are in the same one of the groups 20(2)-20(7) satisfy the same set of SLCs associated with the corresponding one of the groups 20(2)-20(7) and the associated one of the zones 18(1)-18(3), also as described and illustrated in more detail later.
In this particular example, the storage node computing device 22 includes a processor 38, a memory 50, and a communication interface 52, which are all coupled together by a bus 54 or other communication link, although the storage node computing device 22 can have other types and numbers of components or other elements. The processor 48 of the storage node computing device 22 executes a program of stored instructions for one or more aspects of this technology, as described and illustrated by way of the embodiments herein, although the processor 48 could execute other numbers and types of programmed instructions. The processor 48 may include one or more central processing units or general purpose processors with one or more processing cores, for example.
The memory 50 of the storage node computing device 22 may include any of various forms of read only memory (ROM), random access memory (RAM), Flash memory, non-volatile, or volatile memory, or the like, or a combination of such devices for example. In this example, the memory 50 further includes storage devices 56(1)-56(n). The storage device(s) 56(1)-56(n) can include optical disk-based storage, solid state drives, tape drives, flash-based storage, other types of hard disk drives, or any other type of storage devices suitable for storing items for short or long term retention, for example. Other types and numbers of storage devices can be included in the memory 50 or coupled to the storage node computing device 22 in other examples.
The communication interface 52 of the storage node computing device 22 in this example operatively couples and communicates between the storage node computing device 22 and an associated primary storage node, when the storage node computing device 22 is designated as a secondary storage node, or the application server computing devices 12(1)-12(n), when the storage node computing device 22 is designated as a primary storage node, via the communication network(s) 16, although other types and numbers of communication networks or systems with other types and numbers of connections and configurations to other devices and elements can also be used.
Although examples of the application server computing devices 12(1)-12(n), system node computing devices 21(1)-21(3), and storage node computing devices 22(1)-22(15) are described herein, it is to be understood that the devices and systems of the examples described herein are for exemplary purposes, as many variations of the specific hardware and software used to implement the examples are possible, as will be appreciated by those skilled in the relevant art(s). In addition, two or more computing systems or devices can be substituted for any one of the systems in any embodiment of the examples.
The examples also may be embodied as one or more non-transitory computer readable media having instructions stored thereon for one or more aspects of the present technology, as described and illustrated by way of the examples herein, which when executed by a processor, cause the processor to carry out the steps necessary to implement the methods of this technology, as described and illustrated with the examples herein.
An exemplary method for facilitating a NoSQL database with integrated management will now be described with reference to
Referring more specifically to
Additionally, in this example, the system node computing device 21(3) is designated in the bootstrap request as a primary storage node computing device for the group 18(1), and the system node computing devices 21(2) and 21(3) are designated in the bootstrap request as secondary system node computing devices for the group 18(1). Accordingly, all system operations and other transactions requested by the one of the application server computing devices 12(1)-12(n) or an administrator computing device will be initially handled by the primary system node computing device 21(3), which will then decide whether to access the secondary system node computing devices 21(2) and 21(3) in order to service the request (e.g., for capacity, scalability, or redundancy/data protection purposes, although other reasons for utilizing the secondary system node computing devices 21(2) and 21(3) can also be used) or to instruct the requesting computing device to access one of the primary storage node computing devices 22(3), 22(4), 22(8), 22(11), 22(12), or 22(14) of another one of the zones 18(2) or 18(3).
In this example, the system node computing devices 21(1)-21(3) of the system zone 18(1) stores system tables including a nodes table, a groups table, a zones table, a databases table, and a tables table. The nodes table includes a row with a unique identifier, an IP address, and a physical location for each storage node computing device 22(1)-22(15) in the data storage network 14. The groups table describes each group 20(1)-20(7) formed, as described and illustrated in more detail later with reference to step 404 of
The databases table in this example includes a row for each database created as described and illustrated in more detail later with reference to step 408 of
Referring back to
Referring back to
In step 504, the system node computing device 21(3) establishes the groups 20(2)-20(7) of storage node computing devices 22(1)-22(15), although other numbers of groups can be added in other data storage networks in other examples. The system node computing device 21(3) establishes the groups 20(2)-20(7) of storage node computing devices 22(1)-22(15) in response to an add group request received from the one of the application server computing devices 12(1)-12(n) or an administrator computing device, as optionally generated using the SDK 34 and based on the REST API 36, for each of the groups 20(2)-20(7).
Referring back to
The group topology indication (e.g., gold or silver) identifies a topology that satisfies a set of SLCs. The SLCs correspond with established data placement policies, data protection policies, and/or data access policies for the data storage network 14. Additionally, the SLCs determine a performance, cost, data protection scheme, or transaction consistency, for example, implemented by the storage node computing devices 22(1)-22(15) in respective ones of the groups 20(2)-20(7) that share the same topology.
For example, a data placement policy may specify the location of data storage and can be defined by a service-level objective (SLO) that dictates the performance and cost of the data storage. Accordingly, a data placement policy can be specified at the database level, table level, or entity-group level based on an indication of the associated zone upon creation of the database, table, or entity-group, as described and illustrated in more detail later with reference to step 508 of
Data protection and availability policies can be specified at the database, table, snapshot, or entity-group level. In one example, a data protection policy may specify the annual-failure-rate (AFR) of 1% and a recovery-time-objective of 1 hour, which are provided by the storage node computing devices 22(10)-22(15) of the groups 20(5)-20(7) of the silver zone 18(3) implementing data replication, dynamic disk pooling, or erasure coding, for example, across the groups 20(5)-20(7).
Additionally, data access policies can be specified at the transaction level. In one example, one of the application(s) 32 issues operations on items (e.g., create-item, update-item, delete-item, or query-items) to one or more created tables, as described and illustrated in more detail later with reference to steps 510-516 of
In step 506, the system node computing devices 21(3) forms zones 18(2) and 18(3) of the groups 20(2)-20(4) and 20(5)-20(7), established as described and illustrated earlier with reference to step 504 of
Referring back to
In the exemplary data storage network 14, the basic functionality of the storage node computing devices 22(1)-22(15) is the same, but the storage node computing devices 22(1)-22(3), 22(4)-22(6), 22(7)-22(9), 22(10)-22(11), 22(12)-22(13), and 22(14)-22(15) are organized into groups 20(2), 20(3), 20(4), 20(5), 20(6), and 20(7), respectively. The groups 20(2)-20(4) and 20(5)-20(7) are then organized into zones 18(2) and 18(3), respectively, for easier management and hardware additions and removals, among other advantages.
The topology of the groups 20(2)-20(4) and 20(5)-20(7) of the zones 18(2) and 18(3), respectively, is determined by the set of SLCs that the groups 20(2)-20(4) and 20(5)-20(7) provide. Accordingly, zones 18(2) and 18(3) consist of groups 20(2)-20(4) and 20(5)-20(7), respectively, implementing the same topology. Since the groups 20(2)-20(4) and 20(5)-20(7) implement the same respective topology, they deliver the same performance. Accordingly, adding a new group to a zone increases the available capacity of a zone and thereby provides a highly scalable infrastructure with reliable performance.
In this particular example, the zone 18(2) is a gold zone and the zone 18(3) is a silver zone, although any number of zones having another identifier and/or associated set of SLCs or level of service can also be used in other data storage networks. Accordingly, the groups 20(2)-20(4) of the gold zone 18(2) each provide a gold level service (e.g., read-ops of 500, average latency of 2 ms, and a recovery-point-objective of 30 minutes). In this example, the gold level service is implemented by a set of three storage node computing devices 22(1)-22(3), 22(4)-22(6), and 22(7)-22(9) in which stored data is replicated across three Flash-based disk storage devices.
In another example, the groups 20(5)-20(7) of the silver zone 18(3) provide a silver level service that is implemented with a set of two storage node computing devices 22(10) and 22(11), 22(12) and 22(13), and 22(14) and 22(15) using local RAID on HDD storage devices. The topologies of the groups 20(2)-20(4) and 20(5)-20(7) can be established and provided by manufacturers of the storage node computing devices 22(1)-22(15) or defined by an administrator of the data storage network 14, for example.
In step 508, the system node computing device 21(3) creates a plurality of storage elements including at least a NoSQL database, an entity group associated with the database, and a table associated with the entity group. The system node computing device 21(3) creates the storage elements in response to requests to create the storage elements received from the one of the application server computing devices 12(1)-12(n) or an administrator computing device, as optionally generated using the SDK 34 based on the REST API 36. In this example, at least one of the requests to add one of the storage elements includes an indication of a zone name, which is used to store associated items in one of the zones 18(2) or 18(3), as described and illustrated in more detail later.
In this example, the data model used in the data storage network 14 consists of databases, tables, entity-groups, items, and attributes. At the highest level, an administrator of one of the application(s) 32 hosted by the one of the application server computing device 12(1)-12(n) creates a database as described and illustrated in more detail later with reference to
Also in this particular example, an item is represented using a JavaScript Object Notation (JSON) format consisting of a collection of attribute key/value pairs where the key is a string and the values may be strings, numbers, or other JSON objects, although other types of item formats can be used in other examples including XML or any other custom format. For example, an item representing a person is shown below in JSON format:
The “SSN”: “555-55-5555” in the above example represents an attribute, where “SSN” is the key and “555-55-5555” represents the value. The above example consists of five attributes. Of these, one of the item's attributes is designated as its primary key or attribute that uniquely identifies the item, which can be the “SSN” in the above example.
A table stores a collection of items and can be a ROOT table or a CHILD table. An item in a ROOT table is an entity-group, and therefore the primary key of an item in the ROOT table is an entity-group key. A CHILD table is a sub-table of a ROOT table such that items in a CHILD table need to be referenced using both an entity-group key and the child item's primary key. For example, a ROOT table “Persons” and a CHILD table “Addresses” can be created, as described and illustrated in more detail later with reference to
In this example, a person's address (John's address in the above example), is identified by using the entity-group key (“SSN”: “555-55-5555”) and the address id (“Id”: “Home”), which is the primary key for the address item.
Additionally, in this particular example, an entity-group is an item in the ROOT table. The entity-group defines a logical group of items and defines the interface between logical and physical data grouping. Accordingly, items in CHILD tables referenced by the same entity-group key are considered to be part of the same entity-group in this example. All operations within an entity-group can be executed atomically and can provide strongly-consistent guarantees or, alternatively, strongly-consistent guarantees can be provided for transactions that span entity-groups using an underlying algorithm such as 2PC as disclosed in Bernstein et al., “Concurrency Control and Recovery in Database Systems,” 1987, for example, which is incorporated by reference herein in its entirety or Paxos as disclosed in Lamport, “The Part-Time Parliament”, ACM Transactions on Computer Systems 16, 2, May, 1998, p. 133-169, for example, which is incorporated by reference herein in its entirety, although other algorithms for guaranteeing strong consistency for transactions can also be used.
Referring more specifically to
Optionally, the specified database zone (the “{DatabaseDefaultZone}” in this example) can correspond with the zone name used in the request to form the zone (e.g., “gold” or “silver” for the exemplary zones 18(2) and 18(3), respectively), as stored in the zones one of the system tables 46 by the system zone 18(1), as described and illustrated in more detail earlier with reference to step 506 of
Referring more specifically to
Optionally, the specified table zone (the “{TableDefaultZone}” in this example) can correspond with the zone name used in the request to form the zone, as stored in the zones table by the system zone 18(1), as described and illustrated in more detail earlier with reference to step 506 of
Referring more specifically to
Optionally, the specified entity group zone (the “{EntityGroupZone}” in this example) can correspond with the zone name used in the request to form the zone, as stored in the zones table by the system zone 18(1), as described and illustrated in more detail earlier with reference to step 506 of
Referring back to
In step 512, the system node computing device 21(3) determines where the item is to be stored based on contents of the system tables 46. Referring more specifically to
Accordingly, the item-put transaction received by the system node computing device 21(3) includes an indication of the table (e.g., “{TableName}”), which is used by the system node computing device 21(3) to identify a specified one of the zones 18(2) or 18(3) for the table in which the item is to be put. In one example, the system node computing device 21(3) determines from the tables one of the system tables 46 that the metadata for the table is stored in the gold zone 18(2), which is communicated back to the one of the application server computing devices 12(1)-12(n). The tables one of the system tables 46 could have been previously populated based on the “TableMetadataZone” indicating the gold zone 18(2) in the table-create request received by the system node computing device 21(3), as described and illustrated earlier with reference to step 508 of
In response, the one of the application server computing devices 12(1)-12(n) can communicate with one of the primary storage node computing devices 22(3), 22(4), or 22(8) of the gold zone 18(2). The one of the primary storage node computing devices 22(3), 22(4), or 22(8) of the gold zone 18(2) may determine that the table data (e.g., the items) is stored in the silver zone 18(3), which is communicated back to the one of the application server computing devices 12(1)-12(n). In response, the one of the application server computing devices 12(1)-12(n) can communicate with one of the primary storage node computing devices 22(11), 22(12), or 22(14) of the silver zone 18(3) in order to store the item specified in the original item-put operation request. Accordingly, the gold zone 18(2) in this example stored the metadata for the table including an indication that items stored in the table are located in the silver zone 18(2), which could have been explicitly established the table-create request or inherited from an entity group or database associated with the table in which the item is to be stored.
Optionally, the operation can be managed through transactions using the begin-transaction and transaction-commit operations illustrated in table 1000 and received by the system node computing device 21(3). Also optionally, transaction status can be retrieved via a transaction-status operation and transactions can be aborted via a transaction-abort operation, as illustrated in table 1000. The operation of transactions is described and illustrated in more detail later with reference to
Referring back to
If the system node computing device 21(3) determines that a request to get an item has not been received, then the No branch is taken back to step 500, although the system node computing device 21(3) could also proceed back to step 510 in other examples. Optionally, one or more of the steps 502-508 can also be repeated in order to expand the data storage network 14 and/or the databases or portions thereof that are stored therein. Also optionally, the system node computing device 21(3) can receive a snapshot request at any time subsequent to creation of the storage elements in step 508, as described and illustrated in more detail later with reference to
However, if the system node computing device 21(3) determines in step 514 that a request to get an item has been received, then the Yes branch is taken to step 516. In step 516, the system node computing device 21(3) determines where the item is stored. Referring more specifically to
Accordingly, in a first step in the example illustrated in
In response and in a second step, the one of the application server computing devices 12(1)-12(n) communicates with the primary storage node computing device 22(4) of the gold zone 18(2), which determines that the table data (e.g., the items) is stored in the silver zone 18(3) and communicates this information back to the one of the application server computing devices 12(1)-12(n). In response and in a third step, the one of the application server computing devices 12(1)-12(n) communicates with the primary storage node computing device 22(12) of the silver zone 18(3), which provides the requested item as identified based on the key information included in the item-get operation request and communicated to the storage node computing device 22(12), for example.
Referring more specifically to
Referring more specifically to
In step 1300 in this particular example, the system node computing device 21(3) determines whether a request to begin a transaction has been received. The request can be sent by one of the applications 32 hosted by one of the application server computing devices 14(1)-14(n) or an administrator computing device using the begin-transaction API call illustrated in table 1000, for example, although other methods of initiating a transaction can also be used. If the system node computing device 21(3) determines a request to begin a transaction has not been received, then the No branch is taken back to step 1300 and the system node computing device 21(3) effectively waits for a transaction begin request to be received. However, if the system node computing device 21(3) determines a request to begin a transaction has been received, then the Yes branch is taken to step 1302.
In step 1302, the system node computing device 21(3) generates a transaction identifier (also referred to herein as Tx ID) and stores a pending status indication associated with the transaction identifier in the memory 40. The status in this example can be retrieved by any other transaction, such as using the transaction-status API call illustrated in table 1000, as described and illustrated in more detail later. In this example, transaction identifiers are generated atomically, but transaction identifiers can be generated in other ways in other examples. Additionally, in step 1302, the system node computing device 21(3) returns the transaction identifier in response to the transaction begin request.
In step 1304, the system node computing device 21(3) determines whether a request (also referred to herein as a command) to insert an item has been received, such as from one of the applications 32 and via the item-put API call illustrated in table 1000. Insert commands in this example include at least an attribute key/value pair and a transaction identifier, such as the transaction identifier generated and return in step 1302. If the system node computing device 23(1) determines an insert command is received, then the Yes branch is taken to step 1306.
In step 1306, the system node computing device 23(1) inserts a new entry (also referred to herein as a row) in a transaction one of the system tables 46. The new entry includes the key and value included in the insert command and a first or minimum transaction value (also referred to herein as Min Tx or xmin) equal to the transaction identifier included in the insert command, although the first transaction value could be a maximum value in examples in which the transaction identifiers are assigned in a descending order, for example. The minimum transaction value indicates the earlier transaction for which the item is valid or visible, which in the example is the transaction associated with the item insert request. In one example, transaction 10 (TX-10) inserts a new item to the transaction one of the system tables 46 with the key “123” and the value “A”. The row in the transaction one of the system tables 46 will appear as illustrated below in Table 1 subsequent to step 1306:
In this example, TX-10 creates this item, so xmin is 10. Currently, the item is valid, so the second or maximum transaction value (also referred to herein as Max Tx or xmax) is “0” indicating that it is not assigned to a transaction and that the item has not been updated yet, as described and illustrated in more detail later, although the second transaction value could be a minimum value in examples in which the transaction identifiers are assigned in a descending order, for example, and other values and the absence of a transaction maximum value can also be used to indicate that an item has not been updated.
In examples in which a plurality of commands are associated with one transaction, the minimum command value (also referred to herein as cmin) and maximum command value (also referred to herein as cmax) are used in a corresponding manner as described and illustrated herein with reference to the minimum transaction value and maximum transaction value used in the context of multiple transactions. Additionally, in these examples, an entry is visible for a command if inserted before the command and not deleted yet, or deleted by another command not before the command.
Referring back to step 1304, if the system node computing device 21(3) determines that an insert command has not been received, then the No branch is taken to step 1308. In step 1308, the system node computing device 21(3) determines whether a request to update an item has been received, such as from one of the applications 32. In this example, the item update request includes at least an attribute key/value pair and a transaction identifier. Accordingly, the update request is initiated in order to replace a stored value associated with the provided key, with a new value included with the request. If the system node computing device 21(3) determines that a request to update an item has been received, then the Yes branch is taken to step 1310.
In step 1310, the system node computing device 21(3) inserts a new entry into the transaction one of the system tables 46. The new entry includes the key and value included in the item update request and a minimum transaction value equal to the transaction identifier included in the item update request, which can be the transaction identifier generated and returned in step 1302, for example.
In step 1312, the system node computing device 21(3) identifies an entry corresponding to the key included in the item update request and sets a maximum transaction value equal to the transaction identifier included in the item update request. The maximum transaction value in the identified entry is replaced with the transaction identifier associated with the item update request because the value in the identified entry will not be valid for the item after the transaction corresponding to the item update request commits, as described and illustrated in more detail later.
Referring to the example described and illustrated earlier with reference to Table 1, assuming TX-11 is an item update request to update the item inserted in TX-10 with the value “B”, then the rows in the transaction one of the system tables 46 appear as illustrated below in Table 2 subsequent to step 1312 (and TX-11 committing):
Referring back to step 1308, if the system node computing device 21(3) determines that an item update command has not been received, then the No branch is taken to step 1314. In step 1314, the system node computing device 21(3) determines whether a request to delete an item has been received, such as from one of the applications 32. In this example, the item delete request includes at least an attribute key and a transaction identifier. Accordingly, the update request is initiated in order to delete an item associated with the provided key. If the system node computing device 21(3) determines that a request to delete an item has been received, then the Yes branch is taken to step 1316.
In step 1316, the system node computing device 21(3) identifies an entry of the transaction one of the system tables 46 that corresponds to the key included in the item delete request and that does not have a maximum transaction value (or has a transaction maximum value of “0” in the examples described and illustrated herein). An entry corresponding to the key and without a maximum transaction value will be a currently visible entry for the item associated with the key.
Accordingly, the system node computing device 21(3) also sets the maximum transaction value of the identified entry equal to the transaction identifier included in the item delete request to indicate that the item is only visible for transactions having an associated transaction identifier less than the transaction identifier of the transaction that deleted the item. Referring to the example described and illustrated earlier with reference to Table 1-2, assuming TX-12 is an item delete request to delete the item inserted in TX-10 and updated in TX-11, then the rows in the transaction one of the system tables 46 appear as illustrated below in Table 3 subsequent to step 1316 (and TX-12 committing):
In this example, item value “B” expires after TX-12 is committed, and the item with key “123” does not exist any more, but the two out-of-date versions, “A” and “B” are still maintained in the transaction one of the system tables 46. Referring back to step 1314, if the system node computing device 21(3) determines that an item delete command has not been received, then the No branch is taken to step 1318. In step 1318, the system node computing device 21(3) determines whether a request to read an item has been received, such as from one of the applications 32 via the item-get API call described and illustrated earlier with reference to table 1000.
In this example, the item read request includes at least an attribute key and a transaction identifier. Accordingly, the read request is initiated in order to read an item associated with the provided key. If the system node computing device 21(3) determines that a request to read an item has been received, then the Yes branch is taken to step 1320. In step 1320, the system node computing device 21(3) identifies an entry in the transaction one of the system tables 46 that is visible based on the transaction identifier and the key included in the item read request. Accordingly, the system node computing device 21(3) identifies the entries in the transaction one of the system tables 46 that has a key matching the key included in the item read request.
Next, the system node computing device 21(3) determines the one of the entries that is currently visible for the transaction associated with the item read request and returns the value of the visible entry in response to the item read request. For an entry to be visible, the item must be inserted by a committed transaction. Accordingly, if the inserting transaction associated with an entry is still in process or it is aborted, then the entry is not visible. However, if the item associated with an entry is not deleted, is not deleted successfully (e.g., the deleting transaction aborts), or is not deleted yet (e.g., by a non-committed transaction), it is visible.
In this example, TX-5 inserted item (123, A) and successfully committed. Later, there are two concurrent transactions 10 and 11 in progress and executing concurrently. TX-10 reads the data and TX-11 tries to update it to (123, B). Before transactions 10 and 11 start, the rows of the transaction one of the system tables 46 in this example appear as illustrated below in Table 4, when TX-11 updates, the rows will appear as illustrated below in Table 5 in a first step and in Table 6 in a second step:
In this example, TX-10 can read the item associated with key 123 at three different times. If TX-10 reads before TX-11 makes any update, the row (123, A) will be visible to TX-10. If TX-10 reads after step 1 illustrated above in Table 5, there are two versions. The second version is inserted by a not-yet-committed transaction in this example, thus only (123, A) is visible to TX-10. If TX-10 reads after step 2 illustrated above in Table 6, the first version is updated by a not-yet-committed transaction, and thus is still visible to TX-10. Therefore, irrespective of how the execution proceeds, TX-10 always sees the same version, and therefore consistency is always advantageously guaranteed.
In this example, TX-5 inserted (123, A) and committed successfully. TX-10 tries to update it to (123, B). However, TX-10 aborts prior to being committed. Later, TX-20 tries to read the data. Accordingly, in this example, after TX-5 commits, the rows in the transaction one of the system tables 46 appear as illustrated in Table 7 below. Additionally, although TX-10 aborts, it updates the transaction one of the system tables 46 as illustrated below in Table 8.
Accordingly, when TX-20 attempts to read the item associated with key 123, xmax of version 1 is a not-committed transaction, while xmin of version 2 is also a not-committed transaction. Therefore, only version 1 is visible to TX-20 (and item value “A” will be returned).
Referring back to
In step 1322, the system node computing device 21(3) receives a request to abort or commit a transaction, such as the transaction associated with the begin transaction request received in step 1300. In this example, the request to abort or commit a transaction includes at least a transaction identifier and can be sent by one of the application(s) 32 via the transaction-abort and transaction-commit API calls illustrated in table 1000, for example. If the system node computing device 21(3) determines in step 1322 that a request to abort or commit a transaction has been received, then the Yes branch is taken to step 1324.
In step 1324, the system node computing device 21(3) updates a stored status indication associated with the transaction identifier included in the received request to indicate that the transaction is aborted or committed according to the type of the received request. The status indication could have been previously stored in the memory 40 as described and illustrated earlier with reference to step 1302, for example. Subsequent to updating the stored status, or if the system node computing device 21(3) determines in step 1322 that a request to abort or commit a transaction has not been received and the No branch is taken, the system node computing device 21(3) proceeds back to step 1300 and waits to receive another begin request for another transaction.
Referring back to step 1318, if the system node computing device 21(3) determines that an item read command has not been received, then the No branch is taken and the system node computing device 21(3) proceeds back to step 1304 and the system node computing device 21(3) effectively waits for a command associated with the transaction begin in step 1300 to be received. Accordingly, in the examples described and illustrated herein, each transaction encapsulates one command, although the transactions can encapsulate any number of commands in other examples, as described and illustrated in more detail earlier.
Additionally, any of steps 1304-1320 can occur in parallel for any number of commands associated with a same transaction. Additionally, any of steps 1300-1324 can occur in parallel for any number of transactions. While insert, update, delete, and read commands are identified for purposes of the examples described and illustrated herein, other numbers and types of commands can also be used in other examples.
The examples described and illustrated herein for snapshot management leverage the transaction processing and associated transaction table management described and illustrated earlier with reference to
Referring more specifically to
In step 1402, the system node computing device 21(3) generates a snapshot identifier (ID) and a transaction identifier and returns the snapshot identifier in response to the snapshot create request. In this example, the transaction identifier can be generated as described and illustrated earlier with reference to step 1302 of
In step 1404, the system node computing device 21(3) retrieves an entry in the transaction one of the system tables 46. The retrieved entry can be located anywhere in the transaction one of the system tables 46 as a plurality of entries (e.g., all those associated with a particular entity group identified by an entity group key in the create snapshot request) will be accessed or parsed as part of the processing of the snapshot create request.
In step 1406, the system node computing device 21(3) determines whether the entry has a transaction minimum value corresponding to a transaction that has been committed and a transaction maximum value corresponding to a transaction that has not been committed. In this example, if the entry does not have a maximum transaction value (or has a value of “0”), then the entry does not have a transaction maximum value corresponding to a transaction that has not been committed. In order to determine whether the transaction has been committed, the system node computing device 21(3) can retrieve the status identifier from the memory 40, for example, which can be stored as described and illustrated earlier with reference to steps 1302 and 1324.
If the entry has a minimum transaction value corresponding to a transaction that has not been committed yet, then that transaction could still be aborted, and therefore is not visible to the snapshot. However, if the entry has a minimum transaction value corresponding to a transaction that has been committed but a transaction maximum value corresponding to a transaction that has not been committed, then the transaction not committed could still be aborted, and the entry is visible to the snapshot. Additionally, if the entry has a minimum transaction value corresponding to a transaction that has been committed but does not have a maximum transaction value (and therefore does not have a maximum transaction value corresponding to a transaction that has not been committed), then the entry has not expired and is visible to the snapshot.
Finally, if the entry has a minimum transaction value corresponding to a transaction that has been committed but has a maximum transaction value corresponding to a transaction that has been committed, then the entry has expired (e.g., been updated or deleted, as described and illustrated in more detail earlier with reference to steps 1312 and 1316, respectively), and therefore is not visible to the snapshot. Accordingly, if the system node computing device 21(3) determines that the entry has a transaction minimum value corresponding to a transaction that has been committed and a transaction maximum value corresponding to a transaction that has not been committed, then the Yes branch is taken to step 1408.
In step 1408, the system node computing device 21(3) inserts the snapshot identifier generated in step 1402 into the entry accessed in step 1404. The snapshot identifier is inserted into the entry because the entry is visible to the snapshot at the point in time corresponding to the snapshot create request, and can be used to restore based on the snapshot in the event of a loss, for example, as described and illustrated in more detail later with reference to
Subsequent to inserting the snapshot identifier in step 1408, or if the system node computing device 21(3) determines in step 1406 that the entry does not have a transaction minimum value corresponding to a transaction that has committed and a transaction maximum value corresponding to a transaction that has not been committed, and the No branch is taken, then the system node computing device 21(3) proceeds to step 1410. In step 1410, the system node computing device 21(3) determines whether there are additional entries (e.g., corresponding to the entity group key included in the snapshot create request) in the transaction one of the system tables 46 that have not been accessed.
If the system node computing device 21(3) determines that there is at least one additional entry, then the Yes branch is taken back to step 1404 and the additional entry is accessed as described and illustrated earlier. However, if the system node determines in step 1410 that there are no additional entries, then the No branch is taken back to step 1400 and the system node computing device 21(3) waits to receive another snapshot create request.
Tables 9-12 set forth below illustrate an exemplary processing of two snapshot create requests. In Table 9 in this example, the transaction one of the system tables 46 is shown after a first snapshot having a transaction identifier of “20” and a snapshot identifier of “1” is received and processed.
In Table 10 in this example, the exemplary transaction one of the system tables 46 is shown after an update command having associated transaction identifier “30” has been received to update key “456” to value “B2”. In this example, the snapshot identifier “1” is inserted into the first entry because the minimum transaction value of this entry corresponds with a committed transaction and the entry does not include a maximum transaction value (the maximum transaction value is “0” in this example indicating an absence of a maximum transaction). Additionally, the snapshot identifier “1” is inserted into the second entry because the maximum transaction value corresponds to a transaction that has not yet been committed.
In Table 11, the exemplary transaction one of the system tables 46 is shown after an update command having a transaction identifier of “50” has been received to update key “123” to value “A2”, but has not committed, and after a second snapshot having a transaction identifier of “51” and a snapshot identifier of “2” is received, but not yet processed.
Table 12 illustrates the exemplary transaction one of the system tables 46 after the second snapshot having a transaction identifier of “51” and a snapshot identifier of “2” is processed. In this example, the snapshot identifier “2” is inserted into the first entry because the transaction corresponding to the minimum transaction value “5” has been committed but the transaction corresponding to the maximum transaction value “50” has not been committed yet, and the entry is therefore visible to the second snapshot. The snapshot identifier “2” is not inserted into the second entry because the second entry has a maximum transaction value that corresponds with a transaction that has been committed.
The snapshot identifier “2” is inserted into the third entry in this example because the third entry has a minimum transaction value that corresponds with a transaction that has been committed but does not have any maximum transaction value (and therefore does not have a maximum transaction value corresponding to a transaction that has not been committed), and is therefore visible to the snapshot. Finally, the snapshot identifier “2” is not inserted into the fourth entry because the fourth entry does not have a transaction minimum value corresponding to a transaction that has been committed, and the fourth entry therefore is not visible to the second snapshot.
Referring more specifically to
If the system node computing device 21(3) determines that a snapshot restore request has not been received, then the No branch is taken back to step 1500 and the system node computing device 21(3) effectively waits for a snapshot restore request to be received. However, if the system node computing device 21(3) determines that a request to restore a snapshot has been received, then the Yes branch is taken to step 1502.
In step 1502, the system node computing device 21(3) generates a transaction identifier and extracts the snapshot identifier from the snapshot restore request. The transaction identifier can be a next transaction identifier, as described and illustrated earlier. In step 1504, the system node computing device 23(1) retrieves or accesses an entry in the transaction one of the system tables 46. The accessed entry can be located anywhere in the transaction one of the system tables 46.
In step 1506, the system node computing device 23(1) determines whether a snapshot identifier of the accessed entry matches the snapshot identifier included in the snapshot restore request. The entry having a matching snapshot identifier will also include the item value that should be restored. If the system node computing device 23(1) determines that the snapshot identifier match, then the Yes branch is taken to step 1508.
In step 1508, the system node computing device 23(1) inserts a new entry into the transaction one of the system tables 46. The new entry includes a minimum transaction value equal to the transaction identifier generated in step 1502, and has a key and item value of the retrieved entry. Accordingly, the new entry is visible beginning with the transaction corresponding to the snapshot restore request, but the access entry is maintained until recycled in a system cleanup process described and illustrated in more detail later with reference to
In step 1510, the system node computing device 23(1) identifies all other entries that have a key matching the key of the new entry (and of the retrieved entry) and sets a maximum transaction value of all of the other entries equal to the transaction identifier generated in step 1502. Accordingly, all of the other entries corresponding to the key will not be visible to transactions having an identifier after the transaction identifier of the system restore since those entries are expired and have an item value that is no longer valid subsequent to the snapshot restore.
Subsequent to identifying the other entries and marking the entries as expired, or if the system node computing device 21(3) determines in step 1506 that the snapshot identifier of the accessed entry does not match the snapshot identifier of the snapshot restore request and the No branch is taken, then the system node computing device 21(3) proceeds to step 1512. In step 1512, the system node computing device 21(3) determines whether there are additional entries in the transaction one of the system tables 46 that have not been accessed.
If the system node computing device 21(3) determines that there is at least one additional entry, then the Yes branch is taken back to step 1504 and the additional entry is accessed as described and illustrated earlier. However, if the system node computing device 21(3) determines in step 1512 that there are no additional entries, then the No branch is taken back to step 1500 and the system node computing device 21(3) waits to receive another snapshot restore request.
Tables 13-14 set forth below illustrate an exemplary processing of a snapshot restore request based on the example described and illustrated earlier with reference to Tables 9-12. Table 13 in this example corresponds to Table 12 illustrated earlier except that all of the identifier transaction identifiers correspond to committed transactions.
Tables 14 reflects this exemplary transaction one of the system tables 46 subsequent to a snapshot restore request having an associated transaction identifier of “100” and including a snapshot identifier of “1”. Accordingly, in this example, the first and second entries have a matching snapshot identifier of “1” and therefore include the item values for their respective keys that should be visible subsequent to the snapshot restore transaction. Therefore, the fifth and sixth entries are added as new entries to the transaction one of the system tables 46 with a transaction minimum value equal to the transaction identifier “100” of the snapshot restore and having the key “123” and “456” and value “A” and “B” of the accessed first and second entries, respectively.
Additionally, the third and fourth entries have keys matching the keys of the new entries, and therefore the maximum transaction value of the third and fourth entries has been set equal to the transaction identifier of the snapshot restore transaction. Therefore, these entries will not longer be visible to transaction having a transaction identifier great than “100” after the snapshot restore transaction has been committed.
Accordingly, in this example any transaction after the snapshot restore transaction having associated transaction identifier “100” has been committed will see the old value from snapshot “1”. The item value “B2” corresponding to the key “456”, which is captured by snapshot having snapshot identifier “2”, will remain in the transaction one of the system tables 46. Therefore, if a snapshot restore request is subsequently received including snapshot identifier “2”, the item value “B2” will then be visible for the key “456”. Accordingly, with this technology, a snapshot restore can advantageously be implemented without significant additional data management overhead.
Referring more specifically to
Alternatively, the system node computing device 21(3) can be configured to periodically initiate a system cleanup, and other methods for initiating a system cleanup can also be used. If the system node computing device 21(3) determines that a system cleanup has not been initiated, then the No branch is taken back to step 1600 and the system node computing device effectively waits for a system cleanup to be initiated. However, if the system node computing device 21(3) determines that a system cleanup has been initiated, then the Yes branch is taken to step 1602.
In step 1602, the system node computing device 23(1) retrieves or accesses an entry in the transaction one of the system tables 46. The accessed entry can be located anywhere in the transaction one of the system tables 46. In step 1604, the system node computing device 21(3) determines whether the entry includes a snapshot identifier. If the accessed entry includes a snapshot identifier, it will not be removed or recycled (until the snapshot is deleted) since a subsequent snapshot restore request could be received that includes the snapshot identifier. Accordingly, if the system node computing device 21(3) determines in step 1604 that the entry does not include a snapshot identifier, then the No branch is taken to step 1606.
In step 1606, the system node computing device 21(3) determines whether the entry is visible to any pending transactions. In one example, an entry is not visible to any pending transaction if it includes transaction minimum and maximum values that correspond to transactions that have been committed, and no transaction corresponding with a transaction identifier between the minimum and maximum transaction values are pending (e.g., as determined based on the stored status indicator as described and illustrated in more detail earlier).
In another example, entries that were inserted by transactions that were aborted prior to being committed are not visible to any pending transactions. Such entries can be identified based on a minimum transaction value corresponding to a transaction having an aborted status, as identified based on a stored status indicator as described and illustrated in more detail earlier. In yet another example, entries that are older versions of newer entries created by an item update command will not be visible to any pending transactions assuming no transactions having a transaction identifier less than the maximum transaction value are pending. Other types and numbers of entries can also be not visible to any pending transactions in other examples.
Accordingly, if the system node computing device 21(3) determines that the entry is not visible to any pending transactions, then the No branch is taken to step 1608. In step 1608, the system node computing device 21(3) removes the entry from the transaction one of the system tables 46. In some examples the content of the entry is removed and in other examples the entry can be marked as recyclable, and other methods of removing the entry can also be used in yet other examples. Subsequent to removing the entry, or if the system node computing device 21(3) determines that the entry has a snapshot identifier in step 1604 and the Yes branch is taken, or is visible to at least one pending transaction in step 1606 and the Yes branch is taken, the system node computing device 21(3) proceeds to step 1610.
In step 1610, the system node computing device 21(3) determines whether there are additional entries in the transaction one of the system tables 46 that have not been accessed during the current system cleanup iteration. If the system node computing device 21(3) determines that there is at least one additional entry, then the Yes branch is taken back to step 1602 and the additional entry is accessed as described and illustrated earlier. However, if the system node computing device 21(3) determines in step 1610 that there are no additional entries, then the No branch is taken back to step 1600 and the system node computing device 21(3) waits for another system cleanup to be initiated.
Accordingly, with this technology, application administrators can more effectively leverage NoSQL databases by storing data in tables located on storage node computing devices in groups and zones that have associated SLCs, as previously established upon creation of the tables or an associated entity group or database. Accordingly, management of the data is relatively integrated and policies do not have to be analyzed for every ingested item in order to provide appropriate data tiering for the data storage network. By more efficiently implementing data tiering for NoSQL databases, data can be aged more effectively. Additionally, this technology is highly scalable as capacity having predictable and established service levels can be added dynamically.
Moreover, this technology advantageously facilitates more efficient creation and management of snapshots and the restoration of data using snapshots. With this technology, management resources required to implement snapshots of a NoSQL database are reduced. In particular, this technology generates and maintains snapshots by modifying rows in a transaction table to include certain markers, and without making extra copies of existing data. Advantageously, snapshot operation and management is relatively lightweight and storage efficient with this technology.
Having thus described the basic concept of the invention, it will be rather apparent to those skilled in the art that the foregoing detailed disclosure is intended to be presented by way of example only, and is not limiting. Various alterations, improvements, and modifications will occur and are intended to those skilled in the art, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested hereby, and are within the spirit and scope of the invention. Additionally, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations therefore, is not intended to limit the claimed processes to any order except as may be specified in the claims. Accordingly, the invention is limited only by the following claims and equivalents thereto.