This technology generally relates to methods and devices for persistent storage and, more particularly, to methods for persisting data on nonvolatile memory for fast updates and instantaneous recovery and devices thereof.
In the near future, nonvolatile memories (NVM), also called storage class memories (SCM), with a lower cost per gigabyte (GB) than dynamic random access memory (DRAM) and with performance comparable to DRAM, will be available in a dual in-line memory module (DIMM) form factor on the memory bus next to the processor. Since NVM can be accessed via load and store instructions from the processor, any write to NVM via a store instruction makes the data persistent immediately after execution. As a result, data structures manipulated by the processor can be persisted in native form, eliminating the need for the data transformation required for disk persistency. With disk persistency, for example, the data must be transformed, e.g. serialized, to disk format before the write is issued.
Persisting data structures on NVM has several advantages, including providing persistency at DRAM-like speed, i.e. in nanoseconds, as opposed to tens of microseconds to persist data on a solid state drive (SSD) or a few milliseconds to persist data on a hard disk drive (HDD). As a result, data persistency on NVM can be achieved a hundred times faster than on an SSD and a thousand times faster than on an HDD.
Additionally, persisting data structures on NVM provides log-less durability because only one write is needed to make the data durable, as opposed to the two writes required in current persistency architectures, i.e. one to log the write on durable media and one to update the in-memory volatile state. As a result, persisting data structures on NVM provides a log-less durable scheme with a single-layer store that saves latency and space overhead.
Further, persisting data structures on NVM provides instant recovery because the data is durable immediately after a write to NVM. When a failure occurs, recovery involves reinitializing the data structures and reverting any partial updates. Since no log replay is required when data is persisted on NVM, recovery is enabled within sub-seconds.
Accordingly, as illustrated above, there are numerous advantages to persisting data structures on NVM; however, doing so is not trivial and presents its own unique challenges. In particular, to persist data structures on NVM several problems need to be solved, including how to maintain application write order. Maintaining application write order on NVM is not trivial because the processor could evict modified data in the cache to NVM in an order different from the application order. As a result, while the modified data is being evicted, a system crash could leave the data on NVM in an inconsistent state, potentially corrupting the data. This problem is solved by using a cache-line flush (CLFLUSH) instruction to explicitly instruct the processor to evict the cache line and a store fence (SFENCE) instruction to place a memory barrier before pointers to data structures in NVM are updated.
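By way of illustration only, the following is a minimal C++ sketch of this flush-and-fence ordering, assuming the x86 intrinsics _mm_clflush and _mm_sfence and a 64-byte cache line; the persist() helper and the usage shown are hypothetical and not taken from any particular implementation.

```cpp
#include <emmintrin.h>  // _mm_clflush
#include <xmmintrin.h>  // _mm_sfence
#include <cstddef>
#include <cstdint>

constexpr std::size_t CACHE_LINE = 64;  // assumed cache-line size

// Flush every cache line covering [addr, addr + len) out to NVM, then issue
// a store fence so that no later store (e.g. a pointer update that publishes
// the data) can be reordered ahead of the flushed writes.
void persist(const void* addr, std::size_t len) {
    std::uintptr_t p   = reinterpret_cast<std::uintptr_t>(addr) & ~(CACHE_LINE - 1);
    std::uintptr_t end = reinterpret_cast<std::uintptr_t>(addr) + len;
    for (; p < end; p += CACHE_LINE)
        _mm_clflush(reinterpret_cast<const void*>(p));
    _mm_sfence();  // memory barrier before any dependent pointer update
}
```

With such a helper, an application first persists the payload of a new node and only then stores and persists the pointer that makes the node reachable, preserving the application write order across a crash.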
Another problem with persisting data structures on NVM is reverting partial updates for consistency. Caches are volatile and hold the modified data before it reaches NVM. When large updates need to be done in a transaction, a power failure could leave the NVM with partial updates. These partial updates need to be reverted before any further updates are applied.
This problem is solved by maintaining a history of changes so that, when there is a failure, recovery involves going over the changes and reverting the partial updates. Techniques for solving this partial update problem include logging to durable media and then replaying based on the log when a failure occurs, or using data structures that maintain a history of changes, like multi-version data structures. Using multi-version data structures has the inherent advantage of being log-less.
To maintain persistent index structures on NVM, a volatile B+Tree data storage structure can be used. Unfortunately, there are problems with this type of persistent index structure on NVM. With this type of data structure, the write or insert speed was greater than 10 µs for data larger than 64 bytes because of the sorting required in the existing B+Tree data storage structure. In particular, each new write or insertion creates a right shift, i.e. an insert sort, in the leaf nodes, requiring execution of an additional CLFLUSH instruction to persist the right-shifted data and achieve the desired durability and consistency of data structures on NVM, but this introduces undesirable latency. Additionally, with a B+Tree data storage structure, when multiple updates to the same key are required, the write amplification to achieve the desired durability and consistency of data structures is even more pronounced.
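To make the write amplification concrete, the following sketch (hypothetical Leaf and Entry layouts with a deliberately small fan-out, reusing the persist() helper above) shows the insert sort in a sorted leaf: every right-shifted entry must be flushed again before the new entry itself can be persisted, and the cost repeats on every insert.

```cpp
struct Entry { int key; int value; };

struct Leaf {
    int   count;       // number of occupied slots
    Entry slots[4];    // small fan-out, for illustration only
};

// Sorted insert into a non-full leaf: each entry to the right of the
// insertion point is shifted by one slot, and each shifted slot costs an
// additional CLFLUSH (via persist) to remain durable and consistent.
void insert_sorted(Leaf* leaf, Entry e) {
    int i = leaf->count;
    while (i > 0 && leaf->slots[i - 1].key > e.key) {
        leaf->slots[i] = leaf->slots[i - 1];      // right shift
        persist(&leaf->slots[i], sizeof(Entry));  // extra flush per shift
        --i;
    }
    leaf->slots[i] = e;
    persist(&leaf->slots[i], sizeof(Entry));
    leaf->count++;
    persist(&leaf->count, sizeof(leaf->count));
}
```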
An example of this is illustrated in a prior art multi-version B+tree data storage structure shown in
Next, to insert a key/value pair of (1,4), the slots in the existing node are all full, so two new leaf nodes must be formed. Once the two new leaf nodes are formed, the key/value pairs of (12,8) and (8,3) must be right-shifted into one of the new leaf nodes and the key/value pair of (5,1) must be right-shifted in the other new leaf node before the new key/value pair of (1,4) can be inserted in that leaf node. Accordingly, as illustrated in this example, the insertion of the key/value pair of (1,4), whose key matches the key of the existing key/value pair (1,9) already in the node, incurs pronounced write amplification to achieve this update.
A storage management computing device includes at least one memory coupled to at least one processor which is configured to execute programmed instructions stored in the at least one memory to maintain a data storage structure comprising multiple nodes on non-volatile memory in at least one storage server. A determination is made when a received key in an update matches an existing key in one of the multiple nodes in the data storage structure. When the determination indicates the match, the update is provided for insertion in a slot in a vector extending from the existing key in the one of the multiple nodes for the data storage structure which matches the received key.
A method for persisting data on non-volatile memory includes maintaining, by a storage management computing device, a data storage structure comprising multiple nodes on non-volatile memory in at least one storage server. A determination is made, by the storage management computing device, when a received key in an update matches an existing key in one of the multiple nodes in the data storage structure. When the determination indicates the match, the update is provided, by the storage management computing device, for insertion in a slot in a vector extending from the existing key in the one of the multiple nodes for the data storage structure which matches the received key.
A non-transitory computer readable medium having stored thereon instructions for persisting data in non-volatile memory comprising machine executable code which, when executed by a processor, causes the processor to perform steps including maintaining a data storage structure comprising multiple nodes on non-volatile memory in at least one storage server. A determination is made when a received key in an update matches an existing key in one of the multiple nodes in the data storage structure. When the determination indicates the match, the update is provided for insertion in a slot in a vector extending from the existing key in the one of the multiple nodes for the data storage structure which matches the received key.
This technology provides a number of advantages including providing methods, non-transitory computer readable media, and devices for persisting data on nonvolatile memory for fast updates and instantaneous recovery. With this technology a leaf node is made to grow vertically for updates to the same key and to grow horizontally only for a unique key. Since the updates to the vector are done within a single insert, there are zero right shifts as compared to prior implementations, which drastically reduces the write amplification. Additionally, with this technology, data can be persisted in a B+Tree data storage structure on NVM in native form, i.e. without any transformation, using load and store memory instructions. Further, this technology optimizes the index data storage structure layout for multiple updates, e.g. ranging from one to one million updates, which improves write latency by four times and read latency by two times over prior existing systems. Even further, this technology may utilize multi-version data storage structures to take a snapshot, i.e. an application consistent view, to provide a log-less recovery mechanism with the newly added vectors.
A network environment 10 with an example of nonvolatile memory (NVM) storage management computing device 12 that persists data on nonvolatile memory for fast updates and instantaneous recovery is illustrated in
The exemplary environment 10 includes the NVM storage management computing device 12, client computing devices 14(1)-14(n), NVM storage device 16, and dynamic random access memory (DRAM) storage device 17, although this environment 10 can include other types and/or numbers of systems, devices, components, and/or elements in other configurations. This technology provides a number of advantages including providing methods, non-transitory computer readable media, and devices for persisting data on nonvolatile memory for fast updates and instantaneous recovery.
The client computing devices 14(1)-14(n) are in communication with the NVM storage management computing device 12 through communication network 18 and the NVM storage management computing device 12 is in communication with the NVM storage device 16, and dynamic random access memory (DRAM) storage device 17 through another communication network 20. By way of example, the communication networks 18 and 20 can be interconnects, local area networks (LANs), wide area networks (WANs), and/or combinations thereof, although other types and/or numbers of communication networks could be used.
Each of the client computing devices 14(1)-14(n) in this example can include a processor, a memory, a communication interface, an input device, and a display device, which are coupled together by a bus or other link, although each of the client computing devices 14(1)-14(n) can have other types and numbers of components. The client computing devices 14(1)-14(n) may run interface applications that provide an interface to exchange data with applications hosted by the NVM storage management computing device 12, for example. Each of the client computing devices 14(1)-14(n) may be, for example, a conventional personal computer (PC), a workstation, a smart phone, or other processing and/or computing system.
The NVM storage device 16 and DRAM storage device 17 in this example receive and respond to various read and write requests from the NVM storage management computing device 12, such as requests to write or store data as illustrated and described in the examples herein. Each of the NVM storage device 16 and the DRAM storage device 17 can include a processor, a memory, and a communication interface, which are coupled together by a bus or other link, although each of the NVM storage device 16 and the DRAM storage device 17 can have other types and numbers of components. By way of example, the NVM storage device 16 can include any type of non-volatile persistent storage and the DRAM storage device 17 can include conventional magnetic or optical disks, or other media suitable for storing data in a block-based architecture.
The NVM storage management computing device 12 can be utilized by the client computing devices 14(1)-14(n) to access and utilize the NVM storage device 16 and the DRAM storage device 17 to store and persist data, although other types and/or numbers of storage management computing devices can be used. Referring more specifically to
The processor 22 in the NVM storage management computing device 12 executes a program of stored instructions for one or more aspects of the present invention, as described and illustrated by way of the embodiments herein, although the processor 22 could execute other numbers and types of programmed instructions. The processor 22 in the NVM storage management computing device 12 may include one or more central processing units or general purpose processors with one or more processing cores, for example.
The memory 24 in the NVM storage management computing device 12 stores these programmed instructions for one or more aspects of the present invention, as described and illustrated herein, although some or all of the programmed instructions can be stored and/or executed elsewhere. A variety of different types of memory storage devices including random access memory (RAM), such as dynamic RAM (DRAM), or other computer readable medium which is read from and/or written to by a magnetic, optical, or other reading and/or writing system that is coupled to the processor 22 can be used. In this example, the memory 24 includes caches 30, although the memory can include other types and/or numbers of data storage, modules, and/or other programmed instructions.
The communication interface 26 in the NVM storage management computing device 12 is used to communicate between the client computing devices 14(1)-14(n) and the NVM storage device 16 and DRAM storage device 17, which are all coupled together via the communication networks 18 and 20, although other types and numbers of communication networks or systems with other types and numbers of connections and configurations to other devices and elements can also be used. By way of example only, one or more of the communication networks 18 and 20 can use TCP/IP over Ethernet and industry-standard protocols, including hypertext transfer protocol (HTTP) and/or secure HTTP (HTTPS), although other types and numbers of communication networks each having their own communications protocols can also be used.
Although examples of the NVM storage management computing device 12, client computing devices 14(1)-14(n), the NVM storage device 16, and the DRAM storage device 17 are described herein, the devices and/or systems of the examples described herein are for exemplary purposes, as many variations of the specific hardware and software used to implement the examples are possible, as will be appreciated by those skilled in the relevant art(s). In addition, two or more computing systems or devices can be substituted for any one of the systems in any embodiment of the examples.
The examples may also be embodied as a non-transitory computer readable medium having instructions stored thereon for one or more aspects of the present technology, as described and illustrated by way of the examples herein, which when executed by the processor 22 in the NVM storage management computing device 12, cause the processor 22 to carry out the steps necessary to implement the methods of the examples, as described and illustrated herein.
Referring to
In step 102 the NVM storage management computing device 12 receives an update with a key/value pair from one of the client computing devices 14(1)-14(n) to be persisted in the NVM storage device 16, although the update can be persisted to non-volatile memory in other locations. Additionally, in this particular example, the update from one of the client computing devices 14(1)-14(n) is a write request, although the NVM storage management computing device 12 could also receive other types and/or numbers of requests from the client computing devices 14(1)-14(n), such as a read request which is also processed more effectively with this technology as illustrated in the graph in
In step 104 the NVM storage management computing device 12 determines when the received key in the key/value pair in the update from one of the client computing devices 14(1)-14(n) matches an existing key in one of the nodes in the vectored, multi-version, B+tree data storage structure. Following the example disclosed earlier in the background with reference to
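As a rough illustration of this match test, the following sketch assumes a leaf entry that pairs each key with a pointer to its vector of values; the ValueVector, VectoredEntry, and find_entry names are hypothetical stand-ins, and the descent from the root to the leaf is an ordinary B+tree traversal that is omitted here.

```cpp
constexpr int VECTOR_SLOTS = 4;   // assumed fixed vector size

// Values for one key, reverse filled from the last slot toward slot 0 so
// that the youngest value sits at index 'top'; overflow chains to 'next'.
struct ValueVector {
    int          top;                   // index of the youngest value
    int          values[VECTOR_SLOTS];
    ValueVector* next;                  // older, full vector (or nullptr)
};

struct VectoredEntry { int key; ValueVector* versions; };

struct VectoredLeaf {
    int           count;
    VectoredEntry slots[4];
};

// Step 104: does the received key match an existing key in the leaf?
VectoredEntry* find_entry(VectoredLeaf* leaf, int key) {
    for (int i = 0; i < leaf->count; ++i)
        if (leaf->slots[i].key == key)
            return &leaf->slots[i];  // match: grow vertically (step 106)
    return nullptr;                  // unique key: grow horizontally (step 114)
}
```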
In step 106 the NVM storage management computing device 12 provides the update with the key/value pair whose received key matches an existing key to the vectored, multi-version, B+tree data storage structure to be added into a vertical vector below the matched existing key in the node with a single operation, although other types and/or numbers of vectors or other structures in other orientations with respect to the matched existing key could be used. In this example, the update with the key/value pair with the received key is provided for addition into the vertical vector below the matched existing key in the node using a youngest first approach, i.e. the vector is reverse filled and access to the youngest value is an O(1) operation, although other approaches could be used, such as an oldest first approach by way of example only. When an oldest first approach is used, the vector is filled in a forward direction and access to the youngest value is an O(n) operation in this example. Accordingly, depending on which approach is utilized, e.g. youngest first or oldest first, the search time for a particular value in a vector varies. When, for example, a youngest first approach is used, searching for the youngest value in the vector is fastest and searching for the oldest value is slowest. When, for example, an oldest first approach is used, searching for the youngest value in the vector is slowest and searching for the oldest value is fastest. By way of example, in
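The difference between the two fill orders can be sketched as follows, again using the assumed ValueVector layout; the EMPTY sentinel in the oldest first variant is hypothetical and stands in for however unused slots are marked.

```cpp
#include <climits>

// Youngest first (reverse fill): 'top' always indexes the latest value,
// so reading it is a single O(1) access (assumes at least one value).
int youngest(const ValueVector* v) {
    return v->values[v->top];
}

constexpr int EMPTY = INT_MIN;  // hypothetical marker for an unused slot

// Oldest first (forward fill): without an index of the last filled slot,
// reaching the youngest value means walking the vector, an O(n) operation.
int youngest_forward_fill(const ValueVector* v) {
    int i = 0;
    while (i + 1 < VECTOR_SLOTS && v->values[i + 1] != EMPTY)
        ++i;
    return v->values[i];
}
```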
In step 108 the NVM storage management computing device 12 determines if the vector below the existing key in the node of the vectored, multi-version, B+tree data storage structure on the NVM storage device 16 which matches the received key is full. If in step 108 the NVM storage management computing device 12 determines the vector below the existing key which matches the received key is full, then the Yes branch is taken to step 110.
In step 110 the NVM storage management computing device 12 triggers the addition of another vector in the node of the vectored, multi-version, B+tree data storage structure on the NVM storage device 16 when the vector extending from the existing key in the one of the multiple nodes for the data storage structure which matches the received key is full. By way of example, in
If back in step 108 the NVM storage management computing device 12 determines the vector below the existing key which matches the received key is not full, then the No branch is taken to step 112. In step 112, the value in the key/value pair in the received update is added to the vector below the existing key in the node of the vectored, multi-version, B+tree data storage structure which matches the received key. Additionally, in this example the vectored, multi-version, B+tree data storage structure maintains the last current consistent version for the nodes, including any values in vectors. By way of example, in
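Steps 106 through 112 can be sketched together as follows, reusing the layout and persist() helper above: the new value lands in the next free slot of the key's vector with zero right shifts, and when the vector is full a fresh vector is chained in front of it (step 110); the NVM allocation is shown as a plain new purely for brevity.

```cpp
// Insert a new value for an existing key (steps 106-112). The vector is
// reverse filled, so the write touches exactly one slot and the 'top' index.
void update_value(ValueVector*& vec, int value) {
    if (vec->top == 0) {                       // vector full (step 108)
        ValueVector* fresh = new ValueVector;  // NVM allocation in practice
        fresh->top  = VECTOR_SLOTS;            // empty: one past the last slot
        fresh->next = vec;                     // keep older versions reachable
        persist(fresh, sizeof(*fresh));
        vec = fresh;                           // publish after the barrier (step 110)
        persist(&vec, sizeof(vec));
    }
    vec->values[--vec->top] = value;           // single-slot write (step 112)
    persist(&vec->values[vec->top], sizeof(int));
    persist(&vec->top, sizeof(vec->top));
}
```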
Additionally, as illustrated in this example in
If back in step 104 the NVM storage management computing device 12 determines the received key in the key/value pair in the update from one of the client computing devices 14(1)-14(n) does not match an existing key in one of the nodes in the vectored, multi-version, B+tree data storage structure, i.e. is unique, then the No branch is taken to step 114. In step 114, the NVM storage management computing device 12 may provide the received key in the key/value pair for insertion in a node in the vectored, multi-version, B+tree data storage structure.
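For completeness, step 114 can be sketched under the same assumptions: a unique key takes a new (key, vector) slot in the leaf, so any shifting happens at most once per unique key, and every subsequent update to that key lands in its vector with no shift at all; helper names remain hypothetical and node splitting is omitted.

```cpp
// Step 114: insert a unique key into a non-full leaf with a fresh vector.
void insert_unique(VectoredLeaf* leaf, int key, int value) {
    ValueVector* vec = new ValueVector;  // NVM allocation in practice
    vec->top  = VECTOR_SLOTS;            // empty vector
    vec->next = nullptr;
    update_value(vec, value);            // first value, reverse filled

    int i = leaf->count;
    while (i > 0 && leaf->slots[i - 1].key > key) {
        leaf->slots[i] = leaf->slots[i - 1];  // at most one shift per unique key
        persist(&leaf->slots[i], sizeof(VectoredEntry));
        --i;
    }
    leaf->slots[i] = VectoredEntry{key, vec};
    persist(&leaf->slots[i], sizeof(VectoredEntry));
    leaf->count++;
    persist(&leaf->count, sizeof(leaf->count));
}
```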
By way of example only, in
Accordingly, as illustrated and described with the examples herein, this technology provides methods, non-transitory computer readable media, and devices for persisting data on nonvolatile memory for fast updates and instantaneous recovery. With this technology the leaf node is made to grow vertically for updates to the same key and to grow horizontally only for a unique key. Since the updates to the vector are done within a single insert, there are zero right shifts as compared to prior implementations, which drastically reduces the write amplification. Additionally, with this technology, data can be persisted in a B+Tree data storage structure on NVM in native form, i.e. without any transformation, using load and store memory instructions.
Further, this technology optimizes the index data structure layout for multiple updates, e.g. ranging from one to one million updates, so that write inserts are four times faster than a base persistent multi-version data structure (MVDS) and reads are two times faster than a base persistent MVDS. By way of example only, a graph illustrating an improvement with a vectored multi-version B+tree data storage structure in accordance with an example of this technology in write latency by four times over prior existing systems is illustrated in
Even further, this technology utilizes and extends the multi-version data structures to include vectors to provide a snapshot, i.e. an application consistent view, to provide a log-less recovery mechanism. Since no log replay is required when data is persisted on NVM in accordance with examples of this technology, recovery can be enabled within sub-seconds.
Having thus described the basic concept of the invention, it will be rather apparent to those skilled in the art that the foregoing detailed disclosure is intended to be presented by way of example only, and is not limiting. Various alterations, improvements, and modifications will occur and are intended to those skilled in the art, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested hereby, and are within the spirit and scope of the invention. Additionally, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations therefore, is not intended to limit the claimed processes to any order except as may be specified in the claims. Accordingly, the invention is limited only by the following claims and equivalents thereto.