The subject application generally relates to data storage, and, for example, to a data storage system that reclaims storage capacity by copying stored object data from existing low-capacity (underloaded) chunks into a newly created chunk and deleting the underloaded chunks, and related embodiments.
Contemporary cloud-based data storage systems, such as Dell EMC® Elastic Cloud Storage (ECS™) service, store data in a way that ensures data protection while retaining storage efficiency. ECS™ is referred to as “elastic” storage because the data storage system is able to store arbitrary data sets having any amount of data of any size within the available physical storage capacity, without limitations enforced at the software level.
In ECS™, object data is stored in storage units referred to as chunks, with one chunk typically storing the object data of multiple objects. When storage clients delete data, sections of dead storage space result within a chunk. To reclaim this storage space, ECS™ implements copying garbage collection, in which the data from two or more low-capacity (underloaded) chunks are copied to one or more new chunks in a way that assures higher capacity utilization, after which the underloaded chunks are deleted to reclaim their capacity.
While this works well, copying garbage collection is relatively inefficient. One reason is that the same object data can be copied many times over successive garbage collection cycles.
This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.
Briefly, one or more aspects of the technology described herein are directed towards maintaining generation numbers in association with chunks stored in a data storage system. Aspects comprise detecting a group of underloaded chunks, in which each chunk of the group has a matching generation number with respect to each other chunk of the group. Described herein is accessing a destination chunk (e.g., opening an existing chunk with a next generation number, or creating one or more new chunks for inclusion in the chunks stored in the data storage system and setting the generation number of the one or more new chunks based on adjusting the matching generation number of the group of underloaded chunks to the next generation number). Aspects comprise garbage collecting the group of underloaded chunks by copying object data from the underloaded chunks into the one or more new chunks and deleting the underloaded chunks.
Other embodiments may become apparent from the following detailed description when taken in conjunction with the drawings.
The technology described herein is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:
Various aspects of the technology described herein are generally directed towards garbage collection that operates in part by grouping objects into chunks based on a generation number that generally corresponds to the age of the objects in the chunks. In this way, as garbage collection occurs over time, data objects with relatively longer lifetimes tend to be stored separately from data objects with relatively shorter lifetimes; as a result, the chunks that have data objects with relatively longer lifetimes do not become underloaded very often, and thus are not garbage collected very often.
By way of example, consider a copying garbage collector that is not aware of the age of the objects that are copied during copying garbage collection, and therefore may store relatively old, long-living objects and relatively young, short-living objects together in one chunk. Because the short-living objects are deleted relatively quickly, the chunk becomes underloaded relatively quickly, and is thus garbage collected, which copies the long-living objects into a new chunk, possibly again with short-living objects. This can repeat over and over; indeed, an object can be copied up to (L/l−1) times, where L and l are a longest and a shortest object lifetime in a system, respectively. For example, if there are objects that live for one quarter (e.g., call records) and an object that lives for five years (e.g., a financial document), then the long-living object can be copied by the copying garbage collector up to nineteen times over its five-year life. Note that such a variety of lifecycles can be commonplace in a data storage system like Elastic Cloud Storage (ECS™), because ECS™ provides a single archive platform for all types of data.
Instead, by having a copying garbage collector as described herein, relatively long-living objects get grouped together over time (over garbage collection cycles). As a result, a long-living object is not copied more than (N-1) times, where N is a number of object groups with an approximately similar lifetime. In actual operations, the number of times an object is copied is normally less, because of other delays; e.g., a complete garbage collection cycle takes a relatively long time.
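The two bounds above can be compared with a short calculation. The following is a minimal sketch; the function names are hypothetical, and treating the call records and the financial document as N = 2 lifetime groups is an assumption made only for illustration.

```python
from fractions import Fraction


def max_copies_non_generational(longest_lifetime, shortest_lifetime):
    """Upper bound L/l - 1 on copies of the longest-living object."""
    return Fraction(longest_lifetime, shortest_lifetime) - 1


def max_copies_generational(num_lifetime_groups):
    """Upper bound N - 1 when objects are grouped by generation."""
    return num_lifetime_groups - 1


# Lifetimes expressed in quarters: five years = 20 quarters, one quarter = 1.
print(max_copies_non_generational(20, 1))  # 19
print(max_copies_generational(2))          # 1 (assumes two lifetime groups)
```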
As will be understood, the implementation(s) described herein are non-limiting examples, and variations to the technology can be implemented. For example, in ECS™ cloud storage technology a “chunk” is a data storage unit/structure in which data objects are stored together, garbage collected and so on; however any data storage unit/structure can be used, such as in other data storage systems. As another example, as will be understood, a chunk is associated with a generation number; however it is feasible to use another mechanism to group together objects of generally the same age, e.g., generation numbers associated with the objects, and so on.
Indeed, it should be understood that any of the examples herein are non-limiting. For instance, some of the examples are based on ECS™ cloud storage technology; however virtually any storage system may benefit from the technology described herein. Thus, any of the embodiments, aspects, concepts, structures, functionalities or examples described herein are non-limiting, and the technology may be used in various ways that provide benefits and advantages in computing and data storage in general.
Clients 108 make data system-related requests to the cluster 102, which in general is configured as one large object namespace; there may be on the order of billions of objects maintained in a cluster, for example. To this end, a node such as the node 104(2) generally comprises ports 112 by which clients connect to the cloud storage system. Example ports are provided for requests via various protocols, including but not limited to SMB (server message block), FTP (file transfer protocol), HTTP/HTTPS (hypertext transfer protocol) and NFS (Network File System); further, SSH (secure shell) allows administration-related requests, for example.
In general, and in one or more implementations, e.g., ECS™, disk space is partitioned into a set of relatively large blocks of fixed size (e.g., 128 MB) referred to as chunks; user data is generally stored in chunks, e.g., in a user data repository. Normally, one chunk contains segments of several user objects. In other words, chunks can be shared, that is, one chunk may contain segments of multiple user objects; e.g., one chunk may contain mixed segments of some number of (e.g., three) user objects.
Each node, such as the node 104(2), includes an instance of a data storage system 114 and data services; (note however that at least some data service components can be per-cluster, rather than per-node). For example, ECS™ runs a set of storage services, which together implement storage business logic. Services can maintain directory tables for keeping their metadata, which can be implemented as search trees. A blob service can maintain an object table that keeps track of objects in the data storage system 114 and generally stores their metadata, including an object's data location within a chunk. There is also a “reverse” directory table (maintained by another service) that keeps a per chunk list of objects that have their data in a particular chunk.
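As an illustration of the two directory tables, the following minimal sketch keeps an object table (object to data locations) and a "reverse" table (chunk to objects) in step with each other. All names are hypothetical, and plain dictionaries stand in for the search trees mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    chunk_id: str
    offset: int
    length: int


class Directory:
    def __init__(self):
        self.object_table = {}                 # object id -> list of Location
        self.reverse_table = defaultdict(set)  # chunk id -> set of object ids

    def put(self, object_id, locations):
        """Record where an object's segments are stored."""
        self.object_table[object_id] = list(locations)
        for loc in locations:
            self.reverse_table[loc.chunk_id].add(object_id)

    def delete(self, object_id):
        """Remove an object from both tables."""
        for loc in self.object_table.pop(object_id, []):
            self.reverse_table[loc.chunk_id].discard(object_id)

    def objects_in(self, chunk_id):
        """Per-chunk list of objects that have data in the chunk."""
        return sorted(self.reverse_table.get(chunk_id, ()))


directory = Directory()
directory.put("a", [Location("c1", 0, 10)])
directory.put("b", [Location("c1", 10, 5), Location("c2", 0, 7)])
print(directory.objects_in("c1"))  # ['a', 'b']
directory.delete("a")
print(directory.objects_in("c1"))  # ['b']
```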
Further, as described herein, a garbage collector 122 is coupled to the chunk table 120 and the chunk manager 118 to group chunks based on generation number (block 124) and then garbage collect underloaded chunks 126 of one generation (e.g., with a generation number of i) into destination chunk(s) 128 of a next generation (e.g., with a generation number of i+1).
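The grouping step (block 124) can be sketched as follows. This is an illustrative sketch only: the record layout, the function name and the fifty percent cutoff used to decide that a chunk is underloaded are assumptions, not details given in this description.

```python
from collections import defaultdict
from dataclasses import dataclass

CHUNK_SIZE = 128 * 1024 * 1024  # 128 MB, the example chunk size given above
UNDERLOADED_BELOW = 0.5         # assumed cutoff; the text does not specify one


@dataclass
class ChunkRecord:
    chunk_id: str
    generation: int
    live_bytes: int  # bytes still referenced by live objects


def group_underloaded(chunk_table, cutoff=UNDERLOADED_BELOW):
    """Return {generation number: [underloaded chunk records]}."""
    groups = defaultdict(list)
    for record in chunk_table:
        if record.live_bytes / CHUNK_SIZE < cutoff:
            groups[record.generation].append(record)
    return dict(groups)


MB = 1024 * 1024
table = [
    ChunkRecord("A", 0, 20 * MB),
    ChunkRecord("B", 0, 120 * MB),  # well utilized, not a candidate
    ChunkRecord("C", 0, 30 * MB),
    ChunkRecord("D", 1, 10 * MB),
]
groups = group_underloaded(table)
print({g: [r.chunk_id for r in rs] for g, rs in groups.items()})
# {0: ['A', 'C'], 1: ['D']}
```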
In
As represented in
As represented in
Continuing with the example of
As can be seen in
To summarize, the technology described herein uses different chunks to store objects with different lifetimes. As is understood, the lifetime of an object is measured as the number of times the object has been copied by the garbage collector.
Thus, generation 0 chunks store new objects. When a new chunk is needed to store new object(s), a new chunk is created and associated with the original generation number, which is generation 0 in one or more implementations.
Generation 1 chunks contain objects copied by the garbage collector once, that is, these objects survived one copying by the garbage collector. Generation 2 chunks contain objects copied by the garbage collector from generation 1, that is, such objects survived two copying iterations by the garbage collector, and so on.
Thus, iterations of the garbage collector work to reduce the total number of chunks in the data storage system (while newly created chunks increase the total number). The objects with the longest lifetime get grouped into later and later generations. Thus, the objects with the longest lifetimes start their life in generation 0, but the number of generation i chunks in a system decreases with each increment of i corresponding to each garbage collector run.
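The relationship between copying and generation numbers can be sketched as below. This is a simplified illustration under stated assumptions: the class and function names are hypothetical, a single destination chunk is used, and chunk capacity is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    generation: int
    objects: dict = field(default_factory=dict)  # object id -> live data size


def collect(group):
    """Copy live objects of same-generation chunks into one next-generation chunk."""
    generations = {chunk.generation for chunk in group}
    if len(generations) != 1:
        raise ValueError("all chunks in a group must share one generation number")
    destination = Chunk(generation=generations.pop() + 1)
    for chunk in group:
        destination.objects.update(chunk.objects)
        chunk.objects.clear()  # the source chunk can now be deleted
    return destination


first = collect([Chunk(0, {"x": 10}), Chunk(0, {"y": 5})])
print(first.generation)   # 1: these objects survived one copying
second = collect([first, Chunk(1, {"z": 1})])
print(second.generation)  # 2: these objects survived two copying iterations
```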
Turning to
In one or more implementations, the generation to which a chunk belongs is maintained as a chunk attribute, which is set at chunk creation, e.g., generation 0 is the original default value. Thus, new chunks created to serve new data writes belong to generation 0. Chunks of other generations are created on demand by the garbage collector as needed. More particularly, when the garbage collector offloads a chunk from generation i, the garbage collector copies data to an open generation i+1 chunk. If there is no such open chunk, or the object data to be copied cannot fit in an open chunk, the garbage collector creates a new chunk of generation i+1.
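The destination selection rule can be sketched as follows. The class names, the small integer capacity and the first-fit search are assumptions chosen for brevity; only the rule itself (reuse an open generation i+1 chunk that has room, otherwise create one) comes from the description.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Chunk:
    chunk_id: int
    generation: int = 0  # generation 0 is the default for new data writes
    capacity: int = 128  # illustrative units, e.g. megabytes
    used: int = 0
    sealed: bool = False

    @property
    def free(self):
        return self.capacity - self.used


class ChunkManager:
    def __init__(self):
        self._ids = count()
        self.chunks = []

    def create(self, generation=0):
        chunk = Chunk(next(self._ids), generation)
        self.chunks.append(chunk)
        return chunk

    def destination_for(self, source_generation, size):
        """Return an open generation i+1 chunk with room, or create one."""
        target = source_generation + 1
        for chunk in self.chunks:
            if chunk.generation == target and not chunk.sealed and chunk.free >= size:
                return chunk
        return self.create(target)


manager = ChunkManager()
first = manager.destination_for(0, 100)   # no open generation 1 chunk yet
first.used += 100
again = manager.destination_for(0, 20)    # 28 units free, so the chunk is reused
other = manager.destination_for(0, 50)    # does not fit, so a new chunk is created
print(again is first, other is first, other.generation)  # True False 1
```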
Turning to another aspect, there is no way to guarantee that data from generation i−1 chunks fill up a created generation i chunk. Indeed, there is a reasonably high likelihood that at least one next generation chunk is itself underloaded. A threshold capacity value (e.g., eighty percent) can be defined for such underloaded chunks. If utilization of an underloaded chunk is below the threshold capacity, the chunk is left open so that the chunk can later store less mature data. Otherwise, an underloaded chunk is sealed with respect to any new data writes.
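The seal-or-leave-open decision reduces to a single comparison. In this minimal sketch the function name and byte-count arguments are hypothetical; the eighty percent value is the example threshold given above.

```python
THRESHOLD_CAPACITY = 0.80  # the eighty percent example value from the text


def should_seal(used_bytes, chunk_size, threshold=THRESHOLD_CAPACITY):
    """True when utilization is not below the threshold capacity."""
    return used_bytes / chunk_size >= threshold


print(should_seal(90, 128))   # False: about 70% used, chunk stays open
print(should_seal(110, 128))  # True: about 86% used, chunk is sealed
```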
This aspect is generally represented in
Operation 806 sets the current generation number to be that of the lowest group. Typically this will be zero, because garbage collection cycles take a long time. Operation 808 selects that group for garbage collection.
As can be understood, there needs to be at least two underloaded chunks in a group, otherwise copying would basically only compact a single underloaded chunk, which is not worth the expense. Also, a group can be empty, e.g., there may not be any underloaded chunks in a group. Thus, operation 810 can check for such a situation.
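The check described above can be sketched as a filter over the generational groups. This is an assumption-laden illustration: the function name is hypothetical, and moving on to the next generation when a group has fewer than two chunks is assumed here rather than stated.

```python
def collectable_groups(groups, minimum=2):
    """Yield (generation, chunks) in ascending generation order.

    Groups that are empty or hold a single underloaded chunk are skipped,
    because copying would only compact one chunk.
    """
    for generation in sorted(groups):
        chunks = groups[generation]
        if len(chunks) >= minimum:
            yield generation, chunks


groups = {2: ["G"], 0: ["A", "C"], 1: [], 3: ["X", "Y", "Z"]}
print(list(collectable_groups(groups)))
# [(0, ['A', 'C']), (3, ['X', 'Y', 'Z'])]
```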
The operations of
Operation 906 represents the threshold evaluation on the destination chunk or chunks. A chunk is selected at operation 906 and evaluated with respect to the threshold capacity at operation 908. Note that every destination chunk need not be evaluated in this way, e.g., operations 902 and/or 904 can mark a chunk as not needing evaluation if the destination chunk is filled (or the evaluation/sealing or leaving open can occur as part of those operations). In any event, if a chunk is not below the threshold capacity, the chunk is sealed at operation 910; otherwise the chunk is left open.
Note that a sealed, underloaded chunk need not be garbage collected, because such a chunk has high capacity usage. Thus, a previously open chunk may be in the detected generational group of underloaded chunks for the next generation, but following copying, may no longer be considered sufficiently underloaded to merit garbage collection. Operations 912 and 914 remove such a sealed, underloaded chunk from its generational group, if previously detected.
Similarly, an open, underloaded chunk can still be garbage collected, and thus if newly created, the open, underloaded chunk is not in the detected generational group of underloaded chunks for the next generation. Operations 916 and 918 add such an open, new underloaded chunk to its generational group.
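The group bookkeeping performed by operations 912 through 918 can be sketched with a set of chunk identifiers. The function name and the use of a set are assumptions for illustration.

```python
def update_generational_group(group, chunk_id, sealed):
    """Keep the detected next-generation group consistent after copying.

    group is the set of chunk ids previously detected as underloaded.
    """
    if sealed:
        group.discard(chunk_id)  # operations 912 and 914: no longer a candidate
    else:
        group.add(chunk_id)      # operations 916 and 918: still a candidate
    return group


next_generation_group = {"D"}
update_generational_group(next_generation_group, "D", sealed=True)
update_generational_group(next_generation_group, "E", sealed=False)
print(next_generation_group)  # {'E'}
```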
Operation 920 repeats the threshold evaluation process as needed until the next generation destination chunks are sealed or open. The process returns to
One or more aspects are represented as a data storage system 114 in
The original generation number can be zero, and the adjusted generation number can be one.
The data storage system can be further configured to maintain first chunk metadata that associates the original generation number with the original generation chunk and to maintain second chunk metadata that associates the adjusted generation number with the destination chunk.
The destination chunk can be a first destination chunk, and the data storage system can be further configured to delete an object from the first destination chunk to change the first destination chunk to a third underloaded chunk, select a fourth underloaded chunk that has a generation number that matches the adjusted generation number of the third underloaded chunk, access a second destination chunk with a further adjusted generation number that is based on the adjusted generation number, and garbage collect the third underloaded chunk and the fourth underloaded chunk by copying object data from the third underloaded chunk and from the fourth underloaded chunk into the second destination chunk and deleting the third underloaded chunk and the fourth underloaded chunk. The further adjusted generation number that is based on the adjusted generation number can be obtained by incrementing the adjusted generation number to the further adjusted generation number.
The data storage system can be further configured to detect an empty underloaded chunk from which all object data has been deleted, and garbage collect the empty underloaded chunk by deleting the empty underloaded chunk.
The data storage system can be further configured to evaluate the second destination chunk with respect to a threshold capacity value, and, in response to the second destination chunk being determined to be below the threshold capacity value, leave the second destination chunk open for storing additional object data, and in response to the second destination chunk being determined not to be below the threshold capacity value, seal the second destination chunk as a sealed underloaded chunk.
One or more aspects, generally exemplified in
Maintaining the generation numbers in association with the chunks can comprise maintaining respective generation numbers for the chunks in respective metadata associated with the chunks. Maintaining the generation numbers in association with the chunks can comprise maintaining respective generation numbers in respective attributes associated with the chunks.
Aspects can comprise deleting object data from a chunk to obtain an underloaded chunk.
Setting the generation number of the one or more destination chunks based on the generation number of the group of underloaded chunks can comprise incrementing the matching generation number of the group of underloaded chunks.
Other aspects can comprise obtaining newly created object data to store, and storing the newly created object data as part of a chunk associated with a generation number of zero. Still other aspects can comprise detecting an empty underloaded chunk from which all object data has been deleted, and garbage collecting the empty underloaded chunk by deleting the empty underloaded chunk.
Aspects can comprise evaluating a chunk with respect to a threshold capacity value, and in response to the chunk being determined to be below the threshold capacity value, leaving the chunk open for storing additional object data relative to stored object data in the chunk, and in response to the chunk being determined not to be below the threshold capacity value, sealing the chunk as a sealed underloaded chunk.
Other aspects can comprise determining that a destination chunk of the one or more destination chunks is above a threshold capacity value, sealing the destination chunk as a sealed underloaded chunk, and garbage collecting the sealed underloaded chunk.
One or more aspects, such as implemented in a machine-readable storage medium, comprising executable instructions that, when executed by a processor, facilitate performance of operations, can be directed towards operations exemplified in
Accessing the destination chunk can comprise creating the destination chunk, and wherein the operations further comprise increasing the first generation number into a third generation number and maintaining the third generation number in association with the destination chunk. Accessing the destination chunk can comprise selecting an existing chunk as the destination chunk, wherein the existing chunk has a next generation number relative to the first generation number.
The destination chunk can be a first destination chunk, and the operations can further comprise deleting an object from the first destination chunk to change the first destination chunk to a third underloaded chunk, detecting a fourth underloaded chunk associated with a fourth generation number that matches the third generation number, creating a second destination chunk, increasing the third generation number into a fourth generation number and maintaining the fourth generation number in association with the second destination chunk, and garbage collecting the third underloaded chunk and the fourth underloaded chunk comprising copying object data from the third underloaded chunk and the fourth underloaded chunk into the second destination chunk and deleting the third underloaded chunk and the fourth underloaded chunk.
As can be seen, the garbage collection technology described herein is practical to implement. By using generational groups of chunks, objects with longer lives get grouped together, whereby they are less frequently copied during garbage collection. The technology thus helps to reclaim storage capacity, yet does so efficiently, producing less undesirable disk and network traffic relative to non-generational garbage collection.
The techniques described herein can be applied to any device or set of devices (machines) capable of running programs and processes. It can be understood, therefore, that servers including physical and/or virtual machines, personal computers, laptops, handheld, portable and other computing devices and computing objects of all kinds including cell phones, tablet/slate computers, gaming/entertainment consoles and the like are contemplated for use in connection with various implementations including those exemplified herein. Accordingly, the general purpose computing mechanism described below with reference to
Implementations can partly be implemented via an operating system, for use by a developer of services for a device or object, and/or included within application software that operates to perform one or more functional aspects of the various implementations described herein. Software may be described in the general context of computer executable instructions, such as program modules, being executed by one or more computers, such as client workstations, servers or other devices. Those skilled in the art will appreciate that computer systems have a variety of configurations and protocols that can be used to communicate data, and thus, no particular configuration or protocol is considered limiting.
With reference to
Computer 1310 typically includes a variety of machine (e.g., computer) readable media and can be any available media that can be accessed by a machine such as the computer 1310. The system memory 1330 may include computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) and/or random access memory (RAM), and hard drive media, optical storage media, flash media, and so forth. By way of example, and not limitation, system memory 1330 may also include an operating system, application programs, other program modules, and program data.
A user can enter commands and information into the computer 1310 through one or more input devices 1340. A monitor or other type of display device is also connected to the system bus 1322 via an interface, such as output interface 1350. In addition to a monitor, computers can also include other peripheral output devices such as speakers and a printer, which may be connected through output interface 1350.
The computer 1310 may operate in a networked or distributed environment using logical connections to one or more other remote computers, such as remote computer 1370. The remote computer 1370 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, or any other remote media consumption or transmission device, and may include any or all of the elements described above relative to the computer 1310. The logical connections depicted in
As mentioned above, while example implementations have been described in connection with various computing devices and network architectures, the underlying concepts may be applied to any network system and any computing device or system in which it is desirable to implement such technology.
Also, there are multiple ways to implement the same or similar functionality, e.g., an appropriate API, tool kit, driver code, operating system, control, standalone or downloadable software object, etc., which enables applications and services to take advantage of the techniques provided herein. Thus, implementations herein are contemplated from the standpoint of an API (or other software object), as well as from a software or hardware object that implements one or more implementations as described herein. Thus, various implementations described herein can have aspects that are wholly in hardware, partly in hardware and partly in software, as well as wholly in software.
The word “example” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “example” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent example structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used, for the avoidance of doubt, such terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements when employed in a claim.
As mentioned, the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. As used herein, the terms “component,” “module,” “system” and the like are likewise intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computer and the computer can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
The aforementioned systems have been described with respect to interaction between several components. It can be appreciated that such systems and components can include those components or specified sub-components, some of the specified components or sub-components, and/or additional components, and according to various permutations and combinations of the foregoing. Sub-components can also be implemented as components communicatively coupled to other components rather than included within parent components (hierarchical). Additionally, it can be noted that one or more components may be combined into a single component providing aggregate functionality or divided into several separate sub-components, and that any one or more middle layers, such as a management layer, may be provided to communicatively couple to such sub-components in order to provide integrated functionality. Any components described herein may also interact with one or more other components not specifically described herein but generally known by those of skill in the art.
In view of the example systems described herein, methodologies that may be implemented in accordance with the described subject matter can also be appreciated with reference to the flowcharts/flow diagrams of the various figures. While for purposes of simplicity of explanation, the methodologies are shown and described as a series of blocks, it is to be understood and appreciated that the various implementations are not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Where non-sequential, or branched, flow is illustrated via flowcharts/flow diagrams, it can be appreciated that various other branches, flow paths, and orders of the blocks, may be implemented which achieve the same or a similar result. Moreover, some illustrated blocks are optional in implementing the methodologies described herein.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated implementations thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.
In addition to the various implementations described herein, it is to be understood that other similar implementations can be used or modifications and additions can be made to the described implementation(s) for performing the same or equivalent function of the corresponding implementation(s) without deviating therefrom. Still further, multiple processing chips or multiple devices can share the performance of one or more functions described herein, and similarly, storage can be effected across a plurality of devices. Accordingly, the invention is not to be limited to any single implementation, but rather is to be construed in breadth, spirit and scope in accordance