Data storage systems are arrangements of hardware and software in which storage processors are coupled to arrays of non-volatile storage devices, such as magnetic disk drives, electronic flash drives, and/or optical drives. The storage processors service storage requests, arriving from host machines (“hosts”), which specify blocks, files, and/or other data elements to be written, read, created, deleted, and so forth. Software running on the storage processors manages incoming storage requests and performs various data processing tasks to organize and secure the data elements on the non-volatile storage devices.
Applications running on host machines commonly issue read requests to data objects served by data storage systems. For example, a host application may issue a read request for obtaining data contained in a specified range of a LUN (Logical UNit). The read request may identify the LUN by logical unit number, and may specify the range as an offset into the LUN and a length. A host application may also issue a read request to obtain a specified range of data of a particular file, e.g., by identifying a file system, file name, and range within the indicated file.
In addition to receiving read requests from hosts, data storage systems may also issue their own internal read requests. For example, a storage system may read data as part of performing data deduplication, migration, replication, relocation, or defragmentation.
Unfortunately, read requests can involve significant delays. For example, processing a read request normally entails directing a disk controller to obtain the requested data from backend storage (e.g., one or more magnetic disk drives or flash drives), which can take a significant amount of time. If the data is compressed, additional delays may be required to decompress the data. Sometimes, it is not the data itself that is relevant to the operation to be performed but rather attributes of the data. But issuing a customary read request does not return the desired attributes. What is needed, therefore, is a way of reading attributes of specified data without obtaining the data itself.
The above need is addressed at least in part by an improved technique that includes providing an attribute-only read request directed to a specified data element, accessing metadata structures that store one or more attributes associated with the specified data element, and returning the attribute (or attributes) but not the data itself in response to the request. Advantageously, the improved technique obtains attributes for desired operations without suffering the delays or processing burdens normally associated with data reads. As the accessed metadata structures are frequently cached, attribute-only read requests can often be processed with minimal delay, without having to do many if any reads of backend storage.
Certain embodiments are directed to a method of obtaining attributes associated with data. The method includes forming a read request directed to a specified data element. The read request indicates an attribute-only read of a set of attributes associated with the specified data element. In response to the read request, the method further includes accessing a set of metadata structures that store the set of attributes. The method still further includes returning the set of attributes but not the specified data element itself in a response to the read request.
Other embodiments are directed to a computerized apparatus constructed and arranged to perform a method of obtaining attributes associated with data, such as the method described above. Still other embodiments are directed to a computer program product. The computer program product stores instructions which, when executed by control circuitry of a computerized apparatus, cause the computerized apparatus to perform a method of obtaining attributes associated with data, such as the method described above.
The foregoing summary is presented for illustrative purposes to assist the reader in readily grasping example features presented herein; however, this summary is not intended to set forth required elements or to limit embodiments hereof in any way. One should appreciate that the above-described features can be combined in any manner that makes technological sense, and that all such combinations are intended to be disclosed herein, regardless of whether such combinations are identified explicitly or not.
The foregoing and other features and advantages will be apparent from the following description of particular embodiments, as illustrated in the accompanying drawings, in which like reference characters refer to the same or similar parts throughout the different views.
Embodiments of the improved technique will now be described. One should appreciate that such embodiments are provided by way of example to illustrate certain features and principles of the disclosure but are not intended to be limiting.
An improved technique of obtaining attributes associated with data includes providing an attribute-only read request directed to a specified data element, accessing metadata structures that store one or more attributes associated with the specified data element, and returning the attributes but not the data itself in response to the request.
The network 114 may be any type of network or combination of networks, such as a storage area network (SAN), a local area network (LAN), a wide area network (WAN), the Internet, and/or some other type of network or combination of networks, for example. In cases where hosts 110 are provided, such hosts 110 may connect to the nodes 120 using various technologies, such as Fibre Channel, iSCSI (Internet small computer system interface), NVMeOF (Nonvolatile Memory Express (NVMe) over Fabrics), NFS (network file system), and CIFS (common Internet file system), for example. As is known, Fibre Channel, iSCSI, and NVMeOF are block-based protocols, whereas NFS and CIFS are file-based protocols. The nodes 120 may each be configured to receive I/O requests 112 according to block-based and/or file-based protocols and to respond to such I/O requests 112 by reading or writing the storage 190.
The depiction of node 120a is intended to be representative of all nodes 120. As shown, node 120a includes one or more communication interfaces 122, a set of processing units 124, and memory 130. The communication interfaces 122 include, for example, SCSI target adapters and/or network interface adapters for converting electronic and/or optical signals received over the network 114 to electronic form for use by the node 120a. The set of processing units 124 includes one or more processing chips and/or assemblies, such as numerous multi-core CPUs (central processing units). The memory 130 includes both volatile memory, e.g., RAM (Random Access Memory), and non-volatile memory, such as one or more ROMs (Read-Only Memories), disk drives, solid state drives, and the like. The set of processing units 124 and the memory 130 together form control circuitry, which is constructed and arranged to carry out various methods and functions as described herein. Also, the memory 130 includes a variety of software constructs realized in the form of executable instructions. When the executable instructions are run by the set of processing units 124, the set of processing units 124 is made to carry out the operations of the software constructs. Although certain software constructs are specifically shown and described, it is understood that the memory 130 typically includes many other software components, which are not shown, such as an operating system, various applications, processes, and daemons.
As further shown in
The data object 180 may be composed of blocks, where a “block” is a unit of allocatable storage space. Blocks are typically uniform in size, with typical block sizes being 4 kB (kilo Bytes), 8 kB, or 16 kB, for example. No particular block size is required, however, and embodiments may support non-uniform block sizes. The data storage system 116 is configured to access the data object 180, for example, by specifying blocks of the data object 180 to be created, read, updated, or deleted.
Cache 140 is configured to receive data of incoming write requests 112w from hosts 110 and to arrange the data into pages 142, which may be block-size, for example. The cache 140 may also store recently-read data of the data objects, e.g., blocks obtained from storage 190 in response to read requests directed to specified data. The cache 140 may further store various metadata structures 144, such as those which are part of the data path 160 that have recently been accessed for reading or writing data.
Deduplication facility 150 is configured to perform deduplication, a process whereby redundant blocks are replaced with pointers to a fewer number of retained copies of those blocks. Deduplication may be performed in an inline or near-inline manner, where pages 142 in the cache 140 are compared with a set of existing blocks in the data storage system 116, e.g., using fingerprint-based matching, and duplicate copies are avoided prior to being written to persistent data-object structures. In some examples, deduplication may also be performed in the background, i.e., out of band with the initial processing of incoming writes. Deduplication is sometimes abbreviated as “dedupe,” and the ability to perform deduplication on data of a data object may be described as that data object's “dedupability.” In an example, metadata may be used to track whether particular blocks are duplicates or originals, e.g., via a dedupe flag.
Compression facility 152 is configured to perform data compression. As with deduplication, compression may be performed inline or near-inline, with pages 142 in cache 140 compressed prior to being written to persistent data-object structures. In an example, metadata of data objects track the compressed sizes of blocks. Some blocks are more compressible than others. Typically, compression is performed on a per-block basis after deduplication is attempted.
Storage tiering facility 154 is configured to perform storage tiering, i.e., placement of data into storage tiers within storage 190. Storage “tiers” refer to respective classes of storage providing respective levels of performance. For example, the data storage system 116 may support multiple storage tiers that provide, for example, “highest,” “high,” and “medium” levels of performance, with each tier including storage drives (e.g., magnetic disk drives or solid-state drives) capable of meeting the performance requirements of the respective level. In some implementations, the data storage system 116 tracks access patterns of data and moves the data from one tier to another as access patterns change. For example, a data unit that was previously identified as “cold,” meaning that it was accessed infrequently for reading and/or writing, may be promoted from the medium tier to the high tier if its access frequency increases. Likewise, a data unit previously identified as “hot” may be moved from the highest tier to the high tier if its access frequency decreases.
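By way of illustration only, the promotion and demotion behavior described above can be sketched as follows. The function name `next_tier`, the access-rate parameters, and the rule of moving one tier at a time are assumptions made for this sketch; the description does not prescribe a particular tiering policy.

```python
# Storage tiers ordered from lowest to highest performance (names from the text).
TIERS = ("medium", "high", "highest")


def next_tier(current: str, old_rate: float, new_rate: float) -> str:
    """Return the tier a data unit would occupy after its access rate changes.

    Hypothetical policy: move up one tier if the access rate increased,
    down one tier if it decreased, and stay put otherwise.
    """
    idx = TIERS.index(current)
    if new_rate > old_rate and idx < len(TIERS) - 1:
        return TIERS[idx + 1]  # promotion, e.g. medium -> high
    if new_rate < old_rate and idx > 0:
        return TIERS[idx - 1]  # demotion, e.g. highest -> high
    return current
```

Under these assumptions, a cold data unit in the medium tier whose access frequency increases moves to the high tier, and a hot data unit in the highest tier whose access frequency decreases moves to the high tier.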
Ransomware facility 156 is configured to detect suspected ransomware attacks, e.g., based on patterns in blocks received by the data storage system 116, and to protect against such attacks. An example of ransomware detection and protection may be found in copending U.S. patent application Ser. No. 17/714,689, filed Apr. 6, 2022, the contents and teachings of which are incorporated herein by reference in their entirety.
Data path 160 is configured to provide metadata for accessing data objects, such as data object 180. As described in more detail below, data path 160 may include various logical blocks, mapping pointers, and block virtualization structures, some of which may track various attributes 170 of blocks. Such attributes 170 may be available for reading using attribute-only read requests as described herein.
In example operation, hosts 110 issue I/O requests 112 to the data storage system 116. Node 120a receives the I/O requests 112 at the communication interfaces 122 and initiates further processing. Such processing may involve reading and/or writing data objects, such as data object 180. In the course of writing to data objects and/or performing other activities, node 120a may generate and store attributes 170 associated with data, such as attributes associated with individual data blocks.
For example, node 120a may receive a new data block in a write request 112w and attempt to deduplicate the new block. To this end, the deduplication facility 150 may calculate a fingerprint (such as a hash value) that represents the new block and may attempt to match that fingerprint to fingerprints calculated for other blocks that were processed previously. If a match is found, redundant storage of the new block can be avoided. As this processing occurs, node 120a may store a “fingerprint” attribute that provides the calculated fingerprint in metadata associated with the new block. The node 120a may also store a “dedupe flag” attribute (e.g., a Boolean value) to indicate whether the new block was successfully deduplicated. If the new block cannot be deduplicated (e.g., no matching fingerprint is found), then the new block may be compressed instead, e.g., by the compression facility 152. Node 120a then places the compressed block in storage 190. Node 120a may arrange mapping pointers in the data path 160 to point to the new block and may store a “compressed size” attribute in the metadata. For example, the compressed-size attribute provides the size of the compressed block in bytes or sectors (512-Byte units). When placing the new block in storage 190, the storage tiering facility 154 may assign the new block to a particular storage tier. The node 120a may also write a “tiering level” attribute to the metadata associated with the new block. The tiering level may be expressed as a value that explicitly denotes the assigned storage tier (e.g., highest, high, or medium) and/or in some other form, such as by using a data temperature (e.g., hot, warm, or cold).
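The write-time generation of attributes described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the names `BlockMetadata` and `WritePath` are hypothetical, SHA-256 stands in for whatever fingerprint function an embodiment uses, zlib stands in for the compression facility 152, and a fixed default tier stands in for the storage tiering facility 154.

```python
import hashlib
import zlib
from dataclasses import dataclass


@dataclass
class BlockMetadata:
    """Attributes recorded for a block at write time (hypothetical layout)."""
    fingerprint: str
    dedupe_flag: bool
    compressed_size: int  # in bytes
    tiering_level: str


class WritePath:
    """Toy write path: deduplicate by fingerprint, otherwise compress and store."""

    def __init__(self, default_tier: str = "high"):
        self.stored = {}  # fingerprint -> compressed block
        self.default_tier = default_tier

    def write_block(self, data: bytes) -> BlockMetadata:
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint in self.stored:
            # Match found: avoid redundant storage and set the dedupe flag.
            size = len(self.stored[fingerprint])
            return BlockMetadata(fingerprint, True, size, self.default_tier)
        # No match: compress the block and place it in (simulated) storage.
        compressed = zlib.compress(data)
        self.stored[fingerprint] = compressed
        return BlockMetadata(fingerprint, False, len(compressed), self.default_tier)
```

In this sketch, the first write of a given block yields a cleared dedupe flag and a compressed size, whereas a second write of identical content yields the same fingerprint with the dedupe flag set and no additional stored copy.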
Some attributes 170, such as the fingerprint attribute, may remain the same over time, whereas other attributes 170 may change. For example, the tiering-level attribute may change if the data temperature of the new block changes and/or if the new block is moved to a different storage tier. Likewise, the dedupe-flag attribute may change if a later-performed deduplication procedure (such as a background procedure) manages to deduplicate the new block.
In accordance with improvements hereof, attribute-only read requests may obtain attributes 170 associated with specified data without retrieving or returning the specified data itself. For example, the data path 160 may receive an attribute-only read request 112ao. The request 112ao is directed to a specified data element, such as a particular block or set of blocks. In response to receiving the request 112ao, the data path 160 may access one or more metadata structures associated with the specified data element, obtain one or more attributes 170 from the metadata structures, and return the attributes 170 in a response 112a. The data path 160 does not retrieve the specified data element, however. Thus, the response 112a includes one or more attributes 170 of the specified data element but not the specified data element itself.
Attribute-only read requests 112ao may provide a useful and efficient option in certain contexts. For example, the ransomware facility 156 may issue an attribute-only read request 112ao directed to recently-written blocks for accessing attributes 170 that are relevant to detecting a ransomware attack. Such attributes may include compressed size and dedupe flag, for example. Significantly, the attributes 170 may be read quickly, without suffering the delays normally associated with read requests, which would involve retrieving data from backend storage and may include decompressing the data. Also, the metadata that stores attributes 170 may frequently be found in cache 140, such that an attribute-only read request 112ao can often be satisfied just by reading from cache 140, which is much faster than reading from backend storage 190.
As another example, the deduplication facility 150 may issue attribute-only read requests 112ao to obtain fingerprints of blocks quickly and efficiently, e.g., for purposes of block matching. As yet another example, the storage tiering facility 154 may issue attribute-only read requests 112ao to obtain the tiering level of specified blocks. As yet another example, a file system (not shown) may issue an attribute-only read request to blocks of a specified file, to determine, for example, how much storage space can be reclaimed by deleting the file, e.g., by checking the compressed-size attribute of the blocks of the file. Many other use cases are envisioned.
As shown, the data path 160 includes a namespace 210, a mapping structure (“mapper”) 220, and a physical block layer 230. The namespace 210 is configured to organize logical data, such as that of LUNs, file systems, virtual machine disks, snapshots, clones, and/or the like. In an example, the namespace 210 provides a large logical address space and is denominated in logical blocks 212.
The mapper 220 is configured to map logical blocks 212 in the namespace 210 to corresponding physical blocks 232 in the physical block layer 230. The physical blocks 232 are normally compressed and may thus have non-uniform size. The mapper 220 may include multiple levels of mapping structures, such as pointers, which are arranged in a tree. The levels include tops 222, mids 224, and leaves 226, which together are capable of mapping large amounts of data. The mapper 220 may also include a layer of virtuals 228, i.e., block virtualization structures for providing indirection between the leaves 226 and physical blocks 232, thus enabling physical blocks 232 to be moved without disturbing leaves 226. The tops 222, mids 224, leaves 226, and virtuals 228 depict individual pointer structures. Such pointer structures may be grouped together in arrays (not shown), which themselves may be stored in blocks.
In general, logical blocks 212 in the namespace 210 point to respective physical blocks 232 in the physical block layer 230 via mapping structures in the mapper 220. For example, a logical block 212a in the namespace 210 may point, via a path 216, to a particular top 222a, which points to a particular mid 224a, which points to a particular leaf 226a. The leaf 226a then points to a particular virtual 228a, which points to a particular physical block 232a. With this arrangement, leaves 226 represent corresponding logical blocks 212 in the namespace 210, e.g., each allocated leaf pointer 226 corresponds one-to-one to a respective logical block 212 at a respective logical address 214. Because of block sharing, however, the relationship between leaves 226 and virtuals 228 is not necessarily one-to-one. For example, multiple leaf pointers 226 can point to the same virtual (see virtual 228a).
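The relationship among tops, mids, leaves, and virtuals, including block sharing, can be illustrated with the following sketch. The class names mirror the structures described above, but the fan-out of 4 per level, the dictionary-based layout, and the `Mapper` methods are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field

FANOUT = 4  # hypothetical number of pointers per mapping level


@dataclass
class Virtual:
    virtual_address: int
    physical_block: bytes  # stands in for the pointer to a physical block


@dataclass
class Leaf:
    virtual: Virtual


@dataclass
class Mid:
    leaves: dict = field(default_factory=dict)


@dataclass
class Top:
    mids: dict = field(default_factory=dict)


class Mapper:
    """Three-level tree (top -> mid -> leaf) with one leaf per logical address."""

    def __init__(self):
        self.tops = {}

    @staticmethod
    def _indices(logical_address: int):
        top = logical_address // (FANOUT * FANOUT)
        mid = (logical_address // FANOUT) % FANOUT
        leaf = logical_address % FANOUT
        return top, mid, leaf

    def map(self, logical_address: int, virtual: Virtual) -> None:
        t, m, l = self._indices(logical_address)
        mid = self.tops.setdefault(t, Top()).mids.setdefault(m, Mid())
        mid.leaves[l] = Leaf(virtual)

    def leaf_for(self, logical_address: int) -> Leaf:
        t, m, l = self._indices(logical_address)
        return self.tops[t].mids[m].leaves[l]
```

Mapping two different logical addresses to one `Virtual` object yields two distinct leaves that point to the same virtual, which models the many-to-one relationship that arises from block sharing.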
As shown to the right of
Virtual pointer structures 228 may also store various attributes 170. For example, virtual 228 may store its own attribute 170b for tiering level. Unlike the attribute 170a, which is specific to a particular logical block 212, attribute 170b may be common to all logical blocks that share the same physical block 232. Virtual 228 may also store a fingerprint 170c, e.g., a hash value calculated from the physical block 232a prior to compression. Virtual 228 may further store an attribute 170d for compressed size, e.g., the size of compressed block 232a and an attribute for a dedupe flag 170e, which indicates whether the associated physical block, e.g., 232a, is deduplicated. As shown, virtual 228 includes a pointer 260 to a physical block 232, such as physical block 232a. Virtual 228 may also include a virtual address 262, i.e., an address of the virtual 228 within a virtual address space (one that organizes virtuals 228).
The particular attributes 170a through 170e are useful examples, but they are not intended to be limiting. For example, additional attributes 170 may be provided, and the indicated attributes may be replaced with different ones.
In some examples, attributes 170 are placed in mapping structures while processing data blocks for writing, or at other suitable times. The attributes 170 may then be obtained via attribute-only read requests. For example, an attribute-only read request 112ao may be directed to logical block 212a, which may be identified by a logical address 214. The logical address 214 may be expressed simply as a number or range of numbers that represents one or more logical blocks 212.
Once the logical address 214 has been identified, the attribute-only read request 112ao may follow the pointers through the associated mapping structures toward (but not to) the physical data, e.g., physical block 232a. For example, the read request 112ao proceeds from logical block 212a to top 222a, then to mid 224a, and then to leaf 226a. If the desired attribute or attributes are found in leaf 226a, then the read request 112ao may proceed no further, reading those attributes and returning them to the requestor. Otherwise, the read request 112ao may proceed to the pointed-to virtual 228a, where it may retrieve the desired attributes or additional desired attributes and return all obtained attributes to the requestor, proceeding no further down the data path 160.
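The traversal described above, which stops at the leaf when possible and never reaches the physical block, can be sketched as follows. The sketch starts at the leaf and omits the top and mid levels for brevity. The names `attribute_only_read`, `PhysicalBlock`, and the attribute keys are hypothetical, and the read counter exists only to show that no physical data is fetched.

```python
from dataclasses import dataclass, field


class PhysicalBlock:
    """Stand-in for backend data; counts how often it is actually read."""

    def __init__(self, payload: bytes):
        self._payload = payload
        self.reads = 0

    def read(self) -> bytes:
        self.reads += 1
        return self._payload


@dataclass
class Virtual:
    physical: PhysicalBlock
    attributes: dict = field(default_factory=dict)


@dataclass
class Leaf:
    virtual: Virtual
    attributes: dict = field(default_factory=dict)


def attribute_only_read(leaf: Leaf, wanted):
    """Return (attributes, structures visited) without touching physical data."""
    found = {n: leaf.attributes[n] for n in wanted if n in leaf.attributes}
    visited = ["leaf"]
    missing = [n for n in wanted if n not in found]
    if missing:
        # Proceed to the pointed-to virtual only for attributes not in the leaf.
        visited.append("virtual")
        for name in missing:
            if name in leaf.virtual.attributes:
                found[name] = leaf.virtual.attributes[name]
    return found, visited
```

A request satisfied entirely by the leaf visits only the leaf. A request for an attribute held in the virtual visits both structures. In neither case is `PhysicalBlock.read` called.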
The format shown in
In an example, an attribute-only read request 112ao is formed by specifying the indicated fields in a computer instruction. The above-described format may be defined by an API (application programming interface), such as an API provided for data I/O. In an example, the format defines one or more return values. In an example, an instruction formed using the above format may return a data structure, or multiple data structures, which provide the requested attributes 170 retrieved for the specified data element. If multiple blocks are specified (e.g., Size >1), separate data structures or separate portions of a single data structure may be returned for providing attributes of respective blocks.
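One possible shape for such an instruction and its return values is sketched below. Every name here is an assumption made for illustration: the field names `logical_address`, `size`, and `attributes`, the `BlockAttributes` return structure, and the `serve` function are not taken from the described format, and the metadata is modeled as a plain dictionary keyed by logical address.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AttributeOnlyRead:
    """Hypothetical request: a starting block, a block count, and attribute names."""
    logical_address: int
    size: int = 1
    attributes: Tuple[str, ...] = ("compressed_size",)


@dataclass(frozen=True)
class BlockAttributes:
    """Hypothetical per-block return structure."""
    logical_address: int
    values: Dict[str, object]


def serve(request: AttributeOnlyRead, metadata: Dict[int, dict]) -> List[BlockAttributes]:
    """Return one structure per requested block, holding attributes but no data."""
    results = []
    start = request.logical_address
    for address in range(start, start + request.size):
        block_meta = metadata.get(address, {})
        values = {n: block_meta[n] for n in request.attributes if n in block_meta}
        results.append(BlockAttributes(address, values))
    return results
```

With a size greater than one, this sketch returns a separate structure for each block, which corresponds to the first of the two options described above. Returning separate portions of a single structure would be an equally valid arrangement.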
In some examples, the attribute-only read request 112ao may specify a range of logical blocks, rather than a single block, and the node 120a may return a value indicating whether the range of logical blocks forms a sequential pattern, e.g., whether the logical blocks of the range map to virtuals at sequential virtual addresses 262. The returned information may also indicate a partially sequential pattern. For example, if the specified range of logical blocks includes 16 blocks but only the first 8 blocks map to sequential virtuals, then the response to the read request 112ao may indicate the sequential range.
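The sequential-pattern check can be sketched as follows, under the assumption that "sequential" means each virtual address is exactly one greater than the preceding one and that only a leading sequential run is reported. The function names and the shape of the returned dictionary are hypothetical.

```python
def sequential_prefix(virtual_addresses):
    """Count how many leading blocks map to consecutive virtual addresses."""
    if not virtual_addresses:
        return 0
    run = 1
    for previous, current in zip(virtual_addresses, virtual_addresses[1:]):
        if current != previous + 1:
            break
        run += 1
    return run


def describe_range(virtual_addresses):
    """Summarize a range as fully sequential or as a partially sequential prefix."""
    run = sequential_prefix(virtual_addresses)
    return {
        "sequential": run == len(virtual_addresses),
        "sequential_blocks": run,
    }
```

For a 16-block range in which only the first 8 blocks map to consecutive virtual addresses, `describe_range` reports that the range is not fully sequential and that the sequential portion covers 8 blocks.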
At 820, in response to the read request 112ao, a set of metadata structures is accessed that store the set of attributes 170. For example, the read request 112ao traces a path 216 from the specified data element to an associated leaf 226 and/or virtual 228. The read request 112ao then accesses one or more attributes 170 of the specified data from the associated leaf 226 and/or virtual 228.
At 830, the set of attributes but not the specified data element itself is returned in a response to the read request 112ao. For example, attributes obtained from the leaf 226 and/or virtual 228 are returned in one or more data structures to the requestor of the read request 112ao. Data of one or more physical blocks 232 is not returned, however.
An improved technique has been described for obtaining attributes 170 associated with data. The technique includes providing an attribute-only read request 112ao directed to a specified data element, accessing metadata structures 226 and/or 228 that store one or more attributes 170 associated with the specified data element, and returning the attribute (or attributes) but not the data itself in response to the request 112ao.
Having described certain embodiments, numerous alternative embodiments or variations can be made. For example, embodiments have been described in which read requests return attributes 170 but not data. However, embodiments may also be constructed in which read requests return both attributes and data. Such embodiments may be arranged similarly to those described above, except that, in addition to accessing and returning attributes 170, they also access and return one or more associated physical blocks 232, which may include decompressing such blocks.
Also, embodiments have been described in which attributes 170 are accessed from leaves 226 and/or virtuals 228. This is merely an example, however, as some embodiments may obtain attributes from mids 224, tops 222, or other metadata structures.
Also, although embodiments have been described in which attribute-only read requests originate from components that operate within a data storage system, this is merely an example, as attribute-only read requests 112ao may also originate from hosts 110.
Further, although embodiments have been described that involve one or more data storage systems, other embodiments may involve computers, including those not normally regarded as data storage systems. Such computers may include servers, such as those used in data centers and enterprises, as well as general purpose computers, personal computers, and numerous devices, such as smart phones, tablet computers, personal data assistants, and the like.
Further, although features have been shown and described with reference to particular embodiments hereof, such features may be included and hereby are included in any of the disclosed embodiments and their variants. Thus, it is understood that features disclosed in connection with any embodiment are included in any other embodiment.
Further still, the improvement or portions thereof may be embodied as a computer program product including one or more non-transient, computer-readable storage media, such as a magnetic disk, magnetic tape, compact disk, DVD, optical disk, flash drive, solid state drive, SD (Secure Digital) chip or device, Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA), and/or the like (shown by way of example as medium 850 in
As used throughout this document, the words “comprising,” “including,” “containing,” and “having” are intended to set forth certain items, steps, elements, or aspects of something in an open-ended fashion. Also, as used herein and unless a specific statement is made to the contrary, the word “set” means one or more of something. This is the case regardless of whether the phrase “set of” is followed by a singular or plural object and regardless of whether it is conjugated with a singular or plural verb. Also, a “set of” elements can describe fewer than all elements present. Thus, there may be additional elements of the same kind that are not part of the set. Further, ordinal expressions, such as “first,” “second,” “third,” and so on, may be used as adjectives herein for identification purposes. Unless specifically indicated, these ordinal expressions are not intended to imply any ordering or sequence. Thus, for example, a “second” event may take place before or after a “first” event, or even if no first event ever occurs. In addition, an identification herein of a particular element, feature, or act as being a “first” such element, feature, or act should not be construed as requiring that there must also be a “second” or other such element, feature or act. Rather, the “first” item may be the only one. Also, and unless specifically stated to the contrary, “based on” is intended to be nonexclusive. Thus, “based on” should be interpreted as meaning “based at least in part on” unless specifically indicated otherwise. Although certain embodiments are disclosed herein, it is understood that these are provided by way of example only and should not be construed as limiting.
Those skilled in the art will therefore understand that various changes in form and detail may be made to the embodiments disclosed herein without departing from the scope of the following claims.