The present invention relates to mitigation of metadata representation of compressed data in a primary storage system. More specifically, the invention relates to a piecewise continuous function representing a mapping between a virtual address segment and a compressed data extent.
Compression-enabled primary storage systems use on-disk metadata to map between raw and compressed data space. In one embodiment, the metadata is in the form of a B-tree and is stored on disk. The metadata functions as a layer on disk. For random accesses to compressed data to support a read request, this additional layer slows down the processing of the request. There is a similar delay in processing write requests as well. The metadata layer generally represents a percentage of the size of the data stored in the associated storage system. In a large storage system, such as a hundred terabyte system, the size of the metadata increases significantly and may occupy a few terabytes of space. So, in addition to extending processing time, the metadata layer may also occupy a significant amount of storage space.
The metadata layer may be architecturally configured to be stored on flash storage, which is a block level mapping of the metadata. However, this metadata layer would compete for flash space with other types of metadata, such as thin provisioning, file system, etc. Accordingly, configuring the metadata layer for flash storage only serves to support the need to minimize the metadata needed for representing compressed data in primary storage systems.
The invention includes a method, computer program product, and system for minimizing metadata representation in a primary storage system.
A method, computer program product, and system are provided to support and enable mitigated metadata representation of compressed data. A processing unit is operatively coupled to a persistent storage device, and partition compression units are stored local to the storage device. Each partition is a set of compressed data, and each partition is provided with a header that contains a virtual address of data in the partition. A linear function representing a mapping between a virtual address segment and a compressed data extent is provided. In response to a read operation, the function is consulted and a compressed data block is located and de-compressed from the mapping. Similarly, in response to a write operation, content of a new segment is compressed, a new mapping of compressed metadata is computed, and at least one candidate location is found. The linear function is updated based on the placement location of the new segment.
The drawings referenced herein form a part of the specification. Features shown in the drawings are meant as illustrative of only some embodiments of the invention, and not of all embodiments of the invention unless otherwise explicitly indicated.
It will be readily understood that the components of the present invention, as generally described and illustrated in the Figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the apparatus, system, and method of the present invention, as presented in the Figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.
Reference throughout this specification to “a select embodiment,” “one embodiment,” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “a select embodiment,” “in one embodiment,” or “in an embodiment” in various places throughout this specification are not necessarily referring to the same embodiment.
The illustrated embodiments of the invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and processes that are consistent with the invention as claimed herein.
Data is compressed in small independent units referred to herein as partitions. In one embodiment, the size of each partition ranges from 8 KB to 64 KB. Each partition is configured with a header containing the uncompressed address of the partition, also referred to herein as a virtual address. In one embodiment, the header can be inside each partition. Similarly, in one embodiment, the header can be outside the partition and function as a table of contents shared by a group of partitions. In one embodiment, the header is stored in cache. The virtual address space in a block storage device is divided into a plurality of segments. In one embodiment, each segment may be a fixed size, e.g. 10 GB each. The segments are further divided into sub-segments, which are employed as a basic unit for piecewise linear mapping. The physical address space is divided into sections referred to herein as extents. In one embodiment, the extents may be a fixed size with one segment associated with one extent. Similarly, in one embodiment, the metadata representing an associated extent may be reduced by one segment sharing two or more extents. The aspect of virtualization maps each sub-segment into a contiguous range within an extent. In one embodiment, the range is referred to as a sub-extent.
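By way of illustration only, the division of the virtual address space into segments and sub-segments may be sketched as follows. The 10 GB segment size is the example given above; the sub-segment size and the name locate are hypothetical and are chosen only to make the sketch concrete.

```python
SEGMENT_SIZE = 10 * 2**30      # 10 GB per segment, the example size given in the text
SUB_SEGMENT_SIZE = 1 * 2**20   # hypothetical sub-segment size; not stated in the text


def locate(virt_addr: int) -> tuple[int, int, int]:
    """Split a virtual byte address into (segment, sub-segment, offset)."""
    segment, within_segment = divmod(virt_addr, SEGMENT_SIZE)
    sub_segment, offset = divmod(within_segment, SUB_SEGMENT_SIZE)
    return segment, sub_segment, offset
```

In such a sketch, the (segment, sub-segment) pair would select the piece of the piecewise linear mapping to consult, and the offset would be the input to that piece.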
With reference to
The I/O engine (140) includes a read manager (142) to support a read operation, and a write manager (144) to support a write operation. In response to receipt of a read operation by the I/O engine (140), the read manager (142) consults the linear function in the map (162), and from the map computes a physical address neighborhood containing the requested data. In one embodiment, the physical address neighborhood is larger than the compressed extent that contains the requested data. The read manager (142) reads content of the physical address neighborhood, locates a compressed data block in the read content, de-compresses the compressed block, and returns the requested data to the storage client (120) in a de-compressed format. Similarly, in response to receipt of a write operation by the I/O engine (140), the write manager (144) writes a new data segment. More specifically, the write manager (144) compresses all content in the new data segment, computes a new mapping of the compressed segment metadata in the memory (160), and determines at least one candidate write location for the new segment. The new mapping may be mutable, to accommodate the new segment, or immutable. With respect to the new segment and a mutable mapping, the write manager (144) assesses the linear function, and if there is a difference in the slope, the write manager (144) places a knot in the linear function, with the knot characterizing the change in the slope for the new mapping.
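A minimal sketch of the read path described above is given below, under stated assumptions. The on-disk header layout (a magic marker, the virtual address, and the compressed length), the use of zlib as the compressor, and the function names are all hypothetical; the text specifies only that each partition carries a header containing its virtual address. The bounds of the physical address neighborhood are taken as inputs here, and the neighborhood is assumed to be large enough to contain the whole partition.

```python
import struct
import zlib
from typing import Optional

MAGIC = b"PART"                    # hypothetical marker for locating headers
HEADER = struct.Struct("<4sQI")    # magic, virtual address, compressed length


def pack_partition(virt_addr: int, raw: bytes) -> bytes:
    """Compress one partition and prefix it with its header."""
    body = zlib.compress(raw)
    return HEADER.pack(MAGIC, virt_addr, len(body)) + body


def read_from_neighborhood(disk: bytearray, lo: int, hi: int,
                           virt_addr: int) -> Optional[bytes]:
    """Read the neighborhood [lo, hi), find the partition whose header
    names virt_addr, and return its de-compressed content (or None)."""
    window = bytes(disk[lo:hi])
    pos = window.find(MAGIC)
    while pos != -1 and pos + HEADER.size <= len(window):
        _, addr, clen = HEADER.unpack_from(window, pos)
        body = window[pos + HEADER.size: pos + HEADER.size + clen]
        # Accept only a header for the requested address whose body
        # lies entirely inside the window that was read.
        if addr == virt_addr and len(body) == clen:
            return zlib.decompress(body)
        pos = window.find(MAGIC, pos + 1)
    return None
```

The sketch scans the read content for headers rather than consulting a per-partition index, which reflects the stated goal of keeping only the linear function in memory; a practical implementation would need a more robust way to recognize headers than a four-byte marker.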
A virtualization layer is structured to reduce the metadata size. The partitions are written in accordance with a linear approximate mapping. More specifically, the data for a given range of virtual addresses, called a sub-segment, is written so that the nominal location in the physical storage is a linear function of the virtual address. In one embodiment, the mapping is approximate and the compressed partitions may be placed within a known margin of a nominal location. An example of the mapping is shown and described in
Following step (208), the relevant partitions are de-compressed (210), and the requested data is returned in a de-compressed format (212). The expanded read size as shown at step (204) incurs a negligible incremental performance cost, while offering latitude for data units of different compressibilities to be placed according to one linear function.
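The relationship between the nominal location, the margin, and the expanded read size may be illustrated with the following sketch. The class name SubSegmentMap, its field names, and the encoding of the slope as compressed bytes per raw byte are assumptions made for illustration and are not taken from the embodiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubSegmentMap:
    virt_start: int   # first virtual byte address of the sub-segment
    phys_start: int   # nominal physical offset of that byte within the extent
    slope: float      # compressed bytes per raw byte (assumed encoding)
    margin: int       # known placement margin around the nominal location

    def nominal(self, virt_addr: int) -> int:
        """Nominal physical location as a linear function of the virtual address."""
        return self.phys_start + int((virt_addr - self.virt_start) * self.slope)

    def read_window(self, virt_addr: int, raw_len: int) -> tuple[int, int]:
        """Expanded physical range [lo, hi) to read for a request of raw_len bytes."""
        lo = max(0, self.nominal(virt_addr) - self.margin)
        hi = self.nominal(virt_addr + raw_len) + self.margin
        return lo, hi
```

For example, with a slope of 0.5 and a margin of 4096 bytes, a request for 8192 raw bytes would read 4096 nominal bytes plus the margin on each side, so the incremental cost of the expanded read is bounded by twice the margin.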
As discussed above with respect to the read operation, the mapping is a function, and in one embodiment is a linear function. Referring to
As noted above, the sub-segment mapping metadata, hereinafter referred to as sub-segment mapping, is stored in cache. The sub-segment mapping represents a minimal amount of metadata needed to represent compressed data in the physical storage. The compression ratio of each sub-segment is represented in the sub-segment mapping. Each knot in the linear mapping is named in an associated data structure holding the stored interpolation information. Each knot entry in the data structure includes: an extent identifier, an extent offset, and a sub-segment slope. In one embodiment, the metadata of the knot entry is about 5 bytes per knot. Accordingly, data inherent to each knot is represented in the header.
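One way a knot entry of about 5 bytes could be laid out is sketched below. The field widths (12 bits for the extent identifier, 20 bits for the extent offset expressed in some allocation unit, and 8 bits for a quantized slope) are purely hypothetical; the text states only the three fields and the approximate size.

```python
EXTENT_ID_BITS = 12   # hypothetical width
OFFSET_BITS = 20      # hypothetical width, offset counted in allocation units
SLOPE_BITS = 8        # hypothetical width, slope quantized to 1/256 steps


def pack_knot(extent_id: int, offset_units: int, slope_q: int) -> bytes:
    """Pack (extent identifier, extent offset, slope) into 40 bits."""
    if not (0 <= extent_id < 1 << EXTENT_ID_BITS
            and 0 <= offset_units < 1 << OFFSET_BITS
            and 0 <= slope_q < 1 << SLOPE_BITS):
        raise ValueError("field out of range for the assumed layout")
    word = (extent_id << (OFFSET_BITS + SLOPE_BITS)) | (offset_units << SLOPE_BITS) | slope_q
    return word.to_bytes(5, "big")


def unpack_knot(blob: bytes) -> tuple[int, int, int]:
    """Inverse of pack_knot."""
    word = int.from_bytes(blob, "big")
    extent_id = word >> (OFFSET_BITS + SLOPE_BITS)
    offset_units = (word >> SLOPE_BITS) & ((1 << OFFSET_BITS) - 1)
    slope_q = word & ((1 << SLOPE_BITS) - 1)
    return extent_id, offset_units, slope_q
```

Under this assumed layout, a slope_q of 128 would denote a compressed size of roughly half the raw size.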
Referring to
The write operation may result in maintaining the associated data as a unit, e.g. full sub-segment, or in one embodiment, may result in scattering the associated data within the physical space. When the data is scattered, an excessive quantity of indirection records and/or excessive padding may result. In another embodiment, mapping of the metadata to the physical space may be immutable or mutable. A mutable mapping is subject to change compatible with locations of data already in the physical storage. In one embodiment, with respect to mutable mapping, adjustments are made as new data arrives that is more or less compressible than predicted. The slope of the function represents the compressibility of the sub-segment. As the mutable mapping is amended, the slope of the function may change, and associated knots in the slope representation may be inserted or moved. In one embodiment, the mutable mapping is used for progressive sequential write operations into a new sub-segment, e.g. the sequential writes are concatenated. Similarly, in one embodiment, the mutable mapping is used when additional bytes are needed to express constraints from already written content.
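The insertion and movement of knots during progressive sequential writes may be sketched as follows. The class PiecewiseLinearMap, the slope tolerance tol, and the representation of a knot as a (virtual address, physical address) pair are assumptions made for illustration; the embodiments do not prescribe this structure.

```python
import bisect


class PiecewiseLinearMap:
    """Mutable mapping kept as a sorted list of (virtual, physical) knots."""

    def __init__(self, virt0: int = 0, phys0: int = 0):
        self.knots = [(virt0, phys0)]

    def append(self, raw_len: int, comp_len: int, tol: float = 0.01) -> None:
        """Record a sequential write of raw_len bytes that compressed to comp_len."""
        v_prev, p_prev = self.knots[-1]
        new_knot = (v_prev + raw_len, p_prev + comp_len)
        if len(self.knots) >= 2:
            v0, p0 = self.knots[-2]
            old_slope = (p_prev - p0) / (v_prev - v0)
            if abs(comp_len / raw_len - old_slope) <= tol:
                self.knots[-1] = new_knot   # same slope: move the last knot forward
                return
        self.knots.append(new_knot)         # slope changed: insert a new knot

    def lookup(self, virt: int) -> float:
        """Interpolate the nominal physical address for a virtual address."""
        if len(self.knots) < 2:
            raise ValueError("no data has been mapped yet")
        i = bisect.bisect_right(self.knots, (virt, float("inf"))) - 1
        i = max(0, min(i, len(self.knots) - 2))
        (v0, p0), (v1, p1) = self.knots[i], self.knots[i + 1]
        return p0 + (virt - v0) * (p1 - p0) / (v1 - v0)
```

In this sketch, data that compresses as predicted extends the current linear piece without adding metadata, while data that is more or less compressible than predicted costs one additional knot.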
An immutable mapping is not subject to change. The immutable mapping may be used for almost all sub-segments. In one embodiment, the immutable mapping is about 10 bytes per sub-segment. The immutable mapping may be made mutable in some circumstances, such as a case of a sub-segment tail overwrite. In one embodiment, spare bytes between adjacent partitions, e.g. padding, or the addition of knots into the linear representation may be incorporated into the new tentative mapping. Similarly, in one embodiment, the mapping might be mutable, e.g. subject to change, depending on the compressibility of the partition. Accordingly, the tentative mapping at step (406) accounts for the compressibility of the partition.
Following step (406), new physical address space, e.g. a new sub-extent, corresponding to the virtual address space is allocated to hold the image of the sub-segment (408). Both content and any inter-leaved padding are written to the new physical address space in accordance with the tentative mapping (410). In addition, once the new mapping has been committed, a global mapping is updated with the new mapping (412). The global mapping is an in-memory continuous map of expected location and margins for an address space associated with the logical capacity. Accordingly, the process shown herein demonstrates a new sub-segment write operation and the interface of the new write with the continuous map.
Referring to
In one embodiment, to facilitate the update operation, free space is provided during the write operation when the initial compression of the partition takes place. Free space may be employed by relaxing the estimated compression ratio, also referred to here as a slope of the function, when the partition is initially written. Similarly, in one embodiment, a constant amount of free space is left with the partition. In some embodiments, the free space is zeroed so that it is subject to detection when one or more compressed partitions are subject to a read operation. Similarly, the header of the compressed partitions contains the raw address of the free space so that they can be located in a read window.
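The two ways of providing free space described above, relaxing the slope or leaving a constant amount of space, can be expressed as simple arithmetic. In the sketch below, the function name reserve_bytes, the relax fraction, and the default values are hypothetical.

```python
import math


def reserve_bytes(raw_len: int, est_slope: float,
                  relax: float = 0.10, constant_pad: int = 0) -> int:
    """Physical bytes to reserve for a partition of raw_len raw bytes.

    est_slope is the estimated compressed-to-raw ratio; relax inflates it
    so that later updates that compress less well may still fit in place.
    constant_pad models a fixed amount of free space left with the partition.
    """
    relaxed_slope = est_slope * (1.0 + relax)
    return math.ceil(raw_len * relaxed_slope) + constant_pad
```

For instance, a 64 KB partition with an estimated slope of 0.5 and a relaxation of 25 percent would reserve 40960 bytes rather than 32768, leaving 8192 bytes of free space for later updates.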
As data is written, compressed partitions are initially placed into extents for various scenarios, including sequential, quasi-sequential, and random. The goal for data placement is to maintain a readable linear interpolation. For a sequential write, knots, as described above, are used to indicate a change in the interpolation slope or error-window, or to reflect a change in the target sub-extent. Similarly, for a quasi-sequential write, a redirection record is placed on disk and written to a location on a different extent than the one used in current sub-segment mapping. The redirection record does not require additional memory metadata, but will require an additional read access. For a random write, an entire sub-segment is written to a new location.
The mapping between the segments and the extents, e.g. between the raw data and the compressed data, is a non-complex representation of the mapping that provides locality in compressibility of data. The slope of the linear representation provides reliability for predicting the compression ratio. A compression ratio between an extent and the associated segment may be predicted with the slope of the function. In one embodiment, the slope enables placement of a non-sequential compressed partition. Similarly, in one embodiment, a sequence of disk sectors may be copied from one location to another location to maintain the linear mapping in response to a sub-set of partitions with a significantly different compression ratio than the others in the segment.
The system described above in
Indeed, executable code could be a single instruction, or many instructions, and may even be distributed over several different code segments, among different applications, and across several memory devices. Similarly, operational data may be identified and illustrated herein within the tool, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, as electronic signals on a system or network.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided, such as examples of agents, to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated. Accordingly, representing data compression as a linear mapping reduces the metadata in a compressed storage system.
It will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without departing from the spirit and scope of the invention. In particular, each segment may be dynamically assigned a limited number of extents into which the segment's data may be stored. An extent may be owned by one segment, or a quantity of segments may use different parts of the same extent. Similarly, in one embodiment, the slope of the map function may represent an average compression ratio of a plurality of segments. Accordingly, the scope of protection of this invention is limited only by the following claims and their equivalents.