The presently disclosed subject matter relates to the field of storage space allocation for objects of a file system.
A filesystem is a means for managing logical objects and organizing data that is stored in a storage device, as a collection of logical objects, such as files, directories, hard links, soft links, access control lists (ACLs) and the like. The filesystem may be part of an operating system or an add-on program capable of managing the organization of logical objects on storage media and allocating respective storage space. In order to present the data as a collection of logical objects, the filesystem maintains structures of metadata. The term “metadata” as used herein in the context of a filesystem should be expansively construed to cover any kind of descriptive data related to the logical objects that does not constitute a part of the logical object's content. The descriptive data may include information that describes volumes, files, directories, or any other logical objects. For example, the following descriptive data describe a file and are considered as part of the file's metadata: a file name, file size, creation time, last access/write time and block pointers that point to the actual data of the file on a storage device.
The filesystem is further responsible for allocating storage space required to store file data and for keeping track of which blocks of the storage device belong to which file and which blocks are not being used. File systems allocate storage space at a granularity of physical blocks that compose the underlying storage device. A physical block is the smallest unit writable by a disk. A file system block (the basic allocation quantum used by the filesystem) is the same size as the physical block or larger (in integer multiples of the physical block size).
Filesystem allocation schemes determine the size of additional storage space to be allocated for new data of a file, so as to satisfy the size required to store the new data. Fixed sized allocation units (blocks) are used, such that in each allocation request a block or multiple blocks are allocated.
The filesystem is associated with a volume that has been initialized for hosting the filesystem. The volume is a collection of blocks on one or more storage devices (e.g. disks). The volume may be all of the blocks on a single storage device, the blocks of a partition, which is a portion of the storage device, or it may even span over multiple storage devices.
The files' metadata is generally stored in a dedicated area of the same volume that stores files and directories of the filesystem.
As mentioned above, a filesystem stores for each file, as part of the file's metadata, references to data blocks that point to the file's data on the volume. Space allocation and reference to allocated space in the file's metadata are implemented by using one of the following techniques:
(i) Block based allocation—uses fixed size blocks for storing and pointing to file data; and
(ii) Extent based allocation—stores the data in variable length extents. An extent includes a range of blocks, expressed by a reference to a starting block and a length that indicates the number of successive blocks following the starting block.
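The difference between the two techniques can be illustrated with a minimal Python sketch. The names `Extent` and `blocks_to_extents` are hypothetical and introduced only for illustration; the sketch collapses a per-block pointer list into (starting block, length) pairs and is not taken from any particular filesystem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A range of successive blocks: a starting block and a block count."""
    start_block: int
    length: int


def blocks_to_extents(blocks):
    """Collapse a block-pointer list (in file order) into extents.

    A block that directly follows the end of the previous extent extends
    that extent; any other block starts a new extent.
    """
    extents = []
    for block in blocks:
        if extents and extents[-1].start_block + extents[-1].length == block:
            last = extents[-1]
            extents[-1] = Extent(last.start_block, last.length + 1)
        else:
            extents.append(Extent(block, 1))
    return extents


# Six block pointers are described by three extents.
print(blocks_to_extents([10, 11, 12, 40, 41, 7]))
```

The more contiguous the file's blocks are, the fewer extents are needed to describe it, which is why extent descriptors tend to be more compact than per-block pointers.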
An inode (index node) is a structure that contains metadata of one file, including a mapping of the file's data, expressed by either block pointers or extent pointers.
When using the block based allocation scheme, the inode contains, among other metadata parameters, a list of block references (pointers), one block reference for each of the blocks of the file, which are used to store the data of the file. Generally, only a limited number of block references are directly stored in the inode, which therefore limits the amount of data the file can contain.
When an object, particularly a file, is created in the system, an inode is allocated for holding the file metadata including the block pointers. Usually, an inode must fit into a single block, imposing an apparent upper limit on file size. Consider a system with 512B blocks (this block size applies to both data blocks and metadata blocks). If each block pointer within the inode is 4B large, and each inode consists solely of block pointers, then a file can be no larger than (512B/4B)*512B=65536B=64K. Hence, modern UNIX systems use a hierarchical inode structure, where the inode contains pointers to data blocks and blocks of pointers (the so-called indirect blocks). On Linux, for example, the first 12 pointers of the inode directly point to data blocks. This works just fine for small files. If more space is needed, then pointer 13 points to a block that contains references to more data blocks (the indirect block). If even more space is needed, then pointer 14 points to a block that contains pointers to indirect blocks (the doubly-indirect block). If even more space is needed, then pointer 15 points to a block that contains pointers to doubly-indirect blocks (the triply-indirect block). Using this scheme, small files (files that fit into 12 or fewer blocks) use only one block (the inode block) for indexing, but large files can be accommodated as well. For example, with 512B blocks and 4B per block pointer, the direct blocks cover 12*512B=6144B, the indirect block covers (512/4)*512B=64K, the doubly-indirect block covers (512/4)*64K=8 MB and, last, the triply-indirect block covers (512/4)*8 MB=1 GB.
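The arithmetic above can be checked with a short Python sketch. The function name `inode_capacity` and its parameter names are hypothetical; the defaults (512B blocks, 4B pointers, 12 direct pointers) are the values used in the example.

```python
def inode_capacity(block_size=512, pointer_size=4, direct_pointers=12):
    """Return the bytes addressable through each level of a hierarchical inode."""
    pointers_per_block = block_size // pointer_size   # 128 for 512B / 4B
    direct = direct_pointers * block_size             # data reached directly
    single = pointers_per_block * block_size          # via the indirect block
    double = pointers_per_block * single              # via the doubly-indirect block
    triple = pointers_per_block * double              # via the triply-indirect block
    return {
        "direct": direct,
        "single": single,
        "double": double,
        "triple": triple,
        "total": direct + single + double + triple,
    }


capacity = inode_capacity()
for level, size in capacity.items():
    print(f"{level:>6}: {size} bytes")
```

With the default values, the levels come to 6144B, 64K, 8 MB and 1 GB respectively, so the triply-indirect level dominates the total addressable size.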
Block based allocation is simple and easy to implement. The drawback is the need to read more than one block of metadata in order to access file data that is indirectly referenced. Reading/writing multiple indirect blocks or extent tree nodes, in addition to the file's data, upon read/write requests slows down access. Examples of filesystems that use block allocations include UFS, Ext2/3, ZFS, FAT and more.
Extent based allocation uses more compact descriptors and requires fewer levels of indirection. Because of the fact that extents have variable lengths, the extents are usually stored in some kind of a B-tree, which adds some complexity. Examples of filesystems that use extent based allocation include NTFS, XFS, Ext4, VXFS and more.
By way of non-limiting example, allocating the volumes can be provided using a technique of thick provisioning or a technique of thin provisioning. Thick volume provisioning is a traditional volume provisioning technique that allocates all the physical blocks up front. Thin volume provisioning is a technique using virtualization technology to give the appearance of more physical storage space than is actually allocated. The space allocated to the thin volume, upon volume creation, is a virtual space rather than a physical storage space. Ranges of the physical storage space are allocated only upon writing actual data. Mapping techniques are used for mapping ranges of virtual address space into ranges of allocated physical storage space.
According to certain aspects of the presently disclosed subject matter there is provided a method of allocating space for logical objects of a filesystem, utilizing a processor, operatively coupled to one or more physical storage devices constituting a physical storage space. The method includes: (a) responsive to an allocation requirement related to a logical object in the filesystem, allocating, by the processor, in a virtual address space corresponding to the filesystem, a virtual allocation unit, comprising a range of contiguous virtual block addresses; wherein a size of the virtual allocation unit is determined in accordance with a current physical size of the logical object; and wherein the size of the virtual allocation unit is substantially larger than a size required with respect to the allocation requirement; and (b) responsive to subsequent write requests, related to the logical object, enabling allocating, per each of the write requests, a physical block address range in the physical storage space and enabling associating the physical block address range with a respective portion of the virtual allocation unit.
In accordance with further aspects and, optionally, in combination with other aspects of the presently disclosed subject matter, the method can further include assigning to the filesystem the virtual address space and a maximum physical space size available for use by the filesystem in the physical storage space, wherein a size of the virtual address space is substantially larger than the maximum physical space size.
In accordance with certain aspects of the presently disclosed subject matter, the virtual address space can be associated with a logical volume assigned for the filesystem.
The method can further include associating the virtual allocation unit with an offset within an address range of the logical object.
In accordance with further aspects of the presently disclosed subject matter, the method can further include determining the size of the virtual allocation unit by comparing the current physical size of the logical object to multiple size thresholds and selecting the size of the virtual allocation unit from multiple allocation unit sizes respectively associated with the multiple size thresholds.
In accordance with certain aspects of the presently disclosed subject matter, the values of the multiple allocation unit sizes can respectively depend on the multiple size thresholds and represent a growth sequence.
In accordance with further aspects of the presently disclosed subject matter, the method can further include, upon initialization of the filesystem, logically dividing the virtual address space into multiple allocation zones, respectively associated with multiple allocation unit sizes; wherein each allocation zone includes a plurality of virtual allocation units of equal size, the equal size being one of the multiple allocation unit sizes.
In accordance with further aspects of the presently disclosed subject matter, the step of allocating a virtual allocation unit can include selecting a specific allocation zone from the multiple allocation zones, in accordance with the current physical size of the logical object and allocating the virtual allocation unit from the plurality of virtual allocation units of the specific allocation zone.
According to other aspects of the presently disclosed subject matter there is provided a system for managing logical objects. The system includes a processor operatively coupled to a memory accessible by the processor, wherein the system is operatively coupled to at least one storage device constituting a physical storage space, wherein the memory is configured to handle a virtual address space that includes virtual block addresses, and wherein the processor is configured to: (i) responsive to an allocation requirement, related to a logical object in a filesystem, allocate, in a virtual address space corresponding to the filesystem, a virtual allocation unit that includes a range of contiguous virtual block addresses; wherein a size of the virtual allocation unit is determined in accordance with a current physical size of the logical object; and wherein the size of the virtual allocation unit is substantially larger than a size required with respect to the allocation requirement; and (ii) responsive to subsequent write requests, related to the logical object, enable allocation, per each of the write requests, of a physical block address range in the physical storage space and enable association of the physical block address range with a respective portion of the virtual allocation unit.
According to other aspects of the presently disclosed subject matter there is provided a storage system for managing logical objects. The storage system comprises an object management system and a block management system, wherein the storage system is coupled to at least one storage device constituting a physical storage space; wherein, responsive to an allocation requirement related to a logical object of a filesystem, the object management system is configured to allocate, in a virtual address space corresponding to the filesystem, a virtual allocation unit, comprising a range of contiguous virtual block addresses; wherein a size of the virtual allocation unit is determined in accordance with a current physical size of the logical object; and wherein the size of the virtual allocation unit is substantially larger than a size required with respect to the allocation requirement; and wherein, responsive to subsequent write requests related to the logical object, the block management system is configured to allocate, per each of the subsequent write requests, a physical block address range in the physical storage space and associate the physical block address range with a respective portion of the virtual allocation unit.
Among advantages of certain embodiments of the presently disclosed subject matter is reducing the fragmentation of a virtual address space allocated to a filesystem, so as to reduce the number of entries in a mapping data structure associated with the virtual address space. Among further advantages of certain embodiments of the presently disclosed subject matter is reducing the number of blocks/extents of a file to a small set of block extents, even for very large files, so that the whole block mapping may fit in the metadata entry of the file, thus speeding up I/O and access to metadata by eliminating access to indirect extent blocks for large files.
In order to understand the presently disclosed subject matter and to see how it may be carried out in practice, the subject matter will now be described, by way of non-limiting examples only, with reference to the accompanying drawings, in which:
FIGS. 1a and 1b illustrate a functional block diagram of a system capable of managing logical objects in accordance with certain embodiments of the currently presented subject matter;
FIG. 1c illustrates a logical functional diagram of a system capable of managing logical objects in accordance with certain embodiments of the currently presented subject matter;
FIGS. 4a-4c illustrate virtual and physical allocation for a file, in accordance with an embodiment of the presently disclosed subject matter;
a are flowcharts illustrating a method for allocating space, in accordance with an embodiment of the presently disclosed subject matter; and
In the drawings and descriptions set forth, identical reference numerals indicate those components that are common to different embodiments or configurations.
Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as “allocating”, “determining”, “enabling”, “assigning”, “associating”, “dividing”, “selecting” or the like, refer to the action and/or processes of a computer that manipulate and/or transform data into other data, said data represented as physical quantities, e.g. such as electronic quantities, and/or said data representing the physical objects. The term “computer” as used herein should be expansively construed to cover any kind of electronic device with data processing capabilities.
As used herein, the phrase “for example,” “such as”, “for instance” and variants thereof describe non-limiting embodiments of the presently disclosed subject matter. Reference in the specification to “one case”, “some cases”, “other cases” or variants thereof means that a particular feature, structure or characteristic described in connection with the embodiment(s) is included in at least one embodiment of the presently disclosed subject matter. Thus the appearance of the phrase “one case”, “some cases”, “other cases” or variants thereof does not necessarily refer to the same embodiment(s).
It is appreciated that certain features of the presently disclosed subject matter, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the presently disclosed subject matter, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination.
FIGS. 1a and 1b illustrate a schematic block diagram of an object management system 100 for managing at least one filesystem and the logical objects thereof and more particularly, for managing memory allocation for the logical objects, according to embodiments of the presently disclosed subject matter.
Object management system 100 implements one or more filesystems (e.g. NFS, CIFS and the like) or an OSD (Object Storage Device) interface and enables external applications or hosts, such as hosts 101-1 to 101-n, to access objects, e.g. files, that are stored in storage devices 104-1 to 104-n. Hosts 101-1 to 101-n interface with object management system 100, using a client side filesystem application or any other file access interface.
Object management system 100 is responsible for managing the objects' metadata of one or more filesystems. Each filesystem, supported by object management system 100, utilizes a metadata table, such as an inode table. The inode table stores for each object (e.g. a file) an inode (a metadata record) including all the metadata of a file, and particularly: pointers to allocation units or extents of allocation units that hold the entire object's data. Object management system 100 is further configured to allocate allocation units for storing files' data, upon demand, according to embodiments of the presently disclosed subject matter.
Object management system 100 may include or be otherwise associated with at least one processing unit, such as object control processor 121, configured to control and execute commands, such as filesystem commands which are issued by other applications or hosts 101-1 to 101-n and more specifically commands that are related to extent allocation for a file. Such commands may include for example: a write request that causes augmentation of a file size or an explicit command to increase a size of a file, e.g. SetAttributes command of NFS (Network FileSystem). Object control processor 121 is further configured to operate as further detailed with reference to
FIG. 1a illustrates object management system 100 that is operatively coupled to a block management system 120 that includes a block control layer 103 and one or more storage devices 104-1 to 104-n. Object management system 100 benefits from block access services provided by block control layer 103. By way of non-limiting example, block control layer 103 can enable thin volume provisioning or other allocation techniques for implementing extent allocation according to embodiments of the presently disclosed subject matter.
Block control layer 103 is coupled to a plurality of data storage devices 104-1 to 104-n constituting a physical storage space. Block control layer 103 includes one or more processors that are operable to handle a virtual representation of the physical storage space and to facilitate mapping between the physical storage space and its virtual representation. In such cases, block control layer 103 can be configured to create and manage at least one virtualization layer interfacing between object management system 100 (or other external applications and hosts) and the physical storage space. The virtualization functions may be provided in hardware, software, firmware or any suitable combination thereof.
Object management system 100 interfaces with hosts 101 using an object representation. The object interface used to communicate with hosts 101 includes, for example: a filesystem (or volume) identifier, a file identifier (e.g. inode number, filename and path) and an offset within the file (e.g. a byte offset or block offset within the file). On the other side, object management system 100 interfaces with block control layer 103 using a block virtual representation. The interface between object management system 100 and block control layer 103 includes, for example: a volume identifier and a block offset within the volume.
The physical storage space may comprise any appropriate permanent storage medium and may include, by way of non-limiting example, one or more disk units (DUs), also called “disk enclosures”, including several disk drives (disks).
The physical storage space further includes a plurality of physical data blocks, each physical data block may be characterized by a pair (DDid, DBA) where DDid is a serial number associated with the disk drive accommodating the physical data block, and DBA is a block number within the respective disk.
The entire address space of the storage system is divided into logical volumes, and each logical volume becomes an addressable device. A logical volume (LV) or logical unit (LU) represents a plurality of data blocks characterized by successive Logical Block Addresses (LBA). Different logical volumes may comprise different numbers of data blocks, which are typically of equal size within a given system (e.g. 512 bytes).
A logical volume is used by object management system 100 for hosting a filesystem. The logical volume stores all the filesystem objects' data and further includes a dedicated area or file for storing the inode table of the filesystem.
FIG. 1b illustrates a storage system 150 that includes both object management system 100 and block management system 120. The content, capabilities and functions of object management system 100 and block management system 120 within storage system 150 are the same as described for
FIGS. 1a and 1b, described above, illustrate a general schematic diagram of the system architecture in accordance with an embodiment of the presently disclosed subject matter. Certain embodiments of the present invention are applicable to the architecture of a computer system described with reference to
FIG. 1c is a logical functional diagram of object management system 100 that illustrates the relation between filesystems, logical objects and virtual address space allocated for accommodating the logical objects of each filesystem. Object management system 100 includes one or more filesystems, such as filesystems 161, 162 and 163. Each filesystem is assigned a virtual address space, which may be part of or all of a global virtual address space 170. Global virtual address space 170 may be managed by another entity, such as block management system 120. The assignment of virtual address space to a filesystem is generally performed upon initialization of the filesystem. The virtual address space assigned to a filesystem may be a contiguous virtual address range within global virtual address space 170, such as virtual address space 171 and 172, or may be composed of more than one virtual address range, such as virtual address space 173. The virtual address space may be defined as a logical volume (LV) that is assigned for the filesystem, for example, virtual address space 171 is defined as a logical volume 181 that is assigned for filesystem 161. Alternatively, the virtual address space may span more than one logical volume, such as virtual address space 172 that spans logical volumes 182 and 183. The virtual address space may otherwise be defined as a sub-volume or a partition within a logical volume.
Each filesystem owns multiple logical objects, for example: filesystem 161 owns logical objects 191, 192 and 193 that are stored in virtual address space 171 (or logical volume 181), while filesystem 162 owns logical objects 194, 195 and 196 that are stored in virtual address space 172. The logical objects as referred to hereinafter are objects that require space allocation for storing data thereof. Such logical objects are typically files, but other objects may also require space allocation for data, for example, ACLs (access control list). In the following description, the term ‘file’ may be used as an example for a logical object that requires space allocation. It should be noted that the term file can be replaced with the term ‘logical object’, referring to an object that requires space allocation.
Object control processor 121 is configured to implement extent allocation so that the amount of additional virtual space allocated to new data of a file, upon each allocation request, depends on the current file size rather than the size indicated in the allocation request. As the file grows, the size of the virtual space allocated for new data, grows. Such allocation is referred to hereinafter also as progressive extent size allocation.
When a small file needs additional storage space, virtual allocation units of a basic size are allocated to fulfill the allocation request. For example: if the file size is smaller than e.g. 2 MB, virtual allocation units having a size of e.g. 64 KB are allocated, upon demand. When a file access operation (e.g. a write operation) triggers additional space allocation, the current physical size of the file is evaluated by comparing the current physical file size to multiple size thresholds. If the file size has crossed one of the size thresholds, a virtual allocation unit of a bigger size is allocated. For example: if the file has just crossed the 2 MB threshold, virtual allocation units of e.g. 1 MB will be allocated upon subsequent allocation requests, until the file size exceeds a higher size threshold (e.g. a 6 MB size threshold). At any stage, the size of the allocated units depends on the current size of the file. The growth of allocated unit sizes, upon each size threshold crossing, may be according to a growth function, such as an exponential growth, factor growth, linear growth or any other predetermined growth definition. Values of the multiple allocation unit sizes respectively depend on the multiple size thresholds and represent a growth sequence, which may be a progressive growth sequence (i.e. more rapid than a linear trend). For example: suppose there are 4 size thresholds: 1M, 2M, 6M and 64M. The sizes of allocation units allocated for files having a size below these thresholds may be chosen as, e.g.: 64K, 500K, 2M and 32M, respectively.
Certain embodiments of the presently disclosed subject matter utilize virtual allocation units of various sizes, such that the size variation among the different allocation units can be of many orders of magnitude. For example: the ratio between sizes of allocation units belonging to two consecutive classes can be a factor of e.g. 16 (example: a first size is 64 KB and a second size is 1 MB), so that the ratio between the sizes of allocation units of the first class and sizes of allocation units of the third class is 16×16=256, the ratio between the sizes of the first class and the fourth class is 16×16×16=4096, etc. Note that the ratio between sizes of allocation units of consecutive classes can be other than a factor of 16, and the factor can be smaller when dealing with smaller allocation units and can grow as the size of allocation units grows.
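The threshold comparison described above can be sketched in Python as follows. The sketch uses the four example thresholds and unit sizes from the text; the names `SIZE_THRESHOLDS`, `UNIT_SIZES` and `allocation_unit_size` are hypothetical, "K" and "M" are assumed to mean 1024 and 1024×1024 bytes, and the behavior for files at or above the highest threshold (reusing the largest unit size) is an assumption, since the text does not specify it.

```python
import bisect

KB = 1024
MB = 1024 * KB

# Example values from the text: files below each threshold get the
# allocation unit size at the same position in UNIT_SIZES.
SIZE_THRESHOLDS = [1 * MB, 2 * MB, 6 * MB, 64 * MB]
UNIT_SIZES = [64 * KB, 500 * KB, 2 * MB, 32 * MB]


def allocation_unit_size(current_physical_size):
    """Select the virtual allocation unit size from the current file size.

    The size in the allocation request itself is deliberately not an
    input: only the current physical size of the file matters.
    """
    # Number of thresholds the file size has already reached.
    index = bisect.bisect_right(SIZE_THRESHOLDS, current_physical_size)
    # Assumption: beyond the last threshold, keep using the largest unit.
    index = min(index, len(UNIT_SIZES) - 1)
    return UNIT_SIZES[index]


for size in (0, 1 * MB, 3 * MB, 10 * MB):
    print(size, "->", allocation_unit_size(size))
```

A table lookup of this kind keeps the selection independent of how the thresholds were derived, so any growth function (linear, factor-based, exponential) can be expressed by changing the two lists.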
The following allocation mechanism is adapted, so as to facilitate the allocation process of allocation units having a large size variance.
Referring back to
Logical volume 200, which implements the progressive extent size allocation, is preferably a thin provisioned volume and thus can benefit from the following features provided by thin volume provisioning: (i) logical volume 200 is mapped within a virtual address space, provided by one of the virtualization layers of the system, and can have a substantially large size, so that it can accommodate virtual allocation units of almost unlimited size. The size of logical volume 200 may be significantly larger than the physical size utilized or allowed for use by the filesystem associated with the volume; (ii) allocating physical storage ranges for actual data only upon demand (actual writing of data); and (iii) accessing high addresses of logical volume 200 for writing to allocation zones that reside all over volume 200, without needing to physically allocate the unused space between allocation zones or within allocation zones. Though volume 200 preferably utilizes thin volume provisioning, volume 200 may otherwise utilize any other volume provisioning.
Volume management employs data structures for mapping virtual address blocks (such as virtual allocation units within volume 200, as presented to object management system 100) into physical address blocks. An efficient implementation of a mapping data structure utilizes a sparse data structure that may be implemented using one mapping entry per each contiguous virtual address range. The mapping entry of the contiguous virtual address range also includes an associated physical address range, if allocated. If no physical address range is allocated for the corresponding virtual address range, then the mapping entry points to null, or otherwise, the entry does not exist. Thus, a highly fragmented logical volume requires a mapping data structure having a large number of entries, one per each contiguous virtual address range (fragment), which consumes a substantial amount of memory. It is therefore advantageous to reduce the fragmentation of the virtual volume so as to reduce the number of entries in the mapping data structure associated with the logical volume. Allocating a virtual allocation unit, as disclosed herein, enables a reservation of a contiguous virtual address range, for future use by the file related to the allocation. Physical address ranges are allocated only upon demand, i.e. upon writing real data, and are associated with virtual address ranges within the virtual allocation unit.
The division of logical volume 200 into virtual allocation zones may only be known to object management system 100, while block control layer 103 may not be aware of the organization of logical volume 200 or of the extent allocation disclosed herein. Allocation of a virtual allocation unit from one of the allocation zones is preferably managed by the object management system 100. Block control layer 103 allocates neither virtual nor physical space to accommodate the new allocation unit. Only when new data is actually written is a range of physical address space allocated in data storage devices 104-1 to 104-n and a portion of the virtual address space, included within the allocation unit, mapped into the range of physical address space. Note that allocating a substantial amount of virtual space provided by an allocation unit ensures that a sequential virtual space is preserved for future writing, so that fragmentation of the virtual address space is reduced.
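The effect of fragmentation on such a mapping structure can be illustrated with a minimal Python sketch. The class `SparseMap` and its methods are hypothetical; in particular, merging adjacent entries only when both their virtual and physical ranges are contiguous is an assumption of this sketch, not a requirement stated above.

```python
import bisect


class SparseMap:
    """Sparse virtual-to-physical map with one entry per contiguous range."""

    def __init__(self):
        # Sorted list of [virtual_start, length, physical_start].
        self.entries = []

    def map_range(self, virt_start, length, phys_start):
        """Record a newly allocated physical range (assumes no overlaps)."""
        i = bisect.bisect_left(self.entries, [virt_start])
        self.entries.insert(i, [virt_start, length, phys_start])
        self._merge_with_next(i)
        if i > 0:
            self._merge_with_next(i - 1)

    def _merge_with_next(self, i):
        """Merge entry i with entry i+1 when they are contiguous."""
        if i + 1 >= len(self.entries):
            return
        a, b = self.entries[i], self.entries[i + 1]
        if a[0] + a[1] == b[0] and a[2] + a[1] == b[2]:
            a[1] += b[1]
            del self.entries[i + 1]

    def lookup(self, virt_addr):
        """Return the physical address, or None if the address is unmapped."""
        for virt_start, length, phys_start in self.entries:
            if virt_start <= virt_addr < virt_start + length:
                return phys_start + (virt_addr - virt_start)
        return None


sequential = SparseMap()
scattered = SparseMap()
for k in range(3):
    sequential.map_range(k * 8, 8, 1000 + k * 8)   # contiguous writes
    scattered.map_range(k * 100, 8, 1000 + k * 8)  # fragmented writes
print(len(sequential.entries), len(scattered.entries))
```

In this sketch, three writes into a reserved contiguous virtual range collapse into a single entry, whereas the same amount of data scattered over the virtual space needs one entry per fragment.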
FIGS. 4a-4c demonstrate space allocation for a file. Referring to
FIG. 4b illustrates space allocation of a file, having two physical address ranges, 421, 422, each of 500 bytes, allocated for data of the file and mapped to two successive portions (411a, 411b) of allocation unit 411. The current physical size of the file is 1000 bytes (the sum of the sizes of physical address ranges 421, 422, as well as the size of all occupied portions in allocated virtual allocation units). The available virtual space for future writings is 500 bytes, provided by portion 411c.
FIG. 4c illustrates a similar space allocation of a file; however, the virtual address space allocated for the file is non-contiguous, as non-contiguous portions 411a and 411c of allocation unit 411 are mapped into physical address ranges, while the middle portion 411b is not mapped. This scenario may be a result of punching a hole in the file (at offset 500 from the start of the file to offset 999). The punching frees the physical address range that corresponds to portion 411b. According to an alternative scenario, the hole may be a result of writing data in a non-sequential manner, for example: at the time the file had a capacity of 500 bytes, occupying offsets 0-499, a write request was issued for writing 500 bytes at an offset 1000 from the start of the file. The non-sequential write request caused the allocation of only 500 bytes in the physical storage space, leaving a hole in the virtual allocation unit (i.e. an unmapped portion).
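The two scenarios can be traced with a small Python sketch. The class `AllocationUnit` is hypothetical, the 1500-byte unit size is inferred from the 1000 mapped bytes plus 500 available bytes mentioned above, and the simple counter used to hand out physical addresses is an assumption made only to keep the example self-contained.

```python
class AllocationUnit:
    """A virtual allocation unit whose portions are mapped only when written."""

    def __init__(self, size):
        self.size = size
        self.mapped = {}       # offset in unit -> (length, physical start)
        self._next_phys = 0    # hypothetical physical allocator

    def write(self, offset, length):
        """Allocate physical space for a write (assumes no overlap)."""
        if offset < 0 or offset + length > self.size:
            raise ValueError("write exceeds the virtual allocation unit")
        self.mapped[offset] = (length, self._next_phys)
        self._next_phys += length

    def punch_hole(self, offset, length):
        """Free a mapped portion (assumes the hole matches it exactly)."""
        if self.mapped.get(offset, (None,))[0] != length:
            raise ValueError("hole must match a mapped portion")
        del self.mapped[offset]

    def physical_size(self):
        return sum(length for length, _ in self.mapped.values())

    def unmapped_ranges(self):
        """Return (offset, length) pairs of portions without physical space."""
        gaps, cursor = [], 0
        for offset in sorted(self.mapped):
            if offset > cursor:
                gaps.append((cursor, offset - cursor))
            cursor = offset + self.mapped[offset][0]
        if cursor < self.size:
            gaps.append((cursor, self.size - cursor))
        return gaps


# Sequential writes: two 500-byte ranges mapped, 500 bytes left unmapped.
sequential = AllocationUnit(1500)
sequential.write(0, 500)
sequential.write(500, 500)

# Non-sequential write: offsets 0-499, then 500 bytes at offset 1000.
sparse = AllocationUnit(1500)
sparse.write(0, 500)
sparse.write(1000, 500)
print(sequential.unmapped_ranges(), sparse.unmapped_ranges())
```

In both cases the physical size is 1000 bytes; the only difference is where the unmapped 500 bytes lie within the virtual allocation unit.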
Step 510 is executed upon initialization of the filesystem and includes assigning a virtual address space for the filesystem and logically dividing the virtual address space into multiple allocation zones, respectively associated with multiple allocation unit sizes. Each allocation zone includes a plurality of virtual allocation units of equal size, the equal size being one of the multiple allocation unit sizes. That is, there are n allocation zones and n allocation unit sizes, S1 to Sn, wherein a first allocation zone includes a certain number, X1, of virtual allocation units, each having a size of S1, a second allocation zone includes X2 virtual allocation units, each having a size of S2, and an nth allocation zone includes Xn virtual allocation units, each having a size of Sn. The virtual address space assigned for the filesystem may be a logical volume, multiple logical volumes, part of a logical volume or a portion of a virtual address layer used by object management system 100. Step 510 further includes assigning, for the filesystem, a maximum physical space size that defines the total amount of physical space available/allowed for use by the filesystem in the physical storage space. The size of the virtual address space is substantially larger than the maximum physical space size: it is at least ten times larger and can be larger by orders of magnitude.
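The zone layout of step 510 can be sketched as follows. This is a minimal illustration under stated assumptions: the zones are laid out back to back from virtual address 0, and the names `AllocationZone`, `build_zones` and `is_overprovisioned`, as well as the example sizes and counts, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

KIB, MIB = 1024, 1024 * 1024


@dataclass(frozen=True)
class AllocationZone:
    start: int       # first virtual address of the zone
    unit_size: int   # S_i, size of every allocation unit in the zone
    unit_count: int  # X_i, number of allocation units in the zone

    @property
    def size(self) -> int:
        return self.unit_size * self.unit_count

    def unit_start(self, index: int) -> int:
        # Start virtual address of the index-th allocation unit.
        if not 0 <= index < self.unit_count:
            raise IndexError("no such allocation unit in this zone")
        return self.start + index * self.unit_size


def build_zones(layout: List[Tuple[int, int]]) -> List[AllocationZone]:
    """Divide a virtual address space into zones from (S_i, X_i) pairs."""
    zones, cursor = [], 0
    for unit_size, unit_count in layout:
        zones.append(AllocationZone(cursor, unit_size, unit_count))
        cursor += unit_size * unit_count
    return zones


def is_overprovisioned(zones: List[AllocationZone],
                       max_physical: int, factor: int = 10) -> bool:
    # The virtual space should be at least `factor` times the physical cap.
    return sum(z.size for z in zones) >= factor * max_physical


# Example sizes only: 64 KiB, 1 MiB and 16 MiB allocation units.
zones = build_zones([(64 * KIB, 1024), (1 * MIB, 1024), (16 * MIB, 256)])
```

With these example values the virtual space totals 5184 MiB, so a maximum physical space size of 256 MiB satisfies the ten-times condition while 1 GiB does not.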
Step 510 is followed by a step 520 of receiving a command that involves an allocation requirement for allocating space to a logical object, e.g. a file. The command may be an explicit request for allocating space or for increasing the size of the file, for example, the NFS command SetAttributes that includes a size attribute with a value that is bigger than the current size of the file. The command may otherwise include an implicit requirement for space allocation, for example, a write request that involves writing beyond the virtual space currently allocated for the file. Following are examples of an allocation requirement and a required size: referring back to
Step 520 is followed by a step 525 of checking whether a current allocation unit used by the logical object can accommodate the additional space imposed by the command. If so, step 525 is followed by step 540. If the current allocation unit is full or cannot provide the entire space required, step 525 is followed by step 530.
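The branch taken at step 525 can be expressed as a short check. The sketch below assumes that the current allocation unit is described by its offset within the file and its size, and that a request is described by a file offset and a length; the function name `next_step` is hypothetical.

```python
from typing import Optional, Tuple


def next_step(current_unit: Optional[Tuple[int, int]],
              offset: int, length: int) -> int:
    """Return 540 if the current unit can hold the request, else 530.

    `current_unit` is (file_offset, size) of the allocation unit in use,
    or None if the logical object has no allocation unit yet.
    """
    if current_unit is not None:
        unit_offset, unit_size = current_unit
        fits = (unit_offset <= offset
                and offset + length <= unit_offset + unit_size)
        if fits:
            return 540  # serve the write from the existing unit
    return 530          # a new virtual allocation unit is needed
```

For a 1500-byte unit at file offset 0, a 500-byte request at offset 1000 stays within the unit, while a 501-byte request at the same offset does not.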
Step 530 includes allocating, in the virtual address space corresponding to the filesystem, at least one virtual allocation unit that includes a range of contiguous virtual block addresses. Step 530 includes determining a size for allocation and selecting a virtual allocation unit having the determined size. The size is determined in accordance with the current physical size of the file, regardless of the size required with respect to the allocation requirement, which may be explicitly specified in the request or implied by it. The size of the selected virtual allocation unit is substantially larger than the size required. Suppose, for example, that the size required is 100 bytes and the file size is 2 MB. The size of allocation units allocated for files of such size is 1 MB, regardless of the size required (100 bytes). Prior art allocation schemes may allocate the exact size required or may round up the allocation size to a block boundary. For example, if the size required for allocation is 600 bytes and the block size used by the filesystem or by the underlying storage device is 512 bytes, the selected allocation size is 1024 bytes instead of 600. According to the presently disclosed subject matter, the size allocated to satisfy the request goes beyond rounding up the required size to a block boundary: the size allocated can be larger than the size required by a factor or even by orders of magnitude, and is at least twice the size requested.
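The arithmetic in the two examples above can be checked with a short sketch. The function names are hypothetical, and 1 MB is taken here as 1024 × 1024 bytes, which is an assumption about the units used in the text.

```python
def round_up_to_block(required: int, block_size: int = 512) -> int:
    """Prior-art style sizing: round the required size up to a block boundary."""
    return -(-required // block_size) * block_size  # ceiling division


def meets_disclosed_margin(allocated: int, required: int) -> bool:
    """The disclosed scheme allocates at least twice the size requested."""
    return allocated >= 2 * required


MIB = 1024 * 1024

# 600 bytes with 512-byte blocks rounds up to 1024 bytes.
rounded = round_up_to_block(600)

# A 1 MB allocation unit for a 100-byte requirement is larger
# by roughly four orders of magnitude.
ratio = MIB // 100
```

Rounding 600 bytes up to 1024 bytes falls short of the "at least twice" margin, whereas a 1 MB unit for a 100-byte requirement exceeds it by a wide factor.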
Step 530 may include selecting a specific allocation zone from the multiple allocation zones, based on the current physical size of the logical object. The current physical size of the file may be compared to multiple size thresholds, and the size of the virtual allocation unit is selected from multiple allocation unit sizes, respectively associated with the multiple size thresholds. An allocation zone that corresponds to the selected size of the virtual allocation unit is then selected. Step 530 may include allocating one or more allocation units from the plurality of virtual allocation units of the specific allocation zone. The default number of allocation units is one. For example, suppose that the selected allocation size is 2 MB. The allocation zone that best fits this allocation size is allocation zone 201(2), which includes 1 MB allocation units. Accordingly, the number of allocation units is selected as two.
The size of the allocation unit and/or the total allocation size (i.e. “allocation unit size” times “number of allocation units”) is proportional to the file size, i.e. the bigger the file is, the bigger is the allocation unit size (or total allocation size).
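The threshold-based selection can be sketched as follows. The threshold values and allocation sizes are hypothetical, chosen only so that a 2 MB file receives 1 MB allocation units as in the example above. "Best fit" is interpreted here as the largest unit size that does not exceed the allocation size, which is an assumption rather than a rule stated in the text.

```python
import bisect
from typing import List, Tuple

KIB, MIB = 1024, 1024 * 1024

# Hypothetical file-size thresholds and the allocation size used
# below the first, between the two, and above the second threshold.
THRESHOLDS = [1 * MIB, 64 * MIB]
ALLOCATION_SIZES = [64 * KIB, 1 * MIB, 16 * MIB]


def allocation_size_for(file_size: int) -> int:
    """Pick the allocation size from the current physical file size."""
    return ALLOCATION_SIZES[bisect.bisect_right(THRESHOLDS, file_size)]


def pick_units(allocation_size: int,
               zone_unit_sizes: List[int]) -> Tuple[int, int]:
    """Return (unit size, number of units) covering `allocation_size`."""
    fitting = [s for s in zone_unit_sizes if s <= allocation_size]
    unit = max(fitting) if fitting else min(zone_unit_sizes)
    count = -(-allocation_size // unit)  # ceiling division
    return unit, count


zone_unit_sizes = [64 * KIB, 1 * MIB, 16 * MIB]
```

With these values, `allocation_size_for(2 * MIB)` yields 1 MiB, and `pick_units(2 * MIB, zone_unit_sizes)` yields two units of 1 MiB, matching the two-unit example.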
Note that the allocation is a logical allocation that ensures reservation of contiguous space from the virtual address space of the logical volume.
Step 530 is followed by step 535 of associating a start virtual address of the virtual allocation unit with an offset within an address range of the logical object (the file's logical block number, LBN). Step 535 further includes storing the association information in a mapping metadata structure of the file, e.g. in an inode or a B-tree that is used for mapping file blocks into LBAs within the volume. The start virtual address of the virtual allocation unit serves as the LBA. Note that since the allocation unit maps virtual addresses, within the range of the allocation unit, into multiple physical address ranges (each range being allocated as a result of a different write request), one entry in the mapping metadata structure aggregates the multiple physical address ranges represented by the virtual allocation unit.
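The association stored in step 535 can be sketched with a sorted list standing in for the inode or B-tree. The class `ExtentMap` and its methods are hypothetical, and offsets are in bytes rather than blocks for simplicity.

```python
import bisect
from typing import List, Optional, Tuple


class ExtentMap:
    """Toy per-file mapping: file offset -> start virtual address of a unit.

    One entry is stored per virtual allocation unit, regardless of how
    many physical ranges are later mapped inside that unit.
    """

    def __init__(self) -> None:
        self._offsets: List[int] = []              # sorted file offsets
        self._entries: List[Tuple[int, int]] = []  # (start address, unit size)

    def add(self, file_offset: int, start_va: int, unit_size: int) -> None:
        i = bisect.bisect_left(self._offsets, file_offset)
        self._offsets.insert(i, file_offset)
        self._entries.insert(i, (start_va, unit_size))

    def to_virtual(self, file_offset: int) -> Optional[int]:
        # Find the last entry that starts at or before the offset.
        i = bisect.bisect_right(self._offsets, file_offset) - 1
        if i < 0:
            return None
        start_va, unit_size = self._entries[i]
        delta = file_offset - self._offsets[i]
        return start_va + delta if delta < unit_size else None

    def __len__(self) -> int:
        return len(self._entries)


extent_map = ExtentMap()
extent_map.add(file_offset=0, start_va=0x100000, unit_size=1500)
```

A lookup of file offset 1200 resolves to virtual address `0x100000 + 1200`, while offset 1500 lies beyond the only unit and returns `None` until a second unit is added.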
Step 540 is executed upon receiving subsequent write requests, related to the logical object and indicative of a write size, i.e. the size of the data to be written. A write request may involve or include the allocation requirement that is handled in step 520, in case the write request involves writing beyond the virtual space currently allocated for the file, i.e. a previously allocated virtual allocation unit is full or cannot satisfy the write request. Alternatively, the write request is separate from the allocation requirement. Step 540 includes enabling allocating, per each of the subsequent write requests, a physical block address range in the physical storage space and enabling associating the physical block address range with a respective portion of the virtual allocation unit. The size of the portion corresponds to the write size, e.g. the size of the portion may be the same as the write size or may be rounded up to the next block boundary. The allocation of the physical block addresses and their association with the portion of the virtual allocation unit may be performed by object management system 100 or may be performed by an underlying storage system, such as block control layer 103.
In case the association of the portion and the physical block addresses is performed by another entity, such as block control layer 103, the enabling of the association (performed by object management system 100) includes providing at least the start virtual address of the portion and the write size. Block control layer 103 is configured to handle a mapping data structure for associating virtual addresses and physical addresses.
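The hand-off between the two layers during step 540 can be sketched as follows. `BlockLayer` is a toy stand-in for a block control layer, not a description of any real interface: it uses a simple bump allocator for physical space and rounds each portion up to a 512-byte block boundary, both of which are assumptions.

```python
from typing import Dict, Tuple


class BlockLayer:
    """Toy block control layer that maps virtual to physical on write."""

    def __init__(self, block_size: int = 512) -> None:
        self.block_size = block_size
        self.next_physical = 0
        # start virtual address -> (physical start, length)
        self.mapping: Dict[int, Tuple[int, int]] = {}

    def map_on_write(self, start_va: int, write_size: int) -> Tuple[int, int]:
        # Round the portion up to the next block boundary.
        length = -(-write_size // self.block_size) * self.block_size
        physical = self.next_physical
        self.next_physical += length
        self.mapping[start_va] = (physical, length)
        return physical, length


def handle_write(layer: BlockLayer, unit_start_va: int,
                 offset_in_unit: int, write_size: int) -> Tuple[int, int]:
    """Object-management side: provide the portion's start address and size."""
    return layer.map_on_write(unit_start_va + offset_in_unit, write_size)


layer = BlockLayer()
first = handle_write(layer, unit_start_va=4096, offset_in_unit=0, write_size=500)
second = handle_write(layer, unit_start_va=4096, offset_in_unit=512, write_size=100)
```

Both writes land in the same virtual allocation unit, yet each receives its own physical range, and no physical space is consumed for the untouched remainder of the unit.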
Note that multiple subsequent write requests (including the write request that triggered the allocation requirement), related to the file, can be served by the same virtual allocation unit that was allocated in step 530, such that for each of the write requests, the physical block address range allocated in the physical storage space is associated with a respective portion of the virtual allocation unit. Each physical block address range is associated with a different portion of the virtual allocation unit, as illustrated in
FIG. 5a illustrates a method 500′ that is performed by storage system 150. Steps 510-535 of method 500′ are identical to the respective steps in method 500 of
Since the reference to the volume location is the reference to the virtual allocation unit, and the allocation unit is substantially large for large files, the number of extent entries is reduced: an entry is created only upon allocation of a virtual allocation unit, which in turn serves multiple write requests.
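The reduction in extent entries can be illustrated with a rough count. The comparison assumes a worst case in which a conventional scheme records one extent per write request (i.e. no adjacent extents are merged); the function name and the example sizes are hypothetical.

```python
from typing import Tuple

KIB, MIB = 1024, 1024 * 1024


def extent_entries(total_written: int, write_size: int,
                   unit_size: int) -> Tuple[int, int]:
    """Return (entries with one extent per write, entries with one per unit)."""
    per_write = -(-total_written // write_size)  # ceiling division
    per_unit = -(-total_written // unit_size)
    return per_write, per_unit


# 16 MiB written in 4 KiB requests, with 1 MiB virtual allocation units.
per_write, per_unit = extent_entries(16 * MIB, 4 * KIB, 1 * MIB)
```

Under these assumptions the per-write scheme needs 4096 entries while the per-unit scheme needs 16.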
The presently disclosed subject matter further contemplates a machine-readable storage device tangibly embodying a program of instructions executable by the machine for executing the method of the presently disclosed subject matter.
It is to be understood that the presently disclosed subject matter is not limited in its application to the details set forth in the description contained herein or illustrated in the drawings. The presently disclosed subject matter is capable of other embodiments and of being practiced and carried out in various ways. Hence, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. As such, those skilled in the art will appreciate that the conception upon which this disclosure is based may readily be utilized as a basis for designing other structures, methods, and systems for carrying out the several purposes of the presently disclosed subject matter.
It will also be understood that the system according to the presently disclosed subject matter may be a suitably programmed computer. Likewise, the presently disclosed subject matter contemplates a computer program being readable by a computer for executing the method of the presently disclosed subject matter. The presently disclosed subject matter further contemplates a machine-readable memory tangibly embodying a program of instructions executable by the machine for executing the method of the presently disclosed subject matter.