Computing devices generate, use, and store data. The data may be, for example, images, documents, webpages, or meta-data associated with any of the files. The data may be stored locally on a persistent storage of a computing device and/or may be stored remotely on a persistent storage of another computing device.
In one aspect, a data storage device in accordance with one or more embodiments of the invention includes virtual storage devices, hosted by physical storage devices, and a processor. The processor obtains a data storage request, divides a file specified by the data storage request into blocks, and stores the blocks in the virtual storage devices based on an input output (IO) limitation of the virtual storage devices.
In one aspect, a method of operating a data storage device in accordance with one or more embodiments of the invention includes obtaining, by the data storage device, a data storage request. The method includes dividing, by the data storage device, a file specified by the data storage request into blocks. The method includes storing, by the data storage device, the blocks in virtual storage devices of the data storage device based on an input output (IO) limitation of the virtual storage devices.
In one aspect, a non-transitory computer readable medium in accordance with one or more embodiments of the invention includes computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for operating a data storage device. The method includes obtaining, by the data storage device, a data storage request. The method includes dividing, by the data storage device, a file specified by the data storage request into blocks. The method includes storing, by the data storage device, the blocks in virtual storage devices of the data storage device based on an input output (IO) limitation of the virtual storage devices.
Certain embodiments of the invention will be described with reference to the accompanying drawings. However, the accompanying drawings illustrate only certain aspects or implementations of the invention by way of example and are not meant to limit the scope of the claims.
Specific embodiments will now be described with reference to the accompanying figures. In the following description, numerous details are set forth as examples of the invention. It will be understood by those skilled in the art that one or more embodiments of the present invention may be practiced without these specific details and that numerous variations or modifications may be possible without departing from the scope of the invention. Certain details known to those of ordinary skill in the art are omitted to avoid obscuring the description.
In the following description of the figures, any component described with regard to a figure, in various embodiments of the invention, may be equivalent to one or more like-named components described with regard to any other figure. For brevity, descriptions of these components will not be repeated with regard to each figure. Thus, each and every embodiment of the components of each figure is incorporated by reference and assumed to be optionally present within every other figure having one or more like-named components. Additionally, in accordance with various embodiments of the invention, any description of the components of a figure is to be interpreted as an optional embodiment, which may be implemented in addition to, in conjunction with, or in place of the embodiments described with regard to a corresponding like-named component in any other figure.
In general, embodiments of the invention relate to devices, methods, and systems for load balancing data storage. In one or more embodiments of the invention, a data storage device may include a number of virtual storage devices. Each of the virtual storage devices may utilize physical storage devices that have limited input-output (IO) capacity. Some of the virtual storage devices may utilize the same physical storage device, e.g., share the physical storage device.
As used herein, a virtual storage device hosted by a physical storage device means that the virtual storage device utilizes at least a portion of the data storage capacity of the physical storage device. A virtual storage device may be hosted by a single physical storage device or may be hosted by multiple physical storage devices.
In one or more embodiments of the invention, the data storage device may group virtual storage devices based on shared physical storage devices. For example, two virtual storage devices that share a physical storage device may be grouped together into an IO bound storage group.
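By way of a non-limiting illustration, the grouping described above may be sketched as follows. The function name `build_io_bound_groups`, the `hosting` mapping, and the device names are hypothetical and are used only for this example. The sketch also assumes that grouping is transitive, i.e., that virtual storage devices linked through a chain of shared physical storage devices fall into one group, which the description does not state.

```python
from collections import defaultdict


def build_io_bound_groups(hosting):
    """Group virtual storage devices that share a physical storage device.

    hosting: dict mapping a virtual device name to the set of physical
    device names that host it (hypothetical representation).
    Returns a sorted list of groups, each a sorted list of virtual devices.
    """
    parent = {virtual: virtual for virtual in hosting}

    def find(virtual):
        # Follow parent links to the representative of the group.
        while parent[virtual] != virtual:
            parent[virtual] = parent[parent[virtual]]
            virtual = parent[virtual]
        return virtual

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first virtual device seen on each physical device and
    # merge every later virtual device on that physical device with it.
    first_user = {}
    for virtual, physicals in hosting.items():
        for physical in physicals:
            if physical in first_user:
                union(first_user[physical], virtual)
            else:
                first_user[physical] = virtual

    groups = defaultdict(list)
    for virtual in hosting:
        groups[find(virtual)].append(virtual)
    return sorted(sorted(members) for members in groups.values())


example_hosting = {
    "vsd_a": {"psd_a"},
    "vsd_b": {"psd_a"},
    "vsd_c": {"psd_b"},
}
example_groups = build_io_bound_groups(example_hosting)
```

In this example, `vsd_a` and `vsd_b` share `psd_a` and are therefore placed in one group, while `vsd_c` is placed in a group of its own.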
In one or more embodiments of the invention, the data storage device may distribute storage based on the groupings of virtual storage devices. For example, when a data storage request is received, the data storage device may divide a to-be-stored file into a number of blocks and distribute each of the blocks to different groupings of virtual storage devices. By distributing the blocks to different groups, rather than different virtual storage devices, each block will be stored using different physical storage devices, thereby load balancing data storage across the physical storage devices of the data storage system.
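The benefit of distributing blocks across groups may be illustrated with a simplified timing model. The model below is an assumption made for illustration only: every physical storage device is taken to have the same fixed data storage rate, physical storage devices are taken to operate fully in parallel, and a device's total rate is taken to be unaffected by how many blocks are queued to it. The function name `estimated_store_time` is hypothetical.

```python
def estimated_store_time(block_sizes_by_physical, rate):
    """Estimate the time to store all blocks under the simplified model.

    block_sizes_by_physical: dict mapping a physical device name to the
    list of block sizes (in bytes) queued to that device.
    rate: assumed data storage rate of every physical device (bytes/second).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not block_sizes_by_physical:
        return 0.0
    # Devices work in parallel, so the slowest device sets the total time.
    return max(sum(sizes) / rate for sizes in block_sizes_by_physical.values())


# Two 100-byte blocks sent to virtual devices that share one physical device.
same_group = estimated_store_time({"psd_a": [100, 100]}, rate=50)
# The same two blocks sent to different IO bound storage groups.
different_groups = estimated_store_time({"psd_a": [100], "psd_b": [100]}, rate=50)
```

Under these assumptions, storing both blocks through the shared physical storage device takes twice as long as storing one block in each group.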
The data storage device (100) may be a computing device. The computing device may be a physical device. The computing device may be, for example, a mobile phone, a tablet computer, a laptop computer, a desktop computer, or a server. The computing device may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions described in this application and illustrated in at least
The data storage device (100) may be a cloud resource. The cloud resource may be a logical computing device that utilizes the physical resources of multiple computing devices. Each of the utilized computing devices may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions described in this application and illustrated in at least
The data storage device may include a file block allocator (110), IO bound storage groups (120), IO bound storage group utilization heuristics (130), virtual storage devices (140), and local physical storage devices (160). Additionally, the data storage device may be operably connected to remote storage devices (180). Each component of the data storage device (100) is described below.
The data storage device (100) may include a file block allocator (110). The file block allocator (110) may divide data, e.g., files, into multiple blocks and distribute the blocks to storage devices of the data storage device (100) for storage. Distributing the blocks may load balance data storage. Load balancing the data storage may improve the operation of the data storage device by increasing the rate of data storage when compared to other data storage devices that do not load balance data storage.
In one or more embodiments of the invention, the file block allocator may be a physical device. The physical device may include circuitry. The physical device may be, for example, a field programmable gate array, an application specific integrated circuit, a digital signal processor, or any other type of physical computing device. The physical device may be configured to perform the functions described in this application and, more specifically, the methods illustrated in at least
In one or more embodiments of the invention, the file block allocator may be implemented as computer instructions, e.g., computer code, stored on a non-transitory computer readable storage medium operably connected to a processor. The computer instructions, when executed by the processor, may cause the processor to perform the functions described in this application and, more specifically, the methods illustrated in at least
The data storage device (100) may include IO bound storage groups (120). The IO bound storage groups (120) may be a data structure that specifies groupings of virtual storage devices (140) that are IO bound. In other words, the groupings may specify virtual storage devices (140) that limit the storage performance rate of other virtual storage devices of the group.
For example, two virtual storage devices that utilize the same physical storage device are only capable of sharing the data storage rate of the shared physical storage device. When the two virtual storage devices store data at different times, each virtual storage device may utilize the full data storage rate of the shared physical storage device. However, when both of the two virtual storage devices store data at the same time, each virtual storage device is only allocated a portion of the full data storage rate of the shared physical storage device. Thus, the data storage rate of either virtual storage device depends on whether the other virtual storage device is utilizing the shared physical storage device at the same time.
In one or more embodiments of the invention, the IO bound storage groups (120) are stored in the data storage device (100). The IO bound storage groups (120) may be stored in a persistent storage of the data storage device (100). The IO bound storage groups (120) may be stored in a memory of the data storage device (100).
While illustrated as being stored in the data storage device (100) in
For additional details regarding the IO bound storage groups (120), see
The data storage device (100) may include IO bound storage group utilization heuristics (130). The IO bound storage group utilization heuristics (130) may be a data structure that specifies utilization of the IO bound storage groups (120). In other words, the IO bound storage group utilization heuristics (130) may specify the relative utilization rate of each group specified by the IO bound storage groups (120).
For example, as data is stored in the data storage device the frequency of storing blocks to each group of the IO bound storage groups (120) may be recorded over time. Groups that are utilized at a lower rate than other groups due to, for example, data storage request patterns from clients, may be identified. The IO bound storage group utilization heuristics (130) may be used as part of the method illustrated in
In one or more embodiments of the invention, the IO bound storage group utilization heuristics (130) are stored in the data storage device (100). The IO bound storage group utilization heuristics (130) may be stored in a persistent storage of the data storage device (100). The IO bound storage group utilization heuristics (130) may be stored in a memory of the data storage device (100).
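As a non-limiting sketch, the IO bound storage group utilization heuristics may be represented as a per-group count of stored blocks from which a relative utilization rate is derived. The class name `UtilizationHeuristics`, its method names, and the use of a simple block count are assumptions made for this example; the description does not prescribe a particular representation.

```python
class UtilizationHeuristics:
    """Tracks how often blocks are stored to each IO bound storage group."""

    def __init__(self, group_ids):
        # Start every known group at zero so unused groups remain visible.
        self._counts = {group_id: 0 for group_id in group_ids}

    def record_store(self, group_id, blocks=1):
        # Record that `blocks` blocks were stored to the given group.
        self._counts[group_id] = self._counts.get(group_id, 0) + blocks

    def relative_utilization(self):
        # Fraction of all recorded blocks that went to each group.
        total = sum(self._counts.values())
        if total == 0:
            return {group_id: 0.0 for group_id in self._counts}
        return {group_id: count / total for group_id, count in self._counts.items()}

    def least_utilized(self):
        # Ties are broken by the lower group identifier (an assumption).
        return min(self._counts, key=lambda group_id: (self._counts[group_id], group_id))


heuristics = UtilizationHeuristics([0, 1, 2])
heuristics.record_store(0, blocks=3)
heuristics.record_store(1)
```

In this example, group 2 has received no blocks and is therefore identified as the group utilized at the lowest rate.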
While illustrated as being stored in the data storage device (100) in
The data storage device (100) may include virtual storage devices (140). The virtual storage devices (140) may be logical data storages that utilize the local physical storage devices (160) and/or the remote storage devices (180). More specifically, the virtual storage devices (140) may abstract one or more, or a portion thereof, of the local physical storage devices (160) and/or the remote storage devices (180) and present a logical data storage that utilizes the aforementioned local and/or remote storage devices. For additional details regarding the relationships between the virtual storage devices (140), local physical storage devices (160) and the remote storage devices (180), see
Each of the virtual storage devices (140) may be capable of storing data. More specifically, each of the virtual storage devices (140) may be capable of storing blocks of data. As will be discussed with respect to
The data storage device (100) may include local physical storage devices (160). The local physical storage devices (160) may be physical devices for storing data. The physical devices may be, for example, hard disk drives, solid state drives, tape drives, zip drives, or any other type of persistent storage device. Each of the physical devices may include circuitry for receiving data storage requests. The data storage requests may be provided to the physical devices by a processor of the data storage device.
The data storage device (100) may include remote storage devices (180). The remote storage devices (180) may be physical storage devices or logical storages for storing data. The remote storage devices (180) may be operably connected to the data storage device (100). The data storage device (100) may store data in the remote storage devices (180) or read data stored in the remote storage devices (180) via the operable connection.
Returning to
In one or more embodiments of the invention, each IO bound storage group is specified by a user of the data storage device. For example, whenever a new virtual storage device is added to the data storage system, the user may add the new virtual storage device to an existing IO bound storage group or may create a new IO bound storage group.
In one or more embodiments of the invention, each IO bound storage group is generated by performing a speed test of each virtual storage device. For example, whenever a new virtual storage device is added to the data storage device, the data storage device may measure the data storage rate of the new virtual storage device while not storing data to any other virtual storage device. The data storage device may then sequentially measure the data storage rate of the new virtual storage device while storing data to the new virtual storage device and each existing virtual storage device, respectively. If the measured data storage rate of the new virtual storage device changes between the measurements, the data storage device may add the new virtual storage device to an IO bound storage group that includes the existing virtual storage device to which data was being stored concurrently with the new virtual storage device.
In one or more embodiments of the invention, each IO bound storage group is generated by the method shown in
In the first example relationship, the data storage rate of each virtual storage device (145, 150) depends on the other virtual storage device, i.e., whether the other virtual storage device is storing data at the same time. Thus, with respect to
In the second example relationship, the data storage rate of the virtual storage device A (145) does not depend on any other virtual storage device, i.e., the local storages are not shared with other virtual storage devices. Thus, with respect to
In the third example relationship, the data storage rate of the virtual storage device A (145) does not depend on any other virtual storage device, i.e., the local/remote storages are not shared with other virtual storage devices. Thus, with respect to
In Step 200, a data storage request is obtained. The request may be obtained by receiving the request from a client via a network connection. The request may specify a file for storage.
In Step 210, the file is divided into a number of blocks. Each block may be a chunk of the file, i.e., a bit sequence corresponding to a portion of the file.
In one or more embodiments of the invention, the file is divided into a number of blocks corresponding to the number of IO bound storage groups of the data storage device. In one or more embodiments of the invention, the file is divided into a number of blocks greater than the number of IO bound storage groups of the data storage device. The file may be divided into different numbers of blocks without departing from the invention.
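One way to divide a file into a number of blocks corresponding to the number of IO bound storage groups may be sketched as follows. The function name `divide_into_blocks` and the choice of nearly equal, contiguous blocks are assumptions made for illustration; the description permits other divisions.

```python
def divide_into_blocks(data, num_blocks):
    """Split `data` (bytes) into `num_blocks` contiguous, nearly equal blocks."""
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    base, extra = divmod(len(data), num_blocks)
    blocks = []
    offset = 0
    for index in range(num_blocks):
        # The first `extra` blocks take one additional byte each.
        size = base + (1 if index < extra else 0)
        blocks.append(data[offset:offset + size])
        offset += size
    return blocks


file_data = b"abcdefghij"
file_blocks = divide_into_blocks(file_data, 3)
```

Concatenating the resulting blocks in order reproduces the original file, so no data is lost or duplicated by the division.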
In Step 220, the blocks are distributed across the virtual storage devices for storage using a distribution method. The distribution may be any of the methods illustrated in
In Step 230, the blocks distributed to each virtual storage device are stored by the respective virtual storage device.
In one or more embodiments of the invention, the blocks may be stored as each block is distributed to the respective virtual storage device in Step 220.
In one or more embodiments of the invention, the blocks are stored after all of the blocks are distributed.
In Step 300, a block of a file that has not been distributed is selected. The selection of the block may be arbitrary, e.g., randomly selected, or deterministic, e.g., based on a block identifier of a previously stored block.
In Step 310, the selected block is distributed to a virtual storage device of an IO bound storage group that is marked as available. As used herein, distributing a block to a virtual storage device means to mark, queue, or otherwise cause the block to be stored in the virtual storage device immediately or at a future point in time.
In one or more embodiments of the invention, the IO bound storage group to which the selected block is distributed is marked as unavailable after the selected block is distributed.
In Step 320, it is determined whether all of the blocks of a file are distributed. If all of the blocks are distributed, the method may end following Step 320. If not all of the blocks are distributed, the method may proceed to Step 330.
In Step 330, it is determined whether all of the IO bound storage groups are marked as unavailable. If all of the IO bound storage groups are marked as unavailable, the method may proceed to Step 340. If not all of the IO bound storage groups are marked as unavailable, the method may proceed to Step 300.
In Step 340, all of the IO bound storage groups are marked as available. Each IO bound storage group may be marked as available by modifying an availability indicator of each group to indicate that the group is available. The method may proceed to Step 300 following Step 340.
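Steps 300 through 340 may be sketched, under stated assumptions, as the loop below. The function name `distribute_to_available_groups`, the `groups` mapping, and the random choice of an available group and of a virtual storage device within that group are assumptions made for this example; the description only requires that the selected group be marked as available.

```python
import random


def distribute_to_available_groups(blocks, groups, rng=None):
    """Assign each block to a virtual device of an available IO bound group.

    groups: dict mapping a group identifier to a non-empty list of virtual
    device names (hypothetical representation).
    Returns a list of (block, group_id, device) tuples.
    """
    if not groups:
        raise ValueError("at least one IO bound storage group is required")
    rng = rng or random.Random()
    available = set(groups)
    pending = list(blocks)
    assignments = []
    while pending:
        block = pending.pop(0)                      # Step 300: select a block
        group_id = rng.choice(sorted(available))    # Step 310: pick an available group
        device = rng.choice(groups[group_id])
        assignments.append((block, group_id, device))
        available.discard(group_id)                 # mark the group as unavailable
        if not pending:                             # Step 320: all blocks distributed
            break
        if not available:                           # Step 330: all groups unavailable
            available = set(groups)                 # Step 340: mark all as available
    return assignments


demo_groups = {0: ["vsd_a", "vsd_b"], 1: ["vsd_c"]}
demo_assignments = distribute_to_available_groups(
    [b"b0", b"b1", b"b2", b"b3"], demo_groups, rng=random.Random(0)
)
```

Because a group becomes unavailable as soon as it receives a block, every group receives one block before any group receives a second block.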
In Step 400, a block of a file that has not been distributed is selected. The selection of the block may be arbitrary, e.g., randomly selected, or deterministic, e.g., based on a block identifier of a previously stored block.
In Step 410, the selected block is distributed to a virtual storage device of an IO bound storage group that: (i) has the lowest IO bound storage group ID and (ii) is marked as available.
In one or more embodiments of the invention, the IO bound storage group to which the selected block is distributed is marked as unavailable after the selected block is distributed.
In Step 420, it is determined whether all of the blocks of a file are distributed. If all of the blocks are distributed, the method may end following Step 420. If not all of the blocks are distributed, the method may proceed to Step 430.
In Step 430, it is determined whether all of the IO bound storage groups are marked as unavailable. If all of the IO bound storage groups are marked as unavailable, the method may proceed to Step 440. If not all of the IO bound storage groups are marked as unavailable, the method may proceed to Step 400.
In Step 440, all of the IO bound storage groups are marked as available. Each IO bound storage group may be marked as available by modifying an availability indicator of each group to indicate that the group is available. The method may proceed to Step 400 following Step 440.
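Steps 400 through 440 differ from the preceding method in that the available group having the lowest IO bound storage group ID is always selected. A minimal sketch follows; the function name `distribute_lowest_id_first`, the `groups` mapping, and the choice of the first listed virtual storage device within a group are assumptions made for this example.

```python
def distribute_lowest_id_first(blocks, groups):
    """Assign each block to the available group with the lowest identifier.

    groups: dict mapping a group identifier to a non-empty list of virtual
    device names (hypothetical representation).
    Returns a list of (block, group_id, device) tuples.
    """
    if not groups:
        raise ValueError("at least one IO bound storage group is required")
    blocks = list(blocks)
    available = set(groups)
    assignments = []
    for index, block in enumerate(blocks):          # Step 400: select a block
        group_id = min(available)                   # Step 410: lowest available ID
        device = groups[group_id][0]                # assumed: first device in the group
        assignments.append((block, group_id, device))
        available.remove(group_id)                  # mark the group as unavailable
        is_last = index == len(blocks) - 1          # Step 420: all blocks distributed
        if not is_last and not available:           # Step 430: all groups unavailable
            available = set(groups)                 # Step 440: mark all as available
    return assignments


ordered_groups = {1: ["vsd_a", "vsd_b"], 2: ["vsd_c"]}
ordered_assignments = distribute_lowest_id_first(["b0", "b1", "b2"], ordered_groups)
```

With two groups and three blocks, the blocks are distributed to groups 1, 2, and 1, in that order.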
In Step 500, a block of a file that has not been distributed is selected. The selection of the block may be arbitrary, e.g., randomly selected, or deterministic, e.g., based on a block identifier of a previously stored block.
In Step 510, the selected block is distributed to a virtual storage device of an IO bound storage group that has the lowest utilization heuristic. As described with respect to
In one or more embodiments of the invention, the IO bound storage group to which the selected block is distributed is marked as unavailable after the selected block is distributed.
In Step 520, it is determined whether all of the blocks of a file are distributed. If all of the blocks are distributed, the method may end following Step 520. If not all of the blocks are distributed, the method may proceed to Step 500.
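Steps 500 through 520 may be sketched as follows. The sketch assumes that the utilization heuristic is a per-group count of previously stored blocks, that the count is updated as each block is distributed, and that ties are broken by the lower group identifier; none of these details are specified by the description. The function name `distribute_least_utilized` is hypothetical.

```python
def distribute_least_utilized(blocks, groups, utilization):
    """Assign each block to the group with the lowest utilization count.

    groups: dict mapping a group identifier to a non-empty list of virtual
    device names (hypothetical representation).
    utilization: dict mapping a group identifier to the number of blocks
    previously stored to that group (missing groups count as zero).
    Returns (assignments, updated_counts); `utilization` is not modified.
    """
    if not groups:
        raise ValueError("at least one IO bound storage group is required")
    counts = dict(utilization)
    assignments = []
    for block in blocks:                            # Step 500: select a block
        # Step 510: pick the group with the lowest utilization heuristic.
        group_id = min(groups, key=lambda g: (counts.get(g, 0), g))
        device = groups[group_id][0]                # assumed: first device in the group
        assignments.append((block, group_id, device))
        counts[group_id] = counts.get(group_id, 0) + 1
    return assignments, counts                      # Step 520: all blocks distributed


heuristic_groups = {1: ["vsd_a"], 2: ["vsd_b"], 3: ["vsd_c"]}
prior_utilization = {1: 5, 2: 0, 3: 1}
heuristic_assignments, updated_counts = distribute_least_utilized(
    ["b0", "b1", "b2"], heuristic_groups, prior_utilization
)
```

In this example, the heavily utilized group 1 receives no blocks, and the three blocks are distributed to groups 2, 2, and 3.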
In Step 600, a newly added virtual storage device is identified.
In Step 610, the newly added virtual storage device may be added to an IO bound storage group.
In one or more embodiments of the invention, the newly added virtual storage device may be added: (i) to an existing IO bound storage group or (ii) to a new IO bound storage group. The newly added virtual storage device may be added (i) to an existing IO bound storage group if its data storage rate is dependent on an existing virtual storage device of an IO bound storage group. The newly added virtual storage device may be added (ii) to a new IO bound storage group if its data storage rate is independent of any existing virtual storage device.
In one or more embodiments of the invention, determination (i) or (ii) is made by a user. In other words, a user of the data storage device may elect to add the newly added virtual storage device to a new or existing IO bound storage group based on their knowledge of the virtual storage device, e.g., the physical storage devices that it utilizes.
In one or more embodiments of the invention, determination (i) or (ii) is made by the data storage device. The data storage device may make the determination by comparing (a) the data storage rate of the virtual storage device when the newly added virtual storage device is the only virtual storage device storing data to (b) the data storage rate of the virtual storage device when the newly added virtual storage device and at least one other virtual storage device are storing data. If the data storage rates of (a) and (b) are different, the data storage device may (i) add the newly added virtual storage device to an existing IO bound storage group. If the data storage rates of (a) and (b) are the same, the data storage device may (ii) add the newly added virtual storage device to a new IO bound storage group.
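The comparison of rates (a) and (b) may be sketched as follows. The function name `assign_new_device`, the `groups` mapping, and the `tolerance` parameter are assumptions made for this example. In particular, the description refers only to the rates being the same or different; a relative tolerance is introduced here because measured rates are rarely exactly equal. The rate measurements themselves are supplied as inputs rather than performed by the sketch.

```python
def assign_new_device(new_device, groups, solo_rate, concurrent_rates, tolerance=0.05):
    """Place a newly added virtual device into an IO bound storage group.

    groups: dict mapping an integer group identifier to a list of virtual
    device names; it is updated in place.
    solo_rate: rate (a), measured while only the new device stores data.
    concurrent_rates: dict mapping an existing device name to rate (b),
    measured for the new device while that existing device also stores data.
    tolerance: assumed relative difference below which rates count as equal.
    Returns the identifier of the group that now contains the new device.
    """
    for group_id, members in groups.items():
        for member in members:
            rate = concurrent_rates.get(member)
            if rate is None:
                continue  # no concurrent measurement for this device
            if abs(solo_rate - rate) > tolerance * solo_rate:
                # (i) The rate depends on an existing device: join its group.
                members.append(new_device)
                return group_id
    # (ii) The rate is independent of every measured device: new group.
    new_group_id = max(groups, default=0) + 1
    groups[new_group_id] = [new_device]
    return new_group_id


measured_groups = {1: ["vsd_a"], 2: ["vsd_b"]}
joined = assign_new_device("vsd_c", measured_groups, 100.0, {"vsd_a": 99.0, "vsd_b": 52.0})
created = assign_new_device(
    "vsd_d", measured_groups, 100.0, {"vsd_a": 100.0, "vsd_b": 98.0, "vsd_c": 100.0}
)
```

In this example, `vsd_c` slows down markedly when `vsd_b` stores data concurrently and therefore joins group 2, while `vsd_d` is unaffected by every existing device and is placed in a new group.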
The method may end following Step 610.
The following are explanatory examples used to clarify aspects of the invention.
The data storage device (700) includes three virtual storage devices (711, 712, 713) and two local physical storage devices (721, 722). Each of the virtual storage devices (711, 712, 713) utilizes the local physical storage devices (721, 722).
As seen from
In contrast, virtual storage device C (713) utilizes local physical storage device B (722) which is not shared with any other virtual storage device. Thus, virtual storage device C (713) is grouped into a different IO bound storage group because its data storage rate does not vary based on the data storage operations of any other virtual storage device.
Each block is distributed for storage to virtual storage devices of IO bound groups using the method illustrated in
The data storage device (800) includes four virtual storage devices (811, 812, 813, 814) and three local physical storage devices (821, 822, 823). Each of the virtual storage devices (811, 812, 813, 814) utilizes the local physical storage devices (821, 822, 823).
As seen from
In contrast, virtual storage device B (812) utilizes local physical storage device B (822) which is not shared with any other virtual storage device. Thus, virtual storage device B (812) is grouped into a different IO bound storage group because its data storage rate does not vary based on the data storage operations of any other virtual storage device.
Similarly, virtual storage device D (814) utilizes local physical storage device C (823) which is not shared with any other virtual storage device. Thus, virtual storage device D (814) is grouped into a third IO bound storage group because its data storage rate does not vary based on the data storage operations of any other virtual storage device.
Each block is distributed for storage to virtual storage devices of IO bound groups using the method illustrated in
One or more embodiments of the invention may be implemented using instructions executed by one or more processors in the data storage device. Further, such instructions may correspond to computer readable instructions that are stored on one or more non-transitory computer readable mediums.
One or more embodiments of the invention may enable one or more of the following: i) distribute data storage across data storage resources of a data storage device, ii) prevent data from being queued for storage in two virtual storage devices that are data storage rate limited with each other, and iii) improve the data storage rate of a data storage device by only distributing blocks of files to virtual storage devices for storage that are not rate limited by any other virtual storage devices receiving blocks for storage.
While the invention has been described above with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.