This application is a National Stage application under 35 U.S.C. § 371 of International Application No. PCT/JP2019/030082, having an International Filing Date of Jul. 31, 2019. The disclosure of the prior application is considered part of the disclosure of this application, and is incorporated in its entirety into this application.
The present invention relates to a technique for optimizing data storage and operation by arranging data stored in a plurality of data centers based on conditions determined by a business operator.
A conventional hierarchical storage management system provides large-capacity storage configured from SSDs, HDDs, and magnetic tapes, and places data in accordance with the number of references made to the stored data and the access speed (write, read) of each storage medium. Data with a high reference count is automatically stored in an SSD to achieve a higher access speed (Non Patent Literature 1).
In addition, there is a content distribution system that can shorten download time by providing a content cache server at the boundary between a user area and a public network and downloading a content from the cache server close to an accessing user (Non Patent Literature 2).
Furthermore, thin clients, software-defined storage employing virtualization, etc. have become widespread, and data is managed in data centers without ensuring storage in client terminals of the users (Non Patent Literature 3).
[NPL 1] https://www.atmarkit.co.jp/ait/articles/1106/27/news109.html
[NPL 2] https://blog.redbox.ne.jp/what-is-cdn.html
[NPL 3] https://www.atmarkit.co.jp/ait/articles/1409/29/news130.htm
In recent years, instead of holding and managing software, data, etc. on their own computer hardware, users have come to manage them on a server in a data center connected to a network. With the increasing capacity and speed of communication networks, the spread of SNS, the revision of the Electronic Document Law, etc., there has been an increasing demand for storing large volumes of data with different purposes on the network.
Accordingly, a large amount of data of various sizes is stored in the data center for an extended period of time. Among the various types of data, there is data for which delay is not allowed, data for which delay is allowed but power consumption and costs need to be reduced, etc. However, in the prior art, storing each piece of data in a storage medium appropriate to its operation policy has to be done manually, which takes a great deal of time and effort.
The present invention has been made with the foregoing in view, and it is an object to provide a technique capable of automatically selecting a storage medium that matches an operation policy of data from a plurality of storage media disposed in a plurality of data centers.
According to the disclosed technique, there is provided a hierarchical storage management system including: a hierarchical storage that is provided in an individual data center and has at least one storage medium; and a hierarchical storage control apparatus that manages at least one hierarchical storage, wherein the hierarchical storage control apparatus includes a calculation unit that performs processing for obtaining, for individual data managed by the hierarchical storage control apparatus, a storage medium in a data center that satisfies an operation policy by calculating power consumption needed for storing the data, a cost needed for storing the data, and communication time for transferring the data from a data center to a reference source area and by comparing the calculated power consumption, cost, and communication time with the operation policy set for the data.
According to the disclosed technique, there is provided a technique capable of automatically selecting a storage medium that matches an operation policy of data from a plurality of storage media disposed in a plurality of data centers.
Hereinafter, an embodiment of the present invention (the present embodiment) will be described with reference to the drawings. The embodiment described below is merely an example. An embodiment to which the present invention is applied is not limited to the following embodiment.
The present embodiment describes a technique for automatically selecting, for individual data to be stored in a plurality of data centers, a storage medium that matches an operation policy of the data, by referring to the location conditions (construction cost, electricity charges), the data reference frequency and the communication speed of the network, and the type of the storage medium storing the data and the installation location of the storage medium. This technique eliminates unnecessary power consumption and cost, and thereby contributes to reducing power consumption (improving energy efficiency) and cost in a cloud-type data center and a virtualized NW, as well as to improving QoS. Hereinafter, the technique will be specifically described.
(Overall System Configuration)
As illustrated in
The hierarchical storage 40 in the plurality of data centers 30 and the hierarchical storage control apparatus 20 are connected via at least one network 10 so that large-scale storage can be provided.
For example, the hierarchical storage 40 in a data center 30 located near an urban area where the land price is high is configured mainly from high-speed storage media such as SSDs and stores data that has a high reference frequency and requires a short delay time. In contrast, the hierarchical storage 40 in a data center 30 located in a suburban area where the land price is low is configured mainly from a plurality of low-speed storage media such as magnetic tapes to achieve an ultra-high capacity and stores data that has a low reference frequency and allows delay.
When the user 50 refers to data, the data is downloaded from the hierarchical storage 40 in which the data is stored to the user 50.
(Configuration of Hierarchical Storage 40)
The individual storage medium 420 is, for example, an SSD (flash memory), an HDD (magnetic disk), an optical disk, a magnetic tape, or the like. The management unit 430 checks the input and output of data and detects a reference source area and the number of cumulative reference counts when stored data is referred to. The detected information is notified to the hierarchical storage control apparatus 20 and managed therein.
(Configuration of Hierarchical Storage Control Apparatus 20)
The storage unit 220 stores a data center information table 2210, a storage medium information table 2220, a transmission line information table 2230, a calculation interval table 2240, an execution log table 2250, an operation policy table 2260, and a stored data management table 2270.
The timer 230 holds the current date and time. The content of each table and the content of calculation performed by the calculation unit 210 will be described below.
(Hardware Configuration Example)
The functions of the hierarchical storage control apparatus 20 can be implemented, for example, by causing a computer to execute a program.
That is, the functions of the hierarchical storage control apparatus 20 can be implemented by executing a program corresponding to processing performed by the hierarchical storage control apparatus 20 by using hardware resources such as a CPU and a memory built in a computer. The above program can be recorded on a computer-readable recording medium (portable memory or the like) to be stored or distributed. In addition, the above program can be provided through a network such as the Internet or e-mail.
The program for implementing the processing by the computer is provided, for example, by a recording medium 1001 such as a CD-ROM, a memory card, or the like. When the recording medium 1001 storing the program is set in the drive device 1000, the program is installed in the auxiliary storage device 1002 from the recording medium 1001 via the drive device 1000. However, the program does not necessarily need to be installed from the recording medium 1001 and may be downloaded from another computer via the network. The auxiliary storage device 1002 stores the installed program and also stores necessary files, data, etc.
When the program is instructed to start, the memory device 1003 reads and stores the program from the auxiliary storage device 1002. The CPU 1004 implements the functions of the hierarchical storage control apparatus 20 in accordance with the program stored in the memory device 1003. The interface device 1005 is used as an interface for connecting to a network and functions as input means and output means via the network. The display device 1006 displays a GUI (graphical user interface) or the like in accordance with the program. The input device 1007 includes a keyboard, a mouse, buttons, a touch panel, or the like and is used to input various operation instructions.
(Description of Tables)
Next, the tables stored in the storage unit 220 of the hierarchical storage control apparatus 20 will be described.
(Processing Operation of Hierarchical Storage Control Apparatus 20)
Hereinafter, the details of the calculation processing performed by the calculation unit 210 of the hierarchical storage control apparatus 20 will be described with reference to a flowchart in
The calculation unit 210 compares the latest calculation execution date and time in the execution log table 2250 with the date and time of the timer 230, and when the calculation interval stored in the calculation interval table 2240 has elapsed, the calculation unit 210 starts a calculation. In addition, the calculation unit 210 stores the date and time when the calculation is started in the execution log table 2250.
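The start condition described above can be sketched as follows. This is a minimal illustration rather than the actual implementation of the apparatus: the function name `start_if_due` and the use of an in-memory list in place of the execution log table 2250 are assumptions made here for clarity.

```python
from datetime import datetime, timedelta


def start_if_due(execution_log, now, interval):
    """Start a calculation when the interval has elapsed since the latest run.

    execution_log: list of datetimes (hypothetical stand-in for table 2250)
    now:           current date and time (the role of the timer 230)
    interval:      timedelta (the role of the calculation interval table 2240)
    Returns True when a calculation is started.
    """
    # Assumption: an empty log means no calculation has run yet, so start one.
    if execution_log and now - execution_log[-1] < interval:
        return False
    # Record the date and time at which this calculation starts.
    execution_log.append(now)
    return True


log = [datetime(2019, 7, 31, 0, 0)]
print(start_if_due(log, datetime(2019, 7, 31, 12, 0), timedelta(days=1)))  # False
print(start_if_due(log, datetime(2019, 8, 1, 0, 0), timedelta(days=1)))    # True
```

Treating "elapsed" as "greater than or equal to the interval" is also an assumption; the description does not say how the boundary case is handled.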
In S1 (step 1) in
PUyear = Tread × Fread × Pread + (8760 − Tread × Fread) × Pidle   formula (1)
PUyear: annual data storage power consumption
Tread: reading time for a single read (h/time)
Fread: reference frequency (times/year)
Pread: power consumption during reading
Pidle: power consumption during standby
8760: number of hours in one year (365 days × 24 hours)
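As a rough illustration, formula (1) can be written as a small function. The function and parameter names are hypothetical, and the units (hours per read, reads per year, watts) are assumptions inferred from the constant 8760, the number of hours in a 365-day year. The input values in the usage line are made up for illustration and are not taken from the tables.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def annual_power_consumption(t_read, f_read, p_read, p_idle):
    """Formula (1): PUyear = Tread*Fread*Pread + (8760 - Tread*Fread)*Pidle.

    t_read: time for a single read [h]        (assumed unit)
    f_read: reference frequency [reads/year]  (assumed unit)
    p_read: power while reading [W]           (assumed unit)
    p_idle: power while on standby [W]        (assumed unit)
    Returns the annual energy in Wh under these unit assumptions.
    """
    reading_hours = t_read * f_read
    return reading_hours * p_read + (HOURS_PER_YEAR - reading_hours) * p_idle


# Illustrative values only: 2 h per read, 10 reads a year, 5 W reading, 1 W idle.
print(annual_power_consumption(2, 10, 5, 1))  # 20*5 + 8740*1 = 8840
```

The sketch makes the structure of the formula visible: the medium is assumed to be either reading or on standby for every hour of the year, so data that is rarely referenced is dominated by the standby term.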
Cyear = PUyear × Chargepower + (Chargefootprint × Sizedata) ÷ Densitystorage + (Chargemedia × Sizedata) ÷ (Capacitymedia × Lifetimemedia)   formula (2)
Cyear: annual data storage cost
PUyear: annual data storage power consumption
Chargepower: electricity charge
Chargefootprint: space cost
Sizedata: data size
Densitystorage: storage medium recording density
Chargemedia: unit price of the storage medium
Capacitymedia: capacity of the storage medium
Lifetimemedia: lifetime of the storage medium
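Formula (2) is a sum of three terms: the cost of the consumed power, the cost of the space the data occupies, and the cost of acquiring the medium spread over its lifetime. A minimal sketch is given below; the function and parameter names are hypothetical, and the numbers in the usage line are made-up round values chosen only so that each term is easy to check by hand.

```python
def annual_storage_cost(pu_year, charge_power, charge_footprint, size_data,
                        density_storage, charge_media, capacity_media,
                        lifetime_media):
    """Formula (2): annual cost = power cost + space cost + media cost."""
    # Cost of the electricity consumed in one year.
    power_cost = pu_year * charge_power
    # Space cost, scaled by the share of the recording density the data uses.
    space_cost = (charge_footprint * size_data) / density_storage
    # Media price, scaled by the share of capacity used and spread over the lifetime.
    media_cost = (charge_media * size_data) / (capacity_media * lifetime_media)
    return power_cost + space_cost + media_cost


# Illustrative values only: 200 + 3000/60 + 7200/240 = 200 + 50 + 30
print(annual_storage_cost(100, 2, 1000, 3, 60, 2400, 60, 4))  # 280.0
```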
TDL = Tread + T1   formula (3)
TDL: data download time from the data center to the reference source
Tread: reading time
T1: communication time of the NW (also referred to herein as the communication speed)
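Formula (3) can be sketched as below, assuming that the reading time is obtained by dividing the data size by the reading speed of the storage medium, as in the worked example later in this description (1 TB read at 200 MB/s with a network time of 0). The function name, the parameter names, and the use of decimal prefixes (1 TB = 10^12 bytes) are assumptions.

```python
TB = 10**12  # decimal terabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)


def download_time(size_bytes, read_speed_bytes_per_s, network_time_s):
    """Formula (3): TDL = Tread + T1, with every quantity in seconds."""
    # Reading time of the storage medium for the whole data.
    t_read = size_bytes / read_speed_bytes_per_s
    return t_read + network_time_s


print(download_time(1 * TB, 200 * MB, 0))  # 5000.0
```

Note that formula (1) uses the reading time in hours, so the same reading time would be divided by 3600 before being used there.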
In S2, the calculation unit 210 stores the annual power consumption, the annual cost, and the communication speed from the data center to the most frequent reference source area, which have been calculated in S1, in the stored data management table 2270 per data.
In the present example, the power consumption, the cost, and the communication speed are first calculated for all the data, and the determination, etc. in S3, which will be described below, are performed subsequently. Alternatively, however, the sequence of “calculation, determination, change” may be repeated per data until the operation policy is satisfied.
In S3, the calculation unit 210 compares the resultant values (the annual power consumption, the annual storage cost, and the communication speed) calculated in S1 with values set for the policy number corresponding to the data in the operation policy table 2260 per data and determines whether all the values satisfy the corresponding values of the operation policy. When all the values of all the data satisfy the corresponding values in the respective operation policies, the processing ends.
When one or more pieces of data have a value that does not satisfy the corresponding value of the operation policy, the processing of S4 through S8 is performed on each of those pieces of data.
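The determination in S3 can be sketched as a comparison of three calculated values against three policy values. The sketch below is an assumption-laden illustration: the class and function names are hypothetical, each policy value is treated as an upper limit that the calculated value must not exceed, and the policy is looked up directly by data number rather than through a policy number. The calculated values for data number 2 follow the worked example later in this description; all other numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Values:
    power: float      # annual power consumption
    cost: float       # annual storage cost
    comm_time: float  # communication time to the reference source area


def satisfies(calculated, policy):
    """Return True when every calculated value is within the policy value."""
    return (calculated.power <= policy.power
            and calculated.cost <= policy.cost
            and calculated.comm_time <= policy.comm_time)


def data_needing_change(calculated_by_data, policy_by_data):
    """Return the data numbers whose values fail their operation policy."""
    return [number for number, values in calculated_by_data.items()
            if not satisfies(values, policy_by_data[number])]


INF = float("inf")  # "regardless of" a value can be modeled as no upper limit
calculated = {1: Values(500, 30000, 10), 2: Values(730, 52100, 5000)}
policies = {1: Values(INF, INF, 60), 2: Values(600, 40000, 3600)}
print(data_needing_change(calculated, policies))  # [2]
```

When the returned list is empty, the processing ends; otherwise each listed data number proceeds to the change of storage medium.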
In S4, the data center base and the storage medium corresponding to the data are changed. Alternatively, the data center base may not be changed, and only the storage medium may be changed. How the change is performed is not particularly limited. For example, the change may be made by increasing (or decreasing) the data center number/storage medium number. After the change has been made, the calculation is performed on the assumption that the data is stored in a changed storage medium.
The processing of S4 through S7 is then repeated until the determination in S7 (the same determination as in S3) becomes Yes. The content of the calculation in S5 is the same as that in S1.
When the determination in S7 becomes Yes (when the operation policy is satisfied), the calculation unit 210 transfers the data to the storage medium of the data center at that time. The transfer of data from one data center to another data center can be implemented by instructing the management unit 430 of the hierarchical storage 40 in the relevant data center.
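The change-and-recalculate loop can be sketched as follows. The description leaves the order of change open, so this sketch assumes one of the options it mentions, stepping to the next data center number/storage medium number, and it additionally assumes that the search wraps around and gives up after every candidate has been tried once. The function name, its parameters, and the cost values other than 52100 are hypothetical.

```python
def find_placement(candidates, current, calculate, satisfies_policy):
    """Try the other candidates in number order until one satisfies the policy.

    candidates:       ordered list of (data_center_no, storage_medium_no)
    current:          index of the placement that failed the determination
    calculate:        callable returning the calculated values for a placement
    satisfies_policy: callable returning True or False for those values
    Returns the first satisfying placement, or None if no candidate does.
    """
    for step in range(1, len(candidates)):
        # S4: change to the next number, wrapping around at the end.
        placement = candidates[(current + step) % len(candidates)]
        # S5: calculate as if the data were stored in the changed medium.
        values = calculate(placement)
        # S7: stop at the first placement that satisfies the policy.
        if satisfies_policy(values):
            return placement  # the data would be transferred here
    return None


# Made-up annual costs per (data center number, storage medium number).
cost = {(1, 11): 52100, (1, 12): 45000, (2, 21): 18000}
target = find_placement(list(cost), 0, cost.get, lambda c: c <= 20000)
print(target)  # (2, 21)
```

Returning `None` when nothing satisfies the policy is a choice made for this sketch; the description does not state what happens in that case.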
Hereinafter, a more specific example will be described.
In the present example, South Kanto and Joshinetsu are connected by a dedicated line, and South Kanto and Hokkaido are also connected by a dedicated line. South Kanto and North Kanto are connected via a public network instead of a dedicated line. Alternatively, the data centers may be directly connected to one another by a dedicated line or may be connected via a public network.
For example, as an operation policy assuming data for which latency is not allowed, achieving low delay regardless of the power consumption or the cost is set as a condition of policy number “1”. In addition, for example, as an operation policy assuming a large amount of data having a low reference frequency, having the smallest cost is set as a condition of policy number “3”.
Hereinafter, an example of the detailed processing performed by the calculation unit 210 of the present example will be described.
First, it is assumed that data is uploaded as illustrated in the stored data management table 2270 in
The users who have uploaded the above data have set policy numbers 5, 4, and 1 for data numbers 1, 2, and 3, respectively. In addition, the number of reference counts and the most frequent reference source areas at the time when the first calculation is performed are as illustrated in
The calculation described with reference to the flowchart in
The calculation unit 210 calculates the annual power consumption needed for storing the data of the data number “2” by using the formula (1).
With regard to the data of the data number “2” stored in the storage medium of the storage medium number 11, Tread (a single data reading time (h/time)) is (1 T[B] ÷ 200 M[B/s]) ÷ 3600 ≈ 1.39 [h/time] based on
Further, based on
Further, the calculation unit 210 calculates the annual cost needed for storing the data of the data number “2” by using the formula (2).
Since Chargepower (electricity charge) is 20 [yen/Wh], PUyear×Chargepower=730×20=14600 [yen].
In the present example, since Chargefootprint (space cost) is 2,000,000/2, and Densitystorage (storage medium recording density) is 60 T, (Chargefootprint×Sizedata)÷Densitystorage=2,000,000/2×1/60=16667 [yen].
Further, since Chargemedia (unit price of the storage medium)=5,000,000, Capacitymedia (capacity of the storage medium)=60, and Lifetimemedia (lifetime of the storage medium)=4, (Chargemedia×Sizedata)÷(Capacitymedia×Lifetimemedia)=5,000,000×1/(60×4)=20833 [yen].
Therefore, by summing up these resultant values, Cyear=52100 [yen] is obtained.
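The three terms and their sum can be checked with a few lines of arithmetic. The sketch below only reuses the numbers stated in the preceding paragraphs; the variable names are hypothetical, and the rounding to whole yen is an assumption made to match the figures quoted above.

```python
# Power term: annual power consumption x electricity charge.
power_cost = 730 * 20
# Space term: space cost x data size / recording density.
space_cost = (2_000_000 / 2) * 1 / 60
# Media term: media unit price x data size / (capacity x lifetime).
media_cost = 5_000_000 * 1 / (60 * 4)

c_year = power_cost + space_cost + media_cost

print(round(power_cost), round(space_cost), round(media_cost))  # 14600 16667 20833
print(round(c_year))                                            # 52100
```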
Further, the calculation unit 210 calculates the communication speed (communication time) from the data center storing the data to the most frequent reference source area by using the formula (3).
In the present example, since Tread (reading time)=1 TB÷200 MB/s=5000 s, and T1 (communication time of the NW)=0, TDL=5000 s.
The calculation unit 210 stores the annual power consumption, the annual cost, and the communication speed from the corresponding data center to the most frequent reference source area, which have been calculated as described above, in the stored data management table 2270.
The calculation unit 210 acquires the communication speed, the power consumption, and the cost corresponding to the policy number “4” set for the data of the data number 2 from the operation policy table 2260. The calculation unit 210 compares these acquired values with the communication speed, the power consumption, and the cost calculated above.
Since the communication speed, the power consumption, and the cost all exceed the threshold values of the policy number “4”, the calculation unit 210 determines that the storage medium HDD “11” currently storing the data does not satisfy the operation policy. Thus, the calculation unit 210 changes the storage medium and performs the calculation, and the calculation is continued until the values are within the range of the policy number “4”.
According to the technique in the present embodiment described above, when certain data is stored in any one of the plurality of data centers, the data center and the type of storage medium can be automatically selected for the data in accordance with conditions set in advance by the business operator on the communication speed between the data centers and on the power consumption and the cost needed for storing the data. As a result, the power consumption and the cost can be reduced, and this leads to reductions of the electricity charge and environmental load as well as an improvement in QoS.
The present description discloses at least the hierarchical storage management system, the hierarchical storage control apparatus, the hierarchical storage management method, and the program in the following items.
(Item 1)
A hierarchical storage management system, including: a hierarchical storage that is provided in an individual data center and has at least one storage medium; and a hierarchical storage control apparatus that manages at least one hierarchical storage, wherein the hierarchical storage control apparatus includes a calculation unit that performs processing for obtaining, for individual data managed by the hierarchical storage control apparatus, a storage medium in a data center that satisfies an operation policy by calculating power consumption needed for storing the data, a cost needed for storing the data, and communication time for transferring the data from a data center to a reference source area and by comparing the calculated power consumption, cost, and communication time with the operation policy set for the data.
(Item 2)
The hierarchical storage management system according to item 1, wherein, when the calculation unit determines that the calculated power consumption, cost, and communication time do not satisfy the operation policy set for the data, the calculation unit changes a storage medium storing the data and performs the processing on an assumption that the data is stored in a changed storage medium.
(Item 3)
The hierarchical storage management system according to item 1 or 2, wherein the calculation unit calculates power consumption needed for storing the data by calculating a sum of power consumption for data reading from the storage medium storing the data and power consumption during standby,
calculates a cost needed for storing the data by calculating a sum of a cost of power consumption of the storage medium, a cost of installing the storage medium, and a cost of acquiring the storage medium, and
calculates communication time for transferring the data from a data center to a reference source area based on a reading speed of the storage medium and a communication speed of a transmission line between the data center in which the storage medium is installed and the reference source area.
(Item 4)
A hierarchical storage management method used in a hierarchical storage management system including: a hierarchical storage that is provided in an individual data center and has at least one storage medium; and a hierarchical storage control apparatus that manages at least one hierarchical storage,
wherein the hierarchical storage control apparatus performs processing for obtaining, for individual data managed by the hierarchical storage control apparatus, a storage medium in a data center that satisfies an operation policy by calculating power consumption needed for storing the data, a cost needed for storing the data, and communication time for transferring the data from a data center to a reference source area and by comparing the calculated power consumption, cost, and communication time with the operation policy set for the data.
(Item 5)
A hierarchical storage control apparatus used in a hierarchical storage management system including: a hierarchical storage that is provided in an individual data center and has at least one storage medium; and a hierarchical storage control apparatus that manages at least one hierarchical storage,
wherein the hierarchical storage control apparatus includes a calculation unit that performs processing for obtaining, for individual data managed by the hierarchical storage control apparatus, a storage medium in a data center that satisfies an operation policy by calculating power consumption needed for storing the data, a cost needed for storing the data, and communication time for transferring the data from a data center to a reference source area and by comparing the calculated power consumption, cost, and communication time with the operation policy set for the data.
(Item 6)
A program that causes a computer to function as the calculation unit in the hierarchical storage control apparatus according to Item 5.
While the present embodiment has thus been described, the present invention is not limited to the above specific embodiment, and various variations and modifications may be made without departing from the scope of the present invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2019/030082 | 7/31/2019 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2021/019746 | 2/4/2021 | WO | A |
Entry |
---|
[No Author Listed] [online], “Part 1 Mechanism of CDN (What kind of technology can CDN do?),” Cash shop blog CDN / WEB high-speed blog, May 18, 2015, retrieved from URL <https://blog.redbox.ne.jp/what-is-cdn.html>, 29 pages (with English Translation). |
Katsurashima, “Systematic understanding of storage virtualization (4): Understanding automatic storage tiering (1/3),” ITmedia Inc., Jun. 27, 2011, retrieved from URL <https://www.atmarkit.co.jp/ait/articles/1106/27/news109.html>, 7 pages (with English Translation). |
Miki et al., “Basic knowledge of storage in the “offensive IT” era, 1st What is Software Defined Storage?” ITmedia Inc., Sep. 29, 2014, retrieved from URL <https://atmarkit.itmedia.co.jp/ait/articles/1409/29/news130.html>, 9 pages (with English Translation). |
Number | Date | Country | |
---|---|---|---|
20220222220 A1 | Jul 2022 | US |