The present invention relates to a storage system that adopts a technique of controlling the load distribution thereof by dividing the logical volume into logical units and migrating the logical units depending on the amount of load thereof.
Storage systems are equipped with a snapshot function and other functions for enhancing the convenience of the system. By using the snapshot function, it becomes possible to create a snapshot which is a still image of the data in the storage system in operation taken at some point of time, and to maintain the same. Therefore, if the data of the storage system in operation is destroyed, the data at the point of time of creation of the snapshot can be restored.
Further, the snapshot volume created via the snapshot function is a logical copy of the original volume, so the snapshot volume consumes only the capacity corresponding to differential data from the original volume. Therefore, the snapshot function realizes an efficient backup of the storage system.
Recently, along with the advancement of server virtualization techniques and desktop virtualization techniques, a new use of the snapshot function for providing a snapshot volume created via the storage system to the host computer or virtual machines (VM) has been considered. For example, if the data of the OS (Operating System) is stored in the original volume of the storage system, and a snapshot volume of the OS is created, it becomes possible to create a logical copy of the OS. By providing to the host a snapshot volume including the copied OS, it becomes possible to provide a large number of servers and desktops to the host while consuming only a small amount of capacity. On the other hand, an art for acquiring a writable snapshot in the file system is known (patent literature 1). A snapshot is a still image of data taken at a certain point of time.
The object of the present invention is to provide a storage system having a logical volume divided into logical units (such as 64-kilobyte logical page units), wherein the load information of respective logical pages is acquired and the data in the logical pages are migrated to other volumes based on the load information, so as to prevent the deterioration of performance.
Recently, a new method of use of the snapshot function, which is a function for acquiring a logical backup consuming only a small amount of capacity, has been proposed. Specifically, the new method of use relates to logically copying original data such as operating systems (OS) or application programs (AP) via the snapshot function, and to providing the copied OS and AP data to the virtual machines (VM).
The characteristic features of the above use enable a large number of virtual machines to be created, operated and managed at high speed. This attempt is effective since it consumes only a small amount of capacity, but if I/O load concentrates on the storage system, such as when a large number of VMs are started simultaneously, the performance of the host is deteriorated. This problem is caused by the mechanism of the snapshot function. In other words, the snapshot function only creates a logical backup: the original data is not copied to another volume, and specific data is shared among a large number of snapshot volumes. Consequently, since the large number of VMs share specific data in the original volume, when a large number of VMs issue I/Os simultaneously, the original volume receives concentrated load.
In order to solve the above-mentioned problem, the present invention provides a storage system having a logical volume divided into predetermined units, wherein the load information of each predetermined unit of volumes is acquired and the predetermined units are migrated to other volumes based on the load information.
That is, if snapshot virtual volumes (V-VOL) are provided as OS images of the virtual machines (VM) to the host, a large number of V-VOLs are mapped to a single logical volume. Therefore, if a single VM utilizes a single V-VOL and the VMs are started all at once, burdensome CoW accesses causing a high I/O load concentrate in the storage system, and the starting time of the VMs is prolonged. Therefore, the present system measures the I/O pattern (number of I/Os per unit time during read/write accesses) during starting of the VMs for each logical page unit prior to having the VMs started all at once, and based on the measurement results, performs the saving and copying of the pages to which write accesses occur to the snapshot pool prior to starting the VMs.
In further detail, the present invention provides a storage system coupled to a host computer, comprising a plurality of storage devices, and a controller for providing storage areas of the plurality of storage devices as logical volumes to the host computer, wherein data shared among a plurality of virtual machines operating in the host computer is stored in one of said logical volumes, wherein the controller specifies an area within said one logical volume receiving a write request during starting of the virtual machines, creates one or more virtual volumes and sets a reference destination of each virtual volume to said one logical volume, copies the data stored in the specified area to another area of the storage devices and changes the reference destination of the virtual volume referring to said area to the copy destination, maps each of the one or more virtual volumes to one of the plurality of virtual machines, and starts the plurality of virtual machines, wherein data of a write request to shared data having been copied is written into the copy destination that the virtual volume mapped to the virtual machine refers to.
The present invention reduces the number of CoW accesses, which cause a heavy access load on the system, and achieves load dispersion via a preliminary saving process that, based on the load information, saves and copies the data of storage areas on which load concentrates to a snapshot pool prior to starting the VMs, according to which the VM starting time is shortened and the pool capacity can be used effectively.
Now, one example of the preferred embodiments of the present invention will be described with reference to the drawings. In the present embodiments, the portions having the same structural units and denoted by the same reference numbers basically perform the same operations, so the detailed descriptions thereof are omitted.
In the following description, the information according to the present invention is described by using the term “information”, but the information can also be expressed by other expressions and data structures, such as “table”, “list”, “DB (database)” and “queue”. Upon describing the contents of the respective information, expressions such as “identification information”, “identifier”, “name” and “ID” can be used, wherein these expressions are replaceable.
In the following description, the term “program” is sometimes used as the subject of a sentence. A “program” is executed by a processor to perform a determined process using a memory and a communication port (communication control unit), so that the term “processor” can also be used as the subject in the description. Further, the processes disclosed using a program as the subject can also be performed as processes executed by a computer or an information processing apparatus such as a management server. A portion or all of a program can be realized via dedicated hardware, or can be formed into a module. Various programs can be installed in respective computers via a program distribution server or storage media.
Now, the first embodiment of the present invention will be described with reference to the drawings.
Physically, the cache memory 105 can be the same memory as the main memory 104. The main memory 104 includes a control program and various management information. Although not shown, the control program is software that interprets an I/O (Input/Output) request command issued by the host computer 10 and controls the internal processing of the storage system 100, such as the reading and writing of data. The control program includes functions for enhancing the convenience of the storage system 100 (including snapshots and dynamic provisioning). The management information will be described in detail later.
The host computer 10 recognizes the storage area assigned from the storage system 100 as a single storage device (volume). Typically, the volume is a single logical volume 111, but the volume can be composed of a plurality of logical volumes 111, or can be a thin provisioning volume as described in detail later. Although not shown, the logical volume 111 can be composed of a large number of storage media. Various kinds of storage media can exist in a mixture, such as HDDs (Hard Disk Drives) and SSDs (Solid State Drives). The storage system 100 can be equipped with a plurality of RAID groups in which storage media are formed into groups via RAID arrangement. By defining a plurality of logical volumes 111 via a single RAID group, the storage system 100 can use various logical volumes 111 with respect to the host computer 10.
Normally, logical volumes 111 have a redundant structure formed by arranging HDDs and other nonvolatile storage media in a RAID (Redundant Array of Independent Disks) arrangement, but the present invention is not restricted to such an arrangement, and other arrangements can be adopted as long as data can be stored thereto. The logical volumes 111 can store various management information other than the user data that the storage system 100 stores. In the present invention, the logical volume is also simply called an LU (Logical Unit).
The main memory 104 stores various management information mentioned later. The storage system 100 also has a load monitoring function for managing the statuses of load of the host interface port 102, the processor 103, the cache memory 105 and the logical volume 111 included in its own system.
The P-VOL 201 is a source volume for acquiring a snapshot. The P-VOL stores the original data. Normally, the P-VOL is the logical volume 111. The V-VOL 202 is a snapshot volume created from the P-VOL 201.
The V-VOL 202 is a virtual volume that the storage system 100 has. The mechanism of the V-VOL 202 will now be briefly described. The V-VOL 202 only stores management information such as pointers, and the V-VOL 202 itself does not have a storage area. Pointers corresponding to each small area of the storage area of the P-VOL 201 divided into predetermined units, such as 64 KB units, are provided, and each pointer points to a storage area of either the P-VOL 201 or the snapshot pool 205. In the state immediately after creating the V-VOL 202, the user data is stored in the P-VOL 201 and all the pointers of the V-VOL 202 point to the P-VOL 201. In other words, the V-VOL 202 shares the user data with the P-VOL 201. As for a storage area of the P-VOL 201 to which an update request has been issued from the host computer 10 or the like, the data in the small areas including the range of the storage area to which the update request has been issued is saved in the snapshot pool 205, and the pointers of the V-VOL 202 corresponding to that range point to the area in which the data is saved in the snapshot pool 205. This operation enables the V-VOL 202 to logically retain the data of the P-VOL 201. In the present invention, the P-VOL 201 and the V-VOL 202 can be mounted in a host, and the host can perform reading or writing regardless of whether the mounted volume is the P-VOL 201 or the V-VOL 202, but it is also possible to restrict the reading/writing operations according to usage. The host can recognize the V-VOL 202 as a logical volume 111.
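By way of illustration, the following minimal sketch models the pointer mechanism described above; the class, the constant and the helper objects (pvol, pool and their methods) are assumptions of this sketch, not the actual implementation.

```python
PAGE_SIZE = 64 * 1024  # one small area (page), e.g. 64 KB

class VVol:
    def __init__(self, num_pages):
        # One pointer per page; None means "still shared with the P-VOL".
        self.pointers = [None] * num_pages

    def read(self, page_no, pvol, pool):
        ptr = self.pointers[page_no]
        if ptr is None:
            return pvol.read_page(page_no)   # no difference yet: read the P-VOL
        return pool.read_page(ptr)           # difference saved: read the pool

def on_pvol_update(vvols, page_no, pvol, pool):
    # Before the P-VOL page is overwritten, save its current contents to
    # the snapshot pool and repoint every still-sharing V-VOL at the copy.
    saved_at = pool.allocate_and_write(pvol.read_page(page_no))
    for v in vvols:
        if v.pointers[page_no] is None:
            v.pointers[page_no] = saved_at
```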
The snapshot pool 205 is a pool area storing the differential data generated between the P-VOL 201 and the V-VOL 202. The snapshot pool 205 can be a single logical volume 111 or can be formed of a plurality of logical volumes 111 being integrated. The P-VOL 201 or the snapshot pool 205 can be a so-called thin provisioning volume, wherein virtual capacities are provided to the host, and when an actual write request occurs, real storage capacities are dynamically allocated to the destination area of the write request.
Each V-VOL 202 is mapped to a single VM 12. The corresponding relationship between V-VOL 202 and VM 12 can be managed not only via the storage system 100 but also via the management computer 11 or the host computer 10. The VM 12 having the V-VOL 202 mapped thereto can recognize the OS data of the V-VOL 202 mapped thereto and is capable of starting the OS.
Upon starting the OS, a host write request is issued from the VM 12 to the OS data portion of the V-VOL 202; the details of the internal operation of the storage system 100 at that time will be described in detail later.
The PDEV #3082 shows the identification numbers of the storage media constituting the RAID group. For example, the entry “0.4-0.7” means that four storage media from the fourth position to the seventh position in casing number 0 storing the storage media constitute the RAID group. If the storage media constituting the RAID group are arranged astride a plurality of casings, they can be shown using a comma, such as in the entry in which the RG #3081 is “1”.
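The following sketch illustrates one possible reading of this notation; the function name and the (casing, position) tuple representation are hypothetical, and each dash-range is assumed to stay within one casing.

```python
def parse_pdev(pdev_str):
    """'0.4-0.7' -> [(0, 4), (0, 5), (0, 6), (0, 7)] as (casing, position)."""
    media = []
    for part in pdev_str.split(","):          # comma: group spans casings
        start, end = part.split("-")
        casing, s_pos = (int(x) for x in start.split("."))
        _, e_pos = (int(x) for x in end.split("."))
        media.extend((casing, pos) for pos in range(s_pos, e_pos + 1))
    return media

print(parse_pdev("0.4-0.7"))            # four media in casing 0
print(parse_pdev("0.4-0.7,1.0-1.3"))    # a group spanning casings 0 and 1
```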
The RAID type 3083 refers to the type of the RAID constituting the RAID group.
The RG #3081 is an identification number showing the RAID group to which the LU belongs, and can be the same value as the RG #3081 of the RG information 308. One LU is defined over at least one RG. The capacity (GB) 3012 shows the capacity of the LU in GB units.
The port #3013 is an identification number showing the host interface port 102 to which the LU is mapped. If the LU is not mapped to the host interface port 102, “NULL” can be entered to the port #3013.
Although not shown, if the logical volume is a thin provisioning volume, mapping tables should be prepared to show whether allocation has been performed for each allocation unit for allocating to the logical volume. Further, a separate mapping table of RAID groups and allocation units should be prepared.
The P-VOL LU #3026 shows the LU # of the P-VOL 201 belonging to the pair. The P-VOL LU #3026 can be the same value as the LU #3011 of the LU information 301. The V-VOL #3022 is a number for identifying the V-VOL 202 belonging to the pair. The V-VOL 202 is not a logical volume 111 within the storage system 100. However, in order to enable the host computer to recognize the V-VOL, the storage system 100 must assign a volume number to the V-VOL 202. Therefore, the storage system 100 assigns to each V-VOL 202 a respective number for uniquely identifying the V-VOL as the V-VOL #3022.
The pair status 3023 shows the status of the pair. According to the pair statuses, “PAIRED” indicates a state in which the contents of the P-VOL 201 and V-VOL 202 mutually correspond, “SPLIT” indicates a state in which the V-VOL 202 stores the status of P-VOL 201 at some point of time, and “FAILURE” indicates a state in which a pair cannot be created due to some failure or the like.
If the pair status 3023 is “SPLIT”, it means that differential data may have been generated between the P-VOL 201 and the V-VOL 202. In order for the pair status 3023 to transition from “PAIRED” to “SPLIT”, it is preferable for the administrator to send a command for transitioning to the “SPLIT” status to the storage system 100 via the management computer 11. However, if the storage system 100 has a scheduling function, it is possible for the storage system 100 to set the state automatically to “SPLIT” at a certain time.
Further, in order to do so, the storage system 100 must create a V-VOL 202 in advance and create a pair with the P-VOL 201.
Depending on the method of the snapshot function, it is possible to omit the pair status 3023. For example, if the method only considers whether a snapshot has been taken or not, there will be no pair status, and the V-VOL 202 is simply either created or not created. At this time, the created V-VOL 202 corresponds to the “SPLIT” status according to the present embodiment, and the V-VOL retains the status of P-VOL 201 at a point of time when the snapshot has been taken.
The snapshot pool #3024 is an identification number for uniquely identifying the snapshot pool 205 storing the differential data when differential data occurs in the pair, and a unique number must be assigned to each snapshot pool 205. The pair split time 3025 shows the time at which the pair status 3023 of the pair transitions from “PAIRED” to “SPLIT”. This information is necessary for managing the order in which the pairs were split. If the pair status 3023 is either “PAIRED” or “FAILURE”, the V-VOL 202 does not retain the status of the P-VOL 201 at some point of time, so that the pair split time 3025 can store a value such as “NULL”.
The page #3032 shows the serial number per storage area dividing the P-VOL 201 into predetermined units. Predetermined units refer to the capacity unit of differential data managed via the snapshot function, which can be sizes such as 64 KB or 256 KB. These predetermined units are called pages.
The differential flag 3033 indicates whether or not a difference has occurred between the relevant page of the P-VOL 201 and the V-VOL 202 constituting a pair therewith. If a difference has occurred, “1” is entered, and if there is no difference, “0” is entered thereto. If a plurality of V-VOLs 202 are created from a single P-VOL 201, the differential flag 3033 is set to “1” only when differences have occurred with respect to all the V-VOLs 202.
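This rule can be illustrated with a short sketch; the vvol_flags mapping (V-VOL # to per-page flag list) is an assumption of this sketch.

```python
# The P-VOL-side differential flag 3033 for a page turns to 1 only when
# every V-VOL 202 created from the P-VOL has a difference for that page.
def pvol_differential_flag(page_no, vvol_flags):
    return 1 if all(flags[page_no] == 1 for flags in vvol_flags.values()) else 0

# e.g. page 0 differs only for V-VOL 1 -> the P-VOL flag stays 0
assert pvol_differential_flag(0, {1: [1, 0], 2: [0, 0]}) == 0
assert pvol_differential_flag(0, {1: [1, 0], 2: [1, 0]}) == 1
```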
The V-VOL #3022 is an identification number for uniquely specifying the V-VOL 202 equipped to the storage system 100, and can be the same value as the V-VOL #3022 of the pair information 302. The page #3032 of the V-VOL differential information 304 can be the same value as the page #3032 of the P-VOL differential information 303.
The differential flag 3041 differs from the differential flag 3033 of the P-VOL differential information 303 in the trigger for turning the flag ON. The differential flag 3033 of the P-VOL differential information 303 is turned ON (“1”) when a difference occurs with respect to all the V-VOLs 202 created from the P-VOL 201 upon saving the differential data in a host write operation to the P-VOL 201. On the other hand, the differential flag 3041 of the V-VOL differential information 304 is turned ON (“1”) when differential data is saved during a host write operation to either the P-VOL or the V-VOL.
The shared V-VOL #3042 shows the V-VOL #3022 of the other V-VOLs 202 when the differential data of the relevant page of the relevant V-VOL 202 is shared with other V-VOLs 202. Now, the sharing of differential data will be briefly described. Consider a case in which two V-VOLs 202 are created from a single P-VOL 201 and two pairs are created, and then the two pairs are simultaneously set to the “SPLIT” status.
At this time, if a host write request is issued to a certain page of the P-VOL 201, the two V-VOLs 202 retain a still image of the P-VOL 201 at the same point of time, so that the differential data occurs simultaneously for two V-VOLs 202.
However, it is wasteful to retain multiple identical copies of the same differential data in an overlapped manner. Therefore, if a plurality of V-VOLs 202 retain a still image of the same page at the same point of time, the differential data saved at the time of a host write to the P-VOL 201 is shared among the plurality of V-VOLs 202. Thereby, the wasteful duplication of differential data is eliminated, and capacity can be saved. This is why the sharing of differential data becomes necessary. Sharing is realized by storing the information of the V-VOLs #3022 sharing the data in the shared V-VOL #3042.
Further, if the differential data is to be shared among a plurality of V-VOLs 202, the respective V-VOL #3022 should be entered. If there are a large number of V-VOLs 202 sharing the differential data, in order to cut down the amount of management information, it is possible to use a bitmap in which a single V-VOL 202 is represented via a single bit, as in the sketch below. If there are no other V-VOLs 202 sharing the differential data, “NULL” is entered thereto.
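The bitmap alternative might look like the following sketch; the function names are illustrative.

```python
# One bit per V-VOL # instead of an explicit list of sharing V-VOL numbers.
def to_bitmap(sharing_vvols):
    bitmap = 0
    for v in sharing_vvols:
        bitmap |= 1 << v
    return bitmap

def from_bitmap(bitmap):
    return [v for v in range(bitmap.bit_length()) if (bitmap >> v) & 1]

assert from_bitmap(to_bitmap([0, 2, 5])) == [0, 2, 5]
```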
The reference destination address 3043 indicates the storage destination address of the data that the page of the V-VOL 202 refers to. For example, if there is no difference generated in a page and the page is identical to the page of the P-VOL 201, the processor 103 or the like of the storage system 100 can enter “NULL” in the reference destination address 3043 and the relevant page of the P-VOL 201 can be referred to.
On the other hand, if a difference has occurred to the page, the relevant page of the relevant V-VOL 202 must refer to the differential data, so that the processor 103 enters address information uniquely identifying the save destination of the differential data in the reference destination address 3043. The address information can be, for example, a combination of the identification number of the snapshot pool 205 and the serial number of the page disposed in the snapshot pool 205.
The respective queue tables are tables composed of an RG #3081 and a pointer 3121, wherein the RG #3081 stores an identification number of the RAID group constituting the snapshot pool 205, which can be the same information as the RG #3081 of the RAID group information 308.
A pointer 3121 has the page queues 3050 belonging to the relevant RAID group connected thereto. A page queue 3050 is information indicating a page for storing differential data in the snapshot pool 205, and a plurality of queues are provided for each snapshot pool 205. The number of page queues 3050 is determined based on the capacity of the snapshot pool 205. For example, if differential data is stored in pages of 64 KB units in the snapshot pool 205 having a capacity of 10 GB, the number of page queues 3050 will be 10 GB/64 KB = 163840. At this time, the pool free space information 305 has 163840 page queues 3050.
Further, page queues 3050 are allocated in accordance with the capacity of each of the RAID groups constituting the snapshot pool 205. For example, it is assumed that the snapshot pool 205 having a capacity of 10 GB is composed of three RAID groups, and the capacities of the RAID groups are 5 GB, 3 GB and 2 GB. In that case, the numbers of page queues 3050 belonging to the respective RAID groups are 81920, 49152 and 32768, respectively.
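These counts can be verified with a few lines of arithmetic; note that GB is treated here as a binary unit (2^30 bytes), since that assumption reproduces the figures in the text.

```python
GIB, KIB = 2**30, 2**10
PAGE = 64 * KIB                              # 64 KB differential-data unit

print((10 * GIB) // PAGE)                    # 163840 page queues in the pool
for rg_cap in (5 * GIB, 3 * GIB, 2 * GIB):   # per-RAID-group allocation
    print(rg_cap // PAGE)                    # 81920, 49152, 32768
```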
Thus, by dividing and managing the page queues 3050 per RAID group as described above, it becomes possible to perform control so as to store differential data in an arbitrary RAID group. Further, if differential data is stored in the page indicated by a page queue 3050, the page queue is already used, so it is connected to the entry of the relevant RG #3081 of the pool used queue table 313. On the other hand, if no differential data is stored therein, the queue is a free queue, so it is connected to the entry of the relevant RG #3081 of the pool free queue table 312. That is, a page queue 3050 is connected to either the pool free queue table 312 or the pool used queue table 313. The pool free queue table 312 is used to acquire an appropriate save destination for saving the differential data. The details of the page queue 3050 will be described below.
The queue number 3051 is a serial number for uniquely identifying the page queue 3050 in the storage system 100. The belonging pool #3052 is an identification number for uniquely identifying the snapshot pool 205 to which the relevant page queue 3050 belongs. This number can be the serial number of the snapshot pool 205 in the storage system 100.
The belonging page #3053 is a serial number of the capacity unit of the differential data (such as 64 KB or 256 KB) indicated by the relevant page queue 3050 in the snapshot pool 205 to which the page queue 3050 belongs. For example, if the storage system 100 has a 10 GB snapshot pool 205 and the capacity unit of the differential data is 64 KB, the belonging page #3053 takes values from zero to 163839. A plurality of page queues 3050 belonging to the same snapshot pool 205 cannot have the same belonging page #3053.
The RG #3081 can be the same value as the RG # of the pool free queue table 312 or the RG # of the pool used queue table 313. The RG #3081 is information for checking whether the connection between the page queue and the pool free queue table 312 or the pool used queue table 313 is performed correctly. The post-save write flag 3054 is flag information indicating whether or not a host write request has been issued with respect to the V-VOL 202 referring to the relevant page. The post-save write flag 3054 is turned ON (“1”) when a host write occurs to the V-VOL 202 during the preliminary saving process described later.
The reference V-VOL number 3055 is counter information showing the number of V-VOLs 202 sharing the relevant page queue 3050. Upon saving the relevant page when a host write occurs to the P-VOL 201, a value of 1 or greater is stored in the reference V-VOL number 3055 according to the number of V-VOLs 202 sharing the relevant page. The reference V-VOL number 3055 is reduced by triggers such as the cancelling of pairs or the deleting of V-VOLs 202. The Next pointer 3056 and the Prev pointer 3057 are pointer information for realizing a queue structure by connecting page queues 3050 to each other or by connecting a page queue 3050 to the pool free queue table 312 or the pool used queue table 313.
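A possible in-memory layout of a page queue 3050 is sketched below; the field names mirror the reference numbers in the text, but the actual format is not specified there.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PageQueue:
    queue_number: int                   # 3051: unique serial number in the system
    belonging_pool: int                 # 3052: snapshot pool the queue belongs to
    belonging_page: int                 # 3053: page serial number inside that pool
    rg: int                             # 3081: RAID group, used as a consistency check
    post_save_write_flag: int = 0       # 3054: host write to the V-VOL after saving
    reference_vvol_number: int = 0      # 3055: count of V-VOLs sharing this page
    next: Optional["PageQueue"] = None  # 3056: Next pointer of the queue structure
    prev: Optional["PageQueue"] = None  # 3057: Prev pointer of the queue structure
```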
The host write flag 3061 is flag information that is turned ON (“1”) when even a single write request has been issued from the host computer 10 to the relevant page of the P-VOL 201. The IOPS 3062 is the number of host I/Os received per second by the relevant page of the P-VOL 201. However, the IOPS 3062 can use other values as long as the amount of load per page is expressed. The use of the page performance information 306 is started via a specific trigger, and the information is updated at specific periodic cycles. The trigger for starting use and the periodic update cycle will be described in detail later.
The snapshot pool #3024 can be an identification number for uniquely identifying the snapshot pool in the storage system 100, which can be the same value as the snapshot pool #3024 of the pair information 302.
The total capacity (GB) 3071 shows the overall capacity of the relevant snapshot pool 205. In the present example, the capacity is expressed by entering a numerical value of GB units, but expressions other than using GB units are possible. The used capacity (GB) 3072 shows the capacity being used in the relevant snapshot pool 205. The capacity is shown in GB units according to the present example, but expressions other than GB units, such as TB units or percentage, are also possible.
The storage system 100 receives a write request to the P-VOL from the host computer 10 (step 1001). Next, the processor 103 refers to the pair information 302, and determines whether the pair status 3023 of the relevant P-VOL 201 is “SPLIT” or not (step 1002). If the result of the determination is “No”, that is, if the pair status is “PAIRED”, the procedure advances to step 1005. If the result of the determination in step 1002 is “Yes”, that is, if the pair status is “SPLIT”, the processor 103 determines whether the value of the differential flag 3033 of the P-VOL differential information 303 is “1” or not (step 1003). If the result of the determination is “Yes”, that is, if the differential flag 3033 is “1”, the procedure advances to step 1005.
If the result of determination in step 1003 is “No”, that is, if the differential flag 3033 is “0”, the procedure advances to a save destination search process shown in step 1004. The details of the save destination search process will be described below.
Next, the details of the save destination search process will be described.
Next, the processor 103 refers to the previously used RG #3001 of the relevant snapshot pool 205 in the RG selection table 300, and determines the RG # to be used for saving the current differential data (step 1102). According to the present embodiment, the RG # is determined in a round-robin fashion. That is, if there are multiple RAID groups constituting the relevant snapshot pool 205, each of the multiple RAID groups is used sequentially in order as the destination for saving differential data. Thus, it becomes possible to prevent differential data from concentrating in a specific RAID group.
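The round-robin selection of step 1102 might be sketched as follows; the function and list names are assumptions of this sketch.

```python
# The RAID group after the previously used RG # becomes the next save
# destination; the list wraps around at the terminal end.
def next_rg(previously_used_rg, rgs_in_pool):
    i = rgs_in_pool.index(previously_used_rg)
    return rgs_in_pool[(i + 1) % len(rgs_in_pool)]

assert next_rg(1, [0, 1, 2]) == 2
assert next_rg(2, [0, 1, 2]) == 0   # wraps around to the leading RG
```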
Next, the processor 103 refers to the pool free queue table 312. At this time, the processor 103 searches the queue of the entry of the RG # determined in step 1102 (step 1103). Thereafter, the processor 103 determines whether the entry searched in step 1103 has a page queue 3050 connected thereto or not (step 1104). If as a result of determination in step 1104 a page queue 3050 is connected to the entry of the RG # (“Yes” in step 1104), the processor 103 determines the page queue 3050 as the destination for saving the differential data (step 1108).
If as a result of determination in step 1104 a page queue 3050 is not connected to the entry of the RG #, the procedure advances to step 1105 (“No” in step 1104). In step 1105, the processor 103 determines whether the entries of all the RG # in the pool free queue table 312 have been searched or not. If as a result of the determination there is an entry of an RG # that has not been searched (“No” in step 1105), the procedure advances to step 1107. Step 1107 is a process for searching the entry of the RG # next to the entry of the RG # having been previously searched. If the entry of the RG # has reached the terminal end, it is possible to perform control to search the entry of the leading RG #. The processor 103 searches the entry of the next RG #, and returns to the determination process of step 1104 again.
On the other hand, if the result of determination of step 1105 is “Yes”, it means that the entries of all the RG # have been searched but there was no page queue 3050 connected to any of them. In other words, there is no free page queue in the pool free queue table 312, and the relevant snapshot pool 205 is in a state in which differential data cannot be saved thereto. Therefore, in step 1106 the processor 103 sends an error message to the administrator and ends the present process.
Lastly, the process subsequent to step 1108 will be described. In step 1108, the page queue 3050 to be used as the destination for saving the differential data is determined, and thereafter, the procedure advances to a differential saving process shown in step 1109. The details of the differential saving process will be described below.
Next, the details of the differential saving process will be described.
Next, the processor 103 changes the connection of the page queue 3050 determined in step 1108 of the save destination search process from the pool free queue table 312 to the pool used queue table 313 (step 1202).
Next, the processor 103 updates the RG selection table 300 (step 1203). Actually, the contents of the previously used RG #3001 of the RG selection table 300 should be updated to the RG # used for the present differential data saving process.
Next, the processor 103 updates the P-VOL differential information 303 (step 1204). Actually, if differential data has been generated between the relevant P-VOL 201 and all the V-VOLs 202 created from the relevant P-VOL 201, the differential flag 3033 of the P-VOL differential information 303 is set from “0” to “1”.
Thereafter, the processor 103 updates the V-VOL differential information 304 (step 1205). Actually, the differential flag 3041, the shared V-VOL #3042 and the reference destination address 3043 of the V-VOL differential information 304 are respectively updated. The shared V-VOL #3042 is updated when another V-VOL 202 sharing the differential data of the relevant page exists. A belonging pool #3052 and a belonging page #3053 denoted by the page queue 3050 determined in step 1108 should be set as the reference destination address 3043. The differential flag 3041 is changed from “0” to “1” regarding the V-VOL 202 which is in a “SPLIT” state with the relevant P-VOL 201.
Next, the processor 103 updates the pool information 307 (step 1206). Here, the used capacity (GB) 3072 of the pool information 307 is updated. The used capacity of the snapshot pool 205 is increased by saving the differential data, so that the used capacity should be set by calculating the increased capacity. The differential saving process is ended by the above-described steps. The above-described process is a so-called CoW (Copy-on-Write) process for copying the original data to the snapshot pool during a host write process.
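Summarizing steps 1202 to 1206, a hedged sketch of the differential saving flow is shown below; the `tables` dictionary is an illustrative stand-in for the management information described above, not the actual data structures.

```python
PAGE_SIZE = 64 * 1024

def differential_saving(page_no, queue, old_data, tables):
    # Save the pre-update P-VOL data to the page denoted by the queue (CoW).
    tables["pool_pages"][(queue.belonging_pool, queue.belonging_page)] = old_data
    # Step 1202: reconnect the page queue from the free table to the used table.
    tables["free_queue"][queue.rg].remove(queue)
    tables["used_queue"][queue.rg].append(queue)
    # Step 1203: remember the RG used this time for the round-robin selection.
    tables["previously_used_rg"] = queue.rg
    # Step 1205: every V-VOL in "SPLIT" state now refers to the saved copy.
    for vvol in tables["split_vvols"]:
        tables["vvol_diff"][vvol][page_no] = 1
        tables["vvol_ref"][vvol][page_no] = (queue.belonging_pool,
                                             queue.belonging_page)
    # Step 1204: the P-VOL-side flag is set once all V-VOLs have a difference.
    if all(f[page_no] == 1 for f in tables["vvol_diff"].values()):
        tables["pvol_diff"][page_no] = 1
    # Step 1206: account for the increased pool usage.
    tables["pool_used_bytes"] += PAGE_SIZE
```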
Next, the host write process to the V-VOL 202 will be described.
Next, the processor 103 refers to the pair information 302, and determines whether the pair status 3023 of the relevant V-VOL 202 is “SPLIT” or not (step 1302). If the result of the determination is “NO”, that is, if the pair status is “PAIRED”, the procedure advances to step 1303. In step 1303, the processor 103 notifies an error message to the host computer 10 or the administrator, and ends the process. This is because the V-VOL 202 cannot be updated since the pair status thereof is “PAIRED”, that is, the V-VOL 202 is in a corresponding state with the P-VOL 201.
If the result of determination of step 1302 is “Yes”, that is, if the pair status is “SPLIT”, the processor 103 determines whether the value of the differential flag 3041 of the V-VOL differential information 304 is “1” or not (step 1304). If the result of the determination is “Yes”, that is, if the differential flag 3041 is “1”, the procedure advances to step 1305 since the differential data is already saved. In step 1305 the processor 103 writes the write data received from the host computer 10 to a page denoted by the reference destination address 3043 of the V-VOL differential information 304.
If the result of determination of step 1304 is “NO”, it means that the differential data is not yet saved, so that the procedure advances to the save destination search process shown in step 1306. If the save destination search process is completed, the procedure advances to step 1305, and the processor 103 ends the process.
As described, the host write operation to the V-VOL 202 is completed. The flow of the save destination search process in the host write process to the V-VOL 202 can be the same as that in the host write operation to the P-VOL 201. However, the updating process of the P-VOL differential information 303 during the differential data saving process differs: there is no need to update the P-VOL differential information 303. The above-mentioned process is also a CoW process, since the original data is copied to the snapshot pool during a host write process similar to the host write process to the P-VOL 201.
Next, the host read process of the V-VOL 202 will be described.
Next, the processor 103 determines whether the relevant differential flag 3041 of the V-VOL differential information 304 is “0” or not (step 1402). If the result of determination is “No”, that is, if the differential flag 3041 is “1”, the procedure advances to step 1403. In step 1403, the processor 103 refers to the relevant reference destination address 3043 of the V-VOL differential information 304, specifies the identification number and the page of the snapshot pool 205 in which the differential data is saved, reads the differential data in the specified page, and ends the process.
If the result of determination in step 1402 is “Yes”, that is, if the differential flag 3041 of the V-VOL differential information 304 is “0”, the processor 103 reads the page of the P-VOL 201 (step 1404) and ends the process. By the steps mentioned above, the host read process of the V-VOL 202 is ended.
Next, the problem that the present embodiment aims to solve will be described once again. A method for providing a snapshot volume (V-VOL 202) as an OS image disk of the VM 12 has been proposed as a new use of the snapshot function, which has conventionally been used for backup.
In the actual system, a V-VOL 202 is created from the P-VOL 201 storing original data such as the OS or application program (AP) using the snapshot function, and the V-VOL 202 is provided as a volume of the VM 12. This system is advantageous since a large number of VMs can be created, operated and managed at high speed, but if a large number of VMs 12 are started concurrently, there is a drawback in that reading and writing of the V-VOLs 202 occur frequently. Especially when writing data to the V-VOL 202, a large number of differential data saving processes occur. The process for saving differential data burdens the storage system 100, since the overhead of reading the original data from the P-VOL 201 and writing the same to the snapshot pool 205 must be borne in addition to the normal write process. Therefore, in order to solve this problem, embodiment 1 of the present invention performs a process to save the original data in advance prior to starting the VMs.
When the processor 103 starts the load monitoring process, the processor 103 measures the load of each page unit with respect to the P-VOL 201 included in the storage system 100. The items of measurement are the numbers of I/Os received as host read requests and host write requests, and if a page receives even a single host write request, the host write flag 3061 of the page performance information 306 is updated from “0” to “1”. Further, the processor 103 writes the number of I/Os received within a unit time to the IOPS 3062 of the page performance information 306, regardless of whether the type of I/O is a host read request or a host write request. The processor 103 performs the above-mentioned measurement and the update of the page performance information until the storage controller 101 receives a request to terminate the load monitoring process from the user.
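A minimal sketch of such per-page monitoring is shown below, assuming a simple measurement window; the class and method names are illustrative, not the actual implementation.

```python
import time
from collections import defaultdict

class PagePerformance:
    def __init__(self):
        self.host_write_flag = defaultdict(int)   # 3061: any write seen?
        self.iops = defaultdict(float)            # 3062: I/Os per unit time
        self._counts = defaultdict(int)
        self._window_start = time.monotonic()

    def record_io(self, page_no, is_write):
        if is_write:
            self.host_write_flag[page_no] = 1     # even a single write sets it
        self._counts[page_no] += 1                # reads and writes both counted

    def roll_window(self):
        elapsed = time.monotonic() - self._window_start
        for page_no, n in self._counts.items():
            self.iops[page_no] = n / elapsed      # per-page I/Os per second
        self._counts.clear()
        self._window_start = time.monotonic()
```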
Next, the user performs a test start process using the P-VOL 201 having stored the master data via the host computer 10 or the management computer 11 (step 1504). The test start is performed by simply starting the OS in a normal manner. Thereafter, the user ends the test start process (step 1505). Next, the user orders the storage controller 101 to end the load monitoring process via the management computer 11 (step 1506).
Thereafter, the user orders the storage controller 101 via the management computer 11 to create a V-VOL 202 from the P-VOL 201 using the snapshot function, and based thereon, the processor 103 creates a V-VOL 202 from the P-VOL 201. At this time, the user can designate the number of V-VOLs 202 created from the P-VOL 201, and if the number is not designated by the user, the storage controller can create a predetermined number of V-VOLs automatically (step 1507). Next, the processor 103 performs the preliminary saving process (step 1508). The preliminary saving process will be described in detail below.
Lastly, the user starts the VM 12 using the mapped V-VOL 202 (step 1510), and ends the process. Further, if the data stored in the P-VOL 201 is an OS data having installed a specific application program, the period for performing the test start process is set from the starting of the OS to the starting of the application program, and the speed of the process for starting the application program can be enhanced.
Next, the preliminary saving process will be described.
Next, the procedure advances to step 1604. In step 1604, the processor 103 updates the differential flag 3041 and the reference destination address 3043 of the V-VOL differential information 304. Actually, the processor 103 updates the differential flag 3041 of the relevant page portion referring to the differential data either saved or copied in step 1603 from “0” to “1”.
As for the reference destination address 3043, the processor 103 similarly writes the save destination and copy destination snapshot pool # (snapshot pool number) and the page # (page number) determined in step 1603. The procedure advances to step 1605, where it is determined whether step 1601 has been performed for all the pages of the relevant P-VOL 201. If the result of determination of step 1605 is “Yes”, the preliminary saving process is ended.
If the result of determination of step 1605 is “No”, the processor 103 refers to the page performance information 306, and advances to the next entry of the page #3032 (step 1606), where the procedure returns to step 1601. If the result of determination in step 1601 is “No”, the procedure advances to step 1607. In step 1607, the processor 103 refers to the IOPS (Input Output Per Second) 3062 of the page performance information 306, and determines whether the product of the value of the IOPS 3062 of the relevant page and the number of V-VOLs 202 created from the relevant P-VOL exceeds a predetermined IOPS or not. The present description refers to a case in which a host write request is not issued to the relevant page, so that during actual starting of the VMs, the CoW process obviously does not occur.
However, a page to which no host write request is issued is a page that a large number of V-VOLs 202 may continue referring to on the relevant P-VOL 201, and the large amount of concentrated I/O to the P-VOL 201 may become the bottleneck of performance. Therefore, even if the page does not have a host write request issued thereto, if the product of the number of V-VOLs 202 referring thereto and the IOPS that the respective V-VOLs 202 receive exceeds a predetermined value, or simply if the IOPS that the relevant page receives exceeds a predetermined value, the processor 103 saves the relevant page in the snapshot pool 205, and sets the relevant page of the relevant V-VOL 202 to refer to the snapshot pool 205. Thus, even a page that has no write request issued thereto will, if heavily loaded, have its load dispersed within the snapshot pool 205, so that the concentration of load on the P-VOL 201 can be prevented.
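The decision of step 1607 for pages without host writes reduces to a simple predicate, sketched below; THRESHOLD_IOPS is an assumed tuning parameter, not a value given in the text.

```python
THRESHOLD_IOPS = 1000   # assumed threshold; the text only says "predetermined"

def should_presave_read_page(page_iops, num_vvols, threshold=THRESHOLD_IOPS):
    # Each of the num_vvols V-VOLs directs its reads at the same shared
    # P-VOL page, so the projected load on the page is the product.
    return page_iops * num_vvols > threshold
```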
If the result of determination in step 1607 is “Yes”, the procedure advances to step 1608. In step 1608, the save destination search process is performed. The procedure of the save destination search process of step 1608 is the same as the save destination search process described earlier.
Next, the details of the copying process for preliminary saving (step 1603) will be described.
Thereafter, the processor 103 refers to the previously used RG #3001 of the RG selection table 300, and specifies the RG # selected in the previous differential data saving process or the copying process. Next, the processor 103 refers to the RG #3081 of the pool information 307. If the snapshot pool 205 being the target of the differential data saving process is composed of a plurality of RAID groups, the RAID group subsequent to the RAID group denoted by the previously used RG #3001 is determined as the copy destination RAID group of the current differential data (step 1702).
Thereafter, the processor 103 searches the entry denoted by the RAID group specified in step 1702 from the pool free queue table 312 (step 1703). Next, the processor 103 determines whether a page queue 3050 is connected to the entry searched in step 1703 (step 1704). If the result of determination is “Yes”, that is, if a page queue 3050 is connected to the entry, the processor 103 determines the connected page queue 3050 as the destination for copying the differential data (step 1708).
Next, the processor 103 copies the differential data to the snapshot pool # and the page # denoted by the page queue 3050 determined in step 1708 (step 1709). Then, the processor 103 updates the relevant reference destination address 3043 of the V-VOL differential information 304 to the belonging pool #3052 and the belonging page #3053 denoted by the page queue 3050 used in step 1709. Further, the used capacity (GB) 3072 of the pool information 307 is also updated (step 1710). In step 1710, the management information is updated so that the respective V-VOLs 202 created from the P-VOL 201 exclusively possess the copied differential data.
Next, the processor 103 determines whether the process for copying the differential data according to the above steps has been performed the same number of times as the number of V-VOLs 202 created from the P-VOL 201 (step 1711). If the result of the determination is “Yes”, the process is ended. If the result of determination is “No”, the procedure returns to step 1702. If the determination result of step 1704 is “No”, the procedure advances to step 1705. Steps 1705, 1706 and 1707 are the same as steps 1105, 1107 and 1106 of the save destination search process described earlier.
According to the respective steps mentioned above, the copying process for preliminary saving is realized. According to the present process, it is necessary to repeatedly perform the copying process for preliminary saving a number of times corresponding to the number of V-VOLs 202 created from the relevant P-VOL 201. However, it is possible for the administrator to enter the number of VMs 12 to be started into the storage system 100 via the management computer 11 or the like, and to set the number of times for performing the copying process for preliminary saving to the number of VMs 12 entered by the administrator. In that case, the number of VMs 12 can be entered via a VM setup screen 40.
The administrator is capable of entering the P-VOL number 401 and the scheduled number of VMs to be created 402 in the starting VM number setup table 400. Prior to creating the V-VOLs to be mapped to the VMs 12, the administrator enters a value in the scheduled number of VMs to be created 402 and presses the enter button 403. Thus, the number of VMs scheduled to be mapped to the V-VOLs created from the relevant P-VOL can be notified to the storage system 100. In this case, in step 1711, the processor 103 is merely required to determine whether the copying process of differential data has been performed a number of times equal to the number entered in the scheduled number of VMs to be created 402. This concludes the description of the preliminary saving process performed in page units.
Next, we will describe the process of deleting the copy data of the differential data created via the preliminary saving process. According to the prior art snapshot, the differential data generated between the P-VOL 201 and the V-VOL 202 is saved in the snapshot pool 205. In other words, when a host write request is issued to the P-VOL 201 or the V-VOL 202 and differential data occurs thereby, differential data must be saved. The differential data can be deleted from the snapshot pool 205 triggered by the deleting of the V-VOL 202 or the changing of the pair status to “PAIRED”.
According to the present invention, a page that may become differential data is saved or copied to the snapshot pool 205 in advance, prior to the issue of a host write request, so it is necessary to consider a process for deleting the data saved in the snapshot pool 205, including data that was saved in the snapshot pool 205 but is not actually used as differential data.
Next, the processor 103 determines whether the value of the reference V-VOL number 3055 is “0” or not with respect to the searched page queue 3050 (step 1802). If the result of the determination in step 1802 is “Yes” (“0”), the processor 103 reconnects the relevant page queue 3050 to the pool free queue table 312 and frees the area of the relevant page of the snapshot pool 205 (step 1803). One possible example in which the determination result in step 1802 is “Yes” is a case where the created V-VOL 202 is deleted and there are no more V-VOLs 202 referring to the relevant page.
Thereafter, the processor 103 updates the used capacity (GB) 3072 of the pool information 307 (step 1804). Next, the processor 103 determines whether all the page queues 3050 belonging to the relevant entry of the pool used queue table 313 have been processed or not (step 1805). If the result of the determination in step 1805 is “Yes”, the processor 103 determines whether all the entries of the pool used queue table 313 have been processed or not (step 1806).
If the result of determination in step 1806 is “Yes”, the process is ended. If the result of the determination in step 1806 is “No”, the processor 103 searches the next entry of the pool used queue table 313 (step 1807), and returns to step 1802. If the result of determination in step 1805 is “No”, the processor 103 searches the next page of the pool used queue table 313 (step 1812) and returns to step 1802.
If the result of determination in step 1802 is “No”, the processor 103 refers to a post-save write flag 3054 of the relevant page queue 3050 and determines whether a host write request has been issued after saving (step 1808). If the result of determination in step 1808 is “No”, the processor 103 updates the reference destination address 3043 of the V-VOL differential information 304 to “NULL” (step 1809). Here, the relevant page queue 3050 is saved but a host write request has not been received, so the data of the relevant page queue 3050 and the data in the page of the P-VOL 201 are the same. Therefore, the processor 103 changes the reference destination of the V-VOL 202 referring to the relevant page queue 3050 to the P-VOL 201.
Next, the processor 103 reconnects the relevant page queue 3050 from the pool used queue table 313 to the pool free queue table 312 (step 1810), and updates the used capacity (GB) 3072 of the pool information 307 (step 1811). Next, the procedure advances to step 1805. If the result of determination of step 1808 is “Yes”, the procedure advances to step 1812. The above-described steps realize the process for deleting saved pages.
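Pulling the branches together, a hedged sketch of the deleting process follows; the containers are plain dicts and lists standing in for the queue tables, which is purely an assumption of this sketch.

```python
PAGE_SIZE = 64 * 1024

def reclaim_saved_pages(used_queue, free_queue, vvol_ref, pool_info):
    for rg, queues in used_queue.items():
        for q in list(queues):
            if q.reference_vvol_number == 0:
                # Step 1803: no V-VOL refers to the page any more; free it.
                queues.remove(q)
                free_queue[rg].append(q)
                pool_info["used_bytes"] -= PAGE_SIZE          # step 1804
            elif q.post_save_write_flag == 0:
                # Steps 1809-1811: saved but never written to, so the copy
                # equals the P-VOL page; point the V-VOLs back at the P-VOL
                # ("NULL" reference) and free the saved copy.
                dest = (q.belonging_pool, q.belonging_page)
                for pages in vvol_ref.values():
                    for page_no, ref in list(pages.items()):
                        if ref == dest:
                            pages[page_no] = None
                queues.remove(q)
                free_queue[rg].append(q)
                pool_info["used_bytes"] -= PAGE_SIZE
```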
Incidentally, the deleting process described above can be triggered, for example, by the deletion of a V-VOL 202 or by the transition of the pair status to “PAIRED”, similar to the deletion of ordinary differential data.
Next, the host write process to the V-VOL 202 after performing the preliminary saving process will be described. The host read process and the host write process of the P-VOL 201 after the preliminary saving process are the same as in the case without the preliminary saving process, so detailed descriptions thereof are omitted. The host read process of the V-VOL 202 after the preliminary saving process is also the same as the case without the preliminary saving process, so detailed descriptions thereof are omitted.
Next, the processor 103 refers to the pair information 302, and determines the pair status of the relevant V-VOL (step 1902). If the result of the determination is “No”, that is, if the pair status is “PAIRED”, the procedure advances to step 1906. In step 1906, the processor 103 sends an error message to the host computer 10 or the administrator, and ends the process.
If the result of determination of step 1902 is “Yes”, that is, if the pair status is “SPLIT”, the processor 103 determines whether the value of the differential flag 3041 of the V-VOL differential information 304 is “1” or not (step 1903). If the result of the determination is “Yes” (“1”), it means that differential data is already saved, so the procedure advances to step 1904. In step 1904, the processor 103 performs the write process to the V-VOL 202 after the preliminary saving process; the details of the process will be described below.
Next, the details of the write process to the V-VOL after preliminary saving will be described.
If the result of determination in step 2001 is “Yes”, the procedure advances to an inter-pool CoW (Copy-on-Write) process (step 2002). The details of the inter-pool CoW (Copy-on-Write) process will be described below.
Subsequently, the details of the inter-pool CoW (Copy-on-Write) process will be described.
The first embodiment of the present invention has been described. The effects of embodiment 1 will now be described. Embodiment 1 makes it possible to enhance the speed of starting the OS or the application of the VM 12 mounting the V-VOL 202 by subjecting the P-VOL 201 storing the OS data, or the OS data and the application program data, to test starting, performance measurement and preliminary saving. Especially in the case where a host write request is issued during starting of the OS or the application program, a normal write operation creating only a small load can be performed instead of the burdensome CoW (Copy-on-Write) operation that had been indispensable according to the prior art system, and therefore, the present embodiment enables the load of the overall storage system to be reduced and the speed of starting the VM to be enhanced.
Now, the second embodiment of the present invention will be described.
Further, the RAID group information 309 adds the RG marginal performance (IOPS) 3091 and the RG load (%) 3092 to the RAID group information 308 described in embodiment 1. The RG marginal performance (IOPS) 3091 shows the marginal performance of the relevant RAID group in IOPS (I/Os per second), which can be calculated based on the types of storage media constituting the RAID group, the number of media therein and the RAID type.
For example, if the RAID group is composed of four HDDs each having a marginal performance of 300 IOPS and having a RAID5 arrangement, the marginal performance of the relevant RAID group becomes 1200 IOPS (300 IOPS×4). The RG load (%) refers to the total amount of load that the RAID group receives, shown as a percentage, which can be calculated by dividing the IOPS of the load that the relevant RAID group is currently receiving by the value of the RG marginal performance (IOPS) 3091 and expressing the result as a percentage. The storage system 100 can update the RG load (%) 3092 periodically, such as every minute.
Although not shown, the RG marginal performance (IOPS) 3091 can be shown via throughput (MB/sec) or can be shown per Read/Write types. The RG load (%) 3092 can also be shown per Read/Write types.
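As a worked example of these figures, the following sketch applies the simplified additive model stated above; the 600 IOPS sample load is an assumption for illustration.

```python
def rg_marginal_iops(per_medium_iops, num_media):
    # Simplified model from the text: sum of the media's marginal IOPS.
    return per_medium_iops * num_media

def rg_load_percent(current_iops, marginal_iops):
    return current_iops / marginal_iops * 100

marginal = rg_marginal_iops(300, 4)     # four 300-IOPS HDDs -> 1200 IOPS
print(rg_load_percent(600, marginal))   # 600 IOPS of current load -> 50.0 %
```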
On the other hand, embodiment 2 differs from embodiment 1 in that the status of load of the RAID groups is considered when determining the RG # to be used for saving differential data. In other words, the RAID group information 309 is referred to in step 2202 of the save destination search process, and the RAID group having the smallest RG load (%) 3092 is selected as the destination for saving the differential data.
As described, the load of RAID groups in the storage system 100 can be uniformized by selecting the RAID group for saving differential data and for performing the inter-pool CoW process of the differential data based on the status of load of the respective RAID groups and selecting the RAID group having the smallest load.
Now, the operation for further creating a snapshot virtual volume from a V-VOL according to the third embodiment of the present invention will be described.
According to the present embodiment, a snapshot structure in which a V-VOL 204 is further created from the V-VOL 203 is adopted, as described below.
According to embodiment 3, a pair composed of a V-VOL and a V-VOL (a pair of two V-VOLs) exists for creating the V-VOL 204 from the V-VOL 203. Therefore, VOL identification numbers for uniquely identifying the P-VOLs and V-VOLs in the storage system 100 are assigned to all the P-VOLs and V-VOLs.
The VOL identification number of a P-VOL or a V-VOL which is the source of snapshot creation is entered to the P-VOL VOL #3011. The VOL identification number of the V-VOL created from the snapshot creation source is entered to the V-VOL VOL #3022. As described, the pair of P-VOL and V-VOL and the pair of V-VOL and V-VOL are managed.
As described, similar to embodiments 1 and 2, the present embodiment reduces the load of the overall storage system and enhances the speed of starting the VM by performing a normal write operation having a small load, instead of the burdensome CoW (Copy-on-Write) operation that had been indispensable according to the prior art system, when a host write request is issued during starting of the OS or the application program.
The present invention can be applied to storage devices such as storage systems, information processing apparatus such as large-scale computers, servers and personal computers, and communication devices such as cellular phones and multifunctional portable terminals.