This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2021-4255, filed on Jan. 14, 2021, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a storage system, a storage control device, and a storage control method.
In a storage system including a plurality of storage control devices, for example, a storage control device in charge of input/output (I/O) processing is predetermined for each of a plurality of logical storage areas. Furthermore, in such a storage system, there are some cases where the storage control device in charge of I/O processing for a certain logical storage area is switched, and the I/O processing is taken over by a switching destination storage control device. For example, in a case where a processing load of the switching source storage control device becomes excessive, the storage control device in charge of I/O processing is switched to the storage control device having a lower processing load.
Examples of the related art include the following: Japanese Laid-open Patent Publication No. 2003-162377; and Japanese Laid-open Patent Publication No. 2015-169956.
According to an aspect of the embodiments, a storage system includes: a first storage control device; and a second storage control device, wherein, in a state of controlling input/output (I/O) processing for a logical storage area using a cache, when receiving a switching instruction configured to switch a device in charge that controls the I/O processing for the logical storage area from the first storage control device to the second storage control device, the first storage control device performs first switching processing of notifying the second storage control device of a management device number that indicates the first storage control device as a device that manages the cache, and executing response processing for the switching instruction to switch the device in charge, and when receiving a determination request as to whether data requested to be read from the logical storage area by a readout request hits the cache from the second storage control device after execution of the first switching processing, the first storage control device determines whether the data hits the cache, and when receiving the readout request after execution of the first switching processing, the second storage control device transmits the determination request to the first storage control device indicated by the notified management device number.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
The following procedure can be considered as a procedure for such switching processing. For example, when switching is instructed, dirty data is written back to a back-end storage device from a cache used in the I/O processing by the switching source storage control device. Then, when the write back of all the dirty data is completed, a response to the switching instruction is performed, and the I/O processing in the switching destination storage control device is started.
Furthermore, the following techniques have been proposed for switching a connection relationship between a cache memory and a storage module. For example, when connection is switched so as to connect a storage module to a cache memory different from a current cache memory, information stored in the pre-switching cache memory is moved to the post-switching cache memory.
Incidentally, when the response to the switching instruction is made only after write back of the dirty data in the cache is completed as described above, there is a problem that the time from receiving the switching instruction to the response becomes long. In particular, in the case of using a secondary cache for I/O processing, the capacity of the secondary cache is much larger than that of a primary cache, so write back of the dirty data in the secondary cache is likely to take a long time, and the response time to the switching instruction is lengthened by the time needed for the write back.
In one aspect, the embodiment is intended to provide a storage system, a storage control device, and a storage control method capable of shortening a response time after receiving a switching instruction of a device in charge of I/O processing.
Hereinafter, the embodiments will be described with reference to the drawings.
The storage control devices 10 and 20 control I/O processing for a logical storage area. As an example in
The storage control device 10 controls the I/O processing for the logical storage area 1 using a cache 11. The cache 11 is secured in a storage device mounted inside the storage control device 10 or in a storage device connected externally to the storage control device 10. In such a state, it is assumed that the storage control device 10 receives a switching instruction to switch the device in charge from the storage control device 10 to the storage control device 20 (step S1).
Then, the storage control device 10 executes the following switching processing including processing of steps S2 and S3. First, the storage control device 10 notifies the switching destination storage control device 20 of a management device number 22 indicating the storage control device 10 as a device for managing the cache 11 (step S2). The notified management device number 22 is stored in, for example, a storage device 21 included in the storage control device 20. When the storage control device 10 notifies the management device number 22, the storage control device 10 executes response processing to the switching instruction and switches the device in charge to the storage control device 20 (step S3).
As a result, control of the I/O processing for the logical storage area 1 by the switching destination storage control device 20 is started. In this state, it is assumed that the storage control device 20 receives a data readout request from the logical storage area 1 (step S4). Then, the storage control device 20 refers to the notified management device number 22 and recognizes that the management device of the cache 11 corresponding to the logical storage area 1 is the storage control device 10. Then, the storage control device 20 transmits a determination request as to whether the data (readout data) requested to be read by the readout request hits the cache 11 to the storage control device 10 indicated by the management device number 22 (step S5).
When the switching source storage control device 10 receives the determination request, the storage control device 10 determines whether the readout data hits the cache 11 (step S6). Here, for example, when the readout data exists in the cache 11 and a cache hit is determined, the storage control device 10 reads out the readout data from the cache 11 and transfers the readout data to the storage control device 20 (step S7). The storage control device 20 receives the transferred readout data, transmits the received readout data to a transmission source device of the readout request (not illustrated), and executes response processing for the readout request (step S8).
As described above, in the case of receiving the switching instruction of the device in charge, the switching source storage control device 10 responds to the switching instruction to switch the device in charge by simply notifying the switching destination storage control device 20 of the management device number 22 indicating the management device of the cache 11. As a result, the response time after receiving the switching instruction can be shortened as compared with the case of making a response after writing back all the dirty data stored in the cache 11 to a physical storage area that implements the logical storage area 1.
Furthermore, there is a possibility that dirty data remains in the cache 11 at the point of time when the device in charge has been switched. Therefore, it is necessary to enable access to the dirty data remaining in the cache 11 so as to avoid occurrence of data inconsistency when the switching destination storage control device 20 receives the readout request. In the above processing, the management device number 22 is notified to the storage control device 20 at the time of the switching processing. As a result, the switching destination storage control device 20 can request the determination as to whether the readout data hits the cache 11 on the basis of the management device number 22, and can acquire the readout data from the cache 11 in the case where the readout data hits the cache 11.
In this way, the switching source storage control device 10 notifies the management device number 22 instead of executing the write back of the cache 11 so that the storage control device 20 can access the dirty data in the cache 11 after switching, and then completes the switching processing. As a result, the response time to the switching instruction can be shortened while avoiding the data inconsistency due to the I/O processing after switching.
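The exchange in steps S1 to S8 can be illustrated with a minimal sketch. The class name, method names, and the dictionary used as a cache are hypothetical and are not taken from the embodiment; the sketch only models the notification of the management device number and the remote hit determination, and it omits the handling of a cache miss.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StorageControlDevice:
    """Hypothetical stand-in for the storage control devices 10 and 20."""
    device_number: int
    cache: Dict[int, bytes] = field(default_factory=dict)  # address -> cached data
    in_charge: bool = False
    # Number of the device that manages the cache (set in step S2).
    management_device_number: Optional[int] = None

    def handle_switching_instruction(self, destination: "StorageControlDevice") -> str:
        # Step S2: notify the destination of the device that manages the cache.
        destination.management_device_number = self.device_number
        # Step S3: switch the device in charge and respond.
        # Note that no dirty data is written back before the response.
        self.in_charge = False
        destination.in_charge = True
        return "switched"

    def determine_hit(self, address: int) -> Optional[bytes]:
        # Steps S6 and S7: return the data on a cache hit, None on a miss.
        return self.cache.get(address)

    def handle_readout_request(
        self, address: int, devices: Dict[int, "StorageControlDevice"]
    ) -> Optional[bytes]:
        # Steps S4 and S5: send the determination request to the device
        # indicated by the notified management device number.
        # Assumption: the switching processing has already been executed.
        manager = devices[self.management_device_number]
        # Step S8: the returned data is what would be sent to the requester.
        return manager.determine_hit(address)
```

In this sketch the response in step S3 does not depend on how much dirty data the cache holds, which is the property the embodiment relies on to shorten the response time.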
The CMs 100a to 100d are storage control devices that control I/O processing for logical volumes in response to requests from the host servers 400a and 400b. The logical volume to be controlled for I/O is implemented using a storage device mounted on a disk array.
In the example of
The disk arrays 200a and 200b are each equipped with a plurality of storage devices that implement the storage area of the logical volume. In the present embodiment, as an example, it is assumed that the disk arrays 200a and 200b are equipped with hard disk drives (HDDs) as such storage devices.
Furthermore, the CMs 100a to 100d perform I/O control for the logical volume, using a storage area by a storage device (flash memory) mounted on a flash module as a secondary cache. In the example of
The CMs 100a to 100d are connected to the host servers 400a and 400b via a network 511. The network 511 is a storage area network (SAN) using, for example, a fibre channel (FC), an Internet small computer system interface (iSCSI), or the like.
Furthermore, the CMs 100a to 100d can communicate with one another via a switch 512. The switch 512 is connected to the CMs 100a to 100d via, for example, a bus of a peripheral component interconnect express (PCI Express, hereinafter abbreviated as “PCIe”) and relays signals transmitted between CMs.
A management terminal 500 is a terminal device operated by an administrator to manage the CMs 100a to 100d and is connected to the CMs 100a to 100d via the network 511.
Note that the number of CMs included in the storage system is not limited to four as illustrated in
Furthermore, in the present embodiment, the logical volume is implemented by the storage device (here, HDD) mounted on the disk array. Furthermore, a primary cache and a secondary cache are used during the I/O control for the logical volume. Then, the primary cache is implemented by a random access memory (RAM) in the CM, and the secondary cache is implemented by the storage device (here, the flash memory) in the flash module.
Note that the storage device that implements the secondary cache only needs to be a nonvolatile storage device that has a higher access speed than the storage device that implements the logical volume and has a slower access speed than the storage device that implements the primary cache. For example, in the case where a solid state drive (SSD) is used as the storage device that implements the logical volume, a so-called storage class memory (SCM) such as magnetoresistive RAM (MRAM), ferroelectric RAM (FeRAM), phase change RAM (PRAM), resistive RAM (ReRAM) or the like may be used as the storage device that implements the secondary cache. Furthermore, the nonvolatile storage device that implements the secondary cache may be built in the CM.
The CM 100a is implemented as, for example, a computer as illustrated in
The processor 101 integrally controls the entire CM 100a. The processor 101 is, for example, a central processing unit (CPU), a micro processing unit (MPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), or a programmable logic device (PLD). Furthermore, the processor 101 may be a combination of two or more elements of a CPU, an MPU, a DSP, an ASIC, and a PLD.
The RAM 102 is implemented as, for example, a dynamic RAM (DRAM), and is used as a main storage device of the CM 100a. The RAM 102 temporarily stores at least a part of an operating system (OS) program or an application program to be executed by the processor 101. Furthermore, the RAM 102 stores various data needed for processing by the processor 101. Note that, as will be described below, a part of a storage area of the RAM 102 is used as the primary cache during the I/O control for the logical volume.
The SSD 103 is used as an auxiliary storage device of the CM 100a. The SSD 103 stores the OS program, the application program, and various data. Note that another type of nonvolatile storage device such as an HDD can be used as the auxiliary storage device.
The host interface 104 communicates with the host servers 400a and 400b and the management terminal 500 via the network 511.
The drive interface 105 is connected to the disk array 200a. As illustrated in
The flash interface 106 is connected to the flash module 300a. As illustrated in
The CM interface 107 communicates with the other CMs 100b to 100d via the switch 512.
Processing functions of the CM 100a can be implemented by the above-described hardware configuration. Note that, for example, the host servers 400a and 400b can also be implemented as a computer having the hardware configuration as illustrated in.
First, the area of the primary cache 111 is secured in the RAM 102. Furthermore, the area of the secondary cache 311 is secured in the flash module 300a. The CM 100a controls the I/O processing for the logical volume, using the primary cache 111 and the secondary cache 311.
Furthermore, cache management information 112 and CM in charge management information 113 are stored in the RAM 102. The cache management information 112 is information for managing the primary cache 111 and the secondary cache 311, and includes, for example, information indicating a correspondence relationship between an address on the logical volume and an address on the cache, information indicating an attribute of data on the logical volume, and the like. The CM in charge management information 113 is information indicating a correspondence relationship between the logical volume and the CM in charge. The “CM in charge” indicates a CM that controls the I/O processing for the logical volume.
Furthermore, the CM 100a also includes a host communication unit 121, a resource control unit 122, a cache control unit 123, a redundant array of inexpensive disks (RAID) control unit 124, and a switching control unit 125. Processing of the host communication unit 121, the resource control unit 122, the cache control unit 123, the RAID control unit 124, and the switching control unit 125 is implemented by, for example, the processor 101 included in the CM 100a executing a predetermined application program.
The host communication unit 121 executes communication processing with the host servers 400a and 400b and with the management terminal 500. For example, the host communication unit 121 receives an I/O request from the host server 400a or 400b, and transmits a response to the I/O request to the host server 400a or 400b.
The resource control unit 122 determines the CM in charge of the logical volume that is the target of the I/O request received by the host communication unit 121 with reference to the CM in charge management information 113. In the case where the CM in charge is its own CM (here, the CM 100a), the resource control unit 122 passes the I/O request to the cache control unit 123 in its own CM. Meanwhile, in the case where the CM in charge is another CM, the resource control unit 122 transfers the I/O request to that CM. Furthermore, when receiving the I/O request transferred from the resource control unit of another CM, the resource control unit 122 passes the I/O request to the cache control unit 123 in its own CM.
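The routing decision of the resource control unit can be sketched as follows. The function name, the dictionary standing in for the CM in charge management information, and the two callbacks are assumptions made for illustration only.

```python
from typing import Callable, Dict


def route_io_request(
    own_cm: str,
    volume: str,
    cm_in_charge: Dict[str, str],                 # logical volume -> CM in charge
    pass_to_cache_control: Callable[[str], None],
    transfer_to_cm: Callable[[str, str], None],
) -> str:
    """Hypothetical sketch of the routing performed for one I/O request."""
    target = cm_in_charge[volume]
    if target == own_cm:
        # The own CM is in charge: hand the request to its cache control unit.
        pass_to_cache_control(volume)
        return "local"
    # Another CM is in charge: transfer the request to that CM.
    transfer_to_cm(target, volume)
    return "transferred"
```

A request that arrives by transfer from another CM would be passed directly to the cache control unit of the receiving CM, so the sketch covers only the first hop.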
The cache control unit 123 executes the I/O processing in accordance with the I/O request, using the primary cache 111 and the secondary cache 311.
The RAID control unit 124 controls the I/O processing for the disk array 200a and the I/O processing for the flash module 300a, using RAID. For example, when receiving a request to write data in the logical volume to the disk array 200a from the cache control unit 123, the RAID control unit 124 writes the data such that the data is made redundant in the plurality of HDDs in the disk array 200a. Furthermore, when receiving a data write request to the secondary cache 311 from the cache control unit 123, the RAID control unit 124 writes the data such that the data is made redundant in a plurality of flash memories in the flash module 300a.
Note that a RAID level for such I/O control is arbitrarily set for each of the disk array 200a and the flash module 300a. Furthermore, these RAID levels may be individually set for each logical volume.
The switching control unit 125 controls the switching processing of the CM in charge.
In the example of
Note that both the cache areas CA1 and CA2 actually include each area of the primary cache and the secondary cache. Furthermore, both the logical volumes LV1 and LV2 are implemented using a plurality of HDDs included in the disk array 200a or the disk array 200b, and the data is redundantly stored in the plurality of HDDs by RAID.
Meanwhile, the host servers 400a and 400b can use a plurality of access paths when accessing a certain logical volume. As a result, even if one access path is blocked due to an abnormality or the like, the I/O processing with the logical volume can be continued via another access path.
In the example of
For example, in
For example, it is assumed that the CM 100a is requested to write data D1 to the logical volume. In this case, the cache control unit 123 of the CM 100a writes the data D1 to the primary cache 111. At the same time, to avoid data loss due to a malfunction of the CM 100a, the cache control unit 123 transfers the data D1 to a predetermined backup destination CM (here, the CM 100b). As a result, the data D1 is also written to the RAM 102 of the CM 100b, and the data D1 is duplicated. When these processes are completed, the cache control unit 123 returns a response to the host server as the write request source.
Furthermore, in the case where a free space of the primary cache 111 is not sufficient when writing data to the primary cache 111, the cache control unit 123 moves data having the earliest final access time among data in the primary cache 111 to the secondary cache 311. In the example of
Note that, cases where data is written to the secondary cache 311 include a case where data hits the secondary cache 311 for the write request from the host server in addition to the case where data is expelled from the primary cache 111 as described above.
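The expulsion of the data having the earliest final access time from the primary cache to the secondary cache can be sketched as below. The class name and the capacity counted in entries are hypothetical; an ordered dictionary is used here as one possible way to track access order, which the embodiment does not specify.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Hypothetical model: primary cache with expulsion to a secondary cache."""

    def __init__(self, primary_capacity: int) -> None:
        self.primary_capacity = primary_capacity
        self.primary = OrderedDict()  # least recently accessed entry comes first
        self.secondary = {}

    def touch(self, key) -> None:
        # Record an access so that the entry becomes the most recently accessed.
        self.primary.move_to_end(key)

    def write_primary(self, key, data) -> None:
        if key in self.primary:
            # Overwrite of existing data counts as an access.
            self.primary.move_to_end(key)
        elif len(self.primary) >= self.primary_capacity:
            # Not enough free space: expel the entry with the earliest
            # final access time to the secondary cache.
            old_key, old_data = self.primary.popitem(last=False)
            self.secondary[old_key] = old_data
        self.primary[key] = data
```

For instance, with a capacity of two entries, writing `a` and `b`, accessing `a`, and then writing `c` moves `b` to the secondary cache.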
By the way, the write of data to the primary cache 111 and the secondary cache 311 is managed using the cache management information 112 stored in the RAM 102. When data is written to the primary cache 111 or the secondary cache 311, management data related to the data is registered in the cache management information 112. This management data includes a logical volume number indicating the data write destination, a logical block address (LBA) on the logical volume, and a storage destination address in the cache area. In the case of writing data to the primary cache 111, a memory address on the RAM 102 is registered as the storage destination address, for example. In the case of writing data to the secondary cache 311, an address in the logical storage area (RAID volume) implemented by controlling a plurality of flash memories on the flash module 300a by RAID is registered as the storage destination address, for example.
Furthermore, when the management data is newly registered in the cache management information 112, the management data is transferred to the backup destination CM 100b and stored in the RAM 102 of the CM 100b. Furthermore, when the management data in the cache management information 112 is updated, the corresponding management data stored in the backup destination CM 100b is also updated. In this way, at least the management data corresponding to the dirty data on the cache is duplicated.
In the example of
Furthermore, when the data D2 moves from the primary cache 111 to the secondary cache 311, the storage destination address in the cache area, of the management data M2 corresponding to the data D2, is updated. At the same time, the updated management data M2 is transferred to the CM 100b, and the management data M2 stored in the RAM 102 of the CM 100b is updated with the updated management data M2. Thereby, the management data M2 is duplicated.
As in the example of this management data M2, the management data related to the secondary cache 311 is stored in the RAM in the CM, not in the flash module in which the area of the secondary cache 311 is secured. Thereby, the speed of read and write of the management data can be improved, and as a result, the speed of the I/O processing using the primary cache 111 and the secondary cache 311 can be increased.
A record for each cache page (for each cache page of the secondary cache 311 in the example of
Furthermore, in the cache management information 112, page management information 112-2 is registered for each cache page ID (that is, for each cache page). In the page management information 112-2, physical position information of the cache page and data attribute indicating an attribute of data stored in the cache page are registered. In
Here, the record number of the hash table 112-1 is a hash key based on data write destination information in the logical volume. For example, when write of data to the logical volume is requested, the cache control unit 123 calculates the hash key on the basis of the volume number of the logical volume and a first logical address of the write destination range in the logical volume. In the case where the same record number as the calculated hash key is not present in the hash table 112-1 (in the case of a cache miss), the cache control unit 123 registers a new record in the hash table 112-1 and registers the hash key as the record number. Furthermore, the cache control unit 123 acquires the cache page ID of a free cache page, registers the cache page ID in the record, and registers the data attribute indicating dirty data to the page management information 112-2 corresponding to the acquired cache page ID.
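The lookup and registration described above can be sketched as follows. The embodiment does not state how the hash key is calculated, so the packing of the volume number and the first logical address into one integer is purely an assumption, as are the class names, the page size, and the free page list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PAGE_SIZE = 4096  # hypothetical size of one cache page in bytes


@dataclass
class PageManagementInfo:
    physical_position: int   # hypothetical offset of the cache page
    data_attribute: str      # for example "dirty"


class CacheManagementInfo:
    """Hypothetical model of the hash table and the page management information."""

    def __init__(self, page_count: int) -> None:
        self.hash_table: Dict[int, int] = {}   # record number (hash key) -> cache page ID
        self.page_info: Dict[int, PageManagementInfo] = {}
        self.free_pages: List[int] = list(range(page_count))

    @staticmethod
    def hash_key(volume_number: int, first_lba: int) -> int:
        # Assumption: the first logical address fits in 48 bits, so this
        # packing never maps two different inputs to the same key.
        return (volume_number << 48) | first_lba

    def lookup(self, volume_number: int, first_lba: int) -> Optional[int]:
        # A missing record number means a cache miss.
        return self.hash_table.get(self.hash_key(volume_number, first_lba))

    def register_write(self, volume_number: int, first_lba: int) -> int:
        key = self.hash_key(volume_number, first_lba)
        page_id = self.hash_table.get(key)
        if page_id is None:
            # Cache miss: register a new record with a free cache page ID
            # and mark the page as holding dirty data.
            page_id = self.free_pages.pop(0)
            self.hash_table[key] = page_id
            self.page_info[page_id] = PageManagementInfo(page_id * PAGE_SIZE, "dirty")
        return page_id
```

The sketch does not handle the case where no free cache page remains; in that case `free_pages.pop(0)` raises `IndexError`.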
Note that the management data M2 illustrated in
Next, an I/O processing procedure for the logical volume will be described with reference to the flowcharts of
[step S11] The host communication unit 121 of the CM 100a receives, from the host server, a readout request for the logical volume and passes the readout request to the resource control unit 122. When determining that the CM in charge of the readout source logical volume is the CM 100a on the basis of the CM in charge management information 113, the resource control unit 122 passes the readout request to the cache control unit 123.
Note that, for example, in the case where another CM receives the readout request, the resource control unit 122 of that CM determines that the CM in charge is the CM 100a on the basis of the CM in charge management information 113, and transfers the readout request to the CM 100a. In the CM 100a, the resource control unit 122 receives the transferred readout request and passes the readout request to the cache control unit 123.
[step S12] The cache control unit 123 refers to the hash table for the primary cache 111 included in the cache management information 112, and determines whether the data in the readout source range in the logical volume is present in the primary cache 111. In the case where the record in which the hash key calculated on the basis of the volume number and the readout source address of the readout source logical volume is registered as the record number is registered in the hash table, the data in the readout source range is determined to be present in the primary cache 111 (primary cache hit). In the case where the data in the readout source range is present in the primary cache 111, the processing proceeds to step S16, or in the case where the data is not present, the processing proceeds to step S13.
[step S13] The cache control unit 123 refers to the hash table for the secondary cache 311 included in the cache management information 112, and determines whether the data in the readout source range in the logical volume is present in the secondary cache 311. In the case where the record in which the hash key calculated on the basis of the volume number and the readout source address of the readout source logical volume is registered as the record number is registered in the hash table, the data in the readout source range is determined to be present in the secondary cache 311 (secondary cache hit). In the case where the data in the readout source range is present in the secondary cache 311, the processing proceeds to step S14, or in the case where the data is not present, the processing proceeds to step S15.
[step S14] The cache control unit 123 reads the data in the readout source range from the secondary cache 311 and copies the data to the primary cache 111. At this time, the cache control unit 123 transfers the read data to the backup destination CM and duplicates the data in the RAM 102. Furthermore, the cache control unit 123 updates the management data corresponding to the copy destination cache page among the management data included in the cache management information 112, and transfers the updated management data to the backup destination CM and duplicates the updated management data in the RAM 102.
[step S15] The cache control unit 123 reads the data in the readout source range from the HDD in the disk array 200a and copies the data to the primary cache 111. At this time, the cache control unit 123 transfers the read data to the backup destination CM and duplicates the data in the RAM 102. Furthermore, the cache control unit 123 updates the management data corresponding to the copy destination cache page among the management data included in the cache management information 112, and transfers the updated management data to the backup destination CM and duplicates the updated management data in the RAM 102.
Note that, in steps S14 and S15, in the case where the free space of the primary cache 111 is insufficient, the data stored in the cache page having the earliest final access time among the cache pages on the primary cache 111 is expelled to the secondary cache 311. Then, the data read from the secondary cache 311 or the HDD is stored in the cache page.
[step S16] The cache control unit 123 reads the data requested to be read from the primary cache 111. Under the control of the resource control unit 122, the read data is transferred to the host server via the host communication unit 121 in the CM that has received the readout request.
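The branch structure of steps S12 to S16 can be summarized in a short sketch. The function name and the dictionaries standing in for the primary cache, the secondary cache, and the disk array are hypothetical; duplication to the backup destination CM and expulsion from a full primary cache are omitted.

```python
from typing import Dict, Tuple


def read_from_volume(
    address: int,
    primary: Dict[int, bytes],
    secondary: Dict[int, bytes],
    disk: Dict[int, bytes],
) -> Tuple[bytes, str]:
    """Hypothetical sketch of the readout flow; returns the data and its source."""
    if address in primary:
        # Step S12: primary cache hit.
        source = "primary"
    elif address in secondary:
        # Steps S13 and S14: secondary cache hit, copy to the primary cache.
        primary[address] = secondary[address]
        source = "secondary"
    else:
        # Step S15: miss in both caches, read from the disk array.
        primary[address] = disk[address]
        source = "disk"
    # Step S16: the data is always read out of the primary cache.
    return primary[address], source
```

Because every path ends in step S16, a second readout of the same address is served as a primary cache hit in this model.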
[step S21] The host communication unit 121 of the CM 100a receives the write request and write data for the logical volume from the host server and passes them to the resource control unit 122. When determining that the CM in charge of the write destination logical volume is the CM 100a on the basis of the CM in charge management information 113, the resource control unit 122 passes the write request and the write data to the cache control unit 123.
Note that, for example, in the case where another CM receives the write request and write data, the resource control unit 122 of that CM determines that the CM in charge is the CM 100a on the basis of the CM in charge management information 113, and transfers the write request and write data to the CM 100a. In the CM 100a, the resource control unit 122 receives the transferred write request and write data, and passes the transferred write request and write data to the cache control unit 123.
[step S22] The cache control unit 123 refers to the hash table for the primary cache 111 included in the cache management information 112, and determines whether the data in the write destination range in the logical volume is present in the primary cache 111. In the case where the record in which the hash key calculated on the basis of the volume number and the write destination address of the write destination logical volume is registered as the record number is registered in the hash table, the data in the write destination range is determined to be present in the primary cache 111 (primary cache hit). In the case where the data in the write destination range is present in the primary cache 111, the processing proceeds to step S23, or in the case where the data is not present, the processing proceeds to step S24.
[step S23] The cache control unit 123 overwrites the data in the write destination range stored in the primary cache 111 with the write data. At this time, the cache control unit 123 transfers the write data to the backup destination CM and overwrites the original data in the write destination range duplicated in the RAM 102.
[step S24] The cache control unit 123 refers to the hash table for the secondary cache 311 included in the cache management information 112, and determines whether the data in the write destination range in the logical volume is present in the secondary cache 311. In the case where the record in which the hash key calculated on the basis of the volume number and the write destination address of the write destination logical volume is registered as the record number is registered in the hash table, the data in the write destination range is determined to be present in the secondary cache 311 (secondary cache hit). In the case where the data in the write destination range is present in the secondary cache 311, the processing proceeds to step S25, or in the case where the data is not present, the processing proceeds to step S26.
[step S25] The cache control unit 123 overwrites the data in the write destination range stored in the secondary cache 311 with the write data.
[step S26] The cache control unit 123 writes the write data to the primary cache 111, transfers the write data to the backup destination CM, and duplicates the write data in the RAM 102. Furthermore, the cache control unit 123 newly registers the management data corresponding to the cache page of the data write destination in the cache management information 112, transfers the management data to the backup destination CM, and duplicates the management data in the RAM 102.
[step S27] The cache control unit 123 requests the resource control unit 122 to perform write completion response processing. By the processing of the resource control unit 122, a write completion response is transmitted to the host server via the host communication unit 121 in the CM that has received the write request.
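Steps S22 to S27 can be sketched in the same style. The function name and the dictionaries standing in for the two caches and for the RAM of the backup destination CM are hypothetical; registration of the management data is omitted.

```python
from typing import Dict


def write_to_volume(
    address: int,
    data: bytes,
    primary: Dict[int, bytes],
    secondary: Dict[int, bytes],
    backup_ram: Dict[int, bytes],
) -> str:
    """Hypothetical sketch of the write flow; returns the step that stored the data."""
    if address in primary:
        # Steps S22 and S23: primary cache hit, overwrite and update the duplicate.
        primary[address] = data
        backup_ram[address] = data
        step = "S23"
    elif address in secondary:
        # Steps S24 and S25: secondary cache hit, overwrite in the secondary cache.
        secondary[address] = data
        step = "S25"
    else:
        # Step S26: miss in both caches, write to the primary cache and duplicate.
        primary[address] = data
        backup_ram[address] = data
        step = "S26"
    # Step S27: the write completion response would be issued after this point.
    return step
```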
Note that the data written in the secondary cache 311 according to the procedures illustrated in
Next, the switching processing for the CM in charge for the logical volume will be described.
In the storage system according to the present embodiment, the CM in charge of the logical volume can be switched to any other CM. For example, in the case where a processing load becomes excessive in the CM that is the CM in charge of a certain logical volume, the CM in charge can be switched to the CM having the lowest processing load among the other CMs. Furthermore, as described above, the cache area in each CM and the backup destination CM of the management data are determined in advance, but the switching destination of the CM in charge can be selected regardless of whether the selected CM is the backup destination CM or not.
Here, a comparative example of the switching processing for the CM in charge is illustrated in
[step S31] The management terminal 500 transmits the switching instruction for the CM in charge of a certain logical volume to the CM 100a. Here, as an example, it is assumed that the CM in charge of the logical volume LV1 is instructed to be switched from the CM 100a to the CM 100c. The host communication unit 121 of the CM 100a receives the switching instruction and passes the switching instruction to the switching control unit 125.
[step S32] The switching control unit 125 instructs the cache control unit 123 to write back the dirty data of the primary cache 111 and the secondary cache 311. In response to this instruction, the cache control unit 123 writes back the dirty data of the primary cache 111 and the secondary cache 311 to the corresponding HDD of the disk array 200a.
[step S33] When the write back of all dirty data is completed in step S32, the switching control unit 125 causes the cache control unit 123 to stop the I/O processing for the logical volume LV1.
[step S34] The switching control unit 125 instructs deletion of all the data stored in the primary cache 111 and the secondary cache 311. In response to this instruction, the cache control unit 123 deletes all the data stored in the primary cache 111 and the secondary cache 311.
[step S35] When all the corresponding data is deleted in step S34, the switching control unit 125 executes processing of switching the CM in charge of the logical volume LV1 to the CM 100c. Specifically, the switching control unit 125 updates the CM in charge management information 113 such that the CM in charge of the logical volume LV1 indicates the CM 100c. Furthermore, the switching control unit 125 notifies the other CMs that the CM in charge of the logical volume LV1 is switched to the CM 100c to update the CM in charge management information 113 of each CM.
When this step S35 is executed, the switching destination CM 100c restarts the I/O processing for the logical volume LV1. At this time, the cache control unit 123 of the CM 100c can control the I/O processing for the logical volume LV1, using the primary cache secured in the RAM 102 provided in the CM 100c and the secondary cache secured in the flash module connected to the CM 100c.
[step S36] The switching control unit 125 transmits the switching completion response of the CM in charge to the management terminal 500 via the host communication unit 121.
Note that, in the case where data write to the logical volume LV1 is requested during the period from the start of step S31 to the completion of step S35, the cache control unit 123 of the switching source CM 100a directly writes the write data to the back-end storage area without writing the write data to the cache area, for example. Meanwhile, in the case where the cache hit is determined when the data readout from the logical volume LV1 is requested during this period, the cache control unit 123 can read the data from the cache area. However, to avoid data inconsistency, it is desirable that data is not moved or copied between the primary cache 111 and the secondary cache 311.
The switching processing for the CM in charge as illustrated in
Therefore, in the storage system according to the present embodiment, the following two methods, “switching processing A” and “switching processing B”, are used.
[step S41] The switching control unit 125 of the CM 100a causes the cache control unit 123 to stop the I/O processing for the logical volume LV1.
[step S42] The switching control unit 125 instructs the cache control unit 123 to write back the dirty data of the primary cache 111. In response to this instruction, the cache control unit 123 writes back the dirty data of the primary cache 111 to the corresponding HDD of the disk array 200a.
[step S43] The switching control unit 125 transfers the management data related to the secondary cache 311 of the management data included in the cache management information 112 to the switching destination CM 100c and copies the management data in the RAM 102 of the CM 100c. Specifically, the management data (hash table record and page management information) for the cache page in which the dirty data is stored among the cache pages of the secondary cache 311, is copied to the CM 100c. This management data is incorporated into the cache management information 112 to be referred to by the switching destination CM 100c in order to execute the I/O processing for the logical volume LV1.
Note that the processing of steps S42 and S43 may be executed in parallel. Then, when both pieces of the processing of steps S42 and S43 are completed, the processing of step S44 is executed.
[step S44] The switching control unit 125 transmits the switching completion response of the CM in charge to the management terminal 500 via the host communication unit 121. Then, the switching control unit 125 requests the switching destination CM 100c to start the I/O processing. As a result, the CM 100c restarts the I/O processing for the logical volume LV1.
Note that, for example, the management terminal 500 notifies the host server that the CM in charge of the logical volume LV1 has been switched. As a result, the host server can recognize the switched CM in charge for the logical volume LV1 and becomes able to directly transmit the I/O request to the CM in charge.
According to the above switching processing A, the switching processing is completed when the management data of the secondary cache 311 is copied to the switching destination CM 100c, without executing the write back of the secondary cache 311. Therefore, the time spent from the switching instruction to the switching completion response can be shortened.
Meanwhile, the switching destination CM 100c starts the I/O processing for the logical volume LV1 when the processing of
In this way, in the switching processing A, the switching from the switching source CM to the switching destination CM is completed only by copying the management data that the switching destination CM uses to access the switching source secondary cache during the I/O processing. As a result, the time from the switching instruction to the response is shortened.
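The selection of management data in step S43 can be sketched as follows. This is only an illustration under assumptions: PageInfo and select_management_data_to_copy are hypothetical names, and a dictionary keyed by record number stands in for the hash table record and page management information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageInfo:
    flash_no: int     # flash module holding the cache page
    flash_addr: int   # location of the page in that module
    dirty: bool       # data attribute: True for dirty data


def select_management_data_to_copy(secondary_table: dict) -> dict:
    """Return only the records for cache pages that hold dirty data.

    Per step S43, only these records are copied to the switching
    destination CM; records for clean pages are left behind.
    """
    return {key: page for key, page in secondary_table.items() if page.dirty}
```

Because only dirty pages are selected, the amount of data transferred shrinks as write back of the switching source secondary cache progresses.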
As described above, when the switching processing A illustrated in
For example, it is assumed that the host server transmits the readout request for the data from the logical volume LV1 and the CM 100c receives the readout request (step S51). Then, it is assumed that the cache control unit 123 of the CM 100c determines that the primary cache has been missed but the secondary cache has been hit on the basis of the cache management information 112 stored by the CM 100c (step S52). That is, it is assumed that the hash key based on the data readout position information matches the record number of any record in the secondary cache hash table in the cache management information 112.
Here, it is assumed that the data requested to be read is determined to be stored in the flash module 300b (stored in the switching destination secondary cache) connected to the CM 100c on the basis of the page management information corresponding to the record (step S53: Yes). In this case, the cache control unit 123 of the CM 100c reads the data requested to be read from the secondary cache after switching secured in the flash module 300b connected to the CM 100c. The read data is transmitted from the host communication unit 121 of the CM 100c to the host server, whereby the response processing is executed (step S54). Actually, the read data is copied to the primary cache of the CM 100c and then transmitted to the host server.
Meanwhile, it is assumed that the data requested to be read is stored in the flash module 300a (stored in the switching source secondary cache) connected to another CM (CM 100a in this case) (step S53: No). This corresponds to the case where the record in which the same record number as the hash key is registered in step S52 is copied from the switching source CM 100a in step S43 in
In this case, the cache control unit 123 of the CM 100c transmits the flash number and the flash address registered in the page management information corresponding to the record to the switching source CM 100a, and requests readout of data from a location indicated by the transmitted information (step S55). The cache control unit 123 of the CM 100a reads the data from the corresponding location in the flash module 300a, that is, the corresponding location in the switching source secondary cache, and returns the data to the CM 100c (step S56).
The cache control unit 123 of the CM 100c acquires the returned data. This data is transmitted from the host communication unit 121 of the CM 100c to the host server, whereby the response processing is executed (step S57). Actually, the read data is copied to the primary cache of the CM 100c and then transmitted to the host server.
In this way, the switching destination CM 100c can acquire the data that has not been written back and remains in the switching source secondary cache, using the management data of the secondary cache copied by the switching processing A, and transmit the data to the readout request source.
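The readout flow of steps S51 to S57 can be sketched as follows. This is a simplified model under assumptions: the function name, its parameters, and the use of dictionaries and a callback in place of the flash modules and inter-CM communication are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Page = Tuple[int, int]  # (flash number, flash address)


def read_after_switching_a(
    key: int,
    primary: Dict[int, bytes],
    table: Dict[int, Page],
    local_flash_no: int,
    local_flash: Dict[int, bytes],
    request_remote_read: Callable[[int, int], bytes],
) -> Optional[bytes]:
    """Serve a readout request at the switching destination CM."""
    if key in primary:                     # primary cache hit
        return primary[key]
    page = table.get(key)                  # S52: secondary cache hit check
    if page is None:
        return None                        # miss: caller reads the back end
    flash_no, addr = page
    if flash_no == local_flash_no:         # S53: Yes
        data = local_flash[addr]           # S54: read the local flash module
    else:                                  # S53: No
        data = request_remote_read(flash_no, addr)  # S55 and S56
    primary[key] = data                    # copy to primary before responding
    return data
```

The hit determination itself never leaves the switching destination CM; only the data transfer in the "No" branch involves the switching source CM.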
Note that the switching destination CM 100c may control the I/O processing without using the secondary cache using the flash module 300b connected to the CM 100c, for example. In this case, regarding hit determination of the secondary cache, only whether the switching source secondary cache has been hit is determined. By such processing, cache control can be simplified.
Furthermore, the following processing is executed for the write request. For example, in the case where the switching source secondary cache is hit for the write request, the switching destination CM 100c stores the write data to the switching destination primary cache and updates the management data copied from the switching source CM by the switching processing A. At the same time, the CM 100c notifies the switching source CM 100a of the address information on the logical volume LV1 regarding the write data.
As will be described below, the switching source CM 100a writes back the dirty data on the switching source secondary cache in the background after the switching processing A is completed. The switching source CM 100a excludes the corresponding dirty data from the write back target on the basis of the write destination address information notified from the switching destination CM 100c to avoid the write back. Alternatively, the switching source CM 100a immediately writes back the corresponding dirty data on the basis of the notified write data address information. By such processing, occurrence of data inconsistency can be avoided.
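The first of the two alternatives, excluding overwritten data from the write back target, can be sketched as follows. The class WriteBackTargets and its methods are hypothetical, and the selection order of write back targets is an assumption; the text does not specify one.

```python
class WriteBackTargets:
    """Dirty addresses that the switching source CM still has to write back."""

    def __init__(self, dirty_addresses):
        self._pending = set(dirty_addresses)

    def exclude(self, address: int) -> None:
        # Called when the switching destination CM notifies that newer
        # write data exists for this address, so the stale copy in the
        # switching source secondary cache must not reach the back end.
        self._pending.discard(address)

    def next_target(self):
        # Assumed policy: lowest address first.
        return min(self._pending) if self._pending else None

    def complete(self, address: int) -> None:
        self._pending.discard(address)
```

Without the exclusion, a background write back could overwrite the back-end storage area with older data after the newer write has been accepted.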
Note that, in the examples of
Next,
[step S61] The switching control unit 125 of the CM 100a causes the cache control unit 123 to stop the I/O processing for the logical volume LV1.
[step S62] The switching control unit 125 instructs the cache control unit 123 to write back the dirty data of the primary cache 111. In response to this instruction, the cache control unit 123 writes back the dirty data of the primary cache 111 to the corresponding HDD of the disk array 200a.
[step S63] The switching control unit 125 transmits a CM number indicating the CM 100a as a management CM number of the secondary cache for the logical volume LV1 to the switching destination CM 100c, and causes the CM 100c to record the CM number. In the CM 100c, the transmitted management CM number is recorded in, for example, the RAM 102.
Note that the pieces of processing of steps S62 and S63 may be executed in parallel. Then, when both pieces of the processing of steps S62 and S63 are completed, the processing of step S64 is executed.
[step S64] The switching control unit 125 transmits the switching completion response of the CM in charge to the management terminal 500 via the host communication unit 121. Then, the switching control unit 125 requests the switching destination CM 100c to start the I/O processing. As a result, the CM 100c restarts the I/O processing for the logical volume LV1.
In the above switching processing B, the switching processing is completed by transmitting and recording the management CM number of the secondary cache to the switching destination CM. Therefore, the time from the switching instruction to the response can be shortened as compared with the comparative example illustrated in
Here, the management CM number transmitted and recorded in step S63 will be described with reference to
For example, as illustrated in the volume numbers “0” and “2” in
Note that
As described above, when the switching processing B illustrated
For example, it is assumed that the host server transmits the readout request for the data from the logical volume LV1 and the CM 100c receives the readout request (step S71). Furthermore, it is assumed that the cache control unit 123 of the CM 100c determines that the primary cache is not hit on the basis of the cache management information 112 held by the CM 100c. The cache control unit 123 of the CM 100c then refers to the CM in charge management information 113 in the CM 100c, and acquires the management CM number of the secondary cache corresponding to the readout source logical volume.
Here, it is assumed that the CM indicated by the acquired management CM number is another CM (switching source CM 100a) (step S72). In this case, the cache control unit 123 of the CM 100c requests the switching source CM 100a to determine the secondary cache hit (step S73). At this time, the readout position information in the logical volume LV1 is specified for the CM 100a.
The cache control unit 123 of the CM 100a refers to the cache management information 112 held by the CM 100a and determines whether the secondary cache is hit (step S74). Here, it is assumed that the hash key based on the specified readout position information matches the record number of any record in the secondary cache hash table in the cache management information 112, and that the secondary cache hit is determined. In this case, the cache control unit 123 of the CM 100a reads the data requested to be read from the switching source secondary cache secured in the flash module 300a, and returns the data to the CM 100c (step S75).
The cache control unit 123 of the CM 100c acquires the returned data. This data is transmitted from the host communication unit 121 of the CM 100c to the host server, whereby the response processing is executed (step S76). Actually, the read data is copied to the primary cache of the CM 100c and then transmitted to the host server.
Note that, in the case where a secondary cache miss is determined in step S74, the fact of the secondary cache miss is notified to the switching destination CM 100c. The cache control unit 123 of the CM 100c reads the data requested to be read from the back-end storage area, copies the data to the primary cache in the CM 100c, and then transmits the data to the host server. In the example of
Alternatively, in the case where the secondary cache miss is determined in step S74, data may be read from the disk array 200a by the cache control unit 123 of the switching source CM 100a. In this case, the read data is transferred to the CM 100c, and the cache control unit 123 of the CM 100c copies the data to the primary cache in the CM 100c and then transmits the data to the host server.
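The readout flow after the switching processing B (steps S71 to S76, with the miss case handled by the switching destination CM) can be sketched as follows. SourceCM, read_after_switching_b, and their parameters are hypothetical names, and dictionaries and a callback stand in for the caches, inter-CM communication, and the back-end storage area.

```python
from typing import Callable, Dict, Optional, Tuple


class SourceCM:
    """Switching source CM, which still holds the secondary cache hash table."""

    def __init__(self, secondary: Dict[int, bytes]):
        self.secondary = secondary

    def determine_hit(self, key: int) -> Tuple[bool, Optional[bytes]]:
        # S74: hit determination; S75: return the data on a hit.
        if key in self.secondary:
            return True, self.secondary[key]
        return False, None


def read_after_switching_b(
    key: int,
    primary: Dict[int, bytes],
    own_cm_no: int,
    management_cm_no: int,
    source: SourceCM,
    local_secondary: Dict[int, bytes],
    read_backend: Callable[[int], bytes],
) -> bytes:
    if key in primary:                           # primary cache hit
        return primary[key]
    if management_cm_no != own_cm_no:            # S72: another CM manages it
        hit, data = source.determine_hit(key)    # S73: determination request
    else:
        data = local_secondary.get(key)
        hit = data is not None
    if not hit:
        data = read_backend(key)                 # miss: read the back end
    primary[key] = data                          # copy to primary, then respond
    return data
```

In contrast to the switching processing A, every primary cache miss here costs one request to the switching source CM, whether or not the secondary cache is hit.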
Note that the following processing is executed for the write request. For example, in the case where the primary cache is not hit for the write request, the switching destination CM 100c notifies the switching source CM 100a of the write destination address information. The switching source CM 100a determines whether the secondary cache is hit on the basis of the notified address information. In the case where the secondary cache is hit, the CM 100a excludes the corresponding data on the secondary cache from the write back target and notifies the switching destination CM 100c of permission to write data. Meanwhile, in the case where the secondary cache is not hit, the CM 100a notifies the switching destination CM 100c of permission to write data. The CM 100c that has received the permission notification stores the data requested to be written to the primary cache in the CM 100c, and responds to the write request.
Here, in the switching processing A illustrated in
Meanwhile, in the I/O processing after switching, in the case where the primary cache is not hit, determination of the secondary cache hit is requested to the switching source CM. As illustrated in
As described above, since both the switching processing A and switching processing B have advantages and disadvantages, in the present embodiment, when the switching of the CM in charge is instructed, either the switching processing A or the switching processing B is adaptively selected and executed. Specifically, in the case where the time during which the I/O processing stops is expected to exceed a predetermined determination threshold value when it is assumed that the switching processing A is executed, the switching processing B is executed. As a result, the stop time of the I/O processing due to switching can be suppressed.
Furthermore, when the switching processing B is completed and the I/O processing at the switching destination CM is started, the switching source CM sequentially writes back the dirty data remaining in the secondary cache. Then, when the dirty data in the secondary cache decreases, the data amount of management data to be transferred to the switching destination CM decreases accordingly, and the expected time during which the I/O processing stops becomes the above-described determination threshold value or less, the switching processing A is executed in place of the state established by the switching processing B. This improves the performance of the I/O processing by the switching destination CM.
Here, which method is used to execute the switching processing is determined by, for example, whether a condition of the following equation (1) is satisfied. In the case where the condition of the equation (1) is satisfied, the switching processing B is executed, or in the case where the condition of the equation (1) is not satisfied, the switching processing A is executed.
(The data amount of management data to be transferred to the switching destination CM)/(inter-CM throughput)>permissible stop time of the I/O processing (1)
The permissible stop time on the right side in the equation (1) corresponds to the above-described determination threshold value. The data amount of management data in the equation (1) is calculated from the data amount of dirty data remaining in the secondary cache, the number of cache pages in which the data attribute indicates the dirty data among the cache pages of the secondary cache, or the number of pieces of management data corresponding to the cache page. Furthermore, the throughput and permissible stop time in the equation (1) are set to predetermined values. Among the values, the permissible stop time may be arbitrarily set as a time permissible as a response time from transmission of the I/O request (for example, the readout request) to reception of a response by the host server, for example. For example, a method of setting the permissible stop time as timeout time or a shorter time than the timeout time of the host server at the time of transmitting the I/O request is conceivable. Furthermore, for example, the permissible stop time may be set as a value within a general maximum response time in a storage device such as an HDD.
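The determination by the equation (1) can be sketched as follows. The function name, the parameter names, and all numeric values are assumptions chosen only for illustration; in particular, the size of the management data per cache page is not given in the text.

```python
def choose_switching_method(
    dirty_pages: int,
    mgmt_bytes_per_page: int,
    throughput_bytes_per_s: float,
    permissible_stop_s: float,
) -> str:
    """Return "B" when the condition of the equation (1) holds, else "A"."""
    # Left side of the equation (1): expected transfer time of the
    # management data for the dirty cache pages.
    transfer_time_s = dirty_pages * mgmt_bytes_per_page / throughput_bytes_per_s
    return "B" if transfer_time_s > permissible_stop_s else "A"


# Assumed values: 64 bytes of management data per page, 1 GB/s between
# CMs, and a permissible stop time of 50 ms.
print(choose_switching_method(1_000_000, 64, 1e9, 0.05))  # 64 ms > 50 ms
print(choose_switching_method(500_000, 64, 1e9, 0.05))    # 32 ms <= 50 ms
```

With these assumed values, one million dirty pages give an expected transfer time of 64 ms and the switching processing B is selected, whereas half that number gives 32 ms and the switching processing A is selected.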
[step S81] When the switching instruction for switching the CM in charge of the logical volume LV1 from the CM 100a to the CM 100c is transmitted from the management terminal 500 to the CM 100a, the host communication unit 121 of the CM 100a receives the switching instruction and passes the switching instruction to the switching control unit 125.
[step S82] The switching control unit 125 refers to the cache management information 112 held by the CM 100a, and counts the number of cache pages having the data attribute indicating dirty data among the cache pages of the secondary cache. The switching control unit 125 converts the counted value into the data amount of management data in the above equation (1), and determines whether the condition of the equation (1) is satisfied on the basis of the converted value, and the predetermined inter-CM throughput and permissible stop time of the I/O processing. In the case where the condition is satisfied, the processing proceeds to step S83. On the other hand, in the case where the condition is not satisfied, the processing proceeds to step S87 and the switching processing A is executed.
[step S83] The switching processing B illustrated in
[step S84] The switching control unit 125 selects one cache page that stores the dirty data among the cache pages on the secondary cache on the basis of the cache management information 112 held by the CM 100a. The switching control unit 125 specifies the ID of the selected cache page to the cache control unit 123, and instructs the cache control unit 123 to write back the data in the cache page. In response to this instruction, the cache control unit 123 writes back the corresponding data in the secondary cache to the corresponding HDD of the disk array 200a.
[step S85] The cache control unit 123 initializes the management data corresponding to the cache page written back in step S84 among the management data of the cache management information 112. In this initialization, for example, the data attribute in the page management information may be updated to indicate clean data, or the corresponding page management information and the corresponding record on the hash table may be deleted from the cache management information 112.
[step S86] The switching control unit 125 refers to the cache management information 112 held by the CM 100a again, and counts the number of cache pages having the data attribute indicating dirty data among the cache pages of the secondary cache. The switching control unit 125 converts the counted value into the data amount of the management data in the equation (1), and determines whether the condition of the equation (1) is satisfied using this value. In the case where the condition is satisfied, the processing proceeds to step S84 and one cache page storing dirty data is selected. In the case where the condition is not satisfied, the processing proceeds to step S87.
[step S87] The switching processing A illustrated in
[step S88] The switching control unit 125 refers to the cache management information 112 held by the CM 100a, and determines whether the dirty data remains in the secondary cache. In the case where the dirty data remains, the processing proceeds to step S89, or in the case where no dirty data remains, the processing proceeds to step S91.
[step S89] The cache page in which the dirty data is stored is selected from the secondary cache by a similar processing procedure to step S84, and this dirty data is written back to the HDD.
[step S90] The management data corresponding to the cache page to which the write back has been performed is initialized by a similar processing procedure to step S85. After that, the processing proceeds to step S88, and the presence or absence of dirty data in the secondary cache is determined.
[step S91] The switching control unit 125 notifies the switching destination CM 100c that the write back from the secondary cache has been completed. When receiving the notification, the CM 100c starts normal I/O control using the secondary cache (secondary cache after switching) secured in the flash module 300b connected to the CM 100c in addition to the primary cache in the CM 100c. As a result, for the secondary cache, the I/O processing is controlled using only the secondary cache after switching without using the switching source secondary cache. Furthermore, in the case where the secondary cache after switching is not used in step S87, use of the secondary cache after switching is started in step S91.
Note that, in the case where the cache management information 112 for the logical volume LV1 remains in the RAM 102 of the CM 100a, the switching control unit 125 of the switching source CM 100a deletes the cache management information 112.
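The overall flow of steps S82 to S91 can be sketched as a small simulation. This is a simplified model under assumptions: run_switching and its parameters are hypothetical, the dirty data is reduced to a page counter, each write back handles exactly one page, and no new dirty data arrives during the flow.

```python
def run_switching(
    dirty_pages: int,
    mgmt_bytes_per_page: int,
    throughput_bytes_per_s: float,
    permissible_stop_s: float,
):
    """Return (events, pages written back before switching processing A)."""

    def condition_holds(pages: int) -> bool:
        # Condition of the equation (1).
        return pages * mgmt_bytes_per_page / throughput_bytes_per_s > permissible_stop_s

    events = []
    written_before_a = 0
    if condition_holds(dirty_pages):            # S82
        events.append("B")                      # S83
        while condition_holds(dirty_pages):     # S86 (returns to S84 while true)
            dirty_pages -= 1                    # S84 and S85: one page per pass
            written_before_a += 1
    events.append("A")                          # S87
    while dirty_pages > 0:                      # S88
        dirty_pages -= 1                        # S89 and S90
    events.append("write back complete")        # S91
    return events, written_before_a
```

For example, with 10 dirty pages, one byte of management data per page, a throughput of one byte per second, and a permissible stop time of 5 seconds, the switching processing B runs first, five pages are written back, and then the switching processing A is executed.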
In the above-described second embodiment, the response time from the switching instruction to the switching completion of the CM in charge can be shortened as compared with the comparative example illustrated in
Moreover, after the switching processing B is executed, the switching processing A is executed at the stage where the expected stop time of the I/O processing in the switching processing A becomes a permissible value or less with the progress of write back in the switching source secondary cache. As a result, the response performance of the I/O processing in the switching destination CM can be improved.
Note that the processing functions of the devices (for example, the storage control devices 10 and 20, the CMs 100a to 100d, the host servers 400a and 400b, the management terminal 500) illustrated in each of the above embodiments can be implemented by a computer. In that case, a program describing the processing content of the functions to be held by each device is provided, and the above processing functions are implemented on the computer by execution of the program on the computer. The program describing the processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium includes a magnetic storage device, an optical disk, a semiconductor memory, or the like. The magnetic storage device includes a hard disk drive (HDD), a magnetic tape, or the like. The optical disk includes a compact disk (CD), a digital versatile disk (DVD), a Blu-ray disk (BD, registered trademark), or the like.
In a case where the program is to be distributed, for example, portable recording media such as DVDs and CDs, in which the program is recorded, are sold. Furthermore, it is also possible to store the program in a storage device of a server computer and transfer the program from the server computer to another computer via a network.
The computer that executes the program stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. Then, the computer reads the program from its own storage device of the computer and executes processing according to the program. Note that, the computer can also read the program directly from the portable recording medium and execute processing according to the program. Furthermore, the computer can also sequentially execute processing according to the received program each time when the program is transferred from the server computer connected via the network.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2021-004255 | Jan 2021 | JP | national |