The present invention relates to error checking of the data read from a storage device.
When data is stored on disk media such as an HDD (Hard Disk Drive), a storage apparatus generally uses LA (Logical Address)/LRC (Longitudinal Redundancy Check) as an error detecting code (for example, refer to PTL1). Specifically, at the time of data writing, the storage apparatus adds the LA/LRC to the data and transmits the set of the data and the LA/LRC to the HDD, so that the set is stored in the HDD. At the time of data reading, the storage apparatus reads the set of the data and the LA/LRC from the HDD and performs error checking of the data using the LA/LRC.
For example, at the time of a write that updates old data in the HDD to new data, the storage controller transmits the new data to the HDD, but the new data may fail to be stored because of a failure of the HDD.
In such a case, the old data remains, without being updated, in the area of the HDD to which the new data should have been written. This old data is stored in the HDD as a set with its corresponding LA/LRC.
Therefore, when the data of this area is read afterward, the set of the old data and its corresponding LA/LRC is read, and the failure in which the old data has not been updated to the new data (henceforth, a non-written failure) cannot be detected by using the LA/LRC.
Moreover, although demand for cheap high-capacity drives is increasing as data volumes have grown in recent years, a cheap high-capacity drive may not support the sector size required to attach the LA/LRC to data. In that case it is difficult to use the LA/LRC, and such drives also have the problem of a high failure rate.
A storage apparatus has a first storage device in which user data is stored, a second storage device in which management information containing a primary hash value corresponding to a data management unit including user data is stored for every data management unit, and a controller which is coupled to the first and the second storage device.
The controller (A) receives a read request of the read target user data from the upper level apparatus, acquires the primary hash value of the first management unit which is the data management unit containing the read target user data from the second storage device, (B) reads the data of the first management unit from the first storage device, (C) computes the primary hash value based on the data of the first management unit, (D) determines whether the primary hash value in the (A) and the primary hash value in (C) are in agreement, and sends the read target user data contained in the first management unit to the upper level apparatus, when the primary hash value in (A) and the primary hash value in (C) are in agreement.
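The flow of (A) through (D) can be illustrated with a minimal sketch. It is not the claimed implementation: the two storage devices are modeled as plain dictionaries keyed by management unit number, SHA-1 is assumed as the hash algorithm (the text leaves the algorithm open), and the names `verified_read`, `primary_hash`, and `IntegrityError` are hypothetical.

```python
import hashlib


class IntegrityError(Exception):
    """Raised when the stored and recomputed primary hash values differ."""


def primary_hash(unit_data: bytes) -> bytes:
    # SHA-1 is an assumption made for this sketch only.
    return hashlib.sha1(unit_data).digest()


def verified_read(unit_no, offset, length, data_device, hash_device):
    expected = hash_device[unit_no]      # (A) primary hash from the second device
    unit_data = data_device[unit_no]     # (B) management unit from the first device
    actual = primary_hash(unit_data)     # (C) recompute the primary hash
    if actual != expected:               # (D) compare the two values
        raise IntegrityError(f"management unit {unit_no} failed verification")
    return unit_data[offset:offset + length]
```

In this model a non-written failure shows up as stale data on the first device paired with an up-to-date hash on the second device, so the comparison in (D) fails even though the stale data is internally intact.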
A non-written failure is detectable. Moreover, even if there is no error detecting code, a failure of the read target data is detectable.
Some examples are described with reference to drawings. In addition, examples described below do not limit the invention according to the claims and all the elements or the combinations of those explained in the examples are not necessarily indispensable for the solution means of the invention.
In addition, although various information of the present invention may be explained using expressions such as “aaa table” etc in the following explanations, the various information may be expressed using a data structure other than a table. Therefore, in order to show that it is not dependent on the data structures, “aaa table” etc may be called “aaa information” etc.
First, the outline of the Example 1 is explained.
When the storage apparatus 103 receives a read command (RD command) from the host server (henceforth, host) 101 ((A) of
Subsequently, the storage apparatus 103 reads a primary hash value corresponding to the physical chunk number acquired from the primary hash table 420 in LDEV 313 to a cache memory 119A ((D),(E) of
Moreover, the storage apparatus 103, reads data of a corresponding chunk from LDEV 313 to the cache memory 119A based on the physical chunk number and the data storing address of the chunk in which the read target data is stored ((F),(G) of
Subsequently, the storage apparatus 103 computes a primary hash value corresponding to this chunk based on the read data of the chunk ((H) of
As a result, if the primary hash value computed from the data of the chunk and the primary hash value acquired from the primary hash table 420 correspond, it means that failure(s) (data corruption, non-written failure, etc.) have not occurred to the data of the chunk read from LDEV 313, so the storage apparatus 103 takes out the read target data in the chunk read to the cache memory 119A and transmits the read target data to the host 101 (
Next, the storage system according to Example 1 is explained in detail.
The storage system 201 has one or more storage apparatuses 103 and a maintenance terminal 131 coupled to the one or more storage apparatuses 103. The maintenance terminal 131 is coupled to the one or more storage apparatuses 103 via the communication network 181, for example. A maintenance terminal 131 may exist for every storage apparatus 103. The maintenance terminal 131 communicates with a client 132. The client 132 is one or more computers and has, for example, a man-machine interface device (a display device and an input device, as an example) as an I/O device.
The storage apparatus 103 has one or more PDEVs 105 and a controller coupled to the one or more PDEVs 105, for example. The controller has, for example, a back end I/F (communication interface device) coupled to the one or more PDEVs 105, a front end I/F coupled to the host 101, a storage resource, and one or more MPPKs (microprocessor packages) 121 that are coupled to those elements. According to the configuration example shown in
The PDEV 105 is a physical storage device (for example, a hard disk drive or a flash memory) which stores user data. The user data is data which the host 101 stores in the storage apparatus 103. In this example, the user data is managed considering the chunk which is a certain size as a unit. The size of the chunk may be as arbitrary sizes, such as 512 B, 4 KB, 1 MB, and 1 GB, for example.
The CHA 111, receives an I/O command (a write command (a write request) or a read command (a read request)) which has I/O destination information (read destination information or write destination information) from the host 101, and forwards the received I/O command to one of the multiple MPPKs 121. The I/O destination information, for example includes a logical volume ID (for example, LUN (Logical Unit Number)) and an area address (for example, LBA (Logical Block Address)) on the logical volume.
The DKA 113 reads data from the PDEV 105, and writes the data in the cache memory (CM) in the CMPK 119, and the DKA 113 reads data from the cache memory in the CMPK119, and writes the data in the PDEV (for example, a PDEV which is a basis of logical volume of write destination of data) 105.
The MPPK121 is a device which has multiple MP(s) (microprocessor: henceforth, the processor). The processor processes the I/O command from the CHA 111.
Each element of the CHA 111, the DKA 113, the CMPK 119, and the MPPK 121 can communicate with other elements of the CHA 111, the DKA 113, the CMPK 119, and the MPPK 121 through the SW 117.
The CMPK 119 contains one or more cache memories. Each cache memory is a volatile memory, such as DRAM, for example. In addition, the cache memory of the CMPK 119 generally has a small capacity compared with the PDEV 105. The one or more cache memories may have, in addition to a storage area (henceforth, a cache area) which temporarily stores the data input to or output from the PDEV 105, a storage area (henceforth, a shared area) which stores management information which the processor refers to. Here, reading data from the PDEV 105 to the cache memory is called staging, and writing data from the cache memory to the PDEV 105 is called destaging.
Maintenance terminal 131 can communicate with the MPPK 121 at least out of the CHA 111, the DKA 113, the CMPK 119, and the MPPK 121. The maintenance terminal 131 can collect information from the CHA 111, the DKA 113, the CMPK 119, and the MPPK 121, and can store the collected information, for example. Moreover, the maintenance terminal 131 can send a request according to the directions from the client 132 to the MPPK 121 of the storage apparatus 103.
A RAID (Redundant Array of Independent (or Inexpensive) Disks) group 311 is configured by multiple PDEVs 105. This RAID group 311 has a RAID level that secures data redundancy. That is, even if failures occur in fewer than a predetermined number of the PDEVs 105, the desired data can be acquired based on the data of the PDEV(s) 105 in which no failure has occurred. Here, acquiring (creating) the desired data based on the data of the PDEV(s) 105 in which no failure has occurred is called a collection read.
Based on the storage area of the RAID group 311, one or more LDEVs 313, which are logical storage devices, are formed. One logical volume may be one LDEV 313 or may be an LDEV group in which multiple LDEVs 313 are coupled.
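As an illustration of a collection read, the sketch below rebuilds one lost block from the surviving blocks of a stripe. It assumes a single XOR parity block per stripe (RAID 5 style); the text does not state which RAID level the RAID group 311 uses, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def collection_read(stripe, failed_index):
    """Rebuild the block at failed_index from the other blocks of the stripe.

    `stripe` holds the data blocks plus one XOR parity block; the entry at
    failed_index is ignored because it is considered unreadable or corrupt.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)
```

Because the parity is the XOR of all data blocks, XOR-ing every surviving block (parity included) yields the missing one.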
The cache memory 119A of the CMPK 119 stores a secondary hash table 410 and a part of a primary hash table 420.
The primary hash table 420 manages a hash value (primary hash value) computed by a predetermined hash algorithm based on the data of each chunk. Here, the hash algorithm which computes the primary hash value from the data of the chunk, and the length of the primary hash value, are arbitrary. Chunks which have the same data have the same hash value, so a group of chunks having the same hash value is a group of chunks which may have the same data. In addition, when the primary hash value is short, more chunks share the same hash value, so more chunks must be compared bit by bit in the deduplication process mentioned later, and the amount of processing increases. For this reason, the primary hash value needs to have a certain size, such as 20 Bytes, 32 Bytes, or 64 Bytes, for example. Since the primary hash table 420, which manages the primary hash value for each chunk, becomes comparatively large, the primary hash table 420 is stored in the LDEV 313, and some of its entries are read to the cache memory 119A in this example.
The secondary hash table 410 is a table in which a hash value (secondary hash value) computed based on the primary hash value is used as an index, and is a table for accessing the primary hash value efficiently corresponding to the secondary hash value. Here, a hash algorithm for computing the secondary hash value from the primary hash value, and the length of the secondary hash value are arbitrary. In this example, the secondary hash value is comparatively a short size (for example, 2 Bytes, 4 Bytes) compared with the primary hash value. In this example, the whole secondary hash table 410 is managed to reside in the cache memory 119A.
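The relation between the two hash levels can be sketched as follows. Both algorithms are left open by the text, so this sketch assumes SHA-1 for the 20-byte primary hash value and, purely for illustration, derives the 2-byte secondary hash value by hashing the primary hash value again and keeping the first two bytes. The function names are hypothetical.

```python
import hashlib


def primary_hash(chunk: bytes) -> bytes:
    # Assumption: SHA-1, which yields a 20-byte value.
    return hashlib.sha1(chunk).digest()


def secondary_hash(primary: bytes) -> int:
    # Assumption: re-hash the primary value and keep 2 bytes,
    # giving an index in the range 0..65535.
    return int.from_bytes(hashlib.sha1(primary).digest()[:2], "big")
```

A 2-byte secondary hash value gives at most 65536 distinct indexes, which is why the whole secondary hash table can stay resident in cache memory while the primary hash table cannot.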
In this example, the LDEV 313 which stores user data and the LDEV 313 which stores the management data (management information) such as the primary hash table 420 are LDEVs configured from RAID groups of different PDEVs 105. The LDEV 313 which stores the management data stores the primary hash table 420 and the data storing table 430. In addition, the primary hash table 420 and the data storing table 430 may be stored in separate LDEVs 313.
The data storing table 430 is a table which manages the data storing address of the LDEV 313 in which the data of the chunk is stored. The data storing table 430 uses the number of a logical chunk (logical chunk #) as an index. In this embodiment, assigning the same data storing address to chunks which store the same data prevents the data from being stored redundantly in the LDEV 313. In addition, since the data storing table 430 needs to store the entries for the logical chunks and becomes comparatively large, it is stored in the LDEV 313 in this example.
The secondary hash table 410 stores an entry (secondary hash table entry) which includes fields of an index 411, an entry number 412, a primary hash A physical chunk number 413, a primary hash A link list head physical chunk number 414, a primary hash B physical chunk number 415, a primary hash B link list head physical chunk number 416, a primary hash (others) physical chunk number 417, and a primary hash (others) link list head physical chunk number 418.
A secondary hash value computed based on a primary hash value is stored in the index 411. A total number of the physical chunks corresponding to the primary hash value by which the secondary hash value of the index 411 is computed is stored in the entry number 412. A number of the physical chunks corresponding to a certain primary hash value (it is considered as the primary hash A) by which the secondary hash value of the index 411 is computed is stored in the primary hash A physical chunk number 413. In addition, the primary hash A is a different primary hash value for every entry. A physical chunk number of the entry corresponding to the chunk of the head of the link list (primary hash A link list) which consists of the entries of the primary hash table 420 corresponding to the multiple chunks corresponding to the primary hash A is stored in the primary hash A link list head physical chunk number 414. A number of the physical chunks corresponding to a certain different primary hash value (it is considered as the primary hash B) from the primary hash A by which the secondary hash value of the index 411 is computed is stored in the primary hash B physical chunk number 415. In addition, the primary hash B is a different primary hash value for every entry. A physical chunk number of the entry corresponding to the chunk of the head of the link list (primary hash B link list) which consists of entries corresponding to the multiple chunks corresponding to the primary hash B is stored in the primary hash B link list head physical chunk number 416. A number of the physical chunks corresponding to one or more different primary hash value (it is considered as primary hash (others)) from the primary hash A and the primary hash B by which the secondary hash value of the index 411 is computed is stored in the primary hash (others) physical chunk number 417. In addition, the primary hash (others) is a different primary hash value for every entry. 
A physical chunk number corresponding to the entry of the chunk of the head of the link list which consists of entries of the multiple chunks corresponding to the primary hash (others) is stored in the primary hash (others) link list head physical chunk number 418. In this example, one entry is divided into three, the primary hash A, the primary hash B, and other primary hash value, and the fields corresponding to those are prepared, so even if there is a case where a primary hash value being as the same secondary hash value becomes three or more, the prepared fields can respond appropriately. Thereby, capacity required for the secondary hash table 410 can be reduced.
The primary hash table 420 is a table which manages the hash value (primary hash value) computed by the predetermined hash algorithm based on the data of the chunk including the user data. The primary hash table 420 stores an entry (primary hash table entry: management element) including fields of an index 421, a primary hash value 422, a data storing address 423, a referenced number 424, a pre-physical chunk number (#) 425, and a next physical chunk number (#) 426.
A number (physical chunk number) of a physical chunk is stored in the index 421. A primary hash value computed based on the data of the chunk of the physical chunk number of the index 421 is stored in the primary hash value 422. An address (data storing address) of the LDEV 313 in which the data of the chunk corresponding to the physical chunk number of the index 421 is stored, is stored in the data storing address 423. The number of logical chunks (namely, the logical chunks which store the same data) which refer to the data of the physical chunk of the physical chunk number is stored in the referenced number 424. The number of the physical chunk which has the same primary hash value and comes immediately before in the link list is stored in the pre-physical chunk number 425. The number of the physical chunk which has the same primary hash value and comes next in the link list is stored in the next physical chunk number 426.
The data storing table 430 stores an entry (data storing table entry) including fields of an index 431, a data storing address 432, and a physical chunk number (#) 433.
A number (logical chunk number) of the logical chunk is stored in the index 431. A data storing address in which the data of the chunk corresponding to the logical chunk number of the index 431 is stored, is stored in the data storing address 432. A physical chunk number of the chunk corresponding to the logical chunk number of the index 431 is stored in the physical chunk number 433. In this data storing table 430, the logical chunk numbers of different logical chunks which store the same data are associated with the same physical chunk number and managed.
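The three table entries described above can be summarized as record types. The sketch below is only a reading aid: the Python field names are hypothetical renderings of the numbered fields (the reference numerals are given in comments), and `None` stands in for an empty (Null) link.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecondaryHashEntry:
    index: int                           # 411: secondary hash value
    entry_number: int = 0                # 412: total physical chunks under this index
    hash_a_chunks: int = 0               # 413
    hash_a_head: Optional[int] = None    # 414: head physical chunk # of list A
    hash_b_chunks: int = 0               # 415
    hash_b_head: Optional[int] = None    # 416: head physical chunk # of list B
    others_chunks: int = 0               # 417
    others_head: Optional[int] = None    # 418: head physical chunk # of list (others)


@dataclass
class PrimaryHashEntry:
    index: int                           # 421: physical chunk number
    primary_hash_value: bytes            # 422
    data_storing_address: int            # 423
    referenced_number: int = 0           # 424: logical chunks sharing this data
    pre_physical_chunk: Optional[int] = None   # 425
    next_physical_chunk: Optional[int] = None  # 426


@dataclass
class DataStoringEntry:
    index: int                           # 431: logical chunk number
    data_storing_address: int            # 432
    physical_chunk_number: Optional[int] = None  # 433


# Two logical chunks holding the same data share one physical chunk.
phys = PrimaryHashEntry(index=5, primary_hash_value=b"\x00" * 20,
                        data_storing_address=0x2000, referenced_number=2)
logical_a = DataStoringEntry(index=10, data_storing_address=0x2000,
                             physical_chunk_number=5)
logical_b = DataStoringEntry(index=11, data_storing_address=0x2000,
                             physical_chunk_number=5)
```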
An example of implementation of the storage apparatus 103 is given here, and the size etc, of the primary hash table 420, the secondary hash table 410, and the data storing table 430 shown in
On the storage apparatus 103, the size of the chunk is set to 8 KB, the size of the primary hash value is set to 20 Bytes, the size of the secondary hash value is set to 2 Bytes, and the size of the data storing address is set to 8 Bytes. Here, when the total capacity of the actual physical storage devices of the storage apparatus 103 is set to 1 PB, the number of physical chunks is 1.25×10^11. Moreover, when the logical total capacity expected with deduplication of chunks is set to 10 PB, the number of logical chunks is 1.25×10^12.
In this case, the size of the secondary hash table 410 is 48 Bytes (size of one entry)×65536=3 MB, which is small enough to be stored in the cache memory 119A. On the other hand, the primary hash table 420 is 56 Bytes (size of one entry)×1.25×10^11=7 TB, which is too large to be stored in the cache memory 119A in this example. Moreover, the data storing table 430 is 18 Bytes (size of one entry)×1.25×10^12, which is approximately 20 TB and is also too large to be stored in the cache memory 119A. Therefore, in this example, the primary hash table 420 and the data storing table 430 are stored in the LDEV 313 as described above.
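The sizing can be re-derived with a few lines of arithmetic. The sketch below assumes decimal units (1 KB = 10^3 bytes, 1 PB = 10^15 bytes), which is the interpretation that reproduces the stated chunk counts; the variable names are illustrative only.

```python
KB = 10**3
PB = 10**15

chunk_size = 8 * KB
physical_chunks = 1 * PB // chunk_size      # 1.25 x 10^11
logical_chunks = 10 * PB // chunk_size      # 1.25 x 10^12

secondary_table_bytes = 48 * 2**16          # one entry per 2-byte index value
primary_table_bytes = 56 * physical_chunks  # one entry per physical chunk
data_storing_table_bytes = 18 * logical_chunks  # one entry per logical chunk

print(secondary_table_bytes, primary_table_bytes, data_storing_table_bytes)
```

Under these assumptions the secondary hash table is 3,145,728 bytes (3 MB in binary units), the primary hash table is 7×10^12 bytes, and the data storing table with an 18-byte entry is 22.5×10^12 bytes, the same order as the 20 TB quoted in the text.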
Next, an operation by the storage apparatus 103 according to Example 1 is explained.
First, the write command process at the time of receiving a write command (WR command) from the host 101 is explained with reference to
(Step S11) The processor of the MPPK 121 of the storage apparatus 103 receives a write command from the host 101 ((1) of
(Step S12) The processor computes a logical chunk number corresponding from the I/O destination information ((2) of
(Step S13) The processor computes a primary hash value based on the write target data, and stores it in the cache memory 119A ((3) of
(Step S14) The processor computes a secondary hash value from the computed primary hash value ((4) of
(Step S15) The processor makes the computed secondary hash value as an index, acquires the entry (process target entry) corresponding to the index with reference to the secondary hash table 410 ((5) of
(Step S16) The processor determines whether the acquired entry stores an unsettled physical chunk number, that is, one on which the following process has not yet been performed. When no unsettled physical chunk number is stored (Step S16: No), the processor advances the process to the Step S28. On the other hand, when an unsettled physical chunk number is stored (Step S16: Yes), the processor acquires the physical chunk number of the entry (for example, the physical chunk number of the primary hash A link list head physical chunk number 414) and requests the entry of the primary hash table 420 corresponding to that physical chunk number ((6) of
(Step S17) The processor acquires an entry of the primary hash table 420 corresponding to the required physical chunk number. In addition, when the entry is not in the cache memory 119A, staging of the one or more entries (chunk) including the entry of the primary hash table 420 corresponding to the physical chunk number required from the LDEV 313 is carried out ((7) of
(Step S18) The processor computes a secondary hash value from the primary hash value of the primary hash value 422 of the entry corresponding to the physical chunk number acquired in (7) (
(Step S19) The processor performs a primary hash table consistency confirmation process (refer to
(Step S20) The processor compares the primary hash value which is computed in (3), with the primary hash value of the primary hash value 422 of the entry of the primary hash table 420 with consistency ((10) of
(Step S21) The processor determines whether the physical chunk number is stored in the next physical chunk number 426 of the process target entry of the primary hash table 420, and when the physical chunk number is stored (Step S21: Yes), the processor performs the process after the Step S17 about the physical chunk number, and when the physical chunk number is not stored (Step S21: No), it advances the process to the Step S16.
(Step S22) The processor acquires a data storing destination address from the data storing address 423 of the process target entry of the primary hash table 420, and acquires user data stored at the data storing destination address. In addition, when the data corresponding to this data storing destination address is not stored in the cache memory 119A, staging of the data (user data) of the data storing destination address of the LDEV 313 is carried out (
(Step S23) The processor computes a primary hash value from the user data acquired in (11) ((12) of
(Step S24) The processor performs a user data consistency confirmation process (refer to
(Step S25) The processor compares each byte of the write target data with the user data with consistency acquired in the Step S24, and determines whether these data correspond or not ((14) of
(Step S26) The processor performs a physical chunk number reference number update process (refer to
(Step S27) The processor performs the deduplication process (
(Step S28) The processor performs the process (Steps S29 through S32) which registers the write target data newly ((17) of
(Step S29) The processor secures a physical chunk assigned to the logical chunk of the write target data. That is, the processor determines the physical chunk number of the physical chunk to assign.
(Step S30) The processor updates the primary hash table 420 and the secondary hash table 410. Specifically, on the entry corresponding to the physical chunk number of the physical chunk to which the primary hash table 420 is assigned, the processor stores the primary hash value of the write target data in the primary hash value 422, and stores the data storing address of the physical chunk assigned to the data storing address 423, and stores 1 in the referenced number 424. Moreover, on the entry of the secondary hash table 410 corresponding to the secondary hash value corresponding to the primary hash value of the write target data, the processor stores the physical chunk number of this write target data in the head physical chunk number of the next link list of the link list managed in the entry. For example, when the primary hash A link list head physical chunk number 414 of the entry of the secondary hash table 410 is Null, the processor stores the physical chunk number of the write target data there. Moreover, when the primary hash A link list head physical chunk number 414 of the entry of the secondary hash table 410 is not Null, and the primary hash B link list head physical chunk number 416 is Null, the processor stores the physical chunk number of the write target data in the primary hash B link list head physical chunk number 416.
(Step S31) The processor updates the data storing table 430. Specifically, the processor stores the data storing address of the write target data in the data storing address 432 on the entry corresponding to the logical chunk number of the write target data of the data storing table 430, and stores the physical chunk number of the physical chunk assigned to the physical chunk number 433.
(Step S32) The processor stores the write target data in the storage destination of the LDEV 313 indicated by the data storing address of the assigned physical chunk. Here, since the LDEV 313 in which the primary hash table 420 is written differs from the LDEV 313 in which the write target data is written, there is almost no possibility that both the writing of the primary hash table 420 and the writing of the write target data fail to be performed, that is, that a non-written failure occurs on both sides. Therefore, even if a non-written failure occurs on one side, the probability that no non-written failure has occurred on the other side is very high.
Therefore, when a non-written failure occurs on only one side, the primary hash value computed based on the write target data differs from the corresponding primary hash value stored in the primary hash table 420, so it is detectable that a failure has occurred. Likewise, when a data corruption occurs, the primary hash value computed based on the write target data differs from the corresponding primary hash value stored in the primary hash table 420, so it is detectable that a failure has occurred.
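The overall shape of the write command process (Steps S11 through S32) can be sketched with an in-memory model. This is a strong simplification and not the described implementation: one dictionary keyed by primary hash value stands in for the secondary hash table and its link lists, the consistency confirmation processes and collection reads are omitted, SHA-1 is assumed as the hash algorithm, and releasing the previously assigned physical chunk on the new-registration path is an assumption of this sketch. All names are hypothetical.

```python
import hashlib


class DedupStore:
    def __init__(self):
        self.by_hash = {}    # primary hash -> physical chunk numbers (stand-in for link lists)
        self.primary = {}    # physical chunk # -> {"hash": ..., "refs": ...}
        self.chunks = {}     # physical chunk # -> stored data
        self.logical = {}    # logical chunk # -> physical chunk #
        self.next_physical = 0

    def _release(self, phys):
        """Drop one reference; unlink the chunk when no logical chunk uses it."""
        if phys is None:
            return
        entry = self.primary[phys]
        entry["refs"] -= 1
        if entry["refs"] == 0:
            self.by_hash[entry["hash"]].remove(phys)
            del self.primary[phys]
            del self.chunks[phys]

    def write(self, logical_no, data):
        h = hashlib.sha1(data).digest()              # S13: primary hash of write data
        old = self.logical.get(logical_no)
        for phys in self.by_hash.get(h, []):         # S15-S21: walk the candidates
            if self.chunks[phys] == data:            # S25: byte-for-byte comparison
                if phys != old:
                    self._release(old)               # S26: update reference counts
                    self.primary[phys]["refs"] += 1
                    self.logical[logical_no] = phys  # S27: share the physical chunk
                return phys
        self._release(old)
        phys = self.next_physical                    # S29: secure a physical chunk
        self.next_physical += 1
        self.primary[phys] = {"hash": h, "refs": 1}  # S30: register the hash
        self.by_hash.setdefault(h, []).append(phys)
        self.logical[logical_no] = phys              # S31: update the mapping
        self.chunks[phys] = data                     # S32: store the data
        return phys
```

Writing identical data to two logical chunks leaves a single physical chunk with a reference count of 2; overwriting one of them with different data allocates a new physical chunk and drops the count back to 1.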
The primary hash table consistency confirmation process corresponds to the process of the Step S19 ((9) of
(Step S41) The processor compares the secondary hash value computed in (8) with the secondary hash value computed in (4). As a result, when both sides are in agreement (Step S41: accordance), it is thought that a failure has not occurred in the entry of the acquired primary hash table 420, so the processor ends the process. On the other hand, when both sides are not in agreement (Step S41: discordance), it is thought that a failure (data corruption, non-written failure) has occurred in the entry of the primary hash table 420, so the processor advances the process to the Step S42.
(Step S42) The processor performs a collection read about the data including the entry of the primary hash table 420 acquired in (7). Therefore, data without failure is acquirable.
According to this primary hash table consistency confirmation process, an entry of the primary hash table 420 with consistency is acquirable.
The user data consistency confirmation process corresponds to the process of the Step S24 ((13) of
(Step S51) The processor compares the primary hash value computed in (12) with the primary hash value computed in (3). As a result, when both sides are in agreement (Step S51: accordance), it is thought that a failure has not occurred to the acquired user data, so the processor ends the process. On the other hand, when both sides are not in agreement (Step S51: discordance), it is thought that a failure (data corruption, non-written failure, etc.) has occurred to the user data, so the processor advances the process to the Step S52.
(Step S52) The processor performs a collection read about the user data. Therefore, user data without failure is acquirable.
According to this user data consistency confirmation process, user data with consistency is acquirable.
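Steps S51 and S52 amount to a compare-then-recover pattern, sketched below. SHA-1 is assumed as the hash algorithm, and the collection read is passed in as a callback because its mechanics depend on the RAID configuration; the names `confirm_user_data` and `collection_read` are hypothetical.

```python
import hashlib


def primary_hash(data: bytes) -> bytes:
    # Assumption: SHA-1; the text leaves the algorithm open.
    return hashlib.sha1(data).digest()


def confirm_user_data(staged: bytes, expected_hash: bytes, collection_read):
    """Return user data whose primary hash value matches expected_hash."""
    if primary_hash(staged) == expected_hash:   # Step S51: accordance
        return staged
    return collection_read()                    # Step S52: rebuild from redundancy
```

The primary hash table consistency confirmation process (Steps S41 and S42) has the same shape, with secondary hash values compared instead of primary hash values.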
The physical chunk number reference number update process corresponds to the process of the Step S26 ((15) of
(Step S61) The processor acquires an entry of the data storing table 430 corresponding to the logical chunk number computed in (2). Here, when the corresponding entry does not exist in the cache memory 119A, the processor carries out staging of the corresponding entry of the data storing table 430.
(Step S62) The processor determines whether the physical chunk number 433 of the acquired entry is NULL or not. As a result, when the physical chunk number 433 is NULL (Step S62: Yes), the processor advances the process to the Step S72, and on the other hand, when the physical chunk number 433 is not NULL (Step S62: No), it advances the process to the Step S63.
(Step S63) The processor carries out staging of the entry of the primary hash table 420 corresponding to the physical chunk number of the physical chunk number 433.
(Step S64) The processor compares the data storing address of the entry of the data storing table 430 acquired in the Step S61 with the data storing address of the entry of the primary hash table 420 acquired in the Step S63. As a result, when the data storing addresses are in agreement (Step S64: accordance), it is thought that a data corruption has not occurred in the entry of the data storing table 430 and the primary hash table 420, so the processor advances the process to the Step S69. On the other hand, when the data storing addresses are not in agreement (Step S64: discordance), a data corruption may have occurred in the entry of the data storing table 430, so the processor advances the process to Step S65.
(Step S65) The processor performs a collection read about the corresponding entry of the data storing table 430. As a result, if there is a case where a data corruption occurs in the entry of the data storing table 430, the data in which the data corruption is canceled is obtained.
(Step S66) The processor carries out staging of the entry of the primary hash table 420 corresponding to the physical chunk number of the physical chunk number 433 of the entry of the data storing table 430 read in the Step S65.
(Step S67) The processor compares the data storing address of the entry of the data storing table 430 read in the Step S65 with the data storing address of the entry of the primary hash table 420 acquired in the Step S66. As a result, when the data storing addresses are in agreement with each other (Step S67: accordance), it is thought that the data corruption has not occurred in the entry of the data storing table 430, so the processor advances the process to the Step S69. On the other hand, when the data storing addresses are not in agreement with each other (Step S67: discordance), it is thought that a data corruption has occurred in the entry of the primary hash table 420, so the processor advances the process to the Step S68.
(Step S68) The processor performs a collection read about the corresponding entry of the primary hash table 420. As a result, the entry of the primary hash table 420 without a data corruption is acquirable.
(Step S69) The processor subtracts 1 from the referenced number of the referenced number 424 of the entry of the primary hash table 420.
(Step S70) The processor determines whether the referenced number of the referenced number 424 of this entry is 0 or not. As a result, when the referenced number of the referenced number 424 of the entry is 0 (Step S70: Yes), the processor advances the process to the Step S71. On the other hand, when the referenced number of the referenced number 424 of the entry is not 0 (Step S70: No), the processor advances the process to the Step S72.
(Step S71) The processor deletes the target entry from the link list in which it is managed. Specifically, the processor couples the entry of the primary hash table 420 corresponding to the physical chunk number of the pre-physical chunk number 425 of the target entry to the entry of the primary hash table 420 corresponding to the physical chunk number of the next physical chunk number 426 of the target entry. That is, the processor stores the physical chunk number of the next physical chunk number 426 of the target entry in the next physical chunk number 426 of the entry corresponding to the physical chunk number of the pre-physical chunk number 425 of the target entry, and stores the physical chunk number of the pre-physical chunk number 425 of the target entry in the pre-physical chunk number 425 of the entry corresponding to the physical chunk number of the next physical chunk number 426 of the target entry.
(Step S72) The processor adds 1 to the referenced number of the referenced number 424 of the entry of the primary hash table 420 corresponding to the physical chunk number of the physical chunk of the user data determined as in agreement in (14), and ends the process.
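The reference-count update and link-list removal of the Steps S69 through S72 can be sketched as follows. This is only an illustration under assumptions, not the implementation of Example 1: the `Entry` class, the dictionary keyed by physical chunk number, the use of `None` for a missing neighbor, and the function names `release_reference` and `move_reference` are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for one entry of the primary hash table 420."""
    referenced_number: int            # corresponds to the referenced number 424
    pre_chunk: Optional[int] = None   # pre-physical chunk number 425 (None is an assumption)
    next_chunk: Optional[int] = None  # next physical chunk number 426 (None is an assumption)


def release_reference(table: Dict[int, Entry], chunk_no: int) -> bool:
    """Steps S69-S71: decrement the count and unlink the entry when it reaches 0.

    Returns True when the entry was removed from its link list.
    """
    entry = table[chunk_no]
    entry.referenced_number -= 1                      # Step S69
    if entry.referenced_number != 0:                  # Step S70: No
        return False
    # Step S71: couple the previous entry and the next entry to each other.
    if entry.pre_chunk is not None:
        table[entry.pre_chunk].next_chunk = entry.next_chunk
    if entry.next_chunk is not None:
        table[entry.next_chunk].pre_chunk = entry.pre_chunk
    entry.pre_chunk = entry.next_chunk = None
    return True


def move_reference(table: Dict[int, Entry], old_chunk: int, new_chunk: int) -> None:
    """Steps S69-S72: drop one reference to old_chunk, then add one to new_chunk."""
    release_reference(table, old_chunk)
    table[new_chunk].referenced_number += 1           # Step S72
```

In this sketch, an entry whose count reaches 0 is only detached from its neighbors; how the freed physical chunk is reused is outside what the steps above describe.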
Next, the read command process at the time of receiving a read command (RD command) from the host 101 is explained with reference to
(Step S81) The processor of the MPPK 121 of the storage apparatus 103 receives a read command from the host 101 ((1) of
(Step S82) The processor computes a logical chunk number corresponding from the I/O destination information ((2) of
(Step S83) Based on the computed logical chunk number, the processor accesses the data storing table 430, in order to acquire an entry corresponding to this logical chunk number ((3) of
(Step S84) The processor carries out staging of the entry of the data storing table 430 obtained by the access of the Step S83 ((4) of
(Step S85) The processor accesses the primary hash table 420, in order to acquire a corresponding entry using the physical chunk number of the physical chunk number 433 of the entry of the data storing table 430 corresponding to the logical chunk number computed in (2) ((5) of
(Step S86) The processor carries out staging of the entry of the primary hash table 420 obtained by the access of the Step S85 ((6) of
(Step S87) The processor compares the data storing address of the entry of the data storing table 430 staged in the Step S84 with the data storing address of the entry of the primary hash table 420 staged in the Step S86 ((7) of
(Step S88) The processor performs a collection read of the corresponding entry of the data storing table 430. As a result, even if a data corruption has occurred in the entry of the data storing table 430, an entry from which the data corruption has been eliminated is obtained.
(Step S89) The processor carries out staging of the entry of the primary hash table 420 corresponding to the physical chunk number of the physical chunk number 433 of the entry of the data storing table 430 acquired in the Step S88.
(Step S90) The processor compares the data storing address of the entry of the data storing table 430 acquired in the Step S88 with the data storing address of the entry of the primary hash table 420 acquired in the Step S89 ((8) of
(Step S91) The processor performs a collection read about the corresponding entry of the primary hash table 420. As a result, the entry of the primary hash table 420 without a data corruption is acquirable. Therefore, the data storing address of this entry is in agreement with the data storing address of the entry of the data storing table 430 acquired in the Step S88.
By the process of the above Step S86 through Step S91, an entry of the primary hash table 420 and the corresponding entry of the data storing table 430 that are consistent with each other can be acquired.
(Step S92) By using the data storing address whose consistency was secured in the Step S86 through Step S91, the processor accesses the LDEV 313 in order to acquire the chunk including the user data (read target data) of the read target ((9) of
(Step S93) The processor carries out staging of the chunk including the read target data obtained by the access of the Step S92 ((10) of
(Step S94) The processor computes a primary hash value based on the chunk including the read target data which is staged ((11) of
(Step S95) The processor compares the computed primary hash value with the primary hash value of the entry of the primary hash table 420 with consistency ((12) of
(Step S96) The processor carries out a collection read of the chunk including the read target data. Thereby, read target data without a data corruption can be acquired.
(Step S97) The processor sends the read target data in which no data corruption has occurred to the host 101 of the sending source of the read command ((13) of
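The checks performed in the Steps S83 through S97 can be sketched as follows. This is a simplified illustration under stated assumptions, not the processor's actual implementation: staging into the cache is reduced to dictionary lookups, the collection reads are passed in as hypothetical callables (`fix_data_entry`, `fix_hash_entry`, `fix_chunk`), and SHA-256 stands in for the primary hash function, which is not specified in this part of the description.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DataStoringEntry:
    """Hypothetical stand-in for an entry of the data storing table 430."""
    physical_chunk_number: int   # corresponds to the physical chunk number 433
    data_storing_address: int


@dataclass
class PrimaryHashEntry:
    """Hypothetical stand-in for an entry of the primary hash table 420."""
    primary_hash: bytes
    data_storing_address: int


def primary_hash(chunk: bytes) -> bytes:
    # Assumption: SHA-256 is used here only as an example hash function.
    return hashlib.sha256(chunk).digest()


def read_chunk(
    logical_chunk: int,
    data_table: Dict[int, DataStoringEntry],
    hash_table: Dict[int, PrimaryHashEntry],
    ldev: Dict[int, bytes],
    fix_data_entry: Callable[[int], DataStoringEntry],
    fix_hash_entry: Callable[[int], PrimaryHashEntry],
    fix_chunk: Callable[[int], bytes],
) -> bytes:
    d = data_table[logical_chunk]                              # Steps S83-S84
    h = hash_table[d.physical_chunk_number]                    # Steps S85-S86
    if d.data_storing_address != h.data_storing_address:       # Step S87
        d = fix_data_entry(logical_chunk)                      # Step S88
        h = hash_table[d.physical_chunk_number]                # Step S89
        if d.data_storing_address != h.data_storing_address:   # Step S90
            h = fix_hash_entry(d.physical_chunk_number)        # Step S91
    chunk = ldev[d.data_storing_address]                       # Steps S92-S93
    if primary_hash(chunk) != h.primary_hash:                  # Steps S94-S95
        chunk = fix_chunk(d.data_storing_address)              # Step S96
    return chunk                                               # Step S97
```

A chunk that still holds old data after a non-written failure hashes to a value different from the stored primary hash value, so it takes the Step S96 branch in this sketch.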
Next, the timing at which the occurrence of a failure in the user data, the data storing table 430, and the primary hash table 420 is checked, and the process of recovering from a failure that has occurred, are explained.
On the read command process, the confirmation of whether a failure has occurred on the data storing table 430 or not is performed in the Step S87 ((7) of
And when the data storing addresses are not in agreement, that is, when a data corruption may have occurred in the entry of the data storing table 430, an entry without a data corruption is acquired by performing a collection read about this entry. As a result, when a failure has occurred on the data storing table 430, the entry of the data storing table 430 transitions to the normal state, as shown in the arrow 1 of
In the read command process, the confirmation of whether a failure has occurred in the primary hash table 420 or not, is performed in the Step S90 ((8) of
And when the data storing addresses are not in agreement, that is, when a data corruption has occurred in the entry of the primary hash table 420, an entry without a data corruption is acquired by performing the collection read about this entry. As a result, when a failure has occurred on the entry of the primary hash table 420, the entry of the primary hash table 420 transitions to the normal state, as shown in the arrow 2 of
In the read command process, the confirmation of whether a failure has occurred in the user data or not is performed in the Step S95 ((12) of
And when the two primary hash values are not in agreement, that is, when a data corruption, etc., has occurred in the chunk of the user data, user data without a data corruption is acquired by performing the collection read about the chunk including this user data. As a result, when a failure has occurred in the user data, the user data transitions to the normal state, as shown in the arrow 3 of
As shown in
In the write command process, the confirmation of whether a failure has occurred in the entry of the primary hash table 420 is performed in the Step S19 ((9) of
And when both secondary hash values are not in agreement, that is, when a data corruption may have occurred in the entry of the primary hash table 420, an entry without a data corruption is acquired by performing the collection read about this entry. As a result, when a failure has occurred in the entry of the primary hash table 420, the entry of the primary hash table 420 transitions to the normal state, as shown in the arrow 4 of
In the write command process, the referenced number of the primary hash table 420 must be updated when a physical chunk number is updated, so the consistency of the entry of the data storing table 430 must be secured. The confirmation of whether a failure has occurred in the entry of this data storing table 430 or not is performed in the Step S26 ((15) of
And when the data storing addresses are not in agreement, that is, when a data corruption may have occurred in the entry of the data storing table 430, an entry without data corruption is acquired by performing the collection read about this entry. As a result, when a failure has occurred on the data storing table 430, the entry of the data storing table 430 transitions to the normal state, as shown in the arrow 5 of
In the write command process, the confirmation of whether a failure has occurred in the user data or not is performed in the Step S24 ((13) of
And when both primary hash values are not in agreement, that is, when a data corruption has occurred in the chunk of the user data, user data without a data corruption is acquired by performing the collection read about the chunk including this user data. As a result, when a failure has occurred in the user data, the user data transitions to the normal state, as shown in the arrow 6 of
As shown in
In Example 1, by the secondary hash table 410 and the primary hash table 420 as shown in
In Example 1, the entries of the primary hash table 420 referred to in a single write command process are the entries surrounded by the dashed line of
If an entry of the primary hash table 420 stored in the LDEV 313 were staged to the cache memory 119A each time an entry is referred to in the write command process, the overhead of this staging would become too large.
Therefore, in Example 1, the entries of the primary hash table 420 which store primary hash values yielding the same secondary hash value are assigned to the same physical chunk or to physically nearby physical chunks, so that the number of times entries referred to by the write command process are staged is reduced.
In Example 1, the processor is made to store multiple entries which become the same secondary hash value in the same physical chunk as shown in
In Example 1, as shown in
(a) Let the higher several bytes (for example, 2 bytes) of the physical chunk number be the secondary hash value of the newly stored user data.
(b) Let the several bits (for example, 18 bits) following the higher several bytes of (a) be the lower several bits of the primary hash value. Thus, by including a part of the primary hash value in the physical chunk number, the entries which have the same primary hash value can be consolidated into the same physical chunk or a nearby physical chunk.
(c) When entries can no longer be stored in one physical chunk, use the lowest several bits (for example, 4 bits) to select a further physical chunk, incrementing their value sequentially from 0.
(d) When all the physical chunks in which (a) and (b) are common are used, look for other physical chunks.
Next, the example at the time of assigning a physical chunk number according to the above-mentioned rule is explained.
Here, when a physical chunk number is 8 bytes and the number of physical chunks is 1.25×10^11, 37 bits of the physical chunk number are actually used as an index. Therefore, 0 is stored in bits 63-38 of the physical chunk number. The secondary hash value is stored in bits 37-22 of the physical chunk number. The lowest 18 bits of the primary hash value are stored in bits 21-4 of the physical chunk number. A value indicating one of the physical chunks prepared for storing the entries corresponding to user data for which the secondary hash value and the lowest 18 bits of the primary hash value are common is stored in bits 3-0 of the physical chunk number. In Example 1, for example, 16 physical chunks are prepared, and when an entry can no longer be stored in the physical chunk corresponding to 0, the physical chunk corresponding to 1 is used. Therefore, the entries corresponding to user data having the same secondary hash value and the same lowest 18 bits of the primary hash value can be consolidated and stored in 16 continuous physical chunks.
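The bit assignment described above can be sketched as follows, using the bit positions given in the text (bits 37-22, bits 21-4, and bits 3-0). The function names, the constant names, and the representation of hash values as Python integers are assumptions made for this illustration.

```python
SECONDARY_BITS = 16    # bits 37-22: secondary hash value
PRIMARY_LOW_BITS = 18  # bits 21-4: lowest 18 bits of the primary hash value
SLOT_BITS = 4          # bits 3-0: which of the 16 prepared physical chunks


def make_physical_chunk_number(secondary_hash: int, primary_hash: int, slot: int) -> int:
    """Pack the three fields into one physical chunk number (upper bits stay 0)."""
    if not 0 <= secondary_hash < (1 << SECONDARY_BITS):
        raise ValueError("secondary hash value does not fit in 16 bits")
    if not 0 <= slot < (1 << SLOT_BITS):
        raise ValueError("slot must be in the range 0-15")
    low = primary_hash & ((1 << PRIMARY_LOW_BITS) - 1)  # keep only the lowest 18 bits
    return (secondary_hash << (PRIMARY_LOW_BITS + SLOT_BITS)) | (low << SLOT_BITS) | slot


def candidate_chunks(secondary_hash: int, primary_hash: int) -> list:
    """All 16 physical chunk numbers sharing the same fields of rules (a) and (b)."""
    return [
        make_physical_chunk_number(secondary_hash, primary_hash, slot)
        for slot in range(1 << SLOT_BITS)
    ]
```

Because only the lowest 4 bits differ, the 16 candidates returned by `candidate_chunks` are consecutive numbers, which is what allows the corresponding entries to be stored in continuous physical chunks.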
For example, when the new data is the primary hash value which is shown in
In this example, supposing data is evenly assigned to all the secondary hash values, the entries of the primary hash table 420 corresponding to primary hash values having the same secondary hash value can be stored in a continuous area within 96 MB. Therefore, in the write command process, when searching for an entry of the primary hash table 420, the possibility of a cache hit can be made comparatively high, and the number of times the required entries of the primary hash table 420 are staged can be reduced.
According to the above-mentioned Example 1, for example, if one sector of a physical storage device is 520 bytes, 520 bytes of data consisting of 512 bytes of user data with 8 bytes of LA/LRC added can be stored in each sector. However, in a SATA physical storage device, since one sector is 512 bytes, when LA/LRC is added, data must be stored in units of a common multiple (for example, 33,280 bytes) of 520 bytes and 512 bytes. On the other hand, in Example 1, since a failure of the user data is detectable even if LA/LRC is not added to the user data, there are no such restrictions and the performance can be improved.
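The 33,280-byte unit mentioned above is the least common multiple of 520 bytes and 512 bytes, which can be checked with the short calculation below. The function name `common_unit` is a hypothetical name used only for this illustration.

```python
import math


def common_unit(block_with_la_lrc: int, sector_size: int) -> int:
    """Least common multiple of the LA/LRC-extended block size and the sector size."""
    return block_with_la_lrc * sector_size // math.gcd(block_with_la_lrc, sector_size)


unit = common_unit(520, 512)
blocks_per_unit = unit // 520   # number of 520-byte blocks in one unit
sectors_per_unit = unit // 512  # number of 512-byte sectors in one unit
```

That is, 64 blocks of 520 bytes occupy exactly 65 sectors of 512 bytes, so data with LA/LRC must be handled in groups of 65 sectors on such a device.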
Next, the modification of Example 1 is explained.
In Example 1, when searching for the entry of the primary hash table 420 using one secondary hash value, as shown in
On the other hand, the data structure of the multiple entries of the primary hash table 420 may be a binary tree (for example, a red-black tree).
If the data structure of the multiple entries of the primary hash table 420 is a binary tree (for example, a red-black tree), the length of the link can be shortened as shown in
Next, the storage system according to Example 2 is explained.
The storage system according to Example 2 comprises an FMCMPK 122 in place of the CMPK 119.
The FMCMPK 122 is a storage device (flash memory device) which has one or more flash memories (FM) and a memory controller coupled to the one or more flash memories. Typically, the memory controller includes a processor which performs processing. The flash memory is, for example, of a type in which the unit of data erasure (block unit) is larger than the unit of data reading and writing (page unit), and in which data cannot be overwritten. The flash memory is typically a NAND type flash memory. This kind of flash memory consists of multiple blocks, and each block consists of multiple pages. In Example 2, since the flash memory is used, a large capacity can be secured cheaply compared with the cache memory configured from the DRAM of Example 1. Therefore, in Example 2, the whole primary hash table 420 is stored in the cache memory. Since it is then not necessary to stage the primary hash table 420 from the PDEV 105, such as an HDD, the process efficiency improves.
Moreover, the processor of the FMCMPK 122 may be made to perform a part of the process (for example, calculation of hash value and deduplication process, etc.) which the processor of the MPPK 121 has performed. In this way, the load of the processor of the MPPK 121 can be reduced and the process efficiency of the whole system can be improved.
In addition, depending on the capacity of the cache memory of the FMCMPK 122, the data storing table 430 may also be stored in the cache memory. In this case, since staging of the data storing table 430 also becomes unnecessary, the process efficiency improves.
In addition, the unit of deduplication corresponds to the page unit of FM. Since FM can only perform read and write operations on a page-by-page basis, performing deduplication in units of pages is useful for eliminating waste and ensuring efficient use of capacity.
Although some examples have been explained above, they are illustrations for explaining the present invention, and are not intended to limit the scope of the present invention to these examples. That is, the present invention can be carried out in various other forms.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2012/007830 | 12/6/2012 | WO | 00 | 1/15/2014 |