Some storage systems have configurations that use fast solid-state disks (e.g., Flash drives), to store frequently-accessed data in order to increase access speed to such frequently-accessed data nominally stored on disk drives. FAST Cache, a technology made available by EMC Corp., is an example of such a configuration. The solid-state disks are typically placed in a path between a hard disk and a DRAM cache in the storage system. In this way, the storage system can also use the solid-state disks as a place to offload data from the DRAM cache with a reduced penalty in access time from moving such data to the hard disk.
While such configurations achieve high performance, Redundant Arrays of Independent Disks (RAID) are commonly used to provide highly reliable access to large amounts of data storage. There are several types of RAID, ranging from simpler RAID 0 and RAID 1 (data mirroring) through more complex RAID 5 and RAID 6. RAID 5 encodes stripes of data across a plurality of disks with one disk (which rotates from stripe to stripe) storing a parity redundancy code for that stripe, which allows stored data to be recovered even in the event of a disk failure. This parity code involves performing a compound exclusive-or (XOR) operation on corresponding blocks on the different disks. RAID 6 employs a similar approach, but uses two redundancy disks, allowing stored data to be recovered even in the event of two disk failures. There are several ways of calculating the values stored on the redundancy disks for RAID 6, such as even-odd parity (which involves storing row parity on one disk and diagonal parity on another disk) and Reed-Solomon encoding.
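By way of illustration only, the RAID 5 parity computation described above can be sketched as follows. The block contents and the helper name `xor_blocks` are hypothetical; the sketch covers a single stripe and ignores parity rotation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (one block per disk)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of one stripe (illustrative values only).
data_blocks = [b"\x0f\xa0\x33", b"\xf0\x0a\x55", b"\x3c\xc3\x99"]
parity_block = xor_blocks(data_blocks)

# Simulate the loss of the second disk and rebuild it from the survivors:
# XOR-ing all remaining blocks, parity included, yields the missing block.
survivors = [data_blocks[0], data_blocks[2], parity_block]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, the same routine serves both for computing parity and for rebuilding a single lost block.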
In some cases, the reliability of high-performance storage systems such as those described above is improved by using a RAID configuration. While maximum distance separable codes such as the Reed-Solomon codes found in RAID 6 are efficient in that they minimize the number of parity disks required for a given level of redundancy, encoding and decoding of such codes is typically too complex for high-performance storage systems such as FAST Cache. Rather, for reliability, a conventional storage system employing FAST Cache arranges the solid-state disks in a simpler RAID 1 array. In this way, the storage system maintains its high performance while improving reliability.
Unfortunately, there are deficiencies with the conventional storage system employing FAST Cache. For example, RAID 1 results in a payload capacity that is only 50% of the physical disk space. Also, there is empirical evidence that the reliability of RAID 1 is not sufficient for systems employing such high-performance disks.
In contrast to the conventional storage system employing FAST Cache, which uses a relatively unreliable, high-cost redundancy scheme, an improved technique applies polar codes to storage data to improve the reliability of a storage system that uses high-performance solid-state disks as part of a RAID group for storing frequently-accessed data. Along these lines, a high-performance storage system having n solid-state disks assigns k of those disks as payload disks. The storage system partitions the payload data into a data vector that has k data symbols. The storage system then applies, to the k payload symbols, an (n, k) polar code with a generator matrix derived from k rows of the $\lceil \log_2 n \rceil$-fold Kronecker product of a 2×2 polar seed matrix with itself. For the case of systematic encoding, the generator matrix is reduced to the canonical form containing a k×k identity submatrix in some positions; this matrix is used to produce n encoded symbols from the k original ones, and the storage system stores each of the encoded symbols in a solid-state disk of the RAID group.
Advantageously, the improved technique involves a reduced-complexity encoding method for the high-performance disks, while still using fewer parity disks than simple codes like RAID 1. Decoding and partial stripe update operations on the encoded data also have reduced complexity. The reduced number of parity disks involved in using polar codes stems from the fact that polar codes can achieve the theoretical capacity of a binary-input, output-symmetric memoryless channel. By splitting such a channel into n subchannels, it can be shown that, when the capacity of the channel is k/n, k of those subchannels will be noise-free for large values of n. Thus, encoding with polar codes requires fewer extra subchannels or, in the case of storage, fewer parity disks. Further, by using a systematic generalized concatenated code (GCC) formulation of polar codes, the encoding complexity is further reduced by approximately a factor of 2 (for the case of high-rate codes) compared to other encoding algorithms for polar codes.
One embodiment of the improved technique is directed to a method of reliably storing data within a storage system having high-performance solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks. The method includes partitioning the payload data into a data vector that includes k data symbols. The method further includes applying an (n, k) polar code generator matrix to the data vector to produce a code vector that includes n encoded symbols, the (n, k) polar code generator matrix including exactly k rows of an n×n matrix that is derived from the $\lceil \log_2 n \rceil$-fold Kronecker product of a 2×2 polar seed matrix with itself. The method further includes storing each of the n encoded symbols of the code vector in a solid-state disk of the RAID group.
Additionally, some embodiments of the improved technique are directed to an apparatus constructed and arranged to reliably store data within a storage system having high-performance solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks. The apparatus includes a network interface, memory, and a controller including controlling circuitry constructed and arranged to carry out the method of reliably storing data within a storage system having high-performance solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks.
Furthermore, some embodiments of the improved technique are directed to a computer program product having a non-transitory computer readable storage medium which stores code including a set of instructions to carry out the method of reliably storing data within a storage system having high-performance solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks.
The foregoing and other objects, features and advantages will be apparent from the following description of particular embodiments of the invention, as illustrated in the accompanying figures in which like reference characters refer to the same parts throughout the different views.
An improved technique applies polar codes to storage data to improve the reliability of a storage system that uses high-performance solid-state disks as part of a RAID group for storing frequently-accessed data. Along these lines, a RAID group having n solid-state disks assigns k of those disks as payload disks. The storage system partitions the payload data into a data vector that has k data symbols. The storage system then applies, to the k payload symbols, an (n, k) polar code generator matrix derived from k rows of the $\lceil \log_2 n \rceil$-fold Kronecker product of a 2×2 polar seed matrix with itself to produce n encoded symbols, and stores each of the encoded symbols in a solid-state disk of the RAID group.
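By way of illustration only, the construction of a non-systematic generator matrix and the encoding of a data vector can be sketched as follows. The seed matrix is not reproduced in this text, so the value of `SEED` below is an assumption; the index set reuses the (8,5) example set A={1, 2, 3, 5, 7} given later in this description, and the helper names `kron`, `kron_power` and `encode` are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two 0/1 matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def kron_power(seed, s):
    """s-fold Kronecker product of `seed` with itself."""
    result = [[1]]
    for _ in range(s):
        result = kron(result, seed)
    return result

def encode(data, generator):
    """Row vector times generator matrix, with arithmetic over GF(2)."""
    n = len(generator[0])
    return [sum(d * row[j] for d, row in zip(data, generator)) % 2 for j in range(n)]

SEED = [[1, 1], [0, 1]]      # ASSUMPTION: the source's seed matrix is not shown
n, k = 8, 5
s = (n - 1).bit_length()     # equals ceil(log2 n) for n >= 1
A = [1, 2, 3, 5, 7]          # 1-based row indices from the (8,5) example

full = kron_power(SEED, s)               # 8 x 8 matrix
g_prime = [full[i - 1] for i in A]       # k rows form the generator matrix
data = [1, 0, 1, 1, 0]                   # k payload symbols (illustrative)
codeword = encode(data, g_prime)         # n encoded symbols, one per disk
```

The index set is passed in rather than computed; selecting the k most reliable subchannels is a separate step that this sketch does not attempt.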
Advantageously, the improved technique involves reduced-complexity encoding for the high-performance disks, while still using fewer parity disks than simple codes like RAID 1. Decoding and partial stripe update operations on the encoded data also have reduced complexity. The reduced number of parity disks involved in using polar codes stems from the fact that polar codes can achieve the theoretical capacity of a binary-input, output-symmetric memoryless channel. By splitting such a channel into n subchannels, it can be shown that, when the capacity of the channel is k/n, k of those subchannels will be noise-free for large values of n. In this case the number of parity disks, which is equal to the number of frozen subchannels, is the minimum possible for a reliable system with k payload disks. Further, by using a systematic encoding algorithm developed for the generalized concatenated code (GCC) formulation of polar codes, the encoding complexity is reduced by approximately a factor of 2 (for the case of high-rate codes) compared to other encoding algorithms for polar codes.
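The polarization effect mentioned above can be observed numerically. The following sketch assumes, purely for illustration, that each disk behaves like a binary erasure channel with erasure probability p; for that channel one polarization step with a 2×2 kernel turns an erasure probability z into the pair 2z − z² and z². The function names and the threshold are hypothetical, and the sketch makes no claim about which matrix row corresponds to which subchannel.

```python
def subchannel_erasure_probabilities(p, s):
    """Erasure probabilities of the 2**s subchannels after s polarization steps."""
    probs = [p]
    for _ in range(s):
        # Each subchannel splits into a degraded one and an upgraded one.
        probs = [q for z in probs for q in (2 * z - z * z, z * z)]
    return probs

def summarize(p, s, threshold=1e-3):
    """Return (n, fraction nearly clean, fraction nearly useless, mean)."""
    probs = subchannel_erasure_probabilities(p, s)
    n = len(probs)
    nearly_clean = sum(z < threshold for z in probs) / n
    nearly_useless = sum(z > 1 - threshold for z in probs) / n
    return n, nearly_clean, nearly_useless, sum(probs) / n

if __name__ == "__main__":
    for s in (4, 8, 12):
        print(summarize(0.25, s))
```

The mean erasure probability is preserved by every step, while the individual values drift toward 0 or 1 as s grows, which is the behavior the paragraph above relies on.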
Application server 12 is configured to store data in storage system 14. Application server 12 is a server system. In some arrangements, however, application server 12 is a desktop personal computer, a laptop personal computer, a tablet computer, a smart phone, or any other electronic device that is enabled to store data in storage system 14.
Storage system 14 is configured to store data 24 from application server 12 in disk array 21. Storage system 14 is further configured to store frequently-accessed data 26 in solid-state disk array 19 and apply polar codes for encoding and decoding data and updating partial stripes. Storage system 14 includes solid-state disk array 19, disk array 21, and DRAM cache 22.
Disk array 21 includes disks 20(1), 20(2), 20(3), . . . , 20(r). Each disk of disk array 21 is a hard disk drive such as a magnetic disk drive, although other types of disks are possible (e.g., slow Flash, optical disk). In some arrangements, disk array 21 is a RAID.
Solid-state disk array 19 takes the form of a set of flash drives 18(1), . . . , 18(k−2), 18(k−1), 18(k), 18(k+1), 18(k+2), . . . , 18(n).
DRAM cache 22 is configured to provide very fast access to a small subset of data 24. DRAM cache 22 is also configured to send data to solid-state disk array 19 and/or disk array 21 when that data is no longer in sufficiently active use by storage system 14.
During operation, application server 12 sends data 24 to disk array 21 for storage. Over time, storage system 14 recognizes data 26 that has been frequently accessed by application server 12. Storage system 14 then moves data 26 to solid-state disk array 19 for fast access.
It should be understood that storage system 14 splits data 26 into a set of k symbols to be stored among the n solid-state disks of array 19. Each symbol includes a set of characters chosen from a fixed alphabet which represents a portion of data 26. A code includes a lexicon of codewords of length n consisting of such characters, and as such is characterized by a minimum distance between distinct codewords of the lexicon. For the purposes of the discussion below, the alphabet of the codes is taken to be the set {0,1}.
It should be understood that the disks within a storage system are characterized by a failure probability. The linear transformation given by the s-fold Kronecker product of the 2×2 polar seed matrix with itself, where $s=\lceil \log_2 n \rceil$, induces n subchannels, which are characterized by substantially different erasure probabilities. Let A be the set of k subchannels with the smallest erasure probabilities. The non-systematic generator matrix G′ of a polar code is obtained by taking the rows of this Kronecker product with indices in A. By applying elementary row operations to matrix G′, it is possible to obtain a systematic generator matrix G for the same code, which contains an identity submatrix in the columns given by A. Such a systematic generator matrix is advantageous from the application point of view, since it enables one to partition the codeword into information symbols, which correspond to the payload data, and parity symbols, which are used to recover the payload data if some of the disks fail.
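The row operations that turn a non-systematic matrix G′ into a systematic matrix G can be sketched as a Gauss-Jordan elimination over GF(2). The 5×8 matrix below is a hypothetical example: it consists of rows 1, 2, 3, 5 and 7 of the 3-fold Kronecker power of an assumed seed matrix ((1 1), (0 1)), which is not necessarily the seed matrix of this description. The function name `systematic_form` is likewise hypothetical.

```python
def systematic_form(g_prime, info_cols):
    """Row-reduce g_prime over GF(2) so that columns info_cols form an identity.

    Raises StopIteration if the chosen columns are linearly dependent.
    """
    g = [row[:] for row in g_prime]
    for r, col in enumerate(info_cols):
        # Find a row at or below r with a 1 in the current column.
        pivot = next(i for i in range(r, len(g)) if g[i][col])
        g[r], g[pivot] = g[pivot], g[r]
        # Clear that column in every other row (XOR is addition in GF(2)).
        for i in range(len(g)):
            if i != r and g[i][col]:
                g[i] = [a ^ b for a, b in zip(g[i], g[r])]
    return g

G_PRIME = [                      # ASSUMED example matrix, see lead-in
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1],
]
INFO_COLS = [0, 2 - 1, 3 - 1, 5 - 1, 7 - 1]   # A = {1, 2, 3, 5, 7}, 0-based

G = systematic_form(G_PRIME, INFO_COLS)
```

After the reduction, the columns listed in `INFO_COLS` carry the payload symbols unchanged and the remaining columns carry the parity symbols.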
It should further be understood that the arrangement of the solid-state disks of array 19 are shown as in
and $s=\lceil \log_2 n \rceil$. The k rows are those rows having indices in A. For example, an (8,5) polar code generator matrix takes the form
In this case, the set A={1, 2, 3, 5, 7}.
Storage system 14 applies a k×n generator matrix in canonical form of polar code 34 to a 1×k row vector of payload data 30 to obtain a set of n encoded symbols containing n−k check symbols 32 in the positions {1 . . . n}\A. Storage system 14 stores encoded symbols 30 and 32 in solid-state disk array 19.
It should be understood that the formulation for deriving encoded symbols 30 and 32 as described above does not result in the most efficient storage scheme. Details of alternative schemes for applying polar codes to data 26 are described below.
Further details of storage system 14 are described below.
Network interface 42 takes the form of an Ethernet card; in some arrangements, network interface 42 takes other forms including a wireless receiver and a token ring card.
Memory 46 is configured to store code 48 that contains instructions configured to cause the processor to carry out the improved technique. Memory 46 generally takes the form of, e.g., random access memory, flash memory or a non-volatile memory.
Processor 44 takes the form of, but is not limited to, Intel or AMD-based MPUs, and can include a single core or multiple cores, each running single or multiple threads. In some arrangements, processor 44 is one of multiple processors working together.
Further details of applying polar codes to encode data 26 are discussed below.
A GCC representation of an (n, k) polar code having a generator matrix G involves decomposing a vector of the k payload data symbols 30 into v subvectors, each of length $k_i$, $1 \le i \le v$, such that $\sum_{i=1}^{v} k_i = k$.
For example, consider a vector of data 30 arranged as illustrated in
It should be understood that there are several ways of splitting the vector of data 30 into subvectors. For example, processor 44 could arrange the vector of data 30 into 2 rows, or 8 rows. If the number of rows is $v=2^l$, then the decomposition is said to be of order l.
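A GCC decomposition of order l of G′ follows from the identity that the s-fold Kronecker product of the seed matrix with itself equals the Kronecker product of its (s−l)-fold and l-fold Kronecker powers. That is, a GCC can be split into a set of $(N, k_i)$ outer codes, $1 \le i \le v$, with $N=2^{s-l}$, and a set of $(v, v-i+1)$ nested inner codes. The outer codes and the inner codes are each examples of polar codes.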
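The Kronecker-power identity underlying the decomposition holds for any square matrix and can be checked directly. In the sketch below the seed matrix value is an assumption (the matrix is not reproduced in this text), and the helper names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def kron_power(seed, s):
    """s-fold Kronecker product of `seed` with itself."""
    result = [[1]]
    for _ in range(s):
        result = kron(result, seed)
    return result

SEED = [[1, 1], [0, 1]]   # ASSUMPTION: placeholder for the 2x2 seed matrix
s, l = 4, 2               # n = 2**s = 16 symbols, decomposition of order l = 2

v = 2 ** l                # number of rows, i.e. length of the inner codes
N = 2 ** (s - l)          # length of each outer code

whole = kron_power(SEED, s)
split = kron(kron_power(SEED, s - l), kron_power(SEED, l))
identity_holds = whole == split
```

With s = 4 and l = 2 this gives v = 4 rows of N = 4 symbols each, matching the 4×4 arrays used in the example that follows.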
The i-th outer code operates on a subvector of data 30 within a row of the array of intermediate symbols 62 to produce the rest of the elements of that row. For an l=2 decomposition, the resulting array of intermediate symbols 62 has dimension 4×4. In the array of intermediate symbols, the first row has one element from the vector of data 30, so that the outer code generator matrix for that row needs to produce three additional entries. Outer code 1 in this case has a 1×4 generator matrix, which contains as a submatrix the check symbols generator matrix $B^{(1)}=(1\ 1\ 1)$; this implies that the subsequent elements of the first row are equal to the first element.
outer code 3 has check symbols generator matrix
Note that, because the fourth row is filled with data, there is no outer code needed there.
It should be understood that the lengths of the subvectors, i.e., the dimensions $k_i$ of the outer codes, may not be arranged in ascending order after decomposition of the polar code. In such a case, the generator matrix for the first inner code may not form a lower-triangular matrix. Such a lower-triangular matrix is desirable in the case of systematic encoding, as will be seen below.
Once the processor fully forms the array of intermediate encoded symbols 62, processor 44 then applies a v×v transposed generator matrix of the first inner code 64 to the array of intermediate encoded symbols 62 to produce the n=Nv final encoded symbols 66 stored in a v×N array.
Inner code 1 is generated by a permutation matrix multiplied by the l-fold Kronecker product of the seed matrix with itself, so that the transposed generator matrix 64 $V^{(1)}$ of inner code 1 is upper triangular. In the case illustrated in
Multiplying the array 62 by $V^{(1)}$ 64 yields the array of final encoded symbols 66.
It should be understood that the final encoded symbols 66 are different from the data symbols 30. In this case, reading the payload data may be more complex than in the case of systematic encoding. In a systematic code, the encoded symbols include the original data symbols in addition to some check symbols. Such a systematic encoding for polar codes is discussed below.
Consider a representation of the i-th outer code generator matrix in the canonical form $(I\ B^{(i)})$, where I is the $k_i \times k_i$ identity matrix, and $B^{(i)}$ is a $k_i \times (N-k_i)$ check symbols generator matrix. By considering the construction of the array of intermediate symbols and the array of final encoded symbols from the above example, one can show that processor 44 may generate the check symbols of the array of final encoded symbols 70 according to the following expression:
ci,k
where ci,j is the element of the array of final encoded symbols 70 in the ith row and jth column, ci
It should be understood that Equation (*) represents a recursive set of expressions to be evaluated in the order specified.
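For the above example, the check symbols are computed as follows:
$$
\begin{aligned}
c_{3,4} &= c_{3,1}+c_{3,2}+c_{3,3}+c_{4,1}+c_{4,2}+c_{4,3}-c_{4,4},\\
c_{2,3} &= c_{2,1}+c_{2,2}+c_{4,1}+c_{4,2}-c_{4,3},\\
c_{2,4} &= c_{2,2}+c_{4,2}-c_{4,4},\\
c_{1,2} &= c_{1,1}+c_{2,1}-c_{2,2}+c_{3,1}-c_{3,2}+c_{4,1}-c_{4,2},\\
c_{1,3} &= c_{1,1}+c_{2,1}-c_{2,3}+c_{3,1}-c_{3,3}+c_{4,1}-c_{4,3},\\
c_{1,4} &= c_{1,1}+c_{2,1}-c_{2,4}+c_{3,1}-c_{3,4}+c_{4,1}-c_{4,4}.
\end{aligned}
$$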
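The six check-symbol expressions listed above can be exercised with a short sketch. Arithmetic is over the alphabet {0,1}, so addition and subtraction both reduce to XOR. The function names are hypothetical, and the mapping from final symbols back to intermediate rows used in `intermediate_rows` is inferred from the listed expressions rather than quoted from this description.

```python
from itertools import product

def encode_16_10(row1, row2, row3, row4):
    """Place 1+2+3+4 information bits in a 4x4 array and fill in 6 check bits."""
    c = [[0] * 4 for _ in range(4)]
    c[0][:1], c[1][:2], c[2][:3], c[3][:4] = row1, row2, row3, row4
    # Row 3 (index 2): one check symbol, c3,4.
    c[2][3] = c[2][0] ^ c[2][1] ^ c[2][2] ^ c[3][0] ^ c[3][1] ^ c[3][2] ^ c[3][3]
    # Row 2 (index 1): two check symbols, c2,3 and c2,4.
    c[1][2] = c[1][0] ^ c[1][1] ^ c[3][0] ^ c[3][1] ^ c[3][2]
    c[1][3] = c[1][1] ^ c[3][1] ^ c[3][3]
    # Row 1 (index 0): three check symbols, evaluated last because they
    # depend on the check symbols of rows 2 and 3.
    for j in (1, 2, 3):
        c[0][j] = c[0][0] ^ c[1][0] ^ c[1][j] ^ c[2][0] ^ c[2][j] ^ c[3][0] ^ c[3][j]
    return c

def intermediate_rows(c):
    """INFERRED: undo the inner code to get the rows seen by the outer codes."""
    def xor(*rows):
        return [sum(bits) % 2 for bits in zip(*rows)]
    return [xor(c[0], c[1], c[2], c[3]), xor(c[1], c[3]), xor(c[2], c[3]), list(c[3])]

def outer_constraints_hold(c):
    """Row 1 is constant, row 3 has even parity, row 2 obeys two parity rules."""
    a = intermediate_rows(c)
    return (a[0][1] == a[0][2] == a[0][3] == a[0][0]
            and a[1][2] == a[1][0] ^ a[1][1] and a[1][3] == a[1][1]
            and a[2][3] == a[2][0] ^ a[2][1] ^ a[2][2])

def all_consistent():
    """Check every one of the 2**10 possible payloads."""
    return all(
        outer_constraints_hold(encode_16_10(b[0:1], b[1:3], b[3:6], b[6:10]))
        for b in product((0, 1), repeat=10)
    )

example = encode_16_10([1], [0, 1], [1, 1, 0], [1, 0, 0, 1])
```

In the resulting array the ten information bits appear unchanged, which is the systematic property, and only the six remaining positions are computed.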
With systematic encoding, processor 44 needs only to generate n−k check symbols 32.
Two decoding schemes, i.e., schemes for reconstructing the payload data in the presence of erasures, which correspond to disk failures, are discussed below. The first scheme employs the GCC representation of polar codes, and the second scheme is based on Gaussian elimination.
In step 80, the smallest row index h in encoded data matrix having an erasure is found. A new matrix
In step 82, an index i←h is set, and a loop over another index j←1 . . . N is initialized. A GCC with index h is constructed that includes a set of inner codes indexed by (h,i) and a set of outer codes indexed by i. The inner code (h,i) has a generator matrix equal to $V^{(1)}_{i \ldots v,\, h \ldots v}$, while the outer code i has a generator matrix as described above, where i=h . . . v.
In step 84, a decode operation is performed on columns of
In step 86, a decode operation is performed on intermediate symbols ai,j . . . N in outer code i to recover erased symbols. If this fails, then index h is decremented by one, the index i←h is set, and the loop over the index j←1 . . . N is reinitialized. If no further erasure is revealed, then the process proceeds to step 88.
In step 88, a new loop over the index j←h . . . i is initialized, with j being incremented at each step of the loop. The symbols are adjusted according to $c_{j,1 \ldots N} \leftarrow c_{j,1 \ldots N} - a_{i,1 \ldots N}$ when $V^{(1)}_{j,i}=1$.
In step 90, a decoding operation is performed on the adjusted codeword symbols in inner code h,i+1. Upon completion of this step, the original data 30 and 32 are recovered so long as there were no failures.
Gaussian elimination represents the maximum likelihood erasure decoding scheme for any linear code. Further details of such a scheme are described below.
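In step 102, a subset of non-erased symbols $x_{\{1 \ldots n\} \setminus \varepsilon}$ is formed, where ε denotes the set of erased positions.
In step 104, subsets of erased and non-erased information symbols, $x_{A \cap \varepsilon}$ and $x_{A \setminus \varepsilon}$, are formed.
In step 106, a submatrix $G_{A \setminus \varepsilon,\, \{1 \ldots n\} \setminus \varepsilon}$ of the generator matrix is formed.
In step 108, a submatrix $G_{A \cap \varepsilon,\, \{1 \ldots n\} \setminus \varepsilon}$ of the generator matrix is formed.
In step 110, the following equation is formed and solved using Gaussian elimination for $x_{A \cap \varepsilon}$:
$$x_{\{1 \ldots n\} \setminus \varepsilon} - x_{A \setminus \varepsilon}\, G_{A \setminus \varepsilon,\, \{1 \ldots n\} \setminus \varepsilon} = x_{A \cap \varepsilon}\, G_{A \cap \varepsilon,\, \{1 \ldots n\} \setminus \varepsilon}.$$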
In this way, processor 44 can recover erased information symbols.
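The steps above can be sketched as follows for symbols over {0,1}. The 5×8 systematic matrix `G` and the information positions `INFO_POS` are hypothetical example values derived from an assumed seed matrix, not the matrices of this description; positions are 0-based and all function names are illustrative.

```python
G = [                                # ASSUMED systematic generator matrix
    [1, 0, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
]
INFO_POS = [0, 1, 2, 4, 6]           # column of the information symbol of row r

def solve_gf2(equations, num_unknowns):
    """Gauss-Jordan over GF(2); returns None if some unknown is undetermined."""
    rows = [coeffs[:] + [rhs] for coeffs, rhs in equations]
    for col in range(num_unknowns):
        hit = next((i for i in range(col, len(rows)) if rows[i][col]), None)
        if hit is None:
            return None
        rows[col], rows[hit] = rows[hit], rows[col]
        for i in range(len(rows)):
            if i != col and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[col])]
    return [rows[i][-1] for i in range(num_unknowns)]

def recover_info(received, erased):
    """received: list of n bits (None where erased); erased: set of positions."""
    known = [j for j in range(len(G[0])) if j not in erased]          # step 102
    known_rows = [r for r, p in enumerate(INFO_POS) if p not in erased]
    lost_rows = [r for r, p in enumerate(INFO_POS) if p in erased]    # step 104
    equations = []
    for j in known:
        rhs = received[j]
        for r in known_rows:                                          # step 106
            rhs ^= received[INFO_POS[r]] & G[r][j]
        equations.append(([G[r][j] for r in lost_rows], rhs))         # step 108
    solution = solve_gf2(equations, len(lost_rows))                   # step 110
    if solution is None:
        return None
    info = [received[p] for p in INFO_POS]
    for r, value in zip(lost_rows, solution):
        info[r] = value
    return info

codeword = [1, 0, 1, 0, 1, 0, 0, 1]  # encoding of info [1, 0, 1, 1, 0] with G
damaged = [None, 0, 1, 0, None, 0, 0, 1]
recovered = recover_info(damaged, {0, 4})
```

When the surviving symbols do not determine an erased information symbol, the sketch returns `None` instead of guessing.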
In some arrangements, processor 44 is configured to perform a partial stripe update operation. This involves changing information symbols within a codeword and the corresponding check symbols. Details of partial stripe updating within the systematic encoding described above are described below.
Within the systematic polar coding described above, there are two schemes for achieving such partial stripe updating: by using a generator matrix, and by encoding using the GCC representation. In the first scheme, in step 128, a generator submatrix is formed from the rows of the (n, k) polar code generator matrix corresponding to the indices of the set of information symbols to be updated. In step 130, the array of information symbols to be updated is multiplied by the generator submatrix. In step 132, an array of information symbols to be updated is encoded by Equation (*); the values of the check symbols are read from the disks, the difference between these check symbols and the values of the check symbols from encoding is computed, and this difference is written to the disks.
The second scheme for partial stripe updating includes encoding an array of information symbols to be updated using Equation (*). Computations are performed only for those symbols that depend on the information symbols being updated.
The result of multiplication by the generator submatrix, or of encoding using Equation (*), is a set of check symbol values. The difference between these values and the old values of the check symbols should be written to the disks together with the new values of the information symbols being updated.
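One reading of the generator-matrix scheme, relying only on the linearity of the code, can be sketched as follows: the change in each information symbol is multiplied by the corresponding generator row, and the result is added to the stored check symbols. The matrix `G`, the position lists and the function names are hypothetical example values, not taken from this description.

```python
G = [                                # ASSUMED systematic generator matrix
    [1, 0, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
]
INFO_POS = [0, 1, 2, 4, 6]           # positions of information symbols
CHECK_POS = [3, 5, 7]                # positions of check symbols

def encode(info):
    """Full-stripe encoding: row vector times G over GF(2)."""
    return [sum(u * row[j] for u, row in zip(info, G)) % 2 for j in range(len(G[0]))]

def partial_update(stripe, updates):
    """Apply {info index: new bit} while touching only the affected symbols."""
    new_stripe = stripe[:]
    for r, new_bit in updates.items():
        delta = stripe[INFO_POS[r]] ^ new_bit    # difference of old and new value
        if not delta:
            continue                             # unchanged symbol, nothing to do
        new_stripe[INFO_POS[r]] = new_bit
        for j in CHECK_POS:
            # Row r of G says which check symbols depend on info symbol r.
            new_stripe[j] ^= G[r][j]
    return new_stripe

old_stripe = encode([1, 0, 1, 1, 0])
updated = partial_update(old_stripe, {2: 0})     # change one payload symbol
```

Here only one information symbol and the single check symbol that depends on it are rewritten; the other six symbols of the stripe are left untouched.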
While various embodiments of the invention have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
For example, it should be understood that, while the examples described above were directed to arrays of solid-state disks, the improved technique can be applied to arrays of any other type of disk (e.g., magnetic).
Further, it should be understood that some embodiments are directed to storage system 14, which is constructed and arranged to reliably store data within a storage system having high-performance, solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks. Some embodiments are directed to a process of reliably storing data within a storage system having high-performance, solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks. Also, some embodiments are directed to a computer program product which enables computer logic to reliably store data within a storage system having high-performance, solid-state disks arranged as part of a RAID group having n solid-state disks of which k solid-state disks are payload disks.
In some arrangements, storage system 14 is implemented by a set of processors or other types of control/processing circuitry running software. In such arrangements, the software instructions can be delivered, within storage system 14, in the form of a computer program product 140.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2012/003084 | 12/29/2012 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2014/102565 | 7/3/2014 | WO | A |
Number | Date | Country | |
---|---|---|---|
20140331083 A1 | Nov 2014 | US |