The present application claims priority to German Patent Application No. 10 2016 015 167.6 entitled “A Channel and Source Coding Approach for the Binary Asymmetric Channel with Applications to MLC Flash Memories”, filed Dec. 20, 2016; and German Patent Application No. 10 2017 130 591.2 entitled “Methods and Apparatus for Error Correction Coding based on Data Compression”, filed Dec. 19, 2017. The entirety of both of the aforementioned references is incorporated herein by reference for all purposes.
Embodiments are generally related to the field of channel and source coding of data to be sent over a channel, such as a communication link or a data memory. Some specific embodiments are related to a method of encoding data for transmission over a channel, a corresponding decoding method, a coding device for performing one or both of these methods and a computer program comprising instructions to cause said coding device to perform one or both of said methods.
Flash memories are typically mechanical-shock-resistant non-volatile memories that offer fast read access times. Therefore, flash memories can be found in many devices that require high data reliability, e.g. in the fields of industrial robotics, and scientific and medical instrumentation. In a flash memory device, the information is stored in floating gates which can be charged and erased. These floating gates keep their electrical charge without a power supply. However, information may be read erroneously. The error probability depends on the storage density, the flash technology used (single-level cell (SLC), multi-level cell (MLC), or triple-level cell (TLC)) and on the number of program and erase cycles the device has already performed.
There exists a need in the art for enhanced methods and memory systems for data transfer and/or storage.
This summary provides only a general outline of some embodiments of the invention. The phrases “in one embodiment,” “according to one embodiment,” “in various embodiments”, “in one or more embodiments”, “in particular embodiments” and the like generally mean the particular feature, structure, or characteristic following the phrase is included in at least one embodiment of the present invention, and may be included in more than one embodiment of the present invention. Importantly, such phrases do not necessarily refer to the same embodiment. Many other embodiments of the invention will become more fully apparent from the following detailed description, the appended claims and the accompanying drawings.
A further understanding of the various embodiments of the present invention may be realized by reference to the figures which are described in remaining portions of the specification. In the figures, like reference numerals are used throughout several figures to refer to similar components. In some instances, a sub-label consisting of a lower case letter is associated with a reference numeral to denote one of multiple similar components. When reference is made to a reference numeral without specification to an existing sub-label, it is intended to refer to all such multiple similar components.
Embodiments are generally related to the field of channel and source coding of data to be sent over a channel, such as a communication link or a data memory. In the latter case, “sending data over the channel” corresponds to writing, i.e. storing, data into the memory, and “receiving data from the channel” corresponds to reading data from the memory. In some embodiments, the data memory is non-volatile memory. In some particular instances of the aforementioned embodiments, such non-volatile memory is flash memory. Some specific embodiments are related to a method of encoding data for transmission over a channel, a corresponding decoding method, a coding device for performing one or both of these methods and a computer program comprising instructions to cause said coding device to perform one or both of said methods. It should be noted that while various embodiments discussed herein are described in the context of a memory, such as, for example, a flash memory, serving as the aforementioned channel, the inventions presented are not limited to such channels. Rather, other implementations may also be used in connection with other forms of channels, such as wireline, wireless or optical communication links for data transmission.
The introduction of MLC and TLC technologies reduced the reliability of flash memories significantly compared to SLC flash (cf. [1]) (numbers in brackets refer to a respective document in the list of reference documents provided below). In order to ensure reliable information storage, error correction coding (ECC) is required. For instance, Bose-Chaudhuri-Hocquenghem (BCH) codes (cf. [2]) are often used for error correction (cf. [1], [3], [4]). Moreover, concatenated coding schemes were proposed, e.g., product codes (cf. [5]), concatenated coding schemes based on trellis coded modulation and outer BCH or Reed-Solomon codes (cf. [6], [7], [8]), and generalized concatenated codes (cf. [9], [10]). With multi-level cell and triple-level cell technologies, the reliability of the bit levels and cells varies. Furthermore, asymmetric models are required to characterize the flash channel (cf. [11], [12], [13], [14]). Coding schemes were proposed that take these error characteristics into account (cf. [15], [16], [17], [18]).
On the other hand, data compression is less frequently applied in flash memories. Nevertheless, data compression can be an important ingredient in a non-volatile storage system that improves the system reliability. For instance, data compression can reduce an undesirable phenomenon called write amplification (WA) (cf. [19]). WA refers to the fact that the amount of data written to the flash memory is typically a multiple of the amount intended to be written. A flash memory must be erased before it can be rewritten. The granularity of the erase operation is typically much smaller than that of the write operation. Hence, the erase process results in rewriting of user data. WA shortens the lifetime of flash memories.
Some embodiments of the present inventions improve the reliability of sending data over a channel. In some cases, this improvement includes enhancing the reliability of storing data into and reading data from a flash memory, such as an MLC or TLC flash memory, and thus also extending the lifetime of such flash memory.
Various embodiments of the present inventions provide methods of encoding data for transmission over a channel, such as a non-volatile memory. In some instances, the non-volatile memory is a flash memory. The method is performed by a coding device and comprises: (i) obtaining input data to be encoded; (ii) applying a predetermined data compression process to the input data to reduce redundancy, if any, to obtain compressed data; (iii) selecting a code from a predetermined set C={Ci, i=1 . . . N; N>1} of N error correction codes Ci, each having a length n being the same for all codes of the set C, a respective dimension ki and error correction capability ti, wherein the codes of the set C are nested such that for all i=1, . . . , N−1: Ci ⊃ Ci+1, ki>ki+1 and ti<ti+1; and (iv) obtaining encoded data by encoding the compressed data with the selected code. Therein, selecting the code comprises determining a code Cj with j ∈ {1, . . . , N} from the set C as the selected code, such that kj≥m, wherein m is the number of symbols in the compressed data and m<n.
Of course, in the special case that the input data does not contain any redundancy which could be removed by performing said compression, the data resulting from applying said compression process may indeed not be compressed at all relative to the input data. Specifically, in this particular case, the data resulting from applying said compression process may even be identical to the input data. As used herein, the term “compressed data” shall, therefore, generally refer to the data resulting from applying said compression process to the input data, even if, for a specific selection of input data, no actual compression can be achieved therewith.
Application of the data compression process allows for a reduction of the amount of input data (e.g., user data), such that the redundancy of the error correction coding can be increased. In other words, at least a portion of the amount of data that is saved due to the compression is now used for additional redundancy, such as additional parity bits. This additional redundancy improves reliability of sending data over the channel, such as a data storage system. Moreover, data compression can be utilized to exploit the asymmetry of the channel.
Furthermore, the coding scheme uses a set C of two or more different codes, where the decoder can resolve which code was used. In the case of two codes, two nested codes C1 and C2 of length n and dimensions k1 and k2 are used, where nested means that C2 is a subset of C1. The code C2 has the smaller dimension k2<k1 and the higher error correction capability t2>t1. If the data can be compressed such that the number of compressed bits is less than or equal to k2, the code C2 is used to encode the compressed data; otherwise the data is encoded using C1. Particularly, an additional information bit in the header may be used to indicate whether the data was compressed. Because C2 ⊂ C1, the decoder for C1 may also be used to decode data encoded with C2 up to the error correction capability t1. Thus, if the actual number of errors is less than or equal to t1, the decoder can successfully decode. If the actual number of errors is greater than t1, it is assumed that the decoder for C1 fails. The failure can often be detected using algebraic decoding. Moreover, a failure can be detected based on error detection coding and based on the data compression scheme: because the number of data bits is known, the decoding fails if the number of reconstructed data bits is not consistent with the data block size. In cases where the decoding of C1 fails, the decoder may now continue the decoding using C2, which can correct up to t2 errors. In summary, for sufficiently redundant data, the decoder can thus correct up to t2 errors. In particular, in the case of a channel comprising flash memory, this allows for a significant improvement of the program/erase-cycling endurance and thus an extension of the lifetime of the flash memory.
The example embodiments of an encoding method discussed herein can be arbitrarily combined with each other or with other aspects of the present invention, unless such combination is explicitly excluded or technically impossible.
In some embodiments, selecting the code comprises actively performing a selection process, e.g. according to a setting of one or more selectable configuration parameters, while in some other embodiments the selection of a particular code is already preconfigured in the coding device, e.g. as a default setting, such that no further active selection process is necessary. This pre-configuration approach is particularly useful in the case of N=2, where obviously there is only one choice for the code C(1)=C1 of the initial iteration I=1 such that a second iteration I=2 remains possible, namely with C(2)=C2 ⊂ C1. Also a combination of these two approaches is possible, e.g. a default configuration which may be adjusted by reconfiguring the one or more parameters.
In some embodiments, determining the selected code comprises selecting that code from the set C as the selected code Cj which has the highest error correction capability tj=max {ti} among all codes in C for which ki≥m. This optimizes the additional reliability that can be achieved by performing the method when sending data over the channel, such as a flash memory.
In some further embodiments, the channel is an asymmetric channel, such as—without limitation—a binary asymmetric channel (BAC), for which a first kind of data symbols, e.g. a binary “1”, exhibits a higher error probability than a second kind of data symbols, e.g. a binary “0”. In addition, obtaining encoded data comprises padding at least one symbol of a codeword of the encoded data, which symbol is not otherwise occupied by the applied code (e.g. by user data, header, parity), by setting it to be a symbol of the second kind. In fact, there are kj−m such symbols. The asymmetric channel may particularly comprise or be formed by a non-volatile memory, such as flash memory. The padding may thus be employed to reduce the probability of a decoding error by reducing the number of symbols of the first kind (e.g. binary “1”) in the codeword.
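By way of illustration only, the following Python sketch outlines how the code selection and padding described above might be realized. It is a minimal sketch under stated assumptions, not the claimed implementation: the `compress` and `ecc_encode` routines are hypothetical placeholders, and the code set is assumed to be given as a list of (ki, ti) pairs sharing the same length n.

```python
from typing import List, Tuple, Optional

# Each code C_i is described by its dimension k_i and error correction
# capability t_i; all codes share the same length n. Nesting implies that
# a larger t_i comes with a smaller k_i.
Code = Tuple[int, int]  # (k_i, t_i)

def select_code(codes: List[Code], m: int) -> Optional[int]:
    """Index j of the code with the highest t_j among all codes with k_j >= m."""
    candidates = [j for j, (k, _) in enumerate(codes) if k >= m]
    return max(candidates, key=lambda j: codes[j][1], default=None)

def encode_block(data: bytes, codes: List[Code], compress, ecc_encode) -> bytes:
    bits = compress(data)                  # compressed data as a '0'/'1' string
    j = select_code(codes, len(bits))
    if j is None:
        raise ValueError("compressed data does not fit any code in the set")
    k_j, _ = codes[j]
    # Zero-padding: the k_j - m unused information positions are set to the
    # symbol with the lower error probability on the asymmetric channel ("0").
    padded = bits + "0" * (k_j - len(bits))
    return ecc_encode(padded, j)           # appends the parity of code C_j
```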
In some further embodiments, applying the compression process comprises sequentially applying a Burrows-Wheeler transform (BWT), a move-to-front coding (MTF), and a fixed Huffman encoding (FHE) to the input data to obtain the compressed data. Therein, the fixed Huffman code to be applied in the FHE is derived from an estimate of the output distribution of the previous sequential application of both the BWT and the MTF to the input data. In particular, these embodiments may relate to a lossless source coding approach for short data blocks that uses a BWT as well as a combination of an MTF algorithm and Huffman coding. A similar coding scheme is for instance used in the bzip2 data compression approach [23]. However, bzip2 is intended to compress complete files. The controller unit for a flash memory operates on a block level with typical block sizes of 512 bytes up to 4 kilobytes. Thus, the data compression has to compress small chunks of user data, because blocks might be read independently. In order to adapt the compression algorithm to small block sizes, according to these embodiments, the output distribution of the combined BWT and MTF algorithm is estimated and a fixed Huffman code is used instead of adaptive Huffman coding. Hence, storing or adaptation of code tables can be avoided.
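Since the Huffman code is fixed, only the code word lengths derived from the estimated distribution need to be agreed on between encoder and decoder; a canonical prefix code with these lengths can then be hard-wired into both. As an illustrative sketch only (one possible realization, not the claimed implementation), such lengths could be computed as follows:

```python
import heapq
from itertools import count

def huffman_lengths(P):
    """Code word length for each of the M symbols under a Huffman code
    built from the fixed probability estimate P (a list of M probabilities).
    For a fixed code, only the lengths matter: a canonical prefix code can
    be derived from them without storing a code table with the data."""
    tiebreak = count()
    # heap entries: (subtree probability, tiebreaker, symbols in the subtree)
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(P)]
    heapq.heapify(heap)
    lengths = [0] * len(P)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # each merge adds one bit to the code
            lengths[s] += 1        # words of all symbols in the subtree
        heapq.heappush(heap, (p1 + p2, next(tiebreak), s1 + s2))
    return lengths
```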
Specifically, according to some related embodiments, the estimate of the output distribution P(i) of the previous sequential application of the BWT and the MTF to the input data is determined as follows:
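In one possible form, consistent with the parametric logarithmic distribution discussed further below, this estimate may be written as

P(i) = P1 for i = 1, and P(i) = (1 − P1)/(i·(HM − 1)) for i = 2, . . . , M (with HM denoting the M-th harmonic number),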
wherein M is the number of symbols to be encoded by the FHE.
In some related embodiments, the parameters M and P1 are selected as M=256 and 0.37≤P1≤0.5. A specific selection may be M=256 and P1=0.4. These selections relate to particularly efficient implementations of the compression process and particularly allow for achieving a good degree of data compression.
In some further embodiments, the set C={Ci, i=1 . . . N; N>1} of error correction codes Ci contains only two such codes, i.e. N=2. This allows for a particularly simple and efficient implementation of the encoding method, since only two codes have to be stored and processed. This may result in one or more of the following advantages: a more compact implementation of the decoding algorithm, lower storage space requirements, and shorter decoding times.
One or more embodiments of the present inventions provide methods of decoding data, the method being performed by a decoding device, or—more generally—by a coding device (which may for example at the same time also be an encoding device). Such methods comprise obtaining encoded data, such as, for example, data being encoded according to the encoding method of the first aspect; and iteratively: (i) selecting a code Cj(I) of the current iteration I from a predetermined set C of N nested error correction codes, wherein Cj(I+1) ⊂ Cj(I); (ii) decoding the encoded data with the selected code; (iii) applying a decompression process corresponding to the compression process used for the encoding to the decoded data, to obtain reconstructed data of the current iteration; and (iv) performing a verification process to detect whether the decoding of the current iteration resulted in a decoding failure and, if so, proceeding with the next iteration, or otherwise outputting the reconstructed data as a decoding result.
This decoding method is specifically based on the concept of using a set C of nested codes, as defined above. Accordingly, it is possible to use an initial code C1 for the initial iteration that has a lower error correction capability t1 than the codes being selected for subsequent iterations. More generally, this applies to any two subsequent codes Ci and Ci+1. If the initial code C1 used in the initial iteration already leads to a successful decoding, the further iterations can be omitted. Furthermore, as any one of the codes Ci has a lower error correction capability ti than its subsequent code Ci+1, the decoding efficiency of code Ci will generally be higher than that of code Ci+1. Accordingly, the less efficient, stronger code Ci+1 will only be used if the decoding based on the previous code Ci failed. As the codes are nested such that C(I+1) ⊂ C(I), C(I+1) only comprises codewords which are also present in C(I), and thus this iterative process becomes possible. This allows not only to improve the reliability of sending data over the channel but also to perform the related decoding in a particularly efficient manner, as the more demanding iteration steps of the decoding process only need to be performed if all previous, less demanding iterations have failed to successfully decode the input data.
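A minimal Python sketch of this fallback logic follows; it is illustrative only, with `ecc_decode` and `decompress` as hypothetical placeholders (the former standing for a bounded-distance decoder that signals failure, e.g. via algebraic decoding), and with the block-size consistency check implementing the failure detection based on the data compression scheme:

```python
class DecodingFailure(Exception):
    pass

def decode_block(received, num_codes, ecc_decode, decompress, block_size):
    """Iterative decoding over the nested code set: try the weakest code
    first and fall back to the next, stronger code only on failure."""
    for j in range(1, num_codes + 1):      # C_1, C_2, ..., C_N
        try:
            info = ecc_decode(received, j) # raises DecodingFailure if > t_j errors
            data = decompress(info)
            if len(data) == block_size:    # consistency with the known block size
                return data                # success: output reconstructed data
        except DecodingFailure:
            pass                           # proceed with the next iteration
    raise DecodingFailure("no code in the set could decode the block")
```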
The example embodiments of a decoding method discussed herein can be arbitrarily combined with each other or with other aspects of the present invention, unless such combination is explicitly excluded or technically impossible.
In some embodiments, the verification process further comprises: if for the current iteration I a decoding failure was detected, determining, before proceeding with the next iteration, whether another code C(I+1) ⊂ C(I) exists in the set C, and if not, terminating the iteration and outputting an indication of a decoding failure. In this way, a simple termination criterion for the iteration is defined, which can be implemented efficiently and ensures that a further iteration step is only initiated if a corresponding code is actually available.
In some further embodiments, detecting whether the decoding process of the current iteration I resulted in a decoding failure comprises one or more of the following: (i) algebraic decoding; (ii) determining whether the number of data symbols in the reconstructed data of the current iteration is inconsistent with a known corresponding number of data symbols in the original data to be reconstructed by the decoding. Both of these approaches allow for an efficient detection of decoding failures. Specifically, approach (ii) is particularly adapted to decoding of data received from a channel comprising or being formed of an NVM, such as a flash memory, where data is stored in memory blocks of a predefined known size.
As in the case of the encoding method of the first aspect, in some further embodiments the set C={Ci, i=1 . . . N; N>1} of N error correction codes Ci contains only two such codes, i.e. N=2. This allows for a particularly simple and efficient implementation of the method of decoding, as only two codes have to be stored and processed, which may correspond to one or more of the following advantages: a more compact implementation of the decoding algorithm, lower storage space requirements, and shorter decoding times.
Yet other embodiments of the present inventions provide coding devices, which may for example and without limitation specifically be a semiconductor device comprising a memory controller. The coding device is adapted to perform the encoding method of the first aspect and/or the decoding method of the second aspect of the present invention. In particular, the coding device may be adapted to perform the encoding method and/or the decoding method according to one or more related embodiments described herein.
In some cases, the coding devices include (i) one or more processors; (ii) memory; and (iii) one or more programs being stored in the memory, which when executed on the one or more processors cause the coding device to perform the encoding method of the first aspect and/or the decoding method of the second aspect of the present invention, for example—and without limitation—according to one or more related embodiments described herein.
Yet additional embodiments of the present inventions provide computer programs comprising instructions to cause a coding device, such as the coding device of the third aspect, to perform the encoding method of the first aspect and/or the decoding method of the second aspect of the present invention, for example—and without limitation—according to one or more related embodiments described herein.
The computer program product may in particular be implemented in the form of a data carrier on which one or more programs for performing said encoding and/or decoding method are stored, for example an optical data carrier or a flash memory module. This may be advantageous if the computer program product is meant to be traded as an individual product independent of the processor platform on which the one or more programs are to be executed. In another implementation, the computer program product is provided as a file on a data processing unit, in particular on a server, and can be downloaded via a data connection, e.g. the Internet or a dedicated data connection, such as a proprietary or local area network.
The memory controller 2 is also configured as a coding device and adapted to perform the encoding and decoding methods of the present invention, particularly as described below with reference to
As indicated by
The basic codeword format for an error correcting code with flash memories is illustrated in
For the applications in storage systems, the number of code bits n is fixed and cannot be adapted to the redundancy of the data. A basic idea of some embodiments of the coding scheme presented herein is to use the redundancy of the data in order to improve the reliability, i.e. to reduce the probability of a decoding error, by reducing the number n1 of ones (“1”) in the codeword, or more generally the number of symbols of the kind for which the corresponding error probability is higher than for the other kind of symbol (in the case of binary coding, the other kind of symbol). In order to reduce n1, the redundant input data to be encoded is compressed and zero-padding is used, as illustrated in
The Burrows-Wheeler transform is a reversible block sorting transform [28] designed to improve the coherence in data. The transform operates on a block of symbols of length K to produce a permuted data sequence of the same length. In addition, a single integer i∈{1, . . . , K} is calculated which is required for the inverse transform. The transform writes all cyclic shifts of the input data into a K×K matrix. The rows of this matrix are sorted in lexicographic order. The output of the transform is the last column of the sorted matrix plus an index which indicates the position of the first input character in the output data. The output is easier to compress because it has many repeated characters due to the sorting of the matrix.
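As a minimal, illustrative sketch (quadratic in the block length and therefore not suited for production use), the forward transform can be written as follows; the index convention used here returns the row position of the original block among the sorted rotations, which is one common formulation:

```python
def bwt(block: bytes):
    """Naive Burrows-Wheeler transform: sort all K cyclic shifts of the
    input and output the last column of the sorted matrix together with
    the index needed for the inverse transform."""
    K = len(block)
    rotations = sorted(block[i:] + block[:i] for i in range(K))
    index = rotations.index(block) + 1           # i in {1, ..., K}
    last_column = bytes(rot[-1] for rot in rotations)
    return last_column, index
```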
An adaptive data compression scheme has to estimate the probability distribution of the source symbols. The move-to-front algorithm (MTF), also introduced as a recency rank calculator by Elias [29] and Willems [30], is an efficient method to adapt to the actual statistics of the user data. Similar to the BWT, the MTF algorithm is a transformation where a message symbol is mapped to an index. The index r is selected for the current source symbol if r different symbols occurred since the last appearance of the current source symbol. Later on, the integer r is encoded to a codeword from a finite set of codewords of different lengths. In order to keep track of the recency of the source symbols, the symbols are stored in a list ordered according to the occurrence of the symbols. Source symbols that occur frequently remain close to the first position of the list, whereas more infrequent symbols will be shifted towards the end of the list. Consequently, the probability distribution of the output of an MTF tends to be a decreasing function of the index. The length of the list is determined by the number of possible input symbols. Here, for the purpose of illustration, byte-wise processing is used, hence a list with M=256 entries is employed.
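A minimal sketch of byte-wise MTF encoding follows; ranks are reported here as integers in {1, . . . , M}, so that a repeated symbol yields rank 1, matching the convention used for the rank distributions below:

```python
def mtf_encode(data: bytes):
    """Move-to-front encoding: each byte is replaced by its current rank in
    a recency list, and is then moved to the front of that list."""
    table = list(range(256))        # recency list for M = 256 byte values
    ranks = []
    for b in data:
        r = table.index(b)          # position of the symbol in the list
        ranks.append(r + 1)         # report ranks 1..M
        table.pop(r)
        table.insert(0, b)          # move the current symbol to the front
    return ranks
```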
The final step SE5 of the compression scheme is a Huffman encoding [31], wherein a variable-length prefix code is used to encode the output values of the MTF algorithm. This encoding is a simple mapping from a binary input code of fixed length to a binary variable-length code. However, the optimal prefix code should be adapted to the output distribution of the previous encoding stages. For example, the known bzip2 algorithm, which also uses Huffman encoding, stores a coding table with each encoded file for that purpose. For the encoding of short data blocks, however, the overhead for such a table would be too costly. Therefore, in contrast to the bzip2 algorithm, the present encoding method uses a fixed Huffman code which is derived from an estimate of the output distribution of the BWT and MTF encoding. Accordingly, in the method of
Step SE4, which precedes step SE5, serves to derive the FHE to be applied in step SE5 from an estimate of the output distribution of step SE3, i.e. of the consecutive application of the BWT and MTF in steps SE2 and SE3. Step SE4 will be discussed in more detail below with reference to
In a further step SE6, which follows the compression of the input data in steps SE2 to SE5, a code Cj is selected from a predetermined set C={Ci, i=1 . . . N; N>1} of N error correction codes Ci, each having a length n being the same for all codes of the set C, a respective dimension ki and error correction capability ti. The codes of the set C are nested such that for all i=1, . . . , N−1: Ci ⊃ Ci+1, ki>ki+1 and ti<ti+1. Specifically, in this example, that particular code from the set C is chosen as the selected code Cj which has the highest error correction capability tj=max {ti} among all codes in C for which ki≥m.
Then, in a further step SE7, the compressed data is encoded with the selected code Cj to obtain encoded data. In addition, in a step SE8, which may follow step SE7 or be applied simultaneously therewith or even as an integral process within the encoding of SE7, zero-padding is applied to the encoded data by setting any “unused” bits in the codewords of the encoded data, i.e. bits which are neither part of the compressed data nor of the parity added by the encoding, to “0” (since in the BAC of the present example q>p). As discussed above, this zero-padding in step SE8 is a measure to further increase the reliability of sending data over the channel, i.e. in this example, the reliability of storing data to the flash memory 3 and subsequently retrieving it therefrom. Then, in a further step SE9 the encoded and zero-padded data is stored into the flash memory 3.
Subsequent step SD3 comprises selecting a code Cj(I) of the current iteration (i.e. I=1 for the initial iteration) from a predetermined set C={Ci, i=1 . . . N; N>1} of N error correction codes Ci, each having a length n being the same for all codes of the set C, a respective dimension ki and an error correction capability ti. Therein, the codes of the set C are nested such that for all i=1 . . . N−1: Ci ⊃ Ci+1, ki>ki+1 and ti<ti+1, wherein Cj(I+1) ⊂ Cj(I). For I=1, i.e. the initial iteration, Cj(I) is selected such that j<N. Then, in a further step SD4 the actual decoding of the retrieved encoded data is performed with the selected code of the current iteration, i.e. with Cj(1) in case of the initial iteration. In a further step SD5, a decompression process corresponding to the compression process used for the encoding of the data is applied to the decoded data being output in step SD4, to obtain reconstructed data of the current iteration I.
A verification step SD6 follows, wherein a determination is made as to whether the decoding process of the current iteration I was successful. For example, this determination may be implemented equivalently as a determination as to whether a decoding failure occurred in the current iteration I. If the decoding of the current iteration I was successful, i.e. if no decoding failure occurred (SD6—no), the reconstructed data of the current iteration I is output in a further step SD7 as a decoding result, i.e. as decoded data. Otherwise (SD6—yes), the iteration index I is incremented (I=I+1) in a step SD8 and a determination is made in a further step SD9 as to whether a code Cj(I) for a next iteration is available in the set C. If this is the case (SD9—yes), the method branches back to step SD3 for the next iteration. Otherwise (SD9—no), i.e. when no further code is available for a next iteration, the overall decoding process fails and in step SD10 information indicating this decoding failure is output, e.g. by sending a respective signal or message to host 4. Thus, the decoder running the method of
For further illustration, the simplest case where N=2 is now considered. In this case, there are only two different codes C1 and C2 of length n and dimensions k1 and k2 in the set C. The two codes are nested, which means that C2 is a subset of C1, i.e. C1 ⊃ C2. The code C2 has the smaller dimension k2<k1 and the higher error correction capability t2>t1. If during the encoding process, e.g. with the method of
Reference is now made again to step SE4 of
Now consider the cascade of BWT and MTF. With the BWT, each symbol keeps its value but the order of symbols is changed. If the original string at the input of the BWT contains substrings that occur often, then the transformed string will have several places where a single character is repeated multiple times in a row. For the MTF algorithm, these repeated occurrences result in sequences of output integers all equal to 1. Consequently, applying the BWT before the MTF algorithm changes the probability of rank 1. In order to take the BWT into account, embodiments of the present invention are based on a parametric logarithmic probability distribution
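which may, for instance, take the form

P(i) = P1 for i = 1, and P(i) = (1 − P1)/(i·(HM − 1)) for i = 2, . . . , M,

where HM denotes the M-th harmonic number. In this form, the ranks i ≥ 2 retain the 1/i decay of the ordinary logarithmic distribution P(i) = 1/(i·HM), while the rank-1 probability is replaced by the free parameter P1.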
Note that with the ordinary logarithmic distribution P1≈0.1633 for M=256. With the parametric logarithmic distribution, the parameter P1 is the probability of rank 1 at the output of the cascade of BWT and MTF. P1 may be estimated according to the relative frequencies at the output of the MTF for a real-world data model. In particular, in the following the Calgary and Canterbury corpora [34], [35] are considered. Both corpora include real-world test files in order to evaluate lossless compression methods. If the Canterbury corpus is used to determine the value of P1, this results in P1=0.4. Note that the Huffman code is not very sensitive to the actual value of P1, i.e., for M=256 values in the range 0.37≤P1≤0.5 result in the same code.
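The accuracy of an estimated distribution P(i) with respect to the actual output distribution Q(i) can be measured by the Kullback-Leibler divergence, which in its standard (base-2) form reads

D(Q ∥ P) = Σ (i=1..M) Q(i) · log2( Q(i)/P(i) ),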
where a smaller value of the Kullback-Leibler divergence corresponds to a better approximation. Table I below presents values of the Kullback-Leibler divergence for the logarithmic distribution and the proposed parametric logarithmic distribution with P1=0.4. Both distributions are compared to the actual output distribution of the BWT+MTF processing. All values were obtained for the Calgary corpus using data blocks of 1 kilobyte and M=256. Both transformations are initialized after each data block. Note that the proposed parametric distribution results in smaller values of the Kullback-Leibler divergence for all files in the corpus. These values can be interpreted as the expected extra number of bits per information byte that must be stored if a Huffman code is used that is based on the estimated distribution P(i) instead of the true distribution Q(i). The Calgary corpus is also used to evaluate the compression gain.
Table II below presents results for the average block length for different probability distributions and compression algorithms. All results present the average block length in bytes and were obtained by encoding data blocks of 1 kilobyte, using all files from the Calgary corpus. The results of the proposed algorithm are compared with the Lempel-Ziv-Welch (LZW) algorithm [24] and the algorithm presented in [21], which combines only MTF and Huffman coding. For the latter algorithm, the Huffman coding is also based on an approximation of the output distribution of the MTF algorithm, where a discrete log-normal distribution is used. This distribution is characterized by two parameters, the mean value μ and the standard deviation σ. The probability density function for a log-normally distributed positive random variable x is:
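f(x) = 1/(x·σ·√(2π)) · exp( −(ln x − μ)² / (2σ²) ).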
For the integers i∈ {1, . . . , M} a discrete approximation of a log-normal distribution may be used, which results in the discrete probability distribution
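P(i) = α · f(i), i ∈ {1, . . . , M},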
where α denotes a scaling factor. The mean value, the standard deviation, and the scaling factor α can be adjusted to approximate the actual probability distribution at the output of the MTF for a real-world data model. In Table II, a discrete log-normal distribution with mean value μ=3, standard deviation σ=3.7 and scaling factor α=0.1 is used.
Table II presents the average block length in bytes for each file in the corpus. Moreover, the maximum values indicate the worst-case compression result for each file, i.e., these maximum values indicate how much redundancy can be added for error correction. Note that the proposed algorithm outperforms the LZW as well as the MTF-Huffman approach for almost all input files. Only for the image file named “pic”, the LZW algorithm achieves a better mean value.
Table III presents summarized results for the complete corpus, where the values are averaged over all files. The maximum values are also averaged over all files. These values can be considered as a measure of the worst-case compression. The results of the first two columns correspond to the proposed compression scheme using two different estimates for the probability distribution. The first column corresponds to the results with the proposed parametric distribution, where the parameter was obtained using data from the Canterbury corpus. The parametric distribution leads to a better mean value. The proposed data compression algorithm is compared to the LZW algorithm as well as to the parallel dictionary LZW (PDLZW) algorithm that is suitable for fast hardware implementations [25]. Note that the proposed data compression algorithm achieves significant gains compared with the other approaches.
In this section, an analysis of the error probability of the proposed coding scheme for the BAC is presented for the above-presented simple case where N=2 and thus there are only two
different codes C1 and C2 of length n and dimensions k1 and k2 in the set C. Based on these results, some numerical results for an MLC flash will also be presented.
For the binary asymmetric channel, the probability Pe of a decoding error depends on n0 and n1=n−n0, i.e. the numbers of zeros and ones in a codeword. We denote the probability of i errors in the positions with zeros by P0(i). For the BAC, the number of errors for the transmitted zero bits follows a binomial distribution, i.e. the error pattern is a sequence of n0 independent experiments, where an error occurs with probability p. We have
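P0(i) = (n0 choose i) · p^i · (1 − p)^(n0−i).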
Similarly, we obtain
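P1(j) = (n1 choose j) · q^j · (1 − q)^(n1−j)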
for the probability of j errors in the positions with ones. Note that the number of errors in the positions with zeros and ones are independent. Thus, the probability to observe i errors in the positions with zeros and j errors in the positions with ones is P0(i)P1(j). We consider a code with error correction capability t. For such a code, we obtain the probability of correct decoding by
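Pc(n0, n1, t) = Σ (over i+j≤t) P0(i) · P1(j) = Σ (i=0..t) Σ (j=0..t−i) P0(i) · P1(j)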
and the probability of a decoding error by

Pe(n0, n1, t) = 1 − Pc(n0, n1, t).   (9)
The probability Pe(n0, n1, t) of a decoding error depends on n0, n1, and the error correction capability t ∈ {t1, t2}. Moreover, these values depend on the data compression. If the data can be compressed such that the number of compressed bits is less than or equal to k2, C2 is used with error correction capability t2 to encode the compressed data. Otherwise the data is encoded using C1 with error correction capability t1<t2. Hence, the average error probability Pe may be defined as the expected value
Pe = E{ Pe(n0, n1, t) }   (10)
where the average is taken over the ensemble of all possible data blocks.
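As a worked numerical illustration of the formulas above, the per-block error probability can be evaluated directly; the function below is a sketch under the stated BAC model, with p and q denoting the error probabilities of the zero and one positions, respectively:

```python
from math import comb

def block_error_probability(n0, n1, t, p, q):
    """Pe(n0, n1, t) = 1 - Pc(n0, n1, t) for a t-error-correcting code on
    the binary asymmetric channel: decoding succeeds whenever the total
    number of errors i + j does not exceed t."""
    def binom_pmf(n, k, eps):
        return comb(n, k) * eps**k * (1 - eps)**(n - k)
    Pc = sum(binom_pmf(n0, i, p) * binom_pmf(n1, j, q)
             for i in range(t + 1)
             for j in range(t + 1 - i))
    return 1.0 - Pc
```

The average in Equation (10) is then obtained by evaluating this function for the (n0, n1, t) triple of every encoded data block and averaging the results.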
In the following, results for example empirical data are presented. For the data model, both the Calgary and the Canterbury corpus are used. The values of the error probabilities p and q are based on empirical data presented in [14]. Note that the error probability of a flash memory increases with the number of program/erase (P/E) cycles. The number of program/erase cycles determines the lifetime of a flash memory, i.e., the lifetime is the maximum number of program/erase cycles that can be executed while maintaining a sufficiently low error probability. Hence, the error probability for different numbers of program/erase cycles is now calculated.
The data is segmented into blocks of 1024 bytes, wherein each block is compressed and encoded independently of the other blocks. For the ECC, a BCH code is considered which has an error correcting capability of t1=40 if uncompressed data is encoded. This code has the dimension k1=8192 and a code length n=8752. For the compressed data, a compression gain of at least 93 bytes for each data block is achieved. Hence, one can double the error correcting capability and use t2=80 with k2=7632 (954 bytes) for compressed data; note that this reduces the dimension by k1−k2=560 bits (70 bytes), so the compression gain of at least 93 bytes suffices to accommodate the additional parity. The remaining bits are filled with zero-padding as described above.
From this data processing, the actual numbers of zeros and ones for each data block are obtained. Finally, the error probability for each block is calculated according to Equation (10) and averaged over all data blocks. The numerical results are presented in
While above at least one example embodiment of the present invention has been described, it has to be noted that a great number of variations thereto exist. Furthermore, it is appreciated that the described exemplary embodiments only illustrate non-limiting examples of how the present invention can be implemented and that it is not intended to limit the scope, the application or the configuration of the herein-described apparatuses and methods. Rather, the preceding description will provide the person skilled in the art with constructions for implementing at least one exemplary embodiment of the invention, wherein it has to be understood that various changes of functionality and the arrangement of the elements of the exemplary embodiment can be made without deviating from the subject-matter defined by the appended claims and their legal equivalents.
Number | Date | Country | Kind |
---|---|---|---|
102016015167.6 | Dec 2016 | DE | national |
102017130591.2 | Dec 2017 | DE | national |