The present disclosure relates generally to error detection and correction in communication systems and, more particularly, to error-correcting code memory.
Error-detection and correction techniques are used to identify and rectify errors in computer communications data. Errors can sometimes be introduced into computer communications data, for example, by way of electromagnetic interference or background radiation incurred during transmissions through communications circuitry or storage in memory cells. Error-correcting code (ECC) introduces redundancy into communications data to permit detection of erroneous data and recovery of correct data.
Some error-correcting code techniques have been applied to computer storage data to reduce or eliminate data corruption. Typical encoding approaches have applied source coding techniques to convert each source symbol into a binary string and then channel coding techniques to add redundancy. Similarly, typical decoding approaches have applied channel decoding techniques to remove the added redundancy and then source decoding techniques to convert the binary strings into symbols.
For example, successive-cancellation (SC) decoding of polar codes has been applied, although the resulting error-rate performance demonstrated with finite-length codewords has not proven highly satisfactory. Successive-cancellation list (SCL) and cyclic redundancy check (CRC)-aided SCL decoding schemes have demonstrated relatively improved performance over SC decoding. Another approach has applied an iterative decoding method that alternates between low-density parity-check (LDPC) codes and dictionary information.
Nonetheless, ECC techniques providing relatively increased performance with practical codeword lengths and/or relatively decreased complexity would be desirable for use in memory or storage systems.
According to one embodiment of the present invention, a device for decoding storage data includes a memory that stores machine instructions and a processor coupled to the memory that executes the machine instructions to perform channel decoding based on a codeword to generate a data string. The processor further executes the machine instructions to perform source decoding based on the data string to generate a candidate symbol and identify one or more objects in a dictionary that have an initial symbol combination matching one or more symbols following an object separator based on the data string. The initial symbol combination terminates with the candidate symbol. The processor also executes the machine instructions to determine a joint probability based on a channel probability and a source probability that the candidate symbol is correct.
According to another embodiment of the present invention, a computer-implemented method of decoding storage data includes performing channel decoding based on a codeword to generate a data string and performing source decoding based on the data string to generate a candidate symbol. The method further includes identifying one or more objects in a dictionary that have an initial symbol combination matching one or more symbols following an object separator based on the data string. The initial symbol combination terminates with the candidate symbol. The method also includes determining a first joint probability based on a channel probability and a source probability that the candidate symbol is correct.
According to yet another embodiment of the present invention, a computer program product for decoding storage data includes a non-transitory, computer-readable storage medium encoded with instructions adapted to be executed by a processor to implement performing channel decoding based on a codeword to generate a data string and performing source decoding based on the data string to generate a candidate symbol. The instructions are further adapted to implement identifying one or more objects in a dictionary that have an initial symbol combination matching one or more symbols following an object separator based on the data string. The initial symbol combination terminates with the candidate symbol. The instructions are also adapted to implement determining a first joint probability based on a channel probability and a source probability that the candidate symbol is correct.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.
An embodiment of the present invention implements joint source-channel coding techniques that exploit structural correlations between source data and stored codewords. A dictionary contains information regarding objects related to the source data. A list decoding method jointly takes into account information regarding the read data distribution and the source data distribution to generate the retrieved data.
An embodiment of the present invention is shown in
The provisional channel decoder 12 receives a retrieved codeword from storage data and performs initial channel decoding to generate a provisional binary data string path, or multiple alternative paths, using any suitable channel decoding algorithm. The provisional channel decoder 12 removes channel coding redundancy from the retrieved storage data, while detecting and correcting errors in the retrieved storage data.
The source decoder 14 converts a segment of the provisional binary data string, or each of the alternative strings, into a candidate symbol corresponding to the stored source data type. For example, in an embodiment, the source data is English-language text, and the source decoder 14 converts a segment of the provisional binary data string into a next letter of a partial or whole word.
The source object dictionary 16 stores object-based information associated with the stored source data. For example, in an embodiment, the source data is English-language text, and the source object dictionary 16 stores a compendium of English-language words. In this example, the source symbol unit is a letter. The letters may be represented in any useful format, for example, in accordance with the seven-bit character codes established by the American Standard Code for Information Interchange (ASCII). In an embodiment, the source object dictionary 16 also stores the number of occurrences, or frequencies, with which stored words, as well as combinations of letters in partial words, appear in a corpus of documents related to the type of source data.
The hybrid path selector 18 receives the candidate symbol converted from the provisional binary data string and output from the source decoder 14 as feedback, along with source object information from the source object dictionary 16, and selects a limited number of the provisional binary data string paths to be retained, based on estimated joint source-channel probabilities for each path computed from the word frequency information received from the dictionary 16 and from statistical channel input information.
Another embodiment is shown in
The joint source-channel coding storage system 20 includes a source encoder 24, a channel encoder 26, a storage 28, and a joint source-channel decoder 30. The source encoder 24 receives source data 22 (d) to be stored, including object-based data, for example, text, image, audio or video data, or any combination of these.
The source encoder 24 performs a source encoding procedure to convert symbols, such as individual letters, in the source data 22 into binary strings (U) that can be efficiently transmitted and stored. For example, in an embodiment, the source encoder 24 implements Huffman encoding. The associated Huffman tree may be based on empirical statistics extracted from the source data 22 or another corpus of related data, such as a larger corpus of general text sources, in the case that the source data 22 includes text data. The source encoder 24 also concatenates multiple binary strings corresponding to symbols to form a data string, for example, a block or frame, that corresponds to a sequence of source symbols.
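As an illustration, a Huffman code of the kind the source encoder 24 may implement can be sketched as follows. The letter statistics shown are hypothetical, not taken from the disclosure:

```python
import heapq
from collections import Counter

def build_huffman_codes(freqs):
    """Build a prefix-free binary code table from symbol frequencies.

    Heap entries are (weight, tiebreak, subtree); a subtree is either a
    source symbol or a (left, right) pair of subtrees.
    """
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):           # internal node
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                 # leaf: a source symbol
            codes[tree] = prefix or "0"       # degenerate one-symbol source
        return codes
    return walk(heap[0][2], "")

# Hypothetical empirical letter statistics drawn from a sample word.
freqs = Counter("attestation")
codes = build_huffman_codes(freqs)
# Concatenating the per-symbol strings forms the data string (U).
bitstring = "".join(codes[c] for c in "attestation")
```

Because the code is prefix-free, the concatenated binary strings can later be parsed back into symbols unambiguously, which is what the source decoder relies on.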
The channel encoder 26 performs a channel encoding procedure to convert the data string into a codeword (X) to be transmitted to and stored in the storage 28. For example, in an embodiment, the channel encoder 26 implements a polar code algorithm. The channel encoding procedure adds redundancy to the source data to allow for detection and correction of any errors in the subsequently retrieved data.
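The polar transform underlying such a channel code can be sketched as follows. This is a minimal illustration of the transform x = u · F^(⊗n) over GF(2), with F = [[1, 0], [1, 1]]; it omits the frozen-bit assignment that a practical polar encoder would also require:

```python
def polar_encode(u):
    """Polar transform x = u * F^(tensor n) over GF(2), N = 2^n.

    Recursion: splitting u into halves (u', u''),
    x = (polar_encode(u' xor u''), polar_encode(u'')).
    """
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    left = polar_encode([u[i] ^ u[i + half] for i in range(half)])
    right = polar_encode(u[half:])
    return left + right

# In a real polar code, information bits are placed on the most reliable
# positions of u and the remaining ("frozen") positions are set to zero.
codeword = polar_encode([1, 0, 1, 1])
```

A useful property of the transform over GF(2) is that it is its own inverse, so applying `polar_encode` twice recovers the input.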
The joint source-channel decoder 30 converts retrieved storage data into objects, such as words. The joint source-channel decoder 30 includes a provisional channel decoder 32, a source decoder 34, a dictionary 36 and a hybrid path selector 38. The provisional channel decoder 32 receives a codeword retrieved from the storage 28 and converts the retrieved codeword (Y), or a segment of the codeword, into a provisional data string (Û), or multiple alternative provisional data strings. For example, in an embodiment, the provisional channel decoder 32 implements a successive-cancellation list (SCL) decoding technique for polar codes to determine alternative data strings that statistically most likely correctly represent the corresponding source data string at each SCL decoding stage.
As known in the art, successive-cancellation list decoding takes into account the channel input. The most probable retrieved data string paths, P(u_1^N | y_1^N), are selected at each decoding stage, for example, based on the assumption that the elements in the data string are independent and identically distributed (i.i.d.) according to the Bernoulli distribution with a probability of one-half (0.5). However, in object-based storage, the elements in the data string are correlated. Thus, prediction accuracy can be increased by taking into account information regarding the source, as well as the channel.
At each decoding stage, the source decoder 34 identifies a relevant segment of each of the alternative provisional data strings corresponding to symbols of the stored source data type and converts each segment into a provisional next symbol, or candidate symbol, corresponding to the stored source data type to generate a list of candidate symbol paths. For example, in an embodiment, the source data is English-language text, and the source decoder 34 converts a segment of each of the provisional data strings into a provisional next letter to generate a list of candidate letters.
The dictionary 36 stores object-based data associated with the stored source data or with the stored source data type. For example, in an embodiment, the source data is English-language text, and the dictionary 36 stores a compendium of English-language words including corresponding word frequencies. The dictionary 36 may be based on a corpus of documents. For example, in an embodiment, the dictionary 36 includes words and the corresponding frequencies of those words occurring in a ten million-word excerpt from an encyclopedia.
At each decoding stage, the hybrid path selector 38 receives the candidate symbols as feedback from the output of the source decoder 34, and queries the dictionary 36 to verify whether or not each of the candidate symbols, when combined with predecessor symbols, corresponds to an initial symbol combination in an object found in the dictionary 36. Symbol combinations that do not correspond to the initial symbols of any object contained in the dictionary 36 are rejected.
The hybrid path selector 38 computes estimated joint source-channel probabilities for each of the alternative data string paths. Since the binary strings corresponding to symbols in each of the alternative data strings are correlated based on the underlying source objects, the probability associated with each alternative data string path given the retrieved codeword can be represented as follows:
P(u_1^i | y_1^N) ∝ P(y_1^N | u_1^i) · P(d_1^j)
since, by Bayes' rule, P(u_1^i | y_1^N) = P(y_1^N | u_1^i) P(u_1^i) / P(y_1^N), where P(y_1^N) does not depend on the path and the prior P(u_1^i) equals the source probability P(d_1^j) of the corresponding symbol sequence.
Thus, the joint source-channel probability, or joint probability, includes a channel probability component, P(y_1^N | u_1^i), based on statistical channel information, and a source probability component, P(d_1^j), based on source information. The joint source-channel probability reflects the likelihood that a candidate symbol is correct, that is, the likelihood that the candidate symbol matches a corresponding source symbol in the source data. It should be noted that this joint probability computation assumes that individual objects, such as words in text, are independent such that the following equation holds true:
P(d_1^j) = ∏_{k=1}^{j} P(d_k)
Nevertheless, in some embodiments, this assumption may not be strictly true. For example, in the case of natural-language text, grammar provides additional correlation between words. As a result, in some embodiments, the joint source-channel probability computation may be further refined to reflect additional correlation that may exist among objects in the source data.
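A minimal sketch of such a joint score, working in the log domain and assuming the dictionary's prefix marginal counts are available as a flat mapping (the counts shown are hypothetical):

```python
import math

def joint_log_prob(channel_log_prob, word_prefix, prefix_counts):
    """Score a decoding path: channel term plus source term.

    `prefix_counts` maps letter prefixes to marginal word counts, with the
    empty prefix "" playing the role of the root (total words in corpus).
    The source term for a partial word d_1..d_j is count(prefix) / total.
    """
    total = prefix_counts[""]
    count = prefix_counts.get(word_prefix, 0)
    if count == 0:
        return float("-inf")   # prefix matches no object in the dictionary
    return channel_log_prob + math.log(count / total)

# Hypothetical prefix counts mimicking the tree-structured dictionary.
prefix_counts = {"": 100, "t": 40, "th": 25, "the": 20, "ti": 5}
```

A path whose concatenated symbols match no dictionary object receives a source probability of zero (negative infinity in the log domain), which is how the hybrid path selector rejects it.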
The hybrid path selector 38 determines a list including a limited number, L2, of alternative data string paths that have the highest probabilities of correctly representing the corresponding source symbol or symbols. Thus, at each decoding stage, up to L2 decoding paths are concurrently considered. For example, in an embodiment, a trimming or pruning procedure is used to remove candidate paths from a tree representing an object in the retrieved codeword, leaving only the L2 most likely paths after each decoding stage. In an embodiment, the statistical determination of symbols and underlying binary data strings progresses on an object-by-object basis, for example, identifying individual objects between object separators, such as spaces or punctuation marks in text.
In an alternative embodiment, the hybrid path selector 38 performs an adaptive joint source-channel decoding procedure. For example, the hybrid path selector 38 begins by performing decoding implementing a relatively small list size, such as L2=1. If the decoding procedure does not produce an acceptable result, the hybrid path selector 38 increases the list size, for example, by a factor of two, during each successive attempt until the decoding procedure is successful or until the list size reaches a predetermined maximum permitted size, Lmax. If the decoding procedure does not succeed using the maximum list size, then a decoding error is declared and the procedure ends.
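The adaptive procedure can be sketched as follows, where `decode_with_list` is a stand-in for one complete joint source-channel decoding pass at a given list size (the function names are illustrative):

```python
def adaptive_decode(codeword, decode_with_list, l_max):
    """Retry joint decoding with a doubling list size until success.

    `decode_with_list(codeword, list_size)` returns the decoded data on
    success, or None when decoding with that list size fails.
    """
    list_size = 1
    while list_size <= l_max:
        result = decode_with_list(codeword, list_size)
        if result is not None:
            return result
        list_size *= 2            # double the list size and retry
    # Maximum permitted list size exceeded: declare a decoding error.
    raise ValueError("decoding error: list size exceeded L_max")
```

Starting small keeps the average complexity low, since most codewords decode successfully with a short list and only the difficult ones pay the cost of the larger list sizes.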
In practice, the source object dictionary cannot contain all possible objects that may be encountered, such as misspelled words in text. Thus, in an embodiment, a dynamic dictionary is configured to automatically update the dictionary data structure with additional objects that are encountered in the source data during the encoding process but not already included in the dictionary. The dynamic dictionary utilizes a tree structure to represent all words in the dictionary and store the number of occurrences of each corresponding combination of letters.
For example, referring to
The root node 42 records the total number of words in the corpus. Each of the letter nodes 44, 46, 48 stores the represented letter and the marginal frequency of words in the corpus beginning with the corresponding combination of letters. Thus, the dictionary data structure 40 represents a source with the following words and corresponding number of occurrences, or frequency, of each word:
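A tree of this kind can be sketched as follows (class and method names are illustrative, not taken from the disclosure):

```python
class DictionaryNode:
    """One node of the word-frequency tree."""
    def __init__(self):
        self.count = 0        # words in the corpus passing through this prefix
        self.children = {}    # letter -> DictionaryNode

class WordDictionary:
    def __init__(self):
        self.root = DictionaryNode()   # root: total number of words

    def add(self, word, occurrences=1):
        """Insert a word, incrementing counts along its prefix path."""
        node = self.root
        node.count += occurrences
        for letter in word:
            node = node.children.setdefault(letter, DictionaryNode())
            node.count += occurrences

    def marginal(self, prefix):
        """Marginal frequency of words beginning with `prefix` (0 if none)."""
        node = self.root
        for letter in prefix:
            if letter not in node.children:
                return 0
            node = node.children[letter]
        return node.count
```

A marginal count of zero for a prefix is exactly the condition under which a candidate symbol combination is rejected during path selection.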
In some embodiments, the source object dictionary is configured statically prior to the encoding and decoding processes. However, if any additional objects are present in the source data, the static dictionary cannot recognize the new objects. Thus, in an alternative embodiment, a dynamic dictionary is configured to automatically update the dictionary data structure with additional objects encountered in the source data during encoding. For example, the source object dictionary 16 of
In an embodiment, the following procedure can be implemented to update the dynamic dictionary during the encoding process:
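The listing itself is not reproduced here; one minimal sketch of such an update, under the assumption that the tree's marginal counts are represented as a flat mapping keyed by letter prefix, might be:

```python
from collections import defaultdict

def update_dynamic_dictionary(prefix_counts, word):
    """Increment the marginal count of every prefix of `word`.

    The empty prefix "" plays the role of the root node, so
    prefix_counts[""] tracks the total number of words recorded. A word
    not previously seen simply creates new prefix entries.
    """
    for j in range(len(word) + 1):
        prefix_counts[word[:j]] += 1

# Hypothetical encoder input; repeated words accumulate occurrences.
counts = defaultdict(int)
for word in ["the", "this", "the"]:
    update_dynamic_dictionary(counts, word)
```

Because every word encountered during encoding is inserted, objects absent from the initial dictionary, such as misspellings, become recognizable to the decoder.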
Referring to
Referring now to
In block 72, a source encoding procedure is performed on the source data symbols to generate binary strings. For example, in an embodiment, the source encoding procedure implements a Huffman code or other data compression algorithm, as described above. The binary strings representing individual symbols from the source data are concatenated to form a data string, in block 74.
In block 76, a channel encoding procedure is performed on the data string to generate a codeword. For example, in an embodiment, the channel encoding procedure implements a polar code or other data redundancy code algorithm, as described above. The codeword is transmitted through a communication channel, in block 78. For example, in an embodiment, the codeword is sent to a storage device, for example, a hard disk drive (HDD), a solid-state drive (SSD), or any other suitable data storage device.
In block 80, a retrieved codeword is received from the communication channel. For example, in an embodiment, the retrieved codeword is retrieved from the data storage device. At each decoding stage, a provisional channel decoding procedure is performed on the retrieved codeword to generate multiple alternative provisional data strings, in block 82.
For example, in an embodiment, a successive-cancellation list decoding algorithm for polar codes, or other data redundancy decoding algorithm, is implemented. Multiple alternative decoding paths, or provisional data strings, are concurrently considered at each decoding stage, as explained above. During each decoding stage, the number of decoding paths is initially doubled before the tree structure is trimmed, or pruned, to discard all but a predetermined number of most probable paths. In various embodiments, the decoding stages may correspond to each successive bit of data in the retrieved codeword, a fixed number of data bits in the retrieved codeword, or any other suitable division of data in the retrieved codeword.
In block 84, a source decoding procedure is performed on the set of alternative provisional data strings to generate alternative candidate symbols. For example, in an embodiment, a Huffman decoding algorithm or other data compression algorithm is implemented to extract candidate symbols from the alternative provisional data strings, as described above. The candidate symbols are sent through a feedback loop, in block 86, for further validation regarding the channel decoding procedure.
In block 88, the alternative candidate symbols are concatenated with any previously decoded symbols following the most recent object separator encountered in the provisional data string. For example, in a text data string, the decoded letters following a space or punctuation are concatenated to form a partial or whole word.
The combinations of concatenated symbols, including the most recently decoded candidate symbol or symbols at the trailing end, in block 90, are compared to object information stored in a source object dictionary. The object information is reviewed to identify any objects in the dictionary with an initial symbol combination that matches the partial or whole object formed by the concatenated decoded symbols.
The marginal frequency, or number of occurrences, related to each symbol combination is retrieved from the dictionary, in block 92, as explained above. In block 94, source probabilities are computed for each of the alternative candidate symbols based on the marginal frequencies stored in the dictionary with respect to each combination of concatenated symbols, as explained above.
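The source probability of a candidate next letter follows from these marginal frequencies as a ratio of prefix counts; a sketch, with hypothetical counts:

```python
def next_letter_prob(prefix_counts, prefix, letter):
    """P(next symbol = letter | decoded prefix) from marginal counts.

    The probability is the count of words beginning with prefix+letter
    divided by the count of words beginning with prefix.
    """
    parent = prefix_counts.get(prefix, 0)
    child = prefix_counts.get(prefix + letter, 0)
    return 0.0 if parent == 0 else child / parent

# Hypothetical marginal counts: 100 words total, 40 start with "t",
# 25 of those continue with "h".
prefix_counts = {"": 100, "t": 40, "th": 25}
```

Multiplying these conditional probabilities along a word reproduces the marginal probability of the whole prefix, which is the source component of the joint probability.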
As an example, with reference to the dictionary data structure 50 of
In block 96, estimated joint source-channel probabilities are computed for each of the alternative provisional data string paths from block 82, as explained above. The joint source-channel probabilities combine the source probability regarding a particular source object or partial object with the channel probability regarding a particular retrieved data string, as explained above.
In block 98, the joint source-channel probabilities are used to determine which of the alternative provisional data strings to retain at each stage of the joint source-channel decoding procedure. In an embodiment, a specified number of the alternative provisional data strings from block 82 having the highest joint source-channel probabilities are retained at each decoding stage. The additional alternative provisional data strings are trimmed or pruned from the data structure. In the case that no objects in the dictionary match the symbol combinations from block 88, then the corresponding source probability, and thus the joint source-channel probability, equal zero, and the corresponding candidate symbol or symbols from block 84, and any corresponding alternative data strings from block 82, are rejected.
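The retention step can be sketched as follows, where each path carries a joint log-probability and a rejected path (source probability of zero) carries negative infinity in the log domain:

```python
def prune_paths(paths, l2):
    """Keep the L2 highest-scoring decoding paths.

    `paths` is a list of (joint_log_prob, data_string) pairs. Paths whose
    symbol combination matched no dictionary object carry -inf and are
    rejected outright; the survivors are sorted by score and truncated.
    """
    survivors = [p for p in paths if p[0] != float("-inf")]
    survivors.sort(key=lambda p: p[0], reverse=True)
    return survivors[:l2]
```

Running this after every decoding stage bounds the number of concurrently maintained paths at L2, which is what keeps the list decoder's complexity manageable.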
In block 100, a determination is made regarding whether or not additional decoding stages are required to complete the decoding of the retrieved codeword from block 80. If so, then the process continues at block 82; otherwise, at the final decoding stage, a retrieved data string having the highest joint source-channel probability is selected from the list of alternative data strings as output, in block 102.
The systems and methods described herein can offer advantages such as a joint source-channel coding scheme using polar codes having reduced complexity and improved performance. For example, embodiments do not require iterative decoding and can provide reduced block or frame error rates (FER) at relatively low signal-to-noise ratios (SNR) with respect to some existing methodologies. At relatively higher SNR, embodiments can provide substantial gain, demonstrating a similar waterfall slope with respect to some existing methodologies. Embodiments can tolerate higher raw bit error rates (BER) and thus extend the life of some types of storage media, such as solid-state devices (SSD) based on NAND flash technology.
Referring to
As illustrated in
In some embodiments, the computing device 110 is coupled to a communication network by way of the network interface 120, which in various embodiments may incorporate, for example, any combination of devices, as well as any associated software or firmware, configured to couple processor-based systems, including modems, access points, routers, network interface cards, LAN or WAN interfaces, wireless or optical interfaces and the like, along with any associated transmission protocols, as may be desired or required by the design.
The computing device 110 can be used, for example, to implement the functions of the components of the joint source-channel decoder 10 of
Aspects of this disclosure are described herein with reference to flowchart illustrations or block diagrams, in which each block or any combination of blocks can be implemented by computer program instructions. The instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to effectuate a machine or article of manufacture, and when executed by the processor the instructions create means for implementing the functions, acts or events specified in each block or combination of blocks in the diagrams.
In this regard, each block in the flowchart or block diagrams may correspond to a module, segment, or portion of code that includes one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functionality associated with any block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or blocks may sometimes be executed in reverse order.
A person of ordinary skill in the art will appreciate that aspects of this disclosure may be embodied as a device, system, method or computer program product. Accordingly, aspects of this disclosure, generally referred to herein as circuits, modules, components or systems, or the like, may be embodied in hardware, in software (including source code, object code, assembly code, machine code, micro-code, resident software, firmware, etc.), or in any combination of software and hardware, including computer program products embodied in a computer-readable medium having computer-readable program code embodied thereon.
It will be understood that various modifications may be made. For example, useful results still could be achieved if steps of the disclosed techniques were performed in a different order, and/or if components in the disclosed systems were combined in a different manner and/or replaced or supplemented by other components. Accordingly, other implementations are within the scope of the following claims.