The present application is related to devices that store data in non-linear bit-storage media, such as memristive bit-storage media, and, in particular, to a method and system for reducing write-buffer capacities by encoding data prior to storage.
The dimensions of electronic circuit elements have decreased rapidly over the past half century. Familiar circuit elements, including resistors, capacitors, inductors, diodes, and transistors that were once macroscale devices soldered by hand into macroscale circuits are now fabricated at sub-microscale dimensions within integrated circuits. Photolithography-based semiconductor manufacturing techniques can produce integrated circuits with tens of millions of circuit elements per square centimeter. The steady decrease in size of circuit elements and increase in the component densities of integrated circuits have enabled a rapid increase in the clock speeds at which integrated circuits can be operated as well as enormous increases in the functionalities, computational bandwidths, data-storage capacities, and efficiency of operation of integrated circuits and integrated-circuit-based electronic devices.
Unfortunately, physical constraints on further increases in the densities of components within integrated circuits manufactured using photolithography methods are being approached. Ultimately, photolithography methods are constrained by the wavelength of the radiation passed through photolithography masks in order to fix and etch photoresist and, as the dimensions of circuit lines and components decrease further into nanoscale dimensions, current leakage through tunneling and power losses due to the relatively high resistances of nanoscale components present challenges with respect to further decreasing component sizes and increasing component densities by traditional integrated-circuit-manufacturing and design methodologies. These challenges have spawned entirely new approaches to the design and manufacture of nanoscale circuitry and circuit elements. Research and development efforts are currently being expended to create extremely dense, nanoscale electronic circuitry through self-assembly of nanoscale components, nanoscale imprinting, and other relatively new methods. In addition, new types of circuit elements that operate at nanoscale dimensions have been discovered, including memristive switching materials that can be employed as bistable nanoscale memory elements. Unfortunately, memristive switching materials, and other candidate bistable-memory-element materials, which feature non-linear responses to applied voltage, temperature, and other forces and gradients that are applied to change the state of the materials, often exhibit relatively broadly distributed, asymmetrical probability density functions (“PDFs”) that characterize the probabilities that a memory element switches with respect to different durations of time that the switching force or gradient is applied. The asymmetrical PDF may feature a relatively long tail, corresponding to the fact that the force or gradient may need to be applied for a significantly greater period of time, to ensure switching, than the average time for switching. Alternatively, the PDF characterizes the switching behaviors of a large number of memory elements, with the long tail corresponding to a small fraction of the large number of memory elements which switch at significantly longer durations of application of the force or gradient than the majority of the large number of memory elements. This fact, in turn, entails significantly decreased operational bandwidths and/or reliability with respect to theoretical devices with narrowly distributed, symmetrical PDFs, for which the time that a force or gradient needs to be applied in order to ensure switching up to a probability corresponding to a maximum tolerable bit error rate is not significantly greater than the average application time at which switching occurs. Theoreticians, designers, and developers of memory devices and other data-storage devices based on non-linear data-storage materials, such as memristive materials, continue to seek methods and device architectures that ameliorate the asymmetrical, broadly-distributed switching-time characteristics of certain of these devices.
The present application is directed to electronic data-storage devices that store data in memory elements characterized by relatively broad and/or asymmetric switching-time probability density functions. These types of memory elements, many of which incorporate non-linear, bistable materials, including memristive materials, may exhibit worst-case switching times that are significantly larger than average switching times. The probability distributions reflect the switching times observed when a memory element is repeatedly switched from a first bistable state to a second bistable state. The probability distributions also reflect the observed switching times of a large number of individual memory elements when a switching voltage, current, or other force or gradient is applied to the large number of memory elements. The potentially lengthy switching times result, for conventional data-storage devices, in relatively long switching cycles and correspondingly low data-storage-input bandwidths.
The electronic data-storage devices to which the current application is directed are discussed below in six subsections: (1) Overview of Memory Elements with Asymmetrically-Distributed Switching Times; (2) Error Control Coding; (3) Hypothetical WRITE methods; (4) Analysis of the Various WRITE Methods; (5) Results of the Analysis of the Various WRITE Methods; and (6) Examples of Electronic Data-Storage Devices to which the Current Application is Directed.
The voltage at which the nanoscale electronic device transitions from the low-resistance state to the high-resistance state is referred to as Vw− 232. Choosing the high-resistance state to represent Boolean value “0” and the low-resistance state to represent Boolean value “1,” application of the positive voltage Vw+ can be considered to be a WRITE-1 operation and application of the negative voltage Vw− can be considered to be a WRITE-0 operation. Application of an intermediate-magnitude voltage VR 236 can be used to interrogate the value currently stored in the nanoscale electronic device. When the voltage VR is applied to the device and, as a result, a relatively large-magnitude current flows through the device, the device is in the low-resistance, or Boolean 1, state, but when relatively little current passes through the device, the device is in the Boolean 0 state. Thus, the nanoscale electronic device can be used to store a single bit of data.
Although this example, and a subsequent example, feature bistable materials that can have either of two different stable electronic states, depending on the history of voltages applied across the device, devices with three or more stable states can also be used in various applications. For example, a device with three stable states can store one of three different values “0,” “1,” or “2,” of a base-3 number system, or two of the three stable states of the three-state device can be used for storing a bit value, with the non-assigned state providing further separation from the information-storing states. In many cases, voltage is applied to change the state of a bistable memory element. However, other types of bistable materials may be switched by application of other forces and/or gradients, including temperature for phase-change-material-based devices. Other types of devices may feature types of states other than electrical-resistance states.
For a hypothetical log-normal distribution of switching times, the long tail of the distribution extends to switching times many times greater than the median switching time.
For many types of electronic devices, including memories, commercial applications demand extremely low error rates. As a result, in order to ensure that a sufficient portion of the memory elements written during a particular application of a write voltage to the memory do indeed switch, the WRITE voltage may need to be applied to the memory for a duration many times that of the average switching time for memory elements or, in other words, for a duration of time such that, for a normalized PDF, the area under the PDF between 0 and the application time approaches 1.0 and the area under the PDF to the right of the duration of application approaches 0.
A suitable expression for modeling the PDF for a memristive memory element is provided below:
fτ,σ(t)=exp(−(ln(t/τ))2/(2σ2))/(tσ√(2π)), t>0.
A suitable expression for modeling the CDF for a memristive memory element is next provided:
Fτ,σ(t)=½erfc(−ln(t/τ)/(σ√2)), t≧0.
In the above expression, the function erfc denotes the complementary error function. The PDF and CDF can be viewed as expressions for the distribution of t/τ, where the median value of ln(t/τ) is 0 and ln(t/τ) is Gaussian distributed. The ratio t/τ represents switching times normalized by the median switching time τ. The parameter τ is modeled, in certain types of memristive memory elements, by the following expressions:
τON=aONe−bON|v|
τOFF=aOFFe−bOFF|v|
τON is the τ parameter for positive applied voltages, which switch the memristive memory element into the ON or “1” state, and τOFF is the parameter τ for negative applied voltages that switch the memristive memory element from the “1” or ON state to the “0” or OFF state. The constants aON, aOFF, bON, and bOFF are empirically determined positive real constants and v is the applied switching voltage.
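The following minimal sketch, in Python, illustrates the log-normal switching-time model described above. The erfc form of the CDF and the exponential voltage dependence of τ follow the expressions given above; the numerical values of σ, aON, and bON are illustrative placeholders only, not empirically determined constants.

```python
import math
import random

def switching_cdf(t, tau, sigma):
    # F_tau,sigma(t) = 0.5 * erfc(-ln(t/tau) / (sigma * sqrt(2)))
    if t <= 0.0:
        return 0.0
    return 0.5 * math.erfc(-math.log(t / tau) / (sigma * math.sqrt(2.0)))

def switching_failure_probability(T, tau, sigma):
    # Probability that an element has NOT switched after the force is applied for time T.
    return 1.0 - switching_cdf(T, tau, sigma)

def tau_on(v, a_on=1.0e-5, b_on=4.0):
    # Median ON-switching time as a function of applied voltage v:
    # tau_ON = a_ON * exp(-b_ON * |v|); a_on and b_on are illustrative values only.
    return a_on * math.exp(-b_on * abs(v))

def sample_switching_time(tau, sigma, rng=random):
    # ln(t/tau) is Gaussian with median 0 and standard deviation sigma.
    return tau * math.exp(rng.gauss(0.0, sigma))

if __name__ == "__main__":
    tau, sigma = 1.0, 0.4      # times expressed in units of tau; sigma is illustrative
    for T in (1.0, 2.0, 5.0, 10.0):
        p_fail = switching_failure_probability(T, tau, sigma)
        print(f"T = {T:5.1f} tau  ->  switching-failure probability = {p_fail:.3e}")
```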
There are two approaches, employed in various examples, for designing and producing cost-effective memory and other data-storage devices, using memory elements characterized by log-normal and/or broadly distributed switching-time PDFs, with desirable data-input bandwidths. These two approaches can each be used separately or in combination.
Excellent references for error-control coding are the textbooks “Error Control Coding: Fundamentals and Applications,” Lin and Costello, Prentice-Hall Incorporated, New Jersey, 1983 and “Introduction to Coding Theory,” Ron M. Roth, Cambridge University Press, 2006. A brief description of the error-detection and error-correction techniques used in error-control coding is next provided. Additional details can be obtained from the above-referenced textbooks or from many other textbooks, papers, and journal articles in this field.
Error-control encoding techniques systematically introduce supplemental bits or symbols into plain-text messages, or encode plain-text messages using a greater number of bits or symbols than absolutely required, in order to provide information in encoded messages to allow for errors arising in storage or transmission to be detected and, in some cases, corrected. One effect of the supplemental or more-than-absolutely-needed bits or symbols is to increase the distance between valid codewords, when codewords are viewed as vectors in a vector space and the distance between codewords is a metric derived from the vector subtraction of the codewords.
In describing error detection and correction, it is useful to describe the data to be transmitted, stored, and retrieved as one or more messages, where a message μ comprises an ordered sequence of symbols, μi, that are elements of a field F. A message μ can be expressed as:
μ=(μ0,μ1, . . . ,μk−1)
where μi∈F.
The field F is a set that is closed under multiplication and addition, and that includes multiplicative and additive inverses. It is common, in computational error detection and correction, to employ finite fields, GF(pm), comprising a subset of integers with size equal to the power m of a prime number p, with the addition and multiplication operators defined as addition and multiplication modulo an irreducible polynomial over GF(p) of degree m. In practice, the binary field GF(2) or a binary extension field GF(2m) is commonly employed, and the following discussion assumes that the field GF(2) is employed. Commonly, the original message is encoded into a message c that also comprises an ordered sequence of elements of the field GF(2), expressed as follows:
c=(c0,c1, . . . cn−1)
where ci∈GF(2).
Block encoding techniques encode data in blocks. In this discussion, a block can be viewed as a message μ comprising a fixed number of symbols k that is encoded into a message c comprising an ordered sequence of n symbols. The encoded message c generally contains a greater number of symbols than the original message μ, and therefore n is greater than k. The r extra symbols in the encoded message, where r equals n−k, are used to carry redundant check information to allow for errors that arise during transmission, storage, and retrieval to be detected with an extremely high probability of detection and, in many cases, corrected.
In a linear block code, the 2k codewords form a k-dimensional subspace of the vector space of all n-tuples over the field GF(2). The Hamming-weight of a codeword is the number of non-zero elements in the codeword, and the Hamming distance between two codewords is the number of elements in which the two codewords differ. For example, consider the following two codewords a and b, assuming elements from the binary field:
a=(1 0 0 1 1)
b=(1 0 0 0 1)
The codeword a has a Hamming weight of 3, the codeword b has a Hamming weight of 2, and the Hamming distance between codewords a and b is 1, since codewords a and b differ only in the fourth element. Linear block codes are often designated by a three-element tuple [n, k, d], where n is the codeword length, k is the message length, or, equivalently, the base-2 logarithm of the number of codewords, and d is the minimum Hamming distance between different codewords, equal to the Hamming weight of the minimal-weight non-zero codeword in the code.
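A minimal sketch of the Hamming-weight and Hamming-distance computations for the example codewords above; the function names are illustrative.

```python
def hamming_weight(codeword):
    # Number of non-zero elements in the codeword.
    return sum(1 for symbol in codeword if symbol != 0)

def hamming_distance(c1, c2):
    # Number of positions in which the two codewords differ.
    return sum(1 for x, y in zip(c1, c2) if x != y)

a = (1, 0, 0, 1, 1)
b = (1, 0, 0, 0, 1)
print(hamming_weight(a), hamming_weight(b), hamming_distance(a, b))   # prints: 3 2 1
```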
The encoding of data for transmission, storage, and retrieval, and subsequent decoding of the encoded data, can be notationally described as follows, when no errors arise during the transmission, storage, and retrieval of the data:
μ→c(s)→c(r)→μ
where c(s) is the encoded message prior to transmission, and c(r) is the initially retrieved or received message. Thus, an initial message μ is encoded to produce the encoded message c(s), which is then transmitted, stored, or transmitted and stored, and is then subsequently retrieved or received as the initially received message c(r). When not corrupted, the initially received message c(r) is then decoded to produce the original message μ. As indicated above, when no errors arise, the originally encoded message c(s) is equal to the initially received message c(r), and the initially received message c(r) is straightforwardly decoded, without error correction, to the original message μ.
When errors arise during the transmission, storage, or retrieval of an encoded message, message encoding and decoding can be expressed as follows:
μ(s)→c(s)→c(r)→μ(r).
Thus, as stated above, the final message μ(r) may or may not be equal to the initial message μ(s), depending on the fidelity of the error detection and error correction techniques employed to encode the original message μ(s) and decode or reconstruct the initially received message c(r) to produce the final received message μ(r). Error detection is the process of determining that:
c(r)≠c(s)
while error correction is a process that reconstructs the initial, encoded message from a corrupted initially received message:
c(r)→c(s).
The encoding process is a process by which messages, symbolized as μ, are transformed into encoded messages c. Alternatively, a message μ can be considered to be a word comprising an ordered set of symbols from the alphabet consisting of elements of F, and an encoded message c can be considered to be a codeword also comprising an ordered set of symbols from the alphabet of elements of F. A word μ can be any ordered combination of k symbols selected from the elements of F, while a codeword c is defined as an ordered sequence of n symbols selected from elements of F via the encoding process:
{c:μ→c}.
Linear block encoding techniques encode words of length k by considering the word μ to be a vector in a k-dimensional vector space, and multiplying the vector μ by a generator matrix, as follows:
c=μ·G.
Notationally expanding the symbols in the above equation produces either of the following alternative expressions:
(c0,c1, . . . ,cn−1)=(μ0,μ1, . . . ,μk−1)·G
(c0,c1, . . . ,cn−1)=μ0g0+μ1g1+ . . . +μk−1gk−1
where gi=(gi,0, gi,1, gi,2 . . . gi,n−1) is the ith row of the generator matrix G.
The generator matrix G for a linear block code can have the form:
or, alternatively:
Gk,n=[Pk,r|Ik,k].
Thus, the generator matrix G can be placed into a form of a matrix P augmented with a k by k identity matrix Ik,k. Alternatively, the generator matrix G can have the form:
Gk,n=[Ik,k|Pk,r].
A code generated by a generator matrix in this form is referred to as a “systematic code.” When a generator matrix having the first form, above, is applied to a word μ, the resulting codeword c has the form:
c=(c0,c1, . . . ,cr−1,μ0,μ1, . . . ,μk−1)
where ci=μ0p0,i+μ1p1,i+ . . . +μk−1pk−1,i. Using a generator matrix of the second form, codewords are generated with trailing parity-check bits. Thus, in a systematic linear block code, the codewords comprise r parity-check symbols ci followed by the k symbols comprising the original word μ, or the k symbols comprising the original word μ followed by r parity-check symbols. When no errors arise, the original word, or message μ, occurs in clear-text form within, and is easily extracted from, the corresponding codeword. The parity-check symbols turn out to be linear combinations of the symbols of the original message, or word μ.
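The following sketch illustrates systematic encoding with a generator matrix of the first form, G=[P|I], over GF(2). The parity sub-matrix P shown here is a hypothetical choice (it happens to generate a [7,4,3] Hamming code) and is not taken from the discussion above.

```python
import numpy as np

def systematic_encode(message_bits, P):
    # Encode a k-symbol word with a generator matrix of the form G = [P | I_k],
    # producing r parity-check symbols followed by the k message symbols.
    mu = np.asarray(message_bits, dtype=int)
    parity = mu @ P % 2            # c_i = mu_0*p_0,i + mu_1*p_1,i + ... (mod 2)
    return np.concatenate([parity, mu])

# Hypothetical k=4, r=3 parity sub-matrix; this choice yields a [7,4,3] Hamming code.
P = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 1, 1],
              [1, 0, 1]])

codeword = systematic_encode([1, 0, 1, 1], P)
print(codeword)                    # three parity-check bits followed by the four message bits
```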
One form of a second, useful matrix is the parity-check matrix Hr,n, defined as:
Hr,n=[Ir,r|−PT]
or equivalently,
The parity-check matrix can be used for systematic error detection and error correction. Error detection and correction involves computing a syndrome S from an initially received or retrieved message c(r) as follows:
S=(s0,s1, . . . ,sr−1)=c(r)·HT
where HT is the transpose of the parity-check matrix Hr,n expressed as:
Note that, when a binary field is employed, x=−x, so the minus signs shown above in HT are generally not shown.
The syndrome S is used for error detection and error correction. When the syndrome S is the all-0 vector, no errors are detected in the codeword. When the syndrome includes bits with value “1,” errors are indicated. There are techniques for computing an estimated error vector ê from the syndrome and codeword which, when added by modulo-2 addition to the codeword, generates a best estimate of the original message μ. Details for generating the error vector ê are provided in the above mentioned texts. Note that up to some maximum number of errors can be detected, and fewer than the maximum number of errors that can be detected can be corrected.
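A minimal sketch of syndrome-based error detection and single-error correction, continuing the hypothetical [7,4,3] code of the previous sketch; practical decoders for longer codes use the algebraic methods described in the cited textbooks.

```python
import numpy as np

# Parity-check matrix H = [I_r | P^T] for the hypothetical [7,4,3] code sketched above
# (over the binary field, the minus signs in -P^T can be dropped).
P = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 1, 1],
              [1, 0, 1]])
H = np.concatenate([np.eye(3, dtype=int), P.T], axis=1)

def syndrome(received):
    # S = c(r) . H^T (mod 2); an all-zero syndrome means no error is detected.
    return np.asarray(received, dtype=int) @ H.T % 2

def correct_single_error(received):
    # For a single-bit error, the syndrome equals the column of H at the error position.
    r = np.asarray(received, dtype=int).copy()
    s = syndrome(r)
    if s.any():
        for position, column in enumerate(H.T):
            if np.array_equal(column, s):
                r[position] ^= 1       # flip the erroneous bit
                break
    return r

codeword = np.array([1, 0, 0, 1, 0, 1, 1])    # codeword produced by the encoding sketch
corrupted = codeword.copy()
corrupted[5] ^= 1                             # introduce one bit error
print(syndrome(corrupted), correct_single_error(corrupted))
```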
The probability of a switching failure, Pb(T), for a given memory element, or the bit-error rate for a multi-memory-element device, is computed from the above-discussed log-normal CDF as follows:
Pb(T)=1−Fτ,σ(T),T≧0.
where Fτ,σ(T) is the above-discussed CDF. In the following discussion, for simplicity, the asymmetry between on-switching and off-switching is ignored, as are cases in which a successfully applied WRITE operation does not change the state of a memory element and, therefore, failure of the WRITE operation does not change the state of a memory element. Ignoring these cases does not alter comparisons between various methods, discussed below. In the following discussion, switching failure of memristive memory elements and other non-linear data-storage materials is modeled as a binary symmetric noisy channel.
In the following discussion, when ECCs are employed, it is assumed that the code C is an [n,k,d] code and that, therefore, up to (d−1)/2 bit errors that occur in writing and/or reading each codeword can be corrected. Of course, the ability to recover from bit errors comes at the cost of the redundant bits r that are added to each group of binary information bits of length k, resulting in an information rate R defined as:
information rate=R=k/n
R<1 for coded information
R=1 for uncoded information.
As discussed above, when uncoded information is stored into and retrieved from a memory, the fraction of erroneous bits in the information retrieved from memory, assuming that no errors occur during reading of the stored information, is Pb, the probability of switching failure or BER. When coded information is stored into a memory, subsequently retrieved, and processed by an error-correcting decoder, the resulting BER {circumflex over (P)}b is substantially smaller than Pb.
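The post-decoding BER can be estimated under the decoder assumptions stated below (bounded-distance decoding that corrects up to s=(d−1)/2 errors per codeword and fails cleanly on heavier error patterns). The sketch below uses one common form of such an estimate, with n=4304 and s=16 corresponding to the BCH code introduced later; the exact expression used in the analysis may differ.

```python
from math import comb

def coded_ber_estimate(p_b, n=4304, s=16):
    # Estimated post-decoding bit-error rate for an [n, k, d] code whose decoder
    # corrects up to s = (d - 1) / 2 errors per codeword and, by assumption,
    # fails without introducing additional errors on heavier error patterns.
    # This is a common bounded-distance-decoding estimate, offered here as an
    # illustration rather than as the exact expression used in the analysis.
    total = 0.0
    for j in range(s + 1, n + 1):
        block_probability = comb(n, j) * p_b**j * (1.0 - p_b)**(n - j)
        total += (j / n) * block_probability   # j residual bit errors in an n-symbol codeword
    return total

for p_b in (1e-3, 1e-4, 1e-5):
    print(f"raw BER {p_b:.0e}  ->  estimated coded BER {coded_ber_estimate(p_b):.3e}")
```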
Next, a number of different data-writing methods that employ one or both of the feedback signals and ECC discussed above are described.
For single-pulse methods, the total time of application of a WRITE voltage, Tt, or other force or gradient used to switch a memory element, is equal to T, the duration of the single pulse. For multiple-pulse methods, Tt is equal to the sum of the multiple pulses:
Tt=T1+T2+ . . . , where Ti is the duration of the ith pulse.
The average voltage-application time, Tavg, is the expected total application time:
Tavg=E(Tt).
For single-pulse methods, Tavg=T. The average voltage-application time per bit for methods that employ ECC, T*avg, is:
T*avg=Tavg/R=pulse time/bit,
accounting for the additional time for writing the added redundant bits. Finally, the gain G, or expected savings in energy consumption or memory bandwidth per information bit for a particular data-writing method w, is also evaluated, where G is expressed in dB.
Thus, in the following comparisons, the uncoded BER Pb, the coded BER {circumflex over (P)}b, the total time of application of voltages or other forces and/or gradients employed to write data Tt, the average application time for multi-pulse methods Tavg, the average pulse time per bit T*avg, and the gain G are evaluated to facilitate comparison of the different data-writing methods. While T*avg is the appropriate figure of merit to use when comparing energy consumption and memory bandwidth between different WRITE methods, Tavg and Tmax are reflective of device wear and worst-case latency considerations.
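The following sketch shows how these figures of merit might be combined; the choice of the uncoded single-pulse method as the reference for the gain, and all of the numerical values, are illustrative assumptions rather than values taken from the analysis.

```python
import math

def t_avg_per_bit(t_avg, R):
    # T*_avg = T_avg / R accounts for the extra time spent writing the redundant bits.
    return t_avg / R

def gain_db(t_star_reference, t_star_method):
    # Gain in dB of a method relative to a reference method (assumed here to be
    # the uncoded single-pulse method).
    return 10.0 * math.log10(t_star_reference / t_star_method)

# Illustrative numbers, in units of tau: an uncoded 1-pulse method needs a long pulse
# to reach a target BER, while a coded method tolerates a higher raw BER per element
# and therefore a shorter pulse, at the cost of a rate R < 1.
T_uncoded, T_coded, R = 40.0, 12.0, 4096 / 4304
reference = t_avg_per_bit(T_uncoded, 1.0)
coded = t_avg_per_bit(T_coded, R)
print(f"gain of the coded 1-pulse method over the uncoded 1-pulse method: "
      f"{gain_db(reference, coded):.2f} dB")
```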
As discussed above, one approach for ameliorating the potentially long WRITE-voltage application times needed to ensure high reliability for data storage in devices with memory elements that exhibit a log-normal distribution of switching times is to use a feedback signal that allows a memory controller to determine, at selected points in time, whether a particular memory element has switched. It should be noted that this feedback-signal-based method for decreasing the average duration of application of WRITE voltages incurs significant costs in additional circuitry and circuit elements. Similarly, as discussed above, the ability to correct errors provided by the use of ECCs involves storing additional, redundant bits, which decreases the information rate of a memory device.
In the following discussion, various simplifications are made. For example, in estimating {circumflex over (P)}b, it is assumed that a decoder always fails when more than s bits of a codeword are corrupted or, in other words, that the decoder is always able to detect uncorrectable error patterns. When the decoder detects an uncorrectable error pattern, the decoder discontinues attempting to decode the codeword, but does not introduce additional errors. In practice, this is not always the case. There is a small probability that the decoder will generate an incorrectly decoded codeword for an uncorrectable error pattern. This probability is ignored, which is reasonable in practice, since ignoring it does not significantly affect the result of the overall BER computation.
There are many different parameters that might be optimized for devices that feature memory elements with log-normally distributed switching times. For example, in addition to changing the length T and number of pulses during which a WRITE voltage, or other force or gradient, is applied, the voltage itself may be varied, with higher voltages generally decreasing the average pulse time needed to achieve a particular BER, but also increasing energy expended by a memory or other data-storage device to store information. It turns out, however, that, in many cases, there is no optimal WRITE voltage within the range of WRITE voltages that may be applied, but, instead, using larger-magnitude WRITE voltages generally results in expending less energy. In other words, the larger the WRITE voltage applied to a memory element, the shorter the WRITE voltage needs to be applied and the less total energy is expended to switch a memory element. Of course, at some point, increasing the WRITE voltage leads to failure of the device, and the longevity of the device may also be negatively impacted by use of high WRITE voltages. As another example, the variance σ of the natural logs of switching times, modeled, as discussed above, by the above-provided PDF and CDF expressions, is dependent on the applied WRITE voltage. However, the dependence is weak, and thus does not constitute a good candidate parameter for optimization.
In the following discussion, as mentioned above, application times are reported in units of τ, or, in other words, the random variable is t/τ. Thus, in the following discussion, the results are provided in a time-scale-independent fashion. In the following computation of various parameters for various information-writing methods, a binary Bose, Ray-Chaudhuri, Hocquenghem (“BCH”) ECC code C is used. This code is a [4304, 4096, 33] ECC, with R≈0.952, which can correct up to 16 random errors per 4096-bit code blocks. This particular code is used, in the following discussion, for good performance in correcting switching-failure errors, although in actual memory systems, additional considerations for selecting codes would also include the types of failure modes of the code and the ability of the code to adequately handle various types of correlated multi-bit errors. In the following analysis, two different target BER levels are considered: (1) Pb=10−12, representing the lower end of BER levels for current storage devices and corresponding to storing of a two-hour high-definition movie without expected errors; and (2) Pb=10−23, representative of future desired BER levels.
In a first method, a single WRITE pulse of fixed duration T is applied to store each bit, without use of a feedback signal.
All of the methods illustrated and discussed below are analyzed in the following subsections.
In this section, the approaches used to analyze the various WRITE methods discussed above are described.
In the one-pulse methods, the choice of T determines the input BER, Pb(T), of the stored data, which, in the coded method, is assumed to have been encoded with C. The output BER of the coded method is then estimated by using the parameters n=4304, s=16, of the above-described BCH code.
A multi-pulse WRITE method using two pulses is the simplest data-writing method with feedback. An initial pulse of duration T1 is applied, and the state of the device is sensed. When the device is found to have switched to the desired target state, the WRITE operation is deemed complete. When the device has not switched, an additional pulse of duration Tmax−T1 is applied, where Tmax>T1. Notice that, although interrupting the operation at time T1 reduces the average total pulse time, the switching failure probability is still determined by Tmax, as a result of which Pb=1−Fτ,σ(Tmax). The expected total pulse duration is
Tavg(Tmax,T1)=Fτ,σ(T1)T1+(1−Fτ,σ(T1))Tmax,
Given a target value of Pb, the value of T1 that minimizes Tavg can be computed. Indeed, it is readily verified that Tavg(Tmax,0)=Tavg(Tmax,Tmax)=Tmax, and that, as a function of T1, Tavg has a sharp minimum in the interval (0,Tmax).
For a binary symmetric noisy channel, the 2-pulse method is identical to the 1-pulse method, except that, in expectation, far shorter pulses and, correspondingly, far less energy, are used to obtain the same BER. The worst-case pulse durations are the same as in the 1-pulse case. Also as in the 1-pulse case, using ECC results in further decreases in expected pulse lengths and energy consumption, but, additionally, in large reductions in worst-case to average pulse-length ratios.
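A minimal sketch of the 2-pulse optimization described above, using the log-normal CDF with an illustrative σ and a brute-force search for the minimizing T1.

```python
import math

def F(t, tau=1.0, sigma=0.4):
    # Log-normal switching-time CDF; sigma = 0.4 is an illustrative value.
    return 0.0 if t <= 0 else 0.5 * math.erfc(-math.log(t / tau) / (sigma * math.sqrt(2)))

def t_avg_2pulse(T_max, T1):
    # T_avg(T_max, T1) = F(T1) * T1 + (1 - F(T1)) * T_max
    return F(T1) * T1 + (1.0 - F(T1)) * T_max

def best_T1(T_max, steps=100000):
    # Brute-force search for the T1 in (0, T_max) minimizing the expected total pulse time.
    return min((t_avg_2pulse(T_max, i * T_max / steps), i * T_max / steps)
               for i in range(1, steps))

T_max = 10.0        # chosen so that P_b = 1 - F(T_max) meets some target BER
t_avg, T1 = best_T1(T_max)
print(f"T1 = {T1:.3f} tau, T_avg = {t_avg:.3f} tau, worst-case total pulse time = {T_max} tau")
```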
Three-pulse WRITE methods are analyzed in similar fashion to the two-pulse WRITE methods, except that sensing of the state of the memory element is allowed at discrete times T1 and T2, 0≦T1≦T2≦Tmax. The expected total pulse length is given by the formula:
Tavg(Tmax,T1,T2)=Fτ,σ(T1)T1+(Fτ,σ(T2)−Fτ,σ(T1))T2+(1−Fτ,σ(T2))Tmax.
For a given value of Tmax corresponding to a target value of Pb, Tavg exhibits a deep global minimum in T1 and T2, which is easily found by taking partial derivatives with respect to T1 and T2 and solving the resulting system of equations by means of numerical methods.
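The corresponding 3-pulse minimization can be carried out numerically; the sketch below uses a coarse two-dimensional grid search with an illustrative σ, in place of solving the system of partial-derivative equations.

```python
import math

def F(t, tau=1.0, sigma=0.4):
    # Log-normal switching-time CDF with an illustrative sigma.
    return 0.0 if t <= 0 else 0.5 * math.erfc(-math.log(t / tau) / (sigma * math.sqrt(2)))

def t_avg_3pulse(T_max, T1, T2):
    # T_avg = F(T1)*T1 + (F(T2) - F(T1))*T2 + (1 - F(T2))*T_max
    return F(T1) * T1 + (F(T2) - F(T1)) * T2 + (1.0 - F(T2)) * T_max

def best_T1_T2(T_max, steps=400):
    # Coarse grid search over 0 < T1 <= T2 < T_max; a numerical root-finder applied
    # to the partial derivatives would refine the result further.
    grid = [i * T_max / steps for i in range(1, steps)]
    return min((t_avg_3pulse(T_max, T1, T2), T1, T2)
               for i, T1 in enumerate(grid) for T2 in grid[i:])

T_max = 10.0
t_avg, T1, T2 = best_T1_T2(T_max)
print(f"T1 = {T1:.2f}, T2 = {T2:.2f}, T_avg = {t_avg:.3f} (all in units of tau)")
```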
In continuous-feedback WRITE methods, a pulse of maximal duration Tmax is used while the state of the device is continuously monitored, with the applied voltage turned off immediately after switching occurs. The expected pulse length for a continuous-feedback WRITE method is given by
Tavg(Tmax)=∫0Tmax tfτ,σ(t)dt+(1−Fτ,σ(Tmax))Tmax.
When Tmax tends to infinity, the above expression tends, as expected, to τ·exp(σ2/2), the mean of the log-normal density fτ,σ. In fact, this limit is approached rather rapidly when Tmax/τ>1.
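A short numerical check of this behavior, integrating the truncated log-normal mean with a simple midpoint rule for an illustrative σ, is sketched below.

```python
import math

def f(t, tau=1.0, sigma=0.4):
    # Log-normal switching-time PDF with an illustrative sigma.
    return math.exp(-(math.log(t / tau)) ** 2 / (2 * sigma ** 2)) / (t * sigma * math.sqrt(2 * math.pi))

def F(t, tau=1.0, sigma=0.4):
    return 0.5 * math.erfc(-math.log(t / tau) / (sigma * math.sqrt(2)))

def expected_pulse_continuous(T_max, steps=100000):
    # Expected pulse length = integral_0^Tmax t*f(t) dt + (1 - F(T_max)) * T_max,
    # evaluated with a midpoint rule.
    h = T_max / steps
    integral = sum((i + 0.5) * h * f((i + 0.5) * h) * h for i in range(steps))
    return integral + (1.0 - F(T_max)) * T_max

sigma = 0.4
print("limiting value tau*exp(sigma^2/2) =", math.exp(sigma ** 2 / 2))
for T_max in (1.0, 2.0, 5.0, 10.0):
    print(f"T_max = {T_max:4.1f} tau  ->  expected pulse length = {expected_pulse_continuous(T_max):.4f} tau")
```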
Feedback offers significant gains in the expected duration of WRITE operations. These gains translate directly to reduced expected energy consumption and reduced wear on the devices. The use of ECC further enhances these gains, sometimes by significant margins. Additionally, the very significant reductions in Tmax due to coding lead to corresponding gains in system throughput, even when WRITE requests are restricted to occur at least Tmax units of time apart. To let throughput benefit also from the reduction in Tavg, and to increase the operation rate beyond the Tmax limitation, a queueing or buffering mechanism for WRITE operations may be implemented, since some operations will take time Tmax, and WRITE requests arriving at a higher rate will have to be queued and wait while these operations complete. The buffering requirements and reliability of such a system can be analyzed using the tools of queueing theory.
Consider a 2-pulse method, with parameters T1, Tmax, and Tavg. Assume, for simplicity, that WRITE requests arrive at a fixed rate, with an inter-arrival period of A units of time. If A≧Tmax, no queueing is needed, so it is assumed that A<Tmax. Clearly, A>T1 for the queue to have any chance of remaining bounded (in fact, from well-known results in queueing theory, and as will also transpire from the analysis below, A>Tavg is needed). A further simplifying assumption is that the ratio d=(Tmax−A)/(A−T1) is an integer. Because the ratios Tmax/Tavg are rather large, this is not a very restrictive assumption given a target BER achieved with a certain value of Tmax. In most cases, Tmax can be slightly increased to make d an integer. With these assumptions, the analysis of the waiting time in the queue reduces to studying a simple integer-valued random walk.
Let wi denote an integer random variable representing the waiting time in the queue of the ith WRITE request (the actual waiting time being (A−T1)wi), and let p=P(ti=T1), where ti is the actual total pulse length of the ith WRITE, the service time for the ith WRITE request. Let (a−b)* denote a−b when a>b, or 0 otherwise. Then, taking w0=0 as the initial condition
wi+1=(wi−Di)*,i≧1,
where Di is a random variable assuming values in {1,−d}, with P(Di=1)=p, and P(Di=−d)=1−p. By previous assumptions, these probabilities are independent of i. The random walk wi is a Markov chain which, for sufficiently large p, is persistent, returning infinitely often to the state wi=0. Under this assumption, the chain has a stationary distribution
Clearly, a state wi+1=w in the range 1≦w≦d−1 can only be reached from wi=w+1, through Di=1. Therefore
Pw=pPw+1=p2Pw+2= . . . =pd−wPd=pd−wu,1≦w≦d,
where u=Pd. State w=0, on the other hand, can be reached from either w=0 or w=1, again with Di=1. Thus, P0=pP0+pP1=pP0+pdu. Solving for P0
Finally, for w≧d, state w can be reached from w+1 with Di=1, or from w−d with Di=−d, yielding the recursion
Pw=(1−p)Pw−d+pPw+1,w≧d.
An explicit expression for the generating function can be obtained from the above expressions as
from which, in turn, the expectation of the waiting time can be derived
Letting W=(A−T1)w, and translating back to time units
As expected, E[W] approaches zero when A approaches Tmax (no queue is needed when A≧Tmax), and E[W] approaches infinity when A approaches Tavg. By Little's theorem [3], the expectation of the queue size, Q, is given by
E[Q]=E[W]/A.
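Before the derivation of the explicit expressions is detailed, the waiting-time behavior can also be checked by direct simulation. The sketch below applies the standard Lindley recursion to the 2-pulse service-time model with illustrative parameter values; it does not require d to be an integer.

```python
import random

def simulate_queue(T1, T_max, A, p, n_requests=500_000, seed=1):
    # Monte Carlo estimate of E[W] and E[Q] for a 2-pulse WRITE method:
    # the service time of a request is T1 with probability p (switching verified
    # after the first pulse) and T_max otherwise; requests arrive every A time units.
    rng = random.Random(seed)
    wait = 0.0
    total_wait = 0.0
    for _ in range(n_requests):
        total_wait += wait
        service = T1 if rng.random() < p else T_max
        wait = max(wait + service - A, 0.0)     # Lindley recursion for the waiting time
    E_W = total_wait / n_requests
    return E_W, E_W / A                         # Little's theorem: E[Q] = E[W] / A

# Illustrative parameters in units of tau; in the analysis above, p = F(T1).
T1, T_max, p = 1.2, 10.0, 0.995                 # T_avg = p*T1 + (1-p)*T_max, about 1.244
for A in (1.5, 2.0, 4.0, 9.0):
    E_W, E_Q = simulate_queue(T1, T_max, A, p)
    print(f"A = {A:4.1f}: E[W] ~= {E_W:8.3f}, E[Q] ~= {E_Q:6.3f}")
```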
It is clear from the above-provided expressions that the variable u multiplies all the probabilities Pw. Consider
An explicit expression for G0(z) follows directly, yielding
As for G1(z), applying the expression for G1(z) and the above-provided recursion, and recalling that u=Pd, the following expression is obtained
Rearranging terms, and after some algebraic manipulations, the following expressions are obtained
where gh(z)=(1−zh)/(1−z) for integers h≧1 eliminates a common factor (1−z) from the numerator and denominator of the expression for G1(z). The above expressions determine G(z) up to a factor of u. Setting G(1)=1, the following expression is obtained
u=(1−p)((d+1)p−d)p−(d+1),
which completes the determination of G(z). The expectation of w is given by
which yields the first-provided expression for E[w]. The second expression for E[W], provided above, then follows by substituting d=(Tmax−A)/(A−T1) into the first-provided expression, multiplying by the time scale A−T1, and recalling that Tavg=pT1+(1−p)Tmax. Notice that, for u to be positive, p>d/(d+1), leading to A>Tavg.
Again consider discrete pulsing WRITE methods with intervening reads to verify switching, but rather than imposing an explicit limit on the number of pulses, consider instead imposing a penalty on the verification/read operation and determine the optimal pulsing method subject to this penalty.
Let T1<T2< . . . <Tn−1<Tmax denote a sequence of pulse ending times which also coincide with reads, except for the final pulse ending at Tmax, where there is no follow-up read. Thus, the first pulse is of duration T1, the second pulse of duration T2−T1, and so forth. Assume that Tmax is determined, as above, via Tmax=Pb−1(p) for some desired raw bit-error rate Pb=p. Further assume that a READ operation takes time tr. Therefore, the total expected time penalty for pulsing and reading can be expressed as
where T0=0 and Tsw is the random amount of aggregate pulse duration used to switch. Consider
the minimum average pulse and verification time over all possible pulse end times and number of pulses.
The Ti are constrained to be some positive integer multiple of a small time interval t=Tmax/mmax, as in Ti=mit, and optimized over the mi. The maximum number of pulses is then Tmax/t=mmax. Let {circumflex over (T)}* denote the resulting optimum Tavg under this constraint on the pulse ending times. Clearly, {circumflex over (T)}*≧T*, and it can be shown that
{circumflex over (T)}*≦T*+t.
Given an unconstrained set of pulse end times T1, . . . Tn−1, let T={┌Ti/t┐t:i∈(1, . . . , n−1)} be the set of quantized end times and {circumflex over (T)}1< . . . <{circumflex over (T)}{circumflex over (n)}−1 be the elements of T smaller than Tmax. This construction implies that
{circumflex over (n)}≦n
Ti>{circumflex over (T)}j implies i>j
Compare Tavg(Tmax, n, T1, . . . , Tn−1) and Tavg(Tmax, {circumflex over (n)}, {circumflex over (T)}1, . . . , {circumflex over (T)}{circumflex over (n)}−1). Tavg(Tmax, n, T1, . . . , Tn−1) can be interpreted as the expectation of the random variable f(Tsw), where f(x) is
and similarly interpret Tavg(Tmax, {circumflex over (n)}, {circumflex over (T)}1, . . . , {circumflex over (T)}{circumflex over (n)}−1) as the expectation of the random variable g(Tsw) with g(x) as
For any 0≦x≦Tmax, g(x)≦f(x)+t, which, by way of the expectation interpretation, suffices to establish {circumflex over (T)}*≦T*+t. Suppose {circumflex over (T)}j−1<x≦{circumflex over (T)}j<Tmax; then g(x)={circumflex over (T)}j+jtr. There will be some i such that Ti−1<x≦Ti, where {circumflex over (T)}0=T0=0 and {circumflex over (T)}{circumflex over (n)}=Tn=Tmax. Thus Ti≧x>{circumflex over (T)}j−1, and it then follows from the fact that Ti>{circumflex over (T)}j implies i>j that i>j−1, or i≧j. Additionally, it must be the case that Ti>{circumflex over (T)}j−t, since otherwise ┌Ti/t┐t would not be in the set of quantized end times T defined above. Putting these two facts together
establishes that indeed g(x)<f(x)+t for x≦{circumflex over (T)}{circumflex over (n)}−1. Nearly the same argument can be applied for x>{circumflex over (T)}{circumflex over (n)}−1.
Thus, a goal is to compute
The standard approach to such a computation is dynamic programming. For any 0≦m≦mmax and m=m0<m1< . . . <mn−1<mmax, define
which corresponds to the average remaining write time assuming a new pulse starts at mt, with subsequent pulse ending times {m1t, . . . , mn−1t, mmaxt}, and assuming no switch occurred prior to time mt. Then define
as the best choice of pulse ending times subsequent to pulse time mt, assuming a pulse starts at mt.
Clearly {circumflex over (T)}*={circumflex over (T)}*(0). Dynamic programming involves computing {circumflex over (T)}*(m) recursively, based on {circumflex over (T)}*(m′) for m′>m. Note that for m=mmax−1 there is precisely one possible pulse end time, namely the one ending at mmaxt, so that
{circumflex over (T)}*(mmax−1)=t.
For m<mmax−1, one can use a single pulse ending at mmaxt, in which case
Tavg(m,1)=(mmax−m)t,
or one can use n≧2 pulses ending at intermediate times. For this case, it turns out that
This is shown as follows
Combining Tavg(m,1)=(mmax−m)t and the initially provided expressions for
Thus, one can compute {circumflex over (T)}*(m) from {circumflex over (T)}*(m′) for m′>m all the way down to m=0. The optimizing pulse end times can be found by keeping track of the optimizing m1 for each m, where the optimizing m1 can be taken to be mmax if the outer minimum is achieved by the first term, corresponding to one pulse ending at Tmax.
The complexity of the algorithm is readily seen to be no worse than O(mmax2) operations. A simple way to dramatically speed up the computation of the minimization over m1, relative to a full search, is to compute the running minimum for each successively larger value of m1, starting with m1=m+1, and to abort the search at the first m1 for which m1t−mt+tr exceeds the running minimum. Since m1t−mt+tr is increasing in m1 and since the other component of the cost is always non-negative, aborting in this manner preserves optimality.
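One possible implementation of this dynamic program is sketched below. The conditional-survival form of the recursion, the quantization granularity mmax, the read penalty tr, and σ are all assumptions made for illustration.

```python
import math

def F(t, tau=1.0, sigma=0.4):
    # Log-normal switching-time CDF with an illustrative sigma.
    return 0.0 if t <= 0 else 0.5 * math.erfc(-math.log(t / tau) / (sigma * math.sqrt(2)))

def optimal_pulse_schedule(T_max, t_r, m_max=400):
    # Dynamic program over quantized pulse end times m*t, with t = T_max / m_max.
    # best[m] is the minimal expected remaining pulse-plus-read time, given that a
    # new pulse starts at time m*t and the element has not yet switched.
    t = T_max / m_max
    best = [0.0] * (m_max + 1)
    choice = [m_max] * (m_max + 1)
    best[m_max - 1] = t                               # only the final pulse remains
    for m in range(m_max - 2, -1, -1):
        survive_m = 1.0 - F(m * t)
        best_cost, best_m1 = (m_max - m) * t, m_max   # option 1: single pulse ending at T_max
        for m1 in range(m + 1, m_max):                # option 2: end a pulse (and read) at m1*t
            immediate = (m1 - m) * t + t_r
            if immediate >= best_cost:                # remaining term is non-negative: abort search
                break
            cost = immediate + (1.0 - F(m1 * t)) / survive_m * best[m1]
            if cost < best_cost:
                best_cost, best_m1 = cost, m1
        best[m], choice[m] = best_cost, best_m1
    schedule, m = [], 0                               # recover the optimizing pulse end times
    while m < m_max:
        m = choice[m]
        schedule.append(m * t)
    return best[0], schedule

T_avg, ends = optimal_pulse_schedule(T_max=10.0, t_r=0.1)
print(f"expected pulse-plus-read time: {T_avg:.3f} tau")
print("pulse end times:", [round(e, 2) for e in ends])
```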
Values of the figures of merit defined above and the gain are shown for each of the considered WRITE methods, with T*avg explicitly shown for coded methods. The second horizontal section 1004 of the table shows the characteristics obtained for multi-pulse WRITE methods with a specified Tmax and with the cost for reads between pulses equal to various fractions of τ.
The following observations can be made by analysis of the data shown in the table.
At Pb=10−23, the coded 2-pulse method offers 3 dB of additional gain over the uncoded 2-pulse method and, more notably, coding reduces the worst-case-to-average ratio from about 50:1 to 3:1. In fact, the 2-pulse coded method has a gain of just 1.8 dB over the 1-pulse coded one. When comparing the 3-pulse uncoded and coded methods, coding offers additional gains in expected total pulse length (1 dB at Pb=10−23) and large improvements in worst-case-to-average ratios.
Using continuous feedback offers an additional gain of approximately 2.3 dB over the 3-pulse coded method (a ratio of 1.7:1 in average pulse length). In principle, this gap could be narrowed in a discrete-pulse setting by arbitrarily increasing the number of pulses. In fact, the continuous-pulse case can be seen as the limit of the discrete-pulse case as the number of pulses tends to infinity.
To summarize, the effects and interactions of two mechanisms aimed at addressing the challenges posed by the log-normal switching behavior of certain memristor devices have been analyzed. In various settings, the use of coding significantly improves the overall performance of the system, by reducing average and worst-case switching times. These improvements translate into savings in energy consumption and device wear, as well as significant increases in writing throughput. With a judicious combination of a feedback mechanism and error-correction coding, the log-normal switching behavior of memristors should not be an obstacle to meeting the reliability requirements of modern storage systems.
The information-storage device that represents one example includes one or more two-dimensional arrays of memory elements 1402.
Each memory element also generates a feedback signal, as discussed above.
In an alternative example, the ECC encoder occurs in the input sequence upstream from WRITE buffering, with encoded data queued for internal WRITE operations rather than, as in the previously described example, uncoded data being queued. In yet other alternative examples, the ECC encoder may be incorporated into the READ/WRITE controller or may occur at additional positions within the input sequence prior to the first and second controllers.
In an alternative example, either the first and second controllers or the READ/WRITE controller iteratively WRITE data to memory elements using the above-described multi-pulse method, reading back the data to determine whether or not the WRITE has succeeded. In this alternative example, the memory elements do not generate feedback signals. Instead, the first and second controllers 1426 and 1428 apply multiple WRITE pulses to memory elements, reading the contents of the memory elements to which the pulses are applied after each pulse, in order to determine whether or not the data has been correctly written. Based on the multiple-pulse WRITE and intervening READ operations used to verify correct data storage, the first and second controllers generate WRITE-completion signals returned to the READ/WRITE controller, as in the first-described example in which the state of memory elements is continuously monitored.
Note that, in the following discussion, it is assumed that the WRITE operations contain up to some maximum amount of data that can be written, in an internal WRITE operation, by the first and second controllers to the one or more arrays of memory elements. Thus, the data associated with a WRITE operation is ECC encoded and then passed to the first and second controllers for writing to the one or more arrays of memory elements. The feedback signal provided by the one or more arrays of memory elements indicates whether or not the entire, internal WRITE has succeeded. The first and second controllers may apply WRITE voltages for different periods of time to individual memory elements, or may apply a different number of pulses to individual memory elements, during an internal WRITE operation. Alternatively, a more complex buffering mechanism may be used to store received WRITE operations associated with a greater amount of data than can be written in a single internal WRITE operation and to generate multiple internal WRITE operations for the received WRITE operations associated with large amounts of data. The first and second controllers, in general, control storage to multiple memory elements, in parallel, during an internal WRITE operation.
Next, in a continuous loop of steps 1504-1508, the WRITE-buffering component determines whether any new WRITE operations have been requested by an external device, in step 1505, and processes these WRITE operations by calling the routine “input” in step 1506. The WRITE-buffering component also continuously monitors the “writeDQ” signal in step 1507, and when the READ/WRITE controller has processed a next WRITE operation, adjusts the circular queue by calling the routine “output” 1508.
In the above-described routines, it is assumed that the writing of data corresponding to a WRITE operation to the one or more arrays of memory elements by the routine “write” takes significantly longer than buffering of a received WRITE operation by the routine “write buffering,” and thus that, when writeDQ is set to TRUE, in step 1904, by the routine “writing,” the routine “write buffering” can buffer a next WRITE operation and process adjustment of buffer pointers as well as set writeDQ to FALSE, by calling the routine “output,” before the routine “writing” can finish the current internal WRITE operation and embark on another. Were this assumption not able to be made, then additional tests and steps would be used to properly synchronize operation of the two routines and/or additional synchronization signals would be used to synchronize operation of the two routines. Of course, in general, routines are used above to illustrate the operation of hardware devices that may be implemented with logic circuits, the operation of which is synchronized at lower implementation levels.
Thus, by using ECC in addition to monitoring feedback signals from the first and second controllers, a smaller circular buffer can be used than the circular buffer that would be used without using ECC. In other words, the buffer size can be reduced to a size that results in early termination of a certain percentage of WRITE operations, leading to a greater number of switching errors during WRITE operations than would be acceptable. However, the greater number of switching errors is subsequently reduced by the error-correcting capacity of the ECC decoder when the data is read back from memory.
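A minimal sketch of the circular WRITE buffer described above; the class and method names mirror the “input” and “output” routines of the text but are otherwise hypothetical, and the capacity value is illustrative.

```python
class WriteBuffer:
    # Circular queue of pending WRITE operations, as described above. The small
    # capacity reflects the buffer-size reduction made possible by ECC-backed
    # early termination of WRITE operations; the value used here is illustrative.
    def __init__(self, capacity=4):
        self.slots = [None] * capacity
        self.head = 0        # next buffered WRITE operation to hand to the controller
        self.tail = 0        # next free slot for an incoming WRITE operation
        self.count = 0

    def input(self, write_op):
        # Buffer a WRITE operation received from an external device.
        if self.count == len(self.slots):
            raise BufferError("WRITE buffer full; the external request must stall")
        self.slots[self.tail] = write_op
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1

    def output(self):
        # Called when the READ/WRITE controller signals (writeDQ) that it has taken
        # the current WRITE operation for an internal WRITE.
        op = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return op

buffer = WriteBuffer()
buffer.input(("address", b"data"))
print(buffer.output())
```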
Although the present disclosure has been described in terms of particular examples, it is not intended that the disclosure be limited to these examples. Modifications will be apparent to those skilled in the art. For example, the use of both feedback signals and ECC encoding can be employed in a wide variety of different types of information-storage devices that include memory elements with asymmetrical switching-time PDFs, including memristive memory elements, phase-change memory elements, and other types of memory elements. The particular ECC code employed and the particular values of Tmax employed within the information-storage devices can be set to various different codes and calculated values, respectively, in order to ensure bit-error rates for the information-storage devices that meet or exceed bit-error-rate requirements. In certain types of information-storage devices, the maximum WRITE-voltage application time Tmax and the ECC codes used for encoding the data can be controlled or reset dynamically, depending on dynamic BER requirements, the age of the information-storage device, particularly the ages of the memory elements, the total number of READ/WRITE cycles carried out on the information-storage device, and other such characteristics and parameters.
It is appreciated that the previous description of the disclosed examples is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these examples will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other examples without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the examples shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.