ON-ACCESS ERROR CORRECTION FOR CONTENT-ADDRESSABLE MEMORY

Information

  • Patent Application
  • Publication Number
    20250201334
  • Date Filed
    April 25, 2024
  • Date Published
    June 19, 2025
Abstract
Examples of the presently disclosed technology provide a methodology for detecting and correcting errors in a summing content-addressable memory (Σ-CAM) while the Σ-CAM is performing a computational task (i.e., “on-access” error correction). The methodology involves adding redundancy columns to a Σ-CAM used to store a task-driven matrix (i.e., a matrix having values comporting with a computational task). Examples can leverage an encoder to compute redundancy values for the redundancy columns such that the Σ-CAM stores a codeword of a linear code (C) in each row. Examples also modify the linear code (C) used to compute redundancy values. Namely, examples modify the linear code (C) so that it includes the all-one vector. With this modification, a system of the presently disclosed technology can detect and correct errors in an output vector from a Σ-CAM based on this modified/particularized linear code (C).
Description
BACKGROUND

Content addressable memory (“CAM”) is a type of computing memory in which stored data is searched by content rather than location. When a “word” is input to a CAM, the CAM searches for the word in its contents. If the CAM finds the word (i.e., “returns a match”), the CAM returns the address of the location where the found word resides. Individual cells of a CAM (referred to herein as CAM cells) can be arranged into rows and columns to form the CAM. Depending on configuration, a respective row or column of a CAM that connects outputs from constituent CAM cells may be referred to as a match line.


A summing CAM (Σ-CAM) (sometimes referred to as a Hamming-distance CAM) is a particular type of CAM configured to sum outputs from CAM cells arranged along a respective column. Said differently, a Σ-CAM may refer to a CAM which consists of an ℓ×n array of CAM cells, with each CAM cell (i,j) ∈ [ℓ⟩×[n⟩ implementing the function x ↦ N(x, a_{i,j}), for some internal state value a_{i,j} ∈ 𝔹. Programmed internal states of the Σ-CAM can be represented as an array (matrix) A = (a_{i,j})_{(i,j)∈[ℓ⟩×[n⟩}, with a_i standing for row i and A_j (= (A)_{j}) for column j. An input to the Σ-CAM may comprise a row vector (e.g., a search key) x = (x_i)_{i∈[ℓ⟩}, with x_i serving as the input to all the CAM cells along row i. A respective column (corresponding to a match line) j of the Σ-CAM can compute an integer sum of the outputs of the CAM cells arranged along the respective column j, i.e., c_j = Σ_{i∈[ℓ⟩} N(x_i, a_{i,j}) = w(x − A_j). These integer sums form the output row vector of the Σ-CAM, i.e., c = (c_j)_{j∈[n⟩} ∈ [0:ℓ]^n. Accordingly, the Σ-CAM computes the Hamming distances between the input vector and the contents of the CAM cells along each of the n columns in the Σ-CAM.
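As an illustrative sketch, the column-wise Hamming-distance computation of a Σ-CAM can be modeled in a few lines of Python (the function and variable names here are hypothetical, not part of the disclosure):

```python
import numpy as np

def sigma_cam(x, A):
    """Behavioral model of a summing CAM (Sigma-CAM).

    x: length-l binary input row vector (the search key).
    A: l x n binary matrix of programmed internal states.
    Returns the output row vector c with c[j] = w(x - A_j), the Hamming
    distance between x and the contents of column j.
    """
    x = np.asarray(x).reshape(-1, 1)           # column shape for broadcasting
    return (np.asarray(A) != x).sum(axis=0)    # per-cell XOR, summed per column

A = np.array([[0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
c = sigma_cam([0, 1, 1], A)   # Hamming distance from the key to each column
```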





BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure, in accordance with one or more various examples, is described in detail with reference to the following figures. The figures are provided for purposes of illustration only and merely depict examples.



FIG. 1 depicts an example truth table, in accordance with examples of the presently disclosed technology.



FIG. 2 depicts an encoding mapping algorithm, in accordance with examples of the presently disclosed technology.



FIG. 3 depicts an example table of values, in accordance with examples of the presently disclosed technology.



FIG. 4 depicts an example table of values, in accordance with examples of the presently disclosed technology.



FIG. 5 depicts an example table of values, in accordance with examples of the presently disclosed technology.



FIG. 6 depicts an example CAM cell, in accordance with examples of the presently disclosed technology.



FIG. 7 depicts an example CAM-based circuit, in accordance with examples of the presently disclosed technology.



FIG. 8 depicts an example graph illustrating comparisons between a threshold voltage and voltage output of a match line associated with a column of CAM cells of a summing CAM, in accordance with examples of the presently disclosed technology.



FIG. 9 depicts a block diagram of an example computer system in which various of the examples described herein may be implemented.





The figures are not exhaustive and do not limit the present disclosure to the precise form disclosed.


DETAILED DESCRIPTION

Examples of the presently disclosed technology provide a methodology for detecting and correcting errors in a Σ-CAM while the Σ-CAM is performing a computational task (i.e., “on-access” error correction).


The presently disclosed “on-access” error correction methodology can improve upon “off-line” error detection/correction methodologies. Such “off-line” methodologies generally involve testing procedures which would disrupt normal operation of a hardware accelerator during a computational task, and thus must be performed “off-line.” For example, an alternative methodology for detecting errors in a CAM could involve applying a sequence of test vectors to the CAM in order to detect programming and other circuit-based errors. Applying the test vectors may be unrelated to (and would otherwise disrupt) a computational task the CAM is being used to perform. Thus such a methodology would be performed “off-line.” By contrast, the presently disclosed error detection and correction methodology involves correcting an output vector generated by a Σ-CAM while the Σ-CAM is performing a designated computational task. Accordingly, such a methodology may be more computationally efficient (i.e., consume fewer processing resources, less time, less power, etc.) than alternative “off-line” error detection and correction methodologies.


Examples realize the advantages provided by “on-access” error detection and correction by leveraging an intelligent insight that a Σ-CAM operates analogously to a vector-matrix multiplier (sometimes referred to as a dot product engine). Leveraging such insight, examples uniquely adapt an error-correction methodology for vector-matrix multipliers to Σ-CAMs. The adapted methodology involves adding redundancy columns to a Σ-CAM already being used to store a task-driven matrix (i.e., a matrix having values comporting with a computational task). Examples can leverage one or more processing resources (e.g., an encoder) to compute redundancy values for the redundancy columns such that the Σ-CAM stores a codeword of a linear code (C) in each row. To further adapt the methodology for a new/particularized type of hardware accelerator, i.e., a Σ-CAM, examples modify the linear code (C) used to compute redundancy values. Namely, examples modify the linear code (C) so that it includes the all-one vector. With this modification, a system of the presently disclosed technology can detect and correct errors in an output vector from a Σ-CAM based on this modified/particularized linear code (C).


For example, a system of the presently disclosed technology may comprise: a) a Σ-CAM comprising CAM cells arranged into a number (l) rows and a number (n) columns, wherein the Σ-CAM is configured to sum outputs from a number (l) CAM cells connected along a respective column of the number (n) columns; and b) one or more processing resources operative to program the CAM cells to store a matrix (A) having dimensions (l×n), wherein: i) each row of the matrix (A) comprises a codeword of a linear code (C), and ii) the linear code (C) includes an all-one vector of dimension (n) as a codeword.


In the above-described system, CAM cells connected along a number (k) columns of the number (n) columns may comprise task-driven CAM cells. Relatedly, CAM cells connected along a number (n−k) columns of the number (n) columns may comprise redundancy CAM cells. Here, it follows that a respective row of the Σ-CAM comprises a number (k) task-driven CAM cells and a number (n−k) redundancy CAM cells. Accordingly, programming the Σ-CAM to store the matrix (A) having dimensions (l×n) may comprise: a) programming the number (k) task-driven CAM cells of the respective row to store task-driven values comporting with a computational task; b) computing redundancy values for the number (n−k) redundancy CAM cells of the respective row based on the programmed task-driven values and the linear code (C); and c) programming the number (n−k) redundancy CAM cells of the respective row to store the computed redundancy values such that the respective row stores a codeword of the linear code (C). In certain implementations, computing the redundancy values for the number (n−k) redundancy CAM cells of the respective row may comprise computing the redundancy values for the number (n−k) redundancy CAM cells of the respective row such that the task-driven values and the redundancy values include a sequence of ones and zeros prescribed by the linear code (C).
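As a toy illustration of steps a) through c), the sketch below uses the even-weight (single parity-check) code over GF(2) as the linear code (C); for an odd number k of task-driven columns this code contains the all-one vector of dimension n = k + 1, as required. The names are hypothetical, and a deployed system could use any suitable linear code:

```python
import numpy as np

def program_rows(A_task):
    """Append one redundancy column per row so that each row becomes a
    codeword of the even-weight code of length k + 1 (a toy linear code
    that contains the all-one vector when k is odd)."""
    A_task = np.asarray(A_task)
    parity = A_task.sum(axis=1) % 2            # XOR of the k task-driven bits
    return np.hstack([A_task, parity[:, None]])

A = program_rows([[1, 0, 1],
                  [1, 1, 1]])
# every row now has even weight, and the all-one row (1,1,1,1) is a codeword
```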


In the above-described system, the one or more processing resources may be further operative to detect and correct one or more errors in an output vector (c) from the Σ-CAM based on the linear code (C). In certain of these implementations, the output vector (c) may have dimension (n). Relatedly, the output vector (c) may comprise a concatenation of a task-driven output vector (c′) and a redundancy output vector (c″). Here, the task-driven output vector (c′) may have dimension (k) and correspond to a summation between: (1) a vector-matrix multiplication between a transformation of an input vector (x) of dimension (l) received by the Σ-CAM and a task-driven stored matrix (A′) of dimension (l×k) stored by the task-driven CAM cells of the Σ-CAM; and (2) a vector multiplication between the all-one vector of dimension (n) and a constant value (see e.g., Eq. 4 below). Relatedly, the redundancy output vector (c″) may have a dimension (n−k) and correspond to a summation between: (1) a vector-matrix multiplication between the transformation of the input vector (x) and a redundancy stored matrix (A″) of dimension (l×(n−k)) stored by the redundancy CAM cells of the Σ-CAM; and (2) a vector multiplication between the all-one vector of dimension (n) and the constant value (see e.g., Eq. 4 below). Moreover, the one or more processing resources may detect and correct the one or more errors in the task-driven output vector (c′) based on (e.g., by comparing) the linear code (C) and the redundancy output vector (c″).


In some implementations of the above-described system, a respective CAM cell of the Σ-CAM may comprise one or more programmable memristors. Here, programming the respective CAM cell may comprise programming conductance of the one or more programmable memristors.


Examples of the presently disclosed technology will now be described in greater detail below. It should be understood that the description below merely provides illustrative examples, and should not limit the principles disclosed herein.


Herein, several coding schemes are presented for on-access correction of errors in Σ-CAMs. Such schemes can apply to binary Σ-CAMs (sometimes referred to herein as Σ-BCAMs) and ternary Σ-CAMs (sometimes referred to herein as Σ-TCAMs). Various schemes entail allocating columns of a Σ-CAM for redundancy such that, when an input vector is applied to the Σ-CAM, errors in the output vector can be corrected, provided that their number (measured by either the Hamming metric or an L1-metric) does not exceed a prescribed value. In the case of Σ-BCAMs, the operation of the Σ-BCAM resembles that of a discrete vector-matrix (V-M) multiplier. Accordingly, error-correction schemes of the presently disclosed technology for Σ-BCAMs build upon schemes that have been proposed for such multipliers. Adaptation of such schemes to Σ-TCAMs can be more involved due to the wildcard symbol (sometimes referred to as a “don't care” symbol). Accordingly, schemes of the presently disclosed technology for Σ-TCAMs make use of a special type of a positional binary representation of integer pairs where the representations of the two integers in any pair do not share a 1 in the same position.


Examples of the presently disclosed technology are described in greater detail in conjunction with Sections I-VIII below.


Section I. Introduction of Notation

Hereafter, the present application uses the following notation. For y,z ∈ ℤ, the application denotes by [y:z] the integer subset {x ∈ ℤ : y ≤ x ≤ z} and by [y:z⟩ the set [y:z−1]. The application will use the shorthand notation [z⟩ for [0:z⟩ and 𝔹 for [2⟩. For an integer vector x, the application denotes its Hamming weight and L1-norm by w(x) and ∥x∥, respectively. For an ℓ×n matrix A (or a row n-vector if ℓ = 1) and a subset X ⊆ [n⟩, the application lets (A)_X denote the ℓ×|X| submatrix of A that is formed by the columns that are indexed by X. For y,z ∈ ℤ such that z > 0, the notation y MOD z stands for the remainder in [z⟩ when y is divided by z. The ring of integers modulo q will be denoted by ℤ_q.


Let N : 𝔹×𝔹 → 𝔹 be the inequality-test (XOR) function, which is defined for every x,a ∈ 𝔹 by

$$N(x,a) \;=\; \llbracket x \neq a \rrbracket \;=\; \begin{cases} 1 & \text{if } x \neq a\\ 0 & \text{otherwise} \end{cases} \qquad \text{(Eq. 1)}$$







where ⟦Ω⟧ denotes the Iverson bracket (which evaluates to 1 if its argument Ω is true, and to 0 otherwise). As described above, a Σ-CAM may refer to a device which consists of an ℓ×n array of CAM cells, with each CAM cell (i,j) ∈ [ℓ⟩×[n⟩ implementing the function x ↦ N(x, a_{i,j}), for some internal state value a_{i,j} ∈ 𝔹. This application represents the internal states as an array (matrix) A = (a_{i,j})_{(i,j)∈[ℓ⟩×[n⟩}, with a_i standing for row i and A_j (= (A)_{j}) for column j. The input to the Σ-CAM may comprise a row vector (e.g., “search key”) x = (x_i)_{i∈[ℓ⟩}, with x_i serving as the input to all the CAM cells along row i. Each column (corresponding to a match line) j in the Σ-CAM computes an integer sum of the outputs of the CAM cells along the column,










$$c_j \;=\; \sum_{i \in [\ell\rangle} N(x_i, a_{i,j}) \;=\; w(x - A_j). \qquad \text{(Eq. 2)}$$







These integer sums form the output row vector, c = (c_j)_{j∈[n⟩} ∈ [0:ℓ]^n, of the Σ-CAM. Accordingly, a Σ-CAM computes the Hamming distances between an input vector and the contents of the CAM cells along each of the n columns in the Σ-CAM.


Hereafter, the application uses the notation S(x,A) for the vector in [0:ℓ]^n whose entries are given by the right-hand side of Eq. 2. Namely, S(x,A) is the result c of the computation when an input vector x is applied to a Σ-CAM that is programmed to a matrix A of internal states. The zero entries in c = S(x,A) are referred to as “matches,” and an ordinary binary CAM (BCAM) can be seen as a quantized Σ-CAM whose output is the binary n-vector (⟦c_j = 0⟧)_{j∈[n⟩} (which points at the columns where matches occur).
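This quantized view can be sketched as follows (illustrative Python; the names are hypothetical, not from the disclosure):

```python
import numpy as np

def bcam_output(x, A):
    """An ordinary BCAM viewed as a quantized Sigma-CAM: output 1 exactly at
    the columns j where the Hamming distance w(x - A_j) is zero (a match)."""
    c = (np.asarray(A) != np.asarray(x).reshape(-1, 1)).sum(axis=0)
    return (c == 0).astype(int)        # Iverson bracket [c_j = 0], per column

A = np.array([[0, 1],
              [1, 1]])
matches = bcam_output([0, 1], A)       # column 0 holds exactly (0, 1)
```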


The Σ-CAM serves as a model for accelerators that have been proposed for computing the Hamming distance in various applications, and there are several designs for their CAM cells (using CMOS as well as resistive technologies) and for the circuitry that computes the Hamming distance along each column. In those designs, a value N(x_i, a_{i,j}) = 1 is typically realized by setting CAM cell (i,j) to some high conductance. When N(x_i, a_{i,j}) = 0, the CAM cell is virtually open (i.e., has zero conductance). One method for obtaining the Hamming distance is by fixing the voltage level of the match lines and measuring the current that flows through each column of the CAM. This current is generally proportional to the number of high-conductance CAM cells along the column. In various applications of interest, the matrix A is modified much less frequently than the input vector x.


Inaccuracies while programming CAM cells, manufacturing defects, and noise while reading an output vector are examples of factors that may cause the actually-read row vector, y = (y_j)_{j∈[n⟩}, to differ from the correct vector c = S(x,A). The error vector can be defined as the following vector in ℤ^n:

$$e \;=\; (e_j)_{j \in [n\rangle} \;=\; y - c.$$






In a scenario where a faulty CAM cell generates the wrong output, yet that output is still in 𝔹, the output at the column that contains this CAM cell will change by ±1. This application will refer to such an event as an L1-error. The L1-norm ∥e∥ then bounds from below the number of L1-errors, with equality attained if all the faulty CAM cells along the same column err in the same direction. A focus in this application will be on this scenario, in which case one of the presently disclosed design parameters will be the largest L1-norm τ of e that can be tolerated, which, when presuming nothing about the internal states in the matrix A, can serve as a proxy for the largest number τ of faulty CAM cells that can be tolerated. However, this application will also consider the scenario where a faulty CAM cell may have a larger effect on the output of the column, in which case τ will stand for the largest Hamming weight of e that can be tolerated. Hereafter, by “a number of errors” this application may refer to either ∥e∥ or w(e), depending on the context.
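The two error counts just described, the L1-norm ∥e∥ and the Hamming weight w(e) of e = y − c, can be computed as in this short sketch (the names are illustrative):

```python
import numpy as np

def error_counts(y, c):
    """Return (||e||, w(e)) for the error vector e = y - c: the total L1
    error mass, and the number of erroneous (nonzero) positions."""
    e = np.asarray(y) - np.asarray(c)
    return int(np.abs(e).sum()), int(np.count_nonzero(e))

# one column off by +2 and one off by -1: three L1-errors in two columns
l1, hamming = error_counts(y=[3, 0, 2], c=[1, 0, 3])
```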


An aim of the presently disclosed technology is providing coding schemes for on-access error correction in Σ-CAMs (and in a variant thereof to be presented below). To this end, the presently disclosed technology adapts a framework for integer vector-matrix (V-M) multipliers to Σ-CAMs, since there is a close relationship between the functions of these two devices. Specifically, it can be seen from Eq. 2 that











$$S(x,A) \;=\; \sum_{i:\,x_i=0} a_i \;+\; \sum_{i:\,x_i=1} \left(\mathbf{1}_n - a_i\right), \qquad \text{(Eq. 3)}$$







where 1_n stands for the all-one row n-vector. It follows that











$$S(x,A) \;=\; \underbrace{(\mathbf{1} - 2x)}_{u}\, A \;+\; w(x) \cdot \mathbf{1}_n, \qquad \text{(Eq. 4)}$$

where u ∈ {±1}^ℓ.


Namely, up to an additive multiple of the all-one vector, the Σ-CAM performs a multiplication of a vector in {±1}^ℓ by a matrix in 𝔹^{ℓ×n}.
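The equivalence stated by Eq. 4 can be spot-checked numerically; the sketch below compares the direct Σ-CAM output of Eq. 2 against the vector-matrix form (illustrative code, not from the disclosure):

```python
import numpy as np

# Check Eq. 4: S(x, A) = (1 - 2x) A + w(x) * 1_n, i.e., up to an additive
# multiple of the all-one vector, the Sigma-CAM multiplies u = 1 - 2x
# (a vector over {+1, -1}) by the binary matrix A.
rng = np.random.default_rng(0)
l, n = 5, 7
A = rng.integers(0, 2, size=(l, n))
x = rng.integers(0, 2, size=l)

S_direct = (A != x[:, None]).sum(axis=0)                  # Eq. 2, per column
S_vm = (1 - 2 * x) @ A + x.sum() * np.ones(n, dtype=int)  # Eq. 4
assert np.array_equal(S_direct, S_vm)
```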


With this in mind, examples of the presently disclosed technology program the matrix A so that the first k (< n) entries in c = S(x,A) will carry the target computation of the Σ-CAM, while the remaining n−k entries of c will contain redundancy symbols, which can be used to detect or correct computational errors. Thus, the programmed ℓ×n matrix A will have the structure










$$A \;=\; \left( A' \,\middle|\, A'' \right), \qquad \text{(Eq. 5)}$$







where A′ = (A)_{[k⟩} and A″ = (A)_{[k:n⟩}. The computed row vector for an input vector x ∈ 𝔹^ℓ will then be c = S(x,A) = (c′|c″), where c′ = (c_j)_{j∈[k⟩} = S(x,A′) is the target computation while c″ = (c_j)_{j∈[k:n⟩} = S(x,A″) is the redundancy portion.


Given positive integers ℓ, n, and k < n, a Σ-CAM coding scheme is a pair (ε, 𝒟), where
    • ε : 𝔹^{ℓ×k} → 𝔹^{ℓ×n} is an encoding mapping such that for every A′ ∈ 𝔹^{ℓ×k}, the image A = ε(A′) has the form of Eq. 5 for some A″ ∈ 𝔹^{ℓ×(n−k)}, and
    • 𝒟 : [0:ℓ]^n → [0:ℓ]^k ∪ {“e”} is a decoding mapping (where “e” designates decoding failure).


The set






$$\mathcal{C} \;=\; \mathcal{C}(\varepsilon) \;=\; \left\{\, S\!\left(x, \varepsilon(A')\right) \;:\; A' \in \mathbb{B}^{\ell \times k},\; x \in \mathbb{B}^{\ell} \,\right\}$$






is the code induced by ε, and its members are called codewords. Namely, 𝒞 ⊆ [0:ℓ]^n and its codewords are all the possible output vectors that can be obtained when A′ ranges over all possible ℓ×k matrices over 𝔹 and x ranges over all possible input vectors in 𝔹^ℓ. This application will refer to n and k as the length and dimension, respectively, of the coding scheme. For row vectors z ∈ ℤ^n that arise in connection with a coding scheme, this application uses the notation z′ and z″ for their k-prefix (z)_{[k⟩} and (n−k)-suffix (z)_{[k:n⟩}, respectively. This notational convention extends to ℓ×n matrices as well.


Given ℓ, k, and a prescribed number of errors τ (measured by either the L1-metric or the Hamming metric), a goal is to have a coding scheme (ε, 𝒟) with the smallest possible n such that for every A′ ∈ 𝔹^{ℓ×k} and x ∈ 𝔹^ℓ, the k-prefix, c′, of the respective codeword c = S(x, ε(A′)) = (c′|c″) in 𝒞 can be recovered correctly in the presence of τ errors or less. Namely, the following equality may hold for every read vector y ∈ [0:ℓ]^n such that ∥y−c∥ ≤ τ:







$$\mathcal{D}(y) \;=\; c' \;\bigl(= S(x, A')\bigr)$$





(alternatively, n can be given and k is to be maximized). Note that the decoding mapping 𝒟 may not be required to recover the redundancy part c″.


More generally, given nonnegative integers τ and σ, a coding scheme (ε, 𝒟) is said to correct τ errors and detect τ+σ errors (in the L1-metric or the Hamming metric) if the following conditions hold for every computed output vector c = S(x,A) ∈ 𝒞 and the respective read vector y ∈ [0:ℓ]^n.

    • If ∥y−c∥ ≤ τ, then 𝒟(y) = c′.
    • Otherwise, if ∥y−c∥ ≤ τ+σ, then 𝒟(y) ∈ {c′, “e”}.


The minimum distance of 𝒞, denoted d(𝒞), is defined as the smallest L1/Hamming distance between any two codewords in 𝒞 having distinct k-prefixes:







$$d(\mathcal{C}) \;=\; \min_{\substack{c_1,\, c_2 \in \mathcal{C}:\\ c_1' \neq c_2'}} \left\| c_1 - c_2 \right\|$$









The following result provides further details.


Proposition 1. Let ε : 𝔹^{ℓ×k} → 𝔹^{ℓ×n} be an encoding mapping and let τ and σ be nonnegative integers such that








$$2\tau + \sigma \;<\; d\!\left(\mathcal{C}(\varepsilon)\right).$$





Then there exists a decoding mapping 𝒟 : [0:ℓ]^n → [0:ℓ]^k ∪ {“e”} such that the coding scheme (ε, 𝒟) can correct τ errors and detect τ+σ errors.
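At toy scale, the hypothesis 2τ + σ < d(𝒞) of Proposition 1 can be checked by brute force. The sketch below computes the minimum distance over pairs of codewords with distinct k-prefixes; the three codewords are hypothetical, and real induced codes are far too large for exhaustive search:

```python
import itertools
import numpy as np

def min_distance(codewords, k, metric="l1"):
    """Smallest L1 (or Hamming) distance between codewords whose k-prefixes
    differ; pairs with equal target computations are ignored."""
    best = None
    for c1, c2 in itertools.combinations(codewords, 2):
        c1, c2 = np.asarray(c1), np.asarray(c2)
        if np.array_equal(c1[:k], c2[:k]):
            continue                           # same k-prefix: not counted
        e = c1 - c2
        d = int(np.abs(e).sum()) if metric == "l1" else int(np.count_nonzero(e))
        best = d if best is None else min(best, d)
    return best

d = min_distance([(0, 0, 0), (1, 1, 2), (1, 1, 0)], k=2)
# d = 2 here, so tau = 0 errors corrected and sigma = 1 detected (2*0 + 1 < 2)
```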


Unless otherwise stated, ε will be separable, namely, for each row index i ∈ [ℓ⟩, the contents of row i in ε(A′) will only be a function of row i in A′ (and not of the rest of the rows in A′), will be the same for all i, and will not depend on ℓ. In the case of separable encoders, this application assumes that the domain and range of ε are 𝔹^k and 𝔹^n, respectively.


Let Im(ε) ⊆ 𝔹^n be the set of all images of (a separable) ε. It follows from Eq. 3 (or Eq. 4) that






$$\mathcal{C} \;=\; \left\{\, \sum_{i \in [\ell\rangle} b_i \cdot c_i \;+\; b \cdot \mathbf{1}_n \;:\; c_i \in \mathrm{Im}(\varepsilon),\; b_i \in \mathbb{Z},\; b \in [0:\ell],\; \sum_{i \in [\ell\rangle} \lvert b_i \rvert \le \ell,\; \text{and}\; \sum_{i \in [\ell\rangle} b_i = \ell - 2b \,\right\}.$$





In particular, ℓ·c ∈ 𝒞 for every c ∈ Im(ε), and (ℓ/2)·1_n ∈ 𝒞 when ℓ is even. Accordingly, examples leverage coding schemes (ε, 𝒟) for which Im(ε) is a subset of (the intersection of 𝔹^n with) an ambient module 𝒞₀ ⊆ ℤ^n over ℤ that contains 1_n and, in addition, satisfies 2τ+σ < d(𝒞₀), so that a fortiori 2τ+σ < d(𝒞(ε)). For simplicity, this application will generally assume hereafter that σ = 0.


Such coding schemes can be obtained through modifications of the constructions for vector-matrix multipliers for the special case where the matrix is over custom-character. Those modifications may require some mild conditions on the length n, but otherwise incur no penalty in the redundancy. This application will start by describing in Section II the scheme which results from a modification of a Hamming-metric construction for vector-matrix multipliers. As a general construction, this construction is perhaps the simplest, and it fits both the L1-metric and the Hamming metric. Other (yet more complex) constructions exist for the L1-metric when τ is sufficiently small compared to n. This application discusses those constructions in Sections VI and VII.


In Section III (and continuing in subsequent sections as well), this application will consider a variant of Σ-CAMs that is based on cells of a ternary content-addressable memory (TCAM). TCAMs are extensions of ordinary binary CAMs where the internal state value of a CAM cell and the input to the CAM cell are allowed to take a third “wildcard” symbol * (sometimes referred to as a “don't care,” or “always match” symbol). Thus, the input and state alphabets of the CAM cells are both 𝕋 = 𝔹 ∪ {*}, and each CAM cell implements the function T : 𝕋×𝕋 → 𝔹 defined in truth table 100 of FIG. 1 (the restriction of the truth table 100 to its last two rows and columns coincides with the function N(·,·) in Eq. 1).


The introduction of the wildcard symbol * impacts the presently disclosed coding schemes in a nontrivial way. In particular, examples leverage a special type of a positional binary representation of integer pairs where the representations of the two integers in any pair do not share a 1 in the same position. Such representations, which this application refers to as bi-spanners, may be of independent interest, and their properties will be presented in Section IV. Compared to Σ-CAMs, the presently disclosed constructions for Σ-TCAMs incur an increase of the redundancy by roughly 60%.


This application will conclude with a discussion in Section VIII.


II. Construction Based on Hamming-Metric Codes

This application presents a (separable) construction for vector-matrix multipliers with a modification for Σ-CAMs. For the purpose of this construction, the set of allowable error patterns is characterized by two parameters: the largest number τ_c of columns in A that contain faulty CAM cells, and the largest number τ_i of L1-errors per column (“inner errors”). Namely, τ_c and τ_i bound from above, respectively, the Hamming weight and the L∞-norm of the error vector e. Thus, (τ_c, τ_i) = (τ, 1) corresponds to the case where up to τ L1-errors can occur, yet with at most one such error occurring per column. Taking (τ_c, τ_i) = (τ, τ) will subsume the case where the overall number of L1-errors is at most τ. Finally, taking (τ_c, τ_i) = (τ, ℓ) will correspond to the Hamming metric, where at most τ columns may be erroneous, without any further assumptions on the number of L1-errors per column.


Given the number of columns n, an upper bound τ_c on the number of erroneous columns, and an upper bound τ_i on the number of L1-errors per column, let p > 2τ_i be an odd prime and let m = ⌈log₂ p⌉. Examples select a respective Hamming-metric linear τ_c-error-correcting [ñ, k] code C over ℤ_p, which is assumed to satisfy the following three conditions.

    • a) It contains the all-one codeword.
    • b) It is systematic, i.e., there is a one-to-one mapping E : ℤ_p^k → C such that for every u ∈ ℤ_p^k, the image E(u) has u as its k-prefix.
    • c) It has an efficient bounded-distance decoder D : ℤ_p^ñ → ℤ_p^ñ: for a received word ỹ ∈ ℤ_p^ñ, the decoder returns the true error vector ẽ ∈ ℤ_p^ñ, provided that w(ẽ) ≤ τ_c.
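A minimal toy instance of conditions a) through c) is the length-ñ repetition code over ℤ_p: it contains the all-one codeword, its encoder is systematic with k = 1, and majority voting acts as a bounded-distance decoder for τ_c = ⌊(ñ−1)/2⌋ (the sketch returns the corrected codeword rather than the error vector, an immaterial difference at this scale). A practical scheme would instead use the BCH codes discussed below; all names here are illustrative:

```python
from collections import Counter

P, N_TILDE = 3, 5              # toy alphabet Z_p and code length

def enc(u):
    """Systematic encoder E: the single information symbol u (k = 1) is the
    prefix of the codeword."""
    return [u % P] * N_TILDE

def dec(y):
    """Bounded-distance decoding via majority vote; recovers the codeword
    whenever the number of erroneous symbols is at most (N_TILDE - 1) // 2."""
    symbol = Counter(y).most_common(1)[0][0]
    return [symbol] * N_TILDE

c = enc(2)                     # note that enc(1) is the all-one codeword
y = [2, 0, 2, 1, 2]            # two corrupted symbols (tau_c = 2 tolerated)
```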


The parameters n and ñ are related by






$$n \;=\; k + m\,(\tilde{n} - k).$$






This application next describes the (separable) encoder, ε₀ : 𝔹^k → 𝔹^n, of the proposed coding scheme through its action on a given row a′ ∈ 𝔹^k. Regarding a′ as a vector in ℤ_p^k, it is first extended by the systematic encoder for C into a codeword











$$E(a') \;=\; \tilde{c} \;=\; (\tilde{c}_v)_{v \in [\tilde{n}\rangle}, \qquad \text{(Eq. 6)}$$







of C, where (c̃)_{[k⟩} = a′. Now, in the next step of the construction for vector-matrix multipliers, for each v ∈ [ñ−k⟩, the redundancy symbol c̃_{k+v} (∈ ℤ_p) can be expanded into its base-2 representation ã_v ∈ 𝔹^m, i.e.,












$$\tilde{c}_{k+v} \;=\; \tilde{a}_v \cdot \omega_m^{T}, \qquad \text{(Eq. 7)}$$








where

$$\omega_m \;=\; \left(1 \;\; 2 \;\; 2^2 \;\; \cdots \;\; 2^{m-1}\right).$$





Here, instead of Eq. 7, examples set the representation ã_v to be a vector in 𝔹^m so that












$$\left( (2^m - 2) \cdot \tilde{c}_{k+v} \right) \;\mathrm{MOD}\; p \;=\; \tilde{a}_v \cdot \hat{\omega}_m^{T}, \qquad \text{(Eq. 8)}$$









where ω̂_m is obtained from ω_m by changing its last entry into 2^{m−1} − 1. Since (2^m − 2)/2 < p ≠ 2^m − 2, the multiplier 2^m − 2 in Eq. 8 is invertible modulo p. Since p ≤ 2^m − 1, the expansion in Eq. 8 is indeed always possible (sometimes with two different representing m-vectors ã_v). Moreover, for c̃_{k+v} = 1 examples can take ã_v = 1_m (which will be the selection even if another representation is possible).
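For small p, the modified representation of Eq. 8 can be found by exhaustive search over the 2^m binary m-vectors, as in this sketch (illustrative code; the preference for the all-one representation noted above is not enforced here):

```python
from itertools import product

def modified_expansion(c_sym, p):
    """Find a binary m-vector a satisfying Eq. 8:
    ((2**m - 2) * c_sym) mod p == a . w_hat^T,
    where w_hat = (1, 2, ..., 2**(m - 2), 2**(m - 1) - 1) and m = ceil(log2 p)."""
    m = (p - 1).bit_length()                   # equals ceil(log2 p) for p >= 2
    w_hat = [2 ** i for i in range(m - 1)] + [2 ** (m - 1) - 1]
    target = ((2 ** m - 2) * c_sym) % p
    for bits in product((0, 1), repeat=m):     # 2**m candidate vectors
        if sum(b * w for b, w in zip(bits, w_hat)) == target:
            return list(bits)
    return None                                # unreachable when p <= 2**m - 1

# p = 3 gives m = 2 and w_hat = (1, 1); the symbol 1 maps to the all-one vector
a = modified_expansion(1, 3)
```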





Finally, as with vector-matrix multipliers, the image of a′ under ε₀ is defined as:









$$\varepsilon_0(a') \;=\; a \;=\; (a' \,|\, a''),$$





where

$$a'' \;=\; (a)_{[k:n\rangle} \;=\; \left( \tilde{a}_0 \,\big|\, \tilde{a}_1 \,\big|\, \cdots \,\big|\, \tilde{a}_{\tilde{n}-k-1} \right).$$






The encoding mapping algorithm 200 of FIG. 2 summarizes the encoding process of a typical row in the Σ-CAM. It can be seen that ε₀(1_k) = 1_n and that the linear span of Im(ε₀) over ℤ is a module 𝒞₀ ⊆ ℤ^n over ℤ. To see the properties of this module, define λ : ℤ^n → ℤ_p^ñ to map each vector






$$y \;=\; \left( y' \,\big|\, \tilde{y}_0 \,\big|\, \tilde{y}_1 \,\big|\, \cdots \,\big|\, \tilde{y}_{\tilde{n}-k-1} \right) \in \mathbb{Z}^n,$$

where y′ = (y)_{[k⟩} and ỹ_v = (y)_{[k+vm : k+(v+1)m⟩} (∈ ℤ^m), to the following image λ(y) = ỹ ∈ ℤ_p^ñ:








$$(\tilde{y})_{[k\rangle} \;=\; y' \;\mathrm{MOD}\; p$$

$$(\tilde{y})_{[k:\tilde{n}\rangle} \;=\; \left( (2^m - 2)^{-1} \cdot \left( \tilde{y}_v \cdot \hat{\omega}_m^{T} \right)_{v \in [\tilde{n}-k\rangle} \right) \;\mathrm{MOD}\; p$$





(where the MOD p operation is applied component-wise). λ is a homomorphism and it maps 𝒞₀ to C. This, in turn, implies that examples can decode (efficiently) any ỹ = λ(y) into the correct codeword in C, provided that the number of erroneous symbols in y (and therefore in ỹ) does not exceed τ_c. Furthermore, if the number of L1-errors in each entry of y′ does not exceed τ_i, then, from the inequality p > 2τ_i, examples can correct all these errors.


Specializing to the case where C is a normalized extended primitive BCH code over ℤ_p, it satisfies conditions a)-c) above, and so do certain shortenings of it. The redundancy that examples obtain is bounded from above by







$$n - k \;\le\; \left( 1 + \left\lceil \frac{p-1}{p} \cdot 2\tau_c \right\rceil \cdot \left\lceil \log_p \tilde{n} \right\rceil \right) \cdot \left\lceil \log_2 p \right\rceil,$$








with equality holding when τ_c ≪ ñ (which is a regime of interest). Since ⌈log_p ñ⌉ < 1 + log_p n, the redundancy behaves like

















$$\frac{\lceil \log_2 p \rceil}{\log_2 p} \cdot \frac{p-1}{p} \cdot 2\tau_c \cdot \log_2 n \;+\; O(\tau_c), \qquad \text{(Eq. 9)}$$







(where the constant multiplier in the O(τ_c) term is proportional to log₂ p ≈ log₂ τ_i).


Example 1

For the case τi=1, where at most one L1-error is tolerated per column, examples take p=3. Accordingly,

$$n-k \;=\; \frac{2}{\log_2 3}\cdot\left\lceil\frac{4\tau_c}{3}\right\rceil\cdot\log_2 n + O(\tau_c) \;\approx\; 1.68\cdot\tau_c\cdot\log_2 n.$$

In particular, for τc=1 examples get n−k=(2/log2 3)·log2 n+O(1)≈1.26 log2 n, and twice as much (i.e., 2.52 log2 n) when τc=2. Yet for τc=2 examples can do better by replacing the BCH codes with Preparata codes, which are linear over ℤ4. The redundancy thus obtained is only 2 log2 n. Examples may get a similar redundancy also with the construction of Section VI, with the additional benefit that the two L1-errors may also occur in the same column of the array.


Example 2

For the case τi=2 examples can take p=5, in which case examples get the following approximation of the ratio between the redundancy and τc·log2 n:

$$\frac{n-k}{\tau_c\cdot\log_2 n} \;\approx\; \frac{24/5}{\log_2 5} \;\approx\; 2.07.$$

However, due to the rounding-up in the ⌈log2 p⌉ term in Eq. 9, examples will get a smaller redundancy if examples select p=7:

$$\frac{n-k}{\tau_c\cdot\log_2 n} \;\approx\; \frac{36/7}{\log_2 7} \;\approx\; 1.83.$$

For large τi, the (p−1)/p term in Eq. 9 becomes practically 1 and the expression for the redundancy becomes approximately (2τc-1)·log2 n.
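The numeric constants quoted in Examples 1 and 2 can be reproduced directly from Eq. 9 (a quick sanity check, not part of the construction):

```python
import math

# Example 1 (p = 3): leading constant 2/log2(3) ~ 1.26 for tau_c = 1, and
# (2/log2(3)) * (4/3) ~ 1.68 as the per-tau_c coefficient.
assert abs(2 / math.log2(3) - 1.26) < 0.01
assert abs((2 / math.log2(3)) * (4 / 3) - 1.68) < 0.01

# Example 2 (tau_i = 2): ratio (24/5)/log2(5) ~ 2.07 for p = 5, improved to
# (36/7)/log2(7) ~ 1.83 when p = 7 is selected instead.
assert abs((24 / 5) / math.log2(5) - 2.07) < 0.01
assert abs((36 / 7) / math.log2(7) - 1.83) < 0.01
```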


Recall that a normalized extended primitive BCH code C over 𝔽p, which has length ñ=p^h for some h∈ℤ+, contains the all-one codeword and so it satisfies condition a). This condition also holds for any code that is obtained by shortening C on the zero entries of any nonzero codeword of C (possibly after re-scaling of the coordinates). For example, for any proper divisor s>2τc of p^h−1, the code C contains t=(p^h−1)/s codewords ci, i∈[t⟩, each of Hamming weight s+1, whose supports overlap only on the coordinate that corresponds to the code locator 0. Moreover, all their other nonzero entries are 1. Thus, for any w∈[0:t], the Hamming weight of the codeword $\sum_{i\in[w\rangle} c_i$ of C is either s·w (if p divides w) or s·w+1 (otherwise). Namely, examples can achieve all these code lengths by shortening C while satisfying condition a). For the practical range of parameters, p^h−1 has sufficiently many divisors (in particular, it is always divisible by 2 and p−1). For instance, when τi=1 examples can take p=3. In this case the values of 3^h−1 for h=4,5,6 are, respectively, 80=2^4·5, 242=2·11^2, and 728=2^3·7·13. Or when τi=2, examples can take p=5, in which case 5^4−1=624=2^4·3·13.
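The factorizations quoted above are easy to verify with a few lines of integer arithmetic (a sanity check only):

```python
def factor(n):
    """Return the prime factorization of n as a {prime: exponent} dict."""
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

# p = 3, h = 4, 5, 6: 3^h - 1 = 80, 242, 728 with the stated factorizations.
assert factor(3**4 - 1) == {2: 4, 5: 1}          # 80 = 2^4 * 5
assert factor(3**5 - 1) == {2: 1, 11: 2}         # 242 = 2 * 11^2
assert factor(3**6 - 1) == {2: 3, 7: 1, 13: 1}   # 728 = 2^3 * 7 * 13
# p = 5, h = 4: 5^4 - 1 = 624 = 2^4 * 3 * 13.
assert factor(5**4 - 1) == {2: 4, 3: 1, 13: 1}
```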


III. Extension to Σ-TCAMs

This section describes the on-access error correction problem in Σ-TCAMs, namely, Σ-CAMs that are based on TCAM cells (i.e., a particular type of CAM cell). If an entry xi of the input vector x=custom-character is the wildcard symbol *, then the TCAM cells along row i in the Σ-TCAM will produce the all-zero vector and therefore will not affect the output vector. Therefore, in the discussion that follows, it can be assumed that the input vector x is in custom-character.


For i∈custom-character, this application uses the notation θi=custom-character for the internal state vector (over custom-character) in an custom-character×n Σ-TCAM. For x∈custom-character, this application lets ai(x)=custom-character denote the vector in custom-character whose entries are the outputs of the TCAM cells along row i when the input symbol is x. Thus, ai(*)≡0, and for x∈custom-character and any j∈custom-character:

$$a_{i,j}(x) = T(x, \theta_{i,j}).$$

In short,

$$a_i(x) = T_n(x, \theta_i), \qquad \text{Eq. 10}$$


where Tn(x,θi)=custom-character. Eq. 3 then becomes

$$S(x, A) \;=\; \sum_{i\in[\ell\rangle} a_i(x_i) \;=\; \sum_{i:\,x_i=0} a_i(0) \;+\; \sum_{i:\,x_i=1} a_i(1), \qquad \text{Eq. 11}$$

Note that due to the *-cells (i.e., CAM cells programmed to store wildcard symbols), the following relationship no longer holds: ai(1)=1n−ai(0).
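The behavior described by Eq. 11, and its equivalence to the matrix form of Eq. 12, can be illustrated with a toy simulation. This is a hedged sketch: it assumes, consistent with the Background section, that a cell outputs a mismatch indicator (1 on mismatch, 0 on match) and that a stored * always outputs 0.

```python
def T(x, theta):
    """Assumed TCAM cell function: 0 for a stored wildcard '*', else the
    mismatch indicator between the input bit x and the stored bit theta."""
    return 0 if theta == '*' else int(x != theta)

def sigma_tcam(x, states):
    """Column sums S(x, A) of Eq. 11 for an l x n array of cell states."""
    n = len(states[0])
    return [sum(T(x[i], row[j]) for i, row in enumerate(states))
            for j in range(n)]

states = [[0, 1, '*'],
          [1, '*', 0],
          [0, 0, 1]]
x = [1, 1, 0]
out = sigma_tcam(x, states)

# Eq. 12's matrix form: rows a_i(0), a_i(1) stacked into A(0) and A(1),
# multiplied on the left by (1 - x | x); it reproduces the same column sums.
A0 = [[T(0, theta) for theta in row] for row in states]
A1 = [[T(1, theta) for theta in row] for row in states]
via_eq12 = [sum((1 - x[i]) * A0[i][j] + x[i] * A1[i][j]
                for i in range(len(states)))
            for j in range(len(states[0]))]
assert out == via_eq12 == [1, 0, 2]
```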


Eq. 11 suggests that a Σ-TCAM performs the following multiplication of a vector in custom-character by a 2ℓ×n matrix over ℤ:

$$\left( 1 - x \;\middle|\; x \right) \cdot \begin{pmatrix} A^{(0)} \\ A^{(1)} \end{pmatrix}, \qquad \text{Eq. 12}$$


where A(0) and A(1) are ℓ×n matrices whose rows are ai(0) and ai(1), respectively. A caveat, however, is that the vectors ai(0) and ai(1) have to be non-intersecting, namely, they cannot have a 1 in the same position. Now, if examples apply a separable encoder to a pair of (non-intersecting) vectors (a(0))′, (a(1))′∈𝔹^k, examples can end up with rows

$$a^{(0)} = \left( (a^{(0)})' \,\middle|\, (a^{(0)})'' \right) \quad\text{and}\quad a^{(1)} = \left( (a^{(1)})' \,\middle|\, (a^{(1)})'' \right)$$

that are intersecting on their redundancy parts (a(0))″ and (a(1))″. Thus, if the custom-character×n array of TCAM cells is viewed as the custom-character×n matrix in Eq. 12, then coding schemes for Σ-TCAMs may not be separable per se. However, the dependence between the rows in the matrix may only be limited to pairs of rows—namely, a(0) and a(1)—that correspond to the same row i of TCAM cells in the physical array. On the other hand, unlike the setting in Section II, since the relationship ai(1)=1n−ai(0) no longer holds, examples can remove the requirement that the all-one vector be a codeword in (the ambient module custom-character that contains) the induced code.


An adaptation of a construction for vector-matrix multipliers to Σ-TCAMs may therefore differ from what was described in Section II. Specifically, examples let C, D, and E be as defined therein, except that m will be selected differently below, and examples may no longer need condition a) (which required C to contain the all-one codeword). Given two vectors (a(0))′, (a(1))′∈𝔹^k that represent the outputs of the first k cells along a given row in the Σ-TCAM (as in Eq. 10), examples can apply a systematic encoder for C as in Eq. 6 to both of them,

$$E\bigl((a^{(0)})'\bigr) = \tilde{c}^{(0)} \quad\text{and}\quad E\bigl((a^{(1)})'\bigr) = \tilde{c}^{(1)},$$

to produce codewords {tilde over (c)}(0) and {tilde over (c)}(1) of C. The expansion of Eq. 7, however, will be changed into

$$\tilde{c}_{k+v}(x) \;\equiv\; \tilde{a}_v(x)\cdot\rho \pmod p, \qquad x\in\mathbb{B}, \qquad \text{Eq. 13}$$

where ãv(0) and ãv(1) are non-intersecting in custom-character and ρ is a fixed integer vector in custom-character selected so that Eq. 13 can be satisfied for any arbitrary pair ({tilde over (c)}k+v(0),{tilde over (c)}k+v(1))∈custom-character. A simple choice for such a vector ρ is

$$\rho = \left( \omega_{m'} \,\middle|\, \omega_{m'} \right),$$

where m′=⌈log2 p⌉. Here m=2m′=2⌈log2 p⌉. Further, examples can select ãv(0) (respectively, ãv(1)) so that its first (respectively, last) m′ entries are all zero. However, as described above, the redundancy of the scheme is n−k=m(ñ−k), which means that such a simple choice for ρ may double the redundancy compared to the Σ-CAM construction of Section II. An aim is to do better than this simple solution and, to this end, examples introduce the following definition.
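A brief sketch of why the simple choice works, for the hypothetical case p=5 (so m′=3): placing the binary digits of ξ(1) in the first half of the support and those of ξ(0) in the second half automatically makes the two vectors non-intersecting.

```python
p, m_prime = 5, 3                     # m' = ceil(log2(5)) = 3
rho = [1, 2, 4] + [1, 2, 4]           # the simple choice (omega_3 | omega_3)

def bits(v, width):
    return [(v >> i) & 1 for i in range(width)]

for xi0 in range(p):
    for xi1 in range(p):
        a0 = [0] * m_prime + bits(xi0, m_prime)  # first m' entries zero
        a1 = bits(xi1, m_prime) + [0] * m_prime  # last m' entries zero
        # The supports are disjoint by construction ...
        assert all(not (u and v) for u, v in zip(a0, a1))
        # ... and each half recovers its own target value modulo p.
        assert sum(u * r for u, r in zip(a0, rho)) % p == xi0
        assert sum(u * r for u, r in zip(a1, rho)) % p == xi1
```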


Let 𝒢 be an Abelian group and let 𝒳 be a subset of 𝒢×𝒢. A bi-spanner of 𝒳 over 𝒢 is a multiset ρ=(ρv)v∈[m⟩ of elements of 𝒢 such that for every pair (ξ(0), ξ(1))∈𝒳 there exist disjoint subsets 𝒥(0) and 𝒥(1) of [m⟩ such that

$$\xi^{(0)} = \sum_{v\in\mathcal{J}^{(0)}} \rho_v \quad\text{and}\quad \xi^{(1)} = \sum_{v\in\mathcal{J}^{(1)}} \rho_v, \qquad \text{Eq. 14}$$


Equivalently, writing the multiset as a vector ρ=(ρv)v∈[m⟩, there exist two non-intersecting vectors a(0), a(1)∈𝔹^m such that

$$\xi^{(0)} = a^{(0)}\cdot\rho \quad\text{and}\quad \xi^{(1)} = a^{(1)}\cdot\rho, \qquad \text{Eq. 15}$$

The representation in Eq. 13 corresponds to the case where 𝒢=ℤq and 𝒳=𝒢×𝒢=(ℤq)^2, where q is a prime p (but in the sequel examples will consider values q that are not necessarily prime). For m≥2, let qm be the largest integer such that (ℤq)^2 has a bi-spanner of size m over ℤq for any q≤qm. The rightmost column in table 300 of FIG. 3 lists several values of qm, which were found by an exhaustive computer search. For a given q=p, examples select m to be the smallest so that qm≥q, thus allowing examples to satisfy Eq. 13 for any ({tilde over (c)}k+v(0), {tilde over (c)}k+v(1))∈(ℤp)^2. For the range of values of q(>2) that is covered by table 300, it can be seen that such an m will be smaller than the simple choice/solution described above. Moreover, as shown in the next section, this in fact holds in general. Examples will use bi-spanners later on in other constructions as well.


Example 3

Referring to the parameters in Example 1, for p=3=q3 examples can take m=3 (while the simple construction would require taking m=2m′=4). The vector ρ=(112) is a bi-spanner of (ℤ3)^2 over ℤ3. Table 400 of FIG. 4 shows respective representations of the elements of (ℤ3)^2 (as in Eq. 15) and values for the contents of the TCAM cells (due to symmetry, it suffices to list only pairs (ξ(0), ξ(1)) where ξ(0)≥ξ(1); the remaining pairs switch between a(0) and a(1) and, respectively, between 0 and 1 in θ). Compared to the Σ-CAM construction in Example 1, where examples have taken m=⌈log2 3⌉=2, the Σ-TCAM construction incurs an increase of the redundancy by a factor of 3/2.
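The claim that ρ=(112) is a bi-spanner of (ℤ3)^2 over ℤ3, and that no bi-spanner of size 2 exists for q=3, can be confirmed by exhaustive search (a sketch; the check enumerates, per position, whether it contributes to ξ(0), to ξ(1), or to neither, per Eq. 14):

```python
from itertools import product

def is_bispanner(rho, q):
    """True iff every pair in (Z_q)^2 is realizable via disjoint subsets of rho."""
    reached = set()
    # Each position contributes to xi0 (choice 0), xi1 (choice 1), or neither (2).
    for choice in product((0, 1, 2), repeat=len(rho)):
        s0 = sum(r for r, c in zip(rho, choice) if c == 0) % q
        s1 = sum(r for r, c in zip(rho, choice) if c == 1) % q
        reached.add((s0, s1))
    return len(reached) == q * q

assert is_bispanner((1, 1, 2), 3)   # Example 3's rho = (112) over Z_3
# No length-2 vector works for q = 3, so size m = 3 is indeed needed here.
assert not any(is_bispanner(r, 3) for r in product(range(3), repeat=2))
```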


Example 4

Referring to the parameters in Example 2, if examples select p=5 then, since q4=6, examples can take m=4 (instead of m=3 in that example), in which case the redundancy for Σ-TCAMs will increase only by 4/3 compared to Σ-CAMs:

$$\frac{n-k}{\tau_c\cdot\log_2 n} \;\approx\; \frac{4}{3}\cdot\frac{24/5}{\log_2 5} \;\approx\; 2.76.$$

The vector ρ=(123) is a bi-spanner of custom-character over custom-character , and table 500 of FIG. 5 shows the respective contents of the TCAM cells.


Interestingly, the savings that examples gained in Example 2 by selecting p=7 therein does not propagate to Σ-TCAMs: for p=7 examples now take m=5, resulting in an increase by 5/3 of the redundancy compared to Σ-CAMs:

$$\frac{n-k}{\tau_c\cdot\log_2 n} \;\approx\; \frac{5}{3}\cdot\frac{36/7}{\log_2 7} \;\approx\; 3.05.$$


IV. Properties of Bi-Spanners

In this section, the application presents certain properties of the sequence (qm)m that was defined in Section III. In particular, the application shows that the sequence (log2 qm)/m converges to a limit which is greater than 0.622. Thus, a strategy of selecting m for a given q=p to be the smallest so that qm≥q yields a redundancy for Σ-TCAMs which is (asymptotically) less than 1/0.622<1.608 times the respective redundancy for Σ-CAMs (while the simple approach referenced above doubles the redundancy).


The construction of bi-spanners for the case where 𝒢=ℤ and

$$\mathcal{X} = \mathcal{X}_r = \bigl\{ (\xi^{(0)}, \xi^{(1)}) \in [0:r]^2 \;:\; \xi^{(0)} + \xi^{(1)} \le r \bigr\},$$

for some positive integer r, may have an additional requirement that the entries of ρ are all positive and that $\sum_{v\in[m\rangle}\rho_v$=∥ρ∥=r. For any (ξ(0), ξ(1))∈𝒳r let ξ(*)=r−ξ(0)−ξ(1), and for disjoint subsets 𝒥(0), 𝒥(1)⊆[m⟩ let 𝒥(*)=[m⟩\(𝒥(0)∪𝒥(1)). Under the condition ∥ρ∥=r examples can rewrite Eq. 14 as:

$$\xi^{(x)} = \sum_{v\in\mathcal{J}^{(x)}} \rho_v, \qquad \text{for all } x\in\mathbb{B}_*, \qquad \text{Eq. 16}$$


Since max{ξ(0), ξ(1), ξ(*)} can be as small as ⌈r/3⌉ and since ρ is assumed to be all-positive, examples can deduce from Eq. 16 that ρv≤⌈r/3⌉ for every v∈[m⟩. In particular,

$$\rho_{m-1} \;\le\; 1 + \left\lfloor \frac{1}{2}\sum_{v\in[m-1\rangle} \rho_v \right\rfloor, \qquad \text{Eq. 17}$$


Conversely, the infinite sequence (ρm+)m≥0 which starts with ρ0+=1 and satisfies Eq. 17 with equality for all m≥1 generates a sequence of integers

$$r_m^+ = \sum_{v\in[m\rangle} \rho_v^+, \qquad m\in\mathbb{Z}^+, \qquad \text{Eq. 18}$$

each being the largest r∈ℤ for which 𝒳r has an all-positive bi-spanner ρ of size m such that ∥ρ∥=r. From Eq. 17 (when stated with equality) and Eq. 18 examples get that

$$\rho_m^+ = 1 + \left\lfloor r_m^+/2 \right\rfloor, \qquad m\in\mathbb{Z}^+,$$

and by induction on m it readily follows that both ρm+ and rm+ scale like (3/2)^m. Namely, the growth rate of these sequences is

$$\lim_{m\to\infty} \frac{\log_2 \rho_m^+}{m} \;=\; \lim_{m\to\infty} \frac{\log_2 r_m^+}{m} \;=\; \log_2(3/2) \;\approx\; 0.585.$$

Table 300 of FIG. 3 presents the first few values of (ρm+)m>1.
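The sequence (ρm+)m and its partial sums (rm+)m are straightforward to generate from Eq. 17 taken with equality; the values below match r6+=17 quoted in this section and the values ρ12+, ρ13+, ρ14+ = 105, 158, 237 used later in Example 5:

```python
# rho[0] = 1; r[m] = rho[0] + ... + rho[m-1]; rho[m] = 1 + floor(r[m]/2).
rho, r = [1], [0]
for m in range(1, 15):
    r.append(sum(rho))
    rho.append(1 + r[m] // 2)

assert r[6] == 17                       # r_6^+ = 17
assert rho[12:15] == [105, 158, 237]    # rho_m^+ for m = 12, 13, 14
```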


Now, for any positive integer q≤ρm+=1+⌊rm+/2⌋ examples have rm+≥2(q−1) and, so, any bi-spanner of 𝒳rm+ over ℤ (when taken modulo q component-wise) is also a bi-spanner of (ℤq)^2 over ℤq. Hence, qm≥ρm+. In fact, examples can do better if ρ may have negative entries as well, even if examples still require that the sum of the entries in ρ equals r. Denoting by rm the largest integer r∈ℤ for which 𝒳r has a bi-spanner ρ∈ℤ^m such that $\sum_{v\in[m\rangle}\rho_v$=r, examples have rm≥rm+, and the inequality is strict for large enough m (see Proposition 2 and Eq. 22 below; for m=6 examples already have r6≥19>17=r6+).


Examples can go one step further and look at bi-spanners of the set [q⟩^2=[q⟩×[q⟩ over ℤ. Letting zm denote the largest integer q for which [q⟩^2 has a bi-spanner of size m over ℤ, the sequences (rm)m and (zm)m are related by

$$1 + \left\lfloor r_m/2 \right\rfloor \;\le\; z_m \;\le\; 1 + r_{m+1}, \qquad \text{Eq. 19}$$


Indeed, any bi-spanner of custom-character over custom-character is also a bi-spanner of custom-character. Moreover, any bi-spanner of the latter is also a bi-spanner of custom-character and can be made to sum to zero by adding at most one more element.


The sequence (zm)m is non-decreasing and it is super-multiplicative, i.e.,

$$z_{m_1+m_2} \;\ge\; z_{m_1}\cdot z_{m_2}, \qquad \text{Eq. 20}$$


for any m1, m2∈ℤ+: if ρ and {tilde over (ρ)} are, respectively, bi-spanners of [q⟩^2 and [{tilde over (q)}⟩^2 over ℤ, then

$$\left( \rho \,\middle|\, q\cdot\tilde{\rho} \right) \quad\text{and}\quad \left( \tilde{\rho} \,\middle|\, \tilde{q}\cdot\rho \right), \qquad \text{Eq. 21}$$

are bi-spanners of [q·{tilde over (q)}⟩^2. Hence, by Fekete's lemma examples get:

$$\lim_{m\to\infty} \frac{\log_2 r_m}{m} \;\overset{\text{(19)}}{=}\; \lim_{m\to\infty} \frac{\log_2 z_m}{m} \;\overset{\text{(20)}}{=}\; \sup_m \frac{\log_2 z_m}{m}.$$

The limit in the last equation (which is the growth rate of both (zm)m and (rm)m) will be denoted hereafter by 𝔷. Lower bounds, zm*, on the first few values of zm, which were found by a computer search, are shown in table 300 of FIG. 3. In particular, examples get:

$$\mathfrak{z} \;\ge\; \frac{\log_2 115}{11} \;>\; 0.622, \qquad \text{Eq. 22}$$

Here, 𝔷 is strictly larger than the growth rate of (ρm+)m. In the next proposition the application shows that 𝔷 is also the growth rate of (qm)m.


Proposition 2.

$$\lim_{m\to\infty} \frac{\log_2 q_m}{m} \;=\; \mathfrak{z}.$$

Proof. Any bi-spanner of [q⟩^2 over ℤ, when taken modulo q, is a bi-spanner of (ℤq)^2 over ℤq. Therefore, qm≥zm and, so,

$$\varliminf_{m\to\infty} \frac{\log_2 q_m}{m} \;\ge\; \lim_{m\to\infty} \frac{\log_2 z_m}{m} \;=\; \mathfrak{z}. \qquad \text{Eq. 23}$$


Conversely, given m∈ℤ+, let q=qm and suppose that ρ1 is a bi-spanner of (ℤq)^2 of size m over ℤq. Examples can regard ρ1 as an integer vector in [q⟩^m. Let ρ2 be a smallest bi-spanner of [m⟩^2 over ℤ and let vm denote its size. Note that vm≤2⌈log2 m⌉ (with equality attained when the bi-spanner is constructed simply). Accordingly,

$$\rho = \left( \rho_1 \,\middle|\, -q\cdot\rho_2 \right), \qquad \text{Eq. 24}$$


is a bi-spanner of custom-character over custom-character. Indeed, given any (ξ(0), ξ(1))∈custom-character, there exist a(0), a(1)custom-character such that

$$\xi^{(x)} = \bigl( a^{(x)}\cdot\rho_1^T \bigr) \;\mathrm{MOD}\; q, \qquad x\in\mathbb{B},$$

which means that there exist β(0), β(1)custom-character such that

$$\xi^{(x)} = a^{(x)}\cdot\rho_1^T - q\cdot\beta^{(x)}, \qquad x\in\mathbb{B}.$$


Since ρ2 is a bi-spanner of [m⟩^2, there exist b(0), b(1)∈𝔹^vm such that

$$\beta^{(x)} = b^{(x)}\cdot\rho_2^T, \qquad x\in\mathbb{B}.$$

The last two equations imply that ρ in Eq. 24 is a bi-spanner of [q⟩^2 of size m+vm over ℤ. Hence, qm=q≤zm+vm and, so,

$$\varlimsup_{m\to\infty} \frac{\log_2 q_m}{m} \;=\; \varlimsup_{m\to\infty} \frac{\log_2 q_m}{m+v_m} \;\le\; \lim_{m\to\infty} \frac{\log_2 z_{m+v_m}}{m+v_m} \;=\; \mathfrak{z}, \qquad \text{Eq. 25}$$


The result follows from Eq. 23 and Eq. 25.


Accordingly, examples can conclude that

$$\Omega\bigl((3/2)^m\bigr) \;=\; \rho_m^+ \;\le\; 1+\lfloor r_m/2 \rfloor \;\le\; z_m \;\le\; q_m \;\le\; 3^{m/2},$$

where the last inequality follows from a simple counting argument. In particular,

$$\log_2(3/2) \;<\; 0.622 \;\le\; \mathfrak{z} \;\le\; (\log_2 3)/2 \;<\; 0.793.$$


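The numeric bounds collected in this section are quick to verify (sanity check only):

```python
import math

assert math.log2(3 / 2) < 0.622          # growth rate of the all-positive case
assert math.log2(115) / 11 > 0.622       # the lower bound of Eq. 22 (z_11 >= 115)
assert math.log2(3) / 2 < 0.793          # the counting-argument upper bound
assert 1 / 0.622 < 1.608                 # the Sigma-TCAM redundancy factor
```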

V. Single L1-Error Correction in Σ-CAMs and Σ-TCAMs

This section describes the case τ=1 (where there is at most one L1-error in the whole array). This case corresponds to the parameters (τc, τi)=(1,1) in Section II. The construction therein for Σ-CAMs yields a redundancy of approximately 1.26 log2n (see Example 1), and the adaptation in Section III to Σ-TCAMs results in an increase of that redundancy by a factor of 3/2 to approximately 1.89 log2 n (see Example 3).


This section presents a construction for Σ-CAMs with a redundancy of m=⌈log2(2n+1)⌉. A respective construction for Σ-TCAMs will have a redundancy which is the smallest m such that qm≥2n+1. In particular, for sufficiently large n, this redundancy may be less than 1.608·log2(2n+1) (see Proposition 2 and Eq. 22).


Given a code length n, let m=⌈log2(2n+1)⌉, which will be the redundancy of the construction, so that k=n−m. Let

$$\alpha = (\alpha_j)_{j\in[n\rangle} = \left( \alpha' \,\middle|\, \alpha'' \right)$$

be a vector of code locators in ℤ^n (where α′=(αj)j∈[k⟩ and α″=(αj)j∈[k:n⟩) that satisfies the following conditions.

    • i) The entries of α are nonzero elements in [2n+1⟩.
    • ii) For any two distinct indexes i,j∈[n⟩ (except when both are in [k:n⟩):

$$\alpha_i \ne \alpha_j \quad\text{and}\quad \alpha_i + \alpha_j \ne 2n+1.$$

    • iii) α″=ωm=(1, 2, 2^2, . . . , 2^(m−1)).





Such vectors a can be constructed for every n>4.


The single-L1-error-correcting encoder, ε1:ℤ^k→ℤ^n, maps a′∈ℤ^k to a=(a′|a″)∈ℤ^n such that

$$\bigl( -a'\cdot(\alpha')^T \bigr) \;\mathrm{MOD}\; (2n+1) \;=\; a''\cdot\omega_m^T, \qquad \text{Eq. 26}$$

(i.e., a″ is the base-2 representation of the left-hand side of Eq. 26). Thus, the code that is induced by ε1 is a subset of the following module over ℤ:

$$\mathcal{M}_1 = \bigl\{\, c\in\mathbb{Z}^n \;:\; (c\cdot\alpha^T)\;\mathrm{MOD}\;(2n+1) = 0 \,\bigr\}.$$

It follows that under conditions i)-iii), one can always correct any one change of ±1 occurring in the first k coordinates in any vector in this module.
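An end-to-end sketch of this single-L1-error correction, on a small hypothetical instance (n=11, so 2n+1=23, m=5, k=6; the locator values below follow the set construction given later in this section but are otherwise illustrative): encode per Eq. 26, then decode a ±1 error from the syndrome c·α^T MOD 23, which equals ±αj for an error at position j.

```python
alpha_data = [3, 5, 6, 9, 10, 11]   # alpha': {1..11} minus {1,2,4,8} and 23-16=7
omega = [1, 2, 4, 8, 16]            # alpha'' = omega_5 (condition iii)
alpha = alpha_data + omega          # distinct, nonzero, no two sum to 23
MOD = 23                            # 2n + 1

def encode(a_data):
    """Eq. 26: append the base-2 digits of (-a' . alpha'^T) MOD (2n+1)."""
    s = (-sum(x * g for x, g in zip(a_data, alpha_data))) % MOD
    return a_data + [(s >> i) & 1 for i in range(5)]

def correct(y):
    """Correct one +/-1 error using the syndrome y . alpha^T MOD (2n+1)."""
    syn = sum(x * g for x, g in zip(y, alpha)) % MOD
    y = list(y)
    if syn in alpha:                   # +1 error at the matching locator
        y[alpha.index(syn)] -= 1
    elif (MOD - syn) in alpha:         # -1 error: the syndrome is -alpha_j
        y[alpha.index(MOD - syn)] += 1
    return y                           # syn == 0: no error to correct

c = encode([4, 0, 2, 1, 0, 3])
assert sum(x * g for x, g in zip(c, alpha)) % MOD == 0   # c lies in the module
for pos in range(6):                   # any single error in the first k entries
    for err in (1, -1):
        y = list(c)
        y[pos] += err
        assert correct(y) == c
```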


To adapt this scheme to Σ-CAMs, it suffices to require that 1n∈ℳ1. This application shows that when n>9 is not a power of 2 (i.e., n≠2m−2), this can always be achieved by properly selecting the code locators.


Examples start off by taking the code locators so that the entries of a′=custom-character form the set

$$\{\alpha_j\}_{j\in[k\rangle} \;=\; \{\,j+1\,\}_{j\in[n\rangle} \setminus \Bigl( \{2^j\}_{j\in[m-1\rangle} \cup \{\,2n+1-2^{m-1}\,\} \Bigr)$$

and a″=ωm (note that 2^(m−1)>n, which, by condition (ii), requires 2n+1−2^(m−1) to be excluded from a′). Assuming that the entries of a′ are in increasing order, ak−1=n, except when n=2^(m−1)−1, in which case ak−1=n−1. Denote by μ1 the first moment (modulo 2n+1) of the code locators:
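The set construction above can be checked programmatically; a sketch for n=100 (an arbitrary illustrative size) confirming that exactly k=n−m locators survive and that conditions i)-ii) hold:

```python
import math

n = 100
m = math.ceil(math.log2(2 * n + 1))            # m = 8
k = n - m
excluded = {2**j for j in range(m - 1)} | {2 * n + 1 - 2**(m - 1)}
alpha_prime = [v for v in range(1, n + 1) if v not in excluded]
omega = [2**j for j in range(m)]
alpha = alpha_prime + omega

assert len(alpha_prime) == k                    # exactly k = n - m data locators
assert alpha_prime[-1] == n                     # a_(k-1) = n (here n != 2^(m-1)-1)
assert len(set(alpha)) == n and 0 not in alpha  # condition i): nonzero, distinct
# condition ii): no two locators (not both in the redundancy part) sum to 2n+1
assert all(alpha[i] + alpha[j] != 2 * n + 1
           for i in range(n) for j in range(i + 1, n)
           if i < k)
```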

$$\mu_1 = \bigl( 1_n\cdot\alpha^T \bigr) \;\mathrm{MOD}\; (2n+1), \qquad \text{Eq. 27}$$

If μ1=0 examples are done (yet this rarely happens). Otherwise, by negating ak−1(∈{n,n−1}) examples can get μ1 to be such that

$$(\mu_1/2)\;\mathrm{MOD}\;(2n+1) \;\in\; \bigl[\, \lceil n/2 \rceil - 1,\; \lfloor 3n/2 \rfloor + 1 \,\bigr],$$

in which case there exist Ω(k^2) index pairs (i,j) such that i and j are distinct in [k⟩ and αi+αj=(μ1/2) MOD (2n+1). The first moment μ1 may then vanish modulo 2n+1 when both αi and αj are negated. The Ω(k^2) term may be positive for every n>9.


Taking n in the above construction to be odd and adding an extra parity bit examples can also detect two L1-errors (corresponding to (τ,σ)=(1,1)), and the extended module will contain the vector 1n+1.


The modification of the construction for vector-matrix multipliers for Σ-TCAMs is similar in spirit to what examples have done in Section II. Namely, examples change condition iii) to read:

    • iii*) a″=ρ, where ρ is a bi-spanner of (ℤ2n+1)^2 over ℤ2n+1.


Accordingly, m should be selected so that qm≥2n+1 to guarantee such a bi-spanner. In order to detect two L1-errors, examples use two extra bits to record the two possible values of the parity bit.


Example 5

Suppose that a single-L1-error-correcting coding scheme is sought for a Σ-CAM with k=100. Examples can select the redundancy m to be the smallest so that

$$2^m \;\ge\; 2n+1 = 2(k+m)+1 = 201+2m.$$

This results in m=8.
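The value m=8 is just the smallest solution of the displayed inequality:

```python
k = 100
# Smallest m with 2^m >= 2(k + m) + 1: 2^8 = 256 >= 217, while 2^7 = 128 < 215.
m = next(m for m in range(1, 32) if 2**m >= 2 * (k + m) + 1)
assert m == 8
```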


For Σ-TCAMs, examples can select m to be the smallest so that

$$q_m \;\ge\; 2n+1 = 2(k+m)+1 = 201+2m.$$

Table 300 of FIG. 3 stops at smaller values of qm, but examples can use the general recurrence for (ρm+)m (which is Eq. 17 with equality therein) to obtain a feasible m and an explicit expression for ρ. Specifically, for m=12, 13, 14 examples get ρm+=105, 158, 237, respectively. Accordingly, examples can take m=14 since

$$q_{14} \;\ge\; \rho_{14}^+ = 237 \;\ge\; 229 = 201+2\cdot 14.$$

Alternatively, examples can use the super-multiplicativity from Eqs. 20 and 21. It can be seen from table 300 that

$$z_{13} \;\ge\; z_2\cdot z_{11} \;\ge\; 2\cdot 115 = 230 \;\ge\; 227 = 201+2\cdot 13,$$

which means that examples can take m=13.


VI. Double L1-Error Correction

This section describes the case τ=2 (i.e., at most two L1-errors in the whole array), which subsumes the scenario (τc,τi)=(2,1) in the construction of Section II. As pointed out in Example 1, that construction for Σ-CAMs has a redundancy of approximately 2.52 log2 n, whereas the construction in this section has a redundancy of 2·log2 n+O(1).


Let p>3 be a prime and define n1=(p−1)/2, m=⌈log2 p⌉, k=n1−m, n2=n1+m, and n=n2+1. The (separable) coding scheme has dimension k, length n and, therefore, redundancy n−k=2m+1. For the construction examples can use a vector of code locators α=(αj)j∈[n1⟩ that satisfies conditions i)-iii) in Section V. This application defines α^[3]=(αj^3)j∈[n1⟩. The encoding mapping, ε:ℤ^k→ℤ^n, maps a vector a′∈ℤ^k to a vector a∈ℤ^n so that the following four conditions hold.

$$(\varepsilon(a))_{[k\rangle} = a', \qquad \text{Eq. 28}$$

$$(\varepsilon(a))_{[n_1\rangle}\cdot\alpha^T \equiv 0 \pmod p, \qquad \text{Eq. 29}$$

$$(\varepsilon(a))_{[n_1\rangle}\cdot(\alpha^{[3]})^T \equiv (\varepsilon(a))_{[n_1:n_2\rangle}\cdot\omega_m^T \pmod p, \qquad \text{Eq. 30}$$

$$(\varepsilon(a))_{[n_1:n\rangle}\cdot 1_{m+1}^T \equiv 0 \pmod 2. \qquad \text{Eq. 31}$$

Eq. 29 is achieved by applying to a′ the encoder ε1 in Eq. 26 (with n therein replaced by n1=(p−1)/2). Eq. 30 means that (ε(a))[n1:n2⟩ forms the base-2 representation of (ε(a))[n1⟩·(α^[3])^T MOD p. Eq. 31 means that the last entry in a is the parity bit of (ε(a))[n1:n2⟩. The induced code of ε is a subset of the module, ℳ2, over ℤ which consists of all vectors c∈ℤ^n that satisfy conditions Eq. 29-31 (when formulated with c instead of a). Here, it may be noted that d(ℳ2)≥5, i.e., one can correct any pattern of up to two changes of ±1 occurring within the first k coordinates (possibly in the same coordinate) in any vector in ℳ2.


As described in Section V, examples adapt a coding scheme for vector-matrix multipliers to Σ-CAMs by guaranteeing that 1n belongs to ℳ2. Repeating what was described in Section V, it can be assumed that α is such that there exist Ω(k^2) index pairs (i,j) such that i and j are distinct in [k⟩ and αi+αj=(μ1/2) MOD p, where μ1 is the first moment of the code locators, as defined in Eq. 27. Denoting the third moment of the code locators by

$$\mu_3 = \bigl( 1_{n_1}\cdot(\alpha^{[3]})^T \bigr) \;\mathrm{MOD}\; p,$$

it may be nonzero modulo p (along with a zero first moment). The following describes a method for obtaining such code locators.


Suppose that (i,j) and (i′,j′) are two of the above Ω(k2) index pairs for which

$$\alpha_i + \alpha_j \;\equiv\; \alpha_{i'} + \alpha_{j'} \;\equiv\; \mu_1/2 \pmod p.$$

Examples may not be able to obtain

$$\alpha_i^3 + \alpha_j^3 \;\equiv\; \alpha_{i'}^3 + \alpha_{j'}^3 \;\equiv\; \mu_3/2 \pmod p.$$

This, in turn, may imply that for at least one of the pairs, say (i,j), the value of μ1 will become zero (modulo p) once examples negate αi and αj, while μ3 may be nonzero. Indeed, otherwise there would be two distinct vectors in ℤ^n1 with L1-norm 2 that would be in the same coset of the module ℳ2: one that has 1s at the positions {i,j} (and 0 otherwise) and the other that has 1s at positions {i′,j′}. This, however, would mean that d(ℳ2)≤4, which is a contradiction.


Assuming now that μ3≢0 (mod p), examples modify Eq. 30 in two ways. First, examples replace the vector ωm by the vector ω̂m which is obtained by changing the last entry of ωm into 2^(m−1)−1 (as shown in Eq. 8). Secondly, examples multiply the right-hand side of Eq. 30 by the following constant

$$\mu_3\cdot(2^m-2)^{-1} \;\mathrm{MOD}\; p,$$

which is (nonzero and) well defined, since p>2^(m−1) and therefore p does not divide 2^m−2. Thus, Eq. 30 becomes

$$(2^m-2)\cdot(\varepsilon(a))_{[n_1\rangle}\cdot(\alpha^{[3]})^T \;\equiv\; \mu_3\cdot(\varepsilon(a))_{[n_1:n_2\rangle}\cdot\hat{\omega}_m^T \pmod p. \qquad \text{Eq. 32}$$

(compare with Eq. 8). It can be verified that the resulting module, ℳ̂2, is still 2-L1-error-correcting and that 1n satisfies Eq. 29 and Eq. 32. Moreover, when m is odd it also satisfies Eq. 31, in which case 1n∈ℳ̂2.


When m is even, examples can further modify the construction to have one more redundancy bit (i.e., set n=n2+2) and replace Eq. 31 by, say,

$$a_{n_2} = a_{n_1}, \qquad (\varepsilon(a))_{[n_1:n\rangle}\cdot 1_{m+2}^T \equiv 0 \pmod 2$$

(so that 1n∈{circumflex over (M)}2).


In the case of Σ-TCAMs, the presently disclosed modification to the scheme for vector-matrix multipliers is similar to what was described in Sections III and V. Specifically, examples select m to be the smallest so that qm≥2n+1. Examples set

$$n_1 = \frac{p-1}{2},$$

k=n1−m, n2=n1+m, and n=n2+2. Examples also take α so that (αj)j∈[n1:n2⟩ is a bi-spanner, ρ (see condition (iii) in Section V). Eqs. 28-31 now become, for each x∈𝔹:

$$(a^{(x)})_{[k\rangle} = (a^{(x)})',$$

$$(a^{(x)})_{[n_1\rangle}\cdot\alpha^T \equiv 0 \pmod p,$$

$$(a^{(x)})_{[n_1\rangle}\cdot(\alpha^{[3]})^T \equiv (a^{(x)})_{[n_1:n_2\rangle}\cdot\rho^T \pmod p,$$

$$(a^{(x)})_{[n_1:n\rangle}\cdot 1_{m+2}^T \equiv 0 \pmod 2,$$

where a(0) and a(1) are non-intersecting (in particular, note that two bits are allocated for the parity in the last equation).


VII. Multiple L1-Error Correction

This section describes the correction of an arbitrary number τ of L1-errors.


Given the designed number of correctable L1-errors τ, let p>2τ be a prime and define n1=(p−1)/2 and m=⌈log2 p⌉. Also, let α be an integer vector of length n1 that satisfies conditions (i)-(iii) in Section V. For i∈ℤ+, denote by α^[i] the integer vector (αj^i)j∈[n1⟩, and let HBer=HBer(α,τ) denote the τ×n1 integer matrix whose rows are α^[2i+1], i∈[τ⟩. Namely, HBer, when seen as a matrix over 𝔽p, is a parity-check matrix of a Berlekamp code CBer over 𝔽p. Let P be any τ×τ integer matrix such that det(P)≢0 (mod p) and define the following module over ℤ:

$$\mathcal{M}_{\mathrm{Ber}} = \bigl\{\, c\in\mathbb{Z}^{n_1} : c\;\mathrm{MOD}\;p \in C_{\mathrm{Ber}} \,\bigr\} = \bigl\{\, c\in\mathbb{Z}^{n_1} : \bigl(c\,H_{\mathrm{Ber}}^T\,P\bigr)\;\mathrm{MOD}\;p = 0 \,\bigr\}.$$

When p>2τ, the minimum Lee distance of CBer is known to be at least 2τ+1 and, so,

$$d(\mathcal{M}_{\mathrm{Ber}}) \;\ge\; 2\tau+1. \qquad \text{Eq. 33}$$

The above holds also for any coset of custom-character within custom-character.


Let custom-character:custom-charactercustom-character be the encoder in Section V (with n therein replaced by n1) and let a=(a′|a″)∈Im (custom-character) (where a′=custom-character). Examples can compute the following syndrome vector {tilde over (s)}∈custom-character of a:

$$\tilde{s} = (\tilde{s}_v)_{v\in[\tau\rangle} = \bigl( a\,H_{\mathrm{Ber}}^T\,P \bigr) \;\mathrm{MOD}\; p.$$

If, in addition, examples select P so that its first column is the standard unit vector (100 . . . )T, examples get (from a∈Im(custom-character)) that {tilde over (s)}0=0. Examples can expand each entry {tilde over (s)}v in {tilde over (s)} to its base-2 representation svcustom-character:

$$\tilde{s}_v = s_v\cdot\omega_m^T, \qquad \text{Eq. 34}$$


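A sketch of the syndrome computation with a Berlekamp-style parity matrix, on a hypothetical small instance (p=11, τ=2, α=(1,2,3,4,5), and P taken as the identity; these values are illustrative and do not satisfy the construction's locator conditions): the rows of H are the odd powers α^[2i+1], and distinct single ±1 errors produce distinct syndromes because the αj are distinct and no two are negatives of each other modulo p.

```python
p, tau = 11, 2
alpha = [1, 2, 3, 4, 5]            # hypothetical locators, n1 = (p - 1)/2 = 5
# Rows alpha^[2i+1] for i in [tau): first powers, then cubes, modulo p.
H = [[pow(a, 2 * i + 1, p) for a in alpha] for i in range(tau)]

def syndrome(c):
    """c . H^T MOD p (with P equal to the identity matrix)."""
    return tuple(sum(x * h for x, h in zip(c, row)) % p for row in H)

seen = set()
for j in range(len(alpha)):        # all single +/-1 error patterns
    for e in (1, -1):
        err = [0] * len(alpha)
        err[j] = e
        seen.add(syndrome(err))
assert len(seen) == 10             # all 10 single errors are distinguishable
```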
Consider now an encoding mapping custom-character:custom-charactercustom-character defined by

$$\varepsilon: a \mapsto \left( a \,\middle|\, s \right), \qquad \text{Eq. 35}$$

where

$$s = \left( s_1 \,\middle|\, s_2 \,\middle|\, \cdots \,\middle|\, s_{\tau-1} \right) \in \mathbb{B}^{(\tau-1)m}.$$

If y=ε(a)+e where ∥e∥≤τ, then, from Eq. 33 and Proposition 1, examples can recover a′ from y under the assumption that the (τ−1)m-suffix of y is error-free. This assumption can be guaranteed by applying to s a (second) encoder of a linear τ-L1-error-correcting code of dimension (τ−1)m. Examples can continue this process recursively, but if examples just do (2τ+1)-fold repetition in the second step examples can end up with a total redundancy of

$$\tau\bigl\lceil \log_2(2n+1) \bigr\rceil \;+\; O\bigl( \tau^2\,\log_2(\tau\,\log_2 n) \bigr), \qquad \text{Eq. 36}$$

where n is the ultimate code length. Hence, for n large compared to τ, most of the redundancy is due to the first encoding level.


To adapt this scheme to Σ-CAMs, examples select α as in Section VI and, as was done therein, examples replace the vector ωm in Eq. 34 by ω̂m. Consider the syndrome of the all-one vector in ℤ^n1:

$$\tilde{s} = \bigl( 1_{n_1}\,H_{\mathrm{Ber}}^T\,P \bigr) \;\mathrm{MOD}\; p,$$

where the first column in P is the standard unit vector. Recalling that

$$1_{n_1}\cdot\alpha^T \equiv 0 \pmod p, \quad\text{yet}\quad 1_{n_1}\cdot(\alpha^{[3]})^T \not\equiv 0 \pmod p,$$

it follows that {tilde over (s)}0=0, yet {tilde over (s)}1≠0. By properly selecting P, examples can have

$$(\tilde{s})_{[1:\tau\rangle} = (2^m-2)\cdot 1_{\tau-1},$$

in which case examples get that the image of 1n1 under ε in Eq. 35 is the all-one vector 1n1+(τ−1)m. This argument holds for any subsequent recursive encoding step and, in particular, when that step is just repetition. Thus, this application shows that under the choice of the parameters as in Section VI, the all-one vector is guaranteed to be a codeword of the induced code.


Comparing Eq. 36 with the redundancy of the construction in Section II, if examples substitute τc=τ in Eq. 9, examples get in many cases values that are smaller than those obtained in Eq. 36 (even when τi>1). This is because of the O(·) term in Eq. 36, which becomes non-negligible when τ is not sufficiently small compared to n.


Finally, for Σ-TCAMs, examples replace Ωm in Eq. 34 by a bi-spanner of custom-character over custom-character.


VIII. Ordinary BCAMs and TCAMs

While this application deals with error correction schemes for Σ-CAMs and Σ-TCAMs, such schemes can be useful for the more ubiquitous BCAMs and TCAMs (which can be seen as Σ-CAMs/Σ-TCAMs where the integer sum of the outputs of the CAM cells along each column is replaced by the complement of their logical “or”). By replicating each column 2τ+1 times, examples can recover from any τ errors in the array, yet with a prohibitively large redundancy.
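The (2τ+1)-fold repetition baseline just mentioned amounts to a majority vote over the replicated columns, as in this behavioral sketch (the name majority_read is illustrative only):

```python
from collections import Counter

def majority_read(copies):
    """Read one logical column stored as 2*tau + 1 physical copies:
    a majority vote outvotes up to tau corrupted copies."""
    return Counter(copies).most_common(1)[0][0]

# tau = 2, so 5 copies tolerate any 2 errors:
assert majority_read([1, 1, 0, 1, 0]) == 1
assert majority_read([0, 0, 0, 1, 1]) == 0
```

The cost is 2τ extra physical columns per logical column, which is the prohibitive redundancy the passage refers to.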


The error-correction problem for BCAMs has been addressed in several papers. One currently proposed solution involves a hardware modification of the sense amplifiers along each match line: a “match” is redefined to mean that the (integer) sum of the outputs of the CAM cells along a column does not exceed a prescribed threshold t (thus, the modified device is in effect a quantized Σ-CAM; ordinary BCAMs correspond to t=0). In such a device, examples can encode the contents of each column, and respectively encode also the input vector, using a t-error-correcting binary code. The reading will then be the same as (or similar to) that of an error-free BCAM, provided that the number of errors per column does not exceed t. As for TCAMs, it can be shown that this approach would require no less redundancy than a simple (2t+1)-fold repetition of the contents of each column (resulting in a prohibitively large redundancy).



FIG. 6 depicts an example CAM cell 600, in accordance with examples of the presently disclosed technology.


As alluded to above, CAMs can be categorized as “binary” or “ternary”. A binary CAM (“BCAM”)—composed of constituent BCAM cells—operates on an input pattern (and stores data) containing binary bits of “0” and “1”. A ternary CAM (“TCAM”)—composed of constituent TCAM cells—operates on an input pattern (and stores data) containing binary bits of “0”, “1”, and an “X” value. The “X” value is sometimes referred to as a “don't care” value or a “wildcard” value. In a search on the input pattern in a TCAM, an “X” will return a match on either a “0” bit or a “1”. Thus, a search on the input pattern “10X1” will return a match for both “1001” and “1011.”
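The wildcard semantics above can be captured in a short behavioral model (the name tcam_match is illustrative; this models the search rule only, not the disclosed circuit):

```python
def tcam_match(pattern, word):
    """True if `word` matches the stored ternary `pattern`, where an
    'X' in the pattern matches either a '0' or a '1' (the "don't
    care" / wildcard value)."""
    return len(pattern) == len(word) and all(
        p == 'X' or p == w for p, w in zip(pattern, word))

# a search on "10X1" returns a match for both "1001" and "1011"
assert tcam_match("10X1", "1001")
assert tcam_match("10X1", "1011")
assert not tcam_match("10X1", "1111")
```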


As alluded to above, CAM-based circuits of the presently disclosed technology may utilize BCAMs/BCAM cells or TCAMs/TCAM cells depending on implementation.


CAM cell 600 illustrates an example of a 4 transistor-2 memristor (4T2M) TCAM cell that can be used in a CAM-based circuit of the presently disclosed technology. For example, CAM cell 600 may illustrate an example task-driven CAM cell programmed to store a task-driven value comporting with a computational task. CAM cell 600 may also illustrate an example redundancy CAM cell programmed to store a redundancy value.


As depicted, CAM cell 600 comprises a switching transistor T1 connected to a data line SL and a switching transistor T2 connected to an inverted data line SL. As alluded to above, the voltages across data line SL and inverted data line SL may correspond with a value/entry (e.g., a voltage signal) of an input vector applied to a CAM of which CAM cell 600 is a part. For example, voltage across data line SL may correspond with the value/entry of the input vector (e.g., a logical one) while the voltage across the inverted data line SL may correspond with a negated version of the value/entry of the input vector (e.g., a logical zero). A memristor M2 is connected to switching transistor T1 and a memristor M1 is connected to switching transistor T2. As depicted, gate terminals of switching transistors T1 and T2 are connected to a word line WL—which biases switching transistors T1 and T2. Immediately prior to and during a searching/matching operation, the voltage of word line WL may be increased above a threshold value, activating switching transistors T1 and T2. When switching transistor T1 is activated, switching transistor T1 may provide an electrical connection between data line SL and memristor M2. By contrast, when switching transistor T1 is not activated (i.e., when the voltage across word line WL is lower than the threshold value), data line SL and memristor M2 may be electrically disconnected. Likewise, when switching transistor T2 is activated, switching transistor T2 may provide an electrical connection between inverted data line SL and memristor M1. By contrast, when switching transistor T2 is not activated (i.e., when the voltage across word line WL is lower than the threshold value), inverted data line SL and memristor M1 may be electrically disconnected. 
Accordingly, the inclusion of switching transistors T1 and T2 can ensure that memristors M1 and M2 are disconnected from the data lines of CAM cell 600 when searching/matching operations are not being performed, which can reduce overall power consumption for CAM cell 600.


As depicted, memristors M1 and M2 are connected in series to form a resistive divider 602. The output voltage of resistive divider 602 (i.e., the voltage on common node G) is applied to the gate of a match-line transistor T4 to control activation of match-line transistor T4. When match-line transistor T4 is activated, it may discharge (i.e., “pull down”) the voltage of match line ML. For example, if the voltage applied to the gate of match-line transistor T4 exceeds a threshold value, match-line transistor T4 will activate and discharge (i.e., “pull down”) the voltage across match line ML—returning a mismatch. By contrast, when the voltage applied to the gate of match-line transistor T4 is less than or equal to the threshold value, match-line transistor T4 may not activate. Accordingly, match-line transistor T4 will not discharge (i.e., “pull down”) the voltage across match line ML—thus returning a match. While in the specific example of FIG. 6 “pull-down” logic is described, it should be understood that in other examples CAM cell 600 may implement “pull-up” logic instead.


As alluded to above, CAM cell 600 can be programmed to store a task-driven value or a redundancy value by programming conductance of memristors M1 and M2. While the programmed conductances of memristors M1 and M2 will generally remain the same unless re-programmed, the output voltage of resistive divider 602 (i.e., the voltage on common node G) will change based on the value of the voltages received by memristors M2 and M1 from data line SL and inverted data line SL respectively. For example, memristors M1 and M2 can be programmed to a first conductance state (e.g., a logical zero conductance state—which may correspond with a negated literal) that causes the output voltage of the resistive divider to be high (e.g., exceed a threshold value) when the voltages received from data line SL and inverted data line SL represent a logical one—thus activating match line transistor T4 and returning a mismatch when the voltages across data line SL and inverted data line SL represent a logical one. By contrast, memristors M1 and M2 can be programmed to a second conductance state (e.g., a logical one conductance state—which may correspond with a non-negated literal) that causes the output voltage of the resistive divider to be high (e.g., exceed a threshold value) when the voltages received from data line SL and inverted data line SL represent a logical zero—thus activating match line transistor T4 and returning a mismatch when the voltages across data line SL and inverted data line SL represent a logical zero. 
In various examples memristors M1 and M2 can be programmed to a third conductance state (e.g., a wildcard conductance state) that causes the output voltage of the resistive divider to remain low (e.g., below a threshold value) regardless of whether the voltages received from data line SL and inverted data line SL represent a logical zero or a logical one—thus ensuring match line transistor T4 remains un-activated, and returns a match when the voltages across data line SL and inverted data line SL represent a logical zero or a logical one.
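The three conductance states described above reduce to a small truth-table model of a single cell (a behavioral sketch only; it abstracts away the divider voltages and the match-line transistor, and the name cell_matches is illustrative):

```python
def cell_matches(stored, input_bit):
    """Behavioral model of CAM cell 600.  `stored` is '0', '1', or 'X'
    (the wildcard conductance state); `input_bit` is 0 or 1.  Returns
    True when the match-line transistor stays off (a match)."""
    if stored == 'X':
        return True                    # divider output stays low either way
    return stored == str(input_bit)    # otherwise match only on equality

assert cell_matches('1', 1) and cell_matches('0', 0)
assert not cell_matches('0', 1) and not cell_matches('1', 0)
assert cell_matches('X', 0) and cell_matches('X', 1)
```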


As depicted, a service line transistor T3 can work in concert with switching transistor T1 and/or switching transistor T2 to program conductances of memristors M1 and M2 using a service line SX.


It should be understood that CAM cell 600 is just one example of a CAM cell that may be included in a CAM-based circuit of the present technology. In other implementations, CAM cell 600 may comprise a BCAM cell, or a TCAM cell of a different configuration, such as a CMOS-based CAM cell, a six transistor-two memristor (6T2M) CAM cell, a 3-terminal CAM cell, a 16 transistor (16T) TCAM cell, etc.



FIG. 7 depicts an example CAM-based circuit 700, in accordance with examples of the presently disclosed technology.


As depicted, CAM-based circuit 700 may comprise a Σ-CAM 710 and one or more processing resources (not depicted for brevity) operative to program the CAM cells of Σ-CAM 710 and detect one or more errors in an output vector (c) from Σ-CAM 710. As depicted, the output vector (c) may be composed of constituent values c0-cn−1.


As described above, a Σ-CAM (sometimes referred to as Hamming-distance CAM) is a particular type of CAM configured to sum outputs from CAM cells arranged along a respective column. Said differently, a Σ-CAM may refer to a CAM which consists of an custom-character×n array of CAM cells, with each CAM cell (i,j)∈custom-character×custom-character implementing the function xcustom-characterN(x,ai,j), for some internal state value ai,jcustom-character. Programmed internal states of the Σ-CAM can be represented as an array (matrix) A=custom-charactercustom-character, with ai standing for row i and Aj(=(A){j}) for column j. An input to the Σ-CAM may comprise a row vector (e.g., search key) x=custom-charactercustom-character, with xi serving as the input to all the CAM cells along row i. A respective column (corresponding to a match line) j of the Σ-CAM can compute an integer sum of the outputs of the CAM cells arranged along the respective column j—i.e., cj=custom-characterN(xi, ai,j)=w(x−Aj). These integer sums form the output row vector—i.e., c=custom-character∈[0:custom-character]n—of the Σ-CAM. Accordingly, the Σ-CAM computes the Hamming distances between the input vector and the contents of the CAM cells along each of the n columns in the Σ-CAM.
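The column-wise Hamming-distance computation just described can be modeled directly (a behavioral sketch; sigma_cam_output is an illustrative name, not part of the disclosure):

```python
import numpy as np

def sigma_cam_output(x, A):
    """Behavioral model of a summing CAM: column j of the output is
    c_j = w(x - A_j), the number of cells in column j whose stored
    state differs from the corresponding entry of the input vector."""
    x = np.asarray(x)
    A = np.asarray(A)                   # l x n array of programmed states
    return (A != x[:, None]).sum(axis=0)

A = np.array([[0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
x = np.array([0, 1, 1])
print(sigma_cam_output(x, A))           # prints [1 2 1]
```

Each output entry is the Hamming distance between x and the corresponding stored column, matching the definition of c above.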


As an illustrative example, Σ-CAM 710 comprises a number (n) columns—i.e., column0 at the farthest left of FIG. 7 to columnn−1 at the farthest right of FIG. 7. Σ-CAM 710 also comprises a number (l) rows—i.e., row0 at the top of FIG. 7 to rowl−1 at the bottom of FIG. 7.


For ease of reference, the CAM cell arranged along column0 and row0 may be referred to as CAM cell0,0. Likewise, the CAM cell arranged along columnn−1 and rowl−1 may be referred to as CAM celln−1,l−1, and so on.


The constituent CAM cells of Σ-CAM 710 may comprise various types of CAM cells, including BCAM cells or TCAM cells (e.g., the example CAM cell 600 of FIG. 6).


As depicted, CAM cells arranged along a common row of Σ-CAM 710 are electrically connected along a common data line. For example, the CAM cells of row0 are electrically connected along a first data line, and accordingly may each receive an input value x0. Likewise, the CAM cells of rowl−1 are electrically connected along an lth data line, and accordingly may each receive an input value xl−1. Here, the input values x0-xl−1 may comprise constituent values of an input vector (x).


As depicted, CAM cells arranged along a common column of Σ-CAM 710 are electrically connected along a common match line. For example, the CAM cells connected along column0 are electrically connected along match line ML0. Likewise, the CAM cells connected along columnn−1 are electrically connected along match line MLn−1.


As depicted (and as described above), Σ-CAM 710 may be configured to sum outputs from CAM cells arranged along a respective column. The final sum may be output by the match line associated with the respective column.


For example (and as described above in conjunction with FIG. 6), when a CAM cell (e.g., CAM cell 600) returns a mismatch, the CAM cell may be configured to pull-down voltage of the match line the CAM cell is connected to. Accordingly, Σ-CAM 710 can effectively sum/count the number of mismatches (or conversely, the number of matches) returned by CAM cells of a respective column by reading the final voltage output of the match line associated with the respective column.


For concept illustration, FIG. 8 depicts an example graph 800 illustrating comparisons between a threshold voltage and the voltage output of a match line associated with a respective column of Σ-CAM 710. As depicted, the threshold voltage is 0.9 a.u. (volts) and the match line voltage is compared to the threshold voltage at three discrete times: t=10.00 a.u. (seconds); t=10.25 a.u. (seconds); and t=10.50 a.u. (seconds). If the match line returns three matches (as illustrated by curve 802), the match line voltage will exceed the threshold voltage three times. If the match line returns two matches (as illustrated by curve 804), the match line voltage will exceed the threshold voltage two times. If the match line returns one match (as illustrated by curve 806), the match line voltage will exceed the threshold voltage one time. If the match line returns zero matches (as illustrated by curve 808), the match line voltage will never exceed the threshold voltage. Here, the threshold voltage and the sensing/comparison times may be strategically selected to fit such a relationship.


As described above, CAM-based circuit 700 can detect and correct errors in Σ-CAM 710 while Σ-CAM 710 is performing a computational task (i.e., “on-access” error correction).


This “on-access” error correction methodology can improve upon “off-line” error detection/correction methodologies. Such “off-line” methodologies generally involve testing procedures that would disrupt normal operation of a hardware accelerator during a computational task, and thus must be performed “off-line.” For example, an alternative methodology for detecting errors in a CAM could involve applying a sequence of test vectors to the CAM in order to detect programming and other circuit-based errors. Applying the test vectors may be unrelated to—and would otherwise disrupt—a computational task the CAM is being used to perform. Thus, such a methodology would be performed “off-line.” By contrast, CAM-based circuit 700 can correct an output vector generated by Σ-CAM 710 while Σ-CAM 710 is performing a designated computational task. Accordingly, such a methodology may be more computationally efficient (i.e., consume fewer processing resources, less time, less power, etc.) than alternative “off-line” error detection and correction methodologies.


CAM-based circuit 700 realizes the advantages provided by “on-access” error detection and correction by leveraging an intelligent insight that a Σ-CAM (e.g., Σ-CAM 710) operates analogously to a vector-matrix multiplier (sometimes referred to as a dot product engine). Leveraging such insight, CAM-based circuit 700 can adapt an error-correction methodology for vector-matrix multipliers to Σ-CAMs. The adapted methodology involves adding redundancy columns to Σ-CAM 710. CAM-based circuit 700 can leverage one or more processing resources (e.g., an encoder) to compute redundancy values for the redundancy columns such that Σ-CAM 710 stores a codeword of a linear code (C) in each row. To further adapt the methodology for a new/particularized type of hardware accelerator—i.e., Σ-CAM 710—CAM-based circuit 700 can modify the linear code (C) used to compute redundancy values. Namely, CAM-based circuit 700 can modify the linear code (C) so that it includes the all-one vector. With this modification, CAM-based circuit 700 can detect and correct errors in the output vector (c) from Σ-CAM 710 based on this modified/particularized linear code (C).


For example (and as described above), CAM-based circuit 700 may comprise: a) Σ-CAM 710 comprising CAM cells arranged into the number (l) rows and the number (n) columns, wherein the Σ-CAM 710 is configured to sum outputs from a number (l) CAM cells connected along a respective column of the number (n) columns; and b) one or more processing resources (not depicted) operative to program the CAM cells to store a matrix (A) having dimensions (l×n), wherein: i) each row of the matrix (A) comprises a codeword of a linear code (C), and ii) the linear code (C) includes an all-one vector of dimension (n) as a codeword.


In Σ-CAM 710, CAM cells connected along a number (k) columns of the number (n) columns may comprise task-driven CAM cells. Relatedly, CAM cells connected along a number (n−k) columns of the number (n) columns may comprise redundancy CAM cells. Here, it follows that a respective row of Σ-CAM 710 comprises a number (k) task-driven CAM cells and a number (n−k) redundancy CAM cells. Accordingly, programming Σ-CAM 710 to store the matrix (A) having dimensions (l×n) may comprise: a) programming the number (k) task-driven CAM cells of the respective row to store task-driven values comporting with a computational task; b) computing redundancy values for the number (n−k) redundancy CAM cells of the respective row based on the programmed task-driven values and the linear code (C); and c) programming the number (n−k) redundancy CAM cells of the respective row to store the computed redundancy values such that the respective row stores a codeword of the linear code (C). In certain implementations, computing the redundancy values for the number (n−k) redundancy CAM cells of the respective row may comprise computing the redundancy values for the number (n−k) redundancy CAM cells of the respective row such that the task-driven values and the redundancy values include a sequence of ones and zeros prescribed by the linear code (C).
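As a concrete toy instance of steps a)-c), the sketch below uses the even-weight (single-parity-check) code of even length as a stand-in for the linear code (C): appending one parity bit per row makes every row a codeword, and, because the code length is even, the all-one vector also has even weight and is therefore itself a codeword, as the construction requires. The disclosed construction uses a stronger code; encode_row is an illustrative name.

```python
import numpy as np

def encode_row(task_bits):
    """Append a parity bit so the row is a codeword of the even-weight
    code (a toy stand-in for the linear code C)."""
    task_bits = np.asarray(task_bits) % 2
    return np.append(task_bits, task_bits.sum() % 2)

A_task = np.array([[1, 0, 1],      # k = 3 task-driven columns per row
                   [0, 0, 0],
                   [1, 1, 1]])
A = np.vstack([encode_row(r) for r in A_task])    # l x n matrix, n = 4
assert all(int(r.sum()) % 2 == 0 for r in A)      # every row is a codeword
assert encode_row([1, 1, 1]).tolist() == [1, 1, 1, 1]   # all-one vector in C
```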


In CAM-based circuit 700, the one or more processing resources may be further operative to detect and correct one or more errors in the output vector (c) from Σ-CAM 710 based on the linear code (C). In certain of these implementations, the output vector (c) may have dimension (n). Relatedly, the output vector (c) may comprise a concatenation of a task-driven output vector (c′) and a redundancy output vector (c″). Here, the task-driven output vector (c′) may have dimension (k) and correspond to a summation between: (1) a vector-matrix multiplication between a transformation of an input vector (x) of dimension (l) received by Σ-CAM 710 and a task-driven stored matrix (A′) of dimension (l×k) stored by the task-driven CAM cells of Σ-CAM 710; and (2) a vector multiplication between the all-one vector of dimension (n) and a constant value (see e.g., Eq. 4 above). Relatedly, the redundancy output vector (c″) may have dimension (n−k) and correspond to a summation between: (1) a vector-matrix multiplication between the transformation of the input vector (x) and a redundancy stored matrix (A″) of dimension (l×(n−k)) stored by the redundancy CAM cells of Σ-CAM 710; and (2) a vector multiplication between the all-one vector of dimension (n) and the constant value (see e.g., Eq. 4 above). Moreover, the one or more processing resources may detect and correct the one or more errors in the task-driven output vector (c′) based on (e.g., by comparing) the linear code (C) and the redundancy output vector (c″).
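The role of the all-one codeword can be made concrete with a toy stand-in for the linear code (C): the even-weight code of even length, which contains the all-one vector. Any parity check of such a code annihilates every row of the stored matrix and also annihilates the constant all-one contribution to the output, so the error-free output vector satisfies the check for every input. The sketch below demonstrates detection only (the disclosed construction uses a code strong enough to also locate and correct errors); all names are illustrative.

```python
import numpy as np

def column_sums(x, A):
    # behavioral summing-CAM output: c_j = w(x - A_j)
    return (np.asarray(A) != np.asarray(x)[:, None]).sum(axis=0)

def check_output(c):
    """For the even-weight code, h = all-one is a parity check, so the
    error-free output satisfies sum(c) = 0 (mod 2) for every input x."""
    return int(np.sum(c)) % 2 == 0

A = np.array([[1, 0, 1, 0],        # every row has even weight (is in C)
              [0, 0, 0, 0],
              [1, 1, 1, 1]])
x = np.array([1, 0, 1])
c = column_sums(x, A)
assert check_output(c)             # clean output passes the check
c[2] += 1                          # one column misreads
assert not check_output(c)         # the error is detected
```

Note the check passes for every input x only because the code length is even, i.e., only because the all-one vector is itself a codeword, which is exactly the modification this application makes to the linear code (C).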



FIG. 9 depicts a block diagram of an example computer system 900 in which various of the examples described herein may be implemented. Computer system 900 may also be used to compute redundancy values and detect/correct errors in output vectors from Σ-CAMs (e.g., Σ-CAM 710).


The computer system 900 includes a bus 912 or other communication mechanism for communicating information, and one or more hardware processors 904 coupled with bus 912 for processing information. Hardware processor(s) 904 may be, for example, one or more general purpose microprocessors.


The computer system 900 also includes a main memory 906, such as a random access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 912 for storing information and instructions to be executed by processor 904. Main memory 906 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 904. Such instructions, when stored in storage media accessible to processor 904, render computer system 900 into a special-purpose machine that is customized to perform the operations specified in the instructions.


The computer system 900 further includes a read only memory (ROM) 908 or other static storage device coupled to bus 912 for storing static information and instructions for processor 904. A storage device 910, such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 912 for storing information and instructions.


The computer system 900 may be coupled via bus 912 to a display 912, such as a liquid crystal display (LCD) (or touch screen), for displaying information to a computer user. An input device 914, including alphanumeric and other keys, is coupled to bus 912 for communicating information and command selections to processor 904. Another type of user input device is cursor control 916, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 904 and for controlling cursor movement on display 912. In some embodiments, the same direction information and command selections as cursor control may be implemented via receiving touches on a touch screen without a cursor.


The computing system 900 may include a user interface module to implement a GUI that may be stored in a mass storage device as executable software codes that are executed by the computing device(s). This and other modules may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.


In general, the word “component,” “engine,” “system,” “database”, “data store,” and the like, as used herein, can refer to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, C or C++. A software component may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software components may be callable from other components or from themselves, and/or may be invoked in response to detected events or interrupts. Software components configured for execution on computing devices may be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disc, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression or decryption prior to execution). Such software code may be stored, partially or fully, on a memory device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware components may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors.


The computer system 900 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 900 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 900 in response to processor(s) 904 executing one or more sequences of one or more instructions contained in main memory 906. Such instructions may be read into main memory 906 from another storage medium, such as storage device 910. Execution of the sequences of instructions contained in main memory 906 causes processor(s) 904 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.


The term “non-transitory media,” and similar terms, as used herein refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 910. Volatile media includes dynamic memory, such as main memory 906. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.


Non-transitory media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between non-transitory media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 912. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.


The computer system 900 also includes a communication interface 918 coupled to bus 912. Communication interface 918 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks. For example, communication interface 918 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 918 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or a WAN component to communicate with a WAN). Wireless links may also be implemented. In any such implementation, communication interface 918 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.


A network link typically provides data communication through one or more networks to other data devices. For example, a network link may provide a connection through a local network to a host computer or to data equipment operated by an Internet Service Provider (ISP). The ISP in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet.” Local networks and the Internet both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on the network link and through communication interface 918, which carry the digital data to and from computer system 900, are example forms of transmission media.


The computer system 900 can send messages and receive data, including program code, through the network(s), network link and communication interface 918. In the Internet example, a server might transmit a requested code for an application program through the Internet, the ISP, the local network and the communication interface 918.


The received code may be executed by processor 904 as it is received, and/or stored in storage device 910, or other non-volatile storage for later execution.


Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code components executed by one or more computer systems or computer processors comprising computer hardware. The one or more computer systems or computer processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). The processes and algorithms may be implemented partially or wholly in application-specific circuitry. The various features and processes described above may be used independently of one another, or may be combined in various ways. Different combinations and sub-combinations are intended to fall within the scope of this disclosure, and certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate, or may be performed in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed example embodiments. The performance of certain of the operations or processes may be distributed among computer systems or computer processors, not only residing within a single machine, but deployed across a number of machines.


As used herein, a circuit might be implemented utilizing any form of hardware, software, or a combination thereof. For example, one or more processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logical components, software routines or other mechanisms might be implemented to make up a circuit. In implementation, the various circuits described herein might be implemented as discrete circuits or the functions and features described can be shared in part or in total among one or more circuits. Even though various features or elements of functionality may be individually described or claimed as separate circuits, these features and functionality can be shared among one or more common circuits, and such description shall not require or imply that separate circuits are required to implement such features or functionality. Where a circuit is implemented in whole or in part using software, such software can be implemented to operate with a computing or processing system capable of carrying out the functionality described with respect thereto, such as computer system 900.


As used herein, the term “or” may be construed in either an inclusive or exclusive sense. Moreover, the description of resources, operations, or structures in the singular shall not be read to exclude the plural. Conditional language, such as, among others, “can,” “could,” “might,” or “may”, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps.


Terms and phrases used in this document, and variations thereof, unless otherwise expressly stated, should be construed as open ended as opposed to limiting. Adjectives such as “conventional,” “traditional,” “normal,” “standard,” “known,” and terms of similar meaning should not be construed as limiting the item described to a given time period or to an item available as of a given time, but instead should be read to encompass conventional, traditional, normal, or standard technologies that may be available or known now or at any time in the future. The presence of broadening words and phrases such as “one or more,” “at least,” “but not limited to” or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent.

Claims
  • 1. A system comprising: a summing content addressable memory (CAM) comprising CAM cells arranged into a number (l) rows and a number (n) columns, wherein the summing CAM is configured to sum outputs from a number (l) CAM cells connected along a respective column of the number (n) columns; and one or more processing resources operative to program the CAM cells to store a matrix (A) having dimensions (l×n), wherein: each row of the matrix (A) comprises a codeword of a linear code (C), and the linear code (C) includes an all-one vector of dimension (n) as a codeword.
  • 2. The system of claim 1, wherein: CAM cells connected along a number (k) columns of the number (n) columns comprise task-driven CAM cells; and CAM cells connected along a number (n−k) columns of the number (n) columns comprise redundancy CAM cells such that the respective row of the summing CAM comprises a number (k) task-driven CAM cells and a number (n−k) redundancy CAM cells.
  • 3. The system of claim 2, wherein programming the summing CAM to store the matrix (A) having dimensions (l×n) comprises: programming the number (k) task-driven CAM cells of the respective row to store task-driven values comporting with a computational task; computing redundancy values for the number (n−k) redundancy CAM cells of the respective row based on the programmed task-driven values and the linear code (C); and programming the number (n−k) redundancy CAM cells of the respective row to store the computed redundancy values such that the respective row stores a codeword of the linear code (C).
  • 4. The system of claim 3, wherein computing the redundancy values for the number (n−k) redundancy CAM cells of the respective row based on the programmed task-driven values and the linear code (C) comprises: computing the redundancy values for the number (n−k) redundancy CAM cells of the respective row such that the task-driven values and the redundancy values include a sequence of ones and zeros prescribed by the linear code (C).
  • 5. The system of claim 2, wherein the one or more processing resources are operative to: detect and correct one or more errors in an output vector (c) from the summing CAM based on the linear code (C).
  • 6. The system of claim 5, wherein: the output vector (c) has dimension (n); the output vector (c) comprises a concatenation of a task-driven output vector (c′) and a redundancy output vector (c″); the task-driven output vector (c′) has dimension (k) and corresponds to a summation between: a vector-matrix multiplication between a transformation of an input vector (x) of dimension (l) received by the summing CAM and a task-driven stored matrix (A′) of dimension (l×k) stored by the task-driven CAM cells of the summing CAM, and a vector multiplication between the all-one vector of dimension (n) and a constant value; the redundancy output vector (c″) has dimension (n−k) and corresponds to a summation between: a vector-matrix multiplication between the transformation of the input vector (x) and a redundancy stored matrix (A″) of dimension (l×(n−k)) stored by the redundancy CAM cells of the summing CAM, and a vector multiplication between the all-one vector of dimension (n) and the constant value; and detecting and correcting one or more errors in the output vector (c) from the summing CAM based on the linear code (C) comprises detecting and correcting the one or more errors in the task-driven output vector (c′) based on the linear code (C) and the redundancy output vector (c″).
  • 7. The system of claim 1, wherein: a respective CAM cell of the summing CAM comprises one or more programmable memristors; and programming the respective CAM cell comprises programming conductance of the one or more programmable memristors.
  • 8. A system comprising: one or more processing resources operative to: program a summing content addressable memory (CAM) to store a matrix (A) having dimensions (l×n), wherein: each row of the matrix (A) comprises a codeword of a linear code (C), and the linear code (C) includes an all-one vector of dimension (n) as a codeword; and detect and correct one or more errors in an output vector (c) from the summing CAM based on the linear code (C).
  • 9. The system of claim 8, further comprising the summing CAM.
  • 10. The system of claim 9, wherein: the summing CAM comprises CAM cells arranged into a number (l) rows and a number (n) columns; and the summing CAM is configured to sum outputs from a number (l) CAM cells connected along a respective column of the number (n) columns.
  • 11. The system of claim 10, wherein the one or more processing resources are operative to: program a number (k) task-driven CAM cells of the respective row to store task-driven values comporting with a computational task; compute redundancy values for a number (n−k) redundancy CAM cells of the respective row based on the programmed task-driven values and the linear code (C); and program the number (n−k) redundancy CAM cells of the respective row to store the computed redundancy values such that the respective row stores a codeword of the linear code (C).
  • 12. The system of claim 11, wherein computing the redundancy values for the number (n−k) redundancy CAM cells of the respective row based on the programmed task-driven values and the linear code (C) comprises: computing the redundancy values for the number (n−k) redundancy CAM cells of the respective row such that the task-driven values and the redundancy values include a sequence of ones and zeros prescribed by the linear code (C).
  • 13. The system of claim 11, wherein: the output vector (c) has dimension (n); the output vector (c) comprises a concatenation of a task-driven output vector (c′) and a redundancy output vector (c″); the task-driven output vector (c′) has dimension (k) and corresponds to a summation between: a vector-matrix multiplication between a transformation of an input vector (x) of dimension (l) received by the summing CAM and a task-driven stored matrix (A′) of dimension (l×k) stored by the task-driven CAM cells of the summing CAM, and a vector multiplication between the all-one vector of dimension (n) and a constant value; the redundancy output vector (c″) has dimension (n−k) and corresponds to a summation between: a vector-matrix multiplication between the transformation of the input vector (x) and a redundancy stored matrix (A″) of dimension (l×(n−k)) stored by the redundancy CAM cells of the summing CAM, and a vector multiplication between the all-one vector of dimension (n) and the constant value; and detecting and correcting one or more errors in the output vector (c) from the summing CAM based on the linear code (C) comprises detecting and correcting the one or more errors in the task-driven output vector (c′) based on the linear code (C) and the redundancy output vector (c″).
  • 14. The system of claim 8, wherein: a respective CAM cell of the summing CAM comprises one or more programmable memristors; and programming the respective CAM cell comprises programming conductance of the one or more programmable memristors.
  • 15. A method comprising: programming a summing content addressable memory (CAM) to store a matrix (A) having dimensions (l×n), wherein: each row of the matrix (A) comprises a codeword of a linear code (C), and the linear code (C) includes an all-one vector of dimension (n) as a codeword; and detecting and correcting one or more errors in an output vector (c) from the summing CAM based on the linear code (C).
  • 16. The method of claim 15, wherein: the summing CAM comprises CAM cells arranged into a number (l) rows and a number (n) columns; and the summing CAM is configured to sum outputs from a number (l) CAM cells connected along a respective column of the number (n) columns.
  • 17. The method of claim 16, wherein programming the summing CAM to store the matrix (A) having dimensions (l×n) comprises: programming a number (k) task-driven CAM cells of the respective row to store task-driven values comporting with a computational task; computing redundancy values for a number (n−k) redundancy CAM cells of the respective row based on the programmed task-driven values and the linear code (C); and programming the number (n−k) redundancy CAM cells of the respective row to store the computed redundancy values such that the respective row stores a codeword of the linear code (C).
  • 18. The method of claim 17, wherein computing the redundancy values for the number (n−k) redundancy CAM cells of the respective row based on the programmed task-driven values and the linear code (C) comprises: computing the redundancy values for the number (n−k) redundancy CAM cells of the respective row such that the task-driven values and the redundancy values include a sequence of ones and zeros prescribed by the linear code (C).
  • 19. The method of claim 17, wherein: the output vector (c) has dimension (n); the output vector (c) comprises a concatenation of a task-driven output vector (c′) and a redundancy output vector (c″); the task-driven output vector (c′) has dimension (k) and corresponds to a summation between: a vector-matrix multiplication between a transformation of an input vector (x) of dimension (l) received by the summing CAM and a task-driven stored matrix (A′) of dimension (l×k) stored by the task-driven CAM cells of the summing CAM, and a vector multiplication between the all-one vector of dimension (n) and a constant value; the redundancy output vector (c″) has dimension (n−k) and corresponds to a summation between: a vector-matrix multiplication between the transformation of the input vector (x) and a redundancy stored matrix (A″) of dimension (l×(n−k)) stored by the redundancy CAM cells of the summing CAM, and a vector multiplication between the all-one vector of dimension (n) and the constant value; and detecting and correcting one or more errors in the output vector (c) from the summing CAM based on the linear code (C) comprises detecting and correcting the one or more errors in the task-driven output vector (c′) based on the linear code (C) and the redundancy output vector (c″).
  • 20. The method of claim 15, wherein: a respective CAM cell of the summing CAM comprises one or more programmable memristors; and programming the respective CAM cell comprises programming conductance of the one or more programmable memristors.
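
For illustration only (and not as part of the claims), the row-wise encoding and on-access check recited above can be sketched in a few lines of Python. The sketch assumes binary CAM cells and uses the [7,4] Hamming code as one example of a linear code (C) that contains the all-one vector; the matrices G and H and the 5×7 array size are arbitrary choices introduced here for the example, not taken from the application.

```python
import numpy as np

rng = np.random.default_rng(0)

# [7,4] Hamming code: the columns of H are 1..7 in binary, so the syndrome of
# a single flipped parity position equals that (1-indexed) column number.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])
G = np.array([[1, 1, 1, 0, 0, 0, 0],   # rows are codewords; their GF(2) span
              [1, 0, 0, 1, 1, 0, 0],   # contains the all-one vector
              [0, 1, 0, 1, 0, 1, 0],   # (it is the sum of all four rows)
              [1, 1, 0, 1, 0, 0, 1]])

l, n = 5, 7
# "Program" the CAM: every row of A is a codeword of the code C.
A = (rng.integers(0, 2, size=(l, 4)) @ G) % 2

x = rng.integers(0, 2, size=l)  # binary search key
# Each match line j outputs the Hamming distance between x and column A_j.
c = np.array([np.sum(x != A[:, j]) for j in range(n)])

# Because each row of A is a codeword and the all-one vector is in C,
# c mod 2 is itself a codeword of C: its syndrome is zero.
assert not (H @ c % 2).any()

# Inject a unit error on one match line and locate it from the syndrome.
c_err = c.copy()
c_err[4] += 1
syndrome = H @ c_err % 2
pos = int(syndrome @ [4, 2, 1]) - 1  # binary syndrome -> column index
assert pos == 4
```

The key identity behind the zero-syndrome check is that c_j ≡ w(x) + Σ_i a_{i,j} (mod 2), so c mod 2 is a GF(2) combination of the stored codewords plus w(x) times the all-one vector, which stays inside C only because the all-one vector is a codeword. This toy sketch detects and locates a single odd-magnitude error on a match line; recovering the error's sign and magnitude would need additional structure beyond this example.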
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the priority benefit of U.S. Provisional Patent Application No. 63/609,652, filed on Dec. 13, 2023, the contents of which are incorporated herein by reference in their entirety.
