This disclosure generally relates to adaptive design, fabrication, and utilization of memory embedded in integrated circuits and designed for artificial intelligence (AI) applications.
Deep learning models that are trained and deployed with, e.g., convolutional neural networks (CNNs), may include many convolutional layers, pooling layers, rectification layers, and fully connected layers, and generally require millions, if not more, of trained model parameters for processing complex input data such as images, speech, and natural languages. Deployment of such a model thus requires a massive number of memory cells for storing the trained model parameters. Writing or reading errors may occur in these memory cells. The reading or writing errors may be asymmetric with respect to the binary values of the data stored in the memory cells due to asymmetry in memory writing or reading operations. For example, in some types of memory cells, errors may occur at a higher rate for writing zeros than for writing ones, or vice versa. Likewise, in some types of memory cells, errors may occur at a higher rate for reading zeros than for reading ones, or vice versa. As such, memory cells may be characterized by data writing asymmetry, data reading asymmetry, and corresponding writing error asymmetry and reading error asymmetry. The data writing asymmetry and data reading asymmetry may be combined into an effective data access asymmetry. The writing error asymmetry and reading error asymmetry may likewise be combined into an effective data access error asymmetry. Traditional technology has been directed to designing and fabricating memory cells with reduced overall reading or writing error regardless of the error asymmetry. Meanwhile, a deep learning model may have error-tolerance asymmetry in that errors in bits of the model parameters having one of the two binary values (zero or one) may statistically cause less output error (e.g., misclassification of input by a deep learning classifier), and thus higher error tolerance, compared to errors in bits of the model parameters having the opposite binary value.
This disclosure is directed to adjusting memory cell design and fabrication process to deliberately achieve a desired level of memory reading and writing asymmetry between binary ones and binary zeros such that predictive accuracy of a deep learning model having model parameters stored in the memory cells with asymmetric binary data and asymmetric error-tolerance may be improved in the presence of reading and writing errors. Such improvement may be achieved without having to improve an overall memory error rate and without having to rely on memory cell redundancy and error correction codes. Further objects, features, and advantages of this disclosure will become readily apparent to persons having ordinary skill in the art after a review of the following description, with reference to the drawings and claims that are appended to and form a part of this specification.
In one implementation, a method for classifying input data into a set of classes is disclosed. The method includes training a convolutional neural network (CNN) model using a set of training input data each labeled with one of the set of classes to obtain a plurality of model parameters; determining a data preference measure of the model parameters; and determining an acceptable range of memory cell data access asymmetry according to the data preference measure of the model parameters. The method further includes adjusting memory cell design and fabrication process to generate an array of memory cells having a data access asymmetry within the acceptable range of memory cell data access asymmetry; embedding the array of memory cells with an artificial intelligence (AI) logic circuit to form an AI device; loading the trained CNN model into the AI device by at least loading the model parameters with the data preference measure into the array of memory cells having the data access asymmetry; and forward-propagating input data through the trained CNN model using the model parameters loaded in the array of memory cells to determine an output class among the set of classes for the input data.
In the implementation above, the data preference measure of the model parameters may quantify an imbalance between a number of zeros and a number of ones of the model parameters each expressed in a predetermined multi-bit binary form, or may quantify a bit-inversion tolerance asymmetry of the CNN model.
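For illustration only, the imbalance-based data preference measure described above may be sketched as follows (the quantization scheme, function names, and bit width are illustrative and non-limiting):

```python
import numpy as np

def bit_imbalance(params, n_bits=8):
    # Illustrative data preference measure: the signed imbalance between
    # zero bits and one bits of the model parameters, each quantized to a
    # predetermined n_bits-bit binary form (quantization scheme assumed).
    lo, hi = params.min(), params.max()  # assumes non-constant parameters
    q = np.round((params - lo) / (hi - lo) * (2 ** n_bits - 1)).astype(np.uint64)
    ones = sum(bin(int(v)).count("1") for v in q.ravel())
    total = q.size * n_bits
    # Result lies in [-1, 1]: positive when zero bits dominate, negative
    # when one bits dominate, and zero when perfectly balanced.
    return (total - 2 * ones) / total
```

Under this sketch, a positive measure would indicate a preference for memory cells that write or read zeros more reliably, and a negative measure the opposite.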
In the implementations above, the bit-inversion tolerance asymmetry of the CNN model may be determined by repeatedly inverting a predetermined number of bits of the model parameters having a value of zero to one to generate statistically a first prediction error rate of the CNN model using the set of training input data and the zero-to-one inverted model parameters; repeatedly inverting a predetermined number of bits of the model parameters having a value of one to zero to generate statistically a second prediction error rate of the CNN model using the set of training input data and the one-to-zero inverted model parameters; and determining an imbalance between the first prediction error rate and the second prediction error rate as the bit-inversion tolerance asymmetry of the CNN model.
In the implementations above, the data preference measure of the model parameters may quantify a composite of a bit-inversion tolerance asymmetry of the CNN model and an imbalance between a number of zeros and a number of ones of the model parameters each expressed in a predetermined multi-bit binary form.
In the implementations above, each memory cell of the array of memory cells may include a magnetic tunnel junction comprising a thin insulating layer sandwiched by a permanent ferromagnetic plate and a writable ferromagnetic plate.
In the implementations above, the data access asymmetry comprises an asymmetry between error rate in writing binary one and error rate in writing binary zero.
In another implementation, a method for classifying input data into a set of classes is disclosed. The method includes training a CNN model using a set of training input data each labeled with one of the set of classes to obtain a plurality of model parameters; dividing the plurality of model parameters into a first group of model parameters with a first data preference measure and a second group of model parameters with a second data preference measure opposite to the first data preference measure; adjusting memory cell design and fabrication process to generate an array of memory cells comprising a first set of memory cells having a first data access asymmetry and a second set of memory cells having a second data access asymmetry opposite to the first data access asymmetry; embedding the array of memory cells with an AI logic circuit to form an AI device; loading the trained CNN model into the AI device by at least loading the first group of model parameters into the first set of memory cells and the second group of model parameters into the second set of memory cells; and forward-propagating input data through the trained CNN model using the model parameters loaded in the array of memory cells to determine an output class among the set of classes for the input data.
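For illustration only, the division of the model parameters into two oppositely preferenced groups may be sketched as follows (the per-word grouping rule and tie-breaking are illustrative and non-limiting):

```python
import numpy as np

def split_by_preference(param_words, n_bits=8):
    # Illustrative grouping: parameter words dominated by zero bits form
    # the first group (to be stored in the set of cells that write zeros
    # more reliably), and words dominated by one bits form the second
    # group (stored in cells with the opposite data access asymmetry).
    ones = np.array([bin(int(w)).count("1") for w in param_words])
    zero_pref = ones <= n_bits // 2  # ties assigned to the first group
    return param_words[zero_pref], param_words[~zero_pref]
```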
In the implementation above, the data preference measure of the model parameters may quantify an imbalance between a number of zeros and a number of ones of the model parameters each expressed in a predetermined multi-bit binary form, or may quantify a bit-inversion tolerance asymmetry of the CNN model.
In the implementations above, the bit-inversion tolerance asymmetry of the CNN model is determined by repeatedly inverting a predetermined number of bits of the model parameters having a value of zero to one to generate statistically a first prediction error rate of the CNN model using the set of training input data and the zero-to-one inverted model parameters; repeatedly inverting a predetermined number of bits of the model parameters having a value of one to zero to generate statistically a second prediction error rate of the CNN model using the set of training input data and the one-to-zero inverted model parameters; and determining an imbalance between the first prediction error rate and the second prediction error rate as the bit-inversion tolerance asymmetry of the CNN model.
In the implementations above, the data preference measure of the model parameters quantifies a composite of a bit-inversion tolerance asymmetry of the CNN model and an imbalance between a number of zeros and a number of ones of the model parameters each expressed in a predetermined multi-bit binary form.
In the implementations above, each memory cell of the array of memory cells comprises a magnetic tunnel junction comprising a thin insulating layer sandwiched by a permanent ferromagnetic plate and a writable ferromagnetic plate.
In the implementations above, the first data access asymmetry and the second data access asymmetry each comprises an asymmetry between error rate in writing binary one and error rate in writing binary zero.
In another implementation, a method for classifying input data into a set of classes is disclosed. The method includes training a CNN model using a set of training input data each labeled with one of the set of classes to obtain a plurality of model parameters; determining a data preference measure of the model parameters; determining a data access asymmetry of an array of memory cells embedded with an AI logic circuit in an AI device; and determining whether the data preference measure is compatible with the data access asymmetry. The method further includes, when the data preference measure is not compatible with the data access asymmetry, setting a data inversion flag; inverting each binary bit of the model parameters to generate inverted model parameters; and loading the trained CNN model into the AI device by at least loading the inverted model parameters into the array of memory cells. The method further includes, when the data preference measure is compatible with the data access asymmetry, loading the trained CNN model into the AI device by at least loading the model parameters into the array of memory cells. The method additionally includes forward-propagating input data through the trained CNN model to determine an output class among the set of classes for the input data, using the model parameters loaded in the array of memory cells when the data inversion flag is not set, and using the inverted model parameters followed by binary inversion when the data inversion flag is set.
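For illustration only, the inversion-flag scheme above may be sketched as follows (the signed preference/asymmetry measures and the compatibility test are illustrative and non-limiting):

```python
import numpy as np

def load_with_inversion_flag(param_bits, preference, asymmetry):
    # Illustrative signed measures: positive preference means zeros
    # dominate the parameter bits; positive asymmetry means the cells
    # write zeros more reliably.  Same sign is treated as compatible.
    if preference * asymmetry >= 0:
        return param_bits, False        # compatible: load as-is, no flag
    return 1 - param_bits, True         # incompatible: invert bits, set flag

def read_back(stored_bits, inversion_flag):
    # The AI logic re-inverts the stored bits during forward propagation
    # when the data inversion flag is set, recovering the original bits.
    return 1 - stored_bits if inversion_flag else stored_bits
```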
In the implementations above, the data preference measure of the model parameters may quantify an imbalance between a number of zeros and a number of ones of the model parameters each expressed in a predetermined multi-bit binary form, or may quantify a bit-inversion tolerance asymmetry of the CNN model.
In the implementations above, the bit-inversion tolerance asymmetry of the CNN model may be determined by repeatedly inverting a predetermined number of bits of the model parameters having a value of zero to one to generate statistically a first prediction error rate of the CNN model using the set of training input data and the zero-to-one inverted model parameters; repeatedly inverting a predetermined number of bits of the model parameters having a value of one to zero to generate statistically a second prediction error rate of the CNN model using the set of training input data and the one-to-zero inverted model parameters; and determining an imbalance between the first prediction error rate and the second prediction error rate as the bit-inversion tolerance asymmetry of the CNN model.
In the implementations above, the data preference measure of the model parameters quantifies a composite of a bit-inversion tolerance asymmetry of the CNN model and an imbalance between a number of zeros and a number of ones of the model parameters each expressed in a predetermined multi-bit binary form.
In the implementations above, the data access asymmetry includes an asymmetry between error rate in writing binary one and error rate in writing binary zero.
Artificial intelligence techniques have been widely used for processing large amounts of input data to extract categorical and other information. These techniques, in turn, may then be incorporated into a wide range of applications to perform various intelligent tasks. For example, deep learning techniques based on convolutional neural networks (CNNs) may provide trained CNN models for processing particular types of input data. For example, a CNN model trained for classifying images may be used to analyze an input image and determine a category of the input image among a predetermined set of image categories. As another example, a CNN model may be trained to produce a segmentation of an input image in the form of, e.g., output segmentation masks. Such segmentation masks, for example, may be designed to indicate where particular types of objects are in the image and their boundaries.
A deep learning CNN model may typically contain multiple cascading convolutional, pooling, rectifying, and fully connected layers of neurons, with millions of kernel, weight, and bias parameters. These parameters may be determined by training the model using a sufficient collection of labeled input data. Once a CNN model is trained and the model parameters are determined, it may be used to process unknown input data and to predict labels for the unknown input data. These labels may be classification labels, segmentation masks, or any other type of labels for the input data.
In a training process of a CNN model, each of a large number of labeled training datasets is forward propagated through the layers of neurons of the CNN network embedded with the training parameters to calculate an end labeling loss. Back propagation is then performed through the layers of neurons to adjust the training parameters to reduce the labeling loss based on gradient descent. The forward/back propagation training process iterates over all training input datasets until the neural network produces a set of training parameters with a converging minimal overall loss between the labels predicted by the neural network and the labels given to the training datasets. A converged model then includes a final set of training parameters and may then be tested and used to process unlabeled input datasets via forward propagation. Such a CNN model typically must be of sufficient size, in terms of the number of layers and the number of neurons/features in each layer, to achieve acceptable predictive accuracy. The number of training parameters is directly correlated with the size of the neural network, and is typically extraordinarily large even for a simple AI model (on the order of millions, tens of millions, hundreds of millions, or even billions of parameters). The forward and back propagations thus require a massive amount of memory to hold these parameters and extensive computation power for iteratively calculating states of a massive number of neurons.
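For illustration only, the forward/back propagation loop described above may be sketched with a single fully connected softmax layer standing in for a full CNN (all shapes, the learning rate, and the iteration count are illustrative and non-limiting):

```python
import numpy as np

# Illustrative stand-in for CNN training: one fully connected softmax
# layer trained by forward propagation, back propagation, and gradient
# descent on randomly generated labeled training data.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 20))          # labeled training inputs
y = (X[:, 0] > 0).astype(int)              # training labels (two classes)
W = rng.standard_normal((20, 2)) * 0.1     # trainable parameters
b = np.zeros(2)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for _ in range(200):
    p = softmax(X @ W + b)                      # forward propagation
    loss = -np.log(p[np.arange(64), y]).mean()  # end labeling loss
    g = p.copy()
    g[np.arange(64), y] -= 1                    # gradient at the output
    g /= 64
    W -= 0.5 * X.T @ g                          # back propagation and
    b -= 0.5 * g.sum(axis=0)                    # gradient-descent update
```

After convergence, the final parameters (here, W and b) play the role of the trained model parameters to be stored in memory.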
The training process for a CNN model is thus typically handled by centralized or distributed backend servers having sufficient memory and computing power in order to train the CNN model in a reasonable amount of time. These calculations may be performed by special co-processors included in the backend servers that support parallel data processing. For example, a Graphics Processing Unit (GPU) with large embedded memory or with external memory connected to the GPU core via high speed data buses may be included in the backend servers and used to accelerate the forward/back propagations in neural networks, thanks to similarity in parallel data manipulation between graphics data and neural networks.
Once trained, a CNN model may be deployed in the backend servers and provided as a service, taking advantage of the memory capacity and the parallel computing power of the backend servers. The service would include forward propagating an input dataset through the layers of neurons of the trained CNN model to obtain an output label for the input dataset. Such a service may be provided to edge devices. Edge devices may include but are not limited to mobile phones and any other devices, such as Internet-of-Things (IoT) devices. These devices may be designed to handle limited tasks with limited computing power and memory capacity, and may thus be incapable of efficiently performing forward propagation locally. As such, these edge devices may communicate with the backend servers via communication network interfaces to provide input datasets to the backend servers and obtain labels for the input datasets from the backend servers after the input datasets are processed by the CNN model in the backend servers.
In many applications, local processing of the input data may be desired. For example, when an input dataset is large (e.g., high-resolution 2D or 3D images), transmission of the input dataset from the edge device to the backend servers may consume an unacceptable or unsupported level of communication bandwidth and/or power. Further, some edge devices may have only intermittent communication network connection or no communication network connection at all.
In such applications, the CNN model may reside on the edge devices. As such, the edge devices designed for these applications may include sufficient memories adapted to the needs for storing various types of model parameters of the CNN model. These memories may further be embedded with a CNN logic circuit on a same semiconductor substrate for reducing power dissipation, reducing latency, and increasing data access speed. These embedded memories may be of single type or mixed types, as disclosed, for example, in the U.S. patent application Ser. Nos. 16/050,679, 15/989,515, 15/838,131, 15/726,084, 15/642,100, 15/642,076 filed respectively on Jul. 31, 2018, May 25, 2018, Dec. 11, 2017, Oct. 5, 2017, Jul. 5, 2017, and Jul. 5, 2017, which are herein incorporated by reference in their entireties.
An example of a core AI engine with embedded memory for such an edge device is shown in
The CNN model 200 is essentially manifested as a set of model parameters 250. These parameters may include but are not limited to the convolution features or kernels 214, and the weights and biases for the various connections between neurons within the fully connected layer 240. These model parameters may be stored in the embedded memory 202 as convolutional kernels 252, weights 254, and biases 256.
The model parameters 250 of
Further, such binary data asymmetry may vary for different types of parameters within a same CNN model. For example, in a particular CNN model, the convolutional kernel model parameters (252 of
Independent of potential binary data asymmetry for the model parameters, a CNN model may be further characterized by one or more bit-inversion tolerance characteristics and corresponding bit-inversion tolerance asymmetry. The bit-inversion tolerance characteristics may be used to characterize how likely the CNN model is to produce a wrong prediction output when one or more wrong bits of the model parameters are used in the forward propagation. Bit-inversion may include either zero-to-one inversion (a bit having a zero value is written or read mistakenly as one) or one-to-zero inversion (a bit having a one value is written or read mistakenly as zero). A CNN model may have statistically different tolerance levels between zero-to-one inversion and one-to-zero inversion, represented by a bit-inversion tolerance asymmetry. The bit-inversion tolerance characteristics as well as the asymmetry may differ among different types of model parameters. For example, in a particular CNN model, the kernel model parameters (252 of
The bit-inversion tolerance characteristics and asymmetry may be determined statistically for a particular trained CNN model. In one implementation for determining an overall zero-to-one bit-inversion tolerance characteristic for the trained CNN model, a predetermined number of zero bits in the model parameters may be intentionally inverted to ones in a random manner among all bits of all model parameters. The randomly inverted model parameters may then be used for forward propagation of the training data (or other pre-labeled data not used for training) to produce outputs. A prediction error rate (as determined by comparing the outputs to the pre-labels) may be recorded for a set of input data. The process may be repeated for different sets of random zero-to-one inversions of the same predetermined number of inverted bits. The process above may further be repeated for inverting different predetermined numbers of bits (one bit, two bits, three bits, etc.). The various error rates determined above for different numbers of inverted bits may be weight-averaged (or collectively processed in another manner) to represent the overall zero-to-one bit-inversion tolerance characteristic for the trained model. This determination process may be performed by a sufficiently reliable system. In other words, the system used for such determination may not introduce other unpredicted errors in any significant manner.
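For illustration only, the statistical determination above may be sketched as follows (the `predict` callable stands in for forward propagation with the given parameter bits; all names and the ratio-based asymmetry measure are illustrative and non-limiting):

```python
import numpy as np

def inversion_error_rate(predict, param_bits, inputs, labels,
                         direction, n_flip, trials, rng):
    # Statistically estimate the prediction error rate of the model when
    # n_flip randomly chosen bits are inverted in one direction, either
    # "0->1" (zeros mistakenly become ones) or "1->0" (the reverse).
    target = 0 if direction == "0->1" else 1
    error = 0.0
    for _ in range(trials):
        bits = param_bits.copy()
        candidates = np.flatnonzero(bits == target)
        flip = rng.choice(candidates, size=min(n_flip, candidates.size),
                          replace=False)
        bits[flip] = 1 - bits[flip]            # invert the selected bits
        error += np.mean(predict(bits, inputs) != labels)
    return error / trials

def tolerance_asymmetry(err_0_to_1, err_1_to_0):
    # Quantified here as a relative error-rate ratio, per the text above.
    return err_0_to_1 / err_1_to_0
```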
The process for determining the overall bit-inversion tolerance characteristic of the trained CNN model may be performed similarly for one-to-zero inversion to obtain an overall one-to-zero bit-inversion tolerance characteristic. Furthermore, the above process for determining bit-inversion tolerance characteristics may be performed separately for each type of model parameters and separately for one-to-zero bit-inversion tolerance and for zero-to-one bit-inversion tolerance. Once the various bit-inversion tolerance characteristics are obtained, an overall bit-inversion tolerance asymmetry for all model parameters or separate bit-inversion tolerance asymmetries for different types of model parameters may be determined. A bit-inversion tolerance asymmetry, for example, may be quantified as a relative error rate ratio between a zero-to-one bit-inversion prediction error rate and a one-to-zero bit-inversion prediction error rate.
When a CNN model is deployed in a non-ideal system, memory writing and/or reading errors may occur. These writing and/or reading errors lead either to wrong model parameters being stored in the memory or to wrongly read parameters even though the parameters may be stored correctly in the memory. For simplicity of discussion, the disclosure below will primarily focus on memory data writing errors. The underlying principles discussed below apply to memory data reading errors as well. The memory data writing errors and reading errors may be combined into effective memory access errors, and the principles disclosed below also apply to memory data access errors.
Data writing/reading of a memory may be asymmetric between writing/reading of ones and writing/reading of zeros. For different memory technologies, different memory cell architectures, different material compositions, and different fabrication processes, the data writing/reading asymmetry (or data access asymmetry) may vary. Data writing/reading asymmetry may include but is not limited to current, voltage, and timing asymmetry for writing/reading zeros and ones. Data writing/reading asymmetry may cause writing/reading error asymmetry (or data access error asymmetry) in inadvertent writing/reading error rates for ones and zeros. For example, the memory cells may need a higher programming current for writing ones than zeros, and if the same current is used for programming zeros and ones, then more errors (a higher error rate) may be made when writing ones than when writing zeros.
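For illustration only, such writing error asymmetry may be modeled as two independent per-bit flip probabilities (the error model and rates are illustrative and non-limiting):

```python
import numpy as np

def write_with_asymmetric_errors(bits, p_zero_to_one, p_one_to_zero, rng):
    # Toy model of asymmetric write errors: each stored 0 is inadvertently
    # written as 1 with probability p_zero_to_one, and each 1 is written
    # as 0 with probability p_one_to_zero.
    noise = rng.random(bits.shape)
    flip = np.where(bits == 0, noise < p_zero_to_one, noise < p_one_to_zero)
    return np.where(flip, 1 - bits, bits)
```

A data access error asymmetry then corresponds to p_zero_to_one differing from p_one_to_zero, which may be combined with the bit-inversion tolerance sketch above to study model accuracy under asymmetric errors.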
Furthermore, memory cells, once written, may be subject to inadvertent bit inversion due to environmental influences such as thermal effects, external radiation, or external static electric and/or magnetic fields. Such inadvertent inversion may be asymmetric for zeros and ones. For example, zeros in the memory cells, once written, may be more robust against environmental influences than ones, or vice versa. To simplify the discussions below, the writing/reading error asymmetry and the asymmetry for environment-induced memory bit inversion may be categorically referred to as writing/reading error asymmetry (or alternatively, data access error asymmetry).
In traditional applications of memories, writing/reading errors for both zeros and ones may be reduced sufficiently to a level where any data access error asymmetry becomes unimportant to system performance. In AI applications, however, memories for storing model parameters of AI models may be more error-tolerant compared to other applications, particularly for AI models having a limited number of possible outputs (e.g., a limited number of classification categories). For example, the model parameters may be 1%-5% wrong, yet the model may still produce correct output classifications. As such, memories with high error rates may nevertheless be usable for AI applications. At these high error rates, memory data access error asymmetry may become particularly important, as it may significantly affect the accuracy of an AI model, particularly if the AI model is asymmetric as to bit-inversion tolerance. Because memory data access error asymmetry may be adjusted by designing the memory cell structure, profile, material composition, and fabrication process (as described in more detail below in the context of MRAM), such design adjustment may be taken into consideration in view of characteristics of the AI data model and model parameters for further improving prediction accuracy and error tolerance of the model, even when such consideration may not improve the overall error rate of the memory.
The MRAM cell 400 may be designed to achieve a read access time faster than 10 nanoseconds, faster than 5 nanoseconds, or faster than 2 nanoseconds. These MRAM cells may further be designed with high density and small cell size. For an MRAM cell, the MTJ may be formed with a width ranging from 20 nm to 200 nm.
The MRAM cell 400 may further include a bit line 420, a word line 460, and a source line 430. The source line 430, for example, may be connected to the source of the transistor 410, and the word line 460 may be connected to the gate of the transistor 410. The drain of the transistor may be connected to the pinned magnetic layer 408. The free magnetic layer 404 may be connected to the bit line 420. While the magnetic moment of the pinned layer may be fixed, the magnetic moment of the free layer may be programmed either parallel or antiparallel to that of the pinned layer, representing one and zero of the MRAM cell, respectively. The programming of the cell into binary zero and binary one may be achieved by electric current pulses flowing in opposite directions through the MTJ 402. The implementation of the MTJ 402 in
The programming or writing of the MRAM cell of
Such data access asymmetry and resulting data access error asymmetry may be modified by adjusting the MRAM cell structure in size or in design, using different materials, or using different fabrication processes. In some other implementations, the etch profile of the MTJ structure may be modified to adjust the data access asymmetry. In yet some other implementations, the magnetic moments of the pinned magnetic layer 408 and/or the free magnetic layer 404 may be adjusted in absolute and/or relative values to modify the data access asymmetry. In some other implementations, the composition of the junction layer (e.g., Mg composition in MgO) may be adjusted to modify the data access asymmetry. In yet some other implementations, the roughness of the MTJ structure may be adjusted to modify the data access asymmetry. The above adjustments may further be combined in any manner to modify the data access asymmetry.
Those having ordinary skill in the art understand that the memory cell based on MRAM above is merely an example. Other memory technologies may also be used. These technologies may include but are not limited to phase change random access memory (PCRAM), resistive random access memory (RRAM), and static random access memory (SRAM). These technologies may also be characterized by data writing/reading (access) asymmetry and corresponding writing/reading (access) error asymmetry. Likewise, the data access asymmetry and access error asymmetry may be modified by adjusting, e.g., the design, profile, material composition, and fabrication process for these memory cells.
The description above thus indicates that, in one aspect, a trained CNN model may be characterized by binary data asymmetry in its model parameters (including an overall binary data asymmetry as well as binary data asymmetry within one or more types of model parameters) and may further be characterized by bit-inversion tolerance asymmetry (again, including an overall bit-inversion tolerance asymmetry as well as bit-inversion tolerance asymmetry due to one or more types of model parameters). The description above also indicates that, in another aspect, the memory cells used for storing the model parameters of a trained CNN model may be characterized by data writing/reading (access) asymmetry and corresponding writing/reading (access) error asymmetry, and that these asymmetries may be adjusted or shifted by modifying, e.g., the memory cell geometric/architectural design, material composition, and fabrication process. In the disclosure below, various implementations are further described for adapting, matching, shifting, and/or adjusting memory cell data access asymmetry and data access error asymmetry according to the binary data asymmetry of the model parameters and the bit-inversion tolerance asymmetry of the CNN model to achieve a more fault-tolerant and more accurate AI system with embedded memory. Such improvement may be achieved by adjustments in memory design and fabrication that modify the data access asymmetry and data access error asymmetry without having to improve the overall data access error rate and without having to include a significant amount of redundant memory cells with error correction codes. In other words, shifting potential data access errors between zero bits and one bits (modifying data access error asymmetry) may improve model accuracy solely based on the asymmetry characteristics of the AI model parameters.
Continuing with
Continuing with
Further continuing with
In one variation of the implementation of
In one variation of the implementation of
In one variation of the implementation of
Particularly in
In some other implementations alternative to
According to the various implementations above in
The description and accompanying drawings above provide specific example embodiments and implementations. Drawings containing circuit and system layouts, cross-sectional views, and other structural schematics, for example, are not necessarily drawn to scale unless specifically indicated. Subject matter may, however, be embodied in a variety of different forms and, therefore, covered or claimed subject matter is intended to be construed as not being limited to any example embodiments set forth herein. A reasonably broad scope for claimed or covered subject matter is intended. Among other things, for example, subject matter may be embodied as methods, devices, components, or systems. Accordingly, embodiments may, for example, take the form of hardware, software, firmware or any combination thereof.
Throughout the specification and claims, terms may have nuanced meanings suggested or implied in context beyond an explicitly stated meaning. Likewise, the phrase “in one embodiment/implementation” as used herein does not necessarily refer to the same embodiment and the phrase “in another embodiment/implementation” as used herein does not necessarily refer to a different embodiment. It is intended, for example, that claimed subject matter includes combinations of example embodiments in whole or in part.
In general, terminology may be understood at least in part from usage in context. For example, terms, such as “and”, “or”, or “and/or,” as used herein may include a variety of meanings that may depend at least in part on the context in which such terms are used. Typically, “or” if used to associate a list, such as A, B or C, is intended to mean A, B, and C, here used in the inclusive sense, as well as A, B or C, here used in the exclusive sense. In addition, the term “one or more” as used herein, depending at least in part upon context, may be used to describe any feature, structure, or characteristic in a singular sense or may be used to describe combinations of features, structures or characteristics in a plural sense. Similarly, terms, such as “a,” “an,” or “the,” may be understood to convey a singular usage or to convey a plural usage, depending at least in part upon context. In addition, the term “based on” may be understood as not necessarily intended to convey an exclusive set of factors and may, instead, allow for existence of additional factors not necessarily expressly described, again, depending at least in part on context.
Reference throughout this specification to features, advantages, or similar language does not imply that all of the features and advantages that may be realized with the present solution should be or are included in any single implementation thereof. Rather, language referring to the features and advantages is understood to mean that a specific feature, advantage, or characteristic described in connection with an embodiment is included in at least one embodiment of the present solution. Thus, discussions of the features and advantages, and similar language, throughout the specification may, but do not necessarily, refer to the same embodiment.
Furthermore, the described features, advantages and characteristics of the present solution may be combined in any suitable manner in one or more embodiments. One of ordinary skill in the relevant art will recognize, in light of the description herein, that the present solution can be practiced without one or more of the specific features or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments of the present solution.
From the foregoing, it can be seen that this disclosure relates to AI circuits with embedded memory for storing trained AI model parameters. The embedded memory cell structure, device profile, and/or fabrication process are designed to generate binary data access asymmetry and error rate asymmetry between writing binary zeros and binary ones that are adapted to and compatible with a binary data asymmetry of the trained model parameters and/or a bit-inversion tolerance asymmetry of the AI model between binary zeros and ones. The disclosed method and system improve predictive accuracy and memory error tolerance without requiring a significant reduction of the overall memory error rate and without relying on memory cell redundancy and error correction codes.