The present application relates generally to analog memory devices, and more particularly, to techniques that can optimize the conductance range of unit cells of analog memory devices.
Analog memory devices can be utilized for in-memory computing. Compared to traditional computing hardware, in-memory computing hardware can increase speed and energy efficiency, providing potential performance improvements. Rather than moving data from memory devices to a processor to perform a computation, analog memory devices can perform computation in the same place (e.g., in the analog memory) where the data is stored. Because there is no movement of data, tasks can be performed faster and require less energy.
The summary of the disclosure is given to aid understanding of a system and method of optimizing conductance ranges of unit cells of analog memory devices, and not with an intent to limit the disclosure or the invention. Optimized conductance ranges of the analog memory devices can improve the accuracy of operations performed using them, which can provide improved efficiency. It should be understood that various aspects and features of the disclosure may advantageously be used separately in some instances, or in combination with other aspects and features of the disclosure in other instances. Accordingly, variations and modifications may be made to the system and/or their method of operation to achieve different effects.
In one embodiment, a method for optimizing conductance ranges of a plurality of unit cells is generally described. The method can include defining a plurality of initial conductance ranges for a plurality of unit cells arranged in a crossbar arrangement. The plurality of unit cells can include non-volatile memory (NVM) devices. An initial conductance range can be defined per column of unit cells in the crossbar arrangement. The method can further include using the plurality of initial conductance ranges to encode a plurality of parameter values in a circuit model of the analog memory devices. The method can further include inputting a plurality of sample inputs into the circuit model to determine an output current distribution correlated to a plurality of products between the plurality of sample inputs and the plurality of parameter values. The method can further include determining, based on at least one property of the output current distribution, an optimal conductance range for the plurality of unit cells.
In one embodiment, a system for optimizing conductance ranges of a plurality of unit cells is generally described. The system can include memory configured to store a plurality of parameter values. The system can further include a processor configured to define a plurality of initial conductance ranges for a plurality of unit cells arranged in a crossbar arrangement. The plurality of unit cells can include non-volatile memory (NVM) devices. An initial conductance range can be defined per column of unit cells in the crossbar arrangement. The processor can be further configured to use the plurality of initial conductance ranges to encode the plurality of parameter values in a circuit model of the analog memory device. The processor can be further configured to input a plurality of sample inputs into the circuit model to determine an output current distribution correlated to a plurality of products between the plurality of sample inputs and the plurality of parameter values. The processor can be further configured to determine, based on at least one property of the output current distribution, an optimal conductance range for the plurality of unit cells.
In one embodiment, a computer readable storage medium storing a program of instructions executable by a machine to perform one or more methods described herein also may be provided.
Further features, as well as the structure and operation of various embodiments, are described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.
According to an aspect of the invention, there is provided a method for optimizing conductance ranges of a plurality of unit cells. The method can include defining a plurality of initial conductance ranges for a plurality of unit cells arranged in a crossbar arrangement. The plurality of unit cells can include non-volatile memory (NVM) devices. An initial conductance range can be defined per column of unit cells in the crossbar arrangement. The method can further include using the plurality of initial conductance ranges to encode a plurality of parameter values in a circuit model of the analog memory device. The method can further include inputting a plurality of sample inputs into the circuit model to determine an output current distribution correlated to a plurality of products between the plurality of sample inputs and the plurality of parameter values. The method can further include determining, based on at least one property of the output current distribution, an optimal conductance range for the plurality of unit cells.
Advantageously, the method in an aspect can optimize conductance range of conductive devices in a plurality of unit cells of an analog memory device. The optimized conductance range can constrain the output current of the analog memory device in a linear region, thus reducing error and improving accuracy of the output.
One or more of the following aspects or features can be separable or optional from each other in one or more embodiments.
In another aspect, the plurality of parameter values can represent parameters of a trained machine learning model. The method can further include selecting an input distribution that best fits a training set of the trained machine learning model. The method can further include selecting the plurality of sample inputs from the input distribution. The selection of the input distribution and sampling from the selected input distribution can reduce the number of input samples being used in the conductance range optimization. Also, performing the conductance range optimization after training of the machine learning model prevents interference with the training of the machine learning model.
Yet in another aspect, the method can further include selecting the plurality of sample inputs by applying a kernel density estimation technique on the training set. The application of kernel density estimation can sample a subset of the input dataset, such that the amount of information required about the input dataset can be reduced.
Yet in another aspect, the method can further include selecting the plurality of sample inputs by selecting at least one of a subset of a training set used in training the trained machine learning model, a validation set used in an accuracy evaluation during training of the trained machine learning model and a test set used in performance evaluation of the trained machine learning model. The selection of the sample inputs can provide flexibility in the types of sample inputs that can be used, and provides sample inputs that may be readily available, for performing conductance range optimization.
Yet in another aspect, the method can further include determining the output current distribution by measuring an output current of every column of unit cells sequentially. The output current measurement from every column of unit cells sequentially allows optimization of conductance ranges for individual columns of unit cells.
Yet in another aspect, the method can further include defining an objective corresponding to the output current distribution and an analog-to-digital converter (ADC) connected to outputs of the plurality of unit cells. The method can further include comparing the at least one property of the output current distribution with different current region boundaries of the ADC to obtain a difference. The method can further include updating the optimal conductance range until the difference satisfies the objective. Using an objective to optimize the conductance range can provide a stopping criterion to complete conductance range optimization, without overloading the system performing the conductance range optimization.
Yet in another aspect, the method can further include initializing a value of a maximum current. The method can further include adjusting the maximum current based on a difference between an output current of a column of unit cells and the maximum current. The method can further include comparing the adjusted maximum current with a saturation current of an ADC connected to outputs of the plurality of unit cells to obtain a difference. The method can further include determining the optimal conductance range of the column of unit cells based on the difference. Using the comparison between the maximum current and the saturation current can allow a system to perform conductance range optimization when optimization algorithms are not readily available.
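The iterative maximum-current adjustment summarized above can be sketched as follows. This is an illustrative sketch only; the function name, the multiplicative update rule, and the learning-rate constant are assumptions rather than part of the disclosure:

```python
def tune_gmax(column_currents, i_sat, gmax, lr=0.1, tol=1e-6, max_iter=1000):
    """Adjust a column's Gmax until the column's peak output current
    approaches, without exceeding, the ADC saturation current i_sat."""
    i_max = max(column_currents)      # initialize the maximum current
    for _ in range(max_iter):
        diff = i_sat - i_max          # distance to the ADC saturation limit
        if abs(diff) < tol:
            break
        # Scaling Gmax scales every cell conductance, and hence the
        # column current, by the same factor (since I = V * G).
        scale = 1.0 + lr * diff / i_sat
        gmax *= scale
        i_max *= scale
    return gmax
```

Because scaling Gmax scales every mapped conductance, and therefore the column current, by the same factor, the loop can move the peak output current toward the saturation limit without re-measuring the hardware at each step.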
Yet in another aspect, the method can further include determining a respective optimal conductance range for each column of unit cells. The method can further include determining a respective optimal conductance range for different groups of columns of unit cells. The method can further include determining an optimal conductance range for an entirety of the plurality of unit cells. Optimizing conductance ranges for different groups of columns of unit cells can provide flexibility on the granularity of conductance range tuning.
Yet in another aspect, the method can further include determining a product among a sum of conductance values of a subset of the plurality of unit cells that received a non-zero input among the plurality of input samples, a read voltage being used for reading output current from the plurality of unit cells, and a linear correlation coefficient. The method can further include approximating the output current distribution using the product. Approximating the output current distribution based on the linear correlation coefficient can allow a system to consider a relationship between hardware components (e.g., analog-to-digital converters) and the conductance of the unit cells.
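The output-current approximation described above can be illustrated with a minimal sketch, where the value of the linear correlation coefficient c and all names are assumptions:

```python
import numpy as np

def approx_column_current(g_column, x, v_read, c=1.0):
    """Approximate a column's output current as the product of a linear
    correlation coefficient c, the read voltage, and the sum of
    conductances of cells that received a non-zero input."""
    g_column = np.asarray(g_column, dtype=float)
    x = np.asarray(x, dtype=float)
    active = x != 0.0                 # cells that received a non-zero input
    return c * v_read * g_column[active].sum()
```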
A system that includes at least one processor and at least one memory device can be provided, where at least one processor can be configured to perform one or more aspects of the methods described herein.
A computer program product that includes a computer readable storage medium having program instructions embodied therewith, the program instructions readable by a device to cause the device to perform one or more aspects of the methods described above, can be provided.
Example technical use cases of the methods, systems, and computer program products described herein include machine learning applications. Hardware such as analog memory devices can be used to run machine learning models. The conductance range of conductive devices in the unit cells of the analog memory device can be tuned to optimized values in order to improve the accuracy of machine learning models. Further, the conductance range optimization is performed after training of the machine learning model and prior to the hardware being programmed to run the trained machine learning model.
A crossbar (or analog in-memory core (AIMC)) can include a plurality of unit cells including one or more conductive devices. These conductive devices, when arranged in a crossbar arrangement, can be used to perform multiply-accumulate (MAC) and vector-matrix multiplication (VMM) operations in as little as O(1) time complexity. This can be accomplished by encoding inputs as either analog voltages or pulse width modulation (PWM) pulses, and encoding parameters (e.g., weights) using the conductance state of devices. By Ohm's law, current I is the product of voltage V and conductance G (i.e., I=V×G). The current summed in each crossbar column represents one element of the resulting multiplication vector of size N. Analog memory devices can be used in various applications, such as machine learning model training and inference. Parameters of machine learning models, such as weights, can be mapped between a high or maximum conductive (Gmax) state and a low or minimum conductive (Gmin) state of the conductive devices.
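The MAC principle described above, in which word-line voltages multiply stored conductances and bit-line currents accumulate the products, can be sketched in a few lines; the function name and array shapes are illustrative, not part of the disclosure:

```python
import numpy as np

def crossbar_vmm(voltages, conductances):
    """voltages: length-M word-line vector; conductances: M x N matrix.
    Returns the N bit-line (column) currents: each cell contributes
    I = V * G (Ohm's law) and column currents sum (Kirchhoff's law)."""
    v = np.asarray(voltages, dtype=float)
    g = np.asarray(conductances, dtype=float)
    return v @ g   # current in column j: sum over i of v[i] * g[i, j]
```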
COMPUTER 101 may take the form of a desktop computer, laptop computer, tablet computer, smart phone, smart watch or other wearable computer, mainframe computer, quantum computer or any other form of computer or mobile device now known or to be developed in the future that is capable of running a program, accessing a network or querying a database, such as remote database 130. As is well understood in the art of computer technology, and depending upon the technology, performance of a computer-implemented method may be distributed among multiple computers and/or between multiple locations. On the other hand, in this presentation of computing environment 100, detailed discussion is focused on a single computer, specifically computer 101, to keep the presentation as simple as possible. Computer 101 may be located in a cloud, even though it is not shown in a cloud in
PROCESSOR SET 110 includes one, or more, computer processors of any type now known or to be developed in the future. Processing circuitry 120 may be distributed over multiple packages, for example, multiple, coordinated integrated circuit chips. Processing circuitry 120 may implement multiple processor threads and/or multiple processor cores. Cache 121 is memory that is located in the processor chip package(s) and is typically used for data or code that should be available for rapid access by the threads or cores running on processor set 110. Cache memories are typically organized into multiple levels depending upon relative proximity to the processing circuitry. Alternatively, some, or all, of the cache for the processor set may be located “off chip.” In some computing environments, processor set 110 may be designed for working with qubits and performing quantum computing.
Computer readable program instructions are typically loaded onto computer 101 to cause a series of operational steps to be performed by processor set 110 of computer 101 and thereby effect a computer-implemented method, such that the instructions thus executed will instantiate the methods specified in flowcharts and/or narrative descriptions of computer-implemented methods included in this document (collectively referred to as “the inventive methods”). These computer readable program instructions are stored in various types of computer readable storage media, such as cache 121 and the other storage media discussed below. The program instructions, and associated data, are accessed by processor set 110 to control and direct performance of the inventive methods. In computing environment 100, at least some of the instructions for performing the inventive methods may be stored in block 200 in persistent storage 113.
COMMUNICATION FABRIC 111 is the signal conduction paths that allow the various components of computer 101 to communicate with each other. Typically, this fabric is made of switches and electrically conductive paths, such as the switches and electrically conductive paths that make up busses, bridges, physical input/output ports and the like. Other types of signal communication paths may be used, such as fiber optic communication paths and/or wireless communication paths.
VOLATILE MEMORY 112 is any type of volatile memory now known or to be developed in the future. Examples include dynamic type random access memory (RAM) or static type RAM. Typically, the volatile memory is characterized by random access, but this is not required unless affirmatively indicated. In computer 101, the volatile memory 112 is located in a single package and is internal to computer 101, but, alternatively or additionally, the volatile memory may be distributed over multiple packages and/or located externally with respect to computer 101.
PERSISTENT STORAGE 113 is any form of non-volatile storage for computers that is now known or to be developed in the future. The non-volatility of this storage means that the stored data is maintained regardless of whether power is being supplied to computer 101 and/or directly to persistent storage 113. Persistent storage 113 may be a read only memory (ROM), but typically at least a portion of the persistent storage allows writing of data, deletion of data and re-writing of data. Some familiar forms of persistent storage include magnetic disks and solid state storage devices. Operating system 122 may take several forms, such as various known proprietary operating systems or open source Portable Operating System Interface type operating systems that employ a kernel. The code included in block 200 typically includes at least some of the computer code involved in performing the inventive methods.
PERIPHERAL DEVICE SET 114 includes the set of peripheral devices of computer 101. Data communication connections between the peripheral devices and the other components of computer 101 may be implemented in various ways, such as Bluetooth connections, Near-Field Communication (NFC) connections, connections made by cables (such as universal serial bus (USB) type cables), insertion type connections (for example, secure digital (SD) card), connections made through local area communication networks and even connections made through wide area networks such as the internet. In various embodiments, UI device set 123 may include components such as a display screen, speaker, microphone, wearable devices (such as goggles and smart watches), keyboard, mouse, printer, touchpad, game controllers, and haptic devices. Storage 124 is external storage, such as an external hard drive, or insertable storage, such as an SD card. Storage 124 may be persistent and/or volatile. In some embodiments, storage 124 may take the form of a quantum computing storage device for storing data in the form of qubits. In embodiments where computer 101 is required to have a large amount of storage (for example, where computer 101 locally stores and manages a large database) then this storage may be provided by peripheral storage devices designed for storing very large amounts of data, such as a storage area network (SAN) that is shared by multiple, geographically distributed computers. IoT sensor set 125 is made up of sensors that can be used in Internet of Things applications. For example, one sensor may be a thermometer and another sensor may be a motion detector.
NETWORK MODULE 115 is the collection of computer software, hardware, and firmware that allows computer 101 to communicate with other computers through WAN 102. Network module 115 may include hardware, such as modems or Wi-Fi signal transceivers, software for packetizing and/or de-packetizing data for communication network transmission, and/or web browser software for communicating data over the internet. In some embodiments, network control functions and network forwarding functions of network module 115 are performed on the same physical hardware device. In other embodiments (for example, embodiments that utilize software-defined networking (SDN)), the control functions and the forwarding functions of network module 115 are performed on physically separate devices, such that the control functions manage several different network hardware devices. Computer readable program instructions for performing the inventive methods can typically be downloaded to computer 101 from an external computer or external storage device through a network adapter card or network interface included in network module 115.
WAN 102 is any wide area network (for example, the internet) capable of communicating computer data over non-local distances by any technology for communicating computer data, now known or to be developed in the future. In some embodiments, the WAN may be replaced and/or supplemented by local area networks (LANs) designed to communicate data between devices located in a local area, such as a Wi-Fi network. The WAN and/or LANs typically include computer hardware such as copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and edge servers.
END USER DEVICE (EUD) 103 is any computer system that is used and controlled by an end user (for example, a customer of an enterprise that operates computer 101) and may take any of the forms discussed above in connection with computer 101. EUD 103 typically receives helpful and useful data from the operations of computer 101. For example, in a hypothetical case where computer 101 is designed to provide a recommendation to an end user, this recommendation would typically be communicated from network module 115 of computer 101 through WAN 102 to EUD 103. In this way, EUD 103 can display, or otherwise present, the recommendation to an end user. In some embodiments, EUD 103 may be a client device, such as thin client, heavy client, mainframe computer, desktop computer and so on.
REMOTE SERVER 104 is any computer system that serves at least some data and/or functionality to computer 101. Remote server 104 may be controlled and used by the same entity that operates computer 101. Remote server 104 represents the machine(s) that collect and store helpful and useful data for use by other computers, such as computer 101. For example, in a hypothetical case where computer 101 is designed and programmed to provide a recommendation based on historical data, then this historical data may be provided to computer 101 from remote database 130 of remote server 104.
PUBLIC CLOUD 105 is any computer system available for use by multiple entities that provides on-demand availability of computer system resources and/or other computer capabilities, especially data storage (cloud storage) and computing power, without direct active management by the user. Cloud computing typically leverages sharing of resources to achieve coherence and economics of scale. The direct and active management of the computing resources of public cloud 105 is performed by the computer hardware and/or software of cloud orchestration module 141. The computing resources provided by public cloud 105 are typically implemented by virtual computing environments that run on various computers making up the computers of host physical machine set 142, which is the universe of physical computers in and/or available to public cloud 105. The virtual computing environments (VCEs) typically take the form of virtual machines from virtual machine set 143 and/or containers from container set 144. It is understood that these VCEs may be stored as images and may be transferred among and between the various physical machine hosts, either as images or after instantiation of the VCE. Cloud orchestration module 141 manages the transfer and storage of images, deploys new instantiations of VCEs and manages active instantiations of VCE deployments. Gateway 140 is the collection of computer software, hardware, and firmware that allows public cloud 105 to communicate through WAN 102.
Some further explanation of virtualized computing environments (VCEs) will now be provided. VCEs can be stored as “images.” A new active instance of the VCE can be instantiated from the image. Two familiar types of VCEs are virtual machines and containers. A container is a VCE that uses operating-system-level virtualization. This refers to an operating system feature in which the kernel allows the existence of multiple isolated user-space instances, called containers. These isolated user-space instances typically behave as real computers from the point of view of programs running in them. A computer program running on an ordinary operating system can utilize all resources of that computer, such as connected devices, files and folders, network shares, CPU power, and quantifiable hardware capabilities. However, programs running inside a container can only use the contents of the container and devices assigned to the container, a feature which is known as containerization.
PRIVATE CLOUD 106 is similar to public cloud 105, except that the computing resources are only available for use by a single enterprise. While private cloud 106 is depicted as being in communication with WAN 102, in other embodiments a private cloud may be disconnected from the internet entirely and only accessible through a local/private network. A hybrid cloud is a composition of multiple clouds of different types (for example, private, community or public cloud types), often respectively implemented by different vendors. Each of the multiple clouds remains a separate and discrete entity, but the larger hybrid cloud architecture is bound together by standardized or proprietary technology that enables orchestration, management, and/or data/application portability between the multiple constituent clouds. In this embodiment, public cloud 105 and private cloud 106 are both part of a larger hybrid cloud.
Unit cells, including one or more conductive devices with conductance g0,0, . . . , g(M-1),(N-1), can be arranged in a crossbar array having M rows of word lines WL0, . . . , WLM-1 and N columns of bit lines BL0, . . . , BLN-1. The unit cells can be arranged at cross points of the crossbar array. The unit cells in analog memory device 202 can be, for example, resistive RAM (ReRAM), conductive-bridging RAM (CBRAM), NOR flash, magnetic RAM (MRAM), or phase-change memory (PCM). In machine learning applications, the unit cells can be programmed to store and encode synaptic weight values and biases of an artificial neural network (ANN).
The conductance values g0,0, . . . , g(M-1),(N-1) can encode or map a plurality of weights and biases, represented as a matrix W, of a trained machine learning model (e.g., a trained neural network). In an aspect, a conductance gij can be expressed as

gij = (wij / Wmax) × Gmax
where gij denotes a mapped conductance value of a conductive device at an i-th row and j-th column, wij denotes the parameter or weight value to be mapped to the conductive device at an i-th row and j-th column, Gmax denotes a maximum conductance value defined for the crossbar array and Wmax is the maximum parameter value to be mapped. A conductance range for encoding the parameters wij can be defined as the range of values of gij between the high conductive state Gmax and the low conductive state Gmin. The value of Gmin can be zero or a predefined value that can be relatively small.
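Assuming Gmin is zero, the mapping gij = (wij/Wmax)×Gmax defined above can be sketched as follows; the handling of signed weights by magnitude is an assumption for illustration (real unit cells often use device pairs for signed values):

```python
import numpy as np

def map_weights(w, g_max):
    """Map parameters w onto conductances in [0, g_max] via
    g_ij = (|w_ij| / Wmax) * Gmax, with Gmin assumed to be zero."""
    w = np.asarray(w, dtype=float)
    w_max = np.abs(w).max()          # Wmax: the maximum parameter magnitude
    return (np.abs(w) / w_max) * g_max
```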
In an aspect, hardware limitations and various device or circuit nonidealities of the peripheral circuitry can present challenges to mapping of weights to conductance states. In an aspect, ADCs can have non-linear and saturation regions, which are undesirable in the context of analog memory devices. If output currents of the analog memory device reside in the non-linear and saturation regions, significant errors can be introduced. In order to constrain output currents in the linear operation region, the conductance region of conductive devices in the unit cells being used for mapping parameters can be optimized.
One conventional technique to constrain output current in the linear operating region is bit slicing, but bit slicing tends to increase latency. Another conventional technique can aim to maximize the signal to noise ratio (SNR) in AIMCs without allowing peripheral circuitry to enter non-linear regions by tuning the conductance range for each column. Other conventional techniques, such as hardware-aware retraining (HWAR) techniques, can also be used for tuning conductance ranges. HWAR is a machine learning technique that uses characteristics and limitations of the target hardware on which a trained machine learning model will be deployed. In an aspect, HWAR techniques such as quantization-aware training (QAT) and noise injection or regularization, may be used for optimizing conductance ranges during retraining of the machine learning model. However, tuning and optimizing conductance ranges during training or learning increases the complexity, and thus decreases the performance of HWAR.
To be described in more detail below, the systems and methods described herein can optimize conductance ranges for each column, or different groups of columns, in an analog memory device post HWAR. The systems and methods described herein can utilize various known information, such as portions of dataset being used for training and evaluating a trained machine learning model and properties of the target hardware being used for mapping parameters of the trained machine learning model, to optimize conductance range of conductive devices in the target hardware. Further, a circuit model that simulates the target hardware can be used in the optimization without a need to reprogram the target hardware during optimization.
In one embodiment, memory 312 can be configured to store program code such as source code and/or executable code that can be accessed by processor 310 to run various applications and programs. By way of example, processor 310 can access program code stored in memory 312 to run a circuit modeling application that generates one or more circuit models simulating operations of hardware devices, such as analog memory devices. Processor 310 can also access various types of data being stored in memory 312.
In one embodiment, memory 312 can store parameters, such as weights wij and biases, of a trained machine learning (ML) model 314. The trained ML model 314 parameters are to be programmed and/or encoded in a target hardware 330. Once the parameters of trained ML model 314 are programmed and/or encoded in target hardware 330, target hardware 330 can be deployed to perform inference. Target hardware 330 can be an analog memory device, including a plurality of unit cells arranged in a crossbar arrangement. Each one of the plurality of unit cells can include non-volatile memory (NVM) devices implemented by conductive devices. In one embodiment, target hardware 330 can be an AIMC 202, as shown in
In one embodiment, prior to programming target hardware 330 with parameters of a trained ML model 314, the processor 310 can be configured to optimize conductance ranges of conductive devices in the unit cells of the target hardware 330. The conductance range optimization can begin with a processor 310 determining and/or approximating properties that may be required for target hardware 330 to generate output currents representing results of MAC operations, VMM operations, or other operations that can be performed by the target hardware 330. The parameters and properties that can be determined by processor 310 may include known information such as target hardware properties 318. Target hardware properties 318 can include at least one of, a number and composition of unit cells in target hardware 330, the physical conductance range of each unit cell in target hardware 330, input voltage range, transfer functions of the ADCs in the target hardware 330, weight and output noise model parameters, parameters relating to sources of non-linear errors (e.g., line resistances), process variation constants for circuit elements in target hardware 330, a number of inputs per read cycle used to determine output current distribution(s) of target hardware 330, and other properties of the target hardware 330. In one embodiment, the known information being determined by processor 310 can be dependent on an approach being used by target hardware 330 to compute and/or approximate output currents.
Processor 310 can set or define a plurality of initial conductance ranges for the unit cells of the target hardware 330. In one embodiment, to define the initial conductance ranges, processor 310 can set Gmin to be zero for all columns of unit cells, and set different values of Gmax for different columns of unit cells, or different groups of columns of unit cells. In one embodiment, in order to maximize signal to noise ratio (SNR), Gmax can be initially set to a maximum conductance value obtainable by all unit cells in the target hardware 330.
In one example, a first initial value of Gmax1 can be defined for the unit cells in column N=0 in
Processor 310 can obtain (e.g., retrieve from memory 312) parameters, wij, of the trained ML model 314. Processor 310 can use target hardware properties 318, the initial conductance ranges, and parameters wij to program a circuit model 332 in one embodiment. The circuit model 332 can be a virtual model or representation of the target hardware 330 that simulates operations, parameters, and properties of target hardware 330. Circuit model 332 can compute or estimate the output current distributions of the ADCs given the input word-line voltages. In one embodiment, circuit model 332 can be realized using a behavioral-based or physics-based model, such as a SPICE model (e.g., a text-description of an AIMC that can be used by a SPICE Simulator to mathematically predict the behavior of the AIMC under varying conditions). Processor 310 can also be configured to adjust these simulated parameters and properties of circuit model 332. In an embodiment, processor 310 can program circuit model 332 to have specifications in virtual space that may be identical or substantially identical to specifications of target hardware 330 in physical space. By way of example, programming circuit model 332 using the initial conductance ranges to encode weights wij in circuit model 332 can allow processor 310 to simulate operations of target hardware 330 where the same initial conductance ranges are used for encoding weights wij.
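Under simplifying assumptions (ideal ohmic devices, no line resistance, additive Gaussian output noise as a stand-in for the noise models mentioned above), the behavioral role of a circuit model such as circuit model 332 can be approximated by a matrix product; the function and parameter names here are illustrative:

```python
import numpy as np

def simulate_crossbar(G, v_in, v_read, i_noise_sigma=0.0, rng=None):
    """Behavioral sketch of a crossbar circuit model: each bit-line
    current is the dot product of word-line input voltages with the
    column conductances, scaled by the read voltage."""
    rng = rng or np.random.default_rng(0)
    i_out = v_read * (v_in @ G)              # ideal MAC per column
    if i_noise_sigma > 0:
        i_out = i_out + rng.normal(0.0, i_noise_sigma, size=i_out.shape)
    return i_out

G = np.array([[10e-6, 5e-6], [20e-6, 15e-6]])  # conductances (S)
v_in = np.array([[0.2, 0.4]])                   # one input vector (V)
currents = simulate_crossbar(G, v_in, v_read=0.2)
```

A physics-based SPICE model would additionally capture line resistances and device non-linearities that this sketch omits.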
To run the circuit model 332, the processor 310 can provide inputs to the circuit model 332 so that the circuit model 332 can perform operations, such as MAC or VMM operations, to generate an output distribution 334. By way of example, the target hardware 330 can be an AIMC 202, shown in
In one embodiment, processor 310 can determine the input to be provided to circuit model 332 by sampling subsets of at least one of the training set, the validation set, or the test set. For example, processor 310 may sample subsets from the training set. As another example, processor 310 may sample subsets from the validation set. As yet another example, processor 310 may sample subsets from the test set. As yet another example, processor 310 may sample subsets from two or more of those sets, e.g., from combinations of the training set, the validation set, and the test set. In another embodiment, processor 310 can select a specific distribution (e.g., a statistical distribution) from a plurality of candidate distributions, such as input distributions 316 stored in memory 312. The selected distribution can be a distribution that best fits the training set that was used for training of trained ML model 314. Processor 310 can input the selected distribution as an input to circuit model 332. In another embodiment, processor 310 can perform kernel density estimation (KDE) on the training set to generate a distribution and input the distribution to circuit model 332. In one embodiment, KDE can be performed on data that is representative of what the trained ML model 314 may receive after pre-processing is performed.
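The KDE-based input generation might look like the following sketch, which uses a Gaussian kernel with a fixed bandwidth; both choices are assumptions, and a production flow might instead fit the bandwidth to the training data:

```python
import numpy as np

def kde_sample(training_inputs, n_samples, bandwidth=0.1, rng=None):
    """Sample from a Gaussian KDE fitted to the training inputs:
    pick a training point at random, then perturb it with Gaussian
    noise whose width is the kernel bandwidth."""
    rng = rng or np.random.default_rng(42)
    idx = rng.integers(0, len(training_inputs), size=n_samples)
    noise = rng.normal(0.0, bandwidth,
                       size=(n_samples,) + training_inputs.shape[1:])
    return training_inputs[idx] + noise

# Illustrative training set: 1000 pre-processed 8-element input vectors.
train = np.random.default_rng(0).uniform(0.0, 1.0, size=(1000, 8))
samples = kde_sample(train, n_samples=256)
```

The generated samples would then be encoded as word-line voltages and fed to the circuit model.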
Processor 310 can run the circuit model 332 with the determined inputs, and the circuit model 332 can output current values 334 that simulate currents being outputted from bit lines BL0, . . . , BLN of the AIMC 202. Processor 310 can determine an output distribution 336 that is a statistical distribution of current values 334. In one embodiment where the ADCs in target hardware 330 are current-controlled oscillator (CCO)-based ADCs, memory 312 can store a value of a linear correlation coefficient γ that defines a relationship between an ADC count (e.g., the value of the ADC counter) in target hardware 330 and the corresponding current. Processor 310 can determine, for one or more columns of unit cells, a sum G of conductance values programmed in circuit model 332. The sum G can be multiplied with the coefficient γ and the voltages V that were used for encoding inputs being provided to the rows of unit cells intersecting the one or more columns of unit cells. Processor 310 can determine the current output of the column of unit cells by multiplying the product with a read voltage being used for reading the output current from the one or more columns of unit cells. By way of example, for unit cells in a j-th column, the output current Ij = Vread × Gj × V[V ≠ 0] × γ, where Vread is the voltage being used for reading output current from the unit cells. Note that, in the output current determination, the values of V being used are the nonzero values of V.
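The per-column current estimate above might be computed as follows. Averaging the nonzero input voltages is one plausible reading of the V[V ≠ 0] term (exact when all nonzero inputs share a level) and is an assumption of this sketch:

```python
import numpy as np

def approx_column_current(g_col, v_in, v_read, gamma):
    """Approximate the j-th column output current as
    Ij = Vread * Gj * V[V != 0] * gamma, where Gj is the sum of the
    programmed conductances and gamma is the linear correlation
    coefficient between ADC count and current."""
    g_sum = np.sum(g_col)                    # Gj: conductance sum
    v_nonzero = v_in[v_in != 0]              # only nonzero inputs count
    v_eff = v_nonzero.mean() if v_nonzero.size else 0.0
    return v_read * g_sum * v_eff * gamma

# Illustrative values: two cells at 10 uS and 20 uS, inputs 0.5 V / 0 V,
# 0.2 V read voltage, gamma = 1.
i_j = approx_column_current(np.array([10e-6, 20e-6]),
                            np.array([0.5, 0.0, 0.5]),
                            v_read=0.2, gamma=1.0)
```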
Processor 310 can analyze the output distribution 336 to determine optimized conductance range(s) 338. By way of example, processor 310 can determine at least one property of output distribution 336, such as the mean, median, mode, standard deviation, entropy, skew, minimum and maximum values, a k-th quartile, or other properties. Processor 310 can update the conductance values in circuit model 332 based on the determined properties of output distribution 336.
In one embodiment, the processor 310 can perform optimization techniques, such as Bayesian optimization, to adjust the conductance values in circuit model 332 until optimized conductance ranges 338 are determined for each column, or different groups of columns, of unit cells in target hardware 330. Processor 310 can define objectives, such as objectives to minimize or maximize differences between output distribution 336 and a saturation current threshold (e.g., a current at which the ADC enters non-linear or saturation regimes). In one embodiment, the objective defined by processor 310 can be to minimize a difference between the saturation current threshold of the ADCs and a k-th quartile of output distribution 336. Processor 310 can adjust conductance ranges of circuit model 332, or adjust Gmax, to cause circuit model 332 to generate updated current values 334. Processor 310 can update output distribution 336 with the updated current values 334 and determine whether the updated output distribution 336 satisfies the defined objective. Processor 310 can repeat the adjustment to the conductance values and update the output distribution 336 until a stopping criterion is met, such as when output distribution 336 satisfies the defined objective. The conductance values in circuit model 332 that cause the output distribution 336 to satisfy the defined objective can be set by processor 310 as optimized conductance ranges 338. In one embodiment, the stopping criterion can also be an upper bound on the number of iterations or adjustments made to the conductance values in circuit model 332.
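A simplified proportional-rescaling loop can stand in for the Bayesian optimization described above; `run_model` plays the role of circuit model 332, and the 0.99 quantile, the tolerance, the toy linear current model, and the iteration cap are all illustrative assumptions:

```python
import numpy as np

def optimize_gmax(run_model, g_max, i_sat, quantile=0.99,
                  tol=1e-8, max_iters=50):
    """Rescale Gmax until the chosen quantile of the simulated
    output-current distribution matches the ADC saturation threshold,
    or until the iteration cap (the upper-bound stopping criterion)
    is reached."""
    for _ in range(max_iters):
        currents = run_model(g_max)
        i_q = np.quantile(currents, quantile)
        if abs(i_sat - i_q) <= tol:          # objective satisfied
            break
        g_max = g_max * (i_sat / i_q)        # proportional rescale
    return g_max

# Toy model: output currents scale linearly with Gmax.
base = np.random.default_rng(1).uniform(0.1, 1.0, size=1000)
model = lambda g: g * base
g_opt = optimize_gmax(model, g_max=50e-6, i_sat=4e-6)
```

A real Bayesian optimizer would instead build a surrogate of the objective and propose Gmax candidates from an acquisition function; the proportional update is a deliberately simple substitute.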
In another embodiment, processor 310 can compare output distributions 336 with the saturation current threshold of the ADCs. The comparison can show which columns in circuit model 332 output current greater than or less than the saturation current threshold. Processor 310 can, for example, decrease Gmax for columns where the output current is greater than the saturation current threshold, to reduce the conductance range. Processor 310 can also increase Gmax for columns where the output current is less than the saturation current threshold, to increase the conductance range. Processor 310 can continue to increase or decrease Gmax for different columns until a specific criterion or objective is satisfied. For example, a criterion may be to have X % of the columns outputting current below the saturation current threshold. Processor 310 can adjust Gmax until output distribution 336 shows that X % of the columns output current below the saturation current threshold. The conductance values in circuit model 332 that cause X % of the columns to output current below the saturation current threshold can be set by processor 310 as optimized conductance ranges 338.
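The X % criterion might be realized with a loop like the following sketch. For brevity it only decreases Gmax for saturating columns (the text also contemplates increases for under-driven columns), and the step factor, target fraction, and per-column gain model are assumptions:

```python
import numpy as np

def tune_per_column(run_model, g_max, i_sat, target_frac=0.95,
                    step=0.9, max_iters=100):
    """Shrink Gmax for columns whose output current exceeds the
    saturation threshold until at least target_frac of the columns
    read out below it."""
    for _ in range(max_iters):
        i_cols = run_model(g_max)                 # one current per column
        below = i_cols < i_sat
        if below.mean() >= target_frac:           # X % criterion met
            break
        g_max = np.where(below, g_max, g_max * step)  # shrink hot columns
    return g_max

# Toy model: each column's current is Gmax times a per-column gain.
col_gain = np.random.default_rng(2).uniform(0.5, 2.0, size=64)
model = lambda g: g * col_gain
g_opt = tune_per_column(model, np.full(64, 25e-6), i_sat=30e-6)
```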
In response to determining optimized conductance ranges 338, conductive devices in unit cells of target hardware 330 can be tuned to the optimized conductance ranges 338. In one embodiment, each column of unit cells in target hardware 330 can be tuned to an individual conductance range. In another embodiment, different regions or groups of columns of unit cells in target hardware 330 can be tuned to an individual conductance range. In another embodiment, all columns of unit cells in target hardware 330 can be tuned to a single conductance range. In response to target hardware 330 being tuned with optimized conductance ranges 338, parameters of trained ML model 314, such as weights wij or bias, can be encoded in the unit cells of target hardware 330. Target hardware 330 with encoded parameters of trained ML model 314 can be deployed to various applications for performing inference on new input data.
Process 400 can proceed from block 402 to block 404. At block 404, the processor can determine whether the selected j-th column (e.g., starting from j=0) of unit cells exceeds the last column (e.g., (N−1)-th column) of unit cells in the analog memory device. In response to the j-th column of unit cells exceeding the last column (e.g., j>N−1), the processor can determine that all columns have been processed and proceed to end process 400. In response to the j-th column of unit cells preceding, or being equal to, the last column (e.g., j≤N−1), the process can proceed to block 406.
At block 406, the processor can select a k-th read cycle (e.g., starting from k=1). Depending on the unit cell configuration and ADC type in the analog memory device, outputs are read using one or more current read cycles. An example of a scenario where multiple read cycles are required would be if an ADC could only read positive currents.
Process 400 can proceed from block 406 to block 408. At block 408, the processor can determine whether the selected k-th read cycle exceeded the last read cycle K (e.g., k>K). In response to k≤K, process 400 can proceed to block 410. At block 410, the processor can determine output distribution for the j-th column and k-th read cycle. If j>0 and k>1, the processor can update an existing output distribution.
Process 400 can proceed from block 410 to block 412. At block 412, the processor can select a next read cycle and process 400 can return to block 408. The processor can continue to determine and/or update the output distribution at block 410 until a last read cycle K is performed. Note that the output distribution being determined and/or updated at block 410 is an output distribution of currents being outputted from the j-th column.
At block 408, in response to k>K, process 400 can proceed to block 414. At block 414, the processor can determine one or more properties of the output distribution determined at block 410. Process 400 can proceed from block 414 to block 416. At block 416, the processor can determine whether the properties determined at block 414 adhere to predefined criteria (the criteria described above). In response to the properties determined at block 414 not adhering to the predefined criteria, process 400 can proceed to block 418.
At block 418, the processor can update the conductance range that was initialized for the j-th column at block 402. Process 400 can return from block 418 to block 406, where the processor can select the k-th current read cycle starting from k=1. The processor can continue to perform one or more of blocks 408, 410, 412, 414, 416, 418 until the properties determined at block 414 adhere to the predefined criteria at block 416. Note that the updated conductance range in block 418 that caused the properties to adhere to the predefined criteria can be set as the optimized conductance range for the j-th column. In response to the properties determined at block 414 adhering to the predefined criteria at block 416, process 400 can proceed to block 420. At block 420, the processor can select a next column of unit cells and return to block 404.
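The control flow of process 400 can be sketched as nested loops over columns and read cycles; the callback names (`read_column`, `check_properties`, `update_range`) are hypothetical hooks standing in for blocks 410, 416, and 418:

```python
def process_400(n_cols, n_cycles, read_column, check_properties,
                update_range, init_ranges):
    """Control-flow sketch of process 400: for each column, accumulate
    an output distribution over K read cycles, test its properties
    against the predefined criteria, and adjust the conductance range
    until they adhere. Assumes the criteria are eventually met."""
    ranges = list(init_ranges)                       # block 402
    for j in range(n_cols):                          # blocks 404/420
        while True:
            dist = []
            for k in range(1, n_cycles + 1):         # blocks 406-412
                dist.extend(read_column(j, k, ranges[j]))  # block 410
            if check_properties(dist):               # blocks 414/416
                break                                # column j optimized
            ranges[j] = update_range(ranges[j])      # block 418
    return ranges

# Toy run: currents scale with the range; criterion is max current < 1;
# each update halves the range.
ranges = process_400(
    n_cols=2, n_cycles=3,
    read_column=lambda j, k, r: [r * (j + k)],
    check_properties=lambda d: max(d) < 1.0,
    update_range=lambda r: r * 0.5,
    init_ranges=[2.0, 2.0])
```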
Process 500 can begin at a block 502. At block 502, a processor (e.g., processor 310) can initialize a conductance range for each column of unit cells in an analog memory device and select a j-th column of unit cells. Different conductance ranges can be initialized for each individual column of unit cells, or for groups or regions of columns of unit cells, or one conductance range can be defined for all columns of unit cells.
Process 500 can proceed from block 502 to block 504. At block 504, the processor can determine whether the selected j-th column (e.g., starting from j=0) of unit cells exceeds the last column (e.g., (N−1)-th column) of unit cells in the analog memory device. In response to the j-th column of unit cells exceeding the last column (e.g., j>N−1), the processor can determine that all columns have been processed and proceed to end process 500. In response to the j-th column of unit cells preceding, or being equal to, the last column (e.g., j≤N−1), the process can proceed to block 506.
At block 506, the processor can select a k-th read cycle (e.g., starting from k=1). Depending on the unit cell configuration and ADC type in the analog memory device, outputs are read using one or more current read cycles. An example of a scenario where multiple read cycles are required would be if an ADC could only read positive currents. Process 500 can proceed from block 506 to block 508. At block 508, the processor can set an initial maximum current for the j-th column based on the saturation current of one or more ADCs in target hardware 330 (see
Process 500 can proceed from block 508 to block 510. At block 510, the processor can determine whether the selected k-th read cycle exceeded the last read cycle K (e.g., k>K). In response to k≤K, process 500 can proceed to block 518. At block 518, the processor can determine output distribution for the j-th column and k-th read cycle. If j>0 and k>1, the processor can update an existing output distribution.
Process 500 can proceed from block 518 to block 520. At block 520, the processor can compare the output current of the j-th column from the k-th read cycle with the maximum current set at block 508. In response to the output current being less than or equal to the maximum current, process 500 can proceed to block 524. In response to the output current being greater than the maximum current, process 500 can proceed to block 522. At block 522, the processor can update the maximum current that was set at block 508. By way of example, the processor can reduce the maximum current in block 522. Process 500 can proceed from block 522 to block 524.
At block 524, the processor can select a next current read cycle and process 500 can return to block 510 to continue to perform the remaining current read cycles and determine output distributions for updating the maximum current. At block 510, in response to k>K, process 500 can proceed to block 512. At block 512, the processor can compare the maximum current with a saturation current, or a percentage of the saturation current, of an ADC being used for reading out current from the j-th column in the analog memory device. In response to the maximum current being less than the saturation current of the ADC of the j-th column, process 500 can proceed to block 514. In response to the maximum current being greater than the saturation current of the ADC of the j-th column, process 500 can proceed to block 516. At block 516, the processor can reduce Gmax for the j-th column to adjust the conductance range of the j-th column. The reduction in block 516 can constrain the output current of the j-th column to be smaller than the saturation current of the ADC, despite the maximum current being greater than the saturation current. This constraint can cause the output current of the j-th column to be within a linear regime, and thus reduce errors produced by the ADCs. Process 500 can proceed from block 516 to block 514. At block 514, the processor can select a next column of unit cells and return to block 504.
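Process 500 can be sketched as the following loop. The sketch simplifies blocks 508 through 522 to tracking the maximum observed column current (rather than initializing the maximum from the saturation current and updating it downward), and the `read_current` hook, the shrink factor, and the toy current model are illustrative assumptions:

```python
def process_500(n_cols, n_cycles, read_current, i_sat, g_max, shrink=0.8):
    """Control-flow sketch of process 500: per column, find the
    maximum output current over K read cycles (blocks 506-524) and,
    if it exceeds the ADC saturation current, reduce that column's
    Gmax so the output stays in the ADC's linear regime
    (blocks 512/516)."""
    g_max = list(g_max)
    for j in range(n_cols):                       # blocks 504/514
        i_max = 0.0
        for k in range(1, n_cycles + 1):          # blocks 506/510/524
            i = read_current(j, k, g_max[j])      # block 518
            if i > i_max:                         # blocks 520/522
                i_max = i
        if i_max > i_sat:                         # blocks 512/516
            g_max[j] *= i_sat / i_max * shrink    # constrain below saturation
    return g_max

# Toy run: column current grows with read cycle index k.
g_out = process_500(
    n_cols=2, n_cycles=3,
    read_current=lambda j, k, g: g * k,
    i_sat=5.0, g_max=[2.0, 1.0])
```

After the adjustment, the first column's worst-case current falls below the saturation current, while the already-compliant second column is left untouched.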
The iterative approach of process 500 can allow system 300 to optimize conductance ranges without using specific optimization algorithms. System 300 of
Process 600 can be performed by system 300 shown in
Process 600 can proceed from block 602 to block 604. At block 604, the processor can use the plurality of initial conductance ranges to encode a plurality of parameter values in a circuit model of the analog memory device.
Process 600 can proceed from block 604 to block 606. At block 606, the processor can input a plurality of sample inputs into the circuit model to determine an output current distribution correlated to a plurality of products between the plurality of sample inputs and the plurality of parameter values. In one embodiment, the plurality of parameter values can represent parameters of a trained machine learning model. The processor can select an input distribution that best fits a training set of the trained machine learning model. The processor can further select the plurality of sample inputs from the input distribution. In one embodiment, the processor can select the plurality of sample inputs by applying a kernel density estimation technique on the training set. In one embodiment, the processor can select the plurality of sample inputs by selecting at least one of a subset of a training set used in training the trained machine learning model, a validation set used in an accuracy evaluation during training of the trained machine learning model, and a test set used in performance evaluation of the trained machine learning model.
Process 600 can proceed from block 606 to block 608. At block 608, the processor can determine, based on at least one property of the output current distribution, an optimal conductance range for the plurality of unit cells. In one embodiment, determining the output current distribution can comprise measuring an output current of every column of unit cells sequentially.
In one embodiment, the processor can determine the optimal conductance range by defining an objective corresponding to the output current distribution and an ADC connected to outputs of the plurality of unit cells. The processor can further compare the at least one property of the output current distribution with different current region boundaries of the ADC to obtain a difference. The processor can further update the optimal conductance range until the difference satisfies the objective.
In one embodiment, the processor can initialize a value of a maximum current. The processor can adjust the maximum current based on a difference between an output current of a column of unit cells and the maximum current. The processor can compare the adjusted maximum current with a saturation current of an ADC connected to outputs of the plurality of unit cells to obtain a difference. The processor can determine the optimal conductance range of the column of unit cells based on the difference.
In one embodiment, the processor can determine a respective optimal conductance range for each column of unit cells. In one embodiment, the processor can determine a respective optimal conductance range for different groups of columns of unit cells. In one embodiment, the processor can determine an optimal conductance range for an entirety of the plurality of unit cells.
In one embodiment, the processor can determine a product among a sum of conductance values of a subset of the plurality of unit cells that received a non-zero input among the plurality of input samples, a read voltage being used for reading output current from the plurality of unit cells, and a linear correlation coefficient. The processor can further approximate the output current distribution using the product.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be implemented substantially concurrently, or the blocks may sometimes be implemented in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. As used herein, the term “or” is an inclusive operator and can mean “and/or”, unless the context explicitly or clearly indicates otherwise. It will be further understood that the terms “comprise”, “comprises”, “comprising”, “include”, “includes”, “including”, and/or “having”, when used herein, can specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the phrase “in an embodiment” does not necessarily refer to the same embodiment, although it may. As used herein, the phrase “in one embodiment” does not necessarily refer to the same embodiment, although it may. As used herein, the phrase “in another embodiment” does not necessarily refer to a different embodiment, although it may. Further, embodiments and/or components of embodiments can be freely combined with each other unless they are mutually exclusive.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements, if any, in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.