This application relates to the field of neural networks, and more specifically, to a method for data processing in a neural network system and a neural network system.
Artificial intelligence (AI) is a theory, a method, a technology, or an application system that simulates, extends, and expands human intelligence by using a digital computer or a machine controlled by a digital computer, to sense an environment, obtain knowledge, and achieve an optimal result by using the knowledge. In other words, artificial intelligence is a branch of computer science that seeks to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have perceiving, inference, and decision-making functions. Research in the field of artificial intelligence includes robots, natural language processing, computer vision, decision-making and inference, human-machine interaction, recommendation and search, AI basic theories, and the like.
In the AI field, deep learning is a learning technology based on a deep artificial neural network (ANN) algorithm. A training process of a neural network is a data-centric task, and requires computing hardware to have a processing capability with high performance and low power consumption.
A neural network system based on a plurality of neural network arrays may implement in-memory computing and may process a deep learning task. For example, at least one in-memory computing unit in the neural network arrays may store a weight value of a corresponding neural network layer. Due to the network structure or the system architecture design, processing speeds of the neural network arrays may be inconsistent. In this case, a plurality of neural network arrays may perform parallel processing and joint computing, to accelerate the neural network arrays at speed bottlenecks. However, due to some non-ideal characteristics of the in-memory computing units in the neural network arrays participating in parallel acceleration, such as component fluctuation, conductance drift, and a limited array yield rate, overall performance of the neural network system is reduced, and accuracy of the neural network system is relatively low.
This application provides a method for data processing in a neural network system using parallel acceleration, and a neural network system, to mitigate the impact of non-ideal component characteristics when a parallel acceleration technology is used, and to improve performance and recognition accuracy of the neural network system.
According to a first aspect, a method for data processing in a neural network system is provided, including: in a neural network system using parallel acceleration, inputting training data into the neural network system to obtain first output data, where the neural network system includes a plurality of neural network arrays, each of the plurality of neural network arrays includes a plurality of in-memory computing units, and each in-memory computing unit is configured to store a weight value of a neuron in a corresponding neural network; calculating a deviation between the first output data and target output data; and adjusting, based on the deviation, a weight value stored in at least one in-memory computing unit in some neural network arrays in the plurality of neural network arrays, where the some neural network arrays are configured to implement computing of some neural network layers in the neural network system.
In the foregoing technical solution, a weight value stored in an in-memory computing unit in some neural network arrays in the plurality of neural network arrays may be adjusted and updated based on the deviation between the actual output data of the neural network arrays and the target output data. This provides compatibility with non-ideal characteristics of the in-memory computing unit, improves a recognition rate and performance of the system, and avoids the degradation of system performance that the non-ideal characteristics would otherwise cause.
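The method of the first aspect can be illustrated with a minimal numerical sketch. All array shapes, values, and names below are illustrative assumptions, not the patented hardware: a fixed stage stands in for the parallel arrays whose weights are left untouched, and only the adjustable stage's stored weights are updated from the deviation.

```python
import numpy as np

# Illustrative sketch: training data passes through a fixed stage
# (standing in for arrays whose weights are not adjusted) and an
# adjustable "fully-connected" stage; the deviation between the first
# output data and the target output data drives updates to only the
# adjustable stage's stored weights.

conv_weights = np.array([[0.5, -0.2, 0.1],
                         [0.3,  0.8, -0.5]])  # fixed stage (not adjusted)
fc_weights = np.zeros((3, 2))                 # adjustable stage

x = np.array([[1.0, 2.0]])                    # training data
target = np.array([[1.0, 0.0]])               # target output data

lr = 0.1
for _ in range(100):
    hidden = np.maximum(x @ conv_weights, 0.0)  # forward pass (ReLU)
    out = hidden @ fc_weights                   # first output data
    deviation = out - target                    # calculate the deviation
    # Adjust only some arrays' weights - here the fully-connected stage,
    # using that stage's input data (hidden) and the deviation.
    fc_weights -= lr * hidden.T @ deviation

print(np.round(out, 4))
```

After enough iterations the output converges to the target even though the fixed stage is never touched, which mirrors the claim that adjusting only some arrays can compensate for deviations.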
In a possible implementation of the first aspect, the plurality of neural network arrays include a first neural network array and a second neural network array, and input data of the first neural network array includes output data of the second neural network array.
In another possible implementation of the first aspect, the first neural network array includes a neural network array configured to implement computing of a fully-connected layer in the neural network.
In the foregoing technical solution, only a weight value stored in an in-memory computing unit in the neural network array that implements computing of the fully-connected layer may be adjusted and updated, so that compatibility with a non-ideal characteristic of the in-memory computing unit may be implemented, to improve a recognition rate and performance of the system. The solution is effective and easy to implement with relatively low costs.
In another possible implementation of the first aspect, a weight value stored in at least one in-memory computing unit in the first neural network array is adjusted based on input data of the first neural network array and the deviation.
In another possible implementation of the first aspect, the plurality of neural network arrays further include a third neural network array, and the third neural network array and the second neural network array are configured to implement computing of a convolutional layer in the neural network in parallel.
In another possible implementation of the first aspect, a weight value stored in at least one in-memory computing unit in the second neural network array is adjusted based on input data of the second neural network array and the deviation, and a weight value stored in at least one in-memory computing unit in the third neural network array is adjusted based on input data of the third neural network array and the deviation.
In the foregoing technical solution, weight values stored in in-memory computing units in a plurality of neural network arrays that implement computing of the convolutional layer in the neural network in parallel may alternatively be adjusted and updated, to improve adjustment precision, thereby improving accuracy of output of the neural network system.
In another possible implementation of the first aspect, the deviation is divided into at least two sub-deviations, where a first sub-deviation in the at least two sub-deviations corresponds to the output data of the second neural network array, and a second sub-deviation in the at least two sub-deviations corresponds to output data of the third neural network array; a weight value stored in at least one in-memory computing unit in the second neural network array is adjusted based on the first sub-deviation and input data of the second neural network array; and a weight value stored in at least one in-memory computing unit in the third neural network array is adjusted based on the second sub-deviation and input data of the third neural network array.
In another possible implementation of the first aspect, a quantity of pulses is determined based on an updated weight value in the in-memory computing unit, and the weight value stored in the at least one in-memory computing unit in the neural network array is rewritten based on the quantity of pulses.
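Determining a quantity of pulses from an updated weight value might look like the following sketch. The conductance step per pulse and the weight-to-conductance scale are assumed device parameters, not values from this application.

```python
# Sketch, under assumed device parameters: convert a target weight change
# into a quantity of programming pulses, given a hypothetical fixed
# conductance change per pulse.

CONDUCTANCE_STEP = 0.5e-6   # assumed conductance change per pulse (siemens)
WEIGHT_TO_SIEMENS = 1e-5    # assumed scale mapping a unit weight to conductance

def pulses_for_update(old_weight, new_weight):
    """Return (pulse_count, polarity) to move a cell between weight values."""
    delta_g = (new_weight - old_weight) * WEIGHT_TO_SIEMENS
    count = int(round(abs(delta_g) / CONDUCTANCE_STEP))
    # SET pulses increase conductance; RESET pulses decrease it.
    polarity = "SET" if delta_g >= 0 else "RESET"
    return count, polarity

print(pulses_for_update(0.20, 0.35))  # → (3, 'SET')
```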
According to a second aspect, a neural network system is provided, including:
a processing module, configured to input training data into the neural network system to obtain first output data, where the neural network system includes a plurality of neural network arrays, each of the plurality of neural network arrays includes a plurality of in-memory computing units, and each in-memory computing unit is configured to store a weight value of a neuron in a corresponding neural network;
a calculation module, configured to calculate a deviation between the first output data and target output data; and
an adjustment module, configured to adjust, based on the deviation, a weight value stored in at least one in-memory computing unit in some neural network arrays in the plurality of neural network arrays, where the some neural network arrays are configured to implement computing of some neural network layers in the neural network system.
In a possible implementation of the second aspect, the plurality of neural network arrays include a first neural network array and a second neural network array, and input data of the first neural network array includes output data of the second neural network array.
In another possible implementation of the second aspect, the first neural network array includes a neural network array configured to implement computing of a fully-connected layer in the neural network.
In another possible implementation of the second aspect, the adjustment module is specifically configured to:
adjust, based on input data of the first neural network array and the deviation, a weight value stored in at least one in-memory computing unit in the first neural network array.
In another possible implementation of the second aspect, the plurality of neural network arrays further include a third neural network array, and the third neural network array and the second neural network array are configured to implement computing of a convolutional layer in the neural network in parallel.
In another possible implementation of the second aspect, the adjustment module is specifically configured to:
adjust, based on input data of the second neural network array and the deviation, a weight value stored in at least one in-memory computing unit in the second neural network array; and adjust, based on input data of the third neural network array and the deviation, a weight value stored in at least one in-memory computing unit in the third neural network array.
In another possible implementation of the second aspect, the adjustment module is specifically configured to:
divide the deviation into at least two sub-deviations, where a first sub-deviation in the at least two sub-deviations corresponds to the output data of the second neural network array, and a second sub-deviation in the at least two sub-deviations corresponds to output data of the third neural network array;
adjust, based on the first sub-deviation and input data of the second neural network array, a weight value stored in at least one in-memory computing unit in the second neural network array; and adjust, based on the second sub-deviation and input data of the third neural network array, a weight value stored in at least one in-memory computing unit in the third neural network array.
In another possible implementation of the second aspect, the adjustment module is specifically configured to determine a quantity of pulses based on an updated weight value in the in-memory computing unit, and rewrite, based on the quantity of pulses, the weight value stored in the at least one in-memory computing unit in the neural network array.
Beneficial effects of the second aspect and any possible implementation of the second aspect are corresponding to beneficial effects of the first aspect and any possible implementation of the first aspect. Details are not described herein again.
According to a third aspect, a neural network system is provided, including a processor and a memory. The memory is configured to store a computer program, and the processor is configured to invoke and run the computer program from the memory, so that the neural network system performs the method provided in any one of the first aspect or the possible implementations of the first aspect.
Optionally, during specific implementation, a quantity of processors is not limited. The processor may be implemented by hardware or by software. When the processor is implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like. When the processor is implemented by software, the processor may be a general-purpose processor, and is implemented by reading software code stored in the memory. The memory may be integrated into the processor, or may be located outside the processor and exist independently.
According to a fourth aspect, a chip is provided, and the neural network system according to any one of the second aspect or the possible implementations of the second aspect is disposed on the chip.
The chip includes a processor and a data interface, and the processor reads, by using the data interface, instructions stored in a memory, to perform the method in any one of the first aspect or the possible implementations of the first aspect. In a specific implementation process, the chip may be implemented in a form of a central processing unit (CPU), a micro controller unit (MCU), a micro processing unit (MPU), a digital signal processor (DSP), a system-on-a-chip (SoC), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or a programmable logic device (PLD).
According to a fifth aspect, a computer program product is provided. The computer program product includes computer program code. When the computer program code is run on a computer, the computer is enabled to perform the method in any one of the first aspect or the possible implementations of the first aspect.
According to a sixth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores computer program code. When the computer program code is run on a computer, the computer is enabled to perform the method in any one of the first aspect or the possible implementations of the first aspect. The computer-readable storage medium includes but is not limited to one or more of the following: a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), a flash memory, an electrically EPROM (EEPROM), and a hard drive.
The following describes technical solutions of this application with reference to accompanying drawings.
In the AI field, deep learning is a learning technology based on a deep artificial neural network (ANN) algorithm. An artificial neural network (ANN) is referred to as a neural network (NN) or a quasi-neural network for short. In the machine learning and cognitive science fields, the artificial neural network is a mathematical model or a computing model that simulates a structure and a function of a biological neural network (a central nervous system of an animal, especially a brain), and is used to estimate or approximate a function. The artificial neural network may include a convolutional neural network (CNN), a multilayer perceptron (MLP), a recurrent neural network (RNN), and the like.
A training process of a neural network is also a process of learning a parameter matrix, and the final purpose is to obtain the parameter matrix of each layer of neurons in the trained neural network (the parameter matrix of a layer of neurons includes a weight corresponding to each neuron included in that layer). The parameter matrices of weights obtained through training can extract pixel information from a to-be-inferred image input by a user, to help the neural network perform correct inference on the to-be-inferred image, so that a predicted value output by the trained neural network is as close as possible to the prior knowledge of the training data.
It should be understood that the prior knowledge is also referred to as a ground truth, and generally includes a true result corresponding to the training data provided by the user.
The training process of the neural network is a data-centric task, and requires computing hardware to have a processing capability with high performance and low power consumption. Because the storage unit and the computing unit are separate in a conventional von Neumann architecture, a large amount of data needs to be moved, and energy-efficient processing cannot be achieved.
The following describes a system architectural diagram of this application with reference to
The neural network circuit 110 is connected to the host 105 by using a host interface. The host interface may include a standard host interface and a network interface. For example, the host interface may include a peripheral component interconnect express (PCIe) interface.
In an example, as shown in
The host 105 may include a processor 1052 and a memory 1054. It should be noted that, in addition to the components shown in
The processor 1052 is an operation unit and a control unit of the host 105. The processor 1052 may include a plurality of processor cores. The processor 1052 may be an ultra-large-scale integrated circuit. An operating system and another software program are installed in the processor 1052, so that the processor 1052 can access the memory 1054, a cache, a magnetic disk, and a peripheral device (for example, the neural network circuit in
It should be understood that the processor 1052 in this embodiment of this application may alternatively be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or a transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
The memory 1054 is a main memory of the host 105. The memory 1054 is connected to the processor 1052 by using a double data rate (DDR) bus. The memory 1054 is usually configured to store various software running in the operating system, input data and output data, information exchanged with an external memory, and the like. To improve the access rate of the processor 1052, the memory 1054 needs to provide a high access rate. In a conventional computer system architecture, a dynamic random access memory (DRAM) is usually used as the memory 1054. The processor 1052 can access the memory 1054 at a high rate by using a memory controller (not shown in
It should be further understood that the memory 1054 in this embodiment of this application may be a volatile memory or a non-volatile memory, or may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), and is used as an external cache. Through an example rather than limitative description, random access memories (RAMs) in many forms may be used, for example, a static random access memory (static RAM, SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), a synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and a direct rambus random access memory (direct rambus RAM, DR RAM).
The neural network circuit 110 shown in
The neural network circuit 210 is connected to the host 105 by using a host interface. As shown in
The neural network circuit 210 shown in
Optionally, the architectures of the neural network systems in
In some examples, the neural network circuit may be implemented by a plurality of neural network matrices that implement in-memory computing. Each of the plurality of neural network matrices may include a plurality of in-memory computing units, and each in-memory computing unit is configured to store a weight value of each layer of neurons in a corresponding neural network, to implement computing of a neural network layer.
The in-memory computing unit is not specifically limited in this embodiment of this application, and may include but is not limited to a memristor, a static RAM (SRAM), a NOR flash, a magnetic RAM (MRAM), a ferroelectric gate field-effect transistor (FeFET), and an electrochemical RAM (ECRAM). The memristor may include but is not limited to a resistive random-access memory (ReRAM), a conductive-bridging RAM (CBRAM), and a phase-change memory (PCM).
For example, the neural network matrix is a ReRAM crossbar including ReRAMs. The neural network system may include a plurality of ReRAM crossbars.
In this embodiment of this application, the ReRAM crossbar may also be referred to as a memristor cross array, a ReRAM component, or a ReRAM. A chip including one or more ReRAM crossbars may be referred to as a ReRAM chip.
The ReRAM crossbar is a radically new non-Von Neumann computing architecture. The architecture integrates storage and computing functions, has a flexible configurable feature, and uses an analog computing manner. The architecture is expected to implement matrix-vector multiplication with a higher speed and lower energy consumption than a conventional computing architecture, and has a wide application prospect in neural network computing.
With reference to
In this embodiment of this application, the neural network layer is a logical layer concept, and one neural network layer means that one neural network operation needs to be performed. Computing of each neural network layer is implemented by a computing node (which may also be referred to as a neuron). In actual application, the neural network layer may include a convolutional layer, a pooling layer, a fully-connected layer, and the like.
A person skilled in the art knows that when neural network computing (for example, convolution computing) is performed, a computing node in a neural network system may compute input data and a weight of a corresponding neural network layer. In the neural network system, a weight is usually represented by a real number matrix, and each element in a weight matrix represents a weight value. The weight is usually used to indicate importance of input data to output data. As shown in
Computing of each neural network layer may be implemented by the ReRAM crossbar, and the ReRAM has an advantage of in-memory computing. Therefore, the weight may be configured on a plurality of ReRAM cells of the ReRAM crossbar before computing. Therefore, a matrix multiply-add operation of input data and the configured weight may be implemented by using the ReRAM crossbar.
It should be understood that the ReRAM cell in this embodiment of this application may also be referred to as a memristor cell. Configuring the weight on the memristor cell before computing may be understood as storing, in the memristor cell, a weight value of a neuron in a corresponding neural network. Specifically, the weight value of the neuron in the neural network may be indicated by using a resistance value or a conductance value of the memristor cell.
It should be further understood that, in actual application, there may be a one-to-one mapping relationship or a one-to-many mapping relationship between the ReRAM crossbar and the neural network layer. The following provides a detailed description with reference to the accompanying drawings, and details are not described herein.
For clarity of description, the following briefly describes a process in which the ReRAM crossbar implements the matrix multiply-add operation.
It should be noted that, in
A ReRAM crossbar 120 shown in
In this embodiment of this application, a weight of a neuron in the neural network may be represented by using a conductance value of a memristor. Specifically, in an example, each element in the weight matrix shown in
Different conductance values of memristor cells may indicate different weights that are of neurons in the neural network and that are stored by the memristor cells.
In a process of performing neural network computing, n pieces of input data Vi may be represented by using voltage values loaded to BLs of the memristor, for example, V1, V2, V3, . . . , and Vn in
It should be understood that there are a plurality of implementations for the voltage values loaded to the memristor. This is not specifically limited in this embodiment of this application. For example, the voltage value may be represented by using a voltage pulse amplitude. For another example, the voltage value may alternatively be represented by using a voltage pulse width. For another example, the voltage value may alternatively be represented by using a voltage pulse quantity. For another example, the voltage value may alternatively be represented by using a combination of a voltage pulse quantity and a voltage pulse amplitude.
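The multiply-accumulate behavior of the crossbar described above can be checked numerically. The conductance and voltage values below are arbitrary illustrative numbers: weights are stored as conductances G (siemens), inputs are applied as voltages V, and by Ohm's law plus Kirchhoff's current law the current collected on each output line is I_j = Σ_i V_i · G_ij.

```python
import numpy as np

# Numerical sketch of the crossbar multiply-accumulate operation:
# column currents are the matrix-vector product of input voltages
# and the stored conductance matrix.

G = np.array([[1.0e-6, 2.0e-6],
              [3.0e-6, 4.0e-6],
              [5.0e-6, 6.0e-6]])   # 3 input lines x 2 output lines (siemens)

V = np.array([0.1, 0.2, 0.3])      # input data encoded as voltage amplitudes

I = V @ G                          # accumulated column currents (amperes)
print(I)
```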
It should be noted that the foregoing uses one neural network array as an example to describe in detail a process in which the neural network array completes corresponding multiply-accumulate computing in the neural network. In actual application, multiply-accumulate computing required by a complete neural network is jointly completed by a plurality of neural network arrays.
One neural network array in the plurality of neural network arrays may correspond to one neural network layer, and the neural network array is configured to implement computing of the one neural network layer. Alternatively, the plurality of neural network arrays may correspond to one neural network layer, and are configured to implement computing of the one neural network layer. Alternatively, one neural network array in the plurality of neural network arrays may correspond to a plurality of neural network layers, and is configured to implement computing of the plurality of neural network layers.
With reference to
For ease of description, an example in which a memristor array is a neural network array is used for description below.
In this embodiment of this application, the neural network layer is a logical layer concept, and one neural network layer means that one neural network operation needs to be performed. Computing of each neural network layer is implemented by a computing node. The neural network layer may include a convolutional layer, a pooling layer, a fully-connected layer, and the like.
As shown in
It should be understood that the pooling operation or the activation operation may be implemented by an external digital circuit module. Specifically, the external digital circuit module (not shown in
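The division of labor described above can be sketched as follows; the values, the choice of ReLU as the activation, and the 1x2 pooling window are illustrative assumptions about the external digital circuit module.

```python
import numpy as np

# Sketch: the memristor array yields raw multiply-accumulate results;
# an external digital module then applies activation (ReLU here) and
# pooling before the result is passed to the next array.

raw = np.array([[-1.0,  2.0, 0.5, -0.2],
                [ 3.0, -4.0, 1.0,  2.0]])  # digitized analog MAC results

activated = np.maximum(raw, 0.0)           # activation in the digital domain

# 1x2 max pooling along the feature axis.
pooled = activated.reshape(2, 2, 2).max(axis=2)
print(pooled)
```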
It may be understood that
The first memristor array may implement computing of a fully-connected layer in a neural network. Specifically, a weight of the fully-connected layer in the neural network may be stored in the first memristor array, and a conductance value of each memristor cell in the memristor array may be used to indicate the weight of the fully-connected layer and implement a multiply-accumulate computing process of the fully-connected layer in the neural network.
It should be noted that the fully-connected layer in the neural network may alternatively correspond to a plurality of memristor arrays, and the plurality of memristor arrays jointly complete computing of the fully-connected layer. This is not specifically limited in this application.
A plurality of memristor arrays (for example, the second memristor array, the third memristor array, and the fourth memristor array) shown in
It should be understood that a convolution kernel represents a feature extraction manner in a neural network computing process. For example, when image processing is performed in the neural network system, an input image is given, and each pixel in an output image is a weighted average of pixels in a small area of the input image. The weighting is defined by a function, and that function is referred to as the convolution kernel. In the computing process, the convolution kernel successively sweeps the input feature map at a specific stride, to generate output data (also referred to as an output feature map) after feature extraction. Therefore, the convolution kernel size also indicates the size of the data volume for which a computing node in the neural network system performs one computation. A person skilled in the art may know that the convolution kernel may be represented by using a real number matrix. For example,
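The sweep described above can be written out directly. The 3x3 feature map, 2x2 kernel, stride of 1, and absence of padding are illustrative choices.

```python
import numpy as np

# Illustrative sweep of a convolution kernel over an input feature map
# with stride 1 and no padding, as described in the text.

feature_map = np.array([[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]], dtype=float)

kernel = np.array([[1, 0],
                   [0, 1]], dtype=float)   # 2x2 example kernel

stride = 1
out_h = (feature_map.shape[0] - kernel.shape[0]) // stride + 1
out_w = (feature_map.shape[1] - kernel.shape[1]) // stride + 1
output = np.zeros((out_h, out_w))          # the output feature map
for i in range(out_h):
    for j in range(out_w):
        patch = feature_map[i*stride:i*stride+2, j*stride:j*stride+2]
        output[i, j] = np.sum(patch * kernel)  # one multiply-accumulate step

print(output)
```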
Input data of a plurality of memristor arrays (for example, the second memristor array, the third memristor array, and the fourth memristor array) for parallel computing may include output data of another memristor array or external input data, and output data of the plurality of memristor arrays (for example, the second memristor array, the third memristor array, and the fourth memristor array) may be used as input data of the shared first memristor array. That is, the input data of the first memristor array may include the output data of the plurality of memristor arrays (for example, the second memristor array, the third memristor array, and the fourth memristor array).
There may be a plurality of structures of the input data and the output data of the plurality of memristor arrays for parallel computing. This is not specifically limited in this application.
With reference to
In a possible implementation, as shown in a manner 1 in
For example, input data of the second memristor array is data 1, input data of the third memristor array is data 2, and input data of the fourth memristor array is data 3. For a convolutional layer, one piece of complete input data includes a combination of the data 1, the data 2, and the data 3. Similarly, output data of the second memristor array is a result 1, output data of the third memristor array is a result 2, and output data of the fourth memristor array is a result 3. For the convolutional layer, one piece of complete output data includes a combination of the result 1, the result 2, and the result 3.
Specifically, referring to
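Manner 1 can be sketched as follows; the split into three equal parts and the shared weight matrix are illustrative, and the per-array matrix product stands in for the arrays' multiply-accumulate computing.

```python
import numpy as np

# Sketch of manner 1: one complete input is split into three parts, the
# three parallel arrays each process one part, and the three partial
# results are combined into one complete output.

complete_input = np.arange(12.0)                 # one complete input
data_1, data_2, data_3 = np.split(complete_input, 3)

W = np.ones((4, 2))                              # weights of each parallel array

result_1 = data_1 @ W                            # second memristor array
result_2 = data_2 @ W                            # third memristor array
result_3 = data_3 @ W                            # fourth memristor array

complete_output = np.concatenate([result_1, result_2, result_3])
print(complete_output)
```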
In another possible implementation, as shown in a manner 2 in
For example, input data of the second memristor array is data 1. For a convolutional layer, the data 1 is one piece of complete input data, output data of the data 1 is a result 1, and the result 1 is one piece of complete output data. Similarly, input data of the third memristor array is data 2. For the convolutional layer, the data 2 is one piece of complete input data, output data of the data 2 is a result 2, and the result 2 is one piece of complete output data. Input data of the fourth memristor array is data 3. For the convolutional layer, the data 3 is one piece of complete input data, output data of the data 3 is a result 3, and the result 3 is one piece of complete output data.
Specifically, referring to
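Manner 2 can be sketched in the same illustrative style: the parallel arrays hold copies of the same weights, and each array processes a different complete input, so three complete inputs are handled in one pass.

```python
import numpy as np

# Sketch of manner 2: replicated weights, one complete input per array,
# and each result is itself a complete output.

W = np.array([[1.0, -1.0],
              [2.0,  0.5]])          # weights replicated across the arrays

data_1 = np.array([1.0, 2.0])        # complete input for the second array
data_2 = np.array([0.0, 3.0])        # complete input for the third array
data_3 = np.array([-1.0, 1.0])       # complete input for the fourth array

results = [d @ W for d in (data_1, data_2, data_3)]
print(results)
```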
If an in-memory computing unit in a neural network array is affected by some non-ideal characteristics such as component fluctuation, conductance drift, and a limited array yield rate, the in-memory computing unit cannot store a weight losslessly. As a result, overall performance of a neural network system is degraded, and a recognition rate of the neural network system is reduced.
The technical solutions provided in embodiments of this application may improve performance and recognition accuracy of the neural network system.
With reference to
It should be noted that the technical solutions in embodiments of this application may be applied to various neural networks, for example, a convolutional neural network (CNN), a recurrent neural network widely used in natural language and speech processing, and a deep neural network combining the convolutional neural network and the recurrent neural network. A processing process of the convolutional neural network is similar to that of an animal visual system, so that the convolutional neural network is well suited to the field of image recognition. The convolutional neural network is applicable to a wide range of image recognition fields such as security protection, computer vision, and safe city, as well as to speech recognition, search engines, machine translation, and other fields. In actual application, a large quantity of parameters and a large computation amount bring great challenges to application of a neural network in a scenario requiring high real-time performance and low power consumption.
Step 1010: Input training data into a neural network system to obtain first output data.
In this embodiment of this application, the neural network system using parallel acceleration may include a plurality of neural network arrays, each of the plurality of neural network arrays may include a plurality of in-memory computing units, and each in-memory computing unit is configured to store a weight value of a neuron in a corresponding neural network.
Step 1020: Calculate a deviation between the first output data and target output data.
The target output data may be an ideal value corresponding to the first output data that is actually output.
The deviation in this embodiment of this application may be a calculated difference between the first output data and the target output data, or may be a calculated residual between the first output data and the target output data, or may be a calculated loss function in another form between the first output data and the target output data.
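Two of the deviation forms mentioned above, the element-wise difference and a mean-squared-error loss, may be sketched with assumed example values as follows:

```python
import numpy as np

# Illustrative deviation computations between first output data and
# target output data (the numeric values are made up for this sketch).
first_output = np.array([0.2, 0.7, 0.1])
target_output = np.array([0.0, 1.0, 0.0])

difference = first_output - target_output   # deviation as a difference
mse_loss = np.mean(difference ** 2)         # deviation as a loss function
```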
Step 1030: Adjust, based on the deviation, a weight value stored in at least one in-memory computing unit in some neural network arrays in the plurality of neural network arrays in the neural network system using parallel acceleration.
In this embodiment of this application, the some neural network arrays may be configured to implement computing of some neural network layers in the neural network system. That is, a correspondence between the neural network array and the neural network layer may be a one-to-one relationship, a one-to-many relationship, or a many-to-one relationship.
For example, a first memristor array shown in
It should be understood that the neural network layer is a logical layer concept, and one neural network layer means that one neural network operation needs to be performed. For details, refer to the description in
A resistance value or a conductance value in an in-memory computing unit may be used to indicate a weight value in a neural network layer. In this embodiment of this application, a resistance value or a conductance value in the at least one in-memory computing unit in the some neural network arrays in the plurality of neural network arrays may be adjusted or rewritten based on the calculated deviation.
There are a plurality of implementations for adjusting or rewriting the resistance value or the conductance value in the in-memory computing unit. In a possible implementation, an update value of the resistance value or the conductance value in the in-memory computing unit may be determined based on the deviation, and a fixed quantity of programming pulses may be applied to the in-memory computing unit based on the update value. In another possible implementation, an update value of the resistance value or the conductance value in the in-memory computing unit is determined based on the deviation, and a programming pulse is applied to the in-memory computing unit in a read-while-write manner. In another possible implementation, different quantities of programming pulses may alternatively be applied based on characteristics of different in-memory computing units, to adjust or rewrite resistance values or conductance values in the in-memory computing units. The following provides description with reference to specific embodiments, and details are not described herein.
It should be noted that, in this embodiment of this application, a resistance value or a conductance value of a neural network array that is in the plurality of neural network arrays and that is configured to implement a fully-connected layer may be adjusted by using the deviation, or a resistance value or a conductance value of a neural network array that is in the plurality of neural network arrays and that is configured to implement a convolutional layer may be adjusted by using the deviation, or resistance values or conductance values of a neural network array configured to implement a fully-connected layer and a neural network array configured to implement a convolutional layer may be simultaneously adjusted by using the deviation. The following provides a detailed description with reference to
For ease of description, the following first describes in detail the process of computing a residual, using computation of a residual between an actual output value and a target output value as an example.
In a forward propagation (FP) computing process, a training data set, such as pixel information of an input image, is obtained, and data of the training data set is input into a neural network. After the data is propagated from the first neural network layer to the last neural network layer, an actual output value is obtained from the output of the last neural network layer.
In a back propagation (BP) computing process, it is expected that an actual output value of a neural network is as close as possible to prior knowledge of training data. The prior knowledge is also referred to as a ground truth or an ideal output value, and generally includes a true result corresponding to the training data provided by a person. Therefore, a current actual output value may be compared with the ideal output value, and then a residual value may be calculated based on a deviation between the current actual output value and the ideal output value. Specifically, a partial derivative of a target loss function may be calculated. A required update weight value is calculated based on the residual value, so that a weight value stored in at least one in-memory computing unit in a neural network array may be updated based on the required update weight value.
In an example, a square of a difference between the actual output value of the neural network and the ideal output value may be calculated, and a derivative of the square with respect to each weight in a weight matrix may be calculated to obtain a residual value.
Based on the determined residual value and input data corresponding to a weight value, a required update weight value is determined by using a formula (1).
ΔW represents the required update weight value, rl represents a learning rate, N indicates that there are N groups of input data, V represents an input data value of a current layer, and δ represents a residual value of the current layer.
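Formula (1) itself is rendered as an image in the original filing and does not survive extraction. From the symbol definitions above, a plausible reconstruction (an assumption; some formulations additionally divide by N to average over the input groups) is:

```latex
\Delta W = r_{l} \sum_{n=1}^{N} V_{n}\,\delta_{n}^{\top} \tag{1}
```

That is, the outer product of the input data value V of the current layer and the residual value δ of the current layer is accumulated over the N groups of input data and scaled by the learning rate.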
Specifically, referring to
In a forward operation, a voltage is input at the BL, a current is output at the SL, and a matrix-vector multiplication computation of Y=XW is completed (X corresponds to an input voltage V, and Y corresponds to an output current I). X is input computation data that may be used for forward inference.
In a backward operation, a voltage is input at the SL, a current is output at the BL, and a computation of Y=XWT is performed (X corresponds to an input voltage V, and Y corresponds to an output current I). X is a residual value, that is, a back propagation computation of the residual value is completed. A memristor array update operation (also referred to as in-situ updating) may complete a process of changing a weight in a gradient direction.
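The forward and backward crossbar operations described above may be sketched in software as follows, with a conductance matrix G standing in for the weight matrix W; the dimensions are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch of the two crossbar operations: G maps input
# voltages to output currents, playing the role of the weight matrix W.
rng = np.random.default_rng(2)
G = rng.uniform(0.1, 1.0, size=(8, 4))   # conductances: 8 BLs x 4 SLs

# Forward operation: voltages applied at the BLs, currents read at the SLs.
V_forward = rng.standard_normal(8)       # X: input computation data
I_forward = V_forward @ G                # Y = X W

# Backward operation: voltages applied at the SLs, currents read at the BLs.
V_backward = rng.standard_normal(4)      # X: the residual value
I_backward = V_backward @ G.T            # Y = X W^T (residual back propagation)
```

The same physical array thus performs both the Y = XW multiplication and the Y = XWᵀ multiplication, depending only on which terminals drive voltages and which are read.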
Optionally, in some embodiments, for a cumulative update weight obtained in a row m and a column n of the layer, whether to update a weight value of the row m and the column n of the layer may be further determined based on the following formula (2).
Threshold represents a preset threshold.
For the cumulative update weight ΔWm,n obtained in the row m and the column n of the layer, a threshold updating rule shown in the formula (2) is used. That is, for a weight that does not meet a threshold requirement, no updating is performed. Specifically, if ΔWm,n is greater than or equal to the preset threshold, the weight value of the row m and the column n of the layer may be updated. If ΔWm,n is less than the preset threshold, the weight value of the row m and the column n of the layer is not updated.
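The threshold updating rule of formula (2) may be sketched as follows. This is an assumed reading in which the magnitude of the cumulative update weight is compared against the preset threshold; the matrices and threshold value are made up for illustration:

```python
import numpy as np

# Threshold updating rule sketch: a cumulative update weight dW[m, n] is
# applied only when its magnitude reaches the preset threshold; otherwise
# the stored weight value of row m, column n is left unchanged.
threshold = 0.05
W = np.zeros((3, 3))
dW = np.array([[0.10, 0.01, -0.08],
               [0.02, 0.06,  0.00],
               [-0.04, 0.12, 0.03]])

mask = np.abs(dW) >= threshold       # which cells meet the threshold requirement
W = W + np.where(mask, dW, 0.0)      # update only those cells
```

Skipping sub-threshold updates in this way avoids spending programming pulses on negligible weight changes.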
With reference to
As shown in
As shown in
In this embodiment of this application, a residual value may be calculated based on the first output data and ideal output data by using the foregoing method for calculating a residual value. In addition, based on the formula (1), in-situ updating is performed on a weight value stored in each memristor in the first memristor array for implementing computing of the fully-connected layer.
As shown in
In this embodiment of this application, a residual value may be calculated based on the first output data and ideal output data by using the foregoing method for calculating a residual value. In addition, based on the formula (1), in-situ updating is performed on a weight value stored in each memristor in the first memristor array for implementing computing of the fully-connected layer.
With reference to
As shown in
In this embodiment of this application, a residual value may be calculated based on the output data and ideal output data by using the foregoing method for calculating a residual value. In addition, based on the formula (1), in-situ updating is performed on a weight value stored in each memristor in a plurality of memristor arrays (for example, the second memristor array, the third memristor array, and the fourth memristor array) for implementing computing of a convolutional layer in parallel. There are a plurality of specific implementations.
In a possible implementation, a residual value may be calculated based on output values of the plurality of memristor arrays (for example, the second memristor array, the third memristor array, and the fourth memristor array) and a corresponding ideal output value, and based on the residual value, in-situ updating may be performed on the weight value stored in each memristor in the plurality of memristor arrays for implementing computing of the convolutional layer in parallel.
In another possible implementation, a residual value may alternatively be calculated based on a first output value of a first memristor array and a corresponding ideal output value, and based on the residual value, in-situ updating may be performed on the weight value stored in each memristor in the plurality of memristor arrays (for example, the second memristor array, the third memristor array, and the fourth memristor array) for implementing computing of the convolutional layer in parallel. The following provides a detailed description with reference to
As shown in
Each sub-residual corresponds to output data of each of a plurality of memristor arrays for parallel computing. For example, the residual 1 corresponds to output data of a second memristor array, the residual 2 corresponds to output data of a third memristor array, and the residual 3 corresponds to output data of a fourth memristor array.
In this embodiment of this application, based on input data of each of the plurality of memristor arrays and the corresponding sub-residual, in combination with the formula (1), in-situ updating is performed on a weight value stored in each memristor in the memristor array.
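The per-array sub-residual updating may be sketched as follows; the residual is divided to match the output slice each parallel array produced, and each array is updated from its own input and sub-residual with an outer-product update scaled by a learning rate (sizes and names are illustrative assumptions):

```python
import numpy as np

# Illustrative sketch: split the complete residual into sub-residuals
# (residual 1, 2, 3), one per parallel memristor array, and update each
# array in situ from its own input data and sub-residual.
rng = np.random.default_rng(3)
lr = 0.01
inputs = [rng.standard_normal(9) for _ in range(3)]       # per-array input data
weights = [rng.standard_normal((9, 4)) for _ in range(3)]  # per-array stored weights

residual = rng.standard_normal(12)        # complete residual of the layer
res_1, res_2, res_3 = np.split(residual, 3)  # sub-residuals per array's output

for W, x, d in zip(weights, inputs, (res_1, res_2, res_3)):
    W += lr * np.outer(x, d)              # in-situ outer-product update
```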
According to a structure of input data shown in
In this embodiment of this application, a residual value may be calculated based on the output data and ideal output data by using the foregoing method for calculating a residual value. In addition, based on the formula (1), rewriting is performed on a weight value stored in each memristor in a plurality of memristor arrays (for example, the second memristor array, the third memristor array, and the fourth memristor array) for implementing computing of a convolutional layer in parallel. There are a plurality of specific implementations.
In a possible implementation, a residual value may be calculated based on output values of the plurality of memristor arrays (for example, the second memristor array, the third memristor array, and the fourth memristor array) and a corresponding ideal output value, and based on the residual value, in-situ updating may be performed on the weight value stored in each memristor in the plurality of memristor arrays for implementing computing of the convolutional layer in parallel.
In another possible implementation, a residual value may alternatively be calculated based on a first output value of a first memristor array and a corresponding ideal output value, and based on the residual value, in-situ updating may be performed on the weight value stored in each memristor in the plurality of memristor arrays (for example, the second memristor array, the third memristor array, and the fourth memristor array) for implementing computing of the convolutional layer in parallel. The following provides a detailed description with reference to
As shown in
Optionally, in this embodiment of this application, weight values stored in upstream arrays of a plurality of memristor arrays for implementing computing of a convolutional layer in parallel may be further adjusted, and a residual value of each layer of neurons may be calculated in a back propagation manner. For details, refer to the method described above. Details are not described herein. It should be understood that, for upstream neural network arrays of the plurality of memristor arrays for implementing computing of the convolutional layer in parallel, input data of these arrays may be output data of further upstream memristor arrays, or may be raw data input from the outside, such as an image, a text, or a speech. Output data of these arrays is used as input data of the plurality of memristor arrays for implementing computing of the convolutional layer in parallel.
With reference to
With reference to
The set operation is used to adjust a conductance of the memristor cell from a low conductance to a high conductance, and the reset operation is used to adjust the conductance of the memristor cell from the high conductance to the low conductance.
As shown in
As shown in
There are a plurality of specific implementations for adjusting the conductance of the target memristor cell. For example, a fixed quantity of programming pulses may be applied to the target memristor cell. For another example, a programming pulse may alternatively be applied to the target memristor cell in a read-while-write manner. For another example, different quantities of programming pulses may alternatively be applied to different memristor cells, to adjust conductance values of the memristor cells.
Based on the set operation and the reset operation, and with reference to
In this embodiment of this application, the target data may be written into the target memristor cell based on an incremental step pulse programming (ISPP) policy. Specifically, according to the ISPP policy, the conductance of the target memristor cell is generally adjusted in a “read verification-correction” manner, so that the conductance of the target memristor cell is finally adjusted to a target conductance corresponding to the target data.
Referring to
Referring to
It should be understood that Vread may be a read voltage pulse whose amplitude is less than a threshold voltage, and Vset or Vreset may be a programming voltage pulse whose amplitude is greater than the threshold voltage.
In this embodiment of this application, the conductance of the target memristor cell may be finally adjusted in the read-while-write manner to the target conductance corresponding to the target data. Optionally, a terminating condition may be that conductance increase amounts of all selected components in the row meet a requirement.
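The ISPP "read verification-correction" loop described above may be sketched as follows. The function names, step sizes, and the software stand-in for a memristor cell are all assumptions for this sketch, not hardware interfaces from this application:

```python
# ISPP sketch: read the cell, compare with the target conductance, apply a
# set pulse (increase) or reset pulse (decrease) as needed, and stop once
# the read-back conductance is within tolerance of the target.
def ispp_program(read_cell, apply_set, apply_reset, target, tol=0.01,
                 max_pulses=100):
    for pulse in range(max_pulses):
        g = read_cell()                  # read verification
        if abs(g - target) <= tol:
            return pulse                 # target conductance reached
        if g < target:
            apply_set()                  # correction: increase conductance
        else:
            apply_reset()                # correction: decrease conductance
    return max_pulses

# A trivial software stand-in for one memristor cell:
state = {"g": 0.2}
pulses = ispp_program(lambda: state["g"],
                      lambda: state.__setitem__("g", state["g"] + 0.05),
                      lambda: state.__setitem__("g", state["g"] - 0.05),
                      target=0.5)
```

Because every pulse is followed by a read, the loop terminates as soon as the conductance lands within tolerance, which also makes it naturally robust to cell-to-cell variation.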
Step 2210: Determine, based on neural network information, a network layer that needs to be accelerated.
In this embodiment of this application, the network layer that needs to be accelerated may be determined based on one or more of the following: a quantity of layers of the neural network, parameter information, a size of a training data set, and the like.
Step 2215: Perform offline training on an external personal computer (PC) to determine an initial training weight.
A weight parameter on a neuron of the neural network may be trained on the external PC by performing steps such as forward computing and backward computing, to determine the initial training weight.
Step 2220: Separately map the initial training weight to a neural network array that implements parallel acceleration of network layer computing and a neural network array that implements non-parallel acceleration of network layer computing in an in-memory computing architecture.
In this embodiment of this application, the initial training weight may be separately mapped to at least one in-memory computing unit in a plurality of neural network arrays in the in-memory computing architecture based on the method shown in
The plurality of neural network arrays may include the neural network array that implements non-parallel acceleration of network layer computing and the neural network array that implements parallel acceleration of network layer computing.
Step 2225: Input a set of training data into the plurality of neural network arrays in the in-memory computing architecture, to obtain an output result of forward computing based on actual hardware of the in-memory computing architecture.
Step 2230: Determine whether accuracy of a neural network system meets a requirement or whether a preset quantity of training times is reached.
If the accuracy of the neural network system meets the requirement or the preset quantity of training times is reached, step 2235 may be performed.
If the accuracy of the neural network system does not meet the requirement or the preset quantity of training times is not reached, step 2240 may be performed.
Step 2235: Training ends.
Step 2240: Determine whether the training data is a last set of training data.
If the training data is the last set of training data, step 2245 and step 2255 may be performed.
If the training data is not the last set of training data, step 2250 and step 2255 may be performed.
Step 2245: Reload training data.
Step 2250: Based on the training method for parallel training of an in-memory computing system proposed in this application, perform on-chip in-situ training and updating on conductance weights of the parallel acceleration arrays or other arrays through computing such as back propagation.
For a specific updating method, refer to the foregoing description. Details are not described herein.
Step 2255: Load a next set of training data.
After the next set of training data is loaded, the operation in step 2225 continues to be performed. That is, the loaded training data is input into the plurality of neural network arrays in the in-memory computing architecture, to obtain an output result of forward computing based on the actual hardware of the in-memory computing architecture.
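The overall control flow of steps 2225 through 2255 may be sketched at a high level as follows; all function names are placeholders for the hardware operations described above, not APIs from this application:

```python
# High-level sketch of the training loop in steps 2225 to 2255.
def train(data_sets, forward, accuracy_ok, update_weights, max_rounds=10):
    rounds = 0
    while rounds < max_rounds:                # preset quantity of training times
        for batch in data_sets:               # step 2255: load next set of data
            output = forward(batch)           # step 2225: forward on actual hardware
            if accuracy_ok(output):           # step 2230: check accuracy requirement
                return rounds                 # step 2235: training ends
            update_weights(output)            # step 2250: on-chip in-situ update
        rounds += 1                           # step 2245: reload training data
    return rounds                             # preset quantity of times reached

# Toy usage: "accuracy" improves by one unit per weight update.
state = {"acc": 0}
r = train(data_sets=[1, 2],
          forward=lambda b: state["acc"],
          accuracy_ok=lambda out: out >= 3,
          update_weights=lambda out: state.__setitem__("acc", state["acc"] + 1))
```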
It should be understood that sequence numbers of the foregoing processes do not mean execution sequences in embodiments of this application. The execution sequences of the processes should be determined based on functions and internal logic of the processes, and should not be construed as any limitation to the implementation processes of embodiments of this application.
With reference to
It should be noted that implementation of the solutions of this application is considered in the apparatus embodiments from a perspective of a product and a device. Some content of the apparatus embodiments of this application and some content of the foregoing described method embodiments of this application are corresponding to or complementary to each other. The content is universal in terms of implementation of the solutions and support for a scope of the claims.
The following describes an apparatus embodiment of this application with reference to
As shown in
a processing module 2310, configured to input training data into the neural network system to obtain first output data, where the neural network system includes a plurality of neural network arrays, each of the plurality of neural network arrays includes a plurality of in-memory computing units, and each in-memory computing unit is configured to store a weight value of a neuron in a corresponding neural network;
a calculation module 2320, configured to calculate a deviation between the first output data and target output data; and
an adjustment module 2330, configured to adjust, based on the deviation, a weight value stored in at least one in-memory computing unit in some neural network arrays in the plurality of neural network arrays, where the some neural network arrays are configured to implement computing of some neural network layers in the neural network system.
Optionally, in a possible implementation, the plurality of neural network arrays include a first neural network array and a second neural network array, and input data of the first neural network array includes output data of the second neural network array.
In another possible implementation, the first neural network array includes a neural network array configured to implement computing of a fully-connected layer in the neural network.
Optionally, in another possible implementation, the adjustment module 2330 is specifically configured to:
adjust, based on input data of the first neural network array and the deviation, a weight value stored in at least one in-memory computing unit in the first neural network array.
Optionally, in another possible implementation, the plurality of neural network arrays further include a third neural network array, and the third neural network array and the second neural network array are configured to implement computing of a convolutional layer in the neural network in parallel.
Optionally, in another possible implementation,
the adjustment module 2330 is specifically configured to:
adjust, based on input data of the second neural network array and the deviation, a weight value stored in at least one in-memory computing unit in the second neural network array; and
adjust, based on input data of the third neural network array and the deviation, a weight value stored in at least one in-memory computing unit in the third neural network array.
In another possible implementation,
the adjustment module 2330 is specifically configured to:
divide the deviation into at least two sub-deviations, where a first sub-deviation in the at least two sub-deviations corresponds to the output data of the second neural network array, and a second sub-deviation in the at least two sub-deviations corresponds to output data of the third neural network array;
adjust, based on the first sub-deviation and input data of the second neural network array, a weight value stored in at least one in-memory computing unit in the second neural network array; and
adjust, based on the second sub-deviation and input data of the third neural network array, a weight value stored in at least one in-memory computing unit in the third neural network array.
Optionally, in another possible implementation, the adjustment module 2330 is specifically configured to determine a quantity of pulses based on an updated weight value in the in-memory computing unit, and rewrite, based on the quantity of pulses, the weight value stored in the at least one in-memory computing unit in the neural network array.
It should be understood that the neural network system 2300 herein is embodied in a form of a functional module. The term “module” herein may be implemented in a form of software and/or hardware. This is not specifically limited. For example, the “module” may be a software program, a hardware circuit, or a combination thereof that implements the foregoing functions. When any one of the foregoing modules is implemented by using software, the software exists in a form of computer program instructions, and is stored in a memory. A processor may be configured to execute the program instructions to implement the foregoing method procedures. The processor may include but is not limited to at least one of the following computing devices that run various types of software: a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a microcontroller unit (MCU), an artificial intelligence processor, and the like. Each computing device may include one or more cores configured to perform an operation or processing by executing software instructions. The processor may be an independent semiconductor chip, or may be integrated with another circuit to constitute a semiconductor chip. For example, the processor may constitute a system on chip (SoC) with another circuit (for example, an encoding/decoding circuit, a hardware acceleration circuit, or various bus and interface circuits). Alternatively, the processor may be integrated into an application-specific integrated circuit (ASIC) as a built-in processor of the ASIC, and the ASIC integrated with the processor may be independently packaged or may be packaged with another circuit. The processor includes a core configured to perform an operation or processing by executing software instructions, and may further include a necessary hardware accelerator, for example, a field programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit that implements a special-purpose logic operation.
When the foregoing modules are implemented by using the hardware circuit, the hardware circuit may be implemented by a general-purpose central processing unit (CPU), a microcontroller unit (MCU), a micro processing unit (MPU), a digital signal processor (DSP), and a system on chip (SoC), or may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The PLD may run necessary software or does not depend on software to execute the foregoing method.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used to implement embodiments, all or some of the foregoing embodiments may be implemented in a form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the program instructions or the computer programs are loaded and executed on a computer, the procedure or functions according to embodiments of this application are all or partially generated. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or may be transmitted from a computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive.
It should be understood that the term “and/or” in this specification describes only an association relationship between associated objects and represents that three relationships may exist. For example, A and/or B may represent the following three cases: only A exists, both A and B exist, and only B exists. A and B may be singular or plural. In addition, the character “/” in this specification usually represents an “or” relationship between the associated objects, or may represent an “and/or” relationship. A specific meaning depends on a context.
In this application, “at least one” refers to one or more, and “a plurality of” refers to two or more. “At least one of the following items (pieces)” or a similar expression thereof means any combination of these items, including any combination of singular items (pieces) or plural items (pieces). For example, at least one (piece) of a, b, or c may represent: a, b, c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may be singular or plural.
It should be understood that sequence numbers of the foregoing processes do not mean execution sequences in embodiments of this application. The execution sequences of the processes should be determined based on functions and internal logic of the processes, and should not be construed as any limitation to the implementation processes of embodiments of this application.
A person of ordinary skill in the art may be aware that, in combination with the examples described in embodiments disclosed in this specification, units and algorithm steps can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether the functions are performed by the hardware or the software depends on particular applications and design constraint conditions of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of this application.
It may be clearly understood by the person skilled in the art that, for the purpose of convenient and brief description, for a detailed working process of the foregoing system, apparatus, and unit, refer to a corresponding process in the foregoing method embodiments. Details are not described herein again.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiments are merely examples. For example, division into the units is merely logical function division and may be other division in an actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or may not be performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected depending on actual requirements to achieve the objectives of the solutions in embodiments.
In addition, functional units in embodiments of this application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.
When the functions are implemented in the form of a software function unit and sold or used as an independent product, the functions may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the conventional technology, or some of the technical solutions may be implemented in a form of a software product. The computer software product is stored in a storage medium, and includes several instructions for instructing a computer device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the methods described in embodiments of this application. The foregoing storage medium includes any medium, for example, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, that can store program code.
The foregoing description is merely a specific implementation of this application, but is not intended to limit the protection scope of this application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Number | Date | Country | Kind |
---|---|---|---|
201911144635.8 | Nov 2019 | CN | national |
This application is a continuation of International Application No. PCT/CN2020/130393, filed on Nov. 20, 2020, which claims priority to Chinese Patent Application No. 201911144635.8, filed on Nov. 20, 2019. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/CN2020/130393 | Nov 2020 | US |
Child | 17750052 | US |