The present invention relates to a method for the analogue multiplication and/or for the analogue calculation of a scalar product, formed by multiplication of a first value by a second value of a respective value pair, and summation of results of the multiplication for a plurality of value pairs, with a circuit assembly. The invention also relates to the application of the method in an artificial neural network (ANN).
In electronic signal processing, most circuit parts in the digital domain are nowadays implemented with the aid of CMOS technology and binary static logic. Analogue-to-digital conversion (ADC) and digital-to-analogue conversion (DAC) are moved to the edges of the system as much as possible. This approach of mainly digital signal processing has benefited greatly from the previous scaling of semiconductor technology, i.e. from Moore's Law. The technology-driven efficiency gains have offset the ever-increasing demand for processing power. However, Moore's Law has slowed down considerably in more recent times, so that this offset is at risk in the future, especially with the increasing demand for signal processing power in the field of artificial intelligence (AI). In particular, the demands on computing power of deep neural networks (DNN) are increasing much faster than the scaling gains of the underlying CMOS technology. There is therefore an urgent need for new energy-efficient signal processing techniques that can be used in artificial neural networks.
At the present time, artificial neural networks mainly use digital signal processing techniques based on CMOS technology. However, these techniques will reach their limits in the foreseeable future in terms of energy efficiency, with the constantly increasing demands on computing power.
Approaches nowadays include mixed signal processing based on CMOS technology using switched capacitor (SC) charge redistribution techniques for analogue multiplication or summation. These approaches use specially allocated SRAM arrays to store the input values (activations) and weight factors (weights) and a neuron array for the calculation of the scalar product of the input and weight vectors. In the layout of the corresponding integrated circuit, the memory array and neuron array represent locally separated units. For the further reduction of the energy consumption for data transport between these units, it is also of known art to integrate both units into a single unit. This approach is called in-memory processing.
M. Bavandpour et al., “Mixed-Signal Neuromorphic Inference Accelerators: Recent Results and Future Prospects”, in 2018 IEEE International Electron Devices Meeting (IEDM), provide an overview of circuit assemblies for vector-matrix multiplication (VMM). In one of these circuit assemblies, each weight factor is stored in floating-gate cells, which are implemented as a voltage-controlled current source. Here, both the input and output values are encoded as pulse widths of voltage pulses.
The object of the present invention is to specify a method for the multiplication and/or formation of a scalar product, which can be implemented in CMOS technology, and enables energy-efficient operation, together with an artificial neural network in which the method is used.
The object is achieved with the methods according to claims 1 and 2, together with the artificial neural networks according to claims 10 and 11. Advantageous configurations of the methods and the artificial neural networks are the subject matter of the dependent patent claims, or can be found in the following description and the examples of embodiment.
In the proposed method for analogue multiplication, as in the method for analogue calculation of a scalar product, a circuit assembly is used, which has a series circuit comprising a first FET and a second FET, or an FET array comprising a plurality of parallel-connected second FETs, serving as a current source, a charging device, and at least one capacitance, which can be precharged by way of the charging device, and can be discharged by way of the series circuit comprising the first FET and the at least one second FET, or FET array. The charging device can be formed simply in terms of a switch, by way of which the capacitance can be connected to a voltage source. To multiply a first value by a second value, the capacitance is first precharged. Here the first value, encoded as the pulse width of a voltage pulse, is applied to the gate of the first FET, and the second value, encoded as a voltage amplitude, is applied to the gate of the second FET, or, encoded as binary voltage amplitudes, to the gates of the parallel-connected second FETs, so that the capacitance is at least partially discharged for a period of time, which is specified by the pulse width of the voltage pulse applied to the gate of the first FET, with a discharge current, which is specified by the voltage amplitude(s) applied to the gate of the second FET, or to the gates of the parallel-connected second FETs. The result of the multiplication can then be determined either from the residual charge or voltage of the capacitance, or—in a configuration described further below, taking into account the sign of the first value—from a voltage difference or charge difference between this capacitance and a further capacitance.
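The charge-balance relationship described above can be sketched numerically. The following model uses illustrative component values (not taken from the description) to show how the residual capacitor voltage encodes the product of pulse width and discharge current:

```python
def ams_multiply(pulse_width_s, drain_current_a, c_farad, u_precharge_v):
    """Model one AMS multiplication: the capacitance, precharged to
    u_precharge_v, is discharged for pulse_width_s seconds with a
    constant drain current set by the second FET's gate voltage."""
    charge_drawn = drain_current_a * pulse_width_s       # Q = I * T
    u_residual = u_precharge_v - charge_drawn / c_farad  # U = U0 - Q/C
    return charge_drawn, u_residual

# Illustrative values: 1 uA drawn for 100 ns from a 1 pF capacitor at 0.8 V
q, u = ams_multiply(100e-9, 1e-6, 1e-12, 0.8)
print(q, u)  # charge drawn (about 1e-13 C) and residual voltage (about 0.7 V)
```

The residual voltage thus drops linearly with the product of the two encoded values, which is the multiplication result read out from the capacitance.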
For the calculation of the scalar product, an analogue circuit assembly is used, which has a plurality of parallel-connected series circuits, each comprising a first FET and a second FET, or an FET array comprising a plurality of parallel-connected second FETs, serving as a current source, a charging device, and at least one capacitance, which can be precharged by way of the charging device, and can be discharged by way of the series circuits comprising the first FET and the second FET, or FET array. The calculation of a scalar product is hereby understood to be the multiplication of a first value by a second value of a value pair, and the summation of the results of the multiplication for a plurality of value pairs, as is the case with the multiplication of two vectors with vector components (as values) in the Cartesian coordinate system. Each of the value pairs of the scalar product is associated with one of the series circuits. The number of series circuits must correspond at least to the number of value pairs of the scalar product, or the vector components of the vectors that are to be multiplied with each other. For the calculation of the scalar product, the capacitance is again first precharged.
For each of the value pairs, the first value, encoded as the pulse width of a voltage pulse, is applied to the gate of the first FET of the series circuit associated with the respective value pair, and the second value, encoded as the voltage amplitude, is applied to the gate of the second FET, or, encoded as binary voltage amplitudes, is applied to the gates of the second FETs of the associated parallel-connected series circuit, so that the capacitance is at least partially discharged for a period of time, which is specified by the pulse width of the voltage pulse applied to the gate of the first FET of the respective series circuit, with a discharge current, which is specified by the voltage amplitude(s) applied to the gate of the second FET, or to the gates of the parallel-connected second FETs of the respective series circuit. By virtue of the parallel connection of the series circuits, the discharge currents add up according to Kirchhoff's Law. A result of the calculation of the scalar product can then be determined either from the residual charge or voltage of the capacitance or—in the case of a configuration described further below, taking into account the sign of the first value—from a voltage or charge difference between this capacitance and another capacitance.
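The summation by Kirchhoff's current law can likewise be sketched numerically: in this model (component values again illustrative), the branch charges of all parallel series circuits accumulate on the shared capacitance:

```python
def ams_dot_product(pulse_widths_s, drain_currents_a, c_farad, u_precharge_v):
    """Parallel AMS branches on one shared capacitance: by Kirchhoff's
    current law the branch discharge currents add, so the total charge
    drawn is the sum of the per-branch products (pulse width * current)."""
    total_charge = sum(t * i for t, i in zip(pulse_widths_s, drain_currents_a))
    return u_precharge_v - total_charge / c_farad

# Three value pairs; each branch draws 1e-13 C, so 3e-13 C in total
u_out = ams_dot_product([100e-9, 50e-9, 200e-9],
                        [1e-6, 2e-6, 0.5e-6], 1e-12, 0.8)
print(u_out)  # residual voltage encodes the scalar product (about 0.5 V)
```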
The proposed methods and neural networks use as field effect transistors (FET), in particular, field effect transistors with an insulating gate (“MISFET”), preferably MOSFETs (metal-oxide-semiconductor FETs). The capacitance to be discharged can be formed by a capacitor, or also by the parasitic capacitances of the FETs and connection lines used.
The proposed methods thus employ a circuit assembly with a non-linear transfer function, which in the basic embodiment consists of two stacked FETs connected in series and a capacitance. In what follows, this circuit assembly, by virtue of its function, is also referred to as an analogue mixed signal multiplier (AMS). The first multiplicand value is represented as the pulse width of a voltage pulse, which is applied to the gate of the first FET. The second multiplicand is encoded as an analogue voltage, which is applied to the gate of the second FET. An electrical charge packet, which is proportional to the product of the multiplicands, is accumulated on the capacitance, or subtracted from its charge. The connection of further stacked FET series circuits to the capacitance enables the calculation of a scalar product with a minimum number of components. The analogue multiplication can thereby be carried out with a low energy input, as will be explained in more detail later in the examples of embodiment.
Due to the simple construction with FETs and capacitance, the circuit assembly that is used can be implemented in CMOS technology. In contrast to the digital processing techniques with binary signals that have been preferred in electronic signal processing to date, the proposed methods use analogue mixed-signal processing, in which selected electrical nodes carry significantly more information in the analogue domain, which is physically limited only by noise and leakage. The methods enable complex arithmetic operations to be executed on individual capacitive circuit nodes, wherein as few components as possible are involved, and the number of circuit nodes required is drastically reduced compared to digital techniques. The methods and the circuit assemblies used therein can be used particularly advantageously for computing operations in neurons or neuron layers in artificial neural networks. The basic operation of artificial neurons can be mapped onto the proposed AMS circuit assembly with a small number of circuit nodes and components, and implemented with advanced CMOS foundry technology.
In a preferred application of the method for the calculation of a scalar product in an artificial neural network, the circuit assembly, comprising the parallel-connected series circuits, the charging device, and the capacitance, is in each case part of an artificial neuron. Each value pair corresponds to a weight factor and an input value of the artificial neuron. In the preferred configuration, the weight factor is selected as the first value of each value pair, and the input value is selected as the second value. Thus, the weight factors, in each case encoded as the pulse width of a voltage pulse, are applied to the gate of the first FET, and the input values, encoded as voltage amplitudes, are applied to the gate of the second FET. The weight factors, which are usually available as digital values, must be suitably stored, preferably in appropriate SRAMs, and are in each case converted into the corresponding voltage pulses by means of a digital-to-time converter (DTC).
In an alternative configuration of the proposed method for the calculation of a scalar product in an artificial neural network, the input value is selected as the first value of the value pair, and the weight factor is selected as the second value. In this case, preferably not just one, but a plurality of second FETs, are used in a parallel circuit, wherein the weight factors are again appropriately stored in a digital manner. The individual binary digits of the respective weight factor—encoded as voltage amplitude—then control the individual second FETs of the FET array, i.e. the parallel circuit of the second FETs. This will be explained in more detail in the example of embodiment.
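The control of the FET array by the binary digits of the weight can be sketched as follows. The assumption of binary-weighted device widths (so that bit k contributes 2**k unit currents) is one common realisation for illustration; the description itself leaves the sizing to the examples of embodiment:

```python
def weight_array_current(weight_bits, i_unit_a):
    """Digitally stored weight driving an FET array: bit k of the weight
    (LSB first) switches one parallel second FET on or off. With
    binary-weighted device widths (an assumption here), each set bit
    contributes 2**k unit currents, so the summed drain current is
    proportional to the digital weight value."""
    return sum((1 << k) * i_unit_a for k, bit in enumerate(weight_bits) if bit)

# 4-bit weight 0b1011 = 11 (LSB first), with a 100 nA unit current
i = weight_array_current([1, 1, 0, 1], 100e-9)
print(i)  # summed array current, proportional to the weight value 11
```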
By the utilisation of two parallel branches with first FETs, which are connected to the at least one second FET, or FET array, in series, and two capacitors, it is also possible to process signed weight factors. Signed input values can also be implemented by means of an appropriate extension of the circuit topology.
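The two-branch, two-capacitor scheme for signed weights can be modelled in the following way. The routing rule and sign convention below are assumptions for illustration, since the description leaves the exact topology to the figures:

```python
def signed_multiply(weight, x_current_a, t_unit_s, c_farad, u_precharge_v):
    """Signed weight via two parallel branches and two capacitors: the
    weight pulse (width proportional to |weight|) is routed to the 'plus'
    branch for positive weights and to the 'minus' branch for negative
    ones (assumed convention); the result is read as the voltage
    difference between the two precharged capacitances."""
    t_pulse = abs(weight) * t_unit_s
    dq = x_current_a * t_pulse
    u_plus = u_precharge_v - (dq / c_farad if weight >= 0 else 0.0)
    u_minus = u_precharge_v - (dq / c_farad if weight < 0 else 0.0)
    return u_minus - u_plus  # positive weight lowers u_plus -> positive result

result_pos = signed_multiply(+3, 1e-6, 10e-9, 1e-12, 0.8)
result_neg = signed_multiply(-3, 1e-6, 10e-9, 1e-12, 0.8)
print(result_pos, result_neg)  # equal magnitudes, opposite signs
```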
With the proposed method for the calculation of the scalar product, and the circuit assembly used in the latter, a neural network can be constructed, in which the neurons are formed by the circuit assemblies. Depending on the configuration, digital-to-time converters (DTC) must also be implemented. Suitable transfer circuits for the displacement of the charge at the output of a neuron to the inputs of neurons of the respectively subsequent layer may also be required, depending on the configuration. An example of such a transfer circuit can be found in the following examples of embodiment.
A major advantage of the proposed methods, and the circuit assemblies used in the latter, is a very low power consumption. An approximately 500-fold increase in energy efficiency is anticipated for a 28 nm CMOS AMS, compared to a digital 8-bit × 8-bit array multiplier. The circuit assemblies used in the methods can be implemented in commercial standard CMOS technologies, which are also used for a variety of standard and application-specific integrated circuits. The proposed circuit assemblies can thus be used in a hybrid approach together with traditional analogue, RF, digital, and memory blocks on a single chip in a “system-on-chip” approach. An AMS IP library enables the design of an AMS co-processor IP, which can be positioned together with other blocks on an application-specific integrated circuit (ASIC), or together with standard COTS digital processor ICs, e.g. standard smartphone processors, so as to enable highly energy-efficient shared processing of specific ANN-related tasks. By the utilisation of established standard CMOS logic technology, power-hungry chip-to-chip interfaces for such hybrid systems with conventional digital and new analogue signal processing are avoided. The proposed methods are particularly well suited for applications that do not require exceptional precision. The development of classification tasks based on neural networks can also be supported by the improved energy efficiency of the proposed method, for example in RADAR and LIDAR object recognition for autonomous driving in automobiles, or in mobile, person-assisted speech and image recognition.
The proposed methods, in conjunction with an artificial neural network, are explained once again in more detail below by means of examples of embodiment, in conjunction with the figures. Here:
In the following examples, the proposed method, with the associated circuit assembly, is used to calculate scalar products in an artificial neural network. To this end
With the proposed method, the calculation of the scalar product that takes place in a neuron is executed in an energy-efficient manner.
In the preferred configuration, the multiplication result is evaluated as follows. The lower MOSFET N_x operates as a current-source transistor, which is controlled by its analogue gate-source voltage u_GS,Nx = u_x, provided by way of an input value x, the output of the previous neuron layer. The voltage u_x controls the drain current i_x by way of the nonlinear transfer function I_x(u_x), in accordance with the current equation of the MOSFET. This nonlinearity is part of the nonlinear transfer function φ of the preceding neuron layer. Since the enhancement-mode n-channel MOSFET has a threshold voltage greater than 0, a soft rectifier-like transfer function is implemented.
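The soft rectifier-like behaviour follows from the textbook square-law saturation model of an enhancement-mode n-channel MOSFET; the following sketch uses illustrative threshold and gain values, which are not specified in the description:

```python
def mosfet_current(u_gs, v_th=0.4, k=1e-4):
    """Square-law saturation current of an enhancement-mode n-MOSFET
    (standard textbook model; v_th in V and k in A/V^2 are illustrative):
    zero below threshold, quadratic above, which gives the soft
    rectifier-like transfer function of the current-source transistor."""
    return k * max(u_gs - v_th, 0.0) ** 2

print(mosfet_current(0.2))  # below threshold: no drain current
print(mosfet_current(0.8))  # above threshold: quadratic in (u_gs - v_th)
```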
The drain current i_x is then drawn from the upper pole of the capacitor C only if the stacked MOSFET N_w is also conducting. By setting its gate voltage to U_DD for a period of time T_w corresponding to the weight factor w, the upper MOSFET N_w is switched on. The charge Q_xw drawn from the output node and the corresponding output voltage U_C are given by:

Q_xw = i_x · T_w and U_C = U_C,pre − Q_xw/C,

where U_C,pre denotes the precharge voltage of the capacitance.
The result of the multiplication thus corresponds to the amount of charge Q_xw that flows through the series circuit of these two MOSFETs. The temporal relationships of the voltages and currents in this circuit assembly are shown in the left-hand part of
The artificial neuron function, i.e. a scalar product followed by a non-linear transfer function, is mapped according to simple electrical network principles (i.e. Kirchhoff's Laws) in conjunction with established FET device physics (I_DS = f(U_GS, U_DS)). A neuron output activation is implemented along a single line with a series of multipliers.
Analogue multiplication is implemented by the use of only two small MOSFETs. The total capacitance to be charged or discharged during the multiplication process can be limited to values of only 0.6 fF for 300 nm wide MOSFETs N_x and N_w in 22 nm CMOS. This results in an energy consumption of 0.5 fJ per multiplication at a supply voltage of 0.8 V. In contrast, the estimated operating energy of an 8-bit × 8-bit array multiplier in 28 nm CMOS technology is 8 × 30 fJ = 240 fJ (based on 30 fJ for a single 8-bit adder), resulting in an approximately 500-fold increase in energy efficiency for the proposed AMS.
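The quoted figures can be cross-checked with a back-of-envelope calculation. The CV² full-swing estimate used below is an assumed convention for the switching energy; the text itself states only the resulting values:

```python
# Back-of-envelope check of the energy figures quoted above.
c_total = 0.6e-15           # total switched capacitance, F (from the text)
u_dd = 0.8                  # supply voltage, V (from the text)
e_ams = c_total * u_dd**2   # CV^2 full-swing estimate (assumed convention)
e_digital = 8 * 30e-15      # 8-bit x 8-bit multiplier: 8 adders at 30 fJ each
print(e_ams)                # about 0.38 fJ, consistent with the quoted ~0.5 fJ
print(e_digital / 0.5e-15)  # 480, i.e. the quoted ~500-fold efficiency gain
```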
In the above preferred configuration, the neuron input weight factors w_i are represented by the temporal width T_w,i of current pulses, wherein the current amplitude I_x,i represents the input activations, that is to say, the input values x_i (cf.
In an alternative form of embodiment, the weight and activation inputs, and thus the roles of the lower and upper MOSFETs in the multiplier evaluation path(s) of
The advantage of the alternative form of embodiment of
The AMS multiplier circuits according to
In artificial neural networks, the activation value range is often limited to positive values. However, the weights can be positive or negative.
To implement both signed weights and signed input activations, that is to say, input values, the circuit topologies of
For the alternative configuration (
A single neural layer can be implemented by a matrix-like arrangement of a plurality of AMS multiplication cells, or an arrangement of a plurality of scalar product cells next to each other, as is exemplified in
The connection of the AMS multiplication cell to a horizontal and a vertical line, and the connection to a local weight memory (plus digital-to-time converter (DTC)), are shown in
The matrix arrangement of AMS multiplication cells shown in
A very efficient method for transferring the analogue amplitude domain signals from the outputs back to the inputs is charge transfer. An example of a corresponding circuit for the charge transfer (transfer of a charge deficit) is shown in
Alternatively, the charge transfer can also take place by means of analogue voltage signal transfer through linear analogue buffer amplifiers, i.e. based on operational amplifiers with resistors and/or switched capacitors. Digital signal transfer by the interposition of A/D and D/A converters, preferably implemented in terms of energy-efficient SC-based conversion principles such as SAR, and supplemented by means for the processing of large neural layers and the implementation of artificial transfer functions, is also possible. This can be done, for example, by way of digital memories and blocks for digital signal processing.
In the alternative configuration of the proposed method, the output signals y_i are signals in the charge (Q) or voltage (Q/C) amplitude domain, while the input signals x_i are signals in the pulse-width domain. The signal transfer from the matrix outputs y_i to the matrix inputs x_i therefore requires a charge-to-pulse-width converter, as described in one of the preceding sections.
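One way to model such a charge-to-pulse-width conversion is a constant-current discharge to a reference level, so that the threshold-crossing time is proportional to the stored charge. This ramp-based scheme is an assumption for illustration; the description does not fix the converter circuit:

```python
def charge_to_pulse_width(u_out_v, u_ref_v, c_farad, i_ref_a):
    """Convert a voltage-domain neuron output into a pulse width: the
    capacitance is discharged from u_out_v towards a reference level
    u_ref_v with a constant reference current, so the crossing time is
    proportional to the charge stored above the reference (assumed
    ramp-based conversion scheme)."""
    if u_out_v <= u_ref_v:
        return 0.0  # outputs at or below the reference yield no pulse
    return c_farad * (u_out_v - u_ref_v) / i_ref_a

# 1 pF discharged from 0.7 V to a 0.3 V reference with 1 uA
t = charge_to_pulse_width(0.7, 0.3, 1e-12, 1e-6)
print(t)  # resulting pulse width in seconds, proportional to the charge
```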
A stack of functional blocks, which are required to preload and write to the analogue horizontal lines, and to read the analogue vertical lines of the multiplication and addition matrix, is located on the west and south sides of the matrix respectively (blocks: preload and bias injection).
Neural network layers, which have more neurons than the matrix row and column numbers n and m respectively, can be supported by analogue charge transfer memory units at the southern output and/or western input edge with additional means for analogue charge addition, as represented by the blocks “transfer gate bank” and “capacitor bank” in
Energy-efficient charge transfer from the neuron layer activation outputs on the south side to the inputs of the next neuron layer on the west side can be implemented by maintaining the analogue charge domain using the analogue charge transfer circuits described above. In addition, power-efficient SC-based A/D converters can be connected to the south activation output edge, and D/A converters can be connected to the west activation input edge to enable a hybrid evaluation of the neural network, i.e. parts requiring low precision in the analogue path, and parts requiring high precision in an additional digital path. This additional digital path can also be used for the application of more specialised activation transfer functions.
Number | Date | Country | Kind
---|---|---|---
10 2020 133 088.0 | May 2020 | DE | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/EP2021/062305 | 5/10/2021 | WO |