The present invention relates to a scalar product circuit and a method for computing binary scalar products of an input vector and weight vectors, an associated module, as well as a processing unit and a computer program for carrying out the method.
In many computationally intensive tasks, in particular in artificial intelligence applications or machine learning applications that use neural networks, scalar products of vectors are determined. For example, the convolutions in a convolutional neural network, referred to below as CNN, are scalar products of vectors. To carry out such vector operations quickly and efficiently, vector-matrix multipliers in the form of electronic circuits specifically provided for this purpose are used.
In these vector-matrix multipliers, also referred to as “dot-product engines,” a vector of input voltages is converted into a vector of output voltages using an arrangement of memristors in the form of a matrix, which are situated at intersection points of mutually orthogonally extending lines and which connect the intersecting lines in pairs, the output voltages in each case being proportional to the scalar product (“dot product”) of the vector of the input voltages and the conductivities of the memristors situated in a column. The input voltages are applied to the row lines extending in one direction, and result in currents, across the memristors, into the column lines that extend orthogonally thereto and that are connected to a ground potential. The currents are converted into the output voltages with the aid of transimpedance amplifiers. Such circuits may in each case reach values of several hundred or several thousand rows and columns.
According to the present invention, a scalar product circuit and a method for computing binary scalar products of an input vector and weight vectors, an associated module, as well as a processing unit and a computer program for carrying out the method, are provided. Advantageous embodiments of the present invention are disclosed herein.
The scalar product circuit according to an example embodiment of the present invention, and the associated method, provide weight ranges in the matrix circuits, each weight range including a bit section for each bit of a weight element; the bits of a weight element are stored in the memory cells of one column in each of the bit sections. The first and second memory states correspond to the two possible values of a bit. The bits of the input elements of the input vectors may be applied to the row lines, a voltage of 0 V and a voltage having the predetermined voltage value in turn corresponding to the two possible values of a bit. The current intensities of the currents generated in the column lines are determined by the analog-to-digital converters as binary values, which are shifted by the bit shifting units according to the values of the bits of the weight elements and of the bits of the input elements, so that a subsequent addition by the adders yields an arithmetically correct value for the scalar product of the input vector and the weight vector as a binary value.
By use of the present invention, the problems associated with analog vector multiplication, in conjunction with the problems explained in greater detail with reference to
Preferably at least one matrix circuit, more preferably all matrix circuits, include(s) multiple weight ranges. Regardless of this, preferably at least one weight range, more preferably all weight ranges, include(s) multiple bit sections. In addition, likewise regardless of the number of weight ranges and/or bit sections, preferably at least one bit section, more preferably all bit sections, include(s) multiple columns.
Each of the memory cells is preferably configured in such a way that when the predetermined voltage value is present, the current intensity of the current that is conducted into the column line when the memory cell is in the second memory state is greater, by a multiple, than the current intensity of the current that is conducted into the column line when the memory cell is in the first memory state; the multiple is preferably at least 100, more preferably at least 1,000. Clearly distinguishable current intensity values at the column lines may thus be achieved.
According to an example embodiment of the present invention, each of the memory cells is also preferably configured in such a way that when the memory cell is in the first state, no current is conducted into the column line connected thereto. The memory cells thus allow the implementation of a logical AND operation.
The memory cells preferably include a memristor and/or a semiconductor switching element, in particular a ferroelectric field effect transistor or a field effect transistor with a floating gate. This allows the scalar product circuit to be manufactured using conventional technologies.
According to an example embodiment of the present invention, multiple matrix circuits are preferably provided, the bit shifting units from one weight range in each case being connected to the same adder in two or more of the matrix circuits. Scalar products of input vectors and weight vectors having a length that is greater than the number of rows of a matrix circuit may be computed in this way. For example, if a matrix circuit includes m rows, and the bit shifting units of weight ranges in k matrix circuits are connected to the same adder, scalar products of vectors having a maximum length of m·k may be directly computed. If scalar products of even longer vectors are to be computed, for portions of the vectors it is possible for partial sums to be initially formed by the vector multiplication circuit and subsequently added. The weights for the various portions may be stored in various columns of the weight ranges.
According to an example embodiment of the present invention, a voltage generation element is preferably provided for each row line, and is connected to the row line and configured, as a function of a predefined input signal which may be present in two different value ranges, to generate a voltage of 0 V or a voltage having the predetermined voltage value and apply it to the row line. The two different value ranges of the input signal correspond to the two possible values of a bit. Voltage generation elements are advantageous when the input signal has voltage values that are not suitable for direct processing by the matrix circuit or the memory cells (for example, because the voltages are excessively small or large, or because the input signal fluctuates excessively within the value ranges).
According to an example embodiment of the present invention, the analog-to-digital converters are preferably configured to determine the binary values using 5 bits, preferably 4 bits, more preferably 3 bits or fewer. The analog-to-digital converters may be constructed with a simple design due to the small bit width.
A method according to an example embodiment of the present invention is provided for computing binary scalar products of one or multiple input vectors, each including binary input elements, and one or multiple predetermined first weight vectors, each including binary weight elements, using a scalar product circuit, including adders that correspond to the number of predetermined first weight vectors, one or multiple weight ranges situated in various matrix circuits of the scalar product circuit being assigned to each adder, each adder being connected to the bit shifting elements that are connected via the analog-to-digital converters to the bit sections included in the weight ranges assigned to the adder, including
With the aid of the method according to the present invention, for example in a convolutional neural network (CNN), convolutions including multiple convolution kernels (corresponding to the number of first weight vectors) may be simultaneously computed in a layer.
According to an example embodiment of the present invention, the method preferably also encompasses computing second binary scalar products of the one or multiple input vectors and one or multiple predetermined second weight vectors which in each case include binary weight elements, the number of second weight vectors being less than or equal to the number of first weight vectors; including
If a neural network, in particular a CNN, includes multiple layers, the convolution kernels of the various layers may be stored as weight vectors in different columns of the same scalar product circuit.
A module according to an example embodiment of the present invention includes a scalar product circuit according to the present invention and a processing unit, connected thereto, that is configured to carry out all method steps of a method according to the present invention. Such a module or processing module may be used, for example, to speed up artificial intelligence applications that are based on neural networks. Such a module, for example in a computer, may be used, for example, as a plug-in module, or also in a control unit of a motor vehicle or of a machine.
A processing unit according to an example embodiment of the present invention is configured, in particular by programming, to carry out a method according to the present invention.
In addition, the implementation of a method according to the present invention in the form of a computer program or computer program product including program code for carrying out all method steps is advantageous, since this incurs particularly low costs, in particular when an executing control unit is also utilized for further tasks and therefore is present anyway. Suitable data media for providing the computer program are in particular magnetic, optical, and electric memories, for example hard disks, flash memories, EEPROMs, DVDs, and others. In addition, downloading a program via computer networks (internet, intranet, etc.) is possible.
Within the scope of the present patent application, the terms “connection,” “is connected,” and the like are to be understood in the sense of electrically conductive connections unless stated otherwise; for example, when a switching element is provided in a connection, the connection may be disconnected and reconnected.
Further advantages and embodiments of the present invention result from the description and the figures.
The present invention is schematically illustrated based on exemplary embodiments in the figures, and is described below with reference to the figures.
The vector-matrix multiplier also includes a row line 4 for each row of the arrangement in the form of a matrix, and includes a column line 6 for each column. Memristors 2 are situated at the intersection points of mutually perpendicularly extending row lines and column lines, and in each case connect a row line to a column line, which otherwise are not connected.
When voltages are applied to the row lines, currents flow from row lines 4, through memristors 2, and into column lines 6. This is illustrated for one column and two rows in
The total current of each column is generally converted into an output voltage Ua with the aid of a transimpedance amplifier 8. Transimpedance amplifier 8, illustrated here by way of example and being conventional, includes an operational amplifier 10 whose inverting input is connected to the column line and whose noninverting input is at ground, and a resistor 12 via which the operational amplifier is provided with negative feedback, so that output voltage Ua is given as Ua=−R·I, where R is the resistance value of resistor 12. At the inverting input of operational amplifier 10, transimpedance amplifier 8 generates a so-called “virtual ground” which, due to the high open-loop gain of the operational amplifier (100,000, for example), differs only slightly from the ground potential (only approximately 50 μV, for example, when voltages U1, U2 are in the range of approximately 5 V), so that the ground potential (i.e., the virtual ground) is present at the end of the column line, as determined by measurement, as is necessary for the functioning of the circuit.
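The relationship between row voltages, memristor conductances, and output voltage can be sketched numerically. The following is a minimal illustration, assuming ideal memristors modeled as pure conductances, negligible line resistance, and an ideal transimpedance amplifier; the function name and the example values are hypothetical and serve only to illustrate Ua=−R·I.

```python
def column_output_voltage(voltages, conductances, feedback_resistance):
    """Ideal model of one column (an assumption, not the actual circuit):
    each memristor contributes I = G * U to the column line, which sits
    at virtual ground, and the amplifier outputs Ua = -R * I_total."""
    if len(voltages) != len(conductances):
        raise ValueError("one conductance per row line is required")
    total_current = sum(g * u for g, u in zip(conductances, voltages))
    return -feedback_resistance * total_current


# Hypothetical example: two rows (U1 = 1 V, U2 = 2 V), one column,
# conductances of 1 mS and 2 mS, feedback resistor of 100 ohms.
u_a = column_output_voltage([1.0, 2.0], [1e-3, 2e-3], 100.0)
```

In this idealized model the output voltage is proportional to the scalar product of the voltage vector and the conductance vector of the column.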
The voltages at the row lines are typically generated from digital signals with the aid of digital-to-analog converters 14. Likewise, the output voltages at the column lines are typically converted back into a digital signal with the aid of sample and hold elements 16 and an analog-to-digital converter 18. To achieve high accuracy and to allow a large value range for the input signals and output signals to be covered, digital-to-analog converters and analog-to-digital converters including a correspondingly large number of bits are thus necessary.
This circuit, which is not according to the present invention, has several disadvantages. The line resistances of the row lines or column lines between individual cells decrease the accuracy of the vector-matrix multiplication, since these line resistances, like the memristors, influence the intensity of the currents. In addition, a fairly high current in a column line results in a fairly large voltage drop along the column line, which results in inaccuracies, since the computation assumes that the potential of a column line corresponds to the ground potential. This likewise applies to the row lines; the higher the current, the greater the voltage drop along the row lines, so that the input voltages of individual memory cells are shifted. Furthermore, high energy consumption and associated waste heat may occur as a function of the weights. Relatively costly or complicated digital-to-analog converters and analog-to-digital converters are necessary. The larger the matrix, the greater these disadvantages become.
In addition, row lines 26 and column lines 28 of the matrix circuit that extend through weight range 20 are illustrated (here as well, only a few are representatively provided with reference numerals). As with the memory cells, only a few of the row lines (namely, the first two row lines and the last row line of the matrix circuit) and a few of the column lines that extend through the weight range are illustrated. In general, a matrix circuit includes multiple weight ranges, the row lines extending through all weight ranges of the matrix circuit (i.e., they are the same for all weight ranges of the matrix circuit), so that voltages applied to the row lines are present at the memory cells of all weight ranges connected thereto.
Each of memory cells 24 is connected to one row line 26 and to one column line 28, and is configured to conduct a current into the column line, the intensity of the current being a function of a memory state of the memory cell and a voltage that is present at the row line connected to the memory cell. The row lines are not directly connected to the column lines, and instead are connected only indirectly via the memory cells. Each memory cell includes two different memory states, a first memory state and a second memory state; when the voltage present at the row line that is connected to the memory cell is zero (0 V), regardless of whether the memory cell is in the first or the second memory state, no current is to flow (current intensity equals zero); i.e., no current flows or is conducted into the column line connected to the memory cell.
Weight range 20 is divided into multiple sections, each of which includes one or multiple columns or column lines. These sections are referred to as bit sections 30, three bit sections being illustrated in the figure as an example. Various bit sections 30 may include a different number of columns, it being preferred that all bit sections in a weight range have the same number of columns. In the figure, three columns are representatively illustrated in each bit section 30, and possible further columns are once again indicated by dots (of course, the number of columns in a bit section may also be less than three). The memory cells connected to a column line in the same bit section are configured in such a way that when they are in the second memory state, and a voltage having a predetermined voltage value not equal to zero is present at the particular row lines, the current conducted from each memory cell of the bit section into the particular column line has the same current intensity.
Preferably all memory cells of a bit section, more preferably all memory cells of a weight range, even more preferably all memory cells of a matrix circuit, most preferably all memory cells of the scalar product circuit, have the same design or the same properties; i.e., they have the same memory states, and the current intensity of the current that is conducted into the particular column line is in each case the same for the same memory state and the same voltage.
The memory cells, as in
The on/off ratio is preferably at least 100, more preferably at least 1,000, with an upper limit of approximately 10,000, depending on the technology of the memristors. The associated number N of bits of the analog-to-digital converter is preferably 5 or fewer (preferably 4 or fewer when the on/off ratio is in the range between 100 and less than 1,000), the maximum number of row lines of the matrix circuit being 2^N−1 in each case.
According to another preferred embodiment, the memory cells are configured in such a way that when the memory cell is in the first state, no current is conducted into the column line connected thereto, at least as long as the voltage present at the row line is in a predetermined voltage range that includes 0 V and the predetermined voltage value. This voltage range is determined by the configuration of the memory cell, and in a manner of speaking represents the working range of the memory cell, programming voltages outside this voltage range being used for typical memory cells. If the memory cell is in the second memory state, a current having a current intensity different from zero flows when the applied voltage is sufficiently different from zero, in particular when the applied voltage has the predetermined voltage value. In this way, the memory cells implement a logical AND operation between the memory state and the applied voltage, the first memory state corresponding to a logical 0, the second memory state corresponding to a logical 1, an applied voltage of essentially 0 V corresponding to a logical 0, and an applied voltage essentially not equal to 0 V, in particular having the predetermined voltage value, corresponding to a logical 1. “Essentially” here refers to the fact that corresponding voltage ranges may be present that are correspondingly interpreted, which once again depends on the precise configuration of the memory cells.
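The AND behavior described above can be modeled in a few lines. This is a minimal sketch assuming idealized cells (exactly zero current in the first memory state and identical on-currents in the second); the function names are hypothetical.

```python
def cell_conducts(memory_state: int, row_bit: int) -> int:
    """Return 1 if the cell conducts current into its column line.

    memory_state: 0 = first memory state, 1 = second memory state.
    row_bit: 0 = 0 V on the row line, 1 = predetermined voltage value.
    """
    return memory_state & row_bit


def column_current_units(memory_states, row_bits):
    """Number of conducting cells in one column, i.e. the column current
    expressed in units of the (assumed identical) single-cell on-current."""
    return sum(cell_conducts(s, b) for s, b in zip(memory_states, row_bits))


# Truth table of a single cell: (memory state, row bit) -> conducts
truth_table = {(s, b): cell_conducts(s, b) for s in (0, 1) for b in (0, 1)}
```

Under these assumptions the column current is simply a count of the rows in which both the stored bit and the applied bit are 1.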
An implementation of corresponding memory cells that perform an AND operation may take place, for example, using semiconductor switching elements, including a (programmable) memory state, that switch between a conductive state and a nonconductive state as a function of a voltage that is present at a control connection of the semiconductor switching element and of the memory state. The control connection is then connected to the row line, an output of the semiconductor switching element is connected to the column line, and an input of the semiconductor switching element is connected to a power source, the semiconductor switching element switching the path between the input and the output between the conductive state and the nonconductive state.
In particular, ferroelectric field effect transistors (FeFETs) or floating gate metal oxide field effect transistors (FGMOSs) may be used as semiconductor switching elements. In both, the threshold voltage may be shifted by programming so that memory states may be implemented. For FeFETs, a ferroelectric material whose polarization shifts the threshold voltage is provided between the gate electrode of the FeFET and the source-drain path. The memory state corresponds to the polarization of the ferroelectric material. For FGMOSs, an isolated so-called floating gate, in which an electrical charge via which the threshold voltage is shifted may be stored, is provided between the gate electrode and the source-drain path. The memory state then corresponds to the stored charge. In both cases, the programming takes place by applying suitable (relatively high) programming voltages.
If field effect transistors (FETs) including various memory states are used, in particular FeFETs or FGMOSs, for each column, in addition to the column line a power supply line may be provided which is connected to a power source or voltage supply.
In
The column selection switching elements may be implemented in various ways (also as a function of the design of the memory cells). For example, in the column lines in the connection between the memory cells and the particular analog-to-digital converter, switching elements may be provided which may separate and establish (or close) this connection; these may be semiconductor switching elements, in particular field effect transistors, whose control connection is activated via control lines to be correspondingly provided. In addition, a corresponding switching element may be provided in each memory cell, in which case the control connections of all these switching elements within a column are connected to a control line that is to be provided. If memristors are used, these switching elements provided in each memory cell may, for example, be connected in series with the memristors (i.e., for FETs the drain-source path is connected in series with the memristors). If FeFETs or FGMOSs whose drain terminal is connected to a power supply line provided in each column, as described above, are used as memory cells, switching elements (in particular semiconductor switching elements, for example FETs) that may separate and establish the connection of the power supply lines to the power source may be situated at the power supply lines. These options are merely examples; further embodiments of the column selection switching elements are likewise possible, depending on the design of the memory cells or the matrix circuit.
In the connection between the column lines and analog-to-digital converter 32, a transimpedance amplifier (not illustrated, cf.
The transimpedance amplifier may also be regarded as part of the analog-to-digital converter. In general (within the meaning of the present patent application), an “analog-to-digital converter” is thus understood to mean a current intensity measuring element that determines the current intensity of an electrical current flowing at an input of the current intensity measuring element and outputs it as a binary value (in units determined by the analog-to-digital converter), a ground potential or a virtual ground at the same time being provided at the input. This ground potential is the reference potential on which voltages, in particular the voltages present at the row lines, are based.
In the specific embodiment in
Bit shifting units 34 of weight range 20 are in turn connected to adder 22; i.e., bit shifting units 34 transfer the “shifted” binary values to adder 22. Adder 22 is configured to add up binary values that it receives, i.e., the shifted binary values that are transferred by bit shifting units 34, and to form a summed binary value.
When the scalar product circuit or the matrix circuit is used, i.e., during the computation of scalar products of input vectors and predetermined weight vectors, in each case the bits of a weight element of a weight vector given as a binary value are stored in each row of a weight range, in each case a bit of the weight element being stored in each bit section, i.e., in one of the columns of the bit section. The bits of various weight elements of a weight vector are stored in the same columns, each bit section corresponding to a certain value of the bits of the binary weight elements. In other words, for each weight vector in the bit sections, in each case exactly one column is provided, in whose memory cells the bits of the binary weight elements of the weight vector having a certain value are stored. The memory cells of a row in the columns assigned to the weight vector thus store the bits of a weight element.
“Storing a bit” means that when the bit has the value 0, programming of the first memory state is carried out in the memory cell, and when the bit has the value 1, programming of the second memory state is carried out in the memory cell. In the present context, a “vector” is a set or a tuple of multiple numerical values that are ordered in a sequence, each of these numerical values representing an “element” of the vector. As is customary, the sum over the products of mutually assigned elements from the two vectors, corresponding to their sequence, is referred to as the “scalar product” of two vectors having the same length (i.e., having the same number of elements).
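As an illustration of how the bits of a weight vector map onto columns, the following sketch assumes unsigned integer weight elements and numbers the bit sections by bit significance j, starting at 0. The helper `weight_bit_columns` is hypothetical and not part of the described circuit.

```python
def weight_bit_columns(weights, bits_per_weight):
    """Split non-negative integer weights into bit columns.

    Returns a list indexed by bit significance j; entry j holds, for every
    row r, bit j of weights[r] (0 -> first memory state, 1 -> second).
    """
    for w in weights:
        if not 0 <= w < 2 ** bits_per_weight:
            raise ValueError(f"weight {w} does not fit in {bits_per_weight} bits")
    return [[(w >> j) & 1 for w in weights] for j in range(bits_per_weight)]


# Hypothetical weight vector with three elements (rows), 3 bits each:
# 5 = 0b101, 3 = 0b011, 6 = 0b110.
columns = weight_bit_columns([5, 3, 6], 3)
```

Each inner list corresponds to one column in one bit section, and reading position r across all columns recovers the bits of weight element r.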
The bits of further weight vectors that are assigned to a different layer of a CNN, for example, are stored in further columns of the bit sections.
During use, the columns assigned to a weight vector are initially activated (the columns not assigned to the weight vector are not activated). The bits of the input elements of the input vector are then applied to the row lines in sequence in multiple passes (corresponding to the number of bits of the input elements); i.e., a voltage corresponding to the value of the bit is generated and applied to the corresponding row line. When the value of the bit is 0, a voltage of 0 V is generated, and when the value of the bit is 1, a voltage corresponding to the predetermined voltage value is generated.
In each pass, each of analog-to-digital converters 32 determines a binary value that corresponds to the current intensity of the current in the particular column line (of which only one is activated in the bit sections). These binary values are shifted by bit shifting units 34 by the number of bits that are predefined for each of the bit shifting units in each pass; i.e., the bits of the binary values are shifted by the particular predefined number. The number, predefined for a bit shifting unit, by which the bits are shifted is equal to sum i+j of value i of the bits of the input elements, which are applied to the row lines in the particular pass, and value j of the bits of the weight elements that are stored in the bit section to which the bit shifting unit is assigned. Thus, in principle the binary value is multiplied by the factor 2^(i+j) to form the shifted binary value.
The “value” of a bit b_i of a binary value B that is described as follows (2-ary representation),

$$B = \sum_{i=0}^{N} b_i \cdot 2^{i},$$

where N is an integer greater than or equal to 0 and b_i may assume the values 0 or 1, is defined by index i.
In each pass the shifted binary values are transferred to adder 22, which adds them up within each pass and over the multiple passes to form a summed binary value, which after the last pass (i.e., after the last bits of the input vector have been applied) is the scalar product of the input vector and the weight vector. Prior to the first pass, the summed binary value in the adder should obviously be set to zero.
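Why a shift by i+j yields the arithmetically correct result can be seen from the binary expansions of the elements. As a sketch, assume unsigned elements, and let x_{r,i} denote bit i of the input element in row r and w_{r,j} bit j of the weight element in row r (this notation and the symbol c_{i,j} are introduced here for illustration only):

```latex
\begin{aligned}
\sum_{r} x_r \, w_r
  &= \sum_{r} \Big( \sum_{i} x_{r,i} \, 2^{i} \Big)
              \Big( \sum_{j} w_{r,j} \, 2^{j} \Big) \\
  &= \sum_{i} \sum_{j} 2^{\,i+j}
     \underbrace{\sum_{r} x_{r,i} \, w_{r,j}}_{c_{i,j}}
\end{aligned}
```

Under these assumptions, c_{i,j} is the number of conducting cells in the activated column of bit section j during pass i, which is the quantity reported by the analog-to-digital converter. The bit shifting unit supplies the factor 2^{i+j}, and the adder forms the two outer sums.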
A matrix circuit preferably includes multiple weight ranges through which the same row lines extend. Thus, multiple weight vectors may be stored, and at the same time the scalar product with the same input vector may be formed. Each weight range is then connected to a different adder. This is illustrated in
Each of weight ranges 20 is designed as described in conjunction with
An analog-to-digital converter 32 (current intensity measuring element as described above) and a bit shifting element 34 connected thereto are assigned to each bit section 30, analog-to-digital converter 32 being connected to column lines 28 of bit section 30. The statements made in conjunction with
Each weight range 20 is thus designed as described in conjunction with
Analog-to-digital converters are preferably used with the fewest possible bits, in particular with only 4, 3, or 2 bits, which is advantageous since a simple design of the analog-to-digital converters is thus made possible. This implies that the corresponding matrix circuit is allowed to include only a limited number of rows in order to be able to differentiate between all possible different current intensity values in the column lines (for an n-bit analog-to-digital converter, the number of rows should be less than or equal to 2^n−1). However, filter kernels used in CNNs (i.e., weight vectors) often include significantly more entries, for example several hundred or even more than one thousand entries. For their processing, a scalar product circuit including multiple matrix circuits may preferably be used, as shown in
Each matrix circuit 40 in
The scalar product circuit includes multiple adders 22, in the present case it being important that each adder 22 is connected to bit shifting units 34 of multiple weight ranges situated in various matrix circuits, here with the bit shifting units in two weight ranges 20 in each case that are situated in the two matrix circuits 40. If more matrix circuits are present, an adder may be connected to the bit shifting units of (exactly) one weight range in each of the multiple matrix circuits. The connections are illustrated here as extending through the spaces between the weight ranges. In an actual implementation in a chip, the connections extend, for example, in a different level of the chip.
The weight elements of a weight vector that is too long for an individual matrix circuit may be divided over the multiple matrix circuits, the weight elements being stored in weight ranges that are connected to the same adder. The bits of the input elements of the input vectors are applied to the corresponding row lines of the multiple matrix circuits.
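The division of a long vector over several matrix circuits can be sketched as follows. The example assumes that the row limit of an n-bit converter is 2^n−1, uses single-bit elements for brevity, and splits the vectors into consecutive chunks; the function names and the chunking order are hypothetical choices made for illustration.

```python
import math


def max_rows(adc_bits: int) -> int:
    """Largest row count an n-bit converter can resolve: 2**n - 1."""
    return 2 ** adc_bits - 1


def circuits_needed(length: int, adc_bits: int) -> int:
    """Number of matrix circuits required for a vector of given length."""
    return math.ceil(length / max_rows(adc_bits))


def split_over_matrix_circuits(vector, adc_bits):
    """Split a vector into consecutive chunks, one per matrix circuit."""
    m = max_rows(adc_bits)
    return [vector[k:k + m] for k in range(0, len(vector), m)]


# Hypothetical example: 1-bit elements, 3-bit converters (7 rows each).
x = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0]
w = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
partial_counts = [
    sum(a & b for a, b in zip(xc, wc))
    for xc, wc in zip(split_over_matrix_circuits(x, 3),
                      split_over_matrix_circuits(w, 3))
]
total = sum(partial_counts)  # what the shared adder accumulates
```

Each entry of `partial_counts` stays within the range of its converter, while the shared adder combines the contributions into the scalar product of the full-length vectors.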
The scalar product circuit includes adders corresponding to the number of predetermined weight vectors, i.e., at least as many adders as weight vectors. One or multiple weight ranges are assigned to each adder; when multiple weight ranges are assigned, they are situated in various matrix circuits of the scalar product circuit. Each adder is connected to the bit shifting elements that are connected via the analog-to-digital converters to the bit sections that are included in the weight ranges assigned to the adder. This corresponds to a scalar product circuit as described in conjunction with
In the method, an adder is initially assigned to each weight vector in step 102, and the weight ranges that are assigned to this adder are subsequently assigned to the particular weight vector.
The bits of the binary weight elements are stored in step 104. For each weight vector, the bits of the weight elements are stored in memory cells contained in each case in a column of a bit section of a weight range assigned to the weight vector. The bits of a weight element are stored in each case in a row, and the bits of various weight elements of the weight vector that have the same value and that are stored in the same weight range are stored in the same bit section of this weight range. “Storing a bit in a memory cell” means that the memory cell is placed in the first memory state when the bit has the value 0, and the memory cell is placed in the second memory state when the bit has the value 1.
The columns in which bits of the weight elements have been stored are activated in step 106, and the other columns are not activated.
One of the input vectors for which the scalar products are to be computed is subsequently selected, and for this input vector the summed binary values of the adders are set to zero in step 108. The bits of the input elements of the particular selected input vector, having the same value, are selected in step 110.
Voltages corresponding to the bits are applied to the row lines (this may also be referred to as applying bits) in step 112, bits of various input elements being applied to various row lines. A voltage of 0 V is applied when the particular bit has the value 0, and a voltage having the predetermined voltage value is applied when the particular bit has the value 1.
Binary values that correspond to the current intensities of the currents in the activated column lines, as described above, are determined by the analog-to-digital converters in step 114.
These binary values are shifted by the bit shifting units in step 116 in order to obtain shifted binary values, the number of bits by which the binary value is to be shifted being predefined for each bit shifting unit. The predefined number of bits is determined as the sum of the value of the bits of the input elements, corresponding to which voltages are applied, and the value of the bits of the weight elements that are stored in the bit section to which the particular bit shifting unit is connected via the analog-to-digital converter.
The shifted binary values are added by the adders in step 118, it being clear that each adder adds those shifted binary values that are determined and transferred by the bit shifting units that are connected to the particular adder.
It is checked in step 120 whether steps 110 through 118 have already been carried out for all values for the bits of the input elements. If this is not the case, the method continues with step 110 (selection of bits having a certain value), bits having a different value not yet used being selected. If this is the case, the summed binary values are read out in step 122 as binary scalar products of the selected input vector and the weight vector.
It is checked in step 124 whether scalar products for all input vectors have already been computed (steps 108 through 122). If this is not the case, the method continues anew with step 108 (zeroing), an input vector being selected for which the scalar products have not yet been computed. If this is the case, the method is ended in step 126.
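The sequence of steps 102 through 126 can be simulated in software. The following is a minimal sketch under stated assumptions: elements are unsigned integers, each weight vector fits into a single weight range, the analog-to-digital converter is modeled as an exact count of conducting cells, and column activation (step 106) is implicit in the choice of stored columns. All names are hypothetical.

```python
def to_bits(value, width):
    """Bits of a non-negative integer, least significant first."""
    return [(value >> k) & 1 for k in range(width)]


def scalar_products(input_vectors, weight_vectors, input_bits, weight_bits):
    """Bit-serial simulation of steps 102 to 126 for unsigned elements."""
    # Steps 102/104: one adder per weight vector; store the weight bits so
    # that stored[v][j][r] is bit j of element r of weight vector v.
    stored = [
        [[to_bits(w, weight_bits)[j] for w in vec] for j in range(weight_bits)]
        for vec in weight_vectors
    ]
    results = []
    for x in input_vectors:                     # loop closed by step 124
        adders = [0] * len(weight_vectors)      # step 108: zero the adders
        for i in range(input_bits):             # steps 110/120: one pass per bit
            row_bits = [to_bits(e, input_bits)[i] for e in x]  # step 112
            for v, sections in enumerate(stored):
                for j, column in enumerate(sections):
                    # Step 114: converter output = number of conducting cells.
                    count = sum(s & b for s, b in zip(column, row_bits))
                    # Steps 116/118: shift by i + j, then accumulate.
                    adders[v] += count << (i + j)
        results.append(adders)                  # step 122: read out
    return results


# Hypothetical data: two input vectors, two weight vectors, 3-bit elements.
inputs = [[3, 5, 2], [7, 0, 1]]
weights = [[1, 4, 6], [2, 2, 2]]
products = scalar_products(inputs, weights, input_bits=3, weight_bits=3)
```

In this model `products[a][v]` holds the scalar product of input vector a and weight vector v, and it agrees with the directly computed sum of element products.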
As mentioned, further weight vectors may be stored in other columns of the bit sections, the procedure as described above in step 104 being followed when storing these weight vectors. These columns may then be activated in order to form scalar products of these further weight vectors and possibly other input vectors.
| Number | Date | Country | Kind |
|---|---|---|---|
| 10 2020 211 818.3 | Sep 2020 | DE | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/EP2021/075407 | 9/16/2021 | WO | |