One dimensional systolic array architecture for neural network

Information

  • Patent Grant
  • Patent Number
    5,799,134
  • Date Filed
    Monday, May 15, 1995
  • Date Issued
    Tuesday, August 25, 1998
Abstract
A circuit for implementing a neural network comprises a one-dimensional systolic array of processing elements controlled by a microprocessor. The one-dimensional systolic array can implement weighted-sum and radial-based networks, including neurons with a variety of different activation functions. Pipelined processing and partitioning are used to optimize data flows in the systolic array. Accordingly, the inventive circuit can implement a variety of neural networks in a very efficient manner.
Description

FIELD OF THE INVENTION
The present invention relates to a one dimensional systolic array architecture for implementing a neural network.
BACKGROUND OF THE INVENTION
A neural network comprises a relatively large number of simple processing elements known as neurons. The neurons are interconnected by connections known as synapses. A portion of a neural network is shown in FIG. 1. The neural network 10 of FIG. 1 comprises a plurality of neurons 12. The neurons 12 are interconnected by the synapses 14. In general, the output of any one particular neuron 12 is connected by a synapse 14 to the input of one or more other neurons 12, as shown in FIG. 1. Illustratively, the neurons 12 are arranged in layers. Three layers, 15, 16 and 17, are shown in FIG. 1. However, an arrangement of neurons into layers is not necessary.
In general, neural networks are used for a wide variety of "learning" type tasks, such as character recognition, speech recognition and dynamic routing.
In general, the output $Y_i$ of a neuron i may be represented as

$$Y_i = S\left(\sum_{j=1}^{N} f(w_{ij}, x_j)\right) \qquad (1)$$

where $x_j$ is an input signal of the neuron i which is transmitted by a synapse, for example, from the output of another neuron j;

$w_{ij}$ is a weight associated with a synapse from the output of the neuron j to the input of the neuron i;

$f(w_{ij}, x_j)$ equals $w_{ij} x_j$ in the case of a weighted sum network or equals $(w_{ij} - x_j)^2$ in the case of a radial based network; and

$S(z)$ is an activation function which may be a sigmoid function, a Gaussian function, a linear function, a step function which has the values $S(z) = +1$ for $z \geq 0$ and $S(z) = -1$ for $z < 0$, or a square function which has the values $S(z) = 0$ for $|z| > \delta$ and $S(z) = 1$ for $|z| \leq \delta$, where $\delta$ is a constant.
FIG. 2 shows a neuron having a plurality of inputs $x_j$, $j = 1, 2, \ldots$, delivered via synapses with weights $w_{ij}$. The output of the neuron of FIG. 2 is $Y_i$.
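To make Equation (1) concrete, here is a minimal Python sketch of a single neuron; it is illustrative only (the function names and the particular activations chosen are not part of the patent):

```python
import math

def neuron_output(weights, inputs, kind="weighted_sum", activation=math.tanh):
    """Evaluate Equation (1): Y_i = S(sum_j f(w_ij, x_j))."""
    if kind == "weighted_sum":
        z = sum(w * x for w, x in zip(weights, inputs))         # f = w * x
    elif kind == "radial":
        z = sum((w - x) ** 2 for w, x in zip(weights, inputs))  # f = (w - x)^2
    else:
        raise ValueError("kind must be 'weighted_sum' or 'radial'")
    return activation(z)

# Example: a radial-based neuron with a Gaussian activation S(z) = exp(-z).
y = neuron_output([0.5, 1.0], [0.4, 1.1], kind="radial",
                  activation=lambda z: math.exp(-z))
```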
Table 1 below lists a variety of radial-based and weighted-sum neuron types, all of which are sub-species of the general neuron described in Equation (1).
TABLE 1
______________________________________
Neuron Type     Neuron Transfer Equation
______________________________________
BP              $Y_i = S_1(\sum_j w_{ij} x_j)$
Hopfield        $Y_i = S_5(\sum_j w_{ij} x_j)$
RBF             $Y_i = S_2(\sum_j (w_{ij} - x_j)^2)$
RCE             $Y_i = S_3(\sum_j (w_{ij} - x_j)^2)$
WTA             $Y_i = S_5(\sum_j (w_{ij} - x_j)^2)$
MADALINE        $Y_i = S_5(\sum_j w_{ij} x_j)$
ADALINE         $Y_i = S_4(\sum_j w_{ij} x_j)$
MADALINE III    $Y_i = S_1(\sum_j w_{ij} x_j)$
______________________________________
In this table, $S_1$ is a sigmoid activation function, $S_2$ is a Gaussian activation function, $S_3$ is a square activation function, $S_4$ is a linear activation function, and $S_5$ is a step activation function.
A variety of circuit architectures have been proposed in the prior art for implementing neural networks. A first type of neural network circuit comprises a one dimensional or multidimensional array of microprocessors or digital signal processors which implement the functions of the neurons and synapses of a neural network. However, because the hardware architecture is not used exclusively for implementing neural networks, and some programming may be required, the speed after implementation in an IC is on the order of several hundred kilo-connections per second. A neural network circuit of this type is disclosed in U.S. Pat. No. 5,204,938.
A second type of neural network circuit comprises a two dimensional array of multipliers for multiplying synaptic weights $w_{ij}$ and input signals $x_j$. Generally, this architecture is implemented in VLSI using a current-mode analog technique. The speed is approximately several giga-connections per second. A disadvantage is that analog-to-digital and digital-to-analog converters are required for an exchange of signals with external digital systems. A neural network circuit of this type is disclosed in U.S. Pat. No. 5,087,826.
A third type of neural network architecture comprises a one-dimensional systolic array of processing elements. The advantage of the one dimensional systolic array is that pipelining can be used to increase circuit speed. A typical implementation has a speed of several mega-connections per second. A typical architecture of this type is disclosed in U.S. Pat. No. 5,091,864.
All of the above-mentioned architectures, including the one dimensional systolic array architecture, have a significant shortcoming: they can be used to implement weighted-sum neural networks but cannot be used to implement radial-based neural networks.
In view of the foregoing, it is an object of the present invention to provide a one dimensional systolic array implementation of a neural network which can implement both weighted-sum and radial-based neural networks.
It is a further object of the invention to provide a neural network circuit comprising a one dimensional systolic array of processing elements controlled by a microprocessor (or other controller) which selects between radial-based and weighted-sum neurons by transmitting a control signal to the systolic array.
It is also an object of the invention to provide a neural network circuit in the form of a one dimensional systolic array which utilizes pipelined processing and partitioning to achieve a highly efficient data flow.
SUMMARY OF THE INVENTION
The present invention is a neural network circuit comprising a one-dimensional systolic array controlled by a controller (e.g., a microprocessor).
In a preferred embodiment, the one dimensional systolic array comprises M processing elements (PE's). The i-th processing element, $i = 1, 2, \ldots, M$, comprises a weight storage circuit for storing a sequence of synaptic weights $w_{ij}$, $j = 1, 2, \ldots, N$. Each processing element also includes a processor for receiving a sequence of inputs $x_j$ and for outputting an accumulated value

$$g_i = \sum_{j=1}^{N} f(w_{ij}, x_j).$$
The i-th processing element also includes a storage element for storing $g_i$. This storage element is the i-th storage element in a shift register comprising M storage elements, one in each processing element. The one dimensional systolic array also includes an activation function circuit for receiving the accumulated values $g_i$ sequentially from the shift register and for outputting a sequence of values $Y_i = S(g_i)$, where S is an activation function.
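As a rough software model of this data flow (a minimal sketch for intuition, not the hardware itself; all names are illustrative), each of the M processing elements accumulates its $g_i$ over N clock cycles, the M results are loaded into the shift register in one cycle, and they are then shifted one per cycle through the activation function circuit:

```python
import math

def systolic_pass(W, x, f, S):
    """Model one pass of the array: W is an M x N weight matrix, x a
    length-N input sequence, f the synapse function, S the activation."""
    M, N = len(W), len(x)
    g = [0.0] * M
    for j in range(N):          # cycle j: x_j is broadcast to all PEs
        for i in range(M):      # in hardware, all PEs accumulate in parallel
            g[i] += f(W[i][j], x[j])
    # one load cycle, then M shift cycles through the activation circuit
    return [S(gi) for gi in g]

# Example: two radial-based neurons (M = 2, N = 3) with Gaussian activation.
Y = systolic_pass([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [0.1, 0.1, 0.1],
                  f=lambda w, x: (w - x) ** 2, S=lambda z: math.exp(-z))
```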
As indicated above, the one dimensional systolic array is controlled by a microprocessor. The weight storage circuits in the processing elements are connected to the microprocessor via a weight (W) bus and an address (A) bus. The sequences of synaptic weights $w_{ij}$ are transmitted via the W-bus to the particular weight storage circuit indicated by the A-bus. The microprocessor also transmits a select signal to each processing element PE-i to choose between a weighted-sum neuron and a radial-based neuron (i.e., to select $f(w_{ij}, x_j)$). The microprocessor also receives the values $Y_i$ from the activation function circuit in order to generate subsequent sets of input values $x_j$ and updated weights $w_{ij}$. This simplified set of connections between the control processor and the one dimensional systolic array is a significant advantage of the invention.
The one-dimensional systolic array takes advantage of pipelining and partitioning to achieve an efficient data flow and an efficient use of hardware resources.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 schematically illustrates a neural network;
FIG. 2 schematically illustrates a neuron for use in the network of FIG. 1;
FIG. 3 illustrates a one dimensional systolic array for implementing a neural network in accordance with an illustrative embodiment of the present invention;
FIG. 4 illustrates a processor located in each processing element of the systolic array of FIG. 3;
FIGS. 5A, 5B and 5C illustrate the flow of data from the timing point of view in the inventive systolic array; and
FIG. 6 illustrates the interconnections between a one dimensional systolic array and a control microprocessor in accordance with the invention.

DETAILED DESCRIPTION OF THE INVENTION
FIG. 3 illustrates a one dimensional systolic array in accordance with an illustrative embodiment of the present invention. The systolic array 100 comprises a plurality of processing elements PE-i, $i = 1, 2, \ldots, M$. Each processing element PE-i comprises a weight storage circuit 102-i. In most embodiments, the weight storage circuit 102-i is a digital memory such as a RAM, although in some cases analog storage devices such as capacitors may be used. Each weight storage circuit stores a set of synaptic weights $w_{ij}$, $j = 1, 2, \ldots, N$. In some cases, more than one set of synaptic weights may be stored, so that the capacity of each weight storage circuit is a multiple of N.
The synaptic weights $w_{ij}$ are transmitted to the weight storage circuit 102-i from a control microprocessor (not shown in FIG. 3). A set of synaptic weights $w_{ij}$ is downloaded via the W-bus from the control microprocessor to the particular weight storage circuit 102-i indicated by the address on the A-bus. The write signal is enabled for a set of weights $w_{ij}$ to be written into a weight storage circuit 102-i.
When the shift control signal w-shift is enabled, the stored set of weights $w_{i1}, w_{i2}, \ldots, w_{iN}$ is shifted sequentially out of the weight storage circuit 102-i synchronously with the clock signal.
Each processing element PE-i also includes a processor 104-i. The processor 104-i is a circuit for evaluating

$$g_i = \sum_{j=1}^{N} f(w_{ij}, x_j).$$

Each processor 104-i includes an input which is connected to the corresponding weight storage circuit 102-i for receiving in sequence the synaptic weights $w_{i1}, w_{i2}, \ldots, w_{iN}$. Each processor 104-i also includes an input for receiving in sequence the inputs $x_j$. In each clock cycle, one $w_{ij}$ and one $x_j$ are received and one value $f(w_{ij}, x_j)$ is determined by the processor 104-i.
The processor 104-i receives a select signal to choose a particular function f (weighted sum or radial based), a clear signal for clearing an accumulator (see FIG. 4) and a clock signal.
The processor 104-i outputs at the ACU output a signal

$$g_i = \sum_{j=1}^{N} f(w_{ij}, x_j).$$
Each processing element PE-i contains a storage element 106-i. Each storage element 106-i stores a value $g_i$ received from the corresponding processor 104-i. These values are loaded into the storage elements 106-i in response to the load signal. The storage elements 106-i are connected to form a shift register 109. In response to the S-shift signal, the values $g_i$ are shifted sequentially out of the shift register 109 synchronously with the clock signal.
The values $g_i$ are transmitted to the activation function circuit 112. This circuit implements any one of the plurality of activation functions discussed above (e.g., sigmoid, Gaussian, linear, step, or square).
The output of the activation function circuit 112 is a sequence of values $Y_i = S(g_i)$.
The processor 104-i is shown in greater detail in FIG. 4. Each processor 104-i comprises a subtractor 40, a multiplier 42, a pair of multiplexers 44 and 46, and an accumulator 50. The multiplexers 44 and 46 receive a select signal which determines whether a weighted-sum or radial-based neuron is implemented. In each clock cycle, one weight $w_{ij}$ and one input value $x_j$ are received. In the case of a weighted sum, the multiplexer 44 outputs $w_{ij}$ and the multiplexer 46 outputs $x_j$. These quantities are multiplied by the multiplier 42. In the case of the radial-based neuron, each multiplexer outputs $(w_{ij} - x_j)$ and this quantity is squared by the multiplier 42. In each clock cycle, one value ($w_{ij} \cdot x_j$ or $(w_{ij} - x_j)^2$) is input to the accumulator 50. These values are then accumulated by the accumulator 50 to output $g_i$.
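A cycle-level Python sketch of this datapath (hypothetical, mirroring the mux/multiplier/accumulator arrangement of FIG. 4) makes the role of the select signal explicit:

```python
def pe_cycle(acc, w, x, select_radial):
    """One clock cycle of processor 104-i (FIG. 4).

    The two multiplexers choose the multiplier operands: (w, x) for a
    weighted-sum neuron, or (w - x, w - x) for a radial-based neuron, so
    the single multiplier yields either w * x or (w - x) ** 2.
    """
    a = (w - x) if select_radial else w   # output of multiplexer 44
    b = (w - x) if select_radial else x   # output of multiplexer 46
    return acc + a * b                    # multiplier 42 feeds accumulator 50

acc = 0.0                                 # the clear signal zeroes the accumulator
for w, x in [(0.5, 0.4), (1.0, 1.1)]:     # one (w_ij, x_j) pair per clock cycle
    acc = pe_cycle(acc, w, x, select_radial=True)   # acc ends up holding g_i
```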
The timing of the systolic array 100 is as follows. It takes N clock cycles to obtain $g_i = \sum_{j=1}^{N} f(w_{ij}, x_j)$ in each processing element PE-i. It requires another clock cycle to load the shift register 109 with the values $g_i$. Another M clock cycles are required to shift the values $g_i$ out of the shift register 109 and through the activation function block 112.
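Summing these phases, a single non-overlapped pass through the array therefore takes

$$T = N + 1 + M \quad \text{clock cycles},$$

for example, $T = 8 + 1 + 8 = 17$ cycles when $M = N = 8$.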
FIGS. 5A, 5B and 5C illustrate how the hardware resources are utilized as a function of clock cycle for the cases M > N, M = N and M < N, respectively. In each of the figures, the shading indicates which hardware elements are active in particular clock cycles. Different shadings represent different data flows. It appears that optimal use of the hardware resources occurs when M = N.
Consider the following examples. First, consider a neural network with M = 8 neurons arranged in one layer (i.e., in one row). If this network is implemented with a systolic array of M = 8 processing elements and N = 8, then the weights to be stored in each processing element PE-i are as shown in Table 2.
TABLE 2
______________________________________
        j=1   j=2   j=3   j=4   j=5   j=6   j=7   j=8
______________________________________
PE-1    w_11  w_12  w_13  w_14  w_15  w_16  w_17  w_18
PE-2    w_21  w_22  w_23  w_24  w_25  w_26  w_27  w_28
PE-3    w_31  w_32  w_33  w_34  w_35  w_36  w_37  w_38
PE-4    w_41  w_42  w_43  w_44  w_45  w_46  w_47  w_48
PE-5    w_51  w_52  w_53  w_54  w_55  w_56  w_57  w_58
PE-6    w_61  w_62  w_63  w_64  w_65  w_66  w_67  w_68
PE-7    w_71  w_72  w_73  w_74  w_75  w_76  w_77  w_78
PE-8    w_81  w_82  w_83  w_84  w_85  w_86  w_87  w_88
______________________________________
Now consider the situation where the neural network has eight neurons arranged in a row but the systolic array used to implement the neural network has only four processing elements. This is the case where M = 4 and N = 8, i.e., there are four processing elements PE-1, PE-2, PE-3 and PE-4.
In this case, the synaptic weights are stored in the PE's according to Table 3 below:
TABLE 3
______________________________________
w_1j, w_5j   stored in PE-1
w_2j, w_6j   stored in PE-2
w_3j, w_7j   stored in PE-3
w_4j, w_8j   stored in PE-4
______________________________________
In a first set of clock cycles, a set of inputs $x_j$ is entered into the PE's and processed with the synaptic weights $w_{1j}, \ldots, w_{4j}$. In a second set of clock cycles, the same set of inputs $x_j$ is again entered into the PE's and combined with the synaptic weights $w_{5j}, \ldots, w_{8j}$. With this kind of partitioning, efficient use of the hardware resources in the one dimensional systolic array can be achieved, as sketched below.
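The following Python sketch of this partitioning is illustrative only (the names are not from the patent); it shows how the two passes over the same inputs reuse the four PE's:

```python
import math

def partitioned_pass(W, x, n_pe, f, S):
    """Compute len(W) neuron outputs on n_pe physical processing elements.

    With 0-based indexing, row i of W is stored in PE (i % n_pe), so for
    len(W) = 8 and n_pe = 4 this matches Table 3: PE-1 holds w_1j and w_5j.
    """
    outputs = [0.0] * len(W)
    for p in range(len(W) // n_pe):       # two passes for 8 neurons on 4 PEs
        for pe in range(n_pe):
            i = p * n_pe + pe             # neuron handled by this PE this pass
            g = sum(f(w, xj) for w, xj in zip(W[i], x))  # same inputs x_j
            outputs[i] = S(g)
    return outputs

# Example: 8 toy neurons, 4 PEs, weighted-sum type with tanh activation.
W = [[float(i == j) for j in range(8)] for i in range(8)]
Y = partitioned_pass(W, [0.5] * 8, n_pe=4,
                     f=lambda w, x: w * x, S=math.tanh)
```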
FIG. 6 shows how the systolic array 100 is controlled by a control microprocessor 200. The microprocessor 200 transmits updated synaptic weights to the PE's in the systolic array via the W-bus. The synaptic weights are transmitted to particular PE's indicated by addresses on the A-bus. The microprocessor transmits a select signal via a select line 202 to determine the type of neural network (radial based or weighted sum). The inputs $x_j$ are transmitted from the microprocessor 200 to the systolic array 100 via the lines 204. The systolic array 100 outputs the values $Y_i$ via the lines 206 to the microprocessor 200, which uses these values to update the synaptic weights.
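The interaction in FIG. 6 can be summarized by a hypothetical control loop. The `FakeArray` object and its methods below are illustrative software stand-ins for the W-bus/A-bus transfers, select line 202, and data lines 204/206; they are not an API defined by the patent:

```python
import math

class FakeArray:
    """Software stand-in for systolic array 100 (illustrative only)."""
    def __init__(self):
        self.rows, self.radial = {}, False
    def write_weights(self, address, weights):   # models W-bus + A-bus
        self.rows[address] = list(weights)
    def set_select(self, radial):                # models select line 202
        self.radial = radial
    def run(self, x):                            # models lines 204 (in) / 206 (out)
        f = (lambda w, xj: (w - xj) ** 2) if self.radial else (lambda w, xj: w * xj)
        return [math.exp(-sum(f(w, xj) for w, xj in zip(row, x)))
                for row in self.rows.values()]

def control_loop(array, W, x, radial, n_iters, update_weights):
    """Sketch of microprocessor 200 driving systolic array 100 (FIG. 6)."""
    for _ in range(n_iters):
        for i, row in enumerate(W):
            array.write_weights(address=i, weights=row)
        array.set_select(radial)
        Y = array.run(x)
        W = update_weights(W, x, Y)   # learning rule runs in software
    return W

# Usage with a trivial (no-op) weight update:
W = control_loop(FakeArray(), [[0.2] * 3, [0.4] * 3], [0.1] * 3,
                 radial=True, n_iters=2, update_weights=lambda W, x, Y: W)
```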
The simplified connections between the control microprocessor 200 and the one-dimensional systolic array 100 are a significant advantage of the invention.
In short, a one-dimensional systolic array for implementing a neural network has been disclosed. Finally, the above-described embodiments of the invention are intended to be illustrative only. Numerous alternative embodiments may be devised by those skilled in the art without departing from the spirit and scope of the following claims.
Claims
  • 1. A neural network circuit, comprising:
  • a one dimensional systolic array of M processing elements, the i-th processing element, $i = 1, 2, \ldots, M$, comprising:
  • a weight storage circuit for storing synaptic weights $w_{ij}$, $j = 1, 2, \ldots, N$;
  • a processor for receiving a sequence of inputs $x_j$ and for outputting an accumulated value $g_i = \sum_{j=1}^{N} f(w_{ij}, x_j)$, where $f(w_{ij}, x_j)$ equals $w_{ij} x_j$ if a weighted sum network is selected or equals $(w_{ij} - x_j)^2$ if a radial based network is selected, said processor comprising:
  • first and second multiplexers, each multiplexer receiving a select signal for determining whether said weighted sum or said radial based network is selected; and
  • a multiplier for multiplying the output from said first multiplexer with the output from said second multiplexer to generate said function $f(w_{ij}, x_j)$,
  • wherein said output of said first multiplexer is $w_{ij}$ and the output of said second multiplexer is $x_j$ when said select signal indicates a weighted sum network, and
  • wherein said output of said first multiplexer is $(w_{ij} - x_j)$ and the output of said second multiplexer is $(w_{ij} - x_j)$ when said select signal indicates a radial based network;
  • a storage element for storing $g_i$, said storage element being an i-th storage element in a shift register comprising M storage elements; and
  • said one dimensional systolic array also comprising an activation function circuit for sequentially receiving said accumulated values $g_i$ from said shift register and for outputting a sequence of values $Y_i = S(g_i)$, where S is an activation function.
  • 2. The neural network circuit of claim 1 further comprising a control circuit connected to said one dimensional systolic array for selecting said function $f(w_{ij}, x_j)$.
  • 3. The neural network circuit of claim 1 wherein said processor in said i-th processing element comprises:
  • a circuit for receiving a sequence of values $w_{ij}$ from said weight storage circuit, a sequence of input values $x_j$, and a select signal for selecting a particular function f, and for outputting a sequence of values $f(w_{ij}, x_j)$; and
  • an accumulator for accumulating said values $f(w_{ij}, x_j)$ to obtain $g_i$.
  • 4. The neural network circuit of claim 1 wherein said activation function S is selected from a group consisting of sigmoid function, Gaussian function, linear function, step function, and square function.
  • 5. The neural network circuit of claim 1 further comprising
  • a control processor for outputting said synaptic weights $w_{ij}$ to each of said processing elements.
  • 6. The neural network circuit of claim 5 further including an address bus and a synaptic weight bus connecting said control processor and said one dimensional systolic array, said synaptic weight bus transmitting synaptic weight values $w_{ij}$ to a particular processing element i identified by said address bus.
  • 7. The neural network circuit of claim 6 wherein said control processor transmits said input values $x_j$ to said processing elements, transmits a select signal to said processing elements to select said function $f(w_{ij}, x_j)$, and receives said values $Y_i$ from said activation function circuit.
  • 8. The neural network circuit of claim 1 wherein M=N.
  • 9. The neural network circuit of claim 1 wherein M>N.
  • 10. The neural network circuit of claim 1 wherein M<N.
  • 11. The neural network circuit of claim 10 wherein, in each of said processing elements, in a first set of clock cycles a first sequence of synaptic weights $w_{ij}$ is combined with a sequence of values $x_j$ according to the function $Y_i = S\left(\sum_{j=1}^{N} f(w_{ij}, x_j)\right)$ to obtain a first set of values $Y_i$, and in a second set of clock cycles a second sequence of synaptic weights $w_{ij}$ is combined with the same sequence of values $x_j$ according to the function $Y_i = S\left(\sum_{j=1}^{N} f(w_{ij}, x_j)\right)$ to obtain a second set of values $Y_i$.
  • 12. A neural network circuit comprising a one dimensional systolic array of M processing elements; and
  • a control processor,
  • said control processor transmitting to the i-th processing element, $i = 1, 2, \ldots, M$, a set of synaptic weights $w_{ij}$, $j = 1, 2, \ldots, N$, said i-th processing element comprising:
  • first and second multiplexers, each multiplexer receiving a select signal for determining whether a weighted sum or a radial based network is selected; and
  • a multiplier for multiplying the output from said first multiplexer with the output from said second multiplexer to generate a function $f(w_{ij}, x_j)$,
  • wherein said output of said first multiplexer is $w_{ij}$ and the output of said second multiplexer is $x_j$ when said select signal indicates a weighted sum network, and
  • wherein said output of said first multiplexer is $(w_{ij} - x_j)$ and the output of said second multiplexer is $(w_{ij} - x_j)$ when said select signal indicates a radial based network;
  • said control processor transmitting to each of said processing elements a set of values $x_j$;
  • said control processor outputting said select signal to said one dimensional systolic array to select said function $f(w_{ij}, x_j)$, where $f(w_{ij}, x_j)$ equals $w_{ij} x_j$ if a weighted sum network is selected or equals $(w_{ij} - x_j)^2$ if a radial based network is selected; and
  • said one-dimensional systolic array outputting to said control processor a set of values $Y_i = S(g_i)$, where S is an activation function and $g_i = \sum_{j=1}^{N} f(w_{ij}, x_j)$.
  • 13. The neural network circuit of claim 12 wherein said control processor updates said synaptic weights $w_{ij}$ in response to said values $Y_i$ and transmits said updated synaptic weights to said processing elements in said one dimensional systolic array.
  • 14. The neural network circuit of claim 12 wherein each of said processing elements in said one dimensional systolic array comprises:
  • a synaptic weight storage circuit for storing synaptic weights;
  • a circuit for receiving said synaptic weights $w_{ij}$ from said synaptic weight storage circuit and said values $x_j$ from said control processor and for outputting a sequence of values $f(w_{ij}, x_j)$; and
  • an accumulator for accumulating said values $f(w_{ij}, x_j)$ to output $g_i$.
  • 15. A processing element for use in a one-dimensional systolic array used to implement a neural network, said processing element comprising:
  • a weight storage circuit for storing a sequence of synaptic weights $w_{ij}$;
  • a processor for receiving a sequence of input values and for outputting $f(w_{ij}, x_j) = w_{ij} x_j$ or $f(w_{ij}, x_j) = (w_{ij} - x_j)^2$ depending on whether the value of a select signal received at said processor represents a weighted sum network or a radial based network, respectively, said processor comprising:
  • first and second multiplexers, each multiplexer receiving said select signal for determining whether said weighted sum or said radial based network is selected; and
  • a multiplier for multiplying the output from said first multiplexer with the output from said second multiplexer to generate said function $f(w_{ij}, x_j)$,
  • wherein said output of said first multiplexer is $w_{ij}$ and the output of said second multiplexer is $x_j$ when said select signal indicates a weighted sum network, and
  • wherein said output of said first multiplexer is $(w_{ij} - x_j)$ and the output of said second multiplexer is $(w_{ij} - x_j)$ when said select signal indicates a radial based network;
  • an accumulator for forming the accumulation $g_i = \sum_{j=1}^{N} f(w_{ij}, x_j)$; and
  • an activation function circuit for applying an activation function to the values $g_i$.
RELATED APPLICATIONS

This is a continuation-in-part of U.S. patent application Ser. No. 08/403,523 filed on Mar. 13, 1995, now abandoned.

US Referenced Citations (14)
Number Name Date Kind
5091864 Baji et al. Feb 1992
5136717 Morley et al. Aug 1992
5138695 Means et al. Aug 1992
5148385 Frazier Sep 1992
5216751 Gardner et al. Jun 1993
5235330 Ramacher et al. Aug 1993
5235673 Austvold et al. Aug 1993
5274832 Khan Dec 1993
5293459 Duranton et al. Mar 1994
5337395 Vassiliadis et al. Aug 1994
5475793 Broomhead et al. Dec 1995
5509106 Pechanek et al. Apr 1996
5517598 Sirat May 1996
5519811 Yoneda et al. May 1996
Non-Patent Literature Citations (10)
Entry
Maria, "1D and 2D systolic implementations for radial basis function networks," IEEE 4th intl conf on microelectronics for neural networks, Dec. 1994.
"Widening world of neural nets," Electronics engineering times p. 35 Jul. 26, 1993.
Cornu, "Design implementation and test of a multi-model systolic neural network accelerator" scientific programming v5 n1 pp. 47-61, 1996.
Kwan, Systolic architecture for Hopfield network, BAM and multi-layer feed-forward network; 1989 IEEE International symposium on circuits and systems, pp. 790-793, Dec. 1989.
Zubair et al., Systolic implementation of neural networks, IEEE computer design--ICCD 1989 international conference, Dec. 1989.
Broomhead et al., A systolic array for nonlinear adaptive filtering and pattern recognition, 1990 IEEE international symposium on Circuits and systems, pp. 962-965, Dec. 1990.
Chen et al., A fuzzy neural network chip based on systolic array architecture, IEEE 1992 ASIC conference and exhibit, pp. 577-580, Feb. 1992.
Sheu, VLSI neurocomputing with analog programmable chips and digital systolic array chips, 1991 IEEE international symposium on circuits and systems, pp. 1267-1270, Apr. 1991.
Broomhead et al., Systolic array for nonlinear multidimensional interpolation, Electronics letters vol. 26 No. 1, pp. 7-9, Jan. 1990.
Kung et al., A unifying algorithm/architecture for artificial neural networks, Artificial Nerual Networks electronic implementations, IEEE Computer Society Press, pp. 120-123, Dec. 1990.
Continuation in Parts (1)
Number Date Country
Parent 403523 Mar 1995