Claims
- 1. A multilayer perceptron device, comprising:
- input means for receiving an input vector;
- a plurality of processing layers of processing elements including at least one hidden layer of processing elements;
- forward propagation comparison means for comparing a processed input vector to an associated target vector and generating a feedback control signal for updating a respective network parameter using back propagation under control of a learning rate, wherein said learning rate is η_i = η_o × M/(K·N), wherein η_o is an overall learning rate for a layer in question, wherein η_i is a learning rate for updating the respective network parameter, wherein N is the number of inputs to the processing element fed by the updatable parameter value in question, K is the number of outputs from that processing element, and M is the number of inputs to the processing elements of the next layer.
- 2. A device as claimed in claim 1, wherein for an output layer M = K = 1.
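For illustration only (editorial, not claim language): a minimal Python sketch of how the normalized learning rate of claims 1 and 2, η_i = η_o × M/(K·N), could be computed for each weight stage of a fully connected perceptron. The names normalized_lr and layer_sizes are hypothetical and do not appear in the patent.

```python
# Illustrative sketch only; all names here are hypothetical.
def normalized_lr(eta_o, layer_sizes, l):
    """Per-parameter learning rate eta_i = eta_o * M / (K * N) of claim 1,
    for a weight feeding a processing element in layer l of a fully
    connected perceptron (layer 0 is the input layer).

    N -- number of inputs to the fed processing element,
    K -- number of outputs from that element,
    M -- number of inputs to the elements of the next layer.
    """
    N = layer_sizes[l - 1]               # fan-in of the fed element
    if l == len(layer_sizes) - 1:        # output layer: M = K = 1 (claim 2)
        M = K = 1
    else:
        K = layer_sizes[l + 1]           # fan-out of the fed element
        M = layer_sizes[l]               # fan-in of the next layer's elements
    return eta_o * M / (K * N)

# Example: per-layer rates for a hypothetical 4-8-8-2 network with eta_o = 0.5.
sizes = [4, 8, 8, 2]
for l in range(1, len(sizes)):
    print(f"weights into layer {l}: eta_i = {normalized_lr(0.5, sizes, l):.4f}")
```

Under these assumptions the example prints 0.1250, 0.2500 and 0.0625 for the three weight stages; the output stage gets the smallest rate because its fed elements have the largest effective fan-in relative to M and K.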
- 3. A method of training a multilayer perceptron device by tuning a plurality of network parameters that specify strengths of interconnections of processing elements, the device including
- an input for inputting a source vector;
- a layer of input processing elements connected to the input;
- an output for outputting a result vector;
- a layer of output processing elements connected to the output;
- a sequence of at least one hidden layer of processing elements between the layer of input processing elements and the layer of output processing elements;
- wherein the processing elements of each of said layers that precedes a next successive layer furnish output quantities at their outputs to inputs of the next successive layer under multiplication by respective ones of said plurality of network parameters; the method comprising the steps of:
- storing the source vector at the input;
- generating at the output the result vector obtained upon processing of the source vector by the device;
- determining a difference between the result vector obtained and a desired vector;
- under control of said difference, updating said plurality of network parameter values with a normalized learning rate η, wherein an initial guess for said learning rate is:
- η_i = η_o × f(M, N, K),
- wherein:
- η_o is an overall learning rate for a layer in question;
- η_i is a learning rate for updating a particular one of said plurality of network parameter values;
- N is a number of inputs to a specific one of the processing elements that is fed by a network parameter value in question;
- K is a number of outputs from that specific one of the processing elements;
- M is a number of inputs to the elements of a next successive layer; and wherein
- f(M,N,K) indicates a functional relationship between M, N and K of a kind that a value of f increases with increasing M, decreases with increasing N and decreases with increasing K for actual ranges of M, N, and K.
- 4. A method as claimed in claim 3, wherein for the layer of output elements
- η_i = η_o × f(M),
- N and K having a standard value of 1, and ∂f/∂M being positive.
- 5. A method as claimed in claim 3, wherein for any particular layer the function f(M,N,K) has a uniform value.
- 6. A method as claimed in claim 3, wherein f(M,N,K) is substantially proportional to M.
- 7. A method as claimed in claim 3, wherein f(M,N,K) is substantially inversely proportional to N.
- 8. A method as claimed in claim 3, wherein f(M,N,K) is substantially inversely proportional to K.
- 9. A method as claimed in claim 3, wherein for the multilayer perceptron device η_o has a uniform value.
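Again for illustration only (editorial): a hedged sketch of one training step of the method of claim 3, taking f(M, N, K) = M/(K·N) as claims 6 through 8 jointly suggest, with M = K = 1 at the output layer as in claim 2. The use of numpy, the function name train_step, and the tanh activation are assumptions, not part of the claims.

```python
import numpy as np

def train_step(weights, source, desired, eta_o):
    """One update of the method of claim 3; weights[i] has shape
    (fan_out, fan_in) for the connections feeding layer i + 1."""
    # Store the source vector and generate the result vector (claim 3).
    activations = [source]
    for W in weights:
        activations.append(np.tanh(W @ activations[-1]))

    # Determine the difference between result vector and desired vector.
    delta = activations[-1] - desired

    # Update each weight stage under control of that difference, using
    # the normalized rate eta_i = eta_o * M / (K * N).
    sizes = [source.size] + [W.shape[0] for W in weights]
    for l in range(len(weights), 0, -1):
        W = weights[l - 1]
        N = sizes[l - 1]                  # inputs to the fed element
        if l == len(weights):             # output layer: M = K = 1
            M = K = 1
        else:
            K = sizes[l + 1]              # outputs from that element
            M = sizes[l]                  # inputs to next layer's elements
        eta_i = eta_o * M / (K * N)
        grad_pre = delta * (1.0 - activations[l] ** 2)   # tanh derivative
        delta = W.T @ grad_pre            # propagate the difference back
        W -= eta_i * np.outer(grad_pre, activations[l - 1])

# Hypothetical usage on a 4-8-2 network.
rng = np.random.default_rng(0)
weights = [0.1 * rng.standard_normal((8, 4)),
           0.1 * rng.standard_normal((2, 8))]
train_step(weights, rng.standard_normal(4), np.array([0.5, -0.5]), eta_o=0.5)
```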
- 10. A multilayer perceptron device, comprising:
- input means for receiving an input vector;
- a plurality of processing layers of processing elements including at least one hidden layer of processing elements;
- forward propagation comparison means for comparing a processed input vector to an associated target vector and generating a feedback control signal for updating a respective network parameter using back propagation under control of a learning rate, wherein said learning rate is η_i = η_o × f(M, N, K), wherein η_o is an overall learning rate for a layer in question, wherein η_i is a learning rate for updating the respective network parameter, wherein N is the number of inputs to the processing element fed by the updatable parameter value in question, K is the number of outputs from that processing element, and M is the number of inputs to the processing elements of the next layer, and wherein f indicates a functional relationship between M, N and K of a kind that a value of f increases with increasing M, decreases with increasing N and decreases with increasing K for actual ranges of M, N and K.
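A final editorial illustration: claim 10 (like claim 3) admits any f that increases with M and decreases with N and with K. The hypothetical check below, not taken from the patent, verifies that the specific f = M/(K·N) recited in claim 1 satisfies those monotonicity conditions over a small range of positive integers.

```python
# Editorial check: a candidate f must increase with M and decrease with
# N and with K over the sampled range (claims 3 and 10).
def f(M, N, K):
    return M / (K * N)        # the specific form recited in claim 1

for M in range(1, 6):
    for N in range(1, 6):
        for K in range(1, 6):
            assert f(M + 1, N, K) > f(M, N, K)   # increases with M
            assert f(M, N + 1, K) < f(M, N, K)   # decreases with N
            assert f(M, N, K + 1) < f(M, N, K)   # decreases with K
```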
Priority Claims (1)
Number   | Date     | Country | Kind
89202133 | Aug 1989 | EPX     |
Parent Case Info
This is a continuation of application Ser. No. 08/141,439, filed Oct. 18, 1993, now abandoned, which is a continuation of application Ser. No. 07/570,472, filed Aug. 21, 1990, now abandoned.
US Referenced Citations (9)
Foreign Referenced Citations (1)
Number  | Date     | Country
8807234 | Sep 1988 | WOX
Continuations (2)
       | Number | Date     | Country
Parent | 141439 | Oct 1993 |
Parent | 570472 | Aug 1990 |