The present invention contains subject matter related to Japanese Patent Application JP 2006-093108 filed in the Japanese Patent Office on Mar. 30, 2006, the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to an information-processing apparatus, a method of processing information, a learning device, a learning method, and program products. More particularly, it relates to an information-processing apparatus and the like in which long time sequences can be learnt or produced in a recurrent neural network (hereinafter, referred to as “RNN”).
2. Description of Related Art
Feed-forward networks, a class of artificial neural networks, have been widely applied to pattern recognition, learning of unknown functions, and the like. In a feed-forward network, the output is determined only by the current inputs, without taking any past history into consideration. It is therefore difficult for such a network to learn time-series information and handle it appropriately.
Feed-forward network models that handle time-series information by converting a temporal pattern into a spatial pattern have been proposed. In these models, however, the length of history that can be considered is limited.
Alternatively, RNN models have been proposed. An RNN is a neural network having a recurrent loop, the so-called “context loop”, and can handle time-series information by performing processing based on the internal state held in the context loop, so that the history to be considered is not limited.
An article, “Learning to generate combinatorial action sequences utilizing the initial sensitivity of deterministic dynamical systems” by Ryu NISIMOTO and Jun TANI, Neural Networks 17, 2004, pp. 925-933, discloses a technology in which action sequences of a robot can be changed by using an RNN to learn and produce action sequences (time-series patterns) of the robot and by changing the initial values of the internal state of the RNN.
The technology disclosed in the above article is suitable for action sequences that span a small number of time steps in the RNN. If, however, the action sequences span a large number of time steps, it is difficult to learn or produce such long action sequences.
It is desirable to provide an information-processing apparatus and the like in which such long action sequences can be learnt or produced in the RNN.
According to an embodiment of the present invention, there is provided an information-processing apparatus equipped with a recurrent neural network. The recurrent neural network contains an input node that allows data to be input, an output node that outputs data based on the data input through the input node, a context input node, a context output node, a context loop that returns a value indicating internal state in the network from the context output node to the context input node, and a recurrent loop that returns output from the network at predetermined time to the network as a next input to the network. The information-processing apparatus has a production device that produces a current input to the network by adding output from the output node into an immediately preceding input to the network at a predetermined rate, and produces a current input to the context input node by adding output from the context output node into an immediately preceding input to the context input node at a predetermined rate.
Further, the production device produces the internal state of the input node at the time immediately following the current time by adding the output from the output node to the internal state of the input node at the current time at a predetermined rate, and produces the internal state of the context input node at the time immediately following the current time by adding the output from the context output node to the internal state of the context input node at the current time at a predetermined rate.
An initial value to be given to the context input node is obtained by learning. In the learning, any influence by an error in the internal state of the context input node at predetermined time on an error in the internal state of the context output node immediately before the predetermined time is adjusted.
According to another embodiment of the present invention, there is provided a method of processing information by using a recurrent neural network containing an input node that allows data to be input, an output node that outputs data based on the data input through the input node, a context input node, a context output node, a context loop that returns a value indicating internal state in the network from the context output node to the context input node, and a recurrent loop that returns output from the network at predetermined time to the network as a next input to the network. The method includes the steps of producing a current input to the network by adding output from the output node into an immediately preceding input to the network at a predetermined rate, and producing a current input to the context input node by adding output from the context output node into an immediately preceding input to the context input node at a predetermined rate.
According to a further embodiment of the present invention, there is provided a program product that allows a computer to perform the above method of processing information by using the recurrent neural network.
In the above embodiments of the invention, the current input to the network is produced by adding the output from the output node to the immediately preceding input to the network at a predetermined rate, and the current input to the context input node is produced by adding the output from the context output node to the immediately preceding input to the context input node at a predetermined rate. This enables long action sequences to be learnt or produced in the RNN.
According to an additional embodiment of the present invention, there is provided a learning device that learns an initial value provided to a context input node of an information-processing apparatus. The information-processing apparatus is equipped with a recurrent neural network containing an input node that allows data to be input, an output node that outputs data based on the data input through the input node, a context input node, a context output node, a context loop that returns a value indicating internal state in the network from the context output node to the context input node, and a recurrent loop that returns output from the network at a predetermined time to the network as a next input to the network.
The learning device contains an adjusting device that adjusts any influence by an error in the internal state of the context input node at predetermined time on an error in the internal state of the context output node immediately before the predetermined time.
The adjusting device sets a value obtained by dividing the error in the internal state of the context input node at predetermined time by a positive coefficient as the error in the internal state of the context output node immediately before the predetermined time, to adjust the influence by the error in the internal state of the context input node at the predetermined time on the error in the internal state of the context output node immediately before the predetermined time.
According to still another embodiment of the present invention, there is provided a learning method of learning an initial value to be provided to a context input node of an information-processing apparatus. The information-processing apparatus is equipped with a recurrent neural network containing an input node that allows data to be input, an output node that outputs data based on the data input through the input node, a context input node, a context output node, a context loop that returns a value indicating internal state in the network from the context output node to the context input node, and a recurrent loop that returns output from the network at predetermined time to the network as a next input to the network. This learning method includes a step of adjusting any influence by an error in the internal state of the context input node at predetermined time on an error in the internal state of the context output node immediately before the predetermined time.
According to a still further embodiment of the present invention, there is provided a program product that allows a computer to perform the above learning method of learning an initial value to be provided to a context input node of an information-processing apparatus.
In the above embodiments of the learning device and method of the invention, any influence by an error in the internal state of the context input node at the predetermined time on an error in the internal state of the context output node immediately before the predetermined time can be adjusted.
The concluding portion of this specification particularly points out and directly claims the subject matter of the present invention. However, those skilled in the art will best understand both the organization and method of operation of the invention, together with further advantages and objects thereof, by reading the remaining portions of the specification in view of the accompanying drawing(s) wherein like reference characters refer to like elements.
The following will describe embodiments of the present invention with reference to the accompanying drawings.
The information-processing apparatus 10 contains a learning direction device 11, RNN device 12, and production direction device 13 and performs learning processing on time-series data (time-series pattern).
The learning direction device 11 directs the RNN device 12 to perform learning processing on time-series data by supplying the RNN device 12 with the time-series data as teacher data.
The RNN device 12 contains a storage portion 21 and an operation portion 22. In the RNN device 12, a recurrent neural network (RNN) with three layers, namely an input layer 51, an output layer 53, and an intermediate layer 52 therebetween, is constructed.
In the RNN 41 shown in
It is to be noted that when the individual nodes among the input nodes 61-i, the context input nodes 62-k, the hidden nodes 63-j, the output nodes 64-i, and the context output nodes 65-k need not be distinguished, they are simply referred to as the input node 61, the context input node 62, the hidden node 63, the output node 64, and the context output node 65, respectively.
Referring back to
When the nodes of the input layer 51, namely, the input nodes 61-i and the context input nodes 62-k, receive their initial values from the production direction device 13, the operation portion 22 produces time-series data based on the initial values and outputs the time-series data thus produced to the production direction device 13 as produced data. To produce the time-series data, the weight coefficients and the optimal initial value to be provided to the context input nodes 62-k, which are obtained by the above learning, are used. In other words, when the nodes of the input layer 51 receive their initial values from the production direction device 13, the RNN device 12 acts as a production device that produces the time-series data based on the initial values thus received.
The production direction device 13 directs the RNN device 12 to produce the time-series data of desired time step numbers (samples, times) by supplying the initial values to each node of the input layer 51 of the RNN 41.
The following will describe details of the RNN 41 with reference to
The RNN 41 contains the input layer 51, the intermediate (hidden) layer 52, the output layer 53, and calculation portions 54, 55.
As described above, the input layer 51 has the input nodes 61-i (i=1, 2, . . . , I) and the context input nodes 62-k (k=1, 2, . . . , K). The intermediate layer 52 has the hidden nodes 63-j (j=1, 2, . . . , J). The output layer 53 has the output nodes 64-i (i=1, 2, . . . , I) and the context output nodes 65-k (k=1, 2, . . . , K).
To the input nodes 61-i, data xui(t) that is i-th item constituting the state vector xu(t) at time t is input. To the context input node 62-k, data cuk(t) that is k-th item constituting the internal state vector cu(t) of the RNN 41 at time t is input.
If the data xui(t) and the data cuk(t) are respectively input to the input nodes 61-i and the context input node 62-k, items of the data xi(t) and ck(t) that are respectively output from the input nodes 61-i and the context input node 62-k are respectively represented by following equations (1) and (2):
$x_i(t) = f(x_i^u(t))$ (1); and
$c_k(t) = f(c_k^u(t))$ (2).
The function f in the equations (1) and (2) is a differentiable continuous function such as a sigmoid function. The equations (1) and (2) mean that the data xui(t) and the data cuk(t) respectively input to the input nodes 61-i and the context input nodes 62-k are activated by the function f and output from the input nodes 61-i and the context input nodes 62-k as the data xi(t) and the data ck(t). It is to be noted that the superscript “u” of each of the data xui(t) and the data cuk(t) indicates the internal state of the node before activation; the same applies to the other nodes.
Data huj(t) to be input to the hidden nodes 63-j can be represented by following equation (3) using weight coefficient whij that represents a weight of combination between the input nodes 61-i and the hidden nodes 63-j and weight coefficient whjk that represents a weight of combination between the context input nodes 62-k and the hidden nodes 63-j:
$h_j^u(t) = \sum_i w_{ij}^h x_i(t) + \sum_k w_{jk}^h c_k(t)$ (3).
Data hj(t) output from the hidden nodes 63-j can be represented by following equation (4):
$h_j(t) = f(h_j^u(t))$ (4).
It is to be noted that the sigma of the first term on the right side of the equation (3) denotes the sum over all of the nodes i (i=1, 2, . . . , I) and the sigma of the second term on the right side of the equation (3) denotes the sum over all of the nodes k (k=1, 2, . . . , K).
Similarly, data yui(t) to be input to the output nodes 64-i, data yi(t) output from the output nodes 64-i, data ouk(t) to be input to the context output nodes 65-k, and data ok(t) output from the context output nodes 65-k can be respectively represented by following equations (5), (6), (7), and (8):
$y_i^u(t) = \sum_j w_{ij}^y h_j(t)$ (5);
$y_i(t) = f(y_i^u(t))$ (6);
$o_k^u(t) = \sum_j w_{jk}^o h_j(t)$ (7); and
$o_k(t) = f(o_k^u(t))$ (8).
In the equation (5), wyij is a weight coefficient indicating weight of combination of the hidden nodes 63-j and the output nodes 64-i and sigma means sum of all of the nodes j (j=1, 2, . . . , J). In the equation (7), wojk is a weight coefficient indicating weight of combination of the hidden nodes 63-j and the context output nodes 65-k and sigma means sum of all of the nodes j (j=1, 2, . . . , J).
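The forward calculation described by the equations (1) through (8) can be sketched as follows. This is a minimal illustration only: the function names, the nested-list storage of the weight coefficients (indexed by destination node first), and the choice of the sigmoid function for f are assumptions made for the example and are not taken from the specification.

```python
import math


def sigmoid(u):
    """Activation f; the sigmoid is one differentiable function named in the text."""
    return 1.0 / (1.0 + math.exp(-u))


def forward(xu, cu, w_h_in, w_h_ctx, w_y, w_o):
    """One forward step through the three layers, following equations (1) to (8).

    xu: internal states of the I input nodes.
    cu: internal states of the K context input nodes.
    w_h_in[j][i], w_h_ctx[j][k]: weights into hidden node j (assumed layout).
    w_y[i][j], w_o[k][j]: weights from hidden node j to the output layer (assumed layout).
    Returns (y, o): outputs of the output nodes and of the context output nodes.
    """
    x = [sigmoid(u) for u in xu]                               # equation (1)
    c = [sigmoid(u) for u in cu]                               # equation (2)
    h = []
    for w_in_j, w_ctx_j in zip(w_h_in, w_h_ctx):
        hu = (sum(w * xi for w, xi in zip(w_in_j, x))
              + sum(w * ck for w, ck in zip(w_ctx_j, c)))      # equation (3)
        h.append(sigmoid(hu))                                  # equation (4)
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w_y]  # (5), (6)
    o = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w_o]  # (7), (8)
    return y, o
```

With all weight coefficients set to zero, every output node and context output node returns f(0), which is a quick sanity check of the wiring.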
The calculation portion 54 calculates the finite difference Δxui(t+1) between the data xui(t) at time t and the data xui(t+1) at time t+1 from the data yi(t) output from the output nodes 64-i according to the following equation (9), then calculates the data xui(t+1) at time t+1 according to the following equation (10), and outputs the calculated data xui(t+1).
It is to be noted that in these equations, alpha and tau indicate optional coefficients, respectively.
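The equations (9) and (10) are not reproduced in this text, so the following sketch only illustrates the kind of update the description implies: a finite difference is computed from the node output and added to the internal state at the current time. The specific form of the difference, including where the coefficients alpha and tau appear, is an assumption made for this example (a leaky-integrator form), and the function name is hypothetical.

```python
def leaky_update(u, out, alpha, tau):
    """Return the internal state at time t+1 from the state u at time t.

    ASSUMED form, not taken from the specification:
        delta  = (-u + out / alpha) / tau
        u_next = u + delta
    'out' is the output of the corresponding output-layer node at time t.
    """
    delta = (-u + out / alpha) / tau   # assumed finite difference
    return u + delta                   # state at t+1 = state at t + difference
```

Under this assumed form, setting alpha and tau to one replaces the internal state with the output outright, while a larger tau makes the internal state follow the output more slowly.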
Thus, when the RNN 41 shown in
The calculation portion 55 calculates the finite difference Δcuk(t+1) between the data cuk(t) at time t and the data cuk(t+1) at time t+1 from the data ok(t) output from the context output nodes 65-k according to the following equation (11), then calculates the data cuk(t+1) at time t+1 according to the following equation (12), and outputs the calculated data cuk(t+1).
This data cuk(t+1) output from the calculation portion 55 at time t+1 is also fed back to the context input nodes 62-k.
The equation (12) means that the internal state vector cu(t+1) at next time can be obtained by adding the data ok(t) output from the context output nodes 65-k that is weighted by a coefficient α to the internal state vector cu(t) in the network at a current time. In this sense, the RNN 41 shown in
Thus, when the RNN 41 shown in
The following will describe production processing of the information-processing apparatus 10 that produces time-series data with reference to a flowchart shown in
First, at Step S11, the production direction device 13 supplies the RNN device 12 with the initial value X0 of the input data and the initial value C0 of the context input data.
At Step S12, the input nodes 61-i calculate the data xi(t) according to the equation (1) and output the calculated data xi(t), and the context input nodes 62-k calculate the data ck(t) according to the equation (2) and output the calculated data ck(t).
At Step S13, the hidden nodes 63-j calculate the data huj(t) according to the equation (3), calculate the data hj(t) according to the equation (4), and output the calculated data hj(t).
At Step S14, the output nodes 64-i calculate the data yui(t) according to the equation (5), calculate the data yi(t) according to the equation (6), and output the calculated data yi(t).
At Step S15, the context output nodes 65-k calculate the data ouk(t) according to the equation (7), calculate the data ok(t) according to the equation (8), and output the calculated data ok(t).
At Step S16, the calculation portion 54 calculates the finite difference data Δxui(t+1) according to the equation (9), calculates the data xui(t+1) at time t+1 according to the equation (10), and outputs the calculated data xui(t+1) to the production direction device 13.
At Step S17, the calculation portion 55 calculates the finite difference data Δcuk(t+1) according to the equation (11) and calculates the data cuk(t+1) at time t+1 according to the equation (12). The calculation portion 55 feeds (inputs) the calculated data cuk(t+1) back to the context input nodes 62-k.
At Step S18, the RNN device 12 determines whether or not the production of the time-series data is finished. At the Step S18, if it is determined that the production of the time-series data is not finished, the calculation portion 54, at Step S19, feeds the calculated data xui(t+1) at time t+1 back to the input nodes 61-i and the processing returns to the Step S12.
On the other hand, if it is determined that the production of the time-series data is finished by, for example, attaining the desired time step number, at the Step S18, the RNN device 12 finishes the production processing.
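The production processing of Steps S11 through S19 can be sketched as a closed loop. In this illustration the network is passed in as a callable that maps the internal states of the input layer to the outputs of the output layer; that callable, the function name, the default coefficient values, and the assumed leaky update used at Steps S16 and S17 are all hypothetical and stand in for details not reproduced in this text.

```python
def produce(network, x0, c0, steps, alpha=1.0, tau=2.0):
    """Generate 'steps' samples of time-series data from initial values X0 and C0.

    network(xu, cu) -> (y, o) is a hypothetical callable covering Steps S12 to S15.
    The state update below is an ASSUMED leaky form standing in for the
    equations (9) to (12), which are not reproduced in the text.
    """
    xu, cu = list(x0), list(c0)                     # S11: initial values supplied
    produced = []
    for _ in range(steps):                          # S18: stop at the desired step count
        y, o = network(xu, cu)                      # S12 to S15: forward calculation
        xu = [u + (-u + yi / alpha) / tau for u, yi in zip(xu, y)]   # S16
        cu = [u + (-u + ok / alpha) / tau for u, ok in zip(cu, o)]   # S17: context loop
        produced.append(xu)                         # S16: output as produced data
        # S19: xu is fed back to the input nodes on the next pass of the loop
    return produced
```

Only the initial values are supplied from outside; every later input is derived from the outputs of the network itself, which is why a different initial context value can lead to a different produced sequence.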
The following will describe learning of time-series data in the RNN device 12.
It is supposed that when, for example, a humanoid robot equipped with the information-processing apparatus 10 learns plural action sequences (actions), the weight coefficients whij, whjk between nodes of the input layer 51 and nodes of the intermediate layer 52 as well as the weight coefficients wyij, wojk between nodes of the intermediate layer 52 and nodes of the output layer 53 correspond to all the action sequences.
In the learning processing, learning of time-series data corresponding to the plural action sequences is carried out simultaneously. Namely, in the learning processing, as many RNNs 41 as there are action sequences are prepared, and the weight coefficients whij, whjk, wyij, wojk are calculated for each action sequence; their average values become the final weight coefficients whij, whjk, wyij, wojk of one RNN 41. Repeating this processing yields the weight coefficients whij, whjk, wyij, wojk of the RNN 41 that is used in the production processing. In the learning processing, the initial value cu(t0)=C0 of the context input data is also obtained for each action sequence at the same time.
First, at Step S31, the production direction device 13 supplies the RNN device 12 with N items of time-series data as teacher data. The production direction device 13 also supplies the RNN device 12 with a predetermined value as the initial value cuk(t0)=C0k of the context input data of the N RNNs 41.
At Step S32, the operation portion 22 of the RNN device 12 sets a variable “s”, which counts the learning iterations, to one.
At Step S33, the operation portion 22 calculates amounts of errors δwhij, δwhjk of the weight coefficients whij(s), whjk(s) between nodes of the input layer 51 and nodes of the intermediate layer 52, amounts of errors δwyij, δwojk of the weight coefficients wyij(s), wojk(s) between nodes of the intermediate layer 52 and nodes of the output layer 53, and an amount of error δC0k of the initial value C0k of the context input data, using the back propagation through time (BPTT) method, on the RNNs 41 corresponding to each of the N items of time-series data. In this case, in the RNN 41 to which the n-th time-series data (n=1, 2, . . . , N) is input, the amounts of errors δwhij, δwhjk, δwyij, δwojk, δC0k obtained by using the BPTT method are respectively represented as the amounts of errors δwhij, n, δwhjk, n, δwyij, n, δwojk, n, δC0k, n.
The BPTT method is a learning algorithm for the RNN 41 having a context loop; by unfolding the propagation of signals in time into a spatial structure, it applies the back propagation (BP) method used in ordinary multilayer neural networks. The weight coefficients whij(s), whjk(s), wyij(s), wojk(s) are obtained so that the error between the data xu(t+1) at time t+1, which is obtained from the data xu(t) at time t, and the teacher data xu(t+1)* at time t+1 becomes smaller.
It is to be noted that, in the calculation using the BPTT method in Step S33, the operation portion 22 adjusts the time constant of the context data: when it back-propagates the amount of error δcuk(t+1) of the data cuk(t+1) of the context input nodes 62-k at time t+1 to the amount of error δok(t) of the data ok(t) of the context output nodes 65-k at time t, it divides the amount of error δcuk(t+1) by an optional positive coefficient m.
In other words, the operation portion 22 calculates the amount of error δok(t) of the data ok(t) of the context output nodes 65-k at time t according to the following equation (13) using the amount of error δcuk(t+1) of the data cuk(t+1) of the context input nodes 62-k at time t+1:
$\delta o_k(t) = \frac{1}{m}\,\delta c_k^u(t+1)$ (13).
Applying the equation (13) in the BPTT method enables the influence of the context data of the immediately preceding time step, which indicates the internal state of the network, to be adjusted.
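The division by the positive coefficient m described above can be sketched as a single step of the backward pass. The function name is hypothetical, and the sketch shows only this scaling step rather than a complete BPTT implementation.

```python
def context_output_error(delta_cu_next, m):
    """Error assigned to the context output nodes at time t.

    delta_cu_next: errors of the context input nodes at time t+1, one per node k.
    m: positive coefficient; a larger m weakens the influence of the error
       carried back through the context loop.
    """
    if m <= 0:
        raise ValueError("m must be a positive coefficient")
    return [d / m for d in delta_cu_next]
```

For instance, with m set to 2 each error value passed back through the context loop is halved before it reaches the context output nodes.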
At Step S34, the operation portion 22 averages the weight coefficients whij, whjk between nodes of the input layer 51 and nodes of the intermediate layer 52, and the weight coefficients wyij, wojk between nodes of the intermediate layer 52 and nodes of the output layer 53, respectively, over the N items of time-series data, and updates the weight coefficients whij, whjk, wyij, wojk to the averaged ones.
Namely, the operation portion 22 calculates the weight coefficients whij(s+1), whjk(s+1) between nodes of the input layer 51 and nodes of the intermediate layer 52, and the weight coefficients wyij(s+1), wojk(s+1) between nodes of the intermediate layer 52 and nodes of the output layer 53 according to the following equations (14) through (21):
In these equations, eta indicates a learning coefficient and alpha indicates an inertia coefficient. It is to be noted that in the equations (14), (16), (18), and (20), if s=1, the terms Δwhij(s), Δwhjk(s), Δwyij(s), Δwojk(s) respectively become zero.
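The equations (14) through (21) are not reproduced in this text. The following sketch shows one update that is consistent with the surrounding description: the per-sequence amounts of error are averaged over the N items of time-series data, scaled by the learning coefficient eta, and combined with the previous update scaled by the inertia coefficient alpha. The exact form is an assumption made by analogy with the equations (22) and (23), and the function name is hypothetical.

```python
def update_weight(w, prev_delta, errors, eta, alpha):
    """Update one weight coefficient from the errors of all N sequences.

    ASSUMED form, by analogy with equations (22) and (23):
        delta(s+1) = eta * mean(errors) + alpha * delta(s)
        w(s+1)     = w(s) + delta(s+1)
    errors: the N per-sequence amounts of error for this weight.
    prev_delta: delta(s); zero when s = 1, as stated in the text.
    Returns (w(s+1), delta(s+1)).
    """
    mean_error = sum(errors) / len(errors)        # average over the N sequences
    delta = eta * mean_error + alpha * prev_delta
    return w + delta, delta
```

The same function would be applied to each of the four groups of weight coefficients, carrying the returned delta forward to the next learning iteration.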
At Step S35, the operation portion 22 updates the initial value c0k,n of the context input data. Namely, the operation portion 22 calculates the initial value c0k,n(s+1) of the context input data according to the following equations (22) and (23):
$\Delta c_{0k,n}(s+1) = \eta\,\delta c_{0k,n} + \alpha\,\Delta c_{0k,n}(s)$ (22); and
$c_{0k,n}(s+1) = c_{0k,n}(s) + \Delta c_{0k,n}(s+1)$ (23).
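The equations (22) and (23) can be sketched as follows for all N sequences and K context nodes at once. The function name and the nested-list layout (sequence index first, then context node index) are assumptions made for the example.

```python
def update_initial_contexts(c0, prev_delta, errors, eta, alpha):
    """Apply equations (22) and (23) to every sequence n and context node k.

    c0[n][k], prev_delta[n][k], errors[n][k] use an assumed nested-list layout.
    Returns (new_c0, new_delta) with the same layout.
    """
    new_c0, new_delta = [], []
    for c_n, d_n, e_n in zip(c0, prev_delta, errors):
        delta_n = [eta * e + alpha * d for e, d in zip(e_n, d_n)]   # equation (22)
        new_delta.append(delta_n)
        new_c0.append([c + d for c, d in zip(c_n, delta_n)])        # equation (23)
    return new_c0, new_delta
```

Each sequence keeps its own row of initial values, so no averaging across sequences takes place in this step.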
At Step S36, the operation portion 22 determines whether or not the variable s is less than a predetermined number of learning iterations. The predetermined number is set so that the learning error becomes sufficiently small.
If it is determined at the Step S36 that the variable s is less than the predetermined number of learning iterations, i.e., that learning has not yet been performed enough times for the learning error to become sufficiently small, the processing goes to Step S37, where the operation portion 22 increments the variable s by one. The processing then returns to the Step S33, and the Steps S33 through S37 are repeated. On the other hand, if it is determined that the variable s is not less than the predetermined number of learning iterations, the learning processing ends.
It is to be noted that at the Step S36, the operation portion 22 may instead determine whether or not the learning error falls within a predetermined reference range. When it determines that the learning error falls within the predetermined reference range, the learning processing ends.
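The control flow of Steps S32 through S37, including the alternative termination test on the learning error, can be sketched as follows. The callable passed as step, which stands for Steps S33 through S35 and returns the current learning error, and the parameter names are hypothetical.

```python
def train(step, max_iterations, error_limit=None):
    """Run the learning loop and return (iterations performed, last learning error).

    step(s): hypothetical callable performing Steps S33 to S35 for iteration s
             and returning the learning error.
    max_iterations: the predetermined number of learning iterations.
    error_limit: optional reference limit for the alternative test at Step S36.
    """
    s = 1                                           # S32
    while True:
        error = step(s)                             # S33 to S35
        if error_limit is not None and error <= error_limit:
            break                                   # S36, alternative test on the error
        if s >= max_iterations:                     # S36: s is not less than the limit
            break
        s += 1                                      # S37
    return s, error
```

With error_limit left unset, the loop always runs for exactly the predetermined number of iterations.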
Thus, in the learning processing, the weight coefficients whij, whjk, wyij, wojk are obtained for each action sequence and their average values are set as the weight coefficients whij, whjk, wyij, wojk of the single final RNN 41; repeating this processing yields the weight coefficients whij, whjk, wyij, wojk of the RNN 41 to be used in the production processing.
In other words, in this processing, the weight coefficients whij, whjk between nodes of the input layer 51 and nodes of the intermediate layer 52, and the weight coefficients wyij, wojk between nodes of the intermediate layer 52 and nodes of the output layer 53 are allocated to the part of the actions that is common to the plural action sequences, while the initial values c0k,n of the context nodes are allocated to the part of the actions that differs among the plural action sequences. Therefore, the initial values c0k,n of the context nodes obtained by the learning processing have separate values for each action sequence. This allows the reproduced action sequence to change according to the given initial values c0k,n of the context nodes.
Although the weight coefficients whij, whjk, wyij, wojk obtained for each action sequence are averaged at every learning iteration in the above learning processing, they may instead be averaged once every predetermined number of iterations. For example, if learning is to finish after 10,000 iterations, the weight coefficients whij, whjk, wyij, wojk obtained for each action sequence may be averaged once every ten iterations.
The following will describe the learning processing and the production processing of the time-series data of the above information-processing apparatus 10 based on results of experiments in which a humanoid robot acted.
Specifically, as shown in
The time-series data given to the RNN device 12 as teacher data relates to signals for the joint motors of the robot. In this experiment, the number of input nodes 61 in the RNN 41 was set to eight (I=8); the number of hidden nodes was set to twenty (J=20); the number of context input nodes 62 was set to ten (K=10); and the number of output nodes 64 was set to eight (I=8). The number of learning iterations was set to 500,000. Accordingly, the robot was controlled with eight-axis motors to perform the action sequences D1 through D3.
In this experiment, learning was performed using as teacher data a total of 15 items of time-series data obtained by adding five kinds of noise, each slightly different from the others, to each of the action sequences D1 through D3. Weight coefficients of the RNN 41 that were common to the 15 items of time-series data were obtained, and the initial values C0 of the context input data for each of the 15 items of time-series data were obtained.
It was observed that after 500,000 learning iterations the learning error had converged sufficiently, apart from some fluctuation.
In each of the
As seen from every graph of
The following will describe initial values C0 of the context input data that is obtained in the learning processing.
In
As seen from
Thus, it is possible to switch adequately among the action sequences D1 through D3 based on the initial values c0 of the context input data input to the RNN 41, even if the initial values X0 of the input data input to the input nodes 61 of the RNN 41 are identical because the initial state (a) is identical. In other words, the initial values c0 of the context input data for switching among the action sequences D1 through D3 are self-organized by the learning processing.
Thus, the RNN 41 included in the RNN device 12 makes it possible to stably learn sequences (time-series data) having a branch structure, in which the initial values X0 of the input data input to the input nodes 61 of the RNN 41 are identical but the sequences diverge midway, even for long sequences of 69 to 79 time steps.
The above series of processing can be realized not only by hardware but also by software. If the series of processing is realized by software, the programs constituting the software are installed from a program storage medium into a computer embedded in special-purpose hardware or into a computer that can perform various functions when various programs are installed, for example, a general-purpose personal computer.
An input/output interface 105 is connected to the CPU 101 via the bus 104. To the input/output interface 105, an input portion 106 containing a keyboard, a mouse, a microphone and the like and an output portion 107 containing a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker and the like are connected. The CPU 101 performs various kinds of processing in response to commands input through the input portion 106. The CPU 101 also outputs results of the processing to the output portion 107.
The storage portion 108 connected to the input/output interface 105 contains a hard disk and stores programs and/or various kinds of data that the CPU 101 uses for performing various kinds of functions. A communication portion 109 communicates with an external apparatus via a network such as the Internet or a local area network, or directly if the communication portion 109 is connected to the external apparatus.
A drive 110 connected to the input/output interface 105 drives a removable medium 121 such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory when the removable medium is installed thereinto for obtaining the stored program and/or data. The program and/or data thus obtained are transferred to the storage portion 108 as occasion demands. The storage portion 108 stores the transferred program and/or data. The program and/or data may be obtained through the communication portion 109 and stored in the storage portion 108.
The program storage medium storing the programs to be installed in a computer and to be performed by the computer is constituted of the removable media 121 shown in
The steps in the flowcharts shown in
It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and alterations may occur depending on design requirements and other factors insofar as they are within the scope of the appended claims or the equivalents thereof.
Number | Date | Country | Kind |
---|---|---|---|
2006-093108 | Mar 2006 | JP | national |