1. Field of the Invention
Programs incorporating machine learning techniques are widely used in many applications today. Many learning programs are implemented on neural network platforms. Even though the state of the art has advanced rapidly in recent years, many difficulties remain. For example, recurrent neural networks, which are neural networks specialized for sequential data, have been among the most difficult to train. One reason for the difficulty is that such a network iterates a large number of times through its internal states during training, with each iteration carrying a likelihood of either an internal state or its derivative “blowing up” or reducing to insignificance.
One particular kind of network, referred to as the Long Short-Term Memory (LSTM) neural network, is designed to mitigate these problems by providing control signals to gate interactions with an internal memory state. The LSTM was first described in the article “Long Short-Term Memory,” by S. Hochreiter and J. Schmidhuber. A copy of the article may be obtained at http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf. In an LSTM element, the control signals limit when the memory element may be written into and read from, while maintaining a connection between successive memory states, thereby retaining memory.
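Purely for illustration, and not drawn from the cited article, the following Python/NumPy sketch shows one step of a conventional LSTM cell in its common formulation, in which the input, forget, and output gates control when the memory state is written and read; all weight and parameter names (W, U, b) are assumptions introduced for this example.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One step of a conventional LSTM cell. W, U and b are dictionaries of
    # input weights, recurrent weights and biases for the input gate 'i',
    # forget gate 'f', output gate 'o' and candidate value 'g'.
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate: controls when the memory is written
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate: controls how much prior state is kept
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate: controls when the memory is read
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])   # candidate value to be written into the memory
    c_t = f * c_prev + i * g        # gated write, with a direct connection from c_prev to c_t
    h_t = o * np.tanh(c_t)          # gated read of the memory state
    return h_t, c_t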
LSTM neural networks have been among the most successful networks for dealing with sequential data. Additional information regarding LSTM neural networks, expressed in lay terms, may be found, for example, in the article “Demystifying LSTM Neural Networks,” available at: http://blog.teminal.com/demistifying-long-short-term-memory-lstm-recurrent-neural-networks/.
According to one embodiment of the present invention, an LSTM element includes (a) a first multiplicative element that receives an input value and more than one input control value, and that provides a resulting input value which is a function of a product of the input value and all the input control values; (b) a state element providing a state value at each time point, wherein the state value assumes a first value at a first time point, and assumes a second value at a second time point immediately following the first time point, the second value being derived from a sum of the resulting input value and a function of the first value; and (c) a second multiplicative element that receives the state value of the state element and an output control value, and provides an output value that is a function of a product of the state value and the output control value.
In addition, in one embodiment of the present invention, the LSTM element further includes a third multiplicative element that receives one or more memory control values and provides a feedback state value that is a function of the current state value and the memory control values, such as the current state value scaled by one less the product of the one or more memory control values. The number of memory control values is preferably greater than one.
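The following Python/NumPy sketch illustrates one possible reading of the element described in the two preceding paragraphs: the input value is gated by a product of several input control values, the previous state is fed back through a coefficient derived from the memory control values (here, one less their product, one of the options named above), and the output is gated by an output control value. All weight and parameter names are assumptions introduced for the sketch and are not taken from the specification.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def higher_order_lstm_element_step(x_t, h_prev, c_prev, params, order=2):
    # One step of the element described above, sketched under assumed
    # parameter names. `order` is the number of input control values and
    # memory control values gating this element.
    # Candidate input value.
    u = np.tanh(params['W_in'] @ x_t + params['U_in'] @ h_prev + params['b_in'])
    # First multiplicative element: product of the input value and all of
    # the (more than one) input control values.
    input_controls = [sigmoid(params['W_ic'][k] @ x_t + params['U_ic'][k] @ h_prev + params['b_ic'][k])
                      for k in range(order)]
    resulting_input = u * np.prod(input_controls, axis=0)
    # Third multiplicative element: the previous state is scaled by a
    # coefficient derived from the memory control values, here one less
    # their product (one of the options named in the text).
    memory_controls = [sigmoid(params['W_mc'][k] @ x_t + params['U_mc'][k] @ h_prev + params['b_mc'][k])
                       for k in range(order)]
    feedback_state = (1.0 - np.prod(memory_controls, axis=0)) * c_prev
    # State element: the new state value is the sum of the resulting input
    # value and the feedback function of the previous state value.
    c_t = feedback_state + resulting_input
    # Second multiplicative element: the output value is a function of the
    # product of the state value and the output control value.
    output_control = sigmoid(params['W_oc'] @ x_t + params['U_oc'] @ h_prev + params['b_oc'])
    h_t = output_control * np.tanh(c_t)
    return h_t, c_t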
According to one embodiment of the present invention, a system includes a number of different LSTM elements, wherein the input value of each LSTM element is gated by a different number of input control signals.
The present invention is better understood upon consideration of the detailed description below in conjunction with the accompanying drawings.
The present inventor discovered that, by adding one or more additional control signals to gate the input signal in some or all of the LSTM elements of an LSTM neural network, the performance of the LSTM neural network can be profoundly improved.
As shown in
First and second memory control nodes 209 and 210, first and second input control nodes 203 and 207, input node 204, and first and second output control nodes 206 and 211 may be conventionally implemented neurons in a neural network. Although shown in
The present inventor's discovery is unexpected and surprising, as conventional theory would lead one to believe that, in an LSTM network, additional control signals gating the input signal of an LSTM element should make no difference in the performance of the resulting LSTM neural network. Nevertheless, the present inventor has demonstrated the unexpected result in an experiment involving sentence completion. In a sentence completion program, through training, the program learns to predict the words in the remainder of a sentence based on a fragment of the sentence. For example, given the fragment “Where is New York”, the program is expected after training to provide possible candidates for the complete sentence, such as “Where is New York University?”, “Where is New York Yankee Stadium?” and so forth. In the prior art, less favorable results are obtained when the program is trained with the sentence fragment treated as a collection of characters rather than as a collection of words. Also, in the prior art, more favorable results are obtained when the training data are drawn from a collection of documents that are all in the same language. However, a search program trained in this manner performs unfavorably when required to search over a collection of documents written in a number of different languages. Consequently, many applications are artificially limited to being language-specific. By introducing the LSTM elements of the present invention, the present inventor was able to show not only a performance improvement in the word-based approach, but also no significant performance difference between the word-based and character-based approaches. This result holds significant promise, for example, for applications that operate across language boundaries.
The present inventor theorizes that, in practice, multiple control lines are better at retaining information than a single one. As the number of control lines becomes arbitrarily large, the LSTM of the present invention tends toward a limit that resembles a conventional computer memory bank, in that the control lines play the role of memory address lines. By providing different types of LSTM elements of the present invention in an LSTM network, with each type of LSTM element having a different number of control lines gating the respective input signals, one may allow a multitude of different memories to co-exist, thereby enabling different memory characteristics to exist in the system. One implementation may also include conventional neurons that are without memory protection. A system providing different types of LSTM elements may be referred to as a “Higher Order LSTM.” Such a system has been shown to be particularly effective in training programs in the applications described above.
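As a further illustration, the sketch below (which reuses higher_order_lstm_element_step from the previous sketch) shows one assumed way a layer could mix LSTM elements whose inputs are gated by different numbers of control lines, together with conventional neurons without memory protection; the class and parameter names are hypothetical and not taken from the specification.

import numpy as np

class HigherOrderLSTMLayer:
    # A layer that mixes elements with different numbers of input control
    # lines, plus optional conventional neurons without memory protection.
    # Reuses higher_order_lstm_element_step from the earlier sketch.

    def __init__(self, element_params, orders, plain_params=None):
        # orders[k] is the number of control lines gating element k's input;
        # e.g. orders = [1, 2, 3] mixes a conventional element with higher-order ones.
        self.element_params = element_params
        self.orders = orders
        self.plain_params = plain_params

    def step(self, x_t, h_prev_list, c_prev_list):
        h_list, c_list = [], []
        for params, order, h_prev, c_prev in zip(self.element_params, self.orders, h_prev_list, c_prev_list):
            h, c = higher_order_lstm_element_step(x_t, h_prev, c_prev, params, order=order)
            h_list.append(h)
            c_list.append(c)
        if self.plain_params is not None:
            # Conventional neuron: no internal memory state is carried forward.
            h_list.append(np.tanh(self.plain_params['W'] @ x_t + self.plain_params['b']))
        return h_list, c_list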
The above detailed description is provided to illustrate specific embodiments of the present invention without being limiting. Various modifications and variations within the scope of the present invention are possible. The present invention is set forth in the accompanying claims.
The present application claims priority from U.S. Provisional Patent Application Ser. No. 62/203,606, filed on Aug. 11, 2015, which is hereby incorporated by reference herein in its entirety.