The present invention relates to artificial neural networks. In particular, the present invention relates to multi-layer artificial neural networks capable of performing deep learning.
The idea of artificial neural networks has existed for a long time. Nevertheless, the limited computation ability of hardware was long an obstacle to related research. Over the last decade, there has been significant progress in both the computation capabilities of processors and the algorithms of machine learning. Only recently have artificial neural networks capable of generating reliable judgments become possible. Gradually, artificial neural networks are being applied experimentally in many fields, such as autonomous vehicles, image recognition, natural language understanding, and data mining.
Neurons are the basic computation units in a brain. Each neuron receives input signals from its dendrites and produces output signals along its single axon (usually provided to other neurons as input signals). The typical operation of an artificial neuron can be modeled as:

y = f(Σᵢ wᵢxᵢ + b),

wherein x represents the input signal and y represents the output signal. Each dendrite multiplies its input signal x by a weight w; this parameter is used to simulate the strength of influence of one neuron on another. The symbol b represents a bias contributed by the artificial neuron itself. The symbol f represents a specific nonlinear function and is generally implemented as a sigmoid function, tanh function, or rectified linear function in practical computation.
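For illustration purposes only, the following minimal Python sketch (using NumPy; all names are illustrative and not part of the claimed subject matter) computes the output of a single artificial neuron according to the above equation:

```python
import numpy as np

def sigmoid(z):
    # A common choice for the nonlinear function f.
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b, f=sigmoid):
    # y = f(sum_i(w_i * x_i) + b): weighted input signals plus a bias,
    # passed through the nonlinear function f.
    return f(np.dot(w, x) + b)

# Example: three input signals, their weights, and a bias.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.3, -0.1])
b = 0.2
y = neuron_output(x, w, b)  # a single scalar output signal
```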
For an artificial neural network, the relationship between its input data and its final judgment is in effect defined by the weights and biases of all the artificial neurons in the network. In an artificial neural network adopting supervised learning, training samples are fed to the network. Then, the weights and biases of the artificial neurons are adjusted with the goal of finding a judgment policy whose judgments match the training samples. In an artificial neural network adopting unsupervised learning, whether a judgment matches the training sample is unknown; the network adjusts the weights and biases of its artificial neurons and tries to find an underlying rule. No matter which kind of learning is adopted, the goal is the same: finding suitable parameters (i.e., weights and biases) for each neuron in the network. The determined parameters are then utilized in future computation.
Currently, most artificial neural networks are designed with a multi-layer structure. Layers serially connected between the input layer and the output layer are called hidden layers. The input layer receives external data and does not perform computation. In a hidden layer or the output layer, the input signals are the output signals generated by the previous layer, and each artificial neuron included therein performs computation according to the aforementioned equation. Each hidden layer and the output layer can respectively be a convolutional layer or a fully-connected layer. The main difference between the two is that neurons in a fully-connected layer have full connections to all neurons in the previous layer, whereas neurons in a convolutional layer are connected only to a local region of the previous layer. Besides, many artificial neurons in a convolutional layer share parameters.
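For illustration purposes only, the following one-dimensional Python sketch contrasts the two connection patterns (bias and nonlinear function omitted for brevity; all names are illustrative):

```python
import numpy as np

prev = np.array([0.1, 0.7, -0.3, 0.5, 0.2])  # outputs of the previous layer

# Fully-connected: one neuron holds its own weight for every output
# of the previous layer (full connections).
w_fc = np.random.randn(5)
y_fc = np.dot(w_fc, prev)

# Convolutional: each neuron sees only a local region (here, width 3),
# and all neurons in the layer share the same three weights.
w_conv = np.random.randn(3)
y_conv = np.array([np.dot(w_conv, prev[i:i + 3]) for i in range(3)])
```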
At the present time, there are a variety of network structures. Each structure has its unique combination of convolutional layers and fully-connected layers. Taking the AlexNet structure proposed by Alex Krizhevsky et al. in 2012 as an example, the network includes 650,000 artificial neurons that form five convolutional layers and three fully-connected layers connected in series.
Generally speaking, as the number of layers increases, an artificial neural network can simulate a more complicated function (i.e., a more complicated judgment policy). However, as the number of layers increases, the number of artificial neurons required in the network swells significantly, imposing a huge burden in hardware cost. Undoubtedly, this difficulty will be an impediment to applying artificial neural networks to consumer electronics in the future.
To solve the aforementioned problem, a new multi-layer artificial neural network and controlling method thereof are provided.
One embodiment according to the invention is a multi-layer artificial neural network including a plurality of artificial neurons, a storage device, and a controller. The plurality of artificial neurons are used for performing computation based on plural parameters. The storage device is used for storing plural sets of parameters; each set of parameters corresponds to a respective layer. At a first time instant, the controller controls the storage device to provide a set of parameters corresponding to a first layer to the plurality of artificial neurons so that the plurality of artificial neurons form at least part of the first layer. At a second time instant, the controller controls the storage device to provide a set of parameters corresponding to a second layer to the plurality of artificial neurons so that the plurality of artificial neurons form at least part of the second layer.
Another embodiment according to the invention is a controlling method for a multi-layer artificial neural network. The multi-layer artificial neural network includes a plurality of artificial neurons for performing computation based on plural parameters. According to the controlling method, at a first time instant, a set of parameters corresponding to a first layer is provided to the plurality of artificial neurons so that the plurality of artificial neurons form at least part of the first layer. Further, at a second time instant, a set of parameters corresponding to a second layer is provided to the plurality of artificial neurons so that the plurality of artificial neurons form at least part of the second layer.
Another embodiment according to the invention is a non-transitory computer-readable storage medium encoded with a computer program for controlling a multi-layer artificial neural network. The multi-layer artificial neural network includes a plurality of artificial neurons for performing computation based on plural parameters. The computer program includes instructions that when executed by one or more computers cause the one or more computers to perform operations including: at a first time instant, providing a set of parameters corresponding to a first layer to the plurality of artificial neurons so that the plurality of artificial neurons form at least part of the first layer; and at a second time instant, providing a set of parameters corresponding to a second layer to the plurality of artificial neurons so that the plurality of artificial neurons form at least part of the second layer.
The advantage and spirit of the invention may be understood by the following recitations together with the appended drawings.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
The figures described herein include schematic block diagrams illustrating various interoperating functional modules. It should be noted that such diagrams are not intended to serve as electrical schematics and interconnections illustrated are intended to depict signal flow, various interoperations between functional components and/or processes and are not necessarily direct electrical connections between such components. Moreover, the functionality illustrated and described via separate components need not be distributed as shown, and the discrete blocks in the diagrams are not necessarily intended to depict discrete electrical components.
One embodiment according to the invention is a multi-layer artificial neural network including a plurality of artificial neurons, a storage device, and a controller. The controller is designed to request the storage device to, at different time instants, provide parameters corresponding to different layers to those artificial neurons.
As described above, each artificial neuron performs computation based on its input signals and respective parameters (weights and biases). In the process of machine learning, no matter whether the learning strategy includes only forward propagation or both forward propagation and backpropagation, these parameters might be continuously adjusted. During and after the learning process, the storage device 152 stores the plural sets of parameters, each set corresponding to a respective layer; for example, the storage regions 152A and 152B respectively store the parameters corresponding to the hidden layers 120 and 130.
The input pins 161˜164 respectively receive the external data D1˜D3. First, at a time instant t1, the controller 154 requests the storage device 152 to provide the parameters corresponding to the hidden layer 120 (i.e., the parameters stored in the storage region 152A) to the artificial neurons N1˜N4. For example, the controller 154 can request the storage device 152 to provide the parameters corresponding to the artificial neuron 121 (so the parameters are labeled as P121) to the artificial neuron N1; likewise, the parameters P122˜P124 are provided to the artificial neurons N2˜N4, respectively.
The multiplexers 171˜174 are also controlled by the controller 154 (via connections omitted in these figures). At the time instant t1, the controller 154 controls multiplexers 171˜174 to connect input pins 161˜164 with artificial neurons N1˜N4, so that the external data D1˜D3 is provided to the artificial neurons N1˜N4 as input signals.
Then, at a time instant t2, the controller 154 controls the storage device 152 to provide parameters corresponding to the hidden layer 130 (i.e. parameters stored in the storage region 152B) to the artificial neurons N1˜N2. For example, the controller 154 can request the storage device 152 to provide parameters corresponding to the artificial neuron 131 (labeled as P131) to the artificial neuron N1, and provide parameters corresponding to the artificial neuron 132 (labeled as P132) to the artificial neuron N2. Further, the controller 154 also controls multiplexers 171˜172 to connect the storage device 152 with the artificial neurons N1˜N2, and requests the storage device 152 to provide previously stored computation results Y121˜Y124 to the artificial neurons N1˜N2 as input signals.
Subsequently, at a time instant t3, the controller 154 requests the storage device 152 to provide the parameters corresponding to the artificial neuron 141 (labeled as P141) to the artificial neuron N1. Further, the controller 154 also controls the multiplexer 171 to connect the storage device 152 with the artificial neuron N1, and requests the storage device 152 to provide the previously stored computation results Y131˜Y132 to the artificial neuron N1 as input signals.
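For illustration purposes only, the data flow at the time instants t1, t2, and t3 can be sketched in Python as follows. This is a software simulation under assumed parameter shapes, not the claimed hardware; the dictionary keys and variable names merely mirror the labels 120, 130, 140, 152A, 152B, and Y121˜Y124 used above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # One reconfigurable artificial neuron: y = f(w . x + b).
    return sigmoid(np.dot(w, x) + b)

# Storage regions holding one parameter set (w, b) per layer neuron,
# mirroring P121..P124, P131..P132, and P141 in the text.
storage = {
    "layer_120": [(np.random.randn(3), 0.1) for _ in range(4)],  # region 152A
    "layer_130": [(np.random.randn(4), 0.1) for _ in range(2)],  # region 152B
    "layer_140": [(np.random.randn(2), 0.1)],
}

D = np.array([0.3, -0.6, 0.9])  # external data D1..D3

# t1: the physical neurons N1..N4 are configured as hidden layer 120;
# the multiplexers route the external data to their inputs.
Y120 = np.array([neuron(D, w, b) for (w, b) in storage["layer_120"]])

# t2: the same physical neurons (only N1..N2 are needed now) are
# reconfigured as hidden layer 130; the stored results Y121..Y124
# are fed back from the storage device as input signals.
Y130 = np.array([neuron(Y120, w, b) for (w, b) in storage["layer_130"]])

# t3: the neuron N1 is reconfigured as the output neuron 141 and
# consumes the stored results Y131..Y132.
Y140 = np.array([neuron(Y130, w, b) for (w, b) in storage["layer_140"]])
```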
It should be noted that the time instants t1, t2, and t3 are usually relative rather than absolute time instants. For example, the time instant t2 can be defined as the time instant reached after a certain number of clock cycles counted from the time instant t1. Practically, circuit designers can estimate an appropriate interval between the time instants t1 and t2 based on the computation speed of the artificial neurons N1˜N4 and the signal latency between the blocks. Alternatively, the time instant t2 can be set as the time instant when the computation relative to the hidden layer 120 ends; in other words, the time instant t2 can be triggered by the end of the computation relative to the hidden layer 120.
As can be seen from the above descriptions, no matter whether the input signals of the artificial neurons N1˜N4 are provided from the storage device 152, and no matter whether the computation results of the artificial neurons N1˜N4 are stored into the storage device 152, as long as the controller 154 changes the parameters provided to the artificial neurons N1˜N4, the artificial neurons N1˜N4 can be reconfigured to work for a different layer. Although the artificial neural networks 100, 200, and 300 do not physically have multiple fixed layers, these networks can be configured to complete the computation tasks of multiple layers sequentially. Compared with the prior art, artificial neural networks according to the invention can utilize fewer artificial neurons while generating the same computation results. Thereby, the hardware cost is significantly reduced.
It should be noted that the detailed computation in the artificial neurons and how their parameters are adjusted in the learning process are known by those ordinarily skilled in the art and not further described hereinafter. The scope of the invention is not limited to these computation details.
In practical applications, the interconnections between the artificial neurons N1˜N4 and the storage device 152 can be implemented by a high-speed communication interface, so as to reduce the time for retrieving and storing data. The overall operation speed of the artificial neural networks 100, 200, and 300 can accordingly be increased. For example, the high-speed communication interface can be, but is not limited to, a serializer-deserializer (SERDES) interface or a radio frequency interface (RFI).
As shown above, the reconfigurable artificial neurons may form a complete fully-connected layer. Alternatively, the reconfigurable artificial neurons may also form one part of a fully-connected layer and the other part of the fully-connected layer is formed by artificial neurons with fixed configuration. Similarly, the reconfigurable artificial neurons may form one part or all of a convolutional layer.
Moreover, the scope of the invention is not limited to the number of reconfigurable artificial neurons, either. Although the examples above include four reconfigurable artificial neurons (N1˜N4), artificial neural networks according to the invention can include any number of reconfigurable artificial neurons.
Furthermore, besides the reconfigurable artificial neurons, the storage device, and the controller, artificial neural networks according to the invention can include other circuits, such as but not limited to a pooling layer connected after a convolutional layer and an oscillator for generating clock signals. Those ordinarily skilled in the art can comprehend that the scope of the invention is not limited to a specific network structure. An artificial neural network according to the invention can be used to implement, but is not limited to, the following network structures: the LeNet proposed by Yann LeCun, the AlexNet proposed by Alex Krizhevsky et al., the ZF Net proposed by Matthew Zeiler et al., the GoogLeNet proposed by Szegedy et al., the VGGNet proposed by Karen Simonyan et al., and the ResNet proposed by Kaiming He et al.
Practically, the controller 154 can be implemented by a variety of processing platforms. Fixed and/or programmable logic, such as field-programmable logic, application-specific integrated circuits, microcontrollers, microprocessors and digital signal processors, may be included in the controller 154. Embodiments of the controller 154 may also be fabricated to execute a process stored in a memory (not illustrated) as executable processor instructions.
Moreover, the controller 154 can be designed to control the storage device 152 and other configurable routing circuits (if needed) according to a configuration file predetermined by circuit designers. The content of the configuration file indicates at which time instant the controller 154 should reconfigure the artificial neurons N1˜N4 for which layer.
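For illustration purposes only, one hypothetical representation of such a configuration file is sketched below; the field names, the cycle counts, and the storage region 152C assumed for the output layer's parameters are not prescribed by the invention:

```python
# A hypothetical configuration schedule: each entry tells the controller
# when (in clock cycles counted from t1) to reconfigure the neuron pool,
# which storage region to load parameters from, and where the input
# signals come from (external pins or previously stored results).
schedule = [
    {"at_cycle": 0,   "layer": "hidden_120", "region": "152A", "inputs": "external"},
    {"at_cycle": 200, "layer": "hidden_130", "region": "152B", "inputs": "stored"},
    {"at_cycle": 320, "layer": "output_140", "region": "152C", "inputs": "stored"},
]
```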
In an alternative embodiment, the controller 154 is designed to operate according to a configuration file adaptively determined based on the property of the external data. In this embodiment, an input analyzer 156 analyzes the external data and determines the configuration file accordingly.
In the prior art, the hardware structure of an artificial neural network is usually predetermined based on the property of the data to be processed. More specifically, an artificial neural network in the prior art generally has a fixed hardware structure and fixed circuits. Different from the prior art, the artificial neural network 400 provides a more flexible solution in which one or more of the following factors can be adjusted: the network structure, the number of layers, the number of artificial neurons in each layer, and the connections between artificial neurons. Therefore, the artificial neural network 400 is not bound to a specific application. For external data with simple properties, the input analyzer 156 can determine a configuration with fewer layers for the artificial neural network 400, so as to save computation resources and prevent overfitting. Conversely, for external data with complicated properties, the input analyzer 156 can determine a configuration with more layers for the artificial neural network 400, so that the judgments better match the external data.
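For illustration purposes only, a hypothetical sketch of such an adaptive decision is given below. The invention does not prescribe how the input analyzer 156 measures the property of the external data; the variance-based measure, the threshold, and the layer counts here are merely illustrative stand-ins:

```python
import numpy as np

def choose_num_layers(data, shallow=2, deep=5, threshold=1.0):
    # Hypothetical stand-in for the input analyzer 156: data whose
    # properties look simple yield a configuration with fewer layers
    # (saving computation resources and reducing overfitting risk),
    # while complicated data yield a configuration with more layers.
    complexity = np.var(data)  # illustrative measure only
    return shallow if complexity < threshold else deep

num_layers = choose_num_layers(np.array([0.10, 0.12, 0.11]))  # -> shallow
```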
Another embodiment according to the invention is a controlling method for a multi-layer artificial neural network. The multi-layer artificial neural network includes a plurality of artificial neurons for performing computation based on plural parameters. As described above, the controlling method includes: at a first time instant, providing a set of parameters corresponding to a first layer to the plurality of artificial neurons; and at a second time instant, providing a set of parameters corresponding to a second layer to the plurality of artificial neurons.
Another embodiment according to the invention is a non-transitory computer-readable storage medium encoded with a computer program for controlling a multi-layer artificial neural network. The multi-layer artificial neural network includes a plurality of artificial neurons for performing computation based on plural parameters. The computer program includes instructions that when executed by one or more computers cause the one or more computers to perform operations including: at a first time instant, providing a set of parameters corresponding to a first layer to the plurality of artificial neurons so that the plurality of artificial neurons form at least part of the first layer; and at a second time instant, providing a set of parameters corresponding to a second layer to the plurality of artificial neurons so that the plurality of artificial neurons form at least part of the second layer.
Practically, the aforementioned computer-readable storage medium may be any non-transitory medium on which the instructions may be encoded and then subsequently retrieved, decoded, and executed by a processor, including electrical, magnetic, and optical storage devices. Examples of non-transitory computer-readable recording media include, but are not limited to, read-only memory (ROM), random-access memory (RAM), and other electrical storage; CD-ROM, DVD, and other optical storage; and magnetic tape, floppy disks, hard disks, and other magnetic storage. The processor instructions may be derived from algorithmic constructions in various programming languages that realize the present general inventive concept as exemplified by the embodiments described above. The variety of variations relative to the artificial neural network 100 can also be applied to the non-transitory computer-readable storage medium, and the details are not described again.
With the examples and explanations above, the features and spirit of the invention are hopefully well described. Additionally, mathematical expressions are contained herein, and the principles conveyed thereby are to be taken as being thoroughly described therewith. It is to be understood that where mathematics is used, it is for succinct description of the underlying principles being explained and, unless otherwise expressed, no other purpose is implied or should be inferred. It will be clear from this disclosure overall how the mathematics herein pertains to the present invention and, where embodiment of the principles underlying the mathematical expressions is intended, the ordinarily skilled artisan will recognize numerous techniques to carry out physical manifestations of the principles being mathematically expressed.
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.