The present invention relates to machine learning technology, and in particular to a searching framework system that is configurable for different hardware constraints and searches for an optimized neural network model.
The convolutional neural network (CNN) is recognized as one of the most remarkable neural networks, achieving significant success in machine learning tasks such as image recognition, image classification, speech recognition, natural language processing, and video classification. Because of the large data sets, intensive computational power, and high memory demands involved, CNN architectures have become increasingly complicated, making it difficult to achieve good performance on a resource-limited embedded system with low memory storage and low computing capability, such as a mobile phone or video monitor, and often making such systems unable to implement the CNN architecture at all.
More specifically, hardware configurations differ from device to device, and different hardware has different capabilities for supporting CNN architectures. To achieve the best application performance under reconfigurable hardware constraints, it is critical to search for the CNN architecture that best fits those constraints.
An embodiment discloses a method for operating a searching framework system. The searching framework system comprises arithmetic operating hardware. The method comprises inputting input data and reconfiguration parameters to an automatic architecture searching framework of the arithmetic operating hardware. The automatic architecture searching framework executes arithmetic operations to search for an optimized convolutional neural network (CNN) model and outputs the optimized CNN model.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
The present invention provides an automatic architecture searching framework (AUTO-ARS) that outputs an optimized convolutional neural network (CNN) model under reconfigurable hardware constraints.
The reconfiguration parameters 104 comprise hardware configuration parameters, such as the memory size and computing capability of the arithmetic operating hardware 108. The input data 102 can be multimedia data, such as images and/or voice. The automatic architecture searching framework 106 executes arithmetic operations to search for the optimized CNN model. The optimized CNN model 110 includes application task lists such as classification, object detection, and segmentation.
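For clarity, the following is a minimal sketch of the framework's inputs and outputs; the class, field, and function names are hypothetical illustrations, not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class ReconfigurationParameters:        # reconfiguration parameters 104
    memory_size_mb: int                 # memory size of hardware 108 (hypothetical unit)
    compute_capability_gflops: float    # computing capability of hardware 108

def auto_ars(input_data, params: ReconfigurationParameters):
    """Automatic architecture searching framework 106: searches for the
    optimized CNN model 110 under the given hardware constraints."""
    raise NotImplementedError  # search procedure sketched in later sections

params = ReconfigurationParameters(memory_size_mb=64,
                                   compute_capability_gflops=2.0)
```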
The 2nd level of the recurrent neural network will then execute. The 1st embedded data 308 is inputted to a decoder 310 to generate 1st decoded data 311. Then, the 1st decoded data 311 and the 1st hidden layer data 304 are inputted to a 2nd hidden layer 313 to perform a hidden layer operation for generating 2nd hidden layer data 314. Further, the 2nd hidden layer data 314 is inputted to a 2nd fully connected layer 315 to perform a fully connected operation for generating 2nd fully connected data 316. The 2nd fully connected data 316 is then inputted to a 2nd embedding vector 317 to execute an embedding procedure for generating 2nd embedded data 318.
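The following is a minimal numerical sketch of one such controller level, assuming randomly initialized weights and illustrative dimensions in place of a trained network; the item numbers in the comments map the steps back to the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED, HIDDEN, TOKENS = 32, 64, 16        # illustrative dimensions only

# Randomly initialized weights stand in for the trained controller.
W_dec = rng.normal(size=(EMBED, HIDDEN)) * 0.1       # decoder 310
W_hid = rng.normal(size=(2 * HIDDEN, HIDDEN)) * 0.1  # 2nd hidden layer 313
b_hid = np.zeros(HIDDEN)
W_fc = rng.normal(size=(HIDDEN, TOKENS)) * 0.1       # 2nd fully connected layer 315
b_fc = np.zeros(TOKENS)
W_emb = rng.normal(size=(TOKENS, EMBED)) * 0.1       # 2nd embedding vector 317

def controller_level(prev_embedded, prev_hidden):
    """One level of the recurrent controller (items 310-318)."""
    decoded = prev_embedded @ W_dec                       # 1st decoded data 311
    hidden = np.tanh(np.concatenate([decoded, prev_hidden]) @ W_hid + b_hid)  # 314
    fc = hidden @ W_fc + b_fc                             # 2nd fully connected data 316
    token_probs = np.exp(fc) / np.exp(fc).sum()           # softmax over layer tokens
    embedded = token_probs @ W_emb                        # 2nd embedded data 318
    return embedded, hidden, token_probs

prev_embedded = rng.normal(size=EMBED) * 0.1  # stands in for 1st embedded data 308
prev_hidden = rng.normal(size=HIDDEN) * 0.1   # stands in for 1st hidden layer data 304
embedded, hidden, probs = controller_level(prev_embedded, prev_hidden)
```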
Following the above steps, the 3rd level of the recurrent neural network then executes. This process continues to the next level of the recurrent neural network until the number of layers of the CNN data exceeds a predetermined number, at which point the updated CNN data is outputted to the reinforcement rewarding neural network 210. In some embodiments, if the validation accuracy reaches a predetermined value before the number of layers of the CNN data exceeds the predetermined number, the updated CNN data is outputted as the optimized CNN model. In other embodiments, even if the validation accuracy reaches the predetermined value before the number of layers of the CNN data exceeds the predetermined number, the CNN data keeps updating until all levels of the recurrent neural network have updated the CNN data, and the reinforcement rewarding neural network 210 then outputs the latest updated CNN data as the optimized CNN model.
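As a rough illustration of the two stopping embodiments, consider the following sketch; the controller and reward-network interfaces (initial_state, step, validate) are assumptions introduced here for readability, not names from the disclosure.

```python
def generate_cnn_data(controller, reward_net, max_layers, target_accuracy,
                      early_stop=True):
    """Run the controller level by level until a stopping condition is met."""
    cnn_data = []                                  # per-layer tokens so far
    embedded, hidden = controller.initial_state()
    while len(cnn_data) <= max_layers:             # until layer count exceeds limit
        embedded, hidden, token = controller.step(embedded, hidden)
        cnn_data.append(token)                     # next level updates the CNN data
        if early_stop and reward_net.validate(cnn_data) >= target_accuracy:
            return cnn_data                        # first embodiment: stop early
    return cnn_data                                # second embodiment: run all levels
```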
The CNN data comprises convolutional layers, activation layers, and pooling layers. The convolutional layers comprise the number of filters, the kernel size, and bias parameters. The activation layers comprise leaky ReLU, ReLU, PReLU, sigmoid, and softmax functions. The pooling layers comprise the stride and kernel size.
The searching framework system 100 is configurable for different hardware constraints. The searching framework system 100 combines a convolutional neural network, the architecture generator 200, and the reinforcement rewarding neural network 210 to search for the optimized CNN model 110. The architecture generator 200 predicts the components of the neural network, such as convolutional layers with the number of filters, kernel size, and bias parameters, and activation layers with different activation functions. The architecture generator 200 generates hyper-parameters as a sequence of tokens. More specifically, the convolutional layers have their own tokens, such as the number of filters, kernel size, and bias parameters. The activation layers have their own activation functions, such as leaky ReLU, ReLU, PReLU, sigmoid, and softmax. The pooling layers have their own tokens, such as the stride and kernel size. All of these tokens for the different types of layers are in the reconfigurable hardware configuration pool, as sketched below.
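A minimal sketch of such a token pool follows; the concrete candidate values are illustrative assumptions and would in practice be pruned to fit the reconfiguration parameters 104.

```python
# Hypothetical reconfigurable hardware configuration pool of layer tokens.
SEARCH_SPACE = {
    "conv": {
        "num_filters": [16, 32, 64, 128],
        "kernel_size": [1, 3, 5, 7],
        "use_bias": [True, False],
    },
    "activation": ["leaky_relu", "relu", "prelu", "sigmoid", "softmax"],
    "pool": {
        "stride": [1, 2],
        "kernel_size": [2, 3],
    },
}
```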
The process of updating the CNN data stops when the number of layers in the CNN data exceeds the predetermined number. Once the architecture generator 200 finishes updating the CNN data, a feed-forward neural network satisfying the reconfigurable hardware configurations' constraints is built and can be passed to the reinforcement rewarding neural network 210 for training. The reinforcement rewarding neural network 210 takes the CNN data and trains it until it converges. The validation accuracy of the proposed neural network is defined as the optimized result. Using the validation accuracy as a design metric, the architecture generator 200 updates its parameters by a policy gradient method to regenerate better CNN data over time. By updating the hidden layers, the optimized CNN model can be constructed. Applying the proposed techniques, the optimized CNN model is built within a customized model size and with acceptable computational complexity under the reconfigurable hardware configurations' constraints.
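As a self-contained illustration of a policy-gradient (REINFORCE) update of this kind, the following sketch treats each level's token choice as a softmax policy and substitutes a placeholder for the actual train-to-convergence-and-validate step; all names and values here are assumptions for illustration, not the disclosed implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_TOKENS, NUM_LAYERS = 16, 6               # illustrative sizes only

theta = np.zeros((NUM_LAYERS, NUM_TOKENS))   # per-level token logits

def sample_architecture():
    """Sample one CNN architecture and the log-probability gradient."""
    tokens, grads = [], np.zeros_like(theta)
    for level in range(NUM_LAYERS):
        probs = np.exp(theta[level]) / np.exp(theta[level]).sum()
        token = rng.choice(NUM_TOKENS, p=probs)
        tokens.append(token)
        grads[level] = np.eye(NUM_TOKENS)[token] - probs  # d log pi / d theta
    return tokens, grads

def validation_accuracy(tokens):
    """Placeholder for training the sampled CNN to convergence and validating it."""
    return rng.uniform(0.5, 1.0)

baseline, lr = 0.0, 0.1
for step in range(100):
    tokens, grads = sample_architecture()
    reward = validation_accuracy(tokens)
    baseline = 0.9 * baseline + 0.1 * reward     # moving-average baseline
    theta += lr * (reward - baseline) * grads    # REINFORCE update
```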
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.
This application claims the benefit of U.S. Provisional Application No. 62/728,076, filed Sep. 7, 2018, which is incorporated herein by reference.