The present invention generally relates to machine learning, and more particularly to a convolutional neural network (CNN) regularization system or architecture for object recognition.
A convolutional neural network (CNN) is a type of deep neural network that uses convolutional layers to filter inputs for useful information. The filters in the convolutional layers may be modified based on learned parameters to extract the most useful information for a specific task. CNNs are commonly adaptable to classification, detection and recognition tasks such as image classification, medical image analysis and image/video recognition. CNN inference, however, requires a significant amount of memory and computation. Generally speaking, the higher the accuracy of a CNN model, the more complex its architecture (i.e., the more memory and computation it needs) and the higher its power consumption.
As low-power end devices such as always-on sensors (AOSs) grow in number, demand for low-complexity CNNs is increasing. However, a low-complexity CNN cannot attain performance as high as a high-complexity CNN under a limited power budget. AOSs, running a low-complexity CNN on power-efficient co-processors, continuously detect simple objects until main processors running a high-complexity CNN are activated. Accordingly, two CNN models (i.e., a low-complexity model and a high-complexity model) need to be stored in the system, which, however, requires more static random-access memory (SRAM) devices that are expensive.
In view of the foregoing, it is an object of the embodiments of the present invention to provide a convolutional neural network (CNN) regularization system that can support multiple modes to substantially reduce power consumption.
According to one embodiment, a multi-stage training method adaptable to an artificial neural network regularization system, which includes a first inference block and a second inference block disposed in at least one hidden layer of an artificial neural network, is proposed. The whole of the artificial neural network is first trained to generate a pre-trained model. Weights of first filters of the first inference block are then fine-tuned while weights of second filters of the second inference block are set to zero, thereby generating a first model. Finally, weights of the second filters of the second inference block are fine-tuned while the weights of the first filters of the first inference block of the first model are fixed, thereby generating a second model.
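Purely by way of illustration, and not as part of the claimed method, the three stages may be sketched in PyTorch-style Python as follows. The names model, data, first_params (weights of the first filters) and second_params (weights of the second filters), as well as the optimizer and loss choices, are assumptions of the sketch:

    import torch
    import torch.nn as nn

    def zero_out(params):
        # Set the given filter weights to zero, i.e., turn the block off.
        with torch.no_grad():
            for p in params:
                p.zero_()

    def set_trainable(params, trainable):
        # Fix (freeze) or release the given filter weights for fine-tuning.
        for p in params:
            p.requires_grad_(trainable)

    def fine_tune(model, data, lr=1e-3, epochs=1):
        # Generic training loop; the loss and data pipeline are placeholders.
        opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in data:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

    # Stage 1: train the whole network, yielding the pre-trained model.
    set_trainable(first_params, True)
    set_trainable(second_params, True)
    fine_tune(model, data)

    # Stage 2: fine-tune the first filters with the second filters set to
    # zero, yielding the first (low-power) model.
    zero_out(second_params)
    set_trainable(second_params, False)
    fine_tune(model, data)

    # Stage 3: fix the first filters and fine-tune the second filters,
    # yielding the second (high-precision) model.
    set_trainable(first_params, False)
    set_trainable(second_params, True)
    fine_tune(model, data)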
Although a CNN is exemplified in the embodiment, it is appreciated that the embodiment may be generalized to any artificial neural network, that is, an interconnected group of nodes similar to the vast network of neurons in a brain. According to one aspect of the embodiment, the CNN regularization system 100 may support multiple (operating) modes, any one of which may be selectably operated. Specifically, the CNN regularization system 100 of the embodiment may be operable at either a high-precision mode or a low-power mode. The CNN regularization system 100 at the low-power mode consumes less power, but obtains lower precision, than at the high-precision mode.
In the embodiment, as shown in
The CNN regularization system 100 of the embodiment may include a matching unit 14 (e.g., a face matching unit) coupled to receive an object feature map (e.g., a face feature map, face feature or face vector) of the output layer 13, and configured to perform (object) matching in conjunction with a database to determine, for example, whether a specific object (such as a face) has been recognized as a recognition result. Conventional techniques of face matching may be adopted, details of which are thus omitted for brevity.
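As one non-limiting illustration of such conventional matching, the matching unit 14 may compare the face vector of the output layer 13 against enrolled face vectors by cosine similarity. The function name, database layout and threshold value below are assumptions of the sketch, not part of the disclosure:

    import torch
    import torch.nn.functional as F

    def match_face(query_vec, enrolled_vecs, identities, threshold=0.6):
        # query_vec: (D,) face vector from the output layer 13.
        # enrolled_vecs: (N, D) face vectors stored in the database.
        # Returns the best-matching identity, or None if no enrolled face
        # is similar enough (the threshold is an illustrative value).
        sims = F.cosine_similarity(query_vec.unsqueeze(0), enrolled_vecs, dim=1)
        best = int(torch.argmax(sims))
        return identities[best] if float(sims[best]) >= threshold else None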
In the first stage (step 21), the whole of the CNN regularization system 100 may be trained as in a general training flow, thereby generating a pre-trained model. That is, the nodes (or filters) of both the first inference blocks 101 and the second inference blocks 102 are trained in the first stage.
In the second stage (step 22), weights of the first nodes of the first inference blocks 101 of the pre-trained model may be fine-tuned while weights of the second nodes of the second inference blocks 102 are set to zero (or turned off), thereby generating a low-power (first) model. As exemplified in
In the third stage (step 23), weights of the second nodes of the second inference blocks 102 may be fine-tuned while weights of the first nodes of the first inference blocks 101 of the low-power model are fixed (as at the end of step 22), thereby generating a high-precision (second) model. As exemplified in
Specifically, in the embodiment, each second inference block 102 may receive the outputs of the second inference block 102 of the preceding layer and the outputs of the first inference block 101 of the preceding layer, while each first inference block 101 may receive only the outputs of the first inference block 101 of the preceding layer. In another embodiment, as shown in
The CNN regularization system 100 as trained according to the multi-stage training method 200 may be utilized, for example, to perform face recognition. The trained CNN regularization system 100 may be operable at the low-power mode, in which the second inference blocks 102 may be turned off to reduce power consumption. The trained CNN regularization system 100 may alternatively be operable at the high-precision mode, in which the whole of the CNN regularization system 100 operates to achieve high precision.
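The asymmetric wiring of the inference blocks and the two operating modes may be sketched together as follows. This is a minimal illustration under assumptions: the module name DualBlockLayer, the channel counts, the ReLU activations and the way the two paths are combined before the output layer are all illustrative; the low-power mode is realized simply by never evaluating the second-block path:

    import torch
    import torch.nn as nn

    class DualBlockLayer(nn.Module):
        # One hidden layer holding a first inference block and a second
        # inference block (channel counts are illustrative).
        def __init__(self, c1_in, c1_out, c2_in, c2_out):
            super().__init__()
            # First inference block: fed only by the preceding first block.
            self.block1 = nn.Conv2d(c1_in, c1_out, kernel_size=3, padding=1)
            # Second inference block: fed by both preceding blocks.
            self.block2 = nn.Conv2d(c1_in + c2_in, c2_out, kernel_size=3, padding=1)

        def forward(self, x1, x2=None):
            y1 = torch.relu(self.block1(x1))  # ignores the second-block path
            if x2 is None:                    # low-power mode: second block off
                return y1, None
            y2 = torch.relu(self.block2(torch.cat([x1, x2], dim=1)))
            return y1, y2

    # Low-power mode runs only the first-block path; high-precision mode runs both.
    layers = nn.ModuleList([DualBlockLayer(8, 8, 8, 8), DualBlockLayer(8, 8, 8, 8)])

    def run(x, mode="high_precision"):
        x1, x2 = x, (x if mode == "high_precision" else None)
        for layer in layers:
            x1, x2 = layer(x1, x2)
        # Combining the two paths before the output layer is an assumption.
        return x1 if x2 is None else torch.cat([x1, x2], dim=1)

    # Example: feat = run(torch.randn(1, 8, 32, 32), mode="low_power")

Turning the second inference blocks off in this way removes their computation entirely, consistent with the reduced power consumption described above.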
According to the embodiment disclosed above, as only a single system or model is required, instead of the two systems or models of the prior art, the number of static random-access memory (SRAM) devices implementing a convolutional neural network may be substantially decreased. Accordingly, always-on sensors (AOSs) controlled by co-processors may continuously detect simple objects at the low-power mode, until main processors are activated at the high-precision mode.
The CNN regularization system 100 as exemplified above includes first inference blocks 101 and second inference blocks 102 in the hidden layers. In another embodiment, a CNN regularization system 400 may further include third inference blocks 103 (composed of third nodes) disposed in the hidden layers, thereby supporting three operating modes.
In the first stage of training the CNN regularization system 400, the whole of the CNN regularization system 400 may be trained as in a general training flow, thereby generating a pre-trained model. In the second stage, weights of the first nodes of the first inference blocks 101 of the pre-trained model may be fine-tuned while weights of the second nodes of the second inference blocks 102 and of the third nodes of the third inference blocks 103 are set to zero (or turned off), thereby generating a first low-power model. In the third stage, weights of the second nodes of the second inference blocks 102 may be fine-tuned and weights of the third nodes of the third inference blocks 103 may be set to zero, while weights of the first nodes of the first inference blocks 101 of the first low-power model are fixed, thereby generating a second low-power model. In the fourth (final) stage, weights of the third nodes of the third inference blocks 103 may be fine-tuned while weights of the first nodes of the first inference blocks 101 and of the second nodes of the second inference blocks 102 of the second low-power model are fixed, thereby generating a high-precision (third) model.
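In sketch form, and reusing the illustrative zero_out, set_trainable and fine_tune helpers defined in the earlier sketch, the four stages generalize to a single loop over the ordered inference blocks: at each stage the earlier blocks are fixed, the current block is fine-tuned, and the later blocks are set to zero. The list representation block_params is likewise an assumption:

    def multi_stage_train(model, block_params, data):
        # block_params: list of per-block parameter lists, ordered
        # [first, second, third, ...] (an assumed representation).
        # Stage 1: train the whole network to get the pre-trained model.
        for params in block_params:
            set_trainable(params, True)
        fine_tune(model, data)
        # Stages 2..K+1: one stage per inference block.
        for k, params in enumerate(block_params):
            for fixed in block_params[:k]:
                set_trainable(fixed, False)   # fix already fine-tuned blocks
            set_trainable(params, True)       # fine-tune the current block
            for off in block_params[k + 1:]:
                zero_out(off)                 # later blocks set to zero
                set_trainable(off, False)
            fine_tune(model, data)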
The trained CNN regularization system 400 may be operable at a first low-power mode, in which the second inference blocks 102 and the third inference blocks 103 may be turned off to reduce power consumption. The trained CNN regularization system 400 may be operable at a second low-power mode, in which only the third inference blocks 103 may be turned off. The trained CNN regularization system 400 may further be operable at a high-precision mode, in which the whole of the CNN regularization system 400 operates to achieve high precision.
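The three operating points then differ only in which inference blocks are evaluated, as the following illustrative mapping (labels assumed) summarizes:

    # Active inference blocks per mode (names are illustrative):
    ACTIVE_BLOCKS = {
        "low_power_1":    ("first",),                    # blocks 102 and 103 off
        "low_power_2":    ("first", "second"),           # blocks 103 off
        "high_precision": ("first", "second", "third"),  # whole system runs
    }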
Although specific embodiments have been illustrated and described, it will be appreciated by those skilled in the art that various modifications may be made without departing from the scope of the present invention, which is intended to be limited solely by the appended claims.