This invention relates generally to a method for training a neural network, and more specifically to an active learning method for training artificial neural networks.
Artificial neural networks (NNs) are revolutionizing the field of computer vision. The top-ranking algorithms in various visual object recognition challenges, including ImageNet, Microsoft COCO, and Pascal VOC, are all based on NNs.
In visual object recognition using NNs, large-scale image datasets are used to train the NNs to obtain good performance. However, annotating large-scale image datasets is an expensive and tedious task, requiring people to spend a large number of hours analyzing the image content of a dataset, because the subset of important images in the unlabeled dataset is selected and labeled by human annotators.
Accordingly, there is a need to achieve better performance with fewer annotation processes and, hence, smaller annotation budgets.
Some embodiments of the invention are based on the recognition that active learning using an uncertainty measure of features of input signals, combined with a reconstruction of the signals from those features, requires fewer annotation processes while improving the accuracy of signal classification.
Accordingly, one embodiment discloses a method for training a neural network using a processor in communication with a memory, and the method includes determining features of a signal using the neural network; determining an uncertainty measure of the features for classifying the signal; reconstructing the signal from the features using a decoder neural network to produce a reconstructed signal; comparing the reconstructed signal with the signal to produce a reconstruction error; combining the uncertainty measure with the reconstruction error to produce a rank of the signal indicating a necessity of manual labeling; labeling the signal according to the rank to produce a labeled signal; and training the neural network and the decoder neural network using the labeled signal.
Another embodiment discloses an active learning system that includes a human machine interface; a storage device including neural networks; a memory; a network interface controller connectable with a network outside the system; an imaging interface connectable with an imaging device; and a processor configured to connect to the human machine interface, the storage device, the memory, the network interface controller, and the imaging interface, wherein the processor executes instructions for classifying a signal using the neural networks stored in the storage device, and wherein the neural networks perform steps of determining features of the signal using the neural network; determining an uncertainty measure of the features for classifying the signal; reconstructing the signal from the features using a decoder neural network to produce a reconstructed signal; comparing the reconstructed signal with the signal to produce a reconstruction error; combining the uncertainty measure with the reconstruction error to produce a rank of the signal indicating a necessity of manual labeling; labeling the signal according to the rank to produce a labeled signal; and training the neural network and the decoder neural network using the labeled signal.
Accordingly, one embodiment discloses a non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations. The operations include determining features of a signal using the neural network; determining an uncertainty measure of the features for classifying the signal; reconstructing the signal from the features using a decoder neural network to produce a reconstructed signal; comparing the reconstructed signal with the signal to produce a reconstruction error; combining the uncertainty measure with the reconstruction error to produce a rank of the signal indicating a necessity of manual labeling; labeling the signal according to the rank to produce a labeled signal; and training the neural network and the decoder neural network using the labeled signal.
In some embodiments, the use of an artificial neural network that determines an uncertainty measure may reduce central processing unit (CPU) usage, power consumption, and/or network bandwidth usage, which is advantageous for improving the functioning of a computer.
In some embodiments according to the invention, an active learning system includes a human machine interface, a storage device including neural networks, a memory, and a network interface controller connectable with a network outside the system. The active learning system further includes an imaging interface connectable with an imaging device and a processor configured to connect to the human machine interface, the storage device, the memory, the network interface controller, and the imaging interface, wherein the processor executes instructions for classifying an object in an image using the neural networks stored in the storage device, in which the neural networks perform steps of determining features of a signal using the neural network, determining an uncertainty measure of the features for classifying the signal, reconstructing the signal from the features using a decoder neural network to produce a reconstructed signal, comparing the reconstructed signal with the signal to produce a reconstruction error, combining the uncertainty measure with the reconstruction error to produce a rank of the signal indicating a necessity of manual labeling, labeling the signal according to the rank to produce a labeled signal, and training the neural network and the decoder neural network using the labeled signal.
Further, in some embodiments of the invention, a method for training a neural network uses a processor in communication with a memory, and the method includes steps of determining features of a signal using the neural network, determining an uncertainty measure of the features for classifying the signal, reconstructing the signal from the features using a decoder neural network to produce a reconstructed signal, comparing the reconstructed signal with the signal to produce a reconstruction error, combining the uncertainty measure with the reconstruction error to produce a rank of the signal indicating a necessity of manual labeling, labeling the signal according to the rank to produce a labeled signal, and training the neural network and the decoder neural network using the labeled signal. In some cases, the labeling can include labeling the signal using the neural network if the rank does not indicate the necessity of the manual labeling process, and further the labeling can include transmitting a labeling request to an annotation device if the rank indicates the necessity of the manual labeling process.
Further, the determining of features may be performed using an encoder neural network. In this case, the encoder neural network can perform feature analysis of given signals. In some cases, the signal may be an electroencephalogram (EEG) or an electrocardiogram (ECG); the neural network can use biological signals instead of image signals. Accordingly, some embodiments of the invention can be applied to provide specific signals for assisting medical doctors in diagnosis.
The active learning system 10 attempts to efficiently query the unlabeled images for the annotation through a process flow shown in the figure. The process flow includes the following stages:
S1—An initial labeled training dataset is provided and the neural network is trained by using the dataset.
S2—By using the trained NN obtained in step S1, each image in the unlabeled dataset is evaluated and a score is assigned to each image.
S3—Given the scores obtained in step S2, the images with the top K highest scores are selected for labeling by the annotation device.
S4—The selected images with newly annotated labels are added into the current (latest) labeled training set to get a new training dataset.
S5—The network is refined or retrained based on the new training dataset.
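The stages S1 through S5 above can be sketched as a generic query loop. The sketch below is illustrative only: the `train`, `score`, and `annotate` callables are hypothetical placeholders standing in for the neural-network training step, the per-image scoring step, and the annotation device, none of which are specified at this level of the process flow.

```python
def active_learning_loop(labeled, unlabeled, rounds, k, train, score, annotate):
    """Hypothetical sketch of the S1-S5 active learning workflow."""
    model = train(labeled)                                 # S1: train on the initial labeled set
    for _ in range(rounds):
        if not unlabeled:
            break
        # S2: score every unlabeled image with the current model
        scored = sorted(unlabeled, key=lambda x: score(model, x), reverse=True)
        # S3: select the top-K highest-scoring images for annotation
        selected, unlabeled = scored[:k], scored[k:]
        # S4: add the newly annotated images to the training set
        labeled += [(x, annotate(x)) for x in selected]
        # S5: refine/retrain the network on the enlarged training set
        model = train(labeled)
    return model, labeled
```

The loop terminates either after a fixed number of rounds or when the unlabeled pool is exhausted, mirroring the iterative refinement described above.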
As shown in
Although the term "image" is used in the specification, other "signals" can be used in the active learning system 10. For instance, the active learning system may process other signals, such as an electroencephalogram (EEG) or an electrocardiogram (ECG). Instead of images, the EEG or ECG signals can be used to train the active learning system 10. The trained active learning system 10 can then be applied to determine or judge abnormality with respect to an input signal, which can be a useful assistance for medical diagnosis of relevant symptoms.
In the CNN training process 21, the input signal 12 is fed to the CNN 13 and the CNN 13 extracts the features 14 from the input signal 12. Then a CNN decoder 25 reconstructs a signal 26 from the features 14 to compare with the input signal 12. By comparing the input signal 12 and the reconstructed signal 26, the CNN training process 21 computes or generates a reconstruction error 27. The active learning system 10 combines the reconstruction error 27 and the uncertainty measure 16, and ranks the input signal 12 by a score 17.
When the score 17 is higher than a predetermined threshold, the input signal 12 is fed to a labeling interface (not shown) that allows an operator to annotate the input signal 12 according to one of predetermined classified labels, which is indicated as Human labeling process 18. The process steps performed in the active learning process 11 and the CNN training process 21 described above are illustrated in
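The thresholding decision above can be illustrated with a small sketch; the function name and the string labels are hypothetical and merely stand in for routing a signal either to the human labeling interface or to automatic labeling by the trained network.

```python
def partition_by_score(scores, threshold):
    """Illustrative routing: signals whose score 17 exceeds the
    predetermined threshold go to the human labeling process 18;
    the rest can be labeled by the trained network itself."""
    human = [i for i, s in enumerate(scores) if s > threshold]
    auto = [i for i, s in enumerate(scores) if s <= threshold]
    return human, auto
```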
In some embodiments of the invention, the rank is defined based on an addition of an entropy function and the reconstruction error.
The trained NN 301 is used for extracting the features 303 for each of the images in the unlabeled dataset 103 and also for computing classifications by the softmax output layer 304. The classification result obtained by the softmax output layer 304 is a probability vector of dimension D, where D is the number of object classes. Denoting the input image by x and the classification result computed by the softmax output layer 304 by the probability vector p, each dimension of p represents the probability that the input image belongs to a specific class. The sum of the components of p is equal to one. The uncertainty of the class of the input image can then be measured in the uncertainty measure step 305 by an entropy function H(x). When the entropy H(x) is computed based on the Shannon entropy, the uncertainty of the class of the input image is given by
H(x) = −Σ_{i=1}^{D} p_i log p_i   (1)
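Equation (1) can be computed directly from the softmax probability vector p. The sketch below is a minimal implementation, using the standard convention that a zero-probability class contributes nothing to the sum.

```python
import math

def shannon_entropy(p):
    """Equation (1): H(x) = -sum_i p_i * log(p_i) over the D class
    probabilities p, skipping zero entries (0 * log 0 = 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)
```

A uniform probability vector gives the maximum entropy log(D), i.e. maximal class uncertainty, while a one-hot vector gives zero entropy.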
In an uncertainty method, the uncertainty measure can be used as the importance score of the unlabeled image 104. Further, other entropy measures defined in the Rényi entropy family can be used for the uncertainty computation. For instance, the entropy function H(x) may be the collision entropy (the Rényi entropy of order two), H₂(x) = −log Σ_{i=1}^{D} p_i². Such entropy-based methods may be used for obtaining an estimate of uncertainty, and an experimental result is shown in
Since the uncertainty method is a universal active learning method, it can be used in conjunction with various classifiers (SVMs, Gaussian processes, or neural networks) as long as a vector representing the class probabilities can be derived from each input image. However, in this case the uncertainty method does not exploit the properties of the classifier and therefore reaches sub-optimal performance.
In accordance with some embodiments, an approach to improve the uncertainty method by utilizing the properties of neural network computation is described in the following. It is established that a neural network computes a hierarchy of feature representations while processing an input image. The completeness of the feature representation can be used to judge how well the neural network models the input image. To quantify the completeness of the feature representation, an autoencoder neural network can be used.
When an input image 700 is provided, the autoencoder NN 710 outputs classification results 703 from the features 702 extracted by the encoder neural network 701. Further, the features 702 are transmitted to the decoder neural network 705. The decoder neural network 705 generates a reconstructed image 704 from the features 702 extracted by the encoder NN 701. In some cases, the encoder NN 701 may be referred to as a first sub-network #1, and the decoder neural network 705 may be referred to as a second sub-network #2. The first sub-network 701 extracts the features 702 from the input image 700. The extracted features 702 are fed into the softmax output layer 703, which outputs the classification results, and are also fed into the second sub-network #2, which generates and outputs the reconstructed image 704.
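The data flow through the autoencoder NN 710 can be sketched numerically. The sketch below is purely illustrative: the two linear layers with random, untrained weights stand in for sub-networks #1 and #2, the dimensions (16-dimensional input, 8-dimensional features, 10 classes) are arbitrary, and no claim is made about the actual architecture of the encoder or decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the encoder (sub-network #1), the softmax output
# layer, and the decoder (sub-network #2); weights are random placeholders.
W_enc = rng.standard_normal((16, 8))   # encoder: 16-dim input -> 8-dim features
W_cls = rng.standard_normal((8, 10))   # softmax head: features -> 10 class scores
W_dec = rng.standard_normal((8, 16))   # decoder: features -> reconstructed input

def softmax(z):
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()

def forward(x):
    features = np.tanh(x @ W_enc)      # sub-network #1 extracts features 702
    probs = softmax(features @ W_cls)  # classification result (sums to one)
    recon = features @ W_dec           # sub-network #2 reconstructs the input
    return features, probs, recon
```

The key structural point is that both the classification output and the reconstruction are computed from the same shared feature vector.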
In some embodiments, a reconstruction error is defined based on the Euclidean distance between an input image (or input signal) and a reconstructed image (or reconstructed signal).
Further, the reconstructed image 704 is compared to the input image 700 based on the Euclidean distance. The Euclidean distance between the input image 700 and the reconstructed image 704 can be used for quantifying the completeness of the feature representation. Letting x be the vector representation of the input image and y be the vector representation of the reconstructed image, the reconstruction error measure R(x) is defined by the Euclidean distance as follows.
R(x) = ‖x − y‖_2^2   (2)
The Euclidean distance indicates how well the input image is represented by the feature representation. When the reconstruction error R(x) is small, it indicates that the neural network models the input image well. However, when the reconstruction error R(x) is large, it indicates that the neural network does not model the input image well. In some embodiments, including such an input image in training improves the representation power (accuracy) of the autoencoder NN 710.
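Equation (2) is a squared Euclidean distance and can be computed with a one-line sketch over the two vector representations:

```python
def reconstruction_error(x, y):
    """Equation (2): R(x) = ||x - y||_2^2, the squared Euclidean
    distance between the input vector x and its reconstruction y."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))
```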
For ranking the importance of an input image, the following formula can be used,
αH(x)+βR(x) (3)
where α and β are non-negative weighting parameters.
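Combining equations (1) through (3), the ranking step reduces to a weighted sum followed by a top-K selection, as in step S3. The sketch below is illustrative; the default weight values and the helper names are assumptions, not part of the specification.

```python
def importance_score(H, R, alpha=1.0, beta=1.0):
    """Equation (3): alpha * H(x) + beta * R(x), with non-negative
    weights alpha and beta (the defaults here are illustrative)."""
    return alpha * H + beta * R

def top_k(scores, k):
    """Indices of the K images with the highest importance scores,
    i.e. those selected for annotation in step S3."""
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
```

An image that is both uncertain (high H) and poorly reconstructed (high R) receives the highest rank and is therefore the strongest candidate for manual labeling.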
When the input image 700 is provided to the active learning system 720, the encoder NN 701 generates the features 702 from the input image 700. The features 702 can be used for generating a classification result via the softmax output layer 703. The classification result is fed to the ranking layer 205. Further, the features 702 are fed to the decoder NN 705 and used to generate a reconstructed image 704. The reconstructed image 704 is fed to the ranking layer 205. At the ranking layer 205, the classification result and the reconstructed image are used to compute the importance score 104 of the input image 700 as an unlabeled image.
The importance score 104 of the unlabeled image can be calculated from the classification output 703 and the reconstructed image 704 by the ranking layer 205 in the calculation step. After obtaining the importance score 104 for the unlabeled image, the active learning system outputs the importance score 104.
The storage device 630 includes original images 631, a filter system module 632, and a neural network 400. For instance, the processor 620 loads the code of the neural network 400 in the storage 630 to the memory 640 and executes the instructions of the code for implementing the active learning. Further, the pointing device/medium 612 may include modules that read programs stored on a computer readable recording medium.
For comparison, the following convolutional neural network (CNN) was used for the experiments in the MNIST dataset: (20)5c-2p-(50)5c-2p-500fc-r-10fc, where “(20)5c” denotes a convolutional layer of 20 neurons with a kernel size 5, “2p” denotes a 2×2 pooling, “r” denotes rectified-linear units (ReLU), and “500fc” denotes a fully connected layer with 500 nodes. One softmax loss layer is added to the classification output “10fc” for the backpropagation. For the convolutional autoencoder neural network (CANN) part, the structure from the deconvolutional network is adapted. For the CIFAR10 dataset: “(32)3c-2p-r-(32)3c-r-2p-(64)3c-r-2p-200fc-10fc”. For the CANN part, the structure is the same as mentioned in MNIST settings.
In
The advantage is a reduction in the amount of annotated data. As discussed above, the artificial neural network according to some embodiments of the invention requires fewer annotation processes while improving the classification accuracy. Further, the use of an artificial neural network that determines an uncertainty measure may reduce central processing unit (CPU) usage, power consumption, and/or network bandwidth usage, which is advantageous for improving the functioning of a computer.
The above-described embodiments of the present invention can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. Such processors may be implemented as integrated circuits, with one or more processors in an integrated circuit component. Though, a processor may be implemented using circuitry in any suitable format. The processor can be connected to memory, transceiver, and input/output interfaces as known in the art.
Also, the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Alternatively, or additionally, the invention may be embodied as a computer readable medium other than a computer-readable storage medium, such as signals.
The terms “program” or “software” are used herein in a generic sense to refer to any type of computer code or set of computer-executable instructions that can be employed to program a computer or other processor to implement various aspects of the present invention as discussed above.
Use of ordinal terms such as "first" and "second" in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another, or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term).
Although several preferred embodiments have been shown and described, it would be apparent to those skilled in the art that many changes and modifications may be made thereto without departing from the scope of the invention, which is defined by the following claims and their equivalents.