Embodiments of the subject matter disclosed herein generally relate to deep learning systems and methods and, more specifically, to solving the catastrophic forgetting problem associated with deep learning systems.
Deep learning has achieved great success in various fields. However, despite its impressive achievements, there are still several problems that plague the efficiency and reliability of the deep learning systems.
One of these problems is catastrophic forgetting, which means that a well-trained deep learning model tends to completely forget all the previously learned information when learning new information. In other words, once a current deep learning model is trained to perform a specific task, it cannot be easily re-trained to perform a new, similar task without negatively impacting the original task's performance. Unlike humans and animals, deep learning models do not have the ability to continuously learn over time and from different datasets by incorporating new information while retaining the previously learned experience, an ability known as “incremental learning.”
Two theories have been proposed to explain humans' ability to perform incremental learning. The first theory is Hebbian learning with homeostatic plasticity, which suggests that the human brain's plasticity decreases as people learn more knowledge, in order to protect the previously learned information. The second theory is the complementary learning system (CLS) theory, which suggests that human beings extract high-level structural information and store this high-level information in a different brain area while retaining episodic memories.
Inspired by these two neurophysiological theories, researchers have proposed a number of methods to deal with catastrophic forgetting in deep learning. The most straightforward and pragmatic method to avoid catastrophic forgetting is to retrain a deep learning model completely from scratch with all the old data and the new data. However, this method proves to be very inefficient due to the large amount of training that is necessary each time new information becomes available. Moreover, the new model, which learns both the new information and the old information from scratch, may share very little similarity with the previous model, which results in poor learning robustness.
In addition to this straightforward method, there are three categories of methods that deal with this matter. The first category is the regularization approach, which is inspired by the plasticity theory. The core idea of such methods is to incorporate the plasticity information of the neural network model into the loss function to prevent the parameters from varying significantly when learning new information. These approaches have been shown to protect the consolidated knowledge [1]. However, due to the fixed size of the neural network, there is a trade-off between the performance on the old and the new tasks [1]. The second category uses dynamic neural network architectures. To accommodate the new knowledge, these methods dynamically allocate neural resources or retrain the model with an increasing number of neurons or layers. Intuitively, these approaches can prevent catastrophic forgetting, but they may also lead to scalability and generalization issues due to the increasing complexity of the network. The last category utilizes a dual-memory learning system, which is inspired by the CLS theory. Most of these systems either use dual weights or take advantage of pseudo-rehearsal, which draws training samples from a generative model and replays them to the model when training with new data. However, how to build an effective generative model remains a difficult problem.
Thus, there is a need for a new deep learning model that is capable of learning new information while not being affected by the catastrophic forgetting problem. Further, the system needs to be robust and practical when implemented in real life situations.
According to an embodiment, there is a method for classifying data into classes, and the method includes receiving new data, receiving support data, wherein the support data is a subset of previously classified data, processing with a first set of layers of a deep learning classifier the new data and the support data to obtain a learned representation of the new data and the support data, and applying a second set of layers of the deep learning classifier to the learned representation to associate the new data with a corresponding class.
According to another embodiment, there is a classifying apparatus for classifying data into classes, and the classifying apparatus includes an interface for receiving new data and receiving support data, wherein the support data is a subset of previously classified data, and a deep learning classifier connected to the interface and configured to, process with a first set of layers the new data and the support data to obtain a learned representation of the new data and the support data, and apply a second set of layers to the learned representation to associate the new data with a corresponding class.
According to yet another embodiment, there is a method for generating support data for a deep learning classifier, the method including receiving data, processing with a first set of layers of the deep learning classifier the received data to obtain a learned representation of the received data, and training a support vector machine block with the learned representation to generate support data. The support data is used by the deep learning classifier to prevent catastrophic forgetting when classifying data.
According to still another embodiment, there is a classifying apparatus for classifying data into classes, and the classifying apparatus includes an interface for receiving data, and a processor connected to the interface and configured to, process with a first set of layers of a deep learning classifier the received data to obtain a learned representation of the received data, and train a support vector machine block with the learned representation to generate support data. The support data is used by the deep learning classifier to prevent catastrophic forgetting when classifying data.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate one or more embodiments and, together with the description, explain these embodiments. In the drawings:
The following description of the embodiments refers to the accompanying drawings. The same reference numbers in different drawings identify the same or similar elements. The following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims.
Reference throughout the specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with an embodiment is included in at least one embodiment of the subject matter disclosed. Thus, the appearance of the phrases “in one embodiment” or “in an embodiment” in various places throughout the specification is not necessarily referring to the same embodiment. Further, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
According to an embodiment, a novel method for performing incremental deep learning in an efficient way with a deep learning model when encountering data from new classes is now discussed. The method and model maintain a support dataset for each old class, which is much smaller than the original dataset of that class, and show the support datasets to the deep learning model every time a new class arrives, so that the model can “review” the representatives of the old classes while learning the new information. Although the broad idea of rehearsal has been suggested before [2, 3, 4, 5], the present method selects the support data in a novel way, such that the selection process becomes systematic and generic and preserves as much information as possible. As discussed later, it is shown, both theoretically and empirically, that it is more efficient to select as the support data the support vectors of a support vector machine (SVM) that is used to approximate the neural network's last layer. Further, the network is divided into two parts, one part being the last layer and the other part including all the previous layers. This division is implemented to stabilize the learned representation of the old data before it is fed to the last layer and to retain the performance for the old classes, following the idea of the Hebbian learning theory. Two consolidation regularizers are used to reduce the plasticity of the deep learning model and to constrain the deep learning model to produce similar representations for the old data.
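As a rough, self-contained sketch of this overall flow only, and not the actual implementation of the embodiments, the following Python example replaces the deep mapping function with a fixed random projection and the softmax layer with a logistic regression classifier; the synthetic data, the class schedule and the SVM settings are illustrative assumptions:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)

    def make_class(label, dim=20, n=300):
        # Synthetic stand-in for the original data of one class.
        return rng.normal(loc=label, scale=1.0, size=(n, dim)), np.full(n, label)

    # Stand-in for the deep mapping function (all layers before the last one);
    # in the embodiments this is a trained SENet-style network.
    projection = rng.normal(size=(20, 8))
    def mapping_function(x):
        return np.tanh(x @ projection)

    classifier = LogisticRegression(max_iter=1000)   # stand-in for the softmax layer
    support_x = np.empty((0, 20))
    support_y = np.empty((0,), dtype=int)

    # Classes arrive incrementally, starting from a binary problem.
    for new_labels in [[0, 1], [2], [3]]:
        parts = [make_class(c) for c in new_labels]
        new_x = np.vstack([p[0] for p in parts])
        new_y = np.concatenate([p[1] for p in parts])
        # "Review" the old classes: train on the new data plus the support data.
        train_x = np.vstack([support_x, new_x])
        train_y = np.concatenate([support_y, new_y])
        feats = mapping_function(train_x)
        classifier.fit(feats, train_y)
        # Select the new support data: the data points whose learned
        # representations are support vectors of an SVM trained on them.
        svm = SVC(kernel="linear").fit(feats, train_y)
        support_x, support_y = train_x[svm.support_], train_y[svm.support_]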
Schematically, this new model 100 is illustrated in the accompanying drawings.
Following the setting of [6, 7], consider a dataset {x_n, ŷ_n}_{n=1}^N, with x_n ∈ ℝ^D being the feature and ŷ_n ∈ ℝ^K being the one-hot encoding of the label, where K is the total number of classes of information and N is the size of the dataset. The input to the last layer (i.e., the learned representation) for x_n is denoted as δ_n ∈ ℝ^T, and W is considered to be the parameter matrix of the last layer, so that z_n = Wδ_n. After applying the softmax activation function to z_n, the output o_n of the whole deep learning model (i.e., the neural network) is obtained for the input x_n. Thus, a softmax relation between z_n and o_n holds for this model.
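Written out in its standard form, which is assumed here as a reconstruction from the definitions above, this softmax relation reads, for each class index i:

\[ o_{n,i} \;=\; \frac{e^{z_{n,i}}}{\sum_{k=1}^{K} e^{z_{n,k}}}, \qquad i = 1, \ldots, K. \]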
For the deep learning model, the cross-entropy loss is used as the loss function, i.e., the loss L of equation (2).
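A standard reconstruction of equation (2) from the definitions above, with the averaging over the N training samples being an assumption, is:

\[ L \;=\; -\frac{1}{N} \sum_{n=1}^{N} \sum_{i=1}^{K} \hat{y}_{n,i} \log o_{n,i}. \tag{2} \]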
The negative gradient of the loss function L with regard to the last-layer weight w_{j,i} is given by equation (3).
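With the softmax and cross-entropy forms given above, the standard expression for this gradient, assumed here as the reconstruction of equation (3), is:

\[ -\frac{\partial L}{\partial w_{j,i}} \;=\; \frac{1}{N} \sum_{n=1}^{N} \left( \hat{y}_{n,j} - o_{n,j} \right) \delta_{n,i}. \tag{3} \]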
According to [6] and [7], after the learned representation of the deep learning model becomes stable, the last weight layer will converge to the SVM solution. This means that it is possible to write W = a(t)Ŵ + B(t), where Ŵ is the corresponding SVM solution, t represents the t-th iteration of training, a(t)→∞ as t→∞, and B(t) is bounded. Substituting this expression into equation (3) yields equation (4).
The candidate values of ỹ_{n,i} are 0 and 1. If ỹ_{n,i} = 0, that term of equation (4) does not contribute to the loss function L. Only when ỹ_{n,i} = 1 does the data point contribute to the loss L and, thus, to the gradient. Under these conditions, because a(t)→∞, only the data points with the smallest exponential numerator can contribute to the gradient. Those data points are the ones having the smallest margin Ŵ_{i,:}δ_n, i.e., the support vectors for class i. Based on these observations, it is discussed next how to select data from the old data to construct the support data.
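As a brief derivation sketch of this argument, using the reconstructed forms above and writing ỹ_{n,i} for the one-hot label, the factor multiplying δ_n (up to the 1/N factor) in the negative gradient for a data point with ỹ_{n,i} = 1 is

\[ 1 - o_{n,i} \;=\; \frac{\sum_{k \neq i} e^{\,z_{n,k} - z_{n,i}}}{1 + \sum_{k \neq i} e^{\,z_{n,k} - z_{n,i}}}, \qquad z_{n,k} - z_{n,i} \;=\; \big( a(t)\,(\hat{W}_{k,:} - \hat{W}_{i,:}) + B_k(t) - B_i(t) \big)\,\delta_n . \]

Because a(t)→∞ while B(t) stays bounded, these exponentials vanish fastest for data points with a large SVM margin (Ŵ_{i,:} − Ŵ_{k,:})δ_n, so the gradient is dominated by the points with the smallest margin, i.e., by the support vectors of class i.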
The softmax layer 262, which is the last layer of the deep learning model, uses the learned representation 218 to classify the data that is input to the apparatus 200. The consolidation regularizers block 240, as discussed later, stabilizes the deep learning network and maintains the high-level feature representation of the old information.
Returning to the process of building the support data 214, it is noted that according to [8] and [9], even human beings, who are proficient in incremental learning, cannot deal with catastrophic forgetting perfectly. On the other hand, a common strategy for human beings to overcome forgetting during learning is to review the old knowledge frequently [10]. During reviewing, humans do not usually review all the details, but rather the important ones, which are often enough to grasp the knowledge. Inspired by this real-life example, the novel method maintains a support dataset 214 for each old class, which is then fed to the mapping function block 216 together with the new data 212 of the new classes. In this way, the mapping function block 216 reviews the representative information of the old classes when learning new information.
The configuration of the support data selector 210 that constructs such support data 214 is now discussed. The support data 214 is assumed to be described by {x_n^S, ỹ_n^S}_{n=1}^{N_s}, where N_s is the number of support data points. In one embodiment, the mapping function block 216 that produces the learned representation 218 is implemented as a squeeze-and-excitation network (SENet), which combines a residual network with squeeze-and-excitation components.
The SENet is configured to utilize the spatial information with 2D filters and to further explore the information hidden in the different channels by learning weighted feature maps from the initial convolutional output. The residual network utilizes a traditional convolutional layer within a residual block 400, as shown in the accompanying drawings. However, such a traditional residual block treats the different channels of its intermediate output equally, ignoring the information hidden in those channels.
To overcome this issue, the SENet modifies the residual block with additional components that learn scale factors for the different channels of the intermediate output and rescale the values of those channels accordingly. Intuitively, the traditional residual network treats different channels equally, while the SENet takes the weighted channels into consideration. By using the SENet, which considers both the spatial information and the channel information, as the engine for the mapping function block 216, it is more likely that a well-structured high-level representation 218 (402′ in the drawings) is obtained.
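A minimal sketch of such an SE-modified residual block, written in Python with PyTorch, is shown below; the channel count, the reduction ratio and the exact layer layout are illustrative assumptions and are not asserted to be the configuration used in the embodiments:

    import torch
    import torch.nn as nn

    class SEResidualBlock(nn.Module):
        """Residual block with a squeeze-and-excitation (SE) branch (illustrative)."""

        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            # Conventional residual branch: two 3x3 convolutions with batch norm.
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)
            # Squeeze: global average pooling; excitation: two small fully
            # connected layers producing one scale factor per channel.
            self.squeeze = nn.AdaptiveAvgPool2d(1)
            self.excite = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            identity = x
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            # Learn per-channel scale factors and rescale the channels.
            b, c, _, _ = out.shape
            scale = self.excite(self.squeeze(out).view(b, c)).view(b, c, 1, 1)
            out = out * scale
            # Skip connection, as in a conventional residual block.
            return self.relu(out + identity)

    # Example: a batch of four 64-channel 32x32 feature maps.
    block = SEResidualBlock(64)
    features = block(torch.randn(4, 64, 32, 32))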
Returning to the support data selector 210, the high-level representations 218 produced by the mapping function block 216 are used to train an SVM model 220, and the support vectors 230 of the trained SVM are then identified.
The portion of the original data 211 that corresponds to these support vectors is then selected as the support data 214, which is denoted herein as {x_n^SV, ỹ_n^SV}_{n=1}^{N_SV} or, when its size is limited to a desired number N_s of points, as {x_n^S, ỹ_n^S}_{n=1}^{N_s}. If the new data 212 is denoted as {x_n^new, ỹ_n^new}_{n=1}^{N_new}, the deep learning model is then further trained on the combination of the new data 212 and the support data {x_n^S, ỹ_n^S}_{n=1}^{N_s}.
Because the support data selection depends on the high-level representation 218 produced by the deep learning layers, which are fine-tuned on the new data 212, the feature representations of the old data may change over time. As a result, the previous support vectors 232 for the old data may no longer be support vectors for the new data, which makes the support data invalid (here it is assumed that the support vectors remain the same as long as the representations are largely fixed, which is discussed in more detail later). To solve this issue, the novel method adds two consolidation regularizers to consolidate the learned knowledge: (1) the feature regularizer 242, which forces the model to produce fixed representations for the old data over time, and (2) the EWC regularizer 244, which adds to the loss function a term that consolidates the weights contributing to the classification of the old classes. Each of these two regularizers is now discussed in detail. Note that these regularizers apply only to the mapping function block 216 and not to the softmax layer 262 (i.e., only to the first set of layers and not to the second set of layers of the deep learning model).
The feature regularizer, which is added to the loss function, forces the mapping function block 216 to produce a fixed representation for the old data. The learned representation, which was denoted above as δ_n, depends on ϕ, which represents the parameters of the deep learning mapping function block 216. The feature regularizer is defined by equation (7).
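A reconstruction of equation (7), consistent with the parameters described immediately below and assuming a squared Euclidean norm over the support data, is:

\[ R_f(\phi) \;=\; \sum_{n=1}^{N_s} \left\| \delta_n(\phi^{new}) - \delta_n(\phi^{old}) \right\|_2^{2}, \tag{7} \]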
where ϕ^new denotes the parameters of the deep learning architecture trained with (1) the support data from the old classes and (2) the new data from the new class(es), ϕ^old denotes the parameters of the mapping function for the old data, and N_s is the number of support data points 214.
The feature regularizer 242 requires the model to preserve the feature representation produced by the deep learning architecture for each support data point, which could lead to a potential memory overhead. However, because the model operates on a very high-level representation 218, which has a much lower dimensionality than the original input 211, the possible overhead is negligible.
The second regularizer is the EWC regularizer 244. According to the Hebbian learning theory, after learning, the related synaptic strength and connectivity are enhanced while the degree of plasticity decreases, to protect the learned knowledge. Guided by this neurophysiological theory, the EWC regularizer [15] was designed to consolidate the old information while learning new knowledge. One goal of the EWC regularizer is to constrain those parameters (in the mapping function block 216) which contribute significantly to the classification of the old data. Specifically, the more a parameter contributes to the previous classification, the stronger the constraint that is applied to that parameter, to make it unlikely to be changed. That is, the method makes those parameters that are closely related to the previous classification less “plastic.” In order to achieve this goal, the Fisher information is calculated for each parameter. The Fisher information measures the contribution of the parameters to the final prediction.
Formally, the Fisher information for the parameters θ={ϕ, W} can be calculated as follows:
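The standard diagonal form used for EWC in [15] is assumed here as the reconstruction, with the expectation taken over the data of the old classes and with f(x; θ) understood as the probability that the model assigns to the observed class:

\[ F_i \;=\; \mathbb{E}\!\left[ \left( \frac{\partial}{\partial \theta_i} \log f(x;\theta) \right)^{2} \right], \]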
where f(x; θ) is the functional mapping performed by the entire neural network, i.e., by the mapping function block 216 followed by the final classification layer.
The EWC regularizer 244 is defined as follows:
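Assuming the standard quadratic EWC penalty of [15], in which θ_i^old denotes the value of parameter θ_i after training on the old classes (a possible constant factor is omitted), the regularizer can be written as:

\[ R_{ewc}(\theta) \;=\; \sum_{i} F_i \left( \theta_i - \theta_i^{old} \right)^{2}, \tag{9} \]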
where i iterates over all the parameters of the model.
There are two benefits of using the EWC regularizer in the present method. First, the EWC regularizer reduces the “plasticity” of the parameters that are important to the old classes and thus guarantees stable performance over the old classes. Second, by reducing the capacity of the deep learning model, the EWC regularizer prevents overfitting to a certain degree. The function of the EWC regularizer can be considered as changing the learning trajectory by pointing to a region where the loss is low for both the old and the new data. This idea is schematically illustrated in the accompanying drawings.
The two regularizers 242 and 244 are added to the loss function L of equation (2) so that the new loss function used in this method becomes:
\[ \tilde{L}(\theta) \;=\; L \;+\; \lambda_f R_f(\phi) \;+\; \lambda_{ewc} R_{ewc}(\theta), \tag{10} \]
where λ_f and λ_ewc are the coefficients for the feature regularizer and the EWC regularizer, respectively. After plugging equations (2), (7), and (9) into equation (10), the following novel loss function is obtained:
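Using the reconstructed forms of equations (2), (7), and (9) given above, this combined loss takes the following assumed form, in which the cross-entropy term runs over the current training set of support data plus new data:

\[ \tilde{L}(\theta) \;=\; -\frac{1}{N} \sum_{n=1}^{N} \sum_{i=1}^{K_t} \hat{y}_{n,i} \log o_{n,i} \;+\; \lambda_f \sum_{n=1}^{N_s} \left\| \delta_n(\phi^{new}) - \delta_n(\phi^{old}) \right\|_2^{2} \;+\; \lambda_{ewc} \sum_{i} F_i \left( \theta_i - \theta_i^{old} \right)^{2}, \]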
where K_t is the total number of classes at the incremental learning time point t (see the accompanying drawings).
Combining the deep learning model, which consists of the deep learning architecture of the mapping function block 216 and the final fully connected classification layer block 260, the novel support data selector 210, and the two consolidation regularizers 240, the present method forms a highly effective framework (called SupportNet in the following), which can perform class incremental learning without catastrophic forgetting. This framework resolves the catastrophic forgetting issue in two ways. First, the support data 214 can help the model of the mapping function block 216 to review the old information during future training. Despite the small size of the support data 214, it can preserve the distribution of the old data quite well. Second, the two consolidation regularizers 242 and 244 consolidate the high-level representation 218 of the old data and reduce the plasticity of those weights which are important for the old classes.
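As a rough illustration only, and not the actual SupportNet implementation, the following PyTorch sketch shows a single training step with the regularized loss of equation (10); the network sizes, the mini-batch, the stored old representations of the support data and the diagonal Fisher estimate are all random stand-ins, and the EWC term is applied here to the mapping parameters only, consistent with the description above:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)

    # Stand-ins for the two parts of the deep learning model: the mapping
    # function block (first set of layers) and the final classification layer.
    mapping = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
    classifier = nn.Linear(16, 5)          # 5 = total number of classes so far (K_t)

    # Stand-ins for one mini-batch (new data plus support data) and for the
    # quantities consolidated from the previous increment.
    x_batch = torch.randn(64, 32)
    y_batch = torch.randint(0, 5, (64,))
    x_support = torch.randn(16, 32)
    old_support_repr = torch.randn(16, 16)                       # delta_n(phi_old), kept fixed
    old_params = [p.detach().clone() for p in mapping.parameters()]
    fisher = [torch.rand_like(p) for p in mapping.parameters()]  # placeholder F_i

    lambda_f, lambda_ewc = 1.0, 10.0
    optimizer = torch.optim.Adam(
        list(mapping.parameters()) + list(classifier.parameters()), lr=1e-3)

    # One step with the loss of equation (10): cross-entropy plus the feature
    # regularizer and the EWC regularizer.
    optimizer.zero_grad()
    logits = classifier(mapping(x_batch))
    loss = F.cross_entropy(logits, y_batch)
    feature_reg = ((mapping(x_support) - old_support_repr) ** 2).sum()
    ewc_reg = sum((f * (p - p_old) ** 2).sum()
                  for f, p, p_old in zip(fisher, mapping.parameters(), old_params))
    loss = loss + lambda_f * feature_reg + lambda_ewc * ewc_reg
    loss.backward()
    optimizer.step()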
The novel method discussed above for avoiding catastrophic forgetting in class incremental learning, when implemented in a computing device, is now discussed with regard to the flowchart shown in the accompanying drawings. In step 602, the original data 211 is processed with the mapping function block 216 of the deep learning model.
The result of the processing in step 602 with the mapping function block 216 is the set of high-level representations 218 shown in the drawings.
In step 604, the SVM model 220 is applied to the high-level representations 218 to generate the support vectors 230. In step 606, only the support vectors 232 that are located on the edge (border) of the various classifications of the data are selected to contribute to the support data 214. These support vectors 232 are indexed to form the support vector index 236 and then, in step 608, the data associated with these vectors is extracted from the original data 211 and assembled as the support data 214. The support data 214 is much smaller in size than the original data 211, but it is still representative of all the classifications associated with the original data 211. Note that if there is already a support data collection, step 608 updates the existing support data so that the new data found in the initial data 211 is incorporated into the updated support data, which prevents catastrophic forgetting.
Having the support data, the method then trains the deep learning model with the new data 212 together with the support data 214, as discussed above with regard to the apparatus 200.
The novel classification apparatus 200 has been tested on seven datasets: (1) MNIST, (2) CIFAR-10, (3) CIFAR-100, (4) Enzyme function data, (5) HeLa, (6) BreakHis, and (7) tiny ImageNet. MNIST, CIFAR-10 and CIFAR-100 are commonly used benchmark datasets in the computer vision field. MNIST consists of 70K 28×28 single-channel images belonging to 10 classes. CIFAR-10 contains 60K 32×32 RGB images belonging to 10 classes, while CIFAR-100 is composed of the same images, further classified into 100 classes.
The Enzyme function, HeLa and BreakHis datasets are from bioinformatics. The Enzyme function data is composed of 22,168 low-homologous enzyme sequences belonging to 6 classes. The HeLa dataset contains around 700 512×384 gray-scale images of subcellular structures in HeLa cells, belonging to 10 classes. BreakHis is composed of 9,109 microscopic images of breast tumor tissue belonging to 8 classes; each image is a 3-channel RGB image with a size of 700×460 pixels. Tiny ImageNet is similar to ImageNet, but it is much harder since it has 200 classes while, within each class, there are only 500 training images and 50 testing images.
The tests compared the novel SupportNet method discussed above with several existing approaches, including retraining a model from scratch with all the data (“All Data”), a simple fine-tuning baseline (“Fine Tune”), EWC, iCaRL, VCL, and VCL with K-center Coreset.
For all the tasks, the experiment started with a binary classification. Then, each time the experiment incrementally gave data from one or two new classes to each method, until all the classes were fed to the model. For the enzyme data, the experiment fed one class each time. For the other five datasets, the experiment fed two classes in each round.
As expected, the “All Data” method has the best classification performance because it has access to all the data and retrains a brand new model each time. The performance of this “All Data” method can be considered as the empirical upper bound for the performance of the incremental learning methods. All the other incremental learning methods show a performance decrease, to different degrees, relative to the “All Data” method. EWC and “Fine Tune” have quite similar performance, which drops quickly when the number of classes increases. The iCaRL method is much more robust than these two methods.
In contrast, SupportNet has significantly better performance than all the other incremental learning methods across the five datasets. In fact, its performance is quite close to that of the “All Data” method and stays stable when the number of classes increases for the MNIST and enzyme datasets. On the MNIST dataset, VCL with K-center Coreset can also achieve very impressive performance; nevertheless, SupportNet outperforms it along the process. Specifically, the performance of SupportNet differs from that of the “All Data” method by less than 1% on MNIST and by 5% on the enzyme data. These results also show the importance of SupportNet's components.
Although the novel SupportNet method has been discussed with regard to class incremental learning, SupportNet can be easily adapted to perform other incremental learning tasks, such as the split MNIST task. In this task, a method needs to deal with a sequence of similar tasks which are related to each other. More specifically, the method needs to perform five binary classification tasks in sequential order with a single model. The SupportNet method was modified for this task and then compared with four state-of-the-art methods: VCL, VCL with K-center Coreset, GEM and iCaRL. Notice that the VCL-related methods are very recent state-of-the-art methods. The results show that SupportNet can also achieve state-of-the-art performance on this task, although it was originally designed to perform class incremental learning. Compared to the other methods, SupportNet can achieve higher performance on the new task with little compromise on the older tasks. This experiment suggests the potential of SupportNet to combat catastrophic forgetting as a whole.
To further evaluate SupportNet's performance in a class incremental learning setting with more classes, it was tested on the tiny ImageNet dataset and compared with iCaRL. The performance of SupportNet and iCaRL on this dataset is shown in the accompanying drawings.
Next, the performance of SupportNet was investigated with reduced support data. Experiments were run for the SupportNet method with support data sizes as small as 2000, 1500, 1000, 500, and 200 points. The results indicated that even with 500 support data points, the SupportNet method can outperform iCaRL with 2000 data points, which further demonstrates the effectiveness of the support data selection strategy.
Then, the impact of the support data size on the performance of the SupportNet method was investigated on another dataset; the results are shown in the accompanying drawings.
All these experiments show that the proposed novel class incremental learning method, SupportNet, solves the catastrophic forgetting problem by combining the strength of deep learning and SVM. SupportNet can efficiently identify the important information associated with the old data, which is fed to the deep learning model together with the new data for further training so that the model can review the essential information of the old data when learning the new information. With the help of two powerful consolidation regularizers, the support data can effectively help the deep learning model prevent the catastrophic forgetting issue, eliminate the necessity of retraining the model from scratch, and maintain a stable learned representation that corresponds to the old and the new data.
A method for classifying data into classes based on the embodiments discussed above is now presented. The method includes, as shown in the accompanying drawings, a step of receiving new data, a step of receiving support data, wherein the support data is a subset of previously classified data, a step of processing with a first set of layers of a deep learning classifier the new data and the support data to obtain a learned representation of the new data and the support data, and a step of applying a second set of layers of the deep learning classifier to the learned representation to associate the new data with a corresponding class.
In one application, the method further includes constraining parameters of the first set of layers with a loss function, and/or adding to the loss function first and second regularizers, wherein the first regularizer is different from the second regularizer. The first regularizer depends on parameters of the first set of layers. The second regularizer uses the Fisher information for each parameter of the first set of layers. The method may further include feeding the learned representation to a support vector machine block for generating support vectors, and/or selecting only the support vectors that lie on a border of a classification, and/or selecting data from the new data and support data that corresponds to the support vectors and updating the support data with the selected data.
In another embodiment, as illustrated in the accompanying drawings, there is a method for generating support data for a deep learning classifier, the method including a step of receiving data, a step of processing with a first set of layers of the deep learning classifier the received data to obtain a learned representation of the received data, and a step of training a support vector machine block with the learned representation to generate the support data, as discussed above with regard to the embodiments.
The above-discussed procedures and methods may be implemented in a computing device or controller as illustrated in the accompanying drawings.
Exemplary computing device 1300 suitable for performing the activities described in the exemplary embodiments may include a server 1301. Such a server 1301 may include a central processor (CPU) 1302 coupled to a random access memory (RAM) 1304 and to a read-only memory (ROM) 1306. ROM 1306 may also be other types of storage media to store programs, such as programmable ROM (PROM), erasable PROM (EPROM), etc. Processor 1302 may communicate with other internal and external components through input/output (I/O) circuitry 1308 and bussing 1310 to provide control signals and the like. Processor 1302 carries out a variety of functions as are known in the art, as dictated by software and/or firmware instructions.
Server 1301 may also include one or more data storage devices, including hard drives 1312, CD-ROM drives 1314 and other hardware capable of reading and/or storing information, such as DVD, etc. In one embodiment, software for carrying out the above-discussed steps may be stored and distributed on a CD-ROM or DVD 1316, a USB storage device 1318 or other form of media capable of portably storing information. These storage media may be inserted into, and read by, devices such as CD-ROM drive 1314, disk drive 1312, etc. Server 1301 may be coupled to a display 1320, which may be any type of known display or presentation screen, such as LCD, plasma display, cathode ray tube (CRT), etc. A user input interface 1322 is provided, including one or more user interface mechanisms such as a mouse, keyboard, microphone, touchpad, touch screen, voice-recognition system, etc.
Server 1301 may be coupled to other devices, such as a smart device, e.g., a phone, tv set, computer, etc. The server may be part of a larger network configuration as in a global area network (GAN) such as the Internet 1328, which allows ultimate connection to various landline and/or mobile computing devices.
The disclosed embodiments provide methods and a classifying apparatus that can classify new information without experiencing catastrophic forgetting. It should be understood that this description is not intended to limit the invention. On the contrary, the embodiments are intended to cover alternatives, modifications and equivalents, which are included in the spirit and scope of the invention as defined by the appended claims. Further, in the detailed description of the embodiments, numerous specific details are set forth in order to provide a comprehensive understanding of the claimed invention. However, one skilled in the art would understand that various embodiments may be practiced without such specific details.
Although the features and elements of the present embodiments are described in the embodiments in particular combinations, each feature or element can be used alone without the other features and elements of the embodiments or in various combinations with or without other features and elements disclosed herein.
This written description uses examples of the subject matter disclosed to enable any person skilled in the art to practice the same, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the subject matter is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims.
[1] Ronald Kemker, Angelina Abitino, Marc McClure, and Christopher Kanan. 2017. Measuring Catastrophic Forgetting in Neural Networks. CoRR abs/1708.02072 (2017). arXiv:1708.02072 http://arxiv.org/abs/1708.02072;
[2] David Lopez-Paz and Marc'Aurelio Ranzato. 2017. Gradient Episodic Memory for Continuum Learning. CoRR abs/1706.08840 (2017). arXiv:1706.08840 http://arxiv.org/abs/1706.08840;
[3] Cuong V. Nguyen, Yingzhen Li, Thang D. Bui, and Richard E. Turner. 2018. Variational Continual Learning. In International Conference on Learning Representations;
[4] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, and Christoph H. Lampert. 2016. iCaRL: Incremental Classifier and Representation Learning. CoRR abs/1611.07725 (2016). arXiv:1611.07725 http://arxiv.org/abs/1611.07725;
[5] Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. 2017. Continual learning with deep generative replay. In Advances in Neural Information Processing Systems. 2990-2999;
[6] Yu Li, Lizhong Ding, and Xin Gao. 2018. On the Decision Boundary of Deep Neural Networks. arXiv preprint arXiv:1808.05385 (2018);
[7] Daniel Soudry, Elad Hoffer, and Nathan Srebro. 2017. The implicit bias of gradient descent on separable data. arXiv preprint arXiv:1710.10345 (2017);
[8] C. Pallier, S. Dehaene, J.-B. Poline, D. LeBihan, A.-M. Argenti, E. Dupoux, and J. Mehler. 2003. Brain Imaging of Language Plasticity in Adopted Adults: Can a Second Language Replace the First? Cerebral Cortex 13, 2 (2003), 155-161. https://doi.org/10.1093/cercor/13.2.155;
[9] Sylvain Sirois, Michael Spratling, Michael S. C. Thomas, Gert Westermann, Denis Mareschal, and Mark H. Johnson. 2008. Précis of Neuroconstructivism: How the Brain Constructs Cognition. Behavioral and Brain Sciences 31, 3 (2008), 321-331. https://doi.org/10.1017/S0140525X0800407X;
[10] Jaap M. J. Murre and Joeri Dros. 2015. Replication and Analysis of Ebbinghaus' Forgetting Curve. PLOS ONE 10, 7 (07 2015), 1-23. https://doi.org/10.1371/journal.pone.0120644;
[11] Jie Hu, Li Shen, and Gang Sun. 2017. Squeeze-and-Excitation Networks. CoRR abs/1709.01507 (2017). arXiv:1709.01507 http://arxiv.org/abs/1709.01507;
[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770-778;
[13] Saining Xie, Ross B. Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2016. Aggregated Residual Transformations for Deep Neural Networks. CoRR abs/1611.05431 (2016). arXiv:1611.05431 http://arxiv.org/abs/1611.05431;
[14] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2014. Going Deeper with Convolutions. CoRR abs/1409.4842 (2014). arXiv:1409.4842 http://arxiv.org/abs/1409.4842;
[15] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, and Raia Hadsell. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences 114, 13 (2017), 3521-3526. https://doi.org/10.1073/pnas.1611835114 arXiv:http://www.pnas.org/content/114/13/3521.full.pdf;
This application claims priority to U.S. Provisional Patent Application No. 62/651,384, filed on Apr. 2, 2018, entitled “SUPPORTNET: A NOVEL INCREMENTAL LEARNING FRAMEWORK THROUGH DEEP LEARNING AND SUPPORT DATA,” the disclosure of which is incorporated herein by reference in its entirety.