The present disclosure relates generally to machine learning models and stream data processing, and more specifically, to online adaptation for cross-domain streaming data.
Internet users often leave significant amounts of digital footprints, e.g., by sharing texts, photos, videos or other forms of media via social media, via purchase orders and browsing history, via ride-share histories, via registration at a website, and/or the like. Because user-shared data may exist on the Internet for an extended period of time, online privacy of users becomes increasingly difficult to preserve. Some recommendation systems actively use user data for mining user interests and developing data-driven algorithms, rendering the right to privacy even more challenging to uphold. The Right to Be Forgotten (RTBF) movement grants Internet users the right to ask an organization to delete their personal data.
Therefore, there is a need for an efficient data privacy protection mechanism.
In the figures, elements having the same designations have the same or similar functions.
As used herein, the term “network” may comprise any hardware or software-based framework that includes any artificial intelligence network or system, neural network or system and/or any training or learning models implemented thereon or therewith.
As used herein, the term “module” may comprise hardware or software-based framework that performs one or more functions. In some embodiments, the module may be implemented on one or more neural networks.
Recommendation systems often actively use user data for mining user interests and developing data-driven algorithms, creating an issue that violates the right to privacy. Some existing systems may attempt to preserve user privacy via Federated Learning, which limits sensitive data to being stored only on a few specific nodes. However, private training data for Federated Learning can be leaked through the gradient-sharing mechanism deployed in distributed models, thus forfeiting the purpose of privacy protection. Ideally, a data privacy protection algorithm would delete user data right after use, but existing online learning frameworks cannot meet this need without addressing the distribution shift from public data (source domain) to private user data (target domain).
Embodiments described herein provide an online domain adaptation framework based on cross-domain bootstrapping, in which the target-domain streaming data is deleted immediately after adaptation. At each online query, data diversity is increased across domains by bootstrapping the source domain to form diverse combinations with the current target query. To fully take advantage of the valuable discrepancies among the diverse combinations, a set of independent machine learning models (referred to as “learners”) is trained to preserve the differences. The knowledge of the learners is then integrated by exchanging their predicted pseudo-labels on the current target query to co-supervise the learning on the target domain, but without sharing the weights, so as to maintain the learners' divergence.
In one embodiment, at the inference/testing stage, a more accurate prediction may be obtained for the current target query by an average ensemble of the diverse expertise of all the learners. In this way, the right to be forgotten of each user can be realized by deleting the target query right after training or testing, while the knowledge contained in the target query can be transferred to the learners trained on the public training data (source domain).
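As a non-limiting illustration, the following PyTorch-style sketch shows how such an average ensemble over K learners could be formed at inference time; the linear classifier, feature dimension, and batch size are toy assumptions for illustration rather than the claimed implementation.

```python
# Minimal sketch (toy classifier and shapes assumed) of average-ensemble inference
# over K learners, followed by deletion of the target query.
import torch
import torch.nn as nn

K, num_classes, feat_dim, batch_size = 2, 6, 32, 4
learners = [nn.Linear(feat_dim, num_classes) for _ in range(K)]  # stand-ins for the K learners

def ensemble_predict(target_query: torch.Tensor) -> torch.Tensor:
    """Average the softmax outputs of all K learners on the current target query."""
    with torch.no_grad():
        probs = torch.stack([w(target_query).softmax(dim=-1) for w in learners])
    return probs.mean(dim=0)  # shape: (batch_size, num_classes)

query = torch.randn(batch_size, feat_dim)        # stand-in for a target query
labels = ensemble_predict(query).argmax(dim=-1)  # ensembled prediction
del query                                        # the query is erased right after use
```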
For example, a prediction model may be trained to identify an object in an input image, e.g., a “check shirt,” a “sneaker,” a “spaghetti strap top,” and/or the like. Source domain data (e.g., public data including different types of clothing items) from Fashion-MNIST (Xiao et al., Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, arXiv preprint arXiv:1708.07747, 2017) may be used to train the prediction model. In addition, sensitive data from a target domain, e.g., photos of people wearing different types of garments shared on social media, may also be included in the training and testing of the prediction model. Such a data query in the target domain (e.g., a photo of a person wearing a type of garment) may be incorporated back into the training to increase the cross-domain data diversity. Two copies of the prediction model may each generate a prediction on the data query, e.g., a predicted type of garment that the person in the photo is wearing, which can be used as pseudo-labels to supervise the training. Specifically, the two copies of the prediction model exchange the generated pseudo-labels as co-supervision. Once the current query (e.g., the photo of the person wearing the type of garment) is adapted in the training phase, the query is deleted after being tested. In this way, the user photo (even if the user voluntarily shares his or her photo on the Internet) is erased from the training data.
Specifically, the labeled source data $D_S = \{(s_i, y_i)\}_{i=1}^{N_S}$ corresponds to the source domain (e.g., public training data), and the unlabeled target data $D_T = \{t_i\}_{i=1}^{N_T}$ corresponds to the target domain (e.g., private user data received as streaming queries).
For example, offline adaptation assumes access to every data point in $D_S$ or $D_T$, synchronously or asynchronously domain-wise. The inference on $D_T$ happens after the prediction model 115 is trained on both $D_S$ and $D_T$ entirely. For online adaptation, access to the entire $D_S$ is assumed, while the data from $D_T$ arrives in a random streaming fashion of mini-batches $\{T_j = \{t_b\}_{b=1}^{B}\}_{j=1}^{M}$, where B is the mini-batch size and M is the number of target queries.
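For illustration only, the sketch below outlines this online protocol under assumed toy tensors: the labeled source set is fully available, while unlabeled target mini-batches arrive as a one-pass stream and are discarded after each query.

```python
# Sketch of the online streaming setup: full access to D_S, one pass over D_T in
# mini-batch queries T_j that are never revisited (toy data shapes assumed).
import torch
from torch.utils.data import DataLoader, TensorDataset

B = 8  # mini-batch size of each target query
D_S = TensorDataset(torch.randn(1000, 32), torch.randint(0, 6, (1000,)))  # labeled source
D_T = DataLoader(TensorDataset(torch.randn(200, 32)), batch_size=B, shuffle=True)

for j, (T_j,) in enumerate(D_T):  # the j-th target query arrives as a stream
    pass                          # adapt on {T_j, bootstrapped source}, test on T_j, then drop T_j
```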
One challenge of online adaptation is the limited access to the training data at each query, compared to offline adaptation. For example, for online adaptation, the target queries from the target dataset 112 may be received sequentially on a one-by-one basis while the model 115 is being trained. Assume there are $10^3$ source batches and $10^3$ target batches, respectively. In an offline setting, the model 115 is tested after training on at most $10^6$ combinations of source-target data pairs, while in an online setting, a one-stream model can see at most $10^3 + 500$ combinations at the 500-th query. Thus, online adaptation faces a significantly smaller data pool and lower data diversity, and the training process of the online task suffers from two major drawbacks. First, the model is prone to underfitting on the target domain due to the limited exposure, especially at the beginning of training. Second, due to the erasure of “seen” batches, the model lacks the diverse combinations of source-target data pairs that enable the deep network to find the optimal cross-domain classifier. In view of this challenge, the training framework 100 increases cross-domain data diversity by bootstrapping the source domain, as described below.
In the online setting, the target samples cannot be reused in the training due to user privacy concerns. Data diversity may be increased across domains by bootstrapping the source domain to form diverse combinations with the current target domain query. Specifically, for each target query $T_j$ 112a, a set of K mini-batches 110a-b, e.g., $\{S_j^k = \{s_b\}_{b=1}^{B}\}_{k=1}^{K}$, of the same size is randomly selected from the source domain with replacement. Correspondingly, a set of K base learners $\{w_k\}_{k=1}^{K}$ 115a-b, which are copies of a classifier, is obtained. It is worth noting that the framework 100 is illustrated with K=2 learners for simplicity.
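A short sketch of the bootstrapping step follows; the source tensors and sizes are placeholders, and the key point is drawing K source mini-batches of the same size B with replacement for each target query.

```python
# Sketch of cross-domain bootstrapping: for one target query of size B, draw K source
# mini-batches {S_j^k} of size B with replacement (toy source tensors assumed).
import torch

def bootstrap_source(source_x, source_y, K: int, B: int):
    """Return K (inputs, labels) mini-batches sampled with replacement from the source set."""
    batches = []
    for _ in range(K):
        idx = torch.randint(0, source_x.size(0), (B,))  # indices drawn with replacement
        batches.append((source_x[idx], source_y[idx]))
    return batches

source_x, source_y = torch.randn(1000, 32), torch.randint(0, 6, (1000,))
S_j = bootstrap_source(source_x, source_y, K=2, B=8)  # one set of bootstrapped batches per query
```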
As shown in framework 100, at each iteration, a learner $w_k$ 115a or 115b is first trained on the combination of the bootstrapped source data 110a or 110b and the target query 112a, e.g., $\{T_j, S_j^k\}$. While the independent learners 115a-b preserve the valuable discrepancies of the cross-domain pairs, the framework 100 aims at integrating the learners' expertise into one better prediction on the current target query 112a. The K learners 115a-b may be trained jointly by exchanging their knowledge on the target domain as a form of co-supervision.
In one embodiment, the K learners (e.g., 115a-b) are trained independently with bootstrapped source supervision, but they exchange the pseudo-labels generated for the target queries 112a. For example, taking K=2 for simplicity, the learners are denoted as $w_u$ 115a and $w_v$ 115b. Given the current target query $T_j$ 112a, the loss function $\mathcal{L}$ for each learner 115a-b consists of a supervised loss term $\mathcal{L}_s$ from the source domain with the bootstrapped samples $\{s_b\}_{b=1}^{B}$, and a self-supervised loss term $\mathcal{L}_t$ from the target domain with pseudo-labels $\hat{y}_b$ (e.g., 118a or 118b) from the peer learner. In the illustrated example, the loss term $\mathcal{L}_t$ for each learner 115a or 115b is computed as:

$\mathcal{L}_t^{v\to u} = \frac{1}{B}\sum_{b=1}^{B} \mathbb{1}\left(\max(p_b^v) \geq \tau\right) H\!\left(\hat{y}_b,\; p(c \mid \tilde{t}_b; w_u)\right),$

and symmetrically for $\mathcal{L}_t^{u\to v}$, where $p_b^u$ and $p_b^v$ are the predicted probabilities of the target sample $t_b$ by $w_u$ and $w_v$, respectively, $\hat{y}_b$ is the pseudo-label obtained from the peer prediction (e.g., $\hat{y}_b = \arg\max_c p_b^v$ for learner $w_u$), $H(\cdot,\cdot)$ denotes the cross-entropy, $\tilde{t}_b$ is a strongly-augmented version of $t_b$, and $\tau$ is the threshold for pseudo-label selection. In addition, to take advantage of the supervision from the limited target query, an entropy minimization term $\mathcal{L}_{ent}$ and a class-balancing diversity term $\mathcal{L}_{div}$, both computed from $p_b^u$ and $p_b^v$, may be further added to the loss function $\mathcal{L}$. For example, the entropy loss $\mathcal{L}_{ent}$ may be computed as the cross-entropy between $p_b^u$ and $p_b^v$. The class-balancing diversity term $\mathcal{L}_{div}$ may be computed according to Liang et al., Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation, in International Conference on Machine Learning, 2020.
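The following hedged sketch illustrates, for K=2, one way the loss terms above could be written in PyTorch; the logits are toy tensors, the thresholding follows the description of τ, the entropy term follows the cross-entropy formulation stated above, and the diversity term follows the class-balancing form of Liang et al. (2020).

```python
# Sketch of the per-query loss terms for two learners w_u and w_v (toy logits assumed).
import torch
import torch.nn.functional as F

tau = 0.95  # pseudo-label selection threshold

def co_supervised_loss(peer_logits, own_logits_strong, tau):
    """L_t: cross-entropy against the peer's pseudo-labels, kept only where the peer is confident."""
    peer_probs = peer_logits.softmax(dim=-1)
    conf, pseudo = peer_probs.max(dim=-1)
    mask = (conf >= tau).float()
    ce = F.cross_entropy(own_logits_strong, pseudo, reduction="none")
    return (mask * ce).mean()

def entropy_loss(p_u, p_v):
    """L_ent: symmetrized cross-entropy between the two learners' predicted distributions."""
    return -0.5 * ((p_v * (p_u + 1e-8).log()).sum(-1).mean()
                   + (p_u * (p_v + 1e-8).log()).sum(-1).mean())

def diversity_loss(p):
    """L_div: class-balancing term that penalizes a peaked batch-mean prediction."""
    mean_p = p.mean(dim=0)
    return (mean_p * (mean_p + 1e-8).log()).sum()

logits_u, logits_v = torch.randn(8, 6), torch.randn(8, 6)  # toy predictions on T_j
l_t_v_to_u = co_supervised_loss(logits_v, logits_u, tau)   # peer v supervises learner u
l_ent = entropy_loss(logits_u.softmax(-1), logits_v.softmax(-1))
l_div = diversity_loss(logits_u.softmax(-1))
```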
After computing the loss terms, the learners 115a-b are then updated by:

$w_u \leftarrow w_u - \eta\,\nabla\!\left(\mathcal{L}_s^u + \mathcal{L}_t^{v\to u} + \mathcal{L}_{ent} + \lambda\,\mathcal{L}_{div}\right),\qquad w_v \leftarrow w_v - \eta\,\nabla\!\left(\mathcal{L}_s^v + \mathcal{L}_t^{u\to v} + \mathcal{L}_{ent} + \lambda\,\mathcal{L}_{div}\right),$

where λ is a hyperparameter that scales the weight of the diversity term. To generalize to K learners, a learner $w_k$ is updated via:
$w_k \leftarrow w_k - \eta\,\nabla \mathcal{L}\big(w_k, \{T_j, S_j^k\}\big)$

$p_j^k = p\big(c \mid T_j; w_k\big)$
where η is the learning rate, c is the number of classes, $p_j^k$ is the predicted probability by the k-th learner, and $\mathcal{L}(\cdot,\cdot)$ is the loss objective function.
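A self-contained sketch of one online update step follows, showing the form $w_k \leftarrow w_k - \eta\,\nabla\mathcal{L}(w_k, \{T_j, S_j^k\})$ with plain SGD; the toy network, the stand-in "strong" augmentation, and the omission of the entropy/diversity terms are simplifications for illustration only.

```python
# Self-contained sketch of one adaptation step for K learners on target query T_j;
# the entropy and diversity terms are omitted here for brevity.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

K, C, D, B, eta, tau = 2, 6, 32, 8, 1e-3, 0.95
base = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))
learners = [copy.deepcopy(base) for _ in range(K)]                    # w_1, ..., w_K
optims = [torch.optim.SGD(w.parameters(), lr=eta) for w in learners]  # learning rate eta

def adaptation_step(t_j, t_j_strong, source_batches):
    """Update each learner w_k on the pair {T_j, S_j^k} with peer pseudo-label co-supervision."""
    with torch.no_grad():                                   # pseudo-labels exchanged across learners
        peer_logits = [w(t_j) for w in learners]
    for k, (w, opt) in enumerate(zip(learners, optims)):
        s_x, s_y = source_batches[k]                        # bootstrapped source batch S_j^k
        sup = F.cross_entropy(w(s_x), s_y)                  # supervised source loss
        peer_probs = peer_logits[(k + 1) % K].softmax(-1)   # peer learner's prediction on T_j
        conf, pseudo = peer_probs.max(-1)
        mask = (conf >= tau).float()                        # keep confident pseudo-labels only
        co_sup = (mask * F.cross_entropy(w(t_j_strong), pseudo, reduction="none")).mean()
        loss = sup + co_sup                                 # + entropy and diversity terms in full form
        opt.zero_grad(); loss.backward(); opt.step()

t_j = torch.randn(B, D); t_j_strong = t_j + 0.1 * torch.randn_like(t_j)  # toy "strong" augmentation
S_j = [(torch.randn(B, D), torch.randint(0, C, (B,))) for _ in range(K)]
adaptation_step(t_j, t_j_strong, S_j)
```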
The testing may be performed at the end of the iteration. In this way, each iteration handles a target query 112a from the target dataset 112, and comprises a training stage followed by a testing stage.
Therefore, to obtain a good estimation of the current target query 112a, the bootstrapping framework 100 performs uncertainty estimation on the target domain 112 to offset the source dominance. Given the scarcity of target samples 112a-n, source-target data pairs are bootstrapped for a more balanced cross-domain simulation. At a high level, the bootstrap simulates multiple realizations of a specific target query given the diversity of the source samples. Specifically, the bootstrapped source batches approximate a distribution over the current query $T_j$ 112a.
In this way, the bootstrapping framework 200 brings multi-view observations on a single target query 112a in two ways. First, given K sampling subsets from $D_S$, the ideal estimate of $T_j$ is approximated in practice by an estimate over the full dataset, which in turn is the average of the K multi-view estimates, each obtained from a bootstrapped source subset paired with the target query. Second, besides the learnable parameters, the Batch-Normalization layers of the K learners result in a set of different means and variances $\{\mu_k, \sigma_k\}_{k=1}^{K}$ that serve as K different initializations affecting the learning of each bootstrapped estimate.
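As a brief illustration of the Batch-Normalization point, the following sketch (with an arbitrary toy network) shows that copies of the same model accumulate different running statistics once they process different bootstrapped batches in training mode.

```python
# Sketch: copies of a BatchNorm-equipped learner trained on different bootstrapped
# batches end up with different running statistics (toy network and data assumed).
import copy
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(32, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Linear(16, 6))
learners = [copy.deepcopy(net) for _ in range(2)]

for k, w in enumerate(learners):
    w.train()
    w(torch.randn(8, 32) + k)  # each learner sees its own bootstrapped batch

mu_0 = learners[0][1].running_mean  # BatchNorm running means now differ across learners
mu_1 = learners[1][1].running_mean
print(torch.allclose(mu_0, mu_1))   # typically prints False
```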
Memory 320 may be used to store software executed by computing device 300 and/or one or more data structures used during operation of computing device 300. Memory 320 may include one or more types of machine readable media. Some common forms of machine readable media may include floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer is adapted to read.
Processor 310 and/or memory 320 may be arranged in any suitable physical arrangement. In some embodiments, processor 310 and/or memory 320 may be implemented on a same board, in a same package (e.g., system-in-package), on a same chip (e.g., system-on-chip), and/or the like. In some embodiments, processor 310 and/or memory 320 may include distributed, virtualized, and/or containerized computing resources. Consistent with such embodiments, processor 310 and/or memory 320 may be located in one or more data centers and/or cloud computing facilities.
In some examples, memory 320 may include non-transitory, tangible, machine readable media that includes executable code that when run by one or more processors (e.g., processor 310) may cause the one or more processors to perform the methods described in further detail herein. For example, as shown, memory 320 includes instructions for an online adaptation module 330 that may be used to implement and/or emulate the systems and models, and/or to implement any of the methods described further herein. In some examples, the online adaptation module 330 may receive an input 340, e.g., source data or a target query, via a data interface 315. The online adaptation module 330 may generate an output 350 in response to the input 340. For example, the output 350 may be a prediction label of the input target query.
In some examples, the online adaptation module 330 may include a plurality of prediction models 331a-331n, which may be a number of copies of a prediction model (see, e.g., 115a-b). The online adaptation module 330 and its submodules, the prediction models 331a-n, may be implemented using hardware, software, and/or a combination of hardware and software.
At step 402, a training dataset of data samples (e.g., public data 106) in a source domain is received.

At step 404, a first target query (e.g., 112a) is received from a set of sequentially received data samples in a target domain.

In one embodiment, a first set of data samples and a second set of data samples are sampled from the training dataset in the source domain. Each has the same number of data samples as the set of sequentially received data samples. The first set of data samples and the second set of data samples are then sent as inputs to the first prediction model and the second prediction model, respectively.

At step 406, a first prediction model (e.g., 115a) generates a first prediction in response to an input of the first target query.

At step 408, a second prediction model (e.g., 115b) generates a second prediction in response to the input of the first target query.
At step 410, the first prediction model and the second prediction model generate a first output and a second output, respectively, in response to inputs from the data samples in the source domain.
At step 412, a first loss objective is computed based on the first output and a corresponding label from the training dataset, e.g., the supervised loss $\mathcal{L}_s^u$. For example, the first loss objective is computed as a supervised loss objective supervised by pre-annotated labels of the data samples in the training dataset.
At step 414, a second loss objective is computed based on the first prediction, using the second prediction as a first pseudo-label, e.g., the co-supervised loss $\mathcal{L}_t^{v\to u}$. The second loss objective is computed by: generating the first pseudo-label based on the second prediction; and, in response to determining that the confidence of the second prediction is greater than a pre-defined threshold, computing a cross-entropy between the first pseudo-label and a distribution of the first prediction.
At step 416, the first prediction model is updated based at least in part on the first loss objective and the second loss objective.
Similarly, the second prediction model may be updated by performing similar steps as steps 412-416. For example, a third loss objective is computed based on the second output and the corresponding label from the training dataset. A fourth loss objective is computed based on the second prediction using the first prediction as a second pseudo label. The second prediction model is updated based at least in part on the third loss objective and the fourth loss objective.
In one embodiment, an entropy loss objective may be computed based on a distribution of the first prediction. A class-balancing diversity loss objective may be computed based on the distribution of the first prediction. The first prediction model may be jointly updated based on the first loss objective, the second loss objective, the entropy loss objective and the class-balancing diversity loss objective.
At step 418, the first target query (e.g., 112a) is deleted after the first prediction model and the second prediction model have been updated and tested on it, so that the user data is not retained.
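For illustration only, the following self-contained sketch walks through steps 402-418 with toy data and two small models standing in for the first and second prediction models; the entropy and diversity terms described above are omitted for brevity, and all names and shapes are assumptions.

```python
# End-to-end sketch of steps 402-418: two prediction models co-supervise each other on
# each streaming target query, which is deleted right after it is used (toy data assumed).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

B, C, D, tau = 8, 6, 32, 0.95
src_x, src_y = torch.randn(1000, D), torch.randint(0, C, (1000,))           # step 402
target_stream = DataLoader(TensorDataset(torch.randn(80, D)), batch_size=B)

model_a = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, C))      # first prediction model
model_b = copy.deepcopy(model_a)                                            # second prediction model
opt_a = torch.optim.SGD(model_a.parameters(), lr=1e-3)
opt_b = torch.optim.SGD(model_b.parameters(), lr=1e-3)

for (t_j,) in target_stream:                                                # step 404
    with torch.no_grad():
        pred_a, pred_b = model_a(t_j), model_b(t_j)                         # steps 406-408
    for model, opt, peer in ((model_a, opt_a, pred_b), (model_b, opt_b, pred_a)):
        idx = torch.randint(0, src_x.size(0), (B,))                         # bootstrapped source batch
        sup = F.cross_entropy(model(src_x[idx]), src_y[idx])                # steps 410-412
        conf, pseudo = peer.softmax(-1).max(-1)                             # peer pseudo-labels
        mask = (conf >= tau).float()                                        # step 414
        co_sup = (mask * F.cross_entropy(model(t_j), pseudo, reduction="none")).mean()
        opt.zero_grad(); (sup + co_sup).backward(); opt.step()              # step 416
    del t_j                                                                 # step 418: erase the query
```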
Some examples of computing devices, such as computing device 300, may include non-transitory, tangible, machine readable media that include executable code that when run by one or more processors (e.g., processor 310) may cause the one or more processors to perform the processes of the method. Some common forms of machine readable media that may include the processes of the method are, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, and/or any other medium from which a processor or computer is adapted to read.
Example Performance
Two metrics have been adopted for evaluating the online domain adaptation methods described herein.
The online cross-domain adaptation is evaluated on Camelyon17, a large-scale medical dataset from the WILDS benchmark (Koh et al., WILDS: A benchmark of in-the-wild distribution shifts, in International Conference on Machine Learning, 2021), which is a histopathology image dataset with patient population shifts from the source to the target domain. Camelyon17 has 455k samples of breast cancer patients from 5 hospitals. Another practical scenario is online fashion, where user-generated content (UGC) might be time-sensitive and cannot be saved for training purposes. Due to the lack of a cross-domain fashion prediction dataset, adaptation from Fashion-MNIST to the DeepFashion category prediction branch is evaluated. For example, 6 fashion categories shared between the two datasets are selected, and the task is designed as adapting from 36,000 grayscale samples of Fashion-MNIST to 200,486 real-world commercial samples from DeepFashion.
In one embodiment, the cross-domain adaptation framework 100 is implemented using PyTorch. A ResNet-101 pretrained on ImageNet is used on VisDA-C. A pretrained ResNet-18 (He et al., Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016) is used on COVID-DA. A DenseNet-121 (Huang et al., Densely connected convolutional networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017) with random initialization is used on Camelyon17, and the official WILDS codebase is used for data split and evaluation. A pretrained ResNet-101 is used on Fashion-MNIST-to-DeepFashion. The confidence threshold τ=0.95 and the diversity weight λ=0.4 are fixed throughout the experiments.
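As a non-limiting summary of the described setup, the following snippet collects the backbones and fixed hyperparameters in a plain configuration dictionary; the dictionary keys and structure are illustrative assumptions, not an existing API.

```python
# Illustrative configuration mirroring the setup described above (structure is hypothetical).
BACKBONES = {
    "VisDA-C": "ResNet-101 (ImageNet-pretrained)",
    "COVID-DA": "ResNet-18 (pretrained)",
    "WILDS-Camelyon17": "DenseNet-121 (random initialization)",
    "Fashion-MNIST-to-DeepFashion": "ResNet-101 (pretrained)",
}
HYPERPARAMETERS = {
    "confidence_threshold_tau": 0.95,  # fixed across all experiments
    "diversity_weight_lambda": 0.4,    # fixed across all experiments
}
```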
Example baseline models for comparison include DAN (Long et al., Learning transferable features with deep adaptation networks, in International Conference on Machine Learning, 2015), CORAL (Sun et al., Return of frustratingly easy domain adaptation, in AAAI, 2016), DANN (Ganin et al., Domain-adversarial training of neural networks, in Journal of Machine Learning Research, 2016), ENT (Grandvalet et al., Semi-supervised learning by entropy minimization, CAP, 2005), MDD (Zhang et al.), CDAN (Long et al., Conditional adversarial domain adaptation, arXiv preprint arXiv:1705.10667, 2017), SHOT (Liang et al., 2020) and ATDOC (Liang et al., Domain adaptation with auxiliary target domain-oriented classifier, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021). ATDOC has multiple variants of the auxiliary regularizer; the variant with Neighborhood Aggregation (ATDOC-NA), which has the best performance, is used for comparison. Among the compared approaches, SHOT and ATDOC-NA require a memory module that collects and stores information of all the target samples, and thus apply only to the offline setting. For the other six approaches, both offline and online results are compared. Each offline model is trained for 10 epochs, and each online model takes the same randomly-perturbed target queries to make a fair comparison.
The results on the two medical imaging datasets, COVID-DA and WILDS-Camelyon17, are respectively summarized in the accompanying drawings.
The results on the newly proposed large-scale fashion benchmark, from Fashion-MNIST to the DeepFashion category prediction branch, are summarized in the accompanying drawings.
This description and the accompanying drawings that illustrate inventive aspects, embodiments, implementations, or applications should not be taken as limiting. Various mechanical, compositional, structural, electrical, and operational changes may be made without departing from the spirit and scope of this description and the claims. In some instances, well-known circuits, structures, or techniques have not been shown or described in detail in order not to obscure the embodiments of this disclosure. Like numbers in two or more figures represent the same or similar elements.
In this description, specific details are set forth describing some embodiments consistent with the present disclosure. Numerous specific details are set forth in order to provide a thorough understanding of the embodiments. It will be apparent, however, to one skilled in the art that some embodiments may be practiced without some or all of these specific details. The specific embodiments disclosed herein are meant to be illustrative but not limiting. One skilled in the art may realize other elements that, although not specifically described here, are within the scope and the spirit of this disclosure. In addition, to avoid unnecessary repetition, one or more features shown and described in association with one embodiment may be incorporated into other embodiments unless specifically described otherwise or if the one or more features would make an embodiment non-functional.
Although illustrative embodiments have been shown and described, a wide range of modification, change and substitution is contemplated in the foregoing disclosure and in some instances, some features of the embodiments may be employed without a corresponding use of other features. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. Thus, the scope of the invention should be limited only by the following claims, and it is appropriate that the claims be construed broadly and in a manner consistent with the scope of the embodiments disclosed herein.
The present application is a nonprovisional of and claims priority under 35 U.S.C. 119 to U.S. provisional application No. 63/280,941, filed Nov. 18, 2021, which is hereby expressly incorporated herein by reference in its entirety.
Prior Publication Data

Number | Date | Country |
---|---|---|---|
20230153307 A1 | May 2023 | US |

Related U.S. Application Data (Provisional Application)

Number | Date | Country |
---|---|---|---|
63280941 | Nov 2021 | US |