This application is based on and claims priority to United Kingdom Patent Application No. 2206843.1 filed on May 10, 2022, in the United Kingdom Intellectual Property Office, the contents of which are incorporated herein by reference.
The present invention relates to a method of aggregating models using federated learning.
Machine learning is a powerful tool which can be used to identify patterns in a data sample. Machine learning models can be trained on a dataset to classify certain patterns or characteristics within the dataset, allowing the identification of a pattern or characteristic when the model is applied to a new data sample. Machine learning is limited by the type and quantity of data samples within a dataset that the model is trained on. If a machine learning model is being trained on a device such as a smartphone, handset, or tablet, it may only have access to a dataset which is local to the device, for example, for security or connectivity reasons. Therefore, the diversity of the dataset may limit what the machine learning model is capable of identifying.
In one example, images may be stored on a smartphone, handset, or tablet, either captured by a camera or downloaded from the Internet. A machine learning model may be trained on this set of images (dataset) to identify patterns, objects, or other characteristics in the images. One characteristic which may be identified is the location of shadows in images.
Existing shadow detection methods are not suitable for mobile deployment (e.g. on a smartphone) for a variety of reasons, including the following. The model sizes are large relative to the sizes of models for other typical image analysis tasks. The runtime of typical shadow detection models cannot reach real-time; that is, there may be significant processing time when the model is executed by a user, resulting in a noticeable delay before a result is obtained.
Further, relatively large memory consumption during inference can render such methods unusable on-device. For example, the state-of-the-art detection model (on the SBU dataset) is around 170 MB in size (Chen, Zhihao, et al. “A multi-task mean teacher for semi-supervised shadow detection.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020), and alternative models can reach 300 MB in size. Lightweight detectors use probabilistic models for the final refinement stage, introducing additional computational load (see Chen, Zhihao, et al. and also Zhu, Lei, et al. “Mitigating Intensity Bias in Shadow Detection via Feature Decomposition and Reweighting.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021), and thus may use more power, draining the battery of a handheld device such as a smartphone.
There is no known lightweight shadow detection model which is suitable for on-device deployment and which also satisfies the necessary privacy considerations. Existing methods also lack generalization because the available datasets are extremely small: for example, the SBU dataset contains 5,000 images in total, and the ISTD dataset 2,000 images in total.
Synthetically generated data, where shadows are superimposed on real images, are not geometry-aware since the location of the shadow forming object may not be known. This leads to poor results since the networks do not learn about geometry (e.g. the position of the shadow-causing object), but only the colour information, that is, the difference in shades of the pixels in regions where the shadow has been cast compared to regions of the image lacking shadow.
Many use cases require true generalization across multiple different domains, but no datasets currently exist for such scenarios. Such domains include: indoor scenes with multiple light sources; outdoor scenes with multiple light sources; complex shadows with varying intensities; night time scenes with multiple light sources; and varying light colours.
According to a first aspect of the invention, there is provided a computer-implemented federated learning method. The method comprises: determining, for each of a number, n, of clients, a diversity score of a dataset corresponding to that client for training a machine learning model, wherein the diversity score is a measure of dataset variability; aggregating, weighted by the respective diversity scores, models corresponding to each of the clients; and sending the aggregated model to at least one receiving client.
Thus, the aggregated model accounts for the diversity of the clients' models based on the clients' datasets. This may mean that no one particular type of data within the dataset is over-represented, which would result in a biased or overtrained model. In other words, the aggregation may result in a less biased model. This may allow for diversity-aware federated learning.
Aggregating may comprise averaging the diversity score weighted model weights for n clients.
The receiving client may be one of the n clients.
The diversity score may also be referred to as a first score or first identity.
Aggregating models corresponding to each of the clients may comprise: assigning each of the number, n, of clients to a cluster based on one or more dataset attributes; for each cluster, generating aggregated cluster weights by aggregating, weighted by the respective diversity score, models corresponding to each of the clients in that cluster; and aggregating the aggregated cluster weights.
Clustering clients according to one or more dataset attributes may allow for hierarchical, cluster-based federated learning which is also diversity-aware. By first aggregating model weights obtained from datasets with similar attributes, and then aggregating these weights, more efficient distributed learning can be achieved.
Assigning each client to a cluster may comprise assigning a clustering identity to the dataset used to train the model.
The clustering identity may also be referred to as a second identity. The clustering identity may be based on, for example, dataset location, domain (e.g. food, document, outdoor), time (e.g. time of day), diversity score and the like.
Assigning the clustering identity to the dataset may comprise calculating a vector of softmax probabilities or extracting an embedding vector from a classification model, for example, a deep neural network.
The vector of softmax probabilities may be a K_i-dimensional vector, where K_i is the number of classes for the i-th client, for i = 1, 2, ..., n. The embedding vector may be a D_i-dimensional vector for the i-th client, for i = 1, 2, ..., n.
Thus, the dataset of each client may be represented by a clustering identity. The clustering identity may be assigned by inputting a client's dataset into a pretrained classification network and then extracting the softmax probabilities and/or the embedding vectors. Any suitable classification model may be used to provide an identity for each client's dataset.
The softmax probabilities may be generated using the equation:

σ(x_i)_k = exp(x_k) / Σ_{j=1}^{K_i} exp(x_j)

for k = 1, 2, ..., K_i and x_i = (x_1, x_2, ..., x_{K_i}).
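As an illustration, this softmax computation might be sketched as follows (a minimal example; the function name and pure-Python style are illustrative, not part of the claimed method):

```python
import math

def softmax(x):
    """Return softmax probabilities for a logit vector x = (x_1, ..., x_K).

    Subtracting max(x) before exponentiating is a standard numerical-
    stability trick; it does not change the result.
    """
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

# The probabilities sum to 1 and preserve the ordering of the logits.
probs = softmax([2.0, 1.0, 0.1])
```

Either these probabilities, or an embedding vector taken from an intermediate layer of the classification model, could then serve as the clustering identity for a client's dataset.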
The dataset may have been used for training a local model cached on a client.
The method may further comprise, for each of the n clients: applying a differential privacy function to the diversity score weighted model weight.
The privacy function may be a simple function that adds random noise to the diversity score-weighted model weight. Thus, the privacy of data from a client is maintained, as no identifiable information is shared with another client. Further, the privacy function may also protect against malicious parties, for example, in the case of a data breach (e.g. if model weights are stolen).
The aggregation step(s) may be performed on one of the number, n, of clients.
The aggregation step(s) may be performed on a central server.
The dataset used to train the model may comprise image data.
For example, the image data may be a photo, e.g. a digital photo, comprising a region representing a shadow cast by an object.
The dataset used to train the model may comprise an input provided by a user.
For example, the input provided by a user may be an indication of an object, or the location of a shadow or an occlusion.
The dataset may comprise a mask.
The mask may indicate a region, area or subset of the dataset. The mask may be a shadow mask representing an area or region of an image over which a shadow is cast.
Determining the diversity score of the dataset may comprise: determining a scene identity for a subset of data of the dataset or a data sample.
The subset of the dataset may be a data sample. That is, a scene identity may be determined for each of the data samples before they are added to a dataset, and the scene identities are stored or used to update the diversity score whenever a new data sample/mask pair is added to the dataset. Scene identities may be determined by a scene understanding model, for example, a neural network model.
The diversity score may be determined by combining the scene identities from a plurality of subsets of data or data samples, for example, by averaging the scene identities, or by using other descriptive statistics, such as the mean, standard deviation, variance and the like, of the scene identities of the data samples in the dataset.
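One possible realisation of such a combination (the function name and the choice of per-component standard deviation are assumptions for illustration) treats each scene identity as a vector and scores diversity by its spread across samples:

```python
import statistics

def diversity_score(scene_identities):
    """Combine per-sample scene identities into one diversity score.

    Each scene identity is a vector (e.g. the softmax output of a scene
    understanding model). The score here is the population standard
    deviation of each vector component across samples, averaged over the
    components: a homogeneous dataset scores 0, a varied one scores higher.
    """
    dims = list(zip(*scene_identities))  # transpose: one tuple per component
    return sum(statistics.pstdev(d) for d in dims) / len(dims)
```

For example, a dataset whose samples all carry the same scene identity would score 0, while a dataset mixing indoor and outdoor identities would score above 0.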
The scene identity may also be referred to as a second identity.
The scene identity may be determined by calculating a vector of softmax probabilities or extracting feature representations, for example, an embedding vector (e.g. a D_i-dimensional vector) from the model (e.g. a network model). The vector of softmax probabilities may be a K_i-dimensional vector.
In response to a trigger condition, the method may further comprise, for each data sample: determining a confidence score of that data sample, adding the data sample to the dataset if the confidence score is above a threshold, and discarding the data sample if the confidence score is below the threshold.
In this way, data samples with a high confidence of being accurate are added to the dataset, while data samples with a low confidence are discarded.
In response to a trigger condition, the method may further comprise, for each data sample: determining a first confidence score for that data sample; augmenting that data sample; determining a second confidence score for the augmented data sample; discarding the data sample if the distance between the first and second confidence scores is above a first threshold; if the first confidence score is above a second threshold, adding the data sample to the dataset; and if the first confidence score is below the second threshold, discarding the data sample.
The trigger condition may be, for example: receiving one or more new data samples; one or more new data samples becoming available; or an input by a user.
Determining the confidence score of the data sample may further comprise determining the softmax probability for the data sample.
For example, if the data sample is an image, the softmax probability may be determined for each pixel in the image. The softmax probabilities for each pixel of an image may be averaged.
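The augmentation-consistency check described above might be sketched as follows, assuming (as one reading of the text) that the first threshold applies to the gap between the two confidence scores; the `confidence` and `augment` callables are placeholders for whatever model and augmentation the client uses:

```python
def should_keep(sample, confidence, augment, dist_threshold, conf_threshold):
    """Decide whether a pseudo-labelled data sample enters the dataset.

    A sample is kept only if its confidence is stable under augmentation
    (the two scores lie within dist_threshold of each other) and high in
    absolute terms (above conf_threshold).
    """
    c1 = confidence(sample)              # first confidence score
    c2 = confidence(augment(sample))     # second score, on the augmented sample
    if abs(c1 - c2) > dist_threshold:    # unstable prediction: discard
        return False
    return c1 > conf_threshold           # keep only confident samples
```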
In response to a trigger condition, the method may further comprise, for each data sample: determining whether the data sample is added to the dataset by comparing an attribute of the data sample to a corresponding attribute of a subset of data in the dataset.
For example, if attributes of the data sample and attributes of a subset of data in the dataset have a distance below a threshold in an attribute space, then they may be too similar, and adding the data sample would therefore result in a low diversity score and in turn an overtrained model for data having that attribute. For example, if the data sample is an image of a document having a region indicating a shadow, and the dataset already includes one or more similar images, then the data sample may be discarded.
The method may further comprise: if the distance between the attribute of the data sample and the corresponding attribute of the dataset is below a threshold, discarding the data sample; and if the distance between the attribute of the data sample and the corresponding attribute of the dataset is above the threshold, adding the data sample to the dataset.
Thus, if the dataset already contains a subset of data similar to the data sample, the data sample can be discarded and the diversity of the dataset is maintained.
The attribute may be, for example, an embedding vector used to determine whether the data sample and a subset of data are similar. For example, an image embedding vector can be used to assess the similarity of two images.
The attribute may be location data, for example, mask location data indicating the location of an object or shadow in an image.
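A minimal sketch of this distance-based admission rule, assuming Euclidean distance on embedding vectors (both the distance choice and the function names are illustrative):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def admit_sample(new_attr, dataset_attrs, threshold, distance=euclidean):
    """Add a sample only if its attribute (e.g. an embedding vector) lies
    further than `threshold` from every attribute already in the dataset;
    near-duplicates are rejected to keep the dataset diverse."""
    return all(distance(new_attr, a) > threshold for a in dataset_attrs)
```

Any other distance map over the attribute space (cosine distance between embeddings, distance between mask locations, and so on) could be substituted for the `distance` parameter.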
The method may further comprise determining that there is sufficient storage available to store the data sample when added to the dataset.
It may also be determined that a data sample is to be removed from the dataset based on an attribute of the existing data sample in the dataset and the corresponding attribute of a newly available data sample. For example, if a newly available data sample is of a higher value to the dataset, e.g. by increasing the dataset's diversity, then an existing data sample may be removed from the dataset.
According to a second aspect of the invention, there is provided a method comprising: receiving an image comprising a region indicating a shadow; identifying the region of the image indicating the shadow using the aggregated model of the first aspect.
Identifying the region of the image indicating the shadow may further comprise receiving an input from a user identifying the location of a shadow in the image.
Inputs from a user may be in the form of a mask which can be represented with a Gaussian, Euclidean, Cosine or any other (distance) map. The mask may be integrated into multiple layers of the (decoder) network, by downsampling when necessary.
The method may further comprise displaying the image and a representation of the region of the image indicating the shadow.
The method may further comprise: identifying a second region of the image indicating the shadow using a second input from a user identifying or refining the location of a shadow in the image.
The method may be a computer implemented method.
According to a third aspect of the invention, there is provided a computer program comprising instructions which, when executed by one or more processors, cause the one or more processors to perform the method of either the first or second aspect.
According to a fourth aspect of the invention, there is provided a computer program product comprising a computer-readable medium storing the computer program of the third aspect.
According to a fifth aspect of the invention, there is provided a module configured to perform the method of either the first or second aspect.
The module may be a hardware module.
According to a sixth aspect of the invention, there is provided a monolithic integrated circuit comprising: a processor subsystem comprising at least one processor and memory; and the module of the fifth aspect.
According to a seventh aspect of the invention, there is provided a device comprising: the module of the fifth aspect; and at least one sensor for providing a data sample.
The sensor may be a camera, for example, a digital camera.
The device may be a tablet or smartphone.
According to an eighth aspect of the invention there is provided a computer system comprising: memory; and at least one processing unit; wherein the memory stores the dataset of the first or second aspect and the at least one processing unit is configured to perform the method of either the first or the second aspect.
According to a ninth aspect of the invention, there is provided a method comprising training a machine learning model using the dataset of the first or second aspect.
Training the machine learning model may be triggered by one or more of the following conditions: an input, by a user, corresponding to a data sample; a new data sample being available; client device temperature being above or below a threshold; client device power status or availability; client device processor usage being above or below a threshold; or the presence of a data sample on the client device.
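One possible gating policy combining several of the listed conditions might be sketched as follows (purely illustrative: the parameter names and default thresholds are assumptions, and the text permits any one condition alone to trigger training):

```python
def should_train(new_sample_available, temperature_c, on_charge, cpu_usage,
                 temp_max=40.0, cpu_max=0.5):
    """Trigger on-device training only when a new data sample exists and
    the client device is idle enough: cool, on charge, and with spare
    processor capacity (the thresholds here are illustrative defaults)."""
    return (new_sample_available
            and temperature_c < temp_max
            and on_charge
            and cpu_usage < cpu_max)
```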
Certain embodiments of the present invention will now be described, by way of example, with reference to the accompanying drawings, in which:
Referring to
Shadows can indicate the direction of a light source, and possibly the time of day. Referring also to
Different light sources generate different types of shadows. For example, light sources may be point light, non-point light, or be from multiple light sources. Referring to
Referring to
Shadows can also have the effect of obscuring objects in photos, therefore rendering them “missing” when image analysis is performed on the photo. Referring to
False positives can also be a problem: for example, shadows can give the impression of an object being present in a photograph. Referring to
Identifying the location of a shadow in an image allows the removal of the shadow. Further, when coupled with object detection or identification, shadow identification gives strong clues about the number of light sources, and their respective directions in a scene. Referring to
Referring to
Referring to
User input can be a valuable tool when preparing a shadow mask for an image. For example, a user may want to eliminate only one shadow from an image, or there may be a subtle difference between a shadow and background which a user can easily identify. Referring to
Unlike certain computer vision tasks, there are no on-device sensors that can generate labels for shadow detection training. Depth estimation tasks can use stereo cameras or time-of-flight (ToF) sensors to generate ground-truth. Image tilt estimation tasks can use inertial measurement units (IMUs) to generate ground-truth. The ability to generate data or ground-truth pairs on-device unlocks privacy-preserving on-device training, which in turn makes models more accurate and robust to data-shift. One solution is to generate on-device data/ground-truth pairs to achieve generalization. However, synthetically generating data requires domain-adaptation solutions, as some shadows would be inappropriate for certain domains. Synthetically superimposing shadows on real images assumes shadow-free images, which may be unrealistic for images captured by a user's device, such as a smartphone. For example, referring again to
Referring to
Referring to
While the problems and methods so far have been described in relation to identifying the location of shadows in images, and their removal from an image, the methods described herein can also be applied to other types of data, for example, sensor data such as temperature, movement, location and the like which may indicate the status of devices or structures indicating vital safety information.
Referring to
Referring to
Referring to
Referring to
Referring to
Confidence Evaluation
Referring to
Referring to
C_avg,i = (1/P_i) Σ_{p=0}^{P_i} C_p

where P_i is the number of confidence values C_p. If the average confidence C_avg,i across the pixels (or other information from a sensor) of the data sample is below a threshold λ (step S36), the data sample and mask prediction are discarded (step S37). If C_avg,i is above the threshold λ, the data sample and mask pair are possible candidates for training pairs.
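The threshold test on the average confidence might be sketched as follows (the function names and the list-of-floats representation of per-pixel confidences are illustrative):

```python
def average_confidence(pixel_confidences):
    """C_avg,i: the mean of the per-pixel confidence values C_p."""
    return sum(pixel_confidences) / len(pixel_confidences)

def accept_candidate(pixel_confidences, lam):
    """Keep the data sample/mask pair as a training candidate only if the
    average confidence exceeds the threshold lambda; otherwise discard."""
    return average_confidence(pixel_confidences) > lam
```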
Scene Evaluation
For certain types of data samples, there are a variety of domains that the data sample can occupy. For example, for images 21 having a region representing a shadow, there are a variety of scenarios in which shadow detection can be used. Referring to
A domain or scene understanding model is used to assess the domain that a data sample occupies, and classify the data sample as being from one or more domains in a client. For example, an image 21 of an outdoor space would be input to a scene understanding model and a score indicating that it is a photograph of an outdoor space would be returned. Referring to
The softmax probabilities may be generated (e.g. calculated pixelwise) using the equation:

σ(x_i)_k = exp(x_k) / Σ_{j=1}^{K_i} exp(x_j)

for k = 1, 2, ..., K_i and x_i = (x_1, x_2, ..., x_{K_i}).
Dataset Check
To save space in a database containing the dataset, and also to achieve a diverse dataset to train a model with, the diversity of the dataset can be evaluated, e.g. by averaging the scene scores of the data samples which are present in the dataset. Other metrics can also be used to evaluate the diversity of the dataset, for example, standard deviation, variance, mode, median and other descriptive statistics, which may be weighted accordingly. If a new data sample/training pair has been identified as a training candidate, the training pair's scene score can be compared to one or more diversity metrics of the dataset. If the comparison yields a result that indicates that the training pair would increase the diversity of the dataset, the training pair is added to the dataset; otherwise, the training pair is discarded. For example, if the training pairs are images 21 and shadow masks 23, metrics can be chosen as follows: image embedding vectors (are the images similar?); shadow locations (are the shadows at the same place?); and scene scores (do we have the desired scenes in the batch?). Other metrics may be used. Such a method can also save space in the dataset.
In one example, three metrics may be computed for each incoming data sample/training pair. A fast search of the dataset can be performed to see if there is a similar data sample with the same metrics already in the dataset. If the new data sample/training pair is similar in most metrics (e.g. 2 out of 3), it is discarded. If the data sample/training pair is not similar to anything in the dataset, it is saved to the dataset. Using discriminative metrics saves space and facilitates fast searching of the database for similar data. For example, two different training pairs are unlikely to have the same three metrics.
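This majority-of-metrics rule might be sketched as follows (the `similar` predicate stands in for a per-metric comparison, e.g. an embedding distance below some threshold; all names are illustrative):

```python
def is_duplicate(new_metrics, stored_metrics, similar, majority=2):
    """Treat a candidate as a duplicate of a stored entry if it matches
    in at least `majority` of the metrics (e.g. 2 out of 3)."""
    hits = sum(1 for m_new, m_old in zip(new_metrics, stored_metrics)
               if similar(m_new, m_old))
    return hits >= majority

def should_store(new_metrics, dataset_metrics, similar):
    """Fast dataset check: store the candidate only if no stored entry
    is a near-match on a majority of its metrics."""
    return not any(is_duplicate(new_metrics, m, similar)
                   for m in dataset_metrics)
```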
If the comparison of scene scores reveals that the incoming data sample/training pair is a suitable candidate, but that there is currently not enough space in the database to save the new data sample/training pair to the dataset, then a low priority sample may be removed from the dataset, and the new, higher-priority sample saved.
Referring to
Diversity-Aware Federated Learning
Referring to
Optionally, a differential privacy function may be applied to the diversity score-weighted model weight 43, for example, using the following equation:
ϕ_i : W_i^D → W_i^D + ε, where ε is random noise,

thereby generating a privatised weight 44, ϕ_i, which may be shared among the n clients and servers instead of the model parameters, thus maintaining the privacy of user information or data.
Each of the n clients' diversity score-weighted model weights 43, or the privatised weights 44, are sent to one of the n clients, a different client, or a central server 45. After collecting all the diversity score-weighted model weights 43, the weights are aggregated, generating an aggregated weight Wagg 46. The weighting scheme can be used with any base aggregation method (e.g. FedAvg). Assuming simple averaging as the base aggregation method, Wagg may be calculated as follows:

W_agg = (1/n) Σ_{i=1}^{n} W_i^D
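Assuming simple averaging as the base method, the diversity-weighted aggregation (with the optional noise-based privacy function described above) might be sketched as follows; representing each model's weights as a flat list of floats is an illustrative simplification:

```python
import random

def diversity_weighted_aggregate(weights, diversity_scores, noise_std=0.0):
    """Average the n clients' diversity-score-weighted model weights.

    weights          : list of n weight vectors (one per client)
    diversity_scores : one diversity score d_i per client
    noise_std        : if > 0, Gaussian noise is added to each weighted
                       contribution before sharing (a simple privacy
                       function of the kind described above)
    """
    n = len(weights)
    contributions = []
    for w, d in zip(weights, diversity_scores):
        wd = [d * v for v in w]                       # W_i^D = d_i * W_i
        if noise_std > 0:
            wd = [v + random.gauss(0.0, noise_std) for v in wd]
        contributions.append(wd)
    # W_agg = (1/n) * sum_i W_i^D
    return [sum(c[j] for c in contributions) / n
            for j in range(len(weights[0]))]
```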
This aggregated weight may then be sent to a receiving client 47. The receiving client 47 may be one of the n clients. Thus, the aggregated model weight Wagg now accounts for the diversity of the data in all of the n clients' datasets, so may not be overtrained on data from one particular domain.
Referring to
Clients 40 may be arranged into clusters 48 in dependence on the diversity scores for their datasets 41, or on their diversity-weighted model weights 43. Thus, model weights which represent particular domains, for example, images of documents, are aggregated first, before being aggregated with the weights representative of other domains. In this way, an aggregated model weight 46 can be generated more efficiently.
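The two-level, cluster-then-aggregate scheme might be sketched as follows (again with flat float lists and equal-weight averaging at both levels as illustrative simplifications):

```python
from collections import defaultdict

def clustered_aggregate(weights, diversity_scores, cluster_ids):
    """Hierarchical aggregation: average the diversity-score-weighted
    weights within each cluster first, then average the per-cluster
    results to obtain the final aggregated weight."""
    clusters = defaultdict(list)
    for w, d, c in zip(weights, diversity_scores, cluster_ids):
        clusters[c].append([d * v for v in w])       # W_i^D = d_i * W_i
    cluster_means = [
        [sum(w[j] for w in ws) / len(ws) for j in range(len(ws[0]))]
        for ws in clusters.values()
    ]
    k = len(cluster_means)
    return [sum(cm[j] for cm in cluster_means) / k
            for j in range(len(cluster_means[0]))]
```

Note that with this design a small cluster (e.g. one client holding document images) contributes as much to the final weight as a large cluster, which is one way the clustering can counteract domain over-representation.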
Modifications
It will be appreciated that various modifications may be made to the embodiments hereinbefore described. Such modifications may involve equivalent and other features which are already known in the design and use of methods for federated aggregation and machine learning and component parts thereof and which may be used instead of or in addition to features already described herein. Features of one embodiment may be replaced or supplemented by features of another embodiment.
Although claims have been formulated in this application to particular combinations of features, it should be understood that the scope of the disclosure of the present invention also includes any novel features or any novel combination of features disclosed herein either explicitly or implicitly or any generalization thereof, whether or not it relates to the same invention as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as does the present invention. The applicants hereby give notice that new claims may be formulated to such features and/or combinations of such features during the prosecution of the present application or of any further application derived therefrom.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2206843.1 | May 2022 | GB | national |