The present specification relates to domain adaptation, for example for adapting computational models for classifying data which may be received from one or more sensors.
A computational model, e.g. provided on an encoder, may be trained using labelled training data, for example using the principles of machine learning. If data applied to such a computational model during a training phase has similar properties to data applied during a deployment phase, then high performing models can be provided. This is not always the case in real-world systems. There remains a need for further developments in this field.
The scope of protection sought for various aspects of the invention is set out by the independent claims. The aspects and features, if any, described in this specification that do not fall under the scope of the independent claims are to be interpreted as examples useful for understanding the various aspects of the invention.
According to a first aspect, this specification describes an apparatus, comprising means for: providing a source dataset comprising a plurality of source data items associated with a source domain; providing a target dataset comprising a plurality of target data items associated with a target domain; providing a first computational model (34, 41) associated with the source domain dataset, the first computational model being associated with a plurality of source domain classes; generating, for each of a series of target data items xt input to the first computational model (34, 41), a target weight δT indicative of a confidence value that said target data item belongs to a class which is shared with known classes of the first computational model; generating, for each of a series of source data items xs input to the first computational model (34, 41), a source weight δS indicative of a confidence value that said source data item belongs to a known class of the first computational model (34, 41), shared with the target domain; adapting at least part of the first computational model (34, 41) by means of one or more processors, to generate a second computational model by training a discriminator (42) to seek to decrease a discriminator loss function, the discriminator loss function being computed using the source and target data items xs, xt, respectively weighted by the source and target weights δS, δT; and deploying the second computational model for use in receiving one or more input data items associated with the target domain for producing an inference output.
The source and target datasets may comprise respective first and second sets of audio data items, and wherein the second computational model is an adapted audio classifier comprising at least one class shared with known classes of the first computational model. The first set of audio data items may represent audio data received under one or more first conditions and wherein the second set of audio data items may represent audio data received under one or more second conditions, wherein the first and second conditions comprise differences in terms of their respective ambient noise and/or microphone characteristics. The first and second sets of audio data items may represent speech, e.g. one or more keywords.
The respective first and second sets of audio data items may represent speech in a particular language having different accents. The respective first and second sets of audio data items may represent speech received by people of different genders and/or age groups.
The second computational model may be configured for use with a digital assistant apparatus for performing one or more processing actions based on received speech associated with the target domain. The source and target datasets may comprise respective first and second sets of video data items, and wherein the second computational model may be an adapted video classifier comprising at least one class shared with known classes of the first computational model. The respective first and second sets of video data items may represent video data received under first and second conditions, wherein the first and second conditions may comprise differences in terms of their respective lighting, camera, and/or image capture characteristics. The first set of video data items may represent video data associated with movement of a first type of object and the second set of video data items may represent video data associated with movement of a second type of object.
The source and target datasets may comprise respective first and second physiological data items, received from one or more sensors, and wherein the second computational model may be an adapted health or fitness-related classifier comprising at least one class shared with known classes of the first computational model.
The means for generating the target weight and for generating the source weights may be configured to use a probability distribution produced by inputting one or more target data items to the first computational model. The apparatus may further comprise a first classifier means for computing the target weight, the first classifier means being a computational model trained using a filtered subset of target data items based on the produced probability distribution. The apparatus may be configured for providing the filtered subset of target data items by: generating, using the first computational model, a probability distribution over the known source domain classes for a particular target data item; determining a confidence level for the particular target data item belonging to a source domain class using the generated probability distribution; and selecting the particular target data item for the subset if the confidence level is above an upper confidence level threshold or below a lower confidence level threshold. The confidence level may be determined using the difference between the two largest values of the generated probability distribution. The first classifier means may be configured as a binary classifier for computing a target weight of ‘1’ for indicating that a particular target data item belongs to a shared target domain class and ‘0’ for indicating that a target data item belongs to a private target domain class. The apparatus may further comprise a second classifier means for computing the source weight, the second classifier means being a computational model trained using a filtered subset of the source domain data items.
The apparatus may be configured for filtering the source data items by: inputting a batch of target data items to the first trained model to generate respective probability distributions; aggregating the probability distributions; identifying a subset of the source domain classes based on the aggregated probability distributions, including a predetermined number of largest value and lowest value classes; and selecting source data items associated with the identified subset of source domain classes.
The second classifier means may be configured as a binary classifier for computing a source weight of ‘1’ for indicating that a particular source data item belongs to a known class of the first computational model shared with the target domain and ‘0’ for indicating that a particular source data item belongs to a private source domain class.
The first computational model may comprise a feature extractor associated with the source domain dataset, and wherein the means for adapting the first computational model comprises means for updating weights of the feature extractor based on the computed discriminator loss function. The first computational model may further comprise a classifier for receiving feature representations from the feature extractor, and wherein the means for adapting the first computational model further may comprise determining a classification loss resulting from updating weights of the feature extractor and further updating the weights of the feature extractor based on the classification loss.
The apparatus may further comprise means to enable adaptation of the first computational model automatically, responsive to identifying that one or more conditions under which the set of target data items were produced are different from one or more conditions under which the set of source data items were produced. The enabling means may be configured to identify different characteristics of one or more sensors used for generating the respective sets of target data items and source data items. The enabling means may be configured to access metadata respectively associated with the source and target data items indicative of the one or more conditions under which the sets of source and target data items were produced.
According to a second aspect, this specification describes a method, comprising: providing a source dataset comprising a plurality of source data items associated with a source domain; providing a target dataset comprising a plurality of target data items associated with a target domain; providing a first computational model (34, 41) associated with the source domain dataset, the first computational model being associated with a plurality of source domain classes; generating, for each of a series of target data items xt input to the first computational model (34, 41), a target weight δT indicative of a confidence value that said target data item belongs to a class which is shared with known classes of the first computational model; generating, for each of a series of source data items xs input to the first computational model (34, 41), a source weight δS indicative of a confidence value that said source data item belongs to a known class of the first computational model (34, 41), shared with the target domain; adapting at least part of the first computational model (34, 41) by means of one or more processors, to generate a second computational model by training a discriminator (42) to seek to decrease a discriminator loss function, the discriminator loss function being computed using the source and target data items xs, xt, respectively weighted by the source and target weights δS, δT; and deploying the second computational model for use in receiving one or more input data items associated with the target domain for producing an inference output.
The source and target datasets may comprise respective first and second sets of audio data items, and wherein the second computational model may be an adapted audio classifier comprising at least one class shared with known classes of the first computational model. The first set of audio data items may represent audio data received under one or more first conditions and wherein the second set of audio data items may represent audio data received under one or more second conditions, wherein the first and second conditions comprise differences in terms of their respective ambient noise and/or microphone characteristics.
The first and second sets of audio data items may represent speech, e.g. one or more keywords. The respective first and second sets of audio data items may represent speech in a particular language having different accents. The respective first and second sets of audio data items may represent speech received by people of different genders and/or age groups.
The second computational model may be for use with a digital assistant apparatus for performing one or more processing actions based on received speech associated with the target domain.
The source and target datasets may comprise respective first and second sets of video data items, and wherein the second computational model may be an adapted video classifier comprising at least one class shared with known classes of the first computational model.
The respective first and second sets of video data items may represent video data received under first and second conditions, wherein the first and second conditions may comprise differences in terms of their respective lighting, camera, and/or image capture characteristics. The first set of video data items may represent video data associated with movement of a first type of object and the second set of video data items may represent video data associated with movement of a second type of object. The source and target datasets may comprise respective first and second physiological data items, received from one or more sensors, and wherein the second computational model may be an adapted health or fitness-related classifier comprising at least one class shared with known classes of the first computational model.
Generating the target and source weights may use a probability distribution produced by inputting one or more target data items to the first computational model. The method may further comprise using a first classifier for computing the target weight, the first classifier being a computational model trained using a filtered subset of target data items based on the produced probability distribution. The filtered subset of target data items may be obtained by: generating, using the first computational model, a probability distribution over the known source domain classes for a particular target data item; determining a confidence level for the particular target data item belonging to a source domain class using the generated probability distribution; and selecting the particular target data item for the subset if the confidence level is above an upper confidence level threshold or below a lower confidence level threshold. The confidence level may be determined using the difference between the two largest values of the generated probability distribution.
The first classifier may be configured as a binary classifier for computing a target weight of ‘1’ for indicating that a particular target data item belongs to a shared target domain class and ‘0’ for indicating that a target data item belongs to a private target domain class. The method may further comprise using a second classifier for computing the source weight, the second classifier being a computational model trained using a filtered subset of the source domain data items. Source data items may be filtered by: inputting a batch of target data items to the first trained model to generate respective probability distributions; aggregating the probability distributions; identifying a subset of the source domain classes based on the aggregated probability distributions, including a predetermined number of largest value and lowest value classes; and selecting source data items associated with the identified subset of source domain classes.
The second classifier may be configured as a binary classifier for computing a source weight of ‘1’ for indicating that a particular source data item belongs to a known class of the first computational model shared with the target domain and ‘0’ for indicating that a particular source data item belongs to a private source domain class.
The first computational model may comprise a feature extractor associated with the source domain dataset, wherein adapting the first computational model may comprise updating weights of the feature extractor based on the computed discriminator loss function. The first computational model may further comprise a classifier for receiving feature representations from the feature extractor, wherein adapting the first computational model may further comprise determining a classification loss resulting from updating weights of the feature extractor and further updating the weights of the feature extractor based on the classification loss.
The method may further comprise performing adaptation of the first computational model automatically, responsive to identifying that one or more conditions under which the set of target data items were produced are different from one or more conditions under which the set of source data items were produced. Adaptation may be performed responsive to identifying different characteristics of one or more sensors used for generating the respective sets of target data items and source data items. The method may further comprise accessing metadata respectively associated with the source and target data items indicative of the one or more conditions under which the sets of source and target data items were produced.
According to a third aspect, this specification describes a computer program comprising computer-readable instructions which, when executed by a computing apparatus, cause the computing apparatus to perform any method as described with reference to the second aspect.
According to a fourth aspect, this specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to perform any method as described with reference to the second aspect.
Examples will now be described, by way of example only, with reference to the accompanying drawings, in which:
In the description and drawings, like reference numerals refer to like elements throughout.
Example aspects relate to domain adaptation in the field of machine learning, for example for the purpose of mitigating so-called domain shift which can lead to performance degradation in practical implementations as will be explained below.
Example aspects may relate to domain adaptation for one or more specific technical purposes, for example relating to computational models for classifying data items which represent, or are generated by, real-world and/or technical entities, such as one or more electrical or electronic sensors. A sensor may comprise one or more of a microphone, camera, video camera, light sensor, heat sensor, geospatial positioning sensor, orientation sensor, accelerometer and a physiological sensor such as for estimating heart rate, blood pressure, temperature, an electrocardiogram (ECG) or the like. Specific examples may include one or more of (i) classifying audio data (e.g. music or speech), (ii) classifying video data (e.g. data representing captured images of people or objects), (iii) classifying technical or physiological performance data (e.g. representing health or fitness-related data derived from one or more body-worn sensors) and (iv) classifying data from one or more sensors associated with industrial machinery or processes. All such examples, as well as others, are susceptible to so-called domain shift, as will be explained below.
Machine Learning (ML) algorithms, as data-driven computational methods, typically attempt to fit a complicated function over a labelled dataset, e.g. a set of training data, with the expectation that comparable performance will be achieved when an unseen dataset, e.g. test data or operational data, is applied to the trained algorithm. Such training algorithms may be referred to as supervised learning algorithms in which a labelled training set is used to learn a mapping between input data and class labels.
In both theory and practice, machine learning and supervised learning methodologies typically assume that the data distributions of training datasets and deployment (e.g. testing) datasets are the same. Thus, in the example systems 10A and 10B, it may be assumed that the distribution of the input data of the system 10B matches the distribution of the labelled data in the system 10A. The labelled data and the input data are said to belong to the same domain.
Following this assumption, labelled training sets may be provided for each of a plurality of data distributions, even though many of these data distributions may be similar. Example sets of data for which separate labelled training data might be generated include images of the same object from different angles, paintings in different styles, human activity sensors in different body locations, processing of the same language with different accents and so on.
In real-world systems, the assumption that the data distributions of training datasets and deployment (e.g. testing) datasets are the same is not always valid.
A number of real-world factors may lead to variability between training and test data distributions. These factors could include, for example, variabilities induced by sensor processing pipelines, by environmental factors, e.g. lighting conditions, by user-related issues, e.g. different people wear their smart devices differently, and/or by audio data representing speech being from users with different accents. This shift in data distribution between training domains and testing/deployment domains is sometimes referred to as “domain shift”.
As discussed further below, “domain adaptation” seeks to address the issue of domain shift. In general, domain adaptation involves two similar (but different) domains, referred to herein as a source domain and a target domain. Data instances in the source domain are typically labelled (providing labelled training data for a source model), whereas data instances in the target domain are partially labelled (semi-supervised domain adaptation) or not labelled at all (unsupervised domain adaptation). The aim of domain adaptation is to train a target model, e.g. another encoder, by utilizing aspects of the source model.
Thus, instead of training each data distribution (or “domain”) from scratch, domain adaptation seeks to develop a target model by adapting an already-trained source model. This can lead to a reduction in labelling efforts, thereby providing processing and memory efficiencies, and, in some circumstances, to the development of more robust models. However, adaptation merely by aligning feature representations of a source dataset to a target dataset, without accounting for the existence of non-shared private classes in one or both domains, can have negative consequences, even leading to worse performance than if no adaptation were performed.
To illustrate,
Clearly, area 324 identifies the shared classes between the source and target domains 320, 322 including WAKE, PLAY and STOP. The source domain 320 has private classes SAVE and COPY and the target domain 322 has private classes GO and RETURN. Example aspects aim to distinguish, as part of an iterative training process, those classes that are shared and private among the source and target domains 320, 322 in order to provide a more robust adaptation.
The algorithm 20 starts at operation 22, where a source encoder has been trained at some earlier point to provide a source domain model. The source encoder may, for example, be trained using labelled training data, as described above with reference to
The source encoder when trained represents a computational model which may be referred to as a source domain model. The source domain model may be usable in a subsequent inference phase for generating output data representing, for example, a prediction of a class that the input or test data belongs to. The source domain model may, for example, include one or more sub-models including a feature extractor for generating feature representations of input data, and a classifier.
At a later time, when adaptation of the source domain model is required for use with target domain data, respective operations 24 and 26 provide source domain data and target domain data. Source domain data may comprise one or more data items of a source domain dataset which corresponds (or closely corresponds) to that used to train the source domain model. For example, speech data from users with the same or closely similar accents. Operations 24 and 26 may be performed in parallel or in any order.
At operation 28, the source domain model may be adapted as described herein using the provided source and target domain data as training data. For example, a feature extractor and/or classifier may be iteratively updated (indicated by arrow) based on sequences of source and target domain data items. The feature extractor and/or classifier may be considered as sub-models, having their own parameters or weights. Updating may involve determining updated parameters or weights. The aim is to ‘align’ or shift the source domain model so that it can be used in an inference operation 29 later on, for example in which the adapted model is deployed (i.e. made effective for inference purposes) on an encoder for receiving target domain test data, which may be received from one or more sensors either directly or indirectly in real-time or near real-time. Note that, in some example aspects, the source domain model need not comprise a copy of the comprehensively trained source domain model on the source encoder 22 for adaptation, although this is one option.
Rather, another form of source domain model may comprise a set of arbitrary, e.g. random or pseudo-random, parameters, initialized based on the source domain model to have the same trained classes. As the operations disclosed herein are performed, the target domain model initialized in this way should converge as described.
As used herein, the term “provide” may also comprise “receive” or “generate”.
Example aspects may involve adapting the source domain model to provide a target domain model, which may include estimating so-called shared classes, which are known classes of the source domain model that at least some of the target domain data also belong to. Example aspects may involve adapting those classes to have more significant weights as opposed to those of non-shared classes, so that subsequent inputs relating to private classes can be identified and labelled appropriately, e.g. as unknown.
One way to perform adaptation is to align feature representations of the source and target domains; that is, the feature representations of subsequent target data items are aligned or shifted by training a feature extractor so that a downstream classifier will map the shifted feature representation of a given target data item to the correct class, provided it belongs to a shared class. For source domain classes not represented in the target domain data, and vice versa, the adaptation process should not perform such alignment. For this purpose, one or more private classes are estimated.
In overview, therefore, example aspects aim to estimate shared and private classes and appropriately weight their contribution in the adaptation process to counter so-called label mismatch. In the inference phase, test data associated with a private class can be classified as unknown. In this way, known issues associated with negative transfer, whereby an adapted model may perform worse than the original model, may be avoided. Also, by determining certain data items to be within a private class, and labelling them as unknown, we reduce the risk that data items are incorrectly labelled with a source domain class; such mislabelling can seriously affect the performance of subsequent applications that rely on class labels to perform operations.
Example aspects relate to training a second computational model by adapting a first, already trained computational model associated with a source domain. The second computational model may be initialized using the first computational model and thereafter iteratively adapted using a weighted loss function that uses target and source weights to indicate a confidence level that a particular target and source data item belongs to a shared class. A higher confidence level is so indicative, whereas a lower confidence level is indicative of the particular data item belonging to a private class.
In example aspects, there may be provided source data items Xs and labels Ys, sampled from a probability distribution S, and target data items Xt sampled from a probability distribution T. No labels are available from the target domain during training. We may denote the label sets of the source and target domains as CS and CT respectively. The set of classes shared between source and target domains may be denoted by CShared. Finally, C′S and C′T may represent the private label sets of the source and the target domains.
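In set notation, these label sets may be summarized as CShared = CS ∩ CT, with the private label sets given by C′S = CS \ CShared and C′T = CT \ CShared.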
The algorithm 20 may be implemented, comprising one or more processors or controllers under computer program control for performing operations described herein, for example, an apparatus comprising at least one processor, and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform defined functions and/or operations.
In some example aspects, the adaptation system 30 may be enabled or triggered to perform the adaptation operations described herein, automatically, responsive to identifying that one or more conditions under which the target data items Xt were received are different from one or more conditions under which the source data items Xs are or were received.
For example, a source or target computer system, or any system associated with the adaptation system 30, may store source model metadata indicative of one or more characteristics of one or more sensors used for generating the source data items. If the corresponding one or more characteristics of sensors used for generating the target data items are different, or different beyond a predetermined threshold in terms of values, then the adaptation system 30 may be enabled. For example, in the case of audio or video data, the characteristics may relate to particular models or types of microphone or camera; if a first type or model is used to capture the source data items XS and a second type or model is used to capture the target data items Xt then the identified difference may be sufficient to trigger model adaptation by the adaptation system 30. In some aspects, other characteristics such as time or date of capture, lighting conditions, ambient noise conditions, and so on may be parameterized and stored as source model metadata for use in subsequently determining if and when to enable or trigger the adaptation system 30 based on corresponding characteristics identified for the target data items Xt.
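Merely for illustration, such an enabling or triggering decision may be implemented along the following lines; the metadata fields, numeric threshold and function name in this Python sketch are hypothetical assumptions and not prescribed by this specification.

```python
def should_trigger_adaptation(source_meta: dict, target_meta: dict,
                              numeric_threshold: float = 0.1) -> bool:
    """Compare stored source-model metadata with target metadata to decide whether
    to enable the adaptation system."""
    for key, source_value in source_meta.items():
        target_value = target_meta.get(key)
        if target_value is None:
            continue  # characteristic not recorded for the target data items
        if isinstance(source_value, (int, float)):
            # e.g. sampling rate or sensor resolution: compare relative difference
            if abs(source_value - target_value) > numeric_threshold * abs(source_value):
                return True
        elif source_value != target_value:
            # e.g. different microphone or camera models
            return True
    return False


# Example: a change of microphone model triggers adaptation.
source_meta = {"microphone_model": "mic-A", "sampling_rate_hz": 16000}
target_meta = {"microphone_model": "mic-B", "sampling_rate_hz": 16000}
assert should_trigger_adaptation(source_meta, target_meta)
```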
The adaptation system 30 may comprise a sampling subsystem 32, a feature extraction subsystem 34 and an adaptation subsystem 36 according to example aspects.
The sampling subsystem 32 is optional and may be configured to, for example, resample time-varying data to a particular frequency, or, in the case of image data, resize the images.
The feature extraction subsystem 34 may be configured to perform feature extraction using any known means, for example to extract statistical features such as mean, variance and/or task-specific features such as Mel-Frequency Cepstral Coefficients (MFCC) for speech models.
The adaptation subsystem 36 may be configured to perform adaptation of, or based on, an already-trained source domain model to produce a target domain model, as described herein.
The adaptation system 30 may comprise one or more processors or controllers under computer program control for performing operations described herein, for example, an apparatus comprising at least one processor, and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform defined functions and/or operations.
In terms of its implementation on hardware, the components may be distributed so that certain functions are performed on one item of hardware and other functions are performed on one or more other items of hardware. The different items of hardware need not be local to one another, and certain intercommunications may be performed over one or more data networks between one or more remote locations.
The feature extraction subsystem 34 is shown as part of the
The feature extractor 34 is a computational model for generating a meaningful feature representation z=F(x) 40 for training and inference purposes. As a computational model, the feature extractor 34 may therefore comprise a set of parameters or weights WF which can be iteratively adjusted (trained) according to a loss function as part of, for example, a gradient descent algorithm to reduce or minimize the loss function.
During adaptation, both target and source data items Xt and XS may be provided to the feature extractor 34.
Part of adapting the source domain model may comprise adapting a copy of its feature extractor by modifying weights WF such that target data items Xt result in feature representations more aligned with source data items belonging to a shared class CShared.
The adaptation subsystem 36 may also comprise a classifier 41. The classifier 41 may be a probabilistic classifier and may also comprise a computational model for generating, from received feature representations z from the feature extractor 34, a probability distribution ŷ over a set of classes. As a computational model, the classifier 41 may therefore comprise a set of parameters or weights WG which can be iteratively adjusted (trained) according to a classifier loss function Lcls 45 as part of, e.g. a gradient descent algorithm, to reduce or minimize the classifier loss function.
For example, part of adapting the source domain model may comprise adapting a copy of its classifier by modifying weights WG such that target data items Xt produce from the classifier 41 a probability distribution in which higher probability values are assigned to a shared class CShared. The probability distribution ŷ may comprise a SoftMax probability distribution.
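Merely for illustration, the feature extractor 34 and classifier 41 may be implemented along the following lines; the layer sizes, input dimension and number of classes in this Python (PyTorch) sketch are illustrative assumptions only.

```python
import torch
import torch.nn as nn


class FeatureExtractor(nn.Module):
    """Computational model F with weights WF, producing feature representations z = F(x)."""
    def __init__(self, input_dim: int = 40, feature_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, feature_dim), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # feature representation z


class Classifier(nn.Module):
    """Probabilistic classifier G with weights WG, producing a SoftMax distribution over classes."""
    def __init__(self, feature_dim: int = 128, num_classes: int = 10):
        super().__init__()
        self.head = nn.Linear(feature_dim, num_classes)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.head(z), dim=-1)  # probability distribution y_hat = G(z)
```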
The adaptation subsystem 36 may also comprise a discriminator 42 for adversarial learning. Discriminators such as the discriminator 42 are used in generative adversarial networks (GANs) and comprise a computational model for generating, from received feature representations z, data indicative of whether a particular data item is associated with the source domain or some other domain, for example the target domain. An aim of the discriminator 42 is to separate source features from target features by said adversarial learning. As a computational model, the discriminator 42 may therefore comprise a set of parameters or weights Wadv which can be iteratively adjusted (trained) according to an adversarial loss function Ladv 46 as part of, e.g. a gradient descent algorithm, to reduce or minimize the adversarial loss function.
With knowledge of the value of Ladv 46 as training progresses, the discriminator 42 can be iteratively updated to improve separation of the source and target features. Further, by reversing the Ladv 46 gradient, i.e. multiplying it by minus 1, we obtain a reverse gradient representing the feature loss LFeature 47.
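Merely for illustration, such gradient reversal may be implemented using a gradient reversal layer, for example along the lines of the following PyTorch sketch; this is one non-limiting option.

```python
import torch


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the incoming gradient by -1 in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output  # reversed adversarial gradient, used as the feature loss signal


def grad_reverse(z: torch.Tensor) -> torch.Tensor:
    """Apply gradient reversal to feature representations z before the discriminator."""
    return GradReverse.apply(z)
```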
LFeature 47 may be used to update the weights WF of the feature extractor 34 in order to bring source and target feature representations closer together as part of the above-mentioned feature alignment. However, as noted above, this is desirable only for the feature representations associated with shared classes, not for those associated with private classes.
Therefore, as part of the feature alignment process, the adaptation subsystem 36 is configured to place a higher importance or weight on shared classes in the feature alignment process than on private classes. Thus, the adversarial loss function Ladv 46 that the discriminator 42 acts to minimize is formulated to include first and second weighting terms δS and δT, respectively referred to as source and target weights.
For example, the adversarial loss function Ladv 46 may take the form:
where δS and δT are weights assigned to source and target data items respectively.
By assigning higher weights to data items from shared classes and lower weights to data items from private classes in the relevant domain, the adaptation process may be improved.
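Merely for illustration, one possible binary cross-entropy formulation of such a weighted adversarial loss is sketched below in PyTorch; the exact form of the loss in (1) may differ, and this sketch assumes a discriminator 42 whose output is a sigmoid probability of the source domain.

```python
import torch
import torch.nn.functional as F


def weighted_adversarial_loss(d_out_source: torch.Tensor,
                              d_out_target: torch.Tensor,
                              delta_s: torch.Tensor,
                              delta_t: torch.Tensor) -> torch.Tensor:
    """Discriminator loss: source items are labelled as domain '1' and target items as
    domain '0', with each item's contribution weighted by delta_S or delta_T."""
    loss_source = F.binary_cross_entropy(
        d_out_source, torch.ones_like(d_out_source), weight=delta_s)
    loss_target = F.binary_cross_entropy(
        d_out_target, torch.zeros_like(d_out_target), weight=delta_t)
    return loss_source + loss_target
```

Minimizing such a loss trains the discriminator 42 to separate source and target features, while the reversed gradient (LFeature 47) updates the weights WF of the feature extractor 34 so that alignment is driven predominantly by the highly weighted, shared-class data items.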
Referring back to
An operation 52 may comprise providing a source dataset comprising a plurality of source data items Xs associated with a source domain.
An operation 53 may comprise providing a target dataset comprising a plurality of target data items Xt associated with a target domain.
An operation 54 may comprise providing a first computational model associated with the source domain dataset. The first computational model may comprise a dataset or file defining nodes and parameters (e.g. weights) that may be transferred from one computational item, e.g. an encoder, to another. In some example aspects, the first computational model may be a trained computational model. Alternatively, in some example aspects, the first computational model may be a model initialized with random or pseudo-random parameters but having the same source domain classes associated with the source domain dataset as previously trained.
It should be appreciated that the operations 52, 53, 54 may be performed in parallel, substantially simultaneously or in any order.
An operation 55 may comprise generating, for each of a series of target data items Xt input to the first computational model, a target weight. The target weight may be indicative of a confidence value that said target data item belongs to a known class of the first computational model.
An operation 56 may comprise generating, for each of a series of source data items XS input to the first computational model, a source weight. The source weight may be indicative of a confidence value that said source data item belongs to a known class of the first computational model shared with the target domain.
It should be appreciated that the operations 55, 56 may be performed in parallel, substantially simultaneously or in any order.
An operation 57 may comprise adapting the first trained computational model to generate a second computational model by training a discriminator to seek to decrease a discriminator loss function, the discriminator loss function being computed using the source and target data items, respectively weighted by the source and target weights, for example as in (1).
The operations 52-57 may be performed on any of hardware, software, firmware or any combination thereof, for example, the operations may be implemented, comprising one or more processors or controllers under computer program control for performing operations described herein, for example, an apparatus comprising at least one processor, and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform defined functions and/or operations.
As will become clear, the means for generating the target weight and for generating the source weight may be configured to use a probability distribution produced by inputting one or more target data items to the first computational model.
Example methods for generating target and source weights δT, δS will now be described.
Determining Target Weights δT
Referring back to
It is assumed that the classifier 41 will be more confident in its predictions for target data items Xt from shared classes CShared as compared to those from the private classes C′T. This is reasonable because, despite the presence of domain shift, classes in CShared are likely to be closer to the source domain as compared to private classes C′T. Hence, a measure of classifier confidence can be derived as a weighting function to separate shared and private target classes during adaptation using the discriminator 42.
A so-called Maximum Margin (MM) method may be used as a criterion for classifier confidence. Formally, a Margin M may be defined as the difference between the top two SoftMax outputs in the probability distribution ŷ. When the classifier 41 has high confidence about its top prediction, M will be high. Conversely, when the classifier 41 is less confident, M will be low. However, due to the presence of domain shift between the source and target domains, the margin M obtained on target data items could be noisy and may lead to incorrect target weights δT.
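Merely for illustration, the margin M may be computed from the probability distribution ŷ along the following lines in PyTorch:

```python
import torch


def maximum_margin(y_hat: torch.Tensor) -> torch.Tensor:
    """Margin M: difference between the two largest SoftMax outputs for each data item."""
    top2 = torch.topk(y_hat, k=2, dim=-1).values  # values are returned in descending order
    return top2[..., 0] - top2[..., 1]
```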
In example aspects, rather than using the margins M or class probabilities directly, the adaptation subsystem 36 is configured to filter target data items with very high (and very low) margins M for training another form of classifier model, namely the margin predictor 44 mentioned above. Target data items Xt from private target classes (which will exhibit both covariate shift and no semantic overlap with source classes (i.e. concept shift)) are likely to have very low margins. Conversely, target data items Xt from shared target classes (which will exhibit only covariate shift, but no concept shift) are likely to have higher margins M. Hence, by filtering the target data items Xt and training the margin predictor 44 based on the filtered target data items, we can derive better target weights δT for the adversarial loss function Ladv 46.
In some aspects, the margin predictor 44 may be configured as a binary classifier, outputting a “1” for high probability of belonging to a shared target class and a “0” for a low probability.
An operation 62 may comprise providing a data item Xt from the target dataset.
An operation 63 may comprise generating, using the first computational model, a probability distribution ŷ over the known source domain classes for a particular target data item.
An operation 64 may comprise determining a confidence level (M) for the particular target data item belonging to a source (i.e. shared) domain class using the generated probability distribution; and
An operation 65 may comprise selecting the particular target data item for the subset if the confidence level (M) is above an upper confidence level threshold or below a lower confidence level threshold.
The operations 62-65 may be performed on any of hardware, software, firmware or any combination thereof, for example, the operations may be implemented, comprising one or more processors or controllers under computer program control for performing operations described herein, for example, an apparatus comprising at least one processor, and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform defined functions and/or operations. The one or more processors or controllers may include one or more processing units, such as graphical processing units (GPUs).
In more detail, for a batch of target data items input into the training pipeline represented in
where LBCE denotes the Binary Cross-Entropy Loss. The margin predictor 44 may be iteratively trained to reduce LMP 48.
As is evident, the margin predictor 44 may be trained to predict “1” when it is fed a target data item xt having a high margin (i.e., high confidence in prediction) and “0” when it encounters a target data item xt having a low margin (i.e., low confidence in prediction). Therefore, the output of the margin predictor 44 may be used directly for the target weight δT as it satisfies the weighting criterion of shared and private classes.
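Merely for illustration, the filtering of target data items and the training of the margin predictor 44 to reduce LMP 48 may proceed along the following lines; the network architecture, margin thresholds and learning rate in this PyTorch sketch are illustrative assumptions, and the exact form of LMP may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative margin predictor: a small binary classifier over feature representations z.
margin_predictor = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())
mp_optimizer = torch.optim.Adam(margin_predictor.parameters(), lr=1e-3)


def margin_predictor_step(z_t: torch.Tensor, margins: torch.Tensor,
                          upper: float = 0.8, lower: float = 0.2) -> torch.Tensor:
    """One training step: keep only target items with very high or very low margins
    (e.g. computed as in the earlier sketch), label them '1' / '0' and minimize a
    binary cross-entropy loss corresponding to L_MP."""
    high = margins > upper   # confident predictions, likely shared classes
    low = margins < lower    # unconfident predictions, likely private classes
    keep = high | low
    if keep.sum() == 0:
        return torch.tensor(0.0)  # nothing filtered in this batch
    labels = high[keep].float().unsqueeze(-1)   # '1' for high margin, '0' for low margin
    preds = margin_predictor(z_t[keep])         # estimates of the target weight delta_T
    loss_mp = F.binary_cross_entropy(preds, labels)
    mp_optimizer.zero_grad()
    loss_mp.backward()
    mp_optimizer.step()
    return loss_mp
```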
Merely for illustration,
The margin predictor 44 may be implemented, comprising one or more processors or controllers under computer program control for performing operations described herein, for example, an apparatus comprising at least one processor, and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform defined functions and/or operations.
Determining Source Weights δS
In example aspects, the source weights δS are determined based on another property of the probability distribution ŷ. Particularly, it is noted that source classes CShared which are shared with the target domain will have higher probabilities in ŷ and the private source classes C′S will have lower probabilities. This is reasonable because target data Xt has no overlap with private source classes, and hence the classifier 41 should estimate low probabilities for C′S.
Thus, by observing the probability distribution ŷ over source classes, it is possible to distinguish shared and private source classes and assign appropriate weights to them also. However, once again, due to domain shift and the presence of private classes, these class probabilities tend to be noisy.
Hence, example aspects follow a similar approach which may include filtering source domain data items xs with class probabilities at the extremes (e.g. top-K classes and bottom-K classes) and then training another form of classifier model, namely the source predictor 43, to predict whether a source data item xs belongs to one of the shared classes or private classes.
For each target data item Xt in a batch B, we may compute class probabilities ŷi = G(F(xti)) and average them over the entire batch to obtain a mean class probability vector η.
Then, we may obtain those classes with extreme per-class probabilities (e.g. the top-K and bottom-K classes) by analyzing the per-class probabilities in η. This process filters out potentially noisy classes and provides us with a more robust estimate for CShared.
Having identified the top-K and bottom-K classes, source data items Xs belonging to these classes may be used to train the source predictor 43.
In some aspects, the source predictor 43 may be configured as a binary classifier. A label “1” may be allocated to data items from the top-K classes and a label “0” to the bottom-K classes. The source predictor 43 can be iteratively trained to reduce LSP 49 as follows:
As is evident, the source predictor 43 is trained to predict “1” for source data items in CShared and “0” for those in private classes C′S.
The outputs of the source predictor 43 may be used as the source weights δS.
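Merely for illustration, the filtering of source data items and the training of the source predictor 43 to reduce LSP 49 may proceed along the following lines; the value of K, the network architecture and the learning rate in this PyTorch sketch are illustrative assumptions, and the exact form of LSP may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative source predictor: a small binary classifier over source feature representations.
source_predictor = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())
sp_optimizer = torch.optim.Adam(source_predictor.parameters(), lr=1e-3)


def source_predictor_step(y_hat_target: torch.Tensor, z_s: torch.Tensor,
                          labels_s: torch.Tensor, k: int = 5) -> torch.Tensor:
    """One training step: average the target class probabilities over the batch (eta),
    identify the top-K and bottom-K source classes, and train on source items from
    those classes with labels '1' and '0' respectively."""
    eta = y_hat_target.mean(dim=0)              # mean class probability vector
    top_k = torch.topk(eta, k).indices          # likely shared source classes
    bottom_k = torch.topk(-eta, k).indices      # likely private source classes
    shared_mask = torch.isin(labels_s, top_k)
    private_mask = torch.isin(labels_s, bottom_k)
    keep = shared_mask | private_mask
    if keep.sum() == 0:
        return torch.tensor(0.0)  # no source items from the selected classes in this batch
    targets = shared_mask[keep].float().unsqueeze(-1)   # '1' for top-K, '0' for bottom-K
    preds = source_predictor(z_s[keep])                 # estimates of the source weight delta_S
    loss_sp = F.binary_cross_entropy(preds, targets)
    sp_optimizer.zero_grad()
    loss_sp.backward()
    sp_optimizer.step()
    return loss_sp
```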
An operation 92 may comprise providing a batch B of data items Xt from the target dataset.
An operation 93 may comprise generating respective probability distributions ŷ over the source domain classes.
An operation 94 may comprise aggregating the probability distributions ŷ, e.g. averaging them over the batch B.
An operation 95 may comprise identifying a subset of the source domain classes based on the aggregated probability distributions, including a predetermined number of largest value and lowest value classes.
An operation 96 may comprise selecting source data items associated with the identified subset of source domain classes for training the source predictor 43.
The operations 92-96 may be performed on any of hardware, software, firmware or a combination thereof, for example, the operations may be implemented, comprising one or more processors or controllers under computer program control for performing operations described herein, for example, an apparatus comprising at least one processor, and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform defined functions and/or operations.
Merely for illustration,
The example source predictor 43 may be implemented, for example, comprising one or more processors or controllers under computer program control for performing operations described herein, for example, an apparatus comprising at least one processor, and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform defined functions and/or operations.
The various computational sub models may be implemented, for example, comprising one or more processors or controllers under computer program control for performing operations described herein, for example, an apparatus comprising at least one processor, and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform defined functions and/or operations.
For example,
Inference Phase
In an inference phase, given a particular target data item Xt, sometimes referred to as a test data item, its feature representation is computed and provided to the margin predictor 44 to estimate whether it belongs to a shared (1) or private (0) class. If it is estimated as belonging to a private class, it is labelled as “unknown”. If it is estimated as belonging to a shared class, the probability distribution ŷ is computed using the classifier 41 and argmax ŷ is output as its label.
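Merely for illustration, the inference-phase logic described above may be implemented along the following lines; the 0.5 decision threshold and the function names in this PyTorch sketch are illustrative assumptions.

```python
import torch


@torch.no_grad()
def infer(x_t: torch.Tensor, feature_extractor, classifier, margin_predictor,
          class_names: list, threshold: float = 0.5) -> str:
    """Return a class label for a target/test data item, or 'unknown' for a private class."""
    z = feature_extractor(x_t)                  # feature representation of the test item
    if margin_predictor(z).item() < threshold:  # estimated as belonging to a private class
        return "unknown"
    y_hat = classifier(z)                       # probability distribution over known classes
    return class_names[int(torch.argmax(y_hat))]
```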
System Architectures
Subsequently, the training manager 106 may be configured to read the source metadata store 116 and identify whether source conditions for a given source domain model are different from target conditions, e.g. above some measurable threshold, in order to enable or trigger adaptation of the source domain model.
Once source and target data items are available, the adaptation system 30 described above may function as described to perform pre-processing, feature extraction and then adaptation. The output is provided by the adaptation system 30 via the training manager 106 to update the current model stored on the encoder 112 which is then deployed, or made effective, for the inference stage whereby target data items can be received and labelled (inference output) as belonging to a particular class, or, if applicable, an unknown inference output is generated. During inference, target data items pass through the pre-processor 108, feature extractor 110 and updated model on the encoder 112 to produce either labelled or unknown inference output for some user application 118.
An example of inference output may include, for an audio (e.g. speech) model, a keyword or phrase based on received speech from a user. Another example of inference output may include, for a vision (e.g. video-based) model, a type of object present in an image. Another example of inference output may include, for an activity-based model, a particular physical activity performed by a user, e.g. running, walking, swimming.
The adaptation system 30 in the
Examples of target systems 104, 121 may include edge devices, such as a home gateway or router with a microphone, a camera and/or smartphone.
In example aspects, the source and target data items may be generated and/or received from one or more electrical or electronic sensors. A sensor may comprise one or more of a microphone, camera, video camera, light sensor, heat sensor, geospatial positioning sensor, orientation sensor, accelerometer and a physiological sensor such as for estimating heart rate, blood pressure, temperature, an electrocardiogram (ECG) or the like. During one or both of the adaptation and the inference stages, target data items may be received in real-time or near real-time. For the adaptation stage, source and target data items may be historical data items stored in one or more data memories.
Example aspects may comprise use of the adaptation system 30 shown in
For example, example aspects may involve audio classification, including, but not limited to, speech classification. That is, a source computational model may be trained using a labelled dataset of one or more spoken keywords. The source computational model may comprise a keyword detection classifier. The keyword detection classifier may be configured for use in any computer-based apparatus or method which employs speech recognition based on one, or a sequence of, keywords, and which may perform one or more actions in response thereto. An example may comprise a computerized digital assistant that responds to one or more keywords with one or more of an audio response and/or video output. The computerized digital assistant may additionally, or alternatively, perform one or more other responsive functions based on an inference output, such as requesting information, or controlling one or more electronic systems or devices, for example a home automation system comprising lighting, an alarm system and/or a heating system. The computerized digital assistant may be a standalone device or part of a vehicle or craft control system. The source computational model may have been trained using a labelled dataset representing a first spoken accent, e.g. an English accent. If the source computational model is to be deployed to a system for receiving speech in a second accent in the same or a similar language, e.g. a French-English accent, then the accent variability will likely cause domain shift, for which the adaptation system 30 can provide an updated computational model.
For example, the source model metadata 116 indicated as stored in
Other phenomena that may cause domain shift in audio/speech classification include, but are not limited to, differences in ambient noise, channel and/or microphone variability and other environmental factors. For example, microphones produced by different manufacturers may produce different audio characteristics. Source (and target) metadata may indicate such differences. A lookup table (LUT) may be accessed to determine whether different devices have characteristics deemed sufficiently different to require domain adaptation.
Gender may also contribute towards domain shift. For example, if the source computational model is trained using a labelled dataset comprising one or more keywords spoken by one or more females and is to be deployed to a system for receiving speech spoken by males, then domain shift may result. Differences in age can also cause domain shift.
Hence, example aspects may be specifically employed as described herein for enabling adaptation of a source computational model for audio (e.g. speech) classification to a target domain, whilst achieving the computational efficiencies disclosed herein.
Other example aspects may involve video classification, including, but not limited to object and/or gesture and/or movement classification. In this context, the term video may include both static and moving images.
For example, a source computational model may be trained using a labelled dataset of images or video clips. The source computational model may comprise an object classifier for identifying from, e.g. RGB pixel data, specific classes of object such as human, man, woman, child, dog, cat, car, boat etc.
In the case of humans, if the source computational model is trained on a particular type of person, e.g. healthy adult, domain shift may result if the target data items relate to different types of people, e.g. young people or even elderly people exhibiting signs of dementia, due to differences in their respective movements.
In all such video applications, variations between the source and target domains may be affected by phenomena such as ambient lighting conditions, camera type, and/or image capture parameters (e.g. sensor resolution, capture rate) for different sensor manufacturers etc.
Hence, example aspects may be specifically employed as described herein for enabling adaptation of a source computational model for video classification to a target domain, whilst achieving the computational efficiencies disclosed herein.
Other example aspects may involve fitness or health-related computational models, such as those used to monitor health-related performances of people or even animals for self-assessment or professional evaluation purposes. If a source computational model is trained on a particular type of person, e.g. a healthy adult female of a certain age, domain shift may result if the target data items relate to a different type of person, e.g. an elderly male.
Other example aspects may involve the use of motion sensors placed on a monitored object, e.g. a person. A source computational model may be trained to identify a particular type of physical activity based on a particular type of motion detected by one or more motion sensors. For example, a motion sensor may be comprised within a smartphone, fitness tracker or smartwatch. Where a user places the motion sensor on, or relative to, their body is a matter of personal preference. Some users prefer to place their smartphone in their thigh pocket, chest pocket or on an arm-band. The different placements may induce domain shift where the source computational model is trained with regard to the thigh pocket placement but is worn using a different placement, e.g. on an arm-band.
Hence, example aspects may be specifically employed as described herein for enabling adaptation of a source computational model for fitness and/or health inferences to a target domain, whilst achieving the computational efficiencies disclosed herein.
Evaluation
Use of the above-described adaptation system 30 has been tested on a limited range of speech-based adaptation tasks, with results indicating accuracy gains in the order of 7-15%.
Neural Networks
Many of the elements described above may be implemented using neural network technology. By way of example,
The system 150 comprises an input layer 151, one or more hidden layers 152 and an output layer 153. At the input layer 151, input data (such as a portion of the target data set) may be received as inputs. The hidden layers 152 may comprise a plurality of hidden nodes, which may be connected in many different ways. At the output layer 153, output data (e.g. target encoder outputs) are generated.
The neural network of the system 150 comprises a plurality of nodes and a plurality of connections between those nodes. The neural network is trained by modifying the nodes, including modifying connections between the nodes and the weighting applied to such connections.
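By way of a non-limiting illustration only, the following sketch expresses a feed-forward network of the kind described for the system 150, with an input layer, hidden layers and an output layer whose connection weights are modified during training. It assumes the PyTorch library; the layer sizes, loss function and optimiser are illustrative assumptions.

```python
# Illustrative sketch only: a fully connected network with an input layer,
# two hidden layers and an output layer, trained by adjusting the connection
# weights via gradient descent. All sizes and the loss are assumptions.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(40, 64), nn.ReLU(),    # input layer -> first hidden layer
    nn.Linear(64, 64), nn.ReLU(),    # second hidden layer
    nn.Linear(64, 16),               # output layer (e.g. encoder outputs)
)
optimiser = torch.optim.SGD(net.parameters(), lr=0.01)

x = torch.randn(8, 40)               # e.g. a batch of target-domain inputs
target = torch.randn(8, 16)          # placeholder training targets
loss = nn.functional.mse_loss(net(x), target)
loss.backward()                      # gradients w.r.t. the connection weights
optimiser.step()                     # modify the weights of the connections
```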
Hardware
For completeness, an example processing system 300, which may be used to implement one or more of the example embodiments described above, is now described. The processing system 300 may comprise, amongst other components, a processor 302 and a memory 304 comprising a ROM 312 and a RAM 314.
The processor 302 is connected to each of the other components in order to control operation thereof.
The memory 304 may comprise a non-volatile memory, a hard disk drive (HDD) or a solid state drive (SSD). The ROM 312 of the memory 304 stores, amongst other things, an operating system 315 and may store software applications 316. The RAM 314 of the memory 304 is used by the processor 302 for the temporary storage of data. The operating system 315 may contain code which, when executed by the processor, implements aspects of the algorithms described herein, e.g. those indicated in flow diagrams.
The processor 302 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors. Processor 302 may comprise processor circuitry.
The processing system 300 may be a standalone computer, a server, a console, an apparatus, a user device, a mobile communication device, a smart phone, a vehicle, a vehicle telematics unit, a vehicle Electronic Control Unit (ECU), an IoT device, a sensor, a software application, a communication network, or any combination thereof.
In some example embodiments, the processing system 300 may also be associated with external software applications. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications. The processing system 300 may be in communication with the remote server device in order to utilize the software application stored there.
Some example embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a “memory” or “computer-readable medium” may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
Reference to, where relevant, “computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc., or a “processor” or “processing circuitry” etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialized circuits such as field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor, or firmware such as the programmable content of a hardware device, whether instructions for a processor or configuration settings for a fixed-function device, gate array, programmable logic device, etc.
One or more of the modules for implementing the algorithms in the target and/or the source domains described above, hereafter referred to generically as processing systems 300, may be implemented in any of hardware, software, firmware or a combination thereof, for example comprising one or more processors or controllers under computer program control for performing the operations described herein, for example an apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform defined functions and/or operations.
Alternatively, one or more of the modules for implementing the algorithms in the target and/or the source domains described above, hereafter referred to generically as processing systems 300, may be implemented by circuitry. As used in this application, the term “circuitry” may refer to one or more or all of the following: (a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry); (b) combinations of hardware circuits and software, such as (as applicable) a combination of analog and/or digital hardware circuit(s) with software/firmware, and any portions of hardware processor(s) with software (including digital signal processor(s)), software and memory(ies) that work together to cause an apparatus to perform various functions; and (c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g. firmware) for operation, but the software may not be present when it is not needed for operation.
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or a portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device, or a similar integrated circuit in a server, a cellular network device, or other computing or network device.
If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagrams described herein are examples only and that various operations depicted therein may be omitted, reordered and/or combined.
It will be appreciated that the above described example embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present specification.
Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein or any generalization thereof and during the prosecution of the present application or of any application derived therefrom, new claims may be formulated to cover any such features and/or combination of such features.
Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described example embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.
Foreign application priority data: application No. 2014302.0, filed September 2020, GB (national).