The disclosure generally relates to image classification. More particularly, the subject matter disclosed herein relates to improvements to systems, methods, and apparatus for image classification with domain invariant regularization.
A machine learning model for image classification may be trained using images from one or more source domains, each of which may have corresponding domain-specific features such as styles, lighting conditions, textures, and/or the like. The model may perform well when used to classify images from the same domain or domains on which it was trained, but it may perform poorly when used to classify images from a new target domain that was not used for training. For example, images from a new target domain may have style discrepancies that may result in a domain shift between the source domain(s) and the target domain which may lead to performance degradation.
To solve this problem, a domain adaptation (DA) technique may use data from one or more source domains and a target domain to generate a feature space that is invariant to domains but retains discriminative class information that may be used for classification. This may enable a machine learning model to learn domain invariant features that may be used to classify images from either a source domain or the target domain. One issue with this approach is that it involves the use of images from the target domain to train the model. Thus, a DA technique may not be used to train a machine learning model to classify images from a domain that is not available during training.
In another approach, a domain generalization (DG) technique may train a machine learning model using images from diverse source domains in an attempt to learn a domain invariant feature representation that may be used to classify images in a new target domain. One issue with this approach is that, as the source domains become more diverse, training the model may become more difficult because each domain may contain domain-specific information that may be difficult to separate from domain invariant features.
To overcome these issues, systems and methods are described herein for training and using a machine learning model with domain invariant regularization techniques in which training images having disentangled content-specific and domain-specific feature spaces may be used to train the model to be invariant to domain-specific features. This may be achieved, for example, by using a domain invariant regularization loss function to train the model to compute similar outputs for training images that have the same content-specific features but different domain-specific features. The training images may be generated, for example, using a generative model that may perturb a domain-specific feature of an image from a source domain to create a new image, for example, by combining a content-specific feature of the source image with a randomly selected style-specific feature. The generative model may be trained using a multi-domain image-to-image translation network in which a classifier may be trained to classify source images and then used to train the generative model to generate domain-translated images having the same class as the source images.
The above approaches improve on previous systems and methods because they may improve the generalization performance of a classification model across domains, including domains that are not available during training.
In an embodiment, a method comprises: receiving an input image; using a domain invariant machine learning model to compute an output based on the input image, wherein the domain invariant machine learning model is trained using domain invariant regularization; and displaying information based on the output.
In an embodiment, a system comprises: one or more processors; and a memory storing instructions which, when executed by the one or more processors, cause a domain invariant machine learning model to compute an output based on an input image, wherein the domain invariant machine learning model is trained using domain invariant regularization.
In an embodiment, a method of training a machine learning model comprises: receiving a plurality of input images; generating a plurality of perturbed input images based on the plurality of input images; and training the machine learning model to be domain invariant using the plurality of input images and the plurality of perturbed input images.
In the following sections, aspects of the subject matter disclosed herein will be described with reference to exemplary embodiments illustrated in the figures.
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the disclosure. It will be understood, however, by those skilled in the art that the disclosed aspects may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail to not obscure the subject matter disclosed herein.
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment disclosed herein. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” or “according to one embodiment” (or other phrases having similar import) in various places throughout this specification may not necessarily all be referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In this regard, as used herein, the word “exemplary” means “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not to be construed as necessarily preferred or advantageous over other embodiments. Also, depending on the context of discussion herein, a singular term may include the corresponding plural forms and a plural term may include the corresponding singular form. Similarly, a hyphenated term (e.g., “two-dimensional,” “pre-determined,” “pixel-specific,” etc.) may be occasionally interchangeably used with a corresponding non-hyphenated version (e.g., “two dimensional,” “predetermined,” “pixel specific,” etc.), and a capitalized entry (e.g., “Counter Clock,” “Row Select,” “PIXOUT,” etc.) may be interchangeably used with a corresponding non-capitalized version (e.g., “counter clock,” “row select,” “pixout,” etc.). Such occasional interchangeable uses shall not be considered inconsistent with each other.
It is further noted that various figures (including component diagrams) shown and discussed herein are for illustrative purposes only, and are not drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, if considered appropriate, reference numerals have been repeated among the figures to indicate corresponding and/or analogous elements.
The terminology used herein is for the purpose of describing some example embodiments only and is not intended to be limiting of the claimed subject matter. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood that when an element or layer is referred to as being “on,” “connected to” or “coupled to” another element or layer, it can be directly on, connected or coupled to the other element or layer, or intervening elements or layers may be present. In contrast, when an element is referred to as being “directly on,” “directly connected to” or “directly coupled to” another element or layer, there are no intervening elements or layers present. Like numerals refer to like elements throughout. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
The terms “first,” “second,” etc., as used herein, are used as labels for nouns that they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical, etc.) unless explicitly defined as such. Furthermore, the same reference numerals may be used across two or more figures to refer to parts, components, blocks, circuits, units, or modules having the same or similar functionality. Such usage is, however, for simplicity of illustration and ease of discussion only; it does not imply that the construction or architectural details of such components or units are the same across all embodiments or that such commonly-referenced parts/modules are the only way to implement some of the example embodiments disclosed herein.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this subject matter belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, the term “module” refers to any combination of software, firmware and/or hardware configured to provide the functionality described herein in connection with a module. For example, software may be embodied as a software package, code and/or instruction set or instructions, and the term “hardware,” as used in any implementation described herein, may include, for example, singly or in any combination, an assembly, hardwired circuitry, programmable circuitry, state machine circuitry, and/or firmware that stores instructions executed by programmable circuitry. The modules may, collectively or individually, be embodied as circuitry that forms part of a larger system, for example, but not limited to, an integrated circuit (IC), system on-a-chip (SoC), an assembly, and so forth.
The domain invariant machine learning model 104 may be trained, for example, using domain invariant regularization (DIR). Domain invariant regularization may use training images with domain-specific features from multiple domains to train the domain invariant machine learning model 104 to reduce or eliminate the variability of the output 105 with respect to the domain-specific features of the training images.
The one or more outputs 105 may be in the form of, for example, a classification (e.g., a class label for an input image or a portion thereof, a probability that an input image or a portion thereof belongs to a class, and/or any other type of classification output), a feature identified in an input image (e.g., a type of feature, a location of a feature, and/or the like), a regression output (e.g., a discrete or continuous value that may represent, for example, a characteristic of an input image or a portion thereof), or any other type of machine learning output, or combination thereof.
The system 100 may display information 106 based on the one or more outputs 105. For example, if an output 105 is a classification, the information 106 may include a class label for an input image 101, or a portion thereof, for which the domain invariant machine learning model 104 computed the output 105. As another example, if an output 105 is a classification, the information 106 may include a probability that an input image 101, or a portion thereof, may belong to a class. As a further example, if an output 105 is a classification that may be used for selecting or filtering one or more content-specific features of an input image 101, the information 106 may include the input image 101 (e.g., if the input image 101 passes the selection or filtering process) or a substitute image, notification, warning, and/or the like (e.g., if the input image 101 fails the selection or filtering process).
The system 100 depicted in FIG. 1 may be trained, for example, using any of the training systems and/or methods described herein.
Referring to FIG. 2, a system 200 may train a machine learning model 204 using a domain invariant loss function 208. The machine learning model 204 may compute a first input ƒ(x1) to the domain invariant loss function 208 based on a first training image that may be referred to as x1.
The domain invariant loss function 208 may also receive a second input ƒ(x2) which may be generated based on a second training image that may be referred to as x2. The second input ƒ(x2) to the domain invariant loss function 208 may be computed by the same machine learning model 204 as the first input ƒ(x1) or a different machine learning model.
The first training image x1 may include a content-specific (e.g., category related) feature and a domain-specific (e.g., style related) feature. The second training image x2 may include a content-specific feature that is the same as, or similar to, the content-specific feature of the first training image x1 but a domain-specific feature that is different from the domain-specific feature of the first image x1. The content-specific features and the domain-specific features may be disentangled. Thus, the first training image x1 and the second training image x2 may preserve content across different domains, thereby enabling the domain invariant loss function 208 to be used to train the machine learning model 204 to be domain invariant.
In an embodiment, the domain invariant loss function 208 may be implemented using one or more expectation functions, distance functions, and/or the like. For example, the inputs ƒ(x1) and ƒ(x2) to the domain invariant loss function 208 may be implemented as probabilities and/or probability vectors, and the domain invariant loss ℒDI may be computed using an expectation function (e.g., based on a probability distribution of the second training image x2) and a distance function (e.g., based on a distance between the inputs ƒ(x1) and ƒ(x2) to the domain invariant loss function 208).
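By way of illustration only, such a loss may be sketched in code. The following is a minimal, hypothetical PyTorch sketch (not a reference implementation from this disclosure) that uses the L1 distance between the two probability vectors and a batch mean as the empirical expectation:

```python
# Hypothetical sketch: domain invariant loss as the batch-mean L1 distance
# between two probability vectors f(x1) and f(x2). Assumes PyTorch.
import torch

def domain_invariant_loss(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    # p1, p2: (batch, num_classes) probability vectors for x1 and x2
    return (p1 - p2).abs().sum(dim=1).mean()
```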
Referring to FIG. 3, a system 300 may train a machine learning model 304 by applying a first training image x to the machine learning model 304 to compute a first output ƒ(x), and applying a second training image x̃ to the machine learning model 304 to compute a second output ƒ(x̃).
The first output ƒ(x) may be applied as an input to a first loss function 307 which may be implemented, for example, using a cross-entropy loss function as shown in FIG. 3, to compute a cross-entropy loss ℒcls. The first output ƒ(x) and the second output ƒ(x̃) may be applied as inputs to a domain invariant loss function 308 to compute a domain invariant loss ℒDI.
Referring to FIG. 3, the cross-entropy loss ℒcls and the domain invariant loss ℒDI may be combined, for example, as follows:
ℒC = ℒcls + λℒDI  (1)
where λ may control the contribution of the domain invariant loss ℒDI relative to the cross-entropy loss ℒcls. The combined loss ℒC may be used to provide training feedback 310 to the machine learning model 304, for example, using gradient descent, backpropagation, and/or the like to update (e.g., modify) one or more parameters such as weights, hyperparameters, and/or the like of the machine learning model 304.
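For example, one training step using the combined loss of Eq. (1) might be sketched as follows, assuming a PyTorch classifier `model` that returns logits, a perturbed batch `x_tilde` with the same content as `x`, and integer class labels `y` (all names are illustrative assumptions):

```python
# Hypothetical sketch of one training step with the combined loss of Eq. (1).
import torch
import torch.nn.functional as F

def training_step(model, optimizer, x, x_tilde, y, lam=1.0):
    logits = model(x)
    logits_tilde = model(x_tilde)
    loss_cls = F.cross_entropy(logits, y)              # cross-entropy loss
    p = logits.softmax(dim=1)                          # f(x)
    p_tilde = logits_tilde.softmax(dim=1)              # f(x~)
    loss_di = (p - p_tilde).abs().sum(dim=1).mean()    # domain invariant loss
    loss = loss_cls + lam * loss_di                    # Eq. (1)
    optimizer.zero_grad()
    loss.backward()                                    # backpropagation
    optimizer.step()                                   # update parameters
    return float(loss.detach())
```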
In an embodiment, the second training image x̃ may be generated by perturbing the first training image x. For example, x may have a content-specific feature and a disentangled domain-specific feature from a first source domain. The second training image x̃ may be generated by combining the content-specific feature of x with a domain-specific feature from a different source domain, thereby preserving content while changing the source domain of the second training image x̃.
Applying the first and second training images x and x̃ to the machine learning model 304 and constraining ƒ(x) and ƒ(x̃) to be consistent over x and x̃ by using the domain invariant loss function 308 may cause the output ƒ(x̃) for the generated training image x̃ in the second source domain to be close to the output ƒ(x) for the original training image x in the first source domain, thereby training the machine learning model 304 to be domain invariant. Thus, for example, the machine learning model 304 having a classifier function ƒ may be trained to be a domain invariant machine learning model ƒ* as follows:
ƒ* = argminƒ ℒC  (2)
where ℒC may be a combined loss including the cross-entropy loss ℒcls and the domain invariant loss ℒDI as shown in Eq. (1). In such an embodiment, the parameter λ in Eq. (1) may operate, for example, as a hyper-parameter to control a trade-off between the prediction accuracy of the machine learning model 304 on the original training image x and the consistency of the machine learning model 304 with the generated training image x̃.
The system 400 depicted in FIG. 4 may be used to train a classifier using training images generated from disentangled content-specific and style-specific feature spaces.
The system 400 may include a generator model 411 and a classifier 404. The classifier 404 is depicted multiple times in FIG. 4 to illustrate its use with different inputs, but it may be implemented, for example, using a single classifier.
The system 400 may exploit a latent space for the generator model 411 utilizing training images from multiple source domains to capture latent domain-specific features of the training images. The classifier 404 may implement a classification function ƒ and may be trained to be invariant to such domain-specific features (e.g., using a domain invariant regularization loss function 408), thereby making the classifier 404 more robust, for example, when used to classify images from a new domain.
The generator model 411 may generate training images 402 and 403 based on an underlying latent feature space 𝒵 = 𝒵C × 𝒵S that may be a product space of two subspaces: (i) a content-specific feature space 𝒵C (indicated as element 413) that may include semantic (e.g., subject-related content) information (e.g., content-specific features zC ∈ 𝒵C); and (ii) a style-specific feature space 𝒵S (indicated as element 414) that may include domain-related (e.g., style) information (e.g., style-specific features zS ∈ 𝒵S).
The content-specific feature space 𝒵C and the style-specific feature space 𝒵S may be disentangled such that style changes to an image may preserve the content of the image. For example, when classifying images to discriminate cats versus elephants, different parts of the animals may constitute content, whereas color, texture, background, lighting conditions, camera lens characteristics, illumination, contrast, saturation, image quality, and/or the like may constitute style.
The generator model 411 may implement a generator function 𝒢 : 𝒵 → 𝒳 that may map features from the underlying latent feature space 𝒵 to a data (e.g., image) space 𝒳. During inference operation of the classifier 404, content-related features may be relevant to the classification task whereas style-related features may be irrelevant. Thus, a domain invariant regularization loss function may be used to enforce the classification function ƒ of the classifier 404 to be invariant to the underlying style-specific features of the input images. In the system 400, this may be accomplished, for example, by (1) using training images that may be generated by replacing a style-specific feature zS of a sample image x with a new randomly selected style-specific feature z̃S ∈ 𝒵S to create a new image x̃ = 𝒢(zC, z̃S) in a different style, where zC denotes the content-specific feature of the original image x; and (2) constraining (e.g., encouraging or enforcing) the classification function ƒ of the classifier 404 to be consistent over x and x̃.
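By way of a hedged sketch, the style-perturbation step described above might look as follows, where `F_enc`, `T_sty`, and `G_img` are hypothetical stand-ins for the content-specific encoder, style-specific generator, and image generator (names, signatures, and sizes are assumptions, not the reference design):

```python
# Hypothetical sketch: replace the style of x with a randomly drawn style
# from a chosen domain to create x~ = G(zC, z~S) with the same content.
import torch

def perturb_style(x, domain_idx, F_enc, T_sty, G_img, noise_dim=16):
    z_c = F_enc(x)                            # content-specific feature zC
    u = torch.randn(x.size(0), noise_dim)     # u ~ N(0, I)
    z_s_new = T_sty(domain_idx, u)            # random style in the chosen domain
    return G_img(z_c, z_s_new)                # x~ in a different style
```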
Referring to FIG. 4, training images 412 generated by the generator model 411 are shown in a domain separated arrangement 415 and a category separated arrangement 416. In FIG. 4, each training image is indicated as 412-m-n, where m indicates the content (e.g., category) of the image and n indicates the style (e.g., domain) of the image.
Each group in the domain separated arrangement 415 is indicated as 417-n and the corresponding style is represented by a rounded rectangle 418-n, where n uses the same style indicators as the training images 412. For example, in the domain separated arrangement 415, group 417-1 includes training images 412-1-1, 412-2-1, and 412-3-1 which have style n=1 (shaded object) which is represented by a rounded rectangle 418-1.
Each group in the category separated arrangement 416 is indicated as 419-m and the content is represented by a shape 420-m, where m uses the same content indicators as the training images 412. For example, in the category separated arrangement 416, group 419-1 includes training images 412-1-1, 412-1-2, 412-1-3, and 412-1-4 which have content m=1 (square object) which is represented by a square 420-1.
The rounded rectangles 418-n in FIG. 4 may represent the style-specific features corresponding to the groups 417-n of training images.
Referring to FIG. 4, to generate a first training image x (indicated as 402), the generator model 411 may combine a content-specific feature zC (in this example, triangle content) from the content-specific feature space 𝒵C with a style-specific feature zS from the style-specific feature space 𝒵S. The first training image x may be applied to the classifier 404 which may compute an output (e.g., class prediction) ƒ(x) that, in this example, may be implemented as a probability vector 421.
To generate a second training image, the generator model 411 may combine the same content-specific feature zC used for the first training image x with a different (e.g., randomly selected) style-specific feature z̃S from the style-specific feature space 𝒵S to generate a new image x̃ indicated as 403. In this example, the style-specific feature z̃S is the shaded object style (n=1). Thus, the second training image x̃ indicated as 403 includes the triangle content of the first training image x but in the shaded object style. The second training image x̃ may be applied to the classifier 404 which may compute an output (e.g., class prediction) ƒ(x̃) that, in this example, may be implemented as a probability vector 422.
The classifier output ƒ(x) based on the first training image x may be applied to a cross-entropy loss function 407 to compute a cross-entropy loss ℒcls. The classifier output ƒ(x̃) based on the second training image x̃ may be applied to a domain invariant regularization loss function 408. In some implementations, the classifier output ƒ(x) based on the first training image x may also be applied to the domain invariant regularization loss function 408, which may compute a domain invariant regularization loss ℒreg based on the classifier outputs ƒ(x) and ƒ(x̃). In some implementations, the cross-entropy loss ℒcls and the domain invariant regularization loss ℒreg may be combined, for example, as follows:
ℒC = ℒcls + λℒreg  (3)
to generate a combined loss ℒC in a manner similar to Eq. (1). The combined loss ℒC may be used to provide training feedback to the classifier 404, for example, using gradient descent, backpropagation, and/or the like to train the classifier 404 to be invariant to the style-specific features z̃S in the training images.
Thus, the system 400 may implement domain invariant regularization by combining a training image's underlying content-specific (category related) feature zC with randomly selected style-specific (domain related) features z̃S ∈ 𝒵S and enforcing the classifier's prediction ƒ(x̃) for generated training images x̃ to be close to ƒ(x) through the use of the domain invariant regularization loss function 408.
A domain generalization technique according to an embodiment may involve the use of data (e.g., independent and identically distributed (i.i.d.) data) from S source domains {D1, . . . , DS} to train a classifier having a classification function ƒ to generalize well to one or more new (e.g., previously unseen) target domains for which no data is available during training (which may be referred to as out-of-domain generalization), as well as to new data from existing domains that have been used to train the classifier (which may be referred to as in-domain generalization). Training a robust predictive model (e.g., a classifier) having a classification function ƒ to be invariant across different domains with different data distributions according to an embodiment may involve exploiting one or more statistical invariances across training and test domains to incorporate such invariances into ƒ.
For example, a generative model may exploit a disentangled latent space having a latent subspace that is domain invariant. The generative model may have two independent sources of variation: (1) a style-specific latent feature zS ∈ 𝒵S containing domain-related information for input data (e.g., training images); and (2) a content-specific latent feature zC ∈ 𝒵C containing semantic (e.g., subject-related content) information for input data. A generator may implement a model 𝒢 : 𝒵 → 𝒳 that may map features from a latent feature space 𝒵 = 𝒵C × 𝒵S to a data (e.g., image) space 𝒳. For a classification task that may predict a label y for a data point x, a content-specific feature zC may be relevant, while a style-specific feature zS may be irrelevant. This may be illustrated, for example, by the causal relationships between an input data sample (e.g., an image) of the form (x, y, d) and its latent features depicted in FIG. 5.
Referring to FIG. 5, a causal graph may illustrate the relationships between the latent features zC and zS, the observed data x, the class label y, and the domain label d.
The directed arrow 523 from zC to the observed data x (e.g., an image) and the directed arrow 524 from zS to the observed data x indicate that x may be generated based on content and style. The directed arrow 525 from zC to the class label y indicates that content (e.g., shape) zC may influence the class label y, while the absence of a directed arrow from zS to the class label y indicates that style does not influence the class label y. (In some implementations, one or more style features may be correlated to class labels, but not causally related to them). Thus, the content-specific feature zC may include the information used to predict y (e.g., all of the information needed to predict y).
Similarly, the directed arrow 526 from the style-specific feature zS to the domain label d indicates that style zS may influence the domain label d, while the absence of a directed arrow from zC to the domain label d indicates that content does not influence the domain label d. Thus, the style-specific feature zS may include the information used to determine the domain d. Moreover, in some implementations, the absence of a directed path between zC and zS may indicate that zC and zS are marginally independent (e.g., zC ⊥ zS). Thus, only the content-specific feature zC may be regarded as a plausible feature for predicting y.
Based on the causal relationships illustrated in FIG. 5, the class label y may be independent of the style-specific feature zS when conditioned on the content-specific feature zC, which may be expressed as:
ℙ(y | zC, zS) = ℙ(y | zC).  (4)
Thus, given an input image x, manipulating its style feature zS does not influence its class label. Hence, an invariant classifier ƒ* that outputs a probability distribution over the class label space may be consistent with an invariance relationship:
ƒ*(𝒢(zC, zS)) = ƒ*(𝒢(zC, z̃S)), ∀ z̃S ∈ 𝒵S.  (5)
In some implementations, this relationship may be used to achieve domain invariant prediction, for example, by explicitly enforcing invariance under style perturbations through the use of a domain invariant regularization. For example, a domain invariant regularization loss function may compute a domain invariant regularization loss ℒreg as follows:
ℒreg = 𝔼[𝒟(ƒ(x), ƒ(x̃))],
x̃ = 𝒢(zC, z̃S), zC = ℱ(x), z̃S = 𝒯(d, u),  (6)
where 𝔼 denotes an expectation operator, ℱ : 𝒳 → 𝒵C is a content-specific encoder function that maps the sample x to its content-specific feature zC, and 𝒯 : 𝔻 × ℝⁿ → 𝒵S is a style-specific generator function (where 𝔻 indicates a domain label space and ℝⁿ indicates the set of n-dimensional real vectors) that takes a domain index d and maps an n-dimensional random vector (e.g., a Gaussian vector) u ∈ ℝⁿ (where u may have the distribution 𝒩(0, I)) to a point on 𝒵S of domain d.
𝒟(p1, p2) denotes a distance measure between two probability vectors p1 and p2. The L1 distance (e.g., the sum of the absolute values of the differences between the classifier's probabilistic outputs) may be used as:
𝒟(p1, p2) = Σk |p1k − p2k|  (7)
where p1k and p2k denote the probability outputs of p1 and p2 for class k, respectively. However, any distance measure for distributions may be used in place of the L1 distance. Thus, domain invariant regularization may encourage a classifier ƒ to be invariant to induced semantically irrelevant perturbations in the input data (e.g., images) that may arise from altering the input samples through style perturbations. These perturbations to the input may use a disentangled latent feature that may encode independent controllable factors, for example, where style-specific factors are known to be independent from the class label. Thus, a domain invariant regularization classifier ƒ* may be written as
ƒ* = argminƒ (ℒcls + λℒreg)  (8)
where λ may be a hyper-parameter that may control a trade-off between the prediction accuracy of the classifier ƒ* on the source samples and the consistency of the classifier ƒ* over the sample perturbations, and ℒcls denotes a multi-class cross-entropy loss function given by
ℒcls = 𝔼[−log([ƒ(x)]y)]  (9)
where [a]i returns the i-th coordinate of a.
Referring to FIG. 6, a system 600 may train a classifier 604 using domain invariant regularization. A source sample image x may be obtained from one or more source domains 637. A content-specific encoder ℱ may extract the content-specific feature zC of the sample image x, and a domain-specific generator 𝒯 may generate a randomly selected style-specific feature z̃S. An image generator 𝒢 may combine the content-specific feature zC with the style-specific feature z̃S to generate a new (semantically preserved) image x̃.
The original sample image x (indicated as 602) may be applied to a classifier 604 which may compute an output (e.g., class prediction) ƒ(x) that, in this example, may be implemented as a probability vector 621. The new (semantically preserved) image x̃ (indicated as 603) may be applied to the classifier 604 which may compute an output (e.g., class prediction) ƒ(x̃) that, in this example, may be implemented as a probability vector 622. The classifier 604 is depicted multiple times in FIG. 6 to illustrate its use with different inputs, but it may be implemented, for example, using a single classifier.
The classifier output ƒ(x) based on the first training image x may be applied to a cross-entropy loss function 607 to compute a cross-entropy loss ℒcls. The classifier output ƒ(x̃) based on the second training image x̃ may be applied to a domain invariant regularization loss function 608. In some implementations, the classifier output ƒ(x) based on the first training image x may also be applied to the domain invariant regularization loss function 608 which may compute a domain invariant regularization loss ℒreg based on the classifier outputs ƒ(x) and ƒ(x̃).
Thus, the classifier ƒ (indicated as 604) may be trained using two losses. The cross-entropy loss ℒcls may cause the classifier ƒ to predict the correct class label for the image x, whereas the domain invariant regularization loss ℒreg may cause the classifier ƒ to make similar predictions for the images x and x̃.
The operations of the content-specific encoder ℱ, the domain-specific generator 𝒯, the generator model 𝒢 (which may also be referred to as an image generator), and the classifier ƒ as described above may be repeated with other source sample images x obtained from the source domains 637 to train the classifier ƒ. For example, any or all of the source sample images x obtained from the source domains 637 may be used as inputs 634 to the classifier ƒ along with their corresponding newly generated semantically preserved images x̃ (indicated as 635). Alternatively, or additionally, instead of applying a source sample image x obtained from one of the source domains 637 directly to the classifier ƒ, the content-specific feature zC extracted by the content-specific encoder ℱ may be combined with the style-specific feature zS for the source domain by the image generator 𝒢 to reconstruct the source sample image x before applying it to the classifier ƒ.
The content-specific encoder ℱ may implement a mapping ℱ : 𝒳 → 𝒵C that may map a source sample image x to its content-specific feature zC by extracting the content-specific feature zC from the image x. In some implementations, the extracted content-specific feature zC may be added to the collection of content-specific features in the content-specific feature space 𝒵C (indicated as 613). Additionally, or alternatively, the extracted content-specific feature zC may be used to generate more training images such as the training images 412 depicted in FIG. 4.
The domain-specific generator 𝒯 may implement a mapping 𝒯 : 𝔻 × ℝⁿ → 𝒵S where 𝔻 indicates a domain label space and ℝⁿ indicates the set of n-dimensional real vectors. The domain-specific generator 𝒯 may randomly select a domain index d and map an n-dimensional random vector (e.g., a Gaussian vector) u ∈ ℝⁿ (where u may have the distribution 𝒩(0, I)) to a point on 𝒵S of domain d to generate a randomly selected style-specific feature z̃S.
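One hedged sketch of such a domain-specific generator, assuming a small PyTorch mapping network with one output head per domain (layer sizes are illustrative assumptions, not the reference design), is:

```python
# Hypothetical sketch: T maps (d, u) to a style code in the region of the
# style space corresponding to domain d, using one output head per domain.
import torch
import torch.nn as nn

class StyleGenerator(nn.Module):
    def __init__(self, num_domains: int, noise_dim: int = 16, style_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(noise_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        # one head per domain: the domain index selects the head
        self.heads = nn.ModuleList(
            [nn.Linear(256, style_dim) for _ in range(num_domains)]
        )

    def forward(self, d: int, u: torch.Tensor) -> torch.Tensor:
        return self.heads[d](self.backbone(u))  # a point on the style space of domain d
```

For example, `StyleGenerator(num_domains=3)(1, torch.randn(8, 16))` would draw eight style codes for domain index 1.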
In an embodiment, the content-specific encoder ℱ, the domain-specific generator 𝒯, and the image generator 𝒢 may be pretrained and then fixed during the training of the classifier 604.
Referring to FIG. 7, a system 700 may train the content-specific encoder ℱ, the domain-specific generator 𝒯, and/or the image generator 𝒢 using a multi-domain image-to-image (MI2I) translation network 738. Some MI2I networks may be trained without class information for the sample images, and thus may perform translations without supervision between domains.
However, when used with any of the training systems described herein, class information for sample images x from the source domains 737 may be known and used to improve the training of the MI2I network 738. Therefore, the system 700 may include a classifier 𝒞 : 𝒳 → 𝒴 (where 𝒴 indicates a class label space) indicated as 739. The classifier 𝒞 may be trained to classify sample images x from the source domains 737 using the available class information. Once the classifier 𝒞 is trained, it may then be used to train the content-specific encoder ℱ, the domain-specific generator 𝒯, and/or the image generator 𝒢 of the MI2I network 738.
For example, as shown in FIG. 7, the classifier 𝒞 may be trained using sample images x from the source domains 737 along with their known class labels, for example, using a cross-entropy loss function.
Once the classifier 𝒞 is trained, the MI2I network 738 may use the same and/or other sample images x from the source domains 737 to generate domain-translated (e.g., semantically preserved) images 742 that may be applied to the classifier 𝒞 to compute outputs in the form of probability vectors 743. The probability vectors 743 may be applied to a loss function (e.g., a cross-entropy loss function) 744 to generate a classification loss ℒclass that may be used to provide training feedback (e.g., using gradient descent, backpropagation, and/or the like) to the content-specific encoder ℱ, the domain-specific generator 𝒯, and/or the image generator 𝒢 of the MI2I network 738. Thus, the trained classifier 𝒞 may be used to impose constraints on ℱ, 𝒯, and/or 𝒢 to cause them to generate domain-translated images 742 having the same class (and therefore, content) as the sample images x used to generate the domain-translated images 742. Thus, the system 700 may incorporate categorical semantics into the MI2I network 738 to train ℱ, 𝒯, and/or 𝒢 to translate input images to new domains belonging to their own classes by encouraging them to minimize the loss ℒclass on the generated images.
The classifier 𝒞 is depicted multiple times in FIG. 7 to illustrate its use with different inputs, but it may be implemented, for example, using a single classifier.
In some implementations, 𝒯 may be considered a style generator function of the MI2I network 738, and 𝒢(ℱ(.)) may be considered an image generator function of the MI2I network 738. Given an image x and its domain label d, any number of the following objective functions of the MI2I network 738 may be implemented for training ℱ, 𝒯, and/or 𝒢.
Domain adversarial objective: During training, a latent code u ~ 𝒩(0, I) and a target domain d̃ may be randomly sampled to generate a target (domain-specific) style code z̃S = 𝒯(d̃, u). 𝒢 and 𝒯 may be trained to generate an output image x̃ = 𝒢(ℱ(x), z̃S) via an adversarial loss using a domain discriminator D. 𝒯 may learn to provide a style code z̃S that is likely in the target domain d̃, and 𝒢 may be trained to utilize z̃S and generate an image x̃ that may be indistinguishable from real images of the domain d̃. The domain adversarial loss may be formulated as
ℒadv = 𝔼x,d[log Dd(x)] + 𝔼x,d̃,u[log(1 − Dd̃(x̃))],
x̃ = 𝒢(ℱ(x), 𝒯(d̃, u))  (10)
where Dd(.) may denote the output of the discriminator D corresponding to the domain d.
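A minimal sketch of computing such an adversarial loss, assuming a discriminator `D` that returns one real/fake logit per domain with shape (batch, num_domains) and long tensors `d` and `d_tilde` of domain indices (all assumptions for illustration), might be:

```python
# Hypothetical sketch of the domain adversarial loss of Eq. (10); binary
# cross-entropy with logits plays the role of the log terms.
import torch
import torch.nn.functional as F

def domain_adversarial_loss(D, x, d, x_tilde, d_tilde):
    real_logit = D(x).gather(1, d.unsqueeze(1))              # D_d(x)
    fake_logit = D(x_tilde).gather(1, d_tilde.unsqueeze(1))  # D_d~(x~)
    real_term = F.binary_cross_entropy_with_logits(
        real_logit, torch.ones_like(real_logit))
    fake_term = F.binary_cross_entropy_with_logits(
        fake_logit, torch.zeros_like(fake_logit))
    # minimized by the discriminator; the generator side is trained adversarially
    return real_term + fake_term
```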
Style reconstruction objective: To enforce the generator module 𝒢 to utilize the style code z̃S when generating the image x̃, a style reconstruction loss may be employed to learn a mapping E from an image to its style code. The style reconstruction loss may be written as
ℒsty = 𝔼x,d̃,u[‖z̃S − Ed̃(x̃)‖1], x̃ = 𝒢(ℱ(x), z̃S), z̃S = 𝒯(d̃, u),  (11)
where Ed(.) may denote the output of the mapping network E corresponding to the domain d.
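A hedged sketch of this loss, assuming a style encoder `E` that returns per-domain style codes of shape (batch, num_domains, style_dim) (an illustrative assumption), might be:

```python
# Hypothetical sketch of the style reconstruction loss of Eq. (11): the
# style encoder E should recover the style code used to generate x~.
import torch

def style_reconstruction_loss(E, x_tilde, z_s_tilde, d_tilde):
    codes = E(x_tilde)                                   # (B, num_domains, style_dim)
    idx = d_tilde.view(-1, 1, 1).expand(-1, 1, codes.size(2))
    z_rec = codes.gather(1, idx).squeeze(1)              # E_d~(x~)
    return (z_s_tilde - z_rec).abs().sum(dim=1).mean()   # L1 norm, then expectation
```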
Style diversification objective: To enable the generator module to produce diverse images, 𝒢 and 𝒯 may be regularized with a diversity sensitive loss. The regularization term may encourage or force 𝒢 and 𝒯 to explore the image space and discover meaningful style features to generate diverse images. The style diversification loss can be expressed as
ℒds = 𝔼x,d̃,u1,u2[‖𝒢(ℱ(x), z1) − 𝒢(ℱ(x), z2)‖1],
z1 = 𝒯(d̃, u1), z2 = 𝒯(d̃, u2),  (12)
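For illustration, using the hypothetical `T_sty` and `G_img` modules from the earlier sketches, this diversity term might be computed as:

```python
# Hypothetical sketch of the style diversification loss of Eq. (12): two
# style codes drawn for the same target domain should yield different images.
import torch

def style_diversification_loss(G_img, T_sty, z_c, domain_idx, noise_dim=16):
    u1 = torch.randn(z_c.size(0), noise_dim)
    u2 = torch.randn(z_c.size(0), noise_dim)
    x1 = G_img(z_c, T_sty(domain_idx, u1))    # G(F(x), z1)
    x2 = G_img(z_c, T_sty(domain_idx, u2))    # G(F(x), z2)
    # this term is maximized (e.g., subtracted from the full objective)
    return (x1 - x2).abs().mean()
```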
Cycle consistency objective: To cause a generated image x̃ to properly preserve the domain invariant characteristics (e.g., shape) of its input image x, a cycle consistency loss may be used. This loss may encourage the generator module to preserve the original characteristics of x while changing its style reliably. This loss can be represented as
ℒcyc = 𝔼x,d,d̃,u[‖x − 𝒢(ℱ(x̃), s)‖1],
x̃ = 𝒢(ℱ(x), 𝒯(d̃, u)), s = Ed(x).  (13)
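A minimal sketch of this cycle term, reusing the hypothetical modules and the per-domain style encoder `E` from the earlier sketches, might be:

```python
# Hypothetical sketch of the cycle consistency loss of Eq. (13): translating
# x~ back with the original style code s = E_d(x) should recover x.
import torch

def cycle_consistency_loss(F_enc, G_img, E, x, x_tilde, d):
    codes = E(x)                                  # (B, num_domains, style_dim)
    idx = d.view(-1, 1, 1).expand(-1, 1, codes.size(2))
    s = codes.gather(1, idx).squeeze(1)           # original style code E_d(x)
    x_cyc = G_img(F_enc(x_tilde), s)              # translate back to domain d
    return (x - x_cyc).abs().mean()               # L1 reconstruction error
```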
Category classification objective: Although some MI2I models may learn accurate and diverse transformations between multiple source domains, they may also result in arbitrary mappings because translations may be performed without supervision between domains that share common semantic attributes (e.g., class labels). Some MI2I models may be applied to domains in which a translation may entail small geometric changes and the style of the generated image may be independent of the semantic content in the source sample (e.g., translating horses to zebras). To exploit the category labels of source samples, a classification module 𝒞 : 𝒳 → 𝒴 may be incorporated into an MI2I model as described above. This may be implemented by adding a classification loss function ℒclass into the MI2I model as follows:
ℒclass = 𝔼[−log([𝒞(x̃)]y)], x̃ = 𝒢(ℱ(x), 𝒯(d̃, u)).  (14)
During training of an MI2I model, the classifier 𝒞 may only be trained on the actual labeled source samples using the cross-entropy loss. 𝒢 and 𝒯 may then be trained to translate input images to new domains belonging to their own classes by encouraging them to minimize the cross-entropy loss on the generated images.
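For illustration, the classification objective on generated images might be sketched as follows, with the classifier frozen so that its gradient flows only into the generator modules (an assumption consistent with training the classifier separately on real samples):

```python
# Hypothetical sketch of the category classification objective of Eq. (14).
import torch
import torch.nn.functional as F

def class_loss_on_generated(C, x_tilde, y):
    for p in C.parameters():        # C itself is trained only on real samples
        p.requires_grad_(False)
    return F.cross_entropy(C(x_tilde), y)   # -E[log [C(x~)]_y]
```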
Full objective: The various objective functions for the MI2I model described above may be combined, for example, as follows:
min𝒢,𝒯,E maxD ℒadv + λsty ℒsty − λds ℒds + λcyc ℒcyc + λclass ℒclass  (15)
where λsty, λcyc, λds, and λclass may be hyperparameters for the different terms. In some implementations, during training of the MI2I model, the classifier 𝒞 may only be trained on the actual labeled source samples, whereas 𝒢 and 𝒯 may be trained using the classification losses of generated samples as well as the other MI2I loss functions.
Once 𝒯, ℱ, and/or 𝒢 are trained, a classifier ƒ (e.g., the classifier 604 in FIG. 6) may be trained using domain invariant regularization, for example, as
ƒ* = argminƒ ℒcls + (λ/M) Σj=1M 𝔼[𝒟(ƒ(x), ƒ(x̃j))],
x̃j = 𝒢(ℱ(x), 𝒯(d̃j, uj)), uj ~ 𝒩(0, I), d̃j ~ Ud  (16)
where Ud(.) may denote a discrete uniform distribution over the index set {1, 2, . . . , S}. For each training image xi, ƒ may be encouraged to (i) correctly predict its class label yi, and (ii) provide a similar prediction for a set of M perturbed images {x̃j}j=1M with the same content as xi under varying styles.
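By way of a final illustrative sketch, the classifier training loss described above might be computed as follows, where `perturb` is a hypothetical callable (e.g., the `perturb_style` sketch above with a randomly drawn domain) that returns a style-perturbed copy of each image:

```python
# Hypothetical sketch of the final classifier loss: correct prediction on x
# plus consistency across M style-perturbed copies of x.
import torch
import torch.nn.functional as F

def dir_classifier_loss(f, x, y, perturb, M=4, lam=1.0):
    logits = f(x)
    loss = F.cross_entropy(logits, y)            # correct class for x
    p = logits.softmax(dim=1)
    for _ in range(M):
        p_j = f(perturb(x)).softmax(dim=1)       # f(x~_j), new style each call
        loss = loss + (lam / M) * (p - p_j).abs().sum(dim=1).mean()
    return loss
```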
Referring to FIG. 10, an electronic device 1001 in a network environment 1000 may communicate with an electronic device 1002 via a first network 1098 (e.g., a short-range wireless communication network), or with an electronic device 1004 or a server 1008 via a second network 1099 (e.g., a long-range wireless communication network). The electronic device 1001 may include a processor 1020, a memory 1030, an input device 1050, a sound output device 1055, a display device 1060, an audio module 1070, a sensor module 1076, an interface 1077, a haptic module 1079, a camera module 1080, a power management module 1088, a battery 1089, a communication module 1090, a subscriber identification module (SIM) 1096, and/or an antenna module 1097.
The processor 1020 may execute software (e.g., a program 1040) to control at least one other component (e.g., a hardware or a software component) of the electronic device 1001 coupled with the processor 1020 and may perform various data processing or computations.
As at least part of the data processing or computations, the processor 1020 may load a command or data received from another component (e.g., the sensor module 1076 or the communication module 1090) in volatile memory 1032, process the command or the data stored in the volatile memory 1032, and store resulting data in non-volatile memory 1034. The processor 1020 may include a main processor 1021 (e.g., a central processing unit (CPU) or an application processor (AP)), and an auxiliary processor 1023 (e.g., a graphics processing unit (GPU), an image signal processor (ISP), a sensor hub processor, or a communication processor (CP)) that is operable independently from, or in conjunction with, the main processor 1021. Additionally or alternatively, the auxiliary processor 1023 may be adapted to consume less power than the main processor 1021, or execute a particular function. The auxiliary processor 1023 may be implemented as being separate from, or a part of, the main processor 1021.
The auxiliary processor 1023 may control at least some of the functions or states related to at least one component (e.g., the display device 1060, the sensor module 1076, or the communication module 1090) among the components of the electronic device 1001, instead of the main processor 1021 while the main processor 1021 is in an inactive (e.g., sleep) state, or together with the main processor 1021 while the main processor 1021 is in an active state (e.g., executing an application). The auxiliary processor 1023 (e.g., an image signal processor or a communication processor) may be implemented as part of another component (e.g., the camera module 1080 or the communication module 1090) functionally related to the auxiliary processor 1023.
The memory 1030 may store various data used by at least one component (e.g., the processor 1020 or the sensor module 1076) of the electronic device 1001. The various data may include, for example, software (e.g., the program 1040) and input data or output data for a command related thereto. The memory 1030 may include the volatile memory 1032 or the non-volatile memory 1034.
The program 1040 may be stored in the memory 1030 as software, and may include, for example, an operating system (OS) 1042, middleware 1044, or an application 1046.
The input device 1050 may receive a command or data to be used by another component (e.g., the processor 1020) of the electronic device 1001, from the outside (e.g., a user) of the electronic device 1001. The input device 1050 may include, for example, a microphone, a mouse, or a keyboard.
The sound output device 1055 may output sound signals to the outside of the electronic device 1001. The sound output device 1055 may include, for example, a speaker or a receiver. The speaker may be used for general purposes, such as playing multimedia or recording, and the receiver may be used for receiving an incoming call. The receiver may be implemented as being separate from, or a part of, the speaker.
The display device 1060 may visually provide information to the outside (e.g., a user) of the electronic device 1001. The display device 1060 may include, for example, a display, a hologram device, or a projector and control circuitry to control a corresponding one of the display, hologram device, and projector. The display device 1060 may include touch circuitry adapted to detect a touch, or sensor circuitry (e.g., a pressure sensor) adapted to measure the intensity of force incurred by the touch.
The audio module 1070 may convert a sound into an electrical signal and vice versa. The audio module 1070 may obtain the sound via the input device 1050 or output the sound via the sound output device 1055 or a headphone of an external electronic device 1002 directly (e.g., wired) or wirelessly coupled with the electronic device 1001.
The sensor module 1076 may detect an operational state (e.g., power or temperature) of the electronic device 1001 or an environmental state (e.g., a state of a user) external to the electronic device 1001, and then generate an electrical signal or data value corresponding to the detected state. The sensor module 1076 may include, for example, a gesture sensor, a gyro sensor, an atmospheric pressure sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an infrared (IR) sensor, a biometric sensor, a temperature sensor, a humidity sensor, or an illuminance sensor.
The interface 1077 may support one or more specified protocols to be used for the electronic device 1001 to be coupled with the external electronic device 1002 directly (e.g., wired) or wirelessly. The interface 1077 may include, for example, a high-definition multimedia interface (HDMI), a universal serial bus (USB) interface, a secure digital (SD) card interface, or an audio interface.
A connecting terminal 1078 may include a connector via which the electronic device 1001 may be physically connected with the external electronic device 1002. The connecting terminal 1078 may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).
The haptic module 1079 may convert an electrical signal into a mechanical stimulus (e.g., a vibration or a movement) or an electrical stimulus which may be recognized by a user via tactile sensation or kinesthetic sensation. The haptic module 1079 may include, for example, a motor, a piezoelectric element, or an electrical stimulator.
The camera module 1080 may capture a still image or moving images. The camera module 1080 may include one or more lenses, image sensors, image signal processors, or flashes. The power management module 1088 may manage power supplied to the electronic device 1001. The power management module 1088 may be implemented as at least part of, for example, a power management integrated circuit (PMIC).
The battery 1089 may supply power to at least one component of the electronic device 1001. The battery 1089 may include, for example, a primary cell which is not rechargeable, a secondary cell which is rechargeable, or a fuel cell.
The communication module 1090 may support establishing a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device 1001 and the external electronic device (e.g., the electronic device 1002, the electronic device 1004, or the server 1008) and performing communication via the established communication channel. The communication module 1090 may include one or more communication processors that are operable independently from the processor 1020 (e.g., the AP) and support a direct (e.g., wired) communication or a wireless communication. The communication module 1090 may include a wireless communication module 1092 (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 1094 (e.g., a local area network (LAN) communication module or a power line communication (PLC) module). A corresponding one of these communication modules may communicate with the external electronic device via the first network 1098 (e.g., a short-range communication network, such as Bluetooth™, wireless-fidelity (Wi-Fi) direct, or a standard of the Infrared Data Association (IrDA)) or the second network 1099 (e.g., a long-range communication network, such as a cellular network, the Internet, or a computer network (e.g., LAN or wide area network (WAN))). These various types of communication modules may be implemented as a single component (e.g., a single IC), or may be implemented as multiple components (e.g., multiple ICs) that are separate from each other. The wireless communication module 1092 may identify and authenticate the electronic device 1001 in a communication network, such as the first network 1098 or the second network 1099, using subscriber information (e.g., international mobile subscriber identity (IMSI)) stored in the subscriber identification module 1096.
The antenna module 1097 may transmit or receive a signal or power to or from the outside (e.g., the external electronic device) of the electronic device 1001. The antenna module 1097 may include one or more antennas, and, therefrom, at least one antenna appropriate for a communication scheme used in the communication network, such as the first network 1098 or the second network 1099, may be selected, for example, by the communication module 1090 (e.g., the wireless communication module 1092). The signal or the power may then be transmitted or received between the communication module 1090 and the external electronic device via the selected at least one antenna.
Commands or data may be transmitted or received between the electronic device 1001 and the external electronic device 1004 via the server 1008 coupled with the second network 1099. Each of the electronic devices 1002 and 1004 may be a device of a same type as, or a different type, from the electronic device 1001. All or some of operations to be executed at the electronic device 1001 may be executed at one or more of the external electronic devices 1002, 1004, or 1008. For example, if the electronic device 1001 should perform a function or a service automatically, or in response to a request from a user or another device, the electronic device 1001, instead of, or in addition to, executing the function or the service, may request the one or more external electronic devices to perform at least part of the function or the service. The one or more external electronic devices receiving the request may perform the at least part of the function or the service requested, or an additional function or an additional service related to the request and transfer an outcome of the performing to the electronic device 1001. The electronic device 1001 may provide the outcome, with or without further processing of the outcome, as at least part of a reply to the request. To that end, a cloud computing, distributed computing, or client-server computing technology may be used, for example.
Embodiments of the subject matter and the operations described in this specification may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification may be implemented as one or more computer programs, i.e., one or more modules of computer-program instructions, encoded on a computer-storage medium for execution by, or to control the operation of, data-processing apparatus. Alternatively or additionally, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer-storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial-access memory array or device, or a combination thereof. Moreover, while a computer-storage medium is not a propagated signal, a computer-storage medium may be a source or destination of computer-program instructions encoded in an artificially generated propagated signal. The computer-storage medium can also be, or be included in, one or more separate physical components or media (e.g., multiple CDs, disks, or other storage devices). Additionally, the operations described in this specification may be implemented as operations performed by a data-processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
While this specification may contain many specific implementation details, the implementation details should not be construed as limitations on the scope of any claimed subject matter, but rather be construed as descriptions of features specific to particular embodiments. Certain features that are described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described herein. Other embodiments are within the scope of the following claims. In some cases, the actions set forth in the claims may be performed in a different order and still achieve desirable results. Additionally, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.
As will be recognized by those skilled in the art, the innovative concepts described herein may be modified and varied over a wide range of applications according to the inventive principles of this patent disclosure. Accordingly, the scope of claimed subject matter should not be limited to any of the specific exemplary teachings discussed above, but is instead defined by the following claims.
This application claims the priority benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 63/252,142, filed on Oct. 4, 2021, the disclosure of which is incorporated by reference in its entirety as if fully set forth herein.