LOW-DIMENSIONAL PROBABILISTIC DENSITY OF HIGH-DIMENSIONAL DATA MANIFOLD

Information

  • Patent Application
  • Publication Number: 20230004694
  • Date Filed: May 03, 2022
  • Date Published: January 05, 2023
Abstract
A computer models high-dimensional data with a low-dimensional manifold in conjunction with a low-dimensional base probability density. A first transform (a manifold transform) may be used to transform the high-dimensional data to a low-dimensional manifold space, and a second transform (a density transform) may be used to transform the low-dimensional manifold representation to a low-dimensional probability distribution. To enable the model to tractably learn the manifold transform between the high-dimensional and low-dimensional spaces, the manifold transform includes conformal flows, which simplify the probabilistic volume change and enable tractable learning of the transform. This also allows the manifold transform to be learned jointly with the density transform.
Description
BACKGROUND

This disclosure relates generally to computer modeling of high-dimensional data spaces, and more particularly to probabilistic modeling of high-dimensional data in a low-dimensional space.


As machine learning techniques and infrastructures become more sophisticated and improve performance on data sets, machine-learned models are increasingly tasked with processing high-dimensional data sets and generating new instances (also termed data points). Existing solutions struggle to represent the complete range of a high-dimensional data set, to do so in a low-dimensional space (e.g., representing a manifold of the relatively higher-dimensional data in a lower-dimensional space), and to simultaneously permit effective probabilistic modeling of the data with an approach that is actually computable (i.e., tractable). For example, while generative adversarial network (GAN) models have been used to learn to generate data in conjunction with feedback from a discriminative model, the generative model can neglect to learn how to generate certain types of content from the training data and does not model the underlying probabilities. As another example, models such as variational autoencoders (VAEs) may represent high-dimensional data points in low-dimensional spaces, but without modeling the underlying probability distribution.


Alternative solutions that do provide probabilistic information, such as normalizing flows, maintain the same data dimensionality and do not effectively learn complex high-dimensional spaces in which the high-dimensional data is better characterized as a manifold describable with a low-dimensional representation.


As such, there is a need for an approach that tractably models data points of a high-dimensional space, accounts for the manifold of the data within that space, and also provides effective density/probabilistic modeling.


SUMMARY

A computer model provides an approach for describing high-dimensional data in a high-dimensional space as a manifold described by a low-dimensional space and also modeled by a probability distribution. To model the data effectively and tractably, a first transform (also termed a manifold transform) between the high-dimensional space and the low-dimensional space includes one or more conformal flows. Various conformal flows provide operations for transforming data points in the high-dimensional space to the low-dimensional space. The low-dimensional space describing the manifold of the high-dimensional space is termed a low-dimensional manifold space, designating the coordinate system in which the high-dimensional manifold is represented. The manifold transform (as applied to data points in the high-dimensional space) describes a manifold of the high-dimensional space in the low-dimensional space as a corresponding low-dimensional manifold. To provide density estimation, a second transform (a density transform) transforms between the low-dimensional manifold space and a low-dimensional density space, in which a base probability distribution (e.g., a Gaussian) is readily determined.


The parameters of the first transformation (the manifold transformation) and the second transformation (the density transform) are learned based on training data in the high-dimensional space. After training, the model may be used to transform to and from the high-dimensional space and the base probability distribution in the low-dimensional density space. For example, a data point from the high-dimensional space may be transformed to the density space for comparison with the base distribution (e.g., to evaluate a new sample with respect to the learned probability distribution as in- or out-of-distribution) or an output of the model may be sampled by sampling a point from the base probability distribution and transforming the sampled point to the high-dimensional space as an output.


By transforming between the high and low-dimensional spaces to represent the manifold of the high-dimensional space, the actual regions of data distribution in the high-dimensional space may be effectively modeled, while the transformation to the density space permits the data to also be modeled with respect to the base probability distribution. In addition, the use of conformal flows enables the transformation between high- and low-dimensional spaces to be tractable, invertible, and include multiple layers (e.g., multiple conformal operations may be sequentially applied and maintain these properties).





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates a computer modeling system including components for probabilistic modeling of a high-dimensional space.



FIG. 2 shows an example of data points and a learned probability density.



FIG. 3 illustrates a high-dimensional space in which data points lie along a manifold.



FIG. 4 shows an example structure for a probabilistic computer model for modeling high-dimensional data with a manifold and probability density in low-dimensional space, according to one embodiment.



FIGS. 5A-E show example conformal flows in a two-dimensional space.



FIG. 6 shows an example of a manifold and an off-manifold data point.





The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.


DETAILED DESCRIPTION
Architecture Overview


FIG. 1 illustrates a computer modeling system 110 including components for probabilistic modeling of a high-dimensional space. The computer modeling system 110 includes computing modules and data stores for generating and using a computer model 160. In particular, the computer model 160 is configured to represent high-dimensional data with a low-dimensional manifold and as a probability density. The probabilistic computer model 160 is trained by the training module 120 to learn parameters for a learned probability density describing the training data of the training data store 140. Individual training data items are referred to as data points or data instances and may be represented in a "high-dimensional" space. The computer model 160 represents points in the high-dimensional space as a manifold in a low-dimensional space along with a probability density for the data. This enables the model to simultaneously address the appearance of the training data within a sub-region of the high-dimensional space while also enabling effective probabilistic applications for the model. To tractably convert the data from the high-dimensional space to a low-dimensional space, the model uses one or more conformal flows, which enable the transformation between high- and low-dimensional spaces to be effectively learned, and which permit learning a further transformation in the low-dimensional space to a base probability density, thereby learning the probability density with respect to the low-dimensional space. In various embodiments, these transformations are jointly learned such that the probability density and the manifold directly reflect the high-dimensional training data distribution.


After training, the sampling module 130 may sample outputs from the probabilistic computer model 160 by sampling a value from a base probability density in a low-dimensional space and transforming the sampled value to an output in the high-dimensional space, enabling the model to generatively create outputs similar in structure and distribution to the data points of the training data 140. Similarly, an inference module 150 may receive a new data point in the high-dimensional space and convert it to a point with respect to the base probability density for determination of the expected frequency of the data point given the learned probability density. This may be used to determine, for example, whether the new data point may be considered "in-distribution" or "out-of-distribution" with respect to the trained probability density. Further details of each of these aspects are discussed below.



FIG. 2 shows an example of data points and a learned probability density 220. In general, the data points for which the model is trained are considered to be sampled from an unknown probability density 200. Each of the data points 210 has a set of values in the dimensions of a high-dimensional space, and thus can be considered to represent a position in the high-dimensional space. Formally, the data points 210 may be represented as a set of points {x_i} drawn from the unknown probability density p*_x(x). The model is trained to learn a probability density p_x(x), represented by the trained/learned parameters of the computer model, based on the data points {x_i}. In many cases, however, high-dimensional data lies on a manifold of the high-dimensional space that may be more effectively modeled when described in a low-dimensional space, such that directly learning a probability density on the high-dimensional data may prove ineffective and may require many parameters, particularly for very high-dimensional data sets. In general, the high-dimensional space has a number of dimensions referred to as n, and the low-dimensional space has a number of dimensions referred to as m. While the concepts discussed herein may apply to any situation in which the low-dimensional space has fewer dimensions than the high-dimensional space (i.e., m < n), and may thus apply to dimensions of n = 3 and m = 2, in many cases the high-dimensional space may have tens or hundreds of thousands, or millions, of dimensions, and the low-dimensional space may have fewer dimensions by an order of magnitude or more.



FIG. 3 illustrates a high-dimensional space in which data points lie along a manifold. In this example, the high-dimensional space 300 represents image data in two dimensions. Each point of high-dimensional image data represents an image having dimensions that may have a value for each channel (e.g., 3 channels for RGB color) for each pixel across a length and width of the image. Hence, the total number of dimensions for an image data point in the high-dimensional space 300 in this example is the image length times the width times the number of channels, L×W×C, where each dimension may take any of the 2^B values permitted by the bit length B representing the color value. Stated another way, each color channel for each pixel across the image can have any value according to the bit length. In practice, however, only some portions of the complete dimensional space may be of interest and are represented in the training set. While the range of the complete high-dimensional image space can be used for any possible image, individual data sets typically describe a range across a subset of the high-dimensional space 300. In this example, a data set of human faces includes data points 310A-C. However, many points in the image data space do not represent human faces and may have no visually meaningful information at all, such as data points 320A-C, depicting points in the high-dimensional space that have no relation to the type of data of the human face data set. As such, while the high-dimensional data space 300 may permit a large number of possible positions of data points, in practice data sets (e.g., human faces) represent some portion of the high-dimensional space that may be characterized in fewer parameters (i.e., in lower dimensions). The region of the high-dimensional space may be described as a manifold 330 of the high-dimensional space. As discussed below, the shape of the manifold 330 in the high-dimensional space may be learned and represented in a low-dimensional space to characterize the actual positions of data points in the high-dimensional space 300. The manifold 330 is thus learned to generally describe the "shape" of the data points within the high-dimensional space and may thus be considered to describe constraints on the areas in which data points exist and interactions between them. For example, a data set of human faces may generally exist in a region of possible images in which there is a nose, eyes, mouth, and the image is mostly symmetrical.
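
As a brief illustration of the scale involved (the specific resolution and bit depth here are assumed example values, not taken from the disclosure), the dimensionality of such an image space may be computed directly:

    # Dimensionality of an example image space: n = L x W x C dimensions,
    # each holding one of 2^B values (the numbers here are assumptions).
    L, W, C, B = 64, 64, 3, 8        # 64x64 RGB image, 8-bit color values
    n = L * W * C                    # number of dimensions of the space
    values_per_dim = 2 ** B
    print(n)                                           # 12288 dimensions
    print(f"possible images: ({values_per_dim})^{n}")  # astronomically many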



FIG. 4 shows an example structure for a probabilistic computer model for modeling high-dimensional data with a manifold and a probability density in a low-dimensional space, according to one embodiment. As a general overview, the computer model performs probability estimation with respect to a base probability density 400 in a low-dimensional density space 410 and models a probability density for a high-dimensional manifold 470 with a low-dimensional manifold density 440 in a low-dimensional manifold space 430. The low-dimensional manifold space 430 is a space that may model a manifold of the high-dimensional data in a low-dimensional space based on the manifold transformation 450. In addition, the low-dimensional manifold space 430, in conjunction with a density transformation 420 from the base probability density 400, provides a low-dimensional manifold density 440 describing density information in the low-dimensional manifold space 430. Probability density information may be determined for the low-dimensional manifold density 440 by applying the density transformation 420 to the base probability density 400. Similarly, the low-dimensional manifold density 440 may be carried to the high-dimensional space 460 by applying the manifold transformation 450 to the low-dimensional manifold density 440.


The individual "spaces" may be considered to represent different coordinate systems, for which the manifold transformation 450 and density transformation 420 provide change-of-variable equations for changing coordinates with respect to one space to coordinates with respect to another space. In this sense, the low-dimensional manifold space 430 provides a bridge between 1) the low-dimensional manifold learned for the high-dimensional data and 2) the probabilistic density of the base probability density 400. The transformations may also be referred to as "flows" between the different data representations in the different spaces. Considered this way, the base probability density 400 flows through the density transformation 420 and then the manifold transformation 450 to provide a probability density in the high-dimensional space within the region of the learned high-dimensional manifold 470. As such, while the low-dimensional density space 410 may have the same number of dimensions as the low-dimensional manifold space 430, it represents a distinct coordinate system, such that a position in one space must be translated to the other via the appropriate density transformation 420 (e.g., h or h⁻¹). As discussed more fully below, the training data in the high-dimensional space 460 and the known base probability density 400 are used to learn the respective transformations and corresponding density distributions, permitting the model to effectively model high-dimensional data probabilistically with a low-dimensional manifold.


In some embodiments, the density transformation 420 and manifold transformation 450 generally apply one or more layers of operations in sequence, and in practice may be applied together without explicit designation of a low-dimensional manifold space 430. As such, the low-dimensional manifold space 430 and the respective low-dimensional manifold density 440 may reflect an intermediate state within a general sequence of transformations between the base probability density 400 in a low-dimensional space relative to a high-dimensional space of a data set/output. The manifold transformation 450 may thus refer to a transformation (one or more functions/layers) that changes the dimensionality of the high-dimensional space to describe an embedding in a low-dimensional space, while the density transformation 420 may refer to a transformation (one or more functions/layers) that retains dimensionality between a known probability density (e.g., the base probability density 400) and the manifold transformation 450.


In this example model structure, the computer model represents data with respect to a high-dimensional space 460, describing the high-dimensional space in which training data exists and for which sampled outputs from the model may be generated. The high-dimensional space may also be formally referred to as 𝒳. The model learns a manifold transformation 450 between a high-dimensional manifold 470 and a low-dimensional manifold density 440 in a low-dimensional manifold space 430. The low-dimensional manifold space 430 (and its respective coordinate system) may be referred to as 𝒰. The low-dimensional manifold density 440 describes the location of the high-dimensional manifold 470 with respect to the reduced dimensionality of the low-dimensional manifold space 430.


The manifold transformation 450 includes a function g and its (left) inverse g⁻¹ for transforming between the low-dimensional manifold space 430 and the high-dimensional space 460. The function g transforms points in the low-dimensional manifold space 430 to the high-dimensional space 460: g: 𝒰 → 𝒳. The inverse function g⁻¹ transforms points in the high-dimensional space 460 back to the low-dimensional manifold space 430: g⁻¹: 𝒳 → 𝒰. The range of outputs of the manifold transformation g (a subset of the high-dimensional space 460) is the high-dimensional manifold 470. Stated another way, the high-dimensional manifold 470 (as learned by the transformation) is defined by the manifold transformation g applied to coordinates in the low-dimensional manifold space 430: ℳ = g(𝒰). As discussed further below, the manifold transformation 450 includes one or more conformal flows, which permit the manifold transformation to be tractable and learnable by automated training processes, and allow it optionally to be combined with the density transformation 420 during training.


Similarly, the low-dimensional density space 410 is also referred to as 𝒵, with a density transformation 420 having a function h and its inverse h⁻¹ for transforming between the low-dimensional density space 410 and the low-dimensional manifold space 430, with corresponding equations 𝒰 = h(𝒵) and 𝒵 = h⁻¹(𝒰), respectively. In particular, the low-dimensional density space 410 is the coordinate system in which the base probability density 400 may be sampled. The base probability density 400 is a known probability density, such as a standard multivariate probability density (e.g., a standard Gaussian). The base probability density 400 is generally continuous, such that the probability density at a particular point may be described by a derivative. In addition, the probability of a region (e.g., a range of points) may be determined by the integral of that region with respect to the base probability density 400. In a standard distribution centered at the origin of the low-dimensional density space, a region having a particular distance from the origin may be evaluated to determine the relative accumulated probability of the points in the distribution having that distance or less to the origin. For example, a point at a distance from the origin corresponding to a 20% accumulated probability is more likely than 80% of the points in the probability distribution, while a point at a distance corresponding to a 95% accumulated probability is less likely than that 95% of accumulated points and more likely than only 5% of points in the distribution. In other known probability densities, the respective accumulated probability may be determined based on another metric, such as an accumulated probability relative to a mean, median, or mode of the base probability density 400.
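
For a standard Gaussian base density, this accumulated probability has a closed form: the squared distance of an m-dimensional standard normal sample to the origin follows a chi-squared distribution with m degrees of freedom. The following sketch (an illustration using NumPy/SciPy, not code from the disclosure; the dimensionality m is an assumed example value) evaluates it:

    import numpy as np
    from scipy.stats import chi2

    m = 8                                             # assumed latent dimensionality
    z = np.random.default_rng(0).standard_normal(m)   # a point in the density space

    r2 = float(z @ z)                    # squared distance to the origin
    accumulated = chi2.cdf(r2, df=m)     # fraction of the distribution closer to the origin

    # accumulated = 0.95 would mean the point is less likely than 95% of
    # samples from the base density (a candidate out-of-distribution flag).
    print(f"accumulated probability: {accumulated:.3f}")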


As such, the density transformation 420 may be considered as changing the positions in the known probability density to positions of the low-dimensional manifold space 430, such that the probability information of the base probability density 400 may be represented as a low-dimensional manifold density 440. As one application, points may be sampled from the base probability density 400 and transformed to the low-dimensional manifold density 440 with the density transformation 420. Similarly, points in the low-dimensional manifold density 440 may be transformed to the base probability density 400 for calculation of the respective likelihood of the point in the base probability density 400. In some embodiments, the density transformation 420 is a bijective flow.


Generally, the density transformation and the manifold transformation are invertible, continuous, and differentiable, such that the base probability density 400 may be carried forward to the respective manifolds while providing the equivalent probabilistic volume in the translated coordinate spaces. That is, the differential probabilistic volume dz of a point z in the low-dimensional density space 𝒵 should remain equivalent when converted to positions in the low-dimensional manifold space 430 (𝒰) and the high-dimensional space 460 (𝒳). Thus, volumes over equivalent regions across 𝒵, 𝒰, and 𝒳 conserve the same probabilistic value. Similarly, the transformations should be invertible such that they can be learned from the training data on the high-dimensional manifold 470.


In the discussion below, the various spaces and transformations may be referred to with reference numbers as shown in FIG. 4 or with respective symbols. The following correspondence table is provided for the avoidance of ambiguity:

Name                              Ref No.  Symbol
High-dimensional space            460      𝒳
High-dimensional manifold         470      ℳ
Low-dimensional manifold space    430      𝒰
Low-dimensional density space     410      𝒵
Base probability density          400      p_z(·)
Low-dimensional manifold density  440      p_u(·)
Density transformation            420      h: 𝒵 → 𝒰;  h⁻¹: 𝒰 → 𝒵
Manifold transformation           450      g: 𝒰 → ℳ ⊂ 𝒳;  g⁻¹: 𝒳 → 𝒰









In general cases, the transformations of such spaces have been difficult to learn, and in many cases may be intractable, such that they cannot be automatically solved in the general case by a trained model. In particular, the transformations are generally smooth and must account for the volumetric change in density as the coordinates are transformed across spaces. This may be particularly challenging when converting from 𝒰 to 𝒳, as the number of dimensions increases from the low-dimensional manifold space 430 to the high-dimensional space 460. In the general case, the differential change in volume when changing variables from the m-dimensional low-dimensional manifold space 𝒰 to the n-dimensional space 𝒳 may be expressed by an n×m Jacobian matrix J_g, describing the differential change in the coordinates of 𝒳 relative to the differential change in the coordinates of 𝒰. To determine how the probability density p_u(u) of a point u in 𝒰 transforms to a probability density p_x(x) at the corresponding point x in 𝒳, the instantaneous change in density across coordinate spaces can be described by the change in volume of 𝒳 relative to the change in volume in 𝒰:













∂𝒳/∂𝒰 (u) = ( det[ J_gᵀ(u) J_g(u) ] )^(1/2)    (Equation 1)







Eq. 1 shows that the Jacobian J_g(u) of the transform g and its transpose J_gᵀ(u) are multiplied to obtain a square m×m matrix with respect to the coordinates of 𝒰, for which the determinant can be computed as a scalar. As shown in Eq. 1, the square root of the determinant of the Jacobian transpose J_gᵀ multiplied by the Jacobian J_g may be used to determine the equivalent probability density when converting from 𝒰 to 𝒳 (more precisely, the instantaneous change in probabilistic volume at point u when converted from a volume in 𝒰 to a volume in 𝒳 at the corresponding point x). Generally, for the probability density of points u on the low-dimensional manifold density 440, the probability density for points x may thus be determined by converting points in 𝒳 to 𝒰, determining the probability density in 𝒰, and converting the density volume to 𝒳 per Equation 1:











p_x(x) = p_u(u) · | det[ J_gᵀ(u) J_g(u) ] |^(−1/2)    (Equation 2)







In Equation 2, as an abbreviation, "u" represents the conversion of a point x in 𝒳 to the low-dimensional manifold space 430 (i.e., 𝒰) with the high-to-low manifold transformation 450: u = g⁻¹(x). That is, p_u(g⁻¹(x)) is abbreviated as p_u(u) in Equation 2. As such, the probability density for 𝒳 is defined in Eq. 2 by converting points on the high-dimensional manifold ℳ to the low-dimensional manifold space 𝒰, determining the probability density in 𝒰, and applying Eq. 1 to that density to determine the equivalent change in density volume after the change of variables back to the high-dimensional space 𝒳.
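
As a numerical sanity check of Equations 1 and 2 (an illustrative example of ours, not from the disclosure), consider a toy embedding g: ℝ¹ → ℝ², g(u) = (u, u²), for which det[J_gᵀ J_g] = 1 + 4u²:

    # A toy numerical check of Equations 1 and 2 for g(u) = (u, u^2).
    import numpy as np

    def g(u):
        return np.array([u, u**2])

    def jacobian_g(u):
        return np.array([[1.0], [2.0 * u]])       # shape (n=2, m=1)

    def p_u(u):
        # assumed low-dimensional density: standard normal on the manifold coordinate
        return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

    u = 0.7
    x = g(u)                                       # the corresponding point on the manifold
    J = jacobian_g(u)
    vol = np.sqrt(np.linalg.det(J.T @ J))          # Equation 1: sqrt(det[J^T J])
    print(np.isclose(vol, np.sqrt(1 + 4 * u**2)))  # True: matches the closed form
    print(p_u(u) / vol)                            # Equation 2: p_x(x) at x = g(u)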


Similarly, and more simply, the probability density p_u of points in 𝒰 based on the probability density p_z in 𝒵 may be simplified when the low-dimensional spaces have the same dimensionality, and can be given by:






p_u(u) = p_z(h⁻¹(u)) · |det J_h(h⁻¹(u))|⁻¹    (Equation 3)


Combining the density transformation 420 and manifold transformation 450 of Equations 2 and 3 to transform the base probability density 400 in 𝒵 to 𝒳 applies the transformations h and g sequentially: g∘h. Applying the chain rule J_{g∘h} = J_g J_h provides the determinant det[J_hᵀ J_gᵀ J_g J_h] = (det J_h)² det[J_gᵀ J_g], because the Jacobian of the density transformation 420, J_h, is square. As such, the probability density of the high-dimensional manifold 470 in the high-dimensional space 𝒳 may be defined as:











p_x(x) = p_z(z) · |det J_h(z)|⁻¹ · | det[ J_gᵀ(u) J_g(u) ] |^(−1/2)    (Equation 4)







In Equation 4, "z" represents a point x transformed to 𝒵: z = h⁻¹(g⁻¹(x)). In this formulation, to properly learn both the manifold itself and its density, the transformations may be trained to maximize the log-likelihood of the training data. However, the resulting log det[J_gᵀ J_g] term is generally intractable and cannot be effectively learned, preventing effective automated machine learning in the general case.


To make this term tractable (i.e., computable) and to permit layering of individual transformational layers (e.g., such that g includes layers g₁, . . . , g_k: g = g₁ ∘ · · · ∘ g_k), the manifold transform includes one or more conformal flows. A conformal flow is a transformation whose Jacobian satisfies:






J_gᵀ(u) J_g(u) = λ²(u) · I_m    (Equation 5)


As shown in Eq. 5, the Jacobian transpose multiplied by the Jacobian of a conformal flow equals a scalar λ (a function of u), squared, times the identity matrix of size m (the dimensionality of the origin space, here the low-dimensional manifold space 𝒰). The relationship of Eq. 5 is also illustrated by the following matrix, in which m = 3 (i.e., I_m = I₃):








              ( λ²(u)    0       0    )
J_gᵀ J_g  =   (  0      λ²(u)    0    )
              (  0       0      λ²(u) )





The scalar λ is non-zero and may be referred to as the conformal factor. By selecting layers of the manifold transformation 450 to be conformal flows, multiple such layers may be sequentially applied (g₁ then g₂, etc.) and the transformation becomes tractable for automated learning of the manifold transformation with the density transformation. With conformal flows, the probability density of 𝒳 transformed from 𝒰 (Equation 2) simplifies to a transformation based on the scalar, as shown in Equation 6:






p_x(x) = p_u(u) · λ⁻ᵐ(u)    (Equation 6)



FIGS. 5A-E show example conformal flows in a two-dimensional space. As shown in these figures, the transformation of a space is shown along with a field showing the relative movement within the space. In FIG. 5A, a translation moves points a constant amount. FIG. 5B shows an orthogonal transformation, in which points are rotated about the origin. FIG. 5C shows scaling, in which points are scaled outward or inward from an origin. FIG. 5D shows a special conformal transformation ("SCT"), in which an inversion is followed by a translation and then another inversion. FIG. 5E shows an inversion, in which points are inverted, e.g., about a unit circle. As shown by these example conformal flows, another property of conformal flows is that orthogonal intersections between lines remain orthogonal after transformation. Stated another way, conformal flows preserve local angles during transformation.
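
The defining property of Equation 5 can also be checked numerically for a stack of such flows. The sketch below (an assumed example of ours) composes zero-padding, scaling, and an orthogonal transform into a map ℝ² → ℝ³ and verifies that JᵀJ is proportional to the identity:

    # A numerical check (assumed example) of the conformality condition,
    # Equation 5, for composed flows mapping R^2 -> R^3.
    import numpy as np

    theta, scale = 0.3, 1.7
    Q = np.array([[1, 0, 0],
                  [0, np.cos(theta), -np.sin(theta)],
                  [0, np.sin(theta),  np.cos(theta)]])   # orthogonal rotation in R^3

    def g(u):
        # zero-pad R^2 -> R^3, then scale and rotate: each step is conformal
        return scale * (Q @ np.append(u, 0.0))

    def numerical_jacobian(f, u, eps=1e-6):
        cols = []
        for j in range(len(u)):
            du = np.zeros_like(u)
            du[j] = eps
            cols.append((f(u + du) - f(u - du)) / (2 * eps))
        return np.stack(cols, axis=1)                    # shape (n=3, m=2)

    u = np.array([0.5, -1.2])
    J = numerical_jacobian(g, u)
    print(np.round(J.T @ J, 6))    # ≈ scale**2 * I_2, so λ(u) = scale per Equation 5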


Using a transformation with conformal flows, the transformation of the probability density from 𝒵 to 𝒳 in Equation 4 simplifies to:






p_x(x) = p_z(z) · |det J_h(z)|⁻¹ · λ⁻ᵐ(u)    (Equation 7)


As shown in Equation 7, the inverse density and manifold transforms are applied to convert points in 𝒳 to 𝒵 as before, and the Jacobian of the density transform J_h remains, while the Jacobian term of the manifold transform simplifies to the scalar λ⁻ᵐ(u), a function of u and the dimensionality m. As discussed below, this also permits a mixed loss function for joint training of the density transformation and manifold transformation, because the probability density conversion between 𝒵 and 𝒳 is tractable. When the transformations are instead trained sequentially (e.g., the manifold transformation followed by the density transformation), the manifold transformation may learn a configuration that is not effectively learned by the density transformation relative to other possible manifold transformations. Because the transforms from 𝒵 to 𝒳 are tractable, the two may be jointly learned, increasing the likelihood that the manifold transform learns a configuration effective for representing the density.
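
To make Equation 7 concrete, the following toy sketch (a one-dimensional assumed example of ours, with h affine and g a scaled zero-padding) evaluates log p_x(x) for a point on the manifold:

    import math
    import numpy as np

    s, t = 1.5, 0.2      # h(z) = s*z + t, so det J_h = s (m = 1)
    scale = 2.0          # g(u) = (scale*u, 0): zero-pad into R^2; λ(u) = scale

    def log_p_x(x):
        u = x[0] / scale                                  # g⁻¹ (left inverse)
        z = (u - t) / s                                   # h⁻¹
        log_p_z = -0.5 * (z**2 + math.log(2 * math.pi))   # standard normal base density
        m = 1
        return log_p_z - math.log(abs(s)) - m * math.log(scale)   # Equation 7

    x = np.array([2.4, 0.0])    # a point on the toy manifold (second coordinate 0)
    print(log_p_x(x))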


Conformal Layers

The manifold transform may include a number of layers (e.g., of individual transform operations) that together transform and change the dimensionality from the high-dimensional space to the low-dimensional space. In one embodiment, the manifold transform includes one or more of the transformations shown in FIGS. 5A-E, namely translation, orthogonal transformation, inversion, scaling, and SCT (special conformal transform). The manifold transform may include various layers performing individual transformational operations. The layers may include operations that change the dimensionality of the input and output (e.g., the transformational matrix is non-square) and layers that maintain the dimensionality through the operation (e.g., the transformational matrix is square). The layers are parametrizable conformal flows, such that the layers maintain the simplification shown by Equations 6 and 7 and the respective parameters may be learned during training. As layered conformal flows maintain conformality, many such layers may be stacked to modify dimensionality between the low-dimensional space and the high-dimensional space while learning the parameters describing the manifold in 𝒰 and maintaining the conformal properties through the layers of the manifold transformation 450 (e.g., the complete sequence of layers in g and its inverse).


The layers that preserve dimensionality may include the transformations shown in FIGS. 5A-5E, namely translation, orthogonal transformation, inversion, scaling, and SCT. These layers provide transforms from an input space u to an output space v and are parametrizable with the respective parameters shown in Table 1:









TABLE 1
Conformal Mappings

TYPE         FUNCTIONAL FORM                             PARAMS     INVERSE                                     λ(u)
Translation  u ↦ u + a                                   a ∈ ℝᵈ     v ↦ v − a                                   1
Orthogonal   u ↦ Qu                                      Q ∈ O(d)   v ↦ Qᵀv                                     1
Scaling      u ↦ λu                                      λ ∈ ℝ₊     v ↦ λ⁻¹v                                    λ
Inversion    u ↦ u/‖u‖²                                  (none)     v ↦ v/‖v‖²                                  ‖u‖⁻²
SCT          u ↦ (u − ‖u‖²b) / (1 − 2b·u + ‖b‖²‖u‖²)     b ∈ ℝᵈ     v ↦ (v + ‖v‖²b) / (1 + 2b·v + ‖b‖²‖v‖²)     (1 − 2b·u + ‖b‖²‖u‖²)⁻¹
Each of these conformal mappings is briefly discussed in turn.


The translation may learn a parameter a describing the relative movement of points in the input as a shift relative to the origin, which may be inverted by subtracting a.


The orthogonal transformation uses a matrix Q to rotate points about the origin. The matrix Q, as a parameter for the orthogonal transformation, is selected from the orthogonal matrices O(d) (of the respective layer dimensionality d), which preserve local angles and for which Q multiplied by its transpose yields the identity (QQᵀ = I_d). The orthogonal matrix Q may be parameterized for training, including with a Householder matrix or by parameterizing the special orthogonal group with a matrix exponential of skew-symmetric matrices. Equation 8 shows a definition of a Householder matrix in which v may be learned for constructing Q:









Q = I − 2 v vᵀ / ‖v‖²  (v ∈ ℝᵐ)    (Equation 8)







In the skew-symmetric parameterization, Q may be parameterized with Equation 9:






Q = exp(A), where Aᵀ = −A    (Equation 9)
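
Both parameterizations of Q may be sketched as follows (an illustration assuming NumPy/SciPy; the function names are ours, not the patent's):

    import numpy as np
    from scipy.linalg import expm

    def householder(v):
        v = np.asarray(v, dtype=float)
        return np.eye(len(v)) - 2.0 * np.outer(v, v) / (v @ v)   # Equation 8

    def skew_symmetric_q(A):
        A = np.asarray(A, dtype=float)
        S = A - A.T          # force skew-symmetry: S^T = -S
        return expm(S)       # Equation 9: exp of a skew-symmetric matrix is orthogonal

    rng = np.random.default_rng(0)
    for Q in (householder(rng.standard_normal(4)),
              skew_symmetric_q(rng.standard_normal((4, 4)))):
        print(np.allclose(Q @ Q.T, np.eye(4)))   # True: Q is orthogonal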


The scaling transform increases or decreases the distance of points from the origin based on the scaling amount.


The inversion inverts the values of points about a distance from the origin, typically but not always the unit distance. In some embodiments, the inversion may be numerically unstable, such that the SCT may be used as an alternative. As discussed above, the SCT (special conformal transform) applies, sequentially, an inversion followed by a translation followed by another inversion.


As such, to learn a conformal mapping at a particular dimensionality (without changing the dimensionality), the translation, orthogonal transform, scaling, and inversion layers may be stacked.
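
One possible realization of these dimension-preserving layers (the class structure and names below are our assumptions, not the patent's code) implements each mapping of Table 1 with a forward map, an inverse, and its conformal factor λ(u); stacking layers multiplies the factors evaluated at the intermediate points:

    import numpy as np

    class Translation:
        def __init__(self, a): self.a = np.asarray(a, dtype=float)
        def forward(self, u): return u + self.a
        def inverse(self, v): return v - self.a
        def conformal_factor(self, u): return 1.0

    class Orthogonal:
        def __init__(self, Q): self.Q = np.asarray(Q, dtype=float)  # Q ∈ O(d)
        def forward(self, u): return self.Q @ u
        def inverse(self, v): return self.Q.T @ v
        def conformal_factor(self, u): return 1.0

    class Scaling:
        def __init__(self, lam): self.lam = float(lam)              # λ > 0
        def forward(self, u): return self.lam * u
        def inverse(self, v): return v / self.lam
        def conformal_factor(self, u): return self.lam

    class Inversion:
        def forward(self, u): return u / (u @ u)
        def inverse(self, v): return v / (v @ v)                    # self-inverse
        def conformal_factor(self, u): return 1.0 / (u @ u)

    class SCT:
        """Special conformal transform: inversion, translation, inversion."""
        def __init__(self, b): self.b = np.asarray(b, dtype=float)
        def forward(self, u):
            b, n2 = self.b, u @ u
            return (u - n2 * b) / (1.0 - 2.0 * (b @ u) + (b @ b) * n2)
        def inverse(self, v):
            b, n2 = self.b, v @ v
            return (v + n2 * b) / (1.0 + 2.0 * (b @ v) + (b @ b) * n2)
        def conformal_factor(self, u):
            b = self.b
            return 1.0 / (1.0 - 2.0 * (b @ u) + (b @ b) * (u @ u))

    # Stacked layers remain conformal: the factor of the composition is the
    # product of per-layer factors evaluated at the intermediate points.
    Qrot = np.array([[0.0, -1.0], [1.0, 0.0]])      # 90° rotation, orthogonal
    layers = [Translation([0.1, -0.2]), Orthogonal(Qrot), Scaling(2.0), SCT([0.05, 0.03])]
    u = np.array([0.4, 0.9])
    x, lam = u.copy(), 1.0
    for layer in layers:
        lam *= layer.conformal_factor(x)
        x = layer.forward(x)
    for layer in reversed(layers):
        x = layer.inverse(x)
    print(np.allclose(x, u), lam)   # True, composed λ at the input point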


Various transforms may also be used to modify the dimensionality of the input and output for a layer. For example, the example transforms above may also be modified to versions which modify dimensionality while maintaining conformal properties. As additional examples, a layer may include non-square matrices with orthonormal columns (which are conformal) to modify the dimensionality of a layer. As another example, a layer may include zero-padding to modify the dimensionality of a layer by adding zeros in additional dimensions. By following the zero-padding layer with additional transformations, the additional dimensions in a relatively higher-dimensional space may be populated based on information from the lower-dimensional layers.
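
A minimal sketch of such a dimension-raising layer (an assumed construction of ours): zero-padding followed by mixing is equivalent to multiplying by a matrix W with orthonormal columns, which is conformal with λ(u) = 1 and admits an exact left inverse Wᵀ:

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 2, 5

    # orthonormal columns via QR of a random n x m matrix
    W, _ = np.linalg.qr(rng.standard_normal((n, m)))

    def forward(u):
        return W @ u                 # equivalently zero-pad, then rotate

    def left_inverse(x):
        return W.T @ x               # W^T W = I_m recovers u exactly on the manifold

    u = rng.standard_normal(m)
    print(np.allclose(left_inverse(forward(u)), u))   # True
    print(np.round(W.T @ W, 6))                       # ≈ I_m: conformal with λ = 1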


As an additional example, a layer may include convolutions within the transformation. In one embodiment of a conformal (and invertible) convolution, the convolutional layer may include a k×k convolution with a stride of k, such that the convolutional layer has a block-diagonal Jacobian. The layer may thus implement a set of convolutional filters that together form an orthogonal matrix, providing a conformal layer. Similarly, the blocks may be inverted with a transposed convolution of the same filters.
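
One way such a conformal convolution may be realized (the reshaping scheme below is our assumption) is to multiply each non-overlapping k×k patch by a single shared orthogonal matrix, which yields a block-diagonal orthogonal Jacobian and an exact inverse via the transposed filters:

    import numpy as np

    rng = np.random.default_rng(2)
    k, C, H, W = 2, 3, 4, 4
    d = C * k * k
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # shared orthogonal filter bank

    def conv_forward(x):
        # x: (C, H, W) -> non-overlapping patches: (H//k * W//k, C*k*k), mixed by Q
        patches = x.reshape(C, H // k, k, W // k, k).transpose(1, 3, 0, 2, 4).reshape(-1, d)
        return patches @ Q.T

    def conv_inverse(y):
        patches = y @ Q                                 # transpose of the same filters
        return patches.reshape(H // k, W // k, C, k, k).transpose(2, 0, 3, 1, 4).reshape(C, H, W)

    x = rng.standard_normal((C, H, W))
    print(np.allclose(conv_inverse(conv_forward(x)), x))   # True: inverts exactly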


In addition, to account for additional types of manifold transform layers, the conformality requirement may in some embodiments be relaxed to allow layers that are not completely smooth. As one example, a manifold transformation layer may be required to be conformal only with respect to the regions of 𝒰 to which the density transformation may transform points from 𝒵, i.e., conformal at h(z), such that g(u) remains conformal from the positions of 𝒰 corresponding to h(z). As another example, the conformal layers may include piecewise conformal layers, such as a piecewise activation (ReLU) layer or a conditional orthogonal layer. Examples of these piecewise layers are shown in Table 2:









TABLE 2
Piecewise Conformal Embeddings

TYPE                    FUNCTIONAL FORM                        PARAMS          LEFT INVERSE                            λ(u)
Conformal ReLU          u ↦ ReLU([Qu; −Qu])                    Q ∈ O(d)        [v₁; v₂] ↦ Qᵀ(v₁ − v₂)                  1
Conditional Orthogonal  u ↦ Q₁u if ‖u‖ < 1; Q₂u if ‖u‖ ≥ 1     Q₁, Q₂ ∈ O(d)   v ↦ Q₁ᵀv if ‖v‖ < 1; Q₂ᵀv if ‖v‖ ≥ 1    1
As shown by the foregoing discussion, many types of conformal flows (e.g., individual layers) may be included while keeping the overall manifold transformation simplified and tractable, where such transformations were not previously effective to analyze. While conformal layers constrain the types of transform that may be considered in modeling the manifold in the high-dimensional space, the various types of transforms permit complex transformations of the space while reducing the dimensionality. The model structure may include a large number of different layers for which parameters are learned and may be constructed according to the particular type of data.
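
As an illustration of the piecewise layers (an assumed implementation of the Conformal ReLU row of Table 2), note that the layer doubles the dimension, so it embeds rather than bijects, and the stated left inverse recovers the input exactly:

    import numpy as np

    rng = np.random.default_rng(3)
    d = 3
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))    # Q ∈ O(d)

    def conformal_relu(u):
        qu = Q @ u
        return np.concatenate([np.maximum(qu, 0.0), np.maximum(-qu, 0.0)])

    def left_inverse(v):
        v1, v2 = v[:d], v[d:]
        return Q.T @ (v1 - v2)      # ReLU(qu) - ReLU(-qu) = qu, then undo Q

    u = rng.standard_normal(d)
    print(np.allclose(left_inverse(conformal_relu(u)), u))   # True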


Training

The parameters of the model may be learned to optimize the manifold transform, which characterizes the manifold of the high-dimensional space, and the density transform, which characterizes the probability density on the manifold. Thus, generally, the transforms must achieve two objectives: align the learned manifold with the training data and evaluate densities for off-manifold points.



FIG. 6 shows an example of a manifold 610 and an off-manifold data point 600. As shown, the data point x 600 is not accurately captured by the manifold. As one way of describing the error in the manifold transformation, a high-dimensional training data point may be converted to the low-dimensional space with the inverse transform g⁻¹ and then re-converted to the high-dimensional space with g: g(g⁻¹(x)). The point converted back to the high-dimensional space will be located on the manifold defined by the transform, allowing a reconstruction loss to be described by the difference between the original position of x and its position after the inverse transform and the transform are applied. In one embodiment, the manifold transformation may be learned by minimizing such a reconstruction error over the high-dimensional points in the training set. The density transformation may then be sequentially learned to describe the probability density with respect to the low-dimensional manifold based on the manifold transform applied to the training data.


However, when using conformal flows, because the manifold transform is tractable, it may be jointly learned in conjunction with the density transform. As one example training loss, the loss may be defined as:






ℒ = 𝔼_{x∼p*_x}[ −log p_x(x) + α‖x − g(g⁻¹(x))‖² ]    (Equation 10)


As shown in Equation 10, the loss may directly minimize the negative log-likelihood together with a reconstruction term, a result which is now possible because the manifold transformation is actually computable, allowing the parameters for both the density transform and manifold transform to be jointly learned. In some embodiments, the manifold transform may be initialized with the reconstruction loss before applying the joint loss, such that the low-dimensional manifold has a more effective starting point for the joint learning. As another example training approach, the transforms may be trained with a loss function that minimizes the Wasserstein distance between the training data distribution and the learned probability density.
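
A toy joint-training sketch is shown below, assuming PyTorch; the specific parameterization (an affine h, and a g built from scaling, rotation, zero-padding, and translation, with data near a line in ℝ²) is an illustrative assumption of ours rather than the patent's architecture:

    import math
    import torch

    torch.manual_seed(0)

    # synthetic data near a 1-D manifold (a tilted line) in R^2
    coords = torch.randn(512)
    data = torch.stack([2.0 * coords + 1.0, -coords + 0.5], dim=1)
    data = data + 0.01 * torch.randn_like(data)

    # h(z) = s*z + t; g(u) = λ · Q(θ) [u, 0]^T + a
    log_s = torch.zeros((), requires_grad=True)
    t = torch.zeros((), requires_grad=True)
    log_lam = torch.zeros((), requires_grad=True)
    theta = torch.zeros((), requires_grad=True)
    a = torch.zeros(2, requires_grad=True)

    def Q():
        c, s_ = torch.cos(theta), torch.sin(theta)
        return torch.stack([torch.stack([c, -s_]), torch.stack([s_, c])])

    def g(u):                               # (N,) -> (N, 2)
        pad = torch.stack([u, torch.zeros_like(u)], dim=1)
        return torch.exp(log_lam) * pad @ Q().T + a

    def g_left_inverse(x):                  # (N, 2) -> (N,)
        return (((x - a) @ Q()) / torch.exp(log_lam))[:, 0]

    opt = torch.optim.Adam([log_s, t, log_lam, theta, a], lr=1e-2)
    alpha = 10.0
    for _ in range(2000):
        u = g_left_inverse(data)
        z = (u - t) / torch.exp(log_s)
        log_p_z = -0.5 * (z**2 + math.log(2 * math.pi))
        log_p_x = log_p_z - log_s - log_lam            # Equation 7 with m = 1
        recon = ((data - g(u)) ** 2).sum(dim=1)        # ||x - g(g⁻¹(x))||²
        loss = (-log_p_x + alpha * recon).mean()       # Equation 10
        opt.zero_grad(); loss.backward(); opt.step()

    print(float(loss))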


Model Application

After training the model, the model may be used for inference or sampling by the inference module 150 and sampling module 130, respectively. To perform inference, a new data point may be converted through the transforms to the low-dimensional density space 𝒵 and compared with the probability density (e.g., the accumulated probability) to determine the likelihood of the point relative to the training data. This may be used, for example, to describe the relative portion of points that are more or less likely than the new point, or to determine whether the point may be considered in- or out-of-distribution based on its likelihood. To perform sampling from the model, a point may be sampled from the base probability density and passed through the transforms to a point in the high-dimensional space, which may be output as a sample of the model.
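
Both uses may be sketched for a toy trained model (all component values below are assumed for illustration): sampling applies z through h and then g, while inference applies the left inverses and scores the accumulated probability in 𝒵:

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(4)

    # assumed trained components for m = 1, n = 2
    s, t, lam = 1.5, 0.2, 2.0
    direction = np.array([0.6, 0.8])              # unit vector: a column of an orthogonal Q
    offset = np.array([1.0, 0.5])

    def sample(n_samples):
        z = rng.standard_normal(n_samples)        # base probability density
        u = s * z + t                             # density transformation h
        return lam * np.outer(u, direction) + offset   # manifold transformation g

    def accumulated_probability(x):
        u = (x - offset) @ direction / lam        # g⁻¹ (left inverse)
        z = (u - t) / s                           # h⁻¹
        return chi2.cdf(z**2, df=1)               # accumulated probability in Z

    samples = sample(5)
    print(samples)
    print(accumulated_probability(samples[0]))    # small for typical in-distribution points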


The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.


Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times, to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combinations thereof.


Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.


Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.


Embodiments of the invention may also relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer readable storage medium and may include any embodiment of a computer program product or other data combination described herein.


Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

Claims
  • 1. A system for probabilistic manifold modeling, comprising: a processor; anda computer-readable medium having instructions executable by the processor for: identifying a high-dimensional output space;identifying a low-dimensional space with a base probability distribution;applying a first transformation comprising one or more conformal flows between the high-dimensional output space and a first position in the low-dimensional space, the first transformation describing a manifold of the high-dimensional output space in the low-dimensional space; andapplying a second transformation between the first position and a second position corresponding to the base probability distribution in the low-dimensional space.
  • 2. The system of claim 1, wherein the first transformation consists of one or more conformal flows.
  • 3. The system of claim 1, wherein the high-dimensional space is an image space having dimensions describing a plurality of pixels at a resolution.
  • 4. The system of claim 1, wherein the instructions are further executable for learning the first transformation and second transformation based on a training set of data points in the high-dimensional space.
  • 5. The system of claim 4, wherein the first transformation and second transformation are jointly learned.
  • 6. The system of claim 1, wherein the instructions are further executable for determining the second point by sampling from the base probability distribution; and wherein applying the first and second transformation comprises applying the second transformation to the second point to determine the first position and applying the first transformation to the first position to generate a sampled output in the high-dimensional output space.
  • 7. The system of claim 1, wherein the instructions are further executable for: receiving a test data point in the high-dimensional output space, the first transformation being applied to the test data point to determine the first position and the second transformation being applied to the first position to determine the second position; anddetermining a likelihood of the test data point with respect to an unknown distribution in the high-dimensional output space based on a likelihood of the second data point with respect to the base distribution.
  • 8. A method for probabilistic manifold modeling, comprising: identifying a high-dimensional output space;identifying a low-dimensional space with a base probability distribution;applying a first transformation comprising one or more conformal flows between the high-dimensional output space and a first position in the low-dimensional space, the first transformation describing a manifold of the high-dimensional output space in the low-dimensional space; andapplying a second transformation between the first position and a second position corresponding to the base probability distribution in the low-dimensional space.
  • 9. The method of claim 8, wherein the first transformation consists of one or more conformal flows.
  • 10. The method of claim 8, wherein the high-dimensional space is an image space having dimensions describing a plurality of pixels at a resolution, each pixel having one or more color channels.
  • 11. The method of claim 8, further comprising learning the first transformation and second transformation based on a training set of data points in the high-dimensional space.
  • 12. The method of claim 11, wherein the first transformation and second transformation are jointly learned.
  • 13. The method of claim 8, further comprising determining the second point by sampling from the base probability distribution; and wherein applying the first and second transformation comprises applying the second transformation to the second point to determine the first position and applying the first transformation to the first position to generate a sampled output in the high-dimensional output space.
  • 14. The method of claim 8, further comprising: receiving a test data point in the high-dimensional output space, the first transformation being applied to the test data point to determine the first position and the second transformation being applied to the first position to determine the second position; anddetermining a likelihood of the test data point with respect to an unknown distribution in the high-dimensional output space based on a likelihood of the second data point with respect to the base distribution.
  • 15. A non-transitory computer-readable medium for probabilistic manifold modeling, the non-transitory computer-readable medium comprising instructions executable by a processor for: identifying a high-dimensional output space;identifying a low-dimensional space with a base probability distribution;applying a first transformation comprising one or more conformal flows between the high-dimensional output space and a first position in the low-dimensional space, the first transformation describing a manifold of the high-dimensional output space in the low-dimensional space; andapplying a second transformation between the first position and a second position corresponding to the base probability distribution in the low-dimensional space.
  • 16. The non-transitory computer-readable medium of claim 15, wherein the first transformation consists of one or more conformal flows.
  • 17. The non-transitory computer-readable medium of claim 15, wherein the high-dimensional space is an image space having dimensions describing a plurality of pixels at a resolution, each pixel having one or more color channels.
  • 18. The non-transitory computer-readable medium of claim 15, wherein the instructions are further executable for learning the first transformation and second transformation based on a training set of data points in the high-dimensional space.
  • 19. The non-transitory computer-readable medium of claim 15, wherein the instructions are further executable for determining the second point by sampling from the base probability distribution; and wherein applying the first and second transformation comprises applying the second transformation to the second point to determine the first position and applying the first transformation to the first position to generate a sampled output in the high-dimensional output space.
  • 20. The non-transitory computer-readable medium of claim 15, wherein the instructions are further executable for: receiving a test data point in the high-dimensional output space, the first transformation being applied to the test data point to determine the first position and the second transformation being applied to the first position to determine the second position; anddetermining a likelihood of the test data point with respect to an unknown distribution in the high-dimensional output space based on a likelihood of the second data point with respect to the base distribution.
CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of provisional U.S. application No. 63/210,957, filed Jun. 15, 2021, the contents of which are incorporated herein by reference in their entirety.

Provisional Applications (1)
Number Date Country
63210957 Jun 2021 US