Tensors (or arrays) are computational and/or mathematical objects employed for organizing, structuring, and storing “atoms” of data. An array (or tensor) may include one or more cells (or components), where each cell stores an “atom” of data (e.g., discrete values of data). An atom of data may include a single value, e.g., an integer, a float, a double, a char (e.g., a single character data type), a string of chars, or the like. A data atom may be a real number (e.g., a rational number if represented or stored via digital means), a complex number, or other such value. The cells or components of an array may be uniquely indexed (e.g., addressed) via a set of indices, where each index of the set of indices corresponds to a single dimension of the array. The number of indices required to uniquely refer to (or address) any particular data atom (e.g., a value) or cell is often referred to as the order, rank, degree, or way of the array or tensor. Thus, the number of indices required to access a particular cell is equivalent to the order or rank of the array. The order (or rank) of an array may take on the value of any non-negative integer, without upper bound, i.e., order=0, 1, 2, 3, . . . .
For example, a 0th-order tensor may represent (and/or store) a scalar object, a 1st-order tensor may represent a vector object, and a 2nd-order tensor may represent a matrix object. The order or rank of an array may be referred to as the dimensionality (e.g., D) of the array. For instance, a 1D array may store a (multi-dimensional) vector, a 2D array may store an M×N matrix, and the like. Note that the components of each “dimension” or 1D slice of the array may store a multi-dimensional object (e.g., a multi-dimensional vector). The dimensionality (e.g., the number of components/cells) associated with a particular 1D slice of an array (e.g., the portion of the array that is referenced by selecting a single index to vary, while holding all of the other indices constant) may be referred to as the length or depth of the corresponding 1D slice or dimension.
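By way of non-limiting illustration, the relationship between the order of an array and the number of indices needed to address a cell can be sketched in Python. The helper names `order` and `lengths` are hypothetical, nested lists stand in for arrays, and the sketch assumes non-empty, rectangular nesting. Python indices start at 0, whereas the indices discussed herein may start at 1.

```python
# Hypothetical helpers; nested Python lists stand in for arrays.

def order(array):
    """Number of indices needed to address one cell (0 for a scalar)."""
    depth = 0
    while isinstance(array, list):
        depth += 1
        array = array[0]  # assumes non-empty, rectangular nesting
    return depth


def lengths(array):
    """Length (or depth) of each dimension, outermost first."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)


scalar = 3.5                       # 0th-order: no index needed
vector = [1.0, 2.0, 3.0]           # 1st-order: one index
matrix = [[1, 2, 3], [4, 5, 6]]    # 2nd-order: a 2x3 matrix, two indices

print(order(scalar), order(vector), order(matrix))
print(lengths(matrix))
print(matrix[1][2])  # two indices address exactly one cell
```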
Multi-dimensional data (and thus tensors or arrays) are ubiquitous in computational settings. For example, multi-dimensional arrays are employed to store images, videos, numerical ratings, social networks, knowledge bases, and other such multidimensional data. At least due to the combinatorial explosion inherent with increasing dimensionality, an array may be large (e.g., the array includes a significant number of dimensions and/or a significant length or depth is associated with one or more of its dimensions).
Oftentimes, when accessed for processing, such a large array may be incomplete. That is, the data to be stored in a significant number of the array's cells is not (at least yet) available, or has yet to be generated, acquired, and/or collected. For such an incomplete array, a first portion of its cells may store “relevant” atoms of data, and a second, incomplete portion of its cells does not (yet) store a relevant value. For instance, when allocated, the cells may be initialized to store a “null” or “zero” value (e.g., a non-relevant value). As relevant data is collected, each relevant value may be stored in its corresponding cell. Thus, the array may be “filled-in” over time as data is generated or collected. Arrays or tensors that are significantly incomplete (i.e., a significant portion of their cells do not store a relevant value) may be referred to as “sparse” arrays or tensors.
The technology described herein is directed towards enhanced methods and systems for the prediction of multi-dimensional data via tensor-completion models. Some embodiments employ various statistical influence and data augmentation techniques to select certain cells of a tensor for augmentation and additionally employ one or more neural-based tensor-completion models to predict the multi-dimensional data.
Given a tensor with cells associated with a set of entities, various embodiments initially train a first machine learning model (e.g., a neural tensor-completion model) using cells from the tensor. Some non-limiting embodiments utilize influence functions to estimate the importance of at least a portion of the tensor's cells on minimizing loss at intervals during training of the first machine learning model. The influence functions may be employed to determine the importance for any cell of the tensor, including training cells, test cells, or any other cell (or component) of the tensor. In some embodiments, the importance for each of the training cells is estimated, via one or more influence functions. That is, some embodiments determine a cell-importance metric for each cell used to train the first machine learning model. Various embodiments compute the importance (e.g., entity-importance metric) of each entity associated with the tensor by aggregating the cell-importance metrics of the entity's associated training cells. For example, to compute the entity-importance metric associated with a particular entity of the tensor, the cell-importance metrics for each of the particular entity's associated cells may be combined to generate the entity-importance metric for the particular entity. The importance of an entity (e.g., as encoded by its corresponding entity-importance metric) signifies the entity's impact in reducing the prediction error.
Some embodiments select cells to augment and generate new data points (e.g., augmented cells) by sampling entities proportional to their entity-importance metrics. The new or augmented cells may form an augmented tensor. Values of the augmented tensor cells are predicted via a trained machine learning method (e.g., either the first machine learning model or a second machine learning model). This influence-based sampling of entities is employed to generate augmented data points by using important entities (e.g., as weighted by their entity-importance metrics in the sampling process), and thus, can lead to higher test prediction accuracy than conventional tensor completion methods, with associated decreased time and/or space complexities.
Many existing data-analysis schemes assume an input of a complete tensor or array. That is, these data-analysis schemes may fail when the input is a sparse array. Thus, investigations into tensor or array completion tasks (i.e., the task of predicting missing relevant values for an incomplete tensor's incomplete cells) constitute an active area of research. However, many conventional tensor completion methods lead to inaccurate or noisy predictions. Furthermore, as a consequence of the potential combinatorial explosion of large arrays, the time and/or space complexity of many such conventional methods results in computation-times that render the widespread adoption of these conventional methods infeasible or impractical.
The technology described herein is generally directed towards tensor or array completion methods and systems. As used throughout, the terms “tensor” and “array” may be employed interchangeably to refer to a mathematical and/or computational object that structures and/or stores atoms of data (e.g., data atoms) in cells or components of the tensor or array. As noted above, conventional tensor completion methods often impute significant errors when predicting missing values for incomplete tensors. Even if a particular conventional tensor completion method proves sufficiently accurate when predicting particular missing values for a particular incomplete tensor, the computational time and/or space complexities of such a conventional tensor completion method may be significant enough to render the wide deployment of the conventional method impractical or infeasible. The various embodiments overcome these and other limitations of conventional tensor completion methods, and provide enhancements over conventional methods, at least due in part to the statistical influence and data augmentation methods discussed throughout.
Various embodiments receive, as input, a “target” tensor that is an incomplete tensor and provide, as output, a complete (or near-complete) tensor, where the previously incomplete (or “empty”) cells are now storing relevant values. The relevant values for the newly-populated cells have been predicted and/or inferred (e.g., “interpolated”) from the relevant values stored in the target tensor, via the tensor completion methods and systems discussed herein. Some embodiments employ one or more tensor-completion (or array-completion) models to predict the missing relevant values. In various embodiments, a tensor-completion model is implemented by one or more neural networks, and thus may be referred to as a neural tensor-completion model. The one or more neural tensor-completion models may be trained via training data. As discussed throughout, data augmentation of the training data may be applied to generate more accurate predictions of the missing relevant values. That is, the embodiments leverage the strength of neural tensor-completion methods, and improve such neural tensor-completion methods via statistical influence and data augmentation.
To achieve such enhancements over conventional methods, various statistical methods, known as influence functions, are applied in the training of one or more tensor-completion models to determine “influential” training data points. During a data augmentation stage (of an automated pipeline implemented by the various embodiments), the entities of the training data may be sampled (each weighted according to its associated influence) to generate augmented data points. Data augmentation increases the generalization capability of a model by generating new data points (e.g., augmented data points) for training a tensor-completion model. The augmented data points may be employed during the training of a second tensor-completion model, or to further train the initial completion model. Once trained, the completion model may predict or infer relevant values for the incomplete portions of the input target tensor.
In one non-limiting embodiment, the incomplete target tensor stores data that is employed in a recommendation system (e.g., a movie recommendation system). The relevant values stored in such a tensor for a movie recommendation system may include real (or at least rational, e.g., a float or double) numbers that indicate user-provided movie ratings. As used throughout, a tensor may be referenced via the script X. An input tensor to such a movie recommendation system may be a 3-rank tensor with the associated set of indices (i,j,k), where the index i indicates the ith user of the system, the index j indicates the jth movie of the system, and the index k indicates the kth time slice of the recommendation system. Thus, the rating the ith user gave to the jth movie during the kth time slice may be referenced as X(i,j,k). X may be an incomplete tensor when not every user has provided a rating to every movie during every time slice. The embodiments are enabled to predict values for unobserved tensor or array cells. Thus, some embodiments may receive (as input) an incomplete X (e.g., encoding values of movie ratings that some users provided for some movies during some time slices). Based on the incomplete target tensor, various embodiments predict movie ratings that are not included in the incomplete target tensor, e.g., movie ratings that the users did not actually provide, based on the ratings that the users did actually provide.
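As a non-limiting sketch of the movie-rating example above, an incomplete 3-order tensor may be held as a mapping from index tuples (i, j, k) to observed ratings, with unobserved cells simply absent. The ratings, the shape, and the helper names `is_observed` and `density` are made up for illustration, and indices start at 0 here.

```python
# Illustrative data only: the shape and ratings below are invented.
shape = (3, 4, 2)  # (users, movies, time slices)

# Observed cells of X, keyed by (i, j, k); unobserved cells are absent.
X = {
    (0, 1, 0): 4.0,
    (0, 3, 1): 2.5,
    (1, 1, 0): 5.0,
    (2, 0, 1): 3.0,
}


def is_observed(tensor, index):
    """True when the cell at `index` stores a relevant value."""
    return index in tensor


def density(tensor, shape):
    """Fraction of cells that store a relevant value."""
    total = 1
    for length in shape:
        total *= length
    return len(tensor) / total


print(X[(0, 1, 0)])               # rating user 0 gave movie 1 in time slice 0
print(is_observed(X, (2, 2, 0)))  # this rating was never provided
print(density(X, shape))          # 4 observed cells out of 24
```

A tensor-completion model would be asked to predict values for exactly those index tuples for which `is_observed` returns `False`.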
An “entity” of a tensor may refer to a particular value of a particular index of the array, and its associated cells. Thus, an entity of a tensor may correspond to a sub-tensor (or sub-array) of the tensor. In the above movie recommendation scenario, an entity of the target tensor may refer to a particular user (and all their ratings), a particular movie (and all its ratings), or a particular time slice (and all the ratings provided during the particular time slice). An entity of a tensor may refer to a portion of the tensor (which is a sub-tensor of the tensor) that is referenced by holding the value of a particular index constant, while varying each of the other indices across their corresponding ranges (e.g., the depth or length of the dimensions corresponding to the other indices). Thus, an entity of an N-order tensor may be associated with an (N−1)-order tensor. A dimension of a tensor may refer to a 1D slice of the tensor. Thus, a dimension of a tensor may refer to a 1D array or a multidimensional vector object. A particular dimension of a tensor may be referenced by allowing a particular index to vary across its associated range, while holding the other N−1 indices constant. The “depth” or “length” of a dimension may refer to the dimensionality of the corresponding vector object. Each dimension of a tensor may have a separate depth or length. Accordingly, the index for a particular dimension may range from 1 to the positive integer corresponding to the length of the dimension. Furthermore, the number of entities corresponding to a tensor is the arithmetic sum of the lengths of each of its dimensions.
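The notion of an entity as an (N−1)-order sub-tensor, and the count of entities as the sum of the dimension lengths, can be sketched as follows. The function names `entity_subtensor` and `num_entities` and the sample ratings are hypothetical, and indices start at 0 rather than 1.

```python
# Illustrative data only: a sparse (users, movies, time slices) tensor.
shape = (3, 4, 2)
X = {
    (0, 1, 0): 4.0,
    (0, 3, 1): 2.5,
    (1, 1, 0): 5.0,
    (2, 0, 1): 3.0,
}


def entity_subtensor(tensor, dim, value):
    """Cells with index `dim` held at `value`, keyed by the other N-1 indices."""
    return {
        idx[:dim] + idx[dim + 1:]: v
        for idx, v in tensor.items()
        if idx[dim] == value
    }


def num_entities(shape):
    """Number of entities: the sum of the lengths of all dimensions."""
    return sum(shape)


user_0 = entity_subtensor(X, dim=0, value=0)   # all ratings given by user 0
movie_1 = entity_subtensor(X, dim=1, value=1)  # all ratings received by movie 1

print(user_0)
print(movie_1)
print(num_entities(shape))  # 3 users + 4 movies + 2 time slices
```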
Various embodiments employ an influence-guided data augmentation technique, which may be referred to herein as DAIN (Data Augmentation with INfluence Functions). At a high level, some embodiments train a first tensor-completion model (e.g., a neural tensor-completion model) with the input tensor (e.g., a received target tensor that is an incomplete tensor) and one or more training tensors. Cells of the target tensor may be referred to as target cells and cells of the one or more training tensors may be referred to as training cells. The dimensionality (and lengths of each dimension) of the training tensor may be equivalent to the dimensionality (and lengths of each dimension) of the incomplete target tensor.
Upon training the first tensor-completion model, various embodiments utilize influence functions to estimate the importance of each training cell (e.g., the importance of a rating in a movie rating tensor) on reducing imputation (or prediction) error. That is, the embodiments determine a cell-importance metric for each cell of the training data. Next, some embodiments compute the importance of every entity (of the training tensor) by aggregating the importance values (e.g., cell-importance metrics) of all its associated training cells. For example, to compute the importance of the entity (e.g., an entity-importance metric) associated with the particular user i, the importance of all the ratings given by the particular user associated with the particular value of i are aggregated or combined. The importance of an entity (e.g., as encoded by its corresponding entity-importance metric) signifies the entity's impact in reducing the prediction error. Specifically, for each entity, a cell-importance metric is calculated (via an influence function) for each of the entity's associated cells. The cell-importance metrics (for the cells associated with the entity) are aggregated across all of the entity's associated cells. The aggregation of the cell-importance metrics may be employed to determine an entity-importance metric for the entity.
Some embodiments then generate new data points (e.g., augmented cells) by sampling entities proportional to their entity-importance metrics. The new or augmented cells may form an augmented tensor. Values of the augmented tensor cells are predicted via a trained neural tensor completion method (e.g., either the first completion model or a second completion model). This influence-based sampling of entities is employed to generate augmented data points by using important entities (e.g., as weighted by their entity-importance metrics in the sampling process), and thus, can lead to higher test prediction accuracy than conventional tensor completion methods. Furthermore, as discussed below, various embodiments provide enhancements to the time and space complexities, over that of conventional tensor completion methods.
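The influence-weighted sampling described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiments' exact procedure: one entity is drawn per dimension with probability proportional to a non-negative entity-importance value, the drawn entities are combined into a cell index, already-observed cells are skipped, and a caller-supplied `predict` function stands in for the trained completion model. The function name, the importance values, and the constant predictor are all hypothetical.

```python
import random


def sample_augmented_cells(entity_importance, observed, num_cells, predict, seed=0):
    """Draw new cell indices, one entity per dimension, weighted by importance."""
    rng = random.Random(seed)
    augmented = {}
    max_attempts = 100 * num_cells  # guard against looping on dense tensors
    for _ in range(max_attempts):
        if len(augmented) >= num_cells:
            break
        index = tuple(
            rng.choices(range(len(weights)), weights=weights, k=1)[0]
            for weights in entity_importance
        )
        if index in observed or index in augmented:
            continue  # assumption: only unobserved, not-yet-sampled cells are added
        augmented[index] = predict(index)  # value comes from the trained model
    return augmented


# Made-up entity-importance values for a (users, movies, time slices) tensor.
entity_importance = [
    [0.7, 0.0, 0.3],  # users: user 1 has zero importance and is never drawn
    [0.1, 0.6, 0.3],  # movies
    [0.5, 0.5],       # time slices
]
observed = {(0, 1, 0): 4.0}
augmented = sample_augmented_cells(
    entity_importance, observed, num_cells=5, predict=lambda index: 3.0
)
print(sorted(augmented))
```

The augmented cells returned here could then be merged with the observed cells before a completion model is trained or further trained.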
Tensor factorization (TF) is one such conventional method employed to predict missing values in a tensor. However, many conventional TF methods may exhibit high imputation error when estimating missing values in a tensor. One reason for the inaccuracy of conventional TF methods is that many TF models regard missing values in a tensor as zeros. Hence, when conventional TF models are trained with a sparse tensor, their predictions may be biased toward zeros, instead of the observed values. Other conventional TF methods attempt to improve their accuracy by focusing only on observed entries; however, these conventional TF methods may suffer from overfitting when the input tensor is very sparse.
Other conventional tensor completion methods include neural network-based tensor completion methods. However, these methods still suffer from data sparsity, which can become a bottleneck because neural tensor completion methods require a large amount of data for training. Moreover, these conventional neural network-based methods may not generate new data points for data augmentation to solve the sparsity issue. That is, these conventional methods do not include the enhancement provided by data augmentation that the present embodiments do.
Accordingly, the embodiments include systems and methods that leverage the strength of neural tensor-completion models, and improve neural tensor-completion methods through the utilization of data augmentation. Data augmentation increases the generalization capability of a tensor-completion model by generating new (or augmented) data points while training the neural tensor-completion models. The data augmentation during the training enhances the models' prediction accuracy, and provides improvements to the time and space complexities, as compared to conventional tensor completion methods. Thus, the embodiments include data augmentation techniques for enhancing neural tensor-completion models. The embodiments further include a framework for deriving the importance of tensor entities on reducing prediction error using influence functions. With the entity importance values (e.g., entity-importance metrics), new (e.g., augmented) data points are generated via weighted sampling and value predictions. The embodiments outperform conventional tensor completion methods on various real-world tensors in terms of prediction accuracy with statistical significance, and also provide enhancements to the time and space complexity of the associated computations.
As used throughout, the terms “tensor” and “array” may be employed interchangeably to refer to an object that structures and/or stores atoms of data (e.g., data atoms). Accordingly, the “components” of a tensor, or the values stored in those components, may, but need not, transform via conventional covariant or contravariant transformation laws. Furthermore, a tensor object need not be associated with a product of one or more conventional vector spaces. That is, as the term is used herein, a complete set of basis objects (e.g., state-vectors or functions) need not be considered to span a product of vector spaces associated with a tensor.
The term “data atom” may refer to a single discrete element of data. The data types of a data atom include, but are not limited to integers, floats, doubles, chars, strings, and the like. In some embodiments, a data type includes a real or complex object (e.g., a real or complex number). The terms “cell” or “component” are used interchangeably throughout to refer to the discrete elements (e.g., bins) of an array and/or tensor that stores an atom of data. Thus, a tensor or array may have or include a set of cells or set of components. Each cell or component of a tensor or array may be indexed, addressed, or referenced by an ordered set of indices, where the cardinality of the ordered set of indices is equivalent to the dimensionality, rank, order, or way of the array. Each index in the set of indices corresponds to exactly one of the dimensions of the array. Each cell may store a relevant or a non-relevant (e.g., a null) value. In at least one embodiment, a value of zero is considered as a non-relevant value. An “incomplete” tensor or array may be a tensor or an array, wherein at least one of its cells stores a non-relevant value. The term “sparse” may be applied to an incomplete tensor or array, wherein a significant number of its cells store a non-relevant value.
As used herein, the term “set” may be employed to refer to an ordered (i.e., sequential) or an unordered (i.e., non-sequential) collection of objects (or elements), such as but not limited to indices, machines (e.g., computer devices), physical and/or logical addresses, graph nodes, graph edges, and the like. A set may include N elements, where N is any non-negative integer. That is, a set may include 0, 1, 2, 3, . . . M objects and/or elements, where M is a positive integer with no upper bound. Therefore, as used herein, a set may be a null set (i.e., an empty set), that includes no elements (e.g., N=0 for the null set). A set may include only a single element. In other embodiments, a set includes a number of elements that is significantly greater than one, two, three, or billions of elements. A set may be an infinite set or a finite set. In some embodiments, “a set of objects” that is not a null set of the objects may be interchangeably referred to as either “one or more objects” or “at least one object.” A set of objects that includes at least two of the objects may be referred to as “a plurality of objects.”
As used herein, the term “subset” refers to a set that is included in another set. A subset may be, but is not required to be, a proper or strict subset of the other set that the subset is included within. That is, if set B is a subset of set A, then in some embodiments, set B is a proper or strict subset of set A. In other embodiments, set B is a subset of set A, but not a proper or a strict subset of set A. For example, set A and set B may be equal sets, and set B may be referred to as a subset of set A. In such embodiments, set A is also referred to as a subset of set B. Two sets may be disjoint sets if the intersection between the two sets is the null set.
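The set terminology above maps directly onto Python's built-in `set` type, as the following short illustration shows. The element values are arbitrary.

```python
A = {1, 2, 3}
B = {1, 2}
C = {1, 2, 3}
D = {7, 8}
empty = set()  # the null set

print(B <= A, B < A)      # B is a subset of A, and a proper subset
print(C <= A, C < A)      # equal sets: a subset, but not a proper subset
print(A <= C)             # when A and C are equal, A is also a subset of C
print(A.isdisjoint(D))    # disjoint: the intersection is the null set
print(A & D == empty)     # the same check, written as an intersection
print(empty <= A)         # the null set is a subset of every set
```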
In the non-limiting embodiment shown in
Because target tensor 140 is an incomplete tensor, a first subset of the set of target cells stores a relevant value (e.g., a movie rating provided by the corresponding user, for the corresponding movie, during the corresponding time slice) and a second subset of the set of target cells does not store a relevant value (e.g., the second subset of target cells may store a non-relevant value because the movie rating has not been provided by the corresponding user for the corresponding movie during the corresponding time slice). The first and second subset of target cells may be disjoint subsets of the set of target cells. The first and second subset of target cells may be complementary subsets of the set of target cells. The incompleteness of the target tensor 140 is demonstrated by the “sparseness” (or relatively low density) of the relevant values stored in the target cells. That is, the first subset of target cells (e.g., those target cells storing a relevant value in the target tensor 140) are represented by “dots” within the “block” of values stored in target tensor 140. One such target cell that stores a relevant value is shown as first target cell 142. The target cells of the second subset of target cells (e.g., those target cells that do not store a relevant value) are not visually shown in the target tensor 140. The specific index values (i, j, k) address or reference the first target cell 142.
Tensor completion engine 120 may receive the incomplete target tensor 140 as input, and generate a corresponding completed output tensor 150 as output, via pipeline 200. The output tensor 150 may be the completed “version” of the incomplete target tensor 140, where the “missing values” of the incomplete target tensor 140 have been predicted and “filled-in” via the tensor completion engine 120. Thus, the tensor completion engine 120 is enabled to perform the enhanced tensor completion tasks discussed herein. The increased density of dots shown in the completed output tensor 150 demonstrates the “filling-in” (with relevant values) of the target cells included in the second subset of target cells. For example, a second target cell 154 (which is included in the second subset of target cells) now stores a predicted relevant value. The specific index values (i′, j′, k′) address or reference the second target cell 154 that now stores a predicted relevant value (e.g., a predicted movie rating). Each of the target cells in the output tensor 150 may now store a relevant value. Some of the relevant values may be included in the target tensor 140 and other relevant values may have been predicted by the tensor completion engine 120. The output tensor 150 may be referred to throughout as a new tensor and/or an augmented tensor. As such, the rank (and length of each of its dimensions) of the output tensor 150 may match the rank (and corresponding lengths of its dimensions) of the target tensor 140. Thus, the cells of the output tensor are referenced via the set of indices 106.
Tensor completion engine 120 may include a neural network trainer 122, an entity embedder 124, and a cell-importance calculator 126. In some embodiments, the tensor completion engine 120 may further include an entity-importance calculator 128, an augmented data generator 130, and an incomplete cell predictor 132. Other embodiments may include additional and/or alternative components to that shown in the exemplary embodiment of a tensor completion engine 120. The neural network trainer 122 is generally responsible for the training of one or more neural tensor-completion models discussed within. The entity embedder 124 is generally responsible for employing a neural network model to generate vector embeddings of cells and/or entities of tensors. The cell-importance calculator 126 is generally responsible for employing a loss signal (generated by the training of a neural network) to determine a cell-importance metric of each cell of a tensor when predicting missing values of a tensor. The entity-importance calculator 128 is generally responsible for aggregating the cell-importance metrics to determine an entity-importance metric for each entity of a tensor when predicting missing values of a tensor. The augmented data generator 130 is generally responsible for generating augmented data points as discussed herein. The incomplete cell predictor 132 is generally responsible for employing one or more tensor-completion models to predict the missing values of the target tensor 140 and generating the completed output tensor 150. The functionalities, operations, features, and actions implemented by the various components of tensor completion engine 120 are discussed at least in conjunction with pipeline 200 of
Communication network 110 may be a general or specific communication network and may be directly and/or indirectly communicatively coupled to client computing device 102 and server computing device 104. Communication network 110 may be any communication network, including virtually any wired and/or wireless communication technologies, wired and/or wireless communication protocols, and the like. Communication network 110 may be virtually any communication network that communicatively couples a plurality of computing devices and storage devices in such a way as to enable the computing devices to exchange information via communication network 110.
Tensors (and arrays, as such terms are used interchangeably throughout) are mathematical and/or computational objects (e.g., data structures) that organize, structure, and store multi-dimensional data. Tensors include scalar objects (e.g., 0-order tensors), vector objects (1-order tensors), matrix objects (2-order tensors), and higher-order generalizations of such mathematical objects. As noted above, an N-way or N-order tensor has N dimensions, and the dimension size (e.g., dimensionality, length, or depth of each dimension) is denoted by $I_1$ through $I_N$, respectively. An N-order tensor may be denoted or referred to by boldface Euler script letters (e.g., $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$). The entity-importance metrics associated with the N dimensions are denoted $\alpha^{(1)}, \ldots, \alpha^{(N)}$.
As used throughout, the term tensor (or array) completion may refer to the process of predicting the missing values of a partially observed (e.g., an incomplete) tensor. The enhanced tensor completion methods employed by the various embodiments may include training one or more tensor-completion models by iteratively adjusting the values of the model's parameters (or weights) by employing observed cells (Ωtrain) to predict values of unobserved cells (Ωtest) with the trained parameters. Specifically, such a model is trained on an N-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, where $\hat{\mathcal{X}}_{i_1, \ldots, i_N}$ denotes the value that the model predicts for the cell $(i_1, \ldots, i_N)$.
In tensor factorization methods, R denotes a target factorization rank, and $U_1, \ldots, U_N$ are referred to as factor matrices. Various neural tensor completion methods employed within may utilize different neural network architectures to compute $\hat{\mathcal{X}}_{(i_1, \ldots, i_N)}$.
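For illustration only, and assuming that the factorization referred to above is a CP-style (sum of rank-one terms) factorization, the predicted value of a cell may be computed from the factor matrices as the sum over r = 1, . . . , R of the product of the entries $U_n[i_n, r]$ across the N dimensions. The function name `cp_predict` and the factor values are hypothetical.

```python
def cp_predict(factors, index):
    """CP-style prediction: sum over r of the product over n of U_n[i_n][r].

    `factors[n]` is the factor matrix U_n, stored as a list of rows, where
    each row has R entries (R is the target factorization rank).
    """
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        term = 1.0
        for U, i in zip(factors, index):
            term *= U[i][r]
        total += term
    return total


# Made-up rank-2 factor matrices for a tensor of shape (2, 2, 1).
U1 = [[1.0, 2.0], [3.0, 4.0]]
U2 = [[1.0, 0.0], [0.0, 1.0]]
U3 = [[2.0, 1.0]]
factors = [U1, U2, U3]

print(cp_predict(factors, (1, 0, 0)))  # 3*1*2 + 4*0*1
print(cp_predict(factors, (0, 1, 0)))  # 1*0*2 + 2*1*1
```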
In various embodiments, a root-mean-square error (RMSE) metric may be employed to measure the accuracy of a tensor completion method. Specifically, a test RMSE may be utilized to check how accurately a tensor completion model predicts values of unobserved tensor cells. The formal definition of test RMSE is given as follows.
Note that a lower test RMSE indicates a more accurate tensor completion model.
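Under the standard definition of RMSE (the square root of the mean squared difference between actual and predicted values over the test cells Ωtest), the test RMSE may be computed as sketched below. The function name `rmse_on_cells`, the sample ratings, and the constant predictor are illustrative assumptions.

```python
import math


def rmse_on_cells(actual, predict, cells):
    """Root-mean-square error of `predict` over the given cell indices."""
    squared_errors = [(actual[idx] - predict(idx)) ** 2 for idx in cells]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up held-out ratings and a trivial model that always predicts 3.0.
test_values = {(0, 0, 0): 4.0, (1, 0, 0): 2.0}
constant_model = lambda idx: 3.0

print(rmse_on_cells(test_values, constant_model, list(test_values)))
```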
Various embodiments may employ an influence estimator referred to as the T
where Θt
Influence estimation with TracIn has clear advantages in terms of speed and accuracy over conventional methods, such as but not limited to conventional influence function-based methods and conventional representer point methods. The various embodiments may utilize TracIn to generate a cell-importance tensor.
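As a hedged sketch, the checkpoint form of TracIn estimates the influence of a training point z on another point z′ as the sum, over saved checkpoints, of the step size multiplied by the dot product of the loss gradients at z and z′. The toy linear model with squared loss, the checkpoint values, and every name below are assumptions chosen so that the gradients can be written by hand; an actual embodiment would use the gradients of a neural tensor-completion model.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def grad_squared_loss(w, point):
    """Gradient of (w.x - y)^2 with respect to w, for a toy linear model."""
    x, y = point
    residual = dot(w, x) - y
    return [2.0 * residual * xi for xi in x]


def tracin(checkpoints, step_sizes, z, z_prime):
    """Sum over checkpoints of step_size * <grad loss(z), grad loss(z')>."""
    return sum(
        eta * dot(grad_squared_loss(w, z), grad_squared_loss(w, z_prime))
        for w, eta in zip(checkpoints, step_sizes)
    )


# Two made-up checkpoints of the parameters and their step sizes.
checkpoints = [[0.0, 0.0], [0.5, 0.5]]
step_sizes = [0.1, 0.1]
z = ([1.0, 0.0], 1.0)        # training point (features, target)
z_prime = ([1.0, 1.0], 2.0)  # held-out point

print(tracin(checkpoints, step_sizes, z, z_prime))
```

A positive score suggests that gradient steps taken on z tended to reduce the loss at z′. Summing such scores over a set of held-out points gives one influence value per training point.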
Pipeline 200 may include four stages. The first stage 220 is generally responsible for training entity embeddings. An entity embedding may be a vector representation of the entity. The second stage 240 is generally responsible for generating a cell-importance tensor (CIT). The third stage 260 is generally responsible for determining an entity importance for each entity of a set of entities associated with the target tensor 140. The fourth stage 280 is generally responsible for performing data augmentation based on the entity importance and generating the completed output tensor 150 based on the data augmentation.
In pseudo-code 300 of
As indicated in pseudo-code 300, lines 1-3 of pseudo-code 300 refer to actions that are performed in the first stage 220 of pipeline 200. Line 4 of pseudo-code 300 refers to actions that are performed in the second stage 240 of pipeline 200. Line 5 of pseudo-code 300 refers to actions that are performed in the third stage 260 of pipeline 200. Lines 6-15 of pseudo-code 300 refer to actions that are performed in the fourth stage 280 of pipeline 200.
The enhanced methods and techniques (collectively referred to herein as DAIN) employed via the embodiments may be implemented by a tensor completion engine, as demonstrated by the four stages of pipeline 200 of an enhanced tensor completion engine (e.g., tensor completion engine 120 of
In the third stage 260 of pipeline 200, the tensor completion engine uniformly distributes cell-importance metrics to the corresponding entities, and determines an entity-importance metric (α(n)) for each entity by aggregating the corresponding cell-importance metrics. The entity-importance metrics may be encoded in an entity-importance tensor, e.g., a 1D array. Accordingly, an entity-importance calculator (e.g., entity-importance calculator 128 of
More specifically, in the first stage 220 of pipeline 200, the neural completion engine may employ a neural network trainer (e.g., neural network trainer 122 of
Also in the first stage 220 of pipeline 200, an entity embedder (e.g., entity embedder 124 of
The neural network trainer may train an end-to-end trainable neural network (e.g., the first tensor-completion model) to learn such entity embeddings (e.g., see line 1 of pseudo-code 300 of
Z_1 = \phi_1(W_1 [E_{i
Z_M = \phi_M(W_M Z_{M-1} + b_M)
\hat{X}_{(i_1, \ldots, i_N)} = W_{M+1} Z_M + b_{M+1}    (4)
Note that, in equation 4, [Ei
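The layered computation of equation 4 can be illustrated with a minimal sketch. It assumes that the bracketed term denotes the concatenation of one embedding per dimension, selected by the cell index, and that every hidden layer uses a ReLU activation. The function names and the nested-list layout of the embeddings are hypothetical and chosen only for illustration; plain Python lists are used so that the sketch needs no external libraries.

```python
def relu(v):
    """Element-wise ReLU, used here as the activation phi_m (an assumption)."""
    return [max(0.0, x) for x in v]


def affine(W, z, b):
    """Compute W z + b for a matrix W (list of rows) and vectors z and b."""
    return [sum(w * x for w, x in zip(row, z)) + bias for row, bias in zip(W, b)]


def predict_cell(embeddings, index, hidden_layers, output_layer):
    """Estimate the value of one tensor cell from its entity embeddings.

    embeddings[n][i] is the embedding of entity i along dimension n
    (hypothetical layout). hidden_layers is a list of (W_m, b_m) pairs and
    output_layer is the pair (W_{M+1}, b_{M+1}).
    """
    # Concatenate one embedding per dimension, selected by the cell index.
    z = [x for n, i in enumerate(index) for x in embeddings[n][i]]
    for W, b in hidden_layers:
        z = relu(affine(W, z, b))      # Z_m = phi_m(W_m Z_{m-1} + b_m)
    W_out, b_out = output_layer
    return affine(W_out, z, b_out)[0]  # X_hat = W_{M+1} Z_M + b_{M+1}


# Two dimensions, one-dimensional embeddings, one hidden layer.
embeddings = [[[1.0], [2.0]], [[3.0], [-1.0]]]
hidden = [([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0])]
output = ([[2.0, 3.0]], [1.0])
estimate = predict_cell(embeddings, (1, 0), hidden, output)
```

In a trained model the embeddings and the weights would be learned jointly; here they are fixed by hand so that the forward pass can be followed step by step.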
More specifically, in the second stage 240 of the pipeline 200, a cell-importance calculator (e.g., cell-importance calculator 126 of
Equation (3) may be employed to compute the influence of a training cell z on the loss of a test cell z′. Since the test data may not be accessible, the influence αz (e.g., the cell-importance metric) of a training cell z on reducing overall validation loss is computed by equation (5).
Note that K in equation 5 is the number of checkpoints, ηi is a step size at a checkpoint Θt
Such important entities are identified in the third stage 260 of pipeline 200. Identifying the most important entities may be beneficial for the various embodiments. The output of the second stage 240 includes a cell importance for each cell (e.g., as encoded in the CIT). The cell-importance metrics of the cells associated with a particular entity may be combined to calculate an entity-importance metric for the particular entity. For instance, in the movie rating tensor example, the cell-importance metric indicates the importance of the rating on the prediction loss, while it may not reflect the importance of the user, the particular movie, or the particular time slice that is associated with the cell (e.g., the movie rating). If the importance of each entity (e.g., each user, movie, and time slice) is determined, new influential data points may be generated (via data augmentation in the fourth stage 280) to minimize prediction error by combining entities that have high importance values.
In the third stage 260, an entity-importance calculator (e.g., the entity-importance calculator 128 of
Recall that α indicates the cell-importance tensor (CIT), and Ωtrain represents a set of training cells from a training tensor. In the movie rating tensor example, equation 6 indicates that a user's entity importance is the aggregation of the importance scores of all the ratings the user gives (over all time slices). Similarly, a movie's importance is the sum of the importance of the ratings it receives (e.g., from all users over all time slices), and the importance of a time slice is the sum of the importance of all the ratings given during the time slice (e.g., from all pairs of users and movies).
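The aggregation described above can be sketched as follows. The sketch assumes the cell-importance tensor is held as a sparse mapping from training-cell index tuples to cell-importance values; the function name and this dictionary layout are hypothetical.

```python
from collections import defaultdict


def entity_importance(cell_importance, num_dims):
    """Sum cell-importance values over the training cells that touch each entity.

    cell_importance maps a training-cell index tuple (i_1, ..., i_N) to its
    cell-importance value. The result holds one dict per dimension, mapping an
    entity id to its aggregated importance.
    """
    importance = [defaultdict(float) for _ in range(num_dims)]
    for index, alpha_z in cell_importance.items():
        for dim, entity in enumerate(index):
            importance[dim][entity] += alpha_z
    return [dict(per_dim) for per_dim in importance]


# (user, movie, time slice) -> cell importance of that rating
cit = {(0, 0, 0): 0.5, (0, 1, 0): 0.25, (1, 1, 1): 1.0}
users, movies, slices = entity_importance(cit, num_dims=3)
```

In this toy example user 0 gave two ratings, so that user's importance is the sum of both cell-importance values, while movie 1 accumulates the importance of the two ratings it received.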
In at least one alternative embodiment, the entity-importance metrics may be calculated by applying rank-1 CP factorization (as discussed above) on the cell-importance tensor. The output factor matrices from the CP model include the entity-importance metrics. Specifically, a value of each output array indicates the importance of the corresponding entity on predicting values in a training tensor.
The loss function of the rank-1 CP model is given in equation 7.
In equation 7, α(1), . . . , α(N) indicate the entity-importance metrics, λ is a regularization factor, and ∥X∥ is the Frobenius norm of a tensor X. The aggregation scheme of equation 6 was selected to compute the entity-importance metrics because the rank-1 CP model may produce inaccurate decomposition results when a given tensor is highly sparse.
In the fourth stage 280 of pipeline 200, the tensor cells and values are identified for data augmentation using the entity-importance metrics and a value predictor, respectively. Lines 6-15 of pseudo-code 300 refer to the operations of the fourth stage 280. A high entity-importance metric indicates that the corresponding entity plays an important role in improving the validation set prediction. Thus, new cells (e.g., augmented cells) are generated using these important entities. An augmented data generator (e.g., augmented data generator 130 of
As referred to in line 6 of the pseudo-code 300, a neural network trainer (e.g., neural network trainer 122 of
to be sampled. The sampled entities from all dimensions may be combined to form one augmented tensor cell. The generation of the augmented tensor cell is referred to in line 13 of pseudo-code 300. As shown in the for loops of lines 8-13 of pseudo-code 300, this process is repeated to generate the required number of data points for augmentation.
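One way to picture the sampling loop is the sketch below. It assumes that, along each dimension, an entity is drawn with probability proportional to its entity-importance metric and that negative metrics are clipped to zero; both choices, along with the function name and the seed parameter, are assumptions made for illustration rather than details taken from pseudo-code 300.

```python
import random


def sample_augmented_cells(entity_importance, num_cells, seed=0):
    """Draw index tuples by sampling one entity per dimension.

    entity_importance holds one {entity: importance} dict per dimension.
    Each dimension needs at least one entity with positive importance,
    otherwise random.choices raises ValueError.
    """
    rng = random.Random(seed)
    cells = []
    for _ in range(num_cells):
        index = tuple(
            rng.choices(
                list(per_dim),
                weights=[max(w, 0.0) for w in per_dim.values()],
            )[0]
            for per_dim in entity_importance
        )
        cells.append(index)  # one sampled entity per dimension forms a cell
    return cells


importance = [{0: 0.75, 1: 1.0}, {0: 0.5, 1: 1.25}, {0: 0.75, 1: 1.0}]
augmented_indices = sample_augmented_cells(importance, num_cells=5)
```

Because the entities are sampled independently per dimension, a combination of individually important entities can yield a cell that does not appear in the training tensor, which is the purpose of the augmentation.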
Once indices of the augmented cells are sampled, the incomplete cell predictor may predict their values by employing the second (or first) trained neural tensor-completion model. Generating such predicted values is referred to in line 14 of pseudo-code 300. In some embodiments, the overall average value
may be employed. In other embodiments, the most similar index in the embedding space is found and the predicted value may be set to its value. In embodiments where these heuristics prove inaccurate and computationally expensive, respectively, other, more advanced methods of value prediction may be employed.
In these other embodiments, the predicted values of the augmented data points may be assigned by predicting the values using a tensor completion model (either the previously trained first tensor-completion model (e.g., Θ) or the second trained tensor completion model (e.g., Θp)). Specifically, in some embodiments, the incomplete cell predictor can employ the trained entity embeddings generated via the first tensor-completion model (e.g., Θ) to predict the values. In other embodiments, the incomplete cell predictor can employ the second tensor-completion model (e.g., Θp) and the training data to predict the missing values.
Reusing the first tensor-completion model (e.g., the trained neural network Θ) may be computationally cheaper since it only needs to do a forward pass for inference, which is fast. However, this may result in overfitting of the downstream model since the resulting augmentation cell values may be homogeneous with the original tensor. On the other hand, employing the second tensor-completion model Θp may increase the generalization capability of a downstream model by generating more heterogeneous data compared to the first tensor-completion model Θ. Thus, in some embodiments, the second tensor-completion model Θp is employed in the fourth stage 280 of pipeline 200. In other embodiments, the first tensor-completion model Θ is employed in the fourth stage 280 of pipeline 200. In some embodiments, the second tensor-completion model may include the CoSTCo model previously discussed, which is employed to predict the values of the augmented tensor cells. In other embodiments, the second tensor-completion model may be an MLP model or an NTF model.
After all tensor cell indices and values needed for augmentation are generated, these values and/or cells may be combined with the input target tensor 140 to generate an augmented tensor Xnew (e.g., the completed output tensor 150). Generating the augmented and/or output tensor is referred to in line 15 of pseudo-code 300. This augmented tensor can be used for downstream tasks, e.g., movie recommendations.
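The value prediction and the final combination can be sketched together. The sketch assumes tensors are held as sparse mappings from index tuples to values and that the value predictor is supplied as a callable; keeping the observed value when a sampled index already exists in the target tensor is a further assumption of this sketch. All names are hypothetical.

```python
def augment_tensor(target_cells, sampled_indices, predict):
    """Return X_new: the observed target cells plus predicted augmented cells.

    target_cells maps index tuples to observed values, sampled_indices is an
    iterable of index tuples chosen for augmentation, and predict is a callable
    (e.g., a trained tensor-completion model) mapping an index tuple to a value.
    """
    augmented = {
        index: predict(index)
        for index in sampled_indices
        if index not in target_cells  # never overwrite an observed value
    }
    return {**target_cells, **augmented}


target = {(0, 0, 0): 4.0, (1, 1, 1): 2.0}
sampled = [(0, 0, 0), (0, 1, 1), (1, 0, 1)]
x_new = augment_tensor(target, sampled, predict=lambda index: 3.0)
```

Here the constant predictor stands in for a trained model; any callable that performs a forward pass over the sampled index could be substituted.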
In this section, the time and space complexities of the various embodiments are analyzed and quantified. Considering the time complexity of the embodiments, the first stage 220 of pipeline 200 includes training a neural tensor completion model (e.g., the first tensor-completion model Θ) to generate entity embeddings and gradients, which takes O(TΘ) time, assuming O(TΘ) is the time complexity of training Θ and calculating the gradients. The second stage 240 of pipeline 200 includes computing the cell importance αz for each training cell z∈Ωtrain. A naive computation of αz in equation (5) for all training cells takes O(KD|Ωtrain||Ωval|) time, where K and D are the number of checkpoints and the dimension of the gradient vector, respectively. The computation may be accelerated to O(KD(|Ωtrain|+|Ωval|)) by precomputing ΣZ′∈Ω
Considering the space complexity of the embodiments, the first stage 220 of pipeline 200 includes obtaining entity embeddings and gradients, which takes O(MΘ+KD(|Ωtrain|+|Ωval|)) space, assuming O(MΘ) is the space complexity of training a neural tensor completion model (e.g., the first tensor-completion model Θ), including entity embeddings. O(KD(|Ωtrain|+|Ωval|)) space is required to store the D-dimensional gradients of training and validation cells for all K checkpoints. The second stage 240 of pipeline 200 includes computing the cell importance αz, which takes O(|Ωtrain|) space since all cell-importance values need to be stored. The third stage 260 of pipeline 200 includes calculating the entity importance with the aggregation method, which takes O(NI) space since the entity-importance metrics of all entities from N dimensions (assuming I1= . . . =IN=I) need to be stored. Finally, the data augmentation stage (e.g., the fourth stage 280 of pipeline 200) takes O(MΘ
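The effect of precomputing the summed validation gradient can be illustrated with a short sketch. It assumes that the cell importance takes the form of a step-size-weighted sum, over checkpoints, of dot products between a training cell's loss gradient and the validation cells' loss gradients; this form, the function name, and the nested-list gradient layout are assumptions made for illustration.

```python
def cell_importance(train_grads, val_grads, step_sizes):
    """Influence of each training cell on the total validation loss.

    train_grads[k][z] and val_grads[k][v] are D-dimensional loss gradients of
    training cell z and validation cell v at checkpoint k, and step_sizes[k]
    is the step size at that checkpoint.
    """
    alpha = [0.0] * len(train_grads[0])
    for g_train, g_val, eta in zip(train_grads, val_grads, step_sizes):
        # Sum the validation gradients once per checkpoint: O(D * |val|).
        val_sum = [sum(column) for column in zip(*g_val)]
        # One dot product per training cell: O(D * |train|).
        for z, g in enumerate(g_train):
            alpha[z] += eta * sum(a * b for a, b in zip(g, val_sum))
    return alpha


# One checkpoint, two training cells, two validation cells, D = 2.
train = [[[1.0, 0.0], [0.0, 2.0]]]
val = [[[1.0, 1.0], [2.0, 0.0]]]
alpha = cell_importance(train, val, step_sizes=[0.5])
```

Since the dot product is linear, summing the validation gradients first gives the same result as pairing every training cell with every validation cell, while the per-checkpoint cost grows with the sum of the two set sizes instead of their product.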
Processes 400-460 of
Process 400 begins at block 402, where each of a target tensor, a training tensor, and a validation tensor is received. The target tensor may include a set of target cells and be referenced throughout as X. The target tensor may be an incomplete tensor, e.g., incomplete target tensor 140 of
At block 404, a first tensor-completion model may be trained. The first tensor-completion model may be trained as indicated in line 1 of pseudo-code 300. As such, the first tensor-completion model may be a neural tensor-completion model, and referred to as Θ. In some embodiments, the first tensor-completion model may be an MLP model that employs a ReLU activation function. In other embodiments, the first tensor-completion model is a CoSTCo model. In at least one embodiment, the first tensor-completion model is an NTF model.
The training of the first tensor-completion model may be performed as indicated by equation 1. A neural network trainer (e.g., neural network trainer 122 of
At block 406, for each entity of the set of entities, an entity-embedding may be generated based on the first tensor-completion model and the set of training cells. An entity embedder (e.g., entity embedder 124 of
At block 408, a training loss signal may be acquired, accessed, received, obtained, and/or generated. The loss signal may have been generated during the training of the first tensor-completion model. The loss signal may include a set of loss gradients generated during a plurality of epochs of the training of the first tensor-completion model. The loss gradients may be obtained as referenced in line 3 of pseudo-code 300. As such, the loss signal (and thus the loss gradients) may include training loss gradients and validation loss gradients for all the epochs of the training (e.g., at all training checkpoints). Therefore, the loss signal may be based on the estimated values for the set of test cells determined in block 406. The loss signal may be received from the neural network trainer. The loss signal may be received at a cell-importance calculator (e.g., cell-importance calculator 126 of
At block 410, a cell-importance tensor (CIT) may be generated based on the loss signal. A cell-importance calculator may generate the cell-importance tensor and calculate the cell-importance metrics stored within its cells. The cell-importance tensor may be referred to as α. Generating the cell-importance tensor is referenced in line 4 of pseudo-code 300, and thus may be based on the set of training cells and the set of validation cells. The cell-importance tensor may be calculated via equation 5. The cell-importance tensor may include a set of cell-importance cells with a one-to-one correspondence to the set of training cells. A cell-importance cell may be referenced as z. Each cell-importance cell may store a cell-importance metric, e.g., αz. Each cell-importance metric may be based on the loss signal. Each cell-importance cell may correspond to one of the training cells (e.g., via the one-to-one correspondence) and the cell's stored cell-importance metric may indicate an influence of the corresponding training cell on the training of the first tensor-completion model. The cell-importance metric for each cell-importance cell may be based on employing an influence function that is based on the loss signal. The cell-importance calculator may employ the TRAM influence estimator to calculate the cell-importance metrics. The T
At block 412, an entity-importance tensor may be generated based on the cell-importance tensor. An entity-importance calculator (e.g., entity-importance calculator 128 of
The entity-importance metric for an entity of the set of entities may indicate an influence of a subset of the training cells, which are associated with the entity, on the training of the first tensor-completion model, as indicated by the cell-importance metrics for the subset of training cells. The entity-importance metric for each entity-importance cell may be based on employing an influence function that is based on the loss signal.
At optional block 414, a second tensor-completion model may be trained. The second tensor-completion model may be trained as indicated in line 6 of pseudo-code 300. The second tensor-completion model may be trained based on the set of training cells. The second tensor-completion model may be referred to as Θp. The training of the second tensor-completion model may be performed as indicated by equation 1. A neural network trainer (e.g., neural network trainer 122 of
At block 416, an augmented tensor may be generated based on data augmentation. The augmented tensor may be referred to as Xaug. The augmented tensor may be a tensor in I
At block 418, values (e.g., relevant values) may be predicted for the augmented tensor (e.g., for at least a portion of the set of augmented cells). More specifically, at block 418, the values of Ωaug may be imputed or predicted by a trained value predictor (e.g., a trained neural tensor-completion model, such as, but not limited to, at least one of Θ or Θp). An incomplete cell predictor (e.g., incomplete cell predictor 132 of
At block 420, a final or new tensor may be generated based on a combination of the target tensor and the augmented tensor. The new tensor may be referenced as Xnew. Generating the new tensor is referenced in line 15 of pseudo-code 300. In at least one embodiment, the complete tensor may be a new tensor that includes a union (e.g., a combination) of the first subset of target cells. The new tensor may store the relevant values determined in block 418. The complete tensor may be a tensor that is output from the tensor completion engine (e.g., the completed output tensor 150 of
Process 440 begins, at block 442, where an incomplete target array (e.g., incomplete target tensor 140 of
At block 444, an entity-importance array may be generated by an entity-importance calculator (e.g., entity-importance calculator 128 of
At block 446, a set of augmented cells may be generated by an augmented data generator (e.g., augmented data generator 130 of
Process 460 begins, at block 462, where an incomplete target array (e.g., incomplete target tensor 140 of
At block 466, an entity-importance metric may be calculated for each entity of the set of entities. The calculation of the entity-importance metric for an entity may be based on a training of a first neural-based model that employs the set of training cells. At block 468, a relevant value may be determined to store in each cell of the second subset of target cells. The determination of a relevant value may be based on the entity-importance metric for each entity of the set of entities and at least one of the first neural-based model or a second neural-based model.
Having described embodiments of the present invention, an example operating environment in which embodiments of the present invention may be implemented is described below in order to provide a general context for various aspects of the present invention. Referring to
Embodiments of the invention may be described in the general context of computer code or machine-readable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a smartphone or other handheld device. Generally, program modules, or engines, including routines, programs, objects, components, data structures, etc., refer to code that performs particular tasks or implements particular abstract data types. Embodiments of the invention may be practiced in a variety of system configurations, including hand-held devices, consumer electronics, general-purpose computers, more specialized computing devices, etc. Embodiments of the invention may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.
With reference to
Computing device 500 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 500 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media.
Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 500. Computer storage media excludes signals per se.
Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
Memory 512 includes computer storage media in the form of volatile and/or nonvolatile memory. Memory 512 may be non-transitory memory. As depicted, memory 512 includes instructions 524. Instructions 524, when executed by processor(s) 514, are configured to cause the computing device to perform any of the operations described herein, in reference to the above discussed figures, or to implement any program modules described herein. The memory may be removable, non-removable, or a combination thereof. Illustrative hardware devices include solid-state memory, hard drives, optical-disc drives, etc. Computing device 500 includes one or more processors that read data from various entities such as memory 512 or I/O components 520. Presentation component(s) 516 present data indications to a user or other device. Illustrative presentation components include a display device, speaker, printing component, vibrating component, etc.
I/O ports 518 allow computing device 500 to be logically coupled to other devices including I/O components 520, some of which may be built in. Illustrative components include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.
Embodiments presented herein have been described in relation to particular embodiments which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present disclosure pertains without departing from its scope.
From the foregoing, it will be seen that this disclosure is one well adapted to attain all the ends and objects hereinabove set forth together with other advantages which are obvious and which are inherent to the structure.
It will be understood that certain features and sub-combinations are of utility and may be employed without reference to other features or sub-combinations. This is contemplated by and is within the scope of the claims.
In the preceding detailed description, reference is made to the accompanying drawings which form a part hereof wherein like numerals designate like parts throughout, and in which is shown, by way of illustration, embodiments that may be practiced. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. Therefore, the preceding detailed description is not to be taken in a limiting sense, and the scope of embodiments is defined by the appended claims and their equivalents.
Various aspects of the illustrative embodiments have been described using terms commonly employed by those skilled in the art to convey the substance of their work to others skilled in the art. However, it will be apparent to those skilled in the art that alternate embodiments may be practiced with only some of the described aspects. For purposes of explanation, specific numbers, materials, and configurations are set forth in order to provide a thorough understanding of the illustrative embodiments. However, it will be apparent to one skilled in the art that alternate embodiments may be practiced without the specific details. In other instances, well-known features have been omitted or simplified in order not to obscure the illustrative embodiments.
Various operations have been described as multiple discrete operations, in turn, in a manner that is most helpful in understanding the illustrative embodiments; however, the order of description should not be construed as to imply that these operations are necessarily order dependent. In particular, these operations need not be performed in the order of presentation. Further, descriptions of operations as separate operations should not be construed as requiring that the operations be necessarily performed independently and/or by separate entities. Descriptions of entities and/or modules as separate modules should likewise not be construed as requiring that the modules be separate and/or perform separate operations. In various embodiments, illustrated and/or described operations, entities, data, and/or modules may be merged, broken into further sub-parts, and/or omitted.
The phrase “in one embodiment” or “in an embodiment” is used repeatedly. The phrase generally does not refer to the same embodiment; however, it may. The terms “comprising,” “having,” and “including” are synonymous, unless the context dictates otherwise. The phrase “A/B” means “A or B.” The phrase “A and/or B” means “(A), (B), or (A and B).” The phrase “at least one of A, B and C” means “(A), (B), (C), (A and B), (A and C), (B and C) or (A, B and C).”