1. Field of Invention
The present invention relates to the field of matrix factorization. More specifically, it relates to the field of matrix factorization with incorporated data classification properties.
2. Description of Related Art
Matrix factorization is a mechanism by which a large matrix U (where U ∈ ℝ^{d×n}) is factorized into the product of two, preferably smaller, matrices: a basis matrix V (where V ∈ ℝ^{d×r}) and a coefficient matrix X (where X ∈ ℝ^{r×n}). A motivation for this is that it is often easier to store and manipulate smaller matrices V and X than it is to work with a single, large matrix U. However, since not all matrices can be factorized perfectly, if at all, matrices V and X are often approximations. An objective of matrix factorization is therefore to identify matrices V and X such that, when they are multiplied together, the result closely matches matrix U with minimal error.
Among the different approaches to matrix factorization, an approach that has gained favor in the community is nonnegative matrix factorization (NMF), due to its ease of implementation and useful applications.
Nonnegative matrix factorization has recently been used for various applications, such as face recognition, multimedia, text mining, and gene expression discovery. NMF is a part-based representation wherein nonnegative inputs are represented by additive combinations of nonnegative bases. The inherent nonnegativity constraint in NMF leads to improved physical interpretation compared to other factorization methods, such as Principal Component Analysis (PCA).
Although NMF, and its variants, are well suited for recognition applications, they lack classification capability. The lack of classification capability is a natural consequence of its unsupervised factorization method, which does not utilize relationships within input entities, such as class labels.
Several approaches have been proposed for NMF to generate more descriptive features for classification and clustering tasks. For example, “Fisher Nonnegative Matrix Factorization”, ACCV, 2004, by Y. Wang, Y. Jia, C. Hu, and M. Turk, proposes incorporating the NMF cost function and the difference of the between-class scatter from the within-class scatter. However, the objective of this Fisher-NMF is not guaranteed to converge since it may not be a convex function. “Non-negative Matrix Factorization on Manifold”, ICDM, 2008, by D. Cai, X. He, X. Wu, and J. Han proposes graph regularized NMF (GNMF), which appends terms representing favorable relationships among feature vector pairs. GNMF, however, is handicapped by not considering unfavorable relationships.
A different approach better suited for classification is a technique called “graph embedding”, which is derived from topological graph theory. Graph embedding embeds a graph G on a surface, and is a representation of graph G on the surface in which points of the surface are associated with vertices.
Recently, J. Yang, S. Yang, Y. Fu, X. Li, and T. Huang suggested combining a variation of graph embedding with nonnegative matrix factorization in an approach termed “Non-negative graph embedding” (NGE), in CVPR, 2008. NGE resolved the previous problems by introducing the concept of complementary space, and is widely considered the state of the art. NGE, however, does not use true graph embedding, and instead utilizes an approximate formulation of graph embedding. As a result, NGE is not effective enough for classification, particularly when intra-class variations are large.
In a general sense, all of these previous works tried to incorporate NMF with graph embedding, but none of them successfully adopted the original formulation of graph embedding because the incorporated optimization problem is considered intractable. In addition, all the works are limited in that they depend on suitable parameters which are not easy to determine appropriately.
It is an object of the present invention to incorporate NMF with graph embedding using the original formulation of graph embedding.
It is another object of the present invention to permit the use of negative values in the definition of graph embedding without violating the requirement of NMF to limit itself to nonnegative values.
The above objects are met in a method of factorizing a data matrix U file by supervised nonnegative factorization, SNMF, including: providing a data processing device to implement the following steps: accessing the data matrix U from a data store, wherein data matrix U is defined as U ∈ ℝ^{d×n}; defining an intrinsic graph G, wherein G = {U, W}, each column of U ∈ ℝ^{d×n} representing a vertex, and each element of similarity matrix W measuring the similarity between vertex pairs; defining a penalty graph
deriving an SNMF objective from a sum of F(1)(V, X) and F(2)(X), and determining the SNMF objective through iterative multiplicative updates.
Preferably, F(1)(V, X) is defined as F(1)(V, X) = ½‖U − VX‖_F^2; and F(2)(X) is defined as
where λ is a multiplication factor determined by a validation technique.
Further preferably, F(1)(V, X) is defined as F(1)(V, X) = ½‖U − VX‖_F^2; F(2)(X) is defined as
where λ is a multiplication factor determined by a validation technique, and where
and the SNMF objective is defined as
Following this definition of F(1)(V, X) and F(2)(X), the SNMF objective is approximated as
where V=Vt and X=Xt at time t and
Following this approach, the SNMF objective is determined through the following iterative multiplicative updates:
In a preferred embodiment, matrix U is comprised of n samples and each column of U represents a sample. Further preferably, each of the samples is an image file.
W and
In a preferred embodiment, each column of feature matrix X is a low dimensional representation of the corresponding column of U.
Also preferably, at least one of similarity matrix W or dissimilarity matrix
In an embodiment of the present invention, similarity matrix W and dissimilarity matrix
wherein yi is a class label of the i-th sample, yj is a class label of the j-th sample, and nc is the size of class c; and dissimilarity matrix
wherein n is the number of data points.
The present invention is also embodied in a data classification system for classifying test data, having: a data processing device with access to a data matrix U of training data and with access to the test data, the data matrix U being defined as U ∈ ℝ^{d×n}; wherein the data processing device classifies the test data according to a classification defined by X_ij; wherein an intrinsic graph G is defined as G = {U, W}, each column of U ∈ ℝ^{d×n} representing a vertex and each element of similarity matrix W measuring the similarity between vertex pairs; a penalty graph
λ is a multiplication factor determined by a validation technique, and
and an approximation of supervised nonnegative factorization, SNMF, is defined as
where V=Vt and X=Xt at time t,
factorized matrices Xij and Vij are identified by the following iterative multiplicative updates:
Preferably, data matrix U is comprised of n samples and each column of U represents a sample. In this case, each of the samples may be an image file.
Further preferably, the data pairs are class labels of data. Additionally, each column of feature matrix X may be a low dimensional representation of the corresponding column of U.
In an embodiment of the present invention, at least one of similarity matrix W or dissimilarity matrix
Additionally, similarity matrix W=[Wij] is preferably defined as:
wherein yi is a class label of the i-th sample and nc is the size of class c; and dissimilarity matrix
wherein n is the total number of data points.
The above objects are also met in a method of factorizing a data matrix U file by supervised nonnegative factorization, SNMF, having: providing a data processing device to implement the following steps: accessing the data matrix U from a data store, wherein data matrix U is defined as U ∈ ℝ^{d×n}; defining an intrinsic graph G, wherein G = {U, W}, each column of U ∈ ℝ^{d×n} represents a vertex, and each element of similarity matrix W measures the similarity between vertex pairs; defining a penalty graph
defining unfavorable relationships between features vector pairs as:
defining an SNMF objective function as
and applying the following iterative multiplicative updates to achieve the SNMF objective function:
Other objects and attainments together with a fuller understanding of the invention will become apparent and appreciated by referring to the following description and claims taken in conjunction with the accompanying drawings.
In the drawings wherein like reference symbols refer to like parts.
a is an example of face class images found in the CBCL dataset.
b is an example of non-face class images found in the CBCL dataset.
Recently, Nonnegative Matrix Factorization (NMF) has received much attention due to its representative power for nonnegative data. The discriminative power of NMF, however, is limited by its inability to consider relationships present in data, such as class labels. Several works tried to address this issue by adopting the concept of graph embedding, albeit in an approximated form. Herein, a Supervised NMF (SNMF) approach that incorporates the objective function of graph embedding with that of nonnegative matrix factorization is proposed.
Before describing SNMF, it is beneficial to first provide background information regarding non-negative matrix factorization (NMF) and graph embedding.
With reference to
Given a raw matrix U = [u_1, u_2, …, u_n] ∈ ℝ^{d×n}, SNMF, like NMF, factorizes matrix U into the product of two, preferably smaller, matrices: a basis matrix V (where V = [v_1, v_2, …, v_r] ∈ ℝ^{d×r}) and a coefficient matrix (or feature matrix) X (where X = [x_1, x_2, …, x_n] ∈ ℝ^{r×n}). For example, matrix U may be a raw data matrix of n samples (or data points), with each sample being of dimension d, such that U ∈ ℝ^{d×n} (step S1). A specific example of this is when each of the n columns of U (i.e. each of the n samples) is an image of size d. Matrix U is factorized into the product of basis matrix V and feature matrix X by minimizing the following reconstruction error:
min_{V,X} ½‖U − VX‖_F^2 subject to V ≥ 0, X ≥ 0 (1)
where ‖·‖_F denotes the Frobenius norm. Since Eq. (1) is not jointly convex in V and X, there is no closed form solution for the global optimum. Thus, many researchers have developed iterative update methods to solve the problem. Among them, a popular approach is the multiplicative updates devised by Lee and Seung in “Learning the parts of objects by non-negative matrix factorization”, Nature, 401:788-791, 1999, which is hereby incorporated in its entirety by reference. These multiplicative updates, shown below as equation (2), are popular due to their simplicity.
V_ij ← V_ij (U X^T)_ij / (V X X^T)_ij, X_ij ← X_ij (V^T U)_ij / (V^T V X)_ij (2)
These updates monotonically decrease the objective function in Eq. (1).
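By way of illustration only, the multiplicative updates of Lee and Seung for the reconstruction error of Eq. (1) may be sketched in Python with NumPy as follows. The function name nmf_multiplicative, the random initialization, the iteration count, and the small constant eps (added to avoid division by zero) are assumptions of this sketch and are not taken from the present description.

```python
import numpy as np

def nmf_multiplicative(U, r, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative U (d x n) as V (d x r) @ X (r x n)."""
    rng = np.random.default_rng(seed)
    d, n = U.shape
    V = rng.random((d, r)) + eps   # nonnegative random start (assumption)
    X = rng.random((r, n)) + eps
    errors = []
    for _ in range(n_iter):
        # X_ij <- X_ij * (V^T U)_ij / (V^T V X)_ij
        X *= (V.T @ U) / (V.T @ V @ X + eps)
        # V_ij <- V_ij * (U X^T)_ij / (V X X^T)_ij
        V *= (U @ X.T) / (V @ X @ X.T + eps)
        # Reconstruction error 0.5 * ||U - V X||_F^2 after this sweep
        errors.append(0.5 * np.linalg.norm(U - V @ X, "fro") ** 2)
    return V, X, errors

# Toy data: 30 nonnegative samples of dimension 20, factorized with r = 5.
rng = np.random.default_rng(1)
U = rng.random((20, 30))
V, X, errors = nmf_multiplicative(U, r=5)
```

Because each update multiplies an entry by a ratio of nonnegative quantities, V and X remain nonnegative throughout, and the recorded errors form a non-increasing sequence.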
Graph embedding, on the other hand, may be defined as the optimal low dimensional representation that best characterizes the similarity relationships between data pairs. In graph embedding, dimensionality reduction involves two graphs: an intrinsic graph that characterizes the favorable relationships among feature vector pairs and a penalty graph that characterizes the unfavorable relationships among feature vector pairs. Thus, applying graph embedding to data matrix U would organize its raw data into classes according to specified favorable and unfavorable relationships. To achieve this, however, one first needs to define graph embedding as applied to data matrix U.
For graph embedding, let G = {U, W} be an intrinsic graph, where each column of U ∈ ℝ^{d×n} represents a vertex and each element of W (where W ∈ ℝ^{n×n}) measures the similarity between vertex pairs (step S3). In the same way, a penalty graph
In addition, the diagonal matrix D = [D_ij] is defined, where D_ii = Σ_{j=1}^{n} W_ij (step S7), and the Laplacian matrix L = D − W is defined (step S9). Matrices
As is explained above, to factorize data matrix U, which is defined as U ∈ ℝ^{d×n}, one defines a basis matrix V such that V ∈ ℝ^{d×r} (step S15), defines a feature matrix X such that X ∈ ℝ^{r×n} (step S17), and seeks to populate V and X such that the product of V and X approximates U with minimal error. An object of the present invention, however, is to combine graph embedding with the factorization of matrix U such that the classification properties of graph embedding are incorporated into factorized basis matrix V and feature matrix X. The present embodiment achieves this by defining the objective of graph embedding in terms of feature matrix X.
First, let each column of feature matrix X be a low dimensional representation of the corresponding column of U. Then, one can measure the compactness of the intrinsic graph G and the separability of the penalty graph Ḡ by the weighted sum of squared distances of feature matrix X, as follows:
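F_DIS(X) = Σ_{i<j}^{n} W_ij ‖x_i − x_j‖^2 = Tr(X L X^T) (Step S19)
F̄_DIS(X) = Σ_{i<j}^{n} W̄_ij ‖x_i − x_j‖^2 = Tr(X L̄ X^T)
where F_DIS expresses the compactness of favorable relationships and F̄_DIS expresses the separability of unfavorable relationships.
It is desired to minimize F_DIS and maximize F̄_DIS.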
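The relationship between the weighted sum of squared pairwise distances and the trace form involving the Laplacian matrix may be checked numerically. The following Python sketch is illustrative only; the function names graph_laplacian and pairwise_weighted_distance and the random symmetric matrix W are assumptions introduced for this example.

```python
import numpy as np

def graph_laplacian(W):
    """Return L = D - W, where D is diagonal with D_ii = sum_j W_ij."""
    D = np.diag(W.sum(axis=1))
    return D - W

def pairwise_weighted_distance(X, W):
    """Sum over i < j of W_ij * ||x_i - x_j||^2, with columns of X as points."""
    n = X.shape[1]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += W[i, j] * np.sum((X[:, i] - X[:, j]) ** 2)
    return total

rng = np.random.default_rng(0)
X = rng.random((3, 6))          # r = 3 features for n = 6 samples
A = rng.random((6, 6))
W = (A + A.T) / 2.0             # symmetric similarity matrix (assumption)
np.fill_diagonal(W, 0.0)

L = graph_laplacian(W)
f_pairwise = pairwise_weighted_distance(X, W)
f_trace = np.trace(X @ L @ X.T)  # same quantity in trace form
```

The two values agree whenever W is symmetric, which is why the trace form can stand in for the explicit double sum.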
To acquire both the benefits of part-based representation and the classification power of graph embedding, the present approach addresses both the objectives of NMF and the objective of graph embedding. However, unlike previous works, the present invention utilizes the ratio formation of graph embedding. The objective of NMF, F(1)(V, X), can be derived from equation (1), or can be re-expressed as equation (7) (step S23), where the constant multiple of ½ may be optionally dropped for simplicity. That is, it simplifies the derivative.
The objective of graph embedding, F(2)(X), can be derived from equation (5) or re-expressed as equation (8) (step S25), as:
where parameter λ is a multiplication factor determined using a validation technique, i.e. determined by running experiments with different values of λ's and selecting the best one.
Thus the objective of SNMF may be defined by the combined objectives formulation of NMF and graph embedding (step S27) as:
or alternatively,
This approach explicitly minimizes the ratio of two distances, which is the relative compactness of the favorable relationships. Consequently, SNMF can employ any definitions of similarity and dissimilarity matrices W and W̄ (including negative values) if both Tr(X L X^T) and Tr(X
Also unlike NGE, SNMF does not require any complementary spaces. NGE requires the introduction of complementary spaces to construct objective functions by addition of nonnegative terms. However, it is doubtful whether the complementary space exists without violating the nonnegativity constraints. Even if such spaces exist, one has no guarantee that the objective function of NGE can discover the complementary space.
Before describing a detailed implementation for achieving the objectives of SNMF, as described in equations (9) and (10), a sample definition of W and
where n is the total number of data points. Alternatively, matrices W = [W_ij] and W̄ = [W̄_ij] may also be defined as
Note that the elements of W can be negative, which means that NGE cannot use W and
Preferably, all the pair-wise distances are computed based on the unit basis vectors. This normalized distance calculation prevents the distance ratio from meaninglessly decreasing due to rescaling of basis vectors.
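A label-based construction of similarity matrix W and dissimilarity matrix W̄ may be illustrated by the following minimal Python sketch. The specific weights used here (W_ij = 1/n_c when samples i and j share class c and 0 otherwise, and W̄_ij = 1/n − W_ij) are an assumption chosen for illustration, in the style of linear discriminant analysis, and the function name label_graphs is hypothetical; other definitions may equally be used.

```python
import numpy as np

def label_graphs(y):
    """Build (W, W_bar) from class labels y under the assumed weighting."""
    y = np.asarray(y)
    n = y.size
    # n_c[i] is the size of the class to which sample i belongs.
    _, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    n_c = counts[inverse].astype(float)
    same_class = y[:, None] == y[None, :]
    # Assumed similarity: 1/n_c for same-class pairs, 0 otherwise.
    W = np.where(same_class, 1.0 / n_c[:, None], 0.0)
    # Assumed dissimilarity: 1/n - W_ij, which is negative for same-class pairs.
    W_bar = 1.0 / n - W
    return W, W_bar

y = [0, 0, 0, 1, 1]   # five samples in two classes
W, W_bar = label_graphs(y)
```

Under this assumed definition, some elements of W̄ are negative, which is the situation the ratio formulation is intended to accommodate.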
With reference to
F(V, X) is not a convex function of both V and X. Therefore, iterative updates are needed to minimize the objective function (11). Due to its fractional term, F(V, X) can be troublesome to optimize by multiplicative updates. Therefore, a presently preferred embodiment approximates its fractional term by a subtraction of two terms at each time t. Suppose that V = V^t and X = X^t at time t (step S33). The approximate function of F(V, X) may be defined as (step S35):
then F̃(V^t, X) is non-increasing under the following multiplicative update rules (step S37).
In addition, for a matrix A, A^+ = [A^+_ij] and A^− = [A^−_ij], where
Therefore, F̃(V^t, X) is non-increasing under the following multiplicative update (step S39):
This leads to the following theorem:
Theorem 1: The approximation of objective function F in equation (12) is non-increasing under the update rules of equations (14) and (18). A proof of Theorem 1 is provided in the appendix, attached below.
Since the multiplicative factors of equations (14) and (18) are always non-negative by Theorem 1, it follows that all elements in V and X are maintained non-negative after each update.
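The role of the positive and negative parts of a matrix in keeping a multiplicative factor nonnegative may be illustrated as follows. This Python sketch assumes the conventional split A^+_ij = max(A_ij, 0) and A^−_ij = max(−A_ij, 0), so that A = A^+ − A^−. The factor formed at the end is a generic ratio used only to demonstrate sign preservation; it is not the update rule of the present invention, and the names split_pos_neg and eps are hypothetical.

```python
import numpy as np

def split_pos_neg(A):
    """Return (A_pos, A_neg) with A = A_pos - A_neg and both parts >= 0."""
    A = np.asarray(A, dtype=float)
    return np.maximum(A, 0.0), np.maximum(-A, 0.0)

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))     # matrix with mixed signs
M_pos, M_neg = split_pos_neg(M)

X = rng.random((2, 4))              # nonnegative feature matrix
eps = 1e-12                         # guards against division by zero
# Generic factor: nonnegative numerator over nonnegative denominator.
factor = (X @ M_neg) / (X @ M_pos + eps)
X_new = X * factor                  # remains elementwise nonnegative
```

Since both the numerator and the denominator are built from nonnegative matrices, the factor cannot flip the sign of any element of X.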
As is stated above, the distance ratio part of SNMF, which may be computed based on class labels, can be incorporated into other NMF variations. As an illustrated example in
Beginning with step S41, let φ: ℝ_+^M → ℋ be a mapping that projects an image u to a Hilbert space ℋ of arbitrary dimensionality. In Kernel NMF, the decomposed matrix contains the images projected by the mapping φ. More formally, Kernel NMF solves the following optimization problem:
min ½‖U^φ − V^φ X‖_F^2 (20)
subject to: v^φ_ij ≥ 0 and x_ij ≥ 0 for ∀ i, j
where U^φ = [φ(u_1), φ(u_2), …, φ(u_N)] and V^φ = [φ(v_1), φ(v_2), …, φ(v_R)]. To solve this optimization problem, KNMF assumes that every φ(v_j) can be represented as a linear combination of the φ(u_i), i.e. φ(v_j) = Σ_{i=1}^{N} H_ij φ(u_i).
Then the objective function in Eq. (20) can be converted (Step S43) to
½‖U^φ − U^φ H X‖_F^2 (21)
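An objective of the form of Eq. (21) can be evaluated without forming the projected images explicitly, because it depends on U^φ only through the kernel matrix K = (U^φ)^T U^φ. Writing A = I − HX, the objective equals ½ Tr(A^T K A). The following Python sketch checks this identity for the linear kernel, for which φ is the identity map; the function name kernel_nmf_objective and the random test matrices are assumptions of the example.

```python
import numpy as np

def kernel_nmf_objective(K, H, X):
    """Evaluate 0.5 * ||U_phi - U_phi H X||_F^2 using only K = U_phi^T U_phi."""
    A = np.eye(K.shape[0]) - H @ X      # N x N residual coefficient matrix
    return 0.5 * np.trace(A.T @ K @ A)

rng = np.random.default_rng(0)
U = rng.random((8, 5))    # N = 5 samples of dimension 8
H = rng.random((5, 3))    # N x R combination coefficients
X = rng.random((3, 5))    # R x N feature matrix

K = U.T @ U               # linear kernel: phi is the identity map
obj_kernel = kernel_nmf_objective(K, H, X)
obj_direct = 0.5 * np.linalg.norm(U - U @ H @ X, "fro") ** 2
```

For a nonlinear kernel, K would instead be filled with kernel evaluations between pairs of samples, and the direct form would no longer be computable.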
This objective can be monotonically minimized by the following updates.
Using this Kernel NMF as a feature generation method, the presently suggested approach for SNMF can now be applied. The normalized compactness of favorable relationships is (Step S45):
Therefore the objective function F is defined as (step S47):
Following a similar logic as described above, the approximation of F is non-increasing under the following multiplicative update rules (step S49):
The present SNMF approach was tested in various applications, and the results compared to other techniques known in the art.
In a first application, the present invention is first illustrated as applied to a simplified, face classification application, and its ability to generate basis images and identify specific image features is tested.
With reference to
Because of SNMF's ability to make distinctions based on labels, it is possible to specify specific features on which one wishes to focus. For example, in a first test run, the present invention is asked to identify basis images (i.e. characteristic images used to classify features) to distinguish between types of eyes in the sixteen test face images. In a second test run, the present invention is asked to identify basis images to distinguish between mouth shapes. The results are shown in
In
In
The prior art NMF approach is also applied to the sixteen test images 51 of
Unlike the present approach, NMF cannot utilize label information, and NMF therefore cannot focus on specific parts of images, which is often an important feature for classification purposes. Consequently, NMF needs to represent all the components sufficiently well for classification of each part. As a result, NMF requires more basis images to achieve classification of any specific feature.
The sixteen test face images 51 of
Because the present approach can use class data to focus on specific features, it is much more resistant to such noise, and obtains greater performance with fewer basis images. This ability is particularly important in identifying specific features, such as facial expressions.
Two examples using two industry-standard databases of actual human faces are provided below. A first example uses the JAFFE database, and the second example uses the CBCL database. The JAFFE database contains 213 images of 10 Japanese female subjects. For each subject, 3 or 4 samples for each of 7 basic facial expressions are provided, as is illustrated in
For evaluation purposes when using the JAFFE database, once the face region is cropped, each image is down-sampled to 40×30 pixels. Following the typical approach of previous works, 150 images from the JAFFE database are randomly selected as a training set (i.e. training data), and the rest are utilized as a test set (i.e. test data). The results after ten tests are presented and compared with the accuracy results of previous works.
To test the effectiveness of the present SNMF approach, the results of the present SNMF approach are compared with eight other popular subspace learning algorithms: Nonnegative Matrix Factorization (NMF), Localized NMF (LNMF), polynomial NMF (PNMF), Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), kernel independent component analysis (KICA), and kernel principal component analysis (KPCA).
In the feature generation and classification setup, each column of data matrix U is constructed by concatenating all columns of an image. All elements of U are adjusted (i.e. normalized) to range from 0 to 1. U is then divided into a training set U_training and a test set U_test. Training set U_training is factorized into V×X. The feature matrices for the training set (i.e. X_training) and the test set (i.e. X_test) are obtained as X_training = (V^T V)^{−1} V^T U_training and X_test = (V^T V)^{−1} V^T U_test, respectively.
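The feature generation step may be sketched in Python as follows. This is a minimal illustration under stated assumptions: min-max scaling is assumed as the normalization to the range 0 to 1, a random nonnegative matrix stands in for a learned basis matrix V, and the helper names min_max_normalize and project_features are hypothetical. The projection (V^T V)^{−1} V^T U is computed as a least-squares solve rather than by explicit inversion, which gives the same result when V has full column rank.

```python
import numpy as np

def min_max_normalize(U):
    """Scale all elements of U to the range [0, 1] (assumed normalization)."""
    lo, hi = U.min(), U.max()
    return (U - lo) / (hi - lo)

def project_features(V, U):
    """Return X = (V^T V)^{-1} V^T U via a least-squares solve."""
    X, *_ = np.linalg.lstsq(V, U, rcond=None)
    return X

rng = np.random.default_rng(0)
# Ten toy "images", each flattened into a column of dimension 12.
U = min_max_normalize(rng.random((12, 10)) * 255.0)
U_training, U_test = U[:, :7], U[:, 7:]

V = rng.random((12, 3))   # stand-in for a basis matrix learned from U_training
X_training = project_features(V, U_training)
X_test = project_features(V, U_test)
```

Note that features obtained by this projection are not constrained to be nonnegative, even when V and U are.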
For classification, a linear kernel SVM is used. The SVM parameter is determined through a validation approach. The parameter λ, which is the multiplication factor of the distance ratio part, is likewise determined by validation.
The above described methods of SNMF, which, as is illustrated below, is well suited for data classification, may be implemented in various types of data processing hardware.
With reference to
In the present example of
Similarly, test data 37, which is the data that is to be classified, may be accessible via a direct link 34 or through communication network 29 and communication links 31/35. It is to be understood that test data 37 may be an archive of data (such as a store of face images) or may be generated in real time (such as face images created by surveillance cameras). It is further to be understood that communication links 31-35 may be wired or wireless communication links.
The results of this first approach are summarized in
For illustration purposes,
The results of the present invention upon the CBCL database are summarized in
While the invention has been described in conjunction with several specific embodiments, it is evident to those skilled in the art that many further alternatives, modifications and variations will be apparent in light of the foregoing description. Thus, the invention described herein is intended to embrace all such alternatives, modifications, applications and variations as may fall within the spirit and scope of the appended claims.
This application is related to U.S. patent application Ser. No. ______ and ______ (Attorney Docket Nos. AP431HO and AP445H0), filed on the same day as the instant application and entitled “Supervised Nonnegative Matrix Factorization.” These related applications are hereby incorporated by reference for all purposes.