The systems and methods disclosed herein relate generally to more efficient data classification for a plurality of different forms of physical sensor data.
Modern computer systems increasingly rely on data processing techniques for rapid training and accurate identification and classification of datasets. These datasets may be sparse and over-constrained. For example, a radio communications receiver may receive only a few messages comprising data with many dimensions. Such a situation is referred to as being “overconstrained” since the system must infer a general characteristic based on only a few, very complex, samples. Despite this difficulty, the receiver must classify message patterns to accurately distinguish errors from authentic messages.
Various tools may be used to reformulate data into a form more amenable to analysis and data classification. Fisher's linear discriminant analysis (LDA) is one method for distinguishing classes of data within a dataset. Traditionally, LDA may be used in statistics and pattern recognition to linearly project high-dimensional observations from two or more classes onto a low-dimensional feature space before classification. By projecting data onto a lower dimensional feature space it may be easier to classify incoming data than if classification were attempted on the higher dimensional space. Furthermore, operating in a lower dimensional feature space may facilitate more efficient classification than in the original space.
While one could simply identify the appropriate classification for a set of new data points by referring to the default coordinates of 102a, 102b, it is regularly the case that these default coordinates are not necessarily the best coordinates in which to represent the data to perform classification. Instead, another unidentified coordinate system may be more amenable to rapid classification. Furthermore, it may be preferable to use fewer dimensions when performing classification, as certain of the default dimensions 102a, 102b may be less useful for classification than other of the default dimensions (as mentioned above, not all 1600 pixels of an image are likely equally useful for facial classification). Identifying a smaller number of dimensions within which to perform classification is sometimes referred to as “dimensionality reduction”.
Once a new set of coordinates (103a, 103b) has been identified, the classifiers and these incoming data points may then be projected upon these new coordinates to facilitate data classification. In the example of
One method for identifying the vector 103b is the Fisher Discrimination method, which relies upon the Fisher Discrimination Criterion. The Fisher Discrimination Criterion relates the between-class variation (Sb) to the within-class variation (Sw) of the classifiers, as projected upon a candidate vector 103b. One may also refer to the total scatter St as Sw+Sb. The between-class scatter Sb may be defined as:
$$S_b = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T \in \mathbb{R}^{N \times N} \qquad (1)$$
In this example, the within class scatter may be defined as
$$S_w = S_1 + S_2 \in \mathbb{R}^{N \times N} \qquad (2)$$
and the total scatter may be defined as
$$S_t = S_b + S_w \in \mathbb{R}^{N \times N} \qquad (3)$$
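The definitions (1) through (3) can be illustrated with a minimal sketch. The per-class scatter S_i is assumed here to be the unnormalized sum of outer products of the mean-removed samples, a form this disclosure does not spell out, and the helper names and the two toy classes are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def outer(u, v):
    """Outer product u v^T as nested lists."""
    return [[ui * vj for vj in v] for ui in u]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def class_scatter(vectors, mu):
    """Assumed form of S_i: sum of (x - mu_i)(x - mu_i)^T over the class samples."""
    n = len(mu)
    S = [[0.0] * n for _ in range(n)]
    for x in vectors:
        d = [xk - mk for xk, mk in zip(x, mu)]
        S = mat_add(S, outer(d, d))
    return S

def scatter_matrices(class1, class2):
    mu1, mu2 = mean(class1), mean(class2)
    diff = [a - b for a, b in zip(mu1, mu2)]
    Sb = outer(diff, diff)                       # equation (1)
    Sw = mat_add(class_scatter(class1, mu1),
                 class_scatter(class2, mu2))     # equation (2)
    St = mat_add(Sb, Sw)                         # equation (3)
    return Sb, Sw, St

# Hypothetical two-dimensional training samples for two classes.
class1 = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0], [3.0, 3.0]]
class2 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Sb, Sw, St = scatter_matrices(class1, class2)
```

In this toy case Sb has rank one because it is built from a single difference-of-means vector, which is the property relied upon later when the generalized eigenvalues are discussed.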
Intuitively, projected classifiers with high between-class variation and low within-class variation will facilitate better datapoint segregation than the converse. This is reflected in the Fisher Criterion, which is defined as:
A high between-class variation (Sb) and a low within-class variation (Sw) will yield a higher Fisher Criterion and will better facilitate classification. This criterion may be used to identify, of all the possible vectors in the space of coordinates 102a, 102b, the vector φ 103b which best segregates the classifiers 101a and 101b. Some methods first identify the vector transpose φ0 103a, but the general concept is the same, as would be recognized by one in the art. Although in the simplified example of
The vector φ 103a may be identified by iterating through possible vectors in the space of 102a, 102b, and finding the vector which maximizes the Fisher Criterion for the classifiers. This “maximum vector” φ*F may be represented as
One may alternatively determine φ*F by maximizing an equivalent criterion λF.
For the sake of simplicity, the total scatter St is used, so that the values of λF fall within the range of 0 to 1. λF is referred to as the Fisher's Linear Discrimination Criterion (FLDC).
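The idea of iterating through candidate vectors can be sketched as follows. This sketch assumes the criterion takes the ratio form φᵀSbφ / φᵀStφ, with the total scatter in the denominator as described above. The 2×2 matrices, the 1-degree grid, and the function names are hypothetical.

```python
import math

def quad_form(phi, S):
    """Return phi^T S phi for a square matrix S given as nested lists."""
    n = len(phi)
    return sum(phi[i] * S[i][j] * phi[j] for i in range(n) for j in range(n))

def fldc(phi, Sb, St):
    """Assumed ratio form of the criterion, with total scatter in the denominator."""
    return quad_form(phi, Sb) / quad_form(phi, St)

# Hypothetical 2-D scatter matrices (St = Sb + Sw with Sw = 2 * identity).
Sb = [[4.0, 4.0], [4.0, 4.0]]
St = [[6.0, 4.0], [4.0, 6.0]]

# Iterate through candidate unit vectors at 1-degree steps and keep the best.
candidates = (
    (deg, fldc([math.cos(math.radians(deg)), math.sin(math.radians(deg))], Sb, St))
    for deg in range(180)
)
best_angle, best_value = max(candidates, key=lambda pair: pair[1])
```

For these toy matrices the best direction lies at 45 degrees, along the difference of the class means, and every candidate's criterion value falls between 0 and 1.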
It can be shown that a vector φ that maximizes the FLDC must satisfy (a proof is provided in the attached appendix):
$$S_b \varphi = \lambda S_t \varphi, \qquad (7)$$
for some constant λ. This is a generalized eigenvalue decomposition problem.
When Sb and St are both N×N symmetric matrices, there are N pairs of eigenvalues and eigenvectors that satisfy (7): (λ0, φ0), . . . , (λN−1, φN−1). The eigenvalues λ0, . . . , λN−1 are all real and, when Sb and St are scatter matrices, lie in the range from 0 to 1. Without loss of generality, assume λ0 ≥ . . . ≥ λN−1. Since Sb is a rank-one matrix, it can additionally be inferred that only one of the N eigenvalues, λ0, is non-zero.
$$0 < \lambda_0 < 1 \quad \text{and} \quad \lambda_1, \ldots, \lambda_{N-1} = 0 \qquad (8)$$
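One way to see why a single non-zero eigenvalue arises is to substitute the rank-one form (1) into (7). The sketch below adds the assumption that St is invertible and writes m = μ1−μ2, a shorthand not used elsewhere in this disclosure:

```latex
\begin{aligned}
S_b \varphi = \lambda S_t \varphi
  &\;\Longrightarrow\; m\,(m^T \varphi) = \lambda\, S_t \varphi \\
  &\;\Longrightarrow\; \varphi \propto S_t^{-1} m \quad (\lambda \neq 0) \\
  &\;\Longrightarrow\; \lambda_0 = m^T S_t^{-1} m .
\end{aligned}
```

Under the same assumption, any φ with mᵀφ = 0 satisfies (7) with λ = 0, which accounts for the remaining N−1 eigenvalues.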
Thus, the Fisher's Linear Discriminant Vector is the generalized eigenvector, φ0, corresponding to the only non-zero generalized eigenvalue, λ0, of Sb and St:
The following is one proposed method for identifying λ0. From (7), consider performing a classical eigenvalue decomposition of (Sb−λSt) for a fixed λ. Let Eλ=[e0λ, . . . , eN−1λ] and Dλ=diag[d0λ, . . . , dN−1λ] respectively denote the eigenvector and eigenvalue matrices of (Sb−λSt). The eigenvalue decomposition can be written as
$$D^{\lambda} = (E^{\lambda})^T (S_b - \lambda S_t)\, E^{\lambda} \qquad (11)$$
An eigenvalue diλ is related to its eigenvector eiλ by diλ=(eiλ)T(Sb−λSt)eiλ. Without loss of generality, assume d0λ ≥ . . . ≥ dN−1λ.
Thus, the optimal value of the Fisher's Discriminant criterion may be computed as a value of 0<λ<1 that makes (Sb−λSt) negative semi-definite. It can be shown that there exists only one value of λ in the range [0,1] that satisfies the above condition (proof is provided in the Appendix). Therefore, if we let f(λ):[0,1]→R represent the largest eigenvalue of (Sb−λSt) as a function of λ, i.e.
The optimal value of the Fisher's criterion, λ*F, may then be computed as
=e0λ
The Fisher's discriminant vector φ*F may then be given by
$$\varphi_F^{*} = e_0^{\lambda_F^{*}}$$
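The function f(λ) is bounded on [0,1] and satisfies the following properties on the closed interval:
$$\lambda < \lambda_F^{*} \;\Leftrightarrow\; f(\lambda) > 0 \qquad (17)$$
$$\lambda > \lambda_F^{*} \;\Leftrightarrow\; f(\lambda) < 0 \qquad (18)$$
$$\lambda = \lambda_F^{*} \;\Leftrightarrow\; f(\lambda) = 0 \qquad (19)$$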
While the preceding section and attached appendices are intended to provide a thorough treatment of the Fisher Discrimination Analysis methodology as used in certain embodiments,
As discussed above, the analysis begins 201 by recognizing that we would like to use the Fisher criterion to determine an appropriate projection vector φ*F 202. Determining φ*F requires that we find the maximum argument of
This may be rewritten as an eigenvalue decomposition problem 203. By the proof provided in Appendix B, it may then be shown that the optimal value of the Fisher's Discriminant criterion can be computed by finding a value between 0 and 1 that makes (Sb−λSt) negative semi-definite. Fortuitously, there is only one value in that range which will make (Sb−λSt) negative semi-definite. From these conditions we may define the function 204.
This function has various properties 205. In view of these properties, we recognize that we may find λ* by iterating through possible values of λ, and plugging them into the equation 204, until we identify a value of λ which produces an f(λ) of 0. This λ will be λ*, which we may then use in conjunction with the equation (21) to determine the projection vector φ*F, which we had originally sought.
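A coarse scan over λ illustrates how f(λ) changes sign around its root. The sketch below is limited to the 2×2 case so that the largest eigenvalue has a closed form; the matrices, the grid spacing of 0.1, and the function names are hypothetical.

```python
import math

def largest_eig_2x2(M):
    """Largest eigenvalue of a symmetric 2x2 matrix (closed form)."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    return 0.5 * (a + c) + math.hypot(0.5 * (a - c), b)

def f(lam, Sb, St):
    """f(lambda): largest eigenvalue of (Sb - lambda * St)."""
    M = [[Sb[i][j] - lam * St[i][j] for j in range(2)] for i in range(2)]
    return largest_eig_2x2(M)

# Hypothetical scatter matrices (St = Sb + Sw with Sw = 2 * identity).
Sb = [[4.0, 4.0], [4.0, 4.0]]
St = [[6.0, 4.0], [4.0, 6.0]]

# Plug a grid of lambda values into f and keep the one closest to f = 0.
samples = [(k / 10.0, f(k / 10.0, Sb, St)) for k in range(11)]
lambda_star = min(samples, key=lambda pair: abs(pair[1]))[0]
```

For this toy input f is positive below the selected value and negative above it, which is the sign pattern a root-finding search can exploit in place of an exhaustive scan.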
The following section discusses one possible algorithm for finding λ* from the equation f(λ) 204.
Algorithmic Search for λ* Using the Function f(λ)
Referring to the conditions 205 of
Search algorithms such as the bisection search of
Unfortunately, as the computational complexity of linear feature extraction increases linearly with dimensionality of the observation samples, computation can become intractable for high dimensional data, particularly where the operation is to be performed in real time. As mobile devices and portable computers become more prevalent, there is an increasing need for more efficient and robust classification systems. In particular, the calculation of the metric f(λ) as part of the search algorithm for λ* discussed above is computationally intensive and represents a barrier to more efficient training.
Certain embodiments contemplate a method, implemented on an electronic device, for generating physical sensor data classifiers, the method comprising: receiving a plurality of physical sensor data; identifying a projection vector based on the physical sensor data using a search algorithm, the search algorithm comprising a metric function, wherein identifying a projection vector comprises calculating one or more eigenvalues associated with the metric function using a sparse matrix transform; and producing physical sensor data classifiers by projecting at least a portion of the physical sensor data upon the projection vector.
In some embodiments, the physical sensor data comprises one of facial image data, speech audio data, wireless communication signals, or laser range-finder data. In some embodiments, the search algorithm is iteratively calculated. In some embodiments, the metric function comprises the Fisher Discriminant. In some embodiments, the search algorithm comprises a bisection search.
Certain embodiments contemplate a mobile electronic device comprising: a memory, the memory configured to store a data set comprising physical sensor data; a processor configured to: receive a plurality of physical sensor data; identify a projection vector based on the physical sensor data using a search algorithm, the search algorithm comprising a metric function, wherein identifying a projection vector comprises calculating one or more eigenvalues associated with the metric function using a sparse matrix transform; and produce physical sensor data classifiers by projecting at least a portion of the physical sensor data upon the projection vector.
In some embodiments, the physical sensor data comprises one of facial image data, speech audio data, wireless communication signals, or laser range-finder data. In some embodiments, the search algorithm is iteratively calculated. In some embodiments, the metric function comprises the Fisher Discriminant. In some embodiments, the search algorithm comprises a bisection search.
Certain embodiments contemplate a non-transitory, computer-readable medium, comprising instructions configured to cause a processor to implement a method to classify physical sensor data, the method comprising: receiving a plurality of physical sensor data; identifying a projection vector based on the physical sensor data using a search algorithm, the search algorithm comprising a metric function, wherein identifying a projection vector comprises calculating one or more eigenvalues associated with the metric function using a sparse matrix transform; and producing physical sensor data classifiers by projecting at least a portion of the physical sensor data upon the projection vector.
In some embodiments, the physical sensor data comprises one of facial image data, speech audio data, wireless communication signals, or laser range-finder data. In some embodiments, the search algorithm is iteratively calculated. In some embodiments, the metric function comprises the Fisher Discriminant. In some embodiments, the search algorithm comprises a bisection search.
Certain embodiments contemplate a mobile electronic device comprising: means for receiving a plurality of physical sensor data; means for identifying a projection vector based on the physical sensor data using a search algorithm, the search algorithm comprising a metric function, wherein identifying a projection vector comprises calculating one or more eigenvalues associated with the metric function using a sparse matrix transform; and means for producing physical sensor data classifiers by projecting at least a portion of the physical sensor data upon the projection vector.
In some embodiments, the receiving means comprises a processor running software, the identifying means comprises a processor running software, and the producing means comprises a processor running software. In some embodiments, the physical sensor data comprises one of facial image data, speech audio data, wireless communication signals, or laser range-finder data. In some embodiments, the search algorithm is iteratively calculated. In some embodiments, the metric function comprises the Fisher Discriminant. In some embodiments, the search algorithm comprises a bisection search.
One embodiment is a system and method for classifying data received by a sensor in an electronic device. In this embodiment, the electronic device includes a rapid data classification process that captures the sensor data, generates an appropriate classifier, and classifies data into one or more classifications. For example, the sensor may be an image sensor, and thus the rapid classification process may be configured to classify images captured by the sensor. In one specific embodiment, the captured image may include one or more faces, and the rapid classification process may be used to identify portions of the captured image that contain a face. In a related embodiment, the captured image may include one or more faces, and the rapid classification process may be used to match faces captured by the image sensor against other pre-stored images in order to retrieve other images of the same person from a data storage. One will readily recognize that any object, not just a face, may employ the classification method of this embodiment.
In certain of these embodiments, the rapid classification system and process may use a modified version of the LDA, termed herein “sparse matrix LDA”, wherein a sparse matrix transformation replaces the more typical matrix transform described above for a conventional LDA process. Although LDA is optimized in this embodiment, the optimization may likewise be applied to search algorithms of other classifiers as described in greater detail below. In this embodiment, a discriminant vector in the sparse LDA can be computed as a solution to a constrained optimization problem. This embodiment contemplates optimizing the calculation of a metric function associated with this constrained optimization problem. Particularly, the metric function may be modified to employ a Sparse Matrix Transform (SMT). The SMT may be used to perform a sub-calculation of the metric function, such as the computation of eigenvalues and eigenvectors. The SMT provides a plurality of constraints to control the accuracy of this computation. Particularly, these constraints can be relaxed or made more stringent to control the number of non-zero entries in the optimized discriminant vector. More non-zero entries in the optimized discriminant vector lead to higher dimensionality and computational complexity, whereas fewer non-zero entries lead to lower dimensionality and complexity in the data to be analyzed. By allowing the constraints to be relaxed or to be made more stringent, a desired level of computational accuracy may be achieved. Throughout this application, LDA which incorporates the SMT as part of its metric function will be referred to as sparse matrix LDA. The Classical Fisher's LDA becomes a special case of the proposed sparse matrix LDA framework when the constraints in the optimization problem are fully relaxed.
Tightening the constraints leads to sparse features, which can result in lower classification accuracy but are also computationally much more efficient. The metric that is optimized for computation of sparse matrix features is the same as that used in classical Fisher's LDA. However, the search space is constrained to include only a set of potentially informative vectors, which are sparse in R^N, for discriminating data from different classes. Thus, sparse matrix LDA focuses on the vectors that would be most informative for the target purpose, such as facial recognition, while ignoring vectors that would not materially help increase the accuracy of the final classification.
In one embodiment, for generating a pool of candidate sparse discriminant directions, a sparse matrix transform may be used for regularization of covariance estimates of high-dimensional signals. The SMT model estimates an orthonormal transformation as a product of a sequence of pairwise coordinate rotations known as Givens rotations. The sparsity of the eigen decomposition can be controlled by restricting or increasing the number of Givens rotations in the SMT model. The experimental results show that the sparse discriminant direction found using the proposed algorithm, in a two-class data set, exhibits discriminatory ability superior to that of a classical Fisher's discriminant direction hard-thresholded to retain only a desired number of non-zero elements. The new linear discriminant analysis framework thus provides an advantageous compromise between classification accuracy and computational complexity.
While certain of these embodiments are discussed with particular reference to face identification, particularly implemented with LDA, the improved sparse matrix methods disclosed in these embodiments may be applied to any search algorithm comprising a metric function, where the metric function requires successive calculation of eigenvalues or eigenvectors under various resource constraints (time, computational power, etc.). For example, the disclosed embodiments may be readily adapted to other search metric functions computing eigenvalues, such as Principal Component Analysis (PCA) and reformulations of the Fourier Transform.
Traditionally, the Sparse Matrix Transform (SMT) is used with the intention of finding a sparse-regularized solution to a classical eigenvalue problem (or PCA). Such problems are approached with the expectation that SMT may be applied to a single matrix. Applicant has instead recognized that the SMT framework may be extended to find a sparse-regularized solution to a generalized eigenvalue problem comprising two symmetric matrices rather than one (See, e.g., the matrices Sb and St of 204 in
Certain search algorithms improved by the present embodiments may seek a projection vector upon which to project classifiers, such as in LDA, while others may seek to optimize parameters through other known methods, such as linear programming. Generally speaking, any classification algorithm requiring multiple calculations of a set of eigenvalues and eigenvectors may employ the improvements discussed below.
In the present disclosure, physical sensor data is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art (i.e., it is not to be limited to a special or customized meaning) and includes, without limitation, facial image data, speech audio data, wireless communication signals, laser range-finder data, or any data set derived from a sensor such as a camera, microphone, pressure sensor, and the like. Similarly, in the present disclosure, a projection vector is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art (i.e., it is not to be limited to a special or customized meaning) and includes, without limitation, any data structure within a computer system upon which data is projected, i.e., transformed so as to be described upon a different set of dimensions than the dimensions upon which the data was previously represented. In the present disclosure, search algorithm is a broad term and is to be given its ordinary and customary meaning to a person of ordinary skill in the art (i.e., it is not to be limited to a special or customized meaning) and includes, without limitation, any algorithm used to identify a vector or value of interest. For example, the algorithm depicted in
Each of these systems may acquire their input as a dataset comprising a number of dimensions. For example, by one interpretation of image data, each image pixel within the image may comprise a separate dimension of the data, with the pixel intensity corresponding to the value associated with a given dimension. In a 40×40 pixel image, by this example, there would accordingly be 1600 dimensions.
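The pixel-per-dimension interpretation can be sketched in a few lines. The row-major ordering and the synthetic image contents are assumptions made for illustration only.

```python
def image_to_vector(image):
    """Flatten a 2-D list of pixel intensities (row-major) into one vector."""
    return [pixel for row in image for pixel in row]

# Hypothetical 40x40 image whose intensities are simply a running index.
image = [[row * 40 + col for col in range(40)] for row in range(40)]
vector = image_to_vector(image)
```

Each of the 1600 entries of `vector` then plays the role of one dimension of the observation.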
Fisher's Linear Discriminant analysis (LDA) is a known method for identifying an appropriate separator between two or more classes of data. This separator may then be used to quickly classify new datasets. Where there are many dimensions, as in image content, LDA is useful as it facilitates dimensionality reduction. That is, all 1600 pixels of a 40×40 pixel image in the above example are unlikely to all be equally relevant for, say, recognizing a face. Given a first “training” dataset of classified images, LDA may be performed to identify those dimensions (i.e., pixels) most pertinent to characterizing the facial image. Subsequently, when a new dataset from a facial image needing to be identified is provided only these pertinent dimensions need be considered to quickly determine the proper categorization for the new data (i.e., as comprising a “face” or “no-face”, or a particular individual's face).
The pseudo-code for the sparse matrix transform (SMT) is shown in
inside S(m) for the 504. At each iteration the Givens rotation T(m) 505 is determined and applied to the pending sparse matrix S to acquire the successive intermediate sparse matrix S(m+1).
As recognized by one in the art, the classical eigenvalue decomposition problem can be written as SE=ED, where S is a real and symmetric matrix, E is an orthonormal matrix of eigenvectors, and D is a diagonal matrix of eigenvalues. The Jacobi eigenvalue algorithm may be used to solve the classical eigenvalue problem. The Jacobi eigenvalue algorithm iteratively transforms S into a diagonal matrix through a sequence of orthonormal transformations 606:
$$S^{(m+1)} = T^{(m)T} S^{(m)} T^{(m)} \qquad (23)$$
where S(0)=S and T(m):=T(im, jm, θm) is a pairwise Givens rotation which differs from an identity matrix only in four elements: ti
where Πm−1rT(m)=T(m)=T(0), T(1), . . . , T(r) in the limit when r→∞, Dr→D.
The sparse matrix transform (SMT), is a Jacobi eigen decomposition that terminates early, i.e., the number of Givens rotations in Jacobi eigen decomposition is kept small: r<<N(N−1)/2. As mentioned above, the variable r is sometimes referred to as the SMT-model order.
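A single step of the transformation in (23) can be sketched as below. The sign convention chosen for the four modified entries of T and the closed-form angle are assumptions, since the disclosure's own expressions for them are not reproduced here; the 3×3 matrix and the function names are hypothetical.

```python
import math

def givens(n, i, j, theta):
    """Identity matrix modified only in entries (i,i), (i,j), (j,i), (j,j)."""
    T = [[1.0 if p == q else 0.0 for q in range(n)] for p in range(n)]
    c, s = math.cos(theta), math.sin(theta)
    T[i][i], T[i][j] = c, s      # assumed sign convention
    T[j][i], T[j][j] = -s, c
    return T

def matmul(A, B):
    return [[sum(A[p][k] * B[k][q] for k in range(len(B)))
             for q in range(len(B[0]))] for p in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rotate(S, i, j):
    """One Jacobi step: choose theta so that entry (i, j) of T^T S T vanishes."""
    theta = 0.5 * math.atan2(2.0 * S[i][j], S[j][j] - S[i][i])
    T = givens(len(S), i, j, theta)
    return matmul(transpose(T), matmul(S, T)), T

# Hypothetical symmetric matrix; rotate in the (0, 1) coordinate plane.
S = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 1.0]]
S_next, T = rotate(S, 0, 1)
```

Because T touches only two coordinates, a product of a small number r of such rotations leaves most entries of the accumulated transform at zero, which is where the sparsity of the SMT estimate comes from.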
The sparsity of the eigen decomposition can be increased or decreased by varying the SMT-model order, r. The choice of appropriate Givens rotation T(m)=T(im, jm, θm) at each iteration of SMT determines the accuracy of sparse eigen decomposition. To determine the optimal coordinate pair (im, jm), the SMT algorithm examines 2×2 sub-matrices
inside S(m) for the largest ratio of off-diagonal to diagonal elements (step 603 of
The rotation angle θm is then selected so that the off-diagonal elements at (im,jm) and (jm,im) vanish in S(m+1) (step 504 of
After the desired number of iterations, pursuant to the order number r, have been performed, the sparse eigen decomposition may then be determined, based on (24), by (steps 508 and 509 of
As discussed above, the SMT permits the sparsity of the estimated eigenvectors to be increased or decreased by varying r. The following section elaborates upon the incorporation of SMT into the bisection search algorithm of
Let the set E0r,λ denote the collection of all SMT-estimated eigenvectors e0r,λ for a fixed r and for λ in the range [0, 1]. We propose to compute the sparse Fisher's linear discriminant vector, φ*Fr, as a solution to the following constrained optimization problem:
Typically a smaller value of the SMT-model order, r, yields a sparser estimate of the discriminant vector, and vice versa. When r is large, sparsity constraints on the set E0r,λ are relaxed, and sparse Fisher's LDA reduces to classical Fisher's LDA.
According to (29), the optimal sparse discriminant projection, φ*Fr, is computed as an eigenvector e0r,λ of (Sb−λSt) for some λ in the range [0, 1]. In a manner analogous to the discussion above, it can be shown that the value of λ that maximizes the discriminant criterion is φF(φ)|φ=e
Thus, if we let fr(λ):[0, 1]→R represent the largest eigenvalue of (Sb−λSt), estimated using an order-r SMT model, as a function of λ, i.e.
$$f_r(\lambda) \equiv d_0^{r,\lambda} = (e_0^{r,\lambda})^T (S_b - \lambda S_t)\, e_0^{r,\lambda}$$
the constrained optimization problem in (29) can equivalently be solved as follows:
In the limit when r→∞, fr(λ)→f(λ) and, therefore, λ*Fr→λ*F.
In certain embodiments, the function fr(λ) may be referred to as a “metric function” for the search algorithm LDA. Again, in a manner analogous to that discussed above, the function fr(λ) is bounded on [0, 1] and satisfies the following properties on the closed interval (See Appendix B):
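$$\lambda < \lambda_F^{*r} \;\Leftrightarrow\; f_r(\lambda) > 0 \qquad (32)$$
$$\lambda > \lambda_F^{*r} \;\Leftrightarrow\; f_r(\lambda) < 0 \qquad (33)$$
$$\lambda = \lambda_F^{*r} \;\Leftrightarrow\; f_r(\lambda) = 0 \qquad (34)$$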
Therefore, the solution of the non-linear equation fr(λ)=0 can be computed by using the bisection method, as discussed earlier. An iterative strategy for estimating λ*Fr and φ*Fr based on the bisection method is given in the algorithm of
Thus, the algorithm estimates the optimal value of the sparse Fisher's discriminant criterion, λ*Fr, as the root of the non-linear equation fr(λ)=0. The root is estimated as the midpoint of a closed interval [a, b], where the initial values of a and b are 0 and 1, respectively, and are then updated iteratively until the gap between a and b reduces to 2^−K. The final values of λ*Fr and φ*Fr are then calculated as λ*Fr=λ(K) and φ*Fr=e0r,λ(K), where λ(k) denotes the midpoint of a and b in the k-th iteration and K denotes the total number of iterations for which the root-finding algorithm is run. Every time λ(k) is updated, fr(λ(k)) is computed by performing SMT-decomposition of (Sb−λ(k)St) for the new value of λ(k). Again, although this particular bisection algorithm has been provided for purposes of illustration, one will readily recognize numerous variations to this particular example. For example, the parameters a and b may be substituted with more refined intermediate search variables, and the iterations performed may be determined based on factors other than simply K. Similarly, the order r may vary across iterations and depend on a plurality of unspecified parameters pertinent to the domain in which the algorithm is performed, such as the conditions under which a user is attempting facial recognition training on a mobile device.
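The bisection loop described above can be sketched as follows. The pivot rule used here (largest off-diagonal magnitude, as in the classical Jacobi method) is a simplification swapped in for the ratio-based pair selection of the SMT. The function names, the toy 3×3 matrices, and the default K=30 are hypothetical.

```python
import math

def smt_largest_eig(M, r):
    """Estimate the largest eigenvalue and its eigenvector with at most r Givens rotations."""
    n = len(M)
    S = [row[:] for row in M]
    E = [[1.0 if p == q else 0.0 for q in range(n)] for p in range(n)]
    for _ in range(r):
        # Simplified pivot: largest off-diagonal magnitude (classical Jacobi choice).
        i, j = max(((p, q) for p in range(n) for q in range(p + 1, n)),
                   key=lambda pq: abs(S[pq[0]][pq[1]]))
        if abs(S[i][j]) < 1e-15:
            break
        theta = 0.5 * math.atan2(2.0 * S[i][j], S[j][j] - S[i][i])
        c, s = math.cos(theta), math.sin(theta)
        for X in (S, E):                  # X <- X T (rotate columns i and j)
            for k in range(n):
                xi, xj = X[k][i], X[k][j]
                X[k][i], X[k][j] = c * xi - s * xj, s * xi + c * xj
        for k in range(n):                # S <- T^T S (rotate rows i and j)
            si, sj = S[i][k], S[j][k]
            S[i][k], S[j][k] = c * si - s * sj, s * si + c * sj
    top = max(range(n), key=lambda k: S[k][k])
    return S[top][top], [E[k][top] for k in range(n)]

def shifted(Sb, St, lam):
    """Form the matrix (Sb - lam * St)."""
    n = len(Sb)
    return [[Sb[p][q] - lam * St[p][q] for q in range(n)] for p in range(n)]

def sparse_fisher(Sb, St, r, K=30):
    """Bisection on [0, 1] for the root of f_r; returns (lambda, phi)."""
    a, b = 0.0, 1.0
    for _ in range(K):
        lam = 0.5 * (a + b)
        value, _ = smt_largest_eig(shifted(Sb, St, lam), r)
        if value > 0.0:
            a = lam                       # root lies above the midpoint
        else:
            b = lam                       # root lies at or below the midpoint
    lam = 0.5 * (a + b)
    _, phi = smt_largest_eig(shifted(Sb, St, lam), r)
    return lam, phi

# Hypothetical 3-D example: Sb = m m^T with m = (2, 1, 0), Sw = diag(1, 2, 4).
Sb = [[4.0, 2.0, 0.0], [2.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
St = [[5.0, 2.0, 0.0], [2.0, 3.0, 0.0], [0.0, 0.0, 4.0]]
lam0, phi0 = sparse_fisher(Sb, St, r=0)   # no rotations: sparsest estimate
lam1, phi1 = sparse_fisher(Sb, St, r=1)   # one rotation
```

In this toy case the order-0 model returns a vector with a single non-zero component and a criterion of about 0.8, while one rotation yields two non-zero components and about 9/11, illustrating the trade-off between sparsity and the value of the criterion.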
The following tables and figures provide various results for certain experimental demonstrations of the above algorithms. For the purposes of these examples, a plurality of 20×20 patches representing faces and non-faces were used (class ω1 comprising faces and class ω2 comprising non-faces). For simplicity, class ω1 may be referred to as the “positive” class and ω2 as the “negative” class. The 20×20 patches are de-meaned, variance normalized, and arranged as 400-dimensional vectors. For training, 2000 samples from ω1 and 51000 samples from ω2 were used. For testing, the number of samples used from ω1 is 1400 and from ω2 is 34000. The training data vectors from the face and non-face classes are used to compute the between-class and total scatter matrices, Sb and St, respectively. The optimal Fisher's discriminant criterion λ*F and the discriminant vector φ*F were computed using the algorithm of
With regard to
where the threshold T is adjusted to give the desired classification rate.
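Adjusting T to reach a target detection rate can be sketched as below. The sketch assumes that positive-class samples receive larger projection scores than negative-class samples. The score lists are synthetic and the function names are hypothetical.

```python
import math

def threshold_for_detection_rate(positive_scores, rate):
    """Largest T such that at least `rate` of the positive scores satisfy score >= T."""
    ordered = sorted(positive_scores, reverse=True)
    needed = math.ceil(rate * len(ordered))
    return ordered[needed - 1]

def false_positive_rate(negative_scores, threshold):
    """Fraction of negative-class scores that would be labeled positive."""
    return sum(1 for s in negative_scores if s >= threshold) / len(negative_scores)

# Synthetic projection scores (assumption: positives score higher).
positive_scores = [float(k) for k in range(1, 101)]    # 1 .. 100
negative_scores = [float(k) for k in range(-89, 11)]   # -89 .. 10
T = threshold_for_detection_rate(positive_scores, 0.99)
fpr = false_positive_rate(negative_scores, T)
```

Fixing the detection rate first and then reading off the false positive rate mirrors how the comparisons in this section are reported.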
With regard to
where θc denotes the hard-thresholding function and the constant c controls the sparsity of the thresholded discriminant vector θc(φ*F).
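A hard-thresholding function of this kind can be sketched as follows. The exact definition of θc is not reproduced here, so the sketch assumes the common form in which every component with magnitude below c is set to zero. The example vector and the helper that picks c are hypothetical.

```python
def hard_threshold(phi, c):
    """Assumed form of theta_c: zero every component whose magnitude is below c."""
    return [v if abs(v) >= c else 0.0 for v in phi]

def threshold_for_count(phi, keep):
    """Pick c so the `keep` largest-magnitude components survive (ties may keep more)."""
    magnitudes = sorted((abs(v) for v in phi), reverse=True)
    return magnitudes[keep - 1]

# Hypothetical dense discriminant vector reduced to 3 non-zero components.
phi = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
c = threshold_for_count(phi, keep=3)
phi_sparse = hard_threshold(phi, c)
```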
To trade off classification accuracy for lower computation, we choose a value of c to get a thresholded discriminant vector, θc(φ*F), with only 10 non-zero components, as shown in
The classification of face and non-face test samples using the rule. The blue broken line in each of
Table 1 depicts the classification accuracy as a function of sparsity of the thresholded discriminant θc(φ*F). The positive detection is fixed at 99%.
With regard to
The sparse projection vector φ*F260 was arranged as a 20×20 mask 901c.
This projection vector has only 11 non-zero components; hence the computation involved in projecting a test vector x onto the space of φ*F260 is only 11 multiplications and 10 additions. The data classification performance with φ*F260 as the discriminant direction is shown in 902c. The threshold T was adjusted to give a positive detection rate of 99%; the false positive rate achieved was 16.7%. For comparison, for a similar computational burden and positive detection rate, the hard-thresholded Fisher's feature θc(φ*F) 902b yields a false positive rate of 42.0%.
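The saving in arithmetic can be made concrete with a sketch that stores only the non-zero components of the discriminant vector. The index-to-weight dictionary, the direction of the inequality in the classification rule, and the example values are assumptions for illustration.

```python
def sparse_project(phi_sparse, x):
    """Dot product that touches only the stored non-zero entries of phi.

    phi_sparse maps a component index to its non-zero weight, so the cost is
    one multiplication per stored entry.
    """
    return sum(weight * x[index] for index, weight in phi_sparse.items())

def classify(phi_sparse, x, threshold):
    """Assumed rule: label as the positive class when the projection reaches T."""
    return sparse_project(phi_sparse, x) >= threshold

# Hypothetical 400-dimensional test vector and a discriminant with 3 non-zero entries.
x = [0.01 * k for k in range(400)]
phi_sparse = {5: 0.5, 120: -0.25, 399: 1.0}
score = sparse_project(phi_sparse, x)
is_positive = classify(phi_sparse, x, threshold=3.0)
```

With 3 stored entries the projection costs 3 multiplications and 2 additions regardless of the 400-dimensional input, in the same way that 11 non-zero components cost 11 multiplications and 10 additions.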
Generally speaking, a larger value of λ implies better class separability. The eigenvector corresponding to λ*F may be dense, i.e., computationally inefficient, but may deliver higher classification accuracy. On the other hand, the eigenvector corresponding to λ*Fr is sparse, i.e., computationally efficient, but delivers lower classification accuracy. Thus, embodiments of the algorithm facilitate a tradeoff between classification accuracy and computational efficiency. Sparsity is generally larger for lower values of r, and Applicant has accordingly performed tests (described herein) to facilitate identification of the appropriate value of r.
Table 2 indicates the optimal value of the sparse discriminant criterion, λ*Fr, and sparsity of the discriminant projection, φ*Fr, as a function of the SMT-model order, r as applied to a particular set of face vs. non-facial recognition data.
Table 3 provides a comparison of the classification performance of traditional Fisher-LDA vs. SMT-LDA on a face/non-face dataset. The face detection rate is kept fixed at 99%. Generally, there exists a tradeoff between the positive detection rate and false positives. Parameters producing a higher positive detection rate also tend to produce more false positives, and vice versa. Optimal face detection algorithm parameters find 95% or more of the faces in a given sequence of images, while preventing non-face regions from being labeled as face regions. The number of multiplications required to compute the dot product in equation (35), i.e., (φ*Fr)Tx, equals the number of non-zero components in the discriminant feature vector θc(φ*F) or φ*Fr.
These experimental results demonstrate that the proposed application of the Sparse Matrix Transform provides an excellent framework for controlling the trade-off between accuracy of classification and computational complexity of the classification algorithm.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
All of the processes described above may be embodied in, and fully automated via, software code modules executed by one or more general purpose or special purpose computers or processors. The code modules may be stored on any type of computer-readable medium or other computer storage device or collection of storage devices. Some or all of the methods may alternatively be embodied in specialized computer hardware.
All of the methods and tasks described herein may be performed and fully automated by a computer system. The computer system may, in some cases, include multiple distinct computers or computing devices (e.g., physical servers, workstations, storage arrays, etc.) that communicate and interoperate over a network to perform the described functions. Each such computing device typically includes a processor (or multiple processors or circuitry or collection of circuits, e.g. a module) that executes program instructions or modules stored in a memory or other non-transitory computer-readable storage medium. The various functions disclosed herein may be embodied in such program instructions, although some or all of the disclosed functions may alternatively be implemented in application-specific circuitry (e.g., ASICs or FPGAs) of the computer system. Where the computer system includes multiple computing devices, these devices may, but need not, be co-located. The results of the disclosed methods and tasks may be persistently stored by transforming physical storage devices, such as solid state memory chips and/or magnetic disks, into a different state.
This appendix demonstrates the claim made above that the solution to the constrained optimization problem in (29) can equivalently be found by solving the problem in (31).
From (29), since
$$(\varphi_F^{*r})^T (S_b - \lambda_F^{*r} S_t)\, \varphi_F^{*r} = 0 \qquad (36)$$
If we select a vector φ∈E0r arbitrarily, then
And since St is positive-definite:
$$\varphi^T (S_b - \lambda_F^{*r} S_t)\, \varphi \le 0 \qquad (38)$$
Combining (36) and (38) we have
Thus we have
$$\lambda = \lambda_F^{*r} \;\Rightarrow\; f_r(\lambda) = 0 \qquad (40)$$
Conversely, assume that λ0 is some constant in the interval [0, 1] for which
Also, for an arbitrary φ∈E0r, we have
$$\varphi^T (S_b - \lambda_0 S_t)\, \varphi \le 0 \qquad (45)$$
Combining (44) and (46),
Thus, we have
$$f_r(\lambda_0) = 0 \;\Rightarrow\; \lambda_0 = \lambda_F^{*r} \qquad (48)$$
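In this appendix, we demonstrate the claims made above that the function fr(λ) on λ∈[0,1] satisfies
$$\lambda < \lambda_F^{*r} \;\Leftrightarrow\; f_r(\lambda) > 0 \qquad (49)$$
$$\lambda > \lambda_F^{*r} \;\Leftrightarrow\; f_r(\lambda) < 0 \qquad (50)$$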
In this appendix, we prove only (49). Proof of (50) is similar. For an arbitrary λ in the closed interval [0, 1] and a fixed r, first assume
$$f_r(\lambda) = d_0^{r,\lambda} > 0 \qquad (51)$$
$$(e_0^{r,\lambda})^T (S_b - \lambda S_t)\, e_0^{r,\lambda} > 0 \qquad (52)$$
Since St is positive-definite, therefore
Thus we conclude
$$f_r(\lambda) > 0 \;\Rightarrow\; \lambda < \lambda_F^{*r} \qquad (54)$$
Now, conversely, suppose we arbitrarily select a λ∈[0, 1] such that λ<λ*Fr.
Since φ*Fr∈E0r, therefore
From above, we conclude
$$f_r(\lambda) > 0 \;\Leftarrow\; \lambda < \lambda_F^{*r}. \qquad (59)$$
This application claims the benefit under 35 U.S.C. Section 119(e) of co-pending U.S. Provisional Patent Application Ser. No. 61/415,228, filed on Nov. 18, 2010, by Hasib Siddiqui, entitled “Sparse Fisher Linear Discriminant Analysis” which application is incorporated by reference herein.
Number | Date | Country | |
---|---|---|---|
61415228 | Nov 2010 | US |