Discriminant feature extraction is an important topic in pattern recognition and classification. Current approaches to linear discriminant feature extraction include Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), with applications in pattern recognition and computer vision. These methods use a vector-based representation and compute scatters under a Euclidean metric; that is, they assume that the sample space is Euclidean, where a metric is a function that computes a distance or similarity between two points in a sample space.
Despite the utility of these subspace learning algorithms, relying on a Euclidean assumption about the data space when computing distances between samples has drawbacks, including a potentially singular within-class scatter matrix, limited available projection directions, and high computational cost. Additionally, these subspace learning algorithms are vector-based and arrange input data in vector form regardless of any inherent correlation among different dimensions of the data.
In one nonlinear approach, Linear Laplacian Discrimination (LLD), weights are introduced into the scatter matrices to overcome the Euclidean assumption; however, the weights are defined as a function of distance and therefore still rely on an a priori assumption about the metric of the sample space.
Accordingly, various embodiments for tensor linear Laplacian discrimination (TLLD) for feature extraction are described below in the Detailed Description. For example, one embodiment comprises generating a contextual distance based sample weight and class weight, calculating a within-class scatter using the at least one sample weight and a between-class scatter for multiple classes of data samples in a sample set using the class weight, performing a mode-k matrix unfolding on the scatters, and generating at least one orthogonal projection matrix.
This Summary is provided to introduce concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
In example system 100, computing device 110 includes an input, a memory 120, and a processor 115 in communication with the input and memory 120, and is configured to generate projection matrices 190. Computing device 110 further includes a computer program 130 and a weight generator module 140 to receive at least one data sample 107 from a set of data samples 105 and to generate a sample weight 142 based on a contextual distance for each of the data samples. Additionally, weight generator module 140 may generate a class weight 144 based on a contextual distance for a first class of data samples, as will be described in more detail below.
Computing device 110 may also include a scatter module 160 in communication with the weight generator module 140, the scatter module 160 being configured to receive the at least one sample weight 142 and a class weight 144, to calculate a within-class scatter 162 using the at least one sample weight 142, and to calculate a between-class scatter 164 for multiple classes of data samples using the class weight 144.
Computing device 110 may also have an unfolding module 150 coupled with the weight generator module 140 and the scatter module 160 and configured to generate one or more scatter matrices 155. The unfolding module 150 is configured to perform a mode-k matrix unfolding 152 on the within-class scatter 162 to generate a mode-k within-class scatter matrix, and to perform a mode-k matrix unfolding on the between-class scatter 164 to generate a mode-k between-class scatter matrix.
Computing device 110 may also include a projection matrix module 170 in communication with the weight generator module 140, the unfolding module 150, and the scatter module 160. The projection matrix module 170 may be used to generate at least one orthogonal projection matrix 172 using the mode-k within-class scatter matrix and the mode-k between-class scatter matrix, as described in the following description.
Some embodiments may use a Tensor Linear Laplacian Discrimination (TLLD) method for non-linear feature extraction from tensor data. TLLD is a non-linear feature extraction technique that utilizes the tensor nature of data, is relatively independent of any metric assumptions of the subject sample space, and simplifies parameter tuning. In the following paragraphs, definitions of some tensor operations are provided, and an embodiment formulation of TLLD is then described.
Tensors have several properties that may be used to advantage in feature extraction.
An illustration of an order-3 tensor's matrix unfolding 400 is shown in FIG. 4.
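By way of non-limiting illustration, a mode-k matrix unfolding may be sketched in Python (using NumPy) as follows. The function name is illustrative only, and the column ordering is one of several conventions found in the literature:

    import numpy as np

    def mode_k_unfold(tensor, k):
        # Move mode k to the front, then flatten the remaining modes into columns.
        return np.moveaxis(tensor, k, 0).reshape(tensor.shape[k], -1)

    # An order-3 tensor of size 4 x 5 x 6 and its three unfoldings.
    X = np.arange(4 * 5 * 6, dtype=float).reshape(4, 5, 6)
    print(mode_k_unfold(X, 0).shape)  # (4, 30)
    print(mode_k_unfold(X, 1).shape)  # (5, 24)
    print(mode_k_unfold(X, 2).shape)  # (6, 20)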
A TLLD discriminative feature extraction approach operates without unfolding tensors into vectors, and reduces the within-class variance and increases the between-class variance of the low-dimensional features after projection. In one example, let the samples in an order-n tensor representation be Xi, i=1,2, . . . ,N, where N is the number of samples. If si is the class label of Xi, Ns is the number of samples in the sth class, and the total number of classes is c, then a group of orthogonal projection matrices Uk ∈ Rmk×m′k, k=1,2, . . . ,n, is sought such that the projected features

Yi=Xi ×1 U1T ×2 U2T . . . ×n UnT, i=1,2, . . . ,N (1)

have minimal within-class variance and maximal between-class variance.
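By way of non-limiting illustration, the multilinear projection of equation (1) may be sketched in Python as follows; the function names are illustrative, and random data stands in for actual samples:

    import numpy as np

    def mode_k_product(T, M, k):
        # Mode-k product T x_k M: contract axis k of T with the rows of M.
        return np.moveaxis(np.tensordot(M, T, axes=(1, k)), 0, k)

    def project(X, Us):
        # Equation (1): Y = X x1 U1^T x2 U2^T ... xn Un^T.
        Y = X
        for k, U in enumerate(Us):
            Y = mode_k_product(Y, U.T, k)
        return Y

    # Example: project a 4 x 5 x 6 sample down to 2 x 3 x 2.
    X = np.random.rand(4, 5, 6)
    Us = [np.linalg.qr(np.random.rand(m, d))[0] for m, d in [(4, 2), (5, 3), (6, 2)]]
    print(project(X, Us).shape)  # (2, 3, 2)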
Therefore, a within-class scatter may be calculated according to the following formula:

α=Σi=1N wi ∥Yi−Ȳsi∥², (2)

where Ȳs denotes the centroid of the sth projected class, and wi is the weight for the ith sample. Similarly, a between-class scatter may be defined as:

β=Σs=1c ws ∥Ȳs−Ȳ∥², (3)

where ws is the weight for the sth class and Ȳ denotes the centroid of all projected samples. Some example approaches to calculate wi and ws are presented below. Next, orthogonal projection matrices Uk are calculated such that α is minimized and β is maximized. One approach may use Fisher's criterion, where

(U1,U2, . . . ,Un)=argmax β/α. (4)
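By way of non-limiting illustration, the weighted scatters of equations (2) and (3) may be computed as sketched below, assuming integer class labels that index the class-weight array (an illustrative convention):

    import numpy as np

    def scatters(Ys, labels, w_i, w_s):
        # alpha (eq. (2)) and beta (eq. (3)) of projected samples Ys, with
        # per-sample weights w_i and per-class weights w_s.
        Ys = np.asarray(Ys, float)       # shape (N, ...): projected tensors
        labels = np.asarray(labels)
        grand = Ys.mean(axis=0)          # centroid of all projections
        alpha = beta = 0.0
        for s in np.unique(labels):
            idx = np.flatnonzero(labels == s)
            cen = Ys[idx].mean(axis=0)   # centroid of the s-th projected class
            alpha += sum(w_i[i] * np.sum((Ys[i] - cen) ** 2) for i in idx)
            beta += w_s[s] * np.sum((cen - grand) ** 2)
        return alpha, beta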
However, it is non-trivial to solve equation (4) for all Ui (i=1,2, . . . ,n) at the same time. Some embodiments may use iterative methods to solve equation (4); for example, α may be reformulated using mode-k unfolding, according to the following formula:

α=Σi=1N wi ∥UkT(Zi−Z̄si)(k)∥F², (5)

where Zi=Xi ×1 U1T ×2 U2T . . . ×k−1 Uk−1T ×k+1 Uk+1T . . . ×n UnT, and (Zi−Z̄si)(k) denotes the mode-k unfolding of Zi−Z̄si, with Z̄s the centroid of the sth class in this partially projected representation.
Furthermore, β may be mode-k unfolded as follows:

β=Σs=1c ws ∥UkT(Z̄s−Z̄)(k)∥F², (6)

where Z̄ denotes the centroid of all samples in the partially projected representation.
In this way, a within-class scatter and a between-class scatter may be generated where:

α=tr(UkTSw(k)Uk), and β=tr(UkTSb(k)Uk), (7)

where the formula

Sw(k)=Σi=1N wi (Zi−Z̄si)(k)(Zi−Z̄si)(k)T

provides the mode-k within-class scatter matrix and

Sb(k)=Σs=1c ws (Z̄s−Z̄)(k)(Z̄s−Z̄)(k)T

is the mode-k between-class scatter matrix.
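By way of non-limiting illustration, the mode-k scatter matrices may be formed as sketched below, assuming the partially projected tensors Zi of equation (5) are stacked along a leading sample axis:

    import numpy as np

    def unfold(T, k):
        # Mode-k unfolding: an m_k x (product of other dims) matrix.
        return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

    def mode_k_scatter_matrices(Zs, labels, w_i, w_s, k):
        # Sw(k) and Sb(k) of equation (7), from partially projected tensors Zi.
        Zs = np.asarray(Zs, float)
        labels = np.asarray(labels)
        grand = Zs.mean(axis=0)
        mk = Zs.shape[k + 1]             # axis 0 of Zs indexes samples
        Sw, Sb = np.zeros((mk, mk)), np.zeros((mk, mk))
        for s in np.unique(labels):
            idx = np.flatnonzero(labels == s)
            cen = Zs[idx].mean(axis=0)
            for i in idx:
                D = unfold(Zs[i] - cen, k)
                Sw += w_i[i] * D @ D.T
            Db = unfold(cen - grand, k)
            Sb += w_s[s] * Db @ Db.T
        return Sw, Sb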
Then, Uk may be solved successively from the following equation,

Uk=argmaxUkTUk=I tr(UkTSb(k)Uk)/tr(UkTSw(k)Uk), (8)

by fixing the remaining Ui's to prepare Sb(k) and Sw(k), and repeating this procedure until convergence.
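One possible realization of a single update of equation (8) is sketched below. Taking the leading eigenvectors of pinv(Sw(k))Sb(k) and orthonormalizing them by QR is an assumed approximation to the trace-ratio problem, not the only solver the embodiments permit:

    import numpy as np

    def solve_Uk(Sw, Sb, d):
        # Leading eigenvectors of pinv(Sw) @ Sb, orthonormalized by QR so
        # that Uk^T Uk = I (an assumed trace-ratio approximation).
        vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
        cols = np.argsort(-vals.real)[:d]
        Q, _ = np.linalg.qr(vecs[:, cols].real)
        return Q

    # Outer alternation (schematic): rebuild the mode-k scatter matrices
    # from the current Ui (i != k), update Uk, and sweep over modes until
    # the projection matrices stop changing.
    # for sweep in range(max_sweeps):
    #     for k in range(n):
    #         Sw_k, Sb_k = ...  # e.g., via the mode-k scatter sketch above
    #         Us[k] = solve_Uk(Sw_k, Sb_k, dims[k])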
In some embodiments, weights wi and ws may be generated in the following forms:

wi=exp(−d(Xi,Ωsi)²/t), ws=exp(−d(Ωs,Ω)²/t), (9)

where d(·,·) is some distance, t is the time variable, and Ωs={Xi|si=s} and Ω={Xi|i=1,2, . . . ,N} are the sets of the sth class and of all samples, respectively.
In this way, weights may be defined based on the structure of the data, rather than on a Euclidean distance between data samples. In one embodiment, a contextual distance may be used and the weights calculated based on this contextual distance. A contextual distance may be defined on a contextual set X of nearest neighbors of a sample x. In this way, the contextual distance is related to the contribution of each sample to the structural integrity of the contextual set, which may be captured by a structural descriptor f that may be scalar- or vector-valued, as examples. Because the descriptor f(X) is an intrinsic structural characterization of the set X, if x complies with the structure of X, then removing x from X will have limited effect on the overall structure. In contrast, if x is an outlier or a noise sample, then removing x from X will likely change the structure significantly. Accordingly, the contribution of x to the structure of X may be measured by
δf=f(X)−f(X\{x}). (10)
Therefore, a distance from x to X may be defined as:
d(x,X)=∥δf∥=∥f(X)−f(X\{x})∥. (11)
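By way of non-limiting illustration, equations (10) and (11) may be sketched in Python as follows, using the centroid descriptor introduced below purely for illustration; note how the outlying sample yields a much larger contextual distance than the inliers:

    import numpy as np

    def contextual_distance(x, X, f):
        # d(x, X) = ||f(X) - f(X \ {x})||, equation (11).
        X_minus = [y for y in X if y is not x]
        return np.linalg.norm(np.asarray(f(X)) - np.asarray(f(X_minus)))

    # Centroid descriptor: removing an inlier barely moves f(X),
    # while removing an outlier moves it substantially.
    centroid = lambda S: np.mean(S, axis=0)
    X = [np.array([0.0, 0.0]), np.array([0.1, 0.0]),
         np.array([0.0, 0.1]), np.array([5.0, 5.0])]  # last sample is an outlier
    for x in X:
        print(contextual_distance(x, X, centroid))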
Therefore, the distances that form the weights in equation (9) may be defined according to:

d(Xi,Ωsi)=∥f(Ωsi)−f(Ωsi\{Xi})∥, d(Ωs,Ω)=∥f(Ω)−f(Ω\Ωs)∥. (12)
However, to utilize contextual distance based weights, an appropriate structural descriptor may be used. One option is a centroid descriptor, defined as

f(Ω)=(1/|Ω|)Σx∈Ω x,

where |Ω| is the cardinality of Ω. Another option is a coding length descriptor, f(Ω)=L(Ω), where L(Ω) is the minimal number of bits needed to encode the data in Ω, up to a tolerable distortion ε:

L(Ω)=((m+N)/2)log2 det(I+(m/(ε²N))X̃X̃T)+(m/2)log2(1+x̄Tx̄/ε²), (13)

where X=[x1,x2, . . . ,xN] is the data matrix of samples in Ω, with each sample represented by an m-dimensional vector, x̄=(1/N)Σi=1N xi is the mean sample, and X̃=[x1−x̄,x2−x̄, . . . ,xN−x̄] is the centered data matrix.
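By way of non-limiting illustration, the coding length of equation (13) may be computed as sketched below, following the lossy coding length formulation from the literature; the distortion value is a placeholder, and slogdet is used for numerical stability:

    import numpy as np

    def coding_length(X, eps):
        # Equation (13): bits needed to encode the columns of the m x N data
        # matrix X up to distortion eps.
        m, N = X.shape
        mean = X.mean(axis=1, keepdims=True)
        Xc = X - mean  # centered data matrix
        logdet = np.linalg.slogdet(np.eye(m) + (m / (eps**2 * N)) * Xc @ Xc.T)[1]
        return (m + N) / 2 * logdet / np.log(2) \
               + m / 2 * np.log2(1 + float(np.sum(mean**2)) / eps**2)

    X = np.random.rand(8, 20)  # 20 samples of dimension 8
    print(coding_length(X, eps=1.0))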
Unfortunately, these two descriptors are not particularly suitable for a TLLD approach: the centroid descriptor inherently assumes a Euclidean sample space, while the above formulation of coding length is vector-based. Therefore, to match the tensor nature of TLLD, a tensor coding length may be generated.
To generate a tensor coding length, each tensor is developed into vectors via its mode-k unfolding, and a mode-k coding length is then computed for the set of these vectors:

L(k)(X)=L({(X1)(k),(X2)(k), . . . ,(XN)(k)}), (14)

where X={X1,X2, . . . ,XN} and (Xi)(k) is the mode-k unfolding of Xi. Then, a tensor coding length of X may be defined as the following vector:

L(X)=[L(1)(X),L(2)(X), . . . ,L(n)(X)]T. (15)
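By way of non-limiting illustration, equations (14) and (15) may be sketched as follows. Concatenating the mode-k unfoldings column-wise, so that L operates on a set of mk-dimensional vectors, is one assumed interpretation of equation (14); the coding-length helper repeats the equation (13) sketch so the example is self-contained:

    import numpy as np

    def coding_length(X, eps):
        # Coding length of equation (13) for an m x N data matrix X.
        m, N = X.shape
        mean = X.mean(axis=1, keepdims=True)
        Xc = X - mean
        logdet = np.linalg.slogdet(np.eye(m) + (m / (eps**2 * N)) * Xc @ Xc.T)[1]
        return (m + N) / 2 * logdet / np.log(2) \
               + m / 2 * np.log2(1 + float(np.sum(mean**2)) / eps**2)

    def tensor_coding_length(Xs, eps):
        # Equations (14)-(15): one coding length per mode, stacked as a vector.
        out = []
        for k in range(Xs[0].ndim):
            D = np.concatenate(
                [np.moveaxis(X, k, 0).reshape(X.shape[k], -1) for X in Xs],
                axis=1)
            out.append(coding_length(D, eps))
        return np.array(out)

    Xs = [np.random.rand(4, 5, 6) for _ in range(10)]
    print(tensor_coding_length(Xs, eps=1.0))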
An example tensor coding length may be computed by substituting an empirically chosen tolerable distortion ε into equation (13).
Now the parameter t in equation (9) may be determined. In an LLD approach, this parameter may be difficult to tune, as its value may vary significantly across applications. However, in a TLLD approach as described herein, t can be rescaled as t=t′σw for wi and t=t′σb for ws, respectively, where σw and σb are scale factors derived from the sample and class distances (for example, the mean squared distances σw=(1/N)Σi=1N d(Xi,Ωsi)² and σb=(1/c)Σs=1c d(Ωs,Ω)²), and where an example t′ may be around 1. This treatment simplifies the parameter tuning for t. An embodiment of a TLLD method will next be described with reference to FIG. 2.
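By way of non-limiting illustration, the rescaled weight computation of equation (9) may be sketched as follows; using the mean squared contextual distance as the scale factor is an assumption consistent with the rescaling described above:

    import numpy as np

    def tlld_weights(d_samples, d_classes, t_prime=1.0):
        # Equation (9) with t rescaled as t = t' * sigma. The mean squared
        # contextual distance is an assumed choice of sigma.
        d_samples = np.asarray(d_samples, float)  # d(Xi, Omega_si) per sample
        d_classes = np.asarray(d_classes, float)  # d(Omega_s, Omega) per class
        w_i = np.exp(-d_samples**2 / (t_prime * np.mean(d_samples**2)))
        w_s = np.exp(-d_classes**2 / (t_prime * np.mean(d_classes**2)))
        return w_i, w_s

    w_i, w_s = tlld_weights([0.2, 0.1, 0.9, 0.3], [0.5, 1.4], t_prime=1.0)
    print(w_i, w_s)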
Method 200 also comprises calculating a within-class scatter for a class of data samples in a sample set, wherein the within-class scatter is calculated using the at least one sample weight from block 210, and calculating a between-class scatter for multiple classes of data samples using the class weight from block 210, as indicated in block 220.
Next, method 200 comprises performing a mode-k matrix unfolding on the within-class scatter to generate a mode-k within-class scatter matrix, and also performing a mode-k matrix unfolding on the between-class scatter to generate a mode-k between-class scatter matrix, as indicated in block 230.
Next, in block 240, method 200 comprises generating at least one orthogonal projection matrix using the mode-k within-class scatter matrix and the mode-k between-class scatter matrix. In some embodiments, method 200 further comprises generating a tensor coding length for each of the tensor based data samples.
It will be appreciated that the embodiments described herein may be implemented, for example, via computer-executable instructions or code, such as programs, stored on a computer-readable storage medium and executed by a computing device. Generally, programs include routines, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. As used herein, the term “program” may connote a single program or multiple programs acting in concert, and may be used to denote applications, services, or any other type or class of program. Likewise, the terms “computer” and “computing device” as used herein include any device that electronically executes one or more programs, including, but not limited to, personal computers, servers, laptop computers, hand-held devices, cellular phones, microprocessor-based programmable consumer electronics and/or appliances, and other computer image processing devices.
It will further be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated may be performed in the sequence illustrated, in other sequences, in parallel, or in some cases omitted. Likewise, the order of any of the above-described processes is not necessarily required to achieve the features and/or results of the embodiments described herein, but is provided for ease of illustration and description.
The subject matter of the present disclosure includes all novel and nonobvious combinations and subcombinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.