1. Technical Field
The present invention relates to data mining and, in particular, to methods and systems for extracting meaningful information from temporal event data.
2. Description of the Related Art
Temporal event mining is the process of identifying and extracting latent temporal patterns from large, complex event data. The goal of event mining is to derive insight that leads to useful information and true knowledge for an accurate understanding of the underlying event processes and relationships. It is the transformation from data to information to true knowledge that is of interest to business, the sciences, and government.
In this regard, finding optimal knowledge representations is important when abstract models serve as surrogates of the real world that must be understood by humans. Knowledge representation (KR) and reasoning is an area of artificial intelligence whose fundamental goal is to represent knowledge in a manner that facilitates the inference process from knowledge. An optimal knowledge representation should exhibit the following characteristics: i) minimalism, ii) interpretability, and iii) novelty. Further, the knowledge representation should be commensurate with human constraints and abilities, so that people can quickly absorb, understand, and make the most efficient use of complex event data. All attempts to produce such a knowledge representation to date have fallen short. One drawback of these attempts is that symbolic languages specify temporal knowledge a priori, which limits the flexibility of learning unknown patterns in temporal event data.
A method for event pattern mining includes representing longitudinal event data in a measurable geometric space as a temporal event matrix representation (TEMR) using spatial temporal shapes, wherein event data is organized into hierarchical categories of event type, and performing temporal event pattern mining with a processor by locating visual event patterns among the spatial temporal shapes of said TEMR using a constraint sparse coding framework.
A further method for event pattern mining includes mapping longitudinal event data from a probabilistic event space to a temporal event matrix representation (TEMR) in a measurable geometric space, wherein event data is organized into hierarchical categories of event type, performing temporal event pattern mining with a processor by locating visual event patterns among the spatial temporal shapes of said TEMR, using online non-negative matrix factorization to cluster and rank multiple entities into pattern groups, and refining the learned patterns using user feedback.
A system for event pattern mining includes an event matrix generator configured to represent longitudinal event data in a measurable geometric space as a temporal event matrix representation (TEMR) using spatial temporal shapes, wherein event data is organized into hierarchical categories of event type, and a temporal event data mining component configured to perform temporal event pattern mining with a processor by locating visual event patterns among the spatial temporal shapes of the TEMR using a constraint sparse coding framework.
These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
The disclosure will provide details in the following description of preferred embodiments with reference to the accompanying figures.
Temporal event mining derives insights from data that lead to useful information and true knowledge, allowing for a precise understanding of underlying event processes and relationships. A general framework for open-ended visual interactive temporal event pattern mining is presented herein, wherein the extracted latent temporal patterns and mining system are commensurate with human capabilities. In particular, the present principles produce interpretable patterns and low-dimensional visual representations of complex event data that can be quickly absorbed by human beings. The present temporal event mining includes an open-ended learning system that is able to incrementally learn the latent temporal patterns on large-scale data. Also, the mining framework accounts for missing values, data sparsity, multiple sources, mixed continuous/categorical variables, binary/multi/vectorial valued elements, and heterogeneous data domains.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Referring now to the drawings in which like numerals represent the same or similar elements and initially to
TEMR 106 allows the encoding of the temporal concepts of order, duration, coincidence, concurrency, synchronicity, and periodicity of time patterns such as events, periods, and trends. The use of geometric visual shape primitives provides a rich set of constructs to model, learn, and perform inference on the chosen representational space 108. Temporal operators for qualitative temporal reasoning have quantitative meaning in the measurable geometric space that allows temporal operators to be expressed in terms of geometric properties and distance metrics. These temporal operators may include, for example, before, after, close, equals, shortly after, and soon after. By using shape invariant metrics, semantic invariances can be modeled and introduced into the analysis. The chosen geometric representational space offers a wide set of tools which may be applied from the signal and image processing community.
In this way, the present principles provide a geometric knowledge representation 108 for temporal event mining. TEMR 106 maps the space-time dimensionality of a probabilistic event space 104 onto a measurable geometric space 108 by encoding event data as a structured spatial-temporal shape or point process. This projective mapping may be achieved by using a rich geometric visual symbol system forming a structured two-dimensional sparse matrix. This approach is exemplified herein with point processes assuming the simplest form of an event. Such point processes are modeled herein as binary random variables.
Referring now to
Sparsity may arise in two contexts. First, binary event realizations, where a ‘1’ indicates that an event happened and a ‘0’ indicates that an event did not happen, usually are sparse in the number of events that fill the space. TEMR 106 thus encodes missing events within the measurable geometric space 108. As a result, the majority of the sparse two-dimensional matrix elements will have ‘0’ entries and only a few elements will have non-zero entries. This is referred to as data sparsity. Second, latent temporal event patterns also exhibit a sparse structure. Latent temporal patterns, shown as a set of rows and columns within the sparse two-dimensional matrix or alternatively as a sub-region within the matrix, can happen at sparse time instances as well. A sparse representation offers efficient storage and also aids interpretability at the data level and the model level. Learning a sparse representation of TEMR 106 is also referred to as learning a latent factor model. Sparsity is particularly relevant to model interpretability. To exemplify this, consider
The constrained convolutional sparse coding framework learns minimalistic and interpretable latent temporal event patterns from single and multiple spatial-temporal point processes (STPP). Double sparsity (i.e., sparsity in the data as well as in the latent factor model) can be dealt with by using the beta-divergence. The beta-divergence is a parameterized family of cost functions that measures the difference between two probability distributions. The present principles use a special case of the beta-divergence, where β=0.5, because this parameterization optimally copes with binary data. Convolutional sparse population coding is further introduced below. This coding is able to learn multiple group-specific latent factors simultaneously. Stochastic optimization allows for large-scale incremental learning of multiple STPPs in a group.
Each event group itself contains a variable number of different event types. The horizontal lines in the Structured Event Category graph of
The fundamental unit of analysis is a single event from which a pair of events can be used to define an event interval or a set of ordered events to define an event sequence. Each event sequence belongs to a labeled category that is itself part of a hierarchical grouping 208. Multiple event sequences and hierarchies can be encoded with TEMR forming a multivariate representation, where the rows of the temporal event matrix encode event categories and the columns of the matrix the time domain. In what follows, such a multivariate event sequence may be viewed as a structured STPP.
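As a concrete illustration of this encoding, the following sketch builds a small TEMR as a sparse binary matrix whose rows index event categories and whose columns index discretized time. The category names, event tuples, and time resolution are hypothetical illustration data, not part of the present disclosure.

```python
import numpy as np
from scipy.sparse import lil_matrix

# Hypothetical leaf categories of the event-type hierarchy.
categories = ["lab_test", "medication", "diagnosis"]
cat_index = {c: i for i, c in enumerate(categories)}

# (category, time bin) pairs from one entity's longitudinal record.
events = [("lab_test", 0), ("medication", 2), ("lab_test", 5), ("diagnosis", 5)]

T = 10  # number of time bins
temr = lil_matrix((len(categories), T), dtype=np.int8)
for cat, t in events:
    temr[cat_index[cat], t] = 1  # a binary point-process realization

print(temr.toarray())  # rows: event categories, columns: time
```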
The term sparse coding refers to the modeling of data x as a sparse linear combination of basis atoms Φ = [φ_1, . . . , φ_k], with φ_i ∈ R^{m×1}, and their weighting or activation coefficients α = [α_1, . . . , α_k]^T. Consider a sparse linear data model with additive noise n ∈ R^{m×1} of the following form:

x = Φα + n.  (1)
The input data x ∈ R^m permits a sparse approximation over a basis set Φ ∈ R^{m×k}, with k atoms, determined by the weighting coefficients α ∈ R^k, where one can find a linear combination of a sparse set of basis atoms from Φ such that it closely approximates the input data.
The sparse decomposition problem can be formulated as the sum of a loss function L(•) and a regularization term φ(•):

min_α L(x, Φ, α) + λφ(α),  (2)
where λ denotes the sparsity-inducing weighting factor for the regularizer and ∥•∥_F denotes the Frobenius norm. By choosing φ(•) to be the l1-norm (e.g., φ(α) = ∥α∥_1), equations 2-4 may also be interpreted as a basis-pursuit or lasso problem. To obtain a sparse representation, the l1-norm regularizer is convex and serves as a proxy for the NP-hard l0-norm. The objective in the above equations has no analytic closed-form solution, such that iterative approaches are used. These iterative approaches may include greedy algorithms, homotopy, soft-thresholding, active set, or reweighted l2 methods. These methods are listed for the sake of example and are not intended to be limiting.
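As a hedged illustration of one of the iterative approaches just named, the following sketch applies iterative soft-thresholding (ISTA) to the l1-regularized objective; the step size and iteration count are illustrative assumptions.

```python
import numpy as np

def ista(x, Phi, lam, n_iter=200):
    """Iterative soft-thresholding for min_a 0.5*||x - Phi a||^2 + lam*||a||_1."""
    L = np.linalg.norm(Phi, 2) ** 2  # Lipschitz constant of the quadratic loss gradient
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - x)  # gradient of the quadratic loss
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return a
```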
Non-negative matrix factorization (NMF) arises as an extension of large-scale non-negative least squares. NMF finds a "parts-based" or "sum-of-parts" linear representation of non-negative data. Given a non-negative data matrix X = [x_1, . . . , x_n] ∈ R_+^{m×n} and a positive integer r < min(m, n), NMF computes a low-rank approximation:

X ≈ WH, i.e., X_ij ≈ Σ_{k=1}^{r} W_ik H_kj,
with W being a factor matrix, H being a scores matrix, and r being a low-rank factor. Usually, r ≪ min(m, n), such that (n+m)r < nm, and i, j, and k are matrix indices, where i and j refer to a matrix row and column respectively. The above equation decomposes a non-negative matrix into a product of two lower-rank non-negative matrices. The r columns of W ∈ R_+^{m×r} are basis vectors and each row of H ∈ R_+^{r×n} is a linear encoding representing the mixing coefficients or "weights" for each basis.
In vector notation, each of the n columns of X = [x_1, . . . , x_n] ∈ R_+^{m×n} is a linear combination of the columns of W weighted by the row coefficients of H:

x_j ≈ Σ_{i=1}^{r} w_{·i} h_{ij},  (6)
where w_{·i} is the i-th column vector of W. The conventional solution approach to eq. (6) is to solve the following constrained optimization problem:

min_{W,H} ∥X − WH∥_F^2  (7)
subject to W ≥ 0, H ≥ 0,  (8)
where ∥•∥_F is the matrix Frobenius norm. Various strategies may be used to minimize eqs (7) and (8), such as alternating least-squares, multiplicative update rules, projected gradient descent algorithms, and hybrid derivatives. These strategies are presented for the sake of example and are not intended to be limiting.
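A minimal sketch of the multiplicative update strategy named above, in its standard Lee-Seung form for the Frobenius objective of eqs (7) and (8); the rank, iteration count, and random initialization are illustrative assumptions.

```python
import numpy as np

def nmf(X, r, n_iter=500, eps=1e-9):
    """Multiplicative updates for min ||X - WH||_F^2 with W, H >= 0."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, r)), rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # eps prevents division by zero
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```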
The NMF framework may incorporate sparsity by using a regularizer, as above in eq (3), which enforces sparsity in the coefficient matrix H. Non-negative sparse coding of a non-negative data matrix X ∈ R_+^{m×n} may be implemented by formulating the optimization problem:

min_{W,H} ∥X − WH∥_F^2 + λ Σ_{ij} H_ij   subject to   W ≥ 0, H ≥ 0, ∥w_{·i}∥_2 = 1 ∀i,
where the last constraint is a scaling constraint enforced on W to prevent trivial solutions of the factorization due to the sparsity-inducing regularizer. A multiplicative update rule may be used for H and a projected gradient descent may be used to update W, as sketched below.
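The update equations themselves are not reproduced here; the following is a hedged sketch of the forms such rules commonly take: a multiplicative rule for H with the l1 weight lam in the denominator, and a projected gradient step for W followed by column renormalization to enforce the scaling constraint. The learning rate eta is an illustrative assumption.

```python
import numpy as np

def sparse_nmf_step(X, W, H, lam, eta=1e-3, eps=1e-9):
    H *= (W.T @ X) / (W.T @ W @ H + lam + eps)  # sparsity-penalized multiplicative update
    W -= eta * (W @ H - X) @ H.T                # gradient step on W
    W = np.maximum(W, 0.0)                      # project onto the non-negative orthant
    W /= np.linalg.norm(W, axis=0, keepdims=True) + eps  # unit-norm columns (scaling constraint)
    return W, H
```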
A transformation-invariant form includes introducing a linear transformation operator (or "shift" operator) T into the NMF objective. Transformation invariance can be incorporated into the sparse coding formulation described above by performing a stochastic optimization within a convolutional sparse coding framework to support large-scale factorization of single and multiple STPPs for common group learning. When β=0.5, the parameterized beta-divergence achieves superior performance and can be extended to allow factorization of a common dictionary for group learning.
By using online NMF (ONMF), latent factors may be automatically updated by combining old factors with newly acquired data. At the same time, ONMF can discover the connections between old and new factors, allowing the evolutionary patterns of latent factors to be tracked naturally. In addition, when the available data is incomplete, latent factors produced by NMF may be incorrect. To prevent the partial-data problem in ONMF, a set of orthogonal constraints is imposed on all of the latent factors. Additionally, an orthogonal NMF algorithm according to the present principles guarantees the uniqueness of the NMF decomposition, which helps in tracking latent factors. The present approach incorporates stochastic optimization within a convolutional sparse coding framework by employing a special case of the parameterized beta-divergence. The update equations, which are discussed in greater detail below, employ multiplicative update rules.
The temporal event matrix representation (TEMR) 106 provided by the present principles and shown in
Temporal event pattern mining uses an efficient knowledge representation to facilitate intelligent reasoning and inference for knowledge discovery. An optimal knowledge representation should exhibit the following characteristics: i) flexibility, ii) minimalism, iii) interpretability, and iv) novelty. The knowledge representation should be commensurate with human capabilities, such that information can be quickly absorbed and understood to make the most efficient use of complex event data. Flexibility allows the knowledge representation to be used in many different application contexts. Minimalism allows for large-scale analysis and for environments with limited resources. Interpretability is important for an improved understanding of the representation and its underlying model as well as the ability to adapt and change. Novelty allows the representation to discover knowledge through the extracted patterns.
TEMR 106 comprises a set of geometric shape primitives that symbolize event data in a two-dimensional sparse matrix representation. As noted above with respect to
As noted above with respect to
More formally, let E denote the event space, which resides in a compact subset S of R^n. A point process on S is a measurable map ξ: Ω → R from the probability space (Ω, F, P) to the measurable space (R, B), where Ω is the sample space, F a σ-algebra, and P the probability measure. Stochastic point processes (SPPs) can describe gaps, aggregation, and inhibition in spatial data. Every point process ξ can be represented as

ξ = Σ_{i=1}^{N} δ_{X_i},

where δ denotes the Dirac measure, N is an integer-valued random variable, and X_i are random elements of S. We consider a stack of multiple point processes to form a measurable map ξ: Ω → R^{c×t}, where c indexes the event category and t indexes the time domain.
This leads to STPPs, which are random collections of points, wherein each point represents the time and location of an event. In comparison to a pure SPP, the temporal aspect of an STPP induces a natural point ordering that does not generally exist for SPPs. An STPP ξ may be mathematically defined as a random measure on a region S ⊂ R × R^3 of space-time, taking values in the non-negative integers Z_+.
Only binary random variables X ∈ {0, 1} are considered herein. However, this is intended only for the purpose of illustration. Alternative variables, such as multi-valued random variables X ∈ {0, . . . , K} or vectorial representations, may also be employed in accordance with the present principles. One can generalize an STPP to higher-order primitives such as spatial line or shape processes. TEMR 106 allows one to incorporate such stochastic shape processes to provide a rich geometric symbol set of visual variables to encode different event entities and temporal concepts.
An event sequence ε_{p,q} = (e_{t=1}, e_{t=2}, . . . , e_{t=T}) may be defined as a univariate STPP ξ on R, characterizing a sequence of random variables X_i. Each event e_t is an element of the realization of ξ. The indices p and q index the event group level and event type level respectively (where such group and event type levels are illustrated above in
Multiple STPPs {ξ_i}_{i=1}^{n} form a three-way sparse tensor of the form R^{c×t×n}. An alternative representation of a group of STPPs is a tensor-unfolded view of the form R^{c×(t·n+(n+1)·w)}, where multiple spatial-temporal point processes ξ_i are concatenated along the time domain. Multiple bi-, tri-, or multi-variate STPPs ξ_i form the group ξ on R^{c×t×n}, where c is the event type domain, t is the time domain, and n is the sample cardinality of ξ. Multiple groups ξ^{(i)} form a population Ξ = {ξ^{(i)}}_{i=1}^{g} of STPPs.
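As a small illustration of these group representations, the following sketch stacks n binary TEMR realizations into a c×t×n tensor and also forms the time-concatenated unfolded view (ignoring the spacer width w for simplicity); all shapes and the event density are illustrative assumptions.

```python
import numpy as np

c, t, n = 4, 20, 3
rng = np.random.default_rng(1)
# n sparse binary TEMR realizations, each of size c x t.
stpps = [(rng.random((c, t)) < 0.1).astype(np.int8) for _ in range(n)]

tensor = np.stack(stpps, axis=2)          # three-way tensor in R^{c x t x n}
unfolded = np.concatenate(stpps, axis=1)  # unfolded view in R^{c x (t*n)}
print(tensor.shape, unfolded.shape)       # (4, 20, 3) (4, 60)
```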
In order to adapt TEMR to particular instances of ξi, ξ, or ξ, the matrix factorization framework discussed above is used to learn a minimalistic sparse representation suitable for temporal event pattern mining. In particular, the following temporal patterns are relevant: i) structured high activity patterns, ii) repeating patterns, iii) pattern trends, and iv) similar patterns in a group or population.
This may be formulated as finding a function ƒ: X → Y that maps an input space to an output space. Here X refers to the input space of ξ_i ∈ S = {ξ_1, . . . , ξ_n} and Y refers to a hypothetical label space inducing a multi-pathway partitioning on X. Let D = {(ξ_i, ŷ_i)}_{i=1}^{n} be a set of pairs of spatial-temporal point processes ξ_i ∈ R^{c×t} and partition encodings ŷ_i ∈ Z^k, where ŷ_i is unknown.
First, a minimalistic efficient sparse representation Θ = {Φ, α} of ξ_i is generated that captures the latent structure for individual and group-based event realizations in S.
Since n is large, a stochastic optimization scheme is used, where data samples arrive in a sequential manner and Θ is adaptively learned from ξ_i incrementally:
Θ_{t+1} = Θ_t + O(ξ_i, Θ, α).
Given D, ŷ is found such that the learned approximation to ƒ groups similar entities to the same hypothetical label class of ŷ. Given an instance of ξ_i ∈ S, a ranking can be found that returns the instances ξ_{j≠i} most similar to ξ_i.
A dynamic data model is used to learn ξ. The static linear data model in eq (1) does not account for the temporal aspect encoded in ξ. A dynamic linear data model is used having the form:

ξ = Φ * α + n,

where ξ ∈ R^{n×t}, Φ ∈ R^{u×v×k}, α ∈ R^{k×t}, * denotes a shift-invariant convolutional operator, M denotes the shift dimension, and r denotes the shift index.
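As a hedged sketch of this shift-invariant model in its time-convolutional form, the following reconstructs an approximation as a sum over shift indices of a dictionary slice applied to a right-shifted, zero-filled activation matrix; the tensor layout is an illustrative assumption.

```python
import numpy as np

def conv_reconstruct(W, H):
    """W: (c, k, M) dictionary, H: (k, t) activations -> (c, t) approximation."""
    c, k, M = W.shape
    X_hat = np.zeros((c, H.shape[1]))
    for r in range(M):
        H_shift = np.roll(H, r, axis=1)
        H_shift[:, :r] = 0.0           # zero-fill rather than wrap around
        X_hat += W[:, :, r] @ H_shift  # accumulate the r-shifted contribution
    return X_hat
```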
Referring now to
A shift-invariant dictionary Φ 304 may be found given a realization of ξ, as well as an associated sparse code α 306 of Φ 304. The emphasis is on learning an efficient minimalistic representation Θ = {Φ, α} of ξ 302 such that α 306 exhibits a sparse structure. Learning Θ can be achieved by coupling an approximation error objective with a sparseness constraint on the weighting coefficients, where the data model exhibits a convolutional form:

min_{Φ,α} ∥ξ − Φ * α∥_F^2 + λ∥α∥_1.  (21)
Eq (21) is also known as the shift-invariant or convolutional sparse coding problem. Assuming a sparse data matrix X ∈ R^{c×t} as a realization of ξ 302, one can rewrite eq (21) into a convolutional sparse matrix factorization problem:

min_{W,H} ∥X − W * H∥_F^2 + λƒ(H)  (22)
subject to W ≥ 0, H ≥ 0,  (23)
where ∥•∥_F denotes the Frobenius norm, * denotes the 2D convolutional operator, and ƒ denotes a sparseness measure on the coefficient matrix H (e.g., maximum absolute column sum norm, spectral norm, or maximum absolute row sum norm). Instead of ∥•∥_F, other loss functions may be defined, such as the alpha-divergence, beta-divergence, or gamma-divergence. The beta-divergence will be discussed herein, but this is intended for the purpose of illustration and is not meant to be limiting.
The beta-divergence for β ∈ R\{0, 1} is defined as

d_β(X, X̃) = Σ_{ij} ( X_ij^β + (β−1) X̃_ij^β − β X_ij X̃_ij^{β−1} ) / ( β(β−1) ),
where X̃ = W * H. Special cases of the beta-divergence are the generalized Kullback-Leibler divergence (KLD, β→1), the Itakura-Saito divergence (ISD, β→0), and the Euclidean distance (ED, β=2).
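A minimal sketch of this family and its named special cases follows; the element-wise form and the small eps guard against zeros are standard, with the limiting cases handled explicitly.

```python
import numpy as np

def beta_divergence(X, Xh, beta, eps=1e-12):
    X, Xh = X + eps, Xh + eps  # guard zeros in logs and divisions
    if beta == 1:              # generalized Kullback-Leibler divergence
        return np.sum(X * np.log(X / Xh) - X + Xh)
    if beta == 0:              # Itakura-Saito divergence
        return np.sum(X / Xh - np.log(X / Xh) - 1)
    if beta == 2:              # (half) squared Euclidean distance
        return 0.5 * np.sum((X - Xh) ** 2)
    return np.sum((X**beta + (beta - 1) * Xh**beta
                   - beta * X * Xh**(beta - 1)) / (beta * (beta - 1)))
```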
The scale-invariant property

d_{β=0}(γX, γX̃) = d_{β=0}(X, X̃), ∀γ > 0,  (28)
of the ISD as the limiting case of the beta-divergence (β=0) allows one to find basis components of differing complexity, whereas the ED and KLD penalize basis components of low energy. To account for double sparsity (i.e., sparsity in the data as well as in the latent factor model), one may regularize the divergence by inducing a double sparsity constraint on W and H. The double sparsity constraint further allows one to build an over-complete basis set, where the factorization rank k can be defined such that k is larger than the number of actual basis elements in the data. The over-complete basis representation addresses the rank selection problem, where irrelevant basis elements are squashed to zero and only the few basis elements that are supported in the data are retained. The generalized form of eqs 22 and 23 for the beta-divergence is

min_{W,H ≥ 0} d_β(X, W * H) + λ_1 φ_1(W) + λ_2 φ_2(H).  (29)-(30)
The joint objective of eqs 29 and 30 is non-convex overall, but convex with respect to W and H individually. For individual patterns, the objective function of equations 29 and 30 comprises a loss function that measures the approximation error of the factorization and a regularizer φ on the individual temporal patterns W and the individual activation codes H. λ_1 and λ_2 are weighting factors that influence the strength of the regularizers and trade off the approximation error against the induced constraints. The problem uses an alternating optimization, such as block coordinate descent, where each factor is optimized in an alternating fashion. By setting φ_i to be ∥•∥_2 or ∥•∥_1, smoothness or sparsity can be enforced in trade-off with the approximation error of the factorization. Update rules for the doubly sparse convolutional decomposition model can be obtained by taking the gradient of L with respect to factors W and H, diagonally rescaling the gradient with an appropriate learning rate η:
W = W − η_W ∇_W L  (31)
H = H − η_H ∇_H L  (32)
where η_W and η_H are learning rates that define the step size of the alternating gradient descent walk. For clarity, the non-convolutional forms of the update equations are described first. The update rules for a static linear data model of eq (12) with the Frobenius loss can be derived, resulting in:

H ← H ∘ (W^T X) / (W^T WH + ε)
W ← W ∘ (XH^T) / (WHH^T + ε),

where '∘' (the Hadamard or Schur product) and '/' are element-wise operators and ε is a small constant that is added to the denominator to prevent division by zero. The additional regularizers φ_1 and φ_2 require a scaling constraint on the factors W and H to prevent trivial solutions in the factorization. Normalization constraints on both factors can be enforced to balance the terms of the objective function: W uses unit-norm bases, while H uses unit-norm coefficients. The update rules for equations 26 and 27 are

H ← H ∘ (W^T X) / (W^T WH + λ_2 + ε)
W ← W ∘ (XH^T) / (WHH^T + λ_1 + ε).

These equations are particular instances of the general beta-divergence, where the generalized update equations take the form

H ← H ∘ [W^T (X ∘ (WH)^{β−2})] / [W^T (WH)^{β−1} + λ_2 + ε]  (39)
W ← W ∘ [(X ∘ (WH)^{β−2}) H^T] / [(WH)^{β−1} H^T + λ_1 + ε].  (40)
The convolutional form of eqs 39 and 40 can be interpreted as finding T different factorizations {W_t}_{t=1}^{T} and {H_t}_{t=1}^{T}. Here T denotes the size of the time window that is used to capture the dynamic structure of the spatial-temporal process ξ. The convolutional form of eqs 39 and 40 may be written in the form of a zero-one shift matrix.
Shifting matrices are zero-one matrices that have useful properties for the analysis of time series models. A shifting matrix S is an n×n matrix of the form

S_1 =
[0 0 … 0 0]
[1 0 … 0 0]
[0 1 … 0 0]
[⋮ ⋮ ⋱ ⋮ ⋮]
[0 0 … 1 0],  (41)

where S_1 is the first associated shifting matrix out of n shifting matrices. S_1 has off-diagonal entries of value one and zeros elsewhere. The pre- or post-multiplication of a matrix H with S_1 shifts the entries of H by one row or column respectively. The operator fills up the empty space that is created due to the shift with zero values. Shifting can be performed in a bi-directional manner. To right- and left-shift H by n shifts, one can use:
H_{n→} = H S_n^T  (42)
H_{←n} = H S_n.  (43)
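A small sketch of these shifting matrices follows; S_n is built with ones on its n-th lower off-diagonal, an assumed convention chosen so that eqs (42) and (43) produce right and left column shifts with zero-fill.

```python
import numpy as np

def shift_matrix(size, n):
    """S_n: ones on the n-th lower off-diagonal, zeros elsewhere."""
    return np.eye(size, k=-n)

H = np.arange(12.0).reshape(3, 4)
S1 = shift_matrix(4, 1)
H_right = H @ S1.T  # eq (42): columns shifted right by one, zero-filled
H_left = H @ S1     # eq (43): columns shifted left by one, zero-filled
print(H_right)
print(H_left)
```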
Thus, in convolutional form, equations 39 and 40 can be written as:
where WW^T results from the normalization constraint W_normalized = W/∥W∥_F. The complete tensor W may be normalized by its l2-norm, the Frobenius norm. The additional regularizers φ_1 and φ_2 require a scaling constraint on the factors W and H to prevent trivial solutions in the factorization. In general, normalization constraints on both factors can be enforced to balance the terms of the objective function. W needs to have unit-norm bases. This normalization constraint is sufficient for the standard sparse coding framework, but insufficient for coping with the double sparsity problem. This factorization allows the learning of an efficient minimalistic representation of ξ 302.
Whereas one could now use the latent factor model to perform group-based analysis using a clustering scheme, the non-convex and non-unique objective of equations (29) and (30) poses problems regarding the reproducibility of the learned representation. In this regard, our goal is to learn the hidden group structure jointly, given the group ξ or the population Ξ.
Multiple spatial-temporal point processes form a 3-way tensor where individual realizations of ξ_i are stacked on top of each other. To learn a group of multiple spatial point processes ξ = {ξ_i}_{i=1}^{n} ∈ R^{c×t×n}, several different strategies may be employed. Referring now to
A learning process for a group of STPPs may be of the form:

min_{W, H^{(n)} ≥ 0} Σ_n d_β(X^{(n)}, W * H^{(n)}) + λ_1 φ_1(W) + λ_2 Σ_n φ_2(H^{(n)}),  (48)-(49)
where X^{(n)} ∈ R^{c×t×n}, W ∈ R_+^{u×v×k}, and H^{(n)} ∈ R_+^{k×t×n}. In this setting, an efficient minimalistic group representation of ξ 402 arises that is able to account for variable-length STPPs ξ_i. Using the multiple TEMRs 302 in ξ 402, the objective function of equations (48) and (49) uses a W that includes population temporal patterns and an H^{(i)} that includes population temporal codes. The update equations for learning W and H^{(i)} are
A^{(i)} = (W H^{(i)} S_t^T)^{β−1}  (50)
B^{(i)} = X^{(i)} ∘ (W H^{(i)} S_t^T)^{β−2},  (51)
with the partial derivative being
For H^{(i)}:
A stochastic gradient descent scheme is used to learn the group structure of ξ 402 in an incremental fashion. Exemplary methods are shown herein, though it is contemplated that other learning schemes might be used. Referring now to
Block 520 begins a new loop for the W matrices, initializing an index t_W=1. Block 522 updates W according to equation (54). Block 524 normalizes W again for all k elements. Block 526 again computes L(X^{(n)}, W, H^{(n)}). Block 528 checks for convergence. If convergence is found, the loop breaks and processing moves on to block 534. If not, the loop checks whether t_W has reached T_W. If so, the loop ends and processing moves to block 534. If not, block 532 increments t_W and returns processing to block 522. Block 534 determines whether the overall index t has reached T. If not, block 536 increments t and processing returns to block 504. If so, block 538 returns the group structure Θ.
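A hedged, pseudocode-level sketch of the inner W loop just described follows; the update and loss callbacks, tolerance, and iteration cap T_W stand in for the flowchart's equation (54) and convergence test, and are illustrative assumptions.

```python
import numpy as np

def update_W_loop(X, W, H, loss_fn, update_fn, T_W=50, tol=1e-6, eps=1e-12):
    prev = loss_fn(X, W, H)
    for _ in range(T_W):                            # blocks 520, 530, 532
        W = update_fn(X, W, H)                      # block 522: update W
        W = W / (np.linalg.norm(W) + eps)           # block 524: renormalize W
        cur = loss_fn(X, W, H)                      # block 526: recompute the loss
        if abs(prev - cur) < tol * max(prev, eps):  # block 528: converged?
            break
        prev = cur
    return W
```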
Referring now to
Referring now to
Having described preferred embodiments of a system and method for mining temporal patterns in longitudinal event data using discrete event matrices and sparse coding (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments disclosed which are within the scope of the invention as outlined by the appended claims. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.