The efficient processing and monitoring of large amounts of data for anomalies, associations, and clusters is becoming increasingly important as governments, businesses, entities and individuals store and/or require access to growing amounts of data.
This data is often stored in databases. Effectively monitoring data for anomalies, associations and clusters has numerous applications. Examples of such applications include network intrusion detection, credit card fraud, calling card fraud, insurance claim and accounting inefficiencies or fraud, electronic auction fraud, cargo shipment faults, and many others. In addition to revealing suspicious, illegal or fraudulent behavior, anomaly detection is useful for spotting rare events, as well as for the vital task of data cleansing or filtering.
Traditional approaches to anomaly, association and clustering detection have focused on numerical databases, while approaches for categorical databases are few. Typically, numerical databases can be converted into categorical form, but categorical databases are often difficult and expensive to convert into numerical form.
Embodiments of the invention provide techniques for dynamic anomaly, association and clustering detection.
For example, in one embodiment, a method comprises the following steps. At least one code table is built for each attribute in a set of data containing one or more attributes. One or more clusters associated with one or more of the code tables are established. One or more new data points are received. A determination is made as to whether a given one of the new data points is an anomaly. At least one of the one or more code tables is updated responsive to the determination. At least one of the building, establishing, receiving, determining and updating steps is performed by a processor device.
Further embodiments of the invention comprise one or more of the following features.
The determining step comprises estimating a threshold compression cost for each of the one or more clusters, calculating the compression cost of the given one of the new data points for each of the clusters, and comparing the compression cost of the given one of the new data points with the threshold compression cost for each of the one or more clusters. When the compression cost of the given one of the new data points is greater than the threshold compression cost for each of the one or more clusters, the given one of the new data points is an anomaly.
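By way of a non-limiting illustration, the following Python sketch shows the comparison described above: a new data point is flagged as an anomaly only when its compression cost exceeds the estimated threshold cost for every cluster. The function name, inputs and sample values are hypothetical.

```python
import numpy as np

def is_anomaly(point_costs, cluster_thresholds):
    """Flag a new data point as anomalous when its compression cost exceeds
    the estimated threshold compression cost for every cluster.

    point_costs[i]        -- compression cost (in bits) of the new point under cluster i
    cluster_thresholds[i] -- estimated threshold compression cost for cluster i
    (Illustrative inputs; how these costs are obtained is described later.)
    """
    point_costs = np.asarray(point_costs, dtype=float)
    cluster_thresholds = np.asarray(cluster_thresholds, dtype=float)
    # Anomaly only if the point compresses worse than the threshold in every cluster.
    return bool(np.all(point_costs > cluster_thresholds))

# Example: the point fits the second cluster well enough, so it is not an anomaly.
print(is_anomaly([42.0, 17.5, 38.9], [30.0, 20.0, 25.0]))  # -> False
```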
The method further comprises the step of tracking the detection of anomalies over a period of time. The determining step is based at least in part on the tracked detection of anomalies.
The step of establishing one or more clusters comprises creating a matrix of data points and code words for each of the one or more code tables and inferring at least one data cluster from at least one of the matrices.
Advantageously, one or more embodiments of the invention allow for efficient dynamic anomaly, association and clustering detection in databases using dictionary based compression.
These and other embodiments of the invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
Illustrative embodiments of the invention may be described herein in the context of an illustrative method of dynamic anomaly, association and/or clustering detection in a database. However, it is to be understood that embodiments of the invention are not limited to the illustrative databases or methods described, but are more broadly applicable to other suitable methods, databases and data storage systems.
Embodiments of the invention address the problem of dynamic anomaly detection in categorical databases using dictionary based compression. One or more embodiments of the invention use compression as an efficient way to spot anomalies, association and clustering. The norm of the data in a database defines the patterns that compress the database well, and thus any data point that cannot be compressed well can be defined as abnormal (i.e., anomalous, extreme, rare, interesting, suspicious, outlier, etc.).
One or more embodiments of the invention may be implemented in a method, referred to as Multi-Krimp in this description. The Multi-Krimp technique uses a collection of dictionaries to encode a given database. Dictionaries may also be referred to as code tables in this description. Multi-Krimp exploits correlations between the features in a database, groups the features that have high information gain together, and builds a dictionary for each group of features. The dictionaries capture the frequent patterns in a given database, and the higher the frequency of a pattern, the shorter its encoding length becomes. Multi-Krimp finds the optimal set of dictionaries that yields the minimum total encoding (compression) cost in bits.
One key feature of the Multi-Krimp approach is that it is parameter free; it employs the Minimum Description Length (MDL) principle to handle the trade-off between the savings in bits from encoding features in groups and the overhead in bits from having a possible larger dictionary for a group of features. Therefore, the number of groups as well as the assignment of features to groups is decided automatically.
MDL is a model selection criterion based on lossless compression principles. More specifically, given a set of models ℳ, MDL selects the best (MDL-optimal) model M ∈ ℳ which minimizes:
L(M)+L(D|M), (1)
in which L(M) is the length in bits of the description of model M, and L(D|M) is the length of the description of the data, encoded by M. Therefore, the MDL-optimal compressor for a database D encodes D most succinctly among all possible compressors.
In order to use the MDL principle in the Multi-Krimp approach, it is necessary to define the collection of models and how to encode the data with a model and encode the model itself. The Multi-Krimp approach takes a dictionary, or look-up/code table, based compression approach to encode a given database.
The following is a description of how to encode a database using a single code table. For a code table CT, feature sets are ordered by length and support. The support of a feature set s in a database D is simply |{d ∈ D : s ⊆ d}|. The length of the code word of a feature set depends on the database that is compressed: the more often a code word is used, the shorter its length should be. This is illustrated by the example code tables 203 in the accompanying drawings.
Given the usages of the feature sets in a code table, the lengths of the code words can be computed using the Shannon entropy from information theory. The Shannon entropy gives the optimal length for the prefix code word of a feature set s as L(code(s)) = −log(usage(s)/Σ usage(s′)), where the sum runs over all feature sets s′ in CT and the logarithm is base 2 so that lengths are measured in bits.
The compression cost of the encoding of a data point is simply the sum of the code lengths of the feature sets in its cover, that is, L(d|CT) = Σ L(code(s)), where the sum is over the feature sets s in the cover of d.
The total length in bits of the encoded database is then the sum of the lengths of the encoded data points.
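As a non-limiting sketch of the encoding costs just described, the following Python fragment computes Shannon-optimal code lengths from usage counts, the cost of a single data point from its cover, and the total cost of an encoded database. The toy code table and the helper names are illustrative assumptions.

```python
import math

def code_lengths(usages):
    """Optimal prefix-code length (in bits) for each feature set, given its usage count:
    L(code(s)) = -log2(usage(s) / total_usage)."""
    total = float(sum(usages.values()))
    return {s: -math.log2(u / total) for s, u in usages.items() if u > 0}

def point_cost(cover, lengths):
    """Compression cost of one data point = sum of code lengths of the feature sets in its cover."""
    return sum(lengths[s] for s in cover)

def database_cost(covers, lengths):
    """Encoded size of the database = sum of the encoded data point costs."""
    return sum(point_cost(c, lengths) for c in covers)

# Toy code table: feature sets (as tuples) with their usage counts.
usages = {("a", "b"): 6, ("c",): 3, ("d",): 1}
lengths = code_lengths(usages)
print(round(point_cost([("a", "b"), ("c",)], lengths), 2))  # cost of one point, in bits
```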
To find the MDL-optimal compressor, the compressed size of the database and the size of the code table must both be taken into account. The size of the code word column in a given code table CT is the sum of the lengths of the prefix code words it contains. For the size of the feature set column, all the singleton items appearing in the feature sets must be considered. For the encoding of these individual items, the frequency of their appearance in the feature set column is calculated, and arithmetic coding is used to achieve their optimal compression cost. Specifically, the encoding of the feature set column in a code table requires cH(P) bits, where c is the total count of singleton items in the feature sets, H(.) denotes the Shannon entropy function, and P is a multinomial random variable with probability P(i) = ri/c, in which ri is the number of occurrences of a singleton item i in the feature set column. In some embodiments, an ASCII table providing the matching from the (arithmetic) codes to the original names may be provided for the actual items. Since all such tables are defined over the same set of items, this only adds an additive constant to the total cost. The length of a code table is thus L(CT) = Σ L(code(s)) + cH(P), where the sum is over the feature sets s in CT.
The Multi-Krimp approach used in embodiments of the invention uses multiple code tables, rather than a single code table as described above. A set of data points in a multi-dimensional feature space may be highly correlated (have high information gain) and thus can compress well together. By exploiting correlations among feature groups and building a separate code table for each partitioning of features, Multi-Krimp improves on the above approach which uses a single code table.
The objective of the Multi-Krimp approach is to minimize a compression cost for a set of data. For example, let F be a set of features and let D be a set of data points (a database) over F (i.e., d ∈ D is an |F|-dimensional feature vector). The goal is to find a grouping S1, S2, . . . , Sk of F and a set of associated code tables CT1, CT2, . . . , CTk such that the total compression cost in bits is minimized:
L(D; CT1, . . . , CTk) = Σi [L(CTi) + Σd∈D L(dSi|CTi)], (6)
in which the outer sum is over the k feature groups and dSi denotes the projection of data point d onto the feature subspace Si.
The number of feature groups k is not a parameter of the Multi-Krimp approach, but rather is determined by MDL. In particular, MDL ensures that there will not be two separate code tables for a pair of highly correlated features, as it would yield lower data cost to encode them together. On the other hand, combining feature groups may yield larger code tables, that is, a higher model cost, which the savings in data cost may not compensate for. In other words, Multi-Krimp groups features for which the total encoding cost given in (6) is reduced. MDL is used to find which features to group together as well as how many groups there should be.
The search space for finding the optimal code table for a given set of features, let alone for finding the optimal grouping of features, is very large. Finding the optimal code table for a set of |Si| features involves finding all the possible feature sets with different value combinations up to length |Si| and choosing a subset of those feature sets that would yield the minimum total cost on the part of the database induced on Si. Furthermore, the number of possible groupings of a set of f features is the well-known Bell number Bf. While the search space is prohibitively large, it does not have a structure or exhibit monotonicity properties which could help prune it. As a result, Multi-Krimp is a heuristic algorithm.
The basic methodology of Multi-Krimp is now described. Given a set of data in a data table, a code table is built for each feature (attribute) in the data table. These initial code tables may be referred to as elementary code tables. Next, two code tables are selected and merged. A determination is made as to whether to accept or reject the merged code table. If a determination is made to accept the merged code table, the merged code table is stored with the elementary code tables and may be selected in a future iteration. If a determination is made to reject the merged code table, the merged code table is discarded. These steps are repeated as directed using an MDL principle. Once the elementary code tables have been merged into the number of attribute groups specified by the MDL principle, the process ends.
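A minimal sketch of this outer merge loop is given below, assuming hypothetical helpers total_cost (the total encoding cost of a grouping, per equation (6)), merge_tables (which returns the grouping obtained by merging two groups) and information_gain (the IG between two groups); it is an illustration of greedy merging with MDL-based acceptance, not the exact claimed procedure.

```python
def multi_krimp(features, total_cost, merge_tables, information_gain):
    """Greedy outer loop: start from one elementary code table per feature and
    keep merging pairs of feature groups while the MDL total cost decreases.

    total_cost(groups)          -- total encoding cost in bits of a grouping (assumed helper)
    merge_tables(groups, i, j)  -- new grouping with groups i and j merged (assumed helper)
    information_gain(gi, gj)    -- IG between two feature groups (assumed helper)
    """
    groups = [[f] for f in features]  # elementary code tables, one per feature
    improved = True
    while improved and len(groups) > 1:
        improved = False
        # Candidate pairs ordered by information gain, best first.
        pairs = sorted(
            ((information_gain(groups[i], groups[j]), i, j)
             for i in range(len(groups)) for j in range(i + 1, len(groups))),
            reverse=True)
        for _, i, j in pairs:
            candidate = merge_tables(groups, i, j)
            if total_cost(candidate) < total_cost(groups):  # MDL accepts the merge
                groups = candidate
                improved = True
                break  # recompute IGs for the new grouping and continue searching
    return groups
```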
As is often the case, some features of data points are highly correlated (e.g., the age of a car and its fuel efficiency, the weather temperature and flu outbreaks, etc.). In such cases, it may be advantageous to group these features together with one CT as it would be far less costly to combine them than to encode them separately.
Given two sets of random variables (in this example feature groups) Si and Sj, the average number of bits saved when compressing Si and Sj together instead of separately is the information gain (IG)
IG(Si, Sj) = H(Si) + H(Sj) − H(Si, Sj) ≥ 0, (7)
in which H(.) denotes the Shannon entropy. In fact, the IG of two sets of variables is always non-negative (zero when the variables are independent of each other), which implies that the data cost would be the smallest if all the features were represented by a single CT. On the other hand, the objective function (6) also includes the compression cost of the CT(s). Having a large CT with many (possibly uncorrelated) features might require more bits for model cost than the savings in bits it would give in data cost. Therefore, the algorithm uses IG as a guide to point out good candidate feature sets to be merged, and employs MDL to decide whether the total cost is reduced, that is, whether or not to approve the merge.
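For illustration, the IG of equation (7) can be estimated from an empirical joint count table of two features, as in the following sketch (the sample counts are hypothetical):

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(joint_counts):
    """IG(Si, Sj) = H(Si) + H(Sj) - H(Si, Sj), estimated from a joint count table
    whose rows index values of Si and whose columns index values of Sj."""
    joint = np.asarray(joint_counts, dtype=float)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint.ravel())

# Perfectly correlated features yield 1 bit of IG; independent ones yield ~0.
print(round(information_gain([[5, 0], [0, 5]]), 3))  # -> 1.0
print(round(information_gain([[5, 5], [5, 5]]), 3))  # -> 0.0
```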
The iterative process begins by computing the IG matrix between all pairs of feature groups; a pair of feature groups Si and Sj with high IG is then selected as a candidate for merging.
Next, all the unique rows of the database induced on the concatenated feature subspace Si|Sj are found; these unique rows serve as the candidate feature sets to be inserted into the merged code table CTi|j.
During the inner iterations, the algorithm may try to insert all of the candidates, or, for speed, it may stop after a certain number of iterations that have not improved the total cost any further. In either case, if there have been no successful insertions that reduced the total cost, the merge is rejected and the new CTi|j is discarded. Otherwise the new CTi|j is added to the collection of current CTs after CTi and CTj are dropped. The IG between the new feature group and the remaining groups is then computed, and the algorithm continues to search for possible merges. The search terminates when there are no more pairs of feature groups that can be merged for reduced cost.
In some embodiments, no particular data structure is used and instead an integer vector of usages is kept. In such a case, step (1) above needs to be performed on the fly, scanning the entire database once and possibly using many linear scans and comparisons over the unique rows found so far in the process. Step (2) above would thus require a linear scan over the feature sets in a code table for each new insertion. The total computational complexity of these linear searches depends on the database; however, with the outer and inner iteration levels this may become computationally infeasible for very large databases.
In other embodiments, a sparse matrix C of feature sets versus data points is used instead of an integer vector of usages. The binary entries cji in the sparse matrix C indicate whether data point i contains feature set j in its cover. The row sums of the C matrix give the usages of the feature sets. Using matrix C, step (1) above works as follows. Say that feature groups Si and Sj are to be merged. Let Ci denote the fi×n matrix for CTi and Cj the fj×n matrix for CTj. The number of usages of the unique rows (merged feature sets) in the database under the merged feature subspace Si|Sj is obtained by multiplying Ci and CjT into an fi×fj matrix I, which is an O(fi·n·fj) operation. Note that the actual number of occurrences of the merged feature sets in the database is an upper bound on the usages obtained by this multiplication; however, it still serves as a good approximation of the actual usages.
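A small illustrative example of this sparse-matrix computation is shown below using SciPy; the cover matrices Ci and Cj are hypothetical toy data, and the product Ci·CjT approximates the usages of the merged feature sets as described above.

```python
import numpy as np
from scipy import sparse

# Toy cover matrices: rows = feature sets in CTi / CTj, columns = the n data points,
# with entry 1 when the data point contains that feature set in its cover.
Ci = sparse.csr_matrix(np.array([[1, 1, 0, 1],
                                 [0, 0, 1, 0]]))  # fi = 2 feature sets in CTi
Cj = sparse.csr_matrix(np.array([[1, 0, 0, 1],
                                 [0, 1, 1, 0]]))  # fj = 2 feature sets in CTj

# I[a, b] estimates how often feature set a of CTi and feature set b of CTj are
# used together, i.e. the usage of the corresponding merged feature set under Si|Sj.
I = (Ci @ Cj.T).toarray()
print(I)
# [[2 1]
#  [0 1]]
```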
In certain embodiments of the invention, the Multi-Krimp technique may be used to detect anomalies. In a given code table, the feature sets with short code words corresponding to high usage represent the patterns in the data that can effectively compress the majority of data points. In other words, these feature sets capture the patterns summarizing the norm of the data. On the other hand, feature sets with longer code words are rarely used and thus encode the sparse regions in the data. Consequently, the data points in a database can be scored by their encoding cost for anomalousness.
One or more clusters are then established 502 for the set of data. The Multi-Krimp technique described above may be used for cluster detection. Each point in a database is encoded with a collection of feature sets from each code table. The feature sets used in the encoding of a data point are referred to as the cover. Clusters can be detected based on the similarity or overlap between the cover of a group of data points. Clusters may be detected for groupings of data points in different contexts as well. For example, clusters may be detected for different code tables, which is referred to herein as contextual clustering.
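As one non-limiting way to measure the cover overlap mentioned above, the following sketch computes a Jaccard-style similarity between the covers of two data points; the similarity measure and the toy covers are illustrative assumptions rather than the only possible choice.

```python
def cover_similarity(cover_a, cover_b):
    """Jaccard overlap between the covers (sets of feature sets) of two data points."""
    a, b = set(cover_a), set(cover_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Points whose encodings reuse the same feature sets look alike and can be clustered together.
covers = {
    "d1": [("a", "b"), ("c",)],
    "d2": [("a", "b"), ("d",)],
    "d3": [("e",), ("f",)],
}
print(cover_similarity(covers["d1"], covers["d2"]))  # 1/3 -> relatively similar
print(cover_similarity(covers["d1"], covers["d3"]))  # 0.0 -> dissimilar
```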
Returning to the methodology 500, new data is received 503.
For each new data point, the methodology 500 determines 504 if the data point is an anomaly.
In some embodiments, the total compression cost Ctotal of a data point may be computed as follows. Given the set of code tables CT1, . . . , CTk returned by the Multi-Krimp algorithm, the score of a data point is the sum, over the code tables, of the cost in bits of encoding the corresponding projection of the data point with the respective code table.
The scores of the data points can be computed and then sorted to report the top k data points with the highest scores as possible anomalies. Detecting such data points with extreme or rare features in a given, static database is often referred to as “data filtering” or “data cleansing.” Another task in anomaly detection is dynamically spotting anomalous data points that arrive over time. The Multi-Krimp compression method is quite flexible and can also handle dynamic data. For example, a newly arriving data point d may be considered anomalous if its compression cost score(d) is more than three standard deviations away from the mean of the scores in the database; that is, d is flagged when score(d) exceeds the mean score by more than three standard deviations.
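A minimal sketch of this dynamic check, assuming scores have already been computed for the points seen so far and that the three-standard-deviation rule is applied on the high-cost side, is:

```python
import numpy as np

def is_dynamic_anomaly(new_score, existing_scores, n_std=3.0):
    """A newly arriving point is flagged when its compression-cost score lies more
    than n_std standard deviations above the mean score of the points seen so far."""
    scores = np.asarray(existing_scores, dtype=float)
    return bool(new_score > scores.mean() + n_std * scores.std())

scores_so_far = [10.2, 11.0, 9.8, 10.5, 10.9, 10.1]
print(is_dynamic_anomaly(10.7, scores_so_far))  # False: compresses like the rest
print(is_dynamic_anomaly(25.0, scores_so_far))  # True: costs far more bits to encode
```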
Again returning to methodology 500, if a determination is made that a given new data point is an anomaly, a new cluster is established 505. In some embodiments, a new code table is built if an anomaly is detected. If an anomaly is not detected, the methodology 500 determines 506 the cluster membership of the new data point. Existing code tables are then updated 507 to reflect the new data points. Steps 503-507 are repeated for each new data point, and the process then ends 508.
The Multi-Krimp technique can also be used to exploit correlations among features of a database and partition features into groups. A separate code table is built for each group of features (attributes). A similar method may be used for data points to perform association detection, without necessarily building a set of code tables.
In some embodiments, the merging step 1101 of methodology 1100 is performed by sorting the attribute groups according to the information gain of each of the attribute groups and merging two or more attribute groups when a merged compression cost is less than the sum of the compression costs for the two or more attribute groups.
In some embodiments, the splitting step 1102 of methodology 1100 is performed by calculating an average compression cost for each of the data groups and splitting the data group with the highest average compression cost into one or more split data groups. The splitting step may further be performed by removing a given data point from one of the data groups if removal of the given data point lowers the average compression cost of the data group. The given data point may be assigned to the data group for which the compression cost of the given data point is minimized.
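A rough sketch of such a splitting and reassignment step is given below; the helper point_cost(p, group), which returns the encoding cost of a point under a group's code table, is an assumed abstraction, and the procedure is only one possible reading of the step described above.

```python
def refine_costliest_group(groups, point_cost):
    """Examine the data group with the highest average compression cost; pull out
    every point whose removal lowers that average and reassign each such point to
    the group that encodes it most cheaply.

    groups               -- list of data groups, each a non-empty list of points
    point_cost(p, group) -- assumed helper: encoding cost (bits) of point p under a group
    """
    def avg_cost(group):
        return sum(point_cost(p, group) for p in group) / len(group)

    worst = max(groups, key=avg_cost)
    baseline = avg_cost(worst)
    stay, to_move = [], []
    for p in worst:
        rest = [q for q in worst if q is not p]
        # Remove the point if the group compresses better, on average, without it.
        if rest and avg_cost(rest) < baseline:
            to_move.append(p)
        else:
            stay.append(p)
    new_groups = [list(g) for g in groups if g is not worst] + [stay]
    for p in to_move:
        # Assign the removed point to the group where its encoding cost is minimized.
        best = min(new_groups, key=lambda g: point_cost(p, g) if g else float("inf"))
        best.append(p)
    return new_groups
```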
In one embodiment, this co-partitioning of features and data points is performed by an algorithm referred to herein as Co-Part.
Co-Part first tries to find a pair of feature groups that would reduce the total cost when merged. One example of how to implement this approach is shown in pseudocode form in the accompanying drawings.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, apparatus, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
One or more embodiments can make use of software running on a general purpose computer or workstation. Such an implementation might employ, for example, a processor 1302, a memory 1304, and an input/output interface formed, for example, by a display 1306 and a keyboard 1308.
The processor 1302, memory 1304, and input/output interface such as a display 1306 and keyboard 1308 can be interconnected, for example, via bus 1310 as part of data processing unit 1312. Suitable interconnections, for example, via bus 1310, can also be provided to a network interface 1314, such as a network card, which can be provided to interface with a computer network, and to a media interface 1316, such as a diskette or CD-ROM drive, which can be provided to interface with media 1318.
A data processing system suitable for storing and/or executing program code can include at least one processor 1302 coupled directly or indirectly to memory elements 1304 through a system bus 1310. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboard 1308 for making data entries; display 1306 for viewing data; a pointing device for selecting data; and the like) can be coupled to the system either directly (such as via bus 1310) or through intervening I/O controllers (omitted for clarity).
Network adapters such as a network interface 1314 may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
As used herein, a “server” includes a physical data processing system (for example, system 1312) running a server program.
It will be appreciated and should be understood that the exemplary embodiments of the invention described above can be implemented in a number of different fashions. Given the teachings of the invention provided herein, one of ordinary skill in the related art will be able to contemplate other implementations of the invention. Indeed, although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be made by one skilled in the art without departing from the scope or spirit of the invention.
The present application claims the benefit of U.S. Provisional Application No. 61/569,349, filed Dec. 12, 2011, the disclosure of which is incorporated by reference herein. The present application is also related to commonly-assigned U.S. Patent Application 13/524,773, entitled “Anomaly, Association and Clustering Detection,” filed concurrently herewith and incorporated by reference herein. The field of the invention relates to anomaly, association and clustering detection and, more particularly, to techniques for dynamically detecting anomalies, co-association, and contextual clustering from data.
This invention was made with government support under Contract No.: W911NF-11-C-0200 awarded by Defense Advanced Research Projects Agency (DARPA). The government has certain rights in this invention.