This disclosure relates generally to the field of computer graphics and, more specifically, to systems and methods for generating summarized graphical displays of sequences of data.
Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to the prior art by inclusion in this section.
Event sequence data, i.e., multiple series of timestamped or ordered events, is increasingly common in a wide range of domains. Website click streams, user interaction logs in software applications, electronic health records (EHRs) in medical care, and vehicle error logs in the automotive industry can all be modeled as event sequences. It is crucial to reason about and derive insights from such data for effective decision making in these domains. For example, by analyzing vehicle error logs, typical fault development paths can be identified, which can inform better strategies to prevent the faults from occurring or alert drivers in advance, and therefore improve driver experience and reduce warranty cost. Similarly, by analyzing users' interaction logs with software applications, usability issues and user behavior patterns can be identified to inform better designs of the interface.
Modern computing systems are capable of generating graphical displays of large sets of event sequences including, for example, sets that contain hundreds, thousands, or millions of event sequences. However, while modern computing hardware can produce graphical depictions of extremely large sets of event sequences, the display of so much complex information often overwhelms a human user, which results in the data being less useful for analysis. The display of large sets of event sequences for real-world data often produces visual “clutter” due to the noisy and complex nature of event sequences with high event cardinality, which presents challenges to constructing concise yet comprehensive overviews of such data. Consequently, improvements to methods and systems that generate graphical depictions of large sets of event sequences, which reduce clutter and improve the understandability of the graphs, would be beneficial.
Event sequence analysis plays an important role in many application domains, with a non-limiting set of uses including visualization for customer behavior analysis, electronic health record analysis, and vehicle fault diagnosis. The embodiments described herein provide a visualization technique based on the minimum description length (MinDL) optimization process to construct an intuitive coarse-level overview of event sequence data while balancing the information loss in the overview. The method addresses a fundamental trade-off in visualization design: reducing visual clutter versus increasing the information content in a visualization. The method enables simultaneous sequence clustering and pattern extraction, and it is highly tolerant to noise such as missing or additional events in the data. Based on this approach, the embodiments provide a visual analytics framework with multiple levels-of-detail to facilitate interactive data exploration.
In one embodiment, a method for generating a graphical depiction of summarized event sequences has been developed. The method includes receiving, with a processor, a plurality of event sequences, each event sequence in the plurality of event sequences including a plurality of events, and generating, with the processor, a plurality of clusters using a minimum description length (MDL) optimization process, each cluster in the plurality of clusters including a set of at least two event sequences in the plurality of event sequences that maps to a pattern in each cluster. The pattern in each cluster further includes a plurality of events included in at least one event sequence in the set of at least two event sequences in the cluster. The method further includes generating, with the processor and a display output device, a graphical depiction of a first cluster in the plurality of clusters, the graphical depiction including a graphical depiction of a first plurality of events in the pattern of the first cluster.
In another embodiment, a system for generation of graphical depictions of summarized event sequences has been developed. The system includes a display output device, a memory, and a processor operatively connected to the display output device and the memory. The memory is configured to store program instructions and a plurality of event sequences, each event sequence in the plurality of event sequences including a plurality of events. The processor is configured to execute the program instructions to generate a plurality of clusters using a minimum description length (MDL) optimization process, and each cluster in the plurality of clusters includes a set of at least two event sequences in the plurality of event sequences that maps to a pattern in each cluster. The pattern in each cluster further includes a plurality of events included in at least one event sequence in the set of at least two event sequences in the cluster. The processor is further configured to generate a graphical depiction of a first cluster in the plurality of clusters with the display output device, the graphical depiction including a graphical depiction of a first plurality of events in the pattern of the first cluster.
For the purposes of promoting an understanding of the principles of the embodiments disclosed herein, reference is now made to the drawings and descriptions in the following written specification. No limitation to the scope of the subject matter is intended by the references. This disclosure also includes any alterations and modifications to the illustrated embodiments and includes further applications of the principles of the disclosed embodiments as would normally occur to one skilled in the art to which this disclosure pertains.
The embodiments described herein generate two-part graphical representations of a sequence of data that simplify the display of the event sequence data into a set of patterns and a set of corrections, if needed, for sequences in the original input that do not exactly match one of the patterns. A sequence of data $S$ is an ordered list of individual events $S = [e_1, e_2, \ldots, e_n]$ where $e_i \in \Omega$ and $\Omega$ is an event alphabet. The events in each sequence $S$ form a linear sequence that can also be referred to as a linear graph in which each node in the graph is an event in the sequence and the events are connected linearly in a sequence by edges. Given a set of event sequences $\mathcal{S} = \{S_1, S_2, \ldots, S_m\}$, the embodiments described herein perform a minimum description length (MinDL) optimization process to identify a set of patterns $\mathcal{P} = \{P \mid P = [e_1, e_2, \ldots, e_l]\}$ and a mapping $f: \mathcal{S} \rightarrow \mathcal{P}$ from the event sequences to the patterns that minimize a total description length:
$$L(\mathcal{P}, f) = \sum_{P \in \mathcal{P}} L(P) + \sum_{S \in \mathcal{S}} L(S \mid f(S)).$$
In the preceding equation, $L(P)$ is the description length of each pattern $P$ and $L(S \mid f(S))$ is the description length of a sequence $S$ given the pattern $f(S)$. Each pattern $P$ can be described by listing all of the events that are in the pattern, and an edit to the pattern $P$ that changes an event can be fully specified by the position and the event involved. These two observations produce an alternative form of the total description length:
$$L(\mathcal{P}, f) = \sum_{P \in \mathcal{P}} \mathrm{len}(P) + \alpha \sum_{S \in \mathcal{S}} \lVert \mathrm{edits}(S \mid f(S)) \rVert + \lambda \lVert \mathcal{P} \rVert$$
where $\mathrm{len}(P)$ is the number of events in the pattern $P$ and $\mathrm{edits}(S \mid f(S))$ is a set of edits that can transform the pattern $f(S)$ back to the event sequence $S$. As described in more detail below, edits include event additions, event deletions, and transpositions between two successive events in a sequence.
The parameter α is a numeric parameter that controls the amount of information displayed in the event sequence compared to the level of errors that are accepted in the summarization of the displayed events, where a more cluttered display generally has fewer errors and an uncluttered display generally has more errors. The parameter λ is added to directly control the total number of patterns P. Increasing λ reduces the number of patterns P that are present in the optimized result.
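As a non-limiting illustration of how the parameters α and λ weigh the three terms of the total description length, the following minimal Python sketch evaluates the alternative form of the description length for a small hand-made input. The function name `description_length` and the representation of a cluster as a pattern paired with precomputed edit counts are assumptions made only for this sketch and are not taken from the embodiments.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation for this sketch: a cluster is (pattern, edit_counts),
# where edit_counts[k] is the number of edits for the k-th sequence in the cluster.
Cluster = Tuple[Sequence[str], List[int]]


def description_length(clusters: List[Cluster], alpha: float, lam: float) -> float:
    """Sum of pattern lengths + alpha * total edits + lam * number of patterns."""
    pattern_cost = sum(len(pattern) for pattern, _ in clusters)
    edit_cost = sum(sum(counts) for _, counts in clusters)
    return pattern_cost + alpha * edit_cost + lam * len(clusters)


clusters = [
    (("A", "B", "C"), [0, 1, 2]),  # three sequences, three edits in total
    (("A", "D"), [0]),             # one sequence that matches its pattern exactly
]

# Pattern cost is 3 + 2 = 5, edit cost is 3, and there are 2 patterns.
with_penalty = description_length(clusters, alpha=1.0, lam=2.0)     # 5 + 3 + 4
without_penalty = description_length(clusters, alpha=1.0, lam=0.0)  # 5 + 3
```

In this sketch a larger `lam` makes every additional pattern more expensive, so an optimizer that minimizes the value is pushed toward fewer patterns, which matches the described role of λ.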
The mapping $f$ clusters the event sequences together: sequences that map to the same pattern $P$ can be considered to be in a single cluster. The cluster is denoted as a tuple $c = (P, G)$ where $G = \{S \mid S \in \mathcal{S} \wedge f(S) = P\}$ is the set of sequences mapped to the pattern $P$. The set of tuples for all of the clusters is denoted as $\mathcal{C} = \{(P_1, G_1), (P_2, G_2), \ldots, (P_k, G_k)\}$ for $k$ tuples, where the sets $\{G_1, G_2, \ldots, G_k\}$ form a partition of the set $\mathcal{S}$. The embodiments described herein seek to find an estimated mapping $\hat{f}$ and an estimated set of patterns $\hat{\mathcal{P}}$ that minimize the total description length $L(\mathcal{P}, f)$ by finding an estimated set of clusters $\hat{\mathcal{C}}$ that minimizes the description length $L(\mathcal{C})$:
$$L(\mathcal{C}) = \sum_{(P,G) \in \mathcal{C}} \mathrm{len}(P) + \alpha \sum_{(P,G) \in \mathcal{C}} \sum_{S \in G} \lVert \mathrm{edits}(S \mid P) \rVert + \lambda \lVert \mathcal{C} \rVert.$$
The embodiments described herein minimize the description length of the clusters to enable graphical summarization of complex sequence data. The summarization includes a graphical display of the generated patterns in one or more clusters, where each pattern summarizes one or more of the input sequences S to reduce visual clutter. Because some sequences may not be completely accurately depicted by one of the patterns, the summarization also includes a graphical display of correction data for sequences that do not exactly match the pattern to ensure accuracy in the visual display of the event sequences.
In the system 100, the processor 108 includes one or more integrated circuits that implement the functionality of a central processing unit (CPU) 112 and graphics processing unit (GPU) 116. In some embodiments, the processor is a system on a chip (SoC) that integrates the functionality of the CPU 112 and GPU 116, and optionally other components including the memory 120, into a single integrated device, while in other embodiments the CPU 112 and GPU 116 are connected to each other via a peripheral connection device such as PCI express or another suitable peripheral data connection. In one embodiment, the CPU 112 is a commercially available central processing device that implements an instruction set such as one of the x86, ARM, Power, or MIPS instruction set families. The GPU 116 includes hardware and software for display of at least two-dimensional (2D) and optionally three-dimensional (3D) graphics. In some embodiments, processor 108 executes software programs including drivers and other software instructions using the hardware functionality in the GPU 116 to accelerate generation and display of the graphical depictions of summarized event sequences and corrections that are described herein. During operation, the CPU 112 and GPU 116 execute stored program instructions 124 that are retrieved from the memory 120. The stored program instructions 124 include software that controls the operation of the CPU 112 and the GPU 116 to generate graphical depictions of event sequences based on the embodiments described herein.
In the system 100, the memory 120 includes both non-volatile memory and volatile memory devices. The non-volatile memory includes solid-state memories, such as NAND flash memory, magnetic and optical storage media, or any other suitable data storage device that retains data when the system 100 is deactivated or loses electrical power. The volatile memory includes static and dynamic random access memory (RAM). In some embodiments the CPU 112 and the GPU 116 each have access to separate RAM devices (e.g. a variant of DDR SDRAM for the CPU 112 and a variant of GDDR, HBM, or other RAM for the GPU 116) while in other embodiments the CPU 112 and GPU 116 access a shared memory device. The memory 120 stores software programmed instructions 124 and data, including event sequence data 128, locality-sensitive hash (LSH) table 132, summarized event sequence and correction data 134, and output image data 136 of the summarized event sequences and corrections.
The memory 120 stores the event sequence data 128 in any suitable format including, for example, a data file format that stores sequences of the events in a comma separated value (CSV), tab-delimited, space-delimited, or other delimited data format. In other embodiments the system 100 receives graph data for the event sequences in a graph data format such as the DOT graph description language format, the graph modeling language (GML), various extensible markup language (XML) based formats including, but not limited to, GraphXML, GraphML, Graph Exchange Language (GXL), Extensible Graph Markup and Modeling Language (XGMML), and any other suitable data format that encodes the data for the nodes with a predetermined set of events E in an event dictionary, where each sequence S includes an ordered combination of events from the set E. In the embodiments described herein, the event sequence data 128 includes multiple sequences where the term $\mathcal{S}$ represents all of the event sequences of the event sequence data 128. In many instances the event sequence S represents the sequential occurrence of events over time, which is also referred to as a temporal sequence. However, the embodiments described herein can also produce graphical summarizations of other linear event sequences that place events in a sequential order even if the events are not ordered in a temporal sequence. The system 100 summarizes all of the event sequences to produce a graphical display of patterns that summarize the event sequences while reducing the visual clutter that occurs when merely displaying all of the event sequence data 128 directly.
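As a non-limiting illustration of reading delimited event sequence data, the following Python sketch parses text in which each row holds one sequence. The row layout (a sequence identifier in the first field followed by the events in order) and the function name `parse_sequences` are assumptions made for this sketch; the embodiments do not prescribe a particular layout.

```python
import csv
import io
from typing import Dict, List


def parse_sequences(text: str, delimiter: str = ",") -> Dict[str, List[str]]:
    """Parse rows of the assumed form: sequence_id, event_1, event_2, ..."""
    sequences: Dict[str, List[str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        seq_id, *events = [field.strip() for field in row]
        # Keep the events in file order and drop empty trailing fields.
        sequences[seq_id] = [event for event in events if event]
    return sequences


csv_text = "S1,A,B,C\nS2,A,C\n"
tsv_text = "S1\tA\tB\n"

from_csv = parse_sequences(csv_text)
from_tsv = parse_sequences(tsv_text, delimiter="\t")
```

The same function handles comma, tab, or space delimited input by changing the `delimiter` argument.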
The memory 120 optionally stores an LSH table 132 that the processor 108 generates based on the event sequence data 128.
The memory 120 also stores the summarized event sequence and correction data 134. As described in further detail below, the system 100 generates the summarized event sequences as a set of one or more clusters in which each cluster includes a pattern that summarizes one or more of the event sequences in the event sequence data 128. The system 100 generates graphical depictions of summarized event sequences based on the summarized event sequence data 134 to reduce visual clutter when visualizing multiple event sequences. The correction data enables the system 100 to track and generate a graphical display of corrections between the summarized event sequence pattern and one or more of the original event sequences if the pattern does not exactly match the event sequence.
The memory 120 also stores output summarized event sequences and correction image data 136, which include one or more sets of image data that the system 100 generates to produce a graphical output of a summary of the event sequence data and optionally a graphical depiction of corrections to the summarization of the event sequences. In some embodiments, the processor 108 generates the output image data 136 using a rasterized image format such as JPEG, PNG, GIF, or the like while in other embodiments the processor 108 generates the output image data 136 using a vector image data format such as SVG or another suitable vector graphics format.
In the system 100, the input device 150 includes any devices that enable the system 100 to receive the event sequence data 128. Examples of suitable input devices include human interface inputs such as keyboards, mice, touchscreens, voice input devices, and the like. Additionally, in some embodiments the system 100 implements the input device 150 as a network adapter or peripheral interconnection device that receives the event sequence data from another computer or external data storage device, which can be useful for receiving large sets of event sequence data in an efficient manner.
In the system 100, the display output device 154 includes an electronic display screen, projector, printer, or any other suitable device that reproduces a graphical display of the summarized event sequences and the correction graphics that the system 100 generates based on the event sequence data.
The process 200 begins as the system 100 receives the event sequence data (block 204). The event sequence data includes multiple sequences in which each sequence includes a plurality of events E that are linked together linearly with edges to form a sequence.
The process 200 continues as the system 100 initializes pattern clusters and a priority queue that are used in a minimum description length (MinDL) optimization process that occurs as part of the process 200 (block 208). In one embodiment, the processor 108 sets the initial patterns P to be equal to the original input sequences S, and each mapping G between sequences and patterns maps a single sequence S to the corresponding pattern P. In effect, the process 200 initially treats each input sequence S as an individual cluster that includes only one sequence. The initial set of all clusters $\mathcal{C}$ includes, for each original input sequence, a cluster with an individual pattern and the individual sequence that matches the pattern.
The process 200 continues as the system 100 performs a first merger of pairs of clusters to fill the priority queue Q with clusters prior to performing additional iterative merging operations (block 212). Merging clusters together forms a new pattern that combines elements of the original sequences in two clusters, although the new pattern may not exactly match the original sequences in the two clusters. Additional details describing the merger operation between each pair of clusters are provided below.
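As a non-limiting illustration of the initialization in block 208, the following Python sketch builds one single-sequence cluster per input sequence. The function name `init_clusters` and the representation of a cluster as a `(pattern, sequences)` tuple are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]
Cluster = Tuple[Pattern, List[Pattern]]


def init_clusters(sequences: Sequence[Sequence[str]]) -> List[Cluster]:
    """Each input sequence becomes its own pattern and its own one-member group."""
    return [(tuple(seq), [tuple(seq)]) for seq in sequences]


initial = init_clusters([["A", "B", "C"], ["A", "C"]])

# In this starting state every sequence matches its pattern exactly, so no
# edits are needed and the number of clusters equals the number of sequences.
priority_queue: list = []  # filled later by the pair-wise merger step
```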
The similarity between two clusters ci and cj is expressed as a value J(ci, cj), where Si is the set of unique events in the pattern Pi representing cluster ci and Sj is the set of unique events in the pattern Pj representing cluster cj. In the embodiments described herein the values of J(ci, cj) lie in the range [0, 1], where 0 indicates the lowest level of similarity and 1 indicates the maximum level of similarity.
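The equation that defines J(ci, cj) is not reproduced in this text. As a hedged illustration only, the following Python sketch assumes that J is the Jaccard index of the two sets of unique events, which is a common set similarity that lies in the range [0, 1] and is frequently paired with locality-sensitive hashing. The function name `jaccard` is hypothetical.

```python
from typing import Sequence


def jaccard(pattern_i: Sequence[str], pattern_j: Sequence[str]) -> float:
    """Assumed similarity: |Si & Sj| / |Si | Sj| over the unique events of each pattern."""
    s_i, s_j = set(pattern_i), set(pattern_j)
    if not s_i and not s_j:
        return 1.0  # convention chosen for this sketch: two empty patterns are identical
    return len(s_i & s_j) / len(s_i | s_j)


partial = jaccard(("A", "B", "C"), ("A", "C", "D"))   # 2 shared events out of 4 unique
disjoint = jaccard(("A",), ("B",))
identical = jaccard(("A", "B"), ("B", "A", "A"))       # order and repeats are ignored
```

Because the measure uses only the sets of unique events, it ignores event order, so it serves as an inexpensive filter for candidate pairs rather than as a replacement for the description length test.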
The process 200 continues as the system 100 performs additional merger operations as described above in block 216 for as long as the priority queue Q still includes elements and is not empty (block 220).
As described above, the process 200 performs a pair-wise merging process between pairs of clusters.
The process 300 begins as the processor 108 receives two clusters ci and cj as inputs for merger (block 304).
The process 300 continues as the processor 108 initializes a pattern P* based on the longest common sequence (LCS) of events that are common to both of the patterns Pi and Pj in the input clusters (block 308). The longest common sequence refers to the events in the two patterns that match each other in the same order, where the matching series that includes the largest number of events is the longest common sequence.
The process 300 continues as the processor 108 identifies a set of candidate events Ec that are eligible to be included in the merged cluster and sorts the candidate events based on frequency in descending order (block 312).
The process 300 continues as the processor 108 tests the candidate events Ec in order, starting with the candidate events that have the highest frequency, to identify a new reduction in the description length ΔL′ that occurs if the candidate event is merged into the pattern P* (block 316).
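As a non-limiting illustration of initializing the pattern P* in block 308, the following Python sketch computes a longest common sequence of two patterns with a standard dynamic programming table. It assumes that the longest common sequence is the classical longest common subsequence, in which matching events keep their relative order but need not be adjacent. The function name `lcs` is hypothetical.

```python
from typing import Sequence, Tuple


def lcs(a: Sequence[str], b: Sequence[str]) -> Tuple[str, ...]:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover the matching events in order.
    result = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return tuple(result)


p_star = lcs(("A", "B", "C"), ("A", "C"))
```

When several longest common subsequences exist, this sketch returns one of them; the embodiments do not state a tie-breaking rule.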
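As a non-limiting illustration of block 312, the following Python sketch collects and orders candidate events. Two points are assumptions made for this sketch because the text does not spell them out: the candidates are taken to be the events that appear in either input pattern but not in the initial pattern P*, and the frequency of an event is taken to be the number of sequences in the two clusters that contain it. The function name `candidate_events` is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def candidate_events(
    pattern_i: Sequence[str],
    pattern_j: Sequence[str],
    common: Sequence[str],
    sequences: Sequence[Sequence[str]],
) -> List[str]:
    """Events in either pattern but not in `common`, most frequent first."""
    pool = (set(pattern_i) | set(pattern_j)) - set(common)
    frequency: Counter = Counter()
    for seq in sequences:
        for event in set(seq):  # count each sequence at most once per event
            if event in pool:
                frequency[event] += 1
    # Ties are broken alphabetically so that the order is deterministic.
    return sorted(pool, key=lambda event: (-frequency[event], event))


ordered = candidate_events(
    pattern_i=("A", "B", "C", "E"),
    pattern_j=("A", "C", "D"),
    common=("A", "C"),
    sequences=[("A", "B", "C", "E"), ("A", "B", "C"), ("A", "C", "D")],
)
```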
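The equation for the reduction ΔL′ is not reproduced in this text. As a hedged reconstruction, assume that the reduction is the difference between the description length of the two separate clusters $c_i = (P_i, G_i)$ and $c_j = (P_j, G_j)$ and the description length of the single merged cluster with the candidate pattern $P$, each evaluated with the total description length defined earlier. Under that assumption the two separate clusters contribute $2\lambda$ and the merged cluster contributes $\lambda$, which leaves the following form:

```latex
\Delta L' = \mathrm{len}(P_i) + \mathrm{len}(P_j) - \mathrm{len}(P)
  + \alpha \sum_{S \in G_i} \lVert \mathrm{edits}(S \mid P_i) \rVert
  + \alpha \sum_{S \in G_j} \lVert \mathrm{edits}(S \mid P_j) \rVert
  - \alpha \sum_{S \in G_i \cup G_j} \lVert \mathrm{edits}(S \mid P) \rVert
  + \lambda
```

A positive value means that the merged cluster has a shorter description than the two separate clusters, and a negative value means that the merger lengthens the description.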
In the equation above the edits represent errors between the sequence of events in the pattern P and the events in one or more of the original event sequences in the cluster that the processor 108 identifies in the MinDL optimization process. The processor 108 also identifies errors based on the edits as the basis for generating corrections via a graphical display of correction data that is described in further detail below to reproduce one or more of the original sequences S that are included in a cluster with the pattern P. Examples of edits include deletions of one or more events in P to match one of the original sequences S, insertions of events into P to match S, and pair-wise transpositions of events in P to reorder events to match S.
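As a non-limiting illustration of counting edits, the following Python sketch computes the smallest number of insertions, deletions, and transpositions of two successive events that transform a pattern into a sequence. It is a restricted edit distance computed with dynamic programming, without a substitution operation, so replacing one event with another costs one deletion plus one insertion. The function name `edit_count` and the choice of this particular distance are assumptions made for this sketch.

```python
from typing import Sequence


def edit_count(pattern: Sequence[str], sequence: Sequence[str]) -> int:
    """Minimum insertions, deletions, and adjacent transpositions from pattern to sequence."""
    n, m = len(pattern), len(sequence)
    # d[i][j] is the edit count between pattern[:i] and sequence[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete every remaining pattern event
    for j in range(m + 1):
        d[0][j] = j  # insert every remaining sequence event
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(d[i - 1][j] + 1, d[i][j - 1] + 1)  # deletion, insertion
            if pattern[i - 1] == sequence[j - 1]:
                best = min(best, d[i - 1][j - 1])  # events match, no edit
            if (i > 1 and j > 1
                    and pattern[i - 1] == sequence[j - 2]
                    and pattern[i - 2] == sequence[j - 1]):
                best = min(best, d[i - 2][j - 2] + 1)  # swap two successive events
            d[i][j] = best
    return d[n][m]


insertion = edit_count(tuple("CABD"), tuple("CABED"))      # insert E
deletion = edit_count(tuple("CABD"), tuple("CAD"))          # delete B
transposition = edit_count(tuple("CABD"), tuple("CBAD"))    # swap A and B
```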
If the result of ΔL′ indicates that the next candidate event improves the minimum description length, then the processor 108 adds the candidate event to the pattern P* by using the candidate pattern P as the new value of P* and updates the value of ΔL=ΔL′ (block 328). If another candidate event is present (block 332), then the process 300 returns to block 316 to test the next candidate event. The process 300 continues to merge candidate events until either all of the candidate events are merged (block 332) or until the tested merger for the next candidate event e produces either an absolute increase in the description length (ΔL′<0) or a smaller description length reduction than the reduction ΔL that has been achieved during an earlier iteration of the merge process (ΔL′<ΔL) (block 320). As described above, the merged patterns P and P* always include events that are included in at least one of the two input patterns Pi and Pj that in turn correspond to events in at least one of the event sequences of Gi and Gj, although the final merged pattern may not exactly match either or both of the original input patterns.
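As a non-limiting illustration of the candidate testing loop in blocks 316 to 332, the following Python sketch merges two clusters. Several details are assumptions made for this sketch and are labeled in the comments: the reduction is the description length before the merger minus the description length after it, the initial value of ΔL is the reduction obtained with the longest common sequence alone, each candidate event is tried at every insertion position and the best position is kept, and the candidate set ignores repeated events. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]
Cluster = Tuple[Pattern, List[Pattern]]


def edit_count(p: Sequence[str], s: Sequence[str]) -> int:
    """Insertions, deletions, and adjacent transpositions needed to turn p into s."""
    d = [[i + j if i == 0 or j == 0 else 0 for j in range(len(s) + 1)]
         for i in range(len(p) + 1)]
    for i in range(1, len(p) + 1):
        for j in range(1, len(s) + 1):
            best = min(d[i - 1][j], d[i][j - 1]) + 1
            if p[i - 1] == s[j - 1]:
                best = min(best, d[i - 1][j - 1])
            if i > 1 and j > 1 and p[i - 1] == s[j - 2] and p[i - 2] == s[j - 1]:
                best = min(best, d[i - 2][j - 2] + 1)
            d[i][j] = best
    return d[len(p)][len(s)]


def lcs(a: Pattern, b: Pattern) -> Pattern:
    """One longest common subsequence of a and b."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            t[i][j] = t[i + 1][j + 1] + 1 if a[i] == b[j] else max(t[i + 1][j], t[i][j + 1])
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            i += 1
        else:
            j += 1
    return tuple(out)


def merge(ci: Cluster, cj: Cluster, alpha: float = 1.0, lam: float = 1.0):
    (pi, gi), (pj, gj) = ci, cj
    group = gi + gj

    def edits(p: Pattern, seqs: List[Pattern]) -> int:
        return sum(edit_count(p, s) for s in seqs)

    before = len(pi) + len(pj) + alpha * (edits(pi, gi) + edits(pj, gj)) + 2 * lam

    def reduction(p: Pattern) -> float:
        # Assumption: description length before the merger minus the length after it.
        return before - (len(p) + alpha * edits(p, group) + lam)

    best = lcs(pi, pj)
    delta = reduction(best)  # assumption: the starting value of the reduction
    pool = (set(pi) | set(pj)) - set(best)  # simplification: repeated events ignored
    ordered = sorted(pool, key=lambda e: (-sum(e in s for s in group), e))
    for event in ordered:
        # Assumption: try every insertion position and keep the best one.
        trial = max((best[:k] + (event,) + best[k:] for k in range(len(best) + 1)),
                    key=reduction)
        new_delta = reduction(trial)
        if new_delta < 0 or new_delta < delta:
            break  # the candidate does not improve the description length
        best, delta = trial, new_delta
    return (best, group), delta


abc, ac = ("A", "B", "C"), ("A", "C")
rare = merge((abc, [abc]), (ac, [ac]))
common = merge((abc, [abc, abc, abc]), (ac, [ac]))
```

In the first call the event B appears in only one of two sequences, so adding it to the pattern costs as much as it saves and the merged pattern stays at A C. In the second call B appears in three of four sequences, so the loop accepts it and the merged pattern becomes A B C.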
The process 200 continues as the processor 108 generates a graphical depiction of one or more of the summarized event sequence patterns and optionally a graphical display of corrections that relate the summarized event sequences to the original event sequences, with reduced visual clutter (block 232). In the system 100, the processor 108 generates the output image data 136 based on the summarized event sequence and correction data 134 and uses the display output device 154 to display the graphical depiction of the patterns and corrections in the summarized event sequences or transmits the output image data 136 to a remote computing device for display. The graphical depiction can include the pattern for one cluster in the plurality of clusters, patterns for a subset of the clusters including at least two clusters in the plurality of clusters, or all of the clusters depending upon the complexity of the clusters and the effective size of the output device 154. The processor 108 also updates the output image data 136 for display with the display output device 154 or transmission to a remote computing device for display based on interactive user inputs that are described in more detail below.
In more detail, the graphical depiction 804 of the pattern P1 includes the same events C A B D that are included in the pattern P1.
In some embodiments, the system 100 generates an interactive user interface to enable a user to select the addition corrections 806A-806D, the deletion correction 812, or other elements in the graphical depiction 804 using the input device 150 to provide more detailed correction information for all or a portion of the event sequences in the patterns of the summarized event sequence. For example, in one embodiment the processor 108 generates a graphical depiction of the event sequences S5 and S6.
One example of a supported interaction includes aligning the view of the patterns in the summarized event sequences and the original event sequences at a selected event. By default, the event sequences in the summary view and the detail view are aligned at the first event. Users can select one event in the summary view and both views will be aligned to the selected event through an animated transition. Displays A and B in the user interface 900 of FIG. 9 show an example where the events are aligned at the event labeled ‘gh’. The system 100 receives a selection of the event type ‘gh’ using the input device 150 and generates the graphical depiction of the patterns in the summarized event sequence aligned to the selected event type ‘gh’. The alignment provides a clear graphical depiction of the events in different patterns that occur before and after the aligned event to enable analysis of the differences in event sequences between different patterns in the graphical depiction of the summarized event sequences.
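As a non-limiting illustration of the alignment interaction, the following Python sketch computes a horizontal offset, in event slots, for each pattern or sequence so that the selected event lands in the same column. Aligning at the first occurrence of the selected event and reporting `None` for rows that lack the event are assumptions made for this sketch, and the function name `align_offsets` is hypothetical.

```python
from typing import List, Optional, Sequence


def align_offsets(rows: Sequence[Sequence[str]], event: str) -> List[Optional[int]]:
    """Offset per row that places the first occurrence of `event` in one shared column."""
    positions = [row.index(event) if event in row else None for row in rows]
    found = [p for p in positions if p is not None]
    anchor = max(found, default=0)  # the column that the selected event is moved to
    return [None if p is None else anchor - p for p in positions]


rows = [
    ("ab", "gh", "cd"),
    ("gh", "ef"),
    ("xy", "zz", "gh"),
    ("ab", "cd"),  # does not contain the selected event
]
offsets = align_offsets(rows, "gh")
```

A renderer could shift each row to the right by its offset and animate the change from the previous offsets to produce the transition between alignments.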
Another interaction enables detail on demand, including expanding the addition correction triangles and deletion correction rectangles to display more detailed correction data.
Another interaction enables filtering of data. Besides filtering events, the system 100 can also filter the event sequences through their attribute values as shown in displays C and D.
Another interaction enables changes to the temporal ‘X’ axis of the event sequences to view the events that occur in specific time ranges in more detail, although of course an alternative embodiment can display the temporal axes vertically or at an angle instead of horizontally. The horizontal scale in the detailed view can be changed to show accurate temporal information instead of only sequential orders.
Another interaction enables the reordering of patterns in the summarized event sequence view based on user criteria. In the summary view, the system 100 supports sorting the sequential patterns by 1) the number of sequences in the corresponding cluster and 2) the similarity between the patterns measured through the edit distance. To reorder by similarity, the processor 108 first performs a hierarchical clustering of the patterns, which occurs after the process 200 performs the MinDL or MinDL+LSH operations to produce the clusters. The hierarchical clustering process produces groups of similar patterns. The processor 108 subsequently sorts the patterns within each group by the order of leaves in a dendrogram, which is a diagram of a tree structure that the hierarchical clustering process generates.
One example of a computer graphics system that employs the embodiments described above implements functions that produce summarized graphical depictions of the event sequence data and further enable a user to review a small subset of the records in more detail, compile descriptive information about the dataset or a subgroup of records and events (e.g. through aggregated views), identify a set of records of interest using filtering criteria, and study antecedents or sequelae of an event of interest.
Non-limiting examples of usage scenarios for the embodiments described herein include analysis of sequences of events that occur during a fault analysis to assist in the review and analysis of faults that occur in a product. One example of a complex product that often encounters a sequence of events that occur prior to and after a fault is a motor vehicle. The system 100 generates graphical depictions of the summarized event sequences for different events, such as a temporal sequence of OBD-II diagnostic events that are recorded in the on-board electronic control unit of a vehicle, for a large set of vehicles that encounter a fault. The system 100 enables summarized analysis of event sequences that occur for a large number of vehicles, which correspond to patterns that the system 100 generates for one or more clusters of similar event sequences. Additionally, the system 100 enables analysis of the event sequences in individual vehicles that may deviate from one of the patterns, and enables a display of filtered, sorted, and aligned patterns to enable analysis of the underlying causes of faults to help improve the repair process and preventive maintenance of the motor vehicles by identifying common sequences of events that precede the occurrence of a fault.
Another non-limiting example of a usage scenario of the embodiments described herein is the analysis of log file information that is generated during the execution of various software applications including, for example, desktop or web software applications, as part of a process to analyze the usage patterns of the software to improve the design of user interfaces in the software programs. In this usage scenario, each event corresponds to a keyboard, mouse, or other input that the user provides while using the program, and a series of these inputs provides an event sequence for additional analysis. The system 100 generates the graphical depiction of the summarized event sequences that enables an analyst to select individual patterns for a detailed view of sequences that correspond to each pattern as depicted in the detailed view of event sequences B.
The embodiments described herein provide improvements to the operation of computer systems that generate graphical summarizations of sequences of data. As described in the embodiments herein, these improvements can be implemented using software that is stored in a non-transitory memory and executed by a computer, hardware, or a combination of software and hardware. A non-limiting example of an improvement to the operation of a computer system that is described herein is an automated process to generate a two-part graphical depiction of summarized event sequences including both a set of sequential patterns that summarize the original event sequences and a set of corrections for sequences that do not exactly match the patterns. In combination with the MinDL optimization process, the patterns can be used to generate graphical depictions of complex sequence event data while reducing visual clutter. Another non-limiting example of an improvement is a computationally efficient process to identify an optimal set of patterns to summarize the data based on the MinDL optimization process. Another non-limiting example of an improvement is a visual analytics system that supports level-of-detail exploration of event sequence data.
It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems, applications or methods. Various presently unforeseen or unanticipated alternatives, modifications, variations or improvements may be subsequently made by those skilled in the art that are also intended to be encompassed by the following claims.
This application claims the benefit of U.S. Provisional Application No. 62/537,621, which is entitled “Sequence Synopsis: Optimize Visual Summary of Temporal Event Data,” and was filed on Jul. 27, 2017, the entire contents of which are hereby incorporated herein by reference.
Number | Date | Country |
---|---|---|
20190034519 A1 | Jan 2019 | US |

Number | Date | Country |
---|---|---|
62537621 | Jul 2017 | US |