Analysis of large data sets, such as trace data sets that include information collected during operation of computing systems, is often complicated by the size and complexity of the data sets. For example, the size of a trace data set including several minutes' worth of information regarding the operation of a computing system, in which a multitude of internal or external hardware, software, and/or firmware components interact, may be as large as, or larger than, several gigabytes.
When large data sets are stored and queried in traditional manners (for example, using multidimensional databases configured for online analytical processing (“OLAP”) querying), various views of the same data set may be built and permanently stored for future user queries, which is often time-consuming and storage-inefficient.
It is desirable to provide efficient, responsive data storage/analysis systems and techniques, which enable a user to interactively visualize relationships, such as patterns or causality, within data of large, sometimes disparate/distinct complex data sets, which relationships may be practically undetectable via examination of the data sets themselves.
A data visualization and analysis system (“DVAS”) is described herein, which provides techniques and data models that address various challenges of efficiently storing, manipulating, correlating/retrieving, and interactively presenting information within complex data sets. The DVAS achieves near real-time responsiveness to user input affecting manipulation and presentation of information within one or more complex data sets from one or more sources. For purposes of example and not limitation, both the data models and techniques of the DVAS are described with reference to trace data sets, which are files or other data storage constructs used to record information regarding certain defined events occurring during operation of one or more computing systems (an exemplary computing system is referred to as a “computing system under test”) or portions thereof. Other data sets are possible, however. One example of another possible data set is a set of genome information (human or non-human) based on DNA sequencing activities.
The DVAS is configured to receive and parse one or more trace data sets to create a data model that includes a number of layers and other inner or auxiliary data structures, which are used for storing, retrieving, analyzing, and presenting event information in an abstract and efficient manner.
In accordance with one aspect of the data model, data units are created and associated with particular defined events.
A data unit has a number of associated data structures, which may be explicitly or implicitly defined, and which may be part of the same or different physical or logical constructs. In one exemplary scenario, the associated data structures include a first data structure that is populated with an item of first data representing a particular defined event. A second data structure is populated with an item of second data representing a particular computing system component with which the particular defined event is associated. A third data structure is populated with an item of third data representing a timestamp of the particular defined event. A fourth data structure is populated with one or more items of fourth data representing metrics (which may be created and/or calculated) associated with the particular event, such as duration, start and end times, calling and/or callee data processing operations, display color selections, and the like.
To improve retrieval and/or rendering speed without penalizing main memory, a data unit may be split into two or more parts. In an exemplary implementation, a data unit has two parts. A first part, referred to as a base unit, stores data values on a main memory (for example, a RAM), which are referred to by items of data used during rendering of visual representations (such as data that can be used to derive coordinates of geometric objects). A second part, which in some instances may not be used or present, is referred to as an extended unit. The extended unit stores on a persistent memory those data values that are referred to by items of data that are not used during rendering of a visual representation, although those items of data may be used to retrieve specific user-selected information.
In accordance with another aspect of the data model, various data collections are generated. A particular data collection is a subset of data units that have one or more items of data in common, referred to as “pivot items,” which have pivot types. A metadata data structure may be generated to aggregate the commonalities of the base units and extended units of the data units of a particular data collection.
In accordance with yet another aspect of the data model, one or more data collection sets are generated, either automatically or in response to user input. A particular data collection set groups a subset of data collections that share certain common pivot types. A structured representation of the data collection set, which has a number of nodes (with a particular node corresponding to a unique permutation of the one or more pivot items) may be generated and processed in real-time (for example, in response to user input) to identify one or more data units and/or items of data associated with particular nodes. For each data collection set, a matrix that includes a list of data unit identifiers, such as indices, which are mapped to actual data structure indices, may be created, to facilitate efficient data retrieval, transformation, and presentation.
The DVAS is also configured to efficiently and concurrently visually render one or more sets of geometric objects such as points, rectangles, etc., corresponding to particular data collection sets (from the same or different trace data sets) in accordance with different drawing modes. The drawing modes enable users to effectively identify patterns through visualization. A drawing mode encompasses information such as system axes, coordinate systems, scales, geometries, etc. Each data collection set may include a data structure (referred to as a descriptor) that describes, and enables dynamic changes to, how a particular drawing mode for a particular data collection set relates to the data units and/or data collections associated therewith.
In one exemplary scenario, for a particular data collection set, a particular geometric object is rendered within a coordinate system. A particular geometric object corresponds to a particular data unit associated with a particular node of a particular structured representation of the particular data collection set. The coordinates of the particular geometric object are specified using expressions that have variables as operands (it is noted that a geometric object may be specified by one coordinate, and that a stand-alone variable may constitute an expression). The variables represent items of data of particular data units within the data collection set, and are used to indirectly retrieve data values represented thereby. The expressions that specify the coordinates are evaluated to obtain coordinate values at which a particular geometric object is to be visually rendered. Such indirection enables dynamic switching between coordinate systems and/or data collection sets with little impact on user experience, achieving near real-time responsiveness to user input that affects manipulation and visualization of information. A user is thus able to interactively visualize relationships, such as patterns or causality, within data of large, sometimes disparate/distinct complex data sets, which relationships may be practically undetectable via examination of the data sets themselves.
Various examples of interactive visualizations of trace data sets and possible uses therefor, as well as certain optimization algorithms, are also discussed herein.
This Summary is provided to introduce a selection of concepts in a simplified form. The concepts are further described in the Detailed Description section. Elements or steps other than those described in this Summary are possible, and no element or step is necessarily required. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended for use as an aid in determining the scope of the claimed subject matter. The claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this document.
Using the data visualization and analysis system (“DVAS”) and techniques described herein, it is possible to model, store, retrieve, analyze and visually represent large data sets in a fast, lightweight, flexible, and highly interactive fashion. For purposes of example and not limitation, both the data models and techniques of the DVAS are described with reference to trace data sets, which are files or other data storage constructs used to record information regarding certain defined events occurring during operation of an exemplary computing system (referred to as a “computing system under test”) or a portion thereof. Other large data sets are possible, however, such as genome information based on DNA sequencing activities.
Turning now to the drawings, where like numerals designate like components,
Defined events 107 may be specified by trace collection system 102, DVAS 101, and/or user 111, and represent occurrences or activities within CSUT 100. Such occurrences or activities may be single occurrences or activities represented by event information 109, or may be aggregations of occurrences or activities represented by event information 109, which are interpreted and/or defined to have certain meanings. As such, it will be appreciated that virtually unlimited numbers and types of events may be defined. Exemplary event information 109 associated with defined events 107 includes but is not limited to caller/callee processing unit and/or data processing operation; resource identification; start/entry time; end/exit time; or execution/access times.
CSUT 100 may include various system components 120, such as one or more processing units in one or more locations (a processing unit may be any processing construct now known or later developed, including but not limited to a CPU, GPU, core, or hardware thread); one or more data processing operations (including but not limited to processes, software threads, or service-providing entities exposed thereby), which are executable by one or more of the processing units to provide certain functionality; and various internal or external resources, including but not limited to files, data, computer-readable storage media, registry keys, objects, and the like, which are accessible via the processes. Examples of defined events 107 include but are not limited to activities related to certain processing units, calls to/from certain data processing operations, and resource access activities.
Trace data collection system 102 represents any known or later developed hardware, software, firmware, or combination thereof deployed to execute certain test scenarios 105 during operation of CSUT 100 to produce trace data set 110. Trace data set 110 includes any event information 109 gathered pursuant to execution of one or more test scenarios 105.
Based on trace data set 110, aspects of DVAS 101 create a data model 141 which, when processed in accordance with the techniques described herein, visually and interactively represents the occurrences of defined events 107 within CSUT 100 to a user 111.
With continuing reference to
In an exemplary implementation, DVAS 101 includes: a data access engine 204, which is responsible for creating aspects of data model 141 based on trace data set 110; and a visualization engine 202, which is responsible for visually representing defined events 107 based on data model 141. In one exemplary scenario, multiple data access engines 204 (or portions thereof) may be implemented as plug-ins to visualization engine 202. As discussed further below, various declaratory language data structures are created and/or used by DVAS 101. Any known or later developed schemas or base templates (not shown) to which the syntax of the data structures may conform can be used to facilitate organization and interpretation of such data structures. For example, various schemas are available and/or definable for XML, although it will be appreciated that any declarative programming language may be used.
In connection with the functionality of data access engine 204, data reader 240 receives and parses trace data set 110. Data reader 240 is responsible for classification, filtering, and aggregation of event information 109 to implicitly or explicitly populate data units 242 with items of data, and for extracting the commonalities of data units 242 to form data collections 244.
Data units 242 are sets of declaratory language data structures associated with particular defined events 107. A particular data unit 242 is associated with a particular defined event 107, or an aggregation of defined events. As discussed above, a particular defined event 107 may be described by one or more units of event information 109 recorded within trace data set 110.
The data structures associated with data unit 242 may be part of one or more physical or logical constructs, and are implicitly or explicitly populated with items of data representing data values stored in one or more computer-readable storage media. Such items of data may be any direct or indirect references, such as one or more strings, numeric values, variables, pointers, vectors, or URLs.
Turning briefly to
A first data structure 401 associated with data unit 242 is populated with one or more items of first data 411, which represent a particular defined event 107 and/or type thereof. In one possible implementation, first data structure 401 is part of metadata data structure 246.
A second data structure 402 is populated with one or more items of second data 412, which represent particular computing system components 120 and/or types thereof with which the particular defined event is associated. In one possible implementation, second data structure 402 is part of metadata data structure 246.
A third data structure 403 is populated with an item of third data 413, which represents a timestamp of the particular defined event. The timestamp may be any desired time associated with occurrence of the particular defined event, such as the start time, end time, a time value calculated or derived from start and/or end times (a duration, for example), or a time value calculated or derived from other event information 109. It is also noted that in addition to or instead of time, one or more other dimensions or constructs may be used to establish a relative sequence between data units.
A fourth data structure 404 is populated with one or more items of fourth data 414, which represent metrics associated with the particular defined event. In one possible implementation, items of fourth data are expressed as an array. Examples of metrics include but are not limited to: a reason/source for a particular defined event; a start time; an end time; a callee data processing operation identifier; a caller data processing operation identifier; a start address or other information associated with a particular caller or callee data processing operation; and a duration. It will be appreciated, however, that many additional metrics are possible. It will also be appreciated that certain metrics may be calculated or derived based on other event information 109 within trace data set 110.
A fifth data structure 405 is optionally populated with one or more items of fifth data 415, which indicate whether the particular defined event represented by the data unit itself and/or any aspect of the particular defined event represented by any items of data are selected for visual representation to a user based on current user selections (user selections are discussed further below).
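The five data structures described above can be pictured as the fields of a single record. The following is a minimal sketch only, assuming a flat in-memory record; the class name `DataUnit`, the field names, and the helper `make_unit` are hypothetical and are not taken from the DVAS itself, which may distribute these structures across different physical or logical constructs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DataUnit:
    """Hypothetical flat form of one data unit (all names are illustrative)."""
    event_type: str                  # item of first data: the defined event
    component: str                   # item of second data: the system component
    timestamp: float                 # item of third data: here, the start time
    metrics: Dict[str, float] = field(default_factory=dict)  # items of fourth data
    selected: Optional[bool] = None  # optional item of fifth data: selection state


def make_unit(event_type: str, component: str, start: float, end: float) -> DataUnit:
    """Build a unit and derive a duration metric from the start and end times."""
    metrics = {"start": start, "end": end, "duration": end - start}
    return DataUnit(event_type, component, start, metrics)


unit = make_unit("registry_read", "process:app.exe", 10.0, 12.5)
print(unit.metrics["duration"])
```

The derived `duration` entry illustrates the point made above that certain metrics may be calculated from other event information rather than read directly from the trace.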
Data units 242 optionally have two or more parts. Splitting the data structure into multiple parts improves the speed of the visualization engine without penalizing the main memory. In one exemplary implementation, data units have two parts. The first part, referred to as a “base unit” 301, may be stored in a main memory 311, which is generally a non-persistent memory such as a RAM. Base unit 301 stores information that is used to render visual representations, such as data that can be used to derive coordinates of geometric objects 229 (discussed further below) and data that can be used to render other aspects of drawing modes 271 (also discussed further below) to a user. The second part, which may not be present for all data units, is referred to as an “extended unit” 303. Extended unit 303 has a data item pointer 305 used to retrieve data values 312 and data items lengths 307 from secondary storage, which is generally a persistent memory location 310. Extended unit 303 stores information that is not used during rendering of visual representations, but may be used to retrieve specific user-selected information.
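One way to picture the base unit/extended unit split is sketched below. This is an illustration under stated assumptions rather than the DVAS implementation: a temporary file stands in for persistent memory location 310, and the names `BaseUnit`, `ExtendedUnit`, `write_extended`, and `read_extended` are hypothetical. The base unit keeps only the values needed for rendering, while the extended unit keeps an offset and a length (standing in for data item pointer 305 and data item lengths 307).

```python
import tempfile
from dataclasses import dataclass
from typing import BinaryIO, Optional


@dataclass
class ExtendedUnit:
    offset: int  # stands in for data item pointer 305
    length: int  # stands in for data item length 307


@dataclass
class BaseUnit:
    timestamp: float  # kept in main memory because rendering needs it
    duration: float   # kept in main memory because rendering needs it
    extended: Optional[ExtendedUnit] = None  # absent for some data units


def write_extended(store: BinaryIO, payload: bytes) -> ExtendedUnit:
    """Append non-rendering detail to the store and remember where it went."""
    store.seek(0, 2)  # move to the end of the store
    offset = store.tell()
    store.write(payload)
    return ExtendedUnit(offset, len(payload))


def read_extended(store: BinaryIO, ext: ExtendedUnit) -> bytes:
    """Fetch the detail only when the user asks for it."""
    store.seek(ext.offset)
    return store.read(ext.length)


store = tempfile.TemporaryFile()  # assumption: a file models persistent memory
unit_a = BaseUnit(0.0, 1.5, write_extended(store, b"caller=svchost"))
unit_b = BaseUnit(2.0, 0.5, write_extended(store, b"key=HKLM\\Software"))
detail = read_extended(store, unit_a.extended)
```

In this sketch, rendering touches only `timestamp` and `duration`; the store is read only when specific user-selected information is requested.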
Metadata data structure 246 includes a number of sub-data structures, which aggregate commonalities of the base units and the extended units for the sub-set of data units forming a particular data collection 244 (discussed further below). In this manner, the metadata data structure helps to reduce the amount of data stored in the main memory. Examples of sub-data structures composing the metadata data structures include but are not limited to: pivot items/types 258 (discussed further below); friendly names of the data on the base units (base unit names 322); friendly names of the data on the extended units (extended unit names 324); and other items of information about a state and/or status of a property associated with particular data units and/or data collections, such as selection indicators 326 and color indicators 328.
Referring again to
A data collection set generator 248 generates sub-sets of data collections having pivot items 258 of the same type, referred to as data collection sets 260. Data collection sets 260 may be pre-generated or generated in response to user selections. For example, different views of the same data may be generated, such as distinct data collection sets for bus operations, core operations, and transfer speeds of CSUT 100. In addition, more than one data collection set (from more than one trace data set) may be processed concurrently.
An arrangement of data collection sets 260 is referred to as a data silo 220. A data silo 220 is an aggregation of data collection sets 260 from a given trace data set 110 or designated amount of data from a given trace data set 110. It also encapsulates other “silo wide” capabilities, such as skew and offset, that facilitate synchronizations between traces (such silo-wide capabilities are discussed in examples further below, following the discussion of
One or more structured representations 250 may be generated based on a particular data collection set 260. A structured representation is generally created based on processing of one or more declarative language instructions, and may include different nodes corresponding to different data collections 244. In an exemplary scenario, a structured representation includes nodes 251 based on unique, user-selectable permutations of pivot items 258 of the data collections 244 of a particular data collection set 260. User input may be used to re-order a data collection set representation based on a different ordering of pivot items 258. Nodes 251 of structured representation 250 may be processed (queried, for example, using any known or later developed query language or technique) to identify one or more data units 242 associated with each processed node.
A descriptor data structure 291 includes information such as the different drawing modes 271 (discussed further below) that are associated with a particular data collection set 260, and/or information that facilitates indirect retrieval of items of data from the data units of the data collections. For example, a matrix 282 may be included within descriptor data structure 291, and be invoked (for example, by visualization engine 202, which is discussed further below) to indirectly retrieve stored data values for data units 242 associated with particular nodes 251 of a particular structured representation 250. In one possible implementation, matrix 282 is an index-to-label matrix populated with desired labels expressing pivot items/types 258 and/or other items of data used to determine associated data unit data structures (or vice versa), including but not limited to unique identifiers 450, first data structure 401, second data structure 402, third data structure 403, fourth data structure 404, or fifth data structure 405.
As shown, visualization engine 202, which is responsible for visually representing defined events 107 based on data model 141, includes a data collection set transformer 257 and a presenter 224, although it will be appreciated that more, fewer, or different components are possible.
Data collection set transformer 257 is responsible for transforming information regarding defined events 107 associated with data units 242 of one or more data collection sets 260 (associated with one or more data silos 220) into data that defines, among other things, coordinates for a set of geometric objects 229 (such as points, rectangles, etc.) for each data collection set 260 in accordance with a particular drawing mode 271. A particular data collection set 260 is generally bound (in an exemplary implementation, via descriptor data structure 291) to one or more drawing modes 271. Data collection set transformer 257 (or another aspect of visualization engine 202) may also be responsible for validating information provided by data access engine 204 for legal syntax against one or more schemas and/or templates.
A particular drawing mode 271 encompasses information such as system axes, coordinate systems, scales, geometries, etc. used for rendering visual information associated with a particular data collection set 260, including but not limited to geometric objects 229 (discussed further below) and user-selectable objects (“USOs”) 226 (also discussed further below), to a user. Presenter 224 is responsible for visually representing aspects of a particular drawing mode 271 to a user.
In one exemplary scenario, for a particular data collection set 260, a particular set of n-dimensional geometric objects 229 is rendered in an n-dimensional space/coordinate system based on coordinates 261 derived by data collection set transformer 257. A particular geometric object 229 corresponds to a particular data unit 242 associated with a particular node 251 of a particular structured representation 250 of a particular data collection set 260. Examples of geometric objects include but are not limited to points, rectangles, volumes, lines, curves, charts, etc.
For a particular drawing mode 271, coordinates 261 of a particular geometric object 229 are specified using one or more expressions 262 that have one or more variables 263 as operands. It is noted that a geometric object may have a single coordinate, and that a stand-alone variable may constitute an expression. Variables 263 represent items of data 411, 412, 413, 414, 415 of particular data units 242 within a data collection set 260, and the variables are used to indirectly retrieve data values 312 represented thereby. A particular data collection set 260 is generally bound to a particular matrix 282, which may be invoked to indirectly retrieve data values 312 from appropriate computer-readable storage media 310/311.
Once data values 312 have been retrieved, expressions 262 are evaluated to obtain coordinate values at which a particular geometric object 229 is to be visually rendered. Such indirection enables dynamic switching between drawing modes/coordinate systems, data silos, data collection sets, and/or data collections with little impact on user experience—achieving near real-time responsiveness to user input that affects manipulation and visualization of information. Moreover, visual representations need not be pre-created and stored for future user queries—they may be created and changed on-the-fly.
In one exemplary scenario, user input is received via user selectable objects (“USOs”) 226. USOs 226 represent presentation tools or controls in the form of visible objects presented via a user interface (a user interface 716 is shown and discussed further below, in connection with
USOs 226 are selectable to access interactive functionality of DVAS 101. Examples of interactive functionality of DVAS 101 include but are not limited to: dynamic switching between drawing modes/coordinate systems, data silos, data collection sets, and/or data collections; navigating sets of geometric objects to view and/or toggle between different selections; enabling/disabling synchronization between contents of the presentations of different sets of geometric objects; selecting/changing the way in which the same data is visualized (e.g., changing the permutation order of the nodes of a representation of a particular data collection set); selecting/changing the perceptual enhancements (e.g., highlighting, color schemes, etc.) of selected and/or unselected geometric objects within various drawing modes; and dynamically highlighting and presenting detailed event information 109 within trace data set 110 pursuant to certain user input, such as hovering over certain geometric objects.
It will be appreciated that numerous, virtually unlimited, data collection sets 260 and/or drawing modes 271 therefor can be devised for a particular data silo 220, enabling the rapid, responsive, and efficient visualization of performance of one or more CSUTs 100 or components thereof in a manner that would be too difficult, impractical, or complex to accomplish with reference to raw trace data sets 110.
With continuing reference to
The method begins at block 500, and continues at block 502, where trace data, such as trace data set 110, is identified.
Next, at block 504, the trace data is parsed (by one or more data readers 240, for example) to implicitly or explicitly populate a plurality of data units, such as data units 242, with items of data, such as items of data 411, 412, 413, 414 and optionally 415. Each data unit is associated with a particular defined event 107 and has a number of data structures associated therewith (such as data structures 401, 402, 403, 404, and optionally 405), which are populated with corresponding items of data. The data structures associated with a particular data unit 242 may be part of the same or different physical or logical constructs.
At block 506, data collections, such as data collections 244, are generated from sub-sets of data units. Each data unit of a data collection has at least one common item of data, referred to as a pivot item, and each pivot item has a pivot type (for example, pivot items/types 258).
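The grouping performed at block 506 can be sketched as follows. This is a minimal illustration, assuming data units are plain dictionaries and that a data collection is recorded as a list of data unit indices; the sample records and the function name `build_collections` are hypothetical.

```python
from collections import defaultdict

# Hypothetical data units; the keys act as pivot types, the values as pivot items.
units = [
    {"event": "read", "component": "core0", "timestamp": 1.0},
    {"event": "write", "component": "core0", "timestamp": 2.0},
    {"event": "read", "component": "core1", "timestamp": 3.0},
    {"event": "read", "component": "core0", "timestamp": 4.0},
]


def build_collections(units, pivot_types):
    """Group data unit indices by the pivot items they have in common."""
    collections = defaultdict(list)
    for index, unit in enumerate(units):
        pivot_items = tuple(unit[pivot_type] for pivot_type in pivot_types)
        collections[pivot_items].append(index)
    return dict(collections)


collections = build_collections(units, ("event", "component"))
print(collections)
```

Storing indices rather than copies of the units keeps each collection small, in the spirit of the index-based matrix described earlier.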
A data collection set, such as data collection set 260, is generated, as indicated at block 508. A data collection set includes a sub-set of data collections, each data collection having at least one pivot type in common, such as an event type, component type, or data value type.
At block 510, one or more structured representations of the data collection set, such as structured representations 250, are generated. A structured representation has nodes, and a particular node corresponds to a particular data collection, that is, a particular unique (and generally user-selectable) permutation of pivot items/types 258. As indicated at block 512, at least some of the nodes are processed to identify one or more data units associated with each processed node.
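Blocks 510 and 512 can be illustrated with a small tree whose levels follow a chosen ordering of pivot types. The sketch below is an assumption-laden illustration, not the declarative-language representation described above: nodes are nested dictionaries, and the names `build_tree` and `units_under` are hypothetical.

```python
# Hypothetical data units; the keys act as pivot types, the values as pivot items.
units = [
    {"event": "read", "component": "core0"},
    {"event": "write", "component": "core0"},
    {"event": "read", "component": "core1"},
    {"event": "read", "component": "core0"},
]


def build_tree(units, pivot_order):
    """Nest data unit indices under one node per pivot item, level by level."""
    root = {"children": {}, "units": []}
    for index, unit in enumerate(units):
        node = root
        for pivot_type in pivot_order:
            item = unit[pivot_type]
            node = node["children"].setdefault(item, {"children": {}, "units": []})
        node["units"].append(index)
    return root


def units_under(node):
    """Process a node: collect every data unit index at or below it."""
    found = list(node["units"])
    for child in node["children"].values():
        found.extend(units_under(child))
    return found


by_component = build_tree(units, ("component", "event"))
by_event = build_tree(units, ("event", "component"))  # a re-ordered view
print(sorted(units_under(by_component["children"]["core0"])))
print(sorted(units_under(by_event["children"]["read"])))
```

Rebuilding the tree with a different `pivot_order` corresponds to the user re-ordering the representation; the underlying data units are not copied or changed.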
At block 514, a coordinate system, such as coordinate(s) 261, which is generally associated with a particular drawing mode 271, is identified for visually rendering geometric objects, such as geometric objects 229, which correspond to the identified data units. In one exemplary implementation, a data structure, such as descriptor 291, includes information regarding the relationship between different data collection sets 260 and drawing modes 271.
A particular geometric object is specified within the coordinate system using at least one coordinate value, such as a coordinate value 261, which is specified by an expression, such as an expression 262, having an operand in the form of a variable, such as a variable 263, which represents an item of data of a particular identified data unit. In one exemplary implementation, the variable represents the item of third data, representing the timestamp of the particular identified data unit. Other coordinate values may be specified by other expressions that have one or more operands in the form of variables used to indirectly retrieve data values represented by items of data of particular data units. For example, such expressions may have variable operands representing one or more items of fourth data—that is, one or more metrics associated with a particular data unit.
As indicated at block 516, for a particular identified data unit, the variable is used to indirectly retrieve, from a computer-readable storage medium (such as persistent memory 310 or main memory 311), a particular data value, such as a data value 312, associated with the particular item of data represented by the variable 263. In one exemplary implementation, a matrix, such as matrix 282, is used to indirectly retrieve stored data values for particular data units.
The expression is evaluated based on the retrieved data value, to obtain the coordinate value for the particular geometric object, as indicated at block 518, and the coordinate value is used to visually render the geometric object, as indicated at block 520.
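Blocks 514 through 518 can be sketched as follows. This is a simplified illustration under several assumptions: stored data values are rows of numbers, matrix 282 is modeled as a label-to-column mapping, and each expression is a Python callable that receives a variable resolver. The names `values`, `matrix`, `lookup`, `coordinates`, and the two sample drawing modes are hypothetical.

```python
# Stored data values, one row per data unit (the column layout is a storage detail).
values = [
    [10.0, 2.5, 1.0],
    [14.0, 0.5, 2.0],
]

# Hypothetical stand-in for matrix 282: variable label -> column index.
matrix = {"timestamp": 0, "duration": 1, "lane": 2}


def lookup(unit_index, variable):
    """Indirectly retrieve the data value that a variable represents."""
    return values[unit_index][matrix[variable]]


# Each coordinate is an expression whose operands are variables.
start_mode = {"x": lambda v: v("timestamp"), "y": lambda v: v("lane")}
end_mode = {"x": lambda v: v("timestamp") + v("duration"), "y": lambda v: v("lane")}


def coordinates(unit_index, drawing_mode):
    """Evaluate every coordinate expression of a drawing mode for one data unit."""
    def resolve(variable):
        return lookup(unit_index, variable)

    return {axis: expression(resolve) for axis, expression in drawing_mode.items()}


print(coordinates(0, start_mode))
print(coordinates(0, end_mode))
```

Because the expressions name variables rather than storage locations, switching from `start_mode` to `end_mode` requires no change to the stored values, which is the kind of indirection the method relies on for dynamic switching between drawing modes.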
As an example, assume that there is an “interval tick” drawing mode available for a particular data collection set 260, which specifies a 2-dimensional coordinate system in which rectangles having coordinates (X, Y, X+W, Y+H) are rendered. The following is a description, in pseudo-code form, of an exemplary interval tick drawing mode:
Interval Tick {DefaultHighlight, H(P1), W (P0)}, where P0 represents an item of data (for example, a metric such as duration) for a particular data unit 242 of a data collection set, and P1 is a fixed value (such as “1”) used to represent a particular data processing operation.
To visually render a rectangle for a particular data unit/event of the data collection set, visualization engine 202 invokes descriptor 291 associated with the particular data collection set, to indirectly retrieve data values described in the interval tick drawing mode. Once the data values are fetched from the data units, the data values may be used to draw a rectangle having coordinates (X, Y, X+W, Y+H), which results in a rectangle with a fixed height corresponding to the fixed value, and a width corresponding to the metric (for example, duration).
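The arithmetic of the interval tick example can be written out directly. The sketch below assumes that X is the event timestamp and Y is a row position chosen by the drawing mode; neither assumption is stated in the pseudo-code above, and the function name `interval_tick` is hypothetical.

```python
def interval_tick(x, y, p0, p1=1.0):
    """Return (X, Y, X+W, Y+H), with W taken from P0 and H from the fixed P1."""
    width, height = p0, p1
    return (x, y, x + width, y + height)


# Assumed inputs: timestamp 10.0, row 3.0, and a duration metric of 2.5.
rect = interval_tick(x=10.0, y=3.0, p0=2.5)
print(rect)
```

Every rectangle produced this way has the same height, so only its horizontal extent varies with the metric.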
The method illustrated in
Multi-System Trace Visualization Synchronization of System Wide Activity
In distributed computing environments, users need tools to derive correlations of events that happened on different systems, for example in different computers or networks. Synchronization of such events becomes a problem due to different technological challenges intrinsic to the diverse and sparse environments, such as clock skews, precision, event alignment, etc.
These problems may be solved via aspects of DVAS 101. In one exemplary implementation, a sequence of primitives may be applied to the original (raw) trace data, detecting the problems and “fixing” them by approximation. To accomplish this, the following techniques may be applied: determine a defined event on each source that can be correlated across sources; make that event the “reference event” for each source; align every source by the reference event (t=0); and determine a “stop event” on each source and make the stop events “reference events” as well.
Due to clock skews and event resolution mismatches, it is very likely that each segment on each data source, determined by the start and stop reference events, will not match in size. Therefore the segments may have to be scaled and aligned to match in size. Any known or later developed skewing algorithms may be used to accomplish this.
Different approaches may be used to determine the reference events. The determinations may be user-driven (that is, a user chooses the points) or automatically inferred. To automatically infer the points, a reliable synchronization event (for example, a timestamp event) can be identified on each source so the system can attempt to match them.
The heartbeat approach is another way to increase the reliability of the segment detection. From time to time, the systems will generally log a particular (heartbeat) event, and that event can be used to align the segments. The higher the heartbeat frequency, the higher the alignment precision will be.
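One way to picture the align-and-scale step is the following Python sketch. It assumes a simple linear scaling between the start and stop reference events, which is only one of many possible skew corrections; the function name `align_timestamps` and the sample timestamps are hypothetical.

```python
def align_timestamps(timestamps, start_ref, stop_ref, target_length):
    """Map one source's timestamps onto a shared timeline.

    The start reference event becomes t = 0, and the segment between the
    start and stop reference events is linearly scaled to target_length.
    """
    span = stop_ref - start_ref
    if span <= 0:
        raise ValueError("stop reference must come after start reference")
    scale = target_length / span
    return [(t - start_ref) * scale for t in timestamps]


# Two hypothetical sources whose clocks disagree in offset and rate.
source_a = [100.0, 150.0, 200.0]     # reference events at 100 and 200
source_b = [5000.0, 5055.0, 5110.0]  # reference events at 5000 and 5110

aligned_a = align_timestamps(source_a, 100.0, 200.0, target_length=100.0)
aligned_b = align_timestamps(source_b, 5000.0, 5110.0, target_length=100.0)
```

With periodic heartbeat events, the same function could be applied to each pair of consecutive heartbeats, so that each short segment is corrected independently.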
Sparse Layered Representation of System Wide Activity
Users desire tools to visually recognize patterns of system activity and causality. A very common problem in performance optimization is finding patterns that are executed sub-optimally, especially if those patterns are executed many times.
For example, suppose a user suspects that an application is overusing the registry. He needs tools to investigate when (how often) and how (which operations) registry accesses were performed. He needs to be able to identify those recurring patterns easily (if he is correct in his suspicion, the application will be accessing the registry heavily).
In one possible implementation, each registry key may be represented by a different color, and in a different position in the Y axis. The recurring patterns become readily visually apparent, reducing dramatically the amount of time necessary to find performance issues.
Causality can be inferred by this approach as well. Operations are stacked along the execution timeline, each with a unique representation, so that when a sequence of operations repeats, the operations appear to the user in the same order. The sparse layered representation includes a visual representation of hierarchical (tree representation) data on a timeline, where each event instance of a leaf node of the tree is assigned an (X, Y) coordinate in a visual display. The X coordinate corresponds to the event (data unit) timestamp. The Y coordinate represents the count of unique leaf nodes found up to a given point in time in the X coordinate. That is, once an event that establishes a new leaf node is found, a Y coordinate equal to the total count of unique leaf nodes found so far plus one is assigned to that leaf, and all subsequent events from that same leaf are presented at the same Y coordinate. Such a sparse representation facilitates the perception of recurrence patterns, which may indicate a causal relationship between events that occur in a given approximate order and in visual proximity.
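The coordinate assignment described above can be sketched as follows. This is a hedged illustration: events are assumed to be (timestamp, leaf key) pairs, the leaf key is assumed to be a string identifying the full tree path, and the names `assign_sparse_layers` and `events` are hypothetical.

```python
def assign_sparse_layers(events):
    """Assign (x, y) display coordinates to (timestamp, leaf_key) events.

    x is the event timestamp. y is fixed the first time a leaf is seen,
    as the count of unique leaves seen so far plus one, and is reused for
    every later event from the same leaf.
    """
    row_of_leaf = {}
    points = []
    for timestamp, leaf in sorted(events, key=lambda e: e[0]):
        if leaf not in row_of_leaf:
            row_of_leaf[leaf] = len(row_of_leaf) + 1
        points.append((timestamp, row_of_leaf[leaf]))
    return points, row_of_leaf


# Hypothetical registry events; the sequence open -> read repeats.
events = [
    (1, "HKLM/Run/open"),
    (2, "HKLM/Run/read"),
    (3, "HKCU/Env/read"),
    (4, "HKLM/Run/open"),
    (5, "HKLM/Run/read"),
]
points, rows = assign_sparse_layers(events)
```

Because each leaf keeps its row, the repeated open-then-read sequence draws the same rising two-point shape each time it occurs, which is what makes the recurrence visible.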
Multi-Dimensional Axis Synchronization of System Wide Activity Visual Representation of Trace Data
In complex environments users will be dealing with many different data collection sets and/or drawing modes and attempting to correlate them. The more complex the environment, the harder it is for the user to keep track of the visual representations. Using DVAS 101, it is possible to synchronize the contents of the X, Y or X+Y axes. In other words, when a user pans a view to the right, all related content (and only the related content) will follow suit. The same applies to the Y axis. The user may also combine different views and synchronize on only one of the axes, e.g. X. That means that even two 2D graphs can be synchronized, say, by time.
To accomplish this, an array of index types may be defined that specifies what kind of data each axis holds. The drawing mode provides the mapping between the actual values and the “units” they represent. If the units of an axis match across drawing modes, those axes can be tracked together. That means if the user applies a zoom, pan, or any other operation that changes the coordinate system (or other aspects of what the user sees), the operation can be propagated to all other drawing modes currently being displayed, affecting them as well.
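A minimal sketch of unit-based propagation follows, assuming each displayed view records a unit type and a visible range per axis. The `View` class, the `pan` function, and the unit names are hypothetical and are not part of DVAS 101 as described.

```python
class View:
    """A displayed drawing mode with a unit type and visible range per axis."""

    def __init__(self, name, axis_units, ranges):
        self.name = name
        self.axis_units = dict(axis_units)  # e.g. {"x": "time", "y": "thread"}
        self.ranges = dict(ranges)          # e.g. {"x": (0.0, 100.0), ...}


def pan(views, source, axis, delta):
    """Pan `source` along `axis` and propagate to views with matching units."""
    unit = source.axis_units[axis]
    for view in views:
        if view is source or view.axis_units.get(axis) == unit:
            lo, hi = view.ranges[axis]
            view.ranges[axis] = (lo + delta, hi + delta)


threads = View("Process and Threads", {"x": "time", "y": "thread"},
               {"x": (0.0, 100.0), "y": (0.0, 10.0)})
registry = View("Registry Usage", {"x": "time", "y": "key"},
                {"x": (0.0, 100.0), "y": (0.0, 50.0)})
sizes = View("Allocation Sizes", {"x": "bytes", "y": "count"},
             {"x": (0.0, 4096.0), "y": (0.0, 20.0)})

views = [threads, registry, sizes]
pan(views, threads, "x", 25.0)  # time axis: registry follows, sizes does not
pan(views, threads, "y", 2.0)   # thread axis: no other view shares this unit
```

A zoom could be propagated the same way by rescaling the range instead of shifting it.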
Another advantage of this approach is the ability to correlate different data sets on key variables. For example, if the user opens a “Registry Usage” view (that shows which keys are being used) at the same time he opens “Process and Threads”, as both are synchronized by time, by focusing on the key he is looking for the other window will shift and display what process is accessing that key at that very same time. All is done automatically and in real time.
Exceptions to this approach may also be defined, and the user may enable/disable them as desired. For example, a user can open two views of “Process and Threads” and view different sections of the process activity. If the modes above are in effect, when he scrolls one of them, the other can be detached and the user can navigate freely.
Bi-Directional Manipulation of System Wide Trace Data and Trace Visual Representation
This exemplary implementation allows a user to view neighbor events on the system by selecting a target event on the screen. The user may also specify how wide his search is, in other words, how far from the target his correlations will span. To accomplish this, data collections are traversed, and items of data that match the user's criteria are found, highlighted, and displayed in a textual table. Note that highlighting is just one of the many possible ways of displaying the results. It is also possible to toggle from the textual representation to the “wide search” results. For example, “Next >” and “<Back” arrows may be enabled in the textual representation, with the highlights being updated on the fly at the same time.
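The neighbor search and the shared Next/Back cursor might be sketched as below. This is an assumption-laden illustration: it presumes event timestamps are kept sorted, treats the search width as a time distance, and uses hypothetical names (`neighbors`, `Selection`).

```python
from bisect import bisect_left, bisect_right


def neighbors(timestamps, target_index, width):
    """Return indices of events within `width` time units of the target.

    `timestamps` must be sorted ascending; the target itself is included.
    """
    t = timestamps[target_index]
    lo = bisect_left(timestamps, t - width)
    hi = bisect_right(timestamps, t + width)
    return list(range(lo, hi))


class Selection:
    """Shared cursor that keeps a table row and a graph highlight in step."""

    def __init__(self, hits):
        self.hits = hits
        self.pos = 0

    def current(self):
        return self.hits[self.pos]

    def next(self):
        self.pos = min(self.pos + 1, len(self.hits) - 1)
        return self.current()

    def back(self):
        self.pos = max(self.pos - 1, 0)
        return self.current()


timestamps = [10, 12, 15, 20, 21, 30]
hits = neighbors(timestamps, target_index=3, width=5)
selection = Selection(hits)
```

Both the textual table and the graphical view would read `selection.current()` after each Next or Back action, so the highlighted event and the selected row always refer to the same data unit.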
The following is an exemplary algorithm for syncing textual and graphical representation, in pseudo-code form:
Multi-Pivoting Hierarchical Aggregation and Manipulation of Distinct System Activity Representation that Share Common Properties
A user will often desire to correlate diverse data without building complex indexing schemes or pre-processing the answers. In many cases, the same data can be viewed from many different angles. For example, for a list of processes and registry key access, a user could ask:
Q1: “What registry keys did process P access?”
Q2: “What processes accessed registry key K?”
DVAS 101 provides the ability to pivot data collection sets based on different permutations of pivot items and/or types. The following technique may be used: in the metadata data structure an array of labels is defined, with each item of the array defining a node that represents the level in the hierarchy. A list of possible pivot items (list of labels) is provided, as well as what indexes are used to build the tree. Alternatively, the generation of the list of possible pivots can be done automatically, by creating all or a subset of the permutations. When the user selects a new pivot representation, the hierarchical view of the item of the array may be cleared, and the matrix 282 may be used to rebuild the tree by querying the data collection about each of the labels, following the order defined in the index list. If an item is not there, it can be added as a new level of the tree, and if an item is there, it can be added to the existing parent determined by the previous label index. Once the tree is built for each node, all data collections may be aggregated to that level if they have the same path. That way all items on the leaf node will be left at the same level.
In the exemplary questions Q1 and Q2 above, consider a “Registry per Process” data collection set. The data collection set is built with the following labels: (a) Process, (b) Key and (c) Operation. A default view may show all the processes, and it is possible to dive into the process details of what registry keys it operated on, answering question Q1, accomplished by passing the label index list as [0, 1, 2]. To be able to see what processes operated on a specific key, by passing the label index list as [1, 0, 2], a new tree will be built with the Key as the top level item. That will answer Q2.
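The effect of the label index list can be illustrated with a short Python sketch. It assumes each data collection is a tuple of label values in the order (Process, Key, Operation) and aggregates identical paths by counting them; the function `build_pivot_tree` and the sample rows are hypothetical, and matrix 282 is not modeled.

```python
def build_pivot_tree(rows, label_order):
    """Build a nested dict tree from rows, one level per label index.

    rows: tuples of label values, e.g. (process, key, operation).
    label_order: label indexes in hierarchy order, e.g. [0, 1, 2].
    Leaves hold the number of rows that share the same path.
    """
    tree = {}
    for row in rows:
        node = tree
        for index in label_order[:-1]:
            node = node.setdefault(row[index], {})
        leaf = row[label_order[-1]]
        node[leaf] = node.get(leaf, 0) + 1
    return tree


KEY_A = "HKLM/Software/A"  # hypothetical registry keys
KEY_B = "HKLM/Software/B"

rows = [
    ("app.exe", KEY_A, "read"),
    ("app.exe", KEY_A, "read"),
    ("app.exe", KEY_B, "write"),
    ("svc.exe", KEY_A, "read"),
]

by_process = build_pivot_tree(rows, [0, 1, 2])  # Q1: keys per process
by_key = build_pivot_tree(rows, [1, 0, 2])      # Q2: processes per key
```

The same rows answer both questions; only the order in which labels are consulted changes, so no second index or precomputed view is needed.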
Perceptual Enhancement of Selection Highlighting for High Density System Activity Data Representation Graphs
Oftentimes users attempt to infer new conclusions about problems where the root cause is unknown. To help the user understand what kind of data is being handled, DVAS 101 enables matching selections and visual representations of the data.
When the user is analyzing an ordinary computing system, he will be dealing with many different components, such as processes, threads, registry keys, interrupts, etc. To make it more challenging, all this information can be displayed at the same time in the various visual representations the user is dealing with. It is possible to filter the data the user is interested in, but often unselected data also stays in the visual representations. Different color schemes may be used to represent the selected and unselected data points. That allows a user not just to visualize his data but also to compare it against other data that is not filtered. For example, if a user selects to see where a process executed, he can compare this against all the other processes and see how much this process is interfering with the others or vice versa. Note that selecting a process is just an example. This principle should allow the user to select various data points, tag them, choose the display modes, compare strategies, etc.
Dynamic Visual Highlighting and Presentation of Detailed Information During Mouse Hover for High Density System Activity Data Representation Graphs
Once a user spots a point in a visual representation that he is interested in, he may want to receive extra details about the segment he is looking at. That problem is complicated not just by the density of the data, but also by the disparity (diversity) of the data to be detailed.
For example, when hovering over a visual representation of a process, a user wants more information about that process at only that specific point in time. He may want to see the global process information such as process ID, name, etc., but more importantly he wants to see what the process is doing at that instant in time. Some useful information that can be retrieved and displayed for this example would be: (a) the date/time at which the process was scheduled to run in this round; (b) the priority with which the process was scheduled onto the processor; (c) what the process is doing, including (c.1) what module is running and (c.2) what the call stack is, and so on; (d) how much memory is allocated at that time; etc.
DVAS 101 is able to retrieve and present this data in a very efficient and generic way, including retrieving some information from base units 301 and possibly from extended units 303, allowing different drawing modes to bring different data upon user request.
Segment Resource Usage Representation Over a Timeline
There are certain resources (for example, scarce resources) that are formally allocated to processes for use. RAM and disk storage are common examples, although not an exhaustive list, of such resources. A common problem users face is that resources are allocated and not freed, resulting in defects referred to as “leaks.” For memory, for instance, these defects are commonly referred to as “memory leaks.”
Tracking scarce resources may be important for analyzing problems resulting from leaks. The user may want to know when the resource was allocated, when it should have been freed, when it was actually freed (if ever freed), and so on. DVAS 101 enables this problem to be solved in an effective and visually compelling way to make the resource usage readily apparent.
In one exemplary implementation, the concept of a “heat map” is used, where resources are plotted on a two-dimensional matrix. For example, for memory, column X would represent the memory bank, Y would represent the memory address within the bank, and the coordinate [X, Y] would represent the allocation of that memory block. For example, [X, Y] may be a bit that, when set, means allocated and, when not set, means free. It will be appreciated that representing the map with bits is just one possibility. Virtually any data structure may be referenced by the matrix.
Snapshots of the heat map are taken for each unit of time, producing many heat maps indexed over time, and a 3D structure that represents the actual allocation over time can be created. Another possibility would be to create a “stream” version of the data that could be played so that a user could see the evolution over time. The user, however, may not be able to easily spot differences over long periods of time. One possible approach is to apply a transformation over the two-dimensional structure to create a one-dimensional structure (or transform a three-dimensional structure to create a two-dimensional structure), which is easier to visualize. The transformed structure may be rendered as a regular data collection set without the need of any special processing and any correlation tools of DVAS 101 may be used.
Referring to the memory representation example above, instead of representing the memory as (bank, relative-address) in the two-dimensional structure, the memory layout may be flattened in accordance with the following pseudo-code, converting it into (absolute-address) [or “relative enough” address]: Mem(Bank, RelativeAddress) → Mem(AbsoluteAddress); Mem(Time, Bank, RelativeAddress) → Mem(Time, AbsoluteAddress).
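The flattening can be sketched in Python under the assumption that every bank holds the same number of addresses, so that the absolute address is the bank index times the bank size plus the relative address. The constant `BANK_SIZE`, the function names, and the sample snapshots are hypothetical.

```python
BANK_SIZE = 4  # hypothetical number of addresses per bank


def flatten(bank, relative_address, bank_size=BANK_SIZE):
    """Mem(Bank, RelativeAddress) -> Mem(AbsoluteAddress)."""
    if not 0 <= relative_address < bank_size:
        raise ValueError("relative address outside bank")
    return bank * bank_size + relative_address


def flatten_snapshots(snapshots, bank_size=BANK_SIZE):
    """Mem(Time, Bank, RelativeAddress) -> Mem(Time, AbsoluteAddress).

    snapshots maps a time index to the set of allocated (bank, address)
    pairs. Returns (time, absolute_address) points sorted for plotting.
    """
    return sorted(
        (time, flatten(bank, addr, bank_size))
        for time, allocated in snapshots.items()
        for bank, addr in allocated
    )


# Hypothetical heat-map snapshots: block (1, 2) is allocated and never freed.
snapshots = {
    0: {(0, 1)},
    1: {(0, 1), (1, 2)},
    2: {(1, 2)},
}
timeline_points = flatten_snapshots(snapshots)
```

Plotted with time on one axis and absolute address on the other, a block that is never freed shows up as a horizontal run that extends to the end of the trace.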
Optimization Algorithms
Multi-System Trace Alignment and Skew Optimization
When implementing the approach to align different trace data sets, either from different CSUTs or different logs of the same CSUT, the near real-time mapping of data may pose challenges. In one exemplary implementation, data conversions/transformations are only performed for the segments that will be visible to the user, to reduce computations. For example, if the user wants to see events in the range [1000, 2000], by reverting and converting the viewport to a set of world coordinates, the translation of the original data may be accomplished by just translating the viewport.
Similarly, when it is determined (by the system and/or user) that one system is slipped or skewed by a factor, that transformation may be applied to the system's viewport. Once the new viewport is calculated, all the transformations to “match” the events can happen automatically.
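The viewport-side transformation can be illustrated as follows. The sketch assumes each source is related to the shared timeline by an offset and a positive scale factor, and that raw timestamps are stored sorted; `SourceTransform` and `visible_events` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceTransform:
    """Maps local time to shared time: shared = (local - offset) * scale."""
    offset: float
    scale: float  # assumed positive

    def to_local(self, shared):
        return shared / self.scale + self.offset

    def to_shared(self, local):
        return (local - self.offset) * self.scale


def visible_events(local_timestamps, transform, view_start, view_end):
    """Return shared-time positions of only the events inside the viewport.

    The viewport is inverted into the source's local clock, so the sorted
    raw timestamps are searched directly and only the hits are converted.
    """
    lo = bisect_left(local_timestamps, transform.to_local(view_start))
    hi = bisect_right(local_timestamps, transform.to_local(view_end))
    return [transform.to_shared(t) for t in local_timestamps[lo:hi]]


raw = [5000.0, 5500.0, 6000.0, 6500.0, 7000.0]
skewed = SourceTransform(offset=5000.0, scale=2.0)
shown = visible_events(raw, skewed, 1000.0, 2000.0)
```

Only two endpoint conversions and the visible events are computed per redraw, regardless of how many events the source contains outside the viewport.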
High Density System Activity Visual Representation Drawing Optimization
When drawing data of large segments, many “collisions” of data will happen. That means that although the events are not that close to each other in time, they may appear to overlap on the display. The amount of computation may be reduced by finding those collisions and removing them.
The following algorithms, presented in pseudo-code form, filter these collisions and increase the performance of DVAS 101. For each of the optimizations, the drawing mode varies on X and Y, where X is the width and Y is the height of the screen.
Point Optimization
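As a hedged illustration of collision filtering for point data, and not a reproduction of the pseudo-code referred to above, the following Python sketch keeps at most one data point per screen pixel. The function `dedupe_points`, its parameters, and the sample points are hypothetical.

```python
def dedupe_points(points, x_range, y_range, width, height):
    """Keep one data point per screen pixel.

    points: iterable of (x, y) in data coordinates.
    x_range, y_range: (min, max) of the visible data window.
    width, height: screen size in pixels (X and Y in the text above).
    """
    (x0, x1), (y0, y1) = x_range, y_range
    seen = set()
    kept = []
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the visible window
        px = min(int((x - x0) / (x1 - x0) * width), width - 1)
        py = min(int((y - y0) / (y1 - y0) * height), height - 1)
        if (px, py) not in seen:
            seen.add((px, py))
            kept.append((x, y))
    return kept


points = [(0.0, 0.0), (0.4, 0.0), (5.0, 0.0), (5.2, 0.0), (9.9, 1.0)]
drawn = dedupe_points(points, (0.0, 10.0), (0.0, 2.0), width=10, height=2)
```

Points that land on an already occupied pixel are skipped, so the number of draw calls is bounded by the number of pixels rather than by the number of events.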
Simple Tick Optimization
Bar Optimization
Bar Bi-Directional Optimization
With continuing reference to
One or more components shown in
Communication interface(s) 710 are one or more physical or logical elements that enhance the ability of operating environment 700 to receive information from, or transmit information to, another operating environment (not shown) via a communication medium. Examples of communication media include but are not limited to: wireless or wired signals; computer-readable storage media; computer-executable instructions; communication hardware or firmware; and communication protocols or techniques.
Specialized hardware/firmware 742 represents any hardware or firmware that implements functions of operating environment 700. Examples of specialized hardware/firmware 742 include encoder/decoders (“CODECs”), decrypters, application-specific integrated circuits, secure clocks, and the like.
A processor 702, which may be one or more real or virtual processors, controls functions of operating environment 700 by executing computer-executable instructions 706 (discussed further below).
Computer-readable media 704 represent any number and combination of local or remote components, in any form, now known or later developed, capable of recording, storing, or transmitting computer-readable data, such as instructions 706 (discussed further below) executable by processor 702. In particular, computer-readable media 704 may be, or may include, persistent memory 310 and main memory 311, and may be in the form of: a semiconductor memory (such as a read only memory (“ROM”), any type of programmable ROM (“PROM”), a random access memory (“RAM”), or a flash memory, for example); a magnetic storage device (such as a floppy disk drive, a hard disk drive, a magnetic drum, a magnetic tape, or a magneto-optical disk); an optical storage device (such as any type of compact disk or digital versatile disk); a bubble memory; a cache memory; a core memory; a holographic memory; a memory stick; or any combination thereof. Computer-readable media 704 may also include transmission media and data associated therewith. Examples of transmission media/data include, but are not limited to, data embodied in any form of wireline or wireless transmission, such as packetized or non-packetized data carried by a modulated carrier signal.
Computer-executable instructions 706 represent any signal processing methods or stored instructions that electronically control predetermined operations on data. In general, computer-executable instructions 706 are implemented as software programs according to well-known practices for component-based software development, and encoded in computer-readable media (such as one or more types of computer-readable storage media 704). Software programs may be combined or distributed in various ways.
User interface(s) 716 represent a combination of presentation tools and controls that define the way user 111 interacts with operating environment 700. One type of user interface 716 is a graphical user interface (“GUI”), although any known or later developed type of user interface is possible. Presentation tools are used to receive input from, or provide output to, a user. An example of a physical presentation tool is a display such as a monitor device. An example of a logical presentation tool is a data organization technique (for example, a window, a menu, or a layout thereof). Controls facilitate the receipt of input from a user. An example of a physical control is an input device such as a remote control, a display, a mouse, a pen, a stylus, a trackball, a keyboard, a microphone, or a scanning device. An example of a logical control is a data organization technique (for example, a window, a menu, or a layout thereof) via which a user may issue commands. It will be appreciated that the same physical device or logical construct may function as an interface for both inputs to, and outputs from, a user.
Various aspects of an operating environment and an architecture/techniques that are used to implement aspects of DVAS 101 have been described. It will be understood, however, that all of the described elements need not be used, nor must the elements, when used, be present concurrently. Elements described as being computer programs are not limited to implementation by any specific embodiments of computer programs, and rather are processes that convey or transform data, and may generally be implemented by, or executed in, hardware, software, firmware, or any combination thereof.
Although the subject matter herein has been described in language specific to structural features and/or methodological acts, it is also to be understood that the subject matter defined in the claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
It will further be understood that when one element is indicated as being responsive to another element, the elements may be directly or indirectly coupled. Connections depicted herein may be logical or physical in practice to achieve a coupling or communicative interface between elements. Connections may be implemented, among other ways, as inter-process communications among software processes, or inter-machine communications among networked computers.
The word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any implementation or aspect thereof described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other implementations or aspects thereof.
As it is understood that embodiments other than the specific embodiments described above may be devised without departing from the spirit and scope of the appended claims, it is intended that the scope of the subject matter herein will be governed by the following claims.