Big data applications often rely on a search for patterns in the data. Such patterns may be detected, for example, based on a visualization of the data.
Identifying data patterns is an important task for many big data applications, such as the investigation and analysis of security threats. Various visual techniques may be utilized to facilitate discovery of patterns. Some visual techniques may include an interactive visual representation of the data.
Generally, existing methods may aid analysis of the data by providing means to select portions of the visual representation for further analysis. However, visual representations of data are not at the pixel-level. For example, a pixel in the visual representation may not be representative of a data element. Accordingly, the selection of a portion of the visual representation may not be based on an actual data element.
Also, for example, existing methods for selection and/or extracting the portion of the visual representation may only allow for selection of regular shaped regions, such as rectangles. However, interesting data patterns may appear with arbitrary shapes in the visual representation.
Some interactive techniques may aid analysis of the data by providing means to highlight or delete portions of the visual representation for further analysis. However, such highlighting and/or deletion may generally result in a loss of useful information in the form of the underlying relations between data elements.
The interactive approach described herein is based on two-phase processing that allows users to analyze data patterns. For example, network hunters may visually detect threats and identify actionable insights. The two-phase processing may generally include clipping interesting patterns from a big graph for detailed analysis. There is often a need to remove an interesting pattern from a visual representation. Existing methods include rectangle rubber-banding. However, many interesting patterns are arbitrarily shaped. Separating such interesting patterns from a complex visual representation, and zooming in to provide a modified visual representation, are two important issues. For example, clipping interesting patterns may include cutting and zooming into, and/or extracting, an arbitrary region with a diagonal line in a graph for further analysis of the behavior of a port scan. For example, in data related to security, clipping may be utilized to zoom into a region of the visual representation where the data pattern is indicative of suspicious threats, and to conduct further analysis at an individual security record level (e.g., for an IP address).
Also, as described herein, certain portions of the visual representation may be highlighted to aid in an analysis of data patterns of interest to a user. For example, blurring of portions of the visual representation may be utilized to preserve respective data relations between the data elements. Accordingly, useful information in the form of the underlying relations between data elements may not be lost, while data elements of interest may be highlighted. In big data visualization, coloring plays an important role. However, there is generally no sufficient indication to identify portions of the data that may be ignored, and portions of the data that may be relevant to a user. Using a blurring technique, non-interesting data points may be blurred. For example, coloring may be utilized to apply blurring colors to help users ignore blurred data points, and focus on important data points in a big data graph.
Also, in some examples, each pixel of a visual representation may represent a data element. In some examples, each pixel may be associated with a pixel attribute to represent an attribute of a data element. Accordingly, a user may be provided access to data record level information. Generally, visualizations for big data are not interactive. Users are unable to interact with the data at the record level. For example, a user may not have access to messages related to a specific IP address to work at a record level analysis. Further, users may not be able to process data iteratively to identify root-cause. As described herein, the visual analytics workflow may be iterative for users to validate and refine their hypotheses. Also, as described herein, interactive visual analytics may be utilized to gain situational awareness of big data and visualize the security threats.
As described in various examples herein, a visually interactive and iterative analysis of data patterns by a user is disclosed. One example is a system including a display module and an interaction processor. The display module displays, via an interactive graphical user interface, a visual representation of a plurality of data elements and respective data relations between the data elements, and wherein each data element is represented by pixel attributes of a pixel. The interaction processor iteratively and interactively processes analysis by a user based on identifying selection, by the user, of an arbitrarily shaped region of the visual representation, clipping the selected region by zooming in to the selected region, identifying, in the clipped region, selection of data elements of interest to the user, and prompting the display module to automatically blur visual representations of data elements different from the data elements of interest by modifying the pixel attributes of respective pixels.
In the following detailed description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific examples in which the disclosure may be practiced. It is to be understood that other examples may be utilized, and structural or logical changes may be made without departing from the scope of the present disclosure. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims. It is to be understood that features of the various examples described herein may be combined, in part or whole, with each other, unless specifically noted otherwise.
The term “system” may be used to refer to a single computing device or multiple computing devices that communicate with each other (e.g. via a network) and operate together to provide a unified service. In some examples, the components of system 100 may communicate with one another over a network. As described herein, the network may be any wired or wireless network, and may include any number of hubs, routers, switches, cell towers, and so forth. Such a network may be, for example, part of a cellular network, part of the internet, part of an intranet, and/or any other type of network.
The system 100 displays, via an interactive graphical user interface, a visual representation of a plurality of data elements and respective data relations between the data elements, and wherein each data element is represented by pixel attributes of a pixel. The system 100 iteratively and interactively processes analysis by a domain expert based on identifying selection, by the user, of an arbitrarily shaped region of the visual representation, clipping the selected region by zooming in to the selected region, identifying, in the clipped region, selection of data elements of interest to the user, and prompting the display module to automatically blur visual representations of data elements different from the data elements of interest by modifying the pixel attributes of respective pixels.
System 100 includes a display module 104 to display, via an interactive graphical user interface 106, a visual representation of a plurality of data elements 102 and respective data relations between the data elements, where each data element is represented by pixel attributes of a pixel. Generally, the plurality of data elements 102 describes contents of a high-dimensional dataset. In some examples, the plurality of data elements 102 may be a cyber-security log file, proxy data, Web navigation logs (e.g., click stream), or healthcare data. In some examples, the plurality of data elements 102 may be representative of data related to a disease, the data covering a 12-hour period and including a terabyte of data elements.
In some examples, the visual representation may include a representation of each data element by a pixel. For example, the terabyte of data elements from the disease database may be represented graphically, where each pixel in the graph represents a data element. Also, for example, the plurality of data elements 102 may represent IP addresses that are logged into a secured network during a time interval, and the visual representation may be a graphical representation of the IP addresses, where each data pixel represents a record of an IP address.
In some examples, a pixel attribute associated with the pixel may represent a characteristic of the data element represented by the pixel. For example, the pixel attribute may be color, and a color scheme may be associated with a data element. In some examples, a color may be associated with an IP address, and each pixel representing the IP address may be associated with the respective color. In some examples, each pixel may represent a range of IP addresses and a color may be associated with the range.
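As an illustrative, non-limiting sketch, the association between an IP address (or a range of IP addresses) and a color may be expressed as follows. The hash-based hue assignment, the function names `color_for_ip` and `color_for_range`, and the default prefix length of 24 are assumptions made for illustration only and are not part of the disclosure.

```python
import colorsys
import hashlib
import ipaddress


def color_for_ip(ip: str) -> tuple[int, int, int]:
    """Map an IP address to a stable RGB color (hypothetical color scheme)."""
    # Validate the address; raises ValueError for malformed input.
    addr = ipaddress.ip_address(ip)
    # Hash the packed bytes so the same address always yields the same hue.
    digest = hashlib.sha256(addr.packed).digest()
    hue = digest[0] / 256.0
    r, g, b = colorsys.hsv_to_rgb(hue, 0.85, 0.95)
    return (round(r * 255), round(g * 255), round(b * 255))


def color_for_range(ip: str, prefix_len: int = 24) -> tuple[int, int, int]:
    """Return one color shared by every address in the same range."""
    # strict=False lets any host address stand in for its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return color_for_ip(str(network.network_address))
```

Under this sketch, every pixel representing a given IP address receives the same color, and `color_for_range` gives all addresses in one range a common color.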
As described herein, the visual representation may be interactive. For example, portions of the representation may be selected to display additional features of the data elements. For example, clicking on pixel 208 may cause a pop-up 210 to be displayed. In some examples, the pop-up may be overlaid on the visual representation. Also, for example, portions of the representation may be selected for zooming in, zooming out, highlighting, deleting, and so forth. Such a visual representation of data may allow for identification of patterns in big data.
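As an illustrative sketch of record-level access, a clicked pixel may be resolved back to the data element it represents. The row-major layout (one record per pixel, filled left to right and top to bottom) and the names `pixel_for_index` and `record_at` are assumptions for illustration; the disclosure does not prescribe a particular layout.

```python
def pixel_for_index(index, width):
    """Return the (row, col) of the pixel that displays record `index`."""
    return divmod(index, width)


def record_at(records, width, row, col):
    """Return the record under a clicked pixel, or None if the pixel is empty."""
    if row < 0 or not 0 <= col < width:
        return None
    index = row * width + col
    return records[index] if index < len(records) else None
```

The value returned by `record_at` could then populate a pop-up overlaid on the visual representation.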
Referring to
Generally, the user may be an individual in possession of domain knowledge. For example, the domain may be a retail store, and the user may be the store manager. Also, for example, the domain may be a hospital, and the user may be a member of the hospital management staff. As another example, the domain may be a casino, and the user may be the casino manager. Also, for example, the domain may be a secure office space, and the user may be a member of the security staff.
As described herein, the visual representation may be interactive and iterative. Interactive processing may be performed via the interactive graphical user interface 106 by providing the visual representation to the user, identifying selection of an arbitrarily shaped region of the visual representation by the user, and providing a modified visual representation to the user. The iterative processing may include identifying another selection, by the user, of an arbitrarily shaped region of the visual representation. For example, existing methods generally rubber-band an area. Such regions may be regular shaped patterns, such as a rectangle. However, interesting patterns may appear in an arbitrarily shaped region. As described herein, system 100 facilitates selection of an arbitrarily shaped region. The user may select, modify, and/or deselect the arbitrarily shaped region, and the interaction processor 108 in communication with the display module 104, may iteratively process such selection, modification, and/or deselection to generate a modified visual representation to be provided to the user via the interactive graphical user interface 106.
In some examples, the interaction processor 108 may provide the modified visual representation of the sub-plurality of data elements by clipping the selected region by zooming in. For example, the user may want to select data elements of interest (e.g., an area representing potentially suspicious network activity) of the visual representation to perform further analysis. The user may select the data elements of interest on the visual representation, and clip and zoom-in to this arbitrary area for further analysis. The selected area may be marked for clipping. In some examples, the clipping may include removal of a diagonal line from the displayed graphical representation.
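As an illustrative, non-limiting sketch, selection of an arbitrarily shaped region and clipping by zooming in may be modeled with a polygon drawn by the user. The ray-casting membership test, the nearest-neighbor zoom, and the names `point_in_polygon`, `clip_region`, and `zoom` are assumptions for illustration. Polygon vertices are given as (x, y) pairs in pixel units, with x as the column and y as the row.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_region(image, polygon, background=None):
    """Crop to the polygon's bounding box; pixels outside the polygon become background."""
    height, width = len(image), len(image[0])
    # A pixel is selected when its center falls inside the polygon.
    inside = [(r, c) for r in range(height) for c in range(width)
              if point_in_polygon(c + 0.5, r + 0.5, polygon)]
    if not inside:
        return []
    selected = set(inside)
    top, bottom = min(r for r, _ in inside), max(r for r, _ in inside)
    left, right = min(c for _, c in inside), max(c for _, c in inside)
    return [[image[r][c] if (r, c) in selected else background
             for c in range(left, right + 1)]
            for r in range(top, bottom + 1)]


def zoom(image, factor):
    """Nearest-neighbor enlargement by an integer factor."""
    return [[value for value in row for _ in range(factor)]
            for row in image for _ in range(factor)]
```

For example, `zoom(clip_region(image, polygon), 4)` would yield an enlarged view of only the selected region, while each enlarged block still corresponds to a single original pixel.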
In some examples, the interaction processor 108 may identify, in the clipped region, selection of data elements of interest to the user, and may prompt the display module to automatically blur visual representations of data elements different from the data elements of interest by modifying the pixel attributes of respective pixels. In some examples, the blurring may include modifying pixel attributes such as color, light intensity, sharpness, and so forth. In some examples, the highlighting may include blurring a pixel. For example, sharp edges and/or pixels may draw a user's attention in an overcrowded display. Accordingly, a visual analysis may suffer under visual cognitive overload since the user may not be able to focus on every data point. Generally, in existing methods, in highlighting and/or filtering, only the data elements (e.g., pixels) of interest to the user may be shown. However, this removal of all of the other data elements (other than those of immediate interest to the user) also removes data relations that may be relevant for analysis of the data pattern.
As disclosed herein, blurring provides the means to emphasize some data elements while maintaining their context and thus, preserving respective data relations. Generally, this may be beneficial to an analysis of data patterns, in contrast to the existing standard methods of highlighting and/or filtering.
In some examples, the blurring may be based on a Gaussian kernel. A radius around a pixel in both directions may define an effect of blurring. In some examples, the radius may be defined by the user. Generally, a value of 10 may be utilized in many applications. In some examples, the user may specify a variable, threshold, and/or condition applied for blurring. For example, the user may decide to analyze one IP address, and may therefore blur all pixels that contain other IP addresses. In some examples, a pixel represented by color “Red” may be surrounded by pixels represented by blurred (e.g., lighter) shades of “Red” such as pink. In some examples, the further a pixel is from the pixel represented by color “Red”, the lighter the shade of “Red”. Likewise, a pixel represented by color “Green” may be surrounded by pixels represented by blurred (e.g., lighter) shades of “Green” such as light green. In some examples, the further a pixel is from the pixel represented by color “Green”, the lighter the shade of “Green”. Accordingly, the blurring algorithm blurs pixels of the visualization that fulfill such a condition—all pixels other than the pixels associated with the specified IP address may be blurred.
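As an illustrative, non-limiting sketch, Gaussian blurring of all pixels other than the pixels of interest may be expressed as follows. The sketch operates on a single intensity channel for brevity; a color image could be handled by applying it to each channel. The choice of sigma as one third of the radius, the clamping at image borders, and the names `gaussian_kernel` and `blur_except` are assumptions for illustration. The default radius of 10 follows the value mentioned above.

```python
import math


def gaussian_kernel(radius, sigma=None):
    """1-D Gaussian weights for offsets -radius..radius, normalized to sum to 1."""
    if radius == 0:
        return [1.0]
    # Assumption: sigma defaults to one third of the radius.
    sigma = sigma if sigma is not None else radius / 3.0
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma))
               for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _blur_rows(grid, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(grid[0])
    out = []
    for row in grid:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                src = min(max(x + k - radius, 0), width - 1)
                acc += w * row[src]
            new_row.append(acc)
        out.append(new_row)
    return out


def _transpose(grid):
    return [list(col) for col in zip(*grid)]


def blur_except(grid, keep, radius=10):
    """Blur every pixel except those whose (row, col) position is in `keep`."""
    kernel = gaussian_kernel(radius)
    # A 2-D Gaussian is separable: blur along rows, then along columns.
    blurred = _transpose(_blur_rows(_transpose(_blur_rows(grid, kernel)), kernel))
    return [[grid[r][c] if (r, c) in keep else blurred[r][c]
             for c in range(len(grid[0]))]
            for r in range(len(grid))]
```

Because blurred pixels are replaced rather than removed, the pixels of interest remain sharp while their surrounding context, and thus the respective data relations, stays visible.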
Accordingly, a modified visual representation including the data elements of interest to the user may be generated. For example, if a pixel is associated with a blurrValue 0 at 806 of
The components of system 100 may be computing resources, each including a suitable combination of a physical computing device, a virtual computing device, a network, software, a cloud infrastructure, a hybrid cloud infrastructure that includes a first cloud infrastructure and a second cloud infrastructure that is different from the first cloud infrastructure, and so forth. The components of system 100 may be a combination of hardware and programming for performing a designated function. In some instances, each component may include a processor and a memory, while programming code is stored on that memory and executable by a processor to perform a designated function.
For example, the plurality of data elements 102 may be stored in a plurality of databases communicatively linked over a network. System 100 may include hardware to physically store the plurality of data elements 102, and processors to physically process the plurality of data elements 102. System 100 may also include software algorithms to access the plurality of data elements 102 and share them over a network.
As another example, display module 104 may include software programming to receive the plurality of data elements 102 over a physical network. Display module 104 may also include software programming to automatically generate a visual representation of the plurality of data elements 102. For example, display module 104 may include software programming to represent a given data element by a pixel, and determine and associate pixel attributes based on the characteristics of the given data element. Display module 104 may also include software programming to dynamically interact with the interaction processor 108 to receive feedback related to selection of an arbitrarily shaped region of the visual representation, and selection of data elements of interest to the user, and modify the visual representation accordingly. Display module 104 may include hardware, including physical processors and memory to house and process such software algorithms. Display module 104 may also include physical networks to be communicatively linked to the interaction processor 108 and the interactive graphical user interface 106.
Also, for example, the interactive graphical user interface 106 may include software programming to receive and implement the visual representation for display from the display module 104. Interactive graphical user interface 106 may also include software programming to interactively and iteratively interact with the user. The interactive graphical user interface 106 may include hardware, including physical processors and memory to display the interactive visual representation of the plurality of data elements 102. Also, for example, the interactive graphical user interface 106 may include a computing device to provide the graphical user interface. Interactive graphical user interface 106 may also include software programming to dynamically interact with the interaction processor 108 to provide feedback related to selection of an arbitrarily shaped region of the visual representation, and selection of data elements of interest to the user. Interactive graphical user interface 106 may also include hardware, including physical processors and memory to house and process such software algorithms, and physical networks to be communicatively linked to the display module 104 and the interaction processor 108.
Likewise, the interaction processor 108 may include software programming to receive feedback from the interactive graphical user interface 106. The interaction processor 108 may also include software programming to provide the feedback to the display module 104 to modify the visual representation. The interaction processor 108 may also include hardware, including physical processors and memory to house and process such software algorithms, and physical networks to be communicatively linked to the display module 104, the interactive graphical user interface 106, and to computing devices.
The computing device may be, for example, a web-based server, a local area network server, a cloud-based server, a notebook computer, a desktop computer, an all-in-one system, a tablet computing device, a mobile phone, an electronic book reader, or any other electronic device suitable for provisioning a computing resource to perform an interactive selection of data features based on a dimension interestingness measure. Computing device may include a processor and a computer-readable storage medium.
Processor 1102 includes a Central Processing Unit (CPU) or another suitable processor. In some examples, memory 1104 stores machine readable instructions executed by processor 1102 for operating processing system 1100. Memory 1104 includes any suitable combination of volatile and/or non-volatile memory, such as combinations of Random Access Memory (RAM), Read-Only Memory (ROM), flash memory, and/or other suitable memory.
Memory 1104 stores instructions to be executed by processor 1102 including instructions of a display module 1106, and instructions of an interaction processor 1108. In some examples, instructions of display module 1106, and instructions of an interaction processor 1108, include instructions of display module 104, and instructions of interaction processor 108, respectively, as previously described and illustrated with reference to
Processor 1102 executes instructions of display module 1106 to display, via an interactive graphical user interface, a visual representation of a plurality of data elements and respective data relations between the data elements, and wherein each data element is represented by pixel attributes of a pixel. In some examples, processor 1102 also executes instructions of display module 1106 to represent each data element with a pixel.
Processor 1102 executes instructions of an interaction processor 1108 to iteratively and interactively process analysis by a user by identifying selection, by the user, of an arbitrarily shaped region of the visual representation, clipping the selected region by zooming in to the selected region, identifying, in the clipped region, selection of data elements of interest to the user, and prompting the display module to automatically blur visual representations of data elements different from the data elements of interest by modifying the pixel attributes of respective pixels.
In some examples, processor 1102 executes instructions of an interaction processor 1108 to provide the modified visual representation of the sub-plurality of data elements by clipping the selected region. In some examples, processor 1102 executes instructions of an interaction processor 1108 to clip the visual representation by removal of a diagonal line from the displayed graphical representation.
Input devices 1110 include a keyboard, mouse, data ports, and/or other suitable devices for inputting information into processing system 1100. In some examples, input devices 1110, such as a computing device, are used by the interaction processor 1108 to interact with a user. Output devices 1112 include a monitor, speakers, data ports, and/or other suitable devices for outputting information from processing system 1100. In some examples, output devices 1112 are used to provide an interactive visual representation of the plurality of data elements.
Processor 1202 executes instructions included in the computer readable medium 1208. Computer readable medium 1208 includes data element access instructions 1210 of a display module 1204 to access a plurality of data elements from a database. Computer readable medium 1208 includes visual representation display instructions 1212 of a display module 1204 to display, via an interactive graphical user interface, a visual representation of a plurality of data elements and respective data relations between the data elements, and wherein each data element is represented by pixel attributes of a pixel.
Computer readable medium 1208 includes iterative processing instructions 1214 of an interaction processor 1206 to iteratively and interactively process visual analysis by a user, the iterative processing instructions 1214 including selection identification instructions 1216 to identify selection, by the user, of an arbitrarily shaped region of the visual representation, clipping instructions 1218 to clip the selected region by zooming in to the selected region, data element of interest selection instructions 1220 to identify, in the clipped region, selection of data elements of interest to the user, and blurring instructions 1222 to automatically blur visual representations of data elements different from the data elements of interest by modifying the pixel attributes of respective pixels.
As used herein, a “computer readable medium” may be any electronic, magnetic, optical, or other physical storage apparatus to contain or store information such as executable instructions, data, and the like. For example, any computer readable storage medium described herein may be any of Random Access Memory (RAM), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid state drive, and the like, or a combination thereof. For example, the computer readable medium 1208 can include one of or multiple different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; optical media such as compact disks (CDs) or digital video disks (DVDs); or other types of storage devices.
As described herein, various components of the processing system 400 are identified and refer to a combination of hardware and programming configured to perform a designated function. As illustrated in
Such computer readable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture can refer to any manufactured single component or multiple components. The storage medium or media can be located either in the machine running the machine-readable instructions, or located at a remote site from which machine-readable instructions can be downloaded over a network for execution.
Computer readable medium 1208 may be any of a number of memory components capable of storing instructions that can be executed by processor 1202. Computer readable medium 1208 may be non-transitory in the sense that it does not encompass a transitory signal but instead is made up of one or more memory components configured to store the relevant instructions. Computer readable medium 1208 may be implemented in a single device or distributed across devices. Likewise, processor 1202 represents any number of processors capable of executing instructions stored by computer readable medium 1208. Processor 1202 may be integrated in a single device or distributed across devices. Further, computer readable medium 1208 may be fully or partially integrated in the same device as processor 1202 (as illustrated), or it may be separate but accessible to that device and processor 1202. In some examples, computer readable medium 1208 may be a machine-readable storage medium.
In some examples, clipping may include removal of a diagonal line from the displayed graphical representation.
In some examples, the blurring may be based on a Gaussian kernel.
In some examples, modifying the pixel attributes is based on a threshold provided by the user.
In some examples, a given data element of the plurality of data elements is a pair comprising an IP address and a port number at a time interval, and the interaction processor further identifies the selection of the region based on a selection of an IP address.
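As an illustrative sketch, a data element formed from an IP address and a port number at a time interval, and the identification of a region from a selected IP address, may be expressed as follows. The `DataElement` class, its field names, the `layout` mapping from pixel positions to data elements, and the function `pixels_for_ip` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    ip: str        # IP address of the record
    port: int      # port number of the record
    interval: int  # index of the time interval (hypothetical encoding)


def pixels_for_ip(layout, ip):
    """Return the (row, col) positions of all pixels whose data element has this IP."""
    return {pos for pos, element in layout.items() if element.ip == ip}
```

The resulting set of positions identifies the pixels of interest; all other pixels would be candidates for blurring.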
In some examples, the pixel attribute associated with the pixel represents a characteristic of the data element represented by the pixel.
Examples of the disclosure provide a generalized system for visually interactive and iterative analysis of data patterns by a user. The generalized system provides a combination of visual analytics methods with human interactions to dynamically explore security threats in big data. Users may be able to refine their hypotheses through interaction and repeat the two phases of the visual analytics techniques. These two processing phases may be built at a record level. Each data point (e.g., a pair of IP address and port number at a certain time period) may be represented by a smallest element, such as a pixel. Each pixel may be accessible by users.
Although specific examples have been illustrated and described herein, especially as related to healthcare data, the examples illustrate applications to any structured data. Accordingly, there may be a variety of alternate and/or equivalent implementations that may be substituted for the specific examples shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the specific examples discussed herein. Therefore, it is intended that this disclosure be limited only by the claims and the equivalents thereof.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US15/12924 | 1/26/2015 | WO | 00 |