Data visualization helps people understand the significance of data by placing it in a visual context. Patterns, trends and correlations that might go undetected in text-based data can be exposed and recognized more easily with data visualization. Data visualization enables analysts to interpret analytics more readily, allowing them to grasp difficult concepts or identify new patterns quickly and without laborious computation. With interactive visualization, a user may “drill down” into charts and graphs to view additional detail, thereby interactively changing the displayed data and the processing thereof.
Data visualization tools go well beyond the standard charts and graphs used in spreadsheets, displaying data in more sophisticated ways such as infographics, dials and gauges, geographic maps, heat maps, and detailed bar, pie and fever charts. The images may include interactive capabilities, enabling users to manipulate them or drill into the data for querying and analysis. Indicators designed to alert users when data has been updated or predefined conditions occur can also be included. As will be appreciated, visualizations of data can take many forms across many areas, and may be more useful in different forms to different organizations. However, the only tools that allow for the creation of customized visualizations require significant amounts of programming.
Features and advantages of the example embodiments, and the manner in which the same are accomplished, will become more readily apparent with reference to the following detailed description taken in conjunction with the accompanying drawings, in which:
Throughout the drawings and the detailed description, unless otherwise described, the same drawing reference numerals will be understood to refer to the same elements, features, and structures. The relative size and depiction of these elements may be exaggerated or adjusted for clarity, illustration, and/or convenience.
In the following description, specific details are set forth in order to provide a thorough understanding of the various example embodiments. It should be appreciated that various modifications to the embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. Moreover, in the following description, numerous details are set forth for the purpose of explanation. However, one of ordinary skill in the art should understand that embodiments may be practiced without the use of these specific details. In other instances, well-known structures and processes are not shown or described in order not to obscure the description with unnecessary detail. Thus, the present disclosure is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
One of the primary obstacles to creating data-driven customized visualizations is the lack of tools for creating them. The examples described herein relate to a visualization framework for generating customized visualizations based on data. The framework may include or may be used with a new software library (referred to herein as a ‘visualization library’ or ‘visualization factory’) and a data query language oriented toward graphical visualization and layout. The framework allows a user to create new and customized visualizations for data from a foundation of basic shapes and layouts. The framework makes use of construction by composition and allows for reuse of existing visualizations to fit different and specific needs. A representation (e.g., shapes and layouts) created using the visualization library can be reused in another representation based on the same data or based on different data using a parameterization and binding mechanism provided by the visualization library. As an example, the visualization library may be a JavaScript-based graphing and charting library that is built on a descriptive layout language combined with the data query language. Data for the representation may be extracted or otherwise captured from data sets using one or more functions from the data query language.
According to various examples, a visualization may include one or more shapes, one or more layouts organizing shapes, and content. The layouts may be bound to data from a data file through repeaters which may be used to specify how many times a layout is repeated within a single visualization. Creating a new visualization with the visualization library includes describing the content of the visualization. To do so, a designer (e.g., programmer, analyst, engineer, user, etc.) may use a set of shapes and/or layouts which may contain contents. In these examples, shapes may contain multiple shapes and layouts may contain one or more shapes, one or more layouts, or a combination thereof. Shapes are very similar to what may be found in computer assisted drawing/diagramming tools and may include, for example, a rectangle, an ellipse, a circle, a line, a sector, a label, a polygon, and many others. Additional examples of shapes are described in Appendix A. Layouts are used to organize the visualization (or representation) into patterns. For example, layouts may expose and allow users to manipulate the spatial organization of the representation. Examples of types of layouts include bar, circular, column, flow, grid, hierarchies, a squarified map, and many others.
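The containment rules described above (shapes may hold shapes, layouts may hold shapes and layouts) can be sketched with a small data model. This is a minimal illustration in plain Python and is not the visualization library's own grammar; the names `Shape`, `Layout`, and `count_shapes` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Shape:
    """A drawable primitive (rectangle, label, ...) that may nest other shapes."""
    kind: str
    content: List["Shape"] = field(default_factory=list)


@dataclass
class Layout:
    """A spatial pattern (column, grid, ...) holding shapes and/or nested layouts."""
    kind: str
    content: List[Union[Shape, "Layout"]] = field(default_factory=list)


def count_shapes(node: Union[Shape, Layout]) -> int:
    """Count every shape in a composed tree, including nested ones."""
    own = 1 if isinstance(node, Shape) else 0
    return own + sum(count_shapes(child) for child in node.content)


# A column layout containing a rectangle (with a nested label) and a nested grid layout.
visualization = Layout("column", [
    Shape("rectangle", [Shape("label")]),
    Layout("grid", [Shape("ellipse"), Shape("circle")]),
])
print(count_shapes(visualization))
```

Because both node types expose the same `content` list, an existing subtree could be dropped into another visualization unchanged, which is the sense of construction by composition used here.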
As described in various examples, shapes and layouts may be combined to form more sophisticated visualization content. In addition, users may build data-driven visualizations by combining the layouts with various data from a data file such as a spreadsheet, .csv file, document, notepad file, and the like. The data may be extracted, collected, gathered, and the like, from the data file based on functions included in the data query language. Functions of the data query language may start with the keyword “data” to signify that the current data context is to be worked on. Examples of functions included in the data query language are shown in Appendix B. Furthermore, data may be bound to the representation at the layout level using a repeater, examples of which are further described herein. Using a repeater, a designer may define the number of times that a layout is repeated in a drawing space, and the number of repetitions may be driven by or determined by data from a data file. For example, each repetition of a layout may define a data context, and content added in each layout space in the representation may be driven by this data context. The shapes, layouts, and/or repeaters may expose properties that determine their positioning and behavior. Also, data may be exposed through data contexts, and a set of tools may be provided that allow the user to navigate the dataset and define selections and/or calculations.
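As a rough sketch of the repeater idea, the following plain-Python fragment repeats a piece of content once per data context, where each context is the group of rows sharing one value of a column. The helper names `distinct_contexts` and `repeat_layout` and the sample rows are assumptions made for illustration and are not taken from the library.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def distinct_contexts(rows: List[Row], column: str) -> List[List[Row]]:
    """Group rows by the distinct values of one column, keeping first-seen order."""
    groups: Dict[object, List[Row]] = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)
    return list(groups.values())


def repeat_layout(contexts: List[List[Row]],
                  content: Callable[[List[Row]], str]) -> List[str]:
    """Evaluate the layout content once per data context."""
    return [content(ctx) for ctx in contexts]


# Hypothetical rows, for illustration only.
rows = [
    {"Country": "A", "Vehicles": 10},
    {"Country": "B", "Vehicles": 4},
    {"Country": "A", "Vehicles": 6},
]
cells = repeat_layout(
    distinct_contexts(rows, "Country"),
    lambda ctx: f"{ctx[0]['Country']}: {sum(r['Vehicles'] for r in ctx)}",
)
print(cells)
```

The number of repetitions is never written down by the designer; it falls out of the data, so adding a row for a new country would add one more cell.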
The shapes, layouts, and/or repeaters may be combined together and referred to as a visualization definition. Before being rendered, a visualization definition may be evaluated (i.e., all properties are resolved) into what is referred to as a visualization output, which is then pushed into the rendering engine that may draw the result on a graphic display using the visualization library, and the like. According to various aspects, the dynamics of the visualization such as interactions and animations may be at least partly handled in the same way and surfaced in the grammar, again allowing for the definition of tailored behavior. The navigation of the content and the selection and highlighting may be distinguished from one another. For example, the navigation (e.g., zoom, pan, rotate, and the like) may be handled in a viewer, and may not be exposed in the grammar. Meanwhile, selection may be exposed in the grammar, allowing for full-fledged customization in the same way that static aspects may be exposed.
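The step from a visualization definition to a visualization output can be pictured as resolving every property to a concrete value before anything is drawn. The sketch below assumes, purely for illustration, that a deferred property is a Python callable taking a context dictionary; the function name `evaluate` and the property names are hypothetical.

```python
from typing import Any, Dict


def evaluate(definition: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve every property: call deferred ones with the context, copy constants."""
    return {
        name: prop(context) if callable(prop) else prop
        for name, prop in definition.items()
    }


# "width" is deferred until the surrounding bounds and the data ratio are known.
definition = {
    "type": "rectangle",
    "left": 0,
    "width": lambda ctx: ctx["bounds_width"] * ctx["ratio"],
}
output = evaluate(definition, {"bounds_width": 200, "ratio": 0.25})
print(output)
```

After evaluation the output contains only plain values, so a renderer would not need to know anything about the data or the expressions that produced them.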
Layouts allow you to organize a visualization according to patterns.
In this example, the code is used to define a column type layout, with each column having four sides (left, top, right, and bottom). The code also includes a “repeater” that is used to define the number of times that the layout is repeated within the visualization. Furthermore, the content portion of the repeater code is used to define the shapes/pattern of the layout 310 which includes the first and second rectangles 311 and 312. Accordingly, the layout 310 including the first and second rectangles 311 and 312 is repeated three times within the visualization 300. In this example, each layout includes shapes (i.e., two rectangles); however, a layout need not include any shapes.
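The geometry of such a repeated column layout can be sketched as follows. The fragment assumes equal-width columns and a 300-unit-wide drawing space, neither of which is stated in the description; `column_bounds` is a hypothetical helper and the code is plain Python, not the library's layout syntax.

```python
from typing import List, Tuple


def column_bounds(left: float, right: float, count: int) -> List[Tuple[float, float]]:
    """Split the horizontal span [left, right] into `count` equal-width columns."""
    width = (right - left) / count
    return [(left + i * width, left + (i + 1) * width) for i in range(count)]


# Three repetitions of a layout whose content is two rectangles.
content = ["rectangle", "rectangle"]
columns = [
    {"left": lo, "right": hi, "content": list(content)}
    for lo, hi in column_bounds(0, 300, 3)
]
for column in columns:
    print(column)
```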
According to various aspects, the data may be from a spreadsheet file, a document, a .csv file, an .xls file, and the like. The dataset in Table 1 includes three columns including a column for Country, Vehicle type, and the number of Vehicles. Using one or more functions of the data query language described herein the layout may be repeated based on data included in one of the columns from the data set. A non-limiting example of code executed to generate the display of
In this case, the code repeats the layout 310 a number of times equal to the number of distinct Country values included in the Country column of the dataset in Table 1. Here, the number of distinct countries is five (i.e., China, UK, France, Brazil, and Germany). As a result, the layout 310 is repeated five times within the visualization 350 of
In this case, the content section of the code defines the type of layout, and also defines a repeater based on the number of distinct countries included in Table 1. Furthermore, a rectangle and a label are defined in the code for each column layout.
In this example, China has the widest rectangle because China has the most vehicles, while the UK has the smallest rectangle because the UK has the fewest vehicles. In the example of
In this example, the code defines a bar type layout, and the layout repeats each time for each country. The bar is shaped in the form of a rectangle beginning from the left side 381 of the visualization 380 and ending at a point with respect to the right side 382 based on the number of vehicles included in Table 1. In the example of
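One simple way to derive where each bar ends is to scale the measure linearly against the largest value, so that the longest bar reaches the right side. That scaling rule is an assumption made for this sketch, as are the helper name `bar_rights` and the sample totals, which are not the values of Table 1.

```python
from typing import Dict


def bar_rights(totals: Dict[str, float], left: float, right: float) -> Dict[str, float]:
    """Map each total to a right edge so the largest total spans the full width."""
    largest = max(totals.values())
    return {key: left + (right - left) * value / largest
            for key, value in totals.items()}


# Hypothetical totals, for illustration only.
totals = {"A": 50, "B": 20, "C": 5}
edges = bar_rights(totals, left=0, right=400)
print(edges)
```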
As shown in some of the examples herein, a designer is able to navigate in the layout by using the layout.parent expression, which allows the designer to access the properties of a parent layout (example: layout.parent.bounds.left). A designer may also navigate deeper by using chained calls (example: layout.parent.parent.parent.bounds.left, which accesses the third-level parent). For example, the layout behavior may be defined by:
A previous or next cell may also be called in layouts such as by:
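As a rough stand-in for this kind of navigation, the sketch below models layouts as linked nodes in plain Python. Only the `parent` and `bounds.left` names come from the expressions quoted above; the `previous` and `next` attribute names, the classes, and the coordinates are assumptions for illustration and do not reproduce the library's own syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bounds:
    left: float
    top: float
    right: float
    bottom: float


class LayoutNode:
    """A layout cell that knows its parent and its neighbouring cells."""

    def __init__(self, bounds: Bounds, parent: Optional["LayoutNode"] = None) -> None:
        self.bounds = bounds
        self.parent = parent
        self.previous: Optional["LayoutNode"] = None
        self.next: Optional["LayoutNode"] = None


root = LayoutNode(Bounds(0, 0, 300, 200))
first = LayoutNode(Bounds(0, 0, 150, 200), parent=root)
second = LayoutNode(Bounds(150, 0, 300, 200), parent=root)
first.next, second.previous = second, first
inner = LayoutNode(Bounds(160, 10, 290, 190), parent=second)

print(inner.parent.bounds.left)         # left edge of the direct parent
print(inner.parent.parent.bounds.left)  # chained call reaching the grandparent
print(second.previous.bounds.right)     # right edge of the cell placed just before
```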
In Table 2, the number of bicycles, motorcycles, and cars sold and rented per country are shown. The table includes a column for country, vehicle type, purchase type, and a measure of the number (Nb) of vehicles sold or rented. In the example of
In this example, the column layout defines a layout to be created where each object is inside a column next to each other. The object property is set to “data.distincts(‘Country’)”, which indicates that each column of the layout is to contain the data from the dataset for one distinct Country. Based on Table 2, the layout will include three columns, one for FR, one for ALL, and one for JP. The content property is used to define the content of each column. In this case, each column is to include a label displaying the name of the country.
In the example of
In this example, Country data values are placed in a column layout and include a list of vehicles if there are vehicles in the dataset for the specific country. For example, FR has no Scooter and therefore the layout may prevent a column for Scooter from being displayed. Furthermore, for each vehicle, the number of pieces sold and rented may be displayed. If there is no value of vehicles sold or rented, a null value may be displayed or the type (rental or sell) may be omitted. In the example of
To iterate on all Type values for each vehicle in each country, orthogonal axes may be used. In this example, the visualization library engine may establish some orthogonal axes for the iteration on the Type column as shown in
In this example, a new property xTabContext is added to all Bar layouts on which the orthogonalization is done. Here, the orthogonalization is done on the Vehicle subaxis (data.parent) but the distinct values will still be filtered by the Country context. In this example, the “Sell” distinct value is generated for FR/Bicycle as a result of the orthogonalization because France has the “Sell” value for the Motorcycle. However, the “Sell” distinct value is not created for ALL/Bicycle or ALL/Car because ALL has no “Sell” value for any vehicle in that Country and the orthogonalization is on Vehicle. In order to generate distinct values of the Type column for vehicles inside the ALL context, the orthogonalization may be performed on the upper axis as shown in
In this case, the orthogonalization is done on the Country axis (data.parent.parent) and the result is all the Type distinct values for all countries are displayed in the visualization of
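The difference between the two orthogonalization levels can be sketched as a choice of scope from which the distinct Type values are collected. The fragment below is plain Python and does not use the xTabContext property itself; the function `type_values`, its `axis` parameter, the “Rent” value, and the sample rows are assumptions chosen only to mirror the FR and ALL cases described above.

```python
from typing import Dict, List

Row = Dict[str, str]


def distinct(rows: List[Row], column: str) -> List[str]:
    """Distinct values of a column, in first-seen order."""
    return list(dict.fromkeys(row[column] for row in rows))


def type_values(rows: List[Row], country: str, axis: str) -> List[str]:
    """Type values generated for every vehicle of `country`.

    axis="Vehicle": widen across the vehicles of the same country only.
    axis="Country": widen across every country in the dataset.
    """
    if axis == "Vehicle":
        scope = [row for row in rows if row["Country"] == country]
    elif axis == "Country":
        scope = rows
    else:
        raise ValueError(f"unknown axis: {axis}")
    return distinct(scope, "Type")


# Hypothetical rows, for illustration only.
rows = [
    {"Country": "FR", "Vehicle": "Bicycle", "Type": "Rent"},
    {"Country": "FR", "Vehicle": "Motorcycle", "Type": "Sell"},
    {"Country": "ALL", "Vehicle": "Bicycle", "Type": "Rent"},
    {"Country": "ALL", "Vehicle": "Car", "Type": "Rent"},
]
print(type_values(rows, "FR", "Vehicle"))   # FR/Bicycle gains "Sell" from FR/Motorcycle
print(type_values(rows, "ALL", "Vehicle"))  # no "Sell" anywhere inside ALL
print(type_values(rows, "ALL", "Country"))  # widened to every country
```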
Like all shapes included in the visualization library, the bar chart can be positioned relatively using the top, left, bottom and right properties of the visualization. The bar chart code defines various properties:
According to various aspects, customization of the properties may be used to customize the display of each bar in the Bar Chart as shown in the example of
In this example, the customization is done through the barContent property. This property can contain any visualization library content description, to render the bar with any other shapes available in the language. In the example of
Color may not be the only automatically created variable that can be used to customize each bar. Examples of other customizable variables are provided below.
In this example, instead of displaying the default bars, bottles are filled to a level depending on the measure of “Revenue”. In the example of
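The idea of replacing the default bar with arbitrary content can be sketched as a callback that receives per-bar variables and returns the shapes to draw. Everything in this fragment is hypothetical: the callback signature, the `height_ratio` property, the image name, and the revenue figures are assumptions for illustration, and the code does not reproduce the barContent property's actual form.

```python
from typing import Callable, Dict, List

Shape = Dict[str, object]


def default_bar(value: float, ratio: float) -> List[Shape]:
    """Default content: one rectangle whose height follows the measure."""
    return [{"shape": "rectangle", "height_ratio": ratio}]


def bottle_bar(value: float, ratio: float) -> List[Shape]:
    """Custom content: a bottle outline plus a fill whose level follows the measure."""
    return [
        {"shape": "image", "name": "bottle-outline"},
        {"shape": "rectangle", "height_ratio": ratio, "role": "fill"},
    ]


def render_bars(
    measures: Dict[str, float],
    bar_content: Callable[[float, float], List[Shape]] = default_bar,
) -> Dict[str, List[Shape]]:
    """Build the content of every bar from its value and its share of the maximum."""
    largest = max(measures.values())
    return {key: bar_content(value, value / largest)
            for key, value in measures.items()}


# Hypothetical revenue figures, for illustration only.
bars = render_bars({"North": 80.0, "South": 20.0}, bar_content=bottle_bar)
print(bars["South"])
```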
The memory 640 may store computer readable instructions including the visualization library and the data query language of the visualization framework described herein. The processor 620 may execute instructions stored in the memory 640. For example, the instructions may include a method for generating a visualization based on the visualization library and the data query language. Based on the executed instructions, the processor 620 may select a layout for the visualization from among a plurality of layouts included in the visualization library. Here, the layout may define a format for how content is spatially organized within the respective layout. The processor 620 may bind the layout to a dataset using one or more functions from the data query language. For example, the functions may include a repeater function defining a number of times a display of the layout is to be repeated based on data from the dataset, examples of which are described above. Furthermore, the processor 620 may cause the display 630 to display the visualization including the layout displayed a number of times as defined by the repeater function. For example, the repeater function may repeat the layout a plurality of times based on data included in the dataset. In some examples, the repeater function repeats the layout a number of times equal to the number of distinct values within a column of the dataset.
For example, the display layout may include one or more shapes defined by the visualization library, and the format of the layout may define how the shapes are spatially organized within the layout. As an example, the layout format may include at least one of a bar chart format, a column format, a circular format, a grid format, and a hierarchy format. The layout may include boundaries that are defined based on boundaries of the visualization. The dataset may include various data formats such as a spreadsheet having a plurality of columns and a plurality of rows of data. As another example, the dataset may include a .csv file, a .xls file, a word file, a document file, and the like.
The method further includes binding the layout to a dataset using one or more functions from the data query language, in 720. For example, the one or more functions may include a repeater function defining a number of times a display of the layout is to be repeated based on data from the dataset. Here, the dataset may include various data formats such as a spreadsheet having a plurality of columns and a plurality of rows of data. As another example, the dataset may include a .csv file, a .xls file, a word file, a document file, and the like. According to various aspects, the repeater function may repeat the layout a plurality of times based on data included in the dataset. For example, the repeater function may repeat the layout a number of times equal to the number of distinct values within a column of the dataset. The method further includes displaying the visualization including the layout displayed a number of times as defined by the repeater function, in 730.
According to various example embodiments, provided herein is a visualization library and a data query language that may be used in conjunction with the visualization library to generate customized visualizations using shapes and/or formats that are defined by the visualization library. The data query language may bind the shapes and/or formats to data included in a dataset to generate visualizations distinguishing values from the dataset from one another.
Rectangle
Positioning Properties
Styling Properties
Content Property
Ellipse
Positioning Properties
Styling Properties
Content Property
Sector
Positioning Properties
Styling Properties
Content Property
Line
Positioning Properties
Styling Properties
Polyline
Positioning Properties
Styling Properties
Polygon
Positioning Properties
Styling Properties
Content Property
Path
Positioning Properties
Styling Properties
Content Property
Image
Positioning Properties
Styling Properties
SvgShape
Positioning Properties
Styling Properties
SvgShape2
Positioning Properties
Styling Properties
Content Property
Label
Positioning Properties
Styling Properties
Group
Positioning Properties
Styling Properties
Content Property
Include
Positioning Properties
Styling Properties
BarChart
Positioning Properties
BarChart Feeding Properties
BarChart Properties
BarChart Customization
BomChart
Positioning Properties
BomChart Feeding Properties
BomChart Properties
BomChart Customization
LineChart
Positioning Properties
LineChart Feeding Properties
LineChart Properties
BCGMatrixChart
Positioning Properties
BCGMatrixChart Feeding Properties
BCGMatrixChart Properties
distincts(colName1, . . . , colNameN)
This function's purpose is to create DataContexts on one or multiple columns.
A DataContext
colName: a string that is a column name
An array of DataContexts. Each DataContext points on a different distinct value (or tuple) of the column names passed as arguments.
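A minimal sketch of this behavior, assuming a dataset held as a list of row dictionaries: each returned entry pairs one distinct value (or tuple of values) with the rows it covers. The representation of a DataContext as a `(key, rows)` pair and the sample rows are assumptions for illustration; only the function name and its column-name arguments come from the description above.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Key = Tuple[object, ...]


def distincts(rows: List[Row], *col_names: str) -> List[Tuple[Key, List[Row]]]:
    """Return one (key, rows) context per distinct tuple of the given columns."""
    contexts: Dict[Key, List[Row]] = {}
    for row in rows:
        key = tuple(row[name] for name in col_names)
        contexts.setdefault(key, []).append(row)
    return list(contexts.items())


# Hypothetical rows, for illustration only.
rows = [
    {"Country": "FR", "Vehicle": "Car", "Nb": 3},
    {"Country": "FR", "Vehicle": "Car", "Nb": 2},
    {"Country": "FR", "Vehicle": "Bicycle", "Nb": 7},
    {"Country": "JP", "Vehicle": "Car", "Nb": 1},
]
print([key for key, _ in distincts(rows, "Country")])
print([key for key, _ in distincts(rows, "Country", "Vehicle")])
```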
This function's purpose is to retrieve the value of a specific column.
A DataContext or an array of DataContexts
colName (optional): a string that is a column name
This aggregation function returns the sum aggregation on a column or the sum of the numeric elements of an array.
This aggregation function returns the maximum aggregation on a column or the maximum numeric value of an array.
colName (optional): a string that is a column name
This aggregation function returns the minimum aggregation on a column or the minimum numeric value of an array.
colName (optional): a string that is a column name
In Europe DataContext:
This aggregation function returns the average aggregation on a column or the average of the numeric elements of an array.
This aggregation function returns the count aggregation on a column or the number of lines (or unique tuples for all other columns).
A DataContext or an array of DataContexts
colName (optional): a string that is a column name
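The aggregation functions described above can be sketched on a toy data context as follows. The `DataContext` class here is a hypothetical stand-in that wraps a list of row dictionaries; for simplicity it requires a column name for each aggregate and models count only as the number of lines, which are simplifying assumptions rather than the documented behavior.

```python
from typing import Dict, List

Row = Dict[str, float]


class DataContext:
    """A hypothetical stand-in for a data context: a subset of dataset rows."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def _column(self, col_name: str) -> List[float]:
        return [row[col_name] for row in self.rows]

    def sum(self, col_name: str) -> float:
        return sum(self._column(col_name))

    def max(self, col_name: str) -> float:
        return max(self._column(col_name))

    def min(self, col_name: str) -> float:
        return min(self._column(col_name))

    def avg(self, col_name: str) -> float:
        return self.sum(col_name) / len(self.rows)

    def count(self) -> int:
        """Number of lines in the context."""
        return len(self.rows)


# Hypothetical rows, for illustration only.
context = DataContext([{"Nb": 3}, {"Nb": 5}, {"Nb": 10}])
print(context.sum("Nb"), context.max("Nb"), context.min("Nb"),
      context.avg("Nb"), context.count())
```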
The purpose of this function is to filter the current DataContext or to select the right DataContexts from an original array of DataContexts.
A DataContext or an array of DataContexts.
A formula which evaluates to a Boolean condition based on either dimension or measure but not both.
As with the sort method, dimensions are introduced via the value method and measures via any aggregate method (sum, max, min, avg).
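A filter over an array of data contexts can be sketched with a Boolean condition that reads either a dimension (through a value accessor) or a measure (through an aggregate), in line with the rule above. The helpers `value`, `total`, and `filter_contexts`, the list-of-rows representation, and the sample data are assumptions for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Context = List[Row]


def value(context: Context, col_name: str) -> object:
    """Dimension accessor: the value shared by the rows of a context."""
    return context[0][col_name]


def total(context: Context, col_name: str) -> float:
    """Measure accessor: the sum aggregation of a column over a context."""
    return sum(row[col_name] for row in context)


def filter_contexts(contexts: List[Context],
                    condition: Callable[[Context], bool]) -> List[Context]:
    """Keep only the contexts for which the Boolean condition holds."""
    return [context for context in contexts if condition(context)]


# Hypothetical contexts, one per country, for illustration only.
contexts = [
    [{"Country": "FR", "Nb": 3}, {"Country": "FR", "Nb": 9}],
    [{"Country": "JP", "Nb": 4}],
]
by_dimension = filter_contexts(contexts, lambda c: value(c, "Country") == "JP")
by_measure = filter_contexts(contexts, lambda c: total(c, "Nb") > 10)
print(len(by_dimension), len(by_measure))
```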
The purpose of this function is to sort an array of DataContexts on dimensions or measures.
An array of DataContexts
Several options are possible:
sort( )
or sort (sortOrder)
or sort (sortItemList)
or sort (sortItemList, sortOrder)
The ordered array of DataContexts to which it is applied.
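Sorting on a dimension versus sorting on a measure can be sketched as below. The four call forms listed above are not modeled; instead, this plain-Python fragment assumes a key function plus a descending flag, which stand in loosely for a sort item and a sort order. The names `sort_contexts`, `country`, and `total_nb` and the sample data are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Context = List[Row]


def sort_contexts(contexts: List[Context],
                  key: Callable[[Context], object],
                  descending: bool = False) -> List[Context]:
    """Return a new list of contexts ordered by a dimension or measure key."""
    return sorted(contexts, key=key, reverse=descending)


def country(context: Context) -> object:
    """Dimension key: the Country value shared by the rows of a context."""
    return context[0]["Country"]


def total_nb(context: Context) -> float:
    """Measure key: the sum of the Nb column over a context."""
    return sum(row["Nb"] for row in context)


# Hypothetical contexts, one per country, for illustration only.
contexts = [
    [{"Country": "FR", "Nb": 3}, {"Country": "FR", "Nb": 9}],
    [{"Country": "JP", "Nb": 15}],
    [{"Country": "ALL", "Nb": 20}],
]
by_dimension = [country(c) for c in sort_contexts(contexts, key=country)]
by_measure = [country(c) for c in sort_contexts(contexts, key=total_nb, descending=True)]
print(by_dimension)
print(by_measure)
```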
As will be appreciated based on the foregoing specification, the above-described examples of the disclosure may be implemented using computer programming or engineering techniques including computer software, firmware, hardware or any combination or subset thereof. Any such resulting program, having computer-readable code, may be embodied or provided within one or more non-transitory computer-readable media, thereby making a computer program product, i.e., an article of manufacture, according to the discussed examples of the disclosure. For example, the non-transitory computer-readable media may be, but is not limited to, a fixed drive, diskette, optical disk, magnetic tape, flash memory, semiconductor memory such as read-only memory (ROM), and/or any transmitting/receiving medium such as the Internet or other communication network or link. The article of manufacture containing the computer code may be made and/or used by executing the code directly from one medium, by copying the code from one medium to another medium, or by transmitting the code over a network.
The computer programs (also referred to as programs, software, software applications, “apps”, or code) may include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The “machine-readable medium” and “computer-readable medium,” however, do not include transitory signals. The term “machine-readable signal” refers to any signal that may be used to provide machine instructions and/or any other kind of data to a programmable processor.
The above descriptions and illustrations of processes herein should not be considered to imply a fixed order for performing the process steps. Rather, the process steps may be performed in any order that is practicable, including simultaneous performance of at least some steps. Although the disclosure has been described in connection with specific examples, it should be understood that various changes, substitutions, and alterations apparent to those skilled in the art can be made to the disclosed embodiments without departing from the spirit and scope of the invention as set forth in the appended claims.