1. Field of the Invention
The present invention relates to a user interface that enables users to graphically manipulate and analyze large datasets, where each dataset represents a dimensionally-modeled fact collection. More specifically, the present invention relates to a user interface that enables users to graphically group one or more multi-dimensional records from a large dataset into separate data groups, perform operations between two or more data groups, and graphically represent the results of the operations.
2. Background of the Invention
When interacting with and/or analyzing large datasets, where each dataset may contain a million or more multi-dimensional records, for example, it can be difficult, impractical, and even impossible for users to consider each multi-dimensional record and/or each single data value within the records individually. Instead, users often prefer to organize portions of the records into groups, perhaps based on some type of criteria. For example, a user may wish to group one portion of related records into one data group based on one type of criteria and another portion of related records into another data group based on a different type of criteria. Thereafter, the user may work with these data groups.
In order to organize portions of multi-dimensional records into data groups, users need a way to identify and/or select those records to be grouped together. One way is for users to manually go through the entire dataset, picking out each record of interest individually. However, this method may be very time-consuming and impractical, especially when working with large datasets. It can be impractical and even impossible to display a million or more multi-dimensional records textually, such as in a spreadsheet. And even if such a large number of records could be displayed textually, it would be almost impossible for users to locate those records of particular interest in any reasonable amount of time. In addition, understanding the inter-relationships of these groups may be very difficult when the groups are displayed textually.
Accordingly, what is needed are systems and methods to address the above-identified problems.
Broadly speaking, the present invention relates to a user interface that enables users to graphically manipulate and analyze large datasets, where each dataset represents a dimensionally-modeled fact collection.
In one embodiment, a computer-implemented method of operating a user interface is provided, which comprises the following: receiving a graphical selection of a subset from a set of data points, each data point representing at least one record of a dimensionally-modeled fact collection; receiving a graphical manipulation of the selected subset of data points; defining at least one data group using the selected subset of data points and based on the graphical manipulation, wherein each data group comprises between 0 and n records represented by the selected subset of data points, wherein n is the total number of data points in the set of data points; and graphically representing the at least one data group.
In another embodiment, a computer-implemented method of operating a user interface is provided, which comprises the following: performing an operation on at least one data group, wherein each data group comprises between 0 and n records, each record represented by a data point, wherein n is the total number of records in a dimensionally-modeled fact collection, wherein each data point represents at least one record; and graphically representing a result of the operation.
These and other features, aspects, and advantages of the invention will be described in more detail below in the detailed description and in conjunction with the following figures.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
The present invention will now be described in detail with reference to a few preferred embodiments thereof as illustrated in the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without some or all of these specific details. In other instances, well known process steps and/or structures have not been described in detail in order to not unnecessarily obscure the present invention. In addition, while the invention will be described in conjunction with the particular embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. To the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims.
Businesses and other types of institutions or entities often collect fact-based data for various purposes, such as analyzing market trends, planning for business growth, conducting targeted advertisements, etc. For example, a business may collect various types of information about its customers, such as the customers' age, gender, spending habits, buying power, preferred products, etc. Alternatively, a business may collect factual data about individual business transactions. Often, the amount of factual data collected may be quite large. It is not unusual for a large dataset to contain one million or more multi-dimensional records, where each record represents a customer, a business transaction, an entity, etc. Each record may comprise multiple data values, where each data value represents a particular piece of factual information within the record.
For ease of use, the records in a dataset may be organized as, or otherwise accessible according to, a dimensional data model, such as a table. The following is a sample representation of such a table.
In the example shown in Table 1, each row of the table represents a single record, and in this case, each record is a customer, identified by a unique customer ID (as shown in the first column). Alternatively, in another example, each record/row may be a business transaction or an entity. Each column of the table represents a different dimension of the records, such as a category or a type of data (e.g., age, gender, monthly income, etc.). Inside the cells of the table are the specific data values, each value representing a particular piece of factual information about the corresponding record (e.g., customer or transaction) in a corresponding dimension (e.g., category or characteristic), and a data value may be text, a number, or a combination of both. For example, customer A is aged 31, a male, located in California, and so on. The entire table is a collection of facts, and such a collection of facts may be referred to as a dimensionally-modeled fact collection.
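By way of illustration only, the row-and-column organization described above might be sketched in Python as a list of records keyed by dimension name. The field names, the helper functions `dimensions` and `column`, and every value other than customer A's age, gender, and location are hypothetical and are not taken from Table 1.

```python
# Hypothetical sample records. Only customer A's age, gender, and location
# come from the description; all other names and values are illustrative.
records = [
    {"customer_id": "A", "age": 31, "gender": "M", "state": "California"},
    {"customer_id": "B", "age": 45, "gender": "F", "state": "Oregon"},
    {"customer_id": "C", "age": 27, "gender": "F", "state": "California"},
]


def dimensions(fact_collection):
    """Return the dimension (column) names in first-seen order."""
    seen = []
    for record in fact_collection:
        for name in record:
            if name not in seen:
                seen.append(name)
    return seen


def column(fact_collection, dimension):
    """Return every record's data value in one dimension (one table column)."""
    return [record.get(dimension) for record in fact_collection]
```

Here each dictionary plays the role of one row (record), and each key plays the role of one column (dimension).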
When working with such large datasets, it may be impractical, even impossible, to display all the multi-dimensional records textually. Instead, it can be more convenient to represent the records graphically in various formats. For example, a scatter plot may be used to graphically represent the records shown in Table 1, with each axis representing a particular dimension (column) and each data point representing a particular record (row). Users may then interact with the data points in the scatter plot graphically (e.g., using a mouse or other method to interact with the graphical display), such as creating and/or defining data groups that comprise subsets of the graphical data points and performing various types of operations and/or analysis on one or more of these data groups. In addition, the results of the operations and analysis may also be displayed in graphical formats, either with the data points or using separate graphical representations.
The inventors have realized that it would be useful to enable users to easily and quickly identify or select graphically displayed data points from a large master dataset to form data groups. In addition, users may desire to move or copy data points from one data group to another data group, add data points to a group, or remove data points from a group. It may also be useful to allow the visualization of each group dynamically as well as visualization of the interactions between the groups.
Using a scatter plot as an example, the axes may represent the dimensions (columns of a table) and the data points may represent the records (rows of a table). Additional graphical characteristics, such as color, size, shape, label, etc., may also be used to represent additional dimensions. The records may be displayed in raw format or in aggregated format. If the records are displayed in raw format, then each data point represents one record. If the records are displayed in aggregated format, then each data point represents multiple records aggregated together.
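The distinction between raw and aggregated display might be sketched as follows. This is a minimal illustration only: the function names, the dictionary layout of a data point, and the choice of fixed-width binning as the aggregation rule are all assumptions, since the description does not specify how records are aggregated.

```python
from collections import defaultdict


def raw_points(records, x_dim, y_dim):
    """Raw format: one data point per record."""
    return [{"x": r[x_dim], "y": r[y_dim], "records": [r]} for r in records]


def aggregated_points(records, x_dim, y_dim, x_bin, y_bin):
    """Aggregated format: one data point per (x, y) bin.

    Fixed-width binning is an assumed aggregation rule, used only to
    show that a single data point can stand for several records.
    """
    bins = defaultdict(list)
    for r in records:
        key = (r[x_dim] // x_bin, r[y_dim] // y_bin)
        bins[key].append(r)
    # Place each aggregated point at the lower corner of its bin.
    return [
        {"x": bx * x_bin, "y": by * y_bin, "records": members}
        for (bx, by), members in sorted(bins.items())
    ]
```

In either format, each data point keeps a reference to the record or records it represents, so that a group of data points can always be resolved back to records.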
In order to allow more flexible visualization of the large dataset, in one embodiment, a default master group may be created initially that contains all the records in the dataset, and the records are represented by the data points with each data point representing at least one record. Data points representing these records may then be removed from the master group or copied into new groups. The master group allows the visualization to exclude member records of the other groups as well as show only those member records belonging to the other groups.
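One possible way to realize the two visualization modes of the master group is sketched below, assuming each group is held as a Python set of record identifiers. The function names are hypothetical.

```python
def make_master_group(record_ids):
    """Default master group: initially contains every record in the dataset."""
    return set(record_ids)


def visible_excluding(master, other_groups):
    """Show master-group members that belong to none of the other groups."""
    claimed = set().union(*other_groups)
    return master - claimed


def visible_only(master, other_groups):
    """Show only master-group members that belong to some other group."""
    claimed = set().union(*other_groups)
    return master & claimed
```

For example, with a master group of record IDs 0 through 5 and two other groups {1, 2} and {2, 5}, the first mode would show records 0, 3, and 4, and the second would show records 1, 2, and 5.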
Once the records are displayed graphically, at 110, a user may interact with the display and cause one or more data groups to be created and/or defined, each data group containing a subset of the data points. In other words, each data group may contain anywhere between 0 and n data points, where n is the total number of data points in the master dataset. Furthermore, one data point may belong to multiple data groups. Recall that each data point represents a multi-dimensional record (row of the table), and thus, in effect, each data group comprises 0 or more records. For example, a user may select a subset of the data points and create a new data group. Alternatively, a user may select a subset of the data points and copy or move them into one or more existing data groups. More specifically, a computer operates based on indications of the user's actions with respect to the display to perform these operations. This step is described in more detail below in
At 120, the user may cause various types of analysis to be performed on the data groups, such as performing one or more set operations or statistical operations on one data group or between two or more data groups. The set operations may include the union of two or more groups, the intersection of two or more groups, the exclusion of two or more groups, the exclusion of one group from another group, etc. The statistical operations may include the histogram, mean, median, first quartile, etc. of a data group. Again, the computer actually performs these analyses and/or operations based on the user's input, selection, or control. The user may choose to cause any set operation to be performed on one or more of the data groups. In addition, the user may choose to cause various types of operations to be performed on individual data groups, such as determining the maximum or minimum value of the data points and/or the corresponding records in a particular data group, or calculating the mean value or histogram for the data points and/or the corresponding records in a data group.
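As one illustration of the statistical operations named above, the following sketch computes a summary for the values of a single dimension within one data group using Python's standard `statistics` module. The function name `group_summary`, the fixed-width histogram, and the handling of very small groups are assumptions made for the example.

```python
import statistics
from collections import Counter


def group_summary(values, bin_width):
    """Summarize one dimension of a data group.

    Returns None for an empty group, since a data group may hold 0 records.
    """
    values = sorted(values)
    if not values:
        return None
    # statistics.quantiles needs at least two data points.
    first_quartile = (
        statistics.quantiles(values, n=4)[0] if len(values) >= 2 else None
    )
    return {
        "min": values[0],
        "max": values[-1],
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "first_quartile": first_quartile,
        # Histogram keyed by the lower edge of each fixed-width bin.
        "histogram": dict(Counter((v // bin_width) * bin_width for v in values)),
    }
```

The quartile here uses the default method of `statistics.quantiles`; other quartile conventions would give slightly different values for small groups.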
At 130, the results of the set operations may be represented in graphical formats, either with the data points or separately. Again, the actual graphical formats used to represent the results may vary depending on user preferences, and colors, sizes, shapes, and other graphical characteristics may be used to graphically distinguish types of operation results.
As will be understood, 100, 110, 120, and 130 may be implemented as a software program. For example, an existing graphical library, such as OpenGL or Java 3D, may be utilized in displaying the data points in various graphical formats and providing the necessary graphical and image functionalities. Data structures such as arrays, sets, or other data structures may be used to represent the records, data points, and/or data groups. The set operations are performed based on their respective mathematical definitions. For example, the result of a union operation between two data groups, group 1 and group 2, is a group that contains all the data points from either group 1 or group 2. The result of an intersection operation between two data groups, group 1 and group 2, is a group that contains only those data points that originally belong to both group 1 and group 2.
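The union and intersection definitions given above, together with the exclusion of one group from another, might be sketched with Python's built-in `set` type as follows. The function names are hypothetical, and sets are only one of the data structures the description contemplates.

```python
def union(*groups):
    """All data points that belong to at least one of the groups."""
    return set().union(*groups)


def intersection(first, *rest):
    """Only the data points that belong to every one of the groups."""
    return set(first).intersection(*rest)


def exclude(group, other):
    """Data points of `group` that do not also belong to `other`."""
    return set(group) - set(other)
```

With group 1 holding points A, B, and C and group 2 holding points B, C, and D, the union holds A, B, C, and D; the intersection holds B and C; and excluding group 2 from group 1 leaves only A.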
Next, at 201, the user may cause a new data group to be created with the selected data points of interest. Again, since each data point represents a multi-dimensional record, the user in effect has caused the corresponding records to be organized into a new group. The user may provide a unique name for the new data group so that the new data group may be identified and referred to easily in the future. Alternatively, if the user chooses not to provide a unique name for the new data group, the software may provide a default unique name for the new data group instead.
From an implementation point of view, assuming an array data structure is used to represent each individual data group, then a new array may be constructed to represent the newly created data group, and the selected data points are the elements of the array.
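A minimal sketch of this array-based approach, including the default unique name supplied when the user does not provide one, is given below. The class name `GroupRegistry`, the "Group N" naming scheme, and the choice to reject duplicate names are assumptions made for illustration.

```python
class GroupRegistry:
    """Holds data groups, each stored as an array (list) of data point IDs."""

    def __init__(self):
        self.groups = {}   # group name -> list of selected data point IDs
        self._counter = 0  # used to generate default names

    def _default_name(self):
        # Keep counting until a name not already in use is found.
        while True:
            self._counter += 1
            name = f"Group {self._counter}"
            if name not in self.groups:
                return name

    def create_group(self, selected_points, name=None):
        """Create a new group from the selected points and return its name."""
        if name is None:
            name = self._default_name()
        elif name in self.groups:
            raise ValueError(f"group name already in use: {name!r}")
        # A new array is constructed; the selected points are its elements.
        self.groups[name] = list(selected_points)
        return name
```

Because each group has its own array, the same data point may appear in several groups without any group affecting another.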
In another example,
In another example,
In another example,
There are additional ways for a user to define data groups. For example, a user may cause an existing data group to be deleted entirely, two or more existing groups to be combined, one group to be divided into multiple groups, etc. The user may cause these operations to be performed by the computer by taking the appropriate actions via a computer-implemented user interface that enables the user to work with the data points and data groups graphically. The actual design and implementation of such a user interface often depends on user preferences. The layout of the user interface may take into consideration the functionalities of the software as well as factors such as ease of use, aesthetics, robustness, etc.
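The deleting, combining, and dividing of groups mentioned above might be sketched as follows, assuming the groups are kept in a dictionary that maps each group name to a set of data point IDs. The function names, and the use of a caller-supplied predicate to decide how a group is divided, are hypothetical.

```python
def delete_group(groups, name):
    """Delete an existing group entirely (its data points are not destroyed)."""
    del groups[name]


def combine_groups(groups, names, new_name):
    """Replace two or more existing groups with one group holding all members."""
    merged = set().union(*(groups[n] for n in names))
    for n in names:
        del groups[n]
    groups[new_name] = merged


def divide_group(groups, name, predicate, name_if_true, name_if_false):
    """Split one group into two according to a yes/no test on each point."""
    members = groups.pop(name)
    matching = {m for m in members if predicate(m)}
    groups[name_if_true] = matching
    groups[name_if_false] = members - matching
```

In a user interface, the predicate would typically be derived from a graphical action, such as a second selection made within the group being divided.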
To simplify the description,
As described above, to select any data point 301, the user may click on the particular data point 301 of interest using a mouse. Alternatively, the user may drag the mouse over a group of data points 301 while holding down the left mouse button.
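The two selection gestures described above might be sketched as simple hit tests on the screen coordinates of the data points. The function names, the click tolerance, and the assumption that a drag defines a rectangle between the press and release positions are illustrative only.

```python
import math


def select_by_click(points, click_x, click_y, tolerance=5.0):
    """Return the ID of the nearest point within `tolerance`, or None.

    `points` maps a data point ID to its (x, y) screen position.
    """
    best_id, best_dist = None, tolerance
    for point_id, (x, y) in points.items():
        dist = math.hypot(x - click_x, y - click_y)
        if dist <= best_dist:
            best_id, best_dist = point_id, dist
    return best_id


def select_by_drag(points, x0, y0, x1, y1):
    """Return the IDs of all points inside the dragged rectangle.

    (x0, y0) is where the button was pressed and (x1, y1) where it was
    released; the drag may run in any direction.
    """
    left, right = sorted((x0, x1))
    low, high = sorted((y0, y1))
    return {
        point_id
        for point_id, (x, y) in points.items()
        if left <= x <= right and low <= y <= high
    }
```

The resulting set of IDs is what would then be passed on to create a new group or to copy or move points into an existing one.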
In addition to graphically selecting one or more data points, the user may cause data groups to be defined. The existing data groups may be listed. The user may choose to cause various set operations to be performed on one or more data groups.
Below the group listing are control components 420 that allow the user to define the data groups. The user may indicate what he or she desires to do by clicking on the appropriate control buttons. For example, once the user has selected some data points of interest, the user may click the “Create Group” button 421 to create a new group that contains the selected data points. Alternatively, the user may click the “Copy Data Points” button 422 to copy the selected data points into one or more groups.
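The handlers behind such control buttons might look like the sketch below, which covers copying the selected data points into one or more groups and, for comparison, moving them from one group to another. The function names and the dictionary-of-sets representation are assumptions and do not describe any particular embodiment.

```python
def copy_points(groups, selected, target_names):
    """Copy the selected points into each target group.

    The points stay in any groups they already belong to, so one data
    point may end up in multiple groups.
    """
    for name in target_names:
        groups[name] |= set(selected)


def move_points(groups, selected, source_name, target_name):
    """Move the selected points out of the source group into the target."""
    moving = groups[source_name] & set(selected)
    groups[source_name] -= moving
    groups[target_name] |= moving
```

Only points that actually belong to the source group are moved; selected points outside it are ignored by `move_points`.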
Near the bottom is a list of available operations 430 that the user may perform on the data groups. For example, the user may click the “Union” button 431 to perform a union operation on two or more groups, or the “Intersection” button 432 to perform an intersection operation on two or more groups. Additional or different components may be included in different embodiments of the user interface depending on user preferences and to accommodate or handle different types of operations to be performed on the data groups.
In the sample user interface shown in
As described above, the results of the operations may also be displayed graphically.
In
The method described above in FIGS. 1 and 2A-2D may be carried out, for example, in a programmed computing system.
According to various embodiments, the data values that belong to large datasets may be stored in a database 614. The datasets may be accessed via the network using different methods, such as from computers 602, 603 connected to the network 612.
The software program implementing various embodiments may be executed on the server 608. Alternatively, the software program may be executed on the users' computers 602, 603. The graphical representation of the data points may be displayed on the users' computer screens, and the users may interact with the data points through the user interface provided by the software program.
While this invention has been described in terms of several preferred embodiments, there are alterations, permutations, and various substitute equivalents, which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and apparatuses of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and various substitute equivalents as fall within the true spirit and scope of the present invention.