Various obstacles arise when analyzing large volumes of multi-dimensional data. Queries on large multi-dimensional datasets are typically slow because of the setup time, search time, and connection time they require. Further, most query specifications are limited to attributes and categories. As such, complex and multiple queries are required to query on a specific value in a transaction record. Further, traditional query results are written in pages or listings, which can be difficult to interpret.
Prior data comparison techniques include simple graphical techniques, such as bar charts, pie charts, and x-y charts. These simple graphical techniques are easy to use but offer limited information when analyzing large amounts of business data. For example, simple bar charts or pie charts only show highly aggregated data. Drilldown techniques can be employed. Such techniques, however, merely allow users to drill down by attributes or categories.
In order to view specific content of a large multi-dimensional dataset, numerous and time-consuming queries are often necessary since query specifications are limited to attributes or categories of the dataset. Complex and multiple queries are required to query a specific value in a transaction record. Multiple queries often require manual evaluation of vast amounts of data. In some instances, the query results are written in multiple pages of search results or listings. Users are required to manually review the listings to find information on specific content or values in the data.
Exemplary embodiments in accordance with the present invention are directed to systems, methods, and apparatus for content queries that generate visualizations of multi-dimensional datasets. Embodiments in accordance with the invention provide systems and methods for generating fast graphical interface content queries for visually mining large multi-dimensional datasets. Users are thus able to make a content query through a graphical interface on certain attribute values and to get real-time visual feedback while analyzing data. Visualizations are provided as interactive graphical displays.
These embodiments are utilized with various systems and apparatus.
The system 10 includes a host computer system 20 and a repository, warehouse, or database 30. The host computer system 20 comprises a processing unit 50 (such as one or more processors or central processing units, CPUs) for controlling the overall operation of memory 60 (such as random access memory (RAM) for temporary data storage and read only memory (ROM) for permanent data storage) and a graphical display algorithm or content query algorithm 70 for generating visualizations for content queries in multi-dimensional datasets. The memory 60, for example, stores data, control programs, and other data associated with the host computer system 20. In some embodiments, the memory 60 stores the graphical display algorithm 70. The processing unit 50 communicates with memory 60, database 30, graphical display algorithm 70, and many other components via buses 90.
Embodiments in accordance with the present invention are not limited to any particular type or number of databases and/or host computer systems. The host computer system, for example, includes various portable and non-portable computers and/or electronic devices. Exemplary host computer systems include, but are not limited to, computers (portable and non-portable), servers, mainframe computers, distributed computing devices, laptops, and other electronic devices and systems whether such devices and systems are portable or non-portable.
In order to generate fast visual content queries for visually mining large multi-dimensional datasets, exemplary embodiments in accordance with the invention utilize one or more of the following: (1) select a value from a transaction record and generate a content query to visually select an attribute content (value) from a transaction record (dataset); (2) use an association algorithm to associate all transaction records that include the attribute values matching the query contents; (3) store the query results in an associated hash map for quick retrieval; (4) partition and lay out the associated hash map into a sequence of graphical displays. The use of real-time content queries, in-core association, and hash map partitioning and rendering shortens the entire processing time needed to generate interactive graphical results to queries. Users receive near instant feedback to queries, including graphically based queries. Further, the graphical displays themselves are interactive with the user such that queries are generated on or at the graphical display (example, at the location of a specific content or value).
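Steps (2) and (3) above can be illustrated with a minimal sketch. The record fields, the helper names `associate` and `build_hash_map`, and the sample data are hypothetical and chosen only for illustration; the specification does not prescribe a particular data layout or association algorithm.

```python
from collections import defaultdict

# Hypothetical transaction records; the field names are illustrative only.
records = [
    {"EmployeeID": 81352, "Country": "DE", "Cost": 4.20},
    {"EmployeeID": 81352, "Country": "FR", "Cost": 12.75},
    {"EmployeeID": 40017, "Country": "DE", "Cost": 0.80},
    {"EmployeeID": 81352, "Country": "DE", "Cost": 0.35},
]


def associate(records, attribute, value):
    """Return every record whose `attribute` equals the queried content value."""
    return [r for r in records if r.get(attribute) == value]


def build_hash_map(matches, partition_by):
    """Group the associated records by a second attribute for quick retrieval."""
    hash_map = defaultdict(list)
    for r in matches:
        hash_map[r[partition_by]].append(r)
    return dict(hash_map)


# Content query on a specific value, then partition the result by country.
matches = associate(records, "EmployeeID", 81352)
by_country = build_hash_map(matches, "Country")

for country, rows in sorted(by_country.items()):
    print(country, len(rows), round(sum(r["Cost"] for r in rows), 2))
```

Each key of `by_country` could then feed one bar or one display in the sequence of graphical displays; a Python `dict` is used here simply because it is a built-in hash map.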
In order to assess the data, users or analysts can have various questions, such as “Who is the top caller? How many calls did this particular caller make? To and from which countries did the caller call?” These and other queries are answered in real-time using, for example, one or more of the following visual query steps: (1) moving a pointer or input device to a particular cell, pixel, or graph location; (2) displaying the content of a transaction record at a selected location; and (3) selecting one or more graphical contents in a record. Then, a sequence of graphical displays (example, bar charts, pixel bar charts, graphs, maps, etc.) is presented to the analyst to answer the above questions.
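The analyst questions above reduce to simple aggregations once the call records are in memory. The following sketch assumes a hypothetical list of call records with `caller`, `from_country`, and `to_country` fields; it shows only the data side of the query, not the graphical rendering.

```python
from collections import Counter

# Hypothetical call records; names and countries are made up for illustration.
calls = [
    {"caller": "alice", "from_country": "US", "to_country": "DE"},
    {"caller": "alice", "from_country": "US", "to_country": "FR"},
    {"caller": "bob", "from_country": "UK", "to_country": "US"},
    {"caller": "alice", "from_country": "US", "to_country": "DE"},
    {"caller": "carol", "from_country": "DE", "to_country": "UK"},
]

# Who is the top caller, and how many calls did that caller make?
top_caller, n_calls = Counter(c["caller"] for c in calls).most_common(1)[0]

# To and from which countries did the top caller call?
top_calls = [c for c in calls if c["caller"] == top_caller]
to_countries = sorted({c["to_country"] for c in top_calls})
from_countries = sorted({c["from_country"] for c in top_calls})

print(top_caller, n_calls, from_countries, to_countries)
```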
Looking to
According to block 210, a user (example, data analyst) selects one or more attribute contents (i.e., values) in the displayed dataset and a hash table is constructed. Selection is made with various inputs, such as a mouse, a pointer, cursor/arrow selection, etc. The selection is made at a value location to get the transaction record on which the user requests more information. For instance, a user moves the pointer to a data item to perform a content query, display a transaction record, or select any attribute. By way of example, content queries are made on specific content or values and range from small values (such as single cells, items, people, names, addresses, countries, etc.) to large values (such as multiple cells, plural items, etc.). This selection process is illustrated in connection with
Each pixel in the graphical illustration represents a data item, which enables the visualization of large volumes of data. The color of a pixel represents the value of a data item, as shown in an adjacent color map. A consistent or common scale 460 (such as a color scale) is used throughout the various graphs. The color scale 460 represents the scale for data across various layers or graphs. A color scale can be generated with a variety of symbols, letters, markings, colors, indicia, etc. Further, the scale is divided into a continuous plurality of ranges wherein each range has a different visual identification or marking, such as a different color. As shown, range 460A includes values of $0-$1; range 460B includes values of above $1-$5; range 460C includes values of above $5-$10; range 460D includes values of above $10-$20; range 460E includes values of above $20-$100; and range 460F includes values of above $100-$500. Further, each color scale is a continuous color map for each data item. For example, the range $0-$1 contains data items for dollar values $0.01, $0.02, $0.03, etc. (with each dollar amount within the range having its own shading of color).
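The mapping from a dollar amount to one of the ranges 460A-460F, and to a shade within that range, can be sketched as follows. The function names `range_for` and `shade_within_range` are hypothetical, and the linear shading within a range is an assumption; the text states only that each amount within a range has its own shade.

```python
from bisect import bisect_left

# Upper bounds (in dollars) of ranges 460A-460F as listed in the text.
UPPER_BOUNDS = [1, 5, 10, 20, 100, 500]
LOWER_BOUNDS = [0] + UPPER_BOUNDS[:-1]
RANGE_LABELS = ["460A", "460B", "460C", "460D", "460E", "460F"]


def range_index(amount):
    """Index of the range containing `amount`; upper bounds are inclusive."""
    if not 0 <= amount <= UPPER_BOUNDS[-1]:
        raise ValueError(f"amount {amount} is outside the $0-$500 scale")
    return bisect_left(UPPER_BOUNDS, amount)


def range_for(amount):
    """Label of the color range (460A-460F) for a dollar amount."""
    return RANGE_LABELS[range_index(amount)]


def shade_within_range(amount):
    """Position of `amount` inside its range, from 0.0 to 1.0.

    Assumption: shading varies linearly between the bounds of a range.
    """
    i = range_index(amount)
    lo, hi = LOWER_BOUNDS[i], UPPER_BOUNDS[i]
    return (amount - lo) / (hi - lo)


for amount in (0.01, 1, 3, 5.5, 500):
    print(amount, range_for(amount), round(shade_within_range(amount), 3))
```

Using `bisect_left` makes each upper bound inclusive, which matches the "above $1-$5" wording: exactly $1 falls in 460A, while $1.01 falls in 460B.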
In one embodiment, the graphs are constructed and presented in a pixel-oriented layout to enable quick access or instant-drilldown to content or value queries. A pixel, for example, can represent a data record. Further, the color of a pixel can represent the value of a data item. In some exemplary embodiments, the graphs are pixel bar charts. The pixels are arranged from left to right and bottom to top based on the values of the data items in a bar. In the graphs, the X-axis represents a business group; the Y-axis represents a number of transactions; and color represents a dollar amount for a transaction (telephone call). Of course, representations in the X and Y axes could be switched or altered and still be within the scope of embodiments in accordance with the invention.
Each pixel can be arranged in a variety of ways. For example, pixels are arranged from bottom to top and left to right in each bar. Further, information (such as the employee who placed the telephone call) is encoded in each pixel and represented, for example, as a color or other graphical representation. As one example, each separate transaction (example, telephone call) in each layer or graph is represented with a pixel. The amount of time for each transaction is encoded into the pixel. Further, the color of the pixel correlates to the color scale 460. Thus, business groups with higher bars in the graph have more pixels (i.e., more transactions or telephone calls). Likewise, more red or darker pixels in the bar indicate more expensive telephone calls. A user can also “click” or otherwise activate any individual pixel and get specific information or data regarding the individual transaction.
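One possible pixel arrangement within a single bar can be sketched as below. The function `layout_bar`, the fixed `bar_width`, and the ascending row-by-row fill order are assumptions made for illustration; the text permits other arrangements.

```python
def layout_bar(values, bar_width):
    """Place one pixel per transaction inside a bar of fixed width.

    Values are sorted ascending and filled row by row: left to right
    within a row, rows stacked bottom to top. Returns (x, y, value)
    tuples, where y = 0 is the bottom row of the bar.
    """
    if bar_width <= 0:
        raise ValueError("bar_width must be positive")
    placed = []
    for i, value in enumerate(sorted(values)):
        x = i % bar_width   # column within the bar, left to right
        y = i // bar_width  # row within the bar, bottom to top
        placed.append((x, y, value))
    return placed


# Five hypothetical call costs placed in a bar two pixels wide.
pixels = layout_bar([5.0, 1.0, 3.0, 2.0, 4.0], bar_width=2)
bar_height = max(y for _, y, _ in pixels) + 1
print(pixels, bar_height)
```

With this layout, a bar holding more transactions is taller, and the most expensive calls end up in the top rows, which is consistent with the reading of bar height and color described above.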
Looking to
As used herein, a “visual content query” is a request for information stored in a database, wherein the request is on content that is being visually displayed or presented.
Content query 520 is one option or menu selection in pull-down menu 510. Selection of content query 520 generates a second pull-down menu 530 on the graph 400. The pull-down menu 530 provides a plurality of different subjects or topics for a content query on data record 470. For example, a user can select one or more of different values associated with Business Group, Cost, Duration, etc. For discussion purposes, a selection is made for a particular employee 550 (“EmployeeID=81352”). Selection of this specific content value indicates that the user requests a content query for a selected content of a category in the dataset. Issuance of this selection is represented at block 220 in
In one exemplary embodiment, an association algorithm is used to associate all transaction records having attribute values that match the specified content query. The association algorithm associates all data to the specific content query. In the example of
The hash is generally smaller than the data itself and is generated by a formula. A hash function H, for example, is a transformation that takes an input “m” and returns a fixed-size string, called a hash value “h” (such that h=H(m)). Hashed data is computationally quicker to process than un-hashed data. Further, hashes provide efficient data structures for lookup and comparison (example, display in pixel bar charts).
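The fixed-size property of h=H(m) can be demonstrated with a short sketch. SHA-256 from the Python standard library is used purely as an example; the text does not name a specific hash function, and a hash map implementation would normally use a much cheaper non-cryptographic hash.

```python
import hashlib


def H(m: str) -> str:
    """Return a fixed-size hash value h for an input m of any length."""
    return hashlib.sha256(m.encode("utf-8")).hexdigest()


short_h = H("EmployeeID=81352")
long_h = H("EmployeeID=81352" * 1000)

# Both digests have the same length regardless of the input size.
print(len(short_h), len(long_h))

# Equal inputs always hash to the same value, which is what makes
# hash-based lookup and comparison possible.
print(H("EmployeeID=81352") == short_h)
```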
Once the data is hashed, the associated hash map is partitioned and rendered to a sequence or plurality of visualizations, such as graphical displays (see block 230 of
In one exemplary embodiment, once the data is hashed, the associated hash map is partitioned and rendered to generate various outputs. Examples of static outputs include, but are not limited to, spreadsheets, pivot tables, JPEG (Joint Photographic Experts Group) files, and other print-outs or formats for over-night reporting. The non-static outputs include, but are not limited to, interactive outputs, such as interactive graphs (pixel bar charts, bar charts, spreadsheets, and other “clickable” formats).
Interactive graphs and displays also support drilldown on pixels to get detail records. The term “drilldown” or “drill down” (or variations thereof) is used when referring to moving down through a hierarchy of folders and/or files in a file system like that of Windows. The term may also mean clicking, selecting, and/or navigating through a series of dropdown menus or graphical illustrations in a graphical user interface. Drilldown layers, for example, allow the user to explore the graphical illustration in a hierarchical manner by pointing, clicking, and/or selecting on the part of the graphical illustration where more detail is needed but illustrate an exemplary embodiment for discussion.
Embodiments in accordance with the invention automatically export the content query results to an interactive graphic or display and/or a static graphic or display. Further, exemplary embodiments enable users to issue and receive real-time content queries on a specified value and quickly visualize the results in a sequence of displays. Queries are not limited to attributes, dimensions, or categories but include specific values in a transactional record (such as a specific data value displayed in a dataset). The query results are then graphically displayed to the user.
In one exemplary embodiment, the flow diagrams are automated. In other words, the apparatus, systems, and methods operate automatically. As used herein, the terms “automated” or “automatically” (and like variations thereof) mean controlled operation of an apparatus, system, and/or process using computers and/or mechanical/electrical devices without the necessity of human intervention, observation, effort and/or decision.
The flow diagrams in accordance with exemplary embodiments of the present invention are provided as examples and should not be construed to limit other embodiments within the scope of the invention. For instance, the blocks should not be construed as steps that must proceed in a particular order. Additional blocks/steps may be added, some blocks/steps removed, or the order of the blocks/steps altered and still be within the scope of the invention. Further, specific numerical data values (such as specific quantities, numbers, categories, etc.) or other specific information should be interpreted as illustrative for discussing exemplary embodiments. Such specific information is not provided to limit the invention. Further, although bars are used in the graphs, other graphical illustrations can also be used.
In an exemplary embodiment, the present invention is directed to visual data comparison techniques for quickly and easily comparing, analyzing, and/or revealing information in large amounts of data. Embodiments in accordance with the invention, for example, are utilized to visualize valuable information concealed in vast amounts of data, such as business data.
Further, it should be noted that the display of graphical results is not limited to single illustrations. In other words, multiple illustrations can simultaneously be displayed to the user. For example, the graphical illustrations of
In the various embodiments in accordance with the present invention, embodiments are implemented as a method, system, and/or apparatus. As one example, exemplary embodiments are implemented as one or more computer software programs to implement the methods described herein. The software is implemented as one or more modules (also referred to as code subroutines, or “objects” in object-oriented programming). The location of the software (whether on the host computer system of
The above discussion is meant to be illustrative of the principles and various embodiments of the present invention. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.