The present invention relates to a graphics display technique for generating a graphics image of hierarchical data.
With the widespread use of computer-based database systems, various approaches to data mining systems for extracting desired information from vast amounts of data have been proposed.
The following documents are considered:
[Non-Patent Document 1]
[Non-Patent Document 2]
[Non-Patent Document 3]
[Non-Patent Document 4]
[Non-Patent Document 5]
For example, some text mining systems for document files such as research paper files have the capability of finding insights contained in a large number of documents by using category information, words, and modification relations between words in the documents (see non-patent document 1, for example).
For example, the United States National Library of Medicine stores 11,000,000 biomedical research papers (as of September 2002). The library defines a category system called MeSHTerm and a label is assigned to each paper to indicate which category the paper belongs to. The labels can be used for searches. More than one category is assigned to one document. This category system has a huge hierarchical structure including as many as 38,000 nodes in total (as of September 2002).
A text mining system called IBM TAKMI for biomedical documents (abbreviated to MedTAKMI system) described in non-patent document 1 provides an analysis function for such hierarchical structures. In this system, by specifying one node (category) in a tree-structured category system, all the documents in that category, including documents in the categories of all descendant nodes of that node, can be aggregated and analyzed.
Various other technologies have been proposed that display such a data constellation (hereinafter referred to as hierarchical data) in graphical form in which a number of data elements are organized into a hierarchical structure (see non-patent documents 2 to 5, for example).
A prior-art approach using a Hyperbolic Tree method disclosed in non-patent document 2 arranges a tree structure in a hyperbolic space to represent both a hierarchical structure of data and a link structure among data elements.
Another prior-art approach using the Treemap method disclosed in non-patent document 3 splits a screen space on which hierarchical data is to be displayed into regions in alternating horizontal and vertical directions and associates each of the regions with a data element, thereby representing a hierarchical structure of the data.
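By way of illustration only, the alternating split described for the Treemap method can be sketched as follows. This is a minimal sketch of the commonly described slice-and-dice variant, not code taken from non-patent document 3; the names Item and treemap, and the use of a weight per data element to size each region, are assumptions introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical data element: a leaf carries a weight, a branch carries children."""
    name: str
    weight: float = 0.0
    children: list = field(default_factory=list)

    def total(self):
        # Weight of a branch is the sum of the weights below it.
        if not self.children:
            return self.weight
        return sum(child.total() for child in self.children)


def treemap(item, x, y, w, h, horizontal=True, out=None):
    """Assign a region (x, y, w, h) to every leaf, alternating the split direction per level."""
    if out is None:
        out = {}
    if not item.children:
        out[item.name] = (x, y, w, h)
        return out
    total = item.total()  # assumed to be non-zero
    offset = 0.0
    for child in item.children:
        share = child.total() / total
        if horizontal:
            # Split the region side by side along the x direction.
            treemap(child, x + offset * w, y, share * w, h, False, out)
        else:
            # Split the region along the y direction.
            treemap(child, x, y + offset * h, w, share * h, True, out)
        offset += share
    return out


tree = Item("root", children=[
    Item("A", weight=2.0),
    Item("B", children=[Item("B1", weight=1.0), Item("B2", weight=1.0)]),
])
regions = treemap(tree, 0.0, 0.0, 4.0, 2.0)
```

Because every leaf receives a share of a fixed screen space, the region of each leaf shrinks as the number of leaves grows.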
In prior-art graphics display technologies disclosed in non-patent documents 4 and 5, icons of data at the lowest level are enclosed in a graphic such as a rectangle, then a graphic enclosing a cluster of such graphics is created to represent a higher level, another graphic enclosing the graphics at the higher level is created, and this process is repeated until the highest level is reached to arrange data in the screen space.
As described above, data mining systems have the capability of analyzing vast amounts of data to obtain insights contained in the data. However, if the size of a database analyzed is large, a difficulty arises in how information obtained should be presented to a user.
For example, most researchers (analysts) who analyze the research paper database of the United States National Library of Medicine by text mining with the IBM TAKMI text mining system for biomedical documents described above know which categories to research, because the category system defined by the library is in the public domain. Most analysts analyze only the categories familiar to them, because it is difficult for them to search the database for nodes that contain insights of interest from among as many as 40,000 nodes, or to investigate all nodes by following related nodes.
From the viewpoint of data mining, however, it is desirable that, if any other categories include noteworthy insights, the insights should be presented to researchers. For example, if an analyst first searches a category familiar to him or her for information that relates to a number of categories, he or she may not notice the other categories that relate to the information. To avoid this, it is desirable to provide a function that indicates to a user the node in a category system of a database with which the user should start analysis and that provides the user with an overview of the whole category system of the database.
The graphics display technologies described above can provide an at-a-glance output in which a user can see data elements of a hierarchical structure. However, the output is not always easy to read if it directly displays all of thousands or tens of thousands of data elements.
The prior art disclosed in non-patent document 2 that uses the Hyperbolic Tree method locates lowest-level data elements at the edge of a radial tree structure. Therefore, it is difficult to display thousands or tens of thousands of data elements.
The prior art disclosed in non-patent document 3 that uses the Treemap method provides a graphics display method suitable for relatively large-scale hierarchical data. However, when thousands or tens of thousands of data elements are displayed, the unit display area mapped to each data element becomes so small that visibility decreases.
The prior art disclosed in non-patent documents 4 and 5 provides the capability of displaying a bar graph representing the attributes of data on a graphic associated with each data element. With this capability, when a piece of information that relates to a number of categories is used as search criteria, bars associated with particular data elements in the categories project from the graphics around that information. Thus, the relation between the information used as the search criteria and the data elements can readily be known. However, if bars are displayed for thousands or tens of thousands of data elements, the image is crowded with the bars, degrading the visibility.
In light of these problems, an aspect of the present invention is to provide a graphics display system and method for effectively presenting information obtained by data mining.
Another aspect of the present invention is to improve the visibility of the display of each individual data element and attributes of data included in a particular category while allowing an overview of whole large-scale hierarchical data to be provided.
The present invention achieves these aspects when implemented as a graphics image generation apparatus which visualizes a hierarchical structure of hierarchical data and presents the visualized data, and which is configured as follows. The graphics image generation apparatus includes an aggregation unit for performing aggregation of attributes of nodes in the hierarchical data according to given aggregation criteria; a filtering unit for filtering the result of aggregation performed by the aggregation unit according to given filtering criteria to select nodes to be displayed from the hierarchical data; and a visualization unit for generating a graphics image that includes the nodes selected by the filtering unit and reflects the hierarchical structure of the hierarchical data. More specifically, the aggregation unit obtains an aggregate value for a given node in the hierarchical data, the aggregate value being the result of aggregation of an attribute of the given node, and obtains a summarized aggregate value by summarizing aggregate values for descendant nodes of the given node into the aggregate value of the given node. The filtering unit replaces an aggregate value of a node that is determined as being ineligible to be displayed according to the given filtering criteria with the summarized aggregate value and determines whether or not the node is to be displayed.
These and other aspects, features, and advantages of the present invention will become apparent upon further consideration of the following detailed description of the invention when read in conjunction with the drawing figures, in which:
The present invention provides a graphics display apparatus, system and method for effectively presenting information obtained by data mining. It also improves the visibility of the display of each individual data element and attributes of data included in a particular category while allowing an overview of whole large-scale hierarchical data to be provided.
In an example embodiment, the present invention is implemented as a graphics image generation apparatus which visualizes a hierarchical structure of hierarchical data and presents the visualized data, and which is configured as follows. The graphics image generation apparatus includes an aggregation unit for performing aggregation of attributes of nodes in the hierarchical data according to given aggregation criteria; a filtering unit for filtering the result of aggregation performed by the aggregation unit according to given filtering criteria to select nodes to be displayed from the hierarchical data; and a visualization unit for generating a graphics image that includes the nodes selected by the filtering unit and reflects the hierarchical structure of the hierarchical data. More specifically, the aggregation unit obtains an aggregate value for a given node in the hierarchical data, the aggregate value being the result of aggregation of an attribute of the given node, and obtains a summarized aggregate value by summarizing aggregate values for descendant nodes of the given node into the aggregate value of the given node. The filtering unit replaces an aggregate value of a node that is determined as being ineligible to be displayed according to the given filtering criteria with the summarized aggregate value and determines whether or not the node is to be displayed.
The filtering unit determines, on the basis of the degree of meeting the aggregation criteria in the aggregation unit, the order in which determination is made on nodes in the hierarchical data as to whether the nodes are to be displayed by using the summarized aggregate value.
The present invention is also implemented as a data analysis apparatus that analyzes a set of data stored in a database and is configured as follows. The data analysis apparatus includes an aggregation unit for aggregating, according to given aggregation criteria, attributes of data classified in a given category system having a hierarchical structure; a filtering unit for filtering the category system according to given filtering criteria by using the result of aggregation by the aggregation unit to select valid categories according to the filtering criteria; and an analysis result output unit for generating and displaying a graphics image that includes the valid categories selected by the filtering unit and represents attributes of data included in the valid categories with visual elements.
More specifically, the aggregation unit obtains an aggregate value for a given category in the category system, the aggregate value being the result of aggregation of an attribute included only in the category, and obtains a summarized aggregate value by summarizing aggregate values of attributes of data included in a lower-level category of the category. The filtering unit replaces an aggregate value of a category that is determined as being invalid according to the given filtering criteria with the summarized aggregate value and determines whether or not the category is valid.
The data analysis apparatus may further include an event extraction unit for extracting a given input operation on the visual element of the graphics image displayed by the visualization unit as an event for specifying a category including data corresponding to the visual element. In this case, the filtering unit performs filtering according to information indicating the specification of the category that has been extracted by said event extraction unit.
Furthermore, the present invention is also implemented as a graphics image generation method including the steps performed by the aggregation unit, filtering unit, and visualization unit described above, or as a data analysis method including the steps performed by the aggregation unit, filtering unit, and analysis result output unit described above.
Moreover, the present invention is also implemented as a program for controlling a computer to function as the graphics image generation apparatus or data analysis apparatus described above. The program can be provided by storing in a magnetic disk, optical disk, semiconductor memory, or other recording media and delivering the medium, or by distributing over a network.
According to the present invention configured as described above, the combination of filtering technology for hierarchical data in data analysis and data visualization technology allows a graphics image to be generated that displays data elements and attributes of a particular category with high visibility while providing an overview of the whole large-scale hierarchical data.
Furthermore, according to the present invention, a graphics image can be output in which the results of data analysis conducted based on given aggregation or filtering criteria are properly reflected; therefore, information obtained by data mining can be effectively presented.
An advantageous embodiment for carrying out the present invention (hereinafter referred to as an embodiment) will be detailed below with reference to the accompanying drawings. An overview of the present invention will be given first. The present invention uses a computer system to analyze hierarchical data and generate a graphics image that visually represents the results of the analysis. While any type of graphics image that can represent hierarchical data can be used, an approach will be used in the embodiment described below in which a hierarchical structure is represented in two-dimensional form by a set of nested areas that represent levels.
A nested graphics image is generated as follows. First, lowest-level data elements are located in a space in which the graphics image is to be generated (a space displayed on a display device; hereinafter referred to as a display space). Then, an area is created that encloses a set of data elements to represent the higher level immediately above. The areas thus generated are rearranged in the display space and a larger area enclosing the set of areas is generated to represent the higher level immediately above. This process is repeated recursively until the highest level of the hierarchical data is represented.
In other words, a graphics image of hierarchical data is generated by placing levels of data in order from lowest to highest.
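By way of illustration only, the lowest-to-highest order of generation can be sketched as follows. This is a minimal sketch under stated assumptions: the function name layout, the dictionary representation of nodes, the unit cell size, and the padding are all hypothetical, and children are simply placed in a single row, which is a stand-in for the arrangement procedure of the embodiment described later.

```python
def layout(node, pad=0.5):
    """Lay out a node bottom-up and return its box with child boxes nested inside.

    A node is a dict with a "name" and an optional "children" list. Lowest-level
    data elements become 1 x 1 boxes; each higher level becomes an area that
    encloses the boxes of the level immediately below it.
    """
    children = node.get("children", [])
    if not children:
        return {"name": node["name"], "w": 1.0, "h": 1.0, "children": []}

    placed, x, tallest = [], pad, 0.0
    for child in children:
        box = layout(child, pad)       # the lower level is laid out first
        box["x"], box["y"] = x, pad    # offset inside the enclosing area
        x += box["w"] + pad
        tallest = max(tallest, box["h"])
        placed.append(box)

    # The enclosing area is sized only after everything inside it is placed.
    return {"name": node["name"], "w": x, "h": tallest + 2 * pad, "children": placed}


hierarchy = {
    "name": "root",
    "children": [
        {"name": "A", "children": [{"name": "a1"}, {"name": "a2"}]},
        {"name": "b"},
    ],
}
root_box = layout(hierarchy)
```

The size of each enclosing area is known only after its contents have been arranged, which is why the levels are processed from lowest to highest.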
The processor 11 is controlled by a program stored in the main memory 12 to read hierarchical data to be processed from the storage device 15, generate a graphics image (image data) of the hierarchical data, and store it in the video memory 13. The graphics image stored in the video memory 13 is displayed on the display unit 14. The main memory 12 is also used as a stack for temporarily holding cells and clusters in the course of generation of a graphics image by the processor 11, which will be described later. On the other hand, programs and data stored in the main memory 12 can be saved in the storage device 15 as required.
Shown in
Hierarchical data can be either individual pieces of real data (for example, individual research papers in research paper data base of the U.S. National Library of Medicine) or a category system (for example MeSHTerm in the document database of the U.S. National Library of Medicine) that defines a hierarchical structure. In this embodiment, a category system is treated as data to be displayed as a graphics image (hereinafter a category system, namely hierarchical data to be displayed is referred to as a category hierarchy).
As shown in
The aggregation unit 100 may be implemented by the processor 11 of the computer system 10 shown in
In the present embodiment, pieces of real data that match given criteria in each category are aggregated to obtain an aggregate value. In addition, the pieces of real data that meet the aggregation criteria in the categories below each category whose aggregate value has been calculated (the descendant nodes of the node corresponding to each category whose aggregate value has been calculated) are also aggregated to obtain an aggregation result (hereinafter called a summarized aggregate value). The aggregation results (aggregate values and summarized aggregate values), which represent attributes of the nodes, obtained in this way are stored in the main memory 12 or the storage device 15 shown in
The filtering unit 200 may be implemented by the processor 11 of the computer system 10 shown in
In the present embodiment, each node of the hierarchical data is given the aggregate value of the category corresponding to that node and, in addition, a summarized value of the aggregate values of its descendant nodes, as described above. Therefore, if a given node is not displayed as a result of filtering but its higher-level node is displayed, an attribute of the given node can be reflected in the display of its higher-level node.
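By way of illustration only, the two values given to each node can be computed with a single traversal from the lowest level upward. This is a minimal sketch under stated assumptions: the names Category, aggregate, own_count, and summarized are hypothetical, the aggregation is a simple count of matching pieces of real data, and the summarized aggregate value is taken to be the node's own aggregate value plus the values of all of its descendant nodes. A piece of data classified into several categories is counted once in each of them, so it may be counted more than once in a summarized value.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """Hypothetical node of a category hierarchy."""
    name: str
    children: list = field(default_factory=list)
    own_count: int = 0    # aggregate value of this category alone
    summarized: int = 0   # aggregate value including all descendant nodes


def aggregate(root, documents, matches):
    """Count matching documents per category, then summarize from the leaves upward."""
    counts = {}
    for doc in documents:
        if matches(doc):  # the aggregation criterion, supplied by the caller
            for name in doc["categories"]:
                counts[name] = counts.get(name, 0) + 1

    def visit(node):
        node.own_count = counts.get(node.name, 0)
        node.summarized = node.own_count + sum(visit(c) for c in node.children)
        return node.summarized

    visit(root)
    return root


leg = Category("Muscle pain in the leg")
back = Category("Muscle pain in the back")
pain = Category("Muscle pain", children=[leg, back])
docs = [
    {"categories": ["Muscle pain in the leg"], "hit": True},
    {"categories": ["Muscle pain in the back"], "hit": True},
    {"categories": ["Muscle pain"], "hit": True},
    {"categories": ["Muscle pain in the leg"], "hit": False},
]
aggregate(pain, docs, lambda doc: doc["hit"])
```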
There are various methods for generating a graphics image of hierarchical data, including a method using the visualization unit 300, which will be described later. One method is to display a bar graph representing attributes of nodes. That is, a bar that stands on a node is drawn. The height, shape, or color of the bar represents attributes of the node of the category corresponding to a cell (in the following description, an example will be described in which the height and color of a bar are used to represent attributes). For example, the height of a bar can represent the number of document files in each category in the document database of the U.S. National Library of Medicine, and the color of the bar can represent the relative frequency used in IBM TAKMI for biomedical documents. A relative frequency is a value obtained by dividing the occurrence ratio of a keyword in extracted document files by the occurrence ratio of the keyword in all document files. The relative frequency can be used as an indicator of how strongly the keyword correlates with the criteria.
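By way of illustration only, the relative frequency defined above can be written as a ratio of two occurrence ratios. This is a minimal sketch; the function name and parameter names are hypothetical, and each occurrence ratio is assumed to be the number of document files containing the keyword divided by the number of document files considered.

```python
def relative_frequency(hits_in_extracted, extracted_total, hits_in_all, all_total):
    """Occurrence ratio in the extracted files divided by the occurrence ratio in all files."""
    if extracted_total <= 0 or all_total <= 0 or hits_in_all <= 0:
        raise ValueError("both occurrence ratios must be defined and the overall ratio non-zero")
    ratio_extracted = hits_in_extracted / extracted_total
    ratio_all = hits_in_all / all_total
    return ratio_extracted / ratio_all


# A keyword found in 50 of 100 extracted files and in 250 of 1000 files overall
# occurs twice as often in the extracted files as in the database as a whole.
value = relative_frequency(50, 100, 250, 1000)
```

A value above 1 indicates that the keyword is over-represented in the extracted document files, and a value below 1 indicates that it is under-represented.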
Prior to filtering, the filtering unit 200 determines the height and color of a bar representing attributes of the node at each category based on the result of aggregation by the aggregation unit 100. Also, filtering criteria (an attribute and threshold) for filtering are input into the filtering unit 200. The filtering criteria, which will be display parameters used in generating a graphics image, may specify the height of bars (in which case, bars higher than or equal to the specified height will be displayed) or the color of bars (in which case, bars of the specified color will be displayed), or a numerical value corresponding to an aggregate value.
Then, the filtering unit 200 replaces the aggregate value of the attributes of the nodes that have been determined as not eligible to be displayed at step 401 with a summarized aggregate value including aggregate values of attributes of their descendant nodes (step 402). The filtering unit 200 then determines, for each of the nodes determined as not eligible to be displayed, whether or not all of its descendant nodes, down to the leaf (the end or lowest-level node), are ineligible to be displayed (step 403). If all nodes below a given node are ineligible to be displayed, determination is made as to whether or not the attribute value (summarized aggregate value) at the given node exceeds the filtering criteria, thereby determining whether or not the given node should be displayed (step 404). The results of the filtering thus obtained (display node information) are stored in the main memory 12 or the storage device 15 shown in
Thus, the filtering ensures that only those nodes of the hierarchical data for which a given attribute exceeds a predetermined threshold are displayed on a graphics image. The attribute of a displayed node reflects the attributes of the lower-level nodes below it, as appropriate. That is, even if the aggregate values for a given attribute of a given category and of all the categories below it are each too low to exceed the threshold of the filtering criteria, the given category will be displayed on a graphics image, provided that the aggregate value resulting from summarization of those aggregate values exceeds the threshold of the filtering criteria.
Suppose that there are categories such as “Muscle pain in the leg” and “Muscle pain in the back” below the category “Muscle pain” in the category system of the U.S. National Library of Medicine. If the number of research papers that meet given aggregation criteria in each of the lower-level categories “Muscle pain in the leg” and “Muscle pain in the back” does not exceed the threshold of the filtering criteria but the number of research papers included in the higher category “Muscle pain” (namely the number of all the research papers that belong to the lower-level categories) exceeds the threshold of the filtering criteria, the category “Muscle pain” will be displayed as a node (cell) on a graphics image.
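By way of illustration only, the filtering of the "Muscle pain" example can be sketched as follows. This is a minimal sketch of one reading of the filtering steps, in which a node whose own aggregate value does not exceed the threshold is displayed only when none of its descendant nodes is displayed and its summarized aggregate value exceeds the threshold. The names Node and select_displayed, and the counts used, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical category node carrying both values produced by aggregation."""
    name: str
    own: int            # aggregate value of this category alone
    summarized: int     # aggregate value including all descendant nodes
    children: list = field(default_factory=list)


def select_displayed(node, threshold, shown):
    """Add displayed node names to `shown`; return True if this subtree shows anything."""
    # Evaluate every child first so that the whole subtree is examined.
    below = [select_displayed(child, threshold, shown) for child in node.children]
    any_below = any(below)

    if node.own > threshold:
        shown.add(node.name)          # eligible by its own aggregate value
        return True
    if not any_below and node.summarized > threshold:
        shown.add(node.name)          # eligible only through the summarized value
        return True
    return any_below


leg = Node("Muscle pain in the leg", own=4, summarized=4)
back = Node("Muscle pain in the back", own=5, summarized=5)
pain = Node("Muscle pain", own=2, summarized=11, children=[leg, back])

high = set()
select_displayed(pain, 8, high)   # neither lower-level category exceeds 8
low = set()
select_displayed(pain, 4, low)    # "Muscle pain in the back" exceeds 4 by itself
```

With the higher threshold only the higher category is selected, so the papers of the two lower-level categories are still reflected in the display; with the lower threshold the lower-level category is selected directly.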
Now consider the nodes enclosed in the dashed-line box. Of the three nodes 5a, 5b, and 5c, node 5a is one level higher than nodes 5b and 5c. The categories at nodes 5b and 5c are included in the category at node 5a. Each of nodes 5b and 5c contains an aggregate value indicating its relation to the aggregation criteria, and node 5a contains an aggregate value relating to data that is not contained in nodes 5b and 5c. If the values at these nodes are evaluated as they are, none of nodes 5a, 5b, and 5c meets the filtering criteria. However, the aggregate values at nodes 5b and 5c should be taken into account when the aggregate value at node 5a is evaluated because node 5a is the higher-level node of nodes 5b and 5c.
In the present embodiment, therefore, provided to a higher-level node of a hierarchical structure is, in addition to its own aggregate value, a summarized aggregate value obtained by summing up aggregate values at its lower-level nodes, as described earlier. If all of the lower-level nodes are determined as being ineligible to be displayed, then determination is made as to whether the summarized aggregate value at the higher-level node exceeds filtering criteria.
A visualization unit 300 is implemented by the processor 11 and storage device 15 in the computer system 10 shown in
As described earlier, a graphics image of hierarchical data (category hierarchy) is generated by nesting areas representing levels. In the graphics image, each data element (a category corresponding to a node at the lowest level displayed) in the hierarchical data is called a cell. The cells are represented by squares of the same size.
A node at a higher level (higher-level category) that represents a category to which data elements belong is called a cluster and is represented by a rectangle that encloses cells and lower-level clusters. That is, the graphics image is made up of one or more rectangular clusters and square cells arranged in the cluster or clusters. However, cells and clusters are nested simple square and rectangular graphics at the stage where the graphics image is being generated, and therefore can be treated in the same manner. Therefore, unless there is necessity to distinguish between cells and clusters in the following description, cells and clusters are generically called nodes. While the present embodiment will be described by mainly using an example in which rectangular clusters, which can take various sizes, are arranged, the same description applies to a case where cells are arranged if the cells are not limited to the same size, because cells and clusters are treated in the same way, as described above.
In the configuration shown in
In the present example embodiment, when a number of graphics images of the same hierarchical data are generated with different aggregation criteria or filtering criteria, the first or previous graphics image generated (hereinafter referred to as an original image) can be used as a template for rearranging nodes to enhance the visibility of each individual node or bar graph. In particular, the position of each cell placed in the original image is expressed by coordinates, and nodes making up a new graphics image are placed based on the coordinates. Thus, when different graphics images are generated based on different aggregation criteria or filtering criteria, or when nodes are rearranged, corresponding nodes can be placed in positions that are the same as, or as close as possible to, their original positions.
The sorting unit 310 first normalizes the coordinates of the four vertices of an arrangement area (having the same size as the template) in which nodes are to be placed as: (−1, −1), (1, −1), (1, 1) and (−1, 1). This normalization is shown in
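By way of illustration only, the normalization of an arrangement area to the vertices (−1, −1), (1, −1), (1, 1), and (−1, 1) corresponds to a linear mapping of each coordinate. This is a minimal sketch; the function name normalize and the way the arrangement area is passed (left, bottom, right, top) are assumptions introduced for this example.

```python
def normalize(x, y, left, bottom, right, top):
    """Map a point of an axis-aligned arrangement area onto the square [-1, 1] x [-1, 1]."""
    if right <= left or top <= bottom:
        raise ValueError("the arrangement area must have positive width and height")
    nx = 2.0 * (x - left) / (right - left) - 1.0
    ny = 2.0 * (y - bottom) / (top - bottom) - 1.0
    return nx, ny


# A template area 200 wide and 100 high: its corners map to the normalized
# vertices, and its center maps to the origin.
corner = normalize(0.0, 0.0, 0.0, 0.0, 200.0, 100.0)
center = normalize(100.0, 50.0, 0.0, 0.0, 200.0, 100.0)
```

Because the template and the new arrangement area are mapped to the same square, positions recorded in the template can be compared with candidate positions regardless of the actual size of either area.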
The node arrangement unit 320 places nodes (clusters or cells) of hierarchical data in a display space in the order sorted by the sorting unit 310. The position in which a node is placed is determined based on the following criteria:
A position that surely meets [Criterion 1] and satisfies [Criteria 2 and 3] as much as possible is located. In the present embodiment, a position that provides the smallest aD+bS (where a and b are constants defined by a user) is considered as the position that best meets criteria 2 and 3. By setting constants a and b as appropriate, priorities can be assigned to [Criteria 2 and 3].
If there is no template, the node arrangement unit 320 in the present embodiment arranges rectangles in the display space in the order sorted by the sorting unit 310 according to the following policy:
In the present embodiment, in order to quickly find a gap in which a rectangle can be placed as required by the policy (2), a triangular mesh that connects the center points of rectangles is used. The triangular mesh should meet the Delaunay condition.
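By way of illustration only, the Delaunay condition can be checked with the standard circumcircle test: no vertex of the mesh may lie strictly inside the circle circumscribing any triangular element. This is a minimal sketch of that general test, not code from the embodiment; the function name in_circumcircle is hypothetical, and points are assumed to be (x, y) tuples.

```python
def in_circumcircle(a, b, c, p):
    """Return True if p lies strictly inside the circle through a, b, and c."""
    # Translate so that p is the origin; this keeps the determinant 3 x 3.
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = (
        (ax * ax + ay * ay) * (bx * cy - by * cx)
        - (bx * bx + by * by) * (ax * cy - ay * cx)
        + (cx * cx + cy * cy) * (ax * by - ay * bx)
    )
    # The sign of the determinant depends on the orientation of a, b, c,
    # so correct for clockwise input.
    orientation = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    if orientation < 0:
        det = -det
    return det > 0


triangle = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
inside = in_circumcircle(*triangle, (0.5, 0.5))   # the circumcenter itself
outside = in_circumcircle(*triangle, (2.0, 2.0))
```

Two triangular elements that share an edge satisfy the condition when the vertex of each element opposite the shared edge is not inside the circumcircle of the other element.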
Furthermore, when the node arrangement unit 320 places a cluster of a given level, the clusters and cells at the lower levels below that cluster have already been arranged because the nodes in hierarchical data are arranged in the order from lowest to highest level in the present embodiment. Therefore, when placing a given cluster, the node arrangement unit 320 stores the positional relation of the rectangle representing that cluster to the lower-level rectangles or squares previously placed. That is, the node arrangement unit 320 treats these graphics as one graphic and performs arrangement.
The arrangement control unit 330 causes arrangement of clusters or cells by the sorting unit 310 and the node arrangement unit 320 on a level by level basis to be recursively performed, starting from the lowest level of the hierarchical data, thereby generating a graphics image of the entire hierarchical data. The graphics image generated is stored in the video memory 13 shown in
According to the present embodiment, filtering by the filtering unit 200 can control which nodes are displayed on a graphics image. Nodes to be displayed can dynamically be changed by changing filtering criteria, which is a display parameter. When filtering is performed with changed filtering criteria in the filtering unit 200, the arrangement control unit 330 controls the sorting unit 310 and the node arrangement unit 320 to rearrange nodes and regenerate a graphics image.
The bar graph generation unit 340 generates on a cell (a node at the lowest level) arranged by the arrangement control unit 330 a bar graph representing an attribute of the node of the category corresponding to that cell based on the result of aggregation by the aggregation unit 100 and the result of filtering by the filtering unit 200. As described earlier, a display property such as the height or color of a bar is associated with an attribute of the node. The bar graph is displayed on the cell when a graphics image is displayed.
According to the present embodiment, when a display configuration of a graphics image is dynamically changed by filtering of the filtering unit 200, a bar graph can be reshaped as appropriate. Reshaping of a bar graph will be detailed later.
The template holding unit 350, which may be implemented by the storage device 15 of the computer system 10 shown in
A process for generating a graphics image of hierarchical data in the configuration described above will be described below.
How the position at which a rectangle (cluster) should be placed is located using a triangular mesh will be described first. In the present embodiment, if a number of rectangles have been placed, an area that is not crowded with rectangles is found and the next rectangle is placed there. This process is repeated to arrange rectangles in a small space. In order to extract an uncrowded area, a triangular mesh that connects the center points of the previously placed rectangles is generated in an area in which the rectangle is to be placed. An area in the triangular mesh in which a large triangular element is generated is likely to be uncrowded. Therefore, placement of a new rectangle in the area is tried. Criteria for determining the size of a triangular element may be the radius of the circle circumscribing the triangular element, the radius of the circle inscribed in the triangular element, or the length of the longest of the three sides of the triangular element. In the following description, an example is used in which the size of a triangular element is determined based on the radius of the circle circumscribing the element.
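By way of illustration only, the size of a triangular element measured by its circumscribing circle can be computed from the side lengths and the area. This is a minimal sketch; the function names circumradius and largest_element are hypothetical, and a degenerate element (three collinear points) is treated here as having size zero so that it is never preferred.

```python
import math


def circumradius(a, b, c):
    """Radius of the circle through a, b, and c, computed as (product of sides) / (4 * area)."""
    side_a = math.dist(b, c)
    side_b = math.dist(a, c)
    side_c = math.dist(a, b)
    twice_area = abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))
    if twice_area == 0.0:
        return 0.0  # assumption of this sketch: a degenerate element offers no room
    return side_a * side_b * side_c / (2.0 * twice_area)


def largest_element(triangles):
    """Pick the triangular element with the largest circumradius."""
    return max(triangles, key=lambda tri: circumradius(*tri))


small = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
large = ((0.0, 0.0), (6.0, 0.0), (0.0, 8.0))
radius = circumradius(*large)      # right triangle: half of the hypotenuse
candidate = largest_element([small, large])
```

The element returned by largest_element corresponds to the area in which placement of the next rectangle would be tried first.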
The area in which a rectangle is to be placed is a rectangular area representing a cluster one level higher than the rectangle (cluster) to be placed (the arrangement area in this sense is called a graphic area). Therefore, four dummy vertices are placed at appropriate positions in a display space to reserve a rectangular arrangement area and the coordinates of the four vertices v1, v2, v3, and v4 of the arrangement area are set as (−1, −1), (1, −1), (1, 1), and (−1, 1). A diagonal line is drawn between two vertices to generate a triangular mesh consisting of two triangular mesh elements. Because no rectangle is placed at this point, a triangular mesh is generated such that the rectangular arrangement area is divided into two triangles. Each time a rectangle is placed, the center point of that rectangle is added as a new vertex. In this way, the triangular mesh is made finer.
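By way of illustration only, the initial two-element mesh and its refinement by added center points can be sketched as follows. This is a minimal sketch under stated assumptions: the names initial_mesh and add_vertex are hypothetical, triangles are stored as counter-clockwise triples of vertex indices, and a new vertex simply splits every element that contains it. The edge swapping needed to restore the Delaunay condition after each insertion is omitted.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive when counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def initial_mesh():
    """Four dummy vertices v1..v4 and the two elements produced by one diagonal."""
    vertices = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
    triangles = [(0, 1, 2), (0, 2, 3)]
    return vertices, triangles


def add_vertex(vertices, triangles, p):
    """Return a new mesh in which every element containing p is split at p."""
    new_vertices = vertices + [p]
    index = len(vertices)
    result = []
    for i, j, k in triangles:
        a, b, c = vertices[i], vertices[j], vertices[k]
        contains = cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0
        if not contains:
            result.append((i, j, k))
            continue
        for u, v in ((i, j), (j, k), (k, i)):
            # Skip zero-area pieces, which arise when p lies on an edge.
            if cross(vertices[u], vertices[v], p) > 0:
                result.append((u, v, index))
    return new_vertices, result


vertices, triangles = initial_mesh()
# The first center point lies on the diagonal, so both elements are split in two.
vertices, triangles = add_vertex(vertices, triangles, (0.0, 0.0))
first_count = len(triangles)
# The second center point lies strictly inside one element, which is split in three.
vertices, triangles = add_vertex(vertices, triangles, (0.5, 0.0))
second_count = len(triangles)
total_area = sum(cross(vertices[i], vertices[j], vertices[k]) for i, j, k in triangles) / 2.0
```

Each insertion leaves the arrangement area fully covered, so the elements continue to describe how crowded each part of the area is.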
In the initial state, the first rectangle can be placed anywhere in the arrangement area because no rectangle has been placed previously. In this example, it is assumed that a rectangle is placed at the center of an arrangement area indicated by dummy vertices.
Then, rectangles that represent nodes (hereinafter simply called rectangles) are placed within the triangular mesh one by one in order. As described earlier, the order in which the rectangles are placed is determined based on the coordinate values in a template that specify the locations of the rectangles in the present embodiment. Assume that rectangles r1 and r2 have been placed as shown in
In the present embodiment, triangular elements at positions that are closer to the normalized coordinates on the template are extracted in order and a number of possible positions of a rectangle are set within an extracted triangular mesh element. Then, the rectangle is placed in one of the possible positions. This is repeated in several extracted triangular mesh elements to find one of the possible positions that meets [Criterion 1] and provides the smallest aD+bS. The position is chosen as the position in which the rectangle is to be placed.
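The selection among candidate positions can be sketched as a filtered minimum search. [Criterion 1], the quantities D and S, and the weights a and b are defined elsewhere in this description, so the sketch takes them as caller-supplied functions and parameters rather than guessing their definitions. The function and parameter names are hypothetical.

```python
def choose_position(candidates, meets_criterion_1, d_of, s_of, a=1.0, b=1.0):
    """Return the candidate that meets Criterion 1 with the smallest a*D + b*S.

    candidates        -- iterable of possible center positions (x, y)
    meets_criterion_1 -- function(position) -> bool, supplied by the caller
    d_of, s_of        -- functions returning D and S for a position
    a, b              -- weights of the two terms (default values are assumed)
    """
    c_min, cost_min = None, float("inf")
    for pos in candidates:
        if not meets_criterion_1(pos):
            continue  # positions failing Criterion 1 are never chosen
        cost = a * d_of(pos) + b * s_of(pos)
        if cost < cost_min:
            c_min, cost_min = pos, cost
    return c_min  # None if no candidate met Criterion 1
```

The candidates gathered from all extracted triangular mesh elements can be passed in together, so that a single call yields the recorded position.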
These steps are repeated for each triangular mesh element. The center vi+4 of the rectangle is placed at the position Cmin recorded after the repetition is completed. If there is no template, the first rectangle is placed anywhere and then each subsequent rectangle is placed as described below.
A process for expanding an arrangement area in which a rectangle is to be placed will be described below. In the present embodiment, any of the four vertices v1, v2, v3, and v4 of an arrangement area is moved to expand the arrangement area if:
When an aggregation criterion is input into the aggregation unit 100 (step 1603), the category hierarchy and the real data classified by the category hierarchy are read from the data storage 400 into the aggregation unit 100, where pieces of data are collected and added according to the aggregation criterion (step 1604). As described earlier, the aggregate value of the node of each category is calculated, and a summarized aggregate value is calculated by adding the aggregate values of its descendant nodes. These values are assigned to the node. The aggregation result is stored in storage means such as the main memory 12 or the storage device 15 shown in
While the aggregation is performed at steps 1603 and 1604 after the nodes are arranged at steps 1601 and 1602 in
The result of aggregation by the aggregation unit 100 and the category hierarchy are input into the filtering unit 200, where filtering is performed by using a filtering criterion (display parameter) and the nodes to be displayed are determined (step 1605). The details of the filtering have been described earlier with reference to
The visualization unit 300 determines the height and color of a bar graph of each node of the category hierarchy that represent attributes of that node according to the result of filtering performed by the filtering unit 200 (step 1606). Then, bar graphs are placed on the nodes to be displayed in the graphics image generated at step 1602 (step 1607). If required, nodes are rearranged and bar graphs are reshaped according to the result of filtering.
After the bar graphs are placed, the generated graphics image is stored in the video memory 13 and output and displayed on the display unit 14 (step 1608).
A user looks at the graphics image displayed on the display unit 14, changes a filtering criterion (display parameter), and re-generates the graphics image as required. This operation (steps 1605 to 1608) can be repeated to obtain a desired (easy to look at) graphics image.
The visualization unit 300 of the present embodiment can use a template to restrict the positions in which nodes are to be placed and thereby generate a graphics image as described above. Therefore, when a filtering criterion is changed to re-generate a graphics image, the graphics image previously generated for the same category hierarchy can be used as a template to generate a graphics image in which corresponding nodes are placed at approximately the same positions.
The hierarchical data shown in
In the process shown in
An overview of a whole graphics image of a large-scale database can be displayed in visible form by performing filtering in this way to exclude some of the lower-level categories appropriately. And yet, attributes of the excluded lower-level categories can be reflected in the graphics image, without being lost.
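The interplay of aggregation and filtering can be sketched as follows. The sketch assumes that the summarized aggregate value of a node is its own aggregate value plus those of all its descendants, that the filtering criterion is a simple threshold on that value, and that the root is always displayed. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    own: int = 0                      # aggregate value of this category alone
    children: list = field(default_factory=list)
    summarized: int = 0               # filled in by summarize()


def summarize(node):
    """Compute summarized aggregate values bottom-up and return the node's value."""
    node.summarized = node.own + sum(summarize(c) for c in node.children)
    return node.summarized


def visible_nodes(root, threshold):
    """Names of nodes to display; a hidden node hides its whole subtree."""
    shown = [root.name]
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            if child.summarized >= threshold:
                shown.append(child.name)
                stack.append(child)
    return shown
```

Because a hidden node's value has already been added into every ancestor by `summarize`, excluding it from the display does not remove its contribution from the image.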
For analysis such as data mining, interactive operations such as mouse-clicking a displayed visual object such as a dot, line, or bar to obtain real data corresponding to a node or display a label are useful and essential, besides obtaining insights only from visual features. In order to provide such enhanced functions, it would be effective to reshape a graphics image appropriately to enhance the visibility of the whole image or to enhance the clickability of a visual object of interest, in addition to applying filtering to reduce the number of displayed elements.
Therefore, a GUI (Graphical User Interface) is provided by using a graphics image generated according to the present embodiment. The GUI includes a function for specifying a node (cell or cluster) by clicking a corresponding bar graph displayed in a graphics image or a rectangle representing that node. This function may be provided by adding an event extraction unit 500, for example, to the graphics image generation apparatus of this embodiment shown in
In order to implement an operation that uses the GUI, a preprocess is performed for determining the order in which filtering is focused on nodes to select nodes to be displayed. In particular, the order may be determined as follows.
A threshold for an aggregate value is assumed as the filtering criterion, and a search is made for the higher-level node at which the aggregate value of a lower-level node is summarized when filtering is performed using that threshold. The search is repeated while the threshold is increased progressively. The located nodes are then ordered in the reverse of the order in which they were located. That is, the first node located becomes the last in order and the last node located becomes the first. This ordering can be said to indicate the degree to which each node meets the aggregation criterion in the aggregation unit 100. The aggregate values of the ordered parent nodes and the summarized aggregate values at those parent nodes are recorded.
Thus, the order in which nodes to be displayed are searched for during filtering is determined. Consequently, when a filtering criterion is given, whether each node meets the filtering criterion is determined by following this order, and thus whether or not the node should be displayed is determined.
It is assumed that a graphics image of hierarchical data to which the preprocess described above was applied has been generated and displayed on the display unit 14 shown in
Furthermore, the preprocess described above for determining the order in which nodes to be displayed are located can be used for operations other than GUI operations. For example, after the preprocess, a filtering criterion for displaying the x nodes that provide the top x aggregate values can be used. In this case, the hierarchical data is searched from the root node toward lower-level nodes in the order determined in the preprocess until the specified number x is reached. The search is then ended and a graphics image is generated by using the located nodes as the nodes to be displayed.
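One simplified reading of the preprocess and of the top-x selection is sketched below. It is an assumption-laden illustration: it orders the nodes themselves rather than the parent nodes at which values are summarized, it treats a node as dropping out of the display once the threshold exceeds its summarized aggregate value, and it assumes non-negative aggregate values. The function names and the input dictionary are hypothetical.

```python
def display_order(summarized):
    """Order node names from most to least significant.

    summarized -- dict mapping node name -> summarized aggregate value
    """
    located = []
    hidden = set()
    # Raise the threshold step by step and record nodes as they drop out.
    for threshold in sorted(set(summarized.values())) + [float("inf")]:
        newly_hidden = sorted(
            n for n, v in summarized.items() if v < threshold and n not in hidden
        )
        located.extend(newly_hidden)
        hidden.update(newly_hidden)
    # The first node located becomes the last in order, and vice versa.
    return located[::-1]


def top_x(summarized, x):
    """Follow the precomputed order and stop once x nodes have been taken."""
    return display_order(summarized)[:x]
```

With this reading the order needs to be computed only once per aggregation result, after which any threshold or top-x request is answered by walking the list from the front.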
In the process for generating a graphics image according to the present embodiment, the combination of filtering of hierarchical data and information visualization as described above can avoid overcrowding of a generated graphics image on a display. Thus, important data can readily attract attention without diverting attention to less important data. However, when zooming out a display of large-scale data in order to provide an overview of the whole data, bar graphs displayed on nodes may become so thin that their heights or colors cannot be identified or they are hard to click, even if less important nodes are excluded from the display through filtering.
Comparing the figures with each other, the graphics image in
Therefore, it is contemplated that a graphics image from which less important nodes have been excluded by filtering, and which has consequently become uncrowded as a whole, is transformed in such a manner that the colors and heights of the bar graphs can easily be recognized. To achieve this, the present embodiment proposes an approach of transforming the shape of the bar graphs (first approach) and an approach of rearranging nodes according to the number of nodes displayed (second approach).
First Approach: Reshaping Bar Graphs:
In the first approach to making bar graphs more visible, each bar is represented by an inverted quadrangular pyramid whose cross-sectional area becomes gradually larger toward the top (if the bar originally has a quadrangular cross-section). Because the top of the bar is thicker, the representation is easily visible to the user. In addition, because the area of the base of the bar is small (a size equivalent to the node), the position of the node in the hierarchical structure can readily be recognized.
While a bar graph is represented by a quadrangular pyramid in this example, a circular cone or a triangular pyramid may also be used depending on the cross-sectional shape of the original bar.
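The geometry of such a reshaped bar can be sketched as eight vertices: a small square at the node and a larger square at the top of the bar. The function name, the parameter names, and the choice of a separate top width are assumptions made for illustration.

```python
def inverted_pyramid_bar(cx, cy, base_side, top_side, height):
    """Vertices of a bar that widens from base_side at z=0 to top_side at z=height.

    (cx, cy) is the center of the node. The first four vertices form the
    base square and the last four form the top square.
    """
    def square(side, z):
        h = side / 2.0
        return [
            (cx - h, cy - h, z),
            (cx + h, cy - h, z),
            (cx + h, cy + h, z),
            (cx - h, cy + h, z),
        ]

    return square(base_side, 0.0) + square(top_side, height)
```

Keeping `base_side` equal to the node size preserves the node's apparent position, while `top_side` can be scaled up as the display is zoomed out.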
Second Approach: Rearranging Nodes:
In the second approach to making bar graphs more visible, nodes to be displayed are rearranged so as to bring them closer to each other, thereby reducing the size of a graphics image (the size of the rectangle corresponding to the root node). Then, the graphics image in which the nodes are rearranged is zoomed in to relatively expand the display size of each node. Because the display size of each node is made large, bar graphs are displayed thicker and thus a representation easily visible to a user is provided.
For the purpose of examining a generated graphics image, it is important that rearrangement of nodes does not substantially change the relative positions of the nodes. Since a template is used to generate a graphics image in the visualization unit 300 in the present embodiment, a graphics image before rearrangement can be used as a template to generate a graphics image in which nodes are rearranged to satisfy this requirement.
While a generated graphics image is stored in the video memory 13 and then displayed on the display unit 14 in the embodiment described above, the graphics image data stored in the video memory 13 can also be used in a CAD (Computer Aided Design) system.
While the visualization unit 300 nests areas representing layers to generate a graphics image representing a hierarchical structure in the present embodiment, the aggregation process and filtering process according to the present embodiment are also effective in generating various other types of graphics images such as Hyperbolic Tree and Treemap images that can represent hierarchical data.
The present invention can be realized in hardware, software, or a combination of hardware and software. The present invention can be realized in a centralized fashion in one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system—or other apparatus adapted for carrying out the methods described herein—is suitable. A typical combination of hardware and software could be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein. The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which—when loaded in a computer system—is able to carry out these methods.
Computer program means or computer program in the present context mean any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after conversion to another language, code or notation and/or reproduction in a different material form.
It is noted that the foregoing has outlined some of the more pertinent objects and embodiments of the present invention. This invention may be used for many applications. Thus, although the description is made for particular arrangements and methods, the intent and concept of the invention is suitable and applicable to other arrangements and applications. It will be clear to those skilled in the art that other modifications to the disclosed embodiments can be effected without departing from the spirit and scope of the invention. The described embodiments ought to be construed to be merely illustrative of some of the more prominent features and applications of the invention. Other beneficial results can be realized by applying the disclosed invention in a different manner or modifying the invention in ways known to those familiar with the art.
Number | Date | Country | Kind |
---|---|---|---|
2003-318890 | Sep 2003 | JP | national |