With traditional techniques of visualizing attributes (or variables) of large numbers of data records, it can be difficult to discern patterns or other information in the data records. When a relatively large amount of information is to be visualized, the result can be a cluttered visualization in which users have difficulty understanding the visualized information.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Some embodiments are described with respect to the following figures.
Data records can be collected from various sources. For example, a health insurance company may collect data records regarding payments made to healthcare providers, such as hospitals, at various times. The hospitals may be located at many different geographic locations, such as different locations within the United States or some other geographic region. A location can be represented as a longitude and latitude, or by some other location indicator in other examples. Note that each hospital can also have multiple diagnostic groups (e.g. cardiology group, neurology group, oncology group, etc.). The diagnostic groups of a hospital are considered to be located at the location of that hospital.
Although reference is made to data records collected for hospitals, it is noted that in other implementations, data records can relate to other types of information. For example, data records can be collected by a financial company, an energy company, and so forth.
More generally, data records can include both spatial information and temporal information, where spatial information relates to geographical locations associated with the data records, and temporal information relates to time associated with the data records. The spatial information of the data records relates to a geographic attribute (or variable) of the data records, while the temporal information relates to a temporal attribute (or variable) of the data records. In the ensuing discussion, the terms “attribute” and “variable” are used interchangeably. The data records can also include other attributes in addition to the spatial and temporal attributes.
To assist analysts in better understanding the information included in collected data records, a multi-view, multi-attribute geo-based visualization is provided. The “multi-view” feature of the visualization refers to the inclusion of multiple views in the visualization, where the multiple views correspond to respective different time intervals. The multi-attribute (or multivariate) feature of the visualization refers to the ability of the visualization to concurrently present information relating to multiple attributes of the data records. Stated differently, multiple attributes are encoded into the visualization by using different visual features to represent the different attributes. The geo-based feature of the visualization refers to the ability of the visualization to present indications of locations associated with the data records.
The multi-view, multi-attribute geo-based visualization according to some implementations includes cells that represent items about which the data records contain information. The items associated with the data records can be located at various different geographic locations. In the healthcare industry, the items represented by the cells can include hospitals (or other healthcare providers) to which payments are made by a health insurance company. In some implementations, the items can also represent diagnostic groups within hospitals. In other industries, the items represented by the cells can include other objects, such as wells used for extracting hydrocarbons, retail outlets for selling goods or services, electronic devices relating to delivery of electronic services, and so forth. The multiple views are coordinated with each other, in the sense that they represent the same collection of items, but at different time intervals. For example, a first view can be of hospitals in the United States in a first year, while a second view can be of the same hospitals in the United States in a second year. More generally, multiple coordinated views refer to views of items that share a common geographic extent and/or other attribute(s).
In further implementations, the visualization can also have a multi-focus feature, which allows automatic parallel drilldown into a sub-region of each of the multiple views of the visualization in response to a user drilldown selection of a sub-region (“focus region”) in just one of the multiple views. The focus region selected by the drilldown selection allows the user to drill down into the subset of the data records represented by the focus region, so that the user can obtain a more detailed or closer view of the focus region. Because the corresponding focus regions in all of the views are drilled into automatically when the user selects a focus region in a single view, the user can visually compare the focus regions of the multiple views without having to select the respective focus region in each view individually.
A cell can refer to a graphical element that is used for representing a respective item. A cell can be in the form of a dot or graphical structure of any other shape. A data record can refer to any discrete unit of data that is received by a system. Each data record can have multiple attributes that represent different aspects of an item. As noted above, the multiple attributes can include a spatial attribute (which indicates a geographic location of an item), a temporal attribute (which indicates a time of an item), and other attributes.
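To make the structure of a data record concrete, the following Python sketch models one record with a spatial attribute, a temporal attribute, and two other attributes used in the hospital example. The class name `DataRecord`, its field names, and the yearly granularity of the temporal attribute are assumptions for illustration, not a record format defined by the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataRecord:
    """One data record; field names and types are illustrative assumptions."""
    item_id: str      # item the record describes, e.g. a hospital or diagnostic group
    latitude: float   # spatial attribute (geographic location)
    longitude: float  # spatial attribute
    year: int         # temporal attribute, assumed here to be yearly
    payment: float    # other attribute: payment amount
    num_cases: int    # other attribute: number of cases

    @property
    def location(self) -> Tuple[float, float]:
        """Spatial attribute as a (latitude, longitude) pair."""
        return (self.latitude, self.longitude)


# Hypothetical example record.
record = DataRecord("hospital_a", 47.6, -122.3, 2011, 12_500.0, 40)
```

A frozen dataclass is used so that records behave as immutable values once received by the system.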
Visual indicators can be assigned to the respective cells, based on one or more specific attributes of the data records. The visual indicators assigned to cells can include different colors, such as colors of a color scale 102 depicted in
Note that the color scale 102 can represent a relatively large range of values, such as between 3,000 (minimum payment value) and 100,000 (maximum payment value) in the example of
A higher payment value is represented by a red color, while a lower payment value is represented by a blue or purple color. Payments of intermediate values are represented by other colors, including yellow and green. In other examples, the color assigned to a cell can be based on one or more other attributes in the data records.
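A minimal sketch of a payment-to-color mapping follows, assuming piecewise-linear interpolation through purple, blue, green, yellow, and red. Only the range endpoints 3,000 and 100,000 and the low-to-high ordering of colors come from the text; the function name `payment_to_color`, the RGB values of the color stops, and the use of linear interpolation are assumptions.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]

# Hypothetical color stops, ordered from the lowest to the highest value.
COLOR_STOPS: Sequence[RGB] = (
    (128, 0, 128),  # purple
    (0, 0, 255),    # blue
    (0, 170, 0),    # green
    (255, 255, 0),  # yellow
    (255, 0, 0),    # red
)


def payment_to_color(value: float, vmin: float = 3_000.0, vmax: float = 100_000.0,
                     stops: Sequence[RGB] = COLOR_STOPS) -> RGB:
    """Map a payment value to an RGB color by interpolating between adjacent stops."""
    # Relative position of the value within [vmin, vmax], clamped to [0, 1].
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    # Find the pair of adjacent stops that bracket t.
    segments = len(stops) - 1
    scaled = t * segments
    i = min(int(scaled), segments - 1)
    frac = scaled - i
    lo, hi = stops[i], stops[i + 1]
    return tuple(round(a + (b - a) * frac) for a, b in zip(lo, hi))
```

Values outside the range are clamped to the end colors. A perceptually uniform colormap could be substituted for these hand-picked stops without changing the interface.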
Multiple cells can be used to represent a given individual hospital. The number of cells that are used to represent the given hospital can be based on the number of cases of the given hospital. The number of cases of a given hospital is indicated by a number-of-cases attribute in the data records.
The cases of a hospital can refer to the number of patients treated by the hospital, the number of categories of diseases treated by the hospital, or other types of events associated with the hospital. More generally, the number of cells used to represent a respective item can be based on cases associated with the item, where cases of an item can refer to various distinct events associated with the item. In a different example, where items correspond to retail outlets, the number of cases of each retail outlet can indicate the number of products or the number of services sold by the retail outlet.
As discussed further below, the number of cells used to represent a specific item can be based on a normalized number of cases. Normalization of the number of cases is performed to avoid using a very large number of cells to represent an individual item. For example, a large hospital can treat hundreds of thousands of patients in a year. Using hundreds of thousands of cells to represent this large hospital would likely take up a large part of the visualization 100. Normalization can be performed to map the hundreds of thousands of cases to a normalized number, which can be much smaller. More generally, normalization of numbers of cases involves mapping the numbers of cases to respective specific numbers (which are the normalized numbers). Normalization is discussed further below.
The cells that correspond to a given item, whose number is based on the number of cases associated with the given item, are grouped into a cluster of cells and included in the visualization 100. For example, a cluster 104 of cells is indicated in the visualization 100; this cluster of cells can represent a hospital in the Seattle area. The size of the cluster indicates the number of cases of that hospital.
The cells used to represent respective hospitals can be placed in the visualization 100 without overlap, because overlapping cells can occlude the visualized information. If two hospitals are located at the same location, then the respective clusters of cells representing the two hospitals can be placed in nearby locations (e.g. adjacent to each other) so that the clusters do not overlap.
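One way to realize non-overlapping placement is sketched below in Python, under several assumptions not stated in the text: each location has already been projected to an integer grid position (the anchor), each cell occupies one grid position, and items are placed one at a time in the given order. The hypothetical function `place_clusters` searches breadth-first outward from each anchor and claims the nearest free positions, so each cluster's size equals its cell count and two items with the same anchor end up side by side. This is a generic grid-packing technique, not necessarily the method of the described implementations.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Pos = Tuple[int, int]


def place_clusters(items: List[Tuple[str, Pos, int]]) -> Dict[str, List[Pos]]:
    """Place each item's cells on the free grid positions nearest its anchor.

    items: (item_id, anchor grid position, number of cells) tuples.
    Returns item_id -> list of grid positions; no position is used twice.
    """
    occupied: Set[Pos] = set()
    placement: Dict[str, List[Pos]] = {}
    for item_id, anchor, n_cells in items:
        cells: List[Pos] = []
        seen: Set[Pos] = {anchor}
        queue = deque([anchor])
        # Expand outward from the anchor. Occupied positions are traversed but
        # not claimed, so a second cluster at the same anchor grows around the first.
        while len(cells) < n_cells:
            x, y = queue.popleft()
            if (x, y) not in occupied:
                occupied.add((x, y))
                cells.append((x, y))
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        placement[item_id] = cells
    return placement


# Two hypothetical hospitals at the same location, with 5 and 3 cells.
placement = place_clusters([("hospital_a", (10, 10), 5), ("hospital_b", (10, 10), 3)])
```

The grid here is unbounded, so a real layout would also need to clip to the display area. Placing larger clusters first (for example, by sorting items by cell count) is one plausible ordering choice.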
To enhance the clarity of the spatial information depicted by the visualization 100, border lines can be added, such as border lines representing states of the United States. These border lines allow a user to more easily determine where a specific item is located. In other examples, border lines can represent other geographic features.
The cells provided in the example visualization 100 allow a user to visualize at least the following attributes: a spatial attribute relating to locations of the hospitals (or diagnostic groups), a payment attribute relating to amounts of payments made to the hospitals, and a number-of-cases attribute indicating the number of cases associated with each hospital. In the example of
The visualization 100 can also include multiple coordinated views that correspond to different time intervals. The multiple views are coordinated in the sense that they represent the same hospitals in the same overall geographic region for the different time intervals. In the example visualization 100 of
In the example of
Also, an animation button 114 can be selected to perform animation of the information that is presented by the visualization 100. If animation is started (such as by a user clicking on the animation button 114 with a user input device), then the visualization 100 successively presents information relating to the different years. For example, when animation is started, the visualization 100 first presents cells representing data records for the year 2011, followed by cells representing data records for the year 2012, and then cells representing data records for the year 2013. During the animation, the slider 112 can be automatically moved to indicate to the user which year is being visualized. In this manner, a user can see how the visualized information changes over time.
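The grouping of data records into per-year views and the chronological stepping of the animation can be sketched as follows. The record shape `(item_id, year, payment)`, the function names `build_views` and `animation_frames`, and the data are hypothetical; rendering and moving the slider 112 are left to a drawing layer that is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical record shape: (item_id, year, payment).
Record = Tuple[str, int, float]


def build_views(records: List[Record]) -> Dict[int, List[Record]]:
    """Group records into one coordinated view per year (time interval)."""
    views: Dict[int, List[Record]] = defaultdict(list)
    for record in records:
        views[record[1]].append(record)
    return dict(views)


def animation_frames(views: Dict[int, List[Record]]) -> Iterator[Tuple[int, int, List[Record]]]:
    """Yield (slider_position, year, records) in chronological order."""
    for slider_position, year in enumerate(sorted(views)):
        yield slider_position, year, views[year]


records = [("hospital_a", 2012, 9_000.0), ("hospital_a", 2011, 8_000.0),
           ("hospital_b", 2013, 4_000.0), ("hospital_b", 2011, 3_500.0)]
frames = list(animation_frames(build_views(records)))
for position, year, frame in frames:
    # A renderer would draw `frame` here and move the slider to `position`.
    print(position, year, len(frame))
```

Sorting the years makes the animation order independent of the order in which records arrive.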
In alternative implementations, some of the control elements shown in
The visualization 100 allows a user to easily visualize geospatial patterns relating to cost and care at different hospitals, so that the user can quickly identify anomalies. In the visualization 100, hospitals that are associated with high payments but low numbers of cases may be considered anomalous, since the cost per case in such hospitals may be unusually high. A health insurance company may then investigate the reasons for the high cost per case in such hospitals and take steps to address the issue.
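The anomaly described above, high payments relative to the number of cases, can also be expressed as a cost-per-case check. The sketch below flags items whose payment per case exceeds a multiple of the median payment per case; the function name `flag_high_cost_per_case`, the median baseline, the default factor of 2.0, and the data are assumptions, since the text identifies such hospitals visually rather than by a fixed rule.

```python
from statistics import median
from typing import Dict, List, Tuple


def flag_high_cost_per_case(totals: Dict[str, Tuple[float, int]],
                            factor: float = 2.0) -> List[str]:
    """Return items whose payment per case exceeds `factor` times the median.

    totals maps item_id -> (total payment, number of cases).
    Items with zero cases are skipped to avoid division by zero.
    """
    per_case = {item: pay / cases for item, (pay, cases) in totals.items() if cases > 0}
    threshold = factor * median(per_case.values())
    return sorted(item for item, cost in per_case.items() if cost > threshold)


# Hypothetical totals: hospital_c has high payments but few cases.
totals = {"hospital_a": (10_000.0, 100),
          "hospital_b": (12_000.0, 100),
          "hospital_c": (90_000.0, 10)}
flagged = flag_high_cost_per_case(totals)
```

A median baseline is used so that a single extreme hospital does not shift the threshold as much as it would with a mean.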
In addition, if remedial measures or other policies have been implemented, the multiple views of the visualization 100 can allow the user to see effects of such remedial measures or other policies, by visually comparing the visualized information in the different time intervals (e.g. an interval before implementation of the remedial measures or other policies, and an interval after implementation of the remedial measures or other policies).
The views 100A, 100B, and 100C can be displayed simultaneously, or they can be displayed successively. Also, the views 100A, 100B, and 100C can be displayed in an overlapped fashion, such as shown in
A further feature provided by some implementations is the ability to perform simultaneous drilldown in the multiple views that correspond to different time intervals, in response to interactive user input that provides a drilldown selection into a focus region. In the example of
The simultaneous drilldown capability of some implementations allows the selection of the focus region 202 (in the view 100A) to be also reflected in the other views 100B and 100C, without the user having to explicitly select focus regions 204 and 206. In other words, in response to selection of the focus region 202 in the view 100A by the user, the focus regions 204 and 206 in the corresponding views 100B and 100C are automatically selected.
Note that user selection of a focus region can be performed while animation of the different views 100A, 100B, and 100C is occurring. During the animation, the user can select the focus region in the view that is currently being displayed, and the corresponding focus regions in the other views are then automatically selected.
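Because the coordinated views share the same geographic extent, a focus region can be represented once in geographic coordinates and applied to every view. The following sketch assumes a rectangular latitude/longitude focus region; the `FocusRegion` class, the `drill_down_all` function, and all coordinates are hypothetical, and only the view names 100A, 100B, and 100C come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical point shape: (item_id, latitude, longitude).
Point = Tuple[str, float, float]


@dataclass(frozen=True)
class FocusRegion:
    """Geographic bounding box selected by the user in any one view."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


def drill_down_all(views: Dict[str, List[Point]], region: FocusRegion) -> Dict[str, List[Point]]:
    """Apply one focus region to every coordinated view at once."""
    return {name: [p for p in points if region.contains(p[1], p[2])]
            for name, points in views.items()}


# The same hypothetical items appear in each view (one view per time interval).
items = [("hospital_a", 47.6, -122.3), ("hospital_b", 40.7, -74.0)]
views = {name: list(items) for name in ("100A", "100B", "100C")}
focus = FocusRegion(min_lat=45.0, max_lat=49.0, min_lon=-125.0, max_lon=-117.0)
result = drill_down_all(views, focus)
```

Keeping the region in geographic rather than screen coordinates is what lets a selection made in one view, even mid-animation, be reused unchanged in the others.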
As shown in
The visualization process arranges (at 404) the cells in clusters in the visualization, where a size of each cluster indicates a corresponding number of cases associated with the corresponding item.
In addition, the visualization process presents (at 406) multiple coordinated views of the cells in the visualization. The views correspond to respective different time intervals.
As discussed above, to avoid including too many cells that represent respective cases associated with each item represented in the visualization, normalization can be performed to normalize the numbers of cases. Table 1 below indicates the number of cases associated with each of multiple hospitals (hospital_0 to hospital_10).
In Table 1, the number of cases for hospital_0 is 100, the number of cases for hospital_1 is 200, and so forth. To avoid including too many cells in a visualization according to some implementations, the numbers of cases are normalized to be within a specific range. For example, the numbers of cases can be normalized to a range between 1 and 50, where 50 represents the hospital with the highest number of cases, and 1 represents the hospital with the lowest number of cases. Stated differently, one cell is used for representing the hospital with the lowest number of cases, while 50 cells are used for representing the hospital with the highest number of cases. Intermediate numbers of cases are mapped to values between 1 and 50 accordingly.
Table 1 illustrates an example of such mapping, where the numbers of cases in the second column are mapped to respective normalized numbers of cases in the third column. In the example of Table 1, hospital_1 has the largest number of cases (200), while hospital_6 has the lowest number of cases (5). The number of cases (200) is mapped to the normalized number of cases (50), while the lowest number of cases (5) is mapped to the normalized number of cases (1). The other numbers of cases in Table 1 are mapped to other normalized numbers of cases.
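The linear mapping described above, with the lowest number of cases mapped to 1 cell and the highest to 50 cells, can be sketched as follows. The function name `normalize_linear`, the rounding to the nearest integer, and the handling of equal minimum and maximum are assumptions; because the third column of Table 1 is not reproduced in this text, the sketch is not claimed to match its values.

```python
def normalize_linear(num_cases: float, min_cases: float, max_cases: float,
                     max_cells: int = 50) -> int:
    """Linearly map num_cases in [min_cases, max_cases] to a cell count in [1, max_cells]."""
    if max_cases == min_cases:
        # All items have the same number of cases; this choice is an assumption.
        return max_cells
    fraction = (num_cases - min_cases) / (max_cases - min_cases)
    return round(1 + fraction * (max_cells - 1))


# Endpoints from the text: 5 cases (hospital_6) and 200 cases (hospital_1).
lowest = normalize_linear(5, 5, 200)
highest = normalize_linear(200, 5, 200)
middle = normalize_linear(100, 5, 200)  # hospital_0's 100 cases under this sketch
```

Under this sketch, hospital_0's 100 cases map to 25 cells.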
The normalization performed according to some implementations can be a non-linear normalization, such as logarithmic or square root normalization.
If the numbers of cases of most hospitals fall within a small range relative to the overall spread, then linear normalization may also map them to normalized numbers of cases that are bunched together in a small range. For example, if a first hospital has 300 cases, while the remaining hospitals vary between 10 and 100 cases, then a linear normalization to the range 1 to 50 maps 300 to 50, while the values between 10 and 100 are compressed into the low end of the range (roughly 1 to 16). As a result, it can be difficult to distinguish the numbers of cases of these hospitals by the sizes of their cell clusters in a visualization.
Non-linear normalization can spread out the values between 10 and 100 across a wider range of normalized values. More generally, non-linear normalization seeks to achieve a more even distribution of normalized numbers of cases.
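To illustrate the spreading effect, the sketch below compares a linear mapping with a square-root mapping, one of the non-linear options named above, on the skewed example (one hospital with 300 cases, the others between 10 and 100). Both functions map the minimum to 1 cell and the maximum to 50 cells; the function names and the rounding to the nearest integer are assumptions.

```python
import math


def normalize_linear(num_cases: float, min_cases: float, max_cases: float,
                     max_cells: int = 50) -> int:
    """Linear mapping to [1, max_cells], included for comparison."""
    if max_cases == min_cases:
        return max_cells
    fraction = (num_cases - min_cases) / (max_cases - min_cases)
    return round(1 + fraction * (max_cells - 1))


def normalize_sqrt(num_cases: float, min_cases: float, max_cases: float,
                   max_cells: int = 50) -> int:
    """Map to [1, max_cells] using square-root-scaled positions."""
    lo, hi = math.sqrt(min_cases), math.sqrt(max_cases)
    if hi == lo:
        return max_cells
    fraction = (math.sqrt(num_cases) - lo) / (hi - lo)
    return round(1 + fraction * (max_cells - 1))


# Skewed example from the text: minimum 10 cases, maximum 300 cases.
for cases in (10, 100, 300):
    print(cases, normalize_linear(cases, 10, 300), normalize_sqrt(cases, 10, 300))
```

With these inputs, the linear mapping places the 10-to-100 group between 1 and 16 cells, while the square-root mapping spreads the same group over 1 to 25 cells.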
Pseudocode for an example logarithmic normalization for mapping between numbers of cases and normalized numbers of cases is provided below:
In the foregoing, the normalized number of cases computed for a hospital h is represented by normalized_#cases. The parameter #cases_from_h represents the actual number of cases of the hospital. The parameter min represents the minimum actual number of cases of all the hospitals considered, while the parameter max represents the maximum actual number of cases of all the hospitals considered. The parameter maxPixelCell represents the maximum normalized value (e.g. 50 in the foregoing examples).
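The pseudocode referenced above is not reproduced in this text. A minimal sketch of one plausible logarithmic normalization using the parameters just described follows; it assumes that min maps to 1 cell, max maps to maxPixelCell cells, and intermediate values are spaced by the logarithm of the number of cases. The identifiers are adapted to valid Python names (`cases_from_h` for #cases_from_h, `min_cases` and `max_cases` for min and max, `max_pixel_cell` for maxPixelCell), and the exact formula is an assumption rather than the pseudocode of the described implementations.

```python
import math


def log_normalize(cases_from_h: float, min_cases: float, max_cases: float,
                  max_pixel_cell: int = 50) -> int:
    """Return normalized_cases for hospital h on an assumed logarithmic scale.

    min_cases maps to 1 cell and max_cases maps to max_pixel_cell cells.
    Requires 1 <= min_cases so that every logarithm is defined.
    """
    if not 1 <= min_cases <= cases_from_h <= max_cases:
        raise ValueError("expected 1 <= min_cases <= cases_from_h <= max_cases")
    if max_cases == min_cases:
        return max_pixel_cell
    fraction = (math.log(cases_from_h) - math.log(min_cases)) / (
        math.log(max_cases) - math.log(min_cases))
    normalized_cases = 1 + fraction * (max_pixel_cell - 1)
    return round(normalized_cases)
```

For the skewed example with 10 to 300 cases, a hospital with 100 cases maps to 34 cells under this sketch, so the 10-to-100 group spans 1 to 34 cells.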
The processor(s) 504 can be coupled to a network interface 506, which allows the computer system 500 to communicate over a data network. The processor(s) 504 can also be coupled to a storage medium (or storage media) 508, which can store data records 510. A processor can include a microprocessor, microcontroller, processor module or subsystem, programmable integrated circuit, programmable gate array, or another control or computing device.
The storage medium (or storage media) can be implemented as one or multiple computer-readable or machine-readable storage media. The storage media include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; optical media such as compact disks (CDs) or digital video disks (DVDs); or other types of storage devices. Note that the instructions discussed above can be provided on one computer-readable or machine-readable storage medium, or alternatively, can be provided on multiple computer-readable or machine-readable storage media distributed in a large system having possibly plural nodes. Such computer-readable or machine-readable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture can refer to any manufactured single component or multiple components. The storage medium or media can be located either in the machine running the machine-readable instructions, or located at a remote site from which machine-readable instructions can be downloaded over a network for execution.
In the foregoing description, numerous details are set forth to provide an understanding of the subject disclosed herein. However, implementations may be practiced without some of these details. Other implementations may include modifications and variations from the details discussed above. It is intended that the appended claims cover such modifications and variations.