This application is related to U.S. patent application Ser. No. 11/005,652, titled “Computer Systems and Methods for Visualizing Data with Generation of Marks,” filed Dec. 2, 2004, which is hereby incorporated by reference in its entirety.
The disclosed embodiments relate generally to generating map views of data, and more specifically to adding geographical coding automatically to data to enable generation of map views.
Map views provide user-friendly ways to analyze data by displaying geographical variation of data. Creation of map views may be enabled by adding location fields such as latitude and longitude to a dataset, which is referred to as geocoding the data. Geocoding data presents significant obstacles: users may not know latitude and longitude values for their data, may not have time to perform geocoding, and may not have write permission on their dataset. Furthermore, users may desire to geocode data by multiple geographical levels (e.g., by state, county, and ZIP code).
In some embodiments, a computer-implemented method of generating a map view includes accessing a dataset having multiple records and multiple fields. One or more of the multiple fields are identified as geographical fields. Geographical codes are automatically associated with a first one of the identified geographical fields. A geographical map is generated for the dataset. Generating the geographical map includes generating a first plurality of marks on the geographical map. The first plurality of marks is positioned on the geographical map in accordance with the geographical codes associated with the first one of the identified geographical fields.
In other embodiments, a system for generating a map view includes memory, one or more processors, and one or more programs stored in the memory and configured for execution by the one or more processors. The one or more programs include: instructions to access a dataset having multiple records and multiple fields, instructions to identify one or more of the multiple fields as geographical fields, and instructions to automatically associate geographical codes with a first one of the identified geographical fields. The one or more programs also include instructions to generate a geographical map for the dataset, including instructions to generate a first plurality of marks on the geographical map. The first plurality of marks is positioned on the geographical map in accordance with the geographical codes associated with the first one of the identified geographical fields.
In yet other embodiments, a computer readable storage medium stores one or more programs for use in generating a map view. The one or more programs are configured to be executed by a computer system and include: instructions to access a dataset having multiple records and multiple fields, instructions to identify one or more of the multiple fields as geographical fields, and instructions to automatically associate geographical codes with a first one of the identified geographical fields. The one or more programs also include instructions to generate a geographical map for the dataset, including instructions to generate a first plurality of marks on the geographical map. The first plurality of marks is positioned on the geographical map in accordance with the geographical codes associated with the first one of the identified geographical fields.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.
Like reference numerals refer to corresponding parts throughout the drawings.
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the subject matter presented herein. But it will be apparent to one of ordinary skill in the art that the subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
To generate map views, a user of a data analysis software application first accesses a dataset containing data to be analyzed. The dataset includes multiple records and multiple fields, including fields with data to be analyzed and fields with corresponding geographical information, referred to as geographical fields. The geographical information specifies geographical areas corresponding to the data to be analyzed. The term “geographical area” as used herein can include both geographical regions and geographical locations. For example, the geographical information may include one or more of the following fields: country; state or province; state or provincial capital; county or parish; Metropolitan Statistical Area (MSA); Core Based Statistical Area (CBSA); Designated Market Area (DMA); arbitrarily defined market region; school, congressional, or other district; address; city; street; street number; and ZIP code or other postal code. In some embodiments the geographical information is stored using Federal Information Processing Standards (FIPS) codes. Inclusion of fields specifying geographical areas allows data to be analyzed with respect to the specified geographical areas, thus permitting the user to study variation of raw data, or of parameters calculated from raw data, across the specified geographical areas.
The dataset may be stored in any appropriate arrangement and location. For example, the dataset may be stored in a table or in a database containing multiple tables. The database may be stored locally or remotely.
As used herein, the term “location fields” refers to fields that specify map coordinates corresponding to one or more geographical fields. An example of map coordinates is latitude and longitude values, although the map coordinates may be any set of coordinates capable of being mapped. A dataset with location fields is said to be geocoded, and adding location fields to a dataset is referred to as geocoding the dataset. Similarly, adding location fields to a result set generated from a dataset is referred to as geocoding the result set.
To enable creation of map views of data in a dataset, location fields may be automatically added to the dataset or to a result set generated from the dataset. The result set may be generated, for example, by querying the dataset. Adding location fields to a result set allows a user who lacks write permission for the dataset to generate map views of data from the dataset. Adding location fields to a result set instead of a dataset also may offer performance advantages, since a result set may be significantly smaller (e.g., may have significantly fewer records) than its corresponding dataset and thus may be geocoded more quickly than its corresponding dataset. Adding location fields to a result set also allows the level of geocoding to be adjusted dynamically: instead of being limited to those location fields that are included in the dataset, location fields may be added to the result set for any geographical field(s) and the determination of which geographical field(s) the geocoding is to be based on may be made on the fly.
To add location fields to a dataset or corresponding result set automatically, one or more of the original fields in the dataset are identified as geographical fields, as described in detail below. A set of coordinates is accessed that includes coordinates for geographical areas listed in respective records of an identified geographical field. The respective records in the dataset, or corresponding records in the result set, are updated to include coordinates that correspond to respective geographical areas listed in the identified geographical field(s) of the records. For example, the set of coordinates may be stored in a table and a join operation between the table and the dataset or result set may be performed.
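By way of non-limiting illustration, the join between a table of coordinates and a result set may be sketched in Python as follows. The record layout, the field names, the function name geocode_records, and the approximate coordinate values are assumptions made for this sketch only and are not taken from the embodiments.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical coordinate table keyed by state name: (latitude, longitude).
# The values are approximate and serve only as sample data.
STATE_COORDS: Dict[str, Tuple[float, float]] = {
    "Washington": (47.4, -120.5),
    "Oregon": (43.9, -120.6),
}


def geocode_records(
    records: List[dict],
    geo_field: str,
    coords: Dict[str, Tuple[float, float]],
) -> List[dict]:
    """Left-join records with a coordinate table on geo_field.

    New records are returned and the inputs are left unmodified, so the
    same routine can be applied to a result set when the underlying
    dataset is read-only. Areas missing from the table get None values.
    """
    geocoded = []
    for record in records:
        lat: Optional[float]
        lon: Optional[float]
        lat, lon = coords.get(record.get(geo_field), (None, None))
        geocoded.append({**record, "Latitude": lat, "Longitude": lon})
    return geocoded


result_set = [
    {"State": "Washington", "Inventory": 120},
    {"State": "Oregon", "Inventory": 75},
    {"State": "Atlantis", "Inventory": 3},  # not in the coordinate table
]
geocoded_result = geocode_records(result_set, "State", STATE_COORDS)
```

In this sketch an unmatched area keeps its record but receives empty location fields, which mirrors a left join; an implementation could instead drop or flag such records.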
In some embodiments, data analysis software for generating map views specifies one or more geographical hierarchies. A geographical hierarchy is an ordered set of levels of geographical information. An example of a geographical hierarchy is (Country, State/Province, County/Parish, City, ZIP Code/Post Code, Street, Street Number). In this example, “Country” is one level, “State/Province” is another level, and so forth. The levels in a geographical hierarchy are arranged in order of increasing detail. However, there may be overlap between successive levels in a geographical hierarchy. For example, in the hierarchy (State, MSA, City), there is potential overlap between the “State” and “MSA” levels, because an MSA may cross state boundaries. Two successive levels of increasing detail are referred to as parent and child levels; the child level is more detailed than the parent level. For example, in the hierarchy (State, County) county is a child level of state and state is a parent level of county. A parent level may have multiple child levels. For example, a zip code level and a county level both may be children of a state level. Associated with each geographical hierarchy is a set of coordinates for geographical areas at the most detailed level of the hierarchy, and possibly also for other levels of the hierarchy.
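By way of non-limiting illustration, parent and child levels of a geographical hierarchy may be represented as in the following Python sketch. The mapping PARENT_OF and the helper names are hypothetical; the particular levels are chosen only to show that a parent level may have more than one child level.

```python
from typing import Dict, List

# Hypothetical encoding: each level names its parent level.
# State has two child levels here (County and ZIP Code).
PARENT_OF: Dict[str, str] = {
    "State": "Country",
    "County": "State",
    "ZIP Code": "State",
}


def children_of(level: str) -> List[str]:
    """Return the child levels of a level, in alphabetical order."""
    return sorted(child for child, parent in PARENT_OF.items() if parent == level)


def ancestors_of(level: str) -> List[str]:
    """Return parent levels from the nearest parent up to the top level."""
    chain = []
    while level in PARENT_OF:
        level = PARENT_OF[level]
        chain.append(level)
    return chain
```

For example, children_of("State") yields the County and ZIP Code levels, and ancestors_of("ZIP Code") yields State followed by Country.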
In some embodiments, identifying a field in a dataset or result set as a geographical field includes determining that the field corresponds to a level in a geographical hierarchy. Once this determination has been made, the coordinates associated with the geographical hierarchy are used to geocode the dataset or result set. In some embodiments, if a dataset or result set includes multiple geographical fields corresponding to multiple respective levels in a geographical hierarchy, the most detailed level in the geographical hierarchy is used to determine the coordinates to be added to the dataset or result set. In some embodiments, the level that has the largest number of distinct values as stored in a local table (e.g., a level table 200 or 220,
In some embodiments, a user may manually identify a field as a geographical field by providing input that specifies that the field is geographical. For example, the user may specify that a field is geographical through a dialog initiated by right-clicking on a listed field in a user interface or by selecting a command from a drop-down menu.
Alternatively, a field may automatically be identified as geographical. Various techniques for automatically identifying a field as a geographical field are available. In some embodiments, the schema of the dataset is queried for metadata specifying that a field is a geographical field. For example, if a field in the dataset has semantic metadata that matches semantic metadata describing a level in a geographical hierarchy, the field is automatically identified as a geographical field that corresponds to the level in the hierarchy. In some embodiments, the name of a field is used to identify the field as a geographical field.
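By way of non-limiting illustration, identifying a geographical field from its name may be sketched in Python as follows. The regular expressions, the list NAME_PATTERNS, and the function name level_for_field_name are assumptions made for this sketch; the embodiments do not specify particular patterns.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical name patterns, one per hierarchy level.
NAME_PATTERNS: List[Tuple[str, Pattern[str]]] = [
    ("Country", re.compile(r"^country$", re.IGNORECASE)),
    ("State", re.compile(r"^(state|province)$", re.IGNORECASE)),
    ("ZIP Code", re.compile(r"^(zip|zip ?code|postal ?code)$", re.IGNORECASE)),
]


def level_for_field_name(name: str) -> Optional[str]:
    """Return the hierarchy level whose pattern matches the field name.

    Returns None when no pattern matches, in which case the field is not
    identified as geographical by this name-based check.
    """
    cleaned = name.strip()
    for level, pattern in NAME_PATTERNS:
        if pattern.match(cleaned):
            return level
    return None
```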
In some embodiments, values in a field are sampled and the sampled values are compared to known geographical areas, such as geographical areas associated with various levels in a geographical hierarchy. If at least a predetermined percentage (e.g., 80%) of the sampled values corresponds to known geographical areas (e.g., known geographical areas associated with a particular level in a geographical hierarchy), the field is automatically identified as a geographical field. Automatically identifying a field as a geographical field also may include verifying various expected field attributes, such as verifying that the data type matches an expected type for a particular geographical field and verifying that the width of the field either equals an expected width or is at least an expected minimum width for a particular geographical field.
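By way of non-limiting illustration, the sampling test may be sketched in Python as follows. The set KNOWN_STATES, the function name looks_geographical, the sample size, the fixed random seed, and the case-insensitive comparison are assumptions made for this sketch; only the 80% threshold comes from the example above.

```python
import random
from typing import Iterable, Set

# Hypothetical set of known areas for one level, stored in lower case.
KNOWN_STATES: Set[str] = {"washington", "oregon", "california", "idaho", "nevada"}


def looks_geographical(
    values: Iterable[str],
    known_areas: Set[str],
    threshold: float = 0.8,
    sample_size: int = 100,
    seed: int = 0,
) -> bool:
    """Return True if enough sampled values name known geographical areas."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return False
    # Take a random sample without replacement; the seed is fixed only to
    # make this sketch reproducible.
    rng = random.Random(seed)
    sample = rng.sample(non_empty, min(sample_size, len(non_empty)))
    hits = sum(1 for v in sample if v.strip().lower() in known_areas)
    return hits / len(sample) >= threshold
```

Sampling bounds the cost of the check on large fields, at the price of occasionally misclassifying a field whose non-matching values are unevenly distributed.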
Attention is now directed to data structures associated with automatic geocoding.
In some embodiments, ambiguities occur when determining which latitude and longitude values in the fields 246 and 248 to add to a record in a dataset or result set. Ambiguity may result from multiple occurrences of a name: for example, two different countries may have identically named states. In some embodiments, once the most detailed geographical level has been identified for geocoding, one or more (e.g., all) parent level fields for the most detailed geographical level are added to the dataset or result set, to allow resolution of ambiguities. For example, if state is determined to be the most detailed level and is used for geocoding, a country field is added to the dataset or result set. If parent-level data (e.g., the country names) is not present, the ambiguity cannot be removed; however, the ambiguity may be arbitrarily resolved using heuristics. In some embodiments, the ambiguity is resolved in favor of the name with the largest number of children. For example, given identically named states, the state with the largest number of children (e.g., with the largest number of postal codes, if postal code is a child of state) is the default state to be used for geocoding.
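By way of non-limiting illustration, resolving an ambiguous name in favor of the candidate with the largest number of children may be sketched in Python as follows. The row layout, the placeholder names ("Northland", "Country A", "Country B"), the child counts, and the function name resolve_area are all invented for this sketch.

```python
from typing import List, Optional

# Hypothetical level-table rows: area name, parent-level value, and the
# number of child areas (e.g., postal codes) listed under the area.
STATE_ROWS: List[dict] = [
    {"name": "Northland", "parent": "Country A", "num_children": 12},
    {"name": "Northland", "parent": "Country B", "num_children": 340},
    {"name": "Eastmark", "parent": "Country A", "num_children": 58},
]


def resolve_area(name: str, rows: List[dict], parent: Optional[str] = None) -> Optional[dict]:
    """Pick the level-table row to use when geocoding a named area.

    If parent-level data is supplied, it is used to narrow the candidates.
    Otherwise, or if several candidates remain, the candidate with the
    most children is chosen as the default.
    """
    candidates = [row for row in rows if row["name"] == name]
    if parent is not None:
        candidates = [row for row in candidates if row["parent"] == parent]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["num_children"])
```

With these sample rows, resolve_area("Northland", STATE_ROWS) selects the row under "Country B", while supplying parent="Country A" selects the other row.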
In some embodiments, a geographical field identified as corresponding to a particular level may include values not listed in the level table for that level. This may occur, for example, in a field that a user specifies as a geographical field. The user may explicitly map values in the field to values in the level table (e.g., the user may specify that “Cal” corresponds to “California”). User-specified mappings may be stored in an appropriate data structure, such as a table or text file. In some embodiments, the geocoding software may attempt to identify correspondences between values. For example, the geocoding software may attempt to identify whether the field includes aliases for values in the level table.
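By way of non-limiting illustration, applying user-specified mappings before looking up a value in a level table may be sketched in Python as follows. The names LEVEL_VALUES, USER_ALIASES, and canonical_value are hypothetical, and the case-insensitive fallback is merely one assumed way of identifying correspondences between values.

```python
from typing import Dict, List, Optional

# Hypothetical values listed in a state level table.
LEVEL_VALUES: List[str] = ["California", "Oregon", "Washington"]

# User-specified mapping, as in the "Cal" example above.
USER_ALIASES: Dict[str, str] = {"Cal": "California"}


def canonical_value(
    raw: str,
    level_values: List[str],
    aliases: Dict[str, str],
) -> Optional[str]:
    """Map a raw field value to a level-table value, or None if unknown."""
    # An explicit user mapping takes precedence over automatic matching.
    if raw in aliases:
        return aliases[raw]
    # Assumed fallback: ignore case and surrounding whitespace.
    lookup = {value.lower(): value for value in level_values}
    return lookup.get(raw.strip().lower())
```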
In some embodiments, instead of or in addition to applying heuristics specified in the table 260 (
Once the dataset 130 (
In some embodiments, once the desired data have been specified in the appropriate shelves 302, 304, and 314 and mark specification fields 308, 310, and 312, the user may issue an instruction to generate the map views 320-1 and 320-2. For example, the user may select a “generate map” or “run query” icon (not shown) or an instruction from a drop-down menu (not shown). Alternatively, in some other embodiments a map view or other type of graphical display is automatically generated every time the content of a shelf or mark specification field is modified. In some embodiments, the displayed geographical map in each map view 320 is selected based on the geographical area or areas specified on the level-of-detail shelf 314 and listed in the dataset 130. For example, in the UI 300A, maps of the United States are displayed, since “State” 106, which corresponds to states of the United States as listed in the dataset 100 or 130, is specified on the level-of-detail shelf 314. A mark 322 is displayed for each state for which the dataset 100 or 130 includes inventory data. The size of each mark 322 corresponds to the quantity “SUM(Inventory)” 311, as illustrated in the key 324. In this example, the size of each mark is proportional to the quantity “SUM(Inventory)” 311, such that mark sizes increase with increasing values of “SUM(Inventory)” 311.
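By way of non-limiting illustration, computing “SUM(Inventory)” for each state and deriving proportional mark sizes may be sketched in Python as follows. The function names, the sample records, and the maximum mark size of 40 units are assumptions made for this sketch, which also assumes non-negative measure values.

```python
from collections import defaultdict
from typing import Dict, List


def sum_by_field(records: List[dict], group_field: str, measure_field: str) -> Dict[str, float]:
    """Aggregate a measure per distinct value of the grouping field."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[group_field]] += record[measure_field]
    return dict(totals)


def mark_sizes(totals: Dict[str, float], max_size: float = 40.0) -> Dict[str, float]:
    """Scale mark sizes so that size is proportional to the aggregated value.

    The largest total receives max_size; all others scale linearly.
    """
    largest = max(totals.values(), default=0.0)
    if largest <= 0:
        return {key: 0.0 for key in totals}
    return {key: max_size * value / largest for key, value in totals.items()}


# Hypothetical sample records.
records = [
    {"State": "Washington", "Inventory": 100},
    {"State": "Washington", "Inventory": 100},
    {"State": "Oregon", "Inventory": 50},
]
totals = sum_by_field(records, "State", "Inventory")
sizes = mark_sizes(totals)
```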
In some embodiments, the user does not need to add longitude 301 to the columns shelf 302 and latitude 303 to the rows shelf 304 to specify that a map view is to be generated. Instead, if an identified geographical field (e.g., “State” 106) is added to the level-of-detail shelf 314, the system assumes that a map view is to be generated and automatically uses longitude 301 and latitude 303 to generate the map view.
If a dataset includes multiple identified geographical fields (e.g., market 108, state 106, and city 104 in the dataset 100,
In some embodiments, a user may transition between a map view and another type of graphical display. For example, a user viewing the map views 340-1 and 340-2 in UI 300B (
In some embodiments, one or more databases 420 are stored externally to the computer system 400. For example, the database 420 may be stored on a server in communication with the computer system 400 through a network. A dataset 422 accessed from a server may be cached in the memory 404.
In some embodiments, a dataset is distributed among multiple databases 420.
In some embodiments, a respective geographical hierarchy 424 includes level tables (e.g., tables 200 & 220,
In some embodiments, the geocoding module 430 includes a heuristics table 432 (e.g., a heuristics table 260,
In some embodiments, the map generation module 440 includes a map drawing module 442 for selecting and generating a geographical map for display; a mark generation module 444 for determining mark types, appearances, and locations and generating corresponding marks on the geographical map; and a database query module 446 for querying a dataset 422 for data to display.
In some embodiments, instructions corresponding to all or a portion of the map generation module 440 and/or geocoding module 430 are stored at and executed by a server that transmits the results to the computer system 400 for display.
In some embodiments, the combination of the geocoding module 430 and the map generation module 440 includes instructions to perform the method 500 (
Each of the above identified elements 416-446 in
In the method 500, a dataset having multiple records and multiple fields (e.g., a dataset 100,
One or more of the multiple fields are identified (504) as geographical fields.
In some embodiments, at least one of the multiple fields is determined (505) to correspond to a level in a geographical hierarchy.
In some embodiments, user input is received (506) specifying that one of the multiple fields is a geographical field. For example, the user may specify that a field is geographical through a dialog initiated by right-clicking on a listed field in a user interface or by selecting a command from a drop-down menu.
In some embodiments, at least one of the multiple fields is automatically identified (507) as a geographical field. In some embodiments, automatically identifying at least one of the multiple fields as a geographical field includes querying a schema of the dataset for metadata (e.g., semantic metadata) specifying a geographical field. For example, an Application Programming Interface (API) call may be performed to query the dataset for metadata specifying a geographical field. In some embodiments, automatically identifying at least one of the multiple fields as a geographical field further includes determining that the metadata specifying the geographical field corresponds to a level in a geographical hierarchy. For example, the name of a field may match a pattern specified in the pattern field 266 of the heuristics table 260 (
In some embodiments, automatically identifying at least one of the multiple fields as a geographical field includes taking a sample (e.g., a random sample) of values in at least one of the multiple fields and verifying that at least a predefined percentage (e.g., 80%) of the sample (i.e., of the sampled values) corresponds to geographical areas. For example, the sampled values may be compared to geographical areas listed in level tables (e.g., the state level table 200,
In some embodiments, automatically identifying at least one of the multiple fields as a geographical field includes verifying that the field has an expected data type and a width at least as wide as a specified minimum width or, alternatively, equal to a specified width. The data type and width may be specified, for example, in the data type field 268 and width field 270 of the heuristics table 260 (
One or more geographical codes are automatically associated (508) with a first one of the identified geographical fields.
In some embodiments, location fields are added (510) to the dataset. The location fields specify map coordinates that correspond to respective values of the geographical field of respective records in the dataset. An example of location fields is latitude and longitude fields (e.g., latitude field 116 and longitude field 118,
In some embodiments, location fields are added (511) to a result set (e.g., result set 280,
In some embodiments, identified geographical fields correspond to respective levels in a geographical hierarchy and the first one of the identified geographical fields corresponds to a respective level having a greatest level of detail of the respective levels. In some embodiments, identified geographical fields have respective detail numbers and the first one of the identified geographical fields has a highest detail number of the respective detail numbers, indicating that the first one of the identified geographical fields is the most relevant geographical field.
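By way of non-limiting illustration, selecting the first geographical field by detail number may be sketched in Python as follows. The table DETAIL_NUMBER, its numeric values, and the function name most_detailed_field are hypothetical; only the rule of choosing the highest detail number comes from the description above.

```python
from typing import Dict, Optional

# Hypothetical detail numbers: a higher number indicates a more detailed level.
DETAIL_NUMBER: Dict[str, int] = {"Market": 1, "State": 2, "City": 3}


def most_detailed_field(field_levels: Dict[str, str]) -> Optional[str]:
    """Return the identified field whose level has the highest detail number.

    field_levels maps each identified geographical field name to the
    hierarchy level it corresponds to. Returns None if no fields are given.
    """
    if not field_levels:
        return None
    return max(field_levels, key=lambda field: DETAIL_NUMBER[field_levels[field]])
```

For a dataset with market, state, and city fields, this sketch selects the city field as the basis for geocoding.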
A geographical map (e.g., the maps in map views 320-1 and 320-2,
In some embodiments, a user request is received (516,
In response to the user request, the first plurality of marks is removed (518) from the geographical map. For example, in the UI 300C (
The method 500 provides a user-friendly way to generate a map view from a dataset that is not geocoded, and thus spares users from having to geocode data explicitly. In some embodiments, the method 500 also provides a user-friendly way to vary the level of geocoding in a map view. While the method 500 includes a number of operations that appear to occur in a specific order, it should be apparent that the method 500 can include more or fewer operations, that the operations can be executed serially or in parallel (e.g., using parallel processors or a multi-threading environment), that the order of two or more operations may be changed, and/or that two or more operations may be combined into a single operation.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.
Number | Date | Country | |
---|---|---|---|
20090319556 A1 | Dec 2009 | US |